Dissertation for the attainment of the doctoral degree of the Technische Fakultät of the Albert-Ludwigs-Universität Freiburg

Recommending Scientific Publications: Addressing the One-class Problem and Concept Drift

submitted by
Anas Alzogbi

Albert-Ludwigs-Universität Freiburg
Technische Fakultät
Institut für Informatik
2018
Recommender Systems for scientific publications: The concept drift and the implicit ratings problem

Albert-Ludwigs-Universität Freiburg
submitted by
Anas Alzogbi
First reviewer and supervisor: Prof. Dr. Georg Lausen, Albert-Ludwigs-Universität Freiburg
Second reviewer: Prof. Dr. Dr. Lars Schmidt-Thieme, Universität Hildesheim
Date of the doctoral examination: 22.03.2019
Abstract
The rapidly increasing number of newly published scientific publications confronts scholars and researchers with the challenge of staying up to date and well informed about new findings in their domains. Recent studies showed that more than one hundred thousand new papers in computer science are published each year, and that three times as many papers were published in 2010 as in 2000. Recommender systems (RS) have lately gained considerable attention as a powerful tool for providing personalized scientific paper recommendations. Owing to their capability of modeling the user's interests and exploring online archives for relevant papers, RS are a natural fit for this scenario. Various approaches from existing RS methods have been explored and successfully applied to scientific paper recommendation, ranging from Content-Based Filtering (CBF) over Collaborative Filtering (CF) to hybrid approaches.

In this thesis, we investigate suitable approaches for recommending scientific publications. Among the challenges that recommender systems for scientific publications face is the high sparsity of the user-item relation, driven by the large number of papers relative to the much lower number of users, known as the high sparsity of the rating matrix. Because this high sparsity is a major obstacle for CF methods, the focus of existing works has been on CBF approaches. Within these approaches, the user's profile is mainly constructed with memory-based or heuristic-based methods. Memory-based methods construct the user profile by applying an aggregation function over the feature vectors of relevant papers. Such methods depend strongly on the underlying assumption behind the employed aggregation method. Model-based approaches, by contrast, have the potential to build more representative user models, since they rely on a learning algorithm. However, model-based approaches have not gained much attention as a solution for scientific publication recommendation, due to the following problems: (a) the low number of ratings available per user, which corresponds to a low number of training instances; (b) the available ratings are positive-only ratings, which gives rise to the one-class problem; and (c) the high cost of training and maintaining a separate model for each user in the system. In addition, the temporal aspect of the system adds extra challenges given the drifting interests of the users. Users' interests change over time; as a result, old ratings do not hold the same importance for the recommender system as recent ones. Therefore, not all available ratings (which are already few) are beneficial for learning the recommendation model. This aspect adds further complications to the previously mentioned challenges.
In this thesis, we focus on these issues, which we summarize in the following two challenges: the one-class problem and the concept drift in users' interests. Based on our survey of the related literature, we found that these issues have not gained enough attention among the works that developed recommender systems for scientific publications. Thus, we investigate in this thesis the adaptation of ideas from several domains, such as machine learning and information retrieval, to design useful recommender systems for scientific publications. The contributions of this work can be grouped into four parts:
• First, we present a literature survey exploring the latest related works. The goal of this survey is twofold: first, to identify successful and promising existing recommendation approaches for scientific publications and, second, to investigate which works addressed the challenges targeted in this thesis.
• Second, we address the one-class problem and present two model-based content-based recommendation approaches. In the first solution, we model the problem as a linear regression model and train a supervised model for each user. In the second solution, we investigate the application of pairwise preference learning to content-based filtering and present an approach based on pairwise learning-to-rank.
• Third, we address the efficiency problem of our model-based recommender and present a system design that builds on the widely known Apache Spark cluster management framework. Our system allows efficient computation of multiple models (one model per user) on a cluster of machines.
• In the last part of this work, we focus on the concept drift in users' interests. We first study the presence of concept drift in a real-world dataset and then present a time-aware recommender system that accounts for this drift.
Zusammenfassung
The rapidly growing number of newly published scientific publications confronts scientists with the challenge of continuously staying informed about new findings in their field of research. Recent studies showed that more than one hundred thousand new papers in computer science are published each year, and that the number of published papers tripled between 2000 and 2010. Recommender systems have lately gained much attention as a powerful tool for producing personalized recommendations of scientific papers. Owing to their ability to model users' interests and to search online archives for relevant articles, recommender systems are the ideal solution for this scenario. Various approaches from existing recommender systems have already been explored and successfully applied to the recommendation of scientific papers, ranging from Content-Based Filtering (CBF) over Collaborative Filtering (CF) to hybrid approaches.

In this thesis, we investigate suitable approaches for recommending scientific publications. Recommender systems for scientific publications face several challenges. Among them is the high sparsity of the user-item relation, caused by the large number of articles relative to the much smaller number of users, a problem known as the high sparsity of the rating matrix. Since this high sparsity of the rating matrix is a problem for CF methods, the focus of existing works has been on CBF approaches. These approaches construct user profiles mainly on the basis of memory-based or heuristic methods. Memory-based methods construct the user profile by applying an aggregation function over the feature vectors of relevant articles. Such methods depend strongly on the underlying assumption behind the employed aggregation method. Unlike memory-based methods, model-based approaches have the potential to build more representative user models, since they are based on a learning algorithm. However, they have not gained much attention as a solution for recommending scientific publications, due to the following problems: (a) the low number of available ratings per user, which corresponds to a low number of training instances; (b) the available ratings are positive-only ratings, which leads to the one-class problem; and (c) the high cost of learning and updating an individual model for each user in the system. Moreover, the temporal aspect of the system is clearly reflected in the constantly changing interests of the users. Users' interests change over time, so old ratings do not hold the same importance for the recommender system as the most recent ones. Therefore, not all available ratings (which are already few) can be used for learning the recommendation model. This aspect adds further complications to the previously mentioned problems.

In this thesis, we address these issues. We summarize them in the following two challenges: the one-class problem and the concept drift in users' interests. Based on our structured literature study, we found that these questions have not received enough attention among the works that developed recommender systems for scientific recommendations. In this thesis, we therefore investigate the application and extension of methods from different areas, such as machine learning and information retrieval, in order to develop useful recommender systems for scientific publications. The contribution of this work can be divided into four parts:

• First, we present a literature study surveying the related works. The goal of this survey is twofold: on the one hand, to identify successful and promising existing recommendation approaches for scientific publications and, on the other hand, to investigate which works address the challenges targeted here.

• Second, we address the one-class problem and present two model-based, content-based approaches. In the first solution, we model the problem as a linear regression model and train a supervised model for each user. In the second solution, we investigate the application of pairwise preference learning to content-based filtering and present an approach based on pairwise learning-to-rank.

• Third, we address the efficiency problem of our model-based recommender system and present a system design that builds on the widely used Apache Spark cluster management framework. Our system enables the efficient computation of multiple models on a cluster of machines.

• In the last part of this work, we focus on the problem of concept drift in user interests. We first examine the occurrence of concept drift in a real-world dataset and then present a time-aware recommender system that accounts for the concept drift in user interests.
Contents
I. Preface

1. Introduction
   1.1. Motivation and Problem description
   1.2. Problem Statement
   1.3. Thesis Contributions & Published Work
        1.3.1. Literature survey
        1.3.2. Addressing the one-class problem
        1.3.3. Addressing the concept-drift in users interests
   1.4. Thesis Outline

II. State-of-the-art: Recommender systems for Scientific Papers

2. Recommender Systems
   2.1. Introduction
   2.2. Recommender Systems and the Users-items Interactions
        2.2.1. Explicit feedback
        2.2.2. Implicit feedback
   2.3. Overview of Recommendation Techniques
        2.3.1. Content-based filtering (CBF)
        2.3.2. Collaborative filtering and latent factor models
        2.3.3. Hybrid approaches
   2.4. Matrix Factorization in CF
        2.4.1. Least squares modeling
        2.4.2. Probabilistic matrix factorization
        2.4.3. Matrix factorization algorithms
   2.5. Implicit Feedback and One-class problem

3. Literature Survey
   3.1. Introduction
   3.2. Relevant Papers Identification
   3.3. Paper Recommendation and Citation Recommendation
   3.4. Dimensions for Approaches Comparison
        3.4.1. Recommendation approaches
        3.4.2. Recommendation scenarios
        3.4.3. User modeling
        3.4.4. Publications data usage
        3.4.5. Publication representation
        3.4.6. Matching method
        3.4.7. Evaluation strategy
        3.4.8. Implicit feedback and time-aware
   3.5. Approaches for Recommending Scientific Publications
        3.5.1. Content-based filtering approaches
        3.5.2. Graph-based approaches
        3.5.3. Latent factor model approaches
        3.5.4. Hybrid approaches
        3.5.5. Preference learning approaches
        3.5.6. Cross-domain approaches
        3.5.7. Co-occurrence based approaches
   3.6. Discussion
   3.7. Conclusion

III. One-class Problem in Scientific Paper Recommender System

4. Content-based Filtering using Multi-variate Linear Regression
   4.1. Introduction
   4.2. PubRec Overview
   4.3. Papers Modeling
        4.3.1. Publication representation
        4.3.2. Keywords extraction
   4.4. Learning Algorithm and Recommendation Process of PubRec
        4.4.1. Importance score
        4.4.2. Learning user profile
        4.4.3. Recommendation generation
   4.5. Evaluation
        4.5.1. Dataset
        4.5.2. Experimental setup and results
   4.6. Conclusion

5. Pairwise Preference Learning for CBF Recommenders
   5.1. Introduction
   5.2. Learning-to-Rank Overview
        5.2.1. Pointwise learning-to-rank
        5.2.2. Pairwise learning-to-rank
        5.2.3. Listwise learning-to-rank
   5.3. Ranking in Recommender Systems
        5.3.1. LTR model for recommender systems
        5.3.2. Pairwise preferences to learn from positive-only feedback
        5.3.3. General model versus an individual model
   5.4. From Preference Pairs to Recommendations
        5.4.1. Notation and definitions
        5.4.2. The recommendation approach
        5.4.3. Model learning
        5.4.4. Recommendation generation
   5.5. Preference Pairs Validation
        5.5.1. Pruning based validation (PBV)
        5.5.2. Weighting based validation (WBV)
   5.6. Evaluation
        5.6.1. Dataset & experimental setup
        5.6.2. Evaluation metrics
        5.6.3. Results and discussion
   5.7. Conclusion

6. Simultaneous Model Learning for Multiple LTR Models
   6.1. Introduction
   6.2. Problem Definition
   6.3. Distributed Computation Framework
        6.3.1. Spark architecture
        6.3.2. SVM on Spark
   6.4. RankingSVM Recommender on Spark
        6.4.1. Sequential models learning
        6.4.2. Parallel models learning
   6.5. Efficiency Analysis for PML and SML
        6.5.1. Map computations
        6.5.2. Shuffle
        6.5.3. Reduce computations
        6.5.4. Conclusion of complexity analysis
   6.6. Evaluation
        6.6.1. Dataset
        6.6.2. Experiments and results discussion
   6.7. Conclusion

IV. Time-aware Recommendations

7. Concept Drift Detection in Users Behavior
   7.1. Introduction
   7.2. Concept Drift Definition
   7.3. Concept Drift Detection
   7.4. Detecting Concept Drift for Publication Recommendation
        7.4.1. Representation model of papers
        7.4.2. Drift points identification
   7.5. Citeulike Dataset
   7.6. Concept Drift in Citeulike Dataset
        7.6.1. Parameters selection
        7.6.2. Analyzing users behavior
        7.6.3. Users with similar behavioral patterns
        7.6.4. Drift points detection in citeulike dataset
   7.7. Conclusion

8. Time-aware Collaborative Topic Regression
   8.1. Introduction
   8.2. Related Work
   8.3. Problem Statement and Preliminaries
        8.3.1. Notation and problem statement
        8.3.2. Collaborative topic regression (CTR)
   8.4. Time-aware Collaborative Topic Regression (T-CTR)
        8.4.1. Concept drift score
        8.4.2. Confidence weights
        8.4.3. Model learning and prediction
   8.5. Evaluation and Discussion
        8.5.1. Dataset
        8.5.2. Experimental setup
        8.5.3. Time-aware vs time-ignorant evaluations
        8.5.4. Baselines comparison
        8.5.5. User-specific vs common concept drift scores
   8.6. Conclusion

V. Discussion

Bibliography
List of Tables
3.1. The mapping of reviewed papers. Part 1
3.1. The mapping of reviewed papers. Part 2
3.2. Reviewed papers distribution over the recommendation approaches
3.3. The mapping of reviewed papers. Part 3
3.3. The mapping of reviewed papers. Part 4
4.1. Performance comparison
5.1. Performance comparison between WBV and baselines
5.2. Performance comparison between WBV, LR and SVM
6.1. Cost comparison between PML and SML for reduce operations
6.2. Fold Statistics
6.3. Training time of SML and PML
7.1. Setting the topic significance threshold
7.2. User groups in citeulike dataset
8.1. Citeulike dataset statistics for each fold
8.2. Number of intervals in the dataset for each fold
List of Figures

2.1. Schematic illustration of content-based filtering recommender system
2.2. Schematic illustration of collaborative filtering recommender system
2.3. Matrix factorization
3.1. Scholarly data usage in the reviewed recommendation papers
3.2. Publication representation in the reviewed works
3.3. The matching methods in the reviewed works
3.4. The evaluation methods in the reviewed works
4.1. Overview of the recommendation approach (PubRec)
4.2. Scientific publications data structure
4.3. Decay function
4.4. Linear regression modeling
4.5. Parameters tuning
4.6. MRR results
5.1. Example of LTR modeling for recommender systems
5.2. Peer papers and preference pairs formulation
5.3. Overview of the proposed approach steps
5.4. Performance comparison between WBV, PBV, LR and SVM
6.1. Spark architecture
6.2. Sequential model learning
6.3. Parallel model learning
6.4. Shuffle operation example
6.5. Training time for SML and PML
7.1. The rating series and the rating graph
7.2. Terms-papers distributions in citeulike dataset
7.3. The distribution of representative topics per paper
7.4. The distribution of representative topics per paper
7.5. Analysis of users behavior in the citeulike dataset
7.6. Number of drift points in each user group
7.7. Number of drift points per duration for each user group
7.8. Number of drift points per ratings for each user group
8.1. Computing the pairwise similarities in the rating series
8.2. Concept drift score influence on the decay function
8.3. Time-aware and time-ignorant splits
8.4. Performance comparison of CTR between time-aware and time-ignorant splits
8.5. Performance comparison for T-CTR and the baseline methods
8.6. User-specific compared to common concept drift score
Part I.
Introduction
Contents
1.1. Motivation and Problem description
1.2. Problem Statement
1.3. Thesis Contributions & Published Work
1.4. Thesis Outline
1.1. Motivation and Problem description
Modern research is remarkably boosted by the contemporary research-supporting tools available to researchers nowadays. Thanks to digital libraries, researchers throughout the world have the opportunity to boost their work by accessing a large body of human knowledge with little effort. However, the sheer amount of rapidly published scientific publications in all science disciplines overwhelms researchers and scholars with a large number of potentially relevant and important scientific publications.

Digital libraries and online archives for scientific publications typically offer the possibility to explore their archives through keyword-based and bibliographic search. Such methods are important and present a way to explore the libraries and locate potentially relevant content. However, the effectiveness of such traditional information retrieval methods lies in the hands of the users. In the absence of user modeling, users are expected to guess the correct search terms in order to find potentially relevant papers, which opens the door to missing relevant papers that are indexed under different terms.

This case sheds light on the need for tools that can understand and model users' interests, and then explore the online archives to discover relevant scientific publications. Recommender systems provide the natural fit for this problem, and a considerable amount of research has been done in the last decade with the goal of designing suitable recommender systems for this task [BGLB16]. The primary goal of employing recommender systems here is twofold: first, discovering relevant research publications which otherwise would not be found by the user; second, allowing researchers and scholars to concentrate more on the actual research work by
lifting the searching and exploring tasks off their shoulders.
The fundamental problem that recommender systems, in general, try to solve is to estimate the utility of a particular item (the candidate item) for a given user (the active user). To achieve this task, recommender systems leverage any combination of the following information: information related to the user, to the item, to the previous interactions between users and items, or to the context in which the recommendation is requested, such as time or purpose. Thus, recommender systems deal with two types of entities, the user and the item.
In scientific publication recommendation, the items are strictly scientific publications. This type of item has particular characteristics which push towards specific design decisions, due to the challenges or the opportunities associated with these characteristics. Furthermore, the challenges do not come from the papers only; there are also challenges related to the users and their interactions with the scientific publications. We can summarize the challenges that a recommender system faces when recommending scientific publications as follows:
1. Challenges related to scientific publications: Scientific publications are mainly represented by their textual content. This content holds all ideas and contributions brought by the publications. It also plays a central role in attracting users towards the paper; after all, users are interested in that content. The unstructured nature of the textual content poses a challenge for systems that try to analyze it and match it against the content of other publications or against the user's needs. An important aspect here is the polysemy and synonymy problems encountered in textual content. Therefore, recommender designers need to find a suitable representation model for the publications' textual content that allows expressive modeling of the publications.
2. Challenges related to users' behavior:
• Unlike other recommendation applications such as movie or song recommendation, in scientific publication recommendation the number of papers surpasses the number of users by far, especially users that are willing to provide ratings. This leads to an extremely sparse setup. For example, a study conducted in [Vel13] found that the ratings' sparsity in Mendeley [JHGdZG12], an academic social network, is almost three orders of magnitude higher than that of Netflix, the famous movie provider. Also, in an analysis of a real-world dataset, we found that users have on average around 40 papers in their personalized libraries. Moreover, these libraries are built over months of the user's activity in the system, whereas users of Spotify, for example, listen on average to 50 songs in a single day.
• Not only are the available ratings limited, but they are also all positive ratings. This issue is known as the one-class problem, where only positive ratings are available. Users do not usually provide information about
irrelevant papers. This forces the recommender system to learn a model from an extremely biased set of ratings that contains information about the positive class only.
• The last challenge is related to the temporal aspect of the system. Several conditions might change over time in such a system, most importantly the users' interests. For example, a user who was interested in papers related to the semantic web three years ago might have shifted his/her main interest towards machine learning-related papers. Consequently, not all available ratings (which are already few) have the same level of importance for understanding the actual user's needs. Therefore, it is essential for the recommendation method to be aware of this temporal aspect and to adapt to the changes in users' interests over time.
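The temporal challenge above is commonly handled by down-weighting old ratings. The following is only a minimal illustration of that idea, not the weighting scheme developed later in this thesis; the half-life parameterization is an assumption chosen for clarity.

```python
from datetime import datetime

def rating_weight(rated_at: datetime, now: datetime,
                  half_life_days: float = 365.0) -> float:
    """Exponential decay: a rating loses half of its weight
    every `half_life_days` days (illustrative parameter)."""
    age_days = (now - rated_at).days
    return 0.5 ** (age_days / half_life_days)

# A rating given today keeps full weight; a two-year-old one drops to 0.25.
now = datetime(2018, 1, 1)
print(rating_weight(datetime(2018, 1, 1), now))  # 1.0
print(rating_weight(datetime(2016, 1, 2), now))  # 0.25
```

Such a weight can multiply a rating's contribution to the training objective, so that recent interests dominate the learned user model.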
These challenges form the basic problems that we tackle in this
thesis. In the following sections, we present the problem statement
and discuss these challenges in the light of the main contributions
of this thesis.
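Of these challenges, the one-class problem is perhaps the least intuitive. One common way to make positive-only feedback learnable, and the route taken later in this thesis via pairwise learning-to-rank (Chapter 5), is to derive preference pairs: each paper in a user's library is assumed to be preferred over sampled unrated "peer" papers. A toy sketch with illustrative paper IDs (not the thesis's actual pair-construction procedure):

```python
def preference_pairs(library, catalogue, peers_per_paper=2):
    """Turn positive-only feedback into (preferred, non-preferred) pairs
    by pairing each rated paper with a few unrated 'peer' papers."""
    rated = set(library)
    unrated = [p for p in catalogue if p not in rated]
    pairs = []
    for pos in library:
        for neg in unrated[:peers_per_paper]:
            pairs.append((pos, neg))  # (preferred, non-preferred)
    return pairs

library = ["p1", "p3"]                  # the user's positive-only ratings
catalogue = ["p1", "p2", "p3", "p4", "p5"]
print(preference_pairs(library, catalogue))
# [('p1', 'p2'), ('p1', 'p4'), ('p3', 'p2'), ('p3', 'p4')]
```

A ranking model (e.g. RankingSVM, used in Chapter 6) is then trained to score the first element of each pair above the second, sidestepping the lack of explicit negative ratings.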
1.2. Problem Statement
Concretely, the problem statement for this thesis can be described
as follows. We design a recommender system that can recommend
“useful” scientific publications to users assuming the following
setup. The input to the recommender system is:
1. A set of scientific publications P, where each publication is
associated with two kinds of attributes: textual content,
including the title, the abstract and the author-defined keyword
list; and structural attributes, namely the publication year and
the publishing venue.
2. A set of users U; for each user we have the list of relevant
publications from P. Each relevant publication is associated with
a timestamp referring to the time when the user showed his/her
interest in the publication.
The expected output is an ordered list of recommended publications
from P for each user in U. We base our work in this thesis on
this setup, and study and present different recommendation
techniques that can solve the previously mentioned challenges.
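This setup can be sketched as simple data structures. The class names and the placeholder scoring below are illustrative assumptions for the sketch, not definitions from the thesis:

```python
from dataclasses import dataclass, field

@dataclass
class Publication:
    # Textual attributes
    title: str
    abstract: str
    keywords: list
    # Structural attributes
    year: int
    venue: str

@dataclass
class User:
    # Relevant publications with the timestamp of the interaction:
    # a list of (publication_id, timestamp) pairs
    library: list = field(default_factory=list)

def recommend(publications, user, top_n=10):
    """Return an ordered list of candidate publication ids for one user.
    The recency-based scoring is a placeholder, not the thesis's method."""
    seen = {pid for pid, _ in user.library}
    candidates = [(pid, p.year) for pid, p in publications.items() if pid not in seen]
    candidates.sort(key=lambda c: -c[1])  # illustrative: newest first
    return [pid for pid, _ in candidates[:top_n]]
```

Any real recommender replaces the placeholder scoring with a learned utility estimate; the surrounding input/output contract stays the same.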
1.3. Thesis Contributions & Published Work
Our contributions in this thesis mainly tackle the challenges
presented earlier. In the following subsections, we summarize these
contributions in three points.
Chapter 1 Introduction
1.3.1. Literature survey
Scientific publication recommendation is developing into a separate
research direction emerging from the much bigger recommender
systems community. Therefore, surveying the contributions that
address scientific publication recommendation provides a proper
understanding of the approaches and challenges specific to this
branch. In our first contribution, we present a survey of recent
works in the domain of scientific publication recommendation. Our
goal is to form an extensive understanding of the following points:
identifying successful and promising recommendation approaches that
have been employed to solve our problem, and investigating the
challenges and opportunities each of these approaches brings.
Additionally, this survey allows us to situate the contributions of
this thesis among existing works. We present the survey in
Chapter 3.
1.3.2. Addressing the one-class problem
Most existing approaches for recommending scientific publications
depend on implicit rather than explicit feedback as the source of
user ratings, because the former is far more readily available than
the latter. Such methods analyze users' online activities such as
paper searching and browsing [XGLC14]; bookmarking or tagging
[WB11, WCL13, BBM16]; or paper authoring and citing [APSF16, SK13].
However, using such interactions as a source of user ratings leads
to the one-class problem, which is essentially the absence of
negative feedback [PZC+08]. Existing works that addressed the
one-class problem for recommender systems in general have typically
considered the Collaborative Filtering (CF) recommendation
scenario. However, CF approaches are not the best fit for
scientific paper recommendation, since they cannot recommend unseen
papers. Another problem that contributed to abandoning CF
approaches is the high sparsity of the users-items relation, driven
by the large number of papers relative to the much lower number of
users. As a result, the focus of previous works on paper
recommendation has been on Content-based Filtering (CBF)
approaches. In CBF, constructing the user's profile was almost
exclusively achieved with memory-based or heuristic-based methods.
These methods construct the user profile by applying an aggregation
function over the feature vectors of relevant papers, and thus
depend strongly on the underlying assumptions of the employed
aggregation method. In contrast, model-based approaches, since they
rely on a learning algorithm, have the potential to build more
representative user models. However, model-based CBF approaches did
not gain much attention as a solution for scientific publication
recommendation. The reasons can be summarized in the following
three points: (a) the absence of negative ratings (the one-class
problem) significantly limits the ability of machine learning
algorithms to learn a representative model; (b) the scarcity of
available ratings per user, which means a
low number of training instances; and (c) model-based CBF
approaches are always associated with a high cost, since a separate
model must be trained for each user. In Part III of this thesis, we
focus on these challenges and show how model-based CBF recommenders
can be both applicable and efficient for scientific publication
recommendation. Our contributions in this respect are:
• In our first contribution [AAFL15], we studied how to model the
recommendation task in the absence of the negative class. We
presented a supervised learning formulation that models the
recommendation prediction as a regression problem, and suggested
employing the rating's age to achieve a multi-level labeling scheme
as a solution to the one-class problem.
• In our second work [AAFL16], we adopted a different technique for
addressing the one-class problem, namely pairwise learning-to-rank.
In this case, we modeled the recommendation task as a ranking
prediction problem and defined a rank-based loss that can learn
from the preference between relevant and unobserved papers. We
provided two verification methods to account for the potential
errors resulting from utilizing the unobserved papers, and
conducted offline evaluations on a real-world dataset to evaluate
our approach.
• As an extension to our learning-to-rank approach, we addressed
the efficiency issue of model-based content-based recommenders in
[AKL19]. We provided a system design that leverages the
computational power of multiple computation units (a cluster of
machines) in order to enable efficient training of supervised
models for a large number of users. This system was implemented in
Apache Spark.
1.3.3. Addressing the concept drift in users' interests
In our final contribution, we addressed the temporal aspect of the
problem. We first studied the presence of concept drift in users'
interests on a real-world dataset collected from an online system
that allows users to save and share academic papers, the citeulike
social bookmarking website. Afterwards, we presented in [Alz18] a
time-aware recommendation method, where we adapted and extended an
existing promising recommendation method, namely Collaborative
Topic Regression (CTR) [WB11], enabling it to account for concept
drift in users' interests. We additionally conducted systematic
experiments on the citeulike dataset and found that time-ignorant
offline evaluation methods promise unrealistic results. We also
showed that our time-aware approach leads to better
recommendations, especially under the realistic time-aware
evaluation framework.
1.4. Thesis Outline
This thesis is divided into five parts. The following overview
summarizes the topics and contributions of these parts.

Part I: Chapter 1 provides an introduction to the thesis, where we
state the problem statement and summarize the contributions.

Part II: Provides the foundations and the background knowledge
important for understanding the presented concepts. We also present
in this part our literature survey. Chapter 2 presents an overview
of recommender systems, highlighting concepts that are relevant to
our work. Chapter 3 presents a survey of recent research that is
relevant to our problem.

Part III: In this part, we address the one-class problem. Chapter 4
presents a model-based content-based system for recommending
scientific publications. Chapter 5 presents another model-based
approach, which tackles the one-class problem by employing a
pairwise preference comparison technique. Chapter 6 presents an
efficient implementation of the learning-to-rank recommender that
can efficiently train supervised models for a large number of
users.

Part IV: In this part, we address the temporal aspect of the
scientific paper recommender system. Chapter 7 presents an
exploration study of concept drift detection and provides a study
on a real-world dataset to examine the presence of concept drift in
our scenario. Chapter 8 presents our approach to account for
concept drift in users' interests in the underlying problem.

Part V: Chapter 9 concludes the thesis with a summary.
Contents
2.1. Introduction
2.2. Recommender Systems and the Users-items Interactions
2.3. Overview of Recommendation Techniques
2.4. Matrix Factorization in CF
2.5. Implicit Feedback and the One-class Problem
2.1. Introduction
This chapter serves as a review of background knowledge related to
recommender systems. We focus on the topics relevant to this thesis
that are important for understanding the presented contributions.
Readers who are familiar with the following concepts can skip this
chapter: content-based filtering, collaborative filtering,
model-based recommender systems, matrix factorization and implicit
feedback. We start with an introduction to the notation used in
this thesis.
Mathematical notation. We denote a matrix by a capital letter
(e.g., R, D). The elements of a matrix are represented by
lower-case letters with a double subscript (e.g., rij, djk). A row
or a column of a matrix is represented by a subscripted capital
letter (e.g., Rj is the jth row or column of the matrix R). Vectors
are denoted by lower-case letters printed in boldface, for example
u, v. The elements of a vector are represented by lower-case
letters with a single subscript (e.g., ui). We consider all vectors
to be column vectors, which also holds when indexing a row or a
column of a matrix. For example, the dimensionality of a vector u
is Rd×1, where d is the vector length. Similarly, for a matrix
R ∈ Rn×m, the ith row is the column vector Ri ∈ Rm×1 and the jth
column is the column vector Rj ∈ Rn×1.
Chapter 2 Recommender Systems
Chapter structure. The rest of this chapter is organized as
follows. First, we introduce recommender systems and the different
interaction models in Section 2.2. In Section 2.3, we present an
overview of the main techniques for recommender systems. We then
explain in Section 2.4 an important method for CF recommenders,
namely matrix factorization. Finally, we explain in Section 2.5 an
important problem that we address in this thesis, the one-class
problem.
2.2. Recommender Systems and the Users-items Interactions
At the basic level, recommender systems deal with two main
entities, namely users and items. Users interact with items in
several ways depending on the underlying domain; this interaction
is the essential source of information that helps recommender
systems generate future recommendations. Therefore, the users-items
interactions can be seen as the third main entity of a recommender
system. We first introduce some terminology that is widely adopted
in the recommender systems community. The user for whom we generate
recommendations is the active user. The set of items that appear in
the previous interactions of the user are the observed items, and
the rest are the unobserved items. Unobserved items are candidates
for recommendation and are therefore called the candidate items.
More precisely, the candidate set of items is a subset of the
unobserved items. The goal of a recommender system is to estimate
the utility of each candidate item i to a user u (the active user),
and to recommend the set of candidate items with the highest
estimated utility to u. To achieve this goal, a recommender system
leverages a wide range of available information that can be
categorized as:
• Items-related information
• Users-related information
• Contextual information related to the context in which the
recommendation is requested.
• Information related to previous interactions between the user(s)
and the item(s).
Different recommendation techniques use different data from this
list; they also vary in how the utilized data is processed. For
example, we can explain the three main classes of recommendation
approaches, namely Content-based Filtering (CBF), Collaborative
Filtering (CF) and hybrid approaches, by categorizing the used
information as follows:
• Using the previous interactions of the active user only, in
addition to information related to the items, gives rise to the
content-based filtering approaches;
• Using the previous interactions of all users gives rise to the
collaborative filtering approaches.
• Using the interactions of all users in addition to the items'
information leads to hybrid approaches.
Later, in Section 2.3, we will give more details about these
approaches, but first we explain how users' interactions and
activities are modeled. Users' interactions with items can be seen
as feedback provided by the users to the recommender system.
Different types of feedback can be observed, ranging from less
obvious feedback, such as searching for a product, clicking on an
advertisement or reading a news article, to more direct feedback,
such as purchasing a product or adding a paper to the user's
library. These actions can therefore be grouped into two
categories, namely implicit feedback and explicit feedback. In both
cases, users' interactions with items are translated into ratings.
The following subsections explain these two feedback categories.
2.2.1. Explicit feedback
This is the most convenient input for a recommender system: users
explicitly give a kind of "rating" for an item, expressing how
relevant the item is to them. An example of such feedback is the
star-rating system that Netflix adopts. Netflix users can rate
movies on a scale between one and five stars, where one star means
not interested and five stars means very interested. For
recommender systems, this kind of feedback is very useful since it
provides a convenient level of detail about users' tastes. However,
requesting explicit feedback does not coincide with the users'
goals. Users want to spend their time on a system to benefit from
and utilize the services it provides. They are usually not willing
to spend time giving feedback, especially when they do not see a
direct reward. For example, Netflix users would rather start
watching another movie than spend time rating the movie they have
already watched. Therefore, explicit feedback is not always
available, and even when it is available, it is sparse. As a
result, an alternative source of information about the users'
interactions is needed, which brings us to implicit feedback.
2.2.2. Implicit feedback
Obtaining explicit feedback imposes extra work on the users, and
the ratings needed for learning a representative model are rarely
supplied by them [SKK01]. Therefore, instead of expecting users to
explicitly provide feedback, recommender systems can observe users'
actions and activities and infer their tastes accordingly. Here,
the users are relieved from providing explicit feedback; instead,
the system "implicitly" derives the users' feedback. For example,
if a user stopped watching a movie after a short time and never
continued, then we can infer that the user is most likely not
interested in that movie. On the contrary, if a user watches a
movie multiple times, we assume that he/she is interested in that
movie.
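The kind of inference described above can be sketched as follows. The thresholds and event fields are illustrative assumptions for this sketch, not rules from the thesis:

```python
def infer_rating(watch_fraction, rewatch_count):
    """Map implicit viewing signals to an inferred rating.
    Thresholds are illustrative assumptions, not from the thesis."""
    if rewatch_count >= 2:
        return 1.0   # repeated consumption -> strong interest
    if watch_fraction >= 0.8:
        return 1.0   # watched (nearly) to the end -> interested
    if watch_fraction < 0.1:
        return 0.0   # abandoned early -> likely not interested
    return 0.5       # ambiguous signal

# Hypothetical interaction log entries
events = [
    {"item": "movie_a", "watch_fraction": 0.05, "rewatch_count": 0},
    {"item": "movie_b", "watch_fraction": 0.95, "rewatch_count": 3},
]
ratings = {e["item"]: infer_rating(e["watch_fraction"], e["rewatch_count"])
           for e in events}
```

Note that this mapping never yields confident negative ratings for unobserved items, which is exactly how the one-class problem arises.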
2.3. Overview of Recommendation Techniques
Approaches to generating recommendations can be grouped into
several categories. In this section, we provide a general overview
of approaches that are widely adopted for scientific paper
recommendation and detail important background information needed
to understand the rest of this thesis. A more in-depth
categorization of recommendation approaches relevant to scientific
paper recommendation will be introduced later in the literature
survey presented in Chapter 3. As we mentioned earlier, we can
identify different recommendation approaches by looking at two
aspects: (a) which information is utilized; and (b) how this
information is processed. Based on this, we can differentiate
between the following main classes of recommendation approaches.
2.3.1. Content-based filtering (CBF)
Content-based systems recommend items that are similar to what the
active user has liked in the past. In CBF recommenders, the
similarity is based on the items' attributes or features;
therefore, both users and items are modeled in a shared feature
space. This makes content-based recommender systems especially
useful in scenarios where items can be represented by a descriptive
set of features. It is particularly useful when only few ratings
are available for the items or when new items are added to the
system, because other items with similar attributes or features
might have been rated by the active user. Hence, content-based
systems are based on the following two sources of information:
1. The set of attributes that provide a comprehensive description
of the items. Examples of such attributes for a scientific
publication are the title, the author list, the list of
author-defined keywords, the year of publication, the number of
pages, the venue, etc.
2. The affinity of the active user towards item features, i.e. the
importance of each feature/attribute to the user.
Figure 2.1 depicts the general system design of a CBF recommender.
Designing or choosing the features of the items is done by applying
feature engineering to define a representative set of features. On
the other hand, defining the mapping between the active user and
the item features is a central task of CBF systems, known as "user
modeling" [RRS15]. In the simplest form, the mapping is provided
explicitly by the user through a user profile. According to
Vivacqua et al. [VOdS09], this is called the declaration method for
constructing user profiles.

Figure 2.1. Schematic illustration of a content-based filtering
recommender system.

In this case, the CBF recommendation algorithm predicts the relevance
of a candidate item to the active user by measuring the similarity
between the user profile and the candidate item's attributes.
Hence, no "user modeling" is involved here. However, we cannot
count on the availability of such profiles, since users do not tend
to put effort into defining their interests in detail. Therefore,
more advanced approaches try to build or learn the user profile by
analyzing the properties of the observed items, i.e. the items that
appear in the user's previous activities. In other words, the
system tries to find the correct mapping between the user and the
item features. The important decision to be made here is how to
model the problem of learning the user profile. Based on the
classification suggested by Adomavicius et al. in [AHT08],
approaches for building the user profile can be categorized into
the following two categories:
2.3.1.1. Memory-based methods for user modeling
Such methods categorize the observed items into relevant and
irrelevant items, denoted by I_u^+ and I_u^- respectively. Then,
the user profile is constructed as an aggregation of the features
of the relevant items. An example of this approach are relevance
feedback methods such as the Rocchio algorithm [Roc71]. This
algorithm builds the user profile w(u) for a user u using the
following formula:

w(u) = \frac{\beta}{|I_u^+|} \sum_{i \in I_u^+} x^{(i)} - \frac{\gamma}{|I_u^-|} \sum_{i \in I_u^-} x^{(i)}    (2.1)

where x^{(i)} is the feature vector of item i. The parameters β and
γ control the influence of the relevant and irrelevant items
respectively. Memory-based methods are also seen as heuristic-based
methods: they are practical and easy to implement, but they are
usually criticized for oversimplifying the model and for depending
on a presumed heuristic encoded in the aggregation function adopted
for building the user profile.
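A minimal NumPy sketch of this Rocchio-style aggregation; the default values for β and γ are conventional choices for illustration, not parameters prescribed by the thesis:

```python
import numpy as np

def rocchio_profile(relevant, irrelevant, beta=0.75, gamma=0.25):
    """Build a user profile as a weighted aggregate of item feature
    vectors: beta * mean(relevant) - gamma * mean(irrelevant)."""
    w = beta * np.mean(relevant, axis=0)
    if len(irrelevant) > 0:
        w = w - gamma * np.mean(irrelevant, axis=0)
    return w

# Toy feature vectors (e.g., tf-idf weights over two terms)
relevant = np.array([[1.0, 0.0],
                     [1.0, 1.0]])
irrelevant = np.array([[0.0, 1.0]])
profile = rocchio_profile(relevant, irrelevant)
```

Candidate items can then be ranked by the similarity (e.g., dot product) between `profile` and their feature vectors.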
2.3.1.2. Model-based methods for user modeling
Instead of using an aggregation function, model-based approaches
depend on a machine learning algorithm for learning the feature
importances for the active user. A supervised learning problem is
formulated in which the user profile is trained using a machine
learning algorithm with the observed user's ratings as training
data. Compared to memory-based methods, this approach relies on
fewer predefined assumptions, which allows producing more
representative models. However, model-based methods suffer from the
shortage of training data, which explains why most existing CBF
systems opt for memory-based over model-based methods for building
the user profile. In CBF, we learn one model per user using only
that user's ratings; therefore, the training data are limited to
the ratings provided by that user. Usually in recommender systems,
we do not have enough ratings per user to train a representative
model. This is also an important reason why recommender systems
seek to collect ratings from implicit feedback (cf. Section 2.2.2).
However, utilizing implicit feedback comes with its own challenges,
such as the one-class problem, where ratings are limited to the
positive feedback only. Later, in Section 2.5, we provide more
details about the one-class problem.
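As a sketch of the model-based alternative, one could train a per-user profile with a simple supervised learner over the items that user rated. Ridge regression in closed form is used here as an illustrative choice, not as the thesis's method:

```python
import numpy as np

def learn_user_profile(X, y, lam=1.0):
    """Learn one user's profile w by ridge regression over the items
    the user rated: minimize ||X w - y||^2 + lam * ||w||^2.
    Closed-form solution: w = (X^T X + lam I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def score(w, X_candidates):
    """Predicted relevance of candidate items for this user."""
    return X_candidates @ w

# Toy data: three rated items with two features each, ratings y
X = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0]])
y = np.array([1.0, 1.0, 0.0])
w = learn_user_profile(X, y, lam=0.1)
```

With only a handful of training rows per user, as here, the shortage of per-user training data discussed above is immediately visible.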
2.3.2. Collaborative filtering and latent factor models
Collaborative filtering (CF) is another main class of
recommendation approaches. In CF recommenders, all ratings
available in the system are utilized, including ratings from other
users in addition to ratings from the active user. There is
therefore a sense of collaboration between the users, because every
rating added by any user helps the system get better at generating
relevant recommendations. Because CBF systems base their
recommendations only on the active user's ratings, the resulting
recommendations are always very similar to what the user has
already consumed. Hence, a CBF recommender cannot bring anything
new to the user, which is known as the over-specialization problem
of CBF systems. CF recommenders overcome this problem as they
employ ratings from other users, which makes the generated
recommendations influenced by the new kinds of items explored by
other users. Figure 2.2 illustrates the system design of
collaborative filtering recommenders. In CF methods, the available
ratings are modeled as an incomplete matrix, known as the rating
matrix. This is a two-dimensional matrix with one row for each user
and one column for each item. Having n users and m items, we denote
the rating matrix by R ∈ Rn×m, where an entry rui represents the
rating of user u on item i. Only known or available ratings are
stored in the rating matrix, whereas unknown ratings are left
empty. Based on this modeling, the task of a CF recommender system
is to
Figure 2.2. Schematic illustration of Collaborative filtering
recommender system.
predict the missing values (or ratings) in the rating matrix.
Afterwards, in order to generate the recommendations for a user u,
we sort the predicted values of the row Ru and recommend the set of
items that correspond to the highest predicted scores. CF
approaches are mainly grouped into two categories: neighborhood
models and Latent Factor Models (LFM) [HKV08].
Neighborhood models. These methods, also referred to as
memory-based CF, try to find users with "similar" behavior to that
of the active user; these users form the neighborhood. The
similarity between users is based on their rating behavior, i.e.
two users are similar if they rate items alike. After identifying
the neighborhood, a process similar to majority voting computes a
vote or a prediction score for each candidate item.
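A minimal sketch of such a neighborhood method, using cosine similarity between rating rows and a similarity-weighted vote; the toy matrix, the 0-means-unknown encoding and the choice of k are illustrative assumptions:

```python
import numpy as np

def cosine_sim(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return float(a @ b / (na * nb))

def predict_user_knn(R, u, i, k=2):
    """Predict R[u, i] from the k most similar users who rated item i,
    as a similarity-weighted average (0 encodes 'unknown' here)."""
    sims = []
    for v in range(R.shape[0]):
        if v != u and R[v, i] != 0:
            sims.append((cosine_sim(R[u], R[v]), R[v, i]))
    sims.sort(key=lambda t: -t[0])
    top = sims[:k]
    denom = sum(abs(s) for s, _ in top)
    if denom == 0:
        return 0.0
    return sum(s * r for s, r in top) / denom

R = np.array([
    [5.0, 3.0, 0.0],   # active user; rating for item 2 unknown
    [5.0, 3.0, 4.0],   # very similar neighbor
    [1.0, 1.0, 1.0],   # dissimilar neighbor
])
pred = predict_user_knn(R, u=0, i=2, k=1)
```

Production systems normally mean-center ratings and shrink similarities computed from few co-rated items; both refinements are omitted here for brevity.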
Latent Factor Models. Latent factor models, on the other hand, also
known as model-based CF, are based on dimensionality reduction. The
basic idea is to factorize the rating matrix R into low-dimensional
but complete matrices. Multiplying the resulting low-dimensional
complete matrices together leads to a complete matrix R′ that
approximates R. The unknown ratings from R can then be estimated by
the corresponding values from R′. This scheme is a direct
application of matrix factorization in the recommendation domain
and was first presented by Simon Funk in his blog post [Fun06] as a
successful solution to the Netflix challenge [Net06]. Latent factor
models have recently gained a lot of attention in the recommender
systems community, especially after matrix factorization was one of
the winning methods in the Netflix prize in 2009. LFM are also
considered the state-of-the-art method for
recommender systems [A+16], and a wide range of variations of
matrix factorization has been presented for solving the
recommendation problem. We will explain matrix factorization in
more detail in the next section, but first we proceed to explain
the third recommendation approach in the following subsection.
2.3.3. Hybrid approaches
As we mentioned earlier, content-based filtering excels in cases
where the items have expressive features and where items might not
have enough ratings to make them discoverable. But CBF approaches
are sensitive to the feature extraction method, which can be
erroneous. They also suffer from the over-specialization problem,
where users get recommendations very similar to what they have
already consumed. Collaborative filtering recommenders, on the
other hand, do not require feature representations of the items,
but they depend on other users' ratings to generate
recommendations, which can be ineffective when not enough ratings
are available or when new users or items are added to the system.
To overcome the problems of these two approaches, hybrid
recommenders were presented to seek the best of both worlds. Many
recommendation methods presented in the literature can be
categorized as hybrid approaches; we refer the reader to the work
presented by Robin Burke in [Bur07] for a comprehensive
categorization of hybrid methods.
2.4. Matrix Factorization in CF
The basic method for matrix factorization in CF is
UV-decomposition, which is usually referred to as SVD within the
recommender systems community. It is important to note that this is
not the Singular Value Decomposition, which is also a
dimensionality reduction method based on matrix factorization. The
difference is that UV-decomposition factorizes the matrix into two
matrices, whereas Singular Value Decomposition factorizes the
matrix into a product of three matrices. To avoid confusion between
these two methods, we will use the term UV-decomposition.
Figure 2.3 illustrates the process of UV-decomposition, where we
factorize the rating matrix R into two matrices U and V with
dimensions n × k and m × k respectively; k is the number of latent
factors, which determines the dimensionality of the shared space. U
gives the latent representation of all users in the low-dimensional
space, where each row of U represents a user as a k-dimensional
vector. Similarly, V provides the latent representation of all
items in the k-dimensional space, where each item is represented as
a k-dimensional vector as well. The factorization is achieved by
the following basic method: find U and V such that, for all known
ratings r_{ui}, we get U_u^T V_i = r_{ui}. This means finding two
low-dimensional matrices such that their multiplication
reconstructs the known ratings. Existing algorithms for solving the
UV-decomposition are all based on this basic method, but they vary
in the problem modeling and in the algorithmic steps.
There exist different modelings of matrix factorization, such as
least squares, probabilistic matrix factorization,
maximum-margin-based, ranking-error-based, etc. In the following
subsections, we explain the first two modeling methods, which are
widely adopted. Additionally, we explain two different algorithms
for solving the UV-decomposition. For a comprehensive explanation
of the remaining models, we refer the reader to [VBCG17].
Figure 2.3. Illustrating UV-decomposition as a matrix factorization
method on the rating matrix R using k latent factors.
2.4.1. Least squares modeling
Let the set of all known ratings be denoted by \mathcal{O},

\mathcal{O} = \{ r_{ij} \in R \mid r_{ij} \text{ is a known rating} \}

The basic modeling of UV-decomposition maps directly to the basic
method presented above. In least squares modeling, we want to find
two matrices U ∈ Rn×k and V ∈ Rm×k such that the following loss is
minimized:

L = \frac{1}{2} \sum_{r_{ij} \in \mathcal{O}} \left( r_{ij} - U_i^T V_j \right)^2    (2.2)
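The least-squares loss over the known ratings can be evaluated directly; a small NumPy sketch, assuming the conventional ½ Σ (r − UᵀV)² form and using illustrative toy factors:

```python
import numpy as np

def ls_loss(R, mask, U, V):
    """Sum of squared errors over known ratings
    (mask[i, j] = 1 iff r_ij is known):
    0.5 * sum_ij mask_ij * (r_ij - U_i^T V_j)^2."""
    E = R - U @ V.T            # residuals for all entries
    return 0.5 * float(np.sum(mask * E ** 2))

R = np.array([[4.0, 0.0],
              [0.0, 2.0]])
mask = np.array([[1.0, 0.0],
                 [0.0, 1.0]])
U = np.array([[2.0], [1.0]])   # n x k user factors, k = 1
V = np.array([[2.0], [2.0]])   # m x k item factors
loss = ls_loss(R, mask, U, V)
```

Here the factors reconstruct both known ratings exactly, so the loss is zero; the unknown entries contribute nothing because they are masked out.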
2.4.2. Probabilistic matrix factorization
Mnih and Salakhutdinov presented in [MS08] a probabilistic model
for matrix factorization, which came to be widely known as
Probabilistic Matrix Factorization
(PMF). It applies maximum a posteriori estimation, which is derived
as follows:

(U^*, V^*) = \arg\max_{U,V} P(U, V \mid R)
           = \arg\max_{U,V} \frac{P(R \mid U, V)\, P(U)\, P(V)}{P(R)}
           = \arg\max_{U,V} \left[ \log P(R \mid U, V) + \log P(U) + \log P(V) \right]

The conditional distribution over the known ratings is defined as a
Gaussian distribution with mean U_i^T V_j and variance σ²,

P(R \mid U, V) = \prod_{r_{ij} \in \mathcal{O}} \mathcal{N}(r_{ij} \mid U_i^T V_j, \sigma^2)    (2.3)

Assuming the priors P(U), P(V) follow Gaussian distributions with
zero mean and variances σ_u², σ_v² respectively, and I is a k × k
identity matrix:

P(U) = \prod_{i=1}^{n} \mathcal{N}(U_i \mid 0, \sigma_u^2 I)    (2.4)

P(V) = \prod_{j=1}^{m} \mathcal{N}(V_j \mid 0, \sigma_v^2 I)    (2.5)

Given the ratings' conditional distribution and the priors, in
addition to the variances σ, σ_u, σ_v, we get the following log
posterior of U and V given R (up to a constant C):

\log P(U, V \mid R, \sigma, \sigma_u, \sigma_v) = -\frac{1}{2\sigma^2} \sum_{r_{ij} \in \mathcal{O}} (r_{ij} - U_i^T V_j)^2 - \frac{1}{2\sigma_u^2} \sum_{i=1}^{n} U_i^T U_i - \frac{1}{2\sigma_v^2} \sum_{j=1}^{m} V_j^T V_j + C    (2.6)

Maximizing the log posterior is equivalent to minimizing the sum of
squared errors loss function with quadratic regularization terms:

L = \frac{1}{2} \sum_{r_{ij} \in \mathcal{O}} (r_{ij} - U_i^T V_j)^2 + \frac{\lambda_u}{2} \sum_{i=1}^{n} U_i^T U_i + \frac{\lambda_v}{2} \sum_{j=1}^{m} V_j^T V_j    (2.7)

where λ_u = σ²/σ_u² and λ_v = σ²/σ_v² are the regularization
parameters for U and V respectively.
2.4.3. Matrix factorization algorithms
Both of the modelings presented in the previous subsections reduce
matrix factorization to an optimization problem in which the sum of
squared errors is to be
minimized. As we saw, the maximum a posteriori (MAP) solution leads
to a regularized loss (Equation 2.7). We can rewrite this loss
function in the following form, which makes it easier to compute
the derivatives in the coming subsections:

L = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{m} I_{ij} (r_{ij} - U_i^T V_j)^2 + \frac{\lambda_u}{2} \sum_{i=1}^{n} U_i^T U_i + \frac{\lambda_v}{2} \sum_{j=1}^{m} V_j^T V_j    (2.8)

where I_{ij} is an indicator function that indicates the known
ratings:

I_{ij} = \begin{cases} 1 & \text{if } r_{ij} \text{ is known} \\ 0 & \text{otherwise} \end{cases}
We will explain two widely adopted algorithms for solving the
matrix factorization. Both algorithms solve the following
optimization:

(U^*, V^*) = \arg\min_{U,V} L    (2.9)

where L is the loss defined in Equation 2.8. Both algorithms find
the optimal values of U and V. Afterwards, we can estimate the
utility of an item j to user i by the dot product of their latent
factors:

\hat{R}_{ij} = U_i^T V_j    (2.10)
2.4.3.1. Stochastic gradient descent (SGD)
SGD was first presented as a solution for matrix factorization by
Simon Funk in his blog post on the Netflix prize [Fun06]. In order
to solve the optimization defined in Equation 2.9, the algorithm
loops over all known ratings and computes for each rating the
prediction error,

E_{ij} := r_{ij} - U_i^T V_j    (2.11)

Then, U_i and V_j are updated by a magnitude proportional to a
given learning rate γ in the direction opposite to the gradient of
the loss (Equation 2.8). To compute the gradient of the loss with
respect to U_i, we assume V_j is constant; analogously, when
computing the gradient of the loss with respect to V_j, we fix U_i:

\frac{\partial L}{\partial U_i} = -\sum_{j=1}^{m} I_{ij} E_{ij} V_j + \lambda_u U_i    (2.12)

\frac{\partial L}{\partial V_j} = -\sum_{i=1}^{n} I_{ij} E_{ij} U_i + \lambda_v V_j    (2.13)

Based on these partial derivatives of the loss with respect to U_i
and V_j, we extract the following update rules for U_i and V_j:

U_i := U_i + \gamma \left( \sum_{j=1}^{m} I_{ij} E_{ij} V_j - \lambda_u U_i \right)

V_j := V_j + \gamma \left( \sum_{i=1}^{n} I_{ij} E_{ij} U_i - \lambda_v V_j \right)

The initial values of U_i and V_j are assigned randomly. After
iterating over all known ratings, the algorithm repeats the same
steps until convergence or until reaching a predefined number of
epochs. Finally, it outputs the matrix factors U and V.
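This procedure can be sketched in a minimal NumPy implementation. As in Funk's original approach, the update is applied per known rating (plain SGD) rather than as a summed batch step; the hyperparameters and toy matrix are illustrative:

```python
import numpy as np

def sgd_mf(R, mask, k=2, gamma=0.01, lam=0.1, epochs=200, seed=0):
    """Factorize R (mask marks the known ratings) by stochastic
    gradient descent on the regularized squared-error loss."""
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = 0.1 * rng.standard_normal((n, k))   # random initialization
    V = 0.1 * rng.standard_normal((m, k))
    known = [(i, j) for i in range(n) for j in range(m) if mask[i, j]]
    for _ in range(epochs):
        for i, j in known:
            e = R[i, j] - U[i] @ V[j]       # prediction error E_ij
            U[i] += gamma * (e * V[j] - lam * U[i])
            V[j] += gamma * (e * U[i] - lam * V[j])
    return U, V

R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [0.0, 2.0, 5.0]])
mask = R > 0                    # toy convention: 0 means unknown
U, V = sgd_mf(R, mask, k=2, gamma=0.02, epochs=500)
```

After training, `U @ V.T` gives the complete approximation R′ from which unknown ratings are read off.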
2.4.3.2. Alternating least squares (ALS)
The loss function (Equation 2.8) is non-convex because it has two unknown variables, namely U and V. Therefore, the optimization problem (Equation 2.9) cannot be solved analytically. However, if we fix one of the variables, the loss function becomes quadratic in the other variable, and the optimization can be solved in closed analytical form. This is the basis of ALS, presented by Zhou et al. in [ZWSP08]. ALS alternates between fixing one of the variables U_i and V_j and solving the loss function for the other. This way, the non-convex problem is turned into a sequence of quadratic problems that can be solved optimally as follows. Differentiating Equation 2.8 with respect to the vector U_i, and analogously with respect to the vector V_j, yields the partial derivatives formulated in Equation 2.12 and Equation 2.13, respectively. In order to find the optimal values for these variables, we set the respective derivatives to zero and solve for the corresponding variable:
\lambda_u U_i - \sum_{j=1}^{m} I_{ij} r_{ij} V_j + \sum_{j=1}^{m} I_{ij} V_j (U_i^T V_j) = 0

The dot product of the real vectors U_i and V_j is commutative, U_i^T V_j = V_j^T U_i:

\Rightarrow \lambda_u U_i - \sum_{j=1}^{m} I_{ij} r_{ij} V_j + \sum_{j=1}^{m} I_{ij} V_j (V_j^T U_i) = 0

Matrix multiplication is associative:

\Rightarrow \lambda_u U_i - \sum_{j=1}^{m} I_{ij} r_{ij} V_j + \sum_{j=1}^{m} I_{ij} (V_j V_j^T) U_i = 0

The sum \sum_{j=1}^{m} I_{ij} r_{ij} V_j can be rewritten as V^T I^{(i)} R_i, and likewise \sum_{j=1}^{m} I_{ij} (V_j V_j^T) can be rewritten as V^T I^{(i)} V, where I^{(i)} is a diagonal matrix of dimensions m × m with the values I_{ij} on its diagonal for all j:

\Rightarrow \lambda_u U_i - V^T I^{(i)} R_i + V^T I^{(i)} V U_i = 0
\Rightarrow (\lambda_u I + V^T I^{(i)} V) U_i = V^T I^{(i)} R_i

which leads to the following solution for U_i:

U_i = (\lambda_u I + V^T I^{(i)} V)^{-1} V^T I^{(i)} R_i \qquad (2.14)
Analogously, fixing U_i, setting the derivative to zero, and solving for V_j leads to

V_j = (\lambda_v I + U^T I^{(j)} U)^{-1} U^T I^{(j)} R_j \qquad (2.15)

where I^{(j)} is a diagonal matrix of dimensions n × n with the values I_{ij} on its diagonal for all i. The ALS steps are:

1. Initialize the matrices U and V randomly.
2. For each user i, set the new values for U_i using Equation 2.14.
3. For each item j, set the new values for V_j using Equation 2.15.
4. Repeat the last two steps until convergence or until reaching a predefined number of epochs.
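The four steps above follow directly from the closed-form solutions of Equations 2.14 and 2.15. A minimal NumPy sketch (our own illustration with assumed names and dense matrices; the per-user and per-item loops are exactly the ones that could run in parallel):

```python
import numpy as np

def als_mf(R, I, k=10, lam_u=0.1, lam_v=0.1, epochs=20):
    """ALS for matrix factorization: alternately apply the closed-form
    updates of Equations 2.14 and 2.15. R is the n x m rating matrix,
    I the binary indicator matrix of known ratings."""
    n, m = R.shape
    rng = np.random.default_rng(0)
    U = 0.1 * rng.standard_normal((n, k))   # step 1: random initialization
    V = 0.1 * rng.standard_normal((m, k))
    Ik = np.eye(k)
    for _ in range(epochs):
        for i in range(n):                  # step 2 (Eq. 2.14), parallelizable
            Ii = np.diag(I[i])              # diagonal m x m indicator I^(i)
            U[i] = np.linalg.solve(lam_u * Ik + V.T @ Ii @ V, V.T @ Ii @ R[i])
        for j in range(m):                  # step 3 (Eq. 2.15), parallelizable
            Ij = np.diag(I[:, j])           # diagonal n x n indicator I^(j)
            V[j] = np.linalg.solve(lam_v * Ik + U.T @ Ij @ U, U.T @ Ij @ R[:, j])
    return U, V
```

Using `np.linalg.solve` instead of an explicit matrix inverse is the standard, numerically safer way to evaluate the closed-form solution; the regularization term λI keeps the system non-singular.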
2.4.3.3. Comparing SGD and ALS
Both algorithms solve the UV-decomposition for CF recommendations, but each has its own characteristics that make it preferable in certain scenarios. We review important aspects that allow comparing the algorithms and understanding when and why to use which one. Some of these aspects are based on the discussion provided in [KBV09]. In terms of efficiency, ALS is typically slower than SGD, as each step involves solving the least squares systems of Equations 2.14 and 2.15. However, in total, ALS needs fewer iterations to reach the same level of accuracy. An important remark about ALS is that the computations of the users' latent vectors U_i are independent of each other, and the same applies to the items' latent vectors V_j. This allows ALS to benefit from parallel computation of the latent vectors. On the other hand, since SGD iterates over known ratings only and updates the model accordingly, it becomes more efficient when the known ratings are very few, i.e., in the case of a sparse rating matrix. The opposite occurs in the case of implicit feedback, where the unknown ratings are usually treated as negative ratings with low confidence (as we will see in the next subsection). In such a case, ALS is more practical: SGD gets very expensive, as it needs to iterate over all matrix entries, whereas ALS benefits from parallelism. Regarding parameter tuning, both algorithms need to tune the following parameters:
1. The regularization parameters;
2. The number of epochs (iterations); and
3. The number of latent features k.

However, SGD additionally requires tuning the step size γ.
2.5. Implicit Feedback and One-class problem
As we mentioned earlier in section 2.2.2, users hardly give explicit feedback, and therefore recommender systems depend on other sources of ratings, namely the implicit feedback inferred from user activities. Some examples of user activities in the domain of scientific publication recommendation are: authoring, adding a paper to the personal collection, adding social tags, downloading, reading or browsing papers, etc. Dealing with implicit feedback usually comes at a cost: implicit feedback refers to positive ratings only, while negative ratings are hardly identified. This leads to a situation known as the one-class problem. The one-class problem occurs in recommender systems when the system uses implicit feedback in scenarios where negative ratings are not available. As a result, the rating matrix contains only positive ratings, whereas the rest of the ratings are unknown. The naive solution is to treat all unknown ratings as negative, i.e., to assume that all unobserved items are irrelevant. Obviously, this is not the correct solution, especially because the potentially relevant items belong to the set of unobserved items, and assigning negative ratings to these items instructs the recommendation algorithms to consider them irrelevant. Consequently, the algorithm will not recommend them.
WALS algorithm The widely adopted method for UV-decomposition in the case of positive-only ratings, or one-class collaborative filtering, is to treat unknown ratings as negatives while associating them with low weights on the error term. This way, unknown ratings contribute less during the learning process than known ratings. The method is based on the ALS algorithm and was presented by Pan et al. in [PZC+08] as the wALS algorithm. Compared to ALS, wALS changes the loss function: instead of multiplying the prediction error of the rating r_ij by the indicator function I_ij as in Equation 2.8, the error is multiplied by a weighting score C_ij. Hence, the loss function used in wALS is

L_{wALS} = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{m} C_{ij} \left( r_{ij} - U_i^T V_j \right)^2 + \frac{\lambda_u}{2} \sum_{i=1}^{n} U_i^T U_i + \frac{\lambda_v}{2} \sum_{j=1}^{m} V_j^T V_j \qquad (2.16)
Different suggestions for the weights C_ij were presented in the literature; the general idea is that unknown ratings should be associated with (confidence) weights that are lower than the weights associated with known ratings. For example, Pan et al. presented in [PZC+08] the user-oriented weighting scheme, where the weights correlate with the number of positive ratings of the user. Other works set the weights as the number of times the user consumed an item, allowing for several confidence levels even for positive ratings [HKV08], or as a static predefined value b that is much smaller than the weight of an observed rating a: a > b > 0.
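Since wALS only replaces the indicator I_ij in the ALS updates by the confidence weight C_ij, the closed-form updates carry over with C in place of I. A sketch under assumed names (our own illustration; the weights a = 1.0 and b = 0.01 used below are assumed example values, not from the cited works):

```python
import numpy as np

def wals_mf(R, C, k=10, lam_u=0.1, lam_v=0.1, epochs=20):
    """wALS sketch: like ALS, but every entry of R contributes to the
    loss, weighted by its confidence C_ij (Equation 2.16). For one-class
    data, observed positives get a high weight and unknowns a low one."""
    n, m = R.shape
    rng = np.random.default_rng(0)
    U = 0.1 * rng.standard_normal((n, k))
    V = 0.1 * rng.standard_normal((m, k))
    Ik = np.eye(k)
    for _ in range(epochs):
        for i in range(n):
            Ci = np.diag(C[i])              # confidence weights of user i
            U[i] = np.linalg.solve(lam_u * Ik + V.T @ Ci @ V, V.T @ Ci @ R[i])
        for j in range(m):
            Cj = np.diag(C[:, j])           # confidence weights of item j
            V[j] = np.linalg.solve(lam_v * Ik + U.T @ Cj @ U, U.T @ Cj @ R[:, j])
    return U, V

# One-class usage: positives weighted a = 1.0, unknowns b = 0.01, a > b > 0
# C = np.where(R > 0, 1.0, 0.01)
```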
3.1. Introduction
3.2. Relevant Papers Identification
3.3. Paper Recommendation and Citation Recommendation
3.4. Dimensions for Approaches Comparison
3.5. Approaches for Recommending Scientific Publications
3.6. Discussion
3.7. Conclusion
3.1. Introduction
Building a recommender system for scientific publications has gained considerable attention in the research community. Since scientific publications are feature-rich, there exists a wide range of different representation models for them. For example, scientific publications can be modeled based on the citation relationships that connect papers with each other, leading to a graph structure or a graph model. Another possible representation is the vector space model, representing the papers as feature vectors that take into consideration their rich attributes or textual content. This made the task of scientific paper recommendation attractive for researchers from different disciplines, including information retrieval, web science, machine learning, databases, etc. Each discipline approached the problem from its own perspective. As a result, we find a wide range of approaches in the literature, ranging from graph algorithms that work on the citation network, over modeling the problem as a random walk, to model-based machine learning methods that see the problem as a supervised learning task. Given this wide range of contributing communities, and in order to draw a detailed picture of the state-of-the-art methods that address scientific publication recommendation, we provide in this chapter a literature survey.
Chapter 3 Literature Survey
The main goal of this survey is to provide an overview of the status quo of research on our topic, and to understand the challenges that existing works face and solve. Additionally, we investigate if and how the issues which we address in this thesis have been approached in previous works. This allows situating this work among the existing literature. The main contributions of this chapter can be summarized as follows:

1. We present a categorization of the recommendation approaches to the problem of recommending scientific publications.
2. We identify popular approaches in the research community.
3. We identify promising emerging approaches that have lately been gaining more attention.

Chapter structure In section 3.2, we introduce the set of publications on which we base our survey. Afterward, in section 3.3, we distinguish between scientific paper recommendation and the closely related scenario of citation recommendation. Then, we introduce in section 3.4 our comparison dimensions. The details of the identified recommendation approaches, together with a description of each reviewed work, are presented in section 3.5. Afterwards, we provide a brief discussion in section 3.6 and conclude the chapter in section 3.7.
3.2. Relevant Papers Identification
There have not been many works that survey the domain of scientific publication recommendation. To the best of our knowledge, the latest and most cited works are two surveys by Beel et al. [BLG+13, BGLB16]. The first one was published in 2013 and focused on the evaluation aspect of scientific paper recommender systems. Later, in 2015, it was extended into a more comprehensive survey that covered the recommendation approaches in addition to the evaluation methods [BGLB16]. The latter surveyed over 200 relevant publications published until June 2013 and provided an analysis of the active research communities, the employed recommendation approaches, and the evaluation methods. As our work for this thesis was done during a later period, i.e., after 2013, we consider Beel's survey [BGLB16] the starting point, and we aim to draw a more recent picture of the related literature. Therefore, we investigated the papers that addressed the problem of recommending scientific publications and either have not been considered in Beel's survey or were published after June 2013. For this purpose, we collected our set of relevant publications from two sources. The first one is a set of 87 publications collected by Altina Spahija [Spa18]. This set was collected in December 2017 using Google Scholar and contains all papers that cite Beel's survey and are relevant to the problem of scientific publication recommendation. We studied all papers in this set and filtered out all works that (a) don't describe a novel recommendation approach; (b) are not published in a peer-reviewed venue; or (c) are not implemented and evaluated. We ended up with 28 papers that present novel approaches for recommending scientific publications. The second source is the set of 14 relevant works which we discovered during our research work on this thesis, after filtering out all papers covered in [BGLB16] and the papers which appear in the first set. The union of these two sets forms the paper corpus which is the focus of our survey in this chapter. This paper corpus contains 42 scientific papers published in peer-reviewed venues. The majority of the corpus' papers (35 papers) are very recent, published between 2015 and 2018, and each of them presents a novel approach for scientific paper recommendation.
3.3. Paper Recommendation and Citation Recommendation
A problem that is very close to scientific publication recommendation is citation recommendation. Generally speaking, the two problems seem very similar since in both cases the task is to recommend a set of scientific publications to the user. However, the main difference lies in the recommendation purpose. Unlike our problem, the user in citation recommendation seeks a set of papers to be used as references for a given publication. Therefore, the key factor here is not the user's interest in general, but the publication which will contain the recommended references. Based on this main difference, we draw a line to distinguish between citation recommendation and scientific paper recommendation, where users are usually interested in exploring unknown publications which might be relevant, important or trending. However, in our paper corpus, we found one scenario for citation recommendation that we consider relevant to our study. In this scenario, the system recommends papers for citation considering the whole input publication instead of considering the citation context only. The citation context refers to the set of words that surround the expected reference position; this context is usually utilized to decide which paper(s) could be cited at that position. We call this scenario citation recommendation without context. We consider it similar to recommendation approaches that recommend relevant papers given a single document, which will be explained in more detail in the next section (section 3.4.2). In our paper corpus, the works [RFP16, GCH+17, JS17, JS18] presented citation recommendation approaches that fall into this scenario.
3.4. Dimensions for Approaches Comparison
We start our study by defining a set of dimensions for comparing the recommender systems presented in the paper corpus. After reviewing all papers in the corpus, we defined nine comparison dimensions and recorded for each paper the corresponding values. Five of these dimensions have a predefined set of values. The detailed categorization of the reviewed works along these dimensions is presented in Table 3.1 and in Table 3.3. In these tables, the sign 'X' means that the corresponding value is observed in the paper, the sign '-' means the value is not observed in the paper, and the absence of any sign indicates that the dimension is not applicable for the corresponding paper. In the following subsections, we introduce each of these dimensions in detail.
3.4.1. Recommendation approaches
The first and most significant dimension in our comparison is the recommendation approach. Beel et al. in [BGLB16] distinguished between recommendation classes, approaches, algorithms, and implementations, where each concept in this list is more specific than the preceding one. It is a hierarchy of four abstraction levels. For example, Collaborative Filtering (CF) is a recommendation class. The recommendation approach was defined as "a model of how to bring a recommendation class into practice."; user-based CF is, for example, a recommendation approach. We started from this hierarchical categorization, but we soon realized that it is too rigid and that not all works can be smoothly mapped: some works might fall into two different approaches, some recommendation approaches might be mapped to multiple classes, and finally, not all recommendation classes listed in [BGLB16] are represented in our paper corpus. Therefore, we decided to relax the hierarchical structure of Beel's categorization and to concentrate on the recommendation approaches only. For our study, the recommendation approaches give the needed level of detail to understand the main idea behind the presented recommenders. After reviewing all papers from our corpus, we identified the following seven different recommendation approaches: Latent Factor Models (LFM), Preference Learning, Cross-domain, Content-Based Filtering (CBF), Graph-based, Hybrid and Co-occurrence. The first three of these are emerging approaches that did not appear in Beel's survey [BGLB16]. Table 3.2 shows the list of recommendation approaches with the number of papers assigned to each approach. For some of these approaches, we could identify different directions in the presented works, which allowed introducing sub-approaches. We will provide a definition of each approach and introduce the sub-approaches later in section 3.5, where we will also provide a short summary of each reviewed work. It is important to mention that we do not restrict the works to a single approach; on the contrary, a paper might present a recommender that applies multiple recommendation approaches. For example, [ZWL16] and [DNT14] present approaches that fall into both the CBF and graph-based categories. Another example is [BBM16], whose approach is both hybrid and LFM.
Table 3.1. The detailed mapping of the reviewed papers along the following dimensions: recommendation approach, recommendation scenario, paper representation model, user modeling, and handling of the implicit feedback / one-class problem.
Table 3.2. The recommendation approaches with the number of papers from the studied corpus in each approach.

Approach                              Number of papers
Content-Based Filtering (CBF)         22
Hybrid                                 9
Graph-based                            8
Latent Factor Model                    6
Preference Learning                    2
Cross-domain                           2
Co-occurrence based recommendation     2
3.4.2. Recommendation scenarios
The second dimension is the recommendation scenario. We defined the following three scenarios based on the type of input passed to the recommender:

1. Input Query, where the user specifies a query that defines the user's needs or requests explicitly. The recommender system, in this case, acts more like a search engine that searches for relevant papers matching the search query. Consequently, such systems do not realize the user entity and do not apply user modeling and personalization. If two users enter the same search query, the recommender does not see two different users and delivers the same recommendations to both of them. In our paper corpus, six approaches follow this scenario.

2. Single Document, where the user specifies a single document (a research paper) and expects the recommender to deliver a set of papers that are similar or relevant to the input paper. This is also not a typical recommendation scenario, since it removes the user entity from the model just as the first scenario does. However, systems that serve this scenario do not apply query matching techniques; instead, they can employ the meta-data associated with the input paper, such as the list of references, the authors' list, the venue, etc., to discover related papers. Similar to the previous scenario, six of the reviewed papers follow this scenario.

3. Multiple Documents, where the recommender is given, as input, a set of publications that reflect the user's interest. Systems that follow this scenario analyze the input publications and discover commonalities among them in order to build a user model reflecting the user's interest. This scenario is the predominant one in our paper corpus, with 25 papers.
Systems that apply the first two scenarios can be seen as "Case-Based" recommender systems, as defined by Aggarwal in [A+16], where the user provides a single example of interest that is considered a user requirement rather than a historical rating. In addition to these scenarios, we shed light on some interesting special scenarios we came across in the reviewed papers. One case is the shortlisting task presented in [RFP16]. The goal of shortlisting is, given a list of relevant papers (the reading list), to identify the important papers in that list. It is more about finding important papers within a limited list than about discovering possibly relevant papers from a wide candidate set. Another case is the work presented in [ZWL16], where the goal is, given a research target and the researcher's background knowledge, to recommend a set of papers that help the researcher achieve the research target; in other words, to find the papers that bridge the user's knowledge gap. The final case is presented in [XCJ+17], which tackles an interesting setting that is not particular to scientific papers, but concerns a reading recommendation system in general. In addition to the user's interest list, the authors refer to the importance of other aspects, such as the stress levels associated with the reading material, and account for such aspects in their recommendations.
3.4.3. User modeling
Another dimension that we consider in our comparison is user modeling. Based on this dimension, we check whether the work provides a method for user modeling or for building a user profile. We record this information in Table 3.1. All works that follow the input query or the single document scenarios do not provide a method for user modeling. On the contrary, most of the works for the multiple documents scenario provide a method for modeling the user entity in the system. However, some works followed the multiple documents scenario but did not support user modeling, such as [DNT14].
3.4.4. Publications data usage
Scientific publications are rich in descriptive and content data. We can categorize the data available for scientific publications into three categories:

1. Textual content, including the title, the abstract, the list of author-defined keywords and the publication's full-text. The first three are usually made publicly available online by the publication's publisher or by digital libraries, unlike the publication's full-text, which is usually protected behind paywalls; it is therefore a challenging task for the recommender system to access the publications' full-text. In our paper corpus, 17 papers relied solely on textual content, 11 of them utilizing the publications' full-text.

2. Structured attributes, including other publication meta-data which are likewise usually publicly available. Such attributes include the list of authors, the publication's venue, the year of publication, the number of pages, etc.
Only two works from the reviewed paper corpus relied solely on structured attributes, namely [XLL+16, SDF17]. Other works that benefited from structured attributes utilized them in addition to other types of publication data, such as textual content as in [ZZWS14, TMOZ16, AA17], the references list as in [ICG17, SJC+17], or both of them as in [XGLC