PERSONALIZED RANKING FOR TAG-BASED ITEM
RECOMMENDATION SYSTEM USING TENSOR MODEL
Noor Ifada
B.Eng, M.ISD
Submitted in fulfilment of the requirements for the degree of
Doctor of Philosophy
School of Electrical Engineering and Computer Science
Faculty of Science and Engineering
Queensland University of Technology
2016
Personalized Ranking for Tag-based Item Recommendation System using Tensor Model i
Keywords
Binary Data, Boolean Interpretation Scheme, Candidate Item, Discounted
Cumulative Gain, Graded-relevance Interpretation Scheme, Graded-relevance Data,
Graded Average Precision, Interpretation Scheme, Latent Factor, Learning-to-rank
Approach, List-wise based Ranking, Mean Square Error, Multi-graded Data,
Optimization Criterion, Point-wise based Ranking, Probabilistic Ranking, Set-based
Interpretation Scheme, Social Tagging System, Tag-based Item Recommendation,
Tagging Data, Tag Preference, Tag Usage, Tensor Factorization, Tensor Model,
Tensor Reconstruction, Top-𝑁 Recommendation, User-Tag Set Interpretation
Scheme, User Profile, Weighted Scheme.
Abstract
Social Tagging Systems (STSs) have gained great popularity on the Internet, since
users can annotate items of their interest using freely defined tags which can be used
for organising, retrieving, and sharing items with others. By learning from the user’s
past tagging behaviour using a tensor model, an STS can generate a list of item
recommendations that may be of interest to the user. Despite their popularity,
current tag-based item recommendation methods face several challenges. Firstly, a
tagging data interpretation scheme has an important role in defining the user profile
representation in a tensor model and greatly affects the recommendation
performance. The current interpretation schemes overgeneralise the “irrelevant”
entries of the non-observed tagging data. Secondly, when utilising the reconstructed
tensor for recommendation, the existing methods inappropriately disregard the users’
past tagging activities, which have been found to influence the user preference in the
recommended items. Thirdly, the tensor latent factors can be used directly to
generate recommendations, avoiding the expensive tensor reconstruction
process. Given the characteristics of the user profile representation that results from
implementing an interpretation scheme, this approach requires building an
efficient “learning-to-rank” model that governs the recommendation process.
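As a rough illustration of how tagging data can populate a tensor, the sketch below builds an initial third-order tensor from observed (user, item, tag) triples under a boolean-style interpretation, in which observed entries are set to 1 and every other entry to 0. The triples, the dimensions, and the function name `build_boolean_tensor` are hypothetical and chosen only for illustration; the schemes developed in this thesis assign richer grades to the non-observed entries.

```python
def build_boolean_tensor(triples, n_users, n_items, n_tags):
    """Return a users x items x tags nested list: 1.0 if observed, else 0.0."""
    tensor = [[[0.0] * n_tags for _ in range(n_items)] for _ in range(n_users)]
    for u, i, t in triples:
        tensor[u][i][t] = 1.0
    return tensor


# Hypothetical observed tagging data as (user, item, tag) index triples.
observed = [(0, 0, 0), (0, 1, 2), (1, 1, 2), (2, 3, 4)]
Y = build_boolean_tensor(observed, n_users=3, n_items=4, n_tags=5)

# Count how many entries are observed versus treated as "irrelevant".
n_entries = 3 * 4 * 5
n_observed = int(sum(v for user in Y for item in user for v in item))
n_non_observed = n_entries - n_observed
```

Even in this toy case most entries are non-observed, which is why treating all of them uniformly as "irrelevant" can overgeneralise.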
This thesis proposes to tackle these challenges by developing two efficient
tagging data interpretation schemes and four ranking methods for tag-based item
recommendation systems, based on tensor models and learning-to-rank approaches.
The developed interpretation schemes, namely UTS and graded-relevance, apply
ranking constraints when interpreting the tagging data, which allows a ranked
representation and results in richer data. The developed ranking methods fall into the
point-wise and list-wise based categories of ranking approaches and treat the
recommendation task as regression/classification and as ranking, respectively.
The first developed point-wise based ranking method, namely “Tensor-based
Item Recommendation using Probabilistic Ranking” (TRPR), focuses on (1)
improving scalability during the tensor reconstruction process by implementing a
memory-efficient technique and (2) increasing the recommendation accuracy by
ranking the items of the reconstructed tensor using a subsequent probabilistic
approach. The second method, namely “Recommendation Ranking using Weighted
Tensor” (We-Rank), focuses on dealing with the sparsity problem and improving the
recommendation accuracy during the learning-to-rank process. We-Rank implements
a weighted scheme for learning the tensor recommendation model such that
the observed and non-observed entries of each user-item set are given either rewards
or penalties, i.e. the observed entries are weighted with higher values than the
non-observed ones.
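The idea of weighting observed entries more heavily than non-observed ones can be sketched as a weighted squared error over a CP-style factor model. This is only a minimal sketch under stated assumptions: the constant weights `w_observed` and `w_missing`, the toy tensor, and the function names are hypothetical placeholders, and they are not the weighting that We-Rank actually derives.

```python
def cp_predict(U, I, T, u, i, t):
    """CP-style prediction: sum over latent factors of the three factor entries."""
    return sum(U[u][f] * I[i][f] * T[t][f] for f in range(len(U[u])))


def weighted_squared_error(Y, U, I, T, w_observed=1.0, w_missing=0.1):
    """Squared error in which observed entries (y > 0) get the larger weight."""
    loss = 0.0
    for u, user_slice in enumerate(Y):
        for i, item_row in enumerate(user_slice):
            for t, y in enumerate(item_row):
                w = w_observed if y > 0 else w_missing  # assumed constant weights
                loss += w * (y - cp_predict(U, I, T, u, i, t)) ** 2
    return loss


# Hypothetical 2 x 2 x 2 tensor with a single observed entry and one latent factor.
Y = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
U = [[0.5], [0.5]]
I = [[0.5], [0.5]]
T = [[0.5], [0.5]]

loss_weighted = weighted_squared_error(Y, U, I, T)
loss_uniform = weighted_squared_error(Y, U, I, T, w_missing=1.0)
```

With the smaller weight on non-observed entries, errors on the many zero entries contribute less to the loss, so the single observed entry is not drowned out by them.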
The first developed list-wise based ranking method, namely “DCG
Optimization for Learning-to-Rank” (Do-Rank), learns from a user profile built using
the multi-graded data resulting from the implementation of the proposed User-Tag Set
(UTS) scheme. Do-Rank optimizes the recommendation model with respect to
Discounted Cumulative Gain (DCG) as the ranking evaluation measure to appropriately
learn the tensor recommendation model built from the multi-graded data. The second
method, namely “GAP Optimization for Learning-to-Rank” (Go-Rank), learns from a
user profile built using the graded-relevance data resulting from the implementation
of the proposed graded-relevance scheme. Go-Rank optimizes the recommendation
model with respect to Graded Average Precision (GAP) as the ranking evaluation
measure to appropriately learn the tensor recommendation model built from the
graded-relevance data.
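To make the role of graded relevance in a ranking measure concrete, the sketch below computes DCG and its normalised form for a short ranked list. It assumes the common exponential-gain formulation of DCG and a hypothetical numeric mapping of the ordinal grades; the thesis may define both differently. The sketch only evaluates a ranking and does not attempt the optimization that Do-Rank performs during learning.

```python
import math


def dcg(relevances):
    """DCG with gain 2^rel - 1 and a log2(rank + 1) discount, ranks starting at 1."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Assumed numeric grades for the ordinal set; the values are illustrative only.
GRADES = {"relevant": 2, "indecisive": 1, "irrelevant": 0}

ranked_items = ["indecisive", "relevant", "irrelevant"]
rels = [GRADES[g] for g in ranked_items]
score = ndcg(rels)
```

Because the "relevant" item sits at rank 2 rather than rank 1, the score falls below 1; swapping the first two items would yield the ideal ordering.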
The developed methods are evaluated using real-world, freely available
data from tagging systems. Empirical analyses show that the UTS scheme efficiently
interprets the tagging data as rich multi-graded data with the ordinal relevance set
{𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡, 𝑖𝑛𝑑𝑒𝑐𝑖𝑠𝑖𝑣𝑒, 𝑖𝑟𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡}. Similarly, the graded-relevance scheme
efficiently interprets the tagging data as rich graded-relevance data with the ordinal
relevance set {𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡, 𝑙𝑖𝑘𝑒𝑙𝑦 𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡, 𝑖𝑛𝑑𝑒𝑐𝑖𝑠𝑖𝑣𝑒, 𝑖𝑟𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡}. The
experimental results show that the proposed methods outperform the benchmarking
methods. They confirm that combining an interpretation scheme with a
learning-to-rank approach has a positive influence on making recommendations. The
memory-efficient technique is implemented to solve the scalability issue that occurs
during the tensor reconstruction process, whereas the weighted scheme and efficient
interpretation scheme are implemented to tackle the sparsity issue. Comparing the
performance of methods based on the learning-to-rank approach, in general, the
list-wise based ranking methods achieve better performance in terms of NDCG than the
point-wise based ranking methods. On the other hand, the latter achieve better
performance in terms of AP and MAP than the former.
This thesis contributes to research on tag-based
recommendation systems by focusing on efficiently interpreting tagging data and
applying learning-to-rank approaches to the tensor used as the
recommendation model. The tagging data interpretation schemes and
learning-to-rank approaches play an important role in significantly improving
tag-based item recommendation quality.
Dedication
I dedicate this thesis to:
My Mother
My Father
My Brothers
Table of Contents
Keywords ..... i
Abstract ..... ii
Dedication ..... v
Table of Contents ..... vi
List of Figures ..... x
List of Tables ..... xiii
List of Abbreviations ..... xv
Statement of Original Authorship ..... xvii
Acknowledgements ..... xviii
CHAPTER 1: INTRODUCTION ..... 1
1.1 Background and Motivations ..... 1
1.2 Research Questions ..... 6
1.3 Research Objectives ..... 6
1.4 Research Contributions ..... 9
1.5 Research Significance ..... 11
1.6 Publications ..... 12
1.7 Thesis Outline ..... 13
1.8 Chapter Summary ..... 15
CHAPTER 2: LITERATURE REVIEW ..... 17
2.1 Web Personalization ..... 17
2.1.1 Content-based Approaches ..... 18
2.1.2 Collaborative Filtering Approaches ..... 20
2.1.3 Hybrid Approaches ..... 21
2.1.4 Summary and Discussion ..... 22
2.2 Tag-based Item Recommendation Systems ..... 22
2.2.1 Social Tagging Systems ..... 22
2.2.2 User Profile Modelling Approaches ..... 25
2.2.2.1 Two-Dimensional Approaches ..... 25
2.2.2.2 Multi-Dimensional Approaches ..... 27
2.2.3 Tagging Data Interpretation Schemes ..... 31
2.2.3.1 The boolean Scheme ..... 31
2.2.3.2 The set-based Scheme ..... 33
2.2.4 Summary and Discussion ..... 33
2.3 Ranking-based Recommendation Approaches ..... 36
2.3.1 Point-wise Based Ranking Approaches ..... 37
2.3.1.1 Regression based algorithm ..... 37
2.3.1.2 Classification based algorithm ..... 38
2.3.2 Pair-wise Based Ranking Approaches ..... 40
2.3.2.1 Regression based algorithm ..... 40
2.3.2.2 Classification based algorithm ..... 42
2.3.3 List-wise Based Ranking Approaches ..... 42
2.3.3.1 Directly Optimizing Ranking Evaluation Measure ..... 43
2.3.3.2 Minimizing List-wise Loss ..... 45
2.3.4 Summary and Discussion: Ranking based recommendation ..... 46
2.4 Chapter Summary and Concluding Remarks ..... 48
CHAPTER 3: RESEARCH DESIGN ..... 51
3.1 Introduction ..... 51
3.2 Research Design ..... 51
3.2.1 Phase-One: Tagging Data Pre-Processing ..... 53
3.2.1.1 The boolean Scheme ..... 54
3.2.1.2 The User-Tag set (UTS) Scheme ..... 55
3.2.1.3 The graded-relevance Scheme ..... 55
3.2.2 Phase-Two: Generating Recommendations with Ranking Methods ..... 56
3.2.2.1 Phase-Two (a): Point-wise based Ranking Approaches ..... 57
3.2.2.1.1 TRPR: Probabilistic Ranking ..... 57
3.2.2.1.2 We-Rank: Weighted Tensor Approach for Ranking ..... 58
3.2.2.2 Phase-Two (b): List-wise based Ranking Approaches ..... 59
3.2.2.2.1 Do-Rank: Learning from Multi-graded Data ..... 60
3.2.2.2.2 Go-Rank: Learning from Graded-relevance Data ..... 60
3.3 Datasets ..... 61
3.3.1 Experimental Settings ..... 64
3.4 Evaluation Metrics ..... 66
3.4.1 Point-wise based Ranking Approach ..... 67
3.4.2 List-wise based Ranking Approach ..... 68
3.4.2.1 Average Precision (AP) and Mean Average Precision (MAP) ..... 68
3.4.2.2 Discounted Cumulative Gain (DCG) and Normalized Discounted Cumulative Gain (NDCG) ..... 68
3.5 Benchmarking Methods ..... 69
3.5.1 MAX Method ..... 70
3.5.2 Pairwise Interaction Tensor Factorization (PITF) Method ..... 70
3.5.3 CF-based method that applied the Candidate Tag Set (CTS) Method ..... 72
3.6 Chapter Summary ..... 73
CHAPTER 4: POINT-WISE BASED RANKING METHODS ..... 75
4.1 Introduction ..... 75
4.1.1 Challenges ..... 75
4.1.2 Proposed Solutions ..... 76
4.2 TRPR: Probabilistic Ranking ..... 77
4.2.1 Overview ..... 77
4.2.2 User Profile Construction ..... 78
4.2.3 Learning-to-Rank Procedure ..... 80
4.2.3.1 Optimization Criterion and Factorization Technique ..... 80
4.2.3.2 Latent Factors Generation ..... 83
4.2.4 Recommendation Generation ..... 84
4.2.4.1 Tensor Reconstruction ..... 85
4.2.4.2 Candidate Item and Tag Preference Sets Generation ..... 89
4.2.4.3 Top-N Item Recommendation Generation via Probabilistic Ranking ..... 90
4.2.5 Empirical Evaluation ..... 94
4.2.5.1 Choosing the Latent Factor Matrix Size F ..... 94
4.2.5.2 Accuracy Performance ..... 95
4.2.5.3 Impact of Tag Preference Set Size ..... 98
4.2.5.4 Scalability ..... 100
4.2.6 Summary of Probabilistic Ranking ..... 101
4.3 We-Rank: Weighted Tensor Approach for Ranking ..... 102
4.3.1 Overview ..... 102
4.3.2 User Profile Construction ..... 103
4.3.3 Learning-to-Rank Procedure ..... 104
4.3.3.1 Optimization Criterion and Factorization Technique ..... 104
4.3.3.2 Weighted Tensor ..... 105
4.3.3.3 Latent Factors Generation ..... 110
4.3.4 Recommendation Generation ..... 112
4.3.5 Empirical Evaluation ..... 112
4.3.5.1 Impact of Tag Preference Size ..... 112
4.3.5.2 Primary Tensor Y and Weighted Tensor W ..... 114
4.3.5.3 Accuracy Performance ..... 115
4.3.6 Summary of Weighted Tensor Factorization ..... 117
4.4 Chapter Summary ..... 117
CHAPTER 5: LIST-WISE BASED RANKING METHODS ..... 119
5.1 Introduction ..... 119
5.1.1 Challenges ..... 120
5.1.2 Proposed Solutions ..... 121
5.2 Do-Rank: Learning From Multi-Graded Data ..... 122
5.2.1 Overview ..... 122
5.2.2 User Profile Construction ..... 123
5.2.3 Learning-to-Rank Procedure ..... 127
5.2.3.1 Optimization Criterion and Factorization Technique ..... 127
5.2.3.2 Ranking Smoothing ..... 129
5.2.3.3 Latent Factors Generation ..... 130
5.2.3.4 Complexity Analysis and Convergence ..... 132
5.2.4 Recommendation Generation ..... 133
5.2.5 Empirical Evaluation ..... 134
5.2.5.1 Accuracy Performance ..... 134
5.2.5.2 Impact of UTS scheme ..... 138
5.2.5.3 Scalability ..... 140
5.2.5.4 Convergence ..... 140
5.2.6 Summary of Learning from Multi-Graded Data ..... 141
5.3 Go-Rank: Learning From Graded-relevance data ..... 142
5.3.1 Overview ..... 142
5.3.2 User Profile Construction ..... 143
5.3.3 Learning-to-Rank Procedure ..... 147
5.3.3.1 Optimization Criterion and Factorization Technique ..... 147
5.3.3.2 Ranking Smoothing ..... 148
5.3.3.3 Latent Factors Generation ..... 149
5.3.3.4 Complexity Analysis and Convergence ..... 152
5.3.4 Recommendation Generation ..... 152
5.3.5 Empirical Evaluation ..... 152
5.3.5.1 Impact of graded-relevance Scheme ..... 153
5.3.5.2 Accuracy Performance ..... 155
5.3.5.3 Impact of Probability Values ..... 158
5.3.5.4 Scalability ..... 161
5.3.5.5 Convergence ..... 162
5.3.6 Summary of Learning from Graded-Relevance Data ..... 162
5.4 Chapter Summary ..... 163
CHAPTER 6: PERFORMANCE COMPARISONS AND ANALYSIS ..... 165
6.1 Impact of Interpretation Scheme to Tensor Entries Populations ..... 169
6.2 Impact of p-core to Tensor Entries Populations and Method Performances ..... 170
6.3 Impact of Users Tagging Behaviours to Tensor Entries Populations ..... 171
6.4 Impact of “Relevant” Entries To Method Performances ..... 173
6.5 Impact of Handling “Likely Relevant” Entries to Method Performances ..... 175
6.6 Accuracy Comparisons of the Proposed Methods ..... 176
6.7 Point-wise based ranking Methods Versus List-wise based Ranking Methods ..... 179
6.8 Proposed Methods Versus Benchmarking Methods ..... 182
6.9 Efficiency Versus Method Performances ..... 184
6.10 Computation Complexity ..... 185
6.11 Time complexity ..... 186
6.12 Strengths and Shortcomings of The Proposed Methods ..... 187
6.13 Chapter Summary ..... 191
CHAPTER 7: CONCLUSIONS ..... 193
7.1 Summary of Contributions ..... 194
7.2 Summary of Findings ..... 196
7.3 Limitations and Future Works ..... 200
BIBLIOGRAPHY ..... 203
List of Figures
Figure 1.1. A sample of tagging data that holds the user, item, tag ternary relations ..... 2
Figure 2.1. Example of popular Social Tagging System Websites ..... 23
Figure 2.2. Long-tail distribution of: (a) items of bookmarked URLs, (b) users who made the bookmarks, and (c) tags used in the bookmarks – captured from tagging data of Delicious website (Li et al., 2008) ..... 24
Figure 2.3. Projection of user, item, tag relation into three two-dimensional matrices ..... 25
Figure 2.4. The Tucker factorization model for a third-order tensor ..... 29
Figure 2.5. The CP factorization model for third-order tensor ..... 29
Figure 2.6. A toy example with $U = \{u_1, u_2, u_3\}$, $I = \{i_1, i_2, i_3, i_4\}$, and $T = \{t_1, t_2, t_3, t_4, t_5\}$: (a) The observed tagging data, and the initial tensor $Y \in \mathbb{R}^{3 \times 4 \times 5}$ for which entries are generated by implementing (b) the boolean, and (c) the set-based schemes ..... 32
Figure 2.7. Learning-to-rank framework, adapted from (Liu, 2009) ..... 37
Figure 3.1. The research design ..... 52
Figure 3.2. A toy example of entries from the observed tagging data $A_{ob}$ ..... 54
Figure 3.3. The toy example of the User 1 ($u_1$) profile built from various interpretation schemes: (a) boolean, (b) UTS, and (c) graded-relevance ..... 56
Figure 3.4. A snapshot of the Delicious dataset ..... 61
Figure 3.5. A snapshot of the LastFM dataset ..... 62
Figure 3.6. A snapshot of the CiteULike dataset ..... 62
Figure 3.7. A snapshot of the MovieLens dataset ..... 63
Figure 4.1. Overview of the Probabilistic Ranking method (TRPR) ..... 78
Figure 4.2. Example of initial tensor $Y \in \mathbb{R}^{3 \times 4 \times 5}$ as the representation of user profile in which entries are generated by implementing the boolean interpretation scheme to the toy example in Figure 3.2 ..... 80
Figure 4.3. The Tucker factorization model for a third-order tensor ..... 81
Figure 4.4. The CP factorization model for a third-order tensor ..... 81
Figure 4.5. Example of three ways matricization of a tensor $Y \in \mathbb{R}^{3 \times 4 \times 5}$ ..... 82
Figure 4.6. The TRPR learning algorithm, adapted from (Kutty et al., 2012) ..... 84
Figure 4.7. The TRPR tensor reconstruction algorithm ..... 86
Figure 4.8. Example of tensor reconstruction process by implementing the memory efficient approach where $Y \in \mathbb{R}^{3 \times 4 \times 5}$, $Q = 3$, $R = 4$, $S = 5$, $F = 2$, and $b = 2$ ..... 88
Figure 4.9. Example of the reconstructed tensor $Y \in \mathbb{R}^{3 \times 4 \times 5}$ ..... 89
Figure 4.10. The probabilistic ranking for Top-N item recommendation generation algorithm ..... 91
Figure 4.11. Example of tensor model from toy dataset with only non-negative and non-zero values displayed as table: (a) Initial tensor $Y \in \mathbb{R}^{3 \times 4 \times 5}$, and (b) Reconstructed tensor $Y \in \mathbb{R}^{3 \times 4 \times 5}$ ..... 92
Figure 4.12. Performance comparison of TRPR-CP with an increasing number of F ..... 95
Figure 4.13. F1-Score at various Top-N positions on Delicious dataset ..... 95
Figure 4.14. F1-Score at various Top-N positions on LastFM dataset
Figure 4.15. F1-Score at various Top-N positions on CiteULike dataset
Figure 4.16. F1-Score at various Top-N positions on MovieLens dataset
Figure 4.17. Impact of tag preference set size on Delicious dataset
Figure 4.18. Impact of tag preference set size on LastFM dataset
Figure 4.19. Impact of tag preference set size on CiteULike dataset
Figure 4.20. Impact of tag preference set size on MovieLens dataset
Figure 4.21. Scalability comparison by varying tensor dimensionality on Delicious dataset
Figure 4.22. Overview of the weighted tensor approach for ranking method (We-Rank)
Figure 4.23. The CP factorization model for third-order tensor
Figure 4.24. The user Tag Usage Likeliness generation algorithm
Figure 4.25. Example of the resulted matricization of tensor $Y \in \mathbb{R}^{3 \times 4 \times 5}$: (a) Mode-1 matricization $Y_{(1)} \in \mathbb{R}^{3 \times 20}$, and (b) Mode-3 matricization $Y_{(3)} \in \mathbb{R}^{5 \times 12}$
Figure 4.26. Example of the resulted latent feature matrix: (a) User latent feature matrix $A \in \mathbb{R}^{3 \times 2}$, and (b) Tag latent feature matrix $B \in \mathbb{R}^{5 \times 2}$
Figure 4.27. Example of the resulted User Tag Usage Likeliness matrix $L \in \mathbb{R}^{3 \times 5}$
Figure 4.28. The weighted tensor $W \in \mathbb{R}^{Q \times R \times S}$ construction algorithm
Figure 4.29. Example of: (a) Primary tensor $Y \in \mathbb{R}^{3 \times 4 \times 5}$, and (b) the resulted Weighted tensor $W \in \mathbb{R}^{3 \times 4 \times 5}$
Figure 4.30. The We-Rank learning algorithm
Figure 4.31. Impact of tag preference set size on Delicious dataset
Figure 4.32. Impact of tag preference set size on LastFM dataset
Figure 4.33. Impact of tag preference set size on CiteULike dataset
Figure 4.34. Impact of tag preference set size on MovieLens dataset
Figure 4.35. The weighted tensor W densities at various tag preference set size on: (a) Delicious, (b) LastFM, (c) CiteULike, and (d) MovieLens datasets
Figure 5.1. The initial tensor $Y \in \mathbb{R}^{3 \times 4 \times 5}$, as the representation of user profile, which entries are generated by implementing the: (a) set-based and (b) UTS interpretation schemes
Figure 5.2. The CP factorization model for third-order tensor
Figure 5.3. The comparison between DCG and the smoothed approximation of DCG (sDCG)
Figure 5.4. The Do-Rank learning algorithm
Figure 5.5. The Do-Rank scalability
Figure 5.6. The Do-Rank convergence criterion
Figure 5.7. Example of initial tensor $Y \in \mathbb{R}^{3 \times 4 \times 5}$, as the representation of user profile, which entries are generated by implementing the (a) set-based and (b) graded-relevance interpretation schemes
Figure 5.8. The comparison between GAP and the smoothed approximation of GAP (sGAP)
Figure 5.9. The Go-Rank learning algorithm
Figure 5.10. Go-Rank improvement over PITF
Figure 5.11. Impact of probability values on the Delicious dataset
Figure 5.12. Impact of probability values on the LastFM dataset
Figure 5.13. Impact of probability values on the CiteULike dataset
Figure 5.14. Impact of probability values on the MovieLens dataset
Figure 5.15. The Go-Rank scalability
Figure 5.16. The Go-Rank convergence
Figure 6.1. Comparison of size of p-core over tensor entries population on boolean, UTS, and graded-relevance schemes
Figure 6.2. Comparison of p-core over methods performances using NDCG
Figure 6.3. Comparison of p-core over methods performances using AP
Figure 6.4. Comparison of p-core over methods performances using MAP
Figure 6.5. The statistic of user-item and user-tag sets on: (a) Delicious, (b) LastFM, (c) CiteULike, and (d) MovieLens datasets
Figure 6.6. Comparison of “relevant” over “irrelevant” entries population
Figure 6.7. Comparison of “relevant” over “likely relevant” entries
Figure 6.8. Comparison of “relevant” entries population over methods performances using NDCG
Figure 6.9. Comparison of “relevant” entries population over methods performances using AP
Figure 6.10. Comparison of “relevant” entries population over methods performances using MAP
Figure 6.11. Comparison of MAX-boolean over MAX-graded performances showing the impact of inappropriately handling the “likely relevant” entries
Figure 6.12. Comparison of methods performances on Delicious dataset
Figure 6.13. Comparison of methods performances on LastFM dataset
Figure 6.14. Comparison of methods performances on CiteULike dataset
Figure 6.15. Comparison of methods performances on MovieLens dataset
Figure 6.16. Comparison of proposed methods performances as the average over all datasets using NDCG, AP, and MAP
Figure 6.17. The comparison of efficiency over method performances
Figure 6.18. The comparison of time complexity of proposed methods
List of Tables
Table 1.1. An example of item recommendations based on tagging data in Figure 1.1
Table 1.2. Summary of each developed method
Table 1.3. The summary showing how research questions, contributions, corresponding chapters and publications fit together in the thesis
Table 2.1. Recommendation Approach and Techniques, summarised from (Adomavicius and Tuzhilin, 2005)
Table 2.2. Classification of tag-based recommendation research according to the user profile modelling approaches, the data interpretation schemes, and the types of recommendation
Table 2.3. Classification of ranking-based recommendation research according to ranking approaches, loss functions, and feedback forms. Here, I = Implicit and E = Explicit.
Table 3.1. Details of the various characteristic of datasets
Table 3.2. The details of dataset statistics resulted from the implementation of various p-cores
Table 3.3. The summaries of the proposed ranking methods
Table 4.1. Average TRPR accuracy improvement over MAX
Table 4.2. The density comparison of non-zero entries generated from $D_{train}$ on the primary tensor Y and weighted tensor W (v = 50)
Table 4.3. F1-Score at various Top-N positions on Delicious dataset
Table 4.4. F1-Score at various Top-N positions on LastFM dataset
Table 4.5. F1-Score at various Top-N positions on CiteULike dataset
Table 4.6. F1-Score at various Top-N positions on MovieLens dataset
Table 5.1. NDCG, AP, and MAP on Delicious dataset
Table 5.2. NDCG, AP, and MAP on LastFM dataset
Table 5.3. NDCG, AP, and MAP on CiteULike dataset
Table 5.4. NDCG, AP, and MAP on MovieLens dataset
Table 5.5. The comparison of tensor entries population distribution generated from $D_{train}$ using boolean, set-based and UTS schemes
Table 5.6. The comparison of tensor entries population distribution generated from $D_{train}$ using boolean, set-based, and graded-relevance schemes
Table 5.7. NDCG, AP, and MAP on Delicious dataset
Table 5.8. NDCG, AP, and MAP on LastFM dataset
Table 5.9. NDCG, AP, and MAP on CiteULike dataset
Table 5.10. NDCG, AP, and MAP on MovieLens dataset
Table 6.1. The proposed and benchmarking methods performances on Delicious dataset
Table 6.2. The proposed and benchmarking methods performances on LastFM dataset
Table 6.3. The proposed and benchmarking methods performances on CiteULike dataset
Table 6.4. The proposed and benchmarking methods performances on MovieLens dataset
Table 6.5. The comparison of tensor entries population distribution generated from $D_{train}$ using boolean, UTS, and graded-relevance schemes
Table 6.6. The comparison of complexity of proposed methods
List of Abbreviations
ALS Alternating Least Square
AP Average Precision
AUC Area Under the Receiver Operating Characteristic Curve
CF Collaborative Filtering
CP Candecomp/Parafac
DCG Discounted Cumulative Gain
Do-Rank DCG Optimization for Learning-to-Rank
ERR Expected Reciprocal Rank
GAP Graded Average Precision
Go-Rank GAP Optimization for Learning-to-Rank
HOOI Higher-Order Orthogonal Iteration
HOSVD Higher-Order SVD
IDCG Ideal Discounted Cumulative Gain
MAP Mean Average Precision
MRR Mean Reciprocal Rank
MSE Mean Square Error
NDCG Normalized Discounted Cumulative Gain
PITF Pairwise Interaction Tensor Factorization
RR Reciprocal Rank
STS Social Tagging Systems
SVD Singular Value Decomposition
TF-IDF Term Frequency–Inverse Document Frequency
TRPR Tensor-based Item Recommendation using Probabilistic Ranking
UTS User-Tag Set
We-Rank Recommendation Ranking using Weighted Tensor
wMSE weighted Mean Square Error
Statement of Original Authorship
The work contained in this thesis has not been previously submitted to meet
requirements for an award at this or any other higher education institution. To the
best of my knowledge and belief, the thesis contains no material previously
published or written by another person except where due reference is made.
Signature: QUT Verified Signature
Date: 12/08/2016
Acknowledgements
I would like to start by praising the Almighty, the Lord of Everything, the Beneficent
and the Merciful.
I sincerely express my gratitude and appreciation to Associate Professor Richi
Nayak, my Principal Supervisor, for her continuous guidance, encouragement, and
support throughout my PhD journey. Her valuable reviews and feedback have
sharpened and enhanced my critical thinking and research skills. Further, it is her
patience and understanding that has helped me to get through my low moments. I
also thank Associate Professor Shlomo Geva for being my Associate Supervisor.
I acknowledge the Directorate General of Higher Education (DGHE) Indonesia
for financially supporting my PhD study. My special gratitude goes to the QUT High
Performance Computing (HPC) and Research Support Group for their computational
resources and services. Further, I am indebted to my home institution in Indonesia:
Informatics Department, Faculty of Engineering, University of Trunojoyo Madura,
for the study leave.
I would like to thank the Science and Engineering Faculty (SEF), School of
Electrical Engineering and Computer Science (EECS) and Data Science (DS)
Discipline for providing me a comfortable research environment. My appreciation is
extended to Dr Sangeetha Kutty, Dr Rakesh Rawat, Dr Suren Rathnayake and Mr
Endang Djuana for the valuable conceptual and technical discussions we had at the
early stages of my candidature. My thanks go to my colleagues, Israt, Edy, Gavin,
Reza, Jun, Paul, Mahnoosh, Lin, Khanh, Fahim, Hamzah, Raji and Daniel for their
support in many circumstances.
My gratitude goes to the staff members from EECS, especially Ms Ellainne
Steele, Ms Joanne Reaves, Ms Joanne Kelly, Ms Sharon McCann, Ms Mallory Van
Nek, for their administrative support and their personal assistance.
Proofreading service for this thesis was provided and is acknowledged,
according to the guidelines laid out in the University-endorsed national policy
guidelines for the editing of research theses.
Last but not least, I am truly grateful to my family and friends, for their love,
support and encouragement.
Chapter 1: Introduction
This chapter outlines the background of the research and its motivations. The next
four sections describe the research questions, objectives, contributions, and
significance. Following on from this, the papers published from the work presented
in this thesis are listed and the remaining chapters are outlined. Finally, the last
section provides the summary of the chapter.
1.1 BACKGROUND AND MOTIVATIONS
Recommendation systems help users to find relevant information on the Internet by
providing them with a list of items that they might be interested in (Zhang et al.,
2011). The list of recommendations is generated by learning from the user profiles,
which are commonly built from the information related to both the users and the
items, such as users’ purchase history (Pradel et al., 2011; Rendle, Freudenthaler, et
al., 2009), demographics (Vozalis and Margaritis, 2007), ratings (Balakrishnan and
Chopra, 2012; Koren and Sill, 2011; Weimer et al., 2007), and content of items (de
Campos et al., 2010; Pazzani and Billsus, 2007).
The popularity of Web 2.0 has been accompanied by the emergence of Social Tagging
System (STS) applications, in which users can organise, retrieve, and share items
(e.g. bookmarks, songs, movies, and articles) with other users (Marinho et al., 2012;
Mezghani et al., 2012; Schoefegger and Granitzer, 2012). These systems allow
their users to annotate items of interest with freely defined tags. Users are
typically allowed to use the same tag for annotating different items, as well as using
different tags for annotating the same item. A tagging activity represents the event
when a user uses a tag to annotate an item, and a ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relation is
naturally formed. Over a period of time, the tagging data are recorded as a result of
the accumulated ternary relations. Figure 1.1 shows a sample of tagging data that
holds the ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relation, where three users, four items and five
tags are recorded in total. It is to be noted that the tagging activities of each user, i.e.
(a) User 1, (b) User 2, and (c) User 3, are displayed as a separate sub-figure for ease
of illustration.
Unlike the “traditional” recommendation systems, which use ratings to capture
user interest in certain items, STSs capture the user interest by analysing the tagging
data and support the process of generating item recommendations. In other words,
the system predicts the list of items that may be of interest to a user by learning from
the user’s tagging preferences. An STS facilitates a tag-based item recommendation
system, the success of which highly depends upon how the relations in the tagging
data are exploited (Bogers and van den Bosch, 2009; Kim et al., 2010).
[Figure 1.1: three panels, (a) User 1, (b) User 2, and (c) User 3, each drawn over the node labels Item 1 to Item 4 and Tag 1 to Tag 5]
Figure 1.1. A sample of tagging data that holds the ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relations
In order to boost the performance of recommendation systems with tags, the
unique multi-dimensional relations between users, items, and tags must be
appropriately modelled to represent the user profiles, such that the latent
relationships among dimensions are thoroughly captured. Therefore, a tag-based
recommendation system needs to employ a multi-dimensional approach rather
than splitting the data into multiple lower-dimensional models (Rendle, Balby Marinho, et
al., 2009; Rendle and Schmidt-Thieme, 2010; Symeonidis et al., 2008, 2010). Tensor
models are an approach that can preserve the multi-dimensional nature of the tagging
data and infer the latent relationships inherent in the data (Acar et al., 2011; Ifada and
Nayak, 2014c; Kolda and Bader, 2009; Symeonidis et al., 2010). For tag-based item
recommendation systems, tagging data can be modelled as a third-order tensor,
factorized to acquire the latent factors that govern the ternary relations, and
reconstructed to calculate the predicted preference scores for generating the list of
recommendations.
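The model, factorize, and reconstruct pipeline described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the triples are hypothetical (they are not the exact links of Figure 1.1), the latent factors are random stand-ins rather than learned values, the CP form is used for the reconstruction, and the helper names `build_tensor`, `cp_reconstruct`, and `top_n_items` are invented for this sketch. Ranking items by their maximum score over tags is one simple choice, not necessarily the one a given method uses.

```python
import random

# Hypothetical (user, item, tag) index triples; illustrative only.
triples = [(0, 1, 0), (0, 2, 3), (1, 0, 0), (1, 2, 1),
           (1, 2, 3), (2, 2, 3), (2, 3, 1)]
n_users, n_items, n_tags = 3, 4, 5


def build_tensor(triples, n_users, n_items, n_tags):
    """Model: third-order tensor with 1.0 for each observed triple, else 0.0."""
    y = [[[0.0] * n_tags for _ in range(n_items)] for _ in range(n_users)]
    for u, i, t in triples:
        y[u][i][t] = 1.0
    return y


def cp_reconstruct(user_f, item_f, tag_f):
    """Reconstruct: y_hat[u][i][t] = sum_f user_f[u][f] * item_f[i][f] * tag_f[t][f]."""
    return [[[sum(a * b * c for a, b, c in zip(u_row, i_row, t_row))
              for t_row in tag_f]
             for i_row in item_f]
            for u_row in user_f]


def top_n_items(y_hat, user, n):
    """Rank items for one user by their maximum predicted score over all tags."""
    scores = [max(tag_scores) for tag_scores in y_hat[user]]
    return sorted(range(len(scores)), key=lambda i: -scores[i])[:n]


tensor = build_tensor(triples, n_users, n_items, n_tags)

# Factorize: random factors stand in for factors learned from `tensor`.
rng = random.Random(0)
n_factors = 2  # assumed number of latent factors
user_f = [[rng.random() for _ in range(n_factors)] for _ in range(n_users)]
item_f = [[rng.random() for _ in range(n_factors)] for _ in range(n_items)]
tag_f = [[rng.random() for _ in range(n_factors)] for _ in range(n_tags)]

y_hat = cp_reconstruct(user_f, item_f, tag_f)
print(top_n_items(y_hat, user=0, n=2))
```

In a real system the three factor matrices would be fitted to the initial tensor by minimising a loss function; only the data flow is shown here.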
The task of a tag-based item recommendation system is to generate a list of
items that may be of interest to a user, by learning from the user’s past tagging
behaviour. Based on the sample of tagging data shown in Figure 1.1, an example of
item recommendation is listed in Table 1.1. Figure 1.1 shows that User 1 shares
tag preferences with User 2 and User 3, as they have all used Tag 4 to annotate
items. Consequently, the system can recommend items that
have been annotated by User 2 and User 3 to User 1. In this case, the system may
recommend Item 1 and Item 4 to User 1 as those items have been previously
annotated by User 2 and User 3, respectively. Using the same approach, the system
may recommend Item 2 and Item 4 to User 2 as they have been previously annotated
by User 1 and User 3, respectively. Likewise, the system may recommend Item 1 and
Item 2 to User 3 as they have been previously annotated by User 2 and User 1,
respectively. Given the tagging data, a list of item recommendations can be
generated for each user.
User     Previous          Similar User based      Previous Annotated        Recommended
         Annotated Item    on Tag Preference       Item of Similar User      Item
User 1   Item 2, Item 3    User 2: Tag 1, Tag 4    User 2: Item 1, Item 3    Item 1, Item 4
                           User 3: Tag 4           User 3: Item 3, Item 4
User 2   Item 1, Item 3    User 1: Tag 1, Tag 4    User 1: Item 2, Item 3    Item 2, Item 4
                           User 3: Tag 2, Tag 4    User 3: Item 3, Item 4
User 3   Item 3, Item 4    User 1: Tag 4           User 1: Item 2, Item 3    Item 2, Item 1
                           User 2: Tag 2, Tag 4    User 2: Item 1, Item 3
Table 1.1. An example of item recommendations based on tagging data in Figure 1.1
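The reasoning behind Table 1.1 can be reproduced with a short sketch: a user is treated as similar to another user when they share at least one tag, and the items of similar users that the target user has not yet annotated are recommended. The item sets below are taken from the table; the tag sets are an assumption, namely the smallest sets consistent with the shared tags listed in the table, and the function name `recommend` is hypothetical.

```python
# Items each user has annotated (from Table 1.1).
user_items = {
    "User 1": {"Item 2", "Item 3"},
    "User 2": {"Item 1", "Item 3"},
    "User 3": {"Item 3", "Item 4"},
}

# Assumed tag sets: the minimal sets that yield the shared tags in Table 1.1.
user_tags = {
    "User 1": {"Tag 1", "Tag 4"},
    "User 2": {"Tag 1", "Tag 2", "Tag 4"},
    "User 3": {"Tag 2", "Tag 4"},
}


def recommend(user):
    """Recommend unseen items annotated by users who share a tag with `user`."""
    candidates = set()
    for other, items in user_items.items():
        if other != user and user_tags[user] & user_tags[other]:
            candidates |= items
    return sorted(candidates - user_items[user])


for name in user_items:
    print(name, recommend(name))
```

This neighbourhood-style rule is used only to make the toy example concrete; it is not the tensor-based approach developed in the thesis.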
Web search research has established that users usually show more interest
in the few items at the top of the list of recommendations than those further down in
the list (Agichtein et al., 2006; Cremonesi et al., 2010; Liu, 2009; Mohan et al., 2011;
Wang et al., 2013; Weimer et al., 2007). Building on this finding, this thesis
conjectures that a tag-based item recommendation system should provide an ordered
list of item recommendations. It is therefore advantageous to learn the tag-based
recommendation model with a learning-to-rank approach when solving the item
recommendation task.
Learning-to-rank approaches can be categorised into three types, point-wise,
pair-wise, and list-wise, according to the input representation and the loss
function used (Liu, 2009; Mohan et al., 2011). To solve the recommendation task
using a point-wise based ranking approach, the recommendation model is learned to
predict whether the user will like the predicted item or not, assuming there is no
interdependency between the predicted items (Liu, 2009; Mohan et al., 2011; Rendle,
2011). In a pair-wise based ranking approach, the recommendation model is learned
to predict the order of a pair of items, in which the interdependency occurs between
the two paired items (Liu, 2009; Mohan et al., 2011; Rendle, 2011). To solve the
recommendation task using a list-wise based ranking approach, the recommendation
model is learned to predict an ordered set of items that will be of interest to a user, in
which a ranking of predicted items depends on other corresponding items (Liu, 2009;
Mohan et al., 2011).
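The difference between the three families is easiest to see in what each loss takes as input. The sketch below uses generic textbook choices as stand-ins, which are assumptions for illustration rather than the criteria used later in the thesis: a mean squared error per item (point-wise), a logistic loss on the score difference of one item pair (pair-wise), and Discounted Cumulative Gain computed over the whole ranked list (list-wise, a quantity to be maximised rather than minimised).

```python
import math


def pointwise_loss(scores, labels):
    """Point-wise: each item is judged on its own (mean squared error)."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)


def pairwise_loss(score_preferred, score_other):
    """Pair-wise: penalise a preferred item that is not scored above the other."""
    return math.log(1.0 + math.exp(-(score_preferred - score_other)))


def dcg(scores, labels):
    """List-wise: quality of the whole ordering induced by the scores."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return sum((2 ** labels[i] - 1) / math.log2(rank + 2)
               for rank, i in enumerate(order))


labels = [1, 0, 0]          # one relevant item, two irrelevant ones
good = [0.9, 0.2, 0.1]      # relevant item ranked first
bad = [0.1, 0.9, 0.2]       # relevant item ranked last

print(pointwise_loss(good, labels), pointwise_loss(bad, labels))
print(pairwise_loss(good[0], good[1]), pairwise_loss(bad[0], bad[1]))
print(dcg(good, labels), dcg(bad, labels))
```

Only the list-wise measure depends on the position of every item in the list, which is why it matches the Top-N recommendation setting most directly.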
In spite of progress in this research field, there exist several challenges and
shortcomings with the current tag-based item recommendation methods:
• Data interpretation. An interpretation scheme defines the user profile
representation, dictating how the user tagging activities should populate the
data structure used. It greatly affects the recommendation performance (Ifada
and Nayak, 2014a; Rendle, Balby Marinho, et al., 2009). A tag-based item
recommendation system customarily interprets the observed data as
“positive” or “relevant” tagging data entries. Observed data are the entries
registered by users when they express their interest in items by annotating them
with tags. Given that the system records the tagging activities, the observed
entries can be interpreted from the tagging data straightforwardly. In
contrast, how the non-observed tagging data should be interpreted remains
disputed and open to researchers’ perceptions. At present, there are two well-
known interpretation schemes: (1) the boolean scheme, which interprets non-
observed entries as a single value of “0” (Symeonidis et al., 2010), and (2) the
set-based scheme, which interprets non-observed entries as a combination of
“irrelevant” and “indecisive” entries (Rendle, Balby Marinho, et al., 2009),
i.e. entries that the users do not like and might like in the future, respectively.
The boolean scheme suffers from the sparsity problem, because the non-observed
entries dominate, and from the overfitting problem, because it does not
distinguish the “irrelevant” entries from the “indecisive” ones that can be
inferred from the non-observed entries (Rendle, Balby Marinho, et al., 2009).
The set-based scheme shows how to tackle these problems; however, it
overgeneralises the “irrelevant” entries (Ifada and Nayak, 2014a);
• Utilising the reconstructed tensor for generating the recommendations. The
existing approaches assume that the predicted preference score in the
reconstructed tensor directly represents the level of user preference for an
item based on a tag. These approaches generate the list of recommendations
based on the maximum values of predicted preference scores in each user-
item set (Nanopoulos, 2011; Symeonidis et al., 2010). However, they
disregard the user’s past tagging activities, which have been found to influence
the user’s preference for the recommended items (Kim et al., 2010);
• Learning from the latent factors. The task of a tag-based recommendation
system is to generate the list of items that may be of interest to a user, by
learning from the user’s tagging history. The list of item recommendations is
sorted in descending order, based on the predicted preference score that
expresses the preference level of a user for annotating an item using a tag. By
using a tensor model to build the user profile, the preference score can be
calculated from the latent factors that govern the ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary
relations inherent in the tagging data. Consequently, the choice of loss
function used as the optimization criterion becomes crucial, as it controls the
learning process of latent factors. Both the data interpretation approach used
to construct the tagging data for populating the tensor model and the learning-
to-rank approach employed to learn the recommendation model therefore
govern the recommendation process.
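The two existing interpretation schemes named in the first challenge can be contrasted on a tiny example. The sketch below is a hedged reading, not the thesis's definition: the boolean scheme marks observed triples as 1 and everything else as 0, while the set-based scheme is assumed here to follow a post-based rule in which, for a user-item pair with at least one observed tag, the remaining tags are "irrelevant", and all entries of user-item pairs with no observation are "indecisive". The function names and the toy identifiers are hypothetical.

```python
def boolean_scheme(observed, users, items, tags):
    """Observed triples become 1; every non-observed triple becomes 0."""
    return {(u, i, t): 1 if (u, i, t) in observed else 0
            for u in users for i in items for t in tags}


def set_based_scheme(observed, users, items, tags):
    """Assumed post-based rule for splitting the non-observed entries."""
    posts = {(u, i) for u, i, _ in observed}  # user-item pairs with any tag
    entries = {}
    for u in users:
        for i in items:
            for t in tags:
                if (u, i, t) in observed:
                    entries[(u, i, t)] = "relevant"
                elif (u, i) in posts:
                    entries[(u, i, t)] = "irrelevant"
                else:
                    entries[(u, i, t)] = "indecisive"
    return entries


users, items, tags = ["u1", "u2"], ["i1", "i2"], ["t1", "t2"]
observed = {("u1", "i1", "t1"), ("u2", "i2", "t2")}

boolean = boolean_scheme(observed, users, items, tags)
set_based = set_based_scheme(observed, users, items, tags)
print(sum(boolean.values()), "of", len(boolean), "boolean entries are 1")
print(set_based[("u1", "i1", "t2")], set_based[("u1", "i2", "t1")])
```

Even in this toy case most boolean entries are 0, which illustrates the sparsity issue, and the set-based rule labels every unused tag of an annotated item as "irrelevant", which is the overgeneralisation noted above.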
Inspired by these challenges of the tag-based item recommendation systems, this
research aims to exploit the tensor model and learning-to-rank approaches for
providing an effective solution to the item recommendation task in a tag-based
system. It is to be noted that a large number of item recommendation works
use the semantic analysis of tags to deal with sparsity or to improve
quality. However, very few works focus on improving the data quality via efficient
interpretation of input data. This thesis does not deal with the semantic analysis of
tags, but rather focuses on the interpretation of tagging data.
1.2 RESEARCH QUESTIONS
This thesis focuses on providing the Top-N item recommendation to a user in the tag-
based system, by implementing the tensor model and learning-to-rank approaches.
The identification of research gaps in a tag-based item recommendation system leads
to the formulation of the following research questions:
Q1: How can tagging data be efficiently interpreted, such that the user’s tagging
history is thoroughly utilised while making recommendations and the
interpretation results in rich multi-graded data?
Q2: How can a learning-to-rank approach be implemented to solve the tag-
based item recommendation task? What optimization criterion should be
used for learning the tensor recommendation model? In what order can the
Top-𝑁 item recommendation be made?
Q3: Does a combination of an interpretation scheme and a learning-to-rank
approach have a positive influence in making a recommendation? Given
that the proposed tag-based item recommendation methods are grouped as
point-wise and list-wise based ranking approaches, comparing their
performances may help to find an efficient method.
1.3 RESEARCH OBJECTIVES
The focus of this thesis is to implement two ranking approaches: point-wise and list-
wise. The pair-wise based ranking approach is not implemented in this research, as
its objective is to predict the order of a pair of items and therefore it disregards the
fact that Top-𝑁 recommendation is a prediction task on a list of items (Cao et al.,
2007). The recommendation task is framed as a regression/classification task by the
point-wise based ranking approach and as a ranking task by the list-wise based
ranking approach. More specifically, the research objectives required to be fulfilled
are listed as follows:
Developing the point-wise based ranking approach methods:
o Developing a method that implements a probabilistic ranking to rank the
list of recommendations. A tag-based item recommendation method
typically implements the boolean interpretation scheme for building the
tensor recommendation model and uses the least square loss function as
the optimization criterion for learning the model. For generating
recommendations, the existing methods (Nanopoulos, 2011; Symeonidis
et al., 2010) directly use the maximum values of predicted preference
scores in each user-item set of the reconstructed tensor model and ignore
the users’ past tagging activities, which results in inferior
recommendation quality. An additional challenge of this approach is the
tensor reconstruction process where the entire latent factors need to be
multiplied, in which it consumes a lot of memory and therefore scalability
becomes an issue. The developed method focuses on how the
recommendation accuracy of candidate items revealed from the
reconstructed tensor be improved and the scalability issue faced during
the tensor reconstruction process be solved;
o Developing a method that implement a weighted tensor approach for
ranking. Applying the least square loss function as the optimization
criterion, to learn the tensor recommendation model built from the
boolean interpretation scheme implementation, means that fitting both the
observed and non-observed entries has the same importance. In this case,
implementing a weighting scheme in the learning process is beneficial to
differentiate the importance of observed and non-observed entries of each
user-item set. The developed method focuses on how the quality of
recommendations can be improved by implementing a weighted scheme in
such a way that the observed and non-observed entries of each user-item set
are given either rewards or penalties, i.e. the observed entries are
weighted with higher values than the non-observed ones, for learning the
tensor recommendation model.
• Developing the list-wise based ranking approach methods:
o Developing a method to learn from multi-graded data. Implementing a
ranking-based data interpretation scheme allows the interpreted tagging
data to have a ranked representation, i.e. the observed entries are given
higher values than the non-observed ones, and results in the multi-graded
tagging data representation. The tagging data is labelled with a value in
the ordinal relevance set of {𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡, 𝑖𝑛𝑑𝑒𝑐𝑖𝑠𝑖𝑣𝑒, 𝑖𝑟𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡} for a
tuple of ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩. The developed method focuses on how the
tensor recommendation model built from multi-graded data can be efficiently
learned by proposing and applying the User-Tag Set (UTS) scheme for
constructing the user profile, and using the Discounted Cumulative Gain
(DCG) as the optimization criterion for learning the tensor
recommendation model;
o Developing a method to learn from graded-relevance data. The
multi-grading of the data with {𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡, 𝑖𝑛𝑑𝑒𝑐𝑖𝑠𝑖𝑣𝑒, 𝑖𝑟𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡} for a tuple
of ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ can be made richer by considering the “transitional”
entries between “relevant” and “irrelevant”. The developed method
focuses on how the tensor recommendation model built from the
graded-relevance data can be efficiently learned by proposing and applying the
graded-relevance interpretation scheme, to effectively leverage the
tagging data, for constructing the user profile, and using the Graded
Average Precision (GAP) as the optimization criterion for learning the
tensor recommendation model.
• Comparing and analysing the results of all proposed ranking methods, and the
benchmarking methods, to reveal the strengths and shortcomings of each
method.
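The weighted scheme named in the objectives above can be illustrated with a minimal sketch. It assumes that tensor entries are stored in Python dictionaries keyed by (user, item, tag), and it uses two hypothetical weights, `w_observed` and `w_non_observed`, whose values are illustrative assumptions rather than the weights used by the developed method.

```python
def weighted_square_loss(target, predicted, observed,
                         w_observed=1.0, w_non_observed=0.1):
    """Weighted least-square loss over (user, item, tag) entries.

    target, predicted: dicts mapping (user, item, tag) -> float
    observed: set of (user, item, tag) triples present in the tagging data
    The two weights are assumed values chosen only for illustration.
    """
    loss = 0.0
    for key, y in target.items():
        # Observed entries are rewarded with the larger weight,
        # non-observed entries are penalised with the smaller one.
        weight = w_observed if key in observed else w_non_observed
        loss += weight * (y - predicted[key]) ** 2
    return loss


observed = {("u1", "i1", "t1")}
target = {("u1", "i1", "t1"): 1.0, ("u1", "i2", "t1"): 0.0}
predicted = {("u1", "i1", "t1"): 0.5, ("u1", "i2", "t1"): 0.5}

# With equal weights, both entries contribute equally to the loss.
plain = weighted_square_loss(target, predicted, observed, 1.0, 1.0)
# With unequal weights, the error on the observed entry dominates.
weighted = weighted_square_loss(target, predicted, observed, 1.0, 0.1)
```

With equal weights the sketch reduces to the plain least square loss, which treats the fit of observed and non-observed entries as equally important.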
1.4 RESEARCH CONTRIBUTIONS
This thesis has developed schemes to interpret tagging data and methods to generate
tag-based item recommendation, as summarised in Table 1.2.
| Developed Method | Optimization Criterion | Data Type | Interpretation Scheme | Ranking Approach |
|---|---|---|---|---|
| TRPR: Tensor-based Item Recommendation using Probabilistic Ranking | Least square loss | Binary | boolean | Point-wise |
| We-Rank: Recommendation Ranking using Weighted Tensor | Weighted least square loss | Binary + Multi-graded | boolean + weighted scheme | Point-wise |
| Do-Rank: DCG Optimization for Learning-to-Rank | Discounted Cumulative Gain (DCG) | Multi-graded | User-Tag Set (UTS) | List-wise |
| Go-Rank: GAP Optimization for Learning-to-Rank | Graded Average Precision (GAP) | Graded-relevance | graded-relevance | List-wise |

Table 1.2. Summary of each developed method
In particular, the contributions of this research are listed as follows:
• To tackle the problems of existing interpretation schemes, two ranking-based
interpretation schemes are proposed, i.e. User-Tag Set (UTS) and
graded-relevance, which apply a ranking constraint to interpret the tagging data and
result in richer data. The UTS scheme interprets the tagging data as
multi-graded data and results in three possible distinct entries: (1) “relevant” or “1”
– the user has been observed showing interest in the items of the entries, (2)
“irrelevant” or “-1” – the user is not interested in the entries, and (3)
“indecisive” or “0” – the user might be interested in the entries in the future,
i.e. the entries need to be predicted for generating the list of recommendations.
The graded-relevance scheme interprets the tagging data as graded-relevance
data and results in four possible distinct entries: (1) “relevant” or “2”, (2)
“likely relevant” or “1”, (3) “irrelevant” or “-1”, and (4) “indecisive” or “0”.
The “likely relevant” entries are those that the user is probably interested
in, although this is not explicitly revealed. Note that the items of those entries have
actually been annotated by the user using other tags. In other words, the
“likely relevant” entries are the transitional entries between the “relevant”
and “irrelevant” entries;
• To improve the recommendation accuracy after the tensor model has been
reconstructed, and the scalability during the tensor reconstruction process, the
Tensor-based Item Recommendation using Probabilistic Ranking (TRPR)
method is proposed. TRPR improves the quality of recommendations by
applying the boolean interpretation scheme, for constructing user profiles,
and implementing probabilistic ranking, in which the user’s past tagging
history is taken into account, for generating the list of recommendations.
TRPR solves the scalability issue faced during the tensor reconstruction
process by implementing a memory-efficient technique;
• To improve the recommendation accuracy while learning the latent
factors and to deal with the sparsity problem, the Recommendation
Ranking using Weighted Tensor (We-Rank) method is proposed. We-Rank
improves the quality of recommendations by applying the boolean
interpretation scheme for constructing user profiles, and utilising the users’
past tagging histories to reveal their tag usage likelihood for learning the
tensor recommendation model. We-Rank implements a weighted scheme,
such that rewards and penalties are given to the observed and non-observed
entries of each user-item set during the learning process, respectively. In this
case, in contrast to TRPR, which requires a succeeding approach to correctly
rank the order of items that might interest users after the factorization and
reconstruction processes, the resulting factorized elements of We-Rank can be
directly used to make ranked recommendations;
• To learn from a user profile built from multi-graded data, which results from
implementing the proposed User-Tag Set (UTS) scheme, the DCG
Optimization for Learning-to-Rank (Do-Rank) method is proposed. The
recommendation model of Do-Rank is optimized with respect to Discounted
Cumulative Gain (DCG) as the ranking evaluation measure. Do-Rank tackles
the computational expense of the learning process by implementing a
fast learning approach that efficiently reduces the learning time, while at the
same time improving or maintaining accuracy;
• To learn from a user profile built from graded-relevance data, which results from
implementing the proposed graded-relevance scheme, the GAP Optimization
for Learning-to-Rank (Go-Rank) method is proposed. The recommendation
model of Go-Rank is optimized with respect to Graded Average Precision
(GAP) as the ranking evaluation measure. Using GAP as the optimization
criterion enables the recommendation model to set up thresholds so that the
“likely relevant” entries can be regarded as either “relevant” or “irrelevant”
entries. Go-Rank tackles the computational expense of the learning
process by implementing a fast learning approach that efficiently reduces the
learning time, while at the same time improving or maintaining accuracy;
• The results of all the proposed methods and benchmarking methods are
compared. Analyses of the results are conducted to reveal the strengths and
shortcomings of each proposed method.
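The entry labels of the interpretation schemes listed in the contributions above can be sketched as follows. The label values (2, 1, 0, -1) and the statement that “likely relevant” items have been annotated by the user with other tags come from the text. The rule used here to separate “irrelevant” from “indecisive” entries is an assumption made for illustration: an entry is treated as “irrelevant” when the user has used the tag elsewhere and as “indecisive” when the user has never used the tag. The function name `interpret` and the scheme names passed to it are hypothetical.

```python
def interpret(observed, users, items, tags, scheme):
    """Label every (user, item, tag) entry under one interpretation scheme.

    observed: set of (user, item, tag) triples from the tagging data
    scheme: "boolean", "uts" or "graded" (hypothetical identifiers)

    ASSUMPTION: "irrelevant" (-1) applies to tags the user has used,
    "indecisive" (0) to tags the user has never used.
    """
    user_tags, user_items = {}, {}
    for u, i, t in observed:
        user_tags.setdefault(u, set()).add(t)
        user_items.setdefault(u, set()).add(i)

    labels = {}
    for u in users:
        for i in items:
            for t in tags:
                if (u, i, t) in observed:
                    # "relevant": 2 under graded-relevance, 1 otherwise
                    value = 2 if scheme == "graded" else 1
                elif scheme == "boolean" or t not in user_tags.get(u, set()):
                    # boolean: every non-observed entry is 0;
                    # UTS / graded: "indecisive" (assumed rule)
                    value = 0
                elif scheme == "graded" and i in user_items.get(u, set()):
                    # "likely relevant": item annotated by the user with other tags
                    value = 1
                else:
                    value = -1  # "irrelevant"
                labels[(u, i, t)] = value
    return labels


observed = {("u1", "i1", "t1"), ("u1", "i2", "t2")}
users, items, tags = ["u1"], ["i1", "i2", "i3"], ["t1", "t2", "t3"]

boolean = interpret(observed, users, items, tags, "boolean")
uts = interpret(observed, users, items, tags, "uts")
graded = interpret(observed, users, items, tags, "graded")
```

In this toy data, the entry (u1, i1, t2) is labelled 0 under the boolean scheme, -1 under UTS, and 1 under graded-relevance, which shows how the two ranking-based schemes progressively refine the non-observed entries.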
1.5 RESEARCH SIGNIFICANCE
The research carried out in this thesis advances the knowledge discovery in tag-based
recommendation systems, which focuses on efficiently interpreting tagging data and
ranking the list of recommendations. The area of tag-based recommendation systems,
in particular how the tagging data should be interpreted, given that the interpretation determines the
recommendation quality, is under-researched.
This thesis has practical significance for real-life applications since an
efficient tagging data interpretation scheme can provide an alternative solution for
solving the sparsity problem that commonly occurs in the tag-based systems, as
usually only a few entries are observed per user (Leginus et al., 2012; Rafailidis and
Daras, 2013; Rendle, Balby Marinho, et al., 2009). Moreover, an efficient
interpretation scheme is more important than simply trying to obtain a
denser data representation, e.g. via clustering techniques that reduce the tag
dimension by grouping semantically similar tags. Ranking the list of
recommendations has a strong practical implication since, in real-life, users usually
show more interest in the few items at the top of the list of recommendations than
those further down the list (Agichtein et al., 2006; Cremonesi et al., 2010; Liu, 2009;
Mohan et al., 2011). In this case, working on the approaches that optimize “the top of
the list” is essential in tag-based item recommendation.
From a broader point of view, this research provides solutions for problems
that generate three-dimensional data. Hence, in general, the methods proposed in
this thesis can be applied to any application with this type of data. A well-known
example of such an application is Twitter (https://twitter.com/), which allows its users to place the hashtag
symbol (‘#’) before a relevant keyword to categorise their tweets. A survey by
RadiumOne (2013) reported that 58% of Twitter users use hashtags on a regular
basis. Because hashtags play a role similar to that of tags in STS applications, which allow users to
annotate items of their interest, the proposed tag-based item recommendation methods can be
implemented for a tweet recommendation system.
The context-aware recommendation system is another example of a problem
that can be solved using the proposed methods. A context-aware system incorporates
the additional contextual information, such as time and location, into the
recommendation process (Adomavicius and Tuzhilin, 2011) for generating a list of
item recommendations to users, under certain contexts. Such a system generates
three-dimensional data as the contextual information becomes the third dimension,
in addition to the user and item dimensions.
1.6 PUBLICATIONS
The following publications have been produced from the work presented in this
thesis.
1. Ifada, Noor & Nayak, Richi (2016). How Relevant is the Irrelevant
Data: Leveraging the Tagging Data for a Learning-to-Rank Model. In
Proceedings of the 9th ACM International Conference on Web Search and Data
Mining – WSDM 2016, ACM New York, San Francisco, California, USA, pp.
23-32.
2. Ifada, Noor & Nayak, Richi (2015). Do-Rank: DCG Optimization for Learning-
to-rank in Tag-based Item Recommendation Systems. In Cao, T., Lim, E.-
P., Zhou, Z.-H., Ho, T.-B., Cheung, D., & Motoda, H. (Eds.) Advances in
Knowledge Discovery and Data Mining – PAKDD 2015. Springer-Verlag Berlin
Heidelberg, Berlin, pp. 510-521.
3. Ifada, Noor & Nayak, Richi (2014). A Two-stage Item Recommendation Method
using Probabilistic Ranking with Reconstructed Tensor Model. Lecture Notes in
Computer Science: User Modeling, Adaptation, and Personalization – UMAP
2014, 8538, pp. 98-110.
4. Ifada, Noor & Nayak, Richi (2014). An Efficient Tagging Data Interpretation and
Representation Scheme for Item Recommendation. In Proceedings of the 12th
Australasian Data Mining Conference – AusDM 2014, 27-28 November 2014,
Queensland University of Technology, Gardens Point Campus, Brisbane,
Australia. (Best Paper Award)
5. Ifada, Noor (2014). A Tag-based Personalized Item Recommendation System
using Tensor Modeling and Topic Model Approaches. In Proceedings of the 37th
International ACM SIGIR Conference on Research & Development in
Information Retrieval – SIGIR 2014, ACM New York, Gold Coast, Queensland,
Australia, p. 1280.
6. Ifada, Noor & Nayak, Richi (2014). Tensor-based Item Recommendation using
Probabilistic Ranking in Social Tagging Systems. In Chung, Chin-Wan, Broder,
Andrei, Shim, Kyuseok, & Suel, Torsten (Eds.) In Proceedings of the Companion
Publication of the 23rd International Conference on World Wide Web
Companion – WWW 2014, ACM, Seoul, Republic of Korea, pp. 805-810.
1.7 THESIS OUTLINE
The remainder of the thesis is organised as follows:
Chapter 2 reviews the relevant literature. The review covers literature about
web personalization, tag-based item recommendation system, and ranking-based
recommendation approaches.
Chapter 3 presents the research design and the evaluation procedure. The
research design is described in two phases, i.e. tagging data pre-processing and the
development of tag-based item recommendation methods. The proposed methods in
this thesis are categorised into point-wise and list-wise ranking methods. This
chapter also includes the detailed description of the datasets, the experimental
settings, and the various evaluation measures used to evaluate the proposed tag-based
item recommendation methods. Lastly, the benchmarking methods used to evaluate
the proposed methods are presented.
Chapter 4 describes the proposed tag-based item recommendation methods
built by implementing the point-wise based ranking approach. The proposed point-
wise methods are the Tensor-based Item Recommendation using Probabilistic
Ranking (TRPR) and the Recommendation Ranking using Weighted Tensor (We-
Rank) methods. Both methods implement the standard pre-processing scheme, i.e.
implementing the boolean scheme, for building the tensor recommendation model
that represents the user profiles. The TRPR method ranks the list of item
recommendations by employing a probabilistic approach that, following the
tensor reconstruction process, takes into account the user’s past tagging
history to calculate the user’s probability of annotating an item given a list of
tags, in order to improve the quality of recommendations. TRPR also implements a
memory-efficient technique in order to solve the scalability issue that occurs during the tensor
reconstruction process. The results of TRPR are then compared to the benchmarking
methods. We-Rank, another point-wise based ranking approach method, implements
a weighted scheme for learning the tensor recommendation model, such that rewards
and penalties are given to the observed and non-observed entries of each user-item
set, respectively. The experimentation results of We-Rank are also presented and
compared against the benchmarking methods.
Chapter 5 presents the proposed tag-based item recommendation methods built
by implementing the list-wise based ranking approach. The proposed list-wise
methods include the DCG Optimization for Learning-to-Rank (Do-Rank) and the
GAP Optimization for Learning-to-Rank (Go-Rank) methods. For each method, new
pre-processing schemes are proposed. For the Do-Rank method, the User-Tag Set
(UTS) scheme is proposed for building the tensor recommendation model, which
represents the user profiles, and the Discounted Cumulative Gain (DCG) is used as
the optimization criterion to appropriately learn the model. The results of Do-Rank are
then compared to the benchmarking methods. For the Go-Rank method, the
graded-relevance scheme is proposed for building the tensor recommendation model, which
represents the user profiles, and the Graded Average Precision (GAP) is used as
the optimization criterion to appropriately learn the model. The experimentation results
of Go-Rank are also presented and compared against the benchmarking methods.
In Chapter 6, the results of all the proposed methods and benchmarking
methods are compared. Analyses of the results are conducted in order to investigate
the impact of various aspects.
Chapter 7 presents the final conclusions, including the main contributions
and a summary of the findings of this thesis. Directions for future research are
also identified.
1.8 CHAPTER SUMMARY
This chapter has detailed the background, motivations, objectives, and significance
of this research. Furthermore, this chapter also lists the research questions,
contributions, and the corresponding publications. Table 1.3 presents a summary
showing how they, including the corresponding chapters, fit together in this thesis.
| | Interpretation Scheme | Ranking Approach: Point-wise | Ranking Approach: List-wise | Comparison |
|---|---|---|---|---|
| Research Question | Q1 | Q2 | Q2 | Q3 |
| Research Contribution | User-Tag Set (UTS), graded-relevance | TRPR, We-Rank | Do-Rank, Go-Rank | Comparison of all proposed methods |
| Corresponding Chapter | Chapter 5 | Chapter 4 | Chapter 5 | Chapter 6 |
| Corresponding Publication | Paper 1, 2, 4 | Paper 3, 5, 6 | Paper 1, 2 | |

Table 1.3. The summary showing how research questions, contributions, corresponding chapters and
publications fit together in the thesis
Chapter 2: Literature Review
This chapter reviews the most relevant literature related to web personalization, tag-
based item recommendation systems and ranking-based recommendation
approaches. Since this thesis does not deal with the semantic analysis of tags, the
techniques presented in this chapter focus on the interpretation of tagging
data.
This chapter starts by discussing web personalization, which highlights the
importance of recommendation systems in web personalization, including the
approaches of recommendation algorithms. This thesis proposes a tag-based
recommendation system; therefore acquiring a comprehensive knowledge of
traditional recommendation systems is essential. The following section details the
tag-based item recommendation systems, including a brief description of Social
Tagging Systems, in order to grasp the important aspects of tag-based item recommendation
systems and to understand the importance of selecting the appropriate user profile
modelling approach and tagging data interpretation scheme for building the
recommendation model. Following that, the third section discusses the ranking-based
recommendation approaches that can be implemented for learning the tag-based
recommendation model, in order to solve the recommendation task. Finally, in the
summary and conclusion section, the research gaps are derived by analysing the
shortcomings of the current approaches employed in the tag-based recommendation
systems.
2.1 WEB PERSONALIZATION
Web personalization aims to overcome the abundant information issue on the
Internet by pointing users to the list of recommendations that might interest them
(Castellano et al., 2009; Mobasher, 2007; Singh Anand and Mobasher, 2005;
Venugopal et al., 2009; Zhang et al., 2011). For this reason, web personalization and
recommendation systems are often mentioned interchangeably (Castellano et al.,
2009). A recommendation system usually consists of two main stages, namely user
profiling and recommendation generation.
User profiling is the stage where user profiles, which are formal representations of
the information collected from users, are constructed. The profiles can be constructed
in the two steps of feedback data collection and profile representation. Feedback data
about users can be collected explicitly and implicitly, and the user profile is derived
by analysing this data to be represented in various ways, such as vector, matrix, and
tensor. Explicit feedback data collection usually relies on personal information given
by the users via HTML forms. Another common technique is to allow users to
express their opinions by selecting a value from a range, known as ratings.
Though explicit feedback data are effective and easy to collect, they require a user’s
willingness to participate, which might become an additional burden for the user,
while in fact some users may not accurately report their own interests (Qiu and Cho,
2006). Implicit feedback data collection can be conducted by gathering the user’s
behaviour information via click streams, bookmarking, purchasing behaviour, and
the content or structure information of the visited web pages. While this approach is
considered to be an effective way to construct user profiles, gathering and
filtering the data is laborious and expensive.
In the stage of recommendation generation, a list of recommendations is
provided to target users by offering items with the highest predicted ratings or the
highest recommendation scores (Lü et al., 2012). In other words, the main purpose of
a recommendation system is to predict the target users’ interests. Based on
how the recommendations are generated, the recommendation algorithm can be
categorised into three approaches: content-based, collaborative filtering, and hybrid
approach (Adomavicius and Tuzhilin, 2005), in which various techniques can be
applied to each approach, as summarised in Table 2.1. The next three sub-sections
provide a brief description of each approach.
2.1.1 Content-based Approaches
The content-based recommendation approach generates a list of recommendations
based on content similarity of items to the items that the target user has previously
preferred. The information source for this approach relies on items previously rated
by the user (Lops et al., 2011; Pazzani and Billsus, 2007). Content-based approaches
are mostly used for recommendation across text-based items for which content can
be represented by keywords. For each keyword, the level of importance is
determined using a weighting measure (Adomavicius and Tuzhilin, 2005; Lops et al.,
2011).
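TF-IDF, listed in Table 2.1, is one such keyword weighting measure. The sketch below is a minimal illustration under stated assumptions: it uses one common TF-IDF variant (term frequency normalised by document length and a natural-logarithm inverse document frequency), and the function name `tf_idf` and the toy keyword lists are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Weight each keyword of each item by TF-IDF.

    documents: dict mapping item id -> non-empty list of keywords
    Returns a dict mapping item id -> {keyword: weight}.
    """
    n_docs = len(documents)

    # Document frequency: in how many items does each keyword occur?
    doc_freq = Counter()
    for words in documents.values():
        doc_freq.update(set(words))

    weights = {}
    for item, words in documents.items():
        counts = Counter(words)
        weights[item] = {
            word: (count / len(words)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
    return weights


docs = {
    "a": ["tensor", "ranking", "tensor"],
    "b": ["ranking", "tags"],
}
weights = tf_idf(docs)
```

In this toy example the keyword "ranking" occurs in every item and therefore receives a weight of zero, whereas "tensor" is both frequent in item "a" and absent elsewhere, so it receives the highest weight for that item.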
| Recommendation Approach | Recommendation Technique: Heuristic-based | Recommendation Technique: Model-based |
|---|---|---|
| Content-based | TF-IDF (Information Retrieval); Clustering | Bayesian classifier; Clustering; Decision trees; Artificial neural networks |
| Collaborative | Nearest neighbour (cosine, correlation); Clustering; Graph theory | Bayesian networks; Clustering; Artificial neural networks; Linear regression; Probabilistic models |
| Hybrid | Linear combination of predicted rating; Various voting schemes; Incorporating one component as a part of the heuristic for the other | Incorporating one component as a part of the model for the other; Building one unifying model |

Table 2.1. Recommendation Approach and Techniques, summarised from (Adomavicius and Tuzhilin,
2005)
Despite its success, this approach has several limitations, i.e. limited content,
over-specialisation and new user problems (Adomavicius and Tuzhilin, 2005). For a
content-analysis method, appropriate suggestions cannot be made if the analysed
content does not contain enough information to illustrate user preference. Since the
approach can only recommend items whose scores are high for a user profile, the
user receives recommendation of items that are similar to those already rated. The
new user problem is caused by the insufficient number of ratings given by the new
users. A content-based approach can provide accurate recommendations only if the
items contain rich content information, such as books and articles. When there is no
adequate content information, the collaborative filtering approach is considered a
better solution.
2.1.2 Collaborative Filtering Approaches
Collaborative filtering is the most successful recommendation approach (Su and
Khoshgoftaar, 2009), where a target user will be provided a list of item
recommendations that other users with similar preferences have liked in the past.
The collaborative filtering approach can be classified into memory-based and model-
based (Su and Khoshgoftaar, 2009).
A memory-based collaborative approach uses the user-item database to
generate predictions. The recommendation process consists of user profiling,
neighbourhood formation, and recommendation generation. It implements a
K-Nearest Neighbour (KNN) method to form the neighbourhood of each user or
each item (Adomavicius and Tuzhilin, 2005; Koren and Bell, 2011). Similarity
measurement between two users or two items is commonly done using the Cosine
similarity or Pearson correlation. Each target user receives a list of recommended
items based on the similarity scores that form the user’s neighbourhood. This
approach has gained popularity because of its simplicity and ability to recommend
any kind of items, i.e. the ones that do not have sufficient contextual information and
those that are dissimilar to items selected by the user in the past (Adomavicius and
Tuzhilin, 2005).
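The neighbourhood formation and recommendation generation steps can be sketched as follows. This is a minimal user-based illustration using Cosine similarity; the scoring rule (summing neighbours' ratings weighted by similarity), the function names and the toy ratings are assumptions made for the example, not a method taken from the cited works.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors (item -> rating)."""
    common = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(ratings, target, k=2, top_n=3):
    """Recommend items to `target` from its k most similar users."""
    # Neighbourhood formation: rank all other users by similarity.
    neighbours = sorted(
        ((cosine(ratings[target], r), u) for u, r in ratings.items() if u != target),
        reverse=True,
    )[:k]

    # Recommendation generation: score items the target has not rated yet.
    scores = {}
    for sim, u in neighbours:
        for item, rating in ratings[u].items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


ratings = {
    "alice": {"i1": 5, "i2": 3},
    "bob": {"i1": 5, "i2": 3, "i3": 4},
    "carol": {"i4": 5},
}
top = recommend(ratings, "alice", k=1)
```

Here "bob" is the only neighbour selected for "alice", so the item "i3" that he rated and she has not is recommended.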
A model-based approach develops models that learn complex patterns from
the training data and then employs them to compute predictions.
Several methods can be implemented to generate the model, including Bayesian
approaches (Alper, 2012), clustering techniques (Begelman et al., 2006; Pan et al.,
2013; Shepitsen et al., 2008), and latent factor models based on matrix factorization
techniques (Koren, 2008; Koren and Bell, 2011).
Despite its achievements, the collaborative approach has limitations such as
the cold-start (new user or item) and sparsity problems (Adomavicius and Tuzhilin,
2005; Lee, 2001). The cold-start problem arises since a prediction cannot be
provided for a new user or a new item for which the rating history is unavailable. The
sparsity problem occurs because ratings are available for only a small fraction of all user-item pairs.
Another challenge is scalability, as the approach requires data from a large number of users and
items (Lü et al., 2012) for finding similar users based on the rating data.
2.1.3 Hybrid Approaches
As discussed above, both the content-based and collaborative filtering techniques
have limitations. A hybrid technique combines multiple recommendation
techniques, and therefore their strengths, to improve the recommendation performance
(Adomavicius and Tuzhilin, 2005; Burke, 2007). Burke (2007) has classified the
hybrid approaches into seven categories:
• Weighted: numerically combine the predicted preference scores calculated
from different recommendation approaches.
• Switching: choose among recommendation approaches based on the criterion
that is met.
• Mixed: combine multiple recommendation approaches simultaneously.
• Feature combination: combine the features of different knowledge sources
into a single recommendation method.
• Feature augmentation: use the features resulting from the first recommendation
approach as part of the input to the next approach.
• Cascade: employ a second recommendation approach to refine the output of
the first approach.
• Meta-level: use the model learned from the first recommendation approach
as the input to the second approach.
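The first category, the weighted hybrid, can be illustrated with a minimal sketch. It assumes two dictionaries of predicted preference scores, one per recommendation approach, and a hypothetical mixing parameter `alpha`; the score values are made up for the example.

```python
def weighted_hybrid(content_scores, collaborative_scores, alpha=0.5):
    """Linearly combine the predicted scores of two recommendation approaches.

    alpha is an assumed mixing weight in [0, 1]; an item missing from one
    approach is treated as having a score of 0.0 in that approach.
    """
    items = set(content_scores) | set(collaborative_scores)
    return {
        item: alpha * content_scores.get(item, 0.0)
        + (1 - alpha) * collaborative_scores.get(item, 0.0)
        for item in items
    }


content = {"i1": 0.8, "i2": 0.2}
collaborative = {"i1": 0.4, "i3": 1.0}

combined = weighted_hybrid(content, collaborative, alpha=0.5)
# Rank items by their combined score, highest first.
ranked = sorted(combined, key=combined.get, reverse=True)
```

Setting `alpha` to 1 or 0 recovers the pure content-based or pure collaborative ranking, respectively, which makes the contribution of each approach easy to inspect.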
The hybrid recommendation system can solve the cold-start problem by
extracting latent features from items using the probabilistic model (Maneeroj and
Takasu, 2009). The similarities between items and users are computed for predicting
an unknown rating of a user to an item. The collaborative filtering can be
semantically enhanced by using the structured semantic knowledge of items in
aggregation with user-item mappings for creating a combined similarity measure and
generating predictions (Mobasher et al., 2004). This method could overcome the
problems of newly added items and very sparse data sets.
2.1.4 Summary and Discussion
Recommendation systems are a well-established research area (Adomavicius and
Tuzhilin, 2005; Zhang et al., 2011) and work as an essential component for web
personalization (Castellano et al., 2009). As previously described, the
recommendation algorithms can be categorised into three approaches: content-based,
collaborative filtering, and hybrid (Adomavicius and Tuzhilin, 2005).
A serious concern to be noted for recommendation systems is the lack of
explicit feedback data, which results in an inadequate quantity of data available for
recommendation. The possibility of improving recommendation accuracy by
integrating information from supplementary data sources is a promising solution.
Yet, this technique suffers from the additional data collection issue that is always
tiresome, lengthy and costly (de Campos et al., 2010; Lekakos and Giaglis, 2007).
This leads naturally to Social Tagging Systems, which provide an alternative
solution to the data inadequacy issue (Das et al., 2011; Luo et al., 2012). The next
section presents the applicability of social tagging technology to
recommendation systems.
2.2 TAG-BASED ITEM RECOMMENDATION SYSTEMS
The tag-based item recommendation systems have gained great popularity, due to the
growing presence of user generated information on the Web (Lü et al., 2012; Zhang
et al., 2011). They support the organisation of information and make it
readily accessible. This section discusses three important aspects of a tag-based item
recommendation system: Social Tagging Systems, User Profile Modelling
Approaches, and Tagging Data Interpretation Schemes.
2.2.1 Social Tagging Systems
Social Tagging Systems (STSs) have secured a significant role in Web 2.0 (Marinho
et al., 2011; Mezghani et al., 2012). An STS allows its users to organise, retrieve, and
share their resources (in other words, items) with other users (Marinho et al., 2012).
These systems allow their users to use any chosen tags for annotating items of
their interest (Mezghani et al., 2012), such as photos (www.flickr.com), songs
(www.last.fm), scientific papers (http://citeulike.org) or websites
(http://delicious.com). These tags are reusable for later purposes and shareable with
other users (Schoefegger and Granitzer, 2012). Figure 2.1 shows some of the popular
STS Websites.
Figure 2.1. Example of popular Social Tagging System Websites
Tags in an STS are considered as a kind of meta-data, such as summaries,
profiles, attributes, and contents for items in other web systems. The main
differences between tags and other meta-data are that they are not predefined by
domain experts and are attached to both the users who created them and to the items
(Bogers and van den Bosch, 2009). Tags indirectly reveal a user’s personal interests and
can connect users to the tagged items. In a tagging system, the user activity and the
item and tag popularities form long-tailed distributions. The long-tail occurs in the
user and item distributions since most items are only selected once by the users and
most users only select one item (Li et al., 2008), while the tag long-tail distribution is
the result of personal tagging (Halpin et al., 2007). Figure 2.2 shows the long-tail
distributions captured from tagging data of the Delicious website.
Figure 2.2. Long-tail distribution of: (a) items of bookmarked URLs, (b) users who made the
bookmarks, and (c) tags used in the bookmarks – captured from tagging data of Delicious website (Li
et al., 2008)
A tagging activity represents the condition when a user uses a tag to annotate
an item, in which a ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relation is naturally formed. Over a
period of time, the tagging data is recorded as a result of the accumulated ternary
relations. The tag-based systems can serve as a supplementary source of information
to build user profiles for the personalized recommendation system (Bogers and van
den Bosch, 2009; Lü et al., 2012; Schoefegger and Granitzer, 2012; Zhang et al.,
2011; Zhang et al., 2010). The ternary relations between users with both items and
tags enhance the information communication and sharing (Halpin et al., 2007;
Marinho et al., 2011). Therefore, the success of a tag-based recommendation system
depends on how the relations in the tagging data are exploited (Bogers and van den
Bosch, 2009; Kim et al., 2010).
There are three types of recommendations in tag-based systems, i.e. user, item,
and tag recommendations (Symeonidis et al., 2010; Zhang et al., 2011). User
recommendation suggests users whose profiles are similar to that of a target user,
by connecting users who use the set of tags frequently used by others, and
persuades them to contribute and share more content. Item recommendation suggests
items to a target user based on tags that are commonly used by other similar users,
while tag recommendation suggests a tag to a target user based on what other
similar users have provided for the same items. The focus of this thesis is to
generate item recommendations.
Compared to “traditional” recommendation systems, in a tag-based system the act of
adding tags to items can be considered as implicit feedback on those items (Liang
et al., 2008). Tags can represent user preferences, support quality
recommendations, and help to solve problems in recommendation systems such as the
cold-start problem (Zhang et al., 2010). A tag-based item recommendation method
predicts the list of items that may be of interest to a user, by learning from the
user’s tagging preferences.
2.2.2 User Profile Modelling Approaches
User profiles can be modelled using two data modelling approaches, i.e. two-
dimensional and multi-dimensional approaches. The two-dimensional approach
represents data as vector or matrix models. On the other hand, the multi-dimensional
approach represents data as a multi-dimensional model, such as tensor model. The
following two sub-sections discuss the two approaches in more detail.
2.2.2.1 Two-Dimensional Approaches
Basic Concept
A two-dimensional user profile modelling approach, commonly used to model the
relations between users and items, cannot be directly employed in a tag-based item
recommendation system. This approach is unable to (1) capture the three-dimensional
nature of tagging data directly, i.e. model the relations among users, items and
tags (Nanopoulos, 2011; Symeonidis et al., 2010), or (2) model the many-to-many
relationships that exist among these three dimensions. Researchers have addressed
the problem of integrating tags by extending the user-item matrix used in the
standard collaborative filtering technique to enhance item recommendation. The
three-dimensional ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ relation is projected onto three
two-dimensional matrices, i.e. ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩, ⟨𝑢𝑠𝑒𝑟, 𝑡𝑎𝑔⟩, and
⟨𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩, as illustrated in Figure 2.3.
[Figure: the ternary relation shown alongside three matrices labelled item-tag frequency, user-item binary, and user-tag frequency]
Figure 2.3. Projection of ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ relation into three two-dimensional matrices
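The projection described above can be sketched in a few lines. The toy triples and all variable names below are hypothetical and serve only to illustrate how the three matrices are filled; they are not taken from any of the cited methods.

```python
import numpy as np

# Hypothetical toy tagging data as (user, item, tag) index triples.
observed = [(0, 0, 0), (0, 1, 0), (1, 1, 2), (2, 1, 2), (2, 2, 1)]
n_users, n_items, n_tags = 3, 3, 3

user_item = np.zeros((n_users, n_items), dtype=int)  # binary: has u tagged i?
user_tag = np.zeros((n_users, n_tags), dtype=int)    # how often u used t
item_tag = np.zeros((n_items, n_tags), dtype=int)    # how often t was applied to i

for u, i, t in observed:
    user_item[u, i] = 1
    user_tag[u, t] += 1
    item_tag[i, t] += 1

print(user_item)
print(user_tag)
print(item_tag)
```

Each matrix keeps only a pair-wise view of the data. For instance, the three matrices alone no longer record which tag a given user applied to which particular item once the user has tagged several items.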
Existing approaches
Several techniques can be employed to compute the user or item similarities in order
to generate the Top-𝑁 item recommendation prediction scores. Tso-Sutter et al.
(2008) used an extended fusion technique that applies a tag extension method to
combine two conditional probabilities of user-based and item-based similarities.
Alternatively, Liang et al. (2009) proposed to combine the similarities between users
and items. User similarity is derived from the similarity of user tags, user items,
and user tag-item relationships, while item similarity is calculated from the
percentage of items being put in the same tag, the percentage of items being tagged
by the same user, and the percentage of common tag-item relationships.
Another useful two-dimensional approach that employs tags is the hybrid technique.
Tags and other metadata of items are incorporated into the collaborative filtering
algorithm by substituting the usage-based similarity measures with tag overlap and
by combining tag-based similarity with usage-based similarity (Bogers and van den
Bosch, 2009). The item recommendations can then be calculated by implementing a
content-based algorithm that uses the metadata content.
Tags can also be used to build user and item tag clouds (Barragans-Martinez et
al., 2010). A user tag cloud consists of the tags that have been assigned by the
user, whereas an item tag cloud contains the tags that users have applied to the
item. An item is recommended to the target user by directly comparing its tag cloud
using the content-based technique. In order to improve the recommendation, the
collaborative filtering technique is complemented by a target-user tag cloud that
matches the user to suitable items.
Kim et al. (2010) proposed an effective method, CTS, based on the concept that the
tags included by a certain user imply the user’s latent preference, i.e. the
user-created tags determine the user-to-user similarity that is used to find the
latent tags for each user. Tags have also been integrated with a user profile that
is built from user ratings and user-generated tags (Kim et al., 2011). The
similarity is then calculated by associating the tag weights with the user ratings.
In this way, the three-dimensional relation is projected onto a two-dimensional
matrix by considering only the ⟨𝑡𝑎𝑔, 𝑖𝑡𝑒𝑚⟩ relationship, and the similarity
between users is computed for the relevant and irrelevant frequent tag patterns.
They showed that CTS outperformed other two-dimensional approaches.
The CTS method (Kim et al., 2010) is used for benchmarking in this thesis, due
to its leading performance amongst two-dimensional approaches as well as its
relevance to the tag-based item recommendation methods proposed in this thesis. It
comes closest to the proposed methods in that it neither deals with the semantic
problems of tags nor adds external information, other than the tagging data, for
generating the list of recommendations.
2.2.2.2 Multi-Dimensional Approaches
Although the approach of splitting the three-dimensional characteristic of tagging
into two-dimensional ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩, ⟨𝑢𝑠𝑒𝑟, 𝑡𝑎𝑔⟩, and ⟨𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ pair relations is
possible, the total interaction between the three dimensions is lost. Consequently,
representing tagging data using the two-dimensional approach does not sufficiently
expose the latent relationships between users, items, and tags, and results in poorer
recommendation quality (Rendle, Balby Marinho, et al., 2009; Rendle and Schmidt-
Thieme, 2010; Symeonidis et al., 2008, 2010).
The tensor model has been successfully used to represent multi-dimensional
data for many decades in various fields such as bioinformatics (Dyrby et al., 2005;
Troyanskaya et al., 2001), chemistry (Appellof and Davidson, 1981), computer
vision (Liu, Musialski, et al., 2009; Vasilescu and Terzopoulos, 2002), web mining
(Sun et al., 2005), monitoring systems (Tsourakakis, 2009), and recommendation
systems (Kutty et al., 2012; Rawat et al., 2011). In the past few years, researchers
have adopted tensor models to represent tagging data, as they have been shown to
efficiently capture the latent relationships among the users, items, and tags (Ifada and Nayak,
2014c; Rendle, Balby Marinho, et al., 2009; Rendle and Schmidt-Thieme, 2010;
Symeonidis et al., 2008, 2010). For this reason, this thesis uses a tensor model for
solving the item recommendation task in tag-based recommendation systems.
User Profile Construction
Using a tensor model, the tagging data is constructed as a third-order tensor. In other
words, user profiles are represented in the form of a tensor model. Section 2.2.3
details how the tensor model is populated by implementing a tagging data
interpretation scheme.
Latent Factors Generation
Latent factors generation is the process of deriving the latent relationships between
dimensions of the tensor model. This process is conducted by implementing the
tensor factorization technique.
Two broad families of tensor factorizations are Tucker (Tucker, 1966) and
Candecomp/Parafac (CP) (Carroll and Chang, 1970). Tucker (1966) factorization
generalises Singular Value Decomposition (SVD) into a higher-order form by
performing SVD on the matricized data for each dimension (Kolda, 2006). Higher-
Order SVD (HOSVD) and Higher-Order Orthogonal Iteration (HOOI) are two
common tensor factorizations based on Tucker.
HOSVD factorizes a tensor into a core tensor and latent factor matrices
corresponding to each mode (De Lathauwer et al., 2000a; Kolda and Bader, 2009).
HOSVD does not produce an optimal rank approximation of tensor 𝒴 since it
optimizes each mode separately and disregards the interaction among them (Kolda,
2006). In HOSVD, all factor matrices are orthogonal and the matrix slices of core
tensor are mutually orthogonal (Bergqvist and Larsson, 2010).
For a third-order tensor, 𝒴 ∈ ℝ𝑄×𝑅×𝑆 where 𝑄, 𝑅, and 𝑆 are the size of a set of
users, items, and tags, respectively, the HOSVD factorization results in three latent
factor matrices of 𝑀(1) ∈ ℝ𝑄×𝐽, 𝑀(2) ∈ ℝ𝑅×𝐾, and 𝑀(3) ∈ ℝ𝑆×𝐿 and one core tensor
𝒞 ∈ ℝ𝐽×𝐾×𝐿. The 𝐽, 𝐾, and 𝐿 are the number of columns in the corresponding latent
factor matrices.
$$\mathcal{Y} := \mathcal{C} \times_1 M^{(1)} \times_2 M^{(2)} \times_3 M^{(3)} \tag{2.1}$$
The latent factor matrices are determined by applying SVD to the matricized tensor
of each mode. The core tensor defines the interaction between the users, items and
tags and has a significant impact on the result (Sun et al., 2005). The HOSVD or
Tucker factorization for the third-order tensor 𝒴 ∈ ℝ𝑄×𝑅×𝑆 is illustrated
in Figure 2.4.
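As an illustration of Equation (2.1), the following minimal NumPy sketch computes a truncated HOSVD by taking the leading left singular vectors of each mode unfolding. The helper names (unfold, mode_product, hosvd, reconstruct), the random toy tensor, and the chosen ranks are assumptions made for this example only.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n matricization: rows index `mode`, columns index the other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Mode-n product of `tensor` with `matrix` (shape: new_dim x old_dim)."""
    moved = np.moveaxis(tensor, mode, -1)   # bring the target axis to the end
    return np.moveaxis(moved @ matrix.T, -1, mode)

def hosvd(tensor, ranks):
    """Truncated HOSVD: one SVD per mode, then projection to obtain the core."""
    factors = []
    for mode, rank in enumerate(ranks):
        left, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(left[:, :rank])      # leading left singular vectors
    core = tensor
    for mode, factor in enumerate(factors):
        core = mode_product(core, factor.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Right-hand side of Equation (2.1): core multiplied by every factor matrix."""
    out = core
    for mode, factor in enumerate(factors):
        out = mode_product(out, factor, mode)
    return out

# Toy tensor with Q = 3 users, R = 4 items, S = 5 tags (random values).
rng = np.random.default_rng(0)
Y = rng.random((3, 4, 5))
core, factors = hosvd(Y, ranks=(2, 3, 2))   # J = 2, K = 3, L = 2
Y_hat = reconstruct(core, factors)
print(core.shape, [f.shape for f in factors], Y_hat.shape)
```

With ranks equal to the tensor dimensions the factor matrices are square and orthogonal, so the reconstruction reproduces the input tensor; smaller ranks give an approximation.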
HOOI (De Lathauwer et al., 2000b; Kolda, 2006) is an iterative least-squares
optimization approach for Tucker. It uses HOSVD to initialize the factor matrices.
The factorization is achieved by solving:
$$\min_{\mathcal{C},\, M^{(1)},\, M^{(2)},\, M^{(3)}} \left\| \mathcal{Y} - \mathcal{C} \times_1 M^{(1)} \times_2 M^{(2)} \times_3 M^{(3)} \right\| \tag{2.2}$$
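A minimal sketch of the HOOI iteration for Equation (2.2) is given below, assuming orthonormal factor matrices and a fixed number of sweeps instead of a convergence test. All function names and the toy tensor are hypothetical; this is not the implementation used in the cited works.

```python
import numpy as np

def unfold(tensor, mode):
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    return np.moveaxis(np.moveaxis(tensor, mode, -1) @ matrix.T, -1, mode)

def leading_left_vectors(matrix, rank):
    left, _, _ = np.linalg.svd(matrix, full_matrices=False)
    return left[:, :rank]

def tucker_error(tensor, factors):
    """Norm of the residual when the core is the projection onto the factors."""
    core = tensor
    for mode, factor in enumerate(factors):
        core = mode_product(core, factor.T, mode)
    approx = core
    for mode, factor in enumerate(factors):
        approx = mode_product(approx, factor, mode)
    return np.linalg.norm(tensor - approx)

def hooi(tensor, ranks, n_iter=10):
    # Initialise with HOSVD: leading singular vectors of each unfolding.
    factors = [leading_left_vectors(unfold(tensor, m), r)
               for m, r in enumerate(ranks)]
    hosvd_factors = [f.copy() for f in factors]
    for _ in range(n_iter):
        for mode in range(tensor.ndim):
            # Project onto every factor except the one being updated.
            partial = tensor
            for other, factor in enumerate(factors):
                if other != mode:
                    partial = mode_product(partial, factor.T, other)
            factors[mode] = leading_left_vectors(unfold(partial, mode), ranks[mode])
    return hosvd_factors, factors

rng = np.random.default_rng(1)
Y = rng.random((6, 7, 8))
hosvd_factors, hooi_factors = hooi(Y, ranks=(2, 3, 2))
err_hosvd = tucker_error(Y, hosvd_factors)
err_hooi = tucker_error(Y, hooi_factors)
print(err_hosvd, err_hooi)
```

Because each update maximises the norm of the projected core with the other factors held fixed, the residual after the sweeps is not expected to exceed that of the HOSVD initialisation.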
[Figure: 𝒴 (Q x R x S) approximated by the core tensor 𝒞 (J x K x L) and the factor matrices M(1) (Q x J), M(2) (R x K), and M(3) (S x L)]
Figure 2.4. The Tucker factorization model for a third-order tensor
[Figure: 𝒴 approximated by a sum of F rank-one components, each formed from the vectors m(1), m(2), and m(3) with indices 1, 2, ..., F]
Figure 2.5. The CP factorization model for third-order tensor
The CP factorization (Carroll and Chang, 1970; Harshman, 1970) can be
considered as a special case of the Tucker model where the core tensor is diagonal
(Mørup et al., 2008). It factorizes a tensor into a sum of component rank-one tensors
that optimally approximate the original tensor (Kolda and Bader, 2009).
For example, given a third-order tensor 𝒴 ∈ ℝ𝑄×𝑅×𝑆, the CP factorization can
be defined as:
$$\mathcal{Y} := \sum_{f=1}^{F} \lambda_f \, m_f^{(1)} \circ m_f^{(2)} \circ m_f^{(3)} \tag{2.3}$$
where $F$ is a positive integer while $\lambda \in \mathbb{R}^{F}$, $m_f^{(1)} \in \mathbb{R}^{Q}$,
$m_f^{(2)} \in \mathbb{R}^{R}$, and $m_f^{(3)} \in \mathbb{R}^{S}$.
CP factorization for the third-order tensor 𝒴 ∈ ℝ𝑄×𝑅×𝑆 is illustrated in Figure 2.5.
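The sum of rank-one tensors in Equation (2.3) can be evaluated directly. In the sketch below the factor vectors are stored as the columns of three matrices, and the random values and names (cp_reconstruct, lam, M1, M2, M3) are illustrative assumptions.

```python
import numpy as np

def cp_reconstruct(weights, m1, m2, m3):
    """Weighted sum of F rank-one tensors; column f of each matrix is m_f^(n)."""
    return np.einsum("f,qf,rf,sf->qrs", weights, m1, m2, m3)

rng = np.random.default_rng(0)
Q, R, S, F = 3, 4, 5, 2
lam = rng.random(F)
M1, M2, M3 = rng.random((Q, F)), rng.random((R, F)), rng.random((S, F))

Y_hat = cp_reconstruct(lam, M1, M2, M3)

# The same tensor written as an explicit sum of outer products.
explicit = sum(
    lam[f] * np.multiply.outer(np.multiply.outer(M1[:, f], M2[:, f]), M3[:, f])
    for f in range(F)
)
print(Y_hat.shape, np.allclose(Y_hat, explicit))
```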
Recommendation Generation
Two approaches to using a tensor model for generating a list of recommendations are:
(1) factorizing the tensor model and using the latent factors to infer the
recommendations (Leginus et al., 2012; Rendle and Schmidt-Thieme, 2010); and (2)
reconstructing the tensor from the latent factors and using the reconstructed tensor
to infer the recommendations (Kutty et al., 2012; Nanopoulos, 2011; Rafailidis and
Daras, 2013; Symeonidis et al., 2008, 2010).
Remark: Scalability is a common problem in the tensor model. For the first
approach of generating recommendations, i.e. inferring recommendations based on
latent factors, existing works tackle the issue within the factorization process by
applying memory-efficient (Kolda and Sun, 2008) and pair-wise optimization criterion
(Rendle and Schmidt-Thieme, 2010) approaches. The second recommendation generation
approach, i.e. using tensor reconstruction, goes a step further than the former.
Tensor reconstruction is an approximation of the initial tensor, computed by
multiplying all latent factors, that reveals the latent relationships between
dimensions of the tensor model. This process is memory expensive and, therefore,
reconstructing large tensors is infeasible (Kutty et al., 2012; Leginus et al.,
2012; Nanopoulos, 2011; Rafailidis and Daras, 2013; Symeonidis et al., 2010). The
scalability problem remains open.
This thesis proposes solutions for the two recommendation generation
approaches. To solve the scalability problem of the first recommendation approach,
this thesis implements the weighted scheme and the list-wise ranking based criterion
approaches that are presented in Chapter 4 and Chapter 5 respectively. For the
second recommendation generation approach, a memory efficient loop approach is
applied for scalable full tensor reconstruction, as presented in Chapter 4.
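To make the memory issue concrete, the sketch below reconstructs one user slice at a time from Tucker-style factors, so that only an item-by-tag matrix is held in memory at any moment. It is a generic illustration under assumed names and random data, and it should not be read as the loop approach of Chapter 4.

```python
import numpy as np

def user_slice(core, m_user, m_item, m_tag, u):
    """Reconstruct the item x tag slice of user u without the full tensor."""
    # Contract the core (J x K x L) with user u's latent row (length J).
    user_core = np.einsum("j,jkl->kl", m_user[u], core)
    # Expand along the item and tag modes: (R x K)(K x L)(L x S) -> R x S.
    return m_item @ user_core @ m_tag.T

rng = np.random.default_rng(0)
Q, R, S, J, K, L = 4, 5, 6, 2, 3, 2
core = rng.random((J, K, L))
M1, M2, M3 = rng.random((Q, J)), rng.random((R, K)), rng.random((S, L))

# Process users one by one; each slice can be used and then discarded.
slices = [user_slice(core, M1, M2, M3, u) for u in range(Q)]

# Reference: the full reconstruction, feasible here only because the toy is tiny.
full = np.einsum("jkl,qj,rk,sl->qrs", core, M1, M2, M3)
print(all(np.allclose(slices[u], full[u]) for u in range(Q)))
```

The full tensor needs storage proportional to Q x R x S, whereas each slice needs only R x S, which is the motivation for processing users in a loop.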
In the next sections, the connections between the tag-based item
recommendation methods developed in this thesis and the closely related methods
are classified and detailed in terms of the tagging data interpretation schemes
(Section 2.2.3) and ranking-based recommendation approaches (Section 2.3) used.
The classifications are summarised in Table 2.2 and Table 2.3.
2.2.3 Tagging Data Interpretation Schemes
Data interpretation is the process of interpreting information, i.e. tagging data,
collected from the users for representing the user profiles in a tensor model.
Selecting the appropriate interpretation scheme in a tag-based recommendation
method is crucial as it defines the user profile representation and affects the
recommendation quality. Different interpretation schemes generate different types of
data to be populated in the tensor model, which influences how the recommendation
task is solved, as detailed later in Section 2.3.
A typical tag-based item recommendation method customarily interprets the
observed tagging data as “positive” or “relevant” entries. In contrast, how the
non-observed tagging data should be interpreted remains disputed and open to
researchers’ perceptions. There are two well-known interpretation schemes, namely
boolean and set-based schemes. Fundamentally, these two schemes differ in the way
that the non-observed tagging data is interpreted.
2.2.3.1 The boolean Scheme
The boolean scheme (Symeonidis et al., 2010) is commonly used in a tag-based item
recommendation method. It simply interprets the tagging data as binary data that
includes two types of entries, i.e. “relevant” and “irrelevant” entries. The “relevant”
entries, labelled as “1”, are the observed entries where the user has explicitly
revealed interest by annotating an item using tags; while the “irrelevant” entries,
labelled as “0”, are the remaining (non-observed) entries. The recommendation
model based on the boolean scheme tries to learn and predict a 0 for each of the
“irrelevant” cases (Rendle, Balby Marinho, et al., 2009).
Let 𝑈 = {𝑢1, 𝑢2, 𝑢3, … , 𝑢𝑄} be the set of 𝑄 users, 𝐼 = {𝑖1, 𝑖2, 𝑖3, … , 𝑖𝑅} be the set
of 𝑅 items, and 𝑇 = {𝑡1, 𝑡2, 𝑡3, … , 𝑡𝑆} be the set of 𝑆 tags. From the tagging data
𝐴 ∶= 𝑈 × 𝐼 × 𝑇, a vector of 𝑎 = (𝑢, 𝑖, 𝑡) ∈ 𝐴 represents an activity of user 𝑢 to
annotate item 𝑖 using tag 𝑡. The observed tagging data, 𝐴𝑜𝑏, defines the state in
which users have expressed their interest to items in the past by annotating them
using tags, where 𝐴𝑜𝑏 ⊆ 𝐴. Note that the observed tagging data is usually
very sparse, thus |𝐴𝑜𝑏| ≪ |𝐴|.
An initial third-order tensor 𝒴 ∈ ℝ𝑄×𝑅×𝑆 is constructed where 𝑄, 𝑅, and 𝑆 are
the size of the set of users, items and tags respectively, while each tensor entry, 𝑦𝑢,𝑖,𝑡,
is given a numerical value that represents the relevance grade of tagging activity.
Figure 2.6(a) illustrates a toy example, in which a tensor holds the record of 𝐴𝑜𝑏,
𝒴 ∈ ℝ3×4×5 where 𝑈 = {𝑢1, 𝑢2, 𝑢3}, 𝐼 = {𝑖1, 𝑖2, 𝑖3, 𝑖4}, and 𝑇 = {𝑡1, 𝑡2, 𝑡3, 𝑡4, 𝑡5}.
Each slice of the tensor represents a user matrix, which contains the user tag usage
for each item. The rules of boolean scheme relevance grade labelling to generate the
entries of tensor 𝒴 can be formulated as follows:
$$y_{u,i,t} := \begin{cases} 1 & \text{if } (u, i, t) \in A_{ob} \\ 0 & \text{otherwise} \end{cases} \tag{2.4}$$
Figure 2.6(b) illustrates the constructed initial tensor 𝒴 ∈ ℝ3×4×5, for which entries
are generated from the tagging data by implementing the boolean scheme.
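The boolean labelling rule of Equation (2.4) can be sketched as follows. The observed triples are hypothetical zero-based indices chosen for this example and do not reproduce the entries of Figure 2.6.

```python
import numpy as np

def boolean_tensor(observed, shape):
    """Equation (2.4): 1 for observed (user, item, tag) triples, 0 otherwise."""
    y = np.zeros(shape)
    for u, i, t in observed:
        y[u, i, t] = 1.0
    return y

# Hypothetical observed tagging data for 3 users, 4 items, and 5 tags.
observed = [(0, 1, 0), (0, 1, 2), (1, 0, 0), (1, 2, 1), (2, 3, 4)]
Y = boolean_tensor(observed, shape=(3, 4, 5))

print(Y[0])                 # the item x tag slice of the first user
print(int(Y.sum()), Y.size)
```

Only 5 of the 60 entries are non-zero in this toy tensor, which mirrors the sparsity noted above.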
[Figure: item x tag slices for User 1, User 2, and User 3, shown in three panels: (a) observed entries marked “+”, (b) entries labelled 1 or 0, and (c) entries labelled 1, 0, or -1]
Figure 2.6. A toy example with 𝑈 = {𝑢1, 𝑢2, 𝑢3}, 𝐼 = {𝑖1, 𝑖2, 𝑖3, 𝑖4}, and 𝑇 = {𝑡1, 𝑡2, 𝑡3, 𝑡4, 𝑡5}: (a) The
observed tagging data, and the initial tensor 𝒴 ∈ ℝ3×4×5 for which entries are generated by
implementing (b) the boolean, and (c) the set-based schemes
2.2.3.2 The set-based Scheme
The set-based scheme interprets the tagging data as multi-graded data of three
distinct entries, i.e. “relevant”, “irrelevant”, and “indecisive”, revealed from the
observed and non-observed entries. The set-based scheme was proposed to solve the
two shortcomings of the boolean scheme: (1) the sparsity problem – 0 values
dominate the data, and (2) the overfitting problem – all non-observed entries are
denoted as 0 (Rendle, Balby Marinho, et al., 2009).
A ranking constraint is employed in the set-based scheme to differentiate the
relevance grades of the data. The scheme infers that, for each user-tag pair (𝑢, 𝑡)
that occurs in 𝐴𝑜𝑏, user 𝑢 is less likely to use tag 𝑡 for annotating the items of
“irrelevant” entries than those of “relevant” entries (Gemmell et al., 2011).
Accordingly, the “relevant” entries are assigned the higher ordinal relevance value
and labelled “1”, whereas the “irrelevant” entries are labelled “–1”. The “0” value
labels the “indecisive” entries, which are to be predicted for generating
recommendations. The rules of set-based scheme relevance grade labelling to generate
the entries of tensor 𝒴 can be formulated as follows:
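$$y_{u,i,t} := \begin{cases} 1 & \text{if } (u, i, t) \in A_{ob} \\ -1 & \text{if } (u, i, t) \notin A_{ob} \text{ and } i \in \{ i \mid (u, *, t) \in A_{ob} \} \\ 0 & \text{otherwise} \end{cases} \tag{2.5}$$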
Figure 2.6(c) illustrates the constructed initial tensor 𝒴 ∈ ℝ3×4×5 for which entries
are generated from the tagging data by implementing the set-based scheme.
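A minimal sketch of the labelling rule in Equation (2.5) is shown below. It reads the condition on (𝑢, ∗, 𝑡) as “user 𝑢 has used tag 𝑡 on at least one item”; the toy triples are hypothetical and are not the data of Figure 2.6.

```python
import numpy as np

def set_based_tensor(observed, shape):
    """Equation (2.5): 1 = relevant, -1 = irrelevant, 0 = indecisive."""
    y = np.zeros(shape)
    # If user u has used tag t at all, every item starts as "irrelevant" for (u, t).
    for u, _, t in observed:
        y[u, :, t] = -1.0
    # Observed triples are then marked "relevant".
    for u, i, t in observed:
        y[u, i, t] = 1.0
    return y

# Hypothetical observed tagging data for 2 users, 3 items, and 2 tags.
observed = [(0, 0, 0), (0, 1, 1), (1, 2, 0)]
Y = set_based_tensor(observed, shape=(2, 3, 2))

print(Y[0])   # user 0 used both tags, so no entry stays indecisive
print(Y[1])   # user 1 never used tag 1, so that column stays at 0
```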
2.2.4 Summary and Discussion
Tag-based recommendation systems capture user interest by analysing tagging
data and support the process of generating a list of item recommendations by
learning from the users’ tagging preferences. Tagging data records the users’
tagging activities in a tag-based system and results in an accumulation of
⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relations over a period of time. The success of a
tag-based recommendation system depends on how the relations in the tagging data are
exploited (Bogers and van den Bosch, 2009; Kim et al., 2010). User profiles can be
modelled using either a two-dimensional approach, by projecting the tagging data
ternary relation into multiple matrix models, or a multi-dimensional approach, by
representing the tagging data as a tensor model, in which a data interpretation
scheme is required to define the user profile representation. Table 2.2 summarises
the tag-based recommendation research according to the user profile modelling
approaches, the data interpretation schemes, and the types of recommendation. It is
to be noted that the focus of this thesis is to generate item recommendations.
Since tagging data is multi-dimensional, it is natural to model the user
profiles generated from tagging data with a multi-dimensional approach, i.e. a tensor
model. Researchers have proposed two ways to interpret the tagging data to populate
the tensor models. The first one is the straightforward boolean scheme. The
set-based scheme was proposed to overcome its sparsity drawback, in which the
non-observed data dominate the tensor model. Despite its success in solving the
drawbacks of the boolean scheme, the set-based scheme still fails to learn
efficiently from the non-observed data as it overgeneralises the “irrelevant”
entries of the non-observed data (Ifada and Nayak, 2014a). This brings the necessity
of alternative interpretation schemes that can thoroughly utilise the user’s tagging
history for generating the list of recommendations. Table 2.2 lists the two solutions
proposed in this thesis, the UTS and graded-relevance schemes, to tackle the
shortcoming and fill the gap, as described in Chapter 5.
Once the tensor model is constructed, the next stages are latent factors
generation via tensor factorization, and tensor reconstruction. Two ways of
inferring recommendations from a tensor model are: (1) using the latent factors
(Leginus et al., 2012; Rendle and Schmidt-Thieme, 2010); and (2) using the
reconstructed tensor, i.e. full reconstruction of the original tensor (Kutty et al., 2012;
Nanopoulos, 2011; Rafailidis and Daras, 2013; Symeonidis et al., 2008, 2010). For
the first approach, existing studies implement memory-efficient (Kolda and Sun,
2008) and pair-wise optimization criterion (Rendle and Schmidt-Thieme, 2010)
approaches to solve the scalability problem that occurs within the factorization
process. For the second approach, tensor reconstruction is the process of
approximating the initial tensor, computed by multiplying all latent factors. It is
memory expensive and, therefore, reconstructing large tensors is infeasible (Kutty
et al., 2012; Leginus et al., 2012; Nanopoulos, 2011; Rafailidis and Daras, 2013;
Symeonidis et al., 2010).
This thesis develops four methods for generating tag-based item recommendations
that provide solutions for both approaches, as described in Chapter 4 and Chapter 5.
| Interpretation Scheme | User Profile Modelling Approach | Method | Type of Recommendation |
| boolean | Matrix | Fusion (Tso-Sutter et al., 2008); Best CF run (Bogers and van den Bosch, 2009); CTS (Kim et al., 2010); Tag Cloud (Barragans-Martinez et al., 2010); WTR (Liang et al., 2010); User-tag-object Diffusion (Zhang et al., 2010); CUM (Kim et al., 2011); LIM-Item (Alper, 2012) | Item |
| boolean | Matrix | Topickr (Negoescu and Gatica-Perez, 2008); SimGroup (Lee and Brusilovsky, 2010); UCTM (Kim and El Saddik, 2013) | User |
| boolean | Matrix | Vote+ (Sigurbjörnsson and Van Zwol, 2008); TagiCofi (Zhen et al., 2009); LIM-Tag (Alper, 2012) | Tag |
| boolean | Tensor | MAX-Item (Symeonidis et al., 2010); TB (Nanopoulos, 2011); Spectral K-means (Leginus et al., 2012); TFC (Rafailidis and Daras, 2013); TRPR (Ifada and Nayak, 2014b, 2014c); We-Rank | Item |
| boolean | Tensor | Tensor Reduction (Symeonidis et al., 2008); MAX-User (Symeonidis et al., 2010) | User |
| boolean | Tensor | MAX-Tag (Symeonidis et al., 2010); LOTD (Cai et al., 2011) | Tag |
| set-based | Tensor | RTF (Rendle, Balby Marinho, et al., 2009); PITF (Rendle and Schmidt-Thieme, 2010); RMTF (Jitao et al., 2012) | Tag |
| UTS | Tensor | Do-Rank (Ifada and Nayak, 2015) | Item |
| graded-relevance | Tensor | Go-Rank (Ifada and Nayak, 2016) | Item |
Table 2.2. Classification of tag-based recommendation research according to the user profile
modelling approaches, the data interpretation schemes, and the types of recommendation
2.3 RANKING-BASED RECOMMENDATION APPROACHES
Section 2.2 has detailed the important aspects of tag-based item recommendation
systems and the importance of selecting the appropriate user profile modelling
approach and tagging data interpretation scheme for populating the model. In other
words, Section 2.2 describes the underlying model to be used. However, there is
scope for improvement with a focus on the ranking of the list of recommendations.
This section discusses the ranking-based recommendation approaches that can be
implemented to solve the recommendation task by learning from the constructed model.
The task of a tag-based item recommendation system is to generate the list of
items that may be of interest to a user, by learning from the user’s past tagging
behaviour. By using the predicted preference scores, the list of item
recommendations is then sorted in descending order. Users usually show more
interest in the few items at the top of the list than those further down in the list
(Agichtein et al., 2006; Cremonesi et al., 2010; Liu, 2009; Mohan et al., 2011; Wang
et al., 2013; Weimer et al., 2007). The order of items in the recommendation list is
essential and, therefore, it becomes advantageous to implement a learning-to-rank
approach for learning the tag-based recommendation model, to solve the item
recommendation task.
Figure 2.7 shows the typical learning-to-rank approach framework. In the
learning phase, a learning algorithm is applied to learn the ranking model, built from
the training data, such that it can predict the ground truth relevance grades in the
training data as accurately as possible, in terms of a loss function. In the test phase,
the model learned in the training phase is employed to generate the list of
recommendations for a target user.
The learning-to-rank approaches can be categorised into three types: point-
wise, pair-wise, and list-wise according to the input representation and loss function
used (Balakrishnan and Chopra, 2012; Liu, 2009; Mohan et al., 2011). The following
sub-sections detail the learning-to-rank process of each approach.
[Figure: blocks labelled Training Data, Learning System (min Loss), Ranking Model, Test Data, Ranking System, and Prediction]
Figure 2.7. Learning-to-rank framework, adapted from (Liu, 2009)
2.3.1 Point-wise Based Ranking Approaches
To solve the recommendation task using a point-wise based ranking approach, a
recommendation model is learned to predict whether the user will like the predicted
item or not, assuming there is no interdependency between the predicted items (Liu,
2009; Mohan et al., 2011; Rendle, 2011). The learning model contains a function that
takes the feature vector of an item as the input and predicts the relevance degree of
that item to the user. The function is defined, such that the ranking model is learned
as the corresponding regression or classification problem (Balakrishnan and Chopra,
2012; Liu, 2009).
2.3.1.1 Regression based algorithm
The regression function is used when the output of the ranking model contains real-
valued predicted preference scores. In this case, the function space of a tag-based
item recommendation system can be generally formulated as follows:
𝑓(𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔) → ℝ (2.6)
Several recommendation systems have implemented regression-based algorithms with
point-wise based ranking approaches. For systems that use ratings as explicit
feedback data, SVD++ (Koren, 2008) solves the recommendation task by merging the
matrix latent factor and neighbourhood models. In order to improve the accuracy, the
method extends the models by exploiting both explicit and implicit feedback data.
Similarly, MF (Koren et al., 2009) extends the matrix latent factor model by
combining it with the temporal effect model. Alternatively, Koren and Sill (2011)
suggested that the user’s rating feedback should be viewed as ordinal rather than
numeric values, in order to capture a genuine reflection of user preference. By
implementing the matrix factorization approach, this method parameterizes a
threshold in such a way that each user can have a different scoring scale. On the
other hand, Collaborative Ranking (Balakrishnan and Chopra, 2012) computes predicted
rescaled ratings, instead of predicted rating values, by simultaneously learning the
latent factors controlled by a set of parameters 𝜃. However, all of these
rating-based methods, in which the user profiles are generated from the
⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩ binary relation, are not suitable for tag-based item recommendation
systems, which build the user profiles from the ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relation.
For the case of item recommendation systems with implicit feedback data,
MAX (Symeonidis et al., 2010) was proposed to solve the prediction problem of
tag-based systems. This method is one of the first tag-based item recommendation
methods to use a tensor as its learning model, in which the boolean scheme is used
for constructing the user profiles. It applies the HOSVD-based decomposition
technique (Kolda and Bader, 2009) and directly utilises the reconstructed tensor to
generate the list of recommendations based on the maximum value of the predicted
scores in each user-item set. The method simply assumes that the level of user
preference for a candidate item is solely represented by the predicted score of a
single tag, i.e. the influence of other tags is disregarded. This affects the
recommendation quality. Furthermore, the method builds the tensor model from
relatively small data due to the scalability issue that commonly occurs when
reconstructing large tensor models.
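The max-based selection described above can be sketched as follows. This is one reading of the description, not the authors' implementation; the function name and the reconstructed scores are hypothetical.

```python
import numpy as np

def recommend_by_max(y_hat, user, n):
    """Rank items for `user` by the largest predicted score over all tags."""
    item_scores = y_hat[user].max(axis=1)            # items x tags -> one score per item
    ranked = np.argsort(-item_scores, kind="stable")  # descending order
    return ranked[:n].tolist(), item_scores

# Hypothetical reconstructed scores for 1 user, 3 items, and 2 tags.
y_hat = np.array([[[0.1, 0.9],
                   [0.5, 0.5],
                   [0.2, 0.6]]])

top_items, scores = recommend_by_max(y_hat, user=0, n=2)
print(top_items, scores)
```

In this toy case the second item is ranked last although its two tag scores sum to more than those of the third item, which illustrates how taking the maximum disregards the influence of the other tags.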
2.3.1.2 Classification based algorithm
The classification function is used when the output of the ranking model contains
discrete predicted preference scores. In this case, the function space of a tag-based
item recommendation system can be generally formulated as follows:
𝑓(𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔) → ℤ (2.7)
For systems that use implicit feedback data, Li et al. (2007) proposed to
solve the ranking problem by converting classification results to class probabilities
using a logistic function. By implementing a weighted scheme, the probabilities are
then converted into ranking scores. Although it is claimed to be a robust method,
this work is not suitable for a tag-based item recommendation system as it was built
for a web search system, where the data consists of a set of queries, each with a
list of returned documents. Features captured from the system – such as anchor text,
URL, document title, and body of the text (Burges et al., 2005) – are then used to
label the relevance of each ⟨𝑞𝑢𝑒𝑟𝑦, 𝑑𝑜𝑐𝑢𝑚𝑒𝑛𝑡⟩ binary relation. This is unlike
the tag-based system, which labels the relevance of each ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩
ternary relation based on the recorded tagging data.
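The general idea of turning classification outputs into ranking scores can be sketched as below. The softmax (multi-class logistic) function and the use of the grade values themselves as weights are assumptions made for illustration; the exact formulation of Li et al. (2007) may differ.

```python
import numpy as np

def class_probabilities(class_scores):
    """Multi-class logistic (softmax) over the relevance classes of each document."""
    shifted = class_scores - class_scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def ranking_scores(probabilities, grade_weights):
    """Weighted sum of class probabilities, giving one real-valued score per document."""
    return probabilities @ grade_weights

# Hypothetical classifier outputs for 3 documents over the relevance grades 0, 1, 2.
class_scores = np.array([[2.0, 0.0, 0.0],
                         [0.0, 0.0, 2.0],
                         [0.0, 2.0, 0.0]])
grade_weights = np.array([0.0, 1.0, 2.0])   # assumed weights: the grades themselves

probs = class_probabilities(class_scores)
scores = ranking_scores(probs, grade_weights)
order = np.argsort(-scores).tolist()
print(scores, order)
```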
Existing point-wise ranking based recommendation methods (Nanopoulos,
2011; Rafailidis and Daras, 2013; Symeonidis et al., 2010), which use a tensor as the
learning model, directly employ a reconstructed tensor for generating the
recommendations based on the maximum value of the predicted scores in each user-item
set of tensor elements. These approaches assume that the predicted score in the
reconstructed tensor solely represents the level of user preference for an item
based on a single tag, and disregard the activity histories of the users (Jain and
Varma, 2011), which influence the user’s likelihood of selecting the recommended
items, as studied widely in recommendation research (Kim et al., 2010). In other
words, the point-wise ranking based approach cannot properly deal with the relative
order of the list of recommended items. Consequently, this approach may
unintentionally overemphasise the items that are further down in the list (Liu, 2009).
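A small numerical example may help to show why a point-wise loss does not control the relative order of items. The squared-error loss and the numbers below are assumptions chosen for illustration only.

```python
import numpy as np

def squared_error(predicted, target):
    """Point-wise loss: each item contributes independently of the others."""
    return float(np.sum((predicted - target) ** 2))

target = np.array([1.0, 0.0, 0.0])    # only the first item is relevant
pred_a = np.array([0.4, 0.3, 0.3])    # ranks the relevant item first
pred_b = np.array([0.5, 0.51, 0.0])   # ranks an irrelevant item first

loss_a = squared_error(pred_a, target)
loss_b = squared_error(pred_b, target)
top_a = int(np.argmax(pred_a))
top_b = int(np.argmax(pred_b))
print(loss_a, top_a)
print(loss_b, top_b)
```

Here the second prediction achieves the lower point-wise loss even though it places an irrelevant item at the top of the list.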
This thesis attempts to tackle these disadvantages of the point-wise based
ranking approach by proposing two methods, i.e. Tensor-based Item
Recommendation using Probabilistic Ranking (TRPR) and Ranking using Weighted
Tensor (We-Rank). TRPR deals with the problem by applying probabilistic ranking to
the list of candidate items; meanwhile We-Rank employs a weighting scheme to
ensure that the items are appropriately emphasised, during the learning process. The
details of these methods are presented in Chapter 4.
For benchmarking purposes, MAX by Symeonidis et al. (2010) is used due
to its good performance record as well as its relevance to the above two
proposed methods. MAX has the same learning framework as the proposed methods
as it does not implement any additional technique to deal with the semantic
problems of tags. Moreover, both MAX and the two proposed point-wise methods
implement a tensor model, a multi-dimensional approach, to build the user profile,
and the boolean scheme for interpreting tagging data and populating the tensor model.
2.3.2 Pair-wise Based Ranking Approaches
The pair-wise based ranking approach gives an alternative solution to model the
relative order of the list of recommended items. Using this approach, a
recommendation model is learned to predict the order of a pair of items, in which the
interdependency occurs between the two paired items (Liu, 2009; Mohan et al., 2011;
Rendle, 2011). The learning model of such an approach contains functions that take a
pair of items as the input, in order to predict the relative order between them. The
loss function of this approach is defined, such that the ranking model is learned as a
pair-wise regression or classification loss (Liu, 2009).
2.3.2.1 Regression based algorithm
In this case, the function space of a tag-based item recommendation system can be
generally formulated as follows:
$$f(\mathit{user}, \mathit{item}_1, \mathit{item}_2, \mathit{tag}) \rightarrow \mathbb{R} \tag{2.8}$$
Several recommendation systems have been proposed that implement a regression
based algorithm with a pair-wise based ranking approach. For systems that use
rating as explicit feedback data, EigenRank (Liu and Yang, 2008) solves the
recommendation task by determining the users’ similarity based on the correlation
between their rankings of pairs of items rather than the rating values. Using the
similarity, the target user’s neighbours are selected in order to calculate the
predicted preference score of each user and item, in which a random walk model is
applied. Alternatively, Liu, Zhao, et al. (2009) modelled the user’s preferences from
the relative ordering of items by employing probabilistic latent preference analysis
(pLPA) instead of the commonly used statistical model, i.e. probabilistic latent
semantic analysis (pLSA) (Hofmann, 1999). The authors claimed that pLPA overcomes
the limitation of the pLSA model and can handle the task of recommendation systems
that use either explicit or implicit feedback data. In a different way, Balakrishnan
and Chopra (2012) implemented a matrix factorization approach in which the latent
factor model and a set of parameters 𝜃 are simultaneously learned via a stochastic
gradient descent procedure in order to compute the predicted rescaled ratings.
For the case of item recommendation systems with implicit feedback data, BPR
(Rendle, Freudenthaler, et al., 2009) was proposed. The method uses a ranking-based
scheme, i.e. the set-based scheme, to construct the user profile, matrix
factorization to generate the latent factors, and a smoothed AUC-based
optimization to formulate the objective function of the learning model. However,
this work and all of the rating-based methods are not suitable for the task of
tag-based item recommendation systems, as they solve the problem of the
⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩ binary relation only, instead of the problem of the
⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relation. Using the binary relation to solve the
recommendation problem of the ternary relation means that the additional third
dimension, i.e. tag, is not used in the learning model. This is disadvantageous for a
tag-based item recommendation system, since it predicts the list of items that may be
of interest to a user by learning from the user’s past tagging behaviour.
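The pair-wise criterion described above can be illustrated with a minimal Python sketch. It assumes that predicted scores are already available as plain numbers and evaluates the sum of log-sigmoid score differences over (relevant, non-relevant) item pairs of one user. The function names and toy scores are hypothetical, and the regularization term and the factorization model of the original method are omitted.

```python
import math

def sigmoid(x):
    """Logistic function, used as a smooth surrogate for the step function in AUC."""
    return 1.0 / (1.0 + math.exp(-x))

def pairwise_log_likelihood(scores, relevant, irrelevant):
    """Sum of ln(sigmoid(s_i - s_j)) over all pairs (i relevant, j irrelevant).

    The value is always <= 0 and increases as relevant items are scored
    higher than irrelevant ones.
    """
    total = 0.0
    for i in relevant:
        for j in irrelevant:
            total += math.log(sigmoid(scores[i] - scores[j]))
    return total

# Hypothetical predicted scores for one user.
good = {"i1": 2.0, "i2": 1.5, "i3": -1.0}   # relevant item i1 scored above i3
bad = {"i1": -1.0, "i2": 1.5, "i3": 2.0}    # relevant item i1 scored below i3

print(pairwise_log_likelihood(good, ["i1"], ["i3"]))
print(pairwise_log_likelihood(bad, ["i1"], ["i3"]))
```

A model that orders the pair correctly obtains a value closer to zero, which is why maximizing this quantity favours correct relative orderings irrespective of where in the list the pair occurs.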
Subsequently, Rendle and Schmidt-Thieme (2010) implemented the framework of BPR
to solve the task of tag recommendation systems, using tensor factorization to
generate the latent factors, and named the method PITF. As this method uses the
Area Under the Receiver Operating Characteristic Curve (AUC) as the optimization
criterion, it undesirably assigns an equal penalty to all mistakes regardless of
their positions, such as top or bottom, in the recommendation list (Shi,
Karatzoglou, Baltrunas, Larson, Hanjalic, et al., 2012). Additionally, recent work has
shown that the set-based scheme is not efficient for interpreting tagging data
(Ifada and Nayak, 2014a) since it overgeneralises the “irrelevant” entries of the
non-observed data, as previously described in Section 2.2.3.2. PITF also differs from
a tag-based item recommendation system in that it generates recommendations for two
specified dimensions, i.e. user and item, while the latter generates recommendations
for a specified user only, i.e. the list of item recommendations generated for a
target user is influenced by all tags.
2.3.2.2 Classification based algorithm
In this case, the function space of a tag-based item recommendation system can be
generally formulated as follows:
$$f(\mathit{user}, \mathit{item}_1, \mathit{item}_2, \mathit{tag}) \rightarrow \mathbb{Z} \tag{2.9}$$
For the systems that use rating as explicit feedback data, Balcan et al. (2007)
suggested solving the ranking problem with a robust reduction technique that reduces
ranking, as measured by the Area Under the Receiver Operating Characteristic Curve
(AUC), to binary classification. Though the results are promising, this work is not
suitable for the tag-based item recommendation system, as it was built for a system
of two-dimensional relations. A recommendation method for a binary relation builds
its learning model so that there is only one relevance value on each ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩
relation. On the other hand, a tag-based item recommendation method builds its
learning model from tagging data of ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relations, where
there are multiple relevance values of items on each (𝑢𝑠𝑒𝑟, 𝑡𝑎𝑔) set.
2.3.3 List-wise Based Ranking Approaches
Though the pair-wise based ranking approach offers an advantage over the point-wise
approach, it disregards the fact that ranking is a prediction task on a list of items (Cao
et al., 2007). This makes the list-wise based ranking approach, as used in this thesis,
a suitable solution for a recommendation task (Liu, 2009). Using this approach,
a recommendation model is learned to predict an ordered set of items which will be
of interest to a user, in which the ranking of a predicted item depends on other
corresponding items (Liu, 2009; Mohan et al., 2011). The learning model of such an
approach contains a function that can take a group of items as the input, in order to
predict either their relevance grades or permutation. In this case, the function space
of a tag-based item recommendation system can be generally formulated as follows:
$$f(\mathit{user}, \mathit{item}_1, \ldots, \mathit{item}_n, \mathit{tag}) \rightarrow \mathbb{R} \tag{2.10}$$
The loss function of a list-wise approach can be divided into two categories,
i.e. (1) directly optimizing the ranking evaluation measures, and (2) minimizing the
list-wise loss function.
2.3.3.1 Directly Optimizing Ranking Evaluation Measure
In this category, a learning-to-rank approach optimizes the recommendation model
with respect to the ranking evaluation measure in order to generate a quality Top-𝑁
recommendation list (Chapelle and Wu, 2010; Cremonesi et al., 2010; Liu, 2009; Xu
and Li, 2007). The widely used ranking evaluation measures include Mean Average
Precision (MAP), Mean Reciprocal Rank (MRR), and Discounted Cumulative Gain
(DCG).
MAP and MRR are commonly used measures in the case of binary relevance
data (Chapelle and Wu, 2010; Liu, 2009; Shi, Karatzoglou, Baltrunas, Larson,
Hanjalic, et al., 2012; Shi, Karatzoglou, Baltrunas, Larson, Oliver, et al., 2012). MAP
is defined as the mean value of Average Precision (AP) that considers the rank
position of each relevant item. In this case, AP is the average of precision scores at
the positions where there are relevant items (Buckley and Voorhees, 2000; Chapelle
and Wu, 2010). MRR is the mean of reciprocal rank (RR), which is equivalent to
MAP in cases where the user wishes to see only one relevant item (Craswell, 2009;
Voorhees, 1999). The RR itself is the reciprocal of the rank of the first relevant item.
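The definitions above can be made concrete with a short Python sketch. It assumes that each recommendation list is given as a sequence of binary relevance flags ordered from rank 1 downwards, and that AP is averaged over the relevant items that appear in the list (some definitions divide by the total number of relevant items instead). The function names and toy lists are illustrative only.

```python
def average_precision(ranked_relevance):
    """AP: mean of precision values at the ranks that hold a relevant item."""
    hits = 0
    precisions = []
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def reciprocal_rank(ranked_relevance):
    """RR: reciprocal of the rank of the first relevant item (0 if none)."""
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def mean(values):
    return sum(values) / len(values)

# Two hypothetical recommendation lists (1 = relevant, 0 = not relevant).
lists = [[1, 0, 1, 0], [0, 1, 0, 0]]
map_score = mean([average_precision(r) for r in lists])
mrr_score = mean([reciprocal_rank(r) for r in lists])
print(map_score, mrr_score)
```

For the second list, which contains a single relevant item, AP and RR coincide, which illustrates the equivalence between MAP and MRR mentioned above.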
TFMAP (Shi, Karatzoglou, Baltrunas, Larson, Hanjalic, et al., 2012) was
proposed to solve the task of the context-aware recommendation systems by directly
optimizing MAP for the learning model, that is, to generate a list of items for each
user under a given context. The method used the boolean scheme to construct the
user profile, the tensor factorization to generate the latent factors, and the smoothed
version of MAP to formulate the objective function of the learning model so that the
standard optimization approach can be deployed. In addition, CLiMF (Shi,
Karatzoglou, Baltrunas, Larson, Oliver, et al., 2012) was proposed to solve the task
of the social network recommendation systems by directly optimizing MRR for the
learning model. The method used the boolean scheme to construct the user profile,
the matrix factorization to generate the latent factors, and a lower bound of the
smoothed version of RR to formulate the objective function of the learning model so
that the standard optimization approach can be deployed.
Despite the promising results, the aforementioned works are quite different
from the tag-based item recommendation problem, which is the focus of this thesis.
For the case of context-aware recommendation systems, the list of recommendations
is generated from the specified user and context dimensions, while the tag-based one
is generated from a specified user only. For the case of social network
recommendation systems, the method solves the problem of two-dimensional data, i.e.
the ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩ binary relation, which makes it different from the tag-based item
recommendation problem that has to deal with three-dimensional data, i.e. the
⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relation.
Different from MAP and MRR, DCG is more widely used in the case of multi-
graded relevance data (Chapelle and Wu, 2010; Liu, 2009; Weimer et al., 2007).
DCG assumes that the higher the ranked position of a relevant item, the more
important it is to the user and the more likely it is to be selected (Järvelin and
Kekäläinen, 2002). Accordingly, DCG implements a discount function such that the
score of an item at the lower ranks is reduced. NDCG is the normalization of DCG
by its Ideal DCG (IDCG), i.e. the DCG of the best ranking result.
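As an illustration of the discounting idea, the following Python sketch computes DCG and NDCG for a list of graded relevance values. It assumes the commonly used exponential gain 2^g - 1 and a log2(rank + 1) discount; this is only one of several variants in the literature and is not necessarily the formulation adopted later in this thesis.

```python
import math

def dcg(grades):
    """DCG of a ranked list of relevance grades (rank 1 first).

    Assumed variant: gain 2^g - 1, discount log2(rank + 1).
    """
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(grades, start=1))

def ndcg(grades):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0

print(ndcg([2, 1, 0]))  # the ideal ordering of these grades
print(ndcg([0, 1, 2]))  # the same grades in reverse order
```

Placing the highest-graded item at the bottom of the list lowers the score, since its gain is divided by a larger discount.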
Recent works have investigated the possibility of extending the binary relevance
measures so that they can deal with multi-graded relevance data. As a result, the
Expected Reciprocal Rank (ERR) (Chapelle et al., 2009) and the Graded Average
Precision (GAP) (Ferrante et al., 2014; Robertson et al., 2010) were proposed as
generalisations of Reciprocal Rank (RR) and Average Precision (AP), respectively,
that work as alternative measures for multi-graded relevance data.
To solve the task of recommendation systems which use rating as explicit
feedback data, CoFiRank (Weimer et al., 2007) was proposed, in which the objective
of the learning model was to optimize the Normalized DCG (NDCG). This method
used matrix factorization to generate the latent factors and minimized a convex
upper bound of the NDCG-based loss as the objective function of the learning model,
so that NDCG can be maximized. Shi et al. (2013b) employed a matrix factorization
technique and optimized the learning model based on the lower bound of the
smoothed RR measure. Alternatively, GAPfm (Shi et al., 2013a) was developed,
using matrix factorization to generate the latent factors and the smoothed version
of GAP to formulate the objective function of the learning model. However, it should
be noted once more that the tag-based recommendation problem is different from
these works and poses an additional difficulty, as tag-based systems use tags as
implicit feedback data. Recommendation systems with explicit rating data build their
model by collecting the ratings, which represent the preference level of each
⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩ binary relation. The list of recommendations is then generated by
ranking the predicted preference scores of the unobserved ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩ relations
(Balakrishnan and Chopra, 2012; Weimer et al., 2007). In contrast, a recommendation
system with tagging data builds its model by using the user tagging history as data
entries. The key challenges of this system over the aforementioned explicit feedback
data systems are modelling the ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary input data, inferring the
latent relationships, and predicting each entry with a score that indicates its
relevance degree (Ifada and Nayak, 2014a; Rendle, Balby Marinho, et al., 2009). The
recommendation list is generated by ranking the predicted preference scores of the
list of items that may be of interest to a user, under all tags.
2.3.3.2 Minimizing List-wise Loss
In this category, a learning-to-rank approach optimizes the recommendation model
with respect to the list-wise loss function in order to generate a quality Top-N
recommendation list (Liu, 2009).
ListRank (Shi et al., 2010) was proposed to solve the task of recommendation
systems that use rating as explicit feedback data. The method used the matrix
factorization to generate the latent factors and the cross-entropy of top-one
probabilities of the items as the loss function. As previously described, the task of
this work is different from the task of a tag-based item recommendation system.
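A list-wise loss of the kind described above can be sketched in Python as follows. The sketch assumes that the top-one probability of an item is the softmax of the item scores, and it computes the cross-entropy between the distributions induced by the true and the predicted scores. Any transformation that ListRank applies to the predicted ratings, as well as its regularization term, is omitted, and the function names are hypothetical.

```python
import math

def top_one_probabilities(scores):
    """Probability of each item being ranked first, assumed to be a softmax."""
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def listwise_cross_entropy(true_scores, predicted_scores):
    """Cross-entropy between the true and predicted top-one distributions."""
    p_true = top_one_probabilities(true_scores)
    p_pred = top_one_probabilities(predicted_scores)
    return -sum(p * math.log(q) for p, q in zip(p_true, p_pred))

ratings = [3.0, 2.0, 1.0]                                # hypothetical true ratings
print(listwise_cross_entropy(ratings, [3.0, 2.0, 1.0]))  # same ordering
print(listwise_cross_entropy(ratings, [1.0, 2.0, 3.0]))  # reversed ordering
```

The loss is defined over the whole list at once: it is smallest when the predicted distribution matches the true one and grows when the predicted ordering is reversed.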
Currently, no existing tag-based item recommendation work uses the list-wise
based ranking approach. For this reason, PITF is used as one of the benchmarking
methods, due to its well-known reputation and its relevance to the two methods
proposed in this thesis, i.e. the DCG Optimization for Learning-to-Rank (Do-Rank)
and GAP Optimization for Learning-to-Rank (Go-Rank) methods. All of these
methods implement a tensor model, a multi-dimensional approach, to build the user
profile, and a ranking-based scheme for interpreting the tagging data and populating
the tensor model. Details of the proposed methods are provided in Chapter 5. Note
that, since this thesis develops methods for tag-based item recommendation systems,
PITF is adapted to the task of item recommendation. The adaptation is necessary as
the task of recommending tags differs from the task of recommending items.
2.3.4 Summary and Discussion: Ranking based recommendation
Implementing a learning-to-rank approach is advantageous for solving the tag-based
item recommendation task, i.e. generating an ordered list of item recommendations
based on user preferences. The choice of input representation and loss function used
in learning the model determines the type of ranking approach. Using the point-wise
based ranking approach, a recommendation ranking problem is modelled as a
regression or classification task, and therefore this approach uses the regression or
classification loss as the loss function. Using the pair-wise based ranking approach, a
recommendation ranking problem is modelled as a pair-wise regression or
classification task, and therefore this approach employs the pair-wise regression or
classification loss as the loss function. On the other hand, to solve the
recommendation task using a list-wise based ranking approach, a recommendation
model is learned to predict an ordered list of items, and therefore this approach
either directly optimizes a ranking evaluation measure or minimizes a list-wise loss
function.
Table 2.3 summarises the ranking-based research according to ranking
approaches, loss functions, and feedback forms. It can be observed that few works
implement the learning-to-rank approach for solving the tag-based item
recommendation task. In fact, there is no existing work that uses a list-wise based
ranking approach. The four tag-based item recommendation methods proposed in this
thesis fill this research gap, as described in Chapter 4 and Chapter 5.
Methods | Loss Function | Feedback Data
Point-wise Based Ranking Approach
McRank (Li et al., 2007) (Information Retrieval) | Classification | I (feature)
SVD++ (Koren, 2008) | Regression | I + E (rating)
MF (Koren et al., 2009) | Regression | E (rating) + temporal effect model
MAX (Symeonidis et al., 2010) | Regression | I (tag)
TB (Nanopoulos, 2011) | Regression | I (tag)
OrdRec (Koren and Sill, 2011) | Regression | E (rating)
CR-Pointwise (Balakrishnan and Chopra, 2012) | Regression | E (rating)
TFC (Rafailidis and Daras, 2013) | Regression | I (tag)
TRPR (Ifada and Nayak, 2014b, 2014c) | Regression | I (tag)
We-Rank | Regression | I (tag)
Pair-wise Based Ranking Approach
Robust Reduction (Balcan et al., 2007) | Classification | E (rating)
EigenRank (Liu and Yang, 2008) | Regression | E (rating)
pLPA (Liu, Zhao, et al., 2009) | Regression | E (rating)
BPR (Rendle, Freudenthaler, et al., 2009) | AUC | E (rating)
PITF (Rendle and Schmidt-Thieme, 2010) | AUC | I (tag)
CR-Pairwise (Balakrishnan and Chopra, 2012) | Regression | E (rating)
RMTF (Jitao et al., 2012) | Regression | I (tag)
List-wise Based Ranking Approach
CoFiRank (Weimer et al., 2007) | NDCG | E (rating)
ListRank (Shi et al., 2010) | Cross-entropy | E (rating)
TFMAP (Shi, Karatzoglou, Baltrunas, Larson, Hanjalic, et al., 2012) | MAP | I (context)
CLiMF (Shi, Karatzoglou, Baltrunas, Larson, Oliver, et al., 2012) | MRR | I (trust relationship)
GAPfm (Shi et al., 2013a) | GAP | E (rating)
xCLiMF (Shi et al., 2013b) | RR | E (rating)
Do-Rank (Ifada and Nayak, 2015) | DCG | I (tag)
Go-Rank (Ifada and Nayak, 2016) | GAP | I (tag)
Table 2.3. Classification of ranking-based recommendation research according to ranking approaches,
loss functions, and feedback forms. Here, I = Implicit and E = Explicit.
2.4 CHAPTER SUMMARY AND CONCLUDING REMARKS
This chapter has reviewed the literature, in three sections, relevant to solving the
problem of item recommendation in a tag-based system. The chapter began by
introducing the “traditional” recommendation system, a class of function in Web
personalization that addresses information overload by filtering out the items
irrelevant to users and recommending the list of items interesting to users by
learning from their profiles. Content-based, collaborative filtering, and hybrid
approaches are the categories of recommendation algorithms. A concluding
discussion on this section highlighted an essential concern of recommendation
systems, i.e. the lack of explicit feedback data, which causes the quantity of data
available for recommendation to be inadequate.
The following section examined the state of the research into tag-based item
recommendation systems. Research on these systems has accompanied the popularity
of Social Tagging System (STS) applications in Web 2.0. A general overview of STSs
was first provided. This allowed the important aspects of the tag-based item
recommendation system to be distinguished, and showed the importance of selecting
the appropriate user profile modelling approach and tagging data interpretation
scheme for building the recommendation model. Two user profile modelling
approaches, two-dimensional and multi-dimensional, were reviewed. This led to the
decision of using a tensor to model the tagging data to solve the problems of this
thesis. Subsequently, two tagging data interpretation schemes, boolean and
set-based, were explained. Both schemes interpret the “relevant” entries
straightforwardly from the observed tagging data, while they vary in how the
non-observed data is interpreted. A comprehensive examination of the schemes
confirmed that they do not learn efficiently from the non-observed data. This
creates the need for alternative interpretation schemes that can thoroughly utilise
the user’s tagging history for generating the list of recommendations. A concluding
discussion on this section presented the classification of tag-based recommendation
research according to the user profile modelling approaches, the data interpretation
schemes, and the type of recommendations.
The remainder of this chapter examined the state of the research into
ranking-based recommendation approaches, since implementing a learning-to-rank
approach for learning the tag-based recommendation model is beneficial for solving
the recommendation task. Three types of learning-to-rank approaches, i.e.
point-wise, pair-wise, and list-wise, were discussed. The categorisation of the
approaches is based on the input representation, determined by the data
interpretation scheme used, and the loss function that defines the optimization
criterion of the learning process. The characteristics of each approach were
reviewed, which led to the decision of implementing the point-wise and the list-wise
approaches to build the four proposed tag-based item recommendation methods. A
concluding discussion on this section presented the classification of ranking-based
recommendation research according to ranking approaches, loss functions, and
feedback forms. It should be noted that few works, and in fact no existing work using
the list-wise based ranking approach, have been proposed specifically for the task of
tag-based item recommendation.
In summary, the following research gaps are highlighted after reviewing the
literature:
• Lack of efficient schemes that can thoroughly utilise the user’s tagging
history for generating the list of recommendations, as emphasised in Section
2.2.4;
• Lack of methods that efficiently implement a learning-to-rank approach to
solve the tag-based item recommendation task, as emphasised in Section 2.3.4.
The works listed in Table 2.2 and Table 2.3 can essentially be categorised based on
the data interpretation schemes and ranking approaches used for constructing the user
profile and learning the recommendation model, respectively. However, no work has
thoroughly studied the relationship between the two in making recommendations.
This leads to a further research gap:
• Lack of comprehensive works that study whether a combination of an
interpretation scheme and a learning-to-rank approach has a positive
influence in making a recommendation.
The above three gaps have led to the formulated research questions and
objectives as listed in Section 1.2 and Section 1.3, respectively. Chapter 4 and
Chapter 5 will discuss the proposed methods to answer the research questions and to
achieve the research objectives.
Chapter 3: Research Design
3.1 INTRODUCTION
This chapter describes the research design used in pre-processing the tagging data
and developing the proposed ranking methods for generating tag-based item
recommendations. Note that the proposed ranking recommendation methods are
detailed in the next two chapters. Chapter 4 will describe the point-wise based
ranking recommendation methods that implement the standard pre-processing
scheme. Chapter 5 will present the proposed pre-processing schemes and the list-
wise based ranking recommendation methods.
This chapter also presents the real-world, openly available tagging system
datasets that were used to evaluate the proposed methods. The selected datasets vary
in the characteristics of user tagging behaviour that they exhibit. The evaluation
measures used to assess the performance of the proposed methods are then detailed.
Finally, the state-of-the-art tag-based item recommendation methods benchmarked in
this thesis are described.
3.2 RESEARCH DESIGN
This research aims to develop methods for building a tag-based item
recommendation system that explores the interplay between the multiple dimensions
of tagging data. In order to achieve this goal, efficient tagging data interpretation
schemes and ranking methods are proposed.
As illustrated in Figure 3.1, there are two major phases of the proposed
research. Phase 1 involves the pre-processing of tagging data to construct the user
profile representation as a tensor model populated by using a tagging data
interpretation scheme. Phase 2 includes the proposed ranking methods that were
developed based on (a) point-wise and (b) list-wise ranking approaches.
The details of each phase are described in the following subsections.
[Figure 3.1: flowchart. Phase 1: Tagging Data Pre-processing (model the observed tagging data as a
tensor; interpret the non-observed tagging data; tensor model representation of tagging data).
Phase 2(a): Point-wise based Ranking Approaches: TRPR: Probabilistic Ranking (Section 4.2) and
We-Rank: Weighted Tensor for Ranking (Section 4.3). Phase 2(b): List-wise based Ranking Approaches:
Do-Rank: Learning from Multi-Graded Data (Section 5.2) and Go-Rank: Learning from Graded-Relevance
Data (Section 5.3). Each method produces a Top-N item recommendation, followed by evaluation and
comparison.]
Figure 3.1. The research design
3.2.1 Phase-One: Tagging Data Pre-Processing
The pre-processing phase interprets the tagging data in order to construct the
user profile representation. Tagging data records the users’ tagging activities in a
tag-based system and accumulates ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relations over a period
of time. A relation is formed when a user uses a tag to annotate an item. Such
systems typically allow a user to annotate an item with different tags, as well as
different items to be annotated with the same tag. Analysis of the tagging data,
which reflects the user profiles, allows a system to discover the latent factors that
govern the ternary relations. In this research, the user profiles generated from the
tagging data are represented as a tensor model.
Let $U = \{u_1, u_2, u_3, \ldots, u_Q\}$ be the set of $Q$ users, $I = \{i_1, i_2, i_3, \ldots, i_R\}$ be the set
of $R$ items, and $T = \{t_1, t_2, t_3, \ldots, t_S\}$ be the set of $S$ tags. Tagging data, denoted as $A$,
can be defined as:
$$A := U \times I \times T \tag{3.1}$$
where a vector $a = (u, i, t) \in A$ represents the activity of user $u$ annotating item $i$
using tag $t$. The ternary relations within the tagging data can be naturally modelled
as a three-dimensional tensor:
$$\mathcal{Y} \in \mathbb{R}^{Q \times R \times S} \tag{3.2}$$
where each tensor entry, $y_{u,i,t}$, is given a numerical value that represents the
relevance grade that user $u$ has revealed in annotating item $i$ using tag $t$. Each slice
of the tensor represents a user matrix that contains the user’s tag usage in annotating
items. In this research, the tensor $\mathcal{Y}$ is used as the base model on which ranking
learning is executed. In other words, the tensor model is called a ranking learning
model.
The observed tagging data, denoted as $A_{ob}$, defines the state in which users
have revealed their interest in items in the past by annotating them using tags:
$$A_{ob} \subseteq A \tag{3.3}$$
Usually, the amount of observed tagging data is very small, thus
$$|A_{ob}| \ll |A| \tag{3.4}$$
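The sparsity expressed by Equation (3.4) can be illustrated with a small Python sketch on the toy dimensions used in this section (3 users, 4 items, 5 tags). Only the three observed entries of user 𝑢1 that are stated in the text are included; the entries of the other users are omitted because they are not spelled out, so the resulting ratio is illustrative only. The variable names are hypothetical.

```python
from itertools import product

users = ["u1", "u2", "u3"]
items = ["i1", "i2", "i3", "i4"]
tags = ["t1", "t2", "t3", "t4", "t5"]

# A: every possible (user, item, tag) triple, following Equation (3.1).
A = set(product(users, items, tags))

# A_ob: observed triples. Only the u1 entries described in the text are listed
# here (an assumption made for illustration; other users are left out).
A_ob = {("u1", "i2", "t1"), ("u1", "i2", "t3"), ("u1", "i3", "t4")}

density = len(A_ob) / len(A)
print(len(A), len(A_ob), density)
```

Even in this toy setting the observed entries cover only a small fraction of the tensor, and the gap widens quickly as the numbers of users, items, and tags grow.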
Figure 3.2 presents a toy example of a tensor $\mathcal{Y} \in \mathbb{R}^{3 \times 4 \times 5}$ where
$U = \{u_1, u_2, u_3\}$, $I = \{i_1, i_2, i_3, i_4\}$ and $T = \{t_1, t_2, t_3, t_4, t_5\}$. The “+” symbols
denote the entries of the observed tagging data $A_{ob}$; for example, user $u_1$ has tagged
item $i_2$ using tags $t_1$ and $t_3$. A tag-based recommendation system customarily interprets
the observed tagging data as “positive” or “relevant” entries. In contrast, how the
non-observed tagging data should be interpreted remains disputed and open to
researchers’ perceptions. The selection of an interpretation scheme is essential at this
stage, as it defines the user profile representation and affects the recommendation
performance.
[Figure 3.2: three item × tag slices of the tensor, one each for User 1, User 2, and User 3, in which
“+” marks an observed entry.]
Figure 3.2. A toy example of entries from the observed tagging data $A_{ob}$
This thesis implements three different schemes for interpreting the tagging data,
namely boolean (Symeonidis et al., 2010), User-Tag set (UTS), which is based on the
set-based scheme (Rendle, Balby Marinho, et al., 2009), and graded-relevance.
Fundamentally, these three schemes differ in how the non-observed tagging data is
interpreted. Each scheme is governed by the underlying recommendation approach
employed to solve the recommendation task. Note that the last two schemes are
developed in this thesis.
3.2.1.1 The boolean Scheme
The boolean scheme interprets the observed tagging data as “relevant” entries,
whereas all non-observed data are interpreted as “irrelevant” entries. As a result, this
scheme generates two possible distinct entries, “relevant” (“1”) or “irrelevant” (“0”),
for each cell in the tensor. Figure 3.3(a) shows the toy example of the User 1 ($u_1$)
profile built by implementing the boolean scheme.
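The boolean scheme can be sketched in a few lines of Python. The sketch builds the item × tag slice of User 1 from the three observed entries described in this section; the function and variable names are hypothetical, and plain nested lists are used instead of a tensor library for simplicity.

```python
ITEMS = ["i1", "i2", "i3", "i4"]
TAGS = ["t1", "t2", "t3", "t4", "t5"]

# Observed (item, tag) pairs of user u1, as described in the text.
OBSERVED_U1 = {("i2", "t1"), ("i2", "t3"), ("i3", "t4")}

def boolean_profile(observed, items, tags):
    """Label observed entries 1 ("relevant") and all others 0 ("irrelevant")."""
    return [[1 if (i, t) in observed else 0 for t in tags] for i in items]

profile = boolean_profile(OBSERVED_U1, ITEMS, TAGS)
for item, row in zip(ITEMS, profile):
    print(item, row)
```

Only three of the twenty cells are non-zero, which reflects how the non-observed data dominates the profile under this scheme.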
3.2.1.2 The User-Tag set (UTS) Scheme
The UTS scheme, based on the set-based scheme (Rendle, Balby Marinho, et al., 2009),
interprets the non-observed tagging data as follows: for each $(u, t) \in A_{ob}$ set, only
the items that have not been tagged by user $u$ are regarded as “irrelevant”, while the
rest of the non-observed entries are labelled as “indecisive”, i.e. as entries to be
predicted when generating the recommendations (Ifada and Nayak, 2014a). The observed
tagging data is interpreted as “relevant”. Hence, the UTS scheme has three possible
entries, “relevant” (“1”), “irrelevant” (“-1”), or “indecisive” (“0”), for each cell in the
tensor. As the non-observed data no longer dominates the entries, this scheme is able
to overcome the sparsity problem of the boolean scheme.
Figure 3.3(b) shows the toy example of the User 1 ($u_1$) profile built by
implementing the UTS scheme. The figure shows that there exist $(u_1, t_1)$, $(u_1, t_3)$,
and $(u_1, t_4)$ sets that have been used to annotate $\{i_2\}$, $\{i_2\}$, and $\{i_3\}$, respectively.
Those observed entries are regarded as “relevant” and labelled as “1”, while all
non-observed entries of non-existent sets, i.e. $(u_1, t_2)$ and $(u_1, t_5)$, on any items are
regarded as “indecisive” and labelled as “0”. As both $i_1$ and $i_4$ have never been
annotated by $u_1$ using any tag, the entries of $\{i_1, i_4\}$ within $(u_1, t_1)$,
$(u_1, t_3)$, and $(u_1, t_4)$ are regarded as “irrelevant” and labelled as “-1”. The remaining
non-observed entries within these sets, i.e. $i_3$ within $(u_1, t_1)$ and $(u_1, t_3)$ and $i_2$
within $(u_1, t_4)$, are labelled as “indecisive” (“0”).
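The labelling rules of the UTS scheme can be sketched in Python for the User 1 toy example. The sketch is a direct reading of the rules described above rather than the implementation used in this thesis; the function and variable names are hypothetical.

```python
ITEMS = ["i1", "i2", "i3", "i4"]
TAGS = ["t1", "t2", "t3", "t4", "t5"]

# Observed (item, tag) pairs of user u1, as described in the text.
OBSERVED_U1 = {("i2", "t1"), ("i2", "t3"), ("i3", "t4")}

def uts_profile(observed, items, tags):
    """UTS labels for one user: 1 relevant, -1 irrelevant, 0 indecisive."""
    tagged_items = {i for i, _ in observed}   # items the user has tagged at all
    used_tags = {t for _, t in observed}      # tags forming an observed (u, t) set
    profile = []
    for i in items:
        row = []
        for t in tags:
            if (i, t) in observed:
                row.append(1)                 # observed entry
            elif t in used_tags and i not in tagged_items:
                row.append(-1)                # never-tagged item within an observed set
            else:
                row.append(0)                 # everything else stays indecisive
        profile.append(row)
    return profile

profile = uts_profile(OBSERVED_U1, ITEMS, TAGS)
for item, row in zip(ITEMS, profile):
    print(item, row)
```

The rows for 𝑖1 and 𝑖4 carry the “-1” labels under 𝑡1, 𝑡3, and 𝑡4, while the columns of the unused tags 𝑡2 and 𝑡5 remain “0” throughout.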
3.2.1.3 The graded-relevance Scheme
The set-based scheme interprets the observed entries as “relevant” (or “positive”)
entries, while the non-observed entries are interpreted as a combination of
“irrelevant” (or “negative”) and “indecisive” (or “null”) entries. The “irrelevant”
entries are entries that the users do not like, while the users might like the
“indecisive” entries in the future, i.e. entries to be predicted by the recommendation
system. The problem with the scheme is that it overgeneralises the “irrelevant”
entries. In fact, some non-observed entries should not merely be interpreted as
“irrelevant” or “indecisive”, as they can be “relevant”.
The graded-relevance scheme interprets the non-observed tagging data at a
higher granularity and results in four possible distinct entries: “relevant” (“2”),
“likely relevant” (“1”), “irrelevant” (“-1”), or “indecisive” (“0”). Under this scheme,
the rules for labelling the “relevant”, “irrelevant”, and “indecisive” entries are the
same as those of the UTS scheme; however, the “relevant” entries are given the
numeric value “2” instead of “1”. The “likely relevant” entries are perceived as
“transitional” entries positioned between the “relevant” and “irrelevant” entries, i.e.
the non-observed entries, on each $(u, t) \in A_{ob}$ set, whose items have been tagged by
user $u$ using other tags.
Figure 3.3(c) shows the toy example of the User 1 ($u_1$) profile built by
implementing the graded-relevance scheme. The figure shows that there exist
entries of $i_3$ within $(u_1, t_1)$ and $(u_1, t_3)$ that are not interpreted as “irrelevant”, as
the item $i_3$ occurs as “relevant” on $(u_1, t_4)$. Similarly, $i_2$ is “relevant” on $(u_1, t_1)$ and
$(u_1, t_3)$, and therefore $i_2$ within $(u_1, t_4)$ cannot be “irrelevant”. Note that those
entries should not be regarded as “indecisive” either, as the items of the entries have
already been selected by $u_1$ and, therefore, do not need to be predicted in the future.
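The four-level labelling of the graded-relevance scheme can likewise be sketched in Python for the User 1 toy example. The rules below follow the description and the example in this section and are not meant as the implementation used in this thesis; the function and variable names are hypothetical.

```python
ITEMS = ["i1", "i2", "i3", "i4"]
TAGS = ["t1", "t2", "t3", "t4", "t5"]

# Observed (item, tag) pairs of user u1, as described in the text.
OBSERVED_U1 = {("i2", "t1"), ("i2", "t3"), ("i3", "t4")}

def graded_relevance_profile(observed, items, tags):
    """Labels: 2 relevant, 1 likely relevant, -1 irrelevant, 0 indecisive."""
    tagged_items = {i for i, _ in observed}   # items the user has tagged at all
    used_tags = {t for _, t in observed}      # tags forming an observed (u, t) set
    profile = []
    for i in items:
        row = []
        for t in tags:
            if (i, t) in observed:
                row.append(2)                 # observed entry
            elif t not in used_tags:
                row.append(0)                 # tag never used by the user
            elif i in tagged_items:
                row.append(1)                 # item tagged by the user under another tag
            else:
                row.append(-1)                # item never tagged by the user
        profile.append(row)
    return profile

profile = graded_relevance_profile(OBSERVED_U1, ITEMS, TAGS)
for item, row in zip(ITEMS, profile):
    print(item, row)
```

Compared with the UTS sketch, the only cells that change besides the “relevant” ones are 𝑖3 under 𝑡1 and 𝑡3 and 𝑖2 under 𝑡4, which move from “indecisive” to “likely relevant”.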
[Figure 3.3: tag × item profile matrices of User 1, panels (a), (b), and (c)]
Figure 3.3. The toy example of the User 1 (𝑢1) profile built from various interpretation schemes:
(a) boolean, (b) UTS, and (c) graded-relevance
3.2.2 Phase-Two: Generating Recommendations with Ranking Methods
This phase generates recommendations using the pre-processed tagged data based on
the proposed ranking recommendation methods according to the point-wise and list-
wise based ranking approaches.
In this thesis, the recommendation task is approached in two ways. Firstly, the
recommendation task is approached as a prediction task by predicting whether an
item will be “relevant” or “irrelevant” to a user. In this task, there is no
interdependency between the predicted item and other items (Liu, 2009; Rendle,
2011). In other words, this task can be called point-wise based ranking. Given this
setting, the boolean scheme is the most appropriate interpretation scheme used for
constructing the learning model. The scheme interprets the observed tagging data
entry as “1”, representing the “relevant” entry; and denotes the non-observed one as
“0”, representing the “irrelevant” entry. This thesis develops two point-wise based
ranking recommendation methods – TRPR: Probabilistic Ranking and We-Rank:
Weighted Tensor for Ranking – in which the boolean scheme is implemented. They
are described in Section 3.2.2.1. Details of the methods and the scheme are presented
in Chapter 4.
Secondly, the recommendation task is approached as a ranking task by
predicting an ordered list of items which will be of interest to a user. In this task, the
predicted entries depend on other corresponding entries (Liu, 2009; Rendle, 2011). In
other words, this task can be called list-wise based ranking. The boolean scheme is
inappropriate for constructing the learning model in this task, as the predicted
entries should be represented in ranked order. This thesis proposes two interpretation
schemes that apply the ranking constraint to interpret the tagging data, namely UTS
and graded-relevance, resulting in multi-graded and graded-relevance input data
respectively. For tensor models populated with these interpretation schemes, the
recommendation task is viewed as a ranking problem. Therefore, the corresponding
ranking evaluation measure should be used as the optimization criterion. This thesis
develops two list-wise based ranking recommendation methods, Do-Rank: Learning
from Multi-graded Data and Go-Rank: Learning from Graded-relevance Data, in
which the UTS and graded-relevance schemes are implemented respectively. They
are described in Section 3.2.2.2. Details of the methods and the schemes are presented
in Chapter 5.
3.2.2.1 Phase-Two (a): Point-wise based Ranking Approaches
The task of a tag-based recommendation system is to generate the list of items that
may be of interest to a user, by learning from the user’s past tagging behaviour. When
the recommendation task is solved using a point-wise based ranking approach, it is
regarded as a regression/classification learning problem in which the tagging data is
interpreted using the boolean scheme. Therefore, the corresponding
regression/classification loss function is used as the optimization criterion
(Liu, 2009; Rendle, 2011). Two methods are proposed under this approach; they are
briefly described as follows.
3.2.2.1.1 TRPR: Probabilistic Ranking
The Tensor-based Item Recommendation using Probabilistic Ranking (TRPR)
method is developed by applying a probabilistic technique with the tensor model. In
TRPR, the recommendation quality is improved by ranking the candidate items
selected from the reconstructed tensor model.
A third-order tensor model representing the user profiles is built from the
tagging data by implementing the boolean scheme, briefly described in Section
3.2.1.1. The populated tensor model represents the collaborative activities of the
users, which can be inferred through regression analysis. The tensor model is
optimized with respect to the least square loss function. This process is called
learning the recommendation model for regression. The factorized tensor model can
reveal the hidden relationships (i.e. collaborative activities) between the users. The
resulting latent factors are used to reconstruct the tensor and identify the newly
derived entries for making recommendations.
Unlike the conventional way (Nanopoulos, 2011; Rafailidis and Daras, 2013;
Symeonidis et al., 2010), the reconstructed tensor elements are not directly used for
generating recommendations. Instead, the list of tag and item preferences for each
user is first generated from the reconstructed tensor, and then a probabilistic
approach is used to calculate the preferences of the users, rank the recommended
items, and generate a Top-𝑁 item list. An additional challenge of this method is the
tensor reconstruction process, in which all of the latent factors need to be multiplied.
This process consumes a lot of memory, and scalability becomes an issue. Given that
the factorized tensor consists of one core tensor and three factor matrices, a two-stage
iterative approach is proposed to reconstruct the tensor. Firstly, the core tensor is
multiplied by the first two factor matrices sequentially, and the memory is cleared
after the result of each multiplication is saved. Secondly, a memory-efficient loop is
implemented in which the last factor matrix is multiplied by the complete result of
the first stage. Details of TRPR are presented in Chapter 4, Section 4.2.
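The two-stage reconstruction can be sketched as follows. This is only one plausible reading of the description above, not the thesis implementation: it assumes a Tucker-style factorization whose modes are ordered (user, item, tag), treats the user and item matrices as the “first two” factor matrices, and yields one tag slice at a time so that the full users × items × tags tensor is never held in memory. All names are illustrative.

```python
import numpy as np

def reconstruct_slices(core, users, items, tags):
    """Yield (tag_index, users x items slice) of the reconstructed tensor.

    core:  (F1, F2, F3) core tensor
    users: (Q, F1), items: (R, F2), tags: (S, F3) factor matrices
    """
    # Stage 1: multiply the core by the first two factor matrices in turn.
    partial = np.tensordot(users, core, axes=(1, 0))      # shape (Q, F2, F3)
    partial = np.einsum("qbc,rb->qrc", partial, items)    # shape (Q, R, F3)
    # Stage 2: loop over the last mode so only one (Q, R) slice exists at a time.
    for s in range(tags.shape[0]):
        yield s, partial @ tags[s]
```

A caller could consume each slice immediately, for example by keeping only the highest-scoring candidate items per user, and then discard it.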
3.2.2.1.2 We-Rank: Weighted Tensor Approach for Ranking
The Recommendation Ranking using Weighted Tensor (We-Rank) method is
developed by implementing a weighted tensor to reflect rewards and penalties to the
observed and non-observed tagging data entries of the primary tensor model that is
used for learning the regression model.
As in TRPR, a third-order tensor representing the user profile is built
from the tagging data by implementing the boolean scheme. Unlike TRPR, We-Rank
does not solely use the user profiles for finding the hidden relationships between the
users by optimizing the tensor model. It also builds a weighted tensor, whose
entries are generated by considering the user’s past tagging behaviour. The weighted
tensor plays an important role in the learning algorithm as it controls and
differentiates the rewards and penalties for the observed and non-observed tagging
data entries respectively. The optimization criterion becomes a weighted least square
loss function. In contrast to TRPR, which applies a subsequent step to rank the
recommended items after the factorization and reconstruction processes, the
resulting latent factors of We-Rank can be directly used to make the ranked
recommendations.
To generate the entries of the weighted tensor, this thesis proposes to first
calculate the likeliness score of each user using each tag to annotate items.
Afterwards, the tags are listed in descending order of likeliness score for each
user. In this way, We-Rank treats the tags with high likeliness scores as the user’s
positive tag preference set, and those with low scores as the user’s negative tag
preference set. Given the observed tagging data entries and the generated positive
and negative tag preference sets, the weighted tensor is constructed. The entries of
the weighted tensor map one-to-one to the entries of the primary tensor model that
represents the user profile. Each observed entry of the primary tensor is rewarded,
such that the associated entry of the weighted tensor holds a higher positive value
than that of a non-observed one. Details of We-Rank are presented in Chapter 4,
Section 4.3.
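A weighted least square criterion of this kind can be sketched in a few lines. The sketch below is a simplification under stated assumptions: it uses a single reward weight for observed entries and a single penalty weight for non-observed ones, whereas We-Rank derives its weights from the positive and negative tag preference sets (Chapter 4). The function names and the weight values are hypothetical.

```python
import numpy as np

def boolean_weights(y, reward=2.0, penalty=1.0):
    """Assumed weighting: observed (1) entries get `reward`, the rest `penalty`."""
    return np.where(y == 1, reward, penalty)

def weighted_squared_loss(y, y_hat, w):
    """Sum of squared errors, each scaled by the matching weight entry."""
    return float(np.sum(w * (y - y_hat) ** 2))

# Tiny example: one observed and one non-observed entry.
y = np.array([1.0, 0.0])
y_hat = np.array([0.5, 0.5])
loss = weighted_squared_loss(y, y_hat, boolean_weights(y))
```

Because the weight array has the same shape as the boolean tensor, an error on an observed entry costs more than the same error on a non-observed one.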
3.2.2.2 Phase-Two (b): List-wise based Ranking Approaches
The order of items in the recommendation list is imperative as users show more
interest in the few top items (Cremonesi et al., 2010). Inspired by this, a
recommendation task can be regarded as a ranking problem, in which the item
preference scores need to be calculated and sorted for generating the list. While
viewing the recommendation task as a ranking problem, a recommendation model
should be optimized with respect to the ranking evaluation measure so that a list of
items optimized from the ranking evaluation measure perspective can be
recommended to each user (Chapelle and Wu, 2010; Cremonesi et al., 2010; Xu and
Li, 2007). The user profile should also be built based on a scheme that interprets the
tagging data as a ranking representation. Two methods are proposed under this
approach and the brief details are described as follows.
3.2.2.2.1 Do-Rank: Learning from Multi-graded Data
The proposed UTS scheme, an efficient version of the set-based scheme (Rendle,
Balby Marinho, et al., 2009), results in the multi-graded tagging data representation
(Ifada and Nayak, 2014a, 2015) as briefly described in Section 3.2.1.2. The tagging
data is labelled with a value in the ordinal relevance set of
{𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡, 𝑖𝑛𝑑𝑒𝑐𝑖𝑠𝑖𝑣𝑒, 𝑖𝑟𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡} for a tuple of ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩. The “relevant”
(or “positive”) entries hold the highest value grade and represent the observed
tagging data entries, which indicate that the user has annotated an item using those
tags. Both “irrelevant” (or “negative”) and “indecisive” (or “null”) entries
represent the non-observed entries, which respectively indicate that users are not
interested in the items or that users might be interested in the items in the future.
In the proposed method, namely DCG Optimization for Learning-to-Rank
(Do-Rank), the recommendation model is optimized with respect to Discounted
Cumulative Gain (DCG) as the ranking evaluation measure for learning from the
multi-graded tagging data. Do-Rank generates an optimal list of item
recommendations from the DCG perspective for each user. However, optimizing DCG
across all users in the recommendation model is computationally expensive. To
tackle this issue, a fast learning algorithm is implemented. Details of Do-Rank are
presented in Chapter 5, Section 5.2.
3.2.2.2.2 Go-Rank: Learning from Graded-relevance Data
The proposed ranking method, namely GAP Optimization for Learning-to-Rank
(Go-Rank), applies the proposed graded-relevance scheme to interpret the tagging
data effectively. As briefly described in Section 3.2.1.3, the scheme sets the
non-observed entries that are not “irrelevant” (since the items of those entries have
been annotated by the user using other tags) as the transitional entries between the
“relevant” and “irrelevant” entries. The “transitional” entries determine the choice of
the ranking evaluation measure that handles the data and works as the
optimization criterion for learning the recommendation model. The Graded Average
Precision (GAP) is the generalization of Average Precision (AP) for ordinal
relevance data. Using GAP as the optimized ranking evaluation measure enables the
learning model to set up thresholds so that the “likely relevant” entries can be
regarded as either “relevant” or “irrelevant” entries. The ranking method generates a
list of items ranked from the GAP perspective for each user. Additionally, for the
purpose of fast and efficient learning, the entries are filtered, as not all of them are
needed for learning. Details of Go-Rank are presented in Chapter 5,
Section 5.3.
3.3 DATASETS
Four publicly available real-world datasets were used to build the models and
evaluate the proposed methods in comprehensive experiments. Table 3.1 details the
various characteristics of these datasets. The four tagging datasets are as
follows:
1) Delicious Dataset. This dataset is obtained from the DAI-Labor corpus². The
corpus is retrieved from the Delicious Social Bookmarking Website
(http://delicious.com/) which contains 420 million observed tagging data entries of
bookmarking between September 2003 and December 2007 (Wetzker et al.,
2008). This thesis uses a portion of the dataset between January 2004 and April
2004. Figure 3.4 shows a snapshot of the Delicious dataset.
Figure 3.4. A snapshot of the Delicious dataset
² Available at http://www.dai-labor.de/en/irml/datasets/delicious/
2) LastFM Dataset. This dataset is obtained from the GroupLens corpus³. The
corpus is retrieved from the Last.fm online music system (http://www.last.fm/)
which contains social networking, tagging, and music artist listening
information from a set of 2K users (Cantador et al., 2011). This thesis uses the
“user_taggedartists-timestamps.dat” file that contains the observed tagging data
of artists provided by each user. Figure 3.5 shows a snapshot of the LastFM
dataset.
Figure 3.5. A snapshot of the LastFM dataset
3) CiteULike Dataset. This dataset is obtained from the CiteULike website⁴.
CiteULike (http://citeulike.org/) is a website that provides a service for
managing and discovering scholarly references. The dataset contains the tagging
information of users from 2007-05-30 onwards. This thesis uses a portion of the
dataset from the year 2012. Figure 3.6 shows a snapshot of the CiteULike dataset.
Figure 3.6. A snapshot of the CiteULike dataset
4) MovieLens Dataset. This dataset is obtained from the GroupLens corpus⁵. The
corpus is an extension of the MovieLens10M dataset, published by the
GroupLens research group (http://www.grouplens.org/), which contains
personal ratings and tags about movies (Cantador et al., 2011). This thesis uses
the “user_taggedmovies-timestamps.dat” file that contains the observed
tagging data of movies provided by each user. Figure 3.7 shows a snapshot of
the MovieLens dataset.
Figure 3.7. A snapshot of the MovieLens dataset
³ Available at http://files.grouplens.org/datasets/hetrec2011/hetrec2011-lastfm-2k.zip
⁴ Available at http://static.citeulike.org/data/current.bz2
⁵ Available at http://files.grouplens.org/datasets/hetrec2011/hetrec2011-movielens-2k-v2.zip
These datasets have been used in evaluation as they are commonly used by
previous researchers (Bogers and van den Bosch, 2009; Kim et al., 2010; Rendle,
Balby Marinho, et al., 2009; Rendle and Schmidt-Thieme, 2010; Symeonidis et al.,
2010; Tso-Sutter et al., 2008; Zhang et al., 2010). More importantly, these datasets
show diverse characteristics, as shown in Table 3.1, that assist in evaluating the
methods. The density of a dataset is calculated as:
\[ \mathit{density} = \frac{|A_{ob}|}{Q \cdot R \cdot S} \tag{3.5} \]
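As a quick check of Equation (3.5), the short sketch below recomputes the density of the LastFM dataset from the counts listed in Table 3.1, expressed as a percentage. The function name is illustrative.

```python
def density_percent(n_observed, n_users, n_items, n_tags):
    """Density of Equation (3.5), expressed as a percentage."""
    return 100.0 * n_observed / (n_users * n_items * n_tags)

# LastFM counts from Table 3.1: |A_ob| = 186,479; Q = 1,892; R = 17,632; S = 11,946.
lastfm_density = density_percent(186_479, 1_892, 17_632, 11_946)
print(f"{lastfm_density:.6f}%")  # prints 0.000047%
```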
The last row of Table 3.1 highlights the diversity of the datasets, as the ratios of
average observed entries of tagging data differ. The characteristics of users’
tagging behaviour are captured differently in each tagging system. Users of the
LastFM and MovieLens datasets have, on average, annotated a number of items
comparable to the number of tags used, whereas users of the Delicious and
CiteULike datasets have, on average, annotated a large number of items in
comparison to the number of tags used.
Attribute                                                      Delicious          LastFM           CiteULike         MovieLens
                                                               (item: bookmark)   (item: artist)   (item: article)   (item: movie)
#users (𝑄)                                                     5,311              1,892            22,610            2,113
#items (𝑅)                                                     147,770            17,632           562,108           10,197
#tags (𝑆)                                                      31,366             11,946           178,270           13,222
#observed entries of tagging data (|𝐴𝑜𝑏|)                      456,064            186,479          2,103,367         47,957
Density                                                        0.000002%          0.000047%        0.0000001%        0.000017%
Avg observed entries of tagging data per user                  85.872             98.562           93.028            22.696
Avg observed entries of tagging data per item                  3.086              10.576           3.742             4.703
Avg observed entries of tagging data per tag                   14.540             15.610           11.80             3.627
Ratio of Avg observed entries of tagging data (user:item:tag)  1:28:6             1:9:6            1:25:8            1:5:6
Table 3.1. Details of the various characteristics of the datasets
3.3.1 Experimental Settings
Recommendation methods commonly suffer from sparse data and consequently
generate low quality recommendations for the long-tail items (i.e. items that are
rarely selected by users) and users (i.e. users that rarely select items) (Halpin et al.,
2007; Jäschke et al., 2007). Adopting the standard technique for removing noise and
reducing data sparsity (Nanopoulos, 2011; Rafailidis and Daras, 2013; Symeonidis
et al., 2010), the datasets are refined using the 𝑝-core technique (Batagelj and
Zaveršnik, 2002). This technique selects the users, items, and tags that occur in at
least 𝑝 posts, where a post is a distinct (𝑢, 𝑖) pair in 𝐴𝑜𝑏. This thesis follows this
procedure and implements 10-core, the accepted and realistic 𝑝-core setup in the
literature (Jäschke et al., 2007; Rendle, Balby Marinho, et al., 2009; Rendle and
Schmidt-Thieme, 2010; Tso-Sutter et al., 2008), to refine each dataset used in the
experiments. Setting a lower core threshold might leave the users, items, and tags
with insufficient ties between dimensions, making it difficult to discover common
user interests for generating accurate predictions (Li et al., 2008). Researchers have
used 5-core or lower on small datasets, as many of the users in those datasets have
fewer than 10 posts (Jäschke et al., 2007; Rendle and Schmidt-Thieme, 2010). Since
this thesis uses large datasets, the most appropriate lowest threshold is 10-core, as
evidenced by other research (Jäschke et al., 2007; Rendle, Balby Marinho, et al.,
2009; Rendle and Schmidt-Thieme, 2010; Tso-Sutter et al., 2008).
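The 𝑝-core refinement can be sketched as an iterative filter. The sketch assumes that the tagging data is a collection of (user, item, tag) triples and that the filter is repeated until no further entries are removed, which is the usual reading of a 𝑝-core; the function name and data layout are illustrative rather than the thesis implementation.

```python
from collections import Counter

def p_core(triples, p):
    """Keep only users, items and tags that occur in at least p posts.

    triples: iterable of (user, item, tag); a post is a distinct (user, item) pair.
    """
    triples = set(triples)
    while True:
        posts = {(u, i) for u, i, _ in triples}
        user_posts = Counter(u for u, _ in posts)
        item_posts = Counter(i for _, i in posts)
        # Triples are distinct, so this counts the distinct posts each tag appears in.
        tag_posts = Counter(t for _, _, t in triples)
        kept = {
            (u, i, t) for u, i, t in triples
            if user_posts[u] >= p and item_posts[i] >= p and tag_posts[t] >= p
        }
        if kept == triples:      # fixed point reached: every remaining entity has >= p posts
            return kept
        triples = kept
```

Removing one entity can push another below the threshold, which is why the filter loops until the set of triples stops changing.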
Additionally, to avoid bias in the experiments, this thesis also applies two
other 𝑝-core values to each dataset, i.e. 15-core and 20-core. This variation reduces
the likelihood of producing unstable results, which are likely to occur when
only one core size is used in the experiments (Doerfel and Jäschke, 2013).
Note that the higher the 𝑝-core is set, the less sparse the resultant set and
the smaller each dimension size becomes. Table 3.2 details the statistics of the
Delicious, LastFM, CiteULike, and MovieLens datasets resulting from the
implementation of various 𝑝-cores.
Dataset     𝑝-core   #users (𝑄)   #items (𝑅)   #tags (𝑆)   Observed Tagging Data Entries (|𝐴𝑜𝑏|)   Density
Delicious   10       2,009        1,485        2,589       50,991                                  0.0007%
            15       1,609        719          1,761       32,389                                  0.0016%
            20       1,359        424          1,321       23,442                                  0.0031%
LastFM      10       867          1,715        1,423       99,211                                  0.0047%
            15       703          1,018        1,063       76,808                                  0.0100%
            20       601          681          838         61,739                                  0.0180%
CiteULike   10       1,129        548          2,403       17,161                                  0.0012%
            15       721          203          1,334       8,099                                   0.0042%
            20       529          89           844         4,254                                   0.0100%
MovieLens   10       357          709          799         14,535                                  0.0072%
            15       262          396          558         9,442                                   0.0100%
            20       200          217          425         6,023                                   0.0327%
Table 3.2. The details of dataset statistics resulting from the implementation of various 𝑝-cores
For each of the datasets, a 5-fold cross-validation experiment is conducted,
where each fold randomly assigns 80% of the dataset as training data (𝐷𝑡𝑟𝑎𝑖𝑛) and
the remaining 20% as test data (𝐷𝑡𝑒𝑠𝑡) based on the number of posts. 𝐷𝑡𝑟𝑎𝑖𝑛 and
𝐷𝑡𝑒𝑠𝑡 do not overlap in posts, i.e. there exist no triplets for a user-item set in
𝐷𝑡𝑟𝑎𝑖𝑛 if the set is present in 𝐷𝑡𝑒𝑠𝑡. The reported performance is the average over
all five runs.
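A post-based split of this kind might look like the sketch below. It assumes a standard 𝑘-fold partition in which the shuffled posts are dealt into five disjoint test folds; the thesis text does not state whether the folds are disjoint, so this is one possible reading. The function name and the seed are illustrative.

```python
import random
from collections import defaultdict

def post_folds(triples, n_folds=5, seed=0):
    """Yield (train, test) lists of (user, item, tag) triples, split by post."""
    by_post = defaultdict(list)
    for u, i, t in triples:
        by_post[(u, i)].append((u, i, t))      # all tags of a post stay together
    posts = sorted(by_post)
    random.Random(seed).shuffle(posts)
    for k in range(n_folds):
        test_posts = set(posts[k::n_folds])    # roughly 20% of the posts when n_folds = 5
        train = [x for p in posts if p not in test_posts for x in by_post[p]]
        test = [x for p in posts if p in test_posts for x in by_post[p]]
        yield train, test
```

Grouping the triples by post before splitting guarantees that a user-item pair never appears in both the training and the test data.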
3.4 EVALUATION METRICS
The performance of the tag-based item recommendation methods is evaluated in an
offline setting. Offline evaluation means that the ranking methods are
evaluated on pre-collected real-world tagging data (Shani and Gunawardana,
2011). This approach follows the typical evaluation scenario in academic research
(Marinho et al., 2012): it does not require any interaction with real users
and therefore allows the methods to be compared at a low cost (Shani and
Gunawardana, 2011). The weakness of this approach, however, is that it cannot
directly measure the influence of the recommendation method on user behaviour,
i.e. the user reaction to real-time recommendations (Shani and Gunawardana, 2011).
The recommendation task is formulated as predicting the Top-𝑁 items for the set
of target users present in 𝐷𝑡𝑒𝑠𝑡. The ground truth of items for each target
user 𝑢, assumed to be available from the dataset, is denoted as 𝑇𝑒𝑠𝑡𝑢 ⊆ 𝐼. The
ranked list of recommendations for target user 𝑢 is represented by a permutation of
the items 𝐼, denoted as 𝑙𝑢, where 𝑙𝑢(𝑛) is the item at position 𝑛 in the list. For
instance, 𝑙𝑢1(1) = 𝑖1 indicates that the top position in the ranked list of
recommendations for target user 𝑢1 is item 𝑖1.
The quality of recommendations is determined by measuring how successful
the method is for predicting the items in 𝑇𝑒𝑠𝑡𝑢 for each target user 𝑢. In this thesis,
distinct evaluation measures are used to evaluate the performances of each proposed
method. The selection of measures is governed by the underlying approaches
implemented for solving the recommendation task, i.e., point-wise and list-wise. The
score of each measure ranges between 0 and 1, representing the lowest and highest
possible values, respectively.
3.4.1 Point-wise based Ranking Approach
Precision, Recall, and F1-Score are commonly used for evaluating the performance
of a recommendation model that is built by implementing the point-wise based
ranking approach, in which the recommendation task is approached as a
regression/classification task. The recommendation model is learned such that it
predicts whether or not the user will like the predicted item, where there is no
interdependency between the predicted item and the other items present in the
dataset (Liu, 2009; Rendle, 2011).
For each target user, the predicted or recommended Top-𝑁 list of items,
𝑇𝑜𝑝(𝑙𝑢, 𝑁), is compared to the ground truth items, 𝑇𝑒𝑠𝑡𝑢. Precision
measures the proportion of items in the predicted Top-𝑁 list that are in 𝑇𝑒𝑠𝑡𝑢,
while recall measures the proportion of the ground truth items 𝑇𝑒𝑠𝑡𝑢 that are
covered by the predicted Top-𝑁 list. The precision and recall per target user 𝑢, at
position 𝑁, are computed as follows:
\[ \mathit{Top}(l_u, N) := \{ l_u(1), \ldots, l_u(N) \} \tag{3.6} \]
\[ \mathit{Precision}(\mathit{Test}_u, l_u, N) := \frac{|\mathit{Top}(l_u, N) \cap \mathit{Test}_u|}{N} \tag{3.7} \]
\[ \mathit{Recall}(\mathit{Test}_u, l_u, N) := \frac{|\mathit{Top}(l_u, N) \cap \mathit{Test}_u|}{|\mathit{Test}_u|} \tag{3.8} \]
The reported precision and recall values are the average values over all users in
𝐷𝑡𝑒𝑠𝑡; the average precision and recall are calculated as:
\[ \mathit{Precision}(D_{test}, N) := \frac{1}{|D_{test}|} \sum_{u \in D_{test}} \mathit{Precision}(\mathit{Test}_u, l_u, N) \tag{3.9} \]
\[ \mathit{Recall}(D_{test}, N) := \frac{1}{|D_{test}|} \sum_{u \in D_{test}} \mathit{Recall}(\mathit{Test}_u, l_u, N) \tag{3.10} \]
Additionally, F1-Score is reported to represent the harmonic mean of average
precision and recall:
\[ F1(D_{test}, N) := \frac{2 \cdot \mathit{Precision}(D_{test}, N) \cdot \mathit{Recall}(D_{test}, N)}{\mathit{Precision}(D_{test}, N) + \mathit{Recall}(D_{test}, N)} \tag{3.11} \]
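The point-wise measures can be computed with a few lines of code. The sketch below follows Equations (3.6) to (3.11) for a single target user; the function names and the toy ranking are illustrative, and the ranked list is assumed to contain no duplicate items.

```python
def precision_recall_at_n(ranked, test_items, n):
    """Per-user precision and recall at position n (Equations 3.7 and 3.8)."""
    hits = len(set(ranked[:n]) & set(test_items))
    return hits / n, hits / len(test_items)

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (Equation 3.11)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy user: two of the three ground-truth items appear in the Top-5 list.
ranked = ["i1", "i2", "i3", "i4", "i5"]
test_items = {"i2", "i5", "i9"}
precision, recall = precision_recall_at_n(ranked, test_items, 5)
f1 = f1_score(precision, recall)
```

For the reported scores, the per-user values would first be averaged over all users in 𝐷𝑡𝑒𝑠𝑡 and the F1-Score then computed from those averages, as in Equation (3.11).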
3.4.2 List-wise based Ranking Approach
AP, MAP and NDCG are the widely used measures for evaluating the performance
of a recommendation model that is built by implementing the list-wise based ranking
approach, in which the recommendation task is approached as a ranking task. In this
case, the model is learned such that it would predict an ordered set of items that will
be of interest to a user, in which the predicted items depend on other corresponding
items (Liu, 2009; Rendle, 2011).
3.4.2.1 Average Precision (AP) and Mean Average Precision (MAP)
Average Precision (AP) is the average of the precisions at the positions 𝑛 where the
predicted item 𝑙𝑢(𝑛) is in 𝑇𝑒𝑠𝑡𝑢. The AP at position 𝑁 is defined as:
\[ \mathit{AP}(\mathit{Test}_u, l_u, N) := \frac{1}{|\mathit{Test}_u|} \sum_{n=1}^{N} \mathit{Precision}(\mathit{Test}_u, l_u, n) \cdot \mathbb{I}(l_u(n) \in \mathit{Test}_u) \tag{3.12} \]
where 𝕀(∙) is the indicator function, which is equal to 1 if the condition is satisfied,
and 0 otherwise. The Mean Average Precision (MAP) is the average of AP
over all users in 𝐷𝑡𝑒𝑠𝑡:
\[ \mathit{MAP}(D_{test}, N) := \frac{1}{|D_{test}|} \sum_{u \in D_{test}} \mathit{AP}(\mathit{Test}_u, l_u, N) \tag{3.13} \]
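A minimal sketch of Equations (3.12) and (3.13) follows. The function names and the input layout (a list of per-user pairs of ranked list and ground-truth set) are illustrative assumptions, and ranked lists are assumed to contain no duplicates.

```python
def average_precision(ranked, test_items, n):
    """AP at position n for one user (Equation 3.12)."""
    test_items = set(test_items)
    hits, total = 0, 0.0
    for pos, item in enumerate(ranked[:n], start=1):
        if item in test_items:       # indicator term of Equation (3.12)
            hits += 1
            total += hits / pos      # precision at position pos
    return total / len(test_items)

def mean_average_precision(rankings, n):
    """MAP over users (Equation 3.13); rankings is a list of (ranked, test_items)."""
    return sum(average_precision(r, t, n) for r, t in rankings) / len(rankings)

ap = average_precision(["i1", "i2", "i3", "i4", "i5"], {"i2", "i5", "i9"}, 5)
```

In the toy call, the hits at positions 2 and 5 contribute precisions of 1/2 and 2/5, which are then divided by the three ground-truth items.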
3.4.2.2 Discounted Cumulative Gain (DCG) and Normalized Discounted
Cumulative Gain (NDCG)
In the Discounted Cumulative Gain (DCG), predicted items at higher ranked
positions are more important to the user than those at lower ranked positions. The
DCG per target user 𝑢 is defined as:
\[ \mathit{DCG}(\mathit{Test}_u, l_u, N) := \sum_{n=1}^{N} \frac{1}{\log_2(1+n)} \cdot \mathbb{I}(l_u(n) \in \mathit{Test}_u) \tag{3.14} \]
where 𝕀(∙) is the indicator function, which is equal to 1 if the condition is satisfied,
and 0 otherwise. In Equation (3.14), the indicator term is the gain function that gives
weight to the predicted items 𝑙𝑢(𝑛) if they exist in 𝑇𝑒𝑠𝑡𝑢, whereas the logarithmic
denominator is the discount function that makes the predicted items at lower ranks
contribute less to the DCG score (Balakrishnan and Chopra, 2012; Chapelle and Wu,
2010).
The DCG score over all users in 𝐷𝑡𝑒𝑠𝑡 is formulated as:
\[ \mathit{DCG}(D_{test}, N) := \frac{1}{|D_{test}|} \sum_{u \in D_{test}} \mathit{DCG}(\mathit{Test}_u, l_u, N) \tag{3.15} \]
The maximum possible DCG score, i.e. the Ideal DCG (IDCG), is given as:
\[ \mathit{IDCG}(N) := \sum_{n=1}^{N} \frac{1}{\log_2(1+n)} \tag{3.16} \]
The Normalized Discounted Cumulative Gain (NDCG) is then derived by
normalizing the DCG with IDCG:
\[ \mathit{NDCG}(D_{test}, N) := \frac{\mathit{DCG}(D_{test}, N)}{\mathit{IDCG}(N)} \tag{3.17} \]
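Equations (3.14) to (3.17) translate directly into code. The sketch below uses the binary gain defined above; the function names and the input layout (a list of per-user pairs of ranked list and ground-truth set) are illustrative assumptions.

```python
import math

def dcg(ranked, test_items, n):
    """Per-user DCG at position n with binary gain (Equation 3.14)."""
    test_items = set(test_items)
    return sum(
        1.0 / math.log2(1 + pos)
        for pos, item in enumerate(ranked[:n], start=1)
        if item in test_items
    )

def idcg(n):
    """Ideal DCG: every one of the n positions holds a relevant item (Equation 3.16)."""
    return sum(1.0 / math.log2(1 + pos) for pos in range(1, n + 1))

def ndcg(rankings, n):
    """NDCG over users (Equations 3.15 and 3.17); rankings is a list of (ranked, test_items)."""
    mean_dcg = sum(dcg(r, t, n) for r, t in rankings) / len(rankings)
    return mean_dcg / idcg(n)
```

Because the ideal score in Equation (3.16) depends only on 𝑁, a user with fewer than 𝑁 ground-truth items cannot reach an NDCG of 1 under this definition.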
3.5 BENCHMARKING METHODS
This section details the benchmarks used for evaluating the proposed methods. The
purpose is to understand the strengths and weaknesses of both the proposed
methods and the relevant state-of-the-art methods. This thesis principally aims to
solve the tag-based item recommendation task by applying the tensor model and
highlighting the practical utilisation of an efficient tagging data interpretation scheme
and a learning-to-rank approach. This is a fairly new research topic. Tensor
models have become popular in recommendation research only within the last few
years, and researchers still face many challenges such as scalability, sparsity, and
learning from the latent factors. Additionally, treating recommendation as a ranking
problem and using optimization to obtain the best possible recommendation list
is a new research area. Consequently, there are not many directly relevant works that
can be used. The closest works that can be used for comparison, either directly
or with some adaptation, are chosen for benchmarking. The selection of
benchmarking methods is driven by these aspects. Note that dealing with the
semantic problems of tags is beyond the focus of this research, and the benchmarks
have been selected accordingly. Details of the benchmarks are described as follows:
3.5.1 MAX Method
The MAX method (Symeonidis et al., 2010) is one of the first tag-based item
recommendation methods to use the tensor as its learning model. This method
uses the boolean scheme for constructing the user profiles, in which the task of
recommendation is regarded as predicting whether an item will be “relevant” or
“irrelevant” to a user. In other words, the method implements a point-wise based
ranking approach for learning the tensor recommendation model. The MAX method
applies the HOSVD-based factorization technique (Kolda and Bader, 2009) and
directly utilises the reconstructed tensor to generate the list of recommendations
based on the maximum value of the predicted scores on each user-item
set. This method simply assumes that the level of user preference for a candidate item
is solely represented by the predicted score on a single tag, i.e. the influence of
other tags is disregarded. Note that the MAX method builds the tensor
model from a relatively small dataset (105 × 246 × 591, representing the numbers of
users, items, and tags) due to the scalability issue that commonly occurs in
reconstructing large tensor models.
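The selection rule described above can be sketched as follows, assuming that the reconstructed tensor is available as a dense array indexed as (user, item, tag). The function name and array layout are illustrative and not taken from Symeonidis et al. (2010).

```python
import numpy as np

def max_top_n(reconstructed, user, n):
    """Rank items for one user by their maximum predicted score over all tags."""
    scores = reconstructed[user].max(axis=1)           # one score per item
    return np.argsort(-scores, kind="stable")[:n]      # item indices, best first

# Toy tensor with 1 user, 3 items and 2 tags.
toy = np.zeros((1, 3, 2))
toy[0, 0] = [0.1, 0.9]
toy[0, 1] = [0.5, 0.2]
toy[0, 2] = [0.3, 0.3]
top_items = max_top_n(toy, user=0, n=2)
```

Only the single best tag score of each item enters the ranking, which is exactly the simplification noted above: the scores of all other tags are ignored.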
Using the MAX method as a benchmarking method is necessary, as one of the
research objectives of this thesis is to develop two methods that use the boolean
scheme for constructing the user profiles and implement the point-wise based
ranking approach for learning the tensor model. The first proposed method is the
Tensor-based Item Recommendation using Probabilistic Ranking (TRPR) method,
which attempts to improve the quality of recommendations via a probabilistic
ranking approach in such a way that the selection of candidate items obtained from the
reconstructed tensor takes into account the user’s tag preferences, while at the same
time solving the scalability issue. The second proposed method is Ranking using
Weighted Tensor (We-Rank), which attempts to utilise the user’s past tagging history
via a weighted scheme in such a way that the list of recommendations can be directly
generated from the factorized tensor.
3.5.2 Pairwise Interaction Tensor Factorization (PITF) Method
The Pairwise Interaction Tensor Factorization (PITF) method (Rendle and Schmidt-
Thieme, 2010) is a well-known and leading tensor-based tag recommendation
method. This method uses the set-based scheme (Rendle, Balby Marinho, et al.,
2009) for constructing the user profiles and the task of recommendations is regarded
as predicting the order of a pair of items, in which the interdependency occurs
between the two paired items. In other words, the method implements a pair-wise
based ranking approach for learning the tensor recommendation model. Since this
thesis has developed the methods of item recommendations based on user activities,
the PITF method is extended in this thesis for the task of item recommendation. The
adaptation is necessary as the task of recommending tags differs from the task of
recommending items.
For tag recommendation, predictions are generated for each predefined user
and item set, i.e. the recommendation system predicts tags for an item to a user.
However, for item recommendation, the recommendation system predicts items
based on the user information only. Consequently, a method must calculate the item
ranking score across all available tags before deciding which items are in the
Top-𝑁 recommendation list for the user.
Using the set-based interpretation scheme, on each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set, the PITF
method represents the ranking of tagging data as (𝑢, 𝑡, 𝑖𝑃, 𝑖𝑁), where (𝑢, 𝑡, 𝑖𝑃) is a
triple of “relevant” or “positive” entry and (𝑢, 𝑡, 𝑖𝑁) is a triple of “irrelevant” or
“negative” entry. It then creates a tensor factorization model, which employs the
stochastic gradient descent algorithm (Rendle, Freudenthaler, et al., 2009) for
optimizing the ranking function in such a way that the positive entries are assigned
with higher values than the negative entries. This ensures the notion that the user
favours the positive entries more than the negative ones. The model is formulated as:
\[ \hat{\mathcal{A}} := M^{(1)} \cdot M^{(2)U} + M^{(3)} \cdot M^{(2)T} \tag{3.18} \]
where \(M^{(1)}\) is the user latent factor matrix, \(M^{(3)}\) is the tag latent factor matrix,
\(M^{(2)U}\) is the item factor matrix with respect to users, \(M^{(2)T}\) is the item factor
matrix with respect to tags, and \(\hat{\mathcal{A}}\) is the new tensor. The relevance
recommendation ranking score is calculated as:
\[ \hat{a}_{u,t,i} := \sum_{f=1}^{F} m^{(1)}_{u,f} \cdot m^{(2)U}_{i,f} + \sum_{f=1}^{F} m^{(3)}_{t,f} \cdot m^{(2)T}_{i,f} \tag{3.19} \]
where 𝐹 is the number of latent factors.
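The scoring step can be sketched for a single user as below, assuming the four factor matrices are available as dense arrays. The sketch reads the second term as pairing the tag factors with the item factors with respect to tags; the function and argument names are illustrative. How the per-tag scores are combined into one item score is not specified in this section and is therefore left to the caller.

```python
import numpy as np

def pitf_scores(user_vec, tag_factors, item_user_factors, item_tag_factors):
    """Score every (tag, item) pair for one user as a sum of two pairwise terms.

    user_vec:          (F,)   latent factors of the target user
    tag_factors:       (S, F) tag latent factors
    item_user_factors: (R, F) item factors with respect to users
    item_tag_factors:  (R, F) item factors with respect to tags
    """
    user_term = item_user_factors @ user_vec         # (R,)  user-item interaction
    tag_term = tag_factors @ item_tag_factors.T      # (S, R) tag-item interaction
    return tag_term + user_term[np.newaxis, :]       # (S, R)

scores = pitf_scores(
    np.array([1.0, 2.0]),
    np.array([[1.0, 0.0], [0.0, 1.0]]),
    np.array([[1.0, 1.0], [0.0, 1.0], [2.0, 0.0]]),
    np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]),
)
```

The user-item term is the same for every tag, so it is computed once and broadcast across the tag dimension.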
Using the PITF method as a benchmarking method is necessary, as a research
objective of this thesis is to develop two methods that use a ranking-based scheme for
constructing the user profiles and implement the list-wise based ranking approach
for learning the tensor model. This thesis conjectures that a recommendation task
should be regarded as a ranking task, i.e. predicting an ordered set of items that will
be of interest to a user where the predicted items depend on other corresponding
items, instead of predicting the order of a pair of items where the interdependency
only occurs between the two paired items. Note that employing a ranking-based
scheme results in multi-graded or graded-relevance data, while implementing a
list-wise based ranking approach requires the ranking evaluation measure to
be directly optimized in order to learn the recommendation model. The two proposed
list-wise based ranking methods are: (1) DCG Optimization for Learning-to-Rank
(Do-Rank) that uses the proposed UTS scheme for constructing the tensor learning
model and the DCG as the optimization criterion; and (2) GAP Optimization for
Learning-to-Rank (Go-Rank) that uses the proposed graded-relevance scheme for
constructing the tensor learning model and the GAP as the optimization criterion.
3.5.3 CF-based method that applied the Candidate Tag Set (CTS) Method
The CTS method (Kim et al., 2010) is the state-of-the-art tag-based item
recommendation method that uses a matrix as its learning model. This method
projects the ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relationship into three binary relationships, i.e.
⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩, ⟨𝑢𝑠𝑒𝑟, 𝑡𝑎𝑔⟩, and ⟨𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩, and uses the boolean scheme for
constructing the user profile. The CTS method considers that the tags included by a
certain user imply the latent preference and, therefore, the similarity between users
can be determined based on the user-created tags. The list of recommendations for
each user is generated by first identifying the tag preferences of each user, such
that the user’s likelihood of selecting items can be calculated.
Using the CTS method as a benchmark is necessary because this thesis
emphasises that, given the ternary relation of tagging data, a tag-based
recommendation system needs to employ a multi-dimensional approach, i.e. a tensor
model, rather than splitting the data into lower-dimensional models, i.e. matrix models.
3.6 CHAPTER SUMMARY
This chapter has detailed the research design that is used in the research conducted in
the next two chapters. It briefly explains the data interpretation schemes detailing
how tagging data can be interpreted in a tensor model in order to improve the
recommendation performance. It briefly explains the two categories of methods:
point-wise and list-wise, that are developed in this thesis for generating item
recommendations based on users’ tagging activities. Table 3.3 shows the summary of
ranking methods proposed in this thesis. Four real-world and freely-available
datasets that are used for the evaluation of the proposed methods have been
described. The evaluation measures for the performance comparison of the proposed
and benchmarking methods are presented, along with the benchmarks that will be
used for comparing the proposed point-wise based and list-wise based ranking
methods, as detailed in Chapter 4 and Chapter 5, respectively.
| Ranking Method | User Profile Construction | Recommendation Task | Learning Approach | Optimization Criterion | Recommendation Generation Approach |
|---|---|---|---|---|---|
| TRPR: Probabilistic Ranking | boolean scheme | Prediction task | Point-wise based ranking | Least square loss | Two stage: full tensor reconstruction from the latent factors + probabilistic ranking |
| We-Rank: Weighted Tensor Approach for Ranking | boolean scheme + weighting to reward and penalise entries | Prediction task | Point-wise based ranking | Weighted least square loss | Directly use the latent factors |
| Do-Rank: Learning from Multi-graded Data | UTS scheme | Ranking task | List-wise based ranking | Discounted Cumulative Gain (DCG) | Directly use the latent factors |
| Go-Rank: Learning from Graded-relevance Data | graded-relevance scheme | Ranking task | List-wise based ranking | Graded Average Precision (GAP) | Directly use the latent factors |
Table 3.3. Summary of the proposed ranking methods
Chapter 4: Point-wise based Ranking
Methods
This chapter presents the point-wise based ranking recommendation methods
developed to solve the tag-based item recommendation task in this thesis. It begins
with an introduction of the point-wise based ranking approach. The next two sections
detail the proposed methods, i.e. TRPR: Probabilistic Ranking and We-Rank:
Weighted Tensor for Ranking, in which experiments are conducted for each method
and the results are discussed.
4.1 INTRODUCTION
When the recommendation task is solved using a point-wise based ranking approach, the
recommendation problem is seen as a regression/classification learning problem, i.e.
predicting whether an item will be “relevant” or “irrelevant” to a user, under the assumption
that there is no interdependency between the predicted item and other items (Liu, 2009; Mohan
et al., 2011; Rendle, 2011). In this case, a recommendation model should be
optimized with respect to the corresponding regression/classification loss function
(Liu, 2009; Mohan et al., 2011; Rendle, 2011).
4.1.1 Challenges
The discussion in Chapter 2 has established the merit of using the tensor model to
represent the ternary latent relations inherent in tagging data and infer the user
likeliness score from them. Using tensor models for generating recommendations
faces various challenges. Scalability is a common problem in generating
recommendations for large datasets using a tensor model. Full tensor reconstruction
is computed by multiplying all latent factors. This process is memory expensive and,
therefore, reconstructing large size tensors is infeasible (Kutty et al., 2012; Leginus
et al., 2012; Nanopoulos, 2011; Rafailidis and Daras, 2013; Symeonidis et al., 2010).
A memory efficient method (Kolda and Sun, 2008) for enabling a latent factors
generation process has been proposed to fulfil the purpose of many applications that
do not need a tensor to be reconstructed. However, the tensor-based recommendation
methods require the tensor to be fully reconstructed for identifying new entries, to be
used for generating a list of recommendations (Kutty et al., 2012; Nanopoulos, 2011;
Rafailidis and Daras, 2013; Symeonidis et al., 2008, 2010). The high cost of
tensor reconstruction has not yet been properly addressed (Ifada and Nayak, 2014b,
2014c).
Another problem faced by tensor-based recommendation models is the quality
of recommendation. Existing point-wise based ranking recommendation methods
(Nanopoulos, 2011; Rafailidis and Daras, 2013; Symeonidis et al., 2010) use a
reconstructed tensor directly for generating the recommendations based on the
maximum values of predicted scores in each user-item set of tensor elements. These
approaches assume that the predicted score in the reconstructed tensor can represent
the level of user preference for an item based on a tag only. They disregard the
activity history of the users (Jain and Varma, 2011) that is known to influence how
likely the user is to select the recommended items (Kim et al., 2010).
Furthermore, data sparsity is also a common problem for tensor models. From
the various characteristics of real-world tagging data listed in Table 3.1, it can be
seen that tagging data is extremely sparse. For this reason, implementing the boolean
scheme to populate the tensor model from the tagging data results in the domination
of “0” values (representing the non-observed data) against the “1” values
(representing the observed data) in tensor entries. Directly applying the factorization
technique on this model will overfit the numerical values of “1” and “0” (Rendle,
Balby Marinho, et al., 2009). To solve this problem, researchers have represented the
model as a weighted version of the error function to ignore the missing data and
model only the known entries (Acar et al., 2011). However, the approach simply
builds the weighted learning model by generating a weighted tensor as a bijective
mapping of values of the primary tensor model entries. This disregards the users’
past tagging behaviours, even though users may have different preferences for the
tags they use to annotate items (Ifada and Nayak, 2014a; Wetzker et al., 2008).
4.1.2 Proposed Solutions
Two methods, TRPR and We-Rank, are presented in this chapter to tackle the
challenges described above. TRPR and We-Rank fall under the category of point-
wise based ranking approach since they use the regression loss as the
recommendation model optimization criterion. For constructing the user profile
representation, both methods implement the boolean scheme to interpret the tagging
data. TRPR, the first developed method in this chapter, focuses on improving
scalability during the tensor reconstruction process and improving recommendation
accuracy after the tensor model has been reconstructed. It is to be noted that this
method does not deal with the complexity within the latent factors generation task.
We-Rank, the second method, focuses on dealing with the sparsity problem and
improving the recommendation accuracy during the learning-to-rank procedure.
4.2 TRPR: PROBABILISTIC RANKING
4.2.1 Overview
The developed TRPR method focuses on improving the scalability during the tensor
reconstruction process and the recommendation accuracy after the tensor model has
been reconstructed. Figure 4.1 illustrates the overview of the probabilistic ranking
method, called the Tensor-based Item Recommendation using Probabilistic Ranking
(TRPR). It utilises a memory efficient loop approach for scalable full tensor
reconstruction and a probabilistic ranking to improve the accuracy of
recommendations generated from the reconstructed tensor.
To begin, a third-order tensor model representing the user profiles is built from
the tagging data by implementing the boolean scheme. The resulting tensor model
represents the collaborative activities of the users and becomes the input to learn the
recommendation model for regression. This model is then factorized to find the latent
factors in the user, item, and tag dimensions, which are used to reconstruct the tensor
and identify the new derived entries for making recommendations. To tackle the
expensiveness of the tensor reconstruction process, TRPR applies a memory efficient
loop, by implementing the block-striped (matrix) product approach, for multiplying
the factorized elements of the tensor model to enable a scalable tensor reconstruction.
Unlike the conventional way (Nanopoulos, 2011; Rafailidis and Daras, 2013;
Symeonidis et al., 2010), TRPR does not directly use the reconstructed tensor entries
for generating recommendations. Instead, the list of candidate items and tag
preferences for each user are firstly generated from the reconstructed tensor entries.
The probabilistic ranking stage generates the Top-𝑁 list of item recommendations to
users. TRPR improves the recommendation quality by ranking the items according to
the probability that a user selects each item of the candidate item set, which is calculated by
employing the tag preference set. This ensures that the list of recommended items is
ranked according to the probability that the user may like them.
[Figure 4.1: flow diagram. Tagging Data → User Profile Construction (boolean scheme): 𝒴 ∈ ℝ𝑄×𝑅×𝑆 → Latent Factors Generation: 𝒞, 𝑀(1), 𝑀(2), 𝑀(3) → Tensor Reconstruction (memory efficient approach: 𝑛-mode block-striped (matrix) product): 𝒴̂ ∈ ℝ𝑄×𝑅×𝑆 → Candidate Item Set Generation: 𝑍𝑢, and Tag Preference Set Generation: 𝑋𝑢 → Top-𝑁 Item Recommendation Generation (Probabilistic Ranking): 𝑇𝑜𝑝𝑁𝑢 for the target user]
Figure 4.1. Overview of the Probabilistic Ranking method (TRPR)
The next three sub-sections detail the three main processes in TRPR: (1) user
profile construction, i.e. tensor model construction; (2) learning-to-rank procedure,
i.e. latent factors generation via tensor factorization to derive the relationships
inherent in the model; and (3) recommendation generation. The last two sub-sections
present empirical evaluation and summary of the method.
4.2.2 User Profile Construction
The user profile construction includes constructing an initial tensor to model the
multi-dimension tagging data. The TRPR method uses the boolean scheme
(Symeonidis et al., 2010) to populate the third-order tensor model. This tensor model
can now represent the user profile and becomes the underlying ranking learning
model for recommendation. The boolean scheme is commonly used in tag-based
item recommendation methods and simply interprets the tagging data as binary data,
which includes two types of entries, i.e. “relevant” and “irrelevant” entries. The
“relevant” entries, labelled as “1”, are the observed entries where the user has
explicitly revealed interest by annotating an item using tags; the “irrelevant” entries,
labelled as “0”, are the remaining (non-observed) entries.
Let 𝑈 = {𝑢1, 𝑢2, 𝑢3, … , 𝑢𝑄} be the set of 𝑄 users, 𝐼 = {𝑖1, 𝑖2, 𝑖3, … , 𝑖𝑅} be the set
of 𝑅 items, and 𝑇 = {𝑡1, 𝑡2, 𝑡3, … , 𝑡𝑆} be the set of 𝑆 tags. From the tagging data
𝐴 ∶ 𝑈 × 𝐼 × 𝑇, a vector of 𝑎: (𝑢, 𝑖, 𝑡) ∈ 𝐴 represents an activity of user 𝑢 to annotate
item 𝑖 using tag 𝑡. The observed tagging data, 𝐴𝑜𝑏, defines the state, for which users
have expressed their interest to items in the past, by annotating those items using tags
where 𝐴𝑜𝑏 ⊆ 𝐴. Note that the observed tagging data is usually very
sparse, thus |𝐴𝑜𝑏| ≪ |𝐴|. The initial tensor 𝒴 ∈ ℝ𝑄×𝑅×𝑆 is constructed where 𝑄, 𝑅,
and 𝑆 are the sizes of the sets of users, items and tags respectively, while each tensor entry,
𝑦𝑢,𝑖,𝑡, is given a numerical value that represents the relevance grade based on the user
tagging activity. The rules of the boolean scheme relevance grade labelling to
generate the entries of tensor 𝒴 can be formulated as follows:
$$y_{u,i,t} := \begin{cases} 1 & \text{if } (u,i,t) \in A_{ob} \\ 0 & \text{otherwise} \end{cases} \tag{4.1}$$
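The boolean labelling rule can be sketched in a few lines. This is a minimal illustration that stores the tensor as nested lists with zero-based indices; the function name and the observed triples are hypothetical (the first triple mirrors the annotation of 𝑖2 by 𝑢1 with 𝑡1 mentioned in Example 4.1).

```python
def build_boolean_tensor(observed, n_users, n_items, n_tags):
    """Return a Q x R x S nested list: 1 for observed (u, i, t) triples, 0 otherwise."""
    tensor = [[[0] * n_tags for _ in range(n_items)] for _ in range(n_users)]
    for u, i, t in observed:
        tensor[u][i][t] = 1
    return tensor

# Zero-based (user, item, tag) triples; illustrative values, not a real dataset.
observed = {(0, 1, 0), (0, 2, 3), (1, 0, 0), (2, 3, 4)}
Y = build_boolean_tensor(observed, n_users=3, n_items=4, n_tags=5)

n_ones = sum(v for user in Y for item in user for v in item)
sparsity = 1 - n_ones / (3 * 4 * 5)  # share of "irrelevant" (zero) entries
```

Even in this toy case most entries are zero, which is the dominance of “0” values discussed in Section 4.1.1.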
Example 4.1: Tagging Data Interpretation using boolean scheme.
The toy example of tagging data in Figure 3.2 corresponds to a
tensor model that holds the record of 𝐴𝑜𝑏, 𝒴 ∈ ℝ3×4×5 where 𝑈 = {𝑢1, 𝑢2, 𝑢3},
𝐼 = {𝑖1, 𝑖2, 𝑖3, 𝑖4}, and 𝑇 = {𝑡1, 𝑡2, 𝑡3, 𝑡4, 𝑡5}. Each slice of the tensor represents a user
matrix, which contains the user tag usage for each item. The “+” symbols represent
the 𝐴𝑜𝑏 entries; for instance, the observed tagging data example of Figure 3.2 shows
that user 𝑢1 has annotated item 𝑖2 using tag 𝑡1. Figure 4.2 illustrates the constructed
initial tensor 𝒴 ∈ ℝ3×4×5, as the representation of user profile, in which entries are
generated from the tagging data by implementing the boolean interpretation scheme
as formulated in Equation (4.1).
[Figure 4.2: one 4 × 5 slice per user (rows: items 𝑖1 to 𝑖4; columns: tags 𝑡1 to 𝑡5)]
User 1:
0 0 0 0 0
1 0 1 0 0
0 0 0 1 0
0 0 0 0 0
User 2:
1 0 0 0 0
0 0 0 0 0
0 1 0 1 0
0 0 0 0 0
User 3:
0 0 0 0 0
0 0 0 0 0
0 1 0 0 0
0 0 0 1 1
Figure 4.2. Example of initial tensor 𝒴 ∈ ℝ3×4×5 as the representation of user profile in which entries
are generated by implementing the boolean interpretation scheme to the toy example in Figure 3.2
4.2.3 Learning-to-Rank Procedure
This section details the process of learning and generating latent factors that
correspond to each dimension of tensor 𝒴.
4.2.3.1 Optimization Criterion and Factorization Technique
Mean Square Error (MSE) is the common optimization criterion for solving a
regression/classification task. The MSE of all users over all items under all tags can
be defined as:
$$MSE := \frac{1}{QRS} \sum_{u \in U} \sum_{i \in I} \sum_{t \in T} \left[ y_{u,i,t} - \hat{y}_{u,i,t} \right]^2 \tag{4.2}$$
where 𝑦𝑢,𝑖,𝑡 is the relevance grade, an element of the binary
relevance set {0, 1}, taken from the user profile represented by the initial tensor model.
𝑦̂𝑢,𝑖,𝑡 is the predicted preference score that reflects the preference level of user 𝑢
for annotating item 𝑖 using tag 𝑡, calculated from the latent factors of the tensor
model. Recall that 𝑄, 𝑅, and 𝑆 are the number of users, items, and tags, respectively.
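As a small illustration of this criterion, the sketch below averages the squared differences over every tensor entry. It assumes both tensors are nested lists of identical shape; the function name and the values are hypothetical.

```python
def mse(y, y_hat):
    """Mean square error over all (user, item, tag) entries of two equally shaped tensors."""
    total, count = 0.0, 0
    for y_user, yh_user in zip(y, y_hat):
        for y_item, yh_item in zip(y_user, yh_user):
            for grade, score in zip(y_item, yh_item):
                total += (grade - score) ** 2
                count += 1  # count ends up equal to Q * R * S
    return total / count

# Illustrative 2 x 1 x 2 tensors: binary grades and predicted scores.
y = [[[1, 0]], [[0, 1]]]
y_hat = [[[0.5, 0.0]], [[0.0, 1.0]]]
error = mse(y, y_hat)
```

Only one entry is mispredicted here (by 0.5), so the error is 0.25 spread over four entries.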
Latent factors are learned from the tensor 𝒴 to derive the latent relationships
between the dimensions of users, items and tags. The practice of generating the latent
factors is commonly called the tensor factorization process (Koren et al., 2009;
Rendle and Schmidt-Thieme, 2010). Two broad families of factorization techniques
are Tucker and Candecomp/Parafac (CP) (Kolda and Bader, 2009). A Tucker model,
as illustrated in Figure 4.3, includes the Higher-Order SVD (HOSVD) and Higher-
Order Orthogonal Iteration (HOOI) models (Kolda and Bader, 2009). On the other
hand, a CP model can be considered as a special case of Tucker where the core
tensor is diagonal (Kolda and Bader, 2009), as illustrated in Figure 4.4.
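[Figure 4.3: diagram of 𝒴 (𝑄 × 𝑅 × 𝑆) ≈ core tensor 𝒞 (𝐹 × 𝐹 × 𝐹) multiplied by 𝑀(1) (𝑄 × 𝐹), 𝑀(2) (𝑅 × 𝐹), and 𝑀(3) (𝑆 × 𝐹)]
Figure 4.3. The Tucker factorization model for a third-order tensor
[Figure 4.4: diagram of 𝒴 (𝑄 × 𝑅 × 𝑆) ≈ diagonal core tensor 𝒞 (𝐹 × 𝐹 × 𝐹), with ones on the diagonal, multiplied by 𝑀(1) (𝑄 × 𝐹), 𝑀(2) (𝑅 × 𝐹), and 𝑀(3) (𝑆 × 𝐹)]
Figure 4.4. The CP factorization model for a third-order tensor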
TRPR can implement either the Tucker or the CP model as the predictor
function for calculating the predicted preference score 𝑦̂𝑢,𝑖,𝑡. For a third-order tensor
𝒴 ∈ ℝ𝑄×𝑅×𝑆, a Tucker or CP model may perform Singular Value Decomposition
(SVD) on each mode-𝑛 matricization of tensor 𝒴 (Kolda and Bader, 2009), which
results in three latent factor matrices, one for each dimension of tensor 𝒴,
𝑀(1) ∈ ℝ𝑄×𝐹, 𝑀(2) ∈ ℝ𝑅×𝐹, and 𝑀(3) ∈ ℝ𝑆×𝐹, where 𝐹 ≪ 𝐻 and 𝐻 ∈ {𝑄, 𝑅, 𝑆}. The
diagonal core tensor 𝒞 ∈ ℝ𝐹×𝐹×𝐹, that defines the interaction between the users,
items and tags (Kolda and Bader, 2009), can be calculated by multiplying the tensor 𝒴 by
the transposes of all the latent factor matrices:
𝒞 ∶= 𝒴 ×1 (𝑀(1))′ ×2 (𝑀(2))′ ×3 (𝑀(3))′ (4.3)
Afterwards, the predicted preference score 𝑦̂𝑢,𝑖,𝑡, a score that reflects the preference
level of a user 𝑢 for annotating an item 𝑖 using a tag 𝑡, is calculated as:
$$\hat{y}_{u,i,t} := \sum_{f_1=1}^{F} \sum_{f_2=1}^{F} \sum_{f_3=1}^{F} c_{f_1,f_2,f_3} \cdot m^{(1)}_{u,f_1} \cdot m^{(2)}_{i,f_2} \cdot m^{(3)}_{t,f_3} = [\![\mathcal{C}; M^{(1)}, M^{(2)}, M^{(3)}]\!] \tag{4.4}$$
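A single predicted score can be computed directly from the core tensor and one row of each factor matrix, without reconstructing the whole tensor. The sketch below is a plain triple loop over the latent dimensions; the function name and the numeric values are hypothetical.

```python
def tucker_score(core, m1, m2, m3, u, i, t):
    """Predicted score for (u, i, t): core entries weighted by the three factor rows."""
    f1, f2, f3 = len(core), len(core[0]), len(core[0][0])
    return sum(core[a][b][c] * m1[u][a] * m2[i][b] * m3[t][c]
               for a in range(f1) for b in range(f2) for c in range(f3))

# Illustrative factors with F = 2 and a single user, item and tag.
m1, m2, m3 = [[1, 2]], [[3, 4]], [[5, 6]]

# A diagonal core (the CP special case): only c[f][f][f] is non-zero.
diag_core = [[[1, 0], [0, 0]], [[0, 0], [0, 2]]]
cp_like = tucker_score(diag_core, m1, m2, m3, 0, 0, 0)

# A dense core of ones: every combination of latent factors contributes.
ones_core = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]
dense = tucker_score(ones_core, m1, m2, m3, 0, 0, 0)
```

With the diagonal core only matching latent indices interact, whereas the dense core lets every triple of latent factors contribute.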
Definition 4.1.
The 𝑛-mode (matrix) product, denoted by ×𝑛, is a multiplication operation of a
tensor 𝒴 ∈ ℝ𝑄×𝑅×𝑆 with a matrix 𝑀 ∈ ℝ𝐷×𝐹 in mode-𝑛. This operation is
equivalent to multiplying the matrix 𝑀 by the appropriate tensor mode-𝑛
matricization 𝑌(𝑛) (Kolda and Bader, 2009):
$$\hat{\mathcal{Y}}_n := \mathcal{Y} \times_n M \iff \hat{Y}_{(n)} := M Y_{(n)} \tag{4.5}$$
Definition 4.2.
The mode-𝑛 matricization of tensor 𝒴, denoted by 𝑌(𝑛), is the process of re-arranging
the tensor elements into a matrix (Kolda and Bader, 2009). For instance, a
tensor 𝒴 ∈ ℝ3×4×5 can be matricized in three ways, i.e. 𝑌(1) ∈
ℝ3×20, 𝑌(2) ∈ ℝ4×15, and 𝑌(3) ∈ ℝ5×12, as illustrated in Figure 4.5.
[Figure 4.5: tensor 𝒴 ∈ ℝ3×4×5 shown as one slice per user (rows: items; columns: tags), followed by its three matricizations]
User 1:
a m y K W
b n z L X
c o A M Y
d p B N Z
User 2:
e q C O aa
f r D P bb
g s E Q cc
h t F R dd
User 3:
i u G S ee
j v H T ff
k w I U gg
l x J V hh
𝑌(1) =
a b c d m n o p y z A B K L M N W X Y Z
e f g h q r s t C D E F O P Q R aa bb cc dd
i j k l u v w x G H I J S T U V ee ff gg hh
𝑌(2) =
a e i m q u y C G K O S W aa ee
b f j n r v z D H L P T X bb ff
c g k o s w A E I M Q U Y cc gg
d h l p t x B F J N R V Z dd hh
𝑌(3) =
a b c d e f g h i j k l
m n o p q r s t u v w x
y z A B C D E F G H I J
K L M N O P Q R S T U V
W X Y Z aa bb cc dd ee ff gg hh
Figure 4.5. Example of three ways matricization of a tensor 𝒴 ∈ ℝ3×4×5
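The re-arrangement shown in Figure 4.5 can be sketched as an index mapping. The code below is an illustration only: the column orderings in `COLUMN_ORDER` are an assumption chosen so that the output matches the figure (other conventions permute the columns), and the function and variable names are hypothetical.

```python
import string

# Modes are zero-based: 0 = user, 1 = item, 2 = tag. For each unfolding mode,
# the first listed remaining mode varies fastest along the columns
# (assumed ordering, chosen to reproduce Figure 4.5).
COLUMN_ORDER = {0: (1, 2), 1: (0, 2), 2: (1, 0)}

def unfold(tensor, dims, mode):
    """Mode-n matricization of a third-order tensor stored as tensor[u][i][t]."""
    fast, slow = COLUMN_ORDER[mode]
    rows = [[None] * (dims[fast] * dims[slow]) for _ in range(dims[mode])]
    for u in range(dims[0]):
        for i in range(dims[1]):
            for t in range(dims[2]):
                idx = (u, i, t)
                col = idx[fast] + dims[fast] * idx[slow]
                rows[idx[mode]][col] = tensor[u][i][t]
    return rows

# The 60 labels of Figure 4.5: a..z, A..Z, aa..hh.
labels = (list(string.ascii_lowercase) + list(string.ascii_uppercase)
          + [c * 2 for c in "abcdefgh"])
dims = (3, 4, 5)
tensor = [[[labels[t * 12 + u * 4 + i] for t in range(5)]
           for i in range(4)] for u in range(3)]

Y1, Y2, Y3 = (unfold(tensor, dims, n) for n in range(3))
```

Each unfolding keeps all 60 elements; only their arrangement into rows and columns changes.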
4.2.3.2 Latent Factors Generation
Latent factors generation, via tensor factorization, is the process of deriving the latent
relationships between dimensions of the tensor model. The latent factors matrices,
𝑀(1), 𝑀(2), and 𝑀(3) corresponding to each dimension of tensor 𝒴, are generated by
optimizing the objective function of the tensor model. Given Equation (4.2), the
objective function can be formulated as (Kolda and Bader, 2009):
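$$L(\Theta) := \sum_{u \in U} \sum_{i \in I} \sum_{t \in T} \left[ y_{u,i,t} - \hat{y}_{u,i,t} \right]^2 = \left[ \mathcal{Y} - [\![\mathcal{C}; M^{(1)}, M^{(2)}, M^{(3)}]\!] \right]^2 \tag{4.6}$$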
Note that the constant coefficient 1/(𝑄𝑅𝑆) in 𝑀𝑆𝐸 can be neglected in Equation
(4.6) since it has no influence on the optimization. TRPR implements the Alternating
Least Squares (ALS) approach (Kolda and Bader, 2009) to optimize the objective
function in Equation (4.6). The ALS approach updates one matrix of 𝑀(1), 𝑀(2), and
𝑀(3) at a time while fixing the other two, as follows:
𝑀(1) = 𝒴 ×2 (𝑀(2))′ ×3 (𝑀(3))′ (4.7)
𝑀(2) = 𝒴 ×1 (𝑀(1))′ ×3 (𝑀(3))′ (4.8)
𝑀(3) = 𝒴 ×1 (𝑀(1))′ ×2 (𝑀(2))′ (4.9)
This process is repeated until the convergence criterion is satisfied, i.e. until a
given number of iterations is reached (Carroll and Chang, 1970; Harshman, 1970; Kolda and Bader,
2009). The suggested number of iterations is no more than 50 (Bader et al., 2012).
Figure 4.6 shows the learning algorithm used in TRPR.
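The alternating updates can be sketched with NumPy as follows. This is a minimal illustration, not the thesis implementation: it assumes NumPy is available, initialises the factors from the leading left singular vectors of each unfolding (the algorithm only says "Initialize"), and uses hypothetical function names.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n matricization: rows index the chosen mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """n-mode (matrix) product: apply `matrix` (new_dim x old_dim) along axis `mode`."""
    moved = np.moveaxis(tensor, mode, -1)   # bring the target axis last
    return np.moveaxis(moved @ matrix.T, -1, mode)

def leading_left_vectors(matrix, F):
    """F leading left singular vectors of a matrix."""
    return np.linalg.svd(matrix, full_matrices=False)[0][:, :F]

def learn_factors(Y, F, iter_max=50):
    # Assumed initialisation: leading singular vectors of each unfolding of Y.
    factors = [leading_left_vectors(unfold(Y, n), F) for n in range(3)]
    for _ in range(iter_max):
        for n in range(3):
            # Project Y onto the other two factors (Equations 4.7 to 4.9) ...
            W = Y
            for k in range(3):
                if k != n:
                    W = mode_product(W, factors[k].T, k)
            # ... and keep the F leading left singular vectors in mode n.
            factors[n] = leading_left_vectors(unfold(W, n), F)
    core = Y
    for k in range(3):                       # Equation (4.3)
        core = mode_product(core, factors[k].T, k)
    return core, factors

rng = np.random.default_rng(0)
Y = (rng.random((3, 4, 5)) < 0.2).astype(float)   # illustrative sparse binary tensor
core, factors = learn_factors(Y, F=2, iter_max=5)
```

The column order used by `unfold` here differs from Figure 4.5, which does not matter for this purpose because left singular vectors are unaffected by a permutation of columns.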
Algorithm: TRPR Learning
Input: Training set 𝐷𝑡𝑟𝑎𝑖𝑛 ∶= 𝑈 × 𝐼 × 𝑇, the size of latent factor 𝐹, maximal iteration 𝑖𝑡𝑒𝑟𝑀𝑎𝑥
Output: Latent factor matrices 𝑀(1), 𝑀(2), 𝑀(3) and the core tensor 𝒞
1. Construct the initial tensor model with the tagging data
   𝑄 = |𝑈|, 𝑅 = |𝐼|, 𝑆 = |𝑇|, 𝒴 ∈ ℝ𝑄×𝑅×𝑆
   Populate 𝒴 using Equation (4.1)
2. Apply a factorization technique to tensor 𝒴 to get:
   a. The latent factors:
      Initialize 𝑀(1)(0) ∈ ℝ𝑄×𝐹, 𝑀(2)(0) ∈ ℝ𝑅×𝐹, 𝑀(3)(0) ∈ ℝ𝑆×𝐹, ℎ = 0
      repeat
         𝑀(1) ← 𝐹 leading left singular vectors of Equation (4.7)
         𝑀(2) ← 𝐹 leading left singular vectors of Equation (4.8)
         𝑀(3) ← 𝐹 leading left singular vectors of Equation (4.9)
         ℎ++
      until ℎ ≥ 𝑖𝑡𝑒𝑟𝑀𝑎𝑥
   b. The core tensor:
      𝒞 ← 𝒴 ×1 (𝑀(1))′ ×2 (𝑀(2))′ ×3 (𝑀(3))′
      where 𝒞 ∈ ℝ𝐹×𝐹×𝐹
Figure 4.6. The TRPR learning algorithm, adapted from (Kutty et al., 2012)
4.2.4 Recommendation Generation
This section details how TRPR generates the Top-𝑁 list of item recommendations for
each user. Existing methods rank the candidate items based on the maximum value
of 𝑦̂𝑢,𝑖,𝑡 for each (𝑢, 𝑖) ∈ 𝐴\𝐴𝑜𝑏 of the sets in the reconstructed tensor 𝒴̂ (Kutty et al.,
2012; Nanopoulos, 2011; Rafailidis and Daras, 2013; Symeonidis et al., 2010). These
approaches fail to consider the user’s tag usage history in the initial tensor 𝒴 as they
solely generate the recommendations using the level of user preference for an item,
based on a tag only. TRPR approaches the problem of item recommendation as a
classification problem, making Naïve Bayes (Baker and McCallum, 1998) apt for
finding an efficient solution (Lops et al., 2011). In this case, the list of
recommendations is ranked and generated based on the probability score of a user to
select candidate items, in which the user tag usage history (or preference) is taken
into account.
4.2.4.1 Tensor Reconstruction
Tensor reconstruction is the process of revealing new entries that are inferred from
the latent factors. The reconstructed tensor 𝒴̂ is derived by multiplying the core
tensor by all latent factor matrices:
𝒴̂ ∶= 𝒞 ×1 𝑀(1) ×2 𝑀(2) ×3 𝑀(3) (4.10)
Implementing the general 𝑛-mode matrix product for reconstructing a tensor on
a large dataset is expensive and can cause memory overflow. The problem becomes worse
in the last step of multiplication, where the effects of earlier latent factors have been
included. TRPR proposes a memory-efficient loop approach, i.e. the block-striped
(matrix) product approach, to solve the problem of the last step of multiplication.
In TRPR, the 1-mode and 2-mode (matrix) products are implemented to
multiply the core tensor 𝒞 by the reduced factor matrices 𝑀(1) and 𝑀(2), in order to
obtain the intermediate tensor results 𝒴̂1 and 𝒴̂2 sequentially. To multiply 𝒴̂2 by the
third reduced factor matrix 𝑀(3), a 3-mode block-striped (matrix) product is
implemented for the purpose of memory efficiency. Block-striping the
matrix 𝑀(3) splits the multiplication into subtasks whose intermediate results
can fit in the available memory. In this case, the multiplication task between the
mode-3 matricization 𝑌̂2(3) (an equivalent form of tensor 𝒴̂2) and 𝑀(3) is split into 𝑁
subtasks, where 𝑁 ∶= 𝑆 𝑑𝑖𝑣 𝑏, 𝑆 is the size of the set of tags, and 𝑏 is a user-given
block-strip row size (𝑏 ≪ 𝑆). This means that a matrix (𝑀(3))𝑛, where |(𝑀(3))𝑛| = 𝑏,
is obtained and multiplied by 𝑌̂2(3) at each subtask; when 𝑆 is not a multiple of 𝑏, one
additional subtask handles the remaining 𝑆 𝑚𝑜𝑑 𝑏 rows. Finally, the complete
reconstructed tensor 𝒴̂ is achieved by combining all subtask results. Figure 4.7
shows the tensor reconstruction algorithm used in TRPR.
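The partitioning of the last multiplication can be sketched as a generator that produces one strip of tags at a time. This is a simplified illustration with nested lists, not the thesis implementation; the function names and numeric values are hypothetical, and the intermediate tensor 𝒴̂2 is assumed to be already available as `y2`.

```python
def mode3_strips(y2, m3, b):
    """Yield (start, block), where block[u][i] holds the scores of one strip of tags."""
    for start in range(0, len(m3), b):
        strip = m3[start:start + b]  # at most b rows of M(3); the last strip may be shorter
        block = [[[sum(a * c for a, c in zip(fibre, row)) for row in strip]
                  for fibre in user] for user in y2]
        yield start, block

def reconstruct(y2, m3, b):
    """Combine all strips into the full Q x R x S tensor."""
    n_tags = len(m3)
    y_hat = [[[0.0] * n_tags for _ in user] for user in y2]
    for start, block in mode3_strips(y2, m3, b):
        for u, user in enumerate(block):
            for i, scores in enumerate(user):
                y_hat[u][i][start:start + len(scores)] = scores
    return y_hat

y2 = [[[1, 2], [3, 4]]]                         # Q = 1, R = 2, F = 2 (illustrative)
m3 = [[1, 0], [0, 1], [1, 1], [2, 0], [0, 3]]   # S = 5, F = 2 (illustrative)
y_hat = reconstruct(y2, m3, b=2)
strip_sizes = [len(block[0][0]) for _, block in mode3_strips(y2, m3, 2)]
```

With 𝑆 = 5 and 𝑏 = 2 the strips cover two, two and one tags. A caller that only needs per-strip results could consume `mode3_strips` directly and never hold the full tensor in memory.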
Algorithm: TRPR Tensor Reconstruction
Input: Latent factor matrices 𝑀(1), 𝑀(2), 𝑀(3), core tensor 𝒞, 𝑄 = |𝑈|, 𝑅 = |𝐼|, 𝑆 = |𝑇|, block-strip row size 𝑏 where 𝑏 ≪ 𝑆; 𝑁 ∶= 𝑆 𝑑𝑖𝑣 𝑏 and 𝑑 ∶= 𝑆 𝑚𝑜𝑑 𝑏.
Output: Reconstructed tensor 𝒴̂
1. 1-mode (matrix) product:
   𝒴̂1 ← 𝒞 ×1 𝑀(1)
   where 𝒴̂1 ∈ ℝ𝑄×𝐹×𝐹
2. 2-mode (matrix) product:
   𝒴̂2 ← 𝒴̂1 ×2 𝑀(2)
   where 𝒴̂2 ∈ ℝ𝑄×𝑅×𝐹
3. Block-striped 3-mode (matrix) product:
   for 𝑛 ← 1 to 𝑁
      (𝑀(3))𝑛 ← rows (𝑛−1)𝑏+1 to 𝑛𝑏 of 𝑀(3)
      where (𝑀(3))𝑛 ∈ ℝ𝑏×𝐹
      /* 3-mode (matrix) product: */
      (𝒴̂3)𝑛 ← 𝒴̂2 ×3 (𝑀(3))𝑛 ⟺ (𝑌̂3(3))𝑛 ← (𝑀(3))𝑛 𝑌̂2(3)
      where (𝑌̂3(3))𝑛 ∈ ℝ𝑏×𝑄𝑅
      (𝒴̂3)𝑛 ← mode-3 de-matricization of (𝑌̂3(3))𝑛
      where (𝒴̂3)𝑛 ∈ ℝ𝑄×𝑅×𝑏
      𝒴̂3 ← 𝒴̂3 + (𝒴̂3)𝑛
   end for
   if 𝑑 ≠ 0 then
      𝑛 ← 𝑛 + 1
      (𝑀(3))𝑛 ← rows (𝑛−1)𝑏+1 to 𝑆 of 𝑀(3)
      where (𝑀(3))𝑛 ∈ ℝ𝑑×𝐹
      /* 3-mode (matrix) product: */
      (𝒴̂3)𝑛 ← 𝒴̂2 ×3 (𝑀(3))𝑛 ⟺ (𝑌̂3(3))𝑛 ← (𝑀(3))𝑛 𝑌̂2(3)
      where (𝑌̂3(3))𝑛 ∈ ℝ𝑑×𝑄𝑅
      (𝒴̂3)𝑛 ← mode-3 de-matricization of (𝑌̂3(3))𝑛
      where (𝒴̂3)𝑛 ∈ ℝ𝑄×𝑅×𝑑
      𝒴̂3 ← 𝒴̂3 + (𝒴̂3)𝑛
   end if
   𝒴̂ ← 𝒴̂3 where 𝒴̂ ∈ ℝ𝑄×𝑅×𝑆
Figure 4.7. The TRPR tensor reconstruction algorithm
Example 4.2: TRPR Tensor Reconstruction.
An example of tensor reconstruction is illustrated in Figure 4.8 by using the toy
example of a third-order tensor 𝒴 ∈ ℝ3×4×5 shown in Figure 4.2. Applying a
factorization technique with 𝐹 = 2 as the reduction size to 𝒴 results in three latent
factor matrices and one core tensor, 𝑀(1) ∈ ℝ3×2, 𝑀(2) ∈ ℝ4×2, 𝑀(3) ∈ ℝ5×2, and
𝒞 ∈ ℝ2×2×2. The reconstructed tensor 𝒴̂ is derived by multiplying all factorized
elements together. The intermediate tensor 𝒴̂2 ∈ ℝ3×4×2 is obtained by
implementing the 1-mode and 2-mode (matrix) products sequentially on the core
tensor 𝒞 and the latent factor matrices 𝑀(1) and 𝑀(2).
For the purpose of memory efficiency, the parallel matrix multiplication based on the
row-wise block-striped matrix product is applied to multiply 𝒴̂2 by the last factor
matrix 𝑀(3). In this case, the 3-mode (matrix) product of tensor 𝒴̂2 by matrix 𝑀(3)
(denoted as 𝒴̂2 ×3 𝑀(3)) is converted into the multiplication of matrix 𝑀(3) by 𝑌̂2(3),
the mode-3 matricization of tensor 𝒴̂2 (denoted as 𝑀(3)𝑌̂2(3)).
As shown in Figure 4.8, the 3-mode (matrix) product of tensor 𝒴̂2 by matrix 𝑀(3) is
split, by choosing 𝑏 = 2, into three 3-mode block-striped (matrix) products, resulting in
(𝑌̂3(3))1 ∈ ℝ2×12, (𝑌̂3(3))2 ∈ ℝ2×12, and (𝑌̂3(3))3 ∈ ℝ1×12. The full reconstructed
tensor is derived by combining the mode-3 de-matricization of the three resulting
3-mode block-striped (matrix) products, i.e. 𝒴̂3 ← (𝒴̂3)1 + (𝒴̂3)2 + (𝒴̂3)3, where
𝒴̂ ← 𝒴̂3 and 𝒴̂ ∈ ℝ3×4×5.
[Figure 4.8: diagram of the reconstruction steps: 1-mode (matrix) product; 2-mode (matrix) product; 3-mode (matrix) product and its equivalent 3-mode (matrix) product with matricization; memory efficient approach: 3-mode block-striped (matrix) product; combining results into the full reconstructed tensor, shown as one slice per user (User 1, User 2, User 3; axes: item, tag)]
Figure 4.8. Example of tensor reconstruction process by implementing the memory efficient approach
where 𝒴 ∈ ℝ3×4×5, 𝑄 = 3, 𝑅 = 4, 𝑆 = 5, 𝐹 = 2, and 𝑏 = 2
The reconstructed tensor 𝒴̂ identifies the new entries that are inferred from the
latent factors. It is to be noted that, compared to the initial tensor in Figure 4.2, the
relevance grade 𝑦𝑢,𝑖,𝑡 in 𝒴 has been recalculated as predicted preference score 𝑦̂𝑢,𝑖,𝑡
in 𝒴̂, as illustrated in Figure 4.9. The predicted preference score 𝑦̂𝑢,𝑖,𝑡 represents the
likeliness of user 𝑢 to tag item 𝑖 with tag 𝑡. As the system is trying to recommend
items that have not been selected by the users, the list of item recommendations for
each user will be selected based on (𝑢, 𝑖) ∈ 𝐴\𝐴𝑜𝑏 sets.
[Figure 4.9: one 4 × 5 slice per user (rows: items 𝑖1 to 𝑖4; columns: tags 𝑡1 to 𝑡5)]
User 1:
-0.0494 0.1359 -0.1249 0.1102 -0.0186
0.7318 -0.0332 0.6034 0.1537 -0.0215
0.5577 0.2997 0.3170 0.4369 0.1257
-0.0232 0.1664 -0.0445 0.1928 0.1199
User 2:
0.7319 -0.0186 0.5665 0.1602 -0.0302
-0.0604 0.1609 -0.1961 0.1202 -0.0329
0.5335 0.4453 0.1817 0.6126 0.1404
-0.0071 0.2011 -0.0424 0.2476 0.1312
User 3:
0.1037 -0.0308 0.1150 -0.0443 -0.0092
0.1037 -0.0343 0.1163 -0.0495 -0.0102
0.1683 0.4882 -0.1216 0.6666 0.2043
0.1205 0.2469 -0.0491 0.3140 0.1400
Figure 4.9. Example of the reconstructed tensor 𝒴̂ ∈ ℝ3×4×5
4.2.4.2 Candidate Item and Tag Preference Sets Generation
As previously described, TRPR attempts to rank and generate the list of
recommendations based on the probability score of a user 𝑢 to select items by taking
into account the user’s tag usage history. For this reason, two sets are created, i.e. the
candidate item set and the tag preference set, for each user 𝑢. The candidate item set,
𝑍𝑢 = {𝑖1, 𝑖2, 𝑖3, … , 𝑖𝑅} where 𝑍𝑢 ⊆ 𝐼 with |𝑍𝑢| ≤ 𝑅 and 𝑅 is the size of the set of items,
is a list of items that the user 𝑢 might be interested in based on (𝑢, 𝑖) ∈ 𝐴\𝐴𝑜𝑏 sets.
Note that setting a user-defined threshold to the entries in 𝒴̂ is necessary in order to
determine whether the items can be considered for recommendations (Kutty et al.,
2012). The tag preference set, 𝑋𝑢 = {𝑡1, 𝑡2, 𝑡3, … , 𝑡𝑣} where 𝑋𝑢 ⊆ 𝑇 with |𝑋𝑢| ≤ 𝑣,
𝑣 is the size of the tag preference set, and 𝑣 ≤ 𝑆 with 𝑆 as the size of the set of tags, is
a list of tags that user 𝑢 has used to annotate the items. The tag preference set is
generated based on the maximum values of 𝑦̂𝑢,𝑖,𝑡 on each (𝑢, 𝑖) ∈ 𝐴 set.
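One possible reading of these two definitions is sketched below. It assumes that a candidate item is a non-annotated item whose best predicted score exceeds the threshold, and that the tag preference set holds the 𝑣 tags with the highest maximum predicted score over the user's items. Both rules are interpretations, the function names are hypothetical, and the scores are the User 1 values of Figure 4.9 read with items as rows and tags as columns (itself an assumption about the figure layout).

```python
def candidate_items(y, y_hat, u, threshold=0.0):
    """Items user u has not annotated whose best predicted score exceeds the threshold."""
    return [i for i in range(len(y[u]))
            if not any(y[u][i]) and max(y_hat[u][i]) > threshold]

def tag_preferences(y_hat, u, v):
    """The v tags with the highest maximum predicted score over the items of user u."""
    n_tags = len(y_hat[u][0])
    best = [max(item[t] for item in y_hat[u]) for t in range(n_tags)]
    ranked = sorted(range(n_tags), key=lambda t: best[t], reverse=True)
    return sorted(ranked[:v])

# One user; zero-based indices. The user has annotated the second and third items
# (illustrative binary profile).
y = [[[0, 0, 0, 0, 0],
      [1, 0, 1, 0, 0],
      [0, 0, 0, 1, 0],
      [0, 0, 0, 0, 0]]]
y_hat = [[[-0.0494, 0.1359, -0.1249, 0.1102, -0.0186],
          [0.7318, -0.0332, 0.6034, 0.1537, -0.0215],
          [0.5577, 0.2997, 0.3170, 0.4369, 0.1257],
          [-0.0232, 0.1664, -0.0445, 0.1928, 0.1199]]]

Z = candidate_items(y, y_hat, u=0)      # zero-based item indices
X = tag_preferences(y_hat, u=0, v=3)    # zero-based tag indices
```

Under these assumptions the sketch selects the first and fourth items as candidates and the first, third and fourth tags as preferences.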
4.2.4.3 Top-𝑁 Item Recommendation Generation via Probabilistic Ranking
The probabilistic ranking approach calculates the probability of users to select items
in 𝑍𝑢 by observing the previous usage activities of tag preference set 𝑋𝑢 in 𝒴.
Bayes’ theorem is used for predicting the candidate items in 𝑍𝑢 that have the
highest posterior probability given 𝑋𝑢, 𝑝(𝑍𝑢|𝑋𝑢). The conditional probability can be
formulated as (McCallum and Nigam, 1998):
$$p(Z_u|X_u) = \frac{p(Z_u)\, p(X_u|Z_u)}{p(X_u)} \tag{4.11}$$
where 𝑝(𝑍𝑢) is the prior distribution of 𝑍𝑢 before 𝑋𝑢 is
observed; 𝑝(𝑋𝑢|𝑍𝑢) is the probability of observing tag preference set 𝑋𝑢 given 𝑍𝑢;
and 𝑝(𝑋𝑢) is the probability of observing 𝑋𝑢.
TRPR generates the Top-𝑁 list of recommendations for target user 𝑢 by
implementing the assumption of multinomial event model distribution for the Naïve
Bayes classifier, i.e. assuming that an item 𝑖𝑑 ∈ 𝑍𝑢 is represented by the number of
occurrences of 𝑡𝑐 ∈ 𝑋𝑢 (McCallum and Nigam, 1998). In this case, the posterior
probability 𝑝𝑢,𝑖𝑑 of user 𝑢 with tag preference 𝑋𝑢 for candidate item 𝑖𝑑 ∈ 𝑍𝑢 is
obtained by multiplying the prior probability of 𝑖𝑑, 𝑝(𝑍𝑢 = 𝑖𝑑), with the probability
of tag preference 𝑡𝑐 ∈ 𝑋𝑢 given 𝑖𝑑, 𝑝(𝑡𝑐|𝑍𝑢 = 𝑖𝑑):
$$p_{u,i_d} := p(i_d|X_u) = p(Z_u = i_d) \prod_{c=1}^{|X_u|} p(t_c|Z_u = i_d)^{\left(\left(\sum_{i=1}^{R} y_{u,i,t_c}\right)+1\right)} \tag{4.12}$$
where 𝑦𝑢,𝑖,𝑡𝑐 denotes the binary relevance grade for user 𝑢 who has used tag
preference 𝑡𝑐 to annotate item 𝑖. The 𝑝(𝑍𝑢 = 𝑖𝑑) and 𝑝(𝑡𝑐|𝑍𝑢 = 𝑖𝑑) are calculated
as:
$$p(Z_u = i_d) = \frac{\sum_{u=1}^{Q} \mathbb{I}((u,i_d,*) \in A_{ob})}{\sum_{i=1}^{R} \sum_{u=1}^{Q} \mathbb{I}((u,i,*) \in A_{ob})} \tag{4.13}$$
$$p(t_c|Z_u = i_d) = \frac{1 + \sum_{u=1}^{Q} y_{u,i_d,t_c}}{S + \sum_{u=1}^{Q} \sum_{t=1}^{S} y_{u,i_d,t}} \tag{4.14}$$
where 𝕀(∙) is the indicator function, which is equal to 1 if the condition is
satisfied, and 0 otherwise. Recall that 𝑄, 𝑅, 𝑆 are the number of users, items, and tags,
respectively. To avoid zero values resulting from Equation (4.12) and Equation
(4.14), a Laplacean estimate (Lops et al., 2011) is applied as a smoothing method by
adding one to those equations.
For the target user 𝑢, the list of Top-𝑁 item recommendations is an ordered set
of 𝑁 items, 𝑇𝑜𝑝𝑁𝑢, obtained by sorting the 𝑝𝑢,𝑖𝑑 of user’s candidate items in
descending order. Figure 4.10 describes the probabilistic ranking algorithm for
generating the Top-𝑁 list of item recommendation.
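The prior, the smoothed tag likelihood and the posterior score can be sketched as below. This is a minimal illustration operating on the binary tensor as nested lists; the function names and the tiny dataset are hypothetical, and `heapq.nlargest` stands in for the explicit list maintenance of the algorithm.

```python
import heapq

def prior(y, item):
    """Share of observed (user, item) pairs that involve `item` (Equation 4.13)."""
    pairs = [[any(tags) for tags in user] for user in y]
    total = sum(sum(row) for row in pairs)
    return sum(row[item] for row in pairs) / total

def tag_likelihood(y, item, tag):
    """Laplace-smoothed probability of `tag` given `item` (Equation 4.14)."""
    n_tags = len(y[0][0])
    tag_count = sum(user[item][tag] for user in y)
    all_count = sum(sum(user[item]) for user in y)
    return (1 + tag_count) / (n_tags + all_count)

def posterior(y, u, item, tag_prefs):
    """Prior times likelihoods raised to the user's tag usage plus one (Equation 4.12)."""
    score = prior(y, item)
    for t in tag_prefs:
        usage = sum(y[u][i][t] for i in range(len(y[u])))
        score *= tag_likelihood(y, item, t) ** (usage + 1)
    return score

def top_n(y, u, candidates, tag_prefs, n):
    """The n candidate items with the highest posterior score, best first."""
    return heapq.nlargest(n, candidates, key=lambda i: posterior(y, u, i, tag_prefs))

# Illustrative data: 2 users, 3 items, 2 tags (zero-based indices).
y = [
    [[1, 0], [0, 0], [0, 0]],   # user 0 annotated item 0 with tag 0
    [[0, 0], [1, 1], [0, 1]],   # user 1 annotated item 1 (tags 0, 1) and item 2 (tag 1)
]
p_item1 = posterior(y, u=0, item=1, tag_prefs=[0])
p_item2 = posterior(y, u=0, item=2, tag_prefs=[0])
best = top_n(y, u=0, candidates=[1, 2], tag_prefs=[0], n=1)
```

In this toy case the second item scores higher for the first user because other users have annotated it with that user's preferred tag.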
Algorithm: Probabilistic Ranking for Top-𝑵 Item Recommendation Generation
Input: Initial tensor 𝒴, Reconstructed tensor 𝒴̂, Tag preference size 𝑣, Number of Recommendations 𝑁
Output: The list of 𝑁 items: 𝑇𝑜𝑝𝑁𝑢
For each target user 𝑢:
  1. Generate the candidate item set:
       𝑍𝑢 = {𝑖1, 𝑖2, 𝑖3, … , 𝑖𝑅} with |𝑍𝑢| ≤ 𝑅
       where 𝑍𝑢 ← {𝑖 | (𝑢, 𝑖, ∗) ∈ 𝐴\𝐴𝑜𝑏}   /* non-observed items */
  2. Generate the tag preference set:
       𝑋𝑢 = {𝑡1, 𝑡2, 𝑡3, … , 𝑡𝑣} such that |𝑋𝑢| ≤ 𝑣
       where each tag preference is derived based on 𝑚𝑎𝑥(𝑦̂𝑢,𝑖,𝑡) for (𝑢, 𝑖) ∈ 𝐴
  3. Calculate the posterior probability of each item in 𝑍𝑢 and use the value for generating the Top-𝑁 item recommendation:
       𝐷 ← ∅, 𝐿𝑖𝑠𝑡𝑃 ← ∅
       /* initialize 𝐿𝑖𝑠𝑡𝑃 using the first 𝑁 posterior values of 𝑍𝑢 */
       for 𝑑 ← 1 to 𝑁
         𝑝𝑢,𝑖𝑑 ← 𝑝(𝑖𝑑|𝑋𝑢)
         𝐿𝑖𝑠𝑡𝑃 ← 𝐿𝑖𝑠𝑡𝑃 ∪ 𝑝𝑢,𝑖𝑑
         𝐷 ← 𝐷 ∪ 𝑑
       end for
       /* update 𝐿𝑖𝑠𝑡𝑃 */
       for 𝑑 ← (𝑁 + 1) to |𝑍𝑢|
         𝑝𝑢,𝑖𝑑 ← 𝑝(𝑖𝑑|𝑋𝑢)
         if 𝑝𝑢,𝑖𝑑 > (min 𝐿𝑖𝑠𝑡𝑃) then
           𝐿𝑖𝑠𝑡𝑃 ← 𝐿𝑖𝑠𝑡𝑃 − (min 𝐿𝑖𝑠𝑡𝑃)
           𝐷 ← 𝐷 − 𝑑𝑚𝑖𝑛
           𝐿𝑖𝑠𝑡𝑃 ← 𝐿𝑖𝑠𝑡𝑃 ∪ 𝑝𝑢,𝑖𝑑
           𝐷 ← 𝐷 ∪ 𝑑
         end if
       end for
       𝑇𝑜𝑝𝑁𝑢 ← {𝑖𝑑 ∈ 𝑍𝑢 | 𝑑 ∈ 𝐷}
Figure 4.10. The probabilistic ranking for Top-𝑁 item recommendation generation algorithm
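Step 3 of the algorithm keeps only the 𝑁 best posterior values seen so far. One way to sketch this in Python is with a bounded min-heap; the function name `top_n_items` and the dictionary input are assumptions made for this illustration, and the heap is a substitute for the 𝐿𝑖𝑠𝑡𝑃 and 𝐷 sets rather than a transcription of them.

```python
import heapq


def top_n_items(scores, n):
    """Return the n items with the highest posterior score, best first.

    `scores` maps each candidate item to its posterior probability.
    A min-heap of size n plays the role of ListP: the smallest kept
    score is evicted whenever a larger one arrives.
    """
    if n <= 0:
        return []
    heap = []  # entries are (score, item); heap[0] is the current minimum
    for item, score in scores.items():
        if len(heap) < n:
            heapq.heappush(heap, (score, item))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, item))
    ranked = sorted(heap, key=lambda entry: entry[0], reverse=True)
    return [item for _score, item in ranked]
```

Each candidate costs at most one heap operation, so the memory used stays proportional to 𝑁 rather than to |𝑍𝑢|.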
Example 4.3: Probabilistic Ranking.
An example of how to calculate the posterior probability score, using the toy
example, is illustrated by first representing the entries of the initial tensor 𝒴 in
Figure 4.2 and the reconstructed tensor 𝒴̂ in Figure 4.9 as tables in Figure 4.11(a)
and (b) respectively. Note that the negative and zero entries are disregarded,
which leaves 38 positive entries out of a total of 60 entries in 𝒴̂. From Figure
4.11(b), it can be observed that the tensor reconstruction process has recalculated
𝑦𝑢,𝑖,𝑡 in 𝒴 as continuous values 𝑦̂𝑢,𝑖,𝑡 in 𝒴̂.
Figure 4.11. Example of tensor model from toy dataset with only non-negative and non-zero values
displayed as table: (a) Initial tensor 𝒴 ∈ ℝ^{3×4×5}, and (b) Reconstructed tensor 𝒴̂ ∈ ℝ^{3×4×5}
Since the system is interested in recommending items, the process identifies
items that have not been selected by each target user based on the (𝑢, 𝑖) ∈ 𝐴\𝐴𝑜𝑏 sets.
As highlighted in Figure 4.11(b), 𝒴̂ identifies six newly observed (𝑢, 𝑖) sets in total for
𝑢1, 𝑢2, and 𝑢3, i.e. {(𝑢1, 𝑖1), (𝑢1, 𝑖4), (𝑢2, 𝑖2), (𝑢2, 𝑖4), (𝑢3, 𝑖1), (𝑢3, 𝑖2)}. The
candidate item set of each user, which needs to be ranked probabilistically, is derived
as 𝑍_{𝑢1} = {𝑖1, 𝑖4}, 𝑍_{𝑢2} = {𝑖2, 𝑖4}, and 𝑍_{𝑢3} = {𝑖1, 𝑖2}. By choosing 𝑣 = 3, the tag preference
sets of each user are derived as 𝑋_{𝑢1} = {𝑡1, 𝑡3, 𝑡4}, 𝑋_{𝑢2} = {𝑡1, 𝑡3, 𝑡4}, and
𝑋_{𝑢3} = {𝑡2, 𝑡4, 𝑡5}. Using Equation (4.12), the posterior probability scores of the candidate
items for each user are calculated as:
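$$p_{u_1,i_1} = \frac{1}{6} \left[ \left(\frac{1+1}{5+1}\right)^{1+1} \left(\frac{1+0}{5+1}\right)^{1+1} \left(\frac{1+0}{5+1}\right)^{1+1} \right] = 1.43\mathrm{e}{-05}$$
$$p_{u_1,i_4} = \frac{1}{6} \left[ \left(\frac{1+0}{5+2}\right)^{1+1} \left(\frac{1+0}{5+2}\right)^{1+1} \left(\frac{1+1}{5+2}\right)^{1+1} \right] = 0.57\mathrm{e}{-05}$$
$$p_{u_2,i_2} = \frac{1}{6} \left[ \left(\frac{1+1}{5+2}\right)^{1+1} \left(\frac{1+1}{5+2}\right)^{0+1} \left(\frac{1+0}{5+2}\right)^{1+1} \right] = 7.93\mathrm{e}{-05}$$
$$p_{u_2,i_4} = \frac{1}{6} \left[ \left(\frac{1+0}{5+2}\right)^{1+1} \left(\frac{1+0}{5+2}\right)^{0+1} \left(\frac{1+1}{5+2}\right)^{1+1} \right] = 3.97\mathrm{e}{-05}$$
$$p_{u_3,i_1} = \frac{1}{6} \left[ \left(\frac{1+0}{5+1}\right)^{1+1} \left(\frac{1+0}{5+1}\right)^{1+1} \left(\frac{1+0}{5+1}\right)^{1+1} \right] = 0.36\mathrm{e}{-05}$$
$$p_{u_3,i_2} = \frac{1}{6} \left[ \left(\frac{1+0}{5+1}\right)^{1+1} \left(\frac{1+0}{5+1}\right)^{1+1} \left(\frac{1+0}{5+1}\right)^{1+1} \right] = 0.36\mathrm{e}{-05}$$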
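The arithmetic of the first four scores can be checked with exact fractions. The snippet below is a small sketch for verification only; the helper name `score` is an assumption, and each (probability, exponent) pair is copied from the expressions above.

```python
from fractions import Fraction as Fr
from math import prod


def score(prior, factors):
    """Multiply a prior by smoothed likelihoods raised to their exponents."""
    return prior * prod(p ** e for p, e in factors)


prior = Fr(1, 6)
p_u1_i1 = score(prior, [(Fr(2, 6), 2), (Fr(1, 6), 2), (Fr(1, 6), 2)])
p_u1_i4 = score(prior, [(Fr(1, 7), 2), (Fr(1, 7), 2), (Fr(2, 7), 2)])
p_u2_i2 = score(prior, [(Fr(2, 7), 2), (Fr(2, 7), 1), (Fr(1, 7), 2)])
p_u2_i4 = score(prior, [(Fr(1, 7), 2), (Fr(1, 7), 1), (Fr(2, 7), 2)])

for name, value in [("p_u1_i1", p_u1_i1), ("p_u1_i4", p_u1_i4),
                    ("p_u2_i2", p_u2_i2), ("p_u2_i4", p_u2_i4)]:
    print(f"{name} = {float(value):.2e}")
```

Working with `Fraction` avoids rounding until the final conversion to a float, which makes the comparison with the two-decimal values in the text straightforward.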
Given the above posterior probability scores, the list of Top-𝑁 item
recommendations for each user can now be generated. It can be concluded that 𝑖1 is
more likely to interest 𝑢1 than 𝑖4, since 𝑝_{𝑢1,𝑖1} = 1.43e−05 > 𝑝_{𝑢1,𝑖4} = 0.57e−05, and that 𝑖2
is more likely to interest 𝑢2 than 𝑖4, since 𝑝_{𝑢2,𝑖2} = 7.93e−05 > 𝑝_{𝑢2,𝑖4} = 3.97e−05. On the
other hand, 𝑖1 and 𝑖2 are on the same level of interest for 𝑢3, as 𝑝_{𝑢3,𝑖1} = 𝑝_{𝑢3,𝑖2} = 0.36e−05.
As a result, 𝑇𝑜𝑝𝑁_{𝑢1}, 𝑇𝑜𝑝𝑁_{𝑢2} and 𝑇𝑜𝑝𝑁_{𝑢3} are generated in the order {𝑖1, 𝑖4},
{𝑖2, 𝑖4} and {𝑖1, 𝑖2}, respectively. These results differ from the conventional
tensor-based approaches (Kutty et al., 2012; Nanopoulos, 2011; Rafailidis and Daras,
2013; Symeonidis et al., 2010), which generate 𝑇𝑜𝑝𝑁_{𝑢1}, 𝑇𝑜𝑝𝑁_{𝑢2} and 𝑇𝑜𝑝𝑁_{𝑢3} in the
order {𝑖4, 𝑖1}, {𝑖4, 𝑖2} and {𝑖2, 𝑖1}, respectively. The orders of the
conventional approach follow from these conditions:
𝑚𝑎𝑥(𝑦̂_{𝑢1,𝑖1}) = 0.1359 < 𝑚𝑎𝑥(𝑦̂_{𝑢1,𝑖4}) = 0.1928, 𝑚𝑎𝑥(𝑦̂_{𝑢2,𝑖2}) = 0.1609 <
𝑚𝑎𝑥(𝑦̂_{𝑢2,𝑖4}) = 0.2476, and 𝑚𝑎𝑥(𝑦̂_{𝑢3,𝑖1}) = 0.1150 < 𝑚𝑎𝑥(𝑦̂_{𝑢3,𝑖2}) = 0.1163.
4.2.5 Empirical Evaluation
The proposed point-wise tensor-based TRPR method is evaluated and compared to
another point-wise tensor-based method, MAX (Symeonidis et al., 2010), and the
matrix-based method CTS (Kim et al., 2010). (Rendle and Schmidt-Thieme, 2010).
The variations of the TRPR and MAX methods are demonstrated with three commonly
used tensor factorization techniques (i.e. CP, HOOI, and HOSVD (Kolda and Bader,
2009)) using the Matlab Tensor Toolbox (Bader et al., 2012). The results for these
variations are presented as TRPR-CP, TRPR-HOOI, TRPR-HOSVD, and MAX-CP, MAX-HOOI,
MAX-HOSVD.
The experiments are conducted using 5-fold cross-validation.
For each fold, each dataset is randomly divided into a training set 𝐷𝑡𝑟𝑎𝑖𝑛 (80%) and a
test set 𝐷𝑡𝑒𝑠𝑡 (20%) based on the number of posts. 𝐷𝑡𝑟𝑎𝑖𝑛 and 𝐷𝑡𝑒𝑠𝑡 do not
overlap in posts, i.e., no triplet for a user-item set exists in 𝐷𝑡𝑟𝑎𝑖𝑛 if a triplet
(𝑢, 𝑖,∗) is present in 𝐷𝑡𝑒𝑠𝑡. The recommendation task is to predict and rank the
Top-𝑁 items for the users present in 𝐷𝑡𝑒𝑠𝑡. Performance is measured
using the F1-Score and reported as the average over all five runs.
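The F1-Score at a Top-𝑁 cut-off can be sketched as follows. The formula is not restated in this section, so the snippet assumes the standard definition (harmonic mean of precision and recall over the first 𝑁 recommended items); the function name `f1_at_n` and its arguments are illustrative only.

```python
def f1_at_n(recommended, relevant, n):
    """F1-Score of the first n recommended items against the relevant set."""
    top = recommended[:n]
    hits = len(set(top) & set(relevant))
    if hits == 0:
        return 0.0
    precision = hits / len(top)   # fraction of recommended items that are relevant
    recall = hits / len(relevant)  # fraction of relevant items that were recommended
    return 2 * precision * recall / (precision + recall)
```

For a user, `relevant` would be the items of that user's posts in 𝐷𝑡𝑒𝑠𝑡, and the per-user scores would then be averaged over users and folds.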
4.2.5.1 Choosing the Latent Factor Matrix Size 𝐹
The impact of the latent factor matrix size, 𝐹, on performance is investigated
in order to choose the size of 𝐹 used in the experiments.
Figure 4.12 shows the performance comparison of TRPR-CP on Delicious 10-core
with 𝐹 increasing from 8 to 256. It can be observed that the
recommendation quality does not benefit from 𝐹 larger than 128. For this reason, the
size of the latent factor matrix 𝐹 is set to 128 for all the tensor-based methods.
[Plot: F1-Score against Top-𝑁 (5 to 20) on Delicious 10-core for TRPR-CP with 𝐹 = 8, 16, 32, 64, 128, and 256.]
Figure 4.12. Performance comparison of TRPR-CP with an increasing number of 𝐹
4.2.5.2 Accuracy Performance
The recommendation accuracy of the proposed TRPR method is compared with that
of the benchmarking methods using the F1-Score at various Top-𝑁 positions.
Figure 4.13, Figure 4.14, Figure 4.15 and Figure 4.16 demonstrate that
TRPR outperforms the matrix-based method CTS and the conventional tensor-based
method MAX on the Delicious, LastFM, CiteULike and MovieLens datasets,
respectively.
[Plots: F1-Score (%) at F1@5, F1@10, F1@15, and F1@20 for MAX-CP, TRPR-CP, MAX-HOSVD, TRPR-HOSVD, MAX-HOOI, TRPR-HOOI, and CTS on (a) Delicious 10-core, (b) Delicious 15-core, and (c) Delicious 20-core.]
Figure 4.13. F1-Score at various Top-𝑁 positions on Delicious dataset
[Plots: F1-Score (%) at F1@5, F1@10, F1@15, and F1@20 for MAX-CP, TRPR-CP, MAX-HOSVD, TRPR-HOSVD, MAX-HOOI, TRPR-HOOI, and CTS on (a) LastFM 10-core, (b) LastFM 15-core, and (c) LastFM 20-core.]
Figure 4.14. F1-Score at various Top-𝑁 positions on LastFM dataset
[Plots: F1-Score (%) at F1@5, F1@10, F1@15, and F1@20 for MAX-CP, TRPR-CP, MAX-HOSVD, TRPR-HOSVD, MAX-HOOI, TRPR-HOOI, and CTS on (a) CiteULike 10-core, (b) CiteULike 15-core, and (c) CiteULike 20-core.]
Figure 4.15. F1-Score at various Top-𝑁 positions on CiteULike dataset
[Plots: F1-Score (%) at F1@5, F1@10, F1@15, and F1@20 for MAX-CP, TRPR-CP, MAX-HOSVD, TRPR-HOSVD, MAX-HOOI, TRPR-HOOI, and CTS on (a) MovieLens 10-core, (b) MovieLens 15-core, and (c) MovieLens 20-core.]
Figure 4.16. F1-Score at various Top-𝑁 positions on MovieLens dataset
Table 4.1 lists the average accuracy improvement of TRPR over the MAX method
for each factorization technique. The percentage scores are reported as the average
improvement over the Top-5, Top-10, Top-15, and Top-20 values. The results show
that probabilistically ranking the candidate items generated from the reconstructed
tensor, by utilising the user’s past tagging activities, can significantly improve the
recommendation accuracy.
When observing the robustness of TRPR across factorization techniques,
it can be noted that TRPR with the CP and HOOI factorization techniques achieves a larger
improvement than with HOSVD. HOSVD optimizes each mode of tensor
𝒴 separately and disregards the interaction among them (Kolda and
Bader, 2009). Therefore, the candidate item and tag preference sets generated from
the reconstructed tensor 𝒴̂ cannot reveal the user interest as well as those of CP and
HOOI, which take all lateral interactions into consideration in the optimization process.
                       Factorization technique
Dataset     𝑝-core    CP        HOSVD     HOOI
Delicious   10        31.14%    15.23%    24.77%
            15        43.41%    11.96%    45.35%
            20        25.39%     1.28%    24.31%
LastFM      10        25.83%     0.22%    25.35%
            15        30.60%     2.92%    25.69%
            20        27.16%    13.17%    26.92%
CiteULike   10        37.52%    10.99%    35.05%
            15        28.31%     7.51%    30.12%
            20        29.54%     3.43%    32.06%
MovieLens   10        23.53%     2.56%    22.49%
            15        38.39%    11.00%    28.24%
            20        23.30%    13.86%    26.13%
Table 4.1. Average TRPR accuracy improvement over MAX
Additionally, from Table 4.1, it can also be observed that the 𝑝-core size
impacts the improvement of recommendation accuracy. On the Delicious and CiteULike
datasets, the improvement tends to shrink as the 𝑝-core size grows.
In contrast, the improvement tends to increase when a
larger 𝑝-core size is used on the LastFM and MovieLens datasets. The
characteristics of the datasets listed in Table 3.2 are the reason behind this. Table 3.2
shows that the number of users is always greater than the number of items
on the Delicious and CiteULike datasets, while on the LastFM and MovieLens
datasets the number of items is always greater than the number of users.
4.2.5.3 Impact of Tag Preference Set Size
The impact of the tag preference set size on the performance of TRPR is investigated by
measuring the F1-Score at various values of 𝑣, i.e. 10 to 100. The examination
is demonstrated on TRPR implemented with the CP factorization technique, as
implementations with the other techniques show similar results.
Figure 4.17, Figure 4.18, Figure 4.19, and Figure 4.20 display the impact of tag
preference set size on the Delicious, LastFM, CiteULike, and MovieLens datasets,
respectively. The trend shows that TRPR achieves the best F1-Score at small 𝑣
values (at most 𝑣 = 40), while a larger 𝑣 value results in inferior performance
as the tag preference set becomes too general and may not really indicate the user’s
preference.
[Plots: F1-Score@10 (%) against Tag Preference Size (𝑣) from 10 to 100 on (a) Delicious 10-core, (b) Delicious 15-core, and (c) Delicious 20-core.]
Figure 4.17. Impact of tag preference set size on Delicious dataset
[Plots: F1-Score@10 (%) against Tag Preference Size (𝑣) from 10 to 100 on (a) LastFM 10-core, (b) LastFM 15-core, and (c) LastFM 20-core.]
Figure 4.18. Impact of tag preference set size on LastFM dataset
[Plots: F1-Score@10 (%) against Tag Preference Size (𝑣) from 10 to 100 on (a) CiteULike 10-core, (b) CiteULike 15-core, and (c) CiteULike 20-core.]
Figure 4.19. Impact of tag preference set size on CiteULike dataset
[Plots: F1-Score@10 (%) against Tag Preference Size (𝑣) from 10 to 100 on (a) MovieLens 10-core, (b) MovieLens 15-core, and (c) MovieLens 20-core.]
Figure 4.20. Impact of tag preference set size on MovieLens dataset
4.2.5.4 Scalability
The scalability of TRPR, in comparison to the MAX method (Symeonidis et al.,
2010), is examined on the full tensor reconstruction process in terms of space
consumption and CPU runtime. The examination is demonstrated on the largest
dataset used in this thesis, i.e. the Delicious dataset. The space consumption and CPU
runtime are measured at various 𝑝-core sizes, i.e. 10, 15, 20, 50, 80, and 100.
As a result, six tensor models of different 𝑢𝑠𝑒𝑟 × 𝑖𝑡𝑒𝑚 × 𝑡𝑎𝑔 dimensionalities were
built: 2,009 × 1,485 × 2,589; 1,609 × 719 × 1,761; 1,359 × 424 × 1,321;
665 × 52 × 422; 362 × 13 × 189; and 250 × 7 × 125, respectively. Accordingly,
the larger the 𝑝-core, the smaller the tensor dimensionality.
[Plots: (a) Space Consumption, log space (kb), and (b) CPU Runtime, log runtime (sec), against tensor dimensionality (250×7×125, 362×13×189, 665×52×422, 1359×424×1321, 1609×719×1761, 2009×1485×2589) for MAX-CP, TRPR-CP, MAX-HOOI, TRPR-HOOI, MAX-HOSVD, and TRPR-HOSVD.]
Figure 4.21. Scalability comparison by varying tensor dimensionality on Delicious dataset
Figure 4.21 shows the scalability analysis of the TRPR and MAX methods
on a single processor. It can be observed that TRPR is able to run on all tensor
dimensionalities. In contrast, MAX failed to run on the two largest tensors
(2,009 × 1,485 × 2,589 and 1,609 × 719 × 1,761) due to memory overflow.
These results show that TRPR scales to large tensor sizes with any of the factorization
techniques, with nearly constant space consumption and a computation time that grows
linearly with the tensor dimensionality. Note that, for the purpose of accuracy
benchmarking, the 𝑛-mode block-striped (matrix) product was also applied to the
MAX method to make it applicable to all datasets used in the experiments.
4.2.6 Summary of Probabilistic Ranking
In this section, the Tensor-based Item Recommendation using Probabilistic Ranking
(TRPR) method is proposed to address the scalability and accuracy challenges of
using tensor models in a tag-based item recommendation system. The method utilises
a memory-efficient loop technique to enable scalable tensor reconstruction, and
probabilistic ranking to improve the recommendation accuracy of candidate items
generated from the reconstructed tensor. TRPR develops the simple but effective
concept of block-striped parallel matrix multiplication to enable scalable tensor
reconstruction, and advances the concept of probabilistic ranking to achieve
higher recommendation accuracy.
The experimental results on various real-world datasets have demonstrated
that:
• The implementation of an 𝑛-mode block-striped (matrix) product makes the
full tensor reconstruction scalable for large datasets;
• The proposed TRPR, with the variations of 𝑝-core and factorization
techniques, outperforms the benchmarking methods in terms of accuracy.
This ascertains that recommendation accuracy can be improved by
probabilistically ranking the candidate items, generated from the full
reconstructed tensor, by utilising the user’s past tagging activities.
4.3 WE-RANK: WEIGHTED TENSOR APPROACH FOR RANKING
4.3.1 Overview
The developed We-Rank method focuses on dealing with the sparsity problem and
improving the recommendation accuracy during the learning-to-rank procedure.
Figure 4.22 illustrates the overview of the recommendation ranking using the
weighted tensor approach, called Recommendation Ranking using Weighted Tensor
(We-Rank).
[Diagram components: Tagging Data (users, items, tags); boolean Scheme; User Profile Construction (𝒴 ∈ ℝ^{𝑄×𝑅×𝑆}); User Tag Usage Likeliness Generation (𝐿 ∈ ℝ^{𝑄×𝑆}); Positive and Negative Tag Preference Sets Generation (𝐿_𝑢^+, 𝐿_𝑢^−); Weighted Tensor Construction (𝒲 ∈ ℝ^{𝑄×𝑅×𝑆}); Latent Factors Generation (𝑀(1), 𝑀(2), 𝑀(3)); Target User; Top-𝑁 Item Recommendation Generation (𝑇𝑜𝑝𝑁𝑢).]
Figure 4.22. Overview of the weighted tensor approach for ranking method (We-Rank)
To begin, as in TRPR, a third-order tensor representing the user
profiles is constructed from the tagging data by using the boolean scheme. Unlike
TRPR, We-Rank does not solely use the boolean user profiles for finding the hidden
relationships between the users, i.e. by optimizing the tensor model with the least square
loss function. For learning the model, We-Rank implements a weighted tensor
approach that controls and differentiates the
reward and penalty for the observed and non-observed tagging data entries of the
primary tensor model, respectively. The primary tensor model, which now represents
non-boolean values, becomes the underlying learning-to-rank model for
recommendation. This weighted approach tackles the data sparsity problem by
ignoring the missing data and modelling only the known entries (Acar et al., 2011).
The learning model can now be optimized with a weighted least square loss function.
Based on the boolean values of the initial tensor, We-Rank first calculates the tag
usage likeliness of each user for each tag. The likeliness scores are then sorted in descending order
so that tags with high and low likeliness scores can be distinguished for each user.
For each user, the tags with high likeliness scores form the positive tag
preference set, whereas those with low scores form the negative tag preference set.
Given the observed tagging data entries in the primary tensor model, a multi-
dimensional data structure is needed to store the values of the positive and negative tag
preference sets. For this reason, another tensor is constructed, which is called a weighted
tensor in this thesis. The entries of the weighted tensor map one-to-one (bijectively) to the
entries of the primary tensor model that represents the user profiles. This mapping
ensures that each observed entry of the primary tensor is rewarded, such that the
corresponding entry of the weighted tensor holds a higher positive value than those of
the non-observed ones. Note that the values in the weighted tensor do not change
over the learning process. The process of weighted tensor construction is detailed in
Section 4.3.3.2 and an example is shown in Figure 4.29. In contrast to TRPR,
which needs a subsequent step to correctly rank the
recommended items after the reconstruction process, the resultant latent
factors of We-Rank can be used directly for generating the Top-𝑁 list of item
recommendations.
The next three sub-sections detail the three main processes in We-Rank: (1)
user profile construction, i.e. tensor model construction; (2) learning-to-rank
procedure, i.e. latent factors generation via tensor factorization to derive the
relationships inherent in the model; and (3) recommendation generation. The last two
sub-sections present empirical evaluation and summary of the method.
4.3.2 User Profile Construction
The user profile construction is the process of constructing the initial tensor to model
the multi-dimensional data. In the same way as TRPR, the developed We-Rank method
uses a boolean scheme (Symeonidis et al., 2010) to build a primary third-order tensor
model that represents the user profile and the ranking learning model. The boolean scheme
simply interprets the tagging data as binary data, which includes two types of entries:
“relevant” and “irrelevant”. The “relevant” entries, labelled as “1”, are the observed
entries where the user has explicitly revealed interest by annotating an item using tags,
while the “irrelevant” entries, labelled as “0”, are the remaining (non-observed)
entries.
Once again, let 𝑈 = {𝑢1, 𝑢2, 𝑢3, … , 𝑢𝑄} be the set of 𝑄 users,
𝐼 = {𝑖1, 𝑖2, 𝑖3, … , 𝑖𝑅} be the set of 𝑅 items, and 𝑇 = {𝑡1, 𝑡2, 𝑡3, … , 𝑡𝑆} be the set of 𝑆
tags. From the tagging data 𝐴 ∶ 𝑈 × 𝐼 × 𝑇, a triple 𝑎 = (𝑢, 𝑖, 𝑡) ∈ 𝐴 represents an
activity of user 𝑢 annotating item 𝑖 using tag 𝑡. The observed tagging data, 𝐴𝑜𝑏 ⊆ 𝐴,
defines the state in which users have expressed their interest in items in the past by
annotating those items using tags. Usually, the observed
tagging data is very sparse, thus |𝐴𝑜𝑏| ≪ |𝐴|. The initial tensor 𝒴 ∈ ℝ^{𝑄×𝑅×𝑆} is
constructed where each tensor entry, 𝑦𝑢,𝑖,𝑡, is given a binary numerical value that
represents the relevance grade of the tagging activity. The rules of the boolean scheme
relevance grade labelling that generate the entries of tensor 𝒴 are formulated in
Equation (4.1). A sample primary tensor model 𝒴 ∈ ℝ^{3×4×5}, populated by
implementing the boolean interpretation scheme, is illustrated in Figure 4.2.
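The boolean scheme can be sketched as a small Python function that fills a 𝑄 × 𝑅 × 𝑆 nested list. The function name, the zero-based indexing, and the list-of-triples input are assumptions of this sketch; the toy triples are read off the mode-1 matricization shown later in Figure 4.25(a).

```python
def build_boolean_tensor(observed, num_users, num_items, num_tags):
    """Return a Q x R x S nested list with 1 for observed triples, else 0."""
    tensor = [[[0] * num_tags for _ in range(num_items)]
              for _ in range(num_users)]
    for u, i, t in observed:
        tensor[u][i][t] = 1  # "relevant": the user annotated item i with tag t
    return tensor


# Toy tagging data with zero-based indices (u1 -> 0, i1 -> 0, t1 -> 0).
toy_observed = [(0, 1, 0), (0, 1, 2), (0, 2, 3),
                (1, 0, 0), (1, 2, 1), (1, 2, 3),
                (2, 2, 1), (2, 3, 3), (2, 3, 4)]
Y = build_boolean_tensor(toy_observed, 3, 4, 5)
```

A dense nested list is only practical for toy sizes; since |𝐴𝑜𝑏| ≪ |𝐴|, a sparse representation would normally be preferred for real datasets.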
4.3.3 Learning-to-Rank Procedure
This section details how the latent factor corresponding to each dimension of tensor
𝒴 is inferred.
4.3.3.1 Optimization Criterion and Factorization Technique
In order to emphasise and penalise the observed and non-observed tagging data
respectively, the weighted Mean Square Error (wMSE) is used as the optimization
criterion for solving a regression/classification task. The wMSE of all users over all
items under all tags can be defined as (Acar et al., 2011):
$$wMSE := \sum_{u=1}^{Q} \sum_{i=1}^{R} \sum_{t=1}^{S} \left[ w_{u,i,t} \left( y_{u,i,t} - \hat{y}_{u,i,t} \right) \right]^{2} \tag{4.15}$$
where 𝑦𝑢,𝑖,𝑡 is the relevance grade, assigned as one of the elements of the binary
relevance set {0, 1}, from the primary tensor model representing the user profiles;
𝑦̂𝑢,𝑖,𝑡 is the predicted preference score that reflects the preference level of user 𝑢 for
annotating item 𝑖 using a tag 𝑡, calculated from the latent factors; and 𝑤𝑢,𝑖,𝑡 is the
weighted reward/penalty value for 𝑦𝑢,𝑖,𝑡, taken from the weighted tensor
explained later in Section 4.3.3.2.
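Equation (4.15) translates directly into a triple loop. The sketch below assumes that 𝒴, 𝒴̂, and 𝒲 are given as equally shaped nested lists; the function name `weighted_mse` is hypothetical.

```python
def weighted_mse(Y, Y_hat, W):
    """Eq. (4.15): sum over all (u, i, t) of [w * (y - y_hat)]^2."""
    total = 0.0
    for y_u, yh_u, w_u in zip(Y, Y_hat, W):          # users
        for y_i, yh_i, w_i in zip(y_u, yh_u, w_u):    # items
            for y, yh, w in zip(y_i, yh_i, w_i):      # tags
                total += (w * (y - yh)) ** 2
    return total
```

Because the weight is squared together with the error, an entry with weight 0 contributes nothing to the loss, while the sign of a weight such as −1 does not change its contribution.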
From the primary tensor 𝒴, latent factors are derived to infer the latent
relationships between the dimensions of users, items and tags. The CP model (Kolda
and Bader, 2009) is used as the factorization technique as well as the predictor
function model. CP is a well-known factorization technique that has been shown to be less
expensive in both memory and time consumption than another well-known
algorithm, Tucker (Kolda and Bader, 2009).
[Diagram: 𝒴 (𝑄 × 𝑅 × 𝑆) approximated as the sum of 𝐹 rank-one components, each formed from the vectors 𝑚_𝑓^{(1)}, 𝑚_𝑓^{(2)}, and 𝑚_𝑓^{(3)}, for 𝑓 = 1, 2, …, 𝐹.]
Figure 4.23. The CP factorization model for third-order tensor
As illustrated in Figure 4.23, CP factorizes a third-order tensor 𝒴 ∈ ℝ^{𝑄×𝑅×𝑆}
into a sum of rank-one components built from the latent factors 𝑚_𝑓^{(1)} ∈ ℝ^𝑄, 𝑚_𝑓^{(2)} ∈ ℝ^𝑅, and 𝑚_𝑓^{(3)} ∈ ℝ^𝑆 for
𝑓 = 1, … , 𝐹, where 𝐹 is the column size of the corresponding latent factor matrices.
These latent factors are used in calculating the predicted score that reflects the
preference level of a user 𝑢 for annotating an item 𝑖 using a tag 𝑡. The predicted
preference score is calculated as:
$$\hat{y}_{u,i,t} := \sum_{f=1}^{F} m^{(1)}_{u,f} \cdot m^{(2)}_{i,f} \cdot m^{(3)}_{t,f} = [\![ M^{(1)}, M^{(2)}, M^{(3)} ]\!] \tag{4.16}$$
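A single predicted score from Equation (4.16) can be computed as below. This is a minimal sketch in which the factor matrices are assumed to be nested lists with one row per user, item, or tag and 𝐹 columns; the function name is illustrative.

```python
def predict_score(M1, M2, M3, u, i, t):
    """Eq. (4.16): sum over f of M1[u][f] * M2[i][f] * M3[t][f]."""
    return sum(a * b * c for a, b, c in zip(M1[u], M2[i], M3[t]))
```

Only three rows of length 𝐹 are touched per score, so individual entries of 𝒴̂ can be evaluated without materialising the full tensor.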
4.3.3.2 Weighted Tensor
This section details how the weighted tensor 𝒲 is constructed. The weighted tensor
𝒲 is a main component in the We-Rank method as it regulates the importance of the
observed and non-observed entries in the learning model. The entries of 𝒲 are
generated based on the tag usage likeliness of each user.
The User Tag Usage Likeliness Generation
We-Rank assumes that there are two characteristics that can determine the tag usage
likeliness of each user. Firstly, users use different choices of tags for annotating the
same item. From the toy example as shown in Figure 4.2, it can be observed that
each user has used different tags for annotating 𝑖3, i.e. 𝑢1, 𝑢2, and 𝑢3 use {𝑡4},
{𝑡2, 𝑡4}, and {𝑡2}, respectively. Secondly, the same tag can be used for annotating
different items. From Figure 4.2, it can be observed that 𝑡1 has been used by 𝑢1 and
𝑢2 for annotating {𝑖2} and {𝑖1} respectively.
This thesis attempts to capture these two characteristics by revealing the user
and tag latent features. Given the primary tensor 𝒴 ∈ ℝ^{𝑄×𝑅×𝑆} which represents the
user profile, the user and tag latent features can be revealed by applying the non-
negative matrix factorization technique (Cichocki et al., 2009; Kim et al., 2014) to
the mode-1 and mode-3 matricizations of tensor 𝒴, i.e. 𝑌(1) ∈ ℝ^{𝑄×𝑅𝑆} and
𝑌(3) ∈ ℝ^{𝑆×𝑄𝑅}. The formulations can be presented as follows:
$$Y_{(1)} := A C' \tag{4.17}$$
$$Y_{(3)} := B D' \tag{4.18}$$
where 𝐴 ∈ ℝ^{𝑄×𝐹} and 𝐵 ∈ ℝ^{𝑆×𝐹} are the user and tag latent feature matrices respectively, in
which 𝐹 is the size of the latent feature, whereas 𝐶 ∈ ℝ^{𝑅𝑆×𝐹} and 𝐷 ∈ ℝ^{𝑄𝑅×𝐹} are the
coefficient matrices of the mode-1 and mode-3 matricizations of tensor 𝒴 with regard
to the user and tag latent features, respectively. Equation (4.17) and Equation (4.18)
state that each user and tag, represented as a row of the 𝑌(1) and 𝑌(3)
matrices, can be approximated as a non-negative linear combination of the rows of
𝐶′ and 𝐷′, weighted by the corresponding row of 𝐴 and 𝐵 (Kim et al., 2014),
respectively. In other words, each column of the 𝐴 and 𝐵 matrices represents the
importance of a user latent feature (𝛽) and a tag latent feature (𝜑) to each
user and tag, respectively. After the user and tag latent features are generated, the tag
usage likeliness of a user 𝑢 to a tag 𝑡 can then be calculated as:
$$l_{u,t} := \sum_{k=1}^{F} a_{u,k} \, b_{k,t} \tag{4.19}$$
where 𝑙𝑢,𝑡 is an element of the User Tag Usage Likeliness matrix
𝐿 ∈ ℝ^{𝑄×𝑆}, 𝑎𝑢,𝑘 is an element of the User Latent Feature matrix 𝐴 ∈ ℝ^{𝑄×𝐹}, and 𝑏𝑘,𝑡 is the
element of the Tag Latent Feature matrix 𝐵 ∈ ℝ^{𝑆×𝐹} for feature 𝑘 and tag 𝑡, i.e. 𝐿 = 𝐴𝐵′.
The detail of the User Tag Usage Likeliness Generation algorithm
is shown in Figure 4.24.
Algorithm: User Tag Usage Likeliness Generation
Input: Training set 𝐷𝑡𝑟𝑎𝑖𝑛 ⊆ 𝑈 × 𝐼 × 𝑇, the size of latent feature 𝐹
Output: User Tag Usage Likeliness Matrix 𝐿 ∈ ℝ^{𝑄×𝑆}
𝑄 = |𝑈|, 𝑅 = |𝐼|, 𝑆 = |𝑇|, 𝒴 ∈ ℝ^{𝑄×𝑅×𝑆}
Generate 𝑦𝑢,𝑖,𝑡 by using Equation (4.1)
Get 𝑌(1) by implementing mode-1 matricization on tensor 𝒴
Get 𝑌(3) by implementing mode-3 matricization on tensor 𝒴
Get 𝐴 ∈ ℝ^{𝑄×𝐹}, 𝐵 ∈ ℝ^{𝑆×𝐹} by implementing the non-negative matrix factorization on 𝑌(1) and 𝑌(3)
for 𝑢 ∈ 𝑈 do
  for 𝑡 ∈ 𝑇 do
    𝑙𝑢,𝑡 ⟵ 0
    for 𝑘 ← 1 to 𝐹 do
      𝑙𝑢,𝑡 ⟵ 𝑙𝑢,𝑡 + 𝑎𝑢,𝑘𝑏𝑘,𝑡
    end
  end
end   /* 𝐿 ∈ ℝ^{𝑄×𝑆} */
Figure 4.24. The User Tag Usage Likeliness generation algorithm
Example 4.4: User Tag Usage Likeliness Generation.
An example of how to generate the User Tag Usage Likeliness is illustrated by using the
toy example of a third-order tensor 𝒴 ∈ ℝ^{3×4×5} in Figure 4.2. As shown in Figure
4.25(a) and (b), the mode-1 and mode-3 matricizations of tensor 𝒴 result in
𝑌(1) ∈ ℝ^{3×20} and 𝑌(3) ∈ ℝ^{5×12}. Figure 4.26(a) and (b) present the resultant user and
tag latent feature matrices after applying the non-negative matrix factorization to 𝑌(1)
and 𝑌(3) respectively, by choosing 𝐹 = 2.
As previously described, each column of the 𝐴 and 𝐵 matrices represents the
importance of a user latent feature (𝛽) and a tag latent feature (𝜑) to each
user and tag, respectively. From Figure 4.26(a), it can be observed that the
importance of features 𝛽1 and 𝛽2 to 𝑢1 is 0.3943 and 1.0000, respectively. On the
other hand, from Figure 4.26(b), it can be seen that the importance of features 𝜑1
and 𝜑2 to 𝑡1 is 0.0000 and 0.7071, respectively. The tag usage likeliness of a user 𝑢
to a tag 𝑡 is calculated by using Equation (4.19) and the resultant matrix is shown in
Figure 4.27.
$$Y_{(1)} = \begin{bmatrix} 0&1&0&0&0&0&0&0&0&1&0&0&0&0&1&0&0&0&0&0 \\ 1&0&0&0&0&0&1&0&0&0&0&0&0&0&1&0&0&0&0&0 \\ 0&0&0&0&0&0&1&0&0&0&0&0&0&0&0&1&0&0&0&1 \end{bmatrix}$$
(a)
$$Y_{(3)} = \begin{bmatrix} 0&1&0&0&1&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&1&0&0&0&1&0 \\ 0&1&0&0&0&0&0&0&0&0&0&0 \\ 0&0&1&0&0&0&1&0&0&0&0&1 \\ 0&0&0&0&0&0&0&0&0&0&0&0 \end{bmatrix}$$
(b)
Figure 4.25. Example of the resulting matricization of tensor 𝒴 ∈ ℝ^{3×4×5}: (a) Mode-1 matricization
𝑌(1) ∈ ℝ^{3×20}, and (b) Mode-3 matricization 𝑌(3) ∈ ℝ^{5×12}
(a) User Feature
        𝛽1        𝛽2
𝑢1      0.3943    1.0000
𝑢2      0.7835    0.0000
𝑢3      0.4803    0.0000

(b) Tag Feature
        𝜑1        𝜑2
𝑡1      0.0000    0.7071
𝑡2      0.7013    0.0000
𝑡3      0.0000    0.7071
𝑡4      0.7106    0.0000
𝑡5      0.0570    0.0000

Figure 4.26. Example of the resulting latent feature matrix: (a) User latent feature matrix 𝐴 ∈ ℝ^{3×2},
and (b) Tag latent feature matrix 𝐵 ∈ ℝ^{5×2}
User Tag Usage Likeliness
        𝑡1        𝑡2        𝑡3        𝑡4        𝑡5
𝑢1      0.7071    0.2765    0.7071    0.2802    0.0225
𝑢2      0.0000    0.5495    0.0000    0.5568    0.0447
𝑢3      0.0000    0.3368    0.0000    0.3413    0.0274

Figure 4.27. Example of the resulting User Tag Usage Likeliness matrix 𝐿 ∈ ℝ^{3×5}
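The likeliness matrix of Figure 4.27 follows from the latent feature matrices of Figure 4.26 through Equation (4.19). The sketch below uses the printed four-decimal values; the function name is an assumption, and 𝐵 is stored with one row per tag, so that 𝑏𝑘,𝑡 corresponds to `B[t][k]`.

```python
# Latent feature matrices copied from Figure 4.26 (rows: users / tags).
A = [[0.3943, 1.0000], [0.7835, 0.0000], [0.4803, 0.0000]]
B = [[0.0000, 0.7071], [0.7013, 0.0000], [0.0000, 0.7071],
     [0.7106, 0.0000], [0.0570, 0.0000]]


def tag_usage_likeliness(A, B):
    """Eq. (4.19): l[u][t] = sum over k of A[u][k] * B[t][k]."""
    return [[sum(a * b for a, b in zip(user_row, tag_row)) for tag_row in B]
            for user_row in A]


L = tag_usage_likeliness(A, B)
for row in L:
    print(" ".join(f"{value:.4f}" for value in row))
```

Since the inputs are already rounded, the products agree with Figure 4.27 only up to the fourth decimal place.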
Positive and Negative Tag Preference Sets Generation
Once the user tag usage likeliness to each tag is calculated, the list of tags is sorted
in descending order of likeliness score. For each user, tags at the top and
bottom of the list can now be distinguished and are called the positive and negative tag
preference sets, respectively. This distinction is made by setting the size of the tag
preference set, 𝑣. The positive tag preference set, 𝐿_𝑢^+, is generated from the largest
values of 𝐿𝑢,∗, where 𝐿_𝑢^+ ⊆ 𝑇 such that |𝐿_𝑢^+| ≤ 𝑣, whereas the negative tag
preference set, 𝐿_𝑢^−, is generated from the lowest values of 𝐿𝑢,∗, where 𝐿_𝑢^− ⊆ 𝑇 such
that |𝐿_𝑢^−| ≤ 𝑣.
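The selection of the two sets can be sketched as a sort over one row of 𝐿. The function name and the zero-based tag indices are assumptions of this sketch, as is the tie handling: ties keep the original tag order, and a tag already in the positive set is not reused in the negative set.

```python
def tag_preference_sets(likeliness_row, v):
    """Split one user's row of L into positive and negative tag index sets."""
    if v <= 0:
        return set(), set()
    order = sorted(range(len(likeliness_row)),
                   key=lambda t: likeliness_row[t], reverse=True)
    positive = set(order[:v])              # v tags with the largest scores
    negative = set(order[-v:]) - positive  # v tags with the lowest scores
    return positive, negative
```

With the first row of Figure 4.27 and 𝑣 = 2, the function returns the index sets {0, 2} and {1, 4}, i.e. {𝑡1, 𝑡3} and {𝑡2, 𝑡5}.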
Weighted Tensor Construction
Given the observed tagging data entries and the positive and negative tag preference sets,
the weighted tensor 𝒲 can be constructed as detailed in Figure 4.28. The entries of
𝒲 map one-to-one (bijectively) to the entries of the primary tensor model 𝒴 that
represents the user profile. Each 𝑤𝑢,𝑖,𝑡 is assigned one of the
ordinal relevance values {2, 1, 0, −1}. The value “2” represents the observed
entries, whereas the values “1” and “-1” represent non-observed entries whose tags belong
to the positive and negative tag preference sets respectively. Meanwhile, any other
entries are labelled as “0”. From Figure 4.28, it can be noted that each observed entry
of the primary tensor is always rewarded, such that the associated entry of the
weighted tensor holds a higher positive value than that of any non-observed one.
Algorithm: Weighted Tensor Construction
Input: Tensor 𝒴 ∈ ℝ^{𝑄×𝑅×𝑆}, User Tag Usage Likeliness Matrix 𝐿 ∈ ℝ^{𝑄×𝑆}, Tag preference size 𝑣
Output: Weighted Tensor 𝒲 ∈ ℝ^{𝑄×𝑅×𝑆}
Initialize 𝒲 ∈ ℝ^{𝑄×𝑅×𝑆} with zeroes
for 𝑢 ∈ 𝑈 do
  Get 𝐿_𝑢^+ ← max(𝐿𝑢,∗) such that (|𝐿_𝑢^+| ≤ 𝑣)
  Get 𝐿_𝑢^− ← min(𝐿𝑢,∗) such that (|𝐿_𝑢^−| ≤ 𝑣)
  for 𝑖 ∈ 𝐼 do
    for 𝑡 ∈ 𝑇 do
      if 𝑦𝑢,𝑖,𝑡 == 1 then   /* observed entries */
        𝑤𝑢,𝑖,𝑡 ⟵ 2
      elseif (𝑡 ∈ 𝐿_𝑢^+) ∧ (𝑦𝑢,𝑖,𝑡 ≠ 1) then
        𝑤𝑢,𝑖,𝑡 ⟵ 1
      elseif (𝑡 ∈ 𝐿_𝑢^−) ∧ (𝑦𝑢,𝑖,𝑡 ≠ 1) then
        𝑤𝑢,𝑖,𝑡 ⟵ −1
      end
    end
  end
end
Figure 4.28. The weighted tensor 𝒲 ∈ ℝ^{𝑄×𝑅×𝑆} construction algorithm
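The construction in Figure 4.28 can be sketched in Python as follows. The function name, the nested-list representation, and the zero-based tag indices are assumptions. By default the sketch follows the pseudocode literally and weights every item row of a user. The optional flag `only_posted_items` is a further assumption, not part of the pseudocode: it leaves all-zero rows for items the user has never annotated, which is how the toy weighted tensor in Figure 4.29(b) appears to be filled.

```python
def build_weight_tensor(Y, positive, negative, only_posted_items=False):
    """Assign 2, 1, -1, or 0 to every (u, i, t) entry.

    `positive[u]` and `negative[u]` are the tag preference sets of user u.
    With only_posted_items=True (an assumption of this sketch), items the
    user never annotated keep weight 0 for all tags.
    """
    W = []
    for u, user_slice in enumerate(Y):
        W_u = []
        for item_row in user_slice:
            skip = only_posted_items and not any(item_row)
            w_row = []
            for t, y in enumerate(item_row):
                if y == 1:
                    w_row.append(2)        # observed entry
                elif skip:
                    w_row.append(0)        # item never annotated by this user
                elif t in positive[u]:
                    w_row.append(1)        # non-observed, positive tag
                elif t in negative[u]:
                    w_row.append(-1)       # non-observed, negative tag
                else:
                    w_row.append(0)
            W_u.append(w_row)
        W.append(W_u)
    return W
```

The weights are computed once from 𝒴 and the preference sets and are then held fixed, in line with the statement that the values of the weighted tensor do not change during learning.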
Example 4.5: Weighted Tensor Construction.
An example of how to construct the weighted tensor 𝒲 is illustrated by using the
toy example of primary tensor 𝒴 ∈ ℝ^{3×4×5} as shown in Figure 4.29(a) and the User
Tag Usage Likeliness matrix 𝐿 ∈ ℝ^{3×5}, built in Example 4.4, as shown in Figure
4.27.
By choosing 𝑣 = 2, the positive tag preference sets of each user are derived as
𝐿_{𝑢1}^+ = {𝑡1, 𝑡3}, 𝐿_{𝑢2}^+ = {𝑡4, 𝑡2}, and 𝐿_{𝑢3}^+ = {𝑡4, 𝑡2}, whereas the negative tag preference
sets of each user are derived as 𝐿_{𝑢1}^− = {𝑡2, 𝑡5}, 𝐿_{𝑢2}^− = {𝑡3, 𝑡1}, and 𝐿_{𝑢3}^− = {𝑡3, 𝑡1}.
By implementing the weighted tensor construction algorithm in Figure 4.28, the
entries of weighted tensor 𝒲 are generated and the result is shown in Figure 4.29(b).
It can be observed that 𝒲 reflects the users’ collaborations and
associations, as its construction process implicitly clusters similar users according to
their tag usage, i.e. through the Tag Usage Likeliness generation algorithm in Figure
4.24. Note that the non-negative matrix factorization technique is not used for
generating the latent factors of We-Rank since it takes only non-negative values for
all the latent factors (Xu et al., 2003), which means that entries that the users do not
like (represented as negative values) would not be regarded when generating the list
of recommendations.
(a) Primary tensor 𝒴 (rows: items; columns: tags)
      User 1            User 2            User 3
      𝑡1 𝑡2 𝑡3 𝑡4 𝑡5    𝑡1 𝑡2 𝑡3 𝑡4 𝑡5    𝑡1 𝑡2 𝑡3 𝑡4 𝑡5
𝑖1     0  0  0  0  0     1  0  0  0  0     0  0  0  0  0
𝑖2     1  0  1  0  0     0  0  0  0  0     0  0  0  0  0
𝑖3     0  0  0  1  0     0  1  0  1  0     0  1  0  0  0
𝑖4     0  0  0  0  0     0  0  0  0  0     0  0  0  1  1

(b) Weighted tensor 𝒲 (rows: items; columns: tags)
      User 1            User 2            User 3
      𝑡1 𝑡2 𝑡3 𝑡4 𝑡5    𝑡1 𝑡2 𝑡3 𝑡4 𝑡5    𝑡1 𝑡2 𝑡3 𝑡4 𝑡5
𝑖1     0  0  0  0  0     2  1 -1  1  0     0  0  0  0  0
𝑖2     2 -1  2  0 -1     0  0  0  0  0     0  0  0  0  0
𝑖3     1 -1  1  2 -1    -1  2 -1  2  0    -1  2 -1  1  0
𝑖4     0  0  0  0  0     0  0  0  0  0    -1  1 -1  2  2

Figure 4.29. Example of: (a) the primary tensor 𝒴 ∈ ℝ3×4×5, and (b) the resulting weighted tensor
𝒲 ∈ ℝ3×4×5
4.3.3.3 Latent Factors Generation
The latent factors generation, via tensor factorization, is the process of deriving the
latent relationships between the dimensions of the tensor model. The latent factors
𝑀(1), 𝑀(2), and 𝑀(3), corresponding to each dimension of tensor 𝒴, are generated by
optimizing the objective function of the recommendation model.
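Given the optimization criterion in Equation (4.15), the resultant objective
function of We-Rank can be formulated as (Acar et al., 2011):
$$L(\Theta) := \sum_{u=1}^{Q}\sum_{i=1}^{R}\sum_{t=1}^{S}\left[w_{u,i,t}\left(y_{u,i,t}-\hat{y}_{u,i,t}\right)\right]^{2} = \left[\mathcal{W} \ast \left(\mathcal{Y} - [\![ M^{(1)}, M^{(2)}, M^{(3)} ]\!]\right)\right]^{2} \qquad (4.20)$$
The gradients of 𝑤𝑀𝑆𝐸 given a case (𝑢, 𝑖, 𝑡) with respect to the model parameter are
formulated as follows (Acar et al., 2011):
$$\frac{\partial L}{\partial m_{u}^{(1)}} = 2\sum_{u=1}^{Q}\sum_{i=1}^{R}\sum_{t=1}^{S} w_{u,i,t}^{2}\left(-y_{u,i,t}+\hat{y}_{u,i,t}\right)\cdot m_{i}^{(2)}\cdot m_{t}^{(3)} \qquad (4.21)$$
$$\frac{\partial L}{\partial m_{i}^{(2)}} = 2\sum_{u=1}^{Q}\sum_{i=1}^{R}\sum_{t=1}^{S} w_{u,i,t}^{2}\left(-y_{u,i,t}+\hat{y}_{u,i,t}\right)\cdot m_{u}^{(1)}\cdot m_{t}^{(3)} \qquad (4.22)$$
$$\frac{\partial L}{\partial m_{t}^{(3)}} = 2\sum_{u=1}^{Q}\sum_{i=1}^{R}\sum_{t=1}^{S} w_{u,i,t}^{2}\left(-y_{u,i,t}+\hat{y}_{u,i,t}\right)\cdot m_{u}^{(1)}\cdot m_{i}^{(2)} \qquad (4.23)$$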
The We-Rank learning algorithm that finds the latent factors is outlined in Figure 4.30.
Algorithm: We-Rank Learning
Input : Initial tensor 𝒴 ∈ ℝ𝑄×𝑅×𝑆, Weighted tensor 𝒲 ∈ ℝ𝑄×𝑅×𝑆, latent
        factor matrix column size 𝐹, maximal iteration 𝑖𝑡𝑒𝑟𝑀𝑎𝑥
Output: Latent factors 𝑀(1), 𝑀(2), 𝑀(3)
Initialize 𝑀(1)(0) ∈ ℝ𝑄×𝐹, 𝑀(2)(0) ∈ ℝ𝑅×𝐹, 𝑀(3)(0) ∈ ℝ𝑆×𝐹, ℎ = 0
repeat
    for 𝑢 ∈ 𝑈 do
        𝑚𝑢(1) ⟵ 𝑚𝑢(1) + 𝜕𝐿/𝜕𝑚𝑢(1) based on Equation (4.21)
    end
    for 𝑖 ∈ 𝐼 do
        𝑚𝑖(2) ⟵ 𝑚𝑖(2) + 𝜕𝐿/𝜕𝑚𝑖(2) based on Equation (4.22)
    end
    for 𝑡 ∈ 𝑇 do
        𝑚𝑡(3) ⟵ 𝑚𝑡(3) + 𝜕𝐿/𝜕𝑚𝑡(3) based on Equation (4.23)
    end
    ++ℎ
until ℎ ≥ 𝑖𝑡𝑒𝑟𝑀𝑎𝑥
Figure 4.30. The We-Rank learning algorithm
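The learning loop of Figure 4.30 can be sketched in Python as below. Several points are assumptions rather than details taken from the thesis: the prediction is taken to be the CP-style sum over latent factors implied by the ⟦𝑀(1), 𝑀(2), 𝑀(3)⟧ notation; the figure writes the update as an addition without a step size, whereas the sketch takes a plain gradient-descent step with a hypothetical step size `step`; and, for a fixed row, only the sums over the two other modes contribute to the gradient, which is how Equations (4.21) to (4.23) are evaluated here. The function names and the small toy tensors are illustrative only.

```python
import random
from itertools import product


def predict(M, u, i, t):
    """CP-style prediction: sum over factors of the three row entries."""
    return sum(a * b * c for a, b, c in zip(M[0][u], M[1][i], M[2][t]))


def weighted_loss(Y, W, M):
    """Sum of squared weighted residuals over all tensor entries."""
    return sum((W[u][i][t] * (Y[u][i][t] - predict(M, u, i, t))) ** 2
               for u, i, t in product(*(range(len(m)) for m in M)))


def mode_gradient(Y, W, M, mode):
    """Gradient of the weighted loss with respect to every row of M[mode]."""
    n_factors = len(M[0][0])
    grad = [[0.0] * n_factors for _ in M[mode]]
    a, b = [m for m in range(3) if m != mode]    # the two other modes
    for idx in product(*(range(len(m)) for m in M)):
        u, i, t = idx
        w = W[u][i][t]
        if w == 0:
            continue                             # zero weight: no contribution
        scale = 2.0 * w * w * (predict(M, u, i, t) - Y[u][i][t])
        for f in range(n_factors):
            grad[idx[mode]][f] += scale * M[a][idx[a]][f] * M[b][idx[b]][f]
    return grad


def we_rank_learn(Y, W, n_factors=2, iter_max=300, step=0.005, seed=0):
    """Update user, item and tag factors in turn for iter_max iterations."""
    rng = random.Random(seed)
    shape = (len(Y), len(Y[0]), len(Y[0][0]))
    M = [[[rng.uniform(0.1, 0.5) for _ in range(n_factors)] for _ in range(n)]
         for n in shape]
    for _ in range(iter_max):
        for mode in range(3):                    # users, then items, then tags
            grad = mode_gradient(Y, W, M, mode)
            for row, g_row in zip(M[mode], grad):
                for f in range(n_factors):
                    row[f] -= step * g_row[f]    # assumed descent step
    return M


# Toy primary and weighted tensors (users x items x tags).
Y = [
    [[0, 0, 0, 0, 0], [1, 0, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 0]],
    [[1, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 1, 0, 1, 0], [0, 0, 0, 0, 0]],
    [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 0, 1, 1]],
]
W = [
    [[0, 0, 0, 0, 0], [2, -1, 2, 0, -1], [1, -1, 1, 2, -1], [0, 0, 0, 0, 0]],
    [[2, 1, -1, 1, 0], [0, 0, 0, 0, 0], [-1, 2, -1, 2, 0], [0, 0, 0, 0, 0]],
    [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [-1, 2, -1, 1, 0], [-1, 1, -1, 2, 2]],
]
M_start = we_rank_learn(Y, W, iter_max=0)        # same seed: the initial factors
M = we_rank_learn(Y, W)
initial_loss = weighted_loss(Y, W, M_start)
final_loss = weighted_loss(Y, W, M)
print(initial_loss, final_loss)
```

A fixed step size is used only to keep the sketch short; a line search or another optimizer could replace it without changing the gradients.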
4.3.4 Recommendation Generation
Since We-Rank implements a weighted scheme during the learning process such that
reward and penalty are given to the observed and non-observed tagging data entries
respectively, the resultant latent factors of We-Rank can be directly used for
generating the Top-𝑁 list of item recommendations for each target user. In this case,
the recommended items are selected based on the maximum value of �̂�𝑢,𝑖,𝑡,
calculated using Equation (4.16).
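One way to read this selection step is sketched below. It assumes that Equation (4.16) is the CP-style sum over latent factors and that an item's score is the maximum predicted value over all tags; both are assumptions, as are the function name, the optional `exclude` argument, and the tiny factor matrices.

```python
def top_n_items(M1, M2, M3, user, n, exclude=()):
    """Rank items for one user by their best predicted score over all tags."""
    scored = []
    for item, item_row in enumerate(M2):
        if item in exclude:
            continue
        best = max(
            sum(a * b * c for a, b, c in zip(M1[user], item_row, tag_row))
            for tag_row in M3
        )
        scored.append((best, item))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))   # highest score first
    return [item for _, item in scored[:n]]


# Hypothetical factors: 1 user, 3 items, 2 tags, 2 latent factors.
M1 = [[1.0, 1.0]]
M2 = [[0.2, 0.1], [0.9, 0.3], [0.5, 0.8]]
M3 = [[1.0, 0.0], [0.0, 1.0]]
print(top_n_items(M1, M2, M3, user=0, n=2))
```

The `exclude` argument allows items that the user has already annotated to be left out, if that is desired.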
4.3.5 Empirical Evaluation
The performance of We-Rank is compared with benchmarking methods, including
MAX (Symeonidis et al., 2010), PITF (Rendle and Schmidt-Thieme, 2010), and
CTS (Kim et al., 2010). The comparison with the previously proposed method, TRPR,
is presented in Chapter 6. For all tensor-based methods, the size of latent factor
matrix 𝐹 is set to 128, as the recommendation quality usually does not benefit from
larger values. The experiments are conducted using 5-fold cross-validation. For each
fold, each dataset is randomly divided into a training set 𝐷𝑡𝑟𝑎𝑖𝑛 (80%) and a test
set 𝐷𝑡𝑒𝑠𝑡 (20%) based on the number of posts. 𝐷𝑡𝑟𝑎𝑖𝑛 and 𝐷𝑡𝑒𝑠𝑡 do not overlap in
posts, i.e. no triplet of a user-item set exists in 𝐷𝑡𝑟𝑎𝑖𝑛 if a triplet (𝑢, 𝑖,∗) is
present in 𝐷𝑡𝑒𝑠𝑡. The recommendation task is to predict and rank the Top-𝑁 items for
the users present in 𝐷𝑡𝑒𝑠𝑡. The performance is measured using the F1-Score and
reported as the average over all five runs.
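The protocol can be illustrated with the following sketch. It assumes that a post is a (user, item) pair, that the five folds are obtained by shuffling the posts once, and that F1 at position 𝑁 is the harmonic mean of precision and recall over the Top-𝑁 list; the function names and toy triplets are hypothetical.

```python
import random


def post_folds(triples, n_folds=5, seed=0):
    """Split (user, item, tag) triplets so that no post spans train and test."""
    posts = sorted({(u, i) for u, i, _ in triples})
    random.Random(seed).shuffle(posts)
    folds = []
    for k in range(n_folds):
        test_posts = set(posts[k::n_folds])      # roughly 1/n_folds of the posts
        train = [x for x in triples if (x[0], x[1]) not in test_posts]
        test = [x for x in triples if (x[0], x[1]) in test_posts]
        folds.append((train, test))
    return folds


def f1_at_n(recommended, relevant, n):
    """F1-Score of the first n recommended items against the relevant items."""
    hits = len(set(recommended[:n]) & set(relevant))
    if hits == 0:
        return 0.0
    precision = hits / n
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Toy data: 3 users x 4 items, each post carrying 2 tags.
triples = [(u, i, t) for u in range(3) for i in range(4) for t in range(2)]
folds = post_folds(triples)
print(len(folds), f1_at_n([1, 2, 3, 4, 5], {2, 5, 9}, 5))
```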
4.3.5.1 Impact of Tag Preference Size
The impact of tag preference set size on the performance of We-Rank is investigated
by measuring the F1-Score at various 𝑣 values, i.e. 10 to 100. Figure 4.31,
Figure 4.32, Figure 4.33, and Figure 4.34 respectively display the impact of tag
preference set size on the Delicious, LastFM, CiteULike, and MovieLens datasets. It
can be observed that We-Rank achieves the best F1-Score when 𝑣 is in the range of 50
to 90. Beyond this range, a further increase of 𝑣 decreases the We-Rank performance.
This result indicates that selecting too many tags not only incurs unnecessary
computation cost but also corrupts the user’s preference.
(a) 10-core (b) 15-core (c) 20-core
Figure 4.31. Impact of tag preference set size on Delicious dataset
(a) 10-core (b) 15-core (c) 20-core
Figure 4.32. Impact of tag preference set size on LastFM dataset
(a) 10-core (b) 15-core (c) 20-core
Figure 4.33. Impact of tag preference set size on CiteULike dataset
(a) 10-core (b) 15-core (c) 20-core
Figure 4.34. Impact of tag preference set size on MovieLens dataset
[In each panel of Figures 4.31 to 4.34, the x-axis is Tag Preference Size (𝑣), from 10 to 100, and the
y-axis is F1-Score@10 (%).]
4.3.5.2 Primary Tensor 𝒴 and Weighted Tensor 𝒲
The impact of tag preference set size on the weighted tensor 𝒲 is investigated by
comparing its density, computed from 𝐷𝑡𝑟𝑎𝑖𝑛, at various 𝑣 values, i.e. 0 to 100. Note
that the density of 𝒲 is the same as that of primary tensor 𝒴 when 𝑣 = 0. Figure
4.35 shows that the density of the weighted tensor 𝒲 grows linearly with the tag
preference set size on each dataset used in this thesis.
Figure 4.35. The weighted tensor 𝒲 densities at various tag preference set sizes on: (a) Delicious, (b)
LastFM, (c) CiteULike, and (d) MovieLens datasets
To study how the sparsity problem within 𝒴 is addressed by 𝒲, their densities are
compared with the results on the impact of tag preference set size on We-Rank
performance shown in Section 4.3.5.1. Table 4.2 lists the densities of non-zero
entries generated from 𝐷𝑡𝑟𝑎𝑖𝑛 for 𝒴 and for 𝒲 with 𝑣 = 50. From Table 4.2 and the
results in Section 4.3.5.1, it can be observed that, from a sparse 𝒴, a denser 𝒲 can
be generated to weigh each 𝒴 entry during the learning process, and that the density
of 𝒲 influences the performance. This observation not only confirms that the
implementation of 𝒲 can solve the sparsity problem within 𝒴, but also shows that the
density of 𝒲 affects the success of We-Rank.
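The density figures discussed here can be computed as the percentage of non-zero entries in a tensor. The sketch below assumes that definition and uses the User 1 slices of the toy tensors of Example 4.5; the function name is hypothetical.

```python
def density_percent(tensor):
    """Percentage of non-zero entries in a users x items x tags nested list."""
    total = 0
    nonzero = 0
    for user_slice in tensor:
        for item_row in user_slice:
            for value in item_row:
                total += 1
                if value != 0:
                    nonzero += 1
    return 100.0 * nonzero / total


# User 1 slices of the toy primary and weighted tensors (items x tags).
Y_u1 = [[[0, 0, 0, 0, 0], [1, 0, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 0]]]
W_u1 = [[[0, 0, 0, 0, 0], [2, -1, 2, 0, -1], [1, -1, 1, 2, -1], [0, 0, 0, 0, 0]]]
print(density_percent(Y_u1), density_percent(W_u1))
```

On this toy slice the weighted tensor is three times as dense as the primary tensor, which mirrors, on a much smaller scale, the increase reported for the real datasets.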
[Each panel of Figure 4.35 plots Density (%) against Tag Preference Set Size (𝑣), from 0 to 100, with
one curve each for the 10-core, 15-core, and 20-core of the dataset.]
                                    Density of non-zero tensor entries (%)
Dataset     Tensor                  10-core   15-core   20-core
Delicious   Primary 𝒴               0.0006    0.0014    0.0027
Delicious   Weighted 𝒲 (𝑣 = 50)     0.0291    0.0678    0.1305
LastFM      Primary 𝒴               0.0042    0.0092    0.0163
LastFM      Weighted 𝒲 (𝑣 = 50)     0.1377    0.3017    0.5297
CiteULike   Primary 𝒴               0.0010    0.0037    0.0095
CiteULike   Weighted 𝒲 (𝑣 = 50)     0.0392    0.1470    0.3897
MovieLens   Primary 𝒴               0.0062    0.0283    0.0284
MovieLens   Weighted 𝒲 (𝑣 = 50)     0.3629    0.8399    1.6020
Table 4.2. The density comparison of non-zero entries generated from 𝐷𝑡𝑟𝑎𝑖𝑛 on the primary tensor 𝒴
and weighted tensor 𝒲 (𝑣 = 50)
4.3.5.3 Accuracy Performance
The recommendation accuracy of We-Rank and the benchmarking methods is compared
using the F1-Score at various Top-𝑁 positions. Table 4.3, Table 4.4, Table 4.5, and
Table 4.6 list the comparison for the Delicious, LastFM, CiteULike and MovieLens
datasets, respectively. The data in these tables indicate that We-Rank achieves
better performance than the benchmarking methods on the 15-core and 20-core of the
MovieLens dataset only (Table 4.6). We-Rank implements weight values, calculated
from the user’s tag usage likeliness, to either reward or penalise each entry of the
primary tensor. In other words, its performance highly depends on how well the user’s
tag usage likeliness is captured. Therefore, looking at the densities listed in
Table 4.2, We-Rank outperforms the benchmarking methods when the user’s tag usage
likeliness is sufficiently captured, i.e. from a dataset with a dense 𝒴, as
demonstrated on the 15-core and 20-core of the MovieLens dataset. This indicates that
We-Rank may perform better on the other datasets at a much larger 𝑝-core size, which
results in a denser 𝒴.
Methods  | 10-core (Score in %)        | 15-core (Score in %)        | 20-core (Score in %)
         | F1@5   F1@10  F1@15  F1@20  | F1@5   F1@10  F1@15  F1@20  | F1@5   F1@10  F1@15  F1@20
MAX      | 2.18   2.28   2.34   2.26   | 2.68   2.83   2.70   2.67   | 3.26   3.46   3.37   3.22
PITF     | 2.18   2.41   2.34   2.28   | 2.40   2.64   2.62   2.54   | 2.91   2.98   3.01   2.87
CTS      | 1.96   2.22   2.31   2.26   | 2.40   2.62   2.65   2.58   | 2.71   2.88   2.80   2.85
We-Rank  | 1.94   2.06   2.06   1.73   | 2.13   2.30   2.19   2.12   | 2.59   2.77   2.65   2.54
Table 4.3. F1-Score at various Top-𝑁 positions on Delicious dataset

Methods  | 10-core (Score in %)        | 15-core (Score in %)        | 20-core (Score in %)
         | F1@5   F1@10  F1@15  F1@20  | F1@5   F1@10  F1@15  F1@20  | F1@5   F1@10  F1@15  F1@20
MAX      | 6.22   6.80   6.94   6.66   | 6.40   7.68   8.01   7.83   | 7.13   8.12   8.36   8.15
PITF     | 5.06   6.28   6.71   6.77   | 6.15   7.30   7.75   7.86   | 6.60   8.17   8.54   8.59
CTS      | 4.01   4.72   4.76   4.51   | 4.05   4.43   4.58   4.44   | 4.83   6.29   6.59   6.50
We-Rank  | 4.00   4.70   4.76   4.48   | 4.78   4.98   4.68   4.82   | 5.13   6.43   6.68   6.91
Table 4.4. F1-Score at various Top-𝑁 positions on LastFM dataset

Methods  | 10-core (Score in %)        | 15-core (Score in %)        | 20-core (Score in %)
         | F1@5   F1@10  F1@15  F1@20  | F1@5   F1@10  F1@15  F1@20  | F1@5   F1@10  F1@15  F1@20
MAX      | 2.90   2.93   3.00   2.91   | 5.42   4.97   4.58   4.53   | 7.68   7.54   7.31   7.17
PITF     | 3.89   4.07   3.97   3.86   | 4.84   4.87   4.47   4.24   | 5.79   5.58   5.43   5.41
CTS      | 2.73   2.63   2.55   2.43   | 4.83   4.40   3.91   3.69   | 7.00   6.31   5.68   5.34
We-Rank  | 2.19   2.41   2.45   2.51   | 2.86   3.33   3.64   3.68   | 7.14   6.41   6.06   6.99
Table 4.5. F1-Score at various Top-𝑁 positions on CiteULike dataset

Methods  | 10-core (Score in %)        | 15-core (Score in %)        | 20-core (Score in %)
         | F1@5   F1@10  F1@15  F1@20  | F1@5   F1@10  F1@15  F1@20  | F1@5   F1@10  F1@15  F1@20
MAX      | 5.91   6.36   6.14   5.98   | 6.93   6.91   6.55   6.07   | 9.92   9.37   8.73   8.43
PITF     | 3.50   4.34   4.47   4.66   | 4.11   5.45   5.64   5.80   | 6.58   7.23   7.34   7.55
CTS      | 5.06   5.56   5.40   5.26   | 6.09   6.13   5.79   5.68   | 7.64   8.50   7.98   7.39
We-Rank  | 4.35   4.96   5.17   5.24   | 6.99   6.91   6.74   6.80   | 9.85   9.59   9.05   9.00
Table 4.6. F1-Score at various Top-𝑁 positions on MovieLens dataset
4.3.6 Summary of Weighted Tensor Factorization
In this section, a weighted tensor factorization method, named Recommendation
Ranking using Weighted Tensor (We-Rank), is proposed to address the sparsity
problem and accuracy challenges in using tensor models in a tag-based item
recommendation system.
The experimental results on various real-world datasets have demonstrated
that:
• The implementation of weighted tensor 𝒲 solves the sparsity problem within
  primary tensor 𝒴;
• On dense datasets, We-Rank outperforms the benchmarking methods, which
  indicates that it efficiently utilises the weighted scheme to reward or penalise
  each primary tensor entry during the learning process. However, We-Rank
  underperforms the benchmarking methods on most of the datasets because it is
  sensitive to how well the user’s tag usage likeliness, the driving factor for
  populating the weight values, is captured.
4.4 CHAPTER SUMMARY
This chapter has detailed the two point-wise based ranking recommendation methods
developed in this thesis, namely Tensor-based Item Recommendation using
Probabilistic Ranking (TRPR) and Recommendation Ranking using Weighted Tensor
(We-Rank), to solve the tag-based item recommendation task. Both methods
implement the boolean scheme to populate the entries of the tensor model.
TRPR focuses on improving the scalability during the tensor reconstruction
process and the recommendation accuracy after the tensor model has been
reconstructed. As demonstrated in the result, the implementation of an 𝑛-mode
block-striped (matrix) product makes the full tensor reconstruction scalable for large
datasets. TRPR outperforms the benchmarking methods in terms of accuracy with
variations of 𝑝-core and factorization techniques. We-Rank focuses on dealing with
the sparsity problem and improving the recommendation accuracy during the
learning-to-rank procedure. The experimental results have demonstrated that the
implementation of weighted tensor 𝒲 solves the sparsity problem within primary
tensor 𝒴. We-Rank outperforms the benchmarking methods on dense datasets only,
since it is sensitive to how well the user’s tag usage likeliness, which is used to
populate the weight values, is captured.
Chapter 5: List-wise based Ranking
Methods
This chapter presents the developed list-wise based ranking recommendation
methods based on multi-graded data. It begins with an introduction of the list-wise
based ranking approach. The next two sections detail the proposed methods, i.e. Do-
Rank: Learning from multi-graded data and Go-Rank: Learning from graded-
relevance data, in which the novel User-Tag Set (UTS) and graded-relevance
schemes are implemented to interpret the tagging data. For each method, the
experiments are conducted and the results are then discussed.
5.1 INTRODUCTION
When the recommendation task is solved using a list-wise based ranking approach, the
recommendation problem is seen as a ranking learning problem, i.e. predicting an
ordered list of items that will be of interest to a user, in which the predicted entries
depend on other corresponding entries (Liu, 2009; Mohan et al., 2011; Rendle,
2011). In this case, a recommendation model should be optimized with respect to the
ranking evaluation measure so that a list of items optimized from the ranking
evaluation measure perspective can be recommended to each user (Chapelle and Wu,
2010; Cremonesi et al., 2010; Xu and Li, 2007). The task of a tag-based item
recommendation system is to generate the list of items that may be of interest to a
user, by learning from the user’s past tagging behaviour that is recorded in tagging
data. The list of recommended items is sorted in descending order based on the
predicted preference score that reflects the preference level of a user for annotating
an item using a tag. Given that users usually show more interest in the few items at
the top of the list than those further down the list (Cremonesi et al., 2010), the order
of items in the recommendation list is crucial. In this case, the recommendation task
can be regarded as a ranking problem and solved by implementing the list-wise based
ranking approach.
5.1.1 Challenges
Implementing the list-wise based ranking approach in a tag-based item
recommendation system makes it natural to implement an interpretation scheme that
can leverage the tagging data as a ranking representation, for building the user
profile. In other words, such a scheme must apply a ranking constraint to interpret
the tagging data, resulting in non-binary relevance (or multi-graded) input data.
According to the set-based scheme, the observed tagging data can be
customarily interpreted as “positive” (or “relevant”) entries as users have implicitly
expressed their interest in items using tags. On the other hand, the non-observed
tagging data can reveal two types of information: (1) “negative” (or “irrelevant”)
entries that define the state where a user is not interested in the items; or (2) “null”
(or “indecisive”) entries that define the state where a user might be interested in
items in the future (Rendle, Balby Marinho, et al., 2009). These indecisive entries
need to be predicted and become candidate recommendations. Accordingly, entries
of tagging data can be labelled using the ordinal relevance set of
{𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡, 𝑖𝑛𝑑𝑒𝑐𝑖𝑠𝑖𝑣𝑒, 𝑖𝑟𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡} for a tuple of ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ resulting in
multi-graded data (Ifada and Nayak, 2014a, 2015; Rendle, Balby Marinho, et al.,
2009; Rendle and Schmidt-Thieme, 2010).
To implement the list-wise based ranking approach, the choice of optimization
criterion is crucial, as it controls the latent factors learning process and depends
on the type of data used (Chapelle and Wu, 2010; Liu, 2009). Mean Average Precision
(MAP), Mean Reciprocal Rank (MRR) and Discounted Cumulative Gain (DCG) are
widely used measures for evaluating the ranking performance of a ranking model
(Liu, 2009). MAP is defined as the mean value of Average Precision (AP), which
considers the rank position of each relevant item. In this case, AP is the average of
the precision scores at the positions where there are relevant items (Buckley and
Voorhees, 2000; Chapelle and Wu, 2010). MRR is the mean of Reciprocal Rank
(RR), which is equivalent to MAP in cases where the user wishes to see only one
relevant item (Craswell, 2009; Voorhees, 1999). The RR itself is the reciprocal of the
rank of the first relevant item.
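These two definitions can be sketched as follows for a single ranked list. The input is assumed to be a list of binary relevance flags ordered from the top rank downwards, and the function names are hypothetical; MAP and MRR are then the means of these values over all users.

```python
def average_precision(ranked_relevance):
    """Mean of precision@k taken at each rank k that holds a relevant item."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def reciprocal_rank(ranked_relevance):
    """Reciprocal of the rank of the first relevant item (0.0 if none)."""
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


ranking = [0, 1, 0, 1]   # relevant items at ranks 2 and 4
print(average_precision(ranking), reciprocal_rank(ranking))
```

For the example list, the precision values at ranks 2 and 4 are both 0.5, so AP is 0.5, and the first relevant item sits at rank 2, so RR is also 0.5.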
The nature of MAP and MRR makes them commonly used measures for
optimizing binary relevance input data (Chapelle and Wu, 2010; Liu, 2009; Shi,
Karatzoglou, Baltrunas, Larson, Hanjalic, et al., 2012; Shi, Karatzoglou, Baltrunas,
Larson, Oliver, et al., 2012). They are deemed unsuitable as the optimization
criterion for solving the recommendation task in tag-based recommendation systems
in which the entries of tagging data are interpreted as multi-graded data. Instead,
DCG is more widely used in the case of multi-graded data (Chapelle and Wu, 2010;
Liu, 2009; Weimer et al., 2007). DCG assumes that the higher the ranked position of
a relevant item, the more important it is to the user and the more likely it is to be
selected (Järvelin and Kekäläinen, 2002). However, directly optimizing DCG across
all users in the recommendation model is computationally expensive. To deal with
this, a fast learning approach is desirable for a scalable learning process.
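The position discount that DCG applies can be illustrated with the sketch below. It assumes the commonly used formulation with gain 2^grade − 1 and a log2(rank + 1) discount; the exact DCG formulation adopted in this thesis is defined elsewhere and may differ, and the function name and the non-negative grades in the example are illustrative only.

```python
import math


def dcg(grades, k=None):
    """Discounted cumulative gain of relevance grades listed in rank order."""
    if k is not None:
        grades = grades[:k]
    return sum(
        (2 ** grade - 1) / math.log2(rank + 1)
        for rank, grade in enumerate(grades, start=1)
    )


# The same two relevant items score higher when they sit nearer the top.
print(dcg([1, 1, 0]), dcg([1, 0, 1]))
```

Moving the second relevant item from rank 3 to rank 2 raises the score, which is the behaviour the assumption about ranked position describes.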
Moreover, this thesis shows that not all non-observed entries, on each observed
user-tag set, should simply be regarded as “irrelevant” entries (Ifada and Nayak,
2014a). Those entries are not the “indecisive” entries as the items of the entries have
already been selected by the user in the past and, therefore, they are not required to
be predicted in the future. This opens up a new problem as to how to further detail
the entries of non-observed data, in which the tagging data is interpreted as graded-
relevance data (Ifada and Nayak, 2016), since there exist transitional entries between
the “relevant” and “irrelevant” entries.
To learn from the graded-relevance data, DCG can no longer be used as the
optimization criterion for implementing the list-wise based ranking approach as it is
not suitable to handle the graded data with “transitional” entries (Ifada and Nayak,
2016). Alternatively, Graded Average Precision (GAP) (Robertson et al., 2010) has
been shown to effectively work as the generalisation of Average Precision (AP) for
the case of rating of explicit feedback data (Robertson et al., 2010). Yet, GAP has
never been used on a tag-based recommendation system.
5.1.2 Proposed Solutions
The proposed Do-Rank and Go-Rank methods fall under the category of list-wise
based ranking approach since they use the ranking evaluation measure as the
recommendation model optimization criterion. As a different type of data requires a
different optimization criterion in the learning-to-rank approach (Chapelle and Wu,
2010; Liu, 2009), the two alternative forms of the interpreted tagging data require
different optimization criteria for learning the recommendation model.
Do-Rank, the first developed method in this chapter, focuses on learning from
multi-graded data, in which the Discounted Cumulative Gain (DCG) ranking
evaluation measure is used as the optimization criterion, whereas Go-Rank focuses on
learning from the graded-relevance data, in which the Graded Average Precision
(GAP) ranking evaluation measure is used as the optimization criterion. For
constructing the user profile representation, the developed Do-Rank and Go-Rank
methods implement the novel ranking based interpretation schemes, namely the
User-Tag Set (UTS) and graded-relevance schemes, respectively. The next two sections
present each of the methods.
5.2 DO-RANK: LEARNING FROM MULTI-GRADED DATA
5.2.1 Overview
The novel DCG Optimization for Learning-to-Rank (Do-Rank) method is developed
for learning from multi-graded data on a tag-based item recommendation model, in
which the DCG ranking evaluation measure is used as the optimization criterion. In
other words, this method generates an optimal list of recommended items from the
DCG perspective for all users. This section also presents the proposed User-Tag Set
(UTS) scheme, based on the set-based scheme (Rendle, Balby Marinho, et al., 2009),
to construct the initial third-order tensor model for representing the user profile.
Previous work (Weimer et al., 2007) has proposed using Normalized DCG
(NDCG) as the recommendation model optimization criterion, yet the problem
solved was for a recommendation system that uses ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩ binary relation
rating data entries as explicit feedback data. This means that the list of
recommendations is generated by ranking the predicted rating scores inferred from
the non-observed ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩ relations (Balakrishnan and Chopra, 2012; Weimer et
al., 2007). In contrast, this thesis deals with a quite different and difficult problem in
comparison to the prior work as the tag-based recommendation systems use the
⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relation tagging data entries as implicit feedback data.
The key challenges faced are inferring the latent relationships of ternary input
data and predicting the preference score of each entry (Ifada and Nayak, 2014a;
Rendle, Balby Marinho, et al., 2009). In contrast to the explicit feedback data, which
has one preference or rating score only on each observed user-item set, the implicit
feedback data of the tag-based recommendation system has multiple preference
scores. This means that the list of recommendations needs to be generated by ranking
the predicted preference scores of items under all tags that may be of interest to a
user. Consequently, the preference scores calculated by the recommendation system
must infer the tag that will influence the user for choosing the recommended item.
The next three sub-sections detail the three main processes in Do-Rank: (1)
user profile construction, i.e. tensor model construction; (2) learning-to-rank
procedure, i.e. latent factors generation via tensor factorization to derive the
relationships inherent in the model; and (3) recommendation generation. The last two
sub-sections present empirical evaluation and summary of the method.
5.2.2 User Profile Construction
The user profile construction is the process of building an initial tensor to model the
multi-dimension data. The Do-Rank method uses the proposed User-Tag Set (UTS)
scheme, based on the set-based scheme (Rendle, Balby Marinho, et al., 2009), to
populate the tensor model.
Let 𝑈 = {𝑢1, 𝑢2, 𝑢3, … , 𝑢𝑄} be the set of 𝑄 users, 𝐼 = {𝑖1, 𝑖2, 𝑖3, … , 𝑖𝑅} be the set
of 𝑅 items, and 𝑇 = {𝑡1, 𝑡2, 𝑡3, … , 𝑡𝑆} be the set of 𝑆 tags. From the tagging data
𝐴 ∶= 𝑈 × 𝐼 × 𝑇, a triplet 𝑎 = (𝑢, 𝑖, 𝑡) ∈ 𝐴 represents the tagging activity of user 𝑢
annotating item 𝑖 using tag 𝑡. The observed tagging data, 𝐴𝑜𝑏 ⊆ 𝐴, defines the state in
which users have expressed their interest in items in the past by annotating those
items using tags. Usually, the observed tagging data is very sparse, thus
|𝐴𝑜𝑏| ≪ |𝐴|. The initial tensor 𝒴 ∈ ℝ𝑄×𝑅×𝑆 is constructed to represent the user
profile, where 𝑄, 𝑅, and 𝑆 are the sizes of the sets of users, items and tags,
respectively. Each tensor entry, 𝑦𝑢,𝑖,𝑡, is given a numerical value that represents the
relevance grade of the tagging activity.
The set-based scheme is a well-known ranking based interpretation scheme
that has shown good performance when implemented on a pair-wise ranking based
approach (Rendle, Balby Marinho, et al., 2009; Rendle and Schmidt-Thieme, 2010).
In contrast to the boolean scheme that interprets tagging data as binary entries
(Symeonidis et al., 2010), the set-based scheme interprets the tagging data as multi-
graded data of three distinct entries that are revealed from the observed and non-
observed entries, i.e. “relevant”, “irrelevant”, and “indecisive”. The set-based
scheme solves the two shortcomings of the boolean scheme: (1) the sparsity problem
– the 0 values dominate the data, and (2) the overfitting problem – all non-observed
entries are denoted as 0 (Rendle, Balby Marinho, et al., 2009).
The set-based scheme differentiates the relevance grade of the resulting multi-
graded data by applying a ranking constraint. Given the “relevant” and “irrelevant”
entries, the scheme infers that, for each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set, user 𝑢 is less inclined to
use tag 𝑡 for annotating any items other than those of the “relevant” entries
(Gemmell et al., 2011). For that reason, the “relevant” entries are assigned the
higher ordinal relevance value and labelled with the “1” value, whereas the
“irrelevant” entries are labelled with the “–1” value. The “0” value is used to label
the “indecisive” entries, i.e. entries to be predicted for generating the
recommendations. The rules of the set-based scheme relevance grade labelling that
generate the entries of tensor 𝒴 can be formulated as follows:
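$$y_{u,i,t} := \begin{cases} 1 & \text{if } (u,i,t) \in A_{ob} \\ -1 & \text{if } (u,i,t) \notin A_{ob} \text{ and } (u,\ast,t) \in A_{ob} \\ 0 & \text{otherwise} \end{cases} \qquad (5.1)$$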
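The labelling rule of Equation (5.1) can be sketched in Python as follows. Indices are zero-based, the function name is hypothetical, and the observed triplets are those stated for User 1 in the toy example, i.e. 𝑢1 annotated 𝑖2 with 𝑡1 and 𝑡3, and 𝑖3 with 𝑡4.

```python
def set_based_tensor(observed, n_users, n_items, n_tags):
    """Label entries 1 / -1 / 0 following the set-based scheme."""
    used = {(u, t) for u, _, t in observed}      # observed (u, *, t) sets
    Y = [[[0] * n_tags for _ in range(n_items)] for _ in range(n_users)]
    for u in range(n_users):
        for i in range(n_items):
            for t in range(n_tags):
                if (u, i, t) in observed:
                    Y[u][i][t] = 1               # relevant
                elif (u, t) in used:
                    Y[u][i][t] = -1              # irrelevant
    return Y                                     # remaining entries: indecisive


# User 1 only: (item, tag) pairs (i2, t1), (i2, t3), (i3, t4), zero-based.
observed = {(0, 1, 0), (0, 1, 2), (0, 2, 3)}
Y = set_based_tensor(observed, n_users=1, n_items=4, n_tags=5)
for item_row in Y[0]:
    print(item_row)
```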
Example 5.1: Tagging Data Interpretation using set-based scheme.
An example of how the set-based scheme interprets tagging data is illustrated using
the entries of User 1 (𝑢1) in the toy example shown in Figure 3.2. Recall
that the toy example represents a tensor model that holds the record of 𝐴𝑜𝑏 and
𝒴 ∈ ℝ3×4×5 where 𝑈 = {𝑢1, 𝑢2, 𝑢3}, 𝐼 = {𝑖1, 𝑖2, 𝑖3, 𝑖4}, and 𝑇 = {𝑡1, 𝑡2, 𝑡3, 𝑡4, 𝑡5}.
Each slice of the tensor represents a user matrix that contains the user tag usage for
each item. The “+” symbols represent the 𝐴𝑜𝑏 entries, for instance, the observed
tagging data example of Figure 3.2 shows that user 𝑢1 has annotated item 𝑖2 using
tag 𝑡1. Figure 5.1(a) illustrates the constructed initial tensor 𝒴 ∈ ℝ3×4×5 as the
representation of user profile, in which entries are generated from the tagging data by
implementing the set-based interpretation scheme as formulated in Equation (5.1).
In Figure 3.2, the observed entries show that 𝑢1 has used 𝑡1, 𝑡3, and 𝑡4 to reveal his
interest in {𝑖2}, {𝑖2}, and {𝑖3}, respectively. Given 𝐼 = {𝑖1, 𝑖2, 𝑖3, 𝑖4}, the tagging data
is interpreted as 𝑢1 favours: (1) {𝑖2} more than {𝑖1, 𝑖3, 𝑖4} to be annotated using 𝑡1; (2)
{𝑖2} more than {𝑖1, 𝑖3, 𝑖4} to be annotated using 𝑡3; and (3) {𝑖3} more than {𝑖1, 𝑖2, 𝑖4}
to be annotated using 𝑡4. The representation, as shown in Figure 5.1(a), can then be
generated as: (1) on (𝑢1, 𝑡1) set, entry of {𝑖2} is “relevant” (or graded as “1”) while
those of {𝑖1, 𝑖3, 𝑖4} are “irrelevant” (or graded as “-1”); (2) on (𝑢1, 𝑡3) set, entry of
{𝑖2} is “relevant” while those of {𝑖1, 𝑖3, 𝑖4} are “irrelevant”; and (3) on (𝑢1, 𝑡4) set,
entry of {𝑖3} is “relevant” while those of {𝑖1, 𝑖2, 𝑖4} are “irrelevant”. Notice that the
“indecisive” entries, graded as “0”, are revealed from the non-observed sets, i.e.
(𝑢1, 𝑡2) and (𝑢1, 𝑡5).
(a) Set-based scheme (rows: items; columns: tags)
      User 1            User 2            User 3
      𝑡1 𝑡2 𝑡3 𝑡4 𝑡5    𝑡1 𝑡2 𝑡3 𝑡4 𝑡5    𝑡1 𝑡2 𝑡3 𝑡4 𝑡5
𝑖1    -1  0 -1 -1  0     1 -1  0 -1  0     0 -1  0 -1 -1
𝑖2     1  0  1 -1  0    -1 -1  0 -1  0     0 -1  0 -1 -1
𝑖3    -1  0 -1  1  0    -1  1  0  1  0     0  1  0 -1 -1
𝑖4    -1  0 -1 -1  0    -1 -1  0 -1  0     0 -1  0  1  1

(b) UTS scheme (rows: items; columns: tags)
      User 1            User 2            User 3
      𝑡1 𝑡2 𝑡3 𝑡4 𝑡5    𝑡1 𝑡2 𝑡3 𝑡4 𝑡5    𝑡1 𝑡2 𝑡3 𝑡4 𝑡5
𝑖1    -1  0 -1 -1  0     1  0  0  0  0     0 -1  0 -1 -1
𝑖2     1  0  1  0  0    -1 -1  0 -1  0     0 -1  0 -1 -1
𝑖3     0  0  0  1  0     0  1  0  1  0     0  1  0  0  0
𝑖4    -1  0 -1 -1  0    -1 -1  0 -1  0     0  0  0  1  1

Figure 5.1. The initial tensor 𝒴 ∈ ℝ3×4×5, as the representation of user profile, whose entries are
generated by implementing the: (a) set-based and (b) UTS interpretation schemes
Despite the capability of the set-based scheme in generating multi-graded
values from the tagging data, this scheme overgeneralises its interpretation, since it
completely disregards the fact that the “irrelevant” items, on each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set,
have possibly been annotated by the user using other tags. Motivated by this
shortcoming, preliminary work (Ifada and Nayak, 2014a) was conducted to study the
variation of the set-based scheme, in which the impact of regarding the user’s
previously annotated items in the tagging data interpretation was investigated. In the
study, on each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set, items that have not been tagged by user 𝑢 are
considered as “irrelevant” entries, whereas those that have been tagged by a user are
considered as “indecisive” entries. This variant offers two advantages over the set-
based scheme. Firstly, it can interpret the tagging data more efficiently (Ifada and
Nayak, 2014a), as it makes fuller use of the user tagging history by taking into
account the user’s collection of previously selected items and, therefore, only those
that are not within the collection are regarded as “irrelevant” entries. For this reason,
this variant of the set-based scheme was called the non-user-collection on User-Tag
Set scheme. However, for simplicity, this scheme is called the User-Tag Set (UTS)
scheme in this thesis. Secondly, the implementation of a user’s item collection
constraint makes this scheme result in less dense interpreted entries in comparison to
those of the set-based scheme. In other words, there is less data that the model needs
to learn from. The rules of the UTS scheme relevance grade labelling that generate
the entries of tensor 𝒴 can be formulated as follows:
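$$y_{u,i,t} := \begin{cases} 1 & \text{if } (u,i,t) \in A_{ob} \\ -1 & \text{if } (u,i,t) \notin A_{ob} \text{ and } i \in I \setminus \{\, i \mid (u,i,\ast) \in A_{ob} \,\} \text{ and } (u,\ast,t) \in A_{ob} \\ 0 & \text{otherwise} \end{cases} \qquad (5.2)$$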
Example 5.2: Tagging Data Interpretation using UTS scheme.
An example of how the UTS scheme interprets tagging data is illustrated using
the entries of User 1 (𝑢1) in Figure 3.2. Recall that the toy example represents a
tensor model that holds the record of 𝐴𝑜𝑏, 𝒴 ∈ ℝ^{3×4×5}, where 𝑈 = {𝑢1, 𝑢2, 𝑢3},
𝐼 = {𝑖1, 𝑖2, 𝑖3, 𝑖4}, and 𝑇 = {𝑡1, 𝑡2, 𝑡3, 𝑡4, 𝑡5}. Figure 5.1(b) illustrates the constructed
initial tensor 𝒴 ∈ ℝ^{3×4×5} as the representation of the user profile, in which entries are
generated from the tagging data by implementing the UTS interpretation scheme as
formulated in Equation (5.2).
In Figure 3.2, the observed entries show that 𝑢1 has used 𝑡1, 𝑡3, and 𝑡4 to reveal an
interest in {𝑖2}, {𝑖2}, and {𝑖3}, respectively. Given 𝐼 = {𝑖1, 𝑖2, 𝑖3, 𝑖4}, the tagging data
is interpreted as 𝑢1 favouring: (1) {𝑖2} more than {𝑖1, 𝑖4} to be annotated using 𝑡1; (2)
{𝑖2} more than {𝑖1, 𝑖4} to be annotated using 𝑡3; and (3) {𝑖3} more than {𝑖1, 𝑖4} to be
annotated using 𝑡4. Note that, in the same order, the same
statement cannot be made for: (1) {𝑖3} as it was annotated using 𝑡4; (2) {𝑖3} as
it was annotated using 𝑡4; and (3) {𝑖2} as it was annotated using 𝑡1 and 𝑡3.
As shown in Figure 5.1(b), the user profile representation can then be generated as:
(1) on (𝑢1, 𝑡1) set, entry of {𝑖2} is “relevant” (or graded as “1”) while those of {𝑖1, 𝑖4}
are “irrelevant” (or graded as “-1”); (2) on (𝑢1, 𝑡3) set, entry of {𝑖2} is “relevant”
while those of {𝑖1, 𝑖4} are “irrelevant”; and (3) on (𝑢1, 𝑡4) set, entry of {𝑖3} is
“relevant” while those of {𝑖1, 𝑖4} are “irrelevant”. Note that the “indecisive” entries,
graded as “0”, are revealed from: (1) the entries of the observed (𝑢1, 𝑡1), (𝑢1, 𝑡3), and
(𝑢1, 𝑡4) sets that are neither “relevant” nor “irrelevant”; and (2) the entries of the
non-observed sets, i.e. (𝑢1, 𝑡2) and (𝑢1, 𝑡5).
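The labelling rules of Equation (5.2) can be sketched in a few lines of Python. The sketch is illustrative only: the function name `uts_tensor`, the zero-based index encoding and the use of NumPy are assumptions made for this example. Only the observed entries of User 1 stated in Example 5.2 are supplied, so the slices of the other two users remain all zero here.

```python
import numpy as np

def uts_tensor(observed, n_users, n_items, n_tags):
    """Build an initial tensor Y with the UTS rules of Equation (5.2).

    observed: iterable of zero-based (user, item, tag) triples (hypothetical encoding).
    """
    Y = np.zeros((n_users, n_items, n_tags), dtype=int)
    items_of_user = {}        # items each user has tagged with any tag
    user_tag_sets = set()     # observed (u, t) sets
    for u, i, t in observed:
        items_of_user.setdefault(u, set()).add(i)
        user_tag_sets.add((u, t))
    # "irrelevant" (-1): on an observed (u, t) set, items never tagged by u
    for u, t in user_tag_sets:
        for i in range(n_items):
            if i not in items_of_user[u]:
                Y[u, i, t] = -1
    # "relevant" (1): the observed triples themselves
    for u, i, t in observed:
        Y[u, i, t] = 1
    return Y              # everything else stays "indecisive" (0)

# User 1 in Example 5.2: t1 -> i2, t3 -> i2, t4 -> i3 (zero-based indices)
observed_u1 = [(0, 1, 0), (0, 1, 2), (0, 2, 3)]
Y = uts_tensor(observed_u1, n_users=3, n_items=4, n_tags=5)
print(Y[0])   # rows: items i1..i4, columns: tags t1..t5
```

The printed slice can be compared with the description above: 𝑖1 and 𝑖4 are graded −1 under 𝑡1, 𝑡3, and 𝑡4, while 𝑖3 under 𝑡1 and 𝑡3, and 𝑖2 under 𝑡4, stay at 0.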
The examples of the set-based scheme (Figure 5.1(a)) and the UTS scheme
(Figure 5.1(b)) show that the former overgeneralises the “irrelevant” entries. The set-
based scheme assumes that, on each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set, any items other than those
appearing in observed entries are “irrelevant” and this disregards the fact that those
items have been annotated by the user using other tags. In contrast, the UTS scheme
states that the “irrelevant” entries should only be interpreted from items that have not
been tagged by the user, on each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set.
5.2.3 Learning-to-Rank Procedure
This section details how the latent factors corresponding to each dimension of tensor
𝒴 are learned and generated.
5.2.3.1 Optimization Criterion and Factorization Technique
Discounted Cumulative Gain (DCG) is a widely used measure for evaluating the
performance of ranking models that involve multiple relevance grades (Chapelle
and Wu, 2010), whereas other measures such as Mean Average Precision (MAP)
and Mean Reciprocal Rank (MRR) are more commonly used to handle binary
relevance data. Since a recommendation model should be optimized with respect to the
evaluation measure so that it can generate a quality Top-𝑁 recommendation list
(Chapelle and Wu, 2010; Cremonesi et al., 2010; Xu and Li, 2007), the
recommendation task now becomes recommending an optimal item list (from the
DCG perspective) to users using the latent factors.
DCG is particularly useful for solving the recommendation task; it yields a
quality Top-𝑁 recommendation list since it treats the correct order of higher ranked
items as more important than that of lower ranked items (Balakrishnan and
Chopra, 2012; Chapelle and Wu, 2010). In other words, the higher positions have
more influence on the DCG score. The DCG score for a user 𝑢 across all items under
tag 𝑡 can be defined as:
\[
DCG_{u,t} := \sum_{i \in I} \frac{2^{y_{u,i,t}} - 1}{\log_2\left(1 + r_{u,i,t}\right)}
\tag{5.3}
\]
where 𝑦𝑢,𝑖,𝑡 is the relevance grade, which takes a value from the ordinal relevance
set {𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡, 𝑖𝑛𝑑𝑒𝑐𝑖𝑠𝑖𝑣𝑒, 𝑖𝑟𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡} (or {1, 0, −1}) based on the initial tensor
model. 𝑟𝑢,𝑖,𝑡 is the ranking position of item 𝑖 for user 𝑢 with tag 𝑡, and is
approximated using ŷ𝑢,𝑖,𝑡, i.e. the predicted preference score that reflects the
preference level of user 𝑢 for annotating item 𝑖 using tag 𝑡, calculated from the
latent factors. The numerator of Equation (5.3) is the gain function that weights
the items based on their relevance grade, while the denominator is the discount
function that makes items lower down in the ranked list contribute less to the score.
The DCG score of all users over all items under all tags can be defined as:
\[
DCG := \frac{1}{QS} \sum_{u \in U} \sum_{t \in T} \sum_{i \in I} \frac{2^{y_{u,i,t}} - 1}{\log_2\left(1 + r_{u,i,t}\right)}
\tag{5.4}
\]
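As a small numerical illustration of Equation (5.3), the sketch below computes the DCG of one (𝑢, 𝑡) set from a vector of relevance grades and a vector of predicted scores. The function name `dcg_user_tag` and the score values are hypothetical; the grades are those of the (𝑢1, 𝑡1) set under the UTS scheme in Example 5.2, and the ranking position is taken, as is usual for DCG, from sorting the items by descending score.

```python
import numpy as np

def dcg_user_tag(grades, scores):
    """DCG of Equation (5.3) for one (user, tag) set.

    grades: relevance grades y in {1, 0, -1}, one per item.
    scores: predicted scores used to order the items (rank 1 = highest score).
    """
    grades = np.asarray(grades, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")      # item indices, best first
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(1, len(scores) + 1)    # rank position of each item
    gains = 2.0 ** grades - 1.0                     # gain function (numerator)
    discounts = np.log2(1.0 + ranks)                # discount function (denominator)
    return float(np.sum(gains / discounts))

grades_u1_t1 = [-1, 1, 0, -1]                       # i1..i4 on the (u1, t1) set
good = dcg_user_tag(grades_u1_t1, [0.1, 0.9, 0.4, 0.2])   # relevant item ranked first
bad = dcg_user_tag(grades_u1_t1, [0.9, 0.1, 0.4, 0.2])    # relevant item ranked last
print(good, bad)
```

Placing the “relevant” item at the top yields the larger score, which reflects the top-biased nature of the measure described above.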
From the tensor 𝒴, latent factor matrices are generated and learned in order to
derive the latent relationships between the dimensions of users, items and tags. To
develop Do-Rank, the CP model (Kolda and Bader, 2009) is used as the factorization
technique as well as the predictor function model. CP is a well-known factorization
technique that has been shown to be less expensive than Tucker in both memory and time
consumption (Kolda and Bader, 2009).
[Figure: 𝒴 (𝑄 × 𝑅 × 𝑆) ≈ a sum of 𝐹 rank-one terms, the 𝑓-th term being formed from the vectors 𝑚_𝑓^(1), 𝑚_𝑓^(2), and 𝑚_𝑓^(3), for 𝑓 = 1, …, 𝐹]
Figure 5.2. The CP factorization model for third-order tensor
As illustrated in Figure 5.2, CP factorizes a third-order tensor 𝒴 ∈ ℝ^{𝑄×𝑅×𝑆} into
a sum of rank-one components built from the latent factor vectors 𝑚_𝑓^(1) ∈ ℝ^𝑄,
𝑚_𝑓^(2) ∈ ℝ^𝑅, and 𝑚_𝑓^(3) ∈ ℝ^𝑆 for 𝑓 = 1, …, 𝐹, where 𝐹 is the column size of the
corresponding latent factor matrices. These latent factors are used in calculating the
predicted score that reflects the preference level of a user 𝑢 for annotating an item 𝑖
using a tag 𝑡. Recall that a CP model can also be considered as a special case of Tucker
with a diagonal core tensor (Kolda and Bader, 2009), as illustrated in Figure 4.4. The
predicted preference score is calculated as:
\[
\hat{y}_{u,i,t} := \sum_{f=1}^{F} m^{(1)}_{u,f} \cdot m^{(2)}_{i,f} \cdot m^{(3)}_{t,f} = [\![ M^{(1)}, M^{(2)}, M^{(3)} ]\!]
\tag{5.5}
\]
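The predictor of Equation (5.5) can be written directly with NumPy. The following is a minimal sketch under stated assumptions: the matrix names `M1`, `M2`, `M3`, the random initial values and the small sizes are hypothetical and serve only to show how a single score, and the full reconstructed tensor, follow from the three latent factor matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, R, S, F = 3, 4, 5, 2                 # users, items, tags, factor columns
M1 = rng.normal(size=(Q, F))            # user latent factors  M^(1)
M2 = rng.normal(size=(R, F))            # item latent factors  M^(2)
M3 = rng.normal(size=(S, F))            # tag latent factors   M^(3)

def predict(u, i, t):
    """Predicted preference score of Equation (5.5) for one (u, i, t)."""
    return float(np.sum(M1[u] * M2[i] * M3[t]))

# All Q x R x S scores at once: sum over f of M1[u,f] * M2[i,f] * M3[t,f]
Y_hat = np.einsum("uf,if,tf->uit", M1, M2, M3)
print(Y_hat.shape, predict(1, 2, 3))
```

Materialising the full tensor is only practical for toy sizes; for real datasets the per-triple form is the one that would normally be evaluated.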
5.2.3.2 Ranking Smoothing
It can be seen from Equation (5.4) that DCG is dependent on the ranking positions.
The rankings change in a non-smooth way with respect to the predicted preference
scores, which are calculated from the model parameters (i.e. the latent factor matrices).
This non-smoothness of DCG makes it difficult to apply standard
optimization approaches such as gradient descent, since they require a smooth
objective function (Chapelle and Wu, 2010; Wu et al., 2009).
The proposed Do-Rank method addresses the non-smoothness of DCG by
approximating the ranking position 𝑟𝑢,𝑖,𝑡 with a smooth function of
the model parameters. Inspired by the learning-to-rank approach from the field of
Information Retrieval (Chapelle and Wu, 2010; Wu et al., 2009), 𝑟𝑢,𝑖,𝑡 is
approximated by the following smoothing function:
\[
r_{u,i,t} \approx 1 + \sum_{j \neq i} \sigma(\Delta\hat{y})
\tag{5.6}
\]
where 𝜎(𝑥) is the logistic function 1/(1 + 𝑒^{−𝑥}), and Δŷ = ŷ𝑢,𝑖,𝑡 − ŷ𝑢,𝑗,𝑡 is the difference
between the predicted preference scores of two items, calculated from the latent factors.
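Substituting Equation (5.6) into Equation (5.4), the smoothed approximation of DCG is
obtained as:
\[
sDCG := \frac{1}{QS} \sum_{u \in U} \sum_{t \in T} \sum_{i \in I} \frac{2^{y_{u,i,t}} - 1}{1 + \log_2\left(\sum_{j \neq i} \sigma(\Delta\hat{y})\right)}
\tag{5.7}
\]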
Figure 5.3 shows the comparison between DCG, calculated using Equation (5.4), and
the smoothed approximation of DCG (𝑠𝐷𝐶𝐺), calculated using Equation (5.7).
Figure 5.3. The comparison between DCG and the smoothed approximation of DCG (𝑠𝐷𝐶𝐺) (x-axis: Point; y-axis: Score; series: DCG, Smoothed DCG (sDCG))
5.2.3.3 Latent Factors Generation
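Given Equation (5.6), the resultant objective function can now be formulated as:
\[
L(\Theta) := \sum_{u \in U} \sum_{t \in T} \sum_{i \in I} \frac{2^{y_{u,i,t}} - 1}{1 + \log_2\left(\sum_{j \neq i} \sigma(\Delta\hat{y})\right)} - \lambda_{\Theta} \lVert \Theta \rVert_F^2
\tag{5.8}
\]
where 𝜆Θ is the regularization coefficient, corresponding to the model parameters Θ,
that controls overfitting. Note that the constant coefficient 1/(𝑄𝑆) in 𝑠𝐷𝐶𝐺 (Equation
(5.7)) can be neglected since it has no influence on the optimization. Gradient
descent is performed to optimize the objective function in Equation (5.8). Given a
case (𝑢, 𝑖, 𝑡) with respect to the model parameters {𝑀(1), 𝑀(2), 𝑀(3)}, the gradient of
𝑠𝐷𝐶𝐺 is obtained by differentiating Equation (5.8) as follows:
\[
\frac{\partial L}{\partial \theta} = \frac{\partial}{\partial \theta} \left( \sum_{u \in U} \sum_{t \in T} \sum_{i \in I} \frac{2^{y_{u,i,t}} - 1}{1 + \log_2\left(\sum_{j \neq i} \sigma(\Delta\hat{y})\right)} \right) - \lambda_{\theta} \theta
\tag{5.9}
\]
which can be rewritten as:
\[
\frac{\partial L}{\partial \theta} = \sum_{u \in U} \sum_{t \in T} \sum_{i \in I} \frac{-\left(2^{y_{u,i,t}} - 1\right) \left[ \dfrac{1}{\left(\sum_{j \neq i} \sigma(\Delta\hat{y})\right) \ln 2} \left( \dfrac{\partial}{\partial \theta} \sum_{j \neq i} \sigma(\Delta\hat{y}) \right) \right]}{\left[ 1 + \log_2\left(\sum_{j \neq i} \sigma(\Delta\hat{y})\right) \right]^2} - \lambda_{\theta} \theta
\tag{5.10}
\]
Given that:
\[
\frac{\partial}{\partial \theta} \left( \sum_{j \neq i} \sigma(\Delta\hat{y}) \right) = \sum_{j \neq i} \left( -\sigma(\Delta\hat{y}) + \left(\sigma(\Delta\hat{y})\right)^2 \right) \frac{\partial}{\partial \theta} \Delta\hat{y}
\tag{5.11}
\]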
And replacing 𝜎(Δŷ) with 𝛿, for notational convenience, the resultant gradient of
𝑠𝐷𝐶𝐺 is obtained as:
\[
\frac{\partial L}{\partial \theta} = \sum_{u \in U} \sum_{t \in T} \sum_{i \in I} \frac{-\left(2^{y_{u,i,t}} - 1\right) \left[ \dfrac{1}{\ln 2 \sum_{j \neq i} \delta} \left( \sum_{j \neq i} \left(-\delta + \delta^2\right) \dfrac{\partial}{\partial \theta} \Delta\hat{y} \right) \right]}{\left[ 1 + \log_2\left(\sum_{j \neq i} \delta\right) \right]^2} - \lambda_{\theta} \theta
\tag{5.12}
\]
Equation (5.12) confirms that only ∂Δŷ/∂𝜃 needs to be computed, with respect to
the model parameters, to implement the 𝑠𝐷𝐶𝐺 optimization. However, it can also be
noticed that directly optimizing 𝑠𝐷𝐶𝐺 across all users in the recommendation model
is computationally expensive, since the pair-wise predicted preference score
difference Δŷ between each item and all other items in the system needs to be
calculated.
Do-Rank addresses this computational problem by employing a fast
learning approach. The basic idea of the approach is to optimize 𝑠𝐷𝐶𝐺 by computing
only the pair-wise predicted preference score difference Δŷ between items of
“relevant” entries and those of “irrelevant” entries, on each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set. In this
case, the user’s “relevant” (or positive) and “irrelevant” (or negative) items are
inferred from each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set and then defined as: (1) 𝑍𝑃 = {𝑖 | 𝑦𝑢,𝑖,𝑡 = 1},
the positive items derived from the observed data, and (2) 𝑍𝑁 = {𝑖 | 𝑦𝑢,𝑖,𝑡 = −1},
the negative items, i.e. the items that have not been tagged by user 𝑢 using any
tag. The resultant objective function can now be formulated as:
\[
L(\Theta) := \sum_{u \in U} \sum_{t \in T} \sum_{i \in Z_P} \frac{2^{y_{u,i,t}} - 1}{1 + \log_2\left(\sum_{j \in Z_N} \sigma(\Delta\hat{y})\right)} - \lambda_{\Theta} \lVert \Theta \rVert_F^2
\tag{5.13}
\]
where Δŷ = ŷ𝑢,𝑖,𝑡 − ŷ𝑢,𝑗,𝑡.
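The reduction in pair-wise comparisons brought by the fast learning approach can be illustrated on the toy data. The sketch below assumes the UTS-labelled slice of User 1 from Example 5.2 and a hypothetical helper `positive_negative_items`; it extracts 𝑍𝑃 and 𝑍𝑁 for the (𝑢1, 𝑡1) set and compares the number of (𝑖, 𝑗) pairs with the number of pairs that arise when every item is compared with every other item.

```python
import numpy as np

def positive_negative_items(Y, u, t):
    """Return (Z_P, Z_N) for one (u, t) set: items graded 1 and items graded -1."""
    z_p = np.flatnonzero(Y[u, :, t] == 1).tolist()
    z_n = np.flatnonzero(Y[u, :, t] == -1).tolist()
    return z_p, z_n

# UTS-labelled slice of User 1 (rows: items i1..i4, columns: tags t1..t5)
Y = np.zeros((1, 4, 5), dtype=int)
Y[0] = [[-1, 0, -1, -1, 0],
        [ 1, 0,  1,  0, 0],
        [ 0, 0,  0,  1, 0],
        [-1, 0, -1, -1, 0]]

z_p, z_n = positive_negative_items(Y, u=0, t=0)   # the (u1, t1) set
fast_pairs = len(z_p) * len(z_n)                  # pairs used by fast learning
n_items = Y.shape[1]
all_pairs = n_items * (n_items - 1)               # every item against all others
print(z_p, z_n, fast_pairs, all_pairs)
```

On this set only 2 score differences are needed instead of 12; the gap widens as the number of items grows.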
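The gradient of 𝑠𝐷𝐶𝐺 given a case (𝑢, 𝑖, 𝑗, 𝑡) with respect to the model
parameters 𝜃 = {𝑚_𝑢^(1), 𝑚_𝑖^(2), 𝑚_𝑗^(2), 𝑚_𝑡^(3)} is given by Equation (5.14).
\[
\frac{\partial L}{\partial \theta} = \sum_{u \in U} \sum_{t \in T} \sum_{i \in Z_P} \frac{-\left(2^{y_{u,i,t}} - 1\right) \left[ \dfrac{1}{\ln 2 \sum_{j \in Z_N} \delta} \left( \sum_{j \in Z_N} \left(-\delta + \delta^2\right) \dfrac{\partial}{\partial \theta} \Delta\hat{y} \right) \right]}{\left[ 1 + \log_2\left(\sum_{j \in Z_N} \delta\right) \right]^2} - \lambda_{\theta} \theta
\tag{5.14}
\]
where 𝛿 = 𝜎(Δŷ).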
To apply the 𝑠𝐷𝐶𝐺 optimization, only the gradient ∂Δŷ/∂𝜃 has to be computed
for each of the model parameters, as follows:
\[
\frac{\partial \Delta\hat{y}}{\partial m_u^{(1)}} = m_i^{(2)} \odot m_t^{(3)} - m_j^{(2)} \odot m_t^{(3)}
\tag{5.15}
\]
\[
\frac{\partial \Delta\hat{y}}{\partial m_i^{(2)}} = m_u^{(1)} \odot m_t^{(3)}
\tag{5.16}
\]
\[
\frac{\partial \Delta\hat{y}}{\partial m_j^{(2)}} = -\left( m_u^{(1)} \odot m_t^{(3)} \right)
\tag{5.17}
\]
\[
\frac{\partial \Delta\hat{y}}{\partial m_t^{(3)}} = m_u^{(1)} \odot m_i^{(2)} - m_u^{(1)} \odot m_j^{(2)}
\tag{5.18}
\]
where ⊙ denotes the element-wise product. It can be noted from Equation (5.14)
that, to optimize 𝑠𝐷𝐶𝐺 across all users and under all tags, Δŷ needs to be computed
only for the items in each 𝑍𝑃, which is less computationally expensive than computing
Δŷ for all 𝑅 items, since |𝑍𝑃| ≪ 𝑅. The Do-Rank learning algorithm is outlined in Figure 5.4.
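The partial derivatives in Equations (5.15) to (5.18) can be checked numerically. The sketch below is an illustration rather than part of the method: the function names, the random vectors of length 4 and the central finite-difference check are all assumptions introduced here. It compares each analytic expression with a numerical derivative of Δŷ = ŷ𝑢,𝑖,𝑡 − ŷ𝑢,𝑗,𝑡 under the CP predictor of Equation (5.5).

```python
import numpy as np

def delta(m_u, m_i, m_j, m_t):
    """Score difference between items i and j under the CP predictor."""
    return float(np.sum(m_u * m_i * m_t) - np.sum(m_u * m_j * m_t))

def delta_grads(m_u, m_i, m_j, m_t):
    """Analytic gradients of the score difference, Equations (5.15)-(5.18)."""
    return {
        "m_u": m_i * m_t - m_j * m_t,      # (5.15)
        "m_i": m_u * m_t,                  # (5.16)
        "m_j": -(m_u * m_t),               # (5.17)
        "m_t": m_u * m_i - m_u * m_j,      # (5.18)
    }

def numeric_grad(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = eps
        g[k] = (f(x + step) - f(x - step)) / (2 * eps)
    return g

rng = np.random.default_rng(1)
m_u, m_i, m_j, m_t = (rng.normal(size=4) for _ in range(4))

analytic = delta_grads(m_u, m_i, m_j, m_t)
numeric = {
    "m_u": numeric_grad(lambda x: delta(x, m_i, m_j, m_t), m_u),
    "m_i": numeric_grad(lambda x: delta(m_u, x, m_j, m_t), m_i),
    "m_j": numeric_grad(lambda x: delta(m_u, m_i, x, m_t), m_j),
    "m_t": numeric_grad(lambda x: delta(m_u, m_i, m_j, x), m_t),
}
max_error = max(float(np.max(np.abs(analytic[k] - numeric[k]))) for k in analytic)
print(max_error)
```

Because Δŷ is linear in each parameter vector taken separately, the two sets of gradients agree up to floating-point rounding.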
5.2.3.4 Complexity Analysis and Convergence
The complexity of the learning process for a single iteration is analysed. The complexity of
Do-Rank with fast learning (as illustrated in Figure 5.4) is 𝑂(𝐹(𝑄 + 𝑆 + 𝑄𝑆𝑝ñ)),
where 𝑝 and ñ denote the average numbers of 𝑍𝑃 and 𝑍𝑁 items per (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set. Since
𝑝, ñ ≪ 𝑅, the Do-Rank complexity becomes linear in 𝑅, instead of
quadratic in 𝑅 as it would have been in the absence of fast learning and the UTS
scheme.
Do-Rank optimizes 𝑠𝐷𝐶𝐺 (Equation (5.7)), the
smoothed approximation of DCG (Equation (5.4)). Do-Rank uses the iterated DCG
scores during the optimization process as the termination criterion (Shi, Karatzoglou,
Baltrunas, Larson, Hanjalic, et al., 2012), instead of using conventional criteria
such as the number of iterations (Shi, Karatzoglou, Baltrunas, Larson, Oliver, et al.,
2012) and the convergence rate (Rendle and Schmidt-Thieme, 2010). The
optimization process is terminated when the DCG scores start to decline, which usually
requires fewer than 20 iterations. Since the number of
required iterations is quite small, this does not affect the Do-Rank complexity.
Algorithm: Do-Rank Learning
Input : Training set 𝐷𝑡𝑟𝑎𝑖𝑛 ⊆ 𝑈 × 𝐼 × 𝑇, learning rate 𝛼, factor matrix column size 𝐹,
        regularization 𝜆, maximal iteration 𝑖𝑡𝑒𝑟𝑀𝑎𝑥
Output: Latent factors 𝑀^(1), 𝑀^(2), 𝑀^(3)
𝑄 = |𝑈|, 𝑅 = |𝐼|, 𝑆 = |𝑇|, 𝒴 ∈ ℝ^{𝑄×𝑅×𝑆}
Populate 𝒴 using Equation (5.2)
𝑍𝑃 = {𝑖 | 𝑦𝑢,𝑖,𝑡 = 1}, 𝑍𝑁 = {𝑖 | 𝑦𝑢,𝑖,𝑡 = −1}
Initialize 𝑀^(1)(0) ∈ ℝ^{𝑄×𝐹}, 𝑀^(2)(0) ∈ ℝ^{𝑅×𝐹}, 𝑀^(3)(0) ∈ ℝ^{𝑆×𝐹}, ℎ = 0
𝑔0 = 𝐷𝐶𝐺 based on 𝒴 and 𝑀^(1)(0), 𝑀^(2)(0), 𝑀^(3)(0)
repeat
    for 𝑢 ∈ 𝑈 do
        𝑚_𝑢^(1) ⟵ 𝑚_𝑢^(1) + 𝛼 ∂𝐿/∂𝑚_𝑢^(1), based on Equations (5.14) and (5.15)
    for 𝑡 ∈ 𝑇 do
        𝑚_𝑡^(3) ⟵ 𝑚_𝑡^(3) + 𝛼 ∂𝐿/∂𝑚_𝑡^(3), based on Equations (5.14) and (5.18)
    for 𝑢 ∈ 𝑈 do
        for 𝑡 ∈ 𝑇 do
            for 𝑖 ∈ 𝑍𝑃 do
                for 𝑗 ∈ 𝑍𝑁 do
                    𝑚_𝑖^(2) ⟵ 𝑚_𝑖^(2) + 𝛼 ∂𝐿/∂𝑚_𝑖^(2), based on Equations (5.14) and (5.16)
                    𝑚_𝑗^(2) ⟵ 𝑚_𝑗^(2) + 𝛼 ∂𝐿/∂𝑚_𝑗^(2), based on Equations (5.14) and (5.17)
    ++ℎ
    𝑔 = 𝐷𝐶𝐺 based on 𝒴 and 𝑀^(1)(ℎ), 𝑀^(2)(ℎ), 𝑀^(3)(ℎ)
    if 𝑔 − 𝑔0 ≤ 0 then break
until ℎ ≥ 𝑖𝑡𝑒𝑟𝑀𝑎𝑥
Figure 5.4. The Do-Rank learning algorithm
5.2.4 Recommendation Generation
Since Do-Rank implements a ranking-based interpretation scheme and a learning-to-rank
model, the resulting latent factors of Do-Rank can be directly used for generating
the Top-𝑁 list of item recommendations for each target user. Using Equation (5.5),
the predicted preference score ŷ𝑢,𝑖,𝑡 of target user 𝑢 for item 𝑖 under tag 𝑡 is calculated.
The candidate items of user 𝑢 are identified based on the maximum ŷ𝑢,𝑖,𝑡 of each
user-item set, i.e. the highest score of the item over all tags. The scores of the candidate
items are then ranked in descending order to generate the list of recommended items.
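The recommendation step described above can be sketched as follows. The function name `top_n_items` and the score values are hypothetical; the sketch assumes that the predicted scores of one user are available as an items-by-tags array, takes the maximum score of each item over all tags, and returns the 𝑁 items with the highest maxima.

```python
import numpy as np

def top_n_items(user_scores, n):
    """Rank items for one user by their maximum predicted score over all tags.

    user_scores: array of shape (items, tags) holding the predicted scores.
    Returns a list of (item index, score) pairs, best first.
    """
    user_scores = np.asarray(user_scores, dtype=float)
    item_scores = user_scores.max(axis=1)               # best tag per item
    order = np.argsort(-item_scores, kind="stable")     # descending order
    return [(int(i), float(item_scores[i])) for i in order[:n]]

# Hypothetical predicted scores for one user: 4 items x 3 tags
scores = [[0.1, 0.5, 0.2],
          [0.9, 0.0, 0.3],
          [0.4, 0.4, 0.4],
          [0.2, 0.7, 0.1]]
top2 = top_n_items(scores, n=2)
print(top2)
```

In this toy input the second and fourth items are recommended, because their best tags carry the two highest scores.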
5.2.5 Empirical Evaluation
The proposed Do-Rank method and benchmarking methods are evaluated by 5-fold
cross-validation experimentation. For each fold, each dataset is randomly divided
into a training set 𝐷𝑡𝑟𝑎𝑖𝑛 (80%) and a test set 𝐷𝑡𝑒𝑠𝑡 (20%) based on the number of
posts data. 𝐷𝑡𝑟𝑎𝑖𝑛 and 𝐷𝑡𝑒𝑠𝑡 do not overlap in posts, i.e., there exist no triplets for a
user-item set in the 𝐷𝑡𝑟𝑎𝑖𝑛 if a triplet (𝑢, 𝑖,∗) is present in the 𝐷𝑡𝑒𝑠𝑡. The
recommendation task is to predict and rank the Top-𝑁 items for the users present in
𝐷𝑡𝑒𝑠𝑡. The performance evaluation is measured and reported over the average values
on all five runs using Normalized DCG (NDCG) and Average Precision (AP),
presented at various Top-𝑁 positions, as well as Mean Average Precision (MAP).
The performance of Do-Rank is compared with benchmarking methods,
including MAX (Symeonidis et al., 2010), PITF (Rendle and Schmidt-Thieme,
2010), and CTS (Kim et al., 2010). It is to be noted that comparison amongst all
proposed methods is presented in Chapter 6.
To enable meaningful comparisons, the parameter values for all methods are
tuned on a randomly selected 25% of the observed data available in 𝐷𝑡𝑟𝑎𝑖𝑛. For all
tensor-based methods, the column size of the latent factor matrices 𝐹 is set to 128, as the
recommendation quality usually does not benefit from a larger value. The
learning rate 𝛼 and regularization 𝜆 for PITF are set as 0.01 and 0.00005,
respectively, as suggested in the article (Ifada and Nayak, 2015). For CTS, the
neighbourhood size 𝑘 and model size 𝑤 are all searched from the grid of
{10,20,30,40,50,60,70,80,90,100}. The learning rate 𝛼 and regularization 𝜆 for Do-
Rank are adjusted from 0.01 to 0.1 and 0.00001 to 0.00005, respectively.
5.2.5.1 Accuracy Performance
The recommendation performance comparisons of the proposed Do-Rank and the
benchmarking methods on each dataset are listed in Table 5.1, Table 5.2, Table 5.3,
and Table 5.4. It can be observed that Do-Rank outperforms the benchmarking
methods in terms of NDCG, AP and MAP on most datasets. It can also be noted that the
NDCG score generally decreases as the Top-𝑁 position increases from 5 to 10, whereas
the AP score generally increases.
Compared to PITF, which employs an AUC-based optimization approach that
gives equal penalty to mistakes at the top and bottom of the recommendation list
(Shi, Karatzoglou, Baltrunas, Larson, Hanjalic, et al., 2012), Do-Rank enhances the
Top-𝑁 recommendation performance by optimizing the top-biased measure DCG.
The results confirm that optimizing a Top-𝑁 recommendation evaluation measure when
building the learning model improves the recommendation performance.
Additionally, PITF is a pair-wise ranking model that aims to get the ranking order
within each pair correct, while Do-Rank employs a list-wise ranking model
that aims to get the correct order of all items in the recommendation list. Lastly,
the outperformance of Do-Rank over CTS indicates that the three-dimensional characteristic
of tagging data must be captured so that the many-to-many relationships that exist
among the dimensions are kept, rather than projecting the three dimensions into
two dimensions (Symeonidis et al., 2010).
          10-core (Score in %)                         15-core (Score in %)                         20-core (Score in %)
Methods   NDCG@5   NDCG@10  AP@5     AP@10    MAP      NDCG@5   NDCG@10  AP@5     AP@10    MAP      NDCG@5   NDCG@10  AP@5     AP@10    MAP
MAX       1.90     1.70     4.22     4.70     2.38     2.00     1.77     4.14     4.63     2.86     2.10     1.85     4.73     5.34     3.75
PITF      2.30     1.99     5.33     5.88     2.60     2.12     1.93     4.77     5.52     3.16     2.52     2.18     5.81     6.48     4.29
CTS       1.85     1.66     4.02     4.59     2.51     2.03     1.84     4.47     4.99     3.11     2.28     2.05     5.20     5.83     4.07
Do-Rank   2.31     1.98     5.22     5.72     2.78     2.36     2.09     5.25     5.93     3.54     2.69     2.26     6.02     6.64     4.63
Table 5.1. NDCG, AP, and MAP on Delicious dataset
          10-core (Score in %)                         15-core (Score in %)                         20-core (Score in %)
Methods   NDCG@5   NDCG@10  AP@5     AP@10    MAP      NDCG@5   NDCG@10  AP@5     AP@10    MAP      NDCG@5   NDCG@10  AP@5     AP@10    MAP
MAX       6.68     5.92     12.38    12.48    5.33     7.69     6.63     13.83    13.84    6.31     8.26     7.22     14.54    14.65    7.11
PITF      7.56     6.45     13.97    14.31    5.96     8.15     7.15     15.43    15.67    7.05     8.41     7.58     16.16    16.62    7.52
CTS       4.87     4.17     12.36    12.47    3.87     7.78     6.94     14.27    14.57    6.55     8.93     7.83     16.56    16.93    7.59
Do-Rank   8.15     7.05     14.12    14.45    6.50     8.55     7.56     15.51    15.68    7.09     9.40     8.29     17.13    17.52    7.61
Table 5.2. NDCG, AP, and MAP on LastFM dataset
          10-core (Score in %)                         15-core (Score in %)                         20-core (Score in %)
Methods   NDCG@5   NDCG@10  AP@5     AP@10    MAP      NDCG@5   NDCG@10  AP@5     AP@10    MAP      NDCG@5   NDCG@10  AP@5     AP@10    MAP
MAX       3.18     2.73     6.09     6.45     4.58     3.93     3.26     8.14     8.50     6.88     3.97     3.48     8.31     9.24     8.82
PITF      3.18     2.68     5.57     5.05     4.28     3.94     3.58     8.51     9.20     7.18     4.65     3.74     8.54     9.91     9.78
CTS       2.20     1.87     4.94     5.34     4.31     3.87     3.20     8.44     8.97     7.91     5.16     4.32     10.23    11.05    11.33
Do-Rank   3.08     2.77     5.42     5.95     4.47     4.95     4.18     10.28    10.90    9.31     5.17     4.49     10.62    11.64    11.43
Table 5.3. NDCG, AP, and MAP on CiteULike dataset
          10-core (Score in %)                         15-core (Score in %)                         20-core (Score in %)
Methods   NDCG@5   NDCG@10  AP@5     AP@10    MAP      NDCG@5   NDCG@10  AP@5     AP@10    MAP      NDCG@5   NDCG@10  AP@5     AP@10    MAP
MAX       6.23     5.10     10.50    10.56    5.36     7.27     6.05     13.84    14.22    7.36     8.40     7.21     16.46    16.82    10.40
PITF      4.14     3.82     8.34     9.21     4.36     4.33     4.23     9.40     9.98     5.98     6.28     5.65     12.26    12.90    8.73
CTS       5.97     5.10     10.04    10.38    5.68     6.24     5.22     12.08    12.58    7.08     8.17     7.10     15.82    16.07    10.30
Do-Rank   6.28     5.39     11.00    11.47    6.22     8.61     7.13     16.49    16.96    9.81     11.34    9.21     21.06    21.68    14.28
Table 5.4. NDCG, AP, and MAP on MovieLens dataset
5.2.5.2 Impact of UTS scheme
The impact of implementing the UTS scheme is investigated by comparing the tensor 𝒴
entries population generated from 𝐷𝑡𝑟𝑎𝑖𝑛 using the proposed UTS scheme with those
of the boolean and set-based schemes. The statistics of the tensor entries population
listed in Table 5.5 show that the “relevant” entries populations of all schemes are the
same. Note that the boolean scheme generates the least variety of distinct entries as it
overfits the “irrelevant” and “indecisive” entries of the non-observed tagging data
(Ifada and Nayak, 2014a; Rendle, Balby Marinho, et al., 2009).
It can be seen from Table 5.5 that both the set-based and UTS
schemes distinguish the “irrelevant” and “indecisive” entries as two distinct
entry types; however, their population distributions are not the same. This is
because the UTS scheme implements a user’s item collection constraint for
interpreting the non-observed tagging data, as previously described in Section 5.2.2.
This is unlike the set-based scheme, which interprets all items other than those
appearing in “relevant” entries as “irrelevant” entries. In other words, the set-based
scheme includes some “irrelevant” relationships that are not warranted by the data. Therefore, the
“irrelevant” entries population of UTS is smaller than that of the set-based scheme, while
the “indecisive” entries population of UTS is larger. For example, on the MovieLens
20-core set, the “irrelevant” population is 2.0418% under UTS against 2.4147% under the
set-based scheme.
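The kind of population statistics reported in Table 5.5 can be reproduced in miniature on the toy data. The sketch below is a hedged illustration: the function names `label` and `population` are hypothetical, only the three observed entries of User 1 from Example 5.2 are used, and the set-based rule is implemented as described in the text (on each observed (𝑢, 𝑡) set, every non-observed item is “irrelevant”).

```python
import numpy as np

def label(observed, shape, scheme):
    """Label a tensor with the 'set-based' or 'UTS' interpretation scheme."""
    Y = np.zeros(shape, dtype=int)
    tagged = {}                                   # items tagged by each user
    for u, i, t in observed:
        tagged.setdefault(u, set()).add(i)
    for u, t in {(u, t) for u, _, t in observed}:  # observed (u, t) sets
        for i in range(shape[1]):
            # set-based: every item; UTS: only items the user never tagged
            if scheme == "set-based" or i not in tagged[u]:
                Y[u, i, t] = -1
    for u, i, t in observed:
        Y[u, i, t] = 1                            # observed entries are "relevant"
    return Y

def population(Y):
    """Percentage of relevant (1), irrelevant (-1) and indecisive (0) entries."""
    return {g: 100.0 * int(np.sum(Y == g)) / Y.size for g in (1, -1, 0)}

observed_u1 = [(0, 1, 0), (0, 1, 2), (0, 2, 3)]   # zero-based (user, item, tag)
set_based = population(label(observed_u1, (3, 4, 5), "set-based"))
uts = population(label(observed_u1, (3, 4, 5), "UTS"))
print(set_based, uts)
```

As in Table 5.5, the “relevant” share is identical under both schemes, while UTS moves part of the “irrelevant” share to “indecisive”.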
                                                   Tensor Population (%)
Dataset     Interpretation   Distinct Entry        10-core    15-core    20-core
            Scheme
Delicious   boolean          Relevant              0.0006     0.0014     0.0027
                             Irrel./Indecisive     99.9994    99.9986    99.9973
            set-based        Relevant              0.0006     0.0014     0.0027
                             Irrelevant            0.4377     0.5670     0.6737
                             Indecisive            99.5617    99.4316    99.3236
            UTS              Relevant              0.0006     0.0014     0.0027
                             Irrelevant            0.4285     0.5499     0.6477
                             Indecisive            99.5709    99.4487    99.3496
LastFM      boolean          Relevant              0.0042     0.0092     0.0163
                             Irrel./Indecisive     99.9958    99.9908    99.9837
            set-based        Relevant              0.0042     0.0092     0.0163
                             Irrelevant            1.2937     1.7233     2.1553
                             Indecisive            98.7021    98.2675    97.8284
            UTS              Relevant              0.0042     0.0092     0.0163
                             Irrelevant            1.2538     1.6443     2.0242
                             Indecisive            98.7420    98.3465    97.9595
CiteULike   boolean          Relevant              0.0010     0.0037     0.0095
                             Irrel./Indecisive     99.9990    99.9963    99.9905
            set-based        Relevant              0.0010     0.0037     0.0095
                             Irrelevant            0.3176     0.4777     0.5995
                             Indecisive            99.6814    99.5186    99.3910
            UTS              Relevant              0.0010     0.0037     0.0095
                             Irrelevant            0.3039     0.4450     0.5472
                             Indecisive            99.6951    99.5513    99.4433
MovieLens   boolean          Relevant              0.0062     0.0283     0.0284
                             Irrel./Indecisive     99.9938    99.9717    99.9716
            set-based        Relevant              0.0062     0.0283     0.0284
                             Irrelevant            1.6114     2.0969     2.4147
                             Indecisive            98.3824    97.8748    97.5569
            UTS              Relevant              0.0062     0.0283     0.0284
                             Irrelevant            1.4651     1.8476     2.0418
                             Indecisive            98.5287    98.1241    97.9298
Table 5.5. The comparison of tensor entries population distribution generated from 𝐷𝑡𝑟𝑎𝑖𝑛 using
boolean, set-based and UTS schemes
5.2.5.3 Scalability
The scalability of Do-Rank is examined in terms of its learning running time to study
the impact of implementing the UTS scheme for the fast learning approach on
optimizing 𝑠𝐷𝐶𝐺 as defined in Equation (5.14). The examination is demonstrated on
the 10-core of the Delicious and MovieLens datasets, as implementation on other
datasets and cores show similar results. The learning running time is measured on a
single iteration at various scales, i.e. 10% to 100% of training set (𝐷𝑡𝑟𝑎𝑖𝑛).
Figure 5.5 shows that the running time of the “fast learning” approach is linear
in the size of the data on both datasets, i.e. determined by the number of items 𝑅. The
“original learning” approach, i.e. optimizing 𝑠𝐷𝐶𝐺 without implementing the fast
learning approach, requires more learning time since its computational complexity is
determined by 𝑅², as previously described in Section 1.1.1.
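(a) Delicious 10-core (b) MovieLens 10-core
Figure 5.5. The Do-Rank scalability (x-axis: ratio of training set (%); y-axis: running time (sec); series: Fast Learning, Original Learning)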
5.2.5.4 Convergence
The convergence of the Do-Rank learning algorithm is demonstrated on the MovieLens
20-core set; the convergence behaviours on the other datasets and cores are
similar. Figure 5.6(a) and Figure 5.6(b) show the evolution of DCG@10 across
iterations on the training (𝐷𝑡𝑟𝑎𝑖𝑛) and test (𝐷𝑡𝑒𝑠𝑡) sets, respectively. DCG increases
through the early iterations on both sets, before the performance declines. This indicates
that Do-Rank is able to effectively optimize DCG. It can be noted that the DCG
measure drops after a few iterations (fewer than 15), which indicates that using a
measure score as the termination criterion is a useful approach for preventing the
model from overfitting (Shi, Karatzoglou, Baltrunas, Larson, Hanjalic, et al., 2012).
(a) Training set (𝐷𝑡𝑟𝑎𝑖𝑛) (b) Test set (𝐷𝑡𝑒𝑠𝑡)
Figure 5.6. The Do-Rank convergence criterion on the MovieLens 20-core set (x-axis: number of iterations; y-axis: DCG@10 (%))
5.2.6 Summary of Learning from Multi-Graded Data
In this section, the DCG Optimization for Learning-to-Rank (Do-Rank) method is
proposed for learning from multi-graded data on a tag-based item recommendation
model. Do-Rank proposes the User-Tag Set (UTS) scheme for interpreting the
tagging data and directly optimizes the (smoothed) DCG for learning the tensor
model in order to generate an ordered list of items that might interest the user. A fast
learning approach is also implemented to enable efficient execution of Do-Rank.
The experimental results on various real-world datasets have demonstrated
that:
- Do-Rank outperforms all benchmarking methods on the NDCG, AP, and
  MAP measures on most datasets. This confirms that optimizing DCG for
  building the learning model improves the recommendation performance;
- The UTS scheme interprets the tagging data more efficiently than the
  set-based scheme, as it implements a user’s item collection constraint for
  interpreting the non-observed tagging data;
- The UTS scheme improves Do-Rank scalability as it generates less dense
  non-indecisive entries than the set-based scheme, and
  therefore less data needs to be learned by the ranking model.
5.3 GO-RANK: LEARNING FROM GRADED-RELEVANCE DATA
5.3.1 Overview
The GAP Optimization for Learning-to-Rank (Go-Rank) method is developed for
learning from graded-relevance data on a tag-based item recommendation model, in
which the GAP ranking evaluation measure is used as the optimization criterion. Go-
Rank generates an optimal list of recommended items from the GAP perspective for
all users. This section also presents a novel graded-relevance scheme to construct the
initial third-order tensor model for representing the user profile. The proposed
graded-relevance scheme interprets the tagging data with four distinct entries, i.e.
“relevant”, “likely relevant”, “irrelevant”, and “indecisive”. The “likely relevant”
entries are the transitional entries between the “relevant” and “irrelevant” entries. It
is to be noted that using GAP as the optimized ranking evaluation measure enables
the learning model to set up thresholds so that the “likely relevant” entries can be
regarded as either “relevant” or “irrelevant” entries. Each tagging data entry can then
be graded with one of the ordinal relevance values of {2,1,0, −1}. As a result, the
scheme generates the implicit tagging data entries as multi-graded data, similar to
explicit rating data that is hard to obtain otherwise.
Graded Average Precision (GAP) (Robertson et al., 2010) has been shown to
work effectively as a generalisation of Average Precision (AP) for
explicit feedback rating data. Researchers (Shi et al.,
2013a) have proposed using GAP as the optimization
criterion for a recommendation system that uses ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩ binary relation rating
data entries as explicit feedback data. The list of recommendations is generated by
ranking the predicted preference scores inferred from the non-observed ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩
relations (Balakrishnan and Chopra, 2012; Weimer et al., 2007). In contrast, this
thesis deals with a different and more difficult problem than the prior
work, as the tag-based recommendation system is built from the ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩
ternary relation tagging data and the ordinal rating values are given on each
observed user-tag set. This means that there exist multiple rating values on each
observed user-tag set, instead of just one rating value as in the
explicit feedback data. Eventually, the recommendation list needs to be generated by
ranking the predicted preference scores of the list of items under all tags that may be
of interest to a user. Therefore, the tag-based recommendation system should be able
to infer the tag that will influence the user for choosing the recommended item based
on the highest preference score.
The next three sub-sections detail the three main processes in Go-Rank: (1)
user profile construction, i.e. tensor model construction; (2) learning-to-rank
procedure, i.e. latent factors generation via tensor factorization to derive the
relationships inherent in the model; and (3) recommendation generation. The last two
sub-sections present empirical evaluation and summary of the method.
5.3.2 User Profile Construction
The user profile construction is the process of constructing the initial tensor to model
the multi-dimension data. The developed Go-Rank method uses a novel graded-
relevance scheme for constructing an initial third-order tensor model to represent the
user profile and ranking learning model. The proposed graded-relevance scheme
interprets the tagging data with four distinct entries, i.e. “relevant”, “likely relevant”,
“irrelevant”, and “indecisive”. The “likely relevant” entries are set as the transitional
entries between the “relevant” and “irrelevant” entries.
Let 𝑈 = {𝑢1, 𝑢2, 𝑢3, … , 𝑢𝑄} be the set of 𝑄 users, 𝐼 = {𝑖1, 𝑖2, 𝑖3, … , 𝑖𝑅} be the set
of 𝑅 items, and 𝑇 = {𝑡1, 𝑡2, 𝑡3, … , 𝑡𝑆} be the set of 𝑆 tags. From the tagging data
𝐴 ∶= 𝑈 × 𝐼 × 𝑇, a vector of 𝑎 = (𝑢, 𝑖, 𝑡) ∈ 𝐴 represents the tagging activity of user 𝑢
to annotate item 𝑖 using tag 𝑡. The observed tagging data, 𝐴𝑜𝑏, defines the state in
which users have expressed their interest to items in the past, by annotating those
items using tags, where 𝐴𝑜𝑏 ⊆ 𝐴. Usually, the observed tagging data is
very sparse, thus |𝐴𝑜𝑏| ≪ |𝐴|.
The set-based scheme (Rendle, Balby Marinho, et al., 2009) can solve the
drawbacks of the boolean scheme by differentiating the entries of non-observed data
as “irrelevant” and “indecisive” entries according to each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set. However,
as shown in Section 5.2.2, the set-based scheme is incorrect as it overgeneralises the
“irrelevant” entries and results in inferior recommendation performance (Ifada and
Nayak, 2014a). On each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set, the scheme interprets any items other than
those appearing in observed entries as “irrelevant” entries (Rendle, Balby Marinho,
et al., 2009) and disregards the fact that some of those items have been annotated by
the user using other tags. Actually, only the items, on each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set, that have
not been tagged by user 𝑢 should be interpreted as “irrelevant” entries (Ifada and
Nayak, 2014a). Yet, how to interpret the entries of non-observed data that should not
simply be regarded as “irrelevant” remains in dispute.
Example 5.3: Issues in Tagging Data Interpretation using set-based scheme.
An example of what the issues are, of using the set-based scheme to interpret the
tagging data, is illustrated by using the entries of User 1 (𝑢1) in the toy example
illustrated in Figure 3.2. Using the set-based scheme (Rendle, Balby Marinho, et al.,
2009), as illustrated in Figure 5.7(a), the observed and non-observed entries can be
listed based on (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 sets of the user. The observed entries show that there
exist (𝑢1, 𝑡1), (𝑢1, 𝑡3), and (𝑢1, 𝑡4) sets that have been used to annotate {𝑖2}, {𝑖2}, and
{𝑖3}, respectively. These observed entries are regarded as “relevant” entries while all
non-observed entries of the non-existent sets, i.e. (𝑢1, 𝑡2) and (𝑢1, 𝑡5), on any items can
easily be interpreted as “indecisive” entries. The problem becomes apparent when
the “irrelevant” entries are to be interpreted. It can be seen that on all sets, both 𝑖1
and 𝑖4 have never been annotated by 𝑢1 using any other tags, and therefore it makes
sense to assume that the entries of (𝑢1, 𝑡1), (𝑢1, 𝑡3), and (𝑢1, 𝑡4) with {𝑖1, 𝑖4} are
“irrelevant”. However, the entries of 𝑖3 with (𝑢1, 𝑡1) and (𝑢1, 𝑡3) should not simply
be interpreted as “irrelevant” as the item 𝑖3 occurs as “relevant” on (𝑢1, 𝑡4).
Similarly, 𝑖2 is “relevant” on (𝑢1, 𝑡1) and (𝑢1, 𝑡3), and therefore (𝑢1, 𝑡4) with 𝑖2
cannot be “irrelevant”. Simply labelling those entries as “irrelevant” is improper and
can result in inferior recommendation performance (Ifada and Nayak, 2014a). It can
be noted that those entries definitely cannot be labelled “indecisive” as they are not
amongst entries to be predicted in the future.
[Figure 5.7: two panels, (a) and (b), each showing the item × tag slices of the initial tensor for User 1, User 2 and User 3]
Figure 5.7. Example of initial tensor $\mathcal{Y} \in \mathbb{R}^{3 \times 4 \times 5}$, as the representation of user profile, whose entries
are generated by implementing the (a) set-based and (b) graded-relevance interpretation schemes
The graded-relevance interpretation scheme is proposed to effectively leverage
the tagging data for building the tensor ranking learning model 𝒴. Following the
general rule, the observed entries are regarded as “relevant”, indicating that users
have shown their interest in the entries. From the observed entries, the list of distinct
items that have been annotated by user 𝑢 using any tag is extracted. This list,
denoted as 𝐶𝑢, is defined as follows:
𝐶𝑢 = {𝑖|(𝑢, 𝑖,∗) ∈ 𝐴𝑜𝑏} (5.19)
The item set 𝐶𝑢 assists in distinguishing the non-observed data that do not
belong to either the “irrelevant” or the “indecisive” category. On each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏
set, the entries in which items have been annotated using other tags are labelled as
“likely relevant” entries. As a result, the graded-relevance scheme exposes the non-
observed entries as a mixture of three entries: (1) “likely relevant” entries – user is
probably interested in the entries, yet this is not explicitly revealed, (2) “irrelevant”
entries – user is not interested in the entries, and (3) “indecisive” entries – user might
be interested in the entries in the future. The “likely relevant” entries are revealed as
entries for which, even though they do not occur in the observed set, the items of the
entries have actually been tagged by the user.
To represent the user profile, the initial tensor 𝒴 ∈ ℝ𝑄×𝑅×𝑆 is constructed
where each tensor entry, 𝑦𝑢,𝑖,𝑡, is given a numerical value that represents the
relevance grade of tagging activity. Having the four possible distinct values for each
entry, the entries are assigned with an ordinal relevance value, which is graded from
the highest to the lowest ones, i.e. “relevant”, “likely relevant”, “irrelevant”, and
“indecisive”. The graded-relevance scheme can generate entries labelled with
{2,1, −1,0} for the tensor model, which are comparable to rating data. The rules of
graded-relevance scheme relevance grade labelling to generate the entries of tensor
𝒴 can be formulated as follows:
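\[
y_{u,i,t} := \begin{cases}
2 & \text{if } (u,i,t) \in A_{ob} \\
1 & \text{if } (u,i,t) \notin A_{ob} \text{ and } i \in C_u \text{ and } (u,*,t) \in A_{ob} \\
-1 & \text{if } (u,i,t) \notin A_{ob} \text{ and } i \in I \setminus C_u \text{ and } (u,*,t) \in A_{ob} \\
0 & \text{otherwise}
\end{cases} \tag{5.20}
\]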
Example 5.4: Tagging Data Interpretation using graded-relevance scheme.
The way the graded-relevance scheme interprets tagging data is illustrated with the
entries of User 1 (𝑢1) in the toy example of Figure 3.2. Figure 5.7(b) shows the
constructed initial tensor $\mathcal{Y} \in \mathbb{R}^{3 \times 4 \times 5}$ that represents the user profile,
whose entries are generated from the tagging data by implementing the
graded-relevance interpretation scheme as formulated in Equation (5.20).
The observed entries in Figure 3.2 show that 𝑢1 has used 𝑡1, 𝑡3, and 𝑡4 to reveal his
interest for {𝑖2}, {𝑖2}, and {𝑖3}, respectively. Given 𝐼 = {𝑖1, 𝑖2, 𝑖3, 𝑖4} and 𝐶𝑢1=
{𝑖2, 𝑖3}, the tagging data is interpreted as 𝑢1 favours: (1) {𝑖2} more than {𝑖1, 𝑖4} to be
annotated using 𝑡1; (2) {𝑖2} more than {𝑖1, 𝑖4} to be annotated using 𝑡3; and (3) {𝑖3}
more than {𝑖1, 𝑖4} to be annotated using 𝑡4. Note that, in the same order of
exemplification, interpretations can also be made that 𝑢1 favours: (1) {𝑖3} more than
{𝑖1, 𝑖4} as it was annotated using 𝑡4, yet still less than {𝑖2}; (2) {𝑖3} more than {𝑖1, 𝑖4}
as it was annotated using 𝑡4, yet still less than {𝑖2}; and (3) {𝑖2} more than {𝑖1, 𝑖4} as
it was annotated using 𝑡1 and 𝑡3, yet still less than {𝑖3}.
As shown in Figure 5.7(b), the user profile representation can then be generated as:
(1) on (𝑢1, 𝑡1) set, entry of {𝑖2} is “relevant” (or graded as “2”) while those of {𝑖3}
and {𝑖1, 𝑖4} are “likely relevant” (or graded as “1”) and “irrelevant” (or graded as
“-1”), respectively; (2) on (𝑢1, 𝑡3) set, entry of {𝑖2} is “relevant” while those of {𝑖3}
and {𝑖1, 𝑖4} are “likely relevant” and “irrelevant” , respectively; and (3) on (𝑢1, 𝑡4)
set, entry of {𝑖3} is “relevant” while those of {𝑖2} and {𝑖1, 𝑖4} are “likely relevant” and
“irrelevant” , respectively. Note that the “indecisive” entries, graded as “0”, are
revealed from the entries of the non-observed sets, i.e. (𝑢1, 𝑡2) and (𝑢1, 𝑡5).
Comparing the example of the graded-relevance scheme (Figure 5.7(b)) to that of
set-based scheme (Figure 5.7(a)), it can be observed that the latter overgeneralises
the “irrelevant” entries as it assumes that, on each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set, any items other
than those appearing in observed entries are “irrelevant” and disregards the fact that
those items have been tagged by the user. The graded-relevance scheme states that
“irrelevant” entries of the set-based scheme can be further broken down into “likely
relevant” and “irrelevant” entries.
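The labelling rule of Equation (5.20) can be sketched in a few lines of Python. The sketch is illustrative only: the function name `graded_relevance_labels`, the dictionary used in place of a tensor, and the string identifiers are assumptions made for this example and are not taken from the thesis implementation. The input reproduces the observed entries of User 1 from Example 5.4.

```python
def graded_relevance_labels(observed, users, items, tags):
    """Return {(u, i, t): grade} following the four cases of Equation (5.20)."""
    observed = set(observed)
    items_of_user = {}  # C_u: items annotated by u with any tag
    tags_of_user = {}   # tags t for which some (u, *, t) is observed
    for u, i, t in observed:
        items_of_user.setdefault(u, set()).add(i)
        tags_of_user.setdefault(u, set()).add(t)

    labels = {}
    for u in users:
        c_u = items_of_user.get(u, set())
        used_tags = tags_of_user.get(u, set())
        for i in items:
            for t in tags:
                if (u, i, t) in observed:
                    grade = 2    # "relevant"
                elif t in used_tags and i in c_u:
                    grade = 1    # "likely relevant"
                elif t in used_tags:
                    grade = -1   # "irrelevant"
                else:
                    grade = 0    # "indecisive"
                labels[(u, i, t)] = grade
    return labels


# Observed entries of User 1 in the toy example (Example 5.4).
observed = [("u1", "i2", "t1"), ("u1", "i2", "t3"), ("u1", "i3", "t4")]
items = ["i1", "i2", "i3", "i4"]
tags = ["t1", "t2", "t3", "t4", "t5"]
labels = graded_relevance_labels(observed, ["u1"], items, tags)
```

On the (𝑢1, 𝑡1) set, for instance, `labels` grades 𝑖2 as 2, 𝑖3 as 1, and 𝑖1 and 𝑖4 as −1, while every entry under 𝑡2 and 𝑡5 is graded 0, in line with the description in Example 5.4.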
5.3.3 Learning-to-Rank Procedure
This section details how the latent factors, corresponding to each dimension of tensor
𝒴, are learned and generated.
5.3.3.1 Optimization Criterion and Factorization Technique
GAP is the generalisation of the Average Precision (AP) measure for the ordinal
relevance data (Robertson et al., 2010). Using GAP as the optimized ranking
evaluation measure enables the recommendation model to set up thresholds so that
the “likely relevant” entries can be regarded as either “relevant” or “irrelevant”
entries. The task of recommendation can now be formulated as the recommendation
of an optimal (from the GAP perspective) items list to users using the latent factor
matrices of the tensor model. Based on the original definition of GAP (Robertson et
al., 2010), the GAP score for a user 𝑢 under tag 𝑡 can be formulated as:
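\[
GAP_{u,t} := \frac{\displaystyle \sum_{i \in I} \sum_{k=1}^{c} \sum_{j \neq i} g_k \, \mathbb{I}(y_{u,i,t} \geq 1) \, \mathbb{I}(y_{u,j,t} \geq \mu_k) \, \frac{\mathbb{I}(r_{u,j,t} \leq r_{u,i,t})}{r_{u,i,t}}}{\displaystyle \sum_{l=1}^{c} n_{u,l,t} \sum_{k=1}^{l} g_k} \tag{5.21}
\]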
And therefore the GAP score for all users under all tags can be defined as:
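\[
GAP := \frac{1}{QS} \sum_{u \in U} \sum_{t \in T} \frac{\displaystyle \sum_{i \in I} \sum_{k=1}^{c} \sum_{j \neq i} \beta_{gy} \, \frac{\mathbb{I}(r_{u,j,t} \leq r_{u,i,t})}{r_{u,i,t}}}{\displaystyle \sum_{l=1}^{c} n_{u,l,t} \sum_{k=1}^{l} g_k} \tag{5.22}
\]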
where $c = \max(y_{u,i,t})$ and $y_{u,i,t}$ is the relevance label assigned from the ordinal
relevance values of {2, 1, 0, −1} obtained from the initial tensor model. The $r_{u,i,t}$ is
the ranking position of item 𝑖 for user 𝑢 with tag 𝑡, and is approximated using $\hat{y}_{u,i,t}$,
i.e. the predicted preference score that reflects the preference level of user 𝑢 for
annotating item 𝑖 using tag 𝑡, calculated from the latent factors. The 𝑔𝑘 denotes the
threshold probability (Robertson et al., 2010) that the user sets as a threshold of
relevance at grade 𝜇𝑘, i.e. regarding the entries with grades equal to or larger than 𝜇𝑘 as
“relevant” and the others as “irrelevant”. In other words, 𝑔𝑘 and 𝜇𝑘 are the
parameters that control whether the “likely relevant” entries should be regarded as
“relevant” or “irrelevant”. It is to be noted that the probability values must be
exclusive and exhaustive probabilities (Robertson et al., 2010):
\[
\sum_{k=1}^{c} g_k = 1 \tag{5.23}
\]
The 𝕀(∙) is the indicator function which is equal to 1 if the condition is satisfied, and
0 otherwise. The 𝑛𝑢,𝑙,𝑡 is the number of items labelled with grade 𝑙 by user 𝑢 using tag 𝑡.
Notice that for notational convenience, the following substitution is performed:
𝛽𝑔𝑦 = 𝑔𝑘𝕀(𝑦𝑢,𝑖,𝑡 > 1)𝕀(𝑦𝑢,𝑗,𝑡 ≥ 𝜇𝑘) (5.24)
From the tensor 𝒴, latent factors are learned and generated in order to derive
the latent relationships between the dimensions of users, items and tags. Like
Do-Rank, Go-Rank uses the CP model (Kolda and Bader, 2009), a well-known
technique that has been shown to be less expensive in both memory and time
consumption than Tucker (Kolda and Bader, 2009), as the factorization
technique as well as the predictor function model. Recall that the CP factorization
model for a third-order tensor is illustrated in Figure 5.2 and that the predicted
preference score is calculated using Equation (5.5).
5.3.3.2 Ranking Smoothing
From Equation (5.22), it can be observed that GAP is dependent on the ranking
positions of items in the recommendation list, as it is reliant on the values of
𝕀(𝑟𝑢,𝑗,𝑡 ≤ 𝑟𝑢,𝑖,𝑡) and 𝑟𝑢,𝑖,𝑡. Given that the ranking positions are determined via the
predicted preference scores calculated based on the model parameters (i.e. latent
factor matrices), the GAP function becomes non-smooth. For this reason, it is hard to
apply the standard optimization approaches to the objective function, as such
approaches require a smooth function (Chapelle and Wu, 2010; Wu et al.,
2009). This thesis tackles this problem by applying the smoothing
function (Chapelle and Wu, 2010) to the ranking position with respect to the model
parameters as follows:
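\[
\mathbb{I}(r_{u,j,t} \leq r_{u,i,t}) \approx \sigma(\Delta\hat{y}) \tag{5.25}
\]
\[
r_{u,i,t} \approx 1 + \sum_{j \neq i} \sigma(\Delta\hat{y}) \tag{5.26}
\]
where 𝜎 is the logistic function $\sigma(x) = \frac{1}{1 + e^{-x}}$ and $\Delta\hat{y} = \hat{y}_{u,i,t} - \hat{y}_{u,j,t}$.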
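The effect of the rank smoothing in Equation (5.26) can be illustrated with a small Python sketch that compares exact ranking positions with their logistic approximation. This is a sketch under stated assumptions, not the thesis code. It assumes the logistic is evaluated on the score difference of item 𝑗 minus item 𝑖, so that each term tends to 1 when item 𝑗 outscores item 𝑖; this is the orientation under which the approximation tends to the indicator in Equation (5.25), and it agrees with the (−𝛿 + 𝛿²) factor in Equation (5.31). The `scale` argument is a hypothetical sharpness parameter added only to show how the approximation tightens, and the function names are illustrative.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def exact_ranks(scores):
    """Rank 1 is the highest score; rank = 1 + number of items scoring higher."""
    return [
        1 + sum(1 for j, s_j in enumerate(scores) if j != i and s_j > s_i)
        for i, s_i in enumerate(scores)
    ]


def smoothed_ranks(scores, scale=1.0):
    """Smooth surrogate of the rank: 1 + sum over j != i of sigmoid(scale * (s_j - s_i))."""
    return [
        1.0 + sum(sigmoid(scale * (s_j - s_i))
                  for j, s_j in enumerate(scores) if j != i)
        for i, s_i in enumerate(scores)
    ]


scores = [3.0, 1.0, 2.0]                   # hypothetical predicted scores
hard = exact_ranks(scores)                 # integer ranking positions
soft = smoothed_ranks(scores)              # differentiable approximation
sharp = smoothed_ranks(scores, scale=50.0) # close to the integer ranks
```

With well-separated scores and a large `scale`, the smoothed ranks approach the integer ranks, while with `scale=1.0` they remain differentiable at the cost of a coarser approximation.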
Substituting Equations (5.25) and (5.26) to Equation (5.22), the smoothed
approximation of GAP is obtained as:
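\[
sGAP := \frac{1}{QS} \sum_{u \in U} \sum_{t \in T} \frac{\displaystyle \sum_{i \in I} \sum_{k=1}^{c} \sum_{j \neq i} \beta_{gy} \, \frac{\sigma(\Delta\hat{y})}{1 + \sum_{j \neq i} \sigma(\Delta\hat{y})}}{\displaystyle \sum_{l=1}^{c} n_{u,l,t} \sum_{k=1}^{l} g_k} \tag{5.27}
\]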
Figure 5.8 shows the comparison between GAP, calculated using Equation (5.22),
and the smoothed approximation of GAP (𝑠𝐺𝐴𝑃), calculated using Equation (5.27).
Figure 5.8. The comparison between GAP and the smoothed approximation of GAP (𝑠𝐺𝐴𝑃)
5.3.3.3 Latent Factors Generation
The resultant objective function with 𝜆𝛩 as the regularization coefficient
corresponding to 𝜎𝛩 for avoiding overfitting, is formulated as:
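\[
L(\Theta) := \frac{1}{QS} \sum_{u \in U} \sum_{t \in T} \frac{\displaystyle \sum_{i \in I} \sum_{k=1}^{c} \sum_{j \neq i} \beta_{gy} \, \frac{\sigma(\Delta\hat{y})}{1 + \sum_{j \neq i} \sigma(\Delta\hat{y})}}{\displaystyle \sum_{l=1}^{c} n_{u,l,t} \sum_{k=1}^{l} g_k} - \lambda_{\Theta} \|\Theta\|_F^2 \tag{5.28}
\]
Gradient descent is performed to optimize the model parameters 𝛩 of $\Delta\hat{y}$
formulated in Equation (5.28). It is to be noted that the $QS$ and $\sum_{l=1}^{c} n_{u,l,t} \sum_{k=1}^{l} g_k$
coefficients can be disregarded as they have no influence on the optimization.
Given a case (𝑢, 𝑖, 𝑗, 𝑡) with respect to model parameters
$\{m_u^{(1)}, m_i^{(2)}, m_j^{(2)}, m_t^{(3)}\}$, the gradient of 𝑠𝐺𝐴𝑃 can be obtained by computing the
derivative of Equation (5.28) as follows:
\[
\frac{\partial L}{\partial \theta} = \frac{\partial}{\partial \theta} \left( \sum_{u \in U} \sum_{t \in T} \sum_{i \in I} \sum_{k=1}^{c} \sum_{j \neq i} \frac{\beta_{gy} \, \sigma(\Delta\hat{y})}{1 + \sum_{j \neq i} \sigma(\Delta\hat{y})} \right) - \lambda_{\theta}\theta \tag{5.29}
\]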
[Figure 5.8 plot: GAP vs Smoothed GAP (sGAP); x-axis: Point; y-axis: Score; series: GAP, Smoothed GAP (sGAP)]
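By replacing $\sigma(\Delta\hat{y})$ with 𝛿, for notational convenience, the derivative can
be rewritten as:
\[
\frac{\partial L}{\partial \theta} = \sum_{u \in U} \sum_{t \in T} \sum_{i \in I} \sum_{k=1}^{c} \sum_{j \neq i} \frac{\left[\left(\frac{\partial}{\partial \theta}(\beta_{gy}\delta)\right)\left(1 + \sum_{j \neq i}\delta\right)\right] - \left[(\beta_{gy}\delta)\left(\frac{\partial}{\partial \theta}\left(1 + \sum_{j \neq i}\delta\right)\right)\right]}{\left[1 + \sum_{j \neq i}\delta\right]^2} - \lambda_{\theta}\theta \tag{5.30}
\]
Given that:
\[
\frac{\partial}{\partial \theta}(\beta_{gy}\delta) = \beta_{gy}\left(-\delta + \delta^2\right)\frac{\partial}{\partial \theta}\Delta\hat{y} \tag{5.31}
\]
and
\[
\frac{\partial}{\partial \theta}\left(1 + \sum_{j \neq i}\delta\right) = \sum_{j \neq i}\left(-\delta + \delta^2\right)\frac{\partial}{\partial \theta}\Delta\hat{y} \tag{5.32}
\]
the resulting gradient of 𝑠𝐺𝐴𝑃 is obtained as:
\[
\frac{\partial L}{\partial \theta} = \sum_{u \in U} \sum_{t \in T} \sum_{i \in I} \sum_{k=1}^{c} \sum_{j \neq i} \frac{\beta_{gy}\left(\left[\left(-\delta + \delta^2\right)\left(1 + \sum_{j \neq i}\delta\right)\frac{\partial}{\partial \theta}\Delta\hat{y}\right] - \left[\delta\left(\sum_{j \neq i}\left(-\delta + \delta^2\right)\frac{\partial}{\partial \theta}\Delta\hat{y}\right)\right]\right)}{\left[1 + \sum_{j \neq i}\delta\right]^2} - \lambda_{\theta}\theta \tag{5.33}
\]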
Knowing that 𝛽𝑔𝑦 is actually determined by 𝕀(𝑦𝑢,𝑖,𝑡 > 1)𝕀(𝑦𝑢,𝑗,𝑡 ≥ 𝜇𝑘), i.e. the
entries must not be “irrelevant” and the grade of entries must be at least equal to the
threshold 𝜇𝑘; and that users are not using the whole available tags, the learning
algorithm can be modified such that it can run more efficiently and faster. In this
case, instead of calculating 𝑠𝐺𝐴𝑃 for the entire item set across all tags, the 𝑠𝐺𝐴𝑃 is
optimized only across tags that have been used by user u, 𝑉𝑢 = {𝑡|(𝑢,∗, 𝑡) ∈ 𝐴𝑜𝑏} ,
for items’ entries that are labelled as “relevant” or “likely relevant” for the user 𝑢, i.e.
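𝑍𝑢 = {𝑖 | 𝑦𝑢,𝑖,𝑡 = 2 ∨ 𝑦𝑢,𝑖,𝑡 = 1}. Other entries need not be included since
their values are always less than any threshold grade 𝜇, which means that
𝕀(𝑦𝑢,𝑗,𝑡 ≥ 𝜇𝑘) will always be 0. As a result, the gradient of 𝑠𝐺𝐴𝑃 given a case
(𝑢, 𝑖, 𝑗, 𝑡) with respect to the model parameters $\{m_u^{(1)}, m_i^{(2)}, m_j^{(2)}, m_t^{(3)}\}$ becomes:
\[
\frac{\partial L}{\partial \theta} = \sum_{u \in U} \sum_{t \in V_u} \sum_{i \in Z_u} \sum_{k=1}^{c} \sum_{j \neq i} \frac{\beta_{gy}\left(\left[\left(-\delta + \delta^2\right)\left(1 + \sum_{j \neq i}\delta\right)\frac{\partial}{\partial \theta}\Delta\hat{y}\right] - \left[\delta\left(\sum_{j \neq i}\left(-\delta + \delta^2\right)\frac{\partial}{\partial \theta}\Delta\hat{y}\right)\right]\right)}{\left[1 + \sum_{j \neq i}\delta\right]^2} - \lambda_{\theta}\theta \tag{5.34}
\]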
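To apply the 𝑠𝐺𝐴𝑃 optimization, the model only has to substitute the computation of
$\frac{\partial}{\partial \theta}\Delta\hat{y}$ based on its parameters:
\[
\frac{\partial \Delta\hat{y}}{\partial m_u^{(1)}} = \left(m_i^{(2)} \odot m_t^{(3)} - m_j^{(2)} \odot m_t^{(3)}\right) \tag{5.35}
\]
\[
\frac{\partial \Delta\hat{y}}{\partial m_i^{(2)}} = \left(m_u^{(1)} \odot m_t^{(3)}\right) \tag{5.36}
\]
\[
\frac{\partial \Delta\hat{y}}{\partial m_j^{(2)}} = -\left(m_u^{(1)} \odot m_t^{(3)}\right) \tag{5.37}
\]
\[
\frac{\partial \Delta\hat{y}}{\partial m_t^{(3)}} = \left(m_u^{(1)} \odot m_i^{(2)} - m_u^{(1)} \odot m_j^{(2)}\right) \tag{5.38}
\]
where ⨀ denotes the element-wise product. The Go-Rank learning algorithm is outlined
in Figure 5.9.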
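The partial derivatives in Equations (5.35) to (5.38) can be checked numerically. The following Python sketch assumes that Equation (5.5) is the standard CP score, i.e. the sum over factors of the product of the user, item and tag factor entries, and that the score difference is the score of item 𝑖 minus the score of item 𝑗. The function names and the toy factor vectors are hypothetical and serve only this illustration.

```python
def cp_score(m_u, m_i, m_t):
    """CP predicted score: sum over factors of m_u[f] * m_i[f] * m_t[f] (assumed form of Eq. 5.5)."""
    return sum(a * b * c for a, b, c in zip(m_u, m_i, m_t))


def delta_score(m_u, m_i, m_j, m_t):
    """Score difference between item i and item j for the same user and tag."""
    return cp_score(m_u, m_i, m_t) - cp_score(m_u, m_j, m_t)


def delta_gradients(m_u, m_i, m_j, m_t):
    """Analytic partial derivatives of the score difference."""
    grad_u = [(i - j) * t for i, j, t in zip(m_i, m_j, m_t)]  # Eq. (5.35)
    grad_i = [u * t for u, t in zip(m_u, m_t)]                # Eq. (5.36)
    grad_j = [-u * t for u, t in zip(m_u, m_t)]               # Eq. (5.37)
    grad_t = [u * (i - j) for u, i, j in zip(m_u, m_i, m_j)]  # Eq. (5.38)
    return grad_u, grad_i, grad_j, grad_t


def numeric_gradient(func, vector, eps=1e-6):
    """Central finite differences of func with respect to each entry of vector."""
    grad = []
    for idx in range(len(vector)):
        plus, minus = list(vector), list(vector)
        plus[idx] += eps
        minus[idx] -= eps
        grad.append((func(plus) - func(minus)) / (2 * eps))
    return grad


# Hypothetical factor vectors with F = 3.
m_u, m_i, m_j, m_t = [0.5, -1.0, 2.0], [1.5, 0.3, -0.7], [0.2, 0.9, 1.1], [-0.4, 0.8, 0.6]
analytic = delta_gradients(m_u, m_i, m_j, m_t)
numeric = (
    numeric_gradient(lambda v: delta_score(v, m_i, m_j, m_t), m_u),
    numeric_gradient(lambda v: delta_score(m_u, v, m_j, m_t), m_i),
    numeric_gradient(lambda v: delta_score(m_u, m_i, v, m_t), m_j),
    numeric_gradient(lambda v: delta_score(m_u, m_i, m_j, v), m_t),
)
max_error = max(abs(a - n) for ga, gn in zip(analytic, numeric) for a, n in zip(ga, gn))
```

Because the score difference is linear in each factor vector taken separately, the finite-difference estimates agree with the analytic expressions up to floating-point rounding.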
Algorithm: Go-Rank Learning
Input: Training set 𝐷𝑡𝑟𝑎𝑖𝑛 ⊆ 𝑈 × 𝐼 × 𝑇, threshold probability 𝑔 ∈ {𝑔1, 𝑔2}, threshold grade 𝜇 ∈ {𝜇1, 𝜇2}, learning rate 𝛼, factor matrix column size 𝐹, regularization 𝜆, maximal iteration 𝑖𝑡𝑒𝑟𝑀𝑎𝑥
Output: Latent factors $M^{(1)}, M^{(2)}, M^{(3)}$
    $Q = |U|$, $R = |I|$, $S = |T|$, $\mathcal{Y} \in \mathbb{R}^{Q \times R \times S}$, $y_{u,i,t} \in \{2, 1, 0, -1\}$
    Populate 𝒴 using Equation (5.20)
    $c = \max(y_{u,i,t})$
    $Z_u = \{i \mid y_{u,i,t} = 2 \vee y_{u,i,t} = 1\}$
    $V_u = \{t \mid (u, *, t) \in A_{ob}\}$
    Initialize $M^{(1)(0)} \in \mathbb{R}^{Q \times F}$, $M^{(2)(0)} \in \mathbb{R}^{R \times F}$, $M^{(3)(0)} \in \mathbb{R}^{S \times F}$, $h = 0$
    repeat
        for 𝑢 ∈ 𝑈 do
            $m_u^{(1)} \leftarrow m_u^{(1)} + \alpha \frac{\partial L}{\partial m_u^{(1)}}$ based on Equations (5.34) and (5.35)
        for 𝑡 ∈ 𝑇 do
            $m_t^{(3)} \leftarrow m_t^{(3)} + \alpha \frac{\partial L}{\partial m_t^{(3)}}$ based on Equations (5.34) and (5.38)
        for 𝑢 ∈ 𝑈 do
            for 𝑡 ∈ 𝑉𝑢 do
                for 𝑖 ∈ 𝑍𝑢 do
                    for 𝑘 ← 1 to 𝑐 do
                        for 𝑗 ≠ 𝑖 do
                            $m_i^{(2)} \leftarrow m_i^{(2)} + \alpha \frac{\partial L}{\partial m_i^{(2)}}$ based on Equations (5.34) and (5.36)
                            $m_j^{(2)} \leftarrow m_j^{(2)} + \alpha \frac{\partial L}{\partial m_j^{(2)}}$ based on Equations (5.34) and (5.37)
        $h \leftarrow h + 1$
    until ℎ ≥ 𝑖𝑡𝑒𝑟𝑀𝑎𝑥
Figure 5.9. The Go-Rank learning algorithm
5.3.3.4 Complexity Analysis and Convergence
The complexity of the Go-Rank learning process is analysed on a single iteration.
The initial Go-Rank (Equation (5.28)) complexity is $O(F(Q + S + QScR^2))$ where
$c = \max(y_{u,i,t})$. Given that $\mathcal{Y} \in \mathbb{R}^{Q \times R \times S}$, the total number of possible entries of the
tensor model, i.e. the sum of “relevant”, “likely relevant”, “irrelevant” and
“indecisive” entries, is calculated as |𝑌| = 𝑄𝑅𝑆. Since |𝑌| ≫ 𝑄, 𝑆, 𝑅, 𝐹, 𝑐, the overall
complexity of Go-Rank in one iteration can be regarded as |𝑌|. In other words, it is
linear in the total number of possible entries of the tensor model.
After the implementation of the proposed approach for making the learning run
more efficiently, the Go-Rank complexity (illustrated in Figure 5.9) becomes
$O(F(Q + S + Q\tilde{V}c\tilde{Z}^2))$ where $\tilde{V}$ and $\tilde{Z}$ denote the average sizes of 𝑉𝑢 and 𝑍𝑢.
Since $\tilde{V} \ll S$ and $\tilde{Z} \ll R$, the Go-Rank complexity now becomes $|\tilde{Y}|$, i.e. the sum of
“relevant” and “likely relevant” entries of the tensor model, where $|\tilde{Y}| \ll |Y|$.
5.3.4 Recommendation Generation
Since Go-Rank implements a ranking-based interpretation scheme and learning-to-
rank model, the resulted latent factors of Go-Rank can be directly used for generating
the Top-𝑁 list of item recommendations for each target user. Using Equation (5.5),
the predicted preference score $\hat{y}_{u,i,t}$ of target user 𝑢 to item 𝑖 on tag 𝑡 is calculated.
The candidate items of user 𝑢 are identified based on the maximum $\hat{y}_{u,i,t}$ of each
user-item set. The scores of the candidate items are then ranked in descending order
to generate the list of recommended items.
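The recommendation step described above can be sketched as follows. The sketch assumes that Equation (5.5) is the standard CP score (the sum over factors of the three-way product) and scores each item by its maximum predicted score over all tags; the function names, the dictionary layout of the factor matrices, and the toy values are hypothetical and are not taken from the thesis implementation.

```python
def cp_score(m_u, m_i, m_t):
    """CP predicted score: sum over factors of m_u[f] * m_i[f] * m_t[f] (assumed form of Eq. 5.5)."""
    return sum(a * b * c for a, b, c in zip(m_u, m_i, m_t))


def top_n_items(m_u, item_factors, tag_factors, n):
    """Rank items for one user by their maximum predicted score over all tags."""
    candidate_scores = {
        item: max(cp_score(m_u, m_i, m_t) for m_t in tag_factors.values())
        for item, m_i in item_factors.items()
    }
    ranked = sorted(candidate_scores, key=candidate_scores.get, reverse=True)
    return ranked[:n], candidate_scores


# Hypothetical latent factors with F = 2.
user_vector = [1.0, 0.0]
item_factors = {"a": [1.0, 0.0], "b": [2.0, 0.0], "c": [0.0, 5.0]}
tag_factors = {"x": [1.0, 1.0], "y": [0.5, 1.0]}
top2, candidate_scores = top_n_items(user_vector, item_factors, tag_factors, n=2)
```

In this toy setting item "b" receives the largest candidate score and item "c" the smallest, so `top2` lists "b" followed by "a".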
5.3.5 Empirical Evaluation
The proposed Go-Rank method and benchmarking methods are evaluated by 5-fold
cross-validation experimentation. For each fold, each dataset is randomly divided
into a training set 𝐷𝑡𝑟𝑎𝑖𝑛 (80%) and a test set 𝐷𝑡𝑒𝑠𝑡 (20%) based on the number of
posts data. 𝐷𝑡𝑟𝑎𝑖𝑛 and 𝐷𝑡𝑒𝑠𝑡 do not overlap in posts, i.e., there exist no triplets for a
user-item set in the 𝐷𝑡𝑟𝑎𝑖𝑛 if a triplet (𝑢, 𝑖,∗) is present in the 𝐷𝑡𝑒𝑠𝑡. The
recommendation task is to predict and rank the Top-𝑁 items for the users present in
𝐷𝑡𝑒𝑠𝑡. The performance evaluation is measured and reported over the average values
on all five runs using AP and NDCG, presented at various Top-𝑁 positions, as well
as MAP. The performance of Go-Rank is compared with benchmarking methods,
including MAX (Symeonidis et al., 2010), PITF (Rendle and Schmidt-Thieme,
2010), and CTS (Kim et al., 2010). It is to be noted that comparison amongst all
proposed methods is presented in Chapter 6.
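The post-level separation of the training and test sets can be sketched as below. The sketch shows one random 80/20 split in which every (user, item) post falls entirely on one side; how the five folds are drawn is not specified here, so the seed handling, the function name `split_by_post`, and the toy triples are assumptions made for illustration only.

```python
import random


def split_by_post(triples, test_ratio=0.2, seed=0):
    """Split (user, item, tag) triples so that no (user, item) post appears on both sides."""
    posts = sorted({(u, i) for u, i, _ in triples})
    rng = random.Random(seed)
    rng.shuffle(posts)
    n_test = round(len(posts) * test_ratio)
    test_posts = set(posts[:n_test])
    train = [a for a in triples if (a[0], a[1]) not in test_posts]
    test = [a for a in triples if (a[0], a[1]) in test_posts]
    return train, test


# Ten hypothetical posts, each annotated with two tags.
triples = [(f"u{k}", f"i{k}", tag) for k in range(10) for tag in ("t1", "t2")]
train, test = split_by_post(triples, test_ratio=0.2, seed=42)
train_posts = {(u, i) for u, i, _ in train}
test_posts = {(u, i) for u, i, _ in test}
```

Splitting on posts rather than on individual triples is what guarantees that no triple of a user-item set in the test set leaks into the training set.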
To enable meaningful comparisons, the parameter values for all methods are
tuned on a randomly selected 25% of all the observed data available in 𝐷𝑡𝑟𝑎𝑖𝑛. For all
tensor-based methods, the size of latent factor matrix 𝐹 is set to 128 as the
recommendation quality usually does not benefit from more than that value. The
learning rate 𝛼 and regularization 𝜆 for PITF are set as 0.01 and 5𝑒−05, respectively,
as suggested in the article (Ifada and Nayak, 2015). The parameters for MAX are
empirically tuned as t𝑜𝑙𝑒𝑟𝑎𝑛𝑐𝑒 = 1𝑒−04 and 𝜆 = 0. For CTS, the neighbourhood
size 𝑘 and model size 𝑤 are all searched from the grid of
{10,20,30,40,50,60,70,80,90,100}. The learning rate 𝛼 and regularization 𝜆 for Go-
Rank are adjusted from 0.01 to 0.1 and 0.00001 to 0.00005, respectively.
5.3.5.1 Impact of graded-relevance Scheme
The impact of implementing graded-relevance scheme is investigated by comparing
the tensor 𝒴 entries population generated from 𝐷𝑡𝑟𝑎𝑖𝑛 using the proposed graded-
relevance scheme with those of the boolean and set-based schemes. Comparison
with UTS, a previously proposed interpretation scheme, is presented in Chapter 6.
The statistics of the tensor entries population listed in Table 5.6 shows that the
“relevant” entries populations on all schemes are the same. Concurring with the
previous outcome described in Section 5.3.2, the boolean scheme generates the least
variety of distinct entries in comparison to the two other schemes. A significant
difference between the set-based and graded-relevance schemes is that the latter
breaks down the “irrelevant” entries of the former into “likely relevant” and
“irrelevant” entries while the “indecisive” entries remain the same, as previously
described in Section 5.3.2. In this case, the graded-relevance scheme reveals the
small number of “irrelevant” entries from the set-based scheme as “likely relevant”
entries, i.e. less than 16%. A portion of these entries can then possibly be regarded as
“relevant” entries by using the threshold probability (Robertson et al., 2010), as later
shown in Section 5.3.5.3.
Dataset    Interpretation Scheme  Distinct Entry     Tensor Population (%)
                                                     10-core   15-core   20-core
Delicious  boolean                Relevant            0.0006    0.0014    0.0027
                                  Irrel./Indecisive  99.9994   99.9986   99.9973
           set-based              Relevant            0.0006    0.0014    0.0027
                                  Irrelevant          0.4377    0.5670    0.6737
                                  Indecisive         99.5617   99.4316   99.3236
           graded-relevance       Relevant            0.0006    0.0014    0.0027
                                  Likely Relevant     0.0092    0.0171    0.0260
                                  Irrelevant          0.4285    0.5499    0.6477
                                  Indecisive         99.5617   99.4316   99.3236
LastFM     boolean                Relevant            0.0042    0.0092    0.0163
                                  Irrel./Indecisive  99.9958   99.9908   99.9837
           set-based              Relevant            0.0042    0.0092    0.0163
                                  Irrelevant          1.2937    1.7233    2.1553
                                  Indecisive         98.7021   98.2675   97.8284
           graded-relevance       Relevant            0.0042    0.0092    0.0163
                                  Likely Relevant     0.0399    0.0790    0.1311
                                  Irrelevant          1.2538    1.6443    2.0242
                                  Indecisive         98.7021   98.2675   97.8284
CiteULike  boolean                Relevant            0.0010    0.0037    0.0095
                                  Irrel./Indecisive  99.9990   99.9963   99.9905
           set-based              Relevant            0.0010    0.0037    0.0095
                                  Irrelevant          0.3176    0.4777    0.5995
                                  Indecisive         99.6814   99.5186   99.3910
           graded-relevance       Relevant            0.0010    0.0037    0.0095
                                  Likely Relevant     0.0137    0.0327    0.0523
                                  Irrelevant          0.3039    0.4450    0.5472
                                  Indecisive         99.6814   99.5186   99.3910
MovieLens  boolean                Relevant            0.0062    0.0283    0.0284
                                  Irrel./Indecisive  99.9938   99.9717   99.9716
           set-based              Relevant            0.0062    0.0283    0.0284
                                  Irrelevant          1.6114    2.0969    2.4147
                                  Indecisive         98.3824   97.8748   97.5569
           graded-relevance       Relevant            0.0062    0.0283    0.0284
                                  Likely Relevant     0.1463    0.2493    0.3729
                                  Irrelevant          1.4651    1.8476    2.0418
                                  Indecisive         98.3824   97.8748   97.5569
Table 5.6. The comparison of tensor entries population distribution generated from 𝐷𝑡𝑟𝑎𝑖𝑛 using
boolean, set-based, and graded-relevance schemes
5.3.5.2 Accuracy Performance
The recommendation performance comparisons of the proposed Go-Rank and the
benchmarking methods in terms of NDCG, AP, and MAP on each dataset are listed
in Table 5.7, Table 5.8, Table 5.9, and Table 5.10. Note that, in contrast to AP, the
higher the Top-𝑁 position, the less the NDCG score is.
The proposed Go-Rank performance in comparison to the benchmarking
methods varies on each dataset. On the Delicious dataset, Go-Rank outperforms the
benchmarking methods, other than PITF. On the LastFM and CiteULike datasets,
Go-Rank achieves superior results on the 15 and 20-cores in terms of any evaluation
measure and at all Top-𝑁 positions. However, the results on the 10-core do not show
the same trend. On the other hand, Go-Rank consistently outperforms the
benchmarking methods on every 𝑝-core size of the MovieLens dataset. These
variations can be explained by observing the tensor entries population distribution
of each dataset listed in Table 5.6. Go-Rank's inferior results occur only on datasets
with a very low “relevant” entries population (i.e. less than 0.0030%). On datasets
with a higher “relevant” entries population, Go-Rank shows its superiority. In this
case, the 𝑝-core size affects the Go-Rank improvement over the benchmarking
methods. That is, the performance improvement grows with the 𝑝-core size on all
datasets. Figure 5.10
shows the Go-Rank improvement in terms of AP@5 over one of the benchmarking
methods, PITF. Seeing the trend, Go-Rank may outperform PITF on the Delicious
dataset with larger 𝑝-core size. In general, all of these results confirm that Go-Rank
is an effective approach for learning the tensor model built with data labelled using
the proposed graded-relevance interpretation scheme, in which its GAP-based
optimization enables the learning model to set up thresholds so that the “likely
relevant” entries can be regarded as either “relevant” or “irrelevant” entries.
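The trend described above can be recomputed from the AP@5 columns of Tables 5.7 to 5.10. The Python sketch below assumes that the improvement is the relative gain of Go-Rank over PITF in percent; this formula is an assumption made for illustration, although it yields values within the axis range shown for Figure 5.10. The dictionary layout and function name are likewise illustrative.

```python
# AP@5 scores (in %) copied from Tables 5.7 to 5.10: (Go-Rank, PITF) per p-core.
AP5 = {
    "Delicious": {"10-core": (4.51, 5.33), "15-core": (4.60, 4.77), "20-core": (5.65, 5.81)},
    "LastFM": {"10-core": (14.22, 13.97), "15-core": (16.38, 15.43), "20-core": (19.33, 16.16)},
    "CiteULike": {"10-core": (5.77, 5.57), "15-core": (10.16, 8.51), "20-core": (10.66, 8.54)},
    "MovieLens": {"10-core": (10.52, 8.34), "15-core": (16.24, 9.40), "20-core": (21.32, 12.26)},
}


def relative_improvement(go_rank, baseline):
    """Relative gain over the baseline in percent (assumed definition of 'improvement')."""
    return 100.0 * (go_rank - baseline) / baseline


improvement = {
    dataset: {core: relative_improvement(*pair) for core, pair in cores.items()}
    for dataset, cores in AP5.items()
}

for dataset, cores in improvement.items():
    summary = ", ".join(f"{core}: {value:+.1f}%" for core, value in cores.items())
    print(f"{dataset}: {summary}")
```

Under this definition the improvement is negative on all three Delicious cores but shrinks in magnitude as the 𝑝-core size grows, and it increases from the 10-core to the 20-core on the other three datasets.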
Methods 𝟏𝟎-core (Score in %) 𝟏𝟓-core (Score in %) 𝟐𝟎-core (Score in %)
NDCG@5 NDCG@10 AP@5 AP@10 MAP NDCG@5 NDCG@10 AP@5 AP@10 MAP NDCG@5 NDCG@10 AP@5 AP@10 MAP
MAX 1.90 1.70 4.22 4.70 2.38 2.00 1.77 4.14 4.63 2.86 2.10 1.85 4.73 5.34 3.75
PITF 2.30 1.99 5.33 5.88 2.60 2.12 1.93 4.77 5.52 3.16 2.52 2.18 5.81 6.48 4.29
CTS 1.85 1.66 4.02 4.59 2.51 2.03 1.84 4.47 4.99 3.11 2.28 2.05 5.20 5.83 4.07
Go-Rank 1.94 1.74 4.51 5.02 2.65 2.05 1.82 4.60 5.15 3.19 2.47 2.18 5.65 6.33 4.55
Table 5.7. NDCG, AP, and MAP on Delicious dataset
Methods 𝟏𝟎-core (Score in %) 𝟏𝟓-core (Score in %) 𝟐𝟎-core (Score in %)
NDCG@5 NDCG@10 AP@5 AP@10 MAP NDCG@5 NDCG@10 AP@5 AP@10 MAP NDCG@5 NDCG@10 AP@5 AP@10 MAP
MAX 6.68 5.92 12.38 12.48 5.33 7.69 6.63 13.83 13.84 6.31 8.26 7.22 14.54 14.65 7.11
PITF 7.56 6.45 13.97 14.31 5.96 8.15 7.15 15.43 15.67 7.05 8.41 7.58 16.16 16.62 7.52
CTS 4.87 4.17 12.36 12.47 3.87 7.78 6.94 14.27 14.57 6.55 8.93 7.83 16.56 16.93 7.59
Go-Rank 7.39 6.35 14.22 14.15 5.65 8.60 7.59 16.38 16.68 7.27 10.28 8.91 19.33 19.34 8.93
Table 5.8. NDCG, AP, and MAP on LastFM dataset
Methods 𝟏𝟎-core (Score in %) 𝟏𝟓-core (Score in %) 𝟐𝟎-core (Score in %)
NDCG@5 NDCG@10 AP@5 AP@10 MAP NDCG@5 NDCG@10 AP@5 AP@10 MAP NDCG@5 NDCG@10 AP@5 AP@10 MAP
MAX 3.18 2.73 6.09 6.45 4.58 3.93 3.26 8.14 8.50 6.88 3.97 3.48 8.31 9.24 8.82
PITF 3.18 2.68 5.57 5.05 4.28 3.94 3.58 8.51 9.20 7.18 4.65 3.74 8.54 9.91 9.78
CTS 2.2 1.87 4.94 5.34 4.31 3.87 3.20 8.44 8.97 7.91 5.16 4.32 10.23 11.05 11.33
Go-Rank 3.11 2.77 5.77 6.20 4.33 5.04 4.24 10.16 10.77 9.19 6.09 4.48 10.66 11.61 11.52
Table 5.9. NDCG, AP, and MAP on CiteULike dataset
Methods 𝟏𝟎-core (Score in %) 𝟏𝟓-core (Score in %) 𝟐𝟎-core (Score in %)
NDCG@5 NDCG@10 AP@5 AP@10 MAP NDCG@5 NDCG@10 AP@5 AP@10 MAP NDCG@5 NDCG@10 AP@5 AP@10 MAP
MAX 6.23 5.10 10.25 10.56 5.36 7.27 6.05 13.84 14.22 7.36 8.40 7.21 16.46 16.82 10.40
PITF 4.14 3.82 8.34 9.21 4.36 4.33 4.23 9.40 9.98 5.98 6.28 5.65 12.26 12.90 8.73
CTS 5.97 5.10 10.04 10.38 5.68 6.24 5.22 12.08 12.58 7.08 8.17 7.10 15.82 16.07 10.30
Go-Rank 6.24 5.11 10.52 10.93 6.18 8.61 7.01 16.24 17.02 9.73 11.34 9.34 21.32 21.73 14.46
Table 5.10. NDCG, AP, and MAP on MovieLens dataset
Figure 5.10. Go-Rank improvement over PITF (AP@5 improvement in %, per dataset, for the 10-core, 15-core and 20-core)
It is worthwhile to highlight the reasons for Go-Rank's outperformance of PITF
on all datasets, except the Delicious dataset. Both methods implement non-boolean
interpretation schemes to build the tensor models, i.e. Go-Rank uses the
graded-relevance scheme while PITF uses the set-based scheme. Go-Rank attains
great improvement over PITF for two main reasons. First, Go-Rank builds the
learning model as a list-wise ranking model, i.e. aiming to get the order of the whole
list correct, whereas PITF builds the learning model as a pair-wise ranking model,
which means that it attempts to get the correct ranking order within each pair only.
Second, Go-Rank enhances the Top-𝑁 recommendation performance by optimizing
the top-biased measure GAP, i.e. the generalisation of AP for ordinal relevance data,
while PITF implements the equal-penalty measure AUC (Shi, Karatzoglou, Baltrunas,
Larson, Hanjalic, et al., 2012). Additionally, CTS underperforms Go-Rank, which
indicates that projecting the ternary relations of tagging data into a two-dimensional
model adversely impacts recommendation quality (Symeonidis et al., 2010).
5.3.5.3 Impact of Probability Values
The impact of probability values is examined to demonstrate how “relevant” the
“likely relevant” data is. The probability values, 𝑔 ∈ {𝑔1, 𝑔2} where 𝑔1 + 𝑔2 = 1, are
the probabilities that regulate whether the “likely relevant” entries should be
regarded as “relevant” or “irrelevant”. It is to be noted that 𝑔1 and 𝑔2 determine the
percentage of “likely relevant” entries considered as “relevant” and “irrelevant”,
respectively. For example, 𝑔1 = 0.1 indicates that 10% of “likely relevant” entries
will be considered as “relevant” and the other 90% of those will be considered as
“irrelevant”. The experiments were conducted with a grid of probability values
between 0 and 1 with an interval of 0.1, resulting in
𝑔1 ∈ {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}. The threshold of regarding the
“likely relevant” entries as either “relevant” or “irrelevant” entries is fixed as
𝜇 ∈ {2,1}, following the grades of the “relevant” and “likely relevant” entries
formulated in Equation (5.20). Depending on which threshold value is used,
𝑦𝑢,𝑗,𝑡 ≥ 𝜇 will determine the entries as “relevant” and the others as “irrelevant”.
For the Delicious dataset, Figure 5.11 shows that not all of the “likely relevant”
entries are “irrelevant” since the highest AP@5 are achieved when 𝑔1 = 0.3,
𝑔1 = 0.2 and 𝑔1 = 0.1 for the 10-core, 15-core and 20-core sets, respectively. This
means that, a total of 30%, 20% or 10% of the “likely relevant” entries are actually
found “relevant”. For the LastFM dataset, Figure 5.12 shows that the highest AP@5
is achieved when 𝑔1 = 0.1, 𝑔1 = 0.1 and 𝑔1 = 0.2 for the 10-core, 15-core and 20-
core sets, respectively. Hence, a total of 10%, 10% or 20% of the “likely relevant”
entries of the sets can be regarded as “relevant”. Likewise, Figure 5.13 shows that a
total of 20%, 20% or 40% of the “likely relevant” entries of the sets can be regarded
as “relevant” for the 10-core, 15-core and 20-core of the CiteULike dataset, as the
highest AP@5 is achieved when 𝑔1 = 0.2, 𝑔1 = 0.2 and 𝑔1 = 0.4, respectively.
Finally, complementing the results of the other datasets, the experiments on the
MovieLens dataset show that not all of the “likely
relevant” entries are “irrelevant”. Figure 5.14 shows that the highest AP@5 is achieved
when 𝑔1 = 0.1, 𝑔1 = 0.2 and 𝑔1 = 0.3 for the 10-core, 15-core and 20-core sets,
respectively. In other words, a total of 10%, 20% or 30% of the “likely relevant”
entries are actually found “relevant”.
Figure 5.11. Impact of probability values on the Delicious dataset
[Panels (a) to (c): Delicious 10-core, 15-core and 20-core; x-axis: Probability Values (g1); y-axis: AP@5 (%)]
Figure 5.12. Impact of probability values on the LastFM dataset
[Panels (a) to (c): LastFM 10-core, 15-core and 20-core; x-axis: Probability Values (g1); y-axis: AP@5 (%)]
Figure 5.13. Impact of probability values on the CiteULike dataset
[Panels (a) to (c): CiteULike 10-core, 15-core and 20-core; x-axis: Probability Values (g1); y-axis: AP@5 (%)]
Figure 5.14. Impact of probability values on the MovieLens dataset
[Panels (a) to (c): MovieLens 10-core, 15-core and 20-core; x-axis: Probability Values (g1); y-axis: AP@5 (%)]
All of the results establish that any items other than those appearing in the
observed entries, on each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set, should not simply be regarded as
“irrelevant”, since the user has revealed his interest in some of them by annotating
them with other tags; these are regarded as “transitional” entries. The
graded-relevance scheme is an efficient scheme, as it sets those entries as distinct
entries positioned between the “relevant” and “irrelevant” entries, and labels them
as “likely relevant”.
Additionally, the correlation between the impact of the probability values and the
𝑝-core size can also be observed from the results. That is, the larger the 𝑝-core
size, the less a change in the probability value affects performance, except on the
LastFM dataset, although the same may happen there for a much larger 𝑝-core size.
In this case, smaller 𝑝-core sizes are most affected by the variation of the probability
values. This observation indicates that the implementation of the graded-relevance
scheme on Go-Rank highlights data granularity more effectively on a sparse dataset
than on a dense dataset.
5.3.5.4 Scalability
The scalability of Go-Rank is examined in terms of its learning running time, to study
the impact of implementing the “fast learning” approach for optimizing 𝑠𝐺𝐴𝑃 as
defined in Equation (5.34). The examination is demonstrated on the 10-core of the
Delicious and MovieLens datasets, as the implementation on other datasets and cores
shows similar results. The learning running time is measured for a single iteration at
various scales, i.e. 10% to 100% of the training set (𝐷𝑡𝑟𝑎𝑖𝑛). Figure 5.15 shows that
the fast learning approach is much faster than the “original learning” approach
(Equation (5.33)) on both datasets.
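A measurement of this kind can be sketched as below. The sketch only illustrates the timing protocol (one iteration per subsample, from 10% to 100% of the training set); the names `time_one_iteration` and `dummy_iteration` are hypothetical, and the dummy workload is a stand-in for one pass of the actual learner, which is not reproduced here.

```python
import random
import time

def time_one_iteration(train_set, run_iteration, ratios=range(10, 101, 10), seed=0):
    """Return {ratio (%): seconds} for one learning iteration per subsample."""
    rng = random.Random(seed)
    shuffled = list(train_set)
    rng.shuffle(shuffled)                  # fixed seed keeps subsamples repeatable
    timings = {}
    for ratio in ratios:
        subset = shuffled[: len(shuffled) * ratio // 100]
        start = time.perf_counter()
        run_iteration(subset)              # exactly one iteration on this subsample
        timings[ratio] = time.perf_counter() - start
    return timings

def dummy_iteration(subset):
    # Stand-in workload (hypothetical); replace with one pass of the learner.
    return sum(x % 7 for x in subset)

timings = time_one_iteration(range(10_000), dummy_iteration)
```

Running the same harness once with the fast learning step and once with the original learning step would give the two curves that are compared per training-set ratio.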
Figure 5.15. The Go-Rank scalability: log running time (sec) versus ratio of training set (%) for the
fast learning and original learning approaches on (a) Delicious 10-core and (b) MovieLens 10-core
5.3.5.5 Convergence
The convergence of the Go-Rank learning algorithm is demonstrated on the 20-core of
the CiteULike and MovieLens datasets, as the convergence behaviour on the other
datasets and cores is the same. Figure 5.16 shows the evolution of AP@5 across
iterations on the training (𝐷𝑡𝑟𝑎𝑖𝑛) and test (𝐷𝑡𝑒𝑠𝑡) sets, where the score gradually
increases over the iterations and converges after a few iterations. These results
demonstrate that Go-Rank effectively optimizes GAP.
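Figure 5.16. The Go-Rank convergence: AP@5 (%) versus number of iterations on the training and
test sets of (a) CiteULike 20-core and (b) MovieLens 20-core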
5.3.6 Summary of Learning from Graded-Relevance Data
In this section, the GAP Optimization for Learning-to-Rank (Go-Rank) method is
proposed for learning from graded-relevance data on a tag-based item
recommendation model. Go-Rank uses the graded-relevance scheme for
interpreting the tagging data and directly optimizes the (smoothed) GAP for learning
the tensor model that generates an item recommendation list. To improve the
scalability of Go-Rank, a fast learning approach that applies sparsity-aware
optimization is implemented.
The experimental results on various real-world datasets have demonstrated
that:
• The graded-relevance scheme is an efficient scheme, as it leverages the tagging
data more effectively. It establishes that, on each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set, items other
than those appearing in observed entries should not all simply be regarded as
“irrelevant” entries, as they are in the set-based scheme;
• Go-Rank is scalable and outperforms all benchmarking methods on the
NDCG, AP, and MAP measures on most of the datasets. These results
confirm that implementing the graded-relevance scheme and optimizing
GAP for building the learning model improve the recommendation
performance. The portion of “likely relevant” entries that are found to be
“relevant” consequently helps Go-Rank to produce high-quality recommendations.
5.4 CHAPTER SUMMARY
This chapter has detailed the two proposed list-wise based ranking recommendation
methods, namely DCG Optimization for Learning-to-Rank (Do-Rank) and GAP
Optimization for Learning-to-Rank (Go-Rank), for solving the tag-based item
recommendation task.
Do-Rank is developed for learning from multi-graded data using the DCG
ranking evaluation measure as an optimization criterion. As demonstrated in the
results, Do-Rank outperforms all benchmarking methods on the NDCG, AP, and
MAP measures on most datasets. The proposed UTS scheme, implemented for
Do-Rank, efficiently interprets the tagging data and improves Do-Rank scalability.
Meanwhile, Go-Rank is developed for learning from graded-relevance data using the
GAP ranking evaluation measure as an optimization criterion. The proposed
graded-relevance scheme, which encourages tensor density, is implemented to populate
the tensor entries for efficient and fast learning. The experimental results have
demonstrated that the graded-relevance scheme efficiently interprets the tagging data
and that Go-Rank is scalable and outperforms all benchmarking methods on the NDCG,
AP, and MAP measures on most of the datasets.
Chapter 6: Performance Comparisons and
Analysis
In Chapter 4, two point-wise based ranking recommendation methods, Tensor-based
Item Recommendation using Probabilistic Ranking (TRPR) and Recommendation
Ranking using Weighted Tensor (We-Rank), were discussed. In Chapter 5, two
list-wise based ranking recommendation methods, DCG Optimization for
Learning-to-Rank (Do-Rank) and GAP Optimization for Learning-to-Rank (Go-Rank),
were described. However, all the proposed methods and the benchmarking methods
have not yet been compared with one another. Characteristics such as the strengths
and shortcomings of the methods, which method achieves the best performance, and
the impact of an interpretation scheme are still unknown. Note that NDCG, AP, and
MAP are used as the evaluation measures, as described in Section 3.4, as they are
more widely used for measuring ranking performance than F1-Score.
In this chapter, the results from all four methods are compared and analysed to
determine the situations in which each method is best used. This chapter aims:
• To analyse the impact of the interpretation scheme on the tensor entries population;
• To analyse the impact of users’ tagging behaviour on the tensor entries
population;
• To analyse the impact of “relevant” entries on the methods’ performances;
• To analyse the impact of handling “likely relevant” entries on the methods’
performances;
• To analyse the impact of 𝑝-core on the tensor entries population and the
performance of the recommendation methods;
• To compare and analyse the performance of the two learning-to-rank approaches:
point-wise and list-wise based ranking methods;
• To compare the performance of the proposed and benchmarking methods;
• To compare and analyse the performance, including accuracy, computation
complexity, scalability and efficiency, of the proposed methods;
• To discuss the strengths and shortcomings of the proposed methods.
Note that TRPR-CP is chosen to represent the performance of TRPR, as
implementing the CP technique results in the best performance in comparison to
implementing the other factorization techniques, as observed from the results in
Table 6.1, Table 6.2, Table 6.3, and Table 6.4.
Methods | 10-core (Score in %): NDCG@5 NDCG@10 AP@5 AP@10 MAP | 15-core (Score in %): NDCG@5 NDCG@10 AP@5 AP@10 MAP | 20-core (Score in %): NDCG@5 NDCG@10 AP@5 AP@10 MAP
MAX | 1.90 1.70 4.22 4.70 2.38 | 2.00 1.77 4.14 4.63 2.86 | 2.10 1.85 4.73 5.34 3.75
PITF | 2.30 1.99 5.33 5.88 2.60 | 2.12 1.93 4.77 5.52 3.16 | 2.52 2.18 5.81 6.48 4.29
CTS | 1.85 1.66 4.02 4.59 2.51 | 2.03 1.84 4.47 4.99 3.11 | 2.28 2.05 5.20 5.83 4.07
TRPR-CP | 2.38 2.09 5.37 5.94 3.13 | 2.77 2.43 6.32 7.05 4.05 | 2.78 2.48 6.47 7.22 4.81
TRPR-HOSVD | 1.99 1.76 4.48 4.71 3.07 | 2.16 2.11 4.72 5.77 3.63 | 2.12 1.92 4.77 5.63 4.78
TRPR-HOOI | 2.01 2.11 4.27 5.49 3.55 | 2.39 2.16 5.70 6.59 3.77 | 2.12 1.92 4.77 5.63 4.78
We-Rank | 1.46 1.42 1.86 2.06 2.14 | 2.00 1.76 3.92 4.36 2.40 | 2.09 1.79 4.44 4.83 3.29
Do-Rank | 2.31 1.98 5.22 5.72 2.78 | 2.36 2.09 5.25 5.93 3.54 | 2.69 2.26 6.02 6.64 4.63
Go-Rank | 1.94 1.74 4.51 5.02 2.65 | 2.05 1.82 4.60 5.15 3.19 | 2.47 2.18 5.65 6.33 4.55
Table 6.1. The proposed and benchmarking methods performances on Delicious dataset
Methods | 10-core (Score in %): NDCG@5 NDCG@10 AP@5 AP@10 MAP | 15-core (Score in %): NDCG@5 NDCG@10 AP@5 AP@10 MAP | 20-core (Score in %): NDCG@5 NDCG@10 AP@5 AP@10 MAP
MAX | 6.68 5.92 12.38 12.48 5.33 | 7.69 6.63 13.83 13.84 6.31 | 8.26 7.22 14.54 14.65 7.11
PITF | 7.56 6.45 13.97 14.31 5.96 | 8.15 7.15 15.43 15.67 7.05 | 8.41 7.58 16.16 16.62 7.52
CTS | 4.87 4.17 12.36 12.47 3.87 | 7.78 6.94 14.27 14.57 6.55 | 8.93 7.83 16.56 16.93 7.59
TRPR-CP | 6.56 5.68 13.63 13.71 5.20 | 8.38 7.40 15.84 16.05 7.31 | 9.84 8.57 18.16 18.44 8.86
TRPR-HOSVD | 6.66 5.99 13.11 13.27 5.35 | 7.76 6.67 14.21 14.54 6.53 | 8.85 7.86 16.89 17.49 7.57
TRPR-HOOI | 6.52 5.69 13.11 13.27 5.35 | 7.85 6.99 14.90 14.28 6.39 | 8.85 7.86 16.89 17.49 7.57
We-Rank | 4.09 4.06 8.62 9.24 3.63 | 6.74 6.56 9.86 10.71 6.20 | 8.41 7.56 16.15 16.59 7.51
Do-Rank | 8.15 7.05 14.12 14.45 6.50 | 8.55 7.56 15.51 15.68 7.09 | 9.40 8.29 17.13 17.52 7.61
Go-Rank | 7.39 6.35 14.22 14.15 5.65 | 8.60 7.59 16.38 16.68 7.27 | 10.28 8.91 19.33 19.34 8.93
Table 6.2. The proposed and benchmarking methods performances on LastFM dataset
Methods | 10-core (Score in %): NDCG@5 NDCG@10 AP@5 AP@10 MAP | 15-core (Score in %): NDCG@5 NDCG@10 AP@5 AP@10 MAP | 20-core (Score in %): NDCG@5 NDCG@10 AP@5 AP@10 MAP
MAX | 3.18 2.73 6.09 6.45 4.58 | 3.93 3.26 8.14 8.5 6.88 | 3.97 3.48 8.31 9.24 8.82
PITF | 3.18 2.68 5.57 5.05 4.28 | 3.94 3.58 8.51 9.2 7.18 | 4.65 3.74 8.54 9.91 9.78
CTS | 2.20 1.87 4.94 5.34 4.31 | 3.87 3.2 8.44 8.97 7.91 | 5.16 4.32 10.23 11.05 11.33
TRPR-CP | 2.60 2.28 5.81 6.38 4.83 | 4.41 3.65 9.69 10.19 8.89 | 5.97 5.20 12.34 13.30 13.25
TRPR-HOSVD | 3.19 2.72 6.08 6.46 4.83 | 3.95 3.27 8.87 8.73 8.89 | 5.56 4.14 11.12 12.83 14.67
TRPR-HOOI | 3.15 2.18 5.97 5.89 4.77 | 3.95 3.04 8.89 7.74 8.51 | 5.56 4.16 11.12 12.83 14.67
We-Rank | 1.91 1.82 3.96 4.48 4.06 | 3.13 3.07 5.56 6.32 6.92 | 4.61 3.74 8.52 9.89 9.75
Do-Rank | 3.08 2.77 5.42 5.95 4.47 | 4.95 4.18 10.28 10.90 9.31 | 5.17 4.49 10.62 11.64 11.43
Go-Rank | 3.11 2.77 5.77 6.20 4.33 | 5.04 4.24 10.16 10.77 9.19 | 6.09 4.48 10.66 11.61 11.52
Table 6.3. The proposed and benchmarking methods performances on CiteULike dataset
Methods | 10-core (Score in %): NDCG@5 NDCG@10 AP@5 AP@10 MAP | 15-core (Score in %): NDCG@5 NDCG@10 AP@5 AP@10 MAP | 20-core (Score in %): NDCG@5 NDCG@10 AP@5 AP@10 MAP
MAX | 6.23 5.10 10.50 10.56 5.36 | 7.27 6.05 13.84 14.22 7.36 | 8.40 7.21 16.46 16.82 10.4
PITF | 4.14 3.82 8.34 9.21 4.36 | 4.33 4.23 9.4 9.98 5.98 | 6.28 5.65 12.26 12.9 8.73
CTS | 5.97 5.10 10.04 10.38 5.68 | 6.24 5.22 12.08 12.58 7.08 | 8.17 7.10 15.82 16.07 10.3
TRPR-CP | 7.26 5.96 13.56 13.92 7.04 | 8.21 6.75 16.24 16.63 9.53 | 11.15 9.11 21.22 21.71 13.79
TRPR-HOSVD | 6.11 5.11 10.65 11.07 5.30 | 7.27 6.08 13.90 14.71 7.55 | 10.21 8.53 20.27 21.50 12.93
TRPR-HOOI | 5.96 5.01 10.25 11.12 5.39 | 7.78 5.96 14.10 14.74 7.85 | 8.51 7.36 17.07 18.38 12.14
We-Rank | 4.12 3.51 8.01 9.33 4.29 | 5.68 5.22 12.17 13.13 7.85 | 8.24 7.13 15.97 16.36 11.42
Do-Rank | 6.28 5.39 11.00 11.47 6.22 | 8.61 7.13 16.49 16.96 9.81 | 11.34 9.21 21.06 21.68 14.28
Go-Rank | 6.24 5.11 10.52 10.93 6.18 | 8.61 7.01 16.24 17.02 9.73 | 11.34 9.34 21.32 21.73 14.46
Table 6.4. The proposed and benchmarking methods performances on MovieLens dataset
6.1 IMPACT OF INTERPRETATION SCHEME TO TENSOR ENTRIES
POPULATIONS
The tensor entries populations generated from 𝐷𝑡𝑟𝑎𝑖𝑛 using the boolean, UTS, and
graded-relevance schemes, listed in Table 5.5 and Table 5.6, are combined in Table 6.5
in order to study the impact of the interpretation scheme on the tensor entries.
Dataset | Interpretation Scheme | Distinct Entry | Tensor Population (%): 10-core 15-core 20-core
Delicious | boolean | Relevant | 0.0006 0.0014 0.0027
Delicious | boolean | Irrel./Indecisive | 99.9994 99.9986 99.9973
Delicious | UTS | Relevant | 0.0006 0.0014 0.0027
Delicious | UTS | Irrelevant | 0.4285 0.5499 0.6477
Delicious | UTS | Indecisive | 99.5709 99.4487 99.3496
Delicious | graded-relevance | Relevant | 0.0006 0.0014 0.0027
Delicious | graded-relevance | Likely Relevant | 0.0092 0.0171 0.0260
Delicious | graded-relevance | Irrelevant | 0.4285 0.5499 0.6477
Delicious | graded-relevance | Indecisive | 99.5617 99.4316 99.3236
LastFM | boolean | Relevant | 0.0042 0.0092 0.0163
LastFM | boolean | Irrel./Indecisive | 99.9958 99.9908 99.9837
LastFM | UTS | Relevant | 0.0042 0.0092 0.0163
LastFM | UTS | Irrelevant | 1.2538 1.6443 2.0242
LastFM | UTS | Indecisive | 98.7420 98.3465 97.9595
LastFM | graded-relevance | Relevant | 0.0042 0.0092 0.0163
LastFM | graded-relevance | Likely Relevant | 0.0399 0.0790 0.1311
LastFM | graded-relevance | Irrelevant | 1.2538 1.6443 2.0242
LastFM | graded-relevance | Indecisive | 98.7021 98.2675 97.8284
CiteULike | boolean | Relevant | 0.0010 0.0037 0.0095
CiteULike | boolean | Irrel./Indecisive | 99.9990 99.9963 99.9905
CiteULike | UTS | Relevant | 0.0010 0.0037 0.0095
CiteULike | UTS | Irrelevant | 0.3039 0.4450 0.5472
CiteULike | UTS | Indecisive | 99.6951 99.5513 99.4433
CiteULike | graded-relevance | Relevant | 0.0010 0.0037 0.0095
CiteULike | graded-relevance | Likely Relevant | 0.0137 0.0327 0.0523
CiteULike | graded-relevance | Irrelevant | 0.3039 0.4450 0.5472
CiteULike | graded-relevance | Indecisive | 99.6814 99.5186 99.3910
MovieLens | boolean | Relevant | 0.0062 0.0283 0.0284
MovieLens | boolean | Irrel./Indecisive | 99.9938 99.9717 99.9716
MovieLens | UTS | Relevant | 0.0062 0.0283 0.0284
MovieLens | UTS | Irrelevant | 1.4651 1.8476 2.0418
MovieLens | UTS | Indecisive | 98.5287 98.1241 97.9298
MovieLens | graded-relevance | Relevant | 0.0062 0.0283 0.0284
MovieLens | graded-relevance | Likely Relevant | 0.1463 0.2493 0.3729
MovieLens | graded-relevance | Irrelevant | 1.4651 1.8476 2.0418
MovieLens | graded-relevance | Indecisive | 98.3824 97.8748 97.5569
Table 6.5. The comparison of tensor entries population distribution generated from 𝐷𝑡𝑟𝑎𝑖𝑛 using
boolean, UTS, and graded-relevance schemes
Table 6.5 shows that the boolean, UTS, and graded-relevance schemes generate
two, three, and four varieties of distinct entries, respectively. Within each dataset and
core, the “relevant” entries population is the same for all schemes, while the
“indecisive” entries population is not. Meanwhile, the UTS and graded-relevance
schemes generate the same number of “irrelevant” entries, because the additional
distinct entries of the graded-relevance scheme, i.e. “likely relevant”, are revealed
from the “indecisive” entries.
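This relationship can be checked arithmetically on one row group of Table 6.5. The short sketch below uses the Delicious 10-core percentages copied from the table; the variable names are illustrative only.

```python
# Delicious 10-core percentages copied from Table 6.5.
relevant = 0.0006
irrelevant = 0.4285
likely_relevant = 0.0092

# boolean: everything that is not "relevant" is "irrelevant/indecisive".
indecisive_boolean = 100 - relevant
# UTS: the "irrelevant" entries are separated from the "indecisive" ones.
indecisive_uts = 100 - relevant - irrelevant
# graded-relevance: "likely relevant" entries are taken out of "indecisive".
indecisive_graded = indecisive_uts - likely_relevant

print(round(indecisive_boolean, 4))  # 99.9994
print(round(indecisive_uts, 4))      # 99.5709
print(round(indecisive_graded, 4))   # 99.5617
```

The three printed values agree with the “Irrel./Indecisive” and “Indecisive” rows of the Delicious 10-core column, which shows that only the “indecisive” population shrinks when a scheme introduces an additional distinct entry.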
6.2 IMPACT OF 𝒑-CORE TO TENSOR ENTRIES POPULATIONS AND
METHOD PERFORMANCES
Each dataset used in this thesis has been refined using various 𝑝-core sizes, i.e. 10,
15, and 20-cores. Figure 6.1 depicts the relationship between the 𝑝-core size and the
entries population distribution within each scheme. The comparison is shown as the
average of the entries populations of all datasets on each 𝑝-core size. From Figure 6.1,
it can be observed that the “relevant”, “likely relevant”, and “irrelevant” entries
populations increase with the size of the 𝑝-core. In contrast, the “indecisive”
entries populations decrease on larger 𝑝-core sizes. Therefore, in general, as
seen in Chapters 4 and 5, the performance of methods that implement the UTS and
graded-relevance schemes is improved on datasets with a larger 𝑝-core size, in
comparison to methods that implement the boolean scheme.
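For reference, a 𝑝-core refinement can be sketched as follows. The sketch assumes the commonly used definition in which every user, item, and tag must occur in at least 𝑝 posts, a post being one user-item pair; this definition, the function name `p_core`, and the toy data are assumptions for illustration and are not restated in this section.

```python
from collections import Counter

def p_core(triples, p):
    """Keep only (user, item, tag) triples whose user, item, and tag each
    occur in at least p posts; a post is assumed to be one (user, item) pair."""
    kept = set(triples)
    while True:
        posts = {(u, i) for u, i, _ in kept}
        user_posts = Counter(u for u, _ in posts)
        item_posts = Counter(i for _, i in posts)
        # Each distinct triple places its tag in exactly one post.
        tag_posts = Counter(t for _, _, t in kept)
        filtered = {
            (u, i, t)
            for u, i, t in kept
            if user_posts[u] >= p and item_posts[i] >= p and tag_posts[t] >= p
        }
        if filtered == kept:      # nothing was removed: fixed point reached
            return filtered
        kept = filtered           # removals may invalidate others, so repeat

# Hypothetical toy data: two users tag two items with the same tag.
toy = {(u, i, "t1") for u in ("u1", "u2") for i in ("i1", "i2")}
toy.add(("u3", "i1", "t9"))       # u3 and t9 occur in a single post only
core = p_core(toy, p=2)
```

The filtering is repeated until no triple is removed, because dropping one post can push another user, item, or tag below the threshold.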
Figure 6.1. Comparison of size of 𝑝-core over tensor entries population on boolean, UTS, and graded-
relevance schemes (average of entries population (%) per distinct entry for the 10-core, 15-core, and
20-core)
Figure 6.2, Figure 6.3, and Figure 6.4 show the comparison of 𝑝-core size to
method performances in terms of NDCG, AP, and MAP, respectively. It can be
observed that the performances of all proposed methods increase with the size of the
𝑝-core on any evaluation measure. The results indicate the robustness of the
proposed methods over the 𝑝-core refinement procedure. Note that Table 6.5 shows that
the “relevant” entries population also increases with the size of the 𝑝-core.
Figure 6.2. Comparison of 𝑝-core over methods performances using NDCG (NDCG@10 (%) per dataset)
Figure 6.3. Comparison of 𝑝-core over methods performances using AP (AP@10 (%) per dataset)
Figure 6.4. Comparison of 𝑝-core over methods performances using MAP (MAP (%) per dataset)
6.3 IMPACT OF USERS TAGGING BEHAVIOURS TO TENSOR
ENTRIES POPULATIONS
A Social Tagging System (STS) allows its users to use different tags for annotating
the same item, and to use the same tag for annotating different items. As a result,
users’ tagging behaviours are reflected differently in each STS, and can be
characterised by observing which observed set, user-item or user-tag, is dominant.
The user-item set is more dominant than the user-tag set when users prefer to use
fewer tags for annotating items. Conversely, the user-tag set is considered more
dominant than the user-item set when users prefer to use more tags for annotating
items. Figure 6.5 displays the statistics of the observed sets within each dataset used
in this thesis. The statistics show that users of the Delicious dataset have the same
tagging behaviour as those of CiteULike, i.e. the user-tag set is more dominant than
the user-item set. In contrast, the users of LastFM and MovieLens have a more
dominant user-item set than user-tag set.
Figure 6.5. The statistic of user-item and user-tag sets (number of sets per 𝑝-core) on: (a) Delicious,
(b) LastFM, (c) CiteULike, and (d) MovieLens datasets
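The notion of a dominant observed set can be illustrated with a short sketch. It assumes that dominance is decided by comparing the numbers of distinct user-item and user-tag pairs, which is how the statistics in Figure 6.5 are read here; the function names and toy triples are hypothetical.

```python
def observed_set_sizes(triples):
    """Count distinct user-item and user-tag pairs in (user, item, tag) triples."""
    user_item = {(u, i) for u, i, _ in triples}
    user_tag = {(u, t) for u, _, t in triples}
    return len(user_item), len(user_tag)

def dominant_set(triples):
    """Name the larger observed set (assumed meaning of 'dominant')."""
    n_user_item, n_user_tag = observed_set_sizes(triples)
    if n_user_item > n_user_tag:
        return "user-item"
    if n_user_tag > n_user_item:
        return "user-tag"
    return "neither"

# A user who reuses one tag across many items: user-item set dominates.
few_tags = [("u1", "i1", "rock"), ("u1", "i2", "rock"), ("u1", "i3", "rock")]
# A user who puts many tags on one item: user-tag set dominates.
many_tags = [("u1", "i1", "web"), ("u1", "i1", "css"), ("u1", "i1", "design")]

print(dominant_set(few_tags))   # user-item
print(dominant_set(many_tags))  # user-tag
```

The two toy users mirror the behaviours described above: using fewer tags makes the user-item set dominant, while using more tags makes the user-tag set dominant.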
Next, the “relevant” entries population is compared with the “irrelevant” and “likely
relevant” entries populations generated using the graded-relevance scheme, to study
the impact of user tagging behaviour on the entries population. In this case, the values
of the “relevant” entries populations of all datasets and cores are compiled and sorted
in ascending order. Note that the “irrelevant” entries of the UTS and graded-relevance
schemes are equal, as listed in Table 6.5.
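The compilation and sorting step can be reproduced from Table 6.5 as sketched below. The percentages are copied from the table, and the grouping of datasets by dominant set follows the discussion of Figure 6.5; the variable names are illustrative only.

```python
# (dataset, p-core) -> ("relevant" %, "irrelevant" %), copied from Table 6.5.
population = {
    ("Delicious", 10): (0.0006, 0.4285),
    ("Delicious", 15): (0.0014, 0.5499),
    ("Delicious", 20): (0.0027, 0.6477),
    ("LastFM", 10): (0.0042, 1.2538),
    ("LastFM", 15): (0.0092, 1.6443),
    ("LastFM", 20): (0.0163, 2.0242),
    ("CiteULike", 10): (0.0010, 0.3039),
    ("CiteULike", 15): (0.0037, 0.4450),
    ("CiteULike", 20): (0.0095, 0.5472),
    ("MovieLens", 10): (0.0062, 1.4651),
    ("MovieLens", 15): (0.0283, 1.8476),
    ("MovieLens", 20): (0.0284, 2.0418),
}
# Dominant observed set per dataset, as discussed for Figure 6.5.
dominant = {"Delicious": "user-tag", "CiteULike": "user-tag",
            "LastFM": "user-item", "MovieLens": "user-item"}

# Sort all dataset/core combinations by their "relevant" population.
ordered = sorted(population.items(), key=lambda entry: entry[1][0])
for (dataset, core), (relevant, irrelevant) in ordered:
    print(f"{dataset} {core}-core ({dominant[dataset]}): "
          f"relevant={relevant:.4f}, irrelevant={irrelevant:.4f}")
```

Listing the “irrelevant” population next to the sorted “relevant” population makes it possible to compare the two groups of datasets at similar “relevant” values.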
The results in Figure 6.6 and Figure 6.7 show that, for a given “relevant” entries
population, a dataset with a dominant user-item set generates larger “irrelevant” and
“likely relevant” entries populations, respectively, than a dataset with a dominant
user-tag set. This is because both the “irrelevant” and “likely relevant” entries
populations are generated from the tagging data, using the graded-relevance and/or
UTS schemes, based on each observed user-tag set. Consequently, a dataset with a
more dominant user-item set leaves a smaller “indecisive” entries population and
yields more “irrelevant” and “likely relevant” entries. Therefore, in general, methods
that implement the UTS and graded-relevance schemes perform better on datasets
with a dominant user-item set than on datasets with a dominant user-tag set.
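Figure 6.6. Comparison of “relevant” over “irrelevant” entries population (datasets and cores sorted
by “relevant” entries (%), grouped by dominant user-item and dominant user-tag sets)
Figure 6.7. Comparison of “relevant” over “likely relevant” entries population (datasets and cores
sorted by “relevant” entries (%), grouped by dominant user-item and dominant user-tag sets)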
6.4 IMPACT OF “RELEVANT” ENTRIES TO METHOD
PERFORMANCES
To study the impact of the “relevant” entries population on method performance, the
values of the “relevant” entries populations of all datasets and cores are compiled and
sorted in ascending order. Figure 6.8, Figure 6.9, and Figure 6.10 show the
comparisons in terms of NDCG, AP, and MAP, respectively. It can be observed that
the “relevant” entries population does not determine the method performance. That
is, a larger “relevant” entries population is not guaranteed to result in a higher
performance score. Note that the results on any other entries population show the
same behaviour. The results indicate that the methods do not depend solely on a
single distinct entry for learning the tensor model in order to generate the list of item
recommendations. Each method populates the tensor entries by employing an
interpretation scheme with various distinct entries, where each distinct entry (and its
population) of the same scheme is correlated to the others, as listed in Section 6.1.
TRPR and We-Rank implement the boolean scheme, which generates two distinct
entries, while Do-Rank and Go-Rank implement the UTS and graded-relevance
schemes, which generate three and four distinct entries, respectively. Therefore,
whether one method outperforms another cannot simply be determined from a single
distinct entry.
Figure 6.8. Comparison of “relevant” entries population over methods performances using NDCG
(NDCG@10 (%), datasets and cores sorted by “relevant” entries (%))
Figure 6.9. Comparison of “relevant” entries population over methods performances using AP
(AP@10 (%), datasets and cores sorted by “relevant” entries (%))
Figure 6.10. Comparison of “relevant” entries population over methods performances using MAP
(MAP (%), datasets and cores sorted by “relevant” entries (%))
6.5 IMPACT OF HANDLING “LIKELY RELEVANT” ENTRIES TO
METHOD PERFORMANCES
“Likely relevant” entries, generated from the graded-relevance scheme, are transitional
entries positioned between the “relevant” and “irrelevant” entries. Only a method that
takes this transition into account will benefit from this data scheme. For
example, the proposed method Go-Rank uses GAP as the optimized ranking
evaluation measure to allow the tensor model to set up thresholds, so that the “likely
relevant” entries can be regarded as either “relevant” or “irrelevant” entries. The
conjecture is that optimizing an inappropriate objective function (i.e. one that does not
leverage the data scheme) on a tensor model results in inferior recommendation
quality. To verify this conjecture, the same method is analysed with user profiles
represented by two different data schemes.
The method MAX (one of the benchmarking methods in Section 3.5) is used.
Firstly, the tensor model is built using the boolean scheme, and the resulting method
is called MAX-boolean. Secondly, the tensor model is built using the graded-relevance
scheme, and the resulting method is called MAX-graded. The objective function of
both MAX-boolean and MAX-graded is to minimize the Mean Square Error (MSE),
which is suitable for solving a classification problem (Ifada and Nayak, 2014c). The
performance comparison between these two methods is demonstrated on the AP
evaluation measure, as NDCG and MAP show the same trend. As shown in Figure 6.11,
MAX-graded results in poorer recommendation quality in comparison to MAX-boolean.
MAX-graded disregards the constraint that the “likely relevant” entries should be
further regarded as either “relevant” or “irrelevant” in order to effectively learn the
tensor model.
Figure 6.11. Comparison of MAX-boolean over MAX-graded performances (AP@10 versus 𝑝-core on the
Delicious, LastFM, CiteULike, and MovieLens datasets) showing the impact of inappropriately
handling the “likely relevant” entries
6.6 ACCURACY COMPARISONS OF THE PROPOSED METHODS
A comparative analysis of all proposed methods has been conducted to highlight
their strengths and shortcomings. This analysis is expected to lead towards a
mechanism for selecting a method according to the data characteristics.
• Delicious dataset: Figure 6.12 shows that TRPR always achieves the best
results amongst the four proposed ranking methods on any 𝑝-core size and
evaluation measure, followed by Do-Rank, Go-Rank, and We-Rank. TRPR
works best on the Delicious dataset for the following reasons:
o The Delicious dataset is categorised as a dataset with a dominant user-tag
set, as shown in Figure 6.5(a). This indicates that users tend to annotate
items with a generous number of tags;
o This particular characteristic of the dataset suits TRPR because the
calculation of tag probability, which significantly controls the
recommendation generation process, may favour this dataset;
o In TRPR, the tensor reconstruction procedure is better suited for
generating the list of recommendations than the factors based procedure,
since the probabilistic ranking procedure utilises candidate item and user
tag preference sets that are both revealed from the reconstructed tensor;
o With this type of dataset, the boolean scheme is sufficient to interpret
the tagging data that represent the user profile.
Figure 6.12. Comparison of methods performances on Delicious dataset (NDCG@10, AP@10, and MAP
versus 𝑝-core)
• LastFM dataset: Figure 6.13 shows that Do-Rank performs the best on the
10-core on any evaluation measure, in comparison to the other proposed
methods. Yet, as the 𝑝-core size increases, Do-Rank is outperformed by
Go-Rank and TRPR, in that order. As with the Delicious dataset, We-Rank
performs the worst on this dataset. The results indicate that, to achieve the
best performance on the LastFM dataset, Do-Rank is the best option on a
small 𝑝-core size and Go-Rank is the best option on larger 𝑝-core sizes. This
is due to the following reasons:
o The LastFM dataset is categorised as a dataset with a dominant user-item
set, as shown in Figure 6.5(b). This shows that users tend to annotate
items with only a few tags;
o A dataset with this characteristic suits either Do-Rank or Go-Rank
because the learning algorithm, which formulates the ranking based on a
list of associated items, may favour this dataset;
o From Figure 6.5(b), it can be observed that the dominance of the
user-item set over the user-tag set depends on the size of the 𝑝-core,
i.e. the dominance significantly decreases on larger 𝑝-core sizes. This
indicates that users on larger 𝑝-core sizes tend to use more tags for
annotating items than those on smaller sizes;
o Do-Rank performs better on smaller 𝑝-core sizes since its learning
algorithm formulates the ranking based on a list of associated items out of
all tags, most of which have not been used by the user. When more
tags are used by the user, i.e. on larger 𝑝-core sizes, the user preference of
tags matters. For this reason, Go-Rank performs better, since its learning
algorithm formulates the ranking based on a list of associated items out of
only the tags that have been used by the user;
o A ranking-based scheme, i.e. the UTS or graded-relevance scheme, is the
suitable tagging data interpretation scheme for constructing the user
profile for this type of dataset;
o In Do-Rank and Go-Rank, the factors based procedure is better suited for
generating a list of recommendations than the tensor reconstruction
based procedure, since the ranking-based interpretation scheme and
list-wise based ranking approach are implemented.
Figure 6.13. Comparison of methods performances on LastFM dataset (NDCG@10, AP@10, and MAP
versus 𝑝-core)
• CiteULike dataset: Figure 6.14 shows that TRPR outperforms the other
proposed methods on most 𝑝-core sizes and evaluation measures. Similar to
the Delicious and LastFM datasets, We-Rank performs the worst on this dataset.
The same reasons that make TRPR perform best on the Delicious dataset
apply here, as the CiteULike dataset is also categorised as a dataset with a
dominant user-tag set, as shown in Figure 6.5(c). However, TRPR does not
outperform the other methods on this dataset as consistently as on the
Delicious dataset, because the dominance of the user-tag set changes
significantly with the size of the 𝑝-core, i.e. the dominance decreases on
larger 𝑝-core sizes. In this case, on smaller 𝑝-core sizes, users tend to use too
many tags, which makes the tag preference set too general and affects method
performance. On larger 𝑝-core sizes, users tend to use a moderate number of
tags, which may more truly indicate the users’ preferences.
Figure 6.14. Comparison of methods performances on CiteULike dataset (NDCG@10, AP@10, and MAP
versus 𝑝-core)
• MovieLens dataset: Figure 6.15 shows that TRPR performs the best on the
10-core as compared to the other proposed methods. Yet, beyond this 𝑝-core
size, it is outperformed by Go-Rank and Do-Rank. Similar to the results on
the other datasets, We-Rank yields the worst performance as compared to the
other methods. The outperformance of Go-Rank or Do-Rank on this dataset is
caused by the same reasons as those for the LastFM dataset, since the
MovieLens dataset is also categorised as a dataset with a dominant user-item
set, as shown in Figure 6.5(d). However, TRPR works best on a small 𝑝-core
size, possibly because users of this dataset are likely to choose only certain
types of items, i.e. movies, and to use tags that are related to genres or actors’
names (Gemmell et al., 2011). In other words, on a small 𝑝-core size, the tags
used by the users for annotating items may strongly reveal the user tag
preference that is most beneficial in TRPR. On larger 𝑝-core sizes, in contrast,
users tend to add more and varied tags for annotating items, which makes it
more difficult to reveal the user preference.
Figure 6.15. Comparison of methods performances on MovieLens dataset
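The 𝑝-core refinement that underlies the comparisons above can be illustrated with a minimal sketch. It assumes that a post is a distinct (user, item) pair and that a 𝑝-core keeps only the users, items, and tags that each occur in at least 𝑝 posts; the function name `p_core` and the triple representation are hypothetical and are not taken from the thesis implementation.

```python
from collections import defaultdict

def p_core(triples, p):
    """Iteratively drop (user, item, tag) triples until every remaining
    user, item, and tag occurs in at least p posts.
    Assumption: a post is a distinct (user, item) pair."""
    data = set(triples)
    while True:
        user_posts = defaultdict(set)
        item_posts = defaultdict(set)
        tag_posts = defaultdict(set)
        for u, i, t in data:
            post = (u, i)
            user_posts[u].add(post)
            item_posts[i].add(post)
            tag_posts[t].add(post)
        kept = {
            (u, i, t) for u, i, t in data
            if len(user_posts[u]) >= p
            and len(item_posts[i]) >= p
            and len(tag_posts[t]) >= p
        }
        if kept == data:      # fixed point: nothing more to remove
            return kept
        data = kept
```

Because removing one sparse user can push an item or tag below the threshold, the filter is repeated until a fixed point is reached, which is why larger 𝑝 values yield smaller and denser datasets.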
6.7 POINT-WISE BASED RANKING METHODS VERSUS LIST-WISE
BASED RANKING METHODS
Point-wise based ranking methods include TRPR and We-Rank, while list-wise based
ranking methods include Do-Rank and Go-Rank. The methods are compared per
evaluation measure, using the average performance of each method over all
datasets, as shown in Figure 6.16.
[Figure 6.15 panels: NDCG@10, AP@10, and MAP (y-axes) against 𝑝-core size 10, 15, 20 (x-axis) for TRPR-CP, We-Rank, Do-Rank, and Go-Rank on the MovieLens dataset]
Figure 6.16. Comparison of proposed methods performances as the average over all datasets using
NDCG, AP, and MAP
NDCG:
o Do-Rank, the list-wise based ranking method, achieves the best
performance compared to the other proposed methods in terms of NDCG.
Do-Rank implements the UTS scheme as the tagging data interpretation
scheme, which results in a multi-graded data representation, and DCG as
the optimization criterion of the learning model; this combination suits
this evaluation measure. Recall that DCG is the widely used ranking
evaluation measure in the case of multi-graded relevance data (Chapelle
and Wu, 2010; Liu, 2009; Weimer et al., 2007), where NDCG is the
normalization of DCG by its ideal ranking;
o Despite the fact that TRPR is a point-wise based ranking method that
builds its learning model from binary data, it achieves performance equal
to that of Go-Rank – a list-wise based ranking method that builds
its model from graded-relevance data. The reason for this lies in the
probabilistic approach applied in TRPR to re-rank the candidate items
generated from the full reconstructed tensor by utilising the user tag
preference. This introduces ranking interdependency amongst the
candidate items in the list, which may favour NDCG;
o We-Rank, a point-wise based ranking method, performs the worst due
to the following reasons:
We-Rank implements the point-wise based ranking approach for
learning-to-rank. This means that there is no interdependency
between the predicted items and other items or between the tag used
to annotate the predicted items and other tags, within the learning
model;
[Figure 6.16 panels, average over all datasets, bars in the order TRPR-CP, We-Rank, Do-Rank, Go-Rank: NDCG@10 = 5.13, 3.97, 5.20, 5.13; AP@10 = 12.55, 8.94, 12.05, 12.08; MAP = 7.56, 5.79, 7.31, 7.30]
We-Rank performance is sensitive to how well the user’s tag usage
likeliness is captured, as this is used to reward or penalise each
tensor model entry during the learning process;
We-Rank directly uses the learned latent factors for generating the
list of item recommendations for each target user, while the
weighting approach within the learning process cannot sufficiently
capture the user’s tag usage likeliness. Implementing the full tensor
reconstruction and probabilistic ranking procedure may improve the
performance of We-Rank.
AP:
o TRPR on average achieves the best performance compared to the other
proposed methods in terms of AP. TRPR implements the boolean scheme
as the tagging data interpretation scheme, which results in a binary data
representation, and the probabilistic approach to re-rank the candidate
items generated from the full reconstructed tensor; this combination
suits this evaluation measure. Recall that
AP is a commonly used ranking evaluation measure in the case of binary
relevance data (Chapelle and Wu, 2010; Liu, 2009; Shi, Karatzoglou,
Baltrunas, Larson, Hanjalic, et al., 2012; Shi, Karatzoglou, Baltrunas,
Larson, Oliver, et al., 2012);
o Go-Rank on average achieves better results than Do-Rank because it uses
GAP as the optimization criterion for learning its model, which is built
from graded-relevance data resulting from the implementation of the
graded-relevance scheme. Recall that GAP is the generalisation of AP that works
as an alternative ranking evaluation measure for multi-graded relevance
data (Ferrante et al., 2014; Robertson et al., 2010). This may make AP
favour Go-Rank over Do-Rank;
o We-Rank performs the worst due to the same reasons as those of NDCG.
MAP:
o TRPR on average achieves the best performance in comparison to the other
proposed methods in terms of MAP. This corresponds to the results for AP,
as MAP is the average of AP over all users.
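The three evaluation measures discussed above can be sketched as follows. This is a minimal illustration rather than the evaluation code used in the experiments: it assumes the common exponential gain and logarithmic discount for DCG, non-negative relevance grades, and an AP that is normalised by the number of relevant items found in the top-𝑘; the function names are hypothetical.

```python
import math

def dcg_at_k(rels, k):
    """DCG of a ranked list of graded relevance values (top-biased:
    the gain at rank r is discounted by log2(r + 1))."""
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(rels[:k], start=1))

def ndcg_at_k(rels, k):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

def ap_at_k(rels, k):
    """Average precision over the top-k of a binary relevance list.
    Assumption: normalised by the number of relevant items in the top-k."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(rels[:k], start=1):
        if rel > 0:
            hits += 1
            total += hits / rank   # precision at this relevant rank
    return total / hits if hits else 0.0

def mean_ap(rels_per_user, k):
    """MAP: the mean of AP over all users."""
    return sum(ap_at_k(r, k) for r in rels_per_user) / len(rels_per_user)
```

NDCG rewards placing higher grades earlier, which is consistent with it favouring models learned from multi-graded data, whereas AP only distinguishes relevant from non-relevant positions.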
In general, the list-wise based ranking methods – which include Do-Rank and Go-
Rank – can achieve better performance in terms of NDCG in comparison to the
point-wise based ranking methods – which include TRPR and We-Rank. The
characteristic of NDCG as an evaluation measure favours the methods that build the
learning model from multi-graded data. On the other hand, the point-wise based
ranking method TRPR achieves better performance in terms of AP and MAP as
compared to the list-wise based ranking methods. The characteristics of AP and
MAP as evaluation measures favour the methods that build the learning model from
binary data.
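Since GAP is described above as the generalisation of AP to graded relevance, a minimal sketch may help to show how the two relate. It follows the definition of Robertson et al. (2010) as understood here, and it assumes that every relevant item appears in the ranked list, so that the normalising term can be computed from the list itself. The function name `gap` and the threshold probabilities `g` are illustrative choices.

```python
def gap(grades, g):
    """Graded Average Precision of a ranked list.
    grades: relevance grade of the item at each rank (0 = not relevant).
    g: g[j-1] is the assumed probability that a user's relevance
       threshold sits at grade j; the values should sum to 1."""
    # cum[i] = probability that an item of grade i counts as relevant
    cum = [0.0]
    for prob in g:
        cum.append(cum[-1] + prob)
    numerator = 0.0
    for n, grade_n in enumerate(grades, start=1):
        if grade_n == 0:
            continue
        # expected number of items at ranks <= n that are relevant
        # under the same threshold as the item at rank n
        expected_hits = sum(cum[min(grade_m, grade_n)]
                            for grade_m in grades[:n])
        numerator += expected_hits / n
    # expected number of relevant items (assumed to all be in the list)
    denominator = sum(cum[grade] for grade in grades)
    return numerator / denominator if denominator > 0 else 0.0
```

With a single relevance grade and `g = [1.0]`, the expression reduces to AP, which is consistent with AP-based evaluation favouring Go-Rank over Do-Rank.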
6.8 PROPOSED METHODS VERSUS BENCHMARKING METHODS
MAX: TRPR, Do-Rank and Go-Rank outperform MAX in most evaluation measures
and datasets. MAX uses the boolean scheme for constructing the user profiles and
implements a point-wise based ranking approach for learning the tensor
recommendation model. MAX directly utilises the reconstructed tensor to generate
the list of recommendations. Consequently, though it learns the latent factors of the
ternary dimensions, it may fail to rank the list of recommendations in the required order.
However, We-Rank underperforms MAX on some of the datasets. The likely reasons
are as follows:
We-Rank is a point-wise based ranking approach that uses the learned latent
factors for generating the list of item recommendations, instead of using the
full tensor reconstruction procedure;
For learning the factors model, We-Rank applies weight values,
calculated from the user’s tag usage likeliness, to either reward or penalise
each tensor entry. Therefore, its performance highly depends on how well the
user’s tag usage likeliness is captured. We-Rank outperforms MAX on the
20-core of the LastFM and CiteULike datasets, and on the 15 and 20-cores of
the MovieLens dataset, since the user’s tag usage likeliness can be
sufficiently captured on those datasets.
Pairwise Interaction Tensor Factorization (PITF) Method: PITF uses the set-
based scheme for constructing the user profiles and implements a pair-wise based
ranking approach for learning the tensor recommendation model. Experiment results
show that PITF achieves inferior performance in comparison to TRPR, Do-Rank and
Go-Rank on most evaluation measures and datasets. The reasons for this are as
follows:
Despite the fact that TRPR uses the boolean scheme to generate the tensor
entries, and implements point-wise based ranking approach for learning the
tensor recommendation model, it applies a subsequent stage, i.e. the
probabilistic ranking approach, for generating the list of recommendations.
The second stage of the method probabilistically re-ranks the list of candidate
items for each user, generated from the learned latent factors, as a list-wise
ranking model in order to get the correct order of recommendations;
Do-Rank enhances recommendation performance by optimizing the top-
biased measure DCG, while PITF implements the AUC-based optimization
approach, which gives equal penalty to mistakes at the top and bottom of the
list of recommendations. Moreover, as a pair-wise ranking model, PITF aims
to get the ranking order within each pair correct, while Do-Rank employs
the list-wise ranking model, which aims to get the order of each entire list
correct;
In line with the Do-Rank result, Go-Rank attains great improvement
over PITF as: (1) it enhances the recommendation performance by optimizing
the top-biased measure GAP, i.e. the generalisation of AP for ordinal
relevance data; and (2) it builds the learning model as a list-wise ranking
model and aims to get the order of each entire list correct.
In general, the same reasons that cause We-Rank to underperform MAX also apply
to its comparison with PITF. Note that We-Rank achieves better performance than
PITF on the 15 and 20-cores of the MovieLens dataset.
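The difference between an AUC-based criterion and a top-biased measure such as DCG, noted above, can be illustrated with a small sketch. It does not reproduce the PITF or Do-Rank optimization procedures; it only compares the two measures on two hypothetical binary rankings that each contain one extra misordered pair, once at the top and once at the bottom of the list.

```python
import math

def auc(rels):
    """Fraction of (relevant, irrelevant) pairs ranked in the right order."""
    pos = [r for r, rel in enumerate(rels) if rel > 0]
    neg = [r for r, rel in enumerate(rels) if rel == 0]
    correct = sum(1 for p in pos for n in neg if p < n)
    return correct / (len(pos) * len(neg))

def dcg(rels):
    """DCG with a log2(rank + 1) discount, so early ranks weigh more."""
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(rels, start=1))

# Both lists are derived from [1, 0, 1, 0, 1, 0] by one adjacent swap.
top_mistake = [0, 1, 1, 0, 1, 0]      # ranks 1 and 2 swapped
bottom_mistake = [1, 0, 1, 0, 0, 1]   # ranks 5 and 6 swapped
```

Both rankings have the same AUC because each adds exactly one misordered pair, whereas DCG is lower for `top_mistake`, where the error occurs in the positions that users are most likely to inspect.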
Candidate Tag Set (CTS): CTS is a matrix-based method that implements the
boolean scheme and ranks the recommendations by using the user’s past tagging
activities in forming the user’s likelihood. Except for We-Rank, the proposed methods
perform better than CTS since CTS projects the ternary relations of tagging data into a
two-dimensional model, impacting the recommendation quality. As with PITF,
We-Rank only achieves better performance than CTS on the 15 and 20-cores of the
MovieLens dataset. The reasons for We-Rank underperformance are the same as in
the cases of MAX and PITF.
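The information loss caused by projecting ternary tagging data into two dimensions can be shown with a toy example. The sketch below is not the CTS method; the choice of the user-item and user-tag projections is an assumption made for illustration, and the helper `project` is hypothetical.

```python
def project(triples, keep):
    """Project (user, item, tag) triples onto two of the three dimensions.
    keep: a pair of positions, e.g. (0, 1) for user-item."""
    return {(t[keep[0]], t[keep[1]]) for t in triples}

# Two different tagging histories for the same user.
a = {("u1", "i1", "t1"), ("u1", "i2", "t2")}
b = {("u1", "i1", "t2"), ("u1", "i2", "t1")}

# Their user-item and user-tag projections are indistinguishable.
same_user_item = project(a, (0, 1)) == project(b, (0, 1))
same_user_tag = project(a, (0, 2)) == project(b, (0, 2))
```

Both flags evaluate to true although `a` and `b` differ, which shows that the pairing of a tag with a specific item is lost once the user's data are projected; a tensor model keeps this ternary relation.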
6.9 EFFICIENCY VERSUS METHOD PERFORMANCES
Do-Rank and Go-Rank apply fast learning approaches to address the optimization
computation problem. As shown in Figure 5.5, the learning time of Do-Rank grows
linearly with the size of the data, expressed by the number of items 𝑅, when fast
learning is implemented, in comparison to 𝑅² without fast learning. Similarly,
Figure 5.15 shows that Go-Rank with fast learning is a few orders of magnitude faster
than Go-Rank without fast learning. This section analyses whether there is any
trade-off between efficiency and accuracy. The comparative AP@10 performance of
fast learning and original learning is shown on the 10-core of each dataset; the results
of the other 𝑝-core sizes show similar behaviour. Figure 6.17 shows that, for both
Do-Rank and Go-Rank, the implementation of a fast learning approach does not
compromise accuracy; instead, for some datasets an improved performance is shown.
This confirms that fast learning not only reduces the learning time but also
improves or maintains the accuracy.
Figure 6.17. The comparison of efficiency over method performances
[Figure 6.17 panels: AP@10 on the 10-core of the Delicious, LastFM, CiteULike, and MovieLens datasets (in that order), Fast Learning versus Original Learning. (a) Do-Rank: Fast Learning 5.72, 14.45, 5.95, 11.47; Original Learning 5.67, 12.28, 5.95, 11.39. (b) Go-Rank: Fast Learning 5.02, 14.15, 6.20, 10.93; Original Learning 4.95, 14.08, 6.17, 10.80]
6.10 COMPUTATION COMPLEXITY
The complexity of all proposed methods is compared and listed in Table 6.6. It can
be observed that the learning of point-wise based ranking methods is less complex
than that of the list-wise based ranking methods, except when TRPR is implemented
with a Tucker-based technique, i.e. HOSVD or HOOI. On the other hand, the
recommendation generation of point-wise based ranking method TRPR is more
complex than that of list-wise based ranking methods and the point-wise based
ranking method, We-Rank. We-Rank, Do-Rank, and Go-Rank directly use the learned
latent factors for generating a list of item recommendations for each target user.
Learning approach | Method | Learning complexity | Recommendation generation complexity
Point-wise based ranking | TRPR (Tucker-based technique) | $O(F^3(Q + S + R))$ | $O(F^3 Q R \bigcup_{j=1}^{l} \{s_j\}) + O(nv)$
Point-wise based ranking | TRPR (CP-based technique) | $O(F(Q + S + R))$ | $O(F Q R \bigcup_{j=1}^{l} \{s_j\}) + O(nv)$
Point-wise based ranking | We-Rank | $O(F(Q + S + R))$ | $O(F)$
List-wise based ranking | Do-Rank | $O(F(Q + S + QSR))$ | $O(F)$
List-wise based ranking | Go-Rank | $O(F(Q + S + Q \tilde{V} c \tilde{Z}^2))$ | $O(F)$
Table 6.6. The comparison of complexity of proposed methods
where:
𝑄 : size of set of users
𝑅 : size of set of items
𝑆 : size of set of tags
𝐹 : size of latent factor
𝑙 : 𝑆 𝑑𝑖𝑣 𝑏 where 𝑏 is the size of block-strip row with 𝑏 ≪ 𝑆
𝑛 : size of candidate item set
𝑣 : size of tag preference set
$\tilde{V}$ : average size of $V_u$ (the list of tags that have been used by user $u$)
$\tilde{Z}$ : average size of $Z_u$ (the list of items whose entries are labelled as “relevant” or
“likely relevant” for the user $u$)
$c$ : $\max(y_{u,i,t})$
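The factor 𝐹 versus 𝐹³ in the complexities above stems from how a single tensor entry is predicted under the CP and Tucker models. The following sketch illustrates this under the standard formulations of both models; the function names and the list-based representation of the latent factors are hypothetical.

```python
def cp_score(user_vec, item_vec, tag_vec):
    """CP model: one multiply-add per latent factor, i.e. O(F)."""
    return sum(u * i * t for u, i, t in zip(user_vec, item_vec, tag_vec))

def tucker_score(core, user_vec, item_vec, tag_vec):
    """Tucker model: every entry of the F x F x F core tensor is
    visited, i.e. O(F^3) per predicted entry."""
    total = 0.0
    for a, u in enumerate(user_vec):
        for b, i in enumerate(item_vec):
            for c, t in enumerate(tag_vec):
                total += core[a][b][c] * u * i * t
    return total
```

When the core tensor is super-diagonal with unit entries, `tucker_score` returns the same value as `cp_score`, so CP can be read as the special case that avoids the cubic loop.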
6.11 TIME COMPLEXITY
The running time of the proposed methods comprises the running time of three main
processes, i.e. user profile (tensor) construction, learning the latent factors, and
recommendation generation. The comparison is shown on the smallest 𝑝-core size,
i.e. 10-core, of each dataset, which consumes the most processing time.
Figure 6.18 shows that the process that dominates the running time differs between
methods. For the point-wise based ranking methods, the running time of TRPR is
dominated by the process of recommendation generation – requiring the two stages
of full tensor reconstruction from latent factors and probabilistic ranking,
described in Section 4.2.4 – and that of We-Rank by weighted tensor construction,
described in Section 4.3.3.2. On the other hand, due to the nature of the list-wise
based ranking approach, the process of learning the latent factors dominates the
running time of both Do-Rank and Go-Rank, as described in Section 5.2.3 and
Section 5.3.3 respectively. Note that the running times of the user profile (tensor)
construction process in TRPR, the learning of the latent factors in We-Rank, and the
recommendation generation process in Go-Rank are very low in comparison to the
other processes of these methods and therefore they are seen as a line in
Figure 6.18. In general, the total running time of TRPR is relatively comparable to
that of Go-Rank, whereas the running time of We-Rank is comparable to that of
Do-Rank.
Figure 6.18. The comparison of time complexity of proposed methods
6.12 STRENGTHS AND SHORTCOMINGS OF THE PROPOSED
METHODS
Probabilistic Ranking Method: Tensor-based Item Recommendation using
Probabilistic Ranking (TRPR).
Strength:
o Dataset: TRPR is suited to work with a dataset that has a dominant user-
tag set, such as Delicious and CiteULike;
o Accuracy:
TRPR improves recommendation accuracy by the implementation of a
probabilistic approach that utilises the user tag preference for ranking
candidate items generated from the full reconstructed tensor, in
comparison to a conventional approach that directly ranks candidate
items from the reconstructed tensor;
[Figure 6.18 panels: log runtime (sec) per method (TRPR-CP, We-Rank, Do-Rank, Go-Rank) on the 10-core of the Delicious, LastFM, CiteULike, and MovieLens datasets, broken down into User Profile (Tensor) Construction, Learning the Latent Factors, and Recommendation Generation]
The combination of both a boolean scheme for building the learning
model and a probabilistic approach makes TRPR work best in terms of
AP and MAP.
o Computation complexity and running time:
The implementation of a memory efficient loop approach makes
TRPR scalable for full tensor reconstruction on a large dataset;
Belonging to the point-wise based ranking approach, the learning
process of TRPR requires significantly less complexity and running
time in comparison to that of methods that belong to the list-wise
based ranking approach.
Shortcomings:
o Computation complexity and running time: The recommendation
generation process of TRPR requires the two stages of full tensor
reconstruction from the learned latent factors and probabilistic ranking,
which makes it significantly more complex and slower than a
recommendation generation process that only requires a latent
factors-based procedure.
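The idea of reconstructing the tensor in block-strips, rather than all at once, can be sketched as follows. This is only one possible reading and not the procedure of Section 4.2.4: it assumes CP factors, and it assumes that candidate items are selected by their maximum reconstructed score over all tags. The function name, parameters, and selection rule are hypothetical, and the probabilistic ranking stage is omitted.

```python
import heapq

def candidate_items(user_vec, item_factors, tag_factors, block_size, n):
    """Return the indices of the n items with the highest reconstructed
    score for one user, reconstructing only block_size tags at a time.
    With an array library each block would be a block_size x R slab;
    here only the running maximum per item is kept in memory."""
    best = [float("-inf")] * len(item_factors)
    for start in range(0, len(tag_factors), block_size):
        block = tag_factors[start:start + block_size]  # one block-strip
        for item_idx, item_vec in enumerate(item_factors):
            for tag_vec in block:
                score = sum(u * i * t
                            for u, i, t in zip(user_vec, item_vec, tag_vec))
                if score > best[item_idx]:
                    best[item_idx] = score
    return heapq.nlargest(n, range(len(best)), key=best.__getitem__)
```

The selected candidates do not depend on `block_size`; the parameter only bounds how much of the reconstructed tensor exists at any one time, which is the trade-off expressed by 𝑙 = 𝑆 𝑑𝑖𝑣 𝑏 in Table 6.6.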
Weighted Tensor Approach for Ranking: Recommendation Ranking using
Weighted Tensor (We-Rank).
Strength:
o Dataset: We-Rank is suited to work on dense datasets, such as the 15 and
20-cores of the MovieLens dataset;
o Accuracy: The implementation of a weighted scheme, to reward or
penalise each primary tensor entry during the learning process, makes
We-Rank outperform the benchmarking methods on dense datasets;
o Computation complexity and running time:
Belonging to the point-wise based ranking approach, the learning
process of We-Rank requires significantly less complexity and running
time in comparison to that of methods that belong to the list-wise
based ranking approach;
The learned latent factors of We-Rank can be directly used in the
recommendation generation process, which therefore requires less
complexity and running time than one that requires a
full tensor reconstruction procedure.
Shortcomings:
o Accuracy: We-Rank obtains lower performance than the other
proposed methods, as the weighted scheme cannot sufficiently capture the
user’s tag usage likeliness.
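A weighted point-wise criterion of the kind described for We-Rank can be sketched as a weighted squared error. The sketch does not reproduce the weighting of Section 4.3.3.2: the weight function, the prediction function, and the function name are hypothetical placeholders, chosen only to show where rewards and penalties would enter the learning objective.

```python
def weighted_squared_error(entries, predict, weight):
    """Point-wise loss in which each tensor entry carries its own weight.
    entries: iterable of ((user, item, tag), target) pairs.
    predict: function mapping (user, item, tag) to a predicted score.
    weight:  function mapping (user, item, tag) to a non-negative weight
             (assumed to encode the reward or penalty for that entry)."""
    return sum(weight(key) * (target - predict(key)) ** 2
               for key, target in entries)
```

Because every entry contributes independently to the sum, the quality of the learned model depends directly on how well the weights reflect the user’s tag usage likeliness.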
Learning from Multi-Graded Data: DCG Optimization for Learning-to-Rank
(Do-Rank).
Strength:
o Dataset: Do-Rank is suited to work with a dataset that has a dominant
user-item set, such as LastFM and MovieLens;
o Accuracy: The combination of both the UTS scheme for building the
learning model and DCG as the optimization criterion of the learning
model makes Do-Rank work best in terms of NDCG;
o Efficiency: The implementation of a fast learning approach in Do-Rank
efficiently reduces the learning time, while at the same time improving or
maintaining accuracy;
o Computation complexity and running time: The learned latent factors of
Do-Rank can be directly used in the recommendation generation process,
which therefore requires less complexity and running time than one
that requires a full tensor reconstruction procedure.
Shortcomings:
o Computation complexity and running time: Belonging to the list-wise
based ranking approach, the learning process of Do-Rank requires
significantly more complexity and running time in comparison to that of
methods that belong to the point-wise based ranking approach.
Learning from Graded-relevance data: GAP Optimization for Learning-to-
Rank (Go-Rank).
Strength:
o Dataset: Go-Rank is suited to work with a dataset that has a dominant
user-item set, such as LastFM and MovieLens;
o Accuracy: The combination of a graded-relevance scheme for building
the learning model and GAP as the optimization criterion of the learning
model makes Go-Rank work better in comparison to
Do-Rank in terms of AP;
o Efficiency: The implementation of a fast learning approach in Go-Rank
efficiently reduces the learning time, while at the same time improving or
maintaining accuracy;
o Computation complexity and running time: The learned latent factors of
Go-Rank can be directly used in the recommendation generation process,
which therefore requires less complexity and running time than one
that requires a full tensor reconstruction procedure.
Shortcomings:
o Computation complexity and running time: Belonging to the list-wise
based ranking approach, the learning process of Go-Rank requires
significantly more complexity and running time in comparison to that of
methods that belong to the point-wise based ranking approach.
6.13 CHAPTER SUMMARY
This chapter has compared and analysed the performance of all proposed methods in
order to investigate which method performs best in terms of various aspects. First,
the impacts of the interpretation scheme, the users’ tagging behaviour, the “relevant”
entries, the handling of “likely relevant” entries, and the 𝑝-core on method
performances are analysed. The results are presented from the perspective of tensor
entries population and method performances.
Next, the performance of each proposed method is compared and discussed
from the perspective of each dataset. The comparison of the proposed methods’
performances, based on the implemented learning-to-rank approach, is then
conducted. In general, the list-wise based ranking methods – which include Do-Rank
and Go-Rank – can achieve better performance in terms of NDCG in comparison to
the point-wise based ranking methods – which include TRPR and We-Rank. On the
other hand, the point-wise based ranking method TRPR – We-Rank is excluded from
this generalisation due to its poor performance – achieves better performance in
terms of AP and MAP in comparison to the list-wise based ranking methods. These
results are due to the characteristic of each evaluation measure, i.e.
NDCG is likely to favour methods that build the learning model from multi-graded
data, while AP and MAP are likely to favour methods that build the learning model
from binary data.
In comparison with the benchmarking methods, the proposed methods, except
We-Rank, outperform MAX, PITF and CTS on most evaluation measures and
datasets. We-Rank underperformance in comparison to the benchmarking methods on
most of the datasets is due to its sensitivity to how well the user’s tag usage likeliness
is captured, as this is used to reward or penalise each tensor model entry during
the learning process.
Various aspects of the proposed methods, such as efficiency, computation
complexity, and time scalability, are then presented. Finally, the strengths and
shortcomings of the proposed methods are distilled.
Chapter 7: Conclusions
With the growing user-generated information on the web, STSs have gained great
popularity, since users can annotate items of their interest using freely defined tags
and then utilise those tags to organise, retrieve, and share items with other users.
Over a period of time, the tagging data are recorded as a result of the accumulated
tagging activity, i.e. an event when a user uses a tag to annotate an item and forms a
⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relation. By learning from the users’ past tagging
behaviour, a tag-based system can generate a list of item recommendations, which
may be of interest to a user. To boost the performance of such a system, the unique
multi-dimensional relations between users, items, and tags that represent the user
profiles must be appropriately modelled, such that the latent relationships among
them are thoroughly captured. Furthermore, knowing that users are more interested
in the items at the top of the recommendation list, the ranking of items in the
recommendation list is crucial.
This research developed methods for building a tag-based item
recommendation system that explores the interplay between the multi-dimensions of
tagging data. In order to achieve the goal, efficient tagging data interpretation
schemes and recommendation ranking methods are proposed, in which tensor model
and learning-to-rank approaches are employed. The recommendation methods
developed in this thesis implement two ranking approaches: point-wise and list-wise.
A point-wise based ranking method approaches the recommendation task as a
regression/classification task. A list-wise based ranking method approaches the
recommendation task as a ranking task.
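The distinction between the two approaches can be made concrete with a small sketch. It contrasts a point-wise criterion (mean square error) with a list-wise criterion (NDCG of the induced order) on hypothetical scores; the data and function names are illustrative and are not taken from the experiments.

```python
import math

def mse(scores, rels):
    """Point-wise criterion: each entry is scored independently."""
    return sum((s - r) ** 2 for s, r in zip(scores, rels)) / len(rels)

def ndcg(scores, rels):
    """List-wise criterion: depends only on the order induced by the scores."""
    order = sorted(range(len(scores)), key=lambda idx: -scores[idx])

    def dcg(ranked):
        return sum((2 ** rel - 1) / math.log2(rank + 1)
                   for rank, rel in enumerate(ranked, start=1))

    ideal = dcg(sorted(rels, reverse=True))
    return dcg([rels[idx] for idx in order]) / ideal if ideal > 0 else 0.0

rels = [1, 0, 0]                 # only the first item is relevant
close_fit = [0.4, 0.5, 0.5]      # small errors, relevant item ranked last
good_order = [2.0, 1.5, 1.5]     # large errors, relevant item ranked first
```

Here `close_fit` has the lower mean square error but the worse NDCG, while `good_order` has the higher error but a perfect ranking, which is why a method that fits entries well point-wise may still need a separate ranking stage.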
The first section of this chapter summarises the contributions of this research,
followed by the findings that are drawn from this thesis. Finally, the limitations of
the current work and directions for future work are presented.
7.1 SUMMARY OF CONTRIBUTIONS
The literature review presented in Chapter 2 has highlighted the following
shortcomings in the tag-based item recommendation system:
o Lack of efficient schemes that can thoroughly utilise the user’s tagging
history for generating a list of recommendations;
o Lack of methods that efficiently implement a learning-to-rank
approach to solve the tag-based item recommendation task;
o Lack of comprehensive works that study whether a combination of an
interpretation scheme and a learning-to-rank approach has a positive
influence in making a recommendation.
This thesis focused on overcoming those shortcomings by proposing two novel
schemes to interpret tagging data and four tag-based recommendation methods for
generating the list of item recommendations in which tensor model and learning-to-
rank approaches are implemented. The main contributions are summarised as
follows:
Developed two ranking-based interpretation schemes which apply a ranking
constraint to interpret the tagging data and result in richer multi-graded
relevance data:
o User-Tag Set (UTS) scheme interprets the tagging data and results in three
possible distinct entries: “relevant” or “1”, “irrelevant” or “-1”, and
“indecisive” or “0”;
o graded-relevance scheme interprets the tagging data and results in four
possible distinct entries: “relevant” or “2”, “likely relevant” or “1”,
“irrelevant” or “-1”, and “indecisive” or “0”.
Developed Tensor-based Item Recommendation using Probabilistic Ranking
(TRPR) method:
o TRPR implements a memory-efficient technique in order to solve the
scalability issue that occurs during the tensor reconstruction process;
o TRPR improves the recommendation quality by ranking the items using a
probabilistic approach, i.e. a subsequent stage after the tensor
reconstruction process.
Developed Recommendation Ranking using Weighted Tensor (We-Rank)
method:
o We-Rank implements a weighted scheme for learning the tensor
recommendation model such that rewards and penalties are given to the
observed and non-observed entries of each user-item set, respectively.
Developed DCG Optimization for Learning-to-Rank (Do-Rank) method:
o Do-Rank learns from a user profile which is built from multi-graded data
resulting from the implementation of the proposed User-Tag Set (UTS) scheme;
o Do-Rank optimizes the recommendation model with respect to Discounted
Cumulative Gain (DCG) as the ranking evaluation measure to
appropriately learn the tensor recommendation model built from
multi-graded data.
Developed GAP Optimization for Learning-to-Rank (Go-Rank) method:
o Go-Rank learns from a user profile which is built from graded-relevance
data resulting from the implementation of the proposed graded-relevance
scheme that effectively leverages the tagging data;
o Go-Rank optimizes the recommendation model with respect to Graded
Average Precision (GAP) as the ranking evaluation measure to
appropriately learn the tensor recommendation model built from graded-
relevance data.
Conducted comprehensive analyses of the results of all the developed and
benchmarking methods that reveal the strengths and shortcomings of each
developed method, the comparison between point-wise and list-wise based
ranking methods, and various aspects that influence method performances
such as interpretation schemes, tensor entries population, and 𝑝-core.
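The UTS labelling rule listed among the contributions can be sketched as follows. This is one reading of the scheme and not its formal definition from the earlier chapters: it assumes that, on an observed user-tag set, items outside the user’s item collection are “irrelevant”, items that the user has annotated with other tags are “indecisive”, and every entry of a non-observed user-tag set is “indecisive”. The function name `uts_entry` is hypothetical.

```python
def uts_entry(user, item, tag, observed):
    """Label one (user, item, tag) entry under the UTS reading sketched here.
    observed: set of (user, item, tag) triples from the tagging data."""
    if (user, item, tag) in observed:
        return 1                       # relevant
    tag_used = any(u == user and t == tag for u, _, t in observed)
    item_in_collection = any(u == user and i == item for u, i, _ in observed)
    if tag_used and not item_in_collection:
        return -1                      # irrelevant
    return 0                           # indecisive
```

For example, with `observed = {("u1", "i1", "t1"), ("u1", "i2", "t2")}`, the entry for item `i2` under tag `t1` is labelled 0 rather than -1 because `u1` has annotated `i2` with another tag, which is the user’s item collection constraint that avoids overgeneralising “irrelevant” entries.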
7.2 SUMMARY OF FINDINGS
The main findings from this thesis are summarised as follows:
In response to Research Question 1 (How tagging data can be efficiently
interpreted such that the user’s tagging history is thoroughly utilised while
making recommendations?), two interpretation schemes are proposed and the
findings are as follows:
o The UTS scheme efficiently interprets tagging data as rich multi-graded
data, where each entry can be one of the elements in the ordinal relevance
set of {𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡, 𝑖𝑛𝑑𝑒𝑐𝑖𝑠𝑖𝑣𝑒, 𝑖𝑟𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡} (or {1, 0, −1}). This scheme
tackles the overgeneralisation of the set-based scheme by implementing a
user’s item collection constraint such that not all items, on each observed
user-tag set, are labelled as “irrelevant” entries, as they have been
annotated by the users using other tags. Accordingly, UTS generates fewer
non-indecisive entries than a set-based scheme.
This means that a tensor model that uses this scheme to populate its
entries will learn from less data in comparison to a set-based scheme;
o The graded-relevance scheme efficiently interprets tagging data as rich
graded-relevance data, where each entry can be one of the elements in the
ordinal relevance set of
{𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡, 𝑙𝑖𝑘𝑒𝑙𝑦 𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡, 𝑖𝑛𝑑𝑒𝑐𝑖𝑠𝑖𝑣𝑒, 𝑖𝑟𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡} (or {2, 1, 0, −1}).
This scheme tackles the overgeneralisation of a set-based scheme in terms
of labelling entries as “irrelevant” by revealing the “likely relevant”
entries, which are transitional entries positioned between the “relevant”
and “irrelevant” entries;
o Experiment results show that both UTS and graded-relevance schemes
support ranking methods, i.e. Do-Rank and Go-Rank respectively, to
achieve quality recommendations and to deal with the scalability problem
in the learning process;
o With regard to the implementation of 𝑝-core refinement to the dataset, the
“relevant”, “likely relevant”, or “irrelevant” entries population
distribution generated from any interpretation scheme is linear in the size
of the 𝑝-core;
o The user’s tagging behaviour influences the entries population. For a
given “relevant” entries population, the UTS and/or graded-relevance
schemes generate larger “irrelevant” and/or “likely relevant” entries
populations on a dataset that has a dominant user-item set, in which users
prefer to use fewer tags for annotating items, than on a dataset with a
dominant user-tag set, in which users prefer to use more tags for
annotating items.
In response to Research Question 2 (How can a learning-to-rank approach be
implemented to solve the tag-based item recommendation task? What
optimization criterion should be used for learning the tensor recommendation
model? In what order can the Top-𝑁 item recommendation be made?), four
recommendation methods are proposed with tensor model and learning-to-
rank approaches. The experimental results on various real-world datasets
indicate the following findings:
o TRPR, a point-wise based ranking method, has shown that
probabilistically ranking candidate items generated from the reconstructed
tensor achieves improved performance over just directly ranking them.
TRPR has demonstrated that the memory efficient loop approach overcomes
the high cost of the full tensor reconstruction procedure in the
recommendation generation process, ensuring TRPR is scalable on a large
dataset. The calculation of tag probability that controls the ranking
approach makes TRPR suitable for a dataset with a dominant user-tag set,
such as Delicious and CiteULike datasets;
o We-Rank, a point-wise based ranking method, has been shown to
outperform the benchmarking methods by utilising the weighted scheme.
Yet, since We-Rank highly depends on how well the user tag usage
likeliness is captured – for populating the weighted tensor – it only
performs well on dense datasets. The learned latent factors of We-Rank
can be directly used for generating the Top-𝑁 list of item
recommendations for each target user due to the implementation of a
weighting scheme within the learning process, avoiding the complex
recommendation generation process, as full tensor reconstruction is not
required;
o Do-Rank has been shown to outperform the benchmarking methods and
has demonstrated that the implementation of a fast learning approach in
the learning process reduces the learning time while at the same time
improving or maintaining accuracy. The learned latent factors of Do-Rank
can be directly used for generating the Top-𝑁 list of item
recommendations for each target user, due to the implementation of a
UTS interpretation scheme and list-wise based ranking approach within
the learning process, avoiding the complex recommendation generation
process, as full tensor reconstruction is not required. The formulation of
ranking that is based on a list of associated items in the learning algorithm
makes Do-Rank suitable for a dataset with a dominant user-item set, such
as LastFM and MovieLens datasets;
o Go-Rank achieves better performance than the benchmarking methods. The
fast learning approach overcomes the computational cost that the
list-wise based ranking approach introduces into the learning process.
Results show that it not only reduces the learning time, but also improves
or maintains accuracy. Because the graded-relevance interpretation scheme
and the list-wise based ranking approach are implemented within the
learning process, the learned latent factors of Go-Rank can be used
directly to generate the Top-𝑁 list of item recommendations for each
target user. This avoids the complex recommendation generation process, as
full tensor reconstruction is not required. Like Do-Rank, Go-Rank is
suitable for a dataset with a dominant user-item set, such as the LastFM
and MovieLens datasets, because the ranking in the learning algorithm is
formulated over a list of associated items.
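The recurring point in the findings above, that Top-𝑁 lists can be produced without materialising the full reconstructed tensor, can be illustrated with a minimal sketch. It assumes a CP-style model in which the score of a (user, item, tag) triple is the sum over latent dimensions of the product of the three factor entries, and it aggregates a user's tag scores by a plain sum. Both the model form and the aggregation are illustrative assumptions, as are all names (`user_slice`, `top_n_items`, the toy factor matrices); none of this is the exact scoring function of TRPR, We-Rank, Do-Rank or Go-Rank.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def user_slice(user_vec: Sequence[float],
               item_factors: Matrix,
               tag_factors: Matrix) -> List[List[float]]:
    """Score every (item, tag) pair for ONE user from the latent factors.

    Only an items-by-tags slice is held in memory at a time, instead of the
    full users-by-items-by-tags tensor (assumed CP-style scoring).
    """
    return [
        [sum(u * a * b for u, a, b in zip(user_vec, item_row, tag_row))
         for tag_row in tag_factors]
        for item_row in item_factors
    ]


def top_n_items(user_vec: Sequence[float],
                item_factors: Matrix,
                tag_factors: Matrix,
                n: int) -> List[int]:
    """Rank items for one user; summing over tags is a hypothetical choice."""
    scores = [sum(row) for row in user_slice(user_vec, item_factors, tag_factors)]
    # Sort by descending score, breaking ties by item index.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return ranked[:n]


# Toy latent factors (two latent dimensions), purely illustrative.
user_factors = [[1.0, 0.0], [0.0, 1.0]]
item_factors = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
tag_factors = [[1.0, 0.0], [0.0, 1.0]]

# Loop over users one at a time; each iteration discards its slice afterwards.
recommendations = [top_n_items(u, item_factors, tag_factors, n=2)
                   for u in user_factors]
```

Processing one user per iteration keeps the working memory proportional to a single slice, which is the general idea behind avoiding full tensor reconstruction when generating recommendations.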
In response to Research Question 3 (Does a combination of an interpretation
scheme and a learning-to-rank approach have a positive influence in making a
recommendation? Given that the proposed tag-based item recommendation
methods are grouped as point-wise and list-wise based ranking approaches,
comparing their performances may help to find an efficient method), a
comprehensive comparison and analysis of all proposed methods was
conducted and the findings are as follows:
o Analysis of all proposed methods confirms that a combination of the
interpretation scheme and the learning-to-rank approach has a positive
influence in making a recommendation;
o The list-wise based ranking methods, Do-Rank and Go-Rank, achieve
better performance in terms of NDCG than the point-wise based ranking
methods, TRPR and We-Rank. The combination of the UTS scheme for building
the learning model and DCG as the optimization criterion in Do-Rank leads
NDCG, the widely used ranking evaluation measure for multi-graded
relevance data, to favour this method;
o The point-wise based ranking method TRPR (We-Rank is excluded due to
its poor performance) achieves better performance in terms of AP and MAP
than the list-wise based ranking methods. The combination of a boolean
scheme for building the learning model and a probabilistic approach for
ranking the candidate items generated from the fully reconstructed tensor
in TRPR leads AP and MAP, the widely used ranking evaluation measures for
binary data, to favour this method. Meanwhile, the combination of a
graded-relevance scheme for building the learning model and GAP as the
optimization criterion makes Go-Rank achieve better results on average
than Do-Rank in terms of AP.
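As a reminder of why the two families of measures can favour different methods, the following minimal sketch computes NDCG over graded gains and AP over binary relevance flags for a single ranked list. The linear gain with a log2 rank discount, and the normalisation of AP by the number of relevant items that appear in the list, are common textbook variants assumed here for illustration; they are not necessarily the exact definitions used in the experiments, and the function names are hypothetical.

```python
import math
from typing import Sequence


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain with linear gain and log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains: Sequence[float]) -> float:
    """DCG divided by the DCG of the ideal (descending-gain) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


def average_precision(relevant_flags: Sequence[bool]) -> float:
    """Mean of precision@k over the ranks k at which a relevant item appears.

    Normalised by the relevant items present in the list (an assumption).
    """
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Graded relevance of a ranked list (2 = relevant, 1 = likely relevant, 0 = irrelevant).
graded = [1, 2, 0]
# The same list seen as binary data: only grade 2 counts as relevant.
binary = [g == 2 for g in graded]

ndcg_score = ndcg(graded)
ap_score = average_precision(binary)
```

NDCG gives partial credit to the grade-1 item at rank 1, whereas AP treats it as a miss, so a method tuned for graded data and one tuned for binary data can be ordered differently by the two measures. MAP is the mean of the per-user AP values.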
7.3 LIMITATIONS AND FUTURE WORKS
This research focuses mainly on generating item recommendations for tag-based
systems using a tensor model and learning-to-rank approach. Several future
improvements and extensions to the currently proposed methods are as follows:
Further scope of improving the proposed methods:
o We-Rank performs the worst of the proposed methods. It also
underperforms the benchmarking methods on some of the datasets used in
this thesis. This is due to its sensitivity to how well the user’s tag
usage likeliness is captured, which is used to reward or penalise each
tensor model entry during the learning process. Hence, future work can
include investigating a more effective approach for capturing the user’s
tag usage likeliness in order to improve the performance of We-Rank;
o The graded-relevance scheme is an efficient scheme, as it leverages the
tagging data more effectively. In Go-Rank, it has been implemented with
GAP as the optimization criterion of the learning model, in order to
improve the tag-based item recommendation performance. For future
work, it would be interesting to investigate the potential of implementing
the graded-relevance scheme with other ranking evaluation measures for
generating tag-based item recommendations.
Further scope of extending the problems:
o One of the common problems in a tag-based recommendation system is
the cold-start user problem, i.e. the situation in which a user has
annotated only a single item with a limited number of tags. This makes it
difficult to infer the user’s preferences, due to the limited usage data.
Although the UTS and graded-relevance schemes, which are able to generate
richer data, have been proposed and implemented in Do-Rank and Go-Rank,
no discussion or experiments have yet been carried out to investigate the
impact of those schemes on the cold-start problem. Future work can include
this study on Do-Rank and Go-Rank;
o The semantics behind the tags should be considered properly, as tags are
freely defined by the users and can therefore cause semantic problems such
as synonymy and polysemy (Golder and Huberman, 2006). Future work can
include an analysis of the tag semantic problem and use the outcome in
building the tensor model. An alternative solution could be to implement
the topic model approach to keep the tags’ nature as the “social
vocabulary” (Alper, 2012).
Further scope of extending the applications:
o The proposed methods were built to solve the problem of tag-based item
recommendation. In the future, it would be interesting to apply the
methods to other applications that generate three-dimensional data
similar to that of a tag-based system;
o The proposed methods apply to three-dimensional data. Future work can
possibly apply them to higher order data.
Bibliography
Acar, E., Dunlavy, D. M., Kolda, T. G., and Mørup, M. (2011). Scalable Tensor
Factorizations for Incomplete Data. Chemometrics and Intelligent Laboratory
Systems, 106(1), 41-56.
Adomavicius, G., and Tuzhilin, A. (2005). Toward the Next Generation of
Recommender Systems: A Survey of the State-of-the-art and Possible
Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6),
734-749.
Adomavicius, G., and Tuzhilin, A. (2011). Context-aware Recommender Systems. In
Recommender systems handbook. 217-253, Springer.
Agichtein, E., Brill, E., and Dumais, S. (2006). Improving Web Search Ranking by
Incorporating User Behavior Information. Proc. The 29th annual
international ACM SIGIR Conference on Research and Development in
Information Retrieval, Seattle, Washington, USA, 19-26. ACM.
Alper, M. E. (2012). Personalized Recommendation in Folksonomies using a Joint
Probabilistic Model of Users, Resources and Tags. Proc. The 11th
International Conference on Machine Learning and Applications (ICMLA),
Boca Raton, Florida, 1: 368-373. IEEE.
Appellof, C. J., and Davidson, E. (1981). Strategies for Analyzing Data from Video
Fluorometric Monitoring of Liquid Chromatographic Effluents. Analytical
Chemistry, 53(13), 2053-2056.
Bader, B. W., Kolda, T. G., and others (2012). MATLAB Tensor Toolbox Version
2.5. Retrieved 12 June, from
http://www.sandia.gov/~tgkolda/TensorToolbox/
Baker, L. D., and McCallum, A. K. (1998). Distributional Clustering of Words for
Text Classification. Proc. The 21st Annual International ACM SIGIR
Conference on Research and Development in Information Retrieval,
Melbourne, Australia, 96-103. ACM.
Balakrishnan, S., and Chopra, S. (2012). Collaborative Ranking. Proc. The 5th ACM
International Conference on Web Search and Data Mining, Seattle,
Washington, USA, 143-152. ACM.
Balcan, M.-F., Bansal, N., Beygelzimer, A., Coppersmith, D., Langford, J., and
Sorkin, G. B. (2007). Robust Reductions from Ranking to Classification. In
Learning Theory. 604-619, Springer Berlin Heidelberg.
Barragans-Martinez, A. B., Rey-Lopez, M., Costa-Montenegro, E., Mikic-Fonte, F.
A., Burguillo, J. C., and Peleteiro, A. (2010). Exploiting Social Tagging in a
Web 2.0 Recommender System. Internet Computing, IEEE, 14(6), 23-30.
Batagelj, V., and Zaveršnik, M. (2002). Generalized Cores. arXiv preprint
cs/0202039.
Begelman, G., Keller, P., and Smadja, F. (2006). Automated Tag Clustering:
Improving Search and Exploration in the Tag Space. Proc. Collaborative
Web Tagging Workshop at WWW2006, Edinburgh, Scotland, 15-33.
Bergqvist, G., and Larsson, E. G. (2010). The Higher-Order Singular Value
Decomposition: Theory and an Application. IEEE Signal Processing
Magazine, 27(3), 151-154.
Bogers, T., and van den Bosch, A. (2009). Collaborative and Content-based Filtering
for Item Recommendation on Social Bookmarking Websites. Proc. The ACM
RecSys Workshop on Recommender Systems and the Social Web, New York,
USA, 9-16. ACM.
Buckley, C., and Voorhees, E. M. (2000). Evaluating Evaluation Measure Stability.
Proc. The 23rd Annual International ACM SIGIR Conference on Research
and Development in Information Retrieval, Athens, Greece, 33-40. ACM.
Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., and
Hullender, G. (2005). Learning to Rank using Gradient Descent. Proc. The
22nd International Conference on Machine Learning, 89-96. ACM.
Burke, R. (2007). Hybrid Web Recommender Systems. In The adaptive web. 377-
408, Springer Berlin Heidelberg.
Cai, Y., Zhang, M., Luo, D., Ding, C., and Chakravarthy, S. (2011). Low-order
Tensor Decompositions for Social Tagging Recommendation. Proc. The 4th
ACM International Conference on Web Search and Data Mining, Hong
Kong, China, 695-704. ACM.
Cantador, I., Brusilovsky, P., and Kuflik, T. (2011). Second Workshop on
Information Heterogeneity and Fusion in Recommender Systems
(HetRec2011). Proc. The 5th ACM Conference on Recommender Systems,
Chicago, Illinois, USA, 387-388.
Cao, Z., Qin, T., Liu, T.-Y., Tsai, M.-F., and Li, H. (2007). Learning to Rank: From
Pairwise Approach to Listwise Approach. Proc. The 24th International
Conference on Machine Learning, Corvallis, Oregon. ACM.
Carroll, J. D., and Chang, J.-J. (1970). Analysis of Individual Differences in
Multidimensional Scaling via an N-way Generalization of “Eckart-Young”
Decomposition. Psychometrika, 35(3), 283-319.
Castellano, G., Fanelli, A., Torsello, M., and Jain, L. (2009). Innovations in Web
Personalization. In Web Personalization in Intelligent Environments. 1-26,
Springer Berlin Heidelberg.
Chapelle, O., Metzler, D., Zhang, Y., and Grinspan, P. (2009). Expected Reciprocal
Rank for Graded Relevance. Proc. The 18th ACM Conference on Information
and Knowledge Management, 621-630. ACM.
Chapelle, O., and Wu, M. (2010). Gradient Descent Optimization of Smoothed
Information Retrieval Metrics. Information Retrieval, 13(3), 216-235.
Cichocki, A., Zdunek, R., Phan, A. H., and Amari, S.-i. (2009). Nonnegative Matrix
and Tensor Factorizations: Applications to Exploratory Multi-way Data
Analysis and Blind Source Separation, John Wiley & Sons.
Craswell, N. (2009). Mean Reciprocal Rank. In Encyclopedia of Database Systems.
1703-1703, Springer.
Cremonesi, P., Koren, Y., and Turrin, R. (2010). Performance of Recommender
Algorithms on Top-N Recommendation Tasks. Proc. The 4th ACM
Conference on Recommender Systems, Barcelona, Spain, 39-46. ACM.
Das, M., Das, G., and Hristidis, V. (2011). Leveraging Collaborative Tagging for
Web Item Design. Proc. The 17th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, San Diego, California, USA,
538-546. ACM.
de Campos, L. M., Fernández-Luna, J. M., Huete, J. F., and Rueda-Morales, M. A.
(2010). Combining Content-based and Collaborative Recommendations: A
Hybrid Approach based on Bayesian Networks. International Journal of
Approximate Reasoning, 51(7), 785-799.
De Lathauwer, L., De Moor, B., and Vandewalle, J. (2000a). A Multilinear Singular
Value Decomposition. SIAM Journal on Matrix Analysis and Applications,
21(4), 1253-1278.
De Lathauwer, L., De Moor, B., and Vandewalle, J. (2000b). On the Best Rank-1 and
Rank-(R 1, R 2,..., Rn) Approximation of Higher-Order Tensors. SIAM
Journal on Matrix Analysis and Applications, 21(4), 1324-1342.
Doerfel, S., and Jäschke, R. (2013). An Analysis of Tag-recommender Evaluation
Procedures. Proc. The 7th ACM Conference on Recommender Systems, 343-
346. ACM.
Dyrby, M., Baunsgaard, D., Bro, R., and Engelsen, S. B. (2005). Multiway
Chemometric Analysis of the Metabolic Response to Toxins Monitored by
NMR. Chemometrics and Intelligent Laboratory Systems, 76(1), 79-89.
Ferrante, M., Ferro, N., and Maistro, M. (2014). Rethinking How to Extend Average
Precision to Graded Relevance. In Information Access Evaluation.
Multilinguality, Multimodality, and Interaction. 19-30, Springer International
Publishing.
Gemmell, J., Schimoler, T., Mobasher, B., and Burke, R. (2011). Tag-based
Resource Recommendation in Social Annotation Applications. In User
Modeling, Adaption and Personalization. 111-122, Springer.
Golder, S. A., and Huberman, B. A. (2006). Usage patterns of collaborative tagging
systems. Journal of Information Science, 32(2), 198-208.
Halpin, H., Robu, V., and Shepherd, H. (2007). The Complex Dynamics of
Collaborative Tagging. Proc. The 16th International Conference on World
Wide Web, Banff, Alberta, Canada, 21: 211-220. ACM.
Harshman, R. A. (1970). Foundations of the PARAFAC Procedure: Models and
Conditions for an "Explanatory" Multimodal Factor Analysis. UCLA Working
Papers in Phonetics, 16, 1-84.
Hofmann, T. (1999). Probabilistic Latent Semantic Analysis. Proc. The 15th
Conference on Uncertainty in Artificial Intelligence (UAI'99), Stockholm,
Sweden, 289-296. Morgan Kaufmann Publishers Inc.
Ifada, N., and Nayak, R. (2014a). An Efficient Tagging Data Interpretation and
Representation Scheme for Item Recommendation. Proc. The 12th
Australasian Data Mining Conference, Brisbane, Australia, ACS.
Ifada, N., and Nayak, R. (2014b). Tensor-based Item Recommendation using
Probabilistic Ranking in Social Tagging Systems. Proc. The 23rd
International Conference on World Wide Web Companion, Seoul, Korea,
805-810. ACM.
Ifada, N., and Nayak, R. (2014c). A Two-Stage Item Recommendation Method
Using Probabilistic Ranking with Reconstructed Tensor Model. In User
Modeling, Adaptation, and Personalization. 98-110, Springer.
Ifada, N., and Nayak, R. (2015). Do-Rank: DCG Optimization for Learning-to-Rank
in Tag-Based Item Recommendation Systems. In Advances in Knowledge
Discovery and Data Mining. 510-521, Springer.
Ifada, N., and Nayak, R. (2016). How Relevant is the Irrelevant Data: Leveraging the
Tagging Data for a Learning-to-Rank Model. Proc. The 19th ACM
International Conference on Web Search and Data Mining, San Francisco,
California, US, 23-32.
Jain, V., and Varma, M. (2011). Learning to Re-rank: Query-dependent Image Re-
ranking using Click Data. Proc. The 20th International Conference on World
Wide Web, Hyderabad, India, 277-286. ACM.
Järvelin, K., and Kekäläinen, J. (2002). Cumulated Gain-based Evaluation of IR
Techniques. ACM Transactions on Information Systems (TOIS), 20(4), 422-
446.
Jäschke, R., Marinho, L., Hotho, A., Schmidt-Thieme, L., and Stumme, G. (2007).
Tag Recommendations in Folksonomies. Proc. The 11th European
Conference on Principles and Practice of Knowledge Discovery in
Databases, Warsaw, Poland, 506-514. Springer-Verlag.
Jitao, S., Changsheng, X., and Jing, L. (2012). User-Aware Image Tag Refinement
via Ternary Semantic Analysis. IEEE Transactions on Multimedia, 14(3),
883-895.
Kim, H.-N., Alkhaldi, A., El Saddik, A., and Jo, G.-S. (2011). Collaborative User
Modeling with User-generated Tags for Social Recommender Systems.
Expert Systems with Applications, 38(7), 8488-8496.
Kim, H.-N., and El Saddik, A. (2013). Exploring Social Tagging for Personalized
Community Recommendations. User Modeling and User-Adapted
Interaction, 23(2-3), 249-285.
Kim, H.-N., Ji, A.-T., Ha, I., and Jo, G.-S. (2010). Collaborative Filtering based on
Collaborative Tagging for Enhancing the Quality of Recommendation.
Electronic Commerce Research and Applications, 9(1), 73-83.
Kim, J., He, Y., and Park, H. (2014). Algorithms for Nonnegative Matrix and Tensor
Factorizations: A Unified View based on Block Coordinate Descent
Framework. Journal of Global Optimization, 58(2), 285-319.
Kolda, T., and Bader, B. (2009). Tensor Decompositions and Applications. SIAM
Review, 51(3), 455-500.
Kolda, T. G. (2006). Multilinear Operators for Higher-order Decompositions,
Sandia National Laboratories.
Kolda, T. G., and Sun, J. (2008). Scalable Tensor Decompositions for Multi-aspect
Data Mining. Proc. The 8th IEEE International Conference on Data Mining,
Pisa, Italy, 363-372. IEEE.
Koren, Y. (2008). Factorization Meets The Neighborhood: A Multifaceted
Collaborative Filtering Model. Proc. The 14th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, Las Vegas, Nevada,
USA, 426-434. ACM.
Koren, Y., and Bell, R. (2011). Advances in Collaborative Filtering. In
Recommender Systems Handbook. 145-186, Springer US.
Koren, Y., Bell, R., and Volinsky, C. (2009). Matrix Factorization Techniques for
Recommender Systems. Computer, 42(8), 30-37.
Koren, Y., and Sill, J. (2011). OrdRec: An Ordinal Model for Predicting
Personalized Item Rating Distributions. Proc. The 5th ACM conference on
Recommender Systems, Chicago, Illinois, USA, 117-124. ACM.
Kutty, S., Chen, L., and Nayak, R. (2012). A People-to-people Recommendation
System using Tensor Space Models. Proc. The 27th Annual ACM Symposium
on Applied Computing, Trento, Italy, 187-192. ACM.
Lee, D. H., and Brusilovsky, P. (2010). Interest Similarity of Group Members: The
Case study of Citeulike. Proc. The WebSci10: Extending the Frontiers of
Society On-Line, North Carolina, US.
Lee, W. S. (2001). Collaborative Learning and Recommender Systems. Proc. The
Eighteenth International Conference on Machine Learning, San Francisco,
CA, USA, 314-321. Morgan Kaufmann Publishers Inc.
Leginus, M., Dolog, P., and Žemaitis, V. (2012). Improving Tensor Based
Recommenders with Clustering. In User Modeling, Adaptation, and
Personalization. 151-163, Springer Berlin Heidelberg.
Lekakos, G., and Giaglis, G. (2007). A Hybrid Approach for Improving Predictive
Accuracy of Collaborative Filtering Algorithms. User Modeling and User-
Adapted Interaction, 17(1-2), 5-40.
Li, P., Wu, Q., and Burges, C. J. (2007). McRank: Learning to Rank using Multiple
Classification and Gradient Boosting. Proc. Advances in Neural Information
Processing Systems 20 (NIPS 2007), 897-904.
Li, X., Guo, L., and Zhao, Y. E. (2008). Tag-based Social Interest Discovery. Proc.
The 17th International Conference on World Wide Web, Beijing, China, 675-
684. ACM.
Liang, H., Xu, Y., Li, Y., and Nayak, R. (2008). Collaborative Filtering
Recommender Systems Using Tag Information. Proc. IEEE/WIC/ACM
International Conference on Web Intelligence and Intelligent Agent
Technology (WI-IAT '08), Sydney, NSW, 3: 59-62. IEEE Computer
Society.
Liang, H., Xu, Y., Li, Y., and Nayak, R. (2009). Tag Based Collaborative Filtering
for Recommender Systems. In Rough Sets and Knowledge Technology. 666-
673, Springer Berlin Heidelberg.
Liang, H., Xu, Y., Li, Y., Nayak, R., and Tao, X. (2010). Connecting users and items
with weighted tags for personalized item recommendations. Proc. The 21st
ACM Conference on Hypertext and Hypermedia, Toronto, Ontario, Canada,
51-60. ACM.
Liu, J., Musialski, P., Wonka, P., and Ye, J. (2009). Tensor completion for estimating
missing values in visual data. Proc. IEEE 12th International Conference on
Computer Vision, 2114-2121. IEEE.
Liu, N. N., and Yang, Q. (2008). Eigenrank: A Ranking-oriented Approach to
Collaborative Filtering. Proc. The 31st Annual International ACM SIGIR
Conference on Research and Development in Information Retrieval,
Singapore, 83-90. ACM.
Liu, N. N., Zhao, M., and Yang, Q. (2009). Probabilistic Latent Preference Analysis
for Collaborative Filtering. Proc. The 18th ACM Conference on Information
and Knowledge Management, Hong Kong, China, 759-766. ACM.
Liu, T.-Y. (2009). Learning to Rank for Information Retrieval. Foundations and
Trends in Information Retrieval, 3(3), 225-331.
Lops, P., Gemmis, M., and Semeraro, G. (2011). Content-based Recommender
Systems: State of the Art and Trends. In Recommender Systems Handbook.
73-105, Springer US.
Lü, L., Medo, M., Yeung, C. H., Zhang, Y.-C., Zhang, Z.-K., and Zhou, T. (2012).
Recommender Systems. Physics Reports, 519(1), 1-49.
Luo, X., Ouyang, Y., and Xiong, Z. (2012). Improving neighborhood based
Collaborative Filtering via integrated folksonomy information. Pattern
Recognition Letters, 33(3), 263-270.
Maneeroj, S., and Takasu, A. (2009). Hybrid Recommender System Using Latent
Features. Proc. International Conference on Advanced Information
Networking and Applications Workshops (WAINA '09), Bradford, 661-666.
IEEE Computer Society.
Marinho, L., Nanopoulos, A., Schmidt-Thieme, L., Jäschke, R., Hotho, A., Stumme,
G., and Symeonidis, P. (2011). Social Tagging Recommender Systems. In
Recommender Systems Handbook. 615-644, Springer US.
Marinho, L. B., Hotho, A., Jäschke, R., Nanopoulos, A., Rendle, S., Schmidt-
Thieme, L., Stumme, G., and Symeonidis, P. (2012). Recommender Systems
for Social Tagging Systems, Springer Science & Business Media.
McCallum, A., and Nigam, K. (1998). A comparison of event models for naive bayes
text classification. Proc. AAAI-98 workshop on learning for text
categorization, 752: 41-48. Citeseer.
Mezghani, M., Zayani, C. A., Amous, I., and Gargouri, F. (2012). A User Profile
Modelling using Social Annotations: A Survey. Proc. The 21st International
Conference Companion on World Wide Web, Lyon, France, 969-976. ACM.
Mobasher, B. (2007). Data Mining for Web Personalization. In The adaptive web.
90-135, Springer Berlin Heidelberg.
Mobasher, B., Jin, X., and Zhou, Y. (2004). Semantically Enhanced Collaborative
Filtering on the Web. In Web Mining: From Web to Semantic Web. 57-76,
Springer Berlin Heidelberg.
Mohan, A., Chen, Z., and Weinberger, K. (2011). Web-Search Ranking with
Initialized Gradient Boosted Regression Trees. Journal of Machine Learning
Research, 14, 77-89.
Mørup, M., Hansen, L. K., and Arnfred, S. M. (2008). Algorithms for sparse
nonnegative Tucker decompositions. Neural computation, 20(8), 2112-2131.
Nanopoulos, A. (2011). Item Recommendation in Collaborative Tagging Systems.
IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and
Humans, 41(4), 760-771.
Negoescu, R. A., and Gatica-Perez, D. (2008). Topickr: Flickr Groups and Users
Reloaded. Proc. The 16th ACM International Conference on Multimedia,
Vancouver, British Columbia, Canada, 857-860. ACM.
Pan, R., Dolog, P., and Xu, G. (2013). KNN-Based Clustering for Improving Social
Recommender Systems. In Agents and Data Mining Interaction. 115-125,
Springer.
Pazzani, M. J., and Billsus, D. (2007). Content-Based Recommendation Systems. In
The Adaptive Web. 325-341, Springer Berlin Heidelberg.
Pradel, B., Sean, S., Delporte, J., Guérif, S., Rouveirol, C., Usunier, N., Fogelman-
Soulié, F., and Dufau-Joel, F. (2011). A case study in a recommender system
based on purchase data. Proc. The 17th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, 377-385.
ACM.
Qiu, F., and Cho, J. (2006). Automatic identification of user interest for personalized
search. Proc. The 15th International Conference on World Wide Web,
Edinburgh, Scotland, 727-736. ACM.
RadiumOne (2013). #Mobile Hashtag Survey. from
http://radiumone.com/about/#research
Rafailidis, D., and Daras, P. (2013). The TFC Model: Tensor Factorization and Tag
Clustering for Item Recommendation in Social Tagging Systems. IEEE
Transactions on Systems, Man and Cybernetics, Part A: Systems and
Humans, 43(3), 673-688.
Rawat, R., Nayak, R., and Li, Y. (2011). Identifying Interests of Web Users for
Effective Recommendations. International Journal of Innovation,
Management and Technology, 2(1), 19-24.
Rendle, S. (2011). Context-aware Ranking with Factorization Models, Springer.
Rendle, S., Balby Marinho, L., Nanopoulos, A., and Schmidt-Thieme, L. (2009).
Learning Optimal Ranking with Tensor Factorization for Tag
Recommendation. Proc. The 15th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, Paris, France, 727-736. ACM.
Rendle, S., Freudenthaler, C., Gantner, Z., and Schmidt-Thieme, L. (2009). BPR:
Bayesian Personalized Ranking from Implicit Feedback. Proc. The 25th
Conference on Uncertainty in Artificial Intelligence, Montreal, Quebec,
Canada, 452-461. AUAI Press.
Rendle, S., and Schmidt-Thieme, L. (2010). Pairwise Interaction Tensor
Factorization for Personalized Tag Recommendation. Proc. The 3rd ACM
International Conference on Web Search and Data Mining, New York, USA,
81-90. ACM.
Robertson, S. E., Kanoulas, E., and Yilmaz, E. (2010). Extending Average Precision
to Graded Relevance Judgments. Proc. The 33rd International ACM SIGIR
Conference on Research and Development in Information Retrieval, Geneva,
Switzerland, 603-610. ACM.
Schoefegger, K., and Granitzer, M. (2012). Overview and Analysis of Personal and
Social Tagging Context to Construct User Models. Proc. The 2nd Workshop
on Context-awareness in Retrieval and Recommendation, Lisbon, Portugal,
14-21. ACM.
Shani, G., and Gunawardana, A. (2011). Evaluating Recommendation Systems. In
Recommender Systems Handbook. 257-297, Springer.
Shepitsen, A., Gemmell, J., Mobasher, B., and Burke, R. (2008). Personalized
Recommendation in Social Tagging Systems using Hierarchical Clustering.
Proc. The 2008 ACM Conference on Recommender Systems, 259-266. ACM.
Shi, Y., Karatzoglou, A., Baltrunas, L., Larson, M., and Hanjalic, A. (2013a).
GAPfm: Optimal Top-N Recommendations for Graded Relevance Domains.
Proc. The 22nd ACM International Conference on Information & Knowledge
Management, San Francisco, California, USA, 2261-2266. ACM.
Shi, Y., Karatzoglou, A., Baltrunas, L., Larson, M., and Hanjalic, A. (2013b).
xCLiMF: Optimizing Expected Reciprocal Rank for Data with Multiple
Levels of Relevance. Proc. The 7th ACM conference on Recommender
Systems, Hong Kong, China, 431-434. ACM.
Shi, Y., Karatzoglou, A., Baltrunas, L., Larson, M., Hanjalic, A., and Oliver, N.
(2012). TFMAP: Optimizing MAP for Top-N Context-Aware
Recommendation. Proc. The 35th International ACM SIGIR Conference on
Research and Development in Information Retrieval, Portland, Oregon, USA,
155-164. ACM.
Shi, Y., Karatzoglou, A., Baltrunas, L., Larson, M., Oliver, N., and Hanjalic, A.
(2012). CLiMF: Learning to Maximize Reciprocal Rank with Collaborative
Less-is-More Filtering. Proc. The 6th ACM Conference on Recommender
Systems, Dublin, Ireland, 139-146. ACM.
Shi, Y., Larson, M., and Hanjalic, A. (2010). List-wise Learning to Rank with Matrix
factorization for Collaborative Filtering. Proc. The 4th ACM Conference on
Recommender Systems, Barcelona, Spain, 269-272. ACM.
Sigurbjörnsson, B., and Van Zwol, R. (2008). Flickr Tag Recommendation based on
Collective Knowledge. Proc. The 17th International Conference on World
Wide Web, Beijing, China, 327-336. ACM.
Singh Anand, S., and Mobasher, B. (2005). Intelligent Techniques for Web
Personalization. In Lecture notes in computer science. 1-36, Springer Berlin
Heidelberg.
Su, X., and Khoshgoftaar, T. M. (2009). A Survey of Collaborative Filtering
Techniques. Advances in Artificial Intelligence, 2009(January), 4:2-4:2.
Sun, J. T., Zeng, H. J., Liu, H., Lu, Y., and Chen, Z. (2005). CubeSVD: a novel
approach to personalized Web search. Proc. The 14th international
conference on World Wide Web, Chiba, Japan, 382-390. ACM.
Symeonidis, P., Nanopoulos, A., and Manolopoulos, Y. (2008). Tag
Recommendations based on Tensor Dimensionality Reduction. Proc. The
2008 ACM Conference on Recommender Systems, Lausanne, Switzerland,
43-50. ACM.
Symeonidis, P., Nanopoulos, A., and Manolopoulos, Y. (2010). A Unified
Framework for Providing Recommendations in Social Tagging Systems
Based on Ternary Semantic Analysis. IEEE Transactions on Knowledge and
Data Engineering, 22(2), 179-192.
Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R.,
Botstein, D., and Altman, R. B. (2001). Missing Value Estimation Methods
for DNA Microarrays. Bioinformatics, 17(6), 520-525.
Tso-Sutter, K. H. L., Marinho, L. B., and Schmidt-Thieme, L. (2008). Tag-aware
Recommender Systems by Fusion of Collaborative Filtering Algorithms.
Proc. The 2008 ACM symposium on Applied computing, Fortaleza, Ceara,
Brazil, 1995-1999. ACM.
Tsourakakis, C. E. (2009). MACH: Fast Randomized Tensor Decompositions. arXiv
preprint arXiv:0909.4969.
Tucker, L. R. (1966). Some Mathematical Notes on Three-mode Factor Analysis.
Psychometrika, 31(3), 279-311.
Vasilescu, M. A. O., and Terzopoulos, D. (2002). Multilinear Analysis of Image
Ensembles: Tensorfaces. In Computer Vision—ECCV 2002. 447-460,
Springer.
Venugopal, K. R., Srinivasa, K. G., and Patnaik, L. M. (2009). Algorithms for Web
Personalization. In Soft Computing for Data Mining Applications. 217-230,
Springer Berlin Heidelberg.
Voorhees, E. M. (1999). The TREC-8 Question Answering Track Report. Proc.
TREC-8, 99: 77-82.
Vozalis, M. G., and Margaritis, K. G. (2007). Using SVD and demographic data for
the enhancement of generalized Collaborative Filtering. Information
Sciences, 177(15), 3017-3037.
Wang, Y., Wang, L., Li, Y., and He, D. (2013). A Theoretical Analysis of NDCG
Ranking Measures. Proc. The 26th Annual Conference on Learning Theory
(COLT 2013), Princeton, NJ, USA.
Weimer, M., Karatzoglou, A., Le, Q. V., and Smola, A. (2007). Maximum Margin
Matrix Factorization for Collaborative Ranking. Advances in Neural
Information Processing Systems.
Wetzker, R., Zimmermann, C., and Bauckhage, C. (2008). Analyzing social
bookmarking systems: A del.icio.us cookbook. Proc. The ECAI 2008
Mining Social Data Workshop, Patras, Greece, 26-30.
Wu, M., Chang, Y., Zheng, Z., and Zha, H. (2009). Smoothing DCG for Learning to
Rank: A Novel Approach using Smoothed Hinge Functions. Proc. The 18th
ACM Conference on Information and Knowledge Management, Hong Kong,
China, 1923-1926. ACM.
Xu, J., and Li, H. (2007). AdaRank: A Boosting Algorithm for Information Retrieval.
Proc. The 30th Annual International ACM SIGIR Conference on Research
and Development in Information Retrieval, Amsterdam, The Netherlands,
391-398. ACM.
Xu, W., Liu, X., and Gong, Y. (2003). Document Clustering based on Non-negative
Matrix Factorization. Proc. The 26th annual International ACM SIGIR
Conference on Research and Development in Information Retrieval, Toronto,
Canada, 267-273. ACM.
Zhang, Z.-K., Zhou, T., and Zhang, Y.-C. (2011). Tag-Aware Recommender
Systems: A State-of-the-Art Survey. Journal of Computer Science and
Technology, 26(5), 767-777.
Zhang, Z. K., Liu, C., Zhang, Y. C., and Zhou, T. (2010). Solving the Cold-start
Problem in Recommender Systems with Social Tags. EPL (Europhysics
Letters), 92(2), 28002-p1-28002-p6.
Zhen, Y., Li, W.-J., and Yeung, D.-Y. (2009). TagiCoFi: Tag Informed Collaborative
Filtering. Proc. The 3rd ACM Conference on Recommender Systems, New
York, USA, 69-76.