
PERSONALIZED RANKING FOR TAG-BASED ITEM RECOMMENDATION SYSTEM USING TENSOR MODEL

Noor Ifada

B.Eng, M.ISD

Submitted in fulfilment of the requirements for the degree of

Doctor of Philosophy

School of Electrical Engineering and Computer Science

Faculty of Science and Engineering

Queensland University of Technology

2016

Personalized Ranking for Tag-based Item Recommendation System using Tensor Model i

Keywords

Binary Data, Boolean Interpretation Scheme, Candidate Item, Discounted Cumulative Gain, Graded-relevance Interpretation Scheme, Graded-relevance Data, Graded Average Precision, Interpretation Scheme, Latent Factor, Learning-to-rank Approach, List-wise based Ranking, Mean Square Error, Multi-graded Data, Optimization Criterion, Point-wise based Ranking, Probabilistic Ranking, Set-based Interpretation Scheme, Social Tagging System, Tag-based Item Recommendation, Tagging Data, Tag Preference, Tag Usage, Tensor Factorization, Tensor Model, Tensor Reconstruction, Top-𝑁 Recommendation, User-Tag Set Interpretation Scheme, User Profile, Weighted Scheme.


Abstract

Social Tagging Systems (STSs) have gained great popularity on the Internet because they allow users to annotate items of interest with freely defined tags. These tags can then be used for organising, retrieving, and sharing items with others. By learning from a user's past tagging behaviour with a tensor model, an STS can generate a list of recommended items that may interest the user. Despite this popularity, current tag-based item recommendation methods face several challenges.

Firstly, the tagging data interpretation scheme defines how the user profile is represented in a tensor model, and it therefore strongly affects recommendation performance. Current interpretation schemes overgeneralise by labelling the entries of the non-observed tagging data as “irrelevant”.

Secondly, when using the reconstructed tensor for recommendation, existing methods disregard users' past tagging activities. This is inappropriate, because these activities have been found to influence users' preferences for the recommended items.

Thirdly, recommendations can be generated directly from the tensor latent factors, which avoids the expensive tensor reconstruction process. However, the user profile representation produced by an interpretation scheme has particular characteristics, so this approach requires an efficient “learning-to-rank” model to govern the recommendation process.
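The boolean scheme, one of the existing interpretation schemes named in the keywords, shows the overgeneralisation described above. Every (user, item, tag) entry that was not observed becomes 0, that is, “irrelevant”.

The sketch below is minimal and makes two assumptions: indices are zero-based integers, and the triples are invented toy data on a 3 × 4 × 5 tensor. The function name `boolean_tensor` is hypothetical.

```python
import numpy as np

def boolean_tensor(triples, n_users, n_items, n_tags):
    """Initial tensor under the boolean interpretation scheme (hypothetical helper).

    Each observed (user, item, tag) triple becomes 1. Every other entry, including
    every non-observed one, is left at 0, i.e. interpreted as "irrelevant".
    """
    y = np.zeros((n_users, n_items, n_tags), dtype=np.int8)
    for u, i, t in triples:
        y[u, i, t] = 1
    return y

# Invented toy tagging data as zero-based (user, item, tag) indices.
observed = [(0, 0, 0), (0, 0, 1), (0, 2, 1), (1, 1, 3), (2, 3, 4)]
Y = boolean_tensor(observed, n_users=3, n_items=4, n_tags=5)

print(Y.shape)                 # (3, 4, 5)
print(int(Y.sum()))            # 5 observed entries
print(Y.size - int(Y.sum()))   # 55 entries, all interpreted as irrelevant
```

Only 5 of the 60 entries carry an observed signal. The other 55 are treated identically, whether or not the user might actually find the item relevant.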

This thesis tackles these challenges by developing two efficient tagging data interpretation schemes and four ranking methods for tag-based item recommendation systems, based on tensor models and learning-to-rank approaches. The developed interpretation schemes, namely User-Tag Set (UTS) and graded-relevance, apply ranking constraints when interpreting the tagging data. This yields a ranked representation and richer data. The developed ranking methods fall into two categories, point-wise and list-wise based ranking. These treat the recommendation task as regression/classification and as ranking, respectively.
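The distinction between point-wise and list-wise approaches matters because the two can disagree. In the toy comparison below, model A has the lower squared error (the point-wise view), yet it ranks the only relevant item second (the list-wise view).

All scores are invented for illustration and are not results from the thesis. The helper names are hypothetical.

```python
def mse(truth, scores):
    """Point-wise view: mean squared error of each entry, ignoring order."""
    return sum((y - s) ** 2 for y, s in zip(truth, scores)) / len(truth)

def rank_of_first_relevant(truth, scores):
    """List-wise view: 1-based position of the first relevant item after sorting by score."""
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    for position, k in enumerate(order, start=1):
        if truth[k] > 0:
            return position
    return None

truth = [1, 0, 0, 0, 0]                   # only item 0 is relevant (toy data)
model_a = [0.30, 0.35, 0.0, 0.0, 0.0]     # hypothetical scores
model_b = [0.30, 0.20, 0.20, 0.20, 0.20]  # hypothetical scores

for name, s in [("A", model_a), ("B", model_b)]:
    print(name, round(mse(truth, s), 4), rank_of_first_relevant(truth, s))
# A 0.1225 2
# B 0.13 1
```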

The first point-wise based ranking method developed, “Tensor-based Item Recommendation using Probabilistic Ranking” (TRPR), has two aims:

(1) to improve scalability during the tensor reconstruction process by implementing a memory-efficient technique;

(2) to increase recommendation accuracy by ranking the items of the reconstructed tensor with a subsequent probabilistic approach.

The second method, “Recommendation Ranking using Weighted Tensor” (We-Rank), deals with the sparsity problem and improves recommendation accuracy during the learning-to-rank process. We-Rank learns the tensor recommendation model with a weighted scheme that gives rewards or penalties to the observed and non-observed entries of each user-item set. As a result, observed entries are weighted more heavily than non-observed ones.
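Two ideas in this paragraph can be illustrated with a CP model (one of the two factorization models listed in the outline):

- bounding memory during tensor reconstruction;
- weighting observed entries above non-observed ones.

This is a sketch under stated assumptions, not TRPR's or We-Rank's actual procedure:

- The block size b is assumed to count user slices.
- The weights w_obs and w_non are hypothetical.
- Every tensor entry is weighted, rather than only the entries of observed user-item sets.

All function names are illustrative.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Full CP reconstruction: y_hat[u, i, t] = sum_f A[u, f] * B[i, f] * C[t, f]."""
    return np.einsum("uf,if,tf->uit", A, B, C)

def cp_reconstruct_blocks(A, B, C, b):
    """Yield (first_user, slab) pairs, reconstructing b users at a time.

    Only a b x |I| x |T| slab is held in memory instead of the whole tensor
    (assumption: b counts user slices).
    """
    for start in range(0, A.shape[0], b):
        yield start, np.einsum("uf,if,tf->uit", A[start:start + b], B, C)

def weighted_squared_loss(Y, Y_hat, w_obs=1.0, w_non=0.1):
    """Squared error where observed entries (Y == 1) get weight w_obs and all
    other entries get the smaller weight w_non (hypothetical values)."""
    W = np.where(Y == 1, w_obs, w_non)
    return float(np.sum(W * (Y - Y_hat) ** 2))

# Hypothetical CP factors: 3 users, 4 items, 5 tags, F = 2 latent factors.
rng = np.random.default_rng(42)
F = 2
A, B, C = rng.random((3, F)), rng.random((4, F)), rng.random((5, F))

# Hypothetical boolean tensor with two observed entries.
Y = np.zeros((3, 4, 5))
Y[0, 0, 1] = 1.0
Y[1, 2, 3] = 1.0

full = cp_reconstruct(A, B, C)
for start, slab in cp_reconstruct_blocks(A, B, C, b=2):
    # Each slab equals the matching users of the full reconstruction.
    assert np.allclose(slab, full[start:start + slab.shape[0]])

print(weighted_squared_loss(Y, full))
```

In practice, each slab would be consumed and discarded before the next one is built, for example by keeping only each user's top-scoring items. The comparison with `full` exists only to check the sketch.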

The first list-wise based ranking method developed, “DCG Optimization for Learning-to-Rank” (Do-Rank), learns from a user profile built from the multi-graded data produced by the proposed User-Tag Set (UTS) scheme. Do-Rank optimizes the tensor recommendation model with respect to Discounted Cumulative Gain (DCG), a ranking evaluation measure suited to learning from multi-graded data.

The second method, “GAP Optimization for Learning-to-Rank” (Go-Rank), learns from a user profile built from the graded-relevance data produced by the proposed graded-relevance scheme. Go-Rank optimizes the tensor recommendation model with respect to Graded Average Precision (GAP), a ranking evaluation measure suited to learning from graded-relevance data.
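DCG and GAP, the measures that Do-Rank and Go-Rank optimize, can be computed on a fixed ranking as follows. The methods presumably optimize smoothed versions through the Ranking Smoothing step listed in the outline; this sketch only evaluates the measures.

- **DCG:** the sketch uses the common gain 2^g − 1 with a log2(position + 1) discount.
- **GAP:** the sketch uses the usual formulation, in which g_c is the probability that a user's relevance threshold is grade c.

Assumptions:

- The numeric grades and the g values are hypothetical. The mapping of labels such as 𝑙𝑖𝑘𝑒𝑙𝑦 𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡 to numbers is not taken from the thesis.
- Every relevant item is assumed to appear in the ranked list.

```python
import math

def dcg(grades, k=None):
    """DCG of a ranked list of relevance grades (position 1 first), using the
    common gain 2**g - 1 and discount log2(position + 1)."""
    if k is not None:
        grades = grades[:k]
    return sum((2 ** g - 1) / math.log2(pos + 1)
               for pos, g in enumerate(grades, start=1))

def ndcg(grades, k=None):
    """DCG divided by the DCG of the ideal (descending) ordering of the same grades."""
    ideal = dcg(sorted(grades, reverse=True), k)
    return dcg(grades, k) / ideal if ideal > 0 else 0.0

def gap(grades, g):
    """Graded Average Precision of a ranked list.

    grades: integer grades in rank order; 0 = not relevant, 1..C = relevant grades.
    g: g[c - 1] is the probability that the user's relevance threshold is grade c
       (values sum to 1). All relevant items are assumed to appear in the list.
    """
    cum = [0.0]                       # cum[c] = g_1 + ... + g_c, with cum[0] = 0
    for p in g:
        cum.append(cum[-1] + p)
    numerator = 0.0
    for n, c_n in enumerate(grades, start=1):
        if c_n == 0:
            continue
        # delta(m, n) = cum[min(c_m, c_n)]; it is 0 whenever item m is not relevant
        numerator += sum(cum[min(c_m, c_n)] for c_m in grades[:n]) / n
    denominator = sum(cum[c] for c in grades if c > 0)
    return numerator / denominator if denominator > 0 else 0.0

ranking = [2, 0, 1, 2]   # hypothetical grades of four recommended items, in rank order
print(round(ndcg(ranking), 4))              # 0.8886
print(round(gap(ranking, g=[0.5, 0.5]), 4))  # 0.7833
```

With a single relevance grade and g = [1.0], GAP reduces to ordinary average precision. This is one way to sanity-check the implementation.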

The developed methods are evaluated on real-world, freely available datasets from tagging systems. Empirical analyses show that the UTS scheme efficiently interprets the tagging data as rich multi-graded data with the ordinal relevance set {𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡, 𝑖𝑛𝑑𝑒𝑐𝑖𝑠𝑖𝑣𝑒, 𝑖𝑟𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡}. Similarly, the graded-relevance scheme efficiently interprets the tagging data as rich graded-relevance data with the ordinal relevance set {𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡, 𝑙𝑖𝑘𝑒𝑙𝑦 𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡, 𝑖𝑛𝑑𝑒𝑐𝑖𝑠𝑖𝑣𝑒, 𝑖𝑟𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡}.

The experimental results show that the proposed methods outperform the benchmarking methods. This confirms that combining an interpretation scheme with a learning-to-rank approach has a positive influence on recommendation. The memory-efficient technique solves the scalability issue that arises during the tensor reconstruction process, whereas the weighted scheme and the efficient interpretation schemes tackle the sparsity issue.

Among the learning-to-rank methods, the list-wise based ranking methods generally achieve better Normalized Discounted Cumulative Gain (NDCG) than the point-wise based ranking methods. The point-wise methods, in turn, achieve better Average Precision (AP) and Mean Average Precision (MAP).
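For reference, AP and MAP over binary relevance judgments can be computed as follows. The helper names are hypothetical and the ranked lists are invented toy data.

The sketch assumes that each user's ranked list contains all of that user's relevant items, so AP divides by the number of hits found in the list.

```python
def average_precision(relevance):
    """AP of one ranked list of binary judgments (1 = relevant):
    the mean of precision@n taken at each relevant position n."""
    hits, total = 0, 0.0
    for n, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / n
    return total / hits if hits else 0.0

def mean_average_precision(ranked_lists):
    """MAP: the mean of AP over users (one ranked list per user)."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)

user_lists = [
    [1, 0, 1, 0],  # hypothetical user 1: relevant items at ranks 1 and 3
    [0, 1, 0, 0],  # hypothetical user 2: relevant item at rank 2
]
print(round(average_precision(user_lists[0]), 4))    # 0.8333
print(round(mean_average_precision(user_lists), 4))  # 0.6667
```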

This thesis contributes to research on tag-based recommendation systems in two ways: it interprets tagging data efficiently, and it applies learning-to-rank approaches to the tensor used as the recommendation model. The tagging data interpretation schemes and learning-to-rank approaches play an important role in significantly improving the quality of tag-based item recommendation.


Dedication

I dedicate this thesis to:

My Mother

My Father

My Brothers


Table of Contents

Keywords
Abstract
Dedication
Table of Contents
List of Figures
List of Tables
List of Abbreviations
Statement of Original Authorship
Acknowledgements

CHAPTER 1: INTRODUCTION
1.1 Background and Motivations
1.2 Research Questions
1.3 Research Objectives
1.4 Research Contributions
1.5 Research Significance
1.6 Publications
1.7 Thesis Outline
1.8 Chapter Summary

CHAPTER 2: LITERATURE REVIEW
2.1 Web Personalization
2.1.1 Content-based Approaches
2.1.2 Collaborative Filtering Approaches
2.1.3 Hybrid Approaches
2.1.4 Summary and Discussion
2.2 Tag-based Item Recommendation Systems
2.2.1 Social Tagging Systems
2.2.2 User Profile Modelling Approaches
2.2.2.1 Two-Dimensional Approaches
2.2.2.2 Multi-Dimensional Approaches
2.2.3 Tagging Data Interpretation Schemes
2.2.3.1 The boolean Scheme
2.2.3.2 The set-based Scheme
2.2.4 Summary and Discussion
2.3 Ranking-based Recommendation Approaches
2.3.1 Point-wise Based Ranking Approaches
2.3.1.1 Regression based algorithm
2.3.1.2 Classification based algorithm
2.3.2 Pair-wise Based Ranking Approaches
2.3.2.1 Regression based algorithm
2.3.2.2 Classification based algorithm
2.3.3 List-wise Based Ranking Approaches
2.3.3.1 Directly Optimizing Ranking Evaluation Measure
2.3.3.2 Minimizing List-wise Loss
2.3.4 Summary and Discussion: Ranking based recommendation
2.4 Chapter Summary and Concluding Remarks

CHAPTER 3: RESEARCH DESIGN
3.1 Introduction
3.2 Research Design
3.2.1 Phase-One: Tagging Data Pre-Processing
3.2.1.1 The boolean Scheme
3.2.1.2 The User-Tag set (UTS) Scheme
3.2.1.3 The graded-relevance Scheme
3.2.2 Phase-Two: Generating Recommendations with Ranking Methods
3.2.2.1 Phase-Two (a): Point-wise based Ranking Approaches
3.2.2.1.1 TRPR: Probabilistic Ranking
3.2.2.1.2 We-Rank: Weighted Tensor Approach for Ranking
3.2.2.2 Phase-Two (b): List-wise based Ranking Approaches
3.2.2.2.1 Do-Rank: Learning from Multi-graded Data
3.2.2.2.2 Go-Rank: Learning from Graded-relevance Data
3.3 Datasets
3.3.1 Experimental Settings
3.4 Evaluation Metrics
3.4.1 Point-wise based Ranking Approach
3.4.2 List-wise based Ranking Approach
3.4.2.1 Average Precision (AP) and Mean Average Precision (MAP)
3.4.2.2 Discounted Cumulative Gain (DCG) and Normalized Discounted Cumulative Gain (NDCG)
3.5 Benchmarking Methods
3.5.1 MAX Method
3.5.2 Pairwise Interaction Tensor Factorization (PITF) Method
3.5.3 CF-based method that applied the Candidate Tag Set (CTS) Method
3.6 Chapter Summary

CHAPTER 4: POINT-WISE BASED RANKING METHODS
4.1 Introduction
4.1.1 Challenges
4.1.2 Proposed Solutions
4.2 TRPR: Probabilistic Ranking
4.2.1 Overview
4.2.2 User Profile Construction
4.2.3 Learning-to-Rank Procedure
4.2.3.1 Optimization Criterion and Factorization Technique
4.2.3.2 Latent Factors Generation
4.2.4 Recommendation Generation
4.2.4.1 Tensor Reconstruction
4.2.4.2 Candidate Item and Tag Preference Sets Generation
4.2.4.3 Top-N Item Recommendation Generation via Probabilistic Ranking
4.2.5 Empirical Evaluation
4.2.5.1 Choosing the Latent Factor Matrix Size F
4.2.5.2 Accuracy Performance
4.2.5.3 Impact of Tag Preference Set Size
4.2.5.4 Scalability
4.2.6 Summary of Probabilistic Ranking
4.3 We-Rank: Weighted Tensor Approach for Ranking
4.3.1 Overview
4.3.2 User Profile Construction
4.3.3 Learning-to-Rank Procedure
4.3.3.1 Optimization Criterion and Factorization Technique
4.3.3.2 Weighted Tensor
4.3.3.3 Latent Factors Generation
4.3.4 Recommendation Generation
4.3.5 Empirical Evaluation
4.3.5.1 Impact of Tag Preference Size
4.3.5.2 Primary Tensor Y and Weighted Tensor W
4.3.5.3 Accuracy Performance
4.3.6 Summary of Weighted Tensor Factorization
4.4 Chapter Summary

CHAPTER 5: LIST-WISE BASED RANKING METHODS
5.1 Introduction
5.1.1 Challenges
5.1.2 Proposed Solutions
5.2 Do-Rank: Learning From Multi-Graded Data
5.2.1 Overview
5.2.2 User Profile Construction
5.2.3 Learning-to-Rank Procedure
5.2.3.1 Optimization Criterion and Factorization Technique
5.2.3.2 Ranking Smoothing
5.2.3.3 Latent Factors Generation
5.2.3.4 Complexity Analysis and Convergence
5.2.4 Recommendation Generation
5.2.5 Empirical Evaluation
5.2.5.1 Accuracy Performance
5.2.5.2 Impact of UTS scheme
5.2.5.3 Scalability
5.2.5.4 Convergence
5.2.6 Summary of Learning from Multi-Graded Data
5.3 Go-Rank: Learning From Graded-relevance data
5.3.1 Overview
5.3.2 User Profile Construction
5.3.3 Learning-to-Rank Procedure
5.3.3.1 Optimization Criterion and Factorization Technique
5.3.3.2 Ranking Smoothing
5.3.3.3 Latent Factors Generation
5.3.3.4 Complexity Analysis and Convergence
5.3.4 Recommendation Generation
5.3.5 Empirical Evaluation
5.3.5.1 Impact of graded-relevance Scheme
5.3.5.2 Accuracy Performance
5.3.5.3 Impact of Probability Values
5.3.5.4 Scalability
5.3.5.5 Convergence
5.3.6 Summary of Learning from Graded-Relevance Data
5.4 Chapter Summary

CHAPTER 6: PERFORMANCE COMPARISONS AND ANALYSIS
6.1 Impact of Interpretation Scheme to Tensor Entries Populations
6.2 Impact of p-core to Tensor Entries Populations and Method Performances
6.3 Impact of Users Tagging Behaviours to Tensor Entries Populations
6.4 Impact of “Relevant” Entries To Method Performances
6.5 Impact of Handling “Likely Relevant” Entries to Method Performances
6.6 Accuracy Comparisons of the Proposed Methods
6.7 Point-wise based ranking Methods Versus List-wise based Ranking Methods
6.8 Proposed Methods Versus Benchmarking Methods
6.9 Efficiency Versus Method Performances
6.10 Computation Complexity
6.11 Time complexity
6.12 Strengths and Shortcomings of The Proposed Methods
6.13 Chapter Summary

CHAPTER 7: CONCLUSIONS
7.1 Summary of Contributions
7.2 Summary of Findings
7.3 Limitations and Future Works

BIBLIOGRAPHY


List of Figures

Figure 1.1. A sample of tagging data that holds the user, item, tag ternary relations ............................. 2

Figure 2.1. Example of popular Social Tagging System Websites ....................................................... 23

Figure 2.2. Long-tail distribution of: (a) items of bookmarked URLs, (b) users who made the

bookmarks, and (c) tags used in the bookmarks – captured from tagging data of

Delicious website (Li et al., 2008) ....................................................................................... 24

Figure 2.3. Projection of user, item, tag relation into three two-dimensional matrices ........................ 25

Figure 2.4. The Tucker factorization model for a third-order tensor .................................................... 29

Figure 2.5. The CP factorization model for third-order tensor .............................................................. 29

Figure 2.6. A toy example with U = u1, u2, u3, I = i1, i2, i3, i4, and T = t1, t2, t3, t4, t5: (a)

The observed tagging data, and the initial tensor Y ∈ R3 × 4 × 5 for which entries

are generated by implementing (b) the boolean, and (c) the set-based schemes .................. 32

Figure 2.7. Learning-to-rank framework, adapted from (Liu, 2009) .................................................... 37

Figure 3.1. The research design ............................................................................................................ 52

Figure 3.2. A toy example of entries from the observed tagging data Aob ........................................... 54

Figure 3.3. The toy example of for User 1 (u1) profile built from various interpretation

schemes: (a) boolean, (b) UTS, and (c) graded-relevance .................................................. 56

Figure 3.4. A snapshot of the Delicious dataset .................................................................................... 61

Figure 3.5. A snapshot of the LastFM dataset ....................................................................................... 62

Figure 3.6. A snapshot of the CiteULike dataset .................................................................................. 62

Figure 3.7. A snapshot of the MovieLens dataset ................................................................................. 63

Figure 4.1. Overview of the Probabilistic Ranking method (TRPR) ..................................................... 78

Figure 4.2. Example of initial tensor Y ∈ R3 × 4 × 5 as the representation of user profile in

which entries are generated by implementing the boolean interpretation scheme to

the toy example in Figure 3.2 ............................................................................................... 80

Figure 4.3. The Tucker factorization model for a third-order tensor .................................................... 81

Figure 4.4. The CP factorization model for a third-order tensor ........................................................... 81

Figure 4.5. Example of three ways matricization of a tensor Y ∈ R3 × 4 × 5 ...................................... 82

Figure 4.6. The TRPR learning algorithm, adapted from (Kutty et al., 2012) ....................................... 84

Figure 4.7. The TRPR tensor reconstruction algorithm ......................................................................... 86

Figure 4.8. Example of the tensor reconstruction process using the memory-efficient approach, where $Y \in \mathbb{R}^{3 \times 4 \times 5}$, $Q = 3$, $R = 4$, $S = 5$, $F = 2$, and $b = 2$ .......... 88

Figure 4.9. Example of the reconstructed tensor $Y \in \mathbb{R}^{3 \times 4 \times 5}$ .......... 89

Figure 4.10. The probabilistic ranking for Top-N item recommendation generation algorithm ........... 91

Figure 4.11. Example of the tensor model from the toy dataset, displayed as a table showing only the non-negative and non-zero values: (a) initial tensor $Y \in \mathbb{R}^{3 \times 4 \times 5}$, and (b) reconstructed tensor $Y \in \mathbb{R}^{3 \times 4 \times 5}$ .......... 92

Figure 4.12. Performance comparison of TRPR-CP with an increasing number of F........................... 95

Figure 4.13. F1-Score at various Top-N positions on Delicious dataset ............................................... 95


Figure 4.14. F1-Score at various Top-N positions on LastFM dataset .................................................. 96

Figure 4.15. F1-Score at various Top-N positions on CiteULike dataset .............................................. 96

Figure 4.16. F1-Score at various Top-N positions on MovieLens dataset ............................................ 97

Figure 4.17. Impact of tag preference set size on Delicious dataset ...................................................... 99

Figure 4.18. Impact of tag preference set size on LastFM dataset ........................................................ 99

Figure 4.19. Impact of tag preference set size on CiteULike dataset .................................................... 99

Figure 4.20. Impact of tag preference set size on MovieLens dataset ................................................... 99

Figure 4.21. Scalability comparison by varying tensor dimensionality on Delicious dataset ............. 100

Figure 4.22. Overview of the weighted tensor approach for ranking method (We-Rank) ................... 102

Figure 4.23. The CP factorization model for a third-order tensor .......... 105

Figure 4.24. The user Tag Usage Likeliness generation algorithm ..................................................... 107

Figure 4.25. Example of the resulting matricizations of tensor $Y \in \mathbb{R}^{3 \times 4 \times 5}$: (a) mode-1 matricization $Y_{(1)} \in \mathbb{R}^{3 \times 20}$, and (b) mode-3 matricization $Y_{(3)} \in \mathbb{R}^{5 \times 12}$ .......... 108

Figure 4.26. Example of the resulting latent feature matrices: (a) user latent feature matrix $A \in \mathbb{R}^{3 \times 2}$, and (b) tag latent feature matrix $B \in \mathbb{R}^{5 \times 2}$ .......... 108

Figure 4.27. Example of the resulting User Tag Usage Likeliness matrix $L \in \mathbb{R}^{3 \times 5}$ .......... 108

Figure 4.28. The construction algorithm for the weighted tensor $W \in \mathbb{R}^{Q \times R \times S}$ .......... 109

Figure 4.29. Example of: (a) the primary tensor $Y \in \mathbb{R}^{3 \times 4 \times 5}$, and (b) the resulting weighted tensor $W \in \mathbb{R}^{3 \times 4 \times 5}$ .......... 110

Figure 4.30. The We-Rank learning algorithm .................................................................................... 111

Figure 4.31. Impact of tag preference set size on Delicious dataset .................................................... 113

Figure 4.32. Impact of tag preference set size on LastFM dataset ...................................................... 113

Figure 4.33. Impact of tag preference set size on CiteULike dataset .................................................. 113

Figure 4.34. Impact of tag preference set size on MovieLens dataset ................................................. 113

Figure 4.35. Densities of the weighted tensor $W$ at various tag preference set sizes on: (a) Delicious, (b) LastFM, (c) CiteULike, and (d) MovieLens datasets .......... 114

Figure 5.1. The initial tensor $Y \in \mathbb{R}^{3 \times 4 \times 5}$, representing the user profile, whose entries are generated by implementing the (a) set-based and (b) UTS interpretation schemes .......... 125

Figure 5.2. The CP factorization model for a third-order tensor .......... 128

Figure 5.3. The comparison between DCG and the smoothed approximation of DCG (sDCG).......... 130

Figure 5.4. The Do-Rank learning algorithm ...................................................................................... 133

Figure 5.5. The Do-Rank scalability ................................................................................................... 140

Figure 5.6. The Do-Rank convergence criterion ................................................................................. 141

Figure 5.7. Example of the initial tensor $Y \in \mathbb{R}^{3 \times 4 \times 5}$, representing the user profile, whose entries are generated by implementing the (a) set-based and (b) graded-relevance interpretation schemes .......... 144

Figure 5.8. The comparison between GAP and the smoothed approximation of GAP (sGAP) ........... 149

Figure 5.9. The Go-Rank learning algorithm ...................................................................................... 151

Figure 5.10. Go-Rank improvement over PITF ................................................................................... 158

Figure 5.11. Impact of probability values on the Delicious dataset .................................................... 160

Figure 5.12. Impact of probability values on the LastFM dataset ....................................................... 160


Figure 5.13. Impact of probability values on the CiteULike dataset ................................................... 160

Figure 5.14. Impact of probability values on the MovieLens dataset ................................................. 160

Figure 5.15. The Go-Rank scalability ................................................................................................. 161

Figure 5.16. The Go-Rank convergence .............................................................................................. 162

Figure 6.1. Comparison of size of p-core over tensor entries population on boolean, UTS, and graded-relevance schemes .......... 170

Figure 6.2. Comparison of p-core over methods' performances using NDCG .......... 171

Figure 6.3. Comparison of p-core over methods' performances using AP .......... 171

Figure 6.4. Comparison of p-core over methods' performances using MAP .......... 171

Figure 6.5. The statistics of user-item and user-tag sets on: (a) Delicious, (b) LastFM, (c) CiteULike, and (d) MovieLens datasets .......... 172

Figure 6.6. Comparison of “relevant” over “irrelevant” entries population ........................................ 173

Figure 6.7. Comparison of “relevant” over “likely relevant” entries .................................................. 173

Figure 6.8. Comparison of “relevant” entries population over methods' performances using NDCG .......... 174

Figure 6.9. Comparison of “relevant” entries population over methods' performances using AP .......... 174

Figure 6.10. Comparison of “relevant” entries population over methods' performances using MAP .......... 174

Figure 6.11. Comparison of MAX-boolean over MAX-graded performances showing the impact of inappropriately handling the “likely relevant” entries .......... 175

Figure 6.12. Comparison of methods' performances on Delicious dataset .......... 176

Figure 6.13. Comparison of methods' performances on LastFM dataset .......... 178

Figure 6.14. Comparison of methods' performances on CiteULike dataset .......... 178

Figure 6.15. Comparison of methods' performances on MovieLens dataset .......... 179

Figure 6.16. Comparison of the proposed methods' performances as the average over all datasets using NDCG, AP, and MAP .......... 180

Figure 6.17. The comparison of efficiency over method performances .............................................. 184

Figure 6.18. The comparison of time complexity of proposed methods ............................................. 187


List of Tables

Table 1.1. An example of item recommendations based on tagging data in Figure 1.1 .......................... 3

Table 1.2. Summary of each developed method ..................................................................................... 9

Table 1.3. The summary showing how research questions, contributions, corresponding chapters and publications fit together in the thesis .......... 15

Table 2.1. Recommendation Approach and Techniques, summarised from (Adomavicius and Tuzhilin, 2005) .......... 19

Table 2.2. Classification of tag-based recommendation research according to the user profile modelling approaches, the data interpretation schemes, and the types of recommendation .......... 35

Table 2.3. Classification of ranking-based recommendation research according to ranking approaches, loss functions, and feedback forms. Here, I = Implicit and E = Explicit. .......... 47

Table 3.1. Details of the various characteristics of datasets .......... 64

Table 3.2. The details of dataset statistics resulting from the implementation of various p-cores .......... 65

Table 3.3. The summaries of the proposed ranking methods ................................................................ 73

Table 4.1. Average TRPR accuracy improvement over MAX .............................................................. 98

Table 4.2. The density comparison of non-zero entries generated from $D_{train}$ on the primary tensor $Y$ and weighted tensor $W$ ($v = 50$) .......... 115

Table 4.3. F1-Score at various Top-N positions on Delicious dataset................................................. 116

Table 4.4. F1-Score at various Top-N positions on LastFM dataset ................................................... 116

Table 4.5. F1-Score at various Top-N positions on CiteULike dataset ............................................... 116

Table 4.6. F1-Score at various Top-N positions on MovieLens dataset .............................................. 116

Table 5.1. NDCG, AP, and MAP on Delicious dataset ....................................................................... 136

Table 5.2. NDCG, AP, and MAP on LastFM dataset ......................................................................... 136

Table 5.3. NDCG, AP, and MAP on CiteULike dataset ..................................................................... 137

Table 5.4. NDCG, AP, and MAP on MovieLens dataset .................................................................... 137

Table 5.5. The comparison of tensor entries population distribution generated from $D_{train}$ using boolean, set-based and UTS schemes .......... 139

Table 5.6. The comparison of tensor entries population distribution generated from $D_{train}$ using boolean, set-based, and graded-relevance schemes .......... 154

Table 5.7. NDCG, AP, and MAP on Delicious dataset ....................................................................... 156

Table 5.8. NDCG, AP, and MAP on LastFM dataset ......................................................................... 156

Table 5.9. NDCG, AP, and MAP on CiteULike dataset ..................................................................... 157

Table 5.10. NDCG, AP, and MAP on MovieLens dataset .................................................................. 157

Table 6.1. The proposed and benchmarking methods' performances on Delicious dataset .......... 167

Table 6.2. The proposed and benchmarking methods' performances on LastFM dataset .......... 167

Table 6.3. The proposed and benchmarking methods' performances on CiteULike dataset .......... 168

Table 6.4. The proposed and benchmarking methods' performances on MovieLens dataset .......... 168


Table 6.5. The comparison of tensor entries population distribution generated from $D_{train}$ using boolean, UTS, and graded-relevance schemes .......... 169

Table 6.6. The comparison of complexity of proposed methods ........................................................ 185


List of Abbreviations

ALS Alternating Least Squares

AP Average Precision

AUC Area Under the Receiver Operating Characteristic Curve

CF Collaborative Filtering

CP Candecomp/Parafac

DCG Discounted Cumulative Gain

Do-Rank DCG Optimization for Learning-to-Rank

ERR Expected Reciprocal Rank

GAP Graded Average Precision

Go-Rank GAP Optimization for Learning-to-Rank

HOOI Higher-Order Orthogonal Iteration

HOSVD Higher-Order SVD

IDCG Ideal Discounted Cumulative Gain

MAP Mean Average Precision

MRR Mean Reciprocal Rank

MSE Mean Square Error

NDCG Normalized Discounted Cumulative Gain

PITF Pairwise Interaction Tensor Factorization

RR Reciprocal Rank

STS Social Tagging Systems

SVD Singular Value Decomposition

TF-IDF Term Frequency–Inverse Document Frequency


TRPR Tensor-based Item Recommendation using Probabilistic Ranking

UTS User-Tag Set

We-Rank Recommendation Ranking using Weighted Tensor

wMSE weighted Mean Square Error


Statement of Original Authorship

The work contained in this thesis has not been previously submitted to meet requirements for an award at this or any other higher education institution. To the best of my knowledge and belief, the thesis contains no material previously published or written by another person except where due reference is made.

Signature: QUT Verified Signature

Date: 12/08/2016


Acknowledgements

I would like to start by praising the Almighty, the Lord of Everything, the Beneficent and the Merciful.

I sincerely express my gratitude and appreciation to Associate Professor Richi Nayak, my Principal Supervisor, for her continuous guidance, encouragement, and support throughout my PhD journey. Her valuable reviews and feedback have sharpened and enhanced my critical thinking and research skills. Further, it is her patience and understanding that have helped me to get through my low moments. I also thank Associate Professor Shlomo Geva for being my Associate Supervisor.

I acknowledge the Directorate General of Higher Education (DGHE) Indonesia for financially supporting my PhD study. My special gratitude goes to the QUT High Performance Computing (HPC) and Research Support Group for their computational resources and services. Further, I am indebted to my home institution in Indonesia, the Informatics Department, Faculty of Engineering, University of Trunojoyo Madura, for the study leave.

I would like to thank the Science and Engineering Faculty (SEF), the School of Electrical Engineering and Computer Science (EECS) and the Data Science (DS) Discipline for providing me with a comfortable research environment. My appreciation is extended to Dr Sangeetha Kutty, Dr Rakesh Rawat, Dr Suren Rathnayake and Mr Endang Djuana for the valuable conceptual and technical discussions we had at the early stages of my candidature. My thanks go to my colleagues, Israt, Edy, Gavin, Reza, Jun, Paul, Mahnoosh, Lin, Khanh, Fahim, Hamzah, Raji and Daniel, for their support in many circumstances.

My gratitude goes to the staff members from EECS, especially Ms Ellainne Steele, Ms Joanne Reaves, Ms Joanne Kelly, Ms Sharon McCann and Ms Mallory Van Nek, for their administrative support and their personal assistance.

Proofreading service for this thesis was provided and is acknowledged, according to the guidelines laid out in the University-endorsed national policy guidelines for the editing of research theses.


Last but not least, I am truly grateful to my family and friends for their love, support and encouragement.


Chapter 1: Introduction

This chapter outlines the background of the research and its motivations. The next four sections describe the research questions, objectives, contributions, and significance. Following on from this, the papers published from the work presented in this thesis are listed and the remaining chapters are outlined. Finally, the last section summarises the chapter.

1.1 BACKGROUND AND MOTIVATIONS

Recommendation systems help users to find relevant information on the Internet by providing them with a list of items that they might be interested in (Zhang et al., 2011). The list of recommendations is generated by learning from the user profiles, which are commonly built from the information related to both the users and the items, such as users’ purchase history (Pradel et al., 2011; Rendle, Freudenthaler, et al., 2009), demographics (Vozalis and Margaritis, 2007), ratings (Balakrishnan and Chopra, 2012; Koren and Sill, 2011; Weimer et al., 2007), and content of items (de Campos et al., 2010; Pazzani and Billsus, 2007).

Accompanying the popularity of Web 2.0 has been the emergence of Social Tagging System (STS) applications, in which users can organise, retrieve, and share items (e.g. bookmarks, songs, movies, and articles) with other users (Marinho et al., 2012; Mezghani et al., 2012; Schoefegger and Granitzer, 2012). These systems allow their users to annotate items of interest with freely defined tags. Users are typically allowed to use the same tag for annotating different items, as well as different tags for annotating the same item. A tagging activity is the event in which a user uses a tag to annotate an item, and it naturally forms a ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relation. Over time, these ternary relations accumulate and are recorded as the tagging data. Figure 1.1 shows a sample of tagging data holding ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relations, in which three users, four items, and five tags are recorded in total. Note that the tagging activities of each user, i.e. (a) User 1, (b) User 2, and (c) User 3, are displayed in separate sub-figures for ease of illustration.
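As a minimal sketch of how such tagging data might be held in memory, the Python snippet below stores each tagging activity as a ⟨user, item, tag⟩ triple and groups the triples per user, mirroring the per-user sub-figures of Figure 1.1. The triples in `TAGGING_DATA` and the helper `group_by_user` are hypothetical. They were chosen only to be consistent with the recommendations discussed for Table 1.1, and they are not the exact entries of Figure 1.1.

```python
from collections import defaultdict

# Hypothetical <user, item, tag> triples for illustration only;
# they are NOT the exact entries of Figure 1.1.
TAGGING_DATA = [
    ("User 1", "Item 2", "Tag 1"),
    ("User 1", "Item 3", "Tag 4"),
    ("User 2", "Item 1", "Tag 1"),
    ("User 2", "Item 3", "Tag 4"),
    ("User 3", "Item 3", "Tag 5"),
    ("User 3", "Item 4", "Tag 4"),
]

def group_by_user(triples):
    """Return, per user, the set of annotated items and the set of used tags."""
    items = defaultdict(set)
    tags = defaultdict(set)
    for user, item, tag in triples:
        items[user].add(item)
        tags[user].add(tag)
    return dict(items), dict(tags)

user_items, user_tags = group_by_user(TAGGING_DATA)
print(sorted(user_items["User 1"]))  # ['Item 2', 'Item 3']
print(sorted(user_tags["User 1"]))   # ['Tag 1', 'Tag 4']
```

Keeping the raw triples, rather than only the per-user sets, preserves the full ternary relation. A tensor model needs that relation later on.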

Unlike “traditional” recommendation systems, which use ratings to capture user interest in certain items, STSs capture user interest by analysing the tagging data, which then supports the process of generating item recommendations. In other words, the system predicts the list of items that may be of interest to a user by learning from the user’s tagging preferences. An STS facilitates a tag-based item recommendation system, the success of which depends highly upon how the relations in the tagging data are exploited (Bogers and van den Bosch, 2009; Kim et al., 2010).

[Figure 1.1 image: three panels, (a) User 1, (b) User 2, and (c) User 3, each showing the annotations that link the user's items (Item 1 to Item 4) to tags (Tag 1 to Tag 5)]

Figure 1.1. A sample of tagging data that holds the ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relations

In order to boost the performance of recommendation systems with tags, the unique multi-dimensional relations between users, items, and tags must be appropriately modelled to represent the user profiles, such that the latent relationships among the dimensions are thoroughly captured. Therefore, a tag-based recommendation system needs to employ a multi-dimensional approach rather than splitting the relations into multiple lower-dimensional models (Rendle, Balby Marinho, et al., 2009; Rendle and Schmidt-Thieme, 2010; Symeonidis et al., 2008, 2010). Tensor models can preserve the multi-dimensional nature of the tagging data and infer the latent relationships inherent in the data (Acar et al., 2011; Ifada and Nayak, 2014c; Kolda and Bader, 2009; Symeonidis et al., 2010). For tag-based item recommendation systems, tagging data can be modelled as a third-order tensor. The tensor is factorized to acquire the latent factors that govern the ternary relations, and then reconstructed to calculate the predicted preference scores for generating the list of recommendations.
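The model, factorize, and reconstruct pipeline described above can be sketched in Python with NumPy. The sketch makes the following illustrative assumptions, none of which are the methods developed in later chapters:

- It builds a boolean tensor from hypothetical index triples (3 users, 4 items, 5 tags, matching the $3 \times 4 \times 5$ toy size used in this thesis).
- It factorizes the tensor with a plain CP decomposition fitted by alternating least squares.
- It reconstructs the tensor and ranks each user's items by the maximum predicted score over tags.

The names `cp_als`, `reconstruct`, and `top_n_by_max`, the rank of 2, the iteration count, and the choice to exclude already-annotated items are all hypothetical.

```python
import numpy as np

# Hypothetical (user, item, tag) index triples; 3 users x 4 items x 5 tags.
TRIPLES = [(0, 1, 0), (0, 2, 3), (1, 0, 0), (1, 2, 3), (2, 2, 4), (2, 3, 3)]
SHAPE = (3, 4, 5)

def unfold(tensor, mode):
    """Mode-n matricization: bring axis `mode` to the front, flatten the others."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x F) and b (J x F), giving (I*J x F)."""
    return np.einsum("if,jf->ijf", a, b).reshape(-1, a.shape[1])

def cp_als(tensor, rank, n_iter=50, seed=0):
    """Fit a rank-`rank` CP model to a third-order tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in tensor.shape]
    for _ in range(n_iter):
        for mode in range(3):
            # The two factor matrices that are held fixed in this update.
            first, second = (factors[m] for m in range(3) if m != mode)
            gram = (first.T @ first) * (second.T @ second)
            factors[mode] = unfold(tensor, mode) @ khatri_rao(first, second) @ np.linalg.pinv(gram)
    return factors

def reconstruct(factors):
    """Sum of rank-one tensors: y_hat[u, i, t] = sum_f A[u, f] * B[i, f] * C[t, f]."""
    a, b, c = factors
    return np.einsum("uf,if,tf->uit", a, b, c)

def top_n_by_max(y_hat, observed, n):
    """Rank items per user by their maximum predicted score over all tags."""
    scores = y_hat.max(axis=2)
    scores[observed.any(axis=2)] = -np.inf  # assumption: skip items the user already tagged
    return np.argsort(-scores, axis=1)[:, :n]

y = np.zeros(SHAPE)
for u, i, t in TRIPLES:
    y[u, i, t] = 1.0  # boolean interpretation: observed -> 1, everything else -> 0

factors = cp_als(y, rank=2)
y_hat = reconstruct(factors)
print(top_n_by_max(y_hat, y > 0, n=2))
```

CP is used here only because it is compact to write out. Tucker-style factorizations are an equally valid choice for the same pipeline.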

The task of a tag-based item recommendation system is to generate a list of items that may be of interest to a user by learning from the user’s past tagging behaviour. Based on the sample of tagging data shown in Figure 1.1, an example of item recommendation is listed in Table 1.1. Figure 1.1 shows that User 1 shares tag preferences with User 2 and User 3, as all of them have used Tag 4 to annotate items. Consequently, the system can recommend to User 1 the items that have been annotated by User 2 and User 3. In this case, the system may recommend Item 1 and Item 4 to User 1, as those items have previously been annotated by User 2 and User 3, respectively. Using the same approach, the system may recommend Item 2 and Item 4 to User 2, as they have previously been annotated by User 1 and User 3, respectively. Likewise, the system may recommend Item 1 and Item 2 to User 3, as they have previously been annotated by User 2 and User 1, respectively. Given the tagging data, a list of item recommendations can be generated for each user.

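User | Previously Annotated Items | Similar Users based on Tag Preference | Previously Annotated Items of Similar Users | Recommended Items
User 1 | Item 2, Item 3 | User 2: Tag 1, Tag 4; User 3: Tag 4 | User 2: Item 1, Item 3; User 3: Item 3, Item 4 | Item 1, Item 4
User 2 | Item 1, Item 3 | User 1: Tag 1, Tag 4; User 3: Tag 4 | User 1: Item 2, Item 3; User 3: Item 3, Item 4 | Item 2, Item 4
User 3 | Item 3, Item 4 | User 1: Tag 4; User 2: Tag 4 | User 1: Item 2, Item 3; User 2: Item 1, Item 3 | Item 2, Item 1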

Table 1.1. An example of item recommendations based on tagging data in Figure 1.1
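The neighbour-based reasoning behind Table 1.1 can be expressed as a short Python sketch. Two users count as similar if they share at least one tag. A user is then recommended the items that similar users annotated but the user has not. The `TAGGING_DATA` triples and the `recommend` helper are hypothetical and were chosen so that the output reproduces the recommended items in Table 1.1. This is only an intuition aid, not the tensor-based approach proposed in this thesis.

```python
from collections import defaultdict

# Hypothetical triples chosen to reproduce the recommendations of Table 1.1.
TAGGING_DATA = [
    ("User 1", "Item 2", "Tag 1"), ("User 1", "Item 3", "Tag 4"),
    ("User 2", "Item 1", "Tag 1"), ("User 2", "Item 3", "Tag 4"),
    ("User 3", "Item 3", "Tag 5"), ("User 3", "Item 4", "Tag 4"),
]

def recommend(triples, target):
    """Recommend items annotated by users who share at least one tag with `target`."""
    items, tags = defaultdict(set), defaultdict(set)
    for user, item, tag in triples:
        items[user].add(item)
        tags[user].add(tag)
    own_items = items.get(target, set())
    own_tags = tags.get(target, set())
    candidates = set()
    for other in items:
        if other != target and tags[other] & own_tags:  # similar: shared tag preference
            candidates |= items[other] - own_items       # only items the target has not tagged
    return sorted(candidates)

for user in ("User 1", "User 2", "User 3"):
    print(user, recommend(TAGGING_DATA, user))
# User 1 ['Item 1', 'Item 4']
# User 2 ['Item 2', 'Item 4']
# User 3 ['Item 1', 'Item 2']
```

This set-overlap heuristic treats every shared tag equally and produces an unordered set of items. That limitation motivates both the tensor model and the ranked output discussed next.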

Web search research has established that users usually show more interest in the few items at the top of a list of recommendations than in those further down the list (Agichtein et al., 2006; Cremonesi et al., 2010; Liu, 2009; Mohan et al., 2011; Wang et al., 2013; Weimer et al., 2007). In light of this research, this thesis conjectures that a tag-based item recommendation system should provide an ordered list of item recommendations. It is therefore advantageous to implement a learning-to-rank approach for learning the tag-based recommendation model that solves the item recommendation task.
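One common way to quantify the claim that items near the top of a list matter more is a position-discounted measure such as DCG. The sketch below uses the widely used gain $2^{rel} - 1$ and discount $\log_2(k + 1)$ for rank $k$. This formulation and the toy relevance lists are assumptions for illustration, and the exact DCG variant used later in the thesis may differ.

```python
import math

def dcg(relevances):
    """DCG with gain (2^rel - 1) and discount log2(rank + 1), rank starting at 1."""
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical relevance labels (1 = relevant, 0 = not) of a four-item list.
relevant_first = [1, 1, 0, 0]
relevant_last = [0, 0, 1, 1]
print(round(ndcg(relevant_first), 3), round(ndcg(relevant_last), 3))  # 1.0 0.571
```

Both lists contain the same two relevant items. Only their positions differ, and the logarithmic discount alone accounts for the drop from 1.0 to about 0.571.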

Learning-to-rank approaches can be categorised into three types, point-wise, pair-wise, and list-wise, according to the input representation and the loss function used (Liu, 2009; Mohan et al., 2011). To solve the recommendation task using a point-wise based ranking approach, the recommendation model is learned to predict whether the user will like the predicted item or not, assuming that there is no interdependency between the predicted items (Liu, 2009; Mohan et al., 2011; Rendle, 2011). In a pair-wise based ranking approach, the recommendation model is learned to predict the order of a pair of items, so that interdependency is considered only between the two items in each pair (Liu, 2009; Mohan et al., 2011; Rendle, 2011). To solve the recommendation task using a list-wise based ranking approach, the recommendation model is learned to predict an ordered set of items that will be of interest to a user, in which the rank of each predicted item depends on the other items in the list (Liu, 2009; Mohan et al., 2011).
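To make the three categories concrete, the sketch below scores the same three items with one representative loss from each family:

- squared error (point-wise);
- a logistic loss over ordered pairs (pair-wise);
- a ListNet-style top-one cross-entropy (list-wise).

These particular losses, the graded labels, and the function names are illustrative assumptions. They are not the optimization criteria developed in this thesis.

```python
import math

def pointwise_squared_loss(scores, labels):
    """Point-wise: each item's score is fitted to its own label, independently."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)

def pairwise_logistic_loss(scores, labels):
    """Pair-wise: penalise pairs where the more relevant item is not scored higher."""
    losses = [
        math.log1p(math.exp(-(scores[i] - scores[j])))
        for i in range(len(scores))
        for j in range(len(scores))
        if labels[i] > labels[j]
    ]
    return sum(losses) / len(losses) if losses else 0.0

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def listwise_top_one_loss(scores, labels):
    """List-wise: cross-entropy between top-one probabilities over the whole list."""
    target, predicted = softmax(labels), softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))

labels = [2.0, 1.0, 0.0]        # hypothetical graded labels (higher = more relevant)
good_scores = [2.0, 1.0, 0.0]   # orders the items as the labels do
bad_scores = [0.0, 1.0, 2.0]    # reverses that order
for loss in (pointwise_squared_loss, pairwise_logistic_loss, listwise_top_one_loss):
    print(loss.__name__, round(loss(good_scores, labels), 3), round(loss(bad_scores, labels), 3))
```

The point-wise loss inspects each item in isolation. The pair-wise loss compares items two at a time. The list-wise loss normalises over the whole list, so changing one score shifts the probability of every other item.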

In spite of progress in this research field, several challenges and shortcomings remain in the current tag-based item recommendation methods:

Data interpretation. An interpretation scheme defines the user profile representation, dictating how the user tagging activities should populate the data structure used, and it greatly affects the recommendation performance (Ifada and Nayak, 2014a; Rendle, Balby Marinho, et al., 2009). A tag-based item recommendation system customarily interprets the observed data as “positive” or “relevant” tagging data entries. Observed data are the entries registered by users when they express their interest in items by annotating them with tags. Given that the system records the tagging activities, the observed entries can be interpreted from the tagging data straightforwardly. In contrast, how the non-observed tagging data should be interpreted remains disputed and open to researchers’ perceptions. At present, there are two well-known interpretation schemes:

(1) the boolean scheme, which interprets all non-observed entries as a single value of “0” (Symeonidis et al., 2010);

(2) the set-based scheme, which interprets non-observed entries as a combination of “irrelevant” and “indecisive” entries (Rendle, Balby Marinho, et al., 2009), i.e. entries that the users do not like and entries they might like in the future, respectively.

The boolean scheme suffers from a sparsity problem, because the non-observed entries dominate the data. It also suffers from an overfitting problem, because it mixes the “irrelevant” and “indecisive” entries that can be inferred from the non-observed entries (Rendle, Balby Marinho, et al., 2009). The set-based scheme has shown how to tackle these problems; however, it overgeneralises the “irrelevant” entries (Ifada and Nayak, 2014a);

Utilising the reconstructed tensor for generating recommendations. The existing approaches assume that the predicted preference score in the reconstructed tensor directly represents the level of user preference for an item based on a tag. These approaches generate the list of recommendations based on the maximum predicted preference score in each user-item set (Nanopoulos, 2011; Symeonidis et al., 2010). However, they disregard the user’s past tagging activities, which have been found to influence user preference for the recommended items (Kim et al., 2010);

Learning from the latent factors. The task of a tag-based recommendation system is to generate the list of items that may be of interest to a user by learning from the user’s tagging history. The list of item recommendations is sorted in descending order of the predicted preference score, which expresses the preference level of a user for annotating an item using a tag. When a tensor model is used to build the user profile, the preference score can be calculated from the latent factors that govern the ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relations inherent in the tagging data. Consequently, the choice of the loss function used as the optimization criterion becomes crucial, as it controls the learning process of the latent factors. Two further choices govern the recommendation process and are therefore significant: the data interpretation approach used to construct the tagging data that populates the tensor model, and the learning-to-rank approach employed to learn the recommendation model.
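The difference between the two interpretation schemes can be sketched with NumPy as follows. The sketch follows our reading of the set-based scheme (Rendle, Balby Marinho, et al., 2009):

- the tags a user applied to an observed ⟨user, item⟩ post are relevant;
- the remaining tags of that post are irrelevant;
- every entry of a non-observed post is indecisive.

The encodings (+1, -1, NaN), the function names, and the index triples are hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical (user, item, tag) index triples; 3 users x 4 items x 5 tags.
TRIPLES = [(0, 1, 0), (0, 2, 3), (1, 0, 0), (1, 2, 3), (2, 2, 4), (2, 3, 3)]
SHAPE = (3, 4, 5)

def boolean_scheme(triples, shape):
    """Observed entries -> 1; every non-observed entry -> 0."""
    y = np.zeros(shape)
    for u, i, t in triples:
        y[u, i, t] = 1.0
    return y

def set_based_scheme(triples, shape):
    """+1 relevant, -1 irrelevant, NaN indecisive (a sketch of the set-based idea)."""
    y = np.full(shape, np.nan)          # non-observed posts: indecisive (treated as missing)
    for u, i, _ in triples:
        y[u, i, :] = -1.0               # observed post: its unused tags are irrelevant
    for u, i, t in triples:
        y[u, i, t] = 1.0                # observed post: the tags actually used are relevant
    return y

boolean = boolean_scheme(TRIPLES, SHAPE)
set_based = set_based_scheme(TRIPLES, SHAPE)
print("boolean: ones =", int(boolean.sum()), "zeros =", int((boolean == 0).sum()))
print("set-based: relevant =", int((set_based == 1).sum()),
      "irrelevant =", int((set_based == -1).sum()),
      "indecisive =", int(np.isnan(set_based).sum()))
```

On this toy data the boolean scheme yields 6 ones against 54 zeros, which illustrates the domination of non-observed entries. The set-based scheme keeps the same 6 relevant entries, labels 24 entries as irrelevant, and leaves 30 as indecisive.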


Inspired by these challenges of tag-based item recommendation systems, this research aims to exploit the tensor model and learning-to-rank approaches to provide an effective solution to the item recommendation task in a tag-based system. Note that a large number of item recommendation works apply semantic analysis of tags to deal with sparsity or to improve recommendation quality. However, very few works focus on improving data quality via efficient interpretation of the input data. This thesis does not deal with the semantic analysis of tags, but rather focuses on the interpretation of tagging data.

1.2 RESEARCH QUESTIONS

This thesis focuses on providing Top-𝑁 item recommendations to a user in a tag-based system by implementing the tensor model and learning-to-rank approaches. The identification of research gaps in tag-based item recommendation systems leads to the formulation of the following research questions:

Q1: How can tagging data be efficiently interpreted such that the user’s tagging history is thoroughly utilised when making recommendations and the interpretation results in rich multi-graded data?

Q2: How can a learning-to-rank approach be implemented to solve the tag-based item recommendation task? What optimization criterion should be used for learning the tensor recommendation model? In what order can the Top-𝑁 item recommendation be made?

Q3: Does a combination of an interpretation scheme and a learning-to-rank approach have a positive influence on making recommendations? Given that the proposed tag-based item recommendation methods are grouped into point-wise and list-wise based ranking approaches, comparing their performances may help to identify an efficient method.

1.3 RESEARCH OBJECTIVES

This thesis focuses on implementing two ranking approaches: point-wise and list-wise. The pair-wise based ranking approach is not implemented in this research. Its objective is to predict the order of a pair of items, and it therefore disregards the fact that Top-𝑁 recommendation is a prediction task on a list of items (Cao et al., 2007). The point-wise based ranking approach frames the recommendation task as a regression/classification task, whereas the list-wise based ranking approach frames it as a ranking task. More specifically, the research objectives to be fulfilled are as follows:

Developing the point-wise based ranking approach methods:

o Developing a method that implements probabilistic ranking to rank the list of recommendations. A tag-based item recommendation method typically implements the boolean interpretation scheme for building the tensor recommendation model and uses the least-squares loss function as the optimization criterion for learning the model. For generating recommendations, the existing methods (Nanopoulos, 2011; Symeonidis et al., 2010) directly use the maximum predicted preference scores in each user-item set of the reconstructed tensor model and ignore the users’ past tagging activities, which results in inferior recommendation quality. An additional challenge of this approach is the tensor reconstruction process, in which all the latent factors need to be multiplied. This consumes a large amount of memory, making scalability an issue. The developed method focuses on how the recommendation accuracy of candidate items revealed from the reconstructed tensor can be improved, and how the scalability issue faced during the tensor reconstruction process can be solved;

o Developing a method that implements a weighted tensor approach for ranking. Applying the least square loss function as the optimization criterion to learn a tensor recommendation model built with the boolean interpretation scheme means that fitting the observed and the non-observed entries is given the same importance. Implementing a weighting scheme in the learning process is therefore beneficial for differentiating the importance of the observed and non-observed entries of each user-item set. The developed method focuses on how the quality of recommendations can be improved by implementing a weighted scheme in which the observed and non-observed entries of each user-item set are given rewards and penalties, respectively, i.e. the observed entries are weighted with higher values than the non-observed ones when learning the tensor recommendation model.

Developing the list-wise based ranking approach methods:

o Developing a method to learn from multi-graded data. Implementing a ranking-based data interpretation scheme gives the interpreted tagging data a ranked representation, i.e. the observed entries are given higher values than the non-observed ones, which results in a multi-graded tagging data representation. Each ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ tuple is labelled with a value from the ordinal relevance set {𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡, 𝑖𝑛𝑑𝑒𝑐𝑖𝑠𝑖𝑣𝑒, 𝑖𝑟𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡}. The developed method focuses on how the tensor recommendation model built from multi-graded data can be learned efficiently. It does so by proposing and applying the User-Tag Set (UTS) scheme to construct the user profile, and by using Discounted Cumulative Gain (DCG) as the optimization criterion for learning the tensor recommendation model;

o Developing a method to learn from graded-relevance data. The multi-graded labelling of each ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ tuple with {𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡, 𝑖𝑛𝑑𝑒𝑐𝑖𝑠𝑖𝑣𝑒, 𝑖𝑟𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡} can be made richer by considering the “transitional” entries between “relevant” and “irrelevant”. The developed method focuses on how the tensor recommendation model built from graded-relevance data can be learned efficiently. It does so by proposing and applying the graded-relevance interpretation scheme, which leverages the tagging data more effectively, to construct the user profile. It also uses Graded Average Precision (GAP) as the optimization criterion for learning the tensor recommendation model.

Comparing and analysing the results of all the proposed ranking methods and the benchmarking methods to reveal the strengths and shortcomings of each method.
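The first point-wise objective above refers to two things: the memory cost of multiplying all latent factors to reconstruct the tensor, and existing methods that rank items by the maximum predicted score in each user-item set. The sketch below illustrates that baseline idea under stated assumptions. It assumes a rank-R CP (CANDECOMP/PARAFAC) factorization with user, item and tag factor matrices; the cited methods may use a different factorization, such as HOSVD. Rather than materialising the whole |U|×|I|×|T| tensor, it reconstructs one user's item-by-tag slice at a time. It does not implement TRPR's probabilistic ranking, and the function names and toy factors are hypothetical.

```python
from typing import List, Set

Matrix = List[List[float]]


def reconstruct_user_slice(u: int, U: Matrix, I: Matrix, T: Matrix) -> Matrix:
    """Rebuild only user u's item-by-tag slice of a CP-factorised tensor.

    Entry (i, t) = sum_r U[u][r] * I[i][r] * T[t][r]. Peak memory is
    O(|I| * |T|) per user rather than O(|U| * |I| * |T|) for the full tensor.
    """
    rank = len(U[u])
    return [
        [sum(U[u][r] * I[i][r] * T[t][r] for r in range(rank)) for t in range(len(T))]
        for i in range(len(I))
    ]


def max_score_top_n(u: int, U: Matrix, I: Matrix, T: Matrix,
                    seen_items: Set[int], n: int) -> List[int]:
    """Rank the user's unseen items by their maximum predicted score over all tags."""
    user_slice = reconstruct_user_slice(u, U, I, T)
    scores = {i: max(row) for i, row in enumerate(user_slice) if i not in seen_items}
    return sorted(scores, key=lambda i: scores[i], reverse=True)[:n]


# Hypothetical rank-1 factors: 2 users, 3 items, 2 tags.
U = [[1.0], [0.5]]
I = [[0.9], [0.2], [0.6]]
T = [[1.0], [0.3]]
print(max_score_top_n(0, U, I, T, seen_items={0}, n=2))  # [2, 1]
```

Looping over users in this way bounds peak memory. The cost is that each user's slice is computed separately.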


1.4 RESEARCH CONTRIBUTIONS

This thesis has developed schemes to interpret tagging data and methods to generate tag-based item recommendations, as summarised in Table 1.2.

Developed Method | Optimization Criterion | Data Type | Interpretation Scheme | Ranking Approach
TRPR: Tensor-based Item Recommendation using Probabilistic Ranking | Least square loss | Binary | boolean | Point-wise
We-Rank: Recommendation Ranking using Weighted Tensor | Weighted least square loss | Binary + Multi-graded | boolean + weighted scheme | Point-wise
Do-Rank: DCG Optimization for Learning-to-Rank | Discounted Cumulative Gain (DCG) | Multi-graded | User-Tag Set (UTS) | List-wise
Go-Rank: GAP Optimization for Learning-to-Rank | Graded Average Precision (GAP) | Graded relevance | graded-relevance | List-wise

Table 1.2. Summary of each developed method

In particular, the contributions of this research are listed as follows:

To tackle the problems of existing interpretation schemes, two ranking-based interpretation schemes are proposed: User-Tag Set (UTS) and graded-relevance. Both apply a ranking constraint to interpret the tagging data and result in richer data. The UTS scheme interprets the tagging data as multi-graded data with three possible distinct entries:
(1) “relevant” or “1”: the user has been observed showing interest in the items of the entries;
(2) “irrelevant” or “-1”: the user is not interested in the entries;
(3) “indecisive” or “0”: the user might be interested in the entries in the future, i.e. these are the entries to be predicted for generating the list of recommendations.
The graded-relevance scheme interprets the tagging data as graded-relevance data with four possible distinct entries: (1) “relevant” or “2”, (2) “likely relevant” or “1”, (3) “irrelevant” or “-1”, and (4) “indecisive” or “0”. The “likely relevant” entries are those in which the user is probably interested, although this interest is not explicitly revealed. The items of those entries have actually been annotated by the user with other tags. In other words, the “likely relevant” entries are the transitional entries between the “relevant” and “irrelevant” entries;

To improve the recommendation accuracy after the tensor model has been reconstructed, as well as the scalability of the tensor reconstruction process, the Tensor-based Item Recommendation using Probabilistic Ranking (TRPR) method is proposed. TRPR improves the quality of recommendations in two ways. It applies the boolean interpretation scheme to construct user profiles. It then implements probabilistic ranking, which takes the user’s past tagging history into account, to generate the list of recommendations. TRPR solves the scalability issue faced during the tensor reconstruction process by implementing a memory-efficient technique;

To improve the recommendation accuracy during the process of learning the latent factors, and to deal with the sparsity problem, the Recommendation Ranking using Weighted Tensor (We-Rank) method is proposed. We-Rank improves the quality of recommendations by applying the boolean interpretation scheme to construct user profiles. It also utilises the users’ past tagging histories to reveal their tag usage likelihood when learning the tensor recommendation model. We-Rank implements a weighted scheme in which rewards and penalties are given to the observed and non-observed entries of each user-item set, respectively, during the learning process. TRPR requires a subsequent step after the factorization and reconstruction processes to correctly rank the items that might interest users. In contrast, the factorized elements learned by We-Rank can be used directly to make ranked recommendations;

To learn from a user profile built from multi-graded data, obtained by implementing the proposed User-Tag Set (UTS) scheme, the DCG Optimization for Learning-to-Rank (Do-Rank) method is proposed. The recommendation model of Do-Rank is optimized with respect to Discounted Cumulative Gain (DCG) as the ranking evaluation measure. Do-Rank tackles the computational expense of the learning process by implementing a fast learning approach that efficiently reduces the learning time while improving or maintaining accuracy;

To learn from a user profile built from graded-relevance data, obtained by implementing the proposed graded-relevance scheme, the GAP Optimization for Learning-to-Rank (Go-Rank) method is proposed. The recommendation model of Go-Rank is optimized with respect to Graded Average Precision (GAP) as the ranking evaluation measure. Using GAP as the optimization criterion enables the recommendation model to set up thresholds so that the “likely relevant” entries can be regarded as either “relevant” or “irrelevant” entries. Go-Rank tackles the computational expense of the learning process by implementing a fast learning approach that efficiently reduces the learning time while improving or maintaining accuracy;

The results of all the proposed methods and the benchmarking methods are compared. The results are analysed to reveal the strengths and shortcomings of each proposed method.
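The UTS labels listed in the contributions above can be illustrated with a short sketch. This section defines “relevant” entries as the observed ones. It does not state the exact rule that separates “irrelevant” from “indecisive” entries, so the rule below is an assumption for illustration only. Under that assumption, an unobserved entry is labelled -1 when the user has annotated the item and has used the tag elsewhere, and 0 otherwise. The function name and data are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (user, item, tag)


def uts_labels(observed: Set[Triple], user: str,
               items: Iterable[str], tags: Iterable[str]) -> Dict[Tuple[str, str], int]:
    """Label each (item, tag) entry of one user as 1, -1 or 0.

     1 (relevant):   the <user, item, tag> triple was observed.
    -1 (irrelevant): ASSUMED rule: the user annotated the item and has used
                     the tag on some item, but not this tag on this item.
     0 (indecisive): every other entry, i.e. the entries left to predict.
    """
    user_items = {i for (u, i, _) in observed if u == user}
    user_tag_set = {t for (u, _, t) in observed if u == user}
    tags = list(tags)  # iterated once per item
    labels = {}
    for i in items:
        for t in tags:
            if (user, i, t) in observed:
                labels[(i, t)] = 1
            elif i in user_items and t in user_tag_set:
                labels[(i, t)] = -1
            else:
                labels[(i, t)] = 0
    return labels


observed = {("alice", "i1", "python"), ("alice", "i2", "ml")}
labels = uts_labels(observed, "alice", ["i1", "i2", "i3"], ["python", "ml", "web"])
print(labels[("i1", "python")], labels[("i1", "ml")], labels[("i3", "ml")])  # 1 -1 0
```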

1.5 RESEARCH SIGNIFICANCE

The research carried out in this thesis advances knowledge in tag-based recommendation systems, focusing on efficiently interpreting tagging data and ranking the list of recommendations. A key question in this area is how tagging data should be interpreted, since the interpretation determines the recommendation quality. This question remains an open research area.

This thesis has practical significance for real-life applications. An efficient tagging data interpretation scheme can provide an alternative solution to the sparsity problem that commonly occurs in tag-based systems, where usually only a few entries are observed per user (Leginus et al., 2012; Rafailidis and Daras, 2013; Rendle, Balby Marinho, et al., 2009). Moreover, an efficient interpretation scheme is more important than simply trying to obtain a denser data representation, e.g. via clustering techniques that reduce the tag dimension by grouping semantically similar tags. Ranking the list of recommendations also has a strong practical implication. In real life, users usually show more interest in the few items at the top of the list of recommendations than in those further down the list (Agichtein et al., 2006; Cremonesi et al., 2010; Liu, 2009; Mohan et al., 2011). Approaches that optimize “the top of the list” are therefore essential in tag-based item recommendation.

From a broader point of view, this research provides solutions for problems that generate three-dimensional data. Hence, the methods proposed in this thesis can, in principle, be applied to other applications with this type of data. A well-known example of such an application is Twitter (https://twitter.com/), which allows its users to place the hashtag symbol (‘#’) before a relevant keyword to categorise their tweets. A survey by RadiumOne (2013) reported that 58% of Twitter users use hashtags on a regular basis. STS applications allow users to annotate items of their interest with tags in a similar way. The proposed tag-based item recommendation methods can therefore be implemented for tweet recommendation.

The context-aware recommendation system is another example of a problem that can be addressed using the proposed methods. A context-aware system incorporates additional contextual information, such as time and location, into the recommendation process (Adomavicius and Tuzhilin, 2011) to generate a list of item recommendations for users under certain contexts. Such a system generates three-dimensional data, as the contextual information becomes the third dimension in addition to those of user and item.

1.6 PUBLICATIONS

The following publications have been produced from the work presented in this

thesis.

1. Ifada, Noor & Nayak, Richi (2016). How Relevant is the Irrelevant

Data: Leveraging the Tagging Data for a Learning-to-Rank Model. In

Proceedings of the 9th ACM International Conference on Web Search and Data

Mining – WSDM 2016, ACM New York, San Francisco, California, USA, pp.

23-32.

2. Ifada, Noor & Nayak, Richi (2015). Do-Rank: DCG Optimization for Learning-to-rank in Tag-based Item Recommendation Systems. In Cao, T., Lim, E.-P., Zhou, Z.-H., Ho, T.-B., Cheung, D., & Motoda, H. (Eds.) Advances in Knowledge Discovery and Data Mining – PAKDD 2015. Springer-Verlag Berlin Heidelberg, Berlin, pp. 510-521.

http://eprints.qut.edu.au/83203/

3. Ifada, Noor & Nayak, Richi (2014). A Two-stage Item Recommendation Method

using Probabilistic Ranking with Reconstructed Tensor Model. Lecture Notes in

Computer Science: User Modeling, Adaptation, and Personalization – UMAP

2014, 8538, pp. 98-110.

4. Ifada, Noor & Nayak, Richi (2014). An Efficient Tagging Data Interpretation and

Representation Scheme for Item Recommendation. In Proceedings of the 12th

Australasian Data Mining Conference – AusDM 2014, 27-28 November 2014,

Queensland University of Technology, Gardens Point Campus, Brisbane,

Australia. (Best Paper Award)

5. Ifada, Noor (2014). A Tag-based Personalized Item Recommendation System

using Tensor Modeling and Topic Model Approaches. In Proceedings of the 37th

International ACM SIGIR Conference on Research & Development in

Information Retrieval – SIGIR 2014, ACM New York, Gold Coast, Queensland,

Australia, p. 1280.

6. Ifada, Noor & Nayak, Richi (2014). Tensor-based Item Recommendation using

Probabilistic Ranking in Social Tagging Systems. In Chung, Chin-Wan, Broder,

Andrei, Shim, Kyuseok, & Suel, Torsten (Eds.) In Proceedings of the Companion

Publication of the 23rd International Conference on World Wide Web

Companion – WWW 2014, ACM, Seoul, Republic of Korea, pp. 805-810.

1.7 THESIS OUTLINE

The remainder of the thesis is organised as follows:

Chapter 2 reviews the relevant literature. The review covers literature on web personalization, tag-based item recommendation systems, and ranking-based recommendation approaches.

Chapter 3 presents the research design and the evaluation procedure. The research design is described in two phases, i.e. tagging data pre-processing and the development of tag-based item recommendation methods. The methods proposed in this thesis are categorised into point-wise and list-wise ranking methods. This chapter also includes a detailed description of the datasets, the experimental settings, and the various evaluation measures used to evaluate the proposed tag-based item recommendation methods. Lastly, the benchmarking methods used to evaluate the proposed methods are presented.

Chapter 4 describes the proposed tag-based item recommendation methods built by implementing the point-wise based ranking approach. The proposed point-wise methods are the Tensor-based Item Recommendation using Probabilistic Ranking (TRPR) and the Recommendation Ranking using Weighted Tensor (We-Rank) methods. Both methods implement the standard pre-processing scheme, i.e. the boolean scheme, to build the tensor recommendation model that represents the user profiles.

Following the tensor reconstruction process, the TRPR method ranks the list of item recommendations using a probabilistic approach to improve the quality of recommendation. This approach takes the user’s past tagging history into account to calculate the probability of the user annotating an item given a list of tags. TRPR also implements a memory-efficient technique to solve the scalability issue that occurs during the tensor reconstruction process. The results of TRPR are then compared to the benchmarking methods.

We-Rank, the other point-wise method, implements a weighted scheme for learning the tensor recommendation model, such that rewards and penalties are given to the observed and non-observed entries of each user-item set, respectively. The experimental results of We-Rank are also presented and compared against the benchmarking methods.
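The weighted scheme described for We-Rank can be contrasted with the plain least square loss in a short sketch. The weights below (1.0 for observed and 0.1 for non-observed entries) are hypothetical placeholders. The text only states that observed entries are weighted more highly; We-Rank additionally draws on the users’ tag usage likelihood, which this sketch does not model.

```python
from typing import Sequence


def weighted_squared_loss(targets: Sequence[int], predictions: Sequence[float],
                          w_observed: float = 1.0, w_unobserved: float = 0.1) -> float:
    """Weighted least square loss over the entries of one user-item set.

    targets follow the boolean scheme: 1 = observed entry, 0 = non-observed.
    Errors on observed entries are multiplied by w_observed and errors on
    non-observed entries by w_unobserved (hypothetical default weights).
    """
    loss = 0.0
    for y, y_hat in zip(targets, predictions):
        weight = w_observed if y == 1 else w_unobserved
        loss += weight * (y - y_hat) ** 2
    return loss


targets = [1, 0, 0]
predictions = [0.5, 0.5, 0.0]
print(weighted_squared_loss(targets, predictions))            # 0.275
print(weighted_squared_loss(targets, predictions, 1.0, 1.0))  # 0.5, i.e. plain least squares
```

With equal weights, the loss reduces to the ordinary least square loss. The same error on a non-observed entry then costs as much as on an observed one.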

Chapter 5 presents the proposed tag-based item recommendation methods built by implementing the list-wise based ranking approach. The proposed list-wise methods are the DCG Optimization for Learning-to-Rank (Do-Rank) and the GAP Optimization for Learning-to-Rank (Go-Rank) methods. A new pre-processing scheme is proposed for each method.

For the Do-Rank method, the User-Tag Set (UTS) scheme is proposed for building the tensor recommendation model that represents the user profiles. Discounted Cumulative Gain (DCG) is used as the optimization criterion to learn the model. The results of Do-Rank are then compared to the benchmarking methods.

For the Go-Rank method, the graded-relevance scheme is proposed for building the tensor recommendation model that represents the user profiles. Graded Average Precision (GAP) is used as the optimization criterion to learn the model. The experimental results of Go-Rank are also presented and compared against the benchmarking methods.
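The two list-wise optimization criteria named above can be computed for a single ranked list as follows. This is a minimal sketch of the evaluation measures only. DCG uses the common gain 2^rel − 1 with a log2(position + 1) discount. GAP follows the definition by Robertson, Kanoulas and Yilmaz (SIGIR 2010), where g[j−1] is the probability that the user's relevance threshold is grade j. The sketch does not show how Do-Rank and Go-Rank map the UTS and graded-relevance labels to gains and grades, or how they smooth these non-differentiable measures for learning. The mapping in the example (“relevant” = 2, “likely relevant” = 1, everything else = 0) is an assumption.

```python
import math
from typing import Sequence


def dcg_at_n(relevances: Sequence[int], n: int) -> float:
    """DCG@n of a ranked list of graded labels (index 0 = top position).

    Uses the common gain 2**rel - 1 and the discount log2(position + 1).
    """
    return sum((2 ** rel - 1) / math.log2(pos + 1)
               for pos, rel in enumerate(relevances[:n], start=1))


def ndcg_at_n(relevances: Sequence[int], n: int) -> float:
    """DCG@n normalised by the DCG@n of the ideal (sorted) ordering."""
    ideal = dcg_at_n(sorted(relevances, reverse=True), n)
    return dcg_at_n(relevances, n) / ideal if ideal > 0 else 0.0


def graded_average_precision(grades: Sequence[int], g: Sequence[float]) -> float:
    """GAP of one ranked list; grades <= 0 are non-relevant, 1..C are relevant grades.

    g[j - 1] is the probability that the user's threshold is grade j, so an
    entry of grade c counts as relevant with probability sum(g[:c]).
    Assumes every relevant entry of the user appears in the list.
    """
    def both_relevant(a: int, b: int) -> float:
        # probability that grades a and b both clear the user's threshold
        return sum(g[:max(0, min(a, b))])

    numerator = 0.0
    for n, grade_n in enumerate(grades, start=1):
        if grade_n > 0:
            numerator += sum(both_relevant(grades[m], grade_n) for m in range(n)) / n
    denominator = sum(sum(g[:grade]) for grade in grades if grade > 0)
    return numerator / denominator if denominator > 0 else 0.0


ranked = [2, 0, 1]  # relevant, other, likely relevant (assumed grade mapping)
print(dcg_at_n(ranked, 3))  # 3.5
print(graded_average_precision(ranked, g=[0.5, 0.5]))  # 8/9, about 0.889
```

With a single grade and g = [1.0], GAP reduces to ordinary average precision. Raising g[0] makes “likely relevant” entries count as relevant more often.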

In Chapter 6, the results of all the proposed methods and benchmarking

methods are compared. Analyses of the results are conducted in order to investigate

the impact of various aspects.

Chapter 7 presents the final conclusions, including the main contributions and a summary of the findings of this thesis. Directions for future research are also discussed.

1.8 CHAPTER SUMMARY

This chapter has detailed the background, motivations, objectives, and significance of this research. It has also listed the research questions, contributions, and corresponding publications. Table 1.3 summarises how these elements, together with the corresponding chapters, fit together in this thesis.

Research Activity | Interpretation Scheme | Ranking Approach: Point-wise | Ranking Approach: List-wise | Comparison
Research Question | Q1 | Q2 | Q2 | Q3
Research Contribution | User-Tag Set (UTS), graded-relevance | TRPR, We-Rank | Do-Rank, Go-Rank | Comparison of all proposed methods
Corresponding Chapter | Chapter 5 | Chapter 4 | Chapter 5 | Chapter 6
Corresponding Publication | Paper 1, 2, 4 | Paper 3, 5, 6 | Paper 1, 2 |

Table 1.3. The summary showing how research questions, contributions, corresponding chapters and publications fit together in the thesis



Chapter 2: Literature Review

This chapter reviews the most relevant literature on web personalization, tag-based item recommendation systems, and ranking-based recommendation approaches. Since this thesis does not deal with the semantic analysis of tags, the techniques reviewed in this chapter focus on the interpretation of tagging data.

This chapter starts by discussing web personalization, highlighting the importance of recommendation systems in web personalization and the main approaches of recommendation algorithms. Since this thesis proposes a tag-based recommendation system, a comprehensive understanding of traditional recommendation systems is essential.

The following section details tag-based item recommendation systems and includes a brief description of Social Tagging Systems. It outlines the important aspects of these systems. It also explains why the choice of user profile modelling approach and tagging data interpretation scheme matters for building the recommendation model.

The third section then discusses the ranking-based recommendation approaches that can be implemented for learning the tag-based recommendation model in order to solve the recommendation task. Finally, in the summary and conclusion section, the research gaps are derived by analysing the shortcomings of the current approaches employed in tag-based recommendation systems.

2.1 WEB PERSONALIZATION

Web personalization aims to overcome the problem of information overload on the Internet by pointing users to lists of recommendations that might interest them (Castellano et al., 2009; Mobasher, 2007; Singh Anand and Mobasher, 2005; Venugopal et al., 2009; Zhang et al., 2011). For this reason, web personalization and recommendation systems are often mentioned interchangeably (Castellano et al., 2009). A recommendation system usually consists of two main stages, namely user profiling and recommendation generation.

User profiling is the stage in which user profiles, i.e. formal representations of the information collected from users, are constructed. The profiles are constructed in two steps: feedback data collection and profile representation. Feedback data about users can be collected explicitly or implicitly. The user profile is derived by analysing these data and can be represented in various ways, such as a vector, a matrix, or a tensor.

Explicit feedback data collection usually relies on personal information given by the users via HTML forms. Another common technique is to allow users to express their opinions by selecting a value from a range, known as ratings. Though explicit feedback data are effective and easy to collect, they require the user’s willingness to participate, which might become an additional burden for the user. Moreover, some users may not accurately report their own interests (Qiu and Cho, 2006).

Implicit feedback data can be collected by gathering the user’s behaviour information via click streams, bookmarking, purchasing behaviour, and the content or structure information of the visited web pages. While this approach is considered an effective way to construct user profiles, gathering and filtering the data is laborious and expensive.

In the recommendation generation stage, a list of recommendations is provided to target users by offering the items with the highest predicted ratings or the highest recommendation scores (Lü et al., 2012). In other words, the main purpose of a recommendation system is to predict the target users’ interests. Based on how the recommendations are generated, recommendation algorithms can be categorised into three approaches: content-based, collaborative filtering, and hybrid (Adomavicius and Tuzhilin, 2005). Various techniques can be applied within each approach, as summarised in Table 2.1. The next three sub-sections briefly describe each approach.

2.1.1 Content-based Approaches

The content-based recommendation approach generates a list of recommendations based on the content similarity between candidate items and the items that the target user has previously preferred. This approach relies on the items previously rated by the user as its information source (Lops et al., 2011; Pazzani and Billsus, 2007). Content-based approaches are mostly used to recommend text-based items whose content can be represented by keywords. The importance of each keyword is determined using a weighting measure, such as TF-IDF (Adomavicius and Tuzhilin, 2005; Lops et al., 2011).
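The keyword weighting mentioned above is commonly done with TF-IDF, which is listed in Table 2.1. The sketch below is a minimal example. It assumes length-normalised term frequency and a natural-log inverse document frequency log(N / df), which is one of several common TF-IDF variants.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {keyword: weight} dict per document using tf * log(N / df)."""
    n_docs = len(docs)
    # document frequency: number of documents that contain each keyword
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({word: (count / len(doc)) * math.log(n_docs / df[word])
                        for word, count in tf.items()})
    return weights


docs = [["tensor", "ranking"], ["tensor", "tags"], ["music", "tags"]]
w = tf_idf(docs)
print(w[0])  # "ranking" (in 1 of 3 documents) outweighs "tensor" (in 2 of 3)
```

A keyword that occurs in every document receives a weight of zero, since it does not help to discriminate between items.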

Recommendation Approach | Heuristic-based Technique | Model-based Technique
Content-based | TF-IDF (Information Retrieval); Clustering | Bayesian classifier; Clustering; Decision trees; Artificial neural networks
Collaborative | Nearest neighbour (cosine, correlation); Clustering; Graph theory | Bayesian networks; Clustering; Artificial neural networks; Linear regression; Probabilistic models
Hybrid | Linear combination of predicted rating; Various voting schemes; Incorporating one component as a part of the heuristic for the other | Incorporating one component as a part of the model for the other; Building one unifying model

Table 2.1. Recommendation Approach and Techniques, summarised from (Adomavicius and Tuzhilin, 2005)

Despite its success, this approach has several limitations, namely limited content analysis, over-specialisation, and the new user problem (Adomavicius and Tuzhilin, 2005). For a content-analysis method, appropriate suggestions cannot be made if the analysed content does not contain enough information to describe the user’s preferences. Since the approach can only recommend items that score highly against a user profile, the user receives recommendations of items similar to those already rated. The new user problem is caused by the insufficient number of ratings given by new users. A content-based approach can provide accurate recommendations only if the items contain rich content information, such as books and articles. When there is inadequate content information, the collaborative filtering approach is considered a better solution.

2.1.2 Collaborative Filtering Approaches

Collaborative filtering is regarded as the most successful recommendation approach (Su and Khoshgoftaar, 2009). It provides a target user with a list of items that other users with similar preferences have liked in the past. The collaborative filtering approach can be classified into memory-based and model-based approaches (Su and Khoshgoftaar, 2009).

A memory-based collaborative approach uses the user-item database to generate predictions. The recommendation process consists of user profiling, neighbourhood formation, and recommendation generation. It implements a K-Nearest Neighbour (KNN) method to form the neighbourhood of each user or each item (Adomavicius and Tuzhilin, 2005; Koren and Bell, 2011). The similarity between two users or two items is commonly measured using cosine similarity or Pearson correlation. Each target user receives a list of recommended items based on the similarity scores of the neighbours that form the user’s neighbourhood. This approach has gained popularity because of its simplicity and its ability to recommend any kind of item. This includes items that lack sufficient content information and items that are dissimilar to those the user selected in the past (Adomavicius and Tuzhilin, 2005).
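The memory-based procedure above, neighbourhood formation followed by recommendation generation, can be sketched as user-based KNN with cosine similarity. This is one simple variant under stated assumptions:
- 0 marks an unrated item.
- The predicted score is a similarity-weighted average of the neighbours' ratings.
- No mean-centring is applied.
The function names and ratings are hypothetical.

```python
import math
from typing import List

Ratings = List[List[float]]  # ratings[user][item], 0 = not rated


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_recommend(ratings: Ratings, user: int, k: int = 2, n: int = 3) -> List[int]:
    """Recommend up to n unrated items using the user's k most similar users."""
    neighbours = sorted(((cosine(ratings[user], ratings[v]), v)
                         for v in range(len(ratings)) if v != user), reverse=True)[:k]
    scores = {}
    for item, rating in enumerate(ratings[user]):
        if rating != 0:
            continue  # only predict items the user has not rated
        raters = [(s, v) for s, v in neighbours if ratings[v][item] != 0]
        weight = sum(abs(s) for s, _ in raters)
        if weight > 0:
            scores[item] = sum(s * ratings[v][item] for s, v in raters) / weight
    return sorted(scores, key=lambda i: scores[i], reverse=True)[:n]


ratings = [[5, 3, 0, 0],
           [4, 3, 4, 0],
           [0, 0, 5, 4],
           [5, 4, 0, 2]]
print(knn_recommend(ratings, user=0, k=2, n=2))  # [2, 3]
```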

A model-based approach develops a model that learns complex patterns from the training data and then employs the model to make predictions. The model can be generated using Bayesian approaches (Alper, 2012), clustering techniques (Begelman et al., 2006; Pan et al., 2013; Shepitsen et al., 2008), or latent factor models based on matrix factorization techniques (Koren, 2008; Koren and Bell, 2011).
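The latent factor models mentioned above learn a low-dimensional vector for each user and each item, so that their dot product approximates the known ratings. The sketch below is a minimal, generic example. It uses stochastic gradient descent on the squared error of the observed ratings, with L2 regularisation. The learning rate, regularisation weight, rank and epoch count are hypothetical choices, and the code is not a reproduction of the cited methods.

```python
import random
from typing import List, Tuple

Triple = Tuple[int, int, float]  # (user, item, rating) for observed entries only


def predict(P, Q, u: int, i: int) -> float:
    """Predicted rating: dot product of the user and item latent factors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


def factorize(ratings: List[Triple], rank: int = 2, lr: float = 0.02,
              reg: float = 0.02, epochs: int = 2000, seed: int = 0):
    """Learn user factors P and item factors Q with P[u] . Q[i] ~ rating."""
    rng = random.Random(seed)
    n_users = 1 + max(u for u, _, _ in ratings)
    n_items = 1 + max(i for _, i, _ in ratings)
    P = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(rank):
                pu, qi = P[u][f], Q[i][f]
                # gradient step on err**2 + reg * (pu**2 + qi**2); the factor 2 is folded into lr
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


observed = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P, Q = factorize(observed)
print(round(predict(P, Q, 0, 2), 2))  # predicted rating for an unobserved entry
```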

Despite its success, the collaborative approach has limitations such as the cold-start (new user or new item) and sparsity problems (Adomavicius and Tuzhilin, 2005; Lee, 2001). The cold-start problem arises because a prediction cannot be provided for a new user or a new item for which no rating history is available. The sparsity problem occurs because ratings are available for only a small fraction of all user-item pairs. Another challenge is scalability, as finding similar users based on the rating data requires data from a large number of users and items (Lü et al., 2012).

2.1.3 Hybrid Approaches

As discussed above, each of the content-based and collaborative filtering techniques has limitations. A hybrid technique combines multiple recommendation techniques and can therefore exploit their strengths to improve recommendation performance (Adomavicius and Tuzhilin, 2005; Burke, 2007). Burke (2007) classifies hybrid approaches into seven categories:

Weighted: numerically combine the predicted preference scores calculated by different recommendation approaches.

Switching: choose among recommendation approaches depending on which criterion is met.

Mixed: present the recommendations of multiple recommendation approaches simultaneously.

Feature combination: combine the features of different knowledge sources into a single recommendation method.

Feature augmentation: use the features produced by the first recommendation approach as part of the input to the next approach.

Cascade: employ a second recommendation approach to refine the output of the first approach.

Meta-level: use the model learned by the first recommendation approach as the input to the second approach.
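As an illustration of the first category, a weighted hybrid can be sketched as a linear combination of per-item scores from several recommenders. The sketch assumes the scores are already on a comparable scale, and the recommender outputs and weights are hypothetical. Items that a recommender did not score contribute 0 for that recommender.

```python
from typing import Dict, List


def weighted_hybrid(score_lists: List[Dict[str, float]], weights: List[float]) -> List[str]:
    """Rank items by a weighted sum of the scores produced by each recommender."""
    items = set().union(*score_lists)
    combined = {item: sum(w * scores.get(item, 0.0) for w, scores in zip(weights, score_lists))
                for item in items}
    return sorted(combined, key=lambda item: combined[item], reverse=True)


content_based = {"a": 0.9, "b": 0.2}
collaborative = {"b": 0.8, "c": 0.5}
print(weighted_hybrid([content_based, collaborative], [0.5, 0.5]))  # ['b', 'a', 'c']
```

Shifting all the weight to one recommender reproduces that recommender's own ranking, which makes the weights a simple tuning knob.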

A hybrid recommendation system can address the cold-start problem by extracting latent features from items using a probabilistic model (Maneeroj and Takasu, 2009). The similarities between items and users are then computed to predict a user’s unknown rating of an item. Collaborative filtering can also be semantically enhanced by aggregating the structured semantic knowledge of items with user-item mappings to create a combined similarity measure and generate predictions (Mobasher et al., 2004). This method can help to overcome the problems of newly added items and very sparse data sets.


2.1.4 Summary and Discussion

Recommendation systems are a well-established research area (Adomavicius and

Tuzhilin, 2005; Zhang et al., 2011) and work as an essential component for web

personalization (Castellano et al., 2009). As previously described, the

recommendation algorithms can be categorised into three approaches: content-based,

collaborative filtering, and hybrid (Adomavicius and Tuzhilin, 2005).

A serious concern for recommendation systems is the lack of explicit feedback data, which results in an inadequate quantity of data available for recommendation. Integrating information from supplementary data sources is a promising way to improve recommendation accuracy. However, this technique requires additional data collection, which is often tiresome, lengthy, and costly (de Campos et al., 2010; Lekakos and Giaglis, 2007). This leads naturally to Social Tagging Systems, which provide an alternative solution to the data inadequacy issue (Das et al., 2011; Luo et al., 2012). The next section presents the applicability of social tagging technology to recommendation systems.

2.2 TAG-BASED ITEM RECOMMENDATION SYSTEMS

Tag-based item recommendation systems have gained great popularity due to the growing presence of user-generated information on the Web (Lü et al., 2012; Zhang et al., 2011). They help to organise information and make it more accessible. This section discusses three important aspects of a tag-based item recommendation system: Social Tagging Systems, User Profile Modelling Approaches, and Tagging Data Interpretation Schemes.

2.2.1 Social Tagging Systems

Social Tagging Systems (STSs) have secured a significant role in Web 2.0 (Marinho et al., 2011; Mezghani et al., 2012). An STS allows its users to organise, retrieve, and share their resources (in other words, items) with other users (Marinho et al., 2012). These systems let users annotate items of interest with any tags they choose (Mezghani et al., 2012). Examples include photos (www.flickr.com), songs (www.last.fm), scientific papers (http://citeulike.org) and websites (http://delicious.com). These tags can be reused later and shared with other users (Schoefegger and Granitzer, 2012). Figure 2.1 shows some of the popular STS websites.

Figure 2.1. Example of popular Social Tagging System Websites

Tags in an STS can be considered a kind of metadata, comparable to the summaries, profiles, attributes, and contents attached to items in other web systems. Tags differ from other metadata in two main ways: they are not predefined by domain experts, and they are attached both to the users who created them and to the items (Bogers and van den Bosch, 2009). Tags indirectly reveal a user’s personal interests and can connect users to the tagged items. In a tagging system, the user activity and the item and tag popularities form long-tailed distributions. The long tail occurs in the user and item distributions because most items are selected only once and most users select only one item (Li et al., 2008). The long-tailed tag distribution is the result of personal tagging (Halpin et al., 2007). Figure 2.2 shows the long-tail distributions captured from the tagging data of the Delicious website.



Figure 2.2. Long-tail distribution of: (a) items of bookmarked URLs, (b) users who made the

bookmarks, and (c) tags used in the bookmarks – captured from tagging data of Delicious website (Li

et al., 2008)

A tagging activity occurs when a user uses a tag to annotate an item, which naturally forms a ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relation. Over time, the tagging data is recorded as the accumulation of these ternary relations. Tag-based systems can serve as a supplementary source of information for building user profiles in personalized recommendation systems (Bogers and van den Bosch, 2009; Lü et al., 2012; Schoefegger and Granitzer, 2012; Zhang et al., 2011; Zhang et al., 2010). The ternary relations among users, items, and tags enhance information communication and sharing (Halpin et al., 2007; Marinho et al., 2011). Therefore, the success of a tag-based recommendation system depends on how the relations in the tagging data are exploited (Bogers and van den Bosch, 2009; Kim et al., 2010).

Tag-based systems support three types of recommendation, i.e. user, item, and tag recommendation (Symeonidis et al., 2010; Zhang et al., 2011). User recommendation suggests users whose profiles are similar to that of a target user, connecting users who frequently use the same sets of tags and encouraging them to contribute and share more content. Item recommendation suggests items to a target user based on the tags commonly used by similar users, while tag recommendation suggests tags to a target user based on the tags that similar users have assigned to the same items. The focus of this thesis is item recommendation.

Unlike the feedback used in “traditional” recommendation systems, adding tags to items can be considered implicit feedback on items (Liang et al., 2008). Tags can represent user preferences, support quality recommendations, and help alleviate problems of recommendation systems such as the cold-start problem (Zhang et al., 2010). A tag-based item recommendation method predicts the list of items that may interest a user by learning from the user's tagging preferences.

2.2.2 User Profile Modelling Approaches

User profiles can be modelled with two data modelling approaches, i.e. two-dimensional and multi-dimensional approaches. The two-dimensional approach represents data as vector or matrix models, whereas the multi-dimensional approach represents data as a multi-dimensional model, such as a tensor model. The following two sub-sections discuss the two approaches in more detail.

2.2.2.1 Two-Dimensional Approaches

Basic Concept

A two-dimensional user profile modelling approach, commonly used to model user-item relations, cannot be directly applied to a tag-based item recommendation system. This approach is unable (1) to directly capture the three-dimensional nature of tagging data, i.e. the relations among users, items, and tags (Nanopoulos, 2011; Symeonidis et al., 2010), and (2) to model the many-to-many relationships among these three dimensions. Researchers have addressed the problem of integrating tags by extending the user-item matrix used in standard collaborative filtering to enhance item recommendation. The three-dimensional ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ relation is projected into three lower-dimensional (two-dimensional) matrices, i.e. ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩, ⟨𝑢𝑠𝑒𝑟, 𝑡𝑎𝑔⟩, and ⟨𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩, as illustrated in Figure 2.3.

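[Figure 2.3 content: the ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ data shown next to its three projections: a user-item (binary) matrix, a user-tag (frequency) matrix, and an item-tag (frequency) matrix]

Figure 2.3. Projection of the ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ relation into three two-dimensional matrices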
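The projection in Figure 2.3 can be sketched in a few lines. This is a minimal illustration, assuming the tagging data is given as a set of (user, item, tag) triples and using Python dictionaries as sparse matrices; the function name `project_tagging_data` and the toy triples are hypothetical.

```python
from collections import defaultdict

def project_tagging_data(observed):
    """Project <user, item, tag> triples into three 2-D matrices stored as dicts."""
    user_item = {}                 # binary: 1 if the user tagged the item with any tag
    user_tag = defaultdict(int)    # frequency: how many times the user used the tag
    item_tag = defaultdict(int)    # frequency: how many times the tag was given to the item
    for u, i, t in observed:
        user_item[(u, i)] = 1
        user_tag[(u, t)] += 1
        item_tag[(i, t)] += 1
    return user_item, dict(user_tag), dict(item_tag)

# Hypothetical toy tagging data.
observed = {("u1", "i1", "t1"), ("u1", "i1", "t2"), ("u1", "i2", "t1"), ("u2", "i1", "t1")}
ui_matrix, ut_matrix, it_matrix = project_tagging_data(observed)
```

The user-tag matrix records that u1 used t1 twice but no longer says on which items, which is the loss of the full three-way interaction discussed in Section 2.2.2.2.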


Existing approaches

Several techniques can be employed to compute user or item similarities in order to generate the Top-𝑁 item recommendation prediction scores. Tso-Sutter et al. (2008) used an extended fusion technique that applies a tag extension method and combines two conditional probabilities based on user-based and item-based similarities. Alternatively, Liang et al. (2009) proposed combining user similarities and item similarities. User similarity is computed from the similarity of the users' tags, items, and tag-item pairs, while item similarity is calculated from the percentage of items assigned the same tag, the percentage of items tagged by the same user, and the percentage of common tag-item relationships.

Another useful two-dimensional way of exploiting tags is the hybrid technique. Tags and other item meta-data are incorporated into the collaborative filtering algorithm by substituting the usage-based similarity measures with tag overlap, or by combining tag-based similarity with usage-based similarity (Bogers and van den Bosch, 2009). Item recommendations can then be calculated with a content-based algorithm that uses the meta-data content.
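Combining a tag-based similarity with a usage-based similarity can be illustrated as follows. This is a minimal sketch, assuming Jaccard overlap as the similarity measure for both tag sets and item sets, and a linear mix with weight `alpha`; both choices and the function names are hypothetical illustrations, not necessarily the measures used by Bogers and van den Bosch (2009).

```python
def jaccard(a, b):
    """Jaccard overlap |a & b| / |a | b| of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def hybrid_user_similarity(tags_u, tags_v, items_u, items_v, alpha=0.5):
    """Linearly mix tag-overlap similarity with usage-based (item-overlap) similarity."""
    return alpha * jaccard(tags_u, tags_v) + (1 - alpha) * jaccard(items_u, items_v)

# Hypothetical profiles of two users: their tag sets and their item sets.
sim = hybrid_user_similarity({"python", "ml"}, {"ml", "stats"}, {"i1", "i2"}, {"i2"})
```

Here the tag overlap is 1/3 and the item overlap is 1/2, so with `alpha=0.5` the combined similarity is 5/12.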

Tags can also be used to build user and item tag clouds (Barragans-Martinez et al., 2010). A user tag cloud consists of the tags that the user has assigned, whereas an item tag cloud contains the tags that users have assigned to the item. An item is recommended to the target user by directly comparing the item's tag cloud with the user's tag cloud using a content-based technique. To improve the recommendations, collaborative filtering is complemented with the target user's tag cloud, which matches the user to suitable items.

Kim et al. (2010) proposed CTS, an effective method based on the idea that the tags a user assigns imply the user's latent preferences, i.e. the user-created tags determine the user-to-user similarity, which is then used to find the latent tags of each user. Tags have also been integrated into a user profile built from user ratings and user-generated tags (Kim et al., 2011), where the similarity is calculated by associating the tag weights with the user ratings. In this way, the three-dimensional relation is projected onto a two-dimensional matrix by considering only the ⟨𝑡𝑎𝑔, 𝑖𝑡𝑒𝑚⟩ relationship, and the similarity between users is computed for the relevant and irrelevant frequent tag patterns. The authors showed that CTS outperformed other two-dimensional approaches.

The CTS method (Kim et al., 2010) is used for benchmarking in this thesis because of its leading performance among two-dimensional approaches and its relevance to the tag-based item recommendation methods proposed in this thesis. It is the closest to the proposed methods in that it neither deals with the semantic problems of tags nor uses any information other than the tagging data to generate the list of recommendations.

2.2.2.2 Multi-Dimensional Approaches

Although the three-dimensional tagging data can be split into the two-dimensional ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩, ⟨𝑢𝑠𝑒𝑟, 𝑡𝑎𝑔⟩, and ⟨𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ pair relations, the full interaction among the three dimensions is then lost. Consequently, representing tagging data with a two-dimensional approach does not sufficiently expose the latent relationships among users, items, and tags, and results in poorer recommendation quality (Rendle, Balby Marinho, et al., 2009; Rendle and Schmidt-Thieme, 2010; Symeonidis et al., 2008, 2010).

The tensor model has been used successfully for many decades to represent multi-dimensional data in various fields, such as bioinformatics (Dyrby et al., 2005; Troyanskaya et al., 2001), chemistry (Appellof and Davidson, 1981), computer vision (Liu, Musialski, et al., 2009; Vasilescu and Terzopoulos, 2002), web mining (Sun et al., 2005), monitoring systems (Tsourakakis, 2009), and recommendation systems (Kutty et al., 2012; Rawat et al., 2011). In the past few years, researchers have adopted tensor models to represent tagging data, as they have been shown to capture the latent relationships among users, items, and tags efficiently (Ifada and Nayak, 2014c; Rendle, Balby Marinho, et al., 2009; Rendle and Schmidt-Thieme, 2010; Symeonidis et al., 2008, 2010). For this reason, this thesis uses a tensor model to solve the item recommendation task in tag-based recommendation systems.


User Profile Construction

In a tensor model, the tagging data is represented as a third-order tensor; in other words, user profiles are represented in the form of a tensor model. Section 2.2.3 details how the tensor model is populated using a tagging data interpretation scheme.

Latent Factors Generation

Latent factors generation is the process of deriving the latent relationships between

dimensions of the tensor model. This process is conducted by implementing the

tensor factorization technique.

Two broad families of tensor factorizations are Tucker (Tucker, 1966) and

Candecomp/Parafac (CP) (Carroll and Chang, 1970). Tucker (1966) factorization

generalises Singular Value Decomposition (SVD) into a higher-order form by

performing SVD on the matricized data for each dimension (Kolda, 2006). Higher-

Order SVD (HOSVD) and Higher-Order Orthogonal Iteration (HOOI) are two

common tensor factorizations based on Tucker.

HOSVD factorizes a tensor into a core tensor and latent factor matrices corresponding to each mode (De Lathauwer et al., 2000a; Kolda and Bader, 2009). HOSVD does not produce an optimal low-rank approximation of a tensor, since it optimizes each mode separately and disregards the interactions among the modes (Kolda, 2006). In HOSVD, all factor matrices are orthogonal and the matrix slices of the core tensor are mutually orthogonal (Bergqvist and Larsson, 2010).

For a third-order tensor $\mathcal{Y} \in \mathbb{R}^{Q \times R \times S}$, where $Q$, $R$, and $S$ are the numbers of users, items, and tags, respectively, the HOSVD factorization results in three latent factor matrices $M^{(1)} \in \mathbb{R}^{Q \times J}$, $M^{(2)} \in \mathbb{R}^{R \times K}$, and $M^{(3)} \in \mathbb{R}^{S \times L}$, and one core tensor $\mathcal{C} \in \mathbb{R}^{J \times K \times L}$. $J$, $K$, and $L$ are the numbers of columns of the corresponding latent factor matrices.

$$\mathcal{Y} := \mathcal{C} \times_1 M^{(1)} \times_2 M^{(2)} \times_3 M^{(3)} \tag{2.1}$$
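Written entry by entry, Equation 2.1 is $y_{q,r,s} = \sum_{j}\sum_{k}\sum_{l} c_{j,k,l}\, m^{(1)}_{q,j}\, m^{(2)}_{r,k}\, m^{(3)}_{s,l}$. The following minimal pure-Python sketch evaluates exactly this sum, assuming nested lists for the core tensor and factor matrices; the function name is hypothetical, and the cost of $O(QRS \cdot JKL)$ operations makes it suitable only for making the notation concrete.

```python
def tucker_reconstruct(core, m1, m2, m3):
    """Compute Y = C x1 M1 x2 M2 x3 M3 (Equation 2.1) for nested-list inputs.

    core: J x K x L, m1: Q x J, m2: R x K, m3: S x L. Returns a Q x R x S nested list.
    """
    J, K, L = len(core), len(core[0]), len(core[0][0])
    Q, R, S = len(m1), len(m2), len(m3)
    y = [[[0.0] * S for _ in range(R)] for _ in range(Q)]
    for q in range(Q):
        for r in range(R):
            for s in range(S):
                # Each core entry is weighted by one row of every factor matrix.
                y[q][r][s] = sum(
                    core[j][k][l] * m1[q][j] * m2[r][k] * m3[s][l]
                    for j in range(J) for k in range(K) for l in range(L)
                )
    return y

# Hypothetical 1 x 1 x 1 core with Q = 2 users, R = 2 items, and S = 1 tag.
y = tucker_reconstruct([[[2.0]]], [[1.0], [0.5]], [[1.0], [3.0]], [[1.0]])
```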

Each latent factor matrix is determined by applying SVD to the tensor matricized along the corresponding mode (dimension). The core tensor defines the interaction among users, items, and tags and has a significant impact on the result (Sun et al., 2005). The HOSVD (Tucker) factorization of the third-order tensor $\mathcal{Y} \in \mathbb{R}^{Q \times R \times S}$ is illustrated in Figure 2.4.
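The matricization step that HOSVD relies on can be sketched as follows. This is a minimal, dependency-free illustration with a hypothetical function name; the SVD itself would come from a linear algebra library (for example `numpy.linalg.svd`) and is omitted. The column ordering below (remaining indices in lexicographic order) is one convention; Kolda and Bader (2009) order the columns differently, but permuting the columns leaves the left singular vectors, and hence the HOSVD factor matrices, unchanged.

```python
from itertools import product

def unfold(y, mode):
    """Mode-n unfolding (matricization) of a third-order tensor stored as nested lists.

    The row index is the index along `mode` (0, 1 or 2); each column is one
    combination of the two remaining indices, taken in lexicographic order.
    """
    dims = (len(y), len(y[0]), len(y[0][0]))
    others = [m for m in range(3) if m != mode]
    rows = []
    for a in range(dims[mode]):
        row = []
        for b, c in product(range(dims[others[0]]), range(dims[others[1]])):
            idx = [0, 0, 0]
            idx[mode], idx[others[0]], idx[others[1]] = a, b, c
            row.append(y[idx[0]][idx[1]][idx[2]])
        rows.append(row)
    return rows

# Hypothetical 2 x 2 x 2 tensor with y[q][r][s] = 4q + 2r + s.
y = [[[0, 1], [2, 3]], [[4, 5], [6, 7]]]
user_mode = unfold(y, 0)   # Q x (R*S) matrix: [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Under HOSVD, $M^{(1)}$ would be the leading $J$ left singular vectors of `unfold(y, 0)`, and likewise for the other modes.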

HOOI (De Lathauwer et al., 2000b; Kolda, 2006) is an iterative (alternating) least-squares optimization approach for Tucker. It uses HOSVD to initialize the factor matrices, and the factorization is achieved by solving:

$$\min_{\mathcal{C},\, M^{(1)},\, M^{(2)},\, M^{(3)}} \left\| \mathcal{Y} - \mathcal{C} \times_1 M^{(1)} \times_2 M^{(2)} \times_3 M^{(3)} \right\| \tag{2.2}$$

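[Figure 2.4 content: $\mathcal{Y}$ (Q × R × S) ≈ core tensor $\mathcal{C}$ (J × K × L) multiplied along each mode by $M^{(1)}$ (Q × J), $M^{(2)}$ (R × K), and $M^{(3)}$ (S × L)]

Figure 2.4. The Tucker factorization model for a third-order tensor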

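[Figure 2.5 content: $\mathcal{Y} \approx m_1^{(1)} \circ m_1^{(2)} \circ m_1^{(3)} + m_2^{(1)} \circ m_2^{(2)} \circ m_2^{(3)} + \ldots + m_F^{(1)} \circ m_F^{(2)} \circ m_F^{(3)}$]

Figure 2.5. The CP factorization model for a third-order tensor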

The CP factorization (Carroll and Chang, 1970; Harshman, 1970) can be considered a special case of the Tucker model in which the core tensor is diagonal (Mørup et al., 2008). It factorizes a tensor into a sum of rank-one component tensors that optimally approximates the original tensor (Kolda and Bader, 2009).

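For example, given a third-order tensor $\mathcal{Y} \in \mathbb{R}^{Q \times R \times S}$, the CP factorization can be defined as:

$$\mathcal{Y} := \sum_{f=1}^{F} \lambda_f \, m_f^{(1)} \circ m_f^{(2)} \circ m_f^{(3)} \tag{2.3}$$

where $F$ is a positive integer (the number of rank-one components), $\lambda \in \mathbb{R}^{F}$, $m_f^{(1)} \in \mathbb{R}^{Q}$, $m_f^{(2)} \in \mathbb{R}^{R}$, $m_f^{(3)} \in \mathbb{R}^{S}$, and $\circ$ denotes the vector outer product. The CP factorization of the third-order tensor $\mathcal{Y} \in \mathbb{R}^{Q \times R \times S}$ is illustrated in Figure 2.5.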
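Equation 2.3 says that each entry is $y_{q,r,s} = \sum_{f=1}^{F} \lambda_f\, m^{(1)}_{f,q}\, m^{(2)}_{f,r}\, m^{(3)}_{f,s}$. A minimal pure-Python sketch follows, assuming (as a storage convention, not something fixed by the text) that the vectors $m_f^{(n)}$ are kept as the columns of $Q \times F$, $R \times F$, and $S \times F$ nested lists; the function name is hypothetical.

```python
def cp_reconstruct(weights, a, b, c):
    """Sum of F weighted rank-one tensors (Equation 2.3).

    weights: lambda (length F); a: Q x F, b: R x F, c: S x F, where column f of
    a, b, c holds m_f^(1), m_f^(2), m_f^(3). Returns a Q x R x S nested list.
    """
    F = len(weights)
    return [[[sum(weights[f] * a[q][f] * b[r][f] * c[s][f] for f in range(F))
              for s in range(len(c))]
             for r in range(len(b))]
            for q in range(len(a))]

# Hypothetical factors with F = 1: a single weighted outer product.
y_one = cp_reconstruct([2.0], [[1.0], [0.5]], [[1.0], [3.0]], [[1.0]])
# Hypothetical factors with F = 2: two rank-one terms that add up to a "diagonal" slice.
y_two = cp_reconstruct([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0]])
```

With $F = 1$ the model is a single outer product, i.e. a Tucker model with a $1 \times 1 \times 1$ core, which is the diagonal-core view of CP mentioned above.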


Recommendation Generation

There are two approaches to generating a list of recommendations from a tensor model: (1) factorizing the tensor model and inferring the recommendations from the latent factors (Leginus et al., 2012; Rendle and Schmidt-Thieme, 2010); and (2) reconstructing the tensor from the latent factors and inferring the recommendations from the reconstructed tensor (Kutty et al., 2012; Nanopoulos, 2011; Rafailidis and Daras, 2013; Symeonidis et al., 2008, 2010).

Remark: Scalability is a common problem for tensor models. For the first approach of generating recommendations, i.e. inferring recommendations from the latent factors, existing works tackle the issue within the factorization process by applying memory-efficient factorization (Kolda and Sun, 2008) and a pair-wise optimization criterion (Rendle and Schmidt-Thieme, 2010). The second approach, i.e. tensor reconstruction, goes a step further than the first. Tensor reconstruction computes an approximation of the initial tensor by multiplying all the latent factors, revealing the latent relationships among the dimensions of the tensor model. This process is memory-intensive; for instance, a hypothetical dataset with 10,000 users, 10,000 items, and 10,000 tags would give a dense reconstructed tensor of $10^{12}$ entries, roughly 8 TB at 8 bytes per entry. Reconstructing large tensors is therefore infeasible (Kutty et al., 2012; Leginus et al., 2012; Nanopoulos, 2011; Rafailidis and Daras, 2013; Symeonidis et al., 2010), and scalability remains an open problem.

This thesis proposes solutions for both recommendation generation approaches. To solve the scalability problem of the first approach, this thesis implements the weighted scheme and the list-wise ranking-based criterion approaches, presented in Chapter 4 and Chapter 5, respectively. For the second approach, a memory-efficient loop approach is applied for scalable full tensor reconstruction, as presented in Chapter 4.
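The general idea of reconstructing a tensor in a loop, rather than all at once, can be illustrated with a generator that materialises one user slice at a time. This is a minimal sketch under stated assumptions: it uses CP factors (Equation 2.3) and a hypothetical function name, and it is not the memory-efficient loop approach of Chapter 4, whose details are not given in this section. A consumer that ranks each slice before requesting the next needs memory for about $R \times S$ entries instead of $Q \times R \times S$.

```python
def user_slices(weights, a, b, c):
    """Yield (q, slice) pairs of a CP-reconstructed tensor, one user at a time.

    weights: lambda (length F); a: Q x F, b: R x F, c: S x F nested lists.
    Each slice is an R x S nested list built only when it is requested.
    """
    F = len(weights)
    for q in range(len(a)):
        scaled = [weights[f] * a[q][f] for f in range(F)]  # fold lambda into the user row
        yield q, [[sum(scaled[f] * b[r][f] * c[s][f] for f in range(F))
                   for s in range(len(c))]
                  for r in range(len(b))]

# Hypothetical factors: F = 1, Q = 2 users, R = 2 items, S = 1 tag.
# list() collects all slices here only so the output can be inspected.
slices = list(user_slices([2.0], [[1.0], [0.5]], [[1.0], [3.0]], [[1.0]]))
```

In practice the loop would be `for q, slice_ in user_slices(...)`, scoring the items of user `q` from `slice_` and discarding it before the next user.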

The following sections classify and detail the connections between the tag-based item recommendation methods developed in this thesis and the closely related methods, in terms of the tagging data interpretation schemes (Section 2.2.3) and the ranking-based recommendation approaches (Section 2.3) they use. The classifications are summarised in Table 2.2 and Table 2.3.


2.2.3 Tagging Data Interpretation Schemes

Data interpretation is the process of interpreting the information collected from users, i.e. tagging data, to represent the user profiles in a tensor model. Selecting the appropriate interpretation scheme in a tag-based recommendation method is crucial, as it defines the user profile representation and affects the recommendation quality. Different interpretation schemes produce different types of data to populate the tensor model, which in turn influences how the recommendation task is solved, as detailed in Section 2.3.

A typical tag-based item recommendation method interprets the observed tagging data as “positive” or “relevant” entries. In contrast, how the non-observed tagging data should be interpreted remains disputed and open to researchers' perceptions. There are two well-known interpretation schemes, namely the boolean and set-based schemes, which fundamentally differ in how they interpret the non-observed tagging data.

2.2.3.1 The boolean Scheme

The boolean scheme (Symeonidis et al., 2010) is commonly used in tag-based item recommendation methods. It simply interprets the tagging data as binary data with two types of entries, i.e. “relevant” and “irrelevant” entries. The “relevant” entries, labelled “1”, are the observed entries, where the user has explicitly revealed interest by annotating an item with tags, while the “irrelevant” entries, labelled “0”, are the remaining (non-observed) entries. A recommendation model based on the boolean scheme tries to learn to predict a 0 for each of the “irrelevant” cases (Rendle, Balby Marinho, et al., 2009).

Let $U = \{u_1, u_2, u_3, \ldots, u_Q\}$ be the set of $Q$ users, $I = \{i_1, i_2, i_3, \ldots, i_R\}$ the set of $R$ items, and $T = \{t_1, t_2, t_3, \ldots, t_S\}$ the set of $S$ tags. Given $A := U \times I \times T$, a triple $a = (u, i, t) \in A$ represents the activity of user $u$ annotating item $i$ with tag $t$. The observed tagging data $A_{ob} \subseteq A$ records the activities in which users have expressed their interest in items in the past by annotating them with tags. Note that the observed tagging data is usually very sparse, i.e. $|A_{ob}| \ll |A|$.


An initial third-order tensor $\mathcal{Y} \in \mathbb{R}^{Q \times R \times S}$ is constructed, where $Q$, $R$, and $S$ are the numbers of users, items, and tags, respectively, and each tensor entry $y_{u,i,t}$ is given a numerical value that represents the relevance grade of the tagging activity. Figure 2.6(a) illustrates a toy example in which a tensor $\mathcal{Y} \in \mathbb{R}^{3 \times 4 \times 5}$ holds the record of $A_{ob}$, where $U = \{u_1, u_2, u_3\}$, $I = \{i_1, i_2, i_3, i_4\}$, and $T = \{t_1, t_2, t_3, t_4, t_5\}$. Each slice of the tensor is a user matrix containing that user's tag usage for each item. The boolean scheme labels the relevance grade of each entry of tensor $\mathcal{Y}$ as follows:

$$y_{u,i,t} := \begin{cases} 1 & \text{if } (u,i,t) \in A_{ob} \\ 0 & \text{otherwise} \end{cases} \tag{2.4}$$

Figure 2.6(b) illustrates the constructed initial tensor $\mathcal{Y} \in \mathbb{R}^{3 \times 4 \times 5}$, whose entries are generated from the tagging data by implementing the boolean scheme.
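Equation 2.4 translates directly into code. This is a minimal sketch with a hypothetical function name and hypothetical toy data (not the data of Figure 2.6), assuming the observed tagging data $A_{ob}$ is given as a set of (user, item, tag) triples.

```python
def boolean_tensor(observed, users, items, tags):
    """Equation 2.4: y[u][i][t] = 1 if (u, i, t) is observed, otherwise 0."""
    return [[[1 if (u, i, t) in observed else 0 for t in tags]
             for i in items]
            for u in users]

# Hypothetical toy data: 2 users, 2 items, 3 tags, 3 observed activities.
users, items, tags = ["u1", "u2"], ["i1", "i2"], ["t1", "t2", "t3"]
observed = {("u1", "i1", "t1"), ("u1", "i2", "t2"), ("u2", "i2", "t3")}
y_bool = boolean_tensor(observed, users, items, tags)
```

Only 3 of the 12 entries are 1, which illustrates on a tiny scale how 0 values dominate a boolean-scheme tensor.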

[Figure 2.6 panels: (a) the observed tagging data of User 1, User 2, and User 3, each shown as an item × tag slice; (b) the corresponding tensor entries under the boolean scheme (values 0 and 1); (c) the corresponding tensor entries under the set-based scheme (values −1, 0, and 1)]

Figure 2.6. A toy example with 𝑈 = {𝑢1, 𝑢2, 𝑢3}, 𝐼 = {𝑖1, 𝑖2, 𝑖3, 𝑖4}, and 𝑇 = {𝑡1, 𝑡2, 𝑡3, 𝑡4, 𝑡5}: (a) The

observed tagging data, and the initial tensor 𝒴 ∈ ℝ3×4×5 for which entries are generated by

implementing (b) the boolean, and (c) the set-based schemes


2.2.3.2 The set-based Scheme

The set-based scheme interprets the tagging data as multi-graded data with three distinct types of entries, i.e. “relevant”, “irrelevant”, and “indecisive”, derived from the observed and non-observed entries. The set-based scheme was proposed to solve two shortcomings of the boolean scheme: (1) the sparsity problem, i.e. 0 values dominate the data; and (2) the overfitting problem, i.e. all non-observed entries are labelled 0 (Rendle, Balby Marinho, et al., 2009).

The set-based scheme employs a ranking constraint to differentiate the relevance grades of the data. It infers that, for each user-tag pair $(u, t)$ observed in $A_{ob}$, user $u$ is less likely to use tag $t$ to annotate the items of “irrelevant” entries than the items of “relevant” entries (Gemmell et al., 2011). Accordingly, the higher ordinal relevance value “1” is assigned to the “relevant” entries, whereas the “irrelevant” entries are labelled “−1”. The value “0” labels the “indecisive” entries, which are to be predicted when generating recommendations. The set-based scheme labels the relevance grade of each entry of tensor $\mathcal{Y}$ as follows:


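$$y_{u,i,t} := \begin{cases} 1 & \text{if } (u,i,t) \in A_{ob} \\ -1 & \text{if } (u,i,t) \notin A_{ob} \text{ and } \exists\, i' \in I : (u,i',t) \in A_{ob} \\ 0 & \text{otherwise} \end{cases} \tag{2.5}$$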

Figure 2.6(c) illustrates the constructed initial tensor $\mathcal{Y} \in \mathbb{R}^{3 \times 4 \times 5}$, whose entries are generated from the tagging data by implementing the set-based scheme.
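The three labelling rules of Equation 2.5 can be sketched in the same way. This minimal illustration uses a hypothetical function name and hypothetical toy data, and reads the “irrelevant” condition as “user $u$ has used tag $t$ on at least one item, but not on item $i$”.

```python
def set_based_tensor(observed, users, items, tags):
    """Equation 2.5: 1 = relevant, -1 = irrelevant, 0 = indecisive."""
    used_pairs = {(u, t) for (u, _, t) in observed}   # (u, t) seen on at least one item

    def label(u, i, t):
        if (u, i, t) in observed:
            return 1                                  # relevant
        if (u, t) in used_pairs:
            return -1                                 # u used t, but not on item i
        return 0                                      # indecisive: left to be predicted

    return [[[label(u, i, t) for t in tags] for i in items] for u in users]

# Hypothetical toy data: 2 users, 2 items, 3 tags, 3 observed activities.
users, items, tags = ["u1", "u2"], ["i1", "i2"], ["t1", "t2", "t3"]
observed = {("u1", "i1", "t1"), ("u1", "i2", "t2"), ("u2", "i2", "t3")}
y_set = set_based_tensor(observed, users, items, tags)
```

Compared with the boolean scheme, the tag columns a user never used stay at 0 (indecisive), so fewer non-observed entries are forced to a fixed “irrelevant” value.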

2.2.4 Summary and Discussion

Tag-based recommendation systems capture user interests by analysing tagging data and generate a list of item recommendations by learning from users' tagging preferences. Tagging data records users' tagging activities in a tag-based system, accumulating ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relations over time. The success of a tag-based recommendation system depends on how the relations in the tagging data are exploited (Bogers and van den Bosch, 2009; Kim et al., 2010). User profiles can be modelled either with a two-dimensional approach, which projects the ternary relations of the tagging data into multiple matrix models, or with a multi-dimensional approach, which represents the tagging data as a tensor model and requires a data interpretation scheme to define the user profile representation. Table 2.2 summarises tag-based recommendation research according to the user profile modelling approaches, the data interpretation schemes, and the types of recommendation. Note that the focus of this thesis is item recommendation.

Since tagging data is multi-dimensional, it is natural to model the user profiles generated from it with a multi-dimensional approach, i.e. a tensor model. Researchers have proposed two ways of interpreting tagging data to populate tensor models. The first is the straightforward boolean scheme. The set-based scheme was proposed to overcome its drawback, the sparsity problem caused by non-observed data dominating the tensor model. Despite its success in solving the drawbacks of the boolean scheme, the set-based scheme still does not learn efficiently from the non-observed data, as it overgeneralises the “irrelevant” entries of the non-observed data (Ifada and Nayak, 2014a). This calls for alternative interpretation schemes that can thoroughly utilise the user's tagging history to generate the list of recommendations. Table 2.2 lists the two solutions proposed in this thesis, the UTS and graded-relevance schemes, which tackle this shortcoming and fill the gap, as described in Chapter 5.

Once the tensor model is constructed, the next stages are latent factor generation via tensor factorization, and tensor reconstruction. Recommendations can be inferred from a tensor model in two ways: (1) using the latent factors (Leginus et al., 2012; Rendle and Schmidt-Thieme, 2010); and (2) using the reconstructed tensor, i.e. a full reconstruction of the original tensor (Kutty et al., 2012; Nanopoulos, 2011; Rafailidis and Daras, 2013; Symeonidis et al., 2008, 2010). For the first approach, existing studies apply memory-efficient (Kolda and Sun, 2008) and pair-wise optimization criterion (Rendle and Schmidt-Thieme, 2010) approaches to solve the scalability problem within the factorization process. For the second approach, tensor reconstruction approximates the initial tensor by multiplying all the latent factors. It is memory-intensive and, therefore, reconstructing large tensors is infeasible (Kutty et al., 2012; Leginus et al., 2012; Nanopoulos, 2011; Rafailidis and Daras, 2013; Symeonidis et al., 2010). This thesis develops four methods for generating tag-based item recommendations that provide solutions for both approaches, as described in Chapter 4 and Chapter 5.


User Profile Modelling Approach | Method | Type of Recommendation

boolean interpretation scheme
Matrix | Fusion (Tso-Sutter et al., 2008); Best CF run (Bogers and van den Bosch, 2009); CTS (Kim et al., 2010); Tag Cloud (Barragans-Martinez et al., 2010); WTR (Liang et al., 2010); User-tag-object Diffusion (Zhang et al., 2010); CUM (Kim et al., 2011); LIM-Item (Alper, 2012) | Item
Matrix | Topickr (Negoescu and Gatica-Perez, 2008); SimGroup (Lee and Brusilovsky, 2010); UCTM (Kim and El Saddik, 2013) | User
Matrix | Vote+ (Sigurbjörnsson and Van Zwol, 2008); TagiCofi (Zhen et al., 2009); LIM-Tag (Alper, 2012) | Tag
Tensor | MAX-Item (Symeonidis et al., 2010); TB (Nanopoulos, 2011); Spectral K-means (Leginus et al., 2012); TFC (Rafailidis and Daras, 2013); TRPR (Ifada and Nayak, 2014b, 2014c); We-Rank | Item
Tensor | Tensor Reduction (Symeonidis et al., 2008); MAX-User (Symeonidis et al., 2010) | User
Tensor | MAX-Tag (Symeonidis et al., 2010); LOTD (Cai et al., 2011) | Tag

set-based interpretation scheme
Tensor | RTF (Rendle, Balby Marinho, et al., 2009); PITF (Rendle and Schmidt-Thieme, 2010); RMTF (Jitao et al., 2012) | Tag

UTS interpretation scheme
Tensor | Do-Rank (Ifada and Nayak, 2015) | Item

graded-relevance interpretation scheme
Tensor | Go-Rank (Ifada and Nayak, 2016) | Item

Table 2.2. Classification of tag-based recommendation research according to the user profile

modelling approaches, the data interpretation schemes, and the types of recommendation


2.3 RANKING-BASED RECOMMENDATION APPROACHES

Section 2.2 has detailed the important aspects of tag-based item recommendation systems and the importance of selecting the appropriate user profile modelling approach and tagging data interpretation scheme for populating the model. In other words, Section 2.2 establishes the underlying model to be used. However, there is scope for improvement in the ranking of the list of recommendations. This section discusses the ranking-based recommendation approaches that can be implemented to solve the recommendation task by learning from the constructed model.

The task of a tag-based item recommendation system is to generate the list of items that may be of interest to a user by learning from the user's past tagging behaviour. The list of item recommendations is then sorted in descending order of the predicted preference scores. Users usually show more interest in the few items at the top of the list than in those further down (Agichtein et al., 2006; Cremonesi et al., 2010; Liu, 2009; Mohan et al., 2011; Wang et al., 2013; Weimer et al., 2007). Because the order of items in the recommendation list is essential, it is advantageous to apply a learning-to-rank approach when learning the tag-based recommendation model, to solve the item recommendation task.

Figure 2.7 shows the typical learning-to-rank framework. In the learning phase, a learning algorithm learns a ranking model from the training data such that the model predicts the ground-truth relevance grades of the training data as accurately as possible, as measured by a loss function. In the test phase, the model learned in the training phase is used to generate the list of recommendations for a target user.

Learning-to-rank approaches can be categorised into three types, point-wise, pair-wise, and list-wise, according to the input representation and loss function used (Balakrishnan and Chopra, 2012; Liu, 2009; Mohan et al., 2011). The following subsections detail the learning-to-rank process of each approach.


[Figure 2.7 content: Training Data → Learning System (min Loss) → Ranking Model; Test Data → Ranking System (using the Ranking Model) → Prediction]

Figure 2.7. Learning-to-rank framework, adapted from (Liu, 2009)

2.3.1 Point-wise Based Ranking Approaches

In a point-wise ranking approach, a recommendation model is learned to predict whether or not the user will like each item, assuming there is no interdependency among the predicted items (Liu, 2009; Mohan et al., 2011; Rendle, 2011). The model contains a function that takes the feature vector of an item as input and predicts the degree of relevance of that item to the user. The function is defined such that learning the ranking model becomes the corresponding regression or classification problem (Balakrishnan and Chopra, 2012; Liu, 2009).

2.3.1.1 Regression based algorithm

The regression function is used when the output of the ranking model contains real-

valued predicted preference scores. In this case, the function space of a tag-based

item recommendation system can be generally formulated as follows:

$$f(\mathit{user}, \mathit{item}, \mathit{tag}) \rightarrow \mathbb{R} \tag{2.6}$$
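A regression-based point-wise model of the form of Equation 2.6 is typically fitted by minimising a per-entry error. The sketch below assumes the squared error, averaged over the training entries, as that loss; the section itself does not fix a loss, and the function names and data are hypothetical.

```python
def pointwise_squared_loss(predict, targets):
    """Mean squared error between predicted and target grades, entry by entry.

    targets maps (user, item, tag) -> relevance grade. Each entry contributes
    independently of all others, which is what makes the loss point-wise.
    """
    errors = [(predict(u, i, t) - y) ** 2 for (u, i, t), y in targets.items()]
    return sum(errors) / len(errors)

# Hypothetical targets (boolean-scheme grades) and a constant predictor.
targets = {("u1", "i1", "t1"): 1.0, ("u1", "i2", "t1"): 0.0}

def constant_predictor(u, i, t):
    return 0.5

loss = pointwise_squared_loss(constant_predictor, targets)   # 0.25
```

Because no term compares two items, the loss says nothing directly about their relative order, which is the limitation of point-wise ranking discussed later in this section.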

Several recommendation systems have implemented regression-based algorithms with point-wise ranking approaches. For systems that use ratings as explicit feedback data, SVD++ (Koren, 2008) solves the recommendation task by merging the matrix latent factor and neighbourhood models. To improve accuracy, the method extends these models to exploit both explicit and implicit feedback data. Similarly, MF (Koren et al., 2009) extends the matrix latent factor model by combining it with a temporal effects model. Alternatively, Koren and Sill (2011) suggested that users' rating feedback should be viewed as ordinal rather than numeric values, to better reflect genuine user preferences. Using matrix factorization, their method parameterizes thresholds so that each user can have a different rating scale. Collaborative Ranking (Balakrishnan and Chopra, 2012), on the other hand, computes predicted rescaled ratings instead of predicted rating values, by simultaneously learning the latent factors, controlled by a set of parameters 𝜃. However, all of these rating-based methods, whose user profiles are generated from the ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩ binary relation, are not suitable for tag-based item recommendation systems, which build user profiles from the ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relation.

For item recommendation systems with implicit feedback data, MAX (Symeonidis et al., 2010) was proposed to solve the prediction problem of tag-based systems. It is one of the first tag-based item recommendation methods to use a tensor as its learning model, with the boolean scheme used to construct the user profiles. It applies an HOSVD-based decomposition technique (Kolda and Bader, 2009) and directly uses the reconstructed tensor to generate the list of recommendations based on the maximum predicted score in each user-item set. The method simply assumes that a user's preference for a candidate item is represented solely by the predicted score for a single tag, i.e. the influence of the other tags is disregarded, which affects the recommendation quality. Furthermore, the method builds the tensor model from a relatively small dataset because of the scalability issue that commonly arises when reconstructing large tensor models.
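The maximum-based scoring described above can be sketched as follows, assuming a reconstructed tensor is already available as a $Q \times R \times S$ nested list. The function name and toy scores are hypothetical, and since this section does not state whether MAX removes items the user has already tagged, the sketch does not filter them.

```python
def max_scores_top_n(y_hat, n):
    """Rank items per user by their maximum predicted score over all tags.

    y_hat: Q x R x S nested list (a reconstructed tensor). Returns, for each user,
    the indices of the n highest-scoring items in descending order of score.
    """
    rankings = []
    for user_slice in y_hat:                          # one R x S matrix per user
        scores = [max(tag_scores) for tag_scores in user_slice]
        order = sorted(range(len(scores)), key=lambda r: scores[r], reverse=True)
        rankings.append(order[:n])
    return rankings

# Hypothetical reconstructed tensor: 2 users x 3 items x 2 tags.
y_hat = [
    [[0.1, 0.2], [0.5, 0.4], [0.8, 0.3]],
    [[0.6, 0.0], [0.1, 0.7], [0.3, 0.3]],
]
top = max_scores_top_n(y_hat, 2)   # [[2, 1], [1, 0]]
```

Only the single largest tag score of each item counts; for the first user, item 1's second score (0.4) has no influence on its rank, which is the disregard of other tags noted above.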

2.3.1.2 Classification based algorithm

The classification function is used when the output of the ranking model contains

discrete predicted preference scores. In this case, the function space of a tag-based

item recommendation system can be generally formulated as follows:

$$f(\mathit{user}, \mathit{item}, \mathit{tag}) \rightarrow \mathbb{Z} \tag{2.7}$$


For systems that use implicit feedback data, Li et al. (2007) proposed solving the ranking problem by converting classification results into class probabilities using a logistic function. The probabilities are then converted into ranking scores using a weighted scheme. Although it is claimed to be robust, this method is not suitable for a tag-based item recommendation system, as it was built for web search, where the data consists of a set of queries, each with a list of returned documents. Features captured from the system, such as anchor text, URL, document title, and body text (Burges et al., 2005), are then used to label the relevance of each ⟨𝑞𝑢𝑒𝑟𝑦, 𝑑𝑜𝑐𝑢𝑚𝑒𝑛𝑡⟩ binary relation. In contrast, a tag-based system labels the relevance of each ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relation based on the recorded tagging data.
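The conversion of classifier outputs into ranking scores can be illustrated with a small sketch. It assumes two illustrative choices that are not taken from Li et al. (2007): the softmax (the multi-class generalisation of the logistic function) to obtain class probabilities, and the expected grade, $\sum_k k \cdot P(k)$, as the weighting that turns probabilities into a score. The function names and inputs are hypothetical.

```python
import math

def class_probabilities(logits):
    """Softmax over per-grade classifier scores (multi-class logistic function)."""
    m = max(logits)                               # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_grade(logits):
    """Ranking score = sum over grades k = 0..K-1 of k * P(grade = k)."""
    return sum(k * p for k, p in enumerate(class_probabilities(logits)))

# Hypothetical classifier outputs for two items over the grades {0, 1}.
score_a = expected_grade([0.0, math.log(3.0)])   # P = [0.25, 0.75], score ≈ 0.75
score_b = expected_grade([0.0, 0.0])             # P = [0.5, 0.5], score = 0.5
```

The resulting real-valued scores can be sorted like any point-wise prediction, so item a would be ranked above item b.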

Existing point-wise ranking-based recommendation methods (Nanopoulos, 2011; Rafailidis and Daras, 2013; Symeonidis et al., 2010), which use a tensor as the learning model, directly employ a reconstructed tensor to generate recommendations based on the maximum predicted score in each user-item set of tensor elements. These approaches assume that the predicted score in the reconstructed tensor represents the user's preference for an item based on a single tag only, and they disregard the users' activity histories (Jain and Varma, 2011), which influence the likelihood of a user selecting the recommended items, as widely studied in recommendation research (Kim et al., 2010). In other words, point-wise ranking approaches cannot properly deal with the relative order of the recommended items. Consequently, they may unintentionally overemphasise the items further down the list (Liu, 2009).

This thesis attempts to tackle these disadvantages of the point-wise ranking approach by proposing two methods, i.e. Tensor-based Item Recommendation using Probabilistic Ranking (TRPR) and Ranking using Weighted Tensor (We-Rank). TRPR deals with the problem by applying probabilistic ranking to the list of candidate items, while We-Rank employs a weighting scheme to ensure that items are appropriately emphasised during the learning process. The details of these methods are presented in Chapter 4.

For benchmarking, MAX (Symeonidis et al., 2010) is used because of its good performance record and its relevance to the two proposed methods. MAX shares the learning framework of the proposed methods in that it does not implement any additional technique to deal with the semantic problems of tags. Moreover, both MAX and the two proposed point-wise methods use a tensor model, a multi-dimensional approach, to build the user profiles, and the boolean scheme to interpret the tagging data and populate the tensor model.

2.3.2 Pair-wise Based Ranking Approaches

The pair-wise based ranking approach offers an alternative solution for modelling the relative order of the recommended items. Using this approach, a recommendation model is learned to predict the order of a pair of items, so that interdependency arises between the two paired items (Liu, 2009; Mohan et al., 2011; Rendle, 2011). The learning model of such an approach contains functions that take a pair of items as input and predict the relative order between them. The loss function is defined such that the ranking model is learned as a pair-wise regression or classification loss (Liu, 2009).
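To make the pair-wise formulation concrete, the sketch below computes one common pair-wise loss, the logistic loss on score differences that BPR-style methods build on, for a single user (and, in the tag-based setting, a single tag). The function name and toy data are hypothetical, and the sketch is not the loss of any specific method cited in this section.

```python
import numpy as np


def pairwise_logistic_loss(scores, relevance):
    """Mean logistic loss over all ordered pairs (j, k) with relevance[j] > relevance[k].

    The loss for a pair is -log(sigmoid(s_j - s_k)): it is small when the model
    scores the more relevant item j well above item k.
    """
    s = np.asarray(scores, dtype=float)
    rel = np.asarray(relevance)
    losses = [
        np.logaddexp(0.0, -(s[j] - s[k]))  # numerically stable log(1 + exp(-(s_j - s_k)))
        for j in range(len(s))
        for k in range(len(s))
        if rel[j] > rel[k]
    ]
    return float(np.mean(losses)) if losses else 0.0


relevance = [1, 0, 0]  # item 0 is relevant, items 1 and 2 are not
print(pairwise_logistic_loss([3.0, 1.0, 0.0], relevance))  # small: pairs ordered correctly
print(pairwise_logistic_loss([0.0, 1.0, 3.0], relevance))  # large: pairs reversed
```

The loss is defined only through score differences within pairs, which is what distinguishes it from a point-wise regression on individual scores.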

2.3.2.1 Regression based algorithm

In this case, the function space of a tag-based item recommendation system can be

generally formulated as follows:

$$f(\mathit{user}, \mathit{item}_1, \mathit{item}_2, \mathit{tag}) \rightarrow \mathbb{R} \tag{2.8}$$

Several recommendation systems have been proposed that implement a regression based algorithm with a pair-wise based ranking approach. For systems that use ratings as explicit feedback data, EigenRank (Liu and Yang, 2008) solves the recommendation task by measuring user similarity based on the correlation between the rankings of item pairs rather than on the rating values. The target user's neighbours are selected using this similarity, and the predicted preference scores are then calculated by applying a random walk model. Liu, Zhao, et al. (2009) instead modelled user preferences from the relative ordering of items by employing probabilistic latent preference analysis (pLPA), rather than the commonly used statistical model, i.e. probabilistic latent semantic analysis (pLSA) (Hofmann, 1999). The authors claimed that pLPA overcomes the limitation of the pLSA model and can handle recommendation tasks that use either explicit or implicit feedback data. Taking a different approach, Balakrishnan and Chopra (2012) implemented a matrix factorization approach in which the latent factor model and a set of parameters 𝜃 are learned simultaneously via stochastic gradient descent in order to compute the predicted rescaled ratings.

For item recommendation systems with implicit feedback data, BPR (Rendle, Freudenthaler, et al., 2009) was proposed. The method used a ranking-based scheme, i.e. the set-based scheme, to construct the user profile, matrix factorization to generate the latent factors, and a smoothed AUC-based optimization to formulate the objective function of the learning model. However, this work and all of the rating-based methods are not suitable for tag-based item recommendation, as they solve the problem of the ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩ binary relation only, instead of the ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relation. Using the binary relation to solve a recommendation problem defined on the ternary relation means that the third dimension, i.e. tag, is not used in the learning model. This is disadvantageous for a tag-based item recommendation system, since such a system predicts the list of items that may be of interest to a user by learning from the user's past tagging behaviour.
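The loss of the tag dimension can be illustrated with a small example. The sketch below, with hypothetical data and a hypothetical helper `to_user_item`, projects two different tagging tensors onto the ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩ binary relation and shows that both yield the same user-item matrix, so a model trained on that matrix cannot distinguish them.

```python
import numpy as np


def to_user_item(y: np.ndarray) -> np.ndarray:
    """Collapse a (user, item, tag) tensor into a binary user-item matrix."""
    return (y.sum(axis=2) > 0).astype(int)


# Two hypothetical tagging tensors with 1 user, 2 items, and 2 tags.
y_a = np.zeros((1, 2, 2), dtype=int)
y_a[0, 0, 0] = y_a[0, 1, 0] = 1   # both items annotated with tag 0

y_b = np.zeros((1, 2, 2), dtype=int)
y_b[0, 0, 0] = y_b[0, 1, 1] = 1   # the items annotated with different tags

print(to_user_item(y_a))          # [[1 1]]
print(to_user_item(y_b))          # [[1 1]]  (same matrix: the tag information is lost)
print(np.array_equal(y_a, y_b))   # False
```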

Subsequently, Rendle and Schmidt-Thieme (2010) applied the BPR framework to the task of tag recommendation, using tensor factorization to generate the latent factors, and named the method PITF. As this method uses the Area Under the Receiver Operating Characteristic Curve (AUC) as the optimization criterion, it undesirably assigns an equal penalty to all mistakes made in the list regardless of their positions, e.g. top or bottom, in the recommendation list (Shi, Karatzoglou, Baltrunas, Larson, Hanjalic, et al., 2012). Additionally, recent work has shown that the set-based scheme is not efficient for interpreting tagging data (Ifada and Nayak, 2014a), since it overgeneralises the "irrelevant" entries of the non-observed data, as previously described in Section 2.2.3.2. PITF also differs from tag-based item recommendation in that it generates recommendations for two specified dimensions, i.e. user and item, whereas tag-based item recommendations are made for a specified user only, i.e. the list of item recommendations generated for a target user is influenced by all tags.
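The position-insensitivity of AUC can be checked with a small example. In the plain-Python sketch below (hypothetical item names and helper functions), both rankings contain exactly two misordered (relevant, irrelevant) pairs and therefore obtain the same AUC, even though in the first ranking the mistake occupies the top position; a position-aware measure such as reciprocal rank separates them.

```python
def auc(ranking, relevant):
    """Fraction of (relevant, irrelevant) item pairs that are ordered correctly."""
    pos = {item: r for r, item in enumerate(ranking)}
    rel = [i for i in ranking if i in relevant]
    irr = [i for i in ranking if i not in relevant]
    correct = sum(pos[a] < pos[b] for a in rel for b in irr)
    return correct / (len(rel) * len(irr))


def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant item (0 if there is none)."""
    for r, item in enumerate(ranking, start=1):
        if item in relevant:
            return 1.0 / r
    return 0.0


relevant = {"a", "b"}
mistake_at_top = ["p", "a", "b", "q", "r", "s"]   # irrelevant p ranked above a and b
mistake_lower = ["a", "q", "r", "b", "s", "p"]    # irrelevant q, r ranked above b only

for ranking in (mistake_at_top, mistake_lower):
    print(auc(ranking, relevant), reciprocal_rank(ranking, relevant))
# 0.75 0.5
# 0.75 1.0
```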


2.3.2.2 Classification based algorithm

In this case, the function space of a tag-based item recommendation system can be

generally formulated as follows:

$$f(\mathit{user}, \mathit{item}_1, \mathit{item}_2, \mathit{tag}) \rightarrow \mathbb{Z} \tag{2.9}$$

For systems that use ratings as explicit feedback data, Balcan et al. (2007) suggested solving the ranking problem with a robust reduction technique that reduces ranking, as measured by the Area Under the Receiver Operating Characteristic Curve (AUC), to binary classification. Though the results are promising, this work is not suitable for tag-based item recommendation, as it was built for a system of two-dimensional correlations. A recommendation method for a binary relation builds its learning model such that there is only one relevance value for each ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩ correlation. In contrast, a tag-based item recommendation method builds its learning model from ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary-relation tagging data, in which items have multiple relevance values, one in each (𝑢𝑠𝑒𝑟, 𝑡𝑎𝑔) set.

2.3.3 List-wise Based Ranking Approaches

Though the pair-wise based ranking approach offers an advantage over the point-wise approach, it disregards the fact that ranking is a prediction task on a list of items (Cao et al., 2007). This makes the list-wise based ranking approach, which is used in this thesis, a suitable solution for a recommendation task (Liu, 2009). Using this approach, a recommendation model is learned to predict an ordered set of items that will be of interest to a user, where the rank of a predicted item depends on the other items in the list (Liu, 2009; Mohan et al., 2011). The learning model of such an approach contains a function that takes a group of items as input and predicts either their relevance grades or their permutation. In this case, the function space of a tag-based item recommendation system can be generally formulated as follows:

$$f(\mathit{user}, \mathit{item}_1, \ldots, \mathit{item}_n, \mathit{tag}) \rightarrow \mathbb{R} \tag{2.10}$$

List-wise approaches can be divided into two categories according to their loss function: (1) those that directly optimize the ranking evaluation measures, and (2) those that minimize a list-wise loss function.


2.3.3.1 Directly Optimizing Ranking Evaluation Measure

In this category, a learning-to-rank approach optimizes the recommendation model with respect to a ranking evaluation measure in order to generate a high-quality Top-𝑁 recommendation list (Chapelle and Wu, 2010; Cremonesi et al., 2010; Liu, 2009; Xu and Li, 2007). Widely used ranking evaluation measures include Mean Average Precision (MAP), Mean Reciprocal Rank (MRR), and Discounted Cumulative Gain (DCG).

MAP and MRR are commonly used measures for binary relevance data (Chapelle and Wu, 2010; Liu, 2009; Shi, Karatzoglou, Baltrunas, Larson, Hanjalic, et al., 2012; Shi, Karatzoglou, Baltrunas, Larson, Oliver, et al., 2012). MAP is defined as the mean of the Average Precision (AP) values across users, where AP takes into account the rank position of each relevant item: it is the average of the precision scores at the positions that hold relevant items (Buckley and Voorhees, 2000; Chapelle and Wu, 2010). MRR is the mean of the reciprocal rank (RR), and is equivalent to MAP when the user wishes to see only one relevant item (Craswell, 2009; Voorhees, 1999). RR itself is the reciprocal of the rank of the first relevant item.
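These definitions translate directly into code. The plain-Python sketch below computes AP, RR, and their means over users for binary relevance; the function names and toy rankings are illustrative, and AP is normalised by the total number of relevant items, which is one common convention. The second toy user has a single relevant item, so its AP equals its RR, matching the equivalence noted above.

```python
def average_precision(ranking, relevant):
    """AP: precision@k averaged over the ranks k that hold a relevant item.

    Dividing by len(relevant) means relevant items missing from the list count as 0.
    """
    hits, total = 0, 0.0
    for k, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0


def reciprocal_rank(ranking, relevant):
    """RR: reciprocal of the rank of the first relevant item (0 if none)."""
    for k, item in enumerate(ranking, start=1):
        if item in relevant:
            return 1.0 / k
    return 0.0


def mean_over_users(metric, rankings, relevants):
    """MAP or MRR: the mean of a per-user metric over all users."""
    return sum(metric(r, rel) for r, rel in zip(rankings, relevants)) / len(rankings)


rankings = [["a", "x", "b", "y"], ["x", "y", "c"]]
relevants = [{"a", "b"}, {"c"}]
print(mean_over_users(average_precision, rankings, relevants))  # MAP = (5/6 + 1/3) / 2
print(mean_over_users(reciprocal_rank, rankings, relevants))    # MRR = (1 + 1/3) / 2
```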

TFMAP (Shi, Karatzoglou, Baltrunas, Larson, Hanjalic, et al., 2012) was proposed for context-aware recommendation by directly optimizing MAP in the learning model, i.e. to generate a list of items for each user under a given context. The method used the boolean scheme to construct the user profile, tensor factorization to generate the latent factors, and a smoothed version of MAP to formulate the objective function of the learning model, so that standard optimization approaches can be deployed. Similarly, CLiMF (Shi, Karatzoglou, Baltrunas, Larson, Oliver, et al., 2012) was proposed for social network recommendation by directly optimizing MRR in the learning model. The method used the boolean scheme to construct the user profile, matrix factorization to generate the latent factors, and a lower bound of a smoothed version of RR to formulate the objective function of the learning model, so that standard optimization approaches can be deployed.
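The smoothing idea behind these methods is to replace non-differentiable rank-dependent terms with sigmoid functions of the predicted scores. The sketch below is a simplified surrogate of RR in the spirit of CLiMF, assuming the approximations 1/rank(j) ≈ σ(f_j) and I(item k ranked above item j) ≈ σ(f_k − f_j); it omits the lower bound and regularisation that CLiMF actually optimizes, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def smoothed_rr(scores, labels):
    """Smooth, differentiable surrogate of reciprocal rank for one user.

    1/rank(j) is approximated by sigmoid(f_j), and the indicator that item k is
    ranked above item j by sigmoid(f_k - f_j); the product approximates the
    probability that no other relevant item is ranked above j.
    """
    f = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    total = 0.0
    for j in range(len(f)):
        not_beaten = [1.0 - y[k] * sigmoid(f[k] - f[j]) for k in range(len(f)) if k != j]
        total += y[j] * sigmoid(f[j]) * np.prod(not_beaten)
    return float(total)


labels = [1, 0, 1, 0]
print(smoothed_rr([2.0, 0.0, 1.5, -1.0], labels))   # relevant items scored high
print(smoothed_rr([-1.0, 2.0, -2.0, 1.5], labels))  # relevant items scored low
```

Because every term is smooth in the scores, the surrogate can be maximized with gradient-based methods, which is what allows "standard optimization approaches" to be deployed.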

Despite their promising results, these works address problems quite different from the tag-based item recommendation problem that is the focus of this thesis. In context-aware recommendation, the list of recommendations is generated for a specified user and context, while tag-based recommendations are made for a specified user only. Social network recommendation deals with two-dimensional data, i.e. the ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩ binary relation, which makes it fundamentally different from the tag-based item recommendation problem, which has to deal with three-dimensional data, i.e. the ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relation.

Unlike MAP and MRR, DCG is widely used for multi-graded relevance data (Chapelle and Wu, 2010; Liu, 2009; Weimer et al., 2007). DCG assumes that the higher the rank position of a relevant item, the more important the item is to the user and the more likely it is to be selected (Järvelin and Kekäläinen, 2002). Accordingly, DCG applies a discount function that reduces the contribution of items at lower ranks. NDCG normalises DCG by the Ideal DCG (IDCG), i.e. the DCG of the best possible ranking.
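A minimal sketch of DCG and NDCG for one user's ranked list is given below. It uses the common exponential gain 2^g − 1 and logarithmic discount log2(rank + 1); this is one widely used variant among several, and the function names and toy grades are illustrative.

```python
import math


def dcg(grades):
    """DCG of a ranked list of relevance grades (gain 2^g - 1, discount log2(rank + 1))."""
    return sum((2 ** g - 1) / math.log2(rank + 1) for rank, g in enumerate(grades, start=1))


def ndcg(grades):
    """NDCG: DCG divided by the DCG of the ideal (descending) ordering of the same grades."""
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0


print(ndcg([2, 1, 0]))  # 1.0: already the ideal ordering
print(ndcg([0, 1, 2]))  # below 1.0: the most relevant item sits at the bottom
```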

Recent works have investigated extending binary relevance measures so that they can deal with multi-graded relevance data. As a result, the Expected Reciprocal Rank (ERR) (Chapelle et al., 2009) and the Graded Average Precision (GAP) (Ferrante et al., 2014; Robertson et al., 2010) were proposed as generalisations of Reciprocal Rank (RR) and Average Precision (AP), respectively, and serve as alternative measures for multi-graded relevance data.
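As an example of a graded measure, ERR can be computed with a short loop. The sketch follows the cascade formulation of Chapelle et al. (2009), mapping a grade g to a stopping probability (2^g − 1)/2^g_max; the function name and the toy grade lists are illustrative.

```python
def err(grades, max_grade):
    """Expected Reciprocal Rank of a ranked list of relevance grades.

    Under the cascade model, the user stops at rank r with probability
    R_r = (2^g_r - 1) / 2^max_grade, having not stopped at any earlier rank.
    """
    p_reach, score = 1.0, 0.0
    for rank, g in enumerate(grades, start=1):
        r = (2 ** g - 1) / 2 ** max_grade
        score += p_reach * r / rank   # contribution of stopping exactly at this rank
        p_reach *= 1.0 - r            # probability of continuing past this rank
    return score


print(err([2, 1, 0], max_grade=2))  # 0.78125
print(err([0, 1, 2], max_grade=2))  # 0.3125
```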

To address recommendation systems that use ratings as explicit feedback data, CoFiRank (Weimer et al., 2007) was proposed, in which the objective of the learning model is to optimize the Normalized DCG (NDCG). This method used matrix factorization to generate the latent factors and minimized a convex upper bound of an NDCG-based loss as the objective function, so that NDCG is effectively maximized. Shi et al. (2013b) employed a matrix factorization technique and optimized the learning model based on the lower bound of the smoothed RR measure. Alternatively, GAPfm (Shi et al., 2013a) was developed using matrix factorization to generate the latent factors and a smoothed version of GAP to formulate the objective function of the learning model. However, it should be noted once more that the tag-based recommendation problem differs from these works and poses additional difficulty, as tag-based systems use tags as implicit feedback data. Recommendation systems with explicit rating data build their model by collecting ratings, which represent the preference level of each ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩ binary relation. The list of recommendations is then generated by ranking the predicted preference scores of the unobserved ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩ relations (Balakrishnan and Chopra, 2012; Weimer et al., 2007). In contrast, a recommendation system with tagging data builds its model using the users' tagging histories as data entries. The key challenges of such a system, compared with the aforementioned explicit feedback systems, are modelling the ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary input data, inferring the latent relationships, and predicting each entry with a score that indicates its degree of relevance (Ifada and Nayak, 2014a; Rendle, Balby Marinho, et al., 2009). The recommendation list is generated by ranking the predicted preference scores of the items that may be of interest to a user, under all tags.

2.3.3.2 Minimizing List-wise Loss

In this category, a learning-to-rank approach optimizes the recommendation model

with respect to the list-wise loss function in order to generate a quality Top-N

recommendation list (Liu, 2009).

ListRank (Shi et al., 2010) was proposed for recommendation systems that use ratings as explicit feedback data. The method used matrix factorization to generate the latent factors and the cross-entropy of the items' top-one probabilities as the loss function. As previously described, the task of this work differs from that of a tag-based item recommendation system.
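The list-wise loss can be illustrated with a ListNet-style sketch: scores are turned into top-one probabilities with a softmax, and the loss is the cross-entropy between the distributions induced by the observed and the predicted scores. This is a simplification under stated assumptions; ListRank's own transformation of the latent-factor scores and its regularisation are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def top_one_probabilities(scores):
    """Softmax over a list: the probability of each item being ranked first."""
    s = np.asarray(scores, dtype=float)
    e = np.exp(s - s.max())   # subtracting the max avoids overflow
    return e / e.sum()


def top_one_cross_entropy(true_scores, predicted_scores):
    """Cross-entropy between the top-one distributions of the true and predicted lists."""
    p_true = top_one_probabilities(true_scores)
    p_pred = top_one_probabilities(predicted_scores)
    return float(-np.sum(p_true * np.log(p_pred)))


ratings = [5, 3, 1]   # a user's observed ratings for three items
print(top_one_cross_entropy(ratings, [2.0, 1.0, 0.0]))  # same order: lower loss
print(top_one_cross_entropy(ratings, [0.0, 1.0, 2.0]))  # reversed order: higher loss
```

Unlike the pair-wise loss, this loss is computed over the whole list at once, so every item's predicted probability depends on the scores of all other items.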

Currently, there are no tag-based item recommendation works that follow the list-wise based ranking approach. For this reason, PITF is used as one of the benchmarking methods, due to its reputation and its relevance to two methods proposed in this thesis, i.e. DCG Optimization for Learning-to-Rank (Do-Rank) and GAP Optimization for Learning-to-Rank (Go-Rank). All of these methods implement a tensor model, a multi-dimensional approach, to build the user profile, and a ranking-based scheme to interpret the tagging data and populate the tensor model. Details of the proposed methods are provided in Chapter 5. Note that, since the methods developed in this thesis address tag-based item recommendation, PITF is extended to the task of item recommendation. This adaptation is necessary because recommending tags differs from recommending items.


2.3.4 Summary and Discussion: Ranking based recommendation

Implementing a learning-to-rank approach is advantageous for solving the tag-based item recommendation task, i.e. generating an ordered list of item recommendations based on user preferences. The type of ranking approach is determined by the input representation and the loss function used in learning the model. In the point-wise based ranking approach, a recommendation ranking problem is modelled as a regression or classification task, and this approach therefore uses a regression or classification loss as the loss function. In the pair-wise based ranking approach, recommendation ranking is modelled as a pair-wise regression or classification task, and this approach therefore employs a pair-wise regression or classification loss as the loss function. In contrast, in the list-wise based ranking approach, a recommendation model is learned to predict an ordered list of items, and the loss function therefore either directly optimizes a ranking evaluation measure or minimizes a list-wise loss.

Table 2.3 summarises the ranking-based research according to ranking approach, loss function, and feedback form. It can be observed that few works have implemented a learning-to-rank approach for the tag-based item recommendation task; in fact, there is no existing work that follows the list-wise based ranking approach. The four tag-based item recommendation methods proposed in this thesis fill this research gap, as described in Chapter 4 and Chapter 5.


| Methods | Loss Function | Feedback Data |
|---|---|---|
| Point-wise Based Ranking Approach | | |
| McRank (Li et al., 2007) (Information Retrieval) | Classification | I (feature) |
| SVD++ (Koren, 2008) | Regression | I + E (rating) |
| MF (Koren et al., 2009) | Regression | E (rating) + temporal effect model |
| MAX (Symeonidis et al., 2010) | Regression | I (tag) |
| TB (Nanopoulos, 2011) | Regression | I (tag) |
| OrdRec (Koren and Sill, 2011) | Regression | E (rating) |
| CR-Pointwise (Balakrishnan and Chopra, 2012) | Regression | E (rating) |
| TFC (Rafailidis and Daras, 2013) | Regression | I (tag) |
| TRPR (Ifada and Nayak, 2014b, 2014c) | Regression | I (tag) |
| We-Rank | Regression | I (tag) |
| Pair-wise Based Ranking Approach | | |
| Robust Reduction (Balcan et al., 2007) | Classification | E (rating) |
| EigenRank (Liu and Yang, 2008) | Regression | E (rating) |
| pLPA (Liu, Zhao, et al., 2009) | Regression | E (rating) |
| BPR (Rendle, Freudenthaler, et al., 2009) | AUC | E (rating) |
| PITF (Rendle and Schmidt-Thieme, 2010) | AUC | I (tag) |
| CR-Pairwise (Balakrishnan and Chopra, 2012) | Regression | E (rating) |
| RMTF (Jitao et al., 2012) | Regression | I (tag) |
| List-wise Based Ranking Approach | | |
| CoFiRank (Weimer et al., 2007) | NDCG | E (rating) |
| ListRank (Shi et al., 2010) | Cross-entropy | E (rating) |
| TFMAP (Shi, Karatzoglou, Baltrunas, Larson, Hanjalic, et al., 2012) | MAP | I (context) |
| CLiMF (Shi, Karatzoglou, Baltrunas, Larson, Oliver, et al., 2012) | MRR | I (trust relationship) |
| GAPfm (Shi et al., 2013a) | GAP | E (rating) |
| xCLiMF (Shi et al., 2013b) | RR | E (rating) |
| Do-Rank (Ifada and Nayak, 2015) | DCG | I (tag) |
| Go-Rank (Ifada and Nayak, 2016) | GAP | I (tag) |

Table 2.3. Classification of ranking-based recommendation research according to ranking approaches, loss functions, and feedback forms. Here, I = Implicit and E = Explicit.


2.4 CHAPTER SUMMARY AND CONCLUDING REMARKS

This chapter has reviewed, in three sections, the literature relevant to the problem of item recommendation in a tag-based system. The chapter began by introducing the "traditional" recommendation system, a class of Web personalization functions that addresses the era of information abundance by filtering out items irrelevant to users and recommending items of interest to them, learned from their profiles. Recommendation algorithms fall into three categories: content-based, collaborative filtering, and hybrid approaches. The concluding discussion of this section highlighted an essential concern of recommendation systems, i.e. the lack of explicit feedback data, which makes the quantity of data available for recommendation inadequate.

The next section examined the state of research into tag-based item recommendation systems, which have attracted research attention alongside the popularity of Social Tagging System (STS) applications in Web 2.0. A general overview of STSs was first provided. This made it possible to distinguish the important aspects of the tag-based item recommendation system and to understand the importance of selecting an appropriate user profile modelling approach and tagging data interpretation scheme for building the recommendation model. Two user profile modelling approaches, two-dimensional and multi-dimensional, were reviewed, which led to the decision to use a tensor to model the tagging data in this thesis. Subsequently, two tagging data interpretation schemes, boolean and set-based, were explained. Both schemes interpret the "relevant" entries straightforwardly from the observed tagging data, while they differ in how the non-observed data should be interpreted. A comprehensive examination of the schemes confirmed that both fall short in learning efficiently from the non-observed data. This motivates alternative interpretation schemes that can thoroughly utilise the user's tagging history for generating the list of recommendations. The concluding discussion of this section presented the classification of tag-based recommendation research according to the user profile modelling approaches, the data interpretation schemes, and the type of recommendations.

The remainder of this chapter examined the state of research into ranking-based recommendation approaches, since implementing a learning-to-rank approach for learning the tag-based recommendation model is beneficial for solving the recommendation task. Three types of learning-to-rank approaches, i.e. point-wise, pair-wise, and list-wise, were discussed. The approaches are categorised by the input representation, which is determined by the data interpretation scheme used, and by the loss function, which defines the optimization criterion of the learning process. The review of each approach's characteristics led to the decision to implement the point-wise and list-wise approaches in the four proposed tag-based item recommendation methods. The concluding discussion of this section presented the classification of ranking-based recommendation research according to ranking approach, loss function, and feedback form. It should be noted that few works, and in fact no list-wise based ranking work, have been proposed specifically for the task of tag-based item recommendation.

In summary, the following research gaps are highlighted by the literature review:

• Lack of efficient schemes that can thoroughly utilise the user's tagging history for generating the list of recommendations, as emphasised in Section 2.2.4;
• Lack of methods that efficiently implement a learning-to-rank approach to solve the tag-based item recommendation task, as emphasised in Section 2.3.4.

The works listed in Table 2.2 and Table 2.3 can essentially be categorised according to the data interpretation schemes used for constructing the user profile and the ranking approaches used for learning the recommendation model, respectively. However, no work has thoroughly studied the relationship between the two in making recommendations. This leads to a further research gap:

• Lack of comprehensive works that study whether combining an interpretation scheme with a learning-to-rank approach has a positive influence on making recommendations.

The three gaps above have led to the research questions and objectives listed in Section 1.2 and Section 1.3, respectively. Chapter 4 and Chapter 5 discuss the proposed methods that answer the research questions and achieve the research objectives.



Chapter 3: Research Design

3.1 INTRODUCTION

This chapter describes the research design used for pre-processing the tagging data and developing the proposed ranking methods for generating tag-based item recommendations. Note that the proposed ranking recommendation methods are detailed in the next two chapters: Chapter 4 describes the point-wise based ranking recommendation methods that implement the standard pre-processing scheme, and Chapter 5 presents the proposed pre-processing schemes and the list-wise based ranking recommendation methods.

This chapter also presents the real-world, openly available tagging system datasets that were used to evaluate the proposed methods. The selected datasets exhibit varying characteristics of user tagging behaviour. The evaluation measures used to assess the performance of the proposed methods are also detailed here. Finally, the state-of-the-art tag-based item recommendation methods benchmarked in this thesis are described.

3.2 RESEARCH DESIGN

This research aims to develop methods for building a tag-based item recommendation system that explores the interplay between the multiple dimensions of tagging data. To achieve this goal, efficient tagging data interpretation schemes and ranking methods are proposed.

As illustrated in Figure 3.1, the proposed research has two major phases. Phase 1 involves pre-processing the tagging data to construct the user profile representation as a tensor model populated by using a tagging data interpretation scheme. Phase 2 comprises the proposed ranking methods, which were developed based on (a) point-wise and (b) list-wise ranking approaches. Each phase is detailed in the following subsections.


[Figure: flowchart of the research design. Phase 1, Tagging Data Pre-processing: model the observed tagging data as a tensor, interpret the non-observed tagging data, and construct the user profile based on the boolean, UTS, or graded-relevance scheme. Phase 2(a), Point-wise based Ranking Approaches: TRPR: Probabilistic Ranking (Section 4.2) and We-Rank: Weighted Tensor for Ranking (Section 4.3). Phase 2(b), List-wise based Ranking Approaches: Do-Rank: Learning from Multi-Graded Data (Section 5.2) and Go-Rank: Learning from Graded-Relevance Data (Section 5.3). Each method outputs a Top-N item recommendation, followed by Evaluation and Comparison.]

Figure 3.1. The research design


3.2.1 Phase-One: Tagging Data Pre-Processing

The pre-processing phase involves interpreting the tagging data to construct the user profile representation. Tagging data record the users' tagging activities in a tag-based system and accumulate ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relations over a period of time. A relation is naturally formed when a user uses a tag to annotate an item. Such systems typically allow a user to annotate an item with different tags, and different items to be annotated with the same tag. Analysing the tagging data, which reflect the user profiles, allows a system to discover the latent factors that govern the ternary relations. In this research, the user profiles generated from the tagging data are represented as a tensor model.


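Let $U = \{u_1, u_2, u_3, \ldots, u_Q\}$ be the set of $Q$ users, $I = \{i_1, i_2, i_3, \ldots, i_R\}$ be the set of $R$ items, and $T = \{t_1, t_2, t_3, \ldots, t_S\}$ be the set of $S$ tags. Tagging data, denoted as $A$, can be defined as:

$$A := U \times I \times T \tag{3.1}$$

where each tuple $a = (u, i, t) \in A$ represents the activity of user $u$ annotating item $i$ using tag $t$. The ternary relations within the tagging data can be naturally modelled as a three-dimensional tensor:

$$\mathcal{Y} \in \mathbb{R}^{Q \times R \times S} \tag{3.2}$$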

where each tensor entry, $y_{u,i,t}$, is given a numerical value that represents the relevance grade that user $u$ has revealed in annotating item $i$ using tag $t$. Each user slice of the tensor is a matrix that records how the user has used tags to annotate items. In this research, the tensor $\mathcal{Y}$ is used as the base model on which ranking learning is executed; for this reason, the tensor model is also called the ranking learning model.

The observed tagging data, denoted as $A_{ob}$, describe the state in which users have revealed their interest in items in the past by annotating them with tags:

$$A_{ob} \subseteq A \tag{3.3}$$

Usually, the observed tagging data are very sparse, thus:

$$|A_{ob}| \ll |A| \tag{3.4}$$


Figure 3.2 presents a toy example of a tensor $\mathcal{Y} \in \mathbb{R}^{3 \times 4 \times 5}$ where $U = \{u_1, u_2, u_3\}$, $I = \{i_1, i_2, i_3, i_4\}$ and $T = \{t_1, t_2, t_3, t_4, t_5\}$. The "+" symbols denote the entries of the observed tagging data $A_{ob}$; for example, user $u_1$ has tagged item $i_2$ using tags $t_1$ and $t_3$. A tag-based recommendation system customarily interprets the observed tagging data as "positive" or "relevant" entries. In contrast, how the non-observed tagging data should be interpreted remains disputed and open to researchers' perceptions. The selection of an interpretation scheme is essential at this stage, as it defines the user profile representation and affects the recommendation performance.

[Figure: toy tensor with item and tag axes for User 1, User 2, and User 3; "+" marks the observed entries.]

Figure 3.2. A toy example of entries from the observed tagging data 𝐴𝑜𝑏

This thesis implements three different schemes for interpreting the tagging data, namely boolean (Symeonidis et al., 2010), User-Tag set (UTS), based on the set-based scheme (Rendle, Balby Marinho, et al., 2009), and graded-relevance. Fundamentally, these three schemes differ in how the non-observed tagging data are interpreted. The choice of scheme is governed by the underlying recommendation approach employed to solve the recommendation task. Note that the last two schemes are developed in this thesis.

3.2.1.1 The boolean Scheme

The boolean scheme interprets the observed tagging data as “relevant” entries, whereas all non-observed data are interpreted as “irrelevant” entries. As a result, this scheme generates two possible values for each cell in the tensor: “relevant” (“1”) and “irrelevant” (“0”). Figure 3.3(a) shows the toy example of the User 1 (𝑢1) profile built by implementing the boolean scheme.
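A minimal NumPy sketch of the boolean scheme is given below for the User 1 slice of the toy example, whose observed entries (𝑖2 tagged with 𝑡1 and 𝑡3, and 𝑖3 tagged with 𝑡4) are those described in the text of this section. Indices are 0-based, and the function name `boolean_tensor` is hypothetical.

```python
import numpy as np


def boolean_tensor(observed, n_users, n_items, n_tags):
    """Boolean scheme: observed (user, item, tag) triples are 1, all other cells are 0."""
    y = np.zeros((n_users, n_items, n_tags), dtype=int)
    for u, i, t in observed:
        y[u, i, t] = 1
    return y


# User 1 of the toy example, 0-indexed: (u1, i2, t1), (u1, i2, t3), (u1, i3, t4).
observed_u1 = [(0, 1, 0), (0, 1, 2), (0, 2, 3)]
y = boolean_tensor(observed_u1, n_users=1, n_items=4, n_tags=5)
print(y[0])  # rows: items i1..i4, columns: tags t1..t5
```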


3.2.1.2 The User-Tag set (UTS) Scheme

The UTS scheme, based on the set-based scheme (Rendle, Balby Marinho, et al., 2009), interprets the non-observed tagging data as follows: for each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set, only the items that have not been tagged by user 𝑢 with any tag are regarded as “irrelevant”, while all the remaining non-observed entries are labelled as “indecisive”, i.e. as entries to be predicted when generating the recommendations (Ifada and Nayak, 2014a). The observed tagging data are interpreted as “relevant”. Hence, the UTS scheme has three possible values for each cell in the tensor: “relevant” (“1”), “irrelevant” (“-1”), or “indecisive” (“0”). As the non-observed data no longer dominate the entries, this scheme is able to overcome the sparsity problem of the boolean scheme.

Figure 3.3(b) shows the toy example of the User 1 (𝑢1) profile built by implementing the UTS scheme. The figure shows that the (𝑢1, 𝑡1), (𝑢1, 𝑡3), and (𝑢1, 𝑡4) sets exist, having been used to annotate {𝑖2}, {𝑖2}, and {𝑖3}, respectively. These observed entries are regarded as “relevant” and labelled as “1”, while all non-observed entries of the non-existent sets, i.e. (𝑢1, 𝑡2) and (𝑢1, 𝑡5), on any item are regarded as “indecisive” and labelled as “0”. As neither 𝑖1 nor 𝑖4 has ever been annotated by 𝑢1 using any tag, the entries of {𝑖1, 𝑖4} within (𝑢1, 𝑡1), (𝑢1, 𝑡3), and (𝑢1, 𝑡4) are regarded as “irrelevant” and labelled as “-1”. The remaining entries, i.e. 𝑖3 within (𝑢1, 𝑡1) and (𝑢1, 𝑡3), and 𝑖2 within (𝑢1, 𝑡4), are labelled as “indecisive” (“0”).
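The UTS labelling rules can be sketched in the same way, again for the 0-indexed User 1 toy slice. The sketch assumes that “indecisive” is the default value and that an item counts as tagged by 𝑢 if it appears with any tag; the function name `uts_tensor` is hypothetical.

```python
import numpy as np


def uts_tensor(observed, n_users, n_items, n_tags):
    """UTS scheme: 1 = relevant, -1 = irrelevant, 0 = indecisive."""
    y = np.zeros((n_users, n_items, n_tags), dtype=int)  # default: indecisive
    tagged = {}     # user -> items the user has tagged with any tag
    pairs = set()   # (user, tag) sets that occur in the observed data
    for u, i, t in observed:
        tagged.setdefault(u, set()).add(i)
        pairs.add((u, t))
    for u, t in pairs:
        for i in range(n_items):
            if i not in tagged[u]:
                y[u, i, t] = -1   # never tagged by u: irrelevant within this (u, t) set
    for u, i, t in observed:
        y[u, i, t] = 1            # observed entries are relevant
    return y


observed_u1 = [(0, 1, 0), (0, 1, 2), (0, 2, 3)]
print(uts_tensor(observed_u1, n_users=1, n_items=4, n_tags=5)[0])  # rows: i1..i4, columns: t1..t5
```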

3.2.1.3 The graded-relevance Scheme

The set-based scheme interprets the observed entries as “relevant” (or “positive”) entries, while the non-observed entries are interpreted as a combination of “irrelevant” (or “negative”) and “indecisive” (or “null”) entries. The “irrelevant” entries are those that the users do not like, while the “indecisive” entries are those that the users might like in the future, i.e. the entries to be predicted by the recommendation system. The problem with this scheme is that it overgeneralises the “irrelevant” entries. In fact, some non-observed entries should be interpreted neither as “irrelevant” nor as “indecisive”, as they can be “relevant”.

The graded-relevance scheme interprets the non-observed tagging data at a higher granularity, resulting in four possible values: “relevant” (“2”), “likely relevant” (“1”), “irrelevant” (“-1”), or “indecisive” (“0”). Under this scheme, the rules for labelling the “relevant”, “irrelevant”, and “indecisive” entries are the same as those of the UTS scheme, except that the “relevant” entries are given the numeric value “2” instead of “1”. The “likely relevant” entries are perceived as “transitional” entries positioned between the “relevant” and “irrelevant” entries, i.e. the non-observed entries on each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set whose items have been tagged by user 𝑢 with other tags.

Figure 3.3(c) shows the toy example of the User 1 (𝑢1) profile built by implementing the graded-relevance scheme. The entries of 𝑖3 within (𝑢1, 𝑡1) and (𝑢1, 𝑡3) are not interpreted as “irrelevant”, as 𝑖3 occurs as “relevant” on (𝑢1, 𝑡4). Similarly, 𝑖2 is “relevant” on (𝑢1, 𝑡1) and (𝑢1, 𝑡3), and therefore 𝑖2 within (𝑢1, 𝑡4) cannot be “irrelevant”. Nor should these entries be regarded as “indecisive”: their items have already been selected by 𝑢1 and therefore do not need to be predicted in the future.

[Figure 3.3, panels (a), (b), and (c): the User 1 (𝑢1) profile drawn as a grid over the tag and item axes]

Figure 3.3. The toy example of the User 1 (𝑢1) profile built from various interpretation schemes: (a) boolean, (b) UTS, and (c) graded-relevance
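The graded-relevance labelling of a single user can be sketched in the same way. This is a minimal illustration, assuming observed entries given as (tag, item) index pairs and reusing the UTS rules for “relevant”, “irrelevant”, and “indecisive” as stated above; the function name `graded_relevance_profile` is hypothetical.

```python
import numpy as np

def graded_relevance_profile(observed, n_tags, n_items):
    """Graded-relevance labels for one user as a (tag x item) matrix.

    2 = relevant, 1 = likely relevant, -1 = irrelevant, 0 = indecisive.
    """
    profile = np.zeros((n_tags, n_items), dtype=int)   # indecisive by default
    observed = set(observed)
    tagged_items = {i for _, i in observed}             # items tagged with any tag
    for t in {t for t, _ in observed}:                  # each observed (u, t) set
        for i in range(n_items):
            if (t, i) in observed:
                profile[t, i] = 2                        # relevant: observed entry
            elif i in tagged_items:
                profile[t, i] = 1                        # likely relevant: tagged with another tag
            else:
                profile[t, i] = -1                       # irrelevant: never tagged by the user
    return profile

# Toy example of Figure 3.3 (0-based indices), as in the UTS sketch.
toy_graded = graded_relevance_profile([(0, 1), (2, 1), (3, 2)], n_tags=5, n_items=4)
```

Compared with the UTS labels, the only change apart from the “2” values is that the entries of 𝑖3 in (𝑢1, 𝑡1) and (𝑢1, 𝑡3), and of 𝑖2 in (𝑢1, 𝑡4), become “1”.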

3.2.2 Phase-Two: Generating Recommendations with Ranking Methods

This phase generates recommendations from the pre-processed tagging data using the proposed ranking methods, which follow either the point-wise or the list-wise based ranking approach.

In this thesis, the recommendation task is approached in two ways. Firstly, the

recommendation task is approached as a prediction task by predicting whether an

item will be “relevant” or “irrelevant” to a user. In this task, there is no

interdependency between the predicted item and other items (Liu, 2009; Rendle,

2011). In other words, this task can be called point-wise based ranking. Given this

setting, the boolean scheme is the most appropriate interpretation scheme used for

constructing the learning model. The scheme interprets the observed tagging data

entry as “1”, representing the “relevant” entry; and denotes the non-observed one as

“0”, representing the “irrelevant” entry. This thesis develops two point-wise based

ranking recommendation methods – TRPR: Probabilistic Ranking and We-Rank:


Weighted Tensor for Ranking – in which the boolean scheme is implemented. They

are described in Section 3.2.2.1. Details of the methods and the scheme are presented

in Chapter 4.

Secondly, the recommendation task is approached as a ranking task by

predicting an ordered list of items which will be of interest to a user. In this task, the

predicted entries depend on other corresponding entries (Liu, 2009; Rendle, 2011). In

other words, this task can be named list-wise based ranking. The boolean scheme is inappropriate for constructing the learning model in this task, as the predicted entries should be represented in ranked order. This thesis proposes two interpretation schemes, namely UTS and graded-relevance, that apply a ranking constraint to interpret the tagging data, resulting in multi-graded and graded-relevance input data, respectively. For tensor models populated with these interpretation schemes, the

recommendation task is viewed as a ranking problem. Therefore, the corresponding

ranking evaluation measure should be used as the optimization criterion. This thesis

develops two list-wise based ranking recommendation methods – Do-Rank: Learning

from Multi-graded Data and Go-Rank: Learning from Graded-relevance Data – in

which the UTS and graded-relevance schemes are implemented respectively. They

are described in Section 3.2.2.2. Details of the methods and the schemes are presented in Chapter 5.

3.2.2.1 Phase-Two (a): Point-wise based Ranking Approaches

The task of a tag-based recommendation system is to generate a list of items that may be of interest to a user by learning from the user’s past tagging behaviour. In this phase, where the recommendation task is solved with a point-wise based ranking approach, recommendation is regarded as a regression/classification learning problem and the tagging data is interpreted using the boolean scheme. Therefore, the corresponding regression/classification loss function is used as the optimization criterion (Liu, 2009; Rendle, 2011). Two methods are proposed under this approach, as briefly described below.

3.2.2.1.1 TRPR: Probabilistic Ranking

The Tensor-based Item Recommendation using Probabilistic Ranking (TRPR) method is developed by applying a probabilistic technique to the tensor model. In TRPR, recommendation quality is improved by ranking the candidate items selected from the reconstructed tensor model.

A third-order tensor model representing the user profiles is built from the tagging data by implementing the boolean scheme, as briefly described in Section 3.2.1.1. The populated tensor model represents the collaborative activities of the users, which can be inferred through regression analysis. The tensor model is optimized with respect to the least squares loss function; this process is called learning the recommendation model for regression. The factorized tensor model can reveal the hidden relationships (i.e. collaborative activities) between the users. These latent factors are then used to reconstruct the tensor and identify the newly derived entries for making recommendations.

Unlike the conventional approach (Nanopoulos, 2011; Rafailidis and Daras, 2013; Symeonidis et al., 2010), the reconstructed tensor elements are not directly used for generating recommendations. Instead, the lists of tag and item preferences for each user are first generated from the reconstructed tensor, and a probabilistic approach is then used to calculate the users’ preferences, rank the recommended items, and generate a Top-𝑁 item list. An additional challenge of this method is the tensor reconstruction process, in which all the latent factors need to be multiplied. This process consumes a large amount of memory, which makes scalability an issue. Given that the factorized tensor consists of one core tensor and three factor matrices, a two-stage iterative approach is proposed to reconstruct the tensor. In the first stage, the core tensor is multiplied by the first two factor matrices sequentially, and memory is cleared after each intermediate result is saved. In the second stage, a memory-efficient loop is implemented when the last factor matrix is multiplied by the complete result of the first stage. Details of TRPR are presented in Chapter 4, Section 4.2.
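The two-stage reconstruction can be sketched with NumPy as below. This is a minimal illustration, assuming a Tucker-style factorization (a core tensor plus user, item, and tag factor matrices); the function name `reconstruct_by_user`, the `einsum` formulation, and the choice of one user per loop iteration are illustrative, not the TRPR implementation described in Chapter 4.

```python
import numpy as np

def reconstruct_by_user(core, U, I, T):
    """Yield (user_index, item x tag slice) of a Tucker-reconstructed tensor.

    core: (F1, F2, F3) core tensor; U: (Q, F1) user factors;
    I: (R, F2) item factors; T: (S, F3) tag factors.
    """
    # Stage 1a: multiply the core by the user factor matrix -> (Q, F2, F3)
    partial = np.einsum('qa,abc->qbc', U, core)
    # Stage 1b: multiply by the item factor matrix -> (Q, R, F3),
    # then release the earlier intermediate result
    user_item = np.einsum('rb,qbc->qrc', I, partial)
    del partial
    # Stage 2: loop over users, multiplying one (R, F3) slice by the tag factors,
    # so the full (Q, R, S) tensor is never held in memory at once
    for q in range(user_item.shape[0]):
        yield q, user_item[q] @ T.T               # (R, S) slice for user q
```

A caller would typically consume each slice immediately (for example, to extract that user's tag and item preferences) and discard it, which is what keeps peak memory bounded by one user's item-by-tag slice rather than the whole tensor.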

3.2.2.1.2 We-Rank: Weighted Tensor Approach for Ranking

The Recommendation Ranking using Weighted Tensor (We-Rank) method is developed by implementing a weighted tensor that assigns rewards and penalties to the observed and non-observed tagging data entries of the primary tensor model used for learning the regression model.

As in TRPR, a third-order tensor representing the user profiles is built from the tagging data by implementing the boolean scheme. Unlike TRPR, We-Rank does not rely solely on the user profiles to find the hidden relationships between the users by optimizing the tensor model. It also builds a weighted tensor whose entries are generated by considering the user’s past tagging behaviour. The weighted tensor plays an important role in the learning algorithm, as it controls and differentiates the rewards and penalties for the observed and non-observed tagging data entries, respectively. The optimization criterion thus becomes a weighted least squares loss function. Whereas TRPR applies an additional step after the factorization and reconstruction processes to correctly rank the recommended items, the latent factors learned by We-Rank can be used directly to make ranked recommendations.

To generate the entries of the weighted tensor, this thesis proposes to first calculate the likeliness score of each user for using each tag to annotate items. The tags are then listed in descending order of likeliness score for each user. In this way, We-Rank treats the tags with high likeliness scores as the user’s positive tag preference set, and those with low scores as the user’s negative tag preference set. The weighted tensor is constructed from the observed tagging data entries and the generated positive and negative tag preference sets. The entries of the weighted tensor map one-to-one onto the entries of the primary tensor model that represents the user profiles. Each observed entry of the primary tensor is rewarded, such that the associated entry of the weighted tensor holds a higher positive value than those associated with non-observed entries. Details of We-Rank are presented in Chapter 4, Section 4.3.
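The effect of the weighted tensor on the learning objective can be sketched as a weighted least squares loss. This is a minimal illustration under clearly stated assumptions: the two constant weights `reward` and `base` in the hypothetical helper `simple_weight_tensor` are placeholders only (We-Rank derives its weights from the positive and negative tag preference sets, as described in Chapter 4), and `Y_hat` stands in for whatever the factorized model predicts.

```python
import numpy as np

def simple_weight_tensor(Y, reward=1.0, base=0.1):
    """Hypothetical weights: observed entries (Y == 1) get a larger positive
    weight than non-observed entries, as required of the weighted tensor."""
    return np.where(Y == 1, reward, base)

def weighted_least_squares(Y, Y_hat, W):
    """Weighted least squares loss: sum over all entries of W * (Y - Y_hat)^2."""
    return float(np.sum(W * (Y - Y_hat) ** 2))
```

With these placeholder weights, a prediction error on an observed entry costs ten times as much as the same error on a non-observed entry, which is the qualitative behaviour the weighted tensor is meant to enforce.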

3.2.2.2 Phase-Two (b): List-wise based Ranking Approaches

The order of items in the recommendation list is important, as users pay more attention to the top few items (Cremonesi et al., 2010). Inspired by this, a recommendation task can be regarded as a ranking problem, in which item preference scores are calculated and sorted to generate the list. Viewed this way, a recommendation model should be optimized with respect to a ranking evaluation measure, so that each user is recommended a list of items that is optimal from the perspective of that measure (Chapelle and Wu, 2010; Cremonesi et al., 2010; Xu and Li, 2007). The user profile should also be built using a scheme that interprets the tagging data as a ranking representation. Two methods are proposed under this approach, as briefly described below.

3.2.2.2.1 Do-Rank: Learning from Multi-graded Data

The proposed UTS scheme, an efficient version of the set-based scheme (Rendle, Balby Marinho, et al., 2009), results in the multi-graded tagging data representation (Ifada and Nayak, 2014a, 2015), as briefly described in Section 3.2.1.2. For each ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ tuple, the tagging data is labelled with a value from the ordinal relevance set {𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡, 𝑖𝑛𝑑𝑒𝑐𝑖𝑠𝑖𝑣𝑒, 𝑖𝑟𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡}. The “relevant” (or “positive”) entries hold the highest grade and represent the observed tagging data entries, which indicate that the user has annotated an item using those tags. Both the “irrelevant” (or “negative”) and “indecisive” (or “null”) entries represent non-observed entries; they indicate, respectively, that the user is not interested in the item or might be interested in it in the future.

In the proposed method, namely DCG Optimization for Learning-to-Rank (Do-Rank), the recommendation model is optimized with respect to Discounted Cumulative Gain (DCG) as the ranking evaluation measure for learning from the multi-graded tagging data. Do-Rank generates an optimal list of item recommendations from the DCG perspective for each user. However, optimizing DCG across all users in the recommendation model is computationally expensive; to tackle this issue, a fast learning algorithm is implemented. Details of Do-Rank are presented in Chapter 5, Section 5.2.

3.2.2.2.2 Go-Rank: Learning from Graded-relevance Data

The proposed ranking method, namely GAP Optimization for Learning-to-Rank (Go-Rank), applies the proposed graded-relevance scheme to interpret the tagging data. As briefly described in Section 3.2.1.3, the scheme labels the non-observed entries that are not “irrelevant”, because their items have been annotated by the user with other tags, as transitional entries between the “relevant” and “irrelevant” entries. These “transitional” entries determine the choice of ranking evaluation measure: it must be able to handle such data and serve as the optimization criterion for learning the recommendation model. The Graded Average Precision (GAP) is a generalization of Average Precision (AP) to ordinal relevance data. Using GAP as the optimized ranking evaluation measure enables the learning model to set up thresholds so that the “likely relevant” entries can be regarded as either “relevant” or “irrelevant”. The method generates a list of items ranked from the GAP perspective for each user. Additionally, for fast and efficient learning, the entries are filtered, as not all of them are needed for learning. Details of Go-Rank are presented in Chapter 5, Section 5.3.
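To make the notion of graded precision concrete, the sketch below computes GAP for a single ranked list using the usual formulation for ordinal grades 0..c. Each grade 𝑗 ≥ 1 has a probability g_j that the user's relevance threshold lies at that grade. At every relevant position 𝑛, the threshold-weighted agreement with each relevant item ranked at or above 𝑛 is summed and divided by 𝑛, and the total is normalised by its maximum achievable value. This is an illustration only: the mapping of graded-relevance labels to GAP grades (for example “likely relevant” → 1, “relevant” → 2, everything else → 0) and the threshold probabilities are assumptions, and this is not Go-Rank's optimization objective from Chapter 5. With a single grade and g = [1.0], the function reduces to ordinary AP.

```python
def graded_average_precision(ranked_grades, g, judged_grades=None):
    """GAP of one ranked list.

    ranked_grades: grade of the item at each rank (0 = not relevant, 1..c = grade).
    g: g[j-1] is the probability that the user's threshold is at grade j (sums to 1).
    judged_grades: grades of all judged items for the user (defaults to ranked_grades);
                   used in the normaliser so relevant items missing from the list count.
    """
    if judged_grades is None:
        judged_grades = ranked_grades
    cum_g = [0.0]
    for p in g:                                   # cum_g[k] = g_1 + ... + g_k
        cum_g.append(cum_g[-1] + p)
    numerator = 0.0
    for n, grade_n in enumerate(ranked_grades, start=1):
        if grade_n <= 0:
            continue
        # agreement with every relevant item ranked at or above position n
        inner = sum(cum_g[min(grade_m, grade_n)]
                    for grade_m in ranked_grades[:n] if grade_m > 0)
        numerator += inner / n
    denominator = sum(cum_g[grade] for grade in judged_grades if grade > 0)
    return numerator / denominator if denominator > 0 else 0.0
```

For example, with g = [0.5, 0.5], placing the grade-2 item above the grade-1 item gives GAP = 1.0, while the reverse order gives about 0.833.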

3.3 DATASETS

Four publicly available real-world datasets were used to build the models and evaluate the proposed methods in comprehensive experiments. Table 3.1 details the characteristics of these datasets. The four tagging datasets are described as follows:

1) Delicious Dataset. This dataset is obtained from the DAI-Labor corpus2. The corpus was retrieved from the Delicious social bookmarking website (http://delicious.com/) and contains 420 million observed bookmarking tagging entries collected between September 2003 and December 2007 (Wetzker et al., 2008). This thesis uses the portion of the dataset between January 2004 and April 2004. Figure 3.4 shows a snapshot of the Delicious dataset.

Figure 3.4. A snapshot of the Delicious dataset

2 Available at http://www.dai-labor.de/en/irml/datasets/delicious/



2) LastFM Dataset. This dataset is obtained from the GroupLens corpus3. The corpus was retrieved from the Last.fm online music system (http://www.last.fm/) and contains social networking, tagging, and music artist listening information from a set of 2K users (Cantador et al., 2011). This thesis uses the “user_taggedartists-timestamps.dat” file, which contains the observed tagging data of artists provided by each user. Figure 3.5 shows a snapshot of the LastFM dataset.

Figure 3.5. A snapshot of the LastFM dataset

3) CiteULike Dataset. This dataset is obtained from the CiteULike website4. CiteULike (http://citeulike.org/) is a website that provides a service for managing and discovering scholarly references. The dataset contains users’ tagging information from 2007-05-30 onwards. This thesis uses the portion of the dataset from the year 2012. Figure 3.6 shows a snapshot of the CiteULike dataset.

Figure 3.6. A snapshot of the CiteULike dataset

4) MovieLens Dataset. This dataset is obtained from the GroupLens corpus5. The corpus is an extension of the MovieLens10M dataset, published by the GroupLens research group (http://www.grouplens.org/), which contains personal ratings and tags about movies (Cantador et al., 2011). This thesis uses the “user_taggedmovies-timestamps.dat” file, which contains the observed tagging data of movies provided by each user. Figure 3.7 shows a snapshot of the MovieLens dataset.

3 Available at http://files.grouplens.org/datasets/hetrec2011/hetrec2011-lastfm-2k.zip

4 Available at http://static.citeulike.org/data/current.bz2

5 Available at http://files.grouplens.org/datasets/hetrec2011/hetrec2011-movielens-2k-v2.zip

Figure 3.7. A snapshot of the MovieLens dataset

These datasets were chosen for the evaluation because they are commonly used in previous research (Bogers and van den Bosch, 2009; Kim et al., 2010; Rendle, Balby Marinho, et al., 2009; Rendle and Schmidt-Thieme, 2010; Symeonidis et al., 2010; Tso-Sutter et al., 2008; Zhang et al., 2010). More importantly, these datasets show diverse characteristics, as shown in Table 3.1, which assist in evaluating the methods. The density of a dataset is calculated as:

$$\mathit{density} = \frac{|A_{ob}|}{Q \cdot R \cdot S} \tag{3.5}$$

The last row of Table 3.1 highlights the diversity of the datasets, as their ratios of average observed entries of tagging data differ. Users’ tagging behaviour is captured differently in each tagging system. In the LastFM and MovieLens datasets, users have, on average, annotated a number of items comparable to the number of tags used, whereas in the Delicious and CiteULike datasets, users have, on average, annotated a large number of items in comparison to the number of tags used.


| Attribute | Delicious (item: bookmark) | LastFM (item: artist) | CiteULike (item: article) | MovieLens (item: movie) |
|---|---|---|---|---|
| #users (𝑄) | 5,311 | 1,892 | 22,610 | 2,113 |
| #items (𝑅) | 147,770 | 17,632 | 562,108 | 10,197 |
| #tags (𝑆) | 31,366 | 11,946 | 178,270 | 13,222 |
| #observed entries of tagging data (\|𝐴𝑜𝑏\|) | 456,064 | 186,479 | 2,103,367 | 47,957 |
| Density | 0.000002% | 0.000047% | 0.0000001% | 0.000017% |
| Avg observed entries of tagging data per user | 85.872 | 98.562 | 93.028 | 22.696 |
| Avg observed entries of tagging data per item | 3.086 | 10.576 | 3.742 | 4.703 |
| Avg observed entries of tagging data per tag | 14.540 | 15.610 | 11.799 | 3.627 |
| Ratio of avg observed entries of tagging data (user:item:tag) | 1:28:6 | 1:9:6 | 1:25:8 | 1:5:6 |

Table 3.1. Details of the various characteristics of the datasets
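The derived rows of Table 3.1 follow from the raw counts via Equation (3.5) and simple averages. The sketch below recomputes them so that they can be checked. The ratio row is reproduced under the assumption that it equals 1 : R/Q : S/Q, which is the same as dividing the average entries per user by the averages per item and per tag; that reading of the row is not stated in the text.

```python
# Raw counts from Table 3.1: (#users Q, #items R, #tags S, |A_ob|)
COUNTS = {
    "Delicious": (5311, 147770, 31366, 456064),
    "LastFM": (1892, 17632, 11946, 186479),
    "CiteULike": (22610, 562108, 178270, 2103367),
    "MovieLens": (2113, 10197, 13222, 47957),
}

def derived_stats(Q, R, S, n_obs):
    """Density (Eq. 3.5, in percent), averages per user/item/tag, and rounded ratio."""
    density_pct = 100.0 * n_obs / (Q * R * S)
    avg_user, avg_item, avg_tag = n_obs / Q, n_obs / R, n_obs / S
    # Assumed reading of the ratio row: 1 : avg_user/avg_item : avg_user/avg_tag
    ratio = (1, round(avg_user / avg_item), round(avg_user / avg_tag))
    return density_pct, avg_user, avg_item, avg_tag, ratio

for name, counts in COUNTS.items():
    d, au, ai, at, ratio = derived_stats(*counts)
    print(f"{name:10s} density={d:.7f}% user={au:.3f} item={ai:.3f} tag={at:.3f} ratio={ratio}")
```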

3.3.1 Experimental Settings

Recommendation methods commonly suffer from data sparsity and consequently generate low-quality recommendations for long-tail items (i.e. items that are rarely selected by users) and users (i.e. users who rarely select items) (Halpin et al., 2007; Jäschke et al., 2007). Adopting the standard technique for removing noise and reducing data sparsity (Nanopoulos, 2011; Rafailidis and Daras, 2013; Symeonidis et al., 2010), the datasets are refined using the 𝑝-core technique (Batagelj and Zaveršnik, 2002). This technique selects the users, items, and tags that occur in at least 𝑝 posts, where a post is a distinct (𝑢, 𝑖) pair in 𝐴𝑜𝑏. This thesis follows this procedure and applies 10-core, the accepted and realistic 𝑝-core setup in the literature (Jäschke et al., 2007; Rendle, Balby Marinho, et al., 2009; Rendle and Schmidt-Thieme, 2010; Tso-Sutter et al., 2008), to refine each dataset used in the experiments. Setting a lower core threshold might leave the users, items, and tags with insufficient ties between dimensions, making it difficult to discover common user interests for generating accurate predictions (Li et al., 2008). Researchers have used 5-core or lower on small datasets, as many of the users in those datasets have fewer than 10 posts (Jäschke et al., 2007; Rendle and Schmidt-Thieme, 2010). Since this thesis uses large datasets, the most appropriate lowest threshold is 10-core, as evidenced by other research (Jäschke et al., 2007; Rendle, Balby Marinho, et al., 2009; Rendle and Schmidt-Thieme, 2010; Tso-Sutter et al., 2008).
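The 𝑝-core refinement can be sketched as an iterative filter over ⟨user, item, tag⟩ triples. Removing a rare item can push a user below the threshold, so the filter repeats until nothing changes. This is a minimal illustration, assuming that each user, item, and tag is counted by the number of distinct posts (𝑢, 𝑖) it appears in, as defined above; the function name `p_core` is hypothetical.

```python
from collections import defaultdict

def p_core(triples, p):
    """Keep only (user, item, tag) triples whose user, item, and tag each
    occur in at least p distinct posts, where a post is a (user, item) pair."""
    current = set(triples)
    while True:
        user_posts, item_posts, tag_posts = defaultdict(set), defaultdict(set), defaultdict(set)
        for u, i, t in current:
            user_posts[u].add(i)          # distinct posts (u, i) of user u
            item_posts[i].add(u)          # distinct posts (u, i) containing item i
            tag_posts[t].add((u, i))      # distinct posts in which tag t is used
        kept = {(u, i, t) for u, i, t in current
                if len(user_posts[u]) >= p and len(item_posts[i]) >= p
                and len(tag_posts[t]) >= p}
        if kept == current:               # fixed point: every survivor qualifies
            return kept
        current = kept
```

For instance, with 𝑝 = 2, a user with two posts, one of which contains an item seen in no other post, loses that post in the first pass and is then dropped in the second pass.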

Additionally, to avoid bias in the experiments, this thesis also applies two further 𝑝-core values to each dataset, i.e. 15-core and 20-core. This variation reduces the likelihood of the unstable results that can occur when only one core size is used in the experiments (Doerfel and Jäschke, 2013). Note that the higher the 𝑝-core is set, the less sparse the resulting dataset and the smaller each dimension becomes. Table 3.2 details the statistics of the Delicious, LastFM, CiteULike, and MovieLens datasets resulting from the various 𝑝-cores.

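| Dataset | 𝑝-core | #users (𝑄) | #items (𝑅) | #tags (𝑆) | Observed tagging data entries (\|𝐴𝑜𝑏\|) | Density |
|---|---|---|---|---|---|---|
| Delicious | 10 | 2,009 | 1,485 | 2,589 | 50,991 | 0.0007% |
| Delicious | 15 | 1,609 | 719 | 1,761 | 32,389 | 0.0016% |
| Delicious | 20 | 1,359 | 424 | 1,321 | 23,442 | 0.0031% |
| LastFM | 10 | 867 | 1,715 | 1,423 | 99,211 | 0.0047% |
| LastFM | 15 | 703 | 1,018 | 1,063 | 76,808 | 0.0100% |
| LastFM | 20 | 601 | 681 | 838 | 61,739 | 0.0180% |
| CiteULike | 10 | 1,129 | 548 | 2,403 | 17,161 | 0.0012% |
| CiteULike | 15 | 721 | 203 | 1,334 | 8,099 | 0.0042% |
| CiteULike | 20 | 529 | 89 | 844 | 4,254 | 0.0100% |
| MovieLens | 10 | 357 | 709 | 799 | 14,535 | 0.0072% |
| MovieLens | 15 | 262 | 396 | 558 | 9,442 | 0.0100% |
| MovieLens | 20 | 200 | 217 | 425 | 6,023 | 0.0327% |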

Table 3.2. Details of the dataset statistics resulting from the implementation of various 𝑝-cores


For each dataset, 5-fold cross-validation is conducted: in each fold, the posts are randomly split so that 80% form the training set (𝐷𝑡𝑟𝑎𝑖𝑛) and the remaining 20% form the test set (𝐷𝑡𝑒𝑠𝑡). 𝐷𝑡𝑟𝑎𝑖𝑛 and 𝐷𝑡𝑒𝑠𝑡 do not overlap in posts, i.e. no triple of a given user-item pair appears in 𝐷𝑡𝑟𝑎𝑖𝑛 if that pair is present in 𝐷𝑡𝑒𝑠𝑡. The reported performance is the average over all five runs.
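A post-level split of this kind can be sketched as follows, so that all triples of one (𝑢, 𝑖) post land on the same side. This is a minimal illustration, assuming that the five folds partition the shuffled posts (so every post is tested exactly once); the function name `post_folds`, the seed handling, and the round-robin assignment of posts to folds are illustrative choices, not the thesis's procedure.

```python
import random
from collections import defaultdict

def post_folds(triples, k=5, seed=0):
    """Yield (train, test) splits in which whole posts (user, item pairs)
    go to one side only; each fold tests roughly 1/k of the posts."""
    by_post = defaultdict(list)
    for u, i, t in triples:
        by_post[(u, i)].append((u, i, t))
    posts = sorted(by_post)                  # deterministic order before shuffling
    random.Random(seed).shuffle(posts)
    for f in range(k):
        test_posts = set(posts[f::k])        # every k-th shuffled post
        train = [x for p in posts if p not in test_posts for x in by_post[p]]
        test = [x for p in test_posts for x in by_post[p]]
        yield train, test
```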

3.4 EVALUATION METRICS

The performance of the tag-based item recommendation methods is evaluated in an offline setting. Offline evaluation means that the ranking methods are evaluated on pre-collected real-world tagging data (Shani and Gunawardana, 2011). This approach follows the typical evaluation scenario in academic research (Marinho et al., 2012): it does not require any interaction with real users and therefore allows methods to be compared at low cost (Shani and Gunawardana, 2011). Its weakness, however, is that it cannot directly measure the influence of the recommendation method on user behaviour, i.e. the users’ reaction to real-time recommendations (Shani and Gunawardana, 2011).

The recommendation task is formulated as predicting the Top-𝑁 items for the set of target users present in 𝐷𝑡𝑒𝑠𝑡. The ground-truth items of each target user 𝑢, assumed to be available in the dataset, are denoted 𝑇𝑒𝑠𝑡𝑢 ⊆ 𝐼. The ranked recommendation list for target user 𝑢 is represented by a permutation of the items 𝐼, denoted 𝑙𝑢, where 𝑙𝑢(𝑛) is the item at position 𝑛 in the list. For instance, 𝑙𝑢1(1) = 𝑖1 indicates that the top position in the ranked list of recommendations for target user 𝑢1 is item 𝑖1.

The quality of recommendations is determined by measuring how successfully the method predicts the items in 𝑇𝑒𝑠𝑡𝑢 for each target user 𝑢. In this thesis, distinct evaluation measures are used to evaluate the performance of each proposed method. The choice of measures is governed by the underlying approach implemented for solving the recommendation task, i.e. point-wise or list-wise. The score of each measure ranges between 0 and 1, representing the lowest and highest possible values, respectively.


3.4.1 Point-wise based Ranking Approach

Precision, Recall, and F1-Score are commonly used for evaluating the performance of a recommendation model built by implementing the point-wise based ranking approach, in which the recommendation task is approached as a regression/classification task. The recommendation model is learned such that it predicts whether or not the user will like the predicted item, with no interdependency between the predicted item and other items in the dataset (Liu, 2009; Rendle, 2011).

For each target user, the predicted (recommended) Top-𝑁 list of items, 𝑇𝑜𝑝(𝑙𝑢, 𝑁), is compared to the ground-truth items, 𝑇𝑒𝑠𝑡𝑢. Precision measures the proportion of items in the predicted Top-𝑁 list that are in 𝑇𝑒𝑠𝑡𝑢, while recall measures the proportion of the ground-truth items 𝑇𝑒𝑠𝑡𝑢 that are covered by the predicted Top-𝑁 list. The precision and recall per target user 𝑢 at position 𝑁 are computed as follows:

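$$\mathit{Top}(l_u, N) := \{l_u(1), \ldots, l_u(N)\} \tag{3.6}$$

$$\mathit{Precision}(\mathit{Test}_u, l_u, N) := \frac{|\mathit{Top}(l_u, N) \cap \mathit{Test}_u|}{N} \tag{3.7}$$

$$\mathit{Recall}(\mathit{Test}_u, l_u, N) := \frac{|\mathit{Top}(l_u, N) \cap \mathit{Test}_u|}{|\mathit{Test}_u|} \tag{3.8}$$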

The reported precision and recall values are the averages over all users in 𝐷𝑡𝑒𝑠𝑡, calculated as:

$$\mathit{Precision}(D_{\mathit{test}}, N) := \frac{1}{|D_{\mathit{test}}|} \sum_{u \in D_{\mathit{test}}} \mathit{Precision}(\mathit{Test}_u, l_u, N) \tag{3.9}$$

$$\mathit{Recall}(D_{\mathit{test}}, N) := \frac{1}{|D_{\mathit{test}}|} \sum_{u \in D_{\mathit{test}}} \mathit{Recall}(\mathit{Test}_u, l_u, N) \tag{3.10}$$

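Additionally, the F1-Score is reported as the harmonic mean of the average precision and recall:

$$F_1(D_{\mathit{test}}, N) := \frac{2 \cdot \mathit{Precision}(D_{\mathit{test}}, N) \cdot \mathit{Recall}(D_{\mathit{test}}, N)}{\mathit{Precision}(D_{\mathit{test}}, N) + \mathit{Recall}(D_{\mathit{test}}, N)} \tag{3.11}$$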
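Equations (3.6) to (3.11) translate directly into code. The sketch below assumes each user's recommendations are given as a ranked list of item identifiers and the ground truth as a collection of items; the function names are hypothetical. Note that, following Equation (3.11), F1 is the harmonic mean of the *averaged* precision and recall, not the average of per-user F1 values.

```python
def precision_recall_at_n(ranked, test_items, n):
    """Eqs. (3.6)-(3.8): per-user precision and recall at cut-off n."""
    top_n = set(ranked[:n])                          # Top(l_u, N)
    hits = len(top_n & set(test_items))
    return hits / n, hits / len(test_items)

def averaged_f1(per_user, n):
    """Eqs. (3.9)-(3.11): average precision/recall over users, then their harmonic mean.

    per_user: dict mapping user -> (ranked item list, ground-truth items).
    """
    pairs = [precision_recall_at_n(r, t, n) for r, t in per_user.values()]
    p = sum(x for x, _ in pairs) / len(pairs)
    r = sum(y for _, y in pairs) / len(pairs)
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1
```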


3.4.2 List-wise based Ranking Approach

AP, MAP, and NDCG are widely used measures for evaluating the performance of a recommendation model built by implementing the list-wise based ranking approach, in which the recommendation task is approached as a ranking task. In this case, the model is learned such that it predicts an ordered set of items that will be of interest to a user, in which the predicted items depend on other corresponding items (Liu, 2009; Rendle, 2011).

3.4.2.1 Average Precision (AP) and Mean Average Precision (MAP)

Average Precision (AP) is the average of the precision values at the positions 𝑛 where the predicted item 𝑙𝑢(𝑛) is in 𝑇𝑒𝑠𝑡𝑢. AP at position 𝑁 is defined as:

$$\mathit{AP}(\mathit{Test}_u, l_u, N) := \frac{1}{|\mathit{Test}_u|} \sum_{n=1}^{N} \mathit{Precision}(\mathit{Test}_u, l_u, n) \cdot \mathbb{I}(l_u(n) \in \mathit{Test}_u) \tag{3.12}$$

where 𝕀(∙) is the indicator function, which equals 1 if the condition is satisfied and 0 otherwise. The Mean Average Precision (MAP) is the average over all users in 𝐷𝑡𝑒𝑠𝑡:

$$\mathit{MAP}(D_{\mathit{test}}, N) := \frac{1}{|D_{\mathit{test}}|} \sum_{u \in D_{\mathit{test}}} \mathit{AP}(\mathit{Test}_u, l_u, N) \tag{3.13}$$
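Equations (3.12) and (3.13) can be sketched as below, assuming ranked item lists without duplicates (so the number of hits so far divided by the position is exactly Precision at that position); the function names are hypothetical.

```python
def average_precision(ranked, test_items, n):
    """Eq. (3.12): precision at each hit position up to n, normalised by |Test_u|."""
    test_items = set(test_items)
    hits, total = 0, 0.0
    for pos, item in enumerate(ranked[:n], start=1):
        if item in test_items:
            hits += 1
            total += hits / pos          # Precision(Test_u, l_u, pos) at a hit position
    return total / len(test_items)

def mean_average_precision(per_user, n):
    """Eq. (3.13): mean of AP over all test users.

    per_user: dict mapping user -> (ranked item list, ground-truth items).
    """
    return sum(average_precision(r, t, n) for r, t in per_user.values()) / len(per_user)
```

For example, with ground truth {i1, i3} and the ranking i1, i2, i3, i4, the hits at positions 1 and 3 contribute 1/1 and 2/3, giving AP = (1 + 2/3) / 2 = 5/6.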

3.4.2.2 Discounted Cumulative Gain (DCG) and Normalized Discounted Cumulative Gain (NDCG)

In the Discounted Cumulative Gain (DCG), predicted items at higher-ranked positions are more important to the user than those at lower-ranked positions. The DCG per target user 𝑢 is defined as:

$$\mathit{DCG}(\mathit{Test}_u, l_u, N) := \sum_{n=1}^{N} \frac{\mathbb{I}(l_u(n) \in \mathit{Test}_u)}{\log_2(1+n)} \tag{3.14}$$

where 𝕀(∙) is the indicator function, which equals 1 if the condition is satisfied and 0 otherwise. In Equation (3.14), the numerator is the gain function, which gives weight to the predicted items 𝑙𝑢(𝑛) that exist in 𝑇𝑒𝑠𝑡𝑢, whereas the denominator is the discount function, which makes predicted items at lower ranks contribute less to the DCG score (Balakrishnan and Chopra, 2012; Chapelle and Wu, 2010).


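The DCG score over all users in 𝐷𝑡𝑒𝑠𝑡 is formulated as:

$$\mathit{DCG}(D_{\mathit{test}}, N) := \frac{1}{|D_{\mathit{test}}|} \sum_{u \in D_{\mathit{test}}} \mathit{DCG}(\mathit{Test}_u, l_u, N) \tag{3.15}$$

The maximum possible DCG score, i.e. the Ideal DCG (IDCG), is given as:

$$\mathit{IDCG}(N) := \sum_{n=1}^{N} \frac{1}{\log_2(1+n)} \tag{3.16}$$

The Normalized Discounted Cumulative Gain (NDCG) is then derived by normalizing the DCG with the IDCG:

$$\mathit{NDCG}(D_{\mathit{test}}, N) := \frac{\mathit{DCG}(D_{\mathit{test}}, N)}{\mathit{IDCG}(N)} \tag{3.17}$$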
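Equations (3.14) to (3.17) can be sketched as follows, with binary gains and the log₂(1 + 𝑛) discount. The input format (a dict of ranked lists and ground-truth sets per user) and the function names are assumptions for illustration.

```python
import math

def dcg_at_n(ranked, test_items, n):
    """Eq. (3.14): binary gain discounted by log2(1 + position)."""
    test_items = set(test_items)
    return sum(1.0 / math.log2(1 + pos)
               for pos, item in enumerate(ranked[:n], start=1) if item in test_items)

def idcg_at_n(n):
    """Eq. (3.16): DCG of a list whose first n items are all relevant."""
    return sum(1.0 / math.log2(1 + pos) for pos in range(1, n + 1))

def ndcg(per_user, n):
    """Eqs. (3.15) and (3.17): average DCG over users, normalised by IDCG(n)."""
    mean_dcg = sum(dcg_at_n(r, t, n) for r, t in per_user.values()) / len(per_user)
    return mean_dcg / idcg_at_n(n)
```

Because IDCG(𝑁) in Equation (3.16) assumes 𝑁 relevant items, a user with fewer than 𝑁 ground-truth items cannot reach an NDCG of 1 under this definition.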

3.5 BENCHMARKING METHODS

This section details the benchmarks used for evaluating the proposed methods. The purpose is to understand the strengths and weaknesses of both the proposed methods and the relevant state-of-the-art methods. This thesis principally addresses the tag-based item recommendation task by applying the tensor model, highlighting the practical utilisation of an efficient tagging data interpretation scheme and a learning-to-rank approach. This is a fairly new research topic. Tensor models have become popular in recommendation research only in recent years, and researchers still face many challenges, such as scalability, sparsity, and learning from the latent factors. Additionally, treating recommendation as a ranking problem and using optimization to achieve the optimal recommendation list is a new research area. Consequently, few directly relevant works are available. The closest works, which can be used for comparison directly or with some adaptation, are chosen for benchmarking; the selection of benchmarking methods is driven by these aspects. Note that dealing with the semantic problems of tags is beyond the focus of this research, and the benchmarks have been selected accordingly. Details of the benchmarks are described as follows:


3.5.1 MAX Method

The MAX method (Symeonidis et al., 2010) is one of the first tag-based item recommendation methods that used the tensor as its learning model. This method uses the boolean scheme for constructing the user profiles, in which the recommendation task is regarded as predicting whether an item will be “relevant” or “irrelevant” to a user. In other words, the method implements a point-wise based ranking approach for learning the tensor recommendation model. The MAX method applies the HOSVD-based factorization technique (Kolda and Bader, 2009) and directly utilises the reconstructed tensor to generate the list of recommendations, based on the maximum predicted score on each user-item set. This method simply assumes that the level of user preference for a candidate item is represented solely by its predicted score on a single tag, i.e. the influence of other tags is disregarded. Note that the MAX method builds the tensor model from a relatively small dataset (105 users × 246 items × 591 tags) due to the scalability issue that commonly occurs in reconstructing large tensor models.
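The MAX pipeline can be sketched in NumPy as a truncated HOSVD followed by max-over-tags scoring. This is a minimal illustration, assuming the boolean tensor is a dense users × items × tags array; it follows the generic truncated HOSVD (leading left singular vectors of each mode unfolding, then projection onto them) and the scoring rule described above, but it is not the original MAX implementation. The rank choice and the function names are hypothetical. The dense reconstruction in `max_scores` is exactly the step that limits this approach to small tensors.

```python
import numpy as np

def truncated_hosvd(Y, ranks):
    """Truncated HOSVD of a 3rd-order tensor: returns (core, [U1, U2, U3])."""
    factors = []
    for mode, r in enumerate(ranks):
        # mode-n unfolding: move the mode to the front and flatten the rest
        unfolded = np.moveaxis(Y, mode, 0).reshape(Y.shape[mode], -1)
        left, _, _ = np.linalg.svd(unfolded, full_matrices=False)
        factors.append(left[:, :r])                 # leading r left singular vectors
    U1, U2, U3 = factors
    core = np.einsum('ijk,ia,jb,kc->abc', Y, U1, U2, U3, optimize=True)
    return core, factors

def max_scores(core, factors):
    """Reconstruct the full tensor and score each (user, item) by its maximum over tags."""
    U1, U2, U3 = factors
    Y_hat = np.einsum('abc,ia,jb,kc->ijk', core, U1, U2, U3, optimize=True)
    return Y_hat.max(axis=2)                        # users x items

def top_n(scores, user, n):
    """Indices of the n highest-scoring items for one user."""
    return np.argsort(-scores[user])[:n]
```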

Using the MAX method as a benchmark is necessary, as one of the research objectives of this thesis is to develop two methods that use the boolean scheme for constructing the user profiles and implement the point-wise based ranking approach for learning the tensor model. The first proposed method, Tensor-based Item Recommendation using Probabilistic Ranking (TRPR), attempts to improve the quality of recommendations via a probabilistic ranking approach, such that the selection of candidate items from the reconstructed tensor takes the user’s tag preferences into account, while at the same time solving the scalability issue. The second proposed method, Ranking using Weighted Tensor (We-Rank), attempts to utilise the user’s past tagging history via a weighted scheme, such that the list of recommendations can be generated directly from the factorized tensor.

3.5.2 Pairwise Interaction Tensor Factorization (PITF) Method

The Pairwise Interaction Tensor Factorization (PITF) method (Rendle and Schmidt-Thieme, 2010) is a well-known, leading tensor-based tag recommendation method. This method uses the set-based scheme (Rendle, Balby Marinho, et al., 2009) for constructing the user profiles, and the recommendation task is regarded as predicting the order of a pair of items, in which the interdependency occurs only between the two paired items. In other words, the method implements a pair-wise based ranking approach for learning the tensor recommendation model. Since the methods developed in this thesis recommend items based on user activities, PITF is extended in this thesis to the task of item recommendation. The adaptation is necessary because the task of recommending tags differs from the task of recommending items.

For tag recommendation, predictions are generated for each predefined user-item pair, i.e. the recommendation system predicts tags for a given item and user. For item recommendation, however, the recommendation system predicts items based on the user information only. Consequently, a method must calculate the item ranking score over all available tags before deciding which items are in the user’s Top-𝑁 recommendation list.

Using the set-based interpretation scheme, on each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set, the PITF method represents the ranking of tagging data as (𝑢, 𝑡, 𝑖𝑃, 𝑖𝑁), where (𝑢, 𝑡, 𝑖𝑃) is a “relevant” (or “positive”) triple and (𝑢, 𝑡, 𝑖𝑁) is an “irrelevant” (or “negative”) triple. It then creates a tensor factorization model, which employs the stochastic gradient descent algorithm (Rendle, Freudenthaler, et al., 2009) to optimize the ranking function such that positive entries are assigned higher values than negative entries. This reflects the notion that the user favours the positive entries over the negative ones. The model is formulated as:

$$\hat{Y} := M^{(1)} \cdot M^{(2)U} + M^{(3)} \cdot M^{(2)T} \tag{3.18}$$

where 𝑀(1) is the user latent factor matrix, 𝑀(3) is the tag latent factor matrix, 𝑀(2)𝑈 is the item factor matrix with respect to users, 𝑀(2)𝑇 is the item factor matrix with respect to tags, and Ŷ is the predicted tensor. The relevance recommendation ranking score is calculated as:

$$\hat{y}_{u,t,i} := \sum_{f=1}^{F} m^{(1)}_{u,f} \cdot m^{(2)U}_{i,f} + \sum_{f=1}^{F} m^{(3)}_{t,f} \cdot m^{(2)T}_{i,f} \tag{3.19}$$

where 𝐹 is the size of the latent factors.
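The pair-wise learning step behind PITF can be sketched for the item-recommendation form of Equation (3.19). This is a minimal sketch, assuming the usual BPR-style stochastic gradient update for the pair-wise criterion: maximise ln σ(ŷ𝑢,𝑡,𝑖𝑃 − ŷ𝑢,𝑡,𝑖𝑁) with L2 regularisation. The learning rate, regularisation weight, initialisation, and function names are hypothetical, and the way item scores are aggregated over tags for the Top-𝑁 list is not shown.

```python
import numpy as np

def pitf_score(M1, M2U, M3, M2T, u, t, i):
    """Eq. (3.19): user-item interaction plus tag-item interaction."""
    return float(M1[u] @ M2U[i] + M3[t] @ M2T[i])

def pitf_sgd_step(M1, M2U, M3, M2T, u, t, i_pos, i_neg, lr=0.05, reg=1e-4):
    """One pair-wise gradient step on a quadruple (u, t, i_P, i_N), in place.

    Increases ln sigmoid(y_hat[u,t,i_P] - y_hat[u,t,i_N]) with L2 regularisation.
    """
    x = (pitf_score(M1, M2U, M3, M2T, u, t, i_pos)
         - pitf_score(M1, M2U, M3, M2T, u, t, i_neg))
    delta = 0.5 * (1.0 - np.tanh(0.5 * x))        # = 1 - sigmoid(x), overflow-free
    user, tag = M1[u].copy(), M3[t].copy()        # old values used in every gradient
    pos_u, neg_u = M2U[i_pos].copy(), M2U[i_neg].copy()
    pos_t, neg_t = M2T[i_pos].copy(), M2T[i_neg].copy()
    M1[u] += lr * (delta * (pos_u - neg_u) - reg * user)
    M3[t] += lr * (delta * (pos_t - neg_t) - reg * tag)
    M2U[i_pos] += lr * (delta * user - reg * pos_u)
    M2U[i_neg] += lr * (-delta * user - reg * neg_u)
    M2T[i_pos] += lr * (delta * tag - reg * pos_t)
    M2T[i_neg] += lr * (-delta * tag - reg * neg_t)
```

Repeating this step on a quadruple drives the positive item's score above the negative item's for that (𝑢, 𝑡) pair, which is the pair-wise ordering PITF learns.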


Using the PITF method as a benchmark is necessary because one research objective of this thesis is to develop two methods that use a ranking-based scheme for constructing the user profiles and implement the list-wise based ranking approach for learning the tensor model. This thesis conjectures that a recommendation task should be regarded as a ranking task, i.e. predicting an ordered set of items that will be of interest to a user, where the predicted items depend on other corresponding items, instead of predicting the order of a pair of items, where the interdependency only occurs between the two paired items. Note that employing a ranking-based scheme results in multi-graded or graded-relevance data, while implementing a list-wise based ranking approach requires the ranking evaluation measure to be directly optimized in order to learn the recommendation model. The two proposed list-wise based ranking methods are: (1) DCG Optimization for Learning-to-Rank (Do-Rank), which uses the proposed UTS scheme for constructing the tensor learning model and DCG as the optimization criterion; and (2) GAP Optimization for Learning-to-Rank (Go-Rank), which uses the proposed graded-relevance scheme for constructing the tensor learning model and GAP as the optimization criterion.

3.5.3 CF-based method that applied the Candidate Tag Set (CTS) Method

The CTS method (Kim et al., 2010) is the state-of-the-art tag-based item recommendation method that uses a matrix as its learning model. This method projects the ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relationship into three binary relationships, i.e. ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩, ⟨𝑢𝑠𝑒𝑟, 𝑡𝑎𝑔⟩, and ⟨𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩, and uses the boolean scheme for constructing the user profile. The CTS method assumes that the tags used by a user imply the user's latent preferences and, therefore, that the similarity between users can be determined from the user-created tags. The list of recommendations for each user is generated by first identifying the user's tag preferences, from which the user's likelihood of selecting items can be calculated.

Using the CTS method as a benchmark is necessary because this thesis emphasises that, given the ternary relation of tagging data, a tag-based recommendation system needs to employ a multi-dimensional approach, i.e. a tensor model, rather than splitting the data into lower-dimensional models, i.e. matrix models.


3.6 CHAPTER SUMMARY

This chapter has detailed the research design used in the research conducted in the next two chapters. It has briefly explained the data interpretation schemes, detailing how tagging data can be interpreted in a tensor model in order to improve the recommendation performance, and the two categories of methods, point-wise and list-wise, that are developed in this thesis for generating item recommendations based on users' tagging activities. Table 3.3 summarises the ranking methods proposed in this thesis. The four real-world, freely available datasets used for evaluating the proposed methods have been described. The evaluation measures for comparing the performance of the proposed and benchmarking methods have been presented, along with the benchmarks that will be used for comparing the proposed point-wise based and list-wise based ranking methods, as detailed in Chapter 4 and Chapter 5, respectively.

| Ranking Method | User Profile Construction | Recommendation Task | Learning Approach | Optimization Criterion | Recommendation Generation Approach |
|---|---|---|---|---|---|
| TRPR: Probabilistic Ranking | boolean scheme | Prediction task | Point-wise based ranking | Least square loss | Two stage: full tensor reconstruction from the latent factors + probabilistic ranking |
| We-Rank: Weighted Tensor Approach for Ranking | boolean scheme + weighting to reward and penalise entries | Prediction task | Point-wise based ranking | Weighted least square loss | Directly use the latent factors |
| Do-Rank: Learning from Multi-graded Data | UTS scheme | Ranking task | List-wise based ranking | Discounted Cumulative Gain (DCG) | Directly use the latent factors |
| Go-Rank: Learning from Graded-relevance Data | graded-relevance scheme | Ranking task | List-wise based ranking | Graded Average Precision (GAP) | Directly use the latent factors |

Table 3.3. Summary of the proposed ranking methods


Chapter 4: Point-wise based Ranking Methods

This chapter presents the point-wise based ranking recommendation methods

developed to solve the tag-based item recommendation task in this thesis. It begins

with an introduction of the point-wise based ranking approach. The next two sections

detail the proposed methods, i.e. TRPR: Probabilistic Ranking and We-Rank:

Weighted Tensor for Ranking, in which experiments are conducted for each method

and the results are discussed.

4.1 INTRODUCTION

When the recommendation task is solved using a point-wise based ranking approach, the recommendation problem is treated as a regression/classification learning problem, i.e. predicting whether an item will be “relevant” or “irrelevant” to a user, assuming no interdependency between the predicted item and other items (Liu, 2009; Mohan et al., 2011; Rendle, 2011). In this case, a recommendation model should be optimized with respect to the corresponding regression/classification loss function (Liu, 2009; Mohan et al., 2011; Rendle, 2011).

4.1.1 Challenges

The discussion in Chapter 2 has established the merit of using the tensor model to represent the ternary latent relations inherent in tagging data and to infer the user likelihood scores from them. However, using tensor models for generating recommendations faces various challenges. Scalability is a common problem when generating recommendations for large datasets using a tensor model. Full tensor reconstruction is computed by multiplying all the latent factors. This process is memory-expensive and, therefore, reconstructing large tensors is infeasible (Kutty et al., 2012; Leginus et al., 2012; Nanopoulos, 2011; Rafailidis and Daras, 2013; Symeonidis et al., 2010).

A memory-efficient method (Kolda and Sun, 2008) that enables the latent factor generation process has been proposed for the many applications that do not need the tensor to be reconstructed. However, tensor-based recommendation methods require the tensor to be fully reconstructed in order to identify the new entries used for generating a list of recommendations (Kutty et al., 2012; Nanopoulos, 2011; Rafailidis and Daras, 2013; Symeonidis et al., 2008, 2010). The cost of tensor reconstruction has not yet been properly addressed (Ifada and Nayak, 2014b, 2014c).

Another problem faced by tensor-based recommendation models is the quality of recommendation. Existing point-wise based ranking recommendation methods (Nanopoulos, 2011; Rafailidis and Daras, 2013; Symeonidis et al., 2010) use a reconstructed tensor directly for generating the recommendations, based on the maximum predicted score in each user-item set of tensor elements. These approaches assume that the predicted score in the reconstructed tensor can represent the level of user preference for an item based on a single tag only. They disregard the activity history of the users (Jain and Varma, 2011), which is known to influence the user's likelihood of selecting the recommended items (Kim et al., 2010).

Data sparsity is another common problem for tensor models. The characteristics of the real-world tagging datasets listed in Table 3.1 show that tagging data is extremely sparse. For this reason, implementing the boolean scheme to populate the tensor model from the tagging data results in the “0” values (representing the non-observed data) dominating the “1” values (representing the observed data) in the tensor entries. Directly applying the factorization technique to this model will overfit the numerical values of “1” and “0” (Rendle, Balby Marinho, et al., 2009). To solve this problem, researchers have represented the model as a weighted version of the error function that ignores the missing data and models only the known entries (Acar et al., 2011). However, this approach simply builds the weighted learning model by generating a weighted tensor as a bijective mapping of the values of the primary tensor model entries. This disregards the users' past tagging behaviours, since users may have different preferences for using different tags to annotate items (Ifada and Nayak, 2014a; Wetzker et al., 2008).
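To make the sparsity argument concrete, the sketch below computes the density of a boolean tensor and a weighted squared error in the spirit of the weighted error function mentioned above. It is a minimal illustration, not Acar et al.'s algorithm: the weight tensor `W` is a hypothetical input that is 1 on entries treated as known and 0 on entries treated as missing, and the function names are illustrative.

```python
import numpy as np

def density(Y):
    """Fraction of tensor entries that are observed (non-zero)."""
    return np.count_nonzero(Y) / Y.size

def weighted_squared_error(Y, Y_hat, W):
    """Sum of W * (Y - Y_hat)^2; entries with weight 0 do not contribute."""
    return float(np.sum(W * (Y - Y_hat) ** 2))
```

For instance, a 3 × 4 × 5 tensor with three observed entries has a density of 3/60 = 0.05, so 57 of its 60 entries are “0” values.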

4.1.2 Proposed Solutions

Two methods, TRPR and We-Rank, are presented in this chapter to tackle the challenges described above. TRPR and We-Rank fall under the category of the point-wise based ranking approach since they use a regression loss as the recommendation model optimization criterion. For constructing the user profile representation, both methods implement the boolean scheme to interpret the tagging data. TRPR, the first method developed in this chapter, focuses on improving scalability during the tensor reconstruction process and improving recommendation accuracy after the tensor model has been reconstructed. Note that this method does not address the complexity of the latent factor generation task. We-Rank, the second method, focuses on dealing with the sparsity problem and improving the recommendation accuracy during the learning-to-rank procedure.

4.2 TRPR: PROBABILISTIC RANKING

4.2.1 Overview

The developed TRPR method focuses on improving the scalability during the tensor

reconstruction process and the recommendation accuracy after the tensor model has

been reconstructed. Figure 4.1 illustrates the overview of the probabilistic ranking

method, called the Tensor-based Item Recommendation using Probabilistic Ranking

(TRPR). It utilises a memory efficient loop approach for scalable full tensor

reconstruction and a probabilistic ranking to improve the accuracy of

recommendations generated from the reconstructed tensor.

To begin, a third-order tensor model representing the user profiles is built from

the tagging data by implementing the boolean scheme. The resulting tensor model

represents the collaborative activities of the users and becomes the input to learn the

recommendation model for regression. This model is then factorized to find the latent

factors in the user, item, and tag dimensions, which are used to reconstruct the tensor

and identify the new derived entries for making recommendations. To tackle the

expensiveness of the tensor reconstruction process, TRPR applies a memory efficient

loop, by implementing the block-striped (matrix) product approach, for multiplying

the factorized elements of the tensor model to enable a scalable tensor reconstruction.

Unlike the conventional way (Nanopoulos, 2011; Rafailidis and Daras, 2013;

Symeonidis et al., 2010), TRPR does not directly use the reconstructed tensor entries

for generating recommendations. Instead, the list of candidate items and tag


preferences for each user are firstly generated from the reconstructed tensor entries.

The probabilistic ranking stage generates the Top-𝑁 list of item recommendations to

users. TRPR improves the recommendation quality by ranking the items according to the probability that the user selects each item in the candidate item set, which is calculated using the tag preference set. This ensures that the list of recommended items is ranked by the probability that the user will like each item.

[Figure 4.1 diagram: Tagging Data → User Profile Construction (boolean scheme), 𝒴 ∈ ℝ𝑄×𝑅×𝑆 → Latent Factors Generation, 𝒞, 𝑀(1), 𝑀(2), 𝑀(3) → Tensor Reconstruction (memory-efficient approach: 𝑛-mode block-striped (matrix) product), 𝒴̂ ∈ ℝ𝑄×𝑅×𝑆 → Candidate Item Set Generation (𝑍𝑢) and Tag Preference Set Generation (𝑋𝑢) → Top-𝑁 Item Recommendation Generation (probabilistic ranking), 𝑇𝑜𝑝𝑁𝑢 for the target user]

Figure 4.1. Overview of the Probabilistic Ranking method (TRPR)

The next three sub-sections detail the three main processes in TRPR: (1) user

profile construction, i.e. tensor model construction; (2) learning-to-rank procedure,

i.e. latent factors generation via tensor factorization to derive the relationships

inherent in the model; and (3) recommendation generation. The last two sub-sections

present empirical evaluation and summary of the method.

4.2.2 User Profile Construction

The user profile construction involves constructing an initial tensor to model the multi-dimensional tagging data. The TRPR method uses the boolean scheme (Symeonidis et al., 2010) to populate the third-order tensor model. This tensor model represents the user profile and becomes the underlying ranking learning model for recommendation. The boolean scheme is commonly used in tag-based item recommendation methods and simply interprets the tagging data as binary data, which include two types of entries, i.e. “relevant” and “irrelevant” entries. The “relevant” entries, labelled as “1”, are the observed entries where the user has explicitly revealed interest by annotating an item with tags; the “irrelevant” entries, labelled as “0”, are the remaining (non-observed) entries.



Let 𝑈, 𝐼, and 𝑇 be the sets of users, items, and tags, respectively. Given 𝐴 ∶ 𝑈 × 𝐼 × 𝑇, each triple 𝑎: (𝑢, 𝑖, 𝑡) ∈ 𝐴 represents an activity of user 𝑢 annotating item 𝑖 with tag 𝑡. The observed tagging data, 𝐴𝑜𝑏 ⊆ 𝐴, is the set of activities through which users have expressed their interest in items in the past by annotating those items with tags. Observed tagging data is usually very sparse, thus |𝐴𝑜𝑏| ≪ |𝐴|. The initial tensor 𝒴 ∈ ℝ𝑄×𝑅×𝑆 is constructed, where 𝑄, 𝑅, and 𝑆 are the sizes of the sets of users, items, and tags, respectively, and each tensor entry 𝑦𝑢,𝑖,𝑡 is given a numerical value that represents the relevance grade based on the user tagging activity. The boolean scheme labels the relevance grades of the entries of tensor 𝒴 as follows:

$$y_{u,i,t} := \begin{cases} 1 & \text{if } (u,i,t) \in A_{ob} \\ 0 & \text{otherwise} \end{cases} \tag{4.1}$$

Example 4.1: Tagging Data Interpretation using boolean scheme.

Figure 3.2 shows a toy example of tagging data, represented as a tensor model 𝒴 ∈ ℝ3×4×5 that holds the records of 𝐴𝑜𝑏, where 𝑈 = {𝑢1, 𝑢2, 𝑢3}, 𝐼 = {𝑖1, 𝑖2, 𝑖3, 𝑖4}, and 𝑇 = {𝑡1, 𝑡2, 𝑡3, 𝑡4, 𝑡5}. Each slice of the tensor is a user matrix, which contains the user's tag usage for each item. The “+” symbols represent the 𝐴𝑜𝑏 entries; for instance, the observed tagging data of Figure 3.2 shows that user 𝑢1 has annotated item 𝑖2 with tag 𝑡1. Figure 4.2 illustrates the constructed initial tensor 𝒴 ∈ ℝ3×4×5, as the representation of the user profile, in which the entries are generated from the tagging data by implementing the boolean interpretation scheme formulated in Equation (4.1).


[Figure 4.2: three user slices (User 1, User 2, User 3), each a 4 × 5 matrix over the item and tag dimensions with binary (0/1) entries]

Figure 4.2. Example of initial tensor 𝒴 ∈ ℝ3×4×5 as the representation of user profile in which entries

are generated by implementing the boolean interpretation scheme to the toy example in Figure 3.2
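Equation (4.1) amounts to setting one entry per observed triple and leaving every other entry at 0. The sketch below is illustrative (`boolean_tensor` is a hypothetical name) and assumes zero-based indices, so the observed triple (𝑢1, 𝑖2, 𝑡1) from Example 4.1 becomes (0, 1, 0); it does not reproduce the other entries of Figure 4.2.

```python
import numpy as np

def boolean_tensor(observed, Q, R, S):
    """Populate Y in R^{Q x R x S} using Eq. (4.1): 1 if (u, i, t) is observed, else 0."""
    Y = np.zeros((Q, R, S))
    for u, i, t in observed:
        Y[u, i, t] = 1.0          # repeated observations of a triple still give 1
    return Y

# u1 annotated i2 with t1 (zero-based indices)
Y = boolean_tensor([(0, 1, 0)], Q=3, R=4, S=5)
```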

4.2.3 Learning-to-Rank Procedure

This section details the process of learning and generating latent factors that

correspond to each dimension of tensor 𝒴.

4.2.3.1 Optimization Criterion and Factorization Technique

Mean Square Error (MSE) is the common optimization criterion for solving a

regression/classification task. The MSE of all users over all items under all tags can

be defined as:

$$MSE := \frac{1}{QRS} \sum_{u \in U} \sum_{i \in I} \sum_{t \in T} \left[ y_{u,i,t} - \hat{y}_{u,i,t} \right]^2 \tag{4.2}$$

where 𝑦𝑢,𝑖,𝑡 is the relevance grade, taking a value from the binary relevance set {0, 1}, of the user profile represented by the initial tensor model, and ŷ𝑢,𝑖,𝑡 is the predicted preference score that reflects the preference level of user 𝑢 for annotating item 𝑖 with tag 𝑡, calculated from the latent factors of the tensor model. Recall that 𝑄, 𝑅, and 𝑆 are the numbers of users, items, and tags, respectively.
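Equation (4.2) translates directly into code. The sketch below uses the illustrative name `mse` and averages the squared differences over all 𝑄𝑅𝑆 entries, including the non-observed “0” entries of the boolean scheme.

```python
import numpy as np

def mse(Y, Y_hat):
    """Eq. (4.2): squared error summed over every (u, i, t) entry, divided by Q*R*S."""
    Q, R, S = Y.shape
    return float(np.sum((Y - Y_hat) ** 2) / (Q * R * S))
```

For instance, predicting 0 for every entry of a 3 × 4 × 5 boolean tensor with three observed entries gives an MSE of 3/60 = 0.05 without predicting any observed entry, which illustrates the sparsity issue raised in Section 4.1.1.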

Latent factors are learned from the tensor 𝒴 to derive the latent relationships

between the dimensions of users, items and tags. The practice of generating the latent

factors is commonly called the tensor factorization process (Koren et al., 2009;

Rendle and Schmidt-Thieme, 2010). Two broad families of factorization techniques

are Tucker and Candecomp/Parafac (CP) (Kolda and Bader, 2009). A Tucker model,

as illustrated in Figure 4.3, includes the Higher-Order SVD (HOSVD) and Higher-

Order Orthogonal Iteration (HOOI) models (Kolda and Bader, 2009). On the other


hand, a CP model can be considered as a special case of Tucker where the core

tensor is diagonal (Kolda and Bader, 2009), as illustrated in Figure 4.4.

[Figure 4.3 diagram: 𝒴 (𝑄 × 𝑅 × 𝑆) ≈ core tensor 𝒞 (𝐹 × 𝐹 × 𝐹) multiplied by 𝑀(1) (𝑄 × 𝐹), 𝑀(2) (𝑅 × 𝐹), and 𝑀(3) (𝑆 × 𝐹)]

Figure 4.3. The Tucker factorization model for a third-order tensor

[Figure 4.4 diagram: 𝒴 (𝑄 × 𝑅 × 𝑆) ≈ diagonal core tensor 𝒞 (𝐹 × 𝐹 × 𝐹) multiplied by 𝑀(1) (𝑄 × 𝐹), 𝑀(2) (𝑅 × 𝐹), and 𝑀(3) (𝑆 × 𝐹)]

Figure 4.4. The CP factorization model for a third-order tensor

TRPR can implement either the Tucker or the CP model as the predictor function for calculating the predicted preference score ŷ𝑢,𝑖,𝑡. For a third-order tensor 𝒴 ∈ ℝ𝑄×𝑅×𝑆, a Tucker or CP model may perform Singular Value Decomposition (SVD) on each mode-𝑛 matricization of tensor 𝒴 (Kolda and Bader, 2009), resulting in three latent factor matrices corresponding to the dimensions of tensor 𝒴: 𝑀(1) ∈ ℝ𝑄×𝐹, 𝑀(2) ∈ ℝ𝑅×𝐹, and 𝑀(3) ∈ ℝ𝑆×𝐹, where 𝐹 ≪ 𝐻 and 𝐻 ∈ {𝑄, 𝑅, 𝑆}. The core tensor 𝒞 ∈ ℝ𝐹×𝐹×𝐹 (diagonal in the CP case), which defines the interaction between the users, items, and tags (Kolda and Bader, 2009), is calculated by multiplying tensor 𝒴 by the transposes of all the latent factor matrices:

$$\mathcal{C} := \mathcal{Y} \times_1 (M^{(1)})' \times_2 (M^{(2)})' \times_3 (M^{(3)})' \tag{4.3}$$


Afterwards, the predicted preference score ŷ𝑢,𝑖,𝑡, a score that reflects the preference level of a user 𝑢 for annotating an item 𝑖 with a tag 𝑡, is calculated as:

$$\hat{y}_{u,i,t} := \sum_{f_1=1}^{F} \sum_{f_2=1}^{F} \sum_{f_3=1}^{F} c_{f_1,f_2,f_3} \cdot m^{(1)}_{u,f_1} \cdot m^{(2)}_{i,f_2} \cdot m^{(3)}_{t,f_3} = [\![ \mathcal{C}; M^{(1)}, M^{(2)}, M^{(3)} ]\!]_{u,i,t} \tag{4.4}$$

Definition 4.1.

The 𝑛-mode (matrix) product, denoted by ×𝑛, is the multiplication of a tensor 𝒴 ∈ ℝ𝑄×𝑅×𝑆 by a matrix 𝑀 ∈ ℝ𝐷×𝐹 in mode 𝑛, where the number of columns of 𝑀 must equal the size of mode 𝑛 of 𝒴. This operation is equivalent to multiplying the matrix 𝑀 by the tensor's mode-𝑛 matricization 𝑌(𝑛) (Kolda and Bader, 2009):

$$\hat{\mathcal{Y}} := \mathcal{Y} \times_n M \iff \hat{Y}_{(n)} := M Y_{(n)} \tag{4.5}$$

Definition 4.2.

The mode-𝑛 matricization of tensor 𝒴, denoted by 𝑌(𝑛), is the process of rearranging the tensor elements into a matrix (Kolda and Bader, 2009). For instance, a tensor 𝒴 ∈ ℝ3×4×5 can be matricized in three ways, i.e. 𝑌(1) ∈ ℝ3×20, 𝑌(2) ∈ ℝ4×15, and 𝑌(3) ∈ ℝ5×12, as illustrated in Figure 4.5.

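Tensor 𝒴 ∈ ℝ3×4×5, one 4 × 5 slice per user (rows: items; columns: tags):

$$\text{User 1: } \begin{bmatrix} a & m & y & K & W \\ b & n & z & L & X \\ c & o & A & M & Y \\ d & p & B & N & Z \end{bmatrix} \quad \text{User 2: } \begin{bmatrix} e & q & C & O & aa \\ f & r & D & P & bb \\ g & s & E & Q & cc \\ h & t & F & R & dd \end{bmatrix} \quad \text{User 3: } \begin{bmatrix} i & u & G & S & ee \\ j & v & H & T & ff \\ k & w & I & U & gg \\ l & x & J & V & hh \end{bmatrix}$$

$$Y_{(1)} = \left[\begin{array}{cccccccccccccccccccc} a & b & c & d & m & n & o & p & y & z & A & B & K & L & M & N & W & X & Y & Z \\ e & f & g & h & q & r & s & t & C & D & E & F & O & P & Q & R & aa & bb & cc & dd \\ i & j & k & l & u & v & w & x & G & H & I & J & S & T & U & V & ee & ff & gg & hh \end{array}\right]$$

$$Y_{(2)} = \left[\begin{array}{ccccccccccccccc} a & e & i & m & q & u & y & C & G & K & O & S & W & aa & ee \\ b & f & j & n & r & v & z & D & H & L & P & T & X & bb & ff \\ c & g & k & o & s & w & A & E & I & M & Q & U & Y & cc & gg \\ d & h & l & p & t & x & B & F & J & N & R & V & Z & dd & hh \end{array}\right]$$

$$Y_{(3)} = \left[\begin{array}{cccccccccccc} a & b & c & d & e & f & g & h & i & j & k & l \\ m & n & o & p & q & r & s & t & u & v & w & x \\ y & z & A & B & C & D & E & F & G & H & I & J \\ K & L & M & N & O & P & Q & R & S & T & U & V \\ W & X & Y & Z & aa & bb & cc & dd & ee & ff & gg & hh \end{array}\right]$$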

Figure 4.5. Example of three ways matricization of a tensor 𝒴 ∈ ℝ3×4×5
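Definitions 4.1 and 4.2 can be checked numerically with a short NumPy sketch. It is an illustration, not the thesis's code: modes are zero-based (mode 0 is users, 1 items, 2 tags), `unfold`, `fold`, and `mode_n_product` are illustrative names, and the column order follows Kolda and Bader (2009), in which the lowest remaining mode varies fastest. This reproduces the column order of 𝑌(1) and 𝑌(2) in Figure 4.5; for 𝑌(3) it lets the user index vary fastest, whereas Figure 4.5 lists the item index first. The two orders differ only by a column permutation, which does not change the result of the 𝑛-mode product once the matrix is folded back.

```python
import numpy as np

def unfold(Y, n):
    """Mode-n matricization Y_(n): rows indexed by mode n, remaining modes in Fortran order."""
    return np.moveaxis(Y, n, 0).reshape(Y.shape[n], -1, order="F")

def fold(M, n, shape):
    """Inverse of unfold: rebuild a tensor of the given shape from its mode-n matricization."""
    rest = [s for k, s in enumerate(shape) if k != n]
    return np.moveaxis(M.reshape([shape[n]] + rest, order="F"), 0, n)

def mode_n_product(Y, M, n):
    """Eq. (4.5): Y x_n M, computed as M @ Y_(n) and folded back into a tensor."""
    new_shape = list(Y.shape)
    new_shape[n] = M.shape[0]
    return fold(M @ unfold(Y, n), n, new_shape)
```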


4.2.3.2 Latent Factors Generation

Latent factors generation, via tensor factorization, is the process of deriving the latent

relationships between dimensions of the tensor model. The latent factors matrices,

𝑀(1), 𝑀(2), and 𝑀(3) corresponding to each dimension of tensor 𝒴, are generated by

optimizing the objective function of the tensor model. Given Equation (4.2), the

objective function can be formulated as (Kolda and Bader, 2009):

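$$L(\Theta) := \sum_{u \in U} \sum_{i \in I} \sum_{t \in T} \left[ y_{u,i,t} - \hat{y}_{u,i,t} \right]^2 = \left\lVert \mathcal{Y} - [\![ \mathcal{C}; M^{(1)}, M^{(2)}, M^{(3)} ]\!] \right\rVert^2 \tag{4.6}$$

where ‖∙‖ denotes the tensor (Frobenius) norm. Note that the constant coefficient 1/(𝑄𝑅𝑆) in 𝑀𝑆𝐸 can be neglected in Equation (4.6) since it has no influence on the optimization. TRPR implements the Alternating Least Squares (ALS) approach (Kolda and Bader, 2009) to optimize the objective function in Equation (4.6). The ALS approach fixes all but one of the matrices 𝑀(1), 𝑀(2), and 𝑀(3) and updates the remaining matrix from the 𝐹 leading left singular vectors of the mode-𝑛 matricization of the right-hand side of the corresponding equation (see Figure 4.6):

$$M^{(1)} = \mathcal{Y} \times_2 (M^{(2)})' \times_3 (M^{(3)})' \tag{4.7}$$

$$M^{(2)} = \mathcal{Y} \times_1 (M^{(1)})' \times_3 (M^{(3)})' \tag{4.8}$$

$$M^{(3)} = \mathcal{Y} \times_1 (M^{(1)})' \times_2 (M^{(2)})' \tag{4.9}$$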

This process is repeated until the convergence criterion, i.e. a given number of iterations, is satisfied (Carroll and Chang, 1970; Harshman, 1970; Kolda and Bader, 2009). The suggested number of iterations is no more than 50 (Bader et al., 2012).

Figure 4.6 shows the learning algorithm used in TRPR.
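The learning procedure of Equations (4.3) and (4.7) to (4.9) can be sketched as a higher-order orthogonal iteration in NumPy. This is an illustrative sketch rather than the thesis implementation (which is adapted from Kutty et al., 2012): the factor matrices are initialised from a truncated HOSVD (an assumption, since the algorithm only says “Initialize”), `iter_max` plays the role of 𝑖𝑡𝑒𝑟𝑀𝑎𝑥, modes are zero-based, the function names are hypothetical, and 𝐹 is assumed not to exceed 𝑄, 𝑅, or 𝑆.

```python
import numpy as np

def unfold(Y, n):
    """Mode-n matricization (Kolda and Bader column order)."""
    return np.moveaxis(Y, n, 0).reshape(Y.shape[n], -1, order="F")

def mode_n_product(Y, M, n):
    """Y x_n M: contract mode n of Y with the columns of M."""
    return np.moveaxis(np.tensordot(M, Y, axes=(1, n)), 0, n)

def leading_left_singular_vectors(A, F):
    return np.linalg.svd(A, full_matrices=False)[0][:, :F]

def trpr_learning(Y, F, iter_max=50):
    """ALS updates of Eqs. (4.7)-(4.9), then the core tensor of Eq. (4.3)."""
    # Assumed initialisation: truncated HOSVD of each mode-n matricization.
    Ms = [leading_left_singular_vectors(unfold(Y, n), F) for n in range(3)]
    for _ in range(iter_max):
        for n in range(3):
            Z = Y
            for m in range(3):
                if m != n:                    # fix every factor except the one being updated
                    Z = mode_n_product(Z, Ms[m].T, m)
            Ms[n] = leading_left_singular_vectors(unfold(Z, n), F)
    C = Y
    for m in range(3):
        C = mode_n_product(C, Ms[m].T, m)     # Eq. (4.3)
    return C, Ms

def reconstruct(C, Ms):
    """Eq. (4.10): C x_1 M(1) x_2 M(2) x_3 M(3)."""
    Y_hat = C
    for m, M in enumerate(Ms):
        Y_hat = mode_n_product(Y_hat, M, m)
    return Y_hat
```

On a tensor whose multilinear rank is exactly (𝐹, 𝐹, 𝐹), `reconstruct(C, Ms)` returns the input up to rounding error; on sparse boolean data it returns a continuous approximation of the kind discussed in Section 4.2.4.1.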


Algorithm: TRPR Learning
Input: Training set 𝐷𝑡𝑟𝑎𝑖𝑛 ∶= 𝑈 × 𝐼 × 𝑇, the size of latent factor 𝐹, maximal iteration 𝑖𝑡𝑒𝑟𝑀𝑎𝑥
Output: Latent factor matrices 𝑀(1), 𝑀(2), 𝑀(3) and the core tensor 𝒞
1. Construct the initial tensor model with the tagging data:
  𝑄 = |𝑈|, 𝑅 = |𝐼|, 𝑆 = |𝑇|, 𝒴 ∈ ℝ𝑄×𝑅×𝑆
  Populate 𝒴 using Equation (4.1)
2. Apply a factorization technique to tensor 𝒴 to get:
  a. The latent factors:
    Initialize 𝑀(1)(0) ∈ ℝ𝑄×𝐹, 𝑀(2)(0) ∈ ℝ𝑅×𝐹, 𝑀(3)(0) ∈ ℝ𝑆×𝐹, ℎ ← 0
    repeat
      𝑀(1) ← 𝐹 leading left singular vectors of Equation (4.7)
      𝑀(2) ← 𝐹 leading left singular vectors of Equation (4.8)
      𝑀(3) ← 𝐹 leading left singular vectors of Equation (4.9)
      ℎ ← ℎ + 1
    until ℎ ≥ 𝑖𝑡𝑒𝑟𝑀𝑎𝑥
  b. The core tensor:
    𝒞 ← 𝒴 ×1 (𝑀(1))′ ×2 (𝑀(2))′ ×3 (𝑀(3))′, where 𝒞 ∈ ℝ𝐹×𝐹×𝐹

Figure 4.6. The TRPR learning algorithm, adapted from (Kutty et al., 2012)

4.2.4 Recommendation Generation

This section details how TRPR generates the Top-𝑁 list of item recommendations for each user. Existing methods rank the candidate items based on the maximum value of ŷ𝑢,𝑖,𝑡 for each (𝑢, 𝑖) ∈ 𝐴\𝐴𝑜𝑏 set in the reconstructed tensor 𝒴̂ (Kutty et al., 2012; Nanopoulos, 2011; Rafailidis and Daras, 2013; Symeonidis et al., 2010). These approaches fail to consider the user's tag usage history in the initial tensor 𝒴, as they generate the recommendations solely from the level of user preference for an item based on a single tag. TRPR approaches the problem of item recommendation as a classification problem, making Naïve Bayes (Baker and McCallum, 1998) apt for finding an efficient solution (Lops et al., 2011). In this case, the list of recommendations is ranked and generated based on the probability score of a user selecting the candidate items, in which the user's tag usage history (or preference) is taken into account.
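For contrast with TRPR, the conventional max-based ranking described above can be sketched as follows. It is a minimal illustration with zero-based indices, and `max_score_top_n` is a hypothetical name: each item the user has not yet tagged is scored by its largest predicted score over all tags, ignoring the user's tag usage history.

```python
import numpy as np

def max_score_top_n(Y, Y_hat, u, n):
    """Rank unobserved items of user u by max over tags of Y_hat[u, i, t]."""
    unobserved = ~Y[u].any(axis=1)            # items with no observed (u, i, *) entry
    scores = Y_hat[u].max(axis=1)             # best predicted score over all tags
    candidates = np.flatnonzero(unobserved)
    order = candidates[np.argsort(-scores[candidates], kind="stable")]
    return [int(i) for i in order[:n]]
```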


4.2.4.1 Tensor Reconstruction

Tensor reconstruction is the process of revealing new entries that are inferred from the latent factors. The reconstructed tensor 𝒴̂ is derived by multiplying the core tensor by all the latent factor matrices:

$$\hat{\mathcal{Y}} := \mathcal{C} \times_1 M^{(1)} \times_2 M^{(2)} \times_3 M^{(3)} \tag{4.10}$$

Implementing the general 𝑛-mode (matrix) product for reconstructing a tensor on a large dataset is expensive and can cause memory overflow. The problem becomes worse in the last step of multiplication, where the effects of the earlier latent factors have already been included. TRPR proposes a memory-efficient loop approach, i.e. the block-striped (matrix) product approach, to solve the problem of the last step of multiplication.

In TRPR, the 1-mode and 2-mode (matrix) products are implemented to multiply the core tensor 𝒞 by the reduced factor matrices 𝑀(1) and 𝑀(2), obtaining the intermediate tensors 𝒴̂1 and 𝒴̂2 sequentially. To multiply 𝒴̂2 by the third reduced factor matrix 𝑀(3), a 3-mode block-striped (matrix) product is implemented for memory efficiency. Block-striping the matrix 𝑀(3) and splitting the multiplication into subtasks produce smaller manipulations that fit within the available memory. In this case, the multiplication between 𝑌̂2(3), the mode-3 matricization of tensor 𝒴̂2, and 𝑀(3) is split into 𝑁 ∶= 𝑆 𝑑𝑖𝑣 𝑏 subtasks, where 𝑆 is the size of the set of tags and 𝑏 is a user-given block-strip row size (𝑏 ≪ 𝑆), plus one further subtask for the remaining 𝑑 ∶= 𝑆 𝑚𝑜𝑑 𝑏 rows when 𝑑 ≠ 0. This means that, at each subtask, a block (𝑀(3))𝑛 containing 𝑏 rows of 𝑀(3) is obtained and multiplied by 𝑌̂2(3). Finally, the complete reconstructed tensor 𝒴̂ is obtained by combining all the subtask results. Figure 4.7 shows the tensor reconstruction algorithm used in TRPR.
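The block-striped 3-mode product of Figure 4.7 can be sketched in NumPy as below. It is an illustration under assumptions, not the thesis code: modes are zero-based, `block_striped_reconstruction` is a hypothetical name, and the “combine” step of Figure 4.7 is implemented by writing each block into its slice of a preallocated output along mode 3. Only the product 𝑀(3)𝑌̂2(3) is split into blocks; this in-memory sketch still allocates the full output 𝒴̂, so an actual memory saving would require writing each block elsewhere (for example to disk), which is not shown.

```python
import numpy as np

def unfold(Y, n):
    return np.moveaxis(Y, n, 0).reshape(Y.shape[n], -1, order="F")

def fold(M, n, shape):
    rest = [s for k, s in enumerate(shape) if k != n]
    return np.moveaxis(M.reshape([shape[n]] + rest, order="F"), 0, n)

def block_striped_reconstruction(C, M1, M2, M3, b):
    """1-mode and 2-mode products, then the 3-mode product in row blocks of M3."""
    Y1 = np.einsum("qf,fgh->qgh", M1, C)      # C x_1 M1, shape (Q, F, F)
    Y2 = np.einsum("rg,qgh->qrh", M2, Y1)     # Y1 x_2 M2, shape (Q, R, F)
    Q, R, _ = Y2.shape
    S = M3.shape[0]
    Y2_3 = unfold(Y2, 2)                      # mode-3 matricization, shape (F, Q*R)
    Y_hat = np.empty((Q, R, S))
    # N = S div b full blocks, then one remainder block of d = S mod b rows (if d > 0)
    for start in range(0, S, b):
        stop = min(start + b, S)
        block = M3[start:stop] @ Y2_3         # (rows in block, Q*R)
        Y_hat[:, :, start:stop] = fold(block, 2, (Q, R, stop - start))
    return Y_hat
```

With 𝑆 = 5 and 𝑏 = 2, the loop processes the tag rows in blocks of sizes 2, 2, and 1, i.e. 𝑁 = 2 full blocks and a remainder of 𝑑 = 1 row, as in Example 4.2.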


Algorithm: TRPR Tensor Reconstruction
Input: Latent factor matrices 𝑀(1), 𝑀(2), 𝑀(3), core tensor 𝒞, 𝑄 = |𝑈|, 𝑅 = |𝐼|, 𝑆 = |𝑇|, block-strip row size 𝑏 where 𝑏 ≪ 𝑆; 𝑁 ∶= 𝑆 𝑑𝑖𝑣 𝑏 and 𝑑 ∶= 𝑆 𝑚𝑜𝑑 𝑏.
Output: Reconstructed tensor 𝒴̂
1. 1-mode (matrix) product:
  𝒴̂1 ← 𝒞 ×1 𝑀(1), where 𝒴̂1 ∈ ℝ𝑄×𝐹×𝐹
2. 2-mode (matrix) product:
  𝒴̂2 ← 𝒴̂1 ×2 𝑀(2), where 𝒴̂2 ∈ ℝ𝑄×𝑅×𝐹
3. Block-striped 3-mode (matrix) product:
  for 𝑛 ← 1 to 𝑁
    (𝑀(3))𝑛 ← rows (𝑛−1)𝑏+1 to 𝑛𝑏 of 𝑀(3), where (𝑀(3))𝑛 ∈ ℝ𝑏×𝐹
    /* 3-mode (matrix) product: */
    (𝒴̂3)𝑛 ← 𝒴̂2 ×3 (𝑀(3))𝑛 ⟺ (𝑌̂3(3))𝑛 ← (𝑀(3))𝑛 𝑌̂2(3), where (𝑌̂3(3))𝑛 ∈ ℝ𝑏×𝑄𝑅
    (𝒴̂3)𝑛 ← mode-3 de-matricization of (𝑌̂3(3))𝑛, where (𝒴̂3)𝑛 ∈ ℝ𝑄×𝑅×𝑏
    𝒴̂3 ← 𝒴̂3 + (𝒴̂3)𝑛 /* combine the block along mode 3 */
  end for
  if 𝑑 ≠ 0 then
    𝑛 ← 𝑁 + 1
    (𝑀(3))𝑛 ← rows (𝑛−1)𝑏+1 to (𝑛−1)𝑏+𝑑 of 𝑀(3), where (𝑀(3))𝑛 ∈ ℝ𝑑×𝐹
    /* 3-mode (matrix) product: */
    (𝒴̂3)𝑛 ← 𝒴̂2 ×3 (𝑀(3))𝑛 ⟺ (𝑌̂3(3))𝑛 ← (𝑀(3))𝑛 𝑌̂2(3), where (𝑌̂3(3))𝑛 ∈ ℝ𝑑×𝑄𝑅
    (𝒴̂3)𝑛 ← mode-3 de-matricization of (𝑌̂3(3))𝑛, where (𝒴̂3)𝑛 ∈ ℝ𝑄×𝑅×𝑑
    𝒴̂3 ← 𝒴̂3 + (𝒴̂3)𝑛
  end if
  𝒴̂ ← 𝒴̂3, where 𝒴̂ ∈ ℝ𝑄×𝑅×𝑆

Figure 4.7. The TRPR tensor reconstruction algorithm


Example 4.2: TRPR Tensor Reconstruction.

An example of tensor reconstruction is illustrated in Figure 4.8 using the toy third-order tensor 𝒴 ∈ ℝ3×4×5 shown in Figure 4.2. Applying a factorization technique with 𝐹 = 2 as the reduction size to 𝒴 results in three latent factor matrices and one core tensor: 𝑀(1) ∈ ℝ3×2, 𝑀(2) ∈ ℝ4×2, 𝑀(3) ∈ ℝ5×2, and 𝒞 ∈ ℝ2×2×2. The reconstructed tensor 𝒴̂ is derived by multiplying all the factorized elements together. The intermediate tensor 𝒴̂2 ∈ ℝ3×4×2 is obtained by implementing the 1-mode and 2-mode (matrix) products sequentially on the core tensor 𝒞 and the latent factor matrices 𝑀(1) and 𝑀(2).

For memory efficiency, the parallel matrix multiplication based on the row-wise block-striped matrix product is applied to multiply 𝒴̂2 by the last factor matrix 𝑀(3). In this case, the 3-mode (matrix) product of tensor 𝒴̂2 by matrix 𝑀(3) (denoted as 𝒴̂2 ×3 𝑀(3)) is converted into the multiplication of matrix 𝑀(3) by 𝑌̂2(3), the mode-3 matricization of tensor 𝒴̂2 (denoted as 𝑀(3)𝑌̂2(3)).

As shown in Figure 4.8, by choosing 𝑏 = 2, the 3-mode (matrix) product of tensor 𝒴̂2 by matrix 𝑀(3) is split into three 3-mode block-striped (matrix) products, resulting in (𝑌̂3(3))1 ∈ ℝ2×12, (𝑌̂3(3))2 ∈ ℝ2×12, and (𝑌̂3(3))3 ∈ ℝ1×12. The full reconstructed tensor is derived by combining the mode-3 de-matricizations of the three resulting 3-mode block-striped (matrix) products, i.e. 𝒴̂3 ← (𝒴̂3)1 + (𝒴̂3)2 + (𝒴̂3)3, where 𝒴̂ ← 𝒴̂3 and 𝒴̂ ∈ ℝ3×4×5.


[Figure 4.8 diagram: the sequence of 1-mode, 2-mode, and 3-mode (matrix) products is shown as EQUIVALENT to the MEMORY EFFICIENT APPROACH, in which the 3-mode product is computed with matricization as 3-mode block-striped (matrix) products; COMBINING RESULTS yields the full reconstructed tensor (User 1 to User 3 slices over the item and tag dimensions)]

Figure 4.8. Example of tensor reconstruction process by implementing the memory efficient approach

where 𝒴 ∈ ℝ3×4×5, 𝑄 = 3, 𝑅 = 4, 𝑆 = 5, 𝐹 = 2, and 𝑏 = 2


The reconstructed tensor 𝒴̂ identifies the new entries that are inferred from the latent factors. Note that, compared to the initial tensor in Figure 4.2, the relevance grade 𝑦𝑢,𝑖,𝑡 in 𝒴 has been recalculated as the predicted preference score ŷ𝑢,𝑖,𝑡 in 𝒴̂, as illustrated in Figure 4.9. The predicted preference score ŷ𝑢,𝑖,𝑡 represents the likelihood of user 𝑢 tagging item 𝑖 with tag 𝑡. As the system aims to recommend items that have not yet been selected by the users, the list of item recommendations for each user is selected based on the (𝑢, 𝑖) ∈ 𝐴\𝐴𝑜𝑏 sets.

[Figure 4.9: three user slices (User 1, User 2, User 3) of the reconstructed tensor, each a 4 × 5 matrix over the item and tag dimensions with real-valued predicted preference scores, both positive and negative]

Figure 4.9. Example of the reconstructed tensor 𝒴̂ ∈ ℝ3×4×5

4.2.4.2 Candidate Item and Tag Preference Sets Generation

As previously described, TRPR ranks and generates the list of recommendations based on the probability score of a user 𝑢 selecting items, taking into account the user's tag usage history. For this reason, two sets are created for each user 𝑢, i.e. the candidate item set and the tag preference set. The candidate item set, 𝑍𝑢 = {𝑖1, 𝑖2, 𝑖3, … , 𝑖𝑅}, where 𝑍𝑢 ⊆ 𝐼, |𝑍𝑢| ≤ 𝑅, and 𝑅 is the size of the set of items, is a list of items that user 𝑢 might be interested in, based on the (𝑢, 𝑖) ∈ 𝐴\𝐴𝑜𝑏 sets. Note that a user-defined threshold on the entries in 𝒴̂ is necessary to determine whether items can be considered for recommendation (Kutty et al., 2012). The tag preference set, 𝑋𝑢 = {𝑡1, 𝑡2, 𝑡3, … , 𝑡𝑣}, where 𝑋𝑢 ⊆ 𝑇, |𝑋𝑢| ≤ 𝑣, and 𝑣 ≤ 𝑆 is the size of the tag preference set (with 𝑆 the size of the set of tags), is a list of tags that user 𝑢 has used to annotate items. The tag preference set is generated based on the maximum values of ŷ𝑢,𝑖,𝑡 on each (𝑢, 𝑖) ∈ 𝐴 set.
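The two sets can be sketched as follows, with zero-based indices. Both functions are illustrative: the value of `threshold` is a hypothetical user-defined parameter (the text requires a threshold but does not fix its value), and scoring each tag by its maximum ŷ𝑢,𝑖,𝑡 over items and keeping the top 𝑣 tags is only one possible reading of how the tag preference set is derived from the maximum values of ŷ𝑢,𝑖,𝑡.

```python
import numpy as np

def candidate_items(Y, Y_hat, u, threshold):
    """Z_u: items with no observed (u, i, *) entry whose best predicted score exceeds a threshold."""
    unobserved = ~Y[u].any(axis=1)
    promising = Y_hat[u].max(axis=1) > threshold
    return [int(i) for i in np.flatnonzero(unobserved & promising)]

def tag_preferences(Y_hat, u, v):
    """X_u: the v tags with the highest max_i Y_hat[u, i, t] (assumed reading of the text)."""
    tag_scores = Y_hat[u].max(axis=0)         # best predicted score of each tag for user u
    return [int(t) for t in np.argsort(-tag_scores, kind="stable")[:v]]
```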


4.2.4.3 Top-𝑁 Item Recommendation Generation via Probabilistic Ranking

The probabilistic ranking approach calculates the probability of users selecting the items in 𝑍𝑢 by observing the previous usage of the tags in the tag preference set 𝑋𝑢 in 𝒴. Bayes' theorem is used to predict the candidate item in 𝑍𝑢 that has the highest posterior probability given 𝑋𝑢, 𝑝(𝑍𝑢|𝑋𝑢). The conditional probability can be formulated as (McCallum and Nigam, 1998):

$$p(Z_u \mid X_u) = \frac{p(Z_u)\, p(X_u \mid Z_u)}{p(X_u)} \tag{4.11}$$

where 𝑝(𝑍𝑢) is the prior distribution of 𝑍𝑢 before 𝑋𝑢 is observed, 𝑝(𝑋𝑢|𝑍𝑢) is the probability of observing the tag preference set 𝑋𝑢 given 𝑍𝑢, and 𝑝(𝑋𝑢) is the probability of observing 𝑋𝑢.

TRPR generates the Top-𝑁 list of recommendations for target user 𝑢 by adopting the multinomial event model for the Naïve Bayes classifier, i.e. assuming that an item 𝑖𝑟 ∈ 𝑍𝑢 is represented by the number of occurrences of each 𝑡𝑐 ∈ 𝑋𝑢 (McCallum and Nigam, 1998). In this case, the posterior probability 𝑝𝑢,𝑖𝑑 of user 𝑢 with tag preference set 𝑋𝑢 for candidate item 𝑖𝑑 ∈ 𝑍𝑢 is obtained by multiplying the prior probability of 𝑖𝑑, 𝑝(𝑍𝑢 = 𝑖𝑑), by the probabilities of the tag preferences 𝑡𝑐 ∈ 𝑋𝑢 given 𝑖𝑑, 𝑝(𝑡𝑐|𝑍𝑢 = 𝑖𝑑), each raised to the power of the user's usage count of 𝑡𝑐 plus one:

$$p_{u,i_d} := p(i_d \mid X_u) = p(Z_u = i_d) \prod_{c=1}^{|X_u|} p(t_c \mid Z_u = i_d)^{\left(\sum_{i=1}^{R} y_{u,i,t_c}\right) + 1} \tag{4.12}$$

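where 𝑦𝑢,𝑖,𝑡𝑐 denotes the binary relevance grade indicating whether user 𝑢 has used tag preference 𝑡𝑐 to annotate item 𝑖. The probabilities 𝑝(𝑍𝑢 = 𝑖𝑑) and 𝑝(𝑡𝑐|𝑍𝑢 = 𝑖𝑑) are calculated over all users 𝑢′ as:

$$p(Z_u = i_d) = \frac{\sum_{u'=1}^{Q} \mathbb{I}\big((u', i_d, *) \in A_{ob}\big)}{\sum_{i=1}^{R} \sum_{u'=1}^{Q} \mathbb{I}\big((u', i, *) \in A_{ob}\big)} \tag{4.13}$$

$$p(t_c \mid Z_u = i_d) = \frac{1 + \sum_{u'=1}^{Q} y_{u',i_d,t_c}}{S + \sum_{t=1}^{S} \sum_{u'=1}^{Q} y_{u',i_d,t}} \tag{4.14}$$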

where 𝕀(∙) is the indicator function, which equals 1 if the condition is satisfied and 0 otherwise. Recall that 𝑄, 𝑅, and 𝑆 are the numbers of users, items, and tags, respectively. To avoid zero values resulting from Equation (4.12) and Equation (4.14), a Laplacean estimate (Lops et al., 2011) is applied as a smoothing method by adding one to those equations.
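Equations (4.12) to (4.14) can be sketched in NumPy as below. This is a minimal illustration rather than the thesis implementation: it works in log space (an assumption made to avoid numerical underflow when many probabilities are multiplied; ranking by log-posterior gives the same order as ranking by posterior); it leaves the prior of Equation (4.13) unsmoothed, so an item that no user has tagged receives a log-posterior of −∞; indices are zero-based; and `log_posteriors` is a hypothetical name.

```python
import numpy as np

def log_posteriors(Y, u, candidates, tag_prefs):
    """Log of Eq. (4.12) for each candidate item, using Eqs. (4.13) and (4.14)."""
    Q, R, S = Y.shape
    tagged = Y.any(axis=2)                           # (Q, R): user u' tagged item i with any tag
    prior = tagged.sum(axis=0) / tagged.sum()        # Eq. (4.13), shape (R,)
    counts = Y.sum(axis=0)                           # (R, S): sum over u' of y[u', i, t]
    likelihood = (1 + counts) / (S + counts.sum(axis=1, keepdims=True))   # Eq. (4.14)
    exponents = Y[u].sum(axis=0)[tag_prefs] + 1      # (sum_i y[u, i, t_c]) + 1
    with np.errstate(divide="ignore"):               # an untagged item gives log(0) = -inf
        log_prior = np.log(prior)
    return {int(i): float(log_prior[i] + exponents @ np.log(likelihood[i, tag_prefs]))
            for i in candidates}
```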

For the target user 𝑢, the list of Top-𝑁 item recommendations is an ordered set of 𝑁 items, 𝑇𝑜𝑝𝑁𝑢, obtained by sorting the posterior probabilities 𝑝𝑢,𝑖𝑑 of the user's candidate items in descending order. Figure 4.10 describes the probabilistic ranking algorithm for generating the Top-𝑁 list of item recommendations.

Figure 4.10. The probabilistic ranking for Top-𝑁 item recommendation generation algorithm

Algorithm: Probabilistic Ranking for Top-𝑵 Item Recommendation Generation
Input: Initial tensor 𝒴, Reconstructed tensor 𝒴̂, Tag preference size 𝑣, Number of recommendations 𝑁
Output: The list of 𝑁 items, 𝑇𝑜𝑝𝑁𝑢
For each target user 𝑢:
1. Generate the candidate item set:
   𝑍𝑢 ← {𝑖 | (𝑢, 𝑖, ∗) ∈ 𝐴\𝐴𝑜𝑏} /* non-observed items */, written as 𝑍𝑢 = {𝑖1, 𝑖2, 𝑖3, …} with |𝑍𝑢| ≤ 𝑅
2. Generate the tag preference set:
   𝑋𝑢 = {𝑡1, 𝑡2, 𝑡3, … , 𝑡𝑣} such that |𝑋𝑢| ≤ 𝑣, where each tag preference is derived from max 𝑦̂𝑢,𝑖,𝑡 over (𝑢, 𝑖) ∈ 𝐴
3. Calculate the posterior probability of each item in 𝑍𝑢 and use these values to generate the Top-𝑁 item recommendations:
   𝐷 ← ∅, 𝐿𝑖𝑠𝑡𝑃 ← ∅
   /* initialize 𝐿𝑖𝑠𝑡𝑃 with the posterior values of the first 𝑁 items of 𝑍𝑢 */
   for 𝑑 ← 1 to min(𝑁, |𝑍𝑢|)
     𝑝𝑢,𝑖𝑑 ← 𝑝(𝑖𝑑|𝑋𝑢)
     𝐿𝑖𝑠𝑡𝑃 ← 𝐿𝑖𝑠𝑡𝑃 ∪ {𝑝𝑢,𝑖𝑑}
     𝐷 ← 𝐷 ∪ {𝑑}
   end for
   /* update 𝐿𝑖𝑠𝑡𝑃 with the remaining items */
   for 𝑑 ← (𝑁 + 1) to |𝑍𝑢|
     𝑝𝑢,𝑖𝑑 ← 𝑝(𝑖𝑑|𝑋𝑢)
     if 𝑝𝑢,𝑖𝑑 > (min 𝐿𝑖𝑠𝑡𝑃) then
       /* 𝑑𝑚𝑖𝑛 is the index in 𝐷 whose posterior value is min 𝐿𝑖𝑠𝑡𝑃 */
       𝐿𝑖𝑠𝑡𝑃 ← 𝐿𝑖𝑠𝑡𝑃 − {min 𝐿𝑖𝑠𝑡𝑃}
       𝐷 ← 𝐷 − {𝑑𝑚𝑖𝑛}
       𝐿𝑖𝑠𝑡𝑃 ← 𝐿𝑖𝑠𝑡𝑃 ∪ {𝑝𝑢,𝑖𝑑}
       𝐷 ← 𝐷 ∪ {𝑑}
     end if
   end for
   𝑇𝑜𝑝𝑁𝑢 ← {𝑖𝑑 ∈ 𝑍𝑢 | 𝑑 ∈ 𝐷}, ordered by 𝑝𝑢,𝑖𝑑 in descending order
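Step 3 keeps only the 𝑁 best posterior values seen so far and evicts the current minimum whenever a better item arrives. A minimal Python sketch of that selection is shown below; it assumes the posterior scores come from a caller-supplied function (here a dictionary lookup) and uses the standard-library `heapq` min-heap in place of the 𝐿𝑖𝑠𝑡𝑃/𝐷 pair. The function name `top_n` is hypothetical.

```python
import heapq


def top_n(candidates, score, n):
    """Step 3 of Figure 4.10 with a size-n min-heap standing in for ListP and D.

    heap[0] always holds the current minimum of ListP; a new candidate replaces it
    only when its posterior is strictly larger, as in the update loop."""
    if n <= 0:
        return []
    heap = []  # entries are (posterior, position d in Z_u, item)
    for d, item in enumerate(candidates):
        p = score(item)                              # p_{u,i_d} = p(i_d | X_u)
        if len(heap) < n:
            heapq.heappush(heap, (p, d, item))       # first N candidates
        elif p > heap[0][0]:
            heapq.heapreplace(heap, (p, d, item))    # drop min ListP, add p_{u,i_d}
    ranked = sorted(heap, key=lambda e: (-e[0], e[1]))   # descending posterior
    return [item for _, _, item in ranked]


posteriors_u2 = {"i2": 7.93e-05, "i4": 3.97e-05}     # u2's scores from Example 4.3
print(top_n(["i2", "i4"], posteriors_u2.get, 2))     # ['i2', 'i4']
print(top_n(["i2", "i4"], posteriors_u2.get, 1))     # ['i2']
```

The heap keeps the memory cost of step 3 at O(𝑁) per user and its time at O(|𝑍𝑢| log 𝑁). Ties are broken by candidate order, which is an assumption since Figure 4.10 does not specify tie-breaking.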


Example 4.3: Probabilistic Ranking.

The following example uses the toy dataset to illustrate how the posterior probability scores are calculated. First, the entries of the initial tensor 𝒴 in Figure 4.2 and of the reconstructed tensor 𝒴̂ in Figure 4.9 are represented as tables in Figure 4.11(a) and (b), respectively. Negative and zero entries are omitted, leaving 38 of the 60 entries of 𝒴̂. Figure 4.11(b) shows that the tensor reconstruction process has turned the binary values 𝑦𝑢,𝑖,𝑡 of 𝒴 into the continuous values 𝑦̂𝑢,𝑖,𝑡 of 𝒴̂.

(a) (b)

Figure 4.11. Example of tensor model from toy dataset with only non-negative and non-zero values displayed as table: (a) Initial tensor 𝒴 ∈ ℝ3×4×5, and (b) Reconstructed tensor 𝒴̂ ∈ ℝ3×4×5


Since the system aims to recommend items, the process first identifies the items that each target user has not yet selected, i.e. the (𝑢, 𝑖) ∈ 𝐴\𝐴𝑜𝑏 pairs. As highlighted in Figure 4.11(b), 𝒴̂ identifies six new (𝑢, 𝑖) pairs in total for 𝑢1, 𝑢2 and 𝑢3, i.e. {(𝑢1, 𝑖1), (𝑢1, 𝑖4), (𝑢2, 𝑖2), (𝑢2, 𝑖4), (𝑢3, 𝑖1), (𝑢3, 𝑖2)}. The candidate item set of each user, which needs to be ranked probabilistically, is therefore 𝑍𝑢1 = {𝑖1, 𝑖4}, 𝑍𝑢2 = {𝑖2, 𝑖4} and 𝑍𝑢3 = {𝑖1, 𝑖2}. By choosing 𝑣 = 3, the tag preference sets of the users are derived as 𝑋𝑢1 = {𝑡1, 𝑡3, 𝑡4}, 𝑋𝑢2 = {𝑡1, 𝑡3, 𝑡4} and 𝑋𝑢3 = {𝑡2, 𝑡4, 𝑡5}. Using Equation (4.12), the posterior probability scores of the candidate items for each user are calculated as:

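$$
\begin{aligned}
p_{u_1,i_1} &= \tfrac{1}{6}\left[\left(\tfrac{1+1}{5+1}\right)^{1+1}\left(\tfrac{1+0}{5+1}\right)^{1+1}\left(\tfrac{1+0}{5+1}\right)^{1+1}\right] = 1.43 \times 10^{-5}\\
p_{u_1,i_4} &= \tfrac{1}{6}\left[\left(\tfrac{1+0}{5+2}\right)^{1+1}\left(\tfrac{1+0}{5+2}\right)^{1+1}\left(\tfrac{1+1}{5+2}\right)^{1+1}\right] = 0.57 \times 10^{-5}\\
p_{u_2,i_2} &= \tfrac{1}{6}\left[\left(\tfrac{1+1}{5+2}\right)^{1+1}\left(\tfrac{1+1}{5+2}\right)^{0+1}\left(\tfrac{1+0}{5+2}\right)^{1+1}\right] = 7.93 \times 10^{-5}\\
p_{u_2,i_4} &= \tfrac{1}{6}\left[\left(\tfrac{1+0}{5+2}\right)^{1+1}\left(\tfrac{1+0}{5+2}\right)^{0+1}\left(\tfrac{1+1}{5+2}\right)^{1+1}\right] = 3.97 \times 10^{-5}\\
p_{u_3,i_1} &= \tfrac{1}{6}\left[\left(\tfrac{1+0}{5+1}\right)^{1+1}\left(\tfrac{1+0}{5+1}\right)^{1+1}\left(\tfrac{1+0}{5+1}\right)^{1+1}\right] = 0.36 \times 10^{-5}\\
p_{u_3,i_2} &= \tfrac{1}{6}\left[\left(\tfrac{1+0}{5+2}\right)^{1+1}\left(\tfrac{1+0}{5+2}\right)^{1+1}\left(\tfrac{1+0}{5+2}\right)^{1+1}\right] = 0.14 \times 10^{-5}
\end{aligned}
$$

Given these posterior probability scores, the list of Top-𝑁 item recommendations for each user can now be generated. Item 𝑖1 is more likely to interest 𝑢1 than 𝑖4, since 𝑝𝑢1,𝑖1: 1.43𝑒−05 > 𝑝𝑢1,𝑖4: 0.57𝑒−05; 𝑖2 is more likely to interest 𝑢2 than 𝑖4, since 𝑝𝑢2,𝑖2: 7.93𝑒−05 > 𝑝𝑢2,𝑖4: 3.97𝑒−05; and 𝑖1 is more likely to interest 𝑢3 than 𝑖2, since 𝑝𝑢3,𝑖1: 0.36𝑒−05 > 𝑝𝑢3,𝑖2: 0.14𝑒−05 (item 𝑖2 carries two tag assignments in the toy data, so its denominator in Equation (4.14) is 5 + 2).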

As a result, 𝑇𝑜𝑝𝑁𝑢1, 𝑇𝑜𝑝𝑁𝑢2 and 𝑇𝑜𝑝𝑁𝑢3 are generated in the order {𝑖1, 𝑖4}, {𝑖2, 𝑖4} and {𝑖1, 𝑖2}, respectively. These results differ from those of the conventional tensor-based approaches (Kutty et al., 2012; Nanopoulos, 2011; Rafailidis and Daras, 2013; Symeonidis et al., 2010), which generate 𝑇𝑜𝑝𝑁𝑢1, 𝑇𝑜𝑝𝑁𝑢2 and 𝑇𝑜𝑝𝑁𝑢3 in the order {𝑖4, 𝑖1}, {𝑖4, 𝑖2} and {𝑖2, 𝑖1}, respectively. The conventional approaches produce these orders because they rank the candidate items by their maximum reconstructed value: 𝑚𝑎𝑥(𝑦̂𝑢1,𝑖1): 0.1359 < 𝑚𝑎𝑥(𝑦̂𝑢1,𝑖4): 0.1928, 𝑚𝑎𝑥(𝑦̂𝑢2,𝑖2): 0.1609 < 𝑚𝑎𝑥(𝑦̂𝑢2,𝑖4): 0.2476, and 𝑚𝑎𝑥(𝑦̂𝑢3,𝑖1): 0.1150 < 𝑚𝑎𝑥(𝑦̂𝑢3,𝑖2): 0.1163.

4.2.5 Empirical Evaluation

The proposed point-wise tensor-based TRPR method is evaluated and compared to

another point-wise tensor-based method, MAX (Symeonidis et al., 2010), and the

matrix-based method CTS (Kim et al., 2010). (Rendle and Schmidt-Thieme, 2010).

Each of the TRPR and MAX methods is run with three commonly used tensor factorization techniques, i.e. CP, HOOI and HOSVD (Kolda and Bader, 2009), using the Matlab Tensor Toolbox (Bader et al., 2012). The resulting variations are denoted TRPR-CP, TRPR-HOOI, TRPR-HOSVD, MAX-CP, MAX-HOOI and MAX-HOSVD.

The experiments use 5-fold cross-validation. For each fold, each dataset is randomly divided into a training set 𝐷𝑡𝑟𝑎𝑖𝑛 (80%) and a test set 𝐷𝑡𝑒𝑠𝑡 (20%) based on the posts. 𝐷𝑡𝑟𝑎𝑖𝑛 and 𝐷𝑡𝑒𝑠𝑡 do not overlap in posts, i.e. if a triplet (𝑢, 𝑖, ∗) is present in 𝐷𝑡𝑒𝑠𝑡, no triplet of the same user-item pair exists in 𝐷𝑡𝑟𝑎𝑖𝑛. The recommendation task is to predict and rank the Top-𝑁 items for the users present in 𝐷𝑡𝑒𝑠𝑡. Performance is measured using the F1-Score and reported as the average over all five runs.
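The post-based split and the per-user F1-Score can be sketched in Python as follows. The function names, the round-robin assignment of shuffled posts to folds, and the definition of precision@𝑁 as hits divided by 𝑁 are assumptions made for illustration; the section does not spell out these details.

```python
import random


def post_folds(triples, k=5, seed=0):
    """Yield (train, test) splits in which whole posts (u, i) are assigned to folds,
    so no (u, i, *) triple of a test post ever appears in the training set."""
    posts = sorted({(u, i) for u, i, _ in triples})
    random.Random(seed).shuffle(posts)
    fold_of = {post: idx % k for idx, post in enumerate(posts)}   # ~1/k of posts per fold
    for f in range(k):
        train = [x for x in triples if fold_of[x[:2]] != f]
        test = [x for x in triples if fold_of[x[:2]] == f]
        yield train, test


def f1_at_n(recommended, relevant, n):
    """F1@N for one user: harmonic mean of precision@N (hits / N) and recall@N."""
    hits = len(set(recommended[:n]) & set(relevant))
    if hits == 0:
        return 0.0
    precision, recall = hits / n, hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


print(f1_at_n(["i1", "i4", "i2"], ["i1", "i3"], 2))   # precision 0.5, recall 0.5 -> 0.5
```

With 𝑘 = 5, each fold holds out roughly 20% of the posts, which matches the 80%/20% split described above.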

4.2.5.1 Choosing the Latent Factor Matrix Size 𝐹

The impact of the latent factor matrix size, 𝐹, on performance is investigated in order to choose the value of 𝐹 used in the experiments. Figure 4.12 compares the performance of TRPR-CP on Delicious 10-core as 𝐹 increases from 8 to 256. The recommendation quality does not improve for 𝐹 larger than 128; therefore, 𝐹 is set to 128 for all the tensor-based methods.


Figure 4.12. Performance comparison of TRPR-CP with an increasing number of 𝐹

4.2.5.2 Accuracy Performance

The recommendation accuracy of the proposed TRPR method and of the benchmarking methods is compared using the F1-Score at various Top-𝑁 positions. Figure 4.13, Figure 4.14, Figure 4.15 and Figure 4.16 show that TRPR outperforms the matrix-based method CTS and the conventional tensor-based method MAX on the Delicious, LastFM, CiteULike and MovieLens datasets, respectively.

(a) Delicious 10-core  (b) Delicious 15-core  (c) Delicious 20-core
[Each panel plots Score (%) against Top-N (F1@5, F1@10, F1@15, F1@20) for MAX-CP, TRPR-CP, MAX-HOSVD, TRPR-HOSVD, MAX-HOOI, TRPR-HOOI and CTS]

Figure 4.13. F1-Score at various Top-𝑁 positions on Delicious dataset


(a) LastFM 10-core  (b) LastFM 15-core  (c) LastFM 20-core
[Each panel plots Score (%) against Top-N (F1@5, F1@10, F1@15, F1@20) for MAX-CP, TRPR-CP, MAX-HOSVD, TRPR-HOSVD, MAX-HOOI, TRPR-HOOI and CTS]

Figure 4.14. F1-Score at various Top-𝑁 positions on LastFM dataset

(a) CiteULike 10-core  (b) CiteULike 15-core  (c) CiteULike 20-core
[Each panel plots Score (%) against Top-N (F1@5, F1@10, F1@15, F1@20) for MAX-CP, TRPR-CP, MAX-HOSVD, TRPR-HOSVD, MAX-HOOI, TRPR-HOOI and CTS]

Figure 4.15. F1-Score at various Top-𝑁 positions on CiteULike dataset

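(a) MovieLens 10-core  (b) MovieLens 15-core  (c) MovieLens 20-core
[Each panel plots Score (%) against Top-N (F1@5, F1@10, F1@15, F1@20) for MAX-CP, TRPR-CP, MAX-HOSVD, TRPR-HOSVD, MAX-HOOI, TRPR-HOOI and CTS]

Figure 4.16. F1-Score at various Top-𝑁 positions on MovieLens dataset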

Table 4.1 lists the average improvement in recommendation accuracy that TRPR achieves over the MAX method with each factorization technique. Each percentage is the average improvement over the Top-5, Top-10, Top-15 and Top-20 results. The results show that probabilistically ranking the candidate items generated from the reconstructed tensor, using the user’s past tagging activities, substantially improves recommendation accuracy.
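The averaging behind Table 4.1 can be sketched as follows, assuming that the improvement at each cut-off is the relative gain in F1-Score of TRPR over MAX (the exact formula is not stated in this section). The F1 values in the usage lines are hypothetical and are not taken from the experiments.

```python
def average_improvement(trpr_f1, max_f1):
    """Mean relative improvement (%) of TRPR over MAX across Top-N cut-offs.
    Both arguments map a cut-off N (e.g. 5, 10, 15, 20) to an F1-Score."""
    gains = [100.0 * (trpr_f1[n] - max_f1[n]) / max_f1[n] for n in trpr_f1]
    return sum(gains) / len(gains)


# Hypothetical F1-Scores (%), for illustration only
trpr = {5: 2.6, 10: 2.4, 15: 2.2, 20: 2.0}
base = {5: 2.0, 10: 2.0, 15: 2.0, 20: 2.0}
print(f"{average_improvement(trpr, base):.2f}%")   # gains of 30, 20, 10 and 0 -> 15.00%
```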

Regarding the robustness of TRPR across factorization techniques, TRPR achieves larger improvements with CP and HOOI than with HOSVD. HOSVD optimizes each mode of tensor 𝒴 separately and disregards the interactions among the modes (Kolda and Bader, 2009). Therefore, the candidate item and tag preference sets generated from its reconstructed tensor 𝒴̂ reveal user interest less well than those of CP and HOOI, which take all lateral interactions into consideration in the optimization process.



                      Factorization technique
Dataset     𝑝-core    CP        HOSVD     HOOI
Delicious   10        31.14%    15.23%    24.77%
Delicious   15        43.41%    11.96%    45.35%
Delicious   20        25.39%     1.28%    24.31%
LastFM      10        25.83%     0.22%    25.35%
LastFM      15        30.60%     2.92%    25.69%
LastFM      20        27.16%    13.17%    26.92%
CiteULike   10        37.52%    10.99%    35.05%
CiteULike   15        28.31%     7.51%    30.12%
CiteULike   20        29.54%     3.43%    32.06%
MovieLens   10        23.53%     2.56%    22.49%
MovieLens   15        38.39%    11.00%    28.24%
MovieLens   20        23.30%    13.86%    26.13%

Table 4.1. Average TRPR accuracy improvement over MAX

Table 4.1 also shows that the 𝑝-core size affects the improvement in recommendation accuracy. On the Delicious and CiteULike datasets, the improvement generally decreases as the 𝑝-core size grows, most consistently with HOSVD (from 15.23% to 1.28% on Delicious and from 10.99% to 3.43% on CiteULike). Conversely, on the LastFM and MovieLens datasets the improvement tends to increase with the 𝑝-core size, again most clearly with HOSVD (from 0.22% to 13.17% on LastFM and from 2.56% to 13.86% on MovieLens). This difference can be attributed to the characteristics of the datasets listed in Table 3.2: on the Delicious and CiteULike datasets the number of users is always greater than the number of items, whereas on the LastFM and MovieLens datasets the number of items is always greater than the number of users.

4.2.5.3 Impact of Tag Preference Set Size

The impact of the tag preference set size on the performance of TRPR is investigated by measuring the F1-Score for values of 𝑣 from 10 to 100. Results are shown for TRPR with the CP factorization technique only, as the other techniques show similar results.

Figure 4.17, Figure 4.18, Figure 4.19 and Figure 4.20 display the impact of the tag preference set size on the Delicious, LastFM, CiteULike and MovieLens datasets, respectively. TRPR achieves its best F1-Score at small 𝑣 values (at most 𝑣 = 40), while larger 𝑣 values result in inferior performance, as the tag preference set becomes too general and may no longer reflect the user’s actual preferences.

(a) Delicious 10-core  (b) Delicious 15-core  (c) Delicious 20-core

Figure 4.17. Impact of tag preference set size on Delicious dataset

(a) LastFM 10-core  (b) LastFM 15-core  (c) LastFM 20-core

Figure 4.18. Impact of tag preference set size on LastFM dataset

(a) CiteULike 10-core  (b) CiteULike 15-core  (c) CiteULike 20-core

Figure 4.19. Impact of tag preference set size on CiteULike dataset

(a) MovieLens 10-core  (b) MovieLens 15-core  (c) MovieLens 20-core

Figure 4.20. Impact of tag preference set size on MovieLens dataset

[Each panel of Figures 4.17 to 4.20 plots F1-Score@10 (%) against the tag preference size 𝑣 = 10, 20, …, 100]


4.2.5.4 Scalability

The scalability of TRPR, in comparison to the MAX method (Symeonidis et al., 2010), is examined on the full tensor reconstruction process in terms of space consumption and CPU runtime. The examination uses the largest dataset in this thesis, i.e. the Delicious dataset. Space consumption and CPU runtime are measured at 𝑝-core sizes of 10, 15, 20, 50, 80 and 100, which yields six tensor models with 𝑢𝑠𝑒𝑟 × 𝑖𝑡𝑒𝑚 × 𝑡𝑎𝑔 dimensionalities of 2,009 × 1,485 × 2,589; 1,609 × 719 × 1,761; 1,359 × 424 × 1,321; 665 × 52 × 422; 362 × 13 × 189; and 250 × 7 × 125, respectively. As expected, the larger the 𝑝-core, the smaller the tensor.

(a) Space Consumption  (b) CPU Runtime
[Log-scale plots of space (kb) and CPU runtime (sec) against tensor dimensionality (250 × 7 × 125, 362 × 13 × 189, 665 × 52 × 422, 1,359 × 424 × 1,321, 1,609 × 719 × 1,761, 2,009 × 1,485 × 2,589) for MAX-CP, TRPR-CP, MAX-HOOI, TRPR-HOOI, MAX-HOSVD and TRPR-HOSVD]

Figure 4.21. Scalability comparison by varying tensor dimensionality on Delicious dataset


Figure 4.21 shows the scalability of the TRPR and MAX methods on a single processor. TRPR runs on all tensor dimensionalities, whereas MAX fails on the two largest tensors (2,009 × 1,485 × 2,589 and 1,609 × 719 × 1,761) due to memory overflow. These results show that TRPR scales to large tensors with all three factorization techniques, with nearly constant space consumption and computation time that grows linearly with the tensor dimensionality. Note that, for the purpose of accuracy benchmarking, the 𝑛-mode block-striped (matrix) product was also applied to the MAX method so that it could run on all datasets used in the experiments.
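The scalability result rests on never materialising the whole reconstructed tensor at once. The NumPy sketch below illustrates only that general idea for CP factors: it yields one 𝑅 × 𝑆 user slice at a time, so peak memory is one slice rather than the full 𝑄 × 𝑅 × 𝑆 tensor. It is a hypothetical illustration, not the thesis’s 𝑛-mode block-striped implementation, and it does not cover the Tucker-based HOSVD and HOOI variants; the function names are made up for this example.

```python
import numpy as np


def user_slices(m1, m2, m3):
    """Yield (u, R x S slice of Y_hat) for each user from CP factor matrices
    m1 (Q x F), m2 (R x F), m3 (S x F), without building the full tensor."""
    for u in range(m1.shape[0]):
        # slice[i, t] = sum_f m1[u, f] * m2[i, f] * m3[t, f]
        yield u, (m2 * m1[u]) @ m3.T


def best_item_per_user(m1, m2, m3):
    """Example consumer: item index with the largest reconstructed value per user."""
    return [int(np.argmax(block.max(axis=1))) for _, block in user_slices(m1, m2, m3)]


rng = np.random.default_rng(0)
Q, R, S, F = 4, 5, 6, 3
m1, m2, m3 = rng.random((Q, F)), rng.random((R, F)), rng.random((S, F))
print(best_item_per_user(m1, m2, m3))
```

Each slice is consumed and discarded before the next one is computed, which is what keeps the memory footprint independent of 𝑄.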

4.2.6 Summary of Probabilistic Ranking

In this section, the Tensor-based Item Recommendation using Probabilistic Ranking method (TRPR) is proposed to address the scalability and accuracy challenges of using tensor models in a tag-based item recommendation system. The method uses a memory-efficient loop technique to enable scalable tensor reconstruction, and probabilistic ranking to improve the recommendation accuracy of the candidate items generated from the reconstructed tensor. TRPR builds on the simple but effective concept of block-striped parallel matrix multiplication to enable scalable tensor reconstruction, and advances the concept of probabilistic ranking to achieve higher recommendation accuracy.

The experimental results on various real-world datasets have demonstrated that:

• The implementation of an 𝑛-mode block-striped (matrix) product makes full tensor reconstruction scalable for large datasets;

• The proposed TRPR, across 𝑝-core sizes and factorization techniques, outperforms the benchmarking methods in terms of accuracy. This confirms that recommendation accuracy can be improved by probabilistically ranking the candidate items, generated from the full reconstructed tensor, using the user’s past tagging activities.


4.3 WE-RANK: WEIGHTED TENSOR APPROACH FOR RANKING

4.3.1 Overview

The We-Rank method focuses on dealing with the sparsity problem and improving recommendation accuracy during the learning-to-rank procedure. Figure 4.22 gives an overview of recommendation ranking with the weighted tensor approach, called Recommendation Ranking using Weighted Tensor (We-Rank).

[Diagram components: Tagging Data (Users 1 to 3, Items 1 to 4, Tags 1 to 5); User Profile Construction with the boolean scheme, 𝒴 ∈ ℝ𝑄×𝑅×𝑆; User Tag Usage Likeliness Generation, 𝐿 ∈ ℝ𝑄×𝑆; Positive and Negative Tag Preference Sets Generation, 𝐿𝑢+, 𝐿𝑢−; Weighted Tensor Construction, 𝒲 ∈ ℝ𝑄×𝑅×𝑆; Latent Factors Generation, 𝑀(1), 𝑀(2), 𝑀(3); Top-N Item Recommendation Generation, 𝑇𝑜𝑝𝑁𝑢 for the target user]

Figure 4.22. Overview of the weighted tensor approach for ranking method (We-Rank)

As in TRPR, a third-order tensor representing the user profiles is first constructed from the tagging data using the boolean scheme. Unlike TRPR, however, We-Rank does not rely solely on the boolean user profiles to find the hidden relationships between users, i.e. it does not optimize the tensor model with a plain least square loss function. Instead, We-Rank learns the model with a weighted tensor approach, which plays an important role in learning because it controls and differentiates the reward and the penalty for the observed and non-observed tagging data entries of the primary tensor model, respectively. The primary tensor model, now weighted with non-boolean values, becomes the underlying learning-to-rank model for recommendation. This weighted approach tackles the data sparsity problem by ignoring the missing data and modelling only the known entries (Acar et al., 2011), so the learning model can be optimized with a weighted least square loss function.

Based on the boolean values of the initial tensor, We-Rank first calculates each user’s tag usage likeliness for every tag. The likeliness scores are then sorted in descending order so that tags with high and low likeliness scores can be distinguished for each user: the tags with high likeliness scores form the positive tag preference set, whereas those with low scores form the negative tag preference set.

Given the observed tagging data entries in the primary tensor model, a multi-dimensional data structure is needed to store the values derived from the positive and negative tag preference sets. For this reason, another tensor, called the weighted tensor in this thesis, is constructed. The entries of the weighted tensor are a bijective mapping to the entries of the primary tensor model that represents the user profiles. This mapping ensures that each observed entry of the primary tensor is rewarded, such that the corresponding entry of the weighted tensor holds a higher positive value than those of the non-observed entries. Note that the values in the weighted tensor do not change during the learning process. The weighted tensor construction is detailed in Section 4.3.3.2, and an example is shown in Figure 4.29. Unlike TRPR, which needs a subsequent step to rank the recommended items correctly after the reconstruction process, the latent factors learned by We-Rank can be used directly to generate the Top-𝑁 list of item recommendations.

The next three sub-sections detail the three main processes in We-Rank: (1)

user profile construction, i.e. tensor model construction; (2) learning-to-rank

procedure, i.e. latent factors generation via tensor factorization to derive the

relationships inherent in the model; and (3) recommendation generation. The last two

sub-sections present empirical evaluation and summary of the method.


User profile construction is the process of building the initial tensor that models the multi-dimensional data. As in TRPR, the We-Rank method uses a boolean scheme (Symeonidis et al., 2010) to build a primary third-order tensor model that represents both the user profile and the ranking learning model. The boolean scheme simply interprets the tagging data as binary data with two types of entries: “relevant” and “irrelevant”. The “relevant” entries, labelled as “1”, are the observed entries, where the user has explicitly revealed interest by annotating an item with tags, while the “irrelevant” entries, labelled as “0”, are the remaining (non-observed) entries.

Once again, let 𝑈 = {𝑢1, 𝑢2, 𝑢3, … , 𝑢𝑄} be the set of 𝑄 users, 𝐼 = {𝑖1, 𝑖2, 𝑖3, … , 𝑖𝑅} be the set of 𝑅 items, and 𝑇 = {𝑡1, 𝑡2, 𝑡3, … , 𝑡𝑆} be the set of 𝑆 tags. In the tagging data 𝐴 ∶ 𝑈 × 𝐼 × 𝑇, a triple 𝑎: (𝑢, 𝑖, 𝑡) ∈ 𝐴 represents the activity of user 𝑢 annotating item 𝑖 with tag 𝑡. The observed tagging data, 𝐴𝑜𝑏 ⊆ 𝐴, are the triples in which users have expressed their interest in items in the past by annotating those items with tags. Observed tagging data are usually very sparse, i.e. |𝐴𝑜𝑏| ≪ |𝐴|. The initial tensor 𝒴 ∈ ℝ𝑄×𝑅×𝑆 is constructed such that each entry 𝑦𝑢,𝑖,𝑡 is given a binary value that represents the relevance grade of the tagging activity. The boolean scheme rules for labelling the relevance grades of the entries of 𝒴 are formulated in Equation (4.1). A sample primary tensor model 𝒴 ∈ ℝ3×4×5 populated using the boolean interpretation scheme is illustrated in Figure 4.2.


This section details how the latent factor corresponding to each dimension of tensor

𝒴 is inferred.


In order to emphasise and penalise the observed and non-observed tagging data

respectively, the weighted Mean Square Error (wMSE) is used as the optimization

criterion for solving a regression/classification task. The wMSE of all users over all

items under all tags can be defined as (Acar et al., 2011):

$$wMSE := \sum_{u=1}^{Q} \sum_{i=1}^{R} \sum_{t=1}^{S} \left[ w_{u,i,t} \left( y_{u,i,t} - \hat{y}_{u,i,t} \right) \right]^{2} \tag{4.15}$$

where 𝑦𝑢,𝑖,𝑡 is the relevance grade, one of the elements of the binary relevance set {0, 1}, taken from the primary tensor model representing the user profiles; 𝑦̂𝑢,𝑖,𝑡 is the predicted preference score, calculated from the latent factors, that reflects the preference level of user 𝑢 for annotating item 𝑖 using tag 𝑡; and 𝑤𝑢,𝑖,𝑡 is the weighted reward/penalty value for 𝑦𝑢,𝑖,𝑡, taken from the weighted tensor explained later in Section 4.3.3.2.
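Equation (4.15) translates directly into array code. The NumPy sketch below assumes that 𝒴, 𝒴̂ and 𝒲 are dense arrays of the same shape, a simplification for illustration (real datasets would call for a sparse representation); the function name `wmse` is hypothetical.

```python
import numpy as np


def wmse(y, y_hat, w):
    """Weighted criterion of Equation (4.15): sum over all (u, i, t) of [w * (y - y_hat)]^2."""
    y, y_hat, w = (np.asarray(a, dtype=float) for a in (y, y_hat, w))
    return float(np.sum((w * (y - y_hat)) ** 2))


y = np.array([[[1, 0, 0]]])              # one user, one item, three tags
y_hat = np.array([[[0.5, 0.5, 0.2]]])
w = np.array([[[2, -1, 0]]])             # observed, negative-preference, unweighted
print(wmse(y, y_hat, w))                 # (2*0.5)^2 + (-1*-0.5)^2 + 0 = 1.25
```

Because the weight sits inside the square in this form of the criterion, a weight of −1 contributes exactly as much as a weight of +1; an entry with weight 0 contributes nothing.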

From the primary tensor 𝒴, latent factors are derived to infer the latent relationships between the user, item and tag dimensions. The CP model (Kolda and Bader, 2009) is used both as the factorization technique and as the predictor function model. CP is a well-known factorization technique that has been shown to be less expensive in both memory and time than another well-known algorithm, Tucker (Kolda and Bader, 2009).

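[Figure 4.23: CP factorization of the third-order tensor 𝒴 (𝑄 × 𝑅 × 𝑆) into 𝐹 rank-one components]

$$\mathcal{Y} \approx m^{(1)}_{1} \circ m^{(2)}_{1} \circ m^{(3)}_{1} + m^{(1)}_{2} \circ m^{(2)}_{2} \circ m^{(3)}_{2} + \dots + m^{(1)}_{F} \circ m^{(2)}_{F} \circ m^{(3)}_{F}$$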


As illustrated in Figure 4.23, CP factorizes a third-order tensor 𝒴 ∈ ℝ𝑄×𝑅×𝑆 into a sum of 𝐹 rank-one tensors, each formed from the latent factor vectors 𝑚𝑓(1) ∈ ℝ𝑄, 𝑚𝑓(2) ∈ ℝ𝑅 and 𝑚𝑓(3) ∈ ℝ𝑆 for 𝑓 = 1, … , 𝐹, where 𝐹 is the number of columns of the corresponding latent factor matrices.

These latent factors are used to calculate the predicted score that reflects the preference level of a user 𝑢 for annotating an item 𝑖 using a tag 𝑡. The predicted preference score is calculated as:

$$\hat{y}_{u,i,t} := \sum_{f=1}^{F} m^{(1)}_{u,f}\, m^{(2)}_{i,f}\, m^{(3)}_{t,f}, \qquad \hat{\mathcal{Y}} = [\![ M^{(1)}, M^{(2)}, M^{(3)} ]\!] \tag{4.16}$$
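Equation (4.16) can be evaluated either for a single entry or for the whole tensor. A minimal NumPy sketch, assuming the factor matrices 𝑀(1) ∈ ℝ𝑄×𝐹, 𝑀(2) ∈ ℝ𝑅×𝐹 and 𝑀(3) ∈ ℝ𝑆×𝐹 are given as arrays named `m1`, `m2` and `m3` (hypothetical names):

```python
import numpy as np


def cp_predict(m1, m2, m3, u, i, t):
    """Single predicted score y_hat_{u,i,t} = sum_f m1[u,f] * m2[i,f] * m3[t,f]."""
    return float(np.sum(m1[u] * m2[i] * m3[t]))


def cp_full(m1, m2, m3):
    """Full Q x R x S reconstruction [[M1, M2, M3]] from the three factor matrices."""
    return np.einsum("uf,if,tf->uit", m1, m2, m3)


rng = np.random.default_rng(42)
m1, m2, m3 = rng.random((3, 2)), rng.random((4, 2)), rng.random((5, 2))   # Q=3, R=4, S=5, F=2
y_hat = cp_full(m1, m2, m3)
print(y_hat.shape, np.isclose(y_hat[0, 1, 2], cp_predict(m1, m2, m3, 0, 1, 2)))
```

The single-entry form is what a learning procedure needs per training entry, while the full form is only practical for small tensors.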

4.3.3.2 Weighted Tensor

This section details how the weighted tensor 𝒲 is constructed. The weighted tensor

𝒲 is a main component in the We-Rank method as it regulates the importance of the

observed and non-observed entries in the learning model. The entries of 𝒲 are

generated based on the tag usage likeliness of each user.


The User Tag Usage Likeliness Generation

We-Rank assumes that there are two characteristics that can determine the tag usage

likeliness of each user. Firstly, users use different choices of tags for annotating the

same item. From the toy example as shown in Figure 4.2, it can be observed that

each user has used different tags for annotating 𝑖3, i.e. 𝑢1, 𝑢2, and 𝑢3 use {𝑡4},

{𝑡2, 𝑡4}, and {𝑡2}, respectively. Secondly, the same tag can be used for annotating

different items. From Figure 4.2, it can be observed that 𝑡1 has been used by 𝑢1 and

𝑢2 for annotating {𝑖2} and {𝑖1} respectively.

This thesis attempts to capture these two characteristics by revealing the user

and tag latent features. Given the primary tensor 𝒴 ∈ ℝ𝑄×𝑅×𝑆 which represents the

user profile, the user and tag latent features can be revealed by applying the non-

negative matrix factorization technique (Cichocki et al., 2009; Kim et al., 2014) to

the mode-1 and mode-3 matricizations of tensor 𝒴, i.e. 𝑌(1) ∈ ℝ𝑄×𝑅𝑆 and 𝑌(3) ∈

ℝ𝑆×𝑄𝑅. The formulations can be presented as follows:

$$Y_{(1)} := A C' \tag{4.17}$$

$$Y_{(3)} := B D' \tag{4.18}$$

where 𝐴 ∈ ℝ𝑄×𝐹 and 𝐵 ∈ ℝ𝑆×𝐹 are the user and tag latent features, respectively, and 𝐹 is the size of the latent features, whereas 𝐶 ∈ ℝ𝑅𝑆×𝐹 and 𝐷 ∈ ℝ𝑄𝑅×𝐹 are the coefficient matrices of the mode-1 and mode-3 matricizations of tensor 𝒴 with respect to the user and tag latent features, respectively. Equation (4.17) and Equation (4.18) state that each user and each tag, represented as a row of 𝑌(1) and 𝑌(3) respectively, can be approximated as a non-negative linear combination of basis vectors (Kim et al., 2014), with the weights of that combination given by the corresponding row of 𝐴 and 𝐵. In other words, each column of the 𝐴 and 𝐵 matrices represents the importance of one user latent feature (𝛽) and one tag latent feature (𝜑) to each user and tag, respectively. After the user and tag latent features are generated, the tag usage likeliness of a user 𝑢 for a tag 𝑡 can be calculated as:

$$l_{u,t} := \sum_{k=1}^{F} a_{u,k}\, b_{t,k} \tag{4.19}$$

where 𝑙𝑢,𝑡, 𝑎𝑢,𝑘 and 𝑏𝑡,𝑘 are elements of the User Tag Usage Likeliness matrix 𝐿 ∈ ℝ𝑄×𝑆, the User Latent Feature matrix 𝐴 ∈ ℝ𝑄×𝐹 and the Tag Latent Feature matrix 𝐵 ∈ ℝ𝑆×𝐹, respectively, i.e. 𝐿 = 𝐴𝐵′. The User Tag Usage Likeliness Generation algorithm is detailed in Figure 4.24.

1: Algorithm: User Tag Usage Likeliness Generation

2: Input : Training set 𝐷𝑡𝑟𝑎𝑖𝑛 ⊆ 𝑈 × 𝐼 × 𝑇, the size of latent feature 𝐹

3: Output: User Tag Usage Likeliness Matrix 𝐿 ∈ ℝ𝑄×𝑆

4: 𝑄 = |𝑈| , 𝑅 = |𝐼| , 𝑆 = |𝑇| , 𝒴 ∈ ℝ𝑄×𝑅×𝑆

5: Generate 𝑦𝑢,𝑖,𝑡 by using Equation (4.1)

6: Get 𝑌(1) by implementing mode-1 matricization on tensor 𝒴
7: Get 𝑌(3) by implementing mode-3 matricization on tensor 𝒴
8: Get 𝐴 ∈ ℝ𝑄×𝐹, 𝐵 ∈ ℝ𝑆×𝐹 by implementing the non-negative matrix factorization on 𝑌(1) and 𝑌(3)
9: for 𝑢 ∈ 𝑈 do
10:   for 𝑡 ∈ 𝑇 do
11:     𝑙𝑢,𝑡 ⟵ 0
12:     for 𝑘 ← 1 𝑡𝑜 𝐹 do
13:       𝑙𝑢,𝑡 ⟵ 𝑙𝑢,𝑡 + 𝑎𝑢,𝑘𝑏𝑡,𝑘
14:     end
15:   end
16: end /* 𝐿 ∈ ℝ𝑄×𝑆 */

Figure 4.24. The User Tag Usage Likeliness generation algorithm
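Figure 4.24 can be sketched in NumPy as below. The specific NMF solver is not given in this section, so the sketch substitutes the basic multiplicative-update rules of Lee and Seung as an assumption; because NMF solutions are not unique, it will not reproduce the exact values of Figure 4.26. The unfoldings use the column order of Figure 4.25 (items vary fastest), and, as in Equation (4.19), the 𝑘-th user feature is paired with the 𝑘-th tag feature even though the two factorizations are run independently. Function names are hypothetical.

```python
import numpy as np


def nmf(x, k, iters=500, seed=0, eps=1e-9):
    """Basic multiplicative-update NMF (Lee and Seung): x ~ w @ h with w, h >= 0."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    w = rng.random((x.shape[0], k)) + eps
    h = rng.random((k, x.shape[1])) + eps
    for _ in range(iters):
        h *= (w.T @ x) / (w.T @ w @ h + eps)
        w *= (x @ h.T) / (w @ h @ h.T + eps)
    return w, h


def tag_usage_likeliness(y, k):
    """Figure 4.24 sketch: NMF on the mode-1 and mode-3 unfoldings, then L = A B'."""
    q, r, s = y.shape
    y1 = np.transpose(y, (0, 2, 1)).reshape(q, s * r)   # Y(1): Q x RS, items vary fastest
    y3 = np.transpose(y, (2, 0, 1)).reshape(s, q * r)   # Y(3): S x QR, items vary fastest
    a, _ = nmf(y1, k)                                    # user latent features A (Q x k)
    b, _ = nmf(y3, k)                                    # tag latent features B (S x k)
    return a @ b.T                                       # Equation (4.19): l_{u,t}


y = np.zeros((3, 4, 5))
for u, i, t in [(0, 1, 0), (0, 1, 2), (0, 2, 3), (1, 0, 0), (1, 2, 1),
                (1, 2, 3), (2, 2, 1), (2, 3, 3), (2, 3, 4)]:
    y[u, i, t] = 1                                       # toy tensor, 0-based indices
print(np.round(tag_usage_likeliness(y, 2), 4))
```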

Example 4.4: User Tag Usage Likeliness Generation.

An example of how to generate the User Tag Usage Likeliness is illustrated using the toy third-order tensor 𝒴 ∈ ℝ3×4×5 in Figure 4.2. As shown in Figure 4.25(a) and (b), the mode-1 and mode-3 matricizations of tensor 𝒴 result in 𝑌(1) ∈ ℝ3×20 and 𝑌(3) ∈ ℝ5×12. Figure 4.26(a) and (b) present the resulting user and tag latent feature matrices after applying non-negative matrix factorization to 𝑌(1) and 𝑌(3), respectively, with 𝐹 = 2.

As previously described, each column of the 𝐴 and 𝐵 matrices represents the importance of a user latent feature (𝛽) and a tag latent feature (𝜑) to each user and tag, respectively. From Figure 4.26(a), the importance of features 𝛽1 and 𝛽2 to 𝑢1 is 0.3943 and 1.0000, respectively, and from Figure 4.26(b), the importance of features 𝜑1 and 𝜑2 to 𝑡1 is 0.0000 and 0.7071, respectively. The tag usage likeliness of each user 𝑢 for each tag 𝑡 is then calculated using Equation (4.19); for example, 𝑙𝑢1,𝑡2 = 0.3943 × 0.7013 + 1.0000 × 0.0000 = 0.2765. The resulting matrix is shown in Figure 4.27.


$$Y_{(1)} = \left[\begin{array}{cccccccccccccccccccc}
0&1&0&0&0&0&0&0&0&1&0&0&0&0&1&0&0&0&0&0\\
1&0&0&0&0&0&1&0&0&0&0&0&0&0&1&0&0&0&0&0\\
0&0&0&0&0&0&1&0&0&0&0&0&0&0&0&1&0&0&0&1
\end{array}\right]$$

(a)

$$Y_{(3)} = \left[\begin{array}{cccccccccccc}
0&1&0&0&1&0&0&0&0&0&0&0\\
0&0&0&0&0&0&1&0&0&0&1&0\\
0&1&0&0&0&0&0&0&0&0&0&0\\
0&0&1&0&0&0&1&0&0&0&0&1\\
0&0&0&0&0&0&0&0&0&0&0&1
\end{array}\right]$$

(b)

Figure 4.25. Example of the resulted matricization of tensor 𝒴 ∈ ℝ3×4×5: (a) Mode-1 matricization 𝑌(1) ∈ ℝ3×20, and (b) Mode-3 matricization 𝑌(3) ∈ ℝ5×12
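The two unfoldings of Example 4.4 can be reproduced with NumPy. The sketch below rebuilds the toy tensor from its nine observed triples (read off 𝑌(1) in Figure 4.25(a)) and unfolds it so that items vary fastest along the columns, which is the column order the figure appears to use; for the mode-3 unfolding this differs from the convention of Kolda and Bader (2009), in which users would vary fastest. The names `toy_tensor` and `unfold` are hypothetical.

```python
import numpy as np

# Observed (u, i, t) triples of the toy tensor, 0-based, read off Y(1) in Figure 4.25(a)
TRIPLES = [(0, 1, 0), (0, 1, 2), (0, 2, 3), (1, 0, 0), (1, 2, 1),
           (1, 2, 3), (2, 2, 1), (2, 3, 3), (2, 3, 4)]


def toy_tensor(shape=(3, 4, 5)):
    """Boolean-scheme tensor: 1 for observed (u, i, t) entries, 0 elsewhere."""
    y = np.zeros(shape, dtype=int)
    for u, i, t in TRIPLES:
        y[u, i, t] = 1
    return y


def unfold(y, mode):
    """Mode-1 or mode-3 matricization with items varying fastest along the columns."""
    if mode == 1:
        return np.transpose(y, (0, 2, 1)).reshape(y.shape[0], -1)   # column = t * R + i
    if mode == 3:
        return np.transpose(y, (2, 0, 1)).reshape(y.shape[2], -1)   # column = u * R + i
    raise ValueError("only modes 1 and 3 are needed here")


y = toy_tensor()
print(unfold(y, 1).shape, unfold(y, 3).shape)   # (3, 20) (5, 12)
```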

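$$A = \begin{bmatrix} 0.3943 & 1.0000 \\ 0.7835 & 0.0000 \\ 0.4803 & 0.0000 \end{bmatrix} \quad \text{(rows } u_1, u_2, u_3\text{; columns } \beta_1, \beta_2\text{)}$$

(a)

$$B = \begin{bmatrix} 0.0000 & 0.7071 \\ 0.7013 & 0.0000 \\ 0.0000 & 0.7071 \\ 0.7106 & 0.0000 \\ 0.0570 & 0.0000 \end{bmatrix} \quad \text{(rows } t_1, \dots, t_5\text{; columns } \varphi_1, \varphi_2\text{)}$$

(b)

Figure 4.26. Example of the resulted latent feature matrix: (a) User latent feature matrix 𝐴 ∈ ℝ3×2, and (b) Tag latent feature matrix 𝐵 ∈ ℝ5×2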

$$L = \begin{bmatrix} 0.7071 & 0.2765 & 0.7071 & 0.2802 & 0.0225 \\ 0.0000 & 0.5495 & 0.0000 & 0.5568 & 0.0447 \\ 0.0000 & 0.3368 & 0.0000 & 0.3413 & 0.0274 \end{bmatrix} \quad \text{(rows } u_1, u_2, u_3\text{; columns } t_1, \dots, t_5\text{)}$$

Figure 4.27. Example of the resulted User Tag Usage Likeliness matrix 𝐿 ∈ ℝ3×5
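As a quick arithmetic check of Equation (4.19), the following NumPy snippet multiplies the printed latent feature matrices of Figure 4.26 and reproduces the User Tag Usage Likeliness matrix of Figure 4.27 to four decimal places.

```python
import numpy as np

A = np.array([[0.3943, 1.0000],          # u1   (columns: beta1, beta2)
              [0.7835, 0.0000],          # u2
              [0.4803, 0.0000]])         # u3
B = np.array([[0.0000, 0.7071],          # t1   (columns: phi1, phi2)
              [0.7013, 0.0000],          # t2
              [0.0000, 0.7071],          # t3
              [0.7106, 0.0000],          # t4
              [0.0570, 0.0000]])         # t5
L = A @ B.T                              # Equation (4.19): l_{u,t} = sum_k a_{u,k} b_{t,k}
print(np.round(L, 4))
```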

Positive and Negative Tag Preference Sets Generation

Once each user’s tag usage likeliness for each tag has been calculated, the tags are sorted in descending order of their likeliness scores. For each user, the tags at the top and at the bottom of this list can now be distinguished and are called the positive and negative tag preference sets, respectively. This distinction is controlled by the size of the tag preference set, 𝑣. The positive tag preference set, 𝐿𝑢+ ⊆ 𝑇 with |𝐿𝑢+| ≤ 𝑣, consists of the tags with the largest values of 𝐿𝑢,∗, whereas the negative tag preference set, 𝐿𝑢− ⊆ 𝑇 with |𝐿𝑢−| ≤ 𝑣, consists of the tags with the smallest values of 𝐿𝑢,∗.
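Selecting the two sets amounts to sorting each row of 𝐿 and taking its two ends. The plain-Python sketch below uses the likeliness values of Figure 4.27; the function name is illustrative, and the sketch assumes 2𝑣 ≤ 𝑆 so that the two sets cannot overlap. Ties keep the original tag order because Python’s sort is stable.

```python
def tag_preference_sets(likeliness, v):
    """Return (positive, negative) tag preference sets for one user: the v tags with
    the largest and the v tags with the smallest likeliness scores."""
    ranked = sorted(likeliness, key=likeliness.get, reverse=True)   # descending scores
    positive = set(ranked[:v])
    negative = set(ranked[-v:]) if v > 0 else set()
    return positive, negative


# Rows of the User Tag Usage Likeliness matrix L in Figure 4.27
LIKELINESS = {
    "u1": {"t1": 0.7071, "t2": 0.2765, "t3": 0.7071, "t4": 0.2802, "t5": 0.0225},
    "u2": {"t1": 0.0000, "t2": 0.5495, "t3": 0.0000, "t4": 0.5568, "t5": 0.0447},
    "u3": {"t1": 0.0000, "t2": 0.3368, "t3": 0.0000, "t4": 0.3413, "t5": 0.0274},
}
for user, row in LIKELINESS.items():
    print(user, tag_preference_sets(row, v=2))
```

With 𝑣 = 2 this yields the sets used in Example 4.5, e.g. {𝑡1, 𝑡3} and {𝑡2, 𝑡5} for 𝑢1.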

Weighted Tensor Construction

Given the observed tagging data entries and the positive and negative tag preference sets, the weighted tensor 𝒲 can be constructed as detailed in Figure 4.28. The entries of 𝒲 are a bijective mapping to the entries of the primary tensor model 𝒴 that represents the user profile. Each 𝑤𝑢,𝑖,𝑡 is assigned one of the ordinal relevance values {2, 1, 0, −1}. The value “2” represents observed entries, whereas “1” and “−1” represent non-observed entries whose tags belong to the positive and negative tag preference sets, respectively; all other entries are labelled “0”. As Figure 4.28 shows, each observed entry of the primary tensor is always rewarded, such that the associated entry of the weighted tensor holds a higher positive value than that of any non-observed entry.

1: Algorithm: Weighted Tensor Construction
2: Input: Tensor 𝒴 ∈ ℝ𝑄×𝑅×𝑆, User Tag Usage Likeliness Matrix 𝐿 ∈ ℝ𝑄×𝑆, Tag preference size 𝑣
3: Output: Weighted Tensor 𝒲 ∈ ℝ𝑄×𝑅×𝑆
4: Initialize 𝒲 ∈ ℝ𝑄×𝑅×𝑆 with zeroes
5: for 𝑢 ∈ 𝑈 do
6:   Get 𝐿𝑢+ ← max(𝐿𝑢,∗) such that |𝐿𝑢+| ≤ 𝑣
7:   Get 𝐿𝑢− ← min(𝐿𝑢,∗) such that |𝐿𝑢−| ≤ 𝑣
8:   for 𝑖 ∈ 𝐼 do
9:     for 𝑡 ∈ 𝑇 do
10:      if 𝑦𝑢,𝑖,𝑡 == 1 then /* observed entries */
11:        𝑤𝑢,𝑖,𝑡 ⟵ 2
12:      elseif (𝑡 ∈ 𝐿𝑢+) ∧ (𝑦𝑢,𝑖,𝑡 ≠ 1) then
13:        𝑤𝑢,𝑖,𝑡 ⟵ 1
14:      elseif (𝑡 ∈ 𝐿𝑢−) ∧ (𝑦𝑢,𝑖,𝑡 ≠ 1) then
15:        𝑤𝑢,𝑖,𝑡 ⟵ −1
16:      end
17:    end
18:  end
19: end

Figure 4.28. The weighted tensor 𝒲 ∈ ℝ𝑄×𝑅×𝑆 construction algorithm
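The following plain-Python sketch of Figure 4.28 stores 𝒲 sparsely as a dictionary of non-zero weights. One assumption needs flagging: in Figure 4.29(b) non-zero weights appear only for (user, item) pairs that the user has actually tagged, so by default the sketch restricts the item loop to those posts, while `posted_items_only=False` loops over every 𝑖 ∈ 𝐼 as printed in Figure 4.28. The function and parameter names are hypothetical.

```python
def weighted_tensor(observed, pos, neg, users, items, tags, posted_items_only=True):
    """Return {(u, i, t): weight} for the non-zero entries of W, with weights
    2 (observed), 1 (non-observed, positive tag) and -1 (non-observed, negative tag)."""
    obs = set(observed)
    posts = {(u, i) for u, i, _ in observed}
    w = {}
    for u in users:
        for i in items:
            if posted_items_only and (u, i) not in posts:
                continue                      # leave the whole (u, i) fibre at 0
            for t in tags:
                if (u, i, t) in obs:
                    w[(u, i, t)] = 2          # observed entry is rewarded
                elif t in pos[u]:
                    w[(u, i, t)] = 1          # non-observed, positive tag preference
                elif t in neg[u]:
                    w[(u, i, t)] = -1         # non-observed, negative tag preference
    return w                                  # missing keys mean weight 0


OBSERVED = [("u1", "i2", "t1"), ("u1", "i2", "t3"), ("u1", "i3", "t4"),
            ("u2", "i1", "t1"), ("u2", "i3", "t2"), ("u2", "i3", "t4"),
            ("u3", "i3", "t2"), ("u3", "i4", "t4"), ("u3", "i4", "t5")]
POS = {"u1": {"t1", "t3"}, "u2": {"t4", "t2"}, "u3": {"t4", "t2"}}   # v = 2, Example 4.5
NEG = {"u1": {"t2", "t5"}, "u2": {"t3", "t1"}, "u3": {"t3", "t1"}}
USERS, ITEMS = ["u1", "u2", "u3"], ["i1", "i2", "i3", "i4"]
TAGS = ["t1", "t2", "t3", "t4", "t5"]
W = weighted_tensor(OBSERVED, POS, NEG, USERS, ITEMS, TAGS)
print([W.get(("u3", "i4", t), 0) for t in TAGS])   # [-1, 1, -1, 2, 2]
```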


Example 4.5: Weighted Tensor Construction.

An example of how to construct the weighted tensor 𝒲 is illustrated using the toy primary tensor 𝒴 ∈ ℝ3×4×5 shown in Figure 4.29(a) and the User Tag Usage Likeliness matrix 𝐿 ∈ ℝ3×5, built in Example 4.4 and shown in Figure 4.27.

By choosing 𝑣 = 2, the positive tag preference sets of the users are derived as 𝐿𝑢1+ = {𝑡1, 𝑡3}, 𝐿𝑢2+ = {𝑡4, 𝑡2}, and 𝐿𝑢3+ = {𝑡4, 𝑡2}, whereas the negative tag preference sets are derived as 𝐿𝑢1− = {𝑡2, 𝑡5}, 𝐿𝑢2− = {𝑡3, 𝑡1}, and 𝐿𝑢3− = {𝑡3, 𝑡1}.

By applying the weighted tensor construction algorithm in Figure 4.28, the entries of the weighted tensor 𝒲 are generated; the result is shown in Figure 4.29(b). It can be observed that 𝒲 takes the users’ collaborations and associations into account, as its construction process implicitly clusters similar users by tag usage, i.e. through the Tag Usage Likeliness generation algorithm in Figure 4.24. Note that the non-negative matrix factorization technique is not used for generating the latent factors of We-Rank, since it restricts all the latent factors to non-negative values (Xu et al., 2003). This means that entries the users do not like, represented as negative values, would not be taken into account when generating the list of recommendations.

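(a) Primary tensor 𝒴 (rows: items 𝑖1 to 𝑖4; columns: tags 𝑡1 to 𝑡5)
         User 1             User 2             User 3
      t1 t2 t3 t4 t5     t1 t2 t3 t4 t5     t1 t2 t3 t4 t5
  i1   0  0  0  0  0      1  0  0  0  0      0  0  0  0  0
  i2   1  0  1  0  0      0  0  0  0  0      0  0  0  0  0
  i3   0  0  0  1  0      0  1  0  1  0      0  1  0  0  0
  i4   0  0  0  0  0      0  0  0  0  0      0  0  0  1  1

(b) Weighted tensor 𝒲 (rows: items 𝑖1 to 𝑖4; columns: tags 𝑡1 to 𝑡5)
         User 1             User 2             User 3
      t1 t2 t3 t4 t5     t1 t2 t3 t4 t5     t1 t2 t3 t4 t5
  i1   0  0  0  0  0      2  1 -1  1  0      0  0  0  0  0
  i2   2 -1  2  0 -1      0  0  0  0  0      0  0  0  0  0
  i3   1 -1  1  2 -1     -1  2 -1  2  0     -1  2 -1  1  0
  i4   0  0  0  0  0      0  0  0  0  0     -1  1 -1  2  2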

Figure 4.29. Example of: (a) primary tensor 𝒴 ∈ ℝ3×4×5, and (b) the resulting weighted tensor 𝒲 ∈ ℝ3×4×5


Latent factor generation, via tensor factorization, is the process of deriving the latent relationships between the dimensions of the tensor model. The latent factors 𝑀(1), 𝑀(2), and 𝑀(3), corresponding to the dimensions of tensor 𝒴, are generated by optimizing the objective function of the recommendation model.

Given the optimization criterion in Equation (4.15), the resultant objective

function of We-Rank can be formulated as (Acar et al., 2011):

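$$L(\Theta) := \sum_{u=1}^{Q}\sum_{i=1}^{R}\sum_{t=1}^{S}\left[w_{u,i,t}\left(y_{u,i,t}-\hat{y}_{u,i,t}\right)\right]^{2} = \left\| \mathcal{W} \ast \left(\mathcal{Y} - [\![ M^{(1)}, M^{(2)}, M^{(3)} ]\!]\right) \right\|^{2} \tag{4.20}$$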
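The objective in Equation (4.20) can be evaluated directly, as in the sketch below. It assumes, consistent with the ⟦𝑀(1), 𝑀(2), 𝑀(3)⟧ notation, that �̂�𝑢,𝑖,𝑡 is the CP (Kruskal) reconstruction Σ𝑓 𝑚𝑢𝑓(1) 𝑚𝑖𝑓(2) 𝑚𝑡𝑓(3); the helper names are hypothetical and plain nested lists stand in for tensors.

```python
from typing import List

Matrix = List[List[float]]
Tensor = List[List[List[float]]]


def cp_entry(m1: Matrix, m2: Matrix, m3: Matrix, u: int, i: int, t: int) -> float:
    """One entry of the Kruskal (CP) reconstruction [[M1, M2, M3]]."""
    return sum(a * b * c for a, b, c in zip(m1[u], m2[i], m3[t]))


def weighted_loss(y: Tensor, w: Tensor, m1: Matrix, m2: Matrix, m3: Matrix) -> float:
    """Equation (4.20): sum of squared weighted residuals over all (u, i, t)."""
    total = 0.0
    for u, y_u in enumerate(y):
        for i, y_ui in enumerate(y_u):
            for t, y_uit in enumerate(y_ui):
                residual = w[u][i][t] * (y_uit - cp_entry(m1, m2, m3, u, i, t))
                total += residual * residual
    return total


# A 1 x 1 x 2 example with a single latent factor (F = 1).
y = [[[1.0, 0.0]]]
w = [[[2.0, -1.0]]]
print(weighted_loss(y, w, [[1.0]], [[1.0]], [[0.5], [0.5]]))  # 1.25
```

Because each residual is multiplied by 𝑤𝑢,𝑖,𝑡 before squaring, entries whose weight is 0 do not contribute to the objective at all.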

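The gradients of the wMSE objective with respect to the latent factor vectors of user 𝑢, item 𝑖, and tag 𝑡 are formulated as follows (Acar et al., 2011), where ∗ denotes the element-wise product:

$$\frac{\partial L}{\partial m_{u}^{(1)}} = 2\sum_{i=1}^{R}\sum_{t=1}^{S} w_{u,i,t}^{2}\left(-y_{u,i,t}+\hat{y}_{u,i,t}\right) m_{i}^{(2)} \ast m_{t}^{(3)} \tag{4.21}$$

$$\frac{\partial L}{\partial m_{i}^{(2)}} = 2\sum_{u=1}^{Q}\sum_{t=1}^{S} w_{u,i,t}^{2}\left(-y_{u,i,t}+\hat{y}_{u,i,t}\right) m_{u}^{(1)} \ast m_{t}^{(3)} \tag{4.22}$$

$$\frac{\partial L}{\partial m_{t}^{(3)}} = 2\sum_{u=1}^{Q}\sum_{i=1}^{R} w_{u,i,t}^{2}\left(-y_{u,i,t}+\hat{y}_{u,i,t}\right) m_{u}^{(1)} \ast m_{i}^{(2)} \tag{4.23}$$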
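The user gradient in Equation (4.21) can be checked numerically with the sketch below, which compares the analytic expression with central finite differences of the objective on a small random instance. It assumes the CP reconstruction for �̂�𝑢,𝑖,𝑡; the random targets, weights and factors are purely illustrative, and the function names are hypothetical. Equations (4.22) and (4.23) can be checked the same way by perturbing 𝑀(2) or 𝑀(3).

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]
Tensor = List[List[List[float]]]


def predict(m1: Matrix, m2: Matrix, m3: Matrix, u: int, i: int, t: int) -> float:
    return sum(a * b * c for a, b, c in zip(m1[u], m2[i], m3[t]))


def loss(y: Tensor, w: Tensor, m1: Matrix, m2: Matrix, m3: Matrix) -> float:
    return sum((w[u][i][t] * (y[u][i][t] - predict(m1, m2, m3, u, i, t))) ** 2
               for u in range(len(y)) for i in range(len(y[0])) for t in range(len(y[0][0])))


def grad_user(y: Tensor, w: Tensor, m1: Matrix, m2: Matrix, m3: Matrix, u: int) -> List[float]:
    """Equation (4.21): gradient of the loss with respect to row m_u^(1)."""
    n_factors = len(m1[u])
    g = [0.0] * n_factors
    for i in range(len(y[u])):
        for t in range(len(y[u][i])):
            coef = 2.0 * w[u][i][t] ** 2 * (-y[u][i][t] + predict(m1, m2, m3, u, i, t))
            for f in range(n_factors):
                g[f] += coef * m2[i][f] * m3[t][f]
    return g


def numeric_grad_user(y, w, m1, m2, m3, u, eps=1e-6) -> List[float]:
    """Central finite differences, used only to check grad_user."""
    g = []
    for f in range(len(m1[u])):
        original = m1[u][f]
        m1[u][f] = original + eps
        up = loss(y, w, m1, m2, m3)
        m1[u][f] = original - eps
        down = loss(y, w, m1, m2, m3)
        m1[u][f] = original  # restore the factor
        g.append((up - down) / (2 * eps))
    return g


def random_instance(shape: Tuple[int, int, int], n_factors: int, seed: int = 0):
    """Random 0/1 targets, weights in {-1, 0, 1, 2} and factors (illustrative only)."""
    rng = random.Random(seed)
    q, r, s = shape
    y = [[[float(rng.random() < 0.3) for _ in range(s)] for _ in range(r)] for _ in range(q)]
    w = [[[float(rng.choice([-1, 0, 1, 2])) for _ in range(s)] for _ in range(r)] for _ in range(q)]
    factors = [[[rng.uniform(-1, 1) for _ in range(n_factors)] for _ in range(n)] for n in shape]
    return y, w, factors[0], factors[1], factors[2]


y, w, m1, m2, m3 = random_instance((3, 4, 5), n_factors=2)
analytic = grad_user(y, w, m1, m2, m3, u=0)
numeric = numeric_grad_user(y, w, m1, m2, m3, u=0)
print(max(abs(a - b) for a, b in zip(analytic, numeric)))  # should be tiny (finite-difference error only)
```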

The We-Rank learning algorithm for finding the latent factors is outlined in Figure 4.30.

1: Algorithm: We-Rank Learning
2: Input: Initial tensor 𝒴 ∈ ℝ𝑄×𝑅×𝑆, Weighted tensor 𝒲 ∈ ℝ𝑄×𝑅×𝑆, latent factor matrix column size 𝐹, maximal iteration 𝑖𝑡𝑒𝑟𝑀𝑎𝑥
3: Output: Latent factors 𝑀(1), 𝑀(2), 𝑀(3)
4: Initialize 𝑀(1)(0) ∈ ℝ𝑄×𝐹, 𝑀(2)(0) ∈ ℝ𝑅×𝐹, 𝑀(3)(0) ∈ ℝ𝑆×𝐹, ℎ = 0
5: repeat
6:     for 𝑢 ∈ 𝑈 do
7:         𝑚𝑢(1) ⟵ 𝑚𝑢(1) − 𝜕𝐿/𝜕𝑚𝑢(1) based on Equation (4.21)
8:     end
9:     for 𝑖 ∈ 𝐼 do
10:        𝑚𝑖(2) ⟵ 𝑚𝑖(2) − 𝜕𝐿/𝜕𝑚𝑖(2) based on Equation (4.22)
11:    end
12:    for 𝑡 ∈ 𝑇 do
13:        𝑚𝑡(3) ⟵ 𝑚𝑡(3) − 𝜕𝐿/𝜕𝑚𝑡(3) based on Equation (4.23)
14:    end
15:    ℎ ⟵ ℎ + 1
16: until ℎ ≥ 𝑖𝑡𝑒𝑟𝑀𝑎𝑥

Figure 4.30. The We-Rank learning algorithm
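The loop structure of Figure 4.30 can be sketched as block-wise gradient descent, as below. Several details are assumptions not stated in the figure: a fixed learning rate `lr` scales each step, factors are initialised uniformly in [0.1, 0.5], the gradient is subtracted so that the objective of Equation (4.20) decreases, and all names are hypothetical. The toy 𝒴 and 𝒲 are those of Figure 4.29.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]
Tensor = List[List[List[float]]]


def predict(m1: Matrix, m2: Matrix, m3: Matrix, u: int, i: int, t: int) -> float:
    return sum(a * b * c for a, b, c in zip(m1[u], m2[i], m3[t]))


def loss(y: Tensor, w: Tensor, m1: Matrix, m2: Matrix, m3: Matrix) -> float:
    return sum((w[u][i][t] * (y[u][i][t] - predict(m1, m2, m3, u, i, t))) ** 2
               for u in range(len(y)) for i in range(len(y[0])) for t in range(len(y[0][0])))


def mode_gradient(y: Tensor, w: Tensor, m1: Matrix, m2: Matrix, m3: Matrix, mode: int) -> Matrix:
    """Gradients (4.21), (4.22) or (4.23) for every row of M^(mode + 1)."""
    factors = (m1, m2, m3)
    grad = [[0.0] * len(m1[0]) for _ in factors[mode]]
    for u in range(len(y)):
        for i in range(len(y[0])):
            for t in range(len(y[0][0])):
                coef = 2.0 * w[u][i][t] ** 2 * (predict(m1, m2, m3, u, i, t) - y[u][i][t])
                if coef == 0.0:
                    continue
                idx = (u, i, t)
                # Rows of the two other modes, multiplied element-wise.
                a, b = (factors[k][idx[k]] for k in range(3) if k != mode)
                row = grad[idx[mode]]
                for f in range(len(row)):
                    row[f] += coef * a[f] * b[f]
    return grad


def we_rank_learn(y: Tensor, w: Tensor, n_factors: int = 2, lr: float = 0.01,
                  iter_max: int = 200, seed: int = 0) -> Tuple[Matrix, Matrix, Matrix, List[float]]:
    rng = random.Random(seed)
    m1, m2, m3 = [[[rng.uniform(0.1, 0.5) for _ in range(n_factors)] for _ in range(n)]
                  for n in (len(y), len(y[0]), len(y[0][0]))]
    history = [loss(y, w, m1, m2, m3)]
    for _ in range(iter_max):  # repeat ... until h >= iterMax
        for mode, matrix in enumerate((m1, m2, m3)):  # users, then items, then tags
            grad = mode_gradient(y, w, m1, m2, m3, mode)
            for row, g_row in zip(matrix, grad):
                for f in range(n_factors):
                    row[f] -= lr * g_row[f]  # step against the gradient
        history.append(loss(y, w, m1, m2, m3))
    return m1, m2, m3, history


OBSERVED = {(0, 1, 0), (0, 1, 2), (0, 2, 3), (1, 0, 0), (1, 2, 1),
            (1, 2, 3), (2, 2, 1), (2, 3, 3), (2, 3, 4)}
Y = [[[1.0 if (u, i, t) in OBSERVED else 0.0 for t in range(5)] for i in range(4)]
     for u in range(3)]
W = [[[0, 0, 0, 0, 0], [2, -1, 2, 0, -1], [1, -1, 1, 2, -1], [0, 0, 0, 0, 0]],
     [[2, 1, -1, 1, 0], [0, 0, 0, 0, 0], [-1, 2, -1, 2, 0], [0, 0, 0, 0, 0]],
     [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [-1, 2, -1, 1, 0], [-1, 1, -1, 2, 2]]]
m1, m2, m3, history = we_rank_learn(Y, W)
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

Updating 𝑀(1), then 𝑀(2), then 𝑀(3) with the most recent factors mirrors the three inner loops of the figure; a fixed small step was chosen for simplicity rather than a more elaborate optimiser.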



Since We-Rank implements a weighted scheme during the learning process such that

reward and penalty are given to the observed and non-observed tagging data entries

respectively, the resultant latent factors of We-Rank can be directly used for

generating the Top-𝑁 list of item recommendations for each target user. In this case,

the recommended items are selected based on the maximum value of �̂�𝑢,𝑖,𝑡,

calculated using Equation (4.16).
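Top-𝑁 selection from the learned factors might look like the sketch below. It assumes that an item's score for a user is its largest predicted value �̂�𝑢,𝑖,𝑡 over all tags, computed with the CP reconstruction, and that ties are broken by item index; the optional `exclude` argument (for example, items already tagged in 𝐷𝑡𝑟𝑎𝑖𝑛) and the function name are hypothetical additions.

```python
from typing import Iterable, List

Matrix = List[List[float]]


def top_n_items(m1: Matrix, m2: Matrix, m3: Matrix, u: int, n: int,
                exclude: Iterable[int] = ()) -> List[int]:
    """Rank items for user u by max over tags of y_hat[u, i, t]; return the best n."""
    skip = set(exclude)
    scores = {}
    for i in range(len(m2)):
        if i in skip:
            continue
        # CP prediction for every tag; the item keeps its best tag score.
        scores[i] = max(sum(a * b * c for a, b, c in zip(m1[u], m2[i], m3[t]))
                        for t in range(len(m3)))
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


# One latent factor, three items and two tags (illustrative values only).
m1 = [[1.0]]
m2 = [[0.2], [0.9], [0.5]]
m3 = [[1.0], [-2.0]]
print(top_n_items(m1, m2, m3, u=0, n=2))               # [1, 2]
print(top_n_items(m1, m2, m3, u=0, n=2, exclude={1}))  # [2, 0]
```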


The performance of We-Rank is compared with benchmarking methods, including MAX (Symeonidis et al., 2010), PITF (Rendle and Schmidt-Thieme, 2010), and CTS (Kim et al., 2010). Note that the comparison with the previously proposed method, TRPR, is presented in Chapter 6. For all tensor-based methods, the size of the latent factor matrix 𝐹 is set to 128, as recommendation quality usually does not improve beyond that value. The experiments use 5-fold cross-validation. For each fold, each dataset is randomly divided into a training set 𝐷𝑡𝑟𝑎𝑖𝑛 (80%) and a test set 𝐷𝑡𝑒𝑠𝑡 (20%) based on the number of posts. 𝐷𝑡𝑟𝑎𝑖𝑛 and 𝐷𝑡𝑒𝑠𝑡 do not overlap in posts, i.e., no triplets for a user-item set exist in 𝐷𝑡𝑟𝑎𝑖𝑛 if a triplet (𝑢, 𝑖, ∗) is present in 𝐷𝑡𝑒𝑠𝑡. The recommendation task is to predict and rank the Top-𝑁 items for the users present in 𝐷𝑡𝑒𝑠𝑡. Performance is measured using the F1-Score and reported as the average over all five runs.
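A minimal sketch of the F1-Score@𝑁 computation is shown below. It assumes that precision@𝑁 uses 𝑁 as the denominator, that recall@𝑁 is taken relative to the user's items in 𝐷𝑡𝑒𝑠𝑡, and that scores are averaged over test users; averaging over the five folds would be one further mean. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def f1_at_n(recommended: Sequence[int], relevant: Iterable[int], n: int) -> float:
    """Harmonic mean of precision@n and recall@n for one user."""
    relevant = set(relevant)
    hits = len(set(recommended[:n]) & relevant)
    if hits == 0:
        return 0.0
    precision = hits / n
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


def mean_f1_at_n(recs: Dict[int, List[int]], truth: Dict[int, List[int]], n: int) -> float:
    """Average F1@n over the users that have test items."""
    users = [u for u in truth if truth[u]]
    if not users:
        return 0.0
    return sum(f1_at_n(recs.get(u, []), truth[u], n) for u in users) / len(users)


print(f1_at_n([1, 2, 3, 4], {1, 2, 5, 6}, n=4))  # 0.5
```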

4.3.5.1 Impact of Tag Preference Size

The impact of the tag preference set size on the performance of We-Rank is investigated by measuring the F1-Score at various values of 𝑣, i.e. 10 to 100. Figure 4.31, Figure 4.32, Figure 4.33, and Figure 4.34 display the impact of tag preference set size on the Delicious, LastFM, CiteULike, and MovieLens datasets, respectively. It can be observed that We-Rank achieves the best F1-Score when 𝑣 is in the range of 50 to 90. Beyond these values, further increasing 𝑣 decreases the performance of We-Rank. This result indicates that selecting too many tags not only incurs unnecessary computational cost but also corrupts the user’s preference.


(a) 10-core (b) 15-core (c) 20-core

Figure 4.31. Impact of tag preference set size on Delicious dataset


Figure 4.32. Impact of tag preference set size on LastFM dataset


Figure 4.33. Impact of tag preference set size on CiteULike dataset


Figure 4.34. Impact of tag preference set size on MovieLens dataset

[Plot panels for Figures 4.31 to 4.34: F1-Score@10 (%) against tag preference set size 𝑣 (10 to 100), one panel each for the 10-, 15- and 20-core subsets of Delicious, LastFM, CiteULike and MovieLens]


4.3.5.2 Primary Tensor 𝒴 and Weighted Tensor 𝒲

The impact of the tag preference set size on the weighted tensor 𝒲 is investigated by comparing the density of 𝒲, generated from 𝐷𝑡𝑟𝑎𝑖𝑛, at various values of 𝑣, i.e. 0 to 100. Note that the density of 𝒲 is the same as that of the primary tensor 𝒴 when 𝑣 = 0. Figure 4.35 shows that the density of the weighted tensor 𝒲 increases linearly with the tag preference set size on each dataset used in this thesis.

(a) (b)

(c) (d)

Figure 4.35. The weighted tensor 𝒲 densities at various tag preference set sizes on: (a) Delicious, (b) LastFM, (c) CiteULike, and (d) MovieLens datasets

To study how the sparsity problem within 𝒴 is solved in the form of 𝒲, their densities are compared with the results on the impact of tag preference set size on We-Rank performance shown in Section 4.3.5.1. Table 4.2 lists the density of non-zero entries generated from 𝐷𝑡𝑟𝑎𝑖𝑛 in 𝒴 and in 𝒲 with 𝑣 = 50. From Table 4.2 and the results in Section 4.3.5.1, it can be observed that a denser 𝒲 can be generated from a sparse 𝒴 to weight each 𝒴 entry during the learning process, and that the density of 𝒲 influences performance. This observation not only confirms that the implementation of 𝒲 can solve the sparsity problem within 𝒴, but also shows that it affects the success of We-Rank.

[Plot panels for Figure 4.35: density (%) against tag preference set size 𝑣 (0 to 100), with one curve each for the 10-, 15- and 20-core subsets of Delicious, LastFM, CiteULike and MovieLens]


Dataset     Tensor                 Density of non-zero tensor entries (%)
                                   10-core    15-core    20-core
Delicious   Primary 𝒴              0.0006     0.0014     0.0027
            Weighted 𝒲 (𝑣 = 50)    0.0291     0.0678     0.1305
LastFM      Primary 𝒴              0.0042     0.0092     0.0163
            Weighted 𝒲 (𝑣 = 50)    0.1377     0.3017     0.5297
CiteULike   Primary 𝒴              0.0010     0.0037     0.0095
            Weighted 𝒲 (𝑣 = 50)    0.0392     0.1470     0.3897
MovieLens   Primary 𝒴              0.0062     0.0283     0.0284
            Weighted 𝒲 (𝑣 = 50)    0.3629     0.8399     1.6020

Table 4.2. The density comparison of non-zero entries generated from 𝐷𝑡𝑟𝑎𝑖𝑛 on the primary tensor 𝒴

and weighted tensor 𝒲 (𝑣 = 50)
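The density figures in Table 4.2 are percentages of non-zero entries. A small sketch on the toy tensors of Figure 4.29 is shown below; the helper name `density_percent` is hypothetical.

```python
from typing import Sequence


def density_percent(tensor: Sequence[Sequence[Sequence[float]]]) -> float:
    """Percentage of non-zero entries in a dense third-order tensor."""
    total = nonzero = 0
    for matrix in tensor:
        for row in matrix:
            total += len(row)
            nonzero += sum(1 for x in row if x != 0)
    return 100.0 * nonzero / total


# Toy tensors of Figure 4.29 (users x items x tags).
Y = [[[0, 0, 0, 0, 0], [1, 0, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 0]],
     [[1, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 1, 0, 1, 0], [0, 0, 0, 0, 0]],
     [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 0, 1, 1]]]
W = [[[0, 0, 0, 0, 0], [2, -1, 2, 0, -1], [1, -1, 1, 2, -1], [0, 0, 0, 0, 0]],
     [[2, 1, -1, 1, 0], [0, 0, 0, 0, 0], [-1, 2, -1, 2, 0], [0, 0, 0, 0, 0]],
     [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [-1, 2, -1, 1, 0], [-1, 1, -1, 2, 2]]]
print(density_percent(Y))            # 15.0
print(round(density_percent(W), 2))  # 43.33
```

For the real datasets, materialising the dense tensor would be impractical, so one would instead count the stored non-zero triples and divide by 𝑄 × 𝑅 × 𝑆.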


The recommendation accuracy of We-Rank and the benchmarking methods is compared using the F1-Score at various Top-𝑁 positions. Table 4.3, Table 4.4, Table 4.5, and Table 4.6 list the comparison for the Delicious, LastFM, CiteULike and MovieLens datasets, respectively. The data in these tables indicate that We-Rank outperforms the benchmarking methods, at most Top-𝑁 positions, only on the 15- and 20-core subsets of the MovieLens dataset (Table 4.6). We-Rank implements weight values, calculated from the user’s tag usage likeliness, to either reward or penalise each entry of the primary tensor. In other words, its performance depends heavily on how well the user’s tag usage likeliness is captured. Judging by the densities listed in Table 4.2, We-Rank outperforms the benchmarks when the user’s tag usage likeliness is sufficiently captured, i.e. on datasets with a dense 𝒴, as demonstrated on the 15- and 20-core subsets of the MovieLens dataset. This suggests that, on other datasets, We-Rank may perform better with a much larger 𝑝-core size that results in a dense 𝒴.


Methods 10-core (Score in %) 15-core (Score in %) 20-core (Score in %)

F1@5 F1@10 F1@15 F1@20 F1@5 F1@10 F1@15 F1@20 F1@5 F1@10 F1@15 F1@20

MAX 2.18 2.28 2.34 2.26 2.68 2.83 2.70 2.67 3.26 3.46 3.37 3.22

PITF 2.18 2.41 2.34 2.28 2.40 2.64 2.62 2.54 2.91 2.98 3.01 2.87

CTS 1.96 2.22 2.31 2.26 2.40 2.62 2.65 2.58 2.71 2.88 2.80 2.85

We-Rank 1.94 2.06 2.06 1.73 2.13 2.30 2.19 2.12 2.59 2.77 2.65 2.54

Table 4.3. F1-Score at various Top-𝑁 positions on Delicious dataset


Methods 10-core (Score in %) 15-core (Score in %) 20-core (Score in %)

F1@5 F1@10 F1@15 F1@20 F1@5 F1@10 F1@15 F1@20 F1@5 F1@10 F1@15 F1@20

MAX 6.22 6.80 6.94 6.66 6.40 7.68 8.01 7.83 7.13 8.12 8.36 8.15

PITF 5.06 6.28 6.71 6.77 6.15 7.30 7.75 7.86 6.60 8.17 8.54 8.59

CTS 4.01 4.72 4.76 4.51 4.05 4.43 4.58 4.44 4.83 6.29 6.59 6.50

We-Rank 4.00 4.70 4.76 4.48 4.78 4.98 4.68 4.82 5.13 6.43 6.68 6.91

Table 4.4. F1-Score at various Top-𝑁 positions on LastFM dataset


Methods 10-core (Score in %) 15-core (Score in %) 20-core (Score in %)

F1@5 F1@10 F1@15 F1@20 F1@5 F1@10 F1@15 F1@20 F1@5 F1@10 F1@15 F1@20

MAX 2.90 2.93 3.00 2.91 5.42 4.97 4.58 4.53 7.68 7.54 7.31 7.17

PITF 3.89 4.07 3.97 3.86 4.84 4.87 4.47 4.24 5.79 5.58 5.43 5.41

CTS 2.73 2.63 2.55 2.43 4.83 4.40 3.91 3.69 7.00 6.31 5.68 5.34

We-Rank 2.19 2.41 2.45 2.51 2.86 3.33 3.64 3.68 7.14 6.41 6.06 6.99

Table 4.5. F1-Score at various Top-𝑁 positions on CiteULike dataset


Methods 10-core (Score in %) 15-core (Score in %) 20-core (Score in %)

F1@5 F1@10 F1@15 F1@20 F1@5 F1@10 F1@15 F1@20 F1@5 F1@10 F1@15 F1@20

MAX 5.91 6.36 6.14 5.98 6.93 6.91 6.55 6.07 9.92 9.37 8.73 8.43

PITF 3.50 4.34 4.47 4.66 4.11 5.45 5.64 5.80 6.58 7.23 7.34 7.55

CTS 5.06 5.56 5.40 5.26 6.09 6.13 5.79 5.68 7.64 8.50 7.98 7.39

We-Rank 4.35 4.96 5.17 5.24 6.99 6.91 6.74 6.80 9.85 9.59 9.05 9.00

Table 4.6. F1-Score at various Top-𝑁 positions on MovieLens dataset


4.3.6 Summary of Weighted Tensor Factorization

In this section, a weighted tensor factorization method, named Recommendation

Ranking using Weighted Tensor (We-Rank), is proposed to address the sparsity

problem and accuracy challenges in using tensor models in a tag-based item

recommendation system.


The experimental results have shown that:

The implementation of weighted tensor 𝒲 solves the sparsity problem within primary tensor 𝒴;

On dense datasets, We-Rank outperforms the benchmarking methods, which indicates that it effectively utilises the weighted scheme to reward or penalise each entry of the primary tensor model during the learning process. However, its underperformance in comparison to the benchmarking methods on most of the datasets is due to its sensitivity to how well the users’ tag usage likeliness, which is the driving factor for populating the weight values, is captured.

4.4 CHAPTER SUMMARY

This chapter has detailed the two point-wise based ranking recommendation methods

developed in this thesis, namely Tensor-based Item Recommendation using

Probabilistic Ranking (TRPR) and Recommendation Ranking using Weighted Tensor

(We-Rank), to solve the tag-based item recommendation task. Both methods

implement the boolean scheme to populate the entries of the tensor model.

TRPR focuses on improving the scalability during the tensor reconstruction

process and the recommendation accuracy after the tensor model has been

reconstructed. As demonstrated in the results, the implementation of an 𝑛-mode block-striped (matrix) product makes the full tensor reconstruction scalable for large datasets. TRPR outperforms the benchmarking methods in terms of accuracy across variations of 𝑝-core and factorization techniques. We-Rank focuses on dealing with the sparsity problem and improving the recommendation accuracy during the learning-to-rank procedure. The experimental results have demonstrated that the implementation of weighted tensor 𝒲 solves the sparsity problem within primary tensor 𝒴. We-Rank outperforms the benchmarking methods on dense datasets only, since it is sensitive to how well the user’s tag usage likeliness, which is used to populate the weight values, is captured.


Chapter 5: List-wise based Ranking Methods

This chapter presents the developed list-wise based ranking recommendation

methods based on multi-graded data. It begins with an introduction of the list-wise

based ranking approach. The next two sections detail the proposed methods, i.e. Do-

Rank: Learning from multi-graded data and Go-Rank: Learning from graded-

relevance data, in which the novel User-Tag Set (UTS) and graded-relevance

schemes are implemented to interpret the tagging data. For each method, the

experiments are conducted and the results are then discussed.

5.1 INTRODUCTION

When the recommendation task is solved using a list-wise based ranking approach, the recommendation problem is seen as a ranking learning problem, i.e. predicting an

ordered list of items that will be of interest to a user, in which the predicted entries

depend on other corresponding entries (Liu, 2009; Mohan et al., 2011; Rendle,

2011). In this case, a recommendation model should be optimized with respect to the

ranking evaluation measure so that a list of items optimized from the ranking

evaluation measure perspective can be recommended to each user (Chapelle and Wu,

2010; Cremonesi et al., 2010; Xu and Li, 2007). The task of a tag-based item

recommendation system is to generate the list of items that may be of interest to a

user, by learning from the user’s past tagging behaviour that is recorded in tagging

data. The list of recommended items is sorted in descending order based on the

predicted preference score that reflects the preference level of a user for annotating

an item using a tag. Given that users usually show more interest in the few items at

the top of the list than those further down the list (Cremonesi et al., 2010), the order

of items in the recommendation list is crucial. In this case, the recommendation task

can be regarded as a ranking problem and solved by implementing the list-wise based

ranking approach.


5.1.1 Challenges

Implementing the list-wise based ranking approach in a tag-based item

recommendation system makes it natural to implement an interpretation scheme that

can leverage the tagging data as a ranking representation, for building the user

profile. In other words, such a scheme must apply a ranking constraint to interpret

the tagging data, resulting in non-binary relevance (or multi-graded) input data.

According to the set-based scheme, the observed tagging data can be

customarily interpreted as “positive” (or “relevant”) entries as users have implicitly

expressed their interest in items using tags. On the other hand, the non-observed

tagging data can reveal two types of information: (1) “negative” (or “irrelevant”)

entries that define the state where a user is not interested in the items; or (2) “null”

(or “indecisive”) entries that define the state where a user might be interested in

items in the future (Rendle, Balby Marinho, et al., 2009). These indecisive entries

need to be predicted and become candidate recommendations. Accordingly, entries

of tagging data can be labelled using the ordinal relevance set of

{𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡, 𝑖𝑛𝑑𝑒𝑐𝑖𝑠𝑖𝑣𝑒, 𝑖𝑟𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡} for a tuple of ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ resulting in

multi-graded data (Ifada and Nayak, 2014a, 2015; Rendle, Balby Marinho, et al.,

2009; Rendle and Schmidt-Thieme, 2010).

To implement the list-wise based ranking approach, a choice of optimization

criterion is crucial as it controls the latent factors learning process and it depends on

the type of data used (Chapelle and Wu, 2010; Liu, 2009). Mean Average Precision

(MAP), Mean Reciprocal Rank (MRR) and Discounted Cumulative Gain (DCG) are

widely used measures for evaluating the ranking performance of a ranking model

(Liu, 2009). MAP is defined as the mean value of Average Precision (AP) that

considers the rank position of each relevant item. In this case, AP is the average of

precision scores at the positions where there are relevant items (Buckley and

Voorhees, 2000; Chapelle and Wu, 2010). MRR is the mean of Reciprocal Rank

(RR), which is equivalent to MAP in cases where the user wishes to see only one

relevant item (Craswell, 2009; Voorhees, 1999). The RR itself is the reciprocal of the

rank of the first relevant item.
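The AP and RR definitions above can be sketched as follows. The sketch assumes the common convention of dividing the summed precision by the number of relevant items (so a relevant item that is never ranked contributes zero); MAP and MRR are then the means of these per-user values. The function names are hypothetical.

```python
from typing import Iterable, Sequence


def average_precision(ranked: Sequence[int], relevant: Iterable[int]) -> float:
    """Sum of precision@k at ranks k holding a relevant item, over the number of relevant items."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def reciprocal_rank(ranked: Sequence[int], relevant: Iterable[int]) -> float:
    """1 / rank of the first relevant item, or 0 if none is ranked."""
    relevant = set(relevant)
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / k
    return 0.0


ranking = [10, 20, 30, 40]
print(average_precision(ranking, {10, 30}))  # (1/1 + 2/3) / 2
print(reciprocal_rank(ranking, {30}))        # 1/3
```

With a single relevant item, both functions return the same value, which is the sense in which MRR coincides with MAP.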

The nature of MAP and MRR makes them commonly used measures for optimizing binary relevance input data (Chapelle and Wu, 2010; Liu, 2009; Shi, Karatzoglou, Baltrunas, Larson, Hanjalic, et al., 2012; Shi, Karatzoglou, Baltrunas, Larson, Oliver, et al., 2012). They are therefore deemed unsuitable as the optimization criterion for solving the recommendation task in tag-based recommendation systems in which entries of tagging data are interpreted as multi-graded data. Instead, DCG is more widely used in the case of multi-graded data (Chapelle and Wu, 2010; Liu, 2009; Weimer et al., 2007). DCG assumes that the higher the ranked position of a relevant item, the more important it is to the user and the more likely it is to be selected (Järvelin and Kekäläinen, 2002). However, directly optimizing DCG across all users in the recommendation model is computationally expensive. To deal with this, a fast learning approach is desirable for a scalable learning process.
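For reference, one common formulation of DCG uses an exponential gain 2^𝑔 − 1 for relevance grade 𝑔 and a discount of log2(𝑘 + 1) at rank 𝑘; the sketch below assumes that variant, which may differ from the exact DCG form used later for Do-Rank. NDCG normalises by the DCG of the ideal ordering. The function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def dcg(grades: Sequence[float], n: Optional[int] = None) -> float:
    """DCG@n with exponential gain (2^g - 1) and log2(k + 1) discount at rank k."""
    top = grades if n is None else grades[:n]
    return sum((2.0 ** g - 1.0) / math.log2(k + 1) for k, g in enumerate(top, start=1))


def ndcg(grades: Sequence[float], n: Optional[int] = None) -> float:
    """DCG@n divided by the DCG@n of the ideal (descending) ordering."""
    ideal = dcg(sorted(grades, reverse=True), n)
    return dcg(grades, n) / ideal if ideal > 0 else 0.0


# Relevance grades of a ranked list, in the order returned by a model.
print(dcg([3, 2, 0]))   # 7 + 3 / log2(3), roughly 8.89
print(ndcg([0, 3, 2]))  # below 1 because the best item is not ranked first
```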

Moreover, this thesis shows that not all non-observed entries, on each observed

user-tag set, should simply be regarded as “irrelevant” entries (Ifada and Nayak,

2014a). Those entries are not the “indecisive” entries as the items of the entries have

already been selected by the user in the past and, therefore, they are not required to

be predicted in the future. This raises a new problem of how to further differentiate the non-observed entries, in which case the tagging data is interpreted as graded-relevance data (Ifada and Nayak, 2016), since there exist transitional entries between the “relevant” and “irrelevant” entries.

To learn from the graded-relevance data, DCG can no longer be used as the

optimization criterion for implementing the list-wise based ranking approach as it is

not suitable to handle the graded data with “transitional” entries (Ifada and Nayak,

2016). Alternatively, Graded Average Precision (GAP) (Robertson et al., 2010) has

been shown to work effectively as a generalisation of Average Precision (AP) for explicit feedback rating data (Robertson et al., 2010). Yet, GAP has never been used in a tag-based recommendation system.

5.1.2 Proposed Solutions

The proposed Do-Rank and Go-Rank methods fall under the category of list-wise

based ranking approach since they use the ranking evaluation measure as the

recommendation model optimization criterion. As different types of data require different optimization criteria in the learning-to-rank approach (Chapelle and Wu, 2010; Liu, 2009), the two alternative forms of the interpreted tagging data require different optimization criteria for learning the recommendation model.

Do-Rank, the first developed method in this chapter, focuses on learning from

multi-graded data, in which the Discounted Cumulative Gain (DCG) ranking evaluation

measure is used as the optimization criterion; whereas Go-Rank focuses on learning

from the graded-relevance data, in which the Graded Average Precision (GAP)

ranking evaluation measure is used as the optimization criterion. For constructing the

user profile representation, the developed Do-Rank and Go-Rank methods implement

the novel ranking based interpretation schemes, namely User-Tag Set (UTS) and

graded-relevance schemes, respectively. The next two sections present each of the

methods.

5.2 DO-RANK: LEARNING FROM MULTI-GRADED DATA

5.2.1 Overview

The novel DCG Optimization for Learning-to-Rank (Do-Rank) method is developed

for learning from multi-graded data on a tag-based item recommendation model, in

which the DCG ranking evaluation measure is used as the optimization criterion. In

other words, this method generates an optimal list of recommended items from the

DCG perspective for all users. This section also presents the proposed User-Tag Set

(UTS) scheme, based on the set-based scheme (Rendle, Balby Marinho, et al., 2009), to construct

the initial third-order tensor model for representing the user profile.

Previous work (Weimer et al., 2007) has proposed using Normalized DCG

(NDCG) as the recommendation model optimization criterion, yet the problem

solved was for a recommendation system that uses ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩ binary relation

rating data entries as explicit feedback data. This means that the list of

recommendations is generated by ranking the predicted rating scores inferred from

the non-observed ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩ relations (Balakrishnan and Chopra, 2012; Weimer et

al., 2007). In contrast, this thesis deals with a quite different and more difficult problem than the prior work, as tag-based recommendation systems use ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relation tagging data entries as implicit feedback data.


The key challenges faced are inferring the latent relationships of ternary input

data and predicting the preference score of each entry (Ifada and Nayak, 2014a;

Rendle, Balby Marinho, et al., 2009). In contrast to the explicit feedback data, which

has one preference or rating score only on each observed user-item set, the implicit

feedback data of the tag-based recommendation system has multiple preference

scores. This means that the list of recommendations needs to be generated by ranking

the predicted preference scores of items under all tags that may be of interest to a

user. Consequently, the preference scores calculated by the recommendation system

must infer the tag that will influence the user for choosing the recommended item.

The next three sub-sections detail the three main processes in Do-Rank: (1)






The user profile construction is the process of building an initial tensor to model the

multi-dimension data. The Do-Rank method uses the proposed User-Tag Set (UTS)

scheme, based on the set-based scheme (Rendle, Balby Marinho, et al., 2009), to populate the

tensor model.



𝐴 ∶= 𝑈 × 𝐼 × 𝑇, a vector of 𝑎 = (𝑢, 𝑖, 𝑡) ∈ 𝐴 represents the tagging activity of user 𝑢

to annotate item 𝑖 using tag 𝑡. The observed tagging data, 𝐴𝑜𝑏, defines the state in

which users have expressed their interest to items in the past, by annotating those

items using tags, where 𝐴𝑜𝑏 ⊆ 𝐴. The observed tagging data are usually very sparse, thus |𝐴𝑜𝑏| ≪ |𝐴|. An initial tensor 𝒴 ∈ ℝ𝑄×𝑅×𝑆 is constructed to represent the user profile, where 𝑄, 𝑅, and 𝑆 are the sizes of the sets of users, items and tags, respectively. Each tensor entry, 𝑦𝑢,𝑖,𝑡, is given a numerical value that represents the relevance grade of the tagging activity.

The set-based scheme is a well-known ranking based interpretation scheme

that has shown good performance when implemented on a pair-wise ranking based


approach (Rendle, Balby Marinho, et al., 2009; Rendle and Schmidt-Thieme, 2010).

In contrast to the boolean scheme that interprets tagging data as binary entries

(Symeonidis et al., 2010), the set-based scheme interprets the tagging data as multi-

graded data of three distinct entries that are revealed from the observed and non-

observed entries, i.e. “relevant”, “irrelevant”, and “indecisive”. The set-based

scheme solves the two shortcomings of the boolean scheme: (1) the sparsity problem

– the 0 values dominate the data, and (2) the overfitting problem – all non-observed

entries are denoted as 0 (Rendle, Balby Marinho, et al., 2009).

The set-based scheme differentiates the relevance grade of the resulted multi-

graded data by applying a ranking constraint. Given the “relevant” and “irrelevant”

entries, the scheme infers that, for each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set, user 𝑢 is less inclined to use tag 𝑡 for annotating any items other than those of “relevant” entries (Gemmell et al., 2011). For that reason, the “relevant” entries are assigned the higher ordinal relevance value and labelled “1”, whereas the “irrelevant” entries are labelled “−1”. The “0” value is used to label the “indecisive” entries, i.e.

entries to be predicted for generating the recommendations. The rules of set-based

scheme relevance grade labelling to generate the entries of tensor 𝒴 can be

formulated as follows:


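$$y_{u,i,t} = \begin{cases} 1 & \text{if } (u,i,t) \in A_{ob} \\ -1 & \text{if } (u,i,t) \notin A_{ob} \text{ and } (u,*,t) \in A_{ob} \\ 0 & \text{otherwise} \end{cases} \tag{5.1}$$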
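Equation (5.1) can be sketched in Python as follows. The observed triples are those of the toy example (the User 1 entries described in Example 5.1, plus the User 2 and User 3 entries graded “1” in Figure 5.1), with 𝑢1, 𝑖1 and 𝑡1 mapped to index 0; the function name is hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[int, int, int]  # (user, item, tag), 0-based indices


def set_based_tensor(observed: Set[Triple], shape: Tuple[int, int, int]) -> List[List[List[int]]]:
    """Label entries with 1 / -1 / 0 following Equation (5.1)."""
    n_users, n_items, n_tags = shape
    used = {(u, t) for (u, _, t) in observed}  # observed (u, t) sets
    y = [[[0] * n_tags for _ in range(n_items)] for _ in range(n_users)]
    for u in range(n_users):
        for i in range(n_items):
            for t in range(n_tags):
                if (u, i, t) in observed:
                    y[u][i][t] = 1   # relevant
                elif (u, t) in used:
                    y[u][i][t] = -1  # irrelevant
                # otherwise the entry stays 0: indecisive, to be predicted
    return y


# Observed triples of the toy example (u1, i1, t1 map to index 0).
OBSERVED = {(0, 1, 0), (0, 1, 2), (0, 2, 3),
            (1, 0, 0), (1, 2, 1), (1, 2, 3),
            (2, 2, 1), (2, 3, 3), (2, 3, 4)}
Y = set_based_tensor(OBSERVED, (3, 4, 5))
print(Y[0])  # [[-1, 0, -1, -1, 0], [1, 0, 1, -1, 0], [-1, 0, -1, 1, 0], [-1, 0, -1, -1, 0]]
```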

Example 5.1: Tagging Data Interpretation using set-based scheme.

An example of how the set-based scheme interprets tagging data is illustrated by

using the entries of User 1 (𝑢1) in the toy example illustrated in Figure 3.2. Recall

that the toy example represents a tensor model that holds the record of 𝐴𝑜𝑏 and

𝒴 ∈ ℝ3×4×5 where 𝑈 = {𝑢1, 𝑢2, 𝑢3}, 𝐼 = {𝑖1, 𝑖2, 𝑖3, 𝑖4}, and 𝑇 = {𝑡1, 𝑡2, 𝑡3, 𝑡4, 𝑡5}.

Each slice of the tensor represents a user matrix that contains the user tag usage for

each item. The “+” symbols represent the 𝐴𝑜𝑏 entries, for instance, the observed

tagging data example of Figure 3.2 shows that user 𝑢1 has annotated item 𝑖2 using

tag 𝑡1. Figure 5.1(a) illustrates the constructed initial tensor 𝒴 ∈ ℝ3×4×5 as the

representation of user profile, in which entries are generated from the tagging data by

implementing the set-based interpretation scheme as formulated in Equation (5.1).


In Figure 3.2, the observed entries show that 𝑢1 has used 𝑡1, 𝑡3, and 𝑡4 to reveal his

interest for {𝑖2}, {𝑖2}, and {𝑖3}, respectively. Given 𝐼 = {𝑖1, 𝑖2, 𝑖3, 𝑖4}, the tagging data

is interpreted as 𝑢1 favours: (1) {𝑖2} more than {𝑖1, 𝑖3, 𝑖4} to be annotated using 𝑡1; (2)

{𝑖2} more than {𝑖1, 𝑖3, 𝑖4} to be annotated using 𝑡3; and (3) {𝑖3} more than {𝑖1, 𝑖2, 𝑖4}

to be annotated using 𝑡4. The representation, as shown in Figure 5.1(a), can then be

generated as: (1) on (𝑢1, 𝑡1) set, entry of {𝑖2} is “relevant” (or graded as “1”) while

those of {𝑖1, 𝑖3, 𝑖4} are “irrelevant” (or graded as “-1”); (2) on (𝑢1, 𝑡3) set, entry of

{𝑖2} is “relevant” while those of {𝑖1, 𝑖3, 𝑖4} are “irrelevant”; and (3) on (𝑢1, 𝑡4) set,

entry of {𝑖3} is “relevant” while those of {𝑖1, 𝑖2, 𝑖4} are “irrelevant”. Notice that the

“indecisive” entries, graded as “0”, are revealed from the non-observed sets, i.e.

(𝑢1, 𝑡2) and (𝑢1, 𝑡5).

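(a) Set-based scheme (rows: items 𝑖1 to 𝑖4; columns: tags 𝑡1 to 𝑡5)
         User 1             User 2             User 3
      t1 t2 t3 t4 t5     t1 t2 t3 t4 t5     t1 t2 t3 t4 t5
  i1  -1  0 -1 -1  0      1 -1  0 -1  0      0 -1  0 -1 -1
  i2   1  0  1 -1  0     -1 -1  0 -1  0      0 -1  0 -1 -1
  i3  -1  0 -1  1  0     -1  1  0  1  0      0  1  0 -1 -1
  i4  -1  0 -1 -1  0     -1 -1  0 -1  0      0 -1  0  1  1

(b) UTS scheme (rows: items 𝑖1 to 𝑖4; columns: tags 𝑡1 to 𝑡5)
         User 1             User 2             User 3
      t1 t2 t3 t4 t5     t1 t2 t3 t4 t5     t1 t2 t3 t4 t5
  i1  -1  0 -1 -1  0      1  0  0  0  0      0 -1  0 -1 -1
  i2   1  0  1  0  0     -1 -1  0 -1  0      0 -1  0 -1 -1
  i3   0  0  0  1  0      0  1  0  1  0      0  1  0  0  0
  i4  -1  0 -1 -1  0     -1 -1  0 -1  0      0  0  0  1  1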

Figure 5.1. The initial tensor 𝒴 ∈ ℝ3×4×5, as the representation of user profile, whose entries are generated by implementing the: (a) set-based and (b) UTS interpretation schemes

Despite the capability of the set-based scheme in generating multi-graded

values from the tagging data, this scheme overgeneralises its interpretation, since it

completely disregards the fact that the “irrelevant” items, on each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set,

have possibly been annotated by the user using other tags. Motivated by this

shortcoming, preliminary work (Ifada and Nayak, 2014a) was conducted to study the

variation of the set-based scheme, in which the impact of regarding the user’s

previously annotated items in the tagging data interpretation was investigated. In the

study, on each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set, items that have not been tagged by user 𝑢 are

considered as “irrelevant” entries, whereas those that have been tagged by a user are

considered as “indecisive” entries. This variant offers two advantages over the set-based scheme. Firstly, it can interpret the tagging data more efficiently (Ifada and Nayak, 2014a), as it infers the user’s tagging history more thoroughly by taking into account the user’s collection of previously selected items; therefore, only items that are not within the collection are regarded as “irrelevant” entries. For this reason, this variant of the set-based scheme was called the non-user-collection on User-Tag Set scheme. For simplicity, it is called the User-Tag Set (UTS) scheme in this thesis. Secondly, the user’s item collection constraint makes this scheme produce sparser interpreted entries than the set-based scheme. In other words, there is less data for the model to learn from. The rules of the UTS scheme relevance grade labelling to generate the entries of tensor 𝒴 can be formulated as follows:


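$$y_{u,i,t} = \begin{cases} 1 & \text{if } (u,i,t) \in A_{ob} \\ -1 & \text{if } (u,i,t) \notin A_{ob} \text{ and } i \in I \setminus \{ i \mid (u,i,*) \in A_{ob} \} \text{ and } (u,*,t) \in A_{ob} \\ 0 & \text{otherwise} \end{cases} \tag{5.2}$$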
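Equation (5.2) differs from Equation (5.1) only in the extra item-collection condition, which the sketch below adds as a membership test on the (𝑢, 𝑖) pairs the user has already tagged. The toy triples and the 0-based mapping (𝑢1, 𝑖1 and 𝑡1 as index 0) follow the toy example used in Example 5.1; the function name is hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[int, int, int]  # (user, item, tag), 0-based indices


def uts_tensor(observed: Set[Triple], shape: Tuple[int, int, int]) -> List[List[List[int]]]:
    """Label entries with 1 / -1 / 0 following Equation (5.2)."""
    n_users, n_items, n_tags = shape
    used = {(u, t) for (u, _, t) in observed}       # observed (u, t) sets
    collected = {(u, i) for (u, i, _) in observed}  # items each user has tagged
    y = [[[0] * n_tags for _ in range(n_items)] for _ in range(n_users)]
    for u in range(n_users):
        for i in range(n_items):
            for t in range(n_tags):
                if (u, i, t) in observed:
                    y[u][i][t] = 1   # relevant
                elif (u, t) in used and (u, i) not in collected:
                    y[u][i][t] = -1  # irrelevant: item outside the user's collection
                # otherwise the entry stays 0: indecisive
    return y


OBSERVED = {(0, 1, 0), (0, 1, 2), (0, 2, 3),
            (1, 0, 0), (1, 2, 1), (1, 2, 3),
            (2, 2, 1), (2, 3, 3), (2, 3, 4)}
Y = uts_tensor(OBSERVED, (3, 4, 5))
print(Y[0])  # [[-1, 0, -1, -1, 0], [1, 0, 1, 0, 0], [0, 0, 0, 1, 0], [-1, 0, -1, -1, 0]]
```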

Example 5.2: Tagging Data Interpretation using UTS scheme.

An example of how the UTS scheme interprets tagging data is illustrated by using

the entries of User 1 (𝑢1) in Figure 3.2. Recall that the toy example represents a

tensor model that holds the record of 𝐴𝑜𝑏, 𝒴 ∈ ℝ3×4×5 where 𝑈 = {𝑢1, 𝑢2, 𝑢3},

𝐼 = {𝑖1, 𝑖2, 𝑖3, 𝑖4}, and 𝑇 = {𝑡1, 𝑡2, 𝑡3, 𝑡4, 𝑡5}. Figure 5.1(b) illustrates the constructed

initial tensor 𝒴 ∈ ℝ3×4×5 as the representation of user profile, in which entries are

generated from the tagging data by implementing the UTS interpretation scheme as

formulated in Equation (5.2).

In Figure 3.2, the observed entries show that 𝑢1 has used 𝑡1, 𝑡3, and 𝑡4 to reveal his

interest for {𝑖2}, {𝑖2}, and {𝑖3}, respectively. Given 𝐼 = {𝑖1, 𝑖2, 𝑖3, 𝑖4}, the tagging data

is interpreted as 𝑢1 favours: (1) {𝑖2} more than {𝑖1, 𝑖4} to be annotated using 𝑡1; (2)

{𝑖2} more than {𝑖1, 𝑖4} to be annotated using 𝑡3; and (3) {𝑖3} more than {𝑖1, 𝑖4} to be

annotated using 𝑡4. Note that, in the same order, the same statement cannot be made for: (1) {𝑖3}, as it was annotated using 𝑡4; (2) {𝑖3}, as it was annotated using 𝑡4; and (3) {𝑖2}, as it was annotated using 𝑡1 and 𝑡3.

As shown in Figure 5.1(b), the user profile representation can then be generated as:

(1) on (𝑢1, 𝑡1) set, entry of {𝑖2} is “relevant” (or graded as “1”) while those of {𝑖1, 𝑖4}

are “irrelevant” (or graded as “-1”); (2) on (𝑢1, 𝑡3) set, entry of {𝑖2} is “relevant”

while those of {𝑖1, 𝑖4} are “irrelevant”; and (3) on (𝑢1, 𝑡4) set, entry of {𝑖3} is

“relevant” while those of {𝑖1, 𝑖4} are “irrelevant”. Note that the “indecisive” entries, graded as “0”, come from: (1) the entries of the observed (𝑢1, 𝑡1), (𝑢1, 𝑡3), and (𝑢1, 𝑡4) sets that are neither “relevant” nor “irrelevant”; and (2) all entries of the non-observed sets, i.e. (𝑢1, 𝑡2) and (𝑢1, 𝑡5).

The examples of the set-based scheme (Figure 5.1(a)) and the UTS scheme

(Figure 5.1(b)) show that the former overgeneralises the “irrelevant” entries. The set-

based scheme assumes that, on each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set, any items other than those

appearing in observed entries are “irrelevant” and this disregards the fact that those

items have been annotated by the user using other tags. In contrast, the UTS scheme

states that the “irrelevant” entries should only be interpreted from items that have not

been tagged by the user, on each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set.
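The contrast between the two schemes on the toy user 𝑢1 can be checked with a short Python sketch. It assumes the observed triples of 𝑢1 stated in Example 5.2, i.e. (𝑢1, 𝑖2, 𝑡1), (𝑢1, 𝑖2, 𝑡3), and (𝑢1, 𝑖3, 𝑡4), and uses hypothetical names (`interpret`, `A_ob`) chosen for illustration; it reproduces only the labelling rules, not any implementation from this thesis.

```python
# Observed triples of user u1 in the toy example (Figure 3.2), as stated in Example 5.2.
A_ob = {("u1", "i2", "t1"), ("u1", "i2", "t3"), ("u1", "i3", "t4")}
I = ["i1", "i2", "i3", "i4"]
T = ["t1", "t2", "t3", "t4", "t5"]


def interpret(A_ob, user, scheme):
    """Label every (item, tag) entry of `user` as 1 (relevant), -1 (irrelevant) or 0 (indecisive)."""
    observed_tags = {t for (u, _, t) in A_ob if u == user}   # tags with (u, *, t) in A_ob
    collection = {i for (u, i, _) in A_ob if u == user}      # items with (u, i, *) in A_ob
    y = {}
    for i in I:
        for t in T:
            if (user, i, t) in A_ob:
                y[(i, t)] = 1
            elif t in observed_tags and (scheme == "set-based" or i not in collection):
                # set-based: every other item is irrelevant; UTS: only items never tagged by the user
                y[(i, t)] = -1
            else:
                y[(i, t)] = 0
    return y


set_based = interpret(A_ob, "u1", "set-based")
uts = interpret(A_ob, "u1", "UTS")
print(sum(v == -1 for v in set_based.values()), sum(v == -1 for v in uts.values()))  # 9 6
```

Under the set-based scheme the entry (𝑖3, 𝑡1) is labelled −1, whereas under UTS it stays 0 because 𝑢1 has tagged 𝑖3 with 𝑡4; this is exactly the overgeneralisation described above.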


This section details how the latent factors corresponding to each dimension of tensor

𝒴 are learned and generated.


Discounted Cumulative Gain (DCG) is a widely used measure for evaluating the performance of ranking models on data with multiple relevance grades (Chapelle and Wu, 2010), whereas other measures such as Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR) are more commonly used for binary relevance data. Since the recommendation model is optimized with respect to the evaluation measure so that it can generate a quality Top-𝑁 recommendation list (Chapelle and Wu, 2010; Cremonesi et al., 2010; Xu and Li, 2007), the recommendation task becomes recommending an optimal item list (from the DCG perspective) to users using the latent factors.

DCG is well suited to this recommendation task: it yields a quality Top-𝑁 recommendation list because it treats the correct ordering of higher-ranked items as more important than that of lower-ranked items (Balakrishnan and Chopra, 2012; Chapelle and Wu, 2010). In other words, higher positions have more influence on the DCG score. The DCG score for a user 𝑢 across all items under tag 𝑡 can be defined as:

$$DCG_{u,t} := \sum_{i \in I} \frac{2^{y_{u,i,t}} - 1}{\log_2(1 + r_{u,i,t})} \tag{5.3}$$


where 𝑦𝑢,𝑖,𝑡 is the relevance grade, assigned a value from the ordinal relevance set {𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡, 𝑖𝑛𝑑𝑒𝑐𝑖𝑠𝑖𝑣𝑒, 𝑖𝑟𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡} (or {1, 0, −1}) based on the initial tensor model. 𝑟𝑢,𝑖,𝑡 is the ranking position of item 𝑖 for user 𝑢 with tag 𝑡, and is approximated using $\hat{y}_{u,i,t}$, i.e. the predicted preference score that reflects the preference level of user 𝑢 for annotating item 𝑖 using tag 𝑡, calculated from the latent factors. The numerator of Equation (5.3) is the gain function, which weights items by their relevance grade, while the denominator is the discount function, which makes items lower in the ranked list contribute less to the score.
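A minimal Python sketch of Equation (5.3) makes the gain and discount concrete. It assumes exact rank positions obtained by sorting the predicted scores (ties broken by item index) and uses the UTS grades of the (𝑢1, 𝑡1) set from Example 5.2 together with made-up scores; `dcg_ut` is a hypothetical helper name.

```python
import math


def dcg_ut(grades, scores):
    """DCG_{u,t} of Equation (5.3) for one (user, tag) pair.

    grades[k] is the relevance grade y_{u,i,t} of item k (1, 0 or -1).
    scores[k] is its predicted preference score; the rank position r_{u,i,t} is the
    exact 1-based position after sorting by descending score (ties broken by index).
    """
    order = sorted(range(len(scores)), key=lambda k: (-scores[k], k))
    rank = {item: pos + 1 for pos, item in enumerate(order)}
    return sum((2 ** grades[k] - 1) / math.log2(1 + rank[k]) for k in range(len(grades)))


# UTS grades of (u1, t1) for items i1..i4: i2 relevant, i3 indecisive, i1 and i4 irrelevant.
grades = [-1, 1, 0, -1]
good = dcg_ut(grades, [0.1, 0.9, 0.5, 0.2])   # i2 ranked first
bad = dcg_ut(grades, [0.9, 0.1, 0.5, 0.2])    # i2 ranked last
print(round(good, 4), round(bad, 4))
```

With grades in {1, 0, −1}, the gain 2^𝑦 − 1 takes the values 1, 0, and −0.5, so placing “irrelevant” items near the top of the list lowers the score, while placing the “relevant” item first raises it.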

The DCG score of all users over all items under all tags can be defined as:

$$DCG := \frac{1}{QS} \sum_{u \in U} \sum_{t \in T} \sum_{i \in I} \frac{2^{y_{u,i,t}} - 1}{\log_2(1 + r_{u,i,t})} \tag{5.4}$$

From the tensor 𝒴, latent factor matrices are generated and learned in order to derive the latent relationships between the user, item, and tag dimensions. To develop Do-Rank, the CP model (Kolda and Bader, 2009) is used as both the factorization technique and the predictor function. CP is a well-known factorization technique that has been shown to be less expensive than Tucker in both memory and time (Kolda and Bader, 2009).

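[Figure 5.2: $\mathcal{Y}\ (Q \times R \times S) \approx m^{(1)}_1 \circ m^{(2)}_1 \circ m^{(3)}_1 + m^{(1)}_2 \circ m^{(2)}_2 \circ m^{(3)}_2 + \dots + m^{(1)}_F \circ m^{(2)}_F \circ m^{(3)}_F$]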


As illustrated in Figure 5.2, CP factorizes a third-order tensor 𝒴 ∈ ℝ𝑄×𝑅×𝑆 into a sum of 𝐹 rank-one tensors formed by the latent factor vectors $m^{(1)}_f \in \mathbb{R}^Q$, $m^{(2)}_f \in \mathbb{R}^R$, and $m^{(3)}_f \in \mathbb{R}^S$ for 𝑓 = 1, …, 𝐹, where 𝐹 is the column size of the corresponding latent factor matrices. These latent factors are used to calculate the predicted score that reflects the preference level of a user 𝑢 for annotating an item 𝑖 using a tag 𝑡. Recall that a CP model can also be considered a special case of Tucker with a diagonal core tensor (Kolda and Bader, 2009), as illustrated in Figure 4.4. The predicted preference score is calculated as:

$$\hat{y}_{u,i,t} := \sum_{f=1}^{F} m^{(1)}_{u,f} \cdot m^{(2)}_{i,f} \cdot m^{(3)}_{t,f} = [\![ M^{(1)}, M^{(2)}, M^{(3)} ]\!] \tag{5.5}$$
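The CP predictor can be sketched with NumPy. The sizes match the toy tensor (𝑄 = 3, 𝑅 = 4, 𝑆 = 5), while 𝐹 = 8 and the random factor values are arbitrary assumptions; the sketch only shows that the per-entry sum in Equation (5.5), the full reconstruction, and the sum of rank-one tensors agree.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, R, S, F = 3, 4, 5, 8                # toy sizes: users, items, tags, latent columns (F chosen arbitrarily)
M1 = rng.normal(size=(Q, F))           # user factor matrix M^(1)
M2 = rng.normal(size=(R, F))           # item factor matrix M^(2)
M3 = rng.normal(size=(S, F))           # tag factor matrix M^(3)


def predict(u, i, t):
    """Equation (5.5): sum over f of m^(1)_{u,f} * m^(2)_{i,f} * m^(3)_{t,f}."""
    return float(np.sum(M1[u] * M2[i] * M3[t]))


# The full reconstruction [[M^(1), M^(2), M^(3)]] as a Q x R x S tensor.
Y_hat = np.einsum("uf,if,tf->uit", M1, M2, M3)

# Equivalent view: a sum of F rank-one tensors (outer products of matching factor columns).
rank_one_sum = sum(np.einsum("u,i,t->uit", M1[:, f], M2[:, f], M3[:, f]) for f in range(F))
print(Y_hat.shape, predict(1, 2, 3))
```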

5.2.3.2 Ranking Smoothing

It can be seen from Equation (5.4) that DCG depends on the ranking positions, which change in a non-smooth way with respect to the predicted preference scores calculated from the model parameters (i.e. the latent factor matrices). This non-smoothness makes it difficult to apply standard optimization approaches such as gradient descent, since they require a smooth objective function (Chapelle and Wu, 2010; Wu et al., 2009).

The proposed method Do-Rank solves the non-smoothness problem of DCG by approximating the ranking position 𝑟𝑢,𝑖,𝑡 with a function that is smooth with respect to the model parameters. Inspired by the learning-to-rank approach from the field of Information Retrieval (Chapelle and Wu, 2010; Wu et al., 2009), 𝑟𝑢,𝑖,𝑡 is approximated by the following smoothing function:

$$r_{u,i,t} \approx 1 + \sum_{j \neq i} \sigma(\Delta\hat{y}) \tag{5.6}$$

where $\sigma(x) = \frac{1}{1 + e^{-x}}$ is the logistic function, and $\Delta\hat{y} = \hat{y}_{u,i,t} - \hat{y}_{u,j,t}$ is the difference between the predicted preference scores of two items, calculated from the latent factors.
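How a sum of logistic terms can stand in for a rank position is illustrated by the sketch below. It is an illustration under stated assumptions, not the thesis code: it applies 𝜎 to the competing item's score minus item 𝑖's score (ŷ𝑢,𝑗,𝑡 − ŷ𝑢,𝑖,𝑡), so that every item scored above 𝑖 adds a term close to 1, and it compares the result with the exact 1-based rank. The function names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def exact_rank(scores):
    """1-based rank: 1 + number of other items with a strictly higher score."""
    s = np.asarray(scores, dtype=float)
    return 1 + (s[None, :] > s[:, None]).sum(axis=1)


def smoothed_rank(scores):
    """Differentiable stand-in: 1 + sum over j != i of sigmoid(score_j - score_i)."""
    s = np.asarray(scores, dtype=float)
    soft = sigmoid(s[None, :] - s[:, None])   # soft[i, j] = sigmoid(score_j - score_i)
    np.fill_diagonal(soft, 0.0)               # drop the j == i term
    return 1.0 + soft.sum(axis=1)


scores = [3.0, 1.0, 2.0]
print(exact_rank(scores))                                  # [1 3 2]
print(np.round(smoothed_rank(scores), 3))                  # soft ranks near, but not equal to, [1 3 2]
print(np.round(smoothed_rank([50 * s for s in scores]), 3))
```

Scaling the score gaps up drives each logistic term towards 0 or 1, so the smoothed value approaches the exact rank; for moderate gaps it remains differentiable in the scores, which is what gradient-based optimization needs.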

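Substituting Equation (5.6) into Equation (5.4), the smoothed approximation of DCG is obtained as:

$$sDCG := \frac{1}{QS} \sum_{u \in U} \sum_{t \in T} \sum_{i \in I} \frac{2^{y_{u,i,t}} - 1}{1 + \log_2\left(\sum_{j \neq i} \sigma(\Delta\hat{y})\right)} \tag{5.7}$$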

Figure 5.3 shows the comparison between DCG, calculated using Equation (5.4), and

the smoothed approximation of DCG (𝑠𝐷𝐶𝐺), calculated using Equation (5.7).


Figure 5.3. The comparison between DCG and the smoothed approximation of DCG (𝑠𝐷𝐶𝐺)


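Given Equation (5.6), the resulting objective function can now be formulated as:

$$L(\Theta) := \sum_{u \in U} \sum_{t \in T} \sum_{i \in I} \frac{2^{y_{u,i,t}} - 1}{1 + \log_2\left(\sum_{j \neq i} \sigma(\Delta\hat{y})\right)} - \lambda_\Theta \lVert \Theta \rVert_F^2 \tag{5.8}$$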

where Θ denotes the model parameters and 𝜆Θ is the regularization coefficient that controls overfitting. Note that the constant coefficient $\frac{1}{QS}$ in 𝑠𝐷𝐶𝐺 (Equation (5.7)) can be neglected since it has no influence on the optimization. Gradient ascent is performed to maximize the objective function in Equation (5.8). Given a case (𝑢, 𝑖, 𝑡), the gradient of 𝑠𝐷𝐶𝐺 with respect to the model parameters {𝑀(1), 𝑀(2), 𝑀(3)} is obtained by differentiating Equation (5.8) as follows:

$$\frac{\partial L}{\partial \theta} = \frac{\partial}{\partial \theta} \left( \sum_{u \in U} \sum_{t \in T} \sum_{i \in I} \frac{2^{y_{u,i,t}} - 1}{1 + \log_2\left(\sum_{j \neq i} \sigma(\Delta\hat{y})\right)} \right) - \lambda_\theta \theta \tag{5.9}$$

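which can be rewritten as:

$$\frac{\partial L}{\partial \theta} = \sum_{u \in U} \sum_{t \in T} \sum_{i \in I} \frac{-(2^{y_{u,i,t}} - 1) \left[ \frac{1}{\left(\sum_{j \neq i} \sigma(\Delta\hat{y})\right) \ln 2} \left( \frac{\partial}{\partial \theta} \sum_{j \neq i} \sigma(\Delta\hat{y}) \right) \right]}{\left[ 1 + \log_2\left(\sum_{j \neq i} \sigma(\Delta\hat{y})\right) \right]^2} - \lambda_\theta \theta \tag{5.10}$$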

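Given that:

$$\frac{\partial}{\partial \theta} \left( \sum_{j \neq i} \sigma(\Delta\hat{y}) \right) = \sum_{j \neq i} \left( -\sigma(\Delta\hat{y}) + \sigma(\Delta\hat{y})^2 \right) \frac{\partial}{\partial \theta} \Delta\hat{y} \tag{5.11}$$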

[Figure 5.3 plot: DCG vs Smoothed DCG (sDCG); x-axis: Point; y-axis: Score; series: DCG, Smoothed DCG (sDCG)]


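Replacing $\sigma(\Delta\hat{y})$ with $\delta$ for notational convenience, the resulting gradient of 𝑠𝐷𝐶𝐺 is obtained as:

$$\frac{\partial L}{\partial \theta} = \sum_{u \in U} \sum_{t \in T} \sum_{i \in I} \frac{-(2^{y_{u,i,t}} - 1) \left[ \frac{1}{\ln 2 \sum_{j \neq i} \delta} \left( \sum_{j \neq i} (-\delta + \delta^2) \frac{\partial}{\partial \theta} \Delta\hat{y} \right) \right]}{\left[ 1 + \log_2\left(\sum_{j \neq i} \delta\right) \right]^2} - \lambda_\theta \theta \tag{5.12}$$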

Equation (5.12) shows that only $\frac{\partial}{\partial \theta} \Delta\hat{y}$ needs to be computed with respect to the model parameters to implement the 𝑠𝐷𝐶𝐺 optimization. However, directly optimizing 𝑠𝐷𝐶𝐺 across all users in the recommendation model is computationally expensive, since the pair-wise predicted preference score difference $\Delta\hat{y}$ between each item and every other item in the system needs to be calculated.

Do-Rank addresses this computational problem with a fast learning approach. The basic idea is to optimize 𝑠𝐷𝐶𝐺 by computing the pair-wise predicted preference score difference $\Delta\hat{y}$ only between items of “relevant” entries and items of “irrelevant” entries, on each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set. The user’s “relevant” (or positive) and “irrelevant” (or negative) items are inferred from each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set and defined as: (1) 𝑍𝑃 = {𝑖|𝑦𝑢,𝑖,𝑡 = 1}, the positive items derived from the observed data; and (2) 𝑍𝑁 = {𝑖|𝑦𝑢,𝑖,𝑡 = −1}, the negative items derived from the items that have not been tagged by user 𝑢 with any other tags. The resulting objective function can now be formulated as:

$$L(\Theta) := \sum_{u \in U} \sum_{t \in T} \sum_{i \in Z_P} \frac{2^{y_{u,i,t}} - 1}{1 + \log_2\left(\sum_{j \in Z_N} \sigma(\Delta\hat{y})\right)} - \lambda_\Theta \lVert \Theta \rVert_F^2 \tag{5.13}$$

where Δ�̂� = �̂�𝑢,𝑖,𝑡 − �̂�𝑢,𝑗,𝑡.
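The pair selection of the fast learning approach can be sketched on the toy user 𝑢1. The sketch assumes the observed triples stated in Example 5.2 and uses hypothetical helper names; it enumerates the (𝑖, 𝑗) pairs with 𝑖 ∈ 𝑍𝑃 and 𝑗 ∈ 𝑍𝑁 on each observed (𝑢, 𝑡) set, and contrasts their count with the ordered item pairs (𝑗 ≠ 𝑖) that Equation (5.8) would involve on the same sets.

```python
from itertools import product

# Observed triples of user u1 in the toy example (Figure 3.2).
A_ob = {("u1", "i2", "t1"), ("u1", "i2", "t3"), ("u1", "i3", "t4")}
I = ["i1", "i2", "i3", "i4"]


def positive_negative_items(A_ob, user, tag):
    """Z_P: items the user tagged with `tag`; Z_N: items the user never tagged with any tag (UTS)."""
    collection = {i for (u, i, _) in A_ob if u == user}
    z_p = sorted(i for (u, i, t) in A_ob if u == user and t == tag)
    z_n = [i for i in I if i not in collection]
    return z_p, z_n


# Only (i, j) pairs with i in Z_P and j in Z_N enter Equation (5.13).
pairs = {}
for tag in sorted({t for (u, _, t) in A_ob if u == "u1"}):
    z_p, z_n = positive_negative_items(A_ob, "u1", tag)
    pairs[tag] = list(product(z_p, z_n))

all_pairs_per_set = len(I) * (len(I) - 1)   # ordered (i, j) with j != i, as in Equation (5.8)
print(pairs)
print(sum(len(p) for p in pairs.values()), all_pairs_per_set * len(pairs))  # 6 36
```

For 𝑢1, 6 pairs are used across the three observed sets, against 36 ordered pairs if every item were compared with every other item on the same sets; the gap widens quickly as 𝑅 grows.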

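The gradient of 𝑠𝐷𝐶𝐺 for a case (𝑢, 𝑖, 𝑗, 𝑡) with respect to the model parameters $\theta \in \{m^{(1)}_u, m^{(2)}_i, m^{(2)}_j, m^{(3)}_t\}$ is given by Equation (5.14):

$$\frac{\partial L}{\partial \theta} = \sum_{u \in U} \sum_{t \in T} \sum_{i \in Z_P} \frac{-(2^{y_{u,i,t}} - 1) \left[ \frac{1}{\ln 2 \sum_{j \in Z_N} \delta} \left( \sum_{j \in Z_N} (-\delta + \delta^2) \frac{\partial}{\partial \theta} \Delta\hat{y} \right) \right]}{\left[ 1 + \log_2\left(\sum_{j \in Z_N} \delta\right) \right]^2} - \lambda_\theta \theta \tag{5.14}$$

where $\delta = \sigma(\Delta\hat{y})$.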


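To apply the 𝑠𝐷𝐶𝐺 optimization, only the gradient $\frac{\partial}{\partial \theta} \Delta\hat{y}$ needs to be computed for each model parameter, as follows:

$$\frac{\partial \Delta\hat{y}}{\partial m^{(1)}_u} = m^{(2)}_i \odot m^{(3)}_t - m^{(2)}_j \odot m^{(3)}_t \tag{5.15}$$

$$\frac{\partial \Delta\hat{y}}{\partial m^{(2)}_i} = m^{(1)}_u \odot m^{(3)}_t \tag{5.16}$$

$$\frac{\partial \Delta\hat{y}}{\partial m^{(2)}_j} = -\left( m^{(1)}_u \odot m^{(3)}_t \right) \tag{5.17}$$

$$\frac{\partial \Delta\hat{y}}{\partial m^{(3)}_t} = m^{(1)}_u \odot m^{(2)}_i - m^{(1)}_u \odot m^{(2)}_j \tag{5.18}$$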

where ⊙ denotes the element-wise product. It can be noted from Equation (5.14) that, to optimize 𝑠𝐷𝐶𝐺 across all users and under all tags, $\Delta\hat{y}$ only needs to be computed for the items in 𝑍𝑃 (against those in 𝑍𝑁) rather than for all 𝑅 items, which is less computationally expensive since |𝑍𝑃| ≪ 𝑅. The Do-Rank learning algorithm is outlined in Figure 5.4.
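Equations (5.15) to (5.18) can be checked numerically. The sketch below uses arbitrary random factor vectors of length 4 and hypothetical function names, and compares the analytic partial derivatives of Δŷ under the CP predictor with central finite differences; because Δŷ is linear in each factor vector, the two agree up to rounding error.

```python
import numpy as np


def delta_y(m_u, m_i, m_j, m_t):
    """Delta y_hat = y_hat_{u,i,t} - y_hat_{u,j,t} for the CP predictor of Equation (5.5)."""
    return float(np.sum(m_u * m_i * m_t) - np.sum(m_u * m_j * m_t))


def delta_y_grads(m_u, m_i, m_j, m_t):
    """Analytic partial derivatives of Delta y_hat, Equations (5.15) to (5.18)."""
    return {
        "m_u": m_i * m_t - m_j * m_t,   # (5.15)
        "m_i": m_u * m_t,               # (5.16)
        "m_j": -(m_u * m_t),            # (5.17)
        "m_t": m_u * m_i - m_u * m_j,   # (5.18)
    }


def numerical_grad(name, params, eps=1e-6):
    """Central finite-difference gradient of delta_y with respect to params[name]."""
    grad = np.zeros_like(params[name])
    for k in range(grad.size):
        plus = {n: v.copy() for n, v in params.items()}
        minus = {n: v.copy() for n, v in params.items()}
        plus[name][k] += eps
        minus[name][k] -= eps
        grad[k] = (delta_y(**plus) - delta_y(**minus)) / (2 * eps)
    return grad


rng = np.random.default_rng(1)
params = {n: rng.normal(size=4) for n in ("m_u", "m_i", "m_j", "m_t")}
analytic = delta_y_grads(**params)
max_error = max(float(np.max(np.abs(analytic[n] - numerical_grad(n, params)))) for n in params)
print(f"largest analytic vs numerical gradient gap: {max_error:.2e}")
```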

5.2.3.4 Complexity Analysis and Convergence

The complexity of the learning process is analysed for a single iteration. The complexity of Do-Rank with fast learning (as illustrated in Figure 5.4) is 𝑂(𝐹(𝑄 + 𝑆 + 𝑄𝑆𝑝𝑛̃)), where 𝑝 and 𝑛̃ denote the average numbers of items in 𝑍𝑃 and 𝑍𝑁 per (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set. Since 𝑝, 𝑛̃ ≪ 𝑅, the complexity of Do-Rank becomes linear in the size of 𝑅, instead of quadratic in 𝑅 as it would be without fast learning and the UTS scheme.

Do-Rank optimizes 𝑠𝐷𝐶𝐺 (Equation (5.7)), the smoothed approximation of DCG (Equation (5.4)). During optimization, Do-Rank uses the DCG score at each iteration as the termination criterion (Shi, Karatzoglou, Baltrunas, Larson, Hanjalic, et al., 2012), instead of conventional criteria such as a fixed number of iterations (Shi, Karatzoglou, Baltrunas, Larson, Oliver, et al., 2012) or the convergence rate (Rendle and Schmidt-Thieme, 2010). The optimization process is terminated when the DCG score starts to decline, which usually takes fewer than 20 iterations. Since the number of required iterations is small, it does not substantially affect the complexity of Do-Rank.
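The termination rule described above (stop once the DCG score stops increasing, or after a maximum number of iterations) can be sketched as a small Python loop. `step` and `evaluate_dcg` are hypothetical callables standing in for one Do-Rank update sweep and a DCG evaluation of the current factor matrices, and this sketch compares each new score with the previous iteration's score. The DCG sequence in the usage example is made up for illustration and is not a reported result.

```python
def train_with_dcg_stopping(step, evaluate_dcg, iter_max=20):
    """Repeat `step()` until the DCG score stops increasing or `iter_max` steps have run.

    Returns the DCG scores seen, starting with the score before the first step.
    """
    history = [evaluate_dcg()]
    for _ in range(iter_max):
        step()
        score = evaluate_dcg()
        history.append(score)
        if score <= history[-2]:   # DCG did not improve on the previous iteration: stop
            break
    return history


# Made-up DCG sequence standing in for evaluations after each update sweep.
fake_scores = iter([0.50, 0.55, 0.58, 0.60, 0.59, 0.61])
history = train_with_dcg_stopping(step=lambda: None, evaluate_dcg=lambda: next(fake_scores))
print(history)  # [0.5, 0.55, 0.58, 0.6, 0.59]
```

Because the final update is the one that lowered DCG, a variant could retain the factor matrices from the best-scoring iteration; this is a design option, not something stated for Do-Rank.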


1: Algorithm: Do-Rank Learning
2: Input: Training set 𝐷𝑡𝑟𝑎𝑖𝑛 ⊆ 𝑈 × 𝐼 × 𝑇, learning rate 𝛼, factor matrix column size 𝐹, regularization 𝜆, maximal iteration 𝑖𝑡𝑒𝑟𝑀𝑎𝑥
4: 𝑄 = |𝑈|, 𝑅 = |𝐼|, 𝑆 = |𝑇|, 𝒴 ∈ ℝ𝑄×𝑅×𝑆
5: Populate 𝒴 using Equation (5.2)
6: 𝑍𝑃 = {𝑖 | 𝑦𝑢,𝑖,𝑡 = 1}, 𝑍𝑁 = {𝑖 | 𝑦𝑢,𝑖,𝑡 = −1}
7: 𝑀(1)(0) ∈ ℝ𝑄×𝐹, 𝑀(2)(0) ∈ ℝ𝑅×𝐹, 𝑀(3)(0) ∈ ℝ𝑆×𝐹, ℎ = 0
8: 𝑔0 = 𝐷𝐶𝐺 based on 𝒴 and 𝑀(1)(0), 𝑀(2)(0), 𝑀(3)(0)
9: repeat
11:     𝑚𝑢(1) ⟵ 𝑚𝑢(1) + 𝛼 ∂𝐿/∂𝑚𝑢(1) based on Equations (5.14) and (5.15)
13:     𝑚𝑡(3) ⟵ 𝑚𝑡(3) + 𝛼 ∂𝐿/∂𝑚𝑡(3) based on Equations (5.14) and (5.18)
15:     for 𝑡 ∈ 𝑇 do
16:         for 𝑖 ∈ 𝑍𝑃 do
17:             for 𝑗 ∈ 𝑍𝑁 do
18:                 𝑚𝑖(2) ⟵ 𝑚𝑖(2) + 𝛼 ∂𝐿/∂𝑚𝑖(2) based on Equations (5.14) and (5.16)
19:                 𝑚𝑗(2) ⟵ 𝑚𝑗(2) + 𝛼 ∂𝐿/∂𝑚𝑗(2) based on Equations (5.14) and (5.17)
20:     ++ℎ
21:     𝑔 = 𝐷𝐶𝐺 based on 𝒴 and 𝑀(1)(ℎ), 𝑀(2)(ℎ), 𝑀(3)(ℎ)
22:     if 𝑔 − 𝑔0 ≤ 0
23:         Break
24: until ℎ ≥ 𝑖𝑡𝑒𝑟𝑀𝑎𝑥

Figure 5.4. The Do-Rank learning algorithm


Since Do-Rank implements a ranking-based interpretation scheme and a learning-to-rank model, the resulting latent factors can be directly used to generate the Top-𝑁 list of item recommendations for each target user. Using Equation (5.5), the predicted preference score $\hat{y}_{u,i,t}$ of target user 𝑢 for item 𝑖 on tag 𝑡 is calculated. The candidate items of user 𝑢 are identified based on the maximum $\hat{y}_{u,i,t}$ of each user-item set, i.e. over all tags. The scores of the candidate items are then ranked in descending order to generate the list of recommended items.
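The Top-𝑁 generation step can be sketched in NumPy. The sketch assumes the factor matrices are already learned (here replaced by tiny hand-made values with 𝐹 = 2), takes the maximum predicted score over tags as each item's candidate score, and breaks ties by item index; `top_n_items` is a hypothetical name, and whether already-tagged items are filtered out is not modelled here.

```python
import numpy as np


def top_n_items(M1, M2, M3, u, n):
    """Top-N items for user u, scoring each item by its best tag.

    Y_u[i, t] is the CP prediction y_hat_{u,i,t} of Equation (5.5); an item's candidate
    score is max over t of Y_u[i, t]; items are sorted by descending candidate score.
    """
    Y_u = np.einsum("f,if,tf->it", M1[u], M2, M3)       # R x S predicted scores for user u
    candidate_scores = Y_u.max(axis=1)                    # maximum over tags per item
    order = np.argsort(-candidate_scores, kind="stable")  # descending; ties keep item order
    return order[:n].tolist(), candidate_scores


# Tiny hand-made factors with F = 2: one user, three items, two tags.
M1 = np.array([[1.0, 1.0]])
M2 = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
M3 = np.array([[1.0, 0.0], [0.0, 2.0]])
top, scores = top_n_items(M1, M2, M3, u=0, n=2)
print(top, scores)  # [1, 0] [1. 2. 1.]
```

Item 1 is scored highest only through tag 2, which illustrates why the maximum over tags, rather than a single tag, is used to identify candidate items.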



The proposed Do-Rank method and benchmarking methods are evaluated by 5-fold






𝐷𝑡𝑒𝑠𝑡. Performance is reported as the average over all five runs using Normalized DCG (NDCG) and Average Precision (AP) at various Top-𝑁 positions, as well as Mean Average Precision (MAP).

The performance of Do-Rank is compared with benchmarking methods,

including MAX (Symeonidis et al., 2010), PITF (Rendle and Schmidt-Thieme,

2010), and CTS (Kim et al., 2010). It is to be noted that comparison amongst all

proposed methods is presented in Chapter 6.

To enable meaningful comparisons, the parameter values for all methods are tuned on a randomly selected 25% of the observed data in 𝐷𝑡𝑟𝑎𝑖𝑛. For all tensor-based methods, the column size 𝐹 of the latent factor matrices is set to 128, as recommendation quality usually does not improve beyond that value. The learning rate 𝛼 and regularization 𝜆 for PITF are set to 0.01 and 0.00005, respectively, as suggested in (Ifada and Nayak, 2015). For CTS, the neighbourhood size 𝑘 and model size 𝑤 are both searched over the grid {10, 20, 30, 40, 50, 60, 70, 80, 90, 100}. The learning rate 𝛼 and regularization 𝜆 for Do-Rank are tuned in the ranges 0.01 to 0.1 and 0.00001 to 0.00005, respectively.


The recommendation performance of the proposed Do-Rank and the benchmarking methods on each dataset is listed in Table 5.1, Table 5.2, Table 5.3, and Table 5.4. Do-Rank outperforms the benchmarking methods in terms of NDCG, AP, and MAP on most datasets. Note that NDCG scores decrease as the Top-𝑁 position increases (NDCG@10 is lower than NDCG@5), whereas AP scores generally increase with it (AP@10 is higher than AP@5).


Compared to PITF, which employs an AUC-based optimization approach that gives equal penalty to mistakes at the top and bottom of the recommendation list (Shi, Karatzoglou, Baltrunas, Larson, Hanjalic, et al., 2012), Do-Rank enhances Top-𝑁 recommendation performance by optimizing the top-biased measure DCG. The results indicate that optimizing a Top-𝑁 recommendation evaluation measure when building the learning model improves recommendation performance. Additionally, PITF is a pair-wise ranking model that aims to order each pair correctly, while Do-Rank employs a list-wise ranking model that aims to correctly order all items in the recommendation list. Lastly, Do-Rank's outperformance of CTS suggests that the three-dimensional characteristic of tagging data should be captured, so that the many-to-many relationships among the dimensions are retained rather than lost by projecting the three dimensions into two (Symeonidis et al., 2010).



NDCG@5 NDCG@10 AP@5 AP@10 MAP NDCG@5 NDCG@10 AP@5 AP@10 MAP NDCG@5 NDCG@10 AP@5 AP@10 MAP

MAX 1.90 1.70 4.22 4.70 2.38 2.00 1.77 4.14 4.63 2.86 2.10 1.85 4.73 5.34 3.75

PITF 2.30 1.99 5.33 5.88 2.60 2.12 1.93 4.77 5.52 3.16 2.52 2.18 5.81 6.48 4.29

CTS 1.85 1.66 4.02 4.59 2.51 2.03 1.84 4.47 4.99 3.11 2.28 2.05 5.20 5.83 4.07

Do-Rank 2.31 1.98 5.22 5.72 2.78 2.36 2.09 5.25 5.93 3.54 2.69 2.26 6.02 6.64 4.63

Table 5.1. NDCG, AP, and MAP on Delicious dataset



MAX 6.68 5.92 12.38 12.48 5.33 7.69 6.63 13.83 13.84 6.31 8.26 7.22 14.54 14.65 7.11

PITF 7.56 6.45 13.97 14.31 5.96 8.15 7.15 15.43 15.67 7.05 8.41 7.58 16.16 16.62 7.52

CTS 4.87 4.17 12.36 12.47 3.87 7.78 6.94 14.27 14.57 6.55 8.93 7.83 16.56 16.93 7.59

Do-Rank 8.15 7.05 14.12 14.45 6.50 8.55 7.56 15.51 15.68 7.09 9.40 8.29 17.13 17.52 7.61

Table 5.2. NDCG, AP, and MAP on LastFM dataset




MAX 3.18 2.73 6.09 6.45 4.58 3.93 3.26 8.14 8.50 6.88 3.97 3.48 8.31 9.24 8.82

PITF 3.18 2.68 5.57 5.05 4.28 3.94 3.58 8.51 9.20 7.18 4.65 3.74 8.54 9.91 9.78

CTS 2.20 1.87 4.94 5.34 4.31 3.87 3.20 8.44 8.97 7.91 5.16 4.32 10.23 11.05 11.33

Do-Rank 3.08 2.77 5.42 5.95 4.47 4.95 4.18 10.28 10.90 9.31 5.17 4.49 10.62 11.64 11.43

Table 5.3. NDCG, AP, and MAP on CiteULike dataset



MAX 6.23 5.10 10.50 10.56 5.36 7.27 6.05 13.84 14.22 7.36 8.40 7.21 16.46 16.82 10.40

PITF 4.14 3.82 8.34 9.21 4.36 4.33 4.23 9.40 9.98 5.98 6.28 5.65 12.26 12.90 8.73

CTS 5.97 5.10 10.04 10.38 5.68 6.24 5.22 12.08 12.58 7.08 8.17 7.10 15.82 16.07 10.30

Do-Rank 6.28 5.39 11.00 11.47 6.22 8.61 7.13 16.49 16.96 9.81 11.34 9.21 21.06 21.68 14.28

Table 5.4. NDCG, AP, and MAP on MovieLens dataset


5.2.5.2 Impact of UTS scheme

The impact of implementing the UTS scheme is investigated by comparing the population of tensor 𝒴 entries generated from 𝐷𝑡𝑟𝑎𝑖𝑛 using the proposed UTS scheme with those generated using the boolean and set-based schemes. The statistics listed in Table 5.5 show that the population of “relevant” entries is the same for all schemes. Note that the boolean scheme generates the fewest distinct entry types, as it does not distinguish between the “irrelevant” and “indecisive” entries of the non-observed tagging data (Ifada and Nayak, 2014a; Rendle, Balby Marinho, et al., 2009).

Table 5.5 shows that both the set-based and UTS schemes distinguish “irrelevant” and “indecisive” entries as two distinct entry types; however, their population distributions differ. This is because the UTS scheme implements a user’s item collection constraint when interpreting the non-observed tagging data, as previously described in Section 5.2.2, unlike the set-based scheme, which interprets all items other than those appearing in “relevant” entries as “irrelevant”. In other words, the set-based scheme labels as “irrelevant” some relationships that should not be labelled so. Therefore, the population of “irrelevant” entries under UTS is smaller than under the set-based scheme and, conversely, the population of “indecisive” entries under UTS is larger.


Dataset  Interpretation Scheme  Distinct Entry  Tensor Population (%)


Delicious boolean Relevant 0.0006 0.0014 0.0027

Irrel./Indecisive 99.9994 99.9986 99.9973

set-based Relevant 0.0006 0.0014 0.0027

Irrelevant 0.4377 0.5670 0.6737

Indecisive 99.5617 99.4316 99.3236

UTS Relevant 0.0006 0.0014 0.0027

Irrelevant 0.4285 0.5499 0.6477

Indecisive 99.5709 99.4487 99.3496

LastFM boolean Relevant 0.0042 0.0092 0.0163



Irrelevant 1.2937 1.7233 2.1553

Indecisive 98.7021 98.2675 97.8284

UTS Relevant 0.0042 0.0092 0.0163

Irrelevant 1.2538 1.6443 2.0242

Indecisive 98.7420 98.3465 97.9595

CiteULike boolean Relevant 0.0010 0.0037 0.0095



Irrelevant 0.3176 0.4777 0.5995

Indecisive 99.6814 99.5186 99.3910

UTS Relevant 0.0010 0.0037 0.0095

Irrelevant 0.3039 0.4450 0.5472

Indecisive 99.6951 99.5513 99.4433

MovieLens boolean Relevant 0.0062 0.0283 0.0284



Irrelevant 1.6114 2.0969 2.4147

Indecisive 98.3824 97.8748 97.5569

UTS Relevant 0.0062 0.0283 0.0284

Irrelevant 1.4651 1.8476 2.0418

Indecisive 98.5287 98.1241 97.9298

Table 5.5. The comparison of tensor entries population distribution generated from 𝐷𝑡𝑟𝑎𝑖𝑛 using

boolean, set-based and UTS schemes


5.2.5.3 Scalability

The scalability of Do-Rank is examined in terms of its learning running time, to study the impact of implementing the UTS scheme for the fast learning approach on optimizing 𝑠𝐷𝐶𝐺 as defined in Equation (5.14). The examination is demonstrated on the 10-core of the Delicious and MovieLens datasets, as other datasets and cores show similar results. The learning running time is measured for a single iteration at various scales, i.e. 10% to 100% of the training set (𝐷𝑡𝑟𝑎𝑖𝑛).

Figure 5.5 shows that the running time of the “fast learning” approach is linear in the size of the data on both datasets, i.e. it is determined by the number of items 𝑅. The “original learning” approach, i.e. optimizing 𝑠𝐷𝐶𝐺 without the fast learning approach, requires more learning time since its computational complexity is determined by 𝑅², as previously described in Section 1.1.1.

(a) (b)

Figure 5.5. The Do-Rank scalability

5.2.5.4 Convergence

The convergence of the Do-Rank learning algorithm is demonstrated on the MovieLens 20-core set; the convergence behaviour on other datasets and cores is similar. Figure 5.6(a) and Figure 5.6(b) show the evolution of DCG@10 across iterations on the training (𝐷𝑡𝑟𝑎𝑖𝑛) and test (𝐷𝑡𝑒𝑠𝑡) sets, respectively. DCG increases through the early iterations on both sets before declining, which indicates that Do-Rank is able to effectively optimize DCG. The DCG measure drops after a few iterations (fewer than 15), which indicates that using a measure score as the termination criterion is a useful way to prevent the model from overfitting (Shi, Karatzoglou, Baltrunas, Larson, Hanjalic, et al., 2012).

[Figure 5.5 panels: (a) Delicious 10-core and (b) MovieLens 10-core; x-axis: Ratio of training set (%); y-axis: Running time (sec); series: Fast Learning, Original Learning]


(a) Training set (𝐷𝑡𝑟𝑎𝑖𝑛) (b) Test set (𝐷𝑡𝑒𝑠𝑡)

Figure 5.6. The Do-Rank convergence criterion

5.2.6 Summary of Learning from Multi-Graded Data

In this section, the DCG Optimization for Learning-to-Rank (Do-Rank) method is

proposed for learning from multi-graded data on a tag-based item recommendation

model. Do-Rank proposes the User-Tag Set (UTS) scheme for interpreting the

tagging data and directly optimizes the (smoothed) DCG for learning the tensor

model in order to generate an ordered list of items that might interest the user. A fast

learning approach is also implemented to enable efficient execution of Do-Rank.


that:

Do-Rank outperforms all benchmarking methods on the NDCG, AP, and MAP measures on most datasets. This indicates that optimizing DCG for building the learning model improves the recommendation performance;

The UTS scheme more efficiently interprets the tagging data, in comparison to the set-based scheme, as it implements a user’s item collection constraint for interpreting the non-observed tagging data;

The UTS scheme improves Do-Rank scalability as it generates fewer non-indecisive entries than the set-based scheme, and therefore less data needs to be learned by the ranking model.

[Figure 5.6 panels: MovieLens pc-20, (a) Training set and (b) Test set; x-axis: Number of iteration; y-axis: DCG@10 (%)]


5.3 GO-RANK: LEARNING FROM GRADED-RELEVANCE DATA

5.3.1 Overview

The GAP Optimization for Learning-to-Rank (Go-Rank) method is developed for

learning from graded-relevance data on a tag-based item recommendation model, in

which the GAP ranking evaluation measure is used as the optimization criterion. Go-

Rank generates an optimal list of recommended items from the GAP perspective for

all users. This section also presents a novel graded-relevance scheme to construct the

initial third-order tensor model for representing the user profile. The proposed

graded-relevance scheme interprets the tagging data with four distinct entries, i.e.

“relevant”, “likely relevant”, “irrelevant”, and “indecisive”. The “likely relevant”

entries are the transitional entries between the “relevant” and “irrelevant” entries. It

is to be noted that using GAP as the optimized ranking evaluation measure enables

the learning model to set up thresholds so that the “likely relevant” entries can be

regarded as either “relevant” or “irrelevant” entries. Each tagging data entry can then

be graded with one of the ordinal relevance values of {2,1,0, −1}. As a result, the

scheme generates the implicit tagging data entries as multi-graded data, similar to

explicit rating data that is hard to obtain otherwise.

Graded Average Precision (GAP) (Robertson et al., 2010) has been shown to work effectively as a generalisation of Average Precision (AP) for graded (rating) explicit feedback data. Researchers (Shi et al., 2013a) have proposed using GAP as the optimization criterion of a recommendation model that uses ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩ binary-relation rating data entries as explicit feedback data. There, the list of recommendations is generated by ranking the predicted preference scores inferred for the non-observed ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚⟩ relations (Balakrishnan and Chopra, 2012; Weimer et al., 2007). In contrast, this thesis deals with a different and more difficult problem, as the tag-based recommendation system is built from ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary-relation tagging data, and the ordinal rating values are given on each observed user-tag set. This means that there are multiple rating values on each observed user-tag set, instead of the single rating value given for explicit feedback data. Eventually, the recommendation list needs to be generated by ranking the predicted preference scores of items under all tags that may be of interest to a user. Therefore, the tag-based recommendation system should be able to infer the tag that will influence the user to choose the recommended item, based on the highest preference score.

The next three sub-sections detail the three main processes in Go-Rank: (1)






The user profile construction is the process of constructing the initial tensor to model the multi-dimensional data. The developed Go-Rank method uses a novel graded-relevance scheme for constructing an initial third-order tensor model to represent the

user profile and ranking learning model. The proposed graded-relevance scheme

interprets the tagging data with four distinct entries, i.e. “relevant”, “likely relevant”,

“irrelevant”, and “indecisive”. The “likely relevant” entries are set as the transitional

entries between the “relevant” and “irrelevant” entries.



𝐴 ∶= 𝑈 × 𝐼 × 𝑇, a vector of 𝑎 = (𝑢, 𝑖, 𝑡) ∈ 𝐴 represents the tagging activity of user 𝑢

to annotate item 𝑖 using tag 𝑡. The observed tagging data, 𝐴𝑜𝑏, defines the state in

which users have expressed their interest to items in the past, by annotating those

items using tags, where 𝐴𝑜𝑏 ⊆ 𝐴. The observed tagging data are usually very sparse, thus |𝐴𝑜𝑏| ≪ |𝐴|.

The set-based scheme (Rendle, Balby Marinho, et al., 2009) can solve the

drawbacks of the boolean scheme by differentiating the entries of non-observed data

as “irrelevant” and “indecisive” entries according to each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set. However,

as shown in Section 5.2.2, the set-based scheme is incorrect as it overgeneralises the

“irrelevant” entries and results in inferior recommendation performance (Ifada and

Nayak, 2014a). On each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set, the scheme interprets any items other than

those appearing in observed entries as “irrelevant” entries (Rendle, Balby Marinho,

et al., 2009) and disregards the fact that some of those items have been annotated by


the user using other tags. Actually, only the items, on each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set, that have

not been tagged by user 𝑢 should be interpreted as “irrelevant” entries (Ifada and

Nayak, 2014a). However, how to interpret the entries of non-observed data that should not simply be regarded as “irrelevant” remains an open question.

Example 5.3: Issues in Tagging Data Interpretation using set-based scheme.

The issues of using the set-based scheme to interpret the tagging data are illustrated using the entries of User 1 (𝑢1) in the toy example in Figure 3.2. Using the set-based scheme (Rendle, Balby Marinho, et al., 2009), as illustrated in Figure 5.7(a), the observed and non-observed entries can be listed based on the user's (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 sets. The observed entries show that the (𝑢1, 𝑡1), (𝑢1, 𝑡3), and (𝑢1, 𝑡4) sets have been used to annotate {𝑖2}, {𝑖2}, and {𝑖3}, respectively. These observed entries are regarded as “relevant”, while all entries of the non-existent sets, i.e. (𝑢1, 𝑡2) and (𝑢1, 𝑡5), on any item can easily be interpreted as “indecisive”. The problem becomes apparent when the “irrelevant” entries are to be interpreted. Neither 𝑖1 nor 𝑖4 has ever been annotated by 𝑢1 with any tag, so it makes sense to assume that the entries of (𝑢1, 𝑡1), (𝑢1, 𝑡3), and (𝑢1, 𝑡4) with {𝑖1, 𝑖4} are “irrelevant”. However, the entries of 𝑖3 with (𝑢1, 𝑡1) and (𝑢1, 𝑡3) should not simply be interpreted as “irrelevant”, as item 𝑖3 occurs as “relevant” on (𝑢1, 𝑡4). Similarly, 𝑖2 is “relevant” on (𝑢1, 𝑡1) and (𝑢1, 𝑡3), and therefore (𝑢1, 𝑡4) with 𝑖2 cannot be “irrelevant”. Simply labelling those entries as “irrelevant” is improper and can result in inferior recommendation performance (Ifada and Nayak, 2014a). Yet those entries cannot be labelled “indecisive” either, as they are not amongst the entries to be predicted in the future.

[Figure 5.7 panels (a) and (b): item × tag slices of 𝒴 for User 1, User 2, and User 3]

Figure 5.7. Example of initial tensor 𝒴 ∈ ℝ3×4×5 , as the representation of user profile, which entries

are generated by implementing the (a) set-based and (b) graded-relevance interpretation schemes


The graded-relevance interpretation scheme is proposed to effectively leverage

the tagging data for building the tensor ranking learning model 𝒴. Following the

general rule, the observed entries are regarded as “relevant”, indicating that users

have shown their interest in the entries. From the observed entries, the set of distinct items that user 𝑢 has annotated using any tag is extracted. This set, denoted as 𝐶𝑢, is defined as follows:

𝐶𝑢 = {𝑖|(𝑢, 𝑖,∗) ∈ 𝐴𝑜𝑏} (5.19)

The item set 𝐶𝑢 helps identify the non-observed entries that belong to neither the “irrelevant” nor the “indecisive” category. On each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏

set, the entries in which items have been annotated using other tags are labelled as

“likely relevant” entries. As a result, the graded-relevance scheme exposes the non-

observed entries as a mixture of three entries: (1) “likely relevant” entries – user is

probably interested in the entries, yet this is not explicitly revealed, (2) “irrelevant”

entries – user is not interested in the entries, and (3) “indecisive” entries – user might

be interested in the entries in the future. The “likely relevant” entries are revealed as

entries for which, even though they do not occur in the observed set, the items of the

entries have actually been tagged by the user.

To represent the user profile, the initial tensor 𝒴 ∈ ℝ𝑄×𝑅×𝑆 is constructed, where each tensor entry 𝑦𝑢,𝑖,𝑡 is given a numerical value that represents the relevance grade of the tagging activity. Since each entry falls into one of four distinct categories, the entries are assigned ordinal relevance values: the graded-relevance scheme labels the “relevant”, “likely relevant”, “irrelevant”, and “indecisive” entries with 2, 1, −1, and 0, respectively, which makes the tensor entries comparable to rating data. The labelling rules of the graded-relevance scheme for generating the entries of tensor 𝒴 are formulated as follows:

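$$y_{u,i,t} := \begin{cases} 2 & \text{if } (u,i,t) \in A_{ob} \\ 1 & \text{if } (u,i,t) \notin A_{ob} \text{ and } i \in C_u \text{ and } (u,*,t) \in A_{ob} \\ -1 & \text{if } (u,i,t) \notin A_{ob} \text{ and } i \in I \setminus C_u \text{ and } (u,*,t) \in A_{ob} \\ 0 & \text{otherwise} \end{cases} \qquad (5.20)$$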
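The labelling rules can be sketched in a few lines of NumPy. This is a minimal illustration, assuming that 𝐴𝑜𝑏 is given as a list of zero-based (user, item, tag) index triples; the function name `graded_relevance_tensor` is hypothetical. The usage reproduces only the entries of 𝑢1 from the toy example in Figure 3.2, which are worked through in Example 5.4; the other users' entries are not reproduced.

```python
import numpy as np

def graded_relevance_tensor(observed, shape):
    """Label a Q x R x S tensor with the grades of Equation (5.20).

    observed : iterable of (u, i, t) index triples forming A_ob
    shape    : (Q, R, S) = (#users, #items, #tags)
    Grades: 2 relevant, 1 likely relevant, -1 irrelevant, 0 indecisive.
    """
    Q, R, S = shape
    Y = np.zeros(shape, dtype=int)                   # default: "indecisive" (0)
    obs = set(observed)
    for u in range(Q):
        C_u = {i for (v, i, _) in obs if v == u}     # items tagged by u (Eq. 5.19)
        V_u = {t for (v, _, t) in obs if v == u}     # tags t with (u, *, t) in A_ob
        for t in V_u:
            for i in range(R):
                if (u, i, t) in obs:
                    Y[u, i, t] = 2                   # relevant
                elif i in C_u:
                    Y[u, i, t] = 1                   # likely relevant
                else:
                    Y[u, i, t] = -1                  # irrelevant
    return Y

# u1 tagged i2 with t1 and t3, and i3 with t4 (zero-based indices)
Y = graded_relevance_tensor([(0, 1, 0), (0, 1, 2), (0, 2, 3)], shape=(3, 4, 5))
print(Y[0].T.tolist())   # rows t1..t5, columns i1..i4
# [[-1, 2, 1, -1], [0, 0, 0, 0], [-1, 2, 1, -1], [-1, 1, 2, -1], [0, 0, 0, 0]]
```

Each printed row corresponds to one (𝑢1, 𝑡) set. Rows for 𝑡2 and 𝑡5 stay at 0 because 𝑢1 never used those tags.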


Example 5.4: Tagging Data Interpretation using the graded-relevance scheme.

An example of how the graded-relevance scheme interprets tagging data is illustrated using the entries of User 1 (𝑢1) in the toy example of Figure 3.2. Figure 5.7(b) illustrates the constructed initial tensor 𝒴 ∈ ℝ3×4×5 that represents the user profile, whose entries are generated from the tagging data by the graded-relevance interpretation scheme as formulated in Equation (5.20).

The observed entries in Figure 3.2 show that 𝑢1 has used 𝑡1, 𝑡3, and 𝑡4 to reveal his interest in {𝑖2}, {𝑖2}, and {𝑖3}, respectively. Given 𝐼 = {𝑖1, 𝑖2, 𝑖3, 𝑖4} and 𝐶𝑢1 = {𝑖2, 𝑖3}, the tagging data is interpreted as 𝑢1 favouring: (1) {𝑖2} over {𝑖1, 𝑖4} for annotation with 𝑡1; (2) {𝑖2} over {𝑖1, 𝑖4} for annotation with 𝑡3; and (3) {𝑖3} over {𝑖1, 𝑖4} for annotation with 𝑡4. In the same order, it can also be interpreted that 𝑢1 favours: (1) {𝑖3} over {𝑖1, 𝑖4}, since it was annotated using 𝑡4, yet still less than {𝑖2}; (2) {𝑖3} over {𝑖1, 𝑖4}, since it was annotated using 𝑡4, yet still less than {𝑖2}; and (3) {𝑖2} over {𝑖1, 𝑖4}, since it was annotated using 𝑡1 and 𝑡3, yet still less than {𝑖3}.

As shown in Figure 5.7(b), the user profile representation can then be generated as follows: (1) on the (𝑢1, 𝑡1) set, the entry of {𝑖2} is “relevant” (graded “2”), while those of {𝑖3} and {𝑖1, 𝑖4} are “likely relevant” (graded “1”) and “irrelevant” (graded “−1”), respectively; (2) on the (𝑢1, 𝑡3) set, the entry of {𝑖2} is “relevant”, while those of {𝑖3} and {𝑖1, 𝑖4} are “likely relevant” and “irrelevant”, respectively; and (3) on the (𝑢1, 𝑡4) set, the entry of {𝑖3} is “relevant”, while those of {𝑖2} and {𝑖1, 𝑖4} are “likely relevant” and “irrelevant”, respectively. Note that the “indecisive” entries, graded “0”, are the entries of the non-observed sets, i.e. (𝑢1, 𝑡2) and (𝑢1, 𝑡5).

Comparing the example of the graded-relevance scheme (Figure 5.7(b)) with that of the set-based scheme (Figure 5.7(a)), it can be observed that the latter overgeneralises the “irrelevant” entries. It assumes that, on each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set, any items other than those appearing in the observed entries are “irrelevant”, and it disregards the fact that some of those items have been tagged by the user. The graded-relevance scheme further breaks down the “irrelevant” entries of the set-based scheme into “likely relevant” and “irrelevant” entries.
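The split can be quantified by counting the entries of each category as a percentage of all 𝑄𝑅𝑆 entries. This is the same kind of statistic that Table 5.6 reports for the real datasets. The sketch below computes the counts directly from the observed triples. It assumes, as described above, that the set-based scheme labels every non-observed entry of a (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set as “irrelevant”. The function name `entry_population` is hypothetical, and the usage again covers only 𝑢1's toy entries.

```python
from collections import defaultdict

def entry_population(observed, shape):
    """Share (%) of the Q*R*S tensor entries in each category under both schemes."""
    Q, R, S = shape
    total = Q * R * S
    obs = set(observed)
    items, tags = defaultdict(set), defaultdict(set)
    for u, i, t in obs:
        items[u].add(i)                      # C_u: items tagged by u with any tag
        tags[u].add(t)                       # tags t with (u, *, t) in A_ob
    relevant = len(obs)
    in_sets = sum(len(tags[u]) * R for u in tags)             # entries on (u, t) in A_ob sets
    graded_1_or_2 = sum(len(tags[u]) * len(items[u]) for u in tags)
    likely = graded_1_or_2 - relevant
    irrelevant = in_sets - graded_1_or_2
    indecisive = total - in_sets

    def pct(n):
        return 100.0 * n / total

    return {
        "graded-relevance": {"relevant": pct(relevant), "likely relevant": pct(likely),
                             "irrelevant": pct(irrelevant), "indecisive": pct(indecisive)},
        # set-based: every non-observed entry of a (u, t) in A_ob set counts as "irrelevant"
        "set-based": {"relevant": pct(relevant), "irrelevant": pct(likely + irrelevant),
                      "indecisive": pct(indecisive)},
    }

print(entry_population([(0, 1, 0), (0, 1, 2), (0, 2, 3)], shape=(3, 4, 5)))
```

For these entries, the 15% “irrelevant” share of the set-based scheme splits into 5% “likely relevant” and 10% “irrelevant” under the graded-relevance scheme. The “relevant” (5%) and “indecisive” (80%) shares are identical in both schemes.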



This section details how the latent factors, corresponding to each dimension of tensor

𝒴, are learned and generated.


GAP is the generalisation of the Average Precision (AP) measure to ordinal relevance data (Robertson et al., 2010). Using GAP as the optimized ranking evaluation measure enables the recommendation model to set up thresholds so that the “likely relevant” entries can be regarded as either “relevant” or “irrelevant” entries. The recommendation task can now be formulated as recommending to each user an item list that is optimal from the GAP perspective, using the latent factor matrices of the tensor model. Based on the original definition of GAP (Robertson et al., 2010), the GAP score for a user 𝑢 under tag 𝑡 can be formulated as:

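$$GAP_{u,t} := \frac{\sum_{i \in I} \sum_{k=1}^{c} \sum_{j \neq i} g_k \, \mathbb{I}(y_{u,i,t} \ge 1) \, \mathbb{I}(y_{u,j,t} \ge \mu_k) \, \frac{\mathbb{I}(r_{u,j,t} \le r_{u,i,t})}{r_{u,i,t}}}{\sum_{l=1}^{c} n_{u,l,t} \sum_{k=1}^{l} g_k} \qquad (5.21)$$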

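The GAP score over all users and all tags can therefore be defined as:

$$GAP := \frac{1}{QS} \sum_{u \in U} \sum_{t \in T} \frac{\sum_{i \in I} \sum_{k=1}^{c} \sum_{j \neq i} \beta_{gy} \, \frac{\mathbb{I}(r_{u,j,t} \le r_{u,i,t})}{r_{u,i,t}}}{\sum_{l=1}^{c} n_{u,l,t} \sum_{k=1}^{l} g_k} \qquad (5.22)$$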

where 𝑐 = 𝑚𝑎𝑥(𝑦𝑢,𝑖,𝑡) and 𝑦𝑢,𝑖,𝑡 is the relevance label assigned from the ordinal relevance values {2, 1, 0, −1} of the initial tensor model. Here, 𝑟𝑢,𝑖,𝑡 is the ranking position of item 𝑖 for user 𝑢 with tag 𝑡. It is approximated using ŷ𝑢,𝑖,𝑡, i.e. the predicted preference score, calculated from the latent factors, that reflects the preference level of user 𝑢 for annotating item 𝑖 using tag 𝑡. The 𝑔𝑘 denotes the threshold probability (Robertson et al., 2010) that the user sets the threshold of relevance at grade 𝜇𝑘, i.e. regards the entries with grades equal to or larger than 𝜇𝑘 as “relevant” and the others as “irrelevant”. In other words, 𝑔𝑘 and 𝜇𝑘 are the parameters that control whether the “likely relevant” entries should be regarded as “relevant” or “irrelevant”. Note that these probability values must be mutually exclusive and exhaustive (Robertson et al., 2010):

$$\sum_{k=1}^{c} g_k = 1 \qquad (5.23)$$

The 𝕀(∙) is the indicator function, which equals 1 if the condition is satisfied and 0 otherwise. The 𝑛𝑢,𝑙,𝑡 is the number of items labelled with grade 𝑙 by user 𝑢 using tag 𝑡.

For notational convenience, the following substitution is made:

$$\beta_{gy} = g_k \, \mathbb{I}(y_{u,i,t} \ge 1) \, \mathbb{I}(y_{u,j,t} \ge \mu_k) \qquad (5.24)$$
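To make the definition concrete, the sketch below evaluates 𝐺𝐴𝑃𝑢,𝑡 of Equation (5.21) for a single (𝑢, 𝑡) list by direct enumeration. It rests on several assumptions. The ranking positions come from sorting predicted scores, with ties broken by item index. The thresholds are paired as 𝜇1 = 1 and 𝜇2 = 2, so that 𝑔1 is the probability that “likely relevant” items count as relevant, matching how 𝑔1 is described in Section 5.3.5.3. The function name `gap_ut` and the toy scores are hypothetical.

```python
import numpy as np

def gap_ut(y, scores, g, mu):
    """GAP_{u,t} of Equation (5.21) for a single (user, tag) list.

    y      : grades y_{u,i,t} of every item (values in {2, 1, 0, -1})
    scores : predicted scores used to rank the items (higher = better)
    g, mu  : threshold probabilities g_k and threshold grades mu_k, k = 1..c
    """
    y = np.asarray(y)
    scores = np.asarray(scores, dtype=float)
    n_items = len(y)
    # ranking positions r_i (1 = top); ties broken by item index
    order = np.argsort(-scores, kind="stable")
    r = np.empty(n_items, dtype=int)
    r[order] = np.arange(1, n_items + 1)

    num = 0.0
    for i in range(n_items):
        if y[i] < 1:                             # I(y_{u,i,t} >= 1)
            continue
        for g_k, mu_k in zip(g, mu):             # k = 1..c
            for j in range(n_items):
                if j != i and y[j] >= mu_k and r[j] <= r[i]:
                    num += g_k / r[i]
    # normaliser: sum_l n_{u,l,t} * sum_{k<=l} g_k
    den = sum(np.sum(y == l) * sum(g[:l]) for l in range(1, len(g) + 1))
    return num / den

g, mu = (0.3, 0.7), (1, 2)                       # assumed pairing: mu_1 = 1, mu_2 = 2
y = [2, 1, -1, -1]                               # relevant, likely relevant, two irrelevant
print(gap_ut(y, [0.9, 0.5, 0.3, 0.1], g, mu))    # relevant item ranked first, about 0.385
print(gap_ut(y, [0.5, 0.9, 0.3, 0.1], g, mu))    # likely relevant item first, about 0.115
```

Placing the “likely relevant” item above the “relevant” one lowers the score. Because the inner sum of Equation (5.21) excludes 𝑗 = 𝑖, the ideal ordering in this toy case stays below 1, so only relative values are meaningful here.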

From the tensor 𝒴, latent factors are learned in order to derive the latent relationships between the user, item and tag dimensions. Like Do-Rank, Go-Rank uses the CP model (Kolda and Bader, 2009) as both the factorization technique and the predictor function model. CP is a well-known technique that has been shown to be less expensive than Tucker in both memory and time consumption (Kolda and Bader, 2009). Recall that the CP factorization model for a third-order tensor is illustrated in Figure 5.2, and that the predicted preference score is calculated using Equation (5.5).

5.3.3.2 Ranking Smoothing

From Equation (5.22), it can be observed that GAP depends on the ranking positions of items in the recommendation list, as it relies on the values of 𝕀(𝑟𝑢,𝑗,𝑡 ≤ 𝑟𝑢,𝑖,𝑡) and 𝑟𝑢,𝑖,𝑡. Since the ranking positions are determined by the predicted preference scores calculated from the model parameters (i.e. the latent factor matrices), GAP is a non-smooth function of these parameters. It is therefore hard to apply standard optimization approaches to the objective function, as such approaches require a smooth function (Chapelle and Wu, 2010; Wu et al., 2009). This thesis tackles this problem by applying the smoothing function of Chapelle and Wu (2010) to the ranking position with respect to the model parameters as follows:

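$$\mathbb{I}(r_{u,j,t} \le r_{u,i,t}) \approx \sigma(\Delta\hat{y}) \qquad (5.25)$$

$$r_{u,i,t} \approx 1 + \sum_{j \neq i} \sigma(\Delta\hat{y}) \qquad (5.26)$$

where 𝜎 is the logistic function $\sigma(x) = \frac{1}{1 + e^{-x}}$ and $\Delta\hat{y} = \hat{y}_{u,i,t} - \hat{y}_{u,j,t}$.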
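The effect of the smoothing can be checked on a short score vector. The sketch writes the pairwise term as 𝜎(ŷ𝑢,𝑗,𝑡 − ŷ𝑢,𝑖,𝑡). In this orientation, the term approaches 𝕀(𝑟𝑢,𝑗,𝑡 ≤ 𝑟𝑢,𝑖,𝑡) as the score gap widens: it is close to 1 when item 𝑗 is scored above item 𝑖. The function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def exact_ranks(scores):
    """r_i = 1 + number of items scored strictly above item i."""
    s = np.asarray(scores, dtype=float)
    return 1.0 + (s[None, :] > s[:, None]).sum(axis=1)

def smoothed_ranks(scores):
    """Smoothed ranks: r_i ~ 1 + sum_{j != i} sigma(s_j - s_i)."""
    s = np.asarray(scores, dtype=float)
    P = sigmoid(s[None, :] - s[:, None])   # P[i, j] ~ I(r_j <= r_i) for j != i
    np.fill_diagonal(P, 0.0)               # drop the j == i term
    return 1.0 + P.sum(axis=1)

scores = np.array([2.0, 0.5, -1.0])
for scale in (1.0, 10.0):
    print(scale, exact_ranks(scale * scores), smoothed_ranks(scale * scores).round(3))
```

At the original scale, the smoothed ranks deviate visibly from 1, 2 and 3. Multiplying the scores by 10 makes them almost exact. Equal scores receive the midpoint rank, e.g. 2 for three tied items. This differentiable behaviour is what allows gradient-based learning.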


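Substituting Equations (5.25) and (5.26) into Equation (5.22), the smoothed approximation of GAP is obtained as:

$$sGAP := \frac{1}{QS} \sum_{u \in U} \sum_{t \in T} \frac{\sum_{i \in I} \sum_{k=1}^{c} \sum_{j \neq i} \beta_{gy} \, \frac{\sigma(\Delta\hat{y})}{1 + \sum_{j \neq i} \sigma(\Delta\hat{y})}}{\sum_{l=1}^{c} n_{u,l,t} \sum_{k=1}^{l} g_k} \qquad (5.27)$$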

Figure 5.8 shows the comparison between GAP, calculated using Equation (5.22),

and the smoothed approximation of GAP (𝑠𝐺𝐴𝑃), calculated using Equation (5.27).

Figure 5.8. The comparison between GAP and the smoothed approximation of GAP (𝑠𝐺𝐴𝑃)
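The relationship between GAP and 𝑠𝐺𝐴𝑃 for one (𝑢, 𝑡) list can be reproduced by evaluating the same expression twice. In the exact version, 𝕀(𝑟𝑢,𝑗,𝑡 ≤ 𝑟𝑢,𝑖,𝑡) and 𝑟𝑢,𝑖,𝑡 are replaced by hard indicators; in the smoothed version, they are replaced by logistic terms. This is a sketch under the following assumptions: the pairwise term is written as 𝜎(ŷ𝑢,𝑗,𝑡 − ŷ𝑢,𝑖,𝑡), the thresholds are paired as 𝜇1 = 1 and 𝜇2 = 2, and the name `gap_terms` and the toy scores are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gap_terms(y, s, g, mu, smooth):
    """GAP_{u,t} (smooth=False) or its smoothed counterpart (smooth=True) for one list."""
    y = np.asarray(y)
    s = np.asarray(s, dtype=float)
    D = s[None, :] - s[:, None]                          # D[i, j] = s_j - s_i
    P = sigmoid(D) if smooth else (D > 0).astype(float)  # stands in for I(r_j <= r_i)
    np.fill_diagonal(P, 0.0)                             # j != i
    r = 1.0 + P.sum(axis=1)                              # (smoothed) rank r_i
    # B[i, j] = sum_k g_k I(y_i >= 1) I(y_j >= mu_k), i.e. beta_gy summed over k
    B = sum(g_k * np.outer(y >= 1, y >= mu_k) for g_k, mu_k in zip(g, mu))
    np.fill_diagonal(B, 0.0)
    num = np.sum(B * P / r[:, None])
    den = sum(np.sum(y == l) * sum(g[:l]) for l in range(1, len(g) + 1))
    return num / den

y = [2, 1, -1, -1]                    # relevant, likely relevant, two irrelevant items
s = np.array([0.9, 0.5, 0.3, 0.1])    # predicted scores (hypothetical)
g, mu = (0.3, 0.7), (1, 2)            # assumed pairing: mu_1 = 1, mu_2 = 2
print("GAP ", gap_terms(y, s, g, mu, smooth=False))
for scale in (1, 10, 100):
    print("sGAP", scale, gap_terms(y, scale * s, g, mu, smooth=True))
```

At the original score scale, the smoothed value differs noticeably from the exact one. As the scores are scaled up, the logistic terms sharpen and 𝑠𝐺𝐴𝑃 approaches GAP. The smoothed value, unlike the exact one, changes continuously with the scores.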


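The resulting objective function, with 𝜆𝛩 as the regularization coefficient corresponding to 𝜎𝛩 to avoid overfitting, is formulated as:

$$L(\Theta) := \frac{1}{QS} \sum_{u \in U} \sum_{t \in T} \frac{\sum_{i \in I} \sum_{k=1}^{c} \sum_{j \neq i} \beta_{gy} \, \frac{\sigma(\Delta\hat{y})}{1 + \sum_{j \neq i} \sigma(\Delta\hat{y})}}{\sum_{l=1}^{c} n_{u,l,t} \sum_{k=1}^{l} g_k} - \lambda_\Theta \|\Theta\|_F^2 \qquad (5.28)$$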

Gradient ascent is performed to optimize the model parameters 𝛩 of ∆ŷ in the objective function of Equation (5.28). Note that the normalising terms $\frac{1}{QS}$ and $\sum_{l=1}^{c} n_{u,l,t} \sum_{k=1}^{l} g_k$ can be disregarded, as they have no influence on the optimization. Given a case (𝑢, 𝑖, 𝑗, 𝑡) with respect to the model parameters $\{m_u^{(1)}, m_i^{(2)}, m_j^{(2)}, m_t^{(3)}\}$, the gradient of 𝑠𝐺𝐴𝑃 is obtained by differentiating Equation (5.28) as follows:

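$$\frac{\partial L}{\partial \theta} = \frac{\partial}{\partial \theta}\left(\sum_{u \in U} \sum_{t \in T} \sum_{i \in I} \sum_{k=1}^{c} \sum_{j \neq i} \frac{\beta_{gy} \, \sigma(\Delta\hat{y})}{1 + \sum_{j \neq i} \sigma(\Delta\hat{y})}\right) - \lambda_\theta \theta \qquad (5.29)$$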



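Writing 𝛿 for 𝜎(∆ŷ) for notational convenience, the derivative can be rewritten as:

$$\frac{\partial L}{\partial \theta} = \sum_{u \in U} \sum_{t \in T} \sum_{i \in I} \sum_{k=1}^{c} \sum_{j \neq i} \frac{\left[\frac{\partial}{\partial \theta}\left(\beta_{gy}\delta\right)\right]\left(1 + \sum_{j \neq i} \delta\right) - \left[\beta_{gy}\delta\right]\left[\frac{\partial}{\partial \theta}\left(1 + \sum_{j \neq i} \delta\right)\right]}{\left[1 + \sum_{j \neq i} \delta\right]^2} - \lambda_\theta \theta \qquad (5.30)$$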

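Given that:

$$\frac{\partial}{\partial \theta}\left(\beta_{gy}\delta\right) = \beta_{gy}\left(-\delta + \delta^2\right) \frac{\partial}{\partial \theta} \Delta\hat{y} \qquad (5.31)$$

and

$$\frac{\partial}{\partial \theta}\left(1 + \sum_{j \neq i} \delta\right) = \sum_{j \neq i} \left(-\delta + \delta^2\right) \frac{\partial}{\partial \theta} \Delta\hat{y} \qquad (5.32)$$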

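The resulting gradient of 𝑠𝐺𝐴𝑃 is obtained as:

$$\frac{\partial L}{\partial \theta} = \sum_{u \in U} \sum_{t \in T} \sum_{i \in I} \sum_{k=1}^{c} \sum_{j \neq i} \frac{\beta_{gy}\left(\left[\left(-\delta + \delta^2\right)\left(1 + \sum_{j \neq i} \delta\right) \frac{\partial}{\partial \theta} \Delta\hat{y}\right] - \left[\delta \left(\sum_{j \neq i} \left(-\delta + \delta^2\right) \frac{\partial}{\partial \theta} \Delta\hat{y}\right)\right]\right)}{\left[1 + \sum_{j \neq i} \delta\right]^2} - \lambda_\theta \theta \qquad (5.33)$$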

Two observations allow the learning algorithm to run more efficiently. First, 𝛽𝑔𝑦 is determined by 𝕀(𝑦𝑢,𝑖,𝑡 ≥ 1)𝕀(𝑦𝑢,𝑗,𝑡 ≥ 𝜇𝑘): the entry of item 𝑖 must not be “irrelevant”, and the grade of the entry of item 𝑗 must be at least equal to the threshold 𝜇𝑘. Second, users do not use all of the available tags. Instead of calculating 𝑠𝐺𝐴𝑃 for the entire item set across all tags, 𝑠𝐺𝐴𝑃 is therefore optimized only across the tags that have been used by user 𝑢, 𝑉𝑢 = {𝑡 | (𝑢,∗, 𝑡) ∈ 𝐴𝑜𝑏}, and only for the items whose entries are labelled as “relevant” or “likely relevant” for user 𝑢, i.e. 𝑍𝑢 = {𝑖 | 𝑦𝑢,𝑖,𝑡 = 2 or 𝑦𝑢,𝑖,𝑡 = 1}. Other entries need not be included, since their values are always less than every threshold grade 𝜇𝑘, which means that 𝕀(𝑦𝑢,𝑗,𝑡 ≥ 𝜇𝑘) is always 0. As a result, the gradient of 𝑠𝐺𝐴𝑃 given a case (𝑢, 𝑖, 𝑗, 𝑡) with respect to the model parameters $\{m_u^{(1)}, m_i^{(2)}, m_j^{(2)}, m_t^{(3)}\}$ becomes:

$$\frac{\partial L}{\partial \theta} = \sum_{u \in U} \sum_{t \in V_u} \sum_{i \in Z_u} \sum_{k=1}^{c} \sum_{j \neq i} \frac{\beta_{gy}\left(\left[\left(-\delta + \delta^2\right)\left(1 + \sum_{j \neq i} \delta\right) \frac{\partial}{\partial \theta} \Delta\hat{y}\right] - \left[\delta \left(\sum_{j \neq i} \left(-\delta + \delta^2\right) \frac{\partial}{\partial \theta} \Delta\hat{y}\right)\right]\right)}{\left[1 + \sum_{j \neq i} \delta\right]^2} - \lambda_\theta \theta \qquad (5.34)$$

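To apply the 𝑠𝐺𝐴𝑃 optimization, the model only has to substitute the computation of $\frac{\partial}{\partial \theta} \Delta\hat{y}$ for each of its parameters:

$$\frac{\partial \Delta\hat{y}}{\partial m_u^{(1)}} = m_i^{(2)} \odot m_t^{(3)} - m_j^{(2)} \odot m_t^{(3)} \qquad (5.35)$$

$$\frac{\partial \Delta\hat{y}}{\partial m_i^{(2)}} = m_u^{(1)} \odot m_t^{(3)} \qquad (5.36)$$

$$\frac{\partial \Delta\hat{y}}{\partial m_j^{(2)}} = -\left(m_u^{(1)} \odot m_t^{(3)}\right) \qquad (5.37)$$

$$\frac{\partial \Delta\hat{y}}{\partial m_t^{(3)}} = m_u^{(1)} \odot m_i^{(2)} - m_u^{(1)} \odot m_j^{(2)} \qquad (5.38)$$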

where ⨀ denotes the element-wise product. The Go-Rank learning algorithm is outlined in Figure 5.9.
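Equations (5.35) to (5.38) follow directly from the CP form of the predictor, so they can be checked numerically. The sketch below assumes that Equation (5.5) is the standard CP predictor ŷ𝑢,𝑖,𝑡 = Σ𝑓 𝑚𝑢𝑓(1) 𝑚𝑖𝑓(2) 𝑚𝑡𝑓(3), with no additional weights. It compares the analytic derivatives of ∆ŷ = ŷ𝑢,𝑖,𝑡 − ŷ𝑢,𝑗,𝑡 with central finite differences. All function names are hypothetical.

```python
import numpy as np

def cp_score(m_u, m_i, m_t):
    """CP predictor y_hat = sum_f m_u[f] * m_i[f] * m_t[f] (assumed form of Eq. 5.5)."""
    return float(np.sum(m_u * m_i * m_t))

def delta_y(m_u, m_i, m_j, m_t):
    """Delta y_hat = y_hat(u, i, t) - y_hat(u, j, t)."""
    return cp_score(m_u, m_i, m_t) - cp_score(m_u, m_j, m_t)

def delta_y_grads(m_u, m_i, m_j, m_t):
    """Partial derivatives of Delta y_hat, Equations (5.35)-(5.38)."""
    return {
        "m_u": m_i * m_t - m_j * m_t,     # (5.35)
        "m_i": m_u * m_t,                 # (5.36)
        "m_j": -(m_u * m_t),              # (5.37)
        "m_t": m_u * m_i - m_u * m_j,     # (5.38)
    }

# Central finite-difference check on random factor vectors (F = 4)
rng = np.random.default_rng(0)
params = {name: rng.normal(size=4) for name in ("m_u", "m_i", "m_j", "m_t")}
grads = delta_y_grads(**params)
eps = 1e-6
max_err = 0.0
for name in params:
    for f in range(4):
        plus = {k: v.copy() for k, v in params.items()}
        minus = {k: v.copy() for k, v in params.items()}
        plus[name][f] += eps
        minus[name][f] -= eps
        numeric = (delta_y(**plus) - delta_y(**minus)) / (2 * eps)
        max_err = max(max_err, abs(numeric - grads[name][f]))
print(max_err)   # tiny: only finite-difference round-off remains
```

Because ∆ŷ is multilinear in the factor vectors, the central difference is exact up to floating-point round-off.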

1: Algorithm: Go-Rank Learning
2: Input: Training set 𝐷𝑡𝑟𝑎𝑖𝑛 ⊆ 𝑈 × 𝐼 × 𝑇, threshold probability 𝑔 ∈ {𝑔1, 𝑔2}, threshold grade 𝜇 ∈ {𝜇1, 𝜇2}, learning rate 𝛼, factor matrix column size 𝐹, regularization 𝜆, maximal iteration 𝑖𝑡𝑒𝑟𝑀𝑎𝑥
4: 𝑄 = |𝑈|, 𝑅 = |𝐼|, 𝑆 = |𝑇|, 𝒴 ∈ ℝ𝑄×𝑅×𝑆, 𝑦𝑢,𝑖,𝑡 ∈ {2, 1, 0, −1}
5: Populate 𝒴 using Equation (5.20)
6: 𝑐 = 𝑚𝑎𝑥(𝑦𝑢,𝑖,𝑡)
7: 𝑍𝑢 = {𝑖 | 𝑦𝑢,𝑖,𝑡 = 2 or 𝑦𝑢,𝑖,𝑡 = 1}
8: 𝑉𝑢 = {𝑡 | (𝑢,∗, 𝑡) ∈ 𝐴𝑜𝑏}
9: 𝑀(1)(0) ∈ ℝ𝑄×𝐹, 𝑀(2)(0) ∈ ℝ𝑅×𝐹, 𝑀(3)(0) ∈ ℝ𝑆×𝐹, ℎ = 0
10: repeat
12: 𝑚𝑢(1) ⟵ 𝑚𝑢(1) + 𝛼 𝜕𝐿/𝜕𝑚𝑢(1) based on Equations (5.34) and (5.35)
14: 𝑚𝑡(3) ⟵ 𝑚𝑡(3) + 𝛼 𝜕𝐿/𝜕𝑚𝑡(3) based on Equations (5.34) and (5.38)
16: for 𝑡 ∈ 𝑉𝑢 do
17: for 𝑖 ∈ 𝑍𝑢 do
18: for 𝑘 ← 1 to 𝑐 do
19: for 𝑗 ≠ 𝑖 do
20: 𝑚𝑖(2) ⟵ 𝑚𝑖(2) + 𝛼 𝜕𝐿/𝜕𝑚𝑖(2) based on Equations (5.34) and (5.36)
21: 𝑚𝑗(2) ⟵ 𝑚𝑗(2) + 𝛼 𝜕𝐿/𝜕𝑚𝑗(2) based on Equations (5.34) and (5.37)
22: ++ℎ
23: until ℎ ≥ 𝑖𝑡𝑒𝑟𝑀𝑎𝑥

Figure 5.9. The Go-Rank learning algorithm
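The loop below is a compact sketch of the learning procedure of Figure 5.9 on a toy tensor. It is not the thesis implementation. It rests on the following assumptions:

- The standard CP predictor is used for the item scores.
- The pairwise term is written as 𝜎(ŷ𝑢,𝑗,𝑡 − ŷ𝑢,𝑖,𝑡).
- The normalising terms are dropped, as stated for Equation (5.29), and the −𝜆𝜃 regularization term is kept.
- As a simplification, the factors are updated once per (𝑢, 𝑡 ∈ 𝑉𝑢) pair with the gradient summed over all (𝑖, 𝑘, 𝑗) cases, rather than once per case as in Figure 5.9.

Items outside 𝑍𝑢 contribute no 𝛽𝑔𝑦-weighted terms, because their rows of the weight matrix are zero. The toy triples, hyperparameters, and function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def graded_tensor(observed, shape):
    """Grades of Equation (5.20): 2 / 1 / -1 on (u, t) sets in A_ob, 0 elsewhere."""
    Y = np.zeros(shape, dtype=int)
    for u in {v for v, _, _ in observed}:
        C_u = sorted({i for v, i, _ in observed if v == u})     # items tagged by u
        for t in {w for v, _, w in observed if v == u}:         # V_u
            Y[u, :, t] = -1                                     # "irrelevant"
            Y[u, C_u, t] = 1                                    # "likely relevant"
    for u, i, t in observed:
        Y[u, i, t] = 2                                          # "relevant"
    return Y

def sgap_and_grad(y, s, g, mu):
    """Unnormalised sGAP of one (u, t) list and its gradient w.r.t. the item scores s."""
    P = sigmoid(s[None, :] - s[:, None])          # P[i, j] ~ I(r_j <= r_i)
    np.fill_diagonal(P, 0.0)
    r = 1.0 + P.sum(axis=1)                       # smoothed ranks
    B = sum(g_k * np.outer(y >= 1, y >= mu_k) for g_k, mu_k in zip(g, mu))
    np.fill_diagonal(B, 0.0)                      # beta_gy summed over k, j != i
    a = (B * P).sum(axis=1)
    value = np.sum(a / r)
    # d value / d P[i, j], times d P[i, j] / d (s_j - s_i) = P (1 - P)
    G = (B / r[:, None] - (a / r**2)[:, None]) * P * (1.0 - P)
    grad = G.sum(axis=0) - G.sum(axis=1)          # +G w.r.t. s_j, -G w.r.t. s_i
    return value, grad

def tags_used(Y, u):
    return np.flatnonzero((Y[u] != 0).any(axis=0))   # V_u

def train(Y, F=4, g=(0.3, 0.7), mu=(1, 2), lr=0.05, lam=1e-4, epochs=100, seed=0):
    Q, R, S = Y.shape
    rng = np.random.default_rng(seed)
    M1, M2, M3 = (0.3 * rng.standard_normal((n, F)) for n in (Q, R, S))
    for _ in range(epochs):
        for u in range(Q):
            for t in tags_used(Y, u):
                w = M1[u] * M3[t]
                _, gs = sgap_and_grad(Y[u, :, t], M2 @ w, g, mu)   # CP scores of all items
                d_m1 = (M2.T @ gs) * M3[t]     # chain rule: s_i = sum_f M1[u,f] M2[i,f] M3[t,f]
                d_m3 = (M2.T @ gs) * M1[u]
                d_M2 = np.outer(gs, w)
                M1[u] += lr * (d_m1 - lam * M1[u])   # gradient ascent with -lambda*theta term
                M3[t] += lr * (d_m3 - lam * M3[t])
                M2 += lr * (d_M2 - lam * M2)
    return M1, M2, M3

def total_sgap(Y, M1, M2, M3, g=(0.3, 0.7), mu=(1, 2)):
    return sum(sgap_and_grad(Y[u, :, t], M2 @ (M1[u] * M3[t]), g, mu)[0]
               for u in range(Y.shape[0]) for t in tags_used(Y, u))

obs = [(0, 0, 0), (0, 1, 0), (0, 2, 1), (1, 3, 2), (1, 4, 2), (1, 3, 3),
       (2, 5, 1), (2, 0, 1), (2, 1, 3)]
Y = graded_tensor(obs, (3, 6, 4))
init, trained = train(Y, epochs=0), train(Y, epochs=200)
print(total_sgap(Y, *init), total_sgap(Y, *trained))
```

The per-(𝑢, 𝑡) update trades the fine-grained case order of Figure 5.9 for vectorised NumPy operations. The direction of every step is still the gradient of the smoothed objective.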


5.3.3.4 Complexity Analysis and Convergence

The complexity of the Go-Rank learning process is analysed for a single iteration. The complexity of the initial Go-Rank (Equation (5.28)) is 𝑂(𝐹(𝑄 + 𝑆 + 𝑄𝑆𝑐𝑅²)), where 𝑐 = 𝑚𝑎𝑥(𝑦𝑢,𝑖,𝑡). Given that 𝒴 ∈ ℝ𝑄×𝑅×𝑆, the total number of possible entries of the tensor model, i.e. the sum of “relevant”, “likely relevant”, “irrelevant” and “indecisive” entries, is |𝑌| = 𝑄𝑅𝑆. Since |𝑌| ≫ 𝑄, 𝑆, 𝑅, 𝐹, 𝑐, the overall complexity of Go-Rank in one iteration can be regarded as |𝑌|. In other words, it is linear in the total number of possible entries of the tensor model.

After implementing the proposed approach for making the learning run more efficiently, the Go-Rank complexity (as outlined in Figure 5.9) becomes 𝑂(𝐹(𝑄 + 𝑆 + 𝑄Ṽ𝑐Z̃²)), where Ṽ and Z̃ denote the average sizes of 𝑉𝑢 and 𝑍𝑢, respectively. Since Ṽ ≪ 𝑆 and Z̃ ≪ 𝑅, the Go-Rank complexity now becomes |Ỹ|, i.e. the sum of “relevant” and “likely relevant” entries of the tensor model, where |Ỹ| ≪ |𝑌|.
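The gap between the two complexity expressions can be illustrated by counting the (𝑢, 𝑡, 𝑖, 𝑘, 𝑗) cases visited per iteration. The factor 𝐹 is omitted from the count. This is a sketch under one stated assumption: following the Z̃² term, 𝑗 is also drawn from 𝑍𝑢 in the fast case. The function name and the toy tensor, which holds only 𝑢1's grades from Example 5.4, are hypothetical.

```python
import numpy as np

def inner_loop_counts(Y):
    """Count (u, t, i, k, j) cases per iteration: all entries vs. the fast-learning subset."""
    Q, R, S = Y.shape
    c = int(Y.max())                              # c = max(y_{u,i,t})
    original = Q * S * R * c * (R - 1)            # every u, t, i, k and j != i
    fast = 0
    for u in range(Q):
        for t in range(S):
            if not (Y[u, :, t] == 2).any():       # skip t not in V_u
                continue
            z = int((Y[u, :, t] >= 1).sum())      # items of Z_u on this (u, t) set
            fast += z * c * (z - 1)               # i and j both drawn from Z_u
    return original, fast

# u1's grades from Example 5.4 (zero-based indices); other users left empty
Y = np.zeros((3, 4, 5), dtype=int)
Y[0, :, 0] = [-1, 2, 1, -1]
Y[0, :, 2] = [-1, 2, 1, -1]
Y[0, :, 3] = [-1, 1, 2, -1]
print(inner_loop_counts(Y))   # (360, 12)
```

Even on this tiny tensor, restricting the loops to 𝑉𝑢 and 𝑍𝑢 cuts the number of cases by a factor of 30. The reduction grows with sparsity, because most (𝑢, 𝑡) sets are never observed.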


Since Go-Rank implements a ranking-based interpretation scheme and a learning-to-rank model, the resulting latent factors of Go-Rank can be used directly to generate the Top-𝑁 list of item recommendations for each target user. Using Equation (5.5), the predicted preference score ŷ𝑢,𝑖,𝑡 of target user 𝑢 for item 𝑖 on tag 𝑡 is calculated. The candidate items of user 𝑢 are identified based on the maximum ŷ𝑢,𝑖,𝑡 of each user-item set. The scores of the candidate items are then ranked in descending order to generate the list of recommended items.
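The recommendation step can be sketched as follows. The sketch assumes the standard CP predictor for Equation (5.5) and uses hand-picked factor values. The function name `top_n` is hypothetical, and whether already-tagged items are filtered out is not specified here, so no filtering is applied.

```python
import numpy as np

def top_n(m_u, M2, M3, n):
    """Top-N items for one user: maximum CP score over tags, then descending order."""
    scores = M2 @ (m_u[:, None] * M3.T)   # scores[i, t] = sum_f m_u[f] * M2[i, f] * M3[t, f]
    candidate = scores.max(axis=1)         # best tag score per item (user-item set)
    order = np.argsort(-candidate, kind="stable")[:n]
    return order, candidate[order]

m_u = np.array([1.0, 0.5])                                         # user factor (F = 2)
M2 = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]])    # 4 items
M3 = np.array([[1.0, 0.0], [0.0, 3.0]])                            # 2 tags
idx, vals = top_n(m_u, M2, M3, n=2)
print(idx.tolist(), vals.tolist())   # [1, 0] [1.5, 1.0]
```

Item 1 ranks first because its best tag score, 1.5 on the second tag, exceeds item 0's best score of 1.0 on the first tag.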


The proposed Go-Rank method and benchmarking methods are evaluated by 5-fold






𝐷𝑡𝑒𝑠𝑡. Performance is reported as the average over all five runs using AP and NDCG at various Top-𝑁 positions, as well as MAP. The performance of Go-Rank is compared with that of the benchmarking methods MAX (Symeonidis et al., 2010), PITF (Rendle and Schmidt-Thieme, 2010), and CTS (Kim et al., 2010). Note that the comparison among all proposed methods is presented in Chapter 6.

To enable meaningful comparisons, the parameter values of all methods are tuned on a randomly selected 25% of the observed data available in 𝐷𝑡𝑟𝑎𝑖𝑛. For all tensor-based methods, the size of the latent factor matrix 𝐹 is set to 128, as recommendation quality usually does not benefit from larger values. The learning rate 𝛼 and regularization 𝜆 for PITF are set to 0.01 and 5𝑒−05, respectively, as suggested in (Ifada and Nayak, 2015). The parameters of MAX are empirically tuned to 𝑡𝑜𝑙𝑒𝑟𝑎𝑛𝑐𝑒 = 1𝑒−04 and 𝜆 = 0. For CTS, the neighbourhood size 𝑘 and model size 𝑤 are both searched over the grid {10, 20, 30, 40, 50, 60, 70, 80, 90, 100}. The learning rate 𝛼 and regularization 𝜆 of Go-Rank are tuned in the ranges 0.01 to 0.1 and 0.00001 to 0.00005, respectively.

5.3.5.1 Impact of Graded-relevance Scheme

The impact of the graded-relevance scheme is investigated by comparing the population of tensor 𝒴 entries generated from 𝐷𝑡𝑟𝑎𝑖𝑛 using the proposed graded-relevance scheme with the populations generated using the boolean and set-based schemes. The comparison with UTS, a previously proposed interpretation scheme, is presented in Chapter 6.

The statistics of the tensor entries population listed in Table 5.6 show that the “relevant” entries population is the same under all schemes. Concurring with the outcome described in Section 5.3.2, the boolean scheme generates the least variety of distinct entries of the three schemes. A significant difference between the set-based and graded-relevance schemes is that the latter breaks down the “irrelevant” entries of the former into “likely relevant” and “irrelevant” entries, while the “indecisive” entries remain the same, as described in Section 5.3.2. In this case, the graded-relevance scheme reveals a small portion, less than 16%, of the set-based “irrelevant” entries as “likely relevant” entries. Part of these entries can then be regarded as “relevant” entries by using the threshold probability (Robertson et al., 2010), as shown in Section 5.3.5.3.
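The “less than 16%” figure can be recomputed from Table 5.6. The sketch below copies the percentages of the four dataset blocks, in the order they appear in the table, for the 10-, 15- and 20-core sets. It takes the “Irrelevant” row listed immediately before each graded-relevance block as the set-based value; the assertion inside the loop confirms this pairing. It then computes the share of set-based “irrelevant” entries that the graded-relevance scheme relabels as “likely relevant”.

```python
# Percentages from Table 5.6 (columns: 10-, 15- and 20-core), one row per dataset block
set_irrelevant = [
    [0.4377, 0.5670, 0.6737],
    [1.2937, 1.7233, 2.1553],
    [0.3176, 0.4777, 0.5995],
    [1.6114, 2.0969, 2.4147],
]
graded_likely = [
    [0.0092, 0.0171, 0.0260],
    [0.0399, 0.0790, 0.1311],
    [0.0137, 0.0327, 0.0523],
    [0.1463, 0.2493, 0.3729],
]
graded_irrelevant = [
    [0.4285, 0.5499, 0.6477],
    [1.2538, 1.6443, 2.0242],
    [0.3039, 0.4450, 0.5472],
    [1.4651, 1.8476, 2.0418],
]

all_shares = []
for si, gl, gi in zip(set_irrelevant, graded_likely, graded_irrelevant):
    # graded "likely relevant" + "irrelevant" reproduces set-based "irrelevant"
    assert all(abs(likely + irr - s) < 1e-4 for s, likely, irr in zip(si, gl, gi))
    # share (%) of set-based "irrelevant" entries relabelled as "likely relevant"
    all_shares.append([100 * likely / s for s, likely in zip(si, gl)])

for block, shares in enumerate(all_shares, start=1):
    print(block, [round(x, 1) for x in shares])
```

The largest share is about 15.4%, consistent with the bound stated above. Within every block, the share grows with the 𝑝-core size.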









Irrelevant 0.4377 0.5670 0.6737

Indecisive 99.5617 99.4316 99.3236

graded-relevance Relevant 0.0006 0.0014 0.0027

Likely Relevant 0.0092 0.0171 0.0260

Irrelevant 0.4285 0.5499 0.6477

Indecisive 99.5617 99.4316 99.3236




Irrelevant 1.2937 1.7233 2.1553

Indecisive 98.7021 98.2675 97.8284


Likely Relevant 0.0399 0.0790 0.1311

Irrelevant 1.2538 1.6443 2.0242

Indecisive 98.3021 98.2675 97.8284




Irrelevant 0.3176 0.4777 0.5995

Indecisive 99.6814 99.5186 99.3910


Likely Relevant 0.0137 0.0327 0.0523

Irrelevant 0.3039 0.4450 0.5472

Indecisive 99.6814 99.5186 99.3910




Irrelevant 1.6114 2.0969 2.4147

Indecisive 98.3824 97.8748 97.5569


Likely Relevant 0.1463 0.2493 0.3729

Irrelevant 1.4651 1.8476 2.0418

Indecisive 98.3824 97.8748 97.5569

Table 5.6. The comparison of tensor entries population distribution generated from 𝐷𝑡𝑟𝑎𝑖𝑛 using

boolean, set-based, and graded-relevance schemes



The recommendation performance of the proposed Go-Rank and the benchmarking methods in terms of NDCG, AP, and MAP on each dataset is listed in Table 5.7, Table 5.8, Table 5.9, and Table 5.10. Note that, in contrast to AP, the NDCG score decreases as the Top-𝑁 position increases.

The performance of the proposed Go-Rank relative to the benchmarking methods varies across datasets. On the Delicious dataset, Go-Rank outperforms all benchmarking methods except PITF. On the LastFM and CiteULike datasets, Go-Rank achieves superior results on the 15-core and 20-core sets for every evaluation measure and at all Top-𝑁 positions; however, the results on the 10-core sets do not show the same trend. In contrast, Go-Rank consistently outperforms the benchmarking methods on every 𝑝-core size of the MovieLens dataset. These variations can be explained by the tensor entries population distribution of each dataset listed in Table 5.6. Go-Rank's inferior results occur only on datasets with a very low “relevant” entries population (i.e. less than 0.0030%); on datasets with a higher “relevant” entries population, Go-Rank shows its superiority. The 𝑝-core size also affects Go-Rank's improvement over the benchmarking methods: on all datasets, the improvement increases with the 𝑝-core size. Figure 5.10 shows the Go-Rank improvement in terms of AP@5 over one of the benchmarking methods, PITF. Following this trend, Go-Rank may outperform PITF on the Delicious dataset with a larger 𝑝-core size. In general, these results confirm that Go-Rank is an effective approach for learning the tensor model built with data labelled using the proposed graded-relevance interpretation scheme. Its GAP-based optimization enables the learning model to set up thresholds so that the “likely relevant” entries can be regarded as either “relevant” or “irrelevant” entries.




MAX 1.90 1.70 4.22 4.70 2.38 2.00 1.77 4.14 4.63 2.86 2.10 1.85 4.73 5.34 3.75

PITF 2.30 1.99 5.33 5.88 2.60 2.12 1.93 4.77 5.52 3.16 2.52 2.18 5.81 6.48 4.29

CTS 1.85 1.66 4.02 4.59 2.51 2.03 1.84 4.47 4.99 3.11 2.28 2.05 5.20 5.83 4.07

Go-Rank 1.94 1.74 4.51 5.02 2.65 2.05 1.82 4.60 5.15 3.19 2.47 2.18 5.65 6.33 4.55

Table 5.7. NDCG, AP, and MAP on Delicious dataset



MAX 6.68 5.92 12.38 12.48 5.33 7.69 6.63 13.83 13.84 6.31 8.26 7.22 14.54 14.65 7.11

PITF 7.56 6.45 13.97 14.31 5.96 8.15 7.15 15.43 15.67 7.05 8.41 7.58 16.16 16.62 7.52

CTS 4.87 4.17 12.36 12.47 3.87 7.78 6.94 14.27 14.57 6.55 8.93 7.83 16.56 16.93 7.59

Go-Rank 7.39 6.35 14.22 14.15 5.65 8.60 7.59 16.38 16.68 7.27 10.28 8.91 19.33 19.34 8.93

Table 5.8. NDCG, AP, and MAP on LastFM dataset




MAX 3.18 2.73 6.09 6.45 4.58 3.93 3.26 8.14 8.50 6.88 3.97 3.48 8.31 9.24 8.82

PITF 3.18 2.68 5.57 5.05 4.28 3.94 3.58 8.51 9.20 7.18 4.65 3.74 8.54 9.91 9.78

CTS 2.2 1.87 4.94 5.34 4.31 3.87 3.20 8.44 8.97 7.91 5.16 4.32 10.23 11.05 11.33

Go-Rank 3.11 2.77 5.77 6.20 4.33 5.04 4.24 10.16 10.77 9.19 6.09 4.48 10.66 11.61 11.52

Table 5.9. NDCG, AP, and MAP on CiteULike dataset



MAX 6.23 5.10 10.25 10.56 5.36 7.27 6.05 13.84 14.22 7.36 8.40 7.21 16.46 16.82 10.40

PITF 4.14 3.82 8.34 9.21 4.36 4.33 4.23 9.40 9.98 5.98 6.28 5.65 12.26 12.90 8.73

CTS 5.97 5.10 10.04 10.38 5.68 6.24 5.22 12.08 12.58 7.08 8.17 7.10 15.82 16.07 10.30

Go-Rank 6.24 5.11 10.52 10.93 6.18 8.61 7.01 16.24 17.02 9.73 11.34 9.34 21.32 21.73 14.46

Table 5.10. NDCG, AP, and MAP on MovieLens dataset


Figure 5.10. Go-Rank improvement over PITF

It is worth highlighting the reasons for Go-Rank's outperformance over PITF on all datasets except Delicious. Both methods implement non-boolean interpretation schemes to build the tensor models: Go-Rank uses the graded-relevance scheme, while PITF uses the set-based scheme. Go-Rank attains a large improvement over PITF for two main reasons. First, Go-Rank builds the learning model as a list-wise ranking model, i.e. it aims to get the order of the whole list correct, whereas PITF builds the learning model as a pair-wise ranking model, which only attempts to get the ranking order within each pair correct. Second, Go-Rank enhances the Top-𝑁 recommendation performance by optimizing the top-biased measure GAP, i.e. the generalisation of AP for ordinal relevance data, whereas PITF optimizes the equal-penalty measure AUC (Shi, Karatzoglou, Baltrunas, Larson, Hanjalic, et al., 2012). Additionally, CTS underperforms Go-Rank, which indicates that projecting the ternary relations of tagging data into a two-dimensional model adversely affects recommendation quality (Symeonidis et al., 2010).
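The difference between an equal-penalty and a top-biased measure can be seen with binary relevance. The sketch uses the common definitions of AP and AUC, not the graded GAP. It compares two rankings of two relevant and two non-relevant items that have the same AUC. The function names and the toy rankings are illustrative only.

```python
def average_precision(labels):
    """Binary AP of a ranked list (1 = relevant), best-ranked item first."""
    hits, total = 0, 0.0
    for pos, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            total += hits / pos          # precision at each relevant position
    return total / hits if hits else 0.0

def auc(labels):
    """Fraction of (relevant, non-relevant) pairs in which the relevant item is ranked higher."""
    n_rel = sum(labels)
    n_irr = len(labels) - n_rel
    rel_above, correct = 0, 0
    for rel in labels:
        if rel:
            rel_above += 1
        else:
            correct += rel_above         # relevant items ranked above this one
    return correct / (n_rel * n_irr)

top_heavy = [1, 0, 0, 1]   # relevant items at positions 1 and 4
middle = [0, 1, 1, 0]      # relevant items at positions 2 and 3
for name, ranking in (("top_heavy", top_heavy), ("middle", middle)):
    print(name, auc(ranking), round(average_precision(ranking), 4))
# top_heavy 0.5 0.75
# middle 0.5 0.5833
```

AUC penalises every misordered pair equally, so it cannot distinguish the two rankings. AP rewards the ranking that places a relevant item at the very top, which is the behaviour that matters for Top-𝑁 lists.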

5.3.5.3 Impact of Probability Values

The impact of the probability values is examined to demonstrate how “relevant” the “likely relevant” data are. The probability values 𝑔 ∈ {𝑔1, 𝑔2}, where 𝑔1 + 𝑔2 = 1, are the probabilities that regulate whether the “likely relevant” entries should be regarded as “relevant” or “irrelevant”. It is to be noted that 𝑔1 and 𝑔2 determine the


percentages of the “likely relevant” entries that are considered “relevant” and “irrelevant”, respectively. For example, 𝑔1 = 0.1 indicates that 10% of the “likely relevant” entries will be considered “relevant” and the other 90% will be considered “irrelevant”. The experiments were conducted over a grid of probability values between 0 and 1 with an interval of 0.1, resulting in 𝑔1 ∈ {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}. The threshold for regarding the “likely relevant” entries as either “relevant” or “irrelevant” entries is fixed as 𝜇 ∈ {2, 1}, following the grades of the “relevant” and “likely relevant” entries formulated in Equation (5.20). Depending on which threshold value is used, the entries with 𝑦𝑢,𝑗,𝑡 ≥ 𝜇 are regarded as “relevant” and the others as “irrelevant”.
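The role of 𝑔1 can be illustrated on a two-item example using Equation (5.21). The sketch assumes the pairing 𝜇1 = 1 and 𝜇2 = 2, so that 𝑔1 weights the threshold under which “likely relevant” entries count as relevant, and sets 𝑔2 = 1 − 𝑔1. It sweeps the same grid of 𝑔1 values. It scores one fixed ranking twice: once with the “relevant” item on top, and once with the “likely relevant” item on top. The function name and scores are hypothetical.

```python
import numpy as np

def gap_ut(y, scores, g, mu):
    """GAP_{u,t} of Equation (5.21); ranks from scores, ties broken by item index."""
    y = np.asarray(y)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    r = np.empty(len(y), dtype=int)
    r[order] = np.arange(1, len(y) + 1)
    num = sum(g_k / r[i]
              for i in range(len(y)) if y[i] >= 1
              for g_k, mu_k in zip(g, mu)
              for j in range(len(y)) if j != i and y[j] >= mu_k and r[j] <= r[i])
    den = sum(np.sum(y == l) * sum(g[:l]) for l in range(1, len(g) + 1))
    return num / den

grid = [k / 10 for k in range(11)]       # g1 in {0.0, 0.1, ..., 1.0}
scores = [0.9, 0.5, 0.3, 0.1]            # fixed ranking: item 0 first, item 1 second
for g1 in grid:
    g, mu = (g1, 1.0 - g1), (1, 2)       # assumed: mu_1 = 1, mu_2 = 2
    relevant_first = gap_ut([2, 1, -1, -1], scores, g, mu)
    likely_first = gap_ut([1, 2, -1, -1], scores, g, mu)
    print(f"g1={g1:.1f}  relevant first={relevant_first:.3f}  likely first={likely_first:.3f}")
```

With 𝑔1 = 0, ranking the “likely relevant” item above the “relevant” one earns nothing. As 𝑔1 grows, that ordering is penalised less, and at 𝑔1 = 1 both orderings score the same. This is how 𝑔1 controls how “relevant” the “likely relevant” entries are treated as being.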

For the Delicious dataset, Figure 5.11 shows that not all of the “likely relevant” entries are “irrelevant”, since the highest AP@5 is achieved when 𝑔1 = 0.3, 𝑔1 = 0.2 and 𝑔1 = 0.1 for the 10-core, 15-core and 20-core sets, respectively. This means that 30%, 20% and 10% of the “likely relevant” entries, respectively, are actually found to be “relevant”. For the LastFM dataset, Figure 5.12 shows that the highest AP@5 is achieved when 𝑔1 = 0.1, 𝑔1 = 0.1 and 𝑔1 = 0.2 for the 10-core, 15-core and 20-core sets, respectively. Hence, 10%, 10% and 20% of the “likely relevant” entries of these sets can be regarded as “relevant”. Likewise, Figure 5.13 shows that 20%, 20% and 40% of the “likely relevant” entries can be regarded as “relevant” for the 10-core, 15-core and 20-core sets of the CiteULike dataset, as the highest AP@5 is achieved when 𝑔1 = 0.2, 𝑔1 = 0.2 and 𝑔1 = 0.4, respectively. Finally, complementing the results on the other datasets, the probability value experiments on the MovieLens dataset show that not all of the “likely relevant” entries are “irrelevant”. Figure 5.14 shows that the highest AP@5 is achieved when 𝑔1 = 0.1, 𝑔1 = 0.2 and 𝑔1 = 0.3 for the 10-core, 15-core and 20-core sets, respectively. In other words, 10%, 20% and 30% of the “likely relevant” entries are actually found to be “relevant”.


Figure 5.11. Impact of probability values on the Delicious dataset: (a) 10-core, (b) 15-core, (c) 20-core (AP@5 (%) versus probability value 𝑔1)

Figure 5.12. Impact of probability values on the LastFM dataset: (a) 10-core, (b) 15-core, (c) 20-core (AP@5 (%) versus probability value 𝑔1)

Figure 5.13. Impact of probability values on the CiteULike dataset: (a) 10-core, (b) 15-core, (c) 20-core (AP@5 (%) versus probability value 𝑔1)

Figure 5.14. Impact of probability values on the MovieLens dataset: (a) 10-core, (b) 15-core, (c) 20-core (AP@5 (%) versus probability value 𝑔1)


All of these results establish that, on each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set, the items other than those appearing in the observed entries should not simply be regarded as “irrelevant”. The user has revealed his interest in some of them by annotating them with other tags, and these are regarded as “transitional” entries. The graded-relevance scheme is an efficient scheme, as it sets those entries as distinct entries positioned between the “relevant” and “irrelevant” entries and labels them as “likely relevant”.

Additionally, a relationship between the impact of the probability values and the 𝑝-core size can be observed from the results. The larger the 𝑝-core size, the less the performance changes with the probability value. The exception is the LastFM dataset, although the same behaviour may appear there at a much larger 𝑝-core size. In other words, smaller 𝑝-core sizes are the most affected by variations in the probability values. This observation indicates that implementing the graded-relevance scheme in Go-Rank highlights data granularity more effectively on a sparse dataset than on a dense dataset.

5.3.5.4 Scalability

The scalability of Go-Rank is examined in terms of its learning running time, to study the impact of the “fast learning” approach for optimizing 𝑠𝐺𝐴𝑃 as defined in Equation (5.34). The examination is demonstrated on the 10-core sets of the Delicious and MovieLens datasets, as the other datasets and cores show similar results. The learning running time is measured for a single iteration at various scales, i.e. from 10% to 100% of the training set (𝐷𝑡𝑟𝑎𝑖𝑛). Figure 5.15 shows that, on both datasets, the fast learning approach is much faster than the “original learning” approach (Equation (5.33)).

Figure 5.15. The Go-Rank scalability: (a) Delicious 10-core, (b) MovieLens 10-core (log running time (sec) of fast learning and original learning versus the percentage of the training set used, from 10% to 100%)


5.3.5.5 Convergence

The convergence of the Go-Rank learning algorithm is demonstrated on the 20-core sets of the CiteULike and MovieLens datasets; the other datasets and cores show the same convergence behaviour. Figure 5.16 shows the evolution of AP@5 across iterations on the training (𝐷𝑡𝑟𝑎𝑖𝑛) and test (𝐷𝑡𝑒𝑠𝑡) sets. The score gradually increases over the iterations and converges after a few iterations. These results demonstrate that Go-Rank effectively optimizes GAP.

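Figure 5.16. The Go-Rank convergence: (a) CiteULike 20-core, (b) MovieLens 20-core (AP@5 (%) on the training and test sets versus number of iterations)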

5.3.6 Summary of Learning from Graded-Relevance Data

In this section, the GAP Optimization for Learning-to-Rank (Go-Rank) method is proposed for learning from graded-relevance data in a tag-based item recommendation model. Go-Rank proposes the graded-relevance scheme for interpreting the tagging data, and it directly optimizes the (smoothed) GAP to learn the tensor model that generates the item recommendation list. To improve the scalability of Go-Rank, a fast learning approach that applies sparsity-aware optimization is implemented.


that:

The graded-relevance scheme is efficient, as it leverages the tagging data more effectively. It establishes that, on each (𝑢, 𝑡) ∈ 𝐴𝑜𝑏 set, any items other



than those appearing in the observed entries should not all simply be regarded as “irrelevant” entries, as they are in the set-based scheme;

Go-Rank is scalable and outperforms all benchmarking methods on the NDCG, AP, and MAP measures on most of the datasets. These results confirm that implementing the graded-relevance scheme and optimizing GAP to build the learning model improve recommendation performance. The portion of “likely relevant” entries found to be “relevant” consequently helps Go-Rank produce high-quality recommendations.

5.4 CHAPTER SUMMARY

This chapter has detailed the two proposed list-wise based ranking recommendation methods, namely DCG Optimization for Learning-to-Rank (Do-Rank) and GAP Optimization for Learning-to-Rank (Go-Rank), to solve the tag-based item


Do-Rank is developed for learning from multi-graded data using the DCG ranking evaluation measure as an optimization criterion. As demonstrated in the results, Do-Rank outperforms all benchmarking methods on the NDCG, AP, and MAP measures on most datasets. The proposed UTS scheme, implemented for Do-Rank, efficiently interprets the tagging data and improves Do-Rank's scalability. Meanwhile, Go-Rank is developed for learning from graded-relevance data using the GAP ranking evaluation measure as an optimization criterion. The proposed graded-relevance scheme, which encourages tensor density, is implemented to populate the tensor entries for efficient and fast learning. The experimental results have demonstrated that the graded-relevance scheme efficiently interprets the tagging data. They have also shown that Go-Rank is scalable and outperforms all benchmarking methods on the NDCG, AP, and MAP measures on most of the datasets.


Chapter 6: Performance Comparisons and Analysis 165

Chapter 6: Performance Comparisons and Analysis

In Chapter 4, two point-wise based ranking recommendation methods were discussed: Tensor-based Item Recommendation using Probabilistic Ranking (TRPR) and Recommendation Ranking using Weighted Tensor (We-Rank). In Chapter 5, two list-wise based ranking recommendation methods were described: DCG Optimization for Learning-to-Rank (Do-Rank) and GAP Optimization for Learning-to-Rank (Go-Rank). However, the proposed methods and the benchmarking methods have not yet been compared with one another. Consequently, characteristics such as the strengths and shortcomings of each method, the method that achieves the best performance, and the impact of the interpretation scheme remain unknown. Note that NDCG, AP, and MAP are used as the evaluation measures, as described in Section 3.4, since they are more widely used than the F1-Score to measure ranking performance.

In this chapter, the results of all four methods are compared and analysed to determine the situations in which each method is best used. The objectives of this chapter are:

To analyse the impact of the interpretation scheme on the tensor entries population;

To analyse the impact of users’ tagging behaviour on the tensor entries population;

To analyse the impact of “relevant” entries on the methods’ performances;

To analyse the impact of handling “likely relevant” entries on the methods’ performances;

To analyse the impact of the 𝑝-core on the tensor entries population and on the performance of the recommendation methods;

To compare and analyse the performance of two learning-to-rank approaches: point-wise and list-wise based ranking methods;

To compare the performance of the proposed and benchmarking methods;


To compare and analyse the performance of the proposed methods, including accuracy, computational complexity, scalability, and efficiency;

To discuss the strengths and shortcomings of the proposed methods.

Note that TRPR-CP is chosen to represent TRPR, since implementing the CP technique results in the best overall performance compared with the other factorization techniques, as observed from the results in Table 6.1, Table 6.2, Table 6.3, and Table 6.4.
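Because TRPR-CP relies on the CANDECOMP/PARAFAC (CP) decomposition, the following minimal NumPy sketch illustrates how a third-order user × item × tag tensor is reconstructed from CP factor matrices, i.e. as the sum over 𝑓 of the outer products of the 𝑓-th user, item, and tag factor columns. The factor matrices are random placeholders rather than learned factors, and the names `cp_reconstruct`, `A`, `B`, `C`, and `rank` are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

def cp_reconstruct(A: np.ndarray, B: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Rebuild X_hat[u, i, t] = sum_f A[u, f] * B[i, f] * C[t, f] (CP model)."""
    assert A.shape[1] == B.shape[1] == C.shape[1], "factor matrices must share the rank"
    return np.einsum("uf,if,tf->uit", A, B, C)

rng = np.random.default_rng(0)
n_users, n_items, n_tags, rank = 4, 5, 3, 2
A = rng.random((n_users, rank))   # user factors (placeholder values, not learned)
B = rng.random((n_items, rank))   # item factors
C = rng.random((n_tags, rank))    # tag factors

X_hat = cp_reconstruct(A, B, C)
print(X_hat.shape)  # (4, 5, 3)

# Element-wise check of one reconstructed entry against the CP definition.
u, i, t = 1, 2, 0
manual = sum(A[u, f] * B[i, f] * C[t, f] for f in range(rank))
print(np.isclose(X_hat[u, i, t], manual))  # True
```

`np.einsum` is used here only as a compact way to write the sum of rank-one outer products; any equivalent loop gives the same tensor.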




MAX 1.90 1.70 4.22 4.70 2.38 2.00 1.77 4.14 4.63 2.86 2.10 1.85 4.73 5.34 3.75

PITF 2.30 1.99 5.33 5.88 2.60 2.12 1.93 4.77 5.52 3.16 2.52 2.18 5.81 6.48 4.29

CTS 1.85 1.66 4.02 4.59 2.51 2.03 1.84 4.47 4.99 3.11 2.28 2.05 5.20 5.83 4.07

TRPR-CP 2.38 2.09 5.37 5.94 3.13 2.77 2.43 6.32 7.05 4.05 2.78 2.48 6.47 7.22 4.81

TRPR-HOSVD 1.99 1.76 4.48 4.71 3.07 2.16 2.11 4.72 5.77 3.63 2.12 1.92 4.77 5.63 4.78

TRPR-HOOI 2.01 2.11 4.27 5.49 3.55 2.39 2.16 5.70 6.59 3.77 2.12 1.92 4.77 5.63 4.78

We-Rank 1.46 1.42 1.86 2.06 2.14 2.00 1.76 3.92 4.36 2.40 2.09 1.79 4.44 4.83 3.29

Do-Rank 2.31 1.98 5.22 5.72 2.78 2.36 2.09 5.25 5.93 3.54 2.69 2.26 6.02 6.64 4.63

Go-Rank 1.94 1.74 4.51 5.02 2.65 2.05 1.82 4.60 5.15 3.19 2.47 2.18 5.65 6.33 4.55

Table 6.1. Performance of the proposed and benchmarking methods on the Delicious dataset



MAX 6.68 5.92 12.38 12.48 5.33 7.69 6.63 13.83 13.84 6.31 8.26 7.22 14.54 14.65 7.11

PITF 7.56 6.45 13.97 14.31 5.96 8.15 7.15 15.43 15.67 7.05 8.41 7.58 16.16 16.62 7.52

CTS 4.87 4.17 12.36 12.47 3.87 7.78 6.94 14.27 14.57 6.55 8.93 7.83 16.56 16.93 7.59

TRPR-CP 6.56 5.68 13.63 13.71 5.20 8.38 7.40 15.84 16.05 7.31 9.84 8.57 18.16 18.44 8.86

TRPR-HOSVD 6.66 5.99 13.11 13.27 5.35 7.76 6.67 14.21 14.54 6.53 8.85 7.86 16.89 17.49 7.57

TRPR-HOOI 6.52 5.69 13.11 13.27 5.35 7.85 6.99 14.90 14.28 6.39 8.85 7.86 16.89 17.49 7.57

We-Rank 4.09 4.06 8.62 9.24 3.63 6.74 6.56 9.86 10.71 6.20 8.41 7.56 16.15 16.59 7.51

Do-Rank 8.15 7.05 14.12 14.45 6.50 8.55 7.56 15.51 15.68 7.09 9.40 8.29 17.13 17.52 7.61

Go-Rank 7.39 6.35 14.22 14.15 5.65 8.60 7.59 16.38 16.68 7.27 10.28 8.91 19.33 19.34 8.93

Table 6.2. Performance of the proposed and benchmarking methods on the LastFM dataset




MAX 3.18 2.73 6.09 6.45 4.58 3.93 3.26 8.14 8.5 6.88 3.97 3.48 8.31 9.24 8.82

PITF 3.18 2.68 5.57 5.05 4.28 3.94 3.58 8.51 9.2 7.18 4.65 3.74 8.54 9.91 9.78

CTS 2.20 1.87 4.94 5.34 4.31 3.87 3.2 8.44 8.97 7.91 5.16 4.32 10.23 11.05 11.33

TRPR-CP 2.60 2.28 5.81 6.38 4.83 4.41 3.65 9.69 10.19 8.89 5.97 5.20 12.34 13.30 13.25

TRPR-HOSVD 3.19 2.72 6.08 6.46 4.83 3.95 3.27 8.87 8.73 8.89 5.56 4.14 11.12 12.83 14.67

TRPR-HOOI 3.15 2.18 5.97 5.89 4.77 3.95 3.04 8.89 7.74 8.51 5.56 4.16 11.12 12.83 14.67

We-Rank 1.91 1.82 3.96 4.48 4.06 3.13 3.07 5.56 6.32 6.92 4.61 3.74 8.52 9.89 9.75

Do-Rank 3.08 2.77 5.42 5.95 4.47 4.95 4.18 10.28 10.90 9.31 5.17 4.49 10.62 11.64 11.43

Go-Rank 3.11 2.77 5.77 6.20 4.33 5.04 4.24 10.16 10.77 9.19 6.09 4.48 10.66 11.61 11.52

Table 6.3. Performance of the proposed and benchmarking methods on the CiteULike dataset



MAX 6.23 5.10 10.50 10.56 5.36 7.27 6.05 13.84 14.22 7.36 8.40 7.21 16.46 16.82 10.4

PITF 4.14 3.82 8.34 9.21 4.36 4.33 4.23 9.4 9.98 5.98 6.28 5.65 12.26 12.9 8.73

CTS 5.97 5.10 10.04 10.38 5.68 6.24 5.22 12.08 12.58 7.08 8.17 7.10 15.82 16.07 10.3

TRPR-CP 7.26 5.96 13.56 13.92 7.04 8.21 6.75 16.24 16.63 9.53 11.15 9.11 21.22 21.71 13.79

TRPR-HOSVD 6.11 5.11 10.65 11.07 5.30 7.27 6.08 13.90 14.71 7.55 10.21 8.53 20.27 21.50 12.93

TRPR-HOOI 5.96 5.01 10.25 11.12 5.39 7.78 5.96 14.10 14.74 7.85 8.51 7.36 17.07 18.38 12.14

We-Rank 4.12 3.51 8.01 9.33 4.29 5.68 5.22 12.17 13.13 7.85 8.24 7.13 15.97 16.36 11.42

Do-Rank 6.28 5.39 11.00 11.47 6.22 8.61 7.13 16.49 16.96 9.81 11.34 9.21 21.06 21.68 14.28

Go-Rank 6.24 5.11 10.52 10.93 6.18 8.61 7.01 16.24 17.02 9.73 11.34 9.34 21.32 21.73 14.46

Table 6.4. Performance of the proposed and benchmarking methods on the MovieLens dataset


6.1 IMPACT OF INTERPRETATION SCHEME TO TENSOR ENTRIES POPULATIONS

The tensor entries populations resulting from 𝐷𝑡𝑟𝑎𝑖𝑛 under the boolean, UTS, and graded-relevance schemes, listed in Table 5.5 and Table 5.6, are combined in Table 6.5 in order to study the impact of the interpretation scheme on the tensor entries.







Scheme Entries (%) 10-core 15-core 20-core

Delicious
UTS Relevant 0.0006 0.0014 0.0027
Irrelevant 0.4285 0.5499 0.6477
Indecisive 99.5709 99.4487 99.3496
graded-relevance Likely Relevant 0.0092 0.0171 0.0260
Irrelevant 0.4285 0.5499 0.6477
Indecisive 99.5617 99.4316 99.3236

LastFM
UTS Relevant 0.0042 0.0092 0.0163
Irrelevant 1.2538 1.6443 2.0242
Indecisive 98.7420 98.3465 97.9595
graded-relevance Likely Relevant 0.0399 0.0790 0.1311
Irrelevant 1.2538 1.6443 2.0242
Indecisive 98.7021 98.2675 97.8284

CiteULike
UTS Relevant 0.0010 0.0037 0.0095
Irrelevant 0.3039 0.4450 0.5472
Indecisive 99.6951 99.5513 99.4433
graded-relevance Likely Relevant 0.0137 0.0327 0.0523
Irrelevant 0.3039 0.4450 0.5472
Indecisive 99.6814 99.5186 99.3910

MovieLens
UTS Relevant 0.0062 0.0283 0.0284
Irrelevant 1.4651 1.8476 2.0418
Indecisive 98.5287 98.1241 97.9298
graded-relevance Likely Relevant 0.1463 0.2493 0.3729
Irrelevant 1.4651 1.8476 2.0418
Indecisive 98.3824 97.8748 97.5569

Table 6.5. Comparison of the tensor entries population distribution generated from 𝐷𝑡𝑟𝑎𝑖𝑛 using the boolean, UTS, and graded-relevance schemes


Table 6.5 shows that the boolean, UTS, and graded-relevance schemes generate two, three, and four types of distinct entries, respectively. Within each dataset and core, the “relevant” entries population is the same for all schemes, whereas the “indecisive” entries population is not. Meanwhile, the UTS and graded-relevance schemes generate the same number of “irrelevant” entries, because the additional distinct entries of the graded-relevance scheme, i.e. “likely relevant”, are revealed from the “indecisive” entries. For example, on the 10-core Delicious dataset, the graded-relevance “indecisive” population (99.5617%) equals the UTS “indecisive” population (99.5709%) minus the “likely relevant” population (0.0092%).
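To make these relations concrete, the following Python sketch labels every entry of a toy tensor under the three schemes and checks the relations stated above. The labelling rules are assumptions chosen only to be consistent with this section; the formal definitions are those given in Chapter 5. Under these assumed rules, an observed (user, item, tag) triple is “relevant”; for an observed user-item pair, a tag the user has never used is “irrelevant” under UTS and graded-relevance; a tag the user has used on other items is “likely relevant” under graded-relevance but “indecisive” under UTS; every other entry is “indecisive”, or is merged with “irrelevant” under the boolean scheme. The toy triples and all function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical toy tagging data: observed (user, item, tag) triples.
Y = {("u1", "i1", "t1"), ("u1", "i2", "t2"), ("u2", "i1", "t1"), ("u2", "i3", "t3")}
users = sorted({u for u, _, _ in Y})
items = sorted({i for _, i, _ in Y})
tags = sorted({t for _, _, t in Y})
user_item = {(u, i) for u, i, _ in Y}   # observed user-item set
user_tag = {(u, t) for u, _, t in Y}    # observed user-tag set

def boolean(u, i, t):
    # Two distinct entries: observed triples versus everything else.
    return "relevant" if (u, i, t) in Y else "irrelevant&indecisive"

def uts(u, i, t):
    # ASSUMED rule: tags never used by the user, on an observed user-item pair, are irrelevant.
    if (u, i, t) in Y:
        return "relevant"
    if (u, i) in user_item and (u, t) not in user_tag:
        return "irrelevant"
    return "indecisive"

def graded(u, i, t):
    # ASSUMED rule: tags the user has used elsewhere become "likely relevant",
    # carved out of the UTS "indecisive" entries.
    label = uts(u, i, t)
    if label == "indecisive" and (u, i) in user_item and (u, t) in user_tag:
        return "likely relevant"
    return label

def population(scheme):
    """Percentage of tensor entries per distinct entry label."""
    counts = Counter(scheme(u, i, t) for u, i, t in product(users, items, tags))
    total = len(users) * len(items) * len(tags)
    return {label: 100 * n / total for label, n in counts.items()}

b, s, g = population(boolean), population(uts), population(graded)
print(b["relevant"] == s["relevant"] == g["relevant"])                         # True
print(s["irrelevant"] == g["irrelevant"])                                      # True
print(abs(s["indecisive"] - (g["indecisive"] + g["likely relevant"])) < 1e-9)  # True

# The same relation on Table 6.5 (Delicious, 10-core):
# UTS "indecisive" = graded "indecisive" + "likely relevant".
print(round(99.5617 + 0.0092, 4) == 99.5709)  # True
```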

6.2 IMPACT OF 𝒑-CORE TO TENSOR ENTRIES POPULATIONS AND METHOD PERFORMANCES

Each dataset used in this thesis has been refined using various 𝑝-core sizes, i.e. 10-, 15-, and 20-cores. Figure 6.1 depicts the correlation between the 𝑝-core size and the entries population distribution within each scheme, shown as the average entries population over all datasets for each 𝑝-core size. Figure 6.1 shows that the “relevant”, “likely relevant”, and “irrelevant” entries populations increase with the 𝑝-core size, whereas the “indecisive” entries populations decrease on larger 𝑝-core sizes. Therefore, as seen in Chapters 4 and 5, methods that implement the UTS and graded-relevance schemes generally benefit more from a larger 𝑝-core size than methods that implement the boolean scheme.
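The 𝑝-core refinement is commonly defined in the social tagging literature as the largest subset of the data in which every user, item, and tag occurs at least 𝑝 times. The Python sketch below assumes that occurrences are counted over (user, item, tag) triples and reaches a fixed point by repeated pruning. The thesis may count occurrences differently (e.g. per post), and the function name and toy data are hypothetical.

```python
from collections import Counter

def p_core(triples, p):
    """Iteratively drop triples whose user, item, or tag occurs in fewer than p triples.

    ASSUMPTION: occurrences are counted over (user, item, tag) triples.
    """
    data = set(triples)
    while True:
        users = Counter(u for u, _, _ in data)
        items = Counter(i for _, i, _ in data)
        tags = Counter(t for _, _, t in data)
        kept = {(u, i, t) for u, i, t in data
                if users[u] >= p and items[i] >= p and tags[t] >= p}
        if kept == data:   # fixed point: every remaining entity meets the threshold
            return kept
        data = kept        # removing triples may push other entities below p, so repeat

toy = [("u1", "i1", "t1"), ("u1", "i2", "t1"), ("u2", "i1", "t1"),
       ("u2", "i2", "t1"), ("u3", "i3", "t2")]
print(sorted(p_core(toy, 2)))  # the four triples of u1 and u2; u3's triple is pruned
```

Because pruning only ever removes triples, the loop is guaranteed to terminate, and a larger 𝑝 always yields a subset of the smaller-𝑝 result.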

Figure 6.1. Comparison of size of 𝑝-core over tensor entries population on boolean, UTS, and graded-

relevance schemes

[Figure 6.1 panels: boolean (Relevant, Irrelevant&Indecisive), UTS (Relevant, Irrelevant, Indecisive), and graded-relevance (Relevant, Likely Relevant, Irrelevant, Indecisive); y-axis: average of entries population (%), logarithmic scale; bars for 10-core, 15-core, and 20-core]

Figure 6.2, Figure 6.3, and Figure 6.4 compare the 𝑝-core size with method performance in terms of NDCG, AP, and MAP, respectively. It can be observed that the performance of all proposed methods increases with the 𝑝-core size on every evaluation measure, which indicates that the proposed methods are robust to the 𝑝-core refinement procedure. Note that, as Table 6.5 shows, the “relevant” entries population also increases with the 𝑝-core size.

Figure 6.2. Comparison of 𝑝-core over methods performances using NDCG

Figure 6.3. Comparison of 𝑝-core over methods performances using AP

Figure 6.4. Comparison of 𝑝-core over methods performances using MAP

[Figures 6.2, 6.3, and 6.4 panels: Delicious, LastFM, CiteULike, and MovieLens; x-axis: p-core (10, 15, 20); y-axis: NDCG@10 (%), AP@10 (%), and MAP (%), respectively; series: TRPR-CP, We-Rank, Do-Rank, Go-Rank]

6.3 IMPACT OF USERS TAGGING BEHAVIOURS TO TENSOR ENTRIES POPULATIONS

A Social Tagging System (STS) allows its users to annotate the same item with different tags and to use the same tag to annotate different items. As a result, the users’ tagging behaviour in each STS is reflected by which observed set, user-item or user-tag, is dominant. The user-item set is more dominant than the user-tag set when users prefer to use fewer tags to annotate items. Conversely, the user-tag set is more dominant than the user-item set when users prefer to use more tags to annotate items. Figure 6.5 displays the statistics of the observed sets within each dataset used in this thesis. The statistics show that users of the Delicious dataset have the same tagging behaviour as those of CiteULike, i.e. the user-tag set is more dominant than the user-item set. In contrast, the users of LastFM and MovieLens have a more dominant user-item set than user-tag set.
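Which observed set dominates can be computed directly from the tagging triples by counting the distinct user-item and user-tag pairs, as plotted in Figure 6.5. The Python sketch below is a minimal illustration of that counting; the function name `dominant_set` and the toy triples are hypothetical.

```python
def dominant_set(triples):
    """Return which observed set is larger: 'user-item', 'user-tag', or 'balanced'."""
    user_item = {(u, i) for u, i, _ in triples}
    user_tag = {(u, t) for u, _, t in triples}
    if len(user_item) > len(user_tag):
        return "user-item"
    if len(user_tag) > len(user_item):
        return "user-tag"
    return "balanced"

# Hypothetical user who annotates each item with several distinct tags
# (the behaviour reported for Delicious and CiteULike) ...
many_tags = [("u1", "i1", "t1"), ("u1", "i1", "t2"), ("u1", "i1", "t3"),
             ("u1", "i2", "t4"), ("u1", "i2", "t5")]
# ... versus a user who reuses a few tags across many items (LastFM and MovieLens).
few_tags = [("u1", "i1", "t1"), ("u1", "i2", "t1"), ("u1", "i3", "t1"),
            ("u1", "i4", "t2")]
print(dominant_set(many_tags))  # user-tag
print(dominant_set(few_tags))   # user-item
```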

[Figure 6.5 panels (a) to (d): x-axis: p-core (10-core, 15-core, 20-core); y-axis: number of sets (# set); bars: User-Item and User-Tag]

Figure 6.5. Statistics of user-item and user-tag sets on the (a) Delicious, (b) LastFM, (c) CiteULike, and (d) MovieLens datasets

Next, the “relevant” entries population is compared with the “irrelevant” and “likely relevant” entries populations generated using the graded-relevance scheme, in order to study the impact of users’ tagging behaviour on the entries population. In this case, the “relevant” entries populations of all datasets and cores are compiled and sorted in ascending order. Note that the “irrelevant” entries populations of the UTS and graded-relevance schemes are equal, as listed in Table 6.5.

The results in Figure 6.6 and Figure 6.7, respectively, show that, for a given “relevant” entries population, a dataset with a dominant user-item set generates larger “irrelevant” and “likely relevant” entries populations than a dataset with a dominant user-tag set. This is because both the “irrelevant” and “likely relevant” entries populations are generated from the tagging data by the graded-relevance and/or UTS schemes based on each observed user-tag set. Consequently, a dataset with a more dominant user-item set leaves a smaller “indecisive” entries population and reveals more “irrelevant” and “likely relevant” entries. Therefore, in general, datasets with a dominant user-item set benefit more from methods that implement the UTS and graded-relevance schemes than datasets with a dominant user-tag set.

Figure 6.6. Comparison of “relevant” over “irrelevant” entries population

Figure 6.7. Comparison of “relevant” over “likely relevant” entries population

[Figures 6.6 and 6.7: x-axis: “relevant” entries (%) for each dataset and p-core, sorted in ascending order from 0.0006 (Delicious 10-core) to 0.0284 (MovieLens 20-core); y-axis: “irrelevant” entries (%) and “likely relevant” entries (%), respectively; series: Dominant User-Item, Dominant User-Tag]

6.4 IMPACT OF “RELEVANT” ENTRIES TO METHOD PERFORMANCES

To study the impact of the “relevant” entries population on method performance, the “relevant” entries populations of all datasets and cores are compiled and sorted in ascending order. Figure 6.8, Figure 6.9, and Figure 6.10 show the comparisons in terms of NDCG, AP, and MAP, respectively. It can be observed that the “relevant” entries population does not determine method performance; that is, a larger “relevant” entries population does not guarantee a higher performance score. The results for the other entries populations show the same behaviour. This indicates that the methods do not depend solely on a single distinct entry type when learning the tensor model to generate the list of item recommendations. Each method populates the tensor entries by employing an interpretation scheme with several distinct entry types, where the distinct entries (and their populations) of the same scheme are correlated with one another, as shown in Section 6.1. TRPR and We-Rank implement the boolean scheme, which generates two distinct entries, while Do-Rank and Go-Rank implement the UTS and graded-relevance schemes, which generate three and four distinct entries, respectively. Therefore, which method outperforms the others cannot be determined simply from any single distinct entry.
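The observation that a larger “relevant” population does not guarantee a higher score can be checked with the numbers already reported. The Python sketch below pairs the “relevant” populations of Table 6.5 with the TRPR-CP scores of Tables 6.1 to 6.4, sorts them by “relevant” population, and lists every place where the score drops. It assumes that the second value in each five-column 𝑝-core group of Tables 6.1 to 6.4 is NDCG@10. That is an inference, since the tables lack column headers, but this reading reproduces the NDCG@10 averages shown in Figure 6.16.

```python
# (dataset, p-core) -> ("relevant" population in %, TRPR-CP NDCG@10 in %)
observations = {
    ("Delicious", 10): (0.0006, 2.09), ("Delicious", 15): (0.0014, 2.43),
    ("Delicious", 20): (0.0027, 2.48), ("LastFM", 10): (0.0042, 5.68),
    ("LastFM", 15): (0.0092, 7.40), ("LastFM", 20): (0.0163, 8.57),
    ("CiteULike", 10): (0.0010, 2.28), ("CiteULike", 15): (0.0037, 3.65),
    ("CiteULike", 20): (0.0095, 5.20), ("MovieLens", 10): (0.0062, 5.96),
    ("MovieLens", 15): (0.0283, 6.75), ("MovieLens", 20): (0.0284, 9.11),
}

# Sort by ascending "relevant" population, then find consecutive pairs where NDCG@10 falls.
ranked = sorted(observations.items(), key=lambda kv: kv[1][0])
drops = [(a[0], b[0]) for a, b in zip(ranked, ranked[1:]) if b[1][1] < a[1][1]]
print(drops)
# [(('LastFM', 15), ('CiteULike', 20)), (('LastFM', 20), ('MovieLens', 15))]
```

Any non-empty `drops` list is enough to show that performance is not monotone in the “relevant” population for this method.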

Figure 6.8. Comparison of “relevant” entries population over methods performances using NDCG

Figure 6.9. Comparison of “relevant” entries population over methods performances using AP

Figure 6.10. Comparison of “relevant” entries population over methods performances using MAP

[Figures 6.8, 6.9, and 6.10: x-axis: “relevant” entries (%) for each dataset and p-core, sorted in ascending order; y-axis: NDCG@10 (%), AP@10 (%), and MAP (%), respectively; series: TRPR-CP, We-Rank, Do-Rank, Go-Rank]


6.5 IMPACT OF HANDLING “LIKELY RELEVANT” ENTRIES TO METHOD PERFORMANCES

“Likely relevant” entries, generated by the graded-relevance scheme, are transitional entries positioned between the “relevant” and “irrelevant” entries. Only a method that takes this transition into account can benefit from this data scheme. For example, the proposed method Go-Rank optimizes the ranking evaluation measure GAP, which allows the tensor model to set up thresholds so that the “likely relevant” entries can be regarded as either “relevant” or “irrelevant” entries. The conjecture is that optimizing an inappropriate objective function (i.e. one that does not leverage the data scheme) on a tensor model results in inferior recommendation quality. To examine this conjecture, the same method is analysed with user profiles represented by two different data schemes.
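Graded Average Precision, the criterion that Go-Rank optimizes, is defined by Robertson et al. (2010) through a probability g_j that the user's relevance threshold lies at grade j. Items at or above the threshold count as relevant, which is how a “likely relevant” entry can be regarded as either “relevant” or “irrelevant”. The Python sketch below computes GAP for a single, fully judged ranked list. It treats “likely relevant” as grade 1 and “relevant” as grade 2. The grade encoding and the threshold probabilities are illustrative assumptions, not the values used in Go-Rank. With a single grade and g = [1.0], the measure reduces to ordinary AP.

```python
from itertools import accumulate

def graded_average_precision(grades, g):
    """GAP of a ranked list (Robertson et al., 2010).

    grades: relevance grade of each ranked item, 0 = irrelevant, 1..C = graded.
    g: g[j-1] is the probability that the user's threshold is at grade j.
    ASSUMPTION: every relevant item appears in `grades` (fully judged list).
    """
    cum_g = [0.0] + list(accumulate(g))   # cum_g[c] = g_1 + ... + g_c
    numerator = 0.0
    for n, c_n in enumerate(grades, start=1):
        if c_n == 0:
            continue
        # delta_{m,n} = sum_{j <= min(c_m, c_n)} g_j over items ranked at or above n
        numerator += sum(cum_g[min(c_m, c_n)] for c_m in grades[:n]) / n
    denominator = sum(cum_g[c] for c in grades if c > 0)  # = sum_i R_i * sum_{j<=i} g_j
    return numerator / denominator if denominator else 0.0

def average_precision(binary):
    """Plain AP over a fully judged binary list, for comparison."""
    hits, total = 0, 0.0
    for n, rel in enumerate(binary, start=1):
        if rel:
            hits += 1
            total += hits / n
    return total / hits if hits else 0.0

# Binary case: GAP with one grade equals AP.
print(round(graded_average_precision([1, 0, 1, 0], [1.0]), 4),
      round(average_precision([1, 0, 1, 0]), 4))                # 0.8333 0.8333
# Graded case (2 = relevant, 1 = likely relevant), equal threshold probabilities.
print(round(graded_average_precision([2, 1, 0], [0.5, 0.5]), 4))  # 1.0
print(round(graded_average_precision([1, 2, 0], [0.5, 0.5]), 4))  # 0.8333
```

Ranking a “likely relevant” item above a “relevant” one lowers GAP even though both items sit above every irrelevant item. A binary measure applied to the same list, with both grades collapsed to relevant, would not register this difference.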

The method MAX (one of the benchmarking methods in Section 3.5) is used. Firstly, the tensor model is built using the boolean scheme; this variant is called MAX-boolean. Secondly, the tensor model is built using the graded-relevance scheme; this variant is called MAX-graded. The objective function of both MAX-boolean and MAX-graded minimizes the Mean Square Error (MSE), which is suitable for solving a classification problem (Ifada and Nayak, 2014c). The comparison between these two methods is demonstrated using AP, since NDCG and MAP show the same trend. As shown in Figure 6.11, MAX-graded results in poorer recommendation quality than MAX-boolean, because MAX-graded disregards the constraint that the “likely relevant” entries should be further regarded as either “relevant” or “irrelevant” in order to learn the tensor model effectively.
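The point that an MSE objective does not directly target ranking quality can be illustrated with two toy score vectors for one relevant and one irrelevant entry. The numbers are hypothetical and were chosen only to show that a lower MSE can still produce the wrong order. This is a generic property of point-wise regression losses and does not reproduce MAX itself.

```python
def mse(targets, scores):
    """Mean squared error between target values and predicted scores."""
    return sum((y - s) ** 2 for y, s in zip(targets, scores)) / len(targets)

def top_item(scores):
    """Index of the highest-scored entry, i.e. the first recommendation."""
    return max(range(len(scores)), key=scores.__getitem__)

targets = [1.0, 0.0]        # entry 0 is relevant, entry 1 is irrelevant
scores_a = [0.30, 0.20]     # higher error, correct order
scores_b = [0.48, 0.50]     # lower error, wrong order

print(round(mse(targets, scores_a), 4), top_item(scores_a))  # 0.265 0
print(round(mse(targets, scores_b), 4), top_item(scores_b))  # 0.2602 1
```

A ranking-oriented criterion such as DCG or GAP would prefer `scores_a`, because only the relative order of the scores matters to it.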

Figure 6.11. Comparison of MAX-boolean over MAX-graded performances showing the impact of inappropriately handling the “likely relevant” entries

[Figure 6.11 panels: Delicious, LastFM, CiteULike, and MovieLens; x-axis: p-core (10, 15, 20); y-axis: AP@10; series: MAX-boolean, MAX-graded]


6.6 ACCURACY COMPARISONS OF THE PROPOSED METHODS

A comparative analysis of all proposed methods has been conducted to highlight their strengths and shortcomings. This analysis is expected to lead towards a mechanism for selecting a method according to the data characteristics.

Delicious dataset: Figure 6.12 shows that TRPR always achieves the best results amongst the four proposed ranking methods on every 𝑝-core size and evaluation measure, followed by Do-Rank, Go-Rank, and We-Rank. TRPR works best on the Delicious dataset for the following reasons:

o The Delicious dataset is categorised as a dataset with a dominant user-tag set, as shown in Figure 6.5(a). This indicates that users tend to annotate items with a generous number of tags;

o This characteristic suits TRPR, because the calculation of tag probability, which significantly controls the recommendation generation process, may favour this dataset;

o In TRPR, the tensor reconstruction procedure is better suited than a factors-based procedure for generating the list of recommendations, since the probabilistic ranking procedure utilises the candidate item and user tag preference sets, both of which are revealed from the reconstructed tensor;

o With this type of dataset, the boolean scheme is sufficient to interpret the tagging data that represents the user profile.

Figure 6.12. Comparison of methods performances on Delicious dataset

[Figure 6.12 panels: NDCG@10, AP@10, and MAP on Delicious; x-axis: p-core (10, 15, 20); series: TRPR-CP, We-Rank, Do-Rank, Go-Rank]

LastFM dataset: Figure 6.13 shows that Do-Rank performs the best on the 10-core on every evaluation measure, compared with the other proposed methods. However, as the 𝑝-core size increases, Do-Rank falls behind Go-Rank and then TRPR. As on the Delicious dataset, We-Rank performs the worst on this dataset. The results indicate that, to achieve the best performance on the LastFM dataset, Do-Rank is the best option for a small 𝑝-core size and Go-Rank for larger 𝑝-core sizes. This is due to the following reasons:

o The LastFM dataset is categorised as a dataset with a dominant user-item set, as shown in Figure 6.5(b). This shows that users tend to annotate items with few tags;

o A dataset with this characteristic suits either Do-Rank or Go-Rank, because their learning algorithms, which formulate the ranking based on a list of associated items, may favour this dataset;

o From Figure 6.5(b), it can be observed that the domination of the user-item set over the user-tag set changes with the 𝑝-core size, i.e. it decreases significantly on larger 𝑝-core sizes. This indicates that users on larger 𝑝-core sizes tend to use more tags for annotating items than those on smaller sizes;

o Do-Rank performs better on smaller 𝑝-core sizes since its learning algorithm formulates the ranking based on a list of associated items out of all tags, most of which have not been used by the user. When more tags are used by the user, i.e. on larger 𝑝-core sizes, the user’s tag preference matters. For this reason, Go-Rank performs better, since its learning algorithm formulates the ranking based on a list of associated items out of only the tags that have been used by the user;

o A ranking-based scheme, i.e. the UTS or graded-relevance scheme, is the suitable tagging data interpretation scheme for constructing the user profile for this type of dataset;

o In Do-Rank and Go-Rank, the factors-based procedure is better suited than tensor reconstruction for generating a list of recommendations, since the ranking-based interpretation scheme and the list-wise based ranking approach are implemented.
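The per-𝑝-core winners on LastFM can be read off mechanically. The Python sketch below takes the scores of the four proposed methods from Table 6.2 and picks the best method for each 𝑝-core. It assumes that the second value of each five-column 𝑝-core group in that table is NDCG@10; this is inferred, since the table lacks column headers, but the reading reproduces the averages in Figure 6.16.

```python
# NDCG@10 (%) on LastFM for the 10-, 15-, and 20-core, taken from Table 6.2.
ndcg10_lastfm = {
    "TRPR-CP": (5.68, 7.40, 8.57),
    "We-Rank": (4.06, 6.56, 7.56),
    "Do-Rank": (7.05, 7.56, 8.29),
    "Go-Rank": (6.35, 7.59, 8.91),
}
cores = (10, 15, 20)

# For each p-core, keep the method with the highest NDCG@10.
winners = {core: max(ndcg10_lastfm, key=lambda m: ndcg10_lastfm[m][k])
           for k, core in enumerate(cores)}
print(winners)  # {10: 'Do-Rank', 15: 'Go-Rank', 20: 'Go-Rank'}
```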


Figure 6.13. Comparison of methods performances on LastFM dataset

CiteULike dataset: Figure 6.14 shows that TRPR outperforms the other proposed methods on most 𝑝-core sizes and evaluation measures. As on the Delicious and LastFM datasets, We-Rank performs the worst on this dataset. The same reasons that make TRPR perform best on the Delicious dataset apply here, as the CiteULike dataset is also categorised as a dataset with a dominant user-tag set, as shown in Figure 6.5(c). However, TRPR’s outperformance on this dataset is not as consistent as on the Delicious dataset, because the domination of the user-tag set changes significantly with the 𝑝-core size, i.e. it decreases on larger 𝑝-core sizes. In this case, on smaller 𝑝-core sizes, users tend to use too many tags, which makes the tag preference set too general and affects method performance. On larger 𝑝-core sizes, users tend to use a moderate number of tags, which may better indicate the users’ preferences.

Figure 6.14. Comparison of methods performances on CiteULike dataset

[Figures 6.13 and 6.14 panels: NDCG@10, AP@10, and MAP on LastFM and CiteULike, respectively; x-axis: p-core (10, 15, 20); series: TRPR-CP, We-Rank, Do-Rank, Go-Rank]


MovieLens dataset: Figure 6.15 shows that TRPR performs the best on the 10-core compared with the other proposed methods. Beyond this 𝑝-core size, however, it underperforms Go-Rank and Do-Rank. As on the other datasets, We-Rank yields the worst performance. The outperformance of Go-Rank and Do-Rank on this dataset has the same causes as on the LastFM dataset, since the MovieLens dataset is also categorised as a dataset with a dominant user-item set, as shown in Figure 6.5(d). However, TRPR works best on a small 𝑝-core size, possibly because users of this dataset are likely to choose only certain types of items, i.e. movies, and to use tags related to genres or actors’ names (Gemmell et al., 2011). In other words, on a small 𝑝-core size, the tags used by the users for annotating items may strongly reveal the user tag preference, which is most beneficial to TRPR. On larger 𝑝-core sizes, users tend to add more and varied tags for annotating items, which makes the user preference more difficult to reveal.

Figure 6.15. Comparison of methods performances on MovieLens dataset

[Figure 6.15 panels: NDCG@10, AP@10, and MAP on MovieLens; x-axis: p-core (10, 15, 20); series: TRPR-CP, We-Rank, Do-Rank, Go-Rank]

6.7 POINT-WISE BASED RANKING METHODS VERSUS LIST-WISE BASED RANKING METHODS

Point-wise based ranking methods include TRPR and We-Rank, while list-wise based ranking methods include Do-Rank and Go-Rank. The methods are compared on each evaluation measure using the average performance of each method over all datasets, as shown in Figure 6.16.
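The averages plotted in Figure 6.16 can be reproduced from Tables 6.1 to 6.4. The Python sketch below assumes that the second value of each five-column 𝑝-core group in those tables is NDCG@10; this is an inference, since the tables lack column headers. It averages that value over the four datasets and three 𝑝-cores and recovers the NDCG@10 bars of Figure 6.16 (5.13, 3.97, 5.20, and 5.13).

```python
# NDCG@10 (%) per method, ordered Delicious, LastFM, CiteULike, MovieLens,
# each as 10-, 15-, 20-core (second column of each p-core group in Tables 6.1-6.4).
ndcg10 = {
    "TRPR-CP": [2.09, 2.43, 2.48, 5.68, 7.40, 8.57, 2.28, 3.65, 5.20, 5.96, 6.75, 9.11],
    "We-Rank": [1.42, 1.76, 1.79, 4.06, 6.56, 7.56, 1.82, 3.07, 3.74, 3.51, 5.22, 7.13],
    "Do-Rank": [1.98, 2.09, 2.26, 7.05, 7.56, 8.29, 2.77, 4.18, 4.49, 5.39, 7.13, 9.21],
    "Go-Rank": [1.74, 1.82, 2.18, 6.35, 7.59, 8.91, 2.77, 4.24, 4.48, 5.11, 7.01, 9.34],
}

# Average over all 12 dataset/p-core combinations, formatted to two decimals.
averages = {m: f"{sum(v) / len(v):.2f}" for m, v in ndcg10.items()}
print(averages)  # {'TRPR-CP': '5.13', 'We-Rank': '3.97', 'Do-Rank': '5.20', 'Go-Rank': '5.13'}
```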


Figure 6.16. Comparison of proposed methods performances as the average over all datasets using NDCG, AP, and MAP

[Figure 6.16 bar values for TRPR-CP, We-Rank, Do-Rank, and Go-Rank: NDCG@10 5.13, 3.97, 5.20, 5.13; AP@10 12.55, 8.94, 12.05, 12.08; MAP 7.56, 5.79, 7.31, 7.30]

NDCG:

o Do-Rank, a list-wise based ranking method, achieves the best performance in terms of NDCG compared with the other proposed methods. In Do-Rank, the UTS scheme is implemented as the tagging data interpretation scheme, resulting in a multi-graded data representation, and DCG is used as the optimization criterion of the learning model; both suit this evaluation measure. Recall that DCG is a widely used ranking evaluation measure for multi-graded relevance data (Chapelle and Wu, 2010; Liu, 2009; Weimer et al., 2007), and NDCG is DCG normalised by the DCG of the ideal ranking;

o Although TRPR is a point-wise based ranking method that builds its learning model from binary data, it achieves performance equal to that of Go-Rank, a list-wise based ranking method that builds its model from graded-relevance data. The reason lies in the probabilistic approach applied in TRPR to re-rank the candidate items generated from the fully reconstructed tensor by utilising the user tag preference. This introduces ranking interdependency amongst the candidate items, which may favour NDCG;

o We-Rank, a point-wise based ranking method, performs the worst for the following reasons:

We-Rank implements the point-wise based ranking approach for learning-to-rank. This means that, within the learning model, there is no interdependency between the predicted items and other items, or between the tags used to annotate the predicted items and other tags;

We-Rank performance depends on how well the users’ tag usage likeliness is captured, since it is used to reward or penalise each tensor model entry during the learning process;

We-Rank directly uses the learned latent factors to generate the list of item recommendations for each target user, while the weighting approach within the learning process cannot sufficiently capture the user’s tag usage likeliness. Implementing the full tensor reconstruction and probabilistic ranking procedure may improve the performance of We-Rank.

AP:

o TRPR, on average, achieves the best performance in terms of AP compared with the other proposed methods. In TRPR, the boolean scheme is implemented as the tagging data interpretation scheme, resulting in a binary data representation, and the probabilistic approach re-ranks the candidate items generated from the fully reconstructed tensor; both suit this evaluation measure. Recall that AP is a commonly used ranking evaluation measure for binary relevance data (Chapelle and Wu, 2010; Liu, 2009; Shi, Karatzoglou, Baltrunas, Larson, Hanjalic, et al., 2012; Shi, Karatzoglou, Baltrunas, Larson, Oliver, et al., 2012);

o Go-Rank, on average, achieves better results than Do-Rank because it uses GAP as the optimization criterion for learning its model, which is built from the graded-relevance data produced by the graded-relevance scheme. Recall that GAP is the generalisation of AP that serves as an alternative ranking evaluation measure for multi-graded relevance data (Ferrante et al., 2014; Robertson et al., 2010). This may make AP favour Go-Rank over Do-Rank;

o We-Rank performs the worst for the same reasons as for NDCG.


MAP:

o TRPR, on average, achieves the best performance in terms of MAP compared with the other proposed methods. This corresponds to the AP results, as MAP is the average of AP over all users.
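For reference, the DCG and NDCG discussed above can be sketched as follows. The sketch uses the common exponential-gain form, DCG@k = Σ (2^rel − 1) / log2(rank + 1). Whether Do-Rank uses this gain or a linear gain is an assumption here, and the graded labels in the example are hypothetical.

```python
import math

def dcg_at_k(rels, k):
    """DCG@k with exponential gain (2^rel - 1) and log2(rank + 1) discount."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(rels, k):
    """DCG@k normalised by the DCG@k of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# Hypothetical graded labels of a ranked list: 2 = relevant, 1 = partially relevant, 0 = irrelevant.
print(round(ndcg_at_k([2, 1, 0, 0], 4), 4))  # 1.0 (already the ideal order)
print(round(ndcg_at_k([0, 1, 0, 2], 4), 4))  # below 1: the best item is ranked last
```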

In general, the list-wise based ranking methods, Do-Rank and Go-Rank, can achieve better performance in terms of NDCG than the point-wise based ranking methods, TRPR and We-Rank, because NDCG as an evaluation measure favours methods that build the learning model from multi-graded data. On the other hand, the point-wise based ranking method TRPR achieves better performance in terms of AP and MAP than the list-wise based ranking methods, because AP and MAP as evaluation measures favour methods that build the learning model from binary data.
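Likewise, AP and MAP on binary relevance data can be sketched as follows. AP averages the precision at the rank of each relevant item, and MAP is the mean of per-user AP, as noted above. Passing the number of relevant items explicitly, so that unretrieved relevant items still count in the denominator, is a design choice of this sketch. The toy lists are hypothetical.

```python
def average_precision(ranked_relevance, n_relevant):
    """AP: sum of precision@n at each relevant rank n, divided by the number of relevant items."""
    hits, total = 0, 0.0
    for n, is_rel in enumerate(ranked_relevance, start=1):
        if is_rel:
            hits += 1
            total += hits / n
    return total / n_relevant if n_relevant else 0.0

def mean_average_precision(per_user):
    """MAP: mean AP over users; per_user maps user -> (ranked 0/1 list, number of relevant items)."""
    aps = [average_precision(ranked, k) for ranked, k in per_user.values()]
    return sum(aps) / len(aps)

per_user = {
    "u1": ([1, 0, 1, 0], 2),   # hypothetical recommendation lists
    "u2": ([0, 1, 0, 0], 1),
}
print(round(average_precision(*per_user["u1"]), 4))  # 0.8333
print(round(mean_average_precision(per_user), 4))    # 0.6667
```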

6.8 PROPOSED METHODS VERSUS BENCHMARKING METHODS

MAX: TRPR, Do-Rank, and Go-Rank outperform MAX on most evaluation measures and datasets. MAX uses the boolean scheme for constructing the user profiles and implements a point-wise based ranking approach for learning the tensor recommendation model. MAX directly utilises the reconstructed tensor to generate the list of recommendations. Consequently, although it learns the latent factors of the three dimensions, it may fail to rank the list of recommendations in the required order. However, We-Rank underperforms MAX on some of the datasets. The possible reasons are as follows:

We-Rank is a point-wise based ranking approach that uses the learned latent factors for generating the list of item recommendations, instead of using the full tensor reconstruction procedure;

For learning the factors model, We-Rank uses weight values, calculated from the user’s tag usage likeliness, to either reward or penalise each tensor entry. Therefore, its performance depends highly on how well the user’s tag usage likeliness is captured. We-Rank outperforms MAX on the 20-core of the LastFM and CiteULike datasets, and on the 15- and 20-cores of the MovieLens dataset, since the user’s tag usage likeliness can be sufficiently captured on those datasets.

Pairwise Interaction Tensor Factorization (PITF) Method: PITF uses the set-based scheme for constructing the user profiles and implements a pair-wise based ranking approach for learning the tensor recommendation model. Experimental results show that PITF achieves inferior performance compared with TRPR, Do-Rank, and Go-Rank on most evaluation measures and datasets. The reasons are as follows:

Although TRPR uses the boolean scheme to generate the tensor entries and implements a point-wise based ranking approach for learning the tensor recommendation model, it applies a subsequent stage, i.e. the probabilistic ranking approach, to generate the list of recommendations. This second stage probabilistically re-ranks the list of candidate items for each user, generated from the learned latent factors, as a list-wise ranking model in order to obtain the correct order of recommendations;

Do-Rank enhances recommendation performance by optimizing the top-biased measure DCG, while PITF implements an AUC-based optimization approach, which gives an equal penalty to mistakes at the top and at the bottom of the list of recommendations. Moreover, as a pair-wise ranking model, PITF aims to get the ranking order within each pair correct, while Do-Rank employs a list-wise ranking model, which aims to get the order of each entire list correct;

Like Do-Rank, Go-Rank attains a substantial improvement over PITF because: (1) it enhances the recommendation performance by optimizing the top-biased measure GAP, i.e. the generalisation of AP for ordinal relevance data; and (2) it builds the learning model as a list-wise ranking model that aims to get the order of each entire list correct.
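The difference between an AUC-style criterion and a top-biased criterion such as DCG can be illustrated with a hypothetical binary list of six items in which a single relevant/irrelevant swap happens either at the top or at the bottom. The AUC here is the plain fraction of correctly ordered (relevant, irrelevant) pairs, a sketch rather than PITF's actual smoothed training objective, and DCG uses binary gains with a log2 discount.

```python
import math

def pairwise_auc(labels):
    """Fraction of (relevant, irrelevant) pairs in which the relevant item is ranked higher."""
    pos = [r for r, y in enumerate(labels) if y == 1]
    neg = [r for r, y in enumerate(labels) if y == 0]
    correct = sum(1 for p in pos for n in neg if p < n)
    return correct / (len(pos) * len(neg))

def dcg(labels):
    """Binary-gain DCG with a log2(rank + 1) discount (ranks start at 1)."""
    return sum(y / math.log2(r + 2) for r, y in enumerate(labels))

base = [1, 0, 1, 0, 1, 0]         # hypothetical ranked list, 1 = relevant
swap_top = [0, 1, 1, 0, 1, 0]     # ranks 1 and 2 swapped
swap_bottom = [1, 0, 1, 0, 0, 1]  # ranks 5 and 6 swapped

for name, lst in (("top", swap_top), ("bottom", swap_bottom)):
    print(name,
          f"AUC drop={pairwise_auc(base) - pairwise_auc(lst):.3f}",
          f"DCG drop={dcg(base) - dcg(lst):.3f}")
```

Each swap breaks exactly one (relevant, irrelevant) pair, so the AUC drop is the same for both, whereas the DCG drop is much larger for the mistake at the top of the list.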

In general, the same reasons that cause We-Rank to underperform MAX also apply here. Note that We-Rank achieves better performance than PITF on the 15 and 20-cores of the MovieLens dataset.


Candidate Tag Set (CTS): CTS is a matrix-based method that implements the boolean scheme and ranks the recommendations by using the user’s past tagging activities to form the user’s likelihood. The proposed methods, except We-Rank, perform better than CTS, since CTS projects the ternary relations of tagging data into a two-dimensional model, which degrades the recommendation quality. As with PITF, We-Rank achieves better performance than CTS only on the 15 and 20-cores of the MovieLens dataset. The reason for We-Rank’s underperformance is the same as in the cases of MAX and PITF.
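The information lost when ternary tagging data are projected onto a two-dimensional model can be illustrated with a toy NumPy example: two hypothetical boolean tagging tensors that differ only in which tags were used collapse to the same user-item matrix. Projecting with "any tag used" is an assumption for illustration and not necessarily how CTS builds its matrices.

```python
import numpy as np

# Hypothetical boolean tagging tensors indexed as [user, item, tag].
n_users, n_items, n_tags = 2, 3, 2
tensor_a = np.zeros((n_users, n_items, n_tags), dtype=bool)
tensor_b = np.zeros_like(tensor_a)

# User 0 annotates item 1: with tag 0 in A, but with tag 1 in B.
tensor_a[0, 1, 0] = True
tensor_b[0, 1, 1] = True
# User 1 annotates item 2 with both tags in A, but only with tag 0 in B.
tensor_a[1, 2, :] = True
tensor_b[1, 2, 0] = True

def project_user_item(tensor):
    """Collapse the tag mode: a user-item cell is 1 if the user used any tag on the item."""
    return tensor.any(axis=2).astype(int)

print(project_user_item(tensor_a))
print(np.array_equal(project_user_item(tensor_a), project_user_item(tensor_b)))  # True
```

Once the tag mode is collapsed, a two-dimensional model cannot tell the two users’ tagging behaviours apart, even though the underlying tensors differ.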

6.9 EFFICIENCY VERSUS METHOD PERFORMANCES

Do-Rank and Go-Rank apply fast learning approaches to address the computational cost of optimization. As shown in Figure 5.5, with fast learning the learning time of Do-Rank grows linearly with the size of the data, expressed by the number of items 𝑅, compared with 𝑅² without fast learning. Similarly, Figure 5.15 shows that Go-Rank with fast learning is a few orders of magnitude faster than Go-Rank without fast learning. This section analyses whether there is any trade-off between efficiency and accuracy. The AP@10 of fast learning and original learning is compared on the 10-core of each dataset; the results for other 𝑝-core sizes show similar behaviour. Figure 6.17 shows that, for both Do-Rank and Go-Rank, the implementation of a fast learning approach does not compromise accuracy; instead, for some datasets, it improves performance. This confirms that fast learning not only reduces the learning time but also improves or maintains accuracy.

Figure 6.17. The comparison of efficiency over method performances: (a) Do-Rank and (b) Go-Rank, each showing AP@10 of Fast Learning versus Original Learning on the Delicious, LastFM, CiteULike and MovieLens datasets.


6.10 COMPUTATION COMPLEXITY

The complexity of all proposed methods is compared in Table 6.6. The learning process of the point-wise based ranking methods is less complex than that of the list-wise based ranking methods, except when TRPR is implemented with a Tucker-based technique, i.e. HOSVD or HOOI. Conversely, the recommendation generation of the point-wise based ranking method TRPR is more complex than that of the list-wise based ranking methods and of the other point-wise based ranking method, We-Rank, because We-Rank, Do-Rank, and Go-Rank directly use the learned latent factors to generate a list of item recommendations for each target user.

Learning approach          Method    Learning complexity                       Recommendation generation complexity
Point-wise based ranking   TRPR      Tucker-based technique:                   Tucker-based technique:
                                     $O(F^3(Q+S+R))$                           $O(F^3 Q R \cup_{j=1}^{l}\{s_j\}) + O(nv)$
                                     CP-based technique:                       CP-based technique:
                                     $O(F(Q+S+R))$                             $O(F Q R \cup_{j=1}^{l}\{s_j\}) + O(nv)$
                           We-Rank   $O(F(Q+S+R))$                             $O(F)$
List-wise based ranking    Do-Rank   $O(F(Q+S+QSR))$                           $O(F)$
                           Go-Rank   $O(F(Q+S+Q\tilde{V}c\tilde{Z}^2))$        $O(F)$

Table 6.6. The comparison of complexity of proposed methods

where:

$Q$ : size of the set of users
$R$ : size of the set of items
$S$ : size of the set of tags
$F$ : size of the latent factor
$l$ : $S \operatorname{div} b$, where $b$ is the size of the block-strip row, with $b \ll S$
$n$ : size of the candidate item set
$v$ : size of the tag preference set
$\tilde{V}$ : average size of $V_u$ (the list of tags that have been used by user $u$)
$\tilde{Z}$ : average size of $Z_u$ (the list of items whose entries are labelled as “relevant” or “likely relevant” for user $u$)
$c$ : $\max(y_{u,i,t})$
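The 𝑂(𝐹) recommendation generation cost in Table 6.6 reflects that, once latent factors are learned, a single prediction needs only 𝐹 multiply-adds. The sketch below assumes a CP (PARAFAC) factorisation, in which the predicted entry for user u, item i and tag t is the sum over the 𝐹 latent dimensions of the products of the corresponding user, item and tag factors. The random factors, the toy sizes and the choice to rank items for a single (user, tag) pair are hypothetical; how We-Rank, Do-Rank and Go-Rank turn scores into a Top-𝑁 list is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
Q, R, S, F = 4, 6, 5, 3           # toy numbers of users, items, tags and latent factors
user_f = rng.normal(size=(Q, F))  # hypothetical learned user factors
item_f = rng.normal(size=(R, F))  # hypothetical learned item factors
tag_f = rng.normal(size=(S, F))   # hypothetical learned tag factors

def score(u, i, t):
    """CP prediction for a single (user, item, tag) entry: O(F) work."""
    return float(np.sum(user_f[u] * item_f[i] * tag_f[t]))

def top_n_items(u, t, n):
    """Rank every item for one (user, tag) pair from the factors alone: O(R * F)."""
    scores = item_f @ (user_f[u] * tag_f[t])
    return np.argsort(-scores)[:n]

# For comparison, a dense reconstruction computes all Q * R * S entries.
full = np.einsum("uf,if,tf->uit", user_f, item_f, tag_f)
print(np.isclose(score(1, 2, 3), full[1, 2, 3]))  # True
print(top_n_items(u=1, t=3, n=3))
```

Scoring from the factors gives the same values as reading them from the reconstructed tensor, but without ever building the full tensor.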

6.11 TIME COMPLEXITY

The running time of the proposed methods comprises the running times of three main processes: user profile (tensor) construction, learning the latent factors, and recommendation generation. The comparison is shown on the smallest 𝑝-core size, i.e. the 10-core, of each dataset, which consumes the most processing time.

Figure 6.18 shows that the process dominating the running time differs between methods. For the point-wise based ranking methods, the running time of TRPR is dominated by the recommendation generation process, which requires two stages (full tensor reconstruction from the latent factors, and probabilistic ranking; described in Section 4.2.4), while the running time of We-Rank is dominated by weighted tensor construction, described in Section 4.3.3.2. On the other hand, due to the nature of the list-wise based ranking approach, the process of learning the latent factors dominates the running time of both Do-Rank and Go-Rank, as described in Section 5.2.3 and Section 5.3.3 respectively. Note that the running times of user profile (tensor) construction in TRPR, learning the latent factors in We-Rank, and recommendation generation in Go-Rank are very low in comparison to the other processes of these methods, and therefore they appear as a line in Figure 6.18. In general, the total running time of TRPR is comparable to that of Go-Rank, whereas the running time of We-Rank is comparable to that of Do-Rank.


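Figure 6.18. The comparison of time complexity of proposed methods (panels: Delicious, LastFM, CiteULike and MovieLens 10-core; x-axis: methods TRPR-CP, We-Rank, Do-Rank and Go-Rank; y-axis: log runtime (sec))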

6.12 STRENGTHS AND SHORTCOMINGS OF THE PROPOSED

METHODS

Probabilistic Ranking Method: Tensor-based Item Recommendation using

Probabilistic Ranking (TRPR).

Strengths:

o Dataset: TRPR is suited to work with a dataset that has a dominant user-

tag set, such as Delicious and CiteULike;

o Accuracy:

TRPR improves recommendation accuracy by implementing a probabilistic approach that utilises the user tag preference for ranking candidate items generated from the full reconstructed tensor, in comparison to a conventional approach that directly ranks candidate items from the reconstructed tensor;

The combination of a boolean scheme for building the learning model and a probabilistic approach makes TRPR perform best in terms of AP and MAP.

o Computation complexity and running time:

The implementation of a memory efficient loop approach makes

TRPR scalable for full tensor reconstruction on a large dataset;

Belonging to the point-wise based ranking approach, the learning process of TRPR requires significantly less complexity and running time than that of methods belonging to the list-wise based ranking approach.

Shortcomings:

o Computation complexity and running time: The recommendation generation process of TRPR requires two stages, full tensor reconstruction from the learned latent factors and probabilistic ranking, which makes it significantly more complex and time-consuming than a process that requires only a latent factor-based procedure.

Weighted Tensor Approach for Ranking: Recommendation Ranking using

Weighted Tensor (We-Rank).

Strengths:

o Dataset: We-Rank is suited to work on dense datasets, such as the 15 and

20-cores of the MovieLens dataset;

o Accuracy: The implementation of a weighted scheme, to reward or penalise each primary tensor entry during the learning process, makes We-Rank outperform the benchmarking methods on dense datasets;


o Computation complexity and running time:

Belonging to the point-wise based ranking approach, the learning

process of We-Rank requires significantly less complexity and running

time in comparison to that of methods that belong to the list-wise

based ranking approach;

The learned latent factors of We-Rank can be directly used in the recommendation generation process, which makes it less complex and less time-consuming than a process that requires a full tensor reconstruction procedure.

Shortcomings:

o Accuracy: We-Rank performs worse than the other proposed methods, as the weighted scheme cannot sufficiently capture the user’s tag usage likeliness.

Learning from Multi-Graded Data: DCG Optimization for Learning-to-Rank

(Do-Rank).

Strengths:

o Dataset: Do-Rank is suited to work with a dataset that has a dominant

user-item set, such as LastFM and MovieLens;

o Accuracy: The combination of the UTS scheme for building the learning model and DCG as the optimization criterion of the learning model makes Do-Rank perform best in terms of NDCG;

o Efficiency: The implementation of a fast learning approach in Do-Rank

efficiently reduces the learning time, while at the same time improving or

maintaining accuracy;

o Computation complexity and running time: The learned latent factors of Do-Rank can be directly used in the recommendation generation process, which makes it less complex and less time-consuming than a process that requires a full tensor reconstruction procedure.


Shortcomings:

o Computation complexity and running time: Belonging to the list-wise based ranking approach, the learning process of Do-Rank requires significantly more complexity and running time than that of methods belonging to the point-wise based ranking approach.

Learning from Graded-Relevance Data: GAP Optimization for Learning-to-Rank (Go-Rank).

Strengths:

o Dataset: Go-Rank is suited to work with a dataset that has a dominant

user-item set, such as LastFM and MovieLens;

o Accuracy: The combination of a graded-relevance scheme for building the learning model and GAP as the optimization criterion of the learning model makes Go-Rank perform better than Do-Rank in terms of AP;

o Efficiency: The implementation of a fast learning approach in Go-Rank

efficiently reduces the learning time, while at the same time improving or

maintaining accuracy;

o Computation complexity and running time: The learned latent factors of Go-Rank can be directly used in the recommendation generation process, which makes it less complex and less time-consuming than a process that requires a full tensor reconstruction procedure.

Shortcomings:

o Computation complexity and running time: Belonging to the list-wise

based ranking approach, the learning process of Go-Rank requires

significantly more complexity and running time in comparison to that of

methods that belong to the point-wise based ranking approach.
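The memory efficient loop credited to TRPR above rests on not materialising the whole reconstructed tensor at once. The sketch below illustrates that general idea under stated assumptions: a CP model, tags processed in strips of b at a time (roughly S/b strips, echoing the block-strip size b in Table 6.6), and a hypothetical per-(user, item) statistic, the maximum score over tags, kept in place of the full tensor. It is not TRPR’s actual candidate selection or probabilistic ranking.

```python
import numpy as np

def blockwise_max_over_tags(user_f, item_f, tag_f, b):
    """Return the max over tags of the CP reconstruction for every (user, item) pair,
    materialising at most a Q x R x b slice of the tensor at a time."""
    Q, R, S = user_f.shape[0], item_f.shape[0], tag_f.shape[0]
    best = np.full((Q, R), -np.inf)
    for start in range(0, S, b):
        block_tags = tag_f[start:start + b]                             # up to b tag factor rows
        block = np.einsum("uf,if,tf->uit", user_f, item_f, block_tags)  # Q x R x (<= b) slice
        best = np.maximum(best, block.max(axis=2))                     # keep only the running max
    return best

rng = np.random.default_rng(1)
user_f, item_f, tag_f = (rng.normal(size=(3, 2)), rng.normal(size=(4, 2)),
                         rng.normal(size=(7, 2)))
full = np.einsum("uf,if,tf->uit", user_f, item_f, tag_f)
print(np.allclose(blockwise_max_over_tags(user_f, item_f, tag_f, b=3), full.max(axis=2)))  # True
```

Peak memory is governed by b rather than by S, while the result matches the statistic computed from the full tensor.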


6.13 CHAPTER SUMMARY

This chapter has compared and analysed the performance of all proposed methods in order to investigate which method performs best in terms of various aspects. First, the impact of the interpretation scheme, the users’ tagging behaviour, and the “relevant” entries on method performance, as well as the handling of “likely relevant” entries and the 𝑝-core size, were analysed. The results are presented from the perspective of tensor entries population and method performance.

Next, the performance of each proposed method is compared and discussed from the perspective of each dataset. The performances of the proposed methods are then compared based on the implemented learning-to-rank approach. In general, the list-wise based ranking methods (Do-Rank and Go-Rank) achieve better performance in terms of NDCG than the point-wise based ranking methods (TRPR and We-Rank). On the other hand, the point-wise based ranking method TRPR (We-Rank is excluded from this generalisation due to its poor performance) achieves better performance in terms of AP and MAP than the list-wise based ranking methods. These results are due to the characteristics of each evaluation measure, i.e. NDCG is likely to favour methods that build the learning model from multi-graded data, while AP and MAP are likely to favour methods that build the learning model from binary data.

Compared with the benchmarking methods, the proposed methods, except We-Rank, outperform MAX, PITF and CTS on most evaluation measures and datasets. We-Rank’s underperformance relative to the benchmarking methods on most of the datasets is due to its sensitivity to how well it captures the user’s tag usage likeliness, which is used to reward or penalise each tensor model entry during the learning process.

Various aspects of the proposed methods, such as efficiency, computation

complexity, and time scalability, are then presented. Finally, the strengths and

shortcomings of the proposed methods are distilled.



Chapter 7: Conclusions

With the growing user-generated information on the web, STSs have gained great

popularity, since users can annotate items of their interest using freely defined tags

and then utilise those tags to organise, retrieve, and share items with other users.

Over time, tagging data accumulate as a record of tagging activities, where each activity is an event in which a user uses a tag to annotate an item, forming a ⟨𝑢𝑠𝑒𝑟, 𝑖𝑡𝑒𝑚, 𝑡𝑎𝑔⟩ ternary relation. By learning from users’ past tagging behaviour, a tag-based system can generate a list of item recommendations that may be of interest to a user. To boost the performance of such a system, the unique multi-dimensional relations between users, items, and tags that represent the user profiles must be appropriately modelled, such that the latent relationships among them are thoroughly captured. Furthermore, since users are mostly interested in the items at the top of a recommendation list, the ranking of items in that list is crucial.

This research developed methods for building a tag-based item recommendation system that explores the interplay between the multiple dimensions of tagging data. To achieve this goal, efficient tagging data interpretation schemes and recommendation ranking methods are proposed, in which a tensor model and learning-to-rank approaches are employed. The recommendation methods developed in this thesis implement two ranking approaches: point-wise and list-wise. A point-wise based ranking method approaches the recommendation task as a regression/classification task, whereas a list-wise based ranking method approaches it as a ranking task.
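The difference between treating recommendation as regression (point-wise) and as ranking (list-wise) can be seen on a hypothetical two-item example: a score vector with lower squared error can still place the irrelevant item first, a mistake that a point-wise loss does not penalise directly. The targets and scores below are invented for illustration.

```python
import numpy as np

targets = np.array([1.0, 0.0])  # hypothetical: item 0 is relevant, item 1 is not

def squared_error(scores):
    """Point-wise view: mean squared error of each score against its target."""
    return float(np.mean((scores - targets) ** 2))

def relevant_item_on_top(scores):
    """List-wise view: only the order induced by the scores matters."""
    return bool(targets[int(np.argmax(scores))] == 1.0)

scores_a = np.array([0.45, 0.50])  # close to the targets, but wrong order
scores_b = np.array([1.00, 0.90])  # further from the targets, but correct order

for name, s in (("A", scores_a), ("B", scores_b)):
    print(name, f"MSE={squared_error(s):.4f}", "relevant on top:", relevant_item_on_top(s))
```

A point-wise learner would prefer scores A, while a list-wise learner, which cares about the order, would prefer scores B.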

The first section of this chapter summarises the contributions of this research, followed by the findings drawn from this thesis. Finally, the limitations of the current work and directions for future work are presented.


7.1 SUMMARY OF CONTRIBUTIONS

The literature review presented in Chapter 2 has highlighted the following

shortcomings in the tag-based item recommendation system:

o Lack of efficient schemes that can thoroughly utilise the user’s tagging

history for generating a list of recommendations;

o Lack of methods that efficiently implement a learning-to-rank approach to solve the tag-based item recommendation task;

o Lack of comprehensive works that study whether a combination of an

interpretation scheme and a learning-to-rank approach has a positive

influence in making a recommendation.

This thesis focused on overcoming those shortcomings by proposing two novel

schemes to interpret tagging data and four tag-based recommendation methods for

generating the list of item recommendations in which tensor model and learning-to-

rank approaches are implemented. The main contributions are summarised as

follows:

Developed two ranking-based interpretation schemes which apply a ranking constraint to interpret the tagging data and result in richer multi-graded relevance data:

o The User-Tag Set (UTS) scheme interprets the tagging data and results in three possible distinct entries: “relevant” or “1”, “irrelevant” or “-1”, and “indecisive” or “0”;

o The graded-relevance scheme interprets the tagging data and results in four possible distinct entries: “relevant” or “2”, “likely relevant” or “1”, “irrelevant” or “-1”, and “indecisive” or “0”.

Developed Tensor-based Item Recommendation using Probabilistic Ranking

(TRPR) method:

o TRPR implements a memory-efficient technique in order to solve the scalability issue that occurs during the tensor reconstruction process;

o TRPR improves the recommendation quality by ranking the items using a probabilistic approach, applied as a subsequent stage after the tensor reconstruction process.

Developed Recommendation Ranking using Weighted Tensor (We-Rank)

method:

o We-Rank implements a weighted scheme for learning the tensor

recommendation model such that rewards and penalties are given to the

observed and non-observed entries of each user-item set, respectively.

Developed DCG Optimization for Learning-to-Rank (Do-Rank) method:

o Do-Rank learns from a user profile built from the multi-graded data that result from implementing the proposed User-Tag Set (UTS) scheme;

o Do-Rank optimizes the recommendation model with respect to Discounted Cumulative Gain (DCG) as the ranking evaluation measure to appropriately learn the tensor recommendation model built from multi-graded data.

Developed GAP Optimization for Learning-to-Rank (Go-Rank) method:

o Go-Rank learns from a user profile built from the graded-relevance data that result from implementing the proposed graded-relevance scheme, which effectively leverages the tagging data;

o Go-Rank optimizes the recommendation model with respect to Graded Average Precision (GAP) as the ranking evaluation measure to appropriately learn the tensor recommendation model built from graded-relevance data.

Comprehensive analyses of the results of all the developed and benchmarking methods that reveal the strengths and shortcomings of each developed method, a comparison between point-wise and list-wise based ranking methods, and various aspects that influence method performance, such as interpretation schemes, tensor entries population, and 𝑝-core.
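The two interpretation schemes listed above can be sketched for a single user as follows. This is a reading of the schemes as they are described in this chapter, not the thesis’s implementation: for each tag the user has used (an observed user-tag set), items annotated with that tag are “relevant”; items the user annotated only with other tags are “indecisive” (0) under UTS and “likely relevant” (1) under the graded-relevance scheme; the remaining items in that set are “irrelevant” (-1); entries for tags the user never used stay “indecisive” (0). The function `interpret` and the example data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (user, item, tag)

def interpret(triples: Set[Triple], user: str, items: List[str],
              tags: List[str], scheme: str) -> Dict[Tuple[str, str], int]:
    """Label each (item, tag) entry of one user's slice of the tensor.

    scheme "uts":    relevant 1, indecisive 0, irrelevant -1
    scheme "graded": relevant 2, likely relevant 1, indecisive 0, irrelevant -1
    """
    if scheme not in ("uts", "graded"):
        raise ValueError("scheme must be 'uts' or 'graded'")
    posts = {(i, t) for (u, i, t) in triples if u == user}
    user_tags = {t for (_, t) in posts}   # observed user-tag sets
    user_items = {i for (i, _) in posts}  # the user's item collection
    labels = {}
    for t in tags:
        for i in items:
            if t not in user_tags:
                label = 0                                # tag never used by this user: indecisive
            elif (i, t) in posts:
                label = 2 if scheme == "graded" else 1   # relevant
            elif i in user_items:
                label = 1 if scheme == "graded" else 0   # annotated by the user with other tags
            else:
                label = -1                               # not in the user's collection: irrelevant
            labels[(i, t)] = label
    return labels

triples = {("u1", "i1", "rock"), ("u1", "i2", "jazz")}
items, tags = ["i1", "i2", "i3"], ["rock", "jazz", "pop"]
uts = interpret(triples, "u1", items, tags, "uts")
graded = interpret(triples, "u1", items, tags, "graded")
print(uts[("i2", "rock")], graded[("i2", "rock")])  # prints "0 1"
```

Item i2 sits in the observed “rock” set without being tagged “rock”, yet it belongs to the user’s collection, so neither scheme labels it “irrelevant” as a set-based scheme would.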


7.2 SUMMARY OF FINDINGS

The main findings from this thesis are summarised as follows:

In response to Research Question 1 (How tagging data can be efficiently

interpreted such that the user’s tagging history is thoroughly utilised while

making recommendations?), two interpretation schemes are proposed and the

findings are as follows:

o The UTS scheme efficiently interprets tagging data as rich multi-graded data, where each entry can be one of the elements in the ordinal relevance set {𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡, 𝑖𝑛𝑑𝑒𝑐𝑖𝑠𝑖𝑣𝑒, 𝑖𝑟𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡} (or {1, 0, −1}). This scheme tackles the overgeneralisation of the set-based scheme by implementing a user’s item collection constraint, such that not all items on each observed user-tag set are labelled as “irrelevant” entries, since some of them have been annotated by the user with other tags. Accordingly, UTS generates sparser non-indecisive entries than a set-based scheme, which means that a tensor model that uses this scheme to populate its entries learns from less data than one that uses a set-based scheme;

o The graded-relevance scheme efficiently interprets tagging data as rich graded-relevance data, where each entry can be one of the elements in the ordinal relevance set {𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡, 𝑙𝑖𝑘𝑒𝑙𝑦 𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡, 𝑖𝑛𝑑𝑒𝑐𝑖𝑠𝑖𝑣𝑒, 𝑖𝑟𝑟𝑒𝑙𝑒𝑣𝑎𝑛𝑡} (or {2, 1, 0, −1}). This scheme tackles the overgeneralisation of a set-based scheme in terms of labelling entries as “irrelevant” by revealing “likely relevant” entries, i.e. transitional entries positioned between the “relevant” and “irrelevant” entries;

o Experiment results show that both UTS and graded-relevance schemes

support ranking methods, i.e. Do-Rank and Go-Rank respectively, to

achieve quality recommendations and to deal with the scalability problem

in the learning process;

o With regard to the implementation of 𝑝-core refinement on the dataset, the population of “relevant”, “likely relevant”, or “irrelevant” entries generated by any interpretation scheme is linearly related to the size of the 𝑝-core;

o The user’s tagging behaviour influences the entries population. Given the same population of “relevant” entries, the UTS and/or graded-relevance schemes generate more “irrelevant” and/or “likely relevant” entries on a dataset with a dominant user-item set, in which users prefer to use fewer tags to annotate items, than on a dataset with a dominant user-tag set, in which users prefer to use more tags to annotate items.

In response to Research Question 2 (How can a learning-to-rank approach be

implemented to solve the tag-based item recommendation task? What

optimization criterion should be used for learning the tensor recommendation

model? In what order can the Top-𝑁 item recommendation be made?), four

recommendation methods are proposed with tensor model and learning-to-

rank approaches. The experimental results on various real-world datasets

indicate the following findings:

o TRPR, a point-wise based ranking method, has shown that

probabilistically ranking candidate items generated from the reconstructed

tensor achieves improved performance over directly ranking them. TRPR has demonstrated that the memory efficient loop approach addresses the high cost of the full tensor reconstruction procedure in the recommendation generation process, ensuring that TRPR is scalable on a large dataset. The calculation of tag probability that controls the ranking approach makes TRPR suitable for a dataset with a dominant user-tag set, such as the Delicious and CiteULike datasets;

o We-Rank, a point-wise based ranking method, has been shown to

outperform the benchmarking methods by utilising the weighted scheme.

Yet, since We-Rank depends heavily on how well the user’s tag usage likeliness is captured (for populating the weighted tensor), it performs well only on dense datasets. The learned latent factors of We-Rank

can be directly used for generating the Top-𝑁 list of item


recommendations for each target user due to the implementation of a

weighting scheme within the learning process, avoiding the complex

recommendation generation process, as full tensor reconstruction is not

required;

o Do-Rank has been shown to outperform the benchmarking methods and

has demonstrated that implementing a fast learning approach in the learning process reduces the learning time while improving or maintaining accuracy. The learned latent factors of Do-Rank

can be directly used for generating the Top-𝑁 list of item

recommendations for each target user, due to the implementation of a

UTS interpretation scheme and list-wise based ranking approach within

the learning process, avoiding the complex recommendation generation

process, as full tensor reconstruction is not required. The formulation of

ranking that is based on a list of associated items in the learning algorithm

makes Do-Rank suitable for a dataset with a dominant user-item set, such

as LastFM and MovieLens datasets;

o Go-Rank outperforms the benchmarking methods. The fast learning approach addresses the high cost of the learning process that arises from implementing a list-wise based ranking approach. Results show that it not only reduces the learning time but also improves or maintains accuracy. The learned latent factors of

Go-Rank can be directly used for generating the Top-𝑁 list of item

recommendations for each target user, due to the implementation of a

graded-relevance interpretation scheme and list-wise based ranking

approach within the learning process, avoiding the complex

recommendation generation process, as full tensor reconstruction is not

required. Like Do-Rank, Go-Rank is suitable for a dataset with

a dominant user-item set, such as LastFM and MovieLens datasets, due to

the formulation of ranking that is based on a list of associated items in the

learning algorithm.


In response to Research Question 3 (Does a combination of an interpretation

scheme and a learning-to-rank approach have a positive influence in making a

recommendation? Given that the proposed tag-based item recommendation

methods are grouped as point-wise and list-wise based ranking approaches,

comparing their performances may help to find an efficient method),

comprehensive comparison and analysis of all proposed methods is

conducted and the findings are as follows:

o Analysis of all proposed methods confirms that a combination of the

interpretation scheme and the learning-to-rank approach has a positive

influence in making a recommendation;

o The list-wise based ranking methods (Do-Rank and Go-Rank) achieve better performance in terms of NDCG than the point-wise based ranking methods (TRPR and We-Rank). The combination of the UTS scheme for building the learning model and DCG as the optimization criterion in Do-Rank means that NDCG, the ranking evaluation measure widely used for multi-graded relevance data, favours this method;

o The point-wise based ranking method TRPR (We-Rank is excluded due to its poor performance) achieves better performance in terms of AP and MAP than the list-wise based ranking methods. The combination of a boolean scheme for building the learning model and a probabilistic approach for ranking the candidate items generated from the full reconstructed tensor in TRPR means that AP and MAP, the ranking evaluation measures widely used for binary data, favour this method. Meanwhile, the combination of a graded-relevance scheme for building the learning model and GAP as the optimization criterion makes Go-Rank achieve, on average, better results than Do-Rank in terms of AP.
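Because Go-Rank optimises GAP, a small sketch of the measure may help show how it generalises AP. The function below follows GAP as it is commonly stated: each relevant item at rank n receives, from every relevant item at rank m ≤ n, the threshold probability mass g_1 + … + g_k with k = min(grade_m, grade_n), divided by n; the total is normalised by the sum over relevant items of g_1 + … + g_grade. The threshold probabilities and the treatment of “irrelevant” (−1) and “indecisive” (0) grades as non-relevant are assumptions for illustration; the formulation used for Go-Rank in Chapter 5 may differ in detail.

```python
def graded_average_precision(grades, g):
    """GAP of one ranked list.

    grades: grade at each rank, top first (<= 0 means not relevant).
    g: threshold probabilities g_1..g_c for grades 1..c, summing to 1.
    """
    cum = [0.0]
    for p in g:
        cum.append(cum[-1] + p)  # cum[k] = g_1 + ... + g_k
    numerator = 0.0
    for n, rel_n in enumerate(grades, start=1):
        if rel_n <= 0:
            continue
        # Each relevant item at rank m <= n contributes the probability mass of
        # the grades both items reach, i.e. cum[min(rel_m, rel_n)].
        credit = sum(cum[min(rel_m, rel_n)] for rel_m in grades[:n] if rel_m > 0)
        numerator += credit / n
    denominator = sum(cum[rel] for rel in grades if rel > 0)
    return numerator / denominator if denominator > 0 else 0.0

# Binary case (c = 1, g_1 = 1): GAP equals ordinary AP.
print(graded_average_precision([1, 0, 1, 0, 1], [1.0]))
# Graded case with hypothetical g = (0.5, 0.5); 2 = relevant, 1 = likely relevant, -1 = irrelevant.
print(graded_average_precision([2, 1, -1], [0.5, 0.5]))  # ideal order
print(graded_average_precision([1, 2, -1], [0.5, 0.5]))  # top two swapped
```

With a single grade and g_1 = 1, every credit term is 1 and the expression reduces to the precision-at-relevant-ranks average that defines AP; with more grades, placing a “likely relevant” item above a “relevant” one lowers the score.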


7.3 LIMITATIONS AND FUTURE WORKS

This research focuses mainly on generating item recommendations for tag-based

systems using a tensor model and learning-to-rank approach. Several future

improvements and extensions to the currently proposed methods are as follows:

Further scope of improving the proposed methods:

o We-Rank performs the worst amongst the proposed methods. It also underperforms the benchmarking methods on some of the datasets used in this thesis. This happens due to its sensitivity to how well it captures the user’s tag usage likeliness, which is used to reward or penalise each tensor model entry during the learning process. Hence, future work can include investigating a more effective approach for capturing the user’s tag usage likeliness in order to improve the performance of We-Rank;

o The graded-relevance scheme is an efficient scheme, as it leverages the

tagging data more effectively. In Go-Rank, it has been implemented with

GAP as the optimization criterion of the learning model, in order to

improve the tag-based item recommendation performance. For future

work, it would be interesting to investigate the potential of implementing

the graded-relevance scheme with other ranking evaluation measures for

generating tag-based item recommendations.

Further scope of extending the problems:

o One of the common problems in a tag-based recommendation system is the cold-start user problem, i.e. the situation in which a user has annotated only a single item with a limited number of tags. This makes it difficult for the system to infer the user’s preferences, due to limited usage data. Although the UTS and graded-relevance schemes, which are able to generate richer data, have been proposed and implemented in Do-Rank and Go-Rank, no discussion or experiments have yet been conducted to investigate the impact of those schemes on solving the cold-start problem. Future work can include this study on Do-Rank and Go-Rank;

o The semantics behind the tags should be properly considered, as tags are freely defined by the users and can therefore cause semantic problems such as synonymy and polysemy (Golder and Huberman, 2006). Future work can include the analysis of the tag semantic problem and utilise the outcome in building the tensor model. An alternative solution could be to implement a topic model approach that keeps the tags’ nature as the “social vocabulary” (Alper, 2012).

Further scope of extending the applications:

o The proposed methods were built to solve the problem of tag-based item recommendation. In the future, it would be interesting to apply the methods to other applications that generate three-dimensional data similar to that of a tag-based system;

o The proposed methods apply to three-dimensional data. Future work could apply them to higher-order data.



Bibliography

Acar, E., Dunlavy, D. M., Kolda, T. G., and Mørup, M. (2011). Scalable Tensor

Factorizations for Incomplete Data. Chemometrics and Intelligent Laboratory

Systems, 106(1), 41-56.

Adomavicius, G., and Tuzhilin, A. (2005). Toward the Next Generation of

Recommender Systems: A Survey of the State-of-the-art and Possible

Extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6),

734-749.

Adomavicius, G., and Tuzhilin, A. (2011). Context-aware Recommender Systems. In Recommender Systems Handbook. 217-253, Springer.

Agichtein, E., Brill, E., and Dumais, S. (2006). Improving Web Search Ranking by Incorporating User Behavior Information. Proc. The 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, Washington, USA, 19-26. ACM.

Alper, M. E. (2012). Personalized Recommendation in Folksonomies using a Joint

Probabilistic Model of Users, Resources and Tags. Proc. The 11th

International Conference on Machine Learning and Applications (ICMLA),

Boca Raton, Florida, 1: 368-373. IEEE.

Appellof, C. J., and Davidson, E. (1981). Strategies for Analyzing Data from Video

Fluorometric Monitoring of Liquid Chromatographic Effluents. Analytical

Chemistry, 53(13), 2053-2056.

Bader, B. W., Kolda, T. G., and others (2012). MATLAB Tensor Toolbox Version

2.5. Retrieved 12 June, from

http://www.sandia.gov/~tgkolda/TensorToolbox/

Baker, L. D., and McCallum, A. K. (1998). Distributional Clustering of Words for

Text Classification. Proc. The 21st Annual International ACM SIGIR

Conference on Research and Development in Information Retrieval,

Melbourne, Australia, 96-103. ACM.

Balakrishnan, S., and Chopra, S. (2012). Collaborative Ranking. Proc. The 5th ACM

International Conference on Web Search and Data Mining, Seattle,

Washington, USA, 143-152. ACM.

Balcan, M.-F., Bansal, N., Beygelzimer, A., Coppersmith, D., Langford, J., and

Sorkin, G. B. (2007). Robust Reductions from Ranking to Classification. In

Learning Theory. 604-619, Springer Berlin Heidelberg.

Barragans-Martinez, A. B., Rey-Lopez, M., Costa-Montenegro, E., Mikic-Fonte, F. A., Burguillo, J. C., and Peleteiro, A. (2010). Exploiting Social Tagging in a Web 2.0 Recommender System. IEEE Internet Computing, 14(6), 23-30.

Batagelj, V., and Zaveršnik, M. (2002). Generalized Cores. arXiv preprint cs/0202039.

Begelman, G., Keller, P., and Smadja, F. (2006). Automated Tag Clustering:

Improving Search and Exploration in the Tag Space. Proc. Collaborative

Web Tagging Workshop at WWW2006, Edinburgh, Scotland, 15-33.

Bergqvist, G., and Larsson, E. G. (2010). The Higher-Order Singular Value

Decomposition: Theory and an Application. IEEE Signal Processing

Magazine, 27(3), 151-154.

Bogers, T., and van den Bosch, A. (2009). Collaborative and Content-based Filtering

for Item Recommendation on Social Bookmarking Websites. Proc. The ACM

RecSys Workshop on Recommender Systems and the Social Web, New York,

USA, 9-16. ACM.

Buckley, C., and Voorhees, E. M. (2000). Evaluating Evaluation Measure Stability.

Proc. The 23rd Annual International ACM SIGIR Conference on Research

and Development in Information Retrieval, Athens, Greece, 33-40. ACM.

Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., and

Hullender, G. (2005). Learning to Rank using Gradient Descent. Proc. The

22nd International Conference on Machine Learning, 89-96. ACM.

Burke, R. (2007). Hybrid Web Recommender Systems. In The Adaptive Web. 377-408, Springer Berlin Heidelberg.

Cai, Y., Zhang, M., Luo, D., Ding, C., and Chakravarthy, S. (2011). Low-order

Tensor Decompositions for Social Tagging Recommendation. Proc. The 4th

ACM International Conference on Web Search and Data Mining, Hong

Kong, China, 695-704. ACM.

Cantador, I., Brusilovsky, P., and Kuflik, T. (2011). Second Workshop on

Information Heterogeneity and Fusion in Recommender Systems

(HetRec2011). Proc. The 5th ACM Conference on Recommender Systems,

Chicago, Illinois, USA, 387-388.

Cao, Z., Qin, T., Liu, T.-Y., Tsai, M.-F., and Li, H. (2007). Learning to Rank: From Pairwise Approach to Listwise Approach. Proc. The 24th International Conference on Machine Learning, Corvallis, Oregon, USA. ACM.

Carroll, J. D., and Chang, J.-J. (1970). Analysis of Individual Differences in

Multidimensional Scaling via an N-way Generalization of “Eckart-Young”

Decomposition. Psychometrika, 35(3), 283-319.

Castellano, G., Fanelli, A., Torsello, M., and Jain, L. (2009). Innovations in Web

Personalization. In Web Personalization in Intelligent Environments. 1-26,

Springer Berlin Heidelberg.

Chapelle, O., Metzler, D., Zhang, Y., and Grinspan, P. (2009). Expected Reciprocal

Rank for Graded Relevance. Proc. The 18th ACM Conference on Information

and Knowledge Management, 621-630. ACM.

Chapelle, O., and Wu, M. (2010). Gradient Descent Optimization of Smoothed

Information Retrieval Metrics. Information Retrieval, 13(3), 216-235.

Cichocki, A., Zdunek, R., Phan, A. H., and Amari, S.-i. (2009). Nonnegative Matrix

and Tensor Factorizations: Applications to Exploratory Multi-way Data

Analysis and Blind Source Separation, John Wiley & Sons.

Craswell, N. (2009). Mean Reciprocal Rank. In Encyclopedia of Database Systems.

1703-1703, Springer.

Cremonesi, P., Koren, Y., and Turrin, R. (2010). Performance of Recommender

Algorithms on Top-N Recommendation Tasks. Proc. The 4th ACM

Conference on Recommender Systems, Barcelona, Spain, 39-46. ACM.

Das, M., Das, G., and Hristidis, V. (2011). Leveraging Collaborative Tagging for

Web Item Design. Proc. The 17th ACM SIGKDD International Conference

on Knowledge Discovery and Data Mining, San Diego, California, USA,

538-546. ACM.

de Campos, L. M., Fernández-Luna, J. M., Huete, J. F., and Rueda-Morales, M. A.

(2010). Combining Content-based and Collaborative Recommendations: A

Hybrid Approach based on Bayesian Networks. International Journal of

Approximate Reasoning, 51(7), 785-799.

De Lathauwer, L., De Moor, B., and Vandewalle, J. (2000a). A Multilinear Singular

Value Decomposition. SIAM Journal on Matrix Analysis and Applications,

21(4), 1253-1278.

De Lathauwer, L., De Moor, B., and Vandewalle, J. (2000b). On the Best Rank-1 and Rank-(R1, R2, ..., RN) Approximation of Higher-Order Tensors. SIAM Journal on Matrix Analysis and Applications, 21(4), 1324-1342.

Doerfel, S., and Jäschke, R. (2013). An Analysis of Tag-recommender Evaluation

Procedures. Proc. The 7th ACM Conference on Recommender Systems, 343-

346. ACM.

Dyrby, M., Baunsgaard, D., Bro, R., and Engelsen, S. B. (2005). Multiway

Chemometric Analysis of the Metabolic Response to Toxins Monitored by

NMR. Chemometrics and Intelligent Laboratory Systems, 76(1), 79-89.

Ferrante, M., Ferro, N., and Maistro, M. (2014). Rethinking How to Extend Average

Precision to Graded Relevance. In Information Access Evaluation.

Multilinguality, Multimodality, and Interaction. 19-30, Springer International

Publishing.

Gemmell, J., Schimoler, T., Mobasher, B., and Burke, R. (2011). Tag-based

Resource Recommendation in Social Annotation Applications. In User

Modeling, Adaption and Personalization. 111-122, Springer.

Golder, S. A., and Huberman, B. A. (2006). Usage Patterns of Collaborative Tagging Systems. Journal of Information Science, 32(2), 198-208.

Halpin, H., Robu, V., and Shepherd, H. (2007). The Complex Dynamics of

Collaborative Tagging. Proc. The 16th International Conference on World

Wide Web, Banff, Alberta, Canada, 21: 211-220. ACM.

Harshman, R. A. (1970). Foundations of the PARAFAC Procedure: Models and

Conditions for an "Explanatory" Multimodal Factor Analysis. UCLA Working

Papers in Phonetics, 16, 1-84.

Hofmann, T. (1999). Probabilistic Latent Semantic Analysis. Proc. The 15th

Conference on Uncertainty in Artificial Intelligence (UAI'99), Stockholm,

Sweden, 289-296. Morgan Kaufmann Publishers Inc.

Ifada, N., and Nayak, R. (2014a). An Efficient Tagging Data Interpretation and

Representation Scheme for Item Recommendation. Proc. The 12th

Australasian Data Mining Conference, Brisbane, Australia, ACS.

Ifada, N., and Nayak, R. (2014b). Tensor-based Item Recommendation using

Probabilistic Ranking in Social Tagging Systems. Proc. The 23rd

International Conference on World Wide Web Companion, Seoul, Korea,

805-810. ACM.

Ifada, N., and Nayak, R. (2014c). A Two-Stage Item Recommendation Method

Using Probabilistic Ranking with Reconstructed Tensor Model. In User

Modeling, Adaptation, and Personalization. 98-110, Springer.

Ifada, N., and Nayak, R. (2015). Do-Rank: DCG Optimization for Learning-to-Rank

in Tag-Based Item Recommendation Systems. In Advances in Knowledge

Discovery and Data Mining. 510-521, Springer.

Ifada, N., and Nayak, R. (2016). How Relevant is the Irrelevant Data: Leveraging the

Tagging Data for a Learning-to-Rank Model. Proc. The 19th ACM

International Conference on Web Search and Data Mining, San Francisco,

California, US, 23-32.

Jain, V., and Varma, M. (2011). Learning to Re-rank: Query-dependent Image Re-

ranking using Click Data. Proc. The 20th International Conference on World

Wide Web, Hyderabad, India, 277-286. ACM.

Järvelin, K., and Kekäläinen, J. (2002). Cumulated Gain-based Evaluation of IR

Techniques. ACM Transactions on Information Systems (TOIS), 20(4), 422-

446.

Jäschke, R., Marinho, L., Hotho, A., Schmidt-Thieme, L., and Stumme, G. (2007).

Tag Recommendations in Folksonomies. Proc. The 11th European

Conference on Principles and Practice of Knowledge Discovery in

Databases, Warsaw, Poland, 506-514. Springer-Verlag.

Jitao, S., Changsheng, X., and Jing, L. (2012). User-Aware Image Tag Refinement

via Ternary Semantic Analysis. IEEE Transactions on Multimedia, 14(3),

883-895.

Kim, H.-N., Alkhaldi, A., El Saddik, A., and Jo, G.-S. (2011). Collaborative User

Modeling with User-generated Tags for Social Recommender Systems.

Expert Systems with Applications, 38(7), 8488-8496.

Kim, H.-N., and El Saddik, A. (2013). Exploring Social Tagging for Personalized

Community Recommendations. User Modeling and User-Adapted

Interaction, 23(2-3), 249-285.

Kim, H.-N., Ji, A.-T., Ha, I., and Jo, G.-S. (2010). Collaborative Filtering based on

Collaborative Tagging for Enhancing the Quality of Recommendation.

Electronic Commerce Research and Applications, 9(1), 73-83.

Kim, J., He, Y., and Park, H. (2014). Algorithms for Nonnegative Matrix and Tensor

Factorizations: A Unified View based on Block Coordinate Descent

Framework. Journal of Global Optimization, 58(2), 285-319.

Kolda, T., and Bader, B. (2009). Tensor Decompositions and Applications. SIAM

Review, 51(3), 455-500.

Kolda, T. G. (2006). Multilinear Operators for Higher-order Decompositions,

Sandia National Laboratories.

Kolda, T. G., and Sun, J. (2008). Scalable Tensor Decompositions for Multi-aspect

Data Mining. Proc. The 8th IEEE International Conference on Data Mining,

Pisa, Italy, 363-372. IEEE.

Koren, Y. (2008). Factorization Meets The Neighborhood: A Multifaceted

Collaborative Filtering Model. Proc. The 14th ACM SIGKDD International

Conference on Knowledge Discovery and Data Mining, Las Vegas, Nevada,

USA, 426-434. ACM.

Koren, Y., and Bell, R. (2011). Advances in Collaborative Filtering. In

Recommender Systems Handbook. 145-186, Springer US.

Koren, Y., Bell, R., and Volinsky, C. (2009). Matrix Factorization Techniques for

Recommender Systems. Computer, 42(8), 30-37.

Koren, Y., and Sill, J. (2011). OrdRec: An Ordinal Model for Predicting

Personalized Item Rating Distributions. Proc. The 5th ACM Conference on

Recommender Systems, Chicago, Illinois, USA, 117-124. ACM.

Kutty, S., Chen, L., and Nayak, R. (2012). A People-to-people Recommendation

System using Tensor Space Models. Proc. The 27th Annual ACM Symposium

on Applied Computing, Trento, Italy, 187-192. ACM.

Lee, D. H., and Brusilovsky, P. (2010). Interest Similarity of Group Members: The Case Study of CiteULike. Proc. The WebSci10: Extending the Frontiers of Society On-Line, North Carolina, US.

Lee, W. S. (2001). Collaborative Learning and Recommender Systems. Proc. The

Eighteenth International Conference on Machine Learning, San Francisco,

CA, USA, 314-321. Morgan Kaufmann Publishers Inc.

Leginus, M., Dolog, P., and Žemaitis, V. (2012). Improving Tensor Based

Recommenders with Clustering. In User Modeling, Adaptation, and

Personalization. 151-163, Springer Berlin Heidelberg.

Lekakos, G., and Giaglis, G. (2007). A Hybrid Approach for Improving Predictive

Accuracy of Collaborative Filtering Algorithms. User Modeling and User-

Adapted Interaction, 17(1-2), 5-40.

Li, P., Wu, Q., and Burges, C. J. (2007). McRank: Learning to Rank using Multiple

Classification and Gradient Boosting. Proc. Advances in Neural Information

Processing Systems 20 (NIPS 2007), 897-904.

Li, X., Guo, L., and Zhao, Y. E. (2008). Tag-based Social Interest Discovery. Proc.

The 17th International Conference on World Wide Web, Beijing, China, 675-

684. ACM.

Liang, H., Xu, Y., Li, Y., and Nayak, R. (2008). Collaborative Filtering Recommender Systems Using Tag Information. Proc. IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT '08), Sydney, NSW, 3: 59-62. IEEE Computer Society.

Liang, H., Xu, Y., Li, Y., and Nayak, R. (2009). Tag Based Collaborative Filtering

for Recommender Systems. In Rough Sets and Knowledge Technology. 666-

673, Springer Berlin Heidelberg.

Liang, H., Xu, Y., Li, Y., Nayak, R., and Tao, X. (2010). Connecting users and items

with weighted tags for personalized item recommendations. Proc. The 21st

ACM Conference on Hypertext and Hypermedia, Toronto, Ontario, Canada,

51-60. ACM.

Liu, J., Musialski, P., Wonka, P., and Ye, J. (2009). Tensor Completion for Estimating Missing Values in Visual Data. Proc. IEEE 12th International Conference on Computer Vision, 2114-2121. IEEE.

Liu, N. N., and Yang, Q. (2008). Eigenrank: A Ranking-oriented Approach to

Collaborative Filtering. Proc. The 31st Annual International ACM SIGIR

Conference on Research and Development in Information Retrieval,

Singapore, 83-90. ACM.

Liu, N. N., Zhao, M., and Yang, Q. (2009). Probabilistic Latent Preference Analysis

for Collaborative Filtering. Proc. The 18th ACM Conference on Information

and Knowledge Management, Hong Kong, China, 759-766. ACM.

Liu, T.-Y. (2009). Learning to Rank for Information Retrieval. Foundations and

Trends in Information Retrieval, 3(3), 225-331.

Lops, P., de Gemmis, M., and Semeraro, G. (2011). Content-based Recommender

Systems: State of the Art and Trends. In Recommender Systems Handbook.

73-105, Springer US.

Lü, L., Medo, M., Yeung, C. H., Zhang, Y.-C., Zhang, Z.-K., and Zhou, T. (2012).

Recommender Systems. Physics Reports, 519(1), 1-49.

Luo, X., Ouyang, Y., and Xiong, Z. (2012). Improving Neighborhood based Collaborative Filtering via Integrated Folksonomy Information. Pattern Recognition Letters, 33(3), 263-270.

Maneeroj, S., and Takasu, A. (2009). Hybrid Recommender System Using Latent

Features. Proc. International Conference on Advanced Information

Networking and Applications Workshops (WAINA '09), Bradford, 661-666.

IEEE Computer Society.

Marinho, L., Nanopoulos, A., Schmidt-Thieme, L., Jäschke, R., Hotho, A., Stumme,

G., and Symeonidis, P. (2011). Social Tagging Recommender Systems. In

Recommender Systems Handbook. 615-644, Springer US.

Marinho, L. B., Hotho, A., Jäschke, R., Nanopoulos, A., Rendle, S., Schmidt-

Thieme, L., Stumme, G., and Symeonidis, P. (2012). Recommender Systems

for Social Tagging Systems, Springer Science & Business Media.

McCallum, A., and Nigam, K. (1998). A Comparison of Event Models for Naive Bayes Text Classification. Proc. AAAI-98 Workshop on Learning for Text Categorization, 752: 41-48. Citeseer.

Mezghani, M., Zayani, C. A., Amous, I., and Gargouri, F. (2012). A User Profile

Modelling using Social Annotations: A Survey. Proc. The 21st International

Conference Companion on World Wide Web, Lyon, France, 969-976. ACM.

Mobasher, B. (2007). Data Mining for Web Personalization. In The Adaptive Web. 90-135, Springer Berlin Heidelberg.

Mobasher, B., Jin, X., and Zhou, Y. (2004). Semantically Enhanced Collaborative

Filtering on the Web. In Web Mining: From Web to Semantic Web. 57-76.

Mohan, A., Chen, Z., and Weinberger, K. (2011). Web-Search Ranking with

Initialized Gradient Boosted Regression Trees. Journal of Machine Learning

Research, 14, 77-89.

Mørup, M., Hansen, L. K., and Arnfred, S. M. (2008). Algorithms for Sparse Nonnegative Tucker Decompositions. Neural Computation, 20(8), 2112-2131.

Nanopoulos, A. (2011). Item Recommendation in Collaborative Tagging Systems.

IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and

Humans, 41(4), 760-771.

Negoescu, R. A., and Gatica-Perez, D. (2008). Topickr: Flickr Groups and Users

Reloaded. Proc. The 16th ACM International Conference on Multimedia,

Vancouver, British Columbia, Canada, 857-860. ACM.

Pan, R., Dolog, P., and Xu, G. (2013). KNN-Based Clustering for Improving Social

Recommender Systems. In Agents and Data Mining Interaction. 115-125,

Springer.

Pazzani, M. J., and Billsus, D. (2007). Content-Based Recommendation Systems. In

The Adaptive Web. 325-341, Springer Berlin Heidelberg.

Pradel, B., Sean, S., Delporte, J., Guérif, S., Rouveirol, C., Usunier, N., Fogelman-Soulié, F., and Dufau-Joel, F. (2011). A Case Study in a Recommender System based on Purchase Data. Proc. The 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 377-385. ACM.

Qiu, F., and Cho, J. (2006). Automatic Identification of User Interest for Personalized Search. Proc. The 15th International Conference on World Wide Web, Edinburgh, Scotland, 727-736. ACM.

RadiumOne (2013). #Mobile Hashtag Survey. Retrieved from http://radiumone.com/about/#research

Rafailidis, D., and Daras, P. (2013). The TFC Model: Tensor Factorization and Tag

Clustering for Item Recommendation in Social Tagging Systems. IEEE

Transactions on Systems, Man and Cybernetics, Part A: Systems and

Humans, 43(3), 673-688.

Rawat, R., Nayak, R., and Li, Y. (2011). Identifying Interests of Web Users for

Effective Recommendations. International Journal of Innovation,

Management and Technology, 2(1), 19-24.

Rendle, S. (2011). Context-aware Ranking with Factorization Models, Springer.

Rendle, S., Balby Marinho, L., Nanopoulos, A., and Schmidt-Thieme, L. (2009).

Learning Optimal Ranking with Tensor Factorization for Tag

Recommendation. Proc. The 15th ACM SIGKDD International Conference

on Knowledge Discovery and Data Mining, Paris, France, 727-736. ACM.

Rendle, S., Freudenthaler, C., Gantner, Z., and Schmidt-Thieme, L. (2009). BPR:

Bayesian Personalized Ranking from Implicit Feedback. Proc. The 25th

Conference on Uncertainty in Artificial Intelligence, Montreal, Quebec,

Canada, 452-461. AUAI Press.

Rendle, S., and Schmidt-Thieme, L. (2010). Pairwise Interaction Tensor

Factorization for Personalized Tag Recommendation. Proc. The 3rd ACM

International Conference on Web Search and Data Mining, New York, USA,

81-90. ACM.

Robertson, S. E., Kanoulas, E., and Yilmaz, E. (2010). Extending Average Precision

to Graded Relevance Judgments. Proc. The 33rd International ACM SIGIR

Conference on Research and Development in Information Retrieval, Geneva,

Switzerland, 603-610. ACM.

Schoefegger, K., and Granitzer, M. (2012). Overview and Analysis of Personal and

Social Tagging Context to Construct User Models. Proc. The 2nd Workshop

on Context-awareness in Retrieval and Recommendation, Lisbon, Portugal,

14-21. ACM.

Shani, G., and Gunawardana, A. (2011). Evaluating Recommendation Systems. In

Recommender Systems Handbook. 257-297, Springer.

Shepitsen, A., Gemmell, J., Mobasher, B., and Burke, R. (2008). Personalized

Recommendation in Social Tagging Systems using Hierarchical Clustering.

Proc. The 2008 ACM Conference on Recommender Systems, 259-266. ACM.

Shi, Y., Karatzoglou, A., Baltrunas, L., Larson, M., and Hanjalic, A. (2013a).

GAPfm: Optimal Top-N Recommendations for Graded Relevance Domains.

Proc. The 22nd ACM International Conference on Information & Knowledge

Management, San Francisco, California, USA, 2261-2266. ACM.

Shi, Y., Karatzoglou, A., Baltrunas, L., Larson, M., and Hanjalic, A. (2013b).

xCLiMF: Optimizing Expected Reciprocal Rank for Data with Multiple

Levels of Relevance. Proc. The 7th ACM Conference on Recommender

Systems, Hong Kong, China, 431-434. ACM.

Shi, Y., Karatzoglou, A., Baltrunas, L., Larson, M., Hanjalic, A., and Oliver, N.

(2012). TFMAP: Optimizing MAP for Top-N Context-Aware

Recommendation. Proc. The 35th International ACM SIGIR Conference on

Research and Development in Information Retrieval, Portland, Oregon, USA,

155-164. ACM.

Shi, Y., Karatzoglou, A., Baltrunas, L., Larson, M., Oliver, N., and Hanjalic, A.

(2012). CLiMF: Learning to Maximize Reciprocal Rank with Collaborative

Less-is-More Filtering. Proc. The 6th ACM Conference on Recommender

Systems, Dublin, Ireland, 139-146. ACM.

Shi, Y., Larson, M., and Hanjalic, A. (2010). List-wise Learning to Rank with Matrix Factorization for Collaborative Filtering. Proc. The 4th ACM Conference on Recommender Systems, Barcelona, Spain, 269-272. ACM.

Sigurbjörnsson, B., and van Zwol, R. (2008). Flickr Tag Recommendation based on

Collective Knowledge. Proc. The 17th International Conference on World

Wide Web, Beijing, China, 327-336. ACM.

Singh Anand, S., and Mobasher, B. (2005). Intelligent Techniques for Web Personalization. In Lecture Notes in Computer Science. 1-36, Springer Berlin Heidelberg.

Su, X., and Khoshgoftaar, T. M. (2009). A Survey of Collaborative Filtering

Techniques. Advances in Artificial Intelligence, 2009(January), 4:2-4:2.

Sun, J. T., Zeng, H. J., Liu, H., Lu, Y., and Chen, Z. (2005). CubeSVD: A Novel Approach to Personalized Web Search. Proc. The 14th International Conference on World Wide Web, Chiba, Japan, 382-390. ACM.

Symeonidis, P., Nanopoulos, A., and Manolopoulos, Y. (2008). Tag

Recommendations based on Tensor Dimensionality Reduction. Proc. The

2008 ACM Conference on Recommender Systems, Lausanne, Switzerland,

43-50. ACM.

Symeonidis, P., Nanopoulos, A., and Manolopoulos, Y. (2010). A Unified

Framework for Providing Recommendations in Social Tagging Systems

Based on Ternary Semantic Analysis. IEEE Transactions on Knowledge and

Data Engineering, 22(2), 179-192.

Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R.,

Botstein, D., and Altman, R. B. (2001). Missing Value Estimation Methods

for DNA Microarrays. Bioinformatics, 17(6), 520-525.

Tso-Sutter, K. H. L., Marinho, L. B., and Schmidt-Thieme, L. (2008). Tag-aware Recommender Systems by Fusion of Collaborative Filtering Algorithms. Proc. The 2008 ACM Symposium on Applied Computing, Fortaleza, Ceará, Brazil, 1995-1999. ACM.

Tsourakakis, C. E. (2009). MACH: Fast Randomized Tensor Decompositions. arXiv

preprint arXiv:0909.4969.

Tucker, L. R. (1966). Some Mathematical Notes on Three-mode Factor Analysis.

Psychometrika, 31(3), 279-311.

Vasilescu, M. A. O., and Terzopoulos, D. (2002). Multilinear Analysis of Image

Ensembles: Tensorfaces. In Computer Vision—ECCV 2002. 447-460,

Springer.

Venugopal, K. R., Srinivasa, K. G., and Patnaik, L. M. (2009). Algorithms for Web

Personalization. In Soft Computing for Data Mining Applications. 217-230.

Voorhees, E. M. (1999). The TREC-8 Question Answering Track Report. Proc.

TREC-8, 99: 77-82.

Vozalis, M. G., and Margaritis, K. G. (2007). Using SVD and Demographic Data for the Enhancement of Generalized Collaborative Filtering. Information Sciences, 177(15), 3017-3037.

Wang, Y., Wang, L., Li, Y., and He, D. (2013). A Theoretical Analysis of NDCG

Ranking Measures. Proc. The 26th Annual Conference on Learning Theory

(COLT 2013), Princeton, NJ, USA.

Weimer, M., Karatzoglou, A., Le, Q. V., and Smola, A. (2007). Maximum Margin

Matrix Factorization for Collaborative Ranking. Advances in Neural

Information Processing Systems.

Wetzker, R., Zimmermann, C., and Bauckhage, C. (2008). Analyzing Social Bookmarking Systems: A del.icio.us Cookbook. Proc. The ECAI 2008 Mining Social Data Workshop, Patras, Greece, 26-30.

Wu, M., Chang, Y., Zheng, Z., and Zha, H. (2009). Smoothing DCG for Learning to

Rank: A Novel Approach using Smoothed Hinge Functions. Proc. The 18th

ACM Conference on Information and Knowledge Management, Hong Kong,

China, 1923-1926. ACM.

Xu, J., and Li, H. (2007). AdaRank: A Boosting Algorithm for Information Retrieval.

Proc. The 30th Annual International ACM SIGIR Conference on Research

and Development in Information Retrieval, Amsterdam, The Netherlands,

391-398. ACM.

Xu, W., Liu, X., and Gong, Y. (2003). Document Clustering based on Non-negative Matrix Factorization. Proc. The 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Toronto, Canada, 267-273. ACM.

Zhang, Z.-K., Zhou, T., and Zhang, Y.-C. (2011). Tag-Aware Recommender

Systems: A State-of-the-Art Survey. Journal of Computer Science and

Technology, 26(5), 767-777.

Zhang, Z. K., Liu, C., Zhang, Y. C., and Zhou, T. (2010). Solving the Cold-start Problem in Recommender Systems with Social Tags. EPL (Europhysics Letters), 92(2), 28002.

Zhen, Y., Li, W.-J., and Yeung, D.-Y. (2009). TagiCoFi: Tag Informed Collaborative

Filtering. Proc. The 3rd ACM Conference on Recommender Systems, New

York, USA, 69-76.
