

Matrix and Tensor Decomposition in Recommender Systems

Panagiotis Symeonidis
Department of Informatics
Aristotle University
Thessaloniki, 54124, Greece
[email protected]

ABSTRACT

This tutorial offers a rich blend of theory and practice regarding dimensionality reduction methods to address the information overload problem in recommender systems. This problem affects our everyday experience while searching for knowledge on a topic. Naive Collaborative Filtering cannot deal with challenging issues such as scalability, noise, and sparsity. We can address all of these challenges by applying matrix and tensor decomposition methods. These methods have proven to be among the most accurate (e.g., in the Netflix Prize) and efficient for handling big data. For each method (SVD, SVD++, timeSVD++, HOSVD, CUR, etc.), we provide a detailed theoretical mathematical background and a step-by-step analysis, using an integrated toy example that runs throughout all parts of the tutorial, helping the audience to clearly understand the differences among factorization methods.

1. INTRODUCTION

Representing data in lower-dimensional spaces has been used extensively in many disciplines, such as natural language and image processing, data mining, and information retrieval. Recommender systems deal with challenging issues such as scalability, noise, and sparsity, and thus matrix and tensor factorization techniques appear as an interesting tool to be exploited [2, 4]. That is, we can deal with all the aforementioned challenges by applying matrix and tensor decomposition methods (also known as factorization methods).

The rest of this paper is organized as follows. Section 2 provides a detailed outline of the tutorial for matrix decomposition techniques. Section 3 provides a detailed outline of the tutorial for tensor decomposition. Finally, Section 4 concludes this paper.

2. MATRIX DECOMPOSITION

Matrix factorization denotes a process in which a matrix is factorized into a product of matrices. Its importance lies in the exploitation of the latent associations that exist in the data among the participating entities (e.g., between users and items). In its simplest form, the matrix factorization method uses two matrices, which hold the correlation information between the user-feature and item-feature factors, respectively.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
RecSys '16, September 15-19, 2016, Boston, MA, USA.
© 2016 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-4035-9/16/09.
DOI: http://dx.doi.org/10.1145/2959100.2959195

Figure 1 shows an example of the latent factors that could be revealed after performing matrix decomposition. As shown, the X'X axis divides both people and movies according to sex (i.e., male or female). When a movie is closer to the female side of the X'X axis, it means that the movie is more popular among women than among men. The Y'Y axis divides people and movies into "war-like" and "romantic". A "war-like" viewer is assumed to prefer movies showing blood and death. In contrast, a "romantic" viewer chooses movies that present love and passion.

Figure 1: Visual representation of users and movies using two axes: male versus female and war-like versus romantic.


To predict a user's rating for a movie, we can compute the dot product of the movie's and the user's [x, y] coordinates on the graph. In addition, Figure 1 shows where movies and users might fall on these two basic dimensions. For example, we would expect User 3 to love "Casablanca", to hate "The King's Speech", and to rate "Amelie" about average. Note that some movies (e.g., "Taken 3") and users (e.g., User 4) would be characterized as fairly neutral on these two dimensions.
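This dot-product prediction can be sketched in a few lines. The coordinates below are illustrative values invented for this sketch (they are not taken from the paper's figure); only the prediction rule itself follows the text.

```python
import numpy as np

# Hypothetical 2-D latent coordinates: [romantic-vs-war-like, female-vs-male].
# Values are made up for illustration.
users = {
    "User 3": np.array([1.2, 0.8]),
}
movies = {
    "Casablanca": np.array([1.0, 0.9]),
    "The King's Speech": np.array([-0.8, -0.6]),
    "Taken 3": np.array([0.05, -0.02]),  # roughly neutral on both axes
}

def predict(user_vec, movie_vec):
    """Predicted affinity is the dot product of the two coordinate vectors."""
    return float(np.dot(user_vec, movie_vec))

for title, vec in movies.items():
    print(title, round(predict(users["User 3"], vec), 3))
```

With these toy coordinates, a user and a movie that lie on the same side of both axes produce a large positive score, while opposite placements produce a negative one.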

In this tutorial, we cover the related work on basic matrix decomposition methods. The first method we discuss is Eigenvalue Decomposition, which decomposes the initial matrix into a canonical form. A second method is Non-negative Matrix Factorization (NMF), which factorizes the initial matrix into two smaller matrices under the constraint that each element of the factorized matrices must be non-negative. A third method is Probabilistic Matrix Factorization (PMF), which scales well to large datasets; PMF performs well on very sparse and imbalanced datasets by using spherical Gaussian priors. Next is Probabilistic Latent Semantic Analysis (PLSA), which is based on a mixture decomposition derived from a latent class model. The last of these is CUR Decomposition, which confronts the problem of density in the factorized matrices (a problem faced by the SVD method). Moreover, we describe Singular Value Decomposition (SVD) and UV-Decomposition in detail. We minimize an objective function that captures the error between the predicted and the actual value of a user's rating. An additional friendship constraint can also be added to the objective function to improve the quality of recommendations [1].
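The UV-style factorization with a squared-error objective can be sketched via stochastic gradient descent on the observed ratings. The toy matrix, rank, learning rate, and regularization weight below are assumptions for illustration, not the tutorial's actual settings (and the friendship term of [1] is omitted).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy user-item rating matrix; 0 marks a missing rating (assumption of this sketch).
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

n_users, n_items = R.shape
k = 2                       # number of latent factors
U = rng.normal(scale=0.1, size=(n_users, k))   # user-feature matrix
V = rng.normal(scale=0.1, size=(n_items, k))   # item-feature matrix
lr, reg = 0.01, 0.02        # learning rate and L2 regularization weight

# SGD over observed entries: minimize (r_ui - u_u . v_i)^2 + reg * (|u_u|^2 + |v_i|^2).
for epoch in range(2000):
    for u, i in zip(*R.nonzero()):
        u_row = U[u].copy()                    # snapshot so both updates use the same state
        err = R[u, i] - u_row @ V[i]
        U[u] += lr * (err * V[i] - reg * u_row)
        V[i] += lr * (err * u_row - reg * V[i])

print(np.round(U @ V.T, 2))  # reconstruction; unobserved cells now carry predictions
```

The learned product `U @ V.T` approximates the observed ratings and fills in the missing cells, which is exactly the prediction step described above.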

Finally, we study the performance of the described SVD and UV-Decomposition algorithms against an improved version of the original item-based CF algorithm combined with SVD.

3. TENSOR DECOMPOSITION

Because of the ternary relational nature of the data in many cases (e.g., Social Tagging Systems (STSs), Location-Based Social Networks (LBSNs), etc.), many recommendation algorithms originally designed to operate on matrices cannot be applied. Higher-order problems put forward new challenges and opportunities for recommender systems. For example, the ternary relation of STSs can be represented as a third-order tensor A = (a_{u,i,t}) ∈ R^{|U|×|I|×|T|}. Symeonidis et al. [3], for example, proposed to interpret the user assignment of a tag to an item as a binary tensor, where 1 indicates observed tag assignments and 0 missing values (see Figure 2):

a_{u,i,t} := 1, if user u assigned tag t to item i; 0, otherwise.

Tensor factorization techniques can be employed to exploit the underlying latent semantic structure in the tensor A. The basic idea is to transform the recommendation problem into a third-order tensor completion problem, trying to predict the non-observed entries of A.
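Constructing the binary tensor from observed (user, item, tag) triples is straightforward; the triples and dimensions below are hypothetical toy values, not data from the paper.

```python
import numpy as np

# Hypothetical tag assignments as (user, item, tag) index triples.
assignments = [(0, 0, 1), (0, 1, 0), (1, 0, 1), (2, 1, 2)]
n_users, n_items, n_tags = 3, 2, 3

# a[u, i, t] = 1 for an observed assignment; all other entries stay 0.
A = np.zeros((n_users, n_items, n_tags))
for u, i, t in assignments:
    A[u, i, t] = 1.0

print(A.sum())  # 4.0 observed entries
```

Tensor completion then amounts to replacing the structural zeros with predicted scores, from which per-user tag or item rankings can be read off.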

In this tutorial, we cover the related work on tensor decomposition methods. The first method discussed is the Tucker Decomposition (TD) method, which is the underlying tensor factorization model of HOSVD. TD decomposes a tensor into a set of matrices and one small core tensor.

Figure 2: Tensor representation of an STS (axes: users × items × tags), where positive feedback is interpreted as 1 (i.e., a_{u,i,t} := 1) and the rest as 0 (i.e., a_{u,i,t} := 0).

The second method is PARAFAC (PARAllel FACtor analysis), which is the same as the TD method with the restriction that the core tensor must be diagonal. The third method is PITF (Pairwise Interaction Tensor Factorization), a special case of the TD method with linear runtime for both learning and prediction. The last method analyzed is the Low-Order Tensor Decomposition (LOTD) method.
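The PARAFAC restriction (a diagonal core) means every reconstructed entry is a sum over one shared rank index. A minimal sketch of the reconstruction step, with random factor matrices and toy dimensions chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_u, n_i, n_t, r = 4, 3, 2, 2   # toy tensor dimensions and CP rank

# One factor matrix per mode; PARAFAC ties them through a shared rank index f.
U = rng.normal(size=(n_u, r))   # user factors
V = rng.normal(size=(n_i, r))   # item factors
W = rng.normal(size=(n_t, r))   # tag factors

# a_hat[u, i, t] = sum_f U[u, f] * V[i, f] * W[t, f]
# i.e. a Tucker model whose core tensor is diagonal (ones on the superdiagonal).
A_hat = np.einsum('uf,if,tf->uit', U, V, W)
print(A_hat.shape)  # (4, 3, 2)
```

By contrast, a full Tucker core would let every combination of factor columns interact, which is what the diagonal restriction removes.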

The main factorization method presented in this tutorial is Higher Order SVD (HOSVD), an extended version of the SVD method. In particular, we present a step-by-step implementation of HOSVD on a toy example. Then, we show how HOSVD can be updated when a new user registers in our recommender system. We also discuss how HOSVD can be combined with other methods to improve the quality of recommendations. Finally, we provide experimental results of tensor decomposition methods on real STS datasets, and we discuss the evaluation metrics used (i.e., Precision, Recall, RMSE, etc.). Our goal is to present the main factors that influence the effectiveness of the algorithms.
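The core mechanics of HOSVD can be sketched as: unfold the tensor along each mode, take the left singular vectors of each unfolding as that mode's factor matrix, and contract the tensor with their transposes to obtain the core. This is a generic sketch on a random toy tensor, not the tutorial's own step-by-step example.

```python
import numpy as np

def unfold(A, mode):
    """Matricize tensor A along the given mode (mode-n unfolding)."""
    return np.moveaxis(A, mode, 0).reshape(A.shape[mode], -1)

def hosvd(A, ranks):
    """Truncated HOSVD: per-mode factor matrices plus a core tensor."""
    factors = []
    for mode, r in enumerate(ranks):
        Um, _, _ = np.linalg.svd(unfold(A, mode), full_matrices=False)
        factors.append(Um[:, :r])
    # Core = A multiplied by each factor's transpose along the matching mode.
    core = A
    for mode, Um in enumerate(factors):
        core = np.moveaxis(np.tensordot(Um.T, core, axes=(1, mode)), 0, mode)
    return core, factors

rng = np.random.default_rng(0)
A = rng.random((4, 3, 2))
core, factors = hosvd(A, (4, 3, 2))   # full ranks -> exact decomposition

# Reconstruct by multiplying the core back with each factor matrix.
A_hat = core
for mode, Um in enumerate(factors):
    A_hat = np.moveaxis(np.tensordot(Um, A_hat, axes=(1, mode)), 0, mode)
print(np.allclose(A, A_hat))  # True
```

Choosing smaller ranks than the mode dimensions yields the truncated approximation used for recommendation, where the reconstructed entries serve as predicted scores.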

4. CONCLUSIONS

In this tutorial, we discuss the advantages and limitations of each matrix and tensor decomposition algorithm for recommender systems, and we provide future research directions.

References

[1] Yehuda Koren. Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '08, pages 426-434, New York, NY, USA, 2008. ACM.

[2] Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30-37, August 2009.

[3] Panagiotis Symeonidis, Alexandros Nanopoulos, and Yannis Manolopoulos. Tag recommendations based on tensor dimensionality reduction. In Proceedings of the 2008 ACM Conference on Recommender Systems, RecSys '08, pages 43-50. ACM, 2008.

[4] Gabor Takacs, Istvan Pilaszy, Bottyan Nemeth, and Domonkos Tikk. Matrix factorization and neighbor based algorithms for the Netflix Prize problem. In Proceedings of the 2008 ACM Conference on Recommender Systems, RecSys '08, pages 267-274, New York, NY, USA, 2008. ACM.
