10
Leveraging Long and Short-term Information in Content-aware Movie Recommendation Wei Zhao SIAT, Chinese Academy of Sciences [email protected] Haixia Chai, Benyou Wang Tencent {haixiachai,wabywang}@tencent.com Jianbo Ye Pennsylvania State University [email protected] Min Yang * SIAT, Chinese Academy of Sciences [email protected] Zhou Zhao Zhejiang University [email protected] Xiaojun Chen Shenzhen University [email protected] ABSTRACT Movie recommendation systems provide users with ranked lists of movies based on individual’s preferences and constraints. Two types of models are commonly used to generate ranking results: long-term models and session-based models. While long-term mod- els represent the interactions between users and movies that are supposed to change slowly across time, session-based models en- code the information of users’ interests and changing dynamics of movies’ attributes in short terms. In this paper, we propose an LSIC model, leveraging Long and Short-term Information in Content-aware movie recommendation using adversarial training. In the adversarial process, we train a generator as an agent of re- inforcement learning which recommends the next movie to a user sequentially. We also train a discriminator which attempts to dis- tinguish the generated list of movies from the real records. The poster information of movies is integrated to further improve the performance of movie recommendation, which is specically essen- tial when few ratings are available. The experiments demonstrate that the proposed model has robust superiority over competitors and sets the state-of-the-art. We will release the source code of this work after publication. CCS CONCEPTS Recommender system → Movie Recommendation; KEYWORDS Movie recommendation, Adversarial learning, Content-aware rec- ommendation 1 INTRODUCTION With the sheer volume of online information, much attention has been given to data-driven recommender systems. Those systems au- tomatically guide users to discover products or services respecting their personal interests from a large pool of possible options. Numer- ous recommendation techniques have been developed. Three main * Min Yang is the corresponding author. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for prot or commercial advantage and that copies bear this notice and the full citation on the rst page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). WOODSTOCK’97, USA © 2018 Copyright held by the owner/author(s). 123-4567-24-567/08/06. . . $15.00 DOI: 10.475/123_4 categories of them are: collaborative ltering methods, content- based methods and hybrid methods [3, 19]. In this paper, we aim to develop a method producing a ranked list of n movies to a user at a given moment (top- n movie recommendation) by exploiting both historical user-movie interactions and the content information of movies. Matrix factorization (MF) [16] is one of the most successful tech- niques in the practice of recommendation due to its simplicity, at- tractive accuracy and scalability. It has been used in a broad range of applications such as recommending movies, books, web pages, relevant research and services. The matrix factorization technique is usually eective because it discovers the latent features under- pinning the multiplicative interactions between users and movies. Specically, it models the user preference matrix approximately as a product of two lower-rank latent feature matrices representing user proles and movie proles respectively. Despite the appeal of matrix factorization, this technique does not explicitly consider the temporal variability of data [32]. Firstly, the popularity of an movie may change over time. For example, movie popularity booms or fades, which can be triggered by ex- ternal events such as the appearance of an actor in a new movie. Secondly, users may change their interests and baseline ratings over time. For instance, a user who tended to rate an average movie as “4 stars”, may now rate such a movie as “3 stars”. Recently, recur- rent neural network (RNN) [13] has gained signicant attention by considering such temporal dynamics for both users and movies and achieved high recommendation quality [31, 32]. The basic idea of these RNN-based methods is to formulate the recommendation as a sequence prediction problem. They take the latest observations as input, update the internal states and make predictions based on the newly updated states. As shown in [8], such prediction based on short-term dependencies is likely to improve the recommendation diversity. More recent work [32] reveals that matrix factorization based and RNN-based recommendation approaches have good performances for the reasons that are complementary to each other. Specically, the matrix factorization recommendation approaches make movie predictions based on users’ long-term interests which change very slowly with respect to time. On the contrary, the RNN recommen- dation approaches predict which movie will the user consume next, respecting the dynamics of users’ behaviors and movies’ attributes in the short term. It therefore motivates us to devise a joint ap- proach that takes advantage of both matrix factorization and RNN, arXiv:1712.09059v5 [cs.IR] 26 Jun 2018

Leveraging Long and Short-term Information in Content ... · To verify the e˛ectiveness of our model, we conduct ex-tensive experiments on two widely used real-life datasets: Net˚ix

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Leveraging Long and Short-term Information in Content ... · To verify the e˛ectiveness of our model, we conduct ex-tensive experiments on two widely used real-life datasets: Net˚ix

Leveraging Long and Short-term Information inContent-aware Movie Recommendation

Wei ZhaoSIAT, Chinese Academy of Sciences

[email protected]

Haixia Chai, Benyou WangTencent

{haixiachai,wabywang}@tencent.com

Jianbo YePennsylvania State University

[email protected]

Min Yang∗

SIAT, Chinese Academy of [email protected]

Zhou ZhaoZhejiang University

[email protected]

Xiaojun ChenShenzhen [email protected]

ABSTRACTMovie recommendation systems provide users with ranked listsof movies based on individual’s preferences and constraints. Twotypes of models are commonly used to generate ranking results:long-term models and session-based models. While long-term mod-els represent the interactions between users and movies that aresupposed to change slowly across time, session-based models en-code the information of users’ interests and changing dynamicsof movies’ attributes in short terms. In this paper, we proposean LSIC model, leveraging Long and Short-term Information inContent-aware movie recommendation using adversarial training.In the adversarial process, we train a generator as an agent of re-inforcement learning which recommends the next movie to a usersequentially. We also train a discriminator which attempts to dis-tinguish the generated list of movies from the real records. Theposter information of movies is integrated to further improve theperformance of movie recommendation, which is speci�cally essen-tial when few ratings are available. The experiments demonstratethat the proposed model has robust superiority over competitorsand sets the state-of-the-art. We will release the source code of thiswork after publication.

CCS CONCEPTS• Recommender system → Movie Recommendation;

KEYWORDSMovie recommendation, Adversarial learning, Content-aware rec-ommendation

1 INTRODUCTIONWith the sheer volume of online information, much attention hasbeen given to data-driven recommender systems. Those systems au-tomatically guide users to discover products or services respectingtheir personal interests from a large pool of possible options. Numer-ous recommendation techniques have been developed. Three main∗Min Yang is the corresponding author.

Permission to make digital or hard copies of part or all of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor pro�t or commercial advantage and that copies bear this notice and the full citationon the �rst page. Copyrights for third-party components of this work must be honored.For all other uses, contact the owner/author(s).WOODSTOCK’97, USA© 2018 Copyright held by the owner/author(s). 123-4567-24-567/08/06. . . $15.00DOI: 10.475/123_4

categories of them are: collaborative �ltering methods, content-based methods and hybrid methods [3, 19]. In this paper, we aim todevelop a method producing a ranked list of n movies to a user at agiven moment (top-n movie recommendation) by exploiting bothhistorical user-movie interactions and the content information ofmovies.

Matrix factorization (MF) [16] is one of the most successful tech-niques in the practice of recommendation due to its simplicity, at-tractive accuracy and scalability. It has been used in a broad rangeof applications such as recommending movies, books, web pages,relevant research and services. The matrix factorization techniqueis usually e�ective because it discovers the latent features under-pinning the multiplicative interactions between users and movies.Speci�cally, it models the user preference matrix approximately asa product of two lower-rank latent feature matrices representinguser pro�les and movie pro�les respectively.

Despite the appeal of matrix factorization, this technique doesnot explicitly consider the temporal variability of data [32]. Firstly,the popularity of an movie may change over time. For example,movie popularity booms or fades, which can be triggered by ex-ternal events such as the appearance of an actor in a new movie.Secondly, users may change their interests and baseline ratingsover time. For instance, a user who tended to rate an average movieas “4 stars”, may now rate such a movie as “3 stars”. Recently, recur-rent neural network (RNN) [13] has gained signi�cant attention byconsidering such temporal dynamics for both users and movies andachieved high recommendation quality [31, 32]. The basic idea ofthese RNN-based methods is to formulate the recommendation as asequence prediction problem. They take the latest observations asinput, update the internal states and make predictions based on thenewly updated states. As shown in [8], such prediction based onshort-term dependencies is likely to improve the recommendationdiversity.

More recent work [32] reveals that matrix factorization based andRNN-based recommendation approaches have good performancesfor the reasons that are complementary to each other. Speci�cally,the matrix factorization recommendation approaches make moviepredictions based on users’ long-term interests which change veryslowly with respect to time. On the contrary, the RNN recommen-dation approaches predict which movie will the user consume next,respecting the dynamics of users’ behaviors and movies’ attributesin the short term. It therefore motivates us to devise a joint ap-proach that takes advantage of both matrix factorization and RNN,

arX

iv:1

712.

0905

9v5

[cs

.IR

] 2

6 Ju

n 20

18

Page 2: Leveraging Long and Short-term Information in Content ... · To verify the e˛ectiveness of our model, we conduct ex-tensive experiments on two widely used real-life datasets: Net˚ix

WOODSTOCK’97, July 1997, USA Zhao et al.

exploiting both long-term and short-term associations among usersand movies.

Furthermore, most existing recommender systems take into ac-count only the users’ past behaviors when making recommendation.compared with tens of thousands of movies in the corpus, the his-torical rating set is too sparse to learn a well-performed model. Itis desirable to exploit the content information of movies for rec-ommendation. For example, movie posters reveal a great amountof information to understand movies and users, as demonstratedin [35]. Such a poster is usually the �rst contact that a user has witha movie, and plays an essential role in the user’s decision to watchit or not. When a user is watching the movie presented in cold, blueand mysterious visual e�ects, he/she may be interested in receivingrecommendations for movies with similar styles, rather than othersthat are with the same actors or subject [35]. These visual featuresof movies are usually captured by the corresponding posters.

In this paper, we propose a novel LSIC model, which leveragesLong and Short-term Information in Content-aware movie recom-mendation using adversarial training. The LSTC model employs anadversarial framework to combine the MF and RNN based modelsfor the top-n movie recommendation, taking the best of each toimprove the �nal recommendation performance. In the adversarialprocess, we simultaneously train two models: a generative modelGand a discriminative model D. In particular, the generator G takesthe user ui and time t as input, and predicts the recommendationlist for user i at time t based on the historical user-movie interac-tions. We implement the discriminator D via a siamese networkthat incorporates long-term and session-based ranking model in apair-wise scenario. The two point-wise networks of siamese net-work share the same set of parameters. The generator G and thediscriminator D are optimized with a minimax two-player game.The discriminator D tries to distinguish the real high-rated moviesin the training data from the recommendation list generated bythe generator G, while the training procedure of generator G is tomaximize the probability of D making a mistake. Thus, this adver-sarial process can eventually adjust G to generate plausible andhigh-quality recommendation list. In addition, we integrate posterinformation of movies to further improve the performance of movierecommendation, which is speci�cally essential when few ratingsare available.

We summarize our main contributions as follows:

• To the best of our knowledge, we are the �rst to use GANframework to leverage the MF and RNN approaches fortop-n recommendation. This joint model adaptively adjustshow the contributions of the long-term and short-terminformation of users and movies are mixed together.

• We propose hard and soft mixture mechanisms to integrateMF and RNN. We use the hard mechanism to calculate themixing score straightforwardly and explore several softmechanisms to learn the temporal dynamics with the helpof the long-term pro�les.

• Our model uses reinforcement learning to optimize thegenerator G for generating highly rewarded recommenda-tion list. Thus, it e�ectively bypasses the non-di�erentiabletask metric issue by directly performing policy gradientupdate.

• We automatically crawl the posters of the given movies,and explore the potential of integrating poster informa-tion to improve the accuracy of movie recommendation.The release of the collected posters would push forwardthe research of integrating content information in movierecommender systems.

• To verify the e�ectiveness of our model, we conduct ex-tensive experiments on two widely used real-life datasets:Net�ix Prize Contest data and Movielens data. The exper-imental results demonstrate that our model consistentlyoutperforms the state-of-the-art methods.

The rest of the paper is organized as follows. In Section 2, we re-view the related work on recommender systems. Section 3 presentsthe proposed adversarial learning framework for movie recommen-dation in details. In Section 4, we describe the experimental data,implementation details, evaluation metrics and baseline methods.The experimental results and analysis are provided in Section 5.Section 6 concludes this paper.

2 RELATEDWORKRecommender system is an active research �eld. The authors of [3,19] describe most of the existing techniques for recommender sys-tems. In this section, we brie�y review the following major ap-proaches for recommender systems that are mostly related to ourwork.

Matrix factorization for recommendation. Modeling the long-terminterests of users, the matrix factorization method and its variantshave grown to become dominant in the literature [10, 11, 14, 16, 26].In the standard matrix factorization, the recommendation task canbe formulated as inferring missing values of a partially observeduser-item matrix [16]. The Matrix Factorization techniques are ef-fective because they are designed to discover the latent featuresunderlying the interactions between users and items. Srebro et al.[27] suggested the Maximum Margin Matrix Factorization (MMMF),which used low-norm instead of low-rank factorizations. Mnih andSalakhutdinov [22] presented the Probabilistic Matrix Factorization(PMF) model that characterized the user preference matrix as aproduct of two lower-rank user and item matrices. The PMF modelwas especially e�ective at making better predictions for users withfew ratings. He et al. [10] proposed a new MF method which con-siders the implicit feedback for on-line recommendation. In [10],the weights of the missing data were assigned based on the popu-larity of items. To exploit the content of items and solve the sparseissue in recommender systems, [35] presented the model for movierecommendation using additional visual features (e.g.. posters andstill frames) to better understand movies, and further improved theperformance of movie recommendation.

Recurrent neural network for recommendation. These traditionalMF methods for recommendation systems are based on the assump-tion that the user interests and movie attributes are near static,which is however not consistent with reality. Koren [15] discussedthe e�ect of temporal dynamics in recommender systems and pro-posed a temporal extension of the SVD++ (called TimeSVD++) toexplicitly model the temporal bias in data. However, the features

Page 3: Leveraging Long and Short-term Information in Content ... · To verify the e˛ectiveness of our model, we conduct ex-tensive experiments on two widely used real-life datasets: Net˚ix

Leveraging Long and Short-term Information in Content-aware Movie Recommendation WOODSTOCK’97, July 1997, USA

used in TimeSVD++ were hand-crafted and computationally ex-pensive to obtain. Recently, there have been increasing interests inemploying recurrent neural network to model temporal dynamicin recommendation systems. For example, Hidasi et al. [12] appliedrecurrent neural network (i.e. GRU) to session-based recommendersystems. This work treats the �rst item a user clicked as the initialinput of GRU. Each follow-up click of the user would then triggera recommendation depending on all of the previous clicks. Wuet al. [30] proposed a recurrent neural network to perform thetime heterogeneous feedback recommendation. Wu et al. [32] usedLSTM autoregressive model for the user and movie dynamics andemployed matrix factorization to model the stationary componentsthat encode �xed properties. Di�erent from their work, we useGAN framework to leverage the MF and RNN approaches for top-nrecommendation, aiming to generate plausible and high-qualityrecommendation lists. To address the cold start problem in recom-mendation, [7] presented a visual and textural recurrent neuralnetwork (VT-RNN), which simultaneously learned the sequentiallatent vectors of user’s interest and captured the content-basedrepresentations that contributed to address the cold start.

Generative adversarial network for recommendation. In parallel,previous work has demonstrated the e�ectiveness of generativeadversarial network (GAN) [9] in various tasks such as image gen-eration [2, 24], image captioning [4], and sequence generation[33].The most related work to ours is [29], which proposed a novel IR-GAN mechanism to iteratively optimize a generative retrieval com-ponent and a discriminative retrieval component. IRGAN reportedimpressive results on the tasks of web search, item recommenda-tion, and question answering. Our approach di�ers from theirs inseveral aspects. First, we combine the MF approach and the RNNapproach with GAN, exploiting the performance contributions ofboth approaches. Second, IRGAN does not attempt to estimate thefuture behavior since the experimental data is split randomly intheir setting. In fact, they use future trajectories to infer the his-torical records, which seems not useful in real-life applications.Third, we incorporate poster information of movies to deal withthe cold-start issue and boost the recommendation performance.

3 OUR MODELSuppose there is a sparse user-movie rating matrix R that consistsofU users and M movies. Each entry ri j,t denotes the rating of useri on movie j at time step t . The rating is represented by numericalvalues from 1 to 5, where the higher value indicates the strongerpreference. Instead of predicting the rating of a speci�c user-moviepair as is done in [1, 20], the proposed LSIC model aims to provideusers with ranked lists of movies (top-n recommendation) [18].

In this section, we elaborate each component of LSIC modelfor content-aware movie recommendation. The main notations ofthis work are summarized in Table 1 for clarity. The LSTC modelemploys an adversarial framework to combine the MF and RNNbased models for the top-n movie recommendation. The overviewof our proposed architecture and its data-�ow are illustrated inFigure 1. In the adversarial process, we simultaneously train twomodels: a generative model G and a discriminative model D.

Table 1: Notation list. We use superscript u to annotate pa-rameters related to a user, and superscriptm to annotate pa-rameters related to a movie.

R the user-movie rating matrixU , M the number of users and moviesri j rating score of user i on movie jri j,t rating score of user i on movie j at time teui MF user factors for user iemj MF movie factors for movie jbui bias of user i in MF and RNN hybrid calculationbmj bias of movie j in MF and RNN hybrid calculationhui,t LSTM hidden-vector at time t for user ihmj,t LSTM hidden-vector at time t for movie jzui,t the rating vector of user i at time t (LSTM input)zmj,t the rating vector of movie j at time t (LSTM input)α it attention weight of user i at time tβ jt attention weight of movie j at time tm+ index of a positive (high-rating) movie drawn from the

entire positive movie setM+m− index of a negative (low-rating) movie randomly chosen from the

entire negative movie setM−mд,t index of an item chosen by generator G at time t

3.1 Matrix Factorization (MF)The MF framework [22] models the long-term states (global in-formation) for both users (eu ) and movies (em ). In its standardsetting, the recommendation task can be formulated as inferringmissing values of a partially observed user-movie rating matrix R.The formulation of MF is given by:

argmineu ,em

∑i, j

Ii j(ri j − ρ

((eui )

T emj))2+ λu ‖eu ‖2F + λ

m ‖em ‖2F (1)

where eui and emj represent the user and movie latent factors inthe shared d-dimension space respectively. ri j denotes the user i’srating on movie j . Ii j is an indicator function and equals 1 if ri j > 0,and 0 otherwise. λu and λm are regularization coe�cients. The ρ(·)is a logistic scoring function that bounds the range of outputs.

In most recommender systems, matrix factorization techniques[16] recommend movies based on estimated ratings. Even thoughthe predicted ratings can be used to rank the movies, it is knownthat it does not provide the best prediction for the top-n recommen-dation. Because minimizing the objective function – the squarederrors – does not perfectly align with the goal to optimize the rank-ing order. In this paper, we apply MF for ranking prediction (top-nrecommendation) directly, similar to [29].

3.2 Recurrent Neural Network (RNN)The RNN based recommender system focuses on modeling session-based trajectories instead of global (long-term) information [32].It predicts future behaviors and provides users with a ranking listgiven the users’ past history. The main purpose of using RNN is tocapture time-varying state for both users and movies. Particularly,we use LSTM cell as the basic RNN unit. Each LSTM unit at time tconsists of a memory cell ct , an input gate it , a forget gate ft , andan output gate ot . These gates are computed from previous hidden

Page 4: Leveraging Long and Short-term Information in Content ... · To verify the e˛ectiveness of our model, we conduct ex-tensive experiments on two widely used real-life datasets: Net˚ix

WOODSTOCK’97, July 1997, USA Zhao et al.

Discriminative Model

PolicyGradient

Generative Model

Recurrent Temporal User-State

Candidate Pool

Recurrent Temporal Movie-State

Score

Userm- m+

CNN

RNN & MF Hybrid

CNNjm

uie

...,0uih

mje

...

,1mjz

iu

jmRNN & MF Hybrid

CNN

Score

Siamese Pairwise Ranking Model

,mj tz

,0uiz ,1

uiz ,

ui tz

,1uih ,

ui th

,1mjh ,

mj th

,0mjz

,0mjh

Figure 1: The Architecture of Long-term and Session-based Ranking Model with Adversarial Network.

state ht−1 and the current input xt :

[ft , it ,ot ] = sigmoid(W [ht−1, xt ]) (2)

The memory cell ct is updated by partially forgetting the existingmemory and adding a new memory content lt :

lt = tanh(V [ht−1, xt ]) (3)ct = ft � ct−1 + it � lt (4)

Once the memory content of the LSTM unit is updated, thehidden state at time step t is given by:

ht = ot � tanh(ct ) (5)

For simplicity of notation, the update of the hidden states of LSTMat time step t is denoted as ht = LSTM(ht−1, xt ).

Here, we use zui,t ∈ RU and zmj,t ∈ R

M to represent the ratingvector of user i and movie j given time t respectively. Both zui,t andzmj,t serve as the input to the LSTM layer at time t to infer the newstates of the user and the movie:

hui,t = LSTM(hui,t−1, zui,t ) (6)

hmj,t = LSTM(hmj,t−1, zmj,t ) (7)

Here, hui,t and hmj,t denote hidden states for user i and movie j attime step t respectively.

In this work, we explore the potential of integrating postersof movies to boost the performance of movie recommendation.Inspired by the recent advances of CNNs in computer vision, theposter is mapped to the same space of the movie by using a CNN.More concretely, we encode each image into a FC-2k feature vectorwith Resnet-101 (101 layers), resulting in a 2048-dimensional vectorrepresentation. The poster Pj of movie j is only inputted once, att = 0, to inform the movie LSTM about the poster content:

zmj,0 = CNN (Pj ). (8)

3.3 RNN and MF HybridThe session-based model deals with temporal dynamics of the userand movie states, we further incorporate the long-term preference

of users and the �xed properties of movies. To exploit their ad-vantages together, similar to [32], we de�ne the rating predictionfunction as:

ri j,t = д(eui , emj , h

ui,t , h

mj,t ) (9)

where д(.) is a score function, eui and emj denote the global latentfactors of user i and movie j learned by Eq. (1); hui,t and hmj,t denotethe hidden states at time step t of two RNNs learned by Eq. (6)and Eq. (7) respectively. In this work, we study four strategies tocalculate the score function д, integrating MF and RNN. The detailsare described below.

LSIC-V1. This is a hard mechanism, using a simple way to cal-culate the mixing score from MF and RNN with the followingformulation:

ri j,t = д(eui , emj , h

ui,t , h

mj,t ) =

11 + exp(−s) (10)

s = eui · emj + h

ui,t · h

mj,t + b

ui + b

mj (11)

where bui and bmj are the biases of user i and movie j; hui,t and hmj,tare computed by Eq. (6) and Eq. (7).

In fact, LSIC-V1 does not exploit the global factors in learningthe temporal dynamics. In this paper, we also design a soft mixturemechanism and provide three strategies to account for the globalfactors eui and emj in learning hui,t and hmj,t , as described below (i.e.,LSIC-V2, LSIC-V3 and LSIC-V4).

LSIC-V2. We use the latent factors of user i (eui ) and movie j (emj )pre-trained by MF model to initialize the hidden states of the LSTMcells hui,0 and hmj,0 respectively, as depicted in Figure 2(b).

LSIC-V3. As shown in Figure 2(c), we extend LSIC-V2 by treatingeui (for user i) and emj (for movie j) as the static context vectors, andfeed them as an extra input into the computation of the temporalhidden states of users and movies by LSTM. At each time step, thecontext information assists the inference of the hidden states ofLSTM model.

Page 5: Leveraging Long and Short-term Information in Content ... · To verify the e˛ectiveness of our model, we conduct ex-tensive experiments on two widely used real-life datasets: Net˚ix

Leveraging Long and Short-term Information in Content-aware Movie Recommendation WOODSTOCK’97, July 1997, USA

Global User Factors

Global Movie Factors

,0uih ,1

uih ,

ui th

,0mjh ,1

mjh ,

mj th

,ij tr

uie

mje

(a) LSIC-V1: Hard mechanism

Global User Factors

Global Movie Factors

uie

mje

,ij tr,0uih ,1

uih ,

ui th

,0mjh ,1

mjh ,

mj th

(b) LSIC-V2: Prior initialization

Global User Factors

Global Movie Factors

uie

mje

,0uih ,1

uih ,

ui th

,0mjh ,1

mjh ,

mj th

,ij tr

(c) LSIC-V3: Static context

Movie Attention

Global User Factors

Global Movie Factors

User Attention

uie

mje

,ij tr

uie

mje

,0uih ,1

uih ,

ui th

,0mjh ,1

mjh ,

mj th

,0uic ,1

uic ,

ui tc

,mj tc,1

mjc,0

mjc

(d) LSIC-V4: Attention model

Figure 2: Four strategies to calculate the score function д, integrating MF and RNN.

LSIC-V4. This method use an attention mechanism to compute aweight for each hidden state by exploiting the global factors. Themixing scores at time t can be reformulated by:

ri j,t = д(eui , emj , h

ui,t−1, h

mj,t−1, c

ui,t , c

mj,t ) =

11 + exp(−s) (12)

s = eui · emj + h

ui,t · h

mj,t + bi + bj (13)

where cui,t and cmj,t are the context vectors at time step t for user iand movie j; hui,t and hmj,t are the hidden states of LSTMs at timestep t , computed by

hui,t = LSTM(hui,t−1, zui,t , c

ui,t ) (14)

hmj,t = LSTM(hmj,t−1, zmj,t , c

mj,t ) (15)

The context vectors cui,t and cmj,t act as extra input in the com-putation of the hidden states in LSTMs to make sure that everytime step of the LSTMs can get full information of the context(long-term information). The context vectors cui,t and cmj,t are thedynamic representations of the relevant long-term information foruser i and movie j at time t , calculated by

cui,t =U∑k=1

α ik,t euk ; cmj,t =

M∑p=1

βjp,t e

mp (16)

where U and M are the number of users and movies. The attentionweights α ik,t and β

jp,t for user i and movie j at time step t are

computed by

α ik,t =exp(σ (hui,t−1, e

uk ))∑U

k ′=1 exp(σ (hui,t−1, e

uk ′))

(17)

βjp,t =

exp(σ (hmj,t−1, emp ))∑M

p′=1 exp(σ (hmj,t−1, e

mp′ ))

(18)

where σ is a feed-forward neural network to produce a real-valuedscore. The attention weights α it and β jt together determine whichuser and movie factors should be selected to generate ri j,t .

3.4 Generative Adversarial Network (GAN) forRecommendation

Generative adversarial network (GAN)[9] consist of a generatorG and a discriminator D that compete in a minimax game withtwo players: The discriminator tries to distinguish real high-ratedmovies on training data from ranking or recommendation list pre-dicted by G, and the generator tries to fool the discriminator togenerate(predict) well-ranked recommendation list. Concretely, Dand G play the following game on V(D,G):

minG

maxD

v(D,G) =Ex∼Ptrue (x )[logD(x)]+

Ez∼Pz (z)[log(1 − D(G(z)))](19)

Here, x is the input data from training set, z is the noise variablesampled from normal distribution.

We propose an adversarial framework to iteratively optimizetwo models: the generative model G predicting recommendationlist given historical user-movie interactions, and the discriminativemodel D predicting the relevance of the generated list. Like thestandard generative adversarial networks (GANs) [9], our modelalso optimizes the two models with a minimax two-player game. Dtries to distinguish the real high-rated movies in the training datafrom the recommendation list generated by G, while G maximizesthe probability of D making a mistake. Hopefully, this adversarialprocess can eventually adjust G to generate plausible and high-quality recommendation list. We further elaborate the generatorand discriminator below.

3.4.1 Discriminate Model. As depicted in Figure 1 (right side),we implement the discriminator D via a Siamese Network thatincorporates long and session-based ranking models in a pair-wisescenario. The discriminator D has two symmetrical point-wisenetworks that share parameters and are updated by minimizing apair-wise loss.

The objective of discriminator D is to maximize the probabilityof correctly distinguishing the ground truth movies from generatedrecommendation movies. For G �xed, we can obtain the optimalparameters for the discriminator D with the following formulation.

Page 6: Leveraging Long and Short-term Information in Content ... · To verify the e˛ectiveness of our model, we conduct ex-tensive experiments on two widely used real-life datasets: Net˚ix

WOODSTOCK’97, July 1997, USA Zhao et al.

θ∗ = argmaxθ

∑i ∈U

(Em+,m−∼ptrue [logDθ (ui ,m−,m+ |t)]+

Em+∼ptrue ,mд,t∼Gϕ (mд,t |ui ,t )[log(1 − Dθ (ui ,mд,t ,m+ |t)

] )(20)

whereU denotes the user set, ui denotes user i , m+ is a positive(high-rating) movie,m− is a negative movie randomly chosen fromthe entire negative (low-rating) movie space, θ andϕ are parametersofD andG , andmд,t is the generated movie byG given time t . Here,we adopt hinge loss as our training objective since it performsbetter than other training objectives. Hinge loss is widely adoptedin various learning to rank scenario, which aims to penalize theexamples that violate the margin constraint:

D(ui ,m−,m+ |t) = max{0, ϵ − д(eui , e

mm+ , h

ui,t , h

mm+,t )

+д(eui , emm− , h

ui,t , h

mm−,t )

} (21)

where ϵ is the hyper-parameter determining the margin of hingeloss, and we compress the outputs to the range of (0, 1).

3.4.2 Generative Model. Similar to conditional GANs pro-posed in [21], our generator G takes in the auxiliary information(user ui and time t ) as input, and generates the ranking list for useri . Speci�cally, when D is optimized and �xed after computing Eq.20, the generator G can be optimized by minimizing the followingformulation:

ϕ∗ = argminϕ

∑m∈M

(Emд,t∼Gϕ (mд,t |ui ,t )[log(1 − D(ui ,mд,t ,m+ |t)]

)(22)

Here,M denotes the movie set. As in [9], instead of minimizinglog(1−D(ui ,mд,t ,m+ |t)), we trainG to maximize log(D(ui ,mд,t ,m+ |t)).

3.4.3 Policy Gradient. Since the sampling of recommendationlist by generator G is discrete, it cannot be directly optimized bygradient descent as in the standard GAN formulation. Therefore,we use policy gradient based reinforcement learning algorithm [28]to optimize the generator G so as to generate highly rewardedrecommendation list. Concretely, we have the following derivations:

∇ϕ JG (ui ) = ∇ϕEmд,t∼Gϕ (mд,t |ui ,t )[logD(ui ,mд,t ,m+ |t)]

=∑

m∈M∇ϕGϕ (m |ui , t) logD(ui ,m,m+ |t)

=∑

m∈MGϕ (m |ui , t)∇ϕ logGϕ (m |ui , t) logD(ui ,m,m+ |t)

= Emд,t∼Gϕ (m |ui ,t )[∇ϕ logGϕ (mд,t |ui , t) logD(ui ,mд,t ,m+ |t)]

≈ 1K

K∑k=1∇ϕ logGϕ (mk |ui , t) logD(ui ,mk ,m+ |t)

(23)where K is number of movies sampled by the current version ofgenerator and mk is the k-th sampled item. With reinforcementlearning terminology, we treat the term logD(ui ,mk ,m+ |t) as thereward at time step t , and take an actionmk at each time step. To ac-celerate the convergence, the rewards within a batch are normalizedwith a Gaussian distribution to make the signi�cant di�erences.

The overall procedure is summarized in Algorithm 1. Duringthe training stage, the discriminator and the generator are trained

Algorithm 1: Long and Session-based Ranking Model withAdversarial Network1 Input: generator Gϕ , discriminator Dθ , training data S .2 Initialize models Gϕ and Dθ with random weights, and

pre-train them on training data S .3 repeat4 for g-steps do5 Generate recommendation list for user i at time t using the

generator Gϕ .6 Sample K candidates from recommendation list.7 for k ∈ {1, ...,K} do8 Sample a positive moviem+ from S.9 Compute the reward logD(ui ,mk ,m+ |t) with Eq.(21)

10 Update generator Gϕ via policy gradient Eq.(23).

11 for d-steps do12 Use current Gϕ to generate a negative movie and

combined with a positive movie sampled from S .13 Update discriminator Dθ with Eq.(20).14 until convergence

alternatively in a adversarial manner via Eq.(20) and Eq.(23), re-spectively.

4 EXPERIMENTAL SETUP4.1 Datasets

Table 2: Characteristics of the datasets.

Dataset Movielens-100K Net�ix-3M Net�ix-Full

Users 943 326,668 480,189movies 1,6831 17,751 17,770Ratings 100,000 16,080,980 100,480,507

Train Data 09/97-03/98 9/05-11/05 12/99-11/05Test Data 03/98-04/98 12/05 12/05Train Ratings 77,714 13,675,402 98,074,901Test Ratings 21,875 2,405,578 2,405,578

Density 0.493 0.406 0.093Sparsity 0.063 0.003 0.012

In order to evaluate the e�ectiveness of our model, we conductexperiments on two widely-used real-life datasets: Movielens100Kand Net�ix (called “Net�ix-Full”). To evaluate the robustness of ourmodel, we also conduct experiments on a 3-month Net�ix (called“Net�ix-3M”) dataset, which is a small version of Net�ix-Full andhas di�erent training and testing period. For each dataset, we splitthe whole data into several training and testing intervals based ontime, as is done in [32], to simulate the actual situation of predictingfuture behaviors of users given the data that occurred strictly beforecurrent time step. Then, each testing interval is randomly dividedinto a validation set and a testing set. We removed the users andmovies that do not appear in training set from the validation and test

Page 7: Leveraging Long and Short-term Information in Content ... · To verify the e˛ectiveness of our model, we conduct ex-tensive experiments on two widely used real-life datasets: Net˚ix

Leveraging Long and Short-term Information in Content-aware Movie Recommendation WOODSTOCK’97, July 1997, USA

sets. The detailed statistics are presented in Table 21. Following [29],we treat “5-star” in Net�ix, “4-start” and “5-start” for Movielens100Kas positive feedback and all others as unknown (negative) feedback.

4.2 Implementation DetailsMatrix Factorization. We use matrix factorization with 5 and 16

factor numbers for Movielens and Net�ix respectively [29]. The pa-rameters are randomly initialized by a uniform distribution rangedin [-0.05, 0.05]. We take gradient-clipping to suppress the gradientto the range of [-0.2,0.2]. L2 regularization (with λ = 0.05) is usedto the weights and biases of user and movie factors.

Recurrent Neural Network. We use a single-layer LSTM with 10hidden neurons, 15-dimensional input embeddings, and 4-dimensionaldynamic states where each state contains 7-days users/movies be-havioral trajectories. That is, we take one month as the length of asession. The parameters are initialized with the same way as in MF.L2 regularization (with λ = 0.05) is used to the weights and biasesof the LSTM layer to avoid over-�tting.

Generative Adversarial Nets. We pre-train G and D on the train-ing data with a pair-wise scenario, and use SGD algorithm withlearning rate 1 × 10−4 to optimize its parameters. The number ofsampled movies is set to 64 (i.e., K = 64). In addition, we use matrixfactorization model to generate 100 candidate movies, and thenre-rank these movies with LSTM. In all experiments, we conductmini-batch training with batch size 128.

4.3 Evaluation MetricsTo quantitatively evaluate our method, we adopt the rank-basedevaluation metrics to measure the performance of top-n recom-mendation [6, 18] , including Precision@N, Normalised DiscountedCumulative Gain (NDCG@N), Mean Average Precision (MAP) andMean Reciprocal Ranking (MRR).

4.4 Comparison to BaselinesIn the experiments, we evaluate and compare our models withseveral state-of-the-art methods.

Bayesian Personalised Ranking (BPR). Given a positive movie,BPR uniformly samples negative movies to resolve the imbalanceissue and provides a basic baseline for top-N recommendation [25].

Pairwise Ranking Factorization Machine (PRFM). This is one ofthe state-of-the-art movie recommendation algorithms, which ap-plies Factorization Machine model to microblog ranking on basisof pairwise classi�cation [23]. We use the same settings as in [23]in our experiments.

LambdaFM. It is a strong baseline for recommendation, which di-rectly optimizes the rank biased metrics [34]. We run the LambdaFMmodel with the publicly available code2, and use default settingsfor all hyperparameters.

1“Density” shows the average number of 5-ratings for the user per day. “Sparsity”shows the �lling-rate of user-movie rating matrix as used in [32]2https://github.com/fajieyuan/LambdaFM

Recurrent Recommender Networks (RRN). This model supple-ments matrix factorization with recurrent neural network modelvia a hard mixture mechanism [32]. We use the same setting asin [32].

IRGAN. This model trains the generator and discriminator alter-natively with MF in an adversarial process [29]. We run the IRGANmodel with the publicly available code3, and use default settingsfor all hyperparameters.

5 EXPERIMENTAL RESULTSIn this section, we compare our model with baseline methods quan-titatively and qualitatively.

5.1 Quantitative EvaluationWe �rst evaluate the performance of top-n recommendation. Theexperimental results are summarized in Tables 3 ,4 and 5. Our modelsubstantially and consistently outperforms the baseline methodsby a noticeable margin on all the experimental datasets. In particu-lar, we have explored several versions of our model with di�erentmixture mechanisms. As one anticipates, LSIC-V4 achieves the bestresults across all evaluation metrics and all datasets. For example,on MovieLens dataset, LSIC-V4 improves 7.45% on percision@5 and6.87% on NDCG@5 over the baseline methods. The main strength ofour model comes from its capability of prioritizing both long-termand short-term information in content-aware movie recommenda-tion. In addition, our mixture mechanisms (hard and soft) also seemquite e�ective to integrate MF and RNN.

To better understand the adversarial training process, we visual-ize the learning curves of LSIC-V4 as shown in Figure 3. Due to thelimited space, we only report the Pecision@5 and NDCG@5 scoresas in [29]. The other metrics exhibit a similar trend. As shown inFigure 3, after about 50 epoches, both Precision@5 and NDCG@5converge and the winner player is the generator which is usedto generate recommendation list for our �nal top-n movie recom-mendation. The performance of generator G becomes better withthe e�ective feedback (reward) from discriminator D. On the otherhand, once we have a set of high-quality recommendation movies,the performance of D deteriorates gradually in the training proce-dure and makes mistakes for predictions. In our experiments, weuse the generator G with best performance to predict test data.

5.2 Ablation StudyIn order to analyze the e�ectiveness of di�erent components ofour model for top-n movie recommendation, in this section, wereport the ablation test of LSIC-V4 by discarding poster informa-tion (w/o poster) and replacing the reinforcement learning withGumbel-Softmax [17] (w/o RL), respectively. Gumbel-Softmax is analternative method to address the non-di�erentiation problem sothat G can be trained straightforwardly.

Due to the limited space, we only illustrate the experimentalresults for Net�ix-3M dataset that is widely used in movie recom-mendation (see Table 6). Generally, both factors contribute, andreinforcement learning contributes most. This is within our ex-pectation since discarding reinforcement learning will lead the

3https://github.com/geek-ai/irgan

Page 8: Leveraging Long and Short-term Information in Content ... · To verify the e˛ectiveness of our model, we conduct ex-tensive experiments on two widely used real-life datasets: Net˚ix

WOODSTOCK’97, July 1997, USA Zhao et al.

Table 3: Moive recommendation results (MovieLens).

Precision@3 Precision@5 Precision@10 NDCG@3 NDCG@5 NDCG@10 MRR MAP

BPR 0.2795 0.2664 0.2301 0.2910 0.2761 0.2550 0.4324 0.3549PRFM 0.2884 0.2699 0.2481 0.2937 0.2894 0.2676 0.4484 0.3885LambdaFM 0.3108 0.2953 0.2612 0.3302 0.3117 0.2795 0.4611 0.4014RRN 0.2893 0.2740 0.2480 0.2951 0.2814 0.2513 0.4320 0.3631IRGAN 0.3022 0.2885 0.2582 0.3285 0.3032 0.2678 0.4515 0.3744

LSIC-V1 0.2946 0.2713 0.2471 0.2905 0.2801 0.2644 0.4595 0.4066LSIC-V2 0.3004 0.2843 0.2567 0.3122 0.2951 0.2814 0.4624 0.4101LSIC-V3 0.3105 0.3023 0.2610 0.3217 0.3086 0.2912 0.4732 0.4163LSIC-V4 0.3327 0.3173 0.2847 0.3512 0.3331 0.2939 0.4832 0.4321

Impv 7.05% 7.45% 9.00% 6.36% 6.87% 5.15% 4.79% 7.65%

Table 4: Movie recommendation results (Net�ix-3M).

Precision@3 Precision@5 Precision@10 NDCG@3 NDCG@5 NDCG@10 MRR MAP

BPR 0.2670 0.2548 0.2403 0.2653 0.2576 0.2469 0.3829 0.3484PRFM 0.2562 0.2645 0.2661 0.2499 0.2575 0.2614 0.4022 0.3712LambdaFM 0.3082 0.2984 0.2812 0.3011 0.2993 0.2849 0.4316 0.4043RRN 0.2759 0.2741 0.2693 0.2685 0.2692 0.2676 0.3960 0.3831IRGAN 0.2856 0.2836 0.2715 0.2824 0.2813 0.2695 0.4060 0.3718

LSIC-V1 0.2815 0.2801 0.2680 0.2833 0.2742 0.2696 0.4416 0.4025LSIC-V2 0.2901 0.2883 0.2701 0.2903 0.2831 0.2759 0.4406 0.4102LSIC-V3 0.3152 0.3013 0.2722 0.2927 0.2901 0.2821 0.4482 0.4185LSIC-V4 0.3221 0.3193 0.2921 0.3157 0.3114 0.2975 0.4501 0.4247

Impv 4.51% 7.00% 3.88% 4.85% 4.04% 4.42% 4.29% 5.05%

Table 5: Movie recommendation results (Net�ix-Full).

Precision@3 Precision@5 Precision@10 NDCG@3 NDCG@5 NDCG@10 MRR MAP

BPR 0.3011 0.2817 0.2587 0.2998 0.2870 0.2693 0.3840 0.3660PRFM 0.2959 0.2837 0.2624 0.2831 0.2887 0.2789 0.4060 0.3916LambdaFM 0.3446 0.3301 0.3226 0.3450 0.3398 0.3255 0.4356 0.4067RRN 0.3135 0.2954 0.2699 0.3123 0.3004 0.2810 0.3953 0.3768IRGAN 0.3320 0.3229 0.3056 0.3319 0.3260 0.3131 0.4248 0.4052

LSIC-V1 0.3127 0.3012 0.2818 0.3247 0.3098 0.2957 0.4470 0.4098LSIC-V2 0.3393 0.3271 0.3172 0.3482 0.3401 0.3293 0.4448 0.4213LSIC-V3 0.3501 0.3480 0.3291 0.3498 0.3451 0.3321 0.4503 0.4257LSIC-V4 0.3621 0.3530 0.3341 0.3608 0.3511 0.3412 0.4587 0.4327

Impv 5.08% 6.94% 3.56% 4.58% 3.33% 4.82% 5.30% 6.39%

Table 6: Ablation results for Net�ix-3M dataset.

Precision@3 Precision@5 Precision@10 NDCG@3 NDCG@5 NDCG@10 MRR MAP

LSIC-V4 0.3221 0.3193 0.2921 0.3157 0.3114 0.2975 0.4501 0.4247w/o RL 0.3012 0.2970 0.2782 0.2988 0.2927 0.2728 0.4431 0.4112w/o poster 0.3110 0.3012 0.2894 0.3015 0.3085 0.2817 0.4373 0.4005

adversarial learning ine�cient. With Gumbel-Softmax, G does notbene�t from the reward of D, so that we do not know which moviessampled by G are good and should be reproduced. Not surprisingly,poster information also contributes to movie recommendation.

5.3 Case StudyIn this section, we will further show the advantages of our modelsthrough some quintessential examples.

In Table 7, we provide the recommendation lists generated bythree state-of-the-art baseline methods (i.e., IRGAN, RNN, LambdaFM)

Page 9: Leveraging Long and Short-term Information in Content ... · To verify the e˛ectiveness of our model, we conduct ex-tensive experiments on two widely used real-life datasets: Net˚ix

Leveraging Long and Short-term Information in Content-aware Movie Recommendation WOODSTOCK’97, July 1997, USA

Table 7: The Recalled movies from Top-10 candidates in Ne�ix-3M dataset

Groundtruth IRGAN [29] RRN [32] LambdaFM [34] LSIC-V4

Userid: 1382

9 SoulsThe Princess BrideStuart Saves His FamilyThe Last ValleyWax MaskAfter HoursSession 9Valentin

[1] The Beatles: Love Me Do[2] Wax Mask X[3] Stuart Saves His Family X

[4] After Hours X[5] Top Secret!

[6] Damn Yankees[7] Dragon Tales: It’s Cool to Be Me![8] Play Misty for Me[9] The Last Round: Chuvalo vs. Ali’[10] La Vie de Chateau

[1] Falling Down[2] 9 Souls X[3] Wax Mask X

[4] After Hours X[5] Stuart Saves His Family X

[6] Crocodile Dundee 2[7] The Princess Bride X[8] Dragon Tales: It’s Cool to Be Me![9] They Were Expendable[10] Damn Yankees

[1] The Avengers ’63[2] Wax Mask X[3] The Boondock Saints

[4] Valentin X[5] 9 Souls X

[6] The Princess Bride X[7] After Hours X[8] Tekken[9] Stuart Saves His Family X[10] Runn Ronnie Run

[1]9 Souls X[2]The Princess Bride X[3]Stuart Saves His Family X

[4] The Last Valley X[5] Wax Mask X

[6] Session 9 X[7] Dragon Tales: It’s Cool to Be Me![8] Damn Yankees[9] After Hours X[10] Valentin X

Userid: 8003 9 SoulsPrincess Bride

[1] Cheech Chong’s Up in Smoke[2] Wax Mask[3] Damn Yankees

[4] Dragon Tales: It’s Cool to Be Me![5] Top Secret!

[6] Agent Cody Banks 2: Destination London[7] After Hours[8] Stuart Saves His Family[9] 9 Souls X[10] The Beatles: Love Me Do

[1] Crocodile Dundee 2[2] Session 9[3] Falling Down

[4] Wax Mask[5] After Hours

[6] Stuart Saves His Family[7] 9 Souls X[8] The Princess Bride X[9] Dragon Tales: It’s Cool to Be Me![10] Scream 2

[1] The Insider[2]A Nightmare on Elm Street 3[3] Dennis the Menace Strikes Again

[4] Civil Brand[5] 9 Souls X

[6] Falling Down[7] The Princess Bride X[8] Radiohead: Meeting People[9] Crocodile Dundee 2[10] Christmas in Connecticut

[1] 9 Souls X[2]The Princess Bride X[3]The Last Valley

[4]Stuart Saves His Family[5]Wax Mask

[6]Dragon Tales: It’s Cool to Be Me![7]Session 9[8]Crocodile Dundee 2[9]Damn Yankees[10]Cheech Chong’s Up in Smoke

as well as the proposed LSIC-V4 model for two users who are ran-domly selected from the Net�ix-3M dataset. Our model can rank thepositive movies in higher positions than other methods. For exam-ple, the ranking of the movie “9 souls” for user “8003” has increasedfrom 5-th position (by LambdaFM) to 1st position (by LSIC-V4).Meanwhile some emerging movies such as “Session 9” and “TheLast Valley” that are truly attractive to the user “1382” have beenrecommended by our models, whereas they are ignored by baselinemethods. In fact, we can include all positive movies in the top-10list for user “1382” and in top-3 list for user “8003”. Our modelbene�ts from the fact that both dynamic and static knowledge areincorporated into the model with adversarial training.

0 10 20 30 40 50 60Number of epoches

0.15

0.20

0.25

0.30

Prec

ision

@5

LSIC-V4-GeneratorLSIC-V4-Discriminator

0 10 20 30 40 50 60Number of epoches

0.15

0.20

0.25

0.30

NDCG

5

LSIC-V4-GeneratorLSIC-V4-Discriminator

Figure 3: Learning curves of GAN on Net�ix-3M

5.4 Re-rank E�ectFrom our experiments, we observe that it could be time-consumingfor RNN to infer all movies. In addition, users may be interested in afew movies that are subject to a long-tailed distribution. Motivatedby these observations, we provide a re-ranking strategy as used in[5]. Speci�cally, we �rst generate N candidate movies by MF, andthen re-rank these candidate movies with our model. In this way,the inference time can been greatly reduced. Figure 4 illustrates theperformance curves over the number of candidate movies (i.e. N )generated by MF. We only report the Pecision@5 and NDCG@5results due to the limited space, the other metrics exhibit a similar

trend. As shown in Figure 4, when the number of candidate moviesis small, i.e., N ≤ 100 for Net�ix-3M dataset, the Percision@5raises gradually with the number of candidate movies increases.Nevertheless, the performance drops rapidly with the increasingcandidate movies when N ≥ 100. It suggests that the candidates ina long-tail side generated by MF is inaccurate and these candidatemovies deteriorate the overall performance of the re-rank strategy.

5 25 45 65 85 105

The number of re-rank candidates0.300

0.305

0.310

0.315

0.320

Prec

ision

@5

LSIC-V45 25 45 65 85 105

The number of re-rank candidates

0.295

0.300

0.305

0.310

NDCG

@5

LSIC-V4

Figure 4: Sensitivity of the candidate scale on Net�ix-3M

5.5 Session Period SensitivityThe above experimental results have shown that the session-based(short-term) information indeed improves the performance of top-N recommendation. We conduct an experiment on Net�ix-3M toinvestigate how the session period in�uences the �nal recommen-dation performance. As shown in Figure 5, the bars in purple color(below) describe the basic performance of MF component while thered ones (above) represent the extra improvement by the LSIC-V4.The RNN plays an insigni�cant role in the very early sessions sinceit lacks enough historical interaction data. For later period of ses-sions, RNN component tends to be more e�ective, and our jointmodel achieves a clear improvement over the model with only MFcomponent.

Page 10: Leveraging Long and Short-term Information in Content ... · To verify the e˛ectiveness of our model, we conduct ex-tensive experiments on two widely used real-life datasets: Net˚ix

WOODSTOCK’97, July 1997, USA Zhao et al.

1 2 3 4 5 6 7 8 9

The session peroid (week)0.20

0.22

0.24

0.26

0.28

0.30

0.32

Prec

ision

@5

LSIC-V4(only MF part)LSIC-V4

1 2 3 4 5 6 7 8 9

The session peroid (week)0.20

0.22

0.24

0.26

0.28

0.30

0.32

NDCG

@5

LSIC-V4(only MF part)LSIC-V4

Figure 5: Sensitivity of the session period on Net�ix-3M

6 CONCLUSIONIn this paper, we proposed a novel adversarial process for top-nrecommendation. Our model incorporated both matrix factoriza-tion and recurrent neural network to exploit the bene�ts of thelong-term and short-term knowledge. We also integrated posterinformation to further improve the performance of movie recom-mendation. Experiments on two real-life datasets showed the per-formance superiority of our model.

REFERENCES[1] Gediminas Adomavicius and Alexander Tuzhilin. 2005. Toward the next gen-

eration of recommender systems: A survey of the state-of-the-art and possibleextensions. IEEE transactions on knowledge and data engineering 17, 6 (2005),734–749.

[2] Martin Arjovsky, Soumith Chintala, and Léon Bottou. 2017. Wasserstein gen-erative adversarial networks. In International Conference on Machine Learning.214–223.

[3] Jesus Bobadilla, Fernando Ortega, Antonio Hernando, and Abraham Gutierrez.2013. Recommender systems survey. Knowledge-based systems 46 (2013), 109–132.

[4] Tseng-Hung Chen, Yuan-Hong Liao, Ching-Yao Chuang, Wan-Ting Hsu, JianlongFu, and Min Sun. 2017. Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner. arXiv preprint arXiv:1705.00930 (2017).

[5] Paul Covington, Jay Adams, and Emre Sargin. 2016. Deep neural networksfor youtube recommendations. In Proceedings of the 10th ACM Conference onRecommender Systems. ACM, 191–198.

[6] Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. 2010. Performance ofrecommender algorithms on top-n recommendation tasks. In Proceedings of thefourth ACM conference on Recommender systems. ACM, 39–46.

[7] Qiang Cui, Shu Wu, Qiang Liu, and Liang Wang. 2016. A Visual and Textual Recur-rent Neural Network for Sequential Prediction. arXiv preprint arXiv:1611.06668(2016).

[8] Robin Devooght and Hugues Bersini. 2016. Collaborative �ltering with recurrentneural networks. arXiv preprint arXiv:1608.07400 (2016).

[9] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley,Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarialnets. In Advances in neural information processing systems. 2672–2680.

[10] Xiangnan He, Hanwang Zhang, Min-Yen Kan, and Tat-Seng Chua. 2016. Fast ma-trix factorization for online recommendation with implicit feedback. In Proceed-ings of the 39th International ACM SIGIR conference on Research and Developmentin Information Retrieval. ACM, 549–558.

[11] Antonio Hernando, Jesús Bobadilla, and Fernando Ortega. 2016. A non negativematrix factorization for collaborative �ltering recommender systems based on aBayesian probabilistic model. Knowledge-Based Systems 97 (2016), 188–202.

[12] Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk.2015. Session-based recommendations with recurrent neural networks. In ICLR.

[13] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neuralcomputation 9, 8 (1997), 1735–1780.

[14] Yehuda Koren. 2008. Factorization meets the neighborhood: a multifacetedcollaborative �ltering model. In Proceedings of the 14th ACMSIGKDD internationalconference on Knowledge discovery and data mining. ACM, 426–434.

[15] Yehuda Koren. 2010. Collaborative �ltering with temporal dynamics. Commun.ACM 53, 4 (2010), 89–97.

[16] Yehuda Koren, Robert Bell, and Chris Volinsky. 2009. Matrix factorization tech-niques for recommender systems. Computer 42, 8 (2009).

[17] Matt J Kusner and José Miguel Hernández-Lobato. 2016. GANS for Sequencesof Discrete Elements with the Gumbel-softmax Distribution. arXiv preprint

arXiv:1611.04051 (2016).[18] Nathan N Liu and Qiang Yang. 2008. Eigenrank: a ranking-oriented approach to

collaborative �ltering. In Proceedings of the 31st annual international ACM SIGIRconference on Research and development in information retrieval. ACM, 83–90.

[19] Jie Lu, Dianshuang Wu, Mingsong Mao, Wei Wang, and Guangquan Zhang. 2015.Recommender system application developments: a survey. Decision SupportSystems 74 (2015), 12–32.

[20] Sean M McNee, John Riedl, and Joseph A Konstan. 2006. Being accurate isnot enough: how accuracy metrics have hurt recommender systems. In CHI’06extended abstracts on Human factors in computing systems. ACM, 1097–1101.

[21] Mehdi Mirza and Simon Osindero. 2014. Conditional generative adversarial nets.arXiv preprint arXiv:1411.1784 (2014).

[22] Andriy Mnih and Ruslan R Salakhutdinov. 2008. Probabilistic matrix factorization.In Advances in neural information processing systems. 1257–1264.

[23] Runwei Qiang, Feng Liang, and Jianwu Yang. 2013. Exploiting ranking fac-torization machines for microblog retrieval. In Proceedings of the 22nd ACMinternational conference on Conference on information & knowledge management.ACM, 1783–1788.

[24] Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele,and Honglak Lee. 2016. Generative adversarial text to image synthesis. InProceedings of the 33rd International Conference on International Conference onMachine Learning. JMLR.org, 1060–1069.

[25] Ste�en Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. InProceedings of the twenty-�fth conference on uncertainty in arti�cial intelligence.AUAI Press, 452–461.

[26] Jasson DM Rennie and Nathan Srebro. 2005. Fast maximum margin matrixfactorization for collaborative prediction. In Proceedings of the 22nd internationalconference on Machine learning. ACM, 713–719.

[27] Nathan Srebro, Jason Rennie, and Tommi S Jaakkola. 2005. Maximum-marginmatrix factorization. In Advances in neural information processing systems. 1329–1336.

[28] Richard S Sutton, David A McAllester, Satinder P Singh, and Yishay Mansour.2000. Policy gradient methods for reinforcement learning with function approxi-mation. In Advances in neural information processing systems. 1057–1063.

[29] Jun Wang, Lantao Yu, Weinan Zhang, Yu Gong, Yinghui Xu, Benyou Wang,Peng Zhang, and Dell Zhang. 2017. IRGAN: A Minimax Game for UnifyingGenerative and Discriminative Information Retrieval Models. In Proceedings ofthe 40th International ACM SIGIR Conference on Research and Development inInformation Retrieval. 515–524.

[30] Caihua Wu, Junwei Wang, Juntao Liu, and Wenyu Liu. 2016. Recurrent neuralnetwork based recommendation for time heterogeneous feedback. Knowledge-Based Systems 109 (2016), 90–103.

[31] Chao-Yuan Wu, Amr Ahmed, Alex Beutel, and Alexander J Smola. 2016. JointTraining of Ratings and Reviews with Recurrent Recommender Networks. ICLR(2016).

[32] Chao-Yuan Wu, Amr Ahmed, Alex Beutel, Alexander J Smola, and How Jing. 2017.Recurrent recommender networks. In Proceedings of the Tenth ACM InternationalConference on Web Search and Data Mining. ACM, 495–503.

[33] Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. 2017. SeqGAN: SequenceGenerative Adversarial Nets with Policy Gradient.. In AAAI. 2852–2858.

[34] Fajie Yuan, Guibing Guo, Joemon M Jose, Long Chen, Haitao Yu, and WeinanZhang. 2016. Lambdafm: learning optimal ranking with factorization machinesusing lambda surrogates. In Proceedings of the 25th ACM International on Confer-ence on Information and Knowledge Management. ACM, 227–236.

[35] Lili Zhao, Zhongqi Lu, Sinno Jialin Pan, and Qiang Yang. 2016. Matrix Factoriza-tion+ for Movie Recommendation.. In IJCAI. 3945–3951.