Neural Networks for Information Retrievalnn4ir.com/wsdm2018/slides/08_RecommenderSystems.pdfDeep reinforcement learning for recommendations – [Zhao et al., 2017] I MDP-based formulations

235

Outline

Morning programPreliminariesModeling user behaviorSemantic matchingLearning to rank

Afternoon programEntitiesGenerating responsesRecommender systemsIndustry insightsQ & A

236

OutlineMorning program

PreliminariesModeling user behaviorSemantic matchingLearning to rank

Afternoon programEntitiesGenerating responsesRecommender systems

Items and UsersMatrix factorizationMatrix factorization as a networkSide informationRicher modelsOther tasksWrap-up

Industry insightsQ & A

237

Recommender systemsRecommender systems – The task

I Build a model that estimates how auser will like an item.

I A typical recommendation setup hasI matrix with users and itemsI plus ratings of users for items

reflecting past/known preferences

and tries to predict future preferences

I This is not about rating prediction

[Karatzoglou and Hidasi, Deep Learning for Recommender Systems, RecSys ’17, 2017]

238

Recommender systemsApproaches to recommender systems

I Collaborative filteringI Based on analyzing users’ behavior and preferences such as ratings given to movies

or books

I Content-based filteringI Based on matching the descriptions of items and users’ profilesI Users’ profiles are typically constructed using their previous purchases/ratings, their

submitted queries to search engines and so on

I A hybrid approach

239

Recommender systemsWarm, cold

I Cold start problemI User cold-start problem – generate recommendations for a new user / a user for

whom very few preferences are knownI Item cold-start problem – recommendation items that are new / for which very

users have shared ratings or preferences

I Cold items/users

I Warm items/users

240






241

Recommender systemsMatrix factorization

I The recommender system’s work horse

V

minu,v

X

i,j2R

(ri,j � uTi vj) + �(||ui||2 + ||vj ||2)

R

242

Recommender systemsMatrix factorization

I Discover the latent features underlying the interactions between users and items

I Don’t rely on imputation to fill in missing ratings and make matrix dense

I Instead, model observed ratings directly, avoid overfitting through a regularizedmodel

I Minimize the regularized squared error on the set of known ratings:

minu,v

X

i,j2R(ri,j � uT

i vj) + �(kuik2 + kvjk2)

Popular methods for minimizing include stochastic gradient descent andalternating least squares

243






244

Recommender systemsA feed-forward neural network view

[Raimon and Basilico, Deep Learning for Recommender Systems, 2017]

245

Recommender systemsA deeper view

246

Recommender systemsMatrix factorization vs. feed-forward network

I Two models are very similarI Embeddings, MSE loss, gradient-based optimization

I Feed-forward net can learn di↵erent embedding combinations than a dot product

I Capturing pairwise interactions through feed-forward net requires a huge amountof data

I This approach is not superior to properly tuned traditional matrix factorizationapproach

247

Recommender systemsGreat escape . . .

I Side information

I Richer models

I Other tasks

248






249

Recommender systemsSide information for recommendation

(1) (2)

(3) (4)

250

Recommender systemsSide information for recommendation

I Textual side informationI Product description, reviews, etc.I Extraction: RNNs, one dimensional CNNs, word embeddings, paragraph vectorsI Applications: news, products, books, publication

I ImagesI Product pictures, video thumbnailsI Extraction: CNNsI Applications: fashion, video

I Music/audioI Extraction: CNNs and RNNsI Applications: music

251

Recommender systemsTextual side information

I Content2vec [Nedelec et al., 2016]

I Using associated textual information for recommendations [Bansal et al., 2016]

252

Recommender systemsTextual information for improving recommendations

I Task: paper recommendationI Item representation

I Text representation: RNN basedI Item-specific embeddings created using MFI Final representation: item + text embeddings

253

Recommender systemsImages in recommendation

Visual Bayesian Personalized Ranking (BPR) [He and McAuley, 2016]

I Bias terms

I MF modelI Visual part:

I Pretrained CNN featuresI Dimension reduction through embeddings

I BPR loss

254






255

Recommender systemsAlternative models

I Restricted Boltzman Machines [Salakhutdinov et al., 2007]

I Auto-encoders [Wu et al., 2016]

I Prod2vec [Grbovic et al., 2015]

I Wide + Deep models [Cheng et al., 2016]

256

Recommender systemsRestricted Boltzman Machines – RBM

I Generative stochastic neural network

I Visible and hidden units connected by weights

I Activation probabilities:p(hj = 1|v) = �(bhj +

Pmi=1 wi,jvi)

p(vi = 1|h) = �(bvi +Pn

j=1 wi,jhj)

I TrainingI Set visible units based on data, sample hidden units, then sample

visible unitsI Modify weights to approach the configuration of visible units to the

data

I In recommendation:I Visible units: ratings on the movieI Vector of length 5 (for each rating value) in each unitI Units corresponding to users who not rated the movie are ignored

257

Recommender systemsAuto-encoders

Auto-encoders

I One hidden layer

I Same number of input and output units

I Try to reconstruct the input on the output

I Hidden layer: compressed representation of the data

Constraining the model: improve generalization

I Sparse auto-encoders: activation of units are limited

I Denoising auto-encoders: corrupt the input

258

Recommender systemsAuto-encoders for recommendation

Reconstruct corrupted user interaction vectors [Wuet al., 2016]

I Collaborative Denoising Auto-Encoder (CDAE)

I The link between nodes are associated withdi↵erent weights

I The links with red color are user specific

I Other weights are shared across all the users

259

Recommender systemsProd2vec and Item2vec

I Prod2vec and item2vec: Item-item co-occurrencefactorization

I User2vec: User-user co-occurrence factorization

I The two approaches can be combined [Lianget al., 2016]

260

Recommender systemsWide + Deep models

I Combination of two modelsI Deep neural network

I On embedded item featuresI In charge of generalization

I Linear modelI On embedded item featureI And cross product of item featuresI In charge of memorization on binarized

features

I [Cheng et al., 2016]

261






262

Recommender systemsOther tasks

I Session-based recommendation

I Contextual sequence prediction

I Time-sensitive sequence prediction

I Causality in recommendations

I Recommendation as question answering

I Deep reinforcement learning for recommendations

263

Recommender systemsSession-based recommendation

I Treat recommendations as a sequence classification problem

I Input: a sequence of user actions (purchases/ratings of items)

I Output: next action

I Disjoint sessions (instead of consistent user history)

264

Recommender systemsGRU4Rec

Network structure [Hidasi et al., 2016]

I Input: one hot encoded item ID

I Output: scores over all items

I Goal: predicting the next item in the session

Adapting GRU to session-based recommendations

I Session-parallel mini-batching: to handle sessions of (very)di↵erent length and lots of short sessions

I Sampling on the output: to handle lots of items(inputs,outputs)

265

Recommender systemsGRU4Rec

Session-parallel mini-batches

I Mini-batch is defined over sessions

Output sampling

I Computing scores for all items (100K1M) in every step is slow

I One positive item (target) + severalsamples

I Fast solution: scores on mini-batch targets

I Items of the other mini-batches arenegative samples for the currentmini-batch

266

Recommender systemsContextual sequence prediction

I Input: sequence of contextual useractions, plus current context

I Output: probability of next action

I E.g. “Given all the actions a user hastaken so far, what’s the most likely videothey’re going to play right now?” [Beutelet al., 2018]

267

Recommender systemsTime-sensitive sequence prediction

I Recommendations are actions at amoment in time

I Proper modeling of time and systemdynamics is critical

I Experiment on a Netflix internaldataset

I Context:I Discrete time – Day-of-week:

Sunday, Monday, . . . Hour-of-dayI Continuous time (Timestamp)

I Predict next play (temporal splitdata)

[Raimon and Basilico, Deep Learning for Recommender Systems, 2017]

And now, for some speculative tasks

in the recommender systems space

Answers not clear, but good potential for follow-up research

268

269

Recommender systemsCausality in recommendations – [Schnabel et al., 2016]

I Virtually all data for training recommendersystems is subject to selection biases

I In movie recommendation, users typicallywatch and rate movies they like, rarely moviesthey do not like

I View recommendation from causal inferenceperspective – exposing user to item isintervention analogous to exposing patient totreatment in medical study

I Propensity-weighted MF method – propensitiesact as weights on loss terms

minu,v

X

i,j2R

1Pi,j

(ri,j � uTi vj) + �(kuik2 + kvjk2)

I Performance: MF vs. propensityweighted MF

(As ↵ ! 0, data is increasinglymissing not at random, only revealingtop rated items.)

I How to incorporate this in neuralnetworks for recommender systems?

270

Recommender systemsRecommendations as question answering – [Dodge et al., 2015]

I Conversational recommendation agentI (1) question-answering (QA), (2)

recommendation, (3) mix of recommendationand QA and (4) general dialog about the topic(chit-chat)

I Memory network jointlytrained on all (four) tasksperforms best

I Incorporate short and longterm memory and can uselocal context and knowledgebases of facts

I Performance on QA needs areal boost

I Performance degraded ratherthan improved when trainingon all four tasks at once

271

Recommender systemsDeep reinforcement learning for recommendations – [Zhao et al., 2017]

I MDP-based formulations of recommender systems go back to early 2000sI Use of reinforcement learning has two advantages

1. can continuously update strategies during interactions2. are able to learn strategy that maximizes the long-term cumulative reward from users

I List-wise recommendation framework, which can be applied in scenarios with largeand dynamic item spaces

I Uses Actor-Critic network

I Integrate multiple orders – positional order, temporal order

I Needs proper evaluation in live environment

272






273

Recommender systemsWrap-up

I Current directionsI Item/user embeddingI Deep collaborative filteringI Feature extraction from contentI Session- and context-based recommendationI Fairness, accuracy, confidentiality, and transparency (FACT)

I Deep learning can work well for recommendations

I Matrix factorization and deep learning are very similar in classic recommendationsetup

I Lots of areas to explore

274

Recommender systemsResources

RankSys : https://github.com/RankSys/RankSys

LibRec : https://www.librec.net

LensKit : http://lenskit.org

LibMF : https://www.csie.ntu.edu.tw/%7Ecjlin/libmf/

proNet-core : https://github.com/cnclabs/proNet-core

rival : http://rival.recommenders.net

TensorRec : https://github.com/jfkirk/tensorrec

https://github.com/RankSys/RankSys

https://www.librec.net

http://lenskit.org

https://www.csie.ntu.edu.tw/%7Ecjlin/libmf/

https://github.com/cnclabs/proNet-core

http://rival.recommenders.net

https://github.com/jfkirk/tensorrec

Documents

Neural Networks for Information Retrievalnn4ir.com/wsdm2018/slides/08_RecommenderSystems.pdfDeep reinforcement learning for recommendations – [Zhao et al., 2017] I MDP-based formulations