Outline

Morning program
  Preliminaries
  Modeling user behavior
  Semantic matching
  Learning to rank

Afternoon program
  Entities
  Generating responses
  Recommender systems
    Items and Users
    Matrix factorization
    Matrix factorization as a network
    Side information
    Richer models
    Other tasks
    Wrap-up
  Industry insights
  Q & A
Recommender systems – The task

- Build a model that estimates how a user will like an item.
- A typical recommendation setup has
  - a matrix with users and items
  - plus ratings of users for items, reflecting past/known preferences
  and tries to predict future preferences
- This is not about rating prediction

[Karatzoglou and Hidasi, Deep Learning for Recommender Systems, RecSys ’17, 2017]
Recommender systems – Approaches to recommender systems

- Collaborative filtering
  - Based on analyzing users’ behavior and preferences, such as ratings given to movies or books
- Content-based filtering
  - Based on matching the descriptions of items and users’ profiles
  - Users’ profiles are typically constructed from their previous purchases/ratings, their submitted queries to search engines, and so on
- A hybrid approach combining the two
Recommender systems – Warm, cold

- Cold-start problem
  - User cold-start problem – generating recommendations for a new user / a user for whom very few preferences are known
  - Item cold-start problem – recommending items that are new / for which very few users have shared ratings or preferences
- Cold items/users
- Warm items/users
Recommender systems – Matrix factorization

- The recommender system’s workhorse

[Figure: the rating matrix R approximated by the product of a user factor matrix U and an item factor matrix V]

$$\min_{u,v} \sum_{(i,j)\in R} \left( r_{i,j} - u_i^\top v_j \right)^2 + \lambda \left( \|u_i\|^2 + \|v_j\|^2 \right)$$
Recommender systems – Matrix factorization

- Discover the latent features underlying the interactions between users and items
- Don’t rely on imputation to fill in missing ratings and make the matrix dense
- Instead, model the observed ratings directly, avoiding overfitting through a regularized model
- Minimize the regularized squared error on the set of known ratings:

$$\min_{u,v} \sum_{(i,j)\in R} \left( r_{i,j} - u_i^\top v_j \right)^2 + \lambda \left( \|u_i\|^2 + \|v_j\|^2 \right)$$

- Popular methods for minimizing this objective include stochastic gradient descent and alternating least squares
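A minimal sketch of minimizing this objective with stochastic gradient descent (pure Python; the toy ratings, rank `k`, and hyperparameters are illustrative, not from the slides):

```python
import random

def mf_sgd(ratings, n_users, n_items, k=2, lr=0.05, lam=0.02, epochs=300, seed=0):
    """Factorize observed (user, item, rating) triples via SGD
    on the regularized squared error."""
    rng = random.Random(seed)
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for i, j, r in ratings:
            err = r - sum(U[i][f] * V[j][f] for f in range(k))
            for f in range(k):
                u, v = U[i][f], V[j][f]
                # gradient step on the regularized squared error
                U[i][f] += lr * (err * v - lam * u)
                V[j][f] += lr * (err * u - lam * v)
    return U, V

# toy 3x3 rating matrix with 6 observed entries
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
U, V = mf_sgd(ratings, n_users=3, n_items=3)
pred_00 = sum(U[0][f] * V[0][f] for f in range(2))  # should approach r_00 = 5.0
```

Alternating least squares would instead fix V, solve for U in closed form, and alternate; SGD is shown here because it extends directly to the network views that follow.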
Recommender systems – A feed-forward neural network view

[Raimond and Basilico, Deep Learning for Recommender Systems, 2017]
Recommender systems – A deeper view
Recommender systems – Matrix factorization vs. feed-forward network

- The two models are very similar
  - Embeddings, MSE loss, gradient-based optimization
- A feed-forward net can learn different embedding combinations than a dot product
- Capturing pairwise interactions through a feed-forward net requires a huge amount of data
- This approach is not superior to a properly tuned traditional matrix factorization approach
Recommender systems – Great escape . . .

- Side information
- Richer models
- Other tasks
Recommender systems – Side information for recommendation

[Figure: four example panels (1)–(4)]
Recommender systems – Side information for recommendation

- Textual side information
  - Product descriptions, reviews, etc.
  - Extraction: RNNs, one-dimensional CNNs, word embeddings, paragraph vectors
  - Applications: news, products, books, publications
- Images
  - Product pictures, video thumbnails
  - Extraction: CNNs
  - Applications: fashion, video
- Music/audio
  - Extraction: CNNs and RNNs
  - Applications: music
Recommender systems – Textual side information

- Content2vec [Nedelec et al., 2016]
- Using associated textual information for recommendations [Bansal et al., 2016]
Recommender systems – Textual information for improving recommendations

- Task: paper recommendation
- Item representation
  - Text representation: RNN-based
  - Item-specific embeddings created using MF
  - Final representation: item + text embeddings
Recommender systems – Images in recommendation

Visual Bayesian Personalized Ranking (VBPR) [He and McAuley, 2016]

- Bias terms
- MF model
- Visual part:
  - Pretrained CNN features
  - Dimension reduction through embeddings
- BPR loss
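The BPR objective prefers the observed (positive) item to be scored above an unobserved (negative) one for the same user. A minimal sketch of the pairwise loss on generic user/item vectors (the vectors here are illustrative toys; VBPR folds the visual part into the scores before this step):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bpr_loss(u, v_pos, v_neg):
    """BPR pairwise loss: -log sigma(score(u, pos) - score(u, neg))."""
    x_pos = sum(a * b for a, b in zip(u, v_pos))
    x_neg = sum(a * b for a, b in zip(u, v_neg))
    return -math.log(sigmoid(x_pos - x_neg))

u = [0.5, 1.0]
loss_good = bpr_loss(u, [1.0, 1.0], [-1.0, -1.0])  # positive ranked above negative
loss_bad = bpr_loss(u, [-1.0, -1.0], [1.0, 1.0])   # positive ranked below negative
```

In training, the negative item is sampled per (user, positive item) pair and the loss is minimized by gradient descent over the factors.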
Recommender systems – Alternative models

- Restricted Boltzmann Machines [Salakhutdinov et al., 2007]
- Auto-encoders [Wu et al., 2016]
- Prod2vec [Grbovic et al., 2015]
- Wide + Deep models [Cheng et al., 2016]
Recommender systems – Restricted Boltzmann Machines (RBM)

- Generative stochastic neural network
- Visible and hidden units connected by weights
- Activation probabilities:

$$p(h_j = 1 \mid v) = \sigma\Big(b^h_j + \sum_{i=1}^{m} w_{i,j} v_i\Big)$$
$$p(v_i = 1 \mid h) = \sigma\Big(b^v_i + \sum_{j=1}^{n} w_{i,j} h_j\Big)$$

- Training
  - Set the visible units based on the data, sample the hidden units, then sample the visible units
  - Modify the weights so that the configuration of the visible units approaches the data
- In recommendation:
  - Visible units: ratings on the movies
  - A vector of length 5 (one position per rating value) in each unit
  - Units corresponding to users who have not rated the movie are ignored
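The two conditional activation probabilities above can be sketched directly (the weights and inputs below are illustrative toys):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_activations(v, W, b_h):
    """p(h_j = 1 | v) for each hidden unit j; W is m x n (visible x hidden)."""
    m, n = len(W), len(W[0])
    return [sigmoid(b_h[j] + sum(W[i][j] * v[i] for i in range(m)))
            for j in range(n)]

def visible_activations(h, W, b_v):
    """p(v_i = 1 | h) for each visible unit i."""
    m, n = len(W), len(W[0])
    return [sigmoid(b_v[i] + sum(W[i][j] * h[j] for j in range(n)))
            for i in range(m)]

# toy RBM: 3 visible units, 2 hidden units
W = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
p_h = hidden_activations([1, 1, 0], W, b_h=[0.0, 0.0])
p_v = visible_activations([1, 0], W, b_v=[0.0, 0.0, 0.0])
```

Contrastive-divergence training alternates exactly these two sampling directions before updating the weights.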
Recommender systems – Auto-encoders

Auto-encoders

- One hidden layer
- Same number of input and output units
- Try to reconstruct the input on the output
- Hidden layer: compressed representation of the data

Constraining the model improves generalization

- Sparse auto-encoders: the activation of units is limited
- Denoising auto-encoders: corrupt the input
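The denoising variant corrupts the input before reconstruction. A minimal sketch of just the corruption step, with an inverse-keep-probability rescaling so the expected input is unchanged (the drop probability and scaling convention here are illustrative assumptions, not prescribed by the slides):

```python
import random

def corrupt(x, drop_prob, seed=0):
    """Randomly zero out input entries; rescale survivors so the
    expectation of each entry is preserved."""
    rng = random.Random(seed)
    keep = 1.0 - drop_prob
    return [v / keep if rng.random() >= drop_prob else 0.0 for v in x]

x = [1.0, 1.0, 1.0, 1.0]
x_tilde = corrupt(x, drop_prob=0.5)  # entries are either dropped or scaled to 2.0
```

The auto-encoder is then trained to reconstruct the clean `x` from the corrupted `x_tilde`.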
Recommender systems – Auto-encoders for recommendation

Reconstruct corrupted user interaction vectors [Wu et al., 2016]

- Collaborative Denoising Auto-Encoder (CDAE)
- The links between nodes are associated with different weights
- The links in red are user-specific
- The other weights are shared across all users
Recommender systems – Prod2vec and Item2vec

- Prod2vec and Item2vec: item–item co-occurrence factorization
- User2vec: user–user co-occurrence factorization
- The two approaches can be combined [Liang et al., 2016]
Recommender systems – Wide + Deep models

- Combination of two models
  - Deep neural network
    - On embedded item features
    - In charge of generalization
  - Linear model
    - On embedded item features
    - And cross products of item features
    - In charge of memorization of binarized features

[Cheng et al., 2016]
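The "memorization" side relies on cross-product transformations of the sparse binarized features. A minimal sketch of generating such crossed features (the feature-string format is an illustrative assumption):

```python
from itertools import combinations

def cross_products(features):
    """Cross-product transformation for the wide (linear) part:
    one crossed feature per pair of input features."""
    return ["{}&{}".format(a, b) for a, b in combinations(sorted(features), 2)]

crossed = cross_products(["genre=comedy", "country=US"])
```

Each crossed feature becomes one more binary input to the linear model, letting it memorize specific feature co-occurrences that the deep part would only approximate.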
Recommender systems – Other tasks

- Session-based recommendation
- Contextual sequence prediction
- Time-sensitive sequence prediction
- Causality in recommendations
- Recommendation as question answering
- Deep reinforcement learning for recommendations
Recommender systems – Session-based recommendation

- Treat recommendation as a sequence classification problem
- Input: a sequence of user actions (purchases/ratings of items)
- Output: the next action
- Disjoint sessions (instead of a consistent user history)
Recommender systems – GRU4Rec

Network structure [Hidasi et al., 2016]

- Input: one-hot encoded item ID
- Output: scores over all items
- Goal: predicting the next item in the session

Adapting GRUs to session-based recommendation

- Session-parallel mini-batching: to handle sessions of (very) different lengths and lots of short sessions
- Sampling on the output: to handle lots of items
Recommender systems – GRU4Rec

Session-parallel mini-batches

- Mini-batches are defined over sessions

Output sampling

- Computing scores for all items (100K–1M) in every step is slow
- One positive item (the target) + several negative samples
- Fast solution: scores on the mini-batch targets only
- The target items of the other examples in the mini-batch serve as negative samples for the current example
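A sketch of this mini-batch-based negative sampling: each training example treats the targets of the other sessions in the same mini-batch as its negatives (the item IDs are illustrative toys):

```python
def in_batch_negatives(batch_targets):
    """For each example in the mini-batch, the target items of the
    other examples act as its negative samples."""
    return [
        [t for j, t in enumerate(batch_targets) if j != k]
        for k in range(len(batch_targets))
    ]

# targets of three parallel sessions in one mini-batch
negs = in_batch_negatives([42, 7, 99])
```

This is cheap because the scores for these items are already computed for the batch, and it implicitly samples negatives in proportion to item popularity.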
Recommender systems – Contextual sequence prediction

- Input: a sequence of contextual user actions, plus the current context
- Output: probability of the next action
- E.g. “Given all the actions a user has taken so far, what’s the most likely video they’re going to play right now?” [Beutel et al., 2018]
Recommender systems – Time-sensitive sequence prediction

- Recommendations are actions at a moment in time
- Proper modeling of time and system dynamics is critical
- Experiment on an internal Netflix dataset
- Context:
  - Discrete time – day-of-week (Sunday, Monday, . . . ) and hour-of-day
  - Continuous time (timestamp)
- Predict the next play (temporally split data)

[Raimond and Basilico, Deep Learning for Recommender Systems, 2017]

And now, for some speculative tasks in the recommender systems space: the answers are not clear, but there is good potential for follow-up research.
Recommender systems – Causality in recommendations [Schnabel et al., 2016]

- Virtually all data for training recommender systems is subject to selection biases
- In movie recommendation, users typically watch and rate movies they like, and rarely movies they do not like
- View recommendation from a causal inference perspective – exposing a user to an item is an intervention, analogous to exposing a patient to a treatment in a medical study
- Propensity-weighted MF method – the propensities act as weights on the loss terms:

$$\min_{u,v} \sum_{(i,j)\in R} \frac{1}{P_{i,j}} \left( r_{i,j} - u_i^\top v_j \right)^2 + \lambda \left( \|u_i\|^2 + \|v_j\|^2 \right)$$

- Performance: MF vs. propensity-weighted MF (as α → 0, the data is increasingly missing not at random, only revealing top-rated items)
- How to incorporate this in neural networks for recommender systems?
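A minimal sketch of the inverse-propensity weighting in this loss (the ratings, predictions, and propensities below are illustrative toys; regularization is omitted):

```python
def ips_weighted_error(observations):
    """Sum of squared errors, each weighted by the inverse propensity
    of that (user, item) pair being observed at all."""
    return sum((r - pred) ** 2 / p for r, pred, p in observations)

# (rating, prediction, propensity): a frequently-exposed item and a
# rarely-exposed one; the rare observation gets a larger weight
obs = [(5.0, 4.0, 0.8), (4.0, 2.0, 0.2)]
loss = ips_weighted_error(obs)                        # 1.0/0.8 + 4.0/0.2
unweighted = sum((r - q) ** 2 for r, q, _ in obs)     # 1.0 + 4.0
```

Upweighting rarely observed pairs makes the weighted sum an unbiased estimate of the error over all (user, item) pairs, not just the self-selected observed ones.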
Recommender systems – Recommendations as question answering [Dodge et al., 2015]

- Conversational recommendation agent
- Four tasks: (1) question answering (QA), (2) recommendation, (3) a mix of recommendation and QA, and (4) general dialog about the topic (chit-chat)
- A memory network jointly trained on all four tasks performs best
  - Incorporates short- and long-term memory and can use local context and knowledge bases of facts
- Performance on QA needs a real boost
- Performance degraded rather than improved when training on all four tasks at once
Recommender systems – Deep reinforcement learning for recommendations [Zhao et al., 2017]

- MDP-based formulations of recommender systems go back to the early 2000s
- Using reinforcement learning has two advantages:
  1. Strategies can be continuously updated during interactions
  2. It can learn the strategy that maximizes the long-term cumulative reward from users
- A list-wise recommendation framework, which can be applied in scenarios with large and dynamic item spaces
- Uses an Actor-Critic network
- Integrates multiple orders – positional order, temporal order
- Needs proper evaluation in a live environment
Recommender systems – Wrap-up

- Current directions
  - Item/user embeddings
  - Deep collaborative filtering
  - Feature extraction from content
  - Session- and context-based recommendation
  - Fairness, accuracy, confidentiality, and transparency (FACT)
- Deep learning can work well for recommendations
- Matrix factorization and deep learning are very similar in the classic recommendation setup
- Lots of areas to explore
Recommender systems – Resources
RankSys : https://github.com/RankSys/RankSys
LibRec : https://www.librec.net
LensKit : http://lenskit.org
LibMF : https://www.csie.ntu.edu.tw/%7Ecjlin/libmf/
proNet-core : https://github.com/cnclabs/proNet-core
rival : http://rival.recommenders.net
TensorRec : https://github.com/jfkirk/tensorrec