A Deep Architecture for Content-based Recommendations Exploiting Recurrent Neural Networks

@cataldomusto @ale_suglia

@cld_greco @SWAP_research

ALESSANDRO SUGLIA, CLAUDIO GRECO, CATALDO MUSTO, MARCO DE GEMMIS, PASQUALE LOPS, GIOVANNI SEMERARO

UNIVERSITÀ DEGLI STUDI DI BARI ‘ALDO MORO’ - ITALY

25th International Conference on User Modeling, Adaptation and Personalization

Bratislava, Slovakia, July 12, 2017


Recurrent Neural Networks (RNNs)

Widespread Deep Learning Architecture

◦ Based on Neural Networks

◦ The connections between the units may contain loops, which let the network take past states into account during learning

◦ Well suited to modeling variable-length sequential data
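The recurrence described in the bullets above can be sketched in a few lines of plain Python. This is only an illustration: the single-unit network, the weight values, and the function name `rnn_final_state` are assumptions and do not come from the slides. It shows how the looped connection carries the previous state forward, and why the same code handles sequences of any length.

```python
import math

def rnn_final_state(inputs, w_in=0.5, w_rec=0.8, bias=0.0):
    """Run a one-unit recurrent network over a sequence and return the last state.

    w_in, w_rec and bias are made-up illustrative weights, not learned values.
    """
    h = 0.0  # initial hidden state
    for x in inputs:
        # The recurrent connection feeds the previous state h back into the unit.
        h = math.tanh(w_in * x + w_rec * h + bias)
    return h

# The same function processes sequences of different lengths.
short_seq = rnn_final_state([1.0, 0.5])
long_seq = rnn_final_state([1.0, 0.5, -0.3, 0.8, 0.1])
print(short_seq, long_seq)
```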

Alessandro Suglia, Claudio Greco, Cataldo Musto, Marco de Gemmis, Pasquale Lops, Giovanni Semeraro.

A Deep Architecture for Content-based Recommendations Exploiting Recurrent Neural Networks. UMAP 2017. Bratislava, Slovakia. July 12, 2017




PROS

◦ Very good performance in different tasks

◦ Can learn short-term and long-term (temporal) dependencies

CONS

◦ Vanishing/exploding gradient problem







LONG SHORT-TERM MEMORY NETWORKS (LSTMs)

◦ Introduced to address the vanishing/exploding gradient problem

◦ Each cell has a more complex, gated structure, which makes it more powerful than a simple RNN cell
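To make the "complex structure" of an LSTM cell concrete, here is a minimal sketch of one step of the standard LSTM formulation, reduced to scalar inputs and states so that it runs with the standard library alone. The parameter dictionary and its values are illustrative assumptions; the slides do not give the cell equations or any weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell (standard formulation, illustrative weights p)."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate cell value
    c = f * c_prev + i * g  # additive update of the cell state
    h = o * math.tanh(c)    # hidden state exposed to the rest of the network
    return h, c

# Hypothetical parameters: every weight and bias set to 0.5.
names = ("wi", "ui", "bi", "wf", "uf", "bf", "wo", "uo", "bo", "wg", "ug", "bg")
params = {name: 0.5 for name in names}

h, c = 0.0, 0.0
states = []
for x in [1.0, -0.5, 0.25]:
    h, c = lstm_step(x, h, c, params)
    states.append(h)
print(states)
```

The additive update of `c` is the part usually credited with letting gradients survive over long sequences, in contrast to the repeated squashing in a simple RNN cell.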



Motivations

In content-based recommender systems, suggestions are generated by matching the features stored in the user profile with those describing the items to be recommended.


Content Representation plays a key role!


RNNs are very suitable!

Content can be considered as a sequence of terms


Research Question



Our contribution

AMAR (Ask Me Any Rating)

Deep architecture inspired by a neural network model used to solve Question Answering toy tasks [*]

Analogy

Question : Answers = User Profile : Items

[*] J. Weston et al. “Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks”. In: CoRR abs/1502.05698 (2015)

AMAR: Ask Me Any Rating

User and Item are modeled through two embeddings

EMBEDDINGS ARE JOINTLY LEARNED







Given an item, its textual description w1, ..., wn is represented through an RNN with LSTM cells

The LSTM generates a latent representation h(wi) for each word wi

The final representation of the item is obtained through a MEAN POOLING LAYER
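The mean pooling step can be sketched as follows, assuming the per-word latent representations h(w1), ..., h(wn) have already been produced by the LSTM. The vectors and the two-dimensional size below are made up for illustration and are not values from the presentation.

```python
def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors element-wise."""
    if not vectors:
        raise ValueError("mean_pool needs at least one vector")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

# Hypothetical latent representations h(w1), h(w2), h(w3) of a three-word description.
hidden_states = [
    [0.2, -0.4],
    [0.6, 0.0],
    [0.1, 0.1],
]
item_representation = mean_pool(hidden_states)
print(item_representation)
```

Averaging gives a fixed-size item vector regardless of how many words the description contains.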





The resulting embeddings are merged through a CONCATENATION LAYER








A LOGISTIC REGRESSION LAYER estimates the user's interest in the item; the estimated scores are then used to build the recommendation list.
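A rough sketch of the last two layers, assuming the user embedding and the pooled item representation are already available as plain lists. The function names `predict_like` and `top_n`, the weights, and the item vectors are hypothetical; in the real model all of them would be learned jointly.

```python
import math

def predict_like(user_emb, item_repr, weights, bias):
    """Concatenate the two vectors and apply logistic regression."""
    features = user_emb + item_repr  # concatenation layer
    if len(features) != len(weights):
        raise ValueError("weights must match the concatenated feature size")
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # estimated probability of a "like"

def top_n(user_emb, items, weights, bias, n=5):
    """Rank candidate items by estimated interest, highest first."""
    scored = [(predict_like(user_emb, r, weights, bias), name)
              for name, r in items.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:n]]

# Illustrative values only.
user = [0.5, -0.2]
items = {"item_a": [0.9, 0.1], "item_b": [-0.7, 0.3], "item_c": [0.2, 0.2]}
weights = [0.4, 0.1, 1.0, -0.5]
bias = 0.0
print(top_n(user, items, weights, bias, n=2))
```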





AMAR+

AMAR has a very modular and extensible architecture

It is possible to add extra modules to encode more information beyond the simple description of the item





AMAR+ introduces a GENRE EMBEDDING, which represents the genres associated with the item to be recommended

For each genre g1, …, gm associated with an item, a genre embedding is learned. All the embeddings are averaged through a MEAN POOLING LAYER.







The new information is merged with the other embeddings, and the pipeline again estimates the user's preference for the item
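The genre module can be sketched as one more vector joined to the concatenation. In this sketch the genre table, its values, and the helper name `amar_plus_features` are assumptions made for illustration; the slides only state that genre embeddings are learned, averaged, and merged.

```python
def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors element-wise."""
    if not vectors:
        raise ValueError("mean_pool needs at least one vector")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

# Hypothetical learned genre embeddings (size 2 here to keep the example small).
genre_table = {
    "comedy": [0.1, 0.3],
    "drama": [0.5, -0.1],
    "romance": [0.0, 0.4],
}

def amar_plus_features(user_emb, item_repr, genres, table):
    """Build the merged feature vector: user, item description, pooled genres."""
    genre_repr = mean_pool([table[g] for g in genres])
    return user_emb + item_repr + genre_repr

features = amar_plus_features([0.5, -0.2], [0.3, -0.1], ["comedy", "drama"], genre_table)
print(features)
```

The merged vector would then feed the same logistic regression layer, which is what makes the architecture easy to extend with further modules.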



Experiments

How does our deep architecture perform when compared to other content-based recommender systems or state-of-the-art baselines?



Datasets

MovieLens 1M (ML1M)

◦ 6,040 users

◦ 3,883 movies

◦ 1,000,209 ratings

◦ 57.51% positive ratings

◦ 165.59 ratings/user (avg.)

◦ 269.88 ratings/item (avg.)

◦ 99.4% sparsity



Datasets

DBbook

◦ 6,181 users

◦ 6,733 items

◦ 72,732 ratings

◦ 45.86% positive ratings

◦ 11.71 ratings/user (avg.)

◦ 10.74 ratings/item (avg.)

◦ 99.8% sparsity



Experimental Settings

Top-N recommendation task

Metric

◦ F1@5
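For reference, F1@5 is commonly computed as the harmonic mean of precision and recall over the top five recommendations. The sketch below follows that common definition for a single user; the exact evaluation protocol used in the experiments (for example, how scores are averaged across users) is not given in the slides, so treat the details as assumptions.

```python
def f1_at_k(ranked_items, relevant_items, k=5):
    """F1 over the top-k recommendations for one user.

    Precision divides by k and recall by the number of relevant items,
    which is one common convention among several.
    """
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    if hits == 0:
        return 0.0
    precision = hits / k
    recall = hits / len(relevant_items)
    return 2 * precision * recall / (precision + recall)

# Hypothetical ranking and ground truth for one user.
ranked = ["a", "b", "c", "d", "e", "f"]
relevant = {"a", "c", "f", "z"}
score = f1_at_k(ranked, relevant, k=5)
print(round(score, 3))
```

In this example two of the top five items are relevant, so precision is 2/5, recall is 2/4, and the F1 value is 4/9.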

AMAR parameters

◦ RMSprop optimizer, 25 epochs

◦ α = 0.9, learning rate 0.001

◦ Batch size 1536 (ML1M) and 512 (DBbook)

◦ Binary cross entropy as cost function

◦ User, Item and Genre embedding size = 10
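The optimizer and cost function listed above can be illustrated with a small pure-Python sketch. It assumes that the 0.9 value on the slide is the decay of RMSprop's moving average of squared gradients, and it uses one common form of the update rule; deep learning libraries differ slightly in where they place the epsilon term. The one-parameter model being fitted is purely hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross entropy for a single example, clipped for numerical safety."""
    y_pred = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))

def rmsprop_step(param, grad, avg_sq, lr=0.001, alpha=0.9, eps=1e-8):
    """One RMSprop update: scale the step by a running RMS of the gradients."""
    avg_sq = alpha * avg_sq + (1 - alpha) * grad ** 2
    param = param - lr * grad / (math.sqrt(avg_sq) + eps)
    return param, avg_sq

# Toy problem: a single weight w whose sigmoid should approach the target 1.0.
w, avg_sq = 0.0, 0.0
target = 1.0
initial_loss = binary_cross_entropy(target, sigmoid(w))
for _ in range(25):  # 25 update steps, not 25 full epochs
    grad = sigmoid(w) - target  # gradient of the loss with respect to w
    w, avg_sq = rmsprop_step(w, grad, avg_sq)
final_loss = binary_cross_entropy(target, sigmoid(w))
print(initial_loss, final_loss)
```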

Item Processing

◦ Mapping item names to Wikipedia pages

◦ Extraction of textual content from plots






Baselines

Word Embedding techniques

◦ Word2Vec

◦ GloVe

◦ Doc2Vec

◦ In Word2Vec and GloVe, items and profiles are represented as the centroid vector of the representations of the words occurring in the textual descriptions
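The centroid representation used by the Word2Vec and GloVe baselines can be sketched as below. The tiny word-vector table is made up, and comparing profile and item with cosine similarity is an assumption on our part; the slides only describe how the vectors are built, not how they are matched.

```python
import math

def centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("centroid needs at least one vector")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical pre-trained word vectors (real ones have hundreds of dimensions).
word_vectors = {
    "space": [0.9, 0.1],
    "adventure": [0.7, 0.3],
    "romance": [0.1, 0.9],
}

def embed_text(tokens, table):
    """Represent a description as the centroid of its known word vectors."""
    return centroid([table[t] for t in tokens if t in table])

item = embed_text(["space", "adventure"], word_vectors)
profile = embed_text(["space", "space", "romance"], word_vectors)
print(cosine(profile, item))
```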

Collaborative Filtering and Matrix Factorization techniques

◦ U2U-CF, I2I-CF

◦ BPRMF [*], BPRSlim [+], WRMF

◦ Optimal parameters

◦ All available in the MyMediaLite toolkit



[*] S. Rendle, C. Freudenthaler, Z. Gantner, L. Schmidt-Thieme: BPR: Bayesian Personalized Ranking from Implicit Feedback. UAI 2009.

[+] X. Ning, G. Karypis: SLIM: Sparse Linear Methods for Top-N Recommender Systems. ICDM 2011.

Results – MovieLens data

F1@5 (bar chart values)

AMAR 0.555

AMAR+ 0.558

Word Embedding techniques

Word2Vec 0.49

Doc2Vec 0.482

GloVe 0.485

Collaborative Filtering and Matrix Factorization techniques

U2U 0.427

I2I 0.431

BPRMF 0.425

WRMF 0.423

BPRSlim 0.446

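Results – DBbook data

F1@5, bar chart values (method order assumed to match the MovieLens chart)

AMAR 0.564

AMAR+ 0.565

Word Embedding techniques

Word2Vec 0.542

Doc2Vec 0.54

GloVe 0.552

Collaborative Filtering and Matrix Factorization techniques

U2U 0.536

I2I 0.536

BPRMF 0.508

WRMF 0.519

BPRSlim 0.511

AMAR and AMAR+ outperform all the baselines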

Recap

AMAR: a deep architecture for content-based recommendation exploiting RNNs

◦ A neural network predicts the likelihood that a user would like a certain item

◦ User and Item embeddings are jointly learned

◦ LSTMs model the textual descriptions of the items

Results

AMAR and AMAR+ significantly outperform all the baselines

Modular and extensible architecture: AMAR+ introduces a genre embedding

High training time (ML1M: 90 minutes per epoch; DBbook: 50 minutes per epoch)



Thanks!


