Bayesian Personalized Feature Interaction Selection for Factorization Machines
Yifan Chen1,2 Pengjie Ren1 Yang Wang3 Maarten de Rijke1
1University of Amsterdam
2National University of Defense Technology
3Hefei University of Technology
Yifan Chen, Pengjie Ren, Yang Wang, Maarten de Rijke SIGIR 2019 1 / 28
Outline
- Introduction
  - Factorization Machines
  - Feature Interaction Selection
Factorization Machines
What is a Factorization Machine?
- a generic supervised learning method
- accounts for feature interactions with factored parameters
- models combinations of features

Example: the hashtags "comics", "marvel", "avengers" yield the feature combinations ("comics", "marvel"), ("comics", "avengers"), ("marvel", "avengers").
Factorization Machines
- Linear regression: $O(d)$
  $\hat{r}(x) = b_0 + \sum_{i=1}^{d} w_i x_i$
- Degree-2 polynomial regression: $O(d^2)$
  $\hat{r}(x) = b_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=i+1}^{d} w_{ij} \cdot x_i x_j$
- Factorization machine: $O(dk)$
  $\hat{r}(x) = b_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle v_i, v_j \rangle \cdot x_i x_j$
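The $O(dk)$ complexity of the FM comes from linearizing the pairwise term with Rendle's identity, $\sum_{i<j} \langle v_i, v_j \rangle x_i x_j = \tfrac{1}{2}\sum_f \big[(\sum_i V_{if} x_i)^2 - \sum_i V_{if}^2 x_i^2\big]$. A minimal NumPy sketch (the function and variable names are mine, not from the slides):

```python
import numpy as np

def fm_predict(x, b0, w, V):
    """FM prediction in O(dk): the pairwise interaction term
    sum_{i<j} <v_i, v_j> x_i x_j is computed per latent factor
    instead of per feature pair."""
    linear = b0 + w @ x
    s = V.T @ x                                      # (k,) per-factor sums
    pairwise = 0.5 * (s @ s - ((V ** 2).T @ (x ** 2)).sum())
    return linear + pairwise
```

Evaluating the double sum directly would cost $O(d^2 k)$; the per-factor form is what makes FMs practical on high-dimensional sparse features.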
Factorization Machines
Example
$\hat{r}(\text{spider-man}) = b_0 + w_{\text{comics}} + w_{\text{marvel}} + w_{\text{avengers}} + \langle v_{\text{comics}}, v_{\text{marvel}} \rangle + \langle v_{\text{comics}}, v_{\text{avengers}} \rangle + \langle v_{\text{marvel}}, v_{\text{avengers}} \rangle$

(hashtags "comics", "marvel", "avengers" and their three pairwise combinations)
Factorization Machines for Recommendation
- Effective use of historical interactions between users and items
- Incorporates additional information associated with users or items
- High-dimensional feature space
  - #features = #users + #items + #additional
  - not all features or feature interactions are helpful
Feature Interaction Selection (FIS)

Filter out useless feature interactions
- FIS: select a common set of interactions shared by all users
- P-FIS: select feature interactions for each user personally

Figure: from the candidate interactions x1·x2, x1·x3, x1·x4, x2·x3, x2·x4, x3·x4, FIS keeps one common subset for the FM of all users, while P-FIS keeps a different subset for each of the users u1 and u2.
Outline
- Introduction
- Model description
  - Bayesian personalized feature interaction selection
  - Efficient optimization
Personalized Factorization Machines (PFM)
FM:
$\hat{r}(x) = b_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=i+1}^{d} w_{ij} \cdot x_i x_j$

PFM:
$\hat{r}(x) = b_u + \sum_{i=1}^{d} w_{ui} x_i + \sum_{i=1}^{d} \sum_{j=i+1}^{d} w_{uij} \cdot x_i x_j$

Select first-order interactions $\{x_i\}$ and second-order interactions $\{x_i x_j\}$ through $\{w_{ui}\}$ and $\{w_{uij}\}$.
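The PFM is the FM with user-indexed, selection-gated weights. An illustrative sketch of scoring one user; the mask arguments `s1_u`/`s2_u` and all names are my own, not the paper's API:

```python
import numpy as np

def pfm_predict(x, b_u, w_u, W_u, s1_u, s2_u):
    """Per-user PFM score: bias b_u, first-order weights w_u gated by
    binary selections s1_u, and second-order weights W_u (upper
    triangle) gated by s2_u. The binary masks are what a personalized
    feature interaction selection method would learn per user."""
    d = len(x)
    score = b_u + (s1_u * w_u) @ x
    for i in range(d):
        for j in range(i + 1, d):
            score += s2_u[i, j] * W_u[i, j] * x[i] * x[j]
    return score
```

Setting every mask entry to 1 recovers the plain FM-style score, which is why selection can be layered onto an existing FM without changing its functional form.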
Bayesian Variable Selection (BVS)
- Apply BVS to select feature interactions
  - avoids expensive cross-validation
- Priors for BVS
  - sparsity priors
  - spike-and-slab priors
Bayesian Variable Selection
Spike-and-slab
- spike: a point mass at zero, $p(w = 0) = 0.5$
- slab: a broad density over nonzero values

Sparsity priors (Laplace)
- $f(w) = \frac{1}{2b} \exp\left(-\frac{|w - \mu|}{b}\right)$
- $p(w = 0) = 0$
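The practical difference between the two prior families is whether a draw can be exactly zero, i.e. whether the prior can switch an interaction off outright. A quick simulation (the parameter choices are mine):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Spike-and-slab: s ~ Bernoulli(0.5), w_tilde ~ N(0, 1), w = s * w_tilde
s = rng.random(n) < 0.5
w_ss = s * rng.normal(size=n)

# Laplace sparsity prior: shrinks weights toward zero, but a continuous
# density puts no probability mass on the point w = 0
w_lap = rng.laplace(loc=0.0, scale=1.0, size=n)

print((w_ss == 0.0).mean())   # close to 0.5: point mass at zero
print((w_lap == 0.0).mean())  # essentially 0: no atom at zero
```

This is why spike-and-slab, rather than a pure shrinkage prior, is the natural choice when the goal is explicit selection.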
Hereditary Spike-and-Slab Priors
- Spike-and-slab
  $s \sim \mathrm{Bernoulli}(\pi)$, $\tilde{w} \sim \mathcal{N}(0, 1)$, $w = \tilde{w} \cdot s$
- Hereditary spike-and-slab
  - captures the relation between first-order and second-order feature interactions
  $s_{ui}, s_{uj} \sim \mathrm{Bernoulli}(\pi_1)$
  $p(s_{uij} = 1 \mid s_{ui} s_{uj} = 1) = 1$ (strong heredity)
  $p(s_{uij} = 1 \mid s_{ui} + s_{uj} = 1) = \pi_2$ (weak heredity)
  $p(s_{uij} = 1 \mid s_{ui} + s_{uj} = 0) = 0$
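The three heredity cases read directly as one conditional sampler; a direct transcription (the function name and rng plumbing are mine):

```python
import numpy as np

def sample_suij(s_ui, s_uj, pi2, rng):
    """Draw the second-order selection s_uij given the first-order
    selections of its two parent features i and j."""
    if s_ui == 1 and s_uj == 1:    # strong heredity: both parents selected
        return 1
    if s_ui + s_uj == 1:           # weak heredity: exactly one parent selected
        return int(rng.random() < pi2)
    return 0                       # neither parent selected: never included

rng = np.random.default_rng(0)
draws = [sample_suij(1, 0, 0.3, rng) for _ in range(10_000)]
# the empirical rate in the weak-heredity case approaches pi2 = 0.3
```

The prior thus never selects an interaction x_i·x_j for a user unless at least one of its constituent features is selected for that user.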
Generative Procedure of BP-FIS
Algorithm: Generative procedure
1:  for each user u ∈ U do
2:    for each feature i ∈ F do
3:      draw first-order interaction selection variable s_ui ∼ Bernoulli(π₁)
4:      draw first-order interaction weight w̃_i ∼ N(0, 1)
5:      w_ui = s_ui · w̃_i
6:    for each feature pair i, j ∈ F do
7:      draw second-order interaction selection variable s_uij ∼ p(s_uij | s_ui, s_uj)
8:      draw second-order interaction weight w̃_ij ∼ N(0, 1)
9:      w_uij = s_uij · w̃_ij
10: for each feature vector x ∈ X do
11:   calculate the rating prediction r̂(x) by PFM
12:   draw r(x) ∼ p(r | r̂(x))
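The generative procedure can be simulated end to end for one user and one feature vector; a compact sketch (the π values, the Gaussian form of p(r | r̂), the noise scale, and all names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)
d, pi1, pi2 = 4, 0.6, 0.4

# first-order selections and weights (algorithm lines 2-5)
s1 = (rng.random(d) < pi1).astype(int)
w1 = s1 * rng.normal(size=d)

# second-order selections via the hereditary prior, then weights (lines 6-9)
s2 = np.zeros((d, d), dtype=int)
w2 = np.zeros((d, d))
for i in range(d):
    for j in range(i + 1, d):
        if s1[i] and s1[j]:                  # strong heredity
            s2[i, j] = 1
        elif s1[i] or s1[j]:                 # weak heredity
            s2[i, j] = int(rng.random() < pi2)
        w2[i, j] = s2[i, j] * rng.normal()

# rating generation for one feature vector (lines 10-12)
x = rng.normal(size=d)
r_hat = w1 @ x + sum(w2[i, j] * x[i] * x[j]
                     for i in range(d) for j in range(i + 1, d))
r = r_hat + 0.1 * rng.normal()               # p(r | r_hat) taken as Gaussian noise
```

Deselected weights come out exactly zero, so the sampled model only ever scores the interactions the prior switched on for this user.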
Optimization
Maximum a posteriori: $\arg\max_{\tilde{W}, S} p(\tilde{W}, S \mid R, X)$

Exact inference is infeasible
- space complexity: $O(md^2)$
- time complexity: $O(2^{md^2})$

Variational inference
- approximate $p(\tilde{W}, S \mid R, X)$ by $q(\tilde{W}, S)$
- space complexity: $O(md)$
- Stochastic Gradient Variational Bayes (SGVB)
  - time complexity: $O(dk)$, the same as FMs
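SGVB rests on the reparameterization trick: writing a Gaussian draw as $\tilde{w} = \mu + \sigma \cdot \epsilon$ with $\epsilon \sim \mathcal{N}(0,1)$ lets gradients flow to the variational parameters through the sample. A toy sketch minimizing $\mathbb{E}[(\tilde{w} - 2)^2]$ by stochastic gradient descent (all constants are my own; the Bernoulli selection variables additionally need a discrete relaxation, which this sketch omits):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, log_sigma, lr = 0.0, -1.0, 0.05

for _ in range(2000):
    eps = rng.normal()
    sigma = np.exp(log_sigma)
    w = mu + sigma * eps                     # reparameterized sample
    df_dw = 2.0 * (w - 2.0)                  # gradient of the toy loss (w - 2)^2
    mu -= lr * df_dw                         # chain rule: dw/dmu = 1
    log_sigma -= lr * df_dw * eps * sigma    # chain rule: dw/dlog_sigma = eps * sigma

# mu drifts toward the optimum at 2 while sigma collapses toward 0
```

Because every update touches only sampled quantities, the same scheme extends to minibatches of ratings, which is what keeps the per-example cost at the FM level.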
Outline
- Introduction
- Model description
- Experiment
Experimental Setup
Datasets
HetRec: Information Heterogeneity and Fusion in Recommender Systems
- MovieLens: rating and tagging
- LastFM: rating, tagging, social networking
- Delicious: rating, tagging, social networking

Baselines
- Factorization Machine (FM) (Rendle, 2010)
- Sparse Factorization Machine (SFM)
- Attentional Factorization Machine (AFM) (Xiao et al., 2017)
- Neural Factorization Machine (NFM) (He and Chua, 2017)
Experimental Setup
Our methods
Apply BP-FIS to a linear FM and a non-linear FM:
- BP-FM
- BP-NFM

Evaluation
Top-N recommendation
- leave-one-out cross-validation (LOOCV)
- ranking among 100 items
- metrics: HR@N and ARHR@N
Overall Performance
Table: Delicious

Method   HR@1     HR@10     ARHR@10
FM       0.0202   0.1147    0.0440
SFM      0.0229   0.1212    0.0465
AFM      0.0274   0.1169    0.0494
BP-FM    0.0278   0.1240**  0.0509*
NFM      0.0229   0.1065    0.0426
BP-NFM   0.0268   0.1289**  0.0504**

* and ** indicate that the best score is significantly better than the second best score with p < 0.1 and p < 0.05, respectively.

- SFM outperforms FM and AFM on HR@10: shows the need for FIS
- BP-FM and BP-NFM significantly outperform FM and NFM, respectively: shows the effect of P-FIS
Impact of Embedding Size
Figure: HR@10 on MovieLens for embedding sizes k = 64, 128, 256, comparing FM, SFM, AFM, NFM, BP-FM, and BP-NFM.

- k = 64: P-FIS has an insignificant effect on FMs
- k = 128, 256:
  - BP-FM and BP-NFM significantly outperform FM and NFM
  - BP-NFM does not outperform BP-FM
Case study
Figure: User 1, top-1 recommendation; personalized interaction selection over the hashtags "action", "buddy", "comedy", and "sequel".
Case study
Figure: User 2, top-5 recommendation; personalized interaction selection over the hashtags "action", "buddy", "comedy", and "sequel".
Case study
Figure: User 3, not recommended; personalized interaction selection over the hashtags "action", "buddy", "comedy", and "sequel".
Outline
- Introduction
- Model description
- Experiment
- Conclusion
Conclusion
1. We study personalized feature interaction selection (P-FIS) for Factorization Machines.
2. We propose a Bayesian personalized feature interaction selection (BP-FIS) method based on Bayesian variable selection.
   - We propose hereditary spike-and-slab priors to achieve P-FIS.
   - BP-FIS is a plug-and-play framework for FMs.
3. We design an efficient optimization algorithm based on Stochastic Gradient Variational Bayes (SGVB).
Future Work
1. Extend BP-FIS to select higher-order feature interactions
2. Consider group-level personalization via clustering to speed up training
Thank You
Source code
https://github.com/yifanclifford/BP-FIS

Contact
- Yifan Chen (https://sites.google.com/view/yifanchenuva/home)
- y.chen4@uva.nl
Xiangnan He and Tat-Seng Chua. 2017. Neural Factorization Machines for Sparse Predictive Analytics. In SIGIR. ACM, 355–364.
Steffen Rendle. 2010. Factorization Machines. In ICDM. IEEE, 995–1000.
Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, and Tat-Seng Chua. 2017. Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks. In IJCAI. 3119–3125.