
Bayesian Personalized Feature Interaction Selection for Factorization Machines

Yifan Chen (1,2), Pengjie Ren (1), Yang Wang (3), Maarten de Rijke (1)

(1) University of Amsterdam
(2) National University of Defense Technology
(3) Hefei University of Technology

SIGIR 2019

Outline

- Introduction
  - Factorization Machines
  - Feature Interaction Selection


Factorization Machines

What is a factorization machine?
- A generic supervised learning method
- Accounts for feature interactions with factorized parameters
- Models combinations of features

Example: a post with hashtag features “comics”, “marvel”, and “avengers” yields the feature combinations (“comics”, “marvel”), (“comics”, “avengers”), and (“marvel”, “avengers”).

Factorization Machines

- Linear regression: O(d)
  $\hat{r}(x) = b_0 + \sum_{i=1}^{d} w_i x_i$
- Degree-2 polynomial regression: O(d^2)
  $\hat{r}(x) = b_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=i+1}^{d} w_{ij} \, x_i x_j$
- Factorization machine: O(dk)
  $\hat{r}(x) = b_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle v_i, v_j \rangle \, x_i x_j$
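
The O(dk) cost of the pairwise term comes from Rendle's reformulation of the double sum. A minimal NumPy sketch of the FM predictor (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def fm_predict(x, b0, w, V):
    """Degree-2 FM prediction r_hat(x) in O(d k).

    x : (d,) feature vector
    w : (d,) first-order weights
    V : (d, k) factor matrix with one k-dimensional embedding v_i per feature

    Uses the identity
      sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ],
    so no explicit double loop over feature pairs is needed.
    """
    linear = b0 + w @ x
    xv = V.T @ x                                   # (k,) per-factor weighted sums
    pairwise = 0.5 * np.sum(xv ** 2 - (V ** 2).T @ (x ** 2))
    return linear + pairwise
```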

Factorization Machines

Example: for an item (e.g., Spider-Man) with the hashtag features “comics”, “marvel”, and “avengers”, each indicator feature equal to 1, the FM prediction expands to

$\hat{r}(\text{spider-man}) = b_0 + w_{\text{comics}} + w_{\text{marvel}} + w_{\text{avengers}} + \langle v_{\text{comics}}, v_{\text{marvel}} \rangle + \langle v_{\text{comics}}, v_{\text{avengers}} \rangle + \langle v_{\text{marvel}}, v_{\text{avengers}} \rangle$


Factorization Machines for Recommendation

- Effective use of historical interactions between users and items
- Incorporates additional information associated with users or items
- High-dimensional feature space: #features = #users + #items + #additional
- Not all features or feature interactions are helpful


Feature Interaction Selection (FIS)

Filter out useless feature interactions.

- FIS: select one common set of interactions for all users
- P-FIS: select feature interactions for each user personally

(Figure: for features x1–x4, a plain FM uses every pairwise interaction x_i · x_j; FIS keeps a single selected subset shared by users u1 and u2, whereas P-FIS keeps a possibly different subset for u1 and for u2.)

Outline

- Introduction
  - Factorization Machines
  - Feature Interaction Selection
- Model description
  - Bayesian personalized feature interaction selection
  - Efficient optimization

Personalized Factorization Machines (PFM)

FM:
$\hat{r}(x) = b_0 + \sum_{i=1}^{d} w_i x_i + \sum_{i=1}^{d} \sum_{j=i+1}^{d} w_{ij} \, x_i x_j$

PFM (for user u):
$\hat{r}(x) = b_u + \sum_{i=1}^{d} w_{ui} x_i + \sum_{i=1}^{d} \sum_{j=i+1}^{d} w_{uij} \, x_i x_j$

Select first-order interactions {x_i} and second-order interactions {x_i x_j} per user through the weights {w_ui} and {w_uij}.
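
Written naively, the PFM keeps a separate bias, first-order weight vector, and second-order weight matrix for every user. A small sketch of this direct form (variable names are mine, not the paper's implementation), mainly to make the memory cost visible:

```python
import numpy as np

def pfm_predict(x, u, b, W1, W2):
    """Naive PFM prediction for user u, directly following the PFM equation.

    x  : (d,)      feature vector
    b  : (m,)      per-user bias b_u
    W1 : (m, d)    per-user first-order weights w_ui
    W2 : (m, d, d) per-user second-order weights w_uij (upper triangle used)
    """
    d = x.shape[0]
    r = b[u] + W1[u] @ x
    for i in range(d):
        for j in range(i + 1, d):
            r += W2[u, i, j] * x[i] * x[j]
    return r
```

Storing W2 explicitly is exactly the O(md^2) cost that the optimization part below has to work around.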

Bayesian Variable Selection (BVS)

- Apply BVS to select feature interactions
  - Avoids expensive cross-validation
- Priors for BVS
  - Sparsity priors
  - Spike-and-slab priors

Bayesian Variable Selection

Spike-and-slab
- Spike: a point mass at zero, p(w = 0) = 0.5
- Slab: a broad continuous density over nonzero values

Sparsity priors (e.g., Laplace)
- $f(w) = \frac{1}{2b} \exp\!\left(-\frac{|w - \mu|}{b}\right)$
- p(w = 0) = 0
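
The practical difference between the two prior families is whether a weight can be exactly zero. A small sketch contrasting draws from a Laplace (sparsity) prior and a spike-and-slab prior (parameter values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_laplace(mu=0.0, b=1.0):
    """Sparsity (Laplace) prior with density (1 / (2b)) * exp(-|w - mu| / b).
    Draws concentrate around mu, but P(w == 0) = 0 exactly."""
    return rng.laplace(mu, b)

def sample_spike_and_slab(pi=0.5):
    """Spike-and-slab prior: s ~ Bernoulli(pi), w_tilde ~ N(0, 1), w = s * w_tilde,
    so the weight is exactly zero with probability 1 - pi."""
    s = rng.binomial(1, pi)
    return s * rng.normal(0.0, 1.0)
```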

Hereditary Spike-and-Slab Priors

- Spike-and-slab:
  $s \sim \mathrm{Bernoulli}(\pi), \quad \tilde{w} \sim \mathcal{N}(0, 1), \quad w = \tilde{w} \cdot s$
- Hereditary spike-and-slab: captures the relation between first-order and second-order feature interactions
  $s_{ui}, s_{uj} \sim \mathrm{Bernoulli}(\pi_1)$
  $p(s_{uij} = 1 \mid s_{ui} s_{uj} = 1) = 1$ (strong heredity)
  $p(s_{uij} = 1 \mid s_{ui} + s_{uj} = 1) = \pi_2$ (weak heredity)
  $p(s_{uij} = 1 \mid s_{ui} + s_{uj} = 0) = 0$
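
The hereditary part of the prior is just the conditional selection probability of a second-order interaction given its two first-order parents. A direct transcription of the three cases on the slide (function name is mine):

```python
def p_second_order_selected(s_ui: int, s_uj: int, pi2: float) -> float:
    """P(s_uij = 1 | s_ui, s_uj) under the hereditary spike-and-slab prior."""
    if s_ui == 1 and s_uj == 1:
        return 1.0          # strong heredity: both parents selected
    if s_ui + s_uj == 1:
        return pi2          # weak heredity: exactly one parent selected
    return 0.0              # neither parent selected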

Generative Procedure of BP-FIS

Algorithm: Generation procedure
 1: for each user u ∈ U do
 2:   for each feature i ∈ F do
 3:     draw the first-order interaction selection variable s_ui ~ Bernoulli(π1)
 4:     draw the first-order interaction weight w̃_i ~ N(0, 1)
 5:     w_ui = s_ui · w̃_i
 6:   for each feature pair i, j ∈ F do
 7:     draw the second-order interaction selection variable s_uij ~ p(s_uij | s_ui, s_uj)
 8:     draw the second-order interaction weight w̃_ij ~ N(0, 1)
 9:     w_uij = s_uij · w̃_ij
10:   for each feature vector x ∈ X do
11:     compute the rating prediction r̂(x) with the PFM
12:     draw r(x) ~ p(r | r̂(x))
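
Putting the hereditary prior and the selection variables together, the per-user part of the procedure (steps 2–9) can be sketched as a toy NumPy routine; the observed rating would then be drawn from p(r | r̂(x)) with r̂(x) computed by the PFM. This is an illustration under the slide's assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_user_weights(d, pi1, pi2):
    """Draw the user-specific BP-FIS weights for one user.

    Returns
    -------
    w1 : (d,)   first-order weights  w_ui  = s_ui  * w~_i
    w2 : (d, d) second-order weights w_uij = s_uij * w~_ij (upper triangle)
    """
    s1 = rng.binomial(1, pi1, size=d)            # s_ui ~ Bernoulli(pi1)
    w1 = s1 * rng.normal(0.0, 1.0, size=d)       # w_ui = s_ui * w~_i
    w2 = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            if s1[i] and s1[j]:                  # strong heredity
                p = 1.0
            elif s1[i] or s1[j]:                 # weak heredity
                p = pi2
            else:                                # neither parent selected
                p = 0.0
            s_ij = rng.binomial(1, p)            # s_uij ~ p(s_uij | s_ui, s_uj)
            w2[i, j] = s_ij * rng.normal(0.0, 1.0)
    return w1, w2
```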


Optimization

Maximum a posteriori: $\arg\max_{\tilde{W}, S} \; p(\tilde{W}, S \mid R, X)$

Exact inference is infeasible:
- space complexity: O(md^2)
- time complexity: O(2^{md^2})

Variational inference:
- approximate $p(\tilde{W}, S \mid R, X)$ by $q(\tilde{W}, S)$
- space complexity: O(md)
- Stochastic Gradient Variational Bayes (SGVB)
  - time complexity: O(dk), the same as for FMs
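
The slide only names SGVB, so the following is a generic illustration of the idea rather than the paper's algorithm: one reparameterized gradient step for a single Gaussian variational factor q(w) = N(mu, sigma^2) against an N(0, 1) prior. Handling the Bernoulli selection variables (e.g., through a continuous relaxation) is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgvb_step(mu, log_sigma, grad_loglik, lr=0.01):
    """One SGVB (reparameterization-trick) ascent step on the ELBO.

    grad_loglik(w) should return d log p(data | w) / d w at the sampled w.
    The KL term against the N(0, 1) prior is differentiated in closed form.
    """
    sigma = np.exp(log_sigma)
    eps = rng.normal()
    w = mu + sigma * eps                        # w = mu + sigma * eps, eps ~ N(0, 1)
    g = grad_loglik(w)
    grad_mu = g - mu                            # d ELBO / d mu
    grad_log_sigma = g * sigma * eps - (sigma ** 2 - 1.0)  # d ELBO / d log sigma
    return mu + lr * grad_mu, log_sigma + lr * grad_log_sigma
```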

Outline

- Introduction
  - Factorization Machines
  - Feature Interaction Selection
- Model description
  - Bayesian personalized feature interaction selection
  - Efficient optimization
- Experiment

Experimental Setup

Datasets (HetRec: Information Heterogeneity and Fusion in Recommender Systems)
- MovieLens: rating and tagging
- LastFM: rating, tagging, social networking
- Delicious: rating, tagging, social networking

Baselines
- Factorization Machine (FM) (Rendle, 2010)
- Sparse Factorization Machine (SFM)
- Attentional Factorization Machine (AFM) (Xiao et al., 2017)
- Neural Factorization Machine (NFM) (He and Chua, 2017)

Experimental Setup

Our methods: apply BP-FIS to a linear FM and a non-linear FM
- BP-FM
- BP-NFM

Evaluation: top-N recommendation
- leave-one-out cross-validation (LOOCV)
- ranking among 100 items
- metrics: HR@N and ARHR@N

Overall Performance

Table: Delicious

Method   HR@1     HR@10      ARHR@10
FM       0.0202   0.1147     0.0440
SFM      0.0229   0.1212     0.0465
AFM      0.0274   0.1169     0.0494
BP-FM    0.0278   0.1240**   0.0509*
NFM      0.0229   0.1065     0.0426
BP-NFM   0.0268   0.1289**   0.0504**

* and ** indicate that the best score is significantly better than the second-best score with p < 0.1 and p < 0.05, respectively.

- SFM outperforms FM and AFM on HR@10: shows the need for FIS
- BP-FM and BP-NFM significantly outperform FM and NFM, respectively: shows the effect of P-FIS

Impact of Embedding Size

(Figure: HR@10 on MovieLens for embedding sizes k = 64, 128, 256, comparing FM, SFM, AFM, NFM, BP-FM, and BP-NFM.)

- k = 64: P-FIS has an insignificant effect on FMs
- k = 128 and k = 256:
  - BP-FM and BP-NFM significantly outperform FM and NFM
  - BP-NFM does not outperform BP-FM

Case Study

(Figures: selected interactions among the hashtag features “action”, “buddy”, “comedy”, and “sequel” for three users.)

- User 1: the item is the top-1 recommendation
- User 2: the item appears in the top-5 recommendations
- User 3: the item is not recommended

Outline

- Introduction
  - Factorization Machines
  - Feature Interaction Selection
- Model description
  - Bayesian personalized feature interaction selection
  - Efficient optimization
- Experiment
- Conclusion

Conclusion

1. We study personalized feature interaction selection (P-FIS) for factorization machines.
2. We propose a Bayesian personalized feature interaction selection (BP-FIS) method based on Bayesian variable selection.
   - We propose hereditary spike-and-slab priors to achieve P-FIS.
   - BP-FIS is a plug-and-play framework for FMs.
3. We design an efficient optimization algorithm based on Stochastic Gradient Variational Bayes (SGVB).


Future Work

1. Extend BP-FIS to select higher-order feature interactions

2. Consider group-level personalization via clustering to speed up training


Thank You

Source code: https://github.com/yifanclifford/BP-FIS

Contact
- Yifan Chen (https://sites.google.com/view/yifanchenuva/home)
- [email protected]

References

Xiangnan He and Tat-Seng Chua. 2017. Neural Factorization Machines for Sparse Predictive Analytics. In SIGIR. ACM, 355–364.

Steffen Rendle. 2010. Factorization Machines. In ICDM. IEEE, 995–1000.

Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, and Tat-Seng Chua. 2017. Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks. In IJCAI. 3119–3125.