
Relational learning using bilinear models and its application in E-commerce

Shenghuo Zhu (shenghuo.zhu@alibaba-inc.com)

Alibaba Group

May 2, 2015

Collaborators: Rong Jin, Qi Qian, Lijun Zhang, Mehrdad Mahdavi, Tianbao Yang


Recommendation in e-commerce

From http://ju.taobao.com/


Personalized search in e-commerce

From http://tw.taobao.com/


Recommendation to sellers

From Taobao’s seller interface.


Relation and ranking

In the above tasks, we consider the relationship between

buyer and item

buyer/query and item

seller and item

seller and buyer

...

and rank items (or buyers) conditioned on a buyer (or a seller, or a query-buyer pair).


Relation as function

Buyer and item: u denotes a buyer, v an item; the scoring function is y(v; u).

Ranking by scores: y(v_i; u) > y(v_j; u) ⟹ u prefers v_i over v_j.


Ranking by segmentation

Assume there are K underlying user segments, and users u belonging to segment k share the same scoring function:

y(v; u) = g_k(v)

User u bought item w. Let segment k be the set of all users who bought item w, and let g_k(v) be the preference score (from purchase history) of item v within segment k. This is item-based collaborative filtering.
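A minimal numerical sketch of this segment-based scoring, assuming a toy binary purchase matrix T (users × items): the segment for item w is the set of users who bought w, and g_k(v) is the purchase count of item v within that segment. All names and data here are illustrative.

```python
import numpy as np

def item_based_cf_scores(T, w):
    """g_k(v) for the segment k = {users who bought item w}.

    T : (n_users, n_items) binary purchase matrix (toy data).
    w : index of an item the target user bought.
    """
    segment = T[:, w] > 0              # users belonging to segment k
    return T[segment].sum(axis=0)      # purchase count of each item v within the segment

# toy example: 4 users x 5 items
T = np.array([[1, 0, 1, 0, 0],
              [1, 1, 0, 0, 1],
              [0, 1, 0, 1, 0],
              [1, 0, 0, 1, 0]])
scores = item_based_cf_scores(T, w=0)   # the target user bought item 0
ranking = np.argsort(-scores)           # rank items by y(v; u) = g_k(v)
print(scores, ranking)
```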


Ranking using mixture

When user u has bought more than one item, the strict segmentation assumption above is relaxed; this is usually handled via similarity between users.

In general terms, the scoring function of u is a linear combination of the g_k:

y(v; u) = ∑_{k=1}^{K} β_k g_k(v),

where the segmentation is latent.


Ranking as matrix factorization

For each user u,

y(v; u) = ∑_k β_k g_k(v)

Since each user has its own coefficients, write β_k = f_k(u):

y(v; u) = ∑_k f_k(u) g_k(v)

Collecting y(v; u), f_k(u) and g_k(v) into matrices:

Y = G F^⊤ ≈ T

Convex formulation: impose a low-rank constraint on Y [YLZG09].
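A minimal sketch of the rank-K factorization Y = G F^⊤, assuming T is a toy, fully observed target matrix with items as rows and users as columns, and using a truncated SVD as the fitting procedure. This stands in for, and is not, the convex low-rank formulation of [YLZG09]; real interaction data would be sparse and incomplete, where ALS or nuclear-norm solvers apply.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((100, 80))   # toy target: rows index items v, columns index users u
K = 5

# best rank-K approximation of T via truncated SVD (illustrative fitting procedure)
U, s, Vt = np.linalg.svd(T, full_matrices=False)
G = U[:, :K] * s[:K]                 # item factors, G[v, k] plays the role of g_k(v)
F = Vt[:K].T                         # user factors, F[u, k] plays the role of f_k(u)
Y = G @ F.T                          # Y = G F^T, the rank-K score matrix

print(np.linalg.norm(T - Y))         # residual of the low-rank fit
```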


Ranking in bilinear model

User features x_u and item features z_v.

f_k(u) = ⟨a_k, x_u⟩,  g_k(v) = ⟨b_k, z_v⟩

y(v; u) = ∑_k f_k(u) g_k(v) = ⟨x_u, W z_v⟩,

where W = ∑_k a_k b_k^⊤.

Collecting y(v; u), f_k(u) and g_k(v) into matrices:

Y = X^⊤ W Z
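A small numerical check of the bilinear form, with random illustrative features: it verifies ∑_k f_k(u) g_k(v) = ⟨x_u, W z_v⟩ for W = ∑_k a_k b_k^⊤, and builds Y = X^⊤ W Z over a toy population.

```python
import numpy as np

rng = np.random.default_rng(0)
d_u, d_v, K = 20, 30, 4                 # user/item feature dims and latent rank (illustrative)

A = rng.standard_normal((d_u, K))       # columns are a_k
B = rng.standard_normal((d_v, K))       # columns are b_k
W = A @ B.T                             # W = sum_k a_k b_k^T, rank <= K

x_u = rng.standard_normal(d_u)          # a user's feature vector
z_v = rng.standard_normal(d_v)          # an item's feature vector

y_bilinear = x_u @ W @ z_v                                             # <x_u, W z_v>
y_factored = sum((A[:, k] @ x_u) * (B[:, k] @ z_v) for k in range(K))  # sum_k f_k(u) g_k(v)
assert np.isclose(y_bilinear, y_factored)

# matrix form over a toy population: Y = X^T W Z
X = rng.standard_normal((d_u, 50))      # columns are user feature vectors x_u
Z = rng.standard_normal((d_v, 60))      # columns are item feature vectors z_v
Y = X.T @ W @ Z                         # 50 users x 60 items score matrix
```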


Issues

How to control the complexity of the learning space? Use the rank of W, or the nuclear norm ‖W‖_*.

When the features are high dimensional, can we take advantage of the low complexity of W to reduce the computational cost?

The model is essentially a linear model (checked numerically below):

y ≡ vec(Y) = (Z ⊗ X)^⊤ vec(W) ≡ x^⊤ w.

A projection approach for this linear model is presented in this talk.
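A quick numerical check of the vectorized linear form, assuming column-major vec, for which vec(X^⊤ W Z) = (Z ⊗ X)^⊤ vec(W); dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))      # user features (3-dim), 4 users as columns
Z = rng.standard_normal((5, 6))      # item features (5-dim), 6 items as columns
W = rng.standard_normal((3, 5))

def vec(M):
    """Column-major (Fortran-order) vectorization."""
    return M.reshape(-1, order="F")

Y = X.T @ W @ Z                      # 4 x 6 score matrix
lhs = vec(Y)
rhs = np.kron(Z, X).T @ vec(W)       # (Z kron X)^T vec(W)
assert np.allclose(lhs, rhs)         # vec(Y) = (Z kron X)^T vec(W)
```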


Solve via random projection

To solve:

w∗ = arg min_w (λ/2) ‖w‖² + ∑_i ℓ(x_i^⊤ w, y_i).

Approach: generate a random projection R of rank m, and let x̂_i = R x_i. Solve:

v = arg min_v (λ/2) ‖v‖² + ∑_i ℓ(x̂_i^⊤ v, y_i).

Recover: w = R^⊤ v.
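A minimal sketch of this baseline, assuming a Gaussian random projection and ridge regression (squared loss) standing in for the generic loss ℓ; data and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, lam = 500, 2000, 50, 1.0              # samples, dimension, projected dim, lambda

X = rng.standard_normal((n, d))                # rows are the x_i (toy data)
y = rng.standard_normal(n)

R = rng.standard_normal((m, d)) / np.sqrt(m)   # Gaussian random projection of rank m
Xp = X @ R.T                                   # projected features: row i is R x_i

# ridge regression (squared loss) in the projected space:
#   v = argmin_v  lam/2 * ||v||^2 + 1/2 * sum_i ((R x_i)^T v - y_i)^2
v = np.linalg.solve(Xp.T @ Xp + lam * np.eye(m), Xp.T @ y)

w_naive = R.T @ v                              # naive recovery, confined to the row space of R
```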


Issue

The recovered w is confined to the subspace spanned by R, since w = R^⊤ v.

Theorem 3 of [ZMJ+13]

For any 0 < ε ≤ 1/2, with probability 1 − exp(−(d − r)/32) − exp(−m/32) − δ, we have

‖w − w∗‖₂ ≥ (1/2) √((d − r)/m) (1 − ε√(2(1 + ε))/(1 − ε)) ‖w∗‖₂.


Dual space

Dual variable and function

ℓ*(α_i, y_i) = sup_ξ { α_i ξ − ℓ(ξ, y_i) }

Dual problem

α∗ = arg min_α (1/(2λ)) α^⊤ X^⊤ X α + ∑_i ℓ*(α_i, y_i).

Dual problem after random projection

α = arg min_α (1/(2λ)) α^⊤ X^⊤ R^⊤ R X α + ∑_i ℓ*(α_i, y_i).


Proposition

For any 0 < ε ≤ 1/2, with probability at least 1 − δ, we have

‖α − α∗‖_K ≤ (ε/(1 − ε)) ‖α∗‖_K,

provided m = Ω(ε⁻² log δ⁻¹).


Random Projection Dual Recovery (DuRP)

Generate a random projection R of rank m, and let x̂_i = R x_i.

Solve:

v = arg min_v (λ/2) ‖v‖² + ∑_i ℓ(x̂_i^⊤ v, y_i).

Obtain the dual variables: α_i = ℓ′(x̂_i^⊤ v, y_i)

Recover the primal solution: w = −(1/λ) ∑_i α_i x_i.
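A minimal DuRP sketch under illustrative assumptions (Gaussian projection, squared loss, synthetic low-rank data); it follows the four steps above and compares the recovered solution against the exact ridge solution of the original problem.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, r, m, lam = 500, 2000, 10, 200, 1.0      # samples, dim, data rank, projected dim, lambda

# synthetic data matrix of low rank r (the regime where dual recovery is accurate)
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, d))
y = X @ rng.standard_normal(d) / np.sqrt(d) + 0.01 * rng.standard_normal(n)

# 1) random projection
R = rng.standard_normal((m, d)) / np.sqrt(m)
Xp = X @ R.T                                   # row i is the projected feature R x_i

# 2) solve the m-dimensional problem with squared loss l(s, y_i) = (s - y_i)^2 / 2
v = np.linalg.solve(Xp.T @ Xp + lam * np.eye(m), Xp.T @ y)

# 3) dual variables alpha_i = l'( (R x_i)^T v, y_i )
alpha = Xp @ v - y

# 4) recover the primal solution in the original d-dimensional space
w_durp = -(1.0 / lam) * X.T @ alpha

# reference: exact solution of the original ridge problem (via its n x n kernel form)
w_star = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)
# relative recovery error; shrinks as m grows relative to the data rank r
print(np.linalg.norm(w_durp - w_star) / np.linalg.norm(w_star))
```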


Approximation error of DuRP

Theorem 2 of [ZMJ+13]

For any 0 < ε ≤ 1/2, with probability at least 1 − δ, we have

‖w − w∗‖₂ ≤ (ε/(1 − ε)) ‖w∗‖₂,

provided m ≥ (r + 1) log(2r/δ) / (c ε²).


Dual variables in bilinear model

The dual variables α can be reshaped into a matrix A, where the nonzero entries correspond to the user-item pairs with observed interactions.

Then the recovered matrix is written as

W = X A Z^⊤.


DuRP for high dimensional bi-linear model

The linear representation, i.e. Z ⊗ X, has very high dimension. Use a structured random projection R = R_2 ⊗ R_1 [QJZL13].

The recovered matrix W = X A Z^⊤ is high dimensional and usually dense, making it difficult to serve online.

Approximate it as a product of two low-rank matrices, using approximate (randomized) SVD [HMT11], as sketched below.
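A sketch of this last step, assuming synthetic features and a mostly-zero dual-variable matrix A; a randomized range finder in the spirit of [HMT11] is applied to W = X A Z^⊤ through its factors, so the dense W is never materialized. Dimensions here are tiny compared with production scale, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_u, d_v, n_users, n_items = 200, 300, 1000, 1500

X = rng.standard_normal((d_u, n_users))   # user features (columns are users)
Z = rng.standard_normal((d_v, n_items))   # item features (columns are items)

# dual-variable matrix A: nonzero only on observed user-item pairs (illustrative sparsity)
A = np.zeros((n_users, n_items))
rows = rng.integers(0, n_users, 5000)
cols = rng.integers(0, n_items, 5000)
A[rows, cols] = rng.standard_normal(5000)

k, p = 10, 5                                  # target rank and oversampling
Omega = rng.standard_normal((d_v, k + p))

# randomized range finder: multiply W = X A Z^T by Omega through its factors,
# so the dense d_u x d_v matrix W never has to be formed
Y_range = X @ (A @ (Z.T @ Omega))             # d_u x (k+p) sample of the range of W
Q, _ = np.linalg.qr(Y_range)                  # orthonormal basis for that range

B = ((Q.T @ X) @ A) @ Z.T                     # (k+p) x d_v projected matrix Q^T W
Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
U = Q @ Ub                                    # W is approximately U diag(s) Vt

# low-rank factors kept for online scoring: y(v; u) ~ (x_u^T left) (right z_v)
left = U[:, :k] * s[:k]                       # d_u x k
right = Vt[:k]                                # k x d_v
```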


Summary

Many applications using relational data

A bilinear model is a straightforward approach

To learn from massive data, use Random Projection Dual Recovery (DuRP).


Reference

[HMT11] N. Halko, P. Martinsson, and J. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217–288, 2011.

[QJZL13] Qi Qian, Rong Jin, Shenghuo Zhu, and Yuanqing Lin. An integrated framework for high dimensional distance metric learning and its application to fine-grained visual categorization. Technical Report 2013-TR102, NEC Laboratories America, 2013. arXiv:1402.0453.

[YLZG09] Kai Yu, John Lafferty, Shenghuo Zhu, and Yihong Gong. Large-scale collaborative prediction using a nonparametric random effects model. In ICML'09: Proceedings of the 26th Annual International Conference on Machine Learning, pages 1185–1192, New York, NY, USA, 2009. ACM.

[ZMJ+13] Lijun Zhang, Mehrdad Mahdavi, Rong Jin, Tianbao Yang, and Shenghuo Zhu. Recovering optimal solution by dual random projection. In COLT'13: The 26th Annual Conference on Learning Theory, 2013.
