
Relational learning using bilinear models and its application in E-commerce

Shenghuo Zhu (shenghuo.zhu@alibaba-inc.com)

Alibaba Group

May 2, 2015

Collaborators: Rong Jin, Qi Qian, Lijun Zhang, Mehrdad Mahdavi, Tianbao Yang


Recommendation in e-commerce

From http://ju.taobao.com/


Personalized search in e-commerce

From http://tw.taobao.com/


Recommendation to sellers

From Taobao’s seller interface.


Relation and ranking

In the above tasks, we consider the relationship between

buyer and item

buyer/query and item

seller and item

seller and buyer

...

and rank items (or buyers) conditioned on a buyer (or a seller, or a query-buyer pair).


Relation as function

Buyer and item: u denotes a buyer, v an item; the scoring function is y(v; u).

Ranking by scores: y(v_i; u) > y(v_j; u) ⟹ u prefers v_i over v_j.


Ranking by segmentation

Assume there are K underlying user segments, and users u belonging to segment k share the same scoring function:

y(v; u) = g_k(v)

User u bought item w. Let segment k be the set of all users who bought item w, and let g_k(v) be the preference score (from purchase history) of item v within segment k. This is item-based collaborative filtering.
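A minimal numerical sketch of this segment-based scoring, assuming a toy binary purchase matrix T (users × items): the segment for item w is the set of users who bought w, and g_k(v) is the purchase count of item v within that segment. All names and data here are illustrative.

```python
import numpy as np

def item_based_cf_scores(T, w):
    """g_k(v) for the segment k = {users who bought item w}.

    T : (n_users, n_items) binary purchase matrix (toy data).
    w : index of an item the target user bought.
    """
    segment = T[:, w] > 0              # users belonging to segment k
    return T[segment].sum(axis=0)      # purchase count of each item v within the segment

# toy example: 4 users x 5 items
T = np.array([[1, 0, 1, 0, 0],
              [1, 1, 0, 0, 1],
              [0, 1, 0, 1, 0],
              [1, 0, 0, 1, 0]])
scores = item_based_cf_scores(T, w=0)   # the target user bought item 0
ranking = np.argsort(-scores)           # rank items by y(v; u) = g_k(v)
print(scores, ranking)
```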


Ranking using mixture

When user u has bought more than one item, the strict segmentation assumption above is relaxed; this is usually handled via similarity between users.

In general terms, the scoring function of u is a linear combination of the g_k:

y(v; u) = ∑_{k=1}^{K} β_k g_k(v),

where the segmentation is latent.


Ranking as matrix factorization

For each user u,

y(v; u) = ∑_k β_k g_k(v)

Since each user has its own coefficients, write β_k = f_k(u):

y(v; u) = ∑_k f_k(u) g_k(v)

Collecting y(v; u), f_k(u) and g_k(v) into matrices:

Y = G F^⊤ ≈ T

Convex formulation: impose a low-rank constraint on Y [YLZG09].
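A minimal sketch of the rank-K factorization Y = G F^⊤, assuming T is a toy, fully observed target matrix with items as rows and users as columns, and using a truncated SVD as the fitting procedure. This stands in for, and is not, the convex low-rank formulation of [YLZG09]; real interaction data would be sparse and incomplete, where ALS or nuclear-norm solvers apply.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((100, 80))   # toy target: rows index items v, columns index users u
K = 5

# best rank-K approximation of T via truncated SVD (illustrative fitting procedure)
U, s, Vt = np.linalg.svd(T, full_matrices=False)
G = U[:, :K] * s[:K]                 # item factors, G[v, k] plays the role of g_k(v)
F = Vt[:K].T                         # user factors, F[u, k] plays the role of f_k(u)
Y = G @ F.T                          # Y = G F^T, the rank-K score matrix

print(np.linalg.norm(T - Y))         # residual of the low-rank fit
```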


Ranking in bilinear model

User features x_u and item features z_v.

f_k(u) = ⟨a_k, x_u⟩,  g_k(v) = ⟨b_k, z_v⟩

y(v; u) = ∑_k f_k(u) g_k(v) = ⟨x_u, W z_v⟩,

where W = ∑_k a_k b_k^⊤.

Collecting y(v; u), f_k(u) and g_k(v) into matrices:

Y = X^⊤ W Z
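A small numerical check of the bilinear form, with random illustrative features: it verifies ∑_k f_k(u) g_k(v) = ⟨x_u, W z_v⟩ for W = ∑_k a_k b_k^⊤, and builds Y = X^⊤ W Z over a toy population.

```python
import numpy as np

rng = np.random.default_rng(0)
d_u, d_v, K = 20, 30, 4                 # user/item feature dims and latent rank (illustrative)

A = rng.standard_normal((d_u, K))       # columns are a_k
B = rng.standard_normal((d_v, K))       # columns are b_k
W = A @ B.T                             # W = sum_k a_k b_k^T, rank <= K

x_u = rng.standard_normal(d_u)          # a user's feature vector
z_v = rng.standard_normal(d_v)          # an item's feature vector

y_bilinear = x_u @ W @ z_v                                             # <x_u, W z_v>
y_factored = sum((A[:, k] @ x_u) * (B[:, k] @ z_v) for k in range(K))  # sum_k f_k(u) g_k(v)
assert np.isclose(y_bilinear, y_factored)

# matrix form over a toy population: Y = X^T W Z
X = rng.standard_normal((d_u, 50))      # columns are user feature vectors x_u
Z = rng.standard_normal((d_v, 60))      # columns are item feature vectors z_v
Y = X.T @ W @ Z                         # 50 users x 60 items score matrix
```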


Issues

How to control the complexity of the learning space? Use the rank of W, or the nuclear norm ‖W‖_*.

When the features are high dimensional, can we take advantage of the low complexity of W to reduce the computational cost?

The model is essentially a linear model (checked numerically below):

y ≡ vec(Y) = (Z ⊗ X)^⊤ vec(W) ≡ x^⊤ w.

A projection approach for this linear model is presented in this talk.
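A quick numerical check of the vectorized linear form, assuming column-major vec, for which vec(X^⊤ W Z) = (Z ⊗ X)^⊤ vec(W); dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 4))      # user features (3-dim), 4 users as columns
Z = rng.standard_normal((5, 6))      # item features (5-dim), 6 items as columns
W = rng.standard_normal((3, 5))

def vec(M):
    """Column-major (Fortran-order) vectorization."""
    return M.reshape(-1, order="F")

Y = X.T @ W @ Z                      # 4 x 6 score matrix
lhs = vec(Y)
rhs = np.kron(Z, X).T @ vec(W)       # (Z kron X)^T vec(W)
assert np.allclose(lhs, rhs)         # vec(Y) = (Z kron X)^T vec(W)
```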


Solve via random projection

To solve:

w∗ = arg min_w (λ/2) ‖w‖² + ∑_i ℓ(x_i^⊤ w, y_i).

Approach: generate a random projection R of rank m, and let x̂_i = R x_i. Solve:

v = arg min_v (λ/2) ‖v‖² + ∑_i ℓ(x̂_i^⊤ v, y_i).

Recover: w = R^⊤ v.
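A minimal sketch of this baseline, assuming a Gaussian random projection and ridge regression (squared loss) standing in for the generic loss ℓ; data and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, lam = 500, 2000, 50, 1.0              # samples, dimension, projected dim, lambda

X = rng.standard_normal((n, d))                # rows are the x_i (toy data)
y = rng.standard_normal(n)

R = rng.standard_normal((m, d)) / np.sqrt(m)   # Gaussian random projection of rank m
Xp = X @ R.T                                   # projected features: row i is R x_i

# ridge regression (squared loss) in the projected space:
#   v = argmin_v  lam/2 * ||v||^2 + 1/2 * sum_i ((R x_i)^T v - y_i)^2
v = np.linalg.solve(Xp.T @ Xp + lam * np.eye(m), Xp.T @ y)

w_naive = R.T @ v                              # naive recovery, confined to the row space of R
```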


Issue

The recovered w is confined to the subspace spanned by R, since w = R^⊤ v.

Theorem 3 of [ZMJ+13]

For any 0 < ε ≤ 1/2, with probability 1 − exp(−(d − r)/32) − exp(−m/32) − δ, we have

‖w − w∗‖₂ ≥ (1/2) √((d − r)/m) (1 − ε√(2(1 + ε))/(1 − ε)) ‖w∗‖₂.


Dual space

Dual variable and function

ℓ*(α_i, y_i) = sup_ξ { α_i ξ − ℓ(ξ, y_i) }

Dual problem

α∗ = arg min_α (1/(2λ)) α^⊤ X^⊤ X α + ∑_i ℓ*(α_i, y_i).

Dual problem after random projection

α = arg min_α (1/(2λ)) α^⊤ X^⊤ R^⊤ R X α + ∑_i ℓ*(α_i, y_i).


Proposition

For any 0 < ε ≤ 1/2, with probability at least 1 − δ, we have

‖α − α∗‖_K ≤ (ε/(1 − ε)) ‖α∗‖_K,

provided m = Ω(ε⁻² log δ⁻¹).


Random Projection Dual Recovery (DuRP)

Generate a random projection R of rank m, and let x̂_i = R x_i.

Solve:

v = arg min_v (λ/2) ‖v‖² + ∑_i ℓ(x̂_i^⊤ v, y_i).

Obtain the dual variables: α_i = ℓ′(x̂_i^⊤ v, y_i)

Recover the primal solution: w = −(1/λ) ∑_i α_i x_i.
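A minimal DuRP sketch under illustrative assumptions (Gaussian projection, squared loss, synthetic low-rank data); it follows the four steps above and compares the recovered solution against the exact ridge solution of the original problem.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, r, m, lam = 500, 2000, 10, 200, 1.0      # samples, dim, data rank, projected dim, lambda

# synthetic data matrix of low rank r (the regime where dual recovery is accurate)
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, d))
y = X @ rng.standard_normal(d) / np.sqrt(d) + 0.01 * rng.standard_normal(n)

# 1) random projection
R = rng.standard_normal((m, d)) / np.sqrt(m)
Xp = X @ R.T                                   # row i is the projected feature R x_i

# 2) solve the m-dimensional problem with squared loss l(s, y_i) = (s - y_i)^2 / 2
v = np.linalg.solve(Xp.T @ Xp + lam * np.eye(m), Xp.T @ y)

# 3) dual variables alpha_i = l'( (R x_i)^T v, y_i )
alpha = Xp @ v - y

# 4) recover the primal solution in the original d-dimensional space
w_durp = -(1.0 / lam) * X.T @ alpha

# reference: exact solution of the original ridge problem (via its n x n kernel form)
w_star = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)
# relative recovery error; shrinks as m grows relative to the data rank r
print(np.linalg.norm(w_durp - w_star) / np.linalg.norm(w_star))
```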


Approximation error of DuRP

Theorem 2 of [ZMJ+13]

For any 0 < ε ≤ 1/2, with probability at least 1 − δ, we have

‖w − w∗‖₂ ≤ (ε/(1 − ε)) ‖w∗‖₂,

provided m ≥ (r + 1) log(2r/δ) / (c ε²).


Dual variables in bilinear model

The dual variables α can be reshaped into a matrix A, where the nonzero entries correspond to the user-item pairs with observed interactions.

Then the recovered matrix is written as

W = X A Z^⊤.


DuRP for high dimensional bi-linear model

The linear representation, i.e. Z ⊗ X, has very high dimension. Use a structured random projection R = R_2 ⊗ R_1 [QJZL13].

The recovered matrix W = X A Z^⊤ is high dimensional and usually dense, making it difficult to serve online.

Approximate it as a product of two low-rank matrices, using approximate (randomized) SVD [HMT11], as sketched below.
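A sketch of this last step, assuming synthetic features and a mostly-zero dual-variable matrix A; a randomized range finder in the spirit of [HMT11] is applied to W = X A Z^⊤ through its factors, so the dense W is never materialized. Dimensions here are tiny compared with production scale, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_u, d_v, n_users, n_items = 200, 300, 1000, 1500

X = rng.standard_normal((d_u, n_users))   # user features (columns are users)
Z = rng.standard_normal((d_v, n_items))   # item features (columns are items)

# dual-variable matrix A: nonzero only on observed user-item pairs (illustrative sparsity)
A = np.zeros((n_users, n_items))
rows = rng.integers(0, n_users, 5000)
cols = rng.integers(0, n_items, 5000)
A[rows, cols] = rng.standard_normal(5000)

k, p = 10, 5                                  # target rank and oversampling
Omega = rng.standard_normal((d_v, k + p))

# randomized range finder: multiply W = X A Z^T by Omega through its factors,
# so the dense d_u x d_v matrix W never has to be formed
Y_range = X @ (A @ (Z.T @ Omega))             # d_u x (k+p) sample of the range of W
Q, _ = np.linalg.qr(Y_range)                  # orthonormal basis for that range

B = ((Q.T @ X) @ A) @ Z.T                     # (k+p) x d_v projected matrix Q^T W
Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
U = Q @ Ub                                    # W is approximately U diag(s) Vt

# low-rank factors kept for online scoring: y(v; u) ~ (x_u^T left) (right z_v)
left = U[:, :k] * s[:k]                       # d_u x k
right = Vt[:k]                                # k x d_v
```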


Summary

Many applications using relational data

A bilinear model is a straightforward approach

To learn from massive data, use Random Projection Dual Recovery (DuRP).


Reference

[HMT11] N. Halko, P. Martinsson, and J. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217–288, 2011.

[QJZL13] Qi Qian, Rong Jin, Shenghuo Zhu, and Yuanqing Lin. An integrated framework for high dimensional distance metric learning and its application to fine-grained visual categorization. Technical Report 2013-TR102, NEC Laboratories America, 2013. arXiv:1402.0453.

[YLZG09] Kai Yu, John Lafferty, Shenghuo Zhu, and Yihong Gong. Large-scale collaborative prediction using a nonparametric random effects model. In ICML'09: Proceedings of the 26th Annual International Conference on Machine Learning, pages 1185–1192, New York, NY, USA, 2009. ACM.

[ZMJ+13] Lijun Zhang, Mehrdad Mahdavi, Rong Jin, Tianbao Yang, and Shenghuo Zhu. Recovering optimal solution by dual random projection. In COLT'13: The 26th Annual Conference on Learning Theory, 2013.
