Finding Large Elements in Factorized Tensors
Michael E. HOULE¹, Hisashi KASHIMA², Michael NETT¹,²
¹ National Institute of Informatics, Japan; ² The University of Tokyo, Japan
Summary
Low-rank tensor decompositions have become valuable tools for many practical problems arising in machine learning and data mining applications concerned with higher-order data. While much research has focused on specific methods for decomposition, only a limited amount of attention has been given to the process of locating elements of interest from within a tensor.

Efficient strategies for identifying and extracting targeted elements are very important in practical applications, which frequently operate on scales at which one cannot afford the time requirements associated with brute-force methods. We propose several algorithms for solving or approximating the problem of locating the k largest elements of a tensor given only its factorization. We provide experimental evidence that our methods enable follow-on applications in areas such as link prediction and recommender systems to operate at a significantly larger scale.
Motivation
In virtually all practical applications of high-order tensors, the number of elements of the tensor is far larger than the number of objects in the data set. Generally speaking, in order to be effective, tensor-based applications must take advantage of the sparsity of the representation, and avoid the expensive brute-force scanning of tensor elements. One of the most important issues in the area of tensor analysis is therefore that of tensor completion, where, after observing only a limited number of element values, one seeks to deduce properties of the remaining unobserved values. Depending on the application, one usually requires knowledge of a relatively small set of tensor elements, rather than the entire unobserved portion of the tensor at hand. Tensor completion strategies have many real-world applications including, but not limited to, personalized tag recommendation and link prediction in social web services.
Vector, Matrix and Tensor Products
• Let X and Y be tensors of dimensions n_1, …, n_p. Their element-wise product X ∗ Y is given by

  [X \ast Y]_{i_1 \dots i_p} = [X]_{i_1 \dots i_p} \cdot [Y]_{i_1 \dots i_p}.

• Let u^{(1)}, …, u^{(p)} be vectors in R^n. Their inner product is

  \Phi(u^{(1)}, \dots, u^{(p)}) = \sum_{i=1}^{n} \prod_{j=1}^{p} u_i^{(j)}.

• Let u^{(1)}, …, u^{(p)} be vectors with u^{(i)} ∈ R^{n_i} for 1 ≤ i ≤ p. Their outer product is the order-p tensor Ψ(u^{(1)}, …, u^{(p)}) satisfying

  [\Psi(u^{(1)}, \dots, u^{(p)})]_{i_1 \dots i_p} = \prod_{j=1}^{p} u_{i_j}^{(j)}.

Note that, whenever only two vectors u and v are involved, one may conveniently use the simpler notation Φ(u, v) = u^⊤v and Ψ(u, v) = uv^⊤.
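The three products above can be sketched in NumPy as follows. This is an illustration only, not part of the poster's implementation; the helper names `inner` and `outer` are ours:

```python
import numpy as np

# Element-wise product of two tensors with identical shape:
# [X * Y]_{i1...ip} = [X]_{i1...ip} * [Y]_{i1...ip}.
X = np.arange(8, dtype=float).reshape(2, 2, 2)
Y = np.full((2, 2, 2), 2.0)
elementwise = X * Y

def inner(*vectors):
    """Generalized inner product Phi of p vectors of common length n:
    the sum over i of the product of the i-th entries."""
    return np.prod(np.stack(vectors), axis=0).sum()

def outer(*vectors):
    """Generalized outer product Psi: an order-p tensor whose
    (i1, ..., ip) entry is u1[i1] * ... * up[ip]."""
    result = vectors[0]
    for v in vectors[1:]:
        result = np.multiply.outer(result, v)
    return result

u, v = np.array([1.0, 2.0]), np.array([3.0, 4.0, 5.0])
assert inner(u, np.array([10.0, 100.0])) == 210.0  # 1*10 + 2*100
assert outer(u, v).shape == (2, 3)
assert outer(u, v)[1, 2] == 10.0                   # 2 * 5
```

For p = 2 these reduce to the familiar u^⊤v and uv^⊤ noted above.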
Problem Definition
Let X be a real-valued order-p tensor in R^{n_1 × ⋯ × n_p} and let k ∈ N. The top-k problem is to find a set S of tensor elements whose sum is maximal, subject to |S| = min{k, n_1 ⋯ n_p}.
Data Model
The provided data is modeled using a rank-r factorization with factor matrices U^{(1)}, …, U^{(p)}, where u^{(k)}_{:j} denotes the j-th column of U^{(k)}:

  X = \sum_{j=1}^{r} X^{(j)} = \sum_{j=1}^{r} \Psi(u^{(1)}_{:j}, \dots, u^{(p)}_{:j})

The tensor element at location (i_1, …, i_p) is the inner product of the corresponding rows of the factor matrices:

  [X]_{i_1 \dots i_p} = \sum_{j=1}^{r} \left[\Psi(u^{(1)}_{:j}, \dots, u^{(p)}_{:j})\right]_{i_1 \dots i_p} = \sum_{j=1}^{r} \prod_{k=1}^{p} u^{(k)}_{i_k j} = \Phi(u^{(1)}_{i_1:}, \dots, u^{(p)}_{i_p:})
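The element-as-row-product identity can be checked numerically. A minimal sketch with NumPy, using random factor matrices of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
r = 4                      # rank of the factorization
shape = (5, 6, 7)          # n1, n2, n3
# Factor matrices U^(k) of shape (n_k, r); column j of each factor
# contributes one rank-one term Psi(u^(1)_{:j}, ..., u^(p)_{:j}).
factors = [rng.standard_normal((n, r)) for n in shape]

def element(factors, index):
    """[X]_{i1...ip} = sum_j prod_k U^(k)[i_k, j]: the generalized
    inner product of the corresponding factor-matrix rows."""
    rows = np.stack([U[i] for U, i in zip(factors, index)])  # (p, r)
    return rows.prod(axis=0).sum()

# Cross-check against an explicit sum of rank-one outer products.
full = np.zeros(shape)
for j in range(r):
    full += np.multiply.outer(
        np.multiply.outer(factors[0][:, j], factors[1][:, j]),
        factors[2][:, j])
assert np.isclose(element(factors, (2, 3, 4)), full[2, 3, 4])
```

Evaluating one element this way costs O(rp) time, which is the per-element factor appearing in the enumeration bound below.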
Method I – Enumeration
An exhaustive enumeration of all tensor elements provides exact answers to top-k queries in time Θ(n_1 ⋯ n_p (rp + log k)).
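A baseline enumeration can be sketched as follows: scan every index, evaluate the element from the factor rows in O(rp), and keep the k largest in a min-heap, which accounts for the log k term in the stated bound. This is our own illustrative sketch, not the authors' code:

```python
import heapq
import itertools
import numpy as np

def topk_enumerate(factors, k):
    """Exact top-k over all n1*...*np elements of the factorized
    tensor, retaining the k largest seen so far in a min-heap."""
    shapes = [U.shape[0] for U in factors]
    heap = []  # (value, index); the smallest retained value on top
    for index in itertools.product(*(range(n) for n in shapes)):
        rows = np.stack([U[i] for U, i in zip(factors, index)])
        value = rows.prod(axis=0).sum()  # O(rp) per element
        if len(heap) < k:
            heapq.heappush(heap, (value, index))
        else:
            heapq.heappushpop(heap, (value, index))
    return sorted(heap, reverse=True)

rng = np.random.default_rng(1)
factors = [rng.standard_normal((n, 3)) for n in (4, 5, 6)]
top = topk_enumerate(factors, 10)
assert len(top) == 10 and top[0][0] >= top[-1][0]
```

The scan itself is what makes this approach infeasible at scale, motivating the approximate methods that follow.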
Method II – Greedy Heuristic
An indexing-based method which provides approximate answers to top-k queries in time O(p²sktr).
Method III – Divide and Conquer
A clustering-based divide-and-conquer method which aims to reduce the time complexity by discarding as many subproblems as possible.
[Figure: partitioning of the tensor into subproblems X[1,1,1], X[1,2,1], X[2,1,1], X[2,2,1].]
Individual subproblems can either be solved exactly via enumeration, or approximated using the previously proposed heuristic.
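The pruning idea behind divide and conquer can be illustrated with a minimal sketch. This is not the clustering-based scheme of the poster: we assume nonnegative factors and a fixed half-split per mode, and bound every element of an index block by taking the per-column maxima of the factor rows in that block. Blocks whose bound cannot beat the current best are discarded without enumeration:

```python
import itertools
import numpy as np

def block_bound(factors, blocks):
    # For nonnegative factors, every element indexed inside the block
    # is at most sum_j prod_k max_{i in B_k} U^(k)[i, j].
    maxima = np.stack([U[list(b)].max(axis=0) for U, b in zip(factors, blocks)])
    return maxima.prod(axis=0).sum()

def best_element(factors, blocks, threshold):
    """Solve one subproblem exactly by enumeration, or skip it
    entirely when its bound cannot beat the current threshold."""
    if block_bound(factors, blocks) <= threshold:
        return threshold, None  # pruned
    best, best_idx = threshold, None
    for idx in itertools.product(*blocks):
        rows = np.stack([U[i] for U, i in zip(factors, idx)])
        value = rows.prod(axis=0).sum()
        if value > best:
            best, best_idx = value, idx
    return best, best_idx

rng = np.random.default_rng(2)
factors = [rng.random((8, 3)) for _ in range(3)]  # nonnegative factors
halves = [list(range(0, 4)), list(range(4, 8))]   # half-split per mode

best, best_idx = 0.0, None
for blocks in itertools.product(halves, repeat=3):
    value, idx = best_element(factors, blocks, best)
    if idx is not None:
        best, best_idx = value, idx
```

Visiting the most promising blocks first would tighten the threshold early and prune more aggressively; handling mixed-sign factors requires a two-sided bound.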
Importance of Scalability
Projections show that, for real application data, exhaustive query processing may require up to 31 years.
[Figure: enumeration time [s] per data set (Delicious, Favorite, LastFM, Movielens, Retweet) on a logarithmic scale from 1 to 10^10 seconds, comparing the reduced set against the full set.]
The proportion of uninteresting tensor elements is overwhelmingly large.
[Figure: distribution of elements in the LastFM data set; number of observations (log scale, 1 to 10^9) against tensor element value (−1 to 2.5).]
Evaluation
Performance of the methods for the LastFM data set. The input tensor has size 2,100 × 18,744 × 12,647.
[Figure: comparison on the full "LastFM" data set; accumulated top-1000 elements (200 to 1300) against runtime [s] (0 to 600) for the EnumDC, GreedyDC, and Greedy methods.]
Additional Material
• Technical Report
• Poster
• Implementation
• Documentation
Contact: Michael E. HOULE, Visiting Professor, National Institute of Informatics
Tel: 03-4212-2538  Fax: 03-3556-1916  Email: meh@nii.ac.jp