
On the extension of trace norm to tensors

Ryota Tomioka1, Kohei Hayashi2, Hisashi Kashima1

1 The University of Tokyo, 2 Nara Institute of Science and Technology

2010/12/10

NIPS2010 Workshop: Tensors, Kernels, and Machine Learning

Convex low-rank tensor completion

[Figure: Tucker decomposition of a Sensors × Time × Features tensor. An I1 × I2 × I3 tensor is decomposed into a core C of size r1 × r2 × r3 (interactions) and factor matrices U1, U2, U3 of sizes I1 × r1, I2 × r2, I3 × r3 (loadings).]

Conventional formulation (nonconvex)

$$\min_{\mathcal{C},\, U_1, U_2, U_3} \; \big\| \Omega \circ (\mathcal{Y} - \mathcal{C} \times_1 U_1 \times_2 U_2 \times_3 U_3) \big\|_F^2 + \text{regularization},$$

where ×k denotes the mode-k product and Ω the observation mask. Equivalently,

$$\min_{\mathcal{X}} \; \big\| \Omega \circ (\mathcal{Y} - \mathcal{X}) \big\|_F^2 \quad \text{s.t.} \quad \operatorname{rank}(\mathcal{X}) \le (r_1, r_2, r_3).$$

• Solved by alternating minimization.
• Have to fix the rank beforehand.
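The mode-k product appearing above contracts mode k of the core tensor against a factor matrix. A minimal NumPy sketch of this operation (our illustration; the function name is ours):

```python
import numpy as np

def mode_k_product(C, U, k):
    """Mode-k product C x_k U: contract mode k of C against the second
    index of U, so dimension k changes from C.shape[k] to U.shape[0]."""
    return np.moveaxis(np.tensordot(U, C, axes=(1, k)), 0, k)

C = np.random.randn(2, 3, 4)   # core
U1 = np.random.randn(7, 2)     # factor for mode 1
print(mode_k_product(C, U1, 0).shape)  # (7, 3, 4)
```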

Our approach

Matrix: estimation of a low-rank matrix (hard) → trace norm minimization (tractable) [Fazel, Hindi, Boyd 01].

Tensor: estimation of a low-rank tensor (hard; rank defined in the sense of the Tucker decomposition) → extended trace norm minimization (tractable).

The tensor case is a generalization of the matrix case.

Trace norm regularization (for matrices)

For X ∈ R^{R×C} with m = min(R, C), the trace norm is the linear sum of singular values:

$$\|X\|_{\mathrm{tr}} = \sum_{j=1}^{m} \sigma_j(X).$$

• Roughly speaking, L1 regularization on the singular values.
• Stronger regularization → more zero singular values → low rank.
• Not obvious for tensors (there are no singular values for tensors).
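For concreteness, a minimal NumPy sketch of the trace norm (our illustration):

```python
import numpy as np

def trace_norm(X):
    """Trace norm (nuclear norm): the sum of the singular values of X."""
    return np.linalg.svd(X, compute_uv=False).sum()

# A product of rank-3 factors has only 3 nonzero singular values,
# so the trace norm sums just those 3 values.
A = np.random.randn(50, 3) @ np.random.randn(3, 40)
print(trace_norm(A))
```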

Mode-k unfolding (matricization)

[Figure: mode-k unfolding of an I1 × I2 × I3 tensor. Mode-1 unfolding X(1): an I1 × (I2·I3) matrix formed by laying the I3 slices of size I1 × I2 side by side. Mode-2 unfolding X(2): an I2 × (I3·I1) matrix formed analogously from I1 slices of size I2 × I3.]
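A minimal NumPy sketch of the unfolding (our code; column-ordering conventions vary between references, and the cyclic ordering below is the one that matches the unfolding identities on the next slide):

```python
import numpy as np

def unfold(X, k):
    """Mode-k unfolding: rows index mode k; columns run over the remaining
    modes k+1, k+2, ... (wrapping around), with mode k+1 varying fastest."""
    K = X.ndim
    perm = [(k + j) % K for j in range(K)]       # bring mode k to the front
    Xp = np.transpose(X, perm)
    return Xp.reshape(X.shape[k], -1, order='F')  # flatten the rest

X = np.random.randn(3, 4, 5)
print(unfold(X, 0).shape)  # (3, 20)
print(unfold(X, 1).shape)  # (4, 15)
```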

Elementary facts about Tucker decomposition

$$\mathcal{X} = \mathcal{C} \times_1 U_1 \times_2 U_2 \times_3 U_3, \qquad \mathcal{C} \in \mathbb{R}^{r_1 \times r_2 \times r_3}, \quad U_k \in \mathbb{R}^{I_k \times r_k}.$$

The unfoldings then factorize as

$$X_{(1)} = U_1 C_{(1)} (U_3 \otimes U_2)^\top \qquad (\operatorname{rank} \le r_1),$$
$$X_{(2)} = U_2 C_{(2)} (U_1 \otimes U_3)^\top \qquad (\operatorname{rank} \le r_2),$$
$$X_{(3)} = U_3 C_{(3)} (U_2 \otimes U_1)^\top \qquad (\operatorname{rank} \le r_3).$$

The rank of X(k) is no more than the rank of C(k).
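A quick numerical check of the first identity, reusing unfold() from the sketch above (the dimensions are arbitrary):

```python
import numpy as np

I, r = (4, 5, 6), (2, 3, 2)
C = np.random.randn(*r)
U1, U2, U3 = [np.random.randn(I[k], r[k]) for k in range(3)]

# X = C ×1 U1 ×2 U2 ×3 U3, written out with einsum
X = np.einsum('abc,ia,jb,kc->ijk', C, U1, U2, U3)

lhs = unfold(X, 0)                            # X_(1)
rhs = U1 @ unfold(C, 0) @ np.kron(U3, U2).T   # U1 C_(1) (U3 ⊗ U2)^T
print(np.allclose(lhs, rhs))                  # True
```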

What it means

• We can use the trace norm of an unfolding of a tensor X to learn a low-rank X.

Tensor X is low-rank (∃k: rk < Ik) ⟷ the unfolding X(k) is a low-rank matrix (matricization in one direction, tensorization in the other).

Approach 1: As a matrix

• Pick a mode k, and hope that the tensor to be learned is low-rank in mode k.

$$\min_{\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_K}} \; \frac{1}{2\lambda} \big\| \Omega \circ (\mathcal{Y} - \mathcal{X}) \big\|_F^2 + \big\| X_{(k)} \big\|_{\mathrm{tr}}.$$

Pro: Basically a matrix problem → theoretical guarantee (Candès & Recht 09).
Con: Have to be lucky to pick the right mode.
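The slides defer optimization details to the technical report; the standard building block for trace-norm problems like this is singular-value soft-thresholding, the proximal operator of the trace norm. A minimal sketch (our illustration, not the authors' code):

```python
import numpy as np

def prox_trace_norm(W, tau):
    """Proximal operator of tau * ||.||_tr: soft-threshold singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```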

Approach 2: Constrained optimization

• Constrain so that each unfolding of X is simultaneously low-rank.

$$\min_{\mathcal{X} \in \mathbb{R}^{I_1 \times \cdots \times I_K}} \; \frac{1}{2\lambda} \big\| \Omega \circ (\mathcal{Y} - \mathcal{X}) \big\|_F^2 + \sum_{k=1}^{K} \gamma_k \big\| X_{(k)} \big\|_{\mathrm{tr}},$$

where γk is a tuning parameter, usually set to 1.

Pro: Jointly regularizes every mode.
Con: Strong constraint.

See also Marco Signoretto et al. '10.
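To make the objective concrete, a sketch that evaluates it, reusing trace_norm() and unfold() from the earlier sketches (mask, lam, and gammas are hypothetical names of ours):

```python
def constrained_objective(X, Y, mask, lam, gammas):
    """Approach 2 objective: squared error on observed entries plus the
    weighted trace norms of every unfolding of the same tensor X."""
    fit = ((mask * (Y - X)) ** 2).sum() / (2 * lam)
    penalty = sum(g * trace_norm(unfold(X, k)) for k, g in enumerate(gammas))
    return fit + penalty
```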

Approach 3: Mixture of low-rank tensors

• Each mixture component Zk is regularized to be low-rank only in mode k.

$$\min_{\mathcal{Z}_1, \ldots, \mathcal{Z}_K} \; \frac{1}{2\lambda} \bigg\| \Omega \circ \Big( \mathcal{Y} - \sum_{k=1}^{K} \mathcal{Z}_k \Big) \bigg\|_F^2 + \sum_{k=1}^{K} \gamma_k \big\| (\mathcal{Z}_k)_{(k)} \big\|_{\mathrm{tr}}.$$

Pro: Each Zk takes care of one mode.
Con: The sum is not low-rank.
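Correspondingly, a sketch of the mixture objective (same hypothetical helper names as above); note that each component is penalized only through its own mode-k unfolding:

```python
def mixture_objective(Zs, Y, mask, lam, gammas):
    """Approach 3 objective: the data fit uses the sum of the components,
    while each Z_k is penalized only in its own mode k."""
    X = sum(Zs)
    fit = ((mask * (Y - X)) ** 2).sum() / (2 * lam)
    penalty = sum(g * trace_norm(unfold(Z, k))
                  for k, (Z, g) in enumerate(zip(Zs, gammas)))
    return fit + penalty
```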

Numerical experiment

• True tensor: Size 50x50x20, rank 7x8x9. No noise (λ=0).

• Random train/test split.

[Figure: generalization error (log scale, 10⁻⁴ to 10²) vs. fraction of observed elements (0 to 1) for As a Matrix (modes 1, 2, 3), Constraint, Mixture, Tucker (large), and Tucker (exact), with a line marking the optimization tolerance. Tucker = EM algorithm (nonconvex).]

Computation time

• Convex formulation is also fast.

[Figure: computation time in seconds (0 to 50) vs. fraction of observed elements (0 to 1) for As a Matrix, Constraint, Mixture, Tucker (large), and Tucker (exact).]

Phase transition behaviour

[Figure: fraction of observed elements required for perfect reconstruction (0 to 1) vs. sum of true ranks (0 to 80). Each point is a tensor labeled with its rank triple: [4 42 3], [5 4 3], [7 8 9], [7 8 15], [8 5 6], [10 12 9], [10 12 13], [16 6 8], [17 13 15], [20 3 2], [20 20 5], [30 12 10], [32 3 4], [40 3 2], [40 9 7], [40 20 6].]

• Sum of true ranks = min(r1, r2·r3) + min(r2, r3·r1) + min(r3, r1·r2).
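As a quick arithmetic illustration of this quantity (our addition):

```python
def sum_of_true_ranks(r1, r2, r3):
    # Each term caps r_k at the largest possible rank of the mode-k unfolding.
    return min(r1, r2 * r3) + min(r2, r3 * r1) + min(r3, r1 * r2)

print(sum_of_true_ranks(7, 8, 9))   # 24: no cap is active for [7 8 9]
print(sum_of_true_ranks(40, 3, 2))  # 11: mode 1 is capped at r2*r3 = 6
```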

Summary

• Low-rank tensor completion can be computed as a convex optimization problem using the trace norm of the unfoldings.
- No need to specify the rank beforehand.
• The convex formulation is more accurate and faster than the conventional EM-based Tucker decomposition.
• A curious "phase transition" was found → compressive-sensing-type analysis is ongoing work.
• Technical report: arXiv:1010.0789 (including the optimization).
• Code:
- http://www.ibis.t.u-tokyo.ac.jp/RyotaTomioka/Softwares/Tensor

Acknowledgment

• This work was supported in part by MEXT KAKENHI 22700138, 22700289, and NTT Communication Science Laboratories.
