
Expected Tensor Decomposition with Stochastic Gradient Descent

Takanori Maehara (Shizuoka Univ), Kohei Hayashi (NII → AIST), Ken-ichi Kawarabayashi (NII)

Presented at AAAI 2016

Information Overload

• Too many texts to catch up with

• How can we know what’s going on? --- Machine learning!

(image: http://blogs.transparent.com/)

How can we analyze this?

(image: http://tweetfake.com/)

Word Co-occurrence

[Figure: word × word co-occurrence matrix → matrix decomposition]

Higher-order Information

• The idea generalizes to a tensor (see the sketch below)

[Figure: word × word × (hashtag / URL / user) tensor → tensor decomposition]
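As a concrete illustration of such a tensor (my own sketch, not from the slides), one tweet's words and hashtags can be counted into a sparse word × word × hashtag tensor; the function tweet_tensor and the id dictionaries below are hypothetical names.

```python
# Minimal sketch (illustrative, not from the paper): count word x word x hashtag
# triples of one tweet into a sparse tensor stored as {(i, j, k): value}.
from collections import defaultdict
from itertools import combinations

def tweet_tensor(words, hashtags, word_id, tag_id):
    """Sparse co-occurrence counts for a single tweet."""
    X_t = defaultdict(float)
    for w1, w2 in combinations(words, 2):   # every word pair in the tweet
        for h in hashtags:                  # ... together with each hashtag
            X_t[(word_id[w1], word_id[w2], tag_id[h])] += 1.0
    return X_t

word_id = {"train": 0, "line": 1, "delay": 2}
tag_id = {"#tokyo": 0}
print(tweet_tensor(["train", "line", "delay"], ["#tokyo"], word_id, tag_id))
```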

Scalability Issue

• To compute the decomposition, we need the tensor 𝑋 beforehand.

• Sometimes 𝑋 is too huge to fit in memory.


Questions

• Can we get the decomposition without 𝑋?

• If possible, when and how?

YES

• When: 𝑋 is given by a sum 𝑋 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑇
• How: stochastic optimization


Revisit: Word Co-occurrence

• 𝑋: “Heavy” tensor (e.g. dense)
• 𝑋𝑡: “Light” tensor (e.g. sparse)

[Figure: 𝑋 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑇, one light tensor per tweet (tweet 1, tweet 2, …, tweet T); we want to decompose 𝑋]

CP Decomposition

min_{𝑈,𝑉,𝑊} 𝐿(𝑋; 𝑈, 𝑉, 𝑊)

• 𝐿(𝑋; 𝑈, 𝑉, 𝑊) = ||𝑋 − ∑_{𝑟=1}^{𝑅} 𝑢𝑟 ∘ 𝑣𝑟 ∘ 𝑤𝑟||²
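A minimal NumPy sketch of this objective, assuming a dense third-order tensor 𝑋 of shape (I, J, K) and factor matrices 𝑈, 𝑉, 𝑊 with 𝑅 columns (function names are mine):

```python
# Rank-R CP model and its squared loss (a sketch; dense numpy arrays assumed).
import numpy as np

def cp_reconstruct(U, V, W):
    # sum_r u_r o v_r o w_r, written as an einsum over the shared rank index r
    return np.einsum('ir,jr,kr->ijk', U, V, W)

def cp_loss(X, U, V, W):
    # L(X; U, V, W) = ||X - sum_r u_r o v_r o w_r||^2
    return np.sum((X - cp_reconstruct(U, V, W)) ** 2)

rng = np.random.default_rng(0)
U, V, W = rng.normal(size=(5, 2)), rng.normal(size=(4, 2)), rng.normal(size=(3, 2))
X = cp_reconstruct(U, V, W)      # an exactly rank-2 tensor
print(cp_loss(X, U, V, W))       # ~0 by construction
```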

Batch Optimization

• CP decomposition is non-convex
• Use alternating least squares

ALS:
Repeat until convergence:
• 𝑈 ← 𝑈 − 𝜂 𝛻𝑈𝐿(𝑋)
• 𝑉 ← 𝑉 − 𝜂 𝛻𝑉𝐿(𝑋)
• 𝑊 ← 𝑊 − 𝜂 𝛻𝑊𝐿(𝑋)

(𝜂: step size)
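A sketch of the batch loop above, written as plain gradient steps on the full tensor 𝑋. Note this mirrors only the update rule shown on the slide (true ALS solves each factor's least-squares subproblem in closed form); all names are mine.

```python
# Batch gradient steps for the CP loss; the full residual tensor X - model
# is materialized at every iteration, which is exactly the scalability issue.
import numpy as np

def batch_grads(X, U, V, W):
    E = X - np.einsum('ir,jr,kr->ijk', U, V, W)        # full residual tensor
    gU = -2.0 * np.einsum('ijk,jr,kr->ir', E, V, W)    # dL/dU
    gV = -2.0 * np.einsum('ijk,ir,kr->jr', E, U, W)    # dL/dV
    gW = -2.0 * np.einsum('ijk,ir,jr->kr', E, U, V)    # dL/dW
    return gU, gV, gW

def batch_descent(X, U, V, W, eta=1e-3, iters=200):
    for _ in range(iters):                             # "repeat until convergence"
        gU, gV, gW = batch_grads(X, U, V, W)
        U, V, W = U - eta * gU, V - eta * gV, W - eta * gW
    return U, V, W
```

Every iteration touches the dense tensor 𝑋, so 𝑋 must already be in memory; this motivates the stochastic version below.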

Key Fact: Loss is Decomposable

𝐿(𝑋) = 𝐿(𝑋1 + 𝑋2 + ⋯ + 𝑋𝑇) = 𝐿(𝑋1) + 𝐿(𝑋2) + ⋯ + 𝐿(𝑋𝑇) + const

where 𝐿(𝑋) = ||𝑋 − ∑_{𝑟=1}^{𝑅} 𝑢𝑟 ∘ 𝑣𝑟 ∘ 𝑤𝑟||²

Intuition: for a stochastic variable 𝑥 and a non-stochastic variable 𝑎,

E[(𝑥 − 𝑎)²] = E[𝑥²] − 2𝑎E[𝑥] + 𝑎²
            = ((E[𝑥])² − 2𝑎E[𝑥] + 𝑎²) + (E[𝑥²] − (E[𝑥])²)
            = (E[𝑥] − 𝑎)² + Var[𝑥]
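A quick numerical check of this identity (my own sketch, not from the slides):

```python
# Check E[(x - a)^2] = (E[x] - a)^2 + Var[x] by Monte Carlo sampling.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=3.0, size=1_000_000)   # stochastic variable x
a = 0.7                                              # non-stochastic a
lhs = np.mean((x - a) ** 2)
rhs = (np.mean(x) - a) ** 2 + np.var(x)
print(lhs, rhs)   # agree up to sampling noise
```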

Stochastic Optimization

• If 𝐿 = 𝐿1 + 𝐿2 + ⋯+ 𝐿𝑇, we can use 𝛻𝐿𝑡 instead of 𝛻𝐿

• Stochastic ALS

SALS:
for 𝑡 = 1, …, 𝑇:
• 𝑈 ← 𝑈 − 𝜂𝑡 𝛻𝑈𝐿(𝑋𝑡)
• 𝑉 ← 𝑉 − 𝜂𝑡 𝛻𝑉𝐿(𝑋𝑡)
• 𝑊 ← 𝑊 − 𝜂𝑡 𝛻𝑊𝐿(𝑋𝑡)

ALS (for comparison):
Repeat until convergence:
• 𝑈 ← 𝑈 − 𝜂 𝛻𝑈𝐿(𝑋)
• 𝑉 ← 𝑉 − 𝜂 𝛻𝑉𝐿(𝑋)
• 𝑊 ← 𝑊 − 𝜂 𝛻𝑊𝐿(𝑋)

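A sketch of one stochastic step, assuming each light tensor 𝑋𝑡 is stored as sparse (i, j, k) → value entries. The R × R Gram-matrix trick used for the model part of the gradient is a standard identity, not necessarily how the paper implements SALS, and sgd_step is an illustrative name.

```python
# One stochastic step on L(X_t; U, V, W) = ||X_t - sum_r u_r o v_r o w_r||^2.
# The dense model part of each gradient reduces to small R x R Gram matrices,
# so the full tensor X is never materialized; the data part touches only the
# nonzeros of the light tensor X_t.
import numpy as np

def sgd_step(Xt, U, V, W, eta):
    gU = U @ ((V.T @ V) * (W.T @ W))        # model part of dL_t/dU
    gV = V @ ((U.T @ U) * (W.T @ W))        # model part of dL_t/dV
    gW = W @ ((U.T @ U) * (V.T @ V))        # model part of dL_t/dW
    for (i, j, k), x in Xt.items():         # sparse data part from X_t
        gU[i] -= x * V[j] * W[k]
        gV[j] -= x * U[i] * W[k]
        gW[k] -= x * U[i] * V[j]
    return U - 2 * eta * gU, V - 2 * eta * gV, W - 2 * eta * gW

rng = np.random.default_rng(0)
U, V, W = rng.normal(size=(50, 4)), rng.normal(size=(50, 4)), rng.normal(size=(20, 4))
Xt = {(0, 1, 2): 3.0, (5, 7, 1): 1.0}       # one light, sparse sample
U, V, W = sgd_step(Xt, U, V, W, eta=1e-3)
```

The per-step cost depends on the nonzeros of 𝑋𝑡 plus a few small matrix products, not on the dense size of 𝑋, which is the point of the observations on the next slide.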

Observations

• ALS and SALS converge to the same solutions
  – 𝐿ALS − 𝐿SALS = const

• SALS does not require 𝑋 to be precomputed
  – Batch gradient: ∇𝐿(𝑋)
  – Stochastic gradient: ∇𝐿(𝑋𝑡)

• One step of SALS is lighter
  – 𝑋𝑡 is sparse, whereas 𝑋 is not
  – ∇𝐿(𝑋𝑡) is also sparse

Related Work

• Element-wise SGD is a special case of SALS

Experiment

• Amazon Review dataset (word triplets)
• 10M reviews, 280K vocabulary words, 500M non-zeros

[Plot annotation: SGD almost converges before the batch method even starts]

Experiment (Cont’d)

• 34M reviews, 520K vocabulary words, 1G non-zeros

[Plot annotation: ALS failed, exhausting 256 GB of memory]

More on Paper

• 2nd-order SALS
  – Uses the block diagonal of the Hessian
  – Faster convergence, lower sensitivity to 𝜂

• With an ℓ2-norm regularizer (see the sketch after this list):
  – min 𝐿(𝑋; 𝑈, 𝑉, 𝑊) + 𝑅(𝑈, 𝑉, 𝑊), where 𝑅(𝑈, 𝑉, 𝑊) = 𝜆(||𝑈||² + ||𝑉||² + ||𝑊||²)

• Theoretical analysis of convergence
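A small sketch of the regularized objective above (my notation; lam stands for 𝜆):

```python
# l2-regularized CP objective: L(X; U, V, W) + lambda * (||U||^2 + ||V||^2 + ||W||^2).
import numpy as np

def regularized_loss(X, U, V, W, lam):
    model = np.einsum('ir,jr,kr->ijk', U, V, W)                 # CP reconstruction
    fit = np.sum((X - model) ** 2)                              # L(X; U, V, W)
    reg = lam * (np.sum(U**2) + np.sum(V**2) + np.sum(W**2))    # R(U, V, W)
    return fit + reg
```

In gradient form, the regularizer simply adds 2𝜆𝑈 (and likewise 2𝜆𝑉, 2𝜆𝑊) to the corresponding factor gradients.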

Further Applications

• Online Learning

[Figure: topics tracked online over a tweet stream, e.g. “news, stocks, market”, “train, line, delay”, “AKB, artists, image”, “earthquake, M5.3”, “drama, CSI, TV” (Hayashi et al., KDD 2015)]

Take Home Message

If 𝑋 is given by a sum …

• Don't use the batch gradient

• Use the stochastic gradient!
