Vanishing Component Analysis
Features never seen before

Roi Livni 1,3, David Lehavi 2, Sagi Schein 2, Hila Nachlieli 2, Shai Shalev-Shwartz 3, Amir Globerson 3
1 ELSC-ICNC, Edmond & Lily Safra Center for Brain Sciences, The Hebrew University, Israel
2 HP Research Labs, Israel
3 School of Computer Science and Engineering, The Hebrew University, Israel
Classical feature extraction methods search for features with "informative behavior" over the sample: VARIANTS.

Could we instead be interested in features that are constant over the sample? INVARIANTS.
Introduction

We present a method for describing and representing the set of polynomials that approximately vanish over a sample set.
• The main challenge is that the vanishing set is an exponentially large space, which makes the problem seem intractable and statistically infeasible.
• The solution is to treat the space of polynomials as a ring and the vanishing polynomials as an ideal. Through the notion of "generators" we can compactly represent a vanishing ideal.
• We present one possible application, applying our method to classification, and thus show the effectiveness and potential of our method.
A new approach for data representation:

For any sample set S, consider the space of polynomial equations satisfied by the sample set:

{p : p(s) = 0  ∀ s ∈ S}

• Fully characterizes the sample
• Finitely (and tractably) generated for finite S

On real-world data sets we only require approximate vanishing:

p(s) ≈ 0  ∀ s ∈ S
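As a toy illustration (not from the poster), consider samples drawn from the unit circle: the polynomial p(x, y) = x^2 + y^2 - 1 vanishes on every sample, so it belongs to the vanishing ideal of S. A minimal numpy sketch:

```python
import numpy as np

# Hypothetical example: sample points drawn from the unit circle.
# Every sample satisfies p(x, y) = x^2 + y^2 - 1 = 0, so p lies in
# the vanishing ideal of the sample set S.
theta = np.linspace(0.0, 2 * np.pi, 50, endpoint=False)
S = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # shape (50, 2)

def p(point):
    x, y = point
    return x ** 2 + y ** 2 - 1.0

residuals = np.array([p(s) for s in S])
print(np.max(np.abs(residuals)))  # ~0, up to floating-point error
```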
Some Definitions:

An ideal I in the polynomial ring R[x1, …, xm] satisfies:
• I is a linear subspace
• Absorbing property: f ∈ I, g ∈ R[x1, …, xm] ⇒ f · g ∈ I

{g1, …, gk} generate I if every f ∈ I can be represented as f = Σ hi gi for some polynomials hi.
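The absorbing property can be checked numerically on a toy example (not from the poster): if f vanishes on every sample, then f · g vanishes on every sample too, for any polynomial g, so the vanishing polynomials indeed form an ideal. A sketch:

```python
import numpy as np

# Toy sample: points on the two coordinate axes, where f(x, y) = x*y
# vanishes exactly. All names here are illustrative.
ts = np.linspace(-2.0, 2.0, 10)
S = np.concatenate([
    np.stack([ts, np.zeros(10)], axis=1),  # points on the x-axis
    np.stack([np.zeros(10), ts], axis=1),  # points on the y-axis
])

f = lambda x, y: x * y                 # f is in the vanishing ideal of S
g = lambda x, y: 5 * x**3 - y + 7.0    # arbitrary polynomial in R[x, y]

# Absorbing property: f*g still vanishes on every sample in S.
fg = np.array([f(x, y) * g(x, y) for x, y in S])
print(np.max(np.abs(fg)))  # 0.0
```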
Guarantees:
• At zero tolerance, V generates the vanishing ideal.
• The algorithm stops after at most m iterations.
• |F| ≤ m ;  |V| ≤ m² · min(|F|, d).
• Computing F and V on a new sample point takes O(|F|² + |F| · |V|) time.
Applications and Future Directions:
• By learning the invariants of each class, we exploit VCA for classification tasks.
• We constructed a linear classifier that is competitive with kernel SVM (KSVM) and is faster at test time.
• How can vanishing components be exploited for other tasks in Machine Learning?
  • Clustering
  • Parametric Manifold Learning
  • And more…
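The classification idea can be sketched on a toy problem (illustrative only, with assumed per-class invariants rather than learned ones): each class contributes polynomials that vanish on its samples, and a new point is assigned to the class whose invariants are closest to zero on it.

```python
import numpy as np

# Hypothetical two-class example: class 0 samples lie on the unit
# circle, where g0 = x^2 + y^2 - 1 vanishes; class 1 samples lie on
# the radius-2 circle, where g1 = x^2 + y^2 - 4 vanishes.
def g0(x, y):
    return x**2 + y**2 - 1.0

def g1(x, y):
    return x**2 + y**2 - 4.0

def classify(x, y):
    # Linear rule on the nonnegative features |g0|, |g1|:
    # predict the class whose invariant nearly vanishes at (x, y).
    return 0 if abs(g0(x, y)) < abs(g1(x, y)) else 1

print(classify(0.0, 1.1))  # near the unit circle -> 0
print(classify(1.9, 0.0))  # near the radius-2 circle -> 1
```

Since the features |g_i(x)| are computed once per test point, the downstream classifier stays linear, which is the source of the fast test-time behavior claimed above.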
Timeline:
• Hilbert (1890): Every ideal in the polynomial ring has a finite set of generators.
• Buchberger & Möller (1982): For the set of polynomials vanishing on a finite point set, a set of generators can be computed and stored efficiently.
• Kreuzer et al. (2009): The AVI ("Approximately Vanishing Ideal") algorithm, a numerical approach.
Algorithm: VCA
[Figure: schematic of the VCA iterations. Degree-1 candidates x1, x2, x3 are processed first; at each degree, an SVD over the candidates' evaluations on the sample separates approximately vanishing polynomials (g1, g2, g3, h1, …, h4), which are collected into V, from non-vanishing polynomials (p1, p2, p3), which are collected into F and whose products (p1², p1p2, p2², …) form the next degree's candidates. Return V; F.]
Experiments

Data Set  | Test Runtime        | Error Rate
          | KSVM      | VCA     | KSVM | VCA
----------|-----------|---------|------|-----
Pendigits | 9,600     | 2,800   | 0.42 | 0.42
Letter    | 70,000    | 1,100   | 4.3  | 4.8
USPS      | 380,000   | 2,600   | 1.4  | 1.5
MNIST     | 3,100,000 | 4,000   | 2    | 2.2