
Vanishing Component Analysis
Features never seen before

Roi Livni1,3, David Lehavi2, Sagi Schein2, Hila Nachlieli2, Shai Shalev-Shwartz3, Amir Globerson3

1 ELSC-ICNC Edmond & Lily Safra Center for Brain Sciences, The Hebrew University, Israel
2 HP Research Labs, Israel
3 School of Computer Science and Engineering, The Hebrew University, Israel

Classical feature extraction methods search for features with “informative behavior” over the sample:

VARIANTS

Could we instead be interested in features that are constant over the sample?

INVARIANTS

Introduction

• We present a method for describing and representing the set of polynomials that approximately vanish over a sample set.
• The main challenge is that the vanishing set is an exponentially large space, which makes the problem seem intractable and statistically infeasible.
• The solution is to treat the space of polynomials as a ring and the vanishing polynomials as an ideal. Through the notion of “generators” we can compactly represent a vanishing ideal.
• We present one possible application, classification, and thereby demonstrate the effectiveness and potential of our method.

A new approach for data representation:

For any sample set 𝑆, consider the space of polynomial equations satisfied by the sample set:

{𝑝 : 𝑝(𝑠) = 0 ∀𝑠 ∈ 𝑆}

• Fully characterizes a sample
• Finitely (and tractably) generated for finite S

On real-world data sets:

𝑝(𝑠) ≈ 0 ∀𝑠 ∈ 𝑆
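As a toy illustration of such an invariant (our own example, not from the poster): points sampled from the unit circle all satisfy 𝑥1² + 𝑥2² − 1 = 0, so that polynomial lies in the (approximate) vanishing ideal of the sample. A minimal numpy check:

```python
import numpy as np

# Toy sample (an assumption for illustration): 50 points on the unit circle.
S = np.array([[np.cos(t), np.sin(t)] for t in np.linspace(0.0, 2.0 * np.pi, 50)])

# Candidate polynomial p(x1, x2) = x1^2 + x2^2 - 1: it vanishes exactly on the
# circle, so it is an invariant ("vanishing component") of this sample.
def p(x):
    return x[0] ** 2 + x[1] ** 2 - 1.0

values = np.array([p(s) for s in S])
print(np.max(np.abs(values)))  # ~1e-16: p lies in the (approximate) vanishing ideal of S
```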

Some Definitions:

An ideal 𝐼 in the polynomial ring 𝑅[𝑥1, … , 𝑥𝑚] satisfies:
• 𝐼 is a linear subspace
• Absorbing property: 𝑓 ∈ 𝐼, 𝑔 ∈ 𝑅[𝑥1, … , 𝑥𝑚] ⇒ 𝑓 ⋅ 𝑔 ∈ 𝐼

{𝑔1, … , 𝑔𝑚} generate 𝐼 if every 𝑓 ∈ 𝐼 can be written as 𝑓 = ∑ᵢ ℎᵢ𝑔ᵢ for some polynomials ℎᵢ.
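A small worked example of these definitions (ours, not from the poster): for the two-point sample S = {(0,0), (1,1)} one can check that 𝑔1 = 𝑥1 − 𝑥2 and 𝑔2 = 𝑥1² − 𝑥1 generate the vanishing ideal, and a vanishing polynomial such as 𝑓 = 𝑥1² − 𝑥2² indeed decomposes as 𝑓 = ∑ᵢ ℎᵢ𝑔ᵢ:

```latex
% Worked example: S = {(0,0), (1,1)}, generators g_1 = x_1 - x_2, g_2 = x_1^2 - x_1.
% The polynomial f = x_1^2 - x_2^2 vanishes on S and decomposes as f = h_1 g_1 + h_2 g_2:
\[
  x_1^2 - x_2^2 \;=\; \underbrace{(x_1 + x_2)}_{h_1}\,(x_1 - x_2)
  \;+\; \underbrace{0}_{h_2}\cdot(x_1^2 - x_1).
\]
```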

Guarantees:

• At zero tolerance, V generates the vanishing ideal.
• The algorithm stops after 𝑚 iterations.
• |𝐹| ≤ 𝑚 ;  |𝑉| ≤ 𝑚² ⋅ min(|𝐹|, 𝑑).
• Computing F and V on a new sample point takes 𝑂(|𝐹|² + |𝐹| ⋅ |𝑉|) time.
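To make the last guarantee concrete: at test time both component sets act as feature maps on a new point 𝑥 — F(x) carries the informative (non-vanishing) features, while |V(x)| measures how strongly 𝑥 violates the sample's invariants. A minimal sketch, assuming the learned components are available as plain Python callables (illustrative names, not an API from the paper):

```python
import numpy as np

def vca_features(x, F_polys, V_polys):
    """Evaluate learned components on a new point x.

    F_polys / V_polys are assumed to be lists of plain callables produced by
    some VCA implementation (illustrative names, not the paper's API).
    Near-zero |v(x)| values mean x satisfies the sample's invariants."""
    f_vals = np.array([f(x) for f in F_polys])           # informative, non-vanishing features
    v_vals = np.abs(np.array([v(x) for v in V_polys]))   # ~0 iff x lies near the learned variety
    return f_vals, v_vals
```

With the paper's representation of the polynomials, this evaluation costs the 𝑂(|𝐹|² + |𝐹| ⋅ |𝑉|) stated above.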

Applications and Future Directions:

• By learning the invariants of each class, we exploit VCA for classification tasks (a hedged sketch of this recipe follows below).
• We construct a linear classifier that is competitive with kernel SVM (KSVM) and faster at test time.
• How can we exploit vanishing components for other tasks in machine learning?
  • Clustering.
  • Parametric manifold learning.
  • And more…
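The classification recipe from the first bullet, as a hedged sketch (our own rendering, not the authors' code): fit the vanishing components separately per class — vca_fit below is a hypothetical helper returning each class's nearly vanishing polynomials as callables — map every point to the absolute values of all classes' vanishing polynomials, and train an ordinary linear classifier on top.

```python
import numpy as np
from sklearn.svm import LinearSVC

def vca_feature_map(X, vanishing_polys):
    # Feature map: |g(x)| for every nearly vanishing polynomial g of every class.
    return np.abs(np.array([[g(x) for g in vanishing_polys] for x in X]))

def train_vca_classifier(X, y, vca_fit):
    # vca_fit(samples) is a hypothetical helper returning that class's
    # nearly vanishing polynomials as callables.
    polys = [g for c in np.unique(y) for g in vca_fit(X[y == c])]
    clf = LinearSVC().fit(vca_feature_map(X, polys), y)
    return clf, polys
```

Because 𝑔(𝑥) ≈ 0 precisely when 𝑥 belongs to the class whose invariants produced 𝑔, these features are close to linearly separable, which is why a linear classifier suffices and why test time is fast.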


Timeline:

• Hilbert (1890): Every ideal in the polynomial ring has a finite set of generators.
• Buchberger & Möller (1982): For the set of polynomials vanishing on a finite point set, a set of generators can be computed and stored efficiently.
• Kreuzer et al. (2009): The AVI (“Approximately Vanishing Ideal”) algorithm, a numerical approach.

Algorithm: VCA

[Figure: schematic of the VCA pipeline. The coordinate functions 𝑥1, 𝑥2, 𝑥3 pass through an SVD step, producing components 𝑝1, 𝑝2, 𝑝3; the next-degree products 𝑝1², 𝑝1𝑝2, 𝑝2², … are split by another SVD into components 𝑔1, 𝑔2, 𝑔3; their next-degree products 𝑔1𝑝1, 𝑔2𝑝1, 𝑔1𝑝2, 𝑔2𝑝2 are split by a further SVD into ℎ1, ℎ2, ℎ3, ℎ4, and so on degree by degree. Return V and F.]
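The diagram corresponds to the following simplified loop: a minimal sketch of the degree-by-degree SVD split, assuming we only track evaluation vectors on the sample S (the actual algorithm also records how each polynomial is built, so it can be evaluated on new points, and forms its candidates from products of non-vanishing components rather than raw coordinates).

```python
import numpy as np

def vca_sketch(S, eps=0.1, max_degree=5):
    """Simplified VCA-style loop on a sample S (m points x d coordinates).

    eps is a tolerance on singular values (scale-dependent in this sketch)."""
    m, d = S.shape
    F = [np.ones(m) / np.sqrt(m)]               # non-vanishing components (evaluations on S)
    V = []                                       # approximately vanishing components
    candidates = [S[:, j] for j in range(d)]     # degree-1 candidates: the coordinates
    for _ in range(max_degree):
        if not candidates:
            break
        A = np.stack(candidates, axis=1)         # m x (#candidates) evaluation matrix
        Fmat = np.stack(F, axis=1)
        A = A - Fmat @ (Fmat.T @ A)              # remove the span of F from the candidates
        U, sing, _ = np.linalg.svd(A, full_matrices=False)
        new_F = []
        for u, s in zip(U.T, sing):
            if s < eps:
                V.append(s * u)                  # nearly vanishing direction
            else:
                new_F.append(u)                  # informative direction, kept in F
        F.extend(new_F)
        # Next-degree candidates: products of the new components with the coordinates.
        candidates = [f * S[:, j] for f in new_F for j in range(d)]
    return F, V
```

Working with evaluation vectors is what makes the SVD split possible: here “vanishing” simply means a candidate's evaluation vector has small norm once the span of F has been removed.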

Experiments

Data Set     Test Runtime (KSVM)   Test Runtime (VCA)   Error Rate (KSVM)   Error Rate (VCA)
Pendigits    9,600                 2,800                0.42                0.42
Letter       70,000                1,100                4.3                 4.8
USPS         380,000               2,600                1.4                 1.5
MNIST        3,100,000             4,000                2                   2.2