Slide 1: PEGASOS: Primal Estimated sub-GrAdient SOlver for SVM
Shai Shalev-Shwartz, Yoram Singer, Nati Srebro
The Hebrew University, Jerusalem, Israel
(YASSO = Yet Another Svm SOlver)
Slide 2: Support Vector Machines

QP form:
\[
\min_{w,\,\xi} \;\; \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{m} \xi_i
\quad\text{s.t.}\quad y_i \langle w, x_i \rangle \ge 1 - \xi_i,\;\; \xi_i \ge 0
\]

More "natural" form:
\[
\min_{w} \;\;
\underbrace{\tfrac{\lambda}{2}\|w\|^2}_{\text{regularization term}}
\;+\;
\underbrace{\frac{1}{m}\sum_{(x,y)\in S} \max\{0,\; 1 - y\langle w, x\rangle\}}_{\text{empirical loss}}
\]
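To make the "natural" form concrete, here is a small Python sketch that evaluates it; the function name and array shapes are my own illustration, not from the slides:

    import numpy as np

    def svm_objective(w, X, y, lam):
        # (lam/2)*||w||^2 + (1/m) * sum_i max{0, 1 - y_i <w, x_i>}
        # X: (m, n) matrix of examples, y: (m,) labels in {-1, +1}
        margins = y * (X @ w)
        hinge = np.maximum(0.0, 1.0 - margins)
        return 0.5 * lam * np.dot(w, w) + hinge.mean()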
Slide 3: Outline
• Previous work
• The Pegasos algorithm
• Analysis – faster convergence rates
• Experiments – outperforms state-of-the-art
• Extensions
  • kernels
  • complex prediction problems
  • bias term
Slide 4: Previous Work
• Dual-based methods
  • Interior Point methods
    • Memory: m², time: m³ log(log(1/ε))
  • Decomposition methods
    • Memory: m, time: super-linear in m
• Online learning & stochastic gradient
  • Memory: O(1), time: 1/ε² (linear kernel)
  • Memory: 1/ε², time: 1/ε⁴ (non-linear kernel)
Typically, online learning algorithms do not converge to the optimal solution of the SVM problem.
Better rates exist for finite-dimensional instances (Murata, Bottou).
Slide 5: PEGASOS

One iteration of the algorithm:
• choose $A_t \subseteq S$ and set $\eta_t = 1/(\lambda t)$
• subgradient step: $w' = w_t - \eta_t \nabla_t$, where
\[
\nabla_t \;=\; \lambda w_t \;-\; \frac{1}{|A_t|} \sum_{(x,y)\in A_t^{+}} y\, x,
\qquad A_t^{+} = \{(x,y)\in A_t : y\langle w_t, x\rangle < 1\}
\]
• projection onto the ball of radius $1/\sqrt{\lambda}$: $w_{t+1} = \min\!\big\{1,\; \tfrac{1/\sqrt{\lambda}}{\|w'\|}\big\}\, w'$

Special cases:
• $A_t = S$: the subgradient method
• $|A_t| = 1$: stochastic gradient
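A compact Python sketch of the loop above for a linear kernel; the names and defaults are illustrative, not from the slides. Setting k = m recovers the subgradient method, k = 1 the stochastic variant:

    import numpy as np

    def pegasos(X, y, lam=0.1, T=1000, k=1, seed=0):
        rng = np.random.default_rng(seed)
        m, n = X.shape
        w = np.zeros(n)
        for t in range(1, T + 1):
            idx = rng.choice(m, size=k, replace=False)    # choose A_t with |A_t| = k
            Xt, yt = X[idx], y[idx]
            active = yt * (Xt @ w) < 1                    # A_t^+: margin violators
            eta = 1.0 / (lam * t)                         # step size eta_t = 1/(lambda*t)
            grad = lam * w - (yt[active, None] * Xt[active]).sum(axis=0) / k
            w -= eta * grad                               # subgradient step
            norm = np.linalg.norm(w)
            if norm > 0:                                  # project onto ball of radius 1/sqrt(lambda)
                w *= min(1.0, 1.0 / (np.sqrt(lam) * norm))
        return w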
Slide 6: Run-Time of Pegasos
• Choosing $|A_t| = 1$ and a linear kernel over $\mathbb{R}^n$, the run-time required for Pegasos to find an $\epsilon$-accurate solution w.p. $\ge 1-\delta$ is
\[
\tilde{O}\!\left(\frac{n}{\lambda\,\delta\,\epsilon}\right)
\]
• Run-time does not depend on the number of examples
• Depends on the "difficulty" of the problem ($\lambda$ and $\epsilon$)
Slide 7: Formal Properties
• Definition: $w$ is $\epsilon$-accurate if $f(w) \le f(w^\ast) + \epsilon$
• Theorem 1: Pegasos finds an $\epsilon$-accurate solution w.p. $\ge 1-\delta$ after at most $\tilde{O}\big(\tfrac{1}{\lambda\delta\epsilon}\big)$ iterations.
• Theorem 2: Pegasos finds $\log(1/\delta)$ solutions s.t., w.p. $\ge 1-\delta$, at least one of them is $\epsilon$-accurate after $\tilde{O}\big(\tfrac{\log(1/\delta)}{\lambda\epsilon}\big)$ iterations.
Slide 8: Proof Sketch
A second look at the update step: with $\eta_t = 1/(\lambda t)$, the step $w_{t+1} = w_t - \eta_t \nabla_t$ can be rewritten as
\[
w_{t+1} \;=\; \Big(1 - \tfrac{1}{t}\Big)\, w_t \;+\; \frac{\eta_t}{|A_t|} \sum_{(x,y)\in A_t^{+}} y\, x
\]
Slide 9: Proof Sketch
• Lemma (free projection): projecting onto the ball of radius $1/\sqrt{\lambda}$ never increases the distance to the optimum, so the projection step is "free" in the analysis
• Logarithmic regret for Online Convex Programming (Hazan et al. '06)
• Take expectation over the random choice of the sets $A_t$
• Since $f(w_r) - f(w^\ast) \ge 0$, Markov's inequality gives the bound w.p. $\ge 1-\delta$
• Amplify the confidence with independent runs
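A hedged reconstruction of the chain those bullets compress; the constant $G$ (a bound on the subgradient norms) and the exact constants are my assumptions, not from the slides:

\[
\frac{1}{T}\sum_{t=1}^{T} f(w_t; A_t) - \min_{w} \frac{1}{T}\sum_{t=1}^{T} f(w; A_t)
\;\le\; \frac{G^2(1+\ln T)}{2\lambda T}
\qquad \text{(logarithmic regret for $\lambda$-strongly convex losses)}
\]
\[
\mathbb{E}\big[f(\bar{w})\big] - f(w^\ast) \;\le\; \frac{G^2(1+\ln T)}{2\lambda T}
\qquad \text{(expectation over the $A_t$; Jensen for $\bar{w} = \tfrac{1}{T}\sum_t w_t$)}
\]
\[
\Pr\big[f(\bar{w}) - f(w^\ast) \ge \epsilon\big] \;\le\; \frac{G^2(1+\ln T)}{2\lambda T \epsilon}
\qquad \text{(Markov, valid since $f(\bar{w}) - f(w^\ast) \ge 0$)}
\]

Setting the right-hand side to $\delta$ gives $T = \tilde{O}(1/(\lambda\delta\epsilon))$, and taking the best of $\log(1/\delta)$ independent runs amplifies the confidence to the rate of Theorem 2.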
Slide 10: Experiments
• 3 datasets (provided by Joachims)
  • Reuters CCAT (800k examples, 47k features)
  • Physics ArXiv (62k examples, 100k features)
  • Covertype (581k examples, 54 features)
• 4 competing algorithms
  • SVM-Light (Joachims)
  • SVM-Perf (Joachims '06)
  • Norma (Kivinen, Smola, Williamson '02)
  • Zhang '04 (stochastic gradient descent)
• Source code available online
Slide 11: Training Time (in seconds)

                  Pegasos   SVM-Perf   SVM-Light
  Reuters               2         77      20,075
  Covertype             6         85      25,514
  Astro-Physics         2          5          80
Slide 12: Compare to Norma (on Physics)
[Plots: objective value and test error]
Slide 13: Compare to Zhang (on Physics)
[Plot: objective value]
But tuning the parameter is more expensive than learning…
Slide 14: Effect of k = |A_t| when T is fixed
[Plot: objective value]
Slide 15: Effect of k = |A_t| when kT is fixed
[Plot: objective value]
Slide 16: I want my kernels!
• Pegasos can seamlessly be adapted to employ non-linear kernels while working solely on the primal objective function
• No need to switch to the dual problem
• The number of support vectors is bounded by the number of iterations
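A minimal sketch of how this primal-only kernel trick can look in Python, assuming $w_t$ is represented implicitly as $w_t = \frac{1}{\lambda t}\sum_j \alpha[j]\, y_j\, \phi(x_j)$; the function names and the kernel interface are illustrative, not from the slides:

    import numpy as np

    def kernel_pegasos(X, y, kernel, lam=0.1, T=1000, seed=0):
        # alpha[j] counts how many times example j triggered an update;
        # w_t is never formed explicitly, only kernel evaluations are used.
        rng = np.random.default_rng(seed)
        m = X.shape[0]
        alpha = np.zeros(m)
        for t in range(1, T + 1):
            i = rng.integers(m)                          # |A_t| = 1
            margin = y[i] / (lam * t) * np.sum(alpha * y * kernel(X[i], X))
            if margin < 1:                               # hinge loss active: x_i becomes a support vector
                alpha[i] += 1
        return alpha

Each iteration increments at most one entry of alpha, which is where the support-vector bound above comes from. For an RBF kernel one could pass, e.g., kernel = lambda xi, Xs: np.exp(-np.sum((Xs - xi)**2, axis=1)).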
Slide 17: Complex Decision Problems
• Pegasos works whenever we know how to calculate subgradients of the loss function $\ell(w;(x,y))$
• Example: structured output prediction
• The subgradient is $\phi(x,y') - \phi(x,y)$, where $y'$ is the maximizer in the definition of $\ell$
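As a concrete illustration, a sketch of one Pegasos step on a structured loss; phi and loss_augmented_argmax are problem-specific callables I am assuming here, not part of the original:

    def structured_pegasos_step(w, x, y, t, lam, phi, loss_augmented_argmax):
        # phi(x, y): joint feature map (NumPy array);
        # loss_augmented_argmax(w, x, y): returns the maximizer y' in the loss definition.
        eta = 1.0 / (lam * t)
        y_hat = loss_augmented_argmax(w, x, y)
        subgrad = lam * w + phi(x, y_hat) - phi(x, y)   # subgradient of the regularized loss
        return w - eta * subgrad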
Slide 18: Bias Term
• Popular approach: add a constant coordinate to x (increase its dimension by one)
  Cons: we "pay" for b in the regularization term
• Calculate subgradients w.r.t. w and w.r.t. b (see the sketch below)
  Cons: convergence rate degrades to $1/\epsilon^2$
• Define the loss directly on the mini-batch, with the bias chosen optimally for it
  Cons: $|A_t|$ needs to be large
• Search for b in an outer loop
  Cons: each evaluation of the objective costs $1/\epsilon^2$
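A sketch of the second option, assuming NumPy arrays and an unregularized bias; the names and exact update are illustrative:

    import numpy as np

    def step_with_bias(w, b, x, y, lam, eta):
        # One subgradient step on (lam/2)*||w||^2 + [1 - y*(<w,x> + b)]_+ ,
        # updating both w and the unregularized bias b.
        if y * (np.dot(w, x) + b) < 1:    # hinge loss active at (x, y)
            w = (1.0 - eta * lam) * w + eta * y * x
            b = b + eta * y               # b carries no regularization term
        else:
            w = (1.0 - eta * lam) * w
        return w, b

Because b is unregularized, the objective is not strongly convex in b, which is why the rate drops to $1/\epsilon^2$.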
Slide 19: Discussion
• Pegasos: a simple & efficient solver for SVM
• Sample vs. computational complexity
  • Sample complexity: how many examples do we need, as a function of the VC-dimension, the accuracy ($\epsilon$), and the confidence ($\delta$)
  • In Pegasos, we aim at analyzing the computational complexity based on $\lambda$, $\epsilon$, and $\delta$ (as also in Bottou & Bousquet)
• Finding the argmin vs. calculating the min: it seems that Pegasos finds the argmin more easily than it can calculate the min value