Slide 1: PEGASOS: Primal Estimated sub-GrAdient SOlver for SVM
Shai Shalev-Shwartz, Yoram Singer, Nati Srebro
The Hebrew University, Jerusalem, Israel
(YASSO = Yet Another Svm SOlver)
Slide 2: Support Vector Machines

QP form:
\[
\min_{w,\,\xi} \;\; \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{m} \xi_i
\quad\text{s.t.}\quad y_i \langle w, x_i \rangle \ge 1 - \xi_i,\;\; \xi_i \ge 0
\]

More "natural" form:
\[
\min_{w} \;\;
\underbrace{\tfrac{\lambda}{2}\|w\|^2}_{\text{regularization term}}
\;+\;
\underbrace{\frac{1}{m}\sum_{(x,y)\in S} \max\{0,\; 1 - y\langle w, x\rangle\}}_{\text{empirical loss}}
\]
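To make the "natural" form concrete, here is a small Python sketch that evaluates it; the function name and array shapes are my own illustration, not from the slides:

    import numpy as np

    def svm_objective(w, X, y, lam):
        # (lam/2)*||w||^2 + (1/m) * sum_i max{0, 1 - y_i <w, x_i>}
        # X: (m, n) matrix of examples, y: (m,) labels in {-1, +1}
        margins = y * (X @ w)
        hinge = np.maximum(0.0, 1.0 - margins)
        return 0.5 * lam * np.dot(w, w) + hinge.mean()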
Slide 3: Outline
• Previous work
• The Pegasos algorithm
• Analysis – faster convergence rates
• Experiments – outperforms state-of-the-art
• Extensions
  • kernels
  • complex prediction problems
  • bias term
Slide 4: Previous Work
• Dual-based methods
  • Interior Point methods
    • Memory: m², time: m³ log(log(1/ε))
  • Decomposition methods
    • Memory: m, time: super-linear in m
• Online learning & stochastic gradient
  • Memory: O(1), time: 1/ε² (linear kernel)
  • Memory: 1/ε², time: 1/ε⁴ (non-linear kernel)
Typically, online learning algorithms do not converge to the optimal solution of the SVM problem.
Better rates exist for finite-dimensional instances (Murata, Bottou).
Slide 5: PEGASOS

One iteration of the algorithm:
• choose $A_t \subseteq S$ and set $\eta_t = 1/(\lambda t)$
• subgradient step: $w' = w_t - \eta_t \nabla_t$, where
\[
\nabla_t \;=\; \lambda w_t \;-\; \frac{1}{|A_t|} \sum_{(x,y)\in A_t^{+}} y\, x,
\qquad A_t^{+} = \{(x,y)\in A_t : y\langle w_t, x\rangle < 1\}
\]
• projection onto the ball of radius $1/\sqrt{\lambda}$: $w_{t+1} = \min\!\big\{1,\; \tfrac{1/\sqrt{\lambda}}{\|w'\|}\big\}\, w'$

Special cases:
• $A_t = S$: the subgradient method
• $|A_t| = 1$: stochastic gradient
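A compact Python sketch of the loop above for a linear kernel; the names and defaults are illustrative, not from the slides. Setting k = m recovers the subgradient method, k = 1 the stochastic variant:

    import numpy as np

    def pegasos(X, y, lam=0.1, T=1000, k=1, seed=0):
        rng = np.random.default_rng(seed)
        m, n = X.shape
        w = np.zeros(n)
        for t in range(1, T + 1):
            idx = rng.choice(m, size=k, replace=False)    # choose A_t with |A_t| = k
            Xt, yt = X[idx], y[idx]
            active = yt * (Xt @ w) < 1                    # A_t^+: margin violators
            eta = 1.0 / (lam * t)                         # step size eta_t = 1/(lambda*t)
            grad = lam * w - (yt[active, None] * Xt[active]).sum(axis=0) / k
            w -= eta * grad                               # subgradient step
            norm = np.linalg.norm(w)
            if norm > 0:                                  # project onto ball of radius 1/sqrt(lambda)
                w *= min(1.0, 1.0 / (np.sqrt(lam) * norm))
        return w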
Slide 6: Run-Time of Pegasos
• Choosing $|A_t| = 1$ and a linear kernel over $\mathbb{R}^n$, the run-time required for Pegasos to find an $\epsilon$-accurate solution w.p. $\ge 1-\delta$ is
\[
\tilde{O}\!\left(\frac{n}{\lambda\,\delta\,\epsilon}\right)
\]
• Run-time does not depend on the number of examples
• Depends on the "difficulty" of the problem ($\lambda$ and $\epsilon$)
Slide 7: Formal Properties
• Definition: $w$ is $\epsilon$-accurate if $f(w) \le f(w^\ast) + \epsilon$
• Theorem 1: Pegasos finds an $\epsilon$-accurate solution w.p. $\ge 1-\delta$ after at most $\tilde{O}\big(\tfrac{1}{\lambda\delta\epsilon}\big)$ iterations.
• Theorem 2: Pegasos finds $\log(1/\delta)$ solutions s.t., w.p. $\ge 1-\delta$, at least one of them is $\epsilon$-accurate after $\tilde{O}\big(\tfrac{\log(1/\delta)}{\lambda\epsilon}\big)$ iterations.
Slide 8: Proof Sketch
A second look at the update step: with $\eta_t = 1/(\lambda t)$, the step $w_{t+1} = w_t - \eta_t \nabla_t$ can be rewritten as
\[
w_{t+1} \;=\; \Big(1 - \tfrac{1}{t}\Big)\, w_t \;+\; \frac{\eta_t}{|A_t|} \sum_{(x,y)\in A_t^{+}} y\, x
\]
Slide 9: Proof Sketch
• Lemma (free projection): projecting onto the ball of radius $1/\sqrt{\lambda}$ never increases the distance to the optimum, so the projection step is "free" in the analysis
• Logarithmic regret for Online Convex Programming (Hazan et al. '06)
• Take expectation over the random choice of the sets $A_t$
• Since $f(w_r) - f(w^\ast) \ge 0$, Markov's inequality gives the bound w.p. $\ge 1-\delta$
• Amplify the confidence with independent runs
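A hedged reconstruction of the chain those bullets compress; the constant $G$ (a bound on the subgradient norms) and the exact constants are my assumptions, not from the slides:

\[
\frac{1}{T}\sum_{t=1}^{T} f(w_t; A_t) - \min_{w} \frac{1}{T}\sum_{t=1}^{T} f(w; A_t)
\;\le\; \frac{G^2(1+\ln T)}{2\lambda T}
\qquad \text{(logarithmic regret for $\lambda$-strongly convex losses)}
\]
\[
\mathbb{E}\big[f(\bar{w})\big] - f(w^\ast) \;\le\; \frac{G^2(1+\ln T)}{2\lambda T}
\qquad \text{(expectation over the $A_t$; Jensen for $\bar{w} = \tfrac{1}{T}\sum_t w_t$)}
\]
\[
\Pr\big[f(\bar{w}) - f(w^\ast) \ge \epsilon\big] \;\le\; \frac{G^2(1+\ln T)}{2\lambda T \epsilon}
\qquad \text{(Markov, valid since $f(\bar{w}) - f(w^\ast) \ge 0$)}
\]

Setting the right-hand side to $\delta$ gives $T = \tilde{O}(1/(\lambda\delta\epsilon))$, and taking the best of $\log(1/\delta)$ independent runs amplifies the confidence to the rate of Theorem 2.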
Slide 10: Experiments
• 3 datasets (provided by Joachims)
  • Reuters CCAT (800k examples, 47k features)
  • Physics ArXiv (62k examples, 100k features)
  • Covertype (581k examples, 54 features)
• 4 competing algorithms
  • SVM-Light (Joachims)
  • SVM-Perf (Joachims '06)
  • Norma (Kivinen, Smola, Williamson '02)
  • Zhang '04 (stochastic gradient descent)
• Source code available online
Slide 11: Training Time (in seconds)

                  Pegasos   SVM-Perf   SVM-Light
  Reuters               2         77      20,075
  Covertype             6         85      25,514
  Astro-Physics         2          5          80
Slide 12: Compare to Norma (on Physics)
[Plots: objective value and test error]
Slide 13: Compare to Zhang (on Physics)
[Plot: objective value]
But tuning the parameter is more expensive than learning…
Slide 14: Effect of k = |A_t| when T is fixed
[Plot: objective value]
Slide 15: Effect of k = |A_t| when kT is fixed
[Plot: objective value]
Slide 16: I want my kernels!
• Pegasos can seamlessly be adapted to employ non-linear kernels while working solely on the primal objective function
• No need to switch to the dual problem
• The number of support vectors is bounded by the number of iterations
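A minimal sketch of how this primal-only kernel trick can look in Python, assuming $w_t$ is represented implicitly as $w_t = \frac{1}{\lambda t}\sum_j \alpha[j]\, y_j\, \phi(x_j)$; the function names and the kernel interface are illustrative, not from the slides:

    import numpy as np

    def kernel_pegasos(X, y, kernel, lam=0.1, T=1000, seed=0):
        # alpha[j] counts how many times example j triggered an update;
        # w_t is never formed explicitly, only kernel evaluations are used.
        rng = np.random.default_rng(seed)
        m = X.shape[0]
        alpha = np.zeros(m)
        for t in range(1, T + 1):
            i = rng.integers(m)                          # |A_t| = 1
            margin = y[i] / (lam * t) * np.sum(alpha * y * kernel(X[i], X))
            if margin < 1:                               # hinge loss active: x_i becomes a support vector
                alpha[i] += 1
        return alpha

Each iteration increments at most one entry of alpha, which is where the support-vector bound above comes from. For an RBF kernel one could pass, e.g., kernel = lambda xi, Xs: np.exp(-np.sum((Xs - xi)**2, axis=1)).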
Slide 17: Complex Decision Problems
• Pegasos works whenever we know how to calculate subgradients of the loss function $\ell(w;(x,y))$
• Example: structured output prediction
• The subgradient is $\phi(x,y') - \phi(x,y)$, where $y'$ is the maximizer in the definition of $\ell$
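As a concrete illustration, a sketch of one Pegasos step on a structured loss; phi and loss_augmented_argmax are problem-specific callables I am assuming here, not part of the original:

    def structured_pegasos_step(w, x, y, t, lam, phi, loss_augmented_argmax):
        # phi(x, y): joint feature map (NumPy array);
        # loss_augmented_argmax(w, x, y): returns the maximizer y' in the loss definition.
        eta = 1.0 / (lam * t)
        y_hat = loss_augmented_argmax(w, x, y)
        subgrad = lam * w + phi(x, y_hat) - phi(x, y)   # subgradient of the regularized loss
        return w - eta * subgrad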
Slide 18: Bias Term
• Popular approach: add a constant coordinate to x (increase its dimension by one)
  Cons: we "pay" for b in the regularization term
• Calculate subgradients w.r.t. w and w.r.t. b (see the sketch below)
  Cons: convergence rate degrades to $1/\epsilon^2$
• Define the loss directly on the mini-batch, with the bias chosen optimally for it
  Cons: $|A_t|$ needs to be large
• Search for b in an outer loop
  Cons: each evaluation of the objective costs $1/\epsilon^2$
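A sketch of the second option, assuming NumPy arrays and an unregularized bias; the names and exact update are illustrative:

    import numpy as np

    def step_with_bias(w, b, x, y, lam, eta):
        # One subgradient step on (lam/2)*||w||^2 + [1 - y*(<w,x> + b)]_+ ,
        # updating both w and the unregularized bias b.
        if y * (np.dot(w, x) + b) < 1:    # hinge loss active at (x, y)
            w = (1.0 - eta * lam) * w + eta * y * x
            b = b + eta * y               # b carries no regularization term
        else:
            w = (1.0 - eta * lam) * w
        return w, b

Because b is unregularized, the objective is not strongly convex in b, which is why the rate drops to $1/\epsilon^2$.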
Slide 19: Discussion
• Pegasos: a simple & efficient solver for SVM
• Sample vs. computational complexity
  • Sample complexity: how many examples do we need, as a function of the VC-dimension, the accuracy ($\epsilon$), and the confidence ($\delta$)
  • In Pegasos, we aim at analyzing the computational complexity based on $\lambda$, $\epsilon$, and $\delta$ (as also in Bottou & Bousquet)
• Finding the argmin vs. calculating the min: it seems that Pegasos finds the argmin more easily than it can calculate the min value