The Forgetron: A kernel-based Perceptron on a fixed budget

The Forgetron: A kernel-based Perceptron on a fixed budget

Problem Setting•Online learning:

•Choose an initial hypothesis f0

•For t=1,2,…

•Receive an instance xt and predict sign(ft(xt))

•If (yt ft(xt) <= 0)

•update the hypothesis f

•Goal: minimize the number of prediction mistakes

•Kernel-based hypotheses

•Example: the dual Perceptron

• is always 1

•Initial hypothesis:

•Update rule:

•Competitive analysis

•Compete against any g:

•Goal: A provably correct algorithm on a budget: |It| <= B

The Forgetron Proof Technique

Experiments

Hardness Result•For any kernel-based algorithm that satisfies |It| <= B we can find g and an arbitrarily long sequence such that the algorithm errs on every single round whereas g suffers no loss.

• for each ft based on B examples, exists x in X such that ft(x) = 0

•norm of the competitor is

we cannot compete with hypotheses whose norm larger than

•Initilize:

•For t=1,2…

•define

•Receive an instance xt, predict sign(ft(xt)), and then receive yt

•If yt ft(xt) <= 0 set Mt = Mt-1 + 1 and update:

•If |I’t|<= B skip the next two steps

•define rt = min It

•Progress measure: can be rewritten as

•The Perceptron update leads to positive progress:

•The shrinking and removal steps might lead to negative progress Can we quantify the deviation they might cause ?

synthetic (10% label noise)

The Hebrew University Jerusalem, Israel

Ofer Dekel

Shai Shalev-Shwartz

Yoram Singer

•A sequence {(x1,y1),…,(xT,yT)} such that K(xt,xt) <= 1

•Budget B >= 84

•Competitor g s.t.

•Then, 1 2 3 ... ... t-1 t

Step (1) - Perceptron

Step (2) - Shrinking

Step (3) - Removal

•Compare to Crammer, Kandola, Singer (NIPS’03)

•Display online error vs. budget constraint 1 2 3 ... ... t-1 t

1 2 3 ... ... t-1 t

1 2 3 ... ... t-1 t

Deviation due to Shrinking

Deviation due to Removal

•Case I: the shrinking is projection onto the ball of radius U = ||g||:

•Case II: aggressive shrinking

•If example (x,y) is activated on round r and deactivated on round t then,

Perceptron’s progress

Deviation due to removal:

Mistake Bound

hinge lossmax { 0 , 1 - ytg(xt)

}

1000 2000 3000 4000 5000 6000

0.05

0.1

0.15

0.2

0.25

0.3

budget size - B

ave

rag

e e

rro

r

ForgetronCKS

0 500 1000 1500 2000

0.2

0.25

0.3

0.35

0.4

0.45

budget size - B

ave

rag

e e

rro

r

ForgetronCKS

100 200 300 400 5000.05

0.1

0.15

0.2

0.25

budget size - B

ave

rag

e e

rro

r

ForgetronCKS

USPS

census-income (adult)

1000 2000 3000

0.05

0.1

0.15

0.2

0.25

0.3

0.35

budget size - B

ave

rag

e e

rro

r

ForgetronCKS

MNIST

define

define

Documents

The Forgetron: A kernel-based Perceptron on a fixed budget