1
The Forgetron: A kernel-based Perceptron on a fixed budget Problem Setting Online learning: Choose an initial hypothesis f 0 •For t=1,2,… Receive an instance x t and predict sign(f t (x t )) If (y t f t (x t ) <= 0) •update the hypothesis f Goal: minimize the number of prediction mistakes Kernel-based hypotheses •Example: the dual Perceptron is always 1 •Initial hypothesis: •Update rule: Competitive analysis •Compete against any g: Goal: A provably correct algorithm on a budget: |I t | <= B The Forgetron Proof Technique Experiments Hardness Result •For any kernel-based algorithm that satisfies |I t | <= B we can find g and an arbitrarily long sequence such that the algorithm errs on every single round whereas g suffers no loss. for each f t based on B examples, exists x in X such that f t (x) = 0 •norm of the competitor is we cannot compete with hypotheses •Initilize: •For t=1,2… •define Receive an instance x t , predict sign(f t (x t )), and then receive y t If y t f t (x t ) <= 0 set M t = M t-1 + 1 and update: If |I’ t |<= B skip the next two steps define r t = min I t •Progress measure: can be rewritten as •The Perceptron update leads to positive progress: •The shrinking and removal steps might lead to negative progress Can we quantify the deviation they might cause ? synthetic (10% label noise) The Hebrew University Jerusalem, Israel Ofer Dekel Shai Shalev- Shwartz Yoram Singer A sequence {(x 1 ,y 1 ),…,(x T ,y T )} such that K(x t ,x t ) <= 1 •Budget B >= 84 •Competitor g s.t. •Then, 1 2 3 ... ... t-1 t Step (1) - Perceptron Step (2) - Shrinking Step (3) - Removal •Compare to Crammer, Kandola, Singer (NIPS’03) •Display online error vs. budget constraint 1 2 3 ... ... t-1 t 1 2 3 ... ... t-1 t 1 2 3 ... ... t-1 t Deviation due to Shrinking Deviation due to Removal •Case I: the shrinking is projection onto the ball of radius U = ||g||: •Case II: aggressive shrinking •If example (x,y) is activated on round r and deactivated on round t then, Perceptron’s progress Deviation due to removal: Mistake Bound hinge loss max { 0 , 1 - y t g(x t ) } 1000 2000 3000 4000 5000 6000 0.05 0.1 0.15 0.2 0.25 0.3 budgetsize -B average error Forgetron CKS 0 500 1000 1500 2000 0.2 0.25 0.3 0.35 0.4 0.45 budgetsize -B average error Forgetron CKS 100 200 300 400 500 0.05 0.1 0.15 0.2 0.25 budgetsize -B average error Forgetron CKS USPS census- income (adult) 1000 2000 3000 0.05 0.1 0.15 0.2 0.25 0.3 0.35 budgetsize -B average error Forgetron CKS MNIST defin e defin e

The Forgetron: A kernel-based Perceptron on a fixed budget

Embed Size (px)

DESCRIPTION

Ofer Dekel Shai Shalev-Shwartz Yoram Singer. The Forgetron: A kernel-based Perceptron on a fixed budget. The Hebrew University Jerusalem, Israel. Problem Setting. The Forgetron. Proof Technique. Mistake Bound. Online learning: Choose an initial hypothesis f 0 For t=1,2,… - PowerPoint PPT Presentation

Citation preview

Page 1: The Forgetron:  A kernel-based Perceptron on a fixed budget

The Forgetron: A kernel-based Perceptron on a fixed budget

Problem Setting•Online learning:

•Choose an initial hypothesis f0

•For t=1,2,…

•Receive an instance xt and predict sign(ft(xt))

•If (yt ft(xt) <= 0)

•update the hypothesis f

•Goal: minimize the number of prediction mistakes

•Kernel-based hypotheses

•Example: the dual Perceptron

• is always 1

•Initial hypothesis:

•Update rule:

•Competitive analysis

•Compete against any g:

•Goal: A provably correct algorithm on a budget: |It| <= B

The Forgetron Proof Technique

Experiments

Hardness Result•For any kernel-based algorithm that satisfies |It| <= B we can find g and an arbitrarily long sequence such that the algorithm errs on every single round whereas g suffers no loss.

• for each ft based on B examples, exists x in X such that ft(x) = 0

•norm of the competitor is

we cannot compete with hypotheses whose norm larger than

•Initilize:

•For t=1,2…

•define

•Receive an instance xt, predict sign(ft(xt)), and then receive yt

•If yt ft(xt) <= 0 set Mt = Mt-1 + 1 and update:

•If |I’t|<= B skip the next two steps

•define rt = min It

•Progress measure: can be rewritten as

•The Perceptron update leads to positive progress:

•The shrinking and removal steps might lead to negative progress Can we quantify the deviation they might cause ?

synthetic (10% label noise)

The Hebrew University Jerusalem, Israel

Ofer Dekel

Shai Shalev-Shwartz

Yoram Singer

•A sequence {(x1,y1),…,(xT,yT)} such that K(xt,xt) <= 1

•Budget B >= 84

•Competitor g s.t.

•Then, 1 2 3 ... ... t-1 t

Step (1) - Perceptron

Step (2) - Shrinking

Step (3) - Removal

•Compare to Crammer, Kandola, Singer (NIPS’03)

•Display online error vs. budget constraint 1 2 3 ... ... t-1 t

1 2 3 ... ... t-1 t

1 2 3 ... ... t-1 t

Deviation due to Shrinking

Deviation due to Removal

•Case I: the shrinking is projection onto the ball of radius U = ||g||:

•Case II: aggressive shrinking

•If example (x,y) is activated on round r and deactivated on round t then,

Perceptron’s progress

Deviation due to removal:

Mistake Bound

hinge lossmax { 0 , 1 - ytg(xt)

}

1000 2000 3000 4000 5000 6000

0.05

0.1

0.15

0.2

0.25

0.3

budget size - B

ave

rag

e e

rro

r

ForgetronCKS

0 500 1000 1500 2000

0.2

0.25

0.3

0.35

0.4

0.45

budget size - B

ave

rag

e e

rro

r

ForgetronCKS

100 200 300 400 5000.05

0.1

0.15

0.2

0.25

budget size - B

ave

rag

e e

rro

r

ForgetronCKS

USPS

census-income (adult)

1000 2000 3000

0.05

0.1

0.15

0.2

0.25

0.3

0.35

budget size - B

ave

rag

e e

rro

r

ForgetronCKS

MNIST

define

define