
Page 1

Linear Programming Boosting by Column and Row Generation

Kohei Hatano and Eiji Takimoto, Kyushu University, Japan

DS 2009

Page 2

1. Introduction
2. Preliminaries
3. Our algorithm
4. Experiment
5. Summary

Page 3

Example: given a web page, predict whether its topic is “DS 2009”.

Hypothesis set = words.

[Figure: decision stumps, one per word (“DS 2009?”, “ALT?”, “Porto?”), each answering y/n with +1/-1, combined as a weighted majority vote: 0.5*(DS 2009?) + 0.3*(ALT?) + 0.2*(Porto?)]

Modern machine learning approach: find a weighted majority of hypotheses (a hyperplane) which enlarges the margin.
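To make the vote concrete, here is a toy sketch of this weighted majority; the substring test and the function names are illustrative assumptions, while the weights come from the slide.

```python
# Each stump answers +1/-1 depending on whether its word occurs in the page.
def stump(word):
    return lambda page: 1 if word in page else -1

weighted_stumps = [(0.5, stump("DS 2009")), (0.3, stump("ALT")), (0.2, stump("Porto"))]

def predict(page):
    vote = sum(w * h(page) for w, h in weighted_stumps)  # weighted majority vote
    return 1 if vote >= 0 else -1

print(predict("DS 2009 will be held in Porto"))  # -> 1
```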

Page 4

1-norm soft margin optimization (aka 1-norm SVM)

A popular formulation, as well as 2-norm soft margin optimization: "find a hyperplane which separates positive and negative instances well".

$$\max_{\rho,\,\alpha,\,\xi}\ \ \rho \;-\; \frac{1}{\nu}\sum_{i=1}^{m}\xi_i$$
$$\text{sub. to}\quad y_i\sum_{j=1}^{n}\alpha_j h_j(x_i) \;\ge\; \rho-\xi_i \ \ (i=1,\dots,m),\qquad \sum_{j=1}^{n}\alpha_j=1,\quad \alpha\ge 0,\quad \xi\ge 0$$

(ν: constant s.t. 1 ≤ ν ≤ m; ρ is the margin and ξ_i is the loss of instance i.)

Note:
・a Linear Program
・good generalization guarantee [Schapire et al. 98]
・margin and loss are normalized with the 1-norm
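As a concrete illustration, here is a minimal sketch of this LP in Python using scipy.optimize.linprog. It assumes a precomputed m×n matrix U with U[i, j] = y_i * h_j(x_i); the function and variable names are ours, not the paper's.

```python
import numpy as np
from scipy.optimize import linprog

def solve_primal(U, nu):
    """1-norm soft margin LP: max rho - (1/nu)*sum(xi), as on this slide."""
    m, n = U.shape
    # Variable vector z = (alpha_1..alpha_n, xi_1..xi_m, rho); linprog minimizes,
    # so we minimize -rho + (1/nu) * sum(xi).
    c = np.concatenate([np.zeros(n), np.full(m, 1.0 / nu), [-1.0]])
    # Margin constraints: rho - xi_i - sum_j alpha_j * U[i, j] <= 0.
    A_ub = np.hstack([-U, -np.eye(m), np.ones((m, 1))])
    b_ub = np.zeros(m)
    # Simplex constraint: sum_j alpha_j = 1.
    A_eq = np.concatenate([np.ones(n), np.zeros(m), [0.0]]).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, None)] * (n + m) + [(None, None)]  # alpha, xi >= 0; rho free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    alpha, xi, rho = res.x[:n], res.x[n:n + m], res.x[-1]
    return alpha, xi, rho
```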

Page 5

1-norm soft margin optimization (2)

Advantage of 1-norm soft margin optimization: the solution is likely to be sparse, which is useful for feature selection.

1-norm soft margin opt. → sparse hyperplane:
0.5*(DS 2009?) + 0.3*(ALT?) + 0.2*(Porto?)

2-norm soft margin opt. → non-sparse hyperplane:
0.2*(DS 2009?) + 0.1*(ALT?) + 0.1*(Porto?) + 0.1*(wine?) + 0.05*(tasting?) + 0.05*(discovery?) + 0.03*(science?) + 0.02*(…) + …

Page 6

Recent Results

2-norm soft margin optimization: Quadratic Programming. There are state-of-the-art solvers:
・SMO [Platt, 1999]
・SVMlight [Joachims, 1999]
・SVMperf [Joachims, 2006]
・Pegasos [Shalev-Shwartz et al., 2007]

1-norm soft margin optimization: Linear Programming. Not efficient enough for large data; room for improvements:
・LPBoost [Demiriz et al., 2003]
・Entropy Regularized LPBoost [Warmuth et al., 2008]
・others [Mangasarian, 2006] [Sra, 2006]

Our result: a new algorithm for 1-norm soft margin optimization.

Page 7

1. Introduction
2. Preliminaries
3. Our algorithm
4. Experiment
5. Summary

Page 8

Boosting

Classification: frog “+1”, others “-1”. Hypotheses: color and size.

[Figure: instances plotted by color and size, labeled +1 (frogs) and -1 (others)]

1. d_1: uniform distribution
2. For t = 1,…,T:
 (i) choose hypothesis h_t maximizing the edge w.r.t. d_t
 (ii) update distribution d_t to d_{t+1}
3. Output a weighting of the chosen hypotheses

Page 9

Boosting (2)

[Figure: the same color/size plot; the first chosen hypothesis h_1 is a threshold on color, with predictions h_1(x_i)]

Edge of hypothesis h w.r.t. distribution d:

$$\mathrm{edge}_{d}(h) \;=\; \sum_{i=1}^{m} d_i\, y_i\, h(x_i)$$

Here h(x_i) ∈ {-1,+1} (or [-1,+1]), and y_i h(x_i) > 0 iff h is correct on instance i, so the edge measures the accuracy of h w.r.t. d.
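The edge definition translates directly into code; a minimal numpy rendering (names ours; d is a probability vector over the m instances, y[i] ∈ {-1,+1}, preds[i] = h(x_i)):

```python
import numpy as np

def edge(d, y, preds):
    # edge_d(h) = sum_i d_i * y_i * h(x_i); larger means h is more accurate on d
    return float(np.sum(d * y * preds))
```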

Page 10

Boosting (2)

[Figure: the same plot with h_1 and its predictions h_1(x_i)]

Update: more weight on the misclassified instances.

Page 11

Boosting (3)

[Figure: the same plot; the second chosen hypothesis h_2 is a threshold on size]

Note: more weight on “difficult” instances.

Page 12

Boosting (4)

[Figure: the same plot; the final output is the weighted majority 0.4 h_1 + 0.6 h_2]

Page 13

Boosting & 1-norm soft margin optimization

Primal:

$$\max_{\rho,\,\alpha,\,\xi}\ \ \rho \;-\; \frac{1}{\nu}\sum_{i=1}^{m}\xi_i \qquad \text{sub. to}\quad y_i\sum_{j=1}^{n}\alpha_j h_j(x_i) \ge \rho-\xi_i \ \ (i=1,\dots,m),\quad \sum_{j=1}^{n}\alpha_j=1,\ \alpha\ge 0,\ \xi\ge 0$$

"Find the large-margin hyperplane which separates positive and negative instances as much as possible."

Dual:

$$\min_{d,\,\gamma}\ \ \gamma \qquad \text{sub. to}\quad \mathrm{edge}_{d}(h_j) \le \gamma \ \ (j=1,\dots,n),\quad \sum_{i=1}^{m} d_i = 1,\quad 0 \le d_i \le \frac{1}{\nu}$$

"Find the distribution which minimizes the maximum edge of the hypotheses (the most “difficult” distribution w.r.t. the hypotheses)." This problem is approximately solvable using Boosting.

The two problems are equivalent by LP duality.
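Continuing the earlier sketch, the dual can be written with linprog as well; again U[i, j] = y_i * h_j(x_i), and all names are ours.

```python
import numpy as np
from scipy.optimize import linprog

def solve_dual(U, nu):
    """Dual LP: min gamma s.t. edge_d(h_j) <= gamma, sum(d)=1, 0 <= d_i <= 1/nu."""
    m, n = U.shape
    # Variable vector z = (d_1..d_m, gamma); objective: minimize gamma.
    c = np.concatenate([np.zeros(m), [1.0]])
    # Edge constraints: (U^T d)_j - gamma <= 0 for each hypothesis j.
    A_ub = np.hstack([U.T, -np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, 1.0 / nu)] * m + [(None, None)]  # capped distribution; gamma free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]  # (d, gamma)
```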

Page 14

LPBoost [Demiriz et al., 2003]

Update: solve the dual problem restricted to the hypotheses chosen so far, {h_1,…,h_t}:

$$(d_{t+1},\,\gamma_t) \;=\; \arg\min_{d,\,\gamma}\ \gamma \qquad \text{sub. to}\quad \mathrm{edge}_{d}(h_j) \le \gamma \ \ (j=1,\dots,t),\quad \sum_{i=1}^{m} d_i = 1,\quad 0 \le d_i \le \frac{1}{\nu}$$

Output: the convex combination

$$f(x) \;=\; \sum_{t=1}^{T} \alpha_t h_t(x),$$

where α is the solution of the primal problem.

Theorem [Demiriz et al.]: Given ε > 0, LPBoost outputs an ε-approximation of the optimal solution.
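Under the same assumptions as the earlier sketches, the LPBoost loop just described might look as follows; it reuses solve_dual, and eps is the approximation tolerance.

```python
import numpy as np

def lpboost(U, nu, eps, max_iter=1000):
    m, n = U.shape
    d = np.full(m, 1.0 / m)      # d_1: the uniform distribution
    gamma = -np.inf              # no dual constraint yet
    chosen = []                  # indices of the chosen hypotheses (columns)
    for _ in range(max_iter):
        edges = U.T @ d          # edge of every hypothesis w.r.t. d
        j = int(np.argmax(edges))
        if edges[j] <= gamma + eps:
            break                # no hypothesis violates the dual by more than eps
        chosen.append(j)
        d, gamma = solve_dual(U[:, chosen], nu)
    # The final weighting alpha comes from the primal restricted to the chosen
    # columns, e.g. solve_primal(U[:, chosen], nu).
    return chosen, d, gamma
```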

Page 15

Properties of the optimal solution

Let (α*, ρ*, ξ*) be the solution of the primal problem and (d*, γ*) the solution of the dual problem. The KKT conditions imply:

・instance i with margin > ρ*  ⇒  d*_i = 0
・instance i with margin = ρ*  ⇒  0 ≤ d*_i ≤ 1/ν
・instance i with margin < ρ*  ⇒  d*_i = 1/ν (its loss ξ*_i is positive)

Note: the optimal solution can be recovered using only the instances with positive weights.

Page 16

Properties of the optimal solution (2)

The KKT conditions also imply:

・hypothesis j with edge w.r.t. d* < γ*  ⇒  α*_j = 0
・hypothesis j with edge w.r.t. d* = γ*  ⇒  α*_j ≥ 0

Sparseness of the optimal solution:
1. sparseness w.r.t. the hypothesis weighting
2. sparseness w.r.t. the instances

Note: the optimal solution can be recovered using only the hypotheses with positive coefficients.
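In code, reading off both kinds of sparseness from a solved primal/dual pair is immediate; a small sketch (names ours, tol a numerical tolerance):

```python
import numpy as np

def support_sets(alpha, d, tol=1e-9):
    # Hypotheses with positive coefficients and instances with positive weights
    # suffice to recover the optimal solution.
    return np.flatnonzero(alpha > tol), np.flatnonzero(d > tol)
```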

Page 17

1. Introduction
2. Preliminaries
3. Our algorithm
4. Experiment
5. Summary

Page 18

Our idea: Sparse LPBoost

Take advantage of the sparseness w.r.t. both hypotheses and instances (see the sketch after the theorem):

1. Start from an initial set of instances.
2. For t = 1,…:
 (i) pick up the instances with margin < ρ_t
 (ii) add them to the set of chosen instances
 (iii) solve the dual problem w.r.t. the instances chosen so far by Boosting (ρ_{t+1}: the solution)
3. Output the solution of the primal problem.

Theorem: Given ε > 0, Sparse LPBoost outputs an ε-approximation of the optimal solution.
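Here is a sketch of this loop in the same setting as the earlier LP sketches; it calls solve_primal on the working set instead of the boosting subroutine the paper uses, all names are ours, and the working set must start with more than ν instances for the restricted primal to be bounded.

```python
import numpy as np

def sparse_lpboost(U, nu, eps, max_iter=100):
    m, n = U.shape
    rows = list(range(min(m, int(np.ceil(nu)) + 1)))  # initial instances
    for _ in range(max_iter):
        alpha, xi, rho = solve_primal(U[rows, :], nu)
        margins = U @ alpha                  # margin of every instance
        violated = [i for i in np.flatnonzero(margins < rho - eps)
                    if i not in rows]
        if not violated:
            break                            # all margins are >= rho - eps
        rows.extend(violated[:len(rows)])    # grow the working set (at most 2x)
    return alpha, rho, rows
```

The doubling growth rule mirrors the "at most 2^t instances per iteration" schedule analyzed on Page 20.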

Page 19

Our idea (matrix form)

The inequality constraints of the dual problem form an m×n matrix whose (i, j) entry is y_i h_j(x_i):

$$\begin{pmatrix}
y_1 h_1(x_1) & \cdots & y_1 h_j(x_1) & \cdots & y_1 h_n(x_1) \\
\vdots & & \vdots & & \vdots \\
y_i h_1(x_i) & \cdots & y_i h_j(x_i) & \cdots & y_i h_n(x_i) \\
\vdots & & \vdots & & \vdots \\
y_m h_1(x_m) & \cdots & y_m h_j(x_m) & \cdots & y_m h_n(x_m)
\end{pmatrix}$$

Each row i corresponds to instance i; each column j corresponds to hypothesis j.

Parts of the matrix used as constraints:
・LP: the whole matrix
・LPBoost: columns
・Sparse LPBoost: intersections of columns and rows

The "effective" constraints for the optimal solution are the intersections of columns and rows.

Page 20

How to choose examples (hypotheses)?

Assumptions: the number of hypotheses is constant, and the time complexity of the LP solver on m instances is m^k.

1st attempt: add instances one by one.

$$\text{computation time} \;=\; \sum_{t=1}^{m} t^k \;=\; \Omega\!\left(m^{k+1}\right)$$

less efficient than the LP solver!

Our method: at iteration t, choose at most 2^t instances with margin < ρ. If the algorithm terminates after it has chosen cm instances (0 < c < 1; c is unknown a priori):

$$\text{computation time} \;\le\; \sum_{t=1}^{\log_2(cm)} \left(2^t\right)^k \;=\; O\!\left((cm)^k\right)$$

Note: the same argument holds for hypotheses.
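For concreteness, the geometric sum behind this bound works out as follows (a sketch assuming exact doubling and an LP solver cost of exactly m^k):

$$\sum_{t=1}^{\log_2(cm)} \left(2^k\right)^{t} \;=\; \frac{2^k\left((cm)^k - 1\right)}{2^k - 1} \;\le\; 2\,(cm)^k \qquad (k \ge 1),$$

so the doubling schedule costs at most a constant factor more than solving the final restricted LP once, and since c < 1 it beats the m^k cost of the full problem by roughly a factor of c^k.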

Page 21

1. Introduction
2. Preliminaries
3. Our algorithm
4. Experiment
5. Summary

Page 22

Experiments (new experiments, not in the proceedings)

Data set        # of examples m    # of hypotheses n
Reuters-21578   10,170             30,389
RCV1            20,242             47,237
news20          19,996             1,335,193

Parameters: ν = 0.2m, ε = 0.001.

Each algorithm is implemented in C++, with the LP solver CPLEX 11.0.

Page 23

Experimental Results (sec.)

Sparse LPBoost improves the computation time by a factor of 3 to 100.

Page 24

1. Introduction
2. Preliminaries
3. Our algorithm
4. Experiment
5. Summary

Page 25

Summary & Open Problems

Our result:
• Sparse LPBoost: a provable decomposition algorithm which ε-approximates the 1-norm soft margin optimization
• 3 to 100 times faster than an LP solver or LPBoost

Open problems:
• a theoretical guarantee on the number of iterations
• better methods for choosing instances (hypotheses)