
Linear Programming Boosting

by Column and Row Generation

Kohei Hatano and Eiji Takimoto, Kyushu University, Japan

DS 2009

1. Introduction  2. Preliminaries  3. Our algorithm  4. Experiment  5. Summary

Example: given a web page, predict if the topic is "DS 2009".

Hypothesis set = words

[Figure: a weighted majority vote of decision stumps over words, e.g. 0.5*(DS 2009?) + 0.3*(ALT?) + 0.2*(Porto?), where each stump answers +1 (yes) or -1 (no).]

Weighted majority vote

Modern Machine Learning Approach: find a weighted majority of hypotheses (a hyperplane) which enlarges the margin.

1-norm soft margin optimization

(a.k.a. 1-norm SVM) A popular formulation, like 2-norm soft margin optimization: "find a hyperplane which separates positive and negative instances well."

\[
\max_{\rho,\,\alpha,\,\xi}\;\; \rho \;-\; \frac{1}{\nu}\sum_{i=1}^{m}\xi_i
\qquad (\nu:\ \text{constant s.t.}\ 1 \le \nu \le m)
\]
\[
\text{sub. to}\quad
y_i \sum_{j=1}^{n} \alpha_j h_j(x_i) \;\ge\; \rho - \xi_i \;\; (i=1,\dots,m),\qquad
\sum_{j=1}^{n} \alpha_j = 1,\quad \alpha \ge 0,\quad \xi \ge 0.
\]

[Figure: the margin ρ and the losses ξ_i of instances relative to the hyperplane; both margin and loss are normalized with the 1-norm of α.]

Note: this is a Linear Program, and it has a good generalization guarantee [Schapire et al., 98].
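Since it is a plain LP, it can be handed directly to an off-the-shelf solver. Below is a minimal sketch (not the authors' code): the matrix U with U[i, j] = y_i h_j(x_i) and the parameter ν are assumed inputs, and SciPy's linprog stands in for a commercial solver such as CPLEX.

import numpy as np
from scipy.optimize import linprog

def solve_primal(U, nu):
    """max rho - (1/nu)*sum(xi)  s.t.  U @ alpha >= rho - xi,
    sum(alpha) = 1, alpha >= 0, xi >= 0.  Variables: z = [alpha, xi, rho]."""
    m, n = U.shape
    c = np.concatenate([np.zeros(n), np.full(m, 1.0 / nu), [-1.0]])  # minimize the negated objective
    A_ub = np.hstack([-U, -np.eye(m), np.ones((m, 1))])              # -U@alpha - xi + rho <= 0
    b_ub = np.zeros(m)
    A_eq = np.concatenate([np.ones(n), np.zeros(m), [0.0]]).reshape(1, -1)  # sum(alpha) = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * (n + m) + [(None, None)]                  # alpha, xi >= 0; rho is free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    alpha, xi, rho = res.x[:n], res.x[n:n + m], res.x[-1]
    return alpha, xi, rho, -res.fun                                  # -res.fun = optimal objective value

The point of the deck, of course, is that solving this LP directly becomes expensive when m and n are large.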

1-norm soft margin optimization (2)

Advantage of 1-norm soft margin optimization: the solution is likely to be sparse ⇒ useful for feature selection.

Sparse hyperplane: 0.5*(DS 2009?) + 0.3*(ALT?) + 0.2*(Porto?)
Non-sparse hyperplane: 0.2*(DS 2009?) + 0.1*(ALT?) + 0.1*(Porto?) + 0.1*(wine?) + 0.05*(tasting?) + 0.05*(discovery?) + 0.03*(science?) + 0.02*(…) + …

Recent Results

2-norm soft margin optimization: Quadratic Programming. There are state-of-the-art solvers:
・ SMO [Platt, 1999]
・ SVMlight [Joachims, 1999]
・ SVMPerf [Joachims, 2006]
・ Pegasos [Shalev-Shwartz et al., 2007]

1-norm soft margin optimization: Linear Programming.
・ LPBoost [Demiriz et al., 2003]
・ Entropy Regularized LPBoost [Warmuth et al., 2008]
・ others [Mangasarian, 2006] [Sra, 2006]
Not efficient enough for large data; room for improvements.

Our result: a new algorithm for 1-norm soft margin optimization.

1. Introduction  2. Preliminaries  3. Our algorithm  4. Experiment  5. Summary

Boosting

Classification: frog "+1", others "-1". Hypotheses: color and size.

[Figure: instances plotted in the (color, size) plane, labeled +1 (frogs) and -1 (others).]

1. d_1: the uniform distribution.
2. For t = 1, ..., T:
   (i) choose the hypothesis h_t maximizing the edge w.r.t. d_t;
   (ii) update the distribution d_t to d_{t+1}.
3. Output a weighting of the chosen hypotheses.

Boosting (2)

[Figure: the first chosen hypothesis h_1 splits the (color, size) plane; h_1(x_i) is shown for each instance.]

Edge of a hypothesis h w.r.t. a distribution d:
\[
\mathrm{edge}_{d}(h) \;=\; \sum_{i=1}^{m} d_i\, y_i\, h(x_i),
\qquad h(x_i) \in \{-1,+1\}\ \text{or}\ [-1,+1].
\]
Here y_i h(x_i) > 0 if h is correct on x_i, so the edge measures the accuracy of h.
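Read as code (illustrative only; the array names d, y, hx are mine):

import numpy as np

def edge(d, y, hx):
    # edge_d(h) = sum_i d_i * y_i * h(x_i); larger when h agrees with the labels
    # on the instances that the distribution d emphasizes.
    return float(np.dot(d, y * hx))

# toy check: uniform weights over 4 instances, h correct on 3 of them -> edge 0.5
d = np.full(4, 0.25)
y = np.array([+1, -1, +1, +1])
hx = np.array([+1, -1, +1, -1])
print(edge(d, y, hx))  # 0.5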

Boosting (2)

[Figure: same (color, size) plot with h_1; the instances misclassified by h_1 are highlighted.]

Update: put more weight on the misclassified instances.

Boosting (3)

[Figure: the second chosen hypothesis h_2 in the (color, size) plane.]

Note: more weight goes to "difficult" instances.

Boosting (4)

[Figure: the final combined classifier in the (color, size) plane.]

Output: 0.4 h_1 + 0.6 h_2.

Boosting & 1-norm soft margin optimization

Primal:
\[
\max_{\rho,\,\alpha,\,\xi}\;\; \rho \;-\; \frac{1}{\nu}\sum_{i=1}^{m}\xi_i
\qquad
\text{sub. to}\;\;
y_i \sum_{j=1}^{n} \alpha_j h_j(x_i) \ge \rho - \xi_i \;\; (i=1,\dots,m),\;\;
\sum_{j=1}^{n} \alpha_j = 1,\;\; \alpha \ge 0,\;\; \xi \ge 0.
\]
"Find the large-margin hyperplane which separates positive and negative instances as much as possible."

Dual:
\[
\min_{d,\,\gamma}\;\; \gamma
\qquad
\text{sub. to}\;\;
\mathrm{edge}_{d}(h_j) = \sum_{i=1}^{m} d_i\, y_i\, h_j(x_i) \le \gamma \;\; (j=1,\dots,n),\;\;
\sum_{i=1}^{m} d_i = 1,\;\; 0 \le d_i \le \frac{1}{\nu}.
\]
"Find the distribution which minimizes the maximum edge of the hypotheses" (i.e., the most "difficult" distribution w.r.t. the hypotheses) ≈ solvable using Boosting.

The two problems are equivalent by LP duality.

LPBoost [Demiriz et al., 2003]

Update: solve the dual problem restricted to the hypotheses chosen so far, {h_1, …, h_t}:
\[
(d_{t+1}, \gamma_t) \;=\; \arg\min_{d,\,\gamma}\; \gamma
\qquad
\text{sub. to}\;\;
\mathrm{edge}_{d}(h_j) \le \gamma \;\; (j=1,\dots,t),\;\;
\sum_{i=1}^{m} d_i = 1,\;\; 0 \le d_i \le \frac{1}{\nu}.
\]
Output: the convex combination whose weights α are given by the solution of the (restricted) primal problem.

Theorem [Demiriz et al.]: Given ε > 0, LPBoost outputs an ε-approximation of the optimal solution.

LPBoost's final output is \( f(x) = \sum_{t=1}^{T} \alpha_t h_t(x) \).
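A compact column-generation sketch in the spirit of LPBoost (my simplification, not the authors' implementation): each round solves the dual restricted to the hypotheses chosen so far, then adds the hypothesis with the largest edge under the new distribution, stopping once no edge exceeds γ by more than ε. U[i, j] = y_i h_j(x_i), ν and ε are assumed inputs; the final weights α would come from the restricted primal, which is omitted here.

import numpy as np
from scipy.optimize import linprog

def restricted_dual(U_cols, nu):
    """min gamma  s.t.  edge_d(h_j) <= gamma for the chosen columns,
    sum(d) = 1, 0 <= d_i <= 1/nu.  Variables: z = [d, gamma]."""
    m, t = U_cols.shape
    c = np.concatenate([np.zeros(m), [1.0]])
    A_ub = np.hstack([U_cols.T, -np.ones((t, 1))])   # d @ U_cols[:, j] - gamma <= 0
    b_ub = np.zeros(t)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, 1.0 / nu)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]                      # (distribution d, value gamma)

def lpboost(U, nu, eps, max_iter=1000):
    m, n = U.shape
    d = np.full(m, 1.0 / m)                          # d_1: uniform distribution
    chosen, gamma = [], -1.0
    for _ in range(max_iter):
        edges = d @ U                                # edge of every hypothesis w.r.t. d
        j = int(np.argmax(edges))
        if chosen and edges[j] <= gamma + eps:
            break                                    # no hypothesis violates the dual by more than eps
        chosen.append(j)
        d, gamma = restricted_dual(U[:, chosen], nu)
    return chosen, d, gamma

Only the columns touched so far ever enter the LP, which is exactly the "columns" entry in the comparison table later in the talk.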

Properties of the optimal solution

[Figure: margin ρ* and losses ξ*_i at the optimum.]

Let (α*, ξ*, ρ*) be the solution of the primal problem and (d*, γ*) the solution of the dual problem. The KKT conditions imply:
・ instance i with margin > ρ*:  d*_i = 0
・ instance i with margin = ρ*:  0 ≤ d*_i ≤ 1/ν
・ instance i with margin < ρ*:  d*_i = 1/ν

Note: the optimal solution can be recovered using only the instances with positive weights.

Properties of the optimal solution (2)

By the KKT conditions:
・ hypothesis j with edge w.r.t. d* < γ*:  α*_j = 0
・ hypothesis j with α*_j > 0:  edge w.r.t. d* = γ*

Sparseness of the optimal solution: 1. sparseness w.r.t. the hypothesis weighting; 2. sparseness w.r.t. the instances.

Note: the optimal solution can be recovered using only the hypotheses with positive coefficients.

1. Introduction  2. Preliminaries  3. Our algorithm  4. Experiment  5. Summary

Our idea: Sparse LPBoost — take advantage of the sparseness w.r.t. both hypotheses and instances.

2. For t = 1, ...:
   (i) pick up instances with margin < ρ_t;
   (iii) solve the dual problem w.r.t. the instances chosen so far by Boosting (ρ_{t+1}: the solution).
3. Output the solution of the primal problem.

Theorem: Given ε > 0, Sparse LPBoost outputs an ε-approximation of the optimal solution.
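A rough sketch of this idea (my own simplification, not the authors' exact procedure): rows (instances) are generated in an outer loop, and for brevity the inner Boosting over the chosen rows is replaced by solving the restricted primal LP directly over all hypotheses; the seed set and the 2^t-per-round choice of violated instances are assumptions. U[i, j] = y_i h_j(x_i), ν and ε are assumed inputs.

import numpy as np
from scipy.optimize import linprog

def primal_on_rows(U_rows, nu):
    """1-norm soft margin primal restricted to the chosen instances (rows).
    Returns (alpha, rho).  Variables: z = [alpha, xi, rho]."""
    r, n = U_rows.shape
    c = np.concatenate([np.zeros(n), np.full(r, 1.0 / nu), [-1.0]])
    A_ub = np.hstack([-U_rows, -np.eye(r), np.ones((r, 1))])
    b_ub = np.zeros(r)
    A_eq = np.concatenate([np.ones(n), np.zeros(r), [0.0]]).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, None)] * (n + r) + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]

def sparse_lpboost(U, nu, eps, max_iter=100):
    m, n = U.shape
    # the restricted primal stays bounded only if at least nu instances are present,
    # so seed the working set with ceil(nu) arbitrary instances
    rows = list(range(min(int(np.ceil(nu)), m)))
    for t in range(max_iter):
        alpha, rho = primal_on_rows(U[rows, :], nu)
        margins = U @ alpha                          # margin of every instance under alpha
        violated = [i for i in np.argsort(margins) if margins[i] < rho - eps and i not in rows]
        if not violated:
            break                                    # every instance already has margin >= rho - eps
        rows.extend(violated[: 2 ** t])              # add at most 2^t of the worst-margin instances
    return alpha, rho, rows

In the actual algorithm the inner solve is itself LPBoost (column generation) run only on the chosen rows, which is what makes both the rows and the columns of the constraint matrix sparse.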

Our idea (matrix form)

\[
\begin{array}{c|ccccc}
 & h_1 & \cdots & h_j & \cdots & h_n \\ \hline
d_1 & y_1 h_1(x_1) & \cdots & y_1 h_j(x_1) & \cdots & y_1 h_n(x_1) \\
\vdots & \vdots & & \vdots & & \vdots \\
d_i & y_i h_1(x_i) & \cdots & y_i h_j(x_i) & \cdots & y_i h_n(x_i) \\
\vdots & \vdots & & \vdots & & \vdots \\
d_m & y_m h_1(x_m) & \cdots & y_m h_j(x_m) & \cdots & y_m h_n(x_m)
\end{array}
\]

Each row i corresponds to instance i (weighted by d_i); each column j corresponds to hypothesis j (its edge compared with γ).

Inequality constraints of the dual problem used by each method:
  LP:              the whole matrix
  LPBoost:         columns
  Sparse LPBoost:  intersections of columns and rows

"Effective" constraints for the optimal solution: intersections of columns and rows.

How to choose examples (hypotheses)?

1st attempt: add instances one by one.

Assumptions: the number of hypotheses is constant, and the time complexity of the LP solver is m^k (m: # of instances).

Since at least ν instances get positive weight at the optimum, roughly t ≥ ν iterations are needed, so the computation time is
\[
\sum_{j=1}^{t} j^{k} \;=\; \Omega\!\left(\frac{\nu^{k+1}}{k+1}\right)
\]
— less efficient than the LP solver!

Our method: at iteration t, choose at most 2^t instances with margin < ρ (t: # of iterations). Suppose the algorithm terminates after it has chosen cm instances (0 < c < 1). Then the number of iterations is about log_2(cm), and the computation time is
\[
\sum_{j=1}^{\log_2(cm)} \bigl(2^{\,j}\bigr)^{k} \;=\; O\!\bigl((cm)^{k}\bigr),
\]
i.e., about (1/c)^k times cheaper than solving the LP over all m instances.

Note: the same argument holds for hypotheses.
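For concreteness, the arithmetic behind that bound, worked out under the stated assumptions (an m^k-time LP solver and the 2^t-per-round schedule; the example numbers are mine, not from the slides):
\[
\sum_{j=1}^{\log_2(cm)} \bigl(2^{\,j}\bigr)^{k}
 \;=\; \sum_{j=1}^{\log_2(cm)} 2^{\,jk}
 \;\le\; \frac{2^{k}}{2^{k}-1}\,(cm)^{k}
 \;\le\; 2\,(cm)^{k},
\]
so relative to one solve of cost m^k the saving is roughly a factor of (1/c)^k; for instance, c = 0.2 and k = 2 give a factor of about 25, in the range of the 3- to 100-fold speedups reported in the experiments below.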

1. Introduction  2. Preliminaries  3. Our algorithm  4. Experiment  5. Summary

Experiments (new experiments, not in the proceedings)

Data set        # of examples m    # of hypotheses n
Reuters-21578   10,170             30,389
RCV1            20,242             47,237
news20          19,996             1,335,193

Parameters: ν = 0.2m, ε = 0.001. Each algorithm is implemented in C++ with the LP solver CPLEX 11.0.

Experimental Results (running time in seconds)

Sparse LPBoost improves the computation time by a factor of 3 to 100.

1. Introduction  2. Preliminaries  3. Our algorithm  4. Experiment  5. Summary

Summary & Open problems

Our result:
• Sparse LPBoost: a provable decomposition algorithm which ε-approximates 1-norm soft margin optimization.
• 3 to 100 times faster than an LP solver or LPBoost.

Open problems:
• A theoretical guarantee on the number of iterations.
• Better methods for choosing instances (hypotheses).
