SpicyMKL: Efficient multiple kernel learning method using dual augmented Lagrangian. Taiji Suzuki, Ryota Tomioka. The University of Tokyo, Graduate School of Information Science and Technology, Department of Mathematical Informatics. 23rd Oct., 2009 @ ISM


Page 1

SpicyMKL: Efficient multiple kernel learning method using dual augmented Lagrangian

Taiji Suzuki, Ryota Tomioka

The University of Tokyo, Graduate School of Information Science and Technology, Department of Mathematical Informatics

23rd Oct., 2009 @ ISM

Page 2

Multiple Kernel Learning

Fast Algorithm


SpicyMKL

Page 3

Kernel Learning


Regression, Classification

Kernel Function

Page 4

• Kernel functions: Gaussian, polynomial, chi-square, …
  – Parameters: Gaussian width, polynomial degree
• Features
  – Computer Vision: color, gradient, sift (sift, hsvsift, huesift, scaling of sift), Geometric Blur, image regions, …


Thousands of kernels

Multiple Kernel Learning (Lanckriet et al. 2004)

Select important kernels and combine them

Page 5


Multiple Kernel Learning (MKL)

Single Kernel

Multiple Kernel

d_m : convex combination weights, sparse (many zero components)
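The combined kernel can be sketched numerically. This toy (the kernel choices and weight values are illustrative, not from the slides) builds K = Σ_m d_m K_m with convex-combination weights, where a zero weight drops its kernel entirely:

```python
import numpy as np

def combine_gram(grams, d):
    """Combined Gram matrix K = sum_m d_m * K_m, d a convex combination."""
    d = np.asarray(d)
    assert np.all(d >= 0) and abs(d.sum() - 1.0) < 1e-9
    return sum(w * K for w, K in zip(d, grams))

# three toy Gram matrices over two samples
K_gauss = np.exp(-np.array([[0.0, 1.0], [1.0, 0.0]]))   # Gaussian kernel
K_lin = np.array([[1.0, 0.5], [0.5, 1.0]])              # linear kernel
K_poly = (1.0 + K_lin) ** 2                             # polynomial kernel

d = [0.7, 0.3, 0.0]   # sparse weights: the third kernel is never used
K = combine_gram([K_gauss, K_lin, K_poly], d)
```

Sparsity in d means the unselected kernels never need to be evaluated at all.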

Page 6


Sparse learning:

Lasso → (grouping) → Group Lasso → (kernelize) → Multiple Kernel Learning (MKL)

{(x_i, y_i)}_{i=1}^N : N samples

{k_m}_{m=1}^M : M kernels

H_m : RKHS associated with k_m

ℓ : convex loss (hinge, square, logistic)

L1-regularization → sparse solution

Page 7


K_m : Gram matrix of the m-th kernel

Representer theorem

Finite dimensional problem
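As a hedged sketch of the finite-dimensional problem obtained via the representer theorem: each f_m is parameterized as f_m(·) = Σ_i α_{m,i} k_m(x_i, ·), so f_m on the data is K_m α_m and ‖f_m‖²_{H_m} = α_mᵀ K_m α_m. The squared loss and the toy kernels here are illustrative assumptions:

```python
import numpy as np

def mkl_objective(alphas, b, grams, y, C):
    """Finite-dimensional MKL objective after the representer theorem:
    predictions are sum_m K_m @ alpha_m + b, and the block-norm penalty
    uses ||f_m||_Hm = sqrt(alpha_m' K_m alpha_m)."""
    pred = sum(K @ a for K, a in zip(grams, alphas)) + b
    loss = 0.5 * np.sum((y - pred) ** 2)   # squared loss, for illustration
    reg = C * sum(np.sqrt(max(a @ K @ a, 0.0)) for K, a in zip(grams, alphas))
    return loss + reg

# toy data: Gram matrices of a linear and a polynomial kernel
X = np.array([[0.0], [1.0], [2.0]])
K1 = X @ X.T                    # linear kernel Gram matrix
K2 = (1.0 + X @ X.T) ** 2       # polynomial kernel Gram matrix
y = np.array([0.0, 1.0, 2.0])
obj0 = mkl_objective([np.zeros(3), np.zeros(3)], 0.0, [K1, K2], y, C=1.0)
```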

Page 8


Proposed: SpicyMKL

• Scales well against # of kernels.
• Formulated in a general framework.

SpicyMKL

Page 9

Outline

• Introduction
• Details of Multiple Kernel Learning
• Our method
  – Proximal minimization
  – Skipping inactive kernels
• Numerical Experiments
  – Benchmark datasets
  – Sparse or dense

Page 10

Details of Multiple Kernel Learning (MKL)


Page 11


MKL

Generalized Formulation

g : sparse regularization

Page 12


Losses: hinge loss, logistic loss. Regularizations: L1 regularization, elastic net.

hinge: ℓ(y, f) = max(0, 1 − y f)
logistic: ℓ(y, f) = log(1 + exp(−y f))
L1 regularization: g(x) = C |x|
elastic net: a convex combination of the L1 and squared-L2 penalties

Page 13

Rank 1 decomposition


If g is monotonically increasing and the loss ℓ is differentiable at the optimum, the solution of MKL is rank 1:

such that

convex combination of kernels

Proof: take the derivative w.r.t. f_m at the optimum.

Page 14

Existing approach 1


Constraint-based formulation: [Lanckriet et al. 2004, JMLR] [Bach, Lanckriet & Jordan 2004, ICML]

primal ↔ dual

• Lanckriet et al. 2004: SDP
• Bach, Lanckriet & Jordan 2004: SOCP

Page 15

Existing approach 2


Upper-bound-based formulation: [Sonnenburg et al. 2006, JMLR] [Rakotomamonjy et al. 2008, JMLR] [Chapelle & Rakotomamonjy 2008, NIPS workshop]

primal, upper-bounded via Jensen’s inequality

• Sonnenburg et al. 2006: Cutting plane (SILP)
• Rakotomamonjy et al. 2008: Gradient descent (SimpleMKL)
• Chapelle & Rakotomamonjy 2008: Newton method (HessianMKL)
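The upper bound these methods exploit can be stated in standard MKL notation as the variational identity

```latex
\Bigl(\sum_{m=1}^{M}\lVert f_m\rVert_{\mathcal{H}_m}\Bigr)^{2}
  \;=\; \min_{d_m \ge 0,\; \sum_m d_m = 1}
        \;\sum_{m=1}^{M}\frac{\lVert f_m\rVert_{\mathcal{H}_m}^{2}}{d_m},
```

with the minimum attained at d_m ∝ ‖f_m‖_{H_m}; alternately minimizing the right-hand side over d and over the f_m yields the wrapper algorithms listed above.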

Page 16

Problem of existing methods


• Existing methods do not make use of sparsity during the optimization.

→ They do not perform efficiently when the # of kernels is large.

Page 17

Outline

• Introduction
• Details of Multiple Kernel Learning
• Our method
  – Proximal minimization
  – Skipping inactive kernels
• Numerical Experiments
  – Benchmark datasets
  – Sparse or dense

Page 18

Our method: SpicyMKL


Page 19

• DAL (Tomioka & Sugiyama, 2009)
  – Dual Augmented Lagrangian
  – Lasso, group Lasso, trace norm regularization
• SpicyMKL = DAL + MKL
  – Kernelization of DAL

Page 20

Difficulty of MKL (Lasso)


The L1 penalty is non-differentiable at 0.

Relax by Proximal Minimization

(Rockafellar, 1976)

Page 21

Our algorithm


SpicyMKL: Proximal Minimization (Rockafellar, 1976)

Set an initial solution and choose an increasing sequence of proximity parameters.

Iterate the following update rule until convergence:
1. Solve the subproblem: the original objective plus a proximity term that regularizes toward the last update.
2. Set the next iterate to the subproblem's solution.

Taking its dual, the update step can be solved efficiently.

It can be shown that the iterates converge super-linearly!
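The proximal-minimization idea above can be sketched on a toy problem. Everything here is an illustrative assumption rather than the paper's MKL solver: a separable objective whose inner step has a closed form, and a hypothetical increasing η schedule:

```python
import numpy as np

def soft_threshold(v, thr):
    # prox operator of thr * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - thr, 0.0)

def proximal_minimization(a, C, etas):
    """Minimize F(w) = 0.5*||w - a||^2 + C*||w||_1 by iterating
    w_{t+1} = argmin_w F(w) + (1/(2*eta_t)) * ||w - w_t||^2.
    For this toy F the inner minimization has the closed form below.
    As eta_t increases, the proximity term weakens and the iterates
    home in on the true minimizer."""
    w = np.zeros_like(a)
    for eta in etas:
        w = soft_threshold((eta * a + w) / (eta + 1.0),
                           C * eta / (eta + 1.0))
    return w

a = np.array([3.0, 0.5, -2.0])
w = proximal_minimization(a, C=1.0, etas=[1.0, 2.0, 4.0, 8.0, 16.0])
# the exact minimizer of this toy problem is soft_threshold(a, C)
```

Note the design point the slide makes: each inner step is a regularized version of the original problem, anchored at the last update.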

Page 22


Fenchel’s duality theorem

Split into two parts

Page 23


primal (non-differentiable) → dual (non-differentiable) → Moreau envelope → smooth!

Page 24

Inner optimization

• Newton method is applicable.


Twice differentiable / twice differentiable (a.e.)

Even if the loss is not differentiable, we can apply almost the same algorithm.

Page 25

Rigorous Derivation


Convolution and duality

Page 26

Moreau envelope


Moreau envelope (Moreau 65)

For a convex function f, we define its Moreau envelope as Φ_f(x) = min_y { f(y) + ½ ‖x − y‖² }.
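This definition can be sanity-checked numerically. For f = |·| the envelope has a known closed form, the Huber function; the brute-force grid minimization below is purely illustrative:

```python
import numpy as np

def moreau_envelope(f, x, grid):
    # brute-force min_y f(y) + 0.5*(x - y)^2 over a fine grid
    return np.min(f(grid) + 0.5 * (x - grid) ** 2)

def huber(x):
    # known closed form of the Moreau envelope of |.|
    return 0.5 * x**2 if abs(x) <= 1.0 else abs(x) - 0.5

grid = np.linspace(-5.0, 5.0, 200001)
envs = [moreau_envelope(np.abs, x, grid) for x in (-2.0, -0.3, 0.0, 0.7, 3.0)]
```

The envelope is everywhere differentiable even though |·| is not, which is exactly the smoothing the algorithm relies on.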

Page 27


Moreau envelope

L1 penalty → dual → Moreau envelope → smooth!

Page 28


We arrived at …

Dual of update rule in proximal minimization

What we have observed:
• Differentiable
• Only active kernels need to be calculated (next slide).

Page 29


Soft threshold

Hard threshold

If a kernel is inactive, its contribution is exactly zero, and its gradient also vanishes.

Skipping inactive kernels

Page 30


The objective function and its derivatives:

Derivatives
• Only active kernels contribute to the computations.
→ Efficient for a large number of kernels.
• We can apply the Newton method to the dual optimization.

Page 31

Convergence result

Thm. For any proper convex loss and regularizer, and any non-decreasing sequence of proximity parameters, the iterates converge to an optimum.

Thm. If the loss is strongly convex and the regularizer is a sparse regularizer, then for any increasing sequence of proximity parameters the convergence rate is super-linear.

Page 32

Non-differentiable loss

• If the loss is non-differentiable, almost the same algorithm is available by utilizing the Moreau envelope of the dual loss (primal → dual → Moreau envelope of the dual).

Page 33

Dual of elastic net

The dual of the elastic net is smooth, so naïve optimization in the dual is applicable. But we applied the proximal minimization method for small λ to avoid numerical instability.

Page 34

Why ‘Spicy’ ?

• SpicyMKL = DAL + MKL.
• “Dal” is also the name of a major Indian dish → hot, spicy → SpicyMKL

“Dal” (photo from Wikipedia)

Page 35

Outline

• Introduction
• Details of Multiple Kernel Learning
• Our method
  – Proximal minimization
  – Skipping inactive kernels
• Numerical Experiments
  – Benchmark datasets
  – Sparse or dense

Page 36

Numerical Experiments


Page 37


SpicyMKL

CPU time against # of kernels (IDA data sets Ringnorm and Splice, L1 regularization):

• SpicyMKL scales well against the # of kernels.

Page 38


SpicyMKL

CPU time against # of samples (IDA data sets Ringnorm and Splice, L1 regularization):

• Scaling against the # of samples is almost the same as the existing methods.

Page 39


UCI data sets; # of kernels: 189, 243, 918, 891, 1647

Page 40


Comparison in elastic net (Sparse vs. Dense)

Page 41

Sparse or Dense?

Elastic Net: λ = 0 gives a sparse solution, λ = 1 a dense one.

Compare performances of sparse and dense solutions in a computer vision task.
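How λ trades sparsity for density can be sketched with the elementwise prox of an elastic-net penalty. The parameterization C((1 − λ)|w| + λw²/2) is an illustrative assumption chosen to match the λ ∈ [0, 1] axis above, not necessarily the paper's exact definition:

```python
import numpy as np

def elastic_net_prox(v, C, lam):
    """Prox of w -> C*((1 - lam)*|w| + lam*w**2/2), elementwise.
    lam = 0: pure L1 (sparse); lam = 1: pure squared-L2 (dense)."""
    shrunk = np.sign(v) * np.maximum(np.abs(v) - C * (1.0 - lam), 0.0)
    return shrunk / (1.0 + C * lam)

v = np.array([0.3, -0.8, 1.5, -0.1, 2.0])
sparse = elastic_net_prox(v, C=1.0, lam=0.0)   # many exact zeros
dense = elastic_net_prox(v, C=1.0, lam=1.0)    # every entry survives, shrunk
```

Intermediate λ interpolates between the two regimes, which is the "medium density" setting compared in the experiments.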

Page 42

caltech101 dataset (Fei-Fei et al., 2004)

• object recognition task: anchor, ant, cannon, chair, cup
• 10 classification problems (10 = 5 × (5 − 1)/2)
• # of kernels: 1760 = 4 (features) × 22 (regions) × 2 (kernel functions) × 10 (kernel parameters)
  – sift features [Lowe 99], colorDescriptor [van de Sande et al. 10]: hsvsift, sift (scale = 4px, 8px, automatic scale)
  – regions: whole region, 4 divisions, 16 divisions, spatial pyramid (computed histogram of features on each region)
  – kernel functions: Gaussian kernels, kernels with 10 parameters

Page 43


Results (averaged over all classification problems): medium density gives the best performance; the sparse solution is comparable with the dense solution.

Page 44

Conclusion and Future Work

Conclusion
• We proposed a new MKL algorithm that is efficient when the # of kernels is large.
  – proximal minimization
  – skipping inactive kernels
• Medium density showed the best performance, but the sparse solution also works well.

Future work
• Second-order update

Page 45

• Technical report: T. Suzuki & R. Tomioka. SpicyMKL. arXiv: http://arxiv.org/abs/0909.5026
• DAL: R. Tomioka & M. Sugiyama. Dual Augmented Lagrangian Method for Efficient Sparse Reconstruction. IEEE Signal Processing Letters, 16(12), pp. 1067–1070, 2009.

Page 46

Thank you for your attention!


Page 47


Moreau envelope, proximal operator

Some properties (via the Fenchel duality theorem): the Moreau envelope is differentiable.
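A standard property connecting the two objects — the envelope's gradient equals x − prox(x) — can be checked numerically for f = |·|, whose envelope has the well-known Huber closed form used below:

```python
import numpy as np

def prox_abs(x):
    # proximal operator of |.|: soft threshold at 1
    return np.sign(x) * np.maximum(np.abs(x) - 1.0, 0.0)

def envelope_abs(x):
    # Moreau envelope of |.|: the Huber function
    return np.where(np.abs(x) <= 1.0, 0.5 * x**2, np.abs(x) - 0.5)

x = np.linspace(-3.0, 3.0, 7)
grad_identity = x - prox_abs(x)          # claimed gradient of the envelope
eps = 1e-6
numeric_grad = (envelope_abs(x + eps) - envelope_abs(x - eps)) / (2 * eps)
```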