
Page 1: Multiple Kernel Learning

Multiple Kernel Learning
Marius Kloft
Technische Universität Berlin / Korea University

Page 2: Multiple Kernel Learning


Machine Learning

• Aim
▫ Learning the relation of two random quantities $X$ and $Y$ from observations

• Example
▫ Object detection in images

• Kernel-based learning (the standard predictor form is sketched below):
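The slide leaves the formula implicit; the textbook kernel predictor, rather than a quote from the talk, is

    f(x) = \sum_{i=1}^{n} \alpha_i \, k(x_i, x) + b,

where $k$ is a kernel function, the $\alpha_i$ are learned coefficients, and $b$ is an offset.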

Page 3: Multiple Kernel Learning


Multiple Views / Kernels

How to combine the views? Weightings. (Lanckriet, 2004)

[Figure: one kernel per view of the image: shape, space, color]
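The weighted combination in question is the standard MKL kernel of Lanckriet et al. (2004):

    k(x, x') = \sum_{m=1}^{M} \theta_m \, k_m(x, x'), \qquad \theta_m \ge 0,

where each $k_m$ is the kernel of one view (shape, space, color) and the weights $\theta_m$ are to be learned.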

Page 4: Multiple Kernel Learning


Computation of Weights?

• State of the art
▫ Sparse weights: kernels / views are completely discarded
▫ But why discard information?

(Bach, 2008)

Page 5: Multiple Kernel Learning


From Vision to Reality?

• State of the art: sparse methods
▫ empirically ineffective (Gehler et al., Noble et al., Shawe-Taylor et al., NIPS 2008)

• My dissertation: new methodology
▫ established as a standard
▫ breakthrough in learning bounds: O(M/n)
▫ effective in applications, and more efficient in practice

Page 6: Multiple Kernel Learning


Presentation of Methodology: Non-sparse Multiple Kernel Learning

Page 7: Multiple Kernel Learning


New Methodology

• Computation of weights?
▫ Model: a weighted combination of kernels
▫ Mathematical program: a convex problem, with optimization over the weights

• Generalized formulation (written out below)
▫ arbitrary loss
▫ arbitrary norms, e.g. lp-norms: \|\theta\|_p = (\sum_{m=1}^{M} \theta_m^p)^{1/p}
▫ the 1-norm leads to sparsity

(Kloft et al., ECML 2010, JMLR 2011)
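One way to write this generalized formulation, following the lp-norm MKL primal of Kloft et al. (JMLR 2011), with loss \ell and feature maps \psi_m per kernel (constants and parametrization may differ in detail from the slide):

    \min_{w,\, b,\, \theta \,:\, \theta \ge 0,\ \|\theta\|_p \le 1} \;\; C \sum_{i=1}^{n} \ell\Big( \textstyle\sum_{m=1}^{M} \langle w_m, \psi_m(x_i) \rangle + b,\; y_i \Big) \;+\; \frac{1}{2} \sum_{m=1}^{M} \frac{\|w_m\|_2^2}{\theta_m}.

For p = 1 the constraint \|\theta\|_1 \le 1 induces sparse weights; for p > 1 all kernels retain nonzero weight.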

Page 8: Multiple Kernel Learning


Theoretical Analysis

• Theoretical foundations
▫ Active research topic (NIPS workshop 2010)
▫ We show:

Theorem (Kloft & Blanchard). The local Rademacher complexity of MKL is bounded by:

• Corollaries (learning bounds)
▫ Upper bound improving on the best known rate (Cortes et al., ICML 2010)
▫ In general an improvement; for suitable p, by two orders of magnitude

(Kloft & Blanchard, NIPS 2011, JMLR 2012)

Page 9: Multiple Kernel Learning


Proof (Sketch)

1. Relating the original class with the centered class

2. Bounding the complexity of the centered class

3. Applying the Khintchine-Kahane and Rosenthal inequalities

4. Bounding the complexity of the original class

5. Relating the bound to the truncation of the spectra of the kernels


Page 10: Multiple Kernel Learning


Optimization

• Algorithms
1. Newton method
2. Sequential quadratically constrained programming with level-set projections
3. Block-coordinate descent (convergence proved; a code sketch follows below):
   Alternate until convergence:
   ▫ solve (P) w.r.t. w
   ▫ solve (P) w.r.t. θ: analytical update

• Implementation
▫ In C++ ("SHOGUN Toolbox"), with Matlab/Octave/Python/R support
▫ Runtime: ~1-2 orders of magnitude faster

(Kloft et al., JMLR 2011)
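A minimal Python sketch of the alternating scheme, assuming the analytical weight update θ_m ∝ ||w_m||^{2/(p+1)} from Kloft et al. (JMLR 2011); `solve_svm_dual` is a hypothetical stand-in for any SVM dual solver (e.g., the one in the SHOGUN toolbox), not a real API:

```python
import numpy as np

def lp_mkl(kernels, y, p=2.0, C=1.0, n_iter=50, tol=1e-5):
    """Block-coordinate descent for lp-norm MKL (illustrative sketch).

    Alternates between (1) solving a standard SVM on the combined kernel
    K = sum_m theta_m * K_m and (2) the analytical weight update
    theta_m ∝ ||w_m||^(2/(p+1)), renormalized so that ||theta||_p = 1.
    """
    M = len(kernels)
    theta = np.full(M, M ** (-1.0 / p))        # uniform start with ||theta||_p = 1
    alpha = np.zeros(len(y))
    for _ in range(n_iter):
        K = sum(t * K_m for t, K_m in zip(theta, kernels))
        alpha = solve_svm_dual(K, y, C)        # hypothetical SVM dual solver
        ay = alpha * y
        # block norms: ||w_m||^2 = theta_m^2 * (alpha*y)' K_m (alpha*y)
        norms2 = theta ** 2 * np.array([ay @ K_m @ ay for K_m in kernels])
        new_theta = norms2 ** (1.0 / (p + 1.0))
        new_theta /= np.linalg.norm(new_theta, ord=p)   # back onto the lp sphere
        if np.max(np.abs(new_theta - theta)) < tol:
            theta = new_theta
            break
        theta = new_theta
    return theta, alpha
```

For p → ∞ the update tends toward uniform weights (plain sum-kernel SVM); for p = 1 it recovers the sparse behavior criticized above.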

Page 11: Multiple Kernel Learning


Toy Experiment

• Design (a data-generation sketch follows below)
▫ Two 50-dimensional Gaussians; mean μ1 on the simplex
▫ Zero-mean features are irrelevant; variance chosen so that the Bayes error stays constant
▫ 50 training examples; one linear kernel per feature
▫ Six scenarios, varying the fraction of irrelevant features (sparsity 0%, 44%, 64%, 82%, 92%, 98%)

• Results
[Figure: test error (0-50%) vs. sparsity (0%-98%) across the six scenarios]
▫ The choice of p is crucial; the optimal p depends on the true sparsity
▫ The SVM fails in the sparse scenarios
▫ The 1-norm is best only in the sparsest scenario
▫ p-norm MKL proves robust in all scenarios

• Bounds
▫ The bounds can be minimized w.r.t. p
▫ The bounds reflect the empirical results well
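A minimal sketch of one plausible reading of this design: the class mean is supported on the "relevant" coordinates (scaled simplex), the rest are zero-mean and hence irrelevant. The slide does not give the exact scaling that keeps the Bayes error constant, so `sep` below is a placeholder for it:

```python
import numpy as np

def toy_scenario(n=50, d=50, frac_irrelevant=0.64, sep=1.0, rng=None):
    """Sample one toy scenario: two d-dimensional Gaussians whose class
    means differ only in the relevant coordinates."""
    rng = np.random.default_rng(rng)
    k = max(1, int(round(d * (1 - frac_irrelevant))))  # number of relevant features
    mu = np.zeros(d)
    mu[:k] = sep / k                                   # mean on the scaled simplex
    y = rng.choice([-1, 1], size=n)
    X = rng.normal(size=(n, d)) + np.outer(y, mu)      # class-dependent shift
    return X, y

# one linear kernel per feature, as in the slide's design
X, y = toy_scenario(frac_irrelevant=0.64, rng=0)
kernels = [np.outer(X[:, j], X[:, j]) for j in range(X.shape[1])]
```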

Page 12: Multiple Kernel Learning


Applications

• Lesson learned:
▫ The optimality of a method depends on the true underlying sparsity of the problem

• Applications studied (arranged around "non-sparsity" on the slide):
▫ Computer vision: visual object recognition
▫ Bioinformatics: subcellular localization, protein fold prediction, DNA transcription start site detection, metabolic network reconstruction

Page 13: Multiple Kernel Learning


Application Domain: Computer Vision

• Visual object recognition
▫ Aim: annotation of visual media (e.g., images)
▫ Motivation: content-based image retrieval

[Figure: example categories: aeroplane, bicycle, bird]

Page 14: Multiple Kernel Learning


Application Domain: Computer Vision

• Multiple kernels, based on
▫ color histograms
▫ shapes (gradients)
▫ local features (SIFT words)
▫ spatial features

Page 15: Multiple Kernel Learning


Application Domain: Computer Vision

• Datasets
▫ 1. VOC 2009 challenge
   7054 train / 6925 test images
   20 object categories (aeroplane, bicycle, …)
▫ 2. ImageCLEF2010 challenge
   8000 train / 10000 test images, taken from Flickr
   93 concept categories (partylife, architecture, skateboard, …)

• 32 kernels
▫ 4 types (an illustrative histogram-kernel sketch follows below):

                      Bag of Words   Global Histogram
   Gradients          BoW-SIFT       Ho(O)G
   Color histograms   BoW-C          HoC

▫ varied over color channel combinations and spatial tilings (levels of a spatial pyramid)
▫ one one-vs.-rest classifier for each visual concept
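The slide does not specify the kernel function applied to these histogram features; χ²-type kernels are a common choice for BoW and histogram features in this line of work, so the following is an illustrative sketch rather than the talk's exact setup:

```python
import numpy as np

def chi2_kernel(H1, H2, gamma=1.0):
    """Exponentiated chi-squared kernel between rows of two histogram
    matrices of shape (n_samples, n_bins)."""
    d = H1[:, None, :] - H2[None, :, :]
    s = H1[:, None, :] + H2[None, :, :]
    with np.errstate(divide='ignore', invalid='ignore'):
        chi2 = np.where(s > 0, d * d / s, 0.0).sum(axis=-1)
    return np.exp(-gamma * chi2)
```

One such Gram matrix per feature type, color channel combination, and pyramid level would yield the kernel bank fed into MKL.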

Page 16: Multiple Kernel Learning


Application Domain: Computer Vision

• Preliminary results (Binder et al., 2010, 2011)
▫ Using BoW-S only gives worse results → BoW-S alone is not sufficient

              SVM     MKL
   VOC2009    55.85   56.76
   CLEF2010   36.45   37.02

• Challenge results
▫ Employed our approach in the ImageCLEF2011 Photo Annotation challenge
▫ Achieved the winning entries in 3 categories!

Page 17: Multiple Kernel Learning


Application Domain: Computer Vision

• Why can MKL help?
▫ Different images are better captured by different kernels

• Experiment
▫ Disagreement of single-kernel classifiers
▫ The BoW-S kernels induce more or less the same predictions

Page 18: Multiple Kernel Learning


Application Domain: Genetics

• Detection of transcription start sites

• By means of kernels based on (an illustrative sequence-kernel sketch follows below):
▫ sequence alignments
▫ distribution of nucleotides (downstream, upstream)
▫ folding properties (binding energies and angles)

• Empirical analysis (Kloft et al., NIPS 2009, JMLR 2011)
▫ Detection accuracy (AUC): higher than sparse MKL and ARTS
▫ ARTS: winner of an international comparison of 19 models (Abeel et al., 2009)
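The talk does not spell out its sequence kernels; the k-mer spectrum kernel is one standard kernel for DNA sequence classification, so this sketch is illustrative rather than the exact kernel used:

```python
import numpy as np
from collections import Counter
from itertools import product

def spectrum_features(seq, k=3):
    """k-mer count vector of a DNA sequence (spectrum kernel features)."""
    kmers = [''.join(p) for p in product('ACGT', repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    v = np.zeros(len(kmers))
    for km, c in counts.items():
        if km in index:          # skip k-mers containing ambiguous bases
            v[index[km]] = c
    return v

def spectrum_kernel(seqs, k=3):
    """Gram matrix of the k-spectrum kernel over a list of sequences."""
    Phi = np.stack([spectrum_features(s, k) for s in seqs])
    return Phi @ Phi.T
```

Kernels of this kind, together with positional and folding-property kernels, would form the candidate set whose weights MKL learns.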

Page 19: Multiple Kernel Learning


Application Domain: Genetics

• Theoretical analysis
▫ Impact of the lp-norm on the bound
▫ Confirms the experimental results: stronger theoretical guarantees for the proposed approach (p > 1); empirical and theoretical results approximately agree

• Empirical analysis (Kloft et al., NIPS 2009, JMLR 2011)
▫ Detection accuracy (AUC): higher than sparse MKL and ARTS
▫ ARTS: winner of an international comparison of 19 models (Abeel et al., 2009)

Page 20: Multiple Kernel Learning


Application Domain: Pharmacology

• Protein fold prediction
▫ Prediction of the fold class of a protein
   The fold class relates to the protein's function (e.g., important for drug design)
▫ Data set and kernels from Y. Ying
   27 fold classes; fixed train and test sets
   12 biologically inspired kernels (e.g., hydrophobicity, polarity, van der Waals volume)

• Results
▫ Accuracy: 1-norm MKL and the SVM are on par; p-norm MKL performs best
▫ 6% higher accuracy than the baselines

Page 21: Multiple Kernel Learning


Further Applications

(arranged around "non-sparsity" on the slide)
▫ Computer vision: visual object recognition
▫ Bioinformatics: subcellular localization, protein fold prediction, DNA transcription and splice site detection, metabolic network reconstruction

Page 22: Multiple Kernel Learning


Conclusion: Non-sparse Multiple Kernel Learning

• Scalable: training with > 100,000 data points and > 1,000 kernels
• Sharp learning bounds
• Applications
▫ Visual object recognition: established standard; winner of the ImageCLEF 2011 challenge
▫ Computational biology: more accurate gene detector than the winner of an international comparison

Page 23: Multiple Kernel Learning


Thank you for your attention.

I will be pleased to answer any additional questions.

Page 24: Multiple Kernel Learning

References

▫ Abeel, Van de Peer, Saeys (2009).  Toward a gold standard for promoter prediction evaluation. Bioinformatics.

▫ Bach (2008).  Consistency of the Group Lasso and Multiple Kernel Learning. Journal of Machine Learning Research (JMLR).
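▫ Cortes, Mohri, Rostamizadeh (2010). Generalization Bounds for Learning Kernels. International Conference on Machine Learning (ICML 2010).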

▫ Kloft, Brefeld, Laskov, and Sonnenburg (2008). Non-sparse Multiple Kernel Learning. NIPS Workshop on Kernel Learning.

▫ Kloft, Brefeld, Sonnenburg, Laskov, Müller, and Zien (2009). Efficient and Accurate Lp-norm Multiple Kernel Learning. Advances in Neural Information Processing Systems (NIPS 2009).

▫ Kloft, Rückert, and Bartlett (2010). A Unifying View of Multiple Kernel Learning. ECML.

▫ Kloft, Blanchard (2011). The Local Rademacher Complexity of Lp-Norm Multiple Kernel Learning. Advances in Neural Information Processing Systems (NIPS 2011).

▫ Kloft, Brefeld, Sonnenburg, and Zien (2011). Lp-Norm Multiple Kernel Learning. Journal of Machine Learning Research (JMLR), 12(Mar):953-997.

▫ Kloft and Blanchard (2012). On the Convergence Rate of Lp-norm Multiple Kernel Learning. Journal of Machine Learning Research (JMLR), 13(Aug):2465-2502.

▫ Lanckriet, Cristianini, Bartlett, El Ghaoui, Jordan (2004). Learning the Kernel Matrix with Semidefinite Programming. Journal of Machine Learning Research (JMLR).
