Multiple Kernel Learning
Marius Kloft
Technische Universität Berlin (now: Courant / Sloan-Kettering Institute)
Machine Learning
• Aim
▫ Learning the relation between two random quantities from observations
• Kernel-based learning
Multiple Views / Kernels
• Example
▫ Object detection in images
How to combine the views?
Weightings.
(Lanckriet, 2004)
Kernels: shape, space, color
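The weighted combination of views uses the standard parameterization of Lanckriet et al. (2004):

```latex
k(x, x') = \sum_{m=1}^{M} \theta_m\, k_m(x, x'), \qquad \theta_m \ge 0,
```

where each \(k_m\) is the kernel of one view (e.g., shape, space, or color).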
Computation of Weights?
• State of the art
▫ Sparse weights
Kernels / views are completely discarded
▫ But why discard information?
(Lanckriet et al., Bach et al., 2004)
From Vision to Reality?
• State of the art: sparse method
▫ empirically ineffective
(Gehler et al., Noble et al., Cortes et al., Shawe-Taylor et al., etc., NIPS WS 2008, ICML, ICCV, ...)
• New methodology
▫ Learning bounds with rates up to O(M/n)
▫ More efficient and effective in practice
Presentation of Methodology: Non-sparse Multiple Kernel Learning
• Generalized formulation
▫ arbitrary loss
▫ arbitrary norms
e.g., lp-norms
1-norm leads to sparsity
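The lp-norm constraint referred to above takes the standard form (consistent with Kloft et al., JMLR 2011):

```latex
\|\theta\|_p = \Big(\sum_{m=1}^{M} |\theta_m|^p\Big)^{1/p}, \qquad \theta_m \ge 0,
```

with p = 1 recovering the sparsity-inducing constraint and p > 1 yielding non-sparse kernel weights.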
New Methodology
• Computation of weights?
▫ Model: combined kernel
▫ Mathematical program
Convex problem.
(Kloft et al., NIPS 2009, ECML 2009/10, JMLR 2011)
Optimization over weights
Theoretical Analysis
• Theoretical foundations
▫ Active research topic
NIPS workshop 2009
▫ We show:
Theorem (Kloft & Blanchard). The local Rademacher complexity of MKL is bounded by:
• Corollaries (Learning Bounds)
▫ Bound:
depends on kernel
best known rate:
(Kloft & Blanchard, NIPS 2011, JMLR 2012; Kloft, Rückert, Bartlett, ECML 2010)
(Cortes, Mohri, Rostamizadeh, ICML 2010)
Proof (Sketch)
1. Relating the original class with the centered class
2. Bounding the complexity of the centered class
3. Applying the Khintchine–Kahane and Rosenthal inequalities
4. Bounding the complexity of the original class
5. Relating the bound to the truncation of the spectra of the kernels
• Implementation
▫ In C++ (“SHOGUN Toolbox”)
Matlab/Octave/Python/R support
▫ Runtime:
~ 1-2 orders of magnitude faster
Optimization
• Algorithms
1. Newton method
2. sequential, quadratically constrained programming
with level set projections
3. Block-coordinate descent algorithm (sketch)
▫ Alternate:
 solve (P) w.r.t. w
 solve (P) w.r.t. the kernel weights (analytical update)
▫ Until convergence (convergence proved)
(Kloft et al., JMLR 2011)
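The alternating scheme can be sketched as follows. This is a minimal sketch, not the SHOGUN implementation: kernel ridge regression stands in for the SVM base problem so the example stays self-contained, while the weight update uses the closed form from Kloft et al. (JMLR 2011).

```python
import numpy as np

def lp_mkl_bcd(kernels, y, p=2.0, lam=1.0, n_iter=20):
    """Block-coordinate descent sketch for lp-norm MKL.

    Alternates between solving the base problem for the dual
    coefficients alpha (here: kernel ridge regression, a stand-in
    for the SVM in the slides) and an analytical update of the
    kernel weights theta, normalized so that ||theta||_p = 1.
    """
    M, n = len(kernels), len(y)
    theta = np.full(M, M ** (-1.0 / p))            # uniform start, ||theta||_p = 1
    for _ in range(n_iter):
        K = sum(t * Km for t, Km in zip(theta, kernels))
        alpha = np.linalg.solve(K + lam * np.eye(n), y)        # base-learner step
        # squared RKHS norms of the per-kernel components w_m
        norms_sq = np.array([t ** 2 * alpha @ Km @ alpha
                             for t, Km in zip(theta, kernels)])
        norms_sq = np.maximum(norms_sq, 1e-12)
        # analytical weight update: theta_m ∝ ||w_m||^(2/(p+1))
        theta = norms_sq ** (1.0 / (p + 1))
        theta /= np.sum(theta ** p) ** (1.0 / p)   # renormalize to ||theta||_p = 1
    return theta, alpha
```

On data where only one kernel carries signal, the update concentrates weight on it, while p > 1 keeps the remaining weights strictly positive rather than discarding those views.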
▫ Choice of p crucial
▫ Optimality of p depends on true sparsity
SVM fails in the sparse scenarios
1-norm best in the sparsest scenario only
p-norm MKL proves robust in all scenarios
• Bounds
▫ Can minimize bounds w.r.t. p
▫ Bounds well reflect empirical results
• Design
▫ Two 50-dimensional Gaussians
mean μ1 := binary vector;
▫ Zero-mean features irrelevant
50 training examples
One linear kernel per feature
▫ Six scenarios
Vary % of irrelevant features
Variance so that Bayes error constant
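The data design above can be sketched as follows. Assumed details not stated on the slide: labels are ±1, the binary mean vector is 1 on the relevant features and 0 elsewhere, and `sigma` stands in for the per-scenario variance calibration that keeps the Bayes error constant.

```python
import numpy as np

def make_toy(n=50, d=50, frac_irrelevant=0.64, sigma=1.0, seed=0):
    """Two 50-dimensional Gaussians; features whose class means do not
    differ (zero-mean features) are irrelevant. One linear kernel per
    feature, as in the slide's design."""
    rng = np.random.default_rng(seed)
    n_rel = int(round(d * (1.0 - frac_irrelevant)))
    mu = np.zeros(d)
    mu[:n_rel] = 1.0                      # binary mean vector (assumption)
    y = rng.choice([-1.0, 1.0], size=n)
    X = y[:, None] * mu + sigma * rng.normal(size=(n, d))
    # one linear kernel per feature: K_j = x_j x_j^T
    kernels = [np.outer(X[:, j], X[:, j]) for j in range(d)]
    return X, y, kernels
```

Varying `frac_irrelevant` over the six scenarios changes the true sparsity of the optimal kernel weighting.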
• Results
Toy Experiment
[Figure: test error (0–50%) as a function of sparsity (0%, 44%, 64%, 82%, 92%, 98%) for each method]
Applications
▫ Computer vision
 Visual object recognition
▫ Bioinformatics
 Subcellular localization
 Protein fold prediction
 DNA transcription start site detection
 Metabolic network reconstruction
(figure theme: non-sparsity)
• Lesson learned:
▫ Optimality of a method depends on true underlying sparsity of problem
• Applications studied:
• Visual object recognition
▫ Aim: annotation of visual media (e.g., images)
▫ Motivation: content-based image retrieval
Application Domain: Computer Vision
[Example images: aeroplane, bicycle, bird]
• Multiple kernels
▫ based on
 Color histograms
 Shapes (gradients)
 Local features (SIFT words)
 Spatial features
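Histogram features of this kind are commonly compared with a χ²-based kernel; the following is a minimal sketch under that assumption (the slides do not specify the exact kernel functions used per feature type).

```python
import numpy as np

def chi2_kernel(X, Y, gamma=1.0):
    """Exponential chi-square kernel, a common choice for comparing
    histogram features such as color or bag-of-words histograms:
    k(x, y) = exp(-gamma * sum_i (x_i - y_i)^2 / (x_i + y_i))."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    d = np.zeros((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        num = (x - Y) ** 2
        den = x + Y
        # bins empty in both histograms contribute 0
        d[i] = np.divide(num, den, out=np.zeros_like(num), where=den > 0).sum(axis=1)
    return np.exp(-gamma * d)
```

Each feature type and spatial tiling then yields one such kernel matrix, which MKL combines with learned weights.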
• Datasets
▫ 1. VOC 2009 challenge
7054 train / 6925 test images
20 object categories
▫ aeroplane, bicycle, …
▫ 2. ImageCLEF2010 Challenge
8000 train / 10000 test images
▫ taken from Flickr
93 concept categories
▫ partylife, architecture, skateboard, …
Application Domain: Computer Vision
• 32 Kernels
▫ 4 types:
▫ varied over color channel combinations and spatial tilings (levels of a spatial pyramid)
▫ one-vs.-rest classifier for each visual concept
                  Bag of Words   Global Histogram
 Gradients        BoW-SIFT       Ho(O)G
 Color histograms BoW-C          HoC
• Challenge results
▫ Employed our approach for ImageCLEF2011 Photo Annotation challenge
achieved the winning entries in 3 categories
Application Domain: Computer Vision
• Preliminary results:
▫ using BoW-S only gives worse results
→ BoW-S alone is not sufficient
          SVM    MKL
 VOC2009  55.85  56.76
 CLEF2010 36.45  37.02
(Binder, Kloft, et al., 2010, 2011)
• Experiment
▫ Disagreement of single-kernel classifiers
▫ BoW-S kernels induce more or less the same predictions
Application Domain: Computer Vision
• Why can MKL help?
▫ some images better captured by certain kernels
▫ different images may have different kernels that capture them well
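The disagreement analysis can be sketched as follows; `disagreement_matrix` is a hypothetical helper, not code from the slides, and assumes each single-kernel classifier's predictions are available as a row of labels.

```python
import numpy as np

def disagreement_matrix(preds):
    """Pairwise disagreement of single-kernel classifiers.

    preds: array of shape (n_classifiers, n_points) holding predicted
    labels. Entry (i, j) is the fraction of test points on which
    classifiers i and j disagree; near-zero off-diagonal entries (as
    observed for the BoW-S kernels) indicate redundant kernels.
    """
    P = np.asarray(preds)
    return np.mean(P[:, None, :] != P[None, :, :], axis=2)
```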
• Detection of
▫ transcription start sites:
• by means of kernels based on:
▫ sequence alignments
▫ distribution of nucleotides
downstream, upstream
▫ folding properties
binding energies and angles
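As one concrete example of a sequence kernel, a k-spectrum kernel compares k-mer count vectors. This is an illustrative stand-in: the slides list alignment-, nucleotide-distribution-, and folding-based kernels without giving formulas.

```python
from collections import Counter

def spectrum_kernel(s, t, k=3):
    """k-spectrum kernel for DNA strings: the inner product of the
    two sequences' k-mer count vectors."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(cs[m] * ct[m] for m in cs)
```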
• Empirical analysis
▫ detection accuracy (AUC):
▫ higher accuracies than sparse MKL and ARTS (small n)
ARTS = winner of international comparison of 19 models
(Abeel et al., 2009)
(Kloft et al., NIPS 2009, JMLR 2011)
Application Domain: Genetics
• Theoretical analysis
▫ impact of lp-Norm on bound
▫ confirms experimental results:
stronger theoretical guarantees for proposed approach (p>1)
empirical and theoretical results approximately equal for
• Protein Fold Prediction
▫ Prediction of fold class of protein
Fold class related to protein’s function
▫ e.g., important for drug design
▫ Data set and kernels from Y. Ying
27 fold classes
Fixed train and test sets
12 biologically inspired kernels
▫ e.g., hydrophobicity, polarity, van der Waals volume
• Results
▫ Accuracy:
▫ 1-norm MKL and SVM on par
▫ p-norm MKL performs best
6% higher accuracy than baselines
Application Domain: Pharmacology
Further Applications
▫ Computer vision
 Visual object recognition
▫ Bioinformatics
 Subcellular localization
 Protein fold prediction
 DNA transcription / splice site detection
 Metabolic network reconstruction
(figure theme: non-sparsity)
Conclusion: Non-sparse Multiple Kernel Learning
▫ Training with > 100,000 data points and > 1,000 kernels
▫ Sharp learning bounds
▫ Applications
 Visual object recognition: winner of the ImageCLEF 2011 Photo Annotation challenge
 Computational biology: gene detector competitive with the winner of an international comparison
Thank you for your attention
I will be happy to answer any additional questions
References
▫ Abeel, Van de Peer, Saeys (2009). Toward a gold standard for promoter prediction evaluation. Bioinformatics.
▫ Bach (2008). Consistency of the Group Lasso and Multiple Kernel Learning. Journal of Machine Learning Research (JMLR).
▫ Kloft, Brefeld, Laskov, and Sonnenburg (2008). Non-sparse Multiple Kernel Learning. NIPS Workshop on Kernel Learning.
▫ Kloft, Brefeld, Sonnenburg, Laskov, Müller, and Zien (2009). Efficient and Accurate Lp-norm Multiple Kernel Learning. Advances in Neural Information Processing Systems (NIPS 2009).
▫ Kloft, Rückert, and Bartlett (2010). A Unifying View of Multiple Kernel Learning. ECML.
▫ Kloft, Blanchard (2011). The Local Rademacher Complexity of Lp-Norm Multiple Kernel Learning. Advances in Neural Information Processing Systems (NIPS 2011).
▫ Kloft, Brefeld, Sonnenburg, and Zien (2011). Lp-Norm Multiple Kernel Learning. Journal of Machine Learning Research (JMLR), 12(Mar):953-997.
▫ Kloft and Blanchard (2012). On the Convergence Rate of Lp-norm Multiple Kernel Learning. Journal of Machine Learning Research (JMLR), 13(Aug):2465-2502.
▫ Lanckriet, Cristianini, Bartlett, El Ghaoui, Jordan (2004). Learning the Kernel Matrix with Semidefinite Programming. Journal of Machine Learning Research (JMLR).