Multiple Kernel Learning
Marius Kloft
Technische Universität Berlin / Korea University
Marius Kloft (TU Berlin)
Machine Learning
• Aim
▫ Learning the relation between two random quantities from observations
• Kernel-based learning
Multiple Views / Kernels
• Example
▫ Object detection in images
• How to combine the views?
▫ Weightings (Lanckriet et al., 2004)
▫ Example views: shape, space, color
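The weighted combination the slide alludes to is the standard MKL form (Lanckriet et al., 2004); the symbols below are the usual ones, not taken from the slides:

```latex
k_\theta(x, x') \;=\; \sum_{m=1}^{M} \theta_m\, k_m(x, x'),
\qquad \theta_m \ge 0,
```

where each $k_m$ is the kernel of one view (shape, space, color) and $\theta$ collects the weights to be learned.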
Computation of Weights?
• State of the art
▫ Sparse weights (Bach, 2008)
 Kernels / views are completely discarded
▫ But why discard information?
From Vision to Reality?
• State of the art: sparse methods
▫ Empirically ineffective (Gehler et al., Noble et al., Shawe-Taylor et al., NIPS 2008)
• My dissertation: new methodology
▫ Established as a standard
▫ Breakthrough at learning bounds: O(M/n)
▫ Effective in applications: more efficient and effective in practice
Presentation of Methodology: Non-sparse Multiple Kernel Learning
New Methodology
• Generalized formulation
▫ Arbitrary loss
▫ Arbitrary norms
 e.g., lp-norms; the 1-norm leads to sparsity
• Computation of weights?
▫ Model: weighted kernel combination
▫ Mathematical program
 A convex problem; optimization over the weights
(Kloft et al., ECML 2010, JMLR 2011)
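The generalized program itself is elided in this transcript; in the notation of Kloft et al. (JMLR 2011), up to the Tikhonov/Ivanov variant chosen, it reads:

```latex
\min_{\theta \ge 0,\; \|\theta\|_p \le 1} \;\; \min_{w,\, b} \;\;
\frac{1}{2} \sum_{m=1}^{M} \frac{\|w_m\|_2^2}{\theta_m}
\;+\; C \sum_{i=1}^{n} \ell\!\Big(\sum_{m=1}^{M} \langle w_m, \phi_m(x_i)\rangle + b,\; y_i\Big),
\qquad \|\theta\|_p = \Big(\sum_{m=1}^{M} \theta_m^{\,p}\Big)^{1/p}.
```

For p = 1 the constraint induces sparse kernel weights; for p > 1 all kernels retain nonzero weight, which is the non-sparse regime advocated here.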
Theoretical Analysis
• Theoretical foundations
▫ Active research topic
 NIPS workshop 2010
▫ We show:
 Theorem (Kloft & Blanchard). The local Rademacher complexity of MKL is bounded by:
• Corollaries (learning bounds)
▫ Upper bound with rate
 Best known rate: (Cortes et al., ICML 2010)
▫ Generally
 for , an improvement of two orders of magnitude
(Kloft & Blanchard, NIPS 2011, JMLR 2012)
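As background (standard definitions, not taken from the slides): the local Rademacher complexity in the theorem refines the usual empirical Rademacher complexity of a function class $\mathcal{F}$,

```latex
\widehat{\mathfrak{R}}_n(\mathcal{F}) \;=\;
\mathbb{E}_{\sigma}\, \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i),
\qquad \sigma_i \in \{-1, +1\} \ \text{i.i.d.\ uniform},
```

where the local version restricts the supremum to a ball $\{f \in \mathcal{F} : \mathbb{E} f^2 \le r\}$, which is what enables the faster rates.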
Proof (Sketch)
1. Relating the original class to the centered class
2. Bounding the complexity of the centered class
3. Applying the Khintchine–Kahane and Rosenthal inequalities
4. Bounding the complexity of the original class
5. Relating the bound to the truncation of the spectra of the kernels
Optimization
• Algorithms (Kloft et al., JMLR 2011)
▫ 1. Newton method
▫ 2. Sequential quadratically constrained programming with level-set projections
▫ 3. Block-coordinate descent algorithm (sketch):
 Alternate:
  solve (P) w.r.t. w
  solve (P) w.r.t. θ (analytical)
 until convergence (proved)
• Implementation
▫ In C++ ("SHOGUN Toolbox")
 Matlab/Octave/Python/R support
▫ Runtime: ~1–2 orders of magnitude faster
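The alternating scheme above can be sketched in a few lines. As an illustration only, kernel ridge regression stands in for the SVM solver actually used, with the analytical lp-norm weight update θ_m ∝ ‖w_m‖^{2/(p+1)} (normalized so that ‖θ‖_p = 1); the function name and hyperparameters are invented for this sketch.

```python
import numpy as np

def lp_mkl_krr(Ks, y, p=2.0, lam=1e-2, n_iter=50):
    """Block-coordinate descent for lp-norm MKL, sketched with kernel
    ridge regression as the base learner (a stand-in for the SVM)."""
    M, n = len(Ks), len(y)
    theta = np.full(M, M ** (-1.0 / p))          # uniform start, ||theta||_p = 1
    for _ in range(n_iter):
        K = sum(t * Km for t, Km in zip(theta, Ks))       # combined kernel
        alpha = np.linalg.solve(K + lam * np.eye(n), y)   # solve (P) w.r.t. w
        # analytical weight update: ||w_m||^2 = theta_m^2 * alpha' K_m alpha
        sq_norms = np.array([theta[m] ** 2 * alpha @ Ks[m] @ alpha
                             for m in range(M)])
        norms = np.maximum(sq_norms, 1e-12) ** 0.5
        theta = norms ** (2.0 / (p + 1))          # solve (P) w.r.t. theta
        theta /= np.linalg.norm(theta, ord=p)     # renormalize: ||theta||_p = 1
    return theta, alpha
```

The closed-form θ step is what makes this loop cheap: each iteration costs one base-learner solve plus M quadratic forms.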
• Choice of p crucial
▫ Optimality of p depends on the true sparsity
 SVM fails in the sparse scenarios
 1-norm best only in the sparsest scenario
 p-norm MKL proves robust in all scenarios
• Bounds
▫ Can minimize the bounds w.r.t. p
▫ The bounds reflect the empirical results well
• Design
▫ Two 50-dimensional Gaussians
 Mean μ1 on the simplex
 Zero-mean features irrelevant
▫ 50 training examples
▫ One linear kernel per feature
▫ Six scenarios
 Vary the % of irrelevant features
 Variance chosen so that the Bayes error stays constant
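The design above can be sampled roughly as follows. Details such as the exact placement of the mean on the simplex and the variance scaling are assumptions for this sketch; the slides only state that the variance is chosen to keep the Bayes error constant.

```python
import numpy as np

def toy_scenario(frac_irrelevant, n=50, d=50, seed=0):
    """Two-Gaussian toy problem: only the first k features carry signal,
    the remaining zero-mean features are irrelevant."""
    rng = np.random.default_rng(seed)
    k = int(round(d * (1 - frac_irrelevant)))  # number of informative features
    mu = np.zeros(d)
    mu[:k] = 1.0 / max(k, 1)                   # class mean on the simplex
    y = rng.choice([-1, 1], size=n)            # balanced binary labels
    X = rng.normal(size=(n, d)) + np.outer(y, mu)
    # one linear kernel per feature, as in the experiment
    Ks = [np.outer(X[:, j], X[:, j]) for j in range(d)]
    return X, y, Ks
```

Varying `frac_irrelevant` over the six values (0%, 44%, 64%, 82%, 92%, 98%) reproduces the sparsity axis of the experiment.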
Toy Experiment
• Results
▫ [Figure: test error (0–50%) vs. sparsity (0%, 44%, 64%, 82%, 92%, 98%) across the six scenarios]
Applications
• Lesson learned:
▫ The optimality of a method depends on the true underlying sparsity of the problem (non-sparsity)
• Applications studied:
▫ Computer vision
 Visual object recognition
▫ Bioinformatics
 Subcellular localization
 Protein fold prediction
 DNA transcription start site detection
 Metabolic network reconstruction
Application Domain: Computer Vision
• Visual object recognition
▫ Aim: annotation of visual media (e.g., images)
▫ Motivation: content-based image retrieval
▫ Example categories: aeroplane, bicycle, bird
• Multiple kernels based on:
▫ Color histograms
▫ Shapes (gradients)
▫ Local features (SIFT words)
▫ Spatial features
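A minimal illustration of one such kernel family: comparing images by color histograms with a histogram intersection kernel. The actual features and kernel functions used in the thesis differ (bag-of-words pipelines, χ² kernels, spatial pyramids); the names below are invented for this sketch.

```python
import numpy as np

def color_histogram(img, bins=8):
    """Per-channel color histogram, L1-normalized (a minimal stand-in
    for the histogram features used here)."""
    img = np.asarray(img)
    hists = [np.histogram(img[..., c], bins=bins, range=(0, 256))[0]
             for c in range(img.shape[-1])]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def intersection_kernel(h1, h2):
    """Histogram intersection kernel, a common choice for histograms."""
    return np.minimum(h1, h2).sum()
```

Each feature family (color, gradients, SIFT words, spatial tilings) yields one such kernel matrix; MKL then learns how to weight them.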
• Datasets
▫ 1. VOC 2009 challenge
 7054 train / 6925 test images
 20 object categories (aeroplane, bicycle, …)
▫ 2. ImageCLEF2010 challenge
 8000 train / 10000 test images, taken from Flickr
 93 concept categories (partylife, architecture, skateboard, …)
• 32 kernels
▫ 4 types:
  | Bag of Words | Global Histogram
 Gradients | BoW-SIFT | Ho(O)G
 Color histograms | BoW-C | HoC
▫ Varied over color-channel combinations and spatial tilings (levels of a spatial pyramid)
▫ One-vs.-rest classifier for each visual concept
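The one-vs.-rest setup on precomputed kernels can be sketched as follows. Kernel ridge regression stands in for the SVMs actually trained per concept, and all names are invented for this sketch; the uniform kernel average corresponds to the unweighted SVM baseline, whereas MKL learns the weights instead.

```python
import numpy as np

def average_kernel(Ks):
    """Uniform combination of the precomputed kernels (unweighted baseline)."""
    return sum(Ks) / len(Ks)

def train_one_vs_rest(K, Y, lam=1e-2):
    """One coefficient vector per visual concept, via kernel ridge
    regression on the combined train kernel. Y: (n, n_concepts) in {-1,+1}."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), Y)  # shape (n, n_concepts)

def predict(K_test_train, A):
    """Scores for new images: one column per concept; rank or threshold."""
    return K_test_train @ A
```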
• Challenge results
▫ Employed our approach for the ImageCLEF2011 Photo Annotation challenge
 Achieved the winning entries in 3 categories! (Binder et al., 2010, 2011)
• Preliminary results
▫ Using BoW-S only gives worse results → BoW-S alone is not sufficient
▫  | SVM | MKL
 VOC2009 | 55.85 | 56.76
 CLEF2010 | 36.45 | 37.02
• Experiment
▫ Disagreement of single-kernel classifiers
▫ BoW-S kernels induce more or less the same predictions
• Why can MKL help?
▫ Different images are captured well by different kernels
Application Domain: Genetics
• Detection of transcription start sites
• By means of kernels based on:
▫ Sequence alignments
▫ Distribution of nucleotides (downstream, upstream)
▫ Folding properties (binding energies and angles)
• Empirical analysis (Kloft et al., NIPS 2009, JMLR 2011)
▫ Detection accuracy (AUC): higher than sparse MKL and ARTS
 ARTS: winner of an international comparison of 19 models (Abeel et al., 2009)
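The sequence kernels used in this application are richer (e.g., weighted degree kernels over alignments); as a minimal illustration of the idea, a k-spectrum kernel compares two DNA sequences by their shared k-mer counts. This simple variant is illustrative only, not the kernel actually used.

```python
from collections import Counter

def kmer_counts(seq, k=3):
    """Count the k-mers occurring in a DNA sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(s1, s2, k=3):
    """k-spectrum kernel: inner product of the k-mer count vectors."""
    c1, c2 = kmer_counts(s1, k), kmer_counts(s2, k)
    return sum(c1[m] * c2[m] for m in c1)
```

Different choices of k (and the other kernel families above) yield multiple kernels on the same sequences, which is exactly the setting MKL combines.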
• Theoretical analysis
▫ Impact of the lp-norm on the bound
▫ Confirms the experimental results:
 Stronger theoretical guarantees for the proposed approach (p > 1)
 Empirical and theoretical results approximately agree
Application Domain: Pharmacology
• Protein fold prediction
▫ Prediction of the fold class of a protein
 Fold class is related to the protein's function
 E.g., important for drug design
▫ Data set and kernels from Y. Ying
 27 fold classes
 Fixed train and test sets
 12 biologically inspired kernels
 E.g., hydrophobicity, polarity, van der Waals volume
• Results
▫ Accuracy: 1-norm MKL and SVM on par; p-norm MKL performs best
 6% higher accuracy than the baselines
Further Applications
• Computer vision
▫ Visual object recognition
• Bioinformatics
▫ Subcellular localization
▫ Protein fold prediction
▫ DNA: transcription / splice site detection
▫ Metabolic network reconstruction
Conclusion: Non-sparse Multiple Kernel Learning
• Training with > 100,000 data points and > 1,000 kernels
• Sharp learning bounds
• Applications
▫ Visual object recognition: established standard; winner of the ImageCLEF 2011 challenge
▫ Computational biology: more accurate gene detector than the winner of an international comparison
Thank you for your attention.
I will be pleased to answer any additional questions.
References
▫ Abeel, Van de Peer, Saeys (2009). Toward a gold standard for promoter prediction evaluation. Bioinformatics.
▫ Bach (2008). Consistency of the Group Lasso and Multiple Kernel Learning. Journal of Machine Learning Research (JMLR).
▫ Kloft, Brefeld, Laskov, and Sonnenburg (2008). Non-sparse Multiple Kernel Learning. NIPS Workshop on Kernel Learning.
▫ Kloft, Brefeld, Sonnenburg, Laskov, Müller, and Zien (2009). Efficient and Accurate Lp-norm Multiple Kernel Learning. Advances in Neural Information Processing Systems (NIPS 2009).
▫ Kloft, Rückert, and Bartlett (2010). A Unifying View of Multiple Kernel Learning. ECML.
▫ Kloft, Blanchard (2011). The Local Rademacher Complexity of Lp-Norm Multiple Kernel Learning. Advances in Neural Information Processing Systems (NIPS 2011).
▫ Kloft, Brefeld, Sonnenburg, and Zien (2011). Lp-Norm Multiple Kernel Learning. Journal of Machine Learning Research (JMLR), 12(Mar):953-997.
▫ Kloft and Blanchard (2012). On the Convergence Rate of Lp-norm Multiple Kernel Learning. Journal of Machine Learning Research (JMLR), 13(Aug):2465-2502.
▫ Lanckriet, Cristianini, Bartlett, El Ghaoui, Jordan (2004). Learning the Kernel Matrix with Semidefinite Programming. Journal of Machine Learning Research (JMLR).