ICCV2013 reading 2014.3.28
Akisato Kimura (@_akisato)
Paper to read
(Presented at ICCV2013)
Problem addressed in this paper
• Learning using privileged information (LUPI)
  – Training
    • Feature vectors: $X = \{x_1, \dots, x_N\}$, $x_i \in \mathbb{R}^d$
    • Label annotation: $Y = \{y_1, \dots, y_N\}$, $y_i \in \mathcal{Y}$
    • Additional information: $X^* = \{x_1^*, \dots, x_N^*\}$, $x_i^* \in \mathbb{R}^{d^*}$
  – Testing
    • Prediction function: $f: \mathbb{R}^d \to \mathcal{Y}$
    • No additional information required
Privileged information??
• Applicable to several scenarios in CV
Formulation
• Generic supervised binary classification
  – Training
    • Feature vectors: $X = \{x_1, \dots, x_N\}$, $x_i \in \mathbb{R}^d$
    • Label annotation: $Y = \{y_1, \dots, y_N\}$, $y_i \in \{+1, -1\}$
    • Additional information: $X^* = \{x_1^*, \dots, x_N^*\}$, $x_i^* \in \mathbb{R}^{d^*}$
  – Testing
    • Prediction function: $f: \mathbb{R}^d \to \mathbb{R}$
    • No additional information required
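The asymmetry between training and testing can be made concrete with a small sketch. The names `LupiTrainingSet` and `make_predictor` are hypothetical and only illustrate the setup: the privileged features travel with the training set, while the learned function takes ordinary features only.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = Sequence[float]


@dataclass
class LupiTrainingSet:
    """Hypothetical container for a LUPI training set."""
    X: List[Vector]        # ordinary features, available at training and test time
    Y: List[int]           # labels in {+1, -1}
    X_star: List[Vector]   # privileged features, available at training time only

    def __post_init__(self) -> None:
        if not (len(self.X) == len(self.Y) == len(self.X_star)):
            raise ValueError("X, Y and X_star must have the same number of examples")
        if any(y not in (+1, -1) for y in self.Y):
            raise ValueError("labels must be +1 or -1")


def make_predictor(w: Vector, b: float = 0.0) -> Callable[[Vector], float]:
    """Linear prediction function f(x) = <w, x> + b; it never sees X_star."""
    def f(x: Vector) -> float:
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return f
```

Whatever learner is used, its output is a function of $x$ alone, so nothing privileged has to be collected at test time.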
Key idea
• Privileged information allows us to distinguish between easy and hard examples
  – If the privileged data is easy to classify, then the original data would also be easy to classify.
  – … under the assumption that the privileged data is similarly informative about the problem at hand.
Linear SVM
• Ordinary convergence rate = $O(N^{-1/2})$
• It improves to $O(N^{-1})$
  – if we knew the optimal slack values $\xi_i$ in advance (OracleSVM [Vapnik+ 2009])
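$\min_{w \in \mathbb{R}^d,\ b \in \mathbb{R},\ \xi_i \in \mathbb{R}} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i$ subject to $y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i$, $\xi_i \ge 0$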
Slack variables in SVM
• Slack variables tell us which training examples are easy / hard to classify
  – $\xi_i = 0$ → easy
  – $\xi_i \gg 0$ → hard
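$\min_{w \in \mathbb{R}^d,\ b \in \mathbb{R},\ \xi_i \in \mathbb{R}} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i$ subject to $y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i$, $\xi_i \ge 0$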
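To illustrate how slack values separate easy from hard examples, here is a minimal sketch. It assumes a linear classifier $(w, b)$ is already given and uses the fact that, for fixed $(w, b)$, the smallest feasible slack in a soft-margin SVM is $\xi_i = \max(0,\ 1 - y_i(\langle w, x_i \rangle + b))$. The function name and the toy data are made up for illustration.

```python
def slack_values(w, b, X, Y):
    """Smallest feasible slack per example for a fixed linear classifier (w, b)."""
    slacks = []
    for x, y in zip(X, Y):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        slacks.append(max(0.0, 1.0 - y * score))
    return slacks


# Toy data (illustrative only): two positives, two negatives.
w, b = [1.0, 0.0], 0.0
X = [[3.0, 0.0], [0.5, 1.0], [-2.0, 0.0], [1.0, 0.0]]
Y = [+1, +1, -1, -1]

slacks = slack_values(w, b, X, Y)
print(slacks)  # [0.0, 0.5, 0.0, 2.0]
```

The first and third examples lie outside the margin (slack 0, easy), the second sits inside the margin, and the fourth is on the wrong side of the decision boundary (large slack, hard).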
SVM+
• A first model for LUPI
  – Use privileged data as a proxy to the oracle
  – Parameterize $\xi_i = \langle w^*, x_i^* \rangle + b^*$
[Vapnik+ NN2009, NIPS2010]
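Substituting this parameterization for the slack variables of a soft-margin SVM gives the SVM+ primal. The block below is a sketch of the form in which it is commonly stated; the trade-off parameters $C$ and $\gamma$ do not appear on the slide and are assumed here.

```latex
\begin{aligned}
\min_{w \in \mathbb{R}^d,\ b \in \mathbb{R},\ w^* \in \mathbb{R}^{d^*},\ b^* \in \mathbb{R}} \quad
  & \frac{1}{2}\left( \|w\|^2 + \gamma \|w^*\|^2 \right)
    + C \sum_{i=1}^{N} \left( \langle w^*, x_i^* \rangle + b^* \right) \\
\text{subject to} \quad
  & y_i \left( \langle w, x_i \rangle + b \right) \ge 1 - \left( \langle w^*, x_i^* \rangle + b^* \right), \\
  & \langle w^*, x_i^* \rangle + b^* \ge 0, \qquad i = 1, \dots, N
\end{aligned}
```

The slack is no longer a free variable per example: it has to be explained by a linear function of the privileged features, which is how the privileged data acts as a proxy for the oracle.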
Why should SVM+ be improved?
• Cannot be solved by popular SVM packages
  – Although good optimization algorithms were derived [Pechyony+ 2011], they work only with the dual.
Learning to rank setup instead
• Underlying idea is the same
• Using the privileged data to identify easy / hard-to-separate sample pairs
  – Instead of using it to identify easy / hard-to-classify samples
SVMrank
• Slack variables tell us which training example pairs are easy / hard / impossible to separate
[Joachims KDD2002]
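For reference, the pairwise ranking SVM is usually written as below. This is a sketch of the standard form; the pair slack $\xi_{ij}$ and the constant $C$ are notation assumed here, not taken from the slide.

```latex
\begin{aligned}
\min_{w \in \mathbb{R}^d,\ \xi_{ij} \in \mathbb{R}} \quad
  & \frac{1}{2}\|w\|^2 + C \sum_{(i,j):\, y_i > y_j} \xi_{ij} \\
\text{subject to} \quad
  & \langle w, x_i - x_j \rangle \ge 1 - \xi_{ij}, \quad \xi_{ij} \ge 0
    \quad \text{for all } (i,j) \text{ with } y_i > y_j
\end{aligned}
```

One reading of the three cases: $\xi_{ij} = 0$ means the pair is ranked correctly with full margin, $0 < \xi_{ij} < 1$ means it is ranked correctly but inside the margin, and $\xi_{ij} \ge 1$ means the pair is not ranked in the right order.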
Proposed method: Rank transfer
• The strategy is similar to SVM+, but indirect.
  1. SVMrank on $X^*$ (the ranking function $f^*$)
  2. Margins $\rho_{ij} = f^*(x_i^*) - f^*(x_j^*)$ for all $i, j$ with $y_i > y_j$
     • $\rho_{ij} \gg 0$: easy, $\rho_{ij} \approx 0$: hard, $\rho_{ij} < 0$: impossible
  3. SVMrank on $X$ with data-dependent margins
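The three steps can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: both ranking problems are solved here by simple subgradient descent on a pairwise hinge loss instead of an SVM package, pairs with non-positive margin are dropped, and all function names, hyperparameters, and data are made up.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ordered_pairs(Y):
    """All index pairs (i, j) whose labels satisfy y_i > y_j."""
    return [(i, j) for i in range(len(Y)) for j in range(len(Y)) if Y[i] > Y[j]]


def train_rank(X, pairs, margins, C=1.0, epochs=200, lr=0.01):
    """Subgradient descent on 0.5*||w||^2 + C * sum max(0, rho_ij - <w, x_i - x_j>)."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = list(w)  # gradient of the regularizer 0.5*||w||^2
        for (i, j), rho in zip(pairs, margins):
            diff = [a - b for a, b in zip(X[i], X[j])]
            if dot(w, diff) < rho:  # margin violated, so the hinge term is active
                grad = [g - C * d for g, d in zip(grad, diff)]
        w = [wk - lr * g for wk, g in zip(w, grad)]
    return w


def rank_transfer(X, X_star, Y, C=1.0):
    pairs = ordered_pairs(Y)
    # Step 1: ranking function f* on the privileged data, with unit margins.
    w_star = train_rank(X_star, pairs, [1.0] * len(pairs), C)
    f_star = [dot(w_star, xs) for xs in X_star]
    # Step 2: data-dependent margins rho_ij = f*(x_i*) - f*(x_j*).
    rho = {(i, j): f_star[i] - f_star[j] for i, j in pairs}
    # Step 3: ranking on X with these margins; pairs with rho_ij <= 0 are dropped
    # (an assumption made for this sketch).
    kept = [p for p in pairs if rho[p] > 0]
    w = train_rank(X, kept, [rho[p] for p in kept], C)
    return w, rho


# Toy data (illustrative only): 2 positives, 2 negatives.
Y = [+1, +1, -1, -1]
X_star = [[2.0], [1.5], [0.0], [-0.5]]
X = [[1.0, 0.2], [0.8, -0.1], [-1.0, 0.1], [-0.7, 0.0]]

w, rho = rank_transfer(X, X_star, Y)
scores = [dot(w, x) for x in X]
```

At test time only `w` is needed: a new example is scored with `dot(w, x)`, and the privileged features play no further role.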
Intuition
• If it was difficult to correctly rank a pair on $X^*$, it will also be difficult on $X$
  1. Pairs $(i, j)$ with small margins $\rho_{ij}$ have more limited influence on $w$
  2. Incorrectly ranked pairs are ignored.
Why not Rank transfer?
• We can use standard SVM packages!
  – For the SVMrank on $X^*$ this is clear.
  – For the SVMrank on $X$ we need variable transformations.
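One way such a transformation can be written is sketched below. It assumes that only pairs with $\rho_{ij} > 0$ are kept and uses $\xi_{ij}$, $\tilde{\xi}_{ij}$, $\tilde{x}_{ij}$ and $C$ as notation introduced for this derivation; the slides do not spell out the exact form.

```latex
\begin{aligned}
&\text{Ranking with data-dependent margins:} \\
&\quad \min_{w,\ \xi_{ij}} \ \frac{1}{2}\|w\|^2 + C \sum_{(i,j)} \xi_{ij}
  \quad \text{s.t.} \quad \langle w, x_i - x_j \rangle \ge \rho_{ij} - \xi_{ij}, \quad \xi_{ij} \ge 0 \\[4pt]
&\text{Divide each constraint by } \rho_{ij} > 0 \text{ and set }
  \tilde{x}_{ij} = \frac{x_i - x_j}{\rho_{ij}}, \quad \tilde{\xi}_{ij} = \frac{\xi_{ij}}{\rho_{ij}}: \\
&\quad \min_{w,\ \tilde{\xi}_{ij}} \ \frac{1}{2}\|w\|^2 + C \sum_{(i,j)} \rho_{ij}\, \tilde{\xi}_{ij}
  \quad \text{s.t.} \quad \langle w, \tilde{x}_{ij} \rangle \ge 1 - \tilde{\xi}_{ij}, \quad \tilde{\xi}_{ij} \ge 0
\end{aligned}
```

The second problem has the shape of a bias-free soft-margin SVM on the points $\tilde{x}_{ij}$, all labeled $+1$, with per-example cost $C \rho_{ij}$, which is something a standard package with per-sample weights can handle. It also matches the intuition that pairs with small margins have limited influence on $w$, since their cost weight is small.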
Experiments
• 4 different types of privileged information
  – All of those can be handled in a unified framework.
• 4 different methods to be compared
  – SVM, SVMrank, SVM+, Rank transfer
• Evaluation metric = Average Precision
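As a reminder of what the metric measures, here is a minimal sketch of non-interpolated average precision for one binary problem. The slides do not say which AP variant was used, so the exact definition here is an assumption, and the function name and toy values are illustrative.

```python
def average_precision(scores, labels):
    """Mean of precision@k over the ranks k at which a positive (+1) example appears."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Positives end up at ranks 1 and 3, so AP = (1/1 + 2/3) / 2.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [+1, -1, +1, -1])
```

Because AP depends only on the ordering of the scores, it can be used for both the classification methods (SVM, SVM+) and the ranking methods (SVMrank, Rank transfer) in the comparison.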
(1) Attributes as privileged info
• Animals with Attributes Dataset
  – 10 species ( = classes), 85 properties ( = attributes)
• Features: 2000-dim SURF
• Privileged: 85-dim predicted attributes
[Lampert+ PAMI2014]
• Learn 1-vs-1 classifiers with 100 training samples
(1) Results
• Rank transfer is the best.
(2) Bounding box as privileged info
• Fine-grained setup on ILSVRC2012
  – 17 classes covering a variety of snakes
• Features: 4096-dim Fisher vector from the whole images
• Privileged: 4096-dim Fisher vector from the bounding box regions
• Learn 1-vs-rest classifiers
(2) Results
• SVM+ is the best; ranking strategies do not seem suitable for this setup.
(3) Texts as privileged info
• IsraelImages dataset [Bekkerman+ CVPR2007]
  – 11 classes, 1800 images, each with a textual description of up to 18 words
• Features: 4096-dim Fisher vectors
• Privileged: BoWs from the texts
• Learn 1-vs-1 classifiers
[Figure: example images from the classes "Desert" and "Trees"]
(3) Results
• Reference (privileged only) is the best
• All the others perform almost the same.
  – Note that high accuracy in the privileged space does not necessarily mean that the privileged information is helpful for the target task.
(4) Rationales as privileged info
• Hot or Not dataset [Donahue+ ICCV2011]
• Features: 500-dim densely sampled SIFT from the whole image
• Privileged: 500-dim densely sampled SIFT from the rationales
(4) Results
• Reference is the best.
• Rank transfer performs better for the male class.
• Hard to draw a conclusion.
Appendix: Margin transfer
• One possible alternative to Rank transfer
But not so good…
Last words
• The idea is nice and easy to use.
• More privileged information, better performance? --- needs discussion
• Which types of privileged information are suitable? --- unknown