ICCV2013 reading: Learning to rank using privileged information

ICCV2013 reading 2014.3.28

Akisato Kimura (@_akisato)

Paper to read

(Presented at ICCV2013)

Problem addressed in this paper

• Learning using privileged information (LUPI)
  – Training
    • Feature vectors: $X = \{x_1, \dots, x_N\}$, $x_i \in \mathbb{R}^d$
    • Label annotation: $Y = \{y_1, \dots, y_N\}$, $y_i \in \mathbb{N}$
    • Additional information: $X^* = \{x_1^*, \dots, x_N^*\}$, $x_i^* \in \mathbb{R}^{d^*}$
  – Testing
    • Prediction function: $f: \mathbb{R}^d \to \mathbb{N}$
    • No additional information required

Privileged information??

• Applicable to several scenarios in CV

Formulation

• Generic supervised binary classification
  – Training
    • Feature vectors: $X = \{x_1, \dots, x_N\}$, $x_i \in \mathbb{R}^d$
    • Label annotation: $Y = \{y_1, \dots, y_N\}$, $y_i \in \{+1, -1\}$
    • Additional information: $X^* = \{x_1^*, \dots, x_N^*\}$, $x_i^* \in \mathbb{R}^{d^*}$
  – Testing
    • Prediction function: $f: \mathbb{R}^d \to \mathbb{R}$
    • No additional information required

Key idea

• Privileged information allows us to distinguish between easy and hard examples
  – If the privileged data is easy to classify, then the original data would also be easy to classify.
  – … under the assumption that the privileged data is similarly informative about the problem at hand.

Linear SVM

• Ordinary convergence rate = $O(N^{-1/2})$
• It improves to $O(N^{-1})$
  – if we knew the optimal slack values $\xi_i$ in advance (OracleSVM [Vapnik+ 2009])

$\min_{w \in \mathbb{R}^d,\; b \in \mathbb{R},\; \xi_i \in \mathbb{R}} \dots$
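For reference, the textbook soft-margin linear SVM over the variables $w$, $b$, $\xi_i$ can be written as below. This is assumed to be the problem the slide's minimization refers to; the trade-off constant $C$ is a symbol introduced here and does not appear in the text above.

```latex
\begin{aligned}
\min_{w \in \mathbb{R}^d,\; b \in \mathbb{R},\; \xi_i \in \mathbb{R}} \quad
  & \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i \\
\text{subject to} \quad
  & y_i \left( \langle w, x_i \rangle + b \right) \ge 1 - \xi_i, \\
  & \xi_i \ge 0, \qquad i = 1, \dots, N.
\end{aligned}
```

In the oracle setting the $\xi_i$ would be given rather than optimized, so only $w$ and $b$ remain as unknowns.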

Slack variables in SVM

• Slack variables tell us which training examples are easy / hard to classify
  – $\xi_i = 0$ → easy
  – $\xi_i \gg 0$ → hard

$\min_{w \in \mathbb{R}^d,\; b \in \mathbb{R},\; \xi_i \in \mathbb{R}} \dots$
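A minimal sketch of how slack values separate easy from hard examples. It assumes the standard soft-margin constraint $y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i$, under which the smallest feasible slack for a fixed $(w, b)$ is the hinge loss. The function name `hinge_slacks` and the toy data are hypothetical.

```python
def hinge_slacks(w, b, X, y):
    """Return xi_i = max(0, 1 - y_i * (<w, x_i> + b)) for each sample."""
    slacks = []
    for x_i, y_i in zip(X, y):
        score = sum(w_k * x_k for w_k, x_k in zip(w, x_i)) + b
        slacks.append(max(0.0, 1.0 - y_i * score))
    return slacks

# Toy 2-D data and a fixed separator w = (1, 0), b = 0 (illustrative values only).
X = [[2.0, 0.0], [0.5, 1.0], [-0.5, 0.0], [-3.0, 1.0]]
y = [+1, +1, +1, -1]
slacks = hinge_slacks([1.0, 0.0], 0.0, X, y)
print(slacks)  # [0.0, 0.5, 1.5, 0.0]
```

The first and last points sit outside the margin (slack 0, easy), the second lies inside the margin, and the third is on the wrong side of the separator (slack above 1, hard).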

SVM+

• A first model for LUPI
  – Use privileged data as a proxy to the oracle
  – Parameterize $\xi_i = \langle w^*, x_i^* \rangle + b^*$

[Vapnik+ NN2009, NIPS2010]
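Substituting this parameterization into a soft-margin SVM gives the form in which SVM+ is usually stated in the LUPI literature. The sketch below follows that common statement; the hyperparameters $C$ and $\gamma$ are symbols assumed here and are not named on the slide.

```latex
\begin{aligned}
\min_{w,\, b,\, w^*,\, b^*} \quad
  & \frac{1}{2}\left( \lVert w \rVert^2 + \gamma \lVert w^* \rVert^2 \right)
    + C \sum_{i=1}^{N} \left( \langle w^*, x_i^* \rangle + b^* \right) \\
\text{subject to} \quad
  & y_i \left( \langle w, x_i \rangle + b \right)
    \ge 1 - \left( \langle w^*, x_i^* \rangle + b^* \right), \\
  & \langle w^*, x_i^* \rangle + b^* \ge 0, \qquad i = 1, \dots, N.
\end{aligned}
```

The slack is no longer a free variable per sample: it is tied to the privileged features, so $w$ and $w^*$ have to be learned jointly.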

Why should SVM+ be improved?

• Cannot be solved by popular SVM packages
  – Although good optimization algorithms were derived [Pechyony+ 2011], they work only with the dual.

Learning to rank setup instead

• Underlying idea is the same
• Using the privileged data to identify easy / hard-to-separate sample pairs
  – Instead of using it to identify easy / hard-to-classify samples

SVMrank

• Slack variables tell us which training example pairs are easy / hard / impossible to separate

[Joachims KDD2002]
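A minimal sketch of pairwise slacks in a ranking SVM, assuming the usual constraint $\langle w, x_i \rangle - \langle w, x_j \rangle \ge 1 - \xi_{ij}$ for every pair with $y_i > y_j$. The helper names and the 1-D toy data are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ranking_pairs(y):
    """All index pairs (i, j) where sample i should rank above sample j."""
    n = len(y)
    return [(i, j) for i in range(n) for j in range(n) if y[i] > y[j]]

def pair_slacks(w, X, y):
    """xi_ij = max(0, 1 - (<w, x_i> - <w, x_j>)) for every pair with y_i > y_j."""
    return {(i, j): max(0.0, 1.0 - (dot(w, X[i]) - dot(w, X[j])))
            for i, j in ranking_pairs(y)}

# One positive sample (index 0) and three negatives, scored by a fixed w = (1,).
X = [[3.0], [2.5], [4.0], [1.0]]
y = [1, 0, 0, 0]
slacks = pair_slacks([1.0], X, y)
print(slacks)  # {(0, 1): 0.5, (0, 2): 2.0, (0, 3): 0.0}
```

Pair (0, 3) is separated with a full margin (slack 0), pair (0, 1) is ordered correctly but inside the margin, and pair (0, 2) is ordered the wrong way round by this `w` (slack above 1).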

Proposed method: Rank transfer

• The strategy is similar to SVM+, but indirect.

1. SVMrank on $X^*$ (the ranking function $f^*$)
2. Margins $\rho_{ij} = f^*(x_i^*) - f^*(x_j^*)$ for all $i, j$ with $y_i > y_j$
   • $\rho_{ij} \gg 0$: easy, $\rho_{ij} \approx 0$: hard, $\rho_{ij} < 0$: impossible
3. SVMrank on $X$ with data-dependent margins
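The three steps can be sketched as follows, under stated assumptions. The privileged scores $f^*(x_i^*)$ are taken as given (step 1 is not implemented here). Step 3 is read as the constraint $\langle w, x_i - x_j \rangle \ge \rho_{ij} - \xi_{ij}$. Dropping pairs with non-positive margin is this sketch's own choice for handling the "impossible" pairs; the paper may treat them differently. All function names and numbers are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def transfer_margins(f_star_scores, y):
    """Step 2: rho_ij = f*(x_i*) - f*(x_j*) for all pairs with y_i > y_j."""
    n = len(y)
    return {(i, j): f_star_scores[i] - f_star_scores[j]
            for i in range(n) for j in range(n) if y[i] > y[j]}

def margin_slacks(w, X, rho):
    """Step 3 constraint: xi_ij = max(0, rho_ij - <w, x_i - x_j>).

    Assumption of this sketch: pairs with rho_ij <= 0 are skipped entirely.
    """
    slacks = {}
    for (i, j), r in rho.items():
        if r <= 0:
            continue  # pair was ranked incorrectly in the privileged space
        diff = dot(w, X[i]) - dot(w, X[j])
        slacks[(i, j)] = max(0.0, r - diff)
    return slacks

# Hypothetical privileged scores f*(x_i*) and original 1-D features.
f_star = [2.0, 0.0, 1.75, 3.0]
y = [1, 0, 0, 0]
X = [[1.0], [0.0], [0.5], [2.0]]

rho = transfer_margins(f_star, y)
print(rho)                             # {(0, 1): 2.0, (0, 2): 0.25, (0, 3): -1.0}
print(margin_slacks([1.0], X, rho))    # {(0, 1): 1.0, (0, 2): 0.0}
```

The easy pair (0, 1) demands a large separation on $X$, the hard pair (0, 2) is satisfied by a small one, and pair (0, 3) contributes nothing.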

Intuition

• If it was difficult to correctly rank a pair on $X^*$, it will also be difficult on $X$
  1. Pairs $(i, j)$ with small margins $\rho_{ij}$ have more limited influence on $w$
  2. Incorrectly ranked pairs are ignored.


Why not Rank transfer?

• We can use standard SVM packages!
  – For the SVMrank on $X^*$ this is clear.
  – For the SVMrank on $X$ we need variable transformations
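One substitution that would reduce the data-dependent-margin problem to a standard SVM is sketched below. It assumes the second-stage problem has the form shown in the first line and that all retained margins satisfy $\rho_{ij} > 0$. The symbols $z_{ij}$, $\tilde{\xi}_{ij}$ and $C$ are introduced here for illustration, and the transformation actually used in the paper may differ.

```latex
\begin{aligned}
&\min_{w,\, \xi_{ij}} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{(i,j)} \xi_{ij}
  \quad \text{s.t.} \quad
  \langle w, x_i - x_j \rangle \ge \rho_{ij} - \xi_{ij}, \;\; \xi_{ij} \ge 0 \\[4pt]
&\text{Substitute } z_{ij} = \frac{x_i - x_j}{\rho_{ij}}, \qquad
  \tilde{\xi}_{ij} = \frac{\xi_{ij}}{\rho_{ij}}: \\[4pt]
&\min_{w,\, \tilde{\xi}_{ij}} \; \frac{1}{2}\lVert w \rVert^2
  + C \sum_{(i,j)} \rho_{ij}\, \tilde{\xi}_{ij}
  \quad \text{s.t.} \quad
  \langle w, z_{ij} \rangle \ge 1 - \tilde{\xi}_{ij}, \;\; \tilde{\xi}_{ij} \ge 0
\end{aligned}
```

The result is a bias-free soft-margin SVM on the rescaled difference vectors $z_{ij}$, all labeled $+1$, with a per-sample cost $C \rho_{ij}$. Any package that supports per-sample weights can handle this form, and pairs with small $\rho_{ij}$ receive a small weight.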

Experiments

• 4 different types of privileged information – All of those can be handled in a unified framework.

• 4 different methods to be compared – SVM, SVMrank, SVM+, Rank transfer

• Evaluation metric = Average Precision
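A minimal sketch of Average Precision for one binary ranking task, using the non-interpolated definition (mean of precision at each rank where a positive item appears). The slides do not say which AP variant was used, so this choice is an assumption, and the function name is hypothetical.

```python
def average_precision(scores, labels):
    """Mean of precision@k over the ranks k at which a positive item appears."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Positives are retrieved at ranks 1 and 3: AP = (1/1 + 2/3) / 2 = 5/6.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

AP depends only on the ordering of the scores, which is why it can compare classifiers (SVM, SVM+) and rankers (SVMrank, Rank transfer) on the same footing.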

(1) Attributes as privileged info

• Animals with Attributes Dataset
  – 10 species (= classes), 85 properties (= attributes)
• Features: 2000-dim SURF
• Privileged: 85-dim predicted attributes

[Lampert+ PAMI2014]

• Learn 1-vs-1 classifiers with 100 training samples

(1) Results

• Rank transfer is the best.

(2) Bounding box as privileged info

• Fine-grained setup on ILSVRC2012
  – 17 classes covering a variety of snakes

• Features: 4096-dim Fisher vector from the whole images

• Privileged: 4096-dim Fisher vector from the bounding box regions

• Learn 1-vs-rest classifiers

(2) Results

• SVM+ is the best, ranking strategies do not seem suitable for this setup.

(3) Texts as privileged info

• IsraelImages dataset [Bekkerman+ CVPR2007]

– 11 classes, 1800 images, each with a textual description of up to 18 words

• Features: 4096-dim Fisher vectors
• Privileged: BoWs from the texts
• Learn 1-vs-1 classifiers

[Figure: example images labeled "Desert" and "Trees"]

(3) Results

• Reference (privileged only) is the best
• All the other methods produce almost the same results.
  – Note that high accuracy in the privileged space does not necessarily mean that the privileged information is helpful for the target task.

(4) Rationales as privileged info

• Hot or Not dataset [Donahue+ ICCV2011]

• Features: 500-dim densely sampled SIFT from the whole image

• Privileged: 500-dim densely sampled SIFT from the rationales

(4) Results

• Reference is the best.
• Rank transfer performs better for the male class.
• Hard to draw a conclusion.

Appendix: Margin transfer

• One possible alternative to Rank transfer

But not so good…

Last words

• The idea is nice and easy to use.
• More privileged information, better performance? --- needs discussion
• Which types of privileged information are suitable? --- unknown