Invariant Large Margin Nearest Neighbour Classifier

M. Pawan Kumar, Philip Torr, Andrew Zisserman

Aim

• To learn a distance metric for invariant nearest neighbour classification

Training data


Target pairs


Impostor pairs

Problem : Euclidean distance may not provide correct nearest neighbours

Solution : Learn a mapping to new space


• Bring Target pairs closer

• Move Impostor pairs away

Figure: Euclidean Distance vs. Learnt Distance



Transformation Trajectories

Learn a mapping to new space


• Bring Target Trajectory pairs closer

• Move Impostor Trajectory pairs away


Motivation

Face Recognition in TV Video

Figure: face images I1, I2, I3, I4, ..., In, each with a Feature Vector

Euclidean distance may not give correct nearest neighbours

Learn a distance metric

Invariance to changes in position of features

Outline

• Large Margin Nearest Neighbour (LMNN)
• Preventing Overfitting
• Polynomial Transformations
• Invariant LMNN (ILMNN)
• Experiments

LMNN Classifier

Weinberger, Blitzer and Saul - NIPS 2005

Learns a distance metric for Nearest Neighbour classification

• Learns a mapping L : x → Lx

• Bring target pairs closer

• Move impostor pairs away

Figure: points xi, xj (target pair) and xk (impostor)



Distance between $x_i$ and $x_j$: $D(i,j) = (x_i - x_j)^T L^T L (x_i - x_j)$

Distance between $x_i$ and $x_j$: $D(i,j) = (x_i - x_j)^T M (x_i - x_j)$, where $M = L^T L$
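
The two forms of the distance can be checked against each other numerically. The following is a minimal sketch, assuming NumPy; the matrix `L`, the points and the helper name `metric_distance` are illustrative choices, not taken from the slides.

```python
import numpy as np

def metric_distance(x_i, x_j, M):
    """Squared distance (x_i - x_j)^T M (x_i - x_j) under the metric M."""
    d = x_i - x_j
    return float(d @ M @ d)

# Hypothetical 2-D example: L stretches the first axis by a factor of 2.
L = np.array([[2.0, 0.0],
              [0.0, 1.0]])
M = L.T @ L  # M = L^T L is positive semidefinite by construction

x_i = np.array([1.0, 0.0])
x_j = np.array([0.0, 0.0])

d_via_M = metric_distance(x_i, x_j, M)
# Squared Euclidean distance after mapping both points with L.
d_via_L = float(np.sum((L @ x_i - L @ x_j) ** 2))
print(d_via_M, d_via_L)
```

Both values agree, which is why learning the mapping L and learning the metric M are interchangeable views of the same problem.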

$\min_M \sum_{ij} D(i,j)$

subject to $M \succeq 0$

Convex Semidefinite Program (SDP)

M 0


Global minimum



$\min_M \sum_{ijk} e_{ijk}$

subject to $M \succeq 0$

$D(i,k) - D(i,j) \geq 1 - e_{ijk}$

$e_{ijk} \geq 0$

Convex SDP




$\min_M \sum_{ij} D(i,j) + \Lambda_H \sum_{ijk} e_{ijk}$

subject to $M \succeq 0$

$D(i,k) - D(i,j) \geq 1 - e_{ijk}$

$e_{ijk} \geq 0$

Solve to obtain optimum M

Complexity : Polynomial in number of points
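
To make the objective concrete, the sketch below evaluates it for a fixed M rather than solving the SDP. It assumes NumPy; the function name `lmnn_objective`, the toy points and the weight `lam_h` (standing in for ΛH) are hypothetical. For a fixed M, the smallest feasible slack is the hinge value max(0, 1 - (D(i,k) - D(i,j))).

```python
import numpy as np

def lmnn_objective(X, M, targets, triplets, lam_h):
    """Evaluate sum_ij D(i,j) + lam_h * sum_ijk e_ijk for a fixed M.

    targets  : list of (i, j) target pairs
    triplets : list of (i, j, k), with (i, j) a target pair and k an impostor
    """
    def D(a, b):
        d = X[a] - X[b]
        return float(d @ M @ d)

    pull = sum(D(i, j) for i, j in targets)
    # Smallest slack satisfying D(i,k) - D(i,j) >= 1 - e_ijk and e_ijk >= 0.
    slack = sum(max(0.0, 1.0 - (D(i, k) - D(i, j))) for i, j, k in triplets)
    return pull + lam_h * slack

X = np.array([[0.0, 0.0],   # x_i
              [1.0, 0.0],   # x_j, same class as x_i
              [0.0, 1.0]])  # x_k, different class
targets = [(0, 1)]
triplets = [(0, 1, 2)]

obj_euclid = lmnn_objective(X, np.eye(2), targets, triplets, lam_h=1.0)
obj_scaled = lmnn_objective(X, np.diag([0.25, 4.0]), targets, triplets, lam_h=1.0)
print(obj_euclid, obj_scaled)
```

In this toy case a metric that shrinks the axis separating the target pair and stretches the axis separating the impostor gives a lower objective than the Euclidean metric.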


Advantages

• Trivial extension to multiple classes

• Efficient polynomial time solution

Disadvantages

• Large number of degrees of freedom – overfitting?

• Does not model invariance of data






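L2 Regularized LMNN Classifier

Regularize Frobenius norm of L

• $\|L\|_F^2 = \sum_i M_{ii}$

$\min_M \sum_{ij} D(i,j) + \Lambda_H \sum_{ijk} e_{ijk} + \Lambda_R \sum_i M_{ii}$

subject to $M \succeq 0$

$D(i,k) - D(i,j) \geq 1 - e_{ijk}$

$e_{ijk} \geq 0$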

L2-LMNN
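
The L2 regulariser rests on the identity that the squared Frobenius norm of L equals the trace of M = LᵀL. A quick numerical check, assuming NumPy and an arbitrary random `L` chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.standard_normal((3, 3))  # arbitrary mapping, for illustration only
M = L.T @ L

frob_sq = float(np.sum(L ** 2))  # squared Frobenius norm of L
trace_M = float(np.trace(M))     # sum_i M_ii
print(frob_sq, trace_M)
```

Because the trace is linear in M, adding it to the objective keeps the problem a convex SDP.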

Diagonal LMNN

Learn a diagonal L matrix => Learn a diagonal M matrix

$\min_M \sum_{ij} D(i,j) + \Lambda_H \sum_{ijk} e_{ijk}$

subject to $M \succeq 0$

$D(i,k) - D(i,j) \geq 1 - e_{ijk}$

$e_{ijk} \geq 0$

$M_{ij} = 0, \; i \neq j$

Linear Program

D-LMNN

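Diagonally Dominant LMNN

Minimize 1-norm of off-diagonal elements of M

$\min_M \sum_{ij} D(i,j) + \Lambda_H \sum_{ijk} e_{ijk} + \Lambda_R \sum_{ij} t_{ij}$

subject to $M \succeq 0$

$D(i,k) - D(i,j) \geq 1 - e_{ijk}$

$e_{ijk} \geq 0$

$t_{ij} \geq M_{ij}, \; t_{ij} \geq -M_{ij}, \; i \neq j$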

DD-LMNN
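
The auxiliary variables t_ij only bound |M_ij| from above, so at the optimum their sum equals the 1-norm of the off-diagonal entries. The sketch below, assuming NumPy and a hypothetical helper `offdiag_l1`, computes that value directly for a given M.

```python
import numpy as np

def offdiag_l1(M):
    """Sum of |M_ij| over i != j.

    This is the value sum_ij t_ij takes at the optimum, because
    t_ij >= M_ij and t_ij >= -M_ij together force t_ij >= |M_ij|.
    """
    return float(np.sum(np.abs(M)) - np.sum(np.abs(np.diag(M))))

M_full = np.array([[2.0, -0.5],
                   [-0.5, 1.0]])
# The diagonal special case: M_ij = 0 for i != j.
M_diag = np.diag(np.diag(M_full))

penalty_full = offdiag_l1(M_full)
penalty_diag = offdiag_l1(M_diag)
print(penalty_full, penalty_diag)
```

A diagonal M incurs no penalty, so the diagonal variant can be read as the limit of this regulariser as ΛR grows.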

LMNN Classifier

What about invariance to known transformations?

Append input data with transformed versions

Inefficient and inaccurate

Can we add invariance to LMNN?

• No – Not for a general transformation

• Yes - For some types of transformations






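Polynomial Transformations

Rotate $x = (a, b)^T$ by an angle $\theta$

$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} \approx \begin{pmatrix} 1 - \theta^2/2 & -(\theta - \theta^3/6) \\ \theta - \theta^3/6 & 1 - \theta^2/2 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix}$$

(Taylor's Series)

$$= \begin{pmatrix} a & -b & -a/2 & b/6 \\ b & a & -b/2 & -a/6 \end{pmatrix} \begin{pmatrix} 1 \\ \theta \\ \theta^2 \\ \theta^3 \end{pmatrix} = X \theta$$

$T(\theta, x) = X \theta$, where $\theta$ on the right-hand side denotes the vector $(1, \theta, \theta^2, \theta^3)^T$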
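
The polynomial form of the rotation can be compared with the exact rotation. This is a minimal sketch assuming NumPy; the function names and the test point are illustrative.

```python
import numpy as np

def rotation_polynomial_matrix(x):
    """Coefficient matrix X such that X @ [1, t, t^2, t^3] approximates
    the rotation of x = (a, b) by angle t (third-order Taylor expansion)."""
    a, b = x
    return np.array([[a, -b, -a / 2.0,  b / 6.0],
                     [b,  a, -b / 2.0, -a / 6.0]])

def monomials(theta):
    """Vector of monomials (1, t, t^2, t^3)."""
    return np.array([1.0, theta, theta ** 2, theta ** 3])

x = np.array([1.0, 2.0])
theta = np.deg2rad(5.0)

exact = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]]) @ x
approx = rotation_polynomial_matrix(x) @ monomials(theta)
max_err = float(np.max(np.abs(exact - approx)))
print(max_err)
```

For small angles such as 5 degrees the truncation error is tiny, which suggests why a low-degree polynomial is adequate for small transformation ranges.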

Why are Polynomials Special?

Figure: DISTANCE plotted over (θ1, θ2)

Sum of squares of polynomials

≡ $P \succeq 0$

≡ $P' \succeq 0$

SD-Representability of Polynomials (Lasserre, 2001)
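
The link between sums of squares and positive semidefinite matrices can be illustrated in one variable. The sketch below shows only the easy direction (a positive semidefinite Gram matrix yields a polynomial that is never negative), not the SD-representability result itself. It assumes NumPy; `gram_polynomial` and the random matrix `A` are hypothetical.

```python
import numpy as np

def gram_polynomial(P, theta):
    """Evaluate p(theta) = v^T P v with v = (1, theta, theta^2)."""
    v = np.array([1.0, theta, theta ** 2])
    return float(v @ P @ v)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
P = A.T @ A  # positive semidefinite by construction

# p(theta) = ||A v||^2 is a sum of squares of polynomials in theta,
# so it cannot be negative for any theta.
values = [gram_polynomial(P, t) for t in np.linspace(-10.0, 10.0, 2001)]
min_value = min(values)
min_eig = float(np.linalg.eigvalsh(P).min())
print(min_value, min_eig)
```

A condition that must hold for every value of the transformation parameter can therefore be expressed as a single matrix constraint instead of infinitely many scalar ones.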






ILMNN Classifier

Learns a distance metric for invariant Nearest Neighbour classification


• Bring target trajectories closer

• Move impostor trajectories away

Polynomial trajectories



$M \succeq 0$

Minimize maximum distance

Maximize minimum distance
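
The two quantities can be approximated by brute force for a fixed M. The sketch below samples a grid of rotation angles and uses exact rotations; it is an illustration only and does not use SD-representability or the polynomial approximation. NumPy is assumed, and all names and points are hypothetical.

```python
import numpy as np

def rotate(theta, x):
    """Rotate the 2-D point x by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]]) @ x

def trajectory_distance_range(x_a, x_b, M, thetas):
    """Min and max of d^T M d with d = T(t1, x_a) - T(t2, x_b),
    over a grid of (t1, t2). Grid sampling is only an approximation."""
    dists = []
    for t1 in thetas:
        for t2 in thetas:
            d = rotate(t1, x_a) - rotate(t2, x_b)
            dists.append(float(d @ M @ d))
    return min(dists), max(dists)

thetas = np.deg2rad(np.linspace(-5.0, 5.0, 21))
M = np.eye(2)
x_i = np.array([1.0, 0.0])
x_j = np.array([1.0, 0.1])  # hypothetical target point
x_k = np.array([0.0, 1.0])  # hypothetical impostor point

_, max_target = trajectory_distance_range(x_i, x_j, M, thetas)
min_impostor, _ = trajectory_distance_range(x_i, x_k, M, thetas)
print(max_target, min_impostor)
```

When the smallest impostor distance exceeds the largest target distance by at least 1, the margin holds for every sampled pair of transformations, which is the condition the trajectory formulation asks for over the whole range.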



• Use SD-Representability. One Semidefinite Constraint.


• Solve for M in polynomial time.

• Add regularizers to prevent overfitting.







Dataset

Faces from an episode of “Buffy – The Vampire Slayer”

11 Characters

24,244 Faces (with ground truth labelling*)

* Thanks to Josef Sivic and Mark Everingham

Dataset Splits

Experiment 1 (suitable for Nearest Neighbour-type classification)

• Random permutation of dataset
• 30% training
• 30% validation (to estimate ΛH and ΛR)
• 40% testing

Experiment 2 (not so suitable for Nearest Neighbour-type classification)

• First 30% training
• Next 30% validation
• Last 40% testing
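
The two splitting schemes differ only in whether the faces are shuffled first. This is a minimal sketch assuming NumPy; the function name, the seed and the rounding of the 30% boundaries are assumptions, not details from the slides.

```python
import numpy as np

def split_indices(n, shuffle, seed=0):
    """Return (train, val, test) index arrays for a 30/30/40 split.

    shuffle=True  -> random permutation of the dataset (Experiment 1)
    shuffle=False -> first 30%, next 30%, last 40% (Experiment 2)
    """
    if shuffle:
        order = np.random.default_rng(seed).permutation(n)
    else:
        order = np.arange(n)
    n_train = int(0.3 * n)
    n_val = int(0.3 * n)
    return (order[:n_train],
            order[n_train:n_train + n_val],
            order[n_train + n_val:])

n_faces = 24244  # number of faces quoted for the dataset
train1, val1, test1 = split_indices(n_faces, shuffle=True)
train2, val2, test2 = split_indices(n_faces, shuffle=False)
print(len(train1), len(val1), len(test1))
```

With the sequential split, training and test faces come from different parts of the episode, so a test face is less likely to have a near-duplicate in the training set.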

Incorporating Invariance

Invariance of feature position to Euclidean Transformation

-5° ≤ θ ≤ 5°

-3 ≤ tx ≤ 3 pixels

-3 ≤ ty ≤ 3 pixels

Approximated to degree 2 polynomial using Taylor’s series

Derivatives approximated as image differences

Figure: Image, Rotated Image

Figure: Smooth Image - Smooth Image = Derivative
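
The idea of replacing derivatives by differences can be sketched on a simpler object than an image. Below, the degree-2 Taylor coefficients of a rotated 2-D point are estimated with central differences. NumPy is assumed; the point, the step `h` and the function names are hypothetical, and real image features would need the smoothing step mentioned above.

```python
import numpy as np

def taylor2_coefficients(transform, x, h):
    """Degree-2 Taylor coefficients of theta -> transform(theta, x) about 0,
    with derivatives replaced by central differences of step h."""
    f0 = transform(0.0, x)
    fp = transform(h, x)
    fm = transform(-h, x)
    first = (fp - fm) / (2.0 * h)
    second = (fp - 2.0 * f0 + fm) / (h * h)
    # Columns multiply 1, theta and theta^2 respectively.
    return np.stack([f0, first, second / 2.0], axis=1)

def rotate(theta, x):
    """Rotate the 2-D point x by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]]) @ x

x = np.array([10.0, 4.0])  # hypothetical feature position in pixels
X = taylor2_coefficients(rotate, x, h=np.deg2rad(1.0))

theta = np.deg2rad(5.0)
approx = X @ np.array([1.0, theta, theta ** 2])
exact = rotate(theta, x)
max_err = float(np.max(np.abs(approx - exact)))
print(max_err)
```

At the edge of the ±5° range the degree-2 approximation of this point is off by well under a hundredth of a pixel.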

Training the Classifiers

Within-shot Faces

Problem : Euclidean distance provides 0 error

Solution : Cluster. Train using cluster centres.

Efficiently solve SDP using Alternating Projections (Bauschke and Borwein, 1996)
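
Projection methods of this kind repeatedly project onto each constraint set in turn. One of those sets is the cone of positive semidefinite matrices, and its projection has a closed form. The sketch below shows only that single step, assuming NumPy; it is not the solver used in the talk, and the projection onto the margin constraints is omitted.

```python
import numpy as np

def project_psd(M):
    """Frobenius-norm projection of a matrix onto the PSD cone:
    symmetrize, then clip negative eigenvalues to zero."""
    S = (M + M.T) / 2.0
    w, V = np.linalg.eigh(S)
    return (V * np.clip(w, 0.0, None)) @ V.T

M = np.array([[1.0, 2.0],
              [2.0, 1.0]])  # eigenvalues 3 and -1, so not PSD
M_psd = project_psd(M)
print(M_psd)
```

Each such projection costs one eigendecomposition, which is typically the dominant cost per iteration for a dense metric.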

Testing the Classifiers

Map all training points using L

Map the test point using L

Find nearest neighbours. Classify.

Measure Accuracy = (No. of True Positives) / (No. of Test Faces)
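
The testing steps can be sketched as follows, assuming NumPy. For brevity this uses a single nearest neighbour rather than k neighbours, and the toy data, the matrices standing in for L and the function names are hypothetical. With one predicted label per face, the accuracy is the fraction of test faces labelled correctly.

```python
import numpy as np

def nn_classify(L, X_train, y_train, X_test):
    """1-nearest-neighbour labels after mapping every point with L."""
    Z_train = X_train @ L.T
    Z_test = X_test @ L.T
    # Pairwise squared Euclidean distances in the mapped space.
    d2 = ((Z_test[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[np.argmin(d2, axis=1)]

def accuracy(y_pred, y_true):
    """Fraction of test points whose predicted label is correct."""
    return float(np.mean(y_pred == y_true))

# Hypothetical toy data: classes differ along axis 0, axis 1 is nuisance.
X_train = np.array([[0.0, 0.0], [1.0, 5.0]])
y_train = np.array([0, 1])
X_test = np.array([[0.1, 5.0], [0.9, 0.0]])
y_test = np.array([0, 1])

acc_euclid = accuracy(nn_classify(np.eye(2), X_train, y_train, X_test), y_test)
acc_learnt = accuracy(
    nn_classify(np.diag([1.0, 0.0]), X_train, y_train, X_test), y_test)
print(acc_euclid, acc_learnt)
```

In this toy case the Euclidean metric is misled by the nuisance axis, while a mapping that suppresses it classifies both test points correctly.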

Timings

Method      Training   Testing
kNN-E       -          62.2 s
L2-LMNN     4 h        62.2 s
D-LMNN      1 h        53.2 s
DD-LMNN     2 h        50.5 s
L2-ILMNN    24 h       62.2 s
D-ILMNN     8 h        48.2 s
DD-ILMNN    24 h       51.9 s
M-SVM       300 s      446.6 s
SVM-KNN     -          2114.2 s

Accuracy (%)

Method      Experiment 1   Experiment 2
kNN-E       83.6           26.7
L2-LMNN     61.2           22.6
D-LMNN      85.6           24.3
DD-LMNN     84.4           24.5
L2-ILMNN    65.9           24.0
D-ILMNN     87.2           32.0
DD-ILMNN    86.6           29.8
M-SVM       62.3           30.0
SVM-KNN     75.5           28.1


True Positives

Conclusions

• Regularizers for LMNN

• Adding invariance to LMNN

• More accurate than Nearest Neighbour

• More accurate than LMNN

Future Research

• D-LMNN and D-ILMNN for Chi-squared distance

• D-LMNN and D-ILMNN for dot product distance

• Handling missing data (Shivaswamy, Bhattacharyya, Smola, JMLR 2006)

• Learning local mappings (adaptive kNN)

Questions ??

False Positives

Figure: Precision-Recall Curves for Experiment 1 and Experiment 2
