
Image Analysis & Retrieval

CS/EE 5590 Special Topics (Class Ids: 44873, 44874)

Fall 2016, M/W 4-5:15pm @ Bloch 0012

Lec 07

Feature Aggregation and Image Retrieval System

Zhu Li

Dept of CSEE, UMKC

Office: FH560E, Email: lizhu@umkc.edu, Ph: x 2346.

http://l.web.umkc.edu/lizhu


Outline

ReCap of Lecture 06 SIFT

Box Filter

Image Retrieval System

Why Aggregation ?

Aggregation Schemes

Summary


Scale Space Theory - Lindeberg

Scale space response via the Laplacian of Gaussian (LoG)

The scale is controlled by σ

Characteristic Scale:


$\nabla^2 g = \dfrac{\partial^2 g}{\partial x^2} + \dfrac{\partial^2 g}{\partial y^2}, \qquad g = e^{-\frac{x^2+y^2}{2\sigma^2}}$

[Figure: LoG responses of an image blob of radius r at scales σ = 0.8r, 1.2r, 2r; the characteristic scale is the σ at which the scale-normalized LoG response peaks.]
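A minimal Matlab sketch of the idea (not course code): compute the scale-normalized LoG response of a synthetic blob of radius r over a range of σ and take the peak as the characteristic scale. Assumes the Image Processing Toolbox (fspecial, imfilter).

r = 10;
[X, Y] = meshgrid(-50:50);
im = double(X.^2 + Y.^2 <= r^2);            % synthetic blob (disc) of radius r
sigmas = 1:0.5:20;
resp = zeros(size(sigmas));
for k = 1:numel(sigmas)
    s = sigmas(k);
    hsize = 2*ceil(3*s) + 1;                % kernel support of about +/- 3 sigma
    h = fspecial('log', hsize, s);          % Laplacian-of-Gaussian kernel
    R = imfilter(im, h, 'replicate');
    resp(k) = s^2 * max(abs(R(:)));         % scale-normalized peak response
end
[~, kmax] = max(resp);
fprintf('characteristic scale ~ %.1f for blob radius %d\n', sigmas(kmax), r);
plot(sigmas, resp); xlabel('\sigma'); ylabel('normalized |LoG| response');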

SIFT

Use DoG to approximate LoG

Separable Gaussian filter

Difference of image instead of difference of Gaussian kernel


Scale space construction by Gaussian filtering and image differencing (DoG ≈ LoG)
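A minimal sketch of the DoG construction described above, assuming the Image Processing Toolbox (imgaussfilt, R2015a+; older releases can use fspecial('gaussian') with imfilter). The base scale, scale step and number of levels are illustrative choices.

im = im2double(imread('cameraman.tif'));    % grayscale test image shipped with Matlab
sigma0 = 1.6;                               % base scale
kstep = 2^(1/3);                            % scale step: 3 intervals per octave
nLevels = 5;
G = cell(1, nLevels);
for i = 1:nLevels
    G{i} = imgaussfilt(im, sigma0 * kstep^(i-1));   % Gaussian-blurred images
end
figure; colormap gray;
for i = 1:nLevels-1
    DoG = G{i+1} - G{i};                    % difference of images, not of kernels
    subplot(2, 2, i); imagesc(DoG); axis image off;
    title(sprintf('DoG level %d', i));
end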

Peak Strength & Edge Removal

Peak strength: interpolate the true DoG response and keypoint location by a Taylor expansion around the detected extremum

Edge Removal:

Re-do a Harris-type detection to remove edge responses, on the much-reduced set of candidate pixels


Rotation Invariance thru Dominant Orientation Coding

Voting for the dominant orientation, weighted by a Gaussian window to give more emphasis to the gradients closer to the center


SIFT Matching and Repeatability Prediction

SIFT Distance

Not all SIFT are created equal…

Peak strength (DoG response at interpolated position)


Combined scale/peak strength pmf

Ratio test: declare a match if $\dfrac{d(s^1_1, s^2_{k^*})}{d(s^1_1, s^2_k)} \le \theta$, where $s^2_{k^*}$ is the nearest and $s^2_k$ the second-nearest descriptor in the other image.
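A minimal sketch of the nearest/second-nearest ratio test on two descriptor sets; D1 and D2 (n1 x 128 and n2 x 128 SIFT matrices) and the threshold value are assumed/illustrative, and pdist2 requires the Statistics and Machine Learning Toolbox.

theta = 0.8;                        % ratio threshold (illustrative value)
D = pdist2(D1, D2);                 % n1 x n2 descriptor distances
[dsort, idx] = sort(D, 2);          % nearest and second-nearest per row
ratio = dsort(:,1) ./ dsort(:,2);
matched = find(ratio <= theta);     % descriptors of image 1 passing the test
matchedTo = idx(matched, 1);        % their nearest neighbors in image 2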

Box Filter – CABOX work

Basic Idea: Approximate DoG with linear combination of box filters

$\min_{\mathbf{h}} \; \|\mathbf{g} - B\,\mathbf{h}\|_2^2 + \lambda \|\mathbf{h}\|_1$

Solution by LASSO


[Figure: the DoG kernel approximated as a weighted sum h1·(box filter 1) + h2·(box filter 2) + …]
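A minimal sketch of the idea, not the CABOX construction: approximate a DoG kernel by a sparse combination of centered box filters, solved with Matlab's lasso (Statistics and Machine Learning Toolbox). The box-filter dictionary and the regularization weight are illustrative assumptions.

N = 21;                                                   % kernel support
g = fspecial('gaussian', N, 2.0) - fspecial('gaussian', N, 1.2);  % a DoG kernel
g = g(:);                                                 % vectorized target
widths = 3:2:N;                                           % centered square boxes
B = zeros(N*N, numel(widths));
for j = 1:numel(widths)
    w = widths(j); c = (N - w)/2;
    b = zeros(N); b(c+1:c+w, c+1:c+w) = 1/(w*w);          % normalized box filter
    B(:, j) = b(:);
end
[H, stats] = lasso(B, g, 'Lambda', 1e-4);                 % sparse coefficients h
approx = B*H + stats.Intercept;
fprintf('nonzero boxes: %d, relative error: %.3f\n', nnz(H), norm(approx - g)/norm(g));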

Outline

ReCap of Lecture 06 SIFT

Box Filter

Image Retrieval System

Why Aggregation ?

Aggregation Schemes

Summary


Image Matching/Retrieval System

SIFT is a sub-image-level feature; what we actually care about is how SIFT matching translates into image-level matching/retrieval accuracy

Suppose we can compute a single distance between two images from their collections of features:

Then for a database of n images, we can compute an n x n distance matrix, which gives us full information about the performance of this feature/distance system

How do we characterize the performance of such an image matching and retrieval system?


$d(I_1, I_2) = \sum_k \alpha_k \, d(F_k^1, F_k^2)$

$D_{j,k} = d(I_j, I_k)$
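A minimal sketch of building the n x n distance matrix from per-image descriptor sets; the cell array feats and the symmetric average nearest-neighbor distance are illustrative assumptions, one simple choice of the aggregate d(I_j, I_k).

n = numel(feats);                              % feats{j}: n_j x 128 SIFT matrix
D = zeros(n);
for j = 1:n
    for k = j+1:n
        dist = pdist2(feats{j}, feats{k});     % all descriptor-pair distances
        d_jk = mean(min(dist, [], 2));         % average NN distance, j -> k
        d_kj = mean(min(dist, [], 1));         % average NN distance, k -> j
        D(j,k) = 0.5*(d_jk + d_kj);            % symmetrize
        D(k,j) = D(j,k);
    end
end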

Thresholding for Matching

Basically, for any pair of Images (documents, in IR jargon), we declare

Then for each possible image pair (or the pairs we care about), for a given threshold t there are 4 possible outcomes:

TP pair: {Ij, Ik} is a true matching pair and d(Ij, Ik) < t (declared matching);

FP pair: {Ij, Ik} is a true non-matching pair but d(Ij, Ik) < t (declared matching);

TN pair: {Ij, Ik} is a true non-matching pair and d(Ij, Ik) >= t (declared non-matching);

FN pair: {Ij, Ik} is a true matching pair but d(Ij, Ik) >= t (declared non-matching).


$I_j, I_k$ are a match if $d(I_j, I_k) < t$; $I_j, I_k$ are not a match otherwise.

Matching System Performance

True Positive Rate (recall): out of all true matching pairs, how many are retrieved, i.e. have distance < t

False Positive Rate: out of all true non-matching pairs, how many are wrongly retrieved as matches (distance < t)


$TPR = \dfrac{tp}{tp + fn}, \qquad FPR = \dfrac{fp}{fp + tn}$

TPR-FPR

Definition:

TP rate = TP/(TP+FN)

FP rate = FP/(FP+TN)

i.e., both rates are normalized from the actual (ground-truth) value point of view: TPR by the actual positives, FPR by the actual negatives


ROC curve(1)

ROC = receiver operating characteristic

Y:TP rate

X:FP rate


ROC curve(2)

Which method (A or B) is better? Compute the ROC area: the area under the ROC curve (AUC)
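A minimal sketch, assuming the tp/fp/tn/fn vectors produced by the getPrecisionRecall loop shown later in this lecture:

fpr = fp ./ (fp + tn);
tpr = tp ./ (tp + fn);
[fpr_s, order] = sort(fpr);           % ensure monotone x for trapz
auc = trapz(fpr_s, tpr(order));       % area under the ROC curve
fprintf('ROC area (AUC) = %.3f\n', auc);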


Precision, Recall, F-measure

Precision = TP/(TP + FP),

Recall = TP/(TP + FN)

F-measure = 2*(precision*recall)/(precision + recall)

Precision is the probability that a retrieved document is relevant.

Recall is the probability that a relevant document is retrieved in a search.
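A minimal sketch, again assuming the per-threshold tp/fp/fn counts from getPrecisionRecall:

precision = tp ./ (tp + fp);          % may be NaN where tp+fp = 0 (lowest thresholds)
recall    = tp ./ (tp + fn);
f_measure = 2 * (precision .* recall) ./ (precision + recall);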


Matlab Implementation

We will compute all image pair distances D(j,k)

How do we compute the TPR-FPR plot? TPR and FPR are actually functions of the threshold t;

we just need to parameterize TPR(t) and FPR(t), sweep t over meaningful operating points, and plot the resulting curve.

Matlab implementation: [tp, fp, tn, fn] = getPrecisionRecall()


function [tp, fp, tn, fn] = getPrecisionRecall(d0, d1, npt, dbg)
% d0: distances of ground-truth matching pairs
% d1: distances of ground-truth non-matching pairs
% npt: number of threshold operating points; dbg: plot if nonzero
d_min = min(min(d0), min(d1));
d_max = max(max(d0), max(d1));
delta = (d_max - d_min) / npt;
for k=1:npt
    thres = d_min + (k-1)*delta;
    tp(k) = length(find(d0<=thres));   % true matches accepted
    fp(k) = length(find(d1<=thres));   % non-matches wrongly accepted
    tn(k) = length(find(d1>thres));    % non-matches rejected
    fn(k) = length(find(d0>thres));    % true matches wrongly rejected
end
if dbg
    figure(22); grid on; hold on;
    plot(fp./(tn+fp), tp./(tp+fn), '.-r', 'DisplayName', 'tpr-fpr');
    legend();
end

TPR-FPR

Image matching performance is characterized by the function TPR(FPR)

For the final retrieval set we want high precision; for a candidate short list (to be verified/re-ranked) we want high recall.


Outline

ReCap of Lecture 06 SIFT

Box Filter

Image Retrieval System

Why Aggregation ?

Aggregation Schemes

Summary


Why Aggregation ?

What do (local) interest point features bring us? Scale and rotation invariance, in the form of an nk x d matrix per image:

Uncertainty in the number of detected features nk at query time

Any permutation of the rows of the feature matrix is the same representation.

Problems: the feature "has state" (a variable-size, unordered set), so we cannot draw decision boundaries on it directly,

Not directly indexable/hashable

Typically very high dimensionality


$S_k: [x_k, y_k, \theta_k, \sigma_k, h_1, h_2, \ldots, h_{128}], \quad k = 1..n$

Decision Boundary in Matching

Can we have a decision boundary function for an interest-point based representation?


Curse of Dimensionality in Retrieval

What does feature dimensionality do to retrieval efficiency? Consider covering 99% of the data range in each dimension, and plot the total fraction of the volume covered.

Matlab: showDimensionCurse.m
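A minimal sketch of the effect (not the contents of showDimensionCurse.m): covering 99% of the range in each dimension covers only 0.99^d of the volume in d dimensions.

d = 1:128;
coverage = 0.99 .^ d;                 % fraction of volume covered in d dimensions
semilogy(d, coverage); grid on;
xlabel('feature dimension d'); ylabel('fraction of volume covered');
title('99% per-dimension locality vs. total volume covered');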


Aggregation – 30,000ft view

Bag of Words: compute k centroids in feature space, called visual words;

compute a histogram of word occurrences;

a k x 1 feature, hard assignment

VLAD: compute centroids in feature space;

compute the aggregated differences w.r.t. the centroids;

a k x d feature, soft assignment

Fisher Vector: compute a Gaussian Mixture Model (GMM) with 2nd-order info;

compute the aggregated feature w.r.t. the mean and covariance of the GMM;

a 2 x k x d feature

AKULA: adaptive centroids and feature count;

improved with covariance?


[Figure: example histogram over visual words with weights 0.5, 0.4, 0.05, 0.05]

Visual Key Words: main idea

Extract some local features from a number of images …


e.g., SIFT descriptor space: each point is 128-dimensional

Slide credit: D. Nister

Visual words: main idea

[Figure-only slides: local descriptors plotted as points in descriptor space are grouped into clusters, the visual words. Slide credit: D. Nister]

Visual Key Words


Each point is a local descriptor, e.g. a SIFT vector.

Slide credit: D. Nister


Visual words

Example: each group of patches belongs to the same visual word


Figure from Sivic & Zisserman, ICCV 2003

Visual words


Source credit: K. Grauman, B. Leibe

• More recently used for describing scenes and objects for the sake of indexing or classification.

Sivic & Zisserman 2003; Csurka, Bray, Dance, & Fan 2004; many others.

Object Bag of ‘words’

ICCV 2005 short course, L. Fei-Fei

Bag of Words


BoW Examples

Illustration


Bags of visual words

Summarize entire image based on its distribution (histogram) of word occurrences.

Analogous to bag of words representation commonly used for documents.


Image credit: Fei-Fei Li
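A minimal sketch of hard-assignment BoW encoding; the variable names sift_all (training descriptors) and sift_img (descriptors of the image to encode), both n x 128, and the vocabulary size are illustrative. Assumes the Statistics and Machine Learning Toolbox (kmeans, knnsearch).

K = 256;                                                       % vocabulary size
[~, codebook] = kmeans(double(sift_all), K, 'MaxIter', 200);   % learn visual words
words = knnsearch(codebook, double(sift_img));                 % nearest word per descriptor
bow = histcounts(words, 1:K+1);                                % 1 x K occurrence histogram
bow = bow / sum(bow);                                          % L1-normalize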

Texture Retrieval

Textons…


[Figure: universal texton dictionary and the per-image texton histograms]

Source: Lana Lazebnik

BoW Distance Metrics

Rank images by the normalized scalar product between their (possibly weighted) occurrence counts: a nearest-neighbor search for similar images.


[Example histograms: d_j = [5 1 1 0], q = [1 8 1 4]]
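A minimal sketch of the normalized scalar product on the example histograms above:

dj = [5 1 1 0];
q  = [1 8 1 4];
sim = dot(dj, q) / (norm(dj) * norm(q));   % ~0.30 for this example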

Inverted List

Image Retrieval via Inverted List


Image credit: A. Zisserman

[Table: for each visual word number, the list of image numbers in which it occurs]

When will this give us a significant gain in efficiency?

Indexing local features: inverted file index

For text documents, an efficient way to find all pages on which a word occurs is to use an index…

We want to find all images in which a feature occurs.

We need to index each feature (visual word) by the images it appears in, and also keep the number of occurrences.


Source credit : K. Grauman, B. Leibe
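A minimal sketch of an inverted file index over BoW histograms; bows (an nImages x K count matrix) and bowQuery (a 1 x K query histogram) are illustrative names.

[nImages, K] = size(bows);
invIndex = cell(1, K);                       % invIndex{w}: images containing word w
for w = 1:K
    invIndex{w} = find(bows(:, w) > 0)';
end
queryWords = find(bowQuery > 0);             % words present in the query
candidates = unique([invIndex{queryWords}]); % only these images need scoring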

TF-IDF Weighting

Term Frequency – Inverse Document Frequency: describe an image by the frequency of each visual word within it, and down-weight words that appear often in the database (the standard weighting for text retrieval)


$t_i = \dfrac{n_{id}}{n_d} \log \dfrac{N}{n_i}$

where $n_{id}$ = number of occurrences of word i in document d, $n_d$ = number of words in document d, $n_i$ = number of occurrences of word i in the whole database, and $N$ = total number of words in the database.
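A minimal sketch of the weighting above applied to a BoW count matrix; bows is an illustrative name, and the element-wise divisions rely on implicit expansion (R2016b+).

n_id = bows;                                % occurrences of word i in document d
n_d  = sum(bows, 2);                        % number of words in document d
n_i  = sum(bows, 1);                        % occurrences of word i in whole database
N    = sum(bows(:));                        % total number of words in database
tfidf = (n_id ./ max(n_d, 1)) .* log(N ./ max(n_i, 1));
tfidf = tfidf ./ max(sqrt(sum(tfidf.^2, 2)), eps);   % L2-normalize for cosine scoring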

BoW Use Case with Spatial Localization

Collecting words within a query region


Query region: pull out only the SIFT descriptors whose positions are within the polygon


BoW Patch Search

Localizing the BoW representation


Localization with BoW


Hierarchical Assignment of Histogram

Tree construction:


[Nister & Stewenius, CVPR’06]
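A minimal sketch of tree construction by hierarchical k-means, in the spirit of Nister & Stewenius (branch factor k, depth L); the function and its node structure are illustrative, not the authors' implementation. Assumes the Statistics and Machine Learning Toolbox kmeans.

function node = buildVocabTree(descriptors, k, L)
% descriptors: n x 128 matrix; k: branch factor; L: remaining depth
node.centers = [];
node.children = {};
if L == 0 || size(descriptors, 1) < k
    return;                                     % leaf node
end
[idx, node.centers] = kmeans(descriptors, k, 'MaxIter', 100);
node.children = cell(1, k);
for c = 1:k
    node.children{c} = buildVocabTree(descriptors(idx == c, :), k, L-1);
end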

Vocabulary Tree

Training: Filling the tree


[Nister & Stewenius, CVPR’06]

[Figure sequence over several slides: the training descriptors are propagated down the tree to fill it. Slide credit: David Nister]

Vocabulary Tree

Recognition

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

RANSAC verification

Vocabulary Tree: Performance

Evaluated on large databases: indexing with up to 1M images

Online recognition for a database of 50,000 CD covers; retrieval in ~1s

They find experimentally that large vocabularies can be beneficial for recognition


[Nister & Stewenius, CVPR’06]

Larger vocabularies can be advantageous… but what happens if the vocabulary is too large?

Visual Word Vocabulary Size

Performance w.r.t vocabulary size


Bags of words: pros and cons

Good:
+ flexible to geometry / deformations / viewpoint
+ compact summary of image content
+ provides a vector representation for sets
+ the Inverted List implementation offers a practical solution against large repositories

Bad:
- loss of information at quantization and histogram generation
- the basic model ignores geometry – must verify afterwards, or encode it via features
- background and foreground are mixed when the bag covers the whole image
- interest points or sampling: no guarantee to capture object-level parts

Source credit: K. Grauman, B. Leibe

Can we improve BoW ?

• E.g. Why isn’t our Bag of Words classifier at 90% instead of 70%?

• Training Data

– Huge issue, but not necessarily a variable you can manipulate.

• Learning method

– BoW is on top of any feature scheme

• Representation

– Are we losing too much info in the process ?


Standard Kmeans Bag of Words

BoW revisited


http://www.cs.utexas.edu/~grauman/courses/fall2009/papers/bag_of_visual_words.pdf

Motivation

Bag of Visual Words is only about counting the number of local descriptors assigned to each Voronoi region

Why not include other statistics/information?


http://www.cs.utexas.edu/~grauman/courses/fall2009/papers/bag_of_visual_words.pdf

We already looked at the Spatial Pyramid/Pooling

Spatial Pooling


[Figure: spatial pyramid levels, level 0: 1x1, level 1: 2x2, level 2: 4x4]

Key takeaway: multiple assignment? Soft assignment?

Motivation

Bag of Visual Words is only about counting the number of local descriptors assigned to each Voronoi region

Why not include other statistics? For instance:
• mean of local descriptors
• (co)variance of local descriptors

http://www.cs.utexas.edu/~grauman/courses/fall2009/papers/bag_of_visual_words.pdf

Simple case: Soft Assignment

Called "Kernel codebook encoding" by Chatfield et al. 2011. Cast a weighted vote into the most similar clusters.

This is fast and easy to implement (try it for Project 3!) but it does have some downsides for image retrieval – the inverted file index becomes less sparse.
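A minimal sketch of kernel-codebook soft assignment; the Gaussian kernel bandwidth choice is illustrative, and codebook (K x 128) and sift (n x 128) follow the naming of the VLAD code later in this lecture.

dist  = pdist2(sift, codebook);             % n x K descriptor-to-word distances
sigma = mean(dist(:));                      % kernel bandwidth (one simple choice)
w     = exp(-dist.^2 / (2*sigma^2));        % Gaussian kernel vote weights
w     = w ./ sum(w, 2);                     % normalize votes per descriptor (R2016b+)
bow_soft = sum(w, 1);                       % 1 x K soft-assignment histogram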

A first example: the VLAD

Given a codebook $\{\mu_1, \ldots, \mu_K\}$, e.g. learned with K-means, and a set of local descriptors $\{x\}$:

• assign: $NN(x) = \arg\min_i \lVert x - \mu_i \rVert$

• compute: $v_i = \sum_{x : NN(x) = i} (x - \mu_i)$

• concatenate the $v_i$'s + normalize


Jégou, Douze, Schmid and Pérez, “Aggregating local descriptors into a compact image representation”, CVPR’10.

[Figure: descriptors assigned to Voronoi cells 1-5 around the centroids: ① assign descriptors, ② compute $x - \mu_i$, ③ $v_i = \sum (x - \mu_i)$ for cell i]

A first example: the VLAD

A graphical representation of the resulting VLAD vectors


Jégou, Douze, Schmid and Pérez, “Aggregating local descriptors into a compact image representation”, CVPR’10.

VL_FEAT Implementation

Matlab:


function [vc]=vladSiftEncoding(sift, codebook)
dbg=1;
if dbg
    if (0) % init VL_FEAT, only need to do once
        run('../../tools/vlfeat-0.9.20/toolbox/vl_setup.m');
    end
    im = imread('../pics/flarsheim-2.jpg');
    [f, sift] = vl_sift(single(rgb2gray(im)));
    sift = single(sift');                 % n x 128 descriptors
    [indx, codebook] = kmeans(sift, 16);  % 16-word codebook
    % make sift # smaller
    sift = sift(1:800,:);
end
[n, kd] = size(sift);
[m, kd] = size(codebook);
% compute assignment
dist = pdist2(codebook, sift);            % m x n word-to-descriptor distances
mdist = mean(mean(dist));
% normalize the heat kernel s.t. mean dist is mapped to 0.5
a = -log(0.5)/mdist;
indx = exp(-a*dist);                      % m x n soft assignment weights
vc = vl_vlad(sift', codebook', indx);     % VLAD code of length m*kd
if dbg
    figure(41); colormap(gray);
    subplot(2,2,1); imshow(im); title('image');
    subplot(2,2,2); imagesc(dist); title('m x n distance');
    subplot(2,2,3); imagesc(indx); title('m x n assignment');
    subplot(2,2,4); imagesc(reshape(vc, [m, kd])); title('vlad code');
end

VLAD Code

What are the tweaks? Codebook design

Soft Assignment options


References

Vocabulary Tree: David Nistér, Henrik Stewénius: Scalable Recognition with a Vocabulary Tree. CVPR (2) 2006: 2161-2168

VLAD: Hervé Jégou, Matthijs Douze, Cordelia Schmid: Improving Bag-of-Features for Large Scale Image Search. International Journal of Computer Vision 87(3): 316-336 (2010)

Fisher Vector: Florent Perronnin, Jorge Sánchez, Thomas Mensink: Improving the Fisher Kernel for Large-Scale Image Classification. ECCV (4) 2010: 143-156

AKULA: Abhishek Nagar, Zhu Li, Gaurav Srivastava, Kyungmo Park: AKULA - Adaptive Cluster Aggregation for Visual Search. DCC 2014: 13-22


Lec 07 Summary

Image Retrieval System metrics: what are true positive, false positive, true negative, and false negative?

What are precision, recall, and F-score?

Why Aggregation? Decision boundary

Indexing/Hashing

Bag of Words: a histogram whose bins are visual words

Variations: hierarchical assignment with vocabulary tree

Implementation: Inverted List

VLAD: richer encoding of the aggregated info

Soft assignment of features to codebook bins

Vectorized representation – no need for inverted list

