
Image Analysis & Retrieval

CS/EE 5590 Special Topics (Class Ids: 44873, 44874)

Fall 2016, M/W 4-5:15pm @ Bloch 0012

Lec 07

Feature Aggregation and Image Retrieval System

Zhu Li

Dept of CSEE, UMKC

Office: FH560E, Email: lizhu@umkc.edu, Ph: x 2346.

http://l.web.umkc.edu/lizhu


Outline

ReCap of Lecture 06 SIFT

Box Filter

Image Retrieval System

Why Aggregation ?

Aggregation Schemes

Summary


Scale Space Theory - Lindeberg

Scale space response via the Laplacian of Gaussian (LoG)

The scale is controlled by σ

Characteristic Scale:


$\nabla^2 g = \dfrac{\partial^2 g}{\partial x^2} + \dfrac{\partial^2 g}{\partial y^2}, \qquad g = e^{-\frac{x^2+y^2}{2\sigma^2}}$

[Figure: LoG responses of an image blob of radius r at scales σ = 0.8r, 1.2r, 2r; the characteristic scale is the σ at which the scale-normalized LoG response peaks.]
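A minimal Matlab sketch of the idea (not course code): compute the scale-normalized LoG response of a synthetic blob of radius r over a range of σ and take the peak as the characteristic scale. Assumes the Image Processing Toolbox (fspecial, imfilter).

r = 10;
[X, Y] = meshgrid(-50:50);
im = double(X.^2 + Y.^2 <= r^2);            % synthetic blob (disc) of radius r
sigmas = 1:0.5:20;
resp = zeros(size(sigmas));
for k = 1:numel(sigmas)
    s = sigmas(k);
    hsize = 2*ceil(3*s) + 1;                % kernel support of about +/- 3 sigma
    h = fspecial('log', hsize, s);          % Laplacian-of-Gaussian kernel
    R = imfilter(im, h, 'replicate');
    resp(k) = s^2 * max(abs(R(:)));         % scale-normalized peak response
end
[~, kmax] = max(resp);
fprintf('characteristic scale ~ %.1f for blob radius %d\n', sigmas(kmax), r);
plot(sigmas, resp); xlabel('\sigma'); ylabel('normalized |LoG| response');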

SIFT

Use DoG to approximate LoG

Separable Gaussian filter

Difference of image instead of difference of Gaussian kernel


Scale space construction by Gaussian filtering and image differencing (DoG ≈ LoG)
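A minimal sketch of the DoG construction described above, assuming the Image Processing Toolbox (imgaussfilt, R2015a+; older releases can use fspecial('gaussian') with imfilter). The base scale, scale step and number of levels are illustrative choices.

im = im2double(imread('cameraman.tif'));    % grayscale test image shipped with Matlab
sigma0 = 1.6;                               % base scale
kstep = 2^(1/3);                            % scale step: 3 intervals per octave
nLevels = 5;
G = cell(1, nLevels);
for i = 1:nLevels
    G{i} = imgaussfilt(im, sigma0 * kstep^(i-1));   % Gaussian-blurred images
end
figure; colormap gray;
for i = 1:nLevels-1
    DoG = G{i+1} - G{i};                    % difference of images, not of kernels
    subplot(2, 2, i); imagesc(DoG); axis image off;
    title(sprintf('DoG level %d', i));
end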

Peak Strength & Edge Removal

Peak strength: interpolate the true DoG response and keypoint location by a Taylor expansion around the detected extremum

Edge Removal:

Re-do a Harris-type detection to remove edge responses, on the much-reduced set of candidate pixels


Rotation Invariance thru Dominant Orientation Coding

Voting for the dominant orientation, weighted by a Gaussian window to give more emphasis to the gradients closer to the center


SIFT Matching and Repeatability Prediction

SIFT Distance

Not all SIFT are created equal…

Peak strength (DoG response at interpolated position)


Combined scale/peak strength pmf

Ratio test: declare a match if $\dfrac{d(s^1_1, s^2_{k^*})}{d(s^1_1, s^2_k)} \le \theta$, where $s^2_{k^*}$ is the nearest and $s^2_k$ the second-nearest descriptor in the other image.
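A minimal sketch of the nearest/second-nearest ratio test on two descriptor sets; D1 and D2 (n1 x 128 and n2 x 128 SIFT matrices) and the threshold value are assumed/illustrative, and pdist2 requires the Statistics and Machine Learning Toolbox.

theta = 0.8;                        % ratio threshold (illustrative value)
D = pdist2(D1, D2);                 % n1 x n2 descriptor distances
[dsort, idx] = sort(D, 2);          % nearest and second-nearest per row
ratio = dsort(:,1) ./ dsort(:,2);
matched = find(ratio <= theta);     % descriptors of image 1 passing the test
matchedTo = idx(matched, 1);        % their nearest neighbors in image 2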

Box Filter – CABOX work

Basic Idea: Approximate DoG with linear combination of box filters

$\min_{\mathbf{h}} \; \|\mathbf{g} - B\,\mathbf{h}\|_2^2 + \lambda \|\mathbf{h}\|_1$

Solution by LASSO


[Figure: the DoG kernel approximated as a weighted sum h1·(box filter 1) + h2·(box filter 2) + …]
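A minimal sketch of the idea, not the CABOX construction: approximate a DoG kernel by a sparse combination of centered box filters, solved with Matlab's lasso (Statistics and Machine Learning Toolbox). The box-filter dictionary and the regularization weight are illustrative assumptions.

N = 21;                                                   % kernel support
g = fspecial('gaussian', N, 2.0) - fspecial('gaussian', N, 1.2);  % a DoG kernel
g = g(:);                                                 % vectorized target
widths = 3:2:N;                                           % centered square boxes
B = zeros(N*N, numel(widths));
for j = 1:numel(widths)
    w = widths(j); c = (N - w)/2;
    b = zeros(N); b(c+1:c+w, c+1:c+w) = 1/(w*w);          % normalized box filter
    B(:, j) = b(:);
end
[H, stats] = lasso(B, g, 'Lambda', 1e-4);                 % sparse coefficients h
approx = B*H + stats.Intercept;
fprintf('nonzero boxes: %d, relative error: %.3f\n', nnz(H), norm(approx - g)/norm(g));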

Outline

ReCap of Lecture 06 SIFT

Box Filter

Image Retrieval System

Why Aggregation ?

Aggregation Schemes

Summary


Image Matching/Retrieval System

SIFT is a sub-image-level feature; what we actually care about is how SIFT matching translates into image-level matching/retrieval accuracy

Suppose we can compute a single distance between two images from their collections of features:

Then for a database of n images, we can compute an n x n distance matrix, which gives us full information about the performance of this feature/distance system

How do we characterize the performance of such an image matching and retrieval system?


$d(I_1, I_2) = \sum_k \alpha_k \, d(F_k^1, F_k^2)$

$D_{j,k} = d(I_j, I_k)$
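A minimal sketch of building the n x n distance matrix from per-image descriptor sets; the cell array feats and the symmetric average nearest-neighbor distance are illustrative assumptions, one simple choice of the aggregate d(I_j, I_k).

n = numel(feats);                              % feats{j}: n_j x 128 SIFT matrix
D = zeros(n);
for j = 1:n
    for k = j+1:n
        dist = pdist2(feats{j}, feats{k});     % all descriptor-pair distances
        d_jk = mean(min(dist, [], 2));         % average NN distance, j -> k
        d_kj = mean(min(dist, [], 1));         % average NN distance, k -> j
        D(j,k) = 0.5*(d_jk + d_kj);            % symmetrize
        D(k,j) = D(j,k);
    end
end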

Thresholding for Matching

Basically, for any pair of Images (documents, in IR jargon), we declare

Then for each possible image pair (or the pairs we care about), for a given threshold t there are 4 possible outcomes:

TP pair: {Ij, Ik} is a true matching pair and d(Ij, Ik) < t (declared matching);

FP pair: {Ij, Ik} is a true non-matching pair but d(Ij, Ik) < t (declared matching);

TN pair: {Ij, Ik} is a true non-matching pair and d(Ij, Ik) >= t (declared non-matching);

FN pair: {Ij, Ik} is a true matching pair but d(Ij, Ik) >= t (declared non-matching).


$I_j, I_k$ are a match if $d(I_j, I_k) < t$; $I_j, I_k$ are not a match otherwise.

Matching System Performance

True Positive Rate (recall): out of all true matching pairs, how many are retrieved, i.e. have distance < t

False Positive Rate: out of all true non-matching pairs, how many are wrongly retrieved as matches (distance < t)


$TPR = \dfrac{tp}{tp + fn}, \qquad FPR = \dfrac{fp}{fp + tn}$

TPR-FPR

Definition:

TP rate = TP/(TP+FN)

FP rate = FP/(FP+TN)

i.e., both rates are normalized from the actual (ground-truth) value point of view: TPR by the actual positives, FPR by the actual negatives


ROC curve(1)

ROC = receiver operating characteristic

Y:TP rate

X:FP rate


ROC curve(2)

Which method (A or B) is better? Compute the ROC area: the area under the ROC curve (AUC)
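A minimal sketch, assuming the tp/fp/tn/fn vectors produced by the getPrecisionRecall loop shown later in this lecture:

fpr = fp ./ (fp + tn);
tpr = tp ./ (tp + fn);
[fpr_s, order] = sort(fpr);           % ensure monotone x for trapz
auc = trapz(fpr_s, tpr(order));       % area under the ROC curve
fprintf('ROC area (AUC) = %.3f\n', auc);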


Precision, Recall, F-measure

Precision = TP/(TP + FP),

Recall = TP/(TP + FN)

F-measure = 2*(precision*recall)/(precision + recall)

Precision is the probability that a retrieved document is relevant.

Recall is the probability that a relevant document is retrieved in a search.
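A minimal sketch, again assuming the per-threshold tp/fp/fn counts from getPrecisionRecall:

precision = tp ./ (tp + fp);          % may be NaN where tp+fp = 0 (lowest thresholds)
recall    = tp ./ (tp + fn);
f_measure = 2 * (precision .* recall) ./ (precision + recall);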


Matlab Implementation

We will compute all image pair distances D(j,k)

How do we compute the TPR-FPR plot? TPR and FPR are actually functions of the threshold t;

we just need to parameterize TPR(t) and FPR(t), sweep t over meaningful operating points, and plot the resulting curve.

Matlab implementation: [tp, fp, tn, fn] = getPrecisionRecall()


function [tp, fp, tn, fn] = getPrecisionRecall(d0, d1, npt, dbg)
% d0: distances of ground-truth matching pairs
% d1: distances of ground-truth non-matching pairs
% npt: number of threshold operating points; dbg: plot if nonzero
d_min = min(min(d0), min(d1));
d_max = max(max(d0), max(d1));
delta = (d_max - d_min) / npt;
for k=1:npt
    thres = d_min + (k-1)*delta;
    tp(k) = length(find(d0<=thres));   % true matches accepted
    fp(k) = length(find(d1<=thres));   % non-matches wrongly accepted
    tn(k) = length(find(d1>thres));    % non-matches rejected
    fn(k) = length(find(d0>thres));    % true matches wrongly rejected
end
if dbg
    figure(22); grid on; hold on;
    plot(fp./(tn+fp), tp./(tp+fn), '.-r', 'DisplayName', 'tpr-fpr');
    legend();
end

TPR-FPR

Image matching performance is characterized by the function TPR(FPR)

For the final retrieval set we want high precision; for a candidate short list (to be verified/re-ranked) we want high recall.


Outline

ReCap of Lecture 06 SIFT

Box Filter

Image Retrieval System

Why Aggregation ?

Aggregation Schemes

Summary


Why Aggregation ?

What do (local) interest point features bring us? Scale and rotation invariance, in the form of an nk x d matrix per image:

Uncertainty in the number of detected features nk at query time

Any permutation of the rows of the feature matrix is the same representation.

Problems: the feature "has state" (a variable-size, unordered set), so we cannot draw decision boundaries on it directly,

Not directly indexable/hashable

Typically very high dimensionality


$S_k: [x_k, y_k, \theta_k, \sigma_k, h_1, h_2, \ldots, h_{128}], \quad k = 1..n$

Decision Boundary in Matching

Can we have a decision boundary function for an interest-point based representation?


Curse of Dimensionality in Retrieval

What does feature dimensionality do to retrieval efficiency? Consider covering 99% of the data range in each dimension, and plot the total fraction of the volume covered.

Matlab: showDimensionCurse.m
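A minimal sketch of the effect (not the contents of showDimensionCurse.m): covering 99% of the range in each dimension covers only 0.99^d of the volume in d dimensions.

d = 1:128;
coverage = 0.99 .^ d;                 % fraction of volume covered in d dimensions
semilogy(d, coverage); grid on;
xlabel('feature dimension d'); ylabel('fraction of volume covered');
title('99% per-dimension locality vs. total volume covered');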


Aggregation – 30,000ft view

Bag of Words: compute k centroids in feature space, called visual words;

compute a histogram of word occurrences;

a k x 1 feature, hard assignment

VLAD: compute centroids in feature space;

compute the aggregated differences w.r.t. the centroids;

a k x d feature, soft assignment

Fisher Vector: compute a Gaussian Mixture Model (GMM) with 2nd-order info;

compute the aggregated feature w.r.t. the mean and covariance of the GMM;

a 2 x k x d feature

AKULA: adaptive centroids and feature count;

improved with covariance?


[Figure: example histogram over visual words with weights 0.5, 0.4, 0.05, 0.05]

Visual Key Words: main idea

Extract some local features from a number of images …


e.g., SIFT descriptor space: each point is 128-dimensional

Slide credit: D. Nister

Visual words: main idea

[Figure-only slides: local descriptors plotted as points in descriptor space are grouped into clusters, the visual words. Slide credit: D. Nister]

Visual Key Words


Each point is a local descriptor, e.g. a SIFT vector.

Slide credit: D. Nister


Visual words

Example: each group of patches belongs to the same visual word


Figure from Sivic & Zisserman, ICCV 2003

Visual words


Source credit: K. Grauman, B. Leibe

• More recently used for describing scenes and objects for the sake of indexing or classification.

Sivic & Zisserman 2003; Csurka, Bray, Dance, & Fan 2004; many others.

Object Bag of ‘words’

ICCV 2005 short course, L. Fei-Fei

Bag of Words


BoW Examples

Illustration


Bags of visual words

Summarize entire image based on its distribution (histogram) of word occurrences.

Analogous to bag of words representation commonly used for documents.


Image credit: Fei-Fei Li
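A minimal sketch of hard-assignment BoW encoding; the variable names sift_all (training descriptors) and sift_img (descriptors of the image to encode), both n x 128, and the vocabulary size are illustrative. Assumes the Statistics and Machine Learning Toolbox (kmeans, knnsearch).

K = 256;                                                       % vocabulary size
[~, codebook] = kmeans(double(sift_all), K, 'MaxIter', 200);   % learn visual words
words = knnsearch(codebook, double(sift_img));                 % nearest word per descriptor
bow = histcounts(words, 1:K+1);                                % 1 x K occurrence histogram
bow = bow / sum(bow);                                          % L1-normalize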

Texture Retrieval

Textons…


[Figure: universal texton dictionary and the per-image texton histograms]

Source: Lana Lazebnik

BoW Distance Metrics

Rank images by the normalized scalar product between their (possibly weighted) occurrence counts: a nearest-neighbor search for similar images.


[Example histograms: d_j = [5 1 1 0], q = [1 8 1 4]]
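A minimal sketch of the normalized scalar product on the example histograms above:

dj = [5 1 1 0];
q  = [1 8 1 4];
sim = dot(dj, q) / (norm(dj) * norm(q));   % ~0.30 for this example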

Inverted List

Image Retrieval via Inverted List


Image credit: A. Zisserman

[Table: for each visual word number, the list of image numbers in which it occurs]

When will this give us a significant gain in efficiency?

Indexing local features: inverted file index

For text documents, an efficient way to find all pages on which a word occurs is to use an index…

We want to find all images in which a feature occurs.

We need to index each feature (visual word) by the images it appears in, and also keep the number of occurrences.


Source credit : K. Grauman, B. Leibe
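A minimal sketch of an inverted file index over BoW histograms; bows (an nImages x K count matrix) and bowQuery (a 1 x K query histogram) are illustrative names.

[nImages, K] = size(bows);
invIndex = cell(1, K);                       % invIndex{w}: images containing word w
for w = 1:K
    invIndex{w} = find(bows(:, w) > 0)';
end
queryWords = find(bowQuery > 0);             % words present in the query
candidates = unique([invIndex{queryWords}]); % only these images need scoring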

TF-IDF Weighting

Term Frequency – Inverse Document Frequency: describe an image by the frequency of each visual word within it, and down-weight words that appear often in the database (the standard weighting for text retrieval)


$t_i = \dfrac{n_{id}}{n_d} \log \dfrac{N}{n_i}$

where $n_{id}$ = number of occurrences of word i in document d, $n_d$ = number of words in document d, $n_i$ = number of occurrences of word i in the whole database, and $N$ = total number of words in the database.
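A minimal sketch of the weighting above applied to a BoW count matrix; bows is an illustrative name, and the element-wise divisions rely on implicit expansion (R2016b+).

n_id = bows;                                % occurrences of word i in document d
n_d  = sum(bows, 2);                        % number of words in document d
n_i  = sum(bows, 1);                        % occurrences of word i in whole database
N    = sum(bows(:));                        % total number of words in database
tfidf = (n_id ./ max(n_d, 1)) .* log(N ./ max(n_i, 1));
tfidf = tfidf ./ max(sqrt(sum(tfidf.^2, 2)), eps);   % L2-normalize for cosine scoring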

BoW Use Case with Spatial Localization

Collecting words within a query region


Query region: pull out only the SIFT descriptors whose positions are within the polygon


BoW Patch Search

Localizing the BoW representation


Localization with BoW


Hierarchical Assignment of Histogram

Tree construction:


[Nister & Stewenius, CVPR’06]
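A minimal sketch of tree construction by hierarchical k-means, in the spirit of Nister & Stewenius (branch factor k, depth L); the function and its node structure are illustrative, not the authors' implementation. Assumes the Statistics and Machine Learning Toolbox kmeans.

function node = buildVocabTree(descriptors, k, L)
% descriptors: n x 128 matrix; k: branch factor; L: remaining depth
node.centers = [];
node.children = {};
if L == 0 || size(descriptors, 1) < k
    return;                                     % leaf node
end
[idx, node.centers] = kmeans(descriptors, k, 'MaxIter', 100);
node.children = cell(1, k);
for c = 1:k
    node.children{c} = buildVocabTree(descriptors(idx == c, :), k, L-1);
end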

Vocabulary Tree

Training: Filling the tree


[Nister & Stewenius, CVPR’06]

[Figure sequence over several slides: the training descriptors are propagated down the tree to fill it. Slide credit: David Nister]

Vocabulary Tree

Recognition

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

RANSAC verification

Vocabulary Tree: Performance

Evaluated on large databases: indexing with up to 1M images

Online recognition for a database of 50,000 CD covers; retrieval in ~1s

They find experimentally that large vocabularies can be beneficial for recognition


[Nister & Stewenius, CVPR’06]

Larger vocabularies can be advantageous… but what happens if the vocabulary is too large?

Visual Word Vocabulary Size

Performance w.r.t vocabulary size


Bags of words: pros and cons

Good:
+ flexible to geometry / deformations / viewpoint
+ compact summary of image content
+ provides a vector representation for sets
+ the Inverted List implementation offers a practical solution against large repositories

Bad:
- loss of information at quantization and histogram generation
- the basic model ignores geometry – must verify afterwards, or encode it via features
- background and foreground are mixed when the bag covers the whole image
- interest points or sampling: no guarantee to capture object-level parts

Source credit: K. Grauman, B. Leibe

Can we improve BoW ?

• E.g. Why isn’t our Bag of Words classifier at 90% instead of 70%?

• Training Data

– Huge issue, but not necessarily a variable you can manipulate.

• Learning method

– BoW is on top of any feature scheme

• Representation

– Are we losing too much info in the process ?


Standard Kmeans Bag of Words

BoW revisited


http://www.cs.utexas.edu/~grauman/courses/fall2009/papers/bag_of_visual_words.pdf

Motivation

Bag of Visual Words is only about counting the number of local descriptors assigned to each Voronoi region

Why not include other statistics/information?


http://www.cs.utexas.edu/~grauman/courses/fall2009/papers/bag_of_visual_words.pdf

We already looked at the Spatial Pyramid/Pooling

Spatial Pooling


[Figure: spatial pyramid levels, level 0: 1x1, level 1: 2x2, level 2: 4x4]

Key takeaway: multiple assignment? Soft assignment?

Motivation

Bag of Visual Words is only about counting the number of local descriptors assigned to each Voronoi region

Why not include other statistics? For instance:
• mean of local descriptors
• (co)variance of local descriptors

http://www.cs.utexas.edu/~grauman/courses/fall2009/papers/bag_of_visual_words.pdf

Simple case: Soft Assignment

Called "Kernel codebook encoding" by Chatfield et al. 2011. Cast a weighted vote into the most similar clusters.

This is fast and easy to implement (try it for Project 3!) but it does have some downsides for image retrieval – the inverted file index becomes less sparse.
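A minimal sketch of kernel-codebook soft assignment; the Gaussian kernel bandwidth choice is illustrative, and codebook (K x 128) and sift (n x 128) follow the naming of the VLAD code later in this lecture.

dist  = pdist2(sift, codebook);             % n x K descriptor-to-word distances
sigma = mean(dist(:));                      % kernel bandwidth (one simple choice)
w     = exp(-dist.^2 / (2*sigma^2));        % Gaussian kernel vote weights
w     = w ./ sum(w, 2);                     % normalize votes per descriptor (R2016b+)
bow_soft = sum(w, 1);                       % 1 x K soft-assignment histogram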

A first example: the VLAD

Given a codebook $\{\mu_1, \ldots, \mu_K\}$, e.g. learned with K-means, and a set of local descriptors $\{x\}$:

• assign: $NN(x) = \arg\min_i \lVert x - \mu_i \rVert$

• compute: $v_i = \sum_{x : NN(x) = i} (x - \mu_i)$

• concatenate the $v_i$'s + normalize


Jégou, Douze, Schmid and Pérez, “Aggregating local descriptors into a compact image representation”, CVPR’10.

[Figure: descriptors assigned to Voronoi cells 1-5 around the centroids: ① assign descriptors, ② compute $x - \mu_i$, ③ $v_i = \sum (x - \mu_i)$ for cell i]

A first example: the VLAD

A graphical representation of the resulting VLAD vectors


Jégou, Douze, Schmid and Pérez, “Aggregating local descriptors into a compact image representation”, CVPR’10.

VL_FEAT Implementation

Matlab:


function [vc]=vladSiftEncoding(sift, codebook)
dbg=1;
if dbg
    if (0) % init VL_FEAT, only need to do once
        run('../../tools/vlfeat-0.9.20/toolbox/vl_setup.m');
    end
    im = imread('../pics/flarsheim-2.jpg');
    [f, sift] = vl_sift(single(rgb2gray(im)));
    sift = single(sift');                 % n x 128 descriptors
    [indx, codebook] = kmeans(sift, 16);  % 16-word codebook
    % make sift # smaller
    sift = sift(1:800,:);
end
[n, kd] = size(sift);
[m, kd] = size(codebook);
% compute assignment
dist = pdist2(codebook, sift);            % m x n word-to-descriptor distances
mdist = mean(mean(dist));
% normalize the heat kernel s.t. mean dist is mapped to 0.5
a = -log(0.5)/mdist;
indx = exp(-a*dist);                      % m x n soft assignment weights
vc = vl_vlad(sift', codebook', indx);     % VLAD code of length m*kd
if dbg
    figure(41); colormap(gray);
    subplot(2,2,1); imshow(im); title('image');
    subplot(2,2,2); imagesc(dist); title('m x n distance');
    subplot(2,2,3); imagesc(indx); title('m x n assignment');
    subplot(2,2,4); imagesc(reshape(vc, [m, kd])); title('vlad code');
end

VLAD Code

What are the tweaks? Codebook design

Soft Assignment options


References

Vocabulary Tree: David Nistér, Henrik Stewénius: Scalable Recognition with a Vocabulary Tree. CVPR (2) 2006: 2161-2168

VLAD: Hervé Jégou, Matthijs Douze, Cordelia Schmid: Improving Bag-of-Features for Large Scale Image Search. International Journal of Computer Vision 87(3): 316-336 (2010)

Fisher Vector: Florent Perronnin, Jorge Sánchez, Thomas Mensink: Improving the Fisher Kernel for Large-Scale Image Classification. ECCV (4) 2010: 143-156

AKULA: Abhishek Nagar, Zhu Li, Gaurav Srivastava, Kyungmo Park: AKULA - Adaptive Cluster Aggregation for Visual Search. DCC 2014: 13-22


Lec 07 Summary

Image Retrieval System metrics: what are true positive, false positive, true negative, and false negative?

What are precision, recall, and F-score?

Why Aggregation? Decision boundary

Indexing/Hashing

Bag of Words: a histogram whose bins are visual words

Variations: hierarchical assignment with vocabulary tree

Implementation: Inverted List

VLAD: richer encoding of the aggregated info

Soft assignment of features to codebook bins

Vectorized representation – no need for inverted list

