
Image Analysis & Retrieval

CS/EE 5590 Special Topics (Class Ids: 44873, 44874)

Fall 2016, M/W 4-5:15pm@Bloch 0012

Lec 12

Review of Part I:

Features and Classifiers in Image Classification

Zhu Li

Dept of CSEE, UMKC

Office: FH560E, Email: lizhu@umkc.edu, Ph: x 2346.

http://l.web.umkc.edu/lizhu


Outline

HW-2

Quiz-1

Review part I: Image Features & Classifiers

Image Formation: Homography

Color Space & Color Features

Filtering and Edge Features

Harris Detector

SIFT

Aggregation: BoW, VLAD, Fisher, and Supervector

Image Retrieval System Metrics: TPR/Precision, FPR, Recall, mAP

Classification: kNN, GMM Bayesian, SVM


Homework-2: Aggregation

Data Set:

6 classes, 6 x 20 = 120 images

Global Codebook for VLAD, Fisher Vector

Compute SIFT and DenseSIFT from each image, and stack them into two arrays: hw2_sift and hw2_dsift

Randomly sample a subset of them: use randperm()

Compute PCA:

o indx = randperm(n_sift); indx = indx(1:20*2^10); % sample 20k

o [A0, s, eigv]=princomp(hw2_sift(indx,: ));

Project SIFT

o Dimension reduction: x = sift*A0(:,1:kd)

Compute Kmeans and GMM models for VLAD and FV aggregation:

o vl_feat: vl_kmeans(), vl_gmm();
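A minimal sketch putting these codebook steps together (hw2_sift, kd, n are the names used above; kd = 32 and n = 64 are just example values from the HW grid, and the rest is an assumption rather than reference code):

kd = 32; n = 64;                                 % example dimension / codebook size
indx = randperm(size(hw2_sift, 1));
indx = indx(1:20*2^10);                          % sample ~20k SIFT descriptors
[A0, s, eigv] = princomp(hw2_sift(indx, :));     % PCA basis A0
x = hw2_sift(indx, :) * A0(:, 1:kd);             % project to kd dimensions
km = vl_kmeans(single(x'), n);                   % k-means codebook: kd x n centroids
[gmm.m, gmm.v, gmm.p] = vl_gmm(single(x'), n);   % GMM codebook: means, covariances, priors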


VLAD- Vector of Locally Aggregated Descriptors

Aggregate feature difference from the codebook

Hard assignment by finding the NN of each feature {x_j} among the codebook centroids {μ_k}

Compute aggregated differences

L2 normalize

Final feature: k x d


[Figure: VLAD aggregation over k = 5 Voronoi cells: ① assign descriptors to the nearest centroid, ② compute x - μ_i, ③ v_i = sum of x - μ_i for cell i]

$v_k = \sum_{\forall j \; \mathrm{s.t.}\; NN(x_j) = \mu_k} (x_j - \mu_k)$

$v_k = v_k / \|v_k\|_2$

Fisher Vector Aggregation

FV encoding

Richer characterization of the codebook with a GMM: {μ_k, Σ_k, π_k}

Gradient w.r.t. the mean, for GMM component k, dimension j = 1..D

In the end, we have a 2K x D aggregation of the derivatives w.r.t. the means and variances:

$FV = [u_1, u_2, \ldots, u_K, v_1, v_2, \ldots, v_K]^T$

Normalize by the Fisher Kernel
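For reference, the commonly used Fisher Vector gradient terms take the following form (added here for completeness rather than copied from the slide; γ_t(k) is the posterior of component k for descriptor x_t, and the division is element-wise over the D dimensions):

$u_k = \frac{1}{N\sqrt{\pi_k}} \sum_{t=1}^{N} \gamma_t(k)\, \frac{x_t - \mu_k}{\sigma_k}, \qquad v_k = \frac{1}{N\sqrt{2\pi_k}} \sum_{t=1}^{N} \gamma_t(k) \left[ \frac{(x_t - \mu_k)^2}{\sigma_k^2} - 1 \right]$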

HW-2 Deliverables

Functions:

[gmm]=getGMM(sift, A, kd, n); %gmm.m, gmm.v, gmm.p

[km]=getKMeans(sift, A, kd, n); % km: n x kd

[vlad]=getVladSift(im, km, opt); % opt = 'sift', or 'dsft'

[fv]=getFVSift(im, gmm, opt); % can use vl_fisher()

Plots:

For each image: {sift, dsift} x {vlad, fv} x {kd=[24, 32]} x {n=[64, 128]}, total 16 features per image, plot the 120x120 distances for each feature. [hint: use subplot(2,2,k), 4 plots per fig]

TPR-FPR: for each class, plot one figure with 16 plots,

mAP: one plot/table
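For the getFVSift deliverable, the core encoding step with vl_fisher can look like the following sketch (it assumes x already holds one image's PCA-projected descriptors as a kd x numDesc single matrix, and gmm is the struct returned by getGMM; vl_fisher also has normalization options worth checking in the vl_feat docs):

fv = vl_fisher(x, gmm.m, gmm.v, gmm.p);   % 2 * n * kd Fisher Vector for this image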


Quiz-1

Covers what we reviewed today

Format:

True or False

Multiple Choices

Problem Solving

Extra Credit (25%)

Closed book; a 1-page A4 cheat sheet is allowed.

The cheat sheet needs to be turned in with your name and ID number

The cheat sheet will also be graded, for up to 10 pts

Relax

Quiz is actually on me.


Outline

HW-2

Quiz-1

Review part I: Image Features & Classifiers

Image Formation: Homography

Color Space & Color Features

Filtering and Edge Features

Harris Detector

SIFT

Aggregation: BoW, VLAD, Fisher, and Supervector

Image Retrieval System Metrics: TPR/Precision, FPR, Recall, mAP

Classification: kNN, GMM Bayesian, SVM


Classification in Image Retrieval

A typical image retrieval pipeline

Image Formation -> Feature Computing -> Feature Aggregation -> Classification, backed by a Knowledge/Data Base

Image Formation: Homography, Color space

Feature Computing: Color histogram, Filtering, Edge Detection, HoG, Harris Detector, SIFT

Feature Aggregation: BoW, VLAD, Fisher Vector, Supervector

Image Formation - Geometry

$\mathbf{x} = \mathbf{K}\,[\mathbf{I}\;|\;\mathbf{0}]\,\mathbf{X}: \qquad \begin{bmatrix} u \\ v \\ w \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$

Slide Credit: GaTech Computer Vision class, 2015.

Intrinsic assumptions: unit aspect ratio, optical center at (0,0), no skew of pixels

Extrinsic assumptions: no rotation, camera at (0,0,0)

Image Formation: Homography

In general, a homography H maps 2-D points (in homogeneous coordinates) as x' = Hx:

$\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix} \begin{bmatrix} x \\ y \\ w \end{bmatrix}$

Defined up to a scale, as [x, y, w] ~ [sx, sy, sw], so H has 8 DoF

Affine Transform, 6 DoF: contains a translation [t1, t2] and an invertible 2x2 affine matrix A:

$H = \begin{bmatrix} a_1 & a_2 & t_1 \\ a_3 & a_4 & t_2 \\ 0 & 0 & 1 \end{bmatrix}$

Similarity Transform, 4 DoF: a rigid transform that preserves distance if s = 1:

$H = \begin{bmatrix} s\cos\theta & -s\sin\theta & t_1 \\ s\sin\theta & s\cos\theta & t_2 \\ 0 & 0 & 1 \end{bmatrix}$
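As a quick illustration (the matrix and point below are made-up values, not from the slide), applying a homography to a point in MATLAB looks like:

H = [1 0.1 5; 0 1 10; 0.001 0 1];  % a hypothetical 3x3 homography
p = H * [20; 30; 1];               % map the point (20, 30) in homogeneous coordinates
p = p / p(3);                      % divide by w' so the result is [x'; y'; 1]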

VL_FEAT Implementation

VL_FEAT has an implementation:


Image Formation – Color Space

RGB Color Space:

8 bit Red, Green, Blue Channels

Luminance and color information are mixed together across the three channels

YUV/YCbCr Color Space

HSV Color Space

Hue: color in polar coord

Saturation

Value
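For example, the standard MATLAB conversions between these color spaces (im assumed to be an RGB image) are:

ycbcr = rgb2ycbcr(im);   % Y (luma) plus Cb, Cr (chroma)
hsv   = rgb2hsv(im);     % H (hue angle), S (saturation), V (value)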


Color Histogram – Fixed Codebook

Color Histogram:

Fixed code book to generate histogram

Example: MPEG 7 Scalable Color Descriptor

HSV Color Space, 16x4x4 partition for bins

Scalability thru quantization and Haar Transform


Color Histogram – Adaptive Codebook

Adaptive Color Histogram

No fixed code book

Example: MPEG 7 Dominant Color Descriptor

Distance Metric

Not index-able


Image Filters

Basic Filter Operations

Region of Support

[Example: a 3x3 averaging (box) filter, with all coefficients equal to 1/9, applied to the image over its 3x3 region of support]


Filter Properties

Linear Filters are:

Shift-Invariant:

f(m-k, n-j)*h = g(m-k, n-j), if f*h=g

Associative:

f*h1*h2 = f*(h1*h2)

this can save a lot of complexity (see the small example after this list)

Distributive:

f*h1 + f*h2 = f*(h1+h2)

useful in SIFT's DoG filtering.
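A small numerical check of the associative property (a sketch with made-up filters; boundary effects are ignored by comparing only the central region):

f  = rand(64);                                   % example image
h1 = fspecial('gaussian', 5, 1.0);               % first filter
h2 = fspecial('gaussian', 5, 1.6);               % second filter
g1 = conv2(conv2(f, h1, 'same'), h2, 'same');    % filter twice
g2 = conv2(f, conv2(h1, h2), 'same');            % pre-combine the filters, filter once
max(max(abs(g1(8:56, 8:56) - g2(8:56, 8:56))))   % ~0 away from the borders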


Edge Detection

The need of smoothing

Gaussian smoothing

Combine differentiation and Gaussian

[Figure: a 1-D signal f filtered with the derivative of a Gaussian; smoothing and differentiation combine into one filter since d/dx (f * g) = f * (dg/dx)]

Image Gradient from DoG Filtering

Image Gradient (at a scale):


The gradient points in the direction of most rapid increase in intensity

The edge strength is given by the gradient magnitude:

The gradient direction is given by:
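For reference (standard definitions, filled in for the missing slide graphics), with ∇f = (∂f/∂x, ∂f/∂y):

$\|\nabla f\| = \sqrt{\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2}, \qquad \theta = \tan^{-1}\!\left(\frac{\partial f/\partial y}{\partial f/\partial x}\right)$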


HOG – Histogram of Oriented Gradients

Generate local histogram per block

Size of HoG: n*m*4*9=36nm, for n x m cells.

[Figure: per-cell gradient angle and magnitude maps are voted into 9 orientation bins (0-19, 20-39, ..., 160-179 degrees), comparing binary voting vs. magnitude-weighted voting; the final HoG feature f concatenates the per-cell histograms h_1, ..., h_9 across cells. Panel: effect of the HOG cell size.]
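A hedged sketch of computing HoG with vl_feat (im is assumed to be a grayscale single image; cellSize = 8 is an arbitrary example; the 'dalaltriggs' variant corresponds to the 4x9 = 36 values per cell counted above):

cellSize = 8;
hog = vl_hog(im, cellSize, 'variant', 'dalaltriggs');   % n x m x 36 HoG array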

Harris Detector

Smoothed second moment matrix:

$\mu(\sigma_I, \sigma_D) = g(\sigma_I) * \begin{bmatrix} I_x^2(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I_y^2(\sigma_D) \end{bmatrix}$

1. Image derivatives I_x, I_y (optionally blur first)

2. Square of derivatives: I_x^2, I_y^2, I_x I_y

3. Gaussian filtering: g(I_x^2), g(I_y^2), g(I_x I_y)

4. Harris function: detect corners, no need to compute the eigenvalues

$R_{harris} = \det[\mu(\sigma_I, \sigma_D)] - \alpha\,[\mathrm{trace}(\mu(\sigma_I, \sigma_D))]^2$

$har = g(I_x^2)\,g(I_y^2) - [g(I_x I_y)]^2 - \alpha\,[g(I_x^2) + g(I_y^2)]^2$

(For a 2x2 matrix M with eigenvalues λ_1, λ_2: det M = λ_1 λ_2 and trace M = λ_1 + λ_2.)
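A minimal MATLAB sketch of these four steps (a didactic illustration, not the lecture's reference code; im is assumed grayscale double, and sigma/alpha are made-up values):

sigma = 1.5; alpha = 0.05;
[Ix, Iy] = gradient(im);                             % 1. image derivatives
g = fspecial('gaussian', 2*ceil(3*sigma)+1, sigma);
Sx2 = imfilter(Ix.^2,  g, 'replicate');              % 2-3. squared derivatives, Gaussian smoothed
Sy2 = imfilter(Iy.^2,  g, 'replicate');
Sxy = imfilter(Ix.*Iy, g, 'replicate');
R = Sx2.*Sy2 - Sxy.^2 - alpha*(Sx2 + Sy2).^2;        % 4. Harris response, no eigen-decomposition
corners = imregionalmax(R) & (R > 0.01*max(R(:)));   % threshold + local maxima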

Harris Detector Steps

Response function -> Threshold -> Local Maxima

Harris Detector NOT scale invariant


Scale Space Theory - Lindeberg

Scale Space Response via Laplacian of Gaussian

The scale is controlled by σ

$\nabla^2 g = \frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2}, \qquad g = e^{-\frac{x^2 + y^2}{2\sigma^2}}$

Characteristic Scale:

[Figure: LoG response of an image structure of radius r at increasing scales σ = 0.8r, 1.2r, 2r, ...; the σ giving the peak response defines the characteristic scale]

SIFT Detector via Difference of Gaussian

However, computing the LoG is expensive: it is a non-separable filter

SIFT uses Difference of Gaussian (DoG) filters to approximate the LoG

Gaussian filters are separable, so this is much faster

Box Filter Approximation: you will see this later.

[Figure: scale space construction by Gaussian filtering and image differencing to form the DoG pyramid]
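A small sketch of one DoG level (illustrative only; sigma and the scale step k are made-up values, and plain Image Processing Toolbox calls stand in for the actual SIFT implementation):

sigma = 1.6; k = sqrt(2);
g1 = fspecial('gaussian', 2*ceil(3*sigma)+1, sigma);
g2 = fspecial('gaussian', 2*ceil(3*k*sigma)+1, k*sigma);
dog = imfilter(im, g2, 'replicate') - imfilter(im, g1, 'replicate');   % DoG ~ (k-1) σ² LoG response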

Peak Strength & Edge Removal

Peak Strength:

Interpolate true DoG response and pixel location by Taylor expansion

Edge Removal:

Re-do a Harris-type detection on the much-reduced set of candidate points to remove edge responses


Rotation Invariance thru Dominant Orientation Coding

Voting for the dominant orientation

Weighted by a Gaussian window to give more emphasis to the gradients closer to the center


SIFT Description

Rotate to dominant direction, divide Image Patch (16x16 pixels) around SIFT point into 4x4 Cells

For each cell, compute an 8-bin edge histogram

Overall, we have 4x4x8=128 dimension

[Figure: spatial-scale space extrema; per-cell angle histogram over 0 to 2π]


SIFT Matching and Repeatability Prediction

SIFT Distance – True Love Test

Not all SIFT are created equal…

Peak strength (DoG response at interpolated position)

[Figure: combined scale/peak strength pmf]

Ratio test: a tentative match is kept only if the distance to the best candidate is sufficiently smaller than to the next one,

$d(s_1^1, s_{k^*}^2)\, /\, d(s_1^1, s_k^2) \leq \theta$
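A hedged sketch of this matching test with vl_feat (im1, im2 are assumed grayscale single images; 1.5 is an example threshold on the ratio of the second-nearest to nearest descriptor distance):

[f1, d1] = vl_sift(im1);
[f2, d2] = vl_sift(im2);
[matches, scores] = vl_ubcmatch(d1, d2, 1.5);   % keep a match only if the 2nd-NN is at least 1.5x farther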


Outline

HW-2

Quiz-1

Review part I: Image Features & Classifiers

Image Formation: Homography

Color Space & Color Features

Filtering and Edge Features

Harris Detector

SIFT

Aggregation: BoW, VLAD, Fisher, and Supervector

Image Retrieval System Metrics: TPR/Precision, FPR, Recall, mAP

Classification: kNN, GMM Bayesian, SVM


Why Aggregation ?

Curse of Dimensionality

Decision Boundary / Indexing


Histogram Aggregation: Bag-of-Words

Codebook:

Feature space: R^d; run k-means to get k centroids {μ_1, μ_2, ..., μ_k}

BoW Hard Encoding:

For n feature points {x_1, x_2, ..., x_n}, the assignment matrix is k x n, with only one non-zero entry per column

Aggregated dimension: k
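A hedged sketch of BoW hard encoding with vl_feat (km: kd x k centroids from vl_kmeans, x: kd x n descriptors of one image, both single; names are assumptions):

kdtree = vl_kdtreebuild(km);
nn = vl_kdtreequery(kdtree, km, x);                  % nearest centroid index per descriptor
bow = accumarray(double(nn(:)), 1, [size(km, 2) 1]); % count votes per codeword
bow = bow / sum(bow);                                % k-bin normalized histogram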


Histogram Aggregation: Kernel Code Book Soft Encoding

Kernel Code Book Soft Encoding

Kernel affinity: $K(x_j, \mu_k) = e^{-k\|x_j - \mu_k\|^2}$

Assignment matrix: $A_{j,k} = K(x_j, \mu_k)\, /\, \sum_k K(x_j, \mu_k)$

Encoding, k-dimensional: $X(k) = \frac{1}{n} \sum_j A_{j,k}$

VLAD- Vector of Locally Aggregated Descriptors

Aggregate feature difference from the codebook

Hard assignment by finding the NN of each feature {x_j} among the codebook centroids {μ_k}

Compute aggregated differences

L2 normalize

Final feature: k x d

[Figure: VLAD aggregation over k = 5 Voronoi cells: ① assign descriptors to the nearest centroid, ② compute x - μ_i, ③ v_i = sum of x - μ_i for cell i]

$v_k = \sum_{\forall j \; \mathrm{s.t.}\; NN(x_j) = \mu_k} (x_j - \mu_k)$

$v_k = v_k / \|v_k\|_2$

Fisher Vector Aggregation

FV encoding

Richer characterization of the codebook with a GMM: {μ_k, Σ_k, π_k}

Gradient w.r.t. the mean, for GMM component k, dimension j = 1..D

In the end, we have a 2K x D aggregation of the derivatives w.r.t. the means and variances:

$FV = [u_1, u_2, \ldots, u_K, v_1, v_2, \ldots, v_K]^T$

Normalize by the Fisher Kernel

Supervector Aggregation

Assume we have a UBM GMM model

λ_UBM = {P_k, μ_k, Σ_k},

with identical priors and covariances

Then for two utterance samples a and b, with GMM models

λ_a = {P_k, μ_k^a, Σ_k}, λ_b = {P_k, μ_k^b, Σ_k},

the SV distance is

$K(\lambda_a, \lambda_b) = \sum_k \left( \sqrt{P_k}\, \Sigma_k^{-\frac{1}{2}}\, \mu_k^a \right)^T \left( \sqrt{P_k}\, \Sigma_k^{-\frac{1}{2}}\, \mu_k^b \right)$

It means the means of the two models are compared under the Mahalanobis distance metric induced by the UBM covariances

This is also a linear kernel function scaled by the UBM covariances

AKULA – Adaptive KLUster Aggregation

AKULA Descriptor:

No fixed codebook !

Outer loop: subspace optimization

Inner loop: cluster centroids + SIFT count

$A_1 = \{y_1^{c_1}, y_2^{c_1}, \ldots, y_k^{c_1};\; p_1^{c_1}, p_2^{c_1}, \ldots, p_k^{c_1}\}$

$A_2 = \{y_1^{c_2}, y_2^{c_2}, \ldots, y_k^{c_2};\; p_1^{c_2}, p_2^{c_2}, \ldots, p_k^{c_2}\}$

Distance metric:

Minimum centroid distance, weighted by the SIFT count (a kind of manifold tangent distance):

$d(A_1, A_2) = \frac{1}{k} \sum_{j=1}^{k} d_{min}^1(j)\, w_{min}^1(j) + \frac{1}{k} \sum_{i=1}^{k} d_{min}^2(i)\, w_{min}^2(i)$

$d_{min}^1(j) = \min_i d_{j,i}, \qquad d_{min}^2(i) = \min_j d_{j,i}$

$w_{min}^1(j) = w_{j,i^*},\; i^* = \arg\min_i d_{j,i}, \qquad w_{min}^2(i) = w_{j^*,i},\; j^* = \arg\min_j d_{j,i}$

Image Retrieval Performance Metrics

True Positive Rate

False Positive Rate

Precision

Recall

F-Measure


TPR = TP/(TP+FN)

FPR = FP/(FP+TN)

Precision =TP/(TP + FP),

Recall = TP/(TP + FN)

F-measure = 2*(precision*recall)/(precision + recall)


Retrieval Performance: Mean Average Precision

mAP measures the retrieval performance across all queries


mAP example

mAP is computed across all queries as the mean of each query's average precision, i.e., the precision averaged over the recall levels (the area under the precision-recall curve)


Compute TPR-FPR

ROC Curve (Receiver Operating Characteristics)


Confusion/distance matrix: d(j,k)

Ground Truth: m(j,k)

π‘š 𝑗, π‘˜ = 1, π‘‘π‘œπ‘ 𝑗 π‘Žπ‘›π‘‘ π‘˜ π‘Žπ‘Ÿπ‘’ π‘šπ‘Žπ‘‘π‘β„Ž0, π‘›π‘œπ‘‘ π‘šπ‘Žπ‘‘π‘β„Ž


Classification in Image Retrieval

A typical image retrieval pipeline


Image Formation

Feature Computing

Feature Aggregation

Classification

Knowledge/Data Base


kNN Classifier

kNN is a nonlinear classifier

Operations: majority vote among the k nearest neighbors

Performance bounds on k:

asymptotically, the 1-NN error rate is no worse than twice the Bayes error rate

As k increases, the bound improves, but not by much

Accelerate NN retrieval operation

Kd-tree:

o a space-partitioning scheme that limits the number of data-point comparisons

Kd-tree supported kNN search:

o Approx KNN search

o Kd-Forest supported Approx Search
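A hedged sketch of kd-tree accelerated kNN classification with MATLAB's Statistics Toolbox (X, y, q are placeholder names for the n x d training features, their labels, and a query row):

mdl = KDTreeSearcher(X);            % build a kd-tree over the training features
idx = knnsearch(mdl, q, 'K', 5);    % indices of the 5 nearest neighbors of q
rec_label = mode(y(idx));           % majority vote over the neighbors' labels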


Gaussian Generative Model Classifier

The decision boundary is shaped by the relative shapes of the class covariance matrices

Matlab:

[rec_label, err]=classify(q, x, y, method);

method = 'quadratic'


The Support Vector Solution

Why is it called a Support Vector Machine? The decision function f(x) = Σ_i α_i y_i K(x_i, x) + b depends only on the training samples with α_i > 0: the support vectors.

Support Vector Machine

Matlab:


% train svm
svm = svmtrain(x, y, 'showplot', true);

% svm classification
for k = 1:5
    figure(41); hold on; grid on; axis([0 1 0 1]);
    [u, v] = ginput(1);
    q = [u, v];
    plot(u, v, '*b');
    rec = svmclassify(svm, q);
    fprintf('\n recognized as %c', rec);
end


Summary

Image Analysis & Retrieval Part I

Image Formation

Color and Texture Features

Interest Points Detection – Harris Detector

Invariant Interest Point Detection – SIFT

Aggregation

Classification tools: knn, gmm, svm

Performance metrics:

o tp, fp, tn, fn, tpr/precision, fpr, recall, mAP

Part II: Holistic approach

Just throw the pixels at the algorithm and let it figure out what the good features are.

