
Image Analysis & Retrieval

CS/EE 5590 Special Topics (Class Ids: 44873, 44874)

Fall 2016, M/W 4-5:15pm@Bloch 0012

Lec 12

Review of Part I:

Features and Classifiers in Image Classification

Zhu Li

Dept of CSEE, UMKC

Office: FH560E, Email: lizhu@umkc.edu, Ph: x 2346.

http://l.web.umkc.edu/lizhu


Outline

HW-2

Quiz-1

Review part I: Image Features & Classifiers

Image Formation: Homography

Color Space & Color Features

Filtering and Edge Features

Harris Detector

SIFT

Aggregation: BoW, VLAD, Fisher, and Supervector

Image Retrieval System Metrics: TPR/Precision, FPR, Recall, mAP

Classification: kNN, GMM Bayesian, SVM


Homework-2: Aggregation

Data Set:

6 classes, 6 x 20 = 120 images

Global Codebook for VLAD, Fisher Vector

Compute SIFT and DenseSIFT from each image, and stack them into two arrays: hw2_sift and hw2_dsift

Randomly sample a subset of them: use randperm()

Compute PCA:

o indx = randperm(n_sift); indx = indx(1:20*2^10); % sample 20k

o [A0, s, eigv]=princomp(hw2_sift(indx,: ));

Project SIFT

o Dimension reduction: x = sift*A0(:,1:kd)

Compute Kmeans and GMM models for VLAD and FV aggregation:

o vl_feat: vl_kmeans(), vl_gmm();
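A minimal sketch putting these codebook steps together (hw2_sift, kd, n are the names used above; kd = 32 and n = 64 are just example values from the HW grid, and the rest is an assumption rather than reference code):

kd = 32; n = 64;                                 % example dimension / codebook size
indx = randperm(size(hw2_sift, 1));
indx = indx(1:20*2^10);                          % sample ~20k SIFT descriptors
[A0, s, eigv] = princomp(hw2_sift(indx, :));     % PCA basis A0
x = hw2_sift(indx, :) * A0(:, 1:kd);             % project to kd dimensions
km = vl_kmeans(single(x'), n);                   % k-means codebook: kd x n centroids
[gmm.m, gmm.v, gmm.p] = vl_gmm(single(x'), n);   % GMM codebook: means, covariances, priors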


VLAD- Vector of Locally Aggregated Descriptors

Aggregate feature difference from the codebook

Hard assignment by finding the NN of each feature {x_j} among the codebook centroids {μ_k}

Compute aggregated differences

L2 normalize

Final feature: k x d


[Figure: VLAD aggregation over k = 5 Voronoi cells: ① assign descriptors to the nearest centroid, ② compute x - μ_i, ③ v_i = sum of x - μ_i for cell i]

$v_k = \sum_{\forall j \; \mathrm{s.t.}\; NN(x_j) = \mu_k} (x_j - \mu_k)$

$v_k = v_k / \|v_k\|_2$

Fisher Vector Aggregation

FV encoding

Richer characterization of the codebook with a GMM: {μ_k, Σ_k, π_k}

Gradient w.r.t. the mean, for GMM component k, dimension j = 1..D

In the end, we have a 2K x D aggregation of the derivatives w.r.t. the means and variances:

$FV = [u_1, u_2, \ldots, u_K, v_1, v_2, \ldots, v_K]^T$

Normalize by the Fisher Kernel
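For reference, the commonly used Fisher Vector gradient terms take the following form (added here for completeness rather than copied from the slide; γ_t(k) is the posterior of component k for descriptor x_t, and the division is element-wise over the D dimensions):

$u_k = \frac{1}{N\sqrt{\pi_k}} \sum_{t=1}^{N} \gamma_t(k)\, \frac{x_t - \mu_k}{\sigma_k}, \qquad v_k = \frac{1}{N\sqrt{2\pi_k}} \sum_{t=1}^{N} \gamma_t(k) \left[ \frac{(x_t - \mu_k)^2}{\sigma_k^2} - 1 \right]$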

HW-2 Deliverables

Functions:

[gmm]=getGMM(sift, A, kd, n); %gmm.m, gmm.v, gmm.p

[km]=getKMeans(sift, A, kd, n); % km: n x kd

[vlad]=getVladSift(im, km, opt); % opt = 'sift', or 'dsft'

[fv]=getFVSift(im, gmm, opt); % can use vl_fisher()

Plots:

For each image: {sift, dsift} x {vlad, fv} x {kd=[24, 32]} x {n=[64, 128]}, total 16 features per image, plot the 120x120 distances for each feature. [hint: use subplot(2,2,k), 4 plots per fig]

TPR-FPR: for each class, plot one figure with 16 plots,

mAP: one plot/table
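For the getFVSift deliverable, the core encoding step with vl_fisher can look like the following sketch (it assumes x already holds one image's PCA-projected descriptors as a kd x numDesc single matrix, and gmm is the struct returned by getGMM; vl_fisher also has normalization options worth checking in the vl_feat docs):

fv = vl_fisher(x, gmm.m, gmm.v, gmm.p);   % 2 * n * kd Fisher Vector for this image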


Quiz-1

Covers what we reviewed today

Format:

True or False

Multiple Choices

Problem Solving

Extra Credit (25%)

Closed book; a 1-page A4 cheat sheet is allowed.

The cheat sheet needs to be turned in with your name and ID number

The cheat sheet will also be graded, for up to 10 pts

Relax

Quiz is actually on me.


Outline

HW-2

Quiz-1

Review part I: Image Features & Classifiers

Image Formation: Homography

Color Space & Color Features

Filtering and Edge Features

Harris Detector

SIFT

Aggregation: BoW, VLAD, Fisher, and Supervector

Image Retrieval System Metrics: TPR/Precision, FPR, Recall, mAP

Classification: kNN, GMM Bayesian, SVM


Classification in Image Retrieval

A typical image retrieval pipeline

Image Formation -> Feature Computing -> Feature Aggregation -> Classification, backed by a Knowledge/Data Base

Image Formation: Homography, Color space

Feature Computing: Color histogram, Filtering, Edge Detection, HoG, Harris Detector, SIFT

Feature Aggregation: BoW, VLAD, Fisher Vector, Supervector

Image Formation - Geometry

$\mathbf{x} = \mathbf{K}\,[\mathbf{I}\;|\;\mathbf{0}]\,\mathbf{X}: \qquad \begin{bmatrix} u \\ v \\ w \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$

Slide Credit: GaTech Computer Vision class, 2015.

Intrinsic assumptions: unit aspect ratio, optical center at (0,0), no skew of pixels

Extrinsic assumptions: no rotation, camera at (0,0,0)

Image Formation: Homography

In general, a homography H maps 2-D points (in homogeneous coordinates) as x' = Hx:

$\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix} = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix} \begin{bmatrix} x \\ y \\ w \end{bmatrix}$

Defined up to a scale, as [x, y, w] ~ [sx, sy, sw], so H has 8 DoF

Affine Transform, 6 DoF: contains a translation [t1, t2] and an invertible 2x2 affine matrix A:

$H = \begin{bmatrix} a_1 & a_2 & t_1 \\ a_3 & a_4 & t_2 \\ 0 & 0 & 1 \end{bmatrix}$

Similarity Transform, 4 DoF: a rigid transform that preserves distance if s = 1:

$H = \begin{bmatrix} s\cos\theta & -s\sin\theta & t_1 \\ s\sin\theta & s\cos\theta & t_2 \\ 0 & 0 & 1 \end{bmatrix}$
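As a quick illustration (the matrix and point below are made-up values, not from the slide), applying a homography to a point in MATLAB looks like:

H = [1 0.1 5; 0 1 10; 0.001 0 1];  % a hypothetical 3x3 homography
p = H * [20; 30; 1];               % map the point (20, 30) in homogeneous coordinates
p = p / p(3);                      % divide by w' so the result is [x'; y'; 1]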

VL_FEAT Implementation

VL_FEAT has an implementation:


Image Formation – Color Space

RGB Color Space:

8 bit Red, Green, Blue Channels

Luminance and color information are mixed together across the three channels

YUV/YCbCr Color Space

HSV Color Space

Hue: color in polar coord

Saturation

Value
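For example, the standard MATLAB conversions between these color spaces (im assumed to be an RGB image) are:

ycbcr = rgb2ycbcr(im);   % Y (luma) plus Cb, Cr (chroma)
hsv   = rgb2hsv(im);     % H (hue angle), S (saturation), V (value)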


Color Histogram – Fixed Codebook

Color Histogram:

Fixed code book to generate histogram

Example: MPEG 7 Scalable Color Descriptor

HSV Color Space, 16x4x4 partition for bins

Scalability thru quantization and Haar Transform


Color Histogram – Adaptive Codebook

Adaptive Color Histogram

No fixed code book

Example: MPEG 7 Dominant Color Descriptor

Distance Metric

Not index-able


Image Filters

Basic Filter Operations

Region of Support

[Example: a 3x3 averaging (box) filter, with all coefficients equal to 1/9, applied to the image over its 3x3 region of support]


Filter Properties

Linear Filters are:

Shift-Invariant:

f(m-k, n-j)*h = g(m-k, n-j), if f*h=g

Associative:

f*h1*h2 = f*(h1*h2)

this can save a lot of complexity (see the small example after this list)

Distributive:

f*h1 + f*h2 = f*(h1+h2)

useful in SIFT's DoG filtering.
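A small numerical check of the associative property (a sketch with made-up filters; boundary effects are ignored by comparing only the central region):

f  = rand(64);                                   % example image
h1 = fspecial('gaussian', 5, 1.0);               % first filter
h2 = fspecial('gaussian', 5, 1.6);               % second filter
g1 = conv2(conv2(f, h1, 'same'), h2, 'same');    % filter twice
g2 = conv2(f, conv2(h1, h2), 'same');            % pre-combine the filters, filter once
max(max(abs(g1(8:56, 8:56) - g2(8:56, 8:56))))   % ~0 away from the borders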


Edge Detection

The need of smoothing

Gaussian smoothing

Combine differentiation and Gaussian

[Figure: a 1-D signal f filtered with the derivative of a Gaussian; smoothing and differentiation combine into one filter since d/dx (f * g) = f * (dg/dx)]

Image Gradient from DoG Filtering

Image Gradient (at a scale):


The gradient points in the direction of most rapid increase in intensity

The edge strength is given by the gradient magnitude:

The gradient direction is given by:
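For reference (standard definitions, filled in for the missing slide graphics), with ∇f = (∂f/∂x, ∂f/∂y):

$\|\nabla f\| = \sqrt{\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2}, \qquad \theta = \tan^{-1}\!\left(\frac{\partial f/\partial y}{\partial f/\partial x}\right)$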


HOG – Histogram of Oriented Gradients

Generate local histogram per block

Size of HoG: n*m*4*9=36nm, for n x m cells.

[Figure: per-cell gradient angle and magnitude maps are voted into 9 orientation bins (0-19, 20-39, ..., 160-179 degrees), comparing binary voting vs. magnitude-weighted voting; the final HoG feature f concatenates the per-cell histograms h_1, ..., h_9 across cells. Panel: effect of the HOG cell size.]
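A hedged sketch of computing HoG with vl_feat (im is assumed to be a grayscale single image; cellSize = 8 is an arbitrary example; the 'dalaltriggs' variant corresponds to the 4x9 = 36 values per cell counted above):

cellSize = 8;
hog = vl_hog(im, cellSize, 'variant', 'dalaltriggs');   % n x m x 36 HoG array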

Harris Detector

Smoothed second moment matrix:

$\mu(\sigma_I, \sigma_D) = g(\sigma_I) * \begin{bmatrix} I_x^2(\sigma_D) & I_x I_y(\sigma_D) \\ I_x I_y(\sigma_D) & I_y^2(\sigma_D) \end{bmatrix}$

1. Image derivatives I_x, I_y (optionally blur first)

2. Square of derivatives: I_x^2, I_y^2, I_x I_y

3. Gaussian filtering: g(I_x^2), g(I_y^2), g(I_x I_y)

4. Harris function: detect corners, no need to compute the eigenvalues

$R_{harris} = \det[\mu(\sigma_I, \sigma_D)] - \alpha\,[\mathrm{trace}(\mu(\sigma_I, \sigma_D))]^2$

$har = g(I_x^2)\,g(I_y^2) - [g(I_x I_y)]^2 - \alpha\,[g(I_x^2) + g(I_y^2)]^2$

(For a 2x2 matrix M with eigenvalues λ_1, λ_2: det M = λ_1 λ_2 and trace M = λ_1 + λ_2.)
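A minimal MATLAB sketch of these four steps (a didactic illustration, not the lecture's reference code; im is assumed grayscale double, and sigma/alpha are made-up values):

sigma = 1.5; alpha = 0.05;
[Ix, Iy] = gradient(im);                             % 1. image derivatives
g = fspecial('gaussian', 2*ceil(3*sigma)+1, sigma);
Sx2 = imfilter(Ix.^2,  g, 'replicate');              % 2-3. squared derivatives, Gaussian smoothed
Sy2 = imfilter(Iy.^2,  g, 'replicate');
Sxy = imfilter(Ix.*Iy, g, 'replicate');
R = Sx2.*Sy2 - Sxy.^2 - alpha*(Sx2 + Sy2).^2;        % 4. Harris response, no eigen-decomposition
corners = imregionalmax(R) & (R > 0.01*max(R(:)));   % threshold + local maxima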

Harris Detector Steps

Response function -> Threshold -> Local Maxima

Harris Detector NOT scale invariant


Scale Space Theory - Lindeberg

Scale Space Response via Laplacian of Gaussian

The scale is controlled by σ

$\nabla^2 g = \frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2}, \qquad g = e^{-\frac{x^2 + y^2}{2\sigma^2}}$

Characteristic Scale:

[Figure: LoG response of an image structure of radius r at increasing scales σ = 0.8r, 1.2r, 2r, ...; the σ giving the peak response defines the characteristic scale]

SIFT Detector via Difference of Gaussian

However, computing the LoG is expensive: it is a non-separable filter

SIFT uses Difference of Gaussian (DoG) filters to approximate the LoG

Gaussian filters are separable, so this is much faster

Box Filter Approximation: you will see this later.

[Figure: scale space construction by Gaussian filtering and image differencing to form the DoG pyramid]
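A small sketch of one DoG level (illustrative only; sigma and the scale step k are made-up values, and plain Image Processing Toolbox calls stand in for the actual SIFT implementation):

sigma = 1.6; k = sqrt(2);
g1 = fspecial('gaussian', 2*ceil(3*sigma)+1, sigma);
g2 = fspecial('gaussian', 2*ceil(3*k*sigma)+1, k*sigma);
dog = imfilter(im, g2, 'replicate') - imfilter(im, g1, 'replicate');   % DoG ~ (k-1) σ² LoG response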

Peak Strength & Edge Removal

Peak Strength:

Interpolate true DoG response and pixel location by Taylor expansion

Edge Removal:

Re-do a Harris-type detection on the much-reduced set of candidate points to remove edge responses


Rotation Invariance thru Dominant Orientation Coding

Voting for the dominant orientation

Weighted by a Gaussian window to give more emphasis to the gradients closer to the center


SIFT Description

Rotate to dominant direction, divide Image Patch (16x16 pixels) around SIFT point into 4x4 Cells

For each cell, compute an 8-bin edge histogram

Overall, we have 4x4x8=128 dimension

[Figure: spatial-scale space extrema; per-cell angle histogram over 0 to 2π]


SIFT Matching and Repeatability Prediction

SIFT Distance – True Love Test

Not all SIFT are created equal…

Peak strength (DoG response at interpolated position)

[Figure: combined scale/peak strength pmf]

Ratio test: a tentative match is kept only if the distance to the best candidate is sufficiently smaller than to the next one,

$d(s_1^1, s_{k^*}^2)\, /\, d(s_1^1, s_k^2) \leq \theta$
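A hedged sketch of this matching test with vl_feat (im1, im2 are assumed grayscale single images; 1.5 is an example threshold on the ratio of the second-nearest to nearest descriptor distance):

[f1, d1] = vl_sift(im1);
[f2, d2] = vl_sift(im2);
[matches, scores] = vl_ubcmatch(d1, d2, 1.5);   % keep a match only if the 2nd-NN is at least 1.5x farther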


Outline

HW-2

Quiz-1

Review part I: Image Features & Classifiers

Image Formation: Homography

Color Space & Color Features

Filtering and Edge Features

Harris Detector

SIFT

Aggregation: BoW, VLAD, Fisher, and Supervector

Image Retrieval System Metrics: TPR/Precision, FPR, Recall, mAP

Classification: kNN, GMM Bayesian, SVM


Why Aggregation ?

Curse of Dimensionality

Decision Boundary / Indexing


Histogram Aggregation: Bag-of-Words

Codebook:

Feature space: R^d; run k-means to get k centroids {μ_1, μ_2, ..., μ_k}

BoW Hard Encoding:

For n feature points {x_1, x_2, ..., x_n}, the assignment matrix is k x n, with only one non-zero entry per column

Aggregated dimension: k
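A hedged sketch of BoW hard encoding with vl_feat (km: kd x k centroids from vl_kmeans, x: kd x n descriptors of one image, both single; names are assumptions):

kdtree = vl_kdtreebuild(km);
nn = vl_kdtreequery(kdtree, km, x);                  % nearest centroid index per descriptor
bow = accumarray(double(nn(:)), 1, [size(km, 2) 1]); % count votes per codeword
bow = bow / sum(bow);                                % k-bin normalized histogram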


Histogram Aggregation: Kernel Code Book Soft Encoding

Kernel Code Book Soft Encoding

Kernel affinity: $K(x_j, \mu_k) = e^{-k\|x_j - \mu_k\|^2}$

Assignment matrix: $A_{j,k} = K(x_j, \mu_k)\, /\, \sum_k K(x_j, \mu_k)$

Encoding, k-dimensional: $X(k) = \frac{1}{n} \sum_j A_{j,k}$

VLAD- Vector of Locally Aggregated Descriptors

Aggregate feature difference from the codebook

Hard assignment by finding the NN of each feature {x_j} among the codebook centroids {μ_k}

Compute aggregated differences

L2 normalize

Final feature: k x d

[Figure: VLAD aggregation over k = 5 Voronoi cells: ① assign descriptors to the nearest centroid, ② compute x - μ_i, ③ v_i = sum of x - μ_i for cell i]

$v_k = \sum_{\forall j \; \mathrm{s.t.}\; NN(x_j) = \mu_k} (x_j - \mu_k)$

$v_k = v_k / \|v_k\|_2$

Fisher Vector Aggregation

FV encoding

Richer characterization of the codebook with a GMM: {μ_k, Σ_k, π_k}

Gradient w.r.t. the mean, for GMM component k, dimension j = 1..D

In the end, we have a 2K x D aggregation of the derivatives w.r.t. the means and variances:

$FV = [u_1, u_2, \ldots, u_K, v_1, v_2, \ldots, v_K]^T$

Normalize by the Fisher Kernel

Supervector Aggregation

Assume we have a UBM GMM model

λ_UBM = {P_k, μ_k, Σ_k},

with identical priors and covariances

Then for two utterance samples a and b, with GMM models

λ_a = {P_k, μ_k^a, Σ_k}, λ_b = {P_k, μ_k^b, Σ_k},

the SV distance is

$K(\lambda_a, \lambda_b) = \sum_k \left( \sqrt{P_k}\, \Sigma_k^{-\frac{1}{2}}\, \mu_k^a \right)^T \left( \sqrt{P_k}\, \Sigma_k^{-\frac{1}{2}}\, \mu_k^b \right)$

It means the means of the two models are compared under the Mahalanobis distance metric induced by the UBM covariances

This is also a linear kernel function scaled by the UBM covariances

AKULA – Adaptive KLUster Aggregation

AKULA Descriptor:

No fixed codebook !

Outer loop: subspace optimization

Inner loop: cluster centroids + SIFT count

$A_1 = \{y_1^{c_1}, y_2^{c_1}, \ldots, y_k^{c_1};\; p_1^{c_1}, p_2^{c_1}, \ldots, p_k^{c_1}\}$

$A_2 = \{y_1^{c_2}, y_2^{c_2}, \ldots, y_k^{c_2};\; p_1^{c_2}, p_2^{c_2}, \ldots, p_k^{c_2}\}$

Distance metric:

Minimum centroid distance, weighted by the SIFT count (a kind of manifold tangent distance):

$d(A_1, A_2) = \frac{1}{k} \sum_{j=1}^{k} d_{min}^1(j)\, w_{min}^1(j) + \frac{1}{k} \sum_{i=1}^{k} d_{min}^2(i)\, w_{min}^2(i)$

$d_{min}^1(j) = \min_i d_{j,i}, \qquad d_{min}^2(i) = \min_j d_{j,i}$

$w_{min}^1(j) = w_{j,i^*},\; i^* = \arg\min_i d_{j,i}, \qquad w_{min}^2(i) = w_{j^*,i},\; j^* = \arg\min_j d_{j,i}$

Image Retrieval Performance Metrics

True Positive Rate

False Positive Rate

Precision

Recall

F-Measure


TPR = TP/(TP+FN)

FPR = FP/(FP+TN)

Precision =TP/(TP + FP),

Recall = TP/(TP + FN)

F-measure = 2*(precision*recall)/(precision + recall)


Retrieval Performance: Mean Average Precision

mAP measures the retrieval performance across all queries


mAP example

mAP is computed across all queries as the mean of each query's average precision, i.e., the precision averaged over the recall levels (the area under the precision-recall curve)


Compute TPR-FPR

ROC Curve (Receiver Operating Characteristics)


Confusion/distance matrix: d(j,k)

Ground Truth: m(j,k)

π‘š 𝑗, π‘˜ = 1, π‘‘π‘œπ‘ 𝑗 π‘Žπ‘›π‘‘ π‘˜ π‘Žπ‘Ÿπ‘’ π‘šπ‘Žπ‘‘π‘β„Ž0, π‘›π‘œπ‘‘ π‘šπ‘Žπ‘‘π‘β„Ž


Classification in Image Retrieval

A typical image retrieval pipeline


Image Formation

Feature Computing

Feature Aggregation

Classification

Knowledge/Data Base


kNN Classifier

kNN is a nonlinear classifier

Operations: majority vote among the k nearest neighbors

Performance bounds on k:

asymptotically, the 1-NN error rate is no worse than twice the Bayes error rate

As k increases, the bound improves, but not by much

Accelerate NN retrieval operation

Kd-tree:

o a space-partitioning scheme that limits the number of data-point comparisons

Kd-tree supported kNN search:

o Approx KNN search

o Kd-Forest supported Approx Search
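A hedged sketch of kd-tree accelerated kNN classification with MATLAB's Statistics Toolbox (X, y, q are placeholder names for the n x d training features, their labels, and a query row):

mdl = KDTreeSearcher(X);            % build a kd-tree over the training features
idx = knnsearch(mdl, q, 'K', 5);    % indices of the 5 nearest neighbors of q
rec_label = mode(y(idx));           % majority vote over the neighbors' labels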


Gaussian Generative Model Classifier

The decision boundary is shaped by the relative shapes of the class covariance matrices

Matlab:

[rec_label, err]=classify(q, x, y, method);

method = 'quadratic'


The Support Vector Solution

Why is it called a Support Vector Machine? The decision function f(x) = Σ_i α_i y_i K(x_i, x) + b depends only on the training samples with α_i > 0: the support vectors.

Support Vector Machine

Matlab:


% train svm
svm = svmtrain(x, y, 'showplot', true);

% svm classification
for k = 1:5
    figure(41); hold on; grid on; axis([0 1 0 1]);
    [u, v] = ginput(1);
    q = [u, v];
    plot(u, v, '*b');
    rec = svmclassify(svm, q);
    fprintf('\n recognized as %c', rec);
end


Summary

Image Analysis & Retrieval Part I

Image Formation

Color and Texture Features

Interest Points Detection – Harris Detector

Invariant Interest Point Detection – SIFT

Aggregation

Classification tools: knn, gmm, svm

Performance metrics:

o tp, fp, tn, fn, tpr/precision, fpr, recall, mAP

Part II: Holistic approach

Just throw the pixels at the algorithm and let it figure out what the good features are.

