66
CS598:VISUAL INFORMATION RETRIEVAL Lecture III: Image Representation: Invariant Local Image Descriptors

CS598:Visual information Retrieval

  • Upload
    marlin

  • View
    28

  • Download
    0

Embed Size (px)

DESCRIPTION

CS598:Visual information Retrieval. Lecture III: Image Representation: Invariant Local Image Descriptors. Recap of Lecture II. Color, texture, descriptors Color histogram Color correlogram LBP descriptors Histogram of oriented gradient Spatial pyramid matching - PowerPoint PPT Presentation

Citation preview

Page 1: CS598:Visual information Retrieval

CS598:VISUAL INFORMATION RETRIEVALLecture III: Image Representation: Invariant Local Image Descriptors

Page 2: CS598:Visual information Retrieval

RECAP OF LECTURE II Color, texture, descriptors

Color histogram Color correlogram LBP descriptors Histogram of oriented gradient Spatial pyramid matching

Distance & Similarity measure Lp distances Chi-Square distances KL distances EMD distances Histogram intersection

Page 3: CS598:Visual information Retrieval

LECTURE III: PART ILocal Feature Detector

Page 4: CS598:Visual information Retrieval

OUTLINE Blob detection

Brief of Gaussian filter Scale selection Lapacian of Gaussian (LoG) detector Difference of Gaussian (DoG) detector Affine co-variant region

Page 5: CS598:Visual information Retrieval

GAUSSIAN KERNEL

• Constant factor at front makes volume sum to 1 (can be ignored when computing the filter values, as we should renormalize weights to sum to 1 in any case)

0.003 0.013 0.022 0.013 0.0030.013 0.059 0.097 0.059 0.0130.022 0.097 0.159 0.097 0.0220.013 0.059 0.097 0.059 0.0130.003 0.013 0.022 0.013 0.003

5 x 5, = 1

Source: C. Rasmussen

Page 6: CS598:Visual information Retrieval

GAUSSIAN KERNEL

• Standard deviation : determines extent of smoothingSource: K. Grauman

σ = 2 with 30 x 30 kernel

σ = 5 with 30 x 30 kernel

Page 7: CS598:Visual information Retrieval

CHOOSING KERNEL WIDTH• The Gaussian function has infinite support,

but discrete filters use finite kernels

Source: K. Grauman

Page 8: CS598:Visual information Retrieval

CHOOSING KERNEL WIDTH• Rule of thumb: set filter half-width to about 3σ

Page 9: CS598:Visual information Retrieval

GAUSSIAN VS. BOX FILTERING

Page 10: CS598:Visual information Retrieval

GAUSSIAN FILTERS• Remove “high-frequency” components from

the image (low-pass filter)• Convolution with self is another Gaussian

So can smooth with small- kernel, repeat, and get same result as larger- kernel would have

Convolving two times with Gaussian kernel with std. dev. σ is same as convolving once with kernel with std. dev.

• Separable kernel Factors into product of two 1D Gaussians

Source: K. Grauman

2

Page 11: CS598:Visual information Retrieval

SEPARABILITY OF THE GAUSSIAN FILTER

Source: D. Lowe

Page 12: CS598:Visual information Retrieval

SEPARABILITY EXAMPLE

*

*

=

=

2D convolution(center location only)

Source: K. Grauman

The filter factorsinto a product of 1D

filters:

Perform convolutionalong rows:

Followed by convolutionalong the remaining column:

Page 13: CS598:Visual information Retrieval

WHY IS SEPARABILITY USEFUL?• What is the complexity of filtering an n×n

image with an m×m kernel? O(n2 m2)

• What if the kernel is separable?O(n2 m)

Page 14: CS598:Visual information Retrieval

OUTLINE Blob detection

Brief of Gaussian filter Scale selection Lapacian of Gaussian (LoG) detector Difference of Gaussian (DoG) detector Affine co-variant region

Page 15: CS598:Visual information Retrieval

BLOB DETECTION IN 2D Laplacian of Gaussian: Circularly symmetric

operator for blob detection in 2D

2

2

2

22

yg

xgg

Page 16: CS598:Visual information Retrieval

BLOB DETECTION IN 2D Laplacian of Gaussian: Circularly symmetric

operator for blob detection in 2D

2

2

2

222

norm yg

xgg Scale-normalized:

Page 17: CS598:Visual information Retrieval

• At what scale does the Laplacian achieve a maximum response to a binary circle of radius r?

SCALE SELECTION

r

image Laplacian

Page 18: CS598:Visual information Retrieval

• At what scale does the Laplacian achieve a maximum response to a binary circle of radius r?

• To get maximum response, the zeros of the Laplacian have to be aligned with the circle

• The Laplacian is given by:

• Therefore, the maximum response occurs at

SCALE SELECTION

r

image

62/)(222 2/)2(222

yxeyx .2/r

circle

Laplacian

0

Page 19: CS598:Visual information Retrieval

CHARACTERISTIC SCALE• We define the characteristic scale of a blob

as the scale that produces peak of Laplacian response in the blob center

characteristic scale

T. Lindeberg (1998). "Feature detection with automatic scale selection." International Journal of Computer Vision 30 (2): pp 77--116.

Page 20: CS598:Visual information Retrieval

SCALE-SPACE BLOB DETECTOR1. Convolve image with scale-normalized

Laplacian at several scales

Page 21: CS598:Visual information Retrieval

SCALE-SPACE BLOB DETECTOR: EXAMPLE

Page 22: CS598:Visual information Retrieval

SCALE-SPACE BLOB DETECTOR: EXAMPLE

Page 23: CS598:Visual information Retrieval

SCALE-SPACE BLOB DETECTOR1. Convolve image with scale-normalized

Laplacian at several scales2. Find maxima of squared Laplacian response

in scale-space

Page 24: CS598:Visual information Retrieval

SCALE-SPACE BLOB DETECTOR: EXAMPLE

Page 25: CS598:Visual information Retrieval

OUTLINE Blob detection

Brief of Gaussian filter Scale selection Lapacian of Gaussian (LoG) detector Difference of Gaussian (DoG) detector Affine co-variant region

Page 26: CS598:Visual information Retrieval

Approximating the Laplacian with a difference of Gaussians:

2 ( , , ) ( , , )xx yyL G x y G x y

( , , ) ( , , )DoG G x y k G x y

(Laplacian)

(Difference of Gaussians)

EFFICIENT IMPLEMENTATION

Page 27: CS598:Visual information Retrieval

EFFICIENT IMPLEMENTATION

David G. Lowe. "Distinctive image features from scale-invariant keypoints.” IJCV 60 (2), pp. 91-110, 2004.

Page 28: CS598:Visual information Retrieval

INVARIANCE AND COVARIANCE PROPERTIES• Laplacian (blob) response is invariant w.r.t.

rotation and scaling• Blob location and scale is covariant w.r.t.

rotation and scaling• What about intensity change?

Page 29: CS598:Visual information Retrieval

OUTLINE Blob detection

Brief of Gaussian filter Scale selection Lapacian of Gaussian (LoG) detector Difference of Gaussian (DoG) detector Affine co-variant region

Page 30: CS598:Visual information Retrieval

ACHIEVING AFFINE COVARIANCE• Affine transformation approximates viewpoint

changes for roughly planar objects and roughly orthographic cameras

Page 31: CS598:Visual information Retrieval

ACHIEVING AFFINE COVARIANCE

RRIIIIII

yxwMyyx

yxx

yx

2

112

2

, 00

),(

direction of the slowest

change

direction of the fastest

change

(max)-1/2

(min)-1/2

Consider the second moment matrix of the window containing the blob:

const][

vu

Mvu

Recall:

This ellipse visualizes the “characteristic shape” of the window

Page 32: CS598:Visual information Retrieval

AFFINE ADAPTATION EXAMPLE

Scale-invariant regions (blobs)

Page 33: CS598:Visual information Retrieval

AFFINE ADAPTATION EXAMPLE

Affine-adapted blobs

Page 34: CS598:Visual information Retrieval

FROM COVARIANT DETECTION TO INVARIANT DESCRIPTION• Geometrically transformed versions of the same neighborhood

will give rise to regions that are related by the same transformation

• What to do if we want to compare the appearance of these image regions?

Normalization: transform these regions into same-size circles

Page 35: CS598:Visual information Retrieval

• Problem: There is no unique transformation from an ellipse to a unit circle

We can rotate or flip a unit circle, and it still stays a unit circle

AFFINE NORMALIZATION

Page 36: CS598:Visual information Retrieval

• To assign a unique orientation to circular image windows:

Create histogram of local gradient directions in the patch

Assign canonical orientation at peak of smoothed histogram

ELIMINATING ROTATION AMBIGUITY

0 2

Page 37: CS598:Visual information Retrieval

FROM COVARIANT REGIONS TO INVARIANT FEATURES

Extract affine regions Normalize regionsEliminate rotational

ambiguityCompute appearance

descriptors

SIFT (Lowe ’04)

Page 38: CS598:Visual information Retrieval

INVARIANCE VS. COVARIANCE Invariance:

features(transform(image)) = features(image) Covariance:

features(transform(image)) =transform(features(image))

Covariant detection => invariant description

Page 39: CS598:Visual information Retrieval

LECTURE III: PART ILearning Local Feature Descriptor

David G. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”, International Journal of Computer Vision, Vol. 60, No. 2, pp. 91-110• 6291 as of 02/28/2010; 22481 as of 02/03/2014• Our goal is to design the best local image descriptors

in the world.

Page 40: CS598:Visual information Retrieval

Panoramic stitching [Brown et al. CVPR05]

3D reconstruction [Snavely et al. SIGGRAPH06]

Image databases [Mikolajczyk & Schmid ICCV01]

Object or location recognition [Nister & Stewenius CVPR06] [Schindler et al. CVPR07]

Robot navigation [Deans et al. AC05]

Courtesy of Seitz & Szeliski

Real-worldFace recognition [Wright & Hua CVPR09][Hua & Akbarzadeh ICCV09]

Page 41: CS598:Visual information Retrieval

TYPICAL MATCHING PROCESS

Image 1 Image 2

Image 1 Image 2

Interest point/region detection (sparse)

Dense sampling of image patches

Page 42: CS598:Visual information Retrieval

TYPICAL MATCHING PROCESSImage 1 Image 2

P

Q

Descriptor space

Page 43: CS598:Visual information Retrieval

PROBLEM TO SOLVE

To obtain the most discriminative, compact, and computationally efficient local image descriptors. How can we get ground truth data? What is the form of the descriptor function f(.)? What is the measure for optimality?

Learning a function of a local image patchdescriptor = f ( )

s.t. a nearest neighbor classifier is optimal

Page 44: CS598:Visual information Retrieval

HOW CAN WE GET GROUND TRUTH DATA?

Page 45: CS598:Visual information Retrieval

TRAINING DATA

[Goesele et al. - ICCV’07] [Snavely et al. - SIGGRAPH’06 ]

3D PointCloud

Multiview stereo = Training data

Page 46: CS598:Visual information Retrieval

TRAINING DATA

[Goesele et al. - ICCV’07] [Snavely et al. - SIGGRAPH’06 ]

3D PointCloud

Page 47: CS598:Visual information Retrieval

TRAINING DATA3D PointCloud

[Goesele et al. - ICCV’07] [Snavely et al. - SIGGRAPH’06 ]

Page 48: CS598:Visual information Retrieval

TRAINING DATA3D PointCloud

[Goesele et al. - ICCV’07] [Snavely et al. - SIGGRAPH’06 ]

Page 49: CS598:Visual information Retrieval

Statue of liberty (New York) – LibertyNotre Dame (Paris) – Notre DameHalf Dome (Yosemite) - Yosemite http://www.cs.ubc.ca/~mbrown/patchdata/patchdata.html

Liberty Yosemite

Page 50: CS598:Visual information Retrieval

WHAT IS THE FORM OF THE DESCRIPTOR FUNCTION?

Page 51: CS598:Visual information Retrieval

DESCRIPTOR ALGORITHMS

Algorithm

Normalizedimage patch

Descriptorvector

Gradientsquantized to

k orientations

NormalizeSummation

n grid region

[ SIFT – Lowe ICCV’99 ]

k × n vector

Page 52: CS598:Visual information Retrieval

DESCRIPTOR ALGORITHMS

Algorithm

Normalizedimage patch

Descriptorvector

GradientsQuantized

to k Orientation

s

Normalize(plus PCA)

[ GLOH – Mikolajzcyk & Schmid PAMI’05 ]

Summationn Log-Polar region

Page 53: CS598:Visual information Retrieval

DESCRIPTOR ALGORITHMS

Algorithm

Normalizedimage patch

Descriptorvector

CreateEdge Map Normalize

[ Shape Context – Belongie et al, NIPS’00 ]

Summationn Log-Polar region

Page 54: CS598:Visual information Retrieval

DESCRIPTOR ALGORITHMS

Algorithm

Normalizedimage patch

Descriptorvector

Grey value quantized into k bins

Normalize

[ Spin Image – Lazebnik et al, CVPR’03 ]

Summationn circular region

Page 55: CS598:Visual information Retrieval

DESCRIPTOR ALGORITHMS

Algorithm

Normalizedimage patch

Descriptorvector

Featuredetector Normalize

T S N[ Geometric Blur – Berg& Malik CVPR’01 ][ DAISY - Tola et al. CVPR’08]

Summationn Gaussian region

Page 56: CS598:Visual information Retrieval

OUR DESCRIPTOR ALGORITHM

Descriptorvector

T S N E Q

Page 57: CS598:Visual information Retrieval

S-BLOCK: PICKING THE BEST DAISY

S-Block: DAISY

Descriptorvector

T S N E Q

1r6s 1r8s 2r6s 2r8s

Page 58: CS598:Visual information Retrieval

T-BLOCK

DescriptorVector

T S N E Q T-Block:

• T1: Gradient bi-linearly weighted orientation binning• T2: Rectified gradient {| x|- x , | x|+ x , | y|- y , | y|+

y}• T3: Steerable filters

* <> 0 T3-4th-4

Page 59: CS598:Visual information Retrieval

N-E-Q BLOCKS

N-Block: Uniform v.s. SIFT-like normalization E-Block: Principle component analysis (PCA) Q-block: qi = βLvi + α, β is learnt from data, α =

0.5 if L is an odd number and α = 0 otherwise.

DescriptorVector

T S N E Q

Page 60: CS598:Visual information Retrieval

WHAT IS THE OPTIMAL CRITERION?

Page 61: CS598:Visual information Retrieval

DISCRIMINATIVE LEARNING AND OPTIMAL CRITERION

ST N

Parameterstraining pairs

False positive%Tr

ue p

ositi

ve %

Update Parameters(Powell)

Descriptor distances

• E-Block and Q-Block are optimized independently

95

α (error rate)

100,000 match/non-match

Page 62: CS598:Visual information Retrieval

SIFT-LIKE NORMALIZATION & PCA

SIFT like normalization has a clear sweet spot. PCA can usually reduce the dimension to 30~50.

Uniform normalization

Page 63: CS598:Visual information Retrieval

THE MOST DISCRIMINATIVE DAISY

Steerable filters at two spatial scales with PCA T3-2nd-6-2r6s + 2-scale + PCA: 42 dimension, 9.5% 2 × 6 × 2 × 2 × (2 × 6+1) = 624 dimensions before PCA

2r6s

Page 64: CS598:Visual information Retrieval

THE MOST COMPACT DAISY

T3-2nd-4-2r8s13 bytes, 13.2%

T2-4-1r8s-PCA7.5 bytes, 24.4%

SIFT128 bytes,

26.1%

• 2 bits /dimension is sufficient before PCA• 4 bits /dimension is needed after PCA• http://www.cs.ubc.ca/~mbrown/patchdata/patchdata.html

Page 65: CS598:Visual information Retrieval

SOME OF THE ERRORS MADE … …

The world is repeating itself….

Page 66: CS598:Visual information Retrieval

SUMMARY Blob detection

Brief of Gaussian filter Scale selection Lapacian of Gaussian (LoG) detector Difference of Gaussian (DoG) detector Affine co-variant region

Learning local descriptors How can we get ground-truth data? What is the form of the descriptor function? What is the optimal criterion? How do we optimize it?