Computer Vision Lecture 4 FEATURE DETECTION. Feature Detection Feature Description Feature Matching...

Preview:

Citation preview

Computer Vision

Lecture 4FEATURE DETECTION

Feature Detection Feature Description Feature Matching

Today’s Topics

2

How to find camera parameters? Where is the camera, where is it directed at? What is the movement of the camera?

Where are the objects located in 3D? What are the dimensions of objects in 3D? What is the 3D structure of a scene?

How to process stereo video? How to detect and match image features? How to stitch images?

Questions of Interest

3

Introduction to Feature Detection

4

Image Mozaics

Object Recognition

Image Matching

Image Matching using Feature Points

NASA Mars Rover images

Feature Points are Used for …

Image alignment (e.g., mosaics) 3D reconstruction Motion tracking Object recognition Indexing and database retrieval Robot navigation …

Advantages of Local Features

Locality features are local, so robust to occlusion and clutter

Distinctiveness can differentiate a large database of objects

Quantity hundreds or thousands in a single image

Efficiency real-time performance achievable

Generality exploit different types of features in different situations

Point vs. Line Features

Processing Stages

Feature detection/extraction (look for stable/accurate features)

Feature description (aim for invariance to transformations)

Feature matching (for photo collection) or tracking (for video)

Feature Detection

13

Invariant Local Features

Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters

Features Descriptors

Choosing interest pointsWhere would you tell your friend to meet you?

Choosing interest pointsWhere would you tell your friend to meet you?

Stable (Good) Features: High Accuracy in Localization

Stable (Good) Features

“flat” region:no change in all directions

“edge”: no change along the edge direction

“corner”:significant change in all directions (uniqueness)

Good featureAperture problem

Autocorrelation: Indicator of “Good”ness

Wyx

yxIvyuxIvuR),(

),(),(),(

Autocorrelation Calculation

Autocorrelation can be approximated by sum-squared-difference (SSD):

Wyx

yxIvyuxIvuR),(

),(),(),(

Corner Detection: Mathematics

2

,

( , ) ( , ) ( , ) ( , )x y

E u v w x y I x u y v I x y

Change in appearance of window w(x,y) for the shift [u,v]:

I(x, y)E(u, v)

E(3,2)

w(x, y)

Corner Detection: Mathematics

2

,

( , ) ( , ) ( , ) ( , )x y

E u v w x y I x u y v I x y I(x, y)

E(u, v)

E(0,0)

w(x, y)

Change in appearance of window w(x,y) for the shift [u,v]:

Corner Detection: Mathematics

2

,

( , ) ( , ) ( , ) ( , )x y

E u v w x y I x u y v I x y

IntensityShifted intensity

Window function

orWindow function w(x,y) =

Gaussian1 in window, 0 outside

Source: R. Szeliski

Change in appearance of window w(x,y) for the shift [u,v]:

Corner Detection: Mathematics

2

,

( , ) ( , ) ( , ) ( , )x y

E u v w x y I x u y v I x y We want to find out how this function behaves for small shifts

Change in appearance of window w(x,y) for the shift [u,v]:

E(u, v)

Corner Detection: Mathematics

2

,

( , ) ( , ) ( , ) ( , )x y

E u v w x y I x u y v I x y

Local quadratic approximation of E(u,v) in the neighborhood of (0,0) is given by the second-order Taylor expansion:

v

u

EE

EEvu

E

EvuEvuE

vvuv

uvuu

v

u

)0,0()0,0(

)0,0()0,0(][

2

1

)0,0(

)0,0(][)0,0(),(

We want to find out how this function behaves for small shifts

Change in appearance of window w(x,y) for the shift [u,v]:

Taylor Expansion

Corner Detection: Mathematics 2

,

( , ) ( , ) ( , ) ( , )x y

E u v w x y I x u y v I x y Second-order Taylor expansion of E(u,v) about (0,0):

v

u

EE

EEvu

E

EvuEvuE

vvuv

uvuu

v

u

)0,0()0,0(

)0,0()0,0(][

2

1

)0,0(

)0,0(][)0,0(),(

),(),(),(),(2

),(),(),(2),(

),(),(),(),(2

),(),(),(2),(

),(),(),(),(2),(

,

,

,

,

,

vyuxIyxIvyuxIyxw

vyuxIvyuxIyxwvuE

vyuxIyxIvyuxIyxw

vyuxIvyuxIyxwvuE

vyuxIyxIvyuxIyxwvuE

xyyx

xyyx

uv

xxyx

xxyx

uu

xyx

u

Corner Detection: Mathematics 2

,

( , ) ( , ) ( , ) ( , )x y

E u v w x y I x u y v I x y Second-order Taylor expansion of E(u,v) about (0,0):

),(),(),(2)0,0(

),(),(),(2)0,0(

),(),(),(2)0,0(

0)0,0(

0)0,0(

0)0,0(

,

,

,

yxIyxIyxwE

yxIyxIyxwE

yxIyxIyxwE

E

E

E

yxyx

uv

yyyx

vv

xxyx

uu

v

u

v

u

EE

EEvu

E

EvuEvuE

vvuv

uvuu

v

u

)0,0()0,0(

)0,0()0,0(][

2

1

)0,0(

)0,0(][)0,0(),(

Corner Detection: Mathematics 2

,

( , ) ( , ) ( , ) ( , )x y

E u v w x y I x u y v I x y Second-order Taylor expansion of E(u,v) about (0,0):

v

u

yxIyxwyxIyxIyxw

yxIyxIyxwyxIyxw

vuvuE

yxy

yxyx

yxyx

yxx

,

2

,

,,

2

),(),(),(),(),(

),(),(),(),(),(

][),(

),(),(),(2)0,0(

),(),(),(2)0,0(

),(),(),(2)0,0(

0)0,0(

0)0,0(

0)0,0(

,

,

,

yxIyxIyxwE

yxIyxIyxwE

yxIyxIyxwE

E

E

E

yxyx

uv

yyyx

vv

xxyx

uu

v

u

The quadratic approximation simplifies to

2

2,

( , ) x x y

x y x y y

I I IM w x y

I I I

where M is a second moment matrix computed from image derivatives:

v

uMvuvuE ][),(

M

Corner Detection: Mathematics

yyyx

yxxx

IIII

IIIIyxwM ),(

x

II x

y

II y

y

I

x

III yx

Corners as distinctive interest points

2 x 2 matrix of image derivatives (averaged in neighborhood of a point).

Notation:

The surface E(u,v) is locally approximated by a quadratic form. Let’s try to understand its shape.

Interpreting the second moment matrix

v

uMvuvuE ][),(

yx yyx

yxx

III

IIIyxwM

,2

2

),(

2

1

,2

2

0

0),(

yx yyx

yxx

III

IIIyxwM

First, consider the axis-aligned case (gradients are either horizontal or vertical)

If either λ is close to 0, then this is not a corner, so look for locations where both are large.

Interpreting the second moment matrix

Consider a horizontal “slice” of E(u, v):

Interpreting the second moment matrix

This is the equation of an ellipse.

const][

v

uMvu

Consider a horizontal “slice” of E(u, v):

Interpreting the second moment matrix

This is the equation of an ellipse.

RRM

2

11

0

0

The axis lengths of the ellipse are determined by the eigenvalues and the orientation is determined by R

direction of the slowest change

direction of the fastest change

(max)-1/2

(min)-1/2

const][

v

uMvu

Diagonalization of M:

Interpreting the eigenvalues

1

2

“Corner”1 and 2 are large,

1 ~ 2;

E increases in all directions

1 and 2 are small;

E is almost constant in all directions

“Edge” 1 >> 2

“Edge” 2 >> 1

“Flat” region

Classification of image points using eigenvalues of M:

Corner response function

“Corner”R > 0

“Edge” R < 0

“Edge” R < 0

“Flat” region

|R| small

22121

2 )()(trace)det( MMR

α: constant (0.04 to 0.06)

Harris corner detector

1) Compute M matrix for each image window to get their cornerness scores.

2) Find points whose surrounding window gave large corner response (f> threshold)

3) Take the points of local maxima, i.e., perform non-maximum suppression

C.Harris and M.Stephens. “A Combined Corner and Edge Detector.” Proceedings of the 4th Alvey Vision Conference: pages 147—151, 1988. 

Harris Detector [Harris88]

Second moment matrix

)()(

)()()(),( 2

2

DyDyx

DyxDxIDI III

IIIg

39

1. Image derivatives

2. Square of derivatives

3. Gaussian filter g(I)

Ix Iy

Ix2 Iy

2 IxIy

g(Ix2) g(Iy

2) g(IxIy)

222222 )]()([)]([)()( yxyxyx IgIgIIgIgIg

])),([trace()],(det[ 2DIDIhar

4. Cornerness function – both eigenvalues are strong

har5. Non-maxima suppression

1 2

1 2

det

trace

M

M

(optionally, blur first)

Use Sobel Operator for Gradient Computation

Gaussian derivative of Gaussian

-1 0 1

-2 0 2

-1 0 1

1 2 1

0 0 0

-1 -2 -1

Horizontal derivative Vertical derivative

Eigenvalues and Eigenvectorsof the Auto-correlation Matrix

where and are the two eigenvalues of .

The eigenvector corresponding to gives the direction of largest increase in E,

while the eigenvector corresponding to gives the direction of smallest increase in E.

e

e

lower limit upper limit

How are +, e+, -, and e+ relevant for feature detection?

We want E(u,v) to be large for small shifts in all directions Thus, the minimum of E(u,v) should be large, this minimum is given by

the smaller eigenvalue - of M

Feature Detection Algorithm• Compute the gradient at each point in the image• Create the M matrix for each point from the gradients in a window • Compute the eigenvalues of each M• Find points with large response (- > threshold)

• Choose those points as features where - is a local maximum

The Harris Operator

• Called the “Harris Corner Detector” or “Harris Operator”

• Very similar to - but less expensive (no square root)

• Lots of other detectors, this is one of the most popular

2211

21122211

)(tr

)det(

aa

aaaa

A

Af

2221

1211

aa

aaA

The Harris Operator

Harris operator

Harris Detector Example

f value (red = high, blue = low)

Threshold (f > value)

Find Local Maxima of f

Harris Features (in red)

Many Existing Detectors Available

K. Grauman, B. Leibe

Hessian & Harris [Beaudet ‘78], [Harris ‘88]Laplacian, DoG [Lindeberg ‘98], [Lowe 1999]Harris-/Hessian-Laplace [Mikolajczyk & Schmid ‘01]Harris-/Hessian-Affine[Mikolajczyk & Schmid ‘04]EBR and IBR [Tuytelaars & Van Gool ‘04] MSER [Matas ‘02]Salient Regions [Kadir & Brady ‘01] Others…

Feature Description

56

Feature Descriptor

How can we extract feature descriptors that are invariant to variations (transformations) and yet still discriminative enough to establish correct correspondences?

?

Transformation Invariance

Find features that are invariant to transformations geometric invariance: translation, rotation, scale photometric invariance: brightness, exposure

Transformation Invariance

We would like to find the same features regardless of the transformation between images. This is called transformational invariance. Most feature methods are designed to be invariant to

• Translation, 2D rotation, scale They can usually also handle

• Limited affine transformations (some are fully affine invariant)

• Limited 3D rotations (SIFT works up to about 60 degrees)

• Limited illumination/contrast changes

Types of invariance

Illumination

Illumination Scale

Types of invariance

Illumination Scale Rotation

Types of invariance

Illumination Scale Rotation Affine

Types of invariance

Illumination Scale Rotation Affine Full Perspective

Types of invariance

How to Achieve Invariance

1. Make sure the detector is invariant Harris is invariant to translation and rotation Scale is trickier

• A common approach is to detect features at many scales using a Gaussian pyramid (e.g., MOPS)

• More sophisticated methods find “the best scale” to represent each feature (e.g., SIFT)

2. Design an invariant feature descriptor1. A descriptor captures the information in a region around the detected

feature point

2. The simplest descriptor: a square window of pixels 1. What’s this invariant to?

3. Let’s look at some better approaches…

Rotational Invariance for Feature Descriptors Find dominant orientation of the image patch. This is given by e+, the

eigenvector of M corresponding to +

Rotate the patch to horizontal direction Intensity normalize the window by subtracting the mean, dividing by the

standard deviation in the window

CSE 576: Computer Visio

8 pixels40 pixels

Adapted from slide by Matthew Brown

Scale Invariance

Detecting features at the finest stable scale possible may not always be appropriate

e.g., when matching images with little high frequency detail (e.g., clouds), fine-scale features may not exist.

One solution to the problem is to extract features at a variety of scales

e.g., by performing the same operations at multiple resolutions in a pyramid and then matching features at the same level.

This kind of approach is suitable when the images being matched do not undergo large scale changes

e.g., when matching successive aerial images taken from an airplane or stitching panoramas taken with a fixed-focal-length camera.

Scale Invariance for Feature Descriptors

Scale Invariant Detection

Key idea: find scale that gives local maximum of f f is a local maximum in both position and scale

Scale Invariant Feature Transform (SIFT) Features

In order to detect the local maxima and minima of f, each sample point is compared to its eight neighbors in the current image and nine neighbors in the scale above and below It is selected only if it is larger than all of

these neighbors or smaller than all of them. Instead of extracting features at many

different scales and then matching all of them, it is more efficient to extract features that are stable in both location and scale.

• Take 16x16 square window around detected feature

• Compute edge orientation (angle of the gradient - 90) for each pixel

• Throw out weak edges (threshold gradient magnitude)

• Create histogram of surviving edge orientations

Scale Invariant Feature Transform

Adapted from slide by David Lowe

0 2

angle histogram

SIFT Descriptor

• Divide the 16x16 window into a 4x4 grid of cells (2x2 case shown below)

• Compute an orientation histogram for each cell

• 16 cells * 8 orientations = 128 dimensional descriptor

Adapted from slide by David Lowe

Properties of SIFTExtraordinarily robust matching technique

Can handle changes in viewpoint• Up to about 60 degree out of plane rotation

Can handle significant changes in illumination• Sometimes even day vs. night (below)

Fast and efficient—can run in real time Lots of code available

• http://people.csail.mit.edu/albert/ladypack/wiki/index.php/Known_implementations_of_SIFT

Feature Matching

74

Feature Matching

Given a feature in image I1, how to find the best match in image I2?

1. Define a distance function that compares two descriptors

2. Test all the features in I2, find the one with minimum distance

Feature Distance SSD is commonly used to measure the difference

between two features f1 and f2

Use ratio-of-distance = SSD(f1, f2) / SSD(f1, f2’) as a measure of reliability of the feature match: f2 is best SSD match to f1 in I2

f2’ is 2nd best SSD match to f1 in I2

Ratio-of-distance is small for ambiguous matches

f1 f2f2'

I1I2

Key trade-offs

More Repeatable More Points

B1

B2

B3A1

A2 A3

Detection of interest points

More Distinctive More Flexible

Description of patches

Robust to occlusionWorks with less texture

Minimize wrong matches Robust to expected variationsMaximize correct matches

Robust detectionPrecise localization

Evaluating Feature Matching Performance

78

Evaluating the Results

How can we measure the performance of a feature matcher?

50

75

200

feature distance

True/false positives

The distance threshold affects performance True positives = # of detected matches that are correct

• Suppose we want to maximize these—how to choose threshold?

False positives = # of detected matches that are incorrect• Suppose we want to minimize these—how to choose

threshold?

5075

200

feature distance

false match

true match

Terminology

True positives = # of detected matching features

False negatives= # of missed matching features

False positives = # of detected not matching features

True negatives = # of missed not matching features

Matching features

Detected to have a match

Detected to have a match

Recall, Precision, and F-Number

Fraction of detected true positives among all positivesfraction of relevant instances that are retrieved

Fraction of true positives among all detected as positivesfraction of retrieved instances that are relevant

Approaches to 1 only when both recall and precision approach to 1

True and False Positive Rates

Fraction of detected true positives among all positives

Fraction of detected false positives among all negatives

Evaluating the Performance of a Feature Matcher

0.7

0 1

1

0.1

truepositive

rate

falsepositive

rate

Receiver Operating Characteristic (ROC) Curve

0.7

0 1

1

false positive rate

truepositive

rate

0.1

• Generated by counting # current/incorrect matches, for different thresholds• Want to maximize area under the curve (AUC)• Useful for comparing different feature matching methods• For more info: http://en.wikipedia.org/wiki/Receiver_operating_characteristic

Recommended