1
1
Feature Detection: Correlation-based Matching
� Goal: Develop matching procedures that can recognize possibly partially-occluded objects or features specified as patterns of intensity values
� Template matching � gray level correlation
� edge correlation
� Hough Transform
� Chamfer Matching
2
Applications
� Feature detectors� Line detectors
� Corner detectors
� Spot detectors
� Known shapes� Character fonts
� Faces
� Stereo
� Motion tracking
- + -+
-
2
3
Harris Corner Detector
C. Harris, M. Stephens, “A Combined Corner and Edge Detector,” 1988
4
The Basic Idea
� We should easily recognize the point by looking through a small window
� Shifting a window in any directionshould give a large changein response
3
5
Harris Detector: Basic Idea
“flat” region:no change in all directions
“edge”:no change along the edge direction
“corner”:significant change in all directions
6
Harris Detector: Mathematics
[ ]2
,
( , ) ( , ) ( , ) ( , )x y
E u v w x y I x u y v I x y= + + −∑
Change of intensity for the shift [u,v]:
IntensityShifted intensity
Window function
orWindow function w(x,y)=
Gaussian1 in window, 0 outside
4
7
Harris Detector: Mathematics
[ ]( , ) ,u
E u v u v Mv
≅
For small shifts [u,v] we have a bilinear approximation:
2
2,
( , ) x x y
x y x y y
I I IM w x y
I I I
=
∑
where M is a 2×2 matrix computed from image derivatives:
8
Harris Detector: Mathematics
[ ]( , ) ,u
E u v u v Mv
≅
Intensity change in shifting window: eigenvalue analysis
λ1, λ2 – eigenvalues of M
direction of the slowest change
direction of the fastest change
(λmax)-1/2
(λmin)-1/2
Ellipse E(u,v) = const
5
9
Harris Detector: Mathematics
λ1
λ2
“Corner”λ1 and λ2 are large,
λ1 ~ λ2;
E increases in all directions
λ1 and λ2 are small;
E is almost constant in all directions
“Edge”λ1 >> λ2
“Edge”λ2 >> λ1
“Flat”region
Classification of image points using eigenvalues of M:
10
Harris Detector: Mathematics
Measure of corner response:
( )2det traceR M k M= −
1 2
1 2
det
trace
M
M
λ λλ λ
== +
(k is an empirically determined constant; k = 0.04 - 0.06)
6
11
Harris Detector: Mathematics
λ1
λ2 “Corner”
“Edge”
“Edge”
“Flat”
• Rdepends only on eigenvalues of M
• R is large for a corner
• R is negative with large magnitude for an edge
• |R| is small for a flatregion
R> 0
R< 0
R< 0|R| small
12
Harris Detector
� The Algorithm:
� Find points with large corner response function R
(R > threshold)
� Take the points of local maxima of R
7
13
Harris Detector: Example
14
Harris Detector: ExampleCompute corner response R
8
15
Harris Detector: ExampleFind points with large corner response: R>threshold
16
Harris Detector: ExampleTake only the points of local maxima of R
9
17
Harris Detector: Example
18
Harris Detector: Summary
� Average intensity change in direction [u,v] can be expressed as a bilinear form:
� Describe a point in terms of eigenvalues of M:measure of corner response
� A good (corner) point should have a large intensity changein all directions, i.e., Rshould be a large positive value
[ ]( , ) ,u
E u v u v Mv
≅
( )2
1 2 1 2R kλ λ λ λ= − +
10
19
Harris Detector: Some Properties
� Rotation invariance
Ellipse rotates but its shape (i.e., eigenvalues) remains the same
Corner response R is invariant to image rotation
20
Harris Detector: Some Properties
� Partial invariance to affine intensitychange
� Only derivatives are used => invariance to intensity shift I → I + b
� Intensity scale: I → a I
R
x (image coordinate)
threshold
R
x (image coordinate)
11
21
Harris Detector: Some Properties
� But not invariant to image scale
Fine scale: All points will be classified as edges
Coarse scale:Corner
22
Harris Detector: Some Properties
� Quality of Harris detector for different scale changes
Repeatability rate:# correspondences
# possible correspondences
C. Schmid et al., “Evaluation of Interest Point Detectors,” IJCV 2000
12
23
Tomasi and Kanade’s Corner Detector
� Idea: Intensity surface has 2 directions with significant intensity discontinuities
� Image gradient [Ix, Iy]T gives information about direction and magnitude of one direction, but not two
� Compute 2 x 2 matrix
where Q is a 2n+1 x 2n+1 neighborhood of a given point p
=∑∑
∑∑
Qy
Qyx
Qyx
Qx
III
III
M 2
2
24
Corner Detection (cont.)
� DiagonalizeM converting it to the form
� Eigenvaluesλ1 and λ2 , λ1 ≥ λ2, give measure of the edge strength (i.e., magnitude) of the two strongest, perpendicular edge directions (specified by the eigenvectors of M)
� If λ1 ≈ λ2 ≈ 0, then p’s neighborhood is approximately constant intensity
� If λ1 > 0 and λ2 ≈ 0, then single step edge in neighborhood of p
� If λ2 > threshold and no other point within p’s neighborhood has greater value of λ2, then mark p as a corner point
=
2
1
0
0
λλ
M
13
25
Tomasi and Kanade Corner Algorithm
� Compute the image gradient over entire image f
� For each image point p:� form the matrix M over (2N+1) x (2N+1) neighborhood Q of p
� compute the smallest eigenvalue of M
� if eigenvalue is above some threshold, save the coordinates of p into a list L
� Sort L in decreasing order of eigenvalues
� Scanning the sorted list top to bottom: For each current point, p, delete all other points on the list that belong to the neighborhood of p
26
Results
14
27
Results
28
Results
15
29
Moravec’s Interest Operator
� Compute four directional variances in horizontal, vertical, diagonal and anti-diagonal directions for each 4 x 4 window
� If the minimum of four directional variances is a local maximum in a 12 x 12 overlapping neighborhood, then that window (point) is “interesting”
30
∑∑
∑∑
∑∑
∑∑
= =
= =
= =
= =
++−+−++=
++++−++=
+++−++=
+++−++=
2
0
3
1
2
2
0
2
0
2
2
0
3
0
2
3
0
2
0
2
))1,1(),((
))1,1(),((
))1,(),((
)),1(),((
j ia
j id
j iv
j ih
jyixPjyixPV
jyixPjyixPV
jyixPjyixPV
jyixPjyixPV
16
31
=
=
otherwise,0
max local a is),( if,1),(
)),(),,(),,(),,(min(),(
yxVyxI
yxVyxVyxVyxVyxV advh
32
17
33
Scale Invariant Detection
� Consider regions (e.g., circles) of different sizes around a point
� Regions of corresponding sizes will look the same in both images
34
Scale Invariant Detection
� The problem: How do we choose corresponding circles independently in each image?
18
35
Scale Invariant Detection
� Solution:
� Design a function on the region (circle) that is “scale invariant,” i.e., the same for corresponding regions, even if they are at different scales
Example: Average intensity. For corresponding regions (even of different sizes) it will be the same
scale = 1/2
– For a point in one image, we can consider it as a function of
region size (circle radius)
f
region size
Image 1 f
region size
Image 2
36
Scale Invariant Detection
� Common approach:
scale = 1/2
f
region size
Image 1 f
region size
Image 2
Take a local maximum of this function
Observation: Region size, for which the maximum is achieved, should be invariant to image scale
s1 s2
Important: This scale invariant region size is found in each image independently!
19
37
Scale Invariant Detection
� A “good” function for scale detection has one stable sharp peak
� For usual images: a good function would be a one which responds to contrast (sharp local intensity change
f
region size
bad
f
region size
bad
f
region size
Good !
38
Scale Invariance
Requires a method to repeatably select points in location and scale:
� The only reasonable scale-space kernel is a Gaussian (Koenderink, 1984; Lindeberg, 1994)
� An efficient choice is to detect peaks in the difference of Gaussian pyramid (Burt & Adelson, 1983; Crowley & Parker, 1984 – but examining more scales)
� Difference-of-Gaussian with constant ratio of scales is a close approximation to Lindeberg’s scale-normalized Laplacian (can be shown from the heat diffusion equation)
Blur
Subtract
Blur
Subtract
20
39
SIFT Operator: Scale Space Processed One Octave (i.e., Doubling σσσσ) at a Time
Gσ ⊗ I
Gσ ⊗ Gσ = G√2σ
(Gσ)3 = G2σ
(Gσ)4 = G2√2σ
G2σ ⊗ I
(Gσ)4 = G4σ
40
SIFT: Key Point Localization
� Detect maxima and minima of difference-of-Gaussian in scale space
� Fit a quadratic to surrounding values for sub-pixel and sub-scale interpolation (Brown & Lowe, 2002)
� Taylor expansion around point:
� Offset of extremum (use finite differences for derivatives):
Blur
Subtract
21
43
SIFT: Select Canonical Orientation
� Create histogram of local gradient directions computed at selected scale
� Assign canonical orientation at peak of smoothed histogram
� Each key specifies stable 2D coordinates (x, y, scale, orientation)
0 2π
44
Example of Keypoint Detection
Threshold on value at DOG peak and on ratio of principle curvatures (Harris approach)
(a) 233x189 image(b) 832 DOG extrema(c) 729 left after peak
value threshold(d) 536 left after testing
ratio of principlecurvatures
22
45
Creating Features Stable to Viewpoint Change
� Edelman, Intrator & Poggio (97) showed that complex cell outputs are better for 3D recognition than simple correlation
46
Stability to Viewpoint Change
� Classification of rotated 3D models (Edelman 97):
� Complex cells: 94% vs simple cells: 35%
23
47
SIFT Vector Formation
� Thresholded image gradients are sampled over 16 x 16 array of locations in scale space around each keypoint position
� Create array of orientation histograms from Gaussian-weighted gradients in 4 x 4 region
� 8 orientations x 4x4 histogram array = 128 dimensions
48
Feature Stability to Noise
� Match features after random change in image scale and orientation, with differing levels of image noise
� Find nearest neighbor in database of 30,000 features
24
49
Feature Stability to Affine Change
� Match features after random change in image scale and orientation, with 2% image noise, and affine distortion
� Find nearest neighbor in database of 30,000 features
50
Distinctiveness of Features
� Vary size of database of features, with 30 degree affine change,2% image noise
� Measure % correct for single nearest-neighbor match
25
51
Image Correlation
� Given:� n x n image, M, of an object of interest, called a template
� n x n image, N, that possibly contains that object (usually a window of a larger image)
� Goal: Develop functions that compare images M and Nand measure their similarity� Sum-of-Squared-Difference (SSD):
� (Normalized) Cross-Correlation (CC):
∑∑ ∑∑
∑∑
= = = =
= =
−−= n
i
n
j
n
i
n
j
n
i
n
j
jiNjiM
jiNljkiM
lkCC
1 1 1 1
2/122
1 1
]),(),([
),(),(
),(
SSD(k,l) = ∑∑ [M(I-k, j-l) - N(i, j)]2
52
Sum-of-Squared-Difference (SSD)
� Perfect match: SSD = 0
� If N = M + c, SSD = c2n2, so sensitive to constant
illumination change in image N. Fix by grayscale
normalization of N before SSD
26
53
Cross-Correlation (CC)
� CC measure takes on values in the range [0, 1] (or [0, √ ∑∑M2] if first term in denominator removed)� it is 1 if and only if N = cM for some constant c
� so N can be uniformly brighter or darker than the template, M, and the correlation will still be high
� SSD is sensitive to these differences in overall brightness
� The first term in the denominator, ΣΣM2, depends only on the template, and can be ignored because it is constant
� The second term in the denominator, ΣΣN2, can be eliminated if we first normalize the gray levels of N so that their total value is the same as that of M - just scale each pixel in N by ΣΣM/ΣΣN
� practically, this step is sometimes ignored, or M is scaled to have average gray level of the big image from which the unknown images, N, are drawn
54
Cross-Correlation
� Suppose that N(i,j) = cM(i,j)
1
]),(),([
),(
]),(),([
),(),(
]),(),([
),(),(
1 1 1 1
2/122
1 1
2
1 1 1 1
2/1222
1 1
1 1 1 1
2/122
1 1
=
=
=
=
∑∑ ∑∑
∑∑
∑∑ ∑∑
∑∑
∑∑ ∑∑
∑∑
= = = =
= =
= = = =
= =
= = = =
= =
n
i
n
j
n
i
n
j
n
i
n
j
n
i
n
j
n
i
n
j
n
i
n
j
n
i
n
j
n
i
n
j
n
i
n
j
jiNjiNc
jiNc
jiNjiNc
jiNjicN
jiNjiM
jiNjiM
CC
27
55
Cross-Correlation
� Alternatively, we can rescale both M and N to have unit total intensity� N′(i, j) = N(i, j)//ΣΣN
� M′(i, j) = M(i, j)/ΣΣM
� Now, we can view these new images, M′ and N′ as unit vectors of length n2
� The correlation measure ΣΣM′(i,j)N′(i,j) is the familiar dot product between the two n2 vectors M′′′′ andN′′′′. Recall that the dot product is the cosine of the angle between the two vectors� it is equal to 1 when the vectors are the same vector, or the normalized
images are identical
� These are BIG vectors
56
Cross-Correlation Example 1
� Template M = 0 0 0 Image N = 0
1 1 1 0 1 1 1 0
0 0 0 0
� ∑∑NM = 0 ∑∑N2 = 0
0 1 2 3 2 1 0 0
0 0 1 2 3 2 1 0
0
0
� ∑∑NM / √∑∑N2 = 0 1 √2 √3 √2 1 0
NOTE: Many near misses
28
57
Cross-Correlation Example 2
� Template M = 0 0 0 Image N = 0
1 1 1 4 4 4
0 0 0 0
� ∑∑NM = 0 ∑∑N2 = 0
4 8 12 8 4 16 32 48 32 16
0 0
� ∑∑NM / √∑∑N2 = 1 √2 √3 √2 1
58
Cross-Correlation Example 3
� Template M = 0 0 0 Image N = 1 1 1
1 1 1 0 1 1 1 0
0 0 0 1 1 1
� ∑∑NM = 1 2 3 2 1 ∑∑N2 = 1 2 3 2 1
1 2 3 2 1 2 4 6 4 2
1 2 3 2 1 3 6 9 6 3
2 4 6 4 2
1 2 3 2 1
29
59
Example 3 (cont.)
� ∑∑NM / √∑∑N2 = 0
1 √6/2 1
0 0 1 0 0
1 √6/2 1
0
� Lots of near misses!
60
Reducing the Computational Cost of Correlation Matching
� A number of factors lead to large costs in correlation matching:� the image N is much larger than the template M, so we have to perform
correlation matching of M against every n x n window of N
� we might have many templates, Mi, that we have to compare against a given image N
� face recognition - have a face template for every known face; this might easily be tens of thousands
� character recognition - template for each character
� we might not know the orientation of the template in the image
� template might be rotated in the image N - example: someone tilts their head for a photograph
� would then have to perform correlation of rotated versions of Magainst N
30
61
Reducing the Computational Cost of Correlation Matching
� A number of factors lead to large costs in correlation matching:� we might not know the scale, or size, of the template in the unknown
image
� the distance of the camera from the object might only be known approximately
� would then have to perform correlation of scaled versions of M against N
� Most generally, the image N contains some mathematical transformation of the template image M� if M is the image of a planar pattern, like a printed page or
(approximately) a face viewed from a great distance, then the transformation is an affine transformation, which has six degrees of freedom
62
Correlation Matching
� Let T(p1, p2, ..., pr) be the class of mathematical transformations of interest� For rotation, we have T(θ)
� For scaling, we have T(s)
� General goal is to find the values of p1, p2, ..., pr for which� C(T(p1, p2, ..., pr)M, N) is “best”
� highest for normalized cross-correlation
� smallest for SSD