7/30/2019 Phuong phap SIFT
http://slidepdf.com/reader/full/phuong-phap-sift 1/27
Scale Invariant Feature Transform
Harri Auvinen, Tapio Leppalampi,
Joni Taipale and Maria Teplykh
Lappeenranta University of Technology
Machine Vision and Digital Image Analysis
November 24th, 2009
Introduction
Scale-Invariant Feature Transform (SIFT)
- A method developed by David G. Lowe
- Feature extraction method
Invariance in feature extraction
- A method should locate features
- The extraction method should be robust: it should handle different types of changes between images
  - Illumination
  - Affine transform
  - Scale
  - Rotation
Algorithm
The steps of the SIFT algorithm:
1. Scale-space extrema detection
   - Search over scales and image locations
   - Locate local extrema
2. Keypoint localization
   - Select keypoints from the local extrema
   - Keypoints are selected based on measures of their stability
3. Orientation assignment
   - Orientations are assigned to each keypoint based on local image gradient directions
4. Keypoint descriptor
   - Local image gradients are measured at the selected scale in the region around each keypoint
   - These are transformed into a representation that allows for significant levels of local shape distortion and change in illumination
Scale-space extrema detection
The first stage in keypoint detection is to find the local extrema in scale-space. It consists of
- A cascade filtering approach
- Creation of octaves and scale-space images for each octave
Scale-space extrema detection
The original image $I(x, y)$ is blurred with a Gaussian filter
$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y), \qquad (1)$$
where
$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right). \qquad (2)$$
The procedure is repeated $s$ times, each time multiplying the scale $\sigma$ by the factor $k$. Then the difference of Gaussians (DoG),
$$D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma), \qquad (3)$$
is calculated for adjacent blurred images. According to Lowe, to achieve stable keypoints one should set $s = 3$ and $k = 2^{1/s}$.
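The blurring and differencing in equations (1) to (3) can be sketched in plain Python for a grayscale image stored as a list of rows. The function names, the kernel truncation at 3σ, the edge-replicating border and the starting scale `sigma0 = 1.6` are assumptions made for illustration; the slides do not specify them. The sketch builds only s + 1 blurred levels per octave, which is a simplification.

```python
import math

def gaussian_kernel(sigma):
    """1-D Gaussian samples, truncated at 3*sigma (assumption), normalised to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]

def convolve_rows(img, kernel):
    """Convolve every row with the kernel, replicating edge pixels (assumption)."""
    r = len(kernel) // 2
    w = len(img[0])
    return [
        [sum(kernel[j + r] * row[min(max(x + j, 0), w - 1)] for j in range(-r, r + 1))
         for x in range(w)]
        for row in img
    ]

def transpose(img):
    return [list(col) for col in zip(*img)]

def blur(img, sigma):
    """L = G * I, equation (1), using the separability of the 2-D Gaussian."""
    kernel = gaussian_kernel(sigma)
    tmp = convolve_rows(img, kernel)
    return transpose(convolve_rows(transpose(tmp), kernel))

def dog_stack(img, sigma0=1.6, s=3):
    """Blur at sigma0 * k**i for i = 0..s, then subtract adjacent levels, equation (3)."""
    k = 2 ** (1.0 / s)
    levels = [blur(img, sigma0 * k ** i) for i in range(s + 1)]
    return [
        [[hi_v - lo_v for lo_v, hi_v in zip(lo_row, hi_row)]
         for lo_row, hi_row in zip(lo, hi)]
        for lo, hi in zip(levels, levels[1:])
    ]
```

With s = 3 this returns three DoG images. A constant image gives DoG values of zero, and a single bright pixel gives a negative response at its centre, because the more strongly blurred level is lower there.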
Scale-space extrema detection
The procedure to calculate the differences of Gaussians is then repeated for each octave. The creation of the next octave:
- Select the Gaussian-blurred image whose σ value is twice that of the original
- Subsample the image and use the output as the starting point for the next octave
- Subsampling is done by selecting every second pixel from the rows and columns of the image
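The octave step above can be sketched as follows, assuming the blurred levels of the current octave are held in a hypothetical list `levels` where `levels[i]` was blurred with σ0·k^i, so that `levels[s]` has twice the starting σ.

```python
def subsample(img):
    """Keep every second pixel along both the rows and the columns."""
    return [row[::2] for row in img[::2]]

def next_octave_base(levels, s=3):
    """Take the level whose sigma is twice the starting sigma and halve its resolution.

    Assumes levels[i] was blurred with sigma0 * 2**(i / s), so index s is the 2*sigma0 level.
    """
    return subsample(levels[s])
```

For a 4x4 image this keeps the pixels at rows 0 and 2 and columns 0 and 2, giving a 2x2 image.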
Keypoint localization
Detection of local extrema/keypoints
- Find the extrema points in the DoG pyramid
- Improve the localization of the keypoint to subpixel accuracy by using a second-order Taylor series expansion
Elimination of keypoints
- Eliminate some points from the candidate list of keypoints by finding those that have low contrast or are poorly localised along an edge
  - Contrast thresholding
  - Cornerness thresholding
Detection of local extrema/keypoints
- To detect the local maxima and minima of $D(x, y, \sigma)$, each point is compared with all of its 26 neighbours: 8 in the same DoG image and 9 in each of the two adjacent scales
- If its value is the minimum or maximum, the point is an extremum
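A minimal sketch of the 26-neighbour comparison, assuming three adjacent DoG images of equal size given as lists of rows and an interior pixel (y, x). Requiring a strict maximum or minimum is an assumption here; the slides do not say how ties are treated.

```python
def is_extremum(below, current, above, y, x):
    """True if current[y][x] is strictly larger or strictly smaller than all 26 neighbours."""
    centre = current[y][x]
    neighbours = []
    for layer_index, layer in enumerate((below, current, above)):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if layer_index == 1 and dy == 0 and dx == 0:
                    continue  # skip the candidate point itself
                neighbours.append(layer[y + dy][x + dx])
    return centre > max(neighbours) or centre < min(neighbours)
```

The caller is expected to keep (y, x) at least one pixel away from the image border and to skip the first and last DoG image of each octave, since those have no neighbour on one side.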
Detection of local extrema/keypoints. Brown and Lowe method
Improvement to matching and stability
The approach uses the Taylor expansion of the scale-space function, $D(x, y, \sigma)$, shifted so that the origin is at the sample point:
$$D(X) = D + \frac{\partial D^T}{\partial X} X + \frac{1}{2} X^T \frac{\partial^2 D}{\partial X^2} X \qquad (4)$$
where $D$ and its derivatives are evaluated at the sample point and $X = (x, y, \sigma)^T$ is the offset from this point. The location of the extremum $\hat{X}$ is determined by taking the derivative of this function with respect to $X$ and setting it to zero, giving
$$\hat{X} = -\left(\frac{\partial^2 D}{\partial X^2}\right)^{-1} \frac{\partial D}{\partial X} \qquad (5)$$
If the offset $\hat{X}$ is larger than 0.5 in any dimension, the extremum lies closer to a different sample point. In this case, the sample point is changed and the interpolation is performed about that point instead.
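Equation (5) amounts to solving a 3x3 linear system. The sketch below assumes the gradient and Hessian of D have already been estimated (for example by finite differences) and are passed in as plain lists; `solve3`, `refine_offset` and `needs_resample` are illustrative names, not part of any library.

```python
def solve3(a, b):
    """Solve the 3x3 system a @ x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the matrix is singular.
    """
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def refine_offset(grad, hess):
    """Offset of the extremum from the sample point: hess @ offset = -grad, equation (5)."""
    return solve3(hess, [-g for g in grad])

def needs_resample(offset, limit=0.5):
    """True if the extremum lies closer to a neighbouring sample point."""
    return any(abs(v) > limit for v in offset)
```

For example, with a Hessian of diag(-2, -2, -2) and a gradient of (0.4, 0, 0), the offset is (0.2, 0, 0) and the sample point is kept.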
Elimination of keypoints
(a) The 233x189 pixel original image
(b) The initial 832 keypoint locations at maxima and minima of the difference-of-Gaussian function
(c) After applying a threshold on minimum contrast, 729 keypoints remain
(d) The final 536 keypoints that remain following an additional threshold on the ratio of principal curvatures
Contrast thresholding
The function value at the extremum, $D(\hat{X})$, is useful for rejecting unstable extrema with low contrast. It can be obtained by substituting equation (5) into (4), giving
$$D(\hat{X}) = D + \frac{1}{2} \frac{\partial D^T}{\partial X} \hat{X}. \qquad (6)$$
If the absolute function value at $\hat{X}$ is below a threshold, the point is excluded. For (c), all extrema with $|D(\hat{X})| < 0.03$ were discarded.
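Equation (6) and the contrast test can be sketched as below. The function names are illustrative. The 0.03 threshold is the value quoted above, and in Lowe's paper it assumes image values scaled to the range [0, 1].

```python
def value_at_extremum(d, grad, offset):
    """D at the refined extremum: d + 0.5 * grad . offset, equation (6)."""
    return d + 0.5 * sum(g * x for g, x in zip(grad, offset))

def passes_contrast(d, grad, offset, threshold=0.03):
    """Keep the keypoint only if the interpolated |D| reaches the threshold."""
    return abs(value_at_extremum(d, grad, offset)) >= threshold
```

With d = 0.02, grad = (0.4, 0, 0) and offset = (0.2, 0, 0), the interpolated value is 0.02 + 0.5 · 0.08 = 0.06, so the point is kept.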
Cornerness thresholding
A poorly defined peak in the difference-of-Gaussian function will have a large principal curvature across the edge but a small one in the perpendicular direction. The principal curvatures can be computed from a 2x2 Hessian matrix, $H$, computed at the location and scale of the keypoint:
$$H = \begin{pmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{pmatrix} \qquad (7)$$
The derivatives are estimated by taking differences of neighboring sample points. The eigenvalues of $H$ are proportional to the principal curvatures of $D$.
Cornerness thresholding
Let $\alpha$ be the eigenvalue with the largest magnitude and $\beta$ be the smaller one. Then
$$\mathrm{Tr}(H) = D_{xx} + D_{yy} = \alpha + \beta \qquad (8)$$
$$\mathrm{Det}(H) = D_{xx} D_{yy} - (D_{xy})^2 = \alpha\beta \qquad (9)$$
Let $r$ be the ratio between the largest magnitude eigenvalue and the smaller one, so that $\alpha = r\beta$. Then,
$$\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} = \frac{(\alpha + \beta)^2}{\alpha\beta} = \frac{(r\beta + \beta)^2}{r\beta^2} = \frac{(r + 1)^2}{r} \qquad (10)$$
The quantity $(r + 1)^2 / r$ is at a minimum when the two eigenvalues are equal and it increases with $r$. Therefore, to check that the ratio of principal curvatures is below some threshold, $r$, we only need to check
$$\frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} < \frac{(r + 1)^2}{r} \qquad (11)$$
The transition from (c) to (d) was obtained with $r = 10$.
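The test in equation (11) needs only the three second derivatives, not the eigenvalues themselves. A minimal sketch follows, with `passes_edge_test` as an illustrative name. Rejecting points with a non-positive determinant follows Lowe's paper, where curvatures of opposite sign mean the point is not an extremum.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep the keypoint if Tr(H)^2 / Det(H) < (r + 1)^2 / r, equation (11)."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        return False  # curvatures of opposite sign (or degenerate): discard
    return trace * trace / det < (r + 1) ** 2 / r
```

For r = 10 the bound is 12.1. Equal curvatures (dxx = dyy = 1, dxy = 0) give a ratio of 4 and pass, while an edge-like response (dxx = 100, dyy = 1) gives 102.01 and is rejected.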
Orientation assignment (2)
Left: The point in the middle is the keypoint candidate. The orientations of the points in the square area around this point are precomputed using pixel differences.
Right: Each bin in the histogram covers 10 degrees, so 36 bins cover the whole 360 degrees. The value of each bin is the sum of the gradient magnitudes of all the precomputed points whose orientation falls within that bin.
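The histogram described above can be sketched as follows, assuming a blurred image `L` given as a list of rows and a square window of half-width `radius` around the keypoint at (cy, cx). Central pixel differences and the angle convention of `atan2(dy, dx)` are assumptions. The sketch leaves out any weighting of the samples and the search for the histogram peak.

```python
import math

def orientation_histogram(L, cy, cx, radius, bins=36):
    """Sum gradient magnitudes into orientation bins of 360 / bins degrees each."""
    hist = [0.0] * bins
    bin_width = 360.0 / bins
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            dx = L[y][x + 1] - L[y][x - 1]  # horizontal pixel difference
            dy = L[y + 1][x] - L[y - 1][x]  # vertical pixel difference
            magnitude = math.hypot(dx, dy)
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist
```

The window plus a one-pixel border must lie inside the image. For an image that increases equally in x and y, every sample has an orientation of 45 degrees, so all magnitude lands in bin 4 (40 to 50 degrees).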
Keypoint descriptor
- Keypoint samples are accumulated into orientation histograms summarizing the contents over 4x4 subregions
- The best results are obtained with a 4x4 array of histograms with 8 orientation bins in each
- As a result, a 4x4x8 = 128 element feature vector is generated for each keypoint
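How the 4x4x8 layout produces 128 values can be sketched as below. The input is assumed to be a square grid of gradient samples around the keypoint (a 16x16 grid is the size used in Lowe's paper, not stated in the slides) with orientations in degrees. Each sample is assigned to a single bin here, so the Gaussian weighting and interpolation are deliberately left out.

```python
def descriptor(magnitudes, orientations, cells=4, bins=8):
    """Accumulate gradient samples into cells x cells histograms of `bins` orientations.

    The grid size must be a multiple of `cells`. Returns a flat list of
    cells * cells * bins values (128 for the default 4 x 4 x 8 layout).
    """
    size = len(magnitudes)
    cell = size // cells
    bin_width = 360.0 / bins
    hist = [0.0] * (cells * cells * bins)
    for y in range(size):
        for x in range(size):
            cy, cx = y // cell, x // cell
            b = int((orientations[y][x] % 360.0) // bin_width) % bins
            hist[(cy * cells + cx) * bins + b] += magnitudes[y][x]
    return hist
```

With a 16x16 grid of unit magnitudes that all point at 100 degrees, each of the 16 cells receives 16 votes in its bin 2 (90 to 135 degrees) and the vector has 128 entries.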
Orientation invariance
- In order to achieve orientation invariance, the coordinates of the descriptor and the gradient orientations are rotated relative to the keypoint orientation
- For efficiency, the gradients are precomputed for all levels of the pyramid
- A Gaussian weighting function with σ equal to one half the width of the descriptor window is used to assign a weight to the magnitude of each sample point
- The purpose of the Gaussian window is
  - To avoid sudden changes in the descriptor with small changes in the position of the window
  - To give less emphasis to gradients that are far from the center of the descriptor, as these are most affected by misregistration errors
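The rotation and the Gaussian weighting can be sketched for a single sample as below. Offsets (dx, dy) are measured from the keypoint and angles are in degrees. The function names and the sign convention of the rotation are assumptions made for illustration.

```python
import math

def to_keypoint_frame(dx, dy, grad_angle, keypoint_angle):
    """Rotate a sample offset and its gradient orientation into the keypoint's frame."""
    t = math.radians(keypoint_angle)
    rx = math.cos(t) * dx + math.sin(t) * dy
    ry = -math.sin(t) * dx + math.cos(t) * dy
    return rx, ry, (grad_angle - keypoint_angle) % 360.0

def gaussian_weight(rx, ry, window_width):
    """Weight for a sample magnitude, with sigma equal to half the descriptor window width."""
    sigma = 0.5 * window_width
    return math.exp(-(rx * rx + ry * ry) / (2.0 * sigma * sigma))
```

A sample at the keypoint itself gets weight 1, and one at a distance of half the window width (one σ) gets exp(-0.5), roughly 0.61.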
Boundary effects
- To avoid boundary effects, trilinear interpolation is used to distribute the value of each gradient sample into adjacent histogram bins
- In other words, each entry into a bin is multiplied by a weight of 1 − d for each dimension
- d is the distance of the sample from the central value of the bin, measured in units of the histogram bin spacing
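The 1 − d weighting can be sketched as below. Coordinates are assumed to be expressed in units of bin spacing, with bin i covering [i, i + 1) and centred at i + 0.5. The orientation axis is treated as circular and the two spatial axes are not; both choices, like the function names, are illustrative assumptions.

```python
import math

def split_1d(position, n_bins, circular=False):
    """Split one sample between its two nearest bins with weights 1 - d and d."""
    shifted = position - 0.5          # distance measured from bin centres
    lower = math.floor(shifted)
    d = shifted - lower
    shares = []
    for index, weight in ((lower, 1.0 - d), (lower + 1, d)):
        if circular:
            index %= n_bins           # wrap around, e.g. for orientation
        if 0 <= index < n_bins and weight > 0.0:
            shares.append((index, weight))
    return shares

def trilinear_votes(row, col, ori, magnitude, cells=4, bins=8):
    """Distribute one gradient sample over up to 2 x 2 x 2 neighbouring bins."""
    votes = {}
    for r, wr in split_1d(row, cells):
        for c, wc in split_1d(col, cells):
            for o, wo in split_1d(ori, bins, circular=True):
                votes[(r, c, o)] = votes.get((r, c, o), 0.0) + magnitude * wr * wc * wo
    return votes
```

A sample exactly at a bin centre contributes only to that bin. A sample halfway between bin centres in all three dimensions is shared equally among eight bins, each receiving one eighth of its magnitude. Near the spatial border of the descriptor, the share that would fall outside the grid is dropped.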
Effect of illumination
The feature vector is modified to reduce the effects of illumination change:
- First, the vector is normalized to unit length
- Second, the values in the unit feature vector are thresholded so that no entry exceeds a fixed maximum
- Finally, the vector is renormalized to unit length
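The three steps can be sketched as below. The clip value of 0.2 is the one given in Lowe's 2004 paper; the slides do not state it, so it is treated here as an assumed default. The function names are illustrative.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def illumination_normalize(v, clip=0.2):
    """Normalize, clip large entries, then renormalize to unit length."""
    unit = normalize(v)
    clipped = [min(x, clip) for x in unit]
    return normalize(clipped)
```

The first normalization cancels a uniform scaling of all gradient magnitudes, as caused by a change in image contrast. Clipping limits the influence of a few very large gradients, so one dominant entry no longer determines the whole vector.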
Demo and applications
- Search for the sample in the image
- Classification of remote sensed imagery [Yang&Newsam, 2008]
- Model images of planar objects
- Recognition of 3D objects
- Recognising panoramas
- People re-detection [Hu et al., 2008]
Object recognition
Recognising panoramas
Comparison and Modifications
In the evaluations listed here, SIFT proved to be the most reliable descriptor among those compared.
- Ancuti&Bekaert, 2007
- Mikolajczyk&Schmid, 2005
Modifications
- CSIFT: A SIFT Descriptor with Color Invariant Characteristics [Abdel-Hakim&Farag]
- SIFT-CCH: Increasing the SIFT distinctness by Color Co-occurrence Histograms [Ancuti&Bekaert, 2007]
- PCA-SIFT: A More Distinctive Representation for Local Image Descriptors [Ke et al., 2004]
  - "... instead of using SIFT's smoothed weighted histograms, we apply Principal Components Analysis (PCA) to the normalized gradient patch."
  - "... more distinctive and more compact leading to significant improvements in matching accuracy (and speed) for both controlled and real-world conditions."
References
David G. Lowe, Object Recognition from Local Scale-Invariant Features, Proc. of the International Conference on Computer Vision, 1999.
David G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, 2004.
M. Brown and D.G. Lowe, Recognising Panoramas, International Conference on Computer Vision, 2002.
Andrea Vedaldi, SIFT for Matlab, http://www.vlfeat.org/~vedaldi/code/sift.html
Cosmin Ancuti and Philippe Bekaert, SIFT-CCH: Increasing the SIFT distinctness by Color Co-occurrence Histograms, Proceedings of 5th IEEE International Symposium on Image and Signal Processing and Analysis, 2007.
K. Mikolajczyk and C. Schmid, A performance evaluation of local descriptors, IEEE PAMI, 2005.
Alaa E. Abdel-Hakim and Aly A. Farag, CSIFT: A SIFT Descriptor with Color Invariant Characteristics.
Y. Ke, R. Sukthankar and L. Hutson, PCA-SIFT: A more distinctive representation for local image descriptors, Proc. of CVPR, 2004.
Y. Yang and S. Newsam, Comparing SIFT Descriptors and Gabor Texture Features for Classification of Remote Sensed Imagery, IEEE International Conference on Image Processing, 2008.
Lei Hu, Shuqiang Jiang, Qingming Huang, Yizhou Wang and Wen Gao, People Re-detection Using AdaBoost with SIFT and Color Correlogram, The International Conference on Image Processing (ICIP 2008).