STUDIES ON LOG-POLAR TRANSFORM FOR IMAGE
REGISTRATION AND IMPROVEMENTS USING
ADAPTIVE SAMPLING AND LOGARITHMIC SPIRAL
DISSERTATION
Presented in Partial Fulfillment of the Requirements for
the Degree Doctor of Philosophy in the
Graduate School of The Ohio State University
By
Rittavee Matungka, M.S.E., M.A., B.S.
* * * * *
The Ohio State University
2009
Dissertation Committee:
Professor Yuan F. Zheng, Adviser
Professor Bradley Clymer
Professor Ashok Krishnamurthy
Approved by
Adviser
Graduate Program in Electrical and Computer
Engineering
© Copyright by
Rittavee Matungka
2009
ABSTRACT
Image registration is an essential step in many image processing applications that
need visual information from multiple images for comparison, integration or analysis.
Recently, researchers have introduced image processing techniques using the log-polar
transform (LPT) for its rotation and scale invariant properties. However, limitations
still exist when LPT is applied to image registration applications. The major thesis of
this dissertation is to provide an in-depth analysis of LPT, its advantages, and its problems.
We introduce new formulations that are derived to overcome the limitations of LPT
and thus extend its horizon in practice. Novel techniques are proposed to address
these limitations.
The first extension applies LPT to 2D object recognition. Motivated by the
observation that LPT is sensitive to translation, which leads to high computation
in the search process, we integrate a combination of feature extraction and a
feature point screening method into the recognition system. Combining the scale
and rotation invariant properties of LPT with a low-computation feature extraction
and screening approach that further reduces the number of feature points, a 2D
object recognition method is presented.
The second limitation of LPT addressed in this dissertation is nonuniform
sampling and its effects. LPT suffers from nonuniform sampling, which makes it
unsuitable for applications in which the registered images are altered or occluded.
Inspired by this fact, we present a new registration algorithm that addresses the
problems of the conventional LPT while maintaining the robustness to scale and ro-
tation. We introduce a novel Adaptive Polar Transform (APT) technique that evenly
and effectively samples the image in the Cartesian coordinates. Combining APT with
an innovative projection transform along with a matching mechanism, the proposed
method yields less computational load and more accurate registration than that of
the conventional LPT. Translation between the registered images is recovered with
the new search scheme using Gabor feature extraction to accelerate the localization
procedure. Moreover, an image comparison scheme is proposed for locating the area
where the image pairs differ. Experiments on real images demonstrate the effective-
ness and robustness of the proposed approach for registering images that are subjected
to occlusion and alteration in addition to scale, rotation and translation.
Lastly, this dissertation observes that the accuracy of the LPT approach is limited
by the number of samples used in the mapping process. Since obtaining the scale and
rotation parameters involves a 2D correlation method, either in the spatial domain or
in the frequency domain, the computational complexity of the matching procedure
grows rapidly as the number of samples increases. Motivated by this limitation,
we propose a novel pre-shifted logarithmic spiral (PSLS) approach that is robust
to translation, scale, and rotation and requires lower computational cost. By
pre-shifting the sampling points in the angular direction by $\pi/n_\theta$ radians, the total number
of samples in the angular direction can be reduced by half. This yields a great reduction in
computational load in the matching process. Experimental results demonstrate the
effectiveness and robustness of the proposed approach.
This is dedicated to my parents
ACKNOWLEDGMENTS
Firstly, I would like to express my deep gratitude and appreciation to my advisor,
Dr. Yuan F. Zheng, for the guidance, encouragement, and support he has given me
throughout my Ph.D. study at The Ohio State University. He taught me how to think
outside the box, approach research problems, and express my ideas. He was always
there to encourage me whenever I lost hope and confidence, to give me advice and
guidance whenever I felt that my work had hit a brick wall, and to inspire me to
believe that with hard work and persistence I can accomplish any goal. Without him,
this dissertation would not have been completed.
Special thanks to Dr. Bradley Clymer and Dr. Ashok Krishnamurthy for serving
on my committee. Their technical feedback and constructive comments made this
dissertation better than it would otherwise have been.
It has been my pleasure to work with my colleagues at The Ohio State University.
To Mr. Junda Zhu, Mr. Liang Yuan, Ms. Yen-Lun Chen, Mr. Yuanwei Lao, Mr.
Gaotham Tamma, and all my colleagues at the Multimedia and Robotics Laboratory:
thank you for your continual help and warm friendship. I would also like to thank
Mr. Juan Torres, Mr. Zakaria Chehab, and Ms. Sripattamma Tawasay for their love,
support, and encouragement. It has been a wonderful journey because of them.
I cannot end without thanking my family, Mrs. Aprirada Matungka, Mr. Adiz
Rittikal, Mr. Vorrayuth Matungka, Mrs. Sunee Matungka and Ms. Nattarudee
Matungka, on whose constant encouragement and love I have relied through all the
highs and lows. It is to them that I dedicate this work.
VITA
August 20, 1980 . . . . . . . . . . . . . . . . . . . . . . . . . . . . Born - Bangkok, Thailand
2000 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B.S. Electrical Engineering, Chulalongkorn University, Thailand
2001 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M.A. Business and Managerial Economics, Chulalongkorn University, Thailand
2003 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M.S.E. Telecommunications and Networking Engineering, University of Pennsylvania, Pennsylvania
PUBLICATIONS
R. Matungka and Y. F. Zheng, “Scale and rotation invariant image registration using
pre-shifted logarithmic spiral,” in Proc. International Conference on Image Processing,
2009 (Paper submitted)
R. Matungka and Y. F. Zheng, “Aerial image registration using projective polar transform,”
in Proc. International Conference on Acoustics, Speech, and Signal Processing,
2009 (Paper accepted: to be published in 2009)
R. Matungka and Y. F. Zheng, “Image registration using adaptive polar transform,”
IEEE Trans. Image Processing, 2009 (Journal paper is conditionally accepted)
R. Matungka, Y. F. Zheng, and R. L. Ewing, “Object recognition using log-polar
wavelet mapping,” in Proc. Tools with Artificial Intelligence, Nov 2008, pp. 559–563.
R. Matungka, Y. F. Zheng, and R. L. Ewing, “Image registration using adaptive polar
transform,” in Proc. International Conference on Image Processing, Oct 2008, pp. 2416–2419.
R. Matungka, Y. F. Zheng, and R. L. Ewing, “2D invariant object recognition using
log-polar transform,” in Proc. World Congress on Intelligent Control and Automation,
Jun 2008, pp. 223–228.
FIELDS OF STUDY
Major Field: Electrical and Computer Engineering
Studies in:
Communications and Signal Processing
Computer Engineering
TABLE OF CONTENTS
Page
Abstract . . . . . ii
Dedication . . . . . iv
Acknowledgments . . . . . v
Vita . . . . . vii
List of Tables . . . . . xii
List of Figures . . . . . xiii
Chapters:
1. INTRODUCTION . . . . . 1
1.1 Log-Polar Transform . . . . . 3
1.2 Related Works . . . . . 6
1.2.1 Correlation approaches . . . . . 6
1.2.2 Phase correlation . . . . . 7
1.2.3 Fourier-Mellin . . . . . 8
1.2.4 Feature-based approaches . . . . . 10
1.2.5 Multiresolution framework . . . . . 11
2. 2D INVARIANT OBJECT RECOGNITION USING LOG-POLAR TRANSFORM . . . . . 13
2.1 Log-Polar Transform . . . . . 16
2.2 The New LPT-Based Object Recognition Approach . . . . . 17
2.2.1 Establish the model . . . . . 17
2.2.2 Global search . . . . . 18
2.3 Experimental Results . . . . . 26
2.3.1 The robustness . . . . . 26
2.3.2 The accuracy . . . . . 35
2.4 Extended Works . . . . . 37
2.5 Summary . . . . . 40
3. IMAGE REGISTRATION USING ADAPTIVE POLAR TRANSFORM . . . . . 47
3.1 Adaptive Polar Transform Approach . . . . . 49
3.1.1 Motivation of the proposed approach . . . . . 49
3.1.2 The proposed APT approach . . . . . 52
3.1.3 Projection transform . . . . . 56
3.2 The New Image Registration Approach Using APT . . . . . 60
3.2.1 Localization . . . . . 60
3.2.2 APT matching . . . . . 62
3.2.3 Image comparison . . . . . 70
3.2.4 Complete algorithm . . . . . 72
3.3 Experimental Results . . . . . 73
3.3.1 Image registration in general conditions . . . . . 75
3.3.2 Registration of images that are subjected to occlusion and alteration . . . . . 77
3.3.3 Registration results and errors . . . . . 86
3.4 Extended Works . . . . . 88
3.5 Summary . . . . . 90
4. SCALE AND ROTATION INVARIANT IMAGE REGISTRATION USING PRE-SHIFTED LOGARITHMIC SPIRAL . . . . . 91
4.1 The Proposed Logarithmic Spiral Approach . . . . . 93
4.1.1 Logarithmic spiral mapping . . . . . 94
4.1.2 LS matching . . . . . 97
4.1.3 Pre-shifted logarithmic spiral . . . . . 101
4.2 The New Image Registration Approach . . . . . 103
4.2.1 Feature point based search scheme . . . . . 103
4.2.2 PSLS matching . . . . . 106
4.2.3 Complete algorithm . . . . . 107
4.3 Experimental Results . . . . . 108
4.3.1 Registration accuracy . . . . . 108
4.3.2 Computational cost . . . . . 112
4.4 Summary . . . . . 113
5. CONCLUSION AND FUTURE WORKS . . . . . 115
Appendices:
A. Gabor Wavelet Feature Extraction . . . . . 118
Bibliography . . . . . 122
LIST OF TABLES
Table Page
2.1 Recognition results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.1 The image registration error compared to the ground truth in general conditions (lower is better) . . . . . 80
3.2 The image registration error compared to the ground truth when images are altered or occluded (lower is better) . . . . . 87
LIST OF FIGURES
Figure Page
1.1 LPT mapping: (a) LPT sampling in the Cartesian coordinates, (b) the resulting sample distribution in the angular and log-radius directions. Note that in (a), the distance between consecutive sampling points in the radius direction increases exponentially from the center to the furthest circumference due to the logarithmic sampling . . . . . 5
1.2 (a) The original image Lena, (b) the scaled and rotated image of (a), (c) the LPT transformed image of (a), and (d) the LPT transformed image of (b) . . . . . 6
2.1 (a) Original image, (b) the LPT of (a), (c) scaled and rotated image of (a), (d) LPT of (c) . . . . . 18
2.2 A screening mask (dimensions are in pixels) . . . . . 22
2.3 (a) Feature points extracted from the original image, (b) the red dot indicates the feature point selected as the original point for creating the model and the screening vector, (c) feature points extracted from the scaled and rotated image, (d) feature points remaining after screening . . . . . 24
2.4 (a) and (c) Feature point extraction of the House and blox images, (b) and (d) 8 feature points are randomly selected from each image . . . . . 28
2.5 18 candidates from the scaled and rotated House image . . . . . 29
2.6 18 candidates from the scaled and rotated blox image . . . . . 30
2.7 A demonstration of the performance in matching feature points in the object that is rotated with various orientations: (a) proposed approach, (b) VNC, (c) ASD . . . . . 31
2.8 A demonstration of the performance in screening feature points using feature number 3 of the blox image: (a) proposed approach, (b) VNC, (c) ASD . . . . . 32
2.9 A demonstration of the performance in screening feature points using feature number 3 of the house image: (a) proposed approach, (b) VNC, (c) ASD . . . . . 33
2.10 A demonstration of the performance in screening feature points in the noisy environment using feature number 3 of the blox image: (a) proposed approach, (b) VNC, (c) ASD . . . . . 34
2.11 A demonstration of the performance in screening feature points in the noisy environment using feature number 3 of the house image: (a) proposed approach, (b) VNC, (c) ASD . . . . . 35
2.12 Reference images and selected object: (a) US coins, (b) BMW, (c) Aircraft . . . . . 36
2.13 Experimental results in the target image: (from left to right) feature point extraction, screening, recognizing object . . . . . 42
2.14 Sets of tested images, UScoins . . . . . 43
2.15 Sets of tested images, UScoins in noisy environment . . . . . 43
2.16 Sets of tested images, BMW . . . . . 44
2.17 Sets of tested images, Aircraft . . . . . 44
2.18 MR-LPT mapping . . . . . 45
2.19 MR-LPT of the quarter coin in US coin image: (a) 16 × 16, (b) 32 × 32, and (c) 64 × 64 . . . . . 45
2.20 Feature point reduction from MR-LPT (red dot represents the center point of LPT) . . . . . 46
3.1 Example of inaccurate registration results using the conventional LPT due to occlusion in the target image: (a) the model image patch cropped from Lena image is registered to the non-occluded target image on the upper-right and occluded target image on the lower-right, where the red rectangles indicate the registration results, (b) the correction coefficients when registering to the non-occluded target image, and (c) the correction coefficients when registering to the occluded target image . . . . . 53
3.2 APT mapping: (a) adaptive sampling in the spatial domain with more samples in the angular direction as the radius increases, (b) the resulting sample distribution in the angular and radius directions. Note that in (a), the distance between consecutive sampling points in the radius direction remains the same for all radius circumferences and so does the angular direction . . . . . 54
3.3 Examples of APT: (a) the original Lena image, (b) the APT transformed image of (a), (c) the scaled and rotated image of (a), and (d) the APT transformed image of (c) . . . . . 55
3.4 The effects of the changes in scale and rotation in the Cartesian coordinates to the projections ℜ and Θ: (a) the scale change in the Cartesian (scale factor of 1.2 is applied to Lena image) appears as variable-scale in the projection ℜ, while the projection Θ becomes slightly altered, (b) the rotation in the Cartesian (rotation parameter of 45 degrees is applied to Lena image) appears as shifting in the projection Θ, while there is no change in the projection ℜ, and (c) the changes in both scale and rotation in the Cartesian (scale and rotation parameters of 1.2 and 45 degrees, respectively, are applied to Lena image) appear as variable-scale in projection ℜ and shifting in projection Θ, respectively . . . . . 57
3.5 Examples of APT of images of syrup bottle: (a) the reference image with the model image selected as the area inside the red circle, (b) the APT transformed image of (a), (c) the target image, and (d) the APT transformed image of the area inside the red circle of the target image . . . . . 58
3.6 The projections of the syrup bottle images: (a) comparison of projections ℜ of the syrup bottle from the reference image and the target image, and (b) comparison of projections Θ of the syrup bottle from the reference image and the target image. The differences between projections of the two images are more pronounced, which need additional techniques to resolve . . . . . 59
3.7 Feature point extraction and localization procedure . . . . . 63
3.8 Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of squirrel: (a) the reference image where the entire area is used as the model image, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result) . . . . . 77
3.9 Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of statue of liberty: (a) the reference image where the entire area is used as the model image, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result) . . . . . 78
3.10 Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of bird: (a) the reference image where the entire area is used as the model image, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result) . . . . . 78
3.11 Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of speed limit sign: (a) the reference image with the model image selected as the area inside the red rectangle, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result) . . . . . 79
3.12 Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of syrup bottle: (a) the reference image with the model image selected as the area inside the red rectangle, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result) . . . . . 79
3.13 Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of Dreese Building: (a) the reference image where the entire area is used as the model image, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result and the green area indicates the altered area) . . . . . 82
3.14 Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of flower: (a) the reference image where the entire area is used as the model image, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result and the green area indicates the altered area) . . . . . 83
3.15 Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of cereal box: (a) the reference image where the entire area is used as the model image, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result and the green area indicates the altered area) . . . . . 83
3.16 Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of human brain: (a) the reference image with the model image selected as the area inside the red rectangle, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result and the green area indicates the altered area) . . . . . 84
3.17 Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of human lung: (a) the reference image with the model image selected as the area inside the red rectangle, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result and the green area indicates the altered area) . . . . . 84
3.18 Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of OSU tower: (a) the reference image with the model image selected as the area inside the red rectangle, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result and the green area indicates the altered area) . . . . . 86
3.19 Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of stop sign: (a) the reference image with the model image selected as the area inside the red rectangle, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result and the green area indicates the altered area) . . . . . 87
3.20 PPT map: (a) distance parameters for an image of size 10 × 10 pixels and $(x_c, y_c) = (7, 7)$, (b) graphic demonstration of the PPT map for an image of size 513 × 513 pixels and $(x_c, y_c) = (256, 256)$ . . . . . 89
3.21 Example of PPT: (a) the original Lena image, (b) the PPT transformed image of (a) . . . . . 89
4.1 LS mapping with $n_r = 8$ and $n_\theta = 16$: (a) CW direction, (b) CCW direction, and (c) PSLS in the CCW direction . . . . . 95
4.2 Comparison of the sampling point distribution between LPT and LS mappings: (a) original Target logo image, (b) Target logo image with scale parameter 1.3, and (c) comparison of the locations of the sampling points from the center . . . . . 98
4.3 LS transformed vectors of barbara images, where $n_r = n_\theta = 16$ and $R_{\max} = 64$: (a) from original barbara image (this transformed vector is presented in (b)-(d) as broken blue line for comparison), (b) from barbara image with 90 degrees rotation, (c) from barbara image with scale parameter of 2, and (d) from barbara image with scale and rotation parameters of 2 and 90 degrees, respectively . . . . . 100
4.4 Comparison of sampling distributions: (a) LPT and LS mappings and (b) LPT and PSLS mappings . . . . . 102
4.5 Comparison of percentage of oversampling between LPT and PSLS mappings: (a) images with the size 64 × 64, and (b) images with the size 128 × 128 . . . . . 104
4.6 Image registration results using the proposed PSLS approach: (a) the reference images where the entire areas are used as the model images, (b) target images and their features, and (c) registration results (red rectangle indicates the registration result) . . . . . 110
4.7 Experimental results: (a) PSLS rotation test results, (b) PSLS scale test results, (c) LPT rotation test results, and (d) LPT scale test results . . . . . 111
4.8 Computational benefit using PSLS approach . . . . . 113
A.1 Performance evaluations of proposed Gabor features, SIFT [43], and Harris-Laplacian [50]: (a) orientation change, (b) scale change, (c) noise added . . . . . 120
CHAPTER 1
INTRODUCTION
Image registration is the process of aligning two images that share common visual
information, such as images of the same object or images of the same scene taken from
different geometric viewpoints, at different times, or by different image sensors. Image
registration is an essential step in many image processing applications that involve
multiple images for comparison, integration, or analysis, such as image fusion, image
mosaics, image or scene change detection, and medical imaging. The main objective
of image registration is to find the geometric transformation of the model image, $I_M$,
in the target image, $I_T$, where $I_T(x, y) = T\{I_M(x', y')\}$ and $T$ is a two-dimensional
geometric transformation that associates the $(x', y')$ coordinates in $I_M$ with the $(x, y)$
coordinates in $I_T$. These two-dimensional geometric transformations include scale,
rotation, and translation in the Cartesian coordinates. The model image is often
presented in the form of an image patch that is cropped from the reference image. When
the entire reference image is to be registered to the target image,
the reference image is considered the model image.
Many image registration methods have been proposed in the past 20 years [5, 8,
35, 39, 45, 64, 93, 94]. They can be categorized into two major groups: the
feature-based approach and the area-based approach. The feature-based approach uses only
the correspondence between the features in the two images for registration. The
features include region features such as buildings [24], water content [28], forests [67],
and color gradient; line features such as edges [30, 31], line segments [9, 53], and geometric
shapes [52, 56]; and feature points such as line or edge intersections [72] and Gabor features
[27, 46, 92]. Recently, Lowe et al. [43] introduced the so-called SIFT method, which can
extract image features that are invariant to illumination change, scale, and rotation.
Another well-known method, which uses a combination of Harris corner detection [26] and
the Laplacian of Gaussian to extract features that are invariant to scale and rotation, is
proposed in [50]. Since only the features are involved in the registration, the
feature-based approach has advantages in registering images that are subjected to alteration
or occlusion. However, the use of the feature-based approach is recommended only
when the images contain enough distinctive features [93]. As a result, for some
applications such as medical imaging, in which the images are not rich in detail
and features are difficult to distinguish from one another, the feature-based
approach may not perform effectively. This problem can be overcome by the
area-based approach.
The common area-based approach is the normalized cross-correlation [18, 39]. Another
correlation-based technique, which is more robust to noise and changes in the
image intensity than the cross-correlation technique, is phase correlation [35], in
which the normalized cross-power spectrum between the two images is computed in
the frequency domain. Although the correlation approaches show successful results
in registering images that differ by a translation in the Cartesian coordinates, they both fail
when there are changes in scale or rotation between the two images. Combining
the phase correlation technique with the log-polar transform (LPT), the
Fourier-Mellin approach [64, 69] was proposed as a breakthrough area-based method that is
invariant to translation, scale, and rotation. However, recent studies [73, 94]
indicate that the Fourier-Mellin method is able to recover only a limited range
of rotation and scale. Moreover, the Fourier transform introduces a border-effect
problem, where rotation and scale affect the aliasing of the image. Another work in image
registration that takes advantage of LPT is proposed in [94]. Unlike the
Fourier-Mellin approach, in which the matching and localization procedures are performed in the
frequency domain, the registration method proposed by Zokai et al. is performed
entirely in the spatial domain. The translation parameter is recovered by using a
coarse-to-fine multiresolution framework, while the scale and rotation parameters are
obtained by matching the log-polar transformed images using the cross-correlation
function.
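As a rough illustration of the phase correlation idea described above, the following Python sketch recovers a circular translation between two small synthetic images from the peak of the inverse transform of the normalized cross-power spectrum. It is a minimal sketch under stated assumptions, not the implementation of [35] or of this dissertation: the function names `dft2` and `phase_correlation`, the 8 × 8 random test image, and the naive DFT (used only to keep the example free of external dependencies) are all hypothetical choices made for illustration. A practical implementation would use an FFT library.

```python
import cmath
import random


def dft2(img, inverse=False):
    """Naive 2D DFT, O(N^4); suitable for tiny demo images only."""
    n_rows, n_cols = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = [[0j] * n_cols for _ in range(n_rows)]
    for u in range(n_rows):
        for v in range(n_cols):
            acc = 0j
            for r in range(n_rows):
                for c in range(n_cols):
                    angle = sign * 2.0 * cmath.pi * (u * r / n_rows + v * c / n_cols)
                    acc += img[r][c] * cmath.exp(1j * angle)
            out[u][v] = acc / (n_rows * n_cols) if inverse else acc
    return out


def phase_correlation(model, target):
    """Return the (row, col) circular shift that maps model onto target."""
    n_rows, n_cols = len(model), len(model[0])
    F, G = dft2(model), dft2(target)
    cross = [[0j] * n_cols for _ in range(n_rows)]
    for u in range(n_rows):
        for v in range(n_cols):
            p = G[u][v] * F[u][v].conjugate()
            mag = abs(p)
            # Normalizing by the magnitude keeps only the phase difference.
            cross[u][v] = p / mag if mag > 1e-12 else 0j
    surface = dft2(cross, inverse=True)
    # The correlation surface peaks at the translation between the images.
    return max(
        ((r, c) for r in range(n_rows) for c in range(n_cols)),
        key=lambda rc: surface[rc[0]][rc[1]].real,
    )


# Hypothetical test data: a random 8 x 8 image and a circularly shifted copy.
rng = random.Random(0)
model = [[rng.random() for _ in range(8)] for _ in range(8)]
dr, dc = 3, 5
target = [[model[(r - dr) % 8][(c - dc) % 8] for c in range(8)] for r in range(8)]
shift = phase_correlation(model, target)
```

Because only the phase of the cross-power spectrum is kept, the peak location depends on the translation and not on the overall image intensity. The sketch handles translation only; as the text notes, scale or rotation between the images breaks this correspondence.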
1.1 Log-Polar Transform
Log-Polar transform (LPT) [3, 10, 34, 44, 47–49, 55, 57, 60, 74, 76–79, 84, 89, 90, 94]
is a well known tool for image processing for its rotation and scale invariant prop-
erties. LPT is a nonlinear and nonuniform sampling method used to convert image
from the Cartesian coordinates I(x, y) to the log-polar coordinates ILP (ρ, θ). The
mathematical expression of the mapping procedure is shown below
ρ = logbase√
(x− xc)2 + (y − yc)2 (1.1)
θ = tan−1 y − ycx− xc
(1.2)
where (xc, yc) is the center pixel of the transformation in the Cartesian coordinates.
(x, y) denotes the sampling pixel in the Cartesian coordinates and (ρ, θ) denotes
3
the log-radius and the angular position in the log-polar coordinates. For the sake
of simplicity, we assume the natural logarithmic is used in this paper. Sampling
is achieved by mapping image pixels in the Cartesian to the log-polar coordinates
according to Eqs. (1.1) and (1.2). Fig. 1.1 shows an example of the sampling point
for image in the Cartesian coordinates and the transformed result. As shown in Fig.
1.1(a), the distance between two consecutive sampling points in the radius direction
increases exponentially from center to the furthest circumference. In the angular
direction, for each radius, the circumference is sampled with the same number of
samples. Hence, image pixels close to the center are oversampled while image pixels
further away from the center are undersampled or missed.
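The sampling pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in this work: the function name `log_polar_grid`, the inner radius `r_min` (needed because ln 0 is undefined) and the grid sizes are hypothetical choices.

```python
import numpy as np

def log_polar_grid(center, r_min, r_max, n_rho, n_theta):
    """Cartesian coordinates of the log-polar sampling points.

    Radii are spaced uniformly in rho = ln(r) between ln(r_min) and ln(r_max);
    angles are spaced uniformly over [0, 2*pi). r_min > 0 is an assumption of
    this sketch.
    """
    xc, yc = center
    rho = np.linspace(np.log(r_min), np.log(r_max), n_rho)
    theta = np.arange(n_theta) * 2.0 * np.pi / n_theta
    r = np.exp(rho)                       # inverse of Eq. (1.1) with natural log
    x = xc + r[:, None] * np.cos(theta)   # shape (n_rho, n_theta)
    y = yc + r[:, None] * np.sin(theta)
    return r, x, y

r, x, y = log_polar_grid(center=(64.0, 64.0), r_min=1.0, r_max=64.0,
                         n_rho=7, n_theta=16)
ring_spacing = np.diff(r)             # gap between consecutive rings
arc_spacing = 2.0 * np.pi * r / 16    # gap between neighbours on one ring
```

With these values the ring radii double from one ring to the next, while every ring carries the same 16 samples, so the arc length between neighbouring samples grows in proportion to the radius. This is the oversampling near the center and undersampling at the periphery noted above.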
The advantage of using log-polar over the Cartesian coordinate representation is
that any rotation and scale in the Cartesian coordinates is represented as shifting in
the angular and the log-radius directions in the log-polar coordinates, respectively.
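Given g(x′, y′), a scaled and rotated image of f(x, y) with scale and rotation
parameters a and α degrees, respectively, we have:
\[ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a\cos\alpha & -a\sin\alpha \\ a\sin\alpha & a\cos\alpha \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}, \tag{1.3} \]
\[ x' = ax\cos\alpha - ay\sin\alpha, \qquad y' = ax\sin\alpha + ay\cos\alpha. \tag{1.4} \]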
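In log-polar coordinates, f(ρ, θ) → g(ρ′, θ′). Taking the center of the transformation
as the origin and writing x = r cos θ and y = r sin θ, we have:
\[ \rho' = \log_{base}\sqrt{(x')^2 + (y')^2} \]
\[ \rho' = \log_{base}\sqrt{(ax\cos\alpha - ay\sin\alpha)^2 + (ax\sin\alpha + ay\cos\alpha)^2} \]
\[ \rho' = \log_{base}\sqrt{(ar\cos\theta\cos\alpha - ar\sin\theta\sin\alpha)^2 + (ar\cos\theta\sin\alpha + ar\sin\theta\cos\alpha)^2} \]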
Figure 1.1: LPT mapping: (a) LPT sampling in the Cartesian coordinates, (b) the
resulting sample distribution in the angular and log-radius directions. Note that in
(a), the distance between consecutive sampling points in the radius direction increases
exponentially from center to the furthest circumference due to the logarithm sampling.
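\[ \rho' = \log_{base}\sqrt{(ar\cos(\theta + \alpha))^2 + (ar\sin(\theta + \alpha))^2} = \log_{base}\sqrt{a^2 r^2} = \rho + \log_{base}(a) \]
and
\[ \theta' = \tan^{-1}\left(\frac{y'}{x'}\right) = \tan^{-1}\left(\frac{ax\sin\alpha + ay\cos\alpha}{ax\cos\alpha - ay\sin\alpha}\right) = \tan^{-1}\left(\frac{ar\cos\theta\sin\alpha + ar\sin\theta\cos\alpha}{ar\cos\theta\cos\alpha - ar\sin\theta\sin\alpha}\right) \]
\[ \theta' = \tan^{-1}\left(\frac{ar\sin(\theta + \alpha)}{ar\cos(\theta + \alpha)}\right) = \theta + \alpha. \]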
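The derivation above can be checked numerically for a single point. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the center of the transformation is taken as the origin, the natural logarithm is used, the angle α is given in radians, and the four-quadrant `math.atan2` stands in for the tan⁻¹ of Eq. (1.2).

```python
import math

def to_log_polar(x, y):
    """Eqs. (1.1)-(1.2) with natural log and centre (0, 0)."""
    return math.log(math.hypot(x, y)), math.atan2(y, x)

def scale_rotate(x, y, a, alpha):
    """Eq. (1.4): scale by a and rotate by alpha (radians)."""
    return (a * x * math.cos(alpha) - a * y * math.sin(alpha),
            a * x * math.sin(alpha) + a * y * math.cos(alpha))

a, alpha = 1.5, math.radians(30.0)
x, y = 3.0, 2.0
rho, theta = to_log_polar(x, y)
rho2, theta2 = to_log_polar(*scale_rotate(x, y, a, alpha))
d_rho = rho2 - rho        # expected: ln(a)
d_theta = theta2 - theta  # expected: alpha
```

The two differences agree with ln(a) and α up to floating-point rounding. For points whose rotated angle crosses ±π, the angular difference would additionally have to be wrapped, which this sketch does not do.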
Rotation and scale in the Cartesian coordinates thus appear as shifts in the angular
and the log-radius directions of the log-polar coordinates, respectively, as
shown in Fig. 1.2. Fig. 1.2(a) is the original image and Fig. 1.2(b) is the scaled and
rotated version of the original image. Figs. 1.2(c) and 1.2(d) are the LPT images
of Figs. 1.2(a) and 1.2(b), respectively. The column of the log-polar coordinates
represents the angular direction while the row represents the log-radius. We can see
that rotation and scale in the Cartesian coordinates are represented as shifting in the
log-polar coordinates.
Figure 1.2: (a) The original image Lena, (b) the scaled and rotated image of (a), (c)
the LPT transformed image of (a), and (d) the LPT transformed image of (b)
1.2 Related Works
1.2.1 Correlation approaches
Correlation between two images is a statistical approach that has been widely used
as a key component for many template matching based image processing applications
such as pattern recognition [19,36], image registration [63,83], and object recognition
[80]. The most commonly used image correlation technique is the cross-correlation.
The technique is based on the squared Euclidean distance measurement. To compute
the cross-correlation coefficient, C(M, I, u, v), between a template or an object model
M and a query image or a target image I, the template is shifted to every position in
the image and the similarity of the grey levels of the two images is computed:
\[ C(M, I, u, v) = \frac{\sum_x \sum_y \left[M(x, y) - \bar{M}\right]\left[I(x - u, y - v) - \bar{I}\right]}{\sigma_M \sigma_I}, \tag{1.5} \]
\[ \sigma_M = \sqrt{\sum_M \left[M(x, y) - \bar{M}\right]^2} \tag{1.6} \]
\[ \sigma_I = \sqrt{\sum_I \left[I(x, y) - \bar{I}\right]^2} \tag{1.7} \]
where $\bar{M}$ is the average intensity of the model and $\bar{I}$ is the average intensity of the
target image at the area of interest. Although the cross-correlation technique itself is
not a template matching technique, the cross-correlation coefficient will have its peak
at the point (u′, v′) where the template matches the image the most. Therefore, we
can use this property to recover the translation of the object between the model and
the target image.
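As an illustration of how the peak of the cross-correlation coefficient recovers a translation, the sketch below evaluates a normalized cross-correlation at every valid offset. It is a hedged example rather than the implementation used here: the function name `ncc_map` is hypothetical, the window is indexed as `image[v:v+h, u:u+w]` (a sign convention chosen for this sketch), and the test data are synthetic random values.

```python
import numpy as np

def ncc_map(model, image):
    """Normalised cross-correlation of `model` at every valid offset in `image`.

    Entry [v, u] compares the model with the window image[v:v+h, u:u+w];
    the window mean and norm play the roles of I-bar and sigma_I in Eq. (1.5).
    """
    h, w = model.shape
    m = model - model.mean()
    sigma_m = np.sqrt((m ** 2).sum())
    out = np.zeros((image.shape[0] - h + 1, image.shape[1] - w + 1))
    for v in range(out.shape[0]):
        for u in range(out.shape[1]):
            win = image[v:v + h, u:u + w]
            d = win - win.mean()
            sigma_i = np.sqrt((d ** 2).sum())
            if sigma_m > 0 and sigma_i > 0:   # flat windows carry no information
                out[v, u] = (m * d).sum() / (sigma_m * sigma_i)
    return out

rng = np.random.default_rng(0)
image = rng.random((40, 50))
model = image[12:22, 30:38].copy()    # template cut from a known location
scores = ncc_map(model, image)
v_best, u_best = np.unravel_index(np.argmax(scores), scores.shape)
```

The double loop makes the cost of an exhaustive search explicit: every candidate translation requires a full pass over the template.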
An improved correlation technique for similarity measurement is the Sequential
Similarity Detection Algorithm (SSDA) [8], [5]. The similarity between the model
and the target image is computed by the following function:
\[ SSDA(u, v) = \sum_x \sum_y \left| M(x, y) - \bar{M} - I(x - u, y - v) + \bar{I}(u, v) \right| \tag{1.8} \]
SSDA is not only simpler than the cross-correlation technique, it also performs better
when there is image intensity change between the two images [8].
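The mean-corrected absolute difference of Eq. (1.8) can be sketched as follows. This is an assumption-laden illustration: the name `ssda_map` and the window indexing are hypothetical, and the example only demonstrates tolerance to a uniform brightness offset on synthetic data.

```python
import numpy as np

def ssda_map(model, image):
    """Mean-corrected sum of absolute differences, Eq. (1.8), at every valid offset."""
    h, w = model.shape
    m = model - model.mean()
    out = np.empty((image.shape[0] - h + 1, image.shape[1] - w + 1))
    for v in range(out.shape[0]):
        for u in range(out.shape[1]):
            win = image[v:v + h, u:u + w]
            out[v, u] = np.abs(m - (win - win.mean())).sum()
    return out

rng = np.random.default_rng(1)
image = rng.random((30, 30))
model = image[5:13, 9:17] + 0.25      # same patch with a brightness offset
errors = ssda_map(model, image)
v_best, u_best = np.unravel_index(np.argmin(errors), errors.shape)
```

Unlike the cross-correlation coefficient, the best match is the minimum of the map, and no multiplications or square roots are needed per offset.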
1.2.2 Phase correlation
Proposed in 1975 [35], correlation matching in the frequency domain
demonstrated excellent robustness against noise and changes in image illumination.
Phase correlation [1,38,70] is based on the shift theorem, which states that shifting a
function f(x) by α is equivalent to multiplying its Fourier transform by $e^{-j\alpha\omega}$. This
theorem also holds for a two-dimensional function or 2D image. For an image
I1(x, y) that is shifted by (α, β), the Fourier transform and its relationship to the
original image can be shown as follows:
\[ I_2(x, y) = I_1(x - \alpha, y - \beta) \tag{1.9} \]
\[ F_2(\omega_x, \omega_y) = e^{-j(\alpha\omega_x + \beta\omega_y)} F_1(\omega_x, \omega_y) \tag{1.10} \]
where F(ωx, ωy) is the Fourier transformed image. We can see that the translation
between two images in the spatial domain is represented as a phase difference
in the frequency domain. The phase difference can be computed by
the normalized multiplication in the frequency domain of the first image and the
complex conjugate of the second image. This quantity is called the normalized
cross-power spectrum. Substituting Eq. (1.10) shows that the normalized cross-power
spectrum is a pure phase term:
\[ \frac{F_1(\omega_x, \omega_y) F_2^*(\omega_x, \omega_y)}{|F_1(\omega_x, \omega_y)|\,|F_2^*(\omega_x, \omega_y)|} = \frac{F_1(\omega_x, \omega_y) F_1^*(\omega_x, \omega_y) e^{j(\alpha\omega_x + \beta\omega_y)}}{\left|F_1(\omega_x, \omega_y) F_1^*(\omega_x, \omega_y) e^{j(\alpha\omega_x + \beta\omega_y)}\right|} \tag{1.11} \]
\[ = e^{j(\alpha\omega_x + \beta\omega_y)} \tag{1.12} \]
To express this phase difference as a translation in the spatial domain, we apply the 2D
inverse Fourier transform to it. The location of the peak of this inverse Fourier transform
indicates the translation between the two images.
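Phase correlation as described above can be sketched with NumPy's FFT routines. This is an illustrative sketch, not the implementation used in this work: the function name `phase_correlate` is hypothetical, the shift is assumed to be circular (as the discrete Fourier transform implies), and the product is formed as F2·F1* so that the peak lands at the positive shift of the second image.

```python
import numpy as np

def phase_correlate(img1, img2):
    """Return (dy, dx) such that img2 is img1 circularly shifted by (dy, dx)."""
    f1 = np.fft.fft2(img1)
    f2 = np.fft.fft2(img2)
    cross = f2 * np.conj(f1)
    cross /= np.maximum(np.abs(cross), 1e-12)   # keep only the phase
    corr = np.real(np.fft.ifft2(cross))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts (FFT wrap-around).
    if dy > img1.shape[0] // 2:
        dy -= img1.shape[0]
    if dx > img1.shape[1] // 2:
        dx -= img1.shape[1]
    return int(dy), int(dx)

rng = np.random.default_rng(2)
img = rng.random((32, 48))
shifted = np.roll(img, shift=(5, -7), axis=(0, 1))
shift = phase_correlate(img, shifted)
```

Because only the phase is retained, a global change in image contrast leaves the location of the peak unchanged, which is consistent with the robustness to illumination changes noted above.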
1.2.3 Fourier-Mellin
The Fourier-Mellin [11, 58, 61, 64, 69] approach combines the phase correlation
technique with the polar representation to address the problem of an object that is both
translated and rotated. If I1(x, y) is an original image, the translated and rotated
image, I2(x, y), can be written as:
I2(x, y) = I1(x cos θ0 + y sin θ0 − u,−x sin θ0 + y cos θ0 − v) (1.13)
where (u, v) is the translation parameter and θ0 is the rotation angle.
By the rotation and translation properties of the Fourier transform, the
power spectrum of the rotated and translated image is rotated by the same
angle while its magnitude is unaffected by the translation, so the Fourier transforms of I1(x, y)
and I2(x, y) are related by:
\[ F_2(\omega_x, \omega_y) = e^{-j2\pi(\omega_x u + \omega_y v)} F_1(\omega_x\cos\theta_0 + \omega_y\sin\theta_0, -\omega_x\sin\theta_0 + \omega_y\cos\theta_0) \tag{1.14} \]
The magnitude of F2 is therefore equal to that of F1 with the spectrum rotated by θ0.
By representing the spectra in polar coordinates and applying the phase correlation
technique, the rotation can be recovered.
|F2 (r, θ)| = |F1 (r, θ − θ0)| (1.15)
The extension of this method was to apply the Log-Polar transform to the Fourier
Magnitude in order to handle the scale change. If I2(x, y) is a scaled image of I1(x, y)
with the scale factor ξ, its Fourier transform can be expressed as:
I2 (x, y) = I1 (ξx, ξy) (1.16)
\[ F_2(\omega_1, \omega_2) = \frac{1}{\xi^2} F_1\left(\frac{\omega_1}{\xi}, \frac{\omega_2}{\xi}\right) \tag{1.17} \]
By applying Log-Polar transform to the transformed image, scale is represented as
translation in frequency domain.
F2 (logω1, logω2) = F1 (logω1 − log ξ, logω2 − log ξ) (1.18)
From (11) - (14), we can find the Fourier magnitude spectrum of translated, rotated,
and scaled image in Log-Polar coordinate as:
F2 (log r, θ) = F1 (log r − log ξ, θ − θ0) (1.19)
According to [94], however, the method is able to recover only a fair amount of rotation
and scale. Moreover, the Fourier transform introduces a border-effect problem in which
rotation and scale affect the aliasing of the image.
1.2.4 Feature-based approaches
Unlike template matching based object recognition, which compares the entire
area of interest with the template, the feature-based approach uses
only the correspondence between features to identify and recognize an object.
Commonly used features include object color, edges, geometric shape and contour,
object skeleton, and object interest points or feature points. One major advantage of
the feature-based approach is its ability to deal with deformation. The ideal properties of
features are the invariance to illumination change, rotation, scale, camera viewpoint
and object deformation. He et al. [27] used Gabor wavelet to extract the rotation
invariant feature points of the human face and form a mesh representation of the
face for tracking within the video frames. The experiment shows successful results
for objects in the dark background, but the approach may fail in the more realistic
background with various intensities. Based on Harris corner detector, Mikolajczyk
et al. proposed in [50] the scale and affine invariant feature point extraction using
a combination of corner detection and blob detection, or the Laplacian of Gaussian.
Lowe [43] introduced SIFT (Scale Invariant Feature Transform), a scale and rotation
invariant feature extraction method for object recognition and matching. SIFT convolves
the image with Gaussian filters at different scales to create Difference-of-Gaussian
(DoG) images, then chooses the maxima and minima of the DoG images
across the scales. Most feature-based works aim to address the
problem of recognizing an object that is partially viewed or deformed.
To the best of our knowledge, most feature-based object recognition approaches
suffer from a high computational cost. The common recognition procedure consists
of feature extraction, invariant property conversion, and feature matching using
the RANSAC [21] algorithm or a k−d tree. Our object recognition approach yields
competitive results with much less computational complexity.
1.2.5 Multiresolution framework
One of the most widely used techniques for reducing the time needed to locate an object or
an area of interest in an image is the multiresolution framework [13,65,94].
A Gaussian pyramid is built as a collection of representations of an image at
different resolution levels. A query image or a target image is convolved with a
Gaussian filter, which acts as a two-dimensional low-pass filter that smooths the
image and removes unwanted high-frequency content such as noise and edges. Since the high
frequency component is removed at each level of the pyramid, one can resample
the image at each level and reduce the number of pixels by a power
of two. The smallest image in the hierarchy is the most heavily smoothed and is usually
referred to as the coarsest level of the pyramid, while the original image, which has not been
smoothed by any filter, is referred to as the finest level. For spatial search purposes in
many computer vision applications, the search begins by matching the known template
with the coarsest level of the target image to obtain a rough location of the object.
The search is then repeated at each finer level in the neighborhood
of the location matched at the coarser level. This search strategy is known as the
multiresolution framework or coarse-to-fine search strategy. Recently, researchers
have also studied alternative ways to build a multiresolution framework using
the wavelet transform or other types of two-dimensional low-pass filters. Even though
different types of filter give slightly different results and characteristics of the filtered
image, the main idea of the spatial search remains the same. The performance of
the multiresolution framework in reducing the search time for locating an object inside
the image is impressive, and it has been widely used in many applications; however,
it makes no use of object feature information. Therefore, at each level of the
pyramid, the multiresolution framework still searches blindly
over every translation in the neighborhood of the matched location obtained
from the coarser level.
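The coarse-to-fine strategy described above can be sketched as follows. Several simplifications are assumptions of this sketch and not part of the cited methods: 2 × 2 block averaging stands in for Gaussian filtering followed by downsampling, matching uses a plain sum of squared differences, and all function names are hypothetical.

```python
import numpy as np

def downsample(img):
    """Halve each dimension by averaging 2x2 blocks (a crude low-pass + decimation)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])

def pyramid(img, levels):
    """Return [finest, ..., coarsest]."""
    out = [img]
    for _ in range(levels - 1):
        out.append(downsample(out[-1]))
    return out

def best_offset(template, image, candidates):
    """Offset (v, u) among `candidates` with the smallest sum of squared differences."""
    h, w = template.shape
    best, best_err = None, np.inf
    for v, u in candidates:
        if 0 <= v <= image.shape[0] - h and 0 <= u <= image.shape[1] - w:
            err = ((image[v:v + h, u:u + w] - template) ** 2).sum()
            if err < best_err:
                best, best_err = (v, u), err
    return best

def coarse_to_fine(template, image, levels=3, radius=2):
    t_pyr, i_pyr = pyramid(template, levels), pyramid(image, levels)
    # Exhaustive search only at the coarsest level.
    t, im = t_pyr[-1], i_pyr[-1]
    cands = [(v, u) for v in range(im.shape[0] - t.shape[0] + 1)
             for u in range(im.shape[1] - t.shape[1] + 1)]
    v, u = best_offset(t, im, cands)
    # Refine around the doubled location at each finer level.
    for t, im in zip(t_pyr[-2::-1], i_pyr[-2::-1]):
        cands = [(2 * v + dv, 2 * u + du)
                 for dv in range(-radius, radius + 1)
                 for du in range(-radius, radius + 1)]
        v, u = best_offset(t, im, cands)
    return v, u

rng = np.random.default_rng(4)
image = rng.random((64, 64))
template = image[16:32, 24:40].copy()
location = coarse_to_fine(template, image)
```

In this example the template offset is a multiple of 4, so the downsampled template coincides exactly with a window of the downsampled image at every level. For arbitrary offsets, or for templates with little low-frequency content, the coarse-level match is only approximate and can fail, which is the limitation of the multiresolution framework discussed in the text.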
CHAPTER 2
2D INVARIANT OBJECT RECOGNITION USING LOG-POLAR TRANSFORM
To recognize an object that appears in a target image is an essential task in many
image processing applications such as image registration, artificial intelligence, and
active vision. Although objects appear as three-dimensional (3D) in the real-world,
the perceived objects in digital video or image are a two-dimensional (2D) projection
of the actual three-dimensional objects. In most cases, major problems in recognizing
objects lie in the 2D changes in object appearance, e.g. translation, rotation,
scale, and noise. Recently Lowe [43] proposed SIFT, a feature extraction and
description mechanism that is invariant to rotation and scale, obtained by convolving the image
with Gaussian filters at different scales to create Difference-of-Gaussian images.
Mikolajczyk et al. [50] proposed the Harris-Laplacian interest point detector, which uses a
combination of Harris corner detection [26] and Laplacian-of-Gaussian to obtain scale,
rotation and affine invariant feature points. These two methods are widely used as
tools for feature-based object recognition. Although SIFT and Harris-Laplacian perform
well in extracting 2D invariant feature points of rotated and scaled images
with a high repeatability factor [51], the performance of both methods degrades
in a noisy environment. This will be illustrated later in Section III.
An alternative mechanism, the Log-Polar transform (LPT) [3], is used for 2D
matching applications [55, 60, 94] for its scale and rotation invariant properties. Ro-
tation and scale change in Cartesian coordinate appears as translation in log-polar
domain. The study in [94] shows that the invariant properties of LPT provide ad-
vantages in 2D matching over the Cartesian coordinate and the robustness in noisy
environment. However, in order to achieve the invariant properties, the centers of the
log-polar transformation for both images have to match. One solution is proposed in [64];
the technique is usually known as Fourier-Mellin (FM). The translation
of the center point is recovered from the correlation of the phase magnitude of the Fourier
transforms (FFT). By applying LPT to the Fourier transformed image followed by
matching the FFT-LPT images in the frequency domain, the rotation and scale parameters
are obtained. Using FFT to match objects in the frequency domain makes FM an effective
approach in terms of noise resistance, and with LPT, FM is also invariant to rotation
and scale change. However, the studies in [64] indicate that the correctness of
log-polar matching in the frequency domain as used in FM is limited to a scale factor of 1.8
and a rotation of 80 degrees. Another problem of FM is reported in [73]: using the phase magnitude
of the FFT to recover the translation of an object introduces an aliasing problem.
Another solution is proposed in [94], where LPT is used for image registration
purposes. In this work, Gaussian pyramid is built as a collection of the representa-
tion of an image at different resolution levels. The search for center point begins by
matching the known template with the coarsest level of the target image to obtain a
rough location of the object. The search will be performed repeatedly at the finer level
around the neighbor area of the matched location of the coarser level. This search
strategy is known as the multi-resolution framework. Using this multi-resolution
approach, the translation of the center point can be found. However, since the multi-resolution
approach uses a low-pass Gaussian filter to remove high-frequency
components such as edges, the object or template to be searched needs to contain enough
low-frequency information to ensure a successful search at the coarse level. Moreover, in
a noisy environment, the low-frequency component can be distorted, which leads to
failure in locating the center point.
Inspired by [94] and [64], we propose in this chapter a new LPT-based template
matching approach to object recognition that is invariant to 2D rotation and
scaling, and resistant to a reasonable amount of noise. Unlike the previous works,
we propose an invariant feature-based method for recovering the translation of the center
point. Our proposed search method is achieved by a combination of Gabor invariant
feature extraction and a new multi-resolution log-polar transform. We note that our
proposed method differs from FM in several aspects. First, in our approach the
log-polar matching is performed in the spatial domain, not the frequency domain. Second,
we address the problem of translation with the feature-based approach. In contrast
with FM, the phase correlation technique is used in our approach for recovering the
scale and rotation in log-polar matching, not for recovering the translation.
To suit our purposes, the proposed search method needs
to be invariant to 2D geometric changes as well as tolerant of a reasonable amount
of noise. We show in this chapter a comparison of 2D invariant properties and noise
resistance between our proposed Gabor feature extraction and several commonly used
invariant feature extraction techniques.
2.1 Log-Polar Transform
In our approach we use the LPT to achieve rotation and scale invariant recognition.
For image processing, the LPT is a nonlinear sampling method used to convert an image
from the Cartesian coordinates I(x, y) to the log-polar coordinates LPT(ρ, θ). First, a
square image of size Rmax × Rmax is transformed to the polar coordinates IP(r, θ)
by the following mapping:
\[ IP(r, \theta) = I\left(\frac{R_{max}}{2} + r\cos\left(\frac{2\pi\theta}{n_\theta}\right), \frac{R_{max}}{2} - r\sin\left(\frac{2\pi\theta}{n_\theta}\right)\right). \tag{2.1} \]
The variable Rmax is the maximum circle radius of the transformation inside the
image in the Cartesian coordinates. Polar transform will sample the image equally
in the radius direction for r = 0, . . . , nr where nr is the number of samples, and will
sample equally to cover 360 degrees in the angular direction for θ = 0, . . . , nθ where
nθ is the number of samples. Since the transformation is not a one-to-one mapping,
it requires interpolation.
Then the image in the polar coordinates is transformed to the log-polar coordinates
by applying the logarithm to the radius r as follows:
ILP (ρ, θ) = IP (log(r), θ) (2.2)
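The two-step mapping of Eqs. 2.1 and 2.2 can be combined into a single resampling step. The sketch below is an illustration under stated assumptions, not the implementation used in this work: the image center is taken as (size − 1)/2 rather than Rmax/2, the radii start at one pixel, bilinear interpolation is chosen as the interpolation scheme, and the function names are hypothetical.

```python
import numpy as np

def bilinear(img, x, y):
    """Sample img (indexed [row=y, col=x]) at real coordinates, clamped to the border."""
    h, w = img.shape
    x = np.clip(x, 0.0, w - 1.0)
    y = np.clip(y, 0.0, h - 1.0)
    x0 = np.minimum(np.floor(x).astype(int), w - 2)
    y0 = np.minimum(np.floor(y).astype(int), h - 2)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0, x0] + fx * img[y0, x0 + 1]
    bot = (1 - fx) * img[y0 + 1, x0] + fx * img[y0 + 1, x0 + 1]
    return (1 - fy) * top + fy * bot

def log_polar_transform(img, n_rho, n_theta):
    """Rows index log-radius, columns index angle."""
    size = img.shape[0]
    c = (size - 1) / 2.0                       # centre of the square image
    rho = np.linspace(0.0, np.log(c), n_rho)   # r from 1 pixel to the inscribed circle
    r = np.exp(rho)[:, None]
    phi = (2.0 * np.pi * np.arange(n_theta) / n_theta)[None, :]
    x = c + r * np.cos(phi)
    y = c - r * np.sin(phi)                    # minus sign as in Eq. (2.1)
    return bilinear(img, x, y)

# A horizontal ramp: bilinear interpolation reproduces it exactly, so each
# output sample equals the x coordinate at which it was taken.
ramp = np.tile(np.arange(65, dtype=float), (65, 1))
lp = log_polar_transform(ramp, n_rho=32, n_theta=64)
```

The ramp image is used only because its interpolated values are known in closed form, which makes the sampling positions easy to verify.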
The advantage of using log-polar over the Cartesian coordinate representation is
that any rotation and scale in the Cartesian coordinates is represented as shifts
in the angular and log-radius directions, respectively, in the log-polar coordinates as
shown in Fig. 2.1. Fig. 2.1 (a) is the original image while Fig. 2.1 (c) shows the 50%
scale and 45 degree counter-clockwise rotation of the original image. Figs. 2.1 (b)
and (d) are the LPT images of Fig. 2.1 (a) and (c), respectively. In Fig. 2.1 (d), the
column of the log-polar coordinates represents the angular direction while the row
represents the log-radius. We can see that the rotation and scale in the Cartesian
coordinates are represented by shift in the log-polar coordinates.
Another advantage of log-polar for the purpose of object recognition is from the
log-polar’s space variant properties. The sampling resolution in the Cartesian co-
ordinates is higher around the center point or the origin of the LPT and decreases
exponentially as it gets further away from the center. This space variant property
in log-polar is an advantage for object recognition, since the area occupying the
center of the LPT automatically becomes more important than the surrounding areas,
which are more likely to be the background of the target image. That makes
recognition possible even when the object is not perfectly circular or is heavily covered by
the background in its boundary areas.
2.2 The New LPT-Based Object Recognition Approach
2.2.1 Establish the model
We define the model as an LPT matrix of the object, $ILP^M(\rho, \theta)$, to be matched against
a set of LPT matrices in the target image, $ILP^C_1(\rho, \theta), \dots, ILP^C_n(\rho, \theta)$, which we
call candidates. The superscripts M and C represent the model and the candidate,
respectively. The subscript n is the number of feature points found in the target
image. How to identify the feature points will be discussed later in the subsection
3.2.1. To establish the model, we first manually choose an object from the reference
image. Next, we extract a set of feature points $p^M_1, \dots, p^M_n$ from the object using
the Harris corner detector [26], which is explained later. Then we manually
choose one of the feature points $p^M_i$ to be the origin of the LPT. The size $R^M_{max}$ of
Figure 2.1: (a) Original image, (b) the LPT of (a), (c) scaled and rotated image of
(a), (d) LPT of (c)
the transformation is chosen to be the radius of the transformation circle that does
not contain any part of the background. The model is established by using the
transformations described in Eqs. 2.1 and 2.2.
2.2.2 Global search
In reality, an object can be translated to any location inside the target image,
and any small translation of the center point of the LPT could distort the log-polar
image or even change the image entirely. Therefore, in order to keep the advantage
of the LPT, the problem of translation or global search has to be addressed. A
work on this problem is presented in [77], using projection-based translation
estimation. However, the solution only succeeds for small translations. In many
recognition approaches, a large computational cost is associated with the problem of
translation. The most straightforward search strategy is simply to verify every
translation throughout the image [62], [37], [86]. In many motion-related works [12],
the search space is limited to a small corresponding area from the previous video
frame. He et al. proposed in [27] a more robust search strategy using the 2D golden
section algorithm, which narrows both the upper and lower bounds of the search
range at each search iteration. In [94] a multi-resolution or Gaussian image pyramid
framework is used as a search strategy. At each level of the pyramid, the image is first
convolved with a low-pass Gaussian filter to remove high-frequency
components such as edges and to smooth the image. The filtered image is then
downsampled by 2 in each dimension. Therefore each iteration reduces the size of the image
by a factor of 4. The search begins at the coarsest level to locate the area of interest before
moving to the finer levels.
Since LPT is very sensitive to translation, sub-pixel accuracy for translation
recovery is needed. Using the multi-resolution approach as a search strategy [94] can give
the accuracy needed; however, the computational cost is still high due to the multiple-level
searches. To address this problem, we use a combination of Harris corner
detection and our new screening method as a key component to limit the global search
space. We use corners as feature points in our approach because they are
rotation invariant.
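As background for the feature point extraction step, a Harris-style corner response of the form det M − k (tr M)² from [26], with k = 0.04, can be sketched as follows. This is a hedged illustration, not the implementation used in this work: the gradients come from `np.gradient`, the Gaussian window is a truncated separable kernel with an assumed σ of 1.5, the image is assumed to be wider and taller than the kernel, and the function names are hypothetical.

```python
import numpy as np

def gaussian_smooth(img, sigma):
    """Separable Gaussian window; assumes each image side exceeds the kernel length."""
    radius = int(3 * sigma) + 1
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    conv = lambda v: np.convolve(v, k, mode="same")
    return np.apply_along_axis(conv, 0, np.apply_along_axis(conv, 1, img))

def harris_response(img, sigma=1.5, k=0.04):
    iy, ix = np.gradient(img.astype(float))   # derivatives along rows (y) and columns (x)
    a = gaussian_smooth(ix * ix, sigma)       # windowed I_x^2
    b = gaussian_smooth(iy * iy, sigma)       # windowed I_y^2
    c = gaussian_smooth(ix * iy, sigma)       # windowed I_x * I_y
    det = a * b - c * c
    trace = a + b
    return det - k * trace ** 2

img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0                       # bright square on a dark background
R = harris_response(img)
peak = np.unravel_index(np.argmax(R), R.shape)
```

On this synthetic square the response is positive near the four corners, negative along the straight edges, and zero in flat regions, which is the classification rule used to select corner points.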
1. Feature Point Extraction: To counter the classic time-consuming global search
which exhaustively examines every pixel in the target image, we adopt Harris
corner detector to augment the LPT approach. In [26], corners are defined
as the points that have high intensity changes in both horizontal and vertical
directions. For a point (x,y) in an image I, given a shift (u, v), the intensity
change can be expressed as the auto-correlation function:
\[ E(u, v) = \sum_{x,y} w(x, y)\left[I(x + u, y + v) - I(x, y)\right]^2 \tag{2.3} \]
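where w(x, y) is the Gaussian window function. For a small shift (u, v), we
have the bilinear approximation of E(u, v):
\[ E(u, v) \approx [u, v]\, M \begin{bmatrix} u \\ v \end{bmatrix}, \qquad M = \sum_{x,y} w(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}. \tag{2.4} \]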
M is a symmetric matrix computed from the second derivative of E around
(x, y). To avoid the explicit eigenvalue decomposition of matrix M , Harris
suggested [26] the response function as follows:
\[ f_R = \det M - k\,(\mathrm{tr}[M])^2 \tag{2.5} \]
where k is a constant value suggested to be 0.04 in [26], detM = λ1λ2 and
tr[M ] = λ1 + λ2. The value of the response function fR is used to classify
the type of image point (x, y). The point (x, y) can be classified as corner if
fR is positive and large. A local-maxima window is applied to remove the
weak corners in the area of a strong corner. We extract a set of feature points
$p^C_1, \dots, p^C_n$ from the target image using Eqs. 2.3-2.5.
2. Screening: Corners have been shown to be invariant to rotation; however, they
are not invariant to scale change. Having the standard deviation parameter of
the Gaussian window and the size of the local-maxima mask small could
help solve this problem. As a result, even when the object is scaled, most of the corners
detected still remain. However, the number of feature points detected increases
dramatically when the standard deviation is small, and this would slow down
the global search process. To limit the number of feature points, we introduce
a new screening method which requires much less computational time than
applying the LPT to every feature point.
In this screening phase, we construct a four-dimensional screening vector,
$\overrightarrow{SV} = [q_i, q_j, q_k, q_l]$, for every feature point detected in the previous step by applying
the screening mask shown in Fig. 2.2 to that feature point. More specifically,
the screening mask has a size of 11 × 11 pixels, and its center pixel,
C(i, j), is the feature point to be screened. Two types of
features are computed: axis features, denoted $a_i$, and quadrant
features, denoted $q_i$, for i = 1, . . . , 4. The axis features are computed from
the difference between the average intensity of the pixels in area $A_i$ and the
intensity of the feature point C(i, j). These axis features represent the intensity change when
the feature point C(i, j) is shifted in each axis direction. The quadrant features
are computed from the ratio between the average intensity of the pixels in area
$Q_i$ and the intensity of the feature point C(i, j). The mathematical descriptions are given in
Eq. 2.6.
The axis features of the same axis are compared, i.e. a1 with a3 and a2 with a4.
We choose the area Qk that is not connected to the areas that have the highest
axis feature value in its axis, and use the variable qk obtained from this area
as the first element in our screening vector $\overrightarrow{SV}$. For example, if $a_1 > a_3$ and
$a_4 > a_2$, we choose $q_3$ to be the first element in $\overrightarrow{SV}$.
Figure 2.2: A screening mask (dimension is in pixel)
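\[
\begin{bmatrix}
a_1 = \frac{1}{4}\sum_{n=1}^{4} I(i, j - n) - I(i, j) \\
a_2 = \frac{1}{4}\sum_{n=1}^{4} I(i - n, j) - I(i, j) \\
a_3 = \frac{1}{4}\sum_{n=1}^{4} I(i, j + n) - I(i, j) \\
a_4 = \frac{1}{4}\sum_{n=1}^{4} I(i + n, j) - I(i, j)
\end{bmatrix},
\quad
\begin{bmatrix}
q_1 = \sum_{n=1}^{4}\sum_{m=1}^{4} I(i + n, j - m) / I(i, j) \\
q_2 = \sum_{n=1}^{4}\sum_{m=1}^{4} I(i - n, j - m) / I(i, j) \\
q_3 = \sum_{n=1}^{4}\sum_{m=1}^{4} I(i - n, j + m) / I(i, j) \\
q_4 = \sum_{n=1}^{4}\sum_{m=1}^{4} I(i + n, j + m) / I(i, j)
\end{bmatrix}.
\tag{2.6}
\]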
We can now complete the vector $\overrightarrow{SV}$ by filling its second, third and fourth
elements with the remaining three q variables, taken in the counter-clockwise
direction after the first element. Thus, in this example, since $q_3$ is the first
element in $\overrightarrow{SV}$, we have
$\overrightarrow{SV} = [q_3, q_4, q_1, q_2]$. Since a corner is invariant to
rotation, the screening vector $\overrightarrow{SV}$ remains proportionally the same in the target
image even when the object is rotated.
To screen the feature points in the target image, we first find the screening
vector $\overrightarrow{SV}^M$ from the model, which is computed in the reference image at the
feature point that is used as the origin for creating the model. We then find
a set of screening vectors from the target image, $\{\overrightarrow{SV}^C_1, \dots, \overrightarrow{SV}^C_k\}$, where k is
the number of feature points extracted in Subsection 2.2.2. The set of
feature points remaining after screening, $P_{scr}$, can be described as:
\[ P_{scr} = \left\{ p_i \,\middle|\, \frac{SV^C_i \cdot SV^M}{\|SV^C_i\| \times \|SV^M\|} \ge \delta ;\ i = 1, \dots, k \right\} \tag{2.7} \]
where the operator “·” is the inner product and ‖ ‖ is the norm operator. The
search space is now limited to the remaining feature points in the target image.
We establish a set of candidates $\{ILP^C_1(\rho, \theta), \dots, ILP^C_n(\rho, \theta)\}$ from the set of
remaining feature points, $P_{scr}$, by applying the LPT, described in Eqs. 2.1 and
2.2, to the target image using each feature point $p^C_1, \dots, p^C_n$ as the origin, where
n is the size of $P_{scr}$. The radius of the LPT circle, $R^C_{max}$, should be at least the size
of the object in the target image. Since the size of the object in the target image is
unknown, one may choose $R^C_{max} = R^M_{max}$. The values of nr and nθ have to be
the same as those used in establishing the model. Fig. 2.3 demonstrates
the performance of our screening method.
3. Local search: After the model and the candidates are established, we now per-
form a local search by mapping each of the candidates to the model. Since
rotation and scale appear as shift in the angular and the log-radius directions of
the log-polar coordinates, model and candidates may be misaligned. We recover
Figure 2.3: (a) feature points extracted from the original image, (b) the red dot
indicates the feature point selected as the original point for creating the model and the
screening vector, (c) feature points extracted from the scaled and rotated image, (d)
feature points remained after screening
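these shifts by phase-correlating each of the candidates $ILP^C_i(\rho, \theta)$ with the
model $ILP^M(\rho, \theta)$. To do so, we first find their 2D discrete Fourier transforms
$F^C_i(\omega_\rho, \omega_\theta)$ for i = 1, . . . , n and $F^M(\omega_\rho, \omega_\theta)$. For an N × M image, the 2D
discrete Fourier transform can be computed as:
\[ F(\omega_\rho, \omega_\theta) = \frac{1}{NM} \sum_{\rho=0}^{N-1} \sum_{\theta=0}^{M-1} ILP(\rho, \theta)\, e^{-2\pi j(\rho\omega_\rho/N + \theta\omega_\theta/M)}. \tag{2.8} \]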
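Any translation in the spatial domain results in a phase shift in the frequency
domain, as shown below:
\[ ILP_b(\rho, \theta) \equiv ILP_a(\rho - \Delta\rho, \theta - \Delta\theta) \tag{2.9} \]
\[ \frac{F^M(\omega_\rho, \omega_\theta)\, F^{C*}_i(\omega_\rho, \omega_\theta)}{\left|F^M(\omega_\rho, \omega_\theta)\, F^{C*}_i(\omega_\rho, \omega_\theta)\right|} = e^{j(\omega_\rho \Delta\rho_i + \omega_\theta \Delta\theta_i)}. \tag{2.10} \]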
The translation, (Δρ,Δθ), between two images is obtained by finding the peak
of the inverse Fourier transform of this normalized cross-power. From these two
variables, we can compute the scale and rotation between two log-polar images
as:
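\[ \mathrm{Scale} = \exp\left[\frac{\Delta\rho \,\log(R_{max})}{n_\rho}\right], \qquad \mathrm{Rotation} = 360 \times \frac{\Delta\theta}{n_\theta} \text{ degrees}. \tag{2.11} \]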
In our approach, we phase-correlate each of the candidates $ILP^C_i(\rho, \theta)$, i =
1, . . . , n, with the model $ILP^M(\rho, \theta)$. Each candidate is then shifted by
$ILP^{SC}_i(\rho, \theta) = ILP^C_i(\rho - \Delta\rho, \theta - \Delta\theta)$ before performing further verification. The superscript
SC represents the shift-recovered candidates.
4. Similarity measure: Although the correlation technique itself is already a similarity
measure [8], in reality we may have to recognize or track multiple objects in the
target image. Each object gives a different phase correlation value when matched
against the target image. Thus relying only on the maximum phase correlation
value is not enough for the purpose of object recognition. We introduce a simple
similarity measure that is fast to compute and can withstand
noise and changes in image intensities. After each candidate is phase-correlated
and its rotation and scale are recovered, we verify the similarity between
each of the shift-recovered candidates and the model. We treat each row and
column of the two images as vectors and find the inner product result $s_\rho(i)$ for
i = Δρ to nρ and $s_\theta(j)$ for j = 1 to nθ as follows:
\[ s_\bullet(k) = \frac{L^{SC}_\bullet(k) \cdot L^M_\bullet(k)}{\|L^{SC}_\bullet(k)\| \times \|L^M_\bullet(k)\|} \tag{2.12} \]
where $L^{SC}(k)$ is the image vector k of the shift-recovered candidates and $L^M(k)$ is
the image vector k of the model. The subscript “•” indicates a particular ρ or
θ vector. The final similarity measure, $S_l$, is the sum of all $s_\rho(i)$ and $s_\theta(j)$,
\[ S_l = \left[\sum_{i=1}^{n_\rho} s_\rho(i) + \sum_{j=1}^{n_\theta} s_\theta(j)\right] \Big/ \left(n_\rho + n_\theta - \Delta\rho\right). \tag{2.13} \]
To decide whether the candidate matches the model, the similarity factor $S_l$
has to be greater than a threshold value ε.
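The screening test of Eq. 2.7, the shift-to-parameter conversion of Eq. 2.11 and the similarity measure of Eqs. 2.12 and 2.13 all reduce to a few lines of array arithmetic. The sketch below is illustrative only and rests on several assumptions: the function names are hypothetical, rows of a log-polar image index log-radius, the row sum starts at Δρ as stated in the text accompanying Eq. 2.12, and the example vectors and grid sizes are made up.

```python
import numpy as np

def cosine(a, b):
    """Normalised inner product used in Eqs. (2.7) and (2.12)."""
    a, b = np.ravel(a).astype(float), np.ravel(b).astype(float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def screen(candidate_vectors, model_vector, delta):
    """Indices of feature points whose screening vector passes Eq. (2.7)."""
    return [i for i, sv in enumerate(candidate_vectors)
            if cosine(sv, model_vector) >= delta]

def scale_and_rotation(d_rho, d_theta, r_max, n_rho, n_theta):
    """Eq. (2.11): convert log-polar shifts (in samples) to scale and degrees."""
    scale = np.exp(d_rho * np.log(r_max) / n_rho)
    rotation = 360.0 * d_theta / n_theta
    return float(scale), float(rotation)

def similarity(candidate, model, d_rho=0):
    """Eqs. (2.12)-(2.13): mean row/column cosine between aligned log-polar images.

    The first d_rho rows are skipped, on the assumption that they hold no
    valid data once the log-radius shift has been undone.
    """
    n_rho, n_theta = model.shape
    s_rho = [cosine(candidate[i], model[i]) for i in range(d_rho, n_rho)]
    s_theta = [cosine(candidate[:, j], model[:, j]) for j in range(n_theta)]
    return (sum(s_rho) + sum(s_theta)) / (n_rho + n_theta - d_rho)

kept = screen([[1, 2, 3, 4], [4, 3, 2, 1], [2, 4, 6, 8]], [1, 2, 3, 4], delta=0.99)
scale, rotation = scale_and_rotation(d_rho=0, d_theta=16, r_max=100.0,
                                     n_rho=64, n_theta=64)
rng = np.random.default_rng(5)
model = rng.random((64, 64)) + 0.1
s_same = similarity(model, model, d_rho=3)
```

Because every term is a normalized inner product, a uniform change in image gain leaves both the screening decision and the similarity value unchanged, while a candidate identical to the model scores 1.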
2.3 Experimental Results
The performance of the proposed approach is evaluated on two bases: robustness
and accuracy. In the first part of this section, the computational cost of the
algorithm is tested. We also discuss in detail the robustness of the proposed search
strategy, which combines feature point extraction with the screening method. The
performance of the proposed approach in terms of feature point matching and screening,
as well as its robustness to scale and rotation, is discussed and compared with
the correlation-based techniques. The latter subsection presents the accuracy of
the proposed object recognition approach.
2.3.1 The robustness
One of the major contributions of this work is the proposed search strategy, which
combines feature point extraction with the screening method. We first evaluate
the overall computational cost of the proposed object recognition approach. All
experiments are carried out in Matlab running on a Pentium IV 2.0-GHz PC.
The average computational time is shown in Table 2.1. The results demonstrate
the performance of our screening method: the average computational time
is reduced to approximately 25.19% and 7.43% of the computational time without
screening in the normal environment and in the noisy environment, respectively.
By comparison, the naive search strategy that blindly computes the LPT for all
translations in the target image has a computational load approximately 100 times
higher than that of the proposed approach, e.g. 550 seconds for a 240 × 360 image.
The computational time can be slightly reduced when smaller values of nr and nθ
are used. However, reducing the resolution of the LPT lowers the accuracy rate.
We also applied the multi-resolution search strategy framework suggested in [94],
adopting a Gaussian pyramid with 3 levels of reduction. The search begins with all
translations at the coarsest level. At each finer level, we search only the 20 × 20
neighboring pixels around the match point of the coarser level. The average
computational time using the multi-resolution framework is greater than that of
our proposed approach: it requires approximately 26 seconds for each BMW video
frame and 32 seconds for each US coins image. We also note that the multi-resolution
framework might not be suitable for multiple object recognition, since more than
one area of interest might need to be considered, which could lead to a much
higher computational cost at the finer levels.
Figure 2.4: (a) and (c) Feature point extraction of the House and blox images; (b) and (d) 8 feature points randomly selected from each image
Figure 2.5: 18 candidates from the scaled and rotated House image
The experiment summarized in Table 2.1 demonstrates the robustness of our object
recognition approach. Feature point extraction plays an important role in reducing
the computational cost, and the screening method reduces the cost further, by
approximately 30 to 80 percent. We extend our experiment to feature point matching
and feature point screening to evaluate the efficiency of the proposed screening
method. To the best of our knowledge, no published screening method suits our
LPT-based object recognition purposes. One of the commonly used feature matching
techniques is RANSAC [21]. This technique provides highly accurate feature point
matching at the cost of high computation, since multiple feature point
outliers are involved in the computation to fit the models. Thus, it is not suitable
for our screening purposes. We therefore compare our proposed screening method
with the correlation-based feature matching techniques, which provide the results
closest to our screening purposes. Correlation is widely used as a tool for
matching feature points between two distinct images [81], [2], [82], [15], [91]. Two
correlation-based techniques, average square difference correlation (ASD) [2], [82]
and variance normalized correlation (VNC) [2], [82], are compared with our work. The
first technique, ASD, is based on the intensity difference between square areas around
the feature points. The ASD correlation coefficient between feature points m and n is
defined as:
$$ C_{ASD}(m, n) = \frac{1}{N} \sum_{m' \in \aleph(m),\, n' \in \aleph(n)} \left[ I(m') - I(n') \right] \tag{2.14} $$
where N is the size of the neighborhood window. The size of the neighborhood window used
in the experiment is 9 × 9 pixels. The second correlation-based feature matching
technique, VNC, uses the normalized intensity of the neighborhood window and its
variance to match the feature points. The VNC correlation coefficient between feature
points m and n is defined as:
$$ C_{VNC}(m, n) = \frac{1}{N \sqrt{\sigma_I^2(m)\, \sigma_I^2(n)}} \sum_{m' \in \aleph(m),\, n' \in \aleph(n)} \left[ I(m') - \bar{I}(m) \right] \left[ I(n') - \bar{I}(n) \right] \tag{2.15} $$
Figure 2.6: 18 candidates from the scaled and rotated blox image
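The two baseline coefficients can be sketched in a few lines of NumPy for a pair of equally sized neighborhood windows. This is an illustrative sketch, not the code used in the experiments. Two assumptions are made here: the ASD difference is squared, as the name "average square difference" suggests (no exponent appears in Eq. (2.14) as printed), and the mean and variance in the VNC are taken over each window. The function names are hypothetical.

```python
import numpy as np


def asd(window_m, window_n):
    """Average square difference between two windows (assumes a squared difference)."""
    a = np.asarray(window_m, dtype=float)
    b = np.asarray(window_n, dtype=float)
    if a.shape != b.shape:
        raise ValueError("windows must have the same shape")
    return float(np.mean((a - b) ** 2))          # mean = sum over the window / N


def vnc(window_m, window_n, eps=1e-12):
    """Variance normalized correlation between two windows, in the range [-1, 1]."""
    a = np.asarray(window_m, dtype=float)
    b = np.asarray(window_n, dtype=float)
    if a.shape != b.shape:
        raise ValueError("windows must have the same shape")
    da = a - a.mean()                            # remove the window mean
    db = b - b.mean()
    denom = np.sqrt(a.var() * b.var())           # sqrt of the product of variances
    return float(np.mean(da * db) / max(denom, eps))
```

The sketch makes the difference in threshold ranges visible: `vnc` is bounded by construction, whereas `asd` grows with the intensity range of the windows, so its threshold has to be chosen on a much larger scale.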
To evaluate the performance of the proposed screening approach when
the object is scaled and rotated, the House and blox images, which are chosen as
reference images in this experiment, are scaled by factors of 0.9 and 2.5, respectively.
Two sets of candidate images are created by rotating the scaled images by 10i degrees,
where i = 1, . . . , 18. This creates a total of 18 candidates for each set of
tested images, as shown in Figs. 2.5 and 2.6. The Harris corner detector is applied to
each image to extract the feature points. We randomly choose 8 feature points
from each reference image, as shown in Fig. 2.4, as reference features to be matched
with the candidates. The parameters used in the test are as follows: the size of the
screening mask is 11 × 11 pixels, the size of the VNC and ASD masks is 9 × 9 pixels,
and σ is equal to 1.
Figure 2.7: A demonstration of the performance in matching feature points in the object that is rotated with various orientations: (a) proposed approach, (b) VNC, (c) ASD. [Plots: percentage of correct match versus orientation (0 to 180); one curve per threshold, δ = 0.99 to 0.95 in (a), δ = 0.9 to 0.5 in (b), and δ = 1000 to 5000 in (c).]
The 8 reference feature points of both reference images are matched with the
candidates using our proposed approach, VNC and ASD. We visually inspect the
result of each approach to obtain the average percentage of correct feature matches
at each image orientation and each threshold value. Fig. 2.7
shows the average proportion of correct feature matches using the reference
features. The results of using VNC and ASD for matching features are also provided for
comparison. In the experiment, the performance of both VNC and ASD degrades
as the orientation increases, because the correlation technique is not
invariant to rotation. At higher orientations, the proposed
approach clearly outperforms VNC and ASD and provides a success rate (percentage of
correct matches) of more than 40 percent even with the largest threshold value.
Figure 2.8: A demonstration of the performance in screening feature points using feature number 3 of the blox image: (a) proposed approach, (b) VNC, (c) ASD. [Plots: remaining features after screening (%) versus orientation (0 to 180); one curve per threshold, δ = 0.99 to 0.91 in (a), δ = 0.9 to 0.1 in (b), and δ = 1000 to 9000 in (c).]
Having tested the performance of our approach for matching feature points, we now
evaluate its performance for screening. For each reference image, one of the 8
reference feature points is randomly chosen: reference feature number 3 from the
House image and number 4 from the blox image are used as the reference feature
points for screening. We visually inspect the feature points remaining after
screening in each candidate, Pscr, to see whether the screening procedure succeeds
in removing the undesired feature points while retaining the desired feature point.
The results of using VNC and ASD for matching features are also provided for
comparison. Figs. 2.8 and 2.9 show the proportion of the number of feature points
remaining after screening, Pscr, to the total number of feature points for each
candidate at different threshold values, δ. A discontinuity in a plot indicates
that the tested approach failed, at that particular threshold value, to retain the
correct feature point after screening. The experiment shows that VNC fails as a
screening method at high orientations. Although ASD provides better results than
VNC, its performance is still relatively low, as the proportion of remaining
features after screening is high (approximately 50 percent using a threshold of 5000).
The experiment shows that the proposed approach performs better at almost every
orientation value, and the results indicate that it is invariant to rotation and
scale. Using a threshold value of 0.93 in both experiments with the proposed
approach, the correct feature point remains while more than 60 percent of the total
number of feature points is removed.
Figure 2.9: A demonstration of the performance in screening feature points using feature number 3 of the house image: (a) proposed approach, (b) VNC, (c) ASD. [Plots: remaining features after screening (%) versus orientation (0 to 180); one curve per threshold, δ = 0.99 to 0.91 in (a), δ = 0.9 to 0.1 in (b), and δ = 1000 to 9000 in (c).]
Figure 2.10: A demonstration of the performance in screening feature points in the noisy environment using feature number 3 of the blox image: (a) proposed approach, (b) VNC, (c) ASD. [Plots: remaining features after screening (%) versus orientation (0 to 180); one curve per threshold, δ = 0.99 to 0.91 in (a), δ = 0.7 to 0.1 in (b), and δ = 2000 to 9000 in (c).]
We extend our test to the noisy environment. For this experiment, Gaussian noise with
zero mean and 0.01 variance is added to the images. Figs. 2.10 and 2.11 demonstrate
the performance of the three approaches in screening. Our proposed screening method
outperforms VNC and ASD with a higher success rate and also leaves fewer
remaining features after screening. The results also indicate that our proposed
screening method gives a much smoother screening result across all orientations
than VNC and ASD.
Another advantage of the proposed method over ASD and VNC is the smaller range
of the threshold value, which is from 0 to 1. In ASD, the range of the correlation
coefficient depends on the size of the screening mask and can grow beyond 10000
when the mask size is bigger than 9 × 9 pixels. VNC, on the other hand, is a
normalized correlation technique, which limits the range of the correlation
coefficient to −1 to 1. Being correlation-based, both ASD and VNC perform slightly
better than the proposed approach at low orientations. However, at higher
orientations, the proposed screening method clearly outperforms VNC and ASD both
without and with noise. Together with the smaller threshold range, this allows the
user to set a proper threshold value for the desired application more easily than
with ASD and VNC.
Figure 2.11: A demonstration of the performance in screening feature points in the noisy environment using feature number 3 of the house image: (a) proposed approach, (b) VNC, (c) ASD. [Plots: remaining features after screening (%) versus orientation (0 to 180); one curve per threshold, δ = 0.99 to 0.91 in (a), δ = 0.8 to 0.1 in (b), and δ = 2000 to 9000 in (c).]
2.3.2 The accuracy
To test the accuracy of our object recognition approach, we used images taken with
a Nikon 4300 digital camera (US coins) and two video frame sequences (a BMW car
video sequence from the Caltech Image Database¹ and images taken from an aircraft
video sequence). We also created a set of noisy images by adding Gaussian noise to
some of the tested images. Samples of the recognition procedure are shown in Fig. 2.13.
The sets of tested images are presented in Figs. 2.14 to 2.17. We also perform
the test with images in different conditions, such as an artificially rotated
and scaled object, Fig. 2.13 (a), video sequences of multiple image frames, Figs. 2.13
(b) and (d), and an image in a noisy environment (Gaussian noise is added), Fig. 2.13 (c).
The parameters of the LPT, feature extraction, and similarity measure are selected
accordingly in the experiments. The parameters σ, nr, nθ, δ, ε and the size of the local
maxima mask are equal to 1, 128, 128, 0.98, 0.97, and 11 × 11 pixels, respectively. The
parameter R^M_max is chosen manually in the experiments and the parameter R^C_max is
set to be the same as R^M_max. The results of the experiments support our objective of
using the LPT for template matching based object recognition, with an accuracy of
82.05% in the normal environment and 77.78% in the noisy environment.
¹ http://www.vision.caltech.edu/html-files/archive.html
Figure 2.12: Reference images and selected objects: (a) US coins, (b) BMW, (c) Aircraft
Table 2.1: Recognition results
Images        Size      # of          # of        Avg. time       Avg. time
                        experiments   successes   w/o screening   w/ screening
US Coins      442×476   14            14          18.6261         5.9557
BMW video     240×360   18            14          26.1667         3.8649
Noisy Coins   442×476   14            11          102.2642        7.6032
Aircraft      480×704   7             4           21.1238         8.9620
2.4 Extended Works
We presented in Subsection 2.2.2 an innovative screening approach that significantly
reduces the computational time of the search process compared with applying the LPT
directly to every feature point. However, as discussed earlier, corners are not invariant
to scale change. As a result, it is important to use a small standard deviation
for the Gaussian window and a small local-maxima mask in order
to obtain a sufficiently large number of feature points, ensuring that most detected
corners still remain even when the object is scaled. The first extension of this work is to
improve the feature point extraction method so that it is invariant not only to rotation
but also to scale change. To this end, we adopt the Gabor wavelet approach for extracting
2D invariant feature points. Some modifications to existing Gabor feature extraction
methods [27] are made to enhance the performance. More details of the Gabor
feature extraction method can be found in Subsection 3.2.1 of Chapter 3 as well as
in Appendix A.
Although feature point extraction reduces the search space to a set of features,
the number of features detected is still high. To reduce the computational load, we
introduce an innovative search method, the Multi-resolution LPT (MR-LPT). Our
MR-LPT differs from the multi-resolution method used in [94] in that we do not use
an image pyramid of the target image as the search space, since performing the search
from the coarsest to the finest level in a noisy environment gives a higher error rate
than searching at the highest resolution. Instead, we keep the target image at
its original resolution and use different sampling parameters nρ and nθ for each
iteration of the search.
The search begins by establishing a set of candidates {I^{LPC}_1(ρ, θ), . . . , I^{LPC}_n(ρ, θ)}
from the set of feature points, P, by applying the LPT, described in Eqs. 2.1 and
2.2, to the target image using each feature point in {p^C_1, . . . , p^C_n} as the origin, where
n is the number of feature points (more details of the feature extraction procedure can be
found in Subsection 3.2.1 of Chapter 3). The size of the LPT circle, R^C_max, should be
at least the size of the object in the target image. Since the size of the object in the
target image is unknown, one may choose R^C_max = R^M_max, where the superscripts
M and C represent the model and the candidate, respectively. The values of nρ and
nθ for each iteration are T × 2(s − 1), where s is the iteration number, s = {1, 2, 3}.
We use T = 8 in this example and throughout the paper. In other words, the MR-LPT
proceeds from 16 × 16 sampling resolution to 64 × 64 sampling resolution, which
is the resolution used for creating the model. At each resolution level, the MR-LPT
images computed at each feature point in the target image are phase-correlated with the
model created by MR-LPT at the same resolution. A feature point is
discarded if its phase correlation value is less than a threshold value. The procedure
is repeated from the coarsest resolution level to the finest.
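The coarse-to-fine screening loop described above can be sketched as follows. This is a minimal illustration rather than the actual implementation: `score_fn` is a hypothetical callable standing in for "compute the n × n LPT around a feature point and return its phase correlation value against the n × n model", and the default threshold is an arbitrary placeholder. The resolutions 16, 32 and 64 are the ones stated in the text.

```python
def coarse_to_fine_screen(points, score_fn, resolutions=(16, 32, 64), threshold=0.5):
    """Discard feature points level by level, from the coarsest resolution to the finest.

    points      : iterable of feature points (any hashable or tuple representation)
    score_fn    : hypothetical callable (point, n) -> phase correlation value at n x n
    resolutions : sampling resolutions visited in order
    threshold   : points scoring below this value at any level are discarded
    """
    survivors = list(points)
    for n in resolutions:
        # Only points that survived the previous (coarser) level are evaluated here,
        # so the expensive fine-resolution transform is computed for few points.
        survivors = [p for p in survivors if score_fn(p, n) >= threshold]
        if not survivors:
            break
    return survivors
```

The design point is that the cost of the finest level is paid only for points that have already passed the cheaper coarse levels.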
To obtain the MR-LPT, one may perform the LPT with different sampling parameters,
i.e. nρ = nθ = 16, 32, and 64, respectively. However, the LPT at each
sampling resolution is then computed independently: the LPT image
from the coarser level is discarded and the LPT image at the finer level is computed
separately, so the computation at the coarser level is wasted. We propose an
innovative alternative way to perform the MR-LPT by mapping the
area sampled by the LPT at the coarser level to the area of the LPT at the finer level. Fig.
2.18 demonstrates how the area of the coarser-level LPT is mapped to the area of the finer
LPT. Figs. 2.18(a), 2.18(b), and 2.18(c) show the LPT mapping at sampling resolutions
of 16×16, 32×32, and 64×64, respectively. In the MR-LPT, the areas with the striped
pattern represent the coarsest level of the LPT image. Unlike in the regular LPT
(as shown in Fig. 2.18(a)), the LPT image at the coarsest level is not discarded
after screening. Instead, it is reused as part of the finer LPT. The coloured areas
represent the finer 32 × 32 MR-LPT. Lastly, every block in Fig. 2.18(c) represents
the finest level, 64 × 64.
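The mathematical expression of the MR-LPT is as follows. Let
M^{(1)s}_{m×m} be an MR-LPT image at level s with size m × m. The finer level of the
MR-LPT, M^{(1)s+1}_{2m×2m}, can be computed by first calculating M^{(2)s}_{m×m} and
M^{(3)s}_{m×m} in the same manner as M^{(1)s}_{m×m} with 2−n step shift in the
log-radius direction, orientation direction, and both log-radius and orientation
directions, respectively. Then
$$ Z^{(n)\,s+1}_{2m \times 2m} = A_{2m \times m} \times M^{(n)\,s}_{m \times m} \times A^{T}_{m \times 2m} \tag{2.16} $$
where A is a matrix such that:
$$ A = \{ a_{ij} \in A \mid a_{ij} = 1 \text{ for } 2j - 1 = i,\ a_{ij} = 0 \text{ elsewhere} \}. \tag{2.17} $$
Then,
$$ Z^{(n)\,s+1}_{2m \times 2m} = R^{(n)\,s+1}_{2m \times 2m} \times Z^{(n)\,s+1}_{2m \times 2m} \times C^{(n)\,s+1}_{2m \times 2m} \tag{2.18} $$
where R and C are the translation matrices in the row and column directions, respectively:
$$ R^{(n)}_{2m \times 2m} = \begin{cases} I_{2m \times 2m} & \text{for } n = 1, 3 \\ \{ r_{ij} \in R \mid r_{ij} = 1 \text{ if } j = i + 1,\ r_{ij} = 0 \text{ elsewhere} \} & \text{otherwise} \end{cases} \tag{2.19} $$
$$ C^{(n)}_{2m \times 2m} = \begin{cases} I_{2m \times 2m} & \text{for } n = 1, 3 \\ \{ c_{ij} \in C \mid c_{ij} = 1 \text{ if } j = i + 1,\ c_{ij} = 0 \text{ elsewhere} \} & \text{otherwise} \end{cases} \tag{2.20} $$
$$ M^{(1)\,s+1}_{2m \times 2m} = \sum_{n=1}^{3} Z^{(n)\,s+1}_{2m \times 2m} \tag{2.21} $$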
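The effect of Eqs. (2.16) and (2.17) can be checked numerically: multiplying by A on the left and by its transpose on the right places the m × m coarse image on every other row and column of a 2m × 2m grid and leaves zeros in between, which is where the shifted images are later added. The NumPy sketch below illustrates only this step. The function names are hypothetical, and the translation matrices of Eqs. (2.18) to (2.20) are deliberately not reproduced.

```python
import numpy as np


def upsampling_matrix(m):
    """Build the 2m x m matrix A of Eq. (2.17): a_ij = 1 where i = 2j - 1 (1-based)."""
    a = np.zeros((2 * m, m))
    cols = np.arange(m)
    a[2 * cols, cols] = 1.0      # 1-based i = 2j - 1 is 0-based row 2j for 0-based column j
    return a


def zero_interleave(coarse):
    """Return Z = A M A^T for a square coarse image M, as in Eq. (2.16)."""
    coarse = np.asarray(coarse, dtype=float)
    m = coarse.shape[0]
    if coarse.shape != (m, m):
        raise ValueError("coarse image must be square")
    a = upsampling_matrix(m)
    return a @ coarse @ a.T
```

For example, `zero_interleave(np.arange(4.0).reshape(2, 2))` returns a 4 × 4 array whose entries at even row and column indices (0-based) hold the original values, so `Z[::2, ::2]` reproduces the coarse image. This is how the coarse-level samples are carried over to the finer level without being recomputed.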
Fig. 2.19 shows the MR-LPT images of the US coins image (Fig. 2.20) at resolutions
of 16×16, 32×32, and 64×64, respectively. The effectiveness of the combined Gabor feature
extraction and MR-LPT search method in reducing the number of feature points in
the search space is presented in Fig. 2.20. More details and experimental results
using the proposed MR-LPT and Gabor feature extraction are presented in [47].
2.5 Summary
In this chapter, we have presented a new template matching based object recognition
approach using the LPT. The rotation and scale invariant properties of the LPT
allow us to develop a new method for recognizing rotated and scaled objects, a
problem that is difficult to overcome in Cartesian coordinates. The proposed method
extracts rotation invariant feature points from the object and the target image, and
uses these feature points to develop the LPT representations. Furthermore, we propose
a new method to evaluate and select the feature points using the screening vector,
which is successful in reducing the number of feature points that are considered
candidates for log-polar mapping. The result is a significant reduction in
computational time. We also introduce a new similarity measure for final classification.
The experiments demonstrate that our recognition approach succeeds in recognizing
objects that are rotated and scaled under different conditions, i.e. still images,
video sequences, and noisy environments. The performance of our proposed screening
method is also evaluated. Our approach is robust and fast to compute, and
since no motion estimation is used, it also works well for object tracking
even when the video sequence is skipped or reordered.
Figure 2.13: Experimental results in the target image: (from left to right) feature point extraction, screening, recognizing object
Figure 2.14: Sets of tested images, US coins
Figure 2.15: Sets of tested images, US coins in noisy environment
Figure 2.16: Sets of tested images, BMW
Figure 2.17: Sets of tested images, Aircraft
Figure 2.18: MR-LPT mapping
Figure 2.19: MR-LPT of the quarter coin in US coin image: (a) 16 × 16, (b) 32 × 32, and (c) 64 × 64
Figure 2.20: Feature point reduction from MR-LPT (red dot represents the center point of LPT)
CHAPTER 3
IMAGE REGISTRATION USING ADAPTIVE POLAR TRANSFORM
LPT is a well-known tool in image processing because of its rotation and scale invariant
properties [3, 10, 34, 47, 57, 60, 76, 77, 89, 94]. Scale and rotation in Cartesian
coordinates appear as translation or shifting in the log-polar domain. These invariant
properties make LPT-based image registration robust to scale and rotation. However,
LPT suffers from nonuniform sampling. One major consequence is the
high computational cost of the transformation process, which comes from the
oversampling at the fovea in the spatial domain. Since no information is gained from
oversampling, this computation is wasted. Another major problem of LPT is
bias matching. With LPT, the matching mechanism focuses mainly on the fovea, the area
close to the center point of the transformation, while the peripheral area is given less
consideration. Furthermore, occlusions and alterations between the two images may
occur, which LPT does not take into account. For example, satellite images of the same
location taken at different times may contain occlusions due to climate change or
clouds. To register two such images effectively, an image registration method that
is robust to occlusion and alteration is needed.
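The shift property mentioned at the start of this chapter can be made explicit with a short worked derivation. It assumes the common convention ρ = log r with the transform centred at the origin. The symbols a (scale factor) and φ (rotation angle) are introduced here only for illustration.

```latex
\begin{aligned}
(x, y) &= \left( e^{\rho}\cos\theta,\; e^{\rho}\sin\theta \right), \\
(x', y') &= a \left( x\cos\varphi - y\sin\varphi,\; x\sin\varphi + y\cos\varphi \right)
          = \left( e^{\rho + \log a}\cos(\theta + \varphi),\; e^{\rho + \log a}\sin(\theta + \varphi) \right), \\
(\rho', \theta') &= \left( \rho + \log a,\; \theta + \varphi \right).
\end{aligned}
```

Under these assumptions, scaling by a and rotating by φ about the center point move every sample by the same offset (log a, φ) in the log-polar domain, which is why a translation-recovery tool such as phase correlation can estimate both parameters.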
Motivated by these observations on the conventional LPT, our work attempts
to effectively register two images that are subject to occlusion and alteration in
addition to scale, rotation and translation. We introduce the adaptive polar transform
(APT) technique, which samples the image evenly in the spatial domain. We further
apply the projection transform to the transformed image to reduce it from
a two-dimensional image to one-dimensional vectors. With APT and the projection
transform, rotation and scale in the Cartesian domain appear as shifting and variable
scale in the transformed domain.
The new method requires less computation in the transformation process than
the conventional LPT, while maintaining the robustness to changes in
scale and rotation. We further introduce a new search method that efficiently uses
scale and rotation invariant feature points to eliminate the exhaustive search
over all possible translations of the model image. Another contribution of this
work is an image comparison scheme in the projection domains that is designed for
locating the areas of the image that are subject to occlusions and alterations. This
information is useful for many applications such as medical imaging or scene change
detection. For example, medical images of the same patient taken before
and after surgery are registered, compared and analyzed to evaluate the progress
of the surgery. It is valuable for the image registration system not only to align the
images, but also to automatically locate the areas where the two images differ. This
information would make it easier for the doctor to compare and analyze the medical images.
We denote the area that contains alteration or occlusion as the altered area.
3.1 Adaptive Polar Transform Approach
Inspired by the scale and rotation invariance properties of the conventional LPT
approach, we propose a novel image transformation scheme, APT, that is designed
to address the two major problems of LPT, the high computational cost of the
transformation procedure and the bias matching due to the nonuniform sampling, while
maintaining all of its advantages. Subsection 1.1 briefly introduces LPT and its
advantages for image registration. Subsection 3.1.1 presents the motivation of the
proposed APT approach. Subsections 3.1.2 and 3.1.3 present the proposed APT and the
projection transform, respectively.
3.1.1 Motivation of the proposed approach
Although LPT has been widely used in many image processing applications, it
suffers from nonuniform sampling. As shown in Fig. 1.1(a), as the radius of the
mapping increases, pixels in Cartesian coordinates are sampled fewer
times. This nonuniform sampling can cause image pixels that are far away from
the center point to be missed, which loses image information
and eventually decreases the accuracy of the registration system. To prevent
any image pixel from being missed in the mapping process, large numbers of samples
are required in both the log-radius and the angular directions. We denote by nρ and
nθ the numbers of samples in the log-radius and the angular directions, respectively.
We also denote by Ri the radius in pixels for i = 1, . . . , nρ; each radius is sampled
uniformly over 360 degrees in the angular direction for θ = 0, . . . , nθ. To prevent
undersampling in the log-radius direction, the condition R_{nρ} − R_{nρ−1} ≤ 1 has to
be met. It follows that the following relations should hold:
$$ R_i = \exp\left[ \frac{i \times \log R_{max}}{n_\rho} \right], \quad R_{max} = R_{n_\rho}, \tag{3.1} $$
$$ n_\rho \ge \frac{\log R_{max}}{\log R_{max} - \log(R_{max} - 1)}. \tag{3.2} $$
In the angular direction, the largest sampling circle, of radius Rmax, contains
approximately 2πRmax image pixels on its circumference. Hence, the number of samples
in the angular direction must be equal to or greater than that amount to prevent
image pixels from being missed in the sampling procedure:
$$ n_\theta \ge 2\pi R_{max}. \tag{3.3} $$
For simplicity, nρ and nθ can be chosen as powers
of two. This not only makes the computation simpler, but also accelerates the
matching, which is usually carried out by cross-correlation [39] or by Fourier phase
correlation [35]. Hence, Eqs. (3.2) and (3.3) can be expressed as:
$$ n_\rho = 2^{\,\mathrm{ceil}\left\{ \frac{1}{\log 2} \times \log\left[ \frac{\log R_{max}}{\log R_{max} - \log(R_{max} - 1)} \right] \right\}}, \quad n_\theta = 8 R_{max}. \tag{3.4} $$
The operation ceil(A) denotes the smallest integer greater than or equal to A. With
these numbers of samples used in the computation of the LPT, image pixels
in Cartesian coordinates will not be missed in the mapping process. The pixels at
the circumference will be mapped to log-polar coordinates approximately one-to-one.
However, image pixels at the fovea will be oversampled. Since no information
is gained from oversampling, the computation is wasted. For an image of size 256×256
pixels, a minimum of 613 samples in the log-radius direction and 805 samples in the
angular direction are required (1024 samples in both the log-radius and the angular
directions if powers of 2 are used). Clearly, the required number
of samples is too large compared with the original size of the image in Cartesian
coordinates.
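The sampling requirements of Eqs. (3.2) to (3.4) are easy to evaluate numerically. The following is a small sketch using only the Python standard library. The function names are hypothetical, and Rmax is taken as half the side of a square image, which is an assumption of this example.

```python
import math


def lpt_min_samples(r_max):
    """Lower bounds of Eqs. (3.2) and (3.3), rounded up to whole samples."""
    if r_max <= 1:
        raise ValueError("r_max must be greater than 1")
    bound_rho = math.log(r_max) / (math.log(r_max) - math.log(r_max - 1))
    bound_theta = 2 * math.pi * r_max
    return math.ceil(bound_rho), math.ceil(bound_theta)


def lpt_pow2_samples(r_max):
    """Power-of-two choice of Eq. (3.4): n_rho = 2^ceil(log2(bound)), n_theta = 8 * r_max."""
    if r_max <= 1:
        raise ValueError("r_max must be greater than 1")
    bound_rho = math.log(r_max) / (math.log(r_max) - math.log(r_max - 1))
    n_rho = 2 ** math.ceil(math.log2(bound_rho))
    n_theta = 8 * r_max
    return n_rho, n_theta
```

For Rmax = 128 (a 256 × 256 image under the assumption above), `lpt_pow2_samples(128)` gives 1024 samples in each direction, in line with the power-of-two figure quoted in the text. Note that 8·Rmax is itself a power of two only when Rmax is.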
Another major problem of using the conventional LPT for image registration is
bias matching. Since the number of samples at the fovea of the image is larger
than that at the periphery, the matching mechanism also focuses more on the
fovea or central area, while the peripheral area is given less consideration. Hence,
using the conventional LPT for image registration yields inaccurate results when there
are changes in the image, especially in the area close to the center point. Fig. 3.1
shows an example of registering the Lena image patch to the non-occluded target
image (the upper image on the right of Fig. 3.1(a)) and to the occluded target image
(the lower image on the right of Fig. 3.1(a)), respectively. In this demonstration, an
artificial block of 20×20 pixels is used as the occlusion. The sensitivity of the matching is
evaluated through the correlation function between the model image and the target
image (following the registration method using the conventional LPT proposed
in [94]). Figs. 3.1(b) and 3.1(c) show the correlation coefficients at the point where
the model image and the target image are matched, for the non-occluded target
image and the occluded target image, respectively. In this demonstration, it can
be seen that the conventional LPT performs effectively in registering images under
normal conditions. Fig. 3.1(b) shows a strong peak in the correlation coefficients
and Fig. 3.1(a) (upper-right image) indicates an accurate registration result. However,
when part of the target image contains an occlusion, the peak of the correlation is reduced
dramatically, to the point that it is difficult to locate, as shown in Fig. 3.1(c).
In fact, the peak of the correlation coefficients is mislocated (false peak). Therefore,
the registration result using the conventional LPT in this case yields errors in both
scale and rotation, as shown in the lower-right image in Fig. 3.1(a). Since the method
is highly sensitive to changes in the image, especially in the area close to the fovea,
the conventional LPT is not suitable for registering images that are occluded or altered.
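The degree of this bias can be estimated directly from Eq. (3.1). As a short worked example (the choice of the half-radius disc and of Rmax = 128 is made here only for illustration), the sampling rings that fall inside the disc of radius Rmax/2 satisfy:

```latex
\begin{aligned}
R_i = R_{max}^{\,i/n_\rho} \le \frac{R_{max}}{2}
\;&\Longleftrightarrow\;
\frac{i}{n_\rho} \le 1 - \frac{\log 2}{\log R_{max}}, \\
R_{max} = 128 = 2^{7}
\;&\Longrightarrow\;
\frac{i}{n_\rho} \le 1 - \frac{1}{7} = \frac{6}{7} \approx 0.857 .
\end{aligned}
```

Since every ring carries the same nθ samples, roughly 86% of all LPT samples in this example fall inside the central disc, which covers only one quarter of the area of the full sampling circle. A change near the center therefore affects far more samples than a change of the same size at the periphery.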
3.1.2 The proposed APT approach
The objective of our proposed APT is to address the two major problems of
the conventional LPT: the uneven sampling of the transformed image, and
the computational waste due to oversampling at the fovea. Both problems
arise because the number of samples of the
conventional LPT increases exponentially from the periphery to the fovea. Hence,
when the transformed image is used for matching, more weight is given to the
fovea than to the other areas of the image. To address these problems, we propose
a novel APT approach that is robust in registering images that are subject to
occlusion and alteration, while maintaining a sufficiently low computational cost during
the transformation.
We denote $n_r$ and $n_\theta$ as the numbers of samples in the radius and the angular
directions, respectively. For APT, given a square image with the size of $2R_{\max} \times 2R_{\max}$
pixels that we want to evenly sample and transform from the Cartesian to the polar
coordinates, the parameter $n_r$ should be greater than or equal to $R_{\max}$. We denote $R_i$ as
the radius in pixels of sample $i$ in the radius direction, and $U_i$ as the sampling
circumference at $R_i$. Hence $R_i = i \times R_{\max}/n_r$. In the conventional LPT and polar
transform, $U_i$ is sampled equally with a fixed number of samples, $n_\theta$, to cover 360
degrees. To effectively sample the image, the parameter $n_\theta$ needs to be adaptive. Since
each circumference $U_i$ covers approximately $2\pi R_i$ pixels, the number of samples in
the angular direction for each sample $i$ in the radius direction, $n_{\theta_i}$, should also be
adjusted accordingly. For the sake of simplicity, we estimate $2\pi R_i \approx 8R_i$. Hence, the
effective values of the sampling parameters $n_r$ and $n_{\theta_i}$ can be expressed as:

\[ n_r = R_{\max}, \qquad n_{\theta_i} = 8R_i. \tag{3.5} \]

Figure 3.1: Example of inaccurate registration results using the conventional LPT due to occlusion in the target image: (a) the model image patch cropped from the Lena image is registered to the non-occluded target image on the upper right and to the occluded target image on the lower right, where the red rectangles indicate the registration results, (b) the correlation coefficients when registering to the non-occluded target image, and (c) the correlation coefficients when registering to the occluded target image

Figure 3.2: APT mapping: (a) adaptive sampling in the spatial domain with more samples in the angular direction as the radius increases, (b) the resulting sample distribution in the angular and radius directions. Note that in (a), the distance between consecutive sampling points in the radius direction remains the same for all radius circumferences and so does the angular direction
The complete implementation to transform a square image of size $2R_{\max} \times 2R_{\max}$
pixels in the Cartesian coordinates, $I(x, y)$, to the adaptive polar image $IP(r, \theta)$ is shown below:

    for i = 1 to n_r do
        for j = 1 to n_θi do
            IP(i, j) = I(Rmax + R_i cos(2πj / n_θi), Rmax + R_i sin(2πj / n_θi))
        end for
    end for
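The loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function name `adaptive_polar_transform` is hypothetical, $n_r = R_{\max}$ so that $R_i = i$, the first array index is treated as $x$, and nearest-neighbour lookup with clipping at the border stands in for the interpolation mentioned in the text.

```python
import numpy as np

def adaptive_polar_transform(image):
    """Sample a square 2*Rmax x 2*Rmax image on rings of radius R_i = i,
    taking n_theta_i = 8 * R_i samples on ring i (Eq. 3.5 with n_r = Rmax).
    Returns a list of 1D arrays, one per ring (the step-like sample bins)."""
    r_max = image.shape[0] // 2
    rings = []
    for i in range(1, r_max + 1):
        n_theta_i = 8 * i
        angles = 2.0 * np.pi * np.arange(1, n_theta_i + 1) / n_theta_i
        # Nearest-neighbour lookup is an assumption made for brevity.
        x = np.rint(r_max + i * np.cos(angles)).astype(int)
        y = np.rint(r_max + i * np.sin(angles)).astype(int)
        # Clip so that samples on the outermost ring stay inside the image.
        x = np.clip(x, 0, image.shape[0] - 1)
        y = np.clip(y, 0, image.shape[1] - 1)
        rings.append(image[x, y])
    return rings
```

Because each ring has a different length, the result is kept as a list of arrays rather than a rectangular matrix.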
Figure 3.3: Examples of APT: (a) the original Lena image, (b) the APT transformed image of (a), (c) the scaled and rotated image of (a), and (d) the APT transformed image of (c)
As shown in Fig. 3.2(a), the number of samples in the angular direction of APT
is distributed adaptively according to the size of the radius. The result of this step
is a series of sample bins arranged in a step-like manner as shown in Fig. 3.2(b),
i.e., as $r$ increases, the length of the sample bins (number of samples in $\theta$) increases.
The total number of samples is less than what the conventional LPT
methods need to reach the same sampling resolution. Since sampling involves interpolation
in both cases, which is computationally expensive, the reduction of samples in APT
accelerates the computation. The results of applying APT to the Lena image and
to the scaled and rotated version of Lena are shown in Fig. 3.3.
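As a rough illustration of the reduction in samples, the APT count from Eq. (3.5) can be compared with a fixed rectangular polar grid. The fixed grid here is only an assumed stand-in (it uses $n_r = R_{\max}$ rings with $8R_{\max}$ samples on every ring, i.e., the angular resolution of the outermost APT ring); it is not the LPT parameter choice of Eq. (3.4), and both function names are hypothetical.

```python
def apt_sample_count(r_max):
    """Total APT samples: sum of 8 * R_i over rings R_i = 1..r_max."""
    return sum(8 * i for i in range(1, r_max + 1))

def fixed_grid_sample_count(r_max):
    """Assumed comparison grid: r_max rings, each with 8 * r_max samples."""
    return r_max * 8 * r_max

r_max = 128
ratio = apt_sample_count(r_max) / fixed_grid_sample_count(r_max)
```

Under these assumptions the APT total is $4R_{\max}(R_{\max}+1)$, which is roughly half of the fixed grid total of $8R_{\max}^2$.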
3.1.3 Projection transform
As shown in Figs. 3.3(b) and 3.3(d), the result of adaptive sampling is a series of
sample bins arranged in a step-like manner, which no longer exhibits the coordinate shifts for
scaled and rotated images that LPT provides. To maintain the advantages of scale
and rotation invariance in LPT, we use an innovative projection transform method
which projects the two-dimensional image on the radius and angular coordinates,
respectively. From the two projections, we can accurately calculate the scale and
rotation parameters; the details are elaborated in Section 3.2. For now,
we define the projection transform for the APT transformed image.
Given a transformed image $IP(r, \theta)$ that consists of $n_r$ sample bins, in which
bin $i$ has the length $n_{\theta_i}$ for $i = 1, ..., n_r$, we denote $n_\theta$, $\Re$, and $\Theta$ as the
number of samples in the angular direction at $R_i = R_{\max}$, the projection on the
radius coordinate, and the projection on the angular coordinate, respectively. The
mathematical expressions of $\Re$ and $\Theta$ are as follows:

\[ \Re(i) = \Omega_i \sum_{j=1}^{n_{\theta_i}} IP(i, j) \tag{3.6} \]

\[ \Theta(j) = \sum_{i=1}^{n_r} \left[ \eta^1_{ij}\, IP\!\left(i, \mathrm{ceil}\!\left(\frac{j-1}{\Omega_i}\right)\right) + \eta^2_{ij}\, IP\!\left(i, \mathrm{ceil}\!\left(\frac{j}{\Omega_i}\right)\right) \right] \tag{3.7} \]

where

\[ \begin{cases} i \in [1, ..., n_r], \quad j \in [1, ..., n_\theta], \quad \Omega_i = \dfrac{n_\theta}{n_{\theta_i}} \\[4pt] \eta^1_{ij} = \Omega_i (j-1) - \mathrm{floor}[\Omega_i (j-1)], \quad \eta^2_{ij} = 1 - \eta^1_{ij} \\[4pt] IP(i, 0) = 0, \ \forall i \end{cases} \tag{3.8} \]

The operation floor(A) denotes the nearest integer less than or equal to A. The
results of the projection transform, the vectors $\Re$ and $\Theta$, have the dimensions $n_r$
and $n_\theta$, respectively. Both projections $\Re$ and $\Theta$ are normalized to reduce the effect
of illumination changes. The computation of projection $\Re$ is simple and does not
require interpolation. The computation of projection $\Theta$, on the other hand, requires
one-dimensional interpolation, as shown in Eq. (3.7).

Figure 3.4: The effects of the changes in scale and rotation in the Cartesian coordinates on the projections $\Re$ and $\Theta$: (a) the scale change in the Cartesian coordinates (a scale factor of 1.2 is applied to the Lena image) appears as variable-scale in the projection $\Re$, while the projection $\Theta$ becomes slightly altered, (b) the rotation in the Cartesian coordinates (a rotation parameter of 45 degrees is applied to the Lena image) appears as shifting in the projection $\Theta$, while there is no change in the projection $\Re$, and (c) the changes in both scale and rotation in the Cartesian coordinates (scale and rotation parameters of 1.2 and 45 degrees, respectively, are applied to the Lena image) appear as variable-scale in the projection $\Re$ and shifting in the projection $\Theta$, respectively
Examples of the projections of APT of the Lena image and its scaled and rotated
version are shown in Fig. 3.4. It can be seen that the scale change in the Cartesian coordinates
appears as variable-scale in the projection $\Re$ (i.e., from $f(t)$ to $f(at)$) and the
rotation change in the Cartesian coordinates appears as shifting in the projection $\Theta$.
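The two projections of Eqs. (3.6) to (3.8) can be sketched as below. This is a literal transcription offered as an illustration only: the function name `projections` is hypothetical, the input is assumed to be a list of 1D rings whose last ring has length $n_\theta$, the ceiling and fractional weights are computed with integer arithmetic to avoid rounding problems, and the normalization step mentioned in the text is left out.

```python
import numpy as np

def projections(rings):
    """Return (proj_r, proj_theta) for a list of rings of increasing length."""
    n_r = len(rings)
    n_theta = len(rings[-1])          # samples on the outermost ring
    proj_r = np.zeros(n_r)
    proj_theta = np.zeros(n_theta)
    for i, ring in enumerate(rings):
        n_theta_i = len(ring)
        # Prepend a zero so that 1-based index 0 gives IP(i, 0) = 0 (Eq. 3.8).
        padded = np.concatenate(([0.0], ring))
        # Eq. (3.6): Omega_i times the sum over the ring.
        proj_r[i] = (n_theta / n_theta_i) * ring.sum()
        for j in range(1, n_theta + 1):
            # ceil((j-1)/Omega_i) and ceil(j/Omega_i) with Omega_i = n_theta/n_theta_i
            lo = -(-(j - 1) * n_theta_i // n_theta)
            hi = -(-j * n_theta_i // n_theta)
            # eta1 = fractional part of Omega_i * (j - 1); eta2 = 1 - eta1
            eta1 = ((n_theta * (j - 1)) % n_theta_i) / n_theta_i
            proj_theta[j - 1] += eta1 * padded[lo] + (1.0 - eta1) * padded[hi]
    return proj_r, proj_theta
```

For rings of constant intensity, every entry of the radius projection equals $n_\theta$ times that intensity, which is the compensation that $\Omega_i$ is introduced to provide.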
Figure 3.5: Examples of APT of images of a syrup bottle: (a) the reference image with the model image selected as the area inside the red circle, (b) the APT transformed image of (a), (c) the target image, and (d) the APT transformed image of the area inside the red circle of the target image
Another example of APT transformed images and their projections is shown
in Figs. 3.5 and 3.6, using real images instead of the synthetic ones of Fig. 3.3.
The reference image is shown in Fig. 3.5(a), where the syrup bottle is chosen to
be the model image as shown by the red circle. Fig. 3.5(b) shows the APT of
the model image. The target image is shown in Fig. 3.5(c), and the APT of the
area inside the red circle is shown in Fig. 3.5(d). To demonstrate the effect of scale
and rotation on the projections, we use the same size of the circular area to be
transformed for both the reference image and the target image ($R_{\max}$ is equal for
both images). The projections $\Re$ and $\Theta$ of APT of the syrup bottle in the reference
image and the target image are shown in Figs. 3.6(a) and 3.6(b), respectively. Unlike
the previous example with Lena images, which involves only a small scale parameter of
1.2, this demonstration with syrup bottle images involves a larger scale parameter of
1.84. Moreover, the syrup bottle images are non-synthetic and were captured with
a digital camera in an uncontrolled environment. This introduces various factors
that affect the similarity between the two images as well as their projections, such
as illumination changes, blur, and three-dimensional perspective. As a result,
the alterations between the projections are more pronounced. This also makes it
more difficult to obtain the scale and rotation parameters by visual inspection alone.
Moreover, if the center point of the APT does not correspond to that of the
model image (the red circle area is moved), the alterations between the projections
of both images will become even greater. As a result, novel techniques for obtaining
the scale, rotation, and translation between two images need to be introduced.
These new techniques are discussed in detail in the next section.

Figure 3.6: The projections of the syrup bottle images: (a) comparison of the projections $\Re$ of the syrup bottle from the reference image and the target image, and (b) comparison of the projections $\Theta$ of the syrup bottle from the reference image and the target image. The differences between the projections of the two images are more pronounced and need additional techniques to resolve
3.2 The New Image Registration Approach Using APT
3.2.1 Localization
In order to fully exploit the advantage of APT, the translation parameter between
the two images has to be found. The simplest solution is to perform the exhaustive
search, in which the APT is computed for every pixel in the target image and the
translation parameter is found from the image pixel that yields the best matching
result to the APT of the model image. Since it is computationally expensive to
perform the exhaustive search, an innovative method to reduce the space of search
is needed. Many works have proposed methods to speed up the search procedure.
The Fourier-Mellin approach [64, 69] uses Fourier phase correlation to recover the translation
before computing the log-polar matching in the frequency domain. However, a recent
study [73] indicates an aliasing problem from using the phase magnitude of the FFT to
recover the translation (of the center point). Zokai et al. [94] use a multi-resolution
imaging technique to reduce the computational cost of translation recovery by searching
from the coarsest to the finest level. This technique, however, performs well only when
the target image contains enough low-frequency information and that low-frequency
component is not distorted by noise. We introduce a new method to accelerate the
search procedure. The new method is based on a reduced set of feature points.
In order to avoid the exhaustive search of the target image for the model image, we
reduce the search from every pixel of the target image to a set of feature points only.
These feature points are obtained by applying the Gabor wavelet transform to every pixel and
selecting those pixels which generate high energy in the wavelet domain. The
number of feature points is much smaller than the number of pixels in the target
image, and the computation of the Gabor wavelet transform is much lower than
that of APT. Thus, the computational load is much lighter than that of the exhaustive search
using APT. We choose the Gabor wavelet for extracting feature points because of
its invariance to scale, rotation, and noise. More details of the Gabor
feature extraction method can be found in [47].
First, feature points are extracted from both the reference image and the target
image. We denote $P^M = \{p^M_1, ..., p^M_{n_M}\}$ and $P^T = \{p^T_1, ..., p^T_{n_T}\}$ as the sets of feature points
in the reference image and in the target image, respectively. The superscripts $M$
and $T$ denote the model (reference) image and the target image, and $n_M$ and $n_T$ are
the number of feature points in the reference image and the number of feature points
in the target image, respectively. To crop the model image out of the reference image,
we manually select one of the feature points in the reference image, $p^M_k$, as the center
point for APT and choose a radius $R_{\max}$ for the transformation that
sufficiently covers the model image. We then compute APT and the projections of APT, $\Re^M$
and $\Theta^M$. These projections $\Re^M$ and $\Theta^M$ represent the model image. We note that
for the case where the entire reference image is to be registered to the target
image (the model image is the reference image), the process of selecting the feature point $p^M_k$
can be performed automatically by selecting the feature point located closest to
the center of the model image. The radius $R_{\max}$ for the transformation
can then be computed as the minimal distance from the feature point to the four boundaries
of the model image along the $x$ and $y$ axes. To recover the translation between the model image
and the target image, we need to find the feature point $p^T_{k'}$ in the target
image that corresponds to the feature point $p^M_k$ in the model image.
Fig. 3.7 shows an example of the localization procedure. On the left is the refer-
ence image (Lena with the size 256 × 256 pixels) with 166 feature points identified.
The model image is created by selecting one of the feature points in the reference
image, $p^M_k$, as the origin, and is cropped to a circular image patch of radius
$R_{\max}$ that covers the area to be registered to the target image. On the right is the target image
(Lena scaled by 0.7 and rotated by 30 degrees; the size of the image is 256
× 350 pixels) with 180 feature points identified. To recover the translation between
the model image and the target image, we need to locate the corresponding feature
point $p^T_{k'}$ in the target image. As shown in Fig. 3.7, the search space is limited to
only a set of feature points in the target image, which is approximately 0.2% of the
total number of pixels in the target image in this example. Hence, the search space
is much smaller than that of the exhaustive search strategy.
The number of feature points identified varies depending on the richness of details
in the image. In general, the number of feature points identified using the proposed
method is approximately 0.1% to 1% of the total number of pixels in the image. For
most images, this number of feature points is sufficient to find the
corresponding pixels, i.e., $p^M_k$ in the model image and $p^T_{k'}$ in the target image. We
trade off the number of feature points identified in the image against the computational
load for localization (refer to the Gabor feature extraction method in [47]):
the number of feature points should not be so small that the corresponding feature
points are missed, nor so large that the computational load increases significantly.
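The reduction of the search space can be illustrated with a small sketch that keeps only the highest-energy pixels. It assumes that a per-pixel energy map (for example, the magnitude of Gabor wavelet responses) has already been computed; the function name `select_feature_points` and the `fraction` parameter are hypothetical, and the actual selection rule of [47] is not reproduced here.

```python
import numpy as np

def select_feature_points(energy, fraction=0.002):
    """Return (row, col) coordinates of the highest-energy pixels.

    energy   : 2D array of per-pixel energy (assumed precomputed).
    fraction : share of pixels to keep, e.g. 0.001 to 0.01 as in the text.
    """
    count = max(1, int(round(fraction * energy.size)))
    # argpartition places the indices of the `count` largest values last.
    flat = np.argpartition(energy.ravel(), -count)[-count:]
    rows, cols = np.unravel_index(flat, energy.shape)
    return np.column_stack((rows, cols))
```

Only the returned coordinates are then used as candidate centers for APT, instead of every pixel of the target image.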
3.2.2 APT matching
After extracting feature points in the model image and selecting one feature point
$p^M_k$ as the center point for computing the projections $\Re^M$ and $\Theta^M$ using the APT
approach, the next step is to find the corresponding feature point $p^T_{k'}$ in the target
image and obtain both the scale and the rotation parameters between the two images.
Given a set of feature points in the target image and the same radius $R_{\max}$ of the APT
transformation as used for computing the projections $\Re^M$ and $\Theta^M$,
we use each feature point in the target image as a center point for computing APT,
creating the sets of candidate projections $\Re^T = \{\Re^T_1, ..., \Re^T_{n_T}\}$ and $\Theta^T = \{\Theta^T_1, ..., \Theta^T_{n_T}\}$.
By matching $\Re^M$ and $\Theta^M$ with every member of the sets of projections $\Re^T$ and $\Theta^T$,
respectively, the translation, scale, and rotation parameters are obtained simultaneously.
The matching results have three dimensions: the scale parameter, the rotation
parameter, and the distance coefficient. The translation parameter is the offset between
the location of $p^M_k$ and the feature point $p^T_{k'}$ in the target image that yields the
lowest distance coefficient.

Figure 3.7: Feature point extraction and localization procedure
For a given feature point $p^T_z$ in the target image, we denote $I^M$, $I^T_z$, $\Re^T_z$, and
$\Theta^T_z$ as the model image, the image patch that is cropped from the target image
to be matched with the model image, the projection on the radius coordinate, and
the projection on the angular coordinate, respectively. The scale change in the
Cartesian coordinates is then reflected as variable-scale in the projection $\Re$, and the rotation
change is reflected as shifting in the projection $\Theta$. Therefore, innovative matching
mechanisms are proposed to obtain both the scale and the rotation parameters. The
mathematical expression of the image $I^T_z$ that is a scaled and rotated version of the image
$I^M$ is

\[ I^T_z(x, y) = I^M(a_z x \cos\theta_z + a_z y \sin\theta_z,\; -a_z x \sin\theta_z + a_z y \cos\theta_z). \tag{3.9} \]

In the polar domain, Eq. (3.9) can be expressed as:

\[ I^T_z(r, \theta) = I^M(a_z r, \theta - \theta_z) \tag{3.10} \]
where $a_z$ and $\theta_z$ are the scale and the rotation parameters, respectively. As shown
in Eqs. (3.9) and (3.10), the scale parameter $a_z$ in the Cartesian coordinates appears as
scaling with the same value in the radius direction of the polar transformed image.
For the case of adaptive sampling (APT), the effect of scaling in the
Cartesian to APT mapping remains similar to that of the polar transform mapping,
except that for APT the number of samples in the angular direction can vary (adaptive).
To compute the 1D projection in the radius direction $\Re(i)$, we introduce the multiplication
variable $\Omega_i$ to compensate for the effect of adaptive sampling in the angular direction, as
shown in Eqs. (3.6) to (3.8). As a result, the 1D projection $\Re$ of APT is an approximation
to that of the polar transformed image. From Eq. (3.6), we have:

\[ \Re^T_z(i) = \Omega_i \sum_{j=1}^{n_{\theta_i}} IP^T_z(i, j) \tag{3.11} \]

and

\[ \Re^M(i) = \Omega_i \sum_{j=1}^{n_{\theta_i}} IP^M(i, j). \tag{3.12} \]

We can compute $\Re^T_z$ in terms of $\Re^M$ as follows:

\[ \Re^T_z(i) = \Omega_{a_z i} \sum_{j=1}^{n_{\theta_{a_z i}}} IP^M(a_z i, j), \tag{3.13} \]

\[ \Re^T_z(i) = \Re^M(a_z i). \tag{3.14} \]

Thus, the variable-scale $a_z$ is a global and uniform scale for the 1D projection $\Re$ curve.
As shown in Figs. 3.4(a) and 3.4(c), the projections $\Theta$ of the scaled images in the
Cartesian coordinates are slightly altered when compared with that of the original image. This
is because the areas covered during the APT transformations are different as a result
of the scaling. Hence, in order to accurately obtain the shifting parameter between
the two projections $\Theta^M$ and $\Theta^T_z$, the variable-scale parameter $a_z$ between the two
projections $\Re^M$ and $\Re^T_z$ needs to be obtained first.
Find scale parameter
There are several ways to obtain the scale parameter from the projections, depending
on the requirements of the application in terms of computational cost, accuracy,
image types, and environments. We introduce here three effective algorithms.
• Algorithm 1: Fourier method.
Based on the scaling property of the Fourier transform, given the Fourier transform of a signal
$x(t)$ as $X(\omega)$, the Fourier transform of the scaled signal can be expressed as:

\[ x(t) \longleftrightarrow X(\omega) \tag{3.15} \]

\[ x(at) \longleftrightarrow \frac{1}{|a|} X\!\left(\frac{\omega}{a}\right). \tag{3.16} \]

Similar to the polar transform, the projection $\Re^T_z$ computed from the image $I^T_z$ that
is the scaled version of the image $I^M$ with the scale parameter $a_z$ will also be the
scaled version of the projection $\Re^M$ with the same scale parameter $a_z$. From
Eqs. (3.15) and (3.16), given the Fourier transform of the projection $\Re(\cdot)$ as
$\Gamma(\cdot)$, we can obtain the scale parameter $a_z$ by:

\[ \Re^T_z(i) = \Re^M(a_z i) \tag{3.17} \]

\[ |a_z| = \frac{\Gamma^M(0)}{\Gamma^T_z(0)}. \tag{3.18} \]

We note that this Fourier method performs effectively only when the scale
parameter is small. A large scale parameter leads to aliasing. Hence,
this algorithm is suitable for applications that require fast and accurate image
registration while the scale parameter between the two images is expected to
be small, such as medical image registration [45] or scene change detection.
• Algorithm 2: Logarithm method.
The second algorithm uses the scale invariant property of the logarithm function.
First, the logarithm function is applied to the projection $\Re$ and the output
is then quantized to maintain the original dimension of the projection. The
mathematical expression of the implementation is as follows:

\[ \mathcal{L}\Re(k) = \Re\!\left(n_r \frac{\log k}{\log n_r}\right); \quad k = 1, ..., n_r. \tag{3.19} \]

Here $\mathcal{L}\Re(\cdot)$ denotes the logarithmic projection of $\Re$. Given the image
$I^T_z$, a scaled and rotated version of the image $I^M$, the scale parameter $a_z$ between
the two images appears as a translation in the logarithm domain:

\[ \mathcal{L}\Re^T_z(\rho) = \mathcal{L}\Re^M(\rho) + \log a_z. \tag{3.20} \]

To find the displacement $d$, where $d = \log a_z$, such that $\mathcal{L}\Re^T_z(\rho) = \mathcal{L}\Re^M(\rho - d)$,
one can evaluate the correlation function between the two logarithmic
projections, $C(\mathcal{L}\Re^M, \mathcal{L}\Re^T_z)$:

\[ d = \arg\max\, C(\mathcal{L}\Re^M, \mathcal{L}\Re^T_z). \tag{3.21} \]

This algorithm yields high accuracy and fast matching in general conditions.
However, similarly to the conventional LPT, this method can give inaccurate
results if there are large changes at the fovea between the two images. Although
this algorithm uses the logarithm in the computation, it is
different from the image registration method in [94] in several aspects. First,
the correlation coefficient is not used to verify the global translation.
Instead, the correlation function here is used to obtain the scale parameter
only. Second, the minimum-square-error (MSE) between the projections $\Theta$ is used in
the adaptive polar domain as a true similarity measure to verify the registered
image pair. This will be discussed later in detail in Subsection 3.2.2.
• Algorithm 3: Resampling projection.
Both Algorithms 1 and 2 are accurate only to some degree. As stated earlier,
the accuracy of Algorithm 1 degrades as the scale parameter increases as a result
of aliasing between the two projections. In the case of Algorithm 2, although
the accuracy of the algorithm is not affected by the size of the scale parameter,
it yields inaccurate results when there are strong changes at the fovea between
the two images. Moreover, the second algorithm yields a displacement estimate
with only integer accuracy. If high precision is required, one may include an
extra step after acquiring the estimated scale parameter, denoted as $a'_z$, from
either Algorithm 1 or Algorithm 2. Let $a_L$ and $a_U$ be the lower and upper limits
of the search space for the scale parameter, such that $a_L < a'_z < a_U$,
and let $h$ be the number of search steps. The search process can
be described as follows:

    for i = 1 to h do
        ℜ^M_res = Φ(ℜ^M, a_i), where a_i = a_L + i (a_U − a_L) / h
        if a_i ≥ 1 then
            E_ℜ = sqrt( Σ_{j=1}^{n_r a_i} [ℜ^T_z(j) − ℜ^M_res(j)]^2 )
        else
            E_ℜ = sqrt( Σ_{j=1}^{n_r} [ℜ^T_z(j) − ℜ^M_res(j)]^2 )
        end if
    end for

where the operation $y = \Phi(x, a)$ is a resampling procedure. This resampling
procedure is a common operation in signal processing for computing the sequence
in the vector $x$ at $a$ times the original sampling rate by using a polyphase
filter implementation. The operation $\Phi$ applies an anti-aliasing linear-phase (low-pass)
FIR filter to $x$ during the resampling process. The result has the
dimension length($y$) = length($x$) × $a$. We normalize the
resampled projection prior to comparison. The scale parameter is equal to the
resampling parameter $a_i$ that yields the lowest MSE, $E_\Re$. The
larger the parameter $h$ is, the finer the search grid and the higher the accuracy that can be obtained.
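The coarse-then-fine idea behind Algorithms 1 and 3 can be sketched as follows. This is an illustration under stated assumptions rather than the procedure used in the experiments: the function names are hypothetical, the relation $\Re^T_z(i) = \Re^M(a_z i)$ of Eq. (3.14) is taken as the model, linear interpolation (`numpy.interp`) is swapped in for the polyphase resampling operation $\Phi$, the error is averaged over the indices where both projections are defined, and normalization is omitted.

```python
import numpy as np

def scale_from_dc(proj_model, proj_target):
    """Algorithm 1 idea: ratio of the zero-frequency Fourier terms (Eq. 3.18).
    The zero-frequency term of a sequence is simply its sum."""
    return float(np.sum(proj_model) / np.sum(proj_target))

def refine_scale(proj_model, proj_target, a_low, a_high, steps):
    """Algorithm 3 idea: test `steps` candidate scales in (a_low, a_high]
    and keep the one with the lowest root-mean-square error."""
    n = len(proj_model)
    idx = np.arange(n, dtype=float)
    best_a, best_err = None, np.inf
    for k in range(1, steps + 1):
        a = a_low + k * (a_high - a_low) / steps
        valid = a * idx <= n - 1      # positions that stay inside the model
        if valid.sum() < 2:
            continue
        # Evaluate the model projection at a * i by linear interpolation.
        resampled = np.interp(a * idx[valid], idx, proj_model)
        err = np.sqrt(np.mean((proj_target[valid] - resampled) ** 2))
        if err < best_err:
            best_a, best_err = a, err
    return best_a
```

A typical use would take the coarse estimate from `scale_from_dc` (or from the logarithm method) and then call `refine_scale` on an interval around it, in the spirit of the limits $a_L$ and $a_U$ above.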
Find rotation parameter
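After the scale parameter is obtained, the next step is to find the rotation parameter
$\theta_z$. For the same sampling radius $R_{\max}$, the larger the image ($a_z \gg 1$),
the smaller the area of the scene that is covered in the sampling procedure. As a result,
the magnitudes of the projections $\Theta$ of the two images could be slightly altered.
This phenomenon is illustrated in Figs. 3.4(a) and 3.4(c). Hence, to accurately obtain
the rotation parameter $\theta_z$, Eq. (3.7) needs to be modified according to the scale
parameter $a_z$:

\[ \Theta^M = \begin{cases} \Theta^M & \text{if } a_z \le 1 \\[4pt] \displaystyle\sum_{i=1}^{n_r / a_z} \left[ \eta^1_{ij}\, IP^M\!\left(i, \mathrm{ceil}\!\left(\frac{j-1}{\Omega_i}\right)\right) + \eta^2_{ij}\, IP^M\!\left(i, \mathrm{ceil}\!\left(\frac{j}{\Omega_i}\right)\right) \right] & \text{otherwise,} \end{cases} \tag{3.22} \]

\[ \Theta^T_z = \begin{cases} \Theta^T_z & \text{if } a_z \ge 1 \\[4pt] \displaystyle\sum_{i=1}^{a_z n_r} \left[ \eta^1_{ij}\, IP^T_z\!\left(i, \mathrm{ceil}\!\left(\frac{j-1}{\Omega_i}\right)\right) + \eta^2_{ij}\, IP^T_z\!\left(i, \mathrm{ceil}\!\left(\frac{j}{\Omega_i}\right)\right) \right] & \text{otherwise.} \end{cases} \tag{3.23} \]

Both projections are then resampled to be equal in length. The rotation parameter
can be found by evaluating the correlation function $C(\Theta^M, \Theta^T_z)$:

\[ d = \arg\max\, C(\Theta^M, \Theta^T_z) \tag{3.24} \]

\[ \theta_z = \frac{2\pi d}{n_\theta}. \tag{3.25} \]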
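Eqs. (3.24) and (3.25) can be illustrated with a circular cross-correlation computed through the FFT. This is a sketch only: the function name is hypothetical, both projections are assumed to have already been resampled to the same length, and the correlation function $C$ is assumed to be the plain (unnormalized) circular cross-correlation.

```python
import numpy as np

def rotation_from_projections(theta_model, theta_target):
    """Return (d, angle): the circular shift d that best aligns the model
    projection with the target projection, and angle = 2*pi*d / n_theta."""
    n = len(theta_model)
    # Cross-correlation theorem: corr[k] = sum_i model[i] * target[i + k].
    spectrum = np.fft.fft(theta_target) * np.conj(np.fft.fft(theta_model))
    corr = np.real(np.fft.ifft(spectrum))
    d = int(np.argmax(corr))
    return d, 2.0 * np.pi * d / n
```

With this convention, a target projection that equals the model projection shifted by $d$ samples, $\Theta^T_z(i) = \Theta^M(i - d)$, produces the correlation peak at lag $d$.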
Find distance coefficient
The final and crucial component resulting from APT matching is the distance
coefficient, denoted as $E$. This distance coefficient indicates how large the difference
between the two images is. The distance coefficient $E_z$ between the model image $I^M$
and the target image $I^T_z$ can be computed from the Euclidean distance between the
projections $\Theta^M$ and $\Theta^T_z$:

\[ E_z = \sqrt{\sum_{i=1}^{n_\theta} \left[ \Theta^T_z(i) - \Theta^M(i - d) \right]^2}. \tag{3.26} \]

For the proposed image registration approach, we are searching for the one feature
point $p^T_{k'}$ in the target image that corresponds to the feature point $p^M_k$ in the
model image. This feature point $p^T_{k'}$ is the point that yields the lowest distance
coefficient. We note that the projections $\Re^M$ and $\Re^T_z$ are not used for computing the
distance coefficient $E_z$ because the dimension of the projections that can be used in the
computation varies depending on the scale parameter $a_z$ (as seen in the computation
of $E_\Re$ in Algorithm 3 of Subsection 3.2.2). As a result, only the projections $\Theta^M$
and $\Theta^T$ are used in the computation.
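The selection of the best candidate by the distance coefficient of Eq. (3.26) can be sketched as follows. The function names and the `candidates` data layout are hypothetical; each candidate is assumed to carry its angular projection (already equal in length to the model projection) together with the shift $d$ found for it.

```python
import numpy as np

def distance_coefficient(theta_model, theta_target, d):
    """Euclidean distance of Eq. (3.26) after applying the circular shift d
    to the model projection, i.e. comparing target[i] with model[i - d]."""
    return float(np.linalg.norm(theta_target - np.roll(theta_model, d)))

def best_match(theta_model, candidates):
    """candidates: list of (theta_target, d) pairs, one per target feature
    point. Returns the index of the lowest distance and all distances."""
    distances = [distance_coefficient(theta_model, t, d) for t, d in candidates]
    return int(np.argmin(distances)), distances
```

The index returned by `best_match` identifies the target feature point taken as $p^T_{k'}$, and hence the translation.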
3.2.3 Image comparison
This work extends the use of image registration to environments where images
contain occlusion or alteration. Hence, it is important to be able to locate the area
where such changes take place in the target image, which is useful for many applications
such as medical image registration and scene change detection. Using the advantages
of APT, we propose a fast and simple image comparison scheme that can effectively
and automatically locate the altered area, i.e., the area where the registered image pair
differs, without incurring additional computational cost.
Since the scale difference in the Cartesian coordinates between the model image
and the target image yields different area coverage in the transformation process, the
dimension of the projection $\Re$ for comparing the two images needs to be adjusted
accordingly. Given the scale and the rotation parameters between the model image
and the target image as $a_{k'}$ and $\theta_{k'}$, respectively, we can find the altered area as a set
of pixels $(X, Y)$ in the Cartesian coordinates, where $X = x_1, ..., x_t$, $Y = y_1, ..., y_t$, and
the parameter $t$ is the number of pixels in the altered area, as follows:

\[ \varepsilon_\Re(i) = \left\| \Re^T_{k'}(i) - \Re^M_{res}(i) \right\|, \quad \begin{cases} \text{for } i = 1 \text{ to } n_r & \text{if } a_{k'} \ge 1 \\ \text{for } i = 1 \text{ to } a_{k'} n_r & \text{if } a_{k'} \le 1, \end{cases} \tag{3.27} \]

\[ \varepsilon_\Theta(j) = \left\| \Theta^T_{k'}(j) - \Theta^M(j - d) \right\|, \quad \text{for } j = 1 \text{ to } n_\theta. \tag{3.28} \]

We denote the location of $p^T_{k'}$ in the Cartesian coordinates as $(c_x, c_y)$. For $\forall \varepsilon_\Re(i), \forall \varepsilon_\Theta(j) \ge \varepsilon$,
we can compute the set of image pixels in the altered area $(X, Y)$ as follows:

\[ (X, Y) = \{(x_1, y_1), ..., (x_t, y_t)\}, \tag{3.29} \]

\[ \forall \varepsilon_\Re(i), \forall \varepsilon_\Theta(j) \ge \varepsilon; \quad X = c_x + \varepsilon_\Re(i) \sin \varepsilon_\Theta(j), \quad Y = c_y + \varepsilon_\Re(i) \cos \varepsilon_\Theta(j). \tag{3.30} \]

The threshold $\varepsilon$ is specified by the user according to the desired sensitivity of the
application to the changes between the two images. The recommended value is between
0.1 and 0.2.
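The thresholding step behind Eqs. (3.27) and (3.28) can be sketched as below. This is only an illustration of flagging the projection bins whose residual reaches the threshold: the function name is hypothetical, the two projections are assumed to be normalized and already aligned (resampled for $\Re$, shifted by $d$ for $\Theta$), and the mapping of the flagged bins to Cartesian pixels in Eq. (3.30) is not reproduced.

```python
import numpy as np

def altered_bins(proj_model, proj_target, threshold=0.15):
    """Return the indices of bins whose absolute residual is at least the
    threshold, together with the residual itself. Only the overlapping part
    of the two projections is compared."""
    n = min(len(proj_model), len(proj_target))
    residual = np.abs(np.asarray(proj_target[:n]) - np.asarray(proj_model[:n]))
    return np.flatnonzero(residual >= threshold), residual
```

Applying the function once to the aligned radius projections and once to the aligned angular projections gives the radius bins and angle bins in which the two images differ.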
It should be noted that image comparison to find the altered area between two images
can be done with any registration algorithm, provided that image registration can be achieved. Our
method is robust in the sense that an image with an altered area, even in the center, can
still be registered while the scale and rotation invariance of LPT is preserved.
Furthermore, the altered area can be identified without further computational
cost. For most registration algorithms, in order to compare two images that involve
a two-dimensional geometric transformation, additional computations are required
as post-processing, such as image alignment, a geometric transformation to bring
both images to the same coordinates prior to the comparison, and image normalization.
For the proposed image registration approach, since both the model image and
the target image are already transformed to the projections of the APT domain, image
comparison can be performed directly and simultaneously while obtaining the scale
and rotation parameters.
3.2.4 Complete algorithm
The complete steps of the proposed algorithm are as follows:
• Extract feature points in both the reference and the target images.
• Create the model image by selecting one of the feature points in the reference
image, $p^M_k$, as the origin. Then crop a circular image patch $I^M$ of radius
$R_{\max}$ that covers the area to be registered to the target image.
• Compute the projections $\Re^M$ and $\Theta^M$ using the proposed APT approach and
the projection transform.
• Use each feature point $p^T_z$ in the target image, for $z = 1, ..., n_T$, where $n_T$ is the
number of feature points in the target image, as the origin and crop the circular
image patches $I^T_z$ with the radius $R_{\max}$.
• Compute the sets of candidate projections $\Re^T = \{\Re^T_1, ..., \Re^T_{n_T}\}$ and
$\Theta^T = \{\Theta^T_1, ..., \Theta^T_{n_T}\}$.
• Match each candidate with the model using the proposed APT matching algorithm.
The translation is given by the point $(x, y)$ that yields the lowest distance coefficient.
The scale and rotation parameters are obtained simultaneously in this step.
• Find the altered area $(X, Y)$.
3.3 Experimental Results
To evaluate the effectiveness of the proposed image registration approach, we
use twelve different sets of test images. In Subsection 3.3.1, we present the registration
of images that are rotated and scaled. We use five sets of test images in
the experiment. Different rotation and scale parameters are applied to three sets of
test images (squirrel, Statue of Liberty, and bird) using Matlab. Another two sets of
test images consisting of images of objects (speed limit sign and syrup bottle) taken
at different locations and under different illuminations are also presented in this subsection.
In Subsection 3.3.2, we present the registration of images that are subjected to
occlusion and alteration. We use seven sets of test images for the experiment. The
first two sets of test images (Dreese building and flower) are rotated and scaled using
Matlab. Occlusions artificially generated using random numbers are included in the
test images. The next five sets of test images are non-artificial, including medical
images of the human brain and human lung taken before and after medical
therapy, and occluded images of objects: cereal box, stop sign, and OSU tower. To
compare the performance of the proposed approach, we repeat the experiments with
the conventional LPT based image registration approach (as proposed in [94]).
For the proposed approach, the parameters $n_r$ and $n_\theta$ used in the experiments
are computed according to the size of $R_{\max}$ used in the experiments as shown in
Eq. (3.5). The translation information is obtained using the proposed feature based
search scheme. To obtain the scale parameter, we use Algorithm 2, the
logarithm method, to get the estimated scale parameter $a'_z$, followed by Algorithm
3, resampling projection, with the parameters $a_L = a'_z - 0.5$, $a_U = a'_z + 0.5$, and
$h = 20$. For the experiments with the conventional LPT, the parameters $n_\rho$ and $n_\theta$
are selected according to Eq. (3.4). The translation information is obtained using the
multiresolution technique as suggested in [94]. To achieve accuracy at the pixel level, we
apply a small search window of 5 × 5 pixels around the translation point for both
methods.
Each of Figs. 3.8-3.19 consists of three subfigures. The subfigure on the left is the reference
image, where the model image is the area inside the red rectangle (for the case where
the entire reference image is used for registration, the model image is the reference
image). The subfigure in the middle is the registration result using the conventional
LPT, where the red rectangle indicates the registration result. The subfigure on the
right is the registration result using our proposed APT based approach, where the red
rectangle indicates the registration result and the green area indicates the altered
area. Using the ground truth information, we measure and compare the accuracy of
the proposed method and the conventional LPT as shown in Tables I and II.
All experiments are carried out using Matlab 2007a running on a 2 GHz Intel
Pentium IV machine. The computation time required for registering an image pair varies
depending on the image size, the richness of texture content in the image, which
influences the number of feature points detected in the image, and the size of Rmax. In
our experiments, images with sizes of 353 × 261 pixels (as shown in Fig. 3.11(b))
and 768 × 1024 pixels (as shown in Fig. 3.10(b)) are registered in approximately 25
and 55 seconds, respectively, and the average runtime over all experiments is
approximately 35 seconds. We found, however, that approximately 70 percent of the
computational time is spent on the Gabor feature extraction process. Although
computing the Gabor wavelet is much less expensive than the exhaustive search using
APT, it still imposes a high computational cost, and speeding up the
process is an ongoing research topic [71,87,88]. There are existing works that are able
to achieve real-time computation using parallel hardware structure designed for the
computation of Gabor wavelet [71,87], which could greatly improve the computation
speed of our proposed approach. This topic is beyond our scope and not included in
this work.
3.3.1 Image registration in general conditions
We evaluate the performance of the proposed image registration approach in gen-
eral conditions, in which the images are subjected to scale, rotation, and illumination
changes. All images are taken with a digital camera under normal lighting conditions.
Different scale and rotation parameters are applied to the first three images (squirrel,
statue of liberty, and bird) using Matlab. The following two images, speed limit sign
and syrup bottle, are taken at different locations, zoom levels, and viewpoints. In the first
experiment, a rectangular area of 104 × 104 pixels is cropped out of the squirrel
image and used as the reference image, as shown in Fig. 3.8(a). Since the entire area of
the reference image is used in this test, the reference image is also considered as the
model image. The squirrel image is then scaled and rotated with parameters of 1
and 45 degrees, respectively, and used as the target image. Figs. 3.8(b) and 3.8(c) show
the registration results using the conventional LPT based approach and the proposed
APT approach, respectively. In the second and third experiments with test images:
statue of liberty and bird, a similar process is applied. Rectangular areas with
the size of 146 × 146 pixels and 170 × 170 pixels are cropped out of statue of liberty
and bird images, respectively, and used as reference images, as shown in Figs. 3.9(a)
and 3.10(a). The scale and rotation parameters applied to the test images are 1.5 and
180 degrees for the statue of liberty image and 3 and 90 degrees for the bird image. The
registration results using the conventional LPT based approach and the proposed APT
approach of both test images are shown in Figs. 3.9(b), 3.9(c), 3.10(b), and 3.10(c).
Next, we use two sets of test images: speed limit sign and syrup bottle. Each set
contains two different images of the same object taken at
different locations, zoom levels, and viewpoints. For the experiment with the speed limit
sign image, a model image with the size of 36 pixels in diameter containing the speed
limit sign is selected, as shown in Fig. 3.11(a). To better illustrate the rotation
between the selected model image and the registration result, we use a rectangle to
represent the selected model image and the registration result instead of a circle. Figs.
3.11(b) and 3.11(c) show the registration results using the conventional LPT based
approach and the proposed APT approach, respectively. The second experiment
uses two images of the syrup bottle taken at different viewpoints and with different
backgrounds. A model image with the size of 88 pixels in diameter is selected, as
shown in Fig. 3.12(a). The registration results are shown in Figs. 3.12(b) and 3.12(c).
As shown in Table 3.1, our proposed method is robust to scale,
rotation, translation, and illumination changes. The accuracy of the registration
using our proposed approach is comparable to that of the conventional LPT based
method. However, the numbers of samples used in the proposed approach (i.e., nr =
52, 73, 85, 18 and 44 and nθ = 416, 584, 680, 144 and 352 samples for the experiments
on squirrel, statue of liberty, bird, speed limit sign and syrup bottle, respectively) are
much smaller than those of the conventional LPT (i.e., nρ = 256, 512, 512, 64 and
256 and nθ = 2048, 4096, 4096, 512 and 2048 samples for the experiments on squirrel,
statue of liberty, bird, speed limit sign and syrup bottle, respectively). Therefore, in
Figure 3.8: Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of squirrel: (a) the reference image where the entire area is used as the model image, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result)
general conditions, our proposed method yields accuracy close to that of the
conventional LPT based approach, but with a much lower computational load.
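The size of this reduction can be made concrete with a little arithmetic on the sample counts quoted above. The sketch below assumes, purely for illustration, that the product of the radial and angular sample counts is a reasonable proxy for the sampling and matching load; the function name `grid_size_ratio` is hypothetical, and the adaptive sampling of APT may use fewer samples than this product suggests.

```python
# Sample counts reported in Subsection 3.3.1, in the order:
# squirrel, statue of liberty, bird, speed limit sign, syrup bottle.
names = ["squirrel", "statue of liberty", "bird", "speed limit sign", "syrup bottle"]
apt_nr = [52, 73, 85, 18, 44]
apt_ntheta = [416, 584, 680, 144, 352]
lpt_nrho = [256, 512, 512, 64, 256]
lpt_ntheta = [2048, 4096, 4096, 512, 2048]


def grid_size_ratio(nr, ntheta, nrho, ntheta_lpt):
    """Ratio of the LPT grid size to the APT grid size (assumed cost proxy)."""
    return (nrho * ntheta_lpt) / (nr * ntheta)


ratios = [
    grid_size_ratio(nr, nt, nrho, nt_lpt)
    for nr, nt, nrho, nt_lpt in zip(apt_nr, apt_ntheta, lpt_nrho, lpt_ntheta)
]

for name, ratio in zip(names, ratios):
    print(f"{name}: LPT grid is about {ratio:.1f}x larger")
```

Under this assumption the conventional LPT grid is more than an order of magnitude larger than the APT grid in every experiment of this subsection.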
3.3.2 Registration of images that are subjected to occlusion and alteration
In this Subsection, we evaluate the performance of the proposed approach for
images that are subjected to occlusion and alteration. First, two sets of test images:
Dreese building and flower are presented. Artificial occlusions are applied to both
test images. In the first experiment, the original Dreese building image with the
size of 200 × 200 pixels is used as reference image, as shown in Fig. 3.13(a). The
Dreese building image is then scaled and rotated with the parameters of 0.8 and 45
degrees, respectively. Artificial occlusion is applied to the scaled and rotated image to
create the target image. Figs. 3.13(b) and 3.13(c) show the registration results using
the conventional LPT based approach and the proposed APT approach, respectively.
Figure 3.9: Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of statue of liberty: (a) the reference image where the entire area is used as the model image, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result)
Figure 3.10: Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of bird: (a) the reference image where the entire area is used as the model image, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result)
Figure 3.11: Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of speed limit sign: (a) the reference image with the model image selected as the area inside the red rectangle, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result)
Figure 3.12: Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of syrup bottle: (a) the reference image with the model image selected as the area inside the red rectangle, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result)
| Test image | Proposed: scale | Proposed: rotation (degree) | Proposed: trans. (pixel) | Conventional LPT: scale | Conventional LPT: rotation (degree) | Conventional LPT: trans. (pixel) |
|---|---|---|---|---|---|---|
| squirrel (337×337), a = 1, θ = 45 | 0 | 5 | 0 | 0.2 | 0 | 0 |
| statue of liberty (450×437), a = 1.5, θ = 180 | 0.1 | 0 | 0 | 0.2 | 0 | 0 |
| bird (768×1024), a = 3, θ = 90 | 0 | 1 | 0 | 0 | 0 | 0 |
| speed limit sign (353×261), a = 0.65, θ = 26 | 0.02 | 4 | 2 | 0.06 | 1 | 2 |
| syrup bottle (512×384), a = 1.8, θ = 30 | 0.04 | 0 | 0 | 0.1 | 0 | 0 |

Table 3.1: The image registration error compared to the ground truth in general conditions (lower is better)
It can be observed that the included artificial occlusion has the texture and image
intensity close to that of the original image. As a result, both the conventional
LPT and our proposed approach yield accurate registration results. In the second
experiment, flower, a rectangular area of 100 × 100 pixels is cropped
out of the image and used as the reference image, as shown in Fig. 3.14(a). Scale and
rotation parameters of 1.4 and 90 degrees are applied to the image to create the
target image. The registration results using the conventional LPT based approach
and the proposed APT approach are shown in Figs. 3.14(b) and 3.14(c), respectively.
In this experiment, even though the occlusion is small, its texture and
image intensity differ greatly from those of the original image. Moreover,
the occlusion is located very close to the center point. Hence, the registration
using the conventional LPT yields a large error. Our proposed method, on the other
hand, still performs effectively and obtains an accurate registration result. In the third
experiment, we use a digital camera to capture two images of a cereal box with and
without occlusion. For the non-occluded image, we set the zoom factor to
2x and rotate the camera by approximately 180 degrees relative to the
occluded image. The registration results using the conventional LPT based approach
and the proposed APT approach are shown in Figs. 3.15(b) and 3.15(c), respectively.
Since the occluded area is large and located close to the center point, it can
be observed that the conventional LPT method yields large errors in both the translation
and scale parameters. Our method, on the other hand, can still register the image
correctly in this condition.
In the fourth and fifth experiments, the MR scanned images of human brain (avail-
able at http://www.maths.manchester.ac.uk/∼shardlow/moir/rueckert.pdf 1) and the
CT scanned images of human lung taken from [68] are used as the test sets of images
that are subjected to alteration. Each test image is taken from the same patient with
the same image sensor but at different times. For the experiment with the human
brain image, a model image with the size of 240 pixels in diameter is selected, as
shown in Fig. 3.16(a). Figs. 3.16(b) and 3.16(c) show the registration results for the
conventional LPT based approach and the proposed APT approach, respectively. In
the experiment with the human lung image, a model image with the size of 260 pixels
in diameter is selected, as shown in Fig. 3.17(a). The registration results are shown
in Figs. 3.17(b) and 3.17(c). As shown in Figs. 3.16(b) and 3.17(b), the registration
results using the conventional LPT based approach are inaccurate. Both scale and ro-
tation parameters are incorrect, and the translation parameter is also incorrect. This
illustrates that the conventional LPT approach is sensitive to the alteration between
1 D. Rueckert, Applications of Medical Image Registration in Healthcare, Biomedical Research and Drug Discovery
Figure 3.13: Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of Dreese Building: (a) the reference image where the entire area is used as the model image, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result and the green area indicates the altered area)
the model image and the target image. In Figs. 3.16(c) and 3.17(c), the registration
results indicate that our proposed method is robust to image alteration. The
registration results of the proposed approach are more accurate than those of the
conventional LPT based approach. Moreover, the altered area is accurately identified,
as shown by the green areas in Figs. 3.13(c), 3.14(c), 3.15(c), 3.16(c) and 3.17(c). For
medical imaging, this ability is valuable as it can automatically identify the area that
needs medical attention. This could help doctors analyze surgical progress
and shorten the time needed to analyze X-ray or CT scans. We note that a threshold
ε of 0.1 is used in these five experiments.
We also experiment with images that are subjected to non-artificial occlusion
arising from taking photos from different viewpoints. Two sets of test images, OSU
tower and stop sign, taken around the Ohio State University campus are
used for the experiment. For the experiment with images of OSU tower, a model
Figure 3.14: Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of flower: (a) the reference image where the entire area is used as the model image, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result and the green area indicates the altered area)
Figure 3.15: Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of cereal box: (a) the reference image where the entire area is used as the model image, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result and the green area indicates the altered area)
Figure 3.16: Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of human brain: (a) the reference image with the model image selected as the area inside the red rectangle, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result and the green area indicates the altered area)
Figure 3.17: Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of human lung: (a) the reference image with the model image selected as the area inside the red rectangle, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result and the green area indicates the altered area)
image with the size of 160 pixels in diameter is selected, as shown in Fig. 3.18(a),
and for the experiment with images of stop sign, a model image with the size of 58
pixels in diameter is selected, as shown in Fig. 3.19(a). As shown in Figs. 3.18 and
3.19, the areas that are subjected to occlusion in the target images are not only large
compared to the model images, but also are located close to the center point of the
transformation. As a result, the conventional LPT based approach fails to register
the images correctly, as shown in Figs. 3.18(b) and 3.19(b). The scale, rotation, and
translation parameters obtained from the conventional LPT based approach are far
from correct. This illustrates how sensitive the conventional LPT is to image occlusion.
In our proposed approach, the images are evenly sampled, which yields unbiased
matching, so the proposed approach can successfully register both sets of test images,
and the altered areas are also correctly identified, as shown in Figs. 3.18(c) and
3.19(c). We note that a threshold ε of 0.15 is used in these two experiments. The
experiments with the seven sets of test images indicate that our proposed method
outperforms the conventional LPT based approach and is robust to image
occlusion and alteration.
In terms of the computational load, similarly to the experiments in Subsection
3.3.1, the numbers of samples used in the proposed APT approach (i.e., nr = 100,
50, 95, 120, 130, 80, and 29 samples and nθ = 800, 400, 760, 960, 1040, 640, and
232 samples for the experiments on Dreese building, flower, cereal box, human brain,
human lung, OSU tower, and stop sign, respectively) are much smaller than those of
the conventional LPT (i.e., nρ = 512, 256, 512, 1024, 1024, 512, and 128 samples and
nθ = 4096, 2048, 4096, 8192, 8192, 4096, and 1024 samples for the experiments on
Dreese building, flower, cereal box, human brain, human lung, OSU tower, and stop
Figure 3.18: Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of OSU tower: (a) the reference image with the model image selected as the area inside the red rectangle, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result and the green area indicates the altered area)
sign, respectively). Therefore, not only can the proposed APT approach provide more
accurate registration for the images that are subjected to occlusion and alteration
than the conventional LPT based approach, but the computational load is also much
smaller.
3.3.3 Registration results and errors
The errors of the registration results compared to the ground truth for both the
proposed method and the conventional LPT are shown in Tables 3.1 and 3.2. Under
normal conditions, both methods yield small errors and comparable
accuracy, as shown in Table 3.1. However, only the proposed method can
successfully register images that are altered or occluded, i.e., the images of flower,
cereal box, human lung, OSU tower and stop sign, as shown in Table 3.2. This is
because APT does not concentrate only on the fovea as the conventional LPT does, but
gives equal importance to the entire image.
Figure 3.19: Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of stop sign: (a) the reference image with the model image selected as the area inside the red rectangle, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result and the green area indicates the altered area)
| Test image | Proposed: scale | Proposed: rotation (degree) | Proposed: trans. (pixel) | Conventional LPT: scale | Conventional LPT: rotation (degree) | Conventional LPT: trans. (pixel) |
|---|---|---|---|---|---|---|
| Dreese building (241×241), a = 0.8, θ = 45 | 0.05 | 5 | 2 | 0.05 | 0 | 0 |
| flower (560×420), a = 1.4, θ = 90 | 0.05 | 3 | 5 | 0.35 | 0 | 5 |
| cereal box (255×340), a = 0.5, θ = 180 | 0 | 3 | 2 | 0.2 | 8 | 55 |
| human brain (276×276), a = 1.13, θ = 0 | 0.01 | 0 | 0 | 0.12 | 3 | 0 |
| human lung (260×220), a = 1.1, θ = 0 | 0 | 0 | 0 | 0 | 4 | 8 |
| OSU tower (275×198), a = 2, θ = 22 | 0 | 1 | 4 | 0.14 | 40 | 13 |
| stop sign (424×308), a = 1.1, θ = 4 | 0.02 | 2 | 2 | 0.1 | 30 | 56 |

Table 3.2: The image registration error compared to the ground truth when images are altered or occluded (lower is better)
3.4 Extended Works
Motivated by the success of the APT based approach, we further propose an
innovative solution that eliminates the interpolation computation needed in the mapping
process. This extended work uses a one-to-one mapping mechanism that directly
arranges the image from Cartesian to polar coordinates according to a pre-computed APT
image. To transform a circular area of radius Rmax inside image I(x, y) in the
Cartesian coordinates to an APT image, we first need to create a PPT (Projective Polar
Transform) map. We define a distance parameter D(xc,yc)(x, y) as
$$D_{(x_c,y_c)}(x, y) = \max\left\{\, n \in \mathbb{Z}^{+} \;\middle|\; n \le \sqrt{(x - x_c)^2 + (y - y_c)^2} \,\right\} \tag{3.31}$$
where (xc, yc) is the image pixel in the Cartesian coordinates selected to be the center of
the transformation. Fig. 3.20(a) shows an example of the distance parameters of an image of size
10 × 10 pixels where the center point (xc, yc) = (7, 7). Each pixel of the image is
filled with its calculated distance parameter. Fig. 3.20(b) is a graphic demonstration of
the PPT map for an image of size 513 × 513 pixels where the center of the transformation
(xc, yc) = (256, 256). As shown in Fig. 3.20(b), the PPT map is an approximation
of the polar transform mapping.
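As a minimal sketch of Eq. (3.31), the distance parameter of every pixel can be computed at once by taking the integer part of the Euclidean distance to the center. The function name `distance_map`, the use of NumPy, the 0-based pixel indexing, and the value 0 assigned to the center pixel (for which the set in Eq. (3.31) is empty) are assumptions of this illustration, not details given in the text.

```python
import numpy as np


def distance_map(height, width, xc, yc):
    """Integer distance parameter D for every pixel, following Eq. (3.31).

    D is the largest positive integer not exceeding the Euclidean distance
    to the center, i.e. floor(distance). The center pixel has distance 0 and
    therefore no valid positive integer; it is set to 0 here (assumption).
    """
    ys, xs = np.mgrid[0:height, 0:width]          # row (y) and column (x) indices
    dist = np.sqrt((xs - xc) ** 2 + (ys - yc) ** 2)
    return np.floor(dist).astype(int)


# Same image size and center as the example of Fig. 3.20(a), with 0-based indices.
D = distance_map(10, 10, xc=7, yc=7)
print(D)
```

Because the map depends only on the image size and the chosen center, it can be computed once and reused for every image of that size.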
To transform an image, we directly map the image pixels in the Cartesian coordinates
I(x, y) that have distance parameter D(xc,yc)(x, y) = i to the transformed coordinate
IP (i, θ). To approximate sampling in both the radial and angular directions, for each
iteration i, the mapping begins with I(xc, yc + i) and continues the search for image
pixels with D(xc,yc)(x, y) = i pixel-by-pixel in the counter-clockwise direction. The
process is repeated for i = 1, .., Rmax. An example of the transformed Lena image is
Figure 3.20: PPT map: (a) distance parameters for an image of size 10 × 10 pixels and (xc, yc) = (7, 7), (b) graphic demonstration of the PPT map for an image of size 513 × 513 pixels and (xc, yc) = (256, 256)
Figure 3.21: Example of PPT: (a) the original Lena image, (b) the PPT transformed image of (a)
shown in Fig. 3.21. Since the PPT map is static and does not need to be recomputed
for every transformation, and the mapping process is a simple one-to-one allocation
from the Cartesian image to the PPT image that requires no interpolation, the computational
cost of this step is very low compared to that of LPT. When combined with the same
1D projection matching and feature based localization approaches, the new method
yields a much lower computational cost than the APT approach.
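The ring-by-ring mapping described in this section can be sketched as follows. This is an illustrative approximation rather than the original implementation: instead of a pixel-by-pixel search, it sorts the pixels of each ring by angle, it assumes a mathematical orientation with the y axis pointing up when deciding what counter-clockwise means, and the names `build_ppt_map` and `apply_ppt` are hypothetical. Since rings contain different numbers of pixels, the result is kept as a list of 1D arrays.

```python
import numpy as np


def build_ppt_map(height, width, xc, yc, r_max):
    """Precompute, for each ring i = 1..r_max, the pixel indices in scan order."""
    ys, xs = np.mgrid[0:height, 0:width]
    dx, dy = xs - xc, ys - yc
    D = np.floor(np.sqrt(dx ** 2 + dy ** 2)).astype(int)
    # Angle measured counter-clockwise from the start pixel (xc, yc + i),
    # assuming the y axis points up.
    phi = np.mod(np.arctan2(dy, dx) - np.pi / 2, 2 * np.pi)
    rings = []
    for i in range(1, r_max + 1):
        ring_y, ring_x = np.nonzero(D == i)
        order = np.argsort(phi[ring_y, ring_x], kind="stable")
        rings.append((ring_y[order], ring_x[order]))
    return rings


def apply_ppt(image, rings):
    """One-to-one allocation of pixels into rings; no interpolation is needed."""
    return [image[ring_y, ring_x] for ring_y, ring_x in rings]


image = np.arange(21 * 21).reshape(21, 21)
rings = build_ppt_map(21, 21, xc=10, yc=10, r_max=5)   # static map, built once
ppt = apply_ppt(image, rings)                           # reused for any 21 x 21 image
print([ring.size for ring in ppt])
```

Each pixel whose distance parameter lies between 1 and Rmax is copied exactly once, which is why the per-image cost reduces to an indexed lookup once the map has been built.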
3.5 Summary
Although LPT has been widely used in many image processing applications, it
suffers from nonuniform sampling. Hence, there are two major problems with the
conventional LPT: the high computational cost of the transformation process, which
comes from oversampling at the fovea in the spatial domain, and biased
matching, in which the matching mechanism focuses mainly on the fovea or central area while
the peripheral area is given less consideration. Previous works on image registration
using the conventional LPT report successful results under ideal conditions, such as when
registering images with different orientations and scales. In reality, however,
occlusions and alterations between the two images need to be taken into consideration.
Inspired by this fact, this Chapter presents a new image registration algorithm based
on the novel APT approach. By evenly and effectively sampling the image in the
Cartesian coordinates and using the innovative projection transform to reduce the
dimensions, the proposed APT yields faster sampling than the conventional
LPT and provides more effective and unbiased matching. A matching search scheme
using the scale and rotation invariant Gabor feature points is introduced to reduce
the search space for recovering the translation of the model image in the target image.
The image comparison scheme on the APT projection domain is then proposed to
effectively locate the occlusion and alteration between the two images without
incurring additional computational cost. This information is useful for applications
that intend to further analyze the registered images, such as medical image analysis
or scene change detection. Experimental results indicate that our proposed image
registration approach outperforms the conventional LPT based approach and is
robust to image occlusion and alteration.
CHAPTER 4
SCALE AND ROTATION INVARIANT IMAGE REGISTRATION USING PRE-SHIFTED LOGARITHMIC SPIRAL
LPT is a well-known tool in image processing for its rotation and scale invariant
properties [3, 10, 34, 47, 57, 60, 76, 77, 89, 94]. Scale and rotation in the Cartesian
coordinates appear as translation or shifting in the log-polar domain. These invariant
properties make LPT based image registration robust to scale and rotation. However,
the LPT based approach has a limitation. The accuracy of the registration
depends directly on the number of samples used in the mapping process. This problem
can be easily illustrated. For example, if 36 samples in the angular direction are used,
the accuracy of the registration cannot exceed a resolution of 10 degrees. Since
obtaining the scale and rotation parameters involves a 2D correlation either in the
spatial domain or in the frequency domain, the computational complexity of the
matching procedure grows rapidly as the number of samples increases. This
problem of LPT has been extensively studied in the past decade. The work in [40]
introduces a pseudo-LPT method that directly maps image pixels in the Cartesian domain to
LPT images in the frequency space, resulting in a reduction of computational time.
In [48], the authors tackle the problem from the sampling distribution perspective,
aiming to effectively reduce the computational waste as a result of oversampling at
the fovea in the transformation process.
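The statement that scale and rotation appear as shifts in the log-polar domain follows directly from the definition of the log-polar coordinates. As a brief worked form, assume that the scaling and rotation are about the transformation center (x_0, y_0), and write a for the scale factor, φ for the rotation angle, I′ for the transformed image, and L, L′ for the log-polar images of I and I′ (these symbols are chosen here for illustration):

```latex
\begin{aligned}
\rho &= \log\sqrt{(x-x_0)^2+(y-y_0)^2}, \qquad
\theta = \operatorname{atan2}(y-y_0,\; x-x_0),\\[4pt]
I'(r,\theta) &= I\!\left(\tfrac{r}{a},\; \theta-\phi\right)
\;\Longrightarrow\;
L'(\rho,\theta) = L\!\left(\rho-\log a,\; \theta-\phi\right).
\end{aligned}
```

With n_θ uniform angular samples the shift in θ can only be resolved in steps of 360/n_θ degrees, which gives the 10 degree limit quoted above for n_θ = 36.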
Motivated by the success of the LPT based approach and its limitation, we propose
a novel pre-shifted logarithmic spiral (PSLS) based image registration method that
is robust to translation, scale, and rotation and yields high accuracy while requiring
lower computational cost. With the PSLS transform, rotation and scale in the Cartesian
coordinates appear as translations in the transformed domain. The proposed method
uses two separate sets of logarithmic spiral (LS) mappings in opposite directions.
Both the model image and the target image are mapped twice, once with each of the LS
mappings. By pre-shifting the sampling points of one of the mappings by π/nθ radians,
where nθ is the number of samples in the angular direction, the registration accuracy
increases twofold. As a result, the proposed approach can achieve the same
registration accuracy as LPT while using half the number of samples in the
angular direction. Therefore, the computational complexity of the matching process
is greatly reduced. We further introduce a new search method that efficiently uses
the scale and rotation invariant feature points to eliminate the exhaustive search over
all the possible translations of the model image.
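The effect of the π/nθ pre-shift on angular resolution can be illustrated with two uniform angular grids. The sketch below only shows that the union of an unshifted grid and a pre-shifted grid has half the angular spacing of either grid alone; it does not reproduce the PSLS matching itself, and the function name `angular_grid` and the choice nθ = 18 are assumptions made for the example.

```python
import numpy as np


def angular_grid(n_theta, shift=0.0):
    """Angles (radians) of n_theta uniform samples over one turn, offset by shift."""
    return shift + 2 * np.pi * np.arange(n_theta) / n_theta


n_theta = 18
plain = angular_grid(n_theta)                           # first mapping
shifted = angular_grid(n_theta, shift=np.pi / n_theta)  # pre-shifted mapping
combined = np.sort(np.concatenate([plain, shifted]))

step_single = np.degrees(2 * np.pi / n_theta)           # spacing of one mapping
step_combined = np.degrees(np.diff(combined).mean())    # spacing of the union

print(step_single, step_combined)
```

In this example the two 18-sample grids together cover the same angles as a single 36-sample grid, which is the sense in which half the number of angular samples per mapping can reach the same resolution.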
This chapter is organized as follows. Section 4.1 presents the new LS and PSLS
approaches and discusses their properties and advantages. Section
4.2 presents the image registration approach using PSLS. The proposed method
consists of two major components: a feature point based search scheme, in which the
translation between the model image and the target image is recovered, and PSLS
matching, an effective mechanism using the PSLS transform to match and accurately obtain the
scale and rotation parameters between the two images. Experiments and results
are given in Section 4.3 including comparisons to the conventional LPT. Finally our
work is summarized in Section 4.4.
4.1 The Proposed Logarithmic Spiral Approach
LPT has been widely used in many image processing applications. The works
in [48, 94] used LPT for scale and rotation recovery for image registration pur-
poses. However, there are two major problems with the LPT based image regis-
tration method: inefficient sampling point distribution and high computational cost
in the matching procedure. As shown in Fig. 1.1, the sampling points are distributed
as multiple circles in which the distance between two consecutive circumferences
increases exponentially as the points move further away from the center. As a result,
image pixels that fall between the sampling circumferences are missed in the
sampling process. Hence, some information in the radial direction is lost. To reduce the
problems caused by nonuniform sampling, we propose an LS mapping method. Unlike
LPT mapping, in which the distance from the center to the sampling circumference
increases in a discrete or step-like manner, the distance from the center point to
the sampling point of LS mapping increases continuously in a spiral manner. As a
result, information in the radial direction is not entirely discarded, while the
scale and rotation invariant properties of LPT are maintained. We discuss
LS sampling in detail in Subsection 4.1.1.
Another problem of LPT is the high computational cost in the matching process.
The accuracy of the registration depends directly on the number of samples used in
the mapping process. And since obtaining scale and rotation parameters involves 2D
correlation method either in the spatial domain or in the frequency domain, the
computational complexity of the matching procedure grows rapidly as the number
of samples increases. This problem of high computational cost of LPT has been
extensively studied in the past decade. The work in [40] introduces a pseudo-LPT method
that directly maps image pixels in the Cartesian domain to LPT images in the frequency space,
resulting in a reduction of computational time. In [48], the authors tackle the
problem from the sampling point distribution perspective, aiming to effectively reduce the
computational waste resulting from oversampling at the fovea in the transformation
process. In this chapter, we propose a novel pre-shifted sampling mechanism that
significantly reduces the computational cost in the matching process. By pre-shifting
the sampling points by π/nθ radians, the total number of samples in the angular direction
for each mapping is reduced by half. As a result, the computational complexity in the
matching process is greatly reduced. Combining LS mapping with the innovative
pre-shifted sampling mechanism, we propose a novel pre-shifted logarithmic spiral (PSLS)
based image registration method that is robust to translation, scale, and rotation and
yields high accuracy while requiring lower computational cost than LPT.
4.1.1 Logarithmic spiral mapping
A logarithmic spiral (LS) [4, 16, 66] is a unique kind of spiral curve that can
be found in nature, such as in a nautilus shell. In polar coordinates, the mathematical
expression of LS can be given as $r = c e^{b\theta}$, where r is the radius, θ is the orientation,
and b and c are arbitrary constants controlling the tightness of the spiral and the point
where the spiral begins, respectively. For the sake of simplicity, throughout this chapter we
use c = 1 so that the spiral begins at r = 1 when θ = 0. One special property of LS
[Figure 4.1 panels, both axes spanning −8 to 8: (a) LS, CW direction; (b) LS, CCW direction; (c) Pre-shifted LS, CCW direction]
Figure 4.1: LS mapping with nr = 8 and nθ = 16: (a) CW direction, (b) CCW direction, and (c) PSLS in the CCW direction
is that the angle ψ between the tangent and the radial line at any point p on the spiral is
constant and equal to $\cot^{-1}(b)$. Fig. 4.1 shows examples of LS. Applications of LS range
from the arts, computer graphics and architecture [6,29], mathematics [7,25], and the
design of antennas [17,33,41,42,75] to data fitting and the study of plants and animals.
To the best of our knowledge, there is no prior research on LS in image registration
applications.
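The constant-angle property can be checked in one line. The derivation below uses the standard polar-coordinate relation tan ψ = r / (dr/dθ) between the radial line and the tangent, which is a textbook fact assumed here rather than taken from the text above:

```latex
\tan\psi \;=\; \frac{r}{\mathrm{d}r/\mathrm{d}\theta}
        \;=\; \frac{c\,e^{b\theta}}{b\,c\,e^{b\theta}}
        \;=\; \frac{1}{b}
\quad\Longrightarrow\quad
\psi = \cot^{-1}(b).
```

Since θ cancels, ψ is the same at every point of the spiral, and it depends only on the tightness constant b.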
To apply LS to image mapping, new parameters need to be introduced. We denote
Rmax as the furthest distance from the center point of transformation I(x0, y0) of the
image in the Cartesian coordinates to the sampling point on the LS curve, N as the total number
of samples on the entire LS curve, nθ as the number of samples within each 2π radians
on the curve, and nr = N/nθ. The LS mapping procedure I(x, y) ↦ S(θ), where S is
a one-dimensional transformed vector, is as follows:
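\[ S(i) = I(x_i, y_i) \quad \text{for } i = 1, \ldots, N, \text{ where} \tag{4.1} \]
\[ \begin{cases} x_i = e^{i\Delta\theta \cot\psi} \cos(i\Delta\theta) + x_0, \quad y_i = e^{i\Delta\theta \cot\psi} \sin(i\Delta\theta) + y_0, \\ b = \dfrac{\log(R_{max})}{2\pi n_r}, \quad \psi = \cot^{-1}(b), \quad \Delta\theta = \dfrac{2\pi}{n_\theta}. \end{cases} \tag{4.2} \]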
The result of the transformation is a vector of dimension N. Fig. 4.3
shows examples of the LS transformed vector S of the barbara image. Based on the
transformed vectors S of the model image and the target image, we can obtain the
scale and rotation parameters between the two images. This topic is discussed in
Subsection 4.1.2.
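The mapping of Eqs. (4.1)-(4.2) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names `ls_sample_points` and `ls_transform`, the nearest-pixel rounding, and the zero fill for points that fall outside the image are all choices made for the example.

```python
import math

def ls_sample_points(x0, y0, r_max, n_r, n_theta):
    """Return the N = n_r * n_theta sampling coordinates of Eqs. (4.1)-(4.2)."""
    b = math.log(r_max) / (2.0 * math.pi * n_r)   # b = cot(psi)
    d_theta = 2.0 * math.pi / n_theta             # angular step
    points = []
    for i in range(1, n_r * n_theta + 1):
        theta = i * d_theta
        radius = math.exp(b * theta)              # e^{i * d_theta * cot(psi)}
        points.append((radius * math.cos(theta) + x0,
                       radius * math.sin(theta) + y0))
    return points

def ls_transform(image, x0, y0, r_max, n_r, n_theta):
    """Sample image[row][col] along the spiral (assumed nearest-pixel lookup)."""
    height, width = len(image), len(image[0])
    samples = []
    for x, y in ls_sample_points(x0, y0, r_max, n_r, n_theta):
        col, row = int(round(x)), int(round(y))
        inside = 0 <= row < height and 0 <= col < width
        samples.append(image[row][col] if inside else 0)  # assumed zero fill
    return samples
```

With Rmax = 64 and nr = nθ = 16, `ls_sample_points` returns N = 256 points whose distance from the center grows monotonically and reaches Rmax at the last sample.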
LS mapping yields significant advantages over LPT mapping in terms of the
effectiveness of the sampling point distribution. Fig. 4.2 compares the sampling
point distributions of LPT and LS mappings using 192 samples (nr = 3
and nθ = 64). Figs. 4.2(a) and 4.2(b) show the mappings on the original Target logo
image and on the Target logo image scaled by 1.3, respectively. With LPT mapping,
from the center point outward, the sampling points in the radius direction (circles)
in the first image fall into the inner black region, the white region, and the outer
black region, respectively. In the second image, however, all the LPT sampling points
fall into the inner black region and the outer black region only, and none falls into
the white region. As a result, the two images cannot be registered correctly. On the
other hand, when the sampling points are distributed in the spiral manner shown in
Figs. 4.2(a) and 4.2(b), the sampling points of LS mapping fall into all three regions
in both images. As a result, with only a small number of samples, the LS method can
provide more effective registration than the LPT method. Fig. 4.2(c) compares the
distances from the center to the sampling points for the LPT and LS methods. With
LPT mapping, the image is sampled only at specific radii, such as
r = 1.41, 2, 2.82, 4, 5.65, 8, . . ., whereas image pixels at distances in between
are missed. With LS mapping, however, image pixels at every distance from the
center are sampled.
4.1.2 LS matching
Given a model image IM(x, y), a target image IT(x, y) that is a scaled and rotated
version of the model image with scale parameter a and rotation parameter θ can be
expressed mathematically as
\[ I_T(x, y) = I_M(ax\cos\theta + ay\sin\theta,\; -ax\sin\theta + ay\cos\theta). \tag{4.3} \]
In the log-polar domain, with (r, φ) denoting the polar coordinates of (x, y), Eq. (4.3)
can be expressed as:
\[ I_T(\log(r), \phi) = I_M(\log(r) + \log(a),\; \phi - \theta). \tag{4.4} \]
Similarly to LPT, rotation and scale in the Cartesian coordinates appear as
translation or shifting in the LS domain, as shown in Fig. 4.3. To find the
displacement d such that ST(i) = SM(i − d), one can evaluate the phase-correlation
function between the vectors SM and ST:
\[ [d, m] = \arg\max \mathcal{P}(S_M, S_T) \tag{4.5} \]
where P and m are the phase-correlation function and its peak value, respectively.
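The displacement search of Eq. (4.5) can be illustrated with a small one-dimensional phase correlation. The sketch below is an assumption-laden example rather than the implementation used in this work: it uses a naive O(N²) discrete Fourier transform from the standard library instead of an FFT, treats the vectors as circular, and the names `dft` and `phase_correlate` are hypothetical. The returned displacement is wrapped so that shifts in the opposite direction appear as negative values.

```python
import cmath

def dft(values, inverse=False):
    """Naive discrete Fourier transform (an FFT would be used in practice)."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def phase_correlate(s_model, s_target, eps=1e-12):
    """Return (d, m) with s_target[i] ~ s_model[i - d] (circular) and peak value m."""
    f_m, f_t = dft(s_model), dft(s_target)
    # Normalised cross-power spectrum keeps only the phase difference.
    cross = [t * m.conjugate() for m, t in zip(f_m, f_t)]
    normalised = [c / abs(c) if abs(c) > eps else 0j for c in cross]
    surface = [z.real for z in dft(normalised, inverse=True)]
    peak = max(range(len(surface)), key=surface.__getitem__)
    n = len(surface)
    d = peak if peak <= n // 2 else peak - n   # wrap to a signed shift
    return d, surface[peak]
```

For a target vector that is an exact circular shift of the model vector, the correlation surface is an impulse, so the peak value m is close to 1.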
Figure 4.2: Comparison of the sampling point distributions of LPT and LS mappings: (a) original Target logo image, (b) Target logo image with scale parameter 1.3, and (c) comparison of the locations of the sampling points from the center (horizontal axis: sampling point #; vertical axis: distance from center to the sampling point, in pixels)

However, d can result from scale, rotation, or both. As a result, two LS
mappings in opposite directions, i.e. clockwise (CW) and counter-clockwise (CCW),
are needed. The second mapping can be obtained by simply inverting the sign of the
non-constant component of the variable y in Eq. 4.1, i.e.
$x_i = e^{\theta_i \cot\psi} \cos\theta_i + x_0$ and
$y_i = -e^{\theta_i \cot\psi} \sin\theta_i + y_0$, with θi = iΔθ. To obtain the scale
and rotation parameters between the two images, each image needs to be mapped with
two LS mappings, CW and CCW, respectively. The transformed vectors of the same
direction are matched:
\[ [d_{cw}, m_{cw}] = \arg\max \mathcal{P}(S^M_{cw}, S^T_{cw}), \quad [d_{ccw}, m_{ccw}] = \arg\max \mathcal{P}(S^M_{ccw}, S^T_{ccw}) \tag{4.6} \]
where the subscripts cw and ccw represent the direction of the mapping. With the
displacement parameters d from both the CW and CCW directions obtained, we can
now distinguish the displacement resulting from rotation from the one resulting from
scale. We denote by dr and ds the displacement parameters from rotation and scale,
respectively. These two parameters can be obtained as follows:
Algorithm 1: Calculate dr and ds from dcw and dccw
  if dcw = dccw then
    ds = dcw {Scale only}
  else if dcw = −dccw then
    dr = dcw {Rotation only (with respect to the CW direction)}
  else if ‖dcw‖ ≠ ‖dccw‖ then
    dcw = ds + dr, dccw = ds − dr {Both scale and rotation}
  end if
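The three branches of Algorithm 1 are special cases of the linear system dcw = ds + dr, dccw = ds − dr, so they can be handled by a single closed-form solution. The sketch below illustrates this; the function name `split_displacements` is hypothetical, and returning floating-point halves when dcw + dccw is odd is an assumption of the example, not something specified above.

```python
def split_displacements(d_cw, d_ccw):
    """Solve d_cw = d_s + d_r and d_ccw = d_s - d_r for (d_s, d_r)."""
    d_s = (d_cw + d_ccw) / 2.0   # component shared by both directions (scale)
    d_r = (d_cw - d_ccw) / 2.0   # component with opposite signs (rotation)
    return d_s, d_r
```

With dcw = dccw the rotation part vanishes (scale only), and with dcw = −dccw the scale part vanishes (rotation only), matching the first two branches.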
Finally, the scale and rotation parameters in the Cartesian coordinates, a and θ, can
be computed as:
\[ a = \exp\left[\frac{\bar{d}_s \log(R_{max})}{n_r}\right], \quad \theta = \frac{2\pi d_r}{n_\theta} \tag{4.7} \]
where $\bar{d}_s = \mathrm{ceil}[d_s/n_\theta]$, and the operation ceil(A) denotes the nearest
integer greater than or equal to A.
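As a worked illustration of Eq. (4.7), the following sketch converts the two displacements into a scale factor and a rotation angle. It assumes that the scale displacement is first reduced to the ceiling of ds/nθ, as stated above; the function name `scale_and_rotation` and the example values are hypothetical.

```python
import math

def scale_and_rotation(d_s, d_r, r_max, n_r, n_theta):
    """Return (a, theta) following Eq. (4.7); theta is in radians."""
    d_s_turns = math.ceil(d_s / n_theta)            # ceil[d_s / n_theta]
    a = math.exp(d_s_turns * math.log(r_max) / n_r)
    theta = 2.0 * math.pi * d_r / n_theta
    return a, theta
```

For instance, with Rmax = 64, nr = 6, and nθ = 16, a scale displacement of 16 samples gives a = 64^(1/6) = 2, and a rotation displacement of 4 samples gives θ = π/2.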
Figure 4.3: LS transformed vectors of barbara images, where nr = nθ = 16 and Rmax = 64: (a) from the original barbara image (this transformed vector is presented in (b)-(d) as a broken blue line for comparison), (b) from the barbara image with 90 degrees rotation, (c) from the barbara image with scale parameter of 2, and (d) from the barbara image with scale and rotation parameters of 2 and 90 degrees, respectively
4.1.3 Pre-shifted logarithmic spiral
Both LPT and LS methods involve correlation for matching, and the fastest and
most effective correlation technique is phase correlation, which involves the
computation of Fourier transforms. The computational complexity of the fast Fourier
transform is O(N log2(N)), where N is the total number of samples. Since the LS
method requires mappings in both the CW and CCW directions, the computational cost
of the LS method is two times higher than that of the LPT approach for the same
registration accuracy. To reduce the computational complexity while maintaining the
same registration accuracy, we propose a novel PSLS technique. By modifying Eq.
4.1, a pre-shifted LS mapping is given by:
\[ x_i = e^{\theta_i \cot\psi} \cos(\theta_i + \pi/n_\theta) + x_0, \quad y_i = e^{\theta_i \cot\psi} \sin(\theta_i + \pi/n_\theta) + y_0. \tag{4.8} \]
We denote by SPSLS(i) the transformed vector resulting from the pre-shifted LS map.
An example of PSLS mapping is shown in Fig. 4.1(c). Since the proposed approach
requires each image to be mapped with two separate spiral mappings, one in the CW
direction and another in the CCW direction, having one spiral mapping
pre-shifted by π/nθ radians gives results similar to sampling the image with
2nθ samples in the angular direction. As a result, only half of the number of samples,
N/2, is needed for each mapping to maintain the same registration accuracy as
that of the LPT approach with N samples. Since the computational
cost of the matching process using the Fourier transform method depends directly on the
number of samples, the computational cost is thus reduced. A comparison with the LPT
approach is given in Section 4.3.
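The pre-shifted mapping of Eq. (4.8) differs from the plain LS mapping only by the angular offset π/nθ. A minimal sketch, assuming θi = iΔθ with Δθ = 2π/nθ and b = log(Rmax)/(2π nr) as in Eq. (4.2), is given below; the function name `psls_sample_points` is hypothetical.

```python
import math

def psls_sample_points(x0, y0, r_max, n_r, n_theta):
    """Sampling coordinates of the pre-shifted LS mapping, Eq. (4.8)."""
    b = math.log(r_max) / (2.0 * math.pi * n_r)   # b = cot(psi)
    d_theta = 2.0 * math.pi / n_theta
    offset = math.pi / n_theta                    # half of one angular step
    points = []
    for i in range(1, n_r * n_theta + 1):
        theta = i * d_theta
        radius = math.exp(b * theta)              # radius is not shifted
        points.append((radius * math.cos(theta + offset) + x0,
                       radius * math.sin(theta + offset) + y0))
    return points
```

Each pre-shifted sample lies angularly halfway between two consecutive samples of the non-shifted spiral, which is why the two mappings together behave like a sampling with 2nθ angular positions.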
Figure 4.4: Comparison of sampling distributions: (a) LPT and LS mappings and (b) LPT and PSLS mappings
Another advantage of the PSLS approach is its effectiveness in distributing the
sampling points. Figs. 4.4(a) and 4.4(b) compare LPT mapping with LS
mapping and PSLS mapping for an image of size 33 × 33 pixels using 8 samples
in the radius direction and 64 samples in the angular direction. Unlike LPT mapping,
in which the sampling in the radius direction expands in discrete steps, LS and
PSLS mappings expand the sampling points in the radius direction continuously.
As a result, LS and PSLS methods sample the image at every
radius distance. Thus, more information in the radius direction of the
image is obtained in the sampling process.
The effectiveness of the sampling distribution can also be demonstrated in terms of
the amount of information obtained from the sampling process. Since the
distance from the center to the sampling points of LPT, LS, and PSLS increases
exponentially because of the logarithmic mapping, image pixels close to the center
point tend to be oversampled while image pixels that are further away tend to be
undersampled or missed. Since no information is gained when image pixels are
oversampled, computation is wasted and information is lost. To evaluate and compare
the performance of the three methods, we apply LPT, LS, and PSLS to images of
size 64 × 64 and 128 × 128, respectively, using 9 different numbers
of samples. The sampling points for each method are recorded and compared. We
define the percentage of oversampling rate, Q, as
\[ Q = \frac{n_{rep}}{n_{total}} \times 100, \tag{4.9} \]
where nrep is the number of image pixels that are sampled more than once, and ntotal
is the total number of samples. Fig. 4.5 compares the percentage of
oversampling of the LPT, LS, and PSLS mappings. It can be seen that both LS and
PSLS yield a lower oversampling rate than LPT. Thus, with the same
number of samples, both LS and PSLS sample more pixels than
LPT. As a result, more information is obtained.
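The oversampling rate of Eq. (4.9) can be computed directly from a list of sampling coordinates. The sketch below assumes that each sampling point is assigned to its nearest pixel by rounding, which is a choice made for the example; the function name `oversampling_rate` is hypothetical.

```python
from collections import Counter

def oversampling_rate(points):
    """Percentage Q of Eq. (4.9) for a list of (x, y) sampling coordinates."""
    # Count how many sampling points land on each (assumed nearest) pixel.
    hits = Counter((int(round(x)), int(round(y))) for x, y in points)
    n_rep = sum(1 for count in hits.values() if count > 1)
    return 100.0 * n_rep / len(points)
```

For example, four sampling points of which two land on the same pixel give nrep = 1 and ntotal = 4, so Q = 25.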
4.2 The New Image Registration Approach
4.2.1 Feature point based search scheme
As with the LPT based approach, the translation parameter between the model image
and the target image has to be found in order to keep the scale and rotation invariant
properties. The simplest solution is to perform an exhaustive search, in which the
PSLS is computed for every pixel in the target image and the translation parameter
is found from the image pixel that yields the best matching result to the PSLS of the
model image. Since an exhaustive search is computationally expensive,
a method to reduce the search space is needed. Many works have
proposed methods to speed up the search procedure. Zokai et al. [94] use a
multi-resolution imaging technique to reduce the computation cost of translation
recovery by searching from the coarsest to the finest level. This technique, however,
performs well only when the target image contains enough low frequency information
and that low frequency component is not distorted by noise. Fourier-Mellin [64, 69]
uses Fourier phase correlation to recover the translation before computing the
log-polar matching in the frequency domain. However, a recent study [73] indicates
an aliasing problem from using the phase magnitude of the FFT to recover the
translation (of the center point). We introduce a new method to accelerate the search
procedure. The new method is based on a reduced set of feature points.

Figure 4.5: Comparison of percentage of oversampling between LPT and PSLS mappings (curves: LPT, LS, PSLS): (a) images with the size 64 × 64 (Rmax = 32), and (b) images with the size 128 × 128 (Rmax = 64). Horizontal axis: number of samples, nr × nθ; vertical axis: percentage of oversampling (%)
In order to avoid an exhaustive search of the target image for the model image, we
reduce the search from every pixel of the target image to a set of feature points only.
These feature points are obtained by applying the Gabor wavelet transform to every
pixel and selecting those pixels which generate high energy in the wavelet domain.
The number of feature points is much smaller than the number of pixels in the target
image, and the computation of the Gabor wavelet transform is much lower than
that of PSLS. Thus, the computation load is much lighter than that of the exhaustive
search using PSLS. We choose the Gabor wavelet for extracting feature points because
of its invariance to scale, rotation, and noise. More details of the Gabor
feature extraction method can be found in [47].
First, feature points are extracted from both the reference image and the target
image. We denote $P^M = \{p^M_1, \ldots, p^M_{n_M}\}$ and $P^T = \{p^T_1, \ldots, p^T_{n_T}\}$
as the sets of feature points in the reference image and in the target image,
respectively. The superscripts M and T denote the model (reference) image and the
target image, and nM and nT are the numbers of feature points in the reference image
and in the target image, respectively. To crop the model image out of the reference
image, we select one of the feature points in the reference image, $p^M_k$, as the
center point for PSLS and choose a radius for the transformation, Rmax, that
sufficiently covers the model image to compute PSLS, in which $S^M_{cw}$ and
$S^M_{ccw}$ represent the model image. To recover the translation between the model
image and the target image, we need to find the feature point $p^T_{k'}$ in the target
image that corresponds to the feature point $p^M_k$ in the model image. In other
words, by selecting one of the feature points in the model image as the center point
for PSLS, (x0, y0), we reduce the search from every pixel of the target image to the
set of feature points found in the target image only. The matched feature point is
the point that yields the highest correlation coefficient E, where E for matching the
model image to the target image at feature point z is
$E_z = \max\{m_{cw-z}, m_{ccw-z}\}$.
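The selection of the matched feature point can be sketched as follows, assuming the phase-correlation peak values of the CW and CCW matchings have already been computed for every candidate feature point in the target image. The function name `best_feature_point` and the list-of-pairs input format are hypothetical.

```python
def best_feature_point(peaks):
    """peaks[z] = (m_cw, m_ccw) for target feature point z; return (z, E_z)."""
    # E_z = max{m_cw-z, m_ccw-z} for every candidate feature point z.
    scores = [max(m_cw, m_ccw) for m_cw, m_ccw in peaks]
    # The matched feature point is the one with the highest E.
    z = max(range(len(scores)), key=scores.__getitem__)
    return z, scores[z]
```

The index returned identifies the feature point whose coordinates are taken as the recovered translation.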
The number of feature points identified varies depending on the richness of details
in the image. In general, the number of feature points identified using the proposed
method is approximately 0.1%-1% of the total number of pixels in the image. For
most images, this number of feature points is sufficient to find the
corresponding pixels, i.e. $p^M_k$ in the model image and $p^T_{k'}$ in the target image.
We balance the number of feature points identified in the image against the
computational load for localization (refer to the Gabor feature extraction method in
[47]): the number of feature points should not be so small that corresponding feature
points are missed, nor so large that the computational load increases significantly.
4.2.2 PSLS matching
Similarly to LS matching, each image is mapped with two LS mappings in the CW
and CCW directions, respectively. However, for the PSLS approach, one of the four
LS mappings is pre-shifted by π/nθ radians. For instance, the LS map shown in Fig.
4.1(a) is used for sampling both the model image and the target image. The LS
mappings shown in Figs. 4.1(b) and 4.1(c) are used for sampling the model image
and the target image, respectively. The matching procedure and the computation
of the scale parameter a remain the same as discussed in Subsection 4.1.2.
However, to find the rotation parameter θ, we use the peak value
of the phase-correlation function as a similarity indicator for the two matchings:
between the two non-shifted LS mappings (CW matching in this example) and between
the non-shifted LS and the pre-shifted LS mappings (CCW matching in this example).
The procedure can be described as follows:
Algorithm 2: Calculate θ
  if mcw ≥ mccw then
    θ = 2πdr/nθ
  else if mcw < mccw then
    θ = 2πdr/nθ − π/nθ
  end if
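Algorithm 2 can be sketched as a small function. The name `psls_rotation` is hypothetical, and the angle is returned in radians.

```python
import math

def psls_rotation(d_r, m_cw, m_ccw, n_theta):
    """Rotation angle (radians) following Algorithm 2."""
    theta = 2.0 * math.pi * d_r / n_theta
    if m_cw < m_ccw:
        # The pre-shifted (CCW) matching is the better one: apply the half-step offset.
        theta -= math.pi / n_theta
    return theta
```

With nθ = 8 and dr = 2, the function returns π/2 when the non-shifted matching has the higher peak and π/2 − π/8 otherwise.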
With PSLS, we can obtain the same registration accuracy as with nθ
samples by using only nθ/2 samples for each mapping. With this reduction, the
total number of samples for each LS mapping is reduced by half. As a result, the
computational cost of the matching using phase correlation is significantly reduced.
4.2.3 Complete algorithm
The complete steps of the proposed algorithm are as follows:
• Extract features in both the model image and the target image.
• Create the model image by selecting one of the feature points in the reference
image, $p^M_k$, as the origin. Then crop a circular image patch IM that covers the area
of radius Rmax to be registered to the target image.
• Compute the LS vectors $S^M_{cw}$ and $S^M_{ccw}$ using the proposed LS mapping methods.
• Use each feature point in the target image, $p^T_z$ for z = 1 : nT, where nT is the
number of feature points in the target image, as the origin and crop circular
image patches $I^T_z$ for z = 1 : nT with radius Rmax.
• Compute the sets of candidate PSLS vectors
$S^T_{cw} = \{S^T_{cw-1}, \ldots, S^T_{cw-n_T}\}$ and
$S^T_{PSLS:ccw} = \{S^T_{PSLS:ccw-1}, \ldots, S^T_{PSLS:ccw-n_T}\}$.
• Match each candidate with the model using the proposed PSLS matching algorithm.
The translation is the point (x, y) that yields the highest correlation coefficient
E. The scale and rotation parameters are then obtained from this match.
4.3 Experimental Results
In this section, the performance of the proposed PSLS approach is evaluated in
two categories: registration accuracy and computational cost. All experiments
are carried out using Matlab 2007a running on a 2 GHz Intel Pentium IV machine.
The computation time required for registering an image pair varies depending on the
image size, the richness of texture content in the image, which influences the number
of feature points detected in the image, and the size of Rmax.
4.3.1 Registration accuracy
To evaluate the effectiveness of the proposed image registration approach, we
use 20 test images, of which 10 images are randomly selected from the Ohio State
University database (http://sampl.ece.ohio-state.edu/database.htm), 5 images are
captured using a digital camera in general lighting conditions, and 5 images are
aerial images. We randomly cropped a rectangular area from each of the test images
to form the set of model images. Each test image is then scaled by a random value
between 0.5 and 3 and rotated by a random value between 30 and 180 degrees. We use
nr = nθ = 64 samples for all the experiments, and Rmax is chosen according to the
size of the model image. The translation information is obtained using the proposed
feature based search scheme. The size of the search space using Gabor feature points
varies depending on the richness of details in the image, with an average of 100-150
points, which is approximately 0.2% of the total number of pixels in the image. To
achieve accuracy at the pixel level, we apply a small search window of 5×5 pixels
around the translation point. Fig. 4.6 shows examples of the registration results
using the proposed approach. In the experiments, the average runtime is approximately
5-8 seconds for images of 400×400 pixels to 800×600 pixels. Compared with the ground
truth, our method yields average errors of 2.4 pixels, 0.08, and 2.9 degrees for the
translation, scale, and rotation parameters, respectively. This demonstrates the
effectiveness and robustness of our proposed PSLS image registration approach.
To compare the performance of our proposed method with LPT, we use three
images, barbara, lena, and cameraman, as the target images, all of size
256 × 256 pixels. A set of 10 model image patches of size 128 × 128 pixels is
randomly cropped from each of the test images. We randomly apply one of 30
different orientations, θ = {5.63, . . . , 168.9} degrees, and one of 30 different scale
parameters, a = {0.7, . . . , 3.7}, to each of the 30 model images. These model images
are then registered to the target images using the proposed PSLS approach and the LPT
approach, both with the same parameters nr = 64, nθ = 32, and Rmax = 80. The
translation between the model image and the target image is obtained using the
proposed feature point based search scheme.
Figs. 4.7(a)-4.7(d) show the registration results for the scale and rotation
parameters. As shown in Figs. 4.7(a) and 4.7(c), the proposed method yields higher
registration accuracy than the LPT approach with the same number of samples,
and Figs. 4.7(b) and 4.7(d) indicate that the proposed method yields a
scale invariant property similar to that of LPT. Therefore, our proposed method can
provide better registration accuracy than the LPT based method. This is because,
with the same number of samples in the angular direction, nθ, the proposed PSLS
approach samples the image at π/nθ radian intervals, while the LPT approach samples
the image at 2π/nθ radian intervals. As a result, the proposed PSLS approach yields
two times better registration accuracy than LPT.

Figure 4.6: Image registration results using the proposed PSLS approach: (a) the reference images, where the entire area is used as the model image, (b) target images and their features, and (c) registration results (the red rectangle indicates the registration result)

Figure 4.7: Experimental results: (a) PSLS rotation test results, (b) PSLS scale test results, (c) LPT rotation test results, and (d) LPT scale test results. Each panel plots the actual value and the experimental result against the experiment number.
4.3.2 Computational cost
We evaluate the computational benefit of using the PSLS approach. We define the
computational benefit (CB), in percent, of using PSLS in performing the FFT as:
\[ CB(\%) = \frac{T_{LPT} - T_{PSLS}}{T_{LPT}} \times 100 \tag{4.10} \]
where T is the computational time in the experiments. In practice, the fast
Fourier transform in most mathematical software is computed using the FFTW
library, which applies the Cooley-Tukey algorithm that factorizes N into
smaller components. FFTW has been shown to be faster than conventional
FFT implementations. However, as stated in [22], there is currently no theory for
predicting the optimal factorization (plan) for each value of N, and the computation
time varies across systems and computational platforms. Fig. 4.8 shows the
experimental computational benefit (CBexp) of applying FFTW to the PSLS approach at
different numbers of samples using Matlab 2008a running on an Intel
Core 2 Duo 2.0 GHz system. It can be seen that PSLS yields a computational benefit
over LPT of up to 40%. This indicates the effectiveness of the proposed pre-shifted
mechanism in reducing the computational cost while maintaining the registration
accuracy.
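Eq. (4.10) is a simple relative saving and can be evaluated as in the sketch below. The function name `computational_benefit` and the timing values in the usage note are hypothetical and are not measurements from the experiments.

```python
def computational_benefit(t_lpt, t_psls):
    """Computational benefit CB in percent, Eq. (4.10)."""
    return (t_lpt - t_psls) / t_lpt * 100.0
```

For hypothetical timings of 10 ms for LPT and 6 ms for PSLS, the benefit is 40%. A negative value means that PSLS took longer than LPT for that number of samples.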
Figure 4.8: Computational benefit using PSLS approach (horizontal axis: number of samples, nr × nθ, from 16x16 to 1024x1024; vertical axis: computational benefit in percentage)
4.4 Summary
Although LPT has been widely used in many image processing applications for
its rotation and scale invariant properties, there are two major problems with the
LPT based image registration method: inefficient sampling point distribution and
high computational cost in the matching procedure. In this chapter, we presented a
new image registration approach using PSLS. Unlike the LPT approach, in
which the sampling points in the radius direction are distributed as multiple circles
and the distance between two consecutive circumferences increases exponentially as
the circles get further away from the center, the distance from each sampling point
to the center point in the proposed approach increases continuously in a spiral
manner. As a result, information at every distance from the center can be obtained.
Moreover, the spiral mapping distributes the sampling points efficiently and reduces
the problem of oversampling at the fovea, while maintaining the scale and rotation
invariant properties of LPT. The proposed method employs two separate spiral
mappings for each image, one in the CW direction and another in the CCW direction.
By pre-shifting one of the spiral mappings by π/nθ radians, we
can register images with the same accuracy as the LPT approach while using only half
of the number of samples for each mapping. As a result, the computational complexity
of the matching process is greatly reduced. Effective 1D matching mechanisms
are proposed to obtain the scale and rotation parameters of
the registered images accurately. Experiments are provided to demonstrate the
accuracy and robustness of the proposed approach.
CHAPTER 5
CONCLUSION AND FUTURE WORKS
Image registration is the process of aligning two images that share common visual
information, such as images of the same object or images of the same scene taken from
different geometric viewpoints, at different times, or by different image sensors. Image
registration is an essential step in many image processing applications that involve
multiple images for comparison, integration, or analysis, such as image fusion, image
mosaics, image or scene change detection, and medical imaging. Many image
registration methods have been proposed in the past 20 years [5,8,35,39,45,64,93,94]. In
the past decade, LPT has become a well-known tool for image registration for its rotation
and scale invariant properties [3, 10, 34, 47, 57, 60, 76, 77, 89, 94]. Scale and rotation in
the Cartesian coordinates appear as translation or shifting in the log-polar domain.
These invariant properties make LPT based image registration robust to scale and
rotation. The major work of this dissertation is to extend the horizon of LPT and
address the major problems of the method:
1. nonuniform sampling, which leads to high computational cost in the transfor-
mation process
2. bias matching
3. sensitivity to translation
4. high computational cost in the matching process
5. ineffective sampling point distribution due to the oversampling in the spatial
domain
In Chapter 2, we explore the use of LPT in 2D object recognition applications.
We combine a feature extraction technique with an innovative feature point filtering
approach to address the high sensitivity of LPT to translation and to reduce
the computational time of the translation recovery process. In Chapter 3, we tackle
the problems of LPT in image registration applications by focusing on the mapping
procedure and on how to reduce the effect of nonuniform sampling, which leads to many
of the problems listed above. One of the contributions of this dissertation is the new
adaptive sampling method, which reduces the computational waste in the
transform process caused by oversampling of the image pixels around the fovea of
LPT. Along with the proposed projection and 1D matching techniques, the proposed
method greatly improves the performance of LPT at a lower computational
cost.
In Chapter 4, we examine LPT from a different perspective, looking not at one
particular issue but at LPT based image registration as a whole. Following
the same line as LPT, we introduce two new image processing methods, LS and PSLS. At
a glance, the new techniques share some characteristics with LPT. However, the
actual process is entirely different from that of LPT in the mathematical expression
of the transformations, the sampling point distribution, and the matching procedure.
Significant improvements in performance and accuracy and a reduction in
computational cost are obtained.
In closing, this dissertation provides an in-depth analysis of the LPT approach, its
advantages, problems, and applications in image processing. Innovative techniques are
proposed to address the problems of LPT and to improve the performance
of the LPT method. However, the work is still far from a perfect
image registration system. Each topic listed above is rich in content, and addressing
even one of them adequately is a challenging task. With demand for image
registration growing rapidly in military applications, the medical sector, and
manufacturing, faster and more accurate methods are always desired. Today, new
photographic devices are capable of much higher resolution than before, and real-time
computation is in demand for many practical applications. Future
work needs to focus on reducing the sensitivity of the methods to translation.
Artificial intelligence techniques such as recognition and retrieval can be integrated
into the system to reduce the search time. With the high registration accuracy and
2D invariant properties of LPT and PSLS, new applications should be explored. With
limitless possibilities, LPT and PSLS may one day have their footprint in every area
we can imagine.
APPENDIX A
GABOR WAVELET FEATURE EXTRACTION
In order to keep the advantage of the scale and rotation invariant properties of LPT,
the translation parameter has to be found. Fourier-Mellin (FM) uses Fourier phase
correlation to recover the translation before computing the log-polar matching in the
frequency domain. However, a recent study [94] indicates an aliasing problem from
using the phase magnitude of the FFT to recover the translation of an object (center
point). Zokai et al. [94] use a multi-resolution imaging technique to reduce the
computation cost of translation recovery by searching from the coarsest to the finest
level. This technique, however, performs well only when the target image contains
enough low frequency information and that low frequency component is not distorted
by noise. To address this problem, we introduce a scale and rotation invariant feature
based search scheme that not only suits our goal but also reduces the search time by
avoiding the exhaustive search. By convolving the image with Gabor wavelets
[14,20,23,27,32,54,59,71,85,86,88] at different scales and orientations, the Gabor
coefficients that reflect the frequency response of the image to the Gabor filter at
different scales and rotations are obtained, as shown below:
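\[ g(x, y) = \left(\frac{1}{2\pi\sigma_x\sigma_y}\right) \exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right) + 2\pi j W x\right] \tag{A.1} \]
\[ g_{mn}(x, y) = a^{-m} g(x', y'); \quad a > 1; \quad m = 1 : M; \quad n = 1 : N \tag{A.2} \]
\[ \begin{bmatrix} x' \\ y' \end{bmatrix} = a^{-m} \begin{bmatrix} \cos\theta_n & \sin\theta_n \\ -\sin\theta_n & \cos\theta_n \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \tag{A.3} \]
\[ C(x, y, m, n) = \int\!\!\int I(x', y')\, g_{mn}(x - x', y - y')\, dx'\, dy' \tag{A.4} \]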
where σx and σy are the standard deviations along the x and y axes, M is the number
of scales, and N is the number of orientations of the Gabor filter.
To obtain Gabor features, the image coefficients are first normalized in order to
reduce the effect of illumination changes. For every image pixel (x, y), we keep only
the coefficients from the scale factor that gives the highest average frequency response
over all orientations; the coefficients from the other scale factors are discarded. We
then calculate the geometric mean of the remaining coefficients of that image pixel
as shown below:
\[ C(x, y) = \left( \sum_{n=1}^{N} C(x, y, m_s, n) \right)^{\frac{1}{n}} \tag{A.5} \]
where ms is the scale factor that gives the highest average frequency response over all
orientations at pixel (x, y). Non-maxima suppression is applied to select the maxima.
We note that in our approach, unlike in feature based image registration methods,
feature points do not represent the image or object. They are used to accelerate the
search only.
As illustrated in Figs. A.1(a) and A.1(b), Gabor feature gives comparable per-
formance to SIFT and Harris-Laplacian as scale and rotation invariant features.
Figure A.1: Performance evaluation of the proposed Gabor features, SIFT [43], and Harris-Laplacian [50], measured as repeatability (%): (a) orientation change (degrees), (b) scale change (scale factor), (c) added Gaussian noise (variance).
However, in a noisy environment, the Gabor features outperform SIFT and
Harris-Laplacian, as shown in Fig. A.1(c). We note that the performance of the feature
extraction methods was evaluated in terms of repeatability, as suggested in [51]. A
feature is considered repeatable if it is detected in the test image with a displacement
offset of no more than 5 pixels.
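The repeatability criterion can be illustrated with the short Python sketch below. It assumes that the ground-truth mapping from the reference image to the test image is known and supplied as a function, and that repeatability is reported as the percentage of reference features recovered within the tolerance; the choice of denominator and the function name are assumptions made for illustration and may differ from the protocol in [51].

```python
import math


def repeatability(ref_points, test_points, transform, tol=5.0):
    """Percentage of reference features found again in the test image.

    ref_points, test_points: lists of (x, y) feature locations.
    transform: maps a reference point to its expected test-image location.
    tol: maximum displacement offset in pixels (5 in the text).
    """
    if not ref_points:
        return 0.0
    repeated = 0
    for p in ref_points:
        ex, ey = transform(p)
        # Repeatable if any detected test feature lies within tol pixels.
        if any(math.hypot(ex - qx, ey - qy) <= tol for qx, qy in test_points):
            repeated += 1
    return 100.0 * repeated / len(ref_points)
```

For a pure rotation or scaling test, transform would apply the known rotation angle or scale factor about the image center; for the noise test it would be the identity.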
BIBLIOGRAPHY
[1] J. Ahmed and M. N. Jafri. Improved phase correlation matching. In Proc. Image and Signal Processing, pages 128–135, 2008.
[2] H. Andreasson and T. Duckett. Topological localization for mobile robots using omni-directional vision and local features. In Proc. Intelligent Autonomous Vehicles, Jul 2004.
[3] H. Araujo and J. M. Dias. An introduction to the log-polar mapping. In Proc. Second Workshop on Cybernetic Vision, pages 139–144, Dec 1996.
[4] R. C. Archibald. The logarithmic spiral. Amer. Math. Monthly, 25:189–193, 1918.
[5] D. I. Barnea and H. F. Silverman. A class of algorithms for fast digital image registration. IEEE Trans. Computers, 21(2):179–186, Feb 1972.
[6] C. Baumgarten and G. Farin. Approximation of logarithmic spirals. Computer Aided Geometric Design, 14(6):515–532, 1997.
[7] A. Bottcher, Y. I. Karlovich, and V. S. Rabinovich. Emergence, persistence, and disappearance of logarithmic spirals in the spectra of singular integral operators. Integral Equations and Operator Theory, 25(4):406–444, Dec 1996.
[8] L. G. Brown. A survey of image registration techniques. ACM Computing Surveys, 24(4):325–376, Dec 1992.
[9] M. O. Caceres, A. K. Chattah, W.-H. Wang, and Y.-C. Chen. Image registration by control points pairing using the invariant properties of line segments. Pattern Recognition Letters, 18(13):269–281, Mar 1997.
[10] C. Capurro, F. Panerai, and G. Sandini. Vergence and tracking fusing log-polar images. In Proc. Intl. Conf. Pattern Recognition, volume 4, pages 740–744, Aug 1996.
[11] Q. S. Chen, M. Defrise, and F. Deconinck. Symmetric phase-only matched filtering of Fourier-Mellin transforms for image registration and recognition. IEEE Trans. Pattern Analysis and Machine Intelligence, 16(12):1156–1168, Dec 1994.
[12] D. Comaniciu, V. Ramesh, and P. Meer. Kernel-based object tracking. IEEE Trans. Pattern Analysis and Machine Intelligence, 25(5):564–577, May 2003.
[13] M. Comer and E. Delp. Segmentation of textured images using a multiresolution Gaussian autoregressive model. IEEE Trans. Image Processing, 8(3):408–420, Mar 1999.
[14] J. Cook, V. Chandran, and S. Sridharan. Multiscale representation for 3-D face recognition. IEEE Trans. Information Forensics and Security, 2(3):529–536, Sep 2007.
[15] T. Darrell. A radial cumulative similarity transform for robust image correspondence. In Proc. Computer Vision and Pattern Recognition, pages 656–662, Jun 1998.
[16] P. J. Davis, W. Gautschi, and A. Iserles. Spirals: From Theodorus to Chaos. AK Peters, Ltd., Wellesley, Jun 1993.
[17] G. Deschamps and J. Dyson. The logarithmic spiral in a single-aperture multimode antenna system. IEEE Trans. Antennas and Propagation, 19(1):90–96, Jan 1971.
[18] R. M. Dufour, E. L. Miller, and N. P. Galatsanos. Template matching based object recognition with unknown geometric parameters. IEEE Trans. Image Processing, 11(12):1385–1396, Dec 2002.
[19] S. E. Englert, Z. Sheng, and R. L. Kirlin. The evaluation of a crosscorrelation pattern recognition technique for flow visualization. Pattern Recognition, 23(3–4):237–243, 1990.
[20] M. Feng and T. R. Reed. Motion estimation in the 3-D Gabor domain. IEEE Trans. Image Processing, 16(8):2038–2047, Aug 2007.
[21] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, Jun 1981.
[22] M. Frigo and S. G. Johnson. FFTW: An adaptive software architecture for the FFT. In Proc. Int. Conf. Acoustics, Speech and Signal Processing, pages 1381–1384, May 1998.
[23] D. Gonzalez-Jimenez and J. L. Alba-Castro. Shape-driven Gabor jets for face description and authentication. IEEE Trans. Information Forensics and Security, 2(4):769–780, Dec 2007.
[24] A. Goshtasby and C. Shekhar. Point pattern matching using convex hull edges. IEEE Trans. Systems, Man and Cybernetics, 15:631–637, 1985.
[25] A. Gray, E. Abbena, and S. Salamon. Modern Differential Geometry of Curves and Surfaces with Mathematica, Third Edition. Chapman and Hall CRC, 2006.
[26] C. Harris and M. Stephens. A combined corner and edge detection. In Proc. the Fourth Alvey Vision Conference, pages 147–151, 1988.
[27] C. He, Y. F. Zheng, and S. C. Ahalt. Object tracking using the Gabor wavelet transform and the golden section algorithm. IEEE Trans. Multimedia, 4(4):528–538, Dec 2002.
[28] M. Helm. Towards automatic rectification of satellite images using feature based matching. In Proc. Intl. Conf. Geoscience and Remote Sensing Symposium, pages 2439–2442, Jun 1991.
[29] P. Hemenway and A. Ray. Divine Proportion: Φ in Art, Nature, and Science. Sterling Publisher, 2005.
[30] J.-W. Hsieh, H.-Y. M. Liao, K.-C. Fan, M.-T. Ko, and Y.-P. Hung. Image registration using a new edge-based approach. Computer Vision and Image Understanding, 67(2):112–130, 1997.
[31] Y. C. Hsieh, D. M. McKeown, and F. P. Perlant. Performance evaluation of scene registration and stereo matching for cartographic feature extraction. IEEE Trans. Pattern Analysis and Machine Intelligence, 14(2):214–238, Feb 1992.
[32] J. Ilonen, J.-K. Kamarainen, P. Paalanen, M. Hamouz, J. Kittler, and H. Kalviainen. Image feature localization by multiple hypothesis testing of Gabor features. IEEE Trans. Image Processing, 17(3):311–325, Mar 2008.
[33] O. Isik, K. P. Esselle, and Y. Ge. Compact microstrip and CPW duplexers using complementary and conventional logarithmic spiral resonators. In Proc. Antennas and Propagation Society International Symposium, pages 4977–4980, Jun 2007.
[34] F. Jurie. A new log-polar mapping for space variant imaging - application to face detection and tracking. Pattern Recognition, 32(5):865–875, May 1999.
[35] C. D. Kuglin and D. C. Hines. The phase correlation image alignment method. In Proc. Cybernetics and Society, pages 163–165, Sep 1975.
[36] B. V. K. V. Kumar, A. Mahalanobis, and R. D. Juday. Correlation pattern recognition. Cambridge University Press, New York, NY, USA, 2005.
[37] M. Lades, J. C. Vorbruggen, J. Buhmann, J. Lange, C. V. D. Malsburg, R. P. Wurtz, and W. Konen. Distortion invariant object recognition in the dynamic link architecture. IEEE Trans. Computers, 42:300–311, Mar 1993.
[38] D. A. LeMaster. A comparison of template matching registration methods for polarimetric imagery. In Proc. Aerospace Conference, pages 1–9, Mar 2008.
[39] J. P. Lewis. Fast normalized cross-correlation. In Proc. Vision Interface, pages 120–123, May 1995.
[40] H. Liu, B. Guo, and Z. Feng. Pseudo-log-polar Fourier transform for image registration. IEEE Trans. Signal Processing Letters, 13(1):17–20, Jan 2006.
[41] Z. Lou and J.-M. Jin. Modeling and simulation of broad-band antennas using the time-domain finite element method. IEEE Trans. Antennas and Propagation, 53(12):4099–4110, Dec 2005.
[42] Z. Lou and J. M. Jin. Time-domain finite-element modeling and simulation of broadband antennas. In Proc. Antennas and Propagation Society International Symposium, pages 2809–2812, Jul 2006.
[43] D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, Nov 2004.
[44] H. Mahersia and K. Hamrouni. New rotation invariant features for texture classification. In Proc. Intl. Conf. Computer and Communication Engineering, pages 687–690, May 2008.
[45] J. B. A. Maintz and M. A. Viergever. A survey of medical image registration. Medical Image Analysis, 2(1):1–36, Mar 1998.
[46] B. S. Manjunath and C. Shekhar. A new approach to image feature detection with applications. Pattern Recognition, 29(4):627–640, Apr 1996.
[47] R. Matungka, Y. F. Zheng, and R. L. Ewing. 2D invariant object recognition using log-polar transform. In Proc. World Congress on Intelligent Control and Automation, pages 223–228, Jun 2008.
[48] R. Matungka, Y. F. Zheng, and R. L. Ewing. Image registration using adaptive polar transform. In Proc. Int. Conf. Image Processing, pages 2416–2419, Oct 2008.
[49] R. Matungka, Y. F. Zheng, and R. L. Ewing. Object recognition using log-polar wavelet mapping. In Proc. Intl. Conf. Tools with Artificial Intelligence, volume 2, pages 559–563, Nov 2008.
[50] K. Mikolajczyk and C. Schmid. Scale and affine invariant interest point detectors. International Journal of Computer Vision, 60(1):63–86, Oct 2004.
[51] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. IEEE Trans. Pattern Analysis and Machine Intelligence, 27(10):1615–1630, Oct 2005.
[52] S. Moss and E. R. Hancock. Image registration with shape mixtures. In Proc. Intl. Conf. Image Analysis and Processing, pages 172–179, London, UK, 1997. Springer-Verlag.
[53] S. Moss and E. R. Hancock. Multiple line-template matching with the EM algorithm. Pattern Recognition Letters, 18(11–13):1283–1292, Nov 1997.
[54] R. M. Mutelo, W. L. Woo, and S. S. Dlay. Discriminant analysis of the two-dimensional Gabor features for face recognition. Computer Vision, IET, 2(2):37–49, Jun 2008.
[55] A. N. Myrna, M. G. Venkateshmurthy, and C. G. Patil. Detection of region duplication forgery in digital images using wavelets and log-polar mapping. In Proc. Intl. Conf. Computational Intelligence and Multimedia Applications, volume 3, pages 371–377, Dec 2007.
[56] K. Noe, J. Mosegaard, K. Tanderup, and T. S. Sørensen. A framework for shape matching in deformable image registration. Studies in Health Technology and Informatics, 132:333–335, Sep 2008.
[57] J. J. K. O Ruanaidh and T. Pun. Rotation, scale and translation invariant digital image watermarking. In Proc. Intl. Conf. Image Processing, pages 536–539, Oct 1997.
[58] G. A. Papakostas, Y. S. Boutalis, D. A. Karras, and B. G. Mertzios. Fast numerically stable computation of orthogonal Fourier Mellin moments. Computer Vision, IET, 1(1):11–16, Mar 2007.
[59] T. W. Parks and C. S. Burrus. Digital Filter Design. John Wiley and Sons, 1987.
[60] C. M. Pun and M. C. Lee. Log-polar wavelet energy signatures for rotation and scale invariant texture classification. IEEE Trans. Pattern Analysis and Machine Intelligence, 25(5):590–603, May 2003.
[61] S. P. Raman and U. B. Desai. 2D object recognition using Fourier Mellin transform and a MLP network. In Proc. Intl. Conference on Neural Networks, volume 4, pages 2154–2156, Dec 1995.
[62] M. G. Ramos and S. S. Hemami. Eigenfeatures coding of videoconferencing sequences. In Proc. Visual Communications and Image Processing, pages 100–110, Feb 1996.
[63] H. S. Ranganath and S. G. Shiva. Correlation of adjacent pixels for multiple image registration. IEEE Trans. Computers, 34(7):674–677, 1985.
[64] B. S. Reddy and B. N. Chatterji. An FFT-based technique for translation, rotation, and scale-invariant image registration. IEEE Trans. Image Processing, 5(8):1266–1271, Aug 1996.
[65] A. Said and W. A. Pearlman. An image multiresolution representation for lossless and lossy compression. IEEE Trans. Image Processing, 5(6):1303–1310, Sep 1996.
[66] H. Schmidt. Ausgewählte höhere Kurven. Kesselringsche Verlagsbuchhandlung, Wiesbaden, Germany, 1949.
[67] M. Sester, H. Hild, and D. Fritsch. Definition of ground-control features for image registration using GIS data. In Proc. Object Recognition and Scene Classification from Multispectral and Multisensor Pixels, pages 537–543, 1998.
[68] S. Y. E. Sharouni, H. B. Kal, and J. J. Battermann. Accelerated regrowth of non-small-cell lung tumours after induction chemotherapy. British Journal of Cancer, 89(12):2184–2189, Dec 2003.
[69] A. Y. Sheng, C. Lejeune, and H. H. Arsenault. Frequency-domain Fourier-Mellin descriptors for invariant pattern recognition. Optical Engineering, 27(5):354–357, May 1988.
[70] R. Song and J. Szymanski. Auto-sorting scheme for image ordering applications in image mosaicing. Electronics Letters, 44(13):798–799, Jun 2008.
[71] A. Spinei, D. Pellerin, D. Fernandes, and J. Herault. Fast hardware implementation of Gabor filter based motion estimation. Integrated Computer-Aided Engineering, 7(1):67–77, Jan 2000.
[72] G. C. Stockman, S. Kopstein, and S. Benett. Matching images to models for registration and object detection via clustering. IEEE Trans. Pattern Analysis and Machine Intelligence, 4(3):229–241, May 1982.
[73] H. S. Stone, B. Tao, and M. McGuire. Analysis of image registration noise due to rotationally dependent aliasing. Journal of Visual Communication and Image Representation, 14(2):114–135, Jun 2003.
[74] M. Tistarelli and G. Sandini. On the advantages of polar and log-polar mapping for direct estimation of time-to-impact from optical flow. IEEE Trans. Pattern Analysis and Machine Intelligence, 15(4):401–410, Apr 1993.
[75] H. Tomioka, M. Suhara, and T. Okumura. Analysis of edge effect on millimeter-sized planar logarithmic spiral antennas. In Proc. Asia-Pacific Microwave Conference, volume 5, pages 3–5, Dec 2005.
[76] V. J. Traver, A. Bernardino, P. Moreno, and J. S. Victor. Appearance-based object detection in space-variant images: a multi-model approach. In Proc. Intl. Conf. Image Analysis and Recognition, pages 538–546, Sep 2004.
[77] V. J. Traver and F. Pla. Dealing with 2D translation estimation in log-polar imagery. Image and Vision Computing, 21(2):145–160, Feb 2003.
[78] V. J. Traver and F. Pla. Similarity motion estimation and active tracking through spatial-domain projections on log-polar images. Computer Vision and Image Understanding, 97(2):209–241, 2005.
[79] V. J. Traver and F. Pla. Log-polar mapping template design: from task-level requirements to geometry parameters. Image and Vision Computing, 26(10):1354–1370, 2008.
[80] T. VanCourt, Y. Gu, and M. C. Herbordt. Three-dimensional template correlation: object recognition in 3D voxel data. In Proc. Computer Architecture for Machine Perception, pages 153–158, Jul 2005.
[81] T. Vincent and R. Laganiere. Matching feature points in stereo pairs: a comparative study of some matching strategies. Machine Graphics and Vision, 10(3):237–260, 2001.
[82] T. Vincent and R. Laganiere. Matching feature points for telerobotics. In Proc. Haptic Virtual Environments and Their Applications, pages 13–18, 2002.
[83] S. Wisetphanichkij and K. Dejhan. Multi-resolution image registration based on correlation technique. In Proc. Intl. Geoscience and Remote Sensing Symposium, volume 6, pages 3871–3875, Jul 2005.
[84] W. K. Wong, C. W. Choo, C. K. Loo, and J. P. Teh. FPGA implementation of log-polar mapping. In Proc. Intl. Conf. on Mechatronics and Machine Vision in Practice, pages 45–50, Dec 2008.
[85] J. Wu, G. An, and Q. Ruan. Independent Gabor analysis of discriminant features fusion for face recognition. IEEE Trans. Signal Processing Letters, 16(2):97–100, Feb 2009.
[86] X. Wu and B. Bhanu. Gabor wavelet representation for 3-D object recognition. IEEE Trans. Image Processing, 6(1):47–64, Jan 1997.
[87] B. Xiong, C. Zhu, C. Charoansak, and W. Yu. An FPGA prototyping of Gabor-wavelets transform for motion detection. International Journal of Computer Sciences and Engineering Systems, 1(1):65–70, Jan 2007.
[88] F. Yang and M. Paindavoine. Fast motion estimation based on spatio-temporal Gabor filters: parallel implementation on multi-DSP. In Proc. SPIE: Advanced Signal Processing Algorithms, Architectures, and Implementations, volume 4116, pages 454–462, Aug 2000.
[89] D. Young. Straight lines and circles in the log-polar image. In Proc. British Machine Vision Conference, pages 426–435, Sep 2000.
[90] F. Yuan, H. Zhang, and R. Jia. Digital image stabilization based on log-polar transform. In Proc. Intl. Conf. Image and Graphics, pages 769–773, Aug 2007.
[91] Z. Zhang, R. Deriche, O. D. Faugeras, and Q. T. Luong. A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry. Artificial Intelligence, 78(1–2):87–119, 1995.
[92] Q. Zheng and R. Chellappa. A computational vision approach to image registration. IEEE Trans. Image Processing, 2(3):311–326, Jul 1993.
[93] B. Zitova and J. Flusser. Image registration methods: a survey. Image and Vision Computing, 21(11):977–1000, Oct 2003.
[94] S. Zokai and G. Wolberg. Image registration using log-polar mappings for recovery of large-scale similarity and projective transformations. IEEE Trans. Image Processing, 14(10):1422–1434, Oct 2005.