
STUDIES ON LOG-POLAR TRANSFORM FOR IMAGE

REGISTRATION AND IMPROVEMENTS USING

ADAPTIVE SAMPLING AND LOGARITHMIC SPIRAL

DISSERTATION

Presented in Partial Fulfillment of the Requirements for

the Degree Doctor of Philosophy in the

Graduate School of The Ohio State University

By

Rittavee Matungka, M.S.E., M.A., B.S.

* * * * *

The Ohio State University

2009

Dissertation Committee:

Professor Yuan F. Zheng, Adviser

Professor Bradley Clymer

Professor Ashok Krishnamurthy

Approved by

Adviser

Graduate Program in Electrical and Computer

Engineering

© Copyright by

Rittavee Matungka

2009

ABSTRACT

Image registration is an essential step in many image processing applications that

need visual information from multiple images for comparison, integration or analysis.

Recently, researchers have introduced image processing techniques using the log-polar

transform (LPT) for its rotation and scale invariant properties. However, limitations

still exist when LPT is applied to image registration applications. The major thesis of

this dissertation is to provide an in-depth analysis of LPT, its advantages, and its problems.

We introduce new formulations that are derived to overcome the limitations of LPT

and thus extend its horizon in practice. Novel techniques are proposed to address

the limitations.

The first extension applies LPT to 2D object recognition.

Motivated by the observation that LPT is sensitive to translation,

which leads to high computation in the search process, we integrate a combination

of feature extraction and feature point screening into the recognition system.

With the scale and rotation invariant properties of LPT and a low-computation

feature extraction and screening approach that further reduces the number of feature

points, a 2D object recognition system is presented.

The second limitation of LPT addressed in this dissertation is nonuniform

sampling and its effects. LPT suffers from nonuniform sampling, which makes it

unsuitable for applications in which the registered images are altered or occluded.


Inspired by this fact, we present a new registration algorithm that addresses the

problems of the conventional LPT while maintaining the robustness to scale and rotation.

We introduce a novel Adaptive Polar Transform (APT) technique that evenly

and effectively samples the image in the Cartesian coordinates. Combining APT with

an innovative projection transform along with a matching mechanism, the proposed

method yields less computational load and more accurate registration than that of

the conventional LPT. Translation between the registered images is recovered with

the new search scheme using Gabor feature extraction to accelerate the localization

procedure. Moreover, an image comparison scheme is proposed for locating the area

where the image pairs differ. Experiments on real images demonstrate the effective-

ness and robustness of the proposed approach for registering images that are subjected

to occlusion and alteration in addition to scale, rotation and translation.

Lastly, this dissertation observes that the accuracy of the LPT approach is limited

by the number of samples used in the mapping process. Since obtaining the scale and

rotation parameters involves a 2D correlation either in the spatial domain or

in the frequency domain, the computational complexity of the matching procedure

grows rapidly as the number of samples increases. Motivated by this limitation,

we propose a novel pre-shifted logarithmic spiral (PSLS) approach that is robust

to translation, scale, and rotation and requires lower computational cost. By pre-shifting

the sampling points in the angular direction by $\pi/n_\theta$ radians, the total number

of samples in the angular direction can be halved. This yields a great reduction in

computational load in the matching process. Experimental results demonstrate the

effectiveness and robustness of the proposed approach.


This is dedicated to my parents


ACKNOWLEDGMENTS

Firstly, I would like to express my deep gratitude and appreciation to my advisor,

Dr. Yuan F. Zheng, for the guidance, encouragement and support he has given me

throughout my Ph.D. study at The Ohio State University. He taught me how to think

outside the box, approach the research problems and express my ideas. He was always

there to encourage me whenever I lost hope and confidence, to give me advice and

guidance whenever I felt my work had hit a brick wall, and to inspire

me that with hard work and persistence, I can accomplish any goal. Without him,

this dissertation would not have been completed.

Special thanks to Dr. Bradley Clymer and Dr. Ashok Krishnamurthy for serving

on my committee. Their technical feedback and constructive comments made this

dissertation better than it otherwise would have been.

It has been my pleasure to work with my colleagues at The Ohio State University.

Mr. Junda Zhu, Mr. Liang Yuan, Ms. Yen-Lun Chen, Mr. Yuanwei Lao, Mr.

Gaotham Tamma, and all my colleagues at the Multimedia and Robotics Laboratory,

thank you for your continual help and warm friendship. I also would like to thank

Mr. Juan Torres, Mr. Zakaria Chehab, and Ms. Sripattamma Tawasay for their love,

support, and encouragement. It has been a wonderful journey because of them.


I cannot end without thanking my family, Mrs. Aprirada Matungka, Mr. Adiz

Rittikal, Mr. Vorrayuth Matungka, Mrs. Sunee Matungka and Ms. Nattarudee

Matungka, on whose constant encouragement and love I have relied through all the

highs and lows. It is to them that I dedicate this work.


VITA

August 20, 1980 . . . . . Born - Bangkok, Thailand

2000 . . . . . B.S. Electrical Engineering, Chulalongkorn University, Thailand

2001 . . . . . M.A. Business and Managerial Economics, Chulalongkorn University, Thailand

2003 . . . . . M.S.E. Telecommunications and Networking Engineering, University of Pennsylvania, Pennsylvania

PUBLICATIONS

R. Matungka and Y. F. Zheng, “Scale and rotation invariant image registration using pre-shifted logarithmic spiral,” in Proc. International Conference on Image Processing, 2009 (Paper submitted)

R. Matungka and Y. F. Zheng, “Aerial image registration using projective polar transform,” in Proc. International Conference on Acoustics, Speech, and Signal Processing, 2009 (Paper accepted: to be published in 2009)

R. Matungka and Y. F. Zheng, “Image registration using adaptive polar transform,” IEEE Trans. Image Processing, 2009 (Journal paper is conditionally accepted)

R. Matungka, Y. F. Zheng, and R. L. Ewing, “Object recognition using log-polar wavelet mapping,” in Proc. Tools with Artificial Intelligence, Nov 2008, pp. 559–563.

R. Matungka, Y. F. Zheng, and R. L. Ewing, “Image registration using adaptive polar transform,” in Proc. International Conference on Image Processing, Oct 2008, pp. 2416–2419.

R. Matungka, Y. F. Zheng, and R. L. Ewing, “2D invariant object recognition using log-polar transform,” in Proc. World Congress on Intelligent Control and Automation, Jun 2008, pp. 223–228.

FIELDS OF STUDY

Major Field: Electrical and Computer Engineering

Studies in:

Communications and Signal Processing

Computer Engineering


TABLE OF CONTENTS

Abstract

Dedication

Acknowledgments

Vita

List of Tables

List of Figures

Chapters:

1. INTRODUCTION

1.1 Log-Polar Transform

1.2 Related Works

1.2.1 Correlation approaches

1.2.2 Phase correlation

1.2.3 Fourier-Mellin

1.2.4 Feature-based approaches

1.2.5 Multiresolution framework

2. 2D INVARIANT OBJECT RECOGNITION USING LOG-POLAR TRANSFORM

2.1 Log-Polar Transform

2.2 The New LPT-Based Object Recognition Approach

2.2.1 Establish the model

2.2.2 Global search

2.3 Experimental Results

2.3.1 The robustness

2.3.2 The accuracy

2.4 Extended Works

2.5 Summary

3. IMAGE REGISTRATION USING ADAPTIVE POLAR TRANSFORM

3.1 Adaptive Polar Transform Approach

3.1.1 Motivation of the proposed approach

3.1.2 The proposed APT approach

3.1.3 Projection transform

3.2 The New Image Registration Approach Using APT

3.2.1 Localization

3.2.2 APT matching

3.2.3 Image comparison

3.2.4 Complete algorithm

3.3 Experimental Results

3.3.1 Image registration in general conditions

3.3.2 Registration of images that are subjected to occlusion and alteration

3.3.3 Registration results and errors

3.4 Extended Works

3.5 Summary

4. SCALE AND ROTATION INVARIANT IMAGE REGISTRATION USING PRE-SHIFTED LOGARITHMIC SPIRAL

4.1 The Proposed Logarithmic Spiral Approach

4.1.1 Logarithmic spiral mapping

4.1.2 LS matching

4.1.3 Pre-shifted logarithmic spiral

4.2 The New Image Registration Approach

4.2.1 Feature point based search scheme

4.2.2 PSLS matching

4.2.3 Complete algorithm

4.3 Experimental Results

4.3.1 Registration accuracy

4.3.2 Computational cost

4.4 Summary

5. CONCLUSION AND FUTURE WORKS

Appendices:

A. Gabor Wavelet Feature Extraction

Bibliography

LIST OF TABLES

Table

2.1 Recognition results

3.1 The image registration error compared to the ground truth in general conditions (lower is better)

3.2 The image registration error compared to the ground truth when images are altered or occluded (lower is better)

LIST OF FIGURES

Figure

1.1 LPT mapping: (a) LPT sampling in the Cartesian Coordinates, (b) the resulting sample distribution in the angular and log-radius directions. Note that in (a), the distance between consecutive sampling points in the radius direction increases exponentially from center to the furthest circumference due to the logarithm sampling

1.2 (a) The original image Lena, (b) the scaled and rotated image of (a), (c) the LPT transformed image of (a), and (d) the LPT transformed image of (b)

2.1 (a) Original image, (b) the LPT of (a), (c) scaled and rotated image of (a), (d) LPT of (c)

2.2 A screening mask (dimension is in pixel)

2.3 (a) feature points extracted from the original image, (b) the red dot indicates the feature point selected as the original point for creating the model and the screening vector, (c) feature points extracted from the scaled and rotated image, (d) feature points remained after screening

2.4 (a) and (c) Feature points extraction of the House and blox images, (b) and (d) 8 feature points are randomly selected from each image

2.5 18 candidates from scaled and rotated the House image

2.6 18 candidates from scaled and rotated the blox image

2.7 A demonstration of the performance in matching feature points in the object that is rotated with various orientations. (a) proposed approach (b) VNC (c) ASD

2.8 A demonstration of the performance in screening feature points using feature number 3 of the blox image. (a) proposed approach (b) VNC (c) ASD

2.9 A demonstration of the performance in screening feature points using feature number 3 of the house image. (a) proposed approach (b) VNC (c) ASD

2.10 A demonstration of the performance in screening feature points in the noisy environment using feature number 3 of the blox image. (a) proposed approach (b) VNC (c) ASD

2.11 A demonstration of the performance in screening feature points in the noisy environment using feature number 3 of the house image. (a) proposed approach (b) VNC (c) ASD

2.12 Reference images and selected object (a) US coins (b) BMW (c) Aircraft

2.13 Experimental results in the target image: (from left to right) feature point extraction, screening, recognizing object

2.14 Sets of tested images, UScoins

2.15 Sets of tested images, UScoins in noisy environment

2.16 Sets of tested images, BMW

2.17 Sets of tested images, Aircraft

2.18 MR-LPT mapping

2.19 MR-LPT of the quarter coin in US coin image: (a) 16×16, (b) 32×32, and (c) 64×64

2.20 Feature point reduction from MR-LPT (red dot represents the center point of LPT)

3.1 Example of inaccurate registration results using the conventional LPT due to occlusion in the target image: (a) the model image patch cropped from Lena image is registered to the non-occluded target image on the upper-right and occluded target image on the lower-right, where the red rectangles indicate the registration results, (b) the correction coefficients when registering to the non-occluded target image, and (c) the correction coefficients when registering to the occluded target image

3.2 APT mapping: (a) adaptive sampling in the spatial domain with more samples in the angular direction as the radius increases, (b) the resulting sample distribution in the angular and radius directions. Note that in (a), the distance between consecutive sampling points in the radius direction remains the same for all radius circumferences and so does the angular direction

3.3 Examples of APT: (a) the original Lena image, (b) the APT transformed image of (a), (c) the scaled and rotated image of (a), and (d) the APT transformed image of (c)

3.4 The effects of the changes in scale and rotation in the Cartesian coordinates to the projections ℜ and Θ: (a) the scale change in the Cartesian (scale factor of 1.2 is applied to Lena image) appears as variable-scale in the projection ℜ, while the projection Θ becomes slightly altered, (b) the rotation in the Cartesian (rotation parameter of 45 degrees is applied to Lena image) appears as shifting in the projection Θ, while there is no change in the projection ℜ, and (c) the changes in both scale and rotation in the Cartesian (scale and rotation parameters of 1.2 and 45 degrees, respectively, are applied to Lena image) appear as variable-scale in projection ℜ and shifting in projection Θ, respectively

3.5 Examples of APT of images of syrup bottle: (a) the reference image with the model image selected as the area inside the red circle, (b) the APT transformed image of (a), (c) the target image, and (d) the APT transformed image of the area inside the red circle of the target image

3.6 The projections of the syrup bottle images: (a) comparison of projections ℜ of the syrup bottle from the reference image and the target image, and (b) comparison of projections Θ of the syrup bottle from the reference image and the target image. The differences between projections of the two images are more pronounced, which need additional techniques to resolve

3.7 Feature point extraction and localization procedure

3.8 Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of squirrel: (a) the reference image where the entire area is used as the model image, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result)

3.9 Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of statue of liberty: (a) the reference image where the entire area is used as the model image, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result)

3.10 Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of bird: (a) the reference image where the entire area is used as the model image, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result)

3.11 Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of speed limit sign: (a) the reference image with the model image selected as the area inside the red rectangle, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result)

3.12 Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of syrup bottle: (a) the reference image with the model image selected as the area inside the red rectangle, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result)

3.13 Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of Dreese Building: (a) the reference image where the entire area is used as the model image, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result and the green area indicates the altered area)

3.14 Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of flower: (a) the reference image where the entire area is used as the model image, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result and the green area indicates the altered area)

3.15 Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of cereal box: (a) the reference image where the entire area is used as the model image, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result and the green area indicates the altered area)

3.16 Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of human brain: (a) the reference image with the model image selected as the area inside the red rectangle, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result and the green area indicates the altered area)

3.17 Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of human lung: (a) the reference image with the model image selected as the area inside the red rectangle, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result and the green area indicates the altered area)

3.18 Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of OSU tower: (a) the reference image with the model image selected as the area inside the red rectangle, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result and the green area indicates the altered area)

3.19 Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of stop sign: (a) the reference image with the model image selected as the area inside the red rectangle, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (red rectangle indicates the registration result and the green area indicates the altered area)

3.20 PPT map (a) Distance parameters for image of size 10×10 pixels and $(x_c, y_c) = (7, 7)$, (b) Graphic demonstration of PPT map for image with size 513×513 pixels and $(x_c, y_c) = (256, 256)$

3.21 Example of PPT: (a) the original Lena image, (b) the PPT transformed image of (a)

4.1 LS mapping with $n_r = 8$ and $n_\theta = 16$: (a) CW direction, (b) CCW direction, and (c) PSLS in the CCW direction

4.2 Comparison of the sampling points distribution between LPT and LS mappings: (a) original Target logo image, (b) Target logo image with scale parameter 1.3, and (c) comparison the locations of the sampling points from the center

4.3 LS transformed vectors of barbara images, where $n_r = n_\theta = 16$ and $R_{max} = 64$: (a) from original barbara image (this transformed vector is presented in (b)-(d) as broken blue line for comparison), (b) from barbara image with 90 degrees rotation, (c) from barbara image with scale parameter of 2, and (d) from barbara image with scale and rotation parameters of 2 and 90 degrees, respectively

4.4 Comparison of sampling distributions: (a) LPT and LS mappings and (b) LPT and PSLS mappings

4.5 Comparison of percentage of oversampling between LPT and PSLS mappings: (a) images with the size 64×64, and (b) images with the size 128×128

4.6 Image registration results using the proposed PSLS approach: (a) the reference images where the entire area are used as the model images, (b) target images and their features, and (c) registration results (red rectangular indicates the registration result)

4.7 Experimental results: (a) PSLS rotation test results, (b) PSLS scale test results, (c) LPT rotation test results, and (d) LPT scale test results

4.8 Computational benefit using PSLS approach

A.1 Performance evaluations of proposed Gabor features, SIFT [43], and Harris-Laplacian [50], (a) Orientation change, (b) scale change, (c) noise added

CHAPTER 1

INTRODUCTION

Image registration is the process of aligning two images that share common visual

information, such as images of the same object or images of the same scene taken from

different geometric viewpoints, at different times, or by different image sensors. Image

registration is an essential step in many image processing applications that involve

multiple images for comparison, integration, or analysis, such as image fusion, image

mosaics, image or scene change detection, and medical imaging. The main objective

of image registration is to find the geometric transformation of the model image, $I_M$,

in the target image, $I_T$, where $I_T(x, y) = T\{I_M(x', y')\}$ and $T$ is a two-dimensional

geometric transformation that associates the $(x', y')$ coordinates in $I_M$ with the $(x, y)$

coordinates in $I_T$. These two-dimensional geometric transformations include scale,

rotation, and translation in the Cartesian coordinates. The model image is often

presented in the form of an image patch that is cropped from the reference image. When

the entire reference image is to be registered to the target image,

the reference image is considered the model image.

Many image registration methods have been proposed in the past 20 years [5, 8,

35, 39, 45, 64, 93, 94], which can be categorized into two major groups: the feature-

based approach and the area-based approach. The feature-based approach uses only


the correspondence between the features in the two images for registration. The

features include region features such as buildings [24], water content [28], forests [67],

and color gradient; line features such as edges [30, 31], line segments [9, 53], and geometric

shapes [52, 56]; and feature points such as line or edge intersections [72] and Gabor features

[27, 46, 92]. Recently, Lowe [43] introduced the so-called SIFT method, which can

extract image features that are invariant to illumination change, scale, and rotation.

Another well-known method, which uses a combination of Harris corner detection [26] and

the Laplacian of Gaussian to extract features that are invariant to scale and rotation, is

proposed in [50]. Since only the features are involved in the registration, the feature-

based approach has advantages in registering images that are subjected to alteration

or occlusion. However, the use of the feature-based approach is recommended only

when the images contain enough distinctive features [93]. As a result, for some

applications such as medical imaging, in which the images are not rich in detail

and features are difficult to distinguish from one another, the feature-based

approach may not perform effectively. This problem can be overcome by the area-

based approach.

The most common area-based approach is the normalized cross-correlation [18, 39]. Another

correlation-based technique, which is more robust to noise and changes in the

image intensity than the cross-correlation technique, is the phase correlation [35], in

which the normalized cross-power spectrum between the two images is computed in

the frequency domain. Although the correlation approaches show successful results

in registering images that differ by a translation in the Cartesian coordinates, they both fail in the

case where there are changes in scale or rotation between the two images. Combining


the phase correlation technique with the log-polar transform (LPT), the Fourier-Mellin

[64, 69] approach is proposed as a breakthrough area-based method that is

invariant to translation, scale, and rotation. However, recent studies [73, 94]

indicate that the Fourier-Mellin method is able to recover only a fair amount

of rotation and scale. Moreover, the Fourier transform introduces a border effect,

in which rotation and scale affect the aliasing of the image. Another work in image

registration that uses the advantage of LPT is proposed in [94]. Unlike the Fourier-

Mellin approach in which matching and localization procedures are performed in the

frequency domain, the registration method proposed by Zokai et al. is performed

entirely in the spatial domain. The translation parameter is recovered by using the

coarse-to-fine multiresolution framework, while the scale and rotation parameters are

obtained by matching the log-polar transformed images using the cross-correlation

function.
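As an illustration of the phase correlation idea described above, the following minimal sketch estimates a pure translation from the normalized cross-power spectrum. It assumes NumPy, grayscale images of equal size, and a circular (wrap-around) integer shift; the function name `phase_correlation_shift` is hypothetical and is not taken from [35].

```python
import numpy as np


def phase_correlation_shift(reference, target):
    """Estimate the integer (row, col) shift d such that
    target == np.roll(reference, d, axis=(0, 1)).

    Illustrative sketch only: assumes equal-size 2D arrays and a circular shift.
    """
    f_ref = np.fft.fft2(reference)
    f_tgt = np.fft.fft2(target)

    # Normalized cross-power spectrum: keep only the phase difference.
    cross_power = f_tgt * np.conj(f_ref)
    cross_power /= np.maximum(np.abs(cross_power), 1e-12)

    # The inverse transform is (ideally) an impulse located at the shift.
    correlation = np.real(np.fft.ifft2(cross_power))
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)

    # Map peak indices in [0, n) to signed shifts.
    return tuple(
        int(p) if p <= n // 2 else int(p) - n
        for p, n in zip(peak, correlation.shape)
    )


# Usage: shift a random image circularly and recover the displacement.
rng = np.random.default_rng(0)
reference_image = rng.random((64, 64))
target_image = np.roll(reference_image, shift=(5, -7), axis=(0, 1))
estimated_shift = phase_correlation_shift(reference_image, target_image)
```

Because only the phase of the spectrum is kept, a uniform change in image intensity does not move the correlation peak. A scale or rotation change, however, is not a shift in the Cartesian coordinates, so this sketch alone cannot recover it.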

1.1 Log-Polar Transform

The log-polar transform (LPT) [3, 10, 34, 44, 47–49, 55, 57, 60, 74, 76–79, 84, 89, 90, 94]

is a well-known tool for image processing because of its rotation and scale invariant properties.

LPT is a nonlinear and nonuniform sampling method used to convert an image

from the Cartesian coordinates $I(x, y)$ to the log-polar coordinates $I_{LP}(\rho, \theta)$. The

mathematical expression of the mapping procedure is shown below

\[ \rho = \log_{\mathrm{base}} \sqrt{(x - x_c)^2 + (y - y_c)^2} \tag{1.1} \]

\[ \theta = \tan^{-1}\left(\frac{y - y_c}{x - x_c}\right) \tag{1.2} \]

where $(x_c, y_c)$ is the center pixel of the transformation in the Cartesian coordinates,

$(x, y)$ denotes the sampling pixel in the Cartesian coordinates, and $(\rho, \theta)$ denotes

the log-radius and the angular position in the log-polar coordinates. For the sake

of simplicity, we assume the natural logarithm is used in this dissertation. Sampling

is achieved by mapping image pixels in the Cartesian to the log-polar coordinates

according to Eqs. (1.1) and (1.2). Fig. 1.1 shows an example of the sampling points

for an image in the Cartesian coordinates and the transformed result. As shown in Fig.

1.1(a), the distance between two consecutive sampling points in the radius direction

increases exponentially from the center to the furthest circumference. In the angular

direction, for each radius, the circumference is sampled with the same number of

samples. Hence, image pixels close to the center are oversampled while image pixels

further away from the center are undersampled or missed.

The advantage of using log-polar over the Cartesian coordinate representation is

that any rotation and scale in the Cartesian coordinates is represented as shifting in

the angular and the log-radius directions in the log-polar coordinates, respectively.

Given g(x′, y′), a scaled and rotated image of f(x, y) with scale and rotation parameters a

and α degrees, respectively, we have:

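\[ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} a\cos\alpha & -a\sin\alpha \\ a\sin\alpha & a\cos\alpha \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}, \tag{1.3} \]

\[ x' = ax\cos\alpha - ay\sin\alpha, \qquad y' = ax\sin\alpha + ay\cos\alpha. \tag{1.4} \]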

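In log-polar coordinates, f(ρ, θ) → g(ρ′, θ′), and writing x = r cos θ and y = r sin θ, we have:

\[
\begin{aligned}
\rho' &= \log_{base}\sqrt{(x')^2 + (y')^2} \\
      &= \log_{base}\sqrt{(ax\cos\alpha - ay\sin\alpha)^2 + (ax\sin\alpha + ay\cos\alpha)^2} \\
      &= \log_{base}\sqrt{(ar\cos\theta\cos\alpha - ar\sin\theta\sin\alpha)^2 + (ar\cos\theta\sin\alpha + ar\sin\theta\cos\alpha)^2} \\
      &= \log_{base}\sqrt{(ar\cos(\theta+\alpha))^2 + (ar\sin(\theta+\alpha))^2} = \log_{base}\sqrt{a^2r^2} = \rho + \log_{base}(a)
\end{aligned}
\]

and

\[
\begin{aligned}
\theta' &= \tan^{-1}\left(\frac{y'}{x'}\right) = \tan^{-1}\left(\frac{ax\sin\alpha + ay\cos\alpha}{ax\cos\alpha - ay\sin\alpha}\right) = \tan^{-1}\left(\frac{ar\cos\theta\sin\alpha + ar\sin\theta\cos\alpha}{ar\cos\theta\cos\alpha - ar\sin\theta\sin\alpha}\right) \\
        &= \tan^{-1}\left(\frac{ar\sin(\theta+\alpha)}{ar\cos(\theta+\alpha)}\right) = \theta + \alpha.
\end{aligned}
\]

Figure 1.1: LPT mapping: (a) LPT sampling in the Cartesian coordinates, (b) the resulting sample distribution in the angular and log-radius directions. Note that in (a), the distance between consecutive sampling points in the radial direction increases exponentially from the center to the farthest circumference due to the logarithmic sampling.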


The derivation above shows that any rotation and scale in the Cartesian coordinates is represented as shifting in

the angular and the log-radius directions in the log-polar coordinates, respectively, as

shown in Fig. 1.2. Fig. 1.2(a) is the original image and Fig. 1.2(b) is the scaled and

rotated version of the original image. Figs. 1.2(c) and 1.2(d) are the LPT images

of Figs. 1.2(a) and 1.2(b), respectively. The column of the log-polar coordinates

represents the angular direction while the row represents the log-radius. We can see

that rotation and scale in the Cartesian coordinates are represented as shifting in the

log-polar coordinates.
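The shift property can also be checked numerically. The following is a minimal sketch in Python, assuming the natural logarithm and a transformation center at the origin; the helper names `log_polar` and `scale_rotate` are illustrative only and do not come from the original work.

```python
import math


def log_polar(x, y, xc=0.0, yc=0.0):
    """Map a Cartesian point to (rho, theta) following Eqs. (1.1)-(1.2).

    Assumption: natural logarithm; atan2 is used as the quadrant-aware
    form of the inverse tangent.
    """
    dx, dy = x - xc, y - yc
    rho = math.log(math.hypot(dx, dy))
    theta = math.atan2(dy, dx)
    return rho, theta


def scale_rotate(x, y, a, alpha):
    """Apply Eq. (1.3): scale by a and rotate by alpha (radians) about the origin."""
    xr = a * x * math.cos(alpha) - a * y * math.sin(alpha)
    yr = a * x * math.sin(alpha) + a * y * math.cos(alpha)
    return xr, yr


# Example values chosen for illustration only.
a, alpha = 1.5, math.radians(30.0)
x, y = 3.0, 4.0

rho, theta = log_polar(x, y)
rho2, theta2 = log_polar(*scale_rotate(x, y, a, alpha))

d_rho = rho2 - rho        # shift along the log-radius axis
d_theta = theta2 - theta  # shift along the angular axis
```

Here `d_rho` agrees with log(a) and `d_theta` agrees with α up to floating-point error, as the derivation of ρ′ and θ′ predicts. For angles that cross ±180 degrees the angular difference would need to be wrapped modulo 2π.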

Figure 1.2: (a) The original image Lena, (b) the scaled and rotated image of (a), (c) the LPT transformed image of (a), and (d) the LPT transformed image of (b)

1.2 Related Works

1.2.1 Correlation approaches

Correlation between two images is a statistical approach that has been widely used

as a key component for many template matching based image processing applications

such as pattern recognition [19,36], image registration [63,83], and object recognition

[80]. The most commonly used image correlation technique is the cross-correlation.

The technique is based on the squared Euclidean distance measurement. To compute

the cross-correlation coefficient, C(M, I, u, v), between a template or an object model

M and a query image or a target image I, the template is shifted to every position in

the image and the similarity of the grey levels between the two images is computed:

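\[ C(M, I, u, v) = \frac{\sum_x \sum_y \left[M(x, y) - \bar{M}\right]\left[I(x - u, y - v) - \bar{I}\right]}{\sigma_M \sigma_I}, \tag{1.5} \]

\[ \sigma_M = \sqrt{\sum_M \left[M(x, y) - \bar{M}\right]^2} \tag{1.6} \]

\[ \sigma_I = \sqrt{\sum_I \left[I(x, y) - \bar{I}\right]^2} \tag{1.7} \]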

where \bar{M} is the average intensity of the model and \bar{I} is the average intensity of the

target image at the area of interest. Although the cross-correlation technique itself is

not a template matching technique, the cross-correlation coefficient will have its peak

at the point (u′, v′) where the template matches the image the most. Therefore, we

can use this property to recover the translation of the object between the model and

the target image.
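To illustrate how the peak of the correlation coefficient recovers a translation, the sketch below evaluates a normalized cross-correlation at every valid offset with NumPy. It is a brute-force sketch under stated assumptions, not the implementation used in this work: the function name `ncc_map` is hypothetical, the window is indexed as I(x + u, y + v) rather than I(x − u, y − v), and the averages are taken over the template and over each image window.

```python
import numpy as np


def ncc_map(image, template):
    """Normalized cross-correlation of `template` at every valid offset in `image`.

    Returns an array whose entry [v, u] is the coefficient for the window
    whose top-left corner is at row v, column u.
    """
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()          # zero-mean template
    t_norm = np.sqrt((t ** 2).sum())
    out = np.zeros((ih - th + 1, iw - tw + 1))
    for v in range(out.shape[0]):
        for u in range(out.shape[1]):
            w = image[v:v + th, u:u + tw]
            w = w - w.mean()                # zero-mean window
            denom = t_norm * np.sqrt((w ** 2).sum())
            out[v, u] = (t * w).sum() / denom if denom > 0 else 0.0
    return out


# Synthetic example: cut the template out of a random image at a known offset.
rng = np.random.default_rng(0)
image = rng.random((40, 50))
template = image[12:20, 23:33].copy()

scores = ncc_map(image, template)
peak_v, peak_u = np.unravel_index(np.argmax(scores), scores.shape)
```

The peak location `(peak_v, peak_u)` is the offset at which the template was cut out, which is the translation the text refers to. The double loop makes the cost proportional to the number of offsets times the template area, which is why the search strategy matters.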

An improved correlation technique for similarity measure purpose is the Sequential

Similarity Detection Algorithm (SSDA) [8], [5]. The similarity between the Model

and the Target Image is computed by the following function.

\[ SSDA(u, v) = \sum_x \sum_y \left| M(x, y) - \bar{M} - I(x - u, y - v) + \bar{I}(u, v) \right| \tag{1.8} \]

SSDA is not only simpler than the cross-correlation technique but also performs better

when there is image intensity change between the two images [8].

1.2.2 Phase correlation

Proposed in 1975 [35], correlation matching in the frequency domain has

demonstrated excellent robustness against noise and changes in image illumination.

Phase correlation [1,38,70] is based on the shift theorem, which states that replacing a

function f(x) by f(x + α) is equivalent to multiplying its Fourier transform by $e^{j\alpha\omega}$. This

theorem also holds for a two-dimensional function or 2D image. For an image

I1(x, y) that is shifted by (α, β), the Fourier transform and its relationship to the

original image can be shown as follows:

\[ I_2(x, y) = I_1(x + \alpha, y + \beta) \tag{1.9} \]

\[ F_2(\omega_x, \omega_y) = e^{j(\alpha\omega_x + \beta\omega_y)} F_1(\omega_x, \omega_y) \tag{1.10} \]

where F(ωx, ωy) is the Fourier transformed image. We can see that the translation

between two images in the spatial domain can be represented as a phase difference

in the frequency domain. The phase difference can be computed by

the normalized multiplication in the frequency domain between the first image and the

complex conjugate of the second image. This computation technique is called the

normalized cross-power spectrum. We can show that the normalized cross-power spectrum

is a pure phase term as follows:

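\[ \frac{F_1(\omega_x, \omega_y) F_2^*(\omega_x, \omega_y)}{|F_1(\omega_x, \omega_y)|\,|F_2^*(\omega_x, \omega_y)|} = \frac{F_1(\omega_x, \omega_y) F_1^*(\omega_x, \omega_y)\, e^{-j(\alpha\omega_x + \beta\omega_y)}}{\left|F_1(\omega_x, \omega_y) F_1^*(\omega_x, \omega_y)\, e^{-j(\alpha\omega_x + \beta\omega_y)}\right|} \tag{1.11} \]

\[ = e^{-j(\alpha\omega_x + \beta\omega_y)} \tag{1.12} \]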

To represent this phase difference as translation in spatial domain, we apply the 2D

inverse Fourier transform to it. The location of the peak of this inverse Fourier transform

indicates the translation between the two images.
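The procedure can be sketched with NumPy's FFT routines. This is an illustrative sketch rather than the implementation used in this work: the function name `phase_correlation` is hypothetical, the shift is assumed to be circular (the image is treated as periodic), and a small constant guards against division by zero.

```python
import numpy as np


def phase_correlation(img1, img2):
    """Return the peak location of the inverse FFT of the normalized
    cross-power spectrum F1 * conj(F2) / |F1 * conj(F2)|, and the
    correlation surface itself."""
    F1 = np.fft.fft2(img1)
    F2 = np.fft.fft2(img2)
    cross = F1 * np.conj(F2)
    cross /= np.maximum(np.abs(cross), 1e-12)   # keep only the phase
    corr = np.real(np.fft.ifft2(cross))
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    return (int(peak[0]), int(peak[1])), corr


# Synthetic example: img2(row, col) = img1(row + 5, col + 9), circularly.
rng = np.random.default_rng(1)
img1 = rng.random((32, 32))
img2 = np.roll(img1, shift=(-5, -9), axis=(0, 1))

peak, corr = phase_correlation(img1, img2)
```

For this circular shift the correlation surface is close to a single impulse, and `peak` is located at the (row, column) shift between the two images. With real images the shift is not circular, so the peak is lower and windowing is commonly used to reduce border effects.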

1.2.3 Fourier-Mellin

The Fourier-Mellin [11, 58, 61, 64, 69] approach combines the phase correlation

technique with the polar representation to address the problem of an object that is both

translated and rotated. If I1(x, y) is an original image, the mathematical description

of the translated and rotated image, I2(x, y), can be shown as:

\[ I_2(x, y) = I_1(x\cos\theta_0 + y\sin\theta_0 - u, -x\sin\theta_0 + y\cos\theta_0 - v) \tag{1.13} \]

where (u, v) is the translation parameter and θ0 is the rotation angle.

According to the rotation and translation properties of the Fourier transform, the

power spectrum of the rotated and translated image is also rotated by the same

angle while the magnitude remains the same. The Fourier transforms of I1(x, y)

and I2(x, y) are therefore related by:

\[ F_2(\omega_x, \omega_y) = e^{-j2\pi(\omega_x u + \omega_y v)} F_1(\omega_x\cos\theta_0 + \omega_y\sin\theta_0, -\omega_x\sin\theta_0 + \omega_y\cos\theta_0) \tag{1.14} \]

The magnitude of F2 is thus equal to the magnitude of F1, except that the spectrum is rotated.

By representing the spectra in the polar coordinate and applying the phase correlation

technique, the rotation can be recovered.

|F2 (r, θ)| = |F1 (r, θ − θ0)| (1.15)

An extension of this method applies the log-polar transform to the Fourier

magnitude in order to handle scale change. If I2(x, y) is a scaled image of I1(x, y)

with the scale factor ξ, its Fourier transform can be expressed as:

\[ I_2(x, y) = I_1(\xi x, \xi y) \tag{1.16} \]

\[ F_2(\omega_1, \omega_2) = \frac{1}{\xi^2} F_1\left(\frac{\omega_1}{\xi}, \frac{\omega_2}{\xi}\right) \tag{1.17} \]

By applying the log-polar transform to the Fourier transformed image and ignoring the

multiplicative factor 1/ξ², scale is represented as translation in the frequency domain:

\[ F_2(\log\omega_1, \log\omega_2) = F_1(\log\omega_1 - \log\xi, \log\omega_2 - \log\xi) \tag{1.18} \]

From Eqs. (1.13)-(1.18), we can find the Fourier magnitude spectrum of the translated, rotated,

and scaled image in log-polar coordinates as:

F2 (log r, θ) = F1 (log r − log ξ, θ − θ0) (1.19)

According to [94], however, the method is able to recover only a fair amount of rotation and scale.

Moreover, the Fourier transform introduces a border-effect problem in which the

rotation and scale affect the aliasing of the image.

1.2.4 Feature-based approaches

Unlike template matching based object recognition, which compares the similarity

between the entire area of interest and the template, the feature-based approach uses

only the correspondence between features to identify and recognize an object.

Commonly used features include object color, edges, geometric shape and contour,

object skeleton, and object interest points or feature points. One major advantage of

the feature-based approach is its ability to deal with deformation. The ideal properties of

features are invariance to illumination change, rotation, scale, camera viewpoint,

and object deformation. He et al. [27] used the Gabor wavelet to extract rotation

invariant feature points of the human face and form a mesh representation of the

face for tracking within the video frames. The experiment shows successful results

for objects against a dark background, but the approach may fail against a more realistic

background with various intensities. Based on the Harris corner detector, Mikolajczyk

et al. proposed in [50] a scale and affine invariant feature point extraction using

a combination of corner detection and blob detection, or the Laplacian of Gaussian.

Lowe et al. [43] introduced SIFT (Scale Invariant Feature Transform), a scale and

rotation invariant feature extraction for object recognition and matching that convolves

the image with Gaussian filters at different scales to create Difference-of-Gaussian

(DoG) images and then chooses the maxima and minima of the convolved images

across the scales. Most feature-based works aim to address the

problem of recognizing an object that is partially viewed or deformed.

To the best of our knowledge, most feature-based object recognition approaches

suffer from a high computational cost. The common recognition procedure consists

of feature extraction, invariant property conversion, and feature matching using the

RANSAC [21] algorithm or a k-d tree. Our object recognition approach yields competitive

results with much less computational complexity.

1.2.5 Multiresolution framework

One of the most widely used techniques to reduce the time for locating an object or

an area of interest in an image is the multiresolution framework [13,65,94].

A Gaussian pyramid is built as a collection of representations of an image at

different resolution levels. A query image or a target image is convolved with a

Gaussian filter, which acts as a two-dimensional low-pass filter to remove unwanted

effects such as noise and edges as well as to smooth the image. Since the high

frequency component is removed at each level of the pyramid, one can resample

the image at each level and reduce the number of pixels by a multiple of a power

of two. The smallest image in the hierarchy is the most smoothed and is usually

referred to as the coarsest level of the pyramid, while the original image, which has not been

smoothed by any filter, is referred to as the finest level. For spatial search purposes in

many computer vision applications, the search begins by matching the known template

with the coarsest level of the target image to obtain a rough location of the object.

The search is then repeated at the finer level in the neighborhood

of the matched location from the coarser level. This search strategy is known as the

multiresolution framework or coarse-to-fine search strategy. Recently, researchers

have also studied alternative ways to build a multiresolution framework by using

the wavelet transform or other types of two-dimensional low-pass filters. Even though

different types of filters provide slightly different results and characteristics of the filtered

image, the main idea of the spatial search remains the same. The performance of

the multiresolution framework in reducing the search time for locating an object inside

the image is quite impressive, and it has been widely used in many applications; however,

it does not use the object feature information. Therefore, at each level of the

pyramid, the multiresolution framework still searches blindly

over every translation in the neighborhood of the matched location obtained

from the coarser level.
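The construction of the pyramid can be sketched as follows. This is a minimal sketch under stated assumptions: a 5-tap binomial kernel stands in for the Gaussian filter, borders are handled by edge replication, and the names `smooth` and `gaussian_pyramid` are hypothetical rather than taken from the cited works.

```python
import numpy as np

# Binomial kernel, used here as a common approximation of a Gaussian filter.
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def smooth(image):
    """Separable low-pass filtering: filter along columns, then along rows."""
    h, w = image.shape
    padded = np.pad(image, 2, mode="edge")
    # Horizontal pass: result has shape (h + 4, w).
    horiz = sum(KERNEL[i] * padded[:, i:i + w] for i in range(5))
    # Vertical pass: result has shape (h, w).
    return sum(KERNEL[i] * horiz[i:i + h, :] for i in range(5))


def gaussian_pyramid(image, levels=3):
    """Return [finest, ..., coarsest]; each level is smoothed and then
    downsampled by 2 in each dimension."""
    pyramid = [image.astype(float)]
    for _ in range(levels - 1):
        pyramid.append(smooth(pyramid[-1])[::2, ::2])
    return pyramid


pyramid = gaussian_pyramid(np.ones((64, 48)), levels=3)
shapes = [level.shape for level in pyramid]
```

Each level has one quarter of the pixels of the level below it, so a template search at the coarsest level visits far fewer translations than a search at full resolution. A coarse-to-fine search would start at `pyramid[-1]` and refine the match location at each finer level.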

CHAPTER 2

2D INVARIANT OBJECT RECOGNITION USING LOG-POLAR TRANSFORM

Recognizing an object that appears in a target image is an essential task in many

image processing applications such as image registration, artificial intelligence, and

active vision. Although objects are three-dimensional (3D) in the real world,

the perceived objects in digital video or images are two-dimensional (2D) projections

of the actual three-dimensional objects. In most cases, major problems in recognizing

objects lie in the 2D changes in object appearance, e.g. translation, rotation,

scale, and noise. Recently Lowe et al. [43] proposed SIFT, a feature extraction and

description mechanism that is invariant to rotation and scale, by convolving the image

with the Gaussian filter at different scales to create a Difference-of-Gaussian image.

Mikolajczyk et al. [50] proposed the Harris-Laplacian interest point detector, using a

combination of Harris corner detection [26] and Laplacian-of-Gaussian to obtain scale,

rotation and affine invariant feature points. These two methods are widely used as

tools for feature-based object recognition. Although SIFT and Harris-Laplacian perform

well in extracting 2D invariant feature points of rotated and scaled images

with a high repeatability factor [51], the performance of the methods degrades when

working in a noisy environment. This will be illustrated later in Section III.

An alternative mechanism, the Log-Polar transform (LPT) [3], is used for 2D

matching applications [55, 60, 94] for its scale and rotation invariant properties.

Rotation and scale change in Cartesian coordinates appear as translation in the log-polar

domain. The study in [94] shows that the invariant properties of LPT provide

advantages in 2D matching over the Cartesian coordinates as well as robustness in noisy

environments. However, in order to achieve the invariant properties, the centers of the

log-polar transformation for both images have to match. One solution is proposed in [64],

a technique usually known as Fourier-Mellin (FM). The translation

of the center point is recovered from the correlation of the phase of the Fourier

transforms (FFT). By applying LPT to the Fourier transformed image followed by

matching the FFT-LPT images in the frequency domain, the rotation and scale parameters

are obtained. Using FFT to match objects in the frequency domain makes FM an effective

approach in terms of noise resistance, and with LPT, FM is also invariant to rotation

and scale change. However, the studies in [64] indicate that the correctness of

log-polar matching in the frequency domain as used in FM is limited to a scale factor of 1.8 and 80

degrees of rotation. Another problem of FM is reported in [73]: using the phase

of the FFT to recover the translation of an object introduces an aliasing problem.

Another solution is proposed in [94], where LPT is used for image registration

purposes. In this work, a Gaussian pyramid is built as a collection of representations

of an image at different resolution levels. The search for the center point begins by

matching the known template with the coarsest level of the target image to obtain a

rough location of the object. The search is then repeated at the finer level

in the neighborhood of the matched location from the coarser level. This search

strategy is known as the multi-resolution framework. Using this multi-resolution

approach, the translation of the center point can be found. However, since the multi-resolution

approach uses a low-pass Gaussian filter function to remove high-frequency

components such as edges, the object or template to be searched needs to contain enough

low-frequency information to ensure a successful search at the coarse level. Moreover, in

a noisy environment, the low-frequency component can be distorted, which leads to

failure in locating the center point.

Inspired by [94] and [64], we propose in this chapter a new LPT-based template

matching approach for object recognition that is invariant to 2D rotation and

scaling, and resistant to a reasonable amount of noise. Unlike the previous works,

we propose an invariant feature based method for recovering the translation of the center

point. Our proposed search method is achieved by a combination of Gabor invariant

feature extraction and a new multi-resolution log-polar transform. We note that our

proposed method differs from FM in several aspects. First, in our approach the

log-polar matching is performed in the spatial domain, not the frequency domain. Second,

we address the problem of translation with the feature-based approach. In contrast

with FM, the phase correlation technique is used in our approach for recovering the

scale and rotation in log-polar matching, not for recovering the translation.

To suit our purposes, our proposed search method needs

to be invariant to 2D geometric changes as well as tolerant to a reasonable amount

of noise. We show in this chapter a comparison of 2D invariant properties and noise

resistance between our proposed Gabor feature extraction and several commonly used

invariant feature extraction techniques.

2.1 Log-Polar Transform

In our approach we use the LPT to achieve rotation and scale invariant recognition.

For image processing, the LPT is a nonlinear sampling method used to convert an image

from the Cartesian coordinates I(x, y) to the log-polar coordinates ILP(ρ, θ). First, a

square image of size Rmax × Rmax is transformed to the polar coordinates IP(r, θ)

by the following mapping:

\[ I_P(r, \theta) = I\left(\frac{R_{max}}{2} + r\cos\left(\frac{2\pi\theta}{n_\theta}\right), \frac{R_{max}}{2} - r\sin\left(\frac{2\pi\theta}{n_\theta}\right)\right). \tag{2.1} \]

The variable Rmax is the maximum circle radius of the transformation inside the

image in the Cartesian coordinates. The polar transform samples the image equally

in the radial direction for r = 0, . . . , nr where nr is the number of samples, and

samples equally to cover 360 degrees in the angular direction for θ = 0, . . . , nθ where

nθ is the number of samples. Since the transformation is not a one-to-one mapping,

it requires interpolation.

Then the image in the polar coordinates is transformed to the log-polar coordinates

by applying the logarithm to the radius r as follows:

\[ I_{LP}(\rho, \theta) = I_P(\log(r), \theta) \tag{2.2} \]
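The two mapping steps can be combined into a single sampling routine, sketched below with NumPy. Several choices here are assumptions made for illustration and are not stated in the text: the function name `log_polar_transform` is hypothetical, the center is passed explicitly as (row, column), the radius for row index k is taken as exp(k log(r_max) / n_rho), and nearest-neighbour sampling replaces proper interpolation.

```python
import numpy as np


def log_polar_transform(image, center, r_max, n_rho=64, n_theta=90):
    """Sample `image` on a log-polar grid centred at `center` = (row, col).

    Rows of the result index the log-radius, columns index the angle.
    """
    cy, cx = center
    # Radii grow exponentially with the row index (assumed grid).
    radii = np.exp(np.arange(n_rho) * np.log(r_max) / n_rho)
    # Equal angular steps covering 360 degrees.
    angles = 2.0 * np.pi * np.arange(n_theta) / n_theta
    r, t = np.meshgrid(radii, angles, indexing="ij")
    xs = cx + r * np.cos(t)
    ys = cy - r * np.sin(t)   # image rows grow downwards, as in Eq. 2.1
    cols = np.clip(np.rint(xs).astype(int), 0, image.shape[1] - 1)
    rows = np.clip(np.rint(ys).astype(int), 0, image.shape[0] - 1)
    return image[rows, cols]  # nearest-neighbour sampling


# Synthetic test image: intensity equals the distance from the centre,
# so every row of the log-polar image should be nearly constant.
yy, xx = np.mgrid[0:128, 0:128]
image = np.hypot(yy - 64, xx - 64)
lp = log_polar_transform(image, (64, 64), 60.0)
```

Because the test image is radially symmetric, each row of `lp` stays within one pixel of its sampling radius, which gives a quick sanity check of the grid. Replacing the nearest-neighbour lookup with bilinear interpolation would reduce the oversampling artifacts near the center.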


The LPT is useful for invariant recognition because any rotation and scale in the Cartesian coordinates is represented as shifts

in the angular and log-radius directions, respectively, in the log-polar coordinates as

shown in Fig. 2.1. Fig. 2.1 (a) is the original image while Fig. 2.1 (c) shows the 50%

scale and 45 degree counter-clockwise rotation of the original image. Figs. 2.1 (b)

and (d) are the LPT images of Fig. 2.1 (a) and (c), respectively. In Fig. 2.1 (d), the


column of the log-polar coordinates represents the angular direction while the row

represents the log-radius. We can see that the rotation and scale in the Cartesian

coordinates are represented by shift in the log-polar coordinates.

Another advantage of log-polar for the purpose of object recognition is from the

log-polar’s space variant properties. The sampling resolution in the Cartesian co-

ordinates is higher around the center point or the origin of the LPT and decreases

exponentially as it gets further away from the center. This space variant property

in log-polar gives the advantage for object recognition since the area occupying the

center of the LPT automatically becomes more important than the surrounding areas,

which are more likely to be the background of the target image. This makes

recognition possible even when the object is not perfectly circular or is heavily covered by

the background in its boundary areas.

2.2 The New LPT-Based Object Recognition Approach

2.2.1 Establish the model

We define the model as an LPT matrix of the object, $I_{LP}^{M}(\rho, \theta)$, to be matched against

a set of LPT matrices in the target image, $I_{LP_1}^{C}(\rho, \theta), \ldots, I_{LP_n}^{C}(\rho, \theta)$, which we

call candidates. The superscripts M and C represent the model and the candidate,

respectively. The subscript n is the number of feature points found in the target

image. How to identify the feature points will be discussed later in Subsection

2.2.2. To establish the model, we first manually choose an object from the reference

image. Next, we extract a set of feature points $p_1^M, \ldots, p_n^M$ from the object using the

Harris corner detector [26], which will be explained later. Then we manually

choose one of the feature points $p_i^M$ to be the origin of the LPT. The size $R_{max}^M$ of

the transformation is chosen to be the radius of the transformation circle that does

not contain any part of the background. The model is established by using the

transformations described in Eqs. 2.1 and 2.2.

Figure 2.1: (a) Original image, (b) the LPT of (a), (c) scaled and rotated image of (a), (d) LPT of (c)

2.2.2 Global search

In reality, an object can be translated to any location inside the target image,

and any small translation of the center point of the LPT could distort the log-polar

image or even change the image entirely. Therefore, in order to keep the advantage

of the LPT, the problem of translation or global search has to be addressed. A

work regarding this problem is presented in [77], which uses projection-based translation

estimation. However, the solution only succeeds for small translations. In many

recognition approaches, a large computational cost is associated with the problem of

translation. The most straightforward search strategy is simply verifying all

translations throughout the image [62], [37], [86]. In many motion-related works [12],

the search space is limited to a small corresponding area from the previous video

frame. He et al. proposed in [27] a more robust search strategy using the 2D golden

section algorithm, which narrows both the upper and lower search

bounds at each search iteration. In [94] a multi-resolution or image Gaussian pyramid

framework is used as a search strategy. At each level of the pyramid, the image is first

convolved with the low-pass Gaussian filter function to remove the high-frequency

component such as edges as well as to smooth the image. The filtered image is then

downsampled by 2 in each dimension. Therefore each iteration reduces the size of the image by a factor of 4.

The search begins with the coarsest level to locate the area of interest before

moving to the finer level.

Since LPT is very sensitive to translation, sub-pixel accuracy for translation

recovery is needed. Using the multi-resolution approach as a search strategy [94] can give

the accuracy needed; however, the computational cost is still high due to multiple

level searches. To address this problem, we use a combination of Harris corner

detection and our new screening method as a key component to limit the global search

space. We use corners as feature points in our approach because they have the advantage

of rotation invariance.

1. Feature Point Extraction: To counter the classic time-consuming global search

which exhaustively examines every pixel in the target image, we adopt the Harris

corner detector to augment the LPT approach. In [26], corners are defined

as the points that have high intensity changes in both horizontal and vertical

directions. For a point (x, y) in an image I, given a shift (u, v), the intensity

change can be expressed as the auto-correlation function:

\[ E(u, v) = \sum_{x,y} w(x, y)\left[I(x + u, y + v) - I(x, y)\right]^2 \tag{2.3} \]

where w(x, y) is the Gaussian window function. For a small shift (u, v), we

have the bilinear approximation of E(u, v) as:

\[ E(u, v) \approx [u, v]\, M \begin{bmatrix} u \\ v \end{bmatrix}, \qquad M = \sum_{x,y} w(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}. \tag{2.4} \]

M is a symmetric matrix computed from the second derivative of E around

(x, y). To avoid the explicit eigenvalue decomposition of matrix M, Harris

suggested [26] the response function as follows:

\[ f_R = \det M - k(\mathrm{tr}[M])^2 \tag{2.5} \]

where k is a constant value suggested to be 0.04 in [26], det M = λ1λ2 and

tr[M] = λ1 + λ2, with λ1 and λ2 the eigenvalues of M. The value of the response function fR is used to classify

the type of image point (x, y). The point (x, y) can be classified as a corner if

fR is positive and large. A local maxima window is applied to remove the

weak corners in the area of a strong corner. We extract a set of feature points

$p_1^C, \ldots, p_n^C$ from the target image using Eqs. 2.3-2.5.

2. Screening: Corners have been shown to be invariant to rotation; however, they

are not invariant to scale change. Keeping the standard deviation parameter of

the Gaussian window and the size of the local-maxima mask small could

help solve this problem. As a result, even when the object is scaled, most detected

corners still remain. However, the number of feature points detected increases

dramatically when the standard deviation is small, and this would slow down

the global search process. To limit the number of feature points, we introduce

a new screening method which requires much lower computational time than

applying the LPT to every feature point.

In this screening phase, we construct a four-dimensional screening vector, $\vec{SV}$ =

[qi, qj, qk, ql], for every feature point detected in the previous step by applying

the screening mask, as shown in Fig. 2.2, to every feature point. More specifically,

the screening mask has the size of 11 × 11 pixels and its center pixel,

C(i, j), is the feature point to be screened. Two types of

features are computed accordingly: axis features, denoted as ai, and quadrant

features, denoted as qi, for i = 1, . . . , 4. The axis features are computed from

the difference between the average intensity of every pixel in area Ai and the

feature point C(i, j). These axis features represent the intensity change when

the feature point C(i, j) is shifted in each axis direction. The quadrant features

are computed from the ratio between the average intensity of every pixel in area

Qi and the feature point C(i, j). The mathematical descriptions are shown in

Eq. 2.6.

The axis features of the same axis are compared, i.e. a1 with a3 and a2 with a4.

We choose the area Qk that is not connected to the areas that have the highest

axis feature value in its axis, and use the variable qk obtained from this area

as the first element in our screening vector $\vec{SV}$. For example, if a1 > a3 and

a4 > a2, we choose q3 to be the first element in $\vec{SV}$.

Figure 2.2: A screening mask (dimensions are in pixels)

\[
\begin{bmatrix}
a_1 = \frac{1}{4}\sum_{n=1}^{4} I(i, j-n) - I(i, j) \\
a_2 = \frac{1}{4}\sum_{n=1}^{4} I(i-n, j) - I(i, j) \\
a_3 = \frac{1}{4}\sum_{n=1}^{4} I(i, j+n) - I(i, j) \\
a_4 = \frac{1}{4}\sum_{n=1}^{4} I(i+n, j) - I(i, j)
\end{bmatrix},
\quad
\begin{bmatrix}
q_1 = \sum_{n=1}^{4}\sum_{m=1}^{4} I(i+n, j-m)/I(i, j) \\
q_2 = \sum_{n=1}^{4}\sum_{m=1}^{4} I(i-n, j-m)/I(i, j) \\
q_3 = \sum_{n=1}^{4}\sum_{m=1}^{4} I(i-n, j+m)/I(i, j) \\
q_4 = \sum_{n=1}^{4}\sum_{m=1}^{4} I(i+n, j+m)/I(i, j)
\end{bmatrix}.
\tag{2.6}
\]

We can now create the vector $\vec{SV}$ by filling the second, third and fourth

elements with the remaining three q variables, taken after the first element

in the counter-clockwise direction. Thus, in this example, since q3 is the first

element in $\vec{SV}$, we will have $\vec{SV} = [q_3, q_4, q_1, q_2]$. Since a corner is invariant to

rotation, the screening vector $\vec{SV}$ remains proportionally the same in the target

image even though the object is rotated.

To screen the feature points in the target image, we first find the screening

vector $\vec{SV}^M$ from the model, which is computed in the reference image at the

feature point that is used as the origin for creating the model. We then find

a set of screening vectors from the target image $\{\vec{SV}_1^C, \ldots, \vec{SV}_k^C\}$ where k is

the number of feature points extracted in Subsection 2.2.2. The set of remaining

feature points after screening, Pscr, can be described as:

\[ P_{scr} = \left\{ p_i \;\middle|\; \frac{\vec{SV}_i^C \cdot \vec{SV}^M}{\|\vec{SV}_i^C\| \times \|\vec{SV}^M\|} \ge \delta;\ i = 1, \ldots, k \right\} \tag{2.7} \]

where the operator "·" is the inner product, ‖ ‖ is the norm operator, and δ is the screening threshold. The

search space is now limited to the remaining feature points in the target image.

We establish a set of candidates $\{I_{LP_1}^{C}(\rho, \theta), \ldots, I_{LP_n}^{C}(\rho, \theta)\}$ from the set of

remaining feature points, Pscr, by applying the LPT, described in Eqs. 2.1 and

2.2, to the target image using each feature point $p_1^C, \ldots, p_n^C$ as the origin, where

n is the size of Pscr. The size of the LPT circle $R_{max}^C$ should be at least the size

of the object in the target image. Since the size of the object is unknown in the

target image, one may choose $R_{max}^C = R_{max}^M$. The values of nr and nθ have to be

the same as those used in establishing the model. Fig. 2.3 demonstrates

the performance of our screening method.

Figure 2.3: (a) feature points extracted from the original image, (b) the red dot indicates the feature point selected as the origin for creating the model and the screening vector, (c) feature points extracted from the scaled and rotated image, (d) feature points remaining after screening

3. Local search: After the model and the candidates are established, we now

perform a local search by matching each of the candidates to the model. Since

rotation and scale appear as shifts in the angular and the log-radius directions of

the log-polar coordinates, the model and candidates may be misaligned. We recover

these shifts by phase-correlating each of the candidates $I_{LP_i}^{C}(\rho, \theta)$ with the

model $I_{LP}^{M}(\rho, \theta)$. To do so, we first find their 2D discrete Fourier transforms

$F_i^C(\omega_\rho, \omega_\theta)$ for i = 1, . . . , n and $F^M(\omega_\rho, \omega_\theta)$. For an N × M image, the 2D

discrete Fourier transform can be computed as:

\[ F(\omega_\rho, \omega_\theta) = \frac{1}{NM}\sum_{\rho=0}^{N-1}\sum_{\theta=0}^{M-1} I_{LP}(\rho, \theta)\, e^{-2\pi j(\rho\omega_\rho/N + \theta\omega_\theta/M)}. \tag{2.8} \]

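Any translation in the spatial domain results in a phase shift in the frequency

domain as shown below:

\[ I_{LP}^{b}(\rho, \theta) \equiv I_{LP}^{a}(\rho - \Delta\rho, \theta - \Delta\theta) \tag{2.9} \]

\[ \frac{F^M(\omega_\rho^M, \omega_\theta^M)\, F_i^{C*}(\omega_\rho^C, \omega_\theta^C)}{\left|F^M(\omega_\rho^M, \omega_\theta^M)\, F_i^{C*}(\omega_\rho^C, \omega_\theta^C)\right|} = e^{j(\omega_\rho\Delta\rho_i + \omega_\theta\Delta\theta_i)}. \tag{2.10} \]

The translation, (Δρ, Δθ), between two images is obtained by finding the peak

of the inverse Fourier transform of this normalized cross-power spectrum. From these two

variables, we can compute the scale and rotation between two log-polar images

as:

\[ \text{Scale} = \exp\left[\frac{\Delta\rho\,\log(R_{max})}{n_\rho}\right], \qquad \text{Rotation} = 360 \times \frac{\Delta\theta}{n_\theta}\ \text{degrees}. \tag{2.11} \]

In our approach, we phase-correlate each of the candidates $I_{LP_i}^{C}(\rho, \theta)$, i =

1, . . . , n, with the model $I_{LP}^{M}(\rho, \theta)$. Each candidate is then shifted as $I_{LP_i}^{SC}(\rho, \theta) =

I_{LP_i}^{C}(\rho - \Delta\rho, \theta - \Delta\theta)$ before performing further verification. The superscript

SC represents the shift-recovered candidates.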

4. Similarity measure: Although correlation technique itself is already a similarity

measure [8], in reality we may have to recognize or track multiples objects in the

target image. Each object gives different phase correlation value when mapping

with the target image. Thus relying only on the maximum phase correlation

value is not enough for the purpose of object recognition. We introduce a simple

similarity measure technique that gives fast computation and can withstand

noise and changes in image intensities. After each candidate is phase-correlated

and recovered for its rotation and scale, we now verify the similarity between

each of the shift-recovered candidates and the model. We treat each row and

25

column of two images as vectors and find the inner product result sρ(i) for

i = Δρ to nρ and sθ(j) for j = 1 to nθ as follows:

s•(k) =LSC• (k).LM• (k)

‖LSC• (k)‖ × ‖LM• (k)‖ (2.12)

where LSC(k) is the image vector k of the shift-recovered candidates, LM (k) is

the image vector k of the model. The subscript “.” indicates a particular ρ or

θ vector. The final similarity measure, Sl, is the sum of all sρ(i) and sθ(j),

Sl =

[nρ∑i=1

sρ(i) +

nθ∑j=1

sθ(j)

]/ (nρ + nθ −Δρ) . (2.13)

For a candidate to be accepted as a match to the model, the similarity factor Sl has to be greater than a threshold value ε.
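The phase correlation and similarity steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names are hypothetical, the log-polar images are assumed to be arrays with rows indexed by ρ and columns by θ, shifts are treated as cyclic, and the first Δρ rows are skipped in the similarity sum so that the count of vectors agrees with the denominator of Eq. 2.13.

```python
import math
import numpy as np


def phase_correlate(model_lp, cand_lp):
    """Locate the cyclic shift (d_rho, d_theta) that aligns cand_lp with model_lp."""
    f_model = np.fft.fft2(model_lp)
    f_cand = np.fft.fft2(cand_lp)
    cross = f_model * np.conj(f_cand)
    # Normalised cross-power spectrum; the clamp avoids division by zero.
    cross = cross / np.maximum(np.abs(cross), 1e-12)
    corr = np.real(np.fft.ifft2(cross))
    d_rho, d_theta = np.unravel_index(np.argmax(corr), corr.shape)
    return int(d_rho), int(d_theta), float(corr.max())


def scale_and_rotation(d_rho, d_theta, r_max, n_rho, n_theta):
    """Eq. 2.11: convert a log-polar shift into a scale factor and an angle in degrees."""
    scale = math.exp(d_rho * math.log(r_max) / n_rho)
    rotation = 360.0 * d_theta / n_theta
    return scale, rotation


def similarity(shifted_cand, model, d_rho=0):
    """Eqs. 2.12-2.13: normalised sum of cosine similarities of rows and columns."""
    def cosine_rows(a, b):
        num = np.sum(a * b, axis=1)
        den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
        return num / np.maximum(den, 1e-12)

    n_rho, n_theta = model.shape
    # Assumption: rows are indexed by rho; the first d_rho rows are skipped.
    s_rho = cosine_rows(shifted_cand[d_rho:], model[d_rho:])
    s_theta = cosine_rows(shifted_cand.T, model.T)
    return float((s_rho.sum() + s_theta.sum()) / (n_rho + n_theta - d_rho))
```

A shift-recovered candidate would then be obtained with `np.roll(cand_lp, (d_rho, d_theta), axis=(0, 1))` and accepted when `similarity(...)` exceeds ε. In this sketch a peak index beyond half the axis length corresponds to a negative shift, which a fuller implementation would need to unwrap before applying Eq. 2.11.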

2.3 Experimental Results

The performance of the proposed approach is evaluated on two bases: robustness and accuracy. In the first part of this section, the computational cost of the algorithm is tested. We also discuss in detail the robustness of the proposed search strategy, which combines feature point extraction with the screening method. The performance of the proposed approach in terms of feature point matching and screening, as well as its robustness to scale and rotation, is discussed and compared with correlation based techniques. The latter subsection presents the accuracy of the proposed object recognition approach.

2.3.1 The robustness

One of the major contributions of this work is the proposed search strategy, which combines feature point extraction with the screening method. First we evaluate the overall computational cost of the proposed object recognition approach. All experiments are carried out in Matlab running on a Pentium IV 2.0-GHz PC. The average computational time is shown in Table 2.1. The results demonstrate the performance of our screening method: the average computational time is reduced to approximately 25.19% and 7.43% of the computational time without screening in the normal environment and in the noisy environment, respectively. By comparison, the naive search strategy that blindly computes the LPT for all translations in the target image has a computational load approximately 100 times higher than that of the proposed approach, e.g. 550 seconds for an image of 240 × 360 pixels. The computational time can be slightly reduced when smaller values of nr and nθ are used. However, reducing the resolution of the LPT lowers the accuracy rate. We also applied the multi-resolution search framework suggested in [94]. We adopt a Gaussian pyramid with 3 levels of reduction. The search begins with all translations at the coarsest level. At each finer level, we search only the 20 × 20 neighboring pixels around the match point of the coarser level. The average computational time using the multi-resolution framework is greater than that of our proposed approach, as it requires approximately 26 seconds for each BMW video frame image and 32 seconds for the US coins images. We also note that the multi-resolution framework might not be suitable for multiple object recognition, since more than one area of interest might need to be considered, which could lead to a much higher computational cost at the finer levels.

Figure 2.4: (a) and (c) Feature point extraction for the House and blox images; (b) and (d) 8 feature points randomly selected from each image.

Figure 2.5: 18 candidates generated by scaling and rotating the House image.

The robustness of our object recognition approach is demonstrated by the experiment summarized in Table 2.1. Feature point extraction plays an important role in reducing the computational cost. It is also shown that the screening method further reduces the computational cost significantly, by approximately 30 to 80 percent. We extend our experiment to feature point matching and feature point screening to evaluate the efficiency of the proposed screening method. To the best of our knowledge, there is no publication that provides a screening method suited to our LPT based object recognition purposes. One of the commonly used feature matching techniques is RANSAC [21]. This technique provides high accuracy feature point matching at the expense of high computational cost, as multiple feature point outliers are involved in the computation to fit the models. Thus, it is not suitable for our screening purposes. We therefore compare our proposed screening method with the correlation based feature matching techniques, which provide the results closest to our screening purposes. The correlation technique is widely used as a tool for matching feature points between two distinct images [81], [2], [82], [15], [91]. Two of the correlation based techniques, average square difference correlation (ASD) [2], [82] and variance normalized correlation (VNC) [2], [82], are compared with our work. The first technique, ASD, is based on the intensity difference between square areas around the feature points. The ASD correlation coefficient between feature points m and n is defined as:

\[
C_{ASD}(m,n) = \frac{1}{N} \sum_{m' \in \aleph(m),\ n' \in \aleph(n)} \left[ I(m') - I(n') \right] \tag{2.14}
\]

where N is the size of the neighborhood window. The size of the neighborhood window used in the experiment is 9 × 9 pixels. The second correlation based feature matching technique, VNC, uses the normalized intensity of the neighborhood window and its variance to match the feature point. The VNC correlation coefficient between feature points m and n is defined as:

\[
C_{VNC}(m,n) = \frac{1}{N\sqrt{\sigma^{2}_{I}(m)\,\sigma^{2}_{I}(n)}} \sum_{m' \in \aleph(m),\ n' \in \aleph(n)} \left[ I(m') - \bar{I}(m) \right] \left[ I(n') - \bar{I}(n) \right] \tag{2.15}
\]

Figure 2.6: 18 candidates generated by scaling and rotating the blox image.
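The two baseline coefficients of Eqs. 2.14 and 2.15 can be sketched as follows. This is an illustrative sketch only. It assumes that the sum pairs pixels at corresponding positions in the two windows, that the mean and variance in Eq. 2.15 are taken over each 9 × 9 window, and, following the name "average square difference", that the intensity difference is squared before averaging (the printed form of Eq. 2.14 shows no exponent). The helper names are hypothetical.

```python
import numpy as np


def patch(image, point, half=4):
    """Square window of side 2*half + 1 (9 x 9 by default) centred on point = (row, col)."""
    r, c = point
    return image[r - half:r + half + 1, c - half:c + half + 1].astype(float)


def asd(image_a, m, image_b, n, half=4):
    """Average squared intensity difference between the windows around m and n."""
    diff = patch(image_a, m, half) - patch(image_b, n, half)
    return float(np.mean(diff ** 2))


def vnc(image_a, m, image_b, n, half=4):
    """Variance normalised correlation between the windows around m and n."""
    pa, pb = patch(image_a, m, half), patch(image_b, n, half)
    da, db = pa - pa.mean(), pb - pb.mean()
    denom = pa.size * np.sqrt(pa.var() * pb.var())
    # A flat window has zero variance; report no correlation in that case.
    return float(np.sum(da * db) / denom) if denom > 0 else 0.0
```

Under these assumptions `vnc` is bounded between -1 and 1, whereas `asd` grows with the squared intensity range, which is consistent with the very different threshold scales used for the two techniques in the experiments.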

To evaluate the performance of the proposed screening approach when the object is scaled and rotated, the House and blox images, which are chosen as reference images in this experiment, are scaled by factors of 0.9 and 2.5, respectively. Two sets of candidate images are created by rotating the scaled images by 10i degrees, where i = 1, . . . , 18. This creates a total of 18 candidates for each set of tested images, as shown in Figs. 2.5 and 2.6. Harris corner detection is applied to each image to extract the feature points. We randomly choose 8 feature points from each reference image, as shown in Fig. 2.4, as reference features to be matched with the candidates. The parameters used in the test are as follows: the size of the screening mask is 11 × 11 pixels, the size of the VNC mask and the ASD mask is 9 × 9 pixels, and σ is equal to 1.

Figure 2.7: Performance in matching feature points on an object rotated to various orientations: (a) proposed approach, (b) VNC, (c) ASD. Each panel plots the percentage of correct matches against orientation (0 to 180 degrees) for several threshold values δ: 0.99 to 0.95 in (a), 0.9 to 0.5 in (b), and 1000 to 5000 in (c).

The 8 reference feature points of both reference images are matched with the candidates using our proposed approach, VNC, and ASD. We visually inspect the result of each approach to determine the average percentage of correct feature matches at each image orientation and each threshold value. Fig. 2.7 shows the average proportion of correct feature matches using the reference features. The results of using VNC and ASD for matching features are also provided for comparison. In this experiment, the performance of both VNC and ASD degrades as the orientation increases, because the correlation technique is not invariant to rotation. At the higher orientations, it can easily be seen that the proposed approach outperforms VNC and ASD and provides a success rate (percentage of correct matches) of more than 40 percent even with the largest threshold value.

Figure 2.8: Performance in screening feature points using feature number 3 of the blox image: (a) proposed approach, (b) VNC, (c) ASD. Each panel plots the remaining features after screening (%) against orientation (0 to 180 degrees); the curves are annotated with threshold values δ from 0.99 to 0.91 in (a), 0.9 to 0.1 in (b), and 1000 to 9000 in (c).

Having tested the performance of our approach for matching feature points, we now evaluate its performance for screening. For each reference image, 1 feature point is randomly chosen from the 8 reference feature points. Reference feature number 3 from the House image and number 4 from the blox image are chosen as the reference feature points for screening. We visually inspect the feature points remaining after screening in each candidate, Pscr, to see whether the screening procedure succeeds in removing the undesired feature points while retaining the desired feature point. The results of using VNC and ASD are also provided for comparison. Figs. 2.8 and 2.9 show the proportion of the number of remaining feature points after screening, Pscr, to the total number of feature points for each candidate at different threshold values, δ. A discontinuity in a plot indicates that the tested approach failed, at that particular threshold value, to retain the correct feature point after screening. The experiment shows that VNC fails as a screening method at high orientations. Although ASD provides better results than VNC, its performance is still relatively low, as the proportion of remaining features after screening is high (approximately 50 percent using threshold = 5000). The experiment shows that the proposed approach performs better at almost every orientation value, which indicates that it is invariant to rotation and scale. Using a threshold value of 0.93 in both experiments, the proposed approach retains the correct feature point while removing more than 60 percent of the total number of feature points.

Figure 2.9: Performance in screening feature points using feature number 3 of the House image: (a) proposed approach, (b) VNC, (c) ASD. Each panel plots the remaining features after screening (%) against orientation (0 to 180 degrees); the curves are annotated with threshold values δ from 0.99 to 0.91 in (a), 0.9 to 0.1 in (b), and 1000 to 9000 in (c).

Figure 2.10: Performance in screening feature points in the noisy environment using feature number 3 of the blox image: (a) proposed approach, (b) VNC, (c) ASD. Each panel plots the remaining features after screening (%) against orientation (0 to 180 degrees); the curves are annotated with threshold values δ from 0.99 to 0.91 in (a), 0.7 to 0.1 in (b), and 2000 to 9000 in (c).

We extend our test to the noisy environment. For this experiment, Gaussian noise with zero mean and variance 0.01 is added to the images. Figs. 2.10 and 2.11 demonstrate the performance of the three approaches in screening. Our proposed screening method outperforms VNC and ASD, with a higher success rate and a smaller number of remaining features after screening. The results also indicate that our proposed screening method gives a much smoother screening result across all orientations than VNC and ASD.

Figure 2.11: Performance in screening feature points in the noisy environment using feature number 3 of the House image: (a) proposed approach, (b) VNC, (c) ASD. Each panel plots the remaining features after screening (%) against orientation (0 to 180 degrees); the curves are annotated with threshold values δ from 0.99 to 0.91 in (a), 0.8 to 0.1 in (b), and 2000 to 9000 in (c).

It is also important to note that another advantage of the proposed method over ASD and VNC is the smaller range of the threshold value, which is from 0 to 1. In ASD, the range of the correlation coefficient is limited by the size of the screening mask and can grow to more than 10000 when the mask size is bigger than 9 × 9 pixels. VNC, on the other hand, is a normalized correlation technique, which limits the range of the correlation coefficient to between -1 and 1. Being based on the correlation technique, both ASD and VNC perform slightly better than the proposed approach at low orientations. However, at the higher orientations, it can be seen clearly that the proposed screening method outperforms VNC and ASD both without noise and with noise. The bounded threshold range of the proposed method allows the user to set a proper threshold value for the desired application more easily than with ASD and VNC.

2.3.2 The accuracy

To test the accuracy of our object recognition approach, we used images taken with a Nikon 4300 digital camera (US coins) and images from 2 video frame sequences (a BMW car video sequence from the Caltech Image Database¹ and an aircraft video sequence). We also created a set of noisy images by adding Gaussian noise to some of the tested images. Samples of the recognition procedure are shown in Fig. 2.13. The sets of tested images are presented in Figs. 2.14 to 2.17. We also perform the test with several images under different conditions: an artificially rotated and scaled object, Fig. 2.13 (a); video sequences of multiple image frames, Figs. 2.13 (b) and (d); and an image in a noisy environment (Gaussian noise is added), Fig. 2.13 (c). The parameters of the LPT, feature extraction, and similarity measure are selected accordingly in the experiments. The parameters σ, nr, nθ, δ, ε and the size of the local maxima mask are equal to 1, 128, 128, 0.98, 0.97, and 11 × 11 pixels, respectively. The parameter $R^{M}_{max}$ is chosen manually in the experiments, and the parameter $R^{C}_{max}$ is set to be the same as $R^{M}_{max}$. The results of the experiments support our objective of using the LPT for template matching based object recognition, with an accuracy of 82.05% in the normal environment and 77.78% in the noisy environment.

¹ http://www.vision.caltech.edu/html-files/archive.html

Figure 2.12: Reference images and selected objects: (a) US coins, (b) BMW, (c) Aircraft.

Images        Size      # of          # of        Avg. time (s)    Avg. time (s)
                        experiments   successes   w/o screening    w/ screening
US Coins      442×476   14            14          18.6261          5.9557
BMW video     240×360   18            14          26.1667          3.8649
Noisy Coins   442×476   14            11          102.2642         7.6032
Aircraft      480×704   7             4           21.1238          8.9620

Table 2.1: Recognition results

2.4 Extended Works

In Subsection 2.2.2 we presented the innovative screening approach, which significantly reduces the computational time of the search process compared with applying the LPT directly to every feature point. However, as discussed earlier, corners are not invariant to scale change. As a result, it is important to use small values for the standard deviation parameter of the Gaussian window and for the size of the local-maxima mask in order to obtain a sufficiently large number of feature points, so that most detected corners still remain even when the object is scaled. The first extension of this work is to improve the feature point extraction method so that it is invariant not only to rotation but also to scale change. To this end, we adopt a Gabor wavelet approach for extracting 2D invariant feature points. Some modifications to existing Gabor feature extraction methods [27] are made to enhance the performance. More details of the Gabor feature extraction method can be found in Subsection 3.2.1 of Chapter 3 as well as in Appendix A.

Although feature point extraction reduces the search space to a set of features, the number of features detected is still high. To reduce the computational load, we introduce an innovative search method, the Multi-resolution LPT (MR-LPT). Our MR-LPT differs from the multi-resolution based method used in [94] in that we do not use an image pyramid of the target image as the search space, since performing the search from the coarsest to the finest level in a noisy environment gives a higher error rate than searching at the highest resolution. Instead, we keep the target image at its original resolution and use different sampling parameters nρ and nθ for each iteration of the search.

The search begins by establishing a set of candidates $\{I^{LP_C}_{1}(\rho,\theta), \ldots, I^{LP_C}_{n}(\rho,\theta)\}$ from the set of feature points, P, by applying the LPT, described in Eqs. 2.1 and 2.2, to the target image using each feature point $\{p^{C}_{1}, \ldots, p^{C}_{n}\}$ as the origin, where n is the number of feature points (more details of the feature extraction procedure can be found in Subsection 3.2.1 of Chapter 3). The size of the LPT circle, $R^{C}_{max}$, should be at least the size of the object in the target image. Since the size of the object in the target image is unknown, one may choose $R^{C}_{max} = R^{M}_{max}$, where the superscripts M and C represent the model and the candidate, respectively. The values of nρ and nθ for each iteration are $T \times 2^{(s-1)}$, where s is the iteration number, s = {1, 2, 3}. We use T = 8 in this example and throughout this work. In other words, the MR-LPT proceeds from a 16 × 16 sampling resolution up to a 64 × 64 sampling resolution, which is the resolution used for creating the model. At each resolution level, the MR-LPT images from each feature point in the target image are phase correlated with the model that is created from MR-LPT at the same resolution. A feature point is discarded if its phase correlation value is less than a threshold value. The procedure is repeated from the coarsest resolution level to the finest.
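The coarse-to-fine screening loop described above can be outlined as follows. This is only a structural sketch: `score` is a hypothetical callable standing in for "compute the n × n log-polar image centred on the feature point and return its phase correlation peak against the model at the same resolution", and the default threshold is a placeholder, since no threshold value is specified for this step.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Point = Tuple[int, int]


def coarse_to_fine_screen(
    points: Iterable[Point],
    score: Callable[[Point, int], float],
    resolutions: Sequence[int] = (16, 32, 64),
    threshold: float = 0.5,  # placeholder value, not taken from the text
) -> List[Point]:
    """Keep only the feature points whose score stays at or above the threshold at every level."""
    survivors = list(points)
    for n in resolutions:  # coarsest resolution first
        survivors = [p for p in survivors if score(p, n) >= threshold]
        if not survivors:  # nothing left to refine
            break
    return survivors
```

Because each level only revisits the points that survived the previous one, most of the finer and more expensive evaluations are avoided whenever the coarse level rejects a large share of the feature points.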

To obtain the MR-LPT, one may perform the LPT with different sampling parameters, i.e. nρ = nθ = 16, 32, and 64, respectively. However, the LPT at each sampling resolution is then computed independently: the LPT image from the coarser level is discarded and the LPT image at the finer level is computed separately, so the computation at the coarser level is wasted. We propose an innovative alternative way to perform the MR-LPT by mapping the area sampled by the LPT at the coarser level to the area of the LPT at the finer level. Fig. 2.18 demonstrates how the area of the coarser level LPT is mapped to the area of the finer LPT. Figs. 2.18(a), 2.18(b), and 2.18(c) are the LPT mappings at sampling resolutions of 16×16, 32×32, and 64×64, respectively. In MR-LPT, the areas with a striped pattern represent the coarsest level of the LPT image. Unlike in the regular LPT (as shown in Fig. 2.18(a)), this LPT image at the coarsest level is not discarded after screening. Instead, it is reused as a part of the finer LPT. The coloured areas represent the finer 32 × 32 MR-LPT. Lastly, every block in Fig. 2.18(c) represents the finest level, 64 × 64.

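The mathematical expression of MR-LPT can be stated as follows. Let $M^{(1)s}_{m\times m}$ be an MR-LPT image at level s with size m × m. The finer level of MR-LPT, $M^{(1)s+1}_{2m\times 2m}$, can be computed by first calculating $M^{(2)s}_{m\times m}$ and $M^{(3)s}_{m\times m}$ in the same manner as $M^{(1)s}_{m\times m}$, with a $2^{-n}$ step shift in the log-radius direction, the orientation direction, and both the log-radius and orientation directions, respectively. Then

\[
Z^{(n)s+1}_{2m\times 2m} = A_{2m\times m} \times M^{(n)s}_{m\times m} \times A^{T}_{m\times 2m} \tag{2.16}
\]

where A is a matrix such that:

\[
A = \{\, a_{ij} \in A \mid a_{ij} = 1 \text{ for } 2j - 1 = i,\ a_{ij} = 0 \text{ elsewhere} \,\}. \tag{2.17}
\]

Then,

\[
Z^{(n)s+1}_{2m\times 2m} = R^{(n)s+1}_{2m\times 2m} \times Z^{(n)s+1}_{2m\times 2m} \times C^{(n)s+1}_{2m\times 2m} \tag{2.18}
\]

where R and C are the translation matrices in the row and column directions, respectively:

\[
R^{(n)}_{2m\times 2m} =
\begin{cases}
I_{2m\times 2m} & \text{for } n = 1, 3 \\
\{\, r_{ij} \in R \mid r_{ij} = 1 \text{ if } j = i + 1,\ r_{ij} = 0 \text{ elsewhere} \,\} & \text{otherwise}
\end{cases}
\tag{2.19}
\]

\[
C^{(n)}_{2m\times 2m} =
\begin{cases}
I_{2m\times 2m} & \text{for } n = 1, 3 \\
\{\, c_{ij} \in C \mid c_{ij} = 1 \text{ if } j = i + 1,\ c_{ij} = 0 \text{ elsewhere} \,\} & \text{otherwise}
\end{cases}
\tag{2.20}
\]

\[
M^{(1)s+1}_{2m\times 2m} = \sum_{n=1}^{3} Z^{(n)s+1}_{2m\times 2m} \tag{2.21}
\]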
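The role of the matrix A in Eqs. 2.16 and 2.17 can be checked numerically. The sketch below covers only that embedding step, not the shift matrices R and C or the final sum, and the function names are hypothetical. With 1-based indices, $a_{ij} = 1$ where $i = 2j - 1$, which in 0-based NumPy indexing places a one at row 2j of column j, so that $A M A^{T}$ spreads the coarse samples over every other row and column of the finer grid and leaves the remaining positions empty.

```python
import numpy as np


def upsampling_matrix(m):
    """A (2m x m) with a_ij = 1 where i = 2j - 1 in 1-based indexing (Eq. 2.17)."""
    a = np.zeros((2 * m, m))
    cols = np.arange(m)
    a[2 * cols, cols] = 1.0  # 0-based row 2j corresponds to 1-based row 2j - 1
    return a


def embed_coarse(coarse):
    """Z = A M A^T (Eq. 2.16): place an m x m image on a 2m x 2m grid."""
    m = coarse.shape[0]
    a = upsampling_matrix(m)
    return a @ coarse @ a.T
```

For a 2 × 2 input the result is a 4 × 4 array whose entries at even row and column indices hold the original values, which is how the coarse-level samples are retained as part of the finer level instead of being recomputed.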

Fig. 2.19 demonstrates the MR-LPT image of US coins (Fig. 2.20) at resolutions of 16×16, 32×32, and 64×64, respectively. The effectiveness of the combined Gabor feature extraction and MR-LPT search method in reducing the number of feature points in the search space is presented in Fig. 2.20. More details and experimental results using the proposed MR-LPT and Gabor feature extraction are presented in [47].

2.5 Summary

In this chapter, we have presented a new template matching based object recognition approach using the LPT. The rotation and scale invariant properties of the LPT allow us to develop a new method for recognizing rotated and scaled objects, a problem that is difficult to overcome in the Cartesian coordinates. The proposed method extracts rotation invariant feature points from the object and the target image, and uses these feature points to develop the LPT representations. Furthermore, we propose a new method to evaluate and select the feature points using the screening vector, which shows successful results in reducing the number of feature points that are considered candidates for log-polar mapping. The result is a significant reduction of the computational time. We also introduce a new similarity measure for final classification.

The experiments demonstrate how our recognition approach succeeds in recognizing objects that are rotated and scaled under different conditions, i.e. still images, video sequences, and noisy environments. The performance of our proposed screening method is also evaluated. Our approach is robust and fast to compute, and since no motion estimation is used, it also works well for object tracking even when frames of the video sequence are skipped or reordered.

Figure 2.13: Experimental results in the target image (from left to right): feature point extraction, screening, and object recognition.

Figure 2.14: Set of tested images, US coins.

Figure 2.15: Set of tested images, US coins in a noisy environment.

Figure 2.16: Set of tested images, BMW.

Figure 2.17: Set of tested images, Aircraft.

Figure 2.18: MR-LPT mapping.

Figure 2.19: MR-LPT of the quarter coin in the US coins image: (a) 16 × 16, (b) 32 × 32, and (c) 64 × 64.

Figure 2.20: Feature point reduction from MR-LPT (the red dot represents the center point of the LPT).

CHAPTER 3

IMAGE REGISTRATION USING ADAPTIVE POLAR TRANSFORM

LPT is a well known tool in image processing for its rotation and scale invariant properties [3, 10, 34, 47, 57, 60, 76, 77, 89, 94]. Scale and rotation in the Cartesian coordinates appear as translation, or shifting, in the log-polar domain. These invariant properties make LPT based image registration robust to scale and rotation. However, LPT suffers from nonuniform sampling. One major consequence is the high computational cost of the transformation process, which comes from oversampling at the fovea in the spatial domain. Since no information is gained from oversampling, this computation is wasted. Another major problem of LPT is the bias matching. With LPT, the matching mechanism focuses mainly on the fovea, i.e. the area close to the center point of the transformation, while the peripheral area is given less consideration. Furthermore, occlusions and alterations between the two images may occur, and these are not taken into account by LPT. For example, satellite images of the same location taken at different times may contain occlusion due to climate change or cloud. In order to register the two images effectively, an image registration method that is invariant to occlusion and alteration is needed.

Motivated by these observations about the conventional LPT, our work attempts to effectively register two images that are subject to occlusion and alteration in addition to scale, rotation and translation. We introduce the adaptive polar transform (APT) technique, which samples the image evenly in the spatial domain. We further apply the projection transform to the transformed image to reduce it from a two dimensional image to one dimensional vectors. With APT and the projection transform, rotation and scale in the Cartesian coordinates appear as shifting and variable-scale in the transformed domain. The new method requires less computation in the transformation process than the conventional LPT, while maintaining robustness to changes in scale and rotation. We further introduce a new search method that efficiently uses the scale and rotation invariant feature points to eliminate the exhaustive search over all possible translations of the model image. Another contribution of this work is an image comparison scheme in the projection domains that is designed to locate the areas of the image that are subject to occlusions and alterations. This information is useful for many applications, such as medical imaging or scene change detection. For example, medical images of the same patient taken before and after surgery are registered, compared and analyzed to evaluate the progress of the surgery. It is valuable for the image registration system not only to align the images, but also to automatically locate the areas where the two images differ. This information would make it easier for the doctor to compare and analyze the medical images. We denote the area that contains alteration or occlusion as the altered area.

3.1 Adaptive Polar Transform Approach

Inspired by the scale and rotation invariance properties of the conventional LPT approach, we propose a novel image transformation scheme, APT, that is designed to address the two major problems of LPT, namely the high computational cost of the transformation procedure and the bias matching due to the nonuniform sampling, while maintaining all of its advantages. Subsection 1.1 briefly introduces LPT and its advantages for image registration. Subsection 3.1.1 presents the motivation of the proposed APT approach. Subsections 3.1.2 and 3.1.3 present the proposed APT and the projection transform, respectively.

3.1.1 Motivation of the proposed approach

Although LPT has been widely used in many image processing applications, it suffers from nonuniform sampling. As shown in Fig. 1.1(a), as the radius of the mapping increases, pixels in the Cartesian coordinates are sampled fewer times. This nonuniform sampling can cause image pixels that are far away from the center point to be missed. The resulting loss of image information will eventually decrease the accuracy of the registration system. To prevent any image pixel from being missed in the mapping process, large numbers of samples are required in both the log-radius and the angular directions. We denote nρ and nθ as the numbers of samples in the log-radius and the angular directions, respectively. We also denote Ri as the radius in pixels for i = 1, . . . , nρ, each of which is sampled uniformly over 360 degrees in the angular direction for θ = 0, . . . , nθ. To prevent undersampling in the log-radius direction, the condition $R_{n_\rho} - R_{n_\rho - 1} \le 1$ has to be met. It can easily be shown that the following equations should hold:

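\[
R_i = \exp\left[\frac{i \times \log R_{max}}{n_\rho}\right], \qquad R_{max} = R_{n_\rho}, \tag{3.1}
\]

\[
n_\rho \ge \frac{\log R_{max}}{\log R_{max} - \log(R_{max} - 1)}. \tag{3.2}
\]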

In the angular direction, the largest sampling circle, of radius Rmax, has a circumference that contains approximately 2πRmax image pixels. Hence, the number of samples in the angular direction needs to be equal to or greater than that amount to prevent image pixels from being missed in the sampling procedure:

\[
n_\theta \ge 2\pi R_{max}. \tag{3.3}
\]

For the sake of simplicity, nρ and nθ can be selected as powers of two. This not only makes the computation simpler, but also accelerates the matching, which is usually carried out by cross-correlation [39] or by Fourier phase correlation [35]. Hence, Eqs. (3.2) and (3.3) can be expressed as:

\[
n_\rho = 2^{\,\mathrm{ceil}\left\{\frac{1}{\log 2} \times \log\left[\frac{\log R_{max}}{\log R_{max} - \log(R_{max} - 1)}\right]\right\}}, \qquad n_\theta = 8 R_{max}. \tag{3.4}
\]

The operation ceil(A) denotes the smallest integer greater than or equal to A. With these numbers of samples used in the computation of the LPT, image pixels in the Cartesian coordinates will not be missed in the mapping process. The pixels at the circumference will be mapped to the log-polar coordinates approximately one-to-one. However, image pixels at the fovea will be oversampled. Since no information is gained from oversampling, this computation is wasted. For an image of size 256×256 pixels, a minimum of 613 samples in the log-radius direction and 805 samples in the angular direction is required (1024 samples in both the log-radius and the angular directions when the numbers are restricted to powers of 2). It can easily be seen that the required number of samples is very large compared with the original size of the image in the Cartesian coordinates.
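The bounds of Eqs. (3.1) to (3.4) are easy to evaluate directly. The following sketch uses hypothetical function names and assumes natural logarithms and that Rmax is half the image side (128 for a 256 × 256 image). The exact log-radius bound is sensitive to how Rmax is taken: in this sketch Rmax = 128 gives 619, while Rmax = 127 gives the 613 quoted above; the angular bound and the power-of-two values match the text for Rmax = 128.

```python
import math


def min_log_radius_samples(r_max):
    """Eq. (3.2): smallest integer n_rho with R_{n_rho} - R_{n_rho - 1} <= 1."""
    bound = math.log(r_max) / (math.log(r_max) - math.log(r_max - 1))
    return math.ceil(bound)


def min_angular_samples(r_max):
    """Eq. (3.3): at least one sample per pixel on the outermost circle."""
    return math.ceil(2 * math.pi * r_max)


def power_of_two_samples(r_max):
    """Eq. (3.4): power-of-two n_rho and n_theta = 8 * r_max."""
    bound = math.log(r_max) / (math.log(r_max) - math.log(r_max - 1))
    n_rho = 2 ** math.ceil(math.log(bound) / math.log(2))
    return n_rho, 8 * r_max


def radii(r_max, n_rho):
    """Eq. (3.1): exponentially spaced sampling radii R_1, ..., R_{n_rho}."""
    return [math.exp(i * math.log(r_max) / n_rho) for i in range(1, n_rho + 1)]
```

Comparing the 1024 × 1024 samples returned by `power_of_two_samples(128)` with the 256 × 256 pixels of the original image makes the extent of the oversampling concrete: the transformed image holds sixteen times as many samples as there are pixels.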

Another major problem of using the conventional LPT for image registration is the bias matching. Since the number of samples at the fovea of the image is larger than that at the periphery, the matching mechanism will also focus more on the fovea, or central area, while the peripheral area is given less consideration. Hence, using the conventional LPT for image registration yields inaccurate results when there are changes in the image, especially in the area close to the center point. Fig. 3.1 shows an example of registering the Lena image patch to the non-occluded target image (the upper image on the right of Fig. 3.1(a)) and to the occluded target image (the lower image on the right of Fig. 3.1(a)), respectively. In this demonstration, an artificial block of 20×20 pixels is used as the occlusion. The sensitivity of the matching is evaluated through the correlation function between the model image and the target image (referring to the registration method using the conventional LPT proposed in [94]). Figs. 3.1(b) and 3.1(c) show the correlation coefficients at the point where the model image and the target image are matched for the non-occluded target image and the occluded target image, respectively. In this demonstration, it can be seen that the conventional LPT performs effectively in registering images under normal conditions. Fig. 3.1(b) shows a strong peak in the correlation coefficients, and Fig. 3.1(a) (upper-right image) indicates an accurate registration result. However, when part of the target image is occluded, the peak of the correlation is reduced dramatically, to the point that it is difficult to locate, as shown in Fig. 3.1(c). In fact, the peak of the correlation coefficients is mislocated (false peak). Therefore, the registration result using the conventional LPT in this case yields errors in both scale and rotation, as shown in the lower-right image in Fig. 3.1(a). Since the method is highly sensitive to changes in the image, especially in the area close to the fovea, the conventional LPT is not suitable for registering images that are occluded or altered.

3.1.2 The proposed APT approach

The objective of our proposed APT is to address the two major problems of the conventional LPT: the uneven sampling across the transformed image, and the computational waste due to oversampling at the fovea. Both problems arise because the number of samples of the conventional LPT increases exponentially from the periphery to the fovea. Hence, when the transformed image is used for matching, more consideration is given to the fovea than to the other areas of the image. To address these problems, we propose a novel APT approach that is robust in registering images that are subject to occlusion and alteration, while maintaining a sufficiently low computational cost during the transformation.

Figure 3.1: Example of inaccurate registration results using the conventional LPT due to occlusion in the target image: (a) the model image patch cropped from the Lena image is registered to the non-occluded target image on the upper right and the occluded target image on the lower right, where the red rectangles indicate the registration results, (b) the correlation coefficients when registering to the non-occluded target image, and (c) the correlation coefficients when registering to the occluded target image

Figure 3.2: APT mapping: (a) adaptive sampling in the spatial domain with more samples in the angular direction as the radius increases, (b) the resulting sample distribution in the angular and radius directions. Note that in (a), the distance between consecutive sampling points in the radius direction remains the same for all radius circumferences, and so does that in the angular direction

We denote nr and nθ as the numbers of samples in the radius and the angular directions, respectively. For APT, given a square image of size 2Rmax × 2Rmax pixels that we want to sample evenly and transform from the Cartesian to the polar coordinates, the parameter nr should be greater than or equal to Rmax. We denote Ri as the radius, in pixels, of sample i in the radius direction, and Ui as the sampling circumference at Ri. Hence Ri = i × Rmax/nr. In the conventional LPT and the polar transform, every Ui is sampled with the same fixed number of samples, nθ, to cover 360 degrees. To sample the image effectively, the parameter nθ needs to be adaptive. Since each circumference Ui covers approximately 2πRi pixels, the number of samples in the angular direction for each sample i in the radius direction, nθi, should be adjusted accordingly. For the sake of simplicity, we approximate 2πRi ≈ 8Ri. Hence, the effective values of the sampling parameters nr and nθi can be expressed as:

\[ n_r = R_{\max}, \qquad n_{\theta_i} = 8R_i. \tag{3.5} \]

The complete implementation to transform a square image with the size 2Rmax ×

2Rmax pixels in the Cartesian I(x, y) to adaptive polar image IP (r, θ) is shown below:

    for i = 1 to nr do
        for j = 1 to nθi do
            IP(i, j) = I(Rmax + Ri cos(2πj/nθi), Rmax + Ri sin(2πj/nθi))
        end for
    end for
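The loop above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function name `apt_transform` is hypothetical, nr = Rmax so that Ri = i and nθi = 8i, the first array axis is treated as x, nearest-neighbour lookup replaces interpolation, and coordinates that land on the image border are clipped.

```python
import numpy as np

def apt_transform(image, r_max):
    """Adaptive polar sampling sketch: ring i has radius R_i = i and 8*i samples."""
    rings = []
    for i in range(1, r_max + 1):                 # n_r = R_max, hence R_i = i
        n_theta_i = 8 * i                         # Eq. (3.5)
        angles = 2.0 * np.pi * np.arange(1, n_theta_i + 1) / n_theta_i
        # Assumption: nearest-neighbour lookup instead of interpolation,
        # with clipping so that R_max + R_i stays inside the array.
        x = np.clip(np.rint(r_max + i * np.cos(angles)).astype(int), 0, image.shape[0] - 1)
        y = np.clip(np.rint(r_max + i * np.sin(angles)).astype(int), 0, image.shape[1] - 1)
        rings.append(image[x, y])
    return rings                                  # ragged: len(rings[i - 1]) == 8 * i
```

The result is kept as a list of one-dimensional arrays because the rings have different lengths, which corresponds to the step-like arrangement of sample bins.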

Figure 3.3: Examples of APT: (a) the original Lena image, (b) the APT transformed image of (a), (c) the scaled and rotated image of (a), and (d) the APT transformed image of (c)

As shown in Fig. 3.2(a), the number of samples in the angular direction of APT is distributed adaptively according to the size of the radius. The result of this step is a series of sample bins arranged in a step-like manner, as shown in Fig. 3.2(b); i.e., as r increases, the length of the sample bins (the number of samples in θ) increases. The total number of samples is smaller than what the conventional LPT methods need to achieve the same sampling resolution. Since sampling involves interpolation in both cases, which is computationally expensive, the reduction of samples in APT accelerates the computation. The results of applying APT to the Lena image and to the scaled and rotated version of Lena are shown in Fig. 3.3.
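A quick count under Eq. (3.5) makes the saving concrete. The sketch below assumes Rmax = 128 for a 256×256 image and takes the 613 × 805 sample grid quoted earlier for the conventional LPT; the function names are illustrative only.

```python
def apt_sample_count(r_max):
    """Total APT samples: ring i contributes 8*i samples, i = 1..r_max."""
    return sum(8 * i for i in range(1, r_max + 1))    # equals 4 * r_max * (r_max + 1)

def lpt_sample_count(n_rho, n_theta):
    """Total samples of a conventional log-polar grid."""
    return n_rho * n_theta

apt_total = apt_sample_count(128)         # 66048
lpt_total = lpt_sample_count(613, 805)    # 493465
ratio = lpt_total / apt_total             # roughly 7.5
```

Under these assumptions, APT needs about 66 thousand samples where the conventional LPT needs about 493 thousand, i.e. roughly 7.5 times fewer interpolations.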

3.1.3 Projection transform

As shown in Figs. 3.3(b) and 3.3(d), the result of adaptive sampling is a series of sample bins arranged in a step-like manner, which no longer exhibit the coordinate shifts that LPT produces for scaled and rotated images. To maintain the advantages of scale and rotation invariance in LPT, we use an innovative projection transform method which projects the two-dimensional image onto the radius and angular coordinates, respectively. From the two projections, we can accurately calculate the scale and rotation parameters; the details are elaborated in Section 3.2. For now we define the projection transform for the APT transformed image.

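Consider a transformed image IP(r, θ) that consists of nr sample bins, in which bin i has length nθi for i = 1, ..., nr. We denote nθ, ℜ, and Θ as the number of samples in the angular direction at Ri = Rmax, the projection on the radius coordinate, and the projection on the angular coordinate, respectively. The mathematical expressions of ℜ and Θ are as follows:

\[ \Re(i) = \Omega_i \sum_{j=1}^{n_{\theta_i}} IP(i, j) \tag{3.6} \]

\[ \Theta(j) = \sum_{i=1}^{n_r} \left[ \eta^1_{ij}\, IP\!\left(i, \mathrm{ceil}\!\left(\frac{j-1}{\Omega_i}\right)\right) + \eta^2_{ij}\, IP\!\left(i, \mathrm{ceil}\!\left(\frac{j}{\Omega_i}\right)\right) \right] \tag{3.7} \]

where

\[ \begin{cases} i \in [1, ..., n_r], \quad j \in [1, ..., n_\theta], \quad \Omega_i = \dfrac{n_\theta}{n_{\theta_i}} \\ \eta^1_{ij} = \Omega_i (j-1) - \mathrm{floor}[\Omega_i (j-1)], \quad \eta^2_{ij} = 1 - \eta^1_{ij} \\ IP(i, 0) = 0, \ \forall i \end{cases} \tag{3.8} \]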

The operation floor(A) denotes the largest integer less than or equal to A. The results of the projection transform, the vectors ℜ and Θ, have dimensions nr and nθ, respectively. Both projections ℜ and Θ are normalized to reduce the effect of illumination changes. The computation of the projection ℜ is simple and does not require interpolation. The computation of the projection Θ, on the other hand, requires one-dimensional interpolation, as shown in Eq. (3.7).

Figure 3.4: The effects of changes in scale and rotation in the Cartesian coordinates on the projections ℜ and Θ: (a) a scale change in the Cartesian coordinates (a scale factor of 1.2 is applied to the Lena image) appears as variable-scale in the projection ℜ, while the projection Θ becomes slightly altered, (b) a rotation in the Cartesian coordinates (a rotation of 45 degrees is applied to the Lena image) appears as shifting in the projection Θ, while there is no change in the projection ℜ, and (c) changes in both scale and rotation in the Cartesian coordinates (scale and rotation parameters of 1.2 and 45 degrees, respectively, are applied to the Lena image) appear as variable-scale in the projection ℜ and shifting in the projection Θ, respectively
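The two projections of Eqs. (3.6) to (3.8) can be sketched as follows. The code is a literal transcription of the formulas as printed, under a few assumptions: the function name `projections` is hypothetical, the transformed image is stored as a list of one-dimensional arrays (one per ring) whose last ring has nθ samples, and the normalization step mentioned in the text is left out because its exact form is not specified here.

```python
import numpy as np

def projections(rings):
    """Radius projection (Eq. 3.6) and angular projection (Eq. 3.7) of a ragged APT image.

    rings[i - 1] holds the n_theta_i samples of ring i; the last ring defines n_theta.
    """
    n_r = len(rings)
    n_theta = len(rings[-1])
    radius_proj = np.zeros(n_r)
    angle_proj = np.zeros(n_theta)
    j = np.arange(1, n_theta + 1)                      # 1-based angular index j
    for i, ring in enumerate(rings):
        ring = np.asarray(ring, dtype=float)
        n_i = len(ring)                                # n_theta_i, so Omega_i = n_theta / n_i
        radius_proj[i] = (n_theta / n_i) * ring.sum()  # Omega_i * sum_j IP(i, j)
        padded = np.concatenate(([0.0], ring))         # padded[0] plays the role of IP(i, 0) = 0
        lo = -((-(j - 1) * n_i) // n_theta)            # ceil((j - 1) / Omega_i), exact in integers
        hi = -((-j * n_i) // n_theta)                  # ceil(j / Omega_i)
        eta1 = (((j - 1) * n_theta) % n_i) / n_i       # fractional part of Omega_i * (j - 1)
        angle_proj += eta1 * padded[lo] + (1.0 - eta1) * padded[hi]
    return radius_proj, angle_proj
```

The ceilings and the fractional part are computed with integer arithmetic to avoid rounding problems when Ωi is not an integer. For an image of constant intensity 1, every entry of the radius projection equals nθ and every entry of the angular projection equals nr, which is a convenient sanity check.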

Examples of the projections of APT of the Lena image and its scaled and rotated versions are shown in Fig. 3.4. It can be seen that a scale change in the Cartesian coordinates appears as variable-scale in the projection ℜ (i.e., from f(t) to f(at)), and a rotation in the Cartesian coordinates appears as shifting in the projection Θ.

Figure 3.5: Examples of APT of images of a syrup bottle: (a) the reference image with the model image selected as the area inside the red circle, (b) the APT transformed image of (a), (c) the target image, and (d) the APT transformed image of the area inside the red circle of the target image

Another example of APT transformed images and their projections is shown in Figs. 3.5 and 3.6, using real images instead of the synthetic ones of Fig. 3.3. The reference image is shown in Fig. 3.5(a), where the syrup bottle is chosen as the model image, as indicated by the red circle. Fig. 3.5(b) shows the APT of the model image. The target image is shown in Fig. 3.5(c), and the APT of the area inside the red circle is shown in Fig. 3.5(d). To demonstrate the effect of scale and rotation on the projections, we use the same size of circular area to be transformed for both the reference image and the target image (Rmax is equal for both images). The projections ℜ and Θ of the APT of the syrup bottle in the reference image and the target image are shown in Figs. 3.6(a) and 3.6(b), respectively. Unlike the previous example with the Lena images, which involves only a small scale parameter of 1.2, this demonstration with the syrup bottle images involves a larger scale parameter of 1.84. Moreover, the syrup bottle images are non-synthetic: they were captured with a digital camera in an uncontrolled environment. This introduces various factors that affect the similarity between the two images as well as their projections, such as illumination changes, blur, and three-dimensional perspective. As a result, the alterations between the projections are more pronounced, which makes it more difficult to obtain the scale and rotation parameters by visual inspection alone. Moreover, if the center point of the APT does not correspond to that of the model image (the red circle area is moved), the alterations between the projections of both images become even greater. As a result, novel techniques for obtaining the scale, rotation, and translation between two images need to be introduced. These new techniques are discussed in detail in the next section.

Figure 3.6: The projections of the syrup bottle images: (a) comparison of the projections ℜ of the syrup bottle from the reference image and the target image, and (b) comparison of the projections Θ of the syrup bottle from the reference image and the target image. The differences between the projections of the two images are more pronounced and require additional techniques to resolve

3.2 The New Image Registration Approach Using APT

3.2.1 Localization

In order to fully exploit the advantages of APT, the translation parameter between the two images has to be found. The simplest solution is an exhaustive search, in which the APT is computed for every pixel in the target image and the translation parameter is found from the image pixel that yields the best match to the APT of the model image. Since an exhaustive search is computationally expensive, a method to reduce the search space is needed. Many works have proposed methods to speed up the search procedure. Fourier-Mellin [64, 69] uses the Fourier phase correlation to recover the translation before computing the log-polar matching in the frequency domain. However, a recent study [73] indicates an aliasing problem in using the phase magnitude of the FFT to recover the translation (of the center point). Zokai et al. [94] use a multi-resolution imaging technique to reduce the computational cost of translation recovery by searching from the coarsest to the finest level. This technique, however, performs well only when the target image contains enough low frequency information and that low frequency component is not distorted by noise. We introduce a new method to accelerate the search procedure, based on a reduced set of feature points.

In order to avoid the exhaustive search of the target image for the model image, we reduce the search from every pixel of the target image to a set of feature points only. These feature points are obtained by applying the Gabor wavelet transform to every pixel and selecting those pixels which generate high energy in the wavelet domain. The number of feature points is much smaller than the number of pixels in the target image, and the computation of the Gabor wavelet transform is much lower than that of APT. Thus, the computational load is much lighter than that of the exhaustive search using APT. We choose the Gabor wavelet for extracting feature points because of its invariance to scale, rotation, and noise. More details of the Gabor feature extraction method can be found in [47].

First, feature points are extracted from both the reference image and the target image. We denote $P^M = \{p^M_1, \ldots, p^M_{n_M}\}$ and $P^T = \{p^T_1, \ldots, p^T_{n_T}\}$ as the sets of feature points in the reference image and in the target image, respectively. The superscripts M and T denote the model (reference) image and the target image, and nM and nT are the numbers of feature points in the reference image and in the target image, respectively. To crop the model image out of the reference image, we manually select one of the feature points in the reference image, pMk, as the center point for APT and choose a radius Rmax for the transformation that sufficiently covers the model image. We then compute the APT and its projections, ℜM and ΘM. These projections represent the model image. We note that when the entire reference image is to be registered to the target image (the model image is the reference image), the feature point pMk can be selected automatically as the feature point located closest to the center of the model image. The radius Rmax can then be computed as the minimal distance from the feature point to the four boundaries of the model image along the x and y axes. To recover the translation between the model image and the target image, we need to find the feature point pTk′ in the target image that corresponds to the feature point pMk in the model image.
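The automatic choice of the center point and of Rmax described above can be sketched in a few lines. The helper names `closest_to_center` and `max_radius` are hypothetical, points are assumed to be (x, y) tuples with zero-based pixel coordinates, and the image size is assumed to be given as (width, height).

```python
def closest_to_center(points, image_size):
    """Pick the feature point nearest to the geometric center of the image."""
    width, height = image_size
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    return min(points, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)

def max_radius(center, image_size):
    """Minimal distance from the center point to the four image boundaries."""
    x, y = center
    width, height = image_size
    return min(x, y, width - 1 - x, height - 1 - y)

# Usage with made-up feature points on a 256 x 256 image.
points = [(10, 20), (120, 130), (200, 40)]
center = closest_to_center(points, (256, 256))   # (120, 130)
r_max = max_radius(center, (256, 256))           # 120
```

Taking the minimum over the four boundary distances keeps the whole sampling disc inside the image, so no ring has to be padded or clipped.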

Fig. 3.7 shows an example of the localization procedure. On the left is the reference image (Lena, with a size of 256 × 256 pixels) with 166 feature points identified. The model image is created by selecting one of the feature points in the reference image, pMk, as the origin, and is cropped to a circular image patch of radius Rmax that covers the area to be registered to the target image. On the right is the target image (Lena scaled by 0.7 and rotated by 30 degrees; the size of the image is 256 × 350 pixels) with 180 feature points identified. To recover the translation between the model image and the target image, we need to locate the corresponding feature point pTk′ in the target image. As shown in Fig. 3.7, the search space is limited to the set of feature points in the target image, which is approximately 0.2% of the total number of pixels in the target image in this example. Hence, the search space is much smaller than that of the exhaustive search strategy.

The number of feature points identified varies depending on the richness of detail in the image. In general, the number of feature points identified using the proposed method is approximately 0.1%-1% of the total number of pixels in the image. For most images, this number of feature points is sufficient to find the corresponding pixels, i.e., pMk in the model image and pTk′ in the target image. We trade off the number of feature points identified in the image against the computational load for localization (refer to the Gabor feature extraction method in [47]): the number of feature points should be large enough to detect the corresponding feature points, but not so large as to significantly increase the computational load.

3.2.2 APT matching

After extracting feature points in the model image and selecting one feature point pMk as the center point for computing the projections ℜM and ΘM using the APT approach, the next step is to find the corresponding feature point pTk′ in the target image and obtain both the scale and the rotation parameters between the two images. Given a set of feature points in the target image, and using the same radius Rmax for the APT as for computing the projections ℜM and ΘM, we use each feature point in the target image as a center point for computing APT, creating the sets of candidate projections $\Re^T = \{\Re^T_1, \ldots, \Re^T_{n_T}\}$ and $\Theta^T = \{\Theta^T_1, \ldots, \Theta^T_{n_T}\}$. By matching ℜM and ΘM with every member of the sets ℜT and ΘT, respectively, the translation, scale, and rotation parameters are obtained simultaneously. The matching results have three dimensions: the scale parameter, the rotation parameter, and the distance coefficient. The translation parameter is the offset between the location of pMk and the feature point pTk′ in the target image that yields the lowest distance coefficient.

Figure 3.7: Feature point extraction and localization procedure

For a given feature point pTz in the target image, we denote IM, ITz, ℜTz, and ΘTz as the model image, the image patch cropped from the target image to be matched with the model image, the projection of that patch on the radius coordinate, and the projection of that patch on the angular coordinate, respectively. A scale change in the Cartesian coordinates is reflected as variable-scale in the projection ℜ, and a rotation is reflected as shifting in the projection Θ. Matching mechanisms are therefore proposed to obtain both the scale and the rotation parameters. The mathematical expression of an image ITz that is a scaled and rotated version of the image IM is

\[ I^T_z(x, y) = I^M(a_z x \cos\theta_z + a_z y \sin\theta_z,\; -a_z x \sin\theta_z + a_z y \cos\theta_z). \tag{3.9} \]

In the polar domain, Eq. (3.9) can be expressed as:

\[ I^T_z(r, \theta) = I^M(a_z r, \theta - \theta_z) \tag{3.10} \]

where az and θz are the scale and the rotation parameters, respectively. As shown in Eqs. (3.9) and (3.10), the scale parameter az in the Cartesian coordinates appears as scaling with the same value in the radius direction of the polar transformed image.

For the case of adaptive sampling (APT), the effect of scaling in the Cartesian-to-APT mapping remains similar to that of the polar transform mapping, except that for APT the number of samples in the angular direction varies (is adaptive). To compute the 1D projection in the radius direction, ℜ(i), we introduce the multiplication variable Ωi to compensate for the effect of adaptive sampling in the angular direction, as shown in Eqs. (3.6) to (3.8). As a result, the 1D projection ℜ of APT is an approximation of that of the polar transformed image. From Eq. (3.6), we have:

\[ \Re^T_z(i) = \Omega_i \sum_{j=1}^{n_{\theta_i}} IP^T_z(i, j) \tag{3.11} \]

and

\[ \Re^M(i) = \Omega_i \sum_{j=1}^{n_{\theta_i}} IP^M(i, j). \tag{3.12} \]

We can compute ℜTz in terms of ℜM as follows:

\[ \Re^T_z(i) = \Omega_{a_z i} \sum_{j=1}^{n_{\theta_{a_z i}}} IP^M(a_z i, j), \tag{3.13} \]

\[ \Re^T_z(i) = \Re^M(a_z i). \tag{3.14} \]

Thus, the variable-scale az is a global and uniform scale for the 1D projection curve ℜ. As shown in Figs. 3.4(a) and 3.4(c), the projections Θ of the scaled images are slightly altered when compared with that of the original image. This is because the areas covered during the APT transformations are different as a result of the scaling. Hence, in order to accurately obtain the shifting parameter between the two projections ΘM and ΘTz, the variable-scale parameter az between the two projections ℜM and ℜTz needs to be obtained first.

Find scale parameter

There are several ways to obtain the scale parameter from the projections, depending on the requirements of the application in terms of computational cost, accuracy, image types, and environments. We introduce here three effective algorithms.

• Algorithm 1: Fourier method.

Based on the scaling property of the Fourier transform, given the Fourier transform X(ω) of a signal x(t), the Fourier transform of the scaled signal can be expressed as:

\[ x(t) \longleftrightarrow X(\omega) \tag{3.15} \]

\[ x(at) \longleftrightarrow \frac{1}{|a|} X\!\left(\frac{\omega}{a}\right). \tag{3.16} \]

Similar to the polar transform, the projection ℜTz computed from an image ITz that is a scaled version of the image IM with scale parameter az is also a scaled version of the projection ℜM with the same scale parameter az. From Eqs. (3.15) and (3.16), given the Fourier transform Γ(.) of the projection ℜ(.), we can obtain the scale parameter az by:

\[ \Re^T_z(i) = \Re^M(a_z i) \tag{3.17} \]

\[ |a_z| = \frac{\Gamma^M(0)}{\Gamma^T_z(0)}. \tag{3.18} \]

We note that this Fourier method performs effectively only when the scale parameter is small. A large scale parameter causes an aliasing problem. Hence, this algorithm is suitable for applications that require fast and accurate image registration while the scale parameter between the two images is expected to be small, such as medical image registration [45] or scene change detection.

• Algorithm 2: Logarithm method

The second algorithm uses the scale invariant property of the logarithm function. First, the logarithm function is applied to the projection ℜ, and the output is then quantized to maintain the original dimension of the projection. The mathematical expression of the implementation is as follows:

\[ \mathcal{L}\Re(k) = \Re\!\left(n_r \frac{\log k}{\log n_r}\right); \quad k = 1, ..., n_r. \tag{3.19} \]

Here $\mathcal{L}\Re(\cdot)$ denotes the logarithmic version of the projection ℜ. Given an image ITz that is a scaled and rotated version of the image IM, the scale parameter az between the two images appears as a translation in the logarithm domain:

\[ \mathcal{L}\Re^T_z(\rho) = \mathcal{L}\Re^M(\rho) + \log a_z. \tag{3.20} \]

To find the displacement d, where d = log az, such that $\mathcal{L}\Re^T_z(\rho) = \mathcal{L}\Re^M(\rho - d)$, one can evaluate the correlation function between the two logarithmic projections, $C(\mathcal{L}\Re^M, \mathcal{L}\Re^T_z)$:

\[ d = \arg\max C(\mathcal{L}\Re^M, \mathcal{L}\Re^T_z). \tag{3.21} \]

This algorithm yields high accuracy and fast matching under general conditions. However, similarly to the conventional LPT, this method can give inaccurate results if there are large changes at the fovea between the two images. Although this algorithm uses the logarithm in its computation, it differs from the image registration method in [94] in several respects. First, the correlation coefficient is not used to verify the global translation. Instead, the correlation function is used here to obtain the scale parameter only. Second, the minimum-square-error (MSE) between the projections Θ is used in the adaptive polar domain as a true similarity measure to verify the registered image pair. This will be discussed later in detail in Subsection 3.2.2.

• Algorithm 3: Resampling projection

Both Algorithms 1 and 2 are accurate only to a certain degree. As stated earlier, the accuracy of Algorithm 1 degrades as the scale parameter increases, as a result of aliasing between the two projections. In the case of Algorithm 2, although its accuracy is not affected by the size of the scale parameter, it yields inaccurate results when there are strong changes at the fovea between the two images. Moreover, the second algorithm estimates the displacement with integer accuracy only. If high precision is required, one may include an extra step after acquiring the estimated scale parameter, denoted as a′z, from either Algorithm 1 or Algorithm 2. Let aL and aU be the lower and upper limits of the search space for the scale parameter, such that aL < a′z < aU, and let h be the number of search steps. The search process can be described as follows:

    for i = 1 to h do
        ℜMres = Φ(ℜM, ai), where ai = aL + i(aU − aL)/h
        if ai ≥ 1 then
            Eℜ = sqrt( sum_{j=1}^{nr/ai} [ℜTz(j) − ℜMres(j)]^2 )
        else
            Eℜ = sqrt( sum_{j=1}^{nr} [ℜTz(j) − ℜMres(j)]^2 )
        end if
    end for

where the operation y = Φ(x, a) is a resampling procedure. This resampling procedure is a common operation in signal processing: it computes the sequence in the vector x at a times the original sampling rate by using a polyphase filter implementation. The operation Φ applies an anti-aliasing linear-phase (low-pass) FIR filter to x during the resampling process. The resulting vector has the dimension length(y) = length(x) × a. We normalize the resampled projection prior to comparison. The scale parameter is equal to the resampling parameter ai that yields the lowest MSE, Eℜ. The larger the parameter h, the finer the search grid and the higher the attainable accuracy.
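Algorithm 1 and the refinement search of Algorithm 3 can be sketched as follows. This is an illustration under explicit assumptions, not the implementation used in this work: the function names are hypothetical, the target projection is assumed to satisfy ℜT(i) = ℜM(a·i) as in Eq. (3.14), linear interpolation with `np.interp` stands in for the polyphase resampler Φ, and the error is averaged over the overlapping samples so that candidates with a shorter overlap are not favoured.

```python
import numpy as np

def scale_from_dc(proj_model, proj_target):
    """Algorithm 1 sketch: |a| = Gamma_M(0) / Gamma_T(0), the ratio of the DC terms."""
    return np.sum(proj_model) / np.sum(proj_target)

def refine_scale(proj_model, proj_target, a_low, a_high, steps):
    """Algorithm 3 sketch: grid search over a_i = a_low + i * (a_high - a_low) / steps."""
    n = len(proj_model)
    grid = np.arange(n, dtype=float)
    best_a, best_err = None, np.inf
    for i in range(1, steps + 1):
        a = a_low + i * (a_high - a_low) / steps
        # Keep only the indices whose scaled position stays inside the model projection.
        valid = grid[a * grid <= n - 1]
        if valid.size < 2:
            continue
        # Assumption: linear interpolation replaces the polyphase FIR resampler.
        resampled = np.interp(a * valid, grid, proj_model)
        err = np.sqrt(np.mean((proj_target[: valid.size] - resampled) ** 2))
        if err < best_err:
            best_a, best_err = a, err
    return best_a, best_err
```

In the experiments described later, the search interval is centred on the coarse estimate (aL = a′z − 0.5, aU = a′z + 0.5, h = 20), which corresponds to a step of 0.05 in this sketch.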

Find rotation parameter

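After the scale parameter is obtained, the next step is to find the rotation parameter θz. For the same sampling radius Rmax, the larger the image (az >> 1), the smaller the area of the scene that is covered in the sampling procedure. As a result, the magnitudes of the projections Θ of the two images can be slightly altered. This phenomenon is illustrated in Figs. 3.4(a) and 3.4(c). Hence, to accurately obtain the rotation parameter θz, Eq. (3.7) needs to be modified according to the scale parameter az:

\[ \Theta^M = \begin{cases} \Theta^M & \text{if } a_z \le 1 \\ \sum_{i=1}^{n_r/a_z} \left[ \eta^1_{ij}\, IP^M\!\left(i, \mathrm{ceil}\!\left(\frac{j-1}{\Omega_i}\right)\right) + \eta^2_{ij}\, IP^M\!\left(i, \mathrm{ceil}\!\left(\frac{j}{\Omega_i}\right)\right) \right] & \text{otherwise,} \end{cases} \tag{3.22} \]

\[ \Theta^T_z = \begin{cases} \Theta^T_z & \text{if } a_z \ge 1 \\ \sum_{i=1}^{a_z n_r} \left[ \eta^1_{ij}\, IP^T_z\!\left(i, \mathrm{ceil}\!\left(\frac{j-1}{\Omega_i}\right)\right) + \eta^2_{ij}\, IP^T_z\!\left(i, \mathrm{ceil}\!\left(\frac{j}{\Omega_i}\right)\right) \right] & \text{otherwise.} \end{cases} \tag{3.23} \]

Both projections are then resampled to be equal in length. The rotation parameter can be found by evaluating the correlation function C(ΘM, ΘTz):

\[ d = \arg\max C(\Theta^M, \Theta^T_z) \tag{3.24} \]

\[ \theta_z = \frac{2\pi d}{n_\theta}. \tag{3.25} \]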
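Eqs. (3.24) and (3.25) amount to locating the peak of a circular cross-correlation. A minimal sketch follows, assuming that the two angular projections already have equal length and that the correlation C is evaluated circularly through the FFT (the text does not prescribe how C is computed); the function name is hypothetical.

```python
import numpy as np

def rotation_from_shift(theta_model, theta_target):
    """Return the shift d and the angle 2*pi*d/n_theta that best align the projections."""
    n_theta = len(theta_model)
    spectrum = np.fft.fft(theta_target) * np.conj(np.fft.fft(theta_model))
    corr = np.fft.ifft(spectrum).real      # corr[d] = sum_i target[i] * model[i - d]
    d = int(np.argmax(corr))               # Eq. (3.24)
    return d, 2.0 * np.pi * d / n_theta    # Eq. (3.25)

# Usage: a projection shifted by 45 of 360 samples corresponds to a rotation of pi/4.
rng = np.random.default_rng(0)
model = rng.random(360)
shift, angle = rotation_from_shift(model, np.roll(model, 45))
```

The FFT form costs O(nθ log nθ) per candidate instead of O(nθ²) for a direct evaluation of every shift, which matters because the correlation is repeated for every feature point in the target image.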

Find distance coefficient

The final and crucial component resulting from APT matching is the distance coefficient, denoted as E. This distance coefficient indicates how large the difference between the two images is. The distance coefficient Ez between the model image IM and the target image ITz can be computed from the Euclidean distance between the projections ΘM and ΘTz:

\[ E_z = \sqrt{\sum_{i=1}^{n_\theta} \left[\Theta^T_z(i) - \Theta^M(i - d)\right]^2}. \tag{3.26} \]

For the proposed image registration approach, we search for the one feature point pTk′ in the target image that corresponds to the feature point pMk in the model image. This feature point pTk′ is the point that yields the lowest distance coefficient. We note that the projections ℜM and ℜTz are not used for computing the distance coefficient Ez, because the dimension of the projections that can be used in the computation varies depending on the scale parameter az (as seen in the computation of Eℜ in Algorithm 3 of Subsection 3.2.2). As a result, only the projections ΘM and ΘT are used in the computation.

3.2.3 Image comparison

This work extends the use of image registration to environments where images contain occlusion or alteration. Hence, it is important to be able to locate the area where such changes take place in the target image, which is useful for many applications such as medical image registration and scene change detection. Using the advantages of APT, we propose a fast and simple image comparison scheme that can effectively and automatically locate the altered area, i.e., the area where the registered image pair differs, without incurring additional computational cost.

Since the scale difference in the Cartesian coordinates between the model image and the target image yields different area coverage in the transformation process, the dimension of the projection ℜ used for comparing the two images needs to be adjusted accordingly. Given the scale and the rotation parameters between the model image and the target image as ak′ and θk′, respectively, we can find the altered area as a set of pixels (X, Y) in the Cartesian coordinates, where X = x1, ..., xt, Y = y1, ..., yt, and t is the number of pixels in the altered area, as follows:

\[ \varepsilon_\Re(i) = \|\Re^T_{k'}(i) - \Re^M_{res}(i)\|, \quad \begin{cases} \text{for } i = 1 \text{ to } n_r & \text{if } a_{k'} \ge 1 \\ \text{for } i = 1 \text{ to } a_{k'} n_r & \text{if } a_{k'} \le 1, \end{cases} \tag{3.27} \]

\[ \varepsilon_\Theta(j) = \|\Theta^T_{k'}(j) - \Theta^M(j - d)\|, \quad \text{for } j = 1 \text{ to } n_\theta. \tag{3.28} \]

We denote the location of pTk′ in the Cartesian coordinates as (cx, cy). For all εℜ(i) and εΘ(j) that are greater than or equal to ε, we can compute the set of image pixels in the altered area (X, Y) as follows:

\[ (X, Y) = \{(x_1, y_1), ..., (x_t, y_t)\}, \tag{3.29} \]

\[ \forall \varepsilon_\Re(i), \forall \varepsilon_\Theta(j) \ge \varepsilon; \quad X = c_x + \varepsilon_\Re(i) \sin \varepsilon_\Theta(j), \quad Y = c_y + \varepsilon_\Re(i) \cos \varepsilon_\Theta(j). \tag{3.30} \]

The threshold ε is specified by the user according to the desired sensitivity of the application to the changes between the two images. The recommended value is between 0.1 and 0.2.

It should be noted that image comparison to find the altered area between two images can be done with any registration algorithm, provided that registration can be achieved. Our method is robust in the sense that an image with an altered area, even at the center, can still be registered, while preserving the scale and rotation invariance advantages of LPT. Furthermore, the altered area can be identified without further computational cost. For most registration algorithms, comparing two images that are related by a two-dimensional geometric transformation requires additional post-processing computations, such as image alignment, a geometric transformation to bring both images to the same coordinates prior to the comparison, and image normalization. For the proposed image registration approach, since both the model image and the target image are already transformed to the projections of the APT domain, image comparison can be performed directly and simultaneously while obtaining the scale and rotation parameters.

3.2.4 Complete algorithm

Below are the complete steps of the proposed algorithm:

• Extract feature points in both the reference and the target images.

• Create the model image by selecting one of the feature points in the reference image, pMk, as the origin. Then crop the circular image patch IM of radius Rmax that covers the area to be registered to the target image.

• Compute the projections ℜM and ΘM using the proposed APT approach and the projection transform.

• Use each feature point pTz in the target image, for z = 1, ..., nT, where nT is the number of feature points in the target image, as the origin and crop the circular image patches ITz with radius Rmax.

• Compute the sets of candidate projections $\Re^T = \{\Re^T_1, \ldots, \Re^T_{n_T}\}$ and $\Theta^T = \{\Theta^T_1, \ldots, \Theta^T_{n_T}\}$.

• Match each candidate with the model using the proposed APT matching algorithm. The translation is given by the point (x, y) that yields the lowest distance coefficient. The scale and rotation parameters are obtained simultaneously in this step.

• Find the altered area (X, Y).
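The matching step of the list above can be sketched as a loop over the candidate angular projections that keeps the one with the lowest distance coefficient of Eq. (3.26). The sketch assumes that all projections have equal length and are already compensated for scale; the function name and the return format are hypothetical.

```python
import numpy as np

def match_candidates(theta_model, theta_candidates):
    """Return (index, shift, distance) of the candidate with the lowest distance E_z."""
    model_spectrum = np.conj(np.fft.fft(theta_model))
    best = (None, None, np.inf)
    for z, theta_t in enumerate(theta_candidates):
        # Circular cross-correlation; its peak gives the shift d of Eq. (3.24).
        corr = np.fft.ifft(np.fft.fft(theta_t) * model_spectrum).real
        d = int(np.argmax(corr))
        # Eq. (3.26): np.roll(theta_model, d)[i] equals theta_model[i - d].
        dist = float(np.sqrt(np.sum((theta_t - np.roll(theta_model, d)) ** 2)))
        if dist < best[2]:
            best = (z, d, dist)
    return best

# Usage with synthetic projections: candidate 1 is a shifted, slightly noisy model.
rng = np.random.default_rng(1)
model = rng.random(256)
candidates = [rng.random(256),
              np.roll(model, 30) + 0.01 * rng.random(256),
              rng.random(256)]
best_index, best_shift, best_dist = match_candidates(model, candidates)
```

The index of the winning candidate identifies the feature point pTk′, whose position relative to pMk gives the translation, while the shift converts to the rotation angle through Eq. (3.25).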


To evaluate the effectiveness of the proposed image registration approach, we

apply twelve different sets of test images. In Subsection 3.3.1, we present the reg-

istration of images that are rotated and scaled. We use five sets of test images in

the experiment. Different rotation and scale parameters are applied to three sets of

test images: squirrel, statue of liberty and bird using Matlab. Another two sets of

test images consisting of images of objects: speed limit sign, and syrup bottle taken

at different locations and at different illuminations are also presented in this Subsec-

tion. In Subsection 3.3.2, we present the registration of images that are subjected to

occlusion and alteration. We use seven sets of test images for the experiment. The

first two sets of test images: Dreese building and flower are rotated and scaled using

Matlab. Artificial occlusions generated from random numbers are included in the

test images. The next five sets of test images are non-artificial including the medi-

cal images of the human brain, and human lung taken before and after the medical

therapy, and occluded images of objects: cereal box, stop sign and OSU tower. To

compare the performance of the proposed approach, we repeat the experiments with

the conventional LPT based image registration approach (as proposed in [94]).

For the proposed approach, the parameters nr and nθ used in the experiments

are computed according to the size of Rmax used in the experiments as shown in

Eq. (3.5). The translation information is obtained using the proposed feature based

search scheme. To obtain the scale parameter, we use Algorithm 2, the logarithm method, to get the estimated scale parameter a′z, followed by Algorithm 3, resampling projection, with the parameters aL = a′z − 0.5, aU = a′z + 0.5, and h = 20. For the experiments with the conventional LPT, the parameters nρ and nθ

are selected according to Eq. (3.4). The translation information is obtained using the

multiresolution technique as suggested in [94]. To achieve accuracy at pixel level, we

apply a small search window of 5 × 5 pixels around the translation point for both

methods.

Figs. 3.8-3.19 each consist of three subfigures. The subfigure on the left is the reference image, where the model image is the area inside the red rectangle (when the entire reference image is used for registration, the model image is the reference image). The subfigure in the middle is the registration result using the conventional LPT, where the red rectangle indicates the registration result. The subfigure on the right is the registration result using our proposed APT based approach, where the red rectangle indicates the registration result and the green area indicates the altered area. Using the ground truth information, we measure and compare the accuracy of the proposed method and the conventional LPT, as shown in Tables 3.1 and 3.2.

All experiments are carried out using Matlab 2007a running on a 2 GHz Intel Pentium IV machine. The computation time required for registering an image pair varies depending on the image size, the richness of the texture content in the image, which influences the number of feature points detected in the image, and the size of Rmax. In our experiments, images with sizes of 353 × 261 pixels (as shown in Fig. 3.11(b)) and 768 × 1024 pixels (as shown in Fig. 3.10(b)) are registered in approximately 25 and 55 seconds, respectively, and the average runtime over all experiments is approximately 35 seconds. We found, however, that approximately 70 percent of the computation time is spent in the Gabor feature extraction process. Although computing the Gabor wavelet is much less expensive than an exhaustive search using APT, it still imposes a high computational cost, and speeding up the process is an ongoing research topic [71,87,88]. Existing works achieve real-time computation using parallel hardware structures designed for the computation of the Gabor wavelet [71,87], which could greatly improve the computation speed of our proposed approach. This topic is beyond our scope and is not included in this work.

3.3.1 Image registration in general conditions

We evaluate the performance of the proposed image registration approach in general conditions, in which the images are subjected to scale, rotation, and illumination changes. All images are taken with a digital camera under normal lighting conditions. Different scale and rotation parameters are applied to the first three images (squirrel, statue of liberty, and bird) using Matlab. The following two image sets, speed limit sign and syrup bottle, are taken at different locations, zoom levels, and viewpoints. In the first experiment, a rectangular area of 104 × 104 pixels is cropped out of the squirrel image and used as the reference image, as shown in Fig. 3.8(a). Since the entire area of the reference image is used in this test, the reference image is also the model image. The squirrel image is then scaled and rotated with parameters of 1 and 45 degrees, respectively, and used as the target image. Figs. 3.8(b) and 3.8(c) show the registration results using the conventional LPT based approach and the proposed APT approach, respectively. In the second and third experiments, with the test images statue of liberty and bird, a similar process is applied. Rectangular areas of 146 × 146 pixels and 170 × 170 pixels are cropped out of the statue of liberty and bird images, respectively, and used as reference images, as shown in Figs. 3.9(a) and 3.10(a). The scale and rotation parameters applied to the test images are 1.5 and 180 degrees for the statue of liberty image and 3 and 90 degrees for the bird image. The registration results using the conventional LPT based approach and the proposed APT approach for both test images are shown in Figs. 3.9(b), 3.9(c), 3.10(b), and 3.10(c).

Next, we use two sets of test images: speed limit sign and syrup bottle. Each set contains two different images of the same object taken at different locations, zoom levels, and viewpoints. For the experiment with the speed limit sign image, a model image 36 pixels in diameter containing the speed limit sign is selected, as shown in Fig. 3.11(a). To better illustrate the rotation between the selected model image and the registration result, we use rectangles instead of circles to represent the selected model image and the registration result. Figs. 3.11(b) and 3.11(c) show the registration results using the conventional LPT based approach and the proposed APT approach, respectively. The second experiment uses two images of the syrup bottle taken at different viewpoints and with different backgrounds. A model image 88 pixels in diameter is selected, as shown in Fig. 3.12(a). The registration results are shown in Figs. 3.12(b) and 3.12(c). As shown in Table 3.1, our proposed method is robust to scale, rotation, translation, and illumination changes. The accuracy of the registration using our proposed approach is comparable to that of the conventional LPT based method. However, the numbers of samples used in the proposed approach (i.e., nr = 52, 73, 85, 18, and 44 and nθ = 416, 584, 680, 144, and 352 samples for the experiments on squirrel, statue of liberty, bird, speed limit sign, and syrup bottle, respectively) are much smaller than those of the conventional LPT (i.e., nρ = 256, 512, 512, 64, and 256 and nθ = 2048, 4096, 4096, 512, and 2048 samples for the experiments on squirrel, statue of liberty, bird, speed limit sign, and syrup bottle, respectively). Therefore, in

Figure 3.8: Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of squirrel: (a) the reference image where the entire area is used as the model image, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (the red rectangle indicates the registration result)

general conditions, our proposed method yields nearly the same accuracy as the conventional LPT based approach, but with a much lower computational load.
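
To make the difference in sampling load concrete, the sample counts quoted above can be compared directly. The sketch below is a rough illustration, not part of the reported experiments: it assumes each method's sampling grid can be summarized as (radial samples) × (angular samples), which for the adaptive scheme is a simplification because inner rings may use fewer angular samples. The dictionary and function names are hypothetical.

```python
# Sample counts (radial, angular) quoted in Subsection 3.3.1.
APT_SAMPLES = {
    "squirrel": (52, 416),
    "statue of liberty": (73, 584),
    "bird": (85, 680),
    "speed limit sign": (18, 144),
    "syrup bottle": (44, 352),
}
LPT_SAMPLES = {
    "squirrel": (256, 2048),
    "statue of liberty": (512, 4096),
    "bird": (512, 4096),
    "speed limit sign": (64, 512),
    "syrup bottle": (256, 2048),
}

def grid_size(n_radial, n_angular):
    """Size of a full radial-by-angular grid (simplifying assumption)."""
    return n_radial * n_angular

def grid_ratio(name):
    """How many times larger the LPT grid is than the APT grid for one image."""
    return grid_size(*LPT_SAMPLES[name]) / grid_size(*APT_SAMPLES[name])

if __name__ == "__main__":
    for name in APT_SAMPLES:
        print(f"{name}: LPT grid is {grid_ratio(name):.1f}x larger")
```

Under this simplification the LPT grid is more than ten times larger than the APT grid in every one of the five experiments.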

3.3.2 Registration of images that are subjected to occlusion and alteration

In this Subsection, we evaluate the performance of the proposed approach for

images that are subjected to occlusion and alteration. First, two sets of test images:

Dreese building and flower are presented. Artificial occlusions are applied to both

test images. In the first experiment, the original Dreese building image with the

size of 200 × 200 pixels is used as the reference image, as shown in Fig. 3.13(a). The

Dreese building image is then scaled and rotated with the parameters of 0.8 and 45

degrees, respectively. Artificial occlusion is applied to the scaled and rotated image to

create the target image. Figs. 3.13(b) and 3.13(c) show the registration results using

the conventional LPT based approach and the proposed APT approach, respectively.

Figure 3.9: Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of statue of liberty: (a) the reference image where the entire area is used as the model image, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (the red rectangle indicates the registration result)

Figure 3.10: Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of bird: (a) the reference image where the entire area is used as the model image, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (the red rectangle indicates the registration result)

Figure 3.11: Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of speed limit sign: (a) the reference image with the model image selected as the area inside the red rectangle, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (the red rectangle indicates the registration result)

Figure 3.12: Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of syrup bottle: (a) the reference image with the model image selected as the area inside the red rectangle, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (the red rectangle indicates the registration result)

                                                Proposed method             Conventional LPT
Test image (size), ground truth                 Scale   Rotation  Trans.    Scale   Rotation  Trans.
                                                        (degree)  (pixel)           (degree)  (pixel)
squirrel (337×337), a = 1, θ = 45               0       5         0         0.2     0         0
statue of liberty (450×437), a = 1.5, θ = 180   0.1     0         0         0.2     0         0
bird (768×1024), a = 3, θ = 90                  0       1         0         0       0         0
speed limit sign (353×261), a = 0.65, θ = 26    0.02    4         2         0.06    1         2
syrup bottle (512×384), a = 1.8, θ = 30         0.04    0         0         0.1     0         0

Table 3.1: The image registration error compared to the ground truth in general conditions (lower is better)

It can be observed that the added artificial occlusion has texture and image intensity close to those of the original image. As a result, both the conventional LPT and our proposed approach yield accurate registration results. In the second experiment, flower, a rectangular area of 100 × 100 pixels is cropped out of the image and used as the reference image, as shown in Fig. 3.14(a). Scale and rotation parameters of 1.4 and 90 degrees are applied to the image, which is then used as the target image. The registration results using the conventional LPT based approach and the proposed APT approach are shown in Figs. 3.14(b) and 3.14(c), respectively. In this experiment, even though the occlusion is small, its texture and image intensity are very different from those of the original image. Moreover, the occlusion is located very close to the center point. Hence, the registration using the conventional LPT yields a large error. Our proposed method, on the other hand, still performs effectively and obtains an accurate registration result. In the third experiment, we use a digital camera to capture two images of a cereal box, with and without occlusion. For the non-occluded image, we set the zoom to 2x and rotate the camera by approximately 180 degrees relative to the occluded image. The registration results using the conventional LPT based approach and the proposed APT approach are shown in Figs. 3.15(b) and 3.15(c), respectively. Since the occluded area is large and located close to the center point, the conventional LPT method yields large errors in both the translation and scale parameters. Our method, on the other hand, still registers the image correctly in this condition.

In the fourth and fifth experiments, the MR scanned images of human brain (avail-

able at http://www.maths.manchester.ac.uk/∼shardlow/moir/rueckert.pdf 1) and the

CT scanned images of human lung taken from [68] are used as the test sets of images

that are subjected to alteration. Each test image is taken from the same patient with

the same image sensor but at different times. For the experiment with the human

brain image, a model image with the size of 240 pixels in diameter is selected, as

shown in Fig. 3.16(a). Figs. 3.16(b) and 3.16(c) show the registration results for the

conventional LPT based approach and the proposed APT approach, respectively. In

the experiment with the human lung image, a model image with the size of 260 pixels

in diameter is selected, as shown in Fig. 3.17(a). The registration results are shown

in Figs. 3.17(b) and 3.17(c). As shown in Figs. 3.16(b) and 3.17(b), the registration

results using the conventional LPT based approach are inaccurate. Both scale and ro-

tation parameters are incorrect, and the translation parameter is also incorrect. This

illustrates that the conventional LPT approach is sensitive to the alteration between

1 D. Rueckert, Applications of Medical Image Registration in Healthcare, Biomedical Research and Drug Discovery

Figure 3.13: Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of Dreese Building: (a) the reference image where the entire area is used as the model image, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (the red rectangle indicates the registration result and the green area indicates the altered area)

the model image and the target image. In Figs. 3.16(c) and 3.17(c), the registration results indicate that our proposed method is robust to image alteration. The registration results of the proposed approach are more accurate than those of the conventional LPT based approach. Moreover, the altered area is accurately identified, as shown by the green areas in Figs. 3.13(c), 3.14(c), 3.15(c), 3.16(c) and 3.17(c). For medical imaging, this ability is valuable because it can automatically identify the area that needs medical attention. This would help doctors assess the progress of surgery and shorten the time needed to analyze X-ray or CT scans. We note that a threshold ε of 0.1 is used in these five experiments.

We also experiment with images that are subjected to non-artificial occlusion arising from photos taken at different viewpoints. Two sets of test images, OSU tower and stop sign, taken around the Ohio State University campus, are used for the experiment. For the experiment with images of OSU tower, a model

Figure 3.14: Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of flower: (a) the reference image where the entire area is used as the model image, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (the red rectangle indicates the registration result and the green area indicates the altered area)

Figure 3.15: Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of cereal box: (a) the reference image where the entire area is used as the model image, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (the red rectangle indicates the registration result and the green area indicates the altered area)

Figure 3.16: Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of human brain: (a) the reference image with the model image selected as the area inside the red rectangle, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (the red rectangle indicates the registration result and the green area indicates the altered area)

Figure 3.17: Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of human lung: (a) the reference image with the model image selected as the area inside the red rectangle, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (the red rectangle indicates the registration result and the green area indicates the altered area)

image with a diameter of 160 pixels is selected, as shown in Fig. 3.18(a), and for the experiment with images of stop sign, a model image with a diameter of 58 pixels is selected, as shown in Fig. 3.19(a). As shown in Figs. 3.18 and 3.19, the occluded areas in the target images are not only large compared to the model images but also located close to the center point of the transformation. As a result, the conventional LPT based approach fails to register the images correctly, as shown in Figs. 3.18(b) and 3.19(b). The scale, rotation, and translation parameters obtained from the conventional LPT based approach are far from correct. This illustrates how sensitive the conventional LPT is to image occlusion. Because our proposed approach samples the images evenly, which yields unbiased matching, it successfully registers both sets of test images, and the altered areas are also correctly identified, as shown in Figs. 3.18(c) and 3.19(c). We note that a threshold ε of 0.15 is used in these two experiments. The experiments with the seven sets of test images indicate that our proposed method outperforms the conventional LPT based approach and is robust to image occlusion and alteration.

In terms of the computational load, as in the experiments in Subsection 3.3.1, the numbers of samples used in the proposed APT approach (i.e., nr = 100, 50, 95, 120, 130, 80, and 29 samples and nθ = 800, 400, 760, 960, 1040, 640, and 232 samples for the experiments on Dreese building, flower, cereal box, human brain, human lung, OSU tower, and stop sign, respectively) are much smaller than those of the conventional LPT (i.e., nρ = 512, 256, 512, 1024, 1024, 512, and 128 samples and nθ = 4096, 2048, 4096, 8192, 8192, 4096, and 1024 samples for the experiments on Dreese building, flower, cereal box, human brain, human lung, OSU tower, and stop

Figure 3.18: Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of OSU tower: (a) the reference image with the model image selected as the area inside the red rectangle, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (the red rectangle indicates the registration result and the green area indicates the altered area)

sign, respectively). Therefore, not only can the proposed APT approach provide more

accurate registration for the images that are subjected to occlusion and alteration

than the conventional LPT based approach, but the computational load is also much

smaller.

3.3.3 Registration results and errors

The errors of the registration results compared to the ground truth for both the proposed method and the conventional LPT are shown in Tables 3.1 and 3.2. In normal conditions, both methods yield small errors and comparable accuracy, as shown in Table 3.1. However, only the proposed method can successfully register images that are altered or occluded, i.e., the images of flower, cereal box, human lung, OSU tower and stop sign, as shown in Table 3.2. This is because APT does not concentrate only on the fovea, as the conventional LPT does, but gives equal importance to the entire image.

Figure 3.19: Comparison of image registration results using the proposed approach and the conventional LPT based approach on the image of stop sign: (a) the reference image with the model image selected as the area inside the red rectangle, (b) registration result using the conventional LPT based approach, and (c) registration result using the proposed approach (the red rectangle indicates the registration result and the green area indicates the altered area)

                                                Proposed method             Conventional LPT
Test image (size), ground truth                 Scale   Rotation  Trans.    Scale   Rotation  Trans.
                                                        (degree)  (pixel)           (degree)  (pixel)
Dreese building (241×241), a = 0.8, θ = 45      0.05    5         2         0.05    0         0
flower (560×420), a = 1.4, θ = 90               0.05    3         5         0.35    0         5
cereal box (255×340), a = 0.5, θ = 180          0       3         2         0.2     8         55
human brain (276×276), a = 1.13, θ = 0          0.01    0         0         0.12    3         0
human lung (260×220), a = 1.1, θ = 0            0       0         0         0       4         8
OSU tower (275×198), a = 2, θ = 22              0       1         4         0.14    40        13
stop sign (424×308), a = 1.1, θ = 4             0.02    2         2         0.1     30        56

Table 3.2: The image registration error compared to the ground truth when images are altered or occluded (lower is better)

3.4 Extended Works

Motivated by the success of the APT based approach, we further propose an innovative solution that eliminates the interpolation needed in the mapping process. This extended work uses a one-to-one mapping mechanism that directly arranges the image from Cartesian to polar coordinates according to a pre-computed APT image. To transform a circular area of radius Rmax inside the image I(x, y) in Cartesian coordinates to an APT image, we first need to create a PPT (Projective Polar Transform) map. We define a distance parameter D(xc,yc)(x, y) as

$$D_{(x_c,y_c)}(x, y) = \max\left\{ n \in \mathbb{Z}^{+} \;\middle|\; n \le \sqrt{(x - x_c)^2 + (y - y_c)^2} \right\} \tag{3.31}$$

where (xc, yc) is the image pixel in Cartesian coordinates selected as the center of the transformation. Fig. 3.20(a) shows an example of the distance parameters for an image of size 10 × 10 pixels with the center point (xc, yc) = (7, 7). Each pixel of the image is filled with its calculated distance parameter. Fig. 3.20(b) is a graphical illustration of the PPT map for an image of size 513 × 513 pixels with the center of the transformation (xc, yc) = (256, 256). As shown in Fig. 3.20(b), the PPT map is an approximation of the polar transform mapping.
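
As an illustration, the distance parameter of Eq. (3.31) amounts to taking the floor of the Euclidean distance to the center, so the whole map can be computed in one pass. The following is a minimal sketch rather than the implementation used in the experiments; the function name `distance_map`, zero-based indexing, and the convention that x indexes columns and y indexes rows are assumptions. Under the floor, the center pixel itself receives 0, a case that Eq. (3.31) leaves undefined because no positive integer satisfies the condition there.

```python
import numpy as np

def distance_map(height, width, xc, yc):
    """Integer distance parameter D for every pixel, in the sense of Eq. (3.31).

    Assumptions: zero-based indices, x = column, y = row. The squared
    distance is an exact integer, so perfect squares (e.g. 25) give an
    exact square root and the floor is not affected by rounding.
    """
    y, x = np.mgrid[0:height, 0:width]
    dx, dy = x - xc, y - yc
    return np.floor(np.sqrt(dx * dx + dy * dy)).astype(int)

if __name__ == "__main__":
    # Same size and center as the example of Fig. 3.20(a), but zero-based.
    print(distance_map(10, 10, 7, 7))
```

Because the map depends only on the image size and the center, it can be computed once and reused for every transformation with the same geometry.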

To transform an image, we directly map the image pixels in Cartesian coordinates I(x, y) that have distance parameter D(xc,yc)(x, y) = i to the transformed coordinates IP (i, θ). To approximate sampling in both the radial and angular directions, for each iteration i, the mapping begins with I(xc, yc + i) and continues the search for image pixels with D(xc,yc)(x, y) = i pixel by pixel in the counter-clockwise direction. The process is repeated for i = 1, .., Rmax. An example of the transformed Lena image is

Figure 3.20: PPT map: (a) distance parameters for an image of size 10 × 10 pixels and (xc, yc) = (7, 7), (b) graphical illustration of the PPT map for an image of size 513 × 513 pixels and (xc, yc) = (256, 256)

Figure 3.21: Example of PPT: (a) the original Lena image, (b) the PPT transformed image of (a)

shown in Fig. 3.21. Since the PPT map is static and does not need to be recomputed for every transformation, and the mapping process is a simple one-to-one allocation from the Cartesian image to the PPT image that requires no interpolation, the computational cost of this step is very low compared to that of LPT. With the same 1D projection matching and feature based localization approaches applied, the new method yields a much lower computational cost than the APT approach.
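
The ring-by-ring mapping described above can be sketched as follows. This is an assumption-laden illustration, not the author's implementation: the function name `ppt_transform` is hypothetical, indices are zero-based with x as column and y as row, and instead of walking the ring pixel by pixel the sketch sorts the pixels of each ring by angle, starting at (xc, yc + i). The sweep is counter-clockwise in a frame where y points up; when the image is displayed with y pointing down, the same order appears clockwise.

```python
import numpy as np

def ppt_transform(image, xc, yc, r_max):
    """Return one 1D array per ring i = 1..r_max (no interpolation).

    Ring i holds the pixels whose distance parameter equals i, ordered by
    angle starting from the pixel (xc, yc + i). Assumes the disc of radius
    r_max around (xc, yc) lies inside the image.
    """
    h, w = image.shape
    y, x = np.mgrid[0:h, 0:w]
    dx, dy = x - xc, y - yc
    d = np.floor(np.sqrt(dx * dx + dy * dy)).astype(int)
    # Angle measured from the +y direction, increasing toward -x.
    angle = np.mod(np.arctan2(-dx, dy), 2 * np.pi)
    rings = []
    for i in range(1, r_max + 1):
        mask = d == i
        order = np.argsort(angle[mask], kind="stable")
        rings.append(image[mask][order])
    return rings
```

Each output value is copied from exactly one input pixel, which is why no interpolation is needed; the rings have different lengths because outer rings contain more pixels.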

3.5 Summary

Although LPT has been widely used in many image processing applications, it suffers from nonuniform sampling. Hence, the conventional LPT has two major problems: the high computational cost of the transformation process, which comes from oversampling at the fovea in the spatial domain, and biased matching, in which the matching mechanism focuses only on the fovea or central area while the peripheral area is given less consideration. Previous works on image registration using the conventional LPT report successful results under ideal conditions, such as when registering images with different orientations and scales. In reality, however, occlusions and alterations between the two images need to be taken into consideration. Motivated by this fact, this chapter presents a new image registration algorithm based on the novel APT approach. By evenly and effectively sampling the image in the Cartesian coordinates and using the innovative projection transform to reduce the dimensions, the proposed APT yields faster sampling than the conventional LPT and provides more effective and unbiased matching. A matching search scheme using the scale and rotation invariant Gabor feature points is introduced to reduce the search space for recovering the translation of the model image in the target image. An image comparison scheme in the APT projection domain is then proposed to effectively locate the occlusion and alteration between the two images without sacrificing additional computational cost. This information is useful for applications that further analyze the registered images, such as medical image analysis or scene change detection. Experimental results indicate that our proposed image registration approach outperforms the conventional LPT based approach and is robust to image occlusion and alteration.

CHAPTER 4

SCALE AND ROTATION INVARIANT IMAGE REGISTRATION USING PRE-SHIFTED LOGARITHMIC SPIRAL

LPT is a well known tool for image processing because of its rotation and scale invariant properties [3, 10, 34, 47, 57, 60, 76, 77, 89, 94]. Scale and rotation in the Cartesian coordinates appear as translation or shifting in the log-polar domain. These invariant properties make LPT based image registration robust to scale and rotation. However, the LPT based approach has a limitation: the accuracy of the registration depends directly on the number of samples used in the mapping process. For example, if 36 samples are used in the angular direction, the registration accuracy cannot exceed a resolution of 10 degrees. Since obtaining the scale and rotation parameters involves a 2D correlation, either in the spatial domain or in the frequency domain, the computational complexity of the matching procedure grows rapidly as the number of samples increases. This problem of LPT has been extensively studied in the past decade. The work in [40] introduces a pseudo-LPT method that directly maps image pixels in the Cartesian coordinates to LPT images in the frequency space, reducing the computation time. In [48], the authors tackle the problem from the sampling distribution perspective, aiming to reduce the computation wasted on oversampling at the fovea in the transformation process.

Motivated by the success of the LPT based approach and its limitation, we propose a novel pre-shifted logarithmic spiral (PSLS) based image registration method that is robust to translation, scale, and rotation and yields high accuracy while requiring a lower computational cost. With the PSLS transform, rotation and scale in the Cartesian coordinates appear as translations in the transformed domain. The proposed method uses two separate sets of logarithmic spiral (LS) mappings in opposite directions. Both the model image and the target image are mapped twice, once with each of the LS mappings. By pre-shifting the sampling points of one of the mappings by π/nθ radians, where nθ is the number of samples in the angular direction, the registration accuracy is doubled. As a result, the proposed approach can achieve the same registration accuracy as that of the LPT while using half the number of samples in the angular direction. Therefore, the computational complexity of the matching process is greatly reduced. We further introduce a new search method that uses the scale and rotation invariant feature points to eliminate the exhaustive search over all possible translations of the model image.
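
The effect of the pre-shift on angular resolution can be checked numerically. The sketch below models only the two angular sampling grids, one plain and one shifted by π/nθ; it does not model the opposite-direction spiral mappings or the matching itself, and the function names are hypothetical.

```python
import numpy as np

def angular_grid(n_theta, pre_shift=False):
    """Sampling angles (radians) of one mapping, optionally shifted by pi/n_theta."""
    base = 2 * np.pi * np.arange(n_theta) / n_theta
    return base + (np.pi / n_theta if pre_shift else 0.0)

def combined_resolution_deg(n_theta):
    """Largest angular gap (degrees) in the union of the plain and shifted grids."""
    merged = np.sort(
        np.concatenate([angular_grid(n_theta), angular_grid(n_theta, True)])
    )
    # Close the circle so the gap between the last and first angle is counted.
    gaps = np.diff(np.append(merged, merged[0] + 2 * np.pi))
    return float(np.degrees(gaps.max()))

if __name__ == "__main__":
    print(360 / 36)                     # single grid with 36 angular samples
    print(combined_resolution_deg(18))  # two grids with 18 angular samples each
```

With nθ = 18, the two grids together are spaced 10 degrees apart, the same resolution that a single grid reaches only with 36 angular samples.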

This chapter is organized as follows. Section 4.1 presents the new LS and PSLS approaches and discusses their properties and advantages. Section 4.2 presents the image registration approach using PSLS. The proposed method consists of two major components: a feature point based search scheme, in which the translation between the model image and the target image is recovered, and PSLS matching, an effective mechanism that uses the PSLS transform to match the two images and accurately obtain the scale and rotation parameters between them. Experiments and results, including comparisons to the conventional LPT, are given in Section 4.3. Finally, our work is summarized in Section 4.4.

4.1 The Proposed Logarithmic Spiral Approach

LPT has been widely used in many image processing applications. The works in [48, 94] used LPT for scale and rotation recovery for image registration purposes. However, there are two major problems with the LPT based image registration method: inefficient sampling point distribution and high computational cost in the matching procedure. As shown in Fig. 1.1, the sampling points are distributed on multiple circles in which the distance between two consecutive circumferences increases exponentially as the points move further away from the center. As a result, image pixels that fall between the sampling circumferences are missed in the sampling process. Hence, some information in the radial direction is lost. To reduce the problems caused by nonuniform sampling, we propose an LS mapping method. Unlike LPT mapping, in which the distance from the center to the sampling circumference increases in a discrete, step-like manner, in LS mapping the distance from the center point to the sampling point increases continuously along the spiral. As a result, information in the radial direction is not entirely discarded, while the scale and rotation invariant properties of LPT are maintained. We discuss LS sampling in detail in Subsection 4.1.1.

Another problem of LPT is the high computational cost of the matching process. The accuracy of the registration depends directly on the number of samples used in the mapping process. Since obtaining the scale and rotation parameters involves a 2D correlation, either in the spatial domain or in the frequency domain, the computational complexity of the matching procedure grows rapidly as the number of samples increases. This problem of high computational cost of LPT has been extensively studied in the past decade. The work in [40] introduces a pseudo-LPT method that directly maps image pixels in the Cartesian coordinates to LPT images in the frequency space, reducing the computation time. In [48], the authors tackle the problem from the sampling point distribution perspective, aiming to reduce the computation wasted on oversampling at the fovea in the transformation process. In this chapter, we propose a novel pre-shifted sampling mechanism that significantly reduces the computational cost of the matching process. By pre-shifting the sampling points by π/nθ radians, the total number of samples in the angular direction for each mapping is reduced by half. As a result, the computational complexity of the matching process is greatly reduced. Combining LS mapping with the innovative pre-shifted sampling mechanism, we propose a novel pre-shifted logarithmic spiral (PSLS) based image registration method that is robust to translation, scale, and rotation and yields high accuracy while requiring a lower computational cost than LPT.

4.1.1 Logarithmic spiral mapping

A logarithmic spiral (LS) [4, 16, 66] is a unique kind of spiral curve that can
be found in nature, for example in a nautilus shell. In polar coordinates, the
mathematical expression of LS can be given as $r = c e^{b\theta}$, where $r$ is the
radius, $\theta$ is the orientation, and $b$ and $c$ are arbitrary constants controlling
the tightness of the spiral and the point where the spiral begins, respectively. For
the sake of simplicity, throughout this chapter we use $c = 1$ so that the spiral begins
at $r = 1$ when $\theta = 0$. One special property of LS is that the angle $\psi$ between
the tangent and the radial line at any point $p$ on the spiral is constant and equal to
$\cot^{-1}(b)$. Fig. 4.1 shows examples of LS. Applications of LS range from arts,
computer graphics and architecture [6,29], mathematics [7,25], the design of antennas
[17,33,41,42,75], and data fitting to the studies of plants and animals. To the best of
our knowledge, there is no research work on LS in image registration applications.

Figure 4.1: LS mapping with $n_r = 8$ and $n_\theta = 16$: (a) CW direction, (b) CCW direction, and (c) PSLS in the CCW direction

To apply LS to image mapping, new parameters need to be introduced. We denote
$R_{max}$ as the furthest distance from the center point of transformation $I(x_0, y_0)$
of the image in the Cartesian coordinates to the sampling point on the LS curve, $N$ as
the total number of samples on the entire LS curve, $n_\theta$ as the number of samples
within one turn of $2\pi$ radians on the curve, and $n_r = N/n_\theta$. The LS mapping
procedure $I(x, y) \mapsto S(\theta)$, where $S$ is a one-dimensional transformed vector,
is as follows:

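$$S(i) = I(x_i, y_i) \quad \text{for } i = 1, \ldots, N, \tag{4.1}$$

where

$$\begin{cases} x_i = e^{i\Delta\theta\cot\psi}\cos(i\Delta\theta) + x_0, \\ y_i = e^{i\Delta\theta\cot\psi}\sin(i\Delta\theta) + y_0, \end{cases} \qquad b = \frac{\log(R_{max})}{2\pi n_r}, \quad \psi = \cot^{-1}(b), \quad \Delta\theta = \frac{2\pi}{n_\theta}. \tag{4.2}$$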

The result of the transformation is a vector of dimension $N$. Fig. 4.3
shows examples of the LS transformed vector $S$ of the barbara image. Based on the
transformed vectors $S$ of the model image and the target image, we can obtain the
scale and rotation parameters between the two images. This topic is discussed in
Subsection 4.1.2.
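The mapping of Eqs. (4.1) and (4.2) can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the experiments: the function names, the `flip_y` flag (which selects the opposite spiral direction by negating the sine term), nearest-neighbour sampling, and the zero fill for points outside the image are all assumptions made for the example.

```python
import math

def ls_sample_points(r_max, n_r, n_theta, x0=0.0, y0=0.0, flip_y=False):
    """Return the N = n_r * n_theta sampling coordinates of Eq. (4.2)."""
    b = math.log(r_max) / (2.0 * math.pi * n_r)   # b = cot(psi)
    d_theta = 2.0 * math.pi / n_theta             # angular step
    sign = -1.0 if flip_y else 1.0                # assumed switch for the opposite direction
    points = []
    for i in range(1, n_r * n_theta + 1):
        theta = i * d_theta
        r = math.exp(b * theta)                   # radius grows continuously with theta
        points.append((r * math.cos(theta) + x0,
                       sign * r * math.sin(theta) + y0))
    return points

def ls_transform(image, r_max, n_r, n_theta, x0, y0, flip_y=False):
    """Sample image[row][col] along the spiral, giving the 1D vector S of Eq. (4.1)."""
    height, width = len(image), len(image[0])
    s = []
    for x, y in ls_sample_points(r_max, n_r, n_theta, x0, y0, flip_y):
        col, row = int(round(x)), int(round(y))   # nearest-neighbour lookup (assumption)
        inside = 0 <= row < height and 0 <= col < width
        s.append(image[row][col] if inside else 0)
    return s
```

For instance, `ls_sample_points(64, 16, 16)` returns $N = 256$ points, the last of which lies at distance $R_{max} = 64$ from the center.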

LS mapping yields significant advantages over LPT mapping in terms of the
effectiveness of the sampling point distribution. Fig. 4.2 compares the sampling point
distributions of the LPT and LS mappings using 192 samples ($n_r = 3$ and
$n_\theta = 64$). Figs. 4.2(a) and 4.2(b) show the mapping on the original Target logo
image and on the Target logo image scaled by 1.3, respectively. With LPT mapping, from
the center point outward, the sampling points in the radius direction (circles) in the
first image fall into the inner black region, the white region, and the outer black
region, respectively. In the second image, however, all the sampling points of the LPT
map fall into the inner black region and the outer black region only, and none falls
into the white region. As a result, the two images cannot be registered correctly. On
the other hand, when the sampling points are distributed in the spiral manner, as shown
in Figs. 4.2(a) and 4.2(b), the sampling points of the LS mapping fall into all three
regions in both images. As a result, with only a small number of samples, the LS method
can provide more efficient registration than the LPT method. Fig. 4.2(c) compares the
distance from the center to the sampling points for the LPT and LS methods. With LPT
mapping, the image is sampled only at specific radii, such as
$r = 1.41, 2, 2.82, 4, 5.65, 8, \ldots$, whereas image pixels at distances in between
are missed. With LS mapping, however, image pixels at every distance from the center
are sampled.
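The difference in radial coverage can be made concrete with a short sketch. The LPT ring radii are assumed here to follow $r_k = R_{max}^{k/n_r}$, and the values $R_{max} = 16$, $n_r = 8$, $n_\theta = 64$ in the usage note are chosen only because this combination produces radii of the form 2, 4, 8 quoted above; both the formula and the parameter values are assumptions of the example, as are the function names.

```python
import math

def lpt_radii(r_max, n_r):
    """Distinct sampling radii of an LPT grid (assumed form r_k = r_max ** (k / n_r))."""
    return [r_max ** (k / n_r) for k in range(1, n_r + 1)]

def ls_radii(r_max, n_r, n_theta):
    """Radii of the N = n_r * n_theta LS samples of Eq. (4.2), one per sample."""
    b = math.log(r_max) / (2.0 * math.pi * n_r)
    d_theta = 2.0 * math.pi / n_theta
    return [math.exp(b * i * d_theta) for i in range(1, n_r * n_theta + 1)]
```

With `lpt_radii(16, 8)` only 8 distinct radii are visited, whereas `ls_radii(16, 8, 64)` yields 512 strictly increasing radii that end at the same $R_{max}$.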

4.1.2 LS matching

Given a model image IM(x, y), a target image IT (x, y) that is a scaled and rotated

version of the model image with scale parameter a and rotation parameter θ can be

expressed mathematically as

$$I_T(x, y) = I_M(ax\cos\theta + ay\sin\theta,\; -ax\sin\theta + ay\cos\theta). \tag{4.3}$$

In the log-polar domain, with $(\log r, \phi)$ denoting the log-polar coordinates, Eq. (4.3) can be expressed as:

$$I_T(\log r, \phi) = I_M(\log r + \log a,\; \phi - \theta). \tag{4.4}$$

Similar to LPT, rotation and scale in the Cartesian coordinates appear as translation
or shifting in the LS domain, as shown in Fig. 4.3. To find the displacement $d$
such that $S^T(i) = S^M(i - d)$, one can evaluate the phase-correlation function between
the vectors $S^M$ and $S^T$:

$$[d, m] = \arg\max \mathcal{P}(S^M, S^T), \tag{4.5}$$

where $\mathcal{P}$ and $m$ are the phase-correlation function and its peak value, respectively.
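As an illustration of Eq. (4.5), the sketch below recovers a circular shift between two vectors by phase correlation. It rests on several assumptions: the function names are hypothetical, the shift is treated as circular, and a plain $O(N^2)$ discrete Fourier transform is used only to keep the example self-contained; an FFT routine would normally take its place.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform (forward, or inverse with 1/n scaling)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

def phase_correlation(s_model, s_target):
    """Return (d, m): the circular shift d with s_target[i] ~ s_model[i - d], and the peak value m."""
    f_model, f_target = dft(s_model), dft(s_target)
    cross = []
    for fm, ft in zip(f_model, f_target):
        c = ft * fm.conjugate()                    # cross-power spectrum
        mag = abs(c)
        cross.append(c / mag if mag > 1e-12 else 0j)  # keep the phase only
    corr = [v.real for v in dft(cross, inverse=True)]
    d = max(range(len(corr)), key=corr.__getitem__)   # location of the correlation peak
    return d, corr[d]
```

For a target that is an exact circular shift of the model, the correlation is close to an impulse located at the shift, so the peak value approaches 1.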

However, $d$ can result from scale, rotation, or both. As a result, two LS
mappings in opposite directions, i.e. clockwise (CW) and counter-clockwise (CCW),
are needed. The second mapping is obtained by simply inverting the sign of the
non-constant component of $y_i$ in Eq. (4.2), i.e.
$x_i = e^{\theta_i\cot\psi}\cos\theta_i + x_0$ and
$y_i = -e^{\theta_i\cot\psi}\sin\theta_i + y_0$, where $\theta_i = i\Delta\theta$. To
obtain the scale and rotation parameters between the two images, each image needs to be
mapped with two LS mappings, CW and CCW, respectively. The transformed vectors of the
same direction are matched:

Figure 4.2: Comparison of the sampling point distributions of the LPT and LS mappings: (a) original Target logo image, (b) Target logo image with scale parameter 1.3, and (c) comparison of the locations of the sampling points from the center. [Panel (c) axes: sampling point # (horizontal) versus distance from center to the sampling point in pixels (vertical); curves: LPT, LS.]

$$[d_{cw}, m_{cw}] = \arg\max \mathcal{P}(S^M_{cw}, S^T_{cw}), \qquad [d_{ccw}, m_{ccw}] = \arg\max \mathcal{P}(S^M_{ccw}, S^T_{ccw}), \tag{4.6}$$

where the subscripts cw and ccw represent the direction of the mapping. With the
displacement parameters $d$ from both the CW and CCW directions obtained, we can
now distinguish the displacement resulting from rotation from the one resulting from
scale. We denote by $d_r$ and $d_s$ the displacement parameters from rotation and scale,
respectively. These two parameters can be obtained as follows:

Algorithm 1: Calculate $d_r$ and $d_s$ from $d_{cw}$ and $d_{ccw}$
  if $d_{cw} = d_{ccw}$ then
    $d_s = d_{cw}$ {Scale only}
  else if $d_{cw} = -d_{ccw}$ then
    $d_r = d_{cw}$ {Rotation only (with respect to the CW direction)}
  else if $|d_{cw}| \neq |d_{ccw}|$ then
    $d_{cw} = d_s + d_r$, $d_{ccw} = d_s - d_r$ {Both scale and rotation}
  end if
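All three branches of Algorithm 1 are consistent with the linear system $d_{cw} = d_s + d_r$, $d_{ccw} = d_s - d_r$, so the two displacements can be obtained in closed form. A minimal sketch, assuming signed displacements (the function name is hypothetical):

```python
def split_displacement(d_cw, d_ccw):
    """Solve d_cw = d_s + d_r and d_ccw = d_s - d_r for (d_s, d_r)."""
    d_s = (d_cw + d_ccw) / 2   # common part of the two shifts: scale
    d_r = (d_cw - d_ccw) / 2   # part that changes sign with direction: rotation
    return d_s, d_r
```

The scale-only case ($d_{cw} = d_{ccw}$) gives $d_r = 0$ and the rotation-only case ($d_{cw} = -d_{ccw}$) gives $d_s = 0$, matching the first two branches. When $d_{cw} + d_{ccw}$ is odd the result is a half-integer, which a practical implementation would have to round.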

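Finally, the scale and rotation parameters in the Cartesian coordinates, $a$ and $\theta$, can
be computed as:

$$a = \exp\left[\frac{\tilde{d}_s \log(R_{max})}{n_r}\right], \qquad \theta = \frac{2\pi d_r}{n_\theta}, \tag{4.7}$$

where $\tilde{d}_s = \mathrm{ceil}[d_s/n_\theta]$, and the operation $\mathrm{ceil}(A)$ denotes the smallest integer greater
than or equal to $A$.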
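Eq. (4.7) translates directly into code. The sketch below follows the equation as stated, including the ceiling applied to $d_s/n_\theta$; the function and variable names are illustrative assumptions.

```python
import math

def scale_and_rotation(d_s, d_r, r_max, n_r, n_theta):
    """Convert LS-domain displacements into the scale a and the rotation theta (radians)."""
    d_s_turns = math.ceil(d_s / n_theta)             # ceiling of d_s / n_theta, as in Eq. (4.7)
    a = math.exp(d_s_turns * math.log(r_max) / n_r)  # scale parameter
    theta = 2.0 * math.pi * d_r / n_theta            # rotation parameter
    return a, theta
```

For example, with $n_r = n_\theta = 16$ and $R_{max} = 64$, a rotation displacement of $d_r = 4$ samples corresponds to $\theta = \pi/2$, i.e. 90 degrees.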

Figure 4.3: LS transformed vectors of barbara images, where $n_r = n_\theta = 16$ and $R_{max} = 64$: (a) from the original barbara image (this transformed vector is presented in (b)-(d) as a broken blue line for comparison), (b) from the barbara image with 90 degrees rotation, (c) from the barbara image with a scale parameter of 2, and (d) from the barbara image with scale and rotation parameters of 2 and 90 degrees, respectively

4.1.3 Pre-shifted logarithmic spiral

Both the LPT and LS methods involve correlation for matching, and the fastest and
most effective correlation technique is phase correlation, which involves the
computation of Fourier transforms. The computational complexity of the fast Fourier
transform is $O(N\log_2 N)$, where $N$ is the total number of samples. Since the LS
method requires mapping in both the CW and CCW directions, the computational cost of
the LS method is two times higher than that of the LPT approach for the same
registration accuracy. To reduce the computational complexity while maintaining the
same registration accuracy, we propose a novel PSLS technique. By modifying Eq.
(4.2), a pre-shifted LS mapping is given by:

$$x_i = e^{\theta_i\cot\psi}\cos(\theta_i + \pi/n_\theta) + x_0, \qquad y_i = e^{\theta_i\cot\psi}\sin(\theta_i + \pi/n_\theta) + y_0. \tag{4.8}$$

We denote by $S_{PSLS}(i)$ the transformed vector resulting from the pre-shifted LS map.
An example of PSLS mapping is shown in Fig. 4.1(c). Since the proposed approach
requires each image to be mapped with two separate spiral mappings, one in the CW
direction and another in the CCW direction, having one spiral mapping pre-shifted by
$\pi/n_\theta$ radians gives results similar to sampling the image with $2n_\theta$
samples in the angular direction. As a result, only half of the number of samples,
$N/2$, is needed for each mapping to maintain the same registration accuracy as that of
the LPT approach with $N$ samples. Since the computational cost of the matching process
using the Fourier transform method depends directly on the number of samples, the
computational cost is thus reduced. A comparison to the LPT approach is given in
Section 4.3.
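The effect of the pre-shift in Eq. (4.8) can be checked numerically: the angles visited by a non-shifted spiral and by a pre-shifted spiral interleave, so together they cover $2n_\theta$ distinct directions. The sketch below is illustrative only; the function names and the `pre_shift` flag are assumptions, and $\theta_i = i\Delta\theta$ is taken from Eq. (4.2).

```python
import math

def psls_sample_points(r_max, n_r, n_theta, x0=0.0, y0=0.0, pre_shift=True):
    """Sampling coordinates of Eq. (4.8); pre_shift=False falls back to Eq. (4.2)."""
    b = math.log(r_max) / (2.0 * math.pi * n_r)
    d_theta = 2.0 * math.pi / n_theta
    shift = math.pi / n_theta if pre_shift else 0.0   # half an angular step
    points = []
    for i in range(1, n_r * n_theta + 1):
        theta = i * d_theta
        r = math.exp(b * theta)
        points.append((r * math.cos(theta + shift) + x0,
                       r * math.sin(theta + shift) + y0))
    return points

def angle_bins(points, n_theta):
    """Indices (in units of pi / n_theta) of the directions visited around the origin."""
    step = math.pi / n_theta
    return {round(math.atan2(y, x) / step) % (2 * n_theta) for x, y in points}
```

With $n_\theta = 8$, the non-shifted spiral visits the even multiples of $\pi/8$ and the pre-shifted spiral visits the odd multiples, giving 16 directions in total.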

Figure 4.4: Comparison of sampling distributions: (a) LPT and LS mappings and (b) LPT and PSLS mappings

Another advantage of the PSLS approach is its effectiveness in distributing the
sampling points. Figs. 4.4(a) and 4.4(b) compare LPT mapping with LS mapping and PSLS
mapping for an image of size $33\times33$ pixels using 8 samples in the radius
direction and 64 samples in the angular direction. Unlike LPT mapping, in which the
sampling in the radius direction expands in a discrete manner, LS and PSLS mappings
expand the sampling points in the radius direction continuously. As a result, the LS
and PSLS methods sample the image at every radial distance. Thus, more information in
the radius direction of the image is obtained in the sampling process.

The effectiveness of the sampling distribution can also be demonstrated in terms of
the amount of information obtained from the sampling process. Since the distance from
the center to the sampling points of LPT, LS, and PSLS increases exponentially owing
to the logarithmic mapping, image pixels close to the center point tend to be
oversampled while image pixels that are further away tend to be undersampled or
missed. Since no information is gained when image pixels are oversampled,
computational waste and loss of information occur. To evaluate and compare the
performance of the three methods, we apply LPT, LS, and PSLS to images of size
$64\times64$ and $128\times128$, respectively, using 9 different values of the number
of samples. The sampling points for each method are recorded and compared. We
define the percentage of oversampling, $Q$, as

$$Q = \frac{n_{rep}}{n_{total}} \times 100, \tag{4.9}$$

where $n_{rep}$ is the number of image pixels that are sampled more than once, and
$n_{total}$ is the total number of samples. Fig. 4.5 compares the percentage of
oversampling of the LPT, LS, and PSLS mappings. It can be seen that both LS and
PSLS yield a lower oversampling rate than LPT. Thus, with the same number of samples,
both LS and PSLS sample more pixels than LPT. As a result, more information is
obtained.
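A possible way to evaluate Eq. (4.9) for a list of sampling coordinates is sketched below. It assumes that each sampling point is assigned to its nearest pixel and that $n_{rep}$ counts the pixels hit more than once; the function name and both conventions are assumptions of this example.

```python
from collections import Counter

def oversampling_rate(points):
    """Percentage of oversampling Q of Eq. (4.9) for a list of (x, y) sampling points."""
    hits = Counter((int(round(x)), int(round(y))) for x, y in points)  # nearest pixel per sample
    n_rep = sum(1 for count in hits.values() if count > 1)             # pixels sampled more than once
    n_total = len(points)                                              # total number of samples
    return 100.0 * n_rep / n_total
```

For example, four samples of which two fall on the same pixel give one repeated pixel and therefore $Q = 25$.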

4.2 The New Image Registration Approach

4.2.1 Feature point based search scheme

Similar to the LPT based approach, the translation parameter between the model image
and the target image has to be found in order to keep the scale and rotation invariant
properties. The simplest solution is to perform an exhaustive search, in which the
PSLS is computed for every pixel in the target image and the translation parameter
is found from the image pixel that yields the best matching result to the PSLS of the
model image. Since the exhaustive search is computationally expensive, a method to
reduce the search space is needed. Many works have proposed methods to speed up the
search procedure. Zokai et al. [94] use a multi-resolution imaging technique to reduce
the computation cost of translation recovery by searching from the coarsest to the
finest level. This technique, however, performs well only when the target image
contains enough low frequency information and that low frequency component is not
distorted by noise. Fourier-Mellin [64, 69] uses the Fourier phase correlation to
recover the translation before computing the log-polar matching in the frequency
domain. However, a recent study [73] indicates an aliasing problem from using the
phase magnitude of the FFT to recover the translation (of the center point). We
introduce a new method to accelerate the search procedure. The new method is based on
a reduced set of feature points.

Figure 4.5: Comparison of percentage of oversampling between LPT and PSLS mappings: (a) images with the size $64\times64$ ($R_{max} = 32$), and (b) images with the size $128\times128$ ($R_{max} = 64$). [Horizontal axis: number of samples, $n_r \times n_\theta$, from 8x8 to 128x128; vertical axis: percentage of oversampling (%); curves: LPT, LS, PSLS.]

In order to avoid the exhaustive search of the target image for the model image, we
reduce the search from every pixel of the target image to a set of feature points only.
These feature points are obtained by applying the Gabor wavelet transform to every
pixel and selecting those pixels that generate high energy in the wavelet domain. The
number of feature points is much smaller than the number of pixels in the target
image, and the computational cost of the Gabor wavelet transform is much lower than
that of PSLS. Thus, the computation load is much lighter than that of the exhaustive
search using PSLS. We choose the Gabor wavelet for extracting feature points because
of its invariance to scale, rotation, and noise. More details of the Gabor
feature extraction method can be found in [47].

First, feature points are extracted from both the reference image and the target
image. We denote $P^M = \{p^M_1, \ldots, p^M_{n_M}\}$ and
$P^T = \{p^T_1, \ldots, p^T_{n_T}\}$ as the sets of feature points in the reference
image and in the target image, respectively. The superscripts $M$ and $T$ denote the
model (reference) image and the target image, and $n_M$ and $n_T$ are the numbers of
feature points in the reference image and in the target image, respectively. To crop
the model image out of the reference image, we select one of the feature points in the
reference image, $p^M_k$, as the center point for PSLS and choose a radius $R_{max}$
for the transformation that sufficiently covers the model image. The resulting vectors
$S^M_{cw}$ and $S^M_{ccw}$ represent the model image. To recover the translation
between the model image and the target image, we need to find the feature point
$p^T_{k'}$ in the target image that corresponds to the feature point $p^M_k$ in the
model image. In other words, by selecting one of the feature points in the model image
as the center point for PSLS, $(x_0, y_0)$, we reduce the search from every pixel of
the target image to the set of feature points found in the target image only. The
matched feature point is the point that yields the highest correlation coefficient
$E$, where $E$ for matching the model image to the target image at feature point $z$
is $E_z = \max\{m_{cw-z}, m_{ccw-z}\}$.
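The selection rule can be sketched as a simple search over the candidate feature points. The data layout (a list of tuples holding a feature point together with its two correlation peaks) and the function name are assumptions made for this illustration; how the peaks themselves are computed is left to the matching step.

```python
def best_feature_match(candidates):
    """Pick the target feature point with the highest E_z = max(m_cw, m_ccw).

    candidates: iterable of (point, m_cw, m_ccw), one entry per target feature point z.
    Returns (point, E) for the best candidate, or (None, -inf) if there are none.
    """
    best_point, best_e = None, float("-inf")
    for point, m_cw, m_ccw in candidates:
        e_z = max(m_cw, m_ccw)          # correlation coefficient at feature point z
        if e_z > best_e:
            best_point, best_e = point, e_z
    return best_point, best_e
```

The returned point plays the role of the recovered translation, and the displacements associated with it are then used to compute the scale and rotation parameters.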


The number of feature points identified varies depending on the richness of details
in the image. In general, the number of feature points identified using the proposed
method is approximately 0.1%-1% of the total number of pixels in the image. For most
images, this number of feature points is sufficient to find the corresponding pixels,
i.e. $p^M_k$ in the model image and $p^T_{k'}$ in the target image. We trade off the
number of feature points identified in the image against the computational load for
localization (refer to the Gabor feature extraction method in [47]): the number of
feature points should not be so small that corresponding feature points cannot be
detected, nor so large that the computational load increases significantly.

4.2.2 PSLS matching

Similar to LS matching, each image is mapped with two LS mappings in the CW
and CCW directions, respectively. However, for the PSLS approach, one of the four
LS mappings is pre-shifted by $\pi/n_\theta$ radians. For instance, the LS map shown
in Fig. 4.1(a) is used for sampling both the model image and the target image, while
the LS mappings shown in Figs. 4.1(b) and 4.1(c) are used for sampling the model image
and the target image, respectively. The matching procedure and the computation
of the scale parameter $a$ remain the same as discussed in Subsection 4.1.2.
However, to find the rotation parameter $\theta$, we take into consideration the peak
value of the phase-correlation function as a similarity indicator of the two matchings:
between the two non-shifted LS vectors (CW matching in this example) and between the
non-shifted LS and the pre-shifted LS vectors (CCW matching in this example). The
procedure of this matching can be described as follows:

Algorithm 2: Calculate $\theta$
  if $m_{cw} \geq m_{ccw}$ then
    $\theta = 2\pi d_r / n_\theta$
  else if $m_{cw} < m_{ccw}$ then
    $\theta = 2\pi d_r / n_\theta - \pi/n_\theta$
  end if
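Algorithm 2 amounts to a single comparison of the two correlation peaks. A minimal sketch, with a hypothetical function name and the angle returned in radians:

```python
import math

def psls_rotation(d_r, m_cw, m_ccw, n_theta):
    """Rotation parameter theta following Algorithm 2."""
    theta = 2.0 * math.pi * d_r / n_theta
    if m_cw < m_ccw:
        # The match involving the pre-shifted spiral is the better one,
        # so the half-step offset pi / n_theta is subtracted.
        theta -= math.pi / n_theta
    return theta
```

The half-step correction is what allows rotations to be resolved in increments of $\pi/n_\theta$ rather than $2\pi/n_\theta$.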

With PSLS, we can obtain the same registration accuracy as with $n_\theta$ samples in
the angular direction by using only $n_\theta/2$ samples for each mapping. With this
reduction, the total number of samples for each LS mapping is reduced by half. As a
result, the computational cost of the matching using phase correlation is
significantly reduced.

4.2.3 Complete algorithm

The complete steps of the proposed algorithm are as follows:

• Extract features in both the model image and the target image.

• Create the model image by selecting one of the feature points in the reference
image, $p^M_k$, as the origin. Then crop a circular image patch $I^M$ of radius
$R_{max}$ that covers the area to be registered to the target image.

• Compute the LS vectors $S^M_{cw}$ and $S^M_{ccw}$ using the proposed LS mapping methods.

• Use each feature point in the target image, $p^T_z$ for $z = 1, \ldots, n_T$, where
$n_T$ is the number of feature points in the target image, as the origin and crop
circular image patches $I^T_z$ of radius $R_{max}$.

• Compute the sets of candidate PSLS vectors $S^T_{cw} = \{S^T_{cw-1}, \ldots, S^T_{cw-n_T}\}$ and
$S^T_{PSLS:ccw} = \{S^T_{PSLS:ccw-1}, \ldots, S^T_{PSLS:ccw-n_T}\}$.

• Match each candidate with the model using the proposed PSLS matching algorithm.
The translation is given by the point $(x, y)$ that yields the highest correlation
coefficient $E$; the scale and rotation parameters are then obtained from the
corresponding match.


4.3 Experimental Results

In this section, the performance of the proposed PSLS approach is evaluated in
two categories: registration accuracy and computation cost. All experiments
are carried out using Matlab 2007a running on a 2 GHz Intel Pentium IV machine.
The computation time required for registering an image pair varies depending on the
image size, the richness of texture content in the image, which influences the number
of feature points detected in the image, and the size of $R_{max}$.

4.3.1 Registration accuracy

To evaluate the effectiveness of the proposed image registration approach, we
use 20 test images, of which 10 images are randomly selected from the Ohio State
University database (http://sampl.ece.ohio-state.edu/database.htm), 5 images are
captured using a digital camera under general lighting conditions, and 5 images are
aerial images. We randomly crop a rectangular area from each of the test images to
form the set of model images. Each test image is then scaled by a random value between
0.5 and 3 and rotated by a random value between 30 and 180 degrees. We use
$n_r = n_\theta = 64$ samples for all the experiments, and $R_{max}$ is chosen
according to the size of the model image. The translation information is obtained
using the proposed feature based search scheme. The size of the search space using
Gabor feature points varies depending on the richness of details in the image, with an
average of 100-150 points, which amounts to approximately 0.2% of the total number of
pixels in the image. To achieve accuracy at the pixel level, we apply a small search
window of $5\times5$ pixels around the translation point. Fig. 4.6 shows examples of
the registration results using the proposed approach. In the experiments, the average
runtime is approximately 5-8 seconds for images of $400\times400$ to $800\times600$
pixels. Compared with the ground truth, our method yields average errors of 2.4 pixels
in translation, 0.08 in scale, and 2.9 degrees in rotation. This demonstrates the
effectiveness and robustness of the proposed PSLS image registration approach.

To compare the performance of our proposed method with LPT, we use three
images, barbara, lena and cameraman, as the target images, all of size
$256\times256$ pixels. A set of 10 model image patches of size $128\times128$ pixels
is randomly cropped from each of the test images. We randomly apply one of 30
different orientations, $\theta = \{5.63, \ldots, 168.9\}$ degrees, and one of 30
different scale parameters, $a = \{0.7, \ldots, 3.7\}$, to each of the 30 model images.
These model images are then registered to the target images using the proposed PSLS
approach and the LPT approach, both with the same parameters of $n_r = 64$,
$n_\theta = 32$, and $R_{max} = 80$. The translation between the model image and the
target image is obtained using the proposed feature point based search scheme.

Figs. 4.7(a)-4.7(d) show the registration results in obtaining the scale and rotation
parameters. As shown in Figs. 4.7(a) and 4.7(c), the proposed method yields higher
registration accuracy than the LPT approach with the same number of samples, and
Figs. 4.7(b) and 4.7(d) indicate that the proposed method yields a scale invariant
property similar to that of LPT. Therefore, our proposed method can provide better
registration accuracy than the LPT based method. This is because, with the same number
of samples in the angular direction, $n_\theta$, the proposed PSLS approach samples the
image at $\pi/n_\theta$ radian intervals, while the LPT approach samples the image at
$2\pi/n_\theta$ radian intervals. As a result, the proposed PSLS approach yields two
times better registration accuracy than that of LPT.

Figure 4.6: Image registration results using the proposed PSLS approach: (a) the reference images, where the entire area is used as the model image, (b) target images and their features, and (c) registration results (the red rectangle indicates the registration result)

Figure 4.7: Experimental results: (a) PSLS rotation test results, (b) PSLS scale test results, (c) LPT rotation test results, and (d) LPT scale test results. [Axes: experiment # (horizontal) versus rotation or scale (vertical); curves: actual value, experimental results.]

4.3.2 Computational cost

We evaluate the computational benefit of using the PSLS approach. We define the
computational benefit (CB), in percent, of using PSLS in performing the FFT as:

$$CB(\%) = \frac{T_{LPT} - T_{PSLS}}{T_{LPT}} \times 100, \tag{4.10}$$

where $T$ is the computational time in the experiments. In practice, the fast Fourier
transform in most mathematical software is computed using the FFTW library, which
applies the Cooley-Tukey algorithm to factorize $N$ into smaller components. FFTW has
been reported to be faster than conventional FFT implementations. However, as stated in
[22], there is currently no theory of how to predict the optimal factorization (optimal
plan) for each value of $N$, and the computation time varies across systems and
computational platforms. Fig. 4.8 shows the experimental computational benefit,
$CB_{exp}$, of applying FFTW to the PSLS approach for different numbers of samples,
using Matlab 2008a running on an Intel Core 2 Duo 2.0 GHz system. It can be seen that
PSLS yields a computational benefit over LPT of up to 40%. This indicates the
effectiveness of the proposed pre-shifted mechanism in reducing the computational cost
while maintaining the registration accuracy.
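Eq. (4.10) is straightforward to evaluate once the two timings are available. The helper below is a sketch with a hypothetical name; the timings in the usage note are made-up numbers used only to show the arithmetic, not measurements.

```python
def computational_benefit(t_lpt, t_psls):
    """Computational benefit CB (%) of Eq. (4.10) from two measured run times."""
    return 100.0 * (t_lpt - t_psls) / t_lpt
```

For example, hypothetical timings of 10 ms for LPT and 6 ms for PSLS give a benefit of 40%, while a PSLS timing larger than the LPT timing gives a negative value.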

Figure 4.8: Computational benefit using PSLS approach. [Horizontal axis: number of samples, from 16x16 to 1024x1024; vertical axis: computational benefit in percentage.]

4.4 Summary

Although LPT has been widely used in many image processing applications for
its rotation and scale invariant properties, there are two major problems with the
LPT based image registration method: inefficient sampling point distribution and
high computational cost in the matching procedure. In this chapter, we presented a
new image registration approach using PSLS. Unlike the LPT approach, in which the
sampling points in the radius direction are distributed as multiple circles and the
distance between two consecutive circumferences increases exponentially as the circles
get further away from the center, the distance from each sampling point to the center
point in the proposed approach increases continuously in a spiral manner. As a result,
information at every distance from the center can be obtained. Moreover, the spiral
mapping distributes the sampling points efficiently and reduces the problem of
oversampling at the fovea, while maintaining the scale and rotation invariant
properties of LPT. The proposed method employs two separate spiral mappings for each
image, one in the CW direction and another in the CCW direction. By pre-shifting one
of the spiral mappings by $\pi/n_\theta$ radians, we can register images with the same
accuracy as the LPT approach while using only half of the number of samples for each
mapping. As a result, the computational complexity in the matching process is greatly
reduced. An effective 1D matching mechanism is proposed to accurately obtain the scale
and rotation parameters of the registered images. Experiments are provided to
demonstrate the accuracy and robustness of the proposed approach.


CHAPTER 5

CONCLUSION AND FUTURE WORKS

Image registration is the process of aligning two images that share common visual
information, such as images of the same object or images of the same scene taken from
different geometric viewpoints, at different times, or by different image sensors.
Image registration is an essential step in many image processing applications that
involve multiple images for comparison, integration or analysis, such as image fusion,
image mosaics, image or scene change detection, and medical imaging. Many image
registration methods have been proposed in the past 20 years [5,8,35,39,45,64,93,94].
In the past decade, LPT has become a well-known tool for image registration for its
rotation and scale invariant properties [3, 10, 34, 47, 57, 60, 76, 77, 89, 94]. Scale
and rotation in the Cartesian coordinates appear as translation or shifting in the
log-polar domain. These invariant properties make LPT based image registration robust
to scale and rotation. The major work of this dissertation is to extend the horizon of
LPT and address the major problems of the method:

1. nonuniform sampling, which leads to high computational cost in the transformation process

2. biased matching

3. sensitivity to translation

4. high computational cost in the matching process

5. ineffective sampling point distribution due to oversampling in the spatial domain

In Chapter 2, we explore the use of LPT in 2D object recognition applications.
We combine a feature extraction technique with an innovative feature point filtering
approach to address the high sensitivity of LPT to translation and to reduce
the computational time in the translation recovery process. In Chapter 3, we tackle
the problems of LPT for image registration by focusing on the mapping procedure and on
how to reduce the effect of nonuniform sampling, which leads to many of the problems
listed above. One of the contributions of this dissertation is the new adaptive
sampling method that reduces the computational waste in the transform process due to
oversampling of the image pixels around the fovea of LPT. Along with the proposed
projection and 1D matching techniques, the proposed method greatly improves the
performance of LPT and incurs less computational cost.

In Chapter 4, we further examine LPT from a different perspective, considering not one
particular issue but LPT based image registration as a whole. Following the same line
as LPT, we introduce new image processing methods, LS and PSLS. At a glance, the new
techniques share some common characteristics with LPT. However, the actual process is
entirely different from that of LPT in the mathematical expression of the
transformations, the sampling point distribution, and the matching procedure.
Significant improvements in performance and accuracy and a reduction in computational
cost are obtained.

In closing, this dissertation provides an in-depth analysis of the LPT approach, its
advantages, problems and applications for image processing. Innovative techniques are
proposed to address the problems of LPT and to provide performance improvements for
the LPT method. However, the work is still far from a complete, perfect image
registration system. Each topic listed above is rich in content, and addressing even
one of them adequately is a challenging task. With demand for image registration
growing rapidly in military applications, the medical sector, and manufacturing, faster
and more accurate methods are always desired. Today, new photographic devices are
capable of much higher resolution than before, and real-time computation is in demand
for many practical applications. Future work needs to focus on reducing the sensitivity
of the methods to translation. Artificial intelligence techniques such as recognition
and retrieval can be integrated into the system to reduce the search time. With the
high registration accuracy and 2D invariant properties of LPT and PSLS, new
applications should be explored. With limitless possibilities, one day LPT and PSLS
may have their footprint in every area we can possibly imagine.


APPENDIX A

GABOR WAVELET FEATURE EXTRACTION

In order to keep the advantage of the scale and rotation invariant properties of LPT,
the translation parameter has to be found. FM uses Fourier phase correlation to recover
the translation before computing the log-polar matching in the frequency domain.
However, a recent study [94] indicates an aliasing problem from using the phase
magnitude of the FFT to recover the translation of an object (center point). Zokai et
al. [94] use a multi-resolution imaging technique to reduce the computation cost of
translation recovery by searching from the coarsest to the finest level. This
technique, however, performs well only when the target image contains enough low
frequency information and that low frequency component is not distorted by noise. To
address this problem, we introduce a scale and rotation invariant feature based search
scheme that not only suits our goal but also accelerates the search by avoiding the
exhaustive search. By convolving the image with Gabor wavelets
[14,20,23,27,32,54,59,71,85,86,88] at different scales and orientations, the Gabor
coefficients that reflect the frequency response of the image to the Gabor filter at
different scales and rotations are obtained, as shown below:

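$$g(x, y) = \left(\frac{1}{2\pi\sigma_x\sigma_y}\right)\exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right) + 2\pi j W x\right] \tag{A.1}$$

$$g_{mn}(x, y) = a^{-m} g(x', y'); \quad a > 1; \quad m = 1 : M; \quad n = 1 : N \tag{A.2}$$

$$\begin{bmatrix} x' \\ y' \end{bmatrix} = a^{-m} \begin{bmatrix} \cos\theta_n & \sin\theta_n \\ -\sin\theta_n & \cos\theta_n \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \tag{A.3}$$

$$C(x, y, m, n) = \iint I(x', y')\, g_{mn}(x - x', y - y')\, dx'\, dy' \tag{A.4}$$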

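where $\sigma_x$ and $\sigma_y$ are the standard deviations of the Gaussian envelope along the x and y axes, W is the modulation frequency of the complex sinusoid, I is the input image, M is the number of scales, and N is the number of orientations of the Gabor filter.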
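As an illustration of Eqs. (A.1) to (A.3), the following minimal sketch samples one scaled and rotated Gabor wavelet on a small pixel grid. It uses only the Python standard library. The orientation spacing $\theta_n = n\pi/N$, the default values of `a`, `sigma_x`, `sigma_y`, and `W`, and the function names are assumptions made for this example; they are not specified in the text above.

```python
import cmath
import math


def gabor_mother(x, y, sigma_x, sigma_y, W):
    """Mother Gabor function g(x, y) as written in Eq. (A.1)."""
    norm = 1.0 / (2.0 * math.pi * sigma_x * sigma_y)
    envelope = -0.5 * (x * x / sigma_x ** 2 + y * y / sigma_y ** 2)
    return norm * cmath.exp(envelope + 2j * math.pi * W * x)


def gabor_mn(x, y, m, n, N, a=2.0, sigma_x=2.0, sigma_y=2.0, W=0.25):
    """Scaled and rotated wavelet g_mn(x, y) following Eqs. (A.2)-(A.3).

    Assumption: orientations are evenly spaced, theta_n = n * pi / N.
    The default parameter values are illustrative only.
    """
    theta = n * math.pi / N
    s = a ** (-m)
    # Eq. (A.3): rotate by theta_n, then scale by a^-m.
    xp = s * (x * math.cos(theta) + y * math.sin(theta))
    yp = s * (-x * math.sin(theta) + y * math.cos(theta))
    # Eq. (A.2): amplitude factor a^-m applied to the mother function.
    return s * gabor_mother(xp, yp, sigma_x, sigma_y, W)


def gabor_kernel(m, n, N, half_size=8, **params):
    """Sample g_mn on a square grid; rows index y, columns index x."""
    coords = range(-half_size, half_size + 1)
    return [[gabor_mn(x, y, m, n, N, **params) for x in coords] for y in coords]
```

Calling `gabor_kernel(m=1, n=2, N=4)` returns a 17 by 17 grid of complex samples; convolving an image with such kernels for every (m, n) pair gives a discrete counterpart of the coefficients C(x, y, m, n) in Eq. (A.4).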

To obtain the Gabor features, the image coefficients are first normalized in order to reduce the effect of illumination changes. For every image pixel (x, y), we keep only the coefficients from the scale factor that gives the highest average frequency response over all orientations; the coefficients from the other scale factors are discarded. We then calculate the geometric mean of the remaining coefficients at that pixel, as shown below:

$$C(x, y) = \left(\sum_{n=1}^{N} C(x, y, m_s, n)\right)^{\frac{1}{n}} \tag{A.5}$$

where $m_s$ is the scale factor that gives the highest average frequency response over all orientations at pixel (x, y). Non-maxima suppression is then applied to select the local maxima as feature points. We note that in our approach, unlike feature-based image registration methods, the feature points do not represent the image or object. They are used only to accelerate the search.
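The selection steps described above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments. It assumes the coefficient magnitudes have already been normalized by the caller and are stored as `coeffs[m][n][y][x]`; it follows the wording "geometric mean" (the product of the N magnitudes raised to the power 1/N), which is one reading of Eq. (A.5); and it uses a 3 by 3 suppression window, which is an assumption since the window size is not stated.

```python
import math


def combine_scales(coeffs):
    """Per-pixel feature response from coeffs[m][n][y][x].

    coeffs holds non-negative, already normalized coefficient magnitudes.
    For each pixel, the scale with the highest average response over all
    orientations is kept, and the geometric mean over orientations at that
    scale is returned (assumed reading of Eq. (A.5)).
    """
    M, N = len(coeffs), len(coeffs[0])
    H, W = len(coeffs[0][0]), len(coeffs[0][0][0])
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            # Scale m_s with the highest summed (hence average) response.
            ms = max(range(M),
                     key=lambda m: sum(coeffs[m][n][y][x] for n in range(N)))
            vals = [coeffs[ms][n][y][x] for n in range(N)]
            # The geometric mean is zero if any magnitude is zero.
            if min(vals) > 0.0:
                out[y][x] = math.exp(sum(math.log(v) for v in vals) / N)
    return out


def non_maxima_suppression(response, radius=1):
    """Return (x, y) of pixels strictly greater than every window neighbour."""
    H, W = len(response), len(response[0])
    peaks = []
    for y in range(H):
        for x in range(W):
            v = response[y][x]
            if v <= 0.0:
                continue
            neighbours = [
                response[j][i]
                for j in range(max(0, y - radius), min(H, y + radius + 1))
                for i in range(max(0, x - radius), min(W, x + radius + 1))
                if (i, j) != (x, y)
            ]
            if all(v > u for u in neighbours):
                peaks.append((x, y))
    return peaks
```

The returned peak locations play the role of the candidate center points for the subsequent search; they are not used as descriptors of the image content.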

As illustrated in Figs. A.1(a) and A.1(b), the Gabor features give performance comparable to SIFT and Harris-Laplacian as scale and rotation invariant features.

[Figure A.1: three plots of repeatability (%) for Gabor features, SIFT, and Harris-Laplacian, against (a) orientation (degrees), (b) scale factor, and (c) Gaussian noise (variance).]

Figure A.1: Performance evaluations of the proposed Gabor features, SIFT [43], and Harris-Laplacian [50]: (a) orientation change, (b) scale change, (c) noise added.

However, in a noisy environment, the Gabor features outperform SIFT and Harris-Laplacian, as shown in Fig. A.1(c). We note that the performance of the feature extraction methods was evaluated in terms of repeatability, as suggested in [51]. A feature is considered repeatable if it is detected in the test image with a displacement offset of no more than 5 pixels.
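The 5-pixel repeatability criterion can be sketched as below. The function name `repeatability`, the use of a known ground-truth `transform` to map reference points into the test image, and counting a hit when any detected point lies within the tolerance are assumptions made for this example; the exact protocol of [51] may differ in details such as restricting the count to the region common to both images.

```python
import math


def repeatability(ref_points, test_points, transform, tol=5.0):
    """Percentage of reference features re-detected within tol pixels.

    ref_points:  (x, y) features detected in the reference image.
    test_points: (x, y) features detected in the test image.
    transform:   maps a reference (x, y) to its expected test-image position
                 (assumed known, e.g. the synthetic rotation or scaling).
    """
    if not ref_points:
        return 0.0
    hits = 0
    for p in ref_points:
        ex, ey = transform(p)
        # A feature is repeatable if some detection lies within tol pixels.
        if any(math.hypot(ex - qx, ey - qy) <= tol for qx, qy in test_points):
            hits += 1
    return 100.0 * hits / len(ref_points)
```

For a test image that is a pure shift of the reference, `transform` could be `lambda p: (p[0] + 3, p[1] - 2)`; for the rotation and scale experiments it would apply the corresponding similarity transform about the image center.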

BIBLIOGRAPHY

[1] J. Ahmed and M. N. Jafri. Improved phase correlation matching. In Proc. Image and Signal Processing, pages 128–135, 2008.

[2] H. Andreasson and T. Duckett. Topological localization for mobile robots using omni-directional vision and local features. In Proc. Intelligent Autonomous Vehicles, Jul 2004.

[3] H. Araujo and J. M. Dias. An introduction to the log-polar mapping. In Proc. Second Workshop on Cybernetic Vision, pages 139–144, Dec 1996.

[4] R. C. Archibald. The logarithmic spiral. Amer. Math. Monthly, 25:189–193, 1918.

[5] D. I. Barnea and H. F. Silverman. A class of algorithms for fast digital image registration. IEEE Trans. Computers, 21(2):179–186, Feb 1972.

[6] C. Baumgarten and G. Farin. Approximation of logarithmic spirals. Computer Aided Geometric Design, 14(6):515–532, 1997.

[7] A. Bottcher, Y. I. Karlovich, and V. S. Rabinovich. Emergence, persistence, and disappearance of logarithmic spirals in the spectra of singular integral operators. Integral Equations and Operator Theory, 25(4):406–444, Dec 1996.

[8] L. G. Brown. A survey of image registration techniques. ACM Computing Surveys, 24(4):325–376, Dec 1992.

[9] M. O. Caceres, A. K. Chattah, W.-H. Wang, and Y.-C. Chen. Image registration by control points pairing using the invariant properties of line segments. Pattern Recognition Letters, 18(13):269–281, Mar 1997.

[10] C. Capurro, F. Panerai, and G. Sandini. Vergence and tracking fusing log-polar images. In Proc. Intl. Conf. Pattern Recognition, volume 4, pages 740–744, Aug 1996.

[11] Q. S. Chen, M. Defrise, and F. Deconinck. Symmetric phase-only matched filtering of Fourier-Mellin transforms for image registration and recognition. IEEE Trans. Pattern Analysis and Machine Intelligence, 16(12):1156–1168, Dec 1994.

[12] D. Comaniciu, V. Ramesh, and P. Meer. Kernel-based object tracking. IEEE Trans. Pattern Analysis and Machine Intelligence, 25(5):564–577, May 2003.

[13] M. Comer and E. Delp. Segmentation of textured images using a multiresolution gaussian autoregressive model. IEEE Trans. Image Processing, 8(3):408–420, Mar 1999.

[14] J. Cook, V. Chandran, and S. Sridharan. Multiscale representation for 3-D face recognition. IEEE Trans. Information Forensics and Security, 2(3):529–536, Sep 2007.

[15] T. Darrell. A radial cumulative similarity transform for robust image correspondence. In Proc. Computer Vision and Pattern Recognition, pages 656–662, Jun 1998.

[16] P. J. Davis, W. Gautschi, and A. Iserles. Spirals: From Theodorus to Chaos. AK Peters, Ltd., Wellesley, Jun 1993.

[17] G. Deschamps and J. Dyson. The logarithmic spiral in a single-aperture multimode antenna system. IEEE Trans. Antennas and Propagation, 19(1):90–96, Jan 1971.

[18] R. M. Dufour, E. L. Miller, and N. P. Galatsanos. Template matching based object recognition with unknown geometric parameters. IEEE Trans. Image Processing, 11(12):1385–1396, Dec 2002.

[19] S. E. Englert, Z. Sheng, and R. L. Kirlin. The evaluation of a crosscorrelation pattern recognition technique for flow visualization. Pattern Recognition, 23(3–4):237–243, 1990.

[20] M. Feng and T. R. Reed. Motion estimation in the 3-D Gabor domain. IEEE Trans. Image Processing, 16(8):2038–2047, Aug 2007.

[21] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6):381–395, Jun 1981.

[22] M. Frigo and S. G. Johnson. FFTW: An adaptive software architecture for the FFT. In Proc. Int. Conf. Acoustics, Speech and Signal Processing, pages 1381–1384, May 1998.

[23] D. Gonzalez-Jimenez and J. L. Alba-Castro. Shape-driven Gabor jets for face description and authentication. IEEE Trans. Information Forensics and Security, 2(4):769–780, Dec 2007.

[24] A. Goshtasby and C. Shekhar. Point pattern matching using convex hull edges. IEEE Trans. Systems, Man and Cybernetics, 15:631–637, 1985.

[25] A. Gray, E. Abbena, and S. Salamon. Modern Differential Geometry of Curves and Surfaces with Mathematica, Third Edition. Chapman and Hall CRC, 2006.

[26] C. Harris and M. Stephens. A combined corner and edge detection. In Proc. the Fourth Alvey Vision Conference, pages 147–151, 1988.

[27] C. He, Y. F. Zheng, and S. C. Ahalt. Object tracking using the Gabor wavelet transform and the golden section algorithm. IEEE Trans. Multimedia, 4(4):528–538, Dec 2002.

[28] M. Helm. Towards automatic rectification of satellite images using feature based matching. In Proc. Intl. Conf. Geoscience and Remote Sensing Symposium, pages 2439–2442, Jun 1991.

[29] P. Hemenway and A. Ray. Divine Proportion: Φ in Art, Nature, and Science. Sterling Publisher, 2005.

[30] J.-W. Hsieh, H.-Y. M. Liao, K.-C. Fan, M.-T. Ko, and Y.-P. Hung. Image registration using a new edge-based approach. Computer Vision and Image Understanding, 67(2):112–130, 1997.

[31] Y. C. Hsieh, D. M. McKeown, and F. P. Perlant. Performance evaluation of scene registration and stereo matching for cartographic feature extraction. IEEE Trans. Pattern Analysis and Machine Intelligence, 14(2):214–238, Feb 1992.

[32] J. Ilonen, J.-K. Kamarainen, P. Paalanen, M. Hamouz, J. Kittler, and H. Kalviainen. Image feature localization by multiple hypothesis testing of Gabor features. IEEE Trans. Image Processing, 17(3):311–325, Mar 2008.

[33] O. Isik, K. P. Esselle, and Y. Ge. Compact microstrip and CPW duplexers using complementary and conventional logarithmic spiral resonators. In Proc. Antennas and Propagation Society International Symposium, pages 4977–4980, Jun 2007.

[34] F. Jurie. A new log-polar mapping for space variant imaging - application to face detection and tracking. Pattern Recognition, 32(5):865–875, May 1999.

[35] C. D. Kuglin and D. C. Hines. The phase correlation image alignment method. In Proc. Cybernetics and Society, pages 163–165, Sep 1975.

[36] B. V. K. V. Kumar, A. Mahalanobis, and R. D. Juday. Correlation pattern recognition. Cambridge University Press, New York, NY, USA, 2005.

[37] M. Lades, J. C. Vorbruggen, J. Buhmann, J. Lange, C. V. D. Malsburg, R. P. Wurtz, and W. Konen. Distortion invariant object recognition in the dynamic link architecture. IEEE Trans. Computers, 42:300–311, Mar 1993.

[38] D. A. LeMaster. A comparison of template matching registration methods for polarimetric imagery. In Proc. Aerospace Conference, pages 1–9, Mar 2008.

[39] J. P. Lewis. Fast normalized cross-correlation. In Proc. Vision Interface, pages 120–123, May 1995.

[40] H. Liu, B. Guo, and Z. Feng. Pseudo-log-polar Fourier transform for image registration. IEEE Trans. Signal Processing Letters, 13(1):17–20, Jan 2006.

[41] Z. Lou and J.-M. Jin. Modeling and simulation of broad-band antennas using the time-domain finite element method. IEEE Trans. Antennas and Propagation, 53(12):4099–4110, Dec 2005.

[42] Z. Lou and J. M. Jin. Time-domain finite-element modeling and simulation of broadband antennas. In Proc. Antennas and Propagation Society International Symposium, pages 2809–2812, Jul 2006.

[43] D. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, Nov 2004.

[44] H. Mahersia and K. Hamrouni. New rotation invariant features for texture classification. In Proc. Intl. Conf. Computer and Communication Engineering, pages 687–690, May 2008.

[45] J. B. A. Maintz and M. A. Viergever. A survey of medical image registration. Medical Image Analysis, 2(1):1–36, Mar 1998.

[46] B. S. Manjunath and C. Shekhar. A new approach to image feature detection with applications. Pattern Recognition, 29(4):627–640, Apr 1996.

[47] R. Matungka, Y. F. Zheng, and R. L. Ewing. 2D invariant object recognition using log-polar transform. In Proc. World Congress on Intelligent Control and Automation, pages 223–228, Jun 2008.

[48] R. Matungka, Y. F. Zheng, and R. L. Ewing. Image registration using adaptive polar transform. In Proc. Int. Conf. Image Processing, pages 2416–2419, Oct 2008.

[49] R. Matungka, Y. F. Zheng, and R. L. Ewing. Object recognition using log-polar wavelet mapping. In Proc. Intl. Conf. Tools with Artificial Intelligence, volume 2, pages 559–563, Nov 2008.

[50] K. Mikolajczyk and C. Schmid. Scale and affine invariant interest point detectors. International Journal of Computer Vision, 60(1):63–86, Oct 2004.

[51] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. IEEE Trans. Pattern Analysis and Machine Intelligence, 27(10):1615–1630, Oct 2005.

[52] S. Moss and E. R. Hancock. Image registration with shape mixtures. In Proc. Intl. Conf. Image Analysis and Processing, pages 172–179, London, UK, 1997. Springer-Verlag.

[53] S. Moss and E. R. Hancock. Multiple line-template matching with the EM algorithm. Pattern Recognition Letters, 18(11–13):1283–1292, Nov 1997.

[54] R. M. Mutelo, W. L. Woo, and S. S. Dlay. Discriminant analysis of the two-dimensional Gabor features for face recognition. Computer Vision, IET, 2(2):37–49, Jun 2008.

[55] A. N. Myrna, M. G. Venkateshmurthy, and C. G. Patil. Detection of region duplication forgery in digital images using wavelets and log-polar mapping. In Proc. Intl. Conf. Computational Intelligence and Multimedia Applications, volume 3, pages 371–377, Dec 2007.

[56] K. Noe, J. Mosegaard, K. Tanderup, and T. S. Sørensen. A framework for shape matching in deformable image registration. Studies in health technology and informatics, 132:333–335, Sep 2008.

[57] J. J. K. O Ruanaidh and T. Pun. Rotation, scale and translation invariant digital image watermarking. In Proc. Intl. Conf. Image Processing, pages 536–539, Oct 1997.

[58] G. A. Papakostas, Y. S. Boutalis, D. A. Karras, and B. G. Mertzios. Fast numerically stable computation of orthogonal Fourier Mellin moments. Computer Vision, IET, 1(1):11–16, Mar 2007.

[59] T. W. Parks and C. S. Burrus. Digital Filter Design. John Wiley and Sons, 1987.

[60] C. M. Pun and M. C. Lee. Log-polar wavelet energy signatures for rotation and scale invariant texture classification. IEEE Trans. Pattern Anal. Mach. Intell., 25(5):590–603, May 2003.

[61] S. P. Raman and U. B. Desai. 2D object recognition using Fourier Mellin transform and a MLP network. In Proc. Intl. Conference on Neural Networks, volume 4, pages 2154–2156, Dec 1995.

[62] M. G. Ramos and S. S. Hemami. Eigenfeatures coding of videoconferencing sequences. In Proc. Visual Communications and Image Processing, pages 100–110, Feb 1996.

[63] H. S. Ranganath and S. G. Shiva. Correlation of adjacent pixels for multiple image registration. IEEE Trans. Computers, 34(7):674–677, 1985.

[64] B. S. Reddy and B. N. Chatterji. An FFT-based technique for translation, rotation, and scale-invariant image registration. IEEE Trans. Image Processing, 5(8):1266–1271, Aug 1996.

[65] A. Said and W. A. Pearlman. An image multiresolution representation for lossless and lossy compression. IEEE Trans. Image Processing, 5(6):1303–1310, Sep 1996.

[66] H. Schmidt. Ausgewählte höhere Kurven. Kesselringsche Verlagsbuchhandlung, Wiesbaden, Germany, 1949.

[67] M. Sester, H. Hild, and D. Fritsch. Definition of groundcontrol features for image registration using GIS data. In Proc. Object Recognition and Scene Classification from Multispectral and Multisensor Pixels, pages 537–543, 1998.

[68] S. Y. E. Sharouni, H. B. Kal, and J. J. Battermann. Accelerated regrowth of non-small-cell lung tumours after induction chemotherapy. British Journal of Cancer, 89(12):2184–2189, Dec 2003.

[69] A. Y. Sheng, C. Lejeune, and H. H. Arsenault. Frequency-domain Fourier-Mellin descriptors for invariant pattern recognition. Optical Engineering, 27(5):354–357, May 1988.

[70] R. Song and J. Szymanski. Auto-sorting scheme for image ordering applications in image mosaicing. Electronics Letters, 44(13):798–799, Jun 2008.

[71] A. Spinei, D. Pellerin, D. Fernandes, and J. Herault. Fast hardware implementation of Gabor filter based motion estimation. Integrated Computer-Aided Engineering, 7(1):67–77, Jan 2000.

[72] G. C. Stockman, S. Kopstein, and S. Benett. Matching images to models for registration and object detection via clustering. IEEE Trans. Pattern Analysis and Machine Intelligence, 4(3):229–241, May 1982.

[73] H. S. Stone, B. Tao, and M. McGuire. Analysis of image registration noise due to rotationally dependent aliasing. Journal of Visual Communication and Image Representation, 14(2):114–135, Jun 2003.

[74] M. Tistarelli and G. Sandini. On the advantages of polar and log-polar mapping for direct estimation of time-to-impact from optical flow. IEEE Trans. Pattern Analysis and Machine Intelligence, 15(4):401–410, Apr 1993.

[75] H. Tomioka, M. Suhara, and T. Okumura. Analysis of edge effect on millimeter-sized planar logarithmic spiral antennas. In Proc. Asia-Pacific Microwave Conference, volume 5, pages 3–5, Dec 2005.

[76] V. J. Traver, A. Bernardino, P. Moreno, and J. S. Victor. Appearance-based object detection in space-variant images: a multi-model approach. In Proc. Intl. Conf. Image Analysis and Recognition, pages 538–546, Sep 2004.

[77] V. J. Traver and F. Pla. Dealing with 2D translation estimation in log-polar imagery. Image and Vision Computing, 21(2):145–160, Feb 2003.

[78] V. J. Traver and F. Pla. Similarity motion estimation and active tracking through spatial-domain projections on log-polar images. Computer Vision and Image Understanding, 97(2):209–241, 2005.

[79] V. J. Traver and F. Pla. Log-polar mapping template design: from task-level requirements to geometry parameters. Image and Vision Computing, 26(10):1354–1370, 2008.

[80] T. VanCourt, Y. Gu, and M. C. Herbordt. Three-dimensional template correlation: object recognition in 3D voxel data. In Proc. Computer Architecture for Machine Perception, pages 153–158, Jul 2005.

[81] T. Vincent and R. Laganiere. Matching feature points in stereo pairs: a comparative study of some matching strategies. Machine Graphics and Vision, 10(3):237–260, 2001.

[82] T. Vincent and R. Laganiere. Matching feature points for telerobotics. In Proc. Haptic Virtual Environments and Their Applications, pages 13–18, 2002.

[83] S. Wisetphanichkij and K. Dejhan. Multi-resolution image registration based on correlation technique. In Proc. Intl. Geoscience and Remote Sensing Symposium, volume 6, pages 3871–3875, Jul 2005.

[84] W. K. Wong, C. W. Choo, C. K. Loo, and J. P. Teh. FPGA implementation of log-polar mapping. In Proc. Intl. Conf. on Mechatronics and Machine Vision in Practice, pages 45–50, Dec 2008.

[85] J. Wu, G. An, and Q. Ruan. Independent Gabor analysis of discriminant features fusion for face recognition. IEEE Trans. Signal Processing Letters, 16(2):97–100, Feb 2009.

[86] X. Wu and B. Bhanu. Gabor wavelet representation for 3-D object recognition. IEEE Trans. Image Processing, 6(1):47–64, Jan 1997.

[87] B. Xiong, C. Zhu, C. Charoansak, and W. Yu. An FPGA prototyping of Gabor-wavelets transform for motion detection. International Journal of Computer Sciences and Engineering Systems, 1(1):65–70, Jan 2007.

[88] F. Yang and M. Paindavoine. Fast motion estimation based on spatio-temporal Gabor filters: parallel implementation on multi-dsp. In Proc. SPIE: Advanced Signal Processing Algorithms, Architectures, and Implementations, volume 4116, pages 454–462, Aug 2000.

[89] D. Young. Straight lines and circles in the log-polar image. In Proc. British Machine Vision Conference, pages 426–435, Sep 2000.

[90] F. Yuan, H. Zhang, and R. Jia. Digital image stabilization based on log-polar transform. In Proc. Intl. Conf. Image and Graphics, pages 769–773, Aug 2007.

[91] Z. Zhang, R. Deriche, O. D. Faugeras, and Q. T. Luong. A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry. Artificial Intelligence, 78(1–2):87–119, 1995.

[92] Q. Zheng and R. Chellappa. A computational vision approach to image registration. IEEE Trans. Image Processing, 2(3):311–326, Jul 1993.

[93] B. Zitova and J. Flusser. Image registration methods: a survey. Image and Vision Computing, 21(11):977–1000, Oct 2003.

[94] S. Zokai and G. Wolberg. Image registration using log-polar mappings for recovery of large-scale similarity and projective transformations. IEEE Trans. Image Processing, 14(10):1422–1434, Oct 2005.
