Automatic Matching of Multi-View Images

Ed Bremer

University of Rochester


2

References

[1] Mikolajczyk, K., Schmid, C., 2004, A performance evaluation of local descriptors, Submitted to PAMI, October 2004, http://lear.inrialpes.fr/pubs/2004/MS04a

[2] Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Van Gool, L., 2004, A comparison of affine region detectors, Submitted to International Journal of Computer Vision, August 2004, http://lear.inrialpes.fr/pubs/2004/MTSZMSKG04

[3] Lowe, D., 2004, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, 60(2), pages 91-110.

[4] Matas, J., Chum, O., Urban, M., Pajdla, T., 2002, Robust Wide Baseline Stereo From Maximally Stable Extremal Regions, Proc. British Machine Vision Conference (BMVC 2002), pages 384-393.

[5] Zisserman, A., Schaffalitzky, F., 2002, Multi-view matching for unordered image sets, or "How do I organize my holiday snaps?", Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, vol. 1, pages 414-431.

[6] Baumberg, A., 2000, Reliable Feature Matching Across Widely Separated Views, In Proc. CVPR, pages 774-781.

[7] Mikolajczyk, K., Schmid, C., 2001, Indexing based on scale invariant interest points, In Proc. 8th ICCV, pages 525-531.


3

Outline

Motivation

Applications

Process Components

Region Detectors

Descriptors

Matching Criteria

Performance Evaluation

Conclusion & Next Steps


4

Motivation

Multi-view/Multi-image Matching

Multiple images of a scene taken by a single camera or multiple cameras with different rotation, scale, viewpoint and illumination

3D scene


5

Motivation

Applications

… detecting matching regions is used in all the following

Image registration

Super-resolution

Stereo vision

Object detection and recognition

Object and motion tracking

Indexing and retrieval of objects

3D scene reconstruction

Scene recognition


6

Examples of Multi-view Images [2]



7

Process Components

Covariant region detection

Detect image regions covariant to the class of transformation between the reference image and the transformed image

Invariant descriptor

Compute invariant descriptors from the covariant regions

Descriptor matching

Compute the distance between descriptors in the reference image and the transformed image

[1] Mikolajczyk, K., Schmid, C., 2004, A performance evaluation of local descriptors, Submitted to PAMI, http://lear.inrialpes.fr/pubs/2004/MS04a


8

Region Detectors

Support regions for computation of descriptors

Determined independently in each image

Scale invariant or affine invariant

Can be points (feature points) or regions (covariant)

Provide dense (local) coverage, robust to occlusion

Need to be stable and repeatable

Five region detectors -

Harris points -> invariant to rotation

Harris-Laplace -> invariant to rotation and scale

Hessian-Laplace -> invariant to rotation and scale

Harris-Affine -> invariant to affine image transformations

Hessian-Affine -> invariant to affine image transformations

[1] Mikolajczyk, K., Schmid, C., 2004, A performance evaluation of local descriptors, Submitted to PAMI, http://lear.inrialpes.fr/pubs/2004/MS04a


9

Region Detectors

Harris points -

Maxima of the Harris function are used to locate interest points

Support region is fixed in size: a 41x41 neighborhood centered at the interest point

Harris-Laplace regions -

Interest points are located with the scale-adapted Harris function

Interest point scale is selected at a local minimum or maximum of the Laplacian-of-Gaussian across scale-space
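As an illustration of the Harris function mentioned above, here is a minimal sketch in plain Python. It assumes the common form R = det(M) - k * trace(M)^2, where M is the second-moment matrix of image gradients. The function name `harris_response`, the unweighted square window, and the default k = 0.04 are assumptions made for this sketch and are not taken from the slides.

```python
def harris_response(image, row, col, k=0.04, radius=1):
    """Harris measure R = det(M) - k * trace(M)^2 at pixel (row, col).

    M is the second-moment matrix of the gradients, summed over a
    (2*radius + 1)^2 window. Gradients use central differences.
    An unweighted window is a simplification (a Gaussian window is
    the more usual choice), and k = 0.04 is an assumed default.
    The pixel must lie at least radius + 1 pixels inside the image.
    """
    sxx = syy = sxy = 0.0
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            ix = (image[r][c + 1] - image[r][c - 1]) / 2.0
            iy = (image[r + 1][c] - image[r - 1][c]) / 2.0
            sxx += ix * ix
            syy += iy * iy
            sxy += ix * iy
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace


# Hypothetical 7x7 test images: a bright quadrant (corner) and a vertical edge.
corner = [[1.0 if (r >= 3 and c >= 3) else 0.0 for c in range(7)] for r in range(7)]
edge = [[1.0 if c >= 3 else 0.0 for c in range(7)] for r in range(7)]
print(harris_response(corner, 3, 3) > 0)  # corner: positive response
print(harris_response(edge, 3, 3) < 0)    # edge: negative response
```

Interest points would then be taken at local maxima of this response; a scale-adapted variant evaluates the same measure on images smoothed at several scales.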



10

Region Detectors

Harris-Laplace Performance -

Approximately 10% better than Laplacian, Lowe (DoG) or gradient methods

Standard Harris detector performs very poorly under scale changes

[7] Mikolajczyk, K., Schmid, C., 2001, Indexing based on scale invariant interest points, In Proc. 8th ICCV, Pages 525-531.


11

Region Detectors

Hessian-Laplace regions -

Interest points are local maxima of the Hessian determinant

Scale is selected at maxima of the Laplacian-of-Gaussian across scale-space (Difference-of-Gaussians can also be used)


[3] Lowe, D., 2004, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, 60(2), pages 91-110.


12

Region Detectors

Harris-Affine regions -

Find regions using the Harris-Laplace detector

Affine-adapted region based on the 2nd moment matrix

Hessian-Affine regions -

Find regions using the Hessian-Laplace detector

Affine-adapted region based on the 2nd moment matrix



13

Region Detectors

Regions produced by Harris-Affine and Hessian-Affine detectors



14

Region Detectors

Affine normalization using 2nd moment matrix for region L and R



15

Region Detectors

Region normalization

Detectors produce circular or elliptical regions

Size dependent on detection scale

Map regions to a circular region with constant radius

Rotate regions in the direction of the dominant gradient orientation

Illumination normalization

Model the illumination change as an affine transformation -> aI(x) + b

Normalize using the mean and standard deviation of pixel intensities
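The illumination normalization step can be sketched as follows. Subtracting the mean and dividing by the standard deviation cancels an affine intensity change aI(x) + b with a > 0. The function name `normalize_illumination` and the sample patch values are hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def normalize_illumination(patch):
    """Return the patch with zero mean and unit standard deviation.

    For any a > 0 and any b, the patch a*I + b normalizes to the
    same values as I, so the affine illumination change cancels.
    A constant patch (zero deviation) is mapped to all zeros.
    """
    flat = [p for row in patch for p in row]
    mu = mean(flat)
    sigma = pstdev(flat)
    if sigma == 0:
        return [[0.0 for _ in row] for row in patch]
    return [[(p - mu) / sigma for p in row] for row in patch]


# Hypothetical 2x2 patch and the same patch under a = 2, b = 5.
patch = [[10, 20], [30, 40]]
brighter = [[2 * p + 5 for p in row] for row in patch]
print(normalize_illumination(patch))
print(normalize_illumination(brighter))  # same values up to rounding
```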



16

Descriptors

Descriptors -> Feature vector

Invariant to changes in scale, rotation, affine transformation and affine illumination

Need to be distinct, stable and repeatable

Distribution (histogram) type or covariance type

Ten descriptor types

Scale-Invariant Feature Transform (SIFT)

Gradient Location and Orientation Histogram (GLOH)

Shape Context

Principal Component Analysis (PCA)-SIFT

Steerable Filters

Differential Invariants

Complex Filters

Moment Invariants

Cross-Correlation

Spin Image



17

Descriptors

SIFT and GLOH 3D Descriptors

SIFT -> 4 x 4 x 8 = 128 dimension descriptor

GLOH -> Log-polar [(2 x 8) + 1] x 16 = 272 dimension descriptor



18

Matching Criteria

Distance measure

Find putative matches between images

Mahalanobis distance: used for covariant descriptors

Euclidean distance: used for distribution (histogram) descriptors

Direct distance comparison is not suitable for indexing or database searching

Simple threshold

Descriptors match if the distance between them is below threshold t

A descriptor in the reference image can have many matches to descriptors in the transformed image

Nearest Neighbor (NN)

Find the closest match between descriptors in the reference and transformed images

A descriptor in the reference image can have only 1 match to a descriptor in the transformed image
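The two matching strategies can be compared with a short sketch using Euclidean distance. The function names and the toy 2-dimensional descriptors are hypothetical. Applying the threshold t to the nearest neighbor as well is an assumption of this sketch.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def threshold_matches(ref, other, t):
    """Every (i, j) pair whose distance is below t; one i may pair with many j."""
    return [(i, j)
            for i, d1 in enumerate(ref)
            for j, d2 in enumerate(other)
            if euclidean(d1, d2) < t]


def nearest_neighbor_matches(ref, other, t):
    """At most one match per reference descriptor: its closest neighbor.

    The closest neighbor is kept only if it is also below t
    (an assumption of this sketch).
    """
    if not other:
        return []
    matches = []
    for i, d1 in enumerate(ref):
        j, dist = min(((j, euclidean(d1, d2)) for j, d2 in enumerate(other)),
                      key=lambda pair: pair[1])
        if dist < t:
            matches.append((i, j))
    return matches


# Toy descriptors: reference descriptor 0 is equally close to two candidates.
ref = [[0, 0], [10, 10]]
other = [[0, 1], [1, 0], [10, 11]]
print(threshold_matches(ref, other, t=2))         # [(0, 0), (0, 1), (1, 2)]
print(nearest_neighbor_matches(ref, other, t=2))  # [(0, 0), (1, 2)]
```

A linear scan like this costs one distance computation per descriptor pair, which is why direct comparison does not scale to indexing or database search.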


19

Performance Evaluation

Criterion basis

Recall rate = #correct matches / #correspondences

1-precision = #false matches / [#correct matches + #false matches]

Ideal descriptor -> recall rate = 1 for any precision, given no overlap error
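The two evaluation measures can be computed directly from match counts. The sketch below uses a hypothetical function name and made-up counts, purely to show the arithmetic.

```python
def recall_and_one_minus_precision(num_correct, num_false, num_correspondences):
    """Return (recall, 1 - precision) from raw match counts.

    recall        = correct matches / ground-truth correspondences
    1 - precision = false matches / (correct matches + false matches)
    """
    recall = num_correct / num_correspondences
    total_matches = num_correct + num_false
    one_minus_precision = num_false / total_matches if total_matches else 0.0
    return recall, one_minus_precision


# Made-up counts: 80 correct and 20 false matches, 100 true correspondences.
print(recall_and_one_minus_precision(80, 20, 100))  # (0.8, 0.2)
```

Sweeping the matching threshold t and plotting recall against 1-precision yields the curves used to compare descriptors.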



20

SIFT - Scale Invariant Feature Transform

Scale Invariant Feature Transform (SIFT) Lowe [3]

Features -

Invariant to image scale and rotation

Invariant to small changes in illumination and 3D camera viewpoint

Extracts a large number of highly distinctive features

Enables detection of small objects

Improved performance in cluttered scenes

Algorithms are efficient: complex operations are applied to local regions or features rather than the whole image

Procedure

Scale-space extrema detection

Keypoint localization

Orientation assignment

Keypoint vector (descriptor)


21

SIFT - Scale Invariant Feature Transform [3]

Scale-Space Blob Detector -

Search for stable features over all scales and image locations

Scale-space kernel -> Gaussian function

Difference of Gaussian


22


Difference of Gaussian (DoG): simple subtraction of adjacent blurred images L

Approximation to the scale-normalized Laplacian of Gaussian

Maxima or minima of the scale-normalized Laplacian produce the most stable image features compared to the gradient, Hessian, or Harris corner function (Mikolajczyk 2002)
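The approximation can be made explicit. The sketch below follows the derivation in Lowe [3], with G the Gaussian kernel, I the input image, L = G * I the blurred image, and k the constant scale ratio between adjacent blurred images:

```latex
D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma)
                = \bigl( G(x, y, k\sigma) - G(x, y, \sigma) \bigr) * I(x, y)

\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G
  \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}

\Rightarrow \quad
G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\, \sigma^2 \nabla^2 G
```

The factor (k - 1) is the same at every scale, so it does not change the location of the extrema.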


23


Scale-Space Image Set -

Divide each octave into s intervals

Compute s + 3 filtered (increasingly blurry) images, k = 2^(1/s)

s = 3, k = 1.26 ->

6th -> 3.17σ

5th -> 2.52σ

4th -> 2.00σ

3rd -> 1.59σ

2nd -> 1.26σ

1st -> 1.00σ

Subtract adjacent images to produce DoG images

Repeat for the next octave using the image two down from the top of the stack (2.00σ), decimated by 2
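The blur levels within one octave follow from k = 2^(1/s). A minimal sketch of that arithmetic is shown here. The function name `octave_sigmas` and the base blur `sigma0` are hypothetical, and the values are expressed as multiples of the base σ.

```python
def octave_sigmas(s, sigma0=1.0):
    """Return (k, sigmas) for one octave split into s intervals.

    k = 2**(1/s), and the octave holds s + 3 blurred images with
    blur sigma0 * k**i for i = 0 .. s + 2.
    """
    k = 2.0 ** (1.0 / s)
    sigmas = [sigma0 * k ** i for i in range(s + 3)]
    return k, sigmas


k, sigmas = octave_sigmas(3)
print(round(k, 2))                    # 1.26
print([round(v, 2) for v in sigmas])  # six blur levels for s = 3
# The image at index s has exactly twice the base blur, so it can seed
# the next octave after decimation by 2.
print(round(sigmas[3], 6))            # 2.0
```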


24


Scale-Space Pyramid - (from Lowe)


25


Locating Scale-Space Extrema -

Detection of local maxima or minima of D(x, y, σ)

Compare each sample point to its 8 neighbors in the same scale image and the 9 neighbors in each of the scale images above and below (26 neighbors in total)

Mark the sample if it is greater than or less than all of the neighbors

Extrema are detected in s DoG images per octave (of the s + 2 DoG images, only s have a neighbor both above and below)
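The 26-neighbor comparison can be sketched as below. The function name and the representation of DoG images as nested lists are assumptions made for illustration. The sample must not lie on the image border.

```python
def is_scale_space_extremum(below, same, above, row, col):
    """True if same[row][col] is strictly greater than, or strictly
    less than, all 26 neighbors: 8 in its own DoG image and 9 in each
    of the DoG images one scale below and one scale above.
    """
    center = same[row][col]
    neighbors = []
    for index, layer in enumerate((below, same, above)):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if index == 1 and dr == 0 and dc == 0:
                    continue  # skip the sample point itself
                neighbors.append(layer[row + dr][col + dc])
    return (all(center > n for n in neighbors)
            or all(center < n for n in neighbors))


# Toy 3x3 DoG layers: a single peak in the middle layer.
zeros = [[0.0] * 3 for _ in range(3)]
peak = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
print(is_scale_space_extremum(zeros, peak, zeros, 1, 1))  # True
```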


26


Improving Localization -

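Reject points that have low contrast using:

$|D(\hat{x})| < \text{threshold}$

Where -> $D(x) = D + \frac{\partial D}{\partial x}^T x + \frac{1}{2} x^T \frac{\partial^2 D}{\partial x^2} x$

Gives offset extremum -> $\hat{x} = -\left(\frac{\partial^2 D}{\partial x^2}\right)^{-1} \frac{\partial D}{\partial x}$, so that $D(\hat{x}) = D + \frac{1}{2} \frac{\partial D}{\partial x}^T \hat{x}$

The Hessian and derivative of D(x, y, σ) are computed using differences of neighboring sample points. x = (x, y, σ)^T is the offset from the sample point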
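The localization step can be sketched with finite differences on a 3x3x3 DoG neighborhood. This follows the quadratic fit described by Lowe [3]. The function names, the `D[scale][y][x]` indexing, and the small Cramer's-rule solver are assumptions made to keep the sketch self-contained.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 system a * x = b with Cramer's rule."""
    d = det3(a)
    if d == 0:
        raise ValueError("singular Hessian")
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        solution.append(det3(m) / d)
    return solution


def refine_extremum(D):
    """Return (offset, value) for a 3x3x3 neighborhood D[scale][y][x].

    offset = -H^-1 g is the sub-sample shift (x, y, scale) and
    value = D + 0.5 * g . offset is the interpolated DoG value
    that is compared against the contrast threshold.
    """
    c = D[1][1][1]
    g = [(D[1][1][2] - D[1][1][0]) / 2.0,
         (D[1][2][1] - D[1][0][1]) / 2.0,
         (D[2][1][1] - D[0][1][1]) / 2.0]
    dxx = D[1][1][2] + D[1][1][0] - 2.0 * c
    dyy = D[1][2][1] + D[1][0][1] - 2.0 * c
    dss = D[2][1][1] + D[0][1][1] - 2.0 * c
    dxy = (D[1][2][2] - D[1][2][0] - D[1][0][2] + D[1][0][0]) / 4.0
    dxs = (D[2][1][2] - D[2][1][0] - D[0][1][2] + D[0][1][0]) / 4.0
    dys = (D[2][2][1] - D[2][0][1] - D[0][2][1] + D[0][0][1]) / 4.0
    hessian = [[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]]
    offset = solve3(hessian, [-v for v in g])
    value = c + 0.5 * sum(gi * oi for gi, oi in zip(g, offset))
    return offset, value


# Hypothetical quadratic test surface with its peak at (0.2, -0.1, 0.3).
def f(x, y, s):
    return 1.0 - (x - 0.2) ** 2 - (y + 0.1) ** 2 - (s - 0.3) ** 2

D = [[[f(x, y, s) for x in (-1, 0, 1)] for y in (-1, 0, 1)] for s in (-1, 0, 1)]
offset, value = refine_extremum(D)
print([round(v, 6) for v in offset], round(value, 6))
```

A caller would discard the keypoint when `abs(value)` falls below the chosen contrast threshold.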


27


Edge Rejection -

Eliminate poorly defined peaks (edges) using Hessian matrix

Verify that the ratio of principal curvatures is less than threshold r = 10

Efficient to compute -> less than 20 floating point operations
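The edge test can be sketched without computing eigenvalues, using the trace and determinant of the 2x2 Hessian as in Lowe [3]: a keypoint is kept when Tr(H)^2 / Det(H) < (r + 1)^2 / r. The function name and the sample Hessian entries are hypothetical.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """True if the ratio of principal curvatures is below r.

    Uses trace^2 / det < (r + 1)^2 / r on the 2x2 Hessian
    [[dxx, dxy], [dxy, dyy]]. A non-positive determinant means the
    curvatures have different signs (or one is zero), so the point
    is rejected.
    """
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        return False
    return trace * trace / det < (r + 1.0) ** 2 / r


print(passes_edge_test(-2.0, -2.0, 0.0))   # blob-like peak: True
print(passes_edge_test(-20.0, -1.0, 0.0))  # edge-like ridge: False
```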


28


Results from Lowe [3] – 832 keypoints reduced to 536 (233x189 image)


29


Results from Lowe [3] – performance measures


30


Results from Lowe [3] – performance measures


31


Orientation - rotational invariance

Use the scale of the keypoint to select the smoothed image L(x, y, σ)

Compute the gradient magnitude m(x, y) and orientation θ(x, y) at each image sample using pixel differences.

Orientation histogram of sample points – entries weighted by gradient magnitude and a Gaussian window around the keypoint, bins cover 360° range

Peaks in histogram correspond to dominant directions of local gradients
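The orientation assignment can be sketched as follows, using the pixel-difference gradient from Lowe [3]. The function names, the square sampling window, and the choice of 36 bins of 10° each are assumptions of this sketch. `radius` and `sigma` are left as parameters.

```python
import math


def gradient(L, r, c):
    """Gradient magnitude and orientation (degrees in [0, 360)) at L[r][c]."""
    dx = L[r][c + 1] - L[r][c - 1]
    dy = L[r + 1][c] - L[r - 1][c]
    magnitude = math.hypot(dx, dy)
    theta = math.degrees(math.atan2(dy, dx)) % 360.0
    return magnitude, theta


def orientation_histogram(L, row, col, radius, sigma, num_bins=36):
    """Histogram of gradient orientations around (row, col).

    Each sample adds its gradient magnitude, weighted by a Gaussian
    of width sigma centered on the keypoint, to the bin of its
    orientation.
    """
    hist = [0.0] * num_bins
    bin_width = 360.0 / num_bins
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            magnitude, theta = gradient(L, r, c)
            weight = math.exp(-((r - row) ** 2 + (c - col) ** 2)
                              / (2.0 * sigma ** 2))
            hist[int(theta / bin_width) % num_bins] += weight * magnitude
    return hist


# Toy image: intensity ramp along x, so every gradient points at 0 degrees.
ramp = [[float(c) for c in range(7)] for _ in range(7)]
hist = orientation_histogram(ramp, 3, 3, radius=2, sigma=1.5)
print(hist.index(max(hist)))  # 0
```

The bin with the highest value gives the dominant direction assigned to the keypoint.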


32


Descriptor – the feature vector

8x8 sub-region histograms allow shift in gradient positions

128 element feature vector -> 4x4 array of histograms with 8 orientation bins each (2x2x8 from Lowe is shown below)

Feature vectors matched by nearest neighbor (Euclidean distance)
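The layout of the 128-element vector can be illustrated with a simplified sketch. It assumes a 16x16 grid of gradient samples already rotated relative to the keypoint orientation. It omits the Gaussian weighting, the interpolation between bins, and the clamping step described by Lowe [3], so it shows the layout only. The function name is hypothetical.

```python
import math


def sift_like_descriptor(magnitudes, orientations, cells=4, bins=8):
    """Build a cells x cells x bins histogram vector, normalized to unit length.

    magnitudes and orientations are square grids (for example 16x16)
    whose side is divisible by cells; orientations are in degrees.
    Each sample adds its magnitude to the orientation bin of the
    cell it falls in.
    """
    size = len(magnitudes)
    cell_size = size // cells
    bin_width = 360.0 / bins
    descriptor = [0.0] * (cells * cells * bins)
    for r in range(size):
        for c in range(size):
            cell = (r // cell_size) * cells + (c // cell_size)
            b = int((orientations[r][c] % 360.0) / bin_width) % bins
            descriptor[cell * bins + b] += magnitudes[r][c]
    norm = math.sqrt(sum(v * v for v in descriptor))
    return [v / norm for v in descriptor] if norm > 0 else descriptor


# Toy input: unit gradients all pointing at 0 degrees.
mags = [[1.0] * 16 for _ in range(16)]
oris = [[0.0] * 16 for _ in range(16)]
vector = sift_like_descriptor(mags, oris)
print(len(vector))  # 128
```

Normalizing to unit length reduces the effect of contrast changes before the vectors are compared by Euclidean distance.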


33


Results from Lowe [3] -

Two training objects recognized in a cluttered image

Small squares show point matches

Large rectangles show the border of each training image after affine transformation


34

Conclusions

Conclusions

Harris-Laplace region detector performs better than Laplacian, DoG and gradient scale-space operators

Scale-space detectors provide invariance to rotation, scale and small changes to illumination and viewpoint.

Affine adaptation provides invariance to affine transformations

GLOH and SIFT descriptors provide the best performance.

Dense, localized descriptors perform well under occlusions

Next steps

Coding and testing of region detectors, descriptors and matching…
