Automatic Matching of Multi-View Images Ed Bremer University of Rochester

Page 1: Automatic Matching of Multi-View Images

Automatic Matching of Multi-View Images

Ed Bremer

University of Rochester

Page 2: Automatic Matching of Multi-View Images

References

[1] Mikolajczyk, K., Schmid, C., 2004, A performance evaluation of local descriptors, Submitted to PAMI, October 2004, http://lear.inrialpes.fr/pubs/2004/MS04a

[2] Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Van Gool, L., 2004, A comparison of affine region detectors, Submitted to International Journal of Computer Vision, August 2004, http://lear.inrialpes.fr/pubs/2004/MTSZMSKG04

[3] Lowe, D., 2004, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, 60(2), pages 91-110.

[4] Matas, J., Chum, O., Urban, M., Pajdla, T., 2002, Robust Wide Baseline Stereo From Maximally Stable Extremal Regions, Proc. British Machine Vision Conference (BMVC 2002), pages 384-393.

[5] Schaffalitzky, F., Zisserman, A., 2002, Multi-view matching for unordered image sets, or "How do I organize my holiday snaps?", Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, vol. 1, pages 414-431.

[6] Baumberg, A., 2000, Reliable Feature Matching Across Widely Separated Views, In Proc. CVPR, pages 774-781.

[7] Mikolajczyk, K., Schmid, C., 2001, Indexing based on scale invariant interest points, In Proc. 8th ICCV, pages 525-531.

Page 3: Automatic Matching of Multi-View Images

Outline

Motivation

Applications

Process Components

Region Detectors

Descriptors

Matching Criteria

Performance Evaluation

Conclusion & Next Steps

Page 4: Automatic Matching of Multi-View Images

Motivation

Multi-view/Multi-image Matching

Multiple images of a scene taken by a single camera or multiple cameras with different rotation, scale, viewpoint and illumination

3D scene

Page 5: Automatic Matching of Multi-View Images

Motivation

Applications

Detecting matching regions is used in all of the following:

Image registration

Super-resolution

Stereo vision

Object detection and recognition

Object and motion tracking

Indexing and retrieval of objects

3D scene reconstruction

Scene recognition

Page 6: Automatic Matching of Multi-View Images

Examples of Multi-view Images [2]

[2] Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Van Gool, L., 2004, A comparison of affine region detectors, Submitted to International Journal of Computer Vision, August 2004, http://lear.inrialpes.fr/pubs/2004/MTSZMSKG04

Page 7: Automatic Matching of Multi-View Images

Process Components

Covariant region detection: detect image regions covariant to the class of transformation between the reference image and the transformed image

Invariant descriptor: compute invariant descriptors from the covariant regions

Descriptor matching: compute the distance between descriptors in the reference image and the transformed image

[1] Mikolajczyk, K., Schmid, C., 2004, A performance evaluation of local descriptors, Submitted to PAMI, http://lear.inrialpes.fr/pubs/2004/MS04a

Page 8: Automatic Matching of Multi-View Images

Region Detectors

Support regions for computation of descriptors:

Determined independently in each image

Scale invariant or affine invariant

Can be points (feature points) or regions (covariant)

Provide dense (local) coverage, robust to occlusion

Need to be stable and repeatable

Five region detectors:

Harris points -> invariant to rotation

Harris-Laplace -> invariant to rotation and scale

Hessian-Laplace -> invariant to rotation and scale

Harris-Affine -> invariant to affine image transformations

Hessian-Affine -> invariant to affine image transformations

[1] Mikolajczyk, K., Schmid, C., 2004, A performance evaluation of local descriptors, Submitted to PAMI, http://lear.inrialpes.fr/pubs/2004/MS04a

Page 9: Automatic Matching of Multi-View Images

Region Detectors

Harris points:

Maxima of the Harris function are used to locate interest points

Support region is fixed in size: a 41x41 neighborhood centered at the interest point

Harris-Laplace regions:

Scale-adapted Harris function

Interest point is a local minimum or maximum of the Laplacian-of-Gaussian across scale-space

[1] Mikolajczyk, K., Schmid, C., 2004, A performance evaluation of local descriptors, Submitted to PAMI, http://lear.inrialpes.fr/pubs/2004/MS04a

Page 10: Automatic Matching of Multi-View Images

Region Detectors

Harris-Laplace performance:

Approximately 10% better than the Laplacian, Lowe or gradient methods

The standard Harris detector performs very poorly under scale changes

[7] Mikolajczyk, K., Schmid, C., 2001, Indexing based on scale invariant interest points, In Proc. 8th ICCV, Pages 525-531.

Page 11: Automatic Matching of Multi-View Images

Region Detectors

Hessian-Laplace regions:

Interest point is at a local maximum of the Hessian determinant

Its location in scale-space is found using maxima of the Laplacian-of-Gaussian (Difference-of-Gaussians can also be used)

[1] Mikolajczyk, K., Schmid, C., 2004, A performance evaluation of local descriptors, Submitted to PAMI, http://lear.inrialpes.fr/pubs/2004/MS04a

[3] Lowe, D., 2004, Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, 60(2), pages 91-110.

Page 12: Automatic Matching of Multi-View Images

Region Detectors

Harris-Affine regions:

Find regions using the Harris-Laplace detector

Affine-adapted region based on the 2nd moment matrix

Hessian-Affine regions:

Find regions using the Hessian-Laplace detector

Affine-adapted region based on the 2nd moment matrix

[2] Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Van Gool, L., 2004, A comparison of affine region detectors, Submitted to International Journal of Computer Vision, August 2004, http://lear.inrialpes.fr/pubs/2004/MTSZMSKG04

Page 13: Automatic Matching of Multi-View Images

Region Detectors

Regions produced by Harris-Affine and Hessian-Affine detectors

[2] Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Van Gool, L., 2004, A comparison of affine region detectors, Submitted to International Journal of Computer Vision, August 2004, http://lear.inrialpes.fr/pubs/2004/MTSZMSKG04

Page 14: Automatic Matching of Multi-View Images

Region Detectors

Affine normalization using 2nd moment matrix for region L and R

[2] Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Van Gool, L., 2004, A comparison of affine region detectors, Submitted to International Journal of Computer Vision, August 2004, http://lear.inrialpes.fr/pubs/2004/MTSZMSKG04

Page 15: Automatic Matching of Multi-View Images

Region Detectors

Region normalization:

Detectors produce circular or elliptical regions

Size is dependent on the detection scale

Map regions to a circular region with constant radius

Rotate regions in the direction of the dominant gradient orientation

Illumination normalization:

Model the illumination change as an affine transformation -> aI(x) + b

Normalize using the mean and standard deviation of the pixel intensities

[1] Mikolajczyk, K., Schmid, C., 2004, A performance evaluation of local descriptors, Submitted to PAMI, http://lear.inrialpes.fr/pubs/2004/MS04a
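The illumination normalization above can be illustrated with a minimal Python sketch. It assumes the region has already been flattened to a list of intensities; the function name `normalize_illumination` and the sample values are hypothetical, and the sketch only handles a positive gain a.

```python
from statistics import mean, pstdev

def normalize_illumination(pixels):
    """Shift intensities to zero mean and scale them to unit standard deviation."""
    mu = mean(pixels)
    sigma = pstdev(pixels)
    if sigma == 0:
        # A flat region carries no contrast; return zeros instead of dividing by zero.
        return [0.0 for _ in pixels]
    return [(p - mu) / sigma for p in pixels]

region = [10, 20, 30, 40]                      # hypothetical intensities
brighter = [2.0 * p + 5.0 for p in region]     # a*I(x) + b with a = 2, b = 5

print(normalize_illumination(region))
print(normalize_illumination(brighter))        # same values up to rounding
```

Because subtracting the mean cancels b and dividing by the standard deviation cancels a, both calls give the same normalized region.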

Page 16: Automatic Matching of Multi-View Images

Descriptors

Descriptors -> Feature vector:

Invariant to changes in scale, rotation, affine transformation and affine illumination

Need to be distinct, stable and repeatable

Distribution (histogram) type or Covariance type

Ten descriptor types:

Scale-Invariant Feature Transform (SIFT)

Gradient Location and Orientation Histogram (GLOH)

Shape Context

Principal Component Analysis (PCA)-SIFT

Steerable Filters

Differential Invariants

Complex Filters

Moment Invariants

Cross-Correlation

Spin Image

[1] Mikolajczyk, K., Schmid, C., 2004, A performance evaluation of local descriptors, Submitted to PAMI, http://lear.inrialpes.fr/pubs/2004/MS04a

Page 17: Automatic Matching of Multi-View Images

Descriptors

SIFT and GLOH 3D descriptors:

SIFT -> 4 x 4 x 8 = 128 dimension descriptor

GLOH -> Log-polar [(2 x 8) + 1] x 16 = 272 dimension descriptor

[1] Mikolajczyk, K., Schmid, C., 2004, A performance evaluation of local descriptors, Submitted to PAMI, http://lear.inrialpes.fr/pubs/2004/MS04a

Page 18: Automatic Matching of Multi-View Images

Matching Criteria

Distance measure:

Find putative matches between images

Mahalanobis distance: used for covariant descriptors

Euclidean distance: used for distribution (histogram) descriptors

Direct distance comparison is not suitable for indexing or database searching

Simple threshold:

Descriptors match if the distance between them is below a threshold t

A descriptor in the reference image can have many matches to descriptors in the transformed image

Nearest Neighbor (NN):

Find the closest match between descriptors in the reference and transformed images

A descriptor in the reference image can have only one match to a descriptor in the transformed image
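The difference between the two matching criteria can be seen in a small Python sketch using Euclidean distance. The function names and the toy 2-D descriptors are hypothetical, and applying the threshold t inside the nearest-neighbor test is an assumption of this sketch rather than something the slide states.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def threshold_matches(ref, other, t):
    """All (i, j) pairs closer than t; one reference descriptor may match many."""
    return [(i, j)
            for i, u in enumerate(ref)
            for j, v in enumerate(other)
            if euclidean(u, v) < t]

def nearest_neighbor_matches(ref, other, t):
    """Keep only the closest candidate per reference descriptor (assumed: also below t)."""
    matches = []
    if not other:
        return matches
    for i, u in enumerate(ref):
        j = min(range(len(other)), key=lambda idx: euclidean(u, other[idx]))
        if euclidean(u, other[j]) < t:
            matches.append((i, j))
    return matches

ref = [[0.0, 0.0], [5.0, 5.0]]                  # toy reference descriptors
other = [[0.1, 0.0], [0.0, 0.3], [5.0, 5.2]]    # toy transformed-image descriptors

print(threshold_matches(ref, other, 0.5))         # [(0, 0), (0, 1), (1, 2)]
print(nearest_neighbor_matches(ref, other, 0.5))  # [(0, 0), (1, 2)]
```

With the simple threshold, reference descriptor 0 matches two candidates; nearest-neighbor matching keeps only the closer one.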

Page 19: Automatic Matching of Multi-View Images

Performance Evaluation

Criterion basis:

Recall rate = #correct matches / #correspondences

1-precision = #false matches / (#correct matches + #false matches)

Ideal descriptor -> recall rate = 1 for any precision, given no overlap error

[1] Mikolajczyk, K., Schmid, C., 2004, A performance evaluation of local descriptors, Submitted to PAMI, http://lear.inrialpes.fr/pubs/2004/MS04a
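As a worked instance of the two criteria, the following Python sketch evaluates them for made-up counts. The function names and the numbers are hypothetical and are not results from [1].

```python
def recall(correct_matches, correspondences):
    """Fraction of ground-truth correspondences that were matched correctly."""
    return correct_matches / correspondences

def one_minus_precision(correct_matches, false_matches):
    """Fraction of returned matches that are false."""
    return false_matches / (correct_matches + false_matches)

# Hypothetical counts: 100 ground-truth correspondences, 80 correct and 20 false matches.
print(recall(80, 100))              # 0.8
print(one_minus_precision(80, 20))  # 0.2
```

Sweeping the distance threshold t and plotting recall against 1-precision gives one curve per descriptor.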

Page 20: Automatic Matching of Multi-View Images

SIFT - Scale Invariant Feature Transform

Scale Invariant Feature Transform (SIFT), Lowe [3]

Features:

Invariant to image scale and rotation

Invariant to small changes in illumination and 3D camera viewpoint

Extracts a large number of highly distinctive features

Enables detection of small objects

Improved performance in cluttered scenes

Algorithms are efficient: complex operations are applied to local regions or features rather than the whole image

Procedure:

Scale-space extrema detection

Keypoint localization

Orientation assignment

Keypoint vector (descriptor)

Page 21: Automatic Matching of Multi-View Images

SIFT - Scale Invariant Feature Transform [3]

Scale-Space Blob Detector:

Search for stable features over all scales and image locations

Scale-space kernel -> Gaussian function

Difference of Gaussian

Page 22: Automatic Matching of Multi-View Images

SIFT - Scale Invariant Feature Transform [3]

Difference of Gaussian (DoG): simple subtraction of adjacent blurred images L

Approximation to the scale-normalized Laplacian of Gaussian

Maxima or minima of the scale-normalized Laplacian produce the most stable image features compared to the gradient, Hessian, or Harris corner function (Mikolajczyk 2002)
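For reference, the relations behind these statements can be written out as follows. This is a restatement of the standard definitions used by Lowe [3], with I the input image, G the Gaussian kernel, L the blurred image and k the constant ratio between adjacent scales; it is not a transcription of the slide's original equations.

```latex
L(x, y, \sigma) = G(x, y, \sigma) * I(x, y), \qquad
G(x, y, \sigma) = \frac{1}{2\pi\sigma^{2}} \, e^{-(x^{2} + y^{2}) / (2\sigma^{2})}

D(x, y, \sigma) = \bigl( G(x, y, k\sigma) - G(x, y, \sigma) \bigr) * I(x, y)
                = L(x, y, k\sigma) - L(x, y, \sigma)

G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1) \, \sigma^{2} \nabla^{2} G
```

The last line is why the DoG approximates the scale-normalized Laplacian: the factor (k - 1) is constant over all scales, so it does not change the location of the extrema.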

Page 23: Automatic Matching of Multi-View Images

SIFT - Scale Invariant Feature Transform [3]

Scale-Space Image Set:

Divide each octave into s intervals

Compute s + 3 filtered (increasingly blurred) images, k = 2^(1/s)

For s = 3, k = 1.26:

6th -> 3.18σ

5th -> 2.52σ

4th -> 2.00σ

3rd -> 1.59σ

2nd -> 1.26σ

1st -> 1.00σ

Subtract adjacent images to produce DoG images

Repeat for the next octave, starting from the image with twice the initial σ (two images from the top of the stack), decimated by 2
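The blur levels listed for s = 3 can be reproduced with a short Python sketch. The function name `octave_sigmas` and the base blur `sigma0 = 1.0` are assumptions made for illustration. Note that with the exact k = 2^(1/3) the sixth level comes out as 3.17σ; the 3.18σ figure results from multiplying by the rounded k = 1.26.

```python
def octave_sigmas(s, sigma0=1.0):
    """Blur levels of the s + 3 Gaussian images in one octave, with k = 2**(1/s)."""
    k = 2.0 ** (1.0 / s)
    return [sigma0 * k ** i for i in range(s + 3)]

sigmas = octave_sigmas(3)
print([round(x, 2) for x in sigmas])   # [1.0, 1.26, 1.59, 2.0, 2.52, 3.17]

# Subtracting adjacent Gaussian images leaves s + 2 DoG images per octave.
print(len(sigmas) - 1)                 # 5
```

The image at index s has exactly twice the base blur, which is the one carried over (after decimation) to start the next octave.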

Page 24: Automatic Matching of Multi-View Images

SIFT - Scale Invariant Feature Transform [3]

Scale-Space Pyramid -(from Lowe)

Page 25: Automatic Matching of Multi-View Images

SIFT - Scale Invariant Feature Transform [3]

Locating Scale-Space Extrema - Detection of local maxima or minima of D(x, y, σ)

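Compare each sample point to its 8 neighbors in the same scale image and to the 9 neighbors in each of the scale images above and below (26 neighbors in total)

Mark the sample if it is greater than or less than all of the neighbors

Extrema are searched in s of the DoG images per octave (the top and bottom DoG images lack a neighboring scale on one side)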
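The 26-neighbor comparison can be sketched in Python as below. The DoG stack is assumed to be a nested list indexed as `dog[scale][row][col]`, and the caller is assumed to pass only interior indices; the function name `is_extremum` is hypothetical.

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly above or strictly below all 26 neighbors."""
    value = dog[s][y][x]
    neighbors = [
        dog[s + ds][y + dy][x + dx]
        for ds in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (ds, dy, dx) != (0, 0, 0)
    ]
    return all(value > n for n in neighbors) or all(value < n for n in neighbors)

# Toy stack: three 3x3 DoG images, all zero except a peak in the middle one.
stack = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
stack[1][1][1] = 1.0
print(is_extremum(stack, 1, 1, 1))   # True

stack[2][0][0] = 2.0                 # a larger value in the scale above
print(is_extremum(stack, 1, 1, 1))   # False
```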

Page 26: Automatic Matching of Multi-View Images

SIFT - Scale Invariant Feature Transform [3]

Improving Localization -

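Reject points that have low contrast using (equations as given in Lowe [3]):

$|D(\hat{\mathbf{x}})| < \text{threshold}$

Where -> $D(\hat{\mathbf{x}}) = D + \frac{1}{2} \frac{\partial D}{\partial \mathbf{x}}^{T} \hat{\mathbf{x}}$

Taylor expansion of D about the sample point -> $D(\mathbf{x}) = D + \frac{\partial D}{\partial \mathbf{x}}^{T} \mathbf{x} + \frac{1}{2} \mathbf{x}^{T} \frac{\partial^{2} D}{\partial \mathbf{x}^{2}} \mathbf{x}$

Gives offset extremum -> $\hat{\mathbf{x}} = -\left( \frac{\partial^{2} D}{\partial \mathbf{x}^{2}} \right)^{-1} \frac{\partial D}{\partial \mathbf{x}}$

The Hessian and derivative of D(x, y, σ) are approximated using differences of neighboring sample points. $\mathbf{x} = (x, y, \sigma)^{T}$ is the offset from the sample point.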
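To make the offset computation concrete, here is a one-dimensional Python analogue: the same second-order Taylor fit, reduced from the three variables (x, y, σ) to one so that the Hessian inverse becomes a division. The function name `refine_1d` and the sample values are hypothetical; a full implementation would solve the 3x3 system instead.

```python
def refine_1d(d_prev, d_curr, d_next):
    """Fit a parabola through three samples; return (offset, interpolated value)."""
    first = (d_next - d_prev) / 2.0            # central difference for dD/dx
    second = d_next - 2.0 * d_curr + d_prev    # second difference for d2D/dx2
    if second == 0:
        return 0.0, d_curr                     # flat neighborhood: keep the sample
    offset = -first / second                   # 1-D form of the offset extremum
    value = d_curr + 0.5 * first * offset      # 1-D form of D at the offset
    return offset, value

# Samples of D(x) = 1 - (x - 0.2)**2 at x = -1, 0, 1; the true peak is at x = 0.2.
offset, value = refine_1d(-0.44, 0.96, 0.36)
print(round(offset, 6), round(value, 6))       # 0.2 1.0
```

The low-contrast test then compares the absolute interpolated value against the chosen threshold and discards the point if it falls below.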

Page 27: Automatic Matching of Multi-View Images

SIFT - Scale Invariant Feature Transform [3]

Edge Rejection -

Eliminate poorly defined peaks (edges) using the Hessian matrix

Verify that the ratio of principal curvatures is less than the threshold, r < 10

Efficient to compute -> fewer than 20 floating point operations
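A minimal Python sketch of this check follows. It uses the trace/determinant form from Lowe [3], Tr(H)^2 / Det(H) < (r + 1)^2 / r, which bounds the curvature ratio without computing eigenvalues. The function name `passes_edge_test` and the example derivative values are hypothetical.

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """True if the principal-curvature ratio of the 2x2 Hessian is below r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures of opposite sign (or a degenerate Hessian): reject the point.
        return False
    return trace * trace / det < (r + 1.0) ** 2 / r

print(passes_edge_test(-2.0, -2.0, 0.0))    # True: blob-like, equal curvatures
print(passes_edge_test(-20.0, -1.0, 0.0))   # False: edge-like, ratio of 20
```

In practice the second derivatives dxx, dyy and dxy would be estimated from differences of neighboring DoG samples at the keypoint.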

Page 28: Automatic Matching of Multi-View Images

SIFT - Scale Invariant Feature Transform [3]

Results from Lowe [3] – 832 keypoints reduced to 536 (233x189 image)

Page 29: Automatic Matching of Multi-View Images

SIFT - Scale Invariant Feature Transform

Results from Lowe [3] – performance measures

Page 30: Automatic Matching of Multi-View Images

SIFT - Scale Invariant Feature Transform

Results from Lowe [3] – performance measures

Page 31: Automatic Matching of Multi-View Images

SIFT - Scale Invariant Feature Transform [3]

Orientation (rotational invariance):

Use the scale of the keypoint to select the blurred image L(x, y, σ)

Compute the gradient magnitude m(x, y) and orientation θ(x, y) at each image sample using pixel differences

Build an orientation histogram of the sample points: entries are weighted by gradient magnitude and by a Gaussian window around the keypoint, and the bins cover the 360° range

Peaks in the histogram correspond to dominant directions of the local gradients
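The orientation step can be sketched in Python as follows. The image is assumed to be a nested list of intensities, the window radius and Gaussian sigma are free parameters of the sketch, and the default of 36 bins follows Lowe [3] (the slide itself gives no bin count). Function names are hypothetical.

```python
import math

def gradient(image, y, x):
    """Gradient magnitude and orientation (degrees in [0, 360)) from pixel differences."""
    dx = image[y][x + 1] - image[y][x - 1]
    dy = image[y + 1][x] - image[y - 1][x]
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx)) % 360.0

def orientation_histogram(image, cy, cx, radius, sigma, bins=36):
    """Histogram of orientations weighted by magnitude and a Gaussian window."""
    hist = [0.0] * bins
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            m, theta = gradient(image, y, x)
            weight = math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
            hist[int(theta // (360.0 / bins)) % bins] += m * weight
    return hist

# Toy image: intensity increases with the row index, so every gradient points at 90 degrees.
ramp = [[float(row) for _ in range(7)] for row in range(7)]
hist = orientation_histogram(ramp, 3, 3, radius=2, sigma=1.5)
print(hist.index(max(hist)))   # 9, the bin covering 90 to 100 degrees
```

The index of the largest bin gives the dominant orientation; the caller must keep the window at least one pixel inside the image border.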

Page 32: Automatic Matching of Multi-View Images

SIFT - Scale Invariant Feature Transform [3]

Descriptor – the feature vector

8x8 sub-region histograms allow shift in gradient positions

128 element feature vector -> 4x4 array of 8 orientations (2x2x8 from Lowe is shown below)

Feature vectors matched by nearest neighbor (Euclidean distance)
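How a 4x4 array of 8-bin histograms becomes a 128-element vector can be shown with a simplified Python sketch. It assumes a square patch of precomputed gradient magnitudes and angles whose side is a multiple of 4, and it deliberately omits the Gaussian weighting, interpolation between bins, and value clamping described by Lowe [3]. The function name `sift_like_descriptor` is hypothetical.

```python
import math

def sift_like_descriptor(magnitudes, angles, cells=4, bins=8):
    """Concatenate cells x cells orientation histograms into one normalized vector."""
    size = len(magnitudes)            # patch side, assumed to be a multiple of `cells`
    cell_size = size // cells
    hist = [0.0] * (cells * cells * bins)
    for y in range(size):
        for x in range(size):
            cell = (y // cell_size) * cells + (x // cell_size)
            b = int((angles[y][x] % 360.0) // (360.0 / bins)) % bins
            hist[cell * bins + b] += magnitudes[y][x]
    norm = math.sqrt(sum(h * h for h in hist))
    return [h / norm for h in hist] if norm > 0 else hist

# Toy 16x16 patch: unit gradient magnitude, all orientations at 0 degrees.
mags = [[1.0] * 16 for _ in range(16)]
angs = [[0.0] * 16 for _ in range(16)]
desc = sift_like_descriptor(mags, angs)
print(len(desc))   # 128
print(desc[0])     # 0.25
```

Normalizing the vector to unit length is what makes the Euclidean nearest-neighbor comparison insensitive to a uniform change in gradient magnitude.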

Page 33: Automatic Matching of Multi-View Images

SIFT - Scale Invariant Feature Transform [3]

Results from Lowe [3]:

Two training objects recognized in a cluttered image

Small squares show point matches

Large rectangles show the border of each training image after affine transformation

Page 34: Automatic Matching of Multi-View Images

Conclusions

The Harris-Laplace region detector performs better than the Laplacian, DoG and gradient scale-space operators

Scale-space detectors provide invariance to rotation, scale and small changes in illumination and viewpoint

Affine adaptation provides invariance to affine transformations

GLOH and SIFT descriptors provide the best performance

Dense, localized descriptors perform well under occlusion

Next steps: coding and testing of region detectors, descriptors and matching…