
ECCV2010-poster-small · Title: ECCV2010-poster-small Created Date: 9/1/2010 6:10:23 PM


Handling Urban Location Recognition as a 2D Homothetic Problem

Georges Baatz¹, Kevin Köser¹, David Chen², Radek Grzeszczuk³ and Marc Pollefeys¹

Goal: City-wide visual localization for augmented reality based on street-level panoramic imagery and 3D building models. No need for GPS.

Idea:
- Query and reference image might be taken from very different viewpoints
- Facades related by a homography
- Easier to compare when both images have been warped to a canonical viewpoint (frontal)
- Frontal facades related by a homothetic transformation (scale and translation)
- 3D rotation invariant matching
- Can use upright SIFT (more discriminative)
- Estimate only 3 parameters, one at a time
- Get full 6 degrees of freedom camera pose for free

Pipeline:

Offline: Panorama and building geometry → Orthographic view → Upright SIFT features → Vocabulary tree and inverted file system

Online: Query image → Orthographic view → Upright SIFT features → Short list of candidate solutions → Geometric verification → Best match → 6DOF camera pose

Contributions:
- Leverage known geometry to generate frontal images
- More discriminative power from upright SIFT
- Estimate homothetic transformation
- Efficient stratified voting scheme instead of RANSAC

Rectification using 3D models: 60,000 calibrated street-level images are projected onto 3D building models, yielding 100,000 frontal images.
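The rectification idea can be illustrated with a small sketch. It does not reproduce the projection onto 3D building models described above. It assumes the simpler case of a planar facade seen by a pinhole camera that is only rotated about the vertical axis, where warping to the frontal view reduces to the homography H = K Rᵀ K⁻¹. All function names, the intrinsics and the 30 degree yaw are hypothetical values chosen for illustration.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    """Transpose a 3x3 matrix (the inverse of a rotation matrix)."""
    return [[a[j][i] for j in range(3)] for i in range(3)]

def rot_y(angle):
    """Rotation about the vertical (y) axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def intrinsics(f, cx, cy):
    """Pinhole calibration matrix K and its closed-form inverse."""
    K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
    K_inv = [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]
    return K, K_inv

def project(K, R, point):
    """Project a 3D point (given in the frontal frame) into a camera rotated by R."""
    p = [sum(R[i][k] * point[k] for k in range(3)) for i in range(3)]
    q = [sum(K[i][k] * p[k] for k in range(3)) for i in range(3)]
    return q[0] / q[2], q[1] / q[2]

def rectifying_homography(K, K_inv, R_cam):
    """H = K * R_cam^T * K^-1 maps pixels of the rotated camera to the frontal view."""
    return matmul(matmul(K, transpose(R_cam)), K_inv)

def warp_point(H, x, y):
    """Apply a homography to a pixel and dehomogenise."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w

# Hypothetical setup: facade plane at depth 10, camera yawed by 30 degrees.
K, K_inv = intrinsics(f=800.0, cx=320.0, cy=240.0)
R_cam = rot_y(math.radians(30.0))
H = rectifying_homography(K, K_inv, R_cam)

# Three equally spaced facade points, observed obliquely and then rectified.
facade_points = [(-1.0, 0.0, 10.0), (0.0, 0.0, 10.0), (1.0, 0.0, 10.0)]
observed = [project(K, R_cam, p) for p in facade_points]
rectified = [warp_point(H, x, y) for x, y in observed]
```

After the warp, equally spaced facade points land on equally spaced pixels, which is why two frontal views of the same facade differ only by scale and translation.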

Upright versus traditional SIFT: Upright SIFT outperforms traditional SIFT on the standard image sequences Graffiti, Bark and Wall.

Large scale recognition experiments: Performance of our approach evaluated on the datasets Earthmine, Navteq and Cellphone.

            Affine   Masked   Rectified   Upright
Earthmine   84.3%    83.0%    82.6%       85.0%
Navteq      33.9%    26.3%    25.2%       38.9%
Cellphone   30.2%    23.2%    25.2%       32.1%

[Figure: recall versus number of candidates (0 to 50) on the Cellphone and Navteq datasets for Upright, Rectified, Masked and Affine, shown with examples of unsuccessful and successful queries.]
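A recall curve of this kind could be computed as in the following minimal sketch. It assumes each query comes with a ranked candidate list and a single ground-truth reference id. The data layout, the function name `recall_at` and the toy ids are hypothetical and not taken from the poster.

```python
def recall_at(ranked_lists, ground_truth, n):
    """Fraction of queries whose correct reference is among the top-n candidates."""
    hits = sum(1 for query, ranked in ranked_lists.items()
               if ground_truth[query] in ranked[:n])
    return hits / len(ranked_lists)

# Toy data: four queries, each with a ranked short list of reference ids.
ranked_lists = {
    "q1": ["a", "b", "c"],
    "q2": ["d", "e", "f"],
    "q3": ["g", "h", "i"],
    "q4": ["x", "y", "z"],
}
ground_truth = {"q1": "a", "q2": "e", "q3": "i", "q4": "w"}

# One recall value per short-list length, as on the horizontal axis of the plots.
curve = [recall_at(ranked_lists, ground_truth, n) for n in (1, 2, 3)]
```

Recall computed this way can only grow with the short-list length, since a longer list contains every shorter one.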

Geometric verification:

[Figure: three support histograms, numbered 1 to 3: support versus scale, support versus X translation, support versus Y translation.]
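The three histograms suggest how a stratified vote over scale, X translation and Y translation might work. The sketch below is one possible reading, not the authors' implementation. It assumes each match pairs two keypoints given as (x, y, scale), that scale is voted on as the log2 ratio of keypoint scales, and that each estimate is refined as the mean of its winning bin. The bin widths and all names are hypothetical.

```python
import math
from collections import Counter

def winning_bin(values, bin_width):
    """Return the indices of the values in the most supported histogram bin."""
    keys = [math.floor(v / bin_width + 0.5) for v in values]
    best_key, _ = Counter(keys).most_common(1)[0]
    return [i for i, k in enumerate(keys) if k == best_key]

def mean(values):
    return sum(values) / len(values)

def estimate_homothety(matches, scale_bin=0.25, trans_bin=10.0):
    """Estimate (s, tx, ty) one parameter at a time from a non-empty match list.

    matches: list of ((xq, yq, sq), (xr, yr, sr)) query/reference keypoints.
    """
    # Step 1: vote on the scale, using the ratio of keypoint scales.
    log_ratios = [math.log2(r[2] / q[2]) for q, r in matches]
    idx = winning_bin(log_ratios, scale_bin)
    s = 2.0 ** mean([log_ratios[i] for i in idx])
    matches = [matches[i] for i in idx]

    # Step 2: with the scale fixed, vote on the X translation.
    tx_votes = [r[0] - s * q[0] for q, r in matches]
    idx = winning_bin(tx_votes, trans_bin)
    tx = mean([tx_votes[i] for i in idx])
    matches = [matches[i] for i in idx]

    # Step 3: vote on the Y translation among the remaining matches.
    ty_votes = [r[1] - s * q[1] for q, r in matches]
    idx = winning_bin(ty_votes, trans_bin)
    ty = mean([ty_votes[i] for i in idx])
    return s, tx, ty, [matches[i] for i in idx]

# Toy data: five matches consistent with s=2, tx=12, ty=-7, plus three outliers.
query_kps = [(10.0, 20.0, 1.5), (50.0, 80.0, 2.0), (100.0, 40.0, 3.0),
             (30.0, 60.0, 1.0), (70.0, 10.0, 2.5)]
matches = [(q, (2.0 * q[0] + 12.0, 2.0 * q[1] - 7.0, 2.0 * q[2]))
           for q in query_kps]
matches += [
    ((5.0, 5.0, 2.0), (300.0, 200.0, 2.0)),   # wrong scale
    ((40.0, 40.0, 1.0), (140.0, 73.0, 2.0)),  # wrong X translation
    ((20.0, 20.0, 1.0), (52.0, 120.0, 2.0)),  # wrong Y translation
]
s, tx, ty, inliers = estimate_homothety(matches)
```

Each step is a one-dimensional histogram, so the cost is linear in the number of matches and no random sampling is involved, in contrast to RANSAC.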

[Figure: three plots of recall versus 1-precision comparing upright SIFT and traditional SIFT.]

Homothetic transform: consistent scale, X translation, Y translation
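For reference, the homothetic model can be written out explicitly. The notation is assumed here and does not appear on the poster: (x, y) is a point in the rectified query image, (x', y') the corresponding point in the rectified reference image, σ and σ' the scales of a matched keypoint pair, s the relative scale and (t_x, t_y) the translation.

```latex
\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}
=
\begin{pmatrix} s & 0 & t_x \\ 0 & s & t_y \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
\qquad
s = \frac{\sigma'}{\sigma}, \quad
t_x = x' - s\,x, \quad
t_y = y' - s\,y .
```

A general homography has 8 degrees of freedom, whereas this model has 3. Under these assumptions a single scale-aware match already yields a vote for each of the three parameters, which is what allows them to be estimated one at a time.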