Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map

Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian Tan, Minh-Tri Pham, Liang-Tien Chia)

Center for Multimedia & Network Technology (CeMNet)

School of Computer Engineering

Nanyang Technological University, Singapore

Urban Landmarks

• Those easy to recognize
• Those that aren’t

© kevincole

© qureyoon

© Anirudh Koul

“Back-to-Basics” Map Reading!

• An image or images taken from a single location, at probe time

• A plan-view outline map

• Won’t consider GPS
  – GPS reception bad in high-rise urban areas
  – GPS can be jammed or spoofed

Related Work and Differences

• Appearance-based matching in urban areas
  – Robertson & Cipolla BMVC04, Yeh et al. CVPR04, Zhang & Košecká 3DPVT06
• General wide-baseline stereo / multi-view (but not targeted for searching through significant-sized datasets)
  – Bay et al. CVPR05, Mičušík et al. CVPR08, Schindler et al. 3DPVT06, Schmid & Zisserman IJCV00, Werner & Zisserman ECCV02

• Key differences here:
  – No prior appearance information
    • Only a 2D plan-view geometric map available
  – No stereo / multi-view
    • Images are taken from single location

A Geometric Matching Paradigm

• Assume buildings are vertical planar extrusions

• Match building corners in map ↔ vertical corner lines in rectified image
  – Significant building corners
  – Not façade details / painted edges

Geometric Signature

2D Geometric Image Features

• Basic Lines (2D)

2½D Geometric Image Features

• David Marr’s bottom-up visual perception framework
  – Image → Primal Sketch → 2½D Sketch → 3D model
• Augmented Lines (2D + adjacent 3D normals)
• Elemental Planes (2D + fixed depth ratios of vertical boundaries)
• Structural Fragments (piecewise 3D structures with unknown scales)

Geometric Signatures – Uniqueness Analysis Under Ideal Conditions

[Figure panels, one per signature type; legend runs from strong match to poor match]
• Basic Lines (2D)
• Augmented Lines (2D + 3D normals)
• Elemental Planes (2D + fixed depth ratios)
• Structural Fragments (3D structure with unknown scale)

[Diagram labels: BOTTOM-UP, TOP-DOWN]

Overview of Localization Method

[Flow diagram]
• Inputs: query image, 2D map
• Calibration from vanishing points
• Extract vertical corners + normals
• Recover elemental planes with 3D normals
• Link into plan-view structural fragments (modulo similarity)
• Geometric hashing lookup for correspondence candidates
• Voting-based estimate of optimal camera pose
• Output: camera pose
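The geometric hashing lookup can be illustrated with a minimal sketch. The slides do not say which invariants are hashed, so everything here is an assumption: the key (quantized log length ratio and signed turning angle of two consecutive plan-view edges), the bin sizes, and the names `edge_pair_key`, `build_table` and `lookup` are hypothetical. The only property relied on is that the key is unchanged by rotation, translation and uniform scaling, matching "modulo similarity" above.

```python
import math
from collections import defaultdict

def edge_pair_key(p0, p1, p2, ratio_step=0.1, angle_step_deg=10.0):
    """Quantized similarity invariants of two consecutive edges p0->p1 and p1->p2."""
    ax, ay = p1[0] - p0[0], p1[1] - p0[1]
    bx, by = p2[0] - p1[0], p2[1] - p1[1]
    # The length ratio and the signed turning angle survive any similarity transform.
    log_ratio = math.log(math.hypot(bx, by) / math.hypot(ax, ay))
    turn_deg = math.degrees(math.atan2(ax * by - ay * bx, ax * bx + ay * by))
    return (round(log_ratio / ratio_step), round(turn_deg / angle_step_deg))

def build_table(buildings):
    """Hash every corner triple of every closed outline: key -> [(building id, first corner index)]."""
    table = defaultdict(list)
    for b_id, outline in enumerate(buildings):
        n = len(outline)
        for i in range(n):
            key = edge_pair_key(outline[i], outline[(i + 1) % n], outline[(i + 2) % n])
            table[key].append((b_id, i))
    return table

def lookup(table, fragment):
    """Candidate map positions for a three-corner plan-view fragment (hypothetical query format)."""
    return table.get(edge_pair_key(*fragment), [])
```

A practical table would also need to tolerate values that fall near bin boundaries, for example by probing neighbouring bins; this sketch omits that.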

Estimation of Quasi-Manhattan Vanishing Points

• Use EM algorithm (Schindler et al. 3DPVT 2006)

– Details in paper

• Image rectification → 3D verticals become parallel (||) to the image y-axis
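One way to picture the rectification step is as a rotation of the camera frame that turns the 3D vertical direction onto the image y-axis. The sketch below assumes a calibrated perspective view, so that the vertical vanishing point, multiplied by the inverse intrinsics, gives the vertical direction `d` in camera coordinates. The helper names are hypothetical and the slides do not confirm that the authors rectify this way.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def matvec(A, v):
    return [dot(row, v) for row in A]

def rotation_aligning(a, b):
    """Rotation matrix (Rodrigues formula) that turns unit vector a onto unit vector b."""
    axis = cross(a, b)
    s = math.sqrt(dot(axis, axis))  # sin of the rotation angle
    c = dot(a, b)                   # cos of the rotation angle
    I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    if s < 1e-12:
        return I  # already aligned; the antiparallel case is not handled in this sketch
    k = [v / s for v in axis]
    K = [[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]]
    K2 = matmul(K, K)
    return [[I[i][j] + s * K[i][j] + (1.0 - c) * K2[i][j] for j in range(3)] for i in range(3)]

def rectified_x(R, point):
    """Image x-coordinate (normalized units) of a 3D point after rotating the camera frame by R."""
    X = matvec(R, point)
    return X[0] / X[2]
```

After this rotation, every point on a 3D vertical line P + t·d maps to the same image x-coordinate, which is what makes vertical corner lines appear as image columns.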

Vertical Corner Line Hypothesis (VCLH)

• Hypotheses for corners of buildings
  – Based on heuristics
• 3 Categories:
  – Basic Line
  – Uni-Normal Augmented Line
  – Bi-Normal Augmented Line

Elemental Planes

• Elemental Plane:
  – 2 VCLHs connected by groups of collinear horizontal edges
    • Same plane normals on linked sides
• Invariant Depth Ratio (figure labels z_a, z_b mark the two VCLHs):

$$\frac{z_a}{z_b} = f(\text{plane normal}, \text{VCLH positions}) = \text{constant}$$
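Why the depth ratio depends only on the plane normal and the VCLH positions can be checked with standard pinhole geometry. Assume a rectified image with the principal point at x = 0 and focal length `focal`, and a vertical plane whose plan-view equation in camera coordinates is n_x·X + n_z·Z = d. A point at image column x and depth z has plan-view coordinates (z·x/focal, z), so z·(n_x·x/focal + n_z) = d, and the unknown offset d cancels in the ratio. The function name and argument layout below are hypothetical, not taken from the slides.

```python
def depth_ratio(x_a, x_b, normal, focal=1.0):
    """Return z_a / z_b for two vertical lines at image columns x_a and x_b on one vertical plane.

    normal = (n_x, n_z) is the plan-view plane normal in camera coordinates;
    its length and sign do not affect the result.
    """
    n_x, n_z = normal
    return (n_x * x_b + n_z * focal) / (n_x * x_a + n_z * focal)
```

For example, the plan-view points (0, 10) and (4, 7) both lie on the plane 0.6·X + 0.8·Z = 8; with `focal = 1` they project to columns 0 and 4/7, and the function recovers their depth ratio 10/7 without knowing either depth.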

Structural Fragments

• Structural fragment
  – Sequence of adjacent elemental planes sharing bi-normal VCLHs

$$z = f(\text{plane normals}, \text{VCLH positions})$$

• Full 3D structure (modulo scale)
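Chaining the depth-ratio constraint across adjacent planes gives a plan-view polyline up to one unknown scale. The sketch below assumes the same pinhole setup as the depth ratio (rectified image, principal point at x = 0) and fixes the first corner at depth 1; `fragment_plan_view` and its argument layout are hypothetical.

```python
def fragment_plan_view(columns, normals, focal=1.0):
    """Plan-view corners (X, Z) of a structural fragment, with the first corner at depth 1.

    columns: image x-positions of the l + 1 VCLHs, in order.
    normals: plan-view normals (n_x, n_z) of the l planes between them.
    """
    depths = [1.0]  # arbitrary scale: the true depth of the first corner is unknown
    for (x_a, x_b), (n_x, n_z) in zip(zip(columns, columns[1:]), normals):
        ratio = (n_x * x_b + n_z * focal) / (n_x * x_a + n_z * focal)  # z_a / z_b
        depths.append(depths[-1] / ratio)
    # Back-project each column at its relative depth.
    return [(z * x / focal, z) for x, z in zip(columns, depths)]
```

With the true corners (-2, 10), (2, 8), (5, 12), the columns are -0.2, 0.25 and 5/12 and the plane normals are proportional to (2, 4) and (4, -3); the function returns the same polyline scaled by 1/10.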

More Examples

Elemental Planes

Structural Fragments

Matching with Structural Fragments

• Exhaustive testing:
  – Correspondence
    • structural fragment of l planes ↔ l linked building edges
  – Best-fit matching with error
  – Consensus support C from other VCLHs
• Vote in pose-space accumulator array
  – Vote score s: [formula garbled in source; surviving symbols are l, C, 1 and 2]
• Complexity: O(n), n = # of building corners in map
  – 8 s per search in MATLAB
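The pose-space accumulator can be sketched as a dictionary over quantized (x, y, θ) cells. The slide's own vote-score formula is not legible in this transcript, so the scores below are simply supplied by the caller; the cell size, the number of angle bins and the function names are likewise assumptions made for illustration.

```python
import math
from collections import defaultdict

def pose_bin(x, y, theta, cell=5.0, n_angle_bins=36):
    """Quantize a camera pose (map position x, y and heading theta in radians) into a cell."""
    angle_bin = int((theta % (2 * math.pi)) / (2 * math.pi) * n_angle_bins) % n_angle_bins
    return (math.floor(x / cell), math.floor(y / cell), angle_bin)

def vote_for_pose(candidates, cell=5.0, n_angle_bins=36):
    """Accumulate (x, y, theta, score) votes and return the winning cell and its total score."""
    accumulator = defaultdict(float)
    for x, y, theta, score in candidates:
        accumulator[pose_bin(x, y, theta, cell, n_angle_bins)] += score
    best = max(accumulator, key=accumulator.get)
    return best, accumulator[best]
```

Several consistent fragment matches that each imply nearly the same pose reinforce one cell, while an isolated high-scoring false match stays alone in its own cell.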

Matching Example with Structural Fragments

[Figure panels: inconsistent matches; consistent matches]

Experiments – Dataset I

• Bronx neighborhood of Woodstock
• Google Street View images (total 212)
  – 53 unique locations, 4 images per location (shown in quads)
• Manually created building outline plan view map
  – 111 buildings with 885 corners

Experiments – Dataset II

• Singapore government housing (HDB) estate
• Self-collected images (total 120)
  – 30 unique locations, 4 images per location
• Manually created building outline plan view map
  – 20 mega buildings with 659 corners

Matching Results

• Compare probe signature to signatures at 3600 grid locations, and sort matching scores
  – Find rank of ground truth

[Plot: match ranks (selectivity of 0-10%) against % of test probes where correct pose is better than this rank]
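The evaluation described here (rank of the ground-truth location among all grid locations, then the share of probes within a given selectivity) can be sketched as follows. It assumes higher matching scores are better and that a probe counts when its rank is at or within the cutoff; both conventions and the function names are assumptions rather than details from the slides.

```python
def ground_truth_rank(scores, gt_index):
    """1-based rank of the ground-truth location when higher scores are better."""
    gt_score = scores[gt_index]
    return 1 + sum(1 for s in scores if s > gt_score)

def percent_within(ranks, n_locations, selectivity_percent):
    """Percentage of probes whose ground-truth rank falls in the top selectivity_percent of locations."""
    cutoff = n_locations * selectivity_percent / 100
    return 100 * sum(1 for r in ranks if r <= cutoff) / len(ranks)
```

With 3600 grid locations, a selectivity of 1% corresponds to a rank cutoff of 36, which is the "36 out of 3600" figure quoted in the observations.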

• Example results for matching
• 3D models are only used for visualizing results

Dataset II Example Correct Matches

Observations

• This is a start to solving a challenging problem
  – difficult even for humans
• Results are mixed:
  – Selectivity is very high
    • 57-70% of correct poses within top-1% selectivity (36 out of 3600)
  – But needs to be higher to be end-usable
  – Yet in ideal conditions signatures appear very discriminative
• Main challenges
  – False VCLH negatives (some)
    • building corners not detected due to poor resolution, etc.
  – False VCLH positives (many)
    • Windows / other façade features often misdetected as corners
  – Architectural designs are seldom perfect extrusions
    • Overhangs, balconies, fire escapes, etc.

Concluding Remarks

• Geometric features can be powerful for discriminating locations
  – Do not always have to rely on prior appearance data
  – Intelligent extension to geometric 2½D features
    • 2D → 2D+normals → 2D+depth ratios → 3D (mod scale)
  – Informal tests in ideal conditions show excellent discriminating power
• Key challenge lies in more robust image analysis
  – Needs robustness to noise and minor deviations from map
• Future Work
  – Use existing results to bootstrap more advanced (and costly) registration techniques
    • E.g. top-down bundle adjustment working directly on raw image intensities, rather than detected edgels

Credits

• Joint work with
  – Arridhana Ciptadi
  – Wei-Chian Tan
  – Minh-Tri Pham
  – Clement Liang-Tien Chia
• Thanks
  – Teck-Khim Ng
  – Zahoor Zafrulla
  – Rudianto Sugiyarto
• Research Sponsor
  – Project Tacrea Grant, Defence Science & Technology Agency (DSTA), Singapore

Scene Assumptions

• Quasi-Manhattan World
  – Vertical direction is orthogonal to all horizontal directions
  – Horizontal directions need not be orthogonal to each other
• Vertical Extrusion Model
  – Each building is a vertical extrusion of a ground-plane cross section
    • Implies buildings have simple vertical planar facades

Potential Future Directions

• Exploit localized architectural design “language”?
  – priors to improve geometric feature detection in poor quality images
  – predict occluded parts of higher order geometric features that form the local architectural “vocabulary”
• Investigate if reasonable to have prior distribution that buildings close by have similar geometric designs
