Estimating Camera Pose from a Single Urban Ground-View Omnidirectional Image and a 2D Building Outline Map
Tat-Jen CHAM (with Arridhana Ciptadi, Wei-Chian Tan, Minh-Tri Pham, Liang-Tien Chia)
Center for Multimedia & Network Technology (CeMNet)
School of Computer Engineering
Nanyang Technological University, Singapore
Urban Landmarks
• Those easy to recognize
• Those that aren’t
© kevincole
© qureyoon
© Anirudh Koul
“Back-to-Basics” Map Reading!
• An image or images taken from a single location, at probe time
• A plan-view outline map
• Won’t consider GPS
– GPS reception bad in high-rise urban areas
– GPS can be jammed or spoofed
Related Work and Differences
• Appearance-based matching in urban areas
– Robertson & Cipolla BMVC04, Yeh et al. CVPR04, Zhang & Košecká 3DPVT06
• General wide-baseline stereo / multi-view (but not targeted for searching through significant-sized datasets)
– Bay et al. CVPR05, Mičušík et al. CVPR08, Schindler et al. 3DPVT06, Schmid & Zisserman IJCV00, Werner & Zisserman ECCV02
• Key differences here:
– No prior appearance information
  • Only a 2D plan-view geometric map available
– No stereo / multi-view
  • Images are taken from single location
A Geometric Matching Paradigm
• Assume buildings are vertical planar extrusions
• Match building corners in the map to vertical corner lines in the rectified image
– Significant building corners
– Not façade details / painted edges
Geometric Signature
2D Geometric Image Features
• Basic Lines (2D)
2½D Geometric Image Features
• David Marr’s bottom-up visual perception framework: Image → Primal Sketch → 2½D Sketch → 3D model
• Augmented Lines (2D + adjacent 3D normals)
• Elemental Planes (2D + fixed depth ratios of vertical boundaries)
• Structural Fragments (piecewise 3D structures with unknown scales)
Geometric Signatures – Uniqueness Analysis Under Ideal Conditions
• Basic Lines (2D)
• Augmented Lines (2D + 3D normals)
• Elemental Planes (2D + fixed depth ratios)
• Structural Fragments (3D structure with unknown scale)
• Figure legend: strong match, poor match
Overview of Localization Method
• Inputs: query image, 2D map
• Output: camera pose
• Stages shown in the diagram:
– Calibration from vanishing points
– Extract vertical corners + normals
– Recover elemental planes with 3D normals
– Link into plan-view structural fragments (modulo similarity)
– Geometric hashing lookup for correspondence candidates
– Voting-based estimate of optimal camera pose
• Diagram labels: BOTTOM-UP, TOP-DOWN
Estimation of Quasi-Manhattan Vanishing Points
• Use EM algorithm (Schindler et al. 3DPVT 2006)
– Details in paper
• Image rectification: 3D verticals become parallel to the image y-axis
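The rectification step can be illustrated with a minimal sketch. It assumes a pinhole view with known intrinsics K and an already estimated vertical vanishing point; the EM estimation itself is not shown. The function name `rectifying_homography` and the example intrinsics are hypothetical and do not come from the talk.

```python
import numpy as np

def rectifying_homography(K, v_vertical):
    """Homography K R K^-1 that sends the vertical vanishing point to
    infinity along the image y-axis (assumed pinhole model)."""
    # Back-project the vanishing point to the 3D vertical direction.
    d = np.linalg.inv(K) @ np.asarray(v_vertical, dtype=float)
    d /= np.linalg.norm(d)
    if d[1] < 0:  # choose the sign that points along +y
        d = -d
    y = np.array([0.0, 1.0, 0.0])
    # Rotation taking d onto the y-axis (Rodrigues formula).
    axis = np.cross(d, y)
    s, c = np.linalg.norm(axis), float(d @ y)
    if s < 1e-12:
        R = np.eye(3)  # already vertical
    else:
        k = axis / s
        Kx = np.array([[0.0, -k[2], k[1]],
                       [k[2], 0.0, -k[0]],
                       [-k[1], k[0], 0.0]])
        R = np.eye(3) + s * Kx + (1.0 - c) * (Kx @ Kx)
    return K @ R @ np.linalg.inv(K)

# Hypothetical intrinsics and vanishing point (homogeneous pixel coordinates).
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
v = np.array([350.0, 5000.0, 1.0])
H = rectifying_homography(K, v)
h = H @ v  # third coordinate near zero: the point is now at infinity
```

After warping with H, all lines that passed through the vertical vanishing point are parallel to the image y-axis, which is the property the slide asks for.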
Vertical Corner Line Hypothesis (VCLH)
• Hypotheses for corners of buildings
– Based on heuristics
• 3 Categories:
– Basic Line
– Uni-Normal Augmented Line
– Bi-Normal Augmented Line
Elemental Planes
• Elemental Plane:
– 2 VCLHs connected by groups of collinear horizontal edges
  • Same plane normals on linked sides
• Invariant Depth Ratio:
$\frac{z_a}{z_b} = f(\text{plane normal}, \text{VCLH positions}) = \text{constant}$
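One way to see why the ratio is fixed is a plan-view sketch under stated assumptions: the camera sits at the origin, each VCLH contributes a bearing angle, the facade is a line with unit normal n, and z is the range along the viewing ray. Every point p on the facade satisfies n·p = c, so with p = z·d(θ) the unknown offset c cancels and z_a / z_b = (n·d_b) / (n·d_a). The function name `depth_ratio` is hypothetical, and the talk's exact definition of z may differ.

```python
import numpy as np

def depth_ratio(normal_angle, bearing_a, bearing_b):
    """Plan-view ratio z_a / z_b for two corners on one facade.
    Assumes z is the range along the ray from the camera."""
    n = np.array([np.cos(normal_angle), np.sin(normal_angle)])
    d_a = np.array([np.cos(bearing_a), np.sin(bearing_a)])
    d_b = np.array([np.cos(bearing_b), np.sin(bearing_b)])
    # n.(z_a d_a) = n.(z_b d_b) = c, so the plane offset c drops out.
    return float(n @ d_b) / float(n @ d_a)

# Facade x = 5 seen from the origin, with corners at (5, -2) and (5, 3).
ratio = depth_ratio(0.0, np.arctan2(-2.0, 5.0), np.arctan2(3.0, 5.0))
```

The ratio depends only on the plane normal and the two bearings, so it is unchanged if the facade is moved farther away along the same rays.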
Structural Fragments
• Structural fragment
– Sequence of adjacent elemental planes sharing bi-normal VCLHs
• $z = f(\text{plane normals}, \text{VCLH positions})$
– Full 3D structure (modulo scale)
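Chaining per-plane depth ratios gives a plan-view outline up to one unknown scale. The sketch below assumes the camera is at the origin, corner bearings are known, and the i-th plane joins corners i and i+1. The function name `fragment_plan_view` is hypothetical, and the first corner's range is fixed to 1 purely as a normalization choice.

```python
import numpy as np

def fragment_plan_view(bearings, normal_angles):
    """Plan-view corner positions of a structural fragment, up to scale.
    bearings: l+1 corner bearings; normal_angles: l plane normal angles."""
    bearings = np.asarray(bearings, dtype=float)
    dirs = np.stack([np.cos(bearings), np.sin(bearings)], axis=1)
    depths = [1.0]  # range of the first corner, fixed arbitrarily
    for i, phi in enumerate(normal_angles):
        n = np.array([np.cos(phi), np.sin(phi)])
        # Both corners of plane i lie on the same line n.p = c.
        depths.append(depths[-1] * float(n @ dirs[i]) / float(n @ dirs[i + 1]))
    return np.asarray(depths)[:, None] * dirs

# L-shaped outline: facade x = 5 followed by facade y = 3.
true_corners = np.array([[5.0, -2.0], [5.0, 3.0], [2.0, 3.0]])
bearings = np.arctan2(true_corners[:, 1], true_corners[:, 0])
pts = fragment_plan_view(bearings, [0.0, np.pi / 2])
```

Multiplying `pts` by the true range of the first corner (here √29) recovers the original corners, which illustrates the "modulo scale" qualifier.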
More Examples
Elemental Planes
Structural Fragments
Matching with Structural Fragments
• Exhaustive testing:
– Correspondence
  • structural fragment of l planes ↔ l linked building edges
– Best-fit matching with error
– Consensus support C from other VCLHs
• Vote in pose-space accumulator array
– Vote score s: the formula is garbled in the source; the surviving symbols are C and l
• Complexity: O(n), n = # of building corners in map
– 8 s per search in MATLAB
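The best-fit and voting steps can be sketched as follows. This is an illustration under assumptions, not the talk's implementation: the fit is a generic least-squares 2D similarity written with complex numbers, the accumulator layout (x, y, heading bins over a square map of side `extent`) is hypothetical, and the vote weight 1 / (1 + err) is a placeholder because the talk's score s is not recoverable from the source.

```python
import numpy as np

def fit_similarity(fragment_pts, map_pts):
    """Least-squares similarity q ~ a*p + b (a, b complex).
    The camera is the fragment-frame origin, so b is its map position."""
    fragment_pts = np.asarray(fragment_pts, dtype=float)
    map_pts = np.asarray(map_pts, dtype=float)
    p = fragment_pts[:, 0] + 1j * fragment_pts[:, 1]
    q = map_pts[:, 0] + 1j * map_pts[:, 1]
    A = np.stack([p, np.ones_like(p)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, q, rcond=None)
    err = float(np.sqrt(np.mean(np.abs(a * p + b - q) ** 2)))
    return a, b, err

def vote(acc, a, b, err, extent):
    """Add one vote at the pose implied by (a, b). Placeholder weight."""
    nx, ny, nt = acc.shape
    ix = int(np.floor(b.real / extent * nx))
    iy = int(np.floor(b.imag / extent * ny))
    it = int(np.floor((np.angle(a) % (2 * np.pi)) / (2 * np.pi) * nt)) % nt
    if 0 <= ix < nx and 0 <= iy < ny:
        acc[ix, iy, it] += 1.0 / (1.0 + err)

# Synthetic check: map corners are a rotated, scaled, shifted fragment.
frag = np.array([[5.0, -2.0], [5.0, 3.0], [2.0, 3.0]])
a_true, b_true = 2.0 * np.exp(1j * np.pi / 4), 12.5 + 21.0j
w = a_true * (frag[:, 0] + 1j * frag[:, 1]) + b_true
map_corners = np.stack([w.real, w.imag], axis=1)

acc = np.zeros((60, 60, 36))
a, b, err = fit_similarity(frag, map_corners)
vote(acc, a, b, err, extent=100.0)
```

In an exhaustive search this fit-and-vote pair would run once per candidate correspondence, and the accumulator peak would give the pose estimate.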
Matching Example with Structural Fragments
Inconsistent matches Consistent matches
Experiments – Dataset I
• Bronx neighborhood of Woodstock
• Google Street View images (total 212)
– 53 unique locations, 4 images per location (shown in quads)
• Manually created building outline plan view map
– 111 buildings with 885 corners
Experiments – Dataset II
• Singapore government housing (HDB) estate
• Self-collected images (total 120)
– 30 unique locations, 4 images per location
• Manually created building outline plan view map
– 20 mega buildings with 659 corners
Matching Results
• Compare probe signature to signatures at 3600 grid locations, and sort matching scores
– Find rank of ground truth
• Plot: % of test probes where the correct pose is better than a given match rank, for selectivity of 0-10%
• Example results for matching
• 3D models are only used for visualizing results
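The ranking evaluation described above can be sketched in a few lines. The sketch assumes that a higher matching score is better and that ties are resolved in favor of the ground truth; both function names are hypothetical.

```python
import numpy as np

def ground_truth_rank(scores, gt_index):
    """1-based rank of the ground-truth location among all candidates,
    assuming higher scores are better."""
    scores = np.asarray(scores, dtype=float)
    return int(np.sum(scores > scores[gt_index])) + 1

def fraction_within(ranks, n_candidates, selectivity):
    """Share of probes whose ground truth falls in the top
    `selectivity` fraction of the candidate list."""
    cutoff = selectivity * n_candidates
    return float(np.mean(np.asarray(ranks) <= cutoff))

# Toy numbers: four candidate scores, then four probe ranks out of 3600.
rank = ground_truth_rank([0.1, 0.9, 0.5, 0.7], gt_index=2)
share = fraction_within([1, 36, 37, 400], n_candidates=3600, selectivity=0.01)
```

With 3600 grid locations, a selectivity of 1% corresponds to the top 36 candidates, matching the figure quoted later in the talk.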
Dataset II Example Correct Matches
Observations
• This is a start to solving a challenging problem
– Difficult even for humans
• Results are mixed:
– Selectivity is very high
  • 57-70% of correct poses within top-1% selectivity (36 out of 3600)
– But it needs to be higher to be end-usable
– Yet in ideal conditions signatures appear very discriminative
• Main challenges
– False VCLH negatives (some)
  • Building corners not detected due to poor resolution, etc.
– False VCLH positives (many)
  • Windows / other façade features often misdetected as corners
– Architectural designs are seldom perfect extrusions
  • Overhangs, balconies, fire escapes, etc.
Concluding Remarks
• Geometric features can be powerful for discriminating locations
– Do not always have to rely on prior appearance data
– Intelligent extension to geometric 2½D features
  • 2D → 2D+normals → 2D+depth ratios → 3D (mod scale)
– Informal tests in ideal conditions show excellent discriminating power
• Key challenge lies in more robust image analysis
– Needs robustness to noise and minor deviations from map
• Future Work
– Use existing results to bootstrap more advanced (and costly) registration techniques
  • E.g. top-down bundle adjustment working directly on raw image intensities, rather than detected edgels
Credits
• Joint work with
– Arridhana Ciptadi
– Wei-Chian Tan
– Minh-Tri Pham
– Clement Liang-Tien Chia
• Thanks
– Teck-Khim Ng
– Zahoor Zafrulla
– Rudianto Sugiyarto
• Research Sponsor
– Project Tacrea Grant, Defence Science & Technology Agency (DSTA), Singapore
Scene Assumptions
• Quasi-Manhattan World
– Vertical direction is orthogonal to all horizontal directions
– Horizontal directions need not be orthogonal to each other
• Vertical Extrusion Model
– Each building is a vertical extrusion of a ground-plane cross section
  • Implies buildings have simple vertical planar facades
Potential Future Directions
• Exploit localized architectural design “language”?
– Priors to improve geometric feature detection in poor quality images
– Predict occluded parts of higher order geometric features that form the local architectural “vocabulary”
• Investigate whether it is reasonable to assume a prior distribution in which buildings close by have similar geometric designs