Vladimir Nedović. Depth Estimation via Scene Classification. vnedovic@science.uva.nl. With: Arnold Smeulders & Jan-Mark Geusebroek (UvA), André Redert (Philips Research). 28-05-2008.
Depth Estimation via Scene Classification
Vladimir Nedović
28-05-2008
vnedovic@science.uva.nl
with: Arnold Smeulders & Jan-Mark Geusebroek (UvA), André Redert (Philips Research)
Order in Pollock's Chaos
seems chaotic, but there is structure, same as in natural image statistics
viewpoint constraints understood, influence on film art
'modal' scene configurations: structures orthogonal to each other
Jackson Pollock, Blue Poles: Number 1, 1952
R.P. Taylor, A.P. Micolich and D. Jonas, Fractal Analysis of Pollock's Drip Paintings, Nature, vol. 399, p. 422 (1999)
Sandro Botticelli, Annunciation, 1489-90
Post-perspective (Quattrocento, after 1430)
Pre-perspective (Gothic art, before 1430)
Simone Martini (1285-1344)
W. Richards, A. Jepson and J. Feldman, Priors, Preferences and Categorical Percepts, in Perception as Bayesian Inference, pp. 80-111, 1996.
Know any tilted buildings?
Outline
Introduction
Related work
Our approach
Preliminary classification
Conclusions
Introduction
The context: fully automatic 2D to 3D conversion of video data for 3DTV
GOAL: quickly obtain an approximate but visually pleasing 3D model from a single image
We know about stereo, structure from motion, etc., but can we also derive depth from a single image? Humans can, right?
Can we exploit some constraints? Is the data really chaotic? What about perceptual limitations of viewers?
Related work
Related work (1): Torralba & Oliva showed that depth can be derived from structure, itself derived from natural image statistics (IEEE PAMI 2001)
Related work (2): Hoiem (Carnegie Mellon Univ.) obtained 3D orientation of scene surfaces using machine learning (ICCV 2005); improved object detection (CVPR 2006 best paper) and accounted for occlusions to derive relative ordering of elements (ICCV 2007)
Related work (3): Saxena (Stanford Univ.): 3D mesh from ML on low-level features (no classes)
BUT: outdoor images only, and assumes sky & ground are always present, i.e. accounts for less than half of all possibilities
Our approach
Our approach: depth estimation via geometric scene classification, i.e. holistic, not pixel-based
Separate a visual scene into its two constituent elements: consider objects separately from the stage on which they act
Determine the 3D stage model first
Stage ≈ first approximation of global depth; reduces subsequent (finer) depth processing tasks; can guide other processes, e.g. object localization & recognition
V. Nedović et al., ICCV 2007
Figure labels: object, stage
Our approach - stage models -
For the stage, a rough depth model is sufficient
Exploit geometric structure of images, which reduces the number of possible configurations
Only a few configurations are prominent => the first step in depth estimation can be stage classification
Regularities arise from:
natural image statistics -> texture gradients
viewpoint constraints -> perspective
modal configurations & film rules -> orthogonality
Our approach - stage hierarchy -
Structure of the visual world leads to only 15 geometric scene types
Influence of structure identical indoors & outdoors => such distinction unnecessary
Three-level hierarchy
Perform classification in steps: first determine the geometric neighbourhood, then proceed further
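The stepwise classification described above can be sketched as a simple router: a first classifier picks the geometric neighbourhood (stage group), and a second classifier specific to that group picks the stage. This is a minimal sketch, not the authors' implementation. The group labels I to V and stage numbers 1 to 12 follow the results tables of this presentation; which stages belong to which group is inferred from the dataset shares in those tables, and treating "no depth" as its own group is an assumption. The function and variable names are hypothetical.

```python
# Assumed grouping: inferred from the dataset shares in the results tables.
STAGE_GROUPS = {
    "I": [1, 2, 3, 4],    # straight/no bkg.
    "II": [5, 6],         # tilted bkg.
    "III": [7, 8],        # box
    "IV": [9],            # corner
    "V": [10, 11],        # person+bkg
    "no depth": [12],     # assumption: handled as a group of its own
}


def classify_two_step(features, group_classifier, stage_classifiers):
    """Route a feature vector through a group classifier, then a per-group one.

    group_classifier: callable returning a key of STAGE_GROUPS.
    stage_classifiers: dict mapping a group key to a callable returning a stage number.
    """
    group = group_classifier(features)
    candidates = STAGE_GROUPS[group]
    if len(candidates) == 1:
        # Only one stage in this group, so no second decision is needed.
        return group, candidates[0]
    stage = stage_classifiers[group](features)
    if stage not in candidates:
        raise ValueError(f"stage {stage} does not belong to group {group}")
    return group, stage


# Toy usage with stand-in classifiers (hypothetical decision rules).
toy_group = lambda f: "II" if f[0] > 0 else "IV"
toy_stage = {"II": lambda f: 5 if f[1] > 0 else 6}
print(classify_two_step([1.0, -1.0], toy_group, toy_stage))
print(classify_two_step([-1.0, 0.0], toy_group, toy_stage))
```

The second step only ever chooses among the two to four stages of one group, which is the sense in which each step reduces the problem for the next.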
Our approach - three-level hierarchy -
2-3 sub-stages per stage, accounting for variability in parameters
Geometry at the bottom is so constrained that pre-defined crude depth maps are already possible
i.e. no parameter estimation needed!
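To make the idea of pre-defined crude depth maps concrete, here is a minimal sketch that returns a fixed depth template for a given stage label. The presentation does not specify what the templates look like, so everything here is an assumption for illustration: the function name `depth_template`, the convention that 0 is nearest and 1 is farthest, the `horizon` and `wall_depth` defaults, and the shapes chosen for the three stage names (which are taken from the results table).

```python
def depth_template(stage, height, width, horizon=0.5, wall_depth=0.6):
    """Return a height x width grid of depths in [0, 1]; 0 is nearest, 1 is farthest.

    Hypothetical templates: a ground plane receding from the bottom edge up to
    a horizon row, topped by either sky (farthest) or a frontal background wall.
    """
    if stage not in ("no depth", "sky+gnd", "gnd+bkg"):
        raise ValueError(f"no template sketched for stage {stage!r}")
    if stage == "no depth":
        return [[0.0] * width for _ in range(height)]  # flat scene

    horizon_row = int(height * horizon)
    far = 1.0 if stage == "sky+gnd" else wall_depth
    ground_rows = max(1, height - 1 - horizon_row)
    rows = []
    for y in range(height):
        if y <= horizon_row:
            depth = far  # sky or background wall at constant depth
        else:
            # Linear ramp: bottom row is nearest, the horizon row reaches `far`.
            depth = far * (height - 1 - y) / ground_rows
        rows.append([depth] * width)
    return rows


for row in depth_template("sky+gnd", 8, 4):
    print(["%.2f" % d for d in row])
```

Because the template depends only on the stage label and the image size, assigning depth after classification is a lookup, with no per-image parameter fitting.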
Preliminary classification (1)
Proof of concept with a single feature type: natural image statistics-based Weibull features (i.e. texture gradients)
TRECVID dataset of TV news used for evaluation
A.F. Smeaton et al. “Evaluation campaigns and TRECVid”, 8th ACM Int’l Workshop on Multimedia Info. Retrieval, 2006.
Features extracted based on a 4x4 region grid over the image
two features per region => 64 features in total
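The grid-based feature layout can be sketched as follows. This is an illustrative simplification, not the method used in the presentation: it fits a two-parameter Weibull distribution to absolute pixel differences by the method of moments, which is swapped in here because it needs only the standard library. The slide states two features per region and 64 features in total; one reading that reproduces that total, assumed here, is two Weibull parameters per region for each of two gradient directions (horizontal and vertical). The function names are hypothetical.

```python
import math
import random


def fit_weibull_moments(samples):
    """Method-of-moments estimate of Weibull (shape, scale) for samples >= 0."""
    n = len(samples)
    if n == 0:
        return 0.0, 0.0
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    if mean <= 0.0 or var <= 0.0:
        return 0.0, 0.0  # flat region: no usable edge statistics

    target = var / mean ** 2  # squared coefficient of variation

    def cv2(shape):
        g1 = math.gamma(1.0 + 1.0 / shape)
        g2 = math.gamma(1.0 + 2.0 / shape)
        return g2 / (g1 * g1) - 1.0

    lo, hi = 0.1, 50.0  # cv2 decreases monotonically as the shape grows
    for _ in range(60):  # bisection on the shape parameter
        mid = 0.5 * (lo + hi)
        if cv2(mid) > target:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    scale = mean / math.gamma(1.0 + 1.0 / shape)
    return shape, scale


def grid_features(image, grid=4):
    """Concatenate Weibull fits of |dx| and |dy| for each cell of a grid x grid layout."""
    height, width = len(image), len(image[0])
    features = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            dx = [abs(image[y][x + 1] - image[y][x])
                  for y in range(y0, y1) for x in range(x0, x1 - 1)]
            dy = [abs(image[y + 1][x] - image[y][x])
                  for y in range(y0, y1 - 1) for x in range(x0, x1)]
            for responses in (dx, dy):
                features.extend(fit_weibull_moments(responses))
    return features


rng = random.Random(0)
toy_image = [[rng.random() for _ in range(32)] for _ in range(32)]
print(len(grid_features(toy_image)))  # 16 cells x 2 directions x 2 parameters
```

Keeping the cells in a fixed order means the position of a value in the vector encodes where in the image it was measured, so a classifier can pick up top-to-bottom changes in texture statistics.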
Preliminary classification (2)
Support Vector Machines (SVM) classifier based on a 1 vs. 1 multi-class approach
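The 1 vs. 1 multi-class scheme trains one binary classifier per pair of classes and lets them vote. The sketch below shows only that voting structure. To stay self-contained it uses a nearest-centroid rule as a stand-in for each binary SVM, so it is not an SVM and not the setup used in the presentation; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_one_vs_one(samples, labels):
    """Build one binary model per unordered class pair.

    Stand-in binary learner: each model is just the pair of class centroids.
    """
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    centroids = {c: centroid(rows) for c, rows in by_class.items()}
    return {(a, b): (centroids[a], centroids[b])
            for a, b in combinations(sorted(centroids), 2)}


def predict_one_vs_one(models, x):
    """Each pairwise model casts one vote; the class with most votes wins."""
    votes = Counter()
    for (a, b), (ca, cb) in models.items():
        winner = a if squared_distance(x, ca) <= squared_distance(x, cb) else b
        votes[winner] += 1
    return votes.most_common(1)[0][0]


toy_samples = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
toy_labels = ["a", "a", "b", "b", "c", "c"]
toy_models = train_one_vs_one(toy_samples, toy_labels)
print(len(toy_models), predict_one_vs_one(toy_models, [9, 0]))
```

With k classes the scheme needs k(k-1)/2 pairwise models, so the 12 stage classes in the results table would need 66.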
Individual stages (results of symmetrical variants combined):

 #   class name      % in dataset   % correct
 1   sky+bkg+gnd         6.3%         16.7%
 2   gnd+bkg             7.1%          8.2%
 3   sky+gnd             8.7%         60.7%
 4   gnd+bkg             7.4%         44.7%
 5   gnd+diagBkg        10.8%         26.9%
 6   diagBkg             6.4%         14.3%
 7   box                 5.5%          8.1%
 8   1 side-wall         9.0%         13.6%
 9   corner             10.8%         34.3%
10   tab+pers+bkg        7.4%         48.0%
11   pers+bkg           13.1%         42.5%
12   no depth            7.4%         22.4%
     AVG:                             28.4%

Stage groups:

group   name                % in dataset   % correct
I       straight/no bkg.        29.5%        69.5%
II      tilted bkg.             17.2%        35.2%
III     box                     14.5%        19.6%
IV      corner                  10.8%        13.2%
V       person+bkg              20.5%        63.1%
        AVG:                                 40.1%

Two-step classification, average within group (assuming super-stage is known):

group   name                % correct (AVG)
I       straight/no bkg.        59.5%
II      tilted bkg.             27.0%
III     box                     41.1%
V       person+bkg              72.7%
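The reported averages can be checked with a few lines of arithmetic. The sketch below recomputes the unweighted means of the "% correct" columns, sums the per-stage dataset shares within each group, and derives two simple baselines for comparison. The numbers are copied from the tables above. The assignment of stages to groups is an assumption inferred from those shares, and the presentation does not say which notion of chance it has in mind, so both baselines are shown.

```python
# (stage number, % of dataset, % correct), copied from the individual-stage table.
stage_results = [
    (1, 6.3, 16.7), (2, 7.1, 8.2), (3, 8.7, 60.7), (4, 7.4, 44.7),
    (5, 10.8, 26.9), (6, 6.4, 14.3), (7, 5.5, 8.1), (8, 9.0, 13.6),
    (9, 10.8, 34.3), (10, 7.4, 48.0), (11, 13.1, 42.5), (12, 7.4, 22.4),
]
# (group, % of dataset, % correct), copied from the stage-group table.
group_results = [
    ("I", 29.5, 69.5), ("II", 17.2, 35.2), ("III", 14.5, 19.6),
    ("IV", 10.8, 13.2), ("V", 20.5, 63.1),
]
# Assumed membership: chosen so that stage shares add up to the group shares.
group_members = {"I": (1, 2, 3, 4), "II": (5, 6), "III": (7, 8), "IV": (9,), "V": (10, 11)}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


stage_avg = mean(correct for _, _, correct in stage_results)
group_avg = mean(correct for _, _, correct in group_results)
uniform_chance = 100.0 / len(stage_results)               # guessing among 12 classes
majority_prior = max(share for _, share, _ in stage_results)  # always pick the largest class

print(f"stage AVG {stage_avg:.1f}%  group AVG {group_avg:.1f}%")
print(f"uniform chance {uniform_chance:.1f}%  largest class {majority_prior:.1f}%")
for group, share, _ in group_results:
    summed = sum(s for number, s, _ in stage_results if number in group_members[group])
    print(group, share, round(summed, 1))
```

Both averages are unweighted means over classes, not over images, so frequent and rare stages count equally.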
Conclusions (1)
We need a fast & approximate solution:
do only what is necessary, viewers may not perceive it anyway
generalize where possible, to reduce the problem at every step
Separate a scene into a stage and the objects
Determine the stage 3D model first:
rough model is sufficient
plus, structure greatly reduces the number of possible configurations
and, stage will help us to locate and process objects
Conclusions (2)
Therefore, we can use scene classification as the first step in depth estimation
Due to structure, we can create simple models that fit TV data:
15 stages are sufficient
no need to distinguish between indoor & outdoor
Conclusions (3)
Our approach: three-step classification
geometry at the bottom is constrained enough, so we can already assign pre-defined depth maps
no parameter estimation necessary
Proof of concept demonstrated with a single feature type:
performance much better than chance
but enhancements needed (more features etc.)