Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Cues and Constraints for Single Image 3D
David Fouhey
Joint work with: Abhinav Gupta, Martial Hebert,
Xiaolong Wang, Wajahat Hussain
2
You See…
3
Unfortunately…
4
Why?
5
Local Cues
6
Constraints
7
Constraints
8
Today
9
Single Image 3D Without a Single 3D Image. Fouhey, Hussain, Gupta, Hebert.
In Submission to ICCV 2015.
Unfolding an Indoor Origami World. Fouhey, Gupta, Hebert. ECCV 2014.
Designing Deep Networks for Surface Normal Estimation. Wang, Fouhey, Gupta. CVPR 2015.
Specific Problem
10
Input Normals
Image Legend
My Other Interests
11
People and Scenes with V. Delaitre, A.A. Efros, I. Laptev, J. Sivic
ECCV ’12, IJCV ‘14
OneWeirdKernelTrick.com Visual Identification of Matrix Rank, Spectral Approaches to Ghost Detection, etc.
with D. Maturana, SIGBOVIK ‘13, ‘15
Visual Prediction with L. Zitnick
CVPR ‘14
Local Cues
12
Local Cues for Surface Normals
13
Hoiem et al. (ICCV ‘05)
Fouhey et al. (ICCV ’13)
Ladicky et al. (ECCV ’14)
Li et al. (CVPR ’15)
Eigen et al. (Arxiv ‘14)
Standard Paradigm
14
Input: RGBD Data Output: Model
Our Paradigm
15
Input: RGBD Data Output: Model
Which One Is Unsupervised?
16
Which One Is Unsupervised?
17
Median Error
23.1 Median Error
19.2
Median Error
17.9 Median Error
21.7
Which One Is Unsupervised?
18
(B) (A)
(D) (C)
Factorization
19
Factorization
20 3D Model from Guo and Hoiem, ICCV13.
3D Structure
Factorization
21
Style 3D Structure
Factorization
22
Style Image
Style Elements
23
Style Image
Detecting with a Style Element
24
Style Element
Input Image
Rectified Images
Detecting with a Style Element
25
Style Element
Input image Detection +
Orientation
Similar to Hedau et al., 2010, Fidler 2012,
Detecting with a Style Element
26
Original Images
Rectified Images
Detection + Orientation
How Do We Discover Elements?
27
…
Automatic Method
Input: RGB Images Output: Style Elements
Modeling Assumptions
28
Overview
29
…
Hypothesize
30
Hypothesize
31
Hypothesize
32 HOG, Dalal and Triggs ’05; ELDA from Hariharan et al. ‘12
What Do We Know?
33
θ
What Do We Know?
34
θ ~Unif([0,360])
θ
H/W ~Unif([1,2])
H
W
Verifying Style Elements
Surface Orientation
X Location 35
Verifying Style Elements
Surface Orientation
Prior = (Rectangular Box)
X Location 36
θ
Verifying Style Elements
Surface Orientation
X Location 37
Verifying Style Elements
Surface Orientation
X Location 38
Verifying Style Elements
X Location 39
𝑃𝑟𝑖𝑜𝑟𝑖 −𝐷𝑎𝑡𝑎𝑖
𝑊
𝑖=1
Verifying Style Elements
Surface Orientation
X Location 40
Verifying Style Elements
Surface Orientation
X Location 41
Discovered Style Elements
Element Detections Element Detections
42
Interpreting
…
…
43
Interpreting
44
Results
45
Input
GT
Output
Results
46
Input
GT
Output
Results
47
Input
GT
Output
Quantitative Results
49
Input Ground Truth Prediction
5⁰
Quantitative Results
50
All Pixels Pixels < 30⁰
Median Error
Proposed 21.7 55.4
Vertical Pixels Pixels < 30⁰
Median Error
19.9 58.8
19.7 59.7
Pixels < 11.25⁰
36.8
3DP 19.2 57.8 39.2
Fouhey et al. ‘14 17.9 58.9
Ladicky et al. ’14 23.5 58.7 40.5 27.7
Eigen et al. ‘14 15.5 71.1
Wang et al. ’15 14.4 68.2 39.2 42.0
Results on Internet Images
51
RGBD Datasets What about?
Results on Internet Images
52
Supermarket
Museum
Laundromat
Locker Room
Results on Internet Images
53
Results on Internet Images
54
Quantitative Results
• Places-205 Dataset (Zhou et al.)
• ~3.7% better than pretrained supervised model (3DP); better in 9/10 categories
55
Observation
56
Element #1
Element #2
Element #3
Element #4
Element #5
Purely Data-Driven Single
Element
Data+Structure
So Far
57
Input Output
What about?
Constraints for Single Image 3D
58
Low Level, Generic
Hoiem et al. 2005, Saxena et al. 2005, 2008, Munoz et al., 2009, etc.
Constraints for Single Image 3D
59
High Level, Physical
Low Level, Generic
Coughlan et al. 2000, Hedau et al. 2009, Del Pero et al., 2011, Wang et al., 2012, Schwing et al. 2012, Lee et al. 2010, Xiao et al. 2012, Zhao et al. 2013, Schwing et al., 2013, etc.
Mid-level in the Past
60 Huffman 71, Clowes 71, Kanade 80, 81 Sugihara 86, Malik 87, etc.
Our Mid-Level Constraints
61
Our Output
62
Input: Single Image
Output: Discrete Scene Parse
Parameterization
63
Parameterization
vp1
vp2
vp3
VP Estimator from Hedau et al., 2009 64
Parameterization
Two VPs give grid cell
65
Encoding Surface Normals
66
Encoding Surface Normals
67
Encoding Surface Normals
68
Encoding Surface Normals
x1,…, x400 x401,…, x800 x801,…, x1200 69
Formulation
70
Constraints
71
Unaries
72
Unaries
Low c Any 3D Evidence
High c
73
Binaries
74
Convex/Concave Constraints
Detected Concave (-) 75
Convex/Concave Constraints
Detected Concave (-) 76
Convex/Concave Constraints
Detected Concave (-) 77
Convex/Concave Constraints
Detected Concave (-) 78
Convex/Concave Constraints
Detected Concave (-) 79
Detecting Convex/Concave
Ground-Truth Discontinuities similar to Gupta, Arbelaez, Malik, 2013 3DP from Fouhey, Gupta, Hebert, 2013
Input 3D Primitive Bank
…
Use 3DP to Transfer Convex/Concave
80
Solving the Model
82
Qualitative Results
83
Qualitative Results
85
Qualitative Results
86
Results – Quantitative
Summary Stats (⁰) (Lower Better)
% Good Pixels (Higher Better)
11.25⁰ 22.5⁰ 30⁰
Proposed 40.5 54.1 58.9
Mean Median
3DP 39.2 52.9 57.8 36.3 19.2
Ladicky ‘14 27.7 49.0 58.7 33.5 23.1
Li ’15* 19.6 40.6 53.7 30.6 27.8
35.2 17.9
88 Fouhey et al. ICCV ’13; Ladicky et al. ECCV ‘14; Li et al. CVPR ‘15
So Far
89
Input Outputs
HOG HOG + Standard Vision Feats + QP
Applying Deep Learning
90
CNN
How do we enforce constraints?
How do we represent the output?
Input Goal Output
Representation
91 Normal Coding from Ladicky, Zeisl, Pollefeys ECCV ‘14
Applying Deep Learning
92
CNN
How do we enforce constraints?
How do we represent the output?
Input Output
What Might We Want?
93
Global Cues Manhattan-World Room Layout
Local Cues Discontinuities
Design
94 Local Edges Global Manhattan Layout
?
Global Network
95
Global
Local Edges Global Manhattan Layout
Local Network
96
Local
Local Edges Global Manhattan Layout
Local Network
97
Local
Local Edges Global Manhattan Layout
Fusion
98
Local
Local Edges Global Manhattan Layout
Global
Fusion
Results
Input Output Input Output Input Output
99
Results
Input Output
100
Results
101
Input Output
Quantitative Results
102
Summary Stats (⁰) % Good Pixels
11.25⁰ 22.5⁰ 30⁰ Mean Median
3DP 39.2 52.9 57.8 36.3 19.2
Ladicky ‘14 27.7 49.0 58.7 33.5 23.1
Li ’15* 19.6 40.6 53.7 30.6 27.8
UNFOLD 40.5 54.1 58.9 35.2 17.9
Proposed 42.0 61.2 68.2 26.9 14.8
Eigen ‘14 39.2 62.0 71.1 23.7 15.5
Eigen et al. Arxiv ’14; Fouhey et al. ICCV ’13, ECCV ‘14; Ladicky et al. ECCV ‘14; Li et al. CVPR ‘15
Summary – Task
103
Input Normals
Summary – Solutions
104
RGB Images Model Surface Normals
Summary – Solutions
105
Edge Constraints Scene Parse
Summary – Solutions
106
Local
Global
Fusion
CNNs + 3D Representations and Constraints
Acknowledgments
107
Collaborators
Xiaolong Wang
Abhinav Gupta Martial Hebert
Sponsors • NSF Graduate Research
Fellowship • NDSEG Fellowship • ONR • NSF • DARPA • Bosch Research &
Technology
Wajahat Hussain
Thank You!
108