Cues and Constraints for Single Image 3Dseminars/seminars/Extra/2015_07_14...Cues and Constraints for Single Image 3D David Fouhey Joint work with: Abhinav Gupta, Martial Hebert, Xiaolong

Cues and Constraints for Single Image 3D

David Fouhey

Joint work with: Abhinav Gupta, Martial Hebert,

Xiaolong Wang, Wajahat Hussain

2

You See…

3

Unfortunately…

4

Why?

5

Local Cues

6

Constraints

7

Constraints

8

Today

9

Single Image 3D Without a Single 3D Image. Fouhey, Hussain, Gupta, Hebert.

In Submission to ICCV 2015.

Unfolding an Indoor Origami World. Fouhey, Gupta, Hebert. ECCV 2014.

Designing Deep Networks for Surface Normal Estimation. Wang, Fouhey, Gupta. CVPR 2015.

Specific Problem

10

Input Normals

Image Legend

My Other Interests

11

People and Scenes with V. Delaitre, A.A. Efros, I. Laptev, J. Sivic

ECCV ’12, IJCV ‘14

OneWeirdKernelTrick.com Visual Identification of Matrix Rank, Spectral Approaches to Ghost Detection, etc.

with D. Maturana, SIGBOVIK ‘13, ‘15

Visual Prediction with L. Zitnick

CVPR ‘14

Local Cues

12

Local Cues for Surface Normals

13

Hoiem et al. (ICCV ‘05)

Fouhey et al. (ICCV ’13)

Ladicky et al. (ECCV ’14)

Li et al. (CVPR ’15)

Eigen et al. (Arxiv ‘14)

Standard Paradigm

14

Input: RGBD Data Output: Model

Our Paradigm

15

Input: RGBD Data Output: Model

Which One Is Unsupervised?

16


17

Median Error

23.1 Median Error

19.2

Median Error

17.9 Median Error

21.7


18

(B) (A)

(D) (C)

Factorization

19

Factorization

20 3D Model from Guo and Hoiem, ICCV13.

3D Structure

Factorization

21

Style 3D Structure

Factorization

22

Style Image

Style Elements

23

Style Image

Detecting with a Style Element

24

Style Element

Input Image

Rectified Images


25

Style Element

Input image Detection +

Orientation

Similar to Hedau et al., 2010, Fidler 2012,


26

Original Images

Rectified Images

Detection + Orientation

How Do We Discover Elements?

27

…

Automatic Method

Input: RGB Images Output: Style Elements

Modeling Assumptions

28

Overview

29

…

Hypothesize

30

Hypothesize

31

Hypothesize

32 HOG, Dalal and Triggs ’05; ELDA from Hariharan et al. ‘12

What Do We Know?

33

θ

What Do We Know?

34

θ ~Unif([0,360])

θ

H/W ~Unif([1,2])

H

W

Verifying Style Elements

Surface Orientation

X Location 35


Surface Orientation

Prior = (Rectangular Box)

X Location 36

θ


Surface Orientation

X Location 37


Surface Orientation

X Location 38


X Location 39

𝑃𝑟𝑖𝑜𝑟𝑖 −𝐷𝑎𝑡𝑎𝑖

𝑊

𝑖=1


Surface Orientation

X Location 40


Surface Orientation

X Location 41

Discovered Style Elements

Element Detections Element Detections

42

Interpreting

…

…

43

Interpreting

44

Results

45

Input

GT

Output

Results

46

Input

GT

Output

Results

47

Input

GT

Output

Quantitative Results

49

Input Ground Truth Prediction

5⁰


50

All Pixels Pixels < 30⁰

Median Error

Proposed 21.7 55.4

Vertical Pixels Pixels < 30⁰

Median Error

19.9 58.8

19.7 59.7

Pixels < 11.25⁰

36.8

3DP 19.2 57.8 39.2

Fouhey et al. ‘14 17.9 58.9

Ladicky et al. ’14 23.5 58.7 40.5 27.7

Eigen et al. ‘14 15.5 71.1

Wang et al. ’15 14.4 68.2 39.2 42.0

Results on Internet Images

51

RGBD Datasets What about?


52

Supermarket

Museum

Laundromat

Locker Room


53


54


• Places-205 Dataset (Zhou et al.)

• ~3.7% better than pretrained supervised model (3DP); better in 9/10 categories

55

Observation

56

Element #1

Element #2

Element #3

Element #4

Element #5

Purely Data-Driven Single

Element

Data+Structure

So Far

57

Input Output

What about?

Constraints for Single Image 3D

58

Low Level, Generic

Hoiem et al. 2005, Saxena et al. 2005, 2008, Munoz et al., 2009, etc.

Constraints for Single Image 3D

59

High Level, Physical

Low Level, Generic

Coughlan et al. 2000, Hedau et al. 2009, Del Pero et al., 2011, Wang et al., 2012, Schwing et al. 2012, Lee et al. 2010, Xiao et al. 2012, Zhao et al. 2013, Schwing et al., 2013, etc.

Mid-level in the Past

60 Huffman 71, Clowes 71, Kanade 80, 81 Sugihara 86, Malik 87, etc.

Our Mid-Level Constraints

61

Our Output

62

Input: Single Image

Output: Discrete Scene Parse

Parameterization

63

Parameterization

vp1

vp2

vp3

VP Estimator from Hedau et al., 2009 64

Parameterization

Two VPs give grid cell

65

Encoding Surface Normals

66


67


68


x1,…, x400 x401,…, x800 x801,…, x1200 69

Formulation

70

Constraints

71

Unaries

72

Unaries

Low c Any 3D Evidence

High c

73

Binaries

74

Convex/Concave Constraints

Detected Concave (-) 75









Detecting Convex/Concave

Ground-Truth Discontinuities similar to Gupta, Arbelaez, Malik, 2013 3DP from Fouhey, Gupta, Hebert, 2013

Input 3D Primitive Bank

…

Use 3DP to Transfer Convex/Concave

80

Solving the Model

82

Qualitative Results

83

Qualitative Results

85

Qualitative Results

86

Results – Quantitative

Summary Stats (⁰) (Lower Better)

% Good Pixels (Higher Better)

11.25⁰ 22.5⁰ 30⁰

Proposed 40.5 54.1 58.9

Mean Median

3DP 39.2 52.9 57.8 36.3 19.2

Ladicky ‘14 27.7 49.0 58.7 33.5 23.1

Li ’15* 19.6 40.6 53.7 30.6 27.8

35.2 17.9

88 Fouhey et al. ICCV ’13; Ladicky et al. ECCV ‘14; Li et al. CVPR ‘15

So Far

89

Input Outputs

HOG HOG + Standard Vision Feats + QP

Applying Deep Learning

90

CNN

How do we enforce constraints?

How do we represent the output?

Input Goal Output

Representation

91 Normal Coding from Ladicky, Zeisl, Pollefeys ECCV ‘14

Applying Deep Learning

92

CNN

How do we enforce constraints?

How do we represent the output?

Input Output

What Might We Want?

93

Global Cues Manhattan-World Room Layout

Local Cues Discontinuities

Design

94 Local Edges Global Manhattan Layout

?

Global Network

95

Global

Local Edges Global Manhattan Layout

Local Network

96

Local


Local Network

97

Local


Fusion

98

Local


Global

Fusion

Results

Input Output Input Output Input Output

99

Results

Input Output

100

Results

101

Input Output


102

Summary Stats (⁰) % Good Pixels

11.25⁰ 22.5⁰ 30⁰ Mean Median

3DP 39.2 52.9 57.8 36.3 19.2

Ladicky ‘14 27.7 49.0 58.7 33.5 23.1

Li ’15* 19.6 40.6 53.7 30.6 27.8

UNFOLD 40.5 54.1 58.9 35.2 17.9

Proposed 42.0 61.2 68.2 26.9 14.8

Eigen ‘14 39.2 62.0 71.1 23.7 15.5

Eigen et al. Arxiv ’14; Fouhey et al. ICCV ’13, ECCV ‘14; Ladicky et al. ECCV ‘14; Li et al. CVPR ‘15

Summary – Task

103

Input Normals

Summary – Solutions

104

RGB Images Model Surface Normals


105

Edge Constraints Scene Parse


106

Local

Global

Fusion

CNNs + 3D Representations and Constraints

Acknowledgments

107

Collaborators

Xiaolong Wang

Abhinav Gupta Martial Hebert

Sponsors • NSF Graduate Research

Fellowship • NDSEG Fellowship • ONR • NSF • DARPA • Bosch Research &

Technology

Wajahat Hussain

Thank You!

108

Documents

Cues and Constraints for Single Image 3Dseminars/seminars/Extra/2015_07_14...Cues and Constraints for Single Image 3D David Fouhey Joint work with: Abhinav Gupta, Martial Hebert, Xiaolong