Holistic Scene Understanding via Multiple Structured Hypotheses from Perception Modules

Gordon Christie1* Ankit Laddha2* Aishwarya Agrawal1 Stanislaw Antol1 Yash Goyal1 Dhruv Batra1

1Virginia Tech 2CMU
*equal contribution

Overview

Motivation
Perception problems are hard

Goal
Holistic Scene Understanding (inputs from multiple modules)

Challenges
• Inaccurate models
• Search space explosion

Experiment 1: Captioned Scene Understanding

Experiment 2: Indoor Scene Understanding

Proposed Solution

[Figure: the spaces of all possible segmentations, all possible support estimations, and all possible sentence parsings are each replaced by a delta approximation, a small set of scored hypotheses: Score(y), Score(z), Score(a).]

[Figure: factor graph over the outputs y, z, a of the Semantic Segmentation, Sentence Parsing, and Support Estimation modules, with module score factors S(yi), S(zj), S(ak) and consistency factors C(yk,zj), C(zj,ak), C(yi,ak), C(yi,zj,ak).]

[Figure: example hypotheses (Hypothesis #1, #2, ..., #M) per experiment, with the consistent combination marked. Exp1: Semantic Segmentation and Sentence Parsing for the caption “A dog is standing next to a woman on a couch” (labels: Couch, Person, Dog). Exp2: Semantic Segmentation and 3D Support Estimation (labels: Wall, Table, Sofa, Chair, Television, Other Structure, Other Prop). Exp3: 2D Semantic Segmentation and 3D Semantic Segmentation (labels: Road, Building, Sky, Person, Car, Sidewalk, Curb, Mark, Sign, Tree/bush).]

Module 1: Semantic Segmentation (SS)
Module 2: Prepositional Phrase Attachment Resolution (PPAR)
Datasets: ABSTRACT-50S / PASCAL-50S / NYU-v2
Features: Module, Consistent Preposition and Presence

Approach

• Extract diverse hypotheses from multiple modules [1]
• Jointly reason about hypotheses
  o Develop “Mediator” model (factor graph)
  o Infer consistency
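As a rough illustration of the joint reasoning step, the sketch below picks one hypothesis per module for a two-module case. It assumes an additive score (module score for each chosen hypothesis plus a pairwise consistency term) and an exhaustive search over all pairs; the poster does not state the exact form of the Mediator score or its inference procedure, and the function name `mediate` and all numbers are hypothetical.

```python
from itertools import product


def mediate(scores_y, scores_z, consistency):
    """Pick one hypothesis per module by exhaustive search.

    Assumption (not from the poster): the joint score of a pair (i, j) is
    scores_y[i] + scores_z[j] + consistency[i][j].
    """
    best_pair, best_score = None, float("-inf")
    for i, j in product(range(len(scores_y)), range(len(scores_z))):
        total = scores_y[i] + scores_z[j] + consistency[i][j]
        if total > best_score:
            best_pair, best_score = (i, j), total
    return best_pair, best_score


# Hypothetical toy numbers: 3 segmentation hypotheses, 2 sentence parses.
seg_scores = [0.9, 0.8, 0.5]
parse_scores = [0.7, 0.6]
agree = [
    [0.0, 0.1],
    [0.5, 0.0],
    [0.2, 0.6],
]

pair, score = mediate(seg_scores, parse_scores, agree)
print(pair, round(score, 2))
```

In this toy case the selected segmentation is the second-ranked hypothesis rather than the 1-best one, because it agrees better with the top sentence parse. With M hypotheses per module the search visits only M × M pairs instead of the full output spaces.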

ABSTRACT-50S

Module   INDEP  Ours-MEDIATOR  oracle
PPAR     56.73  77.39          97.53

NYUv2

Module   INDEP  Ours-CASCADE  Ours-MEDIATOR  oracle
SS       46.13  46.05         46.37          51.30
PPAR     61.54  57.69         64.42          92.31
Average  53.84  51.87         55.40          71.81

PASCAL-50S

Module   INDEP  Ours-CASCADE  Ours-MEDIATOR  oracle
SS       31.14  32.68         34.12          38.87
PPAR     62.42  78.92         87.00          96.50
Average  46.78  55.80         60.56          67.68

Methods:
• INDEP: 1-best solution for each module
• Ours-CASCADE: DivMBest for module 1 + 1-best for module 2
• Ours-MEDIATOR: DivMBest for module 1 and module 2
• oracle: best tuple always selected
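The four methods differ only in which hypotheses are available and how a tuple is chosen. The sketch below contrasts them on hypothetical toy data. The additive score, the way CASCADE uses the consistency term, and the use of ground-truth accuracy for the oracle are assumptions made for illustration; they are not details given on the poster.

```python
def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


def indep(s1, s2):
    # 1-best solution for each module, chosen independently.
    return argmax(s1), argmax(s2)


def cascade(s1, s2, cons):
    # Several hypotheses for module 1, but only the 1-best for module 2.
    j = argmax(s2)
    i = argmax([s1[i] + cons[i][j] for i in range(len(s1))])
    return i, j


def mediator(s1, s2, cons):
    # Several hypotheses for both modules, scored jointly.
    pairs = [(i, j) for i in range(len(s1)) for j in range(len(s2))]
    return max(pairs, key=lambda p: s1[p[0]] + s2[p[1]] + cons[p[0]][p[1]])


def oracle(acc1, acc2):
    # Upper bound: the tuple with the best ground-truth accuracy.
    pairs = [(i, j) for i in range(len(acc1)) for j in range(len(acc2))]
    return max(pairs, key=lambda p: acc1[p[0]] + acc2[p[1]])


def mean_accuracy(pair, acc1, acc2):
    return (acc1[pair[0]] + acc2[pair[1]]) / 2


# Hypothetical module scores, consistency values, and accuracies.
s1, s2 = [0.9, 0.8, 0.5], [0.7, 0.6]
cons = [[0.0, 0.1], [0.3, 0.0], [0.2, 0.9]]
acc1, acc2 = [60.0, 70.0, 65.0], [50.0, 80.0]

choices = {
    "INDEP": indep(s1, s2),
    "Ours-CASCADE": cascade(s1, s2, cons),
    "Ours-MEDIATOR": mediator(s1, s2, cons),
    "oracle": oracle(acc1, acc2),
}
for name, pair in choices.items():
    print(name, pair, mean_accuracy(pair, acc1, acc2))
```

The oracle is the only method that looks at ground truth, so it bounds what any selection rule over the same hypothesis sets could achieve.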

Module 1: Semantic Segmentation (SS)
Module 2: 3D Support Estimation (SE)
Dataset: NYUv2

Experiment 2 Results

Module   INDEP  Joint  Ours-CASCADE  Ours-MEDIATOR  oracle
SS       64.24  62.00  64.22         64.24          70.24
SE       55.48  56.43  57.38         57.33          62.29
Average  59.86  59.22  60.80         60.79          66.27

Experiment 3: Urban Scene Understanding

Module 1: 2D Semantic Segmentation
Module 2: 3D Semantic Segmentation
Dataset: CITY (stereo)

Experiment 3 Results

Module   INDEP  Ours-CASCADE  Ours-MEDIATOR  oracle
2D SS    54.80  55.65         55.65          57.82
3D SS    32.07  57.16         57.98          61.15
Average  43.44  56.41         56.82          59.49
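The Average rows and the gain of Ours-MEDIATOR over INDEP can be recomputed from the per-module numbers in the result tables. The short check below does so; the dictionary layout and the function names are illustrative choices, and the values are copied from the tables. The resulting differences agree, up to rounding, with the gains annotated on the poster (+20.7, +1.6, +13.8, +0.93, +13.4).

```python
# Per-module scores copied from the result tables (INDEP vs. Ours-MEDIATOR).
results = {
    "Exp1 ABSTRACT-50S": {"INDEP": [56.73], "Ours-MEDIATOR": [77.39]},
    "Exp1 NYUv2": {"INDEP": [46.13, 61.54], "Ours-MEDIATOR": [46.37, 64.42]},
    "Exp1 PASCAL-50S": {"INDEP": [31.14, 62.42], "Ours-MEDIATOR": [34.12, 87.00]},
    "Exp2 NYUv2": {"INDEP": [64.24, 55.48], "Ours-MEDIATOR": [64.24, 57.33]},
    "Exp3 CITY": {"INDEP": [54.80, 32.07], "Ours-MEDIATOR": [55.65, 57.98]},
}


def mean(values):
    return sum(values) / len(values)


def gain(name):
    """Average score of Ours-MEDIATOR minus average score of INDEP."""
    row = results[name]
    return mean(row["Ours-MEDIATOR"]) - mean(row["INDEP"])


for name in results:
    print(f"{name}: {gain(name):+.2f}")
```

The gains are absolute differences between the reported scores, not relative improvements.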

[1] D. Batra et al. Diverse M-Best Solutions in Markov Random Fields. In ECCV, 2012.

[Figure: qualitative comparisons of INDEP and Ours-MEDIATOR outputs, annotated with the gain of Ours-MEDIATOR over INDEP. Experiment 1: +20.7 %, +1.6 %, +13.8 %. Experiment 2: +0.93 %. Experiment 3: +13.4 %. One panel shows candidate labels (couch, dog, cat).]