Scene Cut: Class-specific object detection and ...€¦ · the scene S of vertices v with descriptors d and edges E, ISM based unary term 14 The power of the detection method was

Jan Knopp, Mukta Prasad, Luc Van Gool

VISICS, KU Leuven, BelgiumBIWI, ETH Zurich, Switzerland

Scene Cut:Class-specific object detection and

segmentation in 3D scenes

1

Detection & SegmentationThe goal of the work is to find objects of a specific class in the scene.

2

Trash bin

Person

Table

Bottle

Chair

3

Why is it difficult?The problem of a successful scene segmentation comes with the following challenges:

• Collision of objects

• Global context of local parts

• Real-life data is often noisy and uncomplete

4

Why is it important?Successful segmentation opens doors for the following applications:

• Scene understanding

• Robot navigation, range maps

• Create layers of the scene automatically

5

Motivation SOTA vs. ours

Method Results & Summary

6

Related work

• Knopp, J. et al.: “Hough transf...” In: ECCV. (2010) Salti, S. et al.: “On the use...” In ACCV. (2010)

• Sun et al. “Depth-Encoded Hough Voting for joint object detection...”, In ECCV, 2010.

• Munoz et al. “Directional associativemarkov network...”, In 3DPVT, 2008.

• Golovinskiy et al. “Shape-based recognition of 3d points...”, In ICCV, 2009.

• Kalogerakis et. al. “Learning 3D mesh segement...”, In ACM Trans. On Grph., 2010

How to detect and segment objects in the scene?

Input Classification

Detection & SegmentationWe combined a voting-based ISM class recognition approach with Graph-Cuts segmentation to find class-specific objects in 3D scenes.

7

• GraphCut: segmentation as a graph optimization problem

• Detection: find the object's location

• Back-projection: model the object characteristics

8



MRF based segmentation

9

1) foreground obeys object class characteristics

2) foreground/background segments should be smooth


10

The goal is to find the labelling f that maximizes the distribution p in the scene S of vertices v with descriptors d and edges E,

1) foreground obeys object class characteristics

2) foreground/background segments should be smooth


11

A

B

Unary: is the specific vertex/node foreground or not?



12

AB


Ising pairwise prior: neighbours have same label.



13


Pairwise Prior: neighbours have same label.

Data-dependent Pairwise: similar vertex characteristics should have the same label

A B


ISM based unary term

14

The power of the detection method was used to define the unary term. The descriptors of the training shapes are clustered. ISM represents shapes as a collection of these cluster centres (visual words).

The ISM [1,2] is the collection of visual words together with their position related to the object centre:

...

[1] Knopp, J. et al.: Hough transform and 3d surf..In: ECCV. (2010)[2] Salti, S. et al.: On the use of implict... In: ACCV. (2010)


15

Using the trained ISM model, we find the centre of the object of interest.

Scene Votes for the object centre Detected centre of the person class


16

The centre of the object is then used to back-project the training data and to construct the unary term.

Backprojection Unary term Segmentation result

where b is i-th closest back-projection of k-th vertex v. f models whether the vertex voted correctly for the object centre or not.

Unary term:

Pairwise term

17

Ising smoothness prior: neighbours should have the same label.

Pairwise data-dependent:

1. Vertex similarity

Based on the difference of descriptors of vertices:

2. “Plane-like” surface

Edge smoothness measured using the angle between faces.

18



Experiments synthetic data

19

Half of the TOSCA [1] dataset was used for training. Models excluded from training composed a test scene. Additionally, the Light contest scene [2] together with the TOSCA shape was used. In all experiments, we used 3DSURF [3] as the shape descriptor.

Exam

ple

of

test

dat

a:

[1] Bronstein, A. et al.: Numerical Geometry of Non-Rigid Shapes. Springer Publishing Company (2008)[2] 3dRender.com: (Lighting challenges) http://www.3drender.com/challenges/. [3] Knopp, J. et al.: Hough transform and 3d surf for robust three dimensional classification. In: ECCV. (2010)

Resu

lts:

Experiments synthetic data

20

Half of TOSCA [1] dataset used for training. Models excluded from training were merge to create a test scene.

VW: occurrence freq. of visual words.

VW-tfidf: normalized VW by tf-idf [2].

Multi-VW: VW-like using soft assignment.

Desc-NN: nearest neig. on the set of all descriptors [3].

ISM:

[1] Bronstein, A. et al.: Numerical Geometry of Non-Rigid Shapes. Springer Publishing Company (2008)[2] Sivic, J., Zisserman, A.: Video Google: A text retrieval approach to object matching in videos. In: ICCV. (2003)[3] Boiman, O., Shechtman, E., Irani, M.: In defense of nearestneighbor based image classification. In: CVPR. (2008)

Experiments real-life data

21

Trai

ning

da

ta:

Resu

lts:

Arc3D

Proc

ess

of

obta

inin

g da

ta:

Set of images Reconstructed modelSfM algorithm

Summary

22

• Segmentation using the power of the ISM based detection approach gives an important improvement.

• No additional scene processing such as background plane filtering is required.

• Segmentation of clean shapes as well as noisy and incomplete models from SfM.

23

Thank you for your attention!

Slide 1Slide 2Slide 3Slide 4Slide 5Slide 6Slide 7Slide 8Slide 9Slide 10Slide 11Slide 12Slide 13Slide 14Slide 15Slide 16Slide 17Slide 18Slide 19Slide 20Slide 21Slide 22Slide 23

Documents

Scene Cut: Class-specific object detection and ...€¦ · the scene S of vertices v with descriptors d and edges E, ISM based unary term 14 The power of the detection method was