23
Jan Knopp, Mukta Prasad, Luc Van Gool VISICS, KU Leuven, Belgium BIWI, ETH Zurich, Switzerland Scene Cut: Class-specific object detection and segmentation in 3D scenes 1

Scene Cut: Class-specific object detection and ...€¦ · the scene S of vertices v with descriptors d and edges E, ISM based unary term 14 The power of the detection method was

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

  • Jan Knopp, Mukta Prasad, Luc Van Gool

    VISICS, KU Leuven, BelgiumBIWI, ETH Zurich, Switzerland

    Scene Cut:Class-specific object detection and

    segmentation in 3D scenes

    1

  • Detection & SegmentationThe goal of the work is to find objects of a specific class in the scene.

    2

    Trash bin

    Person

    Table

    Bottle

    Chair

  • 3

    Why is it difficult?The problem of a successful scene segmentation comes with the following challenges:

    • Collision of objects

    • Global context of local parts

    • Real-life data is often noisy and uncomplete

  • 4

    Why is it important?Successful segmentation opens doors for the following applications:

    • Scene understanding

    • Robot navigation, range maps

    • Create layers of the scene automatically

  • 5

    Motivation SOTA vs. ours

    Method Results & Summary

  • 6

    Related work

    • Knopp, J. et al.: “Hough transf...” In: ECCV. (2010) Salti, S. et al.: “On the use...” In ACCV. (2010)

    • Sun et al. “Depth-Encoded Hough Voting for joint object detection...”, In ECCV, 2010.

    • Munoz et al. “Directional associativemarkov network...”, In 3DPVT, 2008.

    • Golovinskiy et al. “Shape-based recognition of 3d points...”, In ICCV, 2009.

    • Kalogerakis et. al. “Learning 3D mesh segement...”, In ACM Trans. On Grph., 2010

    How to detect and segment objects in the scene?

    Input Classification

  • Detection & SegmentationWe combined a voting-based ISM class recognition approach with Graph-Cuts segmentation to find class-specific objects in 3D scenes.

    7

    • GraphCut: segmentation as a graph optimization problem

    • Detection: find the object's location

    • Back-projection: model the object characteristics

  • 8

    Motivation SOTA vs. ours

    Method Results & Summary

  • MRF based segmentation

    9

    1) foreground obeys object class characteristics

    2) foreground/background segments should be smooth

  • MRF based segmentation

    10

    The goal is to find the labelling f that maximizes the distribution p in the scene S of vertices v with descriptors d and edges E,

    1) foreground obeys object class characteristics

    2) foreground/background segments should be smooth

  • MRF based segmentation

    11

    A

    B

    Unary: is the specific vertex/node foreground or not?

    The goal is to find the labelling f that maximizes the distribution p in the scene S of vertices v with descriptors d and edges E,

  • MRF based segmentation

    12

    AB

    Unary: is the specific vertex/node foreground or not?

    Ising pairwise prior: neighbours have same label.

    The goal is to find the labelling f that maximizes the distribution p in the scene S of vertices v with descriptors d and edges E,

  • MRF based segmentation

    13

    Unary: is the specific vertex/node foreground or not?

    Pairwise Prior: neighbours have same label.

    Data-dependent Pairwise: similar vertex characteristics should have the same label

    A B

    The goal is to find the labelling f that maximizes the distribution p in the scene S of vertices v with descriptors d and edges E,

  • ISM based unary term

    14

    The power of the detection method was used to define the unary term. The descriptors of the training shapes are clustered. ISM represents shapes as a collection of these cluster centres (visual words).

    The ISM [1,2] is the collection of visual words together with their position related to the object centre:

    ...

    [1] Knopp, J. et al.: Hough transform and 3d surf..In: ECCV. (2010)[2] Salti, S. et al.: On the use of implict... In: ACCV. (2010)

  • ISM based unary term

    15

    Using the trained ISM model, we find the centre of the object of interest.

    Scene Votes for the object centre Detected centre of the person class

  • ISM based unary term

    16

    The centre of the object is then used to back-project the training data and to construct the unary term.

    Backprojection Unary term Segmentation result

    where b is i-th closest back-projection of k-th vertex v. f models whether the vertex voted correctly for the object centre or not.

    Unary term:

  • Pairwise term

    17

    Ising smoothness prior: neighbours should have the same label.

    Pairwise data-dependent:

    1. Vertex similarity

    Based on the difference of descriptors of vertices:

    2. “Plane-like” surface

    Edge smoothness measured using the angle between faces.

  • 18

    Motivation SOTA vs. ours

    Method Results & Summary

  • Experiments synthetic data

    19

    Half of the TOSCA [1] dataset was used for training. Models excluded from training composed a test scene. Additionally, the Light contest scene [2] together with the TOSCA shape was used. In all experiments, we used 3DSURF [3] as the shape descriptor.

    Exam

    ple

    of

    test

    dat

    a:

    [1] Bronstein, A. et al.: Numerical Geometry of Non-Rigid Shapes. Springer Publishing Company (2008)[2] 3dRender.com: (Lighting challenges) http://www.3drender.com/challenges/. [3] Knopp, J. et al.: Hough transform and 3d surf for robust three dimensional classification. In: ECCV. (2010)

    Resu

    lts:

  • Experiments synthetic data

    20

    Half of TOSCA [1] dataset used for training. Models excluded from training were merge to create a test scene.

    VW: occurrence freq. of visual words.

    VW-tfidf: normalized VW by tf-idf [2].

    Multi-VW: VW-like using soft assignment.

    Desc-NN: nearest neig. on the set of all descriptors [3].

    ISM:

    [1] Bronstein, A. et al.: Numerical Geometry of Non-Rigid Shapes. Springer Publishing Company (2008)[2] Sivic, J., Zisserman, A.: Video Google: A text retrieval approach to object matching in videos. In: ICCV. (2003)[3] Boiman, O., Shechtman, E., Irani, M.: In defense of nearestneighbor based image classification. In: CVPR. (2008)

  • Experiments real-life data

    21

    Trai

    ning

    da

    ta:

    Resu

    lts:

    Arc3D

    Proc

    ess

    of

    obta

    inin

    g da

    ta:

    Set of images Reconstructed modelSfM algorithm

  • Summary

    22

    • Segmentation using the power of the ISM based detection approach gives an important improvement.

    • No additional scene processing such as background plane filtering is required.

    • Segmentation of clean shapes as well as noisy and incomplete models from SfM.

  • 23

    Thank you for your attention!

    Slide 1Slide 2Slide 3Slide 4Slide 5Slide 6Slide 7Slide 8Slide 9Slide 10Slide 11Slide 12Slide 13Slide 14Slide 15Slide 16Slide 17Slide 18Slide 19Slide 20Slide 21Slide 22Slide 23