Saliency-based Object Discovery Simone Frintrop Cognitive Computer Vision Group Rheinische Friedrich-Wilhelms-Universität Bonn 13.10.2014



Object Discovery

What is object discovery? (also called general/generic object detection or object proposal detection)

• Find objects without prior knowledge (“What is an object?”)
• Capture the ‘objectness’

Even 5-month-old infants can reliably distinguish objects from the background! [von Hofsten & Spelke, J. of Exp. Psych. 1985]

Object Discovery: Applications

• Pre-processing for object classification:
  – Reduce the number of queries for the recognition program
  [Figure: classification example with Google Glass]

• Analyzing photos, e.g.:
  – Automatic cropping
  – Automatic thumbnailing
  [Figures: Stentiford, ICVS 2007; Marchesotti et al., ICCV 2009]

• Robotics:
  – Detect candidates for manipulation
  – Exploration: create a database of all objects
  [Figures: Rhino; iCub; PlayBot, a robotic wheelchair for disabled children [Rotenstein et al. 2007]]

Object Discovery

From human object perception to an object discovery system

In the following, object discovery on 3 types of input data:

1) Photos [Frintrop et al.: ICPR 2014]
2) Videos [Horbert, Martín-García, Frintrop, Leibe, submitted]
3) RGB-D data [Martín-García/Frintrop: CogSci 2013] [Martín-García/Frintrop/Cremers: J. of KI 2013]

2D Object Discovery

Human perception:
• Object detection takes place before object recognition [Pylyshyn 2001]
• Segmentation processes on all levels of the visual system bundle parts of the visual input [Scholl 2001]. Result: proto-objects (superpixels)
• Proto-objects are combined by focused attention to form coherent objects (attention “grabs” proto-objects [Rensink, 2000])
• Saliency map in V1? [Zhang et al. 2012]

Attention prioritizes processing. Attention consists of:
• Bottom-up (saliency)
• Top-down

An image region is salient if it automatically attracts human attention.

[Diagram labels: segmentation [Felzenszwalb/Huttenlocher 2004], proto-objects, saliency computation, saliency map, object candidate]

Saliency

Saliency systems from our group:
• VOCUS: [Frintrop: LNAI 2006]
• BITS: [Klein/Frintrop: ICCV 2011]
• CoDi: [Klein/Frintrop: DAGM 2012]
• Simple CoDi: [Frintrop et al.: ICPR 2014]
• Most recently: VOCUS 2

[Diagram: Computational Attention Systems / Saliency Systems. Labels: input image, scale representations, feature maps (feature 1 … feature n), center-surround contrast, conspicuity maps, saliency map, max finder, inhibition of return, top-down information, trajectory of FOAs]
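
The center-surround contrast named in the diagram can be illustrated with a minimal sketch. It assumes a single grayscale intensity channel and box-shaped center and surround windows; the function names and the radii are hypothetical choices for illustration. This is not the VOCUS, BITS or CoDi implementation, which work on several feature channels and scales.

```python
def window_mean(img, y, x, radius):
    """Mean of img over the square window of the given radius around (y, x),
    clipped at the image border."""
    h, w = len(img), len(img[0])
    values = [img[j][i]
              for j in range(max(0, y - radius), min(h, y + radius + 1))
              for i in range(max(0, x - radius), min(w, x + radius + 1))]
    return sum(values) / len(values)


def center_surround_saliency(img, center_radius=1, surround_radius=3):
    """Per-pixel |center mean - surround mean|, scaled so that the peak is 1.
    Simplification: the surround window also contains the center pixels."""
    h, w = len(img), len(img[0])
    contrast = [[abs(window_mean(img, y, x, center_radius)
                     - window_mean(img, y, x, surround_radius))
                 for x in range(w)] for y in range(h)]
    peak = max(max(row) for row in contrast)
    if peak == 0:
        return contrast  # uniform image: nothing stands out
    return [[v / peak for v in row] for row in contrast]


# A dark 9x9 image with one bright pixel in the middle.
image = [[0.0] * 9 for _ in range(9)]
image[4][4] = 1.0
saliency = center_surround_saliency(image)
```

In this toy image the region around the bright pixel receives the highest saliency, while the uniform corners receive none.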

Saliency on Web Images

MSRA Salient Object dataset [Liu et al., PAMI 2009]

[Figure: images and their saliency maps]

[Klein & Frintrop, ICCV 2011] [Klein & Frintrop, DAGM 2012] [Frintrop, Martín-García, Cremers, ICPR 2014]

2D Object Discovery

Human perception:
• Segmentation processes on all levels of the visual system bundle parts of the visual input [Scholl 2001]. Result: proto-objects (superpixels)
• Proto-objects are combined by focused attention to form coherent objects (attention “grabs” proto-objects [Rensink, 2000])
• Saliency map in V1? [Zhang et al. 2012]

[Diagram labels: segmentation [Felzenszwalb/Huttenlocher 2004], superpixels, saliency computation, saliency map, object candidate]
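
How a saliency map and a superpixel segmentation might be combined into an object candidate can be sketched as follows. The sketch assumes that a salient blob is grown around the saliency peak and that every superpixel lying mostly inside the blob is added to the candidate. The region-growing rule, both thresholds and all function names are assumptions made for illustration, not details taken from the slides.

```python
def salient_blob(saliency, rel_threshold=0.5):
    """Grow a 4-connected region from the most salient pixel, accepting
    neighbours with at least rel_threshold times the peak saliency."""
    h, w = len(saliency), len(saliency[0])
    seed = max(((y, x) for y in range(h) for x in range(w)),
               key=lambda p: saliency[p[0]][p[1]])
    limit = rel_threshold * saliency[seed[0]][seed[1]]
    blob, stack = {seed}, [seed]
    while stack:
        y, x = stack.pop()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and (ny, nx) not in blob
                    and saliency[ny][nx] >= limit):
                blob.add((ny, nx))
                stack.append((ny, nx))
    return blob


def object_candidate(labels, blob, min_overlap=0.5):
    """Union of all segments whose area lies at least min_overlap inside the blob."""
    size, inside = {}, {}
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            size[lab] = size.get(lab, 0) + 1
            if (y, x) in blob:
                inside[lab] = inside.get(lab, 0) + 1
    keep = {lab for lab, n in inside.items() if n / size[lab] >= min_overlap}
    return {(y, x) for y, row in enumerate(labels)
            for x, lab in enumerate(row) if lab in keep}


saliency = [[0.0, 0.1, 0.0, 0.0],
            [0.1, 0.9, 0.8, 0.0],
            [0.0, 0.7, 1.0, 0.0],
            [0.0, 0.0, 0.1, 0.0]]
labels = [[0, 0, 0, 0],   # segment ids, e.g. from a superpixel segmentation
          [0, 1, 1, 2],
          [0, 1, 1, 2],
          [0, 0, 2, 2]]
candidate = object_candidate(labels, salient_blob(saliency))
```

Here the blob covers segment 1 completely, so the candidate is exactly that segment; the segmentation supplies the object boundary that the blurry saliency map alone would not.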

Discovery on Web Images

MSRA Salient Object dataset [Liu et al., PAMI 2009]

[Figure columns: image, saliency map, segmentation, results (object candidates), ground truth]

[Frintrop, Martín-García, Cremers, ICPR 2014]

[Figure: results with saliency alone and with saliency + segmentation on the MSRA Salient Object dataset [Liu et al., PAMI 2009]; columns: image, saliency map, segmentation, results (object candidates), ground truth] [Frintrop, Martín-García, Cremers, ICPR 2014]

2D Discovery in Real-world

Kitchen dataset: videos from Uni Bonn + RWTH Aachen

Cooperation with Bastian Leibe and Esther Horbert: object discovery in real-world indoor sequences

[Diagram labels: saliency map, salient blobs, segmentation, object candidates]

Computation on 4th pyramid layer

Results 2D Discovery

• Experiments on a new sequence-based dataset with RWTH Aachen: 5 sequences of real-world indoor scenarios
• Precision-recall curves (frame-based)

[Plot: precision-recall curves; compared methods: Manén et al., ICCV 2013; Alexe et al., PAMI 2012; Arbelaez et al., PAMI 2011]

[Horbert, Martín-García, Frintrop, Leibe 2014 (submitted)]
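
A frame-based precision and recall value can be computed along the following lines. The sketch assumes axis-aligned bounding boxes given as (x1, y1, x2, y2) and an intersection-over-union threshold of 0.5 for counting a candidate as correct. The matching rule, the threshold and the function names are assumptions for illustration, since the slides do not state the evaluation protocol.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def frame_precision_recall(candidates, ground_truth, min_iou=0.5):
    """Precision: share of candidates that hit some annotated object.
    Recall: share of annotated objects hit by some candidate."""
    correct, found = 0, set()
    for cand in candidates:
        hits = [k for k, gt in enumerate(ground_truth) if iou(cand, gt) >= min_iou]
        if hits:
            correct += 1
            found.update(hits)
    precision = correct / len(candidates) if candidates else 0.0
    recall = len(found) / len(ground_truth) if ground_truth else 0.0
    return precision, recall


candidates = [(0, 0, 10, 10), (50, 50, 60, 60)]
ground_truth = [(1, 1, 10, 10), (20, 20, 30, 30)]
precision, recall = frame_precision_recall(candidates, ground_truth)
```

Sweeping the number of candidates kept per frame (for example the top k by saliency) and averaging over frames would yield one point of a precision-recall curve per k.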

Results 2D Discovery

How does the recall evolve over time?

• At the end of the sequence, we have found 90% of the objects

[Plot: recall over time; curves: our approach, maximal possible recall, Manén et al., ICCV 2013, Alexe et al., PAMI 2012, Arbelaez et al., PAMI 2011]

[Horbert, Martín-García, Frintrop, Leibe 2014 (submitted)]
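
Recall over time can be read as a cumulative quantity: an object counts as found from the first frame in which it is discovered. A minimal sketch under that assumption follows; the object identifiers and the function name are hypothetical.

```python
def recall_over_time(found_per_frame, n_objects):
    """Fraction of all annotated objects discovered up to and including each frame."""
    seen, curve = set(), []
    for found in found_per_frame:
        seen |= set(found)
        curve.append(len(seen) / n_objects)
    return curve


# Objects discovered in four consecutive frames of a sequence with 4 objects.
frames = [{"cup"}, {"cup", "kettle"}, set(), {"plate"}]
curve = recall_over_time(frames, n_objects=4)
```

The curve never decreases, and its value at the last frame is the share of objects found by the end of the sequence.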

Results: Object Discovery

[Video: Esther Horbert]

Cooperation with Bastian Leibe and Esther Horbert: object discovery in real-world indoor sequences

From 2D to 3D

Human visual system (HVS): two pathways for object perception [Ungerleider 1982]:
– ventral stream (“what pathway”) processes color & form, responsible for object detection & recognition
– dorsal stream (“where pathway”) processes depth & motion, responsible for spatially localizing objects

[Diagram labels: RGB-D sensor; ventral stream; dorsal stream; color processing stream: generating object candidates; depth processing stream: creating 3D map; map with 3D object models; incrementally update map with new measurements]

[Martín-García/Frintrop, Proc. of the annual meeting of Cognitive Sciences (CogSci), 2013] [Martín-García/Frintrop/Cremers, German Journal of Artificial Intelligence, 2013]

From Frames to Sequences: Visual Scene Exploration

Strategy to process image sequence: two-stage processing as in human vision [Neisser, Cognitive Psychology, 1967] [Treisman, Cognitive Psychology, 1985]:

Scene → parallel, pre-attentive stage (e.g. saliency system) → serial, attentive stage (e.g. recognition)

• Prioritization: visual attention directs the processing to the regions of most potential interest [Pashler, 1997].

[Frintrop et al: Computational Visual Attention Systems and their Cognitive Foundation: A Survey, ACM Trans. on Applied Perception (TAP), 2010]

Spatial Inhibition of Return

• Inhibition of return (IOR) mechanisms inhibit cells that correspond to previously fixated locations and objects [Posner and Cohen, 1984]. IOR supports orienting towards novelty and enables scene exploration.

[Figure: image, saliency map, inhibition]

[Martín-García/Frintrop, Proc. of the annual meeting of Cognitive Sciences (CogSci), 2013] [Martín-García/Frintrop/Cremers, German Journal of Artificial Intelligence, 2013]
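
For a single static image, the interplay of selecting the saliency maximum and inhibiting it afterwards can be sketched as a simple loop. The disc-shaped inhibition region, its radius and the function name are assumptions for illustration; the sketch works in image coordinates only and is not the authors' implementation.

```python
def scan_foas(saliency, n_foas=3, ior_radius=1):
    """Repeatedly pick the most salient pixel as focus of attention (FOA),
    then zero a disc around it so the next maximum lies somewhere new."""
    sal = [row[:] for row in saliency]  # work on a copy
    h, w = len(sal), len(sal[0])
    foas = []
    for _ in range(n_foas):
        y, x = max(((j, i) for j in range(h) for i in range(w)),
                   key=lambda p: sal[p[0]][p[1]])
        if sal[y][x] <= 0:
            break  # everything salient has already been visited
        foas.append((y, x))
        for j in range(max(0, y - ior_radius), min(h, y + ior_radius + 1)):
            for i in range(max(0, x - ior_radius), min(w, x + ior_radius + 1)):
                if (j - y) ** 2 + (i - x) ** 2 <= ior_radius ** 2:
                    sal[j][i] = 0.0  # inhibition of return
    return foas


saliency_map = [[0.1, 0.9, 0.2],
                [0.0, 0.3, 0.0],
                [0.8, 0.0, 0.0]]
trajectory = scan_foas(saliency_map)
```

Without the inhibition step the loop would return the same maximum every time; with it, attention moves on to the second most conspicuous region and stops once nothing salient is left.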

• IOR happens in spatial coordinates and not in retinotopic coordinates

Each voxel stores inhibition data:
• Inhibition flag
• Inhibition weight
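
A voxel map that stores an inhibition flag and an inhibition weight per voxel could look like the following sketch. It assumes a sparse grid keyed by integer voxel indices, a per-pixel 3D world point obtained from the depth data, and a multiplicative attenuation of saliency. The class names, the voxel size and the attenuation rule are hypothetical and not taken from the slides.

```python
import math
from dataclasses import dataclass


@dataclass
class VoxelInhibition:
    flag: bool = False    # has this voxel been attended before?
    weight: float = 0.0   # inhibition strength in [0, 1]


class InhibitionMap:
    """Sparse voxel grid in world coordinates holding inhibition data."""

    def __init__(self, voxel_size=0.1):
        self.voxel_size = voxel_size
        self.voxels = {}

    def key(self, point):
        return tuple(math.floor(c / self.voxel_size) for c in point)

    def inhibit(self, points, weight=1.0):
        """Mark the voxels containing the given 3D points as inhibited."""
        for p in points:
            self.voxels[self.key(p)] = VoxelInhibition(True, weight)

    def apply(self, saliency, pixel_points):
        """Attenuate saliency at pixels whose 3D point falls in an inhibited voxel.
        pixel_points has the shape of saliency; entries are (x, y, z) or None."""
        out = [row[:] for row in saliency]
        for y, row in enumerate(pixel_points):
            for x, p in enumerate(row):
                voxel = self.voxels.get(self.key(p)) if p is not None else None
                if voxel is not None and voxel.flag:
                    out[y][x] *= 1.0 - voxel.weight
        return out


imap = InhibitionMap(voxel_size=0.1)
imap.inhibit([(0.52, 0.01, 1.05)])           # an object attended earlier
frame_saliency = [[0.8, 0.6]]                # a later frame, camera has moved
frame_points = [[(0.55, 0.02, 1.03), (0.9, 0.0, 1.0)]]
inhibited = imap.apply(frame_saliency, frame_points)
```

Because the inhibition is attached to world voxels rather than to pixels, the previously attended object stays suppressed wherever it reappears in the image after the camera moves.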

[Diagram labels: depth processing stream: creating 3D map; map with 3D object models; color processing stream; inhibition map; IOR flags; object candidates]

[Martín-García/Frintrop, Proc. of the annual meeting of Cognitive Sciences (CogSci), 2013] [Martín-García/Frintrop/Cremers, German Journal of Artificial Intelligence, 2013]

3D Object Discovery

Results: Coffee machine sequence

[Figure: ground truth and detected objects]

Object Discovery

Summary: our object discovery method performs well on:

1) Photos [Frintrop et al.: ICPR 2014]
2) Videos [Horbert, Martín-García, Frintrop, Leibe, submitted]
3) RGB-D data [Martín-García/Frintrop: CogSci 2013] [Martín-García/Frintrop/Cremers: J. of KI 2013]

Cognitive Computer Vision

Many thanks to all collaborators:

• Germán Martín García
• Dominik A. Klein
• Thomas Werner
• Mircea Pavel
• Bastian Leibe
• Esther Horbert
• Armin B. Cremers

[Figure labels: Saliency Detection, Object Discovery]

Thank you for your attention!