Visual Scene Understanding

Aude Oliva

Department of Brain and Cognitive Sciences

Massachusetts Institute of Technology

Website: http://cvcl.mit.edu

High-level Scene Representation

I. Long-term Memory Representation

What is the fidelity of stored scene representations and the infrastructure that supports them?

II. High-level Neural Representation of Visual Scenes

How is the shape of a visual scene represented?

Soojin Park

Michelle Greene

Timothy Brady

Talia Konkle

George Alvarez

Memory Representation

“Basically, my recollection is that we just separated the pictures into distinct thematic categories (e.g. cars, animals, single-person, 2-people, plants, etc.). Only a few slides were selected which fell into each category, and they were visually distinct.”

According to Standing

Standing (1973)

10,000 images

83% Recognition

What we know… What we don’t know…

[Example images: “Dogs Playing Cards”]

… people can remember thousands of images

… what exactly people remember about each image

Completely different kinds of places…

Different instances of the same kind of place…

Welcome to the Massive Memory Experiment

A stream of scenes will be presented on the screen for 3 seconds each.

Your primary task:

Remember them ALL!

Afterwards, you will be tested with…


Your other task:

Detect exact repeats anywhere in the stream


Bedroom

Cavern

Closet

Country road

Greenhouse

Methods – The Study Stream

128 unique semantic categories of natural images

2912 natural images shown in the stream (3 seconds each, 800 ms ISI)

Number of exemplars per category: 4, 16, or 64!

N = 24 observers

Methods – The Study Stream

Online Task: Detect Exact Repeats

Repeats could be 2 to 1024 back in the stream

Repeats could be from categories with 4, 16, or 64 exemplars

7% of images in the stream were repeats (192 / 2912)

1024-back (>2hr!)

2-back

Methods – The Memory Test

The study stream was followed by 224 two-alternative forced-choice (2AFC) tests

Foil types: Novel, Exemplar

None of the tested categories contained n-back repeats in the study stream

Test pairs were always the same for all subjects

Any interference effect can therefore be attributed to the additional studied exemplars

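Results – Recognition Memory

[Figure: percent correct on the 2AFC test for novel foils (1-novel) and for exemplar foils from categories with 4, 16, or 64 studied exemplars; labeled values 84, 80, 76]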

Replication of Standing (1973)


Detailed Representation, Minor Interference

About a 2% drop in accuracy with each doubling of the number of exemplars in memory

Konkle, Brady, Alvarez & Oliva (submitted)
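The "2% per doubling" figure can be checked against the values 84, 80, and 76 that accompany the results figure, read here (as an assumption) as percent correct in the 4-, 16-, and 64-exemplar conditions. A minimal sketch: fit a least-squares line of accuracy against log2(number of exemplars); the slope is the change in percent correct per doubling. The function name is hypothetical.

```python
import math


def accuracy_slope_per_doubling(exemplar_counts, percent_correct):
    """Least-squares slope of accuracy against log2(number of exemplars).

    A slope of -2.0 means accuracy falls by 2 percentage points each time
    the number of studied exemplars doubles.
    """
    xs = [math.log2(n) for n in exemplar_counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(percent_correct) / len(percent_correct)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, percent_correct))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# 4 -> 16 -> 64 exemplars is two doublings per step (log2: 2, 4, 6).
print(accuracy_slope_per_doubling([4, 16, 64], [84, 80, 76]))  # -2.0
```

Using log2 on the x-axis turns "per doubling" into "per unit step", which is why a 4-point drop between 4 and 16 exemplars corresponds to 2 points per doubling.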


Highly Detailed, Minor Interference

Objects & Scenes: Is it fair to compare?

You can make each test item and its foil arbitrarily similar

We tried to span the category with our exemplars and sampled the test item and foil uniformly

Memory for Scenes and Objects

[Figure: percent correct (axis 60 to 100) vs. number of exemplars (2^0 to 2^7, log scale) for MMI-Scenes and MMI-Objects, with chance level marked]

Konkle, Brady, Alvarez & Oliva (submitted)

Similar categorical interference effects for objects and scenes

I. Conclusion

• High-fidelity representation in long-term visual memory

• Similar categorical interference effects for scenes and objects

• Objects and scenes are entities represented at a similar level of abstraction in long-term storage

• The results suggest that the structure of visual categories is information-theoretically optimal: it maximizes within-category similarity and minimizes between-category similarity
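The slides do not say how category similarity is measured. As a toy illustration only, assuming each image is summarized by a hypothetical feature vector and similarity is cosine similarity, the sketch below computes mean within-category and mean between-category similarity over all image pairs; a category structure of the kind described in the last bullet would show a large gap between the two.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def within_between_similarity(features_by_category):
    """Mean cosine similarity for same-category pairs and for cross-category pairs."""
    items = [(cat, vec) for cat, vecs in features_by_category.items() for vec in vecs]
    within, between = [], []
    for (cat_a, a), (cat_b, b) in combinations(items, 2):
        (within if cat_a == cat_b else between).append(cosine(a, b))
    return sum(within) / len(within), sum(between) / len(between)


# Hypothetical 3-D features for two scene categories (not real data).
toy = {
    "bedroom": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0]],
    "cavern": [[0.0, 0.1, 1.0], [0.1, 0.0, 0.9]],
}
within, between = within_between_similarity(toy)
print(f"within={within:.3f} between={between:.3f}")
```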

Massive Memory Categorical Interference

See website with papers and stimuli: http://cvcl.mit.edu/MM

Visual Categories are represented by their shape

How can the shape of a scene be represented?

II – Neural Representation of Visual Scenes

Soojin Park Michelle Greene Timothy Brady

Walther et al. (2009)

Scenes are spatial entities

A scene is a three-dimensional entity we act within: it extends in space and has a size, a boundary, content, and a layout.

Shape of a scene: Spatial Boundary and Content

Spatial Envelope Representation

A scene is inherently a 3D entity that may be described by properties related to its size (volume) and its content:

(1) Boundary of the space: mean depth/size, openness, perspective, …

(2) Content of the space: naturalness, roughness, clutter, …

[Example scenes: closet, kitchen, street]

Oliva & Torralba (2001, 2002, 2006); Torralba & Oliva (2002, 2003); Greene & Oliva (2010, in press); Ross & Oliva (2010)
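Spatial envelope properties are usually estimated from a holistic description of how oriented spectral energy is distributed across the image, often called a GIST descriptor. The sketch below is a simplified stand-in under stated assumptions, not the published implementation: Gabor-like filters are built directly in the frequency domain, and their response magnitude is averaged over a coarse 4 × 4 grid. The filter shapes, the two peak frequencies, the bandwidth, and the function names are all hypothetical choices.

```python
import numpy as np


def frequency_gabor_bank(size, n_orient=4, freqs=(0.1, 0.25), bandwidth=0.3):
    """Gaussian bumps in the 2-D frequency plane (the Fourier transform of a
    complex Gabor). Each is centred on peak frequency f0 (cycles/pixel) at
    orientation theta; using one lobe gives a quadrature (complex) response."""
    fy = np.fft.fftfreq(size)[:, None]  # vertical frequencies (rows)
    fx = np.fft.fftfreq(size)[None, :]  # horizontal frequencies (columns)
    bank = []
    for f0 in freqs:
        sigma = bandwidth * f0
        for k in range(n_orient):
            theta = np.pi * k / n_orient
            u0, v0 = f0 * np.cos(theta), f0 * np.sin(theta)
            bank.append(np.exp(-((fx - u0) ** 2 + (fy - v0) ** 2) / (2 * sigma ** 2)))
    return bank


def gist_like_descriptor(image, grid=4, n_orient=4, freqs=(0.1, 0.25)):
    """Oriented energy per filter, averaged within each cell of a grid x grid layout."""
    h, w = image.shape
    if h != w or h % grid != 0:
        raise ValueError("expects a square image whose side is divisible by grid")
    spectrum = np.fft.fft2(image - image.mean())
    cell = h // grid
    features = []
    for filt in frequency_gabor_bank(h, n_orient, freqs):
        energy = np.abs(np.fft.ifft2(spectrum * filt))  # local energy map
        pooled = energy.reshape(grid, cell, grid, cell).mean(axis=(1, 3))
        features.append(pooled.ravel())
    return np.concatenate(features)


# Vertical stripes at 0.25 cycles/pixel on a 32 x 32 image.
x = np.arange(32)
grating = np.tile(np.cos(np.pi * x / 2), (32, 1))
desc = gist_like_descriptor(grating)
print(desc.shape)  # (128,): 2 frequencies x 4 orientations x 16 cells
```

With vertical stripes, nearly all energy lands in the filter tuned to horizontal frequency 0.25 (orientation index 0 of the second frequency), which is the kind of global layout signal such descriptors carry. Properties like openness or naturalness would then be read out from these features by a separately trained model, which this sketch does not include.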

Spatial Envelope Representation of Visual Scenes

Oliva & Torralba (2001, 2002, 2006); Torralba & Oliva (2002, 2003); Greene & Oliva (2009, in press); Ross & Oliva (2010)

Spatial Boundary & Content: Orthogonal Properties

Park, Brady, Greene & Oliva (submitted)

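Experimental Conditions

Spatial boundary (closed vs. open) × content (natural vs. urban)

[Figure: block-design timeline alternating fixation and condition blocks (Open Natural, Closed Natural, Open Urban, …); time axis marked at 10 s and 20 s]

20 blocks per condition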

Experimental Procedure

1-back task

ROIs localized with independent localizers

Epstein & Kanwisher (1998)

Classification Performances in PPA and LOC

Both PPA and LOC classified the four conditions with ~50% accuracy (chance = 25%)

An SVM classifier was trained to classify the four conditions using all blocks but one, and then was tested on the remaining block.
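The slide does not state the SVM variant, voxel counts, or software used. A minimal sketch of the leave-one-block-out procedure, assuming one voxel-pattern vector per block, a linear kernel, and one-vs-rest training: plain full-batch subgradient descent on the regularized hinge loss stands in for an off-the-shelf SVM solver, and the data are synthetic. All function names and hyperparameters are hypothetical.

```python
import numpy as np


def fit_ovr_linear_svm(X, y, n_classes, lam=0.01, lr=0.1, epochs=200):
    """One-vs-rest linear SVMs fit by subgradient descent on the hinge loss.
    Returns one weight row per class; the last column is an unregularized bias."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    W = np.zeros((n_classes, Xb.shape[1]))
    for c in range(n_classes):
        t = np.where(y == c, 1.0, -1.0)  # +1 for class c, -1 for the rest
        w = np.zeros(Xb.shape[1])
        for _ in range(epochs):
            active = t * (Xb @ w) < 1  # samples violating the margin
            grad = lam * w
            grad[-1] = 0.0  # do not shrink the bias term
            grad = grad - (t[active][:, None] * Xb[active]).sum(axis=0) / len(t)
            w -= lr * grad
        W[c] = w
    return W


def predict(W, X):
    """Pick the class whose one-vs-rest score is highest."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.argmax(Xb @ W.T, axis=1)


def leave_one_block_out_accuracy(X, y, n_classes):
    """Train on all blocks but one, test on the held-out block, repeat for every block."""
    hits = 0
    for i in range(len(y)):
        train = np.arange(len(y)) != i
        W = fit_ovr_linear_svm(X[train], y[train], n_classes)
        hits += int(predict(W, X[i:i + 1])[0] == y[i])
    return hits / len(y)


# Synthetic data: 4 conditions x 20 blocks, 50 "voxels" per block pattern.
rng = np.random.default_rng(0)
n_classes, blocks_per_class, n_voxels = 4, 20, 50
y = np.repeat(np.arange(n_classes), blocks_per_class)
centroids = rng.normal(0.0, 1.0, (n_classes, n_voxels))
X = centroids[y] + rng.normal(0.0, 1.0, (len(y), n_voxels))
acc = leave_one_block_out_accuracy(X, y, n_classes)
print(f"leave-one-block-out accuracy: {acc:.2f} (chance = 0.25)")
```

The synthetic patterns are far more separable than real fMRI data, so this sketch does not reproduce the ~50% accuracy reported for PPA and LOC; it only shows the train-on-all-but-one, test-on-the-held-out-block loop.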

Patterns of Errors

The pattern of errors makes it possible to dissociate multiple levels of structure coexisting within intact images, and to test the extent to which a specific property is coded in a given brain area.
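One way to read errors in the 2 × 2 design (spatial boundary × content), sketched under assumptions: the condition names below are hypothetical labels, and the slides do not specify the exact analysis. Each misclassification is counted by whether it kept the true spatial boundary (so content was confused) or kept the true content (so boundary was confused). Under this logic, a region whose errors mostly preserve boundary carries more boundary information than content information.

```python
from collections import Counter

# Hypothetical labels for the four conditions: (spatial boundary, content).
CONDITIONS = {
    "open_natural": ("open", "natural"),
    "closed_natural": ("closed", "natural"),
    "open_urban": ("open", "urban"),
    "closed_urban": ("closed", "urban"),
}


def error_pattern(true_labels, predicted_labels):
    """Count misclassifications by which scene property they preserve."""
    counts = Counter()
    for true, pred in zip(true_labels, predicted_labels):
        if true == pred:
            continue  # correct trials carry no error information
        true_boundary, true_content = CONDITIONS[true]
        pred_boundary, pred_content = CONDITIONS[pred]
        if true_boundary == pred_boundary:
            counts["same_boundary"] += 1  # content confused, boundary kept
        elif true_content == pred_content:
            counts["same_content"] += 1  # boundary confused, content kept
        else:
            counts["both_differ"] += 1
    return counts


truth = ["open_natural", "open_natural", "closed_urban", "closed_natural"]
guess = ["open_urban", "open_natural", "closed_natural", "open_natural"]
print(error_pattern(truth, guess))  # Counter({'same_boundary': 2, 'same_content': 1})
```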


II. Conclusion

A dual neural pathway for representing the shape of a visual scene

Visual scenes are represented in a distributed and complementary manner by different brain regions sensitive to the spatial boundary vs. the content of a scene

Park, Brady, Greene & Oliva (submitted)

Thank You

Funding: National Science Foundation Career Award IIS-0546262

http://cvcl.mit.edu/MM

Talia Konkle

Timothy Brady

George Alvarez

Michelle Greene

Soojin Park
