Computational Theories & Low-level Pixels To Percepts A. Efros, CMU, Spring 2009


Four Stages of Visual Perception

© Stephen E. Palmer, 2002

[Diagram: Light --> Image-Based Processing --> Surface-Based Processing --> Object-Based Processing --> Category-Based Processing. Surrounding boxes: Vision, Audition, STM, LTM, Motor; inputs Light, Sound, Odor (etc.); output Movement. Example percept: ceramic cup on a table.]

David Marr, 1982

Four Stages of Visual Perception

The Retinal Image

An Image (blowup) Receptor Output

Four Stages of Visual Perception

Retinal Image: An Image

Image-based processes: edges, lines, blobs, etc.

Image-based Representation: Primal Sketch (Marr) (Line Drawing)

We likely throw away a lot of the image information.

Line drawings are universal.

Four Stages of Visual Perception

Image-based Representation: Primal Sketch

Surface-based processes: stereo, shading, motion, etc.

Surface-based Representation: 2.5-D Sketch

Single Surface (Koenderink’s trick)


Figure/Ground Organization

A contour belongs to one of the two (but not both) abutting regions.

Figure (face), Ground (shapeless)

Figure (goblet), Ground (shapeless)

Important for the perception of shape

Figure-Ground Organization

Properties of figures vs. grounds

Figure        Ground
Thing-like    Not thing-like
Closer        Farther
Shaped        Extends behind

Figure-Ground Organization

Principles of figure-ground organization:

Surroundedness

Surrounded region --> Figure

Surrounding region --> Ground

Figure-Ground Organization

Principles of figure-ground organization:

Size

Smaller region --> Figure

Larger region --> Ground

Figure-Ground Organization

Principles of figure-ground organization:

Orientation

Horizontal/vertical region --> Figure

Oblique region --> Ground

Figure-Ground Organization

Principles of figure-ground organization:

Contrast

Higher contrast region --> Figure

Lower contrast region --> Ground

Figure-Ground Organization

Principles of figure-ground organization:

Symmetry

Symmetrical region --> Figure

Asymmetrical region --> Ground

Figure-Ground Organization

Principles of figure-ground organization:

Convexity

More convex region --> Figure

Less convex region --> Ground

Figure-Ground Organization

Principles of figure-ground organization:

Parallelism

More parallel region --> Figure

Less parallel region --> Ground

Figure-Ground Organization

Principles of figure-ground organization:

Lower region

Lower region --> Figure

Upper region --> Ground

Figure-Ground Organization

Principles of figure-ground organization:

Meaningfulness

More meaningful region --> Figure

Less meaningful region --> Ground
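A few of the principles above are simple enough to compute directly from a segmentation. The following is a minimal sketch, assuming the scene has already been split into exactly two regions stored as a NumPy label map (labels 0 and 1). The function name `figure_ground_cues` and the way each cue is scored are illustrative choices, not taken from the slides, and only size, surroundedness, and lower region are covered.

```python
import numpy as np

def figure_ground_cues(labels: np.ndarray) -> dict:
    """For each cue, return the label (0 or 1) it favours as figure, or None."""
    votes = {}

    # Size: the smaller region tends to be seen as figure.
    area = [int((labels == k).sum()) for k in (0, 1)]
    votes["size"] = None if area[0] == area[1] else int(np.argmin(area))

    # Surroundedness: a region that never touches the image border
    # is surrounded by the other region.
    border = np.concatenate([labels[0], labels[-1], labels[:, 0], labels[:, -1]])
    touches = [bool((border == k).any()) for k in (0, 1)]
    if touches[0] != touches[1]:
        votes["surroundedness"] = 1 if touches[0] else 0
    else:
        votes["surroundedness"] = None

    # Lower region: the region whose centroid sits lower in the image
    # (larger row index) tends to be seen as figure.
    rows = [np.nonzero(labels == k)[0].mean() for k in (0, 1)]
    votes["lower_region"] = None if rows[0] == rows[1] else int(np.argmax(rows))
    return votes

# A small square (label 1) in the lower part of a large background (label 0).
demo = np.zeros((8, 8), dtype=int)
demo[4:7, 2:5] = 1
print(figure_ground_cues(demo))
```

Each cue votes independently here; how the visual system weighs conflicting cues is not something this sketch tries to model.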

Figure-Ground Organization

Relation to Depth Factors

Figure-ground organization as edge assignment: To which side does the edge belong?

To the closer side. This fact connects figure-ground organization with depth perception.

Depth cues can also be figure-ground factors, and figure-ground factors can be depth cues.

Figure-Ground Organization

Principles of figure-ground organization:

Occlusion

Occluding region --> Figure

Occluded region --> Ground

Figure-Ground Organization

Principles of figure-ground organization:

Cast Shadows

Shadowing region --> Figure

Shadowed region --> Ground

Figure-Ground Organization

Principles of figure-ground organization:

Shading

Shaded region --> Figure

Nonshaded region --> Ground

Line Labeling

> : contour direction

+ : convex edge

- : concave edge

Possible junctions (constraints)

Constraint Propagation

[Clowes 1971, Huffman 1971; Waltz 1972; Malik 1986]
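The constraint-propagation idea can be sketched in a few lines: every junction starts with a list of candidate labellings, and a candidate is discarded when a neighbouring junction has no surviving candidate that agrees on the shared edge. The sketch below is a hedged illustration in the spirit of Waltz filtering. The three-junction `toy_catalogue` is made up for the demo and is not the real Huffman-Clowes junction catalogue, and the occluding label `>` is treated as a single symbol (contour direction is ignored).

```python
def waltz_filter(candidates):
    """Prune junction labellings by constraint propagation.

    candidates maps a junction name to a list of candidate labellings;
    each labelling maps the junction's incident edges to a label.
    """
    cands = {j: list(labs) for j, labs in candidates.items()}
    # Edges touched by each junction, read off the initial candidates.
    edges_of = {j: set().union(*(lab.keys() for lab in labs))
                for j, labs in candidates.items()}
    changed = True
    while changed:
        changed = False
        for j in cands:
            for k in cands:
                shared = edges_of[j] & edges_of[k]
                if j == k or not shared:
                    continue
                # Keep a labelling of j only if some labelling of k
                # assigns the same label to every shared edge.
                kept = [a for a in cands[j]
                        if any(all(a[e] == b[e] for e in shared)
                               for b in cands[k])]
                if len(kept) < len(cands[j]):
                    cands[j] = kept
                    changed = True
    return cands

# Hypothetical candidates for a triangle with junctions A, B, C
# and edges ab, bc, ca (illustrative only).
toy_catalogue = {
    "A": [{"ab": "+", "ca": "+"}, {"ab": ">", "ca": ">"}, {"ab": "-", "ca": ">"}],
    "B": [{"ab": ">", "bc": ">"}, {"ab": "+", "bc": "-"}],
    "C": [{"bc": ">", "ca": ">"}, {"bc": "+", "ca": "+"}],
}
result = waltz_filter(toy_catalogue)
print(result)
```

In this toy case propagation alone leaves a single labelling per junction. In general it only prunes, and a search over the remaining candidates may still be needed.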


Four Stages of Visual Perception

Surface-based Representation: 2.5-D Sketch

Object-based processes: grouping, parsing, completion, etc.

Object-based Representation: Volumetric Sketch

Geons (Biederman '87)

Four Stages of Visual Perception

Object-based Representation: Volumetric Sketch

Category-based processes: pattern recognition, spatial description

Category-based Representation: Basic-level Category

Category: cup

Color: light-gray

Size: 6”

Location: table

We likely throw away a lot of the image information.

Line drawings are universal.

However, things are not so simple…

● Problems with the feed-forward model of processing…

Junctions in Real Images

Are Junctions local evidence?

J McDermott, 2004

Early vs. Late Grouping

Is grouping an early or late process?

[Diagram: Light --> Image-Based Processing --> Surface-Based Processing --> Object-Based Processing --> Category-Based Processing, with a "?" at each of the four stages]

Early vs. Late Grouping

Before or after stereoscopic depth?

(Rock & Brosgole, 1964)

Early vs. Late Grouping

Before or after lightness constancy?

(Rock, Nijhawan, Palmer & Tudor, 1992)

[Figure labels: Reflectance Matched, Luminance Matched, Translucent Plastic Strip]

[Figure labels: Reflectance Matched, Luminance-Ratio Matched, Opaque Paper Strip]

Early vs. Late Grouping

Before or after visual completion?

(Palmer, Neff & Beck, 1996)

Early vs. Late Grouping

Before or after illusory contours?

(Palmer & Nelson, 2000)

Early vs. Late Grouping

Conclusion: Grouping can occur “late”.

Question: Can grouping also occur “early”?

(Palmer & Brooks, in preparation)

Early vs. Late Grouping

Grouping affects shape constancy

(Palmer & Brooks, in preparation)

[Figure labels: Ambiguous, Flat oval, Circle in depth]

Early vs. Late Grouping

Proximity effects

[Figure labels: Biased toward oval, Biased toward circle]

Early vs. Late Grouping

Color similarity effects

[Figure labels: Biased toward oval, Biased toward circle]

Early vs. Late Grouping

Common fate effects

[Figure labels: Biased toward oval, Biased toward circle]

Early vs. Late Grouping

Conclusion: Grouping occurs both “early” and “late”, possibly everywhere!

[Diagram: Light --> Image-Based Processing --> Surface-Based Processing --> Object-Based Processing --> Category-Based Processing, with Grouping marked at every stage]

two-tone images

hair (not shadow!)

inferred external contours

“attached shadow” contour

“cast shadow” contour

Cavanagh's argument:

Finding 3D structure in two-tone images requires distinguishing cast shadows, attached shadows, and areas of low reflectivity.

The images do not contain this information a priori (at the low level).

A Classical View of Vision

Grouping / Segmentation

Figure/Ground Organization

Object and Scene Recognition

Low-level: pixels, features, edges, etc.

Mid-level

High-level

A Contemporary View of Vision

Figure/Ground Organization

Grouping / Segmentation

Object and Scene Recognition

Low-level: pixels, features, edges, etc.

Mid-level

High-level

But where do we draw this line?

Question #1: What (if anything) should be done at the “Low-Level”?

N.B. I have already told you everything that is known. From now on, there aren’t any answers, only questions…

Who cares? Why not just use pixels?

Pixel differences vs. Perceptual differences

Eye is not a photometer!

"Every light is a shade, compared to the higher lights, till you come to the sun; and every shade is a light, compared to the deeper shades, till you come to the night."

— John Ruskin, 1879

Cornsweet Illusion

Campbell-Robson contrast sensitivity curve

Sine wave
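The Campbell-Robson chart itself is easy to synthesize: a sine grating whose spatial frequency increases from left to right while its contrast decreases from bottom to top. The sketch below is one way to generate such a stimulus. The function name and the parameter values (`f_min`, `f_max`, the contrast range) are assumptions chosen for illustration, not values from the slides.

```python
import numpy as np

def campbell_robson(height=256, width=512, f_min=1.0, f_max=60.0):
    """Sine grating with a log frequency sweep (x) and log contrast ramp (y)."""
    x = np.linspace(0.0, 1.0, width)
    ratio = f_max / f_min
    # Instantaneous frequency is f_min * ratio**x (cycles per image width);
    # the phase is 2*pi times the integral of that frequency.
    phase = 2.0 * np.pi * f_min * (ratio ** x - 1.0) / np.log(ratio)
    # Contrast rises logarithmically from 0.005 (top row) to 1 (bottom row).
    contrast = np.logspace(np.log10(0.005), 0.0, height)[:, None]
    # Intensities stay in [0, 1] around a mid-gray of 0.5.
    return 0.5 + 0.5 * contrast * np.sin(phase)[None, :]

chart = campbell_robson()
print(chart.shape, float(chart.min()), float(chart.max()))
```

Viewed as an image, the height at which the stripes seem to vanish traces out the viewer's contrast sensitivity as a function of spatial frequency.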

Metamers

Question #1: What (if anything) should be done at the “Low-Level”?

i.e. What input stimulus should we be invariant to?

Invariant to:

• Brightness / Color changes?

small brightness / color changes

low-frequency changes

But one can be too invariant
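One simple way to get invariance to brightness offsets and low-frequency changes is to subtract a local mean from every pixel. The following is a minimal sketch assuming a box filter as the local mean. The helper names `box_blur` and `local_contrast` and the radius of 3 pixels are illustrative choices, not something prescribed by the slides.

```python
import numpy as np

def box_blur(img, radius):
    """Mean over a (2*radius+1)^2 window, with edge replication at borders."""
    k = 2 * radius + 1
    h, w = img.shape
    padded = np.pad(img.astype(float), radius, mode="edge")
    out = np.zeros((h, w))
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def local_contrast(img, radius=3):
    """Remove the local mean, keeping only higher-frequency structure."""
    return img - box_blur(img, radius)

rng = np.random.default_rng(0)
texture = rng.random((32, 32))
brighter = texture + 0.3                                   # global offset
shaded = texture + np.linspace(0.0, 0.5, 32)[None, :]      # slow ramp

inner = (slice(3, -3), slice(3, -3))  # pixels whose window avoids the border
print(np.allclose(local_contrast(texture), local_contrast(brighter)))
print(np.allclose(local_contrast(texture)[inner], local_contrast(shaded)[inner]))
```

The radius controls the trade-off noted above: a very small window also removes genuine image structure, which is one way of being too invariant.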

Invariant to:

• Edge contrast / reversal?

I shouldn’t care what background I am on!

but be careful of exaggerating noise

Representation choices

Raw Pixels

Gradients:

Gradient Magnitude:

Thresholded gradients (edge + sign):

Thresholded gradient mag. (edges):
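The five representations listed above can be computed side by side to see what each one is invariant to. This is a sketch under simple assumptions: central-difference gradients from `numpy.gradient`, and an arbitrary threshold of 0.1. The function name `representations` and the dictionary keys are made up for the example.

```python
import numpy as np

def representations(img, thresh=0.1):
    """Compute the representation choices for a grayscale image."""
    img = img.astype(float)
    gy, gx = np.gradient(img)            # derivatives along rows, then columns
    mag = np.hypot(gx, gy)               # gradient magnitude
    edges = mag > thresh                 # thresholded magnitude (edges)
    signed = (np.sign(gx) * edges, np.sign(gy) * edges)  # edge + sign
    return {"raw": img, "gradients": (gx, gy), "magnitude": mag,
            "signed_edges": signed, "edges": edges}

# A vertical step edge, and the same edge with contrast reversed.
step = np.full((4, 8), 0.2)
step[:, 4:] = 0.8
a = representations(step)
b = representations(1.0 - step)

print(np.allclose(a["gradients"][0], -b["gradients"][0]))  # sign flips
print(np.allclose(a["magnitude"], b["magnitude"]))         # magnitude does not
print(int(a["edges"].sum()))
```

Raw pixels change under any brightness shift, gradients ignore a constant offset but flip under contrast reversal, and the magnitude-based versions ignore the reversal as well, at the price of discarding polarity.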

Spatial invariance

• Rotation, Translation, Scale

• Yes, but not too much…

• In brain: complex cells – partial invariance

• In Comp. Vision: histogram-binning methods (SIFT, GIST, Shape Context, etc.) or, equivalently, blurring (e.g. Geometric Blur) -- will discuss later
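The histogram-binning idea can be illustrated with a single orientation histogram over a patch: pooling gradient orientations discards where in the patch each edge occurred, so small shifts leave the descriptor unchanged. This is a sketch of the general idea only, not an implementation of SIFT, GIST, or Shape Context. The function name and the choice of 8 bins are assumptions.

```python
import numpy as np

def orientation_histogram(img, n_bins=8):
    """Magnitude-weighted histogram of gradient orientations over a patch."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % (2.0 * np.pi)
    bins = np.floor(angle / (2.0 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist

# A bright square, and the same square shifted two pixels to the right.
square = np.zeros((16, 16))
square[5:10, 4:9] = 1.0
shifted = np.roll(square, 2, axis=1)

print(np.abs(square - shifted).sum())   # pixel-wise difference
print(np.allclose(orientation_histogram(square),
                  orientation_histogram(shifted)))
```

Pooling over the whole patch gives full translation invariance inside it. Descriptors built from a grid of smaller cells keep coarse layout and so sit closer to the "yes, but not too much" point.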

Many lives of a boundary

Often, context-dependent…

[Figure columns: input image, Canny edges, human-marked boundaries]

Maybe low-level is never enough?

1/f amplitude spectra for natural images

(Field 1987)

There are statistical regularities in the natural world, and image statistics reflect that. (Burton & Moorhead 1987; Field 1987; Tolhurst et al. 1992)
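The 1/f regularity can be checked numerically by averaging the Fourier amplitude over rings of constant spatial frequency and fitting a line in log-log coordinates. The sketch below does this on a synthetic image built to have a 1/f amplitude spectrum with random phases, since no natural image is bundled here. The function names and the 128-pixel test size are illustrative assumptions.

```python
import numpy as np

def radial_amplitude_spectrum(img):
    """Mean FFT amplitude in integer-radius frequency rings (DC excluded)."""
    h, w = img.shape
    amp = np.abs(np.fft.fft2(img - img.mean()))
    fy = np.fft.fftfreq(h)[:, None] * h      # cycles per image, rows
    fx = np.fft.fftfreq(w)[None, :] * w      # cycles per image, columns
    radius = np.rint(np.hypot(fy, fx)).astype(int)
    sums = np.bincount(radius.ravel(), weights=amp.ravel())
    counts = np.bincount(radius.ravel())
    max_r = min(h, w) // 2
    freqs = np.arange(1, max_r)
    return freqs, sums[1:max_r] / counts[1:max_r]

def fit_slope(freqs, amp):
    """Slope of log(amplitude) against log(frequency)."""
    slope, _ = np.polyfit(np.log(freqs), np.log(amp), 1)
    return slope

# Synthetic test image: white noise whose spectrum is divided by f.
rng = np.random.default_rng(0)
n = 128
f = np.hypot(np.fft.fftfreq(n)[:, None] * n, np.fft.fftfreq(n)[None, :] * n)
f[0, 0] = 1.0                                # avoid dividing by zero at DC
img = np.real(np.fft.ifft2(np.fft.fft2(rng.standard_normal((n, n))) / f))

freqs, amp = radial_amplitude_spectrum(img)
slope = fit_slope(freqs, amp)
print(round(float(slope), 2))
```

Running the same two functions on a grayscale photograph is the natural next step. The fitted slope for real scenes varies from image to image, which is the point of looking more closely at amplitude spectra.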

Why 1/f?

Scale invariance

Edges have 1/f structure

Object distribution in real world (Ruderman 1997; Lee & Mumford 1999)

(Image source: smokiesguidebook.com. Slide content: Simoncelli & Olshausen 2001)
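The link between scale invariance and the 1/f amplitude spectrum can be made explicit with a short calculation. As an assumption for this sketch, take an isotropic power-law power spectrum $S(f) \propto f^{-\alpha}$ for a 2D image and ask how much power falls in the octave band $[f_0, 2f_0]$; the symbols $S$, $\alpha$, $f_0$ and $P$ are introduced here for illustration.

```latex
P(f_0) = \int_{f_0}^{2 f_0} S(f)\, 2\pi f \, df
       \;\propto\; \int_{f_0}^{2 f_0} f^{\,1-\alpha}\, df
       = \begin{cases}
           \ln 2, & \alpha = 2, \\[4pt]
           \dfrac{2^{\,2-\alpha} - 1}{2 - \alpha}\; f_0^{\,2-\alpha}, & \alpha \neq 2.
         \end{cases}
```

The power per octave is independent of $f_0$ only when $\alpha = 2$, so no scale is preferred exactly when the power spectrum falls as $1/f^2$, which is an amplitude spectrum of $1/f$.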

A closer look at amplitude spectra

(Torralba & Oliva 2003)

Do natural image statistics matter?

Sensory coding might exploit statistical regularities of our world according to various criteria:

Representational efficiency: decorrelate input responses, make them independent, sparse, information-theoretic metrics, etc.

Metabolic efficiency: spike efficiency, minimal wiring.

Learning efficiency: sparseness, invariance, overcompleteness, etc.

Lots and lots of work; see reviews Graham & Field (2007), Simoncelli & Olshausen (2001)
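As a concrete instance of the decorrelation criterion, neighbouring pixel values that share a common component can be whitened so that their covariance becomes the identity. The sketch below uses ZCA whitening on synthetic four-pixel "patches". The function name `zca_whiten`, the `eps` regularizer, and the toy data are assumptions for illustration and are not a claim about how the visual system does it.

```python
import numpy as np

def zca_whiten(patches, eps=1e-8):
    """Whiten rows of `patches` (n_samples, n_pixels) to identity covariance."""
    centered = patches - patches.mean(axis=0)
    cov = centered.T @ centered / len(centered)
    eigval, eigvec = np.linalg.eigh(cov)     # cov is symmetric
    w = eigvec @ np.diag(1.0 / np.sqrt(eigval + eps)) @ eigvec.T
    return centered @ w

# Toy "patches": four pixels that share a common brightness component,
# so neighbouring values are strongly correlated.
rng = np.random.default_rng(0)
shared = rng.standard_normal((5000, 1))
patches = shared + 0.5 * rng.standard_normal((5000, 4))

white = zca_whiten(patches)
cov_after = white.T @ white / len(white)
print(np.round(np.corrcoef(patches.T)[0, 1], 2))
print(np.allclose(cov_after, np.eye(4), atol=1e-6))
```

Decorrelation removes only second-order dependencies. Independence and sparseness, also listed above, are stronger criteria that whitening by itself does not achieve.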
