Computational Theories & Low-level Pixels To Percepts A. Efros, CMU, Spring 2009


Four Stages of Visual Perception

Image-Based Processing

Surface-Based Processing

Object-Based Processing

Category-Based Processing

Vision

Audition

Light, Movement

Odor (etc.)

Ceramic cup on a table

David Marr, 1982

The Retinal Image

An Image (blowup); Receptor Output

Image-based Representation

Primal Sketch (Marr)

An Image (Line Drawing)

Retinal Image

Image-based processes

Edges, lines, blobs, etc.

We likely throw away a lot

Line drawings are universal

Surface-based Representation

Primal Sketch --> 2.5-D Sketch

Surface-based processes

Stereo, shading, motion

Single Surface (Koenderink’s trick)


Figure/Ground Organization

A contour belongs to one of the two (but not both) abutting regions.

Figure (face)

Ground (shapeless)

Figure (goblet); Ground (shapeless)

Important for the perception of shape

Properties of figures vs. grounds

Figure: thing-like; closer; shaped

Ground: not thing-like; farther; extends behind

Figure-Ground Organization

Principles of figure-ground organization:

Surroundedness


Surrounded region --> Figure; surrounding region --> Ground

Size

Smaller region --> Figure; larger region --> Ground

Orientation

Horizontal/vertical region --> Figure; oblique region --> Ground

Contrast

Higher contrast region --> Figure; lower contrast region --> Ground

Symmetry

Symmetrical region --> Figure; asymmetrical region --> Ground

Convexity

More convex region --> Figure; less convex region --> Ground

Parallelism

More parallel region --> Figure; less parallel region --> Ground

Lower region

Lower region --> Figure; upper region --> Ground

Meaningfulness

More meaningful region --> Figure; less meaningful region --> Ground
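A few of these cues are simple enough to compute directly from region masks. The sketch below is a toy illustration, not a model from these slides: it assumes two abutting regions given as hypothetical boolean NumPy masks, scores only surroundedness, size, and lower region, and combines them by an unweighted majority vote, which is an assumption made purely for illustration.

```python
import numpy as np


def touches_border(mask: np.ndarray) -> bool:
    """True if a binary region has any pixel on the image border."""
    return bool(mask[0, :].any() or mask[-1, :].any()
                or mask[:, 0].any() or mask[:, -1].any())


def figure_ground_votes(a: np.ndarray, b: np.ndarray) -> dict:
    """Toy votes ('a', 'b', or None for a tie) from three figure-ground cues.

    a, b: boolean masks of two abutting regions (hypothetical inputs).
    """
    votes = {}
    # Surroundedness (assumption): the region touching the border surrounds the other.
    ta, tb = touches_border(a), touches_border(b)
    votes["surroundedness"] = None if ta == tb else ("a" if tb else "b")
    # Size: smaller region --> figure.
    sa, sb = int(a.sum()), int(b.sum())
    votes["size"] = None if sa == sb else ("a" if sa < sb else "b")
    # Lower region: larger mean row index (rows grow downward) --> figure.
    ra, rb = np.nonzero(a)[0].mean(), np.nonzero(b)[0].mean()
    votes["lower region"] = None if ra == rb else ("a" if ra > rb else "b")
    return votes


def majority(votes: dict):
    """'a' or 'b' by simple majority of the non-tied votes, else None."""
    n_a = sum(v == "a" for v in votes.values())
    n_b = sum(v == "b" for v in votes.values())
    return "a" if n_a > n_b else "b" if n_b > n_a else None


# Usage: a small square in the lower half of a 20x20 frame vs. its surround.
square = np.zeros((20, 20), dtype=bool)
square[12:16, 8:12] = True
surround = ~square
votes = figure_ground_votes(square, surround)
print(votes, majority(votes))   # all three cues vote for the square ('a')
```

Real figure-ground cues can conflict and differ in strength, so a weighted or learned combination would be the natural next step; the majority vote only keeps the sketch short.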

Relation to Depth Factors

Figure-ground organization as edge assignment: to which side does the edge belong?

To the closer side. This fact connects figure-ground organization with depth perception.

Depth cues can also be figure-ground factors, and figure-ground factors can be depth cues.

Occlusion

Occluding region --> Figure; occluded region --> Ground

Cast Shadows

Shadowing region --> Figure; shadowed region --> Ground

Shading

Shaded region --> Figure; nonshaded region --> Ground

Line Labeling

> : contour direction; + : convex edge; - : concave edge

Possible junctions (constraints)

Constraint Propagation

[Clowes 1971; Huffman 1971; Waltz 1972; Malik 1986]
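Constraint propagation over junction labels can be sketched as Waltz-style filtering: each line keeps a set of candidate labels, and a label is discarded whenever no allowed labeling of some junction it touches still supports it, repeating until nothing changes. The junction catalog below is a hypothetical two-junction toy, not the actual Huffman-Clowes catalog, and the line names `L1` to `L3` are made up for the example.

```python
def waltz_filter(domains, junctions):
    """Prune each line's candidate labels until every junction is consistent.

    domains:   {line_name: iterable of labels}
    junctions: list of (line_names_tuple, allowed_label_tuples)
    Returns new domains; an empty set means no consistent labeling exists.
    """
    domains = {line: set(labels) for line, labels in domains.items()}
    changed = True
    while changed:
        changed = False
        for lines, allowed in junctions:
            # Junction labelings still possible given the current domains.
            consistent = [t for t in allowed
                          if all(lab in domains[ln] for ln, lab in zip(lines, t))]
            for i, line in enumerate(lines):
                supported = {t[i] for t in consistent}
                if not domains[line] <= supported:
                    domains[line] &= supported   # drop labels with no support
                    changed = True
    return domains


# Hypothetical toy catalog: '+' convex, '-' concave, '>' contour direction.
ALL = {"+", "-", ">"}
domains = {"L1": {">"}, "L2": ALL, "L3": ALL}   # L1 fixed in advance
junctions = [
    (("L1", "L2"), [("+", "+"), (">", "-"), (">", ">")]),
    (("L2", "L3"), [("-", "+"), ("+", "-")]),
]
print(waltz_filter(domains, junctions))
```

Fixing L1 narrows L2 to {'-', '>'} at the first junction; the second junction then leaves only L2 = '-' and L3 = '+'. An empty set for any line would mean the drawing has no labeling under that catalog.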


Object-based Representation

Object-based processes

Grouping, parsing, completion, etc.

2.5-D Sketch --> Volumetric Sketch

Geons (Biederman '87)

Category-based Representation

Category-based processes

Pattern recognition

Spatial description

Object-based Representation

Volumetric Sketch --> Basic-level Category

Category: cup

Color: light-gray

Size: 6”

Location: table


However, things are not so simple…

● Problems with the feed-forward model of processing…

Junctions in Real Images

Are junctions local evidence?

J. McDermott, 2004

Is grouping an early or late process?

Early vs. Late Grouping

Object-Based Processing

Light ? ? ? ?

Before or after stereoscopic depth?

(Rock & Brosgole, 1964)

Before or after lightness constancy?

(Rock, Nijhawan, Palmer & Tudor, 1992)

Figure labels: reflectance matched; luminance matched; luminance-ratio matched; translucent plastic strip; opaque paper strip

Before or after visual completion?

(Palmer, Neff & Beck, 1996)

Before or after illusory contours?

(Palmer & Nelson, 2000)

Conclusion: Grouping can occur “late”

Question: Can grouping also occur “early”?

(Palmer & Brooks, in preparation)

Grouping affects shape constancy

(Palmer & Brooks, in preparation)

Ambiguous

Flat oval

Circle in depth

Proximity effects

Biased toward oval

Biased toward circle

Color similarity effects

Biased toward oval; biased toward circle

Common fate effects

Biased toward oval; biased toward circle

Conclusion: Grouping occurs both “early” and “late”, possibly everywhere!

Object-Based Processing

Grouping Grouping Grouping Grouping

Two-tone images

hair (not shadow!)

inferred external contours

“attached shadow” contour

“cast shadow” contour

Finding 3D structure in two-tone images requires distinguishing cast shadows, attached shadows, and areas of low reflectivity

The images do not contain this information a priori (i.e., at the low level)
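A minimal illustration of why the two-tone image alone cannot settle this, under the simplifying assumption that luminance is reflectance times illumination: a dark-painted region and a light region in cast shadow can binarize to exactly the same pattern. The reflectance and illumination values and the 0.5 threshold are arbitrary choices for the example.

```python
import numpy as np


def two_tone(luminance, threshold=0.5):
    """Binarize luminance: 1 = light, 0 = dark (a Mooney-style two-tone image)."""
    return (np.asarray(luminance) >= threshold).astype(np.uint8)


# Simplifying assumption: luminance = reflectance * illumination.
# Scene A: evenly lit surface whose right half is painted dark.
reflect_a = np.array([0.8, 0.8, 0.2, 0.2])
illum_a = np.array([1.0, 1.0, 1.0, 1.0])
# Scene B: uniformly light surface with a cast shadow over its right half.
reflect_b = np.array([0.8, 0.8, 0.8, 0.8])
illum_b = np.array([1.0, 1.0, 0.3, 0.3])

lum_a = reflect_a * illum_a   # about [0.8, 0.8, 0.2, 0.2]
lum_b = reflect_b * illum_b   # about [0.8, 0.8, 0.24, 0.24]
print(two_tone(lum_a), two_tone(lum_b))   # both [1 1 0 0]
```

Different physical causes (paint vs. shadow) yield one and the same two-tone input, so the distinction has to come from elsewhere.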

Cavanagh's argument

A Classical View of Vision

Grouping / Segmentation

Figure/Ground Organization

Object and Scene Recognition

Low-level: pixels, features, edges, etc.

Mid-level

High-level

A Contemporary View of Vision

Figure/Ground Organization

Grouping / Segmentation

Object and Scene Recognition

Low-level: pixels, features, edges, etc.

Mid-level

High-level

But where do we draw this line?

Question #1: What (if anything) should be done at the “Low-Level”?

N.B. I have already told you everything that is known. From now on, there aren’t any answers, only questions…

Who cares? Why not just use pixels?

Pixel differences vs. Perceptual differences

Eye is not a photometer!

"Every light is a shade, compared to the higher lights, till you come to the sun; and every shade is a light, compared to the deeper shades, till you come to the night."

— John Ruskin, 1879
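The gap between pixel differences and perceptual differences shows up in the most common pixel metric, mean squared error. In this sketch (a hypothetical random image; the offsets and block size are arbitrary), a uniform brightness shift and an added blocky texture produce identical MSE, even though they alter the image in very different ways.

```python
import numpy as np


def mse(a, b):
    """Mean squared pixel difference."""
    d = np.asarray(a, dtype=np.float64) - np.asarray(b, dtype=np.float64)
    return float(np.mean(d ** 2))


rng = np.random.default_rng(0)
ref = rng.integers(50, 200, size=(64, 64))          # hypothetical 8-bit-range image

brighter = ref + 10                                  # uniform brightness shift
yy, xx = np.indices(ref.shape)
blocks = np.where((yy // 4 + xx // 4) % 2 == 0, 10, -10)
textured = ref + blocks                              # added +/-10 checkerboard texture

print(mse(ref, brighter), mse(ref, textured))        # 100.0 100.0
```

A pixel metric sees both distortions as equally large; whether an observer does depends on what the visual system is invariant to, which is the question this part of the lecture raises.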

Cornsweet Illusion
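The Cornsweet (Craik-O'Brien-Cornsweet) stimulus can be written as a 1D luminance profile: two physically identical plateaus separated by a border where luminance ramps up on one side, then drops and ramps back on the other. This sketch uses arbitrary parameter values; the function name and defaults are illustrative only.

```python
import numpy as np


def cornsweet_profile(n=200, base=0.5, amp=0.1, ramp=20):
    """1D Craik-O'Brien-Cornsweet luminance profile.

    Far from the central border both sides equal `base`; approaching the border,
    the left side ramps up to base + amp, and the right side jumps to base - amp
    and ramps back to base.
    """
    profile = np.full(n, base, dtype=float)
    mid = n // 2
    profile[mid - ramp:mid] = base + amp * np.linspace(0.0, 1.0, ramp)
    profile[mid:mid + ramp] = base - amp * np.linspace(1.0, 0.0, ramp)
    return profile


p = cornsweet_profile()
print(p[:80].min(), p[:80].max(), p[120:].min(), p[120:].max())   # all 0.5
```

Although the two plateaus are identical, observers typically see the whole left side as lighter than the right, which is why the profile is a standard example of the eye not acting as a photometer.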

Campbell-Robson contrast sensitivity curve

Sine wave
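A Campbell-Robson chart is a sine-wave grating whose spatial frequency increases from left to right and whose contrast decreases from bottom to top, so the boundary between visible and invisible grating traces the viewer's contrast sensitivity curve. The sketch below assumes log-spaced frequency and contrast and arbitrary ranges (frequency in cycles/pixel, Michelson contrast); it is one plausible way to build the stimulus, not the original chart's exact specification.

```python
import numpy as np


def campbell_robson(width=512, height=256, f_min=0.002, f_max=0.25,
                    c_min=0.005, c_max=1.0):
    """Campbell-Robson-style chart with intensities in [0, 1].

    Spatial frequency (cycles/pixel) rises log-uniformly from left to right;
    Michelson contrast rises log-uniformly from the top row to the bottom row.
    """
    freqs = np.geomspace(f_min, f_max, width)      # instantaneous frequency per column
    phase = 2 * np.pi * np.cumsum(freqs)           # integrate frequency for a smooth sweep
    contrast = np.geomspace(c_min, c_max, height)  # row 0 (top) has the lowest contrast
    return 0.5 + 0.5 * contrast[:, None] * np.sin(phase)[None, :]


chart = campbell_robson()
print(chart.shape)   # (256, 512)
```

Integrating the frequency (cumulative sum) rather than using `sin(2*pi*f*x)` directly keeps the local frequency at each column equal to the intended value.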

Metamers
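Metamers are physically different spectra that produce identical receptor responses. With three linear receptors, any change to a spectrum that lies in the null space of the 3-row sensitivity matrix is invisible to them. The Gaussian sensitivity curves below are hypothetical stand-ins, not measured cone fundamentals, and the 400-700 nm sampling in 10 nm steps is an arbitrary choice.

```python
import numpy as np

wavelengths = np.linspace(400.0, 700.0, 31)     # nm, 10 nm steps


def gaussian(peak, width=40.0):
    """Hypothetical bell-shaped sensitivity curve over `wavelengths`."""
    return np.exp(-0.5 * ((wavelengths - peak) / width) ** 2)


# Hypothetical stand-ins for S, M, L cone sensitivities: a 3 x 31 matrix.
sens = np.stack([gaussian(440.0), gaussian(535.0), gaussian(565.0)])


def remove_row_space(v, m):
    """Subtract from v its projection onto the row space of m (so m @ result ~ 0)."""
    return v - m.T @ np.linalg.solve(m @ m.T, m @ v)


rng = np.random.default_rng(1)
flat = np.ones_like(wavelengths)                             # equal-energy spectrum
wiggle = remove_row_space(rng.standard_normal(wavelengths.size), sens)
metamer = flat + 0.5 * wiggle / np.abs(wiggle).max()         # every value stays >= 0.5

print(sens @ flat)
print(sens @ metamer)                                        # same three responses
```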

Question #1: What (if anything) should be done at the “Low-Level”?

i.e., what input stimuli should we be invariant to?

Invariant to:

• Brightness / Color changes?

Small brightness / color changes; low-frequency changes

But one can be too invariant

Invariant to:

• Edge contrast / reversal?

I shouldn’t care what background I am on!

But be careful of exaggerating noise

Representation choices

Raw Pixels

Gradients:

Gradient Magnitude:

Thresholded gradients (edge + sign):

Thresholded gradient mag. (edges):
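The representation choices above can be computed with a few NumPy calls, which also makes their invariances easy to check. This sketch (the threshold value and the step-edge test image are arbitrary) uses `np.gradient`, which takes central differences in the interior: adding a constant brightness leaves the gradients unchanged, and reversing contrast flips the signed gradients but leaves the magnitude and the thresholded edges unchanged.

```python
import numpy as np


def representations(img, thresh):
    """Compute the representation choices listed above for a grayscale image."""
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)                 # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)                    # gradient magnitude
    edges = mag > thresh                      # thresholded gradient magnitude
    # thresholded gradients that keep the sign (edge + polarity): -1, 0 or +1
    signed = (np.where(edges, np.sign(gx), 0.0), np.where(edges, np.sign(gy), 0.0))
    return {"raw": img, "grad": (gx, gy), "mag": mag, "edges": edges, "signed": signed}


# A vertical step edge, a brighter copy, and a contrast-reversed copy.
img = np.zeros((8, 8))
img[:, 4:] = 100.0
r0 = representations(img, thresh=10.0)
r_bright = representations(img + 50.0, thresh=10.0)
r_rev = representations(255.0 - img, thresh=10.0)
```

Moving down the list trades information for invariance: raw pixels change under both manipulations, gradients survive the brightness shift, and unsigned magnitude or edges also survive contrast reversal.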

Spatial invariance

• Rotation, Translation, Scale

• Yes, but not too much…

• In brain: complex cells – partial invariance

• In Comp. Vision: histogram-binning methods (SIFT, GIST, Shape Context, etc.) or, equivalently, blurring (e.g., Geometric Blur); will discuss later
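Histogram binning buys partial spatial invariance: gradients that move around inside a bin leave its histogram unchanged, while gradients that cross a bin boundary do not. The descriptor below is a simplified, hypothetical cell histogram of gradient orientations weighted by gradient magnitude, loosely in the spirit of SIFT-style binning; it omits the Gaussian weighting, interpolation, and normalization that real descriptors use.

```python
import numpy as np


def cell_orientation_histograms(img, cell=8, n_bins=8):
    """Per-cell histograms of gradient orientation, weighted by gradient magnitude."""
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)                  # orientation in [0, 2*pi)
    bins = np.minimum((ang / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    h_cells, w_cells = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((h_cells, w_cells, n_bins))
    for i in range(h_cells):
        for j in range(w_cells):
            rows = slice(i * cell, (i + 1) * cell)
            cols = slice(j * cell, (j + 1) * cell)
            hist[i, j] = np.bincount(bins[rows, cols].ravel(),
                                     weights=mag[rows, cols].ravel(),
                                     minlength=n_bins)
    return hist


# A small square, and the same square shifted one pixel to the right.
img = np.zeros((32, 32))
img[4:10, 4:10] = 1.0
shifted = np.roll(img, 1, axis=1)
coarse_same = np.allclose(cell_orientation_histograms(img, cell=16),
                          cell_orientation_histograms(shifted, cell=16))
fine_same = np.allclose(cell_orientation_histograms(img, cell=4),
                        cell_orientation_histograms(shifted, cell=4))
print(coarse_same, fine_same)   # True False
```

Large cells tolerate the one-pixel shift; small cells do not. The cell size is exactly the "not too much" knob: larger bins buy more invariance at the cost of spatial detail.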

Many lives of a boundary

Often, context-dependent…

Figure panels: input, Canny, human

Maybe low-level is never enough?

1/f amplitude spectra for natural images

(Field 1987)

There are statistical regularities in the natural world, and image statistics reflect that. (Burton & Moorehead 1987; Field 1987; Tolhurst et al. 1992)
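The 1/f claim can be checked numerically by averaging Fourier amplitude in rings of constant spatial frequency and fitting a line on log-log axes; a 1/f amplitude spectrum gives a slope near -1. This sketch assumes a square grayscale array and integer-radius rings; the synthetic test image is random-phase noise shaped to an exact 1/f amplitude, not a natural photograph.

```python
import numpy as np


def radial_amplitude_spectrum(img):
    """Mean Fourier amplitude in integer-radius frequency rings 1 .. min(h, w)//2."""
    img = np.asarray(img, dtype=float)
    amp = np.abs(np.fft.fft2(img - img.mean()))          # remove DC before the FFT
    h, w = img.shape
    fy = np.fft.fftfreq(h) * h                            # signed integer frequency indices
    fx = np.fft.fftfreq(w) * w
    radius = np.rint(np.hypot(fy[:, None], fx[None, :])).astype(int)
    sums = np.bincount(radius.ravel(), weights=amp.ravel())
    counts = np.bincount(radius.ravel())
    k = np.arange(1, min(h, w) // 2 + 1)
    return k, sums[k] / counts[k]


def spectral_slope(img):
    """Slope of log(amplitude) vs. log(frequency); about -1 for a 1/f spectrum."""
    k, a = radial_amplitude_spectrum(img)
    slope, _intercept = np.polyfit(np.log(k), np.log(a), 1)
    return float(slope)


def one_over_f_noise(n=256, seed=0):
    """Random-phase test image whose amplitude spectrum is exactly 1/f (off DC)."""
    rng = np.random.default_rng(seed)
    spec = np.fft.fft2(rng.standard_normal((n, n)))
    spec /= np.abs(spec)                                  # keep only the (Hermitian) phase
    f = np.hypot(np.fft.fftfreq(n)[:, None], np.fft.fftfreq(n)[None, :])
    f[0, 0] = 1.0                                         # avoid dividing by zero at DC
    return np.real(np.fft.ifft2(spec / f))


rng = np.random.default_rng(1)
print(spectral_slope(one_over_f_noise()))                 # close to -1
print(spectral_slope(rng.standard_normal((256, 256))))    # close to 0 (white noise)
```

Applied to a natural photograph, the same function would estimate how close that image comes to 1/f; windowing the image first (not done here) reduces the spurious edges that the FFT's periodic boundary introduces.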

Why 1/f?

Scale invariance

Edges have 1/f structure

Object distribution in real world (Ruderman 1997; Lee & Mumford 1999)

(Image source: smokiesguidebook.com; slide content: Simoncelli & Olshausen 2001)
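One way to see the link between 1/f and scale invariance, ignoring the finite extent of real images: magnifying the scene by a factor s rescales its Fourier transform, and for a 1/f amplitude spectrum the rescaled spectrum keeps the same shape, up to an overall gain. Here c is an assumed constant and the substitution is u = x/s, so dx = s^2 du in 2D.

```latex
% Assumption: |\hat I(\mathbf f)| = c / |\mathbf f|; zoomed image I_s(\mathbf x) = I(\mathbf x / s)
\hat I_s(\mathbf f) = \int_{\mathbb R^2} I(\mathbf x / s)\, e^{-2\pi i\, \mathbf f \cdot \mathbf x}\, d\mathbf x
  = s^2 \int_{\mathbb R^2} I(\mathbf u)\, e^{-2\pi i\, (s\mathbf f) \cdot \mathbf u}\, d\mathbf u
  = s^2\, \hat I(s\mathbf f)
\qquad\Longrightarrow\qquad
|\hat I_s(\mathbf f)| = s^2\, \frac{c}{s\,|\mathbf f|} = \frac{s\,c}{|\mathbf f|}
```

So a zoomed-in or zoomed-out view has the same amplitude falloff with frequency, which is the sense in which a 1/f spectrum is scale invariant.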

A closer look at amplitude spectra

(Torralba & Oliva 2003)

Do natural image statistics matter?

Sensory coding might exploit statistical regularities of our world according to various criteria:

Representational efficiency: decorrelate input responses, make them independent, sparse; information-theoretic metrics, etc.

Metabolic efficiency: spike efficiency, minimal wiring.

Learning efficiency: sparseness, invariance, overcompleteness, etc.

Lots and lots of work; see reviews Graham & Field (2007), Simoncelli & Olshausen (2001)
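Decorrelating input responses, the first "representational efficiency" criterion above, is often illustrated with PCA whitening: rotate the data onto its principal axes and rescale each axis to unit variance. This sketch uses hypothetical correlated 4-D data as a stand-in for image patches; `eps` is an assumed regularizer for near-zero eigenvalues.

```python
import numpy as np


def pca_whiten(x, eps=1e-8):
    """PCA whitening: rows are samples (e.g. flattened image patches)."""
    xc = x - x.mean(axis=0)                       # zero-mean each dimension
    cov = xc.T @ xc / xc.shape[0]                 # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)        # cov is symmetric
    # rotate onto the principal axes, then scale each axis to unit variance
    return (xc @ eigvecs) / np.sqrt(eigvals + eps)


# Hypothetical correlated 4-D data (stand-in for neighbouring-pixel responses).
rng = np.random.default_rng(0)
mix = np.array([[1.0, 0.0, 0.0, 0.0],
                [0.9, 0.5, 0.0, 0.0],
                [0.5, 0.5, 0.5, 0.0],
                [0.2, 0.1, 0.3, 0.4]])
data = rng.standard_normal((5000, 4)) @ mix.T
white = pca_whiten(data)
print(np.round(np.cov(white, rowvar=False, bias=True), 3))   # approximately the identity
```

Whitening removes only second-order (pairwise linear) correlations; making responses independent or sparse, the other criteria listed, needs further steps such as ICA or sparse coding.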
