The Purpose of Vision.
“To Know What is Where by Looking.” Aristotle (384-322 BC).
Information Processing: receive a signal carried by light rays and decode the information it contains.
Vision appears deceptively simple, but there is more to Vision than meets the Eye.
What are Humans Ideal for?
Clearly humans are not good at determining the size of objects in images, at least for these types of stimuli.
But they are good at determining context and taking contextual cues into account, e.g. using perspective cues to estimate depth and adjusting their judgements accordingly.
What reasoning/statistical tasks are humans ideal for?
Visual Illusions
The perceived brightness of a surface, or the perceived length of a line, depends on context. It does not depend on basic measurements such as the number of photons that reach the eye or the length of the line in the image.
Vision is ill-posed.
Vision is ill-posed: the data on the retina are not sufficient to unambiguously determine the visual scene.
Vision is possible because we have prior knowledge about visual scenes.
Even simple perception is an act of creation.
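A minimal illustration of ill-posedness follows. It assumes a toy image-formation model in which the measured intensity equals surface reflectance times illumination; this model is not taken from the slides. Many different scenes produce exactly the same measurement. Only a prior, here a hypothetical one that favours illumination near 1.0, can pick one of them.

```python
import math

def consistent_scenes(intensity, reflectances):
    """Return (reflectance, illumination) pairs that all produce the same intensity.

    Toy image-formation model (an assumption): intensity = reflectance * illumination.
    """
    return [(r, intensity / r) for r in reflectances]

def log_prior(reflectance, illumination):
    """Hypothetical prior: reflectance uniform on (0, 1], illumination usually near 1.0
    (a Gaussian in log-illumination with standard deviation 0.3)."""
    if not 0.0 < reflectance <= 1.0:
        return -math.inf
    return -0.5 * (math.log(illumination) / 0.3) ** 2

measured = 0.4
candidates = consistent_scenes(measured, [0.1, 0.2, 0.4, 0.8, 1.0])
for r, l in candidates:
    # Every candidate reproduces the measured image exactly.
    print(f"reflectance={r:.1f} illumination={l:.2f} image={r * l:.2f}")

# The data alone cannot rank these scenes; the prior can.
best = max(candidates, key=lambda s: log_prior(*s))
print("preferred scene:", best)
```

Changing the prior changes the preferred scene while the data stay the same. That is the sense in which perception depends on prior knowledge.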
Perception as Inference
Helmholtz (1821-1894): “Perception as Unconscious Inference”.
How Hard is Vision?
The Human Brain devotes an enormous amount of resources to vision.
(I) The optic nerve is one of the largest nerves in the body.
(II) Roughly half of the neurons in the cortex are involved in vision (van Essen).
If intelligence is proportional to neural activity, then vision requires more intelligence than mathematics or chess.
Vision and Artificial Intelligence
The difficulty of vision became clearer when the Artificial Intelligence community tried to design computer programs to do vision.
In the 1960s, AI workers thought that vision was “low-level” and easy. Prof. Marvin Minsky (a pioneer of AI) asked a student to solve vision as a summer project.
Chess and Face Detection
The Artificial Intelligence community preferred chess to vision.
By 1997 a chess program (Deep Blue) could beat the world champion, Kasparov.
But computers still could not find faces in images.
Man and Machine.
David Marr (1945-1980): three levels of explanation:
1. Computation Level/Information Processing
2. Algorithmic Level
3. Hardware: Neurons versus silicon chips.
Claim: Man and Machine are similar at Level 1.
Vision as Probabilistic Inference
Represent the world by S.
Represent the image by I.
Goal: decode I and infer S.
Model image formation by a likelihood function (a generative model), P(I|S).
Model our knowledge of the world by a prior, P(S).
Bayes Theorem
Bayes’ Theorem states that we should infer the world S from the image I by
P(S|I) = P(I|S)P(S)/P(I).
Rev. T. Bayes (1702-1761).
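The sketch below applies Bayes’ Theorem to a small, discrete set of world states S. The state names and probabilities are hypothetical numbers chosen for illustration. They are not values from the slides.

```python
def posterior(likelihood, prior):
    """Bayes' Theorem over a discrete set of world states S.

    likelihood: dict mapping S -> P(I|S) for the observed image I
    prior:      dict mapping S -> P(S)
    Returns a dict mapping S -> P(S|I) = P(I|S) P(S) / P(I).
    """
    joint = {s: likelihood[s] * prior[s] for s in prior}
    evidence = sum(joint.values())  # P(I) = sum over S of P(I|S) P(S)
    return {s: p / evidence for s, p in joint.items()}

likelihood = {"face": 0.8, "background": 0.05}  # hypothetical P(I|S)
prior = {"face": 0.01, "background": 0.99}      # hypothetical P(S)
post = posterior(likelihood, prior)
print(post)
```

The image is 16 times more likely under "face" than under "background". Even so, the posterior for "face" is only about 0.14, because faces are rare under this hypothetical prior. The likelihood and the prior together determine the inference.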
Ambiguity and Complexity of Images.
Similar objects give rise to very different images. Different objects can cause similar images.
Ideal Observers
The image of a cylinder is consistent with multiple objects and viewpoints.
The likelihood is ambiguous (the surface could be concave or convex). The prior resolves the ambiguity by biasing towards convex objects viewed from above.
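Here is a minimal sketch of an ideal observer that picks the interpretation with the highest posterior (the MAP estimate). The two interpretations come from the slide. The probabilities are hypothetical, chosen so that the likelihood alone cannot decide between them.

```python
def ideal_observer(likelihood, prior):
    """Return the MAP interpretation: the S that maximizes P(I|S) P(S).

    P(I) is the same for every S, so it can be ignored when choosing the best S.
    """
    return max(prior, key=lambda s: likelihood[s] * prior[s])

# Hypothetical numbers: the image alone cannot tell the two interpretations apart.
likelihood = {"convex, viewed from above": 0.5, "concave, viewed from below": 0.5}
# Hypothetical prior biased towards convex objects viewed from above.
prior = {"convex, viewed from above": 0.9, "concave, viewed from below": 0.1}

print(ideal_observer(likelihood, prior))  # convex, viewed from above

# Strong enough image evidence can still overrule the prior.
strong = {"convex, viewed from above": 0.05, "concave, viewed from below": 0.95}
print(ideal_observer(strong, prior))  # concave, viewed from below
```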
Influence Graphs and Visual Tasks
Examples of Vision Tasks
Visual Inference:
(1) Estimating shape.
(2) Segmenting images.
(3) Detecting faces.
(4) Detecting and reading text.
(5) Parsing the full image: detecting and recognizing all objects in the image and understanding the viewed scene.
Analysis by Synthesis
Invert the generation process to parse the image.
Probabilistic Grammars for image generation (week 2).
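The following is a minimal sketch of analysis by synthesis under strong simplifying assumptions. The generative model `render` is hypothetical: it draws a single bright bar in a 1-D image. Parsing works by synthesizing an image for each candidate scene and keeping the one that best explains the observation. Exhaustive search is used only because the scene space here is tiny. Realistic models need far more efficient inference.

```python
def render(position, width, size=20):
    """Hypothetical generative model: a 1-D image containing one bright bar."""
    return [1.0 if position <= x < position + width else 0.0 for x in range(size)]

def analyse_by_synthesis(image):
    """Invert the generative model by search: synthesize each candidate scene
    and keep the one whose rendering has the smallest squared error."""
    size = len(image)
    best, best_err = None, float("inf")
    for position in range(size):
        for width in range(1, size - position + 1):
            synth = render(position, width, size)
            err = sum((a - b) ** 2 for a, b in zip(image, synth))
            if err < best_err:
                best, best_err = (position, width), err
    return best, best_err

observed = render(5, 4)
observed[6] = 0.7  # corrupt one pixel so the match is not exact
print(analyse_by_synthesis(observed))
```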
Probabilistic Grammars for Images
(I) Images are generated by composing visual patterns.
(II) An image is parsed by decomposing it into patterns.
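The toy grammar below illustrates composition and decomposition. Its symbols and rule probabilities are hypothetical and do not come from the slides. `generate` composes patterns top-down into a parse tree. `tree_probability` scores a candidate parse as the product of the rule probabilities it uses, which is the quantity a parser would try to maximize when decomposing an image.

```python
import random

# Hypothetical toy grammar: each symbol rewrites into a list of parts with a probability.
GRAMMAR = {
    "Image":  [(["Region", "Image"], 0.4), (["Region"], 0.6)],
    "Region": [(["Face"], 0.3), (["Text"], 0.2), (["Texture"], 0.5)],
}

def generate(symbol, rng):
    """Sample a parse tree top-down by repeatedly composing visual patterns."""
    if symbol not in GRAMMAR:  # terminal visual pattern
        return symbol
    rules = GRAMMAR[symbol]
    parts, _ = rng.choices(rules, weights=[p for _, p in rules])[0]
    return (symbol, [generate(part, rng) for part in parts])

def tree_probability(tree):
    """Probability of a parse tree: the product of the probabilities of its rules."""
    if isinstance(tree, str):
        return 1.0
    symbol, children = tree
    labels = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = {tuple(parts): p for parts, p in GRAMMAR[symbol]}[labels]
    for child in children:
        prob *= tree_probability(child)
    return prob

tree = generate("Image", random.Random(0))
print(tree, tree_probability(tree))
```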
Generative Models for Patterns
Examples of images synthesized from generative models by Markov Chain Monte Carlo (MCMC) sampling.
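The slides do not say which models produced the example images. As a minimal stand-in, the sketch below assumes a simple binary Ising Markov random field as the generative model. It synthesizes a sample with Metropolis MCMC. The grid size, coupling strength and number of sweeps are arbitrary illustrative choices.

```python
import math
import random

def sample_ising(size=16, coupling=0.8, sweeps=200, seed=0):
    """Synthesize a binary 'texture' from a toy Ising MRF with Metropolis MCMC.

    P(x) is proportional to exp(coupling * sum of x_i * x_j over 4-neighbour pairs),
    with x_i in {-1, +1} and periodic boundaries.
    """
    rng = random.Random(seed)
    x = [[rng.choice((-1, 1)) for _ in range(size)] for _ in range(size)]
    for _ in range(sweeps):
        for i in range(size):
            for j in range(size):
                # Sum of the four neighbours of pixel (i, j).
                nb = (x[(i - 1) % size][j] + x[(i + 1) % size][j]
                      + x[i][(j - 1) % size] + x[i][(j + 1) % size])
                # Change in log-probability if pixel (i, j) is flipped.
                delta = -2.0 * coupling * x[i][j] * nb
                # Metropolis rule: accept with probability min(1, exp(delta)).
                if delta >= 0 or rng.random() < math.exp(delta):
                    x[i][j] = -x[i][j]
    return x

def show(x):
    """Render the sample as text: '#' for +1, '.' for -1."""
    return "\n".join("".join("#" if v > 0 else "." for v in row) for row in x)

print(show(sample_ising()))
```

A larger `coupling` produces larger uniform patches, and `coupling=0` produces independent noise. The prior's parameters therefore directly shape the synthesized patterns.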
Towards Full Image Parsing
The image genome project (Zhu).
An attempt to determine the grammar of images by parsing images interactively.
Thereby learn the statistical regularities of images, i.e. the priors and the representations.
Database
[Figure: inventory of the annotated image database by Nov. 06, totalling 561,726 images and 3,309,257 POs. Categories include scenes (indoor, outdoor, landscape, seashore), generic objects (animals, plants, furniture, electronics, weapons, vehicles and others), text (Chinese and English), faces (age, pose, expression), activities, aerial images, surveillance video clips, cartoon movie clips, and low-middle level vision (attribute curves, weak boundaries). PO means a parsed object node in the database.]
Back to the Brain
At the top level, compare human performance to Ideal Observers.
Explain human perceptual biases (visual illusions) as strategies that are “statistically effective”.
Brain Architecture
The Bayesian models have interesting analogies to the brain: generative models and analysis by synthesis.
Is this consistent with top-down processing? (Kersten’s talk next week).
Conclusion
Vision is unconscious inference. The Bayesian approach leads to vision as analysis by synthesis: inverting the image generation process.
This requires “sophisticated” priors about the statistics of natural images.
This can be formulated mathematically in terms of Probabilistic Grammars for image formation.
These grammars can be learnt by analysing the “sophisticated” statistics of natural images.