1
SUN: A Model of Visual Salience Using Natural Statistics
Gary Cottrell
Lingyun Zhang, Matthew Tong
Tim Marks, Honghao Shan
Nick Butko, Javier Movellan
Chris Kanan
2
SUN: A Model of Visual Salience Using Natural Statistics
…and its use in object and face recognition
Gary Cottrell
Lingyun Zhang, Matthew Tong
Tim Marks, Honghao Shan
Nick Butko, Javier Movellan
Chris Kanan
3
Collaborators
Matthew H. Tong
Lingyun Zhang
Tim Marks
Honghao Shan
4
Collaborators
Nicholas J. Butko, Javier R. Movellan
5
Collaborators
Chris Kanan
6
Visual Salience
Visual salience is some notion of what is interesting in the world - it captures our attention.
Visual salience is important because it drives a decision we make a couple of hundred thousand times a day - where to look.
7
Visual Salience
Visual salience is some notion of what is interesting in the world - it captures our attention.
But that's kind of vague… The role of Cognitive Science is to make that explicit, by creating a working model of visual salience.
A good way to do that these days is to use probability theory - because as everyone knows, the brain is Bayesian! ;-)
8
Data We Want to Explain
Visual search:
- Search asymmetry: a search for one object among a set of distractors is faster than vice versa.
- Parallel vs. serial search (and the continuum in between): an item "pops out" of the display no matter how many distractors, vs. reaction time increasing with the number of distractors (not emphasized in this talk…)
Eye movements when viewing images and videos.
9
Audience participation!
Look for the unique item.
Clap when you find it.
10
11
12
13
14
15
16
17
18
What just happened?
This phenomenon is called the visual search asymmetry:
- Tilted bars are more easily found among vertical bars than vice versa.
- Backwards "s"s are more easily found among normal "s"s than vice versa.
- Upside-down elephants are more easily found among right-side-up ones than vice versa.
19
Why is there an asymmetry?
There are not too many computational explanations:
- "Prototypes do not pop out"
- "Novelty attracts attention"
Our model of visual salience will naturally account for this.
20
Saliency Maps
Koch and Ullman, 1985: the brain calculates an explicit saliency map of the visual world.
Their definition of saliency relied on center-surround principles: points in the visual scene are salient if they differ from their neighbors.
In more recent years, there have been a multitude of definitions of saliency.
21
Saliency Maps
There are a number of candidates for the salience map: there is at least one in LIP, the lateral intraparietal area, a region of the parietal lobe; also in the frontal eye fields and the superior colliculus… but there may be representations of salience much earlier in the visual pathway - some even suggest in V1.
But we won't be talking about the brain today…
22
Probabilistic Saliency
Our basic assumption: the main goal of the visual system is to find potential targets that are important for survival, such as prey and predators.
The visual system should direct attention to locations in the visual field with a high probability of the target class or classes.
We will lump all of the potential targets together in one random variable, T.
For ease of exposition, we will leave out our location random variable, L.
23
Probabilistic Saliency
Notation: x denotes a point in the visual field.
- T_x: binary variable signifying whether point x belongs to a target class
- F_x: the visual features at point x
The task is to find the point x that maximizes the probability of a target given the features at point x, p(T_x = 1 | F_x).
This quantity is the saliency of a point x. Note: this is what most classifiers compute!
24
Probabilistic Saliency
Taking the log and applying Bayes' Rule results in:
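(The equation on this slide was an image in the original deck; reconstructed here from the three terms discussed on the following slides.) Bayes' rule gives:

$$\log p(T_x = 1 \mid F_x) \;=\; \log p(F_x \mid T_x = 1) \;+\; \log p(T_x = 1) \;-\; \log p(F_x)$$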
25
Probabilistic Saliency
log p(F_x | T_x):
- A probabilistic description of the features of the target
- Provides a form of top-down (endogenous, intrinsic) saliency
- Some similarity to Iconic Search (Rao et al., 1995) and Guided Search (Wolfe, 1989)
26
Probabilistic Saliency
log p(T_x):
- Constant over locations for fixed target classes, so we can drop it.
- Note: this is a stripped-down version of our model, useful for presentations to undergraduates! ;-) - we usually include a location variable as well that encodes the prior probability of targets being in particular locations.
27
Probabilistic SaliencyProbabilistic Saliency
-log p(F-log p(Fxx)) This is called the This is called the self-information self-information of
this variable It says that rare It says that rare feature valuesfeature values attract attract
attentionattention Independent of taskIndependent of task Provides notion of Provides notion of bottom-upbottom-up
(exogenous, extrinsic) saliency(exogenous, extrinsic) saliency
28
Probabilistic Saliency
Now we have two terms:
- Top-down saliency
- Bottom-up saliency
Taken together, this is the pointwise mutual information between the features and the target.
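As a one-line check of that claim, using only standard identities: once the constant log p(T_x = 1) is dropped, the two remaining terms are exactly the pointwise mutual information between features and target:

$$\log p(F_x \mid T_x = 1) - \log p(F_x) \;=\; \log \frac{p(F_x,\, T_x = 1)}{p(F_x)\,p(T_x = 1)} \;=\; \operatorname{pmi}(F_x;\, T_x = 1)$$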
29
Math in Action: Saliency Using "Natural Statistics"
For most of what I will be telling you about next, we use only the -log p(F) term, or bottom-up salience.
Remember, this means rare feature values attract attention.
This is a computational instantiation of the idea that "novelty attracts attention".
30
Math in Action: Saliency Using "Natural Statistics"
Remember, this means rare feature values attract attention.
This means two things:
- We need some features (that have values!). What should we use?
- We need to know when the values are unusual: so we need experience.
31
Math in Action: Saliency Using "Natural Statistics"
Experience, in this case, means collecting statistics of how the features respond to natural images.
We will use two kinds of features:
- Difference of Gaussians (DoGs)
- Independent Components Analysis (ICA) derived features
32
Feature Space 1: Differences of Gaussians
These respond to differences in brightness between the center and the surround. We apply them to three different color channels separately (intensity, red-green, and blue-yellow) at four scales: 12 features total.
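A minimal sketch of such a 12-map filter bank. The opponent-channel formulas, the four scale values, and the 1.6 center/surround ratio are illustrative assumptions, not SUN's exact parameters:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_response(channel, sigma, surround_ratio=1.6):
    """Difference of Gaussians: center response minus surround response."""
    center = gaussian_filter(channel, sigma)
    surround = gaussian_filter(channel, sigma * surround_ratio)
    return center - surround

def dog_features(rgb):
    """12 feature maps: 3 channels (intensity, R-G, B-Y) x 4 scales."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    intensity = (r + g + b) / 3.0
    red_green = r - g                    # red-green opponent channel (assumed form)
    blue_yellow = b - (r + g) / 2.0      # blue-yellow opponent channel (assumed form)
    maps = [dog_response(ch, sigma)
            for ch in (intensity, red_green, blue_yellow)
            for sigma in (1, 2, 4, 8)]   # four spatial scales (assumed values)
    return np.stack(maps)                # shape: (12, H, W)
```

A uniform image produces zero response everywhere, since center and surround averages agree - only local contrast registers.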
33
Feature Space 1: Differences of Gaussians
Now, we run these over Lingyun's vacation photos and record how frequently they respond.
34
Feature Space 2: Independent Components
35
Learning the Distribution
We fit a generalized Gaussian distribution to the histogram of each feature.
$$p(F_i;\, \sigma_i, \theta_i) \;=\; \frac{\theta_i}{2\sigma_i\,\Gamma\!\left(\tfrac{1}{\theta_i}\right)}\, \exp\!\left(-\left|\frac{F_i}{\sigma_i}\right|^{\theta_i}\right)$$

where $F_i$ is the $i$-th filter response, $\theta_i$ is the shape parameter, and $\sigma_i$ is the scale parameter.
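For what it's worth, SciPy's `gennorm` distribution implements exactly this density (its shape parameter plays the role of θ), so fitting one feature's responses and scoring self-information can be sketched as follows. The Laplacian sample is a stand-in for real filter responses:

```python
import numpy as np
from scipy.stats import gennorm

rng = np.random.default_rng(0)
responses = rng.laplace(scale=0.5, size=10_000)  # stand-in for one filter's outputs

# Fit shape (theta) and scale (sigma); fix the location at 0,
# since filter responses are centered.
theta, loc, sigma = gennorm.fit(responses, floc=0)

def self_information(f):
    """-log p(F_i = f): rare (extreme) feature values score high."""
    return -gennorm.logpdf(f, theta, loc=0, scale=sigma)
```

A near-zero (common) response then scores low, while a large response scores high - the "novelty attracts attention" term in numerical form.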
36
The Learned Distribution (DoGs)
- This is P(F) for four different features.
- Note these features are sparse - i.e., their most frequent response is near 0.
- When there is a big response (positive or negative), it is interesting!
37
The Learned Distribution (ICA)
For example, here's a feature:
Here's a frequency count of how often it matches a patch of image:
- Most of the time, it doesn't match at all - a response of "0": BOREDOM!
- Very infrequently, it matches very well - a response of "200": NOVELTY!
38
Bottom-up Saliency
We have to estimate the joint probability from the features.
If all filter responses are independent:

$$-\log p(F) \;=\; -\sum_i \log p(F_i)$$

They're not independent, but we proceed as if they are. (ICA features are "pretty independent".)
Note: no weighting of features is necessary!
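Under that independence assumption, the bottom-up map is just a per-pixel sum of each feature's self-information. A minimal sketch - the `log_pdfs` would come from the generalized-Gaussian fits above, and are arbitrary stand-ins here:

```python
import numpy as np

def bottom_up_saliency(feature_maps, log_pdfs):
    """-log p(F) = -sum_i log p(F_i), evaluated per pixel.

    feature_maps: array of shape (K, H, W), one response map per feature
    log_pdfs:     K callables, each giving log p(F_i = f), learned offline
                  from natural-image statistics
    """
    saliency = np.zeros(feature_maps.shape[1:])
    for fmap, log_p in zip(feature_maps, log_pdfs):
        saliency -= log_p(fmap)  # rare feature values contribute more salience
    return saliency
```

Note there are no learned weights combining the features - the natural-image statistics alone set each feature's contribution.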
39
Qualitative Results: BU Saliency
Original Image | Human fixations | DOG Salience | ICA Salience
40
Qualitative Results: BU Saliency
Original Image | Human fixations | DOG Salience | ICA Salience
41
Qualitative Results: BU Saliency
42
Quantitative Results: BU Saliency
These are quantitative measures of how well the salience map predicts human fixations in static images.
We are best in the KL distance measure, and second best in the ROC measure.
Our main competition is Bruce & Tsotsos, who have essentially the same idea we have, except they compute novelty in the current image.

Model                     KL (SE)           ROC (SE)
Itti et al. (1998)        0.1130 (0.0011)   0.6146 (0.0008)
Bruce & Tsotsos (2006)    0.2029 (0.0017)   0.6727 (0.0008)
Gao & Vasconcelos (2007)  0.1535 (0.0016)   0.6395 (0.0007)
SUN (DoG)                 0.1723 (0.0012)   0.6570 (0.0007)
SUN (ICA)                 0.2097 (0.0016)   0.6682 (0.0008)
43
Related Work
Torralba et al. (2003) derives a similar probabilistic account of saliency, but:
- Uses the current image's statistics
- Emphasizes effects of global features and scene gist
Bruce and Tsotsos (2006) also use self-information as bottom-up saliency:
- Uses the current image's statistics
44
Related Work
The use of the current image's statistics means:
- These models follow a very different principle: they find rare feature values in the current image instead of unusual feature values in general - novelty.
- As we'll see, novelty helps explain several search asymmetries.
- Models using the current image's statistics are unlikely to be neurally computable in the necessary timeframe, as the system must collect statistics from the entire image to calculate local saliency at each point.
45
Search Asymmetry
Our definition of bottom-up saliency leads to a clean explanation of several search asymmetries (Zhang, Tong, and Cottrell, 2007).
All else being equal, targets with uncommon feature values are easier to find.
Examples:
- Treisman and Gormican, 1988 - a tilted bar is more easily found among vertical bars than vice versa.
- Levin, 2000 - for Caucasian subjects, finding an African-American face among Caucasian faces is faster due to its relative rarity in their experience (basketball fans who have to identify the players do not show this effect).
46
Search Asymmetry Results
47
Search Asymmetry Results
48
Top-down Salience in Visual Search
Suppose we actually have a target in mind - e.g., find pictures, or mugs, or people in scenes.
As I mentioned previously, the original (stripped-down) salience model can be implemented as a classifier applied to each point in the image.
When we include location, we get (after a large number of completely unwarranted assumptions):
log saliencex = −logp(F = fx)
Self-information:Bottom-up saliency
1 24 4 34 4+ logp(F = fx |Tx =1)
Log likelihood:Top-down knowledge
of appearance
1 24 4 4 34 4 4+ logp(Tx =1|L =l)
Location prior:Top-down knowledge
of target's location
1 24 44 34 4 4
49
Qualitative Results (mug search)
Where we disagree the most with Torralba et al. (2006)
GIST
SUN
50
Qualitative Results (picture search)
Where we disagree the most with Torralba et al. (2006)
GIST
SUN
51
Qualitative Results (people search)
Where we agree the most with Torralba et al. (2006)
GIST
SUN
52
Qualitative Results (painting search)
This is an example where SUN and humans make the same mistake due to the similar appearance of TVs and pictures (the black square in the upper left is a TV!).
Image Humans SUN
53
Quantitative Results
Area Under the ROC Curve (AUC) gives basically identical results.
54
Saliency of Dynamic Scenes
We created spatiotemporal filters.
Temporal filters: difference of exponentials (DoE)
- Highly active if there is change
- If features stay constant, the response goes to zero
- Resembles responses of some neurons (cells in the LGN)
- Easy to compute
Convolve with spatial filters to create spatiotemporal filters.
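One plausible reading of the DoE filter is the difference between a fast and a slow exponential average over frames; a sketch under that assumption (the time constants are illustrative, not the deck's values):

```python
import numpy as np

def doe_response(frames, tau_fast=1.0, tau_slow=4.0):
    """Difference-of-exponentials temporal filter, run causally over frames.

    Implemented as the difference of two exponential moving averages with
    time constants in frames (illustrative assumptions).
    """
    a_fast = 1.0 - np.exp(-1.0 / tau_fast)
    a_slow = 1.0 - np.exp(-1.0 / tau_slow)
    fast = np.zeros_like(frames[0], dtype=float)
    slow = np.zeros_like(frames[0], dtype=float)
    out = []
    for frame in frames:
        fast += a_fast * (frame - fast)  # fast average tracks the input quickly
        slow += a_slow * (frame - slow)  # slow average lags behind
        out.append(fast - slow)          # large on change, decays to 0 when static
    return np.stack(out)
```

This matches the properties on the slide: a transient burst when the input changes, a response that decays toward zero while it stays constant, and only two running averages to maintain per pixel.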
55
Saliency of Dynamic Scenes
Bayesian Saliency (Itti and Baldi, 2006):
- Saliency is Bayesian "surprise" (different from self-information).
- Maintain a distribution over a set of models attempting to explain the data, P(M).
- As new data D comes in, calculate the saliency of a point as the degree to which it makes you alter your models. Total surprise: S(D, M) = KL(P(M|D); P(M))
- Better predictor than standard spatial salience.
- Much more complicated (~500,000 different distributions being modeled) than SUN dynamic saliency (days to run vs. hours or real-time).
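For contrast with SUN's self-information, here is a toy sketch of Bayesian surprise over a discrete set of candidate models (a drastically simplified version of the Itti and Baldi formulation):

```python
import numpy as np

def surprise(prior, likelihoods):
    """Bayesian surprise S(D, M) = KL(P(M|D) || P(M)) for a discrete model set.

    prior:       P(M), probabilities over candidate models (sums to 1)
    likelihoods: P(D|M), likelihood of the newly observed data under each model
    """
    posterior = prior * likelihoods
    posterior = posterior / posterior.sum()
    return float(np.sum(posterior * np.log(posterior / prior)))
```

Data that fits all models equally leaves the posterior unchanged and yields zero surprise; data that favors one model shifts the posterior and yields positive surprise. Note this depends on a belief update, not on how rare the data is in general - which is the conceptual difference from SUN.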
56
Saliency of Dynamic Scenes
In the process of evaluating and comparing, we discovered how much the center bias of human fixations was affecting results.
Most human fixations are towards the center of the screen (Reinagel, 1999).
Accumulated human fixations from three experiments
57
Saliency of Dynamic Scenes
Results varied widely depending on how edges were handled: how is the invalid portion of the convolution handled?
Accumulated saliency of three models
58
Saliency of Dynamic Scenes
Initial results
59
Measures of Dynamic Saliency
Typically, the algorithm is compared to the human fixations within a frame:
- I.e., how salient is the human-fixated point according to the model versus all other points in the frame?
- This measure is subject to the center bias - if the borders are down-weighted, the score goes up.
60
Measures of Dynamic Saliency
An alternative is to compare the salience of the human-fixated point to the same point across frames:
- Underestimates performance, since locations are often genuinely more salient at all time points (e.g., an anchor's face during a news broadcast).
- Gives any static measure (e.g., a centered Gaussian) a baseline score of 0.
- This is equivalent to sampling from the distribution of human fixations, rather than uniformly.
On this set of measures, we perform comparably with Itti and Baldi (2006).
61
Saliency of Dynamic Scenes
Results using non-center-biased metrics on the human fixation data on videos from Itti (2005) - 4 subjects/movie, 50 movies, ~25 minutes of video.
62
Movies…
63
64
65
66
Demo…
67
Summary of this part of the talk
- It is a good idea to start from first principles.
- Often the simplest model is best.
- Our model of salience rocks:
  - It does bottom-up
  - It does top-down
  - It does video (fast!)
  - It naturally accounts for search asymmetries
70
Christopher Kanan
Garrison Cottrell
71
Now we have a model of salience - but what can it be used for?
Here, we show that we can use it to recognize objects.
Christopher Kanan
Motivation
72
One reason why this might be a good idea…
Our attention is automatically drawn to interesting regions in images.
Our salience algorithm is automatically drawn to interesting regions in images.
These are useful locations for discriminating one object (face, butterfly) from another.
73
Main Idea
Training Phase (learning object appearances):
- Use the salience map to decide where to look. (We use the ICA salience map.)
- Memorize these samples of the image, with labels (Bob, Carol, Ted, or Alice). (We store the ICA feature values.)
Christopher Kanan
74
Main Idea
Testing Phase (recognizing objects we have learned):
- Now, given a new face, use the salience map to decide where to look.
- Compare new image samples to stored ones - the closest ones in memory get to vote for their label.
Christopher Kanan
75
Stored memories of Bob | Stored memories of Alice | New fragments
Result: 7 votes for Alice, only 3 for Bob. It's Alice!
76
Voting
The voting process is actually based on Bayesian updating (and the Naïve Bayes assumption).
The size of the vote depends on the distance from the stored sample, using kernel density estimation.
Hence NIMBLE: NIM with Bayesian Likelihood Estimation.
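A toy sketch of that vote - the Gaussian kernel, the bandwidth, and the tiny density floor are illustrative assumptions; NIMBLE's actual density model is richer:

```python
import numpy as np

def classify(fragments, memory, bandwidth=1.0):
    """Naive-Bayes vote over fixation fragments with a Gaussian kernel.

    fragments: (n, d) feature vectors sampled at fixations of a new image
    memory:    dict label -> (m, d) array of stored feature vectors
    Accumulates per-class log-likelihoods across fixations (the naive
    Bayes step); each fragment's class likelihood is a kernel density
    estimate over that class's stored samples.
    """
    log_post = {label: 0.0 for label in memory}
    for f in fragments:
        for label, stored in memory.items():
            sq_dist = np.sum((stored - f) ** 2, axis=1)
            density = np.mean(np.exp(-sq_dist / (2 * bandwidth ** 2)))
            log_post[label] += np.log(density + 1e-12)  # floor avoids log(0)
    return max(log_post, key=log_post.get)
```

Close stored samples contribute large kernel values - big votes - while distant ones contribute almost nothing, matching the "closest memories vote" description on the previous slide.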
77
Overview of the System
The ICA features do double duty:
- They are combined to make the salience map - which is used to decide where to look.
- They are stored to represent the object at that location.
78
NIMBLE vs. Computer Vision
Compare this to standard computer vision systems: one pass over the image, and global features.
Image → Global Features → Classifier → Decision
79
80
Belief After 1 Fixation | Belief After 10 Fixations
81
Robust Vision
Human vision works in multiple environments - our basic features (neurons!) don't change from one problem to the next.
We tune our parameters so that the system works well on Bird and Butterfly datasets - and then apply the system unchanged to faces, flowers, and objects.
This is very different from standard computer vision systems, which are tuned to a particular dataset.
Christopher Kanan
82
Caltech 101: 101 different categories
AR dataset: 120 different people with different lighting, expression, and accessories
83
Flowers: 102 Different Flower Species
Christopher Kanan
84
~7 fixations required to achieve at least 90% of maximum performance
Christopher Kanan
85
So, we created a simple cognitive model that uses simulated fixations to recognize things. But it isn't that complicated.
How does it compare to approaches in computer vision?
86
Caveats:
- As of mid-2010.
- Only comparing to single-feature-type approaches (no "Multiple Kernel Learning" (MKL) approaches).
- Still superior to MKL with very few training examples per category.
87
[Chart: performance vs. number of training examples (1, 5, 15, 30)]
88
[Chart: performance vs. number of training examples (1, 2, 3, 6, 8)]
89
90
- More neurally and behaviorally relevant gaze control and fixation integration: people don't randomly sample images.
- A foveated retina.
- Comparison with human eye movement data during recognition/classification of faces, objects, etc.
91
A fixation-based approach can work well for image classification.
Fixation-based models can achieve, and even exceed, some of the best models in computer vision.
…Especially when you don't have a lot of training images.
Christopher Kanan
92
Software and Paper Available at www.chriskanan.com
[email protected]
This work was supported by the NSF (grant #SBE-0542013) to the Temporal Dynamics of Learning Center.
93
Thanks!