CONTEXT AS SUPERVISORY SIGNAL: DISCOVERING OBJECTS WITH PREDICTABLE CONTEXT
Carl Doersch, Abhinav Gupta, Alexei Efros
Unsupervised Object Discovery
• Children learn to see without millions of labels
• Is there a cue hidden in the data that we can use to learn better representations?
How hard is unsupervised object discovery?
• In state-of-the-art literature it is still commonplace to select 20 categories from Caltech 101¹
• Le et al. got into the New York Times for spending 1M+ CPU hours and reporting just 3 objects²

¹ Faktor & Irani, “Clustering by Composition” – Unsupervised Discovery of Image Categories, ECCV 2012
² Le et al., Building High-Level Features Using Large Scale Unsupervised Learning, ICML 2012
Object Discovery = Patch Clustering
Clustering algorithms
Clustering algorithms?
Clustering as Prediction
[Figure: toy 2D clusters; given a point’s cluster, its Y value is here; prediction vs. actual]
What cue is used for clustering here?
• The datapoints are redundant! Within a cluster, knowing one coordinate tells you the other
• Images are redundant too!
• Contribution #1 (of 3): use patch-context redundancy for clustering
More rigorously
• Unlike standard clustering, we measure the affinity of a whole cluster rather than of pairs
• Datapoints x can be divided into x = [p, c] (patch p, context c)
• Affinity is defined by how ‘learnable’ the function c = F(p) is, or P(c|p)
• Measure how well the function has been learned via leave-one-out cross-validation
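A minimal sketch of this learnability score, with a simple least-squares predictor standing in for the paper’s context predictor (all names and shapes here are illustrative, not from the paper): fit c = F(p) on every cluster member except one, score the held-out prediction, and call the cluster tight when held-out errors are low.

```python
import numpy as np

def cluster_affinity(P, C):
    """Leave-one-out 'learnability' of a candidate cluster: for each
    held-out datapoint x_i = [p_i, c_i], fit a predictor c = F(p) on
    the remaining points and score the held-out prediction error.
    Affinity is the negative mean error (higher = more learnable)."""
    n = len(P)
    errors = []
    for i in range(n):
        mask = np.arange(n) != i
        # Least-squares linear map (with bias) from patch to context.
        A = np.hstack([P[mask], np.ones((n - 1, 1))])
        W, *_ = np.linalg.lstsq(A, C[mask], rcond=None)
        pred = np.append(P[i], 1.0) @ W
        errors.append(float(np.sum((pred - C[i]) ** 2)))
    return -float(np.mean(errors))

rng = np.random.default_rng(0)
P = rng.normal(size=(20, 1))
good = cluster_affinity(P, 2 * P + 1)                 # c is a function of p
bad = cluster_affinity(P, rng.normal(size=(20, 1)))   # c independent of p
```

A cluster whose context is a deterministic function of the patch scores near zero error; one whose context is unrelated to the patch scores poorly, which is exactly the redundancy cue above.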
Previous whispers of this idea
• Weakly supervised methods
• Probabilistic clustering: choose clusters s.t. the dataset can be compressed
• Autoencoders & RBMs (Olshausen & Field 1997)
When is a prediction “good enough”?
• Image distance metrics are unreliable
[Figure: input patches and their nearest neighbors, with labeled distances 2.59e-4 (“min”) and 1.22e-4 (“max”)]
It’s even worse during prediction
• Horizontal gratings are very close together in HOG feature space, and are easy to predict
• Cats have more complex shapes that are farther apart and harder to predict
Our task is more like this:
[Figure: two conditional distributions, P1(X|Y) and P2(X|Y), evaluated at the same Y value]
Contribution #2 (of 3): Cluster Membership as Hypothesis Testing
What are the hypotheses?
• thing hypothesis
– The patch is a member of the proposed cluster
– Context is best predicted by transferring appearance from other cluster members
• stuff hypothesis
– Patch is ‘miscellaneous’
– The best we can do is model the context as clutter or texture, via low-level image statistics
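The membership decision itself can be sketched as a likelihood-ratio test (function names and the zero threshold are illustrative assumptions, not the paper’s exact procedure): sum the per-chunk log-likelihood gap between the two hypotheses and accept the patch when the thing model explains the context better.

```python
import numpy as np

def log_likelihood_ratio(thing_chunk_ll, stuff_chunk_ll):
    """Summed log-likelihood ratio over context chunks: positive means
    the 'thing' hypothesis explains the context better than 'stuff'."""
    return float(np.sum(np.asarray(thing_chunk_ll) - np.asarray(stuff_chunk_ll)))

def is_cluster_member(thing_chunk_ll, stuff_chunk_ll, threshold=0.0):
    """Accept the patch into the cluster iff thing beats stuff by at
    least `threshold` (a hypothetical cutoff)."""
    return log_likelihood_ratio(thing_chunk_ll, stuff_chunk_ll) > threshold
```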
Our Hypotheses
[Figure: for a true region, the thing model transfers appearance from cluster members via a predicted correspondence between the condition region (patch) and the prediction region (context), yielding high likelihood; the stuff model’s low-level prediction yields low likelihood]
Real predictions aren’t so easy!
What to do?
• Generate multiple predictions from different images, and take the best
• Take bits and pieces from many images
• Still not enough? Generate more with warping
• With arbitrarily small pieces and lots of warping, you can match anything!
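A toy version of “take bits and pieces from many images” (array shapes and the scoring rule are hypothetical; the real system scores candidates under the factorized likelihood rather than against the full answer at once): stitch the prediction chunk-by-chunk from whichever candidate image fits each chunk best.

```python
import numpy as np

def stitch_best_prediction(candidates, actual):
    """candidates: (m, k, d) array -- m candidate context predictions
    from m source images, each split into k chunks of dimension d.
    For every chunk, keep the candidate with the lowest squared error,
    i.e. assemble the final prediction from pieces of many images."""
    err = np.sum((candidates - actual[None]) ** 2, axis=-1)  # (m, k)
    best = np.argmin(err, axis=0)                            # source per chunk
    cols = np.arange(candidates.shape[1])
    return candidates[best, cols], err[best, cols]

# Two candidates, each right about a different chunk: stitching wins.
candidates = np.array([[[0.0], [10.0]],
                       [[10.0], [0.0]]])
actual = np.array([[0.0], [0.0]])
stitched, chunk_err = stitch_best_prediction(candidates, actual)
```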
Generative Image Models
• Long history in computer vision
• Come up with a distribution P(c|p), without looking at c
• Distribution is extremely complicated due to dependencies between features
Hinton et al. 1995; Weber et al. 2000; Fergus et al. 2003; Fei-Fei et al. 2007; Sudderth et al. 2005; Wang et al. 2006
Generative Image Models
[Diagram: Object → Part Locations → Features]
Inference in Generative Models
[Diagram: Object → Part Locations → Features, with inference running from observed features back up the model]
Contribution #3: Likelihood Factorization
• Let c be context and p be a patch
• Break c into chunks c1…ck (e.g. HOG cells)
• Now the problem boils down to estimating P(c|p) = P(c1|p) · P(c2|c1,p) · … · P(ck|c1…ck−1,p), a series of per-chunk conditionals
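The factorized likelihood is then just a sum of per-chunk conditional log-densities. A sketch, with a made-up Gaussian chunk model standing in for the learned per-chunk predictors (everything below the function is a toy assumption):

```python
import math

def context_log_likelihood(p, chunks, chunk_models):
    """Chain-rule factorization of the context likelihood:
        log P(c | p) = sum_i log P(c_i | p, c_1, ..., c_{i-1}).
    chunk_models[i] is any conditional log-density taking the patch,
    the preceding chunks, and the chunk value."""
    return sum(model(p, chunks[:i], c)
               for i, (model, c) in enumerate(zip(chunk_models, chunks)))

# Hypothetical chunk model: Gaussian around the mean of the patch,
# ignoring the preceding chunks.
def gauss_chunk(p, prev_chunks, c):
    mu = sum(p) / len(p)
    return -0.5 * (c - mu) ** 2 - 0.5 * math.log(2 * math.pi)

ll = context_log_likelihood([1.0, 3.0], [2.0, 2.5], [gauss_chunk, gauss_chunk])
```

Because each factor conditions on the true preceding chunks, every term is an ordinary discriminative prediction problem, which is what the next slide’s “good properties” rely on.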
Surprisingly many good properties
• This is not an approximation
• Broke a hard generative problem into a series of easy discriminative ones
• Latent variable inference can use maximum likelihood: it’s okay to overfit to the condition region
• ci’s do not need to be integrated out
• Likelihoods can be compared across models
Recap of Contributions
• Clustering as prediction
• Statistical hypothesis testing of stuff vs. thing to determine cluster membership
• Factorizing the computation of stuff and thing data likelihoods
Stuff Model
[Diagram: given a condition region, find nearest neighbors for this region and transfer their prediction regions]
Faster Stuff Model
[Diagram: replace ~1,000,000 random image patches with a ~5,000-element dictionary (a GMM of patches); the conditional over the prediction region, given the condition region specified in the dictionary, is obtained with marginalization over dictionary elements]
Thing Model
[Diagram: correspondence between a query image and cluster images i, mapping the condition region to the prediction region]
Finite and Occluded ‘Things’
• If the thing model believes it can’t predict well, ‘mimic’ the background model
– When the thing model believes the prediction region corresponds to regions previously predicted poorly
– When the thing model has been predicting badly nearby (handles occlusion!)
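A minimal sketch of the fallback rule (the moving-average test and the threshold are assumptions for illustration, not the paper’s exact criterion):

```python
def chunk_log_likelihood(thing_ll, stuff_ll, recent_thing_lls, floor=-5.0):
    """If the thing model has been predicting badly on nearby chunks
    (mean recent log-likelihood below the hypothetical `floor`), mimic
    the stuff model for this chunk instead of trusting the thing
    prediction -- handles occlusion and the finite extent of objects."""
    if recent_thing_lls and sum(recent_thing_lls) / len(recent_thing_lls) < floor:
        return stuff_ll
    return thing_ll
```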
Clusters discovered on 6 Pascal Categories
[Plot: purity vs. cumulative coverage]
Visual Words: .63 (.37)
Russel et al.: .66 (.38)
HOG Kmeans: .70 (.40)
Our Approach: .83 (.48)
Singh et al.: .83 (.47)
Results: Pascal 2011, unsupervised
[Montage: initial vs. final cluster members at ranks 1, 2, 5, 6]
Results: Pascal 2011, unsupervised
[Montage: initial vs. final cluster members at ranks 19, 20, 31, 47]
Results: Pascal 2011, unsupervised
[Montage: initial vs. final cluster members at ranks 153, 200, 220, 231]
Errors
Unsupervised keypoint transfer
[Figure: car keypoints transferred between cluster members: left/right front and back wheel centers, left/right headlights, left/right mirrors, the four roof corners, and the left taillight]
Keypoint Precision-Recall
[Plot: precision vs. recall (recall up to 0.15) for keypoint transfer; Ours vs. Exemplar LDA]