Finding the object despite possibly large changes in scale, viewpoint, lighting and partial occlusion requires an invariant description
Viewpoint, Scale, Lighting, Occlusion
Difficulties
• Very large image collections: need for efficient indexing
– Flickr has 2 billion photographs, more than 1 million added daily
– Facebook has 15 billion images (~27 million added daily)
– Large personal collections
– Video collections, e.g., YouTube
Search photos on the web for particular places
Find these landmarks ...in these images and 1M more
Applications
• Take a picture of a product or advertisement, find relevant information on the web
[Pixee – Milpix]
K. Grauman, B. Leibe
• Sony Aibo – Robotics
– Recognize docking station
– Communicate with visual cards
– Place recognition
– Loop closure in SLAM
Slide credit: David Lowe
Instance-level recognition: Approach
• Extraction of invariant image descriptors
• Matching descriptors between images
– Matching of the query images to all images of a database
– Speed-up by efficient indexing structures
• Geometric verification
– Verification of spatial consistency for a short list
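The matching step can be illustrated with a minimal sketch. It assumes toy 2-D descriptors, brute-force nearest-neighbor search, and a hypothetical distance threshold `max_dist`; none of these come from the slides, and a real system would use an indexing structure instead of the exhaustive loop. The function names are illustrative only. The output is the short list that would then go to geometric verification.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def count_matches(query_descs, image_descs, max_dist=0.5):
    """Count query descriptors whose nearest neighbor in image_descs
    lies within max_dist (an assumed, hand-picked threshold)."""
    matches = 0
    for q in query_descs:
        nearest = min(euclidean(q, d) for d in image_descs)
        if nearest <= max_dist:
            matches += 1
    return matches

def short_list(query_descs, database, top_k=2, max_dist=0.5):
    """Score every database image by its match count and return the
    top_k image names (the short list) together with all scores."""
    scores = {name: count_matches(query_descs, descs, max_dist)
              for name, descs in database.items()}
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    return ranked[:top_k], scores

# Toy data: real descriptors are much higher-dimensional.
database = {
    "img_a": [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    "img_b": [[5.0, 5.0], [6.0, 6.0]],
    "img_c": [[0.1, 0.0], [9.0, 9.0]],
}
query = [[0.0, 0.1], [1.0, 0.9], [2.1, 2.0]]
candidates, scores = short_list(query, database)
print(candidates, scores)
```

Only the images in `candidates` would be checked for spatial consistency, which keeps the expensive verification step off the bulk of the database.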
This class
• Lecture 2: Local invariant features
– Student presentation: scale and affine invariant interest point detectors
• Lecture 3: Instance-level recognition: efficient search
– Student presentation: scalable recognition with a vocabulary tree
• Image classification: assigning a label to the image
Tasks
Car: present, Cow: present, Bike: not present, Horse: not present, …
• Object localization: define the location and the category
[Figure: objects marked with their location and category (Car, Cow)]
Visual object recognition
Visual recognition - Objectives
Difficulties: within object variations
Variability: Camera position, Illumination, Internal parameters
Within-object variations
Visual category recognition
• Robust image description
– Appropriate descriptors for objects and categories
• Statistical modeling and machine learning for vision
– Selection and adaptation of existing techniques
Why machine learning?
• Early approaches: simple features + handcrafted models
• Can handle only a few images, simple tasks
L. G. Roberts, Machine Perception of Three Dimensional Solids,
Ph.D. thesis, MIT Department of Electrical Engineering, 1963.
• Early approaches: manual programming of rules
• Tedious, limited, and does not take the data into account
Y. Ohta, T. Kanade, and T. Sakai, “An Analysis System for Scenes Containing objects with Substructures,” International Joint Conference on Pattern Recognition, 1978.
Why machine learning?
• Today: lots of data, complex tasks
– Internet images, personal photo albums
– Movies, news, sports
– Surveillance and security
– Medical and scientific images
• Instead of trying to encode rules directly, learn them from examples of inputs and desired outputs
Types of learning problems
• Supervised
– Classification
– Regression
• Unsupervised
• Semi-supervised
• Reinforcement learning
• Active learning
• …
Image classification: Approach
• Excellent results in the presence of background clutter
bikes books building cars people phones trees
Bag-of-features for image classification
• Extract regions
• Compute descriptors
• Find clusters and frequencies
• Compute distance matrix
• Classification (SVM)
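As a rough sketch of the "find clusters and frequencies" and "compute distance matrix" steps: the code below assumes the cluster centers (the visual vocabulary) have already been found, for example by k-means, and uses the chi-square distance between histograms. The chi-square choice, the toy 2-D descriptors, and all function names are assumptions made for illustration, not details taken from the slides. The resulting matrix is what a kernel classifier such as an SVM would consume.

```python
def nearest_center(desc, centers):
    """Index of the cluster center closest to desc (squared Euclidean)."""
    dists = [sum((x - c) ** 2 for x, c in zip(desc, center))
             for center in centers]
    return dists.index(min(dists))

def bof_histogram(descs, centers):
    """Normalized frequency of each visual word among an image's descriptors."""
    counts = [0] * len(centers)
    for d in descs:
        counts[nearest_center(d, centers)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(centers)

def chi2_distance(h1, h2):
    """Chi-square distance between two histograms (assumed choice of metric)."""
    return 0.5 * sum((a - b) ** 2 / (a + b)
                     for a, b in zip(h1, h2) if a + b > 0)

def distance_matrix(hists):
    """Pairwise chi-square distances between all image histograms."""
    return [[chi2_distance(hi, hj) for hj in hists] for hi in hists]

# Toy vocabulary of two visual words and two images' descriptors.
centers = [[0.0, 0.0], [10.0, 10.0]]
image_1 = [[0, 1], [1, 0], [9, 10], [10, 9]]
image_2 = [[0, 0], [1, 1], [0, 1], [10, 10]]
hists = [bof_histogram(image_1, centers), bof_histogram(image_2, centers)]
D = distance_matrix(hists)
print(hists, D)
```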
Spatial pyramids: perform matching in 2D image space
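One way to picture the idea is the sketch below: histograms of visual words are computed over the whole image and again over a 2x2 grid of cells, then concatenated. It is a simplified illustration that assumes keypoint positions and visual word indices are given, and it omits the per-level weighting used in spatial pyramid matching. The function name and the toy data are hypothetical.

```python
def spatial_pyramid(points, words, vocab_size, width, height, levels=2):
    """Concatenate visual-word histograms over grids of 1x1, 2x2, ... cells.

    points: list of (x, y) keypoint positions
    words:  visual word index for each keypoint
    """
    features = []
    for level in range(levels):
        cells = 2 ** level  # cells per side at this level
        hist = [[0] * vocab_size for _ in range(cells * cells)]
        for (x, y), w in zip(points, words):
            # Clamp so that points on the right/bottom border stay in range.
            cx = min(int(x * cells / width), cells - 1)
            cy = min(int(y * cells / height), cells - 1)
            hist[cy * cells + cx][w] += 1
        for cell in hist:
            features.extend(cell)
    return features

# Two toy images with the same words but different spatial layouts.
points = [(1, 1), (8, 1), (1, 8), (8, 8)]
layout_a = spatial_pyramid(points, [0, 1, 1, 0], vocab_size=2, width=10, height=10)
layout_b = spatial_pyramid(points, [1, 0, 0, 1], vocab_size=2, width=10, height=10)
print(layout_a)
print(layout_b)
```

Both layouts share the same level-0 histogram (the plain bag of features) and differ only in the 2x2 cells, which is the spatial information a plain bag of features discards.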
This class
• Lecture 4: Bag-of-features models for image classification
– Student presentation: beyond bags of features: spatial pyramids
Object category localization
• Method with sliding windows (each window is classified as containing or not containing the targeted object)
• Learn a classifier by providing positive and negative examples
Localization approach
Histogram of oriented image gradients as image descriptor
SVM as classifier, importance weighted descriptors
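The sliding-window scheme can be sketched as follows. To keep the example short, the window descriptor is a stand-in (mean intensity) rather than a histogram of oriented gradients, and the classifier is a hand-set linear score rather than a trained SVM; the weight, bias, and function names are assumptions for illustration only.

```python
def sliding_windows(height, width, win, stride):
    """Yield the (top, left) corner of every win x win window."""
    for top in range(0, height - win + 1, stride):
        for left in range(0, width - win + 1, stride):
            yield top, left

def window_feature(image, top, left, win):
    """Stand-in descriptor: mean intensity of the window.
    A real detector would compute a HOG descriptor here."""
    vals = [image[r][c]
            for r in range(top, top + win)
            for c in range(left, left + win)]
    return sum(vals) / len(vals)

def detect(image, win, stride, weight, bias):
    """Score each window with a linear classifier; keep positive scores,
    sorted from highest to lowest."""
    detections = []
    for top, left in sliding_windows(len(image), len(image[0]), win, stride):
        score = weight * window_feature(image, top, left, win) + bias
        if score > 0:
            detections.append((top, left, score))
    return sorted(detections, key=lambda d: d[2], reverse=True)

# Toy 4x4 image with one bright 2x2 "object".
image = [
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
detections = detect(image, win=2, stride=1, weight=1.0, bias=-0.75)
print(detections)
```

In practice the weight vector and bias would be learned from the positive and negative example windows, and the scan would be repeated over several image scales.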
This class
• Lecture 5: Category-level object localization
– Student presentation: object detection with discriminatively trained part based models
This class - schedule
• Session 1, October 1 2010
– Cordelia Schmid: Introduction
– Jakob Verbeek: Introduction to Machine Learning
• Session 2, December 3 2010
– Jakob Verbeek: Clustering with k-means, mixture of Gaussians
– Cordelia Schmid: Local invariant features
– Student presentation 1: Scale and affine invariant interest point detectors, Mikolajczyk and Schmid, IJCV 2004.
• Session 3, December 10 2010
– Cordelia Schmid: Instance-level recognition: efficient search
– Student presentation 2: Scalable recognition with a vocabulary tree, Nistér and Stewénius, CVPR 2006.
Plan for the course
• Session 4, December 17 2010
– Jakob Verbeek: Mixture of Gaussians, EM algorithm, Fisher Vector image representation
– Cordelia Schmid: Bag-of-features models for category-level classification
– Student presentation 3: Beyond bags of features: spatial pyramid matching for recognizing natural scene categories, Lazebnik, Schmid and Ponce, CVPR 2006.
• Session 5, January 7 2011
– Jakob Verbeek: Classification 1: generative and non-parametric methods
– Student presentation 4: Large-scale image retrieval with compressed Fisher vectors, Perronnin, Liu, Sanchez and Poirier, CVPR 2010.
– Cordelia Schmid: Category level localization: Sliding window and shape model
– Student presentation 5: Object detection with discriminatively trained part based models, Felzenszwalb, Girshick, McAllester and Ramanan, PAMI 2010.
• Session 6, January 14 2011
– Jakob Verbeek: Classification 2: discriminative models
– Student presentation 6: TagProp: discriminative metric learning in nearest neighbor models for image auto-annotation, Guillaumin, Mensink, Verbeek and Schmid, ICCV 2009.
– Student presentation 7: IMG2GPS: estimating geographic information from a single image, Hays and Efros, CVPR 2008.
This class
• Class web page at
– http://lear.inrialpes.fr/people/verbeek/MLCR.10.11
– Slides available after class
• Student presentations
– 20 minutes oral presentation with slides, 5 minutes questions
– Two students present one paper together
• Grades
– 50% final exam
– 25% presentation
– 25% short quiz after each presentation