64
10/31/13 Object Recognition and Augmented Reality Computational Photography Derek Hoiem, University Dali, Swans Reflecting Elephants

10/31/13 Object Recognition and Augmented Reality Computational Photography Derek Hoiem, University of Illinois Dali, Swans Reflecting Elephants

Embed Size (px)

Citation preview

  • Slide 1
  • 10/31/13 Object Recognition and Augmented Reality Computational Photography Derek Hoiem, University of Illinois Dali, Swans Reflecting Elephants
  • Slide 2
  • Last class: Image Stitching 1.Detect keypoints 2.Match keypoints 3.Use RANSAC to estimate homography 4.Project onto a surface and blend
  • Slide 3
  • Project 5: due Nov 17 1.Align frames to a central frame (40 pts) 2.Identify background pixels on panorama (20 pts) 3.Map background pixels back to videos (20 pts) 4.Identify foreground pixels (10 pts) 5.Insert a foreground object (10 pts)
  • Slide 4
  • Aligning frames 1 3 2
  • Slide 5
  • Background identification
  • Slide 6
  • mean median mode Idea 1: take average (mean) pixel - Not bad but averages over outliers Idea 2: take mode (most common) pixel - Can ignore outliers if background shows more than any other single color Idea 3: take median pixel - Can ignore outliers if background shows at least 50% of time, or outliers tend to be well-distributed
  • Slide 7
  • Identifying foreground 1.Simple method: foreground pixels are some distance away from background 2.Another method: count times that each color is observed and assign unlikely colors to foreground Can work for repetitive motion, like a tree swaying in the breeze
  • Slide 8
  • Augmented reality Insert and/or interact with object in scene Project by Karen Liu Project by Karen Liu Responsive characters in AR Responsive characters in AR KinectFusion KinectFusion Overlay information on a display Tagging reality Tagging reality Layar Layar Google goggles Google goggles T2 video (13:23)
  • Slide 9
  • Adding fake objects to real video Approach 1.Recognize and/or track points that give you a coordinate frame 2.Apply homography (flat texture) or perspective projection (3D model) to put object into scene Main challenge: dealing with lighting, shadows, occlusion
  • Slide 10
  • Information overlay Approach 1.Recognize object that youve seen before 2.Retrieve info and overlay Main challenge: how to match reliably and efficiently?
  • Slide 11
  • Today How to quickly find images in a large database that match a given image region?
  • Slide 12
  • Lets start with interest points Query Database Compute interest points (or keypoints) for every image in the database and the query
  • Slide 13
  • Simple idea See how many keypoints are close to keypoints in each other image Lots of Matches Few or No Matches But this will be really, really slow!
  • Slide 14
  • Key idea 1: Visual Words Cluster the keypoint descriptors
  • Slide 15
  • Key idea 1: Visual Words K-means algorithm Illustration: http://en.wikipedia.org/wiki/K-means_clusteringhttp://en.wikipedia.org/wiki/K-means_clustering 1. Randomly select K centers 2. Assign each point to nearest center 3. Compute new center (mean) for each cluster
  • Slide 16
  • Key idea 1: Visual Words K-means algorithm Illustration: http://en.wikipedia.org/wiki/K-means_clusteringhttp://en.wikipedia.org/wiki/K-means_clustering 1. Randomly select K centers 2. Assign each point to nearest center 3. Compute new center (mean) for each cluster Back to 2
  • Slide 17
  • Kmeans: Matlab code function C = kmeans(X, K) % Initialize cluster centers to be randomly sampled points [N, d] = size(X); rp = randperm(N); C = X(rp(1:K), :); lastAssignment = zeros(N, 1); while true % Assign each point to nearest cluster center bestAssignment = zeros(N, 1); mindist = Inf*ones(N, 1); for k = 1:K for n = 1:N dist = sum((X(n, :)-C(k, :)).^2); if dist < mindist(n) mindist(n) = dist; bestAssignment(n) = k; end % break if assignment is unchanged if all(bestAssignment==lastAssignment), break; end; lastAssignment = bestAssignmnet; % Assign each cluster center to mean of points within it for k = 1:K C(k, :) = mean(X(bestAssignment==k, :)); end
  • Slide 18
  • K-means Demo http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html
  • Slide 19
  • Key idea 1: Visual Words Cluster the keypoint descriptors Assign each descriptor to a cluster number What does this buy us? Each descriptor was 128 dimensional floating point, now is 1 integer (easy to match!) Is there a catch? Need a lot of clusters (e.g., 1 million) if we want points in the same cluster to be very similar Points that really are similar might end up in different clusters
  • Slide 20
  • Key idea 1: Visual Words Cluster the keypoint descriptors Assign each descriptor to a cluster number Represent an image region with a count of these visual words
  • Slide 21
  • Key idea 1: Visual Words Cluster the keypoint descriptors Assign each descriptor to a cluster number Represent an image region with a count of these visual words An image is a good match if it has a lot of the same visual words as the query region
  • Slide 22
  • Nave matching is still too slow Imagine matching 1,000,000 images, each with 1,000 keypoints
  • Slide 23
  • Key Idea 2: Inverse document file Like a book index: keep a list of all the words (keypoints) and all the pages (images) that contain them. Rank database images based on tf-idf measure. tf-idf: Term Frequency Inverse Document Frequency # words in document # times word appears in document # documents # documents that contain the word
  • Slide 24
  • Fast visual search Scalable Recognition with a Vocabulary Tree, Nister and Stewenius, CVPR 2006. Video Google, Sivic and Zisserman, ICCV 2003
  • Slide 25
  • Slide Slide Credit: Nister 110,000,000 Images in 5.8 Seconds
  • Slide 26
  • Slide Slide Credit: Nister
  • Slide 27
  • Slide Slide Credit: Nister
  • Slide 28
  • Slide
  • Slide 29
  • Key Ideas Visual Words Cluster descriptors (e.g., K-means) Inverse document file Quick lookup of files given keypoints tf-idf: Term Frequency Inverse Document Frequency # words in document # times word appears in document # documents # documents that contain the word
  • Slide 30
  • Recognition with K-tree Following slides by David Nister (CVPR 2006)
  • Slide 31
  • Slide 32
  • Slide 33
  • Slide 34
  • Slide 35
  • Slide 36
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Slide 47
  • Slide 48
  • Slide 49
  • Slide 50
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Performance
  • Slide 55
  • More words is better Improves Retrieval Improves Speed Branch factor
  • Slide 56
  • Higher branch factor works better (but slower)
  • Slide 57
  • Can we be more accurate? So far, we treat each image as containing a bag of words, with no spatial information a f z e e a f e e a f e e h h Which matches better?
  • Slide 58
  • Can we be more accurate? So far, we treat each image as containing a bag of words, with no spatial information Real objects have consistent geometry
  • Slide 59
  • Final key idea: geometric verification Goal: Given a set of possible keypoint matches, figure out which ones are geometrically consistent How can we do this?
  • Slide 60
  • Final key idea: geometric verification RANSAC for affine transform a f z e e a f e e z a f z e e a f e e z Affine Transform Randomly choose 3 matching pairs Estimate transformation Predict remaining points and count inliers Repeat N times:
  • Slide 61
  • Application: Large-Scale Retrieval K. Grauman, B. Leibe61 [Philbin CVPR07] Query Results on 5K (demo available for 100K)
  • Slide 62
  • Application: Image Auto-Annotation K. Grauman, B. Leibe62 Left: Wikipedia image Right: closest match from Flickr [Quack CIVR08] Moulin Rouge Tour Montparnasse Colosseum Viktualienmarkt Maypole Old Town Square (Prague)
  • Slide 63
  • Example Applications B. Leibe63 Mobile tourist guide Self-localization Object/building recognition Photo/video augmentation Aachen Cathedral [Quack, Leibe, Van Gool, CIVR08]
  • Slide 64
  • Video Google System 1.Collect all words within query region 2.Inverted file index to find relevant frames 3.Compare word counts 4.Spatial verification Sivic & Zisserman, ICCV 2003 Demo online at : http://www.robots.ox.ac.uk/~vgg/research/vgoogl e/index.html http://www.robots.ox.ac.uk/~vgg/research/vgoogl e/index.html 64K. Grauman, B. Leibe Query region Retrieved frames
  • Slide 65
  • Summary: Uses of Interest Points Interest points can be detected reliably in different images at the same 3D location DOG interest points are localized in x, y, scale SIFT is robust to rotation and small deformation Interest points provide correspondence For image stitching For defining coordinate frames for object insertion For object recognition and retrieval
  • Slide 66
  • Next class Opportunities of scale: stuff you can do with millions of images Texture synthesis of large regions Recover GPS coordinates Etc.