Making Computers See
ECE 5554: Computer Vision
Devi Parikh
Assistant Professor
ECE, Virginia Tech
Disclaimer: Many slides have been borrowed from Kristen Grauman, who may have borrowed some of them from others. Any time a slide did not already have a credit on it, I have credited it to Kristen. So there is a chance some of these credits are inaccurate.
Plan for today
Topic overview
Introductions
Little bit about me
Little bit about you
Course overview:
Logistics and requirements
Coming up
Please interrupt at any time with questions or comments
2
Slide credit: Devi Parikh
2
What Is Computer Vision?
3
Slide credit: Devi Parikh
3
Computer Vision:Making Computers See
4
Image from: http://kirkh.deviantart.com/art/BioMech-Eye-168367549
4
Computer Vision
Automatic understanding of images and video
Computing properties of the 3D world from visual data (measurement)
5
Slide credit: Kristen Grauman
5
5
1. Vision for measurement
Real-time stereo
Structure from motion
NASA Mars Rover
Tracking
Demirdjian et al.
Snavely et al.
Wang et al.
6
Slide credit: Kristen Grauman
6
6
Computer Vision
Automatic understanding of images and video
Computing properties of the 3D world from visual data (measurement)
Algorithms and representations to allow a machine to recognize objects, people, scenes, and activities. (perception and interpretation)
7
Slide credit: Kristen Grauman
7
7
sky
water
Ferris wheel
amusement park
Cedar Point
12 E
tree
tree
tree
carousel
deck
people waiting in line
ride
ride
ride
umbrellas
pedestrians
maxair
bench
tree
Lake Erie
people sitting on ride
Objects
Activities
Scenes
Locations
Text / writing
Faces
Gestures
Motions
Emotions
The Wicked Twister
2. Vision for perception, interpretation
8
Slide credit: Kristen Grauman
8
8
Computer Vision
Automatic understanding of images and video
Computing properties of the 3D world from visual data (measurement)
Algorithms and representations to allow a machine to recognize objects, people, scenes, and activities. (perception and interpretation)
Algorithms to mine, search, and interact with visual data (search and organization)
9
Slide credit: Kristen Grauman
9
9
3. Visual search, organization
Image or video archives
?
Query
1
2
3
Relevant content
10
Slide credit: Kristen Grauman
10
Related disciplines
Cognitive science
Algorithms
Image processing
Artificial intelligence
Graphics
Machine learning
Computer vision
11
Slide credit: Kristen Grauman
11
Vision and graphics
Model
Images
Vision
Graphics
Inverse problems: analysis and synthesis.
12
Slide credit: Kristen Grauman
12
Slide credit: Larry Zitnick
What humans see
13
2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
13
Slide credit: Larry Zitnick
What computers see
14
2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
14
Slide credit: Larry Zitnick
What do humans see?
15
15
How many object categories are there?
~10,000 to 30,000
Biederman 1987
16
Slide credit: Fei-Fei, Fergus, Torralba CVPR07 Short Course
16
17
Slide credit: Devi Parikh
17
Torralba et al. PAMI 2008
Slide credit: Larry Zitnick
What do humans see?
18
Torralba et al. PAMI 2008
chair
table setting
light
picture
Slide credit: Larry Zitnick
What do humans see?
Slide credit: Larry Zitnick
What do humans see?
20
Why is vision difficult?
Ill-posed problem: real world much more complex than what we can measure in images
3D 2D
Impossible to literally invert image formation process
21
Slide credit: Kristen Grauman
21
Challenges: many nuisance parameters
Illumination
Object pose
Clutter
Viewpoint
Intra-class appearance
Occlusions
22
Slide credit: Kristen Grauman
22
Challenges: intra-class variation
23
Slide credit: Fei-Fei, Fergus & Torralba
23
Challenges: importance of context
24
Slide credit: Fei-Fei, Fergus & Torralba
24
Challenges: complexity
Thousands to millions of pixels in an image
3,000-30,000 human recognizable object categories
30+ degrees of freedom in the pose of articulated objects (humans)
Billions of images indexed by Google Image Search
18 billion+ prints produced from digital camera images in 2004
295.5 million camera phones sold in 2005
About half of the cerebral cortex in primates is devoted to processing visual information [Felleman and van Essen 1991]
25
Slide credit: Kristen Grauman
25
spend the summer linking a camera to a
computer and getting the computer to describe what it saw
Marvin Minsky (1966), MIT
Turing Award (1969)
47 years later
26
Slide credit: Devi Parikh
How hard is computer vision?
26
Gerald Sussman, MIT
Youll notice that Sussman never worked in vision again! Berthold Horn
Slide credit: Devi Parikh
How hard is computer vision?
27
Progress so far
28
Slide credit: Devi Parikh
28
Progress so far
29
Slide credit: Devi Parikh
Progress so far
30
Slide credit: Devi Parikh
30
Progress so far
31
Slide credit: Devi Parikh
Location
AutoTagger: Yunpeng Li, Noah Snavely, Dan Huttenlocher and Pascal Fua
32
Slide credit: Devi Parikh
33
Slide credit: Devi Parikh
AutoTagger: Yunpeng Li, Noah Snavely, Dan Huttenlocher and Pascal Fua
3D Models
Rome
2 million photos
Dubrovnik
58 thousand photos
34
Slide credit: Devi Parikh
Dubrovnik
35
Slide credit: Devi Parikh
AutoTagger: Yunpeng Li, Noah Snavely, Dan Huttenlocher and Pascal Fua
Progress so far
36
Slide credit: Devi Parikh
36
Progress so far
37
Slide credit: Devi Parikh
Progress so far
38
Slide credit: Devi Parikh
L. G. Roberts, Machine Perception of Three Dimensional Solids, Ph.D. thesis, MIT Department of Electrical Engineering, 1963.
Visual data in 1963
39
Slide credit: Kristen Grauman
39
Personal photo albums
Surveillance and security
Movies, news, sports
Medical and scientific images
Visual data in 2013
40
Slide credit: Svetlana Lazebnik
40
40
Why vision?
As image sources multiply, so do applications
Relieve humans of boring, easy tasks
Enhance human abilities
Advance human-computer interaction, visualization
Perception for robotics / autonomous agents
Organize and give access to visual content
41
Slide credit: Kristen Grauman
Applications
Post-disaster family reunification
Law enforcement
Surveillance
Robotics
Autonomous driving
Medical imaging
Photo organization
Image search
E-commerce
cell phone cameras, social media, Google Glass, etc.
42
Slide credit: Devi Parikh
Summary
Computer Vision is a hard problem.
Lots of cool and important applications.
A growing and exciting field.
New teams in existing companies, new companies, etc.
43
Slide credit: Devi Parikh
Introductions
Instructor
Devi Parikh
Joined ECE, Virginia Tech in January 2013
Research Assistant Professor at TTI-Chicago for 3.5 years
Ph.D. from Carnegie Mellon University in 2009
Research area: Computer Vision
Recognition
Human-machine communication
44
Slide credit: Devi Parikh
Introductions
TA:
Neelima Chavali
M.S. student in ECE
45
Slide credit: Devi Parikh
Introductions
You
Name?
Department?
What are you hoping to get out of this course?
Do you have any experience in computer vision?
46
Slide credit: Devi Parikh
This course
ECE 5554
TR 3:30 pm to 4:45 pm
McBryde Hall (MCB) 307
My office hours: F 2:30 pm to 3:30 pm
Neelimas office hours: MW 11:00 am to noon
Course webpage:
http://filebox.ece.vt.edu/~F13ECE5554/
(Google me My homepage Teaching)
47
Slide credit: Devi Parikh
47
47
This course
Introductory Computer Vision course
Basics and fundamentals
Hands-on assignments and projects
Views of vision as a research area
48
Slide credit: Devi Parikh
48
48
Other courses
Advanced Computer Vision
Devi Parikh
Spring semesters
Introduction to Machine Learning and Perception
Dhruv Batra
Fall semesters
Advanced Machine Learning
Dhruv Batra
Spring semesters
49
Slide credit: Devi Parikh
49
49
Topics overview
Features & filters
Grouping & fitting
Multiple views and motion
Recognition
Video processing
Focus is on algorithms, rather than specific systems.
50
Slide credit: Kristen Grauman
50
Features and filters
Transforming and describing images; textures, colors, edges
51
Slide credit: Kristen Grauman
51
Grouping & fitting
[fig from Shi et al]
Clustering, segmentation, fitting; what parts belong together?
52
Slide credit: Kristen Grauman
52
Multiple views and motion
Hartley and Zisserman
Lowe
Multi-view geometry, matching, invariant features, stereo vision
Fei-Fei Li
53
Slide credit: Kristen Grauman
53
Recognition and learning
Recognizing objects and categories, learning techniques
54
Slide credit: Kristen Grauman
54
Video processing
Tomas Izo
Tracking objects, video analysis, low level motion, optical flow
55
Slide credit: Kristen Grauman
55
Textbooks
Recommended book:
Computer Vision:
Algorithms and Applications
By Rick Szeliski
http://szeliski.org/Book/
Lectures will be posted online
56
Slide credit: Kristen Grauman
56
Requirements / Grading
Problem sets (55%)
Project (25%)
Final exam (15%)
Class participation, including attendance (5%)
57
Slide credit: Devi Parikh
57
Problem sets
Some short answer concept questions
Programming problems
Implementation
Explanation, results
Follow instructions. Points will be deducted if cant run your code on our data, cant run our code on your data, etc.
Ask questions on Scholar forum first
Code in Matlab
These assignments are substantial.
They will take significant time to do.
Start early.
58
Slide credit: Kristen Grauman
58
Matlab
Built-in toolboxes for low-level image processing, visualization
Compact programs
Intuitive interactive debugging
Widely used in engineering
59
Slide credit: Kristen Grauman
59
PS0
PS0: Matlab warmup + basic image manipulation
Out today, due in ~ a week (Monday night)
60
Slide credit: Kristen Grauman
60
Digital images
Images as matrices
61
Slide credit: Kristen Grauman
61
im[176][201] has value 164
im[194][203] has value 37
width 520
j=1
500 height
i=1
Intensity : [0,255]
Digital images
62
Slide credit: Kristen Grauman
62
R
G
B
Color images, RGB color space
63
Slide credit: Kristen Grauman
63
Preview of some problem sets
64
Slide credit: Devi Parikh
resize: castle squished
crop: castle cropped
content aware resizing:
seam carving
Preview of some problem sets
Grouping
65
Slide credit: Kristen Grauman
65
Preview of some problem sets
Image mosaics / stitching
66
Slide credit: Kristen Grauman, Image from: Fei-Fei Li
66
Preview of some problem sets
Object search and recognition
67
Slide credit: Kristen Grauman
67
Preview of some problem sets
Tracking, activity recognition
68
Slide credit: Kristen Grauman
68
Assignment deadlines
Assignments in by 11:59 PM the day before class
Follow submission instructions given in assignment
Submit to scholar
No hard copy submissions
Deadlines are firm. Well use scholar timestamp. Even 1 minute late is late.
4 total free late days for the semester
Use them wisely: first two assignments are easier than others
If your program doesnt work, clean up the code, comment it well, explain what you have, and still submit. Draw our attention to this in your answer sheet.
69
Slide adapted from Kristen Grauman
69
Projects
Possibilities:
Apply any techniques we studied in class or related to real world problem
Extend a technique
Empirically analyze a technique
Compare approaches
Design and evaluate a novel approach
Novel application
Be creative!
Publication?
Can work with a partner
Talk to me if you need help with ideas
70
Slide credit: Devi Parikh
70
70
Project timeline (tenative)
Project proposals (1 page) [10%]
October 1st
Project presentations (5-10 minutes) [40%]
November 23rd (Saturday)
If anticipate being a problem, talk to me well in advance for alternate arrangements
Project reports (4 pages) [50%]
December 10th
71
Slide credit: Devi Parikh
71
71
Collaboration policy
All responses and code must be written individually.
Students submitting answers or code found to be identical or substantially similar (due to inappropriate collaboration) risk failing the course.
72
Slide adapted from Kristen Grauman
72
Miscellaneous
Check class website regularly
No laptops, phones, etc. open in class please.
Use our office hours!
Please interrupt with questions at any time.
73
Slide adapted from Kristen Grauman
73
Coming up
Now: read the class webpage carefully
Now: check out Matlab tutorial online
Now: PS0 is out
Thursday August 29th : first lecture on linear filters
Monday September 2nd : PS0 due
74
Slide credit: Devi Parikh
74
Questions?
See you Thursday!
Big triangles
Little triangles
?
Slide credit: Devi Parikh
76
Example: Big triangles vs. Little triangles
76
Slide credit: Devi Parikh
Example: Big triangles vs. Little triangles
77
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
1
1
1
0
0
0
0
0
0
0
1
1
1
1
1
0
0
0
0
0
1
1
1
1
1
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
0
0
0
0
0
0
0
0
1
1
1
0
0
1
0
0
0
0
1
1
1
1
1
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
1
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
1
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
Sum = 16
Sum = 13
Sum = 2
Sum = 3
Rule
If sum > 10
Answer = Big triangle
Else
Answer = Little triangle
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
0
?
Sum = 2
Little triangle
Slide credit: Devi Parikh
Example: Big triangles vs. Little triangles
78
78