CSE152, Spr 2011 Intro Computer Vision
Introduction
Introduction to Computer Vision CSE 152 Lecture 1
What is Computer Vision?
• Trucco and Verri (Text): Computing properties of the 3-D world from one or more digital images
• Stockman and Shapiro: To make useful decisions about real physical objects and scenes based on sensed images
• Ballard and Brown: The construction of explicit, meaningful descriptions of physical objects from images
• Forsyth and Ponce: Extracting descriptions of the world from pictures or sequences of pictures
Why is this hard?
What is in this image?
1. A hand holding a man?
2. A hand holding a mirrored sphere?
3. An Escher drawing?
4. A 1935 self-portrait of Escher?
• Interpretations are ambiguous
• The forward problem (graphics) is well-posed
• The “inverse problem” (vision) is not
We all make mistakes
“640K ought to be enough for anybody.” – Bill Gates, 1981
“…” – Marvin Minsky
What do you see?
What was happening
Should Computer Vision follow from our understanding of Human Vision?
Yes & No
Yes:
1. Who would ever be crazy enough to even try creating machine vision?
2. Human vision “works”, and copying is easier than creating.
3. Secondary benefit – in trying to mimic human vision, we learn about it.
No:
1. Why limit oneself to human vision when there is even greater diversity in biological vision?
2. Why limit oneself to biological vision when there may be greater diversity in sensing mechanisms?
3. Biological vision systems evolved to provide functions for “specific” tasks and “specific” environments. These may differ for machine systems.
4. Implementation – hardware is different, and synthetic vision systems may use different techniques/methodologies that are more appropriate to computational mechanisms.
The Near Future: Ubiquitous Vision
• Digital video has become really cheap.
• It’s widely embedded in cell phones, cars, games, etc.
• 99.9% of digitized video isn’t seen by a person.
• That doesn’t mean that only 0.1% is important!
Applications: touching your life
• Digital photography
• Google Goggles
• Video games
• Football
• Movies
• Surveillance
• HCI – hand gestures
• Aids to the blind
• Face recognition & biometrics
• Road monitoring
• Industrial inspection
• Robotic control
• Autonomous driving
• Space: planetary exploration, docking
• Medicine – pathology, surgery, diagnosis
• Microscopy
• Military
• Remote Sensing
• Virtual Earth; street view
• Homeland security
Related Fields
• Image Processing
• Computer Graphics
• Pattern Recognition
• Perception
• Robotics
• AI
What is a digital image?
• Formed by a camera (usually).
• Divided into smaller elements (pixels)
• Stored as a matrix of numbers (usually positive)
• 8-bit, 12-bit, 16-bit, float
• Color image – 3 channels
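As a rough illustration of the points above, an image can be held as a NumPy array. This is only a sketch: the array sizes, pixel values and variable names are invented for the example and are not from the course material.

```python
import numpy as np

# A tiny 4x6 grayscale image: one 8-bit number (0..255) per pixel.
gray = np.zeros((4, 6), dtype=np.uint8)
gray[1:3, 2:5] = 200          # a bright rectangle on a dark background

# A color image adds a third axis holding 3 channels (here R, G, B).
color = np.zeros((4, 6, 3), dtype=np.uint8)
color[..., 0] = gray          # put the rectangle in the red channel only

# Converting to float in [0, 1] is a common step before doing arithmetic,
# because 8-bit integer arithmetic wraps around on overflow.
gray_float = gray.astype(np.float64) / 255.0

print(gray.shape, gray.dtype)   # rows x columns, 8-bit
print(color.shape)              # rows x columns x channels
print(gray_float.max())
```

Indexing is `[row, column]`, so `gray[1, 2]` is the pixel in the second row and third column.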
How do we interpret an image?
Use different cues
• Variation in appearance in multiple views
– stereo
– motion
• Shading & highlights
• Shadows
• Contours
• Texture
• Blur
• Geometric constraints
• Prior knowledge
An example of a cue: Shading and lighting
Shading as a result of differences in lighting is
1. A source of information
2. An annoyance
Illumination Variability
“The variations between the images of the same face due to illumination and viewing direction are almost always larger than image variations due to change in face identity.”
– Moses, Adini, Ullman, ECCV ’94
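An example: Image Formation Model
Lambertian Reflectance:
At image location (x,y) the intensity of a pixel I(x,y) is
I(x,y) = a(x,y) n(x,y) · s
where
• a(x,y) is the albedo of the surface projecting to (x,y).
• n(x,y) is the unit surface normal.
• s is the light source vector: its direction points toward the light source and its magnitude is the source strength.
[Figure: surface patch with albedo a and unit normal n, light source vector s, and the resulting pixel intensity I(x,y)]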
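The Lambertian model above can be sketched in a few lines of NumPy. The function name, the flat test scene, and the clamping of negative values are assumptions made for this illustration; the slide itself only states the product of albedo and the dot product of n and s.

```python
import numpy as np

def lambertian_image(albedo, normals, s):
    """Render I(x,y) = a(x,y) n(x,y) . s for every pixel.

    albedo  : (H, W) array, a(x,y)
    normals : (H, W, 3) array of unit surface normals, n(x,y)
    s       : length-3 vector pointing toward the light, scaled by its strength
    """
    shading = normals @ np.asarray(s, dtype=float)   # per-pixel dot product n . s
    # Assumption (not on the slide): clamp negative values, so surface
    # patches facing away from the light receive no light.
    return albedo * np.clip(shading, 0.0, None)

# Hypothetical scene: a flat patch facing the camera (normal along +z).
H, W = 2, 3
normals = np.zeros((H, W, 3))
normals[..., 2] = 1.0
albedo = np.full((H, W), 0.5)

frontal = lambertian_image(albedo, normals, [0.0, 0.0, 1.0])   # light head-on
oblique = lambertian_image(albedo, normals, [0.0, 0.6, 0.8])   # light tilted
print(frontal[0, 0], oblique[0, 0])
```

Tilting the light away from the normal darkens the patch even though the surface itself has not changed, which is why shading is both a source of information and an annoyance.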
An implemented algorithm: Relighting
Single Light Source
An implemented algorithm: Photometric Stereo
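A minimal sketch of the idea, assuming the Lambertian model I = a (n · s), known light vectors, and no shadows or highlights: with three or more images taken under different lights, each pixel gives a small linear system whose solution is the albedo-scaled normal. The function name and the synthetic test scene are hypothetical, and this is not necessarily the implementation shown in the lecture.

```python
import numpy as np

def photometric_stereo(images, lights):
    """Recover albedo and unit normals from k >= 3 images of a Lambertian surface.

    images : (k, H, W) array, one image per light source
    lights : (k, 3) array, row i is the light vector s_i used for image i
    """
    k, H, W = images.shape
    I = images.reshape(k, -1)                       # (k, H*W), one column per pixel
    # Solve lights @ b = I in the least-squares sense, where b = a * n per pixel.
    b, *_ = np.linalg.lstsq(lights, I, rcond=None)  # (3, H*W)
    albedo = np.linalg.norm(b, axis=0)              # |b| = a, because |n| = 1
    normals = b / np.maximum(albedo, 1e-12)         # guard against division by zero
    return albedo.reshape(H, W), normals.T.reshape(H, W, 3)

# Hypothetical test scene: every pixel has albedo 0.7 and the same tilted normal.
true_normal = np.array([0.0, 0.6, 0.8])
true_albedo = 0.7
lights = np.array([[0.0, 0.0, 1.0],
                   [0.6, 0.0, 0.8],
                   [0.0, 0.6, 0.8]])   # three known, non-coplanar light vectors
H, W = 2, 2
# Render each image with the Lambertian model I = a (n . s).
images = np.stack([np.full((H, W), true_albedo * (s @ true_normal)) for s in lights])

albedo, normals = photometric_stereo(images, lights)
print(albedo[0, 0], normals[0, 0])
```

With the recovered albedo and normals, relighting amounts to evaluating the same model with a new light vector.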
The course
• Part 1: The Physics of Imaging
• Part 2: Early Vision (Segmentation)
• Part 3: Reconstruction (Shape-from-X)
• Part 4: Recognition
Part 1 of Course: The Physics of Imaging
• How images are formed
– Cameras
  • What a camera does
  • How to tell where the camera was located
– Light
  • How to measure light
  • What light does at surfaces
  • How the brightness values we see in cameras are determined
– Color
  • The underlying mechanisms of color
  • How to describe it and measure it
A real camera … and its model
Cameras, lenses, and sensors
From Computer Vision, Forsyth and Ponce, Prentice-Hall, 2002.
• Pinhole cameras
• Lenses
• Projection models
• Geometric camera parameters
Color
Part 2: Early Vision in One Image
• Representing small patches of image
– For three reasons
  • Sharp changes, known as “edges”, are important in practice
  • Representing texture by giving some statistics of the different kinds of small patch present in the texture.
    – Tigers have lots of bars, few spots
    – Leopards are the other way around
  • We wish to establish correspondence between (say) points in different images, so we need to describe the neighborhood of the points
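One common way to find the sharp changes mentioned above is to look at the image gradient. The following is a small sketch on a synthetic image; the image, the threshold of 0.25, and the variable names are chosen only for illustration.

```python
import numpy as np

# Synthetic 5x8 image: dark on the left, bright on the right (a vertical edge).
img = np.zeros((5, 8))
img[:, 4:] = 1.0

# np.gradient uses central differences in the interior of the array;
# it returns the derivative along rows (y) first, then along columns (x).
gy, gx = np.gradient(img)
magnitude = np.hypot(gx, gy)

# Columns where the gradient magnitude is large mark the edge.
edge_columns = np.flatnonzero(magnitude[2] > 0.25)
print(edge_columns)
```

The gradient is zero in the flat regions and non-zero only in the two columns that straddle the intensity step.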
Segmentation
• Which image components “belong together”?
• Belong together ≅ lie on the same object
• Cues
– similar color
– similar texture
– not separated by contour
– form a suggestive shape when assembled
Texture Patterns [Leung, Malik]
• Regular texture pattern, repeated texture elements
• Segment image based on texture
• Surface shape from texture pattern
Boundary Detection
http://www.robots.ox.ac.uk/~vdg/dynamics.html
Boundary Detection: Local cues
Gradients
(Sharon, Galun, Brandt, Basri)
Part 3: Reconstruction from Multiple Images
• Photometric Stereo
– What we know about the world from lighting changes.
• The geometry of multiple views
• Stereopsis
– What we know about the world from having two eyes
• Structure from motion
– What we know about the world from having many eyes
  • or, more commonly, our eyes moving.
Mars Rover Spirit
From Viking
Xbox Kinect
Multi-view geometry applications
• Image stitching, mosaicing
– http://www.yosemite-17-gigapixels.com/
– 214,414 x 80,571 pixels from 2k images.
– Each pixel subtends less than 3 arc seconds.
• Photosynth
– http://photosynth.net/
– In Bing Maps
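As a quick check of the panorama size quoted above, the pixel dimensions can be multiplied out; the result comes to roughly 17 gigapixels, in line with the name of the site. The variable names are only for this example.

```python
# Dimensions quoted on the slide for the stitched Yosemite panorama.
width_px, height_px = 214_414, 80_571

total_pixels = width_px * height_px
gigapixels = total_pixels / 1e9

print(f"{total_pixels:,} pixels = {gigapixels:.1f} gigapixels")
```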
Images with marked features
Recovered model edges reprojected through recovered camera positions into the three original images
Resulting model & Camera Positions
Façade (Debevec, Taylor and Malik, 1996)
Reconstruction from multiple views, constraints, rendering
Architectural modeling:
• photogrammetry;
• view-dependent texture mapping;
• model-based stereopsis.
Reprinted from “Modeling and Rendering Architecture from Photographs: A Hybrid Geometry- and Image-Based Approach,” By P. Debevec, C.J. Taylor, and J. Malik, Proc. SIGGRAPH (1996). © 1996 ACM, Inc. Included here by permission.
Façade
Video-Motion Analysis
• Where “things” are moving in image – segmentation.
• Determining observer motion (egomotion)
• Determining scene structure
• Tracking objects
• Understanding activities & actions
Forward Translation & Focus of Expansion [Gibson, 1950]
Tracking
• Use a model to predict next position and refine using next image
• Model:
– simple dynamic models (second order dynamics)
– kinematic models
– etc.
• Face tracking and eye tracking now work rather well
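The predict-then-refine loop in the first bullet can be sketched with an alpha-beta filter, one simple constant-velocity dynamic model. This is an illustrative assumption, not necessarily the tracker behind the slides; the function name and the gain values `alpha` and `beta` are hypothetical.

```python
def alpha_beta_track(measurements, dt=1.0, alpha=0.5, beta=0.1):
    """Track a 1-D position with a constant-velocity model.

    Each step predicts the next position from the current state, then
    refines the prediction using the new measurement.
    """
    x, v = measurements[0], 0.0          # initial position and velocity
    estimates = []
    for z in measurements[1:]:
        x_pred = x + dt * v              # predict next position from the model
        residual = z - x_pred            # disagreement with the new measurement
        x = x_pred + alpha * residual    # refine the position estimate
        v = v + (beta / dt) * residual   # refine the velocity estimate
        estimates.append(x)
    return estimates

# Hypothetical target moving 2 pixels per frame, measured without noise.
measurements = [2.0 * k for k in range(50)]
estimates = alpha_beta_track(measurements)
print(abs(estimates[-1] - measurements[-1]))
```

The filter starts with zero velocity, so it lags at first; as the velocity estimate is refined the prediction error shrinks toward zero for a target moving at constant speed.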
Visual Tracking
Main Challenges
1. 3-D Pose Variation
2. Occlusion of the target
3. Illumination variation
4. Camera jitter
5. Expression variation
etc.
[ Ho, Lee, Kriegman ]
Tracking
(www.brickstream.com)
Part 4: Recognition
Given a database of objects and an image, determine which, if any, of the objects are present in the image.
Recognition Challenges
• Within-class variability
– Different objects within the class have different shapes or different material characteristics
– Deformable
– Articulated
– Compositional
• Pose variability:
– 2-D Image transformation (translation, rotation, scale)
– 3-D Pose Variability (perspective, orthographic projection)
• Lighting
– Direction (multiple sources & type)
– Color
– Shadows
• Occlusion – partial
• Clutter in background -> false positives
Face Detection: First Step
• Digital Cameras: Face + Expression
• Street view
• Crowd counting
Why is Face Recognition Hard?
Many faces of Madonna
Face Recognition: 2-D and 3-D
[Diagram: Face Database (2-D, 3-D, time/video) and Recognition Data (2-D, 3-D) feeding a Recognition Comparison]
Yale Face Database B
64 Lighting Conditions × 9 Poses => 576 Images per Person
http://www.ri.cmu.edu/projects/project_271.html
Model-Based Vision
• Given 3-D models of each object
• Detect image features (often edges, line segments, conic sections)
• Establish correspondence between model & image features
• Estimate pose
• Consistency of projected model with image.
• Google Goggles
Announcements
• Class Web Page:
– http://www.cs.ucsd.edu/classes/sp11/cse152-a/
• Assignment 0: “Getting Started with Matlab” (to be posted soon), due 4/7/11
• Read Chapter 1 of Trucco & Verri
The Syllabus