
Computer Vision

Dan Witzner Hansen

Course web page: www.itu.dk/courses/MCV

Email: witzner@itu.dk

What is Vision?

Today

• Introduction to the course
• Crash course in 2D and 3D geometry (brush-up from high school)
• (Solving linear equations and least squares solution to linear equations)
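Least squares shows up throughout the course whenever a model is fitted to noisy measurements. A minimal sketch in Python/NumPy of what a "least squares solution to linear equations" means in practice (the exercises themselves use Matlab, where A\b plays a similar role); the matrix and vector below are made-up example data:

import numpy as np

# Overdetermined system A x = b: more equations (rows) than unknowns,
# so in general no exact solution exists.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.1, 1.9, 3.2])

# Least squares solution: the x minimizing ||A x - b||^2.
x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)

# Equivalent normal-equations form: (A^T A) x = A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

print(x, x_normal)   # both give the same least squares estimate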

The Vision Problem

How to infer salient properties of the 3-D world from its time-varying 2-D image projection

• What is salient?
• How to deal with the loss of information going from 3-D to 2-D?

Computer Vision: Stages

• Image formation
• Low-level
  – Single image processing
  – Multiple views
• Mid-level
  – Estimation, segmentation (main topic of Image Analysis and Foundations of Image Analysis and will only be covered briefly here)
• High-level
  – Recognition
  – Classification

Image Formation

• 3-D geometry
• Physics of light
• Camera properties (see the projection sketch after this list)
  – Focal length
  – Distortion
• Sampling issues
  – Spatial
  – Temporal
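To make the role of the focal length concrete, here is a minimal pinhole-projection sketch in Python/NumPy; the focal length and 3-D points are arbitrary example values, not taken from the slides:

import numpy as np

# Pinhole camera model: a 3-D point (X, Y, Z) given in camera coordinates
# projects to image coordinates (x, y) = (f*X/Z, f*Y/Z).
def project(points_3d, f):
    X, Y, Z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    return np.stack([f * X / Z, f * Y / Z], axis=1)

points = np.array([[0.1, 0.2, 2.0],
                   [0.5, -0.3, 4.0]])   # example points, in metres
print(project(points, f=0.035))         # e.g. a 35 mm focal length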

Low-level: Single Image Processing

• Filtering (see the edge-filter sketch after this list)
  – Edge
  – Color
  – Local pattern similarity
• Texture
  – Appearance characterization from the statistics of applying multiple filters
• 3-D structure estimation from…
  – Shading
  – Texture
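Edge filtering is one of the simplest low-level operations: convolve the image with derivative kernels and look at the response magnitude. A rough sketch with NumPy/SciPy on a toy image (the image and kernels here are purely illustrative):

import numpy as np
from scipy.signal import convolve2d

# Sobel kernels approximate horizontal and vertical intensity derivatives.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
sobel_y = sobel_x.T

image = np.zeros((8, 8))
image[:, 4:] = 1.0                         # toy image: a vertical step edge

gx = convolve2d(image, sobel_x, mode='same')
gy = convolve2d(image, sobel_y, mode='same')
edge_strength = np.hypot(gx, gy)           # large along the edge, zero elsewhere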

Low-level: Multiple Views

• Stereo
  – Structure from two views (see the depth-from-disparity sketch after this list)
• Structure from motion
  – What can we learn in general from many views, whether they were taken simultaneously or sequentially?
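For the simplest stereo setup, a rectified pair of parallel cameras, the depth of a matched point follows directly from its disparity: Z = f·B/d. A tiny illustration with invented numbers:

# Depth from disparity for a rectified stereo pair: Z = f * B / d
f = 700.0        # focal length in pixels (example value)
B = 0.12         # baseline between the two cameras in metres (example value)
d = 14.0         # disparity of a matched point in pixels
Z = f * B / d    # depth of that point: 6.0 metres here

print(Z)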

Mid-Level: Estimation, Segmentation

• Estimation: Fitting parameters to data
  – Static (e.g., shape)
  – Dynamic (e.g., tracking)
• Segmentation/clustering
  – Breaking an image or image sequence into a few meaningful pieces with internal similarity
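As a toy example of segmentation by clustering, the sketch below groups pixel colors with a few iterations of k-means; the "image" and the choice of k = 2 are made up, and a real segmentation would normally also use spatial information:

import numpy as np

def kmeans(data, k, iters=20, seed=0):
    # data: (N, D) feature vectors, e.g. the RGB colors of all pixels.
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(iters):
        # Assign every sample to its nearest cluster center.
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of the samples assigned to it.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return labels, centers

# Toy "image": 100 reddish and 100 greenish pixels, clustered into 2 groups.
pixels = np.vstack([np.random.rand(100, 3) * 0.2 + [0.8, 0.1, 0.1],
                    np.random.rand(100, 3) * 0.2 + [0.1, 0.8, 0.1]])
labels, centers = kmeans(pixels, k=2)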

High-level: Recognition, Classification

• Recognition: Finding and parametrizing a known object
• Classification
  – Assignment to known categories, using statistics/probability to make the best choice
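A minimal statistical classifier in the spirit of the bullet above: assign a feature vector to the category whose mean it lies closest to. The categories, means, and feature values are invented purely for illustration:

import numpy as np

# Class means, as if estimated from labelled training features.
class_means = {'face': np.array([0.8, 0.2]),
               'background': np.array([0.1, 0.6])}

def classify(feature):
    # Choose the category whose mean is nearest in feature space.
    return min(class_means, key=lambda c: np.linalg.norm(feature - class_means[c]))

print(classify(np.array([0.7, 0.3])))   # -> 'face'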

Course Overview

• Image formation and cameras
• Projective geometry – relating points between images
• Motion – motion analysis, object tracking
• Shape and recognition – shape analysis, object recognition
• Applications

Applications: Factory Inspection

Cognex’s “CapInspect” system:

Low-level image analysis: Identify edges, regions
Mid-level: Distinguish “cap” from “no cap”
Estimation: What are the orientation of the cap and the height of the liquid?

Applications: Face Detection

courtesy of H. Rowley

How is this like the bottle problem on the previous slide?

Applications: Text Detection & Recognition

from J. Zhang et al.

Similar to face finding: Where is the text and what does it say?
Viewing at an angle complicates things...

Detection and Recognition: How?

• Build models of the appearance characteristics (color, texture, etc.) of all objects of interest

• Detection: Look for areas of image with sufficiently similar appearance to a particular object (see the sketch after this list)

• Recognition: Decide which of several objects is most similar to what we see

• Segmentation: “Recognize” every pixel
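One of the simplest concrete versions of “look for areas with sufficiently similar appearance” is sliding-window template matching. A rough sketch; the sum-of-squared-differences score and the threshold are illustrative choices, not a prescription from the slides:

import numpy as np

def detect(image, template, threshold):
    # Slide the template over the image and report every window whose
    # sum-of-squared-differences (SSD) score is below the threshold.
    H, W = image.shape
    h, w = template.shape
    hits = []
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            window = image[y:y + h, x:x + w]
            ssd = np.sum((window - template) ** 2)
            if ssd < threshold:
                hits.append((y, x, ssd))
    return hits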

Applications: Virtual Advertising

courtesy of Princeton Video Image

First-Down Line, Virtual Advertising: How?

• Where should the message go?
  – Sensors that measure pan, tilt, zoom and focus are attached to calibrated cameras at surveyed positions
  – Knowledge of the 3-D position of the line, advertising rectangle, etc. can be directly translated into where in the image it should appear for a given camera
• What pixels get painted?
  – Occluding image objects like the ball, players, etc. where the graphic is to be put must be segmented out. These are recognized by being a sufficiently different color from the background at that point. This allows pixel-by-pixel compositing (see the sketch below).
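A minimal version of that pixel-by-pixel compositing, assuming we already have the rendered graphic, a registered video frame, and a known background (field) color; the tolerance value is an arbitrary example:

import numpy as np

def composite(frame, graphic, background_color, tol=0.1):
    # frame, graphic: (H, W, 3) float images; background_color: length-3 array.
    # Paint the graphic only where the frame still looks like the known
    # background color, so occluding players, the ball, etc. stay on top.
    diff = np.linalg.norm(frame - background_color, axis=2)
    is_background = diff < tol                    # per-pixel mask
    out = frame.copy()
    out[is_background] = graphic[is_background]   # composite the graphic in
    return out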

Applications: Inserting Computer Graphics with a Moving Camera

CG Insertion with a Moving Camera: How?

• This technique is often called matchmove
• Once again, we need camera calibration, but also information on how the camera is moving (its egomotion). This allows the CG object to correctly move with the real scene, even if we don’t know the 3-D parameters of that scene.
• Estimating camera motion:
  – Much simpler if we know the camera is moving sideways, because then the problem is only 2-D (see the sketch below)
  – For general motions: by identifying and following scene features over the entire length of the shot, we can solve retrospectively for what 3-D camera motion would be consistent with their 2-D image tracks. Must also make sure to ignore independently moving objects like cars and people.
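For the sideways-moving case mentioned above, the camera motion between two frames reduces to a 2-D image shift. A rough sketch that estimates it as the median displacement of tracked feature points (the median helps ignore independently moving objects); all coordinates are invented:

import numpy as np

# Image positions of the same scene features in two consecutive frames
# (hypothetical tracker output).
pts_frame1 = np.array([[100.0,  50.0],
                       [220.0,  80.0],
                       [310.0,  40.0],
                       [150.0, 200.0]])
pts_frame2 = pts_frame1 + [12.0, 0.5]     # mostly a horizontal camera shift
pts_frame2[3] += [60.0, -30.0]            # one independently moving object

# The median displacement is robust to that outlier and recovers the shift.
shift = np.median(pts_frame2 - pts_frame1, axis=0)
print(shift)                              # approximately [12.0, 0.5]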

Applications: Motion Capture

Vicon software: 12 cameras, 41 markers for body capture; 6 zoom cameras, 30 markers for face

Applications: Motion Capture without Markers

courtesy of C. Bregler

What’s the difference between these two problems?

Motion Capture: How?

• Similar to matchmove in that we follow features and estimate underlying motion that explains their tracks

• Difference is that the motion is not of the camera but rather of the subject (though camera could be moving, too)
  – Face/arm/person has more degrees of freedom than camera flying through space, but still constrained

• Special markers make feature identification and tracking considerably easier

• Multiple cameras gather more information

Applications: Image-Based Modeling

courtesy of P. Debevec

Façade project: UC Berkeley Campanile

Image-Based Modeling: How?

• 3-D model constructed from manually-selected line correspondences in images from multiple calibrated cameras

• Novel views generated by texture-mapping selected images onto model

A Movie

Movie

Applications: Robotics

Autonomous driving: Lane & vehicle tracking (with radar)

Human Computer Interaction

What is the relationship between many of these applications?

• Knowledge of
  – Cameras
  – Motion and tracking
  – Shapes and object recognition
  – Mathematics and statistics

Course Prerequisites

• Background in/comfort with:
  – Linear algebra
  – Multi-variable calculus
  – Statistics, probability
• Homeworks will use Matlab, but you are also welcome to use C/C++ (harder, though)
  – An ability to program in C/C++, Java, or equivalent should be sufficient preparation, but knowing Matlab is better (no introduction given, but you can come see me if needed)

Grading

• 100 % on mandatory assignments
• Submission ON TIME

More specifically…

Single View Examples

Mosaicing

Stereo

Stereo reconstruction

Tracking, Shape and HCI

After the course

• Understand, choose between, and apply various computer vision algorithms.
• Understand the relations between objects in the 3D world and those obtained from cameras.
• Understand the principles of how to make 3D models (reconstruction) from images.
• Write programs, in either Matlab or C++, which are able to follow objects in pre-recorded movies or in live images obtained from cameras.

• Understand principles for making computer vision systems that aim towards enabling humans to interact with a computer through cameras.

Reading Material

• Textbooks:
  – “Multiple View Geometry” by Hartley and Zisserman
  – “Introductory Techniques for 3D Computer Vision” (less important)

• Supplemental readings will be available online as PDF files and a few as photocopies from books.

• Complete assigned reading before corresponding lecture and re-read difficult parts after the lecture.

• This is NOT an easy course, so expect at least 15 hrs of WORK each week.

• Show up for ALL lectures.

Details

• Homework
  – Submit to me by the end of the exercises.
  – Expect to have it ready before the exercises, though!
  – NO lateness policy – add-ons will be expected if late
• Exam
  – Submission of mandatory assignments by the end of the semester.

More Details

• Instructor
  – E-mail: witzner@itu.dk
  – Office hours (by appointment):
    • Friday, 10:00-12:00 pm

Remember that semester projects in connection with the course are possible.

Your First Assignment

• Try to get Matlab running
• Take a look at a Matlab primer
• Unfortunately, most of the tools (mathematics) have to be developed at the beginning of the course, and it may therefore seem quite mathematical.
• DON’T LET THAT DISCOURAGE YOU

More questions?

First try the web page: www.itu.dk/courses/MCV

Feel free to e-mail me at any time

What is needed here?

Recommended