CSP03-04 – Visual input processing
Lecturer: Smilen Dimitrov
Cross-sensorial processing – MED7

Page 1: Visual input processing

Lecturer: Smilen Dimitrov
Cross-sensorial processing – MED7

Page 2: Introduction

• The immobot base exercise
• Work on the visual input
• Goal – object localization in 3D
• Setup:
  – PC
  – Two Logitech QC Zoom webcams

Page 3: Setup

• Setup for a PC:

1. Logitech QuickCam (QC) drivers
2. QuickTime
3. WinVDig (the version that corresponds to the installed version of QuickTime)
4. Max/MSP/Jitter

Page 4: Setup

• Camera parameters

  – Image sensor: 1/4" color, 640 × 480 pixel CMOS
  – Lens type: 3P
  – F-number (F/#): F/2.4
  – Effective focal length: 5.0 mm

Page 5: Setup

• Low-tech configuration – stereo imaging not guaranteed (frame delays)
• Other options:
  – Bumblebee camera: a true stereo camera; FireWire (power issues, drivers)
  – Axis 206 camera: an IP camera (drivers)

Page 6: Goal of the vision processing algorithm

• Object detection – the application needs to detect the presence of a new object whenever it enters the monitored environment.
• Object recognition – once a new object is detected, it needs to be classified to determine its type (e.g., a car versus a truck, a tiger versus a deer).
• Object tracking – assuming the new object is of interest to the application, it can be tracked as it moves through the environment. Tracking involves computing the current location of the object and its trajectory.

Here: color tracking, and estimation of 3D location through two-view geometry (stereopsis).

Page 7: Goal of the vision processing algorithm

Page 8: Color tracking

• Using an algorithm provided by Max/MSP/Jitter – jit.findbounds
• Input – the min and max color range to react to, and the video
• Output – the min and max (x, y) coordinates of the rectangle where the color has been found

Page 9: Color tracking

• jit.findbounds output – rectangle
• Center coordinate:

  x0 = (xmin + xmax) / 2
  y0 = (ymin + ymax) / 2

Page 10: Color tracking – example code

• Can be performed in Max/MSP JavaScript using jsui – slow!
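A minimal sketch of what such a script could look like: a Max [js] object that receives the min and max lists from jit.findbounds on two inlets and outputs the blob center. The file name, inlet layout and variable names are assumptions, not the course's actual code.

// blobcenter.js – hypothetical Max [js] helper, not the original exercise code.
// Assumes the min (x, y) list from jit.findbounds arrives at inlet 0
// and the max (x, y) list at inlet 1; outputs the bounding-box center.
inlets = 2;
outlets = 1;

var minPt = [0, 0];
var maxPt = [0, 0];

function list() {
    var v = arrayfromargs(arguments);
    if (inlet == 0) {
        minPt = v;    // (xmin, ymin)
    } else {
        maxPt = v;    // (xmax, ymax)
    }
    // center of the rectangle: x0 = (xmin + xmax)/2, y0 = (ymin + ymax)/2
    outlet(0, (minPt[0] + maxPt[0]) / 2, (minPt[1] + maxPt[1]) / 2);
}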

Page 11: Color tracking – background

• Video tracking – the process of locating a moving object (or several) over time using a camera. An algorithm analyses the video frames and outputs the location of moving targets within the frame.
  – Video tracking systems usually employ a motion model which describes how the image of the target might change for the different possible motions of the tracked object.
• Video tracking approaches:
  – Blob tracking: segmentation of the object interior (for example blob detection, block-based correlation or optical flow)
  – Contour tracking: detection of the object boundary (e.g. active contours or the Condensation algorithm)
  – Visual feature matching: registration
• Color tracking is a type of blob tracking

Page 12: Color tracking – background

• Blob detection refers to visual modules aimed at detecting points and/or regions in the image that are either brighter or darker than their surroundings. There are two main classes of blob detectors:
  (i) differential methods, based on derivative expressions, and
  (ii) methods based on local extrema in the intensity landscape.
• A blob (binary large object) is an area of touching pixels with the same logical state.
• A group of pixels organized into a structure is commonly called a blob. Problems related to blobs:
  1. Where are the edges?
  2. Where is the center?
  3. How many pixels does it contain?
  4. What is the average pixel intensity?
  5. What is the blob's orientation (angle)?

Page 13: Color tracking – background

• Blob center calculation – simple method
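One way to read such a simple method is as the centroid (average position) of all foreground pixels. A rough sketch in JavaScript, assuming the binary image is available as a plain 2D array of 0/1 values rather than a Jitter matrix (our own representation):

// blobCenter – hypothetical helper, not the slide's original example.
// mask[y][x] is 1 for foreground (blob) pixels, 0 for background.
function blobCenter(mask) {
    var sumX = 0, sumY = 0, count = 0;
    for (var y = 0; y < mask.length; y++) {
        for (var x = 0; x < mask[y].length; x++) {
            if (mask[y][x]) {        // pixel belongs to the blob
                sumX += x;
                sumY += y;
                count++;
            }
        }
    }
    if (count == 0) return null;     // no blob in the image
    return [sumX / count, sumY / count];   // center = mean of coordinates
}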

Page 14: Color tracking – background

• A blob (binary large object) is an area of touching pixels with the same logical state.
  – All pixels in an image that belong to a blob are in a foreground state.
  – All other pixels are in a background state.
  – In a binary image, pixels in the background have values equal to zero, while every nonzero pixel is part of a binary object.
• For jit.findbounds, this logical test of belonging to the blob is whether the color of the currently tested pixel falls within the range set to be detected.
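As an illustration, that per-pixel test could be written like this (hypothetical helper; pixel and range are [r, g, b] arrays, which is our own layout):

// inRange – a pixel is foreground if every channel lies inside the
// detection range, mirroring the jit.findbounds-style logical test.
function inRange(pixel, minColor, maxColor) {
    for (var c = 0; c < pixel.length; c++) {
        if (pixel[c] < minColor[c] || pixel[c] > maxColor[c]) {
            return false;    // outside the range on this channel
        }
    }
    return true;             // within range on every channel
}

Running this test over every pixel of a frame yields exactly the binary (black-and-white) image that the segmentation discussion below refers to.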

Page 15: Color tracking – background

• What is easily identifiable by the human eye as several distinct but touching blobs may be interpreted by software as a single blob.
• A reliable software package will tell you how touching blobs are defined. For example, only pixels adjacent along the vertical or horizontal axis may count as touching, or diagonally adjacent pixels may be included as well (see the sketch after this list).
• Segmentation of the image – separating the blobs of interest from the background and from each other, while eliminating everything else that is not of interest.
• Segmentation usually involves a binarization operation – the result is a black-and-white image.
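A sketch of how the choice of connectivity decides what counts as one blob, using a flood fill over a plain 2D 0/1 array (our own representation and names; real packages work on their native image formats):

// labelBlobs – hypothetical illustration of blob segmentation.
// With useDiagonals = false only 4-connected (horizontal/vertical)
// neighbours touch; with true, diagonal neighbours also join a blob.
function labelBlobs(mask, useDiagonals) {
    var h = mask.length, w = mask[0].length;
    var labels = [], next = 0;
    for (var i = 0; i < h; i++) {
        labels.push([]);
        for (var j = 0; j < w; j++) labels[i].push(0);
    }
    var nbrs = [[1, 0], [-1, 0], [0, 1], [0, -1]];
    if (useDiagonals) nbrs = nbrs.concat([[1, 1], [1, -1], [-1, 1], [-1, -1]]);
    for (var y0 = 0; y0 < h; y0++) {
        for (var x0 = 0; x0 < w; x0++) {
            if (!mask[y0][x0] || labels[y0][x0]) continue;
            next++;                          // found a new blob
            var stack = [[x0, y0]];          // flood fill from this seed
            while (stack.length) {
                var p = stack.pop(), x = p[0], y = p[1];
                if (x < 0 || y < 0 || x >= w || y >= h) continue;
                if (!mask[y][x] || labels[y][x]) continue;
                labels[y][x] = next;
                for (var k = 0; k < nbrs.length; k++)
                    stack.push([x + nbrs[k][0], y + nbrs[k][1]]);
            }
        }
    }
    return { labels: labels, count: next };  // count = number of blobs
}

Two diagonally touching regions come back as one blob only when useDiagonals is true, which is exactly the definition question raised above.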

Page 16: Color tracking – background

• Blob analysis is logical – (generally) performed on a black-and-white image
• Brightness – the rectangle algorithm
  – The rectangle algorithm keeps track of four points in each frame: the topmost, leftmost, rightmost and bottommost points where the brightness exceeds a certain threshold value.
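A sketch of that rectangle algorithm over one frame, again assuming the brightness values are available as a plain 2D array (hypothetical names):

// brightRect – track the extreme points whose brightness exceeds
// the threshold; the four of them define the bounding rectangle.
function brightRect(gray, threshold) {
    var top = -1, bottom = -1, left = -1, right = -1;
    for (var y = 0; y < gray.length; y++) {
        for (var x = 0; x < gray[y].length; x++) {
            if (gray[y][x] > threshold) {
                if (top < 0 || y < top) top = y;
                if (bottom < 0 || y > bottom) bottom = y;
                if (left < 0 || x < left) left = x;
                if (right < 0 || x > right) right = x;
            }
        }
    }
    if (top < 0) return null;   // nothing exceeded the threshold
    return { left: left, top: top, right: right, bottom: bottom };
}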

Page 17: Color tracking – background

• Tracking types:
  (I) objects of a given nature, e.g., cars, people, faces
  (II) objects of a given nature with a specific attribute, e.g., moving cars, walking people, talking heads, the face of a given person
  (III) objects of a priori unknown nature but of specific interest, e.g., moving objects, objects of semantic interest manually picked in the first frame
• (I) and (II) – part of the input video frame is searched against a reference model (image patches, or overall shape [geometry]) describing the appearance of the object.
• (III) – the reference can be extracted from the first frame and kept frozen – color tracking
• Recent color tracking algorithms:
  – MeanShift
  – Continuously Adaptive Mean Shift (CamShift)

Page 18: Color tracking – background

• An advanced application of tracking in stereo – matching
• Starting from a collection of images or a video sequence, the first step consists in relating the different images to each other.
• (Figure: two images with their extracted corners.) Note that it is not possible to find the corresponding corner for every corner, but for many of them it is.
• In our example, we have only one 3D point to deal with – we assume the data obtained from the two cameras are matched.

Page 19: Camera parameters

• Extrinsic and intrinsic parameters
• Extrinsic parameters
  – The orientation of the camera Euclidean co-ordinate system with respect to the world Euclidean co-ordinate system. This relation is given by the rotation matrix R and the translation vector t.
  – Thus there are six extrinsic camera parameters: three rotations and three translations.
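In the standard formulation (our notation, an assumption about the slide's convention), a world point X_w maps into camera coordinates as:

  X_c = R · X_w + t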

Page 20: Camera parameters

• Extrinsic and intrinsic parameters
• Intrinsic parameters – the coefficients of the calibration matrix K
• px and py are the width and the height of the pixels, c = [cx cy 1]^T is the principal point (defined as the intersection of the optical axis and the retinal [image] plane – the center of the image plane), and a is the skew angle, as indicated.
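One common textbook form of K consistent with these symbols (our assumption about the exact convention), with focal length f and a skew term s derived from the skew angle a (s = 0 for rectangular pixels):

      | f/px   s    cx |
  K = |  0   f/py   cy |
      |  0    0      1 |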

Page 21: Stereo 3D localization algorithm

Page 22: Stereo 3D localization algorithm

• Problem: given the tracked object's 2D image coordinates in the left and right camera, estimate its 3D location O(X, Y, Z).

Page 23: Stereo 3D localization algorithm

• Writing the system for the two cameras
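In homogeneous coordinates, the standard form of such a two-camera system (our notation, an assumption about the slide's convention) is one projection equation per camera – two sets of equations in the single unknown world point X:

  lambda_l · x_l = K_l [R_l | t_l] X
  lambda_r · x_r = K_r [R_r | t_r] X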

Page 24: Stereo 3D localization algorithm

• Special case – canonical (binocular) configuration
  – The model has two identical cameras separated only in the X direction by a baseline distance b. The image planes are coplanar in this model.
  – The baseline is aligned to the horizontal co-ordinate axis, the optical axes of the cameras are parallel, the epipoles move to infinity, and the epipolar lines in the image planes are parallel.
• The rotation matrices are identity.
• Extrinsic parameters: b – baseline distance, f – focal length (the triangulation formulas are sketched below).
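For this canonical configuration, the standard triangulation formulas (our notation; origin at the left camera center, disparity x_l − x_r) recover the 3D point as:

  Z = b·f / (x_l − x_r)
  X = x_l·Z / f
  Y = y·Z / f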

Page 25: Stereo 3D localization algorithm

• Intrinsic parameters are ignored here – no calibration!

• We will try to scale the coordinates manually until we get something meaningful.

Page 26: Stereo 3D localization algorithm

• Intersection of the two lines in 3D is not guaranteed
• Derivation using the principle behind CPA (closest points of approach)
  – Looking for the closest points on the lines
  – Solution using parametric equations (sketched below)
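A sketch of the standard CPA solution (our notation): write the two rays parametrically and minimize the distance between them:

  L1(s) = P0 + s·u,   L2(t) = Q0 + t·v,   w0 = P0 − Q0
  a = u·u,  b = u·v,  c = v·v,  d = u·w0,  e = v·w0
  s_c = (b·e − c·d) / (a·c − b²),   t_c = (a·e − b·d) / (a·c − b²)

The closest points are L1(s_c) and L2(t_c); their midpoint CMID = (L1(s_c) + L2(t_c)) / 2 is the estimate used on the next slide.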

Page 27: Stereo 3D localization algorithm

• Finally, we obtain the estimated point CMID, which we declare to be our object location O(X, Y, Z)
• We will use this in code to calculate the object's location vector from the coordinates obtained from color tracking
• It will be programmed in JavaScript and called from Max/MSP/Jitter (a sketch follows)
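A sketch of how that JavaScript could look (function and variable names are ours; the actual exercise code may differ). p0/u describe the ray from the left camera, q0/v the ray from the right camera, all as [x, y, z] arrays:

// cpaMidpoint – closest-points-of-approach midpoint between two rays,
// following the parametric solution sketched on the previous slide.
function dot(a, b) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }

function cpaMidpoint(p0, u, q0, v) {
    var w0 = [p0[0]-q0[0], p0[1]-q0[1], p0[2]-q0[2]];
    var a = dot(u, u), b = dot(u, v), c = dot(v, v);
    var d = dot(u, w0), e = dot(v, w0);
    var denom = a*c - b*b;
    if (Math.abs(denom) < 1e-9) return null;  // rays (nearly) parallel
    var s = (b*e - c*d) / denom;              // parameter on ray 1
    var t = (a*e - b*d) / denom;              // parameter on ray 2
    // midpoint between the two closest points = estimated object location
    return [ (p0[0]+s*u[0] + q0[0]+t*v[0]) / 2,
             (p0[1]+s*u[1] + q0[1]+t*v[1]) / 2,
             (p0[2]+s*u[2] + q0[2]+t*v[2]) / 2 ];
}

In Max, such a function could sit inside a [js] object and be fed the scaled coordinates coming from the two color-tracking chains.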

Page 28: Problems with the approach

• No calibration – no intrinsic parameters taken into account
• Low-end cameras – aberrations
• Low-end cameras – radial distortion
• No guarantee of time synchronization between the left and right images
• In general – approximate/illustrative

Page 29: Implementation in Max/MSP/Jitter