CSP03-04 – Visual input processing
Lecturer: Smilen Dimitrov
Cross-sensorial processing – MED7

Page 1: Visual input processing

Lecturer: Smilen Dimitrov
Cross-sensorial processing – MED7

Page 2: Introduction

• The immobot base exercise
• Work on the visual input
• Goal – object localization in 3D
• Setup:
  – PC
  – Two Logitech QC Zoom webcams

Page 3: Setup

• Setup for a PC:

1. Logitech QuickCam (QC) drivers
2. QuickTime
3. WinVDig (the version that corresponds to the installed version of QuickTime)
4. Max/MSP/Jitter

Page 4: Setup

• Camera parameters

  – Image sensor: 1/4" color, 640 × 480 pixel CMOS
  – Lens type: 3P
  – F-number (F/#): F/2.4
  – Effective focal length: 5.0 mm

Page 5: Setup

• Low-tech configuration – stereo imaging not guaranteed (frame delays)
• Other options:
  – Bumblebee camera: a true stereo camera; FireWire (power issues, drivers)
  – Axis 206 camera: an IP camera (drivers)

Page 6: Goal of the vision processing algorithm

• Object detection – the application needs to detect the presence of a new object whenever it enters the monitored environment.
• Object recognition – once a new object is detected, it needs to be classified to determine its type (e.g., a car versus a truck, a tiger versus a deer).
• Object tracking – assuming the new object is of interest to the application, it can be tracked as it moves through the environment. Tracking involves computing the current location of the object and its trajectory.

Here: color tracking, and estimation of 3D location through two-view geometry (stereopsis).

Page 7: Goal of the vision processing algorithm

Page 8: Color tracking

• Using an algorithm provided by Max/MSP/Jitter – jit.findbounds
• Input – the min and max color range to react to, and the video
• Output – the min and max (x, y) coordinates of the rectangle where the color has been found

Page 9: Color tracking

• jit.findbounds output – rectangle
• Center coordinate:

  x0 = (xmin + xmax) / 2
  y0 = (ymin + ymax) / 2

Page 10: Color tracking – example code

• Can be performed in Max/MSP JavaScript using jsui – slow!
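A minimal sketch of what such a script could look like: a Max [js] object that receives the min and max lists from jit.findbounds on two inlets and outputs the blob center. The file name, inlet layout and variable names are assumptions, not the course's actual code.

// blobcenter.js – hypothetical Max [js] helper, not the original exercise code.
// Assumes the min (x, y) list from jit.findbounds arrives at inlet 0
// and the max (x, y) list at inlet 1; outputs the bounding-box center.
inlets = 2;
outlets = 1;

var minPt = [0, 0];
var maxPt = [0, 0];

function list() {
    var v = arrayfromargs(arguments);
    if (inlet == 0) {
        minPt = v;    // (xmin, ymin)
    } else {
        maxPt = v;    // (xmax, ymax)
    }
    // center of the rectangle: x0 = (xmin + xmax)/2, y0 = (ymin + ymax)/2
    outlet(0, (minPt[0] + maxPt[0]) / 2, (minPt[1] + maxPt[1]) / 2);
}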

Page 11: Color tracking – background

• Video tracking – the process of locating a moving object (or several) over time using a camera. An algorithm analyses the video frames and outputs the location of moving targets within the frame.
  – Video tracking systems usually employ a motion model which describes how the image of the target might change for the different possible motions of the tracked object.
• Video tracking approaches:
  – Blob tracking: segmentation of the object interior (for example blob detection, block-based correlation or optical flow)
  – Contour tracking: detection of the object boundary (e.g. active contours or the Condensation algorithm)
  – Visual feature matching: registration
• Color tracking is a type of blob tracking

Page 12: Color tracking – background

• Blob detection refers to visual modules aimed at detecting points and/or regions in the image that are either brighter or darker than their surroundings. There are two main classes of blob detectors:
  (i) differential methods, based on derivative expressions, and
  (ii) methods based on local extrema in the intensity landscape.
• A blob (binary large object) is an area of touching pixels with the same logical state.
• A group of pixels organized into a structure is commonly called a blob. Problems related to blobs:
  1. Where are the edges?
  2. Where is the center?
  3. How many pixels does it contain?
  4. What is the average pixel intensity?
  5. What is the blob's orientation (angle)?

Page 13: Color tracking – background

• Blob center calculation – simple method
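One way to read such a simple method is as the centroid (average position) of all foreground pixels. A rough sketch in JavaScript, assuming the binary image is available as a plain 2D array of 0/1 values rather than a Jitter matrix (our own representation):

// blobCenter – hypothetical helper, not the slide's original example.
// mask[y][x] is 1 for foreground (blob) pixels, 0 for background.
function blobCenter(mask) {
    var sumX = 0, sumY = 0, count = 0;
    for (var y = 0; y < mask.length; y++) {
        for (var x = 0; x < mask[y].length; x++) {
            if (mask[y][x]) {        // pixel belongs to the blob
                sumX += x;
                sumY += y;
                count++;
            }
        }
    }
    if (count == 0) return null;     // no blob in the image
    return [sumX / count, sumY / count];   // center = mean of coordinates
}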

Page 14: Color tracking – background

• A blob (binary large object) is an area of touching pixels with the same logical state.
  – All pixels in an image that belong to a blob are in a foreground state.
  – All other pixels are in a background state.
  – In a binary image, pixels in the background have values equal to zero, while every nonzero pixel is part of a binary object.
• For jit.findbounds, this logical test of belonging to the blob is whether the color of the currently tested pixel falls within the range set to be detected.
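As an illustration, that per-pixel test could be written like this (hypothetical helper; pixel and range are [r, g, b] arrays, which is our own layout):

// inRange – a pixel is foreground if every channel lies inside the
// detection range, mirroring the jit.findbounds-style logical test.
function inRange(pixel, minColor, maxColor) {
    for (var c = 0; c < pixel.length; c++) {
        if (pixel[c] < minColor[c] || pixel[c] > maxColor[c]) {
            return false;    // outside the range on this channel
        }
    }
    return true;             // within range on every channel
}

Running this test over every pixel of a frame yields exactly the binary (black-and-white) image that the segmentation discussion below refers to.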

Page 15: Color tracking – background

• What is easily identifiable by the human eye as several distinct but touching blobs may be interpreted by software as a single blob.
• A reliable software package will tell you how touching blobs are defined. For example, only pixels adjacent along the vertical or horizontal axis may count as touching, or diagonally adjacent pixels may be included as well (see the sketch after this list).
• Segmentation of the image – separating the blobs of interest from the background and from each other, while eliminating everything else that is not of interest.
• Segmentation usually involves a binarization operation – the result is a black-and-white image.
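A sketch of how the choice of connectivity decides what counts as one blob, using a flood fill over a plain 2D 0/1 array (our own representation and names; real packages work on their native image formats):

// labelBlobs – hypothetical illustration of blob segmentation.
// With useDiagonals = false only 4-connected (horizontal/vertical)
// neighbours touch; with true, diagonal neighbours also join a blob.
function labelBlobs(mask, useDiagonals) {
    var h = mask.length, w = mask[0].length;
    var labels = [], next = 0;
    for (var i = 0; i < h; i++) {
        labels.push([]);
        for (var j = 0; j < w; j++) labels[i].push(0);
    }
    var nbrs = [[1, 0], [-1, 0], [0, 1], [0, -1]];
    if (useDiagonals) nbrs = nbrs.concat([[1, 1], [1, -1], [-1, 1], [-1, -1]]);
    for (var y0 = 0; y0 < h; y0++) {
        for (var x0 = 0; x0 < w; x0++) {
            if (!mask[y0][x0] || labels[y0][x0]) continue;
            next++;                          // found a new blob
            var stack = [[x0, y0]];          // flood fill from this seed
            while (stack.length) {
                var p = stack.pop(), x = p[0], y = p[1];
                if (x < 0 || y < 0 || x >= w || y >= h) continue;
                if (!mask[y][x] || labels[y][x]) continue;
                labels[y][x] = next;
                for (var k = 0; k < nbrs.length; k++)
                    stack.push([x + nbrs[k][0], y + nbrs[k][1]]);
            }
        }
    }
    return { labels: labels, count: next };  // count = number of blobs
}

Two diagonally touching regions come back as one blob only when useDiagonals is true, which is exactly the definition question raised above.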

Page 16: Color tracking – background

• Blob analysis is logical – (generally) performed on a black-and-white image
• Brightness – the rectangle algorithm
  – The rectangle algorithm keeps track of four points in each frame: the topmost, leftmost, rightmost and bottommost points where the brightness exceeds a certain threshold value.
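A sketch of that rectangle algorithm over one frame, again assuming the brightness values are available as a plain 2D array (hypothetical names):

// brightRect – track the extreme points whose brightness exceeds
// the threshold; the four of them define the bounding rectangle.
function brightRect(gray, threshold) {
    var top = -1, bottom = -1, left = -1, right = -1;
    for (var y = 0; y < gray.length; y++) {
        for (var x = 0; x < gray[y].length; x++) {
            if (gray[y][x] > threshold) {
                if (top < 0 || y < top) top = y;
                if (bottom < 0 || y > bottom) bottom = y;
                if (left < 0 || x < left) left = x;
                if (right < 0 || x > right) right = x;
            }
        }
    }
    if (top < 0) return null;   // nothing exceeded the threshold
    return { left: left, top: top, right: right, bottom: bottom };
}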

Page 17: Color tracking – background

• Tracking types:
  (I) objects of a given nature, e.g., cars, people, faces
  (II) objects of a given nature with a specific attribute, e.g., moving cars, walking people, talking heads, the face of a given person
  (III) objects of a priori unknown nature but of specific interest, e.g., moving objects, objects of semantic interest manually picked in the first frame
• (I) and (II) – part of the input video frame is searched against a reference model (image patches, or overall shape [geometry]) describing the appearance of the object.
• (III) – the reference can be extracted from the first frame and kept frozen – color tracking
• Recent color tracking algorithms:
  – MeanShift
  – Continuously Adaptive Mean Shift (CamShift)

Page 18: Color tracking – background

• An advanced application of tracking in stereo – matching
• Starting from a collection of images or a video sequence, the first step consists in relating the different images to each other.
• (Figure: two images with their extracted corners.) Note that it is not possible to find the corresponding corner for every corner, but for many of them it is.
• In our example, we have only one 3D point to deal with – we assume the data obtained from the two cameras are matched.

Page 19: Camera parameters

• Extrinsic and intrinsic parameters
• Extrinsic parameters
  – The orientation of the camera Euclidean co-ordinate system with respect to the world Euclidean co-ordinate system. This relation is given by the rotation matrix R and the translation vector t.
  – Thus there are six extrinsic camera parameters: three rotations and three translations.
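In the standard formulation (our notation, an assumption about the slide's convention), a world point X_w maps into camera coordinates as:

  X_c = R · X_w + t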

Page 20: Camera parameters

• Extrinsic and intrinsic parameters
• Intrinsic parameters – the coefficients of the calibration matrix K
• px and py are the width and the height of the pixels, c = [cx cy 1]^T is the principal point (defined as the intersection of the optical axis and the retinal [image] plane – the center of the image plane), and a is the skew angle, as indicated.
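One common textbook form of K consistent with these symbols (our assumption about the exact convention), with focal length f and a skew term s derived from the skew angle a (s = 0 for rectangular pixels):

      | f/px   s    cx |
  K = |  0   f/py   cy |
      |  0    0      1 |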

Page 21: Stereo 3D localization algorithm

Page 22: Stereo 3D localization algorithm

• Problem: given the tracked object's 2D image coordinates in the left and right camera, estimate its 3D location O(X, Y, Z).

Page 23: Stereo 3D localization algorithm

• Writing the system for the two cameras
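In homogeneous coordinates, the standard form of such a two-camera system (our notation, an assumption about the slide's convention) is one projection equation per camera – two sets of equations in the single unknown world point X:

  lambda_l · x_l = K_l [R_l | t_l] X
  lambda_r · x_r = K_r [R_r | t_r] X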

Page 24: Stereo 3D localization algorithm

• Special case – canonical (binocular) configuration
  – The model has two identical cameras separated only in the X direction by a baseline distance b. The image planes are coplanar in this model.
  – The baseline is aligned to the horizontal co-ordinate axis, the optical axes of the cameras are parallel, the epipoles move to infinity, and the epipolar lines in the image planes are parallel.
• The rotation matrices are identity.
• Extrinsic parameters: b – baseline distance, f – focal length (the triangulation formulas are sketched below).
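For this canonical configuration, the standard triangulation formulas (our notation; origin at the left camera center, disparity x_l − x_r) recover the 3D point as:

  Z = b·f / (x_l − x_r)
  X = x_l·Z / f
  Y = y·Z / f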

Page 25: Stereo 3D localization algorithm

• Intrinsic parameters are ignored here – no calibration!

• We will try to scale the coordinates manually until we get something meaningful.

Page 26: Stereo 3D localization algorithm

• Intersection of the two lines in 3D is not guaranteed
• Derivation using the principle behind CPA (closest points of approach)
  – Looking for the closest points on the lines
  – Solution using parametric equations (sketched below)
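A sketch of the standard CPA solution (our notation): write the two rays parametrically and minimize the distance between them:

  L1(s) = P0 + s·u,   L2(t) = Q0 + t·v,   w0 = P0 − Q0
  a = u·u,  b = u·v,  c = v·v,  d = u·w0,  e = v·w0
  s_c = (b·e − c·d) / (a·c − b²),   t_c = (a·e − b·d) / (a·c − b²)

The closest points are L1(s_c) and L2(t_c); their midpoint CMID = (L1(s_c) + L2(t_c)) / 2 is the estimate used on the next slide.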

Page 27: Stereo 3D localization algorithm

• Finally, we obtain the estimated point CMID, which we declare to be our object location O(X, Y, Z)
• We will use this in code to calculate the object's location vector from the coordinates obtained from color tracking
• It will be programmed in JavaScript and called from Max/MSP/Jitter (a sketch follows)
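A sketch of how that JavaScript could look (function and variable names are ours; the actual exercise code may differ). p0/u describe the ray from the left camera, q0/v the ray from the right camera, all as [x, y, z] arrays:

// cpaMidpoint – closest-points-of-approach midpoint between two rays,
// following the parametric solution sketched on the previous slide.
function dot(a, b) { return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]; }

function cpaMidpoint(p0, u, q0, v) {
    var w0 = [p0[0]-q0[0], p0[1]-q0[1], p0[2]-q0[2]];
    var a = dot(u, u), b = dot(u, v), c = dot(v, v);
    var d = dot(u, w0), e = dot(v, w0);
    var denom = a*c - b*b;
    if (Math.abs(denom) < 1e-9) return null;  // rays (nearly) parallel
    var s = (b*e - c*d) / denom;              // parameter on ray 1
    var t = (a*e - b*d) / denom;              // parameter on ray 2
    // midpoint between the two closest points = estimated object location
    return [ (p0[0]+s*u[0] + q0[0]+t*v[0]) / 2,
             (p0[1]+s*u[1] + q0[1]+t*v[1]) / 2,
             (p0[2]+s*u[2] + q0[2]+t*v[2]) / 2 ];
}

In Max, such a function could sit inside a [js] object and be fed the scaled coordinates coming from the two color-tracking chains.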

Page 28: Problems with the approach

• No calibration – no intrinsic parameters taken into account
• Low-end cameras – aberrations
• Low-end cameras – radial distortion
• No guarantee of time synchronization between the left and right images
• In general – approximate/illustrative

Page 29: Implementation in Max/MSP/Jitter