25
CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Embed Size (px)

Citation preview

Page 1: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

CS 376bIntroduction to Computer Vision

04 / 28 / 2008

Instructor: Michael Eckmann

Page 2: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 376b - Spring 2008

Today’s Topics• Comments/Questions

• Chapter 11 – 2D matching– 2D transformations

• shear• reflection

– General Affine– matching in 2d (models to images)

• several methods

Page 3: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 376 - Spring 2007

Geometric Transformations (2D)• Translations

• Rotations

• Scaling

• Homogeneous Coordinates

• Shearing

• Reflections

Page 4: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 376 - Spring 2007

Homogeneous Coordinates• Transformations on homogeneous coordinates

• TRANSLATION: (x2, y

2) = (x

1 + t

x , y

1 + t

y )

( x2 ) = ( 1 0 t

x ) ( x

1 )

( y2 ) ( 0 1 t

y ) ( y

1 )

( 1 ) ( 0 0 1 ) ( 1 )

• SCALING: (x2, y

2) = (s

xx

1 , s

yy

1 )

( x2 ) = ( s

x 0 0 ) ( x

1 )

( y2 ) ( 0 s

y 0 ) ( y

1 )

( 1 ) ( 0 0 1 ) ( 1 )

Page 5: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 376 - Spring 2007

Homogeneous Coordinates• Transformations on homogeneous coordinates

• ROTATION: (x2, y

2) = (x

1 cos(B) – y

1 sin(B) , y

1 cos(B) + x

1 sin(B)

)

( x2 ) = (cos(B)

– sin(B) 0 ) ( x

1 )

( y2 ) (sin(B)

cos(B) 0 ) ( y

1 )

( 1 ) ( 0 0 1 ) ( 1 )

• These three transform matrices are sometimes written as– T(t

x,t

y)

– S(sx,s

y)

– R(B)

Page 6: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 376 - Spring 2007

from Shapiro and Stockman fig. 11.8

Page 7: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 376 - Spring 2007

from Shapiro and Stockman fig. 11.8 & table 11.1

Page 8: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 376 - Spring 2007

from Shapiro and Stockman fig. 11.8 & table 11.2

Page 9: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 376 - Spring 2007

using control points to determine transformation

• For this example, the book assumes that the scaling is already taken care of

due to some controlled imaging environment, so we only need to compute

the rotation and translation which is a matrix of the form:

( u1 ) = (cos(B)

– sin(B) u

0 ) ( x

1 )

( v1 ) (sin(B)

cos(B) v

0 ) ( y

1 )

( 1 ) ( 0 0 1 ) ( 1 )

• Given the matches we already have, we can take a pair of matches and compute the angle to rotate like so. Assume ( u

1 , v

1 ) and ( x

1 , y

1 ) match

and ( u2 ,

v2 ) and ( x

2 , y

2 ) match, we can compute the angle of the vector

( u2 ,

v2 ) - ( u

1 , v

1 ) and the angle of the vector ( x

2 , y

2 ) - ( x

1 , y

1 ) and

subtract them to get the angle B.

Page 10: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 376 - Spring 2007

using control points to determine transformation

• Given the matches we already have, we can take a pair of matches and

compute the angle to rotate like so. Assume ( u1 ,

v1 ) and ( x

1 , y

1 ) match

and ( u2 ,

v2 ) and ( x

2 , y

2 ) match, we can compute the angle of the vector

( u2 ,

v2 ) - ( u

1 , v

1 ) and the angle of the vector ( x

2 , y

2 ) - ( x

1 , y

1 ) and

subtract them to get the angle B.

• Let's do this for this pair of matches:– (8,17) matches (10,12) and

– (16,26) matches (10,24)

– a1 = arctan((26-17)/(16-8)) = 0.844 radians

– a2 = arctan((24-12)/(10-10)) = 1.571 radians

– B = a2-a1 = 0.727 radians

Page 11: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 376 - Spring 2007

using control points to determine transformation

• Now that we know B, how many pairs of matches are necessary to

determine this transform:

( u1 ) = (cos(B)

– sin(B) u

0 ) ( x

1 )

( v1 ) (sin(B)

cos(B) v

0 ) ( y

1 )

( 1 ) ( 0 0 1 ) ( 1 )

Page 12: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 376 - Spring 2007

using control points to determine transformation

• Now that we know B, how many pairs of matches are necessary to

determine this transform:

( u1 ) = (cos(B)

– sin(B) u

0 ) ( x

1 )

( v1 ) (sin(B)

cos(B) v

0 ) ( y

1 )

( 1 ) ( 0 0 1 ) ( 1 )

• Just need to use 1 pair of matching points (2 equations and 2 unknowns) to determine the translation in u and v.

Page 13: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 376 - Spring 2007

Shearing• Y shear:

(1 0 0)

(shy 1 0)

(0 0 1)

• X shear:

(1 shx 0)

(0 1 0)(0 0 1)

• Examples of what these do on the board to a rectangle.

Page 14: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 376 - Spring 2007

Reflections• About the y-axis:

(-1 0 0)(0 1 0)(0 0 1)

• About the x-axis:(1 0 0)(0 -1 0)(0 0 1)

• About the z-axis:(-1 0 0)(0 -1 0)(0 0 1)

• Examples of what these do on the board.

• About the line y=x:(0 1 0)(1 0 0)(0 0 1)

• About the line y=-x:(0 -1 0)(-1 0 0)(0 0 1)

Page 15: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 376 - Spring 2007

General Affine Transformations• A general affine transformation allows any combination of translation,

rotation, scaling, shearing and reflection. The general form of an

affine transformation matrix is therefore:

(a11

a12

a13

)

(a21

a22

a23

)(0 0 1 )

• How many pairs of matching points would it take to determine a general affine transformation?

Page 16: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 376 - Spring 2007

General Affine Transformations• We have 6 unknowns, so we need 6 equations (or more) which can be

generated from 3 matching pairs of points. How does that give us 6

equations?

• Since there may be some error in the matches or coordinates, a better

way to determine the transform would be to use a least squares

approach (again if your error is Gaussian with mean=0).

• Similarly to how we computed a least squares error equation for lines,

we can do:

• error(a11

, a12

, a13

, a21

, a22

, a23

) =

Sum((a11

xj + a

12y

j + a

13 - u

j)2 + (a

21x

j + a

22y

j + a

23 - v

j)2)

Page 17: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 376 - Spring 2007

General Affine Transformations(equations from Shapiro and Stockman)

Page 18: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 376 - Spring 2007

2D object recognition via affine mapping• Our text describes 3 techniques to determine an affine transformation

from a model to an image.

• Local Feature Focus method

• Pose Clustering

• Geometric Hashing

Page 19: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 376 - Spring 2007

Local Feature Focus method• This is a process to determine if an object model appears in an image

and if so, what is the general affine transform between the model and

the image.

• The model has a set of focus features, which are major features that

should be able to be found in an image of this object easily (as long as

they are not occluded).

• The model also has a set of nearby features for each focus feature to

allow verification of a correct focus feature match and to help

determine position and orientation.

• See figure 11.12.

Page 20: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 376 - Spring 2007

Pose Clustering• This is another process to determine if an object model appears in an

image and if so, what is the general affine transform between the

model and the image.

• The model has a set of features and the image has a set of features.

These need to be matched. The general idea of pose clustering is to

take every possible pair of matching points and compute an RST

transform then check for clusters of RST transforms.

• To get less redundancy and more accuracy, instead of doing every

possible pair we can– filter our features by type, where a certain type of feature will only

match a feature of the same type (ex. fig. 11.13)

Page 21: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 376 - Spring 2007

Pose Clustering• To get less redundancy and more accuracy, instead of doing every

possible pair we can– filter our features by type, where a certain type of feature will only

match a feature of the same type

– then only use pairs of matching points that satisfy the above

• Compute the RST transforms as before but now for a smaller set of matching pairs.

• For each RST transform with specific computed parameters, count the number of other RST transforms that are within some distance of the transform parameters.

• There are n-1 distance computations for each of n parameter sets of RST transforms.

• Or can use binning (like Hough) --- this will be faster but bins need to be chosen well to capture similar parameter sets in the same bin.

Page 22: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 376 - Spring 2007

Geometric Hashing• The last two procedures allowed us to determine if a particular model

object was found in an image. What if we had many models that we

wanted to check against our images?

• If use pose clustering or local feature focus method, then each model

would have to be checked separately to determine if it's in the image.

• Geometric hashing allows us to check among a large database of

models to determine if any of them are in the image.

Page 23: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 376 - Spring 2007

Geometric Hashing• It requires a large amount of offline preprocessing of the models as

well as a fair amount of space. But this allows for fairly fast online

recognition in the average case.

• Given: large database of models described by feature points in some

2d coordinate system and an image with features extracted from it.

• Assuming affine transformations only, we want to know which

model(s) are in the image and what position and orientation the models

are in in the image.

Page 24: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 376 - Spring 2007

Geometric Hashing• Each model M is stored as an ordered set of feature points.

• Any 3 non-collinear points E = (A , B, C) form a basis for an affine

coordinate system.

• D = xi(B-A) + eta(C-A) + A

• Any point D in M can be represented as (xi,eta) pairs w.r.t. the basis E.

These (xi,eta) pairs are the affine coordinates of the point D.

• If we apply an affine transformation to the points in M, (xi,eta) will be

the same for each point in M, given the same basis E.

Page 25: CS 376b Introduction to Computer Vision 04 / 28 / 2008 Instructor: Michael Eckmann

Michael Eckmann - Skidmore College - CS 376 - Spring 2007

Geometric Hashing• A hash table is created, indexed on (xi,eta), and stores a list of all the

M,E pairs where some D in M has (xi,eta) affine coordinates w.r.t. E.

• The online recognition step uses the hash table above and an

Accumulator array indexed on M,E pairs.