
Page 1:

Introduction to Robot Vision

Ziv Yaniv
Computer Aided Interventions and Medical Robotics,

Georgetown University

Page 2:

Vision

The special sense by which the qualities of an object (as color, luminosity, shape, and size) constituting its appearance are perceived through a process in which light rays entering the eye are transformed by the retina into electrical signals that are transmitted to the brain via the optic nerve.

[Merriam-Webster dictionary]

Page 3:

The Sensor

• C-arm X-ray

• Endoscope

• Webcam

• Single Lens Reflex (SLR) camera

Page 4:

The Sensor

Model: Pin-hole Camera, Perspective Projection

[Figure: pin-hole camera geometry, showing the image plane, optical axis, focal point, principal point, and the camera and image (x, y, z) coordinate axes.]

Page 5:

Machine Vision

Goal: Obtain useful information about the 3D world from 2D images.

Model:

images → regions, textures, corners, lines, … → 3D geometry, object identification, activity detection, … → actions

Page 6:

Machine Vision

• Low level (image processing):
– image filtering (smoothing, histogram modification, …)
– feature extraction (corner detection, edge detection, …)
– stereo vision
– shape from X (shading, motion, …)
– …

• High level (machine learning/pattern recognition):
– object detection
– object recognition
– clustering
– …

Goal: Obtain useful information about the 3D world from 2D images.

Page 7:

Machine Vision

• How hard can it be?

Page 8:

Machine Vision

• How hard can it be?

Page 9:

Robot Vision

1. Simultaneous Localization and Mapping (SLAM)

2. Visual Servoing.

Page 10:

Robot Vision

1. Simultaneous Localization and Mapping (SLAM) – create a 3D map of the world and localize within this map.

NASA stereo vision image processing, as used by the MER Mars rovers

Page 11:

Robot Vision

1. Simultaneous Localization and Mapping (SLAM) – create a 3D map of the world and localize within this map.

“Simultaneous Localization and Mapping with Active Stereo Vision”, J. Diebel, K. Reuterswärd, S. Thrun, J. Davis, R. Gupta, IROS 2004.

Page 12:

Robot Vision

2. Visual Servoing – Using visual feedback to control a robot:

a) Image-based systems: desired motion is derived directly from the image.

“An image-based visual servoing scheme for following paths with nonholonomic mobile robots”, A. Cherubini, F. Chaumette, G. Oriolo, ICARCV 2008.

Page 13:

Robot Vision

2. Visual Servoing – Using visual feedback to control a robot:

b) Position-based systems: desired motion is computed from a 3D reconstruction estimated from the image.

Page 14:

System Configuration

• Difficulty of similar tasks in different settings varies widely:
– How many cameras?
– Are the cameras calibrated?
– What is the camera-robot configuration?
– Is the system calibrated (hand-eye calibration)?

Common configurations:

[Figure: common camera-robot configurations, each drawn with its camera and robot coordinate frames (x, y, z).]

Page 15:

System Characteristics

• The greater the control over the system configuration and environment, the easier it is to execute a task.

• System accuracy is directly dependent upon model accuracy – what accuracy does the task require?

• All measurements and derived quantitative values have an associated error.

Page 16:

Stereo Reconstruction

• Compute the 3D location of a point in the stereo rig's coordinate system:
– Rigid transformation between the two cameras is known.
– Cameras are calibrated – given a point in the world coordinate system we know how to map it to the image.
– Same point localized in the two images.

[Figure: Camera 1 and Camera 2 with the transformation T21 between them, and the world coordinate system.]

Page 17:

Commercial Stereo Vision

Polaris Vicra infra-red system (Northern Digital Inc.)

MicronTracker visible light system (Claron Technology Inc.)

Page 18:

Commercial Stereo Vision

Images acquired by the Polaris Vicra infra-red stereo system:

[Figure: left image and right image.]

Page 19:

Stereo Reconstruction

• Wide or short baseline – a trade-off between reconstruction accuracy and the difficulty of point matching.

[Figure: stereo rigs (Camera 1, Camera 2) with wide and short baselines.]

Page 20:

Camera Model

• Points P, p, and O, given in the camera coordinate system, are collinear.

[Figure: pin-hole geometry with the focal point O at the origin, a 3D point P = [X, Y, Z], its image p = [x, y, f], and focal length f.]

There is a number λ for which O + λP = p; since the image plane lies at z = f, λ = f/Z, therefore:

$$x = \frac{fX}{Z}, \qquad y = \frac{fY}{Z}$$

or, in homogeneous coordinates:

$$\begin{bmatrix} u \\ v \\ w \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$
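As a sanity check, here is the same projection in a few lines of numpy (the focal length and point are arbitrary illustrative values):

```python
import numpy as np

f = 0.05                                  # focal length, arbitrary units
M = np.array([[f, 0, 0, 0],
              [0, f, 0, 0],
              [0, 0, 1, 0]])              # ideal pin-hole projection
P = np.array([0.2, 0.1, 1.0, 1.0])        # homogeneous point [X, Y, Z, 1]
u, v, w = M @ P
x, y = u / w, v / w                       # x = f*X/Z, y = f*Y/Z
print(x, y)                               # (0.01, 0.005)
```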

Page 21:

Camera Model

Transform the pixel coordinates from the camera coordinate system to the image coordinate system:
• Image origin (principal point) is at [x0, y0] relative to the camera coordinate system.
• Need to change from metric units to pixels, with scaling factors kx, ky:

$$\begin{bmatrix} u' \\ v' \\ w' \end{bmatrix} = \begin{bmatrix} f k_x & 0 & x_0 & 0 \\ 0 & f k_y & y_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

[Figure: image plane with axes x, y, the principal point, and a pixel at [x', y'].]

• Finally, the image coordinate system may be skewed, resulting in:

$$\begin{bmatrix} u' \\ v' \\ w' \end{bmatrix} = \begin{bmatrix} f k_x & s & x_0 & 0 \\ 0 & f k_y & y_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

Page 22:

Camera Model

• As our original assumption was that points are given in the camera coordinate system, a complete projection matrix is of the form:

$$K = \begin{bmatrix} f k_x & s & x_0 \\ 0 & f k_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}$$

$$M_{3\times4} = K\,[\,I_{3\times3}\;|\;0_{3\times1}\,]\begin{bmatrix} R & -R\tilde{C} \\ 0 & 1 \end{bmatrix} = K R\,[\,I\;|\;-\tilde{C}\,]$$

where C̃ is the camera origin in the world coordinate system.

$$M_{3\times4} = \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \end{bmatrix} = \begin{bmatrix} M_1^T \\ M_2^T \\ M_3^T \end{bmatrix}$$

• How many degrees of freedom does M have?
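A minimal numpy sketch of this composition (all parameter values below are invented for illustration): build K from the intrinsics, assemble M = KR[I | -C̃], and project a homogeneous world point.

```python
import numpy as np

def intrinsic_matrix(f, kx, ky, x0, y0, s=0.0):
    """Build the 3x3 intrinsic matrix K from the slide's parameters."""
    return np.array([[f * kx, s,      x0],
                     [0.0,    f * ky, y0],
                     [0.0,    0.0,    1.0]])

def projection_matrix(K, R, C):
    """M = K R [I | -C], with C the camera origin in world coordinates."""
    return K @ R @ np.hstack([np.eye(3), -C.reshape(3, 1)])

# Illustrative values only.
K = intrinsic_matrix(f=0.05, kx=20000.0, ky=20000.0, x0=320.0, y0=240.0)
R = np.eye(3)                       # camera axes aligned with the world
C = np.array([0.0, 0.0, -2.0])      # camera two units behind the origin
M = projection_matrix(K, R, C)      # 3x4 matrix, 11 DOF up to scale

u, v, w = M @ np.array([0.1, 0.2, 1.0, 1.0])   # homogeneous world point
print(u / w, v / w)                            # pixel coordinates
```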

Page 23:

Camera Calibration

• Given pairs of points, $p_i^T = [x, y, w]$, $P_i^T = [X, Y, Z, W]$, in homogeneous coordinates we have: $p \sim MP$.

• As the points are in homogeneous coordinates, the vectors p and MP are not necessarily equal; they have the same direction but may differ by a non-zero scale factor, hence:

$$p \times MP = 0$$

Our goal is to estimate M.

[Figure: calibration object/world coordinate system, camera coordinate system, image coordinate system, and principal point.]

Page 24:

Camera Calibration

• After a bit of algebra we have $Am = 0$:

$$\begin{bmatrix} 0^T & -w_i P_i^T & y_i P_i^T \\ w_i P_i^T & 0^T & -x_i P_i^T \\ -y_i P_i^T & x_i P_i^T & 0^T \end{bmatrix} \begin{bmatrix} M_1 \\ M_2 \\ M_3 \end{bmatrix} = 0$$

• The three equations are linearly dependent:

$$A_3 = -\frac{x_i}{w_i} A_1 - \frac{y_i}{w_i} A_2$$

• Each point pair contributes two equations.

• Exact solution: M has 11 degrees of freedom, requiring a minimum of n = 6 pairs.

• Least squares solution: For n > 6, minimize ||Am|| s.t. ||m|| = 1.
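A sketch of this estimation via SVD, the standard way to minimize ||Am|| subject to ||m|| = 1 (function and variable names are mine; image points are assumed to have w = 1):

```python
import numpy as np

def calibrate_dlt(img_pts, world_pts):
    """Estimate the 3x4 projection matrix M from n >= 6 point pairs.

    img_pts:   (n, 2) pixel coordinates [x, y]
    world_pts: (n, 3) world coordinates [X, Y, Z]
    """
    rows = []
    for (x, y), XYZ in zip(img_pts, world_pts):
        P = np.append(XYZ, 1.0)          # homogeneous world point, w = 1
        # Two linearly independent equations per pair (from p x MP = 0):
        rows.append(np.concatenate([np.zeros(4), -P, y * P]))
        rows.append(np.concatenate([P, np.zeros(4), -x * P]))
    A = np.array(rows)
    # Minimizer of ||Am|| s.t. ||m|| = 1: the right singular vector
    # associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)
```

With six or more point pairs in general position (not all coplanar), this recovers M up to scale.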

Page 25:

Obtaining the Rays

• Camera location in the calibration object's coordinate system, C, is given by the one-dimensional right null space of the matrix M (MC = 0).

• A 3D homogeneous point P = M⁺p is on the ray defined by p and the camera center [it projects onto p: MM⁺p = Ip = p, where M⁺ is the pseudo-inverse of M].

• These two points define our ray in the world coordinate system.

• As both cameras were calibrated with respect to the same coordinate system the rays will be in the same system too.
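Both steps reduce to a few lines with numpy's SVD and pseudo-inverse (a sketch; the names are mine):

```python
import numpy as np

def camera_center(M):
    """Homogeneous camera center C: the right null space of M (MC = 0)."""
    _, _, Vt = np.linalg.svd(M)
    return Vt[-1]                      # last right singular vector

def backproject(M, p):
    """Two homogeneous points defining the ray of pixel p: C and M+ p."""
    P = np.linalg.pinv(M) @ p          # P = M+ p, projects back onto p
    return camera_center(M), P
```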

Page 26:

Intersecting the Rays

11111 )( nar tt

22222 )( nar tt 1a 1n

2n2a

2

21

21T

2121

)())((

nn

nnnaa

t 2

21

21T

1122

)())((

nn

nnnaa

t

2

)]()([ 2211 tt rr
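A direct transcription of these formulas (a sketch; it normalizes the directions first, since the formulas assume unit vectors, and does not guard against parallel rays):

```python
import numpy as np

def intersect_rays(a1, n1, a2, n2):
    """Midpoint of the closest points on two rays r_i(t) = a_i + t * n_i."""
    n1 = n1 / np.linalg.norm(n1)   # formulas assume unit directions
    n2 = n2 / np.linalg.norm(n2)
    d = a2 - a1
    c = n1 @ n2                    # cosine of the angle between the rays
    denom = 1.0 - c * c            # zero for parallel rays
    t1 = d @ (n1 - c * n2) / denom
    t2 = d @ (c * n1 - n2) / denom
    return ((a1 + t1 * n1) + (a2 + t2 * n2)) / 2.0
```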

Page 27:

World vs. Model

• Actual cameras most often don't follow the ideal pin-hole model; they usually exhibit some form of distortion (barrel, pin-cushion, S).

• Sometimes the world changes to fit your model: improvements in camera/lens quality can improve model performance.

[Figure: old image-intensifier X-ray (pin-hole + distortion), replaced by flat-panel X-ray (pin-hole).]
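The slides don't specify a distortion model, but for context, a common low-order polynomial radial model looks like the sketch below (the coefficients k1, k2 are hypothetical; this is a generic model, not the deck's method):

```python
def radial_distort(x, y, k1, k2):
    """Apply polynomial radial distortion about the principal point.

    Under the usual sign convention, positive k1, k2 push points outward
    (pin-cushion) and negative values pull them inward (barrel).
    """
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * scale, y * scale
```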

Page 28:

Additional Material

• Code:
– Camera Calibration Toolbox for MATLAB (Jean-Yves Bouguet): http://www.vision.caltech.edu/bouguetj/calib_doc/

• Machine Vision:
– “Multiple View Geometry in Computer Vision”, R. Hartley and A. Zisserman, Cambridge University Press.
– “Machine Vision”, R. Jain, R. Kasturi, B. G. Schunck, McGraw-Hill.

• Robot Vision:
– “Simultaneous Localization and Mapping: Part I”, H. Durrant-Whyte, T. Bailey, IEEE Robotics and Automation Magazine, Vol. 13(2), pp. 99-110, 2006.
– “Simultaneous Localization and Mapping (SLAM): Part II”, T. Bailey, H. Durrant-Whyte, IEEE Robotics and Automation Magazine, Vol. 13(3), pp. 108-117, 2006.
– “Visual Servo Control Part I: Basic Approaches”, F. Chaumette, S. Hutchinson, IEEE Robotics and Automation Magazine, Vol. 13(4), pp. 82-90, 2006.
– “Visual Servo Control Part II: Advanced Approaches”, F. Chaumette, S. Hutchinson, IEEE Robotics and Automation Magazine, Vol. 14(1), pp. 109-118, 2007.