
* Corresponding author. Tel.: +33-1-69-47-75-04; fax: +33-1-69-47-75-98. E-mail address: shaheen@cemif.univ-evry.fr (M. Shaheen).

Control Engineering Practice 9 (2001) 375–385

Visual command of a robot using 3D-scene reconstruction in an augmented reality system

Mudar Shaheen*, Malik Mallem, Florent Chavand
Université d'Evry/CEMIF, CE 1455 Courcouronnes, 40, Rue du Pelvoux, F-91020 Evry Cedex, France

Received 10 January 2000; accepted 11 August 2000

Abstract

This paper presents an augmented reality system that has been developed to reconstruct 3D scenes from a single camera's view. The camera is assumed to be calibrated in the world frame of reference. The 3D-model is known and correctly matched to its 2D-image. A 3D-object pose-recovery algorithm that combines three methods to reach better robustness and accuracy has been developed. A comparison between these methods is given, as well as a discussion of their advantages and drawbacks. Experimental results on real images that demonstrate the robustness and accuracy of the proposed system are presented, as well as the visual command of a robot based upon the reconstructed scene. © 2001 Elsevier Science Ltd. All rights reserved.

Keywords: Telerobotics; Augmented reality; Object recognition; 3D-object pose recovery

1. Introduction

The problem of finding an object's pose consists of determining the position and orientation of a 3D-object with respect to a camera or a predefined frame of reference. Many applications of computer vision are confronted with this problem: camera calibration, mobile robot localisation, cartography, object tracking, object recognition and augmented reality. Finding an object's pose can be defined more precisely as follows: given a 3D-model (expressed by points, lines, etc.) of an object in some frame of reference and its projection onto a camera image, and given the camera's model and its parameters, determine the rigid transformation (rotation and translation matrix) between the new attitude of the object and the frame of reference.

Several research works have attempted to solve this problem, guided by the criteria of precision, efficiency, speed and robustness. The existing methods can be classified into two groups:

(1) Analytic methods (also called closed-form methods): these methods are applied when a limited number of matching primitives (point to point or line to line) is available. Hung, Yeh, and Harwood (1985) and Horaud, Conio, Leboulleux, and Lacolle (1989) apply their algorithms to four matching points. Dhome, Richetin, Lapreste, and Rives (1989) use three line matches. Forsyth et al. (1991) use a pair of coplanar conics. When the number of matches exceeds these limit values, the non-linear system of equations becomes highly redundant. The absence of a direct solution for such a system has led to the next group.

(2) Numerical methods: optimisation methods are used to solve a non-linear system of equations in which the number of equations is higher than the number of unknowns (three for rotation and three for translation). Lowe (1987, 1991) uses the Newton–Raphson method to find a solution; this has two drawbacks: a good initial guess of the solution must be provided to launch the algorithm, and an expensive computation of the Jacobian pseudo-inverse matrix is required at each iteration. Roberts (1965) provides a direct method for finding the perspective projection matrix which describes the relation between 3D-points and their images; six matches are necessary to find the 11 parameters of the matrix. Phong, Horaud, Yassine, and Tao (1995) use the trust-region optimisation method, with the quaternion used to represent the error criterion. N'zi, Mallem, and Chavand (1997) and Dementhon and Davis (1995) convert the problem to a linear system that can be solved in a small number of iterations, but their algorithms do not


converge in special situations, such as a small object–camera distance for Dementhon and Davis (1995) and large movements of the object for N'zi et al. (1997). It should be noticed that the optimisation methods suffer from sticky local minima and hence do not guarantee a globally minimal solution.

The authors' development brings together the advantages of these methods to enhance the efficiency and precision of the results as well as the execution time.

In Section 2, the three methods used in the implementation to determine the object pose are presented. The mixed algorithm is presented in Section 3. Section 4 describes the experimental protocol, the results obtained and a comparison between these methods. Section 5 demonstrates the application of object pose finding in an object recognition system. In Section 6, a robot visual-control application based upon the 3D-scene reconstruction is illustrated. Finally, conclusions are drawn in Section 7.

2. Object pose recovery methods

In this section, the three methods used to find the object's pose are presented. The first is analytical and based upon a minimal subset of matching features between the object model and its image. The last two are numerical methods, each with its own convergence characteristics. Details of the formulation of the object pose-recovery problem can be found in Moreau, Mallem, Chavand, and N'zi (1997) and in N'zi et al. (1997).

2.1. Three-segment geometric method

For simplicity of resolution, object pose determination is in general subdivided into two steps: determination of the object orientation, then of the object translation. Direct determination of the orientation is complex because it introduces three angles. To simplify the equations, the approach presented by Dhome et al. (1989) is used: it consists of introducing intermediate frames of reference, which permits working with only two angles.

Different configurations of three matching segments are presented in Faugeras (1993). In the authors' implementation, a simple three-point configuration is used to apply the method. The translation can be deduced easily once the rotation matrix is obtained (N'zi et al., 1997).

The fact that this method uses only a minimal subset of matching points implies that the resulting transformation matrix is not precise and represents only an initial guess of the object's pose. To enhance this result, a complementary method taking all matching points into account is applied.

2.2. Small movement linear method

The objective of this method is to correct the object pose given by the previous method. The rotation error is calculated using all the matching points. The rotation vector is used to compute this error; then the rotation R and the translation t are found.

Rotation vector: Rotation can be represented using three angles θ_x, θ_y and θ_z, named nautical angles (see Appendix A). Each of these angles defines the rotation around one axis of a given frame of reference. An advantage of this formalism is that an angle directly represents the command of one of the joints of a robot. One drawback is that nautical angles are not commutative: the final object's pose depends on the order in which the three rotations are applied. Also, they give more than one solution to the inverse problem, as well as some singular positions.

Another way to express the rotation is to use the rotation vector, of which

• the direction represents the rotation axis (u),
• the norm represents the rotation angle value (θ).

The rotation vector is expressed by r = θ·u, where u is the unit vector. The rotation matrix R(u, θ) associated with the vector r is

R(u, θ) = I_3×3 + sin(θ) X(u) + (1 − cos(θ)) (X(u))², (1)

where

X(u) = |  0    −u_z   u_y  |
       |  u_z   0    −u_x  |
       | −u_y   u_x   0    |

is the cross-product matrix, with u = [u_x u_y u_z]ᵀ. This representation has an interesting interpretation: a solid movement can be composed of a translation of its gravity centre and a rotation about an axis passing through this centre.

As in the previous method, it is necessary to compute both the rotation R and the translation t in order to recover the small movement of an object.

R determination: Let v be the unit vector lying on one polyhedron edge, and let the polyhedron undergo a small movement. The vector after movement, v′, can be written as

v′ = v + dv with dv = r × v, (2)

where × is the cross product. This equation approximates the rotation arc by its tangent (Fig. 1). For a point p, this expression becomes

dp = t + r × p. (3)


Fig. 1. Small-angle approximation.

In fact, this consists of linearising the motion with respect to the three coordinates of r, so dv is the small linear movement of the object.

t determination: In Eq. (2) the translation is not taken into account, so it is introduced in the following equation, valid for a point. The computation of the translation is presented below.

Each 3D-point p(x, y, z) undergoing a rotation R and a translation t is projected on the camera image plane to give its image:

[su sv s]ᵀ = C_3×4 | R_3×3  t_3×1 | [x y z 1]ᵀ, (4)
                   | 0 0 0    1   |

where C_3×4 = (c_ij), i = 1, …, 3, j = 1, …, 4, is the camera pinhole model.

p and its image (u, v) on a camera are related by a visual ray, or epipolar (the line segment passing through them), which is obtained by developing relation (4) as follows:

n₁ᵀp + a₁ = 0,
n₂ᵀp + a₂ = 0. (5)

Eq. (5) represents two independent equations which express the epipolar passing through the point p and its image (u, v), where

n₁ = [c₁₁ − u·c₃₁, c₁₂ − u·c₃₂, c₁₃ − u·c₃₃]ᵀ,
n₂ = [c₂₁ − v·c₃₁, c₂₂ − v·c₃₂, c₂₃ − v·c₃₃]ᵀ,
a₁ = c₁₄ − u·c₃₄, a₂ = c₂₄ − v·c₃₄,

which after derivation gives (in the case of small movements of rotation and translation)

dn₁ᵀp + n₁ᵀdp + da₁ = 0,
dn₂ᵀp + n₂ᵀdp + da₂ = 0. (6)

dp is given by Eq. (3). Using the following mixed-product property of three vectors i, j and k: i·(j × k) = j·(k × i), the last two equations become

n₁ᵀt + (p × n₁)ᵀr + dn₁ᵀp + da₁ = 0,
n₂ᵀt + (p × n₂)ᵀr + dn₂ᵀp + da₂ = 0, (7)

where r expresses the rotation and t the translation. Eq. (7) can be expressed in the form Ax = B with x = (a b c r_x r_y r_z)ᵀ, where

r = [r_x r_y r_z]ᵀ and t = [a b c]ᵀ

are the unknowns.

Solving this linear system allows r and t to be determined. To obtain the six unknowns, a minimum of n = 3 points is necessary, and these three points should have distinct, non-aligned images. The system is solved by a classical least-squares method when N > 3. This method, when applied to small movements, reduces the errors resulting from the analytic three-segment method in a few iterations.

Criterion to optimise: The pose error, calculated in pixels, is

error = √( (1/N) Σ_{i=1..N} ((ûᵢ − uᵢ)² + (v̂ᵢ − vᵢ)²) ), (8)

where N is the number of matching points, (uᵢ, vᵢ) are the coordinates of an image point and (ûᵢ, v̂ᵢ) are the estimated coordinates of the same image point.
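A minimal numpy sketch of this linear step is given below. It assembles Eq. (7) as Ax = B for all matching points and solves it by least squares; the right-hand side rewrites dn·p + da as −(du·s), with s the projective scale of each point and (du, dv) measured between the observed image points and the projection of the current pose estimate. The sign conventions are our reading of the derivation, not taken verbatim from the paper:

```python
import numpy as np

def solve_small_movement(C, P, uv_obs):
    """Least-squares solution of Eq. (7), Ax = B, x = (a, b, c, rx, ry, rz).

    C: 3x4 pinhole camera matrix, P: Nx3 model points at the current pose
    estimate, uv_obs: Nx2 observed image points."""
    A, B = [], []
    for p, (u, v) in zip(P, uv_obs):
        ph = np.append(p, 1.0)
        s = C[2] @ ph                          # projective scale of the point
        u_hat, v_hat = (C[0] @ ph) / s, (C[1] @ ph) / s
        n1 = C[0, :3] - u * C[2, :3]           # epipolar normals of Eq. (5)
        n2 = C[1, :3] - v * C[2, :3]
        A.append(np.concatenate([n1, np.cross(p, n1)]))   # row: n1.t + (p x n1).r
        A.append(np.concatenate([n2, np.cross(p, n2)]))
        B.extend([(u - u_hat) * s, (v - v_hat) * s])      # -(dn.p + da)
    x, *_ = np.linalg.lstsq(np.asarray(A), np.asarray(B), rcond=None)
    return x[:3], x[3:]                        # translation t, rotation vector r

def pixel_error(C, P, uv_obs):
    """Pose error of Eq. (8), in pixels."""
    proj = np.hstack([P, np.ones((len(P), 1))]) @ C.T
    uv_hat = proj[:, :2] / proj[:, 2:3]
    return np.sqrt(np.mean(np.sum((uv_hat - uv_obs) ** 2, axis=1)))
```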

2.3. The non-linear method

The last method finds a solution of the pose problem in the neighbourhood of the optimum. The linear method supposes that, during a small movement, the non-linear equations of the system become linear, owing to the rotation vector formalism; its evaluation, given in Section 4, illustrates that it does not converge in all cases. The non-linear method has essentially been implemented to evaluate the efficiency and precision of the above-mentioned linear one: it aims to find a solution without any simplification of the system of equations.

A point p and its image (u, v) on a camera are related by an epipolar, represented by Eq. (5). p is substituted by

p′ = t + r × p. (9)

Eq. (5) becomes

n₁ᵀt + n₁ᵀ(r × p) + a₁ = 0,
n₂ᵀt + n₂ᵀ(r × p) + a₂ = 0. (10)


The non-linear method finds the rotation (r) and the translation (t) by minimising an error criterion, which is expressed by Eq. (8).

The parameter vector p to be estimated is built up of rotation and translation parameters. Translation is expressed simply by three parameters:

t_3×1 = [a b c]ᵀ.

The Olinde–Rodrigues formalism, or quaternion, is used to express the object rotation in the working space:

q = [λ₀ λ₁ λ₂ λ₃]ᵀ.

The quaternion gives a unique solution to the inverse problem (when passing from the rotation matrix to the quaternion) and has no singular position. The quaternion represents three DOF, so, thanks to the following constraint, only three of these parameters are used:

λ₀² + λ₁² + λ₂² + λ₃² = 1 (see Appendix A).

The pose parameter vector then becomes

p = [λ₁ λ₂ λ₃ a b c]ᵀ.

Applying a numerical method to the previous criterion (Eq. (8)) allows the identification of the pose parameters from 3D-points and their images. The Levenberg–Marquardt algorithm (Press, Teukolsky, Vetterling, & Flannery, 1992) is selected for its tuning coefficient λ, which helps to avoid local minima.

Good initialisation of the parameter vector is necessary to ensure the convergence of the algorithm. The three-segment method (Section 2.1) is used to provide an initial guess of the parameter vector. At each iteration, the quaternion is calculated from the current rotation matrix, and then the current p vector is deduced. The algorithm gives a new estimate of p, from which the new rotation matrix is computed. The algorithm stops when the error criterion is sufficiently small.
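The sketch below illustrates this refinement, with scipy's Levenberg–Marquardt solver standing in for the Numerical Recipes implementation used by the authors; the pose composition p′ = Rp + t and the recovery of λ₀ from the constraint (A.13) are our assumptions:

```python
import numpy as np
from scipy.optimize import least_squares

def quat_to_R(q):
    """Rotation matrix of Eq. (A.14) from a quaternion (l0, l1, l2, l3)."""
    l0, l1, l2, l3 = q / np.linalg.norm(q)
    return np.array([
        [2*(l0*l0 + l1*l1) - 1, 2*(l1*l2 - l0*l3),     2*(l1*l3 + l0*l2)],
        [2*(l1*l2 + l0*l3),     2*(l0*l0 + l2*l2) - 1, 2*(l2*l3 - l0*l1)],
        [2*(l1*l3 - l0*l2),     2*(l2*l3 + l0*l1),     2*(l0*l0 + l3*l3) - 1]])

def residuals(x, C, P, uv_obs):
    """Pixel residuals of criterion (8); x = (l1, l2, l3, a, b, c)."""
    l1, l2, l3, a, b, c = x
    l0 = np.sqrt(max(0.0, 1.0 - l1*l1 - l2*l2 - l3*l3))   # constraint (A.13)
    R = quat_to_R(np.array([l0, l1, l2, l3]))
    proj = (P @ R.T + np.array([a, b, c])) @ C[:, :3].T + C[:, 3]
    uv_hat = proj[:, :2] / proj[:, 2:3]
    return (uv_hat - uv_obs).ravel()

# x0 comes from the three-segment method; 'lm' is Levenberg-Marquardt
# sol = least_squares(residuals, x0, args=(C, P, uv_obs), method='lm')
```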

3. The proposed mixed algorithm

The evaluation of the three pose-recovery methods described above (see also Section 4) suggested the idea of adopting a mixed algorithm. The idea is to bring together the advantages of these methods to ensure convergence towards the optimum in almost all situations. In fact, the linear and non-linear methods need a good initial guess of the object pose, as opposed to the analytic one; however, both of them are more precise than the latter. So, the three-segment (analytical) method is used to give an approximate solution with which to initialise the numerical methods. In terms of precision, the linear method is better than the non-linear one because analytically it gives an optimal solution in the sense of the error criterion, whereas the non-linear method cannot avoid local minima. However, the linear method presents some singular positions (the inverse problem of the rotation vector) that the non-linear one does not have.

Here, the proposed mixed algorithm is described:

1. Apply the three-segment method to obtain an initial guess of the object pose, expressed by a transformation matrix.

As mentioned before, the three-segment method gives an approximate solution for the pose, which is used in the next steps as a starting point for the linear and non-linear algorithms. Since the three selected matching points should be distinct and non-collinear, a criterion based on the length of, and the angle between, the two vectors formed by the three image points should be satisfied: the area formed by the three image points should be sufficiently large (area > threshold).

2. Use the linear method to refine the object pose.

• IF the criterion error is sufficiently small, the result is considered good: STOP. The threshold considered here is given by the object recognition system, which uses this pose-recovery algorithm.
• ELSE: sticky point, CONTINUE.

In fact, the evaluation of the linear and non-linear methods indicates that one should trust the result of the linear one because, when it converges, it reaches the best pose solution. Besides, the linear algorithm is faster than the non-linear one: it takes only two or three iterations to give its result and, unlike the non-linear one, needs no Hessian estimate.

3. Apply the non-linear method (Levenberg–Marquardt algorithm) to obtain convergence.

• IF there is no convergence, the solution is impossible, so STOP.
• ELSE: the non-linear algorithm stops when the relative convergence is smaller than a threshold. Save the result, CONTINUE.

4. Apply the linear method once more, starting from the new pose found by the non-linear method.

• IF the criterion error is sufficiently small, the result is considered good: STOP (the threshold here is the same as in Step 2).
• ELSE: no solution, END.

It should be noticed that the non-linear algorithm is used only to remove sticky points found by the linear one. This has enhanced the overall convergence, as will be seen in the next section; the control flow is summarised in the sketch below.
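A schematic sketch of this control flow (the solver and error functions are hypothetical placeholders for the methods of Section 2):

```python
def mixed_pose_recovery(matches, pixel_threshold):
    """Schematic of the mixed algorithm of Section 3 (hypothetical helpers)."""
    pose = three_segment_pose(matches)           # step 1: initial guess
    pose = linear_refine(pose, matches)          # step 2: linear method
    if reprojection_error(pose, matches) < pixel_threshold:
        return pose                              # converged at a good pose
    pose = nonlinear_refine(pose, matches)       # step 3: Levenberg-Marquardt
    if pose is None:
        return None                              # no convergence: no solution
    pose = linear_refine(pose, matches)          # step 4: linear method again
    if reprojection_error(pose, matches) < pixel_threshold:
        return pose
    return None                                  # no solution
```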

4. Results and comparison

In this section, the results concerning the above-mentioned pose-recovery methods are presented.


Table 1
Error statistics of the four methods (in mm)

                     3-segment   Non-linear   Linear   Mixed
Mean                 1.741       0.899        0.820    0.630
Standard deviation   1.808       1.181        1.445    1.011
Mean deviation       1.348       0.535        0.549    0.101
Minimum              0.581       0.578        0.579    0.578

Fig. 2. Pose recovery error.

The input data have been applied to these methods in the following order:

• the geometric three-segment method, which gives an initial guess of the pose;
• the non-linear method, based on the last pose found;
• the linear method, based on the initial guess of the pose and using the rotation vector formalism;
• the proposed mixed algorithm, presented in Section 3.

Here, the experimental protocol, the criterion used, the experimental conditions and, finally, the results in the form of histograms are described.

Experimental conditions: A polyhedron with 14 faces, 36 edges and 24 vertices is placed at a mean distance of 2 m from the camera (Fig. 8). The experimental conditions are as follows:

Camera focal length: 25 mm,
Camera definition: 756 × 581 photo elements,
Graphics card definition: 768 × 576 pixels,
Object–camera distance: 2 m,
Object dimensions: 15.5 × 14.3 × 2 cm.

The camera is automatically calibrated using a robot (Mallem, Shaheen, & Chavand, 1999).

Experimental protocol: Image processing is applied to the image of the polyhedron to extract a 2D-model (segments, vertices and chains of segments). The matching process of the authors' object recognition system is applied between the 2D-model and the 3D-model database (Chavand, Shaheen, & Mallem, 1997). The matching primitives (2D and 3D vertices; 16 were found in the tests) are used to start the object pose-recovery algorithms. The four above-mentioned methods were applied.

The three-segment method needs three matches, so all possible three-point combinations are taken from the resulting matches. The maximal number of combinations is C(16, 3) = (16 × 15 × 14)/(3 × 2) = 560, of which 32 represent aligned-point sets. The threshold used to determine this alignment is the area formed by the three image points, fixed in the experiment to 40 pixel²; the test is sketched below.
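For illustration, the triple enumeration and the area-based alignment test might look as follows in Python (the helper names are ours):

```python
from itertools import combinations

def triangle_area(p1, p2, p3):
    """Area in pixel^2 of the triangle spanned by three image points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    return 0.5 * abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))

def valid_triples(pts, threshold=40.0):
    """All C(16,3) = 560 triples minus the (near-)aligned ones."""
    return [t for t in combinations(range(len(pts)), 3)
            if triangle_area(pts[t[0]], pts[t[1]], pts[t[2]]) > threshold]
```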

The four algorithms have been applied one by one to the 528 remaining sets of matches, to determine the best algorithm in terms of precision and robustness. Each set of matches is considered as a sample in this experiment. The algorithm that finds the best object pose for the maximal number of input sets is sought. As noticed earlier, the three-segment method's result is used to start the other methods, which are based on all the matching points.

4.1. Results

Figs. 3 and 4 show histogram and cumulated-histogram comparisons between the four methods. The histogram function is defined as the number of samples relative to the pose error found. One method can be considered better than another, in terms of performance, if it obtains a higher number of samples for a minimal error value: the higher the number of samples corresponding to the minimal error value, the better the performance. The mixed algorithm represents a gain of 5% compared to the linear one. This gain is important because, if the pose recovery fails, the recognition process consequently fails.

Table 1 presents some statistics concerning the four methods. They all have practically the same convergence point (minimum). The mixed algorithm has the mean closest to the minimum as well as the smallest standard deviation, so it is the most efficient of the four methods.

The pose error in millimetres can be deduced from the following formula:

E_i = d_i sin(a_i) ≈ d_i a_i,

which is available for each 3D-vertex i of an object after its pose recovery (Fig. 2), where E_i is the pose error in mm (the deviation between the real 3D-vertex v_i and the computed 3D-vertex v̂_i); d_i is the distance between the camera optical centre C and the computed 3D-vertex i (C is known thanks to the camera calibration process); a_i is the angle in radians between the epipolars of the real and computed 3D-vertex i; and e_i is the distance in mm between the images of the real and computed 3D-vertex i. The computed 3D-vertex expresses the new location of the vertex after pose recovery. A sketch of this error measure is given below.

In Fig. 3, the histogram function is defined as the number of samples relative to the pose error found for each applied method.
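Assuming the optical centre and the real and computed vertices are known in the world frame, the measure can be written as:

```python
import numpy as np

def pose_error_mm(C_centre, v_real, v_computed):
    """E_i = d_i * sin(a_i): deviation between real and computed 3D-vertex."""
    d_real, d_comp = v_real - C_centre, v_computed - C_centre
    cos_a = d_real @ d_comp / (np.linalg.norm(d_real) * np.linalg.norm(d_comp))
    a = np.arccos(np.clip(cos_a, -1.0, 1.0))   # angle between the two epipolars
    return np.linalg.norm(d_comp) * np.sin(a)  # distance to computed vertex times sin(a)
```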


Fig. 3. Histogram comparison for the four pose-recovery methods.

Fig. 4. Cumulated histogram comparison for the four pose-recovery methods.

Fig. 5. Synoptic scheme of object recognition.

A sample is considered as a set of matching points used by a given pose-recovery method. The object used in this experiment (Fig. 8) allows the use of at most 528 samples (sets of three points).

Each sample is applied successively to the four methods. The three-segment method needs only three points, but the other methods use all the matched vertices of the object.

As shown in Figs. 3 and 4, the mixed algorithm reaches the optimum in 99.62% of cases, with a high pose accuracy equal to 0.58 mm. The three-segment method does not provide good accuracy because it is based on only three points; the numerical methods improve the pose-error accuracy because they use many more vertices.

However, to compute the depth error precisely, one should know the real object's pose after movement. A direct solution is to calibrate the object on the experimental bench; this task is left for future tests.

These tests have been repeated for different configurations of the object. Each time they give practically the same result, with a slightly different minimal error. The results demonstrate the enhanced accuracy and robustness of the mixed algorithm, since it reaches the minimal pose error in almost all cases.

When the object pose error satisfies the recognition criteria, the object model is superimposed on its real image after applying the new transformation.

5. Automatic recognition validation

The object pose-recovery methods discussed above are part of an object recognition system that is being developed in the authors' laboratory. Results and evaluation details of this system will be the subject of a future paper. In this section, the system is briefly presented along with a typical validation experiment of the proposed pose-recovery algorithm. The different components of the authors' object recognition system, illustrated in the synoptic scheme of Fig. 5, are

• object modelling, aspect-graph and hash-table construction;
• image processing, whose result is presented as a 2D-model of the image primitives (line segments, vertices and segment chains);
• matching, which is based upon two major approaches: graph matching and geometric hashing. Each matching hypothesis points to a 3D-model in the database and contains three lists of matches: 2D-segment chains/3D-faces, 2D-segments/3D-edges and 2D-vertices/3D-vertices (Chavand et al., 1997);
• pose recovery using the mixed algorithm, which allows the current matching hypothesis to be verified;
• superimposition of the recognised models on their video images.


Fig. 6. 3D-models selected from the database.

Fig. 7. The highlighted model is recognised.

Fig. 8. The 3D-model is superimposed on its image (with a mean pose error of 0.58 mm). The chain–face correspondences C_i–F_j are marked, where the C_i are 2D-chains computed by the image-processing module and the F_j are the projections of the 3D-faces of the object model using the camera model.

Fig. 9. Four DOF robot and calibration grid.

Fig. 10. The objects placed on the working table are seen by the first camera. Once an object is selected, its image is processed and matched to its model. The operator points out one of the recognised object faces (cross on the selected object).

The recognition algorithm proceeds as follows. A set of 3D-models is selected from the database (Fig. 6). The image-processing algorithm extracts the most reliable primitives from the scene (segments, vertices and chains of segments), represented in a 2D-model. The matching process is applied between the 2D-model and the selected 3D-models from the database; this results in an ordered list of hypotheses about the present object(s). The pose-recovery mixed algorithm is applied to verify each hypothesis in the list; each hypothesis is either accepted or rejected. When a hypothesis is accepted, the corresponding 3D-model (Fig. 7) is superimposed on the real image using its calculated pose (Fig. 8).

6. Application field: robot control

This application deals with the verification of the authors' software development. The hardware used is composed of a four DOF robot, which is used to point out a polyhedral object.

The superimposition of a virtual representation of an object on its camera image shows that the object is correctly recognised by the system, so it is possible to telecontrol the robot by pointing to a part of an object on the screen. This superimposition gives the human operator confidence when teleoperating the robot.

In this experiment, a four DOF robot is used (Fig. 9): a carriage with two translation axes and a turret with two rotary parts. The end effector is equipped with a shank on which an LED is mounted for calibration purposes.

The objects to be manipulated by the robot are modelled in a 3D-database (3DDB). The 3DDB must be updated if the objects or the camera move, in order to keep the computer-generated and video images rigorously


Fig. 11. The desired 3D-point is calculated and reached by the robot's shank (from the second camera's viewpoint).

superimposed. For robot control, the human operator points out on the screen the object to be manipulated by the robot. Thanks to the earlier 3DDB updating and camera modelling, the pointed object is identified successfully (Fig. 10).

Once the object is recognised, the operator points out one of its faces. The intersection between the line segment passing from the selected 2D-point through the camera centre and the object's face gives the coordinates of the desired 3D-point. The computed 3D-point is used as a position order for robot control; hence, the pre-calibrated robot automatically reaches the desired point on the object (Fig. 11).
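Treating the pointed face as a plane, the desired 3D-point is the intersection of the viewing ray with that plane, as in the sketch below (the argument names are ours):

```python
import numpy as np

def ray_face_intersection(c_centre, ray_dir, face_point, face_normal):
    """3D point where the viewing ray through the selected pixel meets the face."""
    denom = face_normal @ ray_dir
    if abs(denom) < 1e-9:
        return None                            # ray parallel to the face
    s = face_normal @ (face_point - c_centre) / denom
    return c_centre + s * ray_dir              # position order for the robot
```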

7. Conclusion

After introducing the pose-recovery problem, three pose-recovery methods have been studied and compared. The three-segment method is an analytical one; it is based upon a minimal set of matches, so it provides an approximate solution. The linear method is based on all possible matches and on the result provided by the three-segment method. It improves the result but does not converge in all cases, because it theoretically finds a solution only for small pose errors. The non-linear method, which is based on an initial guess of the pose and all possible matches, appears to have some local-minima problems. The linear method is faster than the non-linear one and provides a more precise pose solution.

In this paper, a mixed pose-recovery algorithm that brings together the advantages of these three methods has been proposed. The mixed algorithm switches between the three above-mentioned methods to obtain final convergence; hence, it provides the most precise possible solution and is more efficient. The pose error obtained reaches 0.58 mm for a camera–object distance of 2 m on real images. The algorithm converges in 99.62% of the cases. This result is crucial for an object recognition system, since it allows the generated hypotheses to be verified more precisely and reduces the overall time of the recognition process. The augmented reality system presented here has been tested and validated using a four DOF robot, which executes the visual commands given to it by a simple click of the mouse.

Future applications include robotic assembly and surgery assistance.

Appendix A. Solid orientation description (Khalil & Dombre, 1988)

This appendix relates to the pose-recovery methods studied in Section 2. The methods most used to describe the orientation of a solid in 3D-space are presented: direction cosines, three angles, the rotation vector and the quaternion.

A.1. Preliminaries

A.1.1. Multiplication of homogeneous transformations
Let R₀ be a frame of reference having undergone known consecutive transformations T₁, T₂, …, Tₙ which bring it to the frame Rₙ. The final transformation ⁰Tₙ, expressing the frame Rₙ in the frame R₀, is calculated according to two cases:

• If the transformations Tᵢ are expressed relative to the current frame Rᵢ₋₁, then ⁰Tₙ is obtained by multiplication on the right: ⁰Tₙ = ⁰T₁ ¹T₂ … ⁿ⁻¹Tₙ.
• If the transformations Tᵢ are expressed relative to the frame of origin R₀, then ⁰Tₙ is obtained by multiplication on the left: ⁰Tₙ = Tₙ … T₂ T₁.

Both composition rules are illustrated in the sketch below.

A.1.2. Vectorial pre-product
The cross product of two vectors u and v is written as follows:

u × v = [u_y v_z − u_z v_y,  u_z v_x − u_x v_z,  u_x v_y − u_y v_x]ᵀ.

To facilitate the development, this can be expressed in matrix form:

u × v = |  0    −u_z   u_y  | |v_x|
        |  u_z   0    −u_x  | |v_y|  = X(u)·v,
        | −u_y   u_x   0    | |v_z|

where X(u) is the vectorial pre-product matrix of u.


Fig. 12. R_u is obtained from R₀ by two rotations: α around z and β around x.

A.2. Direction cosines

Let R be an orthogonal matrix in the space E³ such that

R = [i j k] = | i_x  j_x  k_x |
              | i_y  j_y  k_y |
              | i_z  j_z  k_z |. (A.1)

Its elements represent the direction cosines of three vectors. The matrix contains only three independent parameters, because the vector k is the cross product of the other two, the norms are equal to 1 and the scalar product i·j = 0. It is therefore interesting to look for other formulas expressing rotation, since this one is not optimal (it is redundant).

A.3. Angles of RPY (roll, pitch, yaw)

The roll–pitch–yaw angles express the orientation by three successive rotations of a frame around its three principal axes (z, y, then x). Since these three rotations are expressed relative to the current frame, the final rotation is obtained by multiplication on the right:

R(θ_z, θ_y, θ_x) = | Cθ_z  −Sθ_z  0 | | Cθ_y   0  Sθ_y | | 1   0     0    |
                   | Sθ_z   Cθ_z  0 | |  0     1   0   | | 0  Cθ_x  −Sθ_x |
                   |  0      0    1 | | −Sθ_y  0  Cθ_y | | 0  Sθ_x   Cθ_x |, (A.2)

where θ_z, θ_y and θ_x are the three rotation angles, with Cθ = cos(θ) and Sθ = sin(θ). So

R = | Cθ_z Cθ_y    Cθ_z Sθ_y Sθ_x − Sθ_z Cθ_x    Cθ_z Sθ_y Cθ_x + Sθ_z Sθ_x |
    | Sθ_z Cθ_y    Sθ_z Sθ_y Sθ_x + Cθ_z Cθ_x    Sθ_z Sθ_y Cθ_x − Cθ_z Sθ_x |
    | −Sθ_y        Cθ_y Sθ_x                     Cθ_y Cθ_x                  |.

The advantage of this formalism in robotics is that the rotation around each axis represents a direct command to a specific electric motor. The disadvantage is that the product of the three matrices in Eq. (A.2) is not commutative: the order in which the three rotations are composed modifies the matrix R. Another major disadvantage is that the inverse problem (finding the three angles starting from R) can have more than one solution, as well as singular positions.
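A sketch of Eq. (A.2), which also makes the non-commutativity easy to verify numerically:

```python
import numpy as np

def rpy_matrix(tz, ty, tx):
    """R of Eq. (A.2): rotations around z, y, then x of the current frame."""
    cz, sz = np.cos(tz), np.sin(tz)
    cy, sy = np.cos(ty), np.sin(ty)
    cx, sx = np.cos(tx), np.sin(tx)
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    return Rz @ Ry @ Rx                 # multiplication on the right

A = rpy_matrix(0.1, 0.0, 0.0) @ rpy_matrix(0.0, 0.0, 0.2)  # z then x
B = rpy_matrix(0.0, 0.0, 0.2) @ rpy_matrix(0.1, 0.0, 0.0)  # x then z
print(np.allclose(A, B))   # False: the composition order matters
```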

A.4. Rotation vector

Let R(u, θ) be a rotation around an axis passing through the origin of a frame R₀ and carrying the unit vector u = [u_x u_y u_z]ᵀ ([·]ᵀ means transposed). Suppose that u is the unit vector along the axis z of R_u, a frame having the same origin as R₀ (Fig. 12). R_u can be obtained from R₀ by two successive rotations expressed by the following transformation:

⁰T_u = R(z, α) R(x, β). (A.3)

By developing relation (A.3), one obtains

u = [u_x u_y u_z]ᵀ = [sin α sin β,  −cos α sin β,  cos β]ᵀ. (A.4)

Turning around u is equivalent to turning around the axis z of the frame R_u. This can be done by a transformation from R₀ towards R_u, a rotation R(z, θ), then a reverse transformation towards R₀; one deduces from this that

R(u, θ) = R(z, α) R(x, β) R(z, θ) R(x, −β) R(z, −α).

The preceding formula is developed by taking into account relation (A.4):

R(u, θ) = | u_x²(1−Cθ)+Cθ          u_x u_y(1−Cθ)−u_z Sθ    u_x u_z(1−Cθ)+u_y Sθ |
          | u_x u_y(1−Cθ)+u_z Sθ   u_y²(1−Cθ)+Cθ           u_y u_z(1−Cθ)−u_x Sθ |
          | u_x u_z(1−Cθ)−u_y Sθ   u_y u_z(1−Cθ)+u_x Sθ    u_z²(1−Cθ)+Cθ        | (A.5)

with Cθ = cos(θ) and Sθ = sin(θ). Using the vectorial pre-product matrix of u, the formula of Rodrigues is

R(u, θ) = I_3×3 Cθ + Sθ X(u) + (1 − Cθ) u uᵀ, (A.6a)
R(u, θ) = I_3×3 + Sθ X(u) + (1 − Cθ) (X(u))². (A.6b)

These two equations are equivalent, where I_3×3 represents the identity matrix of third order.


The inverse problem
The inverse problem is to find the vector and the angle corresponding to a given rotation matrix. By summing the diagonal terms in Eqs. (A.1) and (A.5), one finds

Cθ = (i_x + j_y + k_z − 1)/2. (A.7)

From the remaining terms comes

2 u_x Sθ = j_z − k_y,
2 u_y Sθ = k_x − i_z,
2 u_z Sθ = i_y − j_x, (A.8)

where

Sθ = ½ √((j_z − k_y)² + (k_x − i_z)² + (i_y − j_x)²). (A.9)

The angle is then deduced as

θ = Arctan(Sθ/Cθ) with 0 ≤ θ ≤ π. (A.10)

By analysing the signs in Eq. (A.8) and by using the expressions of (A.1) and (A.5),

u_x = Sign(j_z − k_y) √((i_x − Cθ)/(1 − Cθ)),
u_y = Sign(k_x − i_z) √((j_y − Cθ)/(1 − Cθ)),
u_z = Sign(i_y − j_x) √((k_z − Cθ)/(1 − Cθ)) (A.11)

is obtained. Thus, there are two solutions to the inverse problem: R(u, θ) and R(−u, −θ). A noticeable drawback is that the solution presents a singularity in the vicinity of θ = 0.
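The inverse problem (A.7)–(A.11) in code form (a sketch; note the division by 1 − Cθ, which reproduces the singularity near θ = 0):

```python
import numpy as np

def rotation_to_axis_angle(R):
    """Recover (u, theta) from R using Eqs. (A.7)-(A.11)."""
    C = (np.trace(R) - 1.0) / 2.0                      # (A.7)
    w = np.array([R[2, 1] - R[1, 2],                   # j_z - k_y
                  R[0, 2] - R[2, 0],                   # k_x - i_z
                  R[1, 0] - R[0, 1]])                  # i_y - j_x
    S = 0.5 * np.linalg.norm(w)                        # (A.9)
    theta = np.arctan2(S, C)                           # (A.10): 0 <= theta <= pi
    # (A.11); singular as C -> 1 (theta -> 0)
    u = np.sign(w) * np.sqrt(np.clip((np.diag(R) - C) / (1.0 - C), 0.0, None))
    return u, theta
```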

A.5. Quaternion
The quaternion (or Euler parameters, or Olinde–Rodrigues parameters) is introduced primarily to eliminate the singularity just evoked. The orientation is expressed by four parameters which describe a single rotation θ (−π ≤ θ ≤ +π) around the axis of a unit vector u. These parameters are defined by

λ₀ = cos(θ/2),
λ₁ = u_x sin(θ/2),
λ₂ = u_y sin(θ/2),
λ₃ = u_z sin(θ/2). (A.12)

The sum of the squares of these terms gives

λ₀² + λ₁² + λ₂² + λ₃² = 1. (A.13)

After replacing u and θ by the quaternion in Eq. (A.5), the rotation matrix is written as follows:

R = | 2(λ₀² + λ₁²) − 1    2(λ₁λ₂ − λ₀λ₃)      2(λ₁λ₃ + λ₀λ₂)    |
    | 2(λ₁λ₂ + λ₀λ₃)      2(λ₀² + λ₂²) − 1    2(λ₂λ₃ − λ₀λ₁)    |
    | 2(λ₁λ₃ − λ₀λ₂)      2(λ₂λ₃ + λ₀λ₁)      2(λ₀² + λ₃²) − 1  |. (A.14)

By an analysis similar to that used for the rotation vector, from Eqs. (A.1) and (A.14), the solution of the inverse problem is obtained by the following formulas:

λ₀ = ½ √(i_x + j_y + k_z + 1),
λ₁ = ½ Sign(j_z − k_y) √(i_x − j_y − k_z + 1),
λ₂ = ½ Sign(k_x − i_z) √(−i_x + j_y − k_z + 1),
λ₃ = ½ Sign(i_y − j_x) √(−i_x − j_y + k_z + 1). (A.15)
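Eq. (A.15) in code form; together with quat_to_R from the sketch in Section 2.3, this gives a round trip between R and the quaternion (a sketch):

```python
import numpy as np

def quat_from_R(R):
    """Unique quaternion (l0, l1, l2, l3) from R, Eq. (A.15)."""
    ix, jy, kz = np.diag(R)
    l0 = 0.5 * np.sqrt(max(0.0, ix + jy + kz + 1.0))
    l1 = 0.5 * np.sign(R[2, 1] - R[1, 2]) * np.sqrt(max(0.0,  ix - jy - kz + 1.0))
    l2 = 0.5 * np.sign(R[0, 2] - R[2, 0]) * np.sqrt(max(0.0, -ix + jy - kz + 1.0))
    l3 = 0.5 * np.sign(R[1, 0] - R[0, 1]) * np.sqrt(max(0.0, -ix - jy + kz + 1.0))
    return np.array([l0, l1, l2, l3])
```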

The quaternion gives a single solution and does not present any singular position.

References

Chavand, F., Shaheen, M., & Mallem, M. (1997). Matching between a 2D-image and its 3D-model: Application to updating 3D information of working spaces. International Symposium on Artificial Intelligence, Robotics, and Intellectual Human Activity Support for Nuclear Applications. Wako-shi, Saitama, Japan.

Dementhon, D. F., & Davis, L. (1995). Model-based object pose in 25 lines of code. International Journal of Computer Vision, 15, 123–141.

Dhome, M., Richetin, M., Lapreste, J. T., & Rives, G. (1989). Determination of the attitude of 3D objects from a single perspective view. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11, 1265–1278.

Faugeras, O. (1993). Three-dimensional computer vision: A geometric viewpoint. Cambridge, MA, USA: MIT Press.

Forsyth, D., Mundy, J. L., Zisserman, A., Coelho, C., Heller, A., & Rothwell, C. (1991). Invariant descriptors for 3-D object recognition and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(10), 971–991.

Horaud, R., Conio, B., Leboulleux, O., & Lacolle, B. (1989). An analytic solution for the perspective 4-point problem. Computer Vision, Graphics, and Image Processing, 47(1), 33–44.

Hung, Y., Yeh, P. S., & Harwood, D. (1985). Passive ranging to known planar point sets. Proceedings of the IEEE International Conference on Robotics and Automation (pp. 80–85). St. Louis, Missouri, USA.

Khalil, W., & Dombre, E. (1988). Modélisation et commande de robots. Paris: Editions Hermès.

Lowe, D. G. (1987). Three-dimensional object recognition from single two-dimensional images. Artificial Intelligence, 31, 355–395.

Lowe, D. G. (1991). Fitting parameterized three-dimensional models to images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(5), 441–450.


Mallem, M., Shaheen, M., & Chavand, F. (1999). Automatic camera calibration based on robot calibration. Proceedings of the 16th IEEE Instrumentation and Measurement Technology Conference, Vol. 3 (pp. 1278–1282). Venice, Italy.

Moreau, G., Mallem, M., Chavand, F., & N'Zi, E. C. (1997). Two 3D recovering methods for robot control. IFAC'97, SYROCO (pp. 531–537). Nantes, France.

N'zi, E. C., Mallem, M., & Chavand, F. (1997). Interactive building and updating of a 3D database for teleoperation. Robotica, 15(5), 494–511.

Phong, T. Q., Horaud, R., Yassine, A., & Tao, P. D. (1995). Object pose from 2-D to 3-D point and line correspondences. International Journal of Computer Vision, 15, 225–243.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C: The art of scientific computing. Cambridge, UK: Cambridge University Press.

Roberts, L. G. (1965). Machine perception of three-dimensional solids. In J. Tippett et al. (Eds.), Optical and electro-optical information processing. Cambridge, MA, USA: MIT Press.
