Kinect-based Obstacle Detection for Manipulator

Panjawee Rakprayoon, Miti Ruchanurucks, and Ada Coundoul

Abstract: This paper presents a method to distinguish between obstacles and a manipulator when they share the same workspace. A Microsoft Kinect is used as the capturing device. A Kinect calibration method is explained. Furthermore, the calibration between the Kinect and the manipulator is addressed by an iterative least-squares method. A 3D model of the manipulator is generated using the OpenGL library. Finally, the manipulator surface is deleted from the scene by intersecting the manipulator model with its corresponding point cloud.

I. INTRODUCTION

In robotics, obstacle detection is important for a robot to accomplish tasks, especially in dynamic environments. To achieve this goal, obstacle information can be acquired using computer vision methods, and the camera system therefore plays a central role in obstacle detection.

For vision-based obstacle detection, there are many techniques to analyze the data, namely 2D, 2.5D, and 3D methods. To detect collisions in a 2D environment [1][2], obstacles and background are recognized based on their colors. However, this is prone to error if the robot and the obstacles contain similar colors.

In 2.5D, [3]-[6] generate surfaces inside a scene using disparity information. On the other hand, [7] reconstructs objects using epipolar parameterization instead of disparity images; however, its output is rougher than that of the surface reconstruction methods. In any case, the mentioned 2.5D methods still require the robot and its surroundings to have different colors/textures.

In 3D, there are many possible algorithms for surface reconstruction, for example, employing corner detection to reconstruct a 3D object model [8]. Also, shape-from-silhouette algorithms [9]-[11] construct a 3D surface by volume intersection from multiple views. Nevertheless, reconstructing a 3D scene in this way requires a lot of computation. Furthermore, all techniques described previously were demonstrated with stereo or multiple cameras, so the 3D surfaces usually contain some noise when the robot and the background are similar in color/texture.

Some papers even consider multiple types of input. For example, [12] uses 2D cameras and a 3D robot model to distinguish between the robot and its surroundings. However, their method requires a complicated mapping between the model and each camera's view to eliminate the robot from the 2D images.

Manuscript received August 10, 2011.
P. R. is with the Faculty of Engineering, Kasetsart University, Bangkok, Thailand (corresponding author; phone: +66-2-942-8555; fax: Ext. 1550; e-mail: [email protected]).
M. R. is with the Faculty of Engineering, Kasetsart University, Bangkok, Thailand (e-mail: [email protected]).
A. C. is with ESIREM, Universite de Bourgogne, Dijon, France (e-mail: [email protected]).

To avoid the mentioned problems, one can directly use a depth measuring device to reconstruct the 2.5D or 3D surface. Inexpensive yet versatile devices that produce less noise than stereo cameras have recently become available. The Kinect is one such device and is used to capture the depth image in this work. We transfer the depth image to world coordinates for obstacle detection. However, the resulting point cloud comprises both the obstacles and the manipulator itself.

In this paper, we focus on a technique for obstacle detection where parts of the robot are also present in the scene (Fig. 1). The technique relies on depth images captured from the depth camera, the Kinect. We show how to calibrate the Kinect. Furthermore, the calibration between the depth camera and the robot is addressed, and the error of this calibration is minimized using an iterative least-squares method. Ultimately, the robot's point cloud can be deleted from that of the whole scene, so obstacles in the scene can then be detected efficiently.

This paper is organized as follows. Section II explains how to calibrate the Kinect as well as how to convert its raw depth image into real-world coordinates. Section III shows how to calibrate between the depth camera and the manipulator. Section IV describes how to generate the 3D model of the manipulator and delete the point cloud of the robot from the scene. Section V shows the experimental results. Finally, the conclusion of this paper is presented in Section VI.

Fig. 1. A workspace consisting of obstacles (such as a human and a PLC plant) and, in the red circle, the manipulator, which is not an obstacle.

II. CALIBRATION OF DEPTH CAMERA

This work uses a Microsoft Kinect as the depth measuring device to detect obstacles in a scene, as it is powerful and yet inexpensive. The Kinect is a new, alternative gaming device that can capture a color image and a depth image at the same time (Fig. 2). It is composed of an RGB camera, a depth sensor, and a multi-array microphone (Fig. 3). The depth sensor consists of an infrared laser projector and a monochrome CMOS sensor. The resulting point cloud can be loaded to a computer using open-source libraries, which enable the Kinect to be operated under Windows, Linux, or Mac OS. The data connection to the computer is through a USB interface.

Fig. 2. Color and depth images captured from the Kinect device.

Fig. 3. The configuration and components of the Kinect device: RGB camera, depth sensor, and multi-array microphone.

In order to use the Kinect for depth measurement, one must calibrate it first. The important parameters are the intrinsic parameters, which comprise the focal lengths (f_x, f_y) and the principal point, i.e., the offset from the optic axis (c_x, c_y). Here we assume that the lens distortion is small enough to be negligible. The calibration method in [15] requires only a chessboard to be captured in the form of a depth image. The corners of the chessboard in the depth image (Fig. 4) are then detected to estimate the intrinsic parameters.

Fig. 4. Depth image of the chessboard; the detected chessboard corners are shown in red.
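For illustration, intrinsic calibration from chessboard corners can be sketched with standard OpenCV calls, as below. This is a generic sketch rather than the exact pipeline of [15]: the image file names, board dimensions, and square size are assumptions, and the distortion coefficients returned by OpenCV are simply ignored, in line with the paper's assumption of negligible lens distortion.

```python
# Minimal chessboard-based intrinsic calibration sketch (hypothetical inputs).
import cv2
import numpy as np

BOARD = (8, 6)      # inner corners per row/column (assumed)
SQUARE = 0.025      # chessboard square size in metres (assumed)

# 3D board points in the chessboard plane, reused for every view
obj = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
obj[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE

obj_points, img_points = [], []
for path in ["view_%02d.png" % i for i in range(20)]:   # hypothetical files
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        continue
    found, corners = cv2.findChessboardCorners(gray, BOARD)
    if found:
        obj_points.append(obj)
        img_points.append(corners)

# Estimate f_x, f_y, c_x, c_y (distortion is estimated too, but neglected here)
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
```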

Note that the Kinect returns a raw depth image that contains integer values between 0 and 2047. The raw depth image must be transformed to derive the metric depth Z using equations (1) and (2) [13]:

\[
\mathrm{depth} = \frac{1}{f(z)} \tag{1}
\]

\[
f(z) = -0.0030711016\, z + 3.3309495161 \tag{2}
\]

where z is the raw depth value from the Kinect and f(z) is a linear function whose coefficients can be estimated by regression. After that, the intrinsic parameters are input to equations (4) and (5) to approximate the real-world coordinates (X, Y, Z) from the depth image (x, y, Z):

\[
Z = \mathrm{depth} \tag{3}
\]

\[
X = \frac{Z\,(x - c_x)}{f_x} \tag{4}
\]

\[
Y = \frac{Z\,(y - c_y)}{f_y} \tag{5}
\]

Fig. 5. A point p = (x, y, Z) on the image plane can be projected to a point P = (X, Y, Z) in world coordinates.
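A minimal Python sketch of equations (1)-(5) is given below: it converts a raw Kinect depth frame to metric depth and back-projects it to a point cloud. The intrinsics f_x, f_y, c_x, c_y are assumed to come from the calibration above, and treating the raw value 2047 as "no measurement" is an assumption added for the sketch, not something stated in the paper.

```python
# Sketch of equations (1)-(5): raw Kinect depth -> metric depth -> 3D points.
import numpy as np

def raw_to_metric(raw):
    """Equations (1)-(2): 11-bit raw depth values -> depth in metres."""
    f_z = -0.0030711016 * raw.astype(np.float64) + 3.3309495161
    depth = 1.0 / f_z
    depth[raw >= 2047] = 0.0          # assumed 'no measurement' marker
    return depth

def depth_to_points(depth, fx, fy, cx, cy):
    """Equations (3)-(5): depth image (x, y, Z) -> point cloud (X, Y, Z)."""
    h, w = depth.shape
    x, y = np.meshgrid(np.arange(w), np.arange(h))
    Z = depth
    X = Z * (x - cx) / fx
    Y = Z * (y - cy) / fy
    return np.dstack([X, Y, Z]).reshape(-1, 3)
```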

III. CALIBRATION BETWEEN DEPTH CAMERA AND MANIPULATOR

This section explains how to derive the homogeneous transformation between the depth camera and the robot's base, H^{cam}_{base}. This calibration requires a marker to be attached to the robot's end-effector, as shown in Fig. 6.

First, we define an equation for H^{cam}_{base} in the world coordinate system in (6):

\[
P^{cam} = H^{cam}_{base}\, H^{base}_{end\text{-}eff}\, P^{end\text{-}eff} \tag{6}
\]

where P^{cam} is the position of the marker in the depth camera coordinate system. This position can be acquired using the method described in Section II.

Fig. 6. Coordinate systems of the depth camera and the manipulator.

\[
P^{cam} = [X_c \;\; Y_c \;\; Z_c \;\; 1]^T
\]

P^{end-eff} is the position of the marker in the end-effector coordinate system. For convenience, we attach the marker at the center of the gripper:

\[
P^{end\text{-}eff} = [X_{end\text{-}eff} \;\; Y_{end\text{-}eff} \;\; Z_{end\text{-}eff} \;\; 1]^T = [0 \;\; 0 \;\; 0 \;\; 1]^T
\]

H^{base}_{end-eff} is the homogeneous matrix from the base of the robot to its end-effector. This matrix is known, as it can be acquired from the robot's forward kinematics:

\[
H^{base}_{end\text{-}eff} =
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
0 & 0 & 0 & 1
\end{bmatrix}
\]

The unknown H^{cam}_{base} contains 12 parameters:

\[
H^{cam}_{base} =
\begin{bmatrix}
r_{11} & r_{12} & r_{13} & t_{14} \\
r_{21} & r_{22} & r_{23} & t_{24} \\
r_{31} & r_{32} & r_{33} & t_{34} \\
0 & 0 & 0 & 1
\end{bmatrix}
\]

Equation (6) can then be represented as equation (7):

\[
\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}
=
\begin{bmatrix}
r_{11} & r_{12} & r_{13} & t_{14} \\
r_{21} & r_{22} & r_{23} & t_{24} \\
r_{31} & r_{32} & r_{33} & t_{34} \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}
\tag{7}
\]

To derive the unknowns, the first three rows of (7) can be rewritten in a different matrix form:

\[
A_i x = B_i \tag{8}
\]

where A is a known coefficient matrix,

\[
A =
\begin{bmatrix}
a_{14} & a_{24} & a_{34} & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & a_{14} & a_{24} & a_{34} & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & a_{14} & a_{24} & a_{34} & 0 & 0 & 1
\end{bmatrix},
\]

and B contains the position of the marker in the depth camera coordinate system,

\[
B = [X_c \;\; Y_c \;\; Z_c]^T.
\]

The index i denotes the data frame, i = 1, 2, 3, ..., n. Our unknown is the vector x, which contains the 12 parameters of H^{cam}_{base}:

\[
x = [r_{11} \; r_{12} \; r_{13} \; r_{21} \; r_{22} \; r_{23} \; r_{31} \; r_{32} \; r_{33} \; t_{14} \; t_{24} \; t_{34}]^T
\]

To solve for the 12 unknown parameters, i.e., 9 rotation and 3 translation terms, multiple robot configurations are required. Also, as A is not a square matrix, x can be derived using the pseudo-inverse as in (9):

\[
x = A_i^{+} B_i \tag{9}
\]

where A_i^{+} denotes the left pseudo-inverse, given by

\[
A_i^{+} = (A_i^T A_i)^{-1} A_i^T \tag{10}
\]

Error in the measured 3D position is a major cause of error in calculating H^{cam}_{base}. We can reduce such error by exploiting multiple inputs of the A and B matrices. Here, the iterative least-squares method [14] is used to accumulate and average all of the data in order to acquire a more accurate H^{cam}_{base}. It is more robust than linear least squares as well as QR decomposition in the sense that there is no limit on the number of inputs, unlike the two previous methods; thus, the error in the approximation of x can be averaged out. In this paper, a depth image is captured each time the configuration of the robot is changed. The iterative least-squares solution can be written as (11):

\[
x = \left( \sum_{i=1}^{N} A_i^T A_i \right)^{-1} \left( \sum_{i=1}^{N} A_i^T B_i \right) \tag{11}
\]

The parameters of H^{cam}_{base} are thus resolved from (11). H^{cam}_{base} will then be exploited to model the robot in the camera coordinate system in the next section.
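A minimal sketch of this calibration step in Python/NumPy is shown below. It assumes each sample pairs the marker position measured in the camera frame (from Section II) with the forward-kinematics matrix H^{base}_{end-eff} of the same configuration; the function and variable names are ours, not taken from the paper.

```python
# Sketch of equations (8)-(11): accumulate normal equations over N robot
# configurations and solve for the 12 parameters of H_cam_base.
import numpy as np

def solve_H_cam_base(P_cam_list, H_base_endeff_list):
    AtA = np.zeros((12, 12))
    AtB = np.zeros(12)
    for P_cam, H_be in zip(P_cam_list, H_base_endeff_list):
        t = H_be[:3, 3]                    # (a14, a24, a34): end-effector in base frame
        A = np.zeros((3, 12))
        A[0, 0:3] = t; A[0, 9] = 1.0       # row for X_c
        A[1, 3:6] = t; A[1, 10] = 1.0      # row for Y_c
        A[2, 6:9] = t; A[2, 11] = 1.0      # row for Z_c
        B = np.asarray(P_cam[:3])          # marker position in camera frame
        AtA += A.T @ A                     # accumulate sum of A_i^T A_i
        AtB += A.T @ B                     # accumulate sum of A_i^T B_i
    x = np.linalg.solve(AtA, AtB)          # equation (11)
    H = np.eye(4)
    H[:3, :3] = x[:9].reshape(3, 3)        # r11..r33
    H[:3, 3] = x[9:]                       # t14, t24, t34
    return H
```

Note that the nine rotation entries are estimated as free parameters, so the recovered 3x3 block is only approximately orthonormal; the paper does not discuss re-orthonormalization.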

IV. GENERATING 3D MODEL AND DELETING POINT CLOUD OF MANIPULATOR

To delete the point cloud of the robot from the depth image, the point cloud of the whole scene (manipulator + surroundings) and the 3D model of the manipulator must be in the same coordinate system.

First, the manipulator is modeled as a chain of cylinders (Fig. 7a) defined by the position of each robot joint (P^{robot}) with respect to its base. The chain of cylinders can be moved using the input joint angles of each corresponding configuration.

We then have to transfer the 3D model of the manipulator to the depth camera coordinate system (Fig. 7b) using the H^{cam}_{base} obtained previously from equation (11). The change of coordinate system is shown in Fig. 8. The equation for transferring a joint position of the model (P^{robot}) to the depth camera coordinate system is given in (12):

\[
P^{cam} = H^{cam}_{base}\, P^{robot} \tag{12}
\]

Fig. 7. 3D model of the manipulator (a) and mapping of the 3D model of the manipulator to the camera coordinate system (b).

Fig. 8. Coordinate system transformation from the manipulator to the depth camera.

As a result, the point cloud of the scene and the robot model can then be put in the same coordinate system. To eliminate the point cloud of the manipulator only, we consider the point clouds within the cylinder model to be the robot surface: any point that lies inside a cylinder is deleted. The final output is then only the point cloud of the surroundings, as required. The experiment in the next section confirms the usefulness of the method.
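A minimal Python sketch of this deletion step is shown below. It transforms the joint positions with H^{cam}_{base} as in equation (12) and then treats each link as a capsule (a cylinder with rounded ends) around the segment between consecutive joints, which is a convenient approximation of the paper's cylinder test; the link radii and the array layouts are assumptions.

```python
# Sketch of Section IV: map the joint chain into the camera frame and drop
# every scene point that lies within a link's radius of the joint segment.
import numpy as np

def joints_in_camera(H_cam_base, joints_base):
    """Equation (12): joint positions (J, 3) in base frame -> camera frame."""
    homog = np.hstack([joints_base, np.ones((len(joints_base), 1))])
    return (H_cam_base @ homog.T).T[:, :3]

def remove_robot_points(cloud, joints_cam, radii):
    """cloud: (N, 3) scene points; radii: one assumed radius per link."""
    keep = np.ones(len(cloud), dtype=bool)
    for (p0, p1), r in zip(zip(joints_cam[:-1], joints_cam[1:]), radii):
        axis = p1 - p0
        # Closest point on the link segment, clamped to its endpoints
        t = np.clip((cloud - p0) @ axis / (axis @ axis), 0.0, 1.0)
        closest = p0 + t[:, None] * axis
        dist = np.linalg.norm(cloud - closest, axis=1)
        keep &= dist > r                   # points inside the link are dropped
    return cloud[keep]
```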

V. EXPERIMENTAL RESULT

An articulated robot (AR6) from BOSCH, which consists of 6 axes, is used in this paper. The position of the end-effector (P^{end-eff}), to which the marker is attached, and multiple AR6 configurations (H^{base}_{end-eff}) (Fig. 9) are used to determine the relationship between the camera and the robot (H^{cam}_{base}). They are also used to generate the 3D model of the robot (P^{robot}) for deleting the point cloud within the model.

Fig. 9. The configuration of the BOSCH articulated arm robot [16].

The Kinect device is installed at some distance so as to cover the workspace of the manipulator (Fig. 6). Equations (1)-(2) are then applied. To transfer the depth information to a point cloud in 3D world coordinates, the OpenCV library is used to calibrate the depth camera and estimate the intrinsic parameters, and equations (3)-(5) are then applied. The reader may observe blank areas in Fig. 10 behind the robot. This is occlusion resulting from the fact that we use only one Kinect; such occlusion can be avoided by using more than one depth measuring device.

Then, the 3D model of the manipulator is generated (Fig. 7a) and transferred to the depth camera coordinate system (Fig. 7b) using H^{cam}_{base}. To verify the result, the point cloud of the whole scene as well as the model is displayed in 3D graphics using the OpenGL library. To get a better view of the workspace, we transform the whole scene again from the depth camera coordinate system to the OpenGL coordinate system, as shown in Fig. 11.

Fig. 11. Coordinate system transformation from the depth camera to OpenGL.

Finally, the point cloud of the manipulator is deleted from the depth surface using the 3D manipulator model. The experimental result, which consists of only the point cloud of the surroundings, is presented in Fig. 12.

VI. CONCLUSION

In scenes that consist of both obstacles and a manipulator, we need to distinguish between them before path planning. In order to do so, a 3D model of the manipulator is superimposed onto the depth images. System setup requires two calibration steps. First, the Kinect is calibrated to generate depth information in real-world coordinates; this first calibration requires a planar object of known size, such as a chessboard. Second, calibration between the depth camera and the manipulator is applied to find the homogeneous relationship (H^{cam}_{base}); this second calibration requires a marker attached to the robot's end-effector. The robot is then moved to generate multiple configurations in order to reduce the error of the homogeneous matrix as much as possible, and the iterative least-squares method is applied to this calibration. The experiment shows that the 3D model can delete the point cloud of the robot from the depth images. Also, the depth images acquired from the Kinect are cleaner than those from stereo matching methods. Furthermore, the proposed method requires no prior known object to remove the point cloud. However, a disadvantage of the proposed method is occlusion, which can be avoided by using multiple depth cameras.


REFERENCES

[1] S. Khan and M. Shah, "Object based segmentation of video using color, motion and spatial information," IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, vol. 2, pp. II-746 - II-751, 2001.
[2] H. Shoji, K. Shirai, and M. Ikehara, "Shape from focus using color segmentation and bilateral filter," IEEE Digital Signal Processing Workshop and 12th Signal Processing Education Workshop, pp. 566-571, 2006.
[3] R. B. Rusu, A. Holzbach, M. Beetz, and G. Bradski, "Detecting and segmenting object for mobile manipulator," IEEE 12th International Conf. on Computer Vision Workshops, pp. 47-54, 2009.
[4] S. Fazl, H. M. Dehnavi, and P. Moallem, "A robust obstacle detection method in highly textured environments using stereo vision," 2nd International Conf. on Machine Vision, pp. 97-100, 2009.
[5] N. Suganuma, "Clustering and tracking of obstacles using stereo vision system," International Joint Conf. ICCAS-SICE, pp. 4623-4628, 2009.
[6] M. Marron, J. C. Garcia, M. A. Sotelo, D. Pizzarro, and I. B. Munoz, "Extraction of 3D feature from complex environments in visual tracking applications," IEEE Conf. on Instrumentation and Measurement Technology, pp. 1-5, 2007.
[7] C. Zhao and R. Mohr, "Epipolar parameterization for reconstructing a 3D rigid curve," International Symposium on Computer Vision, pp. 67-72, 1995.
[8] A. S. Vazquez, A. Adan, and F. Molina, "3D vision system for robot interaction in moving scenes," IEEE International Symposium on Intelligent Signal Processing, pp. 1-6, 2007.
[9] S. D. Roeck, N. Cornelis, and L. V. Gool, "Augmenting fast stereo with silhouette constraints for dynamic 3D capture," 18th International Conf. on Pattern Recognition, pp. 131-134, 2006.
[10] M. Toyoura, M. Liyama, T. Funatomi, K. Kakusho, and M. Minoh, "3D shape reconstruction from incomplete silhouettes in multiple frames," 19th International Conf. on Pattern Recognition, pp. 1-4, 2008.
[11] H. Y. Lin and J. R. Wu, "3D reconstruction by combining shape from silhouette with stereo," 19th International Conf. on Pattern Recognition, pp. 1-4, 2008.
[12] D. Henrich and T. Gecks, "Multi-camera collision detection between known and unknown objects," 2nd ACM/IEEE International Conf. on Distributed Smart Cameras, pp. 1-10, 2008.
[13] K. Conley. (2011, May 26). http://www.ros.org/wiki/kinect_node
[14] N. Ziraknejad, S. Tafazoli, and P. D. Lawrence, "Autonomous stereo camera parameter estimation for outdoor visual servoing," IEEE Workshop on Machine Learning for Signal Processing, pp. 157-162, 2007.

Fig. 10. Point cloud of each scene, which contains the point cloud of the manipulator inside the black circle (scale in cm).

Fig. 12. Point cloud of the surroundings when the point cloud of the manipulator is eliminated (scale in cm).


[15] N. Burrus. (2011, August 07). http://nicolas.burrus.name/index.php/Research/KinectCalibration
[16] Robot Controls - Robot Technology Training Manual IQ200-B & IQ200-P: Operating Using PHG and ROPS4 or RobotSuite, Programming with BAPSplus or RobotSuite, Bosch Rexroth AG, Berliner Strasse 25, Germany, 2000-2005, p. 42.
