
Master thesis Master in Automatic Control and Robotics

Object recognition and grasping using

bimanual robot

Report

Author: Aleix Ripoll Ruiz

Director: Jan Rosell

Call: October 2016

Escola Tècnica Superior d’Enginyeria Industrial de Barcelona


Abstract

This document presents an algorithm to recognise and locate known bulky objects in a workspace and to compute grasps using two anthropomorphic hands. Once valid grasp points have been found and they are kinematically reachable by the manipulator, motion planning and collision checking are performed.

First, this document presents the general architecture of the system and the hardware devices and software implemented. Then, a detailed description of the object recognition, grasp and motion planning, and manager modules is provided. The Robot Operating System (ROS) is the framework chosen to be responsible for the communication between nodes with appropriate messages, to manage the data from the sensors and to execute robot tasks.

Then, attention is focused on the evaluation of the object recognition and grasp system. From this evaluation, the most important conclusions are:

- The camera calibration plays an important role in obtaining good performance and reliability of the system.

- This project proposes an approach that uses multiple camera views to construct a final point-cloud scene and obtain a complete 3D object.

- This project presents a satisfactory approach for filtering the desired object by its colours instead of by its shape.

- The more complete the point-cloud is, the smaller the error of the object pose.

- A system that grasps a bulky object with six points, three for each hand, instead of the four recommended by other works, is more robust.

- The system has a bimanual manipulator with 22 DoF for each arm. Therefore, the inverse kinematics problem is a laborious process to converge to a feasible solution.

- A ROS service is less robust to service provider changes or server failures than a ROS action, which provides feedback on the task progress and allows cancellation at any time.


Glossary

AAS Allegro Application Studio

API Application Programming Interface

AR Augmented Reality

BSD Berkeley Software Distribution

C-Space Configuration Space

DLS Damped Least Squares

DoF Degrees of Freedom

EST Expansive Space Trees

FK Forward Kinematics

FoV Field of View

FPFH Fast Point Feature Histogram

GA2H Grasping with two Allegro Hands library

GIKL General Inverse Kinematics Library

GUI Graphical User Interface

HLS Hue Lightness Saturation

ICP Iterative Closest Point

IK Inverse Kinematics

IOC Institut d’Organització i Control de sistemes industrials

KDL Orocos Kinematics and Dynamics Library

KPIECE Kinematic Planning by Interior-Exterior Cell Exploration

MIT Massachusetts Institute of Technology

MPL2 Mozilla Public License 2.0

NAN Not A Number

OMPL Open Motion Planning Library

PCL Point Cloud Library

PFH Point Feature Histogram

PR Personal Robots

PRM Probabilistic Roadmap Method

ROS Robot Operating System

RRT Rapidly-exploring Random Tree

SAC-IA SAmple Consensus Initial Alignment

SBL Single-query Bi-directional Lazy collision checking planner

SPFH Simplified Point Feature Histogram

STAIR STanford AI Robot

SVD Singular Value Decomposition

ToF Time-of-Flight

UPC Universitat Politècnica de Catalunya

UR Universal Robot

URDF Unified Robot Description Format

XML eXtensible Markup Language


Contents

1. INTRODUCTION ___________________________________________ 1

1.1. Objectives and scope ..................................................................................... 2

1.2. Project activities .............................................................................................. 3

2. GENERAL ARCHITECTURE OF THE SYSTEM __________________ 5

2.1. Hardware architecture .................................................................................... 8

2.1.1. Camera sensors ................................................................................................ 8

2.1.2. Robotic Hand .................................................................................................. 11

2.1.3. Bimanual Manipulator ..................................................................................... 12

2.2. Software architecture.................................................................................... 14

2.2.1. Robot Operating System (ROS) ...................................................................... 14

2.2.2. ROS packages: openni_launch and iai_kinect2 .............................................. 16

2.2.3. C++ libraries .................................................................................................... 17

2.2.4. The Kautham Project ...................................................................................... 18

3. OBJECT RECOGNITION MODULE ___________________________ 20

3.1. Data representation ...................................................................................... 20

3.1.1. Point-clouds .................................................................................................... 20

3.1.2. Data acquisition............................................................................................... 21

3.2. Camera Calibration ...................................................................................... 23

3.3. Object recognition node................................................................................ 29

3.3.1. Point-cloud processing .................................................................................... 29

3.3.2. ROS service: objects_recognition ................................................................... 39

4. GRASP AND MOTION PLANNING MODULE ___________________ 41

4.1. Object grasp node ........................................................................................ 41

4.1.1. Approach overview ......................................................................................... 41

4.2. Robot inverse kinematics node .................................................................... 43

4.2.1. Robot model simulation ................................................................................... 44

4.2.2. General Inverse Kinematics Library (GIKL) ..................................................... 45

4.3. Motion planning node ................................................................................... 47

4.3.1. Introduction to motion planning ....................................................................... 47

4.3.2. Basic features of the Kautham Project ............................................................ 49

5. MANAGER AND EXECUTION MODULE ______________________ 51

5.1. Input data files .............................................................................................. 51

5.2. Manager node .............................................................................................. 53

6. RESULTS _______________________________________________ 55


6.1. Evaluation of the Object Recognition Module .............................................. 55

6.1.1. Camera calibration .......................................................................................... 55

6.1.2. Multiple Kinects evaluation .............................................................................. 58

6.1.3. Object pose estimation .................................................................................... 59

6.1.4. Object recognition ........................................................................................... 66

6.2. Evaluation of the Grasp and Motion Planning Module ................................. 74

7. PROJECT BUDGET _______________________________________ 77

8. ENVIRONMENTAL IMPACT ________________________________ 78

CONCLUSIONS AND FURTHER WORK __________________________ 79

ACKNOWLEDGEMENTS _______________________________________ 81

BIBLIOGRAPHY ______________________________________________ 82


1. Introduction

During the last decades, the number of bimanual human-like robot applications has increased every day, from tasks related to picking and placing items on an industrial production line to an emerging class of personal robots that assist elderly or disabled people by cleaning, delivering items or preparing meals at home. In this scope, many of these applications require an efficient system for sensing the environment of the robot and an effective bimanual robot with anthropomorphic hands for executing some tasks, for instance manipulating two objects simultaneously. The module responsible for sensing the surroundings of the bimanual robot could provide the ability to recognise and locate 3D objects, avoid collisions with external obstacles or localise the robot while it is mapping its environment.

This project is focused on determining whether a specific object is in the workspace or not and, if so, grasping it with a bimanual robot. Human beings tend to rely on visual data to react to the world around them. Robots can carry out many tasks with sensors such as RFID sensors, RGB-D cameras, ranging sensors, tactile sensors, accelerometers, gyroscopes, etc. However, the performance of these sensors is far from that of biological human sensors, and high performance comes at a high cost. Fortunately, the robotics community has seen massive progress in low-cost hardware such as depth cameras and anthropomorphic robotic hands.

An object recognition system finds and locates objects in the real world from the data provided by the robot sensors, knowing the objects' models a priori. But this task is surprisingly difficult. In this project, the stages followed in the object recognition process on 3D scenes captured with RGB-D cameras are discussed. These types of sensors either illuminate the scene with a light pattern and observe its deformation when it hits the surfaces of the scene, or emit an infrared pulse and estimate the time required to fly into the scene and bounce back to the depth camera. The first technology is used by the Microsoft Kinect 360 [1] and the second one by the Microsoft Kinect One [2].

Bimanual grasping systems allow robots to manipulate certain objects and perform

complex tasks that are not feasible with single end-effectors. In this project the system

architecture to plan motions towards grasping configurations and to perform robust

grasping operations with a system composed of two Allegro Hands [3] attached to UR5

robots [4] is presented.

The general architecture of the system, the main hardware devices and the software tools used in this project are presented in chapter 2. Chapter 3 presents the theory about point-cloud processing, the camera calibration procedure and the ROS service implemented. Chapter 4 presents how the planning of the object grasp, the robot inverse kinematics and the motion planning have been integrated, as well as the main theoretical background. Chapter 5 presents the manager module. Chapter 6 shows the results for each module, and the final chapter concludes and presents the future work.


1.1. Objectives and scope

In order to clarify what it is expected during the development of this project, it is important

to establish objectives and the limits of the project.

The aim of the project is to develop a system able to automatically locate and grasp a specific object using a bimanual robot. To accomplish this, an open C++ software application working on the Robot Operating System (ROS) [5] is implemented. The main objectives are:

a) Development of a method to recognise and locate the object based on point-

cloud models using RGB-D cameras.

b) Development of the communications with a grasp and motion planning module

(already developed at the IOC) to determine the points to grasp the object

taking into account the hand tips and to find an admissible collision-free path

that connects the initial configuration of the arms to the target one.

c) Implementation of an execution module responsible for managing the

communication between the other modules.

d) Validation of the system by performing some experiments.

The project will use open software tools and hardware devices available at the IOC’s

laboratory:

- RGB-D cameras: Microsoft Kinect 360 and Kinect One.

- Bimanual manipulator: two Allegro Hands attached to two UR5 robots.

- Open software tools: The Kautham Project, General Inverse Kinematics Library

(GIKL) and Grasping with two Allegro Hands library (GA2H).


1.2. Project activities

This section presents the tasks developed in this project in order to meet the objectives

initially proposed. Below, all tasks are listed and, a diagram is illustrating all activities

without time line representation.

The project consists of four stages in order to meet the initial objectives. Stage I deals with all the tasks prior to the C++ programming. These include defining the project activities, searching for similar works (state of the art) and studying theory about point-cloud processing, motion planning and robot inverse kinematics. Stage II consists of programming the different modules responsible for object recognition, object grasping, path planning and robot inverse kinematics. Each of these modules is tested in Stage III in order to study its performance and reliability. At the end, Stage IV summarises the results and conclusions in a document. During all stages, there is a parallel continuous stage which contains the tasks related to project management. For each stage, a division of tasks and subtasks has been made.

o Stage I: Previous study

o Task 1: IOC needs

T1.1: Define objectives.

T1.2: Define project activities.

o Task 2: Documentation

T2.1: State of the art.

T2.2: C++ programming language.

o Stage II: Programming

o Task 3: Create database of point-cloud models of common objects

T3.1: Database generation.

o Task 4: System architecture

T4.1: Install required packages and devices drivers.

T4.2: Definition of ROS nodes.

T4.3: Build CMake file.

T4.4: Classes and methods definition.

T4.5: Definition of types of communication required.

o Task 5: Object recognition module:

T5.1: Camera calibration system.

T5.2: Point-cloud processing system.

T5.3: ROS service: object_recognition.

T5.4: Build CMake file.

o Task 6: Grasping and motion planning module:

T6.1: Integration of GA2H.

T6.2: Integration of Kautham application.

T6.3: Build CMake file.

o Task 7: Inverse kinematic node:


T7.1: Integration of GIKL.

T7.2: Build CMake file.

o Task 8: Manager module:

T8.1: Implementation of the manager and execution node.

T8.2: Build CMake file.

o Stage III: Experimental

o Task 9: System robustness: find unexpected issues and fix as many as possible.

T9.1: Run the execution node to find new unexpected issues.

o Task 10: System performance: find poorly performing code and improve it as much as possible.

T10.1: Run the execution node to find performance bottlenecks.

o Stage IV: Report

o Task 11: Results and Conclusions

o Task 12: Final review and creation of final report

o Continuous stage:

o Task R: Project reviews and weekly meetings with IOC supervisors.

The following diagram does not represent a real timeline, as some tasks could be finished earlier or later.

Figure 1.1. Project task diagram. It is not scaled in time.


2. General architecture of the system

This chapter provides useful information about the general structure of the system, the

hardware devices and the software implemented. According to the Engineering Systems

Division at Massachusetts Institute of Technology (MIT) [6], the architecture of any

complex system is “an abstract description of the entities involved of a system and how

they are related”. To construct the architecture of a system “involves determining what the

system is supposed to do and how specifically it will do it”. Moreover, the basic process of

creating the system architecture is by decomposition in which a top level concept of the

system’s functions is broken down into subsystems capable of performing subfunctions.

Knowing these definitions, the architectures of the system and the sub-systems can be

modelled.

The functional architecture [6] is designed to accomplish the system’s objectives

presented in section 1.1:

Figure 2.1. System architecture.

Figure 2.2 shows the general architecture of the Object Recognition and Grasping System and the connectivity between its modules. The first module consists of the object recognition process, which identifies and recognises the desired objects to be manipulated later. This subsystem estimates the pose of each recognised object in the world coordinate system in order to grasp and manipulate them with the bimanual robot (see chapter 3).

The second module is responsible for determining the strategy to pick the object with the robot's end effectors and place it at the final desired pose with an optimal motion plan (see chapter 4).

Finally, the execution module carries out the process of managing the information given by the other two modules (camera sensor, planner and robot information) and the input user tasks (see chapter 5).


Figure 2.2. Block diagram of the Object Recognition and Grasping System.

The architecture decomposition for each module is presented next. Here, the distinction between hardware and software is made as shown in [6]. Figure 2.3 shows the architecture of the object recognition module, Figure 2.4 shows the architecture of the object grasp and motion planning module and Figure 2.5 presents the manager and execution module.

The distinction between the physical (hardware) and non-physical (software) entities is clear and evident in most engineering projects like the present one. Two schemes can be constructed, one with the relations between the hardware components and another for the software.

Figure 2.3. Architecture of object recognition module.


Figure 2.4. Architecture of the object grasp and motion planning module.

Figure 2.5. Architecture of manager and execution module.


2.1. Hardware architecture

The hardware architecture refers to the identification and description of the system's physical components and their interaction and compatibility within the system architecture. Figure 2.6 describes the hardware used in this project: camera sensors, a bimanual robot, two robotic hands and a computer that manages the whole system.

Figure 2.6. Main hardware components used in this project.

2.1.1. Camera sensors

Kinect 360 and Kinect One have been used in this project. Both are sensors for videogames developed by Microsoft [2]. These cameras are widely used by the robotics community in the field of computer vision because of their low cost. The main differences between these two cameras are the technique used to estimate the distances of object surfaces, the colour and depth camera resolutions, the depth range and the field of view (FoV):

- Technology: Kinect 360 uses structured light and Kinect One uses the time-of-flight (ToF) technique.

- Colour camera resolution: Kinect 360 has 640x480 and Kinect One has 1920x1080.

- Depth camera resolution: Kinect 360 has 320x240 and Kinect One has 512x424.

- Depth range: Kinect 360 covers 0.4 m to 4 m and Kinect One covers 0.5 m to 4.5 m.

- Field of view: Kinect 360 has 43° vertical and 57° horizontal and Kinect One has 60° vertical and 70° horizontal.


The technical specifications of these two cameras are presented in Table 2.1 and Table

2.2:

Table 2.1. Kinect 360 technical specifications [1].

Kinect 360 Specifications

- Technology: structured light (see section 3.1.2)
- Colour camera: 1280 x 1024 @ 12 fps or 640 x 480 @ 30 fps
- Pixel size: 3.6 μm
- Spatial resolution: 3 mm @ 2 m distance
- Depth camera: 320 x 240 @ 30 fps
- Depth resolution: 1 cm @ 2 m distance
- Depth range: 0.4 m to 4 m
- Field of view (FoV): 43° vertical by 57° horizontal
- Tilt motor: vertical tilt range ±27°
- Frame rate (depth and colour stream): up to 30 fps
- USB standard: 2.0
- Audio format: a four-microphone array with signal processing including acoustic echo cancellation and noise suppression; 16-kHz, 24-bit mono pulse code modulation (PCM)
- Accelerometer: a 2G/4G/8G accelerometer configured for the 2G range, with a 1° accuracy upper limit
- Price: 270 €


Figure 2.7 shows the main components of the Kinect 360:

Figure 2.7. Kinect 360 components [1].

Table 2.2. Kinect One technical specifications [2].

Kinect One Specifications

- Technology: time-of-flight (see section 3.1.2)
- Colour camera: 1920 x 1080 @ 30 fps
- Depth camera: 512 x 424 @ 30 fps
- Depth range: 0.5 m to 4.5 m
- Field of view (FoV): 60° vertical by 70° horizontal
- Tilt motor: no
- Frame rate (depth and colour stream): up to 30 fps
- USB standard: 3.0
- Audio format: four microphones to capture sound, record audio, and find the location of the sound source and the direction of the audio wave
- Dimensions: 24.9 cm x 6.6 cm x 6.7 cm; the Kinect cable is 2.9 m long
- Weight: 1.4 kg
- Accelerometer: no
- Price: 450 €


Figure 2.8 shows the main components of the Kinect One:

Figure 2.8. Kinect One components [2].

2.1.2. Robotic Hand

Two Allegro Hands have been used in this project. This hand is developed by SimLab and supported by its Allegro Application Studio (AAS) [3]. The Allegro Hand is a low-cost and adaptive robotic hand for a variety of commercial robots. It is composed of four fingers and sixteen independently controlled joints. For this reason, it is a good platform for grasping and manipulation applications, and it is capable of holding a variety of object geometries. The technical specifications are presented in Table 2.3:

Table 2.3. Allegro hand technical specifications [3].

Allegro Hand Specifications

- Weight: finger 0.17 kg, thumb 0.19 kg, total 1.08 kg
- Design: anthropomorphic
- DoF: 16 (4 DoF x 4 fingers)
- Payload: 5 kg
- Actuation: DC motor, gear ratio 1:369, max. torque 0.70 Nm
- Joint resolution: 0.002° (measured with potentiometer)
- Communication: CAN (NI, Softing, Kvaser or ESD CAN), frequency 333 Hz
- Power: 7.4 VDC (7.0 V - 8.1 V), 5 A minimum
- System requirements: CPU Intel Core 2 Duo or higher; RAM at least 2 GB; HDD at least 2 GB; graphics with OpenGL 3.0 H/W acceleration enabled and at least 64 MB of video RAM; OS MS Windows and Linux (ROS)
- Price: 13400 €

Figure 2.9 shows the main components of the Allegro hand:

Figure 2.9. Allegro hand dimensions in mm and its main components [3].

2.1.3. Bimanual Manipulator

The bimanual robot is composed of two UR5 robots fixed to a T-shaped structure built at the IOC [7]. Figure 2.10 shows the system with a table to manipulate objects and two Allegro Hands (16 DoF) coupled to UR5 arms (6 DoF). The whole system tries to emulate a human-like morphology.

Figure 2.10. Bimanual manipulator structure. Image from [8].

The UR5 robot is a collaborative arm with six joints developed by Universal Robots [4].

UR5 is designed to mimic the range of motion of a human arm with its flexibility. This robot


is widely used in many applications such as industrial assembly lines, food handling,

painting, pick and place, etc. The technical specifications are presented in Table 2.4:

Table 2.4. UR5 technical specifications [4].

UR5 Specifications

- Weight: 18.4 kg
- Workspace: working radius up to 0.85 m
- DoF: 6 revolute joints
- Joint ranges: ±360°
- Speed: joints 180°/s, tool 1 m/s
- Payload: 5 kg
- Actuation: max. torque 2.3 Nm
- Repeatability: ±0.1 mm
- Communication: TCP/IP 100 Mbit (IEEE 802.3u, 100BASE-TX), Ethernet socket & Modbus TCP
- Power supply: 100-240 VAC, 50-60 Hz
- Power consumption: 200 W
- Price: 17900 €

The UR5 workspace is a sphere of 0.85m radius excluding a middle cylindrical area of

0.117m radius. Figure 2.11 shows the UR5 manipulator view and its workspace:

Figure 2.11. Workspace of the UR5 manipulator [4].


2.2. Software architecture

The software architecture refers to the identification and description of the system's non-physical components and their interaction and compatibility within the system architecture. Computer software interprets the information provided by the physical hardware from which the system is built. Software can be divided into application software and system software. The first one uses the computer system to perform a specific functionality for the benefit of the user (a typical computer program). The second one includes operating systems, which manage resources and provide common services for other software working on top of them, and device drivers, which control the devices connected to the computer. Figure 2.12 describes the system software and the application software used in this project:

Figure 2.12. Main software components used in this project.

The object recognition and grasping system is programmed in the C++ language and uses the components described in Figure 2.12. Additional information about each one is presented in the following subsections.

2.2.1. Robot Operating System (ROS)

The Robot Operating System (ROS) [5] is a framework for building robot software. It started at Stanford University in the mid-2000s with the Stanford AI Robot (STAIR) and the Personal Robots (PR) programs. It is open source, with a large collection of tools, libraries and conventions that simplify the work of creating complex and robust robot applications for a wide variety of robotic platforms.


The ROS community allows exchanging robot software and knowledge through ROS distributions, which make it easier to install a collection of software; repositories, where different institutions can develop and release their own robot software; and the ROS wiki, where anyone can contribute their own documentation and tutorials [9].

The ROS software is structured into packages, each of which contains some combination of code, data and documentation. Each package directory has to include a CMakeLists.txt and a package.xml file that describe the contents of the package and how catkin should interact with it. catkin is the ROS build system that generates executable programs, libraries and interfaces [10].

The ROS system allows different nodes [11] to communicate with each other, exchanging information and data. However, the whole system needs a running ROS Master [12] in order for nodes to notice the existence of other nodes and start communicating with each other. The ROS Master enables individual ROS nodes to locate one another in the system and it tracks publishers and subscribers to topics and services.

Communication between nodes can be done with client/server or publisher/subscriber methodologies. ROS topics [13] implement a publish/subscribe communication mechanism, while ROS services [14] and actions use a client/server communication method. Figure 2.13 shows a representation of these three communication mechanisms between nodes:

Figure 2.13. ROS communication mechanisms.

One of the most common ways to exchange data in a distributed system is the publish/subscribe communication mechanism implemented by topics. Before node (A) starts to transmit data over a topic, it must first advertise the topic name and the type of message that is going to be sent. Node (B) and node (C) have to subscribe to this topic by making a request to the ROS Master. Then, both nodes will receive messages from this topic. A topic is a one-way communication and it is useful if there might be multiple nodes listening [15].
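As an illustration, the following minimal roscpp sketch shows a node that advertises a topic and the callback that a subscribing node would register; the node, topic and message names are generic examples and not identifiers taken from this project:

    #include <ros/ros.h>
    #include <std_msgs/String.h>

    // Callback executed in the subscribing node each time a message arrives on the topic.
    void chatterCallback(const std_msgs::String::ConstPtr &msg)
    {
      ROS_INFO("Received: %s", msg->data.c_str());
    }

    int main(int argc, char **argv)
    {
      ros::init(argc, argv, "node_a");                     // register the node with the ROS Master
      ros::NodeHandle nh;

      // The publisher advertises the topic name and message type before sending data.
      ros::Publisher pub = nh.advertise<std_msgs::String>("chatter", 10);

      // A subscriber requests the topic from the ROS Master and provides a callback.
      ros::Subscriber sub = nh.subscribe("chatter", 10, chatterCallback);

      ros::Rate rate(1.0);
      while (ros::ok())
      {
        std_msgs::String msg;
        msg.data = "hello";
        pub.publish(msg);                                  // one-way communication over the topic
        ros::spinOnce();                                   // process pending callbacks
        rate.sleep();
      }
      return 0;
    }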


A service is a synchronous two-way communication that allows one node to call a function that executes in another node. The server node specifies a function and advertises the service. By making a request, the client node can access this service and then awaits a response from the server node [15].
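A minimal sketch of this mechanism is shown below using the standard std_srvs/Trigger type; the service and node names are examples, and the objects_recognition service of this project uses its own request and response definitions:

    #include <ros/ros.h>
    #include <std_srvs/Trigger.h>

    // Server side: the function that is executed when a client calls the service.
    bool handleRequest(std_srvs::Trigger::Request &req, std_srvs::Trigger::Response &res)
    {
      res.success = true;
      res.message = "request processed";
      return true;
    }

    // Client side (running in a different node): makes a request and awaits the response.
    void callExample(ros::NodeHandle &nh)
    {
      ros::ServiceClient client = nh.serviceClient<std_srvs::Trigger>("example_service");
      std_srvs::Trigger srv;
      if (client.call(srv))                                // blocks until the server answers
        ROS_INFO("Response: %s", srv.response.message.c_str());
    }

    int main(int argc, char **argv)
    {
      ros::init(argc, argv, "service_server_example");
      ros::NodeHandle nh;
      ros::ServiceServer server = nh.advertiseService("example_service", handleRequest);
      ros::spin();                                         // process incoming service requests
      return 0;
    }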

An action is an asynchronous two-way communication between nodes. It is similar to the request and response of a service, which for actions are called a goal and a result respectively. Moreover, the server node can provide feedback with updates on the progress and the client node can cancel the previous request at any time [15].

2.2.2. ROS packages: openni_launch and iai_kinect2

OpenNI SDK, libfreenect and libfreenect2 are available libraries for accessing the Microsoft Kinect USB cameras [16], [17], [18], [19]. They search for a Kinect by its serial number, a parameter necessary for distinguishing between multiple Kinects and running them at the same time.

A purpose of this project is to use ROS to manage data between different modules that process different types of information. There are ROS packages available that interpret the information provided by the camera drivers into useful ROS data types. Different ROS packages have been used in order to use the Kinect 360 and Kinect One in the ROS system:

1. Packages needed for launching the Kinect 360 with libfreenect and the OpenNI SDK:

a. openni_camera: publishes raw depth, RGB and IR image streams from Microsoft Kinect 360, PrimeSense PSDK, ASUS Xtion Pro or Pro Live cameras using the OpenNI SDK driver standard [19], [20].

b. openni_launch: launches RGB-D processing through rgbd_launch with the OpenNI driver [21].

c. rgbd_launch: opens an OpenNI SDK device to convert raw depth/RGB/IR streams to depth images, disparity images and registered point-clouds. This file is launched internally from openni_launch [22].

2. Package needed for launching the Kinect One with libfreenect2 and OpenNI SDK:

a. iai_kinect2: a collection of tools and libraries for ROS to interface with the Kinect One device using libfreenect2. It contains a calibration tool for calibrating the IR sensor to the RGB sensor and the depth measurements. It converts raw depth/RGB/IR streams to depth images and registered point-clouds [23].
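For illustration, a minimal node that consumes the registered point-cloud published by these packages could look like the following sketch; the topic name is the usual openni_launch default (/camera/depth_registered/points) and may differ for iai_kinect2 or for other launch configurations:

    #include <ros/ros.h>
    #include <sensor_msgs/PointCloud2.h>
    #include <pcl_conversions/pcl_conversions.h>
    #include <pcl/point_cloud.h>
    #include <pcl/point_types.h>

    // Converts each incoming ROS message into a PCL cloud with colour information.
    void cloudCallback(const sensor_msgs::PointCloud2ConstPtr &msg)
    {
      pcl::PointCloud<pcl::PointXYZRGB> cloud;
      pcl::fromROSMsg(*msg, cloud);
      ROS_INFO("Received a cloud with %zu points", cloud.size());
    }

    int main(int argc, char **argv)
    {
      ros::init(argc, argv, "cloud_listener");
      ros::NodeHandle nh;
      ros::Subscriber sub = nh.subscribe("/camera/depth_registered/points", 1, cloudCallback);
      ros::spin();
      return 0;
    }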


2.2.3. C++ libraries

This section presents different high-level C++ libraries that provide useful utilities. Eigen is used for basic matrix and vector operations, such as computing a translation and rotation of a reference frame [24]. Basic colour image processing algorithms are handled using the OpenCV library [25]. The Point Cloud Library (PCL) offers the most important point-cloud processing algorithms needed for the object recognition application [26]. The Grasping with two Allegro Hands library (GA2H) and the General Inverse Kinematics Library (GIKL) are two libraries developed at the Institute of Industrial and Control Engineering (IOC). GA2H provides the grasp points of the 3D object and GIKL computes the inverse kinematics of the bimanual manipulator to reach the grasp points. A brief description of each library follows:

1. Eigen: is a C++ library for linear algebra, matrix and vector operations and

geometrical transformations. Eigen is an open source library licensed under

Mozilla Public License 2.0 (MPL2) [24].

2. OpenCV: is an open source library licensed under the Berkeley Software Distribution (BSD) license that provides real-time computer vision capabilities. OpenCV was developed to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products. OpenCV's application areas include: facial detection and recognition, human gesture recognition, object segmentation and recognition, motion tracking, producing 3D point-clouds from stereo cameras, stitching images together to produce a high-resolution image, etc. [25].

3. Point Cloud Library: is an open source library for 2D and 3D image and point-cloud processing under the Berkeley Software Distribution (BSD) license and thus free for commercial and research use. The PCL library has numerous algorithms for point-cloud processing, such as filtering outliers from noisy data, stitching 3D point-clouds together, feature extraction to recognise objects based on their geometric appearance, surface reconstruction, registration, segmenting relevant parts of a scene, etc. [26], [27].

4. Grasping with two Allegro Hands: is a C++ library developed at the Institute of Industrial and Control Engineering (IOC). The library computes the grasp points of a 3D object in order to manipulate it with two Allegro Hands mounted on a bimanual manipulator (see section 2.1.3). The algorithm also provides the inverse kinematics and motion planning solutions to grasp the 3D object and manipulate it [8].

5. General Inverse Kinematics Library: is a C++ library developed at the Institute of Industrial and Control Engineering (IOC). The library deals with the main kinematic problems of any tree-like robot structure. It provides simple enhancements to address issues of the Orocos Kinematics and Dynamics Library (KDL), implements analytical Jacobian solutions for specific manipulators and offers alternatives for the numerical inverse kinematics solver, such as the Damped Least Squares method, not included in KDL. Additional information can be found in Annex A.

2.2.4. The Kautham Project

The Kautham Project is a software tool developed at the Institute of Industrial and Control Engineering (IOC) for teaching and research in robot motion planning [28]. It is developed in the C++ programming language and uses the Open Motion Planning Library (OMPL) for the large number of planners it provides [29]. OMPL is an open source library under the terms of a Berkeley Software Distribution (BSD) license. The library offers a large number of sampling-based motion planning algorithms such as the Probabilistic Roadmap Method (PRM), Rapidly-exploring Random Trees (RRT), Expansive Space Trees (EST), etc.
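As an illustration of how OMPL is typically driven, the following minimal sketch plans in a two-dimensional configuration space with a PRM planner; the state space, bounds and trivial validity checker are placeholder examples, and the smart-pointer types may differ depending on the OMPL version:

    #include <ompl/base/spaces/RealVectorStateSpace.h>
    #include <ompl/geometric/SimpleSetup.h>
    #include <ompl/geometric/planners/prm/PRM.h>
    #include <iostream>

    namespace ob = ompl::base;
    namespace og = ompl::geometric;

    int main()
    {
      // Two-dimensional configuration space with bounds [0, 1] x [0, 1].
      auto space(std::make_shared<ob::RealVectorStateSpace>(2));
      ob::RealVectorBounds bounds(2);
      bounds.setLow(0.0);
      bounds.setHigh(1.0);
      space->setBounds(bounds);

      og::SimpleSetup ss(space);

      // The validity checker is where a collision checker (PQP in Kautham) would be plugged in.
      ss.setStateValidityChecker([](const ob::State * /*state*/) { return true; });

      ob::ScopedState<> start(space), goal(space);
      start[0] = 0.1; start[1] = 0.1;
      goal[0]  = 0.9; goal[1]  = 0.9;
      ss.setStartAndGoalStates(start, goal);

      // Any sampling-based planner offered by OMPL can be selected here (PRM, RRT, EST, ...).
      ss.setPlanner(std::make_shared<og::PRM>(ss.getSpaceInformation()));

      if (ss.solve(1.0))                                   // plan for at most one second
        ss.getSolutionPath().print(std::cout);
      return 0;
    }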

OMPL does not contain any method for collision detection or any visualization tool [30], so Kautham uses Qt for the user interface, the Coin3D library for graphic rendering, the PQP library for collision checking and the ODE library for the dynamic simulation of rigid bodies. The XML format is used for the input problem formulation. Robots and 3D obstacles are defined as kinematic chains from Denavit-Hartenberg files or Unified Robot Description Format (URDF) files. The following figure shows the Kautham Project software application:

Figure 2.14. Kautham graphical user interface.

Different basic planners can be easily used and parameterized, allowing users to learn each of them in depth. This software can simulate the same problem with several planners and different strategies, changing their parameters such as the maximum number of sampling points, the maximum number of nearest neighbours, the maximum distance to connect two nodes without collision, the sampler method, etc. [28]. An example of the configuration space of a problem solved using a PRM planner is shown in Figure 2.15:


Figure 2.15. Visualization of the configuration space of a problem solved with a PRM.

The user can set different solver limitations such as the number of runs to simulate each problem, the maximum memory used or the maximum time to find a solution. Kautham creates a benchmarking output file which contains the simulation results [28]. An example of the benchmarking output is provided, comparing the execution times in seconds of the PRM, EST and Single-query Bi-directional Lazy collision checking (SBL) planners:

Figure 2.16. Example of boxplot in the benchmarking output.


3. Object Recognition Module

The object recognition module identifies specific 3D object models within a captured 3D scene with colour information. Moreover, the object's pose with respect to the world coordinate system is provided. Figure 3.1 shows the block diagram of the module indicating the inputs and outputs:

Figure 3.1. Object Recognition Module block.

The point-cloud data definition, the data acquisition, the procedure for calibrating the RGB-D cameras and the point-cloud processing pipeline are explained in the following subsections. At the end, this section presents how a ROS node is constructed to advertise a ROS service with the described functionality whenever a client requests object recognition from a 3D scene with colour information.

3.1. Data representation

An image taken with a common optical camera represents a projection of a partial 3D world scene onto a 2D plane. The 2D digital image consists of a numerical matrix whose values are the light intensities. The matrix elements are called pixels. Each pixel can have one or more channels: one channel could define a grayscale image and three channels a colour image. When the captured image adds, for each pixel, information about distances in Cartesian space, the data represents a 3D world scene.

3.1.1. Point-clouds

A point-cloud is a data structure used to represent a collection of multi-dimensional points and is commonly used to represent three-dimensional data in some coordinate system. These points are usually defined by X, Y, Z geometric coordinates and are often intended to represent the external surface of an object. Moreover, a point-cloud can also be defined with the corresponding colour information, so each point includes the light intensity values of red, green, blue and transparency. The point-cloud becomes 4D when colour information is present.

A point-cloud can be generated artificially from a computer program, for instance creating a point-cloud sphere with algebra, or it can be acquired from hardware sensors such as stereo cameras, 3D scanners or time-of-flight cameras (see 3.1.2). Figure 3.2 shows a point-cloud example. The coordinate system of the point-cloud is set with respect to a fixed coordinate system. If the point-cloud represents a 3D world scene, then the fixed coordinate system is usually the sensor origin.

Figure 3.2. Point-cloud example with different objects on the table. Image from [26].
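As an example of a point-cloud generated artificially, the following PCL sketch samples points on a sphere, attaches a colour to each point and stores the result in a PCD file; the radius, angular step and file name are arbitrary example values:

    #include <pcl/point_cloud.h>
    #include <pcl/point_types.h>
    #include <pcl/io/pcd_io.h>
    #include <cmath>

    int main()
    {
      pcl::PointCloud<pcl::PointXYZRGB> cloud;
      const float radius = 0.1f;                           // 10 cm sphere

      // Sample the surface of a sphere and give every point a red colour.
      for (float theta = 0.0f; theta < M_PI; theta += 0.05f)
        for (float phi = 0.0f; phi < 2.0f * M_PI; phi += 0.05f)
        {
          pcl::PointXYZRGB p;
          p.x = radius * std::sin(theta) * std::cos(phi);
          p.y = radius * std::sin(theta) * std::sin(phi);
          p.z = radius * std::cos(theta);
          p.r = 255; p.g = 0; p.b = 0;                     // colour information
          cloud.push_back(p);
        }

      pcl::io::savePCDFileBinary("sphere.pcd", cloud);     // store the model for later use
      return 0;
    }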

3.1.2. Data acquisition

There are several ways of estimating the distance of an arbitrary object in the 3D scene and converting it into a point-cloud. This section presents some hardware sensors, such as stereo cameras, 3D scanners or time-of-flight cameras, and how they can construct a point-cloud with colour information captured from a real scene. The type of sensor chosen depends on the situation and the application to be developed. Sometimes a high-resolution point-cloud is required, sometimes a faster sensor with a high frame rate, or a point-cloud without noise.

An RGB-D camera sensor combines colour and depth information. This allows capturing the texture of an object in the scene from the colour image and its geometry from the depth sensor. This camera sensor contains:

- An optical sensor for getting the colour information.

- A depth sensor, composed of an infrared emitter and an infrared sensor, which estimates the distances of the objects in the scene.

Some types of RGB-D cameras are:

1. Stereo cameras: imitate human binocular vision by using two or more lenses, each with a separate image sensor. This gives the ability to capture 3D images [31].

2. Time-of-flight (ToF): measures the time that a ray or pulse of light takes to travel the distance between the camera and the object. The entire scene is captured with a single pulse of light, as opposed to point-by-point with a laser beam. A ToF camera has an active light source that is used to illuminate the scene. The camera registers the light reflected from the scene and the depth information is calculated [32].

3. Structured light: a technique to measure 3D surfaces by projecting patterns of infrared light. This pattern is seen distorted when looked at from a perspective different from the projector's. By analysing this distortion, information about the depth can be retrieved and the surface reconstructed. The main advantage is that the projected features are easily distinguished by the camera [33].

How the point-cloud is constructed from the optical camera and the depth sensor is not trivial. Note that a camera calibration is required in order to obtain the correspondences between the pixels from the colour and depth sensors, because both sensors are not in the same position inside the camera. A translation and rotation matrix is needed to relate both sensors. Section 3.2 shows the procedure for calibrating the RGB-D camera, and an example of an uncalibrated and a calibrated point-cloud is shown in Figure 3.5.

Kinect 360 and Kinect One have been used in this project and the main camera

specifications for both are presented in section 2.1.1. The following figures show the

images from Kinect 360 and the point-cloud generated.

Figure 3.3. Example of images taken with Kinect 360.


Figure 3.4. Coloured point-cloud generated with Kinect 360.

3.2. Camera Calibration

Object recognition, navigation systems and 3D scene reconstruction are examples of robotics applications that use the camera calibration parameters. The process of camera calibration gives us a mathematical model of the camera's geometry, a distortion model of the lens and the pose of the camera with respect to a reference coordinate frame. Camera calibration can be divided into two main stages:

- Calibration of the intrinsic parameters: such as lens distortion, focal length, optical centre and, for RGB-D cameras, colour and depth image offsets.

- Calibration of the extrinsic parameters: the position and orientation of the camera in a reference coordinate frame.

The intrinsic model transforms a 3D point in the real-world scene into the corresponding 2D point in the camera image plane. For RGB-D cameras, the intrinsic calibration solves the colour and depth offsets, because the optical and IR sensors are not in the same position inside the camera. Usually, the intrinsic calibration needs to be done only once per camera, so it makes sense to save the estimated model after a successful calibration into a file in XML or YAML format [34], [35]. Later, just loading these files into the program allows skipping the calibration stage. Reliable calibration methods already exist and are widely used [23], [36].
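A minimal sketch of this idea with OpenCV's cv::FileStorage is shown below; the file name and the parameter values are placeholders, not the calibration results of this project:

    #include <opencv2/core/core.hpp>

    int main()
    {
      // Placeholder intrinsic matrix and distortion coefficients.
      cv::Mat cameraMatrix = (cv::Mat_<double>(3, 3) << 525.0, 0.0, 319.5,
                                                        0.0, 525.0, 239.5,
                                                        0.0, 0.0, 1.0);
      cv::Mat distCoeffs = cv::Mat::zeros(5, 1, CV_64F);

      // Save the estimated model after a successful calibration.
      cv::FileStorage out("kinect_intrinsics.yaml", cv::FileStorage::WRITE);
      out << "camera_matrix" << cameraMatrix;
      out << "distortion_coefficients" << distCoeffs;
      out.release();

      // Later, loading the file allows the calibration stage to be skipped.
      cv::FileStorage in("kinect_intrinsics.yaml", cv::FileStorage::READ);
      in["camera_matrix"] >> cameraMatrix;
      in["distortion_coefficients"] >> distCoeffs;
      in.release();
      return 0;
    }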

The default intrinsic parameters of the Kinect 360 provided by the ROS openni_camera package [37] are quite accurate, so an intrinsic calibration procedure has not been applied. However, the Kinect One has been calibrated with the method provided by the ROS iai_kinect2 package [23], which gives the camera intrinsic matrix, the distortion coefficients of the lens, the rotation matrix and the projection matrix for the RGB and IR cameras and saves the information into a YAML file. Moreover, another YAML file is generated containing the matrix which relates both cameras, i.e. the rotation and translation from the RGB to the IR camera. Figure 3.5 shows how the colour is now correctly applied to the depth data. This is clearly evident around edges like the edge of the Stäubli robot or the rugby ball.

Figure 3.5. Uncalibrated and calibrated point-cloud.

The iai_kinect2 package uses a chessboard pattern, a symmetric or an asymmetric circle grid to calibrate the intrinsic parameters. Typical calibration steps for the optical and IR depth sensors are:

1. Start at a short distance where the calibration pattern covers most of the image.

2. Tilt the calibration pattern vertically and horizontally.

3. Move the pattern further away.

4. Move the pattern with different orientations around most of the image.

Figure 3.6. Calibration steps with chessboard pattern.


Figure 3.7 shows the pattern used to calibrate optical and IR of Kinect One:

Figure 3.7. Chess5x7x0.03 pattern detected for optical and IR sensors.

The extrinsic calibration determines the transformation from the camera coordinate system to the world coordinate system. This transformation can be decomposed into a rotation and a translation. Homogeneous transformations based on 4x4 real matrices have been used most often within the robotics and computer vision community. The homogeneous matrix is described below in a block-partitioned form [38]:

$$ {}^{B}T_{A} = \begin{bmatrix} {}^{B}R_{A} & {}^{B}\mathbf{p}_{A} \\ \mathbf{0}_{1\times 3} & 1 \end{bmatrix} $$ (Eq. 3.1)

where ${}^{B}R_{A} \in SO(3)$ is the relative orientation and ${}^{B}\mathbf{p}_{A} \in \mathbb{R}^{3}$ is the relative position of frame A relative to frame B.

The pattern transformation matrix ${}^{W}T_{P}$ is known, and the current pose of the camera with respect to the pattern coordinate system, ${}^{P}T_{C}$, is also provided. Then, the pose of the camera coordinate system with respect to the world coordinate system is determined by multiplying the homogeneous transformation matrices:

$$ {}^{W}T_{C} = {}^{W}T_{P} \; {}^{P}T_{C} $$ (Eq. 3.2)
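A minimal sketch of Eq. 3.2 using the Eigen library is shown below; the transform values are placeholders and the variable names are only illustrative:

    #include <Eigen/Geometry>
    #include <cmath>
    #include <iostream>

    int main()
    {
      // World -> pattern transform, known from the workspace set-up (placeholder values).
      Eigen::Isometry3d T_world_pattern = Eigen::Isometry3d::Identity();
      T_world_pattern.translate(Eigen::Vector3d(0.5, 0.0, 0.0));

      // Pattern -> camera transform, estimated by the extrinsic calibration (placeholder values).
      Eigen::Isometry3d T_pattern_camera = Eigen::Isometry3d::Identity();
      T_pattern_camera.translate(Eigen::Vector3d(0.0, 0.0, 1.2));
      T_pattern_camera.rotate(Eigen::AngleAxisd(M_PI, Eigen::Vector3d::UnitY()));

      // Eq. 3.2: pose of the camera in the world coordinate system.
      Eigen::Isometry3d T_world_camera = T_world_pattern * T_pattern_camera;
      std::cout << T_world_camera.matrix() << std::endl;
      return 0;
    }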


Figure 3.8 shows how multiple cameras can be calibrated with the same pattern

coordinate system.

Figure 3.8. Extrinsic calibration of two cameras with the same pattern.

Figure 3.9 presents how a single camera can be calibrated with multiple patterns. This process is more robust for determining the pose of a camera with respect to the world coordinate system, because it uses the information of multiple patterns.

Figure 3.9. Extrinsic calibration of a single camera with multiple patterns.

Note that symmetric markers, such as the chessboard used for intrinsic calibration (Figure 3.7), do not provide a correct orientation of the camera with respect to the world coordinate system: the algorithm is not able to distinguish whether the pattern is oriented up or down. This issue is solved by using asymmetric patterns. Reliable


algorithms already exist, which are widely used:

1. The asymmetric circle pattern of the OpenCV library [39].

2. ARToolkit: calculates the real camera position and orientation relative to physical markers in real time [40].

3. The ROS package ar_pose: an Augmented Reality marker pose estimator that uses ARToolkit. It provides a transform between the camera and a single AR marker, or an array of transforms for multiple AR markers [41].

4. AR Track Alvar: newer and more advanced than ARToolkit, which has been the core of other ROS AR marker packages [42].

AR markers are printouts that look much like QR codes. They have a special structure that makes it easy for common object recognition algorithms to find them inside an image or camera stream. First, they have a clear border that detaches the marker from the outside world. Second, inside this border there is a unique pattern that is easily recognisable by image recognition algorithms, as it provides clear edges. Third, knowing the size of the marker helps to transform the 2D image coordinates into 3D world coordinates. However, all types of patterns have some restrictions, such as the marker size and the angle of the camera looking at the marker.

Figure 3.10. Three examples of AR tags [42].

OpenCV algorithms have been implemented for their simplicity instead of external ROS packages, which would generate a ROS package dependency. Therefore, an asymmetric circle grid is used (see Figure 3.13) and AR markers will be integrated in further work.

The method developed to calibrate the extrinsic parameters of a camera is the same whatever camera is used. The camera driver, which decodes the sensor information, has to be adapted to a ROS message topic in order to provide the colour image data to the extrinsic camera calibration algorithm. Figure 3.11 shows the inputs and outputs of the extrinsic calibration function. The needed inputs are: the colour image topic, the intrinsic camera parameters and the position and orientation of all asymmetric circle grid patterns in the scene referred to the world coordinate system. The output is the position and orientation of the camera in the world coordinate system.


Figure 3.11. Extrinsic camera calibration block.

A diagram of the code structure is shown in Figure 3.12. Two OpenCV functions have been used for the generation of the extrinsic camera matrix (${}^{W}T_{C}$): findCirclesGrid and solvePnP. The first one attempts to determine whether the input image contains a grid of circles; it locates the centres of the circles and returns a non-zero value if all of the centres have been found [43]. The second one estimates the object pose with respect to the pattern coordinates, given a set of object points, their corresponding image projections, the camera matrix and the distortion coefficients [43].

Then, Object3DChessboard determines an array of object points in the pattern coordinate system. These object points are correlated with the centres of the circles recognised in the input image. Finally, the pose of the camera in the world coordinate system is determined by a simple transformation. Knowing the position and orientation of the detected pattern and an average translation (tvec) and an average rotation (rvec) of the camera with respect to the detected pattern, the extrinsic camera matrix is set:

- rvec: an output rotation vector that brings points from the model coordinate system to the camera coordinate system. This vector is based on the Rodrigues rotation formula, an efficient algorithm for rotating a vector in space given an axis and angle of rotation [43].

- tvec: an output translation vector.

Figure 3.12. Block diagram of camera set_pose function.
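A condensed sketch of these two calls is given below; the pattern size, circle spacing and intrinsic values are example placeholders, and the code omits the averaging over several detections mentioned above:

    #include <opencv2/opencv.hpp>
    #include <vector>

    int main()
    {
      cv::Mat image = cv::imread("scene.png");             // colour image from the camera topic
      cv::Mat gray;
      cv::cvtColor(image, gray, cv::COLOR_BGR2GRAY);

      // Locate the centres of the asymmetric circle grid (4x11) in the image.
      cv::Size patternSize(4, 11);
      std::vector<cv::Point2f> centers;
      if (!cv::findCirclesGrid(gray, patternSize, centers, cv::CALIB_CB_ASYMMETRIC_GRID))
        return 1;

      // 3D positions of the circle centres in the pattern coordinate system
      // (0.02 m spacing between circles is an example value).
      std::vector<cv::Point3f> objectPoints;
      for (int i = 0; i < patternSize.height; ++i)
        for (int j = 0; j < patternSize.width; ++j)
          objectPoints.push_back(cv::Point3f((2 * j + i % 2) * 0.02f, i * 0.02f, 0.0f));

      // Placeholder intrinsic parameters (normally loaded from the calibration file).
      cv::Mat cameraMatrix = (cv::Mat_<double>(3, 3) << 525.0, 0.0, 319.5,
                                                        0.0, 525.0, 239.5,
                                                        0.0, 0.0, 1.0);
      cv::Mat distCoeffs = cv::Mat::zeros(5, 1, CV_64F);

      // Pose of the pattern with respect to the camera: rotation (rvec) and translation (tvec).
      cv::Mat rvec, tvec, R;
      cv::solvePnP(objectPoints, centers, cameraMatrix, distCoeffs, rvec, tvec);
      cv::Rodrigues(rvec, R);                              // rotation vector -> rotation matrix
      return 0;
    }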


Figure 3.13 shows a detected asymmetric 4x11 circle grid located in front of the camera.

Figure 3.13. Detected asymmetric 4x11 circle grid used for the extrinsic camera calibration.

3.3. Object recognition node

The most important concepts presented in this section are the point-cloud processing algorithms implemented, such as filtering, segmentation, keypoint detection and feature extraction, and how the object recognition node advertises a ROS service in order to provide object recognition when a client requests it.

3.3.1. Point-cloud processing

A general processing pipeline is presented in Figure 3.14. The main stages are:

- Point-cloud filtering, which removes noise that can degrade the system performance; reducing the point-cloud density also decreases the computational cost of the following stages.

- Segmentation, which distinguishes the points of the 3D object to recognise from the entire captured point-cloud scene.

- Descriptor extraction, which captures the most relevant geometric information of the object before the comparison with an object model.

- Pose computation, the last stage, which computes the position and orientation of the matched object.


Figure 3.14. Point-cloud processing pipeline.

The RGB-D cameras generate point-clouds with different point densities. Additionally, measurement errors cannot be avoided, which complicates the estimation of point-cloud parameters such as surface normals. Consequently, these irregularities lead to erroneous values and make the object recognition process fail. However, these issues can be solved by applying some algorithms.

The point-cloud filtering process consists in removing points of no interest which do not provide useful information, such as points at infinity caused by unexpected ray reflections in the scene, unpredictable sensor noise, points with a "not a number" (NaN) value, massive amounts of data, occlusions, etc. The PCL functions used to address these issues are listed below (a code sketch combining the three standard PCL filters is given after the list):

1. PassThrough filter [44]: performs simple filtering along a specified dimension, cutting off values that are either inside or outside a given user range. This allows setting a workspace where the objects to be recognised will be placed by the user. Figure 3.15 shows how this function works: one image contains the original point-cloud and the other one only the points inside the workspace fixed by the user.


Figure 3.15. Example of pass through filtering.

2. removeNaNFromPointCloud [45]: removes all points in the input cloud which contain a NaN value. Most PCL functions used in the following stages cannot handle NaN values; consequently, they cannot process the point-cloud if these values are not removed beforehand.

3. VoxelGrid filter [46]: this PCL function is used to downsample the point-cloud in order to reduce the number of points. The method consists in using a 3D voxel grid, like a 3D box, and approximating all the points inside each voxel with their centroid. This approach is slower than approximating them with the centre of the voxel, but it represents the object surface more accurately. Figure 3.16 shows how this function works: one image contains the original point-cloud and the other one the downsampled result.

Figure 3.16. Downsampling a point-cloud using a voxel grid.

Two additional functions have been developed:

4. Colour filtering: removes all points which are not close to the colour threshold fixed by the user. The function follows the HLS model representation, which rearranges the geometry of RGB in order to be more intuitive than the cartesian cube representation. HLS is defined as a double cone or cylinder and stands for hue, lightness and saturation. The two vertices correspond to the white and black colours, the angle defines the hue value, the distance to the vertical axis defines the saturation value and the distance along the white-black axis corresponds to the lightness.

Figure 3.17. HLS model representation.

In order to compare the colours of the point-cloud with the colour threshold fixed by the user, it is convenient to obtain the pure hue values of the point-cloud. Note that points close to the white and black values are always filtered out. A radius threshold is applied in order to decide which values are close to the colour reference and which ones are close to the white and black colours (see Figure 3.18). Therefore, a point is set to its maximum saturation and half lightness if it lies inside the blue region.

Figure 3.18. HLS model: three regions delimited. Points in the blue region are considered as colour.

Figure 3.19 shows the original and the final point-cloud after applying colour filtering:

Figure 3.19. An example of maximum colour saturation and half lightness.


5. Cluster filtering or radius outlier filtering: discards clusters with a low density of points. The user specifies the number of neighbours that every point must have within a specified radius to remain in the point-cloud. For example, if two neighbours and a radius are specified, the blue points will be removed from the point-cloud (see Figure 3.20).

Figure 3.20. An example of point cluster filtering.

Figure 3.21 shows the selected clusters with low density of points to be removed

from the original point-cloud.

Figure 3.21. Cluster filtering process.
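As announced above, the following sketch shows how the three standard PCL filters chain together. The point type, the filter field and the numeric thresholds are illustrative assumptions, not the exact values used in the project:

```cpp
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/filters/passthrough.h>
#include <pcl/filters/filter.h>        // removeNaNFromPointCloud
#include <pcl/filters/voxel_grid.h>
#include <vector>

using CloudT = pcl::PointCloud<pcl::PointXYZRGB>;

CloudT::Ptr filterScene(const CloudT::Ptr& input)
{
    // 1. Keep only the points inside the user-defined workspace along z.
    CloudT::Ptr cropped(new CloudT);
    pcl::PassThrough<pcl::PointXYZRGB> pass;
    pass.setInputCloud(input);
    pass.setFilterFieldName("z");
    pass.setFilterLimits(0.0, 1.5);              // assumed workspace limits in metres
    pass.filter(*cropped);

    // 2. Remove NaN points so that later PCL stages do not fail.
    std::vector<int> indices;
    pcl::removeNaNFromPointCloud(*cropped, *cropped, indices);

    // 3. Downsample with a voxel grid (each voxel is replaced by its centroid).
    CloudT::Ptr downsampled(new CloudT);
    pcl::VoxelGrid<pcl::PointXYZRGB> voxel;
    voxel.setInputCloud(cropped);
    voxel.setLeafSize(0.005f, 0.005f, 0.005f);   // assumed 5 mm leaf size
    voxel.filter(*downsampled);

    return downsampled;
}
```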

Segmentation methods divide large amounts of data into smaller clusters in order to regroup elements with the same properties and decrease the computational cost of processing the data in the next stages. The segmentation method implemented is based on the colour-based region growing algorithm. The region growing approach consists in connecting two points within a specified distance; if a point with the same properties is not close to another, the algorithm splits the remaining point-cloud into separate clusters. Colour-based region growing uses colour instead of point normals. Clusters are made by merging points with a similar colour. Then, two neighbouring clusters with a small difference between their average colours are merged together. During this step every single cluster is verified by the number of points that it contains: if this number is less than the specified threshold, the current cluster is merged with the closest neighbouring cluster.


Figure 3.22 shows how this RegionGrowingRGB function [47] splits the whole scene into different clusters based on colour:

Figure 3.22. Segmented scene with colour-based region growing.
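The following sketch shows how PCL's RegionGrowingRGB can be configured for this step; the thresholds are illustrative values, not the ones tuned for this project:

```cpp
#include <pcl/point_types.h>
#include <pcl/search/kdtree.h>
#include <pcl/segmentation/region_growing_rgb.h>
#include <vector>

std::vector<pcl::PointIndices> segmentByColour(
    const pcl::PointCloud<pcl::PointXYZRGB>::Ptr& cloud)
{
    pcl::search::KdTree<pcl::PointXYZRGB>::Ptr tree(
        new pcl::search::KdTree<pcl::PointXYZRGB>);

    pcl::RegionGrowingRGB<pcl::PointXYZRGB> reg;
    reg.setInputCloud(cloud);
    reg.setSearchMethod(tree);
    reg.setDistanceThreshold(0.01f);    // max distance to consider two points neighbours (m)
    reg.setPointColorThreshold(6.0f);   // colour difference to join a point to a cluster
    reg.setRegionColorThreshold(5.0f);  // colour difference to merge two neighbouring clusters
    reg.setMinClusterSize(200);         // smaller clusters are merged with a neighbour

    std::vector<pcl::PointIndices> clusters;
    reg.extract(clusters);
    return clusters;
}
```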

Figure 3.23 and Figure 3.24 present how the object recognition module is capable of detecting single or multiple objects. This is possible with the colour-based region growing segmentation, where each detected cluster can be an independent object (see Figure 3.23) or several clusters can form a single object with different colours (see Figure 3.24):

Figure 3.23. Segmented scene with multiple objects.


Figure 3.24. Segmented scene with single object with different colours.

At this point, all clusters have been taken as keypoints to proceed with the following stages. The keypoints contain the important points that can describe the entire cloud. For this reason, the input cluster can be downsampled using the VoxelGrid [46] with a specific leaf size, and all sub-sampled points become our keypoints. Then, the normals of these points are used to build the shape histogram used in the descriptors. The normal carries the information related to the curvature of the surface, and it can be computed with the NormalEstimation function [48].

Object recognition compares different sets of points or templates in order to find correspondences between models. The information that uniquely characterises a point is known as a descriptor or point-feature representation [49]. Descriptors represent the information of an image or point-cloud and they are a fundamental part of an object recognition system. A descriptor is built with the information of its neighbourhood and its normals. If the point-cloud is not dense, the descriptors will lack information about the curvature of the object [50]. The PCL function used to extract the descriptors is:

1. FPFHEstimation [51]: class to compute Fast Point Feature Histogram (FPFH) [52] descriptors from the input point-cloud and its surface normals.

The FPFH descriptor is a local descriptor that improves the performance of the Point Feature Histogram (PFH) descriptor by only considering the direct connections between keypoints and their neighbours. FPFH removes the additional links between neighbours.

A PFH model is based on the relationships between the points in the k-neighbourhood and their estimated surface normals. Figure 3.25 presents a diagram of the PFH computation:


Figure 3.25. The influence region diagram for a PFH.

The point Pq, marked in red and placed in the middle of a circle (a sphere in 3D) with radius r, is fully interconnected in a mesh with all its k neighbours (points with distances smaller than the specified radius). Then, the final PFH descriptor is computed with the relationships between all pairs of points in the neighbourhood, which represents a computational complexity of O(nk²), being n the number of keypoints and k the number of neighbours for each point [52].

The FPFH algorithm reduces the computational complexity of PFH to O(nk). This method computes the descriptor of a point Pq only considering the pairs between itself and its neighbours (red lines illustrated in Figure 3.26), using a Simplified Point Feature Histogram (SPFH) [52].

An additional step is performed to compensate for this loss of information about the connections between neighbours. FPFH adds to the point Pq a weighted SPFH from its neighbours according to distance. This procedure incorporates additional information about the connections between neighbours and the point Pq up to a distance of 2r [52]:

$\mathrm{FPFH}(P_q) = \mathrm{SPFH}(P_q) + \frac{1}{k}\sum_{i=1}^{k}\frac{1}{\omega_i}\,\mathrm{SPFH}(P_i)$ (Eq. 3.3)

where $\omega_i$ represents the distance between the point Pq and a neighbour point Pi. The extra FPFH connections are shown with black lines in Figure 3.26. As the diagram shows, some of the value pairs will be counted twice (marked with thicker lines in the figure).


Figure 3.26. The influence region diagram for a FPFH.

To sum up, FPFH has been implemented in this project because it reduces the overall complexity of PFH, making it possible to use it in real-time applications. FPFH includes additional point pairs outside the sphere of radius r (at most 2r away), but it does not fully interconnect all the neighbours of the point Pq, as can be seen in Figure 3.26. Some value pairs are missing, which might contribute to a loss of information about the geometry around the point. However, the re-weighting scheme recaptures some of the neighbouring point values using the SPFH, as presented in the equation above.
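A minimal sketch of this descriptor stage with PCL is given below, assuming PointXYZ keypoints and illustrative search radii (the FPFH radius must be larger than the one used for the normals):

```cpp
#include <pcl/point_types.h>
#include <pcl/search/kdtree.h>
#include <pcl/features/normal_3d.h>
#include <pcl/features/fpfh.h>

pcl::PointCloud<pcl::FPFHSignature33>::Ptr computeFPFH(
    const pcl::PointCloud<pcl::PointXYZ>::Ptr& keypoints)
{
    pcl::search::KdTree<pcl::PointXYZ>::Ptr tree(
        new pcl::search::KdTree<pcl::PointXYZ>);

    // Surface normals carry the curvature information used by the histograms.
    pcl::NormalEstimation<pcl::PointXYZ, pcl::Normal> ne;
    ne.setInputCloud(keypoints);
    ne.setSearchMethod(tree);
    ne.setRadiusSearch(0.02);                      // assumed 2 cm normal radius
    pcl::PointCloud<pcl::Normal>::Ptr normals(new pcl::PointCloud<pcl::Normal>);
    ne.compute(*normals);

    // FPFH: SPFH of each keypoint plus the distance-weighted SPFH of its neighbours (Eq. 3.3).
    pcl::FPFHEstimation<pcl::PointXYZ, pcl::Normal, pcl::FPFHSignature33> fpfh;
    fpfh.setInputCloud(keypoints);
    fpfh.setInputNormals(normals);
    fpfh.setSearchMethod(tree);
    fpfh.setRadiusSearch(0.05);                    // assumed 5 cm descriptor radius
    pcl::PointCloud<pcl::FPFHSignature33>::Ptr descriptors(
        new pcl::PointCloud<pcl::FPFHSignature33>);
    fpfh.compute(*descriptors);
    return descriptors;
}
```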

The final stage presented in Figure 3.14 consists of an alignment process called template registration or matching. By registering a template to a new point-cloud, the position and orientation of the object can be determined. Moreover, a fitness score of the alignment is provided. Two PCL functions have been implemented to deal with the registration stage:

1. SampleConsensusInitialAlignment [53]: this method takes an object template as input and aligns it to the target cloud. The fitness score gives an idea of how successful the alignment was, and the transformation matrix gives the position and orientation of the object aligned to the target cloud. This matrix can be described in block-partitioned form as shown in Eq. 3.1.

The Sample Consensus Initial Alignment procedure (SAC-IA) [52] uses point-cloud intrinsic, rotation-invariant features. However, its computational complexity is high because the matching step requires analysing all possible correspondence pairs, searching for similar histograms in both point-clouds.

2. IterativeClosestPoint (ICP) [54]: minimises the difference between two point-clouds iteratively [55]; the transformation is estimated based on Singular Value Decomposition (SVD). ICP requires an accurate initialisation and mostly overlapping scans. This function has three termination criteria: a maximum number of iterations, a minimum difference between the previous and the current estimated transformation, and a minimum Euclidean squared error.

The fitness score is updated after the ICP alignment. A high value indicates either that the aligned template and the scene do not correspond to the same object, or that the alignment would require more iterations to reach the best match between both point-clouds; a lower value means a better alignment. A transformation matrix that describes how the template points should be rotated and translated in order to best align with the points in the target cloud is also provided. Note that the alignment score says a lot about the final result: the 3D object is considered recognised in the scene, and its position and orientation with respect to the world coordinate system are fixed, only if this score has a low value.

A general registration pipeline used in this project is presented in Figure 3.27:

Figure 3.27. Block diagram of the registration procedure.
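As a sketch of how the two registration functions are usually chained in PCL (the parameter values are illustrative and the termination criteria correspond to the three listed above):

```cpp
#include <pcl/point_types.h>
#include <pcl/registration/ia_ransac.h>   // SampleConsensusInitialAlignment
#include <pcl/registration/icp.h>

Eigen::Matrix4f alignTemplate(
    const pcl::PointCloud<pcl::PointXYZ>::Ptr& templateCloud,
    const pcl::PointCloud<pcl::FPFHSignature33>::Ptr& templateFeatures,
    const pcl::PointCloud<pcl::PointXYZ>::Ptr& targetCloud,
    const pcl::PointCloud<pcl::FPFHSignature33>::Ptr& targetFeatures,
    double& fitnessScore)
{
    // Coarse alignment (SAC-IA) using the FPFH correspondences.
    pcl::SampleConsensusInitialAlignment<pcl::PointXYZ, pcl::PointXYZ,
                                         pcl::FPFHSignature33> sac;
    sac.setInputSource(templateCloud);
    sac.setSourceFeatures(templateFeatures);
    sac.setInputTarget(targetCloud);
    sac.setTargetFeatures(targetFeatures);
    pcl::PointCloud<pcl::PointXYZ> coarse;
    sac.align(coarse);

    // Fine alignment (ICP) starting from the SAC-IA estimate.
    pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp;
    icp.setInputSource(templateCloud);
    icp.setInputTarget(targetCloud);
    icp.setMaximumIterations(100);          // termination criterion 1
    icp.setTransformationEpsilon(1e-8);     // termination criterion 2
    icp.setEuclideanFitnessEpsilon(1e-6);   // termination criterion 3
    pcl::PointCloud<pcl::PointXYZ> refined;
    icp.align(refined, sac.getFinalTransformation());

    fitnessScore = icp.getFitnessScore();   // lower means a better alignment
    return icp.getFinalTransformation();    // pose of the template in the target frame
}
```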


3.3.2. ROS service: objects_recognition

A ROS service is a communication mechanism to pass data between nodes [14]. A service client sends a request message to a service server and keeps waiting for a response. This is a limitation of ROS services, because the client can wait forever if the service is not available for some reason, such as the server having died unexpectedly or the service name being misspelled in the client call. A service allows one ROS node [11] to call a function that executes in another node. Each service has a unique name and a type that determines the structure of the request and response messages in the ROS network. A client can make a persistent connection to a service, which enables higher performance but less robustness to service provider changes. Service calls are well suited to things that only need to be done occasionally and that take a bounded amount of time to complete. To get more information about ROS services, look at the API documentation [14].

Service communication has the following characteristics:

Figure 3.28. ROS service scheme.

A single node initiates the request and only one node can receive it and send back a response. Service files are just lists of message types, which can be built using existing ROS messages [56] such as std_msgs, geometry_msgs, nav_msgs or sensor_msgs, or can be ones you have defined yourself. A ROS message is a simple data structure used for the communication between ROS nodes. Messages can contain standard primitive types such as integer, floating point, boolean, etc. [56].

The object recognition node advertises a service named objects_recognition associated with the objectsRecognition.srv service type (see Figure 3.30). When the service node receives a client request, the server executes the function srvLocateObjects. This function runs the point-cloud processing algorithm explained in section 3.3.1. When the algorithm ends, the server sends a response to the client with the list of recognised objects and their poses with respect to the world coordinate system. Section 6.1 presents some tests and the quantitative results of this module. The scheme of the objects_recognition service can be seen in Figure 3.29.


Figure 3.29. Scheme of ROS service: objects_recognition.

Finally, the files created to make the ROS service communication possible are presented in Figure 3.30. A single service file (.srv) and three message files (.msg) have been created to build the ROS service communication. A service definition cannot embed another service inside it, but messages can contain other message types. In the service definition, the request and the response are separated with three dashes (---).

Figure 3.30. Service and messages types available in this project.
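As an illustration of the client side, the sketch below calls the objects_recognition service. The package name, the generated header and the response field "objects" are assumptions for the example; the actual field names are those defined in Figure 3.30:

```cpp
#include <ros/ros.h>
#include <object_recognition/objectsRecognition.h>   // assumed header generated from objectsRecognition.srv

int main(int argc, char** argv)
{
    ros::init(argc, argv, "objects_recognition_client");
    ros::NodeHandle nh;

    ros::ServiceClient client =
        nh.serviceClient<object_recognition::objectsRecognition>("objects_recognition");

    object_recognition::objectsRecognition srv;   // request fields would be filled from the user input file
    if (client.call(srv))                         // blocks until the server answers (or fails)
        ROS_INFO("Recognised %zu objects", srv.response.objects.size());
    else
        ROS_ERROR("Failed to call the objects_recognition service");
    return 0;
}
```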


4. Grasp and Motion Planning Module

General robotic applications need the ability to interact with and manipulate objects in the physical world. Humans see specific objects and know immediately how they would grab them to pick them up; robotic grasp detection lags far behind human performance. This module focuses, firstly, on the problem of finding a good grasp given a 3D object model and its pose and, secondly, on finding an optimal trajectory in a collision-free space from the initial configuration of the bimanual manipulator to the target object. The module computes precision grasps for bulky objects using two anthropomorphic hands.

Figure 4.1 shows the block diagram of the module indicating the inputs and outputs:

Figure 4.1. Object Grasp and Motion Planning Module block.

4.1. Object grasp node

This section presents the node responsible for computing the grasps for bulky objects using two anthropomorphic hands. This type of robotic hand has been developed as a versatile end-effector tool able to perform complex grasp and manipulation tasks. There are several approaches to find good grasp points [57], [58], [59], but this project is focused on the approach developed at the IOC [8]. Figure 4.2 shows the object grasping points block with its input and output elements:

Figure 4.2. Object Grasping points block.

4.1.1. Approach overview

Most projects define a grasp as an end-effector configuration which achieves partial or complete form closure of a given object. This is a challenging problem because it depends on the pose and configuration of the robotic gripper as well as on the shape and physical properties of the object to be grasped, so it requires a search over a large number of possible gripper configurations.


The work in [8] deals with the problem of how to grasp bulky objects that cannot be grasped with only one hand, using two anthropomorphic hands simultaneously. The software library provided considers the constraints of the hands and arms in the search of the contact points on the surface of the object. The approach relies on a full 3D object model to find an appropriate grasp.

The method consists of searching three reachable contact points for each hand in an iterative way. Using frictional contact points, two fingers and the thumb of each hand are sufficient to obtain a robust grasp. The approach has the following steps:

1. The surface of the object model is divided into two sets of slices, one per hand: S1 and S2.

2. A set of triplets (p1, p2, p3), composed of three points with their normal directions (n1, n2, n3), is computed in each slice.

3. Each triplet of S1 is tested together with each triplet of S2 in order to find a couple of triplets that satisfies the force-closure condition.

4. The grasp quality is evaluated as the largest perturbation that the grasp can resist in any direction.

In the end, the algorithm returns the solution grasp (see Figure 4.3) if a collision-free path for the whole system is found. The method tries another couple of triplets in case the inverse kinematics or the collision check does not return a solution. If, after exploring the object through all the triplets and its three axes, no grasp points have been found, the algorithm returns an error [8]. To illustrate the proposed approach, Figure 4.3 shows a rugby ball divided into blue and orange slices and the resulting contact points:

Figure 4.3. Rugby ball divided into slices based on its inertial axes and the resulting contact points.


4.2. Robot inverse kinematics node

Kinematics is a branch of physics that studies the motion of systems without considering the forces, masses or moments that cause the motion. Thus, robot kinematics refers to the analytical study of the motion of a robot manipulator, analysing the relationship between the dimensions and connectivity of the kinematic chains and the pose (position and orientation) of each link of the manipulator [38].

Therefore, it is crucial to define suitable kinematic models for a robot mechanism in order to analyse the behaviour of manipulators. A manipulator is characterised by an arm that provides mobility, a wrist that confers dexterity and an end-effector that performs the robot's task. The mechanical structure of a robot manipulator consists of a sequence of rigid bodies, called links, interconnected by means of joints which provide pure rotation or translation between two consecutive links. So, a kinematic model is built as a hierarchical structure with parent-child relationships. This means that, if a joint is rotated around an arbitrary axis, all its children will also rotate around the same axis because they inherit all of their parent's transformations [38].

In fact, robot kinematics describes the analytical relationship between the joint positions and the pose of the end-effector. Hence, the formulation of the kinematic relationships allows the study of the forward and the inverse kinematic problems. Forward kinematics uses the kinematic equations of a manipulator to compute the pose of the end-effector from specified values of the joint parameters. In contrast, for serial manipulators the inverse kinematics is a much more difficult problem because it involves the transformation of cartesian coordinates, i.e. positions and orientations of the end-effector, into joint coordinates of the robot manipulator. In fact, the existence of revolute joints makes the problem more difficult to solve because of the presence of nonlinear equations [38]. The following figure shows the relationship between forward and inverse kinematics:

Figure 4.4. The schematic representation of forward and inverse kinematics.

In addition, there are two main techniques used to solve an inverse kinematic problem: the analytical and the numerical one [60]. In the first method, the joint variables are solved analytically for a given end-effector pose, while in the second one the joint variables are obtained with an iterative technique. There are several methods for solving inverse kinematic problems numerically. These include cyclic coordinate descent methods, pseudoinverse methods, Jacobian transpose methods, the Levenberg-Marquardt damped least squares method, quasi-Newton and conjugate gradient methods, and neural network and artificial intelligence methods [61]. The GIKL provides three Jacobian IK solvers: the Transpose method, the Pseudoinverse method and the Damped Least Squares method.
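For reference, the standard iterative update rules behind these three Jacobian-based solvers can be written as follows (generic textbook notation: $e$ is the end-effector pose error, $J$ the Jacobian, $\alpha$ a step size and $\lambda$ the damping factor; the exact notation used inside the GIKL may differ):

$$\Delta q_{\mathrm{transpose}} = \alpha\, J^{T} e, \qquad \Delta q_{\mathrm{pinv}} = J^{+} e = J^{T}\left(J J^{T}\right)^{-1} e, \qquad \Delta q_{\mathrm{DLS}} = J^{T}\left(J J^{T} + \lambda^{2} I\right)^{-1} e$$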

Figure 4.5 shows the block responsible for calculating the joint positions from the robot model topology and the object grasping points computed in the previous stage (see section 4.1):

Figure 4.5. Robot inverse kinematics block.

In annex A.1, additional information about robot kinematics theory and numerical inverse kinematics solvers is provided.

4.2.1. Robot model simulation

Creating a complete virtual model of a robot by simulating components and control programs can significantly impact the general efficiency of a project. Some benefits of simulation are:

- Reducing the costs involved in robot manufacturing.

- Simulating various alternatives without incurring physical costs.

- Testing robots or components before implementation.

- Demonstrating a system to determine whether it is viable or not.

- Compatibility with a wide range of programming languages.

The Unified Robot Description Format (URDF) is an eXtensible Markup Language (XML) [34] format for representing a robot model. Its main limitation is that only tree structures can be represented, excluding all parallel robots. Also, the specification assumes the robot consists of rigid links connected by joints, so flexible elements are not supported. The XML specifications for robot models are:

- Sensor: describes a sensor such as a camera, ray sensor, IR, etc.

- Link: describes the kinematic and dynamic properties of a link.

- Transmission: relates actuators to joints and represents their mechanical coupling.

- Joint: describes the kinematic and dynamic properties of a joint.

- Gazebo: describes simulation properties such as damping, friction, etc.

- Model_state: describes the state of a model at a certain time.

- Model: describes the kinematic and dynamic properties of a robot structure [62].

The above XML specifications highlighted in bold consist of:

Kinematic and dynamic description of the robot.

Visual representation of the robot.

Collision model of the robot.

The description of a robot model involves a set of link elements, and a set of joint

elements connecting the links together. The link element describes a rigid body with

inertia, collision box and visual features [63]. The joint element describes the kinematics

and dynamics of the joint and also specifies the joint limits [64]. Figure 4.6 shows the main

attributes of a link and a joint element:

Figure 4.6. Link and Joint representations inside URDF.
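As a small sketch of how such a description can be parsed programmatically, the following helper loads a URDF file with the ROS urdf parser and walks its link/joint tree (the helper name is illustrative and this is not the GIKL loader):

```cpp
#include <string>
#include <urdf/model.h>
#include <ros/console.h>

// Load a URDF file and print its link/joint structure (illustrative helper).
bool inspectRobotModel(const std::string& urdfPath)
{
    urdf::Model model;
    if (!model.initFile(urdfPath))      // parses the XML into a tree of links and joints
        return false;

    ROS_INFO("Robot: %s, root link: %s",
             model.getName().c_str(), model.getRoot()->name.c_str());

    for (const auto& joint : model.joints_)   // map: joint name -> joint description
        ROS_INFO("Joint %s connects %s -> %s",
                 joint.first.c_str(),
                 joint.second->parent_link_name.c_str(),
                 joint.second->child_link_name.c_str());
    return true;
}
```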

4.2.2. General Inverse Kinematics Library (GIKL)

Nowadays, the number of robot manipulator applications is increasing and robot morphologies are changing according to their functionality. Computing a closed-form solution for the inverse kinematics of general manipulators can lead to complex problems in need of a solution.

However, an inverse kinematics library which uses different numerical methods to obtain the optimal configuration for any robot pose can be a great alternative, mainly because of its generality for all kinds of nDoF, tree-like structures and mobile robot manipulators; it is also an alternative to computing the analytical formulation for each robot, which only exists in some cases.

There are some robot kinematic libraries available that satisfy our initial needs, are very robust and have quite good performance. Therefore, in this project it has been decided to choose one of these libraries and expand its functionality while improving its performance. This is the case of the Orocos Kinematics and Dynamics Library, commonly known as KDL, which computes the robot inverse kinematics with the Jacobian pseudoinverse but has some drawbacks [65]:

1. Difficult to converge in presence of joint limits.

2. Only one Jacobian pseudoinverse method implemented.

3. Difficulties to converge with tree manipulators.

4. No actions taken when local minima appear during the iteration process.

5. No function to load a robot from a file.

Issues 3-5 above can be mitigated by simple implementation enhancements to the KDL library, such as using a URDF file to load any open manipulator, implementing an analytical Jacobian solution for tree manipulators and taking action when local minima appear. Issues 1 and 2 require the consideration of alternative IK algorithms such as the Jacobian Transpose and Damped Least Squares methods. All these issues are intended to be improved in the General Inverse Kinematics Library (GIKL).

The GIKL is a C++ library developed at the Institute of Industrial and Control Engineering (IOC) which deals with the main kinematic problems of any tree-like robot structure. The purpose of using this library in this project is to find the final joint configuration of the robot (see Figure 2.10) that reaches the grasping points of the object with the two robotic hands; so, an inverse kinematic problem has to be solved. Figure 4.7 (a particularisation of Figure 4.5) shows the input parameters required to use the inverse_kinematic function and the output variables:

Figure 4.7. GIKL inverse kinematics block diagram.

First, the path to the URDF file has to be provided to the IK solver. The solver internally constructs the kinematic robot structure according to the information in this file. The solver also identifies all the robot's end-effectors. The manipulator presented in Figure 2.10 has eight end-effectors, which correspond to the fingers of the two Allegro Hands. Second, the desired positions and orientations of the robot's end-effectors have to be provided. The numerical IK solver type, the initial joint configuration of the robot, the maximum number of iterations allowed and the maximum error in position and orientation allowed have to be set, too.

Finally, the inverse_kinematic function computes the final joint configuration of the robot, placing each end-effector at the target points of the object with the desired orientation. The solver tries to find the solution before reaching the maximum number of iterations and returns the final error in position and orientation. In annex A.2, additional information about the GIKL structure and its functionality is provided.


The complete evaluation of the GIKL library is presented in the Annex A.3. The most

important results are presented in Table 4.1:

Table 4.1. Performance of the GIKL.

Chain/Tree            DoF   GIKL pinv            GIKL Transpose        GIKL DLS
                            solve rate / avg t.  solve rate / avg t.   solve rate / avg t.
UR5                   5     87.50% / 4.22 ms     49.00% / 183.04 ms    88.90% / 7.79 ms
kukaLWR               7     100.00% / 2.35 ms    63.30% / 158.63 ms    100.00% / 4.88 ms
Bimanual manipulator  44    - / -                - / -                 96.55% / 1555 ms

4.3. Motion planning node

Robot motion planning is responsible for finding an optimal solution plan from an initial joint configuration to a target joint configuration. In addition, the algorithm searches for a collision-free path in order to avoid collisions of the manipulator with its environment until the manipulator reaches the target object, where contact is accepted. Figure 4.8 shows the block diagram of the module indicating the inputs and outputs:

Figure 4.8. Robot Motion planning block.

4.3.1. Introduction to motion planning

The motion planning problem tries to produce a continuous motion that connects a start configuration (qini) and a goal configuration (qgoal) while avoiding collisions with known obstacles. There are many robot applications which require the motion planning formulation, such as mobile robot navigation, robot autonomy optimisation, grasping objects, etc. A robot motion planning problem can be stated as follows:

- Fix a start pose of the robot.

- Set a desired goal pose of the robot.

- Describe the geometric structure of the robot.

- Provide the geometric description of the world.

The algorithm tries to find a path that moves the robot gradually from start to goal in the configuration space. The configuration space (C-Space) is the set of all possible robot configurations relative to a fixed coordinate system. Usually a configuration is expressed as a vector of positions and orientations. In addition, the free space (Cfree) and the obstacle region (Cobs) are defined inside the C-Space; the free space contains all the robot configurations that avoid collision with obstacles.

The sampling-based algorithms explore the free C-Space by sampling it with enough points to allow finding a solution. Some of these algorithms are:

The Probabilistic Roadmap Method (PRM) [66].

The Rapidly-exploring Random Trees (RRT) [67].

RRT Connect (bidirectional version of RRT) [68].

Expansive Space Trees (EST) [69].

Single-query Bi-directional planner with Lazy collision checking (SBL) [70].

Kinodynamic motion Planning by Interior Exterior Cell Exploration (KPIECE)

[71].

The Rapidly-exploring Random Tree (RRT) is a single-query planner which attempts to solve a query as fast as possible and does not focus on the exploration of the entire map as PRM does. RRT uses a random sequence to construct a single tree rooted at qini that grows toward the qgoal configuration [67]. The RRT algorithm consists of:

1. First, qrand is selected randomly from a uniform distribution over the C-Space.

2. Then, the algorithm adds a new configuration (qnew) to the tree depending on the step size and on qnear. Configuration qnew is obtained by moving qnear by the step size toward qrand.

Figure 4.9 shows the procedure used by RRT to add a new configuration based on qrand and on the step size from qnear:

Figure 4.9. The basic RRT construction algorithm.

It is important to take into account the tradeoff between the exploration of Cfree and the number of samples added to the tree, especially for high-dimensional problems. If the step size is small, the exploration steps are short and the nodes of the tree are close together. As the number of nodes becomes large, memory consumption increases and finding the nearest neighbour becomes expensive, which reduces the performance of the planner [72]. Figure 4.10 shows an example of a solved RRT problem:

Figure 4.10. Final RRT construction.

Finally, the implementation of a bidirectional version of RRT such as RRTConnect, which grows two trees toward each other, rooted at qini and qgoal, until they merge into one, improves the efficiency of the motion planning system [68].
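A minimal sketch of such a query with OMPL's RRTConnect is shown below; the joint limits, the trivial validity checker and the time budget are placeholders, since in the real system Kautham configures the state space and the collision checker from the problem files:

```cpp
#include <memory>
#include <ompl/base/spaces/RealVectorStateSpace.h>
#include <ompl/geometric/SimpleSetup.h>
#include <ompl/geometric/planners/rrt/RRTConnect.h>

namespace ob = ompl::base;
namespace og = ompl::geometric;

bool planWithRRTConnect(int dof)
{
    // Joint space of the manipulator, with placeholder limits.
    auto space = std::make_shared<ob::RealVectorStateSpace>(dof);
    ob::RealVectorBounds bounds(dof);
    bounds.setLow(-3.14);
    bounds.setHigh(3.14);
    space->setBounds(bounds);

    og::SimpleSetup ss(space);
    // A real setup would reject states in collision; here every state is accepted.
    ss.setStateValidityChecker([](const ob::State*) { return true; });

    ob::ScopedState<> start(space), goal(space);
    start.random();   // placeholders for q_ini and q_goal
    goal.random();
    ss.setStartAndGoalStates(start, goal);

    ss.setPlanner(std::make_shared<og::RRTConnect>(ss.getSpaceInformation()));
    return bool(ss.solve(5.0));   // 5 s planning budget
}
```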

4.3.2. Basic features of the Kautham Project

The Kautham Project is developed in the C++ programming language and uses the OMPL library for the large number of planners that it provides. Different basic planners can be easily used and parameterised, allowing users to learn in depth different planning algorithms. This application can simulate the same problem with several planners and different strategies by changing their parameters, such as the maximum number of sampling points, the maximum number of nearest neighbours, the maximum distance to connect two nodes without collision, the sampler method, etc. Moreover, the application can visualise the solution path for the formulated problem and animate it [28].

The main information required to define a problem is:

The path to a robot and the obstacles description file: bimanual manipulator

with Allegro Hands (URDF) and the object.

The initial and the final configurations of the robot(s): the initial and final

joint positions of the bimanual manipulator provided by the inverse kinematic

block (section 4.2).

The configuration of the obstacle(s): the object's position and orientation with respect to the world coordinate system provided by the object recognition module (section 3).

Other parameters of the planner.


The Kautham Project has been encapsulated as a ROS service. Therefore, the Object

Recognition and Grasping system can initiate a ROS service communication mechanism

to pass data between the manager_node and the kautham_node. Each service has a

unique name and a type that determines the structure of the request and response

messages in the ROS network. Some of these services defined by the kautham_node

server are:

kautham_node/SetQuery

kautham_node/CheckCollision

kautham_node/OpenProblem

kautham_node/CloseProblem

kautham_node/SetInit

kautham_node/SetGoal

kautham_node/SetRobotsConfig

kautham_node/SetObstaclesConfig

kautham_node/SetRobControls

kautham_node/SetPlanner

kautham_node/SetPlannerParameter

kautham_node/Solve

kautham_node/AddObstacle

kautham_node/Connect

etc.

A service example of the kautham_node is presented in Figure 4.11. In case the system needs to set the initial and goal queries, the manager_node requests this from the server node and waits for a response. The initial and goal queries are determined by the initial and final joint positions of the bimanual manipulator provided by the inverse kinematics block.

Figure 4.11. ROS service scheme of the kautham_node/SetQuery and its service file.


5. Manager and Execution Module

The manager and execution module is responsible for the supervision and the control of

the different modules in the system. The manager is designed to accomplish the

requirements of the user by using the functionalities of the object recognition module and

the grasping module. Figure 5.1 shows the block diagram of the module indicating the

inputs and outputs:

Figure 5.1. Manager and Execution Module block.

The following subsections detail the user input file definition, how a ROS client node is constructed to call a ROS service, the execution pipeline and other features.

5.1. Input data files

The Object Recognition and Grasping System uses simple text files to load the user input parameters needed to accomplish the tasks.

One text file is used for the camera calibration setup and contains the most important parameters needed by several OpenCV functions described in section 3.2. This file has to include the following arguments:

1. PATTERN_TYPE: ASYMMETRIC_CIRCLES_GRID

2. PATTERN_SIZE: number of inner corners per pattern row and column.

3. SQUARE_SIZE: real distance, measured in metres, between pattern corners.

4. PATTERN_POSE: the pose of the pattern with respect to the world reference frame, described by the cartesian position in metres (x, y, z) and the angle-axis description of the orientation in degrees (ax, ay, az, angle).


A text file example is shown in Figure 5.2:

Figure 5.2. Example of user input file used in the camera calibration stage.

A second file provides important information to the system, such as how many objects have to be found in the scene, the model to be recognised, the hue colours of the object and the dimensions of the workspace where the objects are. This file has to include the following arguments:

1. OBJ_SEARCH: single object or multiple objects to be recognised in the scene.

2. MODEL: the path where the model template is stored.

3. COLORS: the hue levels on which the system will focus.

4. WORKSPACE: dimensions in metres of the box in which the system will focus on finding the objects. The box is located at the origin of the world reference frame.

A text file example is shown in Figure 5.3:

Figure 5.3. Example of user input file used in the manager and execution module.


5.2. Manager node

A general execution pipeline is presented in Figure 5.4. A ROS node is built to implement these execution stages: request the objects_recognition service to get the position and orientation of the objects in the scene; use the GA2H library to compute the grasping points of the recognised object; compute the inverse kinematics of the bimanual manipulator; and, finally, run the motion planning, which produces a continuous motion that connects a start configuration and a goal configuration while avoiding collisions with obstacles.

Figure 5.4. Execution processing pipeline.

First, the manager node reads the user input files described in section 5.1 and sends a service request to locate the object or objects in the scene using the objects_recognition service (see section 3.3.2). Once the request is sent, the client node waits for a response.

Second, the manager executes the GA2H library to compute the grasping points of the recognised object. The inputs are the name of the model, the PCD file of the model object and its position and orientation. The outputs are the target points on the object where the fingertips of the two Allegro Hands will be placed (see section 4.1).

Then, the manager node executes the GIKL library to solve the inverse kinematics of the bimanual manipulator. The manager sends the desired position and orientation for each fingertip of both Allegro Hands. The library also needs the URDF file of the bimanual manipulator in order to build internally the tree-like structure of the manipulator. The inverse kinematics solver computes the joint positions given the initial joint positions, a method (pseudoinverse, transpose or DLS), the maximum number of iterations and the error allowed, and the vector of desired end-effector poses. More information is presented in section 4.2.

Finally, the manager node queries The Kautham Project to solve a basic motion planning problem: determining a continuous motion that connects a start configuration of the bimanual manipulator and a goal configuration. The goal configuration of the bimanual manipulator corresponds to the final joint positions computed in the previous stage. Furthermore, Kautham determines a path along which the continuous motion of the manipulator avoids collisions with the known obstacles in the scene. Additional information about motion planning can be found in section 4.3.

When the solution path is found, The Kautham Project offers the possibility to execute the plan simultaneously in a virtual simulation and in a real execution [73]. More information about how to run The Kautham Project is given in Annex C.


6. Results

The performance of the Object Recognition and Grasping System is analysed separately for each module. Quantitative and qualitative results of the object recognition module are presented, and some object grasps are illustrated at the end of this section.

6.1. Evaluation of the Object Recognition Module

6.1.1. Camera calibration

The calibration process of any type of sensor plays an important role in obtaining a good system performance. All the RGB-D cameras used in this project have been calibrated with the methodology described in section 3.2. The Object Recognition module processes the point-cloud data provided by the camera sensors. Therefore, the system has to compute the intrinsic transformation matrix of the camera in order to align the colour and the depth images, which are affected by the offset between both sensors. Figure 6.1 shows how the colour is now correctly applied on the depth data. This is clearly evident around edges such as the edge of the computers or the rugby ball.

Figure 6.1. Uncalibrated and calibrated point-cloud.

The extrinsic camera calibration defines the transformation from the camera coordinate system to the world coordinate system. This process requires the definition of an asymmetric pattern and its transformation to the world coordinate system. Then, the camera determines its pose with respect to the world by recognising the pattern. Section 3.2 has explained in depth the asymmetric patterns used and the functions implemented to calibrate the extrinsic parameters of a camera.

In this section, two different configurations are provided in order to study the extrinsic calibration performance. Note that each camera views a single pattern in both cases. Running multiple Kinects at the same time causes interference in the depth sensors (see the results in section 6.1.2). Thus, it is important to minimise the common area captured by both depth sensors and to place the cameras in such a way that the interference caused by the projection of the infrared light patterns is minimised (see section 3.1.2).


The first configuration consists of object recognition using two cameras capturing both sides of the object. Thus, the cameras can be placed at 180° from each other, and two patterns allow calibrating both cameras with respect to the world coordinate system, as Figure 6.2 shows:

Figure 6.2. Object recognition with two cameras and two patterns.

The second configuration consists of object recognition using two cameras which capture different sides of the object and a common area of it. Both cameras are placed at almost 90° from each other in order to detect the same pattern while minimising the common area of the object captured.

Figure 6.3. Object recognition with two cameras and a single pattern.

The main problem of using two patterns to find the transformation matrices that link the cameras to the world coordinate system is the numerical error committed. Thus, when the two point-clouds are merged, there is an offset in translation and rotation. This can be minimised by setting both patterns in the world coordinate system with high precision, but other factors can still add numerical errors to the transformations.

Besides, the second configuration presents an object recognition system with two cameras and a single pattern. This reduces the numerical error because both cameras are focusing on the same pattern and, thus, the same pattern transformation is used for both cameras. The results of merging the two point-clouds captured in both configurations can be seen in Figure 6.4:

Figure 6.4. Results of merging two point-clouds: first configuration (left), second configuration (right)

In addition, the second configuration does not merge both point-clouds perfectly. This could come from the type of pattern used. As commented in section 3.2, the asymmetric 4x11 circle grid is used for the extrinsic camera calibration. This pattern is not square, and more circles define the z axis than the y axis of the pattern with respect to the world coordinate system. So, when the two point-clouds are merged, more offset error appears in the y axis than in the z axis. Moreover, the largest offset between point-clouds appears in the x axis because the distance between the camera frame and the pattern frame is determined by 2D image processing functions. Thus, the function estimates the x distance according to the size of the circles: a small circle size indicates a larger distance and vice versa.


However, the translation offsets are relatively small when the object recognition is focused on bulky objects such as a rugby ball, and orientation offsets do not appear between the point-clouds. The object recognition module is able to determine the pose of the object in the presence of these translation offsets (see Figure 6.5).

Figure 6.5. Translation offsets between two merged point-clouds using the same pattern calibration.

6.1.2. Multiple Kinects evaluation

This project presents some configurations where the RGB-D cameras are set at different positions, and analyses qualitatively which scenario provides the best performance. However, the use of multiple structured-light cameras such as the Kinect 360 at the same time has some drawbacks. The Kinect 360 measures 3D surfaces by the projection of infrared light patterns. The use of multiple Kinect 360 cameras causes overlapping views of the light patterns. This causes confusion in the depth sensors of both Kinects, resulting in holes and noise in the computed depth images. Figure 6.6 shows a depth map when only a single Kinect 360 is running and when two Kinect 360 cameras are projecting their light patterns at the same time:

Figure 6.6. Depth image without interferences (left) and with interferences (right).

When the point-cloud is constructed, the interference also appears on it (see Figure 6.7). Thus, it is important to minimise the common surface captured by both depth sensors in order to obtain a good point-cloud quality. The interference is minimised if the cameras are located correctly. In addition, the Kinect One is a time-of-flight camera, which measures the time that a pulse of light takes to travel the distance between the camera and the surface (see section 3.1.2). Although its technology is different from that of the Kinect 360, the infrared rays projected by the Kinect One also interfere with the Kinect 360 sensors and vice versa.

Figure 6.7. Point-cloud without interferences and with interferences.

Despite the loss of information in the final point-cloud due to the interference, the system with multiple Kinects is able to recognise and estimate the pose of the objects, as can be seen in the results in sections 6.1.3 and 6.1.4.

6.1.3. Object pose estimation

The aim of this section is to present quantitative results for the object recognition and pose estimation mechanism presented in section 3.3. In order to determine the number of fixed cameras and their configuration in the workspace, four scenarios have been designed. The intention is to analyse which Kinect type can provide better results and whether the results improve by using more than one camera. Four different simulations are presented:

a) Simulation using only the Kinect One with an aerial front view.

b) Simulation using only the Kinect 360 with an aerial front view.

c) Simulation using two Kinect 360 cameras with lateral views.

d) Simulation using a Kinect One with an aerial front view and two Kinect 360 cameras with lateral views.

The pose estimation error and the computation time have been averaged over 100 simulations for each of the scenarios presented. Therefore, the four systems are presented with their accuracy and their computation time. Figure 6.8 shows the asymmetric circle grid used for the camera calibration, an initial pose of the rugby ball object and the initial configuration of the Kinect One and the Kinect 360:


Figure 6.8. Initial configuration of the elements involved in the simulations.

Note that the accuracy of the system is determined by the difference between the estimated pose provided by the Object Recognition Module and the real pose of the object. The real pose of the object is set by the user and could contain some small error. Thus, the centre of the rugby ball is placed at:

- For scenarios 1 and 2: cartesian position (0, 0, 0.165) metres and orientation expressed in angle-axis form as (1, 0, 0) with an angle of π/2 radians.

- For scenarios 3 and 4: cartesian position (0, 0, 0.165) metres and orientation expressed in angle-axis form as (0, 1, 0) with an angle of π/2 radians.

The system is analysed with an object which is big enough to be grasped with the bimanual manipulator (see Figure 2.10); it does not make sense to use a small object that can be manipulated with one hand. The official dimensions of a size 5 rugby ball are:

Figure 6.9. Official dimensions of the Rugby ball size 5 [74].


If the object pose estimation algorithm commits a position error as large as the size of the rugby ball itself, the system is considered to have poor performance. Therefore, the Object Recognition Module is designed to estimate the pose of a bulky object with the smallest possible error.

Four different scenarios have been simulated:

a) Scenario 1: Kinect One with aerial front view

The Object Recognition Module is tested using the Kinect One with an aerial front view

capturing the scene. A representation of the scenario is sketched in Figure 6.10:

Figure 6.10. Scheme of the scenario 1 with its elements involved in the simulations.

b) Scenario 2: Kinect 360 with aerial front view

The second experiment is exactly the same as the previous one, but now the Kinect 360 is used. A representation of the scenario is sketched in Figure 6.11:

Figure 6.11. Scheme of the scenario 2 with its elements involved in the simulations.


c) Scenario 3: two Kinects 360 with lateral view

The third experiment consists of adding an additional depth camera to increase the point-cloud density of the scene and provide the processing algorithms with more information about it. A representation of the scenario is sketched in Figure 6.12:

Figure 6.12. Scheme of the scenario 3 with its elements involved in the simulations.

d) Scenario 4: Kinect One and two Kinects 360

The last experiment consists of adding the Kinect One to the previous scenario to increase the point-cloud density and provide an aerial view of the scene. A representation of the scenario is sketched in Figure 6.13:

Figure 6.13. Scheme of the scenario 4 with its elements involved in the simulations.


The position error has been calculated as the difference between the estimated and the real position, expressed in meters. Figure 6.14, Figure 6.15 and Figure 6.16 show the position errors committed during the simulations of the four different scenarios. The estimated position errors for all simulations are represented with plus signs in different colours, and the desired (zero) error with a black circle. The simulation results for each scenario are presented in independent figures in Annex B.

Figure 6.14. Plane XY of the estimated position error of the Rugby ball.

Figure 6.15. Plane XZ of the estimated position error of the Rugby ball.


Figure 6.16. Plane YZ of the estimated position error of the Rugby ball.

The orientation error has been calculated as the difference between the estimated and the real orientation, expressed in angle-axis form. The error committed when estimating the orientation of the Rugby ball object is very high (see Table 6.1). Some simulations present errors around 3 radians (approximately 172 degrees) in some axis. This means that the system performs very poorly when estimating the orientation of the objects. However, the results presented in section 6.1.4 show that the orientation errors decrease for objects with asymmetric geometry.
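A minimal sketch of how such an angle-axis orientation error can be computed with Eigen [24] is given below; the quaternion values are illustrative placeholders, not results from the simulations:

    #include <cmath>
    #include <iostream>
    #include <Eigen/Geometry>

    int main()
    {
        // Real and estimated orientations (illustrative values only).
        Eigen::Quaterniond q_real(Eigen::AngleAxisd(M_PI / 2.0, Eigen::Vector3d::UnitX()));
        Eigen::Quaterniond q_est(Eigen::AngleAxisd(M_PI / 2.0 + 0.1, Eigen::Vector3d::UnitX()));

        // Relative rotation between both orientations, expressed in angle-axis form.
        Eigen::AngleAxisd error(q_real.inverse() * q_est);

        // Orientation error as a single rotation angle.
        std::cout << "angle error: " << error.angle() << " rad ("
                  << error.angle() * 180.0 / M_PI << " deg)" << std::endl;
        return 0;
    }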

The Rugby ball can be described as an ellipsoid with rotational symmetry about its semi-major (longitudinal) axis. This makes it difficult to fix the orientation of the object frame unambiguously. In our case, only the z axis of the Rugby ball frame can be fixed, aligning it with the longitudinal direction, while the other two axes are set arbitrarily.

Finally, Table 6.1 summarises the results of all the scenarios presented. The position and orientation errors for each scenario have been averaged, and a confidence interval is provided to indicate the region where the position and orientation errors of the object are most likely to fall (a minimal sketch of this averaging is given below). In addition, the number of points for each scenario is presented, as well as the average computation time, which is directly related to the size of the point-cloud to be processed.
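The averaging over the 100 simulations can be sketched as follows; a Gaussian 95% interval (mean ± 1.96·σ/√n) is assumed here for illustration and may differ from the exact interval reported in Table 6.1:

    #include <cmath>
    #include <iostream>
    #include <numeric>
    #include <vector>

    int main()
    {
        // Error of one component (e.g. x) obtained in each simulation (illustrative values).
        std::vector<double> x_error = {-0.045, -0.041, -0.050, -0.043, -0.048};
        const double n = static_cast<double>(x_error.size());

        const double mean = std::accumulate(x_error.begin(), x_error.end(), 0.0) / n;

        double sq_sum = 0.0;
        for (double e : x_error)
            sq_sum += (e - mean) * (e - mean);
        const double stddev = std::sqrt(sq_sum / (n - 1.0));

        // 95% Gaussian confidence half-width on the mean.
        const double half_width = 1.96 * stddev / std::sqrt(n);

        std::cout << mean << " +/- " << half_width << " m" << std::endl;
        return 0;
    }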


Table 6.1. Results of the four simulated scenarios.

Scenario                        Number of points   x error (m)           y error (m)            z error (m)            Angle error (deg)   Average time (s)
Kinect One                      2700               -0.0448 ± 0.027       -0.0166 ± 0.031        0.0341 ± 0.030         55.61 ± 92.84       20.30
Kinect 360                      2500               -0.0217 ± 0.029       -0.0144 ± 0.034        -7.477·10⁻⁴ ± 0.040    48.68 ± 100.65      17.12
Two Kinect 360                  4300               0.0027 ± 0.027        5.5156·10⁻⁴ ± 0.022    -0.0314 ± 0.027        49.27 ± 99.08       25.36
Kinect One and two Kinect 360   5700               -0.0028 ± 0.026       0.0061 ± 0.028         -0.0287 ± 0.022        48.55 ± 99.54       33.80


6.1.4. Object recognition

This section presents qualitative results of the Object Recognition Module in order to study how versatile the designed system is at recognising different objects, in different positions and orientations, and with different camera configurations. The 3D object models are:

- Cylinders with different colours.

- Cylinder in T structure.

- Rugby ball.

- Skate helmet.

Figure 6.17. Object models used for testing Object Recognition Module.

Three different scenarios have been simulated:

a) Simulation using only Kinect One with aerial front view.

b) Simulation using two Kinects 360 with lateral view.

c) Simulation using a Kinect One with aerial front view and two Kinects 360 with

lateral view.

The three scenarios follow a common methodology to clearly show the visual results obtained. First, a colour image of the workspace is provided to present the objects to be recognised, and then a set of views of the recognised 3D object is shown. The point-cloud visualization is done with PCLVisualizer [26], whose empty window displays a black background and the world coordinate system (x axis in red, y axis in green and z axis in blue).
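A minimal sketch of this visualization with PCLVisualizer is shown below; the file name and cloud identifier are illustrative, not those of the project code:

    #include <pcl/io/pcd_io.h>
    #include <pcl/point_types.h>
    #include <pcl/visualization/pcl_visualizer.h>

    int main()
    {
        pcl::PointCloud<pcl::PointXYZRGB>::Ptr cloud(new pcl::PointCloud<pcl::PointXYZRGB>);
        pcl::io::loadPCDFile("recognised_object.pcd", *cloud);   // illustrative file name

        pcl::visualization::PCLVisualizer viewer("Object Recognition Module");
        viewer.setBackgroundColor(0.0, 0.0, 0.0);   // black background
        viewer.addCoordinateSystem(0.2);            // world frame: x red, y green, z blue
        viewer.addPointCloud<pcl::PointXYZRGB>(cloud, "scene");

        while (!viewer.wasStopped())
            viewer.spinOnce(100);
        return 0;
    }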


a) Scenario 1: Kinect One with aerial front view

The aim of this scenario is to analyze the performance of the Object Recognition Module when the point-cloud scene is taken with a single Kinect One with an aerial front view. A green cylinder with a length of 16 cm and a diameter of 6 cm is the first object model used in the simulations. Three simulations are performed by orienting the longitudinal direction of the cylinder along the three orthogonal planes located at the origin of the coordinate system:

Figure 6.18. Experiment 1: three different orientations of the green cylinder model.

When the longitudinal direction of the cylinder is oriented perpendicular to the plane XY, a perfect match is achieved between the model template and the cylinder in the scene (see Figure 6.19). When the cylinder is perpendicular to the plane XZ, the surface of the cylinder is tangential to the infrared pattern projected by the Kinect, which causes distortion in the point-cloud (see Figure 6.20). The last figure shows how the match between the model template and the object in the scene is sometimes only partial, probably caused by too few iterations when aligning both models. Figure 6.21 shows how the cylinder template needs more iterations to align itself to the end cap of the cylinder.
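This behaviour suggests that the number of alignment iterations is a critical parameter. A minimal sketch of how the final refinement could be configured with the PCL implementation of ICP [54] is shown below; the parameter values are illustrative, not the ones used in the project:

    #include <pcl/point_types.h>
    #include <pcl/registration/icp.h>

    // Refine the pose of the model template against the segmented object cluster.
    Eigen::Matrix4f alignTemplate(
        const pcl::PointCloud<pcl::PointXYZRGB>::Ptr &model_template,
        const pcl::PointCloud<pcl::PointXYZRGB>::Ptr &object_cluster)
    {
        pcl::IterativeClosestPoint<pcl::PointXYZRGB, pcl::PointXYZRGB> icp;
        icp.setInputSource(model_template);
        icp.setInputTarget(object_cluster);
        icp.setMaximumIterations(200);           // more iterations help partial matches
        icp.setMaxCorrespondenceDistance(0.05);  // ignore correspondences beyond 5 cm
        icp.setTransformationEpsilon(1e-8);      // convergence criterion on the transform

        pcl::PointCloud<pcl::PointXYZRGB> aligned;
        icp.align(aligned);

        return icp.hasConverged() ? icp.getFinalTransformation()
                                  : Eigen::Matrix4f::Identity();
    }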

Figure 6.19. Results of experiment 1: three object views of the cylinder perpendicular to the plane XY.


Figure 6.20. Results of experiment 1: three object views of the cylinder perpendicular to the plane XZ.

Figure 6.21. Results of experiment 1: three object views of the cylinder perpendicular to the plane YZ.

The second experiment consists of testing the system with multiple objects in the

workspace. Cylinder models with different colours (green, pink and blue) are placed as

shown in Figure 6.22:

Figure 6.22. Experiment 2: two different sets of the cylinder models.

Although the Object Recognition Module correctly detects the number of objects in the scene, the object pose estimation does not achieve high accuracy. As commented previously, the match between the model template and the object in the scene is sometimes only partial, caused by too few alignment iterations. Figure 6.23 shows a perfect match for the pink cylinder, but the green cylinder would need more iterations to align. Figure 6.24 shows that the pink and green cylinders, oriented vertically, achieve a better match than the blue cylinder oriented horizontally. This behaviour was already detected in the previous experiment (see Figure 6.20). Finally, the Object Recognition Module provides the object poses with respect to the


world coordinate system. However, when the model template has a symmetric geometry, such as a sphere, the object frame is set at its centre and its orientation is arbitrary. In our case, only the z axis of the cylinders can be fixed, aligning it with the longitudinal direction of the cylinder. At this point, it is interesting to perform some experiments with asymmetric objects.

Figure 6.23. Results of experiment 2: three object views of two cylinders perpendicular to the plane XY.

Figure 6.24. Results of experiment 2: three object views of three cylinders.

The third experiment consists of recognising a single object with a “T” structure. This object is composed of two colours and is placed as shown in Figure 6.25:

Figure 6.25. Experiment 3: two different orientations of the “T” model.

The object model used in this experiment maintains some symmetric geometry in two transversal planes. However, the “T” structure allows the system to set the orientation of the object by fixing the z axis to the longitudinal direction of the vertical cylinder (green)


and the y axis to the longitudinal direction of the horizontal cylinder (blue). The x axis is then fixed by definition.

Figure 6.26. Results of experiment 3: three object views of the “T” model.

Figure 6.27. Results of experiment 3: two object views of the “T” model with random orientation.

The previous experiments have been designed to analyze the behaviour of the Object Recognition Module with objects that can be manipulated with one robotic hand. The fourth experiment consists of recognising bulky objects to be manipulated with a bimanual robot. Figure 6.28 shows how a Rugby ball and a Skate helmet are placed in the workspace:

Figure 6.28. Experiment 4: Rugby ball and Skate helmet models.

The results obtained with the Kinect One with aerial front view indicate that the Object Recognition Module could not obtain good matches for bulky objects, as shown in Figure 6.29. One factor could be the low density of points describing the scene. Therefore, a single depth camera capturing all the objects in the scene may not be enough, and multiple depth cameras would be necessary.


Figure 6.29. Results of experiment 4: Rugby ball and Skate helmet pose estimation.

b) Scenario 2: two Kinect 360 with lateral view

The aim of this scenario is to analyze the performance of the Object Recognition Module if

the point-cloud scene is taken with two Kinects 360. The experiments presented try to

improve the results obtained in previous simulations with bulky objects by choosing

correctly the location of the cameras.

The fifth experiment consists of recognising the Rugby ball with the information provided by two depth cameras. The point-cloud is much denser than in the previous experiments. The Rugby ball pose is shown in Figure 6.30:

Figure 6.30. Experiment 5: two lateral views of the Rugby ball model.

The final result can be seen in Figure 6.31. The pose estimation is not completely accurate because the two point-clouds, taken separately by two different Kinects 360, are not merged well, as can be appreciated in the figure. The causes of these offsets between point-clouds have been explained in section 6.1.1. Moreover, the object is illuminated by the infrared projectors of both depth cameras at the same time. This causes interferences in the depth sensors, and the visible side of the object is distorted.
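The merging of both clouds into the world frame can be sketched as follows; the camera poses are assumed to come from the calibration discussed in section 6.1.1 and are placeholders here:

    #include <pcl/common/transforms.h>
    #include <pcl/point_types.h>

    // Merge the clouds of both Kinect 360 cameras into a single world-frame cloud.
    pcl::PointCloud<pcl::PointXYZRGB>::Ptr mergeClouds(
        const pcl::PointCloud<pcl::PointXYZRGB>::Ptr &cloud_cam1,
        const pcl::PointCloud<pcl::PointXYZRGB>::Ptr &cloud_cam2,
        const Eigen::Affine3f &world_T_cam1,   // camera poses from the calibration
        const Eigen::Affine3f &world_T_cam2)
    {
        pcl::PointCloud<pcl::PointXYZRGB>::Ptr merged(new pcl::PointCloud<pcl::PointXYZRGB>);
        pcl::PointCloud<pcl::PointXYZRGB> tmp;

        pcl::transformPointCloud(*cloud_cam1, tmp, world_T_cam1);  // cloud 1 into world frame
        *merged += tmp;
        pcl::transformPointCloud(*cloud_cam2, tmp, world_T_cam2);  // cloud 2 into world frame
        *merged += tmp;

        // Any residual offset between both clouds comes from calibration errors (section 6.1.1).
        return merged;
    }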


Figure 6.31. Results of experiment 5: Rugby ball pose estimation.

The Rugby ball is then oriented differently in order to capture two different sides of the ball while avoiding interferences between the camera sensors (see section 6.1.2):

Figure 6.32. Experiment 6: two lateral views of the Rugby ball model.

The object has been oriented in order to minimise the common surface captured by both depth sensors and obtain a good point-cloud quality. Although the final point-cloud obtained with the two Kinects 360 presents translation offsets, the system with multiple Kinects is able to recognise and estimate the pose of the object.

Figure 6.33. Results of experiment 6: Rugby ball pose estimation.


c) Scenario 3: two Kinect 360 with lateral view and Kinect One

The aim of the last scenario is to analyze the performance of the Object Recognition Module when the point-cloud scene is taken from three different views. Thus, three point-clouds are merged into one.

Figure 6.34. Experiment 7: three different views of the Rugby ball model.

Finally, the Rugby ball can be reconstructed almost completely, missing only the bottom and the rear part. Therefore, a better match between the object template and the object in the scene is obtained with respect to the previous experiments with the Rugby ball:

Figure 6.35. Results of experiment 7: Rugby ball pose estimation.


6.2. Evaluation of the Grasp and Motion Planning Module

The aim of this section is to present visual results for the object grasp and motion planning procedure presented in section 4. The Rugby ball has been selected as the object model to test the system in four different poses:

Table 6.2. Rugby ball pose with respect to the workspace coordinate system.

Test   x (m)   y (m)   z (m)   ax    ay    az     angle
a)     0.0     0.0     0.15    0     0     1      0º
b)     0.0     0.0     0.15    0     0     1      90º
c)     0.1     0.1     0.15    0     0     1      45º
d)     0.0     0.0     0.15    0.3   0.9   -0.3   70º

The figures below illustrate how the object has been divided into slices to place the left and right fingers and thumbs, show the resulting contact points and present several pictures of the trajectory followed by the manipulator to reach the object. Note that these experiments have been performed in uncluttered environments, with the object placed against a uniform background. Grasping in cluttered environments is a harder problem, from the point of view of both robot perception and robot motion planning. In this evaluation, robust grasping strategies are needed to overcome the positioning errors previously analysed (see section 6.1.3).
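The slicing of the object point-cloud can be illustrated with the PassThrough filter [44]; this is only a sketch of extracting one horizontal slice, not the actual implementation of the grasp strategy of [8]:

    #include <pcl/filters/passthrough.h>
    #include <pcl/point_types.h>

    // Extract one horizontal slice of the object cloud between z_min and z_min + height.
    pcl::PointCloud<pcl::PointXYZRGB>::Ptr extractSlice(
        const pcl::PointCloud<pcl::PointXYZRGB>::Ptr &object_cloud,
        float z_min, float height)
    {
        pcl::PointCloud<pcl::PointXYZRGB>::Ptr slice(new pcl::PointCloud<pcl::PointXYZRGB>);

        pcl::PassThrough<pcl::PointXYZRGB> pass;
        pass.setInputCloud(object_cloud);
        pass.setFilterFieldName("z");                 // slice along the vertical axis
        pass.setFilterLimits(z_min, z_min + height);  // keep points inside the slice
        pass.filter(*slice);

        // Contact points for fingers and thumbs are then selected on each slice.
        return slice;
    }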

Figure 6.36. Test a): Rugby ball divided in slices and the resulting contact points.

Figure 6.37. Test a): Manipulator movements following the path planner solution.


Figure 6.38. Test b): Rugby ball divided in slices and the resulting contact points.

Figure 6.39. Test b): Manipulator movements following the path planner solution.

Figure 6.40. Test c): Rugby ball divided in slices and the resulting contact points.


Figure 6.41. Test c): Manipulator movements following the path planner solution.

Figure 6.42. Test d): Rugby ball divided in slices and the resulting contact points.

Figure 6.43. Test d): Manipulator movements following the path planner solution.


7. Project Budget

The project budget is shown below:

2 Kinect 360........................................................................................ 540,00€

Kinect One......................................................................................... 450,00€

2 Allegro Hand.................................................................................... 26800,00€

2 UR5.................................................................................................. 35800,00€

Other material.................................................................................... 500,00€

Objects............................................................................................... 100,00€

Computer............................................................................................. 2500,00€

Subtotal.............................................................................................. 66690,00€

IVA (21%)........................................................................................... 14004,90€

Total.................................................................................... 80694,90€

This project uses open-source software, so no licenses have been required. The salary of the employees is also detailed:

Technical engineers (900h at 14€/h).................................................. 12600,00€

Project Manager (30h at 30€/h).......................................................... 900,00€


8. Environmental impact

This project has presented a system which has a positive impact on the social and natural environment but a negative one in economic terms.

The economic aspect has an important negative impact due to the high cost of the hardware devices used, which is about 80695€, although open-source software reduces the project budget.

Secondly, the project has a positive social impact. The ability of the system to recognise and grasp bulky objects with two hands allows it to be easily integrated into scenarios where the robot coexists with people and helps them with specific tasks. The proposed system constitutes the initial core of a more complex one that will be able to navigate autonomously in a room and execute other tasks. Moreover, the open-source software allows the robotics community to use it without legal issues.

Finally, this project has a low environmental impact, since the power consumption and the raw materials used are minimal. Moreover, the hardware devices have a long lifecycle, and once they break or stop working properly, they are recycled correctly.


Conclusions and further work

In summary, the objectives of this project have been achieved, obtaining satisfactory results with the Object Recognition and Grasping System.

A system able to automatically locate and grasp a specific object using a bimanual robot has been developed. The system uses a perception subsystem to sense the robot's environment with RGB-D cameras and to process the point-cloud. Once the system has located the object, it is capable of deducing grasping points according to the pose of the object and the robotic hands. Finally, the manipulator reaches the object from an initial configuration thanks to the use of an inverse kinematics solver and a path planning strategy that finds an admissible collision-free trajectory.

Furthermore, the main conclusions of the evaluation of the object recognition module are:

- The camera’s calibration takes an important role in order to obtain good

performance and reliability of the system.

- Kinect 360 and Kinect One devices are robust to interference when working alone. When a multiple-camera system is required, the cameras interfere with each other. However, the results presented in this project show that those interferences can be tolerated.

- Multiple Kinect One devices cannot be launched on the same computer because the large amount of data to process saturates the USB bus. Using an independent computer for each Kinect One increases the complexity of the system and the project budget.

- The point-cloud processing performance is directly affected by several factors such as the point density, the measurement errors, the selected keypoints, the descriptor used, the matching process between the source and the target point-clouds, etc.

- This project proposes an approach that uses multiple camera views to construct a final point-cloud scene in order to obtain a complete object reconstruction. If the camera calibration is poor, the final point-cloud contains small translation and rotation offsets between the matched original points, which worsens the performance of the object recognition.

- The filtering process is important to delete outliers and to downsample a point-cloud in order to reduce the number of points and, consequently, the computational cost of the next stages (see the downsampling sketch after this list). By contrast, deleting too many points could produce a lack of information about the scene, and subsequent processing algorithms might not work properly with low-quality normals or descriptors.

- This project presents a satisfactory approach of filtering the desired object with

its colours instead of its shape.

- The more complete the point-cloud is, the more reduced is the error of object

pose.
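As referenced in the filtering conclusion above, a minimal downsampling sketch with the VoxelGrid filter [46] is given here; the leaf size is an illustrative value:

    #include <pcl/filters/voxel_grid.h>
    #include <pcl/point_types.h>

    // Downsample a scene cloud with a voxel grid; a larger leaf size removes more points
    // (lower computational cost) at the risk of degrading normals and descriptors.
    pcl::PointCloud<pcl::PointXYZRGB>::Ptr downsample(
        const pcl::PointCloud<pcl::PointXYZRGB>::Ptr &cloud, float leaf_size)
    {
        pcl::PointCloud<pcl::PointXYZRGB>::Ptr filtered(new pcl::PointCloud<pcl::PointXYZRGB>);

        pcl::VoxelGrid<pcl::PointXYZRGB> voxel;
        voxel.setInputCloud(cloud);
        voxel.setLeafSize(leaf_size, leaf_size, leaf_size);   // e.g. 0.01 m voxels
        voxel.filter(*filtered);

        return filtered;
    }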


Secondly, the main conclusions of the evaluation of the whole system are:

- In a complex system, where many elements collaborate with each other such

as cameras, manipulators, objects, etc., it is important to fix a world coordinate

system.

- The strategy to grasp a bulky object with two anthropomorphic hands is very different from that used for small ones. Synchronising two arms to grasp the same object is a hard and complex task.

- A system which grasps a bulky object with six points, three for each hand,

instead of four as other projects recommend, is a more robust system.

- The system has a bimanual manipulator with 22 DoF for each arm. Therefore, the inverse kinematics problem is a laborious process to converge to a feasible solution. However, if the manipulator joint limits, a good initial joint position and a fast IK solver such as DLS (damped least squares) are correctly set, the IK problem converges to a solution (see the DLS sketch after this list).

- ROS services can make a persistent connection between client and server. This implies low robustness to service provider changes or server failures. Although the object recognition service did not fail during the simulations, it is recommended to use ROS actions. An action provides feedback with updates on the progress of the task, and the client node can cancel the request at any time.
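As referenced in the inverse kinematics conclusion above, one damped least squares (DLS) iteration [61] can be sketched with Eigen as follows; the Jacobian and the task-space error are assumed to be provided by the kinematics library, and the damping value is illustrative:

    #include <Eigen/Dense>

    // One DLS iteration: dq = J^T (J J^T + lambda^2 I)^-1 e
    // J: task Jacobian (m x n), e: task-space pose error (m), lambda: damping factor.
    Eigen::VectorXd dlsStep(const Eigen::MatrixXd &J,
                            const Eigen::VectorXd &e,
                            double lambda = 0.1)
    {
        const Eigen::MatrixXd JJt = J * J.transpose();
        const Eigen::MatrixXd damped =
            JJt + lambda * lambda * Eigen::MatrixXd::Identity(JJt.rows(), JJt.cols());

        // Solve (J J^T + lambda^2 I) y = e, then map back through J^T.
        return J.transpose() * damped.ldlt().solve(e);
    }

    // The joint configuration is updated as q = q + dq, clamping each joint to its limits,
    // until the task-space error is below a tolerance or the iteration budget is reached.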

The system presented tries to minimise the project budget. Although the hardware devices have a high cost, the project uses devices already available in the IOC’s laboratory as well as open-source software. This makes it possible to integrate the Object Recognition and Grasping System into the IOC’s laboratory immediately.

The future tasks could be the following:

1. To improve the Object Recognition module:

- Develop a module to allow the Kinect camera to be held by a robot or

operator equipped with a magnetic tracker to measure the camera pose,

and generate the point-cloud scene by automatically changing the points

of view.

- Use AR markers [42] to place fixed cameras with respect to the world

coordinate system.

2. To improve the Grasp and Motion Planning module:

- Use of a fourth finger from each hand and the palms.

- Use tactile feedback sensors on the hand tips.

- Implementation of a motion planner able to find paths with a human-like appearance for bimanual manipulators.

- Use the RRTConnect planner, which grows two trees toward each other until they are merged into one. This should improve the efficiency of the motion planning system (see the OMPL sketch after this list).

3. To improve the Manager and Execution module:

- Use ROS actions to communicate between nodes.
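As referenced in the RRTConnect item above, a minimal OMPL [29] sketch of selecting this planner is given here; the state-space dimension, bounds and validity checker are placeholders, not the configuration used by the planning framework of the project:

    #include <iostream>
    #include <memory>
    #include <ompl/base/spaces/RealVectorStateSpace.h>
    #include <ompl/geometric/SimpleSetup.h>
    #include <ompl/geometric/planners/rrt/RRTConnect.h>

    namespace ob = ompl::base;
    namespace og = ompl::geometric;

    int main()
    {
        // Joint space of the manipulator (the dimension is a placeholder).
        auto space = std::make_shared<ob::RealVectorStateSpace>(12);
        ob::RealVectorBounds bounds(12);
        bounds.setLow(-3.14);
        bounds.setHigh(3.14);
        space->setBounds(bounds);

        og::SimpleSetup ss(space);
        // Collision checking is delegated to the planning framework; accept everything here.
        ss.setStateValidityChecker([](const ob::State *) { return true; });

        ob::ScopedState<> start(space), goal(space);
        start.random();   // placeholders for the initial and grasping configurations
        goal.random();
        ss.setStartAndGoalStates(start, goal);

        // RRTConnect grows one tree from the start and one from the goal until they meet.
        ss.setPlanner(std::make_shared<og::RRTConnect>(ss.getSpaceInformation()));

        if (ss.solve(5.0))   // planning time budget in seconds
            ss.getSolutionPath().print(std::cout);
        return 0;
    }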


Acknowledgements

I would like to thank my supervisors, especially Jan Rosell, Leopold Palomo and Néstor García, for their patience and support throughout this project, and the rest of the workmates at the IOC for accepting me as one of them during my internship.

To Isiah Zaplana, for his classes on numerical inverse kinematics and his motivation to

develop the first core of the General Inverse Kinematic Library.

To Abiud Rojas, for sharing his work on grasping with two Allegro Hands and for his patience in helping me to integrate it into this project.

To my family, friends and university colleagues, for their advice and their support to finish

this project.


Bibliography

[1] Microsoft, “Kinect for Windows Sensor Components and Specifications,” [Online].

Available: https://msdn.microsoft.com/en-us/library/jj131033.aspx. [Accessed July

2016].

[2] Microsoft, “Kinect hardware,” [Online]. Available: https://developer.microsoft.com/en-

us/windows/kinect/hardware. [Accessed July 2016].

[3] SimLab, “Allegro Hand Features,” [Online]. Available: http://www.simlab.co.kr/Allegro-

Hand.htm. [Accessed June 2016].

[4] U. Robots, “Universal Robots: UR5 specifications,” [Online]. Available:

http://www.universal-robots.com/es/productos/robot-ur5/. [Accessed August 2016].

[5] M. Quigley, B. Gerkey, J. Leibs, K. Conley, J. Faust, T. Foote, E. Berger, R. Wheeler

and A. Ng, “ROS: an open-source Robot Operating System,” in International

Conference on Robotics and Automation, ser. Open-Source Software workshop,

2009.

[6] E. Crawley, O. De Weck, S. Eppinger, C. Magee, J. Moses, W. Seering, J. Schindall,

D. Wallace and D. Whitney, “The Influence of Architecture in Engineering Systems,”

Massachusetts Institute of Technology (MIT); pp. 2-15, 2004.

[7] “Institute of Industrial and Control Engineering,” [Online]. Available:

https://www.ioc.upc.edu/the-institute/presentation1?set_language=en. [Accessed June

2016].

[8] A. Rojas-de-Silva and R. Suárez, “Grasping Bulky Objects with Two Anthropomorphic

Hands,” Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems; pp. 1-8, 2016.

[9] WillowGarage, “ROS,” [Online]. Available: http://www.ros.org/. [Accessed June 2016].

[10] J. M. O'Kane, A Gentle Introduction to ROS, University of South Carolina:

CreateSpace Independent Publishing Platform; ISBN 978-14-92143-23-9; p. 9, 2014.

[11] ROS, “ROS.org Nodes,” [Online]. Available: http://wiki.ros.org/Nodes. [Accessed June

2016].

[12] ROS, “ROS.org Master,” [Online]. Available: http://wiki.ros.org/Master. [Accessed June


2016].

[13] ROS, “ROS.org Topics,” [Online]. Available: http://wiki.ros.org/Topics. [Accessed June

2016].

[14] ROS, “ROS.org Services,” [Online]. Available: http://wiki.ros.org/Services. [Accessed

June 2016].

[15] M. Quigley, B. Gerkey and W. D. Smart, Programming Robots with ROS: a practical

introduction to the Robot Operating System, O’Reilly Media, Inc.; pp. 59-80, 81-92, 94-

110, 2016.

[16] ROS, “ROS.org libfreenect,” [Online]. Available: http://wiki.ros.org/libfreenect.

[Accessed June 2016].

[17] OpenKinect, “OpenKinect Main Page,” [Online]. Available:

https://openkinect.org/wiki/Main_Page. [Accessed August 2016].

[18] OpenKinect, “GitHub OpenKinect/libfreenect2,” [Online]. Available:

https://github.com/OpenKinect/libfreenect2. [Accessed August 2016].

[19] Structure, “OpenNI 2 SDK Binaries,” [Online]. Available: http://structure.io/openni.

[Accessed July 2016].

[20] ROS, “ROS.org openni_camera,” [Online]. Available:

http://wiki.ros.org/openni_camera. [Accessed June 2016].

[21] ROS, “ROS.org openni_launch,” [Online]. Available: http://wiki.ros.org/openni_launch.

[Accessed June 2016].

[22] ROS, “ROS.org rgbd_launch,” [Online]. Available: http://wiki.ros.org/rgbd_launch.

[Accessed June 2016].

[23] I. A. I. University of Bremen, “Tools for using the Kinect One (Kinect v2) in ROS,”

[Online]. Available: https://github.com/code-iai/iai_kinect2. [Accessed July 2016].

[24] B. Jacob and G. Guennebaud, “Eigen Main Page,” [Online]. Available:

http://eigen.tuxfamily.org/index.php?title=Main_Page. [Accessed August 2016].

[25] WillowGarage, “OpenCV Main Page,” [Online]. Available: http://opencv.org/. [Accessed June 2016].

[26] WillowGarage, “Point Cloud Library (PCL),” 2010. [Online]. Available:


http://pointclouds.org/. [Accessed June 2016].

[27] R. B. Rusu, “3D is here: Point Cloud Library (PCL),” IEEE International Conference on

Robotics and Automation (ICRA); pp. 1-4, 2011.

[28] J. Rosell, A. Pérez, A. Aliakbar, L. Palomo and N. García, “The Kautham Project: A

teaching and research tool for robot motion planning,” IEEE Int. Conf. On Emerging

Technologies and Factory Automation, ETFA’14; pp. 1-8.

[29] I. A. Șucan, M. Moll and L. E. Kavraki, “The Open Motion Planning Library Main Page,”

[Online]. Available: http://ompl.kavrakilab.org/index.html. [Accessed July 2016].

[30] I. A. Șucan, M. Moll and L. E. Kavraki, “The Open Motion Planning Library,” IEEE

Robotics & Automation Magazine; pp. 1-10, 2012.

[31] X. Armangué, “Modelling Stereoscopic Vision Systems for Robotic Applications,”

Universitat de Girona; pp. 49-70, 2003.

[32] M. Laukkanen, “Performance Evaluation of Time-of-Flight Depth Cameras,” AALTO

university; pp. 3-14, 2015.

[33] J. Posdamer and M. Altschuler, “ Surface measurement by space-encoded projected

beam systems,” Computer Graphics and Image Processing, 1982.

[34] “Extensible Markup Language (XML) - Origin and Goals,” [Online]. Available:

https://www.w3.org/TR/REC-xml/#sec-origin-goals. [Accessed June 2016].

[35] C. Evans, O. Ben-Kiki and I. döt Net, “YAML,” 2001. [Online]. Available:

http://www.yaml.org/start.html. [Accessed June 2016].

[36] J. Bowman and P. Mihelich, “ROS camera_calibration,” [Online]. Available:

http://wiki.ros.org/camera_calibration. [Accessed July 2016].

[37] P. Mihelich, “ROS OpenNI launch,” [Online]. Available:

http://wiki.ros.org/openni_launch. [Accessed August 2016].

[38] B. Siciliano, L. Sciavicco, L. Villani and G. Oriolo, Robotics: Modelling, Planning and

Control (Advanced Textbooks in Control and Signal Processing), London: Springer;

pp. 1- 140, 2009.

[39] OpenCV, “OpenCV Tutorials: Camera calibration,” [Online]. Available:

http://docs.opencv.org/2.4/doc/tutorials/calib3d/camera_calibration/camera_calibration.

html. [Accessed August 2016].


[40] H. Kato, “ARToolkit,” Human Interface Technology Laboratory, [Online]. Available:

http://www.hitl.washington.edu/artoolkit/. [Accessed August 2016].

[41] I. Dryanovsk and B. Morris, “ROS ar_pose,” [Online]. Available:

http://wiki.ros.org/ar_pose. [Accessed August 2016].

[42] S. Niekum, “ROS AR Track Alvar,” [Online]. Available:

http://wiki.ros.org/ar_track_alvar. [Accessed August 2016].

[43] OpenCV, “OpenCV Tutorials: Camera Calibration and 3D Reconstruction,” [Online].

Available:

http://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstru

ction.html. [Accessed August 2016].

[44] PCL, “PassThrough Class Template,” [Online]. Available:

http://docs.pointclouds.org/1.7.1/classpcl_1_1_pass_through.html. [Accessed August

2016].

[45] PCL, “Module filters,” [Online]. Available:

http://docs.pointclouds.org/trunk/group__filters.html#gac463283a9e9c18a66d3d29b28

a575064. [Accessed August 2016].

[46] PCL, “VoxelGrid Class Template,” [Online]. Available:

http://docs.pointclouds.org/trunk/classpcl_1_1_voxel_grid.html. [Accessed August

2016].

[47] PCL, “RegionGrowingRGB Class Template,” [Online]. Available:

http://docs.pointclouds.org/1.7.0/classpcl_1_1_region_growing_r_g_b.html. [Accessed

August 2016].

[48] PCL, “NormalEstimation Class Template,” [Online]. Available:

http://docs.pointclouds.org/1.7.1/classpcl_1_1_normal_estimation.html. [Accessed

August 2016].

[49] A. Evans Patterson, “Registration and Recognition in 3D,” Publicly Accessible Penn

Dissertations. Paper 1401; University of Pennsylvania; pp. 9-15, 2014.

[50] D. Wegrzyn, “Creating 3D Object Descriptors Using a Genetic Algorithm,”

Universidade da Beira Interior; pp. 15-23, 2013.

[51] PCL, “FPFHEstimation Class Template,” [Online]. Available:

http://docs.pointclouds.org/trunk/classpcl_1_1_f_p_f_h_estimation.html#details.


[Accessed August 2016].

[52] R. Rusu, N. Blodow and M. Beetz, “Fast point feature histograms (FPFH) for 3D

registration,” In Robotics and Automation, 2009. ICRA'09. IEEE International

Conference on (pp. 3212-3217). IEEE., 2009.

[53] PCL, “SampleConsensusInitialAlignment Class Template,” [Online]. Available:

http://docs.pointclouds.org/1.7.1/classpcl_1_1_sample_consensus_initial_alignment.ht

ml. [Accessed August 2016].

[54] PCL, “IterativeClosestPoint Class Template,” [Online]. Available:

http://docs.pointclouds.org/1.7.1/classpcl_1_1_iterative_closest_point.html. [Accessed

August 2016].

[55] P. J. Besl and N. D. McKay, “A Method for Registration of 3D Shapes,” In Robotics-DL

tentative (pp. 586-606). International Society for Optics and Photonics., 1992.

[56] ROS, “ROS.org Messages,” [Online]. Available: http://wiki.ros.org/msg. [Accessed

June 2016].

[57] A. Bicchi and V. Kumar, “Robotic grasping and contact: A review,” IEEE International

Conference on Robotics & Automation (pp. 348-353), 2000.

[58] A. Miller, S. Knoop and H. Christensen, “Automatic grasp planning using shape

primitives,” In Robotics and Automation, 2003. Proceedings. ICRA'03. IEEE

International Conference on (Vol. 2, pp. 1824-1829). IEEE., 2003.

[59] R. Pelossof, A. Miller, P. Allen and T. Jebara, “An svm learning approach to robotic

grasping,” In Robotics and Automation, 2004. Proceedings. ICRA'04. 2004 IEEE

International Conference on (Vol. 4, pp. 3512-3518). IEEE., 2004.

[60] J. Lenarcic, “An efficient numerical approach for calculating the inverse kinematics for

robot manipulators,” Robotica, vol. 3, no 01, p. 21-26, 1984.

[61] S. R. Buss, “Introduction to Inverse Kinematics with Jacobian Transpose,

Pseudoinverse and Damped Least Squares methods,” IEEE Journal of Robotics and

Automation, 17(1-19), 16., 2004.

[62] ROS.org, “XML specifications,” [Online]. Available: http://wiki.ros.org/urdf/XML.

[Accessed June 2016].

[63] ROS.org, “Link element,” [Online]. Available: http://wiki.ros.org/urdf/XML/link.

[Accessed June 2016].


[64] ROS.org, “Joint element,” [Online]. Available: http://wiki.ros.org/urdf/XML/joint.

[Accessed June 2016].

[65] P. Beeson and B. Ames, “TRAC-IK: An Open-Source Library for Improved Solving of

Generic Inverse Kinematics,” In Humanoid Robots (Humanoids), 2015 IEEE-RAS 15th

International Conference on (pp. 928-935). IEEE., November 2015.

[66] L. Kavraki, P. Svestka, J.-C. Latombe and M. Overmars, “Probabilistic Roadmaps for

Path Planning in High-Dimensional Configuration Spaces,” IEEE transactions on

robotics and automation, vol 12 n4, 1996.

[67] S. LaValle, “Rapidly-exploring random trees: A new tool for path planning,” 1998.

[68] J. Kuffner and S. LaValle, “RRT-Connect: An Efficient Approach to Single-Query Path

Planning,” In Robotics and Automation. Proceedings. ICRA'00. IEEE International

Conference on (Vol. 2, pp. 995-1001). IEEE., 2000.

[69] D. Hsu, J.-C. Latombe and R. Motwani, “Path planning in expansive configuration

spaces,” In Robotics and Automation. Proceedings., 1997 IEEE International

Conference on (Vol. 3, pp. 2719-2726). IEEE., 1997.

[70] G. Sánchez and J.-C. Latombe, “A single-query bi-directional probabilistic roadmap

planner with lazy collision checking.,” In Robotics Research (pp. 403-417). Springer

Berlin Heidelberg., 2003.

[71] I. Sucan and L. Kavraki, “Kinodynamic motion planning by interior-exterior cell

exploration,” In Algorithmic Foundation of Robotics VIII (pp. 449-464). Springer Berlin

Heidelberg., 2009.

[72] S. M. LaValle, “Planning algorithms,” Cambridge university press, 2006.

[73] C. Rodríguez, A. Rojas-de-Silva and R. Suárez, “Dual-Arm framework for cooperative

applications,” Proc. IEEE Int. Conf. Emerging Technologies and Factory Automation,

ETFA, 2016.

[74] S. B. Shop, “Sports Ball Shop: Rugby ball buying guide,” [Online]. Available:

http://www.sportsballshop.co.uk/acatalog/Rugby-Ball-Buying-Guide.html. [Accessed

July 2016].