

Università della Svizzera italiana
Dissertation Proposal

Oct 2013

From Vision to Actions
Towards Adaptive and Autonomous Robots

Jürgen Leitner

Abstract

This document describes my proposed PhD research and presents a review of my work done so far, as well as an outline of the work to be done to finalise my dissertation.

The goal of this research is to overcome limitations of highly complex robotic systems, especially in terms of autonomy and adaptation. Although robotics research has seen advances over the last decades, robots are still not in widespread use outside industrial applications. Yet a range of proposed scenarios have robots working together, helping and coexisting with humans in daily life. In all of these a clear need arises to deal with a more unstructured, changing environment.

The main focus of this research is to investigate the use of visual feedback for improving the reaching and grasping capabilities of complex robots. A developmental, step-wise learning approach is proposed to generate adaptive and autonomous perception and behaviours for the iCub humanoid robot. To facilitate this, a combined integration of computer vision, artificial intelligence and machine learning is employed on the robot.

Research Advisor: Jürgen Schmidhuber

Research Co-advisor: Alexander Förster

Internal Committee Members: Michael Bronstein, Rolf Krause

External Committee Members: Peter Corke, Stefano Nolfi

This proposal has been approved by the dissertation committee and the director of the PhD program:

Jürgen Schmidhuber, Research Advisor and Committee Chair, IDSIA (Dalle Molle Institute for Artificial Intelligence), Switzerland

Alexander Förster, Research Co-Advisor, IDSIA (Dalle Molle Institute for Artificial Intelligence), Switzerland

Michael Bronstein, Università della Svizzera Italiana, Switzerland

Peter Corke, Queensland University of Technology, Australia

Rolf Krause, Università della Svizzera Italiana, Switzerland

Stefano Nolfi, Consiglio Nazionale delle Ricerche, Italy

Antonio Carzaniga, PhD Program Director, Università della Svizzera Italiana, Switzerland

Contents

1 Introduction
2 Background
  2.1 Artificial Intelligence, Machine Learning and Robotics
  2.2 Vision and Robot Perception
  2.3 Object Manipulation and Robots
3 Towards Autonomous Object Manipulation in Humanoid Robots
  3.1 Perception
  3.2 Motion Generation and Learning
  3.3 Sensorimotor (Eye-hand) Coordination and Adaptation
4 Timeline and Schedule for Future Work
  4.1 Timeline of Work Performed So Far
  4.2 Schedule for Completion of the Dissertation
Bibliography
Appendix A: icVision Architecture


Chapter 1

Introduction

In the last century robots have transitioned from science fiction to science fact. After centuries of imagining automated machines that would help humans, the last few decades brought into existence an ever growing number of these programmable appliances. The field of robotics developed from interdisciplinary research in electronics, mechanics and computer science, with the first prominent robots, such as the famous Shakey at SRI [Nilsson, 1969, 1984], appearing in academia during the 1960s. In its honour the Association for the Advancement of Artificial Intelligence (AAAI) awards trophies, named "Shakeys", to the best robot and artificial intelligence videos every year. Our work on the iCub was honoured with the best student video award recently. In the 1980s the digital revolution, yielding programmable personal computers and embedded systems, kick-started the creation of robotic systems and their use in large-scale industrial production. Nowadays robotic arms are fulfilling automation tasks in factories all over the world. In recent years the field has been moving towards other areas of application, such as the use of robots in household settings. Proposed applications range from cleaning tasks and grocery shopping to helping in hospitals and care facilities. These involve the robots working around humans in daily life situations.

My research presented herein aims to overcome the limitations of current robots, with a focus on object manipulation in unstructured environments using a complex humanoid robot. Better perception and sensorimotor coordination is expected to be one of the key requirements to increase current robotic capabilities [Ambrose et al., 2012; Kragic and Vincze, 2009]. To facilitate this, a combined integration of computer vision, artificial intelligence (AI) and machine learning (ML) on the robot is proposed.

To decide and act in unstructured environments, the robot needs to be able to perceive its surroundings, as it is generally not feasible or possible to provide the robot with all the information it needs a priori. This might be because of hardware limitations, such as limited storage space, or because the information is simply not available or unknown at the time (the most striking example being robotic space exploration). Therefore, to extend robotic applications, the need for more autonomous, more intelligent robots arises.


Sensory feedback, that is the robot's perception, is of critical importance. Creating a useful perception system is still a hard problem, but it is required for acting in a purposeful, some might say 'intelligent', way. In order to work naturally in human environments robots will need to be much more flexible and robust in the face of uncertainty and unseen observations. To create better 'perception', 'motion' and 'coordination', and to deal with uncertainties, I use various ML and AI techniques. The aim is to answer the question whether it is possible (and if so how) for a humanoid robot to use vision and visual feedback to improve its reaching, grasping and (general) manipulation skills. The sensory information collected might be incomplete or inconsistent, demanding means of managing uncertainty. Research in the fields of AI and ML has extensively explored ways of dealing with incomplete information in the last decades. A wide range of sensors have been used to build models of the environment the robot is placed in. Visual feedback, though it tends to be harder to interpret than other sensory information, is an active research area. A good motivation is that our world is built around (human) visual perception, which also means that to allow 'natural' interaction a robot needs to be able to understand its environment based on the camera images it receives.

Another aspect of these 'natural' interactions is the human capability of adapting to changing circumstances during action execution. Even if the environment can be perceived precisely, it will not be static in most settings. For this reason the robot needs to embody a certain level of adaptation also on the motor side. This flexibility could again be provided by AI and ML techniques, leading to robots capable of reaching, grasping and manipulating a wide range of objects in arbitrary positions.

There are currently still no robotic systems available that perform autonomous dexterous manipulation (or they do so only in very limited settings). At IDSIA the iCub, a state-of-the-art high degree-of-freedom (DOF) humanoid robot, is available (see Figure 1.1); it was specifically designed for object manipulation research in robotic systems. The overarching goal of this research is to extend the capabilities of our experimental platform to allow for more autonomous and more adaptive behaviours.

Figure 1.1. Our research platform, the iCub humanoid robot.

Chapter 2

Background and Related Work

Object manipulation in real-world settings is a very hard problem in robotics, yet it is one of the most important skills for robots to possess [Kemp et al., 2007]. Through manipulation they are able to interact with the world and therefore become useful and helpful to humans. Yet to produce even the simplest human-like behaviours, a humanoid robot must be able to see, act, and react continuously. This holds even more for object manipulation tasks, which require precise and coordinated movements of the arm and hand. The understanding of how humans and animals control these movements is a fundamental research topic in the cognitive sciences [Posner, 1989] and neurosciences [Jeannerod, 1997]. Despite the interest and importance of the topic, e.g. in rehabilitation and medicine, the issues and theories behind how humans learn, adapt and perform reaching and grasping behaviours remain controversial. Although there are many experimental studies on how humans perform these actions, the development of reaching and grasping is still not fully understood and only very basic computational models exist [Oztop et al., 2004]. Vision is seen as an important factor in the development of reaching and grasping skills in humans [Berthier et al., 1996; McCarty et al., 2001]. For example, imitation of simple manipulation skills has been observed already in 14-month-old infants [Meltzoff, 1988]. Robots, in contrast, are only able to perform (simple) grasps in specific settings.

2.1 Artificial Intelligence, Machine Learning and Robotics

Although research has developed algorithms that play chess well enough to win against (and/or tutor) the average human player [Sadikov et al., 2007], the robotic manipulation of a chess piece, in contrast, is still not feasible at a human (not even a child-like) level. To produce even the simplest autonomous, adaptive, human-like behaviours, a humanoid robot must be able to, at least:

• Identify and localise objects in the environment, e.g. the chess pieces and board

• Execute purposeful motions for interaction, e.g. move a piece to a desired position



State-of-the-art humanoid robots such as Honda's Asimo [Sakagami et al., 2002], NASA's Robonaut and R2 [Bluethmann et al., 2003; Diftler et al., 2011], Toyota's Partner Robots [Takagi, 2006], Justin and TORO [Ott et al., 2012] from the German Aerospace Center (DLR) and the iCub [Tsagarakis et al., 2007] are stunning feats of engineering. They are capable of producing complex, repeatable motions allowing them to walk, run and manipulate delicate objects such as musical instruments [Kusuda, 2008; Solis and Takanishi, 2011]. The caveat is that every last detail of these behaviours is currently programmed by hand (often with the help of advanced tools). As a result those resourceful machines, full of capabilities, are not yet realising their full potential due to the complexity of programming them in a flexible way. Their capacity to adapt to changes in the environment or to unexpected circumstances is still very limited.

Along with the concept of adaptiveness comes the notion of acting autonomously. To exploit the full versatility of an advanced robot, a broad spectrum of actions is required. In recent years the interest of the robotics community in Artificial Intelligence and Machine Learning techniques as tools to build intelligent robots has increased. At the same time, the ML community has shown an increased interest in robots as an ideal test bench¹ and application for new algorithms and techniques [Konidaris, 2013].

The ML community uses these techniques, and tests them, on a physical entity that interacts with and moves in real environments. This 'embodiment' is seen as an important condition for the development of cognitive abilities both in humans and robots [Brooks, 1999; Wolpert et al., 2001; Pfeifer et al., 2007]. ML algorithms have been applied in experimental robotics to acquire new skills; however, the need for carefully gathered training data, clever initialisation conditions, and/or demonstrated example behaviours limits the autonomy with which behaviours can be learned. Building robots that can perform complex manipulation skills to help users in their activities of daily living (ADL) is the aim of various research projects [Cangelosi et al., 2008; GeRT, 2012; THE, 2012; WAY, 2012].

Robot Learning: Cognitive, Developmental and Evolutionary Robotics

As mentioned above, the programming of these highly complex robot systems is a cumbersome, difficult and time-consuming process. It generally also describes each precise movement in detail, allowing little to no flexibility or adaptation during execution. Robot Learning generally refers to research into ways for a robot to learn certain aspects by itself. Instead of providing all information to the robot a priori, for example the possible motions to reach a certain target position, the agent will through some process 'learn' which motor commands lead to what action. Cognitive Robotics, Developmental Robotics and Evolutionary Robotics are fields that recently emerged with the specific aim of investigating how robots can 'learn' for themselves and thereby generate more autonomous and adaptive capabilities.

¹ 'Embodiment', i.e. the claim that having a body that mediates perception and affects behaviour plays an integral part in the emergence of human cognition and how we learn.


In Cognitive Robotics [Asada et al., 2001] the aim is to provide robots with cognitive processes similar to those of humans and animals. An integrated view of the body is taken, including the motor system, the perceptual system and the body's interactions with the environment. The acquisition of knowledge, be it through actions (e.g. motor babbling) or perception, is a big part of cognitive robotics research. Another is the development of architectures for these tasks. A variety have been proposed [Burgard et al., 2005; Shanahan, 2006; Vernon et al., 2007b; Chella et al., 2008; Wyatt and Hawes, 2008], but the promised improvements in robotic applications still need to be shown. This can be attributed to the varying definitions of cognition and the complex human cognitive system, whose workings are still not fully understood. To build cognitive architectures two distinct approaches have been tried. The research seems to mainly focus on top-down architectures. A bottom-up approach has been described as more suitable for use with robots (e.g. the proposed iCub cognitive architecture [Vernon et al., 2007a]).

Developmental Robotics [Asada et al., 2001; Weng, 2004; Kuipers et al., 2006; Meeden and Blank, 2006; Asada et al., 2009] aims to put more emphasis on the development of skills. It is an interdisciplinary approach to developmental science. It differs from the previous approaches, as the engineer only creates the architecture and then allows the robot to explore and learn its own representation of its capabilities (sensory and motor) and of the environment. As above, the body and its interactions with the environment are seen as fundamental for the development of skills. The aim is to build adaptive robotic systems through exploration and autonomous learning, i.e. learning without direct intervention from a designer [Lungarella et al., 2003]. Here, interesting areas to explore are selected by building on previous knowledge while seeking out novel stimuli.

Evolutionary Robotics [Harvey et al., 1997; Nolfi and Floreano, 2000; Doncieux et al., 2011] is another approach to add adaptiveness and developmental processes to robots. It emerged as a new approach to overcome the difficulties of designing control systems for autonomous robots: (a) coordinating the (increasing) number of DOF both in mechanics and control is hard, especially since the complexity scales with the number of possible interactions between parts [Cliff et al., 1993] (see 'Curse of Dimensionality'); (b) the environment and how the robot interacts with it are often not known beforehand. Its main focus is on evolving a control system based on artificial neural networks. These neuro-controllers (NC), inspired by the neuron activity in the human brain, have been shown to work in a wide range of applications [Nolfi et al., 1994; Dachwald, 2004; Leitner et al., 2010]. An important issue is that to 'learn' behaviours, a large number of iterations (or generations) is required. This works fine in simulation but is hard to achieve on a real robotic platform. Nolfi et al. [1994] showed that evolving an NC on hardware is, while time consuming, feasible, at least for simple mobile robots. Hybrid approaches, where NCs are trained first in simulation and then transferred to the real hardware, seem preferable. The performance of the controllers in the real world can then be used to improve the simulation [Bongard et al., 2006]. How to effectively train and apply NCs to real, high-DOF hardware is still an open research question.


Other Approaches to robot learning have been developed in the past. The area of Reinforcement Learning (RL) [Sutton and Barto, 1998] has appealed to many roboticists, especially for learning to control complex robotic systems. A general RL algorithm and a means to inform the robot whether its actions were successful (positive reward) or not (negative reward) is all that is required. RL and its applicability to humanoid robots has been investigated by Peters et al. [2003]. Imitation Learning or Apprenticeship Learning is of importance in human skill development as it allows the transfer of skills from one person to another. In robotics, Robot Learning from Demonstration or Programming by Demonstration is a similar paradigm for enabling robots to learn to perform novel tasks. It takes the view that an appropriate robot controller can be derived from observations of another agent's performance [Schaal, 1999].

2.2 Vision and Robot Perception

To be useful in the scenarios proposed above, a robot must be able to see, act, and react continuously. Perception is a key requirement in order to purposefully adapt robot motion to the environment, allowing for more successful, more autonomous interactions. Vision and the visual system are the focus of much research in psychology, cognitive science, neuroscience and biology. A major problem in visual perception is that what individuals 'see' is not just a simple translation of input stimuli (compare optical illusions). The research of Marr in the 1970s led to a theory of vision using different levels of abstraction, from a two-dimensional visual array (projected onto the retina) to a three-dimensional description of the world as output. The stages include: a 2D primal sketch of the scene (using feature extraction), a 2½D sketch of the scene (using textures to provide more information) and a 3D model of the world [Marr, 1982].

Research on perception has been an active component of developing artificial vision systems, in industry and robotics. Computer Vision generally describes the field of research dealing with acquiring, processing, analysing, and understanding images in order to produce decisions based on the observation. As a scientific discipline, computer vision is concerned with the theory behind artificial systems that extract information from a variety of image data, such as video sequences, views from multiple cameras, or multi-dimensional/spectral data, e.g. from a medical scanner. The fields of computer vision and AI have close connections, e.g. autonomous planning or decision making for robots requires information about the environment, which could be provided by a computer vision system. AI and computer vision share other topics such as pattern recognition and learning techniques. Furthermore, computer vision has spawned a multitude of research sub-domains, such as scene reconstruction, event detection, object recognition, motion estimation, etc.

The first thing done by the human visual system is widely understood to be the segmentation and detection of objects in the visual space (the 2D images). In vision research this would fall into the area of 'image processing' [Gonzalez and Woods, 2002].


The techniques generally provide ways of extracting information from the image data and can be grouped into the following categories: pre-processing (e.g. noise reduction, enhancement, scaling, etc.), feature extraction (e.g. lines, edges, interest points, etc.), segmentation (e.g. separating fore- and background), and high-level processing (e.g. recognition and decision making). Another important topic in computer vision is 'image understanding'. With the aid of geometry, physics, statistics, and learning the goal is to mimic the abilities of the human (visual) perception system.
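To make these categories concrete, the following is a minimal OpenCV sketch chaining a noise-reduction step, an edge detector and a simple foreground segmentation; the high-level processing step is application specific and only hinted at by the returned contours. The function is purely illustrative and not part of any system described in this proposal.

    // Illustrative OpenCV pipeline touching the categories above: pre-processing,
    // feature extraction and segmentation; the contours stand in for the input to
    // a high-level recognition or decision step.
    #include <opencv2/opencv.hpp>
    #include <vector>

    std::vector<std::vector<cv::Point>> simplePipeline(const cv::Mat& grey) {
        cv::Mat denoised, edges, mask;
        cv::GaussianBlur(grey, denoised, cv::Size(5, 5), 1.5);   // pre-processing: noise reduction
        cv::Canny(denoised, edges, 50, 150);                     // feature extraction: edges
        cv::threshold(denoised, mask, 0, 255,
                      cv::THRESH_BINARY | cv::THRESH_OTSU);      // segmentation: fore-/background
        std::vector<std::vector<cv::Point>> contours;
        cv::findContours(mask, contours, cv::RETR_EXTERNAL,
                         cv::CHAIN_APPROX_SIMPLE);               // regions for high-level processing
        return contours;
    }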

Research into vision for the special requirements of robotic systems is referred to as robot vision or machine vision [Horn, 1986; Hornberg, 2007]. For example, visual feedback has extensively been used in mobile robot applications, for obstacle avoidance, mapping and localisation. With the advancement of humanoids and the increased interest in working around humans, object detection and manipulation are more and more driving the development of robot vision systems. An important problem is that of determining whether or not the image data contains some specific object, feature, or activity. While this has been researched for quite some time already, the task seems harder than expected and no solution for the general case of detecting arbitrary objects in arbitrary situations exists. Most of the work relies heavily on artificial landmarks and fiducial markers to simplify the detection problem. Furthermore, existing methods can at best solve it for specific objects (simple geometries, faces, printed or hand-written characters, or vehicles) and in specific situations (in terms of well-defined illumination, background, and pose of the object with respect to the camera). For a detailed introduction and overview of the foundations and the current trends the reader is referred to the excellent survey by Kragic and Vincze [2009].

2.3 Object Manipulation and Robots

Only after the scene is observed and the robot has an idea about which objects are in the environment can it start interacting with them. But manipulating arbitrary objects is not a trivial thing, even for humans. The development of hand control in children, for an apparently simple, prototypical precision grasp task, does not mature until the age of 8-10 years [Forssberg et al., 1991]. Moreover, the complexity, as can be seen from the number of neurons involved in the control of the arm and hand, is staggeringly high. Even after manipulation skills have been learnt they are constantly adapted by a perception-action loop to yield the desired results.

In recent years good progress was made with robotic grasping of objects. The various manipulators, mainly hands and grippers, and the associated techniques have clearly improved. Also, novel concepts of 'grippers' have been designed, some of which are quite ingenious solutions to some of the issues. One such example is the granular gripper made by Brown et al. [2010], which is filled with ground coffee that is able to 'flow' around the object and is then fixed in position by creating a vacuum. This concept has recently been extended to a full-sized elephant-trunk-like arm [Cheng et al., 2012].


Also, in terms of how to grasp objects with regular grippers and 'hands', recent results highlight the advanced state of research in grasping. For example, Maitin-Shepard et al. [2010] showed that robots are able to pick up non-rigid objects, such as towels. Their robot is able to reliably and robustly pick up a randomly dropped towel from a table by going through a sequence of vision-based re-grasps and manipulations, partially in the air, partially on the table. In the DARPA ARM project, which aims to create highly autonomous manipulators capable of serving multiple purposes across a wide variety of applications, NASA JPL's winning team showed an end-to-end system that allows the robot to grasp diverse objects (e.g. power drill, keys, screwdrivers, ...) from a table [Hudson et al., 2012; Hebert et al., 2012]. On the other hand, Saxena et al. [2008] have presented a way for a robot to learn, from only a small number of real world examples, where good grasping points are on a wide variety of previously unknown objects.

All this has led Dr. Pratt, the DARPA manager of robotics-related research projects, to his somewhat controversial statement that "Grasping is solved"² at last year's IROS conference. While this might be a bit too optimistic, it seems like the research is at a good enough state to allow for better system integration. The direct interfaces between various components, which make robotics such a hard but interesting field, clearly need to improve to allow for robust object manipulation. Only by combining sensing and control of the whole robotic platform will a fully functional 'pick-and-place' capable system appear. To allow a variety of objects to be picked up from various positions the robot needs to see, act and react within such a tightly integrated control system.

A must-read for roboticists is Corke [2011]'s 'Robotics, Vision and Control'. It puts the integration of these three components in the spotlight and describes common pitfalls and the issues that arise with integration.

Our Experimental Platform: The iCub Humanoid at IDSIA

The iCub humanoid [Tsagarakis et al., 2007] (shown in Figure 1.1) is an open-system robotic platform developed during various European projects. It consists of two arms and a head attached to a torso roughly the size of a human child.³ The head and arms follow an anthropomorphic design and provide a high-DOF system that is used for researching human cognitive and sensorimotor development. The iCub is an excellent experimental platform for cognitive and sensorimotor development and embodied artificial intelligence [Metta et al., 2010] and was designed to investigate human-like object manipulation. The robot's movements need to be coordinated with feedback from visual, tactile, and acoustic⁴ perception. Of interest is also to test the scalability of machine learning methods towards such complex systems interacting with the real world [Levinson et al., 2010; Sicard et al., 2011; Leitner et al., 2012e].

² Plenary talk at IROS 2012: http://www.iros2012.org/site/node/13
³ Legs are under development; recently first standing/balancing demos with the iCub were shown.
⁴ Acoustic feedback might be used when manipulating objects, to know if a grasp was successful.

Chapter 3

Towards Autonomous Object Manipulation in Humanoid Robots

One of the most important problems in robotics at the moment is to improve the robot's ability to understand and interact with the environment around it: it needs to be able to perceive, detect and locate objects in its surroundings and then be able to plan and execute actions to manipulate them. Enabling more autonomous object manipulation, more specifically enabling some level of eye-hand coordination to perform actions more successfully, is of high interest to the robotics community (e.g. NASA's Space Technology Roadmap calls for a 'Real-time self-calibrating hand-eye system' [Ambrose et al., 2012]). I am working on integrating learning and object manipulation in robotic systems. IDSIA is especially interested in investigating how rewards and motivation affect the development of complex actions and interactions between an (embodied) agent and the environment.¹

Figure 3.1. IDSIA’s research towards a functional eye-hand coordination on the iCub.



My aim is to improve adaptivity and autonomy in robot grasping based on visual feedback, closing the loop and performing grasping of objects while adapting to unknown, complex environments. Figure 3.1 gives an overview of the ongoing research at IDSIA towards the goal of integrating eye-hand coordination into the iCub. The top shows the perception side. It includes the modules for the detection and identification of objects (in the images), as well as their localisation (in 3D Cartesian space). The details of these implementations are described in Section 3.1. The bottom half shows the action and motion side. To generate motion from learning (see Section 3.2) a crucial feature is the avoidance of collisions, both between the robot and the environment and between the robot and itself. Section 3.3 describes how we aim to generate a level of eye-hand coordination not previously seen on the iCub.

3.1 Perception

My PhD research so far has put an emphasis on investigating visual perception in robots. In humans and animals, the term refers to the ability to interpret the environment based on information from visible light reaching the eyes. In robots, cameras are used in place of eyes to capture the light of the environment. The reason for the focus on vision is twofold: firstly the sensing capabilities of robotic platforms (cameras are cheap) and secondly, vision is the most important sense for humans. Another reason is that with a humanoid robot, such as the iCub at IDSIA, a natural tendency exists to be inspired by human perception and behaviour. A lot of my work in the last two years focussed on building a visual perception system for the iCub humanoid, somewhat similar to the description of human perception by Marr [1982]. Various experiments performed over the course of the last two years are described briefly in the next sections.

Framework for Robot Vision and Cognitive Perception

During the first years of my PhD I developed the icVision framework at IDSIA [Leitner et al., 2012c, 2013b]. It allows for easier development, testing and integration of the ongoing computer vision research into the real hardware. One of the main design goals is to allow rapid prototyping and testing of vision software on the iCub and to reduce development time by removing redundant and repetitive processes. icVision is implemented in C++, is open-source and uses the YARP middleware and OpenCV for the underlying image processing. Its various modules provide means to detect (known) objects in the images and estimate their 3D location with respect to the robot's reference frame (see Appendix A). It interfaces with the MoBeE environment to place a (given) geometric object into the robot's world model, enabling the robot to, for example, avoid colliding with it during operation. An overview of the experiments done using the framework can be found in [Leitner et al., 2013a].
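To give a flavour of what such a module looks like, the following is a minimal, hedged sketch of grabbing one camera frame over YARP and handing it to OpenCV. The port names and the downstream processing step are assumptions made purely for illustration; they are not the actual icVision interfaces.

    // Minimal sketch: read one frame from an iCub camera via YARP and wrap it in a
    // cv::Mat. Port names and the downstream processing are illustrative only.
    #include <opencv2/opencv.hpp>
    #include <yarp/os/Network.h>
    #include <yarp/os/BufferedPort.h>
    #include <yarp/sig/Image.h>

    int main() {
        yarp::os::Network yarpNet;                               // connect to the YARP name server
        yarp::os::BufferedPort<yarp::sig::ImageOf<yarp::sig::PixelRgb>> port;
        port.open("/icVision/example/img:i");                    // assumed port name
        yarp::os::Network::connect("/icub/cam/left", port.getName());

        yarp::sig::ImageOf<yarp::sig::PixelRgb>* frame = port.read();  // blocks until a frame arrives
        if (frame == nullptr) return 1;
        cv::Mat rgb(frame->height(), frame->width(), CV_8UC3,
                    frame->getRawImage(), frame->getRowSize());
        cv::Mat bgr;
        cv::cvtColor(rgb, bgr, cv::COLOR_RGB2BGR);               // OpenCV expects BGR ordering

        // ... run a learned detector (e.g. a CGP-IP filter) on `bgr` and publish the result ...
        return 0;
    }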

¹ My research is partly funded by the European Union grant 'IM-CLeVeR' [Baldassare et al., 2009].


Visual Perception and Object Detection

As mentioned in Section 2.2, object detection is not yet solved in a general sense, yet it is a very important issue for enabling autonomous grasping. It is a hard problem on the iCub in particular, although there has been some recent progress, especially from the lab at IIT [Ciliberto et al., 2011; Fanello et al., 2013; Gori et al., 2013]. For autonomous object manipulation with humanoid robots it is important to detect and identify the objects to interact with in the environment. This is still a challenging problem in robotics, especially in settings where lighting varies, viewing angles change and the environment can be described as 'cluttered' (i.e. lots of different objects in the scene, partly obstructing each other). Given the maturity of the field of image processing, it should be possible to construct programs that use much more complicated image operations and hence incorporate domain knowledge.

We developed a technique based on Cartesian Genetic Programming (CGP) [Miller, 1999, 2011] allowing for the automatic generation of computer programs for robot vision tasks [Harding et al., 2013]. A large subset of the functionality of the freely available OpenCV image processing library [Bradski, 2000] is used as the building blocks of these programs. The implementation provides an effective method to learn (unique) object detection modules. If the training set is chosen correctly, even different lighting conditions are no longer problematic [Leitner et al., 2012d]. The basic algorithm works as follows: Initially, a population of candidate solutions is composed of randomly generated individuals. Each of these individuals, represented by its genotype, is tested to see how well it performs the given task, i.e. segmenting the object of interest from the rest of the scene in a set of test images. This step, known as evaluating the 'fitness function', is used to assign a numeric score to each individual in the population. Generally, the lower this error, the better the individual is at performing the task. The next step generates a new population of individuals from the old population. This is done by taking pairs of the best-scored genotypes and performing functions analogous to recombination and mutation. These new individuals are then evaluated using the fitness function. The process of test and generate is repeated until a solution is found or until a certain number of individuals have been evaluated.
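A minimal, self-contained sketch of this generate-and-test loop is given below. The genotype and fitness used here are toy placeholders (a vector of integers and its distance to a fixed target); in CGP-IP the genotype encodes a graph of OpenCV operations and the fitness is the segmentation error on the hand-labelled training images, and the exact selection scheme may differ.

    // Toy generate-and-test loop mirroring the description above: evaluate, sort,
    // recombine and mutate pairs of the best genotypes, repeat until solved.
    #include <algorithm>
    #include <cstdlib>
    #include <iostream>
    #include <vector>

    struct Genotype {
        std::vector<int> genes;
        double fitness = 1e9;
    };

    static double evaluateFitness(const Genotype& g) {
        double error = 0.0;                      // placeholder: stands in for segmentation error
        for (int gene : g.genes) error += std::abs(gene - 5);
        return error;                            // lower is better
    }

    static Genotype randomGenotype(int length) {
        Genotype g;
        for (int i = 0; i < length; ++i) g.genes.push_back(std::rand() % 10);
        return g;
    }

    static Genotype recombineAndMutate(const Genotype& a, const Genotype& b) {
        Genotype child;
        for (size_t i = 0; i < a.genes.size(); ++i)
            child.genes.push_back(i % 2 ? a.genes[i] : b.genes[i]);         // recombination
        child.genes[std::rand() % child.genes.size()] = std::rand() % 10;   // mutation
        return child;
    }

    int main() {
        std::vector<Genotype> population(20);
        for (auto& g : population) g = randomGenotype(8);

        for (int generation = 0; generation < 100; ++generation) {
            for (auto& g : population) g.fitness = evaluateFitness(g);
            std::sort(population.begin(), population.end(),
                      [](const Genotype& x, const Genotype& y) { return x.fitness < y.fitness; });
            if (population.front().fitness == 0.0) break;                   // solution found

            // Next generation from pairs of the best-scored genotypes (with elitism).
            std::vector<Genotype> next = {population[0]};
            while (next.size() < population.size())
                next.push_back(recombineAndMutate(population[0], population[1]));
            population = std::move(next);
        }
        std::cout << "best fitness: " << population.front().fitness << "\n";
        return 0;
    }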

While much of the previous work with machine learning in computer vision has focused on the use of grey-scale images, our implementation, Cartesian Genetic Programming for Image Processing (CGP-IP), can handle colour images with multiple channels.

Figure 3.2. Example illustration of a CGP-IP genotype.


Colour images are separated into their RGB (red, green and blue) and HSV (hue, saturation and value) channels, which are provided as available inputs. Each available channel is presented as an input to CGP-IP, and evolution selects which inputs will be used, as all functions operate on single-channel images. Our implementation generates human-readable C# or C++ code based on OpenCV to be run directly on the real hardware, using our icVision framework. An example of a CGP-IP genotype is illustrated in Figure 3.2. The first three nodes obtain the components from the current test image (e.g. a grey-scale version, and the red and green channels). The fourth node adds the green and red images together. This is then dilated by the fifth node. The sixth node is not referenced by any node connected to the output (i.e. it is neutral), and is therefore ignored. The final node takes the average of the fifth node and the grey-scale input.
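Written out by hand, the phenotype of this example genotype roughly corresponds to the following OpenCV operations (an illustration only; the code actually emitted by CGP-IP differs in its details):

    // Rough OpenCV equivalent of the example genotype in Figure 3.2.
    #include <opencv2/opencv.hpp>

    cv::Mat examplePhenotype(const cv::Mat& bgr) {
        // Nodes 1-3: single-channel inputs (grey-scale version, red and green channels).
        cv::Mat grey, channels[3];
        cv::cvtColor(bgr, grey, cv::COLOR_BGR2GRAY);
        cv::split(bgr, channels);                        // channels[2] = red, channels[1] = green

        cv::Mat added, dilated, output;
        cv::add(channels[1], channels[2], added);        // node 4: add green and red
        cv::dilate(added, dilated, cv::Mat());           // node 5: dilate the sum
        // node 6 is neutral and therefore not implemented
        cv::addWeighted(dilated, 0.5, grey, 0.5, 0.0, output);  // final node: average with grey
        return output;
    }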

We apply CGP-IP to a variety of problems in computer vision. The efficacy of our approach has been shown in basic image processing (noise reduction), medical imaging (mitosis detection) [Harding et al., 2013] and object detection in robotic applications.

Visual Terrain Classification. As a first example, we apply CGP-IP to the visual detection and identification of rocks and rock formations. Automatically classifying terrain such as rocks, sand and gravel from images is a challenging machine vision problem. Using data from mobile robots (on other planets) our system learns classifiers and detectors for different terrain types from hand-labelled training images. The learned program outperforms currently used techniques for classification when tested on a panorama image collected by the Mars Exploration Rover Spirit [Leitner et al., 2012b].

Object Detection and Identification. Vision systems are expected to work in 'real-world', cluttered environments, with lots of different objects visible. We show that the CGP-IP approach is able to work robustly in different lighting conditions, and when the target object is moved or rotated. To train, a number of images are collected from the iCub cameras. In each one the target (and other objects) is repositioned. Hence our training set implicitly contains multiple views, different angles, scales and lighting conditions. In the training set the target object is segmented by hand. Figure 3.3 shows the result of one evolved programme for a tea box. Nine training images were used for training.

Figure 3.3. The detection of a tea box in changing lighting conditions performed by a learned CGP-IP filter. Full video at: http://www.youtube.com/watch?v=xszOCj4A1eA


Figure 3.4. The detection of the iCub's own fingertips and hands. Green indicates correct detection; red and blue show not-detected and wrongly detected pixels, respectively.

It can be seen that the trained individual is able to cope with variations in scale, orientation and lighting. Unique programmes, allowing for identification as well as detection, were evolved for a variety of objects of interest, e.g. different cups, tin cans, kids' blocks and tea boxes.

Robot Hand Detection. CGP-IP was then applied to detect the robot's own hands and fingers in the images, an important prerequisite for eye-hand coordination. Previous approaches make use of external systems or markers to provide the position of the hand. In contrast we do so using only camera images. At first we verified our approach by visually detecting the hands of the Nao robot, which are of a less complex design and also simpler in appearance. The detection of the mechanically complex hands and fingers of the iCub turned out to be a much harder computer vision task. Separate programs resulted in better detection of the fingers and the hand [Leitner et al., 2013c] (Figure 3.4).

Autonomous Learning of Object Detection. CGP-IP uses a supervised learning technique. An obvious limitation is that the training images need to be provided. In our first object detection experiments with CGP-IP on the robot these images were hand-labelled in a rather tedious and time-consuming fashion. We noticed that the area specified by the labels did not need to be very precise, as our technique seems to be robust to some errors in the provided bounding labels. The evolved solution was able to segment the object according to its outline rather than the label provided. A visual system that actively explores the environment is combined with CGP-IP to create a more autonomous detection of objects [Leitner et al., 2012a]. A neuromorphic model, based on Itti et al. [1998], is applied to identify salient points in the scene. Around these points features are detected using the FAST corner detection algorithm [Rosten et al., 2010]. Local descriptors are calculated using Gabor wavelets, i.e. convolutional kernels having the shape of plane waves restricted by a Gaussian envelope function [Wiskott et al., 1997]. The points are matched between the two camera images to allow a rough segmentation of the object based on depth information. Clustered matched interest points provide the rough segmentation for the CGP-IP training set.


The system enables the robot to autonomously learn to detect and identify various objects on a table and to build a world model using the spatial information as described below.
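As a hedged illustration of the interest-point step in this pipeline, the sketch below detects FAST keypoints in both camera images, matches them, and derives a rough bounding box from the matched points in the left image. ORB descriptors stand in for the Gabor-wavelet descriptors used in the actual system, and the saliency model, clustering and depth check are omitted.

    // Sketch: FAST keypoints in both camera images, descriptor matching, and a
    // rough bounding box from the matched points (a stand-in for the rough label
    // fed to CGP-IP training; not the actual implementation).
    #include <opencv2/opencv.hpp>
    #include <vector>

    cv::Rect roughSegmentation(const cv::Mat& left, const cv::Mat& right) {
        auto fast = cv::FastFeatureDetector::create();
        auto orb  = cv::ORB::create();                   // descriptor used here for illustration
        std::vector<cv::KeyPoint> kpL, kpR;
        cv::Mat descL, descR;
        fast->detect(left, kpL);
        fast->detect(right, kpR);
        orb->compute(left, kpL, descL);
        orb->compute(right, kpR, descR);
        if (descL.empty() || descR.empty()) return cv::Rect();

        cv::BFMatcher matcher(cv::NORM_HAMMING, /*crossCheck=*/true);
        std::vector<cv::DMatch> matches;
        matcher.match(descL, descR, matches);

        // Bounding box around the matched keypoints in the left image serves as a
        // rough label for the object of interest.
        std::vector<cv::Point2f> pts;
        for (const auto& m : matches) pts.push_back(kpL[m.queryIdx].pt);
        return pts.empty() ? cv::Rect() : cv::boundingRect(pts);
    }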

It is worth mentioning that this object detection, identification, and tracking is currently implemented to use only single frames. We believe that the performance can be increased further by using information from multiple, sequential images, such as optical flow, motion tracking, or simply propagating information about the segmentation (e.g. centre and outline of the blob) to the next frame.

Spatial Perception and Object Localisation

To localise objects in the environment the iCub has to rely solely on a visual system based on stereo vision, similar to human perception. The iCub's cameras are mounted in the head of the robot, therefore the method for localisation must be able to cope with motion and work while the robot is controlling its gaze and upper body for reaching. As part of my thesis work I am investigating how spatial representations can be learned from visual feedback [Leitner et al., 2012d,e].

Developing an approach to perform robust localisation on a real humanoid robot is important. It is a necessary input for current on-line motion planning, reaching, and object manipulation systems. Stereo vision describes the extraction of 3D information out of images and is similar to the biological process of stereopsis in humans. Its basic principle is the comparison of images taken of the same scene from different viewpoints. To obtain a distance measure the relative displacement of a pixel between the two images is used [Hartley and Zisserman, 2000]. The most logical choice of coordinate system, from a planning and control standpoint, is in our case a world reference frame originating at the robot's hip. To transform coordinates from the eyes' frames to the world coordinates an accurate kinematic model of the robot is necessary.
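For the idealised, rectified two-camera case this relation takes the familiar form Z = f·b/d, where Z is the depth of the observed point, f the focal length, b the baseline between the cameras and d the measured disparity. On the iCub, with its moving cameras and imperfect kinematic model, the geometry is considerably less clean, which motivates the learning-based approach described below.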

While these approaches, based on projective geometry, have been proven effective under carefully controlled experimental circumstances, they are not easily transferred to robotics applications. For the iCub platform several approaches have previously been developed. One of these methods is a biologically inspired approach mimicking the human retina [Traver and Bernardino, 2010]. Another, the 'Cartesian controller module', provides basic 3D position estimation functionality [Pattacini, 2011] and gaze control. This module works very well on the simulated iCub; however, on the hardware platform multiple sources of noise and error exist. It therefore generates absolute errors in the 2-4 cm range in the area where the humanoid can reach and manipulate objects.²

We use a learning-based approach to develop spatial perception on the iCub. A Katana robotic arm was used to teach the iCub how to perceive the location of the objects it sees. To do this, the Katana positions an object within the shared workspace and tells the humanoid where it has placed it. While the iCub moves about, it observes the object, and the robot 'learns' how to relate its pose and visual inputs to the object location in Cartesian coordinates.

² Even bigger errors are seen at the edge of the frame or further away from the robot.


We show that satisfactory results can be obtained for localisation even in scenarios where the kinematic model is imprecise or not available. Furthermore, we demonstrate that this task can be accomplished safely. For this task MoBeE was used to prevent collisions between multiple, independently controlled, heterogeneous robots in the same workspace (see Section 3.2).

The learning is done with a feed-forward artificial neural network (ANN), consisting of three layers: one input layer, a hidden layer, and an output layer. This structure of an ANN is sometimes also referred to as a multi-layer perceptron [Fausett, 1994]. Each input, in our case image coordinates and robot pose information, arrives at an input node in the input layer of our ANN. As our network is fully connected, this input is then provided to every node of the hidden layer. Each connection has a weight attached to it. In addition to all inputs, each hidden node also has a connection to a bias node. Similarly, the output value of every hidden node is provided to the output node. These connections are again weighted. The output node is also connected to a bias node. The neurons in the hidden and output layer use a sigmoidal activation function, σ(x) = 1/(1 + e^(−x)), to calculate their respective outputs. The ANNs were trained on the collected dataset using the standard error back-propagation algorithm [Russell and Norvig, 2010]. This method is a generalisation of the delta rule and requires the activation function of the neurons to be differentiable. Back-propagation can be described in two phases: (i) a forward propagation and (ii) a weight update phase. In phase one, inputs from the training dataset are propagated through the network using the current weights of each connection. The error between the network's output and the correct output is then used to calculate a gradient at each connection, which defines how the weights will be updated.
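A compact sketch of such a network (single output node, sigmoid activations, trained by back-propagation) is given below. The layer sizes, learning rate and weight initialisation are illustrative and not the values used in our experiments; in our setting the inputs would be the image coordinates of the object in both cameras together with the robot's pose, and one such network could, for example, be trained per Cartesian coordinate.

    // Minimal three-layer perceptron with sigmoid activations and one
    // back-propagation update step (illustrative sizes and rates only).
    #include <cmath>
    #include <random>
    #include <vector>

    struct MLP {
        int nIn, nHid;
        std::vector<std::vector<double>> wHid;  // [nHid][nIn + 1]  (+1 = bias weight)
        std::vector<double> wOut;               // [nHid + 1]       (+1 = bias weight)

        MLP(int in, int hid) : nIn(in), nHid(hid),
            wHid(hid, std::vector<double>(in + 1)), wOut(hid + 1) {
            std::mt19937 rng(0);
            std::uniform_real_distribution<double> d(-0.5, 0.5);
            for (auto& row : wHid) for (auto& w : row) w = d(rng);
            for (auto& w : wOut) w = d(rng);
        }

        static double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

        // Forward pass: inputs are image coordinates and robot pose values.
        double forward(const std::vector<double>& x, std::vector<double>* hidden = nullptr) const {
            std::vector<double> h(nHid);
            for (int j = 0; j < nHid; ++j) {
                double s = wHid[j][nIn];                       // bias contribution
                for (int i = 0; i < nIn; ++i) s += wHid[j][i] * x[i];
                h[j] = sigmoid(s);
            }
            double s = wOut[nHid];                             // bias contribution
            for (int j = 0; j < nHid; ++j) s += wOut[j] * h[j];
            if (hidden) *hidden = h;
            return sigmoid(s);
        }

        // One back-propagation step (the delta rule generalised to the hidden layer).
        void train(const std::vector<double>& x, double target, double rate) {
            std::vector<double> h;
            double y = forward(x, &h);
            double deltaOut = (y - target) * y * (1.0 - y);    // output-layer error term
            for (int j = 0; j < nHid; ++j) {
                double deltaHid = deltaOut * wOut[j] * h[j] * (1.0 - h[j]);
                for (int i = 0; i < nIn; ++i) wHid[j][i] -= rate * deltaHid * x[i];
                wHid[j][nIn] -= rate * deltaHid;               // hidden bias
                wOut[j]      -= rate * deltaOut * h[j];
            }
            wOut[nHid] -= rate * deltaOut;                     // output bias
        }
    };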

We show that the robot can learn to estimate object locations based on visual perception. Our approach does not need explicit knowledge of a precise kinematic model or camera calibration. It learned to localise objects with comparable accuracy to currently available systems. Furthermore, our approach is computationally less expensive and less dependent on other concurrently running modules (such as image rectification techniques). As the learnt models are lightweight they could easily be incorporated into embedded systems and other robotic platforms. The resulting locations are easily usable with the current operational space controller on the iCub. The results using our first collected dataset show the feasibility of the method to perform full 3D position estimation. The experiments show that the approach can learn to perceive the location of objects, in contrast to currently existing engineered methods.

In addition, our method can learn whether an object visible in both cameras is within the reachable workspace of the robot. While learning spatial perception in Cartesian coordinates is not necessarily the best option (humans, too, are not good at precisely determining distances of perceived objects), it enables the use of the currently existing operational space controllers on the iCub.


3.2 Motion Generation and Learning

Even with great advances in sensing technologies, the motion of the robot still needs to be controlled to perform manipulation tasks. While explicit model-based control is still the prime approach, and works very well when the world's state is known, it has limitations when it comes to uncertainties. Robots used in a human environment cannot expect to estimate the state of their surroundings with certainty. It seems almost inevitable that learning will play an important role in robot manipulation. Directly programming robots by writing code can be tedious, error prone, and inaccessible to non-experts. Through learning, robots may be able to reduce this burden and continue to adapt even after being deployed, creating some level of autonomy. As our platform is a highly complex system, learning and the approaches described above are deemed to be the most suitable. For example, Hart et al. [2006] showed that a developmental approach can be used for a robot to learn to reach and grasp. When attempting to synthesise behaviour on a complex robot like the iCub, the shortcomings of state-of-the-art learning and control theories can be discovered and addressed in future research.

The issue of uncertainty in real-world applications has been addressed with robust closed-loop controllers using sensory feedback. For example, Platt et al. [2006] and the Robonaut group at NASA/JSC, as well as Hart et al. [2006] at UMass Amherst, have explored ways to learn and compose real-time, closed-loop controllers in order to flexibly perform a variety of autonomous manipulation tasks in a robust manner. Edsinger and Kemp [2006] at MIT often use hand-coded behaviour-based controllers that specify tasks in terms of visual or other feedback-driven control.

We have developed a novel Modular Behavioural Environment (MoBeE) framework for humanoids and other complex robots, which integrates elements of planning and control, facilitating the synthesis of autonomous, adaptive behaviours. MoBeE has been used in various experiments with the iCub humanoid, from adaptive roadmap planning by integrating a reactive feedback controller into a roadmap planner [Frank et al., 2012], to teaching the iCub spatial perception [Leitner et al., 2012e]. It allows the integration of suitable learning methods for controlling the robot with existing software tools and the ongoing research in other fields in our lab.

Generally the approach is to start with low-complexity control, such as controlling the robot's gaze, and to slowly include more DOF until the full 7-DOF arms are controlled for reaching. This has been extensively investigated previously, also on the iCub, e.g. using a combination of open- and closed-loop control [Natale et al., 2007; Nori et al., 2007] and motor babbling [Caligiore et al., 2008]. To advance towards the goal of general object manipulation, where a robot can autonomously manipulate any given object within its workspace, it would ideally encode and reuse knowledge in terms of task features that are invariant to the object. Confronted with a novel instance of a specific task, the robot needs to establish appropriate correspondences between objects and actions in its repertoire and their counterparts in the current task.


At IDSIA we pursue this by building task-relevant roadmaps (TRMs), using a new sampling-based optimiser called Natural Gradient Inverse Kinematics (NGIK), based on previous research in our lab [Stollenga et al., 2013]. To build TRMs, NGIK iteratively optimises postures covering task-spaces expressed by arbitrary task-functions, subject to constraints expressed by arbitrary cost-functions, transparently dealing with both hard and soft constraints. TRMs are grown to maximally cover the task-space while minimising costs. Unlike Jacobian methods, our algorithm does not rely on the calculation of gradients, making application of the algorithm much simpler. In our experiments NGIK outperforms recent related sampling algorithms.³

Recently, during the course of the IM-CLeVeR final demo, we showed that MoBeE also enables the robot to 'learn' to plan motions using reinforcement learning (RL) and 'artificial curiosity'. While these fields are usually employed in toy scenarios, e.g. navigation in a simple maze, in our case we embody them in the real high-DOF robotic hardware. A low-level reactive controller is coupled with our RL framework to find 'interesting' states through interaction with the environment. We show that the robot can learn models to represent large regions of the iCub's configuration space [Frank et al., 2013].

3.3 Sensorimotor (Eye-hand) Coordination and Adaptation

Although there exists a rich body of literature in computer vision, path planning, and feedback control, wherein many critical subproblems are addressed individually, most demonstrable behaviours for humanoid robots do not effectively integrate elements from all three disciplines. Consequently, tasks that seem trivial to us humans, such as picking up a specific object in a cluttered environment, remain beyond the state-of-the-art in experimental robotics.

The main aim of my doctoral research is a closer integration of the sensory and motor sides, creating a closely tied action-perception loop. We created interfaces between MoBeE and icVision allowing for a preliminary vision-based localisation of the detected objects in the world model. Currently we are performing very simple operational space control of the robot arm based on the reported location, using the approaches described above. This rather open-loop control is the first step towards visually guided reaching and grasping of objects with the iCub. It enables the robot to detect an object in the visual frame, localise it in Cartesian coordinates and eventually execute a reach, with either the existing operational space controller or our TRM approach. While this approach is already a step forward over current systems, it requires a very accurate calibration of both the cameras and the mechanical links to be successful. Furthermore, some mechanical non-linearities still cannot be taken into account.

Clearly vision is important for a humanoid robot, but it needs to be closely integrated into the control system. Sensorimotor development aims at learning a basic eye-hand coordination. Langdon and Nordin [2001] have shown this on a simple humanoid robot using GP techniques.

³ The demo video won the AAAI Best Student Video Award ('Shakey'): http://youtu.be/N6x2e1Zf_yg


Various methods for learning this sensorimotor mapping have been investigated [Hoffmann et al., 2005; Hülse et al., 2010]. These lead to biologically inspired mappings, yet applying them directly to control the robot is still an issue. Adaptation is needed for precise object manipulation, as highly accurate models of the world and also of the robotic system cannot be assumed in most interesting scenarios. An estimation of the robot kinematics might help in generating more precise motions. So far no module exists to estimate the kinematics of the iCub; this is partly due to the openly available CAD models and the thorough calibration procedures that need to be applied regularly. For other robotic platforms machine learning has been used to estimate the kinematic model, e.g. Gloye et al. [2005] used visual feedback to learn the model of a holonomic wheeled robot and Bongard et al. [2006] used low-dimensional sensory feedback to learn the model of a legged robot.

Currently I am looking at ways to close the control loop on complex, high-DOF humanoid robots. Visual servoing is a commonly used approach to closed-loop vision-based control. It refers to the use of computer vision data to control the motion of a robot [Chaumette and Hutchinson, 2006, 2007]. It relies on techniques from image processing, computer vision, and control theory. The visual information usually comes from a camera mounted on a robot manipulator or on a mobile robot. Various configurations of the robot and camera(s) are possible, the most common in the literature being the eye-in-hand case. Visual servoing (VS) has been investigated for use with human-like robots, but there is as yet no publication (known to the author) that uses a full humanoid setup, as in our research platform the iCub. The issue here is that it is neither an eye-in-hand nor an eye-to-hand system: the camera is on the robot, but only some of the reaching movements imply a motion of the head as well (this only happens when the hip is moved). Generally there are two separate approaches: position-based VS (PBVS) and image-based VS (IBVS). There is no definitive answer as to which one is better; PBVS tends to be used more in setups with a 3D sensor (e.g. LIDAR, Kinect, etc.) whereas IBVS seems to be preferable when using cameras, as in our case.
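For reference, the classic image-based control law from Chaumette and Hutchinson [2006] computes a commanded velocity from the feature error e(t) = s(t) − s*, where s are the measured image features and s* their desired values:

    v_c = −λ L⁺ e

with λ a positive gain and L⁺ the pseudo-inverse of an (approximated) interaction matrix relating feature motion to camera motion. On the iCub this camera (or end-effector) velocity would still have to be mapped to joint velocities of the arm, and possibly the head, which is part of the integration work outlined below.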

In the next few months I will add (most likely) an image-based VS scheme to our system. With our current setup we should be able to measure the error between the hand (which we can filter) and the object we want to pick up, similar to previous approaches on less complex robots or with the help of external sensors [Hager et al., 1995]. More experiments will be added to show the possibility of 'forcing' the robot arm in operational (or image) space to reduce this error, hopefully leading to pick-and-place actions for a variety of objects at random locations on the table.

Chapter 4

Timeline and Schedule for Future Work

The PhD regulations at USI state that the dissertation proposal shall also describe 'a timeline of the work performed since the beginning of the PhD, and a schedule for conducting and completing the work'. The following sections aim to do exactly that.

4.1 Timeline of Work Performed So Far

I started working at the Dalle Molle Institute for AI (IDSIA) Robotics Lab in Oct 2010. After applying to the PhD programme of the Faculty of Informatics at the Università della Svizzera Italiana (USI), I officially became a PhD student in Feb 2011. The research work at IDSIA is project driven. The EU-funded STIFF research project aimed to enhance the biomorphic agility of robot arms and hands through variable stiffness and elasticity. The focus of my work was on using ML techniques to model data from human trials, in collaboration with J. McIntyre's lab at UPD, Paris. The tasks were mainly focussed on reaching and motions along rigid surfaces with variable 'stiffness' applied by the human subject. The project finished in early 2012. The next project I worked on was IM-CLeVeR. It aims to develop a new methodology for designing robot controllers that can: "(1) cumulatively learn new, efficient skills through autonomous development based on intrinsic motivations, and (2) reuse such skills for accomplishing multiple, complex, and externally-assigned tasks." [Baldassare et al., 2009] My work focusses on developing and implementing a computer vision framework for object detection and localisation on the iCub. I started investigating how object manipulation skills can be learned and how vision can help in those situations. This project is currently finishing.

In 2013 work started on the WAY project, in which, together with Scuola Superiore Sant'Anna in Pisa, we aim to develop non-invasive methods that allow bidirectional links between a hand assistive device and a patient. I apply machine learning techniques to decode multilevel biosignals (mainly EMG) for controlling a hand prosthesis.


PhD Requirements at USI

Teaching I have had the great opportunity to serve as teaching assistant to Dr. Alexander Förster for his course on “Systems Programming” (in the BSc track) three times already.

Breadth of Knowledge USI requires students to pass at least 12 ECTS points of courses in multiple disciplines of informatics. I attended the following:

• Introduction to Doctoral Studies (mandatory), 2 ECTS

• Intelligent Systems (cross listed from MSc programme), 4 ECTS

• Research Policy and Grant Proposal Writing, 3 ECTS

• Innovation and Patents, 2 ECTS

• Gate-Level Hardware Design and Arithmetics, 2 ECTS

In addition I attended three summer schools: the EUCog Summer School 'Neural dynamics approach to cognitive robotics' (2011), the iCub Summer School 'Veni, Vidi, Vici' (2012) and the IEEE Summer School 'Robotic Vision and Applications' (2012). Together these should satisfy the 'breadth of knowledge' requirement.

Scientific/Academic Work

During my PhD so far, especially thanks to a fruitful collaboration with Dr. Harding, I have been able to author various papers (most of them as first author), including one book chapter [Harding et al., 2013], three journal publications and several papers at highly-ranked conferences in the fields of robotics (IROS), AI/ML (CEC, IJCNN), and developmental robotics (ICDL/EpiRob). Further submissions to various robotics journals are planned. I have also reviewed submissions for conferences and journals.

4.2 Schedule for Completion of the Dissertation

A rough plan for finishing my PhD project includes the following steps:

• closer sensorimotor integration of perception and actions [1-2 months]

• implementation of selected vision-based manipulation methods (and preliminary experiments) on the iCub [2-3 months]

• finalising the implementation of visual reaching and grasping on the iCub and gathering the required experimental results [2-3 months]

• final revisions and thesis write-up [1-2 months]

Bibliography

R. Ambrose, B. Wilcox, B. Reed, L. Matthies, D. Lavery, and D. Korsmeyer. Robotics, tele-robotics, and autonomous systems roadmap. Technical report, NASA, 2012. URL http://www.nasa.gov/pdf/501622main_TA04-ID_rev6b_NRC_wTASR.pdf.

M. Asada, K.F. MacDorman, H. Ishiguro, and Y. Kuniyoshi. Cognitive developmental robotics as a new paradigm for the design of humanoid robots. Robotics and Autonomous Systems, 37(2):185–193, 2001.

M. Asada, K. Hosoda, Y. Kuniyoshi, H. Ishiguro, T. Inui, Y. Yoshikawa, M. Ogino, and C. Yoshida. Cognitive developmental robotics: A survey. IEEE Trans. Autonomous Mental Development, 1:12–34, 2009.

G. Baldassare, M. Mirolli, F. Mannella, D. Caligiore, E. Visalberghi, F. Natale, V. Truppa, G. Sabbatini, E. Guglielmelli, F. Keller, D. Campolo, P. Redgrave, K. Gurney, T. Stafford, J. Triesch, C. Weber, C. Rothkopf, U. Nehmzow, J. Condell, M. Siddique, M. Lee, M. Huelse, J. Schmidhuber, F. Gomez, A. Förster, J. Togelius, and A. Barto. The IM-CLeVeR project: Intrinsically motivated cumulative learning versatile robots. In Proc. of the International Conference on Epigenetic Robotics: Modeling Cognitive Development in Robotic Systems, 2009.

N.E. Berthier, R.K. Clifton, V. Gullapalli, D.D. McCall, and D.J. Robin. Visual information and object size in the control of reaching. Journal of Motor Behavior, 28(3):187–197, 1996.

W. Bluethmann, R. Ambrose, M. Diftler, S. Askew, E. Huber, M. Goza, F. Rehnmark, C. Lovchik, and D. Magruder. Robonaut: A robot designed to work with humans in space. Autonomous Robots, 14(2):179–197, 2003.

J. Bongard, V. Zykov, and H. Lipson. Resilient machines through continuous self-modeling. Science, 314(5802):1118–1121, 2006.

G. Bradski. The OpenCV Library. Dr. Dobb’s Journal of Software Tools, 2000.

R.A. Brooks. Cambrian intelligence: the early history of the new AI. The MIT Press, 1999.


E. Brown, N. Rodenberg, J. Amend, A. Mozeika, E. Steltz, M.R. Zakin, H. Lipson, and H.M. Jaeger. Universal robotic gripper based on the jamming of granular material. Proceedings of the National Academy of Sciences (PNAS), 107(44):18809–18814, 2010.

W. Burgard, M. Moors, C. Stachniss, and F.E. Schneider. Coordinated multi-robot exploration. IEEE Trans. Robotics, 21(3):376–386, 2005.

D. Caligiore, T. Ferrauto, D. Parisi, N. Accornero, M. Capozza, and G. Baldassarre. Using motor babbling and Hebb rules for modeling the development of reaching with obstacles and grasping. In Proc. of the International Conference on Cognitive Systems (CogSys), 2008.

A. Cangelosi, T. Belpaeme, G. Sandini, G. Metta, L. Fadiga, G. Sagerer, K. Rohlfing, B. Wrede, S. Nolfi, D. Parisi, C. Nehaniv, K. Dautenhahn, J. Saunders, K. Fischer, J. Tani, and D. Roy. The ITALK project: Integration and transfer of action and language knowledge in robots. In Proc. of the International Conference on Human Robot Interaction (HRI), 2008.

F. Chaumette and S. Hutchinson. Visual servo control, Part I: Basic approaches. IEEE Robotics & Automation Magazine, 13(4):82–90, 2006.

F. Chaumette and S. Hutchinson. Visual servo control, Part II: Advanced approaches. IEEE Robotics & Automation Magazine, 14(1):109–118, 2007.

A. Chella, M. Frixione, and S. Gaglio. A cognitive architecture for robot self-consciousness. Artificial Intelligence in Medicine, 44(2):147–154, 2008.

N.G. Cheng, M.B. Lobovsky, S.J. Keating, A.M. Setapen, K.I. Gero, A.E. Hosoi, and K.D. Iagnemma. Design and analysis of a robust, low-cost, highly articulated manipulator enabled by jamming of granular media. In Proc. of the IEEE International Conference on Robotics and Automation (ICRA), pages 4328–4333, 2012.

C. Ciliberto, F. Smeraldi, L. Natale, and G. Metta. Online multiple instance learning applied to hand detection in a humanoid robot. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2011.

D. Cliff, P. Husbands, and I. Harvey. Explorations in evolutionary robotics. Adaptive Behavior, 2(1):73–110, 1993.

P.I. Corke. Robotics, Vision and Control, volume 73 of Springer Tracts in Advanced Robotics. Springer, 2011. ISBN 978-3-642-20143-1.

B. Dachwald. Optimization of interplanetary solar sailcraft trajectories using evolutionary neurocontrol. Journal of Guidance, Control, and Dynamics, 27(1):66–72, 2004.


M.A. Diftler, J.S. Mehling, M.E. Abdallah, N.A. Radford, L.B. Bridgwater, A.M. Sanders, R.S. Askew, D.M. Linn, J.D. Yamokoski, F.A. Permenter, B.K. Hargrave, R. Piatt, R.T. Savely, and R.O. Ambrose. Robonaut 2 - the first humanoid robot in space. In Proc. of the IEEE Int'l. Conference on Robotics and Automation (ICRA), 2011.

S. Doncieux, J.B. Mouret, N. Bredeche, and V. Padois. Evolutionary robotics: Exploring new horizons. New Horizons in Evolutionary Robotics, pages 3–25, 2011.

A. Edsinger and C.C. Kemp. Manipulation in human environments. In Proc. of the IEEE International Conference on Humanoid Robots, pages 102–109, 2006.

S.R. Fanello, C. Ciliberto, L. Natale, and G. Metta. Weakly supervised strategies for natural object recognition in robotics. In Proc. of the IEEE Int'l. Conference on Robotics and Automation (ICRA), 2013.

L.V. Fausett. Fundamentals of neural networks: architectures, algorithms, and applications. Prentice-Hall, 1994. ISBN 978-0133341867.

H. Forssberg, A.C. Eliasson, H. Kinoshita, R.S. Johansson, and G. Westling. Development of human precision grip I: basic coordination of force. Experimental Brain Research, 85(2):451–457, 1991.

M. Frank, A. Förster, and J. Schmidhuber. Reflexive Collision Response with Virtual Skin - Roadmap Planning Meets Reinforcement Learning. In Proc. of the International Conference on Agents and Artificial Intelligence (ICAART), 2012.

M. Frank, J. Leitner, M. Stollenga, A. Förster, and J. Schmidhuber. Learning to plan motions with artificial curiosity. Frontiers in Neurorobotics, 2013. Accepted.

GeRT. EU Project Consortium: Generalizing Robot Manipulation Tasks. http://www.gert-project.eu/project/the-vision/, February 2012.

A. Gloye, F. Wiesel, O. Tenchio, and M. Simon. Reinforcing the Driving Quality of Soccer Playing Robots by Anticipation. IT - Information Technology, 47(5), 2005.

R.C. Gonzalez and R.E. Woods. Digital Image Processing. Prentice Hall Press, 2002. ISBN 0-201-18075-8.

I. Gori, S.R. Fanello, F. Odone, and G. Metta. A compositional approach for 3d arm-hand action recognition. In Proc. of the IEEE Workshop on Robot Vision (WoRV), 2013.

G.D. Hager, W.-C. Chang, and A.S. Morse. Robot hand-eye coordination based on stereo vision. IEEE Control Systems, 15(1):30–39, 1995.

S. Harding, J. Leitner, and J. Schmidhuber. Cartesian genetic programming for image processing. In R. Riolo, E. Vladislavleva, M.D. Ritchie, and J.H. Moore, editors, Genetic Programming Theory and Practice X, Genetic and Evolutionary Computation, pages 31–44. Springer New York, 2013. ISBN 978-1-4614-6845-5.

S. Hart, S. Ou, J. Sweeney, and R. Grupen. A framework for learning declarative structure. In Proc. of the RSS Workshop: Manipulation in Human Environments, 2006.

R. Hartley and A. Zisserman. Multiple view geometry in computer vision. Cambridge University Press, 2000. ISBN 0-521-54051-8.

I. Harvey, P. Husbands, D. Cliff, A. Thompson, and N. Jakobi. Evolutionary robotics: the Sussex approach. Robotics and Autonomous Systems, 20(2):205–224, 1997.

P. Hebert, J.W. Burdick, T. Howard, N. Hudson, and J. Ma. Action inference: The next best touch. In RSS Workshop on Mobile Manipulation, 2012.

H. Hoffmann, W. Schenck, and R. Möller. Learning visuomotor transformations for gaze-control and grasping. Biological Cybernetics, 93:119–130, 2005. ISSN 0340-1200.

B.K. Horn. Robot Vision. MIT Press, 1986.

A. Hornberg. Handbook of machine vision. Wiley, 2007. ISBN 978-3527405848.

N. Hudson, T. Howard, J. Ma, A. Jain, M. Bajracharya, S. Myint, C. Kuo, L. Matthies, P. Backes, P. Hebert, T. Fuchs, and J. Burdick. End-to-end dexterous manipulation with deliberate interactive estimation. In Proc. of the IEEE International Conference on Robotics and Automation (ICRA), 2012.

M. Hülse, S. McBride, J. Law, and M. Lee. Integration of active vision and reaching from a developmental robotics perspective. IEEE Trans. Autonomous Mental Development, 2(4):355–367, 2010.

L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Pattern Anal. Mach. Intell., 20(11):1254–1259, 1998.

M. Jeannerod. The cognitive neuroscience of action. Blackwell Publishing, 1997.

C.C. Kemp, A. Edsinger, and E. Torres-Jara. Challenges for robot manipulation in human environments [grand challenges of robotics]. IEEE Robotics & Automation Magazine, 14(1):20–29, 2007.

G. Konidaris. Designing Intelligent Robots: Reintegrating AI II, 2013. URL http://people.csail.mit.edu/gdk/dir2/.

D. Kragic and M. Vincze. Vision for robotics. Foundations and Trends in Robotics, 1(1):1–78, 2009.

B.J. Kuipers, P. Beeson, J. Modayil, and J. Provost. Bootstrap learning of foundational representations. Connection Science, 18(2):145–158, 2006.


Y. Kusuda. Toyota’s violin-playing robot. Industrial Robot: An International Journal, 35(6):504–506, 2008.

W.B. Langdon and P. Nordin. Evolving Hand-Eye Coordination for a Humanoid Robot with Machine Code Genetic Programming. In Proc. of the European Conference on Genetic Programming (EuroGP), 2001.

J. Leitner, C. Ampatzis, and D. Izzo. Evolving ANNs for spacecraft rendezvous and docking. In Proc. of the International Symposium on Artificial Intelligence, Robotics and Automation in Space (i-SAIRAS), 2010.

J. Leitner, P. Chandrashekhariah, S. Harding, M. Frank, G. Spina, A. Förster, J. Triesch, and J. Schmidhuber. Autonomous learning of robust visual object detection and identification on a humanoid. In Proc. of the IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL), 2012a.

J. Leitner, S. Harding, A. Förster, and J. Schmidhuber. Mars terrain image classification using cartesian genetic programming. In Proc. of the International Symposium on Artificial Intelligence, Robotics and Automation in Space (i-SAIRAS), 2012b.

J. Leitner, S. Harding, M. Frank, A. Förster, and J. Schmidhuber. icVision: A Modular Vision System for Cognitive Robotics Research. In Proc. of the International Conference on Cognitive Systems (CogSys), 2012c.

J. Leitner, S. Harding, M. Frank, A. Förster, and J. Schmidhuber. Learning spatial object localisation from vision on a humanoid robot. International Journal of Advanced Robotic Systems (ARS), 9, 2012d.

J. Leitner, S. Harding, M. Frank, A. Förster, and J. Schmidhuber. Transferring spatial perception between robots operating in a shared workspace. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2012e.

J. Leitner, S. Harding, P. Chandrashekhariah, M. Frank, A. Förster, J. Triesch, and J. Schmidhuber. Learning visual object detection and localisation using icVision. Biologically Inspired Cognitive Architectures, 5:29–41, 2013a. ISSN 2212-683X.

J. Leitner, S. Harding, M. Frank, A. Förster, and J. Schmidhuber. An Integrated, Modular Framework for Computer Vision and Cognitive Robotics Research (icVision). In A. Chella, R. Pirrone, R. Sorbello, and K.R. Jóhannsdóttir, editors, Biologically Inspired Cognitive Architectures 2012, volume 196 of Advances in Intelligent Systems and Computing, pages 205–210. Springer Berlin Heidelberg, 2013b. ISBN 978-3-642-34274-5.

J. Leitner, S. Harding, M. Frank, A. Förster, and J. Schmidhuber. Humanoid learns to detect its own hands. In Proc. of the IEEE Conference on Evolutionary Computation (CEC), pages 1411–1418, 2013c.


S.E. Levinson, A.F. Silver, and L.A. Wendt. Vision based balancing tasks for the iCub platform: A case study for learning external dynamics. In Workshop on Open Source Robotics: iCub & Friends, IEEE Int'l. Conf. on Humanoid Robotics, 2010.

M. Lungarella, G. Metta, R. Pfeifer, and G. Sandini. Developmental robotics: a survey. Connection Science, 15(4):151–190, 2003.

J. Maitin-Shepard, M. Cusumano-Towner, J. Lei, and P. Abbeel. Cloth grasp point detection based on multiple-view geometric cues with application to robotic towel folding. In Proc. of the IEEE International Conference on Robotics and Automation (ICRA), pages 2308–2315, 2010.

D. Marr. Vision: A Computational Approach. Freeman & Co., San Francisco, 1982. ISBN 0-7167-1567-8.

M.E. McCarty, R.K. Clifton, D.H. Ashmead, P. Lee, and N. Goubet. How infants use vision for grasping objects. Child Development, 72(4):973–987, 2001.

L. Meeden and D. Blank. Introduction to developmental robotics. Connection Science, 18(2):93–96, 2006.

A.N. Meltzoff. Infant imitation after a 1-week delay: Long-term memory for novel acts and multiple stimuli. Developmental Psychology, 24(4):470, 1988.

G. Metta, L. Natale, F. Nori, G. Sandini, D. Vernon, L. Fadiga, C. von Hofsten, K. Rosander, M. Lopes, J. Santos-Victor, A. Bernardino, and L. Montesano. The iCub humanoid robot: An open-systems platform for research in cognitive development. Neural Networks, 23(8-9):1125–1134, October 2010.

J.F. Miller. An empirical study of the efficiency of learning boolean functions using a cartesian genetic programming approach. In Proc. of the Genetic and Evolutionary Computation Conference (GECCO), pages 1135–1142, 1999.

J.F. Miller. Cartesian genetic programming. Cartesian Genetic Programming, pages 17–34, 2011.

L. Natale, F. Nori, G. Sandini, and G. Metta. Learning precise 3d reaching in a humanoid robot. In Proc. of the International Conference on Development and Learning (ICDL), pages 324–329, 2007.

N. Nilsson. A mobile automaton: An application of artificial intelligence techniques. In Proc. of the International Joint Conference on Artificial Intelligence (IJCAI), 1969.

N.J. Nilsson. Shakey the robot. Technical report, DTIC Document, 1984.

S. Nolfi and D. Floreano. Evolutionary robotics: The biology, intelligence, and technology of self-organizing machines. MIT Press, 2000. ISBN 978-0262640565.


S. Nolfi, D. Floreano, O. Miglino, and F. Mondada. How to evolve autonomous robots: Different approaches in evolutionary robotics. In Artificial Life IV, pages 190–197. MIT Press, 1994.

F. Nori, L. Natale, G. Sandini, and G. Metta. Autonomous learning of 3d reaching in a humanoid robot. In Proc. of the International Conference on Intelligent Robots and Systems (IROS), pages 1142–1147, 2007.

C. Ott, O. Eiberger, J. Englsberger, M.A. Roa, and A. Albu-Schäffer. Hardware and control concept for an experimental bipedal robot with joint torque sensors. Journal of the Robotics Society of Japan, 30(4):378–382, 2012.

E. Oztop, N.S. Bradley, and M.A. Arbib. Infant grasp learning: a computational model. Experimental Brain Research, 158(4):480–503, 2004.

U. Pattacini. Modular Cartesian Controllers for Humanoid Robots: Design and Implementation on the iCub. PhD thesis, Italian Institute of Technology, Genova, 2011.

J. Peters, S. Vijayakumar, and S. Schaal. Reinforcement learning for humanoid robotics. In Proc. of the International Conference on Humanoid Robots, 2003.

R. Pfeifer, J. Bongard, and S. Grand. How the body shapes the way we think: a new view of intelligence. The MIT Press, 2007. ISBN 0262162393.

R. Piatt, R. Burridge, M. Diftler, J. Graf, M. Goza, E. Huber, and O. Brock. Humanoid mobile manipulation using controller refinement. In Proc. of the RSS Workshop: Manipulation in Human Environments, pages 94–101, 2006.

M.I. Posner. Foundations of cognitive science. The MIT Press, 1989.

E. Rosten, R. Porter, and T. Drummond. Faster and better: A machine learning approach to corner detection. IEEE Trans. Pattern Anal. Mach. Intell., 32(1):105–119, 2010.

S.J. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 3rd edition, 2010.

A. Sadikov, M. Možina, M. Guid, J. Krivec, and I. Bratko. Automated chess tutor. In H.J. Herik, P. Ciancarini, and H.H.L.M. Donkers, editors, Computers and Games, volume 4630 of Lecture Notes in Computer Science, pages 13–25. Springer Berlin Heidelberg, 2007. ISBN 978-3-540-75537-1.

Y. Sakagami, R. Watanabe, C. Aoyama, S. Matsunaga, N. Higaki, and K. Fujimura. The intelligent ASIMO: System overview and integration. In Proc. of the International Conference on Intelligent Robots and Systems (IROS), 2002.

A. Saxena, J. Driemeyer, and A.Y. Ng. Robotic grasping of novel objects using vision. The International Journal of Robotics Research (IJRR), 27(2):157, 2008.


S. Schaal. Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences, 3(6):233–242, 1999.

M. Shanahan. A cognitive architecture that combines internal simulation with a global workspace. Consciousness and Cognition, 15(2):433–449, 2006.

G. Sicard, C. Salaun, S. Ivaldi, V. Padois, and O. Sigaud. Learning the velocity kinematics of iCub for model-based control: XCSF versus LWPR. In Proc. of the IEEE International Conference on Humanoid Robots, 2011.

J. Solis and A. Takanishi. Wind instrument playing humanoid robots. In J. Solis and K. Ng, editors, Musical Robots and Interactive Multimodal Systems, volume 74 of Springer Tracts in Advanced Robotics, pages 195–213. Springer Berlin Heidelberg, 2011. ISBN 978-3-642-22290-0.

M. Stollenga, L. Pape, M. Frank, J. Leitner, A. Förster, and J. Schmidhuber. Task-relevant roadmaps: A framework for humanoid motion planning. In Proc. of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2013.

R.S. Sutton and A.G. Barto. Reinforcement learning: An introduction. MIT Press, 1998. ISBN 978-0262193986.

S. Takagi. Toyota partner robots. Journal of the Robotics Society of Japan, 24(2):62, 2006.

THE. EU Project Consortium: The Hand Embodied. http://www.thehandembodied.eu/, January 2012.

V.J. Traver and A. Bernardino. A review of log-polar imaging for visual perception in robotics. Robotics and Autonomous Systems, 58:378–398, 2010.

N.G. Tsagarakis, G. Metta, G. Sandini, D. Vernon, R. Beira, F. Becchi, L. Righetti, J. Santos-Victor, A.J. Ijspeert, M.C. Carrozza, and D.G. Caldwell. iCub: the design and realization of an open humanoid platform for cognitive and neuroscience research. Advanced Robotics, 21:1151–1175, 2007.

D. Vernon, G. Metta, and G. Sandini. The iCub Cognitive Architecture: Interactive Development in a Humanoid Robot. In Proc. of the International Conference on Development and Learning (ICDL), 2007a.

D. Vernon, G. Metta, and G. Sandini. A survey of artificial cognitive systems: Implications for the autonomous development of mental capabilities in computational agents. IEEE Trans. Evolutionary Computation, 11(2):151–180, 2007b.

WAY. EU Project Consortium: Wearable Interfaces for Hand Function Recovery. http://www.wayproject.eu/, January 2012.


J. Weng. Developmental robotics: Theory and experiments. International Journal of Humanoid Robotics, 1(2):199–236, 2004.

L. Wiskott, J. Fellous, N. Krüger, and C. v.d. Malsburg. Face recognition by elastic bunch graph matching. IEEE Trans. Pattern Anal. Mach. Intell., pages 775–779, 1997.

D.M. Wolpert, Z. Ghahramani, and J.R. Flanagan. Perspectives and problems in motor learning. Trends in Cognitive Sciences, 5(11):487–494, 2001.

J. Wyatt and N. Hawes. Multiple workspaces as an architecture for cognition. In A.V. Samsonovich, editor, AAAI Fall Symposium on Biologically Inspired Cognitive Architectures. The AAAI Press, 2008.


Appendix A: icVision Architecture

Figure 1. The icVision architecture. It consists of loosely-coupled modules to provide a simple framework for the development of computer vision solutions and the integration of the research directly on the iCub hardware.

icVision is a distributed framework for running robot vision modules (see Figure 1). The 3D localisation proceeds as follows. First the camera images are acquired from the hardware via YARP. The images are converted into grayscale, split into RGB/HSV channels, and distributed to all active icVision filters. Each filter then processes the received images using OpenCV functions; the output is a binary image segmenting the object to be localised. A blob detection algorithm is run on these binary images to find the (centre) location of the detected object in the image frame. The positions of the object in both the right and left camera images are sent to the 3D localisation module, where, together with the robot's pose, a 3D location estimate is generated. The last step places the object in the existing world model (see Figure 2).
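To make this dataflow concrete, the Python/OpenCV sketch below mirrors the filter and blob-detection steps described above. It is illustrative only: run_filter is a hand-written colour threshold standing in for the learned icVision filters, estimate_3d is a placeholder for the 3D localisation module (which additionally uses the robot's pose), and the YARP image acquisition is omitted.

import cv2

def run_filter(bgr_image):
    # Stand-in for one icVision filter: a simple colour threshold returning a
    # binary mask that segments the object (the real filters are learned).
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    return cv2.inRange(hsv, (0, 120, 70), (10, 255, 255))

def blob_centre(binary_mask):
    # Centre (u, v) of the largest blob in the binary mask, or None if the
    # mask is empty. Assumes the OpenCV >= 4 findContours signature.
    contours, _ = cv2.findContours(binary_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    m = cv2.moments(max(contours, key=cv2.contourArea))
    if m["m00"] == 0:
        return None
    return (m["m10"] / m["m00"], m["m01"] / m["m00"])

def localise_object(left_bgr, right_bgr, estimate_3d):
    # Pipeline sketch: filter both camera images, find the blob centres and
    # pass the pixel coordinates to the (hypothetical) 3D localisation module.
    uv_left = blob_centre(run_filter(left_bgr))
    uv_right = blob_centre(run_filter(right_bgr))
    if uv_left is None or uv_right is None:
        return None
    return estimate_3d(uv_left, uv_right)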


Figure 2. The 3D localisation workflow through the icVision framework modules.