
Robotics and Autonomous Systems 60 (2012) 441–451


Robotic grasping and manipulation through human visuomotor learning
Brian Moore a,b,∗, Erhan Oztop c,a,d

a ATR Cognitive Mechanisms Laboratories, Kyoto, Japan
b Laval University, Department of Mechanical Engineering, Québec, Canada
c NICT Biological ICT Group, Kyoto, Japan
d Ozyegin University, Istanbul, Turkey

Article info

Article history: Available online 24 September 2011

Keywords: Body schema; Skill synthesis; Hand control; Robot grasping; Robot manipulation

Abstract

A major goal of robotics research is to develop techniques that allow non-experts to teach robots dexterous skills. In this paper, we report our progress on the development of a framework which exploits human sensorimotor learning capability to address this aim. The idea is to place the human operator in the robot control loop where he/she can intuitively control the robot, and by practice, learn to perform the target task with the robot. Subsequently, by analyzing the robot control obtained by the human, it is possible to design a controller that allows the robot to autonomously perform the task. First, we introduce this framework with the ball-swapping task, where a robot hand has to swap the position of the balls without dropping them, and present new analyses investigating the intrinsic dimension of the ball-swapping skill obtained through this framework. Then, we present new experiments toward obtaining an autonomous grasp controller on an anthropomorphic robot. In the experiments, the operator directly controls the (simulated) robot using visual feedback to achieve robust grasping with the robot. The data collected is then analyzed for inferring the grasping strategy discovered by the human operator. Finally, a method to generalize grasping actions using the collected data is presented, which allows the robot to autonomously generate grasping actions for different orientations of the target object.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Although humans are very skilled at manipulation, and in spite of the recent developments in robotic manipulation to match this skill, developing a robotic system that comes close to human hand dexterity is still elusive. Imitation and learning from observation/demonstration research in robotics aims at removing the burden of robot programming from experts by using human guidance to teach the robot a given skill such as manipulation. The simplest method to transfer a skill to a robot is to directly copy the motor commands of the demonstrator to the robot, the so-called motor tape approach [1], which is effective for open-loop tasks. However, in general, it is not possible to adopt this approach. First, motor commands may not be available to the robot; even if available, the differences between the demonstrator and the robot often render the motor commands useless for the robot. Another approach which is commonly used is the so-called direct teaching [2–4], where the robot behavior is shaped or molded

∗ Correspondence to: Laval University, Department of Mechanical Engineering, Pavillon Adrien-Pouliot, 1065 Avenue de la médecine, Québec, Québec, G1V 0A6, Canada. Tel.: +1 418 656 2131 3209.

E-mail addresses: [email protected], [email protected] (B. Moore), [email protected] (E. Oztop).


by directly moving the joints of the robot to obtain the desired behavior. Direct teaching is not practical for complex tasks that may include non-negligible dynamics. Another alternative is the vision-based imitation approach [5–14], where the motion of a human demonstrator is mapped onto the kinematical structure of a robot. Current vision-based imitation systems may fail to fulfill the precision and robustness requirements of many manipulation tasks, as it is often not possible to accurately extract the finger configuration of the demonstrator through computer vision. To sidestep the problems related to computer vision, sophisticated motion capture systems can be deployed, which give accurate human motion data. However, due to the different dynamical properties of the robot and the human, the success of this approach heavily depends on the expert knowledge that goes into the transformation that takes the human motion and maps it onto the robot.

A new paradigm for obtaining skilled robot behavior is to utilize an intuitive teleoperation system where humans learn to control the robot joints to perform a given task [15]. Once the human manages to control the robot and complete the desired task with it, the control commands produced by the human–robot system during task execution can be used for designing controllers that operate autonomously. This places the initial burden of learning on the human instructor, but allows the robot to ultimately acquire the ability to perform the target skill without human guidance.


In this article, we report our progress on applying this paradigm to manipulation using an anthropomorphic robot hand. As the first task we study a two-ball manipulation task, where the robot is required to swap the so-called Chinese balls without dropping them. We first briefly recap some results of the ball swapping task that we have published in conference proceedings earlier [15,16], and then perform additional analysis. In particular, we present new results concerning the intrinsic dimensionality of the obtained ball swapping skill. As the second task, we focus on grasping and describe the derivation of a reaching and grasping controller using this paradigm. We give the details of the grasping interface we developed and present several results on the human control of the robot and actual robot grasp synthesis.

2. Method

To better exemplify the paradigm, a bit of digression is in order. As soon as we hold a computer mouse, it almost becomes a part of our body; we use it fluently, just as we control our hands. This introspection has a neural basis [17]. In addition, experiments with monkeys have shown that arm and hand representations in the brain are very plastic. For example, as soon as a rake is grabbed with the purpose of reaching objects that are out of reach, the hand representation extends to represent the tool held by the monkey [18,19]. We speculate that by connecting an anthropomorphic hand to a human through an intuitive interface, we would engage such plastic change in the operator's brain so that fluent control of the robotic hand becomes possible. This paradigm sees the adaptability of the human body representation as an effective way of obtaining robot dexterity. Once the human achieves control fluency, the learning software of the robot builds an autonomous controller by monitoring the sensor readings and motor commands generated by the human controller. The proposal has parallels with learning by demonstration and imitation learning [5–14]; however, the bottleneck of these systems, i.e. the differences (in both dynamics and kinematics) between the demonstrator and the learner, is overcome by human visuomotor learning. Therefore, unlike conventional teaching-by-demonstration approaches, we do not expect this to be an easy task for the human teacher; on the contrary, we anticipate long learning periods, which may span weeks, after which the robot learning can take place.

In order for the operator to efficiently learn to control the robot and perform complex tasks with it, the interface between the operator and the robot should be as intuitive as possible. In other words, the motion adaptation of the operator should be natural in terms of his body motion and readily learnable. The number one requirement for such an intuitive interface is probably the repeatability of the robot's response to human motion; this drastically reduces the operator learning time. We all know the frustration of using an erratically behaving computer mouse due to a bad sensor. The number two requirement is the level of anthropomorphism of the controlled robot. The delay in the robot control is also critical, but in practice, we have observed that a 30–60 Hz control rate is sufficient for visually guided tasks.

One main strength of this approach is that it takes advantage of the noise intrinsic to the system. First, the presence of noise forces the user to find a control strategy to perform a given task on the robot, which leads to a robust solution. That is, it leads to a robust robot trajectory which can be used to fulfill the task in the face of a changing environment, such as when interacting with objects having different friction coefficients. Second, since the quality of the measurements (for example using the data glove) greatly depends on the shape and size of the operator (i.e. for the glove, the hand size and shape are very important), different operators might use completely different control strategies, which might lead to the discovery of alternative ways to perform a skill on the robot.

Fig. 1. The real-time human control via visual data capture. Subject finger motions are tracked at 30 Hz, converted into joint angles and sent to the PD controller as desired joint angles.

3. Task 1: Ball swapping

3.1. Human–robot interface

For the ball swapping task, a 16-DOF robot hand (Gifu Hand III, Dainichi Co. Ltd., Japan) was attached to a robotic arm (PA-10 robot, Mitsubishi Heavy Industries), which kept the hand in a close-to-horizontal, palm-facing-up posture. A fixed orientation was chosen and used without change throughout. The task was to move the fingers of the robot hand so that the balls change their positions without falling down. Humans can easily perform this task with their own hand, which often involves palm and thumb articulation.

Human finger motion is captured using an active-marker-based tracking system (Visualeyez, PhoeniX Technologies Inc.), which has a positional accuracy of around 1 mm in 3D. This system comes with a soft glove that allows user-adjustable positioning of the markers (see Fig. 1). Human fingers were tracked at 30 Hz. Due to noise and occlusion problems, a simple joint-angle-to-joint-angle mapping did not produce satisfactory real-time control of the robot hand. Therefore we used inverse kinematics: first, the human fingertip positions were mapped to robot fingertip positions via a calibration matrix that had been obtained earlier through a calibration process. Then, a local inverse kinematics was applied on the robot fingers, yielding the final joint angles that were sent to the low-level robot hand controller as desired joint angles to be tracked. This produced very intuitive control; operators found it natural to manipulate their environment using the robot fingers. The details of this interface are not repeated here; interested readers are referred to [15].
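The calibration and per-finger inverse kinematics are only described at a high level above and in [15]. The sketch below illustrates one way such a pipeline can look, assuming an affine calibration matrix and a damped least-squares IK step with a numerical Jacobian; the toy forward-kinematics function, link lengths and gains are hypothetical placeholders, not the actual Gifu Hand III model.

```python
import numpy as np

def toy_finger_fk(q):
    """Hypothetical 3-joint planar finger: returns the fingertip position in 3D.
    Stands in for the real robot finger kinematics, which are not given here."""
    l1, l2, l3 = 0.05, 0.03, 0.02                      # assumed link lengths [m]
    a1, a12, a123 = q[0], q[0] + q[1], q[0] + q[1] + q[2]
    x = l1 * np.cos(a1) + l2 * np.cos(a12) + l3 * np.cos(a123)
    z = l1 * np.sin(a1) + l2 * np.sin(a12) + l3 * np.sin(a123)
    return np.array([x, 0.0, z])

def numerical_jacobian(fk, q, eps=1e-6):
    """Finite-difference Jacobian of the fingertip position w.r.t. the joint angles."""
    p0 = fk(q)
    J = np.zeros((p0.size, q.size))
    for i in range(q.size):
        dq = np.zeros_like(q)
        dq[i] = eps
        J[:, i] = (fk(q + dq) - p0) / eps
    return J

def map_human_to_robot_tip(p_human, A):
    """Affine calibration: robot-frame fingertip target = A * [x, y, z, 1]."""
    return A @ np.append(p_human, 1.0)

def local_ik_step(fk, q, p_target, damping=1e-3, gain=0.5):
    """One damped least-squares step pulling the robot fingertip toward p_target."""
    e = p_target - fk(q)                               # task-space error
    J = numerical_jacobian(fk, q)
    JJt = J @ J.T + (damping ** 2) * np.eye(3)
    dq = J.T @ np.linalg.solve(JJt, gain * e)
    return q + dq

# Usage sketch: one control cycle for one finger (all values are made up).
A = np.hstack([np.eye(3), np.array([[0.0], [0.0], [0.02]])])   # assumed calibration
q = np.array([0.2, 0.3, 0.1])                                  # current robot joint angles
p_tracked = np.array([0.06, 0.0, 0.03])                        # tracked human fingertip
q_desired = local_ik_step(toy_finger_fk, q, map_human_to_robot_tip(p_tracked, A))
```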

We observed that practising finger movements and observing the robot was sufficient to achieve intuitive human control of the robot fingers as long as the robot response is not slow (>20 Hz). For example, without effort, a human operator could reliably touch or tap visual targets using the robot hand. However, in tactile tasks such as cradling a ball, the subjects had difficulty and required longer training. This was presumably due to the tactile expectation of a ball rolling over one's hand, which was missing.

3.2. Experimental setting

For this study, one subject went through the training procedure. The subject sat next to the robot where he could clearly see the robot fingers and the balls. There were no instructions on how to hold the hand that was used to control the robot except that the markers on the glove had to be visible. When markers were lost, a computer-generated beep warned the subject. With this setup, it


Fig. 2. Frames representing the ball swapping task performed by the Gifu Hand using the skill transferred from the human performance. The numbers in the top left corner of the images indicate the chronological order of the frames.

took one week for the subject (1–2 h per day) to obtain a successful swap of the balls with the robot hand. The key strategy (motor primitive) observed for ball swapping appears to be kicking the ball rather than sliding it over the surface of the fingers (see Fig. 2). This is very different from what humans do when the task is executed in natural conditions. In the natural setting, tactile feedback guides the execution, and the visual component of the task is minimal. One cycle of the human-generated ball swapping trajectory was then stored for analysis and autonomous controller synthesis.

3.3. Autonomous manipulation generation and trajectory analysis

As mentioned earlier, directly copying the motor commands of the demonstrator to the robot is very effective for open-loop tasks, but has the drawback that the motor commands may not be available to the robot and that, even if they are available, they may not be suitable for the robot due to the differences between the demonstrator and the robot. The situation in our case, however, is different, because the correct motor commands for the robot are readily produced by the human operator. In other words, instead of expert robot programming we rely on the subject's sensorimotor learning ability to produce the motor commands on the robot itself, which can be conveniently played back if the task admits open-loop control. If not, then the human performance provides valuable data points that can be used to derive an autonomous feedback controller by using supervised machine learning techniques. Our experiments showed that for the ball swapping task, an open-loop as well as a feedback controller can be derived. The execution speed and smoothness were also improved using insights from the trajectories obtained by the human operator. The details of these can be found in [16]. The feedback controller derivation was based on the successful open-loop performance, as described in detail in [20]. Here, we only mention that the ball positions were tracked using a color tracker and used to define a state (s) together with the robot finger angles. Since, at each instant, we knew the motor command (u) sent to the robot and the state, we could use several successful ball swapping performances to obtain data points of the form (s(t), u(t)). This data set was then learned using a non-linear regression model, effectively giving us the feedback controller u = π(s).
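The feedback controller in [20] is built with structured unsupervised kernel regression, which is not reproduced here. As a rough illustration of the general step, learning u = π(s) from the recorded (s(t), u(t)) pairs, the sketch below fits a generic RBF kernel ridge regressor as a stand-in; the state layout and all hyperparameters are assumptions.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Assumed data layout: each state s(t) stacks the tracked 2D positions of the
# two balls (4 values) and the 12 finger joint angles; u(t) holds the 12
# desired joint angles sent to the hand at the same instant.
rng = np.random.default_rng(0)
S = rng.standard_normal((2000, 16))      # placeholder for the recorded states s(t)
U = rng.standard_normal((2000, 12))      # placeholder for the recorded commands u(t)

# Fit a smooth mapping u = pi(s); an RBF kernel ridge model is a generic
# stand-in for the kernel regression method used in [20].
policy = KernelRidge(kernel="rbf", alpha=1e-2, gamma=0.1)
policy.fit(S, U)

def pi(s):
    """Feedback policy: map the current state to a 12-D joint command."""
    return policy.predict(s.reshape(1, -1))[0]

# One control cycle: read the state, emit the command.
u_now = pi(S[0])
```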

One attractive point of this paradigm is that it allows self-improvement after the initial skill is transferred to the robot. For self-improvement with no explicit instruction as to what to do and when, Reinforcement Learning (RL) is the key technique to use [21]. Dynamic Programming can also be used, but only if the model of the robot and its environment is known in advance, which is often not the case. For example, the backlash in the robot hand used for ball swapping and the noise in ball position sensing make it very difficult to accurately model the dynamics of the (robot + 2 balls) system. RL has been shown to be effective for many robot tasks; however, its effectiveness degrades as the number of DOF of the robotic system increases, due to the increased computational cost. In practice, one can say RL works well for state spaces of dimension less than ten. The obtained ball swapping skill consisted of movements of 12 joints; however, as the fingers had to move dependently to keep the balls from falling down, the intrinsic dimension of the motion must be less than the number of joint angles. In fact, low intrinsic dimension appears to be a fundamental property of human movements [22,23]. Once a low-dimensional representation of the movement is found, one can work with these dimensions rather than the full number of DOFs to improve the ball swapping performance. With this mindset, we performed principal component analysis (PCA) on the ball swapping trajectory obtained by the human subject. With PCA, we also expected that a set of motor primitives adopted by the human operator on the robot for ball swapping might be extracted by inspecting the top principal components. We had 12 trajectories, each corresponding to a single joint movement of the robot hand. The idea is to find N < 12 components such that each component activates multiple joints, and the collective activation of these components allows robust ball swapping. For the PCA analysis, we take the input data matrix as B = [b⃗1; b⃗2; . . . ; b⃗12], a 12 × T matrix whose ith row b⃗i is the mean-subtracted trajectory of a single joint sampled at 30 Hz (i = 1, 2, . . . , 12). Then the covariance matrix (up to a constant factor) is C = B ∗ B′, where B′ denotes the transpose of the matrix B. Let W be the eigenvector matrix of C (columns ordered so that the first column is the eigenvector with the highest eigenvalue); then K = W′ ∗ B maps the trajectories into the eigenvector space and W ∗ K back-projects to the original joint space. We can select the components to include from the eigenvector space using a 0–1 index vector x which selects the rth component by having a 1 in the rth position. With this, a particular low-dimensional representation of B can be obtained as Bred = W ∗ Diag(x) ∗ W′ ∗ B with x set as desired, where Diag is an operator that returns a matrix with vector x as its main diagonal.
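As a concrete illustration of this projection and selective back-projection, a minimal NumPy sketch following the row convention above (B is 12 × T, one mean-subtracted joint trajectory per row) is given below; the trajectory data itself is a random placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 900                                   # e.g. 30 s of motion sampled at 30 Hz
B = rng.standard_normal((12, T))          # placeholder: joint trajectories, one per row
B -= B.mean(axis=1, keepdims=True)        # make each row zero-mean

# Eigen-decomposition of the 12 x 12 covariance C = B * B'.
C = B @ B.T
eigvals, W = np.linalg.eigh(C)            # returned in ascending order
order = np.argsort(eigvals)[::-1]         # re-order: highest eigenvalue first
W = W[:, order]

K = W.T @ B                               # trajectories in the eigenvector space

def reduced(B, W, x):
    """Back-project using only the components selected by the 0-1 vector x."""
    return W @ np.diag(x) @ W.T @ B

x5 = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0], dtype=float)
B_red5 = reduced(B, W, x5)                # 5-D version (sufficient for ball swapping)
B_red1 = reduced(B, W, np.eye(12)[0])     # first component ('kick') in isolation
```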


Fig. 3. The reconstructed joint trajectories, as angles (degrees) vs. time (seconds), using the top 3 (left) and 5 (right) dimensions obtained by PCA are shown in color over the original trajectories, which are drawn in black. Each color indicates a particular finger of the robot, where the top, middle and bottom panels show lateral abduction–adduction, distal, and proximal joint angles, respectively. With 3 dimensions (left panels) the robot cannot swap the balls, although the basic movement pattern is evident. With 5 dimensions (right panels) the robot can successfully swap the balls.

3.4. Results

3.4.1. Autonomous execution
The trajectory obtained could be robustly played back on the robot for more than thirty minutes. The sensitivity to the initial conditions of the balls was small, and many types of disturbances were tolerated and self-corrected with this open-loop control policy. Although only one type of ball was used in the human learning phase, this controller was able to swap balls with different sizes and weights, such as a wooden and a coated metal ball, a coated but larger metal ball, and finally a large but very light Styrofoam ball [15]. Inspection of the joint angle trajectories revealed that the subject used several kicks to roll the ball over the fingers. This insight proved useful to improve the performance by manipulating the trajectory offline. For example, we have shown that the ball swapping performance can be sped up by a factor of 2.5 without losing robustness [16].

In the absence of perturbation, the feedback version of the controller performed essentially the same as the open-loop version, as it modeled the open-loop control. Small perturbations that were not visually noticeable did not affect the control much; however, more critical perturbations or failures were recovered from very intuitively. For example, the robot would repeat its kick to move the ball from the little finger to the index finger when the kick failed (e.g. due to purposeful interference from a human). The details of this feedback controller can be found in [20].

3.4.2. Trajectory analysis
In order to find out the minimum number of dimensions that would sustain robust ball swapping, we systematically increased, starting from 1, the number of dimensions in the reduced-dimensional version of the ball swapping trajectory, and played it back on the robot until we obtained a successful ball swapping. This meant setting x1 = 100000000000, x2 = 110000000000, x3 = 111000000000, x4 = 111100000000, x5 = 111110000000 in Bxi = W ∗ Diag(xi) ∗ W′ ∗ B. We found that the most influential 5 dimensions sufficed for a robust ball swapping performance by the robot. Already with 3 dimensions the basic movement pattern was apparent. Fig. 3 shows the back-projected trajectories together with the original trajectories.

To interpret the specific components found by PCA, we further looked at the robot movements generated by using back-projections of single components. The first two components were intuitive and readily interpretable by a human observer. The first component captured the kick behavior, whereas the second component captured the slow undulation of the fingers in sequence. These are the two key behaviors one can use to describe the robot ball swapping in words: 'the balls are shifted from the index finger toward the little finger and then kicked back onto the index finger'. The notable point here is that the human operator 'discovered' these motor primitives (shifting and kicking) through sensorimotor learning. Fig. 4 shows the back-projection of the first and second components over the actual trajectories.

4. Task 2: Grasping

4.1. Human–robot interface

For the grasping task, we used the same robotic hand (Gifu Hand III, Dainichi Co. Ltd., Japan) and arm (PA-10 robot, Mitsubishi Heavy Industries) as in the ball swapping task (see Fig. 5). However, in this case, the robot arm was also actively controlled by the human, so the grasping task included reaching for grasping and the actual flexion of the fingers around the target object so that a stable grasp is obtained.

For actuating the robot arm based on real-time human movements, an inertial-ultrasonic motion tracking system (IS-600, Intersense Inc.), which provides an adequate control rate, is used. Fig. 5 shows the location of the motion tracking sensors for measuring the position and the orientation of the arm end-effector. For robustness and repeatability, we avoid classical iterative IK methods and use an analytic parametrization of the redundant inverse kinematics [24], which can be used to avoid singularities or obstacles. When the signals from the markers are lost or a jump is detected in the desired motion of the robot arm computed from the motion tracking system, the robot is commanded to stay at its last good position until new good data arrives. For the human operator this means that the robot freezes when an occlusion occurs, and to resume the motion of the robot, the operator has to move his/her arm back to the vicinity of where the robot has frozen.
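A minimal sketch of this hold-last-good-pose logic is given below; the jump threshold and the resume tolerance are assumed values, not the ones used with the IS-600/PA-10 setup.

```python
import numpy as np

class ArmTargetFilter:
    """Hold the last good arm target when tracking data is lost or jumps."""

    def __init__(self, max_jump=0.10, resume_tol=0.05):
        self.max_jump = max_jump        # assumed maximum plausible step [m]
        self.resume_tol = resume_tol    # assumed distance to re-engage after a freeze [m]
        self.last_good = None
        self.frozen = False

    def update(self, measured_pos):
        """Return the arm target to send to the robot for this control cycle."""
        if measured_pos is None:                        # markers lost
            self.frozen = True
            return self.last_good
        measured_pos = np.asarray(measured_pos, float)
        if self.last_good is None:                      # first valid sample
            self.last_good = measured_pos
            return self.last_good
        dist = np.linalg.norm(measured_pos - self.last_good)
        if self.frozen:
            # Stay frozen until the operator moves back near the freeze pose.
            if dist > self.resume_tol:
                return self.last_good
            self.frozen = False
        elif dist > self.max_jump:                      # implausible jump in the data
            self.frozen = True
            return self.last_good
        self.last_good = measured_pos
        return self.last_good
```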

For the robot hand, a fiber-optic based data glove (5DT Data Glove, Fifth Dimension Technologies), which gives us the joint angles of the hand, is used. We chose to use the fiber-optic based system instead of the one utilized in the ball swapping task to eliminate the problems related with occlusion of the visual markers. Mapping of the finger motion of the operator (as measured with the data glove) to the robot finger motion is straightforward since the joint angles of the data glove are directly


Fig. 4. The reconstructed joint trajectories using only the most influential (left) and second most influential (right) dimensions are shown in color over the original trajectories, which are drawn in black. The 'kick' motor primitive can be seen in the proximal and distal joint flexion (after 6000 ms), and the 'undulation' primitive, although to a lesser extent, can be seen in the lateral abduction–adduction and distal flexion movements. (Drawing conventions are the same as in Fig. 3.)

Fig. 5. The robot hand-arm system used in this study (left and center); the motion tracking sensors and the data glove (right).

measured (i.e. they are scaled values of the joint angles between 0 and 1). For most of the robot finger joints (14 out of 16), each measurement of the operator finger joint angle can be directly mapped to the corresponding robot finger joint angle. Since the robot thumb has an additional joint and since the finger flexure is measured differently on the glove than on the robot, minor manual adjustments are required to obtain an intuitive thumb control.1
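The direct glove-to-robot mapping, together with the thumb and fixed-flexure adjustments described in footnote 1, can be summarized in a few lines. The sketch below is purely illustrative: the joint indexing, joint ranges and the fixed value are hypothetical, not the actual 5DT glove or Gifu Hand III layout.

```python
import numpy as np

# Hypothetical layout: 16 robot joints, each with a (min, max) range in radians.
ROBOT_JOINT_RANGES = np.tile(np.array([[0.0, 1.5]]), (16, 1))
THUMB_EXTRA_IDX, THUMB_PREV_IDX = 3, 2     # assumed indices of the extra thumb knuckle
MIDDLE_FLEX_IDX = 8                        # assumed index of the value kept fixed
MIDDLE_FLEX_FIXED = 0.3                    # assumed constant [rad]

def glove_to_robot(glove_values, glove_to_joint):
    """Map normalized glove readings (0..1) to 16 robot joint angles.

    glove_to_joint[i] gives, for robot joint i, the glove channel it copies,
    or -1 for the joints filled in afterwards (extra thumb knuckle, fixed value)."""
    q = np.zeros(16)
    for i, ch in enumerate(glove_to_joint):
        if ch >= 0:
            lo, hi = ROBOT_JOINT_RANGES[i]
            q[i] = lo + np.clip(glove_values[ch], 0.0, 1.0) * (hi - lo)
    q[THUMB_EXTRA_IDX] = q[THUMB_PREV_IDX]   # copy the previous thumb knuckle value
    q[MIDDLE_FLEX_IDX] = MIDDLE_FLEX_FIXED   # one flexure value kept constant
    return q

# Usage with a made-up channel assignment for the 14 directly mapped joints.
glove_to_joint = np.array([0, 1, 2, -1, 3, 4, 5, 6, -1, 7, 8, 9, 10, 11, 12, 13])
q_desired = glove_to_robot(np.random.rand(14), glove_to_joint)
```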

4.2. Experimental setting

For testing and developing the real-time control of the robot hand system, we use PAGSim, the simulator for the robot hand system developed in our lab. PAGSim allows us to reduce the risk of accidents and damage to the robot. For example, any change in the control algorithm is first tested in simulation to ensure safety. As might be expected, the skills obtained via human motor

1 For the additional knuckle at the extremity of the robot thumb, the value is set to the previous thumb knuckle value. Using the data glove, the flexures are measured as relative angles between two consecutive fingers (i.e. 4 measurements). However, to control the robot hand, the flexure of each individual finger is controlled independently, so five values are required to control the flexure of the robot hand. To account for this additional degree of freedom of the robot hand, the value of the middle finger flexure is fixed.

learning are first tested in the simulator before being deployed on the robot for autonomous operation. PAGSim is a dynamics simulation environment that models the hand-arm system and its environment using a commercial-quality open-source physics engine, ODE (Open Dynamics Engine). All parts of the robot and environment are constructed from rigid geometrical objects and articulated joint structures. We use velocity and position modes for controlling the robot hand-arm system, which are also emulated in PAGSim by mimicking the control signal profiles of the robot controller. In addition, in order to provide a seamless execution interface, PAGSim is designed to accept the same UDP network calls which the robot hand-arm system accepts for its operation. The parameters of the simulator such as friction, mass of the objects, and forces on the robot hand and arm are adjusted so that interactions with the objects have similar effects in the real world and in simulation. A version of PAGSim was effective in simulating the learning experience of a mobile robot sufficiently accurately that the deployment of the learned skill to the real robot was possible [25]. Having said this, we do not see simulation as a replacement for the physical world, as many factors have to be overlooked to make the simulation practical. For example, soft contacts are of utmost importance in manipulation tasks, although soft-body contact dynamics is not supported by many of the simulation engines, including ODE. In spite of these issues, we were content that, as a first step, a human – virtual robot (i.e. simulator)


Fig. 6. Object to grasp in the simulator.

interface would produce sufficiently rich interactions so that we could use the proposed paradigm for grasp synthesis.

The task of the operator in the experiment is to perform grasp actions (reaching, grasping and lifting) with PAGSim using his own body (arm and hand), where his/her body controls the motion of the simulated robot arm and hand. The subject is instructed to grasp the virtual mug (see Fig. 6) from its handle in a natural fashion. An important component of the system is therefore the visual feedback provided to the user from the simulator. Although the long-term goal is to immerse the user in a 3D virtual reality environment to make the system more intuitive, we currently only use a normal LCD monitor for this experiment.2 To compensate for the lack of 3D effect, the location of the camera in the simulator is carefully chosen and a real table (with the same height as the table in the simulator) is put in front of the operator. During the learning phase, the user practices grasping on the simulator to get a feeling of the required motion. At first, the user usually cannot grasp the object, or would hesitate and grasp the object only very slowly, often touching the object several times before grasping it successfully. After a few hours of practice with the system, most users feel comfortable manipulating objects in the simulator as if it were an extension of their own body. For users having a small hand or small fingers (a small percentage of the users), the motion of the hand could not be accurately measured with the data glove, making it difficult for the user to directly control the motion of the robot hand.

For this investigation, a simple object representing a large mug with a handle, shown in Fig. 6, is used. A local reference frame is defined on the object and its origin is defined as the object's center. In this paper, the position of the object means the position of the center of the object. In the experiments, the position of the object is fixed, but the object is rotated about its z axis (local reference frame) using a pre-defined set of orientations (20°, 30°, . . . , 90°, 100°) as shown in Fig. 7. After extensive training, where the operator becomes skilled at grasping the virtual object with the simulator, a session of 10 grasp actions for every pre-defined object orientation is performed, in which the motion of the robot (not of the operator) is recorded. For each grasp attempt, a button is pressed to indicate the moment when the grasping of the object takes place and a final key is pressed when the trial is finished. If the position of the object at the end of the trial is above the table, the trial is successful; otherwise, the grasp action is deemed unsuccessful and is discarded.

2 Human operators can grasp real objects using the real robot much more easily than in the virtual case. This is due to the rich 3D information available in the real case. However, for safety and for avoiding wear and tear on the hardware, we chose the virtual environment for long-term training and data collection.

4.3. Autonomous grasp generation and trajectory analysis

For autonomous operation, based on the robot grasping obtained by the human operator, an efficient algorithm to generate reaching and grasping motion for arbitrary object orientations needs to be developed. Since we have multiple successful reference grasping motions, the most robust grasping motion is selected as a representative for each pre-defined orientation. The definition of a robust grasp is not unique, and different criteria could be used depending on the object and task. In this experiment, a grasp is considered robust if it still succeeds in grasping the object when the velocity of the motion is changed and when noise is added to the grasping motion.
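This robustness criterion can be turned into a score by replaying a candidate trajectory under perturbations and counting successes, as sketched below. The simulate_grasp callback to PAGSim is hypothetical (assumed to return True when the object ends up above the table), and the perturbation ranges are assumptions.

```python
import numpy as np

def robustness_score(trajectory, simulate_grasp, n_trials=20,
                     speed_range=(0.7, 1.3), noise_std=0.005, rng=None):
    """Fraction of perturbed replays that still grasp the object.

    trajectory: array of waypoints, shape (n_waypoints, dim)
    simulate_grasp: hypothetical simulator callback (trajectory, speed) -> bool
    """
    rng = np.random.default_rng() if rng is None else rng
    trajectory = np.asarray(trajectory, float)
    successes = 0
    for _ in range(n_trials):
        speed = rng.uniform(*speed_range)                       # replay faster or slower
        noisy = trajectory + rng.normal(0.0, noise_std, trajectory.shape)
        if simulate_grasp(noisy, speed):
            successes += 1
    return successes / n_trials

# The reference grasp for an orientation is then the recorded attempt with the
# highest score, e.g.:
# best = max(recorded_grasps, key=lambda tr: robustness_score(tr, simulate_grasp))
```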

We aim at representing a grasping motion as a set of waypoints describing the wrist position, the hand orientation and the joint values for the fingers in time. We divide a recorded grasp motion into two phases: the reaching trajectory and the grasping trajectory. The grasping trajectory corresponds to the motion of the robot arm and hand in the neighborhood (in this experiment, a cylinder around the mug) of the object to be grasped, while the reaching trajectory corresponds to the motion starting from the initial pose of the robot and ending when it enters the neighborhood of the object. To grasp an object in an arbitrary pose, an algorithm is required to:

1. generate a reaching trajectory based on the actual pose of therobot;

2. synthesize a grasping trajectory based on the robust referencegrasping trajectories recorded during the training phase.

To represent the hand orientation, unit quaternions are used. A unit quaternion can be considered as a point on a 4D hypersphere, and interpolating two quaternions consists of finding a point on the geodesic path between these two points. However, care must be taken since any given orientation corresponds to two antipodal points on the hypersphere. The most straightforward approach is to use linear interpolation between the two unit quaternions and project the result back onto the sphere. This method, however, does not generate evenly-spaced quaternions. To obtain evenly-spaced quaternions, spherical linear interpolation (known as slerp) [26] is used. An efficient algorithm can be found in [27].3 We will denote by q = slerp(q1, q2, t) the spherical linear interpolation of the unit quaternions q1 and q2 with the interpolation parameter t between 0 and 1 (i.e. t = 0 corresponding to q = q1 and t = 1 corresponding to q = q2).

We now consider the position of the wrist expressed relative to the object's center and represented by a pure quaternion (a quaternion with scalar part equal to 0). Interpolation of the position of the arm's end-effector can be performed in a similar way as for the orientation. Note that spherical linear interpolation (slerp) cannot be directly used for the quaternions representing positions, as they are not of unit length. Therefore, we define the function average slerp (aslerp) as

aslerp(p1, p2, t) = s · slerp(p1/∥p1∥, p2/∥p2∥, t)    (1)

where s is the scaling factor defined as

s = (1 − t) ∥p1∥ + t ∥p2∥    (2)

where p1 and p2 represent the positions as pure quaternions, t is the interpolation parameter and ∥p∥ is the quaternion norm defined as

∥a + bi + cj + dk∥ = √(a² + b² + c² + d²).    (3)
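A compact implementation of slerp and of the aslerp of Eqs. (1)–(3) is sketched below; it handles unit quaternions (4-vectors) and normalized position vectors with the same routine. This is only an illustration, not the QTFM code mentioned in the footnote.

```python
import numpy as np

def slerp(v1, v2, t, eps=1e-8):
    """Spherical linear interpolation between two unit vectors (any dimension).

    For unit quaternions, the sign of v2 is flipped when needed so that the
    interpolation follows the shorter of the two antipodal paths."""
    v1, v2 = np.asarray(v1, float), np.asarray(v2, float)
    dot = np.dot(v1, v2)
    if v1.shape[0] == 4 and dot < 0.0:      # antipodal quaternions: same rotation
        v2, dot = -v2, -dot
    dot = np.clip(dot, -1.0, 1.0)
    omega = np.arccos(dot)                  # angle between the two points on the sphere
    if omega < eps:                         # nearly parallel: fall back to lerp
        out = (1.0 - t) * v1 + t * v2
        return out / np.linalg.norm(out)
    return (np.sin((1.0 - t) * omega) * v1 + np.sin(t * omega) * v2) / np.sin(omega)

def aslerp(p1, p2, t):
    """Average slerp of Eqs. (1)-(2): interpolate the direction on the sphere
    and scale it by the linearly interpolated norms."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    n1, n2 = np.linalg.norm(p1), np.linalg.norm(p2)
    s = (1.0 - t) * n1 + t * n2
    return s * slerp(p1 / n1, p2 / n2, t)

# Example: halfway between two wrist positions expressed relative to the object.
p_mid = aslerp([0.40, -0.10, 0.25], [0.30, 0.15, 0.25], 0.5)
```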

3 We use the Quaternion toolbox for Matlab (QTFM) available online at http://sourceforge.net/projects/qtfm/.


Fig. 7. Top view of different orientations for the object (θ = 0°, 80°).

Table 1
This table contains, for every object orientation in the table plane, the average position of the wrist and hand orientation (in the table plane) at pre-grasp and grasp (see Fig. 10). These results are also displayed in Fig. 11 using the symbols listed here. The number of successful grasps achieved by the operator (out of ten attempts) is also indicated.

Object orient. | Pre-grasp distance from object center (dPG) | Pre-grasp hand orient. (ηPG) | Grasp distance from object center (dG) | Grasp hand orient. (ηG) | Number of successful grasps | Symbol
20°  | 0.41 | −71.5° | 0.378 | −75.3° | 5/10  | ⋆
30°  | 0.41 | −69.7° | 0.371 | −73.0° | 7/10  | ◦
40°  | 0.41 | −66.1° | 0.358 | −71.1° | 8/10  | ×
50°  | 0.41 | −62.6° | 0.350 | −69.0° | 10/10 | +
60°  | 0.41 | −52.4° | 0.345 | −63.1° | 10/10 | ∗
70°  | 0.41 | −53.0° | 0.337 | −59.2° | 10/10 | �
80°  | 0.41 | −50.1° | 0.329 | −57.3° | 6/10  | �
90°  | 0.41 | −44.1° | 0.326 | −47.1° | 6/10  | ∇
100° | 0.41 | −39.8° | 0.321 | −40.6° | 5/10  | △

For interpolating two robot hand configurations f1 and f2, which are 16-dimensional vectors, the following convex linear combination is used:

f = (1 − t) · f1 + t · f2    (4)

where t is the interpolation parameter.

Given any object orientation θ, it is possible to generate a grasping trajectory T by merging two reference grasping trajectories T1 and T2 obtained from object orientations θ1 and θ2, respectively. We assume that θ1 and θ2 are chosen as close as possible to θ and that θ1 < θ < θ2. Since the number of waypoints of T1 and T2 is most likely not the same, the first step is to modify and resample the motions to obtain grasping trajectories T′1 and T′2 with the same number of waypoints. Then, corresponding waypoints from T′1 and T′2 are interpolated using the methods presented in the previous subsection.
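Putting these pieces together, the sketch below resamples two reference grasping trajectories to a common number of waypoints and blends them with t = (θ − θ1)/(θ2 − θ1), using slerp for the hand orientation, aslerp for the wrist position and the convex combination of Eq. (4) for the 16 finger values. The waypoint layout is an assumed data structure, and the helpers mirror the earlier slerp/aslerp sketch.

```python
import numpy as np

def _slerp(v1, v2, t):
    """Unit-vector slerp (shorter-path handling for quaternions), as in the earlier sketch."""
    v1, v2 = np.asarray(v1, float), np.asarray(v2, float)
    dot = np.dot(v1, v2)
    if v1.shape[0] == 4 and dot < 0.0:
        v2, dot = -v2, -dot
    omega = np.arccos(np.clip(dot, -1.0, 1.0))
    if omega < 1e-8:
        out = (1.0 - t) * v1 + t * v2
        return out / np.linalg.norm(out)
    return (np.sin((1.0 - t) * omega) * v1 + np.sin(t * omega) * v2) / np.sin(omega)

def _aslerp(p1, p2, t):
    n1, n2 = np.linalg.norm(p1), np.linalg.norm(p2)
    return ((1.0 - t) * n1 + t * n2) * _slerp(p1 / n1, p2 / n2, t)

def blend_waypoints(w1, w2, t):
    """Interpolate two waypoints; each waypoint is an assumed dict with keys
    'pos' (wrist position, 3), 'quat' (hand orientation, unit quaternion, 4)
    and 'fingers' (16 joint values)."""
    return {"pos": _aslerp(np.asarray(w1["pos"], float), np.asarray(w2["pos"], float), t),
            "quat": _slerp(w1["quat"], w2["quat"], t),
            "fingers": (1.0 - t) * np.asarray(w1["fingers"], float)
                       + t * np.asarray(w2["fingers"], float)}

def resample(traj, n):
    """Resample a list of waypoints to n waypoints by interpolating neighbours."""
    out = []
    for s in np.linspace(0.0, len(traj) - 1.0, n):
        i, frac = int(np.floor(s)), s - np.floor(s)
        j = min(i + 1, len(traj) - 1)
        out.append(blend_waypoints(traj[i], traj[j], frac))
    return out

def merge_grasp_trajectories(T1, theta1, T2, theta2, theta, n=50):
    """Grasping trajectory for orientation theta, assuming theta1 < theta < theta2."""
    t = (theta - theta1) / (theta2 - theta1)
    T1r, T2r = resample(T1, n), resample(T2, n)
    return [blend_waypoints(a, b, t) for a, b in zip(T1r, T2r)]
```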

Since the initial position and orientation of the robot is not known a priori, it is not possible to directly use the recorded values of the human reach to generate the robot reaching trajectory. However, many methods can be used to generate a continuous and smooth trajectory that combines smoothly with the grasp trajectory. One could also specify the trajectory either in joint space or in Cartesian space (end-effector position and orientation). In this approach, cubic splines are used to generate the reach positions of the end-effector, and the orientation of the wrist is obtained by using a spherical linear interpolation between the initial orientation and the pre-grasp orientation.
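A minimal version of this reach generation is sketched below: a cubic spline through assumed via points for the wrist position, and slerp between the initial and pre-grasp orientations. All poses and the lifted via points are placeholders, not values from the experiments.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def quat_slerp(q1, q2, t):
    """Shortest-path slerp between two unit quaternions (see the earlier sketch)."""
    q1, q2 = np.asarray(q1, float), np.asarray(q2, float)
    dot = np.dot(q1, q2)
    if dot < 0.0:
        q2, dot = -q2, -dot
    omega = np.arccos(np.clip(dot, -1.0, 1.0))
    if omega < 1e-8:
        out = (1.0 - t) * q1 + t * q2
        return out / np.linalg.norm(out)
    return (np.sin((1.0 - t) * omega) * q1 + np.sin(t * omega) * q2) / np.sin(omega)

def reach_trajectory(p_start, p_pregrasp, q_start, q_pregrasp, n=40, lift=0.05):
    """Wrist positions from a cubic spline through assumed via points, plus
    slerp'ed orientations from the current pose to the pre-grasp pose."""
    p_start, p_pregrasp = np.asarray(p_start, float), np.asarray(p_pregrasp, float)
    # Two assumed via points, lifted above the straight line to clear the table.
    via1 = p_start + 0.3 * (p_pregrasp - p_start) + np.array([0.0, 0.0, lift])
    via2 = p_start + 0.7 * (p_pregrasp - p_start) + np.array([0.0, 0.0, lift])
    spline = CubicSpline([0.0, 0.3, 0.7, 1.0],
                         np.vstack([p_start, via1, via2, p_pregrasp]), axis=0)
    ts = np.linspace(0.0, 1.0, n)
    positions = spline(ts)
    orientations = np.array([quat_slerp(q_start, q_pregrasp, t) for t in ts])
    return positions, orientations

# Usage with made-up poses (positions in metres, quaternions as [x, y, z, w]).
pos, orn = reach_trajectory([0.20, -0.30, 0.40], [0.41, 0.00, 0.25],
                            [0, 0, 0, 1], [0, 0, 0.383, 0.924])
```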

4.4. Results

4.4.1. Autonomous operation
As mentioned earlier, a session of ten grasp actions for every pre-defined object orientation was performed by one operator. Of these attempts, the number of successful grasps for each object orientation is summarized in Table 1 (under 'Number of successful grasps'). The table shows that for orientations between 50° and 70°, all attempts were successful; this is probably because at these orientations, grasping the virtual mug from its handle feels more natural and thus yielded more robust grasping. For the other object orientations, not all attempts were successful, but at least 5 successful grasps were obtained. To obtain a representative reference grasping trajectory for each orientation, the most robust grasp (as defined above) for each of these orientations was extracted. Although the recordings were made with the position of the object fixed directly in front of the robot and operator, the grasping skill is generalizable to other positions of the object as well. For example, if the object is not directly in front of the robot, we can easily rotate the robot (using θ1) to face the object, which would be similar to what a human would do. We also assume that the grasp would be similar if the distance between the object and the robot is not the same as in the recording session. This helps reduce the sampling required during the recording by considering only the object orientation (1-dimensional) and not the distance to the object or the height of the object (which would make the problem 3-dimensional). One limitation of this assumption is that grasping when the object is at the reaching limit of the robot arm might fail or might not be similar to how humans would perform this grasp.

Based on the robust reference grasp trajectories, it was possible, using the method above, to generate a robot motion for the object in an arbitrary position and orientation in the simulated environment (see Fig. 8). Using a large number of random poses of the object within the grasping range (both in position and orientation), all generated grasps were successful within the simulator. This


Fig. 8. Several frames from the generalized grasping execution in simulation are shown.

underlines the efficacy of using the most robust (human-generated) trajectories for grasp synthesis, as described in Section 4.3. These trajectories were also tested on the real robot to grasp real objects (see Fig. 9), although calibration was necessary to have the real robot grasp the real mugs. The robustness observed in simulation was not reproduced on the real robot, due to the large backlash and other non-modeled dynamic properties of the robot and the physical control system.4

4.4.2. Trajectory analysis
It is also of great interest to analyze the motion of the robot induced by the human for grasping, to try to detect patterns that could help us generalize the motion. To do so, we analyzed the position of the robot's wrist at the moment when the human operator grasped the virtual mug through the virtual robot, for a fixed position of the object with different orientations. Since the height of the wrist with respect to the table is almost constant (i.e. the z component is constant), we study the position of the wrist in the plane parallel to the table (i.e. the xy plane) with respect to the object center using linear regression. The average position (not considering the height z) for each object orientation

4 Ideally, the data collection and autonomous execution should be carried out on the same platform (simulation or real world alone, i.e. not mixed). However, sometimes there are advantages to using simulation even if the final deployment will be on a real robot, for example for safety and for avoiding wear and tear on the robot.

are shown in Table 1 and Fig. 11. The results are expressed using polar coordinates as illustrated in Fig. 10. Linear regression analysis showed that the position of the wrist can be expressed as a linear function of the orientation of the object to be grasped (correlation coefficient: 0.99), as can also be seen in Fig. 10. One can also see that the rotational change in the object orientation is not matched by a corresponding angular change in the wrist position (see Table 1); the range of object orientations is 80°, whereas the range of the wrist grasp angle is about 35°. This seems to indicate that the grasping strategy induced on the robot by the operator is to position the wrist on a line and modify the hand orientation to compensate for the object's orientation, rather than reflecting the orientation change of the object in the wrist positioning.
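This relation can be checked directly from the values in Table 1. The sketch below converts the tabulated grasp-time polar coordinates (dG, ηG) to x–y wrist positions and fits each coordinate as a linear function of the object orientation, mirroring the regression described above (the reported 0.99 correlation is the paper's result, not something this snippet asserts).

```python
import numpy as np

# Average wrist pose at grasp from Table 1: object orientation theta (deg),
# distance from the object center d_G, and wrist polar angle eta_G (deg).
theta = np.array([20, 30, 40, 50, 60, 70, 80, 90, 100], dtype=float)
d_g   = np.array([0.378, 0.371, 0.358, 0.350, 0.345, 0.337, 0.329, 0.326, 0.321])
eta_g = np.deg2rad([-75.3, -73.0, -71.1, -69.0, -63.1, -59.2, -57.3, -47.1, -40.6])

# Wrist position in the table (x-y) plane, relative to the object center.
x = d_g * np.cos(eta_g)
y = d_g * np.sin(eta_g)

# Fit each coordinate as a linear function of the object orientation.
for name, coord in (("x", x), ("y", y)):
    a, b = np.polyfit(theta, coord, 1)
    r = np.corrcoef(theta, coord)[0, 1]
    print(f"{name}(theta) = {a:+.4f}*theta {b:+.3f}   r = {r:.2f}")
```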

5. Conclusion

In this study we have presented our ongoing work on extending the robot skill synthesis via human learning paradigm to grasping actions and provided additional analysis of the ball swapping skill. Although the grasping experiments conducted so far are preliminary, we were able to discover interesting features of the grasps generated by the human using the robot as an extension of the body, despite the different kinematics of the robot. The data indicate that the human operator positioned the robot wrist on a line as the orientation of the handle was rotated in fixed increments. This is interesting, as one could expect that the position of the wrist would follow the radial change in the


Fig. 9. Several frames from the generalized grasping execution on the real robot are shown.

Fig. 10. This figure describes the polar representation used to express the position of a point with respect to the object. dPG and dG correspond to the radius of a point from the object center, while ηPG and ηG represent the polar angle. This formulation is used in Table 1 to express the position of the wrist at pre-grasp and grasp.

handle orientation. The linear wrist positioning strategy could be beneficial since it helps to decouple the control of position and orientation of the hand, thereby reducing the dimensionality of the control problem for the central nervous system. A long-standing question in human motor control is whether reach and grasp are controlled independently or through a single control mechanism [28]. The current results cannot help answer this question, as the human operator might have formulated a different solution for the robot limb he controlled. However, it will be very important if it turns out that human grasping also has this property, because this would suggest that the human operator has

utilized his own control strategy for the robot limb he was given control of.

Once we have a human operator discover an effective grasping strategy (the stage reached in our study), there are two ways one can proceed to obtain autonomous grasping on the robot. The first alternative is extracting a high-level principle (e.g. linear wrist positioning) from the human-discovered strategy and designing a conventional controller based on this principle. The second alternative is using the individual grasp examples obtained on the robot by the human operator to synthesize a grasping controller. In this study we have shown how the latter can be done efficiently, and have shown synthesized grasp executions in simulation and with the real robot.

This study gives leverage to our claim that exploiting the human capacity to learn novel control tasks is an effective way of synthesizing dexterous robot behaviors. The attractive point of this approach is that generating new behaviors does not require robotics experts, even though the solution of the targeted behavior can be a complex robotics problem. The analyses presented in this paper indicate that the obtained skills have a lower intrinsic dimension and thus are open to self-improvement by reinforcement learning. Although, in this study, we did not address robot learning after the initial skill acquisition, the initially acquired skill can be self-improved via robot learning. This will be much faster than learning from scratch, as in, for example, using a reinforcement learning scheme with no prior state or action value function knowledge. In fact, the human learning can be used to build up the initial state/action value function to bootstrap further learning. Another possibility is to give a certain autonomy and learning capability to the robot while the human is learning to control the robot behavior. This might yield a very fast mutual learning scheme for both the human and the robot.

The robotic hand in this study can be seen as akin to a novel tool that we need to learn to use or interface with, like a computer mouse that, at some time in the past, we had to spend time getting used to. We think using anthropomorphic robots will make it


Fig. 11. These figures summarize the pre-grasp and grasp position of the wrist in the local frame of the object (top view), as summarized in Table 1. Each point (or symbol) corresponds to the average wrist position over all successful grasps for a given object orientation in the x–y plane. The correspondence between the type of point and the object orientation is given in Table 1. The points on the curve indicate the position of the wrist at pre-grasp (i.e. at the boundary of the grasp zone), whereas the points around the best linear approximation indicate the wrist position at object contact.

easier for humans to subsume a high degree of freedom robot into the body schema – the neural representation of one's body – and execute dexterous tasks with it. This, of course, can only be verified through behavioral experimentation. Therefore, this study not only presents a novel way of obtaining grasping and manipulation skills but also indicates that the techniques from experimental sciences may be utilized for dexterous robot behavior synthesis.

Acknowledgments

We would like to thank Emre Ugur for the development of the simulator and his technical support throughout this project. B. Moore was supported by the Japan Society for the Promotion of Science (JSPS). E. Oztop was supported in part by the Global COE Program, Center of Human-Friendly Robotics Based on Cognitive Neuroscience, at the Ministry of Education, Culture, Sports, Science and Technology, Japan.

References

[1] C.G. Atkeson, J.G. Hale, F.E. Pollick, M. Riley, S. Kotosaka, S. Schaal, T. Shibata, G. Tevatia, A. Ude, S. Vijayakumar, M. Kawato, Using humanoid robots to study human behavior, IEEE Intelligent Systems 15 (2000) 46–56.
[2] D. Kushida, M. Nakamura, S. Goto, N. Kyura, Human direct teaching of industrial articulated robot arms based on force-free control, Artificial Life and Robotics 5 (1) (2001) 26–32.
[3] J. Saunders, C.L. Nehaniv, K. Dautenhahn, A. Alissandrakis, Self-imitation and environmental scaffolding for robot teaching, International Journal of Advanced Robotics Systems 4 (1) (2007) 109–124.
[4] J. Saunders, C.L. Nehaniv, K. Dautenhahn, Teaching robots by moulding behavior and scaffolding the environment, in: 1st Annual Conference on Human–Robot Interaction HRI2006, Salt Lake City, Utah, USA, March 2–4, 2006, pp. 142–150.
[5] D.C. Bentivegna, C.G. Atkeson, G. Cheng, Learning tasks from observation and practice, Robotics and Autonomous Systems 47 (2–3) (2004) 163–169.
[6] L. Berthouze, P. Bakker, Y. Kuniyoshi, Development of oculo-motor control through robotic imitation, in: IEEE/RSJ International Conference on Robotics and Intelligent Systems, Osaka, Japan, 1997, pp. 376–281.
[7] A. Billard, R. Siegwart, Robot learning from demonstration, Robotics and Autonomous Systems 47 (2–3) (2004) 65–67.
[8] A. Billard, Y. Epars, S. Calinon, S. Schaal, G. Cheng, Discovering optimal imitation strategies, Robotics and Autonomous Systems 47 (2–3) (2004) 69–77.
[9] C. Breazeal, B. Scassellati, Robots that imitate humans, Trends in Cognitive Sciences 6 (11) (2002) 481–487.
[10] Y. Demiris, G. Hayes, Imitation as a dual-route process featuring predictive and learning components: a biologically-plausible computational model, in: K. Dautenhahn, C. Nehaniv (Eds.), Imitation in Animals and Artifacts, MIT Press, 2002.
[11] A. Ijspeert, J. Nakanishi, S. Schaal, Learning attractor landscapes for learning motor primitives, in: S. Becker, S. Thrun, K. Obermayer (Eds.), Advances in Neural Information Processing Systems, MIT Press, 2003, pp. 1547–1554.
[12] Y. Kuniyoshi, Y. Yorozu, M. Inaba, H. Inoue, From visuo–motor self learning to early imitation – a neural architecture for humanoid learning, in: International Conference on Robotics and Automation, Taipei, Taiwan, 2003.
[13] S. Schaal, Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences 3 (6) (1999) 233–242.
[14] S. Schaal, A. Ijspeert, A. Billard, Computational approaches to motor learning by imitation, Philosophical Transactions of the Royal Society of London: Series B, Biological Sciences (2003) 537–547.
[15] E. Oztop, L. Li-Hang, M. Kawato, G. Cheng, Dexterous skills transfer by extending human body schema to a robotic hand, in: IEEE-RAS International Conference on Humanoid Robots, Genova, Italy, 2006.
[16] E. Oztop, L. Li-Hang, M. Kawato, G. Cheng, Extensive human training for robot skill synthesis: validation on a robotic hand, in: IEEE International Conference on Robotics and Automation, Roma, Italy, 2007.
[17] H. Imamizu, S. Miyauchi, T. Tamada, Y. Sasaki, R. Takino, B. Putz, T. Yoshioka, M. Kawato, Human cerebellar activity reflecting an acquired internal model of a new tool, Nature 403 (2000) 192–195.
[18] A. Iriki, M. Tanaka, Y. Iwamura, Coding of modified body schema during tool use by macaque postcentral neurones, Neuroreport 7 (1996) 2325–2330.
[19] S. Obayashi, T. Suhara, K. Kawabe, T. Okauchi, J. Maeda, Y. Akine, H. Onoe, A. Iriki, Functional brain mapping of monkey tool use, Neuroimage 14 (2001) 853–861.
[20] J. Steffen, E. Oztop, H.J. Ritter, Structured unsupervised kernel regression for closed-loop motion control, in: 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, Taipei, Taiwan, 2010.
[21] R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.
[22] N.A. Bernstein, The Coordination and Regulation of Movements, Pergamon Press, Oxford, 1967.
[23] M. Santello, M. Flanders, J.F. Soechting, Postural hand synergies for tool use, The Journal of Neuroscience 18 (1998) 10105–10115.
[24] B. Moore, E. Oztop, Redundancy parameterization for flexible motion control, in: Proceedings of the ASME IDETC 2010, Montreal, Canada, 2010.
[25] E. Ugur, E. Oztop, E. Sahin, Affordance learning from range data for multi-step planning, in: Proceedings of the Ninth International Conference on Epigenetic Robotics, Epirob'09, Lund University Cognitive Science Studies, vol. 146, 2009, pp. 177–184.
[26] K. Shoemake, Animating rotation with quaternion curves, in: SIGGRAPH '85: Proceedings of the 12th Annual Conference on Computer Graphics and Interactive Techniques, 1985, pp. 245–254.
[27] J. Kuffner, Effective sampling and distance metrics for 3D rigid body path planning, in: IEEE International Conference on Robotics and Automation, New Orleans, 2004, pp. 3993–3998.
[28] E. Oztop, M. Kawato, Models for the control of grasping, in: D. Nowak, J. Hermsdoerfer (Eds.), Sensorimotor Control of Grasping: Physiology and Pathophysiology, Cambridge University Press, 2009.


Brian Moore earned his Ph.D. from the Johannes Kepler University. During this period, he was a member of the Research Institute for Symbolic Computation (RISC) and a research scientist at the Radon Institute for Computational and Applied Mathematics (RICAM) of the Austrian Academy of Sciences. He then worked as a researcher at the ATR Cognitive Mechanisms Laboratories in Japan. He is now a researcher in the robotics lab at Laval University.

Erhan Oztop earned his Ph.D. from the University of Southern California in 2002. After he obtained his degree, he joined the Computational Neuroscience Laboratories at the Advanced Telecommunications Research Institute International (ATR) in Japan. There he worked as a researcher and later a group leader within the JST ICORP Computational Project during 2004–2008. From 2009 to 2011 he was a senior researcher at ATR Cognitive Mechanisms Laboratories, where he led the Communication and Cognitive Cybernetics group. During this time he also held a visiting associate professor position

at Osaka University and was affiliated with the National Institute of Information and Communications Technology, Kyoto, Japan. Currently, he is an engineering faculty member of Ozyegin University, Istanbul, Turkey, and is affiliated with ATR, Kyoto, Japan. His research interests include computational modeling of the brain mechanisms of action understanding and intelligent behavior, human–machine interface, robotics, machine learning and cognitive neuroscience.