
Acquiring Mobile Robot Behaviors by Learning Trajectory Velocities

KOREN WARD
School of Information Technology and Computer Science, The University of Wollongong

Wollongong, NSW, Australia, 2522. [email protected]

ALEXANDER ZELINSKY
Research School of Information Sciences and Engineering, The Australian National University

Canberra, ACT, Australia, 0200. [email protected]

Abstract. The development of robots that learn from experience is a relentless challenge confronting artificial intelligence today. This paper describes a robot learning method which enables a mobile robot to simultaneously acquire the ability to avoid objects, follow walls, seek goals and control its velocity as a result of interacting with the environment without human assistance. The robot acquires these behaviors by learning how fast it should move along predefined trajectories with respect to the current state of the input vector. This enables the robot to perform object avoidance, wall following and goal seeking behaviors by choosing to follow fast trajectories near the forward direction, the closest object or the goal location respectively. Learning trajectory velocities can be done relatively quickly because the required knowledge can be obtained from the robot's interactions with the environment without incurring the credit assignment problem. We provide experimental results to verify our robot learning method by using a mobile robot to simultaneously acquire all three behaviors.

Keywords: robot learning; fuzzy associative memory; trajectory velocity learning; TVL; unsupervised learning; associative learning; multiple behavior learning.

1. Introduction

Providing mobile robots with reactive behaviors using conventional programming methods can be a difficult and time consuming task. This is because:

• It is hard to foresee and encode all situations the robot could encounter.

• It can be hard to decide precisely which action the robot should take and how fast it should move in all situations.

• It may not be possible to predict precisely what information will emerge from the sensors in many situations.

• It may not be possible to know if the final encoded behavior is robust enough for the intended task or whether it will work effectively in different environments.

1.1. Learning Robot Behaviors via Demonstrated Actions

To make the task of encoding robot behaviors simpler, some researchers have been able to show that certain robot behaviors can be learnt by demonstrating actions with the robot via remote control. For example, Pomerleau (1993), Krose and Eecen (1994) and Tani and Fukumura (1994) showed how training data derived from demonstrated actions could be used to train neural network based controllers for mobile robot navigation. Also, Wyeth (1998) demonstrated how neural networks could be trained to enable a visually guided mobile robot to avoid objects and fetch a ball. Alternatively, Castellano et al (1996) showed how Fuzzy Associative Memory (FAM) matrices (see Kosko (1992)) could be trained to obtain wall following behavior by demonstrating the behavior with a sonar equipped mobile robot.


However, the main disadvantage of using demonstrated actions to derive robot controllers is that it can be difficult to obtain non-conflicting training data that accurately and comprehensively describes any specific behaviors. Furthermore, it can be difficult to extract an efficient controller from the training data that performs adequately. Also, it may not be practical or possible to repeatedly teach a robot the same or different situations in order to improve its actions if the robot exhibits inadequate behavior in unknown environments. For these reasons it is highly desirable for control systems to be built that allow robots to automatically produce or improve their behaviors via some form of unassisted robot learning process.

1.2 Learning Robot Behaviors Without Supervision

To enable robots to learn behaviors automatically, a variety of techniques have been used. These techniques include the use of Reinforcement Learning (RL) (e.g. Materic (1997), Kaelbling (1993), Connel and Mahadevan (1992)); Genetic Algorithms (GAs) (e.g. Floreano and Mondada (1994, 1995, 1996), Jakobi (1994), Jakobi et al (1992), Nolfi et al (1994)); and some more unique approaches (like Sharkey (1998), Nemhzow et al (1990), Michaud and Materic (1998)).

RL has successfully been applied to robots for learning a wide variety of behaviors and tasks (e.g. robot soccer Asada et al (1995), light seeking Kaelbling (1991), box pushing Connel and Mahadevan (1992), and hitting a stick with a ball Kalmar et al (1998)). Generally, these tasks are acquired by using RL to get the robot to learn either low-level behaviors as in Connel and Mahadevan (1992), Kaelbling (1995), Asada et al (1995) or behavior switching policies like Materic (1997) and Kalmar et al (1998). In many cases, results are achieved by simplifying the problem by providing the robot with minimal input sensing and few output actions or by carefully hand coding appropriate control and/or reinforcement signals to facilitate learning. For example, Kaelbling (1991) devised a robot called "Spanky" to demonstrate how different RL algorithms compared in performance at learning light seeking behavior. This robot was equipped with 5 whiskers and 2 infra red (IR) sensors. By grouping the front sensors and using thresholds on the IR sensors Kaelbling was able to represent the entire input space with just 5 bits. By also providing the robot with only 3 possible actions for performing its behavior, the problem was successfully reduced to one of finding a solution

that works well from the 2^5 × 3 = 96 possible outcomes.

In cases where the robot is equipped with more

comprehensive sensing, for example Connel and Mahadevan's (1992) box pushing sonar robot, good results become much harder to achieve. This is because the input to output state space becomes too large to be learnt in real time due to the credit assignment problem (as explained in Watkins (1989)). This limiting constraint of RL results mainly from the time lag between time steps and the uncertainty associated with assigning credit to actions when delayed reinforcement signals are involved. Despite this limitation, Connel and Mahadevan were able to show that RL could achieve results comparable to hand coded solutions by using the learnt input-output associations to statistically classify the input space. Impressive RL results were also achieved with the vision equipped soccer playing robots of Asada et al (1995). However, these results were obtained by performing learning off-line in simulations and installing the learned associations in the real robot. Unfortunately, if this approach is used in unknown or typical unstructured environments, learning becomes a much harder task due to the difficulties associated with adequately modeling the environment and the robot's sensors (particularly if noisy sensors such as ultrasonic sensors are used). For a more detailed survey of RL see Kaelbling (1996).

Besides RL, most other forms of unassisted robot learning involve using various search methods within the space of possible controllers in an attempt to find a control solution that performs well. The most common of these search methods involves the use of genetic algorithms Holland (1986), Meeden (1996), genetic programming Koza (1991) or some more novel search methods like Schmidhuber (1996). Generally, these approaches fail to produce results as good as those achieved with RL, particularly where robots with considerable sensing are involved (i.e. where the state space is greater than 10,000). This lack of performance is due mainly to the size of the required search space and the time it takes to evaluate the performance of possible control solutions on the physical robot. Despite this, Floreano and Mondada (1994, 1995, 1996) were able to demonstrate that object avoidance, homing behavior and grasping behavior could be evolved with GAs on their "Khepera" robot over considerable periods of time (days in the case of grasping behavior). Here object avoidance and homing behaviors required the robot's motion to be carefully monitored with external laser sensors. The grasping behavior was achieved incrementally by saturating the environment with graspable balls and gradually reducing the ball density as competence was acquired. In addition to these results, Jakobi (1994), Jakobi et al (1992), Nolfi et al


(1994), Grefenstette and Schultz (1994) and Schultz (1991) managed to evolve various robot behaviors on a simulator which were then installed in a real robot. However, like the soccer playing robots of Asada et al (1995), controllers obtained this way can only be expected to produce good results in known highly structured environments that are capable of being effectively modeled in the computer. For a comprehensive survey on evolved robot controllers, see Materic and Cliff (1996).

An alternative method to RL and evolutionary techniques for producing behaviors on robots without significant human assistance was demonstrated by Sharkey (1998). This method involved operating the robot with a hand coded innate controller based on artificial potential fields Khatib (1985), and collecting the sensor-input and command-output associations generated by the robot's motion. After considerable operation, the collected training data was preprocessed and used to train a neural network which was then used to control the robot. Using this approach, Sharkey was able to demonstrate improved performance and show that shortcomings inherent in the innate controller could be overcome. This however, requires an innate controller to be devised and implemented first. Furthermore, there can be no guarantee that a poorly performing hand coded controller will produce an adequate neural network controller no matter how much training is performed.

To improve on existing unassisted robot learning methods we have devised a method for rapidly acquiring object avoidance, wall following and goal seeking behaviors on mobile robots equipped with range sensing devices. Unlike most existing robot learning methods, which are based on learning associations between sensors and actions, our approach is based on learning associations between sensors and trajectory velocities. Previously, Singh et al (1993) demonstrated that faster learning times could be achieved with RL by using a simulated robot to learn associations between the environment state and so called "Dirichlet" and "Neuman" parameters. However, this approach was environment specific and also suffered from the credit assignment problem. By using our approach learning is rapid, online and suitable for implementation on a real robot. Furthermore, there is no credit assignment problem and all three behaviors are learnt simultaneously on a single associative map. Also, the acquired behaviors can have their object clearance distance changed by adjusting a single threshold parameter, and the robot also learns to appropriately control its velocity.

In Section 2, we describe our approach to unassisted robot learning by illustrating some of the similarities and differences between our trajectory velocity learning approach and the traditional RL approach to learning object avoidance behavior. In Sections 3 and 4, we describe how we use Fuzzy Associative Memory (FAM) matrices (see Kosko (1992)) to store and incrementally learn associations between sensor range readings and trajectory velocities. Section 5 describes various experiments we conducted with our robot in a variety of environments.

2. Trajectory Velocity Learning

To overcome the slow learning times associated with existing unassisted robot learning methods and to enable robots with considerable sensing to learn in real time we have decided to alter the actual learning task. Instead of performing the difficult task of learning associations between sensor inputs and output responses, as for example in conventional RL, we use the robot to learn a much simpler task: i.e. learning associations between sensors and trajectory velocities as depicted in Figure 1. Each input vector is comprised of immediate sensor range readings, and trajectory velocities can be described as appropriate velocities for negotiating a predefined set of immediate straight or curved paths based on their collision distances with nearby objects. Hence, if one of the robot's immediate trajectories shown in Figure 1 collides with a close object, it should appropriately have a slower velocity than other trajectories that lead into free space.

Figure 1. Learning associations between range readings and trajectory velocities.

Recently, Fox et al (1997) demonstrated that trajectory velocities can be used as an efficient means of making control decisions for avoiding obstacles with a mobile robot. However, with Fox et al's "dynamic window" approach, trajectory velocities were not learnt. Instead, they were calculated by considering all objects in the


vicinity of the robot. This requires all objects around the robot to be accurately detected in order to calculate trajectory velocities. Usually, this is difficult to achieve even with the most sophisticated sensing devices. In particular, sonar sensors will only return a range reading if the sonar beam is almost normal to the object surface. Alternatively, if learnt environment maps or occupancy grids are used to assist in locating objects around the robot, the robot's ability to negotiate unknown regions of the environment is restricted. This also has the disadvantage that the robot has to accurately estimate its position, which is difficult to achieve using only odometry.

By using the robot to learn associations between sensor range readings and trajectory velocities these deficiencies are overcome. This is because the robot learns to perceive its environment in terms of trajectory velocities, eliminating the need for object locations to be known in advance when control decisions are made. Furthermore, the use of an associative map to look up trajectory velocities directly from sensor data enables trajectory velocities to be determined quickly. This results in fast response times and can allow more trajectories to be considered as candidates during each time step.

To limit the amount of learning required, the robot's available trajectories are limited to being either a straight line ahead or a preset number of arcs to the left or right of the robot with preset radii. Thus the learning task involves associating a number of instantaneous trajectory velocities (7 in the case of Figure 1) with each input vector state. Although this may require the discovery of more information than learning one to one associations between input vectors and simple command responses, the required knowledge can be obtained from the robot's interactions with the environment without incurring the credit assignment problem or requiring fitness evaluations. Thus, learning is achieved faster than with other typical unassisted robot learning methods.

To explain this with an example, consider the task of learning object avoidance behavior using conventional RL. Figure 2(a) shows the path and commands performed by a robot prior to an unintentional collision with an object. When this occurs, there is no certain way of deciding which responses leading up to the collision should have been taken or even how many responses were at fault. So with RL, correct command responses can only be discovered through the repeated assignment of positive or negative credit to those actions suspected of being correct or incorrect respectively. Hence, over long periods of time and after many similar collisions, correct responses may accumulate larger credit and exhibit

appropriate actions for this and other similar situations. Unfortunately, when the input space is of considerable size and many possible command responses exist, learning becomes unacceptably slow regardless of the RL algorithm deployed (see Kaelbling (1996)). Hence, such learning tasks are viable only when applied to robots with limited sensing and few command responses or alternatively by performing learning in fast simulations.


Figure 2. Two different ways a mobile robot can learn from environmental stimuli. (a) Learning appropriate command responses by assigning credit to actions via RL. (b) Learning appropriate trajectory velocities by considering collision points of traversed trajectories.

Alternatively, if the learning task is based on learning associations between sensor range readings and appropriate instantaneous velocities for traversing trajectories, these can easily be calculated by considering the collision point of each traversed trajectory as shown in Figure 2(b). To obtain sensor-trajectory velocity associations, sensor data is recorded and held until the current trajectory's collision point is obtained. Trajectory collision points can be obtained by using accumulated sensor data and odometry to estimate their positions, or alternatively, by just following each selected trajectory slowly until a collision occurs. Once


the current trajectory's collision point is determined, a preset constant deceleration rate is used to calculate the appropriate velocity for each point leading up to each trajectory's collision point (if any). By associating the calculated velocities with the corresponding sensor data that occurred along the trajectory, training patterns are obtained and used to train an associative map. Thus, if the robot were to follow any similar colliding trajectories, while complying with learnt trajectory velocities, it would come to a safe halt just before coming into contact with the object. Trajectories which do not collide with objects (i.e. complete full circles) can be appropriately associated with a maximum safe velocity.
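To make this calculation concrete, the sketch below labels each recorded point with the fastest speed from which a constant deceleration could still stop the robot at the collision point (v = sqrt(2ad)). This is a minimal rendering of the idea in our own terms; the deceleration rate and velocity cap are illustrative values, not figures from the paper.

import math

DECEL = 0.5   # preset constant deceleration rate (m/s^2), assumed
V_MAX = 0.5   # maximum safe trajectory velocity (m/s), assumed

def label_velocities(distances_to_collision):
    # For each recorded point, the appropriate velocity is the fastest
    # speed from which the robot can still stop at the collision point
    # under constant deceleration: v^2 = 2*a*d, capped at V_MAX.
    return [min(V_MAX, math.sqrt(2.0 * DECEL * d))
            for d in distances_to_collision]

# Points 2.0, 1.0, 0.5 and 0.1 meters before a collision point:
print(label_velocities([2.0, 1.0, 0.5, 0.1]))
# Non-colliding (full circle) trajectories would instead be labeled
# with V_MAX at every recorded point.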

Therefore, by following various trajectories and calculating appropriate velocities based on trajectory collision points, a TVL robot, through its sensors, acquires knowledge of the appropriate velocities at which it should negotiate each of its pre-designated trajectories at any instant. The robot thereby becomes capable of controlling its velocity and making direction choices based on this information. By providing the robot with a single instruction to "follow trajectories which are perceived to be fastest", object avoidance behavior automatically becomes exhibited, since trajectories which lead into free space will be perceived to have faster velocities than trajectories which collide with nearby objects. Hence, the basic differences between conventional RL and TVL when used to learn object avoidance behavior are: the type of information that is stored, the means by which it is learnt and the manner by which actions are decided, as Figure 3 depicts.

Figure 3. Basic differences between: (a) Reinforcement Learning (RL) and (b) Trajectory Velocity Learning (TVL) for learning object avoidance.

2.1. Learning Multiple Behaviors with TVL

TVL also makes it possible for the robot to produce other behaviors, besides object avoidance behavior (shown in Figure 4(a)), without the need for the robot to learn different associative maps. For example, if we instruct the robot to "follow fast trajectories which are closest to the nearest detected

object", as in Figure 4(b), wall following behavior be-comes exhibited in the direction closet to the robot’sforward motion. By "following fast trajectories closestto the right of nearest object" produces left wall fol-lowing behavior and conversely "following fast trajec-tories closest to the left of the nearest object" producesright wall following behavior.


Figure 4. Using TVL to produce multiple behaviors. (a) Object avoidance. (b) Wall following.

Goal seeking behavior also becomes possible if the robot has perception of both its trajectory velocities and the goal location. This is achieved by providing the robot with an instruction to follow fast trajectories toward the perceived goal location. Fortunately, this also produces an implied obstacle avoidance capability without the need to switch behaviors, since any convex object encountered between the robot and the goal will cause the direct trajectory to be perceived to be slower than those which lead around the obstacle. A TVL robot like that in Figure 5(a) consequently chooses faster


alternative trajectories, resulting in a path around the obstacle being negotiated while still maintaining pursuit of the goal location. However, if the robot encounters a deep crevice, like the one depicted in Figure 5(b), simply following a fast trajectory toward the goal will not escape the crevice. To escape deep local minima when seeking goals, the robot could attempt to follow walls in both directions for increasing periods of time or alternatively could use purposive maps Zelinsky and Kuniyoshi (1996) to learn the shortest path to the goal.

Figure 5. Goal seeking with a TVL robot. (a) Robot successfully avoids object. (b) Robot trapped by deep crevice.

2.2 Adjusting TVL Behaviors

Varied control of the robot's behaviors is also possible by providing a variable velocity threshold to determine whether perceived trajectory velocities are considered to be fast or slow. This makes it possible to control the wall clearance distance in wall following behaviors and the object avoidance distance in object avoiding behaviors, as shown by the TVL simulator screen dumps in Figure 6. For example, if the velocity threshold is lowered, the robot follows trajectories closer to objects before its velocity falls below the threshold, causing another faster trajectory to be selected. When avoiding objects, as in Figure 6(a), the robot moves cautiously closer to objects before avoiding them. Also, when performing wall following, as in Figure 6(c), a low velocity threshold results in walls being followed more closely and at lower speed. Conversely, raising the threshold causes the robot to maintain larger object clearances and results in the robot moving faster and more competently through the environment when performing its behaviors, as Figures 6(b) and 6(d) show.


Figure 6. Controlling object clearance distances with a velocity threshold. (a) and (c) Low velocity threshold: small object clearance. (b) and (d) High velocity threshold: larger object clearances.

Figure 7 shows a schematic of TVL and summarizes how mobile robots can become capable of performing certain adjustable behaviors by making simple choices based on learnt trajectory velocities.

Figure 7. Using TVL to acquire multiple adjustable robot behaviors simultaneously.
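The choices summarized in Figure 7 reduce to thresholding the perceived trajectory velocities and picking the fast trajectory nearest a behavior-specific preferred direction. The sketch below is a minimal illustration of this selection rule; the heading angles, threshold handling and scoring are our own assumptions rather than the paper's implementation.

# Hypothetical headings (degrees) for the seven trajectories.
HEADINGS = {"T3L": -90, "T2L": -60, "T1L": -30, "T0": 0,
            "T1R": 30, "T2R": 60, "T3R": 90}

def select_trajectory(velocities, behavior, threshold,
                      nearest_object_dir=0.0, goal_dir=0.0):
    # Keep only trajectories perceived to be faster than the
    # velocity threshold...
    fast = [t for t, v in velocities.items() if v >= threshold]
    if not fast:
        return None   # e.g. invoke rotating perception (Section 4.5)
    # ...then pick the one nearest the behavior's preferred direction.
    target = {"avoid": 0.0,                       # forward direction
              "wall_follow": nearest_object_dir,  # closest object
              "goal_seek": goal_dir}[behavior]    # goal location
    return min(fast, key=lambda t: abs(HEADINGS[t] - target))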

If conventional RL or GA robot learning methods were used to acquire the same control and competency, these methods would require each behavior to be learnt in separate maps (or in neural nets). They would require the implementation of adequate reinforcement signals or fitness evaluation functions. It would be difficult for behaviors learnt with RL or GAs to also acquire an


adequate means of controlling the robot's velocity. It may not be possible to provide a facility for adjusting the robot's object clearance distance. Learning could also be expected to take a long time due to the credit assignment problem or the fitness evaluation problem. However, RL and GA methods have the advantage that they are suitable for learning a variety of behaviors with a diverse range of sensing devices. Although TVL (on its own) cannot be expected to acquire complex tasks that have been successfully learnt with RL and GAs (e.g. robot soccer Asada et al (1995), box pushing Connel and Mahadevan (1992), hitting a stick with a ball Kalmar et al (1998)), TVL potentially could facilitate learning complex tasks with RL or GAs by providing an effective means of rapidly acquiring essential low level behaviors.

2.3 What Type of Robot Learning is TVL?

With TVL the robot does not have to perform the desired behaviors or experience collisions in order to learn from its environment. TVL therefore does not use collisions (or the detection of trajectory collision points) as a behavior based reinforcement signal. Instead, collisions are used as a means of obtaining reference points within the environment that enable associations between sensors and trajectory velocities to be directly obtained. Thus, TVL does not learn behaviors via trial and error or suffer from the credit assignment problem and therefore cannot be regarded as a reinforcement learning process. As explained in Sections 3 and 4, each sensor-velocity association obtained from the robot's interactions with the environment is used to train a Fuzzy Associative Memory (FAM) (or FAMs) via a supervised machine learning process called compositional rule inference (see Sudkamp and Hammell (1994)).

Although supervised machine learning is used to learn sensor-velocity associations, TVL requires no teacher (either human or from an innate controller) as other typical robot learning methods involving supervised machine learning processes have, e.g. Sharkey (1998) and Tani and Fukumura (1994). This is because the robot does not directly learn actions and therefore has no need for any actions to be demonstrated. TVL instead learns only trajectory velocities, which enables a variety of behaviors to be easily produced with simple instructions. For this reason we prefer not to refer to teacher and non-teacher robot learning methods as supervised or unsupervised robot learning respectively, as is often the case in other published works.

The closest form of supervised machine learning applied to robotics that resembles TVL is Direct Inverse Control, see Kraft and Campagna (1990) and Guez and Selinski (1988). An example of this is where a robot arm uses neural networks to directly learn the mapping from desired trajectories (of the arm) to control signals (e.g. joint velocities) which yield these trajectories. As with TVL no teacher is required. Associations (or training patterns) are generated by engaging the arm in random trajectory motion. Each training pattern's output is obtained shortly after the training pattern's input is given.

2.4 TVL Robot Setup

To conduct our TVL experiments we use the Yamabico mobile robot Yuta et al (1991) shown in Figure 8(a). This robot is equipped with 16 sonar sensors arranged in a ring equally spaced around the robot and a bump sensor for detecting collisions. We provide the robot with seven trajectory locomotion commands for negotiating its environment, labeled T3L to T3R in Figure 8(b). Each trajectory has a maximum velocity in the forward and reverse direction appropriate for safely negotiating these trajectories. Stop and spin commands are also utilized to halt the robot when collisions occur or to face the robot in a desired direction.


Figure 8. (a) Yamabico mobile robot equipped with a 16 sensor sonar ring. (b) Robot's trajectory commands with trajectory radii and maximum velocities shown.
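As a rough illustration of this command set, the seven trajectories can be represented as signed arcs with per-trajectory speed limits. The radii and maximum velocities below are placeholders; the actual values appear only in Figure 8(b).

import math
from dataclasses import dataclass

@dataclass
class Trajectory:
    name: str
    radius_m: float      # arc radius; math.inf denotes straight ahead
    curvature_sign: int  # -1 left, 0 straight, +1 right
    v_max: float         # maximum safe velocity (m/s), placeholder

TRAJECTORIES = [
    Trajectory("T3L", 0.5, -1, 0.3), Trajectory("T2L", 1.0, -1, 0.4),
    Trajectory("T1L", 2.0, -1, 0.5), Trajectory("T0", math.inf, 0, 0.5),
    Trajectory("T1R", 2.0, +1, 0.5), Trajectory("T2R", 1.0, +1, 0.4),
    Trajectory("T3R", 0.5, +1, 0.3),
]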

In the following sections, we describe how associations between input vectors and trajectory velocities can be learnt by using Fuzzy Associative Memory (FAM) matrices. We discuss the limitations of using a single FAM to learn a mapping between sensors and trajectory velocities and explain how improved perception and performance can be achieved by using multiple FAMs to individually map sensors to each trajectory.


3. Using FAM Matrices to Map Sensors to Trajectory Velocities

Considerable work in fuzzy control uses a matrix, called a Fuzzy Associative Memory (FAM) matrix, to represent fuzzy rule consequents (see Kosko (1992) for a concise description). A FAM matrix can be described as an N dimensional table where each dimension represents a specific input. The size of each dimension equals the number of fuzzy membership functions used to describe the representative input. For example, a FAM matrix with 2 inputs and 3 membership functions for describing each input (e.g. small, medium and high) would require a table with 3^2 = 9 entries to store all possible fuzzy rule consequents. Like lookup tables, FAM matrices have the advantage of allowing fuzzy rule consequents to be directly accessed from the input vector, which enables their output to be produced quickly. Furthermore, the output is derived by what is effectively a parameterized smoothing (weighted averaging) over a small neighborhood of table entries, which provides good generalization and immunity to isolated incorrect entries. This is explained in Section 4.2. FAM matrices also have the advantage of being trainable via a supervised machine learning process called compositional rule inference, see Sudkamp and Hammell (1994). Unlike conventional neural networks, FAM matrices have the advantage of being able to learn and recall associations quickly, are capable of incremental learning and can enable the designer to appropriately divide up the input space. Furthermore, the acquired knowledge within FAMs is capable of being interpreted by examining the fuzzy rules which comprise table entries.

3.1 Using a Single FAM Matrix to Map Sensors to Trajectory Velocities

The main disadvantage with using a FAM matrix to classify robot sensor data is that the size of the matrix increases exponentially with increasing numbers of inputs and fuzzy sets. For example, a FAM which has 16 inputs connected to sensors and 4 membership functions describing each input will require 4^16

= 4,294,967,296 entries to store all possible rule consequents. This will not only require considerable memory but, for learning purposes, may also require a lot of data to be learnt in order to fill those entries. One way to effectively reduce the size of the FAM, and consequently the amount of data required to fill all entries in the FAM, is by grouping sensors together.

In our initial TVL experiments we used a single FAM matrix and arranged the robot's sensors into five groups of three (as shown in Figure 9(a)). We produced the input vector by taking the minimum sensor reading from each sensor group. We also described the input domain using a reduced number of membership functions for inputs at the sides and toward the rear of the robot, as shown in Figure 9(a), so that the robot's perception is more acute toward the front. Furthermore, the structure of the membership functions, shown in Figure 9(b), is concentrated toward the near vicinity of the robot so that range readings of closer objects can be interpreted more accurately. The resulting arrangement allows the total input search space to be described with just 5 ∗ 4 ∗ 3 ∗ 3 ∗ 4 = 720 possible fuzzy rules.


Figure 9. (a) Sonar sensors arranged into five groups with three sensors in each group. (b) Membership functions used to fuzzify the input vector.

Each of the 720 FAM entries is used to store the consequents of each possible rule. So to describe all the appropriate velocities associated with the robot's seven trajectories, a total of 720 ∗ 7 = 5040 entries are required within the FAM, as shown in Figure 10.


Figure 10. Using a FAM matrix to store velocities of 7 trajectories.
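To illustrate the arrangement, the sketch below builds the five-element input vector by taking the minimum reading of each sensor group and indexes a 5 x 4 x 3 x 3 x 4 table holding seven velocities per cell. The group boundaries and range edges are hypothetical stand-ins for those in Figure 9, and only the nearest-cell (crisp) index is shown; a real FAM would fuzzify each input across several neighboring cells.

import numpy as np

# Hypothetical grouping of sonars into five groups of three
# (the actual grouping is shown in Figure 9(a)).
GROUPS = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 14)]
SETS_PER_INPUT = (5, 4, 3, 3, 4)   # membership functions per input
# Range boundaries (meters) splitting each input into its fuzzy sets;
# values are illustrative, concentrated near the robot.
EDGES = [np.array([0.3, 0.6, 1.2, 2.4])[: n - 1] for n in SETS_PER_INPUT]

# 7 trajectory velocities per cell, initialized low (0.1 m/s) so that
# unfamiliar situations are always perceived as slow (discussed below).
fam = np.full(SETS_PER_INPUT + (7,), 0.1)

def input_vector(sonar):
    # Minimum range reading of each sensor group.
    return [min(sonar[i] for i in g) for g in GROUPS]

def cell_index(vec):
    # Map each grouped reading to its fuzzy-set index.
    return tuple(int(np.searchsorted(e, r)) for e, r in zip(EDGES, vec))

sonar = [1.5] * 16                             # example: open space
print(fam[cell_index(input_vector(sonar))])    # 7 trajectory velocities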

For TVL to work effectively, FAM entries should always be initialized to low velocities. Thus, as learning progresses, the robot becomes increasingly aware of faster trajectories that exist around objects (and into free space) and becomes capable of negotiating the environment via these faster trajectories. If the FAM is instead initialized with fast or random velocities, an inexperienced robot would perceive many trajectories that collide with nearby objects to be inappropriately fast. Consequently, when performing behaviors, the robot would choose to follow these inappropriately fast trajectories at speed and end up colliding heavily with objects. (The requirement for unfamiliar input vectors to always produce low velocities is one reason why conventional neural nets are unsuitable for TVL (see Section 4.3 for further details).)

Although the use of grouped sonar sensors results in the robot having considerably coarse perception, we found the single FAM arrangement (in Figure 9) to be simple and effective for demonstrating the fast learning times possible with TVL within structured and uncluttered environments. However, for more difficult environments it is better to avoid coarsely grouped sensors by using multiple FAM matrices.

3.2 Using Multiple FAM Matrices to Map Sensors to Trajectory Velocities

To overcome the coarse perception that results from having grouped sonar sensors, we decided to replace the robot's single FAM by providing the robot with seven FAM matrices for storing velocities belonging to each of the robot's seven trajectories, as shown in Figure 11. Each FAM matrix receives its own independent input vector which is derived from

sensors that are considered the most relevant for detecting objects in the vicinity of the FAM's trajectory. For example, the most appropriate sensors for resolving the forward trajectory (T0) would of course be the front sensor as well as some neighboring sensors to the left and right of the front sensor.

Figure 11. Storing associations between sensors and trajectory velocities in 7 FAM matrices.

Although this multi-FAM configuration requires almost seven times as much processing to look up the robot's seven trajectory velocities, this does not significantly reduce response times for the robot due to the high speed at which FAMs can produce their outputs directly from input vectors. Furthermore, having independent FAMs to store each trajectory's velocities provides increased immunity to sensor damage and can assist the robot in adapting to sensor malfunctions, as explained in Ward and Zelinsky (1997).

3.3 Selecting Appropriate FAM Inputs from Available Sensors

To decide which sensors produce the most relevant information for resolving the pathways of individual trajectories, three factors need to be considered: (1) the position of each sensor, (2) the reflective nature of sonar signals and (3) the divergence (or beam angle) of transmitted sonar signals. Although sensors adjacent to a specific trajectory have obvious relevance to that trajectory, due to their position, they will not always return a signal from objects located on the trajectory's pathway. In particular, flat walls can be a problem. For example, Figure 12 shows typically how a flat wall could be detected by a sonar ring.


Figure 12. Detecting a flat wall with a sonar sensor ring.

Although the flat wall in Figure 12 lies directly in the path of trajectory T1L, adjacent sensors S1 and S2 fail to return an echo from the wall due to the acute angle of incidence of their respective sonar signals. However, sensors S3 to S5 do detect the presence of the wall due to their transmitted signals being almost normal to the wall. Hence, to be able to resolve appropriate velocities of trajectory T1L, sonars S3, S4 and perhaps S5 should be included (as well as other sensors) as inputs into the FAM matrix of trajectory T1L.

To determine which sensors to include as inputs into each FAM, we placed the robot in various locations and orientations near flat walls and used the above criteria to decide which sensors would be needed to resolve each FAM's trajectory.

Similarly, by considering different robot locations near flat walls, we divided each FAM input into membership functions such that the various collision points of each FAM's trajectory could be resolved with reasonable accuracy. Figure 13 shows how we allocated sensors and fuzzy membership functions to the FAM matrices of trajectories T0 through to T3L. FAMs belonging to trajectories T1R through to T3R are allocated sensors and fuzzy membership functions in the same fashion as T1L to T3L, except their inputs are connected to symmetrically opposite sensors on the right-hand side of the robot.

Figure 13. Sensors and fuzzy membership functions designated to the FAM matrices of trajectories T0, T1L, T2L and T3L. (Fuzzy sets: VN Very Near, N Near, M Medium, F Far, VF Very Far.)


Despite configuring the robot's perception to be capable of resolving trajectory velocities with respect to flat walls, extensive experimentation indicated that this arrangement also provides adequate perception of trajectories near most isolated objects and irregular environment features. The reason for this is that in most cases ultrasonic waves are more likely to be reflected back to the receiver from internal corners or irregular surfaces than from continuous flat surfaces that are not normal to most of the sensors on the sonar ring. Consequently, we were able to achieve adequate resolution of our unstructured laboratory environments for robust behaviors to be acquired by the robot in relatively short periods of time.

Increasing the input space of TVL FAMs is not as serious a matter as may be the case with other learning methods. This is because during learning, the robot generates large numbers of training patterns which are able to be immediately loaded into the FAMs via compositional rule inference (see Sudkamp and Hammell (1994)). Thus, the extent to which you divide up the input space of each FAM is largely a matter of how much memory is available and how accurate you would like each FAM to be. The final multi-FAM configuration, described in Figure 13, maps 9 of the 16 sonars to 125,720 fuzzy regions. Although this is nearly 25 times the total number of fuzzy regions allocated to the single FAM configuration described in Figure 9 (i.e. 5,040 regions), acquiring all 3 behaviors in our laboratory environments took only slightly longer than with the single FAM configuration (see Section 5 for details).

Although our experiments demonstrated that FAMs worked well in this application, other supervised learning methods (like CMAC neural networks Kraft and Campagna (1990)) may also be suitable for learning associations between sensors and trajectory velocities on mobile robots. However, if non-FAM learning methods are used, precautions must be taken so that unfamiliar input vectors always produce low velocities as output. Otherwise, high speed collisions with unknown objects could occur (as explained in Section 3.1). To conduct our experiments we implemented this requirement by always initializing the TVL FAMs with low velocities (0.1 m/s).

4. Learning Trajectory Velocities

Since TVL is based on learning perception rather than specific actions, there is no need for the robot

to perform any of its desired behaviors to acquire the perception needed to perform those behaviors. In fact, the simplest way to learn trajectory velocities is to select trajectories randomly and follow each until a collision with an object or a full circle in free space occurs. Alternatively, trajectory velocities can be found by estimating the collision points of engaged trajectories by using accumulated sensor data and odometry. Although this may require considerable computation and may be less accurate, it has the advantage of enabling the robot to learn while performing any of its behaviors. However, if the robot's motion is constrained by the engagement of a particular behavior, the robot may never venture into some regions of the environment and thus may not learn trajectories associated with input vectors that occur in those regions. For example, a robot that spends most of its time following walls may never learn which trajectory velocities should be associated with input vectors that occur some distance from the walls. One effective learning strategy for overcoming this problem is to encourage the robot to explore by engaging different behaviors.

4.1 An Effective Learning Strategy

To reduce the overhead required to calculate trajectory collision points from accumulated sensor readings, we performed our TVL experiments by following trajectories until a collision with an object is detected or a full circle is completed. When this occurs, the robot is stopped momentarily to update the FAM entries associated with the traversed trajectory. This is done by associating each input vector experienced along the trajectory with appropriate velocities. These velocities are calculated by using a predefined deceleration rate to estimate the appropriate velocity at each point up to the collision point. Thus, if the robot were to repeat the path while complying with the learnt velocities, it would come to a safe halt just prior to the collision point. Velocities associated with trajectories that complete a full circle are set to the trajectory's maximum allowable velocity. Each training pattern is used to incrementally adjust the relevant FAM's trajectory velocities as explained in Section 4.2. Figure 14 describes the basic learning algorithm used to conduct our experiments. Although odometry is required to measure the distances between trajectory collision points and the recorded points where sensor readings were taken, these distances are relatively short and therefore unaffected by odometry errors.


loop
    scan all trajectories around robot
    choose trajectory with least learning experience
    rotate robot to face chosen trajectory
    repeat
        move one time step along chosen trajectory
        record input vector
    until collision or full circle occurs
    associate appropriate velocities with each input vector
    update FAM with the resulting training exemplars
    if collision occurred, reverse robot a short distance along previous path
end loop

Figure 14. Basic algorithm for learning trajectory velocities.
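For concreteness, a minimal Python rendering of this loop is sketched below. The robot interface (least_experienced_trajectory, move_one_step_along, etc.) and the fams[t].update() call are hypothetical, and the velocity labeling reuses the constant-deceleration rule from Section 2; this is an illustration under those assumptions, not the authors' implementation.

import math

DECEL, V_MAX = 0.5, 0.5   # assumed deceleration (m/s^2) and speed cap

def learning_loop(robot, fams, episodes=1000):
    # Sketch of the Figure 14 loop over a hypothetical robot API.
    for _ in range(episodes):
        t = robot.least_experienced_trajectory()   # uses Equation (1), below
        robot.rotate_to_face(t)
        samples = []                               # (input vector, arc position)
        while True:
            robot.move_one_step_along(t)
            samples.append((robot.input_vector(t), robot.arc_position()))
            if robot.collided() or robot.full_circle():
                break
        end = robot.arc_position()                 # collision point, if any
        for x, pos in samples:
            # Label with the speed that still allows stopping at `end`.
            v = V_MAX if robot.full_circle() else \
                min(V_MAX, math.sqrt(2 * DECEL * (end - pos)))
            fams[t].update(x, v)                   # Section 4.2 update
        if robot.collided():
            robot.reverse_short_distance()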

To speed up learning, the selection of trajectories to follow is done by applying the current sensor data to the FAM matrix and choosing the trajectory which has received the least amount of learning. The amount of learning each FAM entry has received is recorded by accumulating the sum of the respective fuzzy rule's stimulation levels resulting from all training exemplars which map to the entry, i.e.

E_i = \sum_{(x,v) \in T_i} \mu_{A_i}(x) \qquad (1)

where µ_{A_i}(x) represents the minimum input membership of training pattern T_i with input x. Thus, by using the result of Equation (1) each time the robot processes an input vector, the robot is not only able to perceive the learnt velocities associated with the input vector but also the amount of learning responsible for each entry's current value.

4.2 Updating FAM Entries

Various techniques for generating fuzzy rules directly from training data have been reported. Generally, these methods can be classified by the learning technique involved, the most common being neural networks Lin and Cunningham (1995), genetic algorithms Homaifar and McCormick (1995), iterative rule extraction methods Abe and Lang (1993) and direct fuzzy inference techniques Turksen (1992). The most significant differences in these methods lie in the time it takes to generate fuzzy rules and whether or not fuzzy input and output sets need to be predefined. Unfortunately, the methods which do not require the predefinition of fuzzy sets require far too much training time for on-line robot learning to be possible. With this in mind we have chosen to adopt the compositional rule inference approach proposed by Zadeh (1973) and more recently investigated by

Sudkamp and Hammell (1994). This approach enables the robot to incrementally adjust its FAM entries while experiencing its environment.

Consequent adjustment of FAM matrices is achieved by accumulating the weighted average of training exemplars which stimulate each consequent entry, namely:

V_i = \frac{\sum_{(x,v) \in T_i} \mu_{A_i}(x) \, v}{\sum_{(x,v) \in T_i} \mu_{A_i}(x)} \qquad (2)

where µ_{A_i}(x) represents the minimum input membership of training pattern T_i with input x, and v represents the velocity associated with training pattern T_i.

Defuzzification is performed by using the weighted averaging method, with the difference that we use the crisp consequent values rather than representative values from fuzzy output sets to save computation time. Thus the anticipated velocity from each FAM matrix is given by:

V = \frac{\sum_{i=1}^{n} \mu_{A_i}(x) \, v_i}{\sum_{i=1}^{n} \mu_{A_i}(x)} \qquad (3)

Because FAM entries can be accessed directly from the input vector, both updating FAM entries and inferring trajectory velocities from sensor data can be done quickly regardless of the size of the FAM matrix. This enables the robot to both perceive the velocities of its trajectories and learn trajectory velocities within the real time constraints of normal robot operation.
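A compact sketch of this incremental update and inference is given below for a single-input FAM. The triangular membership functions and their centers are illustrative assumptions, but the running-sum update and weighted-average readout follow Equations (1) to (3); with multiple inputs, µ would be the minimum membership across inputs as defined above.

def tri_memberships(x, centers):
    # Membership of reading x in triangular fuzzy sets centered at
    # `centers` (neighboring sets overlap; shapes are illustrative).
    mus = []
    for i, c in enumerate(centers):
        left = centers[i - 1] if i > 0 else c - 1.0
        right = centers[i + 1] if i + 1 < len(centers) else c + 1.0
        if left <= x <= c:
            mus.append((x - left) / (c - left))
        elif c < x <= right:
            mus.append((right - x) / (right - c))
        else:
            mus.append(0.0)
    return mus

class Cell:
    # One FAM consequent entry with the running sums needed to
    # maintain the weighted average of Equation (2) incrementally.
    def __init__(self, v0=0.1):          # low initial velocity
        self.num, self.den, self.v0 = 0.0, 0.0, v0
    @property
    def v(self):
        return self.num / self.den if self.den > 0 else self.v0
    def update(self, mu, v):
        self.num += mu * v               # accumulate mu * v
        self.den += mu                   # accumulate mu (also Eq. (1))

CENTERS = [0.25, 0.5, 1.0, 2.0, 4.0]     # VN, N, M, F, VF (meters, assumed)
cells = [Cell() for _ in CENTERS]

def train(x, v):
    # Compositional-rule update: every stimulated cell accumulates
    # the exemplar, weighted by its stimulation level (Equation (2)).
    for cell, mu in zip(cells, tri_memberships(x, CENTERS)):
        if mu > 0:
            cell.update(mu, v)

def infer(x):
    # Defuzzified velocity, Equation (3): weighted average of the
    # crisp consequent values of the stimulated cells.
    mus = tri_memberships(x, CENTERS)
    den = sum(mus)
    return sum(m * c.v for m, c in zip(mus, cells)) / den if den else 0.1

train(0.8, 0.25)            # reading 0.8 m observed at 0.25 m/s
print(round(infer(0.8), 3))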

4.3 Conflicting Training Patterns

Due to the reflective nature of ultrasonic waves and the presence of undetectable features in most environments, there will inevitably be situations where the robot experiences the same sensor data but discovers different velocities for the same traversed trajectory. Some typical examples of this can be seen in Figure 15.

Figure 15. Conflicting training patterns (i.e. same sensor data but different trajectory velocities).

Our experiments have shown that there was no need to devise sensor pre-processing or conflict resolution


strategies to deal with these problems because most were adequately dealt with by the averaging effect of the FAM updating procedure (explained in Section 4.2). This effectively cancels noise effects in the training data by taking the averages of all readings which map to the same FAM entries. It also tends to adjust FAM entries updated by conflicting training patterns (like those shown in Figure 15) safely downward. This occurs because in typical environments there are usually more internal corners and flat walls than external corners. Therefore, if any conflicting patterns are generated, there will usually be more conflicting patterns associated with lower velocities than with higher velocities. When performing behaviors, the only effect the learnt conflicting patterns have is that they cause the robot to fail to perceive the faster alternative trajectories that exist near external corners. Therefore, when the robot is near external corners, it may have to move further along its current trajectory before becoming aware of the faster alternative trajectories that exist.

4.4 Symmetrical Learning

Because the robot's sensors and trajectories are symmetrical, the knowledge required in the FAMs belonging to the left and right trajectories can also be considered symmetrical. Thus, any training patterns derived from trajectories on the left can not only be used to update the appropriate left FAM but can also be used to update the opposite right FAM, as shown in Figure 16.

Figure 16. Updating 2 symmetrically opposite FAMs with one training pattern.

Similarly, training data derived from trajectories on the right can be used to update symmetrically opposite FAMs on the left. This can be used to both speed up learning and provide the robot with more comprehensive perception of its environment. However, special consideration (as explained in Ward and Zelinsky (1997)) has to be made if the robot's sensors have significant differences in performance or if some sensors become damaged.

4.5 Rotating Foveal Perception

The ability to escape dead-ends when performing behaviors can easily be encoded by providing the robot with rotating foveal perception. This is achieved by enabling the robot to rotate its perception within the sonar ring so that the robot can perceive the environment as if it were facing other directions. For example, if the robot finds itself in a situation where all immediate trajectories are perceived to be slower than the velocity threshold, the robot is able to increase its view of perceived trajectories, without physically rotating itself, by rotating its perception within the sonar ring. Thus, by rotating its perception all the way around the 16 sensors comprising the ring, the robot becomes capable of perceiving the velocities of 16 ∗ 7 = 112 trajectories while maintaining its current direction. This enables the robot to make quick decisions as to what direction it should face when it finds itself in tight situations such as dead-ends. For example, Figure 17 shows what happens when a robot performing wall following enters a dead-end. In this situation all the immediate perceived trajectory velocities lie well below the velocity threshold. So the robot rotates its perception within the ring to find faster trajectories. Consequently, the robot discovers that the nearest trajectory to the closest object that is faster than the threshold is trajectory T3L in the direction of sensor 6. In response, the robot rotates counter-clockwise to face the direction of sensor 6 and then proceeds along trajectory T3L at the appropriate velocity.

Figure 17. Rotating the robot's perception virtually within the sonar ring to locate faster trajectories.
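The rotation amounts to re-indexing the sonar ring before applying the learnt FAMs, as the sketch below illustrates; the infer_velocities callable standing in for the per-trajectory FAM lookups is hypothetical.

def rotated_readings(sonar, k):
    # Sonar readings as they would appear if the robot were
    # physically rotated to face sensor k (16-sensor ring).
    return [sonar[(i + k) % 16] for i in range(16)]

def scan_all_headings(sonar, infer_velocities):
    # Perceive the 16 * 7 = 112 virtual trajectories by rotating
    # perception around the ring. `infer_velocities` maps a
    # 16-reading vector to the 7 learnt trajectory velocities
    # (one lookup per trajectory FAM).
    return {k: infer_velocities(rotated_readings(sonar, k))
            for k in range(16)}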


5. Experimental Results

We conducted a number of learning experiments in the four structured environments shown in Figure 18 (a to d). These environments were primarily used for testing the accuracy of behaviors, comparing the performance of different FAM configurations and for

observing the effects of adjusting the velocity threshold parameter. The unstructured lab environment, Figure 18 (e & f), was used to test the robot's ability to acquire competences in indoor environments where a diverse range of input vectors are possible and features exist that are difficult for sonar sensors to detect.


Figure 18. Environments used to perform TVL robot experiments. (a) - (d) Structured environments. (e) - (f) The robot laboratory.

Prior to each learning trial we initialized all FAM velocity entries to 0.1 m/s and the velocity threshold to 85% of trajectory maximums. To learn behaviors within each of these environments we engaged the robot in exploratory learning (as explained in Section 4.1) and monitored the robot's competence by periodically switching the robot's behavior to object avoidance and wall following to see if the robot could perform these behaviors without colliding with walls or objects. To determine the amount of learning occurring within the robot at any moment we measured the average change of all the velocity entries within the FAM matrices per time step for each minute of learning time, i.e.

\Delta V = \frac{\sum_{i=1}^{n} |v_i - v_{i-1}|}{n} \qquad (4)

where |v_i − v_{i−1}| is the total change occurring in all FAM velocity entries during each time-step and n is the number of time steps occurring in each minute of learning.

Because we initially set all FAM entries to low velocities (i.e. 0.1 m/s), the robot's increasing competence can be measured by monitoring the robot's average velocity in addition to its collision rate.

5.1 Single FAM TVL Experiments

We conducted a number of TVL experiments within the structured environments shown in Figure 18 with the single FAM configuration described in Section 3.1. Generally, the robot acquired competent behaviors in less than 15 minutes of learning time. However, some trials took up to 20% longer than others due to the random manner by which trajectories were selected during learning. Often, this resulted in similar trajectories being traversed many times before unfamiliar trajectories were experienced. The graphs in Figure 19 show the average learning rate and acquired competence of 5 trials conducted in the circular, square and corridor environments.


Figure 19. Learning curves and performance measures for the circular, square and corridor environments with the single FAM configuration.

Although the robot generally acquired the ability to avoid collisions quickly (e.g. less than 9 min in the circular and square environments), inappropriate trajectories were regularly selected (when performing behaviors) until almost all FAM entries ceased to change. This caused the robot to turn too early, too late or too much in order to avoid collisions or get nearer to walls. Also, for all trials the final competence acquired from the square environment worked reasonably well in the circular environment and vice versa. Although this demonstrates fast learning in simple environments as well as the generalizations possible with the single FAM configuration, the robot's coarse perception appeared to limit the robot's ability to maintain parallel paths to walls and consistent wall clearance distances while performing the wall following behavior. Figure 20 shows typical wall following paths exhibited by the robot in the circular, square and corridor environments.

(a) (b) (c)

Figure 20. Wall following behavior within (a) circular, (b) square and (c) corridor environments.

Adjusting the robot's velocity threshold after learning (as explained in Section 2.2) produced significant changes to the average velocity and wall clearance distances while performing object avoidance and wall following behaviors. The results are shown by the graphs in Figure 21.

Figure 21. Relationship between trajectory velocity threshold and: (a) the average wall clearance distance, (b) the average velocity, when performing wall following in the circular and square environments (dashed lines indicate collision prone behavior).

Varying the threshold from 65% to 95% of trajectory maximums resulted in a corresponding change in the wall clearance distance of between 0.32 m and 0.67 m at an average velocity of 0.32 m/s to 0.45 m/s. We found following walls at closer distances to be possible with lower thresholds; however, the inability of the single FAM robot to maintain consistent wall clearances resulted in increased collisions as the threshold was further reduced.

We conducted goal seeking trials with the environment shown in Figure 22. This was done by firstly placing objects at set positions in the environment and performing learning until competent object avoidance behavior was produced. The robot was then engaged in goal seeking behavior by providing it with an instruction to follow fast trajectories toward a set goal location (as described in Section 2.1). Relative positions of the robot and goal were maintained by using odometry. Although this provided only limited accuracy, we found it sufficient to demonstrate the acquired goal seeking capability of the robot within the environment due to the short distances involved and the brief duration of the trials. Because of the greater number of input vectors possible within environments containing obstacles, learning took approximately 50% longer than in the same environment without objects.

Figure 22. Goal seeking paths exhibited by the robot. (a) - (c) Goals successfully achieved. (d) Unsuccessful goal seeking attempt.

All goals depicted in Figure 22 were successfully found without collisions except for the case shown in Figure 22(d). This situation produced a local minimum despite the gaps between the objects being wide enough for the robot to pass. Examination of the sonar data revealed this to be due mainly to the robot's coarse perception, which makes narrow gaps hard to resolve.

When learning trials were attempted in the laboratory environment (with the structured artifacts removed) the single FAM configuration had considerable difficulty learning how to avoid all collisions. Further examination of the sonar data and FAM contents indicated two reasons for this: (1) some objects such as the sharp edges of tables are difficult for a sonar ring to detect when approached from certain angles, as shown in Figure 23, and (2) by having coarsely grouped sonars the robot had difficulty differentiating its orientation with respect to nearby objects. For example, in some positions the robot could be turned as much as 15 degrees with respect to nearby objects without any change occurring to the input vector. Hence, some input vectors became associated with fast velocities by experiencing near misses during learning. When the same input vectors occurred while performing behaviors, the fast learnt trajectories sometimes led to collisions with objects. (Note: we found this problem could be improved to some extent by using the sonar sensors to detect collision points and defining a collision to be further from objects when learning than when performing behaviors.)
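A hypothetical illustration of this asymmetric collision margin follows; the margin distances are invented for illustration and are not the values used in our experiments.

```python
# Hypothetical illustration of the asymmetric collision margin: a sonar
# reading counts as a "collision" at a larger range while learning than
# while performing, so near misses are not learnt as fast trajectories.
# Both margin values below are assumptions for illustration only.

LEARNING_MARGIN = 0.30    # metres: collision range used while learning
PERFORMING_MARGIN = 0.10  # metres: collision range used while performing

def is_collision(min_sonar_range, learning):
    margin = LEARNING_MARGIN if learning else PERFORMING_MARGIN
    return min_sonar_range <= margin
```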

Figure 23. Some environment features that are hard for a sonar ring to detect.

5.2 Multi-FAM TVL Experiments

To improve the robot's coarse perception caused by having coarsely grouped sonar sensors, we provided the robot with 7 FAM matrices as described in Section 4.2. Although all the FAMs in this configuration (except FAM T0) are considerably larger than the single FAM configuration (e.g. trajectory T1L with the multi-FAM has 31,250 entries as opposed to 720 for the single FAM), the learning time needed to produce competent behaviors was only 10% to 20% longer within the same environments. For example, Figure 24 compares the average learning times from 5 trials in the square environment with both the single and multiple FAMs being trained simultaneously from the same training data.

Figure 24. Average learning time and collision rate for single and multiple FAM configurations within the square environment.

We found the relatively small difference in learning times occurred because the robot generates large numbers of training patterns during learning. Many of these redundantly map to the same FAM entries in the single FAM, whereas they tend to become distributed over more FAM entries in the multiple FAMs. Although significant fluctuations in the multi-FAM's velocity entries continued to occur with ongoing learning, this appeared to have no effect on the robot's behaviors.

We found the robot's capabilities to be greatly improved with the multi-FAM configuration. Narrow spaces could be negotiated with relative ease, there were fewer collisions and the robot exhibited more precise motion after similar learning periods. Figure 25 shows typical wall following pathways exhibited by the multi-FAM robot in the circular, square and corridor environments.

(a) (b) (c)

Figure 25. Wall following behavior within (a) circular, (b) square and (c) corridor environments with the multi-FAM configuration.

The ability of the robot to negotiate closely spaced objects was also greatly improved. Goals that were out of reach of the single FAM robot due to the presence of narrow gaps (e.g. Figure 22(d)) could now be successfully reached by the multi-FAM robot, as shown in Figure 26(a). However, where environments contained a goal in a narrow passageway, like that shown in Figure 26(b), the robot would fail to enter the passageway and reach the goal if the robot did not happen to venture into the passageway during learning. To obtain consistent results (in Figure 26(b)) over 5 trials, we always commenced learning by starting the robot close to the passage entrance and by facing it toward the goal. Thus, the robot would first learn faster trajectories at the entrance and within the passageway before proceeding on to learn the rest of the environment.

Figure 26. Goal seeking behavior with the multi-FAM robot. (a) Obtaining a goal that the single FAM robot could not reach. (b) Negotiating narrow passages to reach a goal.

Similarly, learning trials conducted within the laboratory (with the structured artifacts removed) also demonstrated improved performance when compared with that of the single FAM configuration. The average learning time and acquired competence of 5 consecutive trials for the single and multiple FAM configurations are shown in Figure 27.

Figure 27. Comparison of learning times and competence measures performed in the laboratory environment with the single and multiple FAM configurations.

Although some change was still occurring to the FAMs after 60 minutes of learning, the robot's behaviors appeared to stop improving approximately 45 minutes after learning commenced. During this time the robot's average velocity at performing behaviors increased from 0.1 m/s to 0.5 m/s. Figure 28 shows typical wall following and object avoidance paths exhibited by the robot in the lab after 45 minutes of learning.

(a) (b)

Figure 28. Typical paths exhibited by the multi-FAM robot after 45 minutes of learning in the laboratory environment: (a) object avoidance, (b) wall following.


Despite the improved perception provided by the multi-FAM configuration, the robot still experienced occasional collisions even after 60 minutes of learning. This was due to the presence of object features which are difficult to detect using sonar sensors (as explained in Section 5.1). We also found that when ongoing learning was done in environments that contained many features that are hard for sonar sensors to detect, the robot would actually reduce the speed of its free space motion when performing its behaviors. This occurs because an increased number of training patterns with larger range readings tended to become associated with slower velocities.

Although this may appear to corrupt the learning process by making the robot appear slower and less competent at performing its behaviors, this can in fact be considered an appropriate way for an adaptive agent to respond to a difficult environment. For example, if an adaptive life-form were to experience regular collisions with objects which are hard for its senses to detect, cautious motion (as well as fear of collision and reduced competence) would similarly become exhibited. Furthermore, when the undetectable features were physically tagged to make them visible to the sonar ring, further learning resulted in restoration of the robot's speed and competence.

We also conducted experiments in more cluttered environments using the same learning algorithm. However, learning times and final competences were both too inconsistent and relatively poor for any worthwhile results to be reported. Even after hours of learning, the robot was often unable to negotiate much of its environment no matter what behavior was selected. The difficulty with learning cluttered environments was due mainly to our trajectory learning algorithm confining the robot to small sections of the environment during learning. Thus, the robot would often fail to learn many unique features and narrow passages because they were not experienced during learning. With further work, we hope to overcome this by implementing learning algorithms that base trajectory selection (while learning) on unfamiliar environment regions, or alternatively, by engaging the robot in obstacle avoidance (or wandering) behavior and by using accumulated sensor data and odometry to estimate trajectory collision points for the purpose of adaptively learning trajectory velocities.

6. Summary and Conclusion

Although various unassisted robot learning methods have been around for some time now, they generally result in long learning times when used to learn behaviors on real robots. The main reasons for this are the credit assignment problem in reinforcement learning methods and the robot fitness evaluation problem in evolutionary or classifier techniques.

To avoid the credit assignment problem and the fitness evaluation problem in existing unassisted robot learning methods, we have developed a mobile robot learning technique based on learning associations between sensor range readings and appropriate trajectory velocities. Appropriate trajectory velocities are determined in relation to trajectory collision distances with nearby objects. This enables the robot to learn to perceive its environment in terms of trajectory velocities, so that trajectories that collide with nearby objects are perceived to be slower than those which lead into free space.
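As a minimal illustration of this idea (with an assumed linear ramp and assumed constants rather than the exact calculation used in our experiments), an appropriate velocity could be derived from a trajectory's collision distance as follows:

```python
# Illustrative sketch only: velocity grows with the distance the robot can
# travel along a trajectory before colliding, saturating at v_max. The
# linear ramp and the constants v_max and d_max are assumptions for
# illustration, not the exact calculation used in the experiments.

def appropriate_velocity(collision_distance, v_max=0.6, d_max=2.0):
    """Map a trajectory's collision distance (metres) to a velocity (m/s)."""
    return v_max * min(collision_distance, d_max) / d_max
```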

To learn trajectory velocities the robot has to detect the collision points of engaged trajectories, from which appropriate velocities are calculated. These are then associated with the sensor data that occurred along the negotiated trajectory and used to train fuzzy associative memories. For the purpose of conducting our experiments, collision points were detected by following engaged trajectories until a collision or full circle occurred. Once the robot has sufficiently learnt to perceive its environment in terms of trajectory velocities, object avoidance, wall following and goal seeking behaviors can be performed by giving the robot instructions like "follow fast trajectories nearest to the forward direction", "follow fast trajectories nearest to the closest object" and "follow fast trajectories nearest to the goal location" respectively.
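All three instructions reduce to one selection rule applied with different reference directions. The sketch below makes this concrete; representing the perceived trajectories as a mapping from facing direction to velocity is an assumption made for illustration.

```python
# Sketch of the shared selection rule behind the three instructions,
# assuming perceived_velocities maps each trajectory's facing direction
# (radians, robot frame) to its FAM-perceived velocity.

import math

def select_trajectory(perceived_velocities, threshold, reference_direction):
    """Pick the fast trajectory nearest the reference direction:
    0.0 (forward) for object avoidance, the bearing of the closest object
    for wall following, or the bearing of the goal for goal seeking."""
    fast = {d: v for d, v in perceived_velocities.items() if v >= threshold}
    if not fast:
        return None  # no fast trajectory: trigger the dead-end escape

    def angular_offset(d):
        # magnitude of the smallest angle between d and the reference
        return abs(math.atan2(math.sin(d - reference_direction),
                              math.cos(d - reference_direction)))

    return min(fast, key=angular_offset)
```

With this rule, switching behaviors requires changing only the reference direction, which is why the three behaviors can share the same learnt trajectory velocities.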

We tested this learning technique on a Yamabico mobile robot equipped with 16 sonar sensors and a bump sensor. We provided the robot with 7 trajectory commands. Therefore, the robot's learning task was to learn a mapping between its sonar sensors and 7 trajectory velocities. To effectively learn sensor-velocity associations, FAM (Fuzzy Associative Memory) matrices (Kosko, 1992) were used and trained via compositional rule inference (Sudkamp and Hammell, 1994).


Our experiments demonstrated that the robot was able to effectively learn uncluttered environments and could produce all behaviors in relatively short periods of time (10-15 min. in structured environments, 30-50 min. in the lab environment). Attempts to learn cluttered environments were less successful. Many obstacles would tend to confine the robot to small sections of the environment during learning. This prevented the robot from learning many unique features in the environment.

Although more work is required before trajectory velocity learning can be applied to robots for performing useful tasks, the experiments described in this paper demonstrate that this approach to learning certain mobile robot behaviors has considerable benefits and cost savings. It can enable mobile robots equipped with range sensing devices to acquire object avoidance, wall following and goal seeking behaviors simultaneously and relatively quickly in unknown environments. At the same time, the robot also learns to appropriately control its velocity. The object clearance distance of all behaviors can be adjusted by changing a single velocity threshold parameter. Dead-end escape behavior can also be incorporated into the robot by rotating the robot's perception within the sonar ring when all immediate trajectories are perceived to be below the velocity threshold. TVL has the disadvantage that it requires range sensors or range readings derived from other types of sensors. Also, TVL is limited to learning behaviors that can be described in terms of fast trajectories near to some predefined criteria. For this reason, when it comes to learning complex tasks with robots, TVL can only be expected to facilitate other learning techniques or control architectures by providing a means of rapidly acquiring certain low level behaviors.

References

Abe, S. and Lang, M.S. 1993. A classifier using fuzzy rules extracted directly from numerical data, Proc. 2nd IEEE Int. Conf. on Fuzzy Systems, pp. 1191-1198.

Asada, M., Noda, S., Tawaratsumida, S. and Hosoda, K. 1995. Vision-based reinforcement learning for purposive behavior acquisition, IEEE Int. Conf. on Robotics and Automation, pp. 146-153.

Castellano, G., Attolico, G., Stella, E. and Distante, A. 1996. Automatic generation of rules for a fuzzy robotic controller, Proc. IEEE IROS '96, pp. 1179-1186.

Connell, J. and Mahadevan, S. 1992. Rapid task learning for real robots, in Robot Learning by J. Connell, Kluwer Academic Publishers.

Floreano, D. and Mondada, F. 1994. Automatic creation of an autonomous agent: genetic evolution of a neural-network driven robot, in From Animals to Animats III, Cambridge, MA: MIT Press.

Floreano, D. and Mondada, F. 1996. Evolution of homing behavior in a real mobile robot, IEEE Transactions on Systems, Man, and Cybernetics, 26(3):396-407.

Floreano, D. and Mondada, F. 1995. Evolution of neural control structures: some experiments on mobile robots, Robotics and Autonomous Systems, 16:183-195.

Fox, D., Burgard, W. and Thrun, S. 1997. The dynamic window approach to collision avoidance, IEEE Robotics & Automation Magazine, 4(1):23-33.

Grefenstette, J. and Schultz, A. 1994. An evolutionary approach to learning in robots, in Proc. of the Machine Learning Workshop on Robot Learning, New Brunswick, NJ.

Guez, A. and Selinski, J. 1988. A trainable neuromorphic controller, Journal of Robotic Systems, August.

Holland, J.H. 1986. Escaping brittleness: the possibilities of general purpose learning algorithms applied to parallel rule-based systems, in Machine Learning, An Artificial Intelligence Approach (Vol. II), Los Altos: Morgan Kaufmann.

Homaifar, A. and McCormick, E. 1995. Simultaneous design of membership functions and rule sets for fuzzy controllers using genetic algorithms, IEEE Trans. on Fuzzy Systems, 3(2):129-138.

Jakobi, N. 1994. Evolving sensorimotor control architectures in simulation for a real robot, Masters thesis, University of Sussex, School of Cognitive and Computer Sciences.

Jakobi, N., Husbands, P. and Harvey, I. 1992. Noise and the reality gap: the use of simulation in evolutionary robotics, Proc. of the 3rd Int. Conf. of Artificial Life, Springer, Berlin, pp. 110-119.

Kaelbling, L.P. 1995. An adaptive mobile robot, Proc. of the 1st European Conf. on Artificial Life, pp. 41-47, Cambridge, MA, MIT Press.

Kaelbling, L. 1995. Reinforcement learning in embedded systems, MIT Press.

Kaelbling, L.P. 1996. Reinforcement learning: a survey, Journal of Artificial Intelligence Research, 4:237-285.

Kalmar, Z., Szepesvari, C. and Lorincz, A. 1998. Module-based reinforcement learning: experiments with a real robot, Autonomous Robots, 5(3/4):273-295.


Khatib, O. 1985. Real-time obstacle avoidance for manipulators and mobile robots, Proc. of the IEEE Int. Conf. on Robotics and Automation, pp. 500-505, St. Louis.

Kosko, B. 1992. Neural Networks and Fuzzy Systems, Englewood Cliffs, NJ: Prentice Hall, Inc.

Koza, J. 1991. Evolution and co-evolution of computer programs to control independent acting agents, Proc. of the 1st Int. Conf. on the Simulation of Adaptive Behavior, Cambridge, MA, MIT Press/Bradford Books.

Kraft, L.G. and Campagna, D. 1990. A summary comparison of CMAC neural networks and traditional adaptive control systems, Neural Networks for Control, MIT Press, Cambridge, MA.

Krose, B.J.A. and Eecen, M. 1994. A self-organizing representation of sensor space for mobile robot navigation, IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 9-14.

Lin, Y. and Cunningham, G.A. 1995. A new approach to fuzzy-neural system modeling, IEEE Transactions on Fuzzy Systems, 3(2):190-197.

Mataric, M.J. 1997. Reinforcement learning in the multi-robot domain, Autonomous Robots, 4(1).

Mataric, M.J. and Cliff, D. 1996. Challenges in evolving controllers for physical robots, Robotics and Autonomous Systems, 19:67-83.

Michaud, F. and Mataric, M.J. 1998. Learning from history for behavior-based mobile robots in non-stationary conditions, Autonomous Robots, 5(3/4):335-354.

Meeden, L.A. 1996. An incremental approach to developing intelligent neural network controllers for robots, Trans. on Systems, Man and Cybernetics, 26(3):474-485.

Nehmzow, U., Smithers, T. and Hallam, J. 1990. Steps toward intelligent robots, DAI Research Paper No. 502, Dept. of Artificial Intelligence, Edinburgh University.

Nolfi, S., Floreano, D., Miglino, O. and Mondada, F. 1994. How to evolve autonomous robots: different approaches in evolutionary robotics, Proc. of Artificial Life IV, pp. 190-197.

Pomerleau, D.A. 1993. Neural Network Perception for Mobile Robot Guidance, Kluwer Academic Publishers.

Schmidhuber, J.H. 1996. A general method for multi-agent learning and incremental self-improvement in unrestricted environments, in Evolutionary Computation: Theory and Applications, Scientific Publishing Co., Singapore.

Schultz, A.C. 1991. Using a genetic algorithm to learn strategies for collision avoidance and local navigation, Proc. of the 7th Int. Symposium on Unmanned, Untethered, Submersible Technology, Durham, NH, pp. 213-225.

Sharkey, N.E. 1998. Learning from innate behaviors: a quantitative evaluation of neural network controllers, Autonomous Robots, 5(3/4):317-334.

Sudkamp, T. and Hammell, R.J. 1994. Interpolation, completion and learning fuzzy rules, IEEE Trans. on Systems, Man and Cybernetics, 24(2):332-342.

Singh, S.P., Barto, A.G., Grupen, R.A. and Connolly, C.I. 1993. Robust reinforcement learning in motion planning with harmonic functions, Proc. of NIPS '93.

Tani, J. and Fukumura, N. 1994. Learning goal directed sensory-based navigation of a mobile robot, Neural Networks, 7(3):553-563.

Turksen, I.B. 1992. Fuzzy second generation expert system design for IE/OR/MS, Proc. of IEEE Int. Conf. on Fuzzy Systems, pp. 797-786.

Watkins, C.J. 1989. Learning from Delayed Rewards, PhD thesis, King's College, Cambridge, UK.

Ward, K. and Zelinsky, A. 1997. An exploratory robot controller which adapts to unknown environments and damaged sensors, Int. Conf. on Field and Service Robots, pp. 477-484, Canberra, Australia.

Yuta, S., Suzuki, S. and Iida, S. 1991. Implementation of a small size experimental self-contained autonomous robot: sensors, vehicle control and description of sensor behavior, Proc. of 2nd Int. Symposium on Experimental Robotics, pp. 25-27, Toulouse, France.

Wyeth, G. 1998. Training a vision guided mobile robot, Autonomous Robots, 5(3/4):381-394.

Zadeh, L.A. 1992. Outline of a new approach to the analysis of complex systems and decision processes, IEEE Trans. on Systems, Man and Cybernetics, 3:779-786.

Zelinsky, A. and Kuniyoshi, Y. 1996. Learning to coordinate behaviors in mobile robots, Journal of Advanced Robotics, 10(2):143-159.


Koren Ward received her BCompSc degree in Computer Science from the University of Wollongong in 1995. Currently she is a PhD candidate at the University of Wollongong working on the development of rapid alternatives to existing robot behaviour learning methods. Her research interests include knowledge discovery from data, internet robotics and human-computer interfaces.

Alexander Zelinsky was born in Wollongong, Australia in 1960. He worked for BHP Information Technology as a Computer Systems Engineer for 6 years before joining the University of Wollongong, Department of Computer Science as a Lecturer in 1984. Since joining Wollongong University he has been an active researcher in the robotics field and obtained his PhD in robotics in 1991. He spent nearly 3 years (1992-1995) working in Japan as a research scientist with Prof. Shinichi Yuta at Tsukuba University, and Dr. Yasuo Kuniyoshi at the Electrotechnical Laboratory. In March 1995 he returned to the University of Wollongong, Department of Computer Science as a Senior Lecturer. In October 1996 he joined the Australian National University, Research School of Information Science and Engineering as Head of the Robotic Systems Laboratory, where he is continuing his research into mobile robotics, co-operative multiple robots and human-robot interaction. In January 2000 Dr. Zelinsky was promoted to Professor and Head of Systems Engineering at The Australian National University. Prof. Zelinsky is a member of the IEEE Robotics and Automation Society and is currently President of the Australian Robotics & Automation Association.