

Robotics and Autonomous Systems 43 (2003) 109–120

Anchoring of semiotic symbols

Paul Vogt, Institute for Knowledge and Agent Technology, Universiteit Maastricht, P.O. Box 616, 6200 MD Maastricht, The Netherlands

Abstract

This paper presents arguments for approaching the anchoring problem using semiotic symbols. Semiotic symbols are defined by a triadic relation between forms, meanings and referents, thus having an implicit relation to the real world. Anchors are formed between these three elements rather than between 'traditional' symbols and sensory images. This allows an optimization between the form (i.e. the 'traditional' symbol) and the referent. A robotic experiment based on adaptive language games illustrates how the anchoring of semiotic symbols can be achieved in a bottom-up fashion. The paper concludes that applying semiotic symbols is a potentially valuable approach toward anchoring.
© 2003 Elsevier Science B.V. All rights reserved.

Keywords: Anchoring problem; Symbol grounding problem; Physical grounding; Adaptive language games; Semiotics

1. Introduction

The symbol grounding problem, which deals with the question of how symbols can be used meaningfully [8], is one of the hardest problems in AI and robotics. As many robotic applications use symbols for reasoning, problem solving and communication, solutions to this problem are extremely important for robotics research and development. But symbol grounding is also an important problem in studying the foundations of cognition, such as the evolution of language, as human language is primarily symbolic [7].

Recently, a formalized solution for the technical aspect of the symbol grounding problem has been proposed under the name of anchoring [5]. Anchoring concentrates on constructing and maintaining a relation between a symbol and a sensory image that is acquired from observing a physical object. Symbol grounding is, in addition to anchoring, also concerned with 'anchoring' abstractions and, more fundamentally, with philosophical issues relating to the meaning of symbols.

E-mail address: [email protected] (P. Vogt).

Many attempts to tackle the anchoring problem start with the design of predefined symbol systems that have predefined anchors to relate symbols with visual percepts [5,13]. Recently, an increasing number of attempts have been made to approach the anchoring problem from the bottom up, in which robots develop their symbolic representations during their evolution, be it phylogenetic and/or ontogenetic. These attempts often relate to the development of symbolic communication [2,12,16,22,24].

The common approach to tackle the anchoring problem focuses on the development (hand-coded or learned) of anchors between symbols and sensory images [5]. This is a difficult problem since the robots have to deal with the object constancy problem: when viewing an object from different locations, the sensory images relating to this object differ enormously, because the size of the projection may differ or because the object may be obscured. Humans are well capable of dealing with object constancy, but it is unclear how this works. One approach to tackle

0921-8890/03/$ – see front matter © 2003 Elsevier Science B.V. All rights reserved. PII: S0921-8890(02)00353-6


the problem of object constancy would be to develop anchors between symbols and the real-world object, rather than between symbols and sensory images.

This paper proposes that the anchoring problem can be solved in terms of semiotic symbols, which have implicit anchors in the real world [22]. An experiment based on Steels' [14] language game model illustrates how anchors in these semiotic symbols may be constructed from the bottom up through the use of language. In addition, it is discussed how the presented language game model may explain the cognitive phenomenon of family resemblance [23].

The paper is organized as follows: Section 2 presents the notion of semiotic symbols and discusses some of the requirements for anchoring them. The experimental setup is presented in Section 3. Section 4 presents the experimental results. Discussions of the issues raised in the paper are presented in Section 5. Conclusions are given in Section 6.

2. The anchors of semiotic symbols

In this section, I will define the notion of semiotic symbols as opposed to the definition of symbols that is commonly used in AI. As I will argue below, semiotic symbols have implicit anchors between some internal structures and reality. Finally, I will discuss under what conditions semiotic symbols may emerge.

The definition of semiotic symbols is adopted from Peirce [11], who defined a semiotic symbol in terms of a sign, which in semiotics is a relation between a referent, meaning and form.1 These three elements can be described as follows:

Form. A form (or word) is the shape of the sign, which is not necessarily material.

Meaning. The meaning is the sense that is made of the sign.

Referent. A referent is the object that the sign stands for, which may include abstractions, actions or other signs.

1 Peirce called this a symbol rather than a semiotic symbol. I call it a semiotic symbol to distinguish it from the definition of a symbol commonly used in AI and some other disciplines of cognitive science, which is similar to the form of the semiotic symbol. In addition, Peirce used the terms representamen, interpretant and object where I use the terms form, meaning and referent.

Fig. 1. The semiotic triangle illustrates the relations between referent, form and meaning that constitute a sign. Each line is an anchor, but the dotted line indicates that the relation between a form and a referent need not be a physical anchor, which must be established between referent and meaning, and between meaning and form.

The relation between the referent, form and meaning is often illustrated with the semiotic triangle [10], as shown in Fig. 1. According to Peirce, a sign becomes a (semiotic) symbol when its form, in relation to its meaning, is arbitrary or conventionalized so that the relationship has to be learned; otherwise the sign is either an icon or an index.
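As an illustration only, the triadic structure of a sign can be written down as a minimal data structure. The names and example values below are hypothetical, not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class SemioticSymbol:
    """A sketch of Peirce's triad as used in this paper: the three elements
    are linked by anchors (see Fig. 1); the form-referent link is implicit."""
    form: str        # the word shape, e.g. an arbitrary consonant-vowel string
    meaning: tuple   # an internal category, e.g. a prototype feature vector
    referent: str    # a label for the real-world entity (illustration only)

# hypothetical example: a form naming light source L1 via a prototype
sign = SemioticSymbol(form="wabaku", meaning=(1.0, 0.3, 0.2, 0.1), referent="L1")
print(sign.form, "->", sign.referent)
```

The point of the structure is that the form relates to the referent only through the meaning, mirroring the dotted line of the semiotic triangle.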

A semiotic symbol becomes meaningful when it is constructed and used functionally by an agent, which conforms with Wittgenstein [23]. As such, the meaning arises from the interaction of an agent that uses a form with the referent. Elsewhere, I have argued that the symbol grounding problem as presented by Harnad is no longer relevant when we adopt semiotic symbols, because these are per definition grounded, as their meanings have intrinsic relations with their referents [22]. This, however, does not solve the symbol grounding problem, but translates it into another, more technical, problem, which I have coined the physical symbol grounding problem.2

The physical symbol grounding problem is related to the anchoring problem in that it aims at constructing and maintaining anchors between symbols (i.e., the forms in semiotic symbols) and reality. Coradeschi and Saffiotti's [5] description of anchoring, however, focuses on anchors between forms and sensory data. As the sensory data is acquired from a robot's interaction with its environment, the forms relate to the real world. The anchors, however, are not necessarily

2 This problem is coined the physical symbol grounding problem to indicate that semiotic symbols provide a way to approach symbol grounding with the physical grounding hypothesis [4], as the semiotic symbols themselves form a coupling between the environment and an agent's behavior and thus are physically grounded.


constructed to maintain a relation with the real-world entity, but rather with the sensory image of this entity. The physical symbol grounding problem, on the other hand, does focus on constructing and maintaining a relation with the real world by constructing anchors between forms and real-world entities, mediated by anchors between forms and meanings, and between meanings and referents. In addition, where anchoring relates forms to sensory images (and thus to the sensing of physical objects), the physical symbol grounding problem is not restricted to constructing semiotic symbols about physical objects, but also includes abstractions, movements and even other semiotic symbols.

The development of semiotic symbols depends on how an agent interacts with its environment. When the semiotic symbols are used in language, the way the meaning is constructed depends on how it is used [23]. However, the meaning of semiotic symbols must also have a part that can be memorized, which can be represented in terms of prototypical categories. When mediating on the meaning of a semiotic symbol, agents must confer to a similar meaning. Hence they must try to find a common way to name the meaning. It is not unlikely that this requires the agents to construct similar representations of the meanings they use. In addition, the construction of semiotic symbols should be adaptive, because it may be impossible to design 'static' anchors that apply to the dynamic interactions of a robot with its environment [9]. An adaptive approach to constructing semiotic symbols allows robots to create new anchors when none exist or when existing ones are insufficient. As a result, I assume that a semiotic symbol can have multiple meanings (or prototypes) to stand for a referent in relation to a form. These different meanings of a semiotic symbol will then be used to interpret a referent on different occasions. To achieve such a development of semiotic symbols in communication, I assume that the meanings co-develop with linguistic forms [3] by means of cultural interactions between agents and their environment [18].

The anchors between meanings and referents arise from the physical interactions between an agent and its environment. The meanings are anchored to linguistic forms through the production and interpretation of expressions. These physical anchors between referents, meanings and forms provide an implicit non-physical anchor between the forms and referents through their use in language (Fig. 1). The way these anchors are formed is influenced by the agents' interactions with their environment and individual adaptations as a self-organizing process [14].

For robots that develop semiotic symbols from the bottom up, the above requires that robots are capable of interacting with their environment, including each other. Furthermore, they have to construct and memorize categorizations that provide anchors between the referents and the categories, such that these can be used appropriately in language. To use these in language, they also have to construct anchors between the categories and linguistic forms adaptively. How this can be modeled is explained in the next section.

3. Adaptive language games

To illustrate how a set of anchored symbols can be developed from the bottom up, an experiment is presented in which two mobile LEGO robots bootstrapped a symbolic communication system. To achieve this, the robots engaged in a series of adaptive language games [14,17] in which they tried to communicate the form that stands for an object and adapt their internal structures in order to improve their performance on later occasions. Various types of language games have been implemented, such as observational games, guessing games and selfish games, which differ from each other in the type of learning mechanism the robots use and in what non-verbal input they use to determine the reference of an utterance [19,20]. For the experiment of this paper, the robots played a series of guessing games. Below follows a technical description of the experimental setup.

3.1. The environment

In the experiment, two mobile LEGO robots were used that were equipped with light sensors, bumpers, active infrared, two motors, a radio module and a sensorimotor board; see Fig. 2. The light sensors were used to detect the objects in the robots' environment. The other sensors and the motors were used to process the physical behaviors of the robots.

The robots were situated in a small environment (2.5 m × 2.5 m) in which four light sources were placed


Fig. 2. The LEGO robots and a light source as used in the experiment.

at different heights. The light sources acted as the objects that the robots tried to name. The four light sensors of the robots were mounted at the same heights as the different light sources. Each sensor outputs its readings on a sensory channel. A sensory channel is said to correspond with a particular light source if the sensor has the same height as this light source. The goal of the experiment was that the robots developed a lexicon with which they could successfully name the different light sources.

3.2. Sensing, segmentation and feature extraction

Through the interactions of the robots with their environment, they obtained raw sensory data. In order to reduce the redundant information in this high-dimensional data, the robots transformed it into low-dimensional feature vectors. The process of acquiring feature vectors was done by sensing, segmentation and feature extraction. Each subsequent step reduced the amount of sensory data as if it were a sieve.

3.2.1. Sensing
A guessing game started when both robots were standing close to each other with their backs 'facing' each other.3 During the sensing phase, the robots rotated one by one 720° to obtain a spatial view of their environment. A spatial view contained the raw sensory data from the middle 360°, which can be written in the form of a matrix:4

3 In the original implementation, the robots aligned themselves autonomously [17], but to speed up the experiments, the robots were placed by hand for this experiment.

X = \begin{pmatrix} x_{1,1} & \cdots & x_{1,q} \\ \vdots & \ddots & \vdots \\ x_{n,1} & \cdots & x_{n,q} \end{pmatrix},   (1)

where each row represents the sensory data of the n sensory channels (four in the experiment) and the q measurements are given in the columns.5

The sensory data was sent to a stand-alone PC where all further processing took place off-line.

Fig. 3 shows the sensing of the two robots during a guessing game. The left figure shows that robot A clearly detected the four light sources; there appears a 'winning' peak for every light sensor s_i that corresponds to one light source. The right figure shows that robot B did not sense all four light sources clearly and hence acquired a different view than robot A. This happened because the two robots were not located at the same position.

4 The robots rotated twice instead of once to ensure they rotated at a constant speed when the actual sensing started. This was done because the onset and offset of the movement induced a warped view, which in turn induced much noise for the segmentation.

5 Note that although the robots have more than four sensors, only the four light sensors are used to construct anchors.


Fig. 3. The sensing of robot A (left) and robot B (right) during a language game. The plots show the spatial view of the robots' environment. It was acquired during 360° of their rotation. The y-axis shows the intensity of the sensors, while the x-axis determines the time (or angle) of the sensing in PDL units. A PDL unit takes about 1/40 s; hence the total time of these sensing events took about 1.5 s for robot A and 1.3 s for robot B.

3.2.2. Segmentation
The segmentation phase extracted connected regions where the sensory data exceeded a threshold that represented the upper noise level of that sensor. These regions were supposed to be induced by the sensing of a light source. To accomplish this segmentation, the raw sensory input X was thresholded for noise, resulting in X′ = (x′_{i,j}) according to

x′_{i,j} = H(x_{i,j} − Θ_i),   (2)

where

H(x) = \begin{cases} x & \text{if } x \geq 0, \\ 0 & \text{if } x < 0 \end{cases}   (3)

and Θ_i represents the upper noise level of light sensor i, which was acquired empirically for each sensor.

Given the preprocessed sensory data X′, a segment S_k can be defined as the largest matrix

S_k = \begin{pmatrix} x′_{1,r} & \cdots & x′_{1,m} \\ \vdots & \ddots & \vdots \\ x′_{n,r} & \cdots & x′_{n,m} \end{pmatrix},   (4)

where in each column j there is at least one element for which x′_{i,j} > 0, for i = 1, ..., n and j = r, ..., m; and where 1 ≤ r < m < q. Note that the inequality r < m implies that the segments have to contain at least two measurements, to filter out further noise. When a segment was detected at the start of the view and another was detected at the end, both segments were concatenated.

Ideally, the segmentation resulted in a set that contained a segment for each light source. This set constituted what is called the context of the guessing game, i.e. Cxt = {S_1, ..., S_N}, where N is the number of segments that were sensed. Each robot participating in the guessing game acquired its own context, which could differ from the other's.
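The thresholding and segmentation steps of Eqs. (2)–(4) can be sketched roughly as follows. The function name and array layout are assumptions for illustration, not the paper's implementation; the wrap-around concatenation is handled only in the simple case where active regions touch both ends of the view:

```python
import numpy as np

def segment(X, theta):
    """Sketch of segmentation: X is an (n x q) array of sensor readings,
    theta an n-vector of per-sensor noise thresholds (assumed layout)."""
    Xp = np.maximum(X - theta[:, None], 0.0)   # Eqs. (2)-(3): x' = H(x - theta)
    active = (Xp > 0).any(axis=0)              # columns with signal on any channel
    segments, start = [], None
    for j, a in enumerate(active):
        if a and start is None:
            start = j
        elif not a and start is not None:
            if j - start >= 2:                 # r < m: at least two measurements
                segments.append(Xp[:, start:j])
            start = None
    if start is not None and X.shape[1] - start >= 2:
        segments.append(Xp[:, start:])
    # wrap-around: join a segment at the end of the view with one at the start
    if len(segments) >= 2 and active[0] and active[-1]:
        segments = [np.hstack([segments[-1], segments[0]])] + segments[1:-1]
    return segments
```

The returned list of arrays plays the role of the context Cxt = {S_1, ..., S_N}.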

3.2.3. Feature extraction
The feature extraction resulted in a feature vector f = (f_1, ..., f_n), where f_i = ϕ(S_k) was a function that normalized the maximum intensity of a sensory channel i to the overall maximum intensity within a segment S_k. That is, the maximum value in row i of the matrix S_k was normalized to the maximum value of the entire matrix. Mathematically, the function ϕ(S_k) is given by

ϕ(S_k) = \frac{\max_{j \in [r,m]}(x′_{i,j})}{\max_{S_k}(x′_{p,q})}.   (5)

This way the function extracted the invariant property that the feature of the sensory channel with the overall highest intensity inside a segment had a value of 1, whereas all other features had a value ≤ 1. Or, in other words, the feature with value 1 corresponded to the


light source the feature vector referred to. The space that spans all possible feature vectors f is called the n-dimensional feature space F = [0, 1]^n, or feature space for short.
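Eq. (5) amounts to dividing each channel's maximum within the segment by the segment's overall maximum. A minimal sketch (function name assumed):

```python
import numpy as np

def extract_features(Sk):
    """Sketch of Eq. (5): per-channel maxima normalized by the segment's
    overall maximum, so the strongest channel gets feature value 1."""
    row_max = Sk.max(axis=1)       # max over columns j in [r, m] for each row i
    return row_max / Sk.max()      # divide by the overall maximum of S_k

# toy segment: channel 0 peaks at 0.8 (overall max), channel 1 at 0.4
S = np.array([[0.2, 0.8], [0.1, 0.4]])
print(extract_features(S))         # features: channel 0 -> 1.0, channel 1 -> 0.5
```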

3.3. Discrimination game

Each robot played a discrimination game [15] to form a memorized representation of the meaning (or meaning for short) for each (potential) topic. A topic is a segment from the acquired context as described by its feature vector. The speaker selected its topic randomly from the context, and this topic became the subject of communication. As the hearer of a guessing game tried to guess what the speaker's utterance referred to, it had to consider all segments in its context as a potential topic. A discrimination game was successful when it resulted in one or more categories that distinguished the topic from all other segments in the context. When the robot failed to find such a category, the discrimination game failed and the robot expanded its ontology, in which the categories were stored. The discrimination game is a sequence of three processes: categorization, discrimination and adaptation.

3.3.1. Categorization
A category c = ⟨c, ν, ρ, κ⟩ was defined as a region in the feature space F, and it was represented by scores ν, ρ and κ and a prototype c = (y_1, ..., y_n), where y_i were the coordinates of the prototype in each of the n dimensions of F. The category was the region in F in which the points had the nearest distance to c. Each feature vector in the context was categorized using the 1-nearest neighbor algorithm [6]. So, feature vector f was categorized with that category c for which the prototype c had the smallest Euclidean distance ‖f − c‖.

In order to allow generalization and specialization of the categories, different versions of the feature space F_λ were available to a robot. In each space a different resolution was obtained by allowing each dimension of F_λ to be exploited up to 3^λ times, where λ = 0, ..., λ_max. How this was done will be explained in Section 3.3.3.

The use of different feature spaces allowed the robots to categorize a segment in different ways. The categorization of segment S_k resulted in a set of categories C_k = {c_0, ..., c_m}, where m ≤ λ_max.
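The 1-nearest-neighbor step is a plain arg-min over Euclidean distances; a small sketch under assumed array layouts:

```python
import numpy as np

def categorize(f, prototypes):
    """1-nearest-neighbor categorization: return the index of the prototype
    with the smallest Euclidean distance ||f - c|| to feature vector f."""
    dists = np.linalg.norm(prototypes - f, axis=1)
    return int(np.argmin(dists))

# two toy prototypes; a feature vector near the first is categorized as 0
protos = np.array([[1.0, 0.2], [0.2, 1.0]])
assert categorize(np.array([0.9, 0.1]), protos) == 0
```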

3.3.2. Discrimination
Suppose that a robot wants to find distinctive categories for (potential) topic S_t; then a distinctive category set, DC, can be defined as follows:

DC = {c_i ∈ C_t | ∀(S_k ∈ Cxt∖{S_t}) : c_i ∉ C_k}.   (6)

Or in words: the distinctive category set DC consists of all categories c_i of the topic S_t that are not a category of any other segment S_k in the context Cxt.
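Eq. (6) is a set difference between the topic's categories and the union of all other segments' categories; a sketch with assumed names:

```python
def distinctive_categories(topic_cats, other_cats_list):
    """Sketch of Eq. (6): the categories of the topic that categorize no
    other segment in the context (a plain set difference)."""
    others = set().union(*other_cats_list) if other_cats_list else set()
    return set(topic_cats) - others

# "c1" is distinctive for the topic; "c2" also categorizes another segment
assert distinctive_categories({"c1", "c2"}, [{"c2"}, {"c3"}]) == {"c1"}
```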

3.3.3. Adaptation
If DC = ∅, the discrimination game fails and the robot should adapt its ontology by constructing new categories. Suppose that the robot tried to categorize feature vector f = (f_1, ..., f_n); then new categories were created as follows:

(1) Select an arbitrary feature f_i > 0.
(2) Select a feature space F_λ that has not been exploited 3^λ times in dimension i, for λ as low as possible. If no such space can be found, the adaptation is stopped.
(3) Create new prototypes c_j = (y_1, ..., y_n), where y_i = f_i and the other y_r are made by combining the features from all existing prototypes in F_λ.
(4) Add the new prototypical categories c_j = ⟨c_j, ν_j, ρ_j, κ_j⟩ to the feature space F_λ, with ν = ρ = 0.01 and κ = 1 − (λ/λ_max).

The three scores ν, ρ and κ together constitute the meaning score µ = (1/3)(ν + ρ + κ), which was used in the naming phase of the guessing games. Although the influence of this score was small, it helped to select a form-meaning association in case of an impasse. Whereas κ was kept constant, ν and ρ were increased when the category was distinctive (ν) and when it was used successfully in the naming phase (ρ); they were lowered otherwise. Exact details of these updates can be found in [20].
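The meaning score is a simple average of the three category scores; a one-line sketch with the initial values from step (4) above:

```python
def meaning_score(nu, rho, kappa):
    """The combined meaning score mu = (1/3)(nu + rho + kappa) used to
    rank distinctive categories in the naming phase."""
    return (nu + rho + kappa) / 3.0

# a freshly created category at the coarsest layer (kappa = 1 - 0/lambda_max = 1)
print(meaning_score(0.01, 0.01, 1.0))
```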

If the distinctive category set DC ≠ ∅, the discrimination game was a success and the DC was forwarded to the naming phase of the guessing game. If a category c was used successfully in the guessing game, the prototype c of this category was moved toward the feature vector f of the topic:

c := c + ε · (f − c),   (7)

where ε = 0.1 is a constant step size with which the prototype moved towards f. This way the prototypes


became more representative samples of the feature vectors they categorized.
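The prototype update of Eq. (7) is a standard exponential move toward the observed feature vector; a minimal sketch:

```python
def move_prototype(c, f, eps=0.1):
    """Sketch of Eq. (7): shift prototype c a fraction eps toward the topic's
    feature vector f after a successful guessing game."""
    return [ci + eps * (fi - ci) for ci, fi in zip(c, f)]

print(move_prototype([0.0, 1.0], [1.0, 0.0]))  # -> [0.1, 0.9]
```

Repeated successful use thus pulls each prototype toward the running average of the features it categorizes.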

The discrimination game as implemented here differs from the implementation of Steels [15] mainly in the representation and construction of categories. Steels used binary trees to split up the sensory (or feature) channels rather than using prototypes. The reason for using prototypes is that the world as sensed by a robot is not binary, and splitting up categories in binary trees therefore seems inappropriate. In addition, Steels allowed categories to be formed in only one dimension or in any combination of the different feature dimensions, while in this implementation the categories were always n-dimensional.

It is important to realize that all processing up to this point was carried out by each robot individually. This way, the ontologies, contexts and distinctive category sets differed from robot to robot.

3.4. Production

After both robots obtained distinctive categories of the (potential) topic(s), the speaker tried to communicate its topic based on its lexicon. The lexicon L was defined as a set of form-meaning associations: L = {FM_i}, where FM_i = ⟨F_i, M_i, σ_i⟩ was a lexical entry. Word-form F_i was made from an arbitrary combination of consonants and vowels taken from the alphabet, meaning M_i was represented by some category, and association score σ_i ∈ ⟨0, 1⟩ was a real number that indicated the effectiveness of the lexical entry based on past interactions. Each form could be associated with multiple meanings, and each meaning could have associations with more than one form.

The speaker of the guessing game ordered the distinctive category set DC based on the meaning score µ. It selected the distinctive category with the highest meaning score and searched its lexicon for form-meaning associations of which the meaning matched this distinctive category. If it failed to find such an element, the speaker first considered the next best distinctive category from the ordered DC. If all distinctive categories were explored and still no entry was found, the speaker could invent a new form, as will be explained in Section 3.7.

If there were one or more lexical entries that fulfilled the above condition, the speaker selected the entry that had the highest association score σ. The form that was thus produced was uttered to the hearer. In the on-board implementation this was done using radio communication; off-line, the utterance was a shared variable.

3.5. Interpretation

On receipt of the utterance, the hearer searched its lexicon for entries for which the form matched the utterance and the meaning matched one of the distinctive categories of the potential topics. If it failed to find one, the lexicon had to be expanded, as explained in Section 3.7.

If the hearer found one or more entries, it selected the entry that had the highest score Σ = σ + α · µ, where α = 0.1 is a constant weight. The potential topic that was categorized by this meaning was selected by the hearer as the topic of the guessing game. That is, this segment was what the hearer guessed to be the subject of communication.
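The hearer's choice maximizes Σ = σ + α · µ over the matching entries; a sketch under assumed data layouts (a dict mapping (form, meaning) to σ, and per-topic distinctive categories with their meaning scores µ):

```python
def interpret(utterance, potential_dcs, lexicon, alpha=0.1):
    """Sketch of interpretation: among entries whose form matches the
    utterance and whose meaning is a distinctive category of some potential
    topic, pick the topic maximizing Sigma = sigma + alpha * mu."""
    best, best_topic = None, None
    for topic, dcs in potential_dcs.items():
        for meaning, mu in dcs.items():
            sigma = lexicon.get((utterance, meaning))
            if sigma is not None and (best is None or sigma + alpha * mu > best):
                best, best_topic = sigma + alpha * mu, topic
    return best_topic

lex = {("wabaku", "c1"): 0.7, ("wabaku", "c2"): 0.6}
# S1 wins: 0.7 + 0.1*0.5 = 0.75 beats S2's 0.6 + 0.1*0.9 = 0.69
assert interpret("wabaku", {"S1": {"c1": 0.5}, "S2": {"c2": 0.9}}, lex) == "S1"
```

The small weight α matches the remark above that the meaning score mainly breaks ties.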

3.6. Corrective feedback

The effect of the guessing games was evaluated by the corrective feedback. If the speaker had no lexical entry that matched a distinctive category, or if the hearer could not interpret the speaker's utterance because it did not have a proper lexical entry in the context of the game, then the guessing game was a failure. The guessing game was successful when both robots communicated about the same referent. So if the hearer interpreted the utterance and thus guessed the speaker's topic, the robots had to evaluate whether they communicated about the same referent.

In previous work there have been various attempts to implement the corrective feedback physically as a pointing behavior [17]. All these attempts, however, failed. In order not to focus too long on this problem, and to prove the principle, it was assumed for the time being that the robots could do this, and the verification was simulated. Naturally, this problem needs to be solved in the future.

The corrective feedback was simulated by comparing the feature vectors of the two robots relating to their topics. If the features with value 1 matched for both topics, this meant that the topics corresponded to the same referent and the guessing game was considered successful. If the hearer selected an inconsistent


topic during the interpretation, then there was a mismatch in referent and the guessing game failed.
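Since the feature with value 1 identifies the corresponding light source (Section 3.2.3), the simulated feedback reduces to comparing which channel carries the maximal feature in each robot's topic vector; a minimal sketch:

```python
import numpy as np

def same_referent(f_speaker, f_hearer):
    """Sketch of the simulated feedback: the game counts as successful when
    the channel holding feature value 1 (i.e. the corresponding light
    source) is the same in both robots' topic feature vectors."""
    return int(np.argmax(f_speaker)) == int(np.argmax(f_hearer))

assert same_referent([1.0, 0.4, 0.2, 0.1], [1.0, 0.2, 0.3, 0.2])
assert not same_referent([1.0, 0.4, 0.2, 0.1], [0.3, 1.0, 0.2, 0.2])
```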

3.7. Lexicon adaptation

Depending on the outcome of the game, the lexicons of the two robots were adapted. There were four possible outcomes/adaptations:

(1) The speaker had no lexical entry. In this case the speaker created a new form and associated it with the distinctive category it tried to name. This was done with a certain probability, which was kept constant during the experiment at P_s = 0.1.

(2) The hearer had no lexical entry. The hearer adopted the form uttered by the speaker and associated it with the distinctive categories of a randomly selected segment from its context.

(3) There was a mismatch in referent. Both robots adapted the association score σ of the used lexical entry by σ := η · σ, where η = 0.9 is a constant learning parameter. In addition, the hearer adopted the utterance and associated it with the distinctive categories of a different, randomly selected segment.

(4) The game was a success. Both robots reinforced the association score of the used entry by σ := η · σ + 1 − η. In addition, they lowered competing entries (i.e. entries for which either the form or the meaning was the same as in the used entry) by σ := η · σ. The latter update is called lateral inhibition.

Fig. 4. (Left) The CS and DS of the experiment. (Right) The evolution of the number of meanings and forms that were used successfully by the robots in one run of the experiment.

The coupling of the naming phase with the discrimination game and the sensing part ensures that the emerging lexicon is grounded in the real world. The robots successfully solve the physical symbol grounding problem in a given situation when the guessing game is successful, because only in those cases is a semiotic triangle (Fig. 1) constructed completely in a functional, and thus meaningful, sense.
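The score updates of outcomes (3) and (4) above, including lateral inhibition, can be sketched as follows. The dict-based lexicon layout is an assumption; the adoption of new entries by the hearer is omitted:

```python
def update_scores(lexicon, used, success, eta=0.9):
    """Sketch of the sigma updates: on success, reinforce the used entry
    (sigma := eta*sigma + 1 - eta) and laterally inhibit competitors that
    share its form or meaning (sigma := eta*sigma); on a mismatch in
    referent, only decay the used entry. `lexicon` maps (form, meaning)
    pairs to association scores sigma."""
    form, meaning = used
    for key, sigma in lexicon.items():
        if key == used:
            lexicon[key] = eta * sigma + (1 - eta) if success else eta * sigma
        elif success and (key[0] == form or key[1] == meaning):
            lexicon[key] = eta * sigma  # lateral inhibition of competitors

lex = {("wabaku", "c1"): 0.5, ("wabaku", "c2"): 0.5, ("tesuwo", "c3"): 0.5}
update_scores(lex, ("wabaku", "c1"), success=True)
print(lex)  # used entry rises, same-form competitor drops, unrelated entry unchanged
```

Repeated over many games, this reinforcement-plus-inhibition dynamic is what lets one form win the competition for each referent.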

4. Experimental results

An experiment was done for which the sensory data of the sensing phase during 1000 guessing games was recorded. From this data set it was calculated that the a priori chance of successful communication was 23.5% when the robots randomly chose a topic. Because the robots did not always detect all the light sources that were present, their contexts were not always coherent. This incoherence caused an upper limit to the success rate that could be reached, called the potential understandability, which was 79.5% on average.

The 1000 recorded situations were processed off-line on a PC in 10 runs of 10,000 guessing games. Fig. 4 (left) shows the average communicative success (CS) and discriminative success (DS) of the 10 runs. The CS measures the number of successful guessing games, averaged over the past 100 games. The DS measures the number of successful discrimination games, also averaged over the past 100 guessing games. As the figure shows, the DS reaches a value near 1 very fast. Hence, the robots were well capable of finding distinctive categories for the sensed light sources. The CS was somewhat lower. It increased towards a value slightly below 0.8 near the end. Since this is close to the potential understandability, the robots were capable of constructing a shared lexicon within its limits.

Fig. 5. The referent-form competition diagram (left) shows the competition between forms to name referent light source L1. The referent-meaning diagram (right) shows the competition between meanings to interpret light source L1. In both diagrams the y-axis shows the occurrence frequencies of successfully used forms or meanings over the past 200 games relative to the occurrence of the referent. The x-axis shows the number of games played.
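The CS and DS curves are simple moving averages of binary game outcomes over a sliding window of recent games; a minimal sketch (the function name is illustrative):

```python
from collections import deque

def running_success(outcomes, window=100):
    """Turn a sequence of binary game outcomes into a success-rate
    curve over the past `window` games, as used for CS and DS."""
    recent = deque(maxlen=window)  # drops the oldest outcome automatically
    rates = []
    for success in outcomes:
        recent.append(1 if success else 0)
        rates.append(sum(recent) / len(recent))
    return rates
```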

Fig. 4 (right) shows the number of different meanings and forms that were used at least once successfully in one run of the experiments. As the figure shows, the number of meanings used was much higher than the number of forms used. The robots used up to 450 meanings in relation to the four referents, while they used only 16 forms to name them. So, approximately 28 times more meanings than forms were used. Although the robots used about 450 meanings to distinctively categorize the four light sources, further analysis revealed that they only used about 20–25 meanings frequently. In addition, only six or seven forms were used regularly. So, the robots named each referent consistently with one or two forms.

The competition diagram of Fig. 5 (left) shows how the occurrence frequencies of the forms used to name one of the referents evolved during one run of the experiment. As this figure makes clear, the most frequently used form "tyfo" clearly won the competition to name light source L1. At the bottom of the diagram, other forms reveal a weak competition. Similar competitions can be observed for the other referents [20,22]. Fig. 5 (right) shows that the competition between meanings to categorize a referent is stronger, which would be expected given Fig. 4 (right). More experimental results can be found in [20,22].

5. Discussion

In this section, I will discuss why the notion of semiotic symbols is useful in relation to the anchoring problem. The discussion will be based on the observation that semiotic symbols can be constructed by optimizing the anchor between their forms and the objects they stand for, thus solving the object constancy problem. Furthermore, I will explain how the use of semiotic symbols can model the phenomenon of family resemblance.

In this paper, the ‘alternative’ definition of symbols as semiotic symbols is adopted to provide the possibility to construct anchors between symbols (or forms, as I call them) and the real world. But is there any advantage of using semiotic symbols over traditional symbols in relation to the anchoring problem? In the original anchoring problem [5], anchors are sought between symbols and perceptual features, while the symbols’ relations to the real-world objects are somewhat pushed to the background. The experiment in this paper revealed that it is the relation between the form and the real-world object that is being optimized in terms of a one-to-one relationship. The relation between the form and the sensory data (or even the categories) does not reveal this optimization. I do not argue that the relationship between form and sensory data is unimportant, but I do want to argue that the relation between form and referent is the one we should care about.

Before explaining why the relation between form and referent is crucial, I will elaborate on the importance of the relation between sensory images and forms. The processes between sensing and feature extraction are extremely important because they transform the raw sensory data into more manageable feature vectors that additionally bear some invariant information concerning the referents. In addition, the intermediate representations of categories are important to allow the optimization between form and referent, because the discrimination games function—like the sensing, segmentation and feature extraction—as a sieve. This sieve enables the robots to bind the numerous variations in the sensing to more informative granules that are less numerous. These granules are, although still numerous, more manageable than the raw sensory data, thus making it easier to close the coupling between referents, meanings and forms.

The optimization between referent and form, however, is the most dominant process for the construction of consistent anchors between these two elements. To understand how this optimization works, it is important to realize that the robots try to construct a lexicon that they can apply in different contexts. The lexicon is constructed through the interplay of adaptations under selective pressures and pragmatic language use. In the experiment, anchors were formed between referents and meanings, between meanings and forms, and between forms and referents. The results show that many anchors were used between referents and meanings, and between meanings and forms. However, when forms were used, they were well anchored to the referents they name. Failures in the discrimination game caused the emergence of so many meanings, because every time a discrimination game failed, a new category was added to the ontology. Many of these categories were associated with a form when they became distinctive in a later discrimination game. As associations were selected during a guessing game when their meanings fitted the context—even if their scores were not high—many of these meanings were used successfully in the game.

The same context dependency causes the emergent tendency of the robots not to use many forms, despite the variability of the acquired contexts during different games and between the robots. This can be understood by realizing that when one robot categorizes a referent differently in different guessing games, this does not necessarily mean that the other robot finds different distinctive categories. When a robot uses the same distinctive category on different occasions, it will most likely use the same form to express this meaning too. This allows the other robot to use the form in association with two different meanings successfully, as the game is context dependent. When such situations occur frequently, this, in turn, allows the robots to use more meanings than forms. These emerging dynamics of the lexicon can be classified as semiotic dynamics and illustrate how conceptual development is, at least to some extent, dependent on language acquisition and language use, and vice versa. This is in line with the—in a weaker version—revived Sapir-Whorf thesis [3]. A similar argument in favor of this weaker version of the Sapir-Whorf thesis was made in another study using language games [1]. In that study it was shown that agents developed a shared categorization of the color space when they used language, but a distinctive categorization when they developed categories without engaging in guessing games.

The optimization between referent and form solves, at least to some extent, the problem of object constancy: how can an object be recognized as being the same when different views of the object can result in dramatically different sensory stimuli, for instance, because it is partly obscured? Fig. 6 (left) illustrates how the semiotic dynamics can explain the solution to the object constancy problem. In the experiment, the robots detected the light sources from different positions, resulting in different sensings—illustrated as the continuum of sensings P in Fig. 6 (left)—which may yield different meanings M1 and M2. Nevertheless, the system identifies the objects consistently, because the one-to-many relations between form and meaning converge at the level of form and referent.
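The convergence argument can be made concrete with a toy mapping. The dictionary contents below are illustrative assumptions: two meanings, arising from different views of the same light source, are both anchored to one form, so naming remains consistent across views.

```python
# Hypothetical lexicon fragment: meanings M1 and M2 are categories
# produced by different views of light source L1; M3 belongs to
# another referent.
meaning_to_form = {"M1": "tyfo", "M2": "tyfo", "M3": "bula"}

def name_referent(meanings_for_views):
    """Map the meanings produced by different views to forms.

    A single resulting form indicates that the views converge on one
    referent, despite the differing sensory stimuli."""
    return {meaning_to_form[m] for m in meanings_for_views}
```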


The results of the experiment in this paper show that minimal autonomous robots can develop a shared set of semiotic symbols from the bottom-up by optimizing their anchors between forms and referents. However, one of the driving forces for this optimization—the corrective feedback—was simulated. This is a major shortcoming, as the method used—inspecting each other's internal states—is unrealistic and may undermine the principle. Nevertheless, the assumption was adopted to test the principles of the underlying bootstrapping mechanisms and not to get stuck on solving this problem. A solution may come from applications where robots evaluate the corrective feedback using task-oriented behaviors, as was recently investigated in simulations [21]. In these simulations, the feedback came from the effect of the task that the agents had to perform using the evolved language.

The semiotic dynamics of the guessing games help to solve the object constancy problem, but they may also help to explain another interesting phenomenon observed in cognitive science, namely family resemblance [23]. Family resemblance is the observation that seemingly different things are called the same without being ambiguous, like the meaning of games. Where soccer and chess are typical games, a game like swinging is not typical. Swinging lies near the border of the ‘conceptual space’ of games, e.g. referent R1 in Fig. 6 (right). It has no direct resemblance with games like soccer and chess, e.g. R2 and R3, but it has some resemblance with other games that in turn do resemble soccer and chess. Such a categorization process can be explained by the one-to-many relations between form and meaning. The word “games” is associated with different meanings for soccer, chess and swinging. The successful use of these meanings in different situated language games allows a family resemblance to emerge in the system. The optimization here should be made on the relation between a form and different referents. This optimization can be realized through the use of language.

Fig. 6. Illustration of two semiotic relations between referent R, meaning M and form F. The left figure shows the continuum of possible views P of referent R, displayed as a rectangle. Some part of the rectangle may be interpreted by M1 and another by M2. When both meanings relate to the same form, this mechanism solves the problem of object constancy. The right figure shows how the model may explain family resemblance. The ovals should be interpreted as Venn diagrams of the meanings M1 and M2.

Concluding, the above discussions provide many arguments in favor of using semiotic symbols over traditional symbols with respect to anchoring. The most important argument is that in the construction of semiotic symbols, anchors between forms and reality are implicitly optimized, rather than anchors between symbols and sensory images.

6. Conclusions

This paper illustrates how a small group of autonomous robots can develop a set of shared semiotic symbols in a bottom-up fashion by engaging in adaptive language games. The semiotic symbols the robots construct are defined by physical anchors between referents and meanings, and between meanings and forms, which yield a non-physical anchor between form and referent. The use of semiotic symbols allows a profitable optimization to find, track and (re)acquire anchors between forms and referents, rather than between forms and sensory images as proposed in the original description of the anchoring problem [5].

The experiments show how the consistent construction of semiotic symbols is positively influenced by their use in language. Through the use of language, the forms are shared externally to the robots. In addition, the robots share the reference of their communication through the received feedback. These external factors, together with the internal adaptations, influence the way the robots organize their conceptual spaces. Thus their conceptual development is influenced to a large extent by their language use, hence providing an argument in favor of a weak interpretation of the Sapir-Whorf thesis as discussed in [3].

To further broaden our understanding of the emergence of semiotic symbol systems in language use, additional research is required on the emergence of compositionality, as this is one of the key aspects of human language use. Future research should concentrate on how compositional structures can be grounded in the sensorimotor flow through grammatical language use. In addition, more research is required to design robotic applications that are capable of verifying the effectiveness of their language use in order to provide corrective feedback autonomously. Although further research is required to improve and scale the model, adaptive language games provide a potentially valuable technology for a bottom-up approach towards anchoring semiotic symbols.

Acknowledgements

The research has been carried out at the AI Lab of the Vrije Universiteit Brussel. I would like to thank Luc Steels very much for enriching me with his expert advice and inspiration. I am also indebted to all other colleagues of the AI Lab for their support in one way or another. Ida Sprinkhuizen-Kuyper, Rens Kortmann and three anonymous reviewers are thanked for their critical comments that helped to improve this paper.

References

[1] T. Belpaeme, Factors influencing the origins of colour categories, Ph.D. Thesis, Artificial Intelligence Lab, Vrije Universiteit Brussel, 2002.

[2] A. Billard, Imitation: a means to enhance learning of a synthetic proto-language in an autonomous robot, in: K. Dautenhahn, C.L. Nehaniv (Eds.), Imitation in Animals and Artifacts, MIT Press, Cambridge, MA, 2001.

[3] M. Bowerman, S.C. Levinson (Eds.), Language Acquisition and Conceptual Development, Cambridge University Press, Cambridge, 2001.

[4] R.A. Brooks, Elephants don't play chess, Robotics and Autonomous Systems 6 (1990) 3–15.

[5] S. Coradeschi, A. Saffiotti, Anchoring symbols to sensor data: preliminary report, in: Proceedings of the 17th National Conference on Artificial Intelligence (AAAI-2000), Austin, 2000, pp. 129–135.

[6] T.M. Cover, P.E. Hart, Nearest neighbour pattern classification, Institute of Electrical and Electronics Engineers Transactions on Information Theory 13 (1967) 21–27.

[7] T. Deacon, The Symbolic Species, W. Norton and Co., New York, 1997.

[8] S. Harnad, The symbol grounding problem, Physica D 42 (1990) 335–346.

[9] D. Jung, A. Zelinsky, Grounded symbolic communication between heterogeneous cooperating robots, Autonomous Robots 8 (2000) 269–292.

[10] C.K. Ogden, I.A. Richards, The Meaning of Meaning: A Study of the Influence of Language upon Thought and of the Science of Symbolism, Routledge & Kegan Paul Ltd., London, 1923.

[11] C.S. Peirce, Collected Papers, vols. I–VIII, Harvard University Press, Cambridge, MA, 1931 (volumes were published from 1931 to 1958).

[12] D.K. Roy, A.P. Pentland, Learning words from sights and sounds: a computational model, Cognitive Science 26 (2002) 113–146.

[13] J.M. Siskind, Grounding the lexical semantics of verbs in visual perception using force dynamics and event logic, Journal of Artificial Intelligence Research 15 (2001) 31–90.

[14] L. Steels, Emergent adaptive lexicons, in: P. Maes (Ed.), From Animals to Animats 4, Proceedings of the Fourth International Conference on Simulating Adaptive Behavior, MIT Press, Cambridge, MA, 1996.

[15] L. Steels, Perceptually grounded meaning creation, in: M. Tokoro (Ed.), Proceedings of the International Conference on Multi-Agent Systems, AAAI Press, Menlo Park, CA, 1996.

[16] L. Steels, F. Kaplan, AIBO's first words: the social learning of language and meaning, Evolution of Communication 4 (1) (2000) 3–32.

[17] L. Steels, P. Vogt, Grounding adaptive language games in robotic agents, in: C. Husbands, I. Harvey (Eds.), Proceedings of the Fourth European Conference on Artificial Life, MIT Press, Cambridge, MA, and London, 1997.

[18] M. Tomasello, The Cultural Origins of Human Cognition, Harvard University Press, Cambridge, MA, 1999.

[19] P. Vogt, Bootstrapping grounded symbols by minimal autonomous robots, Evolution of Communication 4 (1) (2000) 89–118.

[20] P. Vogt, Lexicon grounding on mobile robots, Ph.D. Thesis, Vrije Universiteit Brussel, 2000.

[21] P. Vogt, Anchoring symbols to sensorimotor control, in: H. Blockeel, M. Denecker (Eds.), Proceedings of the 14th Belgian/Netherlands Artificial Intelligence Conference, BNAIC'02, 2002, pp. 331–338.

[22] P. Vogt, The physical symbol grounding problem, Cognitive Systems Research 3 (3) (2002) 429–457.

[23] L. Wittgenstein, Philosophical Investigations, Basil Blackwell, Oxford, UK, 1958.

[24] H. Yanco, L. Stein, An adaptive communication protocol for cooperating mobile robots, in: J.-A. Meyer, H.L. Roitblat, S. Wilson (Eds.), From Animals to Animats 2, Proceedings of the Second International Conference on Simulation of Adaptive Behavior, MIT Press, Cambridge, MA, 1993, pp. 478–485.

Paul Vogt received his M.Sc. degree in cognitive science and engineering at the Rijksuniversiteit Groningen, The Netherlands, in 1997. He obtained his Ph.D. from the Artificial Intelligence Laboratory of the Vrije Universiteit Brussel, Belgium. Currently, he is a postdoc researcher with the Institute for Knowledge and Agent Technology at the Universiteit Maastricht, The Netherlands. His main research interests are in symbol grounding, language evolution and robotics.