Simultaneous Evolution of Learning Rules and Strategies

Oliver Kirchkamp¹

University of Mannheim, SFB 504, L 13, 15, D-68131 Mannheim, email
Abstract

We study a model of local evolution in which agents located on a network interact strategically with their neighbours. Strategies are chosen with the help of learning rules that are themselves based on the success of strategies observed in the neighbourhood.

Previous work on local evolution assumes fixed learning rules while we study learning rules that are determined endogenously.

Simulations indicate that endogenous learning rules put more weight on a learning player's own experience than on the experience of an observed neighbour. Nevertheless, stage game behaviour is similar to behaviour with symmetric learning rules.

Keywords: Evolutionary Game Theory, Learning, Local Interaction, Networks. JEL-Codes: C63, C72, D62, D63, D73, D83, R12, R13.
¹ I am grateful for financial support from the Deutsche Forschungsgemeinschaft through SFB 303 and SFB 504. Parts of this paper were written at the European University Institute (EUI), Florence. I am very grateful for the hospitality and the support that I received from the EUI. I thank Georg Nöldeke, Karl Schlag, Avner Shaked, Fernando Vega-Redondo, two anonymous referees, and several seminar participants for helpful and stimulating comments.

Preprint submitted to Elsevier Preprint, 18 June 1999
1 Introduction
In this paper we intend to study how strategies and learning² rules evolve simultaneously in a local environment. Models of local evolution of strategies with fixed learning rules have been studied by Axelrod (1984, p. 158ff), Lindgren and Nordahl (1994), Nowak and May (1992, 1993), Nowak, Bonhoeffer and May (1993), Eshel, Samuelson and Shaked (1998), and Kirchkamp (1995). In these models players either imitate the strategy of the most successful neighbour or the strategy with the highest average success in their neighbourhood, respectively. Both rules are plausible and lead to the survival of cooperation in prisoners' dilemmas.

However, this phenomenon depends on the learning rule assumed. Other learning rules, for instance players that imitate agents with probabilities proportional to the success of observed strategies, do not lead to cooperation. Proportional imitation rules are arguably plausible since they are optimal when all members of a population are equally likely to interact with each other (a finding obtained by Börgers and Sarin (1995) and Schlag (1993, 1994)).
Since local evolutionary models are more sensitive than global models to the choice of learning rules, we have to take special care. In contrast to Börgers, Sarin or Schlag we will not look for optimal rules. Instead we will use an evolutionary process not only to select strategies but to select learning rules as well. A result of our study will be that local evolution does not converge to the optimal global solution found by Börgers, Sarin or Schlag.
² We have in mind here a descriptive definition of learning in the sense of a (persistent) change of behavioural potentiality (see G. Kimble (Hilgard, Marquis, and Kimble 1961)). We restrict our attention to very simple learning rules. In particular we do not aim to provide a model of learning as a cognitive process.
We shall study a model where players have fixed positions on a network and change their strategy using a rule that is based on imitation. Hence, our model has elements in common with Axelrod (1984), Nowak and May (1992, 1993), Nowak, Bonhoeffer and May (1993), Eshel, Samuelson and Shaked (1998), Kirchkamp (1995), and Lindgren and Nordahl (1994). However, while these writers assume players to use fixed learning rules, we shall allow players to change their learning rule using a process that is based on imitation. We will neither allow players to travel³ nor to optimise⁴. We will find that while endogenous learning rules are different from those assumed by the above authors, stage game behaviour is not affected too much.
In section 2 we describe a model of local endogenous evolution of learning rules. In section 3 we discuss properties of these learning rules. Section 4 analyses the dependence of our results on parameters of the model. In section 5 we then study the implications of endogenous learning for stage game behaviour. Section 6 concludes.
2 The Model
2.1 Overview
In the following we study a population of players, each occupying one cell of a torus of size n × n, where n is between 5 and 200. Players play games with their neighbours on this network, learn repeated game strategies from their neighbours, and update their learning rule using information from their neighbours.
³ See Sakoda (1971), Schelling (1971), Hegselmann (1994), Ely (1995).
⁴ See Ellison (1993), Sakoda (1971) and Schelling (1971).
2.2 Stage Games
Players play games within neighbourhoods which have one of the shapes shown in figure 1.
[Fig. 1 about here.]
A player (marked as a black circle) may only interact with those neighbours (gray) which live no more than r_i cells horizontally or vertically apart.⁵ In each period interactions among two neighbouring players take place independently from all others with probability p_i. Hence, in a given period players may have different numbers of interactions. A typical value for p_i is 1/2. We consider values for p_i ranging from 1/100 to 1 to test the influence of this parameter.
We assume that games which are played among neighbours change every t_G periods.⁶ Changing games force players to adapt and, hence, learning rules to improve. We present results of simulations where t_G ranges from 200 to 20000 periods. Once a new game is selected, all neighbours in our population play the same symmetric 2 × 2 game with the following form:
2 � 2 gamewith thefollowing form:
PlayerII
PlayerI
D C
Dg
g
� 1h
Ch� 1
00
When new games are selected, parameters g and h are chosen randomly following a uniform distribution over the intervals −1 ≤ g ≤ 1 and −2 ≤ h ≤ 2. We can visualise the space of games in a two-dimensional graph (see figure 2).
⁵ The subscript i will be used in the context of individual interactions.
⁶ The subscript G will be used to denote changes in the underlying game.
[Fig. 2 about here.]
The range of games described by −1 ≤ g ≤ 1 and −2 ≤ h ≤ 2 includes both prisoners' dilemmas and coordination games. All games with g ∈ (−1, 0) and h ∈ (0, 1) are prisoners' dilemmas (DD_PD⁷ in figure 2); all games with g > −1 and h < 0 are coordination games. In figure 2 equilibrium strategies are designated with CC, CD, DC and DD. The symbol 'risk' denotes risk dominance for games that have several equilibria.
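To make this construction concrete, the following minimal Python sketch draws and classifies such games. The payoff convention follows the matrix above; the function names and the dictionary encoding are assumptions of this illustration, not code from the paper.

```python
import random

def draw_game():
    """Draw g and h uniformly, as described in section 2.2."""
    g = random.uniform(-1, 1)
    h = random.uniform(-2, 2)
    return g, h

def payoff(own, other, g, h):
    """Row player's stage game payoff from the matrix above."""
    table = {("D", "D"): g, ("D", "C"): h,
             ("C", "D"): -1, ("C", "C"): 0}
    return table[(own, other)]

def classify(g, h):
    """The two regions of figure 2 that the text singles out."""
    if -1 < g < 0 and 0 < h < 1:
        return "prisoners' dilemma"   # D is dominant, yet CC beats DD
    if g > -1 and h < 0:
        return "coordination game"    # CC and DD are both equilibria
    return "other"
```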
2.3 Repeated Game Strategies
We assume that each player uses a single repeated game strategy against all neighbours. Repeated game strategies are represented as (Moore) automata with a maximal number of states of one, two, three, or four⁸. For many simulations we limit the number of states to less than three.
⁷ Games with g ∈ (−1, 0) and h > 1 will not be called prisoners' dilemmas since CC is Pareto dominated by alternating between CD and DC.
⁸ Each 'state' of a Moore automaton is described with the help of a stage-game strategy and a transition function to either the same or any other of the automaton's states. This transition depends on the opponent's stage-game strategy. Each automaton has one 'initial state' that the automaton enters when it is used for the first time.
There are 2 automata with only one state (one of them initially plays C and remains in this state whatever the opponent does; the other always plays D).
There are 26 automata with one or two states. For example 'grim' is a two-state automaton. In the initial state it plays C. It stays there unless the opponent plays D. Then the automaton switches to the second state, where it plays D and stays there forever. Other popular two-state automata include 'tit-for-tat', 'tat-for-tit', etc.
The set of all automata with less than four states contains 1752 different repeated game strategies. The set of all automata with less than five states already has size 190646.
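To illustrate the representation described in footnote 8, here is a minimal Python encoding of the two-state automaton 'grim'. Only the behaviour of 'grim' is taken from the footnote; the dictionary layout and function names are assumptions of this sketch.

```python
# 'grim' as a Moore automaton: each state carries a stage game action,
# transitions depend on the opponent's stage game action.
GRIM = {
    "initial": 0,
    "action": {0: "C", 1: "D"},
    "next": {(0, "C"): 0, (0, "D"): 1,   # a defection triggers state 1 ...
             (1, "C"): 1, (1, "D"): 1},  # ... which is absorbing
}

def play(automaton, opponent_actions):
    """Return the automaton's actions against a given opponent path."""
    state, actions = automaton["initial"], []
    for theirs in opponent_actions:
        actions.append(automaton["action"][state])
        state = automaton["next"][(state, theirs)]
    return actions

# play(GRIM, ["C", "D", "C"]) == ["C", "C", "D"]: grim cooperates until
# it has observed one defection and defects forever afterwards.
```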
2.4 Learning Rules
From time to time a player has the opportunity to revise his or her repeated game strategy. We assume that this opportunity is a random event that occurs for each player independently at the end of each period with a certain probability. Learning probabilities will be denoted 1/t_L and range from 1/6 to 1/120. t_L then denotes the average time between two learning events.⁹ Hence, learning occurs less frequently than interaction. Still, learning occurs more frequently than changes of the stage game and updates of the learning rule itself (see below).
When a player updates the repeated game strategy, he or she randomly samples one member of the neighbourhood and then applies the individual learning rule. In one respect this approach is simpler than the rules discussed in the literature¹⁰, which simultaneously use information on all neighbours from the learning neighbourhood. To facilitate comparison with the literature, and as an extension of the base model, we study endogenous multi-player rules in section 4.1. We find that endogenous multi-player rules have properties that are very similar to our endogenous single-player rules. To further facilitate comparison with the literature we study fixed single-player rules in section 5 and find that they are very similar to the corresponding fixed multi-player rules from the literature.
Learning occurs in neighbourhoods which have the same shape as neighbourhoods for interaction (see again figure 1). We denote the size of the neighbourhood for learning with the symbol r_L.
⁹ The subscript L will be used in the context of changes of strategies (either through learning or through mutation).
¹⁰ See Axelrod (1984, p. 158ff), Lindgren and Nordahl (1994), Nowak and May (1992, 1993), Nowak, Bonhoeffer, and May (1993), Eshel, Samuelson, and Shaked (1998), Kirchkamp (1995).
A player’s learningruleusesthefollowing information:
(1) Thelearningplayer’s repeatedgamestrategy.
(2) The own payoff uown obtainedwith this player’s repeatedgamestrategy,
i.e. the averagepayoff per interactionwhile using this repeatedgamestrat-
egy.
(3) A sampledplayer’s repeatedgamestrategy.
(4) The sampledplayer’s repeatedgamestrategy payoff usamp, i.e. the average
payoff perinteractionthattheplayerobtainedwhile usingthis repeatedgame
strategy.
Learning rules are characterised by a vector of three parameters (a_0, a_1, a_2) ∈ ℝ³. Given a learning rule (a_0, a_1, a_2), a learning player samples one neighbour's strategy and payoff and then switches to the sampled strategy with probability

$$p(u_{own}, u_{samp}) = \langle a_0 + a_1 u_{own} + a_2 u_{samp} \rangle \qquad (1)$$

where ⟨·⟩ is defined as

$$\langle x \rangle := \begin{cases} 1 & \text{if } x \geq 1 \\ 0 & \text{if } x \leq 0 \\ x & \text{otherwise} \end{cases} \qquad (2)$$

u_own and u_samp denote the player's and the neighbour's payoff.
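Equations (1) and (2) together describe a linear function of the two payoffs, clipped to the unit interval. A minimal Python sketch (the function name is ours):

```python
def switch_probability(a0, a1, a2, u_own, u_samp):
    """Equation (1) with the clipping of equation (2)."""
    p = a0 + a1 * u_own + a2 * u_samp
    return min(1.0, max(0.0, p))

# With (a0, a1, a2) = (0, -a, a) and a large, the probability jumps from
# 0 to 1 as soon as u_samp exceeds u_own, approximating the deterministic
# 'switch if better' rule discussed below; section 5 uses a = 100000.
```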
Thus, the two parameters a_1 and a_2 reflect sensitivities of the switching probability to changes in the player's and the neighbour's payoff. The parameter a_0 reflects a general readiness to change to new strategies. This can be interpreted as a higher or lower inclination to try something new. Choosing (a_0, a_1, a_2) a player determines three ranges of payoffs: one where switching will never occur (i.e. p(u_own, u_samp) = 0), a second one where behaviour is stochastic (i.e. p(u_own, u_samp) ∈ (0, 1)), and a third one where switching always occurs (i.e. p(u_own, u_samp) = 1).
Of course, one or even two of these ranges may be empty. Our specification, indeed, allows for three types of rules: rules which are (almost) always deterministic, rules which always behave stochastically, and rules which react deterministically for some payoffs and stochastically for others. An example of a deterministic rule ('switch if better') is (a_0, a_1, a_2) := (0, −a, a) with a → ∞. An example of a rule that always implies stochastic behaviour for the game given above is (a_0, a_1, a_2) := (1/2, −a, a) with 1/(4a) ≥ max(|g|, |h|, 1).
Our parameter a_0 is similar to the aspiration level ∆ from the global model studied in Binmore and Samuelson (1994). However, the learning rules studied in Binmore and Samuelson are not special cases of our learning rules, since their decisions are perturbed by exogenous noise. In cases where this noise term becomes small our rule approximates Binmore and Samuelson (1994) with (a_0, a_1, a_2) := (∆, −a, a) and a → ∞.
2.4.1 Normalisation
We map parameters (a_0, a_1, a_2) ∈ ℝ³ into (ā_0, ā_1, ā_2) ∈ [0, 1]³ using the following condition (see also figure 3):

$$a_i = \tan\left(\pi \bar{a}_i - \frac{\pi}{2}\right), \quad i \in \{0, 1, 2\} \qquad (3)$$
[Fig. 3 about here.]
Evolution operates on the normalised values (ā_0, ā_1, ā_2) ∈ [0, 1]³ in order not to rule out deterministic rules: learning rules from the literature are often deterministic. They can be represented as rules whose parameter values a_i are infinitely large or small. Since our evolutionary process is modeled in finite time it can hardly converge to infinite values. However, it could converge to finite values of our normalisation and still denote deterministic rules.
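For illustration, the mapping of equation (3) and its inverse in Python (function names are ours):

```python
import math

def to_real(a_bar):
    """Equation (3): map a normalised value in (0, 1) to the real line."""
    return math.tan(math.pi * a_bar - math.pi / 2)

def to_unit(a):
    """Inverse of equation (3): map a real parameter into (0, 1)."""
    return (math.atan(a) + math.pi / 2) / math.pi
```

A side effect worth noting, used in sections 2.5.1 and 2.6 below: drawing ā_i uniformly on [0, 1] and mapping it through to_real yields a parameter a_i with a standard Cauchy distribution.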
2.4.2 Mutations of Repeated Game Strategies
We introduce mutations of repeated game strategies to show that simulations are particularly robust. However, as we see in section 4, we do not need mutations for our results.

When a player learns a repeated game strategy, sometimes the learning process described above fails and a random strategy is learned instead. In this case, any repeated game strategy, as described in section 2.3, is selected with equal probability. Such a 'mutation' occurs with a fixed probability m_L. We consider mutation rates between 0 and 0.7.
2.5 Exogenous Dynamics that Select Learning Rules
From time to time players revise their learning rules. We assume that players update learning rules independently with probability 1/t_u in each period. t_u, hence, denotes the average time between two updates.¹¹ We consider update rates 1/t_u between 1/40000 and 1/400. If not mentioned otherwise, 1/t_u = 1/4000. In particular, updates of learning rules occur more rarely than updates of strategies or changes of games. Given that updates of learning rules do not occur very often it is reasonable to assume that players make a larger effort to update learning rules: firstly, all neighbours are sampled (and not only a single neighbour as for updates of strategies); secondly, the sampled data is evaluated more efficiently, now using a quadratic approximation.
¹¹ The subscript u will be used to denote changes of learning rules (either through learning or through mutation).
The shapes of the neighbourhoods that are used to update learning rules are similar to those used for interaction and learning (see figure 1). We denote the size of the neighbourhood for updates of learning rules with the symbol r_u.
A player who updates his or her learning rule has the following information for all neighbours individually (including him- or herself):

(1) The (normalised) parameters of their current learning rules ā_0, ā_1, ā_2.
(2) The average payoffs per interaction that the players obtained while their current learning rule was used, u(ā_0, ā_1, ā_2).
To evaluate this information we assume that players estimate a model that helps them explain their environment, in particular their payoffs. Players use the estimated model to choose an optimal learning rule. To describe this decision process we assume that players approximate a quadratic function of the learning parameters to explain how successful learning rules are. Formally the quadratic function can be written as follows:
$$u(\bar{a}_0, \bar{a}_1, \bar{a}_2) = c + (\bar{a}_0 \; \bar{a}_1 \; \bar{a}_2) \begin{pmatrix} b_0 \\ b_1 \\ b_2 \end{pmatrix} + (\bar{a}_0 \; \bar{a}_1 \; \bar{a}_2) \begin{pmatrix} q_{00} & q_{01} & q_{02} \\ q_{01} & q_{11} & q_{12} \\ q_{02} & q_{12} & q_{22} \end{pmatrix} \begin{pmatrix} \bar{a}_0 \\ \bar{a}_1 \\ \bar{a}_2 \end{pmatrix} + \varepsilon \qquad (4)$$
Players make an OLS estimation to derive the parameters of this model (ε describes the noise). We choose a quadratic function because it is one of the simplest models which still has an optimum. Similarly, we assume that players derive this model using an OLS estimation because this is a simple and canonical way of aggregating the information players have. This should not be taken too literally: we want to model players that behave as if they would maximise a quadratic model which is derived from an OLS estimation.
Given the parameters (c, b_0, b_1, b_2, q_00, ..., q_22) of the estimated model, players choose the triple (ā_0, ā_1, ā_2) that maximises u(ā_0, ā_1, ā_2) s.t. (ā_0, ā_1, ā_2) ∈ [0, 1]³. We find that in 99% of all updates the Hessian of u(ā_0, ā_1, ā_2) is negative definite, i.e. u(ā_0, ā_1, ā_2) has a unique local maximum. In the less than 1% of all updates remaining, the quadratic model might be unreliable. In this case players copy the most successful neighbour.
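The update step can be illustrated with a short numpy sketch. This is not the author's code: the exact construction of the regressors and the crude clipping at the boundary of [0, 1]³ are assumptions of this illustration.

```python
import numpy as np

def update_rule(params, payoffs):
    """Fit the quadratic model (4) by OLS over the neighbours' normalised
    parameters (an n-by-3 array) and payoffs, then maximise it."""
    A = np.asarray(params, dtype=float)
    u = np.asarray(payoffs, dtype=float)
    # Regressors: constant, three linear terms, three squares,
    # and three cross products.
    X = np.column_stack([np.ones(len(A)), A,
                         A[:, 0] ** 2, A[:, 1] ** 2, A[:, 2] ** 2,
                         A[:, 0] * A[:, 1], A[:, 0] * A[:, 2],
                         A[:, 1] * A[:, 2]])
    beta, *_ = np.linalg.lstsq(X, u, rcond=None)
    b = beta[1:4]
    d0, d1, d2, c01, c02, c12 = beta[4:]
    # Symmetric matrix of the quadratic form in equation (4); the fitted
    # cross coefficients count each off-diagonal entry twice.
    Q = np.array([[d0, c01 / 2, c02 / 2],
                  [c01 / 2, d1, c12 / 2],
                  [c02 / 2, c12 / 2, d2]])
    if np.all(np.linalg.eigvalsh(Q) < 0):     # Hessian 2Q negative definite
        a_star = np.linalg.solve(-2 * Q, b)   # first-order condition
        return np.clip(a_star, 0.0, 1.0)      # crude boundary treatment
    # Quadratic model unreliable: copy the most successful neighbour.
    return A[np.argmax(u)]
```

Since u = c + b'ā + ā'Qā has gradient b + 2Qā, the interior optimum solves −2Qā* = b whenever the Hessian is negative definite, which corresponds to the 99% of updates mentioned above.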
Figure 4 shows (only for one dimension) an example of a sample of several pairs of parameter a_i and payoff u (black dots) together with the estimation of the functional relationship (grey line) between a_i and u.
[Fig. 4 about here.]
2.5.1 Mutations of Learning Rules
We introduce mutations of learning rules to show that our simulation results are robust. However, as we show in section 4, we do not need mutations. Results without mutations are very similar to results with a small amount of mutations.

When a player updates his or her learning rule, with a small probability m_u it is not the above described update scheme which is used; instead the player learns a random learning rule. This rule is chosen following a uniform distribution over (ā_0, ā_1, ā_2) ∈ [0, 1]³, which is equivalent to a random and independent selection of a_0, a_1, a_2, each following a Cauchy distribution. We consider mutation rates for learning rules m_u between 0 and 0.7.
2.6 Initial Configuration
At the beginning of each simulation each player starts with a random learning rule that is chosen following a uniform distribution over (ā_0, ā_1, ā_2) ∈ [0, 1]³. Thus, initially, the parameters a_0, a_1, a_2 are distributed independently, each following a Cauchy distribution. Hence, in the first period the expected values of the parameters of the learning rules are ā_0 = ā_1 = ā_2 = 0. Each player starts with a random repeated game strategy, following a uniform distribution over the available strategies.
3 Results with Endogenous Learning Rules
3.1 Distribution over Learning Parameters
Figure 5 displays averages over 53 simulations on a 50 × 50 grid, each lasting for 400000 periods.
[Fig. 5 about here.]
Figure 5 shows two different projections of (a_0, a_1, a_2). The left part displays the distribution over (a_1, a_2), the right part over (a_0, a_1 + a_2). Axes range from 0 to 1 for ā_0, ā_1, and ā_2 and from 0 to 2 in the case of ā_1 + ā_2. Labels on the axes do not represent the normalised values but a_0, a_1, a_2 instead, which range from −∞ to +∞.¹²
Both pictures are a density plot and a table of relative frequencies:

Density plot: Different densities of the distribution are represented by different shades of grey. The highest density is represented by the darkest grey.¹³

Table of relative frequencies: The pictures in figure 5 also contain a table of relative frequencies. The left picture is divided into eight sectors, the right picture is divided into six rectangles. The percentages within each sector or rectangle represent the proportion of players that use a learning rule with parameters in the particular range.
The left part of figure 5 shows that, while players are sensitive to their own payoff, they are substantially less sensitive to observed payoffs.
¹² The scaling of all axes follows the normalisation given in equation (3). Hence, the value "a_1 + a_2" actually represents 2 · tan(π · (ā_1 + ā_2)/2 − π/2) and not a_1 + a_2. In the current context this difference is negligible.
¹³ Densities are derived from a table of frequencies with 30 × 30 cells for each picture. We map logs of densities into different shades of grey. The interval between the log of the highest density and the log of 1% of the highest density is split into seven ranges of even width. Densities with logs in the same interval have the same shade of grey. Thus, the white area represents densities smaller than 1% of the maximal density while areas with darker shades of grey represent densities larger than 1.9%, 3.7%, 7.2%, 14%, 27% and 52% of the maximal density.
Sensitivity to own payoffs: Given that we start with a uniform distribution of ā_1 and ā_2, figure 5 initially would look like a smooth grey surface without any mountains or valleys. During the course of the simulations learning parameters change substantially. Similar to many learning rules discussed in the literature¹⁴, a_1 is often close to −∞.

Insensitivity to sampled payoffs: The left part of figure 5 shows that 96.3% of all players use learning rules with |a_2| < |a_1|, i.e. rules that are more sensitive to own payoff than to sampled payoff. Notice that the initial distribution over parameters of the learning rule implies that 50% of all rules fulfil |a_2| < |a_1|.

We describe this kind of behaviour as 'suspicious'¹⁵ in the following sense: the success of a neighbour's rule may depend on players who are neighbours of the neighbour, but not neighbours of the player. A 'suspicious' player behaves like somebody who 'fears' that the sampled neighbour's experience cannot be generalised to apply to the player's own case.
3.2 Probabilities of switching to a sampled learning rule
The learning rules as specified in equation (1) determine probabilities of switching to the observed repeated game strategy. Figure 6 shows the cumulative distribution of these probabilities.¹⁶
[Fig. 6 about here.]
¹⁴ See Axelrod (1984, p. 158ff), Lindgren and Nordahl (1994), Nowak and May (1992, 1993), Nowak, Bonhoeffer, and May (1993), Eshel, Samuelson, and Shaked (1998), Kirchkamp (1995).
¹⁵ Of course, our agents can only behave as if they had feelings like suspicion.
¹⁶ This figure is derived from a table of frequencies with 30 cells. The scaling of the horizontal axis follows the normalisation given in equation (3).
The horizontal axis represents a_0 + a_1 u_own + a_2 u_samp. Following learning rule (1), a player switches stochastically with probability a_0 + a_1 u_own + a_2 u_samp if this expression is between zero and one. Otherwise the player either switches with certainty or not at all.

Figure 6 shows that only in about 12% of all learning events 0 < a_0 + a_1 u_own + a_2 u_samp < 1; this means that in only 12% of all learning events a player's decision is a stochastic one. Our endogenous rules are neither fully stochastic (as those from Börgers and Sarin (1995) and Schlag (1993)) nor fully deterministic (as those from Axelrod (1984, p. 158ff), Nowak and May (1992), etc.).
3.3 Comparison with other Learning Rules
Above we mentioned two reference points for learning rules: those learning rules that are assumed to be exogenous and fixed in the literature on local evolution, and rules that turn out to be optimal in a global setting.

Let us start with the exogenous rules that are assumed in the literature on local evolution. We have seen that the exogenous fixed rules may be similar to the endogenous learning rules in the sense that small changes in the player's own payoff may lead to drastic changes in the probability of adopting a new strategy. Endogenous learning rules differ from those studied in parts of the literature on local evolution in the sense that changes in an observed player's payoff lead to smaller changes in the probability of adopting a new strategy.

Let us next compare our endogenous rules with those rules that are seen to be optimal in a global setting¹⁷. As we have seen, our rules differ in two respects from those that are optimal in a global model. Firstly, as discussed in section 3.1, they are more sensitive to changes in a learning player's own payoff than to changes in an observed neighbour's payoff. Secondly, as mentioned in section 3.2, players following endogenous rules quite often switch with certainty. Why is the outcome of our evolutionary process so different from optimal rules?

¹⁷ See Börgers and Sarin (1995), Schlag (1993, 1994).
A higher sensitivity to a player's own payoff as compared to an observed neighbour's payoff can be related to the local structure. Strategies that are successful in a neighbour's neighbourhood may be less successful in a player's own neighbourhood. Therefore a neighbour's payoff is a less reliable source of information than a player's own payoff.

Optimal learning rules always switch stochastically in order to evaluate information slowly but efficiently. Even small differences in payoffs are translated into different behaviour. Learning slowly is not harmful in Börgers and Sarin's or in Schlag's model. Their players do not care whether the optimal strategy is only reached after an infinite time. In our evolutionary process, however, discounting is involved through the regular update of learning rules. Learning rules that quickly achieve sufficiently good results outperform rules that slowly approach the optimum. Evolution of learning rules at a speed that is more than infinitesimal may lead to deterministic behaviour.
4 Dependence on Parameters
Our comments so far have been based on a particular parameter combination. Changing the parameters, however, does not affect our main results. We will consider an alternative rule to sample neighbours in section 4.1. We will study changes in the other parameters in section 4.2.
4.1 The Selection Rule: Sampling Randomly or Selectively
Above we assumed that players learn repeated game strategies from a randomly sampled player. One might, however, object that players could be more careful in selecting their samples. As a benchmark case let us assume in this section that players sample the most successful neighbour, i.e. the neighbour with the highest payoff per interaction for the current repeated game strategy, measured over the lifetime of the repeated game strategy.

Figure 7 shows a distribution over (a_0, a_1, a_2), projected into the (a_1, a_2) and (a_0, a_1 + a_2) space, similar to figure 5.
[Fig. 7 about here.]
While the picture is more noisy than figure 5, the properties of learning rules are the same as in section 3.1. Players are quite sensitive to changes in their own payoff but less sensitive to changes in the sampled neighbours' payoff.

We suspect that the additional noise stems from the reduced evolutionary pressure on learning rules. Since only 'best' rules are available, identifying good rules is too easy. In figure 7 we observe a cluster of learning rules with values of a_2 close to +∞. These rules apparently follow a 'just copy whatever you see' strategy. This might be reasonable, since 'whatever you see' is already the best available strategy in your neighbourhood under this selection rule.
4.2 Other Parameter Changes
Figure 8 shows the effect of various changes in the other parameters. We always start from the same combination of parameters as a reference point and then vary one of the parameters, keeping all the others fixed. The reference point is a simulation on a torus of size 50 × 50, where the interaction neighbourhood and learning neighbourhood both have the same size r_i = r_L = 1 while the neighbourhood that is used when updating the learning rule has size r_u = 2. To learn a new repeated game strategy players randomly sample a neighbour. Learning occurs on average every t_L = 24 periods¹⁸. The underlying game is changed every t_G = 2000 periods. Players update their learning rule on average every t_u = 4000 periods¹⁹. The mutation rate for learning as well as for updates of learning rules is m_L = m_u = 1/100. Simulations last for t_s = 40000 periods.²⁰ Thus, except for the simulation length, parameters are the same as those for figure 5.
[Fig. 8 about here.]
Figure 8 shows averages²¹ of ā_1 and ā_2 for various changes in the parameters. Each dot represents a parameter combination that we simulated. To make the underlying pattern clearer, dots are connected through interpolated splines. The white dot in each diagram represents the average value (ā_1 = −1.89, ā_2 = 0.30) for the reference set of parameters described above. The line ā_2 = −ā_1 is marked in grey.

The main result is that for all parameter combinations we find relative sensitivity to the player's own payoffs, and insensitivity to observed payoffs. In particular the averages satisfy ā_2 < −ā_1 for all parameter combinations that we simulated.
¹⁸ Remember that we assume learning to be an independent random event that occurs for each player with probability 1/t_L.
¹⁹ We also assume updates of learning rules to be independent random events that occur for each player with probability 1/t_u.
²⁰ The subscript s will be used to denote the length of the simulation.
²¹ These are arithmetic averages of ā_1 and ā_2, taken over 20 different simulation runs, which are randomly initialised. The average values of ā_1 and ā_2 are then transformed into a_1 and a_2 using equation (3).
Notice that we do not need mutations for our results. However, the simulations are robust with respect to mutations. To show that we can dispense with both kinds of mutations simultaneously we ran a simulation where m_L = m_u = 0 and show the result in the graph 'mutation of learning rules' in figure 8 with a small triangle. While learning on average leads to a smaller value of ā_2, we still have ā_2 < −ā_1. On the other hand, we can introduce quite large probabilities of mutations (up to 0.7) and still have ā_2 < −ā_1.
Let us now discuss the sensitivity of learning rules to parameters of the evolutionary process in more detail. We distinguish three different classes of parameters: those which influence relative locality, relative speed, and relative noise of the evolutionary process. We will see that parameter values that describe a slow or noisy process cause the distribution over the parameters of the learning rule to remain close to the initial distribution. Parameter values that describe a less local situation (for example when the interaction radius is so large that almost everybody interacts with everybody else) cause 'suspicious' behaviour to disappear gradually.
Table 1 summarises the effects of the parameters on these three dimensions.
[Table 1 about here.]
Let us briefly discuss some of the dependencies on parameters:
4.2.1 Locality
In the following paragraph we will explain in which way the parameters t_G, t_L, m_L, r_i, r_L, C, and n influence the diversity of players and, hence, the locality of the evolutionary process. More locality makes a player's own experience more important; hence, the player will be more 'suspicious'. Thus, the ratio ā_1/ā_2 will be smaller.
When games change rarely (i.e. t_G is large) or when players learn frequently (i.e. t_L is small) they have a better chance of finding the 'long-run' strategy for a given game. Thus, they become more similar, which reduces locality. Diversity among players may also be reduced by 'background noise' m_L since noise makes players more similar. A larger learning radius (r_L) increases the effects of locality since players will sample more neighbours that could be in a different situation. This shows that being able to spot locality is actually one of its conditions. Likewise, situations become more diverse when the interaction neighbourhood r_i is small. Another source of heterogeneity is that complexity of strategies also diversifies players' behaviour. A larger heterogeneity is also provided by a larger population (n).
4.2.2 Speed
The parameters t_s, t_u, and r_u influence the speed of the evolutionary process since they affect the frequency or the scale of learning steps. Higher speed allows learning rules to move away from the initial distribution, and thus to move farther to the left in the diagram.

The longer the evolutionary process runs (t_s), the more time learning rules have to develop and to move away from the initial distribution. Also, the more frequently players update learning rules (i.e. the smaller t_u), the faster learning rules evolve and move away from the initial distribution. The farther players see (r_u) when updating a learning rule, the faster successful learning rules spread through the population.
4.2.3 Noise
The parameters t_L, t_u, m_L, m_u influence the noise within the evolutionary process. This again keeps averages of the parameters of the learning rules closer to the initial distribution.

When learning rules are rarely used to select strategies (i.e. when t_L is large), then players remain inexperienced, and learning rules are not pushed into a particular direction. When learning rules are rarely updated (i.e. when t_u is large) then more reliable data is available when the success of a learning rule is evaluated. Development is, as a result, less noisy. The more strategies are perturbed when they are learned (m_L), the more difficult it becomes to evaluate a learning rule's impact on success. The more learning rules are perturbed during the update process (m_u), the more they are pushed back to the initial distribution.
Notice, however, that changes in some parameters may have conflicting effects. One example is the speed t_u of updating learning rules: for very small values of t_u learning rules are updated too often to accumulate a reasonable amount of data on the success of the rule. As a consequence, the evolutionary process is too noisy to diverge from the initial distribution. For very large values of t_u the data on the performance of learning rules becomes more reliable while learning becomes slower. Again learning rules do not move away from their initial distribution. In other words: updating a learning rule is beneficial for the updating player, who takes advantage of information that is provided by neighbours. At the same time neighbours suffer since the updating player ceases to serve as a reliable source of information. These two conflicting effects explain the turn in the t_u-curve.
We can conclude that, whatever parameters are chosen, learning rules have similar properties in the end. The dependence of learning rules on parameters seems to be intuitive.
5 Stage Game Behaviour
Having dealt with the immediate properties of endogenous learning rules, let us now analyse the impact that endogenous rules have on stage game behaviour.
[Fig. 9 about here.]
Figure 9 shows the proportions of stage game strategies for various games, both for endogenous and for fixed learning rules. In the simulations represented in figure 9 the underlying game changes every t_G = 2000 periods. From other simulations we know that during these 2000 periods strategies have adapted to the new game²². Just before the game changes we determine the proportions of stage game strategies C and D. These proportions are represented in figure 9 as circles. The position of the circle is determined by the parameters of the game, g and h. The colour of the circle is white if the proportion of Cs is larger; otherwise it is black.

Figure 9 compares two cases: an exogenously given learning rule of the type 'switch if better', approximated as (a_0, a_1, a_2) = (0, −100000, 100000), and the case of endogenous learning rules.
In both pictures two areas can be distinguished: one area where most of the simulations lead to a majority of C, and another one where most simulations lead to a majority of D. Two points are worth making:

- The fixed learning rule 'switch if better', which is an approximation of the learning rules studied in the literature on local evolution with fixed learning rules²³, leads to results very similar to those observed in this literature.
  - There is cooperation for a substantial range of prisoners' dilemmas. Actually 30.3% of the 142 prisoners' dilemmas in this simulation lead to a majority of cooperating players.
  - In coordination games players do not follow the principle of risk dominance but another principle between risk dominance and Pareto dominance²⁴.
- Under endogenous learning the range of prisoners' dilemmas where most players cooperate shrinks to 10.2% of the 137 prisoners' dilemmas in the simulation. Behaviour in coordination games again does not follow risk dominance.

²² See Kirchkamp (1995).
²³ See Axelrod (1984, p. 158ff), Lindgren and Nordahl (1994), Nowak and May (1992, 1993), Nowak, Bonhoeffer, and May (1993), Eshel, Samuelson, and Shaked (1998), and Kirchkamp (1995).
²⁴ Very similar behaviour is found for the fixed learning rule 'copy the best strategy found in the neighbourhood' in Kirchkamp (1995).
The first point is interesting, because it shows that the model that we have studied in this paper is comparable to the models studied in the literature on local evolution with fixed learning rules.

The second point shows that the properties of network evolution discussed in the literature on local evolution with fixed learning rules persist, at least to some smaller degree, even with endogenous learning rules.
6 Conclusions
In this paper we have studied properties of endogenously evolving learning rules and the stage game behaviour that is implied by these rules. We compared endogenously evolving learning rules with two types of other rules: 'switch if better' rules that are assumed in standard models of local evolution²⁵ and rules that are optimal in a global context²⁶.

²⁵ See Axelrod (1984, p. 158ff), Lindgren and Nordahl (1994), Nowak and May (1992, 1993), Nowak, Bonhoeffer, and May (1993), Eshel, Samuelson, and Shaked (1998), and Kirchkamp (1995).
²⁶ See Börgers and Sarin (1995), Schlag (1993, 1994).
We find that our dynamics selects rules which are different from those commonly assumed in the literature on local evolution. In particular, the learning rules selected on the basis of our dynamics are much less sensitive to changes in a sampled player's payoff. This 'suspicion' can be related to the fact that the sampled player's environment is different from that of the learning player.
The rules that emerge from our local evolutionary process differ from rules that are optimal in a global model. Our rules are not symmetric and they often involve deterministic behaviour. The lack of symmetry in the learning rule is analogous to the lack of similarity between the situation of a learning player and that of the neighbours. The deterministic behaviour is a result of the lack of patience which is a consequence of the more than infinitesimal learning speed.
Stage game behaviour of endogenous rules differs gradually from behaviour found with 'switch if better'. Cooperation for a range of prisoners' dilemmas, and coordination not on risk dominant equilibria, are present with 'switch if better' rules as well as with our endogenous learning rules; with endogenous rules, however, to a more limited degree.
Besides the selection dynamics that we have presented here we have also analysed other selection dynamics. In Kirchkamp and Schlag (1995) we study dynamics where players use less sophisticated update rules than the OLS model used in this paper. We have analysed models where players move only in the direction of the maximum of the OLS model, but do not adopt the estimate of the optimal rule immediately. We have also analysed models where players do not estimate any model at all but copy successful neighbours instead.
Both alternative specifications lead to similar learning rule properties. Probabilities of switching are less sensitive to changes in the neighbour's payoff and more sensitive to changes in the learning player's payoff. The properties of the induced stage game behaviour are also similar: both alternative specifications lead to cooperation for some prisoners' dilemmas and coordination not on risk dominant equilibria. Thus, we can regard the above results as fairly robust.
References
Axelrod, R., 1984, The Evolution of Cooperation (Basic Books, New York).

Binmore, K., and L. Samuelson, 1994, Muddling Through: Noisy Equilibrium Selection, Discussion Paper B–275, SFB 303, Rheinische Friedrich-Wilhelms-Universität Bonn.

Bonhoeffer, S., R. M. May, and M. A. Nowak, 1993, More Spatial Games, International Journal of Bifurcation and Chaos, 4, 33–56.

Börgers, T., and R. Sarin, 1995, Naive Reinforcement Learning With Endogenous Aspirations, Second international conference on economic theory: Learning in games, Universidad Carlos III de Madrid.

Ellison, G., 1993, Learning, Local Interaction, and Coordination, Econometrica, 61, 1047–1071.

Ely, J., 1995, Local Conventions, Mimeo, University of California at Berkeley, Economics Department.

Eshel, I., L. Samuelson, and A. Shaked, 1998, Altruists, Egoists, and Hooligans in a Local Interaction Model, The American Economic Review, 88, 157–179.

Hegselmann, R., 1994, Zur Selbstorganisation von Solidarnetzwerken unter Ungleichen, in Wirtschaftsethische Perspektiven I, ed. by K. Homann, no. 228/I in Schriften des Vereins für Socialpolitik, Gesellschaft für Wirtschafts- und Sozialwissenschaften, Neue Folge, pp. 105–129 (Duncker & Humblot, Berlin).

Hilgard, E. R., D. G. Marquis, and G. A. Kimble, 1961, Conditioning and Learning, 2nd edition revised by Gregory Adams Kimble (Appleton-Century-Crofts, New York).

Kirchkamp, O., 1995, Spatial Evolution of Automata in the Prisoners' Dilemma, Discussion Paper B–330, SFB 303, Rheinische Friedrich-Wilhelms-Universität Bonn.

Kirchkamp, O., and K. H. Schlag, 1995, Endogenous Learning Rules in Social Networks, Rheinische Friedrich-Wilhelms-Universität Bonn, Mimeo.

Lindgren, K., and M. G. Nordahl, 1994, Evolutionary dynamics of spatial games, Physica D, 75, 292–309.

May, R. M., and M. A. Nowak, 1992, Evolutionary Games and Spatial Chaos, Nature, 359, 826–829.

May, R. M., and M. A. Nowak, 1993, The Spatial Dilemmas of Evolution, International Journal of Bifurcation and Chaos, 3, 35–78.

Sakoda, J. M., 1971, The Checkerboard Model of Social Interaction, Journal of Mathematical Sociology, 1, 119–132.

Schelling, T., 1971, Dynamic Models of Segregation, Journal of Mathematical Sociology, 1, 143–186.

Schlag, K. H., 1993, Dynamic Stability in the Repeated Prisoners' Dilemma Played by Finite Automata, Mimeo, University of Bonn.

Schlag, K. H., 1994, Why Imitate, and if so, How? Exploring a Model of Social Evolution, Discussion Paper B–296, SFB 303, Rheinische Friedrich-Wilhelms-Universität Bonn.
List of Figures

1 Neighbourhoods of different sizes
2 The space of considered games
3 Mapping from (a_0, a_1, a_2) ∈ ℝ³ into (ā_0, ā_1, ā_2) ∈ [0, 1]³
4 An example of samples of pairs of parameters and payoffs
5 Long run distribution over parameters of the learning rule
6 Cumulative distribution of probabilities of switching
7 Long run distribution over parameters of the learning rule
8 Dependence of ā_1 and ā_2 on the parameters of the learning rule
9 Stage game behaviour depending on the game
[Figure: three panels showing neighbourhoods of radius r = 1, r = 2, and r = 3]
Fig. 1. Neighbourhoods of different sizes
[Figure: the (g, h) plane with −1 ≤ g ≤ 1 and −2 ≤ h ≤ 2, divided into regions labelled DD_PD, DD, DD^risk, CC, CC^risk, and CD/DC]
Fig. 2. The space of considered games.
[Figure: graph of the mapping, horizontal axis a_i from −4 to 4, vertical axis ā_i from 0 to 1]
Fig. 3. Mapping from (a_0, a_1, a_2) ∈ ℝ³ into (ā_0, ā_1, ā_2) ∈ [0, 1]³
[Figure: scatter of (a_i, u) pairs with a fitted curve and the optimal value a*_i marked]
Fig. 4. An example of samples of pairs of parameters and payoffs (black) which are used to estimate a functional relationship (grey) between a_i and u. Given this relationship an optimal value a*_i is determined.
[Figure: two density plots with relative frequencies, left over (a_1, a_2), right over (a_0, a_1 + a_2); axes run from −∞ to +∞]
Fig. 5. Long run distribution over parameters of the learning rule (a_0, a_1, a_2). Average over 53 simulation runs on a torus of size 50 × 50 with 2-state automata. Neighbourhoods have sizes r_i = r_L = 1, r_u = 2. Relative frequencies are given as percentages. Simulations last for t_s = 400000 periods, interactions take place with probability p_i = 1/2, repeated game strategies are learned from a randomly sampled player with probability 1/t_L = 1/24, learning rules are changed with probability 1/t_u = 1/4000, new games are introduced with probability 1/t_G = 1/2000, mutations both for repeated game strategies and for learning rules occur at a rate of m_L = m_u = 1/100.
[Figure: cumulative distribution, horizontal axis a_0 + a_1 u_own + a_2 u_samp from −∞ to +∞, vertical axis from 0 to 1]
Fig. 6. Cumulative distribution of probabilities of switching, given the learning rules from figure 5.
[Figure: two density plots with relative frequencies as in figure 5, left over (a_1, a_2), right over (a_0, a_1 + a_2)]
Fig. 7. Long run distribution over parameters of the learning rule (a_0, a_1, a_2). Average over 181 simulation runs, each lasting for 400000 periods. Relative frequencies are given as percentages. Parameters are as in figure 5, except that learning players sample the most successful neighbour.
[Figure: ten panels plotting ā_2 against ā_1, with panel titles "time to learn strategies", "time to change the game", "time to update learning rules", "length of simulation", "mutation of strategies", "mutation of learning rules", "radius for interaction, learning and update", "complexity of strategies", "probability of interaction", and "population size"]
Fig. 8. Dependence of ā_1 and ā_2 on the parameters of the learning rule. Dots represent averages over the last 20% of 20 simulation runs, each lasting for 40000 periods. The white circle in each diagram represents averages of the reference parameters: 50 × 50 torus, sample a random player, t_L = 24, t_u = 4000, t_G = 2000, m_L = m_u = 1/100, t_s = 40000, r_i = r_L = 1, r_u = 2, C = 2. The grey line shows ā_2 = −ā_1.
[Figure: the stage game matrix and the game classification of figure 2, next to two panels in the (g, h) plane showing the prevailing strategy, "switch if better" (1241 obs.) and "endogenous" (1170 obs.)]
Fig. 9. Stage game behaviour depending on the game (○ = most players play C, ● = most players play D). Parameters: 50 × 50 torus, r_i = r_L = 1, r_u = 2, sample a random player, t_L = 24, t_u = 4000, t_G = 2000, m_L = 0.1, m_u = 0.1, t_s = 400000.
List of Tables

1 Effects of simulation parameters on properties of learning rules
                     t_L   t_G   t_u   t_s   m_L   m_u   r_i   r_L   r_u   C     p_i   n
degree of locality   ↑     ↓                 ↓           ↓     ↑           ↑           ↑
relative speed                   ↓     ↑                             ↑
relative noise       ↑           ↓           ↑     ↑

(t_L: time to change strategy; t_G: time to change game; t_u: time to update learning rules; t_s: length of simulation; m_L: mutation of strategies; m_u: mutation of learning rules; r_i: interaction radius; r_L: learning radius; r_u: radius for updates of learning rules; C: complexity of strategies; p_i: probability of interaction; n: population size)

Table 1. Effects of simulation parameters on properties of learning rules. ↑ and ↓ denote the direction of the effect that an increase of a parameter has on locality, speed, or noise of the evolutionary process.