Simultaneous Evolution of Learning Rules and Strategies

Oliver Kirchkamp¹

University of Mannheim, SFB 504, L 13, 15, D-68131 Mannheim, email
Abstract

We study a model of local evolution in which agents located on a network interact strategically with their neighbours. Strategies are chosen with the help of learning rules that are themselves based on the success of strategies observed in the neighbourhood.

Previous work on local evolution assumes fixed learning rules while we study learning rules that are determined endogenously.

Simulations indicate that endogenous learning rules put more weight on a learning player's own experience than on the experience of an observed neighbour. Nevertheless, stage game behaviour is similar to behaviour with symmetric learning rules.

Keywords: Evolutionary Game Theory, Learning, Local Interaction, Networks. JEL-Codes: C63, C72, D62, D63, D73, D83, R12, R13.
¹ I am grateful for financial support from the Deutsche Forschungsgemeinschaft through SFB 303 and SFB 504. Parts of this paper were written at the European University Institute (EUI), Florence. I am very grateful for the hospitality and the support that I received from the EUI. I thank Georg Nöldeke, Karl Schlag, Avner Shaked, Fernando Vega-Redondo, two anonymous referees, and several seminar participants for helpful and stimulating comments.

Preprint submitted to Elsevier Preprint, 18 June 1999
1 Introduction
In this paper we intend to study how strategies and learning² rules evolve simultaneously in a local environment. Models of local evolution of strategies with fixed learning rules have been studied by Axelrod (1984, p. 158ff), Lindgren and Nordahl (1994), Nowak and May (1992, 1993), Nowak, Bonhoeffer and May (1993), Eshel, Samuelson and Shaked (1998), and Kirchkamp (1995). In these models players either imitate the strategy of the most successful neighbour or the strategy with the highest average success in their neighbourhood, respectively. Both rules are plausible and lead to the survival of cooperation in prisoners' dilemmas.

However, this phenomenon depends on the learning rule assumed. Other learning rules, for instance players that imitate agents with probabilities proportional to the success of observed strategies, do not lead to cooperation. Proportional imitation rules are arguably plausible since they are optimal when all members of a population are equally likely to interact with each other (a finding obtained by Börgers and Sarin (1995) and Schlag (1993, 1994)).
Since local evolutionary models are more sensitive than global models to the choice of learning rules, we have to take special care. In contrast to Börgers, Sarin or Schlag we will not look for optimal rules. Instead we will use an evolutionary process not only to select strategies but to select learning rules as well. A result of our study will be that local evolution does not converge to the optimal global solution found by Börgers, Sarin or Schlag.
² We have in mind here a descriptive definition of learning in the sense of a (persistent) change of behavioural potentiality (see G. Kimble (Hilgard, Marquis, and Kimble 1961)). We restrict our attention to very simple learning rules. In particular we do not aim to provide a model of learning as a cognitive process.
We shall study a model where players have fixed positions on a network and change their strategy using a rule that is based on imitation. Hence, our model has elements in common with Axelrod (1984), Nowak and May (1992, 1993), Nowak, Bonhoeffer and May (1993), Eshel, Samuelson and Shaked (1998), Kirchkamp (1995), and Lindgren and Nordahl (1994). However, while these writers assume players to use fixed learning rules, we shall allow players to change their learning rule using a process that is based on imitation. We will neither allow players to travel³ nor to optimise⁴. We will find that while endogenous learning rules are different from those assumed by the above authors, stage game behaviour is not affected too much.
In section 2 we describe a model of local endogenous evolution of learning rules. In section 3 we discuss properties of these learning rules. Section 4 analyses the dependence of our results on parameters of the model. In section 5 we then study the implications of endogenous learning for stage game behaviour. Section 6 concludes.
2 The Model
2.1 Overview
In the following we study a population of players, each occupying one cell of a torus of size n × n, where n is between 5 and 200. Players play games with their neighbours on this network, learn repeated game strategies from their neighbours, and update their learning rule using information from their neighbours.
³ See Sakoda (1971), Schelling (1971), Hegselmann (1994), Ely (1995).
⁴ See Ellison (1993), Sakoda (1971) and Schelling (1971).
2.2 Stage Games
Players play games within neighbourhoods which have one of the shapes shown in figure 1.
[Fig. 1 about here.]
A player (marked as a black circle) may only interact with those neighbours (gray) which live no more than r_i cells horizontally or vertically apart.⁵ In each period interactions among two neighbouring players take place independently from all others with probability p_i. Hence, in a given period players may have different numbers of interactions. A typical value for p_i is 1/2. We consider values for p_i ranging from 1/100 to 1 to test the influence of this parameter.
We assume that games which are played among neighbours change every t_G periods.⁶ Changing games force players to adapt and, hence, learning rules to improve. We present results of simulations where t_G ranges from 200 to 20000 periods. Once a new game is selected, all neighbours in our population play the same symmetric 2 × 2 game with the following form:
2 � 2 gamewith thefollowing form:
PlayerII
PlayerI
D C
Dg
g
� 1h
Ch� 1
00
When new games are selected, parameters g and h are chosen randomly following a uniform distribution over the intervals −1 ≤ g ≤ 1 and −2 ≤ h ≤ 2. We can visualise the space of games in a two-dimensional graph (see figure 2).
⁵ The subscript i will be used in the context of individual interactions.
⁶ The subscript G will be used to denote changes in the underlying game.
[Fig. 2 about here.]
The range of games described by −1 ≤ g ≤ 1 and −2 ≤ h ≤ 2 includes both prisoners' dilemmas and coordination games. All games with g ∈ (−1, 0) and h ∈ (0, 1) are prisoners' dilemmas (DD_PD⁷ in figure 2); all games with g > −1 and h < 0 are coordination games. In figure 2 equilibrium strategies are designated with CC, CD, DC and DD. The symbol 'risk' denotes risk dominance for games that have several equilibria.
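To make this construction concrete, the following minimal Python sketch draws and classifies such games. The payoff convention follows the matrix above; the function names and the dictionary encoding are assumptions of this illustration, not code from the paper.

```python
import random

def draw_game():
    """Draw g and h uniformly, as described in section 2.2."""
    g = random.uniform(-1, 1)
    h = random.uniform(-2, 2)
    return g, h

def payoff(own, other, g, h):
    """Row player's stage game payoff from the matrix above."""
    table = {("D", "D"): g, ("D", "C"): h,
             ("C", "D"): -1, ("C", "C"): 0}
    return table[(own, other)]

def classify(g, h):
    """The two regions of figure 2 that the text singles out."""
    if -1 < g < 0 and 0 < h < 1:
        return "prisoners' dilemma"   # D is dominant, yet CC beats DD
    if g > -1 and h < 0:
        return "coordination game"    # CC and DD are both equilibria
    return "other"
```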
2.3 Repeated Game Strategies
We assume that each player uses a single repeated game strategy against all neighbours. Repeated game strategies are represented as (Moore) automata with a maximal number of states of one, two, three, or four⁸. For many simulations we limit the number of states to less than three.
⁷ Games with g ∈ (−1, 0) and h > 1 will not be called prisoners' dilemmas since CC is Pareto dominated by alternating between CD and DC.
⁸ Each 'state' of a Moore automaton is described with the help of a stage-game strategy and a transition function to either the same or any other of the automaton's states. This transition depends on the opponent's stage-game strategy. Each automaton has one 'initial state' that the automaton enters when it is used for the first time.
There are 2 automata with only one state (one of them initially plays C and remains in this state whatever the opponent does; the other always plays D).
There are 26 automata with one or two states. For example 'grim' is a two-state automaton. In the initial state it plays C. It stays there unless the opponent plays D. Then the automaton switches to the second state, where it plays D and stays there forever. Other popular two-state automata include 'tit-for-tat', 'tat-for-tit', etc.
The set of all automata with less than four states contains 1752 different repeated game strategies. The set of all automata with less than five states already has size 190646.
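To illustrate the representation described in footnote 8, here is a minimal Python encoding of the two-state automaton 'grim'. Only the behaviour of 'grim' is taken from the footnote; the dictionary layout and function names are assumptions of this sketch.

```python
# 'grim' as a Moore automaton: each state carries a stage game action,
# transitions depend on the opponent's stage game action.
GRIM = {
    "initial": 0,
    "action": {0: "C", 1: "D"},
    "next": {(0, "C"): 0, (0, "D"): 1,   # a defection triggers state 1 ...
             (1, "C"): 1, (1, "D"): 1},  # ... which is absorbing
}

def play(automaton, opponent_actions):
    """Return the automaton's actions against a given opponent path."""
    state, actions = automaton["initial"], []
    for theirs in opponent_actions:
        actions.append(automaton["action"][state])
        state = automaton["next"][(state, theirs)]
    return actions

# play(GRIM, ["C", "D", "C"]) == ["C", "C", "D"]: grim cooperates until
# it has observed one defection and defects forever afterwards.
```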
2.4 Learning Rules
From time to time a player has the opportunity to revise his or her repeated game strategy. We assume that this opportunity is a random event that occurs for each player independently at the end of each period with a certain probability. Learning probabilities will be denoted 1/t_L and range from 1/6 to 1/120. t_L then denotes the average time between two learning events.⁹ Hence, learning occurs less frequently than interaction. Still, learning occurs more frequently than changes of the stage game and updates of the learning rule itself (see below).
When a player updates the repeated game strategy, he or she randomly samples one member of the neighbourhood and then applies the individual learning rule. In one respect this approach is simpler than the rules discussed in the literature¹⁰, which simultaneously use information on all neighbours from the learning neighbourhood. To facilitate comparison with the literature, and as an extension of the base model, we study endogenous multi-player rules in section 4.1. We find that endogenous multi-player rules have properties that are very similar to our endogenous single-player rules. To further facilitate comparison with the literature we study fixed single-player rules in section 5 and find that they are very similar to the corresponding fixed multi-player rules from the literature.
Learning occurs in neighbourhoods which have the same shape as neighbourhoods for interaction (see again figure 1). We denote the size of the neighbourhood for learning with the symbol r_L.
⁹ The subscript L will be used in the context of changes of strategies (either through learning or through mutation).
¹⁰ See Axelrod (1984, p. 158ff), Lindgren and Nordahl (1994), Nowak and May (1992, 1993), Nowak, Bonhoeffer, and May (1993), Eshel, Samuelson, and Shaked (1998), Kirchkamp (1995).
A player’s learningruleusesthefollowing information:
(1) Thelearningplayer’s repeatedgamestrategy.
(2) The own payoff uown obtainedwith this player’s repeatedgamestrategy,
i.e. the averagepayoff per interactionwhile using this repeatedgamestrat-
egy.
(3) A sampledplayer’s repeatedgamestrategy.
(4) The sampledplayer’s repeatedgamestrategy payoff usamp, i.e. the average
payoff perinteractionthattheplayerobtainedwhile usingthis repeatedgame
strategy.
Learning rules are characterised by a vector of three parameters (a_0, a_1, a_2) ∈ ℝ³. Given a learning rule (a_0, a_1, a_2), a learning player samples one neighbour's strategy and payoff and then switches to the sampled strategy with probability

$$p(u_{own}, u_{samp}) = \langle a_0 + a_1 u_{own} + a_2 u_{samp} \rangle \qquad (1)$$

where ⟨·⟩ is defined as

$$\langle x \rangle := \begin{cases} 1 & \text{if } x \geq 1 \\ 0 & \text{if } x \leq 0 \\ x & \text{otherwise} \end{cases} \qquad (2)$$

u_own and u_samp denote the player's and the neighbour's payoff.
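Equations (1) and (2) together describe a linear function of the two payoffs, clipped to the unit interval. A minimal Python sketch (the function name is ours):

```python
def switch_probability(a0, a1, a2, u_own, u_samp):
    """Equation (1) with the clipping of equation (2)."""
    p = a0 + a1 * u_own + a2 * u_samp
    return min(1.0, max(0.0, p))

# With (a0, a1, a2) = (0, -a, a) and a large, the probability jumps from
# 0 to 1 as soon as u_samp exceeds u_own, approximating the deterministic
# 'switch if better' rule discussed below; section 5 uses a = 100000.
```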
Thus, the two parameters a_1 and a_2 reflect sensitivities of the switching probability to changes in the player's and the neighbour's payoff. The parameter a_0 reflects a general readiness to change to new strategies. This can be interpreted as a higher or lower inclination to try something new. Choosing (a_0, a_1, a_2) a player determines three ranges of payoffs: one where switching will never occur (i.e. p(u_own, u_samp) = 0), a second one where behaviour is stochastic (i.e. p(u_own, u_samp) ∈ (0, 1)), and a third one where switching always occurs (i.e. p(u_own, u_samp) = 1).
Of course, one or even two of these ranges may be empty. Our specification, indeed, allows for three types of rules: rules which are (almost) always deterministic, rules which always behave stochastically, and rules which react deterministically for some payoffs and stochastically for others. An example of a deterministic rule ('switch if better') is (a_0, a_1, a_2) := (0, −a, a) with a → ∞. An example of a rule that always implies stochastic behaviour for the game given above is (a_0, a_1, a_2) := (1/2, −a, a) with 1/(4a) ≥ max(|g|, |h|, 1).
Our parameter a_0 is similar to the aspiration level ∆ from the global model studied in Binmore and Samuelson (1994). However, the learning rules studied in Binmore and Samuelson are not special cases of our learning rules, since their decisions are perturbed by exogenous noise. In cases where this noise term becomes small our rule approximates Binmore and Samuelson (1994) with (a_0, a_1, a_2) := (∆, −a, a) and a → ∞.
2.4.1 Normalisation
We map parameters (a_0, a_1, a_2) ∈ ℝ³ into (ā_0, ā_1, ā_2) ∈ [0, 1]³ using the following condition (see also figure 3):

$$a_i = \tan\left(\pi \bar{a}_i - \frac{\pi}{2}\right), \quad i \in \{0, 1, 2\} \qquad (3)$$
[Fig. 3 about here.]
Evolution operates on the normalised values (ā_0, ā_1, ā_2) ∈ [0, 1]³ in order not to rule out deterministic rules: learning rules from the literature are often deterministic. They can be represented as rules whose parameter values a_i are infinitely large or small. Since our evolutionary process is modeled in finite time it can hardly converge to infinite values. However, it could converge to finite values of our normalisation and still denote deterministic rules.
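For illustration, the mapping of equation (3) and its inverse in Python (function names are ours):

```python
import math

def to_real(a_bar):
    """Equation (3): map a normalised value in (0, 1) to the real line."""
    return math.tan(math.pi * a_bar - math.pi / 2)

def to_unit(a):
    """Inverse of equation (3): map a real parameter into (0, 1)."""
    return (math.atan(a) + math.pi / 2) / math.pi
```

A side effect worth noting, used in sections 2.5.1 and 2.6 below: drawing ā_i uniformly on [0, 1] and mapping it through to_real yields a parameter a_i with a standard Cauchy distribution.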
2.4.2 Mutations of Repeated Game Strategies
We introduce mutations of repeated game strategies to show that simulations are particularly robust. However, as we see in section 4, we do not need mutations for our results.

When a player learns a repeated game strategy, sometimes the learning process described above fails and a random strategy is learned instead. In this case, any repeated game strategy, as described in section 2.3, is selected with equal probability. Such a 'mutation' occurs with a fixed probability m_L. We consider mutation rates between 0 and 0.7.
2.5 Exogenous Dynamics that Select Learning Rules
From time to time players revise their learning rules. We assume that players update learning rules independently with probability 1/t_u in each period. t_u, hence, denotes the average time between two updates.¹¹ We consider update rates 1/t_u between 1/40000 and 1/400. If not mentioned otherwise, 1/t_u = 1/4000. In particular, updates of learning rules occur more rarely than updates of strategies or changes of games. Given that updates of learning rules do not occur very often it is reasonable to assume that players make a larger effort to update learning rules: firstly, all neighbours are sampled (and not only a single neighbour as for updates of strategies); secondly, the sampled data is evaluated more efficiently, now using a quadratic approximation.
¹¹ The subscript u will be used to denote changes of learning rules (either through learning or through mutation).
The shapes of the neighbourhoods that are used to update learning rules are similar to those used for interaction and learning (see figure 1). We denote the size of the neighbourhood for updates of learning rules with the symbol r_u.
A player who updates his or her learning rule has the following information for all neighbours individually (including him- or herself):

(1) The (normalised) parameters of their current learning rules ā_0, ā_1, ā_2.
(2) The average payoffs per interaction that the players obtained while their current learning rule was used, u(ā_0, ā_1, ā_2).
To evaluate this information we assume that players estimate a model that helps them explain their environment, in particular their payoffs. Players use the estimated model to choose an optimal learning rule. To describe this decision process we assume that players approximate a quadratic function of the learning parameters to explain how successful learning rules are. Formally the quadratic function can be written as follows:
$$u(\bar{a}_0, \bar{a}_1, \bar{a}_2) = c + (\bar{a}_0 \; \bar{a}_1 \; \bar{a}_2) \begin{pmatrix} b_0 \\ b_1 \\ b_2 \end{pmatrix} + (\bar{a}_0 \; \bar{a}_1 \; \bar{a}_2) \begin{pmatrix} q_{00} & q_{01} & q_{02} \\ q_{01} & q_{11} & q_{12} \\ q_{02} & q_{12} & q_{22} \end{pmatrix} \begin{pmatrix} \bar{a}_0 \\ \bar{a}_1 \\ \bar{a}_2 \end{pmatrix} + \varepsilon \qquad (4)$$
Players make an OLS estimation to derive the parameters of this model (ε describes the noise). We choose a quadratic function because it is one of the simplest models which still has an optimum. Similarly, we assume that players derive this model using an OLS estimation because this is a simple and canonical way of aggregating the information players have. This should not be taken too literally: we want to model players that behave as if they would maximise a quadratic model which is derived from an OLS estimation.
Given the parameters (c, b_0, b_1, b_2, q_00, ..., q_22) of the estimated model, players choose the triple (ā_0, ā_1, ā_2) that maximises u(ā_0, ā_1, ā_2) s.t. (ā_0, ā_1, ā_2) ∈ [0, 1]³. We find that in 99% of all updates the Hessian of u(ā_0, ā_1, ā_2) is negative definite, i.e. u(ā_0, ā_1, ā_2) has a unique local maximum. In the less than 1% of all updates remaining, the quadratic model might be unreliable. In this case players copy the most successful neighbour.
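The update step can be illustrated with a short numpy sketch. This is not the author's code: the exact construction of the regressors and the crude clipping at the boundary of [0, 1]³ are assumptions of this illustration.

```python
import numpy as np

def update_rule(params, payoffs):
    """Fit the quadratic model (4) by OLS over the neighbours' normalised
    parameters (an n-by-3 array) and payoffs, then maximise it."""
    A = np.asarray(params, dtype=float)
    u = np.asarray(payoffs, dtype=float)
    # Regressors: constant, three linear terms, three squares,
    # and three cross products.
    X = np.column_stack([np.ones(len(A)), A,
                         A[:, 0] ** 2, A[:, 1] ** 2, A[:, 2] ** 2,
                         A[:, 0] * A[:, 1], A[:, 0] * A[:, 2],
                         A[:, 1] * A[:, 2]])
    beta, *_ = np.linalg.lstsq(X, u, rcond=None)
    b = beta[1:4]
    d0, d1, d2, c01, c02, c12 = beta[4:]
    # Symmetric matrix of the quadratic form in equation (4); the fitted
    # cross coefficients count each off-diagonal entry twice.
    Q = np.array([[d0, c01 / 2, c02 / 2],
                  [c01 / 2, d1, c12 / 2],
                  [c02 / 2, c12 / 2, d2]])
    if np.all(np.linalg.eigvalsh(Q) < 0):     # Hessian 2Q negative definite
        a_star = np.linalg.solve(-2 * Q, b)   # first-order condition
        return np.clip(a_star, 0.0, 1.0)      # crude boundary treatment
    # Quadratic model unreliable: copy the most successful neighbour.
    return A[np.argmax(u)]
```

Since u = c + b'ā + ā'Qā has gradient b + 2Qā, the interior optimum solves −2Qā* = b whenever the Hessian is negative definite, which corresponds to the 99% of updates mentioned above.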
Figure 4 shows (only for one dimension) an example of a sample of several pairs of parameter a_i and payoff u (black dots) together with the estimation of the functional relationship (grey line) between a_i and u.
[Fig. 4 about here.]
2.5.1 Mutations of Learning Rules
We introduce mutations of learning rules to show that our simulation results are robust. However, as we show in section 4, we do not need mutations. Results without mutations are very similar to results with a small amount of mutations.

When a player updates his or her learning rule, with a small probability m_u it is not the above described update scheme which is used; instead the player learns a random learning rule. This rule is chosen following a uniform distribution over (ā_0, ā_1, ā_2) ∈ [0, 1]³, which is equivalent to a random and independent selection of a_0, a_1, a_2, each following a Cauchy distribution. We consider mutation rates for learning rules m_u between 0 and 0.7.
2.6 Initial Configuration
At the beginning of each simulation each player starts with a random learning rule that is chosen following a uniform distribution over (ā_0, ā_1, ā_2) ∈ [0, 1]³. Thus, initially, the parameters a_0, a_1, a_2 are distributed independently, each following a Cauchy distribution. Hence, in the first period the expected values of the parameters of the learning rules are ā_0 = ā_1 = ā_2 = 0. Each player starts with a random repeated game strategy, following a uniform distribution over the available strategies.
3 Results with Endogenous Learning Rules
3.1 Distribution over Learning Parameters
Figure 5 displays averages over 53 simulations on a 50 × 50 grid, each lasting for 400000 periods.
[Fig. 5 about here.]
Figure 5 shows two different projections of (a_0, a_1, a_2). The left part displays the distribution over (a_1, a_2), the right part over (a_0, a_1 + a_2). Axes range from 0 to 1 for ā_0, ā_1, and ā_2 and from 0 to 2 in the case of ā_1 + ā_2. Labels on the axes do not represent the normalised values but a_0, a_1, a_2 instead, which range from −∞ to +∞.¹²
Both pictures are a density plot and a table of relative frequencies:

Density plot: Different densities of the distribution are represented by different shades of grey. The highest density is represented by the darkest grey.¹³

Table of relative frequencies: The pictures in figure 5 also contain a table of relative frequencies. The left picture is divided into eight sectors, the right picture is divided into six rectangles. The percentages within each sector or rectangle represent the proportion of players that use a learning rule with parameters in the particular range.
The left part of figure 5 shows that, while players are sensitive to their own payoff, they are substantially less sensitive to observed payoffs.
¹² The scaling of all axes follows the normalisation given in equation (3). Hence, the value "a_1 + a_2" actually represents 2 · tan(π · (ā_1 + ā_2)/2 − π/2) and not a_1 + a_2. In the current context this difference is negligible.
¹³ Densities are derived from a table of frequencies with 30 × 30 cells for each picture. We map logs of densities into different shades of grey. The interval between the log of the highest density and the log of 1% of the highest density is split into seven ranges of even width. Densities with logs in the same interval have the same shade of grey. Thus, the white area represents densities smaller than 1% of the maximal density while areas with darker shades of grey represent densities larger than 1.9%, 3.7%, 7.2%, 14%, 27% and 52% of the maximal density.
Sensitivity to own payoffs: Given that we start with a uniform distribution of ā_1 and ā_2, figure 5 initially would look like a smooth grey surface without any mountains or valleys. During the course of the simulations learning parameters change substantially. Similar to many learning rules discussed in the literature¹⁴, a_1 is often close to −∞.

Insensitivity to sampled payoffs: The left part of figure 5 shows that 96.3% of all players use learning rules with |a_2| < |a_1|, i.e. rules that are more sensitive to own payoff than to sampled payoff. Notice that the initial distribution over parameters of the learning rule implies that 50% of all rules fulfil |a_2| < |a_1|.

We describe this kind of behaviour as 'suspicious'¹⁵ in the following sense: the success of a neighbour's rule may depend on players who are neighbours of the neighbour, but not neighbours of the player. A 'suspicious' player behaves like somebody who 'fears' that the sampled neighbour's experience cannot be generalised to apply to the player's own case.
3.2 Probabilities of switching to a sampled learning rule
The learning rules as specified in equation (1) determine probabilities of switching to the observed repeated game strategy. Figure 6 shows the cumulative distribution of these probabilities.¹⁶
[Fig. 6 about here.]
¹⁴ See Axelrod (1984, p. 158ff), Lindgren and Nordahl (1994), Nowak and May (1992, 1993), Nowak, Bonhoeffer, and May (1993), Eshel, Samuelson, and Shaked (1998), Kirchkamp (1995).
¹⁵ Of course, our agents can only behave as if they had feelings like suspicion.
¹⁶ This figure is derived from a table of frequencies with 30 cells. The scaling of the horizontal axis follows the normalisation given in equation (3).
The horizontal axis represents a_0 + a_1 u_own + a_2 u_samp. Following learning rule (1), a player switches stochastically with probability a_0 + a_1 u_own + a_2 u_samp if this expression is between zero and one. Otherwise the player either switches with certainty or not at all.

Figure 6 shows that only in about 12% of all learning events 0 < a_0 + a_1 u_own + a_2 u_samp < 1; this means that in only 12% of all learning events a player's decision is a stochastic one. Our endogenous rules are neither fully stochastic (as those from Börgers and Sarin (1995) and Schlag (1993)) nor fully deterministic (as those from Axelrod (1984, p. 158ff), Nowak and May (1992), etc.).
3.3 Comparison with other Learning Rules
Above we mentioned two reference points for learning rules: those learning rules that are assumed to be exogenous and fixed in the literature on local evolution, and rules that turn out to be optimal in a global setting.

Let us start with the exogenous rules that are assumed in the literature on local evolution. We have seen that the exogenous fixed rules may be similar to the endogenous learning rules in the sense that small changes in the player's own payoff may lead to drastic changes in the probability of adopting a new strategy. Endogenous learning rules differ from those studied in parts of the literature on local evolution in the sense that changes in an observed player's payoff lead to smaller changes in the probability of adopting a new strategy.

Let us next compare our endogenous rules with those rules that are seen to be optimal in a global setting¹⁷. As we have seen, our rules differ in two respects from those that are optimal in a global model. Firstly, as discussed in section 3.1, they are more sensitive to changes in a learning player's own payoff than to changes in an observed neighbour's payoff. Secondly, as mentioned in section 3.2, players following endogenous rules quite often switch with certainty. Why is the outcome of our evolutionary process so different from optimal rules?

¹⁷ See Börgers and Sarin (1995), Schlag (1993, 1994).
A higher sensitivity to a player's own payoff as compared to an observed neighbour's payoff can be related to the local structure. Strategies that are successful in a neighbour's neighbourhood may be less successful in a player's own neighbourhood. Therefore a neighbour's payoff is a less reliable source of information than a player's own payoff.

Optimal learning rules always switch stochastically in order to evaluate information slowly but efficiently. Even small differences in payoffs are translated into different behaviour. Learning slowly is not harmful in Börgers and Sarin's or in Schlag's model. Their players do not care whether the optimal strategy is only reached after an infinite time. In our evolutionary process, however, discounting is involved through the regular update of learning rules. Learning rules that quickly achieve sufficiently good results outperform rules that slowly approach the optimum. Evolution of learning rules at a speed that is more than infinitesimal may lead to deterministic behaviour.
4 Dependence on Parameters
Our comments so far have been based on a particular parameter combination. Changing the parameters, however, does not affect our main results. We will consider an alternative rule to sample neighbours in section 4.1. We will study changes in the other parameters in section 4.2.
4.1 The Selection Rule: Sampling Randomly or Selectively
Above we assumed that players learn repeated game strategies from a randomly sampled player. One might, however, object that players could be more careful in selecting their samples. As a benchmark case let us assume in this section that players sample the most successful neighbour, i.e. the neighbour with the highest payoff per interaction for the current repeated game strategy, measured over the lifetime of the repeated game strategy.

Figure 7 shows a distribution over (a_0, a_1, a_2), projected into the (a_1, a_2) and (a_0, a_1 + a_2) space, similar to figure 5.
[Fig. 7 about here.]
While the picture is more noisy than figure 5, the properties of learning rules are the same as in section 3.1. Players are quite sensitive to changes in their own payoff but less sensitive to changes in the sampled neighbours' payoff.

We suspect that the additional noise stems from the reduced evolutionary pressure on learning rules. Since only 'best' rules are available, identifying good rules is too easy. In figure 7 we observe a cluster of learning rules with values of a_2 close to +∞. These rules apparently follow a 'just copy whatever you see' strategy. This might be reasonable, since 'whatever you see' is already the best available strategy in your neighbourhood under this selection rule.
4.2 Other Parameter Changes
Figure 8 shows the effect of various changes in the other parameters. We always start from the same combination of parameters as a reference point and then vary one of the parameters, keeping all the others fixed. The reference point is a simulation on a torus of size 50 × 50, where the interaction neighbourhood and learning neighbourhood both have the same size r_i = r_L = 1 while the neighbourhood that is used when updating the learning rule has size r_u = 2. To learn a new repeated game strategy players randomly sample a neighbour. Learning occurs on average every t_L = 24 periods¹⁸. The underlying game is changed every t_G = 2000 periods. Players update their learning rule on average every t_u = 4000 periods¹⁹. The mutation rate for learning as well as for updates of learning rules is m_L = m_u = 1/100. Simulations last for t_s = 40000 periods.²⁰ Thus, except for the simulation length, parameters are the same as those for figure 5.
[Fig. 8 about here.]
Figure 8 shows averages²¹ of ā_1 and ā_2 for various changes in the parameters. Each dot represents a parameter combination that we simulated. To make the underlying pattern clearer, dots are connected through interpolated splines. The white dot in each diagram represents the average value (ā_1 = −1.89, ā_2 = 0.30) for the reference set of parameters described above. The line ā_2 = −ā_1 is marked in grey.

The main result is that for all parameter combinations we find relative sensitivity to the player's own payoffs, and insensitivity to observed payoffs. In particular the averages satisfy ā_2 < −ā_1 for all parameter combinations that we simulated.
¹⁸ Remember that we assume learning to be an independent random event that occurs for each player with probability 1/t_L.
¹⁹ We also assume updates of learning rules to be independent random events that occur for each player with probability 1/t_u.
²⁰ The subscript s will be used to denote the length of the simulation.
²¹ These are arithmetic averages of ā_1 and ā_2, taken over 20 different simulation runs, which are randomly initialised. The average values of ā_1 and ā_2 are then transformed into a_1 and a_2 using equation (3).
Notice that we do not need mutations for our results. However, the simulations are robust with respect to mutations. To show that we can dispense with both kinds of mutations simultaneously we ran a simulation where m_L = m_u = 0 and show the result in the graph 'mutation of learning rules' in figure 8 with a small triangle. While learning on average leads to a smaller value of ā_2, we still have ā_2 < −ā_1. On the other hand, we can introduce quite large probabilities of mutations (up to 0.7) and still have ā_2 < −ā_1.
Let us now discuss the sensitivity of learning rules to parameters of the evolutionary process in more detail. We distinguish three different classes of parameters: those which influence relative locality, relative speed, and relative noise of the evolutionary process. We will see that parameter values that describe a slow or noisy process cause the distribution over the parameters of the learning rule to remain close to the initial distribution. Parameter values that describe a less local situation (for example when the interaction radius is so large that almost everybody interacts with everybody else) cause 'suspicious' behaviour to disappear gradually.
Table 1 summarises the effects of the parameters on these three dimensions.
[Table 1 about here.]
Let us briefly discuss some of the dependencies on parameters:
4.2.1 Locality
In the following paragraph we will explain in which way the parameters t_G, t_L, m_L, r_i, r_L, C, and n influence the diversity of players and, hence, the locality of the evolutionary process. More locality makes a player's own experience more important; hence, the player will be more 'suspicious'. Thus, the ratio ā_1/ā_2 will be smaller.
When games change rarely (i.e. t_G is large) or when players learn frequently (i.e. t_L is small) they have a better chance of finding the 'long-run' strategy for a given game. Thus, they become more similar, which reduces locality. Diversity among players may also be reduced by 'background noise' m_L since noise makes players more similar. A larger learning radius (r_L) increases the effects of locality since players will sample more neighbours that could be in a different situation. This shows that being able to spot locality is actually one of its conditions. Likewise, situations become more diverse when the interaction neighbourhood r_i is small. Another source of heterogeneity is that complexity of strategies also diversifies players' behaviour. A larger heterogeneity is also provided by a larger population (n).
4.2.2 Speed
The parameters t_s, t_u, and r_u influence the speed of the evolutionary process since they affect the frequency or the scale of learning steps. Higher speed allows learning rules to move away from the initial distribution, and thus to move farther to the left in the diagram.

The longer the evolutionary process runs (t_s), the more time learning rules have to develop and to move away from the initial distribution. Also, the more frequently players update learning rules (i.e. the smaller t_u), the faster learning rules evolve and move away from the initial distribution. The farther players see (r_u) when updating a learning rule, the faster successful learning rules spread through the population.
4.2.3 Noise
The parameters t_L, t_u, m_L, m_u influence the noise within the evolutionary process. This again keeps averages of the parameters of the learning rules closer to the initial distribution.

When learning rules are rarely used to select strategies (i.e. when t_L is large), then players remain inexperienced, and learning rules are not pushed into a particular direction. When learning rules are rarely updated (i.e. when t_u is large) then more reliable data is available when the success of a learning rule is evaluated. Development is, as a result, less noisy. The more strategies are perturbed when they are learned (m_L), the more difficult it becomes to evaluate a learning rule's impact on success. The more learning rules are perturbed during the update process (m_u), the more they are pushed back to the initial distribution.
Notice, however, that changes in some parameters may have conflicting effects. One example is the speed t_u of updating learning rules: for very small values of t_u learning rules are updated too often to accumulate a reasonable amount of data on the success of the rule. As a consequence, the evolutionary process is too noisy to diverge from the initial distribution. For very large values of t_u the data on the performance of learning rules becomes more reliable while learning becomes slower. Again learning rules do not move away from their initial distribution. In other words: updating a learning rule is beneficial for the updating player, who takes advantage of information that is provided by neighbours. At the same time neighbours suffer since the updating player ceases to serve as a reliable source of information. These two conflicting effects explain the turn in the t_u-curve.
We can conclude that, whatever parameters are chosen, learning rules have similar properties in the end. The dependence of learning rules on parameters seems to be intuitive.
5 Stage Game Behaviour
Having dealt with the immediate properties of endogenous learning rules, let us now analyse the impact that endogenous rules have on stage game behaviour.
[Fig. 9 about here.]
Figure 9 shows the proportions of stage game strategies for various games, both for endogenous and for fixed learning rules. In the simulations represented in figure 9 the underlying game changes every t_G = 2000 periods. From other simulations we know that during these 2000 periods strategies have adapted to the new game²². Just before the game changes we determine the proportions of stage game strategies C and D. These proportions are represented in figure 9 as circles. The position of the circle is determined by the parameters of the game, g and h. The colour of the circle is white if the proportion of Cs is larger; otherwise it is black.

Figure 9 compares two cases: an exogenously given learning rule of the type 'switch if better', approximated as (a_0, a_1, a_2) = (0, −100000, 100000), and the case of endogenous learning rules.
In both pictures two areas can be distinguished: one area where most of the simulations lead to a majority of C, and another one where most simulations lead to a majority of D. Two points are worth making:

- The fixed learning rule 'switch if better', which is an approximation of the learning rules studied in the literature on local evolution with fixed learning rules²³, leads to results very similar to those observed in this literature.
  - There is cooperation for a substantial range of prisoners' dilemmas. Actually 30.3% of the 142 prisoners' dilemmas in this simulation lead to a majority of cooperating players.
  - In coordination games players do not follow the principle of risk dominance but another principle between risk dominance and Pareto dominance²⁴.
- Under endogenous learning the range of prisoners' dilemmas where most players cooperate shrinks to 10.2% of the 137 prisoners' dilemmas in the simulation. Behaviour in coordination games again does not follow risk dominance.

²² See Kirchkamp (1995).
²³ See Axelrod (1984, p. 158ff), Lindgren and Nordahl (1994), Nowak and May (1992, 1993), Nowak, Bonhoeffer, and May (1993), Eshel, Samuelson, and Shaked (1998), and Kirchkamp (1995).
²⁴ Very similar behaviour is found for the fixed learning rule 'copy the best strategy found in the neighbourhood' in Kirchkamp (1995).
The first point is interesting, because it shows that the model that we have studied in this paper is comparable to the models studied in the literature on local evolution with fixed learning rules.

The second point shows that the properties of network evolution discussed in the literature on local evolution with fixed learning rules persist, at least to some smaller degree, even with endogenous learning rules.
6 Conclusions
In this paper we have studied properties of endogenously evolving learning rules and the stage game behaviour that is implied by these rules. We compared endogenously evolving learning rules with two types of other rules: 'switch if better' rules that are assumed in standard models of local evolution²⁵ and rules that are optimal in a global context²⁶.

²⁵ See Axelrod (1984, p. 158ff), Lindgren and Nordahl (1994), Nowak and May (1992, 1993), Nowak, Bonhoeffer, and May (1993), Eshel, Samuelson, and Shaked (1998), and Kirchkamp (1995).
²⁶ See Börgers and Sarin (1995), Schlag (1993, 1994).
We find that our dynamics selects rules which are different from those commonly assumed in the literature on local evolution. In particular, the learning rules selected on the basis of our dynamics are much less sensitive to changes in a sampled player's payoff. This 'suspicion' can be related to the fact that the sampled player's environment is different from that of the learning player.
The rules that emerge from our local evolutionary process differ from rules that are optimal in a global model. Our rules are not symmetric and they often involve deterministic behaviour. The lack of symmetry in the learning rule is analogous to the lack of similarity between the situation of a learning player and that of the neighbours. The deterministic behaviour is a result of the lack of patience which is a consequence of the more than infinitesimal learning speed.
Stage game behaviour of endogenous rules differs gradually from behaviour found with 'switch if better'. Cooperation for a range of prisoners' dilemmas, and coordination not on risk dominant equilibria, are present with 'switch if better' rules as well as with our endogenous learning rules; with endogenous rules, however, to a more limited degree.
Besides the selection dynamics that we have presented here we have also analysed other selection dynamics. In Kirchkamp and Schlag (1995) we study dynamics where players use less sophisticated update rules than the OLS model used in this paper. We have analysed models where players move only in the direction of the maximum of the OLS model, but do not adopt the estimate of the optimal rule immediately. We have also analysed models where players do not estimate any model at all but copy successful neighbours instead.
Both alternative specifications lead to similar learning rule properties. Probabilities of switching are less sensitive to changes in the neighbour's payoff and more sensitive to changes in the learning player's payoff. The properties of the induced stage game behaviour are also similar: both alternative specifications lead to cooperation for some prisoners' dilemmas and coordination not on risk dominant equilibria. Thus, we can regard the above results as fairly robust.
References
Axelrod, R., 1984, The Evolution of Cooperation (Basic Books, New York).

Binmore, K., and L. Samuelson, 1994, Muddling Through: Noisy Equilibrium Selection, Discussion Paper B–275, SFB 303, Rheinische Friedrich-Wilhelms-Universität Bonn.

Bonhoeffer, S., R. M. May, and M. A. Nowak, 1993, More Spatial Games, International Journal of Bifurcation and Chaos, 4, 33–56.

Börgers, T., and R. Sarin, 1995, Naive Reinforcement Learning With Endogenous Aspirations, Second international conference on economic theory: Learning in games, Universidad Carlos III de Madrid.

Ellison, G., 1993, Learning, Local Interaction, and Coordination, Econometrica, 61, 1047–1071.

Ely, J., 1995, Local Conventions, Mimeo, University of California at Berkeley, Economics Department.

Eshel, I., L. Samuelson, and A. Shaked, 1998, Altruists, Egoists, and Hooligans in a Local Interaction Model, The American Economic Review, 88, 157–179.

Hegselmann, R., 1994, Zur Selbstorganisation von Solidarnetzwerken unter Ungleichen, in Wirtschaftsethische Perspektiven I, ed. by K. Homann, no. 228/I in Schriften des Vereins für Socialpolitik, Gesellschaft für Wirtschafts- und Sozialwissenschaften, Neue Folge, pp. 105–129 (Duncker & Humblot, Berlin).

Hilgard, E. R., D. G. Marquis, and G. A. Kimble, 1961, Conditioning and Learning, 2nd edition revised by Gregory Adams Kimble (Appleton-Century-Crofts, New York).

Kirchkamp, O., 1995, Spatial Evolution of Automata in the Prisoners' Dilemma, Discussion Paper B–330, SFB 303, Rheinische Friedrich-Wilhelms-Universität Bonn.

Kirchkamp, O., and K. H. Schlag, 1995, Endogenous Learning Rules in Social Networks, Rheinische Friedrich-Wilhelms-Universität Bonn, Mimeo.

Lindgren, K., and M. G. Nordahl, 1994, Evolutionary dynamics of spatial games, Physica D, 75, 292–309.

May, R. M., and M. A. Nowak, 1992, Evolutionary Games and Spatial Chaos, Nature, 359, 826–829.

May, R. M., and M. A. Nowak, 1993, The Spatial Dilemmas of Evolution, International Journal of Bifurcation and Chaos, 3, 35–78.

Sakoda, J. M., 1971, The Checkerboard Model of Social Interaction, Journal of Mathematical Sociology, 1, 119–132.

Schelling, T., 1971, Dynamic Models of Segregation, Journal of Mathematical Sociology, 1, 143–186.

Schlag, K. H., 1993, Dynamic Stability in the Repeated Prisoners' Dilemma Played by Finite Automata, Mimeo, University of Bonn.

Schlag, K. H., 1994, Why Imitate, and if so, How? Exploring a Model of Social Evolution, Discussion Paper B–296, SFB 303, Rheinische Friedrich-Wilhelms-Universität Bonn.
List of Figures

1 Neighbourhoods of different sizes
2 The space of considered games
3 Mapping from (a_0, a_1, a_2) ∈ ℝ³ into (ā_0, ā_1, ā_2) ∈ [0, 1]³
4 An example of samples of pairs of parameters and payoffs
5 Long run distribution over parameters of the learning rule
6 Cumulative distribution of probabilities of switching
7 Long run distribution over parameters of the learning rule
8 Dependence of ā_1 and ā_2 on the parameters of the learning rule
9 Stage game behaviour depending on the game
[Figure: three panels showing neighbourhoods of radius r = 1, r = 2, and r = 3]
Fig. 1. Neighbourhoods of different sizes
[Figure: the (g, h) plane with −1 ≤ g ≤ 1 and −2 ≤ h ≤ 2, divided into regions labelled DD_PD, DD, DD^risk, CC, CC^risk, and CD/DC]
Fig. 2. The space of considered games.
[Figure: graph of the mapping, horizontal axis a_i from −4 to 4, vertical axis ā_i from 0 to 1]
Fig. 3. Mapping from (a_0, a_1, a_2) ∈ ℝ³ into (ā_0, ā_1, ā_2) ∈ [0, 1]³
[Figure: scatter of (a_i, u) pairs with a fitted curve and the optimal value a*_i marked]
Fig. 4. An example of samples of pairs of parameters and payoffs (black) which are used to estimate a functional relationship (grey) between a_i and u. Given this relationship an optimal value a*_i is determined.
[Figure: two density plots with relative frequencies, left over (a_1, a_2), right over (a_0, a_1 + a_2); axes run from −∞ to +∞]
Fig. 5. Long run distribution over parameters of the learning rule (a_0, a_1, a_2). Average over 53 simulation runs on a torus of size 50 × 50 with 2-state automata. Neighbourhoods have sizes r_i = r_L = 1, r_u = 2. Relative frequencies are given as percentages. Simulations last for t_s = 400000 periods, interactions take place with probability p_i = 1/2, repeated game strategies are learned from a randomly sampled player with probability 1/t_L = 1/24, learning rules are changed with probability 1/t_u = 1/4000, new games are introduced with probability 1/t_G = 1/2000, mutations both for repeated game strategies and for learning rules occur at a rate of m_L = m_u = 1/100.
[Figure: cumulative distribution, horizontal axis a_0 + a_1 u_own + a_2 u_samp from −∞ to +∞, vertical axis from 0 to 1]
Fig. 6. Cumulative distribution of probabilities of switching, given the learning rules from figure 5.
[Figure: two density plots with relative frequencies as in figure 5, left over (a_1, a_2), right over (a_0, a_1 + a_2)]
Fig. 7. Long run distribution over parameters of the learning rule (a_0, a_1, a_2). Average over 181 simulation runs, each lasting for 400000 periods. Relative frequencies are given as percentages. Parameters are as in figure 5, except that learning players sample the most successful neighbour.
[Figure: ten panels plotting ā_2 against ā_1, with panel titles "time to learn strategies", "time to change the game", "time to update learning rules", "length of simulation", "mutation of strategies", "mutation of learning rules", "radius for interaction, learning and update", "complexity of strategies", "probability of interaction", and "population size"]
Fig. 8. Dependence of ā_1 and ā_2 on the parameters of the learning rule. Dots represent averages over the last 20% of 20 simulation runs, each lasting for 40000 periods. The white circle in each diagram represents averages of the reference parameters: 50 × 50 torus, sample a random player, t_L = 24, t_u = 4000, t_G = 2000, m_L = m_u = 1/100, t_s = 40000, r_i = r_L = 1, r_u = 2, C = 2. The grey line shows ā_2 = −ā_1.
[Figure: the stage game matrix and the game classification of figure 2, next to two panels in the (g, h) plane showing the prevailing strategy, "switch if better" (1241 obs.) and "endogenous" (1170 obs.)]
Fig. 9. Stage game behaviour depending on the game (○ = most players play C, ● = most players play D). Parameters: 50 × 50 torus, r_i = r_L = 1, r_u = 2, sample a random player, t_L = 24, t_u = 4000, t_G = 2000, m_L = 0.1, m_u = 0.1, t_s = 400000.
List of Tables

1 Effects of simulation parameters on properties of learning rules
                     t_L   t_G   t_u   t_s   m_L   m_u   r_i   r_L   r_u   C     p_i   n
degree of locality   ↑     ↓                 ↓           ↓     ↑           ↑           ↑
relative speed                   ↓     ↑                             ↑
relative noise       ↑           ↓           ↑     ↑

(t_L: time to change strategy; t_G: time to change game; t_u: time to update learning rules; t_s: length of simulation; m_L: mutation of strategies; m_u: mutation of learning rules; r_i: interaction radius; r_L: learning radius; r_u: radius for updates of learning rules; C: complexity of strategies; p_i: probability of interaction; n: population size)

Table 1. Effects of simulation parameters on properties of learning rules. ↑ and ↓ denote the direction of the effect that an increase of a parameter has on locality, speed, or noise of the evolutionary process.