

The Prisoners' Dilemma Revisited

Abdellah Salhi, Hugh Glaser, David De Roure and John Putney
[email protected], [email protected], [email protected], [email protected]

Systems and Software Engineering Group
Technical Report DSSE-TR-96-2
March 14, 1996

Department of Electronics and Computer Science
University of Southampton
Highfield, Southampton SO17 1BJ, United Kingdom

The Prisoners' Dilemma Revisited

A. Salhi+, H. Glaser+, D. De Roure+, and J. Putney*

+ Department of Computer Science and Electronics,
The University of Southampton,
Southampton SO17 1BJ, UK

* Trading and Planning Group, National Power PLC,
Windmill Hill Business Park, Whitehill Way,
Swindon SN5 6PB, UK

Abstract

Cooperation has always been recognised as a fundamental ingredient in the creation of societies and the generation of wealth. As a concept, it has been studied for many years. Yet, in practice, its emergence and persistence are less well understood. In the following, a game-theoretic approach to the study of cooperation based on the Prisoners' Dilemma is reviewed. Using a Genetic Algorithm, strategies for playing the infinitely repeated Prisoners' Dilemma are evolved. Also reported are results obtained with a MATLAB implementation of such an algorithm and the insights it provides with regard to the emergence and persistence of cooperative behaviour.

KEYWORDS: Prisoners' Dilemma, Cooperative/Competitive Games, Strategy, Genetic Algorithm.

1 Introduction

Game theory has been used in economics ever since its inception in 1944 by Von Neumann and Morgenstern in a book with the title The Theory of Games and Economic Behavior. It is expanding at a prodigious rate: already well established in other social sciences, it has now gained a strong foothold in political science and biology. Another book with much influence on the success of game theory is Games and Decisions by Luce and Raiffa [14].

A lot of game theory models are mathematically well understood, and universally accepted results can be found in most Operational Research textbooks. A case in point is the class of zero-sum games, or strictly competitive games. Hide-and-Seek is a good example: the discovered has lost and the discoverer has won. At the other extreme are the so-called strictly cooperative or pure coordination games, where the players' interests are exactly the same.
The game of Rendez-vous is a good example, in which opponents released in a crowded city try to find each other. These games are declared by some to be trivial, at least from the theoretical point of view. In [14], for instance, it is asserted that `any group of decision makers which can be thought of as having a unitary interest motivating its decisions can be treated as an individual in the theory' of games. Note that empirical investigations do not support this statement and that decision making in pure coordination situations is far from trivial [5].

Real world problems, however, are seldom strictly of either of these types. They are often a mixture of both, i.e. the players' interests are neither diametrically opposed nor in total agreement. These games are known as Cooperation/Collusion or Cooperative/Competitive games. They usually present an opportunity for cooperation stemming from the existence of common objectives, i.e. what is good for a player may also be good for the other(s).

This game model fits well a lot of retail market situations which are not classic monopolies. Consider the Electricity retail market in England and Wales, for instance. Two main players (a Duopoly), namely National Power and PowerGen, are involved beside a dozen other players with a markedly smaller share of the market. The main players have opposed objectives, such as increasing/maintaining their respective shares of the market, but also common ground, such as the requirement of government to maintain diversity in the market place. It is obviously not a zero-sum game. It is more realistically modelled as a Cooperative/Competitive game. As such, it is well represented by the so-called Prisoners' Dilemma paradigm. In the following, this game will be reviewed and a computational approach to the study of cooperation based on genetic algorithms will be presented.

2 The Prisoners' Dilemma

The Prisoners' Dilemma (PD) is a small game popular as a paradigm for the problem of human cooperation. It is said to have been brought to attention by Merrill Flood of the Rand Corporation in 1951 and was later formulated by Al Tucker [5, 4]. This formulation can be as follows:

                            Player 2
                         C           D
   Player 1   C     R=3, R=3    S=0, T=5
              D     T=5, S=0    P=1, P=1

Table 1: Formulation of PD: The Payoff Matrix

In the payoff matrix of Table 1, the actions are C and D, standing for 'Cooperate' and 'Defect' respectively.
Traditionally, the payoffs are represented by R, P, T, and S, standing respectively for 'Reward', 'Punishment', 'Temptation', and 'Sucker's' payoff. This payoff matrix shows that defecting is beneficial to both players for two reasons: first, it leads to a greater payoff (T = 5) in case the other player cooperates (S = 0); second, it is a safe move, because neither knows what the other's move will be. So, to rational players, defecting is the best choice. But if both players choose to defect, then it leads to a worse payoff (P = 1) as compared to cooperating (R = 3). That is the dilemma.

For many [7, 8, 15], this problem captures cooperation in real life so well that not understanding how it may be achieved in this simple situation is a good reason to give up thinking about rational cooperation altogether. Others, however, argue that the special setting of the one-shot PD is contrary to the idea of cooperation. First, because the only equilibrium point is the outcome [P,P], which is a Nash equilibrium. Recall that a Nash equilibrium arises when the strategy choice of each player is the best reply to the strategy choices of the other players [3]. Second, [P,P] is at the intersection of minimax strategy choices for both players. These minimax strategies are dominant for both players, hence the exclusion in principle of cooperation (by virtue of the dominance of the chosen strategies). Moreover, even if cooperative strategies were chosen, the resulting cooperative `solution' is not an equilibrium point. This means that it is not stable, due to the fact that both players are tempted to defect from it.

It should also be noted that cooperative problems in real life are likely to be faced repeatedly, which makes the Iterated Prisoners' Dilemma (IPD) a more appropriate model for the study of cooperation than the one-shot version of the game. Note also that the orthodox view among game theorists is that cooperation cannot result from rational play in the one-shot PD [4].

In [5], it is further pointed out that the usual interpretation of the C choice as Cooperative and the D choice as Competitive should be questioned. This is because cooperation in this instance carries an element of risk, and rational players can only be cooperative in an atmosphere of mutual trust. Also, defection is triggered by suspicious defensiveness and the relative safety of the D choice, as opposed to competitiveness [5]. This observation makes the empirical investigation of cooperation a more attractive approach than its analytical counterpart.

2.1 The Iterated Prisoners' Dilemma

This version of the game is the subject of interest here. The game proceeds over a number of moves, the value of which is either decided beforehand, in which case the results are those of the one-shot PD diluted, or the game is indefinitely repeated until some random event occurs and brings it to a permanent end. This is modelled by using a fixed probability p that the event will occur after each round of PD's.
The indefinite aspect of the game is captured by concentrating on small p's.

The PD game is characterised by the strict inequality relations between the payoffs: T > R > P > S. And to avoid coordination or total agreement getting a `helping hand', most experimental PD games have a payoff matrix satisfying 2R > S + T, as in Table 1.

In the iterated PD (IPD), the concept of time, contrary to the one-shot PD, takes on its full weight: players, for instance, realise that reprisals as well as rewards may be triggered by their strategy choices. And because of this time dimension, the IPD presents the players with scope for investigating each other's behaviour. There is opportunity for cooperative, deceptive, threatening, exploitative behaviours and much more. But there is no guarantee that any one behaviour will take place. This opportunity for diverse behaviours to crop up in the IPD maps to a similarly large set of diverse strategies to choose from [1, 2, 13]. Some of these strategies will be discussed later.

Close analysis of the IPD reveals that, unlike the one-shot PD, it has a large number of Nash equilibria, these being inside the convex hull of the outcomes (0,5), (3,3), (5,0), (1,1) of the pure strategies in the one-shot PD (see Figure 1). Note that (1,1), corresponding to [P,P], is a Nash equilibrium for the IPD also.
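The payoff conditions and the dilemma itself can be checked mechanically. The following is an illustrative Python sketch of our own (not part of the paper's code), using the Table 1 values:

```python
# Payoffs from Table 1: T = Temptation, R = Reward, P = Punishment,
# S = Sucker's payoff.
T, R, P, S = 5, 3, 1, 0

# The defining PD inequality chain.
assert T > R > P > S

# The extra condition most experimental PD games satisfy, so that
# alternating exploitation does not pay better than mutual cooperation.
assert 2 * R > S + T

# D strictly dominates C in the one-shot game...
assert T > R and P > S
# ...yet mutual defection is worse for both than mutual cooperation.
assert R > P

# In the indefinitely repeated game, a fixed stopping probability p per
# round gives an expected game length of 1/p rounds, so a small p models
# the indefinite horizon.
p = 0.01
expected_rounds = 1 / p
```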

Figure 1: Set of Nash Equilibrium Points.

2.2 Cooperation in the IPD: Emergence and Persistence

The question whose answer is of interest to anyone who has attempted to use the IPD as a model for capturing cooperative behaviour is: how does cooperation arise in the IPD in the first place? And when it has arisen, how is it sustained?

In their ground-breaking work of 1957 [14], Luce and Raiffa predicted that in the IPD, sequences of [R,R] outcomes will arise, as the players are aware of the effect of reprisals. It is the fear of reprisal which is the trigger of the cooperative behaviour.

However, early experimental results pointed to the contrary: the players somehow frequently get entangled in point [1,1] of Figure 1, corresponding to the outcome [P,P]. This was the DD lock-in effect observed in many experimental studies of the late 50's and early 60's [6], [5].

More recent results are not any clearer: in [11], for instance, it is argued that cooperation emerges invariably in the IPD due to an implicit communication between the players; players cannot communicate explicitly, but they do so by playing cooperative moves, thus signalling their `good intentions'. For instance, if a player persists in choosing C whatever the strategy choice of his opponent, then it can be seen as a signal for cooperative play which is going to be reciprocated, according to [11]. This is rather naive, although it should be an acceptable explanation of some particular experimental results. The objection, of course, is that overdoing cooperative play can also be seen as a sign of weakness or foolishness that prompts exploitation and punishment; hence the emergence of the opposite of the desired effect.

2.2.1 A Simple Explanation?

The seemingly contradictory results so far reviewed may be seen as perfectly reasonable if we accept that, at the start of play in most experiments, players have no clear overriding objectives. They therefore try to explore the different strategies open to them and, upon reward or reprisal, fall into a sequence of [P,P]'s or [R,R]'s from which it is difficult to get out, these outcomes being Nash equilibrium points. It is the inventiveness of the players, and their overriding objectives becoming clearer after some rounds of play, which help break the sequence. These overriding objectives have been summed up in [16] as follows:

- maximising joint payoffs, leading to the C choice;
- maximising individual payoffs, with no clear strategy;
- maximising relative payoffs, i.e. attempting to beat the opponent, which points to the D choice.

This analysis fits well the results reported in [18]. There, it is said that in the IPD, at the beginning, players tend to choose C in a proportion slightly greater than 1/2. This phase is then followed by a rapid decline in cooperation (a "sobering period"). After approximately 30 repetitions, the C choice becomes more frequent, reaching around 60% in 300 repetitions.

2.3 The Evolutionary Approach

The work that has undoubtedly marked recent interest in the IPD is that of Robert Axelrod [1, 2], where computer tournaments were used in the study of the evolution of cooperation. The study acted as a stimulus for the design of complex strategies and the investigation of simple ones. It points to cooperative strategies winning over competitive ones in the long run. It falls short of explaining how cooperation emerges in the first place, but it gives a sound explanation for its persistence: once it has emerged, it tends to be sustained by the building blocks, or genes, of cooperative strategies becoming the majority in the pool of surviving genes. In other words, evolution favours cooperative strategies.

Results that confirm the persistence of cooperative strategies in the IPD have also been reported in [12].
There, the idea of evolution is applied to a spatial model in which the IPD is played within neighbourhoods and evolution works on a local level. Strategies are represented as small automata, identical for a player but possibly in different states against different neighbours. Simulation results show that cooperative strategies have a high rate of survival and that they persist even in stochastic environments.

Yet other extensive simulation results, in the line of Axelrod's work, reported in [13] and [17], are less conclusive. In [4] also, the whole interpretation of tournament results is criticised. It is further argued that there are a lot of arbitrary factors, such as the length of the chromosomes, the strategies against which others are evolved, the length of games, etc., which prevent one from accepting generalisations of the results on cooperative behaviour.

This is a fair criticism, but so far this evolutionary approach is the most promising. A good review of this approach and further experiments and discussions can be found in a succession of articles by D.R. Hofstadter collected in his Metamagical Themas [9].

2.4 Strategies and Automaton Representation

In the IPD and other games, strategies are of two types: pure and mixed. `A pure strategy for a player in a game is a complete description of what the player plans to do whenever he or she might be called upon to make a decision. A mixed strategy arises when a player randomizes over his pure strategies, perhaps by tossing a coin or rolling a dice' (Binmore, 1994) [4].

In the one-shot PD, there are two pure strategies, Defect and Cooperate. In the IPD, their number is infinite, and so is the number of mixed strategies.

2.4.1 Representation

Finite state machine, or Moore machine, representation of strategies is common in game theory. A Moore machine consists of states (a finite number of them, represented as circles with a label inside) and transitional arcs (directed arcs) for moving from one state to another. Arcs are labelled, and only the arc which starts off the automaton has no label; it takes that of the state to which it leads initially.

In the present context, the player who adopts a strategy described by a finite state machine falls initially into a state following the unlabelled arc. From then on, its state depends on the move of the opponent, i.e. the labelled arcs emanating from the current state. If the opponent follows an arc which leads to the same current state, then the player remains in the same state. If the opponent follows an arc leading to another state, then our player changes state accordingly.

A strategy that is well studied, and seems to be the overall winner in Axelrod's Olympiad [1, 2], is TIT-FOR-TAT. This strategy is based on reciprocity, but it starts with a C move. A finite automaton representation of it is in Figure 2. If both players adopt TIT-FOR-TAT then [R,R] will always be the outcome.

Figure 2: Tit-For-Tat.

TAT-FOR-TIT, on the other hand, is a variation on TIT-FOR-TAT. It is also based on reciprocity, but starts play with the D choice. Its representation as a finite automaton is in Figure 3.

TIT-FOR-TAT is considered a `nice' strategy, while TAT-FOR-TIT is `nasty'. The reason for this is that the latter starts by getting into a D state. As long as the reply is C, it remains in that state. If, however, the reply is a D (a retaliatory signal!), then it becomes `nicer' and moves into the C state, where it remains as long as the reply is a C. It moves back to the D state if the opponent chooses D.
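The Moore-machine view lends itself to a direct sketch. The Python below is illustrative code of our own (`make_machine` and `play` are hypothetical names, not from the paper): each state's label is the move played, and the opponent's move selects the next state. TAT-FOR-TIT is encoded exactly as in the textual description above.

```python
def make_machine(start, transitions):
    """Build a Moore-machine strategy: transitions[(state, opp_move)] gives
    the next state, and the move played is simply the current state."""
    def machine():
        state = start
        opp = yield state            # play the start state's move first
        while True:
            state = transitions[(state, opp)]
            opp = yield state
    return machine

# TIT-FOR-TAT: start with C, then copy the opponent's last move.
tit_for_tat = make_machine('C', {('C', 'C'): 'C', ('C', 'D'): 'D',
                                 ('D', 'C'): 'C', ('D', 'D'): 'D'})

# TAT-FOR-TIT: start with D; stay in D while the opponent cooperates,
# move to C after a D, and fall back to D on a further D.
tat_for_tit = make_machine('D', {('D', 'C'): 'D', ('D', 'D'): 'C',
                                 ('C', 'C'): 'C', ('C', 'D'): 'D'})

def play(mach_a, mach_b, rounds):
    """Run two machines against each other; return the list of outcomes."""
    a, b = mach_a(), mach_b()
    move_a, move_b = next(a), next(b)
    history = [(move_a, move_b)]
    for _ in range(rounds - 1):
        move_a, move_b = a.send(move_b), b.send(move_a)
        history.append((move_a, move_b))
    return history

# Two TIT-FOR-TAT players lock into mutual cooperation: every outcome is [R,R].
assert play(tit_for_tat, tit_for_tat, 5) == [('C', 'C')] * 5
```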

Figure 3: Tat-For-Tit.

A completely hopeless strategy would be EASY-GO, which is represented as in Figure 4. It is `too nice' to be a winning strategy, although it starts by getting into the D state. In other words, it has nothing going for it to be chosen by a rational player.

Figure 4: Easy-Go.

A ruthless strategy would be FORG-NOT of Figure 5. It starts by falling into a C state, where it remains as long as the opponent also chooses C. But once the opponent defects, FORG-NOT moves into the D state, from which it never gets out, whatever the opponent tries. FORG-NOT is ruthlessly unforgiving!


Figure 5: Forg-Not.

A more complex strategy would be ADJUSTER, which attempts to explore the opponent's play. It, for instance, sets out to choose C only once every three moves, say, unless the opponent uses D more than once in three successive moves; then it cooperates twice in three successive moves. If the opponent cooperates whatever happens, then ADJUSTER will choose only D every time. This allows it to exploit over-cooperation, such as in EASY-GO.
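Strategies like these can also be written as functions of the game history and scored with the Table 1 payoffs. The following is a hedged sketch of our own (the function and variable names are illustrative, not the paper's): FORG-NOT cooperates until the first defection and then defects forever, so an unconditional defector gains T = 5 exactly once and collects P = 1 thereafter.

```python
# Payoff matrix from Table 1: PAYOFF[(my_move, opp_move)] = (my_pay, opp_pay).
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def forg_not(my_hist, opp_hist):
    # Cooperate until the opponent has defected once, then defect forever.
    return 'D' if 'D' in opp_hist else 'C'

def defector(my_hist, opp_hist):
    # Unconditional defection.
    return 'D'

def play(strat_a, strat_b, rounds):
    """Play two history-based strategies; return their total scores."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a = strat_a(hist_a, hist_b)
        move_b = strat_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        hist_a.append(move_a)
        hist_b.append(move_b)
        score_a += pay_a
        score_b += pay_b
    return score_a, score_b

# Over 10 rounds: the defector exploits once (T=5), then both collect P=1.
print(play(forg_not, defector, 10))   # (9, 14)
```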

Strategies like these, and more complex ones, have been tried and investigated. Recent experimental results [2, 13] point to TIT-FOR-TAT as an overall winning strategy. And because of its intrinsic reciprocity, it promotes cooperation. Note that [TIT-FOR-TAT, TIT-FOR-TAT] is a Nash equilibrium for the IPD.

3 Evolving Strategies

In [1, 2], a computational approach based on the Genetic Algorithm of Holland [10] has been devised which evolves strategies. This evolutionary approach relies on the representation of a strategy as a chromosome made up of all possible outcomes of the three previous moves. Because there are 4 possible outcomes for each move, i.e. [R,R], [T,S], [S,T] and [P,P], there are 4^3 = 64 possible three-move histories. So a string of 64 genes, or letters C and D, makes up a strategy. But at the start of the game, 6 genes are required to specify the premises of the 3 hypothetical initial moves for each player. They encode the assumed C or D choices made by each player in each of the three moves which precede the game. So, for two players and three hypothetical moves, 6 genes are needed.

These take the length of a strategy to 70 genes, which completely describe what a player would do in every possible situation arising in a game of IPD. It is a complete definition of a strategy.

3.1 The Genetic Algorithm

1. Randomly generate an initial population of strategies;

2. Use each generated strategy to play an IPD against a set of known strategies such as TIT-FOR-TAT, ADJUSTER, FORG-NOT, etc.;

3. Select a proportion of successful strategies according to their average score over all games played so far. This score defines the fitness of each individual;

4. Use genetic operators such as cross-over and mutation to produce a new population;

5. Repeat the process as necessary to be consistent with the modelling of the IPD (see Section 2.1).

This algorithm will produce more efficient strategies after each generation.
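The encoding and the genetic operators above can be sketched as follows. This is illustrative Python of our own design (the paper's actual code is the MATLAB of Appendix A; all names here are hypothetical): a chromosome is 64 C/D genes indexed by the base-4 code of the last three outcomes, plus 6 premise genes, and single-point crossover and mutation act directly on the gene string.

```python
import random

# The four possible outcomes of one move: [R,R], [S,T], [T,S], [P,P].
OUTCOMES = [('C', 'C'), ('C', 'D'), ('D', 'C'), ('D', 'D')]

def random_strategy(rng):
    # 64 history genes + 6 premise genes = 70 genes, as in the text.
    return [rng.choice('CD') for _ in range(70)]

def history_index(last3):
    # Base-4 encoding of a list of three (my_move, opp_move) outcomes.
    idx = 0
    for outcome in last3:
        idx = 4 * idx + OUTCOMES.index(outcome)
    return idx

def next_move(strategy, last3):
    # The gene at the history's index is the move the strategy plays.
    return strategy[history_index(last3)]

def crossover(parent_a, parent_b, rng):
    # Single-point crossover of two 70-gene chromosomes.
    cut = rng.randrange(1, 70)
    return parent_a[:cut] + parent_b[cut:]

def mutate(strategy, mu, rng):
    # Flip each gene C<->D independently with probability mu.
    return [('D' if g == 'C' else 'C') if rng.random() < mu else g
            for g in strategy]

rng = random.Random(0)
a, b = random_strategy(rng), random_strategy(rng)
child = mutate(crossover(a, b, rng), 0.01, rng)
assert len(child) == 70
assert next_move(child, [('C', 'C')] * 3) in ('C', 'D')
```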
It should eventually converge to one strategy which is "near optimal", given the conditions and the environment of its evolution.

Experimental results with this algorithm turned out to be surprisingly interesting and relevant to the understanding of cooperative/competitive behaviour. From a random initial population, strategies as successful as TIT-FOR-TAT were evolved. Note also that these evolved strategies are mixed strategies with a make-up comprising sequences which behave like TIT-FOR-TAT. Some interesting patterns of behaviour observed are as follows.

1. Outcomes [R,R], [R,R], [S,T], followed by action D, may be interpreted as: if the opponent defects out of the blue, then do the same, i.e. be provocable.

2. Outcomes [S,T], [T,S], [R,R], followed by action C, may be interpreted as: continue to cooperate after cooperation has been restored, i.e. accept apology.

3. Etc. (See Appendix B for more examples from a sample output.)

3.2 Implementation

We have implemented the algorithm of the previous section in MATLAB. The results, although limited, seem to agree with those in [2]. PRISDIL, the code for the toy programme, can be seen in Appendix A, and a sample output showing an evolved strategy is in Appendix B.

4 Conclusion

There is a large body of literature concerned with the Prisoners' Dilemma and its infinite-horizon variant, the IPD. Although the results surveyed have some ambiguity in them, this is in our view related to simulation design and the interpretation of results. But the approach, especially the computational approach, is promising and the results are very interesting.

The way strategies are encoded and the use of the Genetic Algorithm capture convincingly the notion of cooperative (or otherwise) behaviour in the IPD. The approach, especially the encoding of strategies, should translate without difficulty to real world applications where duopolistic or oligopolistic competition is involved, such as the Electricity Market.

From our limited experience, running experiments of this type is computationally expensive. However, they can benefit from distributed/parallel computing platforms. Such experiments are envisaged in the future.

References

[1] R. Axelrod. The Evolution of Cooperation. Basic Books, New York, 1984.

[2] R. Axelrod. The evolution of strategies in the iterated prisoners' dilemma. In L. Davis, editor, Genetic Algorithms and Simulated Annealing, pages 32-42. Morgan Kaufmann, Los Altos, Calif., USA, 1987.

[3] K. Binmore. Fun and Games. D.C. Heath, Lexington, Mass., USA, 1991.

[4] K. Binmore. Playing Fair: Game Theory and the Social Contract. MIT Press, Cambridge, England, 1994.

[5] A.M. Colman. Game Theory and Experimental Games. Pergamon Press Ltd, Oxford, 1982.

[6] M.M. Flood. Some experimental games. Management Science, 5:5-26, 1958.

[7] G. Hardin. The tragedy of the commons. Science, 162:1243-1248, 1968.

[8] R. Hardin. Collective Action. Johns Hopkins Press, Baltimore, 1982.

[9] D.R. Hofstadter. Metamagical Themas. Viking, 1985.

[10] J.H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, Michigan, USA, 1974.

[11] H.J. Jones. Game Theory: Mathematical Models of Conflict. Ellis Horwood, Chichester, UK, 1980.

[12] O. Kirchkamp. Spatial evolution of automata in the prisoners' dilemma. Technical report, Dept. of Economics, University of Bonn. Working Paper.

[13] B. Linster. Essays on Cooperation and Competition. PhD thesis, University of Michigan, Michigan, USA, 1990.

[14] R. Luce and H. Raiffa. Games and Decisions. Wiley, New York, 1957.

[15] H. Margolis. Selfishness, Altruism, and Rationality. Cambridge Press, Cambridge, England, 1982.

[16] C.G. McClintock. Game behaviour and social motivation in interpersonal settings. In C.G. McClintock, editor, Experimental Social Psychology, pages 271-297. Holt, Reinhart and Winston, New York, USA, 1972.

[17] D. Probst. Evolution in the repeated prisoners' dilemma. Technical report, Dept. of Economics, University of Bonn. Working Paper.

[18] A. Rapoport and A.M. Chammah. Prisoner's Dilemma: A Study in Conflict and Cooperation. University of Michigan Press, Ann Arbor, USA, 1965.


A MATLAB code

%Script to run PRISDIL the Prisoners' Dilemma simulation
t0=cputime;
player=[];
score=[];
tft=[];
opp_score=[];
%[player,score,opp_score]=prisdil(genes,pop_sz,n_o_mvs,...
%    n_o_games,score,indiv_opp_score,mu,xovr)
[player,score,opp_score]=prisdil(70,20,150,20,score,...
    opp_score,0.01,0.5);
disp('Total CPU time:');
cputime-t0

************
function [player,score,opp_score]=...
    prisdil(genes,pop_sz,n_o_mvs,n_o_games,scores,opp_score,mu,xovr)
%PRISDIL Driver of the prisoner's dilemma experiment
%Generate the population of random strategies or players,
%each consisting of 5 columns of 70 genes.
%The player is therefore a matrix PLAYER[70,5]
%Column 1 contains the scores for the seventy possible outcomes
%Column 2 the 1st outcome
%  "    3 the 2nd outcome
%  "    4 the 3rd outcome
%  "    5 contains the random (possible) moves for this
%         player (the random strategy)
global tft_mvs adj_mvs evolv
%log_<file> will contain the results for each player and the opponent
log_score=[];
log_opp_score=[];
%pop will contain the population. It changes with every generation.
pop = [];
pop = genpop(pop_sz,genes);
%n_o_games games (or generations) are played
log_tft_mvs=[];
log_adj_mvs=[];
%Set when a mutation should take place; here every 2 generations
mut_time=2;
evolv='true ';
t=cputime;
for game_n=1:n_o_games
game_n

%Operate a mutation if it is time
if game_n==mut_time & game_n~=n_o_games
    pop=mutate(pop,pop_sz,genes,mu);
    mut_time=mut_time+2;
end;
%Each of the generated strategies or players plays n_o_games games
%of n_o_mvs moves with TIT_FOR_TAT and ADJUSTER
for indiv=1:pop_sz
    player(1:genes,1:5)=pop(1:genes,(indiv-1)*5+1:(indiv-1)*5+5);
    indiv_score=0;
    indiv_opp_score=0;
    %Player's opponent is TIT_FOR_TAT
    tft_mvs=[];
    [player,indiv_mvs,indiv_score,indiv_opp_score]=...
        tit_for_tat(player,genes,n_o_mvs,indiv_mvs,indiv_score,...
        indiv_opp_score);
    % Store moves
    log_tft_mvs=[log_tft_mvs;tft_mvs];
    tft_mvs=[];
    %Player's opponent is ADJUSTER
    adj_mvs=[];
    [player,indiv_mvs,indiv_score,indiv_opp_score]=...
        adjuster(player,genes,n_o_mvs,indiv_mvs,indiv_score,...
        indiv_opp_score);
    % Store scores (average over the two games)
    score=[score indiv_score/2];
    opp_score=[opp_score indiv_opp_score/2];
    log_adj_mvs=[log_adj_mvs;adj_mvs];
    adj_mvs=[];
    % [player,indiv_score,tft_mvs,indiv_opp_score]=...
    %     evlv_play(indiv,player,genes,n_o_mvs,indiv_score,...
    %     tft_mvs,indiv_opp_score);
    % After play, save in pop the player that has just finished play
    pop(1:genes,(indiv-1)*5+1:(indiv-1)*5+5)=player(1:genes,1:5);
end;
% Log scores so far gathered
log_score=[log_score;score];
log_opp_score=[log_opp_score;opp_score];
%Breed best individuals by mating them.
%Keep the population size constant.
pop=cross_ovr(pop,pop_sz,genes,log_score,game_n);

score=[];
opp_score=[];
end;
save pop;
disp('CPU time taken to evolve the strategy:'); cputime-t
evolv='false';
%Find evolved strategy with best average score over
%n_o_games and its rules
bst_player=[];
%[strat_score,bst_strat]=max(log_score(game_n,1:pop_sz))
[strat_score,bst_strat]=max(mean(log_score));
bst_player(1:genes,1:5)=pop(1:genes,...
    (bst_strat-1)*5+1:(bst_strat-1)*5+5);
save bst_player;
%Output readably coded rules of strategy
coded_rul=[];
ruls_score=[];
[ruls_score,coded_rul]=...
    code_rul(ruls_score,coded_rul,bst_player,genes);
save coded_rul;
%Play evolved strategy against TFT
disp('Do you want to see the performance of this evolved player against TFT?');
input('y or n between single quotes : ');
if ans=='y'
    indiv_score=0;
    indiv_opp_score=0;
    [bst_player,indiv_mvs,indiv_score,indiv_opp_score]=...
        tit_for_tat(bst_player,genes,n_o_mvs,indiv_mvs,indiv_score,...
        indiv_opp_score);
    disp('Player and TFT scores: ')
    indiv_score
    indiv_opp_score
    % indiv_mvs
    % tft_mvs
end
%Check if the strategy has been altered (It should not!)
coded_rul_aftr_tft=[];
ruls_score=[];
[ruls_score,coded_rul_aftr_tft]=...
    code_rul(ruls_score,coded_rul_aftr_tft,bst_player,genes);
save coded_rul_aftr_tft;
%Play evolved strategy against ADJUSTER
disp('Do you want to see the performance of this evolved player against ADJUSTER?');
input('y or n between single quotes : ');

if ans=='y'
    adj_mvs=[];
    indiv_score=0;
    indiv_opp_score=0;
    [bst_player,indiv_mvs,indiv_score,indiv_opp_score]=...
        adjuster(bst_player,genes,n_o_mvs,indiv_mvs,...
        indiv_score,indiv_opp_score);
    disp('Player and Adjuster scores: ')
    indiv_score
    indiv_opp_score
end
end

*************
function init_ruls=start_ruls(genes)
%Builds the initial set of all possible outcomes
%from the num_mvs previous moves.
%Here only 3 previous moves are considered
%Second column
init_ruls=[];
for i=1:16
    init_ruls(i,2)=3;
    init_ruls(16+i,2)=5;
    init_ruls(32+i,2)=0;
    init_ruls(48+i,2)=1;
end;
for i=1:6
    outcome=rand;
    if outcome>=.75
        init_ruls(64+i,2)=3;
    elseif outcome<.75 & outcome>=.5
        init_ruls(64+i,2)=5;
    elseif outcome<.5 & outcome>=.25
        init_ruls(64+i,2)=0;
    else
        init_ruls(64+i,2)=1;
    end;
end;
%Third column
for i=1:4
    init_ruls(i   ,3)=3;
    init_ruls(4+i ,3)=5;
    init_ruls(8+i ,3)=0;
    init_ruls(12+i,3)=1;
    init_ruls(16+i,3)=3;
    init_ruls(20+i,3)=5;

    init_ruls(24+i,3)=0;
    init_ruls(28+i,3)=1;
    init_ruls(32+i,3)=3;
    init_ruls(36+i,3)=5;
    init_ruls(40+i,3)=0;
    init_ruls(44+i,3)=1;
    init_ruls(48+i,3)=3;
    init_ruls(52+i,3)=5;
    init_ruls(56+i,3)=0;
    init_ruls(60+i,3)=1;
end;
for i=1:6
    outcome=rand;
    if outcome>=.75
        init_ruls(64+i,3)=3;
    elseif outcome<.75 & outcome>=.5
        init_ruls(64+i,3)=5;
    elseif outcome<.5 & outcome>=.25
        init_ruls(64+i,3)=0;
    else
        init_ruls(64+i,3)=1;
    end;
end;
%Fourth column
init_ruls(1,4)=3;
init_ruls(2,4)=5;
init_ruls(3,4)=0;
init_ruls(4,4)=1;
for j=2:16
    init_ruls(4*(j-1)+1:4*(j-1)+4,4)=init_ruls(1:4,4);
end;
for i=1:6
    outcome=rand;
    if outcome>=.75
        init_ruls(64+i,4)=3;
    elseif outcome<.75 & outcome>=.5
        init_ruls(64+i,4)=5;
    elseif outcome<.5 & outcome>=.25
        init_ruls(64+i,4)=0;
    else
        init_ruls(64+i,4)=1;
    end;
end;
%First column contains the score of each rule randomly generated
%ie: before start we initialise player's rules scores to what they are

%initially, eg: if we have RRR for outcomes in rule 19, say, then
%the score for that rule is initialised to 3*3, ie player(19,1)=3*3=9
for i=1:genes
    init_ruls(i,1)=sum(init_ruls(i,2:4));
end;

***************
function pop=genpop(pop_sz,genes)
%Generates a population of individuals (chromosomes) of length genes
%Columns 1,2,3,4 are the same for every player at start. A matrix
%'outcomes[genes,4]' is thus required to build up each player.
outcomes=start_ruls(genes);
for indiv=1:pop_sz
    for num_gen=1:genes
        if rand>0.5
            pop(num_gen,(indiv-1)*5+5)=1;
        else
            pop(num_gen,(indiv-1)*5+5)=0;
        end;
    end;
    pop(1:genes,(indiv-1)*5+1:(indiv-1)*5+4)=outcomes(1:genes,1:4);
end;

***************
function pop=cross_ovr(pop,pop_sz,genes,log_score,game_n)
%Function CROSS_OVR operates a single point crossover on
%the best performing individuals so far
%Approach: find half pop_sz of individuals with best
%average score over games played so far; mate them to
%produce next generation. Keep the population size constant
%Implementation: sort individuals according to their
%average score against opponents;
%sorted scores are in srt_av_scr in ascending order and player
%numbers are in indiv_n.
if game_n==1
    [srt_av_scr,indiv_n]=sort(log_score);
else
    [srt_av_scr,indiv_n]=sort(mean(log_score(1:game_n,1:pop_sz)));
end
% Find the random pairs to mate
mat_pairs=[];
frst_prtnr=pop_sz;
half_pop=floor(pop_sz/2);

while frst_prtnr>=half_pop+1
    scnd_prtnr=half_pop+ceil(half_pop*rand);
    while frst_prtnr==scnd_prtnr
        scnd_prtnr=half_pop+ceil(half_pop*rand);
    end;
    pair=[indiv_n(frst_prtnr) indiv_n(scnd_prtnr)];
    mat_pairs=[mat_pairs;pair];
    frst_prtnr=frst_prtnr-1;
end;
%mat_pairs
%Mating, single point cross_over
new_pop=[];
n_pairs=1;
indiv=1;
while n_pairs<=floor(pop_sz/2)
    x_ovr_pnt=floor(genes*rand);
    %First offspring, including the whole description of the player,
    %ie 5 columns
    new_pop(1:x_ovr_pnt,(indiv-1)*5+1:(indiv-1)*5+5)=...
        pop(1:x_ovr_pnt,(mat_pairs(n_pairs,1)-1)*5+1:...
        (mat_pairs(n_pairs,1)-1)*5+5);
    new_pop(x_ovr_pnt+1:genes,(indiv-1)*5+1:(indiv-1)*5+5)=...
        pop(x_ovr_pnt+1:genes,(mat_pairs(n_pairs,2)-1)*5+1:...
        (mat_pairs(n_pairs,2)-1)*5+5);
    %Second offspring
    indiv=indiv+1;
    new_pop(1:x_ovr_pnt,(indiv-1)*5+1:(indiv-1)*5+5)=...
        pop(1:x_ovr_pnt,(mat_pairs(n_pairs,2)-1)*5+1:...
        (mat_pairs(n_pairs,2)-1)*5+5);
    new_pop(x_ovr_pnt+1:genes,(indiv-1)*5+1:(indiv-1)*5+5)=...
        pop(x_ovr_pnt+1:genes,(mat_pairs(n_pairs,1)-1)*5+1:...
        (mat_pairs(n_pairs,1)-1)*5+5);
    indiv=indiv+1;
    n_pairs=n_pairs+1;
end;
pop=new_pop;
end

*************
function pop=mutate(pop,pop_sz,genes,mu)
%Function MUTATE operates mutations on all chromosomes at regular
%intervals. Let n_o_genes_t_mu be the number of genes to mutate
n_o_genes_t_mu=ceil(genes*mu);
for indiv=1:pop_sz
    for count=1:n_o_genes_t_mu
        gene_t_mu=ceil(genes*rand);

        if pop(gene_t_mu,(indiv-1)*5+5) == 0
            pop(gene_t_mu,(indiv-1)*5+5)=1;
        else
            pop(gene_t_mu,(indiv-1)*5+5)=0;
        end
    end
end
end

**************
function [player,indiv_score,tft,indiv_opp_score]=...
    evlv_play(indiv,player,genes,n_o_mvs,...
    indiv_score,tft,indiv_opp_score)
%A game of n_o_mvs moves is played by individual indiv against tft.
%Their corresponding scores are recorded in array score(1:20)
%tft is a simple array of length n_o_mv, tft[1:n_mv],
%corresponding to the n_mv moves of the tft strategy
%tft plays first and it is a C; note that C is represented
%by 1 and D by 0; also R is represented by its value 3,
%T by 5, S by 0 and P by 1
indiv_score=0;
indiv_opp_score=0;
tft=[tft 1];
for mv_n=1:n_o_mvs
    %Instead of choosing the best rule, we choose randomly
    %among the 50% best ones:
    %sort them in ascending order first, then choose
    %randomly from the top half
    [srt_ruls_scr,srt_ruls]=sort(player(1:genes,1));
    half_genes=floor(genes/2);
    bst_rul=srt_ruls(half_genes+ceil(half_genes*rand));
    action=player(bst_rul,5);
    % Update bst_rul of player
    player(bst_rul,2)=player(bst_rul,3);
    player(bst_rul,3)=player(bst_rul,4);
    if tft(mv_n)==1 & action==1
        % ie R
        player(bst_rul,4)=3;
        indiv_opp_score=indiv_opp_score+3;
    elseif tft(mv_n)==1 & action==0
        % ie T
        player(bst_rul,4)=5;
        indiv_opp_score=indiv_opp_score+0;
    elseif tft(mv_n)==0 & action==1

    % ie S
    player(bst_rul,4)=0;
    indiv_opp_score=indiv_opp_score+5;
  else
    % ie P
    player(bst_rul,4)=1;
    indiv_opp_score=indiv_opp_score+1;
  end;
  % Update score for that rule
  player(bst_rul,1)=player(bst_rul,1)+player(bst_rul,4);
  % Update score of player
  indiv_score=indiv_score+player(bst_rul,4);
  % update tft
  tft=[tft action];
end;
end

**********

%Player with the TIT-FOR-TAT strategy
function [player,indiv_mvs,indiv_score,indiv_opp_score]=...
    tit_for_tat(player,genes,n_o_mvs,indiv_mvs,indiv_score,...
    indiv_opp_score)
%A game of n_o_mvs moves is played by individual indiv
%against tft_mvs.
%Their corresponding scores are recorded in array score(1:20)
global tft_mvs evolv
%tft_mvs is a simple array of length n_o_mv tft_mvs[1:n_mv]
%corresponding to the n_mv moves of the tft_mvs strategy
%tft_mvs plays first and it is a C; note that C is represented
%by 1 and D by 0; also R is represented by its value 3, T by 5,
%S by 0 and P by 1
%Set first the three preceding moves (more precisely outcomes) of TFT
%to random values.
%These outcomes effectively make up a tft rule; call it tft_rul.
tft_rul=[];
for init_mv=1:3
  outcm=rand;
  if outcm<0.5,
    if outcm<0.25,
      tft_rul=[tft_rul 0];
    else
      tft_rul=[tft_rul 1];
    end
  elseif outcm<0.75,
    tft_rul=[tft_rul 3];
  else
    tft_rul=[tft_rul 5];
  end
end
n_o_t_bst_rul1_usd=0;
tft_mvs=[tft_mvs 1];
indiv_mvs=[];
score_so_far=[];
opp_score_so_far=[];
log_cntr=1;
for mv_n=1:n_o_mvs
  %Instead of choosing among the 50% best rules, we look for the one
  %which matches the last three moves of the game and apply it.
  %This requires setting TFT to three random moves, as done in tft_rul.
  %Find the rule matching that of tft, ie what to do after a sequence of 3
  %given moves, eg: What to do after 333, corresponding to RRR? etc.
  bst_rul=0;
  for rule_n=1:genes
    if player(rule_n,2:4)==tft_rul(1:3),
      bst_rul=rule_n;
      BST_RUL1=[bst_rul tft_rul(1:3)];
      n_o_t_bst_rul1_usd=n_o_t_bst_rul1_usd+1;
      break;
    end
  end
  if bst_rul==0
    %If no such sequence is available we look for what matches the
    %last two moves
    for rule_n=1:genes
      if player(rule_n,3:4)==tft_rul(2:3),
        bst_rul=rule_n;
        BST_RUL2=[bst_rul tft_rul(2:3)];
        break;
      end
    end
  end
  %If no such seq is available then look for a rule with the last
  %outcome matching the last outcome of tft_rul
  if bst_rul==0
    for rule_n=1:genes
      if player(rule_n,4)==tft_rul(3),
        bst_rul=rule_n;
        BST_RUL3=[bst_rul tft_rul(3)];
        break;
      end
    end
  end
  %Error trap: if bst_rul is still 0 then there is something
  %wrong with player
  if bst_rul==0
    disp('bst_rul is still 0; it should not be. Check player');
    bst_player
    save player;
    save tft_rul;
  end;
  %Otherwise
  action=player(bst_rul,5);
  % Update rule of tft (Note that this rule is updated in the
  % evolution stage and in the performance play stage as well)
  tft_rul(1)=tft_rul(2);
  tft_rul(2)=tft_rul(3);
  %
  indiv_mvs=[indiv_mvs action];
  if tft_mvs(mv_n)==1 & action==1
    % ie outcome is R
    mv_outcm=3;
    tft_rul(3)=3;
  elseif tft_mvs(mv_n)==1 & action==0
    % ie outcome is T
    mv_outcm=5;
    tft_rul(3)=0;
  elseif tft_mvs(mv_n)==0 & action==1
    % ie outcome is S
    mv_outcm=0;
    tft_rul(3)=5;
  else
    % ie outcome is P
    mv_outcm=1;
    tft_rul(3)=1;
  end
  % Update bst_rul of player if in evolution stage
  if evolv=='true ',
    player(bst_rul,2)=player(bst_rul,3);
    player(bst_rul,3)=player(bst_rul,4);
    player(bst_rul,4)=mv_outcm;
  end
  % Update total score for the rule applied in this move
  player(bst_rul,1)=player(bst_rul,1)+mv_outcm;
  % Update total score of player and tft
  indiv_score=indiv_score+mv_outcm;
  indiv_opp_score=indiv_opp_score+tft_rul(3);
  % update tft_mvs
  tft_mvs=[tft_mvs action];
  % Log scores every 10 moves for each player
  log_cntr=log_cntr+1;
  if log_cntr>10
    score_so_far=[score_so_far indiv_score];
    opp_score_so_far=[opp_score_so_far indiv_opp_score];
    log_cntr=1;
  end
end
if evolv=='true ', return; end;
tft_scores=[score_so_far' opp_score_so_far'];
save tft_scores;
plot(tft_scores);
title('Performance of evolved strategy against TFT');
xlabel('moves*10');
ylabel('scores');
end

************

%Player with strategy ADJUSTER
function [player,indiv_mvs,indiv_score,indiv_opp_score]=...
    adjuster(player,genes,n_o_mvs,indiv_mvs,indiv_score,...
    indiv_opp_score)
%A game of n_o_mvs moves is played by individual
%indiv against adj_mvs.
%Their corresponding scores are recorded in array score(1:20)
global adj_mvs evolv
%adj_mvs is a simple array of length n_o_mv adj_mvs[1:n_mv]
%corresponding to the n_mv moves of the adj_mvs strategy
%The ADJUSTER strategy relies on a high rate of defection, thus
%attempting to exploit the opponent. But, when the opponent
%is showing its teeth as well, ADJUSTER revises, ie
%'adjusts', its defection rate. In detail, ADJUSTER plays as
%follows: Play 2 D's in every 3 successive moves. If the opponent
%plays 2 D's in successive moves, then revise the rate to just
%1 D in 3 moves. When the opponent plays 2 C's in a row then change
%the rate back to its original value, ie 2 D's in 3 successive moves.
%We set the 3 first outcomes to random ones corresponding to
%0 0 1. There are 8 possible sequences of 3 outcomes:
adj_rul=[];
rnd_strt_ruls=[1 1 3;1 5 3;1 5 0;1 1 1;5 1 3;5 1 0;5 5 0;5 5 3];
indx_o_rnd_rul=ceil(rand*8);
adj_rul=[adj_rul rnd_strt_ruls(indx_o_rnd_rul,1:3)];
%ADJUSTER plays first and it is a D; note that C is represented
%by 1 and D by 0;
%also R is represented by its value 3, T by 5, S by 0 and P by 1
%We arbitrarily set the first cycle of moves to 0 0 1, ie DDC
adj_mvs=[adj_mvs 0 0 1];
indiv_mvs=[];
score_so_far=[];
opp_score_so_far=[];
log_cntr=1;
set_strat_rate_1=[1 0 1;1 1 0;0 1 1];%1 defection in every 3 moves
set_strat_rate_2=[0 0 1;0 1 0;1 0 0];%2 defections in every 3 moves
%At start the rate of defection is 2
rate=2;
cycle=1;
change='true ';
for mv_n=1:n_o_mvs
  %Instead of choosing among the 50% best rules, we look for the one
  %which matches the last three moves of the game and apply it. This
  %requires setting ADJ to three random moves, as done in adj_rul.
  %Find the rule matching that of ADJ, ie what to do when a sequence of 3
  %given moves occurs, eg: What to do after moves 333,
  %corresponding to RRR? etc.
  bst_rul=0;
  for rule_n=1:genes
    if player(rule_n,2:4)==adj_rul(1:3),
      bst_rul=rule_n;
      break;
    end
  end
  if bst_rul==0
    %If no such sequence is available we look for what matches
    %the last two moves
    for rule_n=1:genes
      if player(rule_n,3:4)==adj_rul(2:3),
        bst_rul=rule_n;
        break;
      end
    end
  end
  %If no such seq is available then look for a rule with the
  %last outcome matching the last outcome of adj_rul
  if bst_rul==0
    for rule_n=1:genes
      if player(rule_n,4)==adj_rul(3),
        bst_rul=rule_n;
        break;
      end
    end
  end
  %Error trap: if bst_rul is still 0 then there is something
  %wrong with player
  if bst_rul==0
    disp('bst_rul is still 0; it should not be. Check player');
    bst_player
    save player;
    save adj_rul;
  end;
  %Otherwise
  action=player(bst_rul,5);
  indiv_mvs=[indiv_mvs action];
  % update adj_mvs
  if mv_n>1 & change=='true '
    if action==0 & indiv_mvs(mv_n-1)==0,
      rate=1;
      change='false';
    elseif action==1 & indiv_mvs(mv_n-1)==1,
      rate=2;
      change='false';
    end
  end
  %
  if cycle>3
    if rate==2 & change=='false',
      seq_o_mvs=ceil(3*rand);
      adj_mvs=[adj_mvs set_strat_rate_2(seq_o_mvs,1:3)];
      change='true ';
    elseif rate==1 & change=='false'
      seq_o_mvs=ceil(3*rand);
      adj_mvs=[adj_mvs set_strat_rate_1(seq_o_mvs,1:3)];
      change='true ';
    else
      seq_o_mvs=ceil(3*rand);
      adj_mvs=[adj_mvs set_strat_rate_2(seq_o_mvs,1:3)];
      rate=2;
    end
    cycle=1;
  end
  % Update rule of adj (Note that this rule is updated in the
  % evolution stage and in the performance play stage as well)
  adj_rul(1)=adj_rul(2);
  adj_rul(2)=adj_rul(3);
  %
  if adj_mvs(mv_n)==1 & action==1
    % ie R
    mv_outcm=3;
    adj_rul(3)=3;
  elseif adj_mvs(mv_n)==1 & action==0
    % ie T
    mv_outcm=5;
    adj_rul(3)=0;
  elseif adj_mvs(mv_n)==0 & action==1
    % ie S
    mv_outcm=0;
    adj_rul(3)=5;
  else
    % ie P
    mv_outcm=1;
    adj_rul(3)=1;
  end
  % Update bst_rul of player if in evolution stage
  if evolv=='true ',
    player(bst_rul,2)=player(bst_rul,3);
    player(bst_rul,3)=player(bst_rul,4);
    player(bst_rul,4)=mv_outcm;
  end
  % Update total score for the rule applied in this move
  player(bst_rul,1)=player(bst_rul,1)+mv_outcm;
  % Update total score of player
  indiv_score=indiv_score+mv_outcm;
  indiv_opp_score=indiv_opp_score+adj_rul(3);
  cycle=cycle+1;
  % Log scores every 10 moves for each player
  log_cntr=log_cntr+1;
  if log_cntr>10
    score_so_far=[score_so_far indiv_score];
    opp_score_so_far=[opp_score_so_far indiv_opp_score];
    log_cntr=1;
  end
end
if evolv=='true ', return; end;
score_so_far;
opp_score_so_far;
adj_scores=[score_so_far' opp_score_so_far'];
save adj_scores;
plot(adj_scores);
title('Performance of evolved strategy against ADJUSTER');
xlabel('moves*10');
ylabel('scores');
%indiv_mvs
%adj_mvs
end

*************

%Other strategies can be included in the same way

*************

function [ruls_score,coded_rul]=...
    code_rul(ruls_score,coded_rul,bst_player,genes)
%Write the strategy in a more readable form.
%The scores of the rules are in array ruls_score
ruls_score(1:genes)=bst_player(1:genes,1);
for rul_n=1:genes
  rule=[];
  for outcome_n=2:4
    if bst_player(rul_n,outcome_n)==3
      rule=[rule 'R'];
    elseif bst_player(rul_n,outcome_n)==5
      rule=[rule 'T'];
    elseif bst_player(rul_n,outcome_n)==0
      rule=[rule 'S'];
    else
      rule=[rule 'P'];
    end;
  end;
  if bst_player(rul_n,5)==0
    rule=[rule ' D'];
  else
    rule=[rule ' C'];
  end;
  coded_rul=[coded_rul;rule];
end;
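As a cross-check of the decoding performed by code_rul above, the same mapping (outcomes R=3, T=5, S=0, P=1; action 1=C, 0=D) can be sketched in Python. This is an illustrative re-expression only; the name `decode_rule` is ours and does not appear in the report's MATLAB.

```python
# Illustrative re-expression of the decoding done by code_rul:
# a rule stores three previous outcomes (R=3, T=5, S=0, P=1) and an
# action (1 = C, 0 = D); the readable form is e.g. 'TPT D'.
OUTCOME_LETTER = {3: 'R', 5: 'T', 0: 'S', 1: 'P'}

def decode_rule(outcomes, action):
    """Return the readable form of a rule, as listed in Appendix B."""
    letters = ''.join(OUTCOME_LETTER[o] for o in outcomes)
    return letters + (' C' if action == 1 else ' D')
```

For instance, `decode_rule([3, 3, 3], 1)` gives `'RRR C'`: cooperate after three mutual cooperations.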

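For reference, the scoring branches that recur in evlv_play, tit_for_tat and adjuster above all implement the same Prisoners' Dilemma payoffs (R=3, T=5, S=0, P=1). A minimal Python sketch, using the hypothetical helper name `payoff` (not in the original):

```python
# Prisoners' Dilemma payoffs as used throughout the listings:
# R(eward)=3, T(emptation)=5, S(ucker)=0, P(unishment)=1; moves: 1=C, 0=D.
def payoff(my_move, opp_move):
    """Return (my score, opponent's score) for a single move."""
    if my_move == 1 and opp_move == 1:
        return 3, 3   # mutual cooperation: R, R
    if my_move == 0 and opp_move == 1:
        return 5, 0   # defect against a cooperator: T, S
    if my_move == 1 and opp_move == 0:
        return 0, 5   # cooperate with a defector: S, T
    return 1, 1       # mutual defection: P, P
```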
B Instance of evolved strategy

TPT D   TTP D   SRS C   SRS C   RRR C   RRR C
TTP D   TPT D   RRR C   SSR C   SSR C   SPT D
PTT D   SSR C   PTT D   PPT D   SSR C   TTP D
SRS C   TRP D   RRR C   TTT C   PPT D   TTP C
SRS C   TRR D   STT D   PPT D   TPT D   TPT D
RRR C   RRR C   RRR C   RSR C   SRS D   SRP D
PPT D   STT D   TSS C   PRS C   SSR C   STT D
SPT D   SPT C   TPP D
SPT C   PPT D   PPT D   TPT D   TTP D   STT D
RPT D   TRP D   PTT D   TSS C   PTP D   RTT D
TRS C   TPT C   SPT D   RTT D   PPT C   PPT D
PPT D   PRS C   TPT D   STT D   STT D   PRS D
PSP D
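Finally, the single-point crossover performed block-wise in the mating loop at the start of this appendix can be summarised as follows. This Python sketch operates on flat gene lists rather than the 5-column player matrices of the MATLAB above, and the name `crossover` is ours:

```python
import random

# Single-point crossover as in the mating loop: each offspring takes the
# first x_ovr_pnt genes from one parent and the remainder from the other.
def crossover(parent_a, parent_b, point=None):
    """Return the two offspring of a single-point crossover."""
    if point is None:
        # 0 .. len-1, analogous to floor(genes*rand) in the listing
        point = random.randrange(len(parent_a))
    child1 = parent_a[:point] + parent_b[point:]
    child2 = parent_b[:point] + parent_a[point:]
    return child1, child2
```

With a fixed crossover point of 2, `crossover([1, 1, 1, 1], [0, 0, 0, 0], point=2)` yields the offspring `[1, 1, 0, 0]` and `[0, 0, 1, 1]`.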