
214 IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES, VOL. 1, NO. 3, SEPTEMBER 2009

Relationship Between Generalization and Diversity in Coevolutionary Learning

Siang Yew Chong, Member, IEEE, Peter Tiňo, and Xin Yao, Fellow, IEEE

Abstract—Games have long played an important role in the development and understanding of coevolutionary learning systems. In particular, the search process in coevolutionary learning is guided by strategic interactions between solutions in the population, which can be naturally framed as game playing. We study two important issues in coevolutionary learning—generalization performance and diversity—using games. The first one is concerned with the coevolutionary learning of strategies with high generalization performance, that is, strategies that can outperform a large number of test strategies (opponents) that may not have been seen during coevolution. The second one is concerned with diversity levels in the population that may lead to the search of strategies with poor generalization performance. It is not known if there is a relationship between generalization and diversity in coevolutionary learning. This paper investigates whether there is such a relationship in coevolutionary learning through a detailed empirical study. We systematically and quantitatively investigate the impact of various diversity maintenance approaches on the generalization performance of coevolutionary learning using case studies. The problem of the iterated prisoner’s dilemma (IPD) game is considered. Unlike past studies, we can measure both the generalization performance and the diversity level of the population of evolved strategies. Results from our case studies show that the introduction and maintenance of diversity do not necessarily lead to the coevolutionary learning of strategies with high generalization performance. However, if individual strategies can be combined (e.g., using a gating mechanism), there is the potential of exploiting diversity in coevolutionary learning to improve generalization performance. Specifically, when the introduction and maintenance of diversity lead to a speciated population during coevolution, where each specialist strategy is capable of outperforming different opponents, the population as a whole can have a significantly higher generalization performance compared to individual strategies.

Index Terms—Coevolutionary learning, diversity maintenance, evolutionary computation (EC), generalization performance, iterated prisoner’s dilemma (IPD).

I. INTRODUCTION

EVOLUTIONARY COMPUTATION (EC) refers to the study of computational approaches that are motivated by and emulate the process of natural evolution. Although current EC studies have since expanded and involved interdisciplinary research with other computational methodologies (such as neural networks) [1], [2], EC traditionally studies and still continues to focus on a broad class of population-based, stochastic search algorithms known as evolutionary algorithms (EAs) [3]–[5]. EAs can be described using the unified generate-and-test framework for search algorithms [2]. The search process of EAs involves samples (populations) of potential solutions drawn from the space of representation of the problem domain, which undergo iterated applications of variation (generation of new solutions based on solutions in the previous iteration through some perturbation processes) and selection (testing solutions for inclusion in the next iteration based on some quality or fitness measurements) [2].

Manuscript received April 20, 2009; revised July 14, 2009; accepted September 18, 2009. First published October 13, 2009; current version published December 01, 2009. The work of X. Yao was supported in part by EPSRC under Grant GR/T10671/01.

S. Y. Chong is with the School of Computer Science, University of Nottingham, Malaysia Campus (UNMC), 43500 Semenyih, Malaysia, and also with the Automated Scheduling, Optimisation and Planning (ASAP) Research Group, School of Computer Science, University of Nottingham, Nottingham NG8 1BB, U.K. (e-mail: [email protected]).

P. Tiňo and X. Yao are with the School of Computer Science, The University of Birmingham, Birmingham B15 2TT, U.K. (e-mail: [email protected]; [email protected]).

Digital Object Identifier 10.1109/TCIAIG.2009.2034269

Although EC has been studied in many problem domains (such as optimization [6], [7] and classification [8], [9]), games have played important roles in the development and understanding of EC [10]. A game has a specific set of rules that constrains strategies to certain behaviors (legal moves), with goals for strategies to meet (to win the game), and rewards for those that better achieve the goals under constraints of finite resources (payoff for a move). A game has enough subtleties to allow representation of a wide range of complex behaviors (a diverse set of strategies) [11]. Games can capture intrinsic properties of complex, real-world problems where EC methodologies are developed to obtain solutions, which can range from solutions (strategies) for board games to economic games [11], [12], but are sufficiently simple to enable extensive and in-depth analysis of EC methodologies.

In EC, coevolutionary learning refers to a broad class of population-based, stochastic search algorithms that involve the simultaneous evolution of competing solutions with coupled fitness [3]. As with EAs, coevolutionary learning can be described using the generate-and-test framework. For example, early coevolutionary learning systems are implemented using coevolutionary algorithms derived from EAs [11], [13]. However, coevolutionary learning and EAs are fundamentally different in how the fitness of a solution is assigned, which may lead to significantly different outcomes when they are applied to similar problems (different search behaviors on the space of solutions) [14], [15]. Classical EAs are formulated in the context of optimization [3], [5], [6], where an absolute fitness function is used to assign fitness values to solutions. The solution fitness is objective (remains the same regardless of the composition of the population). Coevolutionary learning assigns a solution’s fitness through its interactions with other competing solutions in the population. The solution fitness is subjective (depends on the composition of the population).

1943-068X/$26.00 © 2009 IEEE


Coevolutionary learning was originally proposed as a viable alternative for problems where it is very difficult to construct an absolute quality measurement for solutions (the fitness function) through which optimization-based search algorithms such as EAs can be used. The evolutionary search is guided by the strategic interactions between the competing solutions in the population. In particular, these interactions provide the basis to compute relative fitness values that are used to rank the competing solutions, which would affect the selection process of the population and subsequently the variation in the population. The potential for problem solving through coevolutionary learning can be realized from an arms-race dynamic in the evolutionary search process that leads to solutions that are innovative and of increasingly higher quality [16]–[20].

In coevolutionary learning, games have played an important role not only in the development of various coevolutionary algorithms, but also in understanding its approach to problem solving. In particular, games provide a natural framework to study coevolutionary learning as the interactions between the coevolving solutions in the population can be framed as game playing. Games are used to study two main properties of a coevolutionary learning system: decision making and learning [11]. Decision making refers to the ability of the evolved strategy to make appropriate responses for the given stimuli (behaviors) in light of specific goals that must be achieved. Learning refers to the ability of the system to train (evolve) strategies that can respond to a wide range of environments (opponents). The measure of success of a coevolutionary learning system in solving games can be viewed as its ability to evolve strategies with behaviors such that they can outperform a large number of different opponents.

Although coevolutionary learning has been successfully applied in solving games [11], [13], [20]–[23], the methodology has been shown to suffer from coevolutionary pathologies that can potentially have a negative impact on its performance [19], [24]–[28]. In the context of game playing, poor search performance in coevolutionary learning can be a result of the overspecialization of the population to a particular strategy that performs well only against specific opponents rather than a large number of different opponents [26], [27].

As such, there are strong motivations to investigate and understand the conditions in the search process of coevolutionary learning that have an impact on its performance. Generalization performance and diversity are two important and related issues of the study. For the first issue, one is concerned with the coevolutionary learning of strategies with high generalization performance, that is, strategies that can outperform a large number of test strategies (opponents) that may not have been seen during coevolution [26], [29], [30]. For the second issue, one is concerned with diversity levels in coevolutionary learning [25], [27], [31].

The lack of diversity has been claimed to have a negative impact on the generalization performance of coevolutionary learning. Studies that have investigated this issue have proposed diversity maintenance techniques to introduce and maintain diversity in the population through specific use of selection and variation processes, which have been shown subsequently to improve the performance of coevolutionary learning [27], [29], [31], [32]. However, it is not known whether there is a relationship between generalization performance and diversity in coevolutionary learning. Establishing any direct relationship between generalization performance and diversity theoretically would be very difficult. Although earlier studies have turned to various empirical approaches, there is a lack of rigorous analysis to investigate this issue in terms of generalization performance measures in coevolutionary learning [29], [32], [33]. Furthermore, these past studies made little attempt to measure diversity levels in coevolutionary learning despite strong claims that they are increased while investigating their impact on performance [24], [25], [29], [33].

In this paper, we present a detailed empirical study as a first step to investigate whether there is a relationship between generalization performance and diversity in coevolutionary learning. We have systematically investigated the impact of various diversity maintenance techniques on the generalization performance of coevolutionary learning quantitatively through case studies involving iterated prisoner’s dilemma (IPD) games. Our case studies address the two shortcomings found in previous empirical studies. First, we have made a series of quantitative measurements of the generalization performance of coevolutionary learning systems with and without diversity maintenance that are based on the estimation procedure we have developed theoretically in [30]. Second, we have made a series of quantitative measurements of diversity levels in the population of various coevolutionary learning systems with and without diversity maintenance that are based on relevant diversity measures that have been identified earlier in [34]. These measurements allow us to determine whether the generalization performance and diversity levels in the population have been increased as a result of the application of diversity maintenance in coevolutionary learning in comparison to the classical coevolutionary learning (CCL) without diversity maintenance that we have used as a baseline.

In investigating the relationship between generalization performance and diversity in coevolutionary learning, we try to answer two related questions: 1) what is the right form of diversity that would be beneficial for generalization, and 2) what amount of the “right” form of diversity is needed for good generalization? The answers to these two questions may allow us to identify specific conditions whereby diversity can be exploited to improve the generalization performance in coevolutionary learning. Although our results have shown that not all diversity maintenance techniques that introduce and maintain diversity lead to a significant increase in the generalization performance of coevolutionary learning, we have identified a condition with which diversity plays a positive role in improving generalization performance. In particular, if individual strategies from the population can be combined (e.g., using a gating mechanism), there is the potential of exploiting diversity in coevolutionary learning to improve generalization performance. Specifically, when the introduction and maintenance of diversity lead to a speciated population during coevolution, where each specialist strategy is capable of outperforming different opponents, the population as a whole can have a significantly higher generalization performance compared to individual strategies.

The rest of this paper is organized as follows. Section II reviews a number of diversity maintenance techniques in coevolutionary learning that have been proposed in the past. The case studies that are presented in this paper compare coevolutionary learning employing these diversity maintenance techniques with CCL without diversity maintenance. Section III presents the methodology of the experiments used in the case studies. The section also describes how generalization performance and diversity levels of coevolutionary learning are measured. Section IV presents results of the case studies and discusses the observations made from the results. Section V concludes the paper with remarks for future studies.

II. DIVERSITY MAINTENANCE TECHNIQUES

The search process in evolutionary systems can be described with the notion of how these systems maintain a spread of points (the population) in the search space through exploration and exploitation. An evolutionary system operating during an exploration phase would spend more effort searching other globally promising regions of the search space that are not covered by the current population. In contrast, evolutionary systems operating during an exploitation phase would spend more effort searching locally in regions covered by the current population to focus on obtaining the required solutions to the problem. The success of an evolutionary search depends on how and when to explore or exploit the search space so that solutions that are of better quality are obtained in subsequent generations. There may be a tradeoff between exploring and exploiting the search space in any generation, and understanding this tradeoff may lead to the design of adaptive control in the search process that leads to improvements in the search performance [35].

One common approach used to study the dynamics of the evolutionary process and the role they play in search is with the notion of diversity. Diversity is concerned with the levels and types of variety in the population [34], which can change depending on the exploration or exploitation phase of the evolutionary process. Many studies [36]–[43] have investigated methods that exploit diversity to further improve the search performance of evolutionary systems. Some of these algorithms are designed to introduce and maintain diversity in the population during the evolutionary search process. Early examples of these diversity maintenance techniques have been used in EAs to solve difficult optimization problems involving multiple optima [36]–[40]. The techniques have since been adapted for coevolutionary learning with the motivation that diversity may encourage fitness variations in the population for continued search of solutions with higher generalization performance [29], [44].

In our case studies, we categorize the different diversity maintenance techniques into two groups that are based on how the specific algorithm operates in the coevolutionary learning framework. Implicit techniques emphasize diversity maintenance through the selection process. Explicit techniques emphasize diversity maintenance through the variation process. This section describes several established diversity maintenance techniques in coevolutionary learning studies that we use for our case studies.

For clarity, we use examples of game playing in our description of how a specific diversity maintenance technique is implemented in coevolutionary learning. The interactions between solutions in coevolutionary learning that are used to determine the solution fitness (relative to other solutions) can be framed as game playing. A game is played between two strategies (solutions) and consists of a series of moves that are made by each strategy. During one game play, both strategies accumulate payoffs that are based on the joint moves they have made, which are then used to determine the outcome when the game finishes. An example would be the case where a strategy wins the game when it has a higher total payoff in comparison to the other strategy, which loses the game. Diversity maintenance techniques differ in terms of how information from game outcomes is obtained and used.

A. Implicit Diversity Maintenance

1) Speciation: One of the early implicit diversity maintenance techniques is speciation (or niching). Speciation is usually implemented with fitness sharing algorithms. The main idea of fitness sharing is to change fitness evaluations so that fitnesses of solutions are reduced to lower values if the solutions are more similar. Fitness sharing of solutions can be implemented by computing the shared fitness from the original, unshared fitness for each solution in the population [40]. This is to emphasize the search for more dissimilar solutions and possibly lead to a population forming a diverse set of niches with unique solutions. Competitive fitness sharing [27] and implicit fitness sharing [29] are two fitness sharing approaches that have been studied for coevolutionary learning.

In the competitive fitness sharing approach, each evolved strategy in the population would interact with a random set of opponent strategies sampled from the same population. Strategy $i$ plays against an opponent $j$ in the sample $X$ and receives a fitness value of $1/N_j$ if $i$ wins, where $N_j$ is the number of opponents in the population that have defeated $j$. $N_j$ can be precomputed for each strategy $j$ in the population through a tournament where every strategy plays against all strategies in the population. Obviously, no strategy will win against a copy of itself, which bounds the maximum value of $N_j$. The minimum value of $1/N_j$, for the worst strategy that loses to all strategies in the population, would be 1/POPSIZE (POPSIZE is the total number of parent and offspring strategies). The shared fitness $f_i$ for a strategy $i$ is calculated as follows:

$$f_i = \sum_{j \in X} \frac{w_{ij}}{N_j} \tag{1}$$

where $w_{ij}$ is 1 if $i$ wins against $j$, and 0 otherwise. Essentially, the competitive fitness sharing approach encourages the selection of strategies that are unique in terms of defeating specific opponents that few other competing strategies could defeat (having a higher fitness value of $1/N_j$ as $N_j$ is smaller). These strategies are rewarded with a higher total sum of fitness values even though they may defeat a smaller number of opponents.
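The shared fitness computation described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the case studies: the function name, the boolean `wins` matrix, the `sample_size` parameter, and the choice to exclude a strategy from its own opponent sample are all assumptions made for the example.

```python
import random

def competitive_shared_fitness(wins, sample_size, rng=None):
    """Shared fitness per strategy; wins[i][j] is True when i defeats j.

    Hypothetical helper: names and the sampling scheme are assumptions.
    """
    rng = rng or random.Random(0)
    n = len(wins)
    # N_j: number of strategies in the population that defeat strategy j,
    # precomputed from a full round-robin tournament.
    defeated_by = [sum(1 for i in range(n) if i != j and wins[i][j])
                   for j in range(n)]
    fitness = []
    for i in range(n):
        # Random set of opponents drawn from the same population.
        opponents = rng.sample([j for j in range(n) if j != i], sample_size)
        # Each win over j is worth 1/N_j; N_j >= 1 whenever i defeats j.
        fitness.append(sum(1.0 / defeated_by[j]
                           for j in opponents if wins[i][j]))
    return fitness
```

With three strategies where strategy 0 defeats both others and strategy 1 defeats strategy 2, the win over strategy 1 (beaten by only one strategy) is worth twice as much as a win over strategy 2 (beaten by two).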

In the implicit fitness sharing approach, the shared fitness for each strategy in the population is obtained using the algorithmic procedure outlined in Fig. 1. During one typical run of the procedure for a strategy (the procedure is run a fixed number of times), a random sample of opponent strategies (other than the strategy itself) would be drawn from the current population. The strategy would play against each opponent strategy in the sample. The opponent that achieves the highest game payoff against the strategy would receive a fitness value (we use the game payoff itself). If several opponents tie in terms of their game payoffs against the strategy, they would share the fitness (the game payoff divided by the number of tied opponents). The approach has been shown mathematically to approximate the original fitness sharing scheme [45] but with the advantage of not having to use a sharing function, which can be problem specific [46].

Fig. 1. Procedure for implicit fitness sharing to assign shared fitness to a strategy.
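The procedure of Fig. 1 is not reproduced in this text, so the following is only a sketch of the verbal description above. The function name, the `payoff` matrix layout, the `runs` and `sample_size` parameters, and the choice to accumulate rewards across runs are assumptions for illustration.

```python
import random

def implicit_shared_fitness(payoff, runs, sample_size, rng=None):
    """payoff[a][b] is the game payoff that strategy a earns against b.

    Hypothetical sketch: rewards are accumulated over all runs.
    """
    rng = rng or random.Random(0)
    n = len(payoff)
    shared = [0.0] * n
    for i in range(n):  # strategy i acts as the test case
        others = [j for j in range(n) if j != i]
        for _ in range(runs):
            sample = rng.sample(others, sample_size)
            # Highest payoff achieved against strategy i within the sample.
            best = max(payoff[j][i] for j in sample)
            winners = [j for j in sample if payoff[j][i] == best]
            for j in winners:
                # Tied opponents split the payoff equally.
                shared[j] += best / len(winners)
    return shared
```

Because only the best performer against each test case is rewarded, strategies that do well against opponents that few others can handle collect fitness without an explicit sharing function.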

2) Pareto Coevolution: Pareto coevolution is a coevolutionary learning approach with implicit diversity maintenance based on the framework of multiobjective optimization introduced earlier in EAs [24], [32], [33]. Pareto coevolution is motivated to address the problem associated with the use of scalar fitness evaluations in CCL. In the context of games, it has been argued that a strategy’s fitness based on a simple weighted sum of values given by game outcomes can be misleading. One may not know a priori, at the time fitness evaluations are made, whether all outcomes are of equal importance (uniform weightings) or whether some outcomes are more important (nonuniform weightings).

Pareto coevolution reformulates the fitness evaluation by explicitly using opponent strategies (test cases) to represent the different, underlying objectives of the problem that a strategy must address. The selection process in Pareto coevolution is based on the Pareto-dominance relationship. For single-population Pareto coevolution, to determine whether strategy $i$ Pareto dominates strategy $j$, we consider all strategies $k$ from the population as the objectives. The objective value $g_k(i)$ refers to the game outcome of strategy $i$ when it plays against strategy $k$. Strategy $i$ Pareto dominates $j$ if $i$ has at least the same game outcome as $j$ for every one of the objectives, and there is at least one objective for which $i$ has a better game outcome than $j$ [24]

$$\forall k: g_k(i) \ge g_k(j) \;\wedge\; \exists l: g_l(i) > g_l(j) \tag{2}$$

with $k, l \in \{1, \ldots, \text{POPSIZE}\}$. Strategies $i$ and $j$ are mutually nondominating if $i$ is better than $j$ on at least one objective and $j$ is better than $i$ on at least another objective, with both strategies being equivalent in the remaining objectives.
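The dominance test described above can be written compactly. The sketch below assumes a hypothetical `outcomes` matrix in which row `a` holds the game outcomes of strategy `a` against every strategy in the population; the function names are illustrative only.

```python
def pareto_dominates(outcomes, i, j):
    """True if strategy i Pareto dominates strategy j.

    outcomes[a][k] is the game outcome of strategy a against strategy k,
    so each opponent k plays the role of one objective.
    """
    pairs = list(zip(outcomes[i], outcomes[j]))
    at_least_as_good = all(x >= y for x, y in pairs)
    strictly_better = any(x > y for x, y in pairs)
    return at_least_as_good and strictly_better

def mutually_nondominating(outcomes, i, j):
    """True if each strategy is better than the other on some objective."""
    pairs = list(zip(outcomes[i], outcomes[j]))
    return any(x > y for x, y in pairs) and any(y > x for x, y in pairs)
```

Note that two strategies with identical outcomes on every objective neither dominate each other nor count as mutually nondominating under this reading of the definition.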

The selection process in Pareto coevolution can take the form of a ranking of strategies according to the Pareto layers that are obtained from the Pareto-dominance relationship. For example, mutually nondominating strategies in the Pareto front would be the most preferable strategies for selection. The second Pareto layer can be obtained by first removing strategies in the Pareto front from the population. The second Pareto layer would contain mutually nondominating strategies among the subpopulation that is obtained after removing the Pareto front from the full population. The following Pareto layers can be obtained systematically with respect to subpopulations that do not include strategies in higher Pareto layers [32].
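The layering described above amounts to repeatedly peeling off the nondominated strategies. The following sketch assumes that the objectives stay fixed (outcomes against the full population) while layers are removed; this choice, the function name, and the `outcomes` matrix layout are assumptions for illustration.

```python
def pareto_layers(outcomes):
    """Rank strategies into Pareto layers (layer 0 is the Pareto front).

    outcomes[a][k] is the game outcome of strategy a against strategy k.
    """
    def dominates(a, b):
        pairs = list(zip(outcomes[a], outcomes[b]))
        return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)

    remaining = list(range(len(outcomes)))
    layers = []
    while remaining:
        # Strategies not dominated by any other remaining strategy.
        front = [a for a in remaining
                 if not any(dominates(b, a) for b in remaining if b != a)]
        layers.append(front)
        remaining = [a for a in remaining if a not in front]
    return layers
```

Selection can then prefer strategies from lower-indexed layers. The loop always terminates because dominance is a strict partial order, so every nonempty subpopulation has at least one nondominated member.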

3) Reducing Virulence: Both speciation and Pareto coevolution approaches to diversity maintenance require sophisticated algorithmic implementations. In [25], a simple approach that is inspired by the natural interactions between hosts and parasites was proposed. In a biological coevolutionary system, highly virulent parasites could potentially incapacitate the hosts, which in turn reduces the chances for parasites to reproduce. In coevolutionary learning, similar observations have been made whereby strong competing solutions can quickly take over the population, leading to coevolutionary disengagement where the system is unable to differentiate between the solutions for continued evolutionary search [25].

The implementation of reducing virulence involves a form of rescaling of fitness values. CCL operates at maximum virulence. The original fitness value $f_i$ is assumed to be proportional to the solution $i$’s ability to solve as many test cases as possible in the population (e.g., a simple weighted sum, which is normalized according to the fitness value of the best solution). In the context of game playing, one form of reducing virulence is to lower the fitness of the top performing strategies in the coevolving population that outperform the largest number of the opponents. Following [25], the virulence level $\lambda$ is defined as the tendency to maximize fitness and a reduced virulence (RV) level changes an evolved strategy’s original fitness value $f_i$ to

$$f'_i = \frac{2 f_i}{\lambda} - \frac{f_i^2}{\lambda^2} \tag{3}$$

where $0.5 \le \lambda \le 1.0$. A coevolutionary learning system with maximum virulence level uses $\lambda = 1.0$, while a system with RV level has $\lambda$ in the range of $0.5 \le \lambda < 1.0$ (e.g., moderate virulence level $\lambda = 0.75$). Fig. 2 compares how fitness values are rescaled between the use of maximum and moderate virulence levels. Note that the lowest value for $\lambda$ is 0.5 as any lower value will indicate fitness is rescaled to encourage strategies to outperform smaller numbers of opponents [25].
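As a rough numerical illustration, the sketch below assumes the rescaling takes the quadratic form proposed for reduced virulence in [25], that is, a normalized score $x$ and virulence level $v$ give $2x/v - x^2/v^2$. The function and parameter names are hypothetical.

```python
def rescale_fitness(score, virulence):
    """Rescale a normalized score in [0, 1] for a given virulence level.

    Assumed form: f' = 2x/v - x^2/v^2, with 0.5 <= v <= 1.0.
    """
    if not 0.5 <= virulence <= 1.0:
        raise ValueError("virulence must lie in [0.5, 1.0]")
    x = score / virulence
    return 2.0 * x - x * x
```

Under this form, maximum virulence (1.0) rewards the top score the most, whereas moderate virulence (0.75) gives its peak reward to a strategy scoring 0.75 of the best and penalizes the very top performer slightly. At the lower limit of 0.5, a strategy with the top score receives no reward at all.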

B. Explicit Diversity Maintenance

Explicit diversity maintenance techniques involve increasing the level of variation in the population through variation operators in the coevolutionary search process. A simple approach is to use a higher value for the parameter used in the variation operator that can increase the mutation probability. This simple approach is investigated in our case studies. However, it is noted that there are other more sophisticated approaches where the parameter controlling mutation probability can be varied according to some algorithmic procedures, e.g., setting higher parameter values to introduce and maintain more variations in the population if measurements on the population indicate low diversity levels [43].

Fig. 2. Comparison of virulence curves between maximum virulence (level 1.0) and moderate virulence (level 0.75) [25].

III. RELATIONSHIP BETWEEN GENERALIZATION PERFORMANCE AND DIVERSITY IN COEVOLUTIONARY LEARNING

In this section, we present a methodology to investigate whether there is a relationship between generalization performance and diversity in coevolutionary learning. We use case studies to systematically investigate the impact of various diversity maintenance techniques on the generalization performance of coevolutionary learning quantitatively. The problem of the IPD game is considered. Two categories of diversity maintenance techniques that we investigate are: 1) implicit diversity maintenance techniques, which include speciation (competitive and implicit fitness sharing), Pareto coevolution, and reducing virulence; and 2) explicit diversity maintenance techniques with different and higher parameter settings for mutation probability.

A. The IPD Game

In the classical IPD game setting, two players engaged in repeated interactions are given two choices to make, cooperate and defect [47]–[49]. The classical IPD game is formulated using a predefined payoff matrix that specifies the payoff that a player receives for the joint moves of the player and the opponent. Fig. 3 illustrates the payoff matrix for the classical IPD game. Both players receive $R$ (reward) units of payoff if both cooperate. They both receive $P$ (punishment) units of payoff if they both defect. However, when one player cooperates while the other defects, the cooperator will receive $S$ (sucker) units of payoff while the defector receives $T$ (temptation) units of payoff.

Fig. 3. Payoff matrix framework of a two-player, two-choice game. The payoff given in the lower left-hand side corner is assigned to the player (row) choosing the move, while that of the upper right-hand side corner is assigned to the opponent (column).

Fig. 4. Payoff matrix for the two-player four-choice IPD game [50]. Each element of the matrix gives the payoff for player A.

In the classical IPD, the values $R$, $S$, $T$, and $P$ must satisfy the constraints: $T > R > P > S$ and $R > (S + T)/2$. Any set of values can be used as long as they satisfy the IPD constraints (we use $T = 5$, $R = 4$, $P = 1$, and $S = 0$ in all our experiments). The game is played when both players choose between the two alternative choices over a series of moves. The dilemma in the game is reflected by the tension between the rationality of individuals, who are tempted to defect, and the rationality of the group, where every individual is rewarded by mutual cooperation: both players are better off mutually cooperating than mutually defecting, yet each is vulnerable to exploitation by a party who defects [49].
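The two IPD constraints are easy to check mechanically. The sketch below uses the standard formulation, $T > R > P > S$ and $R > (S + T)/2$; the default payoff values in `round_payoffs` are example values assumed for illustration, and both function names are hypothetical.

```python
def is_valid_ipd(T, R, P, S):
    """Check the classical IPD constraints T > R > P > S and R > (S + T)/2."""
    return T > R > P > S and 2 * R > S + T

def round_payoffs(move_a, move_b, T=5, R=4, P=1, S=0):
    """Payoffs (to A, to B) for one round; moves are 'C' or 'D'."""
    table = {
        ("C", "C"): (R, R),  # mutual cooperation: reward
        ("D", "D"): (P, P),  # mutual defection: punishment
        ("C", "D"): (S, T),  # A is the sucker, B is tempted
        ("D", "C"): (T, S),
    }
    return table[(move_a, move_b)]
```

The second constraint is what rules out alternating exploitation as a way to beat sustained mutual cooperation: taking turns receiving $T$ and $S$ averages less than $R$ per round.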

The classical IPD game has been extended to more complex versions to better model real-world interactions. One example is to extend the classical IPD with multiple, discrete levels of cooperation [26], [50]–[53]. The $n$-choice IPD game can be defined by the payoffs obtained through the following linear interpolation:

$$p_A = 2.5 - 0.5 c_A + 2 c_B, \quad -1 \le c_A, c_B \le 1 \tag{4}$$

where $p_A$ is the payoff to player A, given that $c_A$ and $c_B$ are the cooperation levels of the choices that players A and B make, respectively. Fig. 4 [50] illustrates the payoff matrix for the four-choice IPD game we use in the case studies.

It is noted that when generating the payoff matrix for an $n$-choice IPD game, the following conditions must be satisfied [50].

1) For $c_A < c'_A$ and constant $c_B$: $p_A(c_A, c_B) > p_A(c'_A, c_B)$.
2) For $c_A \le c'_A$ and $c_B < c'_B$: $p_A(c_A, c_B) < p_A(c'_A, c'_B)$.
3) For $c_A < c'_A$ and $c_B < c'_B$: $p_A(c'_A, c'_B) > \frac{1}{2}\left(p_A(c_A, c'_B) + p_A(c'_A, c_B)\right)$.

These conditions are analogous to those for the classical IPDs. The first condition ensures that defection always pays more. The second condition ensures that mutual cooperation results in higher payoff than that of mutual defection. The third condition ensures that alternating between cooperation and defection pays less in comparison to just playing cooperation.

Fig. 5. Direct lookup table representation for the deterministic and reactive memory-one IPD strategy that considers four choices (also includes $m_{fm}$ for the first move, which is not shown in the figure).
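A payoff matrix for the multiple-choice game can be generated and checked numerically. The sketch below rests on two assumptions that should be verified against [50]: that the linear interpolation is $p_A = 2.5 - 0.5 c_A + 2 c_B$ with cooperation levels evenly spaced in $[-1, 1]$, and that the three conditions take the inequality forms written in the comments. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def payoff_a(c_a, c_b):
    """Assumed interpolation: payoff to A for cooperation levels c_a, c_b."""
    return Fraction(5, 2) - Fraction(1, 2) * c_a + 2 * c_b

def cooperation_levels(n):
    """n evenly spaced levels from -1 (full defection) to +1 (full cooperation)."""
    return [Fraction(-1) + Fraction(2 * k, n - 1) for k in range(n)]

def satisfies_conditions(levels, p=payoff_a):
    for a, a2, b, b2 in product(levels, repeat=4):
        # 1) Lower own cooperation pays more for a fixed opponent choice.
        if a < a2 and not p(a, b) > p(a2, b):
            return False
        # 2) Higher joint cooperation pays more than lower joint cooperation.
        if a <= a2 and b < b2 and not p(a, b) < p(a2, b2):
            return False
        # 3) Joint cooperation beats the average of alternating exploitation.
        if a < a2 and b < b2 and not p(a2, b2) > (p(a, b2) + p(a2, b)) / 2:
            return False
    return True

def payoff_matrix(n):
    levels = cooperation_levels(n)
    return [[payoff_a(a, b) for b in levels] for a in levels]
```

Exact fractions are used so that levels such as $1/3$ compare without rounding error. Under the assumed interpolation, the four corner entries of the matrix coincide with the classical payoffs 4, 0, 5, and 1 for reward, sucker, temptation, and punishment.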

Many past studies using IPD games were motivated to investigate specific conditions that can lead to the evolution of certain behaviors, for example, how and why cooperative behaviors can be learned through a coevolutionary process guided by interactions between competing strategies [54]–[59]. In this paper, our use of IPD games is motivated to determine whether diversity can have an impact on the coevolutionary learning of strategies that are “robust.”

The notion of robustness has been defined in the context of strategy performance against some expert-designed strategies [13] and in the context of generalization of strategies obtained from a learning process [31], [44], [60]. Other studies, such as [61], have investigated robustness in the context of nonlocal adaptation performance between populations of strategies. In this paper, we consider the notion of generalization and apply the theoretical framework of generalization performance in coevolutionary learning that we have formulated earlier in [30]. We consider the four-choice IPD game with 150 rounds for deterministic and reactive, memory-one strategies as it provides a difficult search problem (a large search space that cannot be exhaustively searched) for coevolutionary learning but one which we can analyze quantitatively.

B. Strategy Representation

For the IPD games involving deterministic and reactive strategies, several representations have been investigated in the past, which include a lookup table with bit-string encoding [13], finite-state machines [55]–[57], and neural networks [50], [53], [62], [63]. In this paper, we use the direct lookup table strategy representation that we have introduced and studied in [50], which directly represents IPD strategy behaviors through a one-to-one mapping between the genotype space (strategy representation) and the phenotype space (behaviors).

Fig. 5 illustrates the direct lookup table representation for the IPD strategies with four choices and memory-one that we consider in the case studies. The elements of the direct lookup table $m_{ij}$, $i, j = 1, 2, 3, 4$, specify choices that are to be made given the inputs $i$ (player’s own previous choice) and $j$ (opponent’s previous choice). Instead of using pregame inputs (two for memory-one strategies), the first move is specified independently ($m_{fm}$). For the four-choice IPD (Fig. 5), each element can take values of $+1$, $+1/3$, $-1/3$, and $-1$. These values are the same as those that are used to generate the payoffs in the payoff matrix (Fig. 4) through a linear interpolation (4).
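A direct lookup table strategy and a game between two such strategies can be sketched as below. The encoding is an assumption for illustration: a strategy is a pair `(first_move, table)` whose entries are indices into four cooperation levels, with index 0 taken as full cooperation and index 3 as full defection. The payoff function is the assumed interpolation $2.5 - 0.5 c_A + 2 c_B$, and all names are hypothetical.

```python
from fractions import Fraction

# Index 0 is full cooperation (+1); index 3 is full defection (-1).
LEVELS = [Fraction(1), Fraction(1, 3), Fraction(-1, 3), Fraction(-1)]

def payoff(c_own, c_opp):
    """Assumed linear interpolation of the payoff to the player."""
    return Fraction(5, 2) - Fraction(1, 2) * c_own + 2 * c_opp

def play_ipd(strategy_a, strategy_b, rounds=150):
    """Play a deterministic memory-one game and return both total payoffs.

    table[i][j] gives the next move for own previous move i and
    opponent's previous move j.
    """
    move_a, move_b = strategy_a[0], strategy_b[0]
    total_a = total_b = Fraction(0)
    for _ in range(rounds):
        total_a += payoff(LEVELS[move_a], LEVELS[move_b])
        total_b += payoff(LEVELS[move_b], LEVELS[move_a])
        # Both players update simultaneously from the previous joint move.
        move_a, move_b = (strategy_a[1][move_a][move_b],
                          strategy_b[1][move_b][move_a])
    return total_a, total_b
```

For example, a tit-for-tat style table can be written as `(0, [[j for j in range(4)] for _ in range(4)])`, which opens with full cooperation and then copies the opponent's previous choice.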

A simple mutation-based variation operator is used to generate an offspring from a parent strategy when using the direct lookup table for strategy representation [50]. For the $n$-choice IPD, mutation replaces the original choice of a particular element in the direct lookup table with one of the other $n - 1$ remaining possible choices with an equal probability of $1/(n - 1)$. Each table element has a fixed probability $p_m$ of being replaced by one of the remaining $n - 1$ choices. The mutation-based variation operator can provide sufficient variation in strategy behaviors directly through the direct lookup table representation (even for the more complex IPD game with more choices) [50].
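The representation and the mutation operator described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: choices are encoded as integer indices `0..n-1` rather than as payoff-related values, the first move is stored at index 0 of a flat list, and the names `random_strategy`, `response`, and `mutate` are hypothetical rather than taken from [50].

```python
import random

N_CHOICES = 4                      # four-choice IPD
TABLE_LEN = 1 + N_CHOICES ** 2     # first move + one entry per (own, opponent) pair

def random_strategy(rng):
    """Draw a lookup table uniformly at random (assumed layout: index 0 = first move)."""
    return [rng.randrange(N_CHOICES) for _ in range(TABLE_LEN)]

def response(table, own_prev, opp_prev):
    """Choice made given the player's own and the opponent's previous choices."""
    return table[1 + own_prev * N_CHOICES + opp_prev]

def mutate(table, p_m, rng):
    """Each element is replaced with probability p_m by one of the
    other N_CHOICES - 1 choices, each picked with equal probability."""
    child = list(table)
    for k, current in enumerate(child):
        if rng.random() < p_m:
            others = [c for c in range(N_CHOICES) if c != current]
            child[k] = rng.choice(others)
    return child
```

Because a mutated element is always drawn from the other choices, a mutation event never leaves the element unchanged, so `p_m` is the actual per-element replacement rate.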

An important advantage of the direct lookup table representation is that the search space (given by the strategy representation) is the same as the strategy space (assuming a uniform strategy distribution in the strategy space) [30]. This simplifies, and allows a direct investigation of, the conditions in a coevolutionary learning search process that can have an impact on the generalization performance. Here, variation and selection operators directly affect the learning of strategy behaviors because of the direct lookup table representation. This in turn allows us to investigate the impact of both explicit and implicit diversity maintenance techniques on the coevolutionary learning of strategy behaviors. Furthermore, we can measure (estimate) the generalization performance of evolved strategies using a random sample of test strategies [30].

C. Coevolutionary Learning

All coevolutionary learning systems that we investigate share the same generate-and-test framework guided by strategic interactions through IPD game play. The basic coevolutionary learning system without any specific application of diversity maintenance (our baseline) is CCL, which is implemented through the following procedure [50].

1) Generation step $t = 0$: Initialize POPSIZE/2 parent strategies $P_i$, $i = 1, 2, \ldots,$ POPSIZE/2, randomly.

2) Generate POPSIZE/2 offspring $O_i$, $i = 1, 2, \ldots,$ POPSIZE/2, from the POPSIZE/2 parents using a variation operator.

3) All pairs of strategies compete, including the pair where a strategy plays itself (round-robin tournament). For POPSIZE strategies in a population, every strategy competes in a total of POPSIZE games.

4) Select the best POPSIZE/2 strategies based on the total payoffs of all games played. Increment generation step $t = t + 1$.

5) Steps 2)–4) are repeated until a termination criterion (a fixed number of generations) is met.
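The five steps above can be sketched as a short generate-and-test loop. This is a minimal sketch, not the authors' implementation: the function name `ccl` and the callables `init`, `vary`, and `play` are hypothetical placeholders, and `play(a, b)` is assumed to return the total payoff that strategy `a` earns in one game against `b`.

```python
import random

def ccl(popsize, generations, init, vary, play, rng):
    """Baseline coevolutionary loop; returns the final POPSIZE/2 parents."""
    half = popsize // 2
    parents = [init(rng) for _ in range(half)]              # step 1
    for _ in range(generations):                            # step 5
        offspring = [vary(p, rng) for p in parents]         # step 2
        population = parents + offspring
        # Step 3: round robin including self-play, so every strategy
        # is scored over POPSIZE games.
        totals = [sum(play(a, b) for b in population) for a in population]
        ranked = sorted(range(len(population)),
                        key=totals.__getitem__, reverse=True)
        parents = [population[k] for k in ranked[:half]]    # step 4
    return parents
```

Ties in total payoff are broken by position in the population (parents before offspring) because Python's sort is stable; the source does not state a tie-breaking rule, so this is an assumption of the sketch.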

The settings for the classical coevolutionary learning system (CCL-PM05) that we use as the baseline for comparison with coevolutionary learning systems with specific applications of diversity maintenance are summarized in Table I.

In implementing coevolutionary learning systems with diversity maintenance, changes are made only to the affected variation or selection operators. This is to facilitate a more direct comparison of the impact of different diversity maintenance techniques on coevolutionary learning, whether explicitly through the variation process or implicitly through the selection process. Note that different implicit diversity maintenance techniques would rank strategies in the population differently, and that the best half of the ranked population is selected as parents for the following generation. The implementation details have been described earlier in Section II, although we summarize further details, such as the choices of parameters, and differentiate between the various implicit diversity maintenance techniques in Table II.

TABLE I
SUMMARY OF SETTINGS USED IN CCL (BASELINE)

TABLE II
SUMMARY OF IMPLICIT DIVERSITY MAINTENANCE TECHNIQUES USED IN THE EXPERIMENTS

For case studies involving coevolutionary learning with speciation, we investigate both competitive and implicit fitness sharing techniques. Furthermore, we investigate two sampling procedures to obtain the set of opponents that is used for fitness evaluation. The first procedure is disassortative sampling, which first selects a random opponent from the current population before choosing successive opponents that are the most different (in terms of edit distance for the direct lookup table representation, which we describe in more detail later) from the previous opponents [29]. The second procedure is simple random sampling [29].
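A possible reading of disassortative sampling is sketched below. The source does not spell out how "most different from the previous opponents" is aggregated, so this sketch assumes the next opponent maximizes the summed distance to all opponents chosen so far; the names `hamming` and `disassortative_sample` are hypothetical.

```python
import random

def hamming(a, b):
    """Number of lookup table positions at which two strategies differ."""
    return sum(x != y for x, y in zip(a, b))

def disassortative_sample(population, k, rng, distance=hamming):
    """Pick one opponent at random, then repeatedly add the remaining
    strategy with the largest summed distance to those already chosen."""
    remaining = list(range(len(population)))
    chosen = [remaining.pop(rng.randrange(len(remaining)))]
    while len(chosen) < k and remaining:
        best = max(remaining,
                   key=lambda c: sum(distance(population[c], population[s])
                                     for s in chosen))
        remaining.remove(best)
        chosen.append(best)
    return [population[c] for c in chosen]
```

Simple random sampling, the second procedure, would correspond to `rng.sample(population, k)`.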

For Pareto coevolution, the single-population approach that was proposed in [33] is used. This is to facilitate a more direct comparison with the other coevolutionary learning systems we investigate in this study, which use only one population. For coevolutionary learning with RV, we consider the moderate virulence level that has been studied in detail earlier in [25].

For case studies involving the application of explicit diversity maintenance to coevolutionary learning, we consider the CCL approach and investigate systems with higher parameter value settings for the probability of mutation $p_m$. We consider the CCL with $p_m = 0.05$ (CCL-PM05) as the basic coevolutionary learning system (baseline) for comparison with other coevolutionary learning systems having diversity maintenance. We use other settings, i.e., $p_m = 0.10$ (CCL-PM10), $p_m = 0.15$ (CCL-PM15), $p_m = 0.20$ (CCL-PM20), and $p_m = 0.25$ (CCL-PM25), to investigate the impact of explicit diversity maintenance on the generalization performance of coevolutionary learning. In the case studies, all experiments involving different coevolutionary learning systems are repeated in 30 independent runs to allow for statistical analysis.

D. Measuring Generalization Performance

We have recently formulated a theoretical framework for measuring generalization performance in coevolutionary learning and have applied the framework in the context of the coevolutionary learning of IPD games. The generalization performance of a strategy is defined as its average performance against test (opponent) strategies [30].

In formulating the generalization performance measure for coevolutionary learning, consider a game with the strategy space given by a set of $M$ distinct strategies $S = \{1, 2, \ldots, M\}$. Let the selection of individual test strategies from $S$ be represented by a random variable $J$ taking on values $j \in S$ with probability $P_S(j)$. Denote the game outcome of strategy $i$ playing against the test strategy $j$ by $G_i(j)$ (conversely, $G_j(i)$ is the game outcome of strategy $j$ against strategy $i$).

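Different definitions of $G_i(j)$ (i.e., different notions of game outcomes) for the generalization performance indicate different measures of quality. The win–lose and average-payoff functions for $G_i(j)$ are two examples. Let $g_i(j)$ and $g_j(i)$ be the payoffs for strategies $i$ and $j$ at the end of the IPD game, respectively. Strategy $i$ wins the game if $g_i(j) > g_j(i)$; otherwise, it loses the game. The win–lose function $G_W(i, j)$ for $G_i(j)$ for the IPD game is defined as

$$G_W(i, j) = \begin{cases} C_{\mathrm{WIN}}, & \text{for } g_i(j) > g_j(i) \\ C_{\mathrm{LOSE}}, & \text{otherwise} \end{cases} \tag{5}$$

where $C_{\mathrm{WIN}}$ and $C_{\mathrm{LOSE}}$ are arbitrary constants with $C_{\mathrm{WIN}} > C_{\mathrm{LOSE}}$ (we use $C_{\mathrm{WIN}} = 1$ and $C_{\mathrm{LOSE}} = 0$).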

In this paper, we consider the win–lose function as it simplifies the analysis of the experiments. In particular, we know that the “all defect” strategy, which plays full defection only, is the strategy with the maximum generalization performance when the win–lose function is used, irrespective of how test strategies are distributed in $S$ (this is not necessarily true for other definitions such as the average-payoff function). We arbitrarily choose a stricter form of the win–lose function to measure performance (since a loss is awarded to both sides in the case of a tie).
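The strict win–lose outcome can be illustrated with a few lines of code. The sketch below assumes the constants take the values 1 (win) and 0 (loss); the function name `win_lose` is hypothetical.

```python
C_WIN, C_LOSE = 1, 0   # assumed constants, with C_WIN > C_LOSE

def win_lose(payoff_i, payoff_j):
    """Outcome for strategy i: a win only if its payoff is strictly higher."""
    return C_WIN if payoff_i > payoff_j else C_LOSE
```

Under this strict form, a tie yields `C_LOSE` for both players, which is what makes the measure stricter than one that awards a draw.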

The true generalization performance of a strategy $i$ is defined as the mean performance against all test strategies $j$

$$G_i = \sum_{j=1}^{M} P_S(j)\, G_i(j). \tag{6}$$

That is, $G_i$ is the mean of the random variable $G_i(J)$.

Some test strategies may be favored over the others, or all test strategies can be considered with equal probability a priori. In particular, when all strategies are equally likely to be selected, i.e., when $P_S$ is uniform, we have

$$G_i = \frac{1}{M} \sum_{j=1}^{M} G_i(j).$$

The size $M$ of the strategy space $S$ can be very large, making direct estimation of the generalization performance $G_i$ through (6) computationally infeasible. For example, in the case of the four-choice IPD game involving deterministic and reactive, memory-one strategies, $M = 4^{17}$ (17 lookup table elements with four possible choices each).

In practice, one can estimate $G_i$ using a random sample $S_N$ of $N$ test strategies drawn independent and identically distributed (i.i.d.) from $S$ with probability $P_S$. The estimated generalization performance of strategy $i$ is then

$$\hat{G}_i(S_N) = \frac{1}{N} \sum_{j \in S_N} G_i(j). \tag{7}$$

If the game outcomes $G_i(J)$ vary within a finite interval $[G_{\mathrm{MIN}}, G_{\mathrm{MAX}}]$ of size $R$, the variance of $G_i(J)$ is upper bounded by $\sigma^2_{\mathrm{MAX}} = R^2/4$. Using Chebyshev's theorem [64], we obtain

$$P\left(|\hat{G}_i(S_N) - G_i| \geq \epsilon\right) \leq \frac{R^2}{4N\epsilon^2} \tag{8}$$

for any positive number $\epsilon$. Note that Chebyshev's bound is distribution free, i.e., no particular form of the distribution of $G_i(J)$ is assumed. Equation (8) allows one to make a statistical claim about how confident one is about the accuracy of an estimate given a random test sample of a known size $N$.

Let $D_N = |\hat{G}_i(S_N) - G_i|$. The inequality (8) can be rewritten as follows:

$$P(D_N \geq \epsilon) \leq \delta$$

where $\delta = R^2/(4N\epsilon^2)$ and $\epsilon > 0$. For the case studies, we use a sample of random test strategies to estimate the generalization performance of a strategy obtained through coevolutionary learning, based on IPD game outcomes defined by the win–lose function (5). Given the high computation cost that is involved in estimating the generalization performance through a large number of IPD game plays, we consider a fixed sample size $N$. Assuming $\epsilon = R/100$ (error at 1% of $R$), Chebyshev's bound gives $\delta = 10^4/(4N) = 2500/N$. We can then claim with a probability of at least $1 - \delta$ that the error for the generalization performance estimate satisfies $D_N < \epsilon$.
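The relationship between sample size, error, and confidence in Chebyshev's bound can be made concrete with a small calculation. The sketch below is illustrative only: the function names are hypothetical, and the interval size defaults to $R = 1$, which is an assumption corresponding to win–lose outcomes of 1 and 0.

```python
import math

def chebyshev_delta(n, eps, r=1.0):
    """Upper bound on P(|estimate - true value| >= eps) for n i.i.d.
    outcomes that lie in an interval of size r."""
    return min(1.0, r * r / (4.0 * n * eps * eps))

def required_sample_size(eps, delta, r=1.0):
    """Smallest n for which the bound r^2 / (4 n eps^2) is at most delta."""
    return math.ceil(r * r / (4.0 * delta * eps * eps))
```

For instance, with $R = 1$, an error of 1% of $R$, and a failure probability of 5%, the bound calls for a sample on the order of $5 \times 10^4$ test strategies, which illustrates why the estimation is computationally expensive. Since the bound is distribution free, it is conservative; tighter bounds are possible when more is known about the outcome distribution.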

In game playing, one may be more interested in estimating the generalization performance with respect to a biased test sample, e.g., biased towards test strategies that are more likely to be encountered in a real-world scenario such as game tournaments, rather than an unbiased sample where every test strategy is equally likely to be encountered. A biased test sample may consist of “good” strategies based on some performance criteria, e.g., outperforming a large number of test strategies. This biased sample of “good” test strategies can be obtained using a procedure we have introduced and studied earlier in [30].

1) Partial enumerative search run $r = 1$: sample a fixed number of strategies randomly.

2) Every strategy competes with all other strategies in the sample, including its twin (i.e., each strategy competes in as many games as there are strategies in the sample).

3) Detect the strategy in the sample that yields the highest total payoff and add it to the biased sample. Increment run $r = r + 1$.

4) Repeat steps 1)–3) $N$ times to obtain an $N$-sized biased sample of test strategies.

The population (sample) size for each partial enumerative search is chosen to be significantly larger than the maximum number of unique strategies in an evolutionary run (generations × POPSIZE). This is so that the test strategy is 1) likely not to have been encountered before by evolved strategies and 2) more challenging (having outperformed a significantly larger number of opponents compared to evolved strategies). The procedure is independently repeated so that a more diverse sample of test strategies is obtained. A sample of test strategies obtained from a single partial enumerative search may be less diverse (its members are behaviorally similar, with the same weakness that can be exploited) and lower performing due to a poor population sample.
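The multiple partial enumerative search can be sketched as follows. This is a hedged illustration, not the procedure's reference implementation: `biased_sample`, `search_size`, `random_strategy`, and `play` are hypothetical names, and `play(a, b)` is assumed to return the payoff that `a` earns against `b`.

```python
import random

def biased_sample(n_tests, search_size, random_strategy, play, rng):
    """Run n_tests independent searches and keep the winner of each."""
    sample = []
    for _ in range(n_tests):                                     # step 4
        pool = [random_strategy(rng) for _ in range(search_size)]  # step 1
        # Step 2: each strategy plays every pool member, itself included.
        totals = [sum(play(a, b) for b in pool) for a in pool]
        winner = max(range(search_size), key=totals.__getitem__)   # step 3
        sample.append(pool[winner])
    return sample
```

Each search costs on the order of `search_size ** 2` games, so the overall cost grows linearly with the number of test strategies and quadratically with the search size.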

For the case studies, we estimate the generalization performance with respect to both unbiased and biased samples of test strategies. Both estimates can be computed for the current top performing evolved strategy in a particular generation. However, analysis of the generalization performance of a coevolutionary learning system can also be made by taking generalization performance estimates on the best evolved strategies from the population, which form an ordered set (ordered according to their internal fitness values).

• Best

(9)

• Average

(10)

• Population

(11)

The “Population” measurement allows us to estimate the generalization performance of the coevolving population as a whole. For simplicity, we assume that the population can be combined with an ideal gating mechanism [29], [60]: it chooses the evolved strategy in the population that performs best against each of the test strategies. For example, given five test strategies, if one evolved strategy outperforms the first two test strategies and a second evolved strategy outperforms the last three test strategies, the population of these two strategies with this gating mechanism outperforms all test strategies (it will choose the first strategy for the first two test strategies, and the second strategy for the others). As such, the “Population” measurement would obtain an upper bound on the generalization performance of the population. We use this measurement to compare the generalization performance of the population and an individual strategy by investigating their difference in the coverage of the whole strategy space (i.e., performance against all test strategies). This analysis allows us to investigate if diversity maintenance can have an impact on the coevolving population and subsequently be exploited for improvements in generalization performance.
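The ideal gating mechanism can be illustrated on a small outcome matrix. In this sketch (hypothetical function names), `outcomes[i][j]` is assumed to hold the win–lose outcome (1 or 0) of evolved strategy `i` against test strategy `j`.

```python
def individual_estimates(outcomes):
    """Estimated generalization performance of each evolved strategy."""
    return [sum(row) / len(row) for row in outcomes]

def population_estimate(outcomes):
    """Ideal gating: for each test strategy, take the best outcome
    achieved by any evolved strategy, then average over the tests."""
    n_tests = len(outcomes[0])
    return sum(max(row[j] for row in outcomes)
               for j in range(n_tests)) / n_tests

# The five-test example from the text: one strategy beats the first two
# test strategies, a second strategy beats the last three.
example = [[1, 1, 0, 0, 0],
           [0, 0, 1, 1, 1]]
```

For `example`, the individual estimates are 0.4 and 0.6, while the gated population reaches 1.0. Because the gate takes a per-test maximum, the population estimate can never fall below the best individual estimate, which is why it acts as an upper bound.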

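For the case studies, top performing individual strategies are selected from the coevolving population. Estimates with respect to the unbiased and biased samples are made both for the top performing individual strategies in the population (the “Best” and “Average” measurements) and for the population as a whole (the “Population” measurement).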

E. Measuring Diversity

Different diversity measurements obtain different information on the levels and type of variety from the population of an evolutionary system. Some diversity measures may be more important in comparison to other diversity measures for specific problems. Here, the main motivation is to obtain some general information on the diversity levels for comparison between coevolutionary learning systems with and without diversity maintenance. For the case studies, we consider two different diversity measurements, i.e., genotypic and phenotypic diversity measurements. In an earlier study [34] that conducted a systematic empirical investigation of various diversity measures, results suggested strong correlations between the diversity levels based on the genotypic and phenotypic diversity measurements and the fitness of the top performing solutions obtained from evolutionary search processes. The study concluded that these two diversity measurements may be useful in obtaining information on diversity levels that reflects the evolutionary search process with respect to its performance.

Genotypic diversity relates to the level of variety in the genotypic space (the search space that is specified by the solution representation). The genotypic diversity measurement that we use is based on the edit distance diversity measurement [34], [65], [66], which we have adapted to measure the edit distance of deterministic and reactive, memory-one IPD strategies with a direct lookup table representation.1 The edit distance between two strategies $i$ and $j$ is related to the minimum number of transformations that is required to change one strategy's direct lookup table into the other strategy's direct lookup table

$$d(i, j) = \frac{1}{L} \sum_{k=1}^{L} \Delta(a_k, b_k) \tag{12}$$

where $a_k$ and $b_k$ are the direct lookup table elements of strategies $i$ and $j$, respectively, and $\Delta(a_k, b_k)$ is a distance metric on $a_k$ and $b_k$. $L$ is the total number of elements of the direct lookup table. For the four-choice IPD strategy, $L = 17$ (one element for the first move and 16 elements for decisions based on joint past moves, as shown in Fig. 5).

1Note that since the strategy representation has a fixed length data structure, we are measuring the Hamming distance. However, we use the term edit distance in this paper based on the references of past studies [34], [65], [66], and also to differentiate it from the more common usage of the term Hamming distance in EAs using the binary string representation.

For the distance metric $\Delta(a_k, b_k)$ on two lookup table elements $a_k$ and $b_k$, we use the Levenshtein distance as in [67]

$$\Delta(a_k, b_k) = \begin{cases} 0, & \text{if } a_k = b_k \\ 1, & \text{otherwise.} \end{cases}$$

Note that a transformation for a particular element is made only as a substitution from one choice to another. For example, if two strategies $i$ and $j$ differ only in the choice they make for the first move, only one transformation is required to change one strategy's direct lookup table into the other strategy's direct lookup table (substituting the first-move choice of strategy $i$ with that of strategy $j$). The edit distance between these two strategies is 1/17.

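The genotypic diversity measurement based on edit distance on a coevolving population can be obtained by taking the average value of the pairwise distances of strategies in the population [68]

$$\frac{2}{\mathrm{POPSIZE}\,(\mathrm{POPSIZE} - 1)} \sum_{i=1}^{\mathrm{POPSIZE}-1} \; \sum_{j=i+1}^{\mathrm{POPSIZE}} d(i, j). \tag{13}$$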
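The edit distance and the resulting genotypic diversity can be sketched in a few lines. The sketch assumes strategies are equal-length sequences of choices and that the population average is taken over unordered pairs of distinct strategies; the function names are hypothetical.

```python
from itertools import combinations

def edit_distance(a, b):
    """Fraction of lookup table elements that differ (substitutions only)."""
    if len(a) != len(b):
        raise ValueError("lookup tables must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def genotypic_diversity(population):
    """Average pairwise edit distance over all unordered pairs."""
    pairs = list(combinations(population, 2))
    if not pairs:
        return 0.0
    return sum(edit_distance(a, b) for a, b in pairs) / len(pairs)
```

Two 17-element tables that differ only in the first move have a distance of 1/17, in line with the worked example in the text. The measure costs a number of table comparisons that is quadratic in the population size.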

Phenotypic diversity relates to the level of variety in the phenotypic space (e.g., fitness values used in the selection process of coevolutionary learning) [34], [69]. Studies such as [70] have investigated fingerprints of strategies based on their behavioral responses against opponents. Here, we use the simple phenotypic diversity measurement based on the entropy diversity measurement [34], [35]

$$H = -\sum_{k} p_k \log p_k \tag{14}$$

where $p_k$ refers to the proportion of a particular fitness value $f_k$ in the population. In our case studies, the fitness value is the average IPD game payoff obtained from a round-robin tournament with the entire population of coevolving strategies.
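The entropy-based phenotypic diversity can be computed directly from the list of fitness values. This sketch assumes the natural logarithm and groups fitness values by exact equality, which may need a rounding step for floating-point payoffs; the function name is hypothetical.

```python
import math
from collections import Counter

def phenotypic_entropy(fitness_values):
    """Entropy of the distribution of distinct fitness values."""
    counts = Counter(fitness_values)
    total = len(fitness_values)
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

The measure is zero when every strategy has the same fitness and reaches its maximum, the logarithm of the population size, when all fitness values are distinct.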

IV. RESULTS AND DISCUSSION

In this section, we present an empirical study to investigate whether there is a relationship between generalization performance and diversity in coevolutionary learning. Using case studies, we systematically investigate the impact of different diversity maintenance techniques applied to the coevolutionary learning of the four-choice IPD on 1) generalization performance and 2) diversity. We use the results of CCL without diversity maintenance as a baseline for comparison with results of coevolutionary learning systems with diversity maintenance.

We first present the results of the case studies that investigate the impact of implicit diversity maintenance on the generalization performance of coevolutionary learning. This is followed by the results of case studies that investigate the impact of explicit diversity maintenance on the generalization performance of coevolutionary learning. After that, we present results of the case studies that investigate the impact of implicit and explicit diversity maintenance on the diversity levels of the coevolving population.

In general, we have made observations that are consistent with those from past studies on the impact of individual diversity maintenance techniques on the generalization performance or diversity (as some of the past studies have made actual measurements for only one quantity) for coevolutionary learning of the IPD game. More importantly, through the use of the case studies with common quantitative measurements for generalization performance and diversity, our results indicate that the introduction and maintenance of diversity do not necessarily lead to a significant increase in the generalization performance of coevolutionary learning. However, our observations and subsequent analysis of results suggest that diversity can potentially be exploited to improve the generalization performance of coevolutionary learning if the coevolved strategies in the population can be combined (e.g., using a gating mechanism).

A. Generalization Performance of Coevolutionary Learning With Implicit Diversity Maintenance

We first present the results of the generalization performance measurements with respect to an unbiased sample of test strategies. We compare the generalization performance of different coevolutionary learning systems with implicit diversity maintenance with CCL without diversity maintenance (CCL-PM05) for the four-choice IPD. Three measurements (Best, Average, and Population) are obtained. Table III summarizes the results of the measurements at the end of the coevolutionary learning process.

In general, mixed results are obtained for the comparison of measurements of coevolutionary learning systems with different implicit diversity maintenance techniques and coevolutionary learning without diversity maintenance (CCL-PM05). For example, the generalization performance of coevolutionary learning with RV is actually lower than that of CCL-PM05. Table III shows that all three measurements (Best, Average, and Population) with respect to the unbiased sample are lower for the case of RV in comparison to the case of CCL-PM05.

However, coevolutionary learning with competitive fitness sharing has resulted in significantly higher generalization performance compared to CCL-PM05. For example, Table III shows that both coevolutionary learning systems using competitive fitness sharing with disassortative sampling (CFSD) and competitive fitness sharing with random sampling (CFSR) have higher generalization performance in all three measurements (Best, Average, and Population) with respect to the unbiased sample compared to the case of CCL-PM05, and that the differences are statistically significant. In comparing the impact of the two sampling procedures on the generalization performance of coevolutionary learning, there is little difference between the results for disassortative sampling in CFSD and for simple random sampling in CFSR.

For coevolutionary learning systems using implicit fitness sharing with disassortative sampling (IFSD), implicit fitness sharing with random sampling (IFSR), and Pareto coevolution (PAR), Table III shows that only the comparisons of the Population measurements with that of CCL-PM05 result in statistically significant differences. For both the Best and Average measurements, the comparisons between coevolutionary learning with and without the particular implicit diversity maintenance technique do not reveal any statistically significant difference. The Best and Average measurements for both IFSD and IFSR are actually lower compared to those of CCL-PM05. For PAR, the Best and Average measurements are slightly higher in comparison to those of CCL-PM05.

TABLE III
SUMMARY OF THE RESULTS FOR THE GENERALIZATION PERFORMANCE WITH RESPECT TO AN UNBIASED SAMPLE GIVEN BY THE BEST, AVERAGE, AND POPULATION MEASUREMENTS FOR DIFFERENT COEVOLUTIONARY LEARNING SYSTEMS WITH IMPLICIT DIVERSITY MAINTENANCE FOR THE FOUR-CHOICE IPD TAKEN AT THE FINAL GENERATION. FOR EACH SYSTEM, THE AVERAGE VALUE AND THE ERROR AT 95% CONFIDENCE INTERVAL OVER 30 RUNS ARE NOTED. VALUES IN BOLD INDICATE THAT THE PARTICULAR SYSTEM HAS A SIGNIFICANTLY HIGHER DIVERSITY LEVEL COMPARED TO THAT OF CCL-PM05 BASED ON THE PAIRED WILCOXON SIGNED RANK TEST FOR STATISTICAL SIGNIFICANCE DIFFERENCE AT THE 0.05 LEVEL OF SIGNIFICANCE

The results for coevolutionary learning with implicit fitness sharing (IFSD and IFSR) are in agreement with previous claims made in [29] and [60]. The generalization performance of each individual evolved strategy is not necessarily improved (as indicated by the lower Best and Average measurement values in Table III). However, if each evolved strategy specializes against smaller but different groups of opponents, a population of these specialist strategies has a significantly higher generalization performance if the population can sufficiently often and consistently choose the best strategy from the available pool against the opponents through a gating mechanism (as indicated by the significantly higher Population measurement values in Table III). In addition, the effect of speciation on the coevolving population is maintained throughout the evolutionary process for both IFSD and IFSR. Fig. 6 shows that the difference between the generalization performance measurements for individual evolved strategies and the population is maintained on average across the generations.

By taking plots of the average results over 30 independent runs for the generalization performance measurements, we can observe the overall generalization performance and how it changes as a result of applying a particular diversity maintenance technique in coevolution. For implicit fitness sharing, we observe in Fig. 6 a significant difference between the generalization performance of the population as a whole and individual strategies throughout coevolution. However, we also note some small degree of local fluctuations in the generalization performance, particularly for the measurement of individual strategies. This is not surprising since the implicit fitness sharing technique aims to increase diversity directly (to promote speciation in the population), which would indirectly result in an increase in the generalization performance when the population of strategies can be combined using a gating mechanism. Although diversity is maintained in the coevolving population, it does not necessarily result in a well-speciated population of strategies from the point of view of generalization performance (as one observes from the local fluctuations in Fig. 6).

Fig. 6. Generalization performance given by the Best, Average, and Population measurements for coevolutionary learning systems with implicit fitness sharing across the generations for the four-choice IPD. The results of generalization performance during coevolutionary learning for (a) IFSD and (b) IFSR are shown. Each graph is obtained by averaging over 30 independent runs.

Similar observations are also made for PAR in Table III, where the generalization performance of the population (the Population measurement) is significantly higher compared to the generalization performance of individual evolved strategies (the Best and Average measurements). These results for PAR are consistent with the emphasis of Pareto coevolution on selecting nondominated strategies in the coevolving population, where each would maintain performance against different opponents that are increasingly higher in quality. Pareto coevolution would result in the evolutionary search of strategies that have a greater degree of specialization, in which the combination of these individual strategies would result in higher generalization performance.

In addition, we have taken measurements of the generalization performance with respect to a biased sample for various coevolutionary learning systems with implicit diversity maintenance and CCL without diversity maintenance. Table IV summarizes the results of the Best, Average, and Population measurements with respect to the biased sample. In general, observations made from comparisons between coevolutionary learning systems with and without implicit diversity maintenance for the biased sample are similar to observations made for the case of the unbiased sample. For example, both CFSD and CFSR resulted in statistically significant and higher generalization performance for all measurements compared to that of CCL-PM05, while all measurements of generalization performance for RV are lower compared to those of CCL-PM05 (Table IV). The same observation is made for PAR, where the Population measurement with respect to the biased sample is statistically significantly higher compared to that of CCL-PM05. The only exception is coevolutionary learning with implicit fitness sharing (IFSD and IFSR), where the Population measurements with respect to the biased sample are no longer statistically significantly different even though the values are still higher compared to those of CCL-PM05.

TABLE IV
SUMMARY OF THE RESULTS FOR THE GENERALIZATION PERFORMANCE WITH RESPECT TO A BIASED SAMPLE GIVEN BY THE BEST, AVERAGE, AND POPULATION MEASUREMENTS FOR VARIOUS COEVOLUTIONARY LEARNING SYSTEMS WITH IMPLICIT DIVERSITY MAINTENANCE FOR THE FOUR-CHOICE IPD TAKEN AT THE FINAL GENERATION. FOR EACH SYSTEM, THE AVERAGE VALUE AND THE ERROR AT 95% CONFIDENCE INTERVAL OVER 30 RUNS ARE NOTED. VALUES IN BOLD INDICATE THAT THE PARTICULAR SYSTEM HAS A SIGNIFICANTLY HIGHER DIVERSITY LEVEL COMPARED TO THAT OF CCL-PM05 BASED ON THE PAIRED WILCOXON SIGNED RANK TEST FOR STATISTICAL SIGNIFICANCE DIFFERENCE AT THE 0.05 LEVEL OF SIGNIFICANCE

In addition, it is noted that although the relative generalization performance of different coevolutionary learning systems with implicit diversity maintenance remains unchanged (i.e., they can be ranked in the ascending order of RV, IFSR, IFSD, PAR, CFSR, and CFSD), the overall generalization performance measurements are lower with respect to a biased sample of “good” test strategies compared to the measurements made with respect to an unbiased sample of test strategies. These results are consistent with our earlier conclusion in [30] that it is more difficult for evolved strategies to perform well against strong game playing test strategies obtained from the multiple partial enumerative search.

B. Generalization Performance of Coevolutionary Learning With Explicit Diversity Maintenance

Table V summarizes the results of generalization performance measurements with respect to the unbiased sample that have been made for different coevolutionary learning systems with explicit diversity maintenance in comparison to that of CCL-PM05. In general, results show that the application of explicit diversity maintenance (by increasing the $p_m$ setting) does not lead to an increase in the generalization performance of the top performing individual evolved strategies. Table V shows that the Best and Average measurements actually decrease. However, the generalization performance given by the population as a whole increases significantly if $p_m$ is sufficiently high. Table V shows that the Population measurements in the case of CCL-PM20 and CCL-PM25 are statistically significantly higher compared to the case of CCL-PM05.

TABLE V
SUMMARY OF THE RESULTS FOR THE GENERALIZATION PERFORMANCE WITH RESPECT TO AN UNBIASED SAMPLE GIVEN BY THE BEST, AVERAGE, AND POPULATION MEASUREMENTS FOR VARIOUS COEVOLUTIONARY LEARNING SYSTEMS WITH EXPLICIT DIVERSITY MAINTENANCE FOR THE FOUR-CHOICE IPD TAKEN AT THE FINAL GENERATION. FOR EACH SYSTEM, THE AVERAGE VALUE AND THE ERROR AT 95% CONFIDENCE INTERVAL OVER 30 RUNS ARE NOTED. VALUES IN BOLD INDICATE THAT THE PARTICULAR SYSTEM HAS A SIGNIFICANTLY HIGHER DIVERSITY LEVEL COMPARED TO THAT OF CCL-PM05 BASED ON THE PAIRED WILCOXON SIGNED RANK TEST FOR STATISTICAL SIGNIFICANCE DIFFERENCE AT THE 0.05 LEVEL OF SIGNIFICANCE

The results show that increasing the level of variation in the population does not lead to the coevolutionary learning of individual strategies with higher generalization performance. However, the generalization performance of the population as a whole is increased. This observation suggests that the population is effectively speciating, with the individual specialist strategies outperforming smaller and different groups of opponents. The speciated population of coevolved strategies can be combined using a gating mechanism that effectively outperforms a larger number of opponents. Fig. 7 illustrates this observation, where the large differences between the generalization performance for individual evolved strategies (the Best and Average measurements) and the population (the Population measurement) are maintained on average throughout coevolution when $p_m$ is significantly increased (CCL-PM25). The same observation has been made earlier for IFSD and IFSR, which also speciate the population (Fig. 6).

Table VI summarizes results for the generalization performance measurements with respect to a biased sample for various coevolutionary learning systems with explicit diversity maintenance. We observe that the application of explicit diversity maintenance to coevolutionary learning by increasing the $p_m$ setting can have a positive impact in increasing the generalization performance of the population as a whole. However, the increase is not statistically significant. This is not surprising as the evolved strategies are competing against a biased sample of strong game playing test strategies obtained from the multiple partial enumerative search.

C. Impact of Diversity Maintenance on Diversity Levels in Coevolutionary Learning

Results from the experiments show that not all applications of diversity maintenance techniques lead to positive and significant improvements in the generalization performance of coevolutionary learning. However, before any conclusion can be drawn, the diversity levels in the population need to be measured for the different coevolutionary learning systems with and without diversity maintenance. Table VII summarizes the genotypic and phenotypic diversity measurements for the different coevolutionary learning systems, taken at the final generation. The results provide a means to determine the impact of diversity maintenance on the diversity levels in the population by comparing coevolutionary learning systems with diversity maintenance against the CCL system (CCL-PM05), which is not designed specifically to introduce and maintain high diversity levels.

Fig. 7. Comparison of generalization performance given by three measurements for coevolutionary learning systems with and without explicit diversity maintenance across the generations for the four-choice IPD. The results of generalization performance during coevolutionary learning for (a) CCL-PM05 and (b) CCL-PM25 are shown. Each graph is obtained by averaging over 30 independent runs.

TABLE VI. SUMMARY OF THE RESULTS FOR THE GENERALIZATION PERFORMANCE WITH RESPECT TO A BIASED SAMPLE GIVEN BY THREE MEASUREMENTS FOR VARIOUS COEVOLUTIONARY LEARNING SYSTEMS WITH EXPLICIT DIVERSITY MAINTENANCE FOR THE FOUR-CHOICE IPD TAKEN AT THE FINAL GENERATION. FOR EACH SYSTEM, THE AVERAGE VALUE AND THE ERROR AT 95% CONFIDENCE INTERVAL OVER 30 RUNS ARE NOTED. VALUES IN BOLD INDICATE THAT THE PARTICULAR SYSTEM HAS A SIGNIFICANTLY HIGHER DIVERSITY LEVEL COMPARED TO THAT OF CCL-PM05 BASED ON THE PAIRED WILCOXON SIGNED RANK TEST FOR STATISTICAL SIGNIFICANCE DIFFERENCE AT THE 0.05 LEVEL OF SIGNIFICANCE

TABLE VII. SUMMARY OF GENOTYPIC AND PHENOTYPIC DIVERSITIES FOR VARIOUS COEVOLUTIONARY LEARNING SYSTEMS WITH DIVERSITY MAINTENANCE FOR THE FOUR-CHOICE IPD TAKEN AT THE FINAL GENERATION. FOR EACH SYSTEM, THE AVERAGE VALUE AND THE ERROR AT 95% CONFIDENCE INTERVAL OVER 30 RUNS ARE NOTED. VALUES IN BOLD INDICATE THAT THE PARTICULAR SYSTEM HAS A SIGNIFICANTLY HIGHER DIVERSITY LEVEL COMPARED TO THAT OF CCL-PM05 BASED ON THE PAIRED WILCOXON SIGNED RANK TEST FOR STATISTICAL SIGNIFICANCE DIFFERENCE AT THE 0.05 LEVEL OF SIGNIFICANCE
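The statistics reported in Tables VI and VII (an average with a 95% confidence interval over 30 runs, and a paired Wilcoxon signed rank test at the 0.05 level) can be sketched as follows. This is only an illustrative sketch, not the analysis code used in the study: the function names are hypothetical, the confidence interval assumes a Student's t critical value of 2.045 (29 degrees of freedom), the test uses the normal approximation without tie correction, and the data are synthetic.

```python
import math
import random
import statistics


def mean_and_ci95(values, t_crit=2.045):
    """Return (mean, 95% CI half-width); t_crit=2.045 assumes 30 runs (29 d.o.f.)."""
    half_width = t_crit * statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values), half_width


def wilcoxon_signed_rank(x, y):
    """Two-sided paired Wilcoxon signed rank test (normal approximation).

    Returns (W+, p), where W+ is the rank sum of the positive differences x - y.
    Zero differences are dropped; tied magnitudes receive average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, p_value


# Synthetic diversity levels for 30 paired runs (illustrative values only).
rng = random.Random(1)
baseline = [rng.uniform(0.2, 0.4) for _ in range(30)]
with_maintenance = [v + rng.uniform(0.05, 0.15) for v in baseline]

mean, half_width = mean_and_ci95(with_maintenance)
w_plus, p_value = wilcoxon_signed_rank(with_maintenance, baseline)
print(f"mean = {mean:.3f} +/- {half_width:.3f}, W+ = {w_plus}, p = {p_value:.2g}")
print("significant at 0.05:", p_value < 0.05)
```

For real use, an exact or tie-corrected test from a statistics library would be preferable to this approximation.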

In general, the measured diversity levels are consistent with the design motivations of the respective coevolutionary learning systems with diversity maintenance. For coevolutionary learning with explicit diversity maintenance, it is expected that introducing more variations to the evolved strategies in the population by increasing the PM setting would lead to an increase in the genotypic diversity levels. Table VII shows that the increase in genotypic diversity levels is statistically significant (e.g., comparing the genotypic diversity level of CCL-PM25 with that of CCL-PM05). Furthermore, the statistically significant increase in phenotypic diversity levels shown in Table VII is consistent with the use of the direct lookup table strategy representation. More genotypic variation in the population of strategies would lead to more variation in the behavioral interactions, which is reflected in the higher phenotypic diversity levels.

For coevolutionary learning systems with implicit fitness sharing, the general increase in diversity levels (genotypic and phenotypic) is consistent with the original design motivation of speciating the population by creating niches of evolved strategies having different and unique behaviors [29], [44]. Table VII shows that both genotypic and phenotypic diversity levels for IFSD and IFSR are statistically significantly higher than those of CCL-PM05. A similar observation can also be made for PAR in comparison with CCL-PM05. The higher diversity levels would be a result of the evolution and subsequent maintenance of a large set of mutually nondominating strategies (e.g., the entire population) [33].

For coevolutionary learning with RV, Table VII shows that a higher phenotypic diversity level is introduced and maintained in the population compared to that of CCL-PM05. However, RV has only a slightly higher genotypic diversity level in the population compared to that of CCL-PM05. Despite this, the results are consistent with the design motivation of the technique. In particular, the technique does not seek to diversify a population into niches, but instead aims to encourage the population to engage in interactions [25]. This would result in the coevolution of a population of strategies responding differently to one another, leading to an increase in fitness variations that would be observed as an increase in the phenotypic diversity level.

The only exception is coevolutionary learning with competitive fitness sharing. In our case studies, the application of competitive fitness sharing does not lead to an increase in the phenotypic diversity level of the population in coevolutionary learning. Table VII shows that the phenotypic diversity levels for CFSD and CFSR are lower than that of CCL-PM05. This observation differs from the one reported earlier in [27]. In particular, that study uses a diversity measurement that counts the number of unique strategy responses above a significant usage, which is similar to the phenotypic diversity measurement. This discrepancy may be a result of applying coevolutionary learning to different games.

Despite this, the use of competitive fitness sharing does lead to a positive and significant increase in the genotypic diversity level in the population, as shown in Table VII. These results are verified through a closer inspection of evolved strategies. It is observed that evolved strategies in the population have engaged mostly in “full defection,” which would result in similar average payoffs that lead to low phenotypic diversity. However, observations on the representation of these evolved strategies show differences in the parts that are not accessed during game play interactions, which nevertheless would result in higher genotypic diversity.
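The following toy sketch illustrates how a population can have high genotypic but zero phenotypic diversity when strategies differ only in lookup-table entries that are never accessed. It rests on several assumptions that are not taken from the study: a two-choice, memory-one lookup table (the study uses the four-choice IPD), the standard payoff values 5/3/1/0, mean pairwise Hamming distance as the genotypic measure, and the spread of average payoffs as the phenotypic measure. All names are hypothetical.

```python
import itertools
import statistics

# Own payoff for (own move, opponent move); standard two-choice IPD values (assumed).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
# Gene 0 is the first move; genes 1-4 answer the previous joint move CC, CD, DC, DD,
# written as (own previous move, opponent previous move).
HIST_INDEX = {h: i + 1 for i, h in enumerate(itertools.product("CD", repeat=2))}


def average_payoffs(g1, g2, rounds=50):
    """Play two lookup-table genotypes against each other; return both average payoffs."""
    m1, m2 = g1[0], g2[0]
    total1 = total2 = 0
    for _ in range(rounds):
        total1 += PAYOFF[(m1, m2)]
        total2 += PAYOFF[(m2, m1)]
        m1, m2 = g1[HIST_INDEX[(m1, m2)]], g2[HIST_INDEX[(m2, m1)]]
    return total1 / rounds, total2 / rounds


def genotypic_diversity(population):
    """Mean pairwise Hamming distance between genotypes (an assumed measure)."""
    pairs = list(itertools.combinations(population, 2))
    return sum(sum(a != b for a, b in zip(g1, g2)) for g1, g2 in pairs) / len(pairs)


def phenotypic_diversity(population):
    """Spread of round-robin average payoffs across the population (an assumed measure)."""
    scores = []
    for i, g in enumerate(population):
        payoffs = [average_payoffs(g, h)[0] for j, h in enumerate(population) if j != i]
        scores.append(statistics.mean(payoffs))
    return statistics.pstdev(scores)


# Every strategy opens with D and answers mutual defection with D, so only the last
# gene is ever accessed; the middle three genes differ but never affect play.
full_defectors = ["DCCCD", "DDCDD", "DCDCD", "DDDDD"]
print("genotypic diversity:", genotypic_diversity(full_defectors))
print("phenotypic diversity:", phenotypic_diversity(full_defectors))
```

In this toy population the genotypes differ in up to three of five genes, yet every pairing plays mutual defection throughout, so all average payoffs coincide.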

In comparing the relative levels of diversity that are introduced and maintained by different diversity maintenance techniques in coevolutionary learning, both implicit fitness sharing and explicit diversity maintenance appear to be particularly effective. Fig. 8 plots the average genotypic and phenotypic diversity levels of different coevolutionary learning systems across the generations. Both figures indicate that implicit fitness sharing and explicit diversity maintenance (at sufficiently high PM settings) can introduce and maintain higher diversity levels (genotypic and phenotypic) in the population throughout the evolutionary process compared to other diversity maintenance techniques.

Thus far, we have presented and discussed separately the impact of different diversity maintenance techniques in introducing and maintaining different diversity levels (genotypic and phenotypic) in coevolutionary learning. Here, we investigate whether there are relationships between genotypic and phenotypic diversity in coevolutionary learning with diversity maintenance. Our analysis is based on a series of scatter plots (which have been employed previously in [34] for analysis) of genotypic and phenotypic diversity. Each point in the scatter plot represents one population in a particular generation from each of the 30 independent runs. Since each coevolutionary learning system is run for 600 generations and repeated 30 times, the scatter plot has a total of 600 × 30 points. The scatter plot may provide information on the interactions and possible relationships between genotypic and phenotypic diversity as a result of the application of specific diversity maintenance techniques in coevolutionary learning.

Fig. 8. Comparison of genotypic and phenotypic diversity for various coevolutionary learning systems across the generations for the four-choice IPD. (a) and (b) Comparison of the genotypic diversity for the case of CCL-PM05 with that of coevolutionary learning with explicit diversity maintenance (CCL-PM10, CCL-PM15, CCL-PM20, and CCL-PM25) and implicit diversity maintenance (CFSD, CFSR, IFSD, and IFSR), respectively. (c) and (d) Comparison of the phenotypic diversity for the case of CCL-PM05 with that of coevolutionary learning with explicit diversity maintenance (CCL-PM10, CCL-PM15, CCL-PM20, and CCL-PM25) and implicit diversity maintenance (CFSD, CFSR, IFSD, and IFSR), respectively. Each graph is obtained by averaging over 30 independent runs.

We have compared the genotypic–phenotypic diversity scatter plots for coevolutionary learning with and without explicit diversity maintenance. Fig. 9 illustrates two general observations that we have made. First, there is a trend of higher genotypic diversity occurring with higher phenotypic diversity for CCL-PMs at higher settings (such as CCL-PM20). The higher settings would introduce more genotypic variations, which in turn would lead to more variations in the behavioral interactions (phenotypic diversity) in coevolutionary learning. Second, CCL-PMs with higher settings can better maintain both genotypic and phenotypic diversity, e.g., the points are distributed more towards the top-right region of the scatter plot for CCL-PM20 compared to that of CCL-PM05.

We have also compared the genotypic–phenotypic diversity scatter plots for various coevolutionary learning systems with implicit diversity maintenance. In general, we observe a trend of higher genotypic diversity occurring with higher phenotypic diversity for coevolutionary learning with implicit fitness sharing (Fig. 10) and also for Pareto coevolution. Such a trend is less apparent for coevolutionary learning with RV, and even less so with competitive fitness sharing (Fig. 10). For our case studies, the positive relationship between genotypic and phenotypic diversity can be explained by considering the effect of speciation of the population into niches of evolved strategies having different and unique behaviors. A speciated population would have high genotypic and phenotypic diversity. The speciation effect would be very pronounced in IFSD.
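The trends described above are read off the scatter plots visually. As a complementary illustration, the sketch below shows how the 600 × 30 points could be pooled and summarized with a Pearson correlation coefficient. This summary statistic is an addition for illustration and is not part of the original analysis; the function names are hypothetical and the data are synthetic.

```python
import math
import random


def flatten_runs(runs):
    """Pool per-run lists of (genotypic, phenotypic) pairs into one list of points."""
    return [point for run in runs for point in run]


def pearson(points):
    """Pearson correlation of (x, y) points; returns 0.0 if either axis is constant."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined here; reported as 0.0 by convention
    return cov / math.sqrt(var_x * var_y)


# Synthetic stand-in for 30 runs of 600 generations each (illustrative values only).
rng = random.Random(0)
runs = []
for _ in range(30):
    run = []
    for _ in range(600):
        genotypic = rng.random()
        phenotypic = 0.5 * genotypic + rng.gauss(0.0, 0.05)
        run.append((genotypic, phenotypic))
    runs.append(run)

points = flatten_runs(runs)
print(len(points), "points, r =", round(pearson(points), 3))
```

A single coefficient hides structure such as clusters near one corner of the plot, so it complements rather than replaces inspection of the scatter plots.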

D. Relationship Between Generalization Performance and Diversity in Coevolutionary Learning

The results of generalization performance and diversity measurements from the case studies show that the introduction and maintenance of diversity through the application of diversity maintenance techniques do not necessarily lead to improvements in the generalization performance of coevolutionary learning. A case in point is the use of coevolutionary learning with RV, where the resulting generalization performance is lower compared to CCL without diversity maintenance, regardless of whether the generalization performance measurements are made with respect to individual evolved strategies or with respect to the population.

Fig. 9. Comparison of genotypic–phenotypic diversity scatter plots for coevolutionary learning systems with explicit diversity maintenance for the four-choice IPD. Each figure plots the population’s genotypic diversity against the population’s phenotypic diversity. Each point represents one population in a particular generation from each of the 30 independent runs. There are altogether 600 × 30 points. (a) CCL-PM05. (b) CCL-PM20.

However, it is not necessarily true that a diversity maintenance technique that does not introduce and maintain high diversity levels during coevolution will lead to lower generalization performance. A case in point is the use of competitive fitness sharing in coevolutionary learning. Results have shown that the application of competitive fitness sharing significantly improves the generalization performance in all measurements, i.e., regardless of whether an unbiased or biased sample of test strategies is used, or whether the measurements are made on individual evolved strategies or the population as a whole. This is despite coevolutionary learning systems with competitive fitness sharing not having higher diversity levels compared to systems using other diversity maintenance techniques.

Fig. 10. Comparison of genotypic–phenotypic diversity scatter plots for coevolutionary learning systems with implicit diversity maintenance for the four-choice IPD. Each figure plots the population’s genotypic diversity against the population’s phenotypic diversity. Each point represents one population in a particular generation from each of the 30 independent runs. There are altogether 600 × 30 points. (a) IFSD. (b) CFSD.

The success of competitive fitness sharing in improving the generalization performance of coevolutionary learning appears to depend on how the strategy fitness is determined using (1). The emphasis of the selection process on strategies outperforming opponents that few others could leads to the coevolutionary learning of strategies that play “full defection” more often. Given that an “all defect” strategy would obtain the maximum possible generalization performance (defined by the win–lose outcome), a coevolutionary learning system that evolves a population of strategies effectively playing “all defect” would have the highest generalization performance. In this case, it would appear that diversity plays little role in improving the generalization performance of coevolutionary learning [we note the similarity in the genotypic–phenotypic diversity scatter plots for CFSD in Fig. 10(b) and for CCL-PM05 in Fig. 9(a)].
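The selection emphasis described above can be illustrated with a minimal sketch of competitive fitness sharing. Since (1) is not reproduced in this section, the sketch assumes a commonly used formulation in which each opponent contributes one unit of fitness, split equally among the strategies that beat it; the exact form of (1) may differ. The function name and the win matrix are hypothetical.

```python
def competitive_shared_fitness(wins):
    """Shared fitness for each strategy, given a boolean win matrix.

    wins[i][j] is True if strategy i beats opponent j. Each beaten opponent j
    contributes 1 / N_j, where N_j is the number of strategies that beat j
    (an assumed formulation of competitive fitness sharing).
    """
    n_opponents = len(wins[0])
    beaten_by = [sum(1 for row in wins if row[j]) for j in range(n_opponents)]
    return [
        sum(1.0 / beaten_by[j] for j in range(n_opponents) if row[j])
        for row in wins
    ]


# Three strategies against three opponents; every strategy wins exactly twice.
wins = [
    [True, True, False],
    [True, True, False],
    [True, False, True],  # the only strategy that beats opponent 2
]
for index, fitness in enumerate(competitive_shared_fitness(wins)):
    print(f"strategy {index}: shared fitness = {fitness:.3f}")
```

All three strategies have the same number of wins, but the one that beats an opponent nobody else can beat receives the highest shared fitness, which is the selection pressure referred to in the text.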



Fig. 11. Comparison of diversity-generalization scatter plots for coevolutionary learning systems with implicit fitness sharing for the four-choice IPD. Each figure plots the population’s diversity against the population’s estimated generalization performance. Each point represents one population in a particular generation from each of the 30 independent runs. There are altogether 600 × 30 points. (a) IFSD. (b) IFSD.

However, one general observation that we have made consistently from the case studies is that the introduction and maintenance of diversity that lead to the speciation of the population into unique niches of specialized strategies have a positive and significant impact on the generalization performance of coevolutionary learning if the population of strategies can be combined (e.g., using a gating mechanism as in [29]). We have conducted an analysis based on diversity-generalization scatter plots to investigate this issue. Fig. 11 shows that the speciation of a coevolving population obtained from the application of implicit fitness sharing is particularly effective in improving the generalization performance of the population. One can observe a trend of high diversity (genotypic and phenotypic) occurring with high generalization performance of the population for coevolutionary learning with implicit fitness sharing. Similar observations can be made to some extent for Pareto coevolution and coevolutionary learning with explicit diversity maintenance, although it is apparent that implicit fitness sharing has a more pronounced speciation effect on coevolutionary learning that can improve generalization performance.

Although overall our case studies have not been able to establish a clear positive relationship between the generalization performance (as we define it) and diversity in coevolutionary learning, there are some diversity maintenance techniques, e.g., implicit fitness sharing, that seem to be able to boost the generalization performance when the population of coevolved strategies can be combined. Some past studies such as [29] and [71] have also suggested that diversity in the population can be exploited in coevolutionary learning to improve “out-of-sample” performance. In those approaches, diversity plays an important role in coevolutionary learning through speciation to obtain a diverse set of solutions (each solution specializing with respect to different groups of test cases) that as a whole has the potential for higher performance. However, the issues of quantification of population diversity and performance measurement remain open research questions.

V. CONCLUSION

In this paper, we have conducted a detailed empirical study to investigate whether there is a relationship between generalization performance and diversity in coevolutionary learning. Understanding this issue is important given that the lack of diversity in the population may have a negative impact on the generalization performance of coevolutionary learning. More importantly, specific conditions may be identified with potential implications for design choices in coevolutionary search algorithms whereby diversity can be exploited to improve the generalization performance in coevolutionary learning.

We have used games as they provide a natural framework to study coevolutionary learning. The interactions between the coevolving solutions in the population can be framed as game playing. This allows one to study the two main properties of a coevolutionary learning system, i.e., decision making and learning, in detail. With games, one is interested in the ability of a coevolutionary learning system to search for strategies that can outperform a large number of different opponents. We have conducted case studies involving IPD games to systematically investigate the impact of various diversity maintenance techniques on the generalization performance of coevolutionary learning. The case studies compare coevolutionary learning with and without diversity maintenance and address two shortcomings in previous studies through a series of quantitative measurements for both the generalization performance and diversity levels in the population of various coevolutionary learning systems.

We have been motivated to answer two related questions in our studies: 1) what is the right form of diversity that would be beneficial for generalization, and 2) what amount of the “right” form of diversity is needed for good generalization? Here, results from our case studies have shown that the introduction and maintenance of diversity in the population of coevolutionary learning do not necessarily lead to a significant increase in the generalization performance. Although some diversity maintenance techniques have been shown to significantly improve the generalization performance of coevolutionary learning, there are cases (e.g., RV) where a positive relationship between the generalization performance and diversity has not been observed and established in our experiments.

We have also investigated the coverage of the whole strategy space by the population of coevolved strategies. In particular, we have generated a large number of test strategies and, for each test strategy, we have checked whether the coevolved population contains a strategy capable of winning against that test strategy. Lower values of this population measure would signify that the coevolved population is “homogeneous” and that there is a large pool of possible test strategies for which coevolutionary learning has not evolved effective strategies to compete against. On the other hand, larger values of this population measure mean a good coverage of the strategy space by the coevolved population. We have observed that diversity maintenance that led to a speciated population (e.g., implicit fitness sharing) did increase the coverage of the strategy space. For other diversity maintenance techniques, we have not observed a strong correlation between increased diversity and strategy space coverage.
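The population coverage measure described above can be sketched as the fraction of test strategies that at least one member of the population can beat. This is a minimal illustration under assumptions: the function name, the `beats` predicate, and the toy win relation are hypothetical, and the actual measure in the study is computed from game outcomes against sampled test strategies.

```python
def population_coverage(population, test_strategies, beats):
    """Fraction of test strategies beaten by at least one population member.

    `beats(strategy, test)` is a caller-supplied predicate returning True when
    `strategy` wins against `test` (for example, under a win-lose outcome).
    """
    covered = sum(
        1 for test in test_strategies
        if any(beats(strategy, test) for strategy in population)
    )
    return covered / len(test_strategies)


# Toy win relation: the set of (strategy, test) pairs in which the strategy wins.
WINS = {("A", "t1"), ("A", "t2"), ("B", "t2"), ("B", "t3")}


def toy_beats(strategy, test):
    return (strategy, test) in WINS


tests = ["t1", "t2", "t3", "t4"]
print("speciated population:  ", population_coverage(["A", "B"], tests, toy_beats))
print("homogeneous population:", population_coverage(["A", "A"], tests, toy_beats))
```

In this toy case the population with two different specialists covers more of the test set than a population of identical strategies, even though each individual strategy beats the same number of tests.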

However, the population coverage measurement using an ideal gating mechanism only indicates the potential of the coevolved population to do well against arbitrary test strategies. Given a test strategy, we would need a mechanism for choosing the “right” coevolved strategy from the population. Ensemble approaches can be used to combine different expertise in a diverse population to form a stronger strategy [29], [71]. This is a direction for our future research. An ensemble approach can be regarded as an automatic divide-and-conquer approach to complex problem solving. Different individuals in the ensemble will be specialists in dealing with different aspects of the entire problem. As a result, the ensemble as a whole will be a stronger solution.

The study has considered different forms of fitness measurements (e.g., implicit diversity maintenance techniques) in coevolutionary learning that are evaluated using a generalization performance measure based on the win–lose function. However, the motivation of our study is not constrained to the use of a specific choice of fitness measure in coevolutionary learning and its evaluation by a specific generalization performance measure. We use the generalization performance measure based on the win–lose function to simplify the analysis of the experiments since we know that “all defect” is the strategy with the maximum generalization performance. Other functions such as average payoffs can also be used, although the analysis will be much more difficult since we may not know a priori the strategy having the maximum generalization performance. Further studies on this topic will be of
