Evolution and Coevolution of ANNs playing Go

Peter Mayer, 2004


Outline

Computers and Games

The game of Go

Experimental Setup

Training of Go playing ANNs

Evolution of Go Playing ANNs

Summary and Outlook

Games

Game-playing algorithms have been designed since the onset of AI: games have clearly defined rules, yet are still complex

Chess has received the most attention and is far more researched than Go

Two main approaches:

Rely on expertise: directly programmed, weighted features; requires extensive knowledge

Use evolution: requires less knowledge; offers more versatility

The game of Go

Oldest (unaltered) strategic board game in the world

About 10,000,000 players in Japan "alone"

Fairly simple rules

BUT difficult to master:

Immense game tree (~200 options per move)

Complex structures

Many concurrent goals

Go Rules

19x19 board, empty at the beginning

Black & White "stones"

Black starts

Each turn, a player either:

Places 1 stone at an intersection (stones never move afterwards)

OR passes

Go Rules [2]

Objective: get the most points!

Points are acquired by:

Securing territories

Capturing the opponent's pieces

Go Rules [3]

Stones of the same color on vertically or horizontally adjacent intersections form a group

An empty intersection adjacent to a stone or group is called a "liberty" of that group

1 liberty left = the group is in "atari"

No liberties -> CAPTURE! The group is removed from the board

Example (figure): Black places a stone at X, resulting in the right figure

http://games.yahoo.com/games/rules/go/goglossary.html#goliberty
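The group and liberty definitions above amount to a flood fill over orthogonal neighbours. The following is a minimal Python sketch, assuming a board stored as a list of strings with "B", "W" and "." for empty; the function names are hypothetical and not taken from any of the programs discussed here.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]

def neighbors(r: int, c: int, size: int):
    """Yield the orthogonally adjacent intersections (no diagonals)."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < size and 0 <= nc < size:
            yield nr, nc

def group_and_liberties(board: List[str], r: int, c: int) -> Tuple[Set[Point], Set[Point]]:
    """Flood-fill the group containing the stone at (r, c).

    Returns the stones of the group and its liberties
    (empty intersections adjacent to any stone of the group).
    """
    size = len(board)
    color = board[r][c]
    stones, liberties = {(r, c)}, set()
    stack = [(r, c)]
    while stack:
        cr, cc = stack.pop()
        for nr, nc in neighbors(cr, cc, size):
            point = board[nr][nc]
            if point == ".":
                liberties.add((nr, nc))
            elif point == color and (nr, nc) not in stones:
                stones.add((nr, nc))
                stack.append((nr, nc))
    return stones, liberties

def status(liberties: Set[Point]) -> str:
    """Classify a group by its number of liberties."""
    if not liberties:
        return "captured"
    return "atari" if len(liberties) == 1 else "2+ liberties"

board = [
    ".B...",
    "BWWB.",
    ".BB..",
    ".....",
    ".....",
]
stones, libs = group_and_liberties(board, 1, 1)
print(sorted(stones), sorted(libs), status(libs))
# [(1, 1), (1, 2)] [(0, 2)] atari
```

The two White stones form one group whose only liberty is (0, 2), so a Black stone there would capture it.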

Go Rules [4]

Stones can be placed on any empty intersection, but a player cannot commit suicide (some rule sets, such as Ing and New Zealand rules, allow it)

A move that would otherwise be suicide is legal if the stone simultaneously captures an opponent's group (2 right figures)

Suicide: White cannot place at X

White CAN place at X; result: capture
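The order of checks matters: opponent groups left without liberties are removed first, and only then is the new stone's own group tested for liberties. That is why a stone with no liberties of its own is still legal when it captures. A minimal Python sketch under the same hypothetical board encoding ("B", "W", "."), with illustrative function names:

```python
from typing import List, Optional, Tuple

def neighbors(r, c, size):
    """Yield the orthogonally adjacent intersections."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < size and 0 <= nc < size:
            yield nr, nc

def group_and_liberties(grid, r, c):
    """Flood-fill the group at (r, c); return (stones, liberties)."""
    color, size = grid[r][c], len(grid)
    stones, liberties, stack = {(r, c)}, set(), [(r, c)]
    while stack:
        cr, cc = stack.pop()
        for nr, nc in neighbors(cr, cc, size):
            if grid[nr][nc] == ".":
                liberties.add((nr, nc))
            elif grid[nr][nc] == color and (nr, nc) not in stones:
                stones.add((nr, nc))
                stack.append((nr, nc))
    return stones, liberties

def play(board: List[str], r: int, c: int, color: str) -> Optional[Tuple[List[str], int]]:
    """Return (new_board, captured_stones), or None if the move is illegal."""
    if board[r][c] != ".":
        return None                      # occupied intersection
    grid = [list(row) for row in board]
    grid[r][c] = color
    opponent = "W" if color == "B" else "B"
    captured = 0
    for nr, nc in neighbors(r, c, len(grid)):
        if grid[nr][nc] == opponent:
            stones, libs = group_and_liberties(grid, nr, nc)
            if not libs:                 # opponent group captured first
                for sr, sc in stones:
                    grid[sr][sc] = "."
                captured += len(stones)
    _, own_libs = group_and_liberties(grid, r, c)
    if not own_libs:
        return None                      # suicide
    return ["".join(row) for row in grid], captured

# The two boards differ only in the White stone at row 0, column 2.
capture_board = ["..W..", ".WBW.", ".B.B.", "..B..", "....."]
suicide_board = [".....", ".WBW.", ".B.B.", "..B..", "....."]
print(play(suicide_board, 2, 2, "W"))    # None: suicide
print(play(capture_board, 2, 2, "W"))    # legal: captures the Black stone at (1, 2)
```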

Go Rules [5]

The same position cannot occur more than once

Endless repetitions: Black can capture in the upper figure by placing at X

White can then do the same by placing at Y

Black repeats…

Ko rule: White may not place at Y before playing somewhere else first

This avoids any repetitions
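One way to enforce "the same position cannot occur more than once" is to remember every earlier position and reject a move whose resulting position is already in that history (usually called positional superko). The minimal Python sketch below writes the positions of a classic ko on a small 3x4 board out by hand instead of computing captures; the helper name is hypothetical.

```python
from typing import Iterable, Set, Tuple

Position = Tuple[str, ...]

def repeats_position(result: Iterable[str], history: Set[Position]) -> bool:
    """True if the position produced by a move has already occurred."""
    return tuple(result) in history

# Before Black's capture (the White stone at row 1, column 1 is in atari):
before = (".BW.",
          "BW.W",
          ".BW.")
# Black plays at row 1, column 2 and captures it:
after_black = (".BW.",
               "B.BW",
               ".BW.")
# White recaptures at row 1, column 1, which would recreate `before`:
after_white_recapture = (".BW.",
                         "BW.W",
                         ".BW.")

history = {before, after_black}
print(repeats_position(after_white_recapture, history))  # True: forbidden by ko
```

After White has played elsewhere first, the position differs, so the recapture is no longer a repetition.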

Go Rules – Live and Dead groups

A group is "dead" if its capture cannot be prevented

It is not necessary to actually capture it: the group remains on the board

At the end of the game it is removed and added to the captured stones

"Living" groups are impossible to capture

A group with 2 "eyes": even if White surrounds it, playing at X or Y is suicide

The opponent must play elsewhere

Go Basics – End game

Play continues until both players pass

Players then alternately play stones at "neutral" points, adjacent to both White and Black, also known as "dame" (DAH-MAY)

Dead stones are removed from the board and counted with the other prisoners (1 point per prisoner)

Also 1 point for each intersection surrounded by a player's stones ("territory")

Go Basics – End game example

Prisoners were removed already

All 4 points marked X are dame: worthless

Black has 7 points of territory in the upper right and 2 points in the lower left, plus 1 removed prisoner: TOTAL = 10 points

White has 5 points in the upper left and 2 in the lower right, plus 2 prisoners: TOTAL = 9 points

Black wins unless komi (5.5 points of compensation for White) is due

Ranking and Handicaps

Rankings determine Go players' strength

Resemblance to martial arts: there are both amateur and professional ranking systems

Amateur: 35 kyu to 1 kyu, THEN 1 dan to 7 dan

Pro: 1 dan to 9 dan, awarded only by Go institutions

Pro dans are much stronger than amateur dans

Ranking and Handicaps (2)

Handicaps: the weaker player starts with several stones on the board, placed at specific points, which helps make games more even

The difference in ranks ~ the number of handicap stones needed to win

2 stones even out 2 dan against 4 dan; 4 stones even out 3 kyu against 2 dan

The most powerful Go programs reach only … 10 kyu!
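The rule of thumb "difference in ranks ~ number of handicap stones" works once kyu and dan ranks are placed on one integer scale, with 1 kyu directly below 1 dan. A minimal Python sketch for amateur ranks; the "3k"/"2d" notation and the function names are assumptions for illustration:

```python
def rank_value(rank: str) -> int:
    """Map amateur ranks onto one scale: ..., 2k -> -1, 1k -> 0, 1d -> 1, 2d -> 2, ..."""
    number, kind = int(rank[:-1]), rank[-1].lower()
    if kind == "d":
        return number
    if kind == "k":
        return 1 - number
    raise ValueError(f"unknown rank: {rank!r}")

def handicap_stones(weaker: str, stronger: str) -> int:
    """Approximate number of handicap stones for the weaker player."""
    return rank_value(stronger) - rank_value(weaker)

print(handicap_stones("2d", "4d"))  # 2
print(handicap_stones("3k", "2d"))  # 4
```

Both results match the two examples on the slide.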

Outline

Computers and Games

The game of Go

Experimental Setup

Training of Go playing ANNs

Evolution of Go Playing ANNs

Summary and Outlook

Experimental Setup

Opponent Go players

ANN player

Go board (input) representations

Move (output) representations

Coevolution: Hall of Fame coevolution; cultural coevolution

General evolution setup

Go Players – Random

No strategy

May also pass

"Knows" only the rules of Go (legality of moves)

Usually the weakest opponent

Go Players – Naïve Player

Roughly at the level of a human beginner

Able to save and capture stones

Knows about:

Lost stones

Saving: connecting stones to living groups

Weak stones (not savable)

Go Players – Naïve Strategy

A subset of the strategy of JaGo (the main opponent)

Outline (arranged by priority):

Attempt to save own stones

Try to put the opponent into atari

Connect weak stones

Capture opponent groups in atari

Check intersections for placing stones, in random order, making sure no own liberties decrease below 2 as a result

Perform a random move
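The Naïve player's strategy is a priority cascade: each rule is tried in order and the first one that proposes a move is played. The Python sketch below shows only that control flow. The rule functions are hypothetical stubs that read precomputed suggestions from a dictionary, where a real player would analyse the board, and the final PASS fallback is an assumption.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Move = Tuple[int, int]
Rule = Callable[[Dict, random.Random], Optional[Move]]

def suggestion(key: str) -> Rule:
    """Stub rule: return the precomputed move stored under `key`, if any."""
    return lambda state, rng: state.get(key)

def random_safe_point(state: Dict, rng: random.Random) -> Optional[Move]:
    """Try candidate points in random order; return the first one flagged as
    safe (own liberties would not drop below 2)."""
    points = list(state.get("candidates", []))
    rng.shuffle(points)
    for point in points:
        if point in state.get("safe", set()):
            return point
    return None

def random_move(state: Dict, rng: random.Random) -> Optional[Move]:
    legal = state.get("legal", [])
    return rng.choice(legal) if legal else None

NAIVE_RULES: List[Tuple[str, Rule]] = [
    ("save own stones", suggestion("save")),
    ("put opponent into atari", suggestion("atari")),
    ("connect weak stones", suggestion("connect")),
    ("capture group in atari", suggestion("capture")),
    ("random safe point", random_safe_point),
    ("random move", random_move),
]

def choose_move(state: Dict, rng: random.Random, rules=NAIVE_RULES):
    """Return (rule name, move) from the highest-priority rule that fires."""
    for name, rule in rules:
        move = rule(state, rng)
        if move is not None:
            return name, move
    return "pass", "PASS"

rng = random.Random(0)
print(choose_move({"capture": (2, 3), "connect": (1, 1)}, rng))
# ('connect weak stones', (1, 1))
```

Connecting outranks capturing in this list, so the capture suggestion is ignored in the example.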

Go Players – JaGo Player

Java-based program

Best computer player used, but not a strong player (~16 kyu)

Knows standard techniques, mainly saving & capturing

Uses pattern matching and looks at the entire board

32 patterns, with rotations and mirrors
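Matching a pattern "with rotations and mirrors" means testing up to 8 variants of it (the symmetries of a square). A minimal Python sketch, assuming patterns are stored as tuples of equal-length strings; JaGo's actual pattern format is not described here, and the function names are illustrative:

```python
from typing import Set, Tuple

Pattern = Tuple[str, ...]

def rotate(p: Pattern) -> Pattern:
    """Rotate a square pattern 90 degrees clockwise."""
    n = len(p)
    return tuple("".join(p[n - 1 - j][i] for j in range(n)) for i in range(n))

def mirror(p: Pattern) -> Pattern:
    """Mirror a pattern left-right."""
    return tuple(row[::-1] for row in p)

def variants(p: Pattern) -> Set[Pattern]:
    """All distinct rotations and mirror images (at most 8)."""
    result = set()
    for _ in range(4):
        result.add(p)
        result.add(mirror(p))
        p = rotate(p)
    return result

print(len(variants(("BW.", "...", "..."))))  # 8: no symmetry
print(len(variants(("B..", "...", "..."))))  # 4: symmetric about the diagonal
print(len(variants(("...", ".B.", "..."))))  # 1: fully symmetric
```

Deduplicating with a set avoids matching the same shape several times when a pattern is itself symmetric.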

Go Players – JaGo Strategy (1)

Save stones in atari

Try to decrease the liberties of large groups

Find own savable larger groups

Attack opponent's groups, in decreasing order of priority:

With 2 or more liberties and attackable

With 2 or more stones & fewer than 3 liberties

With 2 or fewer liberties

Go Players – JaGo Strategy (2)

Save own groups with few liberties, if savable

Start pattern matching: response patterns; center patterns

Random move order

Seek opponent's groups to capture in 2 moves

Perform a random move that doesn't match a bad pattern

Capture opponent's groups with a single liberty

Connect own weak stones

PASS

Go Players – JaGo Patterns (1)

[Figure: JaGo patterns]

Go Players – JaGo Patterns (2)

[Figure: JaGo patterns]

Go Players – GNU Go

Advantages: plays on 5x5 to 19x19 boards; handles handicaps well; rated 10 kyu

Problems: 5x5 is solved: opening at C3 wins by 18.5 points (komi = 5.5), and GNU Go always wins as Black

GNU Go passes on B3, C2-C4 and D3 (only correct at C3)

Premature convergence of evolution

ANN Player

Inform the ANN about the actual position

Evaluate the ANN output to obtain the next move

Representation is important!

Intention maps: for each Go move (including PASS), a value in [0, 1]

High value = high intention to make the move (and vice versa)

Select the legal move with the highest value

To avoid predictability, sub-optimal moves are also considered ("creativity factor")
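Move selection from an intention map can be sketched as follows. The slides do not say how the "creativity factor" is implemented, so the rule used here (skipping the current best legal move with probability `creativity` and moving on to the next-best one) is an assumption, as are the function and parameter names.

```python
import random
from typing import Dict, Hashable, Iterable, Optional

def select_move(intentions: Dict[Hashable, float], legal: Iterable[Hashable],
                creativity: float = 0.0,
                rng: Optional[random.Random] = None) -> Hashable:
    """Pick a move from an intention map.

    Legal moves are ranked by intention value. Walking down the ranking, each
    move is accepted with probability 1 - creativity; the last move is taken if
    all others were skipped. With creativity = 0 this always returns the legal
    move with the highest value.
    """
    rng = rng or random.Random()
    ranked = sorted(legal, key=lambda move: intentions[move], reverse=True)
    if not ranked:
        raise ValueError("at least one legal move (e.g. PASS) is required")
    for move in ranked[:-1]:
        if rng.random() >= creativity:
            return move
    return ranked[-1]

intentions = {(0, 0): 0.10, (2, 2): 0.95, (1, 3): 0.60, "PASS": 0.05}
legal = [(0, 0), (1, 3), "PASS"]       # (2, 2) is occupied, hence illegal
print(select_move(intentions, legal))  # (1, 3)
```

Filtering by legality before ranking is what lets the highest raw output (here (2, 2)) be ignored when it is not a legal move.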

Player Strength

To receive a rating, unrated Go players commonly play against rated players (the same holds in Chess)

The strength s of a player is determined by the score of 1000 double games (one game with each color) against each of 3 opponents: Random, Naïve and JaGo, divided by the number of games (6,000)

1 is perfect strength

Using 3 opponents helps resist over-fitting
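Since fitness is later described as the rate of wins against all 3 opponents, the strength is simply the total number of wins divided by the 6,000 games. A minimal Python sketch; the win counts below are made up for illustration only:

```python
from typing import Dict

def strength(wins: Dict[str, int], double_games: int = 1000) -> float:
    """Fraction of won games over all opponents.

    Each double game is 2 games (one with each color), so 3 opponents with
    1000 double games each give 6,000 games in total.
    """
    games_per_opponent = 2 * double_games
    if any(not 0 <= w <= games_per_opponent for w in wins.values()):
        raise ValueError("wins per opponent must be between 0 and 2 * double_games")
    return sum(wins.values()) / (games_per_opponent * len(wins))

# Hypothetical win counts, for illustration only:
wins = {"Random": 1900, "Naive": 800, "JaGo": 200}
print(round(strength(wins), 4))  # 0.4833
```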

Player Competence

Strength does not reflect understanding of the rules (legality): e.g. 2 players may receive the same score, but only one always tried legal moves first

The competence C of a player is defined in terms of:

b_i = games; i = moves; t_ij = number of tried illegal moves; k_ij = number of possible illegal moves

C is averaged over all games

Board Representations

19x19 boards are far too large, even for evolved agents

Use only 5x5 boards

Board Representations

The position should be preprocessed to make the ANN's life easier

Representations were tested in training experiments

Standard Input Representation (SIR): 2 neurons at each intersection:

1 for the player's stone; 1 for the opponent's

No distinction between Black and White stones

Optional: 1 neuron to tell whether the player is Black or White

2b^2 neurons (where b is the board size) = 50 for b = 5
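The SIR coding can be written down directly. This Python sketch assumes the two inputs of each intersection are interleaved (own stone, then opponent stone) row by row and omits the optional colour neuron; the neuron order and function name are assumptions.

```python
from typing import List

def encode_sir(board: List[str], me: str) -> List[float]:
    """Standard Input Representation: 2 inputs per intersection.

    For every intersection (row by row): first input = own stone,
    second input = opponent stone; both 0 for an empty point.
    """
    opponent = "W" if me == "B" else "B"
    inputs: List[float] = []
    for row in board:
        for point in row:
            inputs.append(1.0 if point == me else 0.0)
            inputs.append(1.0 if point == opponent else 0.0)
    return inputs

board = ["B....", ".W...", ".....", ".....", "....."]
x = encode_sir(board, "B")
print(len(x), x[:4])  # 50 [1.0, 0.0, 0.0, 0.0]
```

Because inputs are "own" and "opponent" rather than "Black" and "White", a position and its colour-swapped twin give identical inputs.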

Representations – NIR

Naïve Input Representation: more compact

1 neuron per intersection, set to -1 (player's stone), 1 (opponent's stone) or 0 (empty)

Uses half of SIR's neurons = 25
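The NIR coding packs the same information into one signed value per intersection. A minimal Python sketch with an illustrative function name, scanning the board row by row:

```python
from typing import List

def encode_nir(board: List[str], me: str) -> List[float]:
    """Naïve Input Representation: -1 own stone, 1 opponent stone, 0 empty."""
    opponent = "W" if me == "B" else "B"
    value = {me: -1.0, opponent: 1.0, ".": 0.0}
    return [value[point] for row in board for point in row]

board = ["B....", ".W...", ".....", ".....", "....."]
x = encode_nir(board, "B")
print(len(x), x[0], x[6])  # 25 -1.0 1.0
```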

Representations – LVIR

Limited View Input Representation: splits the Go board into several square areas of size 3x3

Idea: the simplest way of capturing stones works within such an area, e.g. capturing 1 stone by surrounding it

Areas overlap at the middle row and middle column

Coding: similar to SIR; with w areas (w = 4), 2 · 9 · w = 72 neurons

Could also use the naïve coding
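On a 5x5 board, four 3x3 areas starting at rows/columns 0 and 2 overlap exactly at the middle row and column, which reproduces the 72-neuron count. A minimal Python sketch using SIR-style coding inside each area; the window placement rule and function names are assumptions for the 5x5 case.

```python
from typing import List, Tuple

def lvir_windows(size: int = 5, window: int = 3) -> List[Tuple[int, int]]:
    """Top-left corners of the areas; on 5x5 they share the middle row and column."""
    starts = [0, size - window]
    return [(r, c) for r in starts for c in starts]

def encode_lvir(board: List[str], me: str, window: int = 3) -> List[float]:
    """SIR-style coding (own stone, opponent stone) inside every area."""
    opponent = "W" if me == "B" else "B"
    inputs: List[float] = []
    for r0, c0 in lvir_windows(len(board), window):
        for r in range(r0, r0 + window):
            for c in range(c0, c0 + window):
                point = board[r][c]
                inputs.append(1.0 if point == me else 0.0)
                inputs.append(1.0 if point == opponent else 0.0)
    return inputs

board = [".....", ".....", "..B..", ".....", "....."]
x = encode_lvir(board, "B")
print(lvir_windows(), len(x), sum(x))  # [(0, 0), (0, 2), (2, 0), (2, 2)] 72 4.0
```

The centre stone lies in all four areas, so it is seen four times.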

Clever Representations

Based on image processing and circuit design

We want fewer raw inputs, so that the ANN can concentrate more on features

Manhattan distance: used in integrated circuits, where wires run parallel to the X or Y axis

Got its name from Manhattan, NY, where the streets are aligned in a grid

For P1 = (x1, x2) and P2 = (y1, y2): d(P1, P2) = |x1 - y1| + |x2 - y2|

Clever Representations

Manhattan distance is related to the distance of Go stones (no diagonals): on a line, distance = number of separating intersections + 1

1 if next to each other

2 if separated by one intersection

3 for a knight's move or two separating intersections
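The Manhattan distance and the three cases above can be checked directly, using the slide's notation P1 = (x1, x2), P2 = (y1, y2). The function name is illustrative.

```python
def manhattan(p1, p2):
    """d(P1, P2) = |x1 - y1| + |x2 - y2| for P1 = (x1, x2), P2 = (y1, y2)."""
    (x1, x2), (y1, y2) = p1, p2
    return abs(x1 - y1) + abs(x2 - y2)

print(manhattan((2, 2), (2, 3)))  # 1: neighbouring stones
print(manhattan((2, 2), (2, 4)))  # 2: one intersection in between
print(manhattan((2, 2), (3, 4)))  # 3: knight's move
print(manhattan((0, 0), (0, 3)))  # 3: two intersections in between
```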

Representations: Co-occurrence Matrix

Co-occurrence matrices are used in image processing

Many parameters are derived from them: mean, standard deviation, energy, contrast, homogeneity, …

The matrix is square

It is based on a relation p between image positions (and is symmetric if p is)

Element C[i][j] = number of times a pixel of value (color) i occurs in the relation specified by p relative to a pixel of value j

The size of the matrix is the number of different colors

An actual Go board is an "image" with 3 different colors (including empty)

Example p1: Manhattan distance of 1 between 2 points

First matrix row: B near B 16 times; B near W 3 times; B near empty 11 times
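A co-occurrence matrix for a Go board can be computed by visiting every pair of points at the chosen Manhattan distance and counting their color combination. This Python sketch counts ordered pairs, which makes C symmetric; that counting convention is an assumption and may differ from the one behind the counts quoted above. Names are illustrative.

```python
from typing import List

COLORS = "BW."  # row/column order of the matrix: Black, White, empty

def cooccurrence(board: List[str], distance: int) -> List[List[int]]:
    """C[i][j] = number of ordered point pairs (a, b) at Manhattan distance
    `distance` where a has color COLORS[i] and b has color COLORS[j]."""
    size = len(board)
    index = {color: i for i, color in enumerate(COLORS)}
    points = [(r, c) for r in range(size) for c in range(size)]
    C = [[0] * len(COLORS) for _ in COLORS]
    for r1, c1 in points:
        for r2, c2 in points:
            if abs(r1 - r2) + abs(c1 - c2) == distance:
                C[index[board[r1][c1]]][index[board[r2][c2]]] += 1
    return C

board = ["BB.", "...", "..."]
for color, row in zip(COLORS, cooccurrence(board, 1)):
    print(color, row)
# B [2, 0, 3]
# W [0, 0, 0]
# . [3, 0, 16]
```

The matrix size (3x3 here) depends only on the number of colors, not on the board size, which is what makes it a compact feature.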

A co-occurrence matrix says little about absolute positions, so it must be combined with a board representation:

SIR and C for the whole board

NIR and C for the whole board

NIR and Cs for 3x3 areas

LVIR and Cs for 3x3 areas

NLVIR and Cs for 3x3 areas

Output Representations

Only 2 are used

Standard Output Representation (SOR): each intersection is represented by 1 neuron, plus 1 for PASS

(b^2 + 1) neurons, i.e. 26 for b = 5

Output Representations

Row Column Output Representation (RCOR): used to decrease ANN size

5 neurons for columns; 5 for rows; 1 for PASS

(2b + 1) neurons, i.e. 11 for b = 5

Computing the intention is more complicated:

The PASS intention is the square of the relevant neuron

RCOR limits the intention map: v1>v2 y1>y2 v4>v3

All values are positive and non-zero
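The slide leaves the RCOR decoding implicit. One reading consistent with "the PASS intention is the square of the relevant neuron" and "all values positive, non-zero" is that the intention of intersection (r, c) is the product of its row and column outputs; the sketch below assumes exactly that, and the function name is hypothetical. Under this assumption the limitation is easy to see: if one row output exceeds another, that row is preferred in every column.

```python
from typing import Dict, List, Tuple, Union

Move = Union[Tuple[int, int], str]

def rcor_intentions(rows: List[float], cols: List[float],
                    pass_value: float) -> Dict[Move, float]:
    """Hypothetical RCOR decoding: intention(r, c) = rows[r] * cols[c].

    The PASS neuron is squared so that it lies on the same (product) scale.
    """
    if any(v <= 0 for v in rows + cols + [pass_value]):
        raise ValueError("RCOR outputs are assumed to be positive and non-zero")
    intentions: Dict[Move, float] = {
        (r, c): rv * cv for r, rv in enumerate(rows) for c, cv in enumerate(cols)
    }
    intentions["PASS"] = pass_value ** 2
    return intentions

rows = [0.9, 0.2, 0.5, 0.5, 0.5]
cols = [0.1, 0.8, 0.3, 0.3, 0.3]
m = rcor_intentions(rows, cols, pass_value=0.6)
print(len(m))  # 26 = 5 * 5 intersections + PASS
# Because all values are positive, row 0 beats row 1 in every column:
print(all(m[(0, c)] > m[(1, c)] for c in range(5)))  # True
```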

Coevolution

Derives a non-static fitness, as in nature

1 or more interacting populations

Competitive [battle] vs. cooperative [subtasks]

Advantages:

"Who needs enemies when you got friends like these?": saves finding opponents, especially in Go, where no strong program exists

Variety in fitness: adaptive opponents

No upper bound for improvement

Coevolution Methods Applied

Based on work by Lubberts & Miikkulainen [2001]

Hall of Fame (HoF): a host population and a master population

Maintains the ability of the host population to beat opponents of previous generations

Each generation, the best individual is added to the HoF

The whole population competes against a sample of the HoF

Coevolution – HoF as applied in this research

The HoF is initially filled without competition

Individuals get their fitness by competing against the masters

When the HoF is full, the host with the highest win rate (against the masters) joins the HoF, replacing the first master to lose all games

Coevolutionary progress cannot be seen directly, since both populations are constantly changing
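The HoF rule described above can be sketched with a toy game in which individuals are plain numbers and the larger number wins. `beats`, `hof_step` and the toy game are hypothetical stand-ins for evolved networks playing real Go matches; because the toy game is deterministic, "loses all games" collapses to a single loss, and the choice of host used while filling the HoF is arbitrary.

```python
from typing import List

def beats(a: float, b: float) -> bool:
    """Toy stand-in for a Go match: the larger number wins, ties are losses."""
    return a > b

def hof_step(hosts: List[float], hof: List[float], hof_size: int) -> List[float]:
    """One generation of the Hall of Fame scheme (sketch).

    While the HoF is not full it is filled without competition; afterwards each
    host's fitness is its win rate against the masters, and the best host
    replaces the first master (in list order) that loses to it.
    """
    if len(hof) < hof_size:
        hof.append(hosts[0])            # filled without competition (arbitrary pick)
        return []
    fitness = [sum(beats(h, m) for m in hof) / len(hof) for h in hosts]
    best = max(range(len(hosts)), key=fitness.__getitem__)
    champion = hosts[best]
    for i, master in enumerate(hof):
        if beats(champion, master):     # in a real run: the master lost all games
            hof[i] = champion
            break
    return fitness

hof = [0.2, 0.5, 0.1, 0.4]
hosts = [0.3, 0.45, 0.05]
fitness = hof_step(hosts, hof, hof_size=4)
print(fitness, hof)  # [0.5, 0.75, 0.0] [0.45, 0.5, 0.1, 0.4]
```

Only one master changes per generation, and the HoF keeps a fixed size.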

Cultural Coevolution

A new approach!

Maintains a "culture" of masters, resembling the HoF

To enter the culture, a host must defeat all masters

Masters are never replaced: unlimited culture size

Every individual receives a fitness score by competing against all masters

The culture growth rate decreases rapidly; every new master is the strongest found yet
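The entry rule of the culture can be sketched with the same kind of toy game (numbers, larger wins). `beats` and `culture_step` are hypothetical names; admitting at most one host per generation and seeding the culture with a single individual are assumptions for this sketch.

```python
from typing import List, Optional

def beats(a: float, b: float) -> bool:
    """Toy stand-in for a Go match: the larger number wins, ties are losses."""
    return a > b

def culture_step(hosts: List[float], culture: List[float]) -> Optional[float]:
    """One generation of cultural coevolution (sketch).

    Every host is scored against all masters. The first host that defeats
    every master joins the culture; masters are never removed.
    Returns the admitted host, or None.
    """
    fitness = [sum(beats(h, m) for m in culture) / len(culture) for h in hosts]
    for host, f in zip(hosts, fitness):
        if f == 1.0:
            culture.append(host)
            return host
    return None

culture = [0.3]                                            # seeded with one individual
print(culture_step([0.2, 0.6, 0.9], culture), culture)     # 0.6 [0.3, 0.6]
# A copy of an existing master cannot enter: it would have to defeat itself.
print(culture_step([0.5, 0.6], culture), culture)          # None [0.3, 0.6]
```

In this toy setting every newcomer beats all earlier masters, so the culture can only get stronger, mirroring the claim that the culture's performance never decreases.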

Cultural Coevolution [2]

Numerous advantages:

Maintains the ability to defeat weak players

Keeps good solutions once they are found

The same player cannot enter twice: it would need to defeat itself

The culture's performance never decreases

Avoids focusing on a specific player's weakness: as soon as any master is immune, the hosts have to find another way

With more masters, the hosts are less likely to remember all weaknesses

General Evolution Setup

Opponents: Random, Naïve, JaGo

Fitness = strength: rate of wins against all 3 opponents (6,000 games with both colors)

Not using scores, only win rates: defeating more opponents is better

Generalized Multi-Layer Perceptrons (GMLPs): all non-loop connections are permitted

Evolving: hidden neurons, connections, weights and biases (for non-input neurons)

General Evolution Setup [2]

2 binary chromosomes are used:

1 for connections: 0 = no, 1 = yes

1 for hidden neurons (if 0, the neuron's connections are also disabled)

The number of possible connections depends on n_i, n_h and n_o, the numbers of input, hidden and output neurons, and determines the size of the chromosome

Real chromosome: weights & bias values (biases are treated as weights); its size is the number of connections plus the number of bias values (for non-input neurons)
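The slides give the number of possible connections only as a function of n_i, n_h and n_o. One count that follows from "all non-loop connections are permitted" (inputs feed every hidden and output neuron, each hidden neuron feeds every later hidden neuron and every output, with no input-to-input or output-to-output links) is n_i(n_h + n_o) + n_h(n_h - 1)/2 + n_h n_o. This is a reconstruction, not the original formula. With 50 SIR inputs, 26 SOR outputs and an assumed cap of 20 hidden neurons, it gives exactly the maximum of 3,010 connections quoted in the evolution setup.

```python
def possible_connections(n_in: int, n_hidden: int, n_out: int) -> int:
    """Feed-forward connections in a GMLP with neuron order inputs, hidden,
    outputs and no loops. Assumes no input->input and no output->output links."""
    input_links = n_in * (n_hidden + n_out)          # input -> hidden/output
    hidden_links = n_hidden * (n_hidden - 1) // 2    # hidden -> later hidden
    hidden_output_links = n_hidden * n_out           # hidden -> output
    return input_links + hidden_links + hidden_output_links

def chromosome_lengths(n_in: int, n_hidden: int, n_out: int) -> dict:
    """Lengths of the three chromosomes (two binary, one real-valued)."""
    conns = possible_connections(n_in, n_hidden, n_out)
    return {
        "connection_bits": conns,
        "hidden_neuron_bits": n_hidden,
        "real_genes": conns + n_hidden + n_out,      # weights + one bias per non-input neuron
    }

print(possible_connections(50, 20, 26))  # 3010
print(chromosome_lengths(50, 20, 26))
# {'connection_bits': 3010, 'hidden_neuron_bits': 20, 'real_genes': 3056}
```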

General Evolution Setup [3]

Tournament selection (size 2)

2-point crossover

Binary mutation: flip bits with probability 1/L (L = chromosome length)

Real-chromosome mutation: multiple-σ self-adaptation (σSA); each individual maintains "strategy" parameters that alter the distribution of its "object" parameters

Normal distributions are used for both
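The operators listed above are standard, and the Python sketch below shows one common form of each. The self-adaptation rule (a log-normal update of one step size per object parameter, with learning rates 1/sqrt(2n) and 1/sqrt(2 sqrt(n)) as in Schwefel-style evolution strategies) is an assumption, since the slides only state that normal distributions are used for both strategy and object parameters. Function names are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple

def tournament_select(population: Sequence, fitness: Sequence[float],
                      rng: random.Random, size: int = 2):
    """Pick `size` distinct individuals at random and return the fittest."""
    contenders = rng.sample(range(len(population)), size)
    return population[max(contenders, key=lambda i: fitness[i])]

def two_point_crossover(a: List, b: List, rng: random.Random) -> Tuple[List, List]:
    """Swap the segment between two random cut points (requires len >= 3)."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def bit_flip(bits: List[int], rng: random.Random) -> List[int]:
    """Flip every bit independently with probability 1/L."""
    p = 1.0 / len(bits)
    return [1 - bit if rng.random() < p else bit for bit in bits]

def self_adaptive_mutation(x: List[float], sigmas: List[float],
                           rng: random.Random) -> Tuple[List[float], List[float]]:
    """Multiple step sizes: mutate the sigmas first (log-normal), then x."""
    n = len(x)
    tau_global = 1.0 / math.sqrt(2.0 * n)
    tau_local = 1.0 / math.sqrt(2.0 * math.sqrt(n))
    shared = rng.gauss(0.0, 1.0)            # one draw shared by all step sizes
    new_sigmas = [s * math.exp(tau_global * shared + tau_local * rng.gauss(0.0, 1.0))
                  for s in sigmas]
    new_x = [xi + si * rng.gauss(0.0, 1.0) for xi, si in zip(x, new_sigmas)]
    return new_x, new_sigmas

rng = random.Random(1)
child1, child2 = two_point_crossover([0] * 8, [1] * 8, rng)
weights, steps = self_adaptive_mutation([0.0] * 5, [0.1] * 5, rng)
```

Mutating the step sizes before the weights lets the step sizes themselves be selected for, which is the point of self-adaptation.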

Setup – Recurrent Nets

It is difficult to learn Go without structured input; experiments with recurrent nets were therefore included

Loops are allowed for input neurons, which naturally represent adjacent board intersections

No hidden neurons; played against JaGo

Typically the output changes even without an input change, due to feedback loops, but the output was computed only once!

Hence only 2 directly connected neurons influence each other, and evolution should connect only close neurons
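Computing the output only once means a single synchronous update, in which each neuron sees only the current activations of the neurons wired directly to it. The Python sketch below illustrates this on a chain a -> b -> c; the sigmoid activation, the dictionary-based network and the function name are assumptions for illustration.

```python
import math
from typing import Dict, Tuple

def one_update(activations: Dict[str, float],
               weights: Dict[Tuple[str, str], float],
               biases: Dict[str, float]) -> Dict[str, float]:
    """Single synchronous update of a recurrent net (loops allowed).

    Every neuron reads the current activations of its direct sources only,
    so after one update a neuron is influenced only by neurons wired to it.
    """
    net = dict(biases)
    for (src, dst), w in weights.items():
        net[dst] += w * activations[src]
    return {name: 1.0 / (1.0 + math.exp(-value)) for name, value in net.items()}

# Chain a -> b -> c: a change at a reaches b, but not yet c.
weights = {("a", "b"): 2.0, ("b", "c"): 2.0}
biases = {"a": 0.0, "b": 0.0, "c": 0.0}
low = one_update({"a": 0.0, "b": 0.0, "c": 0.0}, weights, biases)
high = one_update({"a": 1.0, "b": 0.0, "c": 0.0}, weights, biases)
print(low["b"] < high["b"], low["c"] == high["c"])  # True True
```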

Outline

Computers and Games

The game of Go

Experimental Setup

Training of Go playing ANNs

Evolution of Go Playing ANNs

Summary and Outlook

Training ANNs – Setup

Testing the input representations (IRs) mentioned previously

No Go-specific knowledge used

Each experiment was repeated 20 times

Nets are the same as in Richards [1998]:

3 layers; fully connected; feed-forward

Linear activation for input neurons; sigmoid for the rest

50 input, 100 hidden and 26 output neurons: 7,600 connections

Training patterns: JaGo vs JaGo on a 5x5 board

Rprop: a resilient variant of backpropagation
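For reference, the layered 50-100-26 net has 50 · 100 + 100 · 26 = 7,600 weights. The slides only name Rprop, so the Python sketch below shows one variant, iRprop- (Igel and Hüsken's improved Rprop without weight backtracking), on a one-dimensional toy problem. The variant, the usual default constants (η+ = 1.2, η- = 0.5, step bounds 1e-6 and 50) and the function name are assumptions, not the settings used in these experiments.

```python
from typing import List, Tuple

def irprop_minus_step(weights: List[float], grads: List[float],
                      prev_grads: List[float], steps: List[float],
                      eta_plus: float = 1.2, eta_minus: float = 0.5,
                      step_min: float = 1e-6, step_max: float = 50.0
                      ) -> Tuple[List[float], List[float], List[float]]:
    """One iRprop- update: only the sign of each gradient is used."""
    new_w, new_prev, new_steps = [], [], []
    for w, g, pg, step in zip(weights, grads, prev_grads, steps):
        if g * pg > 0:                   # same sign: accelerate
            step = min(step * eta_plus, step_max)
        elif g * pg < 0:                 # sign change: overshoot, slow down
            step = max(step * eta_minus, step_min)
            g = 0.0                      # skip this update and forget the gradient
        if g > 0:
            w -= step
        elif g < 0:
            w += step
        new_w.append(w)
        new_prev.append(g)
        new_steps.append(step)
    return new_w, new_prev, new_steps

# Minimise f(w) = (w - 3)^2 from w = 0 (gradient 2 * (w - 3)).
w, prev, steps = [0.0], [0.0], [0.1]
for _ in range(100):
    grad = [2.0 * (w[0] - 3.0)]
    w, prev, steps = irprop_minus_step(w, grad, prev, steps)
print(round(w[0], 3))  # 3.0
```

Because the step size ignores the gradient's magnitude, Rprop is insensitive to badly scaled gradients, one reason it is popular for training sigmoid networks.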

Training ANNs – Experiment 1

Determine the number of training cycles: with too few cycles the weights are not adjusted properly; too many lead to over-fitting

Determine the training pattern set: it limits the level a Go player can reach; it should include all 3 game stages and both expert and novice moves

JaGo vs JaGo: all game stages; no distinction between the winner's and the loser's moves

1,000 to 5,000 cycles; 50, 100 or 200 games

Training ANNs – Results 1

Average of 20 runs

100 & 200 games are better than 50

3,000 and 5,000 cycles don't add strength

Best: 200 games; 2,000 cycles (used hereafter)

Determine the number of hidden neurons:

Many: diverse features

Few: fewer but stronger features (perhaps better ones); less time-consuming

100 neurons yielded the best results and were selected

Output representations: Standard (SOR) vs Row-Column (RCOR), with 200 games, 2,000 cycles and 100 hidden neurons

Similar strength; RCOR competence slightly lower

RCOR is still expensive and adds constraints, so SOR is used in the following experiments

(81) 56

Training ANNs – IR Experiments

Various input representations Used reference-ANN (RANN)

SIR & SOR; 100 hidden; 7,600 connections Strength = 0.2908; Competence = 0.8467

2,000 games; 200 cycles NIR (half input size) & SOR

Strength = 0.2093; Competence = 0.8031 Naïve input makes it difficult to learn Go

LVIR (3x3 windows) & SOR Strength = 0.2755; Competence = 0.8258 Slightly lower; LVIR doesn’t add input

difficulty

Training ANNs – IRs [2]

Whole co-occurrence matrix (dist = 1, 2, 3) with SIR & SOR: found better strength & competence! The knight's-move matrix adds relevant information

Whole matrix (dist = 1, 2, 3) with NIR & SOR: 21% fewer connections due to NIR; better than standard NIR, but still low

Training ANNs – IRs [3]

3x3 matrices (dist = 1, 2, 3) with NIR & SOR: low, but ~20% better than the previous (whole-matrix) NIR

3x3 matrices (dist = 1, 2, 3) with LVIR/NLVIR, where both the matrices and the board views use 3x3 windows: no improvement; the huge number of neurons is not necessary

Training ANNs – IRs Summary

[Figure: summary of the IR experiments]

Trained ANNs perform better against JaGo than against Naïve, although JaGo is the stronger player

Some over-fitting to good players

Against Naïve, the outputs are close to zero: no response

NIR ANNs are generally weaker than SIR ANNs

A Manhattan distance of 2 is good against Random

IR + whole matrix (dist = 2) was the strongest

The RANN is still the best and was selected for evolution

Outline

Computers and Games

The game of Go

Experimental Setup

Training of Go playing ANNs

Evolution of Go Playing ANNs

Summary and Outlook

Evolving Go ANNs

Setup of the evolution experiments

Evolution of ANNs against computer players: Random Player; Naïve; JaGo

Recurrent nets against JaGo

Coevolution: Cultural; Hall of Fame

Training evolved ANNs

Evolution Setup

5x5 boards; komi of 5.5

50 individuals, described previously (3 chromosomes)

GMLPs with SIR and SOR: max 3,010 connections

Recurrent ANNs using NIR (25 neurons) and SOR (26): max 2,601 connections

Same strength measure as in training (6,000 games)

Evolution Against Random

64 games (chosen empirically) to determine fitness

Best ANN evolved: strength 0.4005, competence 0.48, after 47 generations; 929 connections

Evolved ANNs hardly reacted to different positions: always in the middle, never in the corners, creating eyes; it is unnecessary to "think" against Random

Occasionally Random places a stone at a strategic intersection and then usually wins

Only 3 of the 20 best ANNs open at the optimal C3

Evolution Against Naïve

A better player, so the ANNs develop better strategies

Same setting

It takes 200 generations for the WHOLE population to win ½ of the games: fast learning

Best: strength 0.69, competence 0.487, after 2,915 generations

High strength with only 10 hidden neurons!!

Win rates: the same against Naïve and Random; low against JaGo (~0.2)

25% use the optimal opening move (still low)

Exploit Naïve's weaknesses in endgames

Evolution Against JaGo

Far stronger than Naïve (85% wins), but takes significantly more time for each move

Used distributed computing: 64 games would take 32 hours per run

Only 32 games for fitness: empirically sufficient

Best: strength 0.772, competence 0.476, after 1,909 generations, scoring 100% wins

1,000 generations to score 0.4; in 4 runs, 100% wins within 3,000 generations!!!

Standard deviation twice as large: harder for evolution

Weak against Naïve (~0.4); strong against Random

Evolution Against JaGo

Again, low competence (~0.5)

Evolved strategies:

Still connecting stones, but faster (responsive)

Tenuki (abandon the local fight and play elsewhere) to distract JaGo

9 open optimally; all open in the 3x3 area around the center

Strength depends heavily on the opening move

Mid-games sometimes show standard Go sequences!

Take advantage of JaGo's weakness: capturing weak stones

Recurrent Nets Evolution

Natural representation of the Go board: inputs are connected

More time-consuming: only 2 runs; 32 games; setting described previously

100% win rate within 1,000 generations!!!

Both nets open at C3

Strategies: 1 aggressive, 1 distractive; protect; create living groups; bad endgames

Very high relative strength: 0.94 against Random; 0.49 against Naïve (never played before)

Cultural Coevolution

Until now, much over-fitting was observed

Fitness: 8 games against each master (4 with each color)

Few games, because the games are quite similar

Results of a typical run (host population, 3,500 generations):

90% wins at generation 500

Stagnation around generation 1,000

Last master added at generation 462

After 2,000 generations, the mean fitness decreases

Cultural Coevolution [2]

Masters: 21 ANNs

After number 8, all have R > 0.8 (R = win rate against Random)

The last one obtained a strength of 0.365

Strategy (both populations): many random move selections, due to many saturated neurons (output = 1)

Games are usually similar, but multiple random moves are hard to defeat

This may be caused by the mutation (multiple-σ self-adaptation)

Cultural Coevolution [3]

Strategy (cont.): coevolution found an easy solution; computer players are very difficult to beat with saturated neurons

A new, extremely long experiment (60,000 generations!) was performed with a different mutation (single-σ SA)

Similar results, except:

Most culture growth now happens until generation 10,000 (last master at 40,000)

Fewer saturated neurons

Less fitness decrease despite increasing culture strength

Cultural Coevolution [4]

Culture summary: 80 members

After #16, Random > 0.94

After #29, all opened optimally

After #57, all have strength > 0.4

Wins against JaGo ~0.5 Naïve

~15 hidden neurons, fluctuating between successive masters

Recurrent & Cultural

10,000 generations

Faster learning, but basically the same results

R > 0.9 at C11 (compared to C14)

N > 0.2 at C14 (compared to C37)

Strategy: still bad against JaGo

Bad openings! (only 2% optimal)

Only the last 5 masters play close to the center

Learned not to capture dead groups

Hall of Fame Coevolution

Compared to cultural coevolution

Parameters: the important parameter is the HoF size, in {1, 2, 4, 8, 16}

Eight games against each master

3,000 generations were coevolved

After coevolution, all HoF ANNs were evaluated

Every 100 generations, the best ANN was evaluated

Hall of Fame Coevolution [2]

Results, HoF size 1: the masters have a low strength of 0.3625

In generation 1,000 one ANN had 0.4, but this solution was lost

The HoF changed every generation: cycles

Results, HoF size 16: master 5 had the highest strength, 0.4403, in generation 400

A strength of 0.5057 was obtained and lost

One master was replaced in every generation!

Somehow weak masters remained in the HoF

The host population stagnates (cycles)

Hall of Fame Coevolution [3]

Strategies: all place the first stone at D4!

HoF coevolution does not encourage diversity among ANNs

Training Evolved ANNs

Evolution against JaGo: strength ~0.77 with 4-16 hidden neurons

Training: strength ~0.3 with 100 hidden neurons

Check whether the evolved structure is good:

Train after evolution

Train without evolution, using only the structure

Training Evolved ANNs [2]

Used the best 2 ANNs evolved against JaGo, taken from runs 11 & 17:

ANN11: 10 hidden neurons; 1,178 connections

ANN17: 14 hidden neurons; 1,162 connections

Trained with 200 games; 2,000 cycles

Experiment 1 (post-evolution) results: bad! Strengths of 0.11 and 0.10, lower than any trained ANN (the RANN has 0.29)

High competence (0.89)

Training Evolved ANNs [3]

Experiment 2: keep only the evolved structure

Strength below 0.152 (the RANN has 0.29)

Weakest against JaGo (0.05), although trained with JaGo

Against Naïve 0.11 (the same as the RANN)

Evolution creates efficient structures with few hidden neurons, which are difficult to learn with training

High competence, because they seldom responded with the same move to different positions

Summary

Training could not achieve high Go-playing skills

Evolved ANNs specialized in the opponent that was used during evolution

Cultural coevolution generated strong players, with strength increasing throughout the process

Perhaps an ANN stronger than amateurs can be coevolved

Recurrent nets learned faster

Summary [2]

2 coevolved ANNs (recurrent and feed-forward) won the grand tournament

Coevolution proved better than evolution for developing Go strategies

Recurrent ANNs would provide a field for further research: a more natural board representation, which could contain a fixed input layer representing the board
