Evolution and Coevolution of ANNs playing Go
Peter Mayer, 2004
(81) 2
Outline
Computers and Games
The game of Go
Experimental Setup
Training of Go playing ANNs
Evolution of Go Playing ANNs
Summary and Outlook
(81) 3
Games
Algorithms designed since AI's onset
Clearly defined rules
Still complex
Chess received the most attention
More researched than Go
Two main approaches
Rely on expertise: directly programmed weighted features; extensive knowledge
Use evolution: less knowledge; more versatility
(81) 4
The game of Go
Oldest (unaltered)
strategic board game in
the world
10,000,000 players in
Japan “alone”
Fairly simple rules
BUT difficult to master
Immense game tree (~200 options per move)
Complex structures
Many concurrent goals
(81) 5
Go Rules
19x19 board
Empty in the beginning
Black & White “stones”
Black starts
Each turn:
Place 1 stone at an intersection
Never move stones
OR pass
(81) 6
Go Rules [2]
Objective: get the most points!
Points are acquired by:
Securing territories
Capturing the opponent’s pieces
(81) 7
Go Rules [3]
Stones of the same color on vertically or horizontally adjacent intersections form a group
An empty intersection adjacent to a stone or group is called a "liberty" of that group
1 liberty = group is in “atari”
No liberties -> CAPTURE! The group is removed
Example: Black places a stone at X, resulting in the right figure
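The group and liberty rules above can be sketched in a few lines. This is only an illustration, not code from the presented work: the board encoding (`"B"`, `"W"`, `"."` in a list of strings) and the function name `group_and_liberties` are assumptions made for the example.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]

def group_and_liberties(board: List[str], start: Point) -> Tuple[Set[Point], Set[Point]]:
    """Flood-fill the group containing `start`; return (stones, liberties)."""
    rows, cols = len(board), len(board[0])
    color = board[start[0]][start[1]]
    if color == ".":
        raise ValueError("start must point at a stone")
    stones: Set[Point] = set()
    liberties: Set[Point] = set()
    stack = [start]
    while stack:
        r, c = stack.pop()
        if (r, c) in stones:
            continue
        stones.add((r, c))
        # Only vertical and horizontal neighbours count (no diagonals).
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                if board[nr][nc] == ".":
                    liberties.add((nr, nc))
                elif board[nr][nc] == color:
                    stack.append((nr, nc))
    return stones, liberties

# Hypothetical 5x5 position: the single White stone has one liberty (atari).
ATARI = [
    ".B...",
    "BWB..",
    ".....",
    ".....",
    ".....",
]
stones, libs = group_and_liberties(ATARI, (1, 1))
in_atari = len(libs) == 1   # a Black stone at (2, 1) would capture
```

A group whose liberty set becomes empty after the opponent's move is removed from the board.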
(81) 8
Go Rules [4]
Stones can be placed anywhere, but cannot commit suicide (except Chinese)
Legal if the stone simultaneously captures an opponent’s group (2 right figures)
Suicide: White cannot place at X
White CAN place at X
Result: capture
(81) 9
Go Rules [5]
The same position cannot occur more than once
Endless repetitions:
Black can capture in the upper figure by placing at X
White can do the same by placing at Y
Black repeats…
Ko rule: White may not place at Y before playing somewhere else first
Avoids any repetitions
(81) 10
Go Rules – Live and Dead groups
“Dead” groups: impossible to prevent their capture
It is not necessary to actually capture them
Group remains on the board
At the end of the game, removed and added to the captured stones
“Living” groups are impossible to capture
Group with 2 “eyes”: even if White surrounds it, playing at X or Y is suicide
Opponent must play elsewhere
(81) 11
Go Basics – End game
Play continues until both players pass
Players then alternately play stones at “neutral” points, adjacent to both White and Black
Also known as “dame” (DAH-MAY)
Dead stones are removed from the board and
counted with other prisoners (1 point per
prisoner)
Also - 1 point for each intersection surrounded
by player’s stones (“territory”)
(81) 12
Go Basics – End game example
Prisoners were removed already
All 4 points marked X are dame – worthless
Black has 7 points in UR (territory); 2 points in LL
1 removed prisoner
TOTAL = 10 points
White has 5 in UL; 2 in LR
2 prisoners
TOTAL = 9 points
Black wins unless komi (5.5 pts compensation) is
due
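The arithmetic of the example above, written out as a small sketch. The function name `final_score` is hypothetical, and adding komi to White's total is the usual convention, assumed here.

```python
def final_score(territories, prisoners, komi=0.0):
    """Territory points plus prisoners (1 point each) plus any komi."""
    return sum(territories) + prisoners + komi

black = final_score([7, 2], prisoners=1)                       # UR + LL + prisoner
white = final_score([5, 2], prisoners=2)                       # UL + LR + prisoners
white_with_komi = final_score([5, 2], prisoners=2, komi=5.5)   # komi goes to White
```

Without komi Black leads 10 to 9; with the 5.5-point komi White's total becomes 14.5.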
(81) 13
Ranking and Handicaps
Determine Go players’ strength
Resemblance to martial arts
Both amateur and professional ranking systems
Amateur: 35 kyu to 1 kyu, THEN 1 dan to 7 dan
Pro: 1 dan to 9 dan
Awarded only by Go institutions
Pro dans are much stronger than amateur dans
(81) 14
Ranking and Handicaps (2)
Handicaps
Weaker player starts with several stones on the board
Placed at specific places
Helps make games more even
Difference in ranks ~ number of handicap stones needed to even the game
2 stones to even 2 dan against 4 dan
4 stones to even 3 kyu against 2 dan
The most powerful Go programs reach only … 10 kyu!
(81) 15
Outline
Computers and Games
The game of Go
Experimental Setup
Training of Go playing ANNs
Evolution of Go Playing ANNs
Summary and Outlook
(81) 16
Experimental Setup
Opponent Go players
ANN player
Go board (input) representations
Move (output) representations
Coevolution Hall of Fame coevolution
Cultural coevolution
General evolution setup
(81) 17
Go Players - Random
No strategy
May also pass
“Knows” only the rules of Go (legality of moves)
Usually the weakest opponent
(81) 18
Go Players – Naïve Player
Roughly human-beginner level
Able to save and capture stones
Knows about Lost stones
Saving - connecting stones to living groups
Weak stones (not savable)
(81) 19
Go Players – Naïve Strategy
A subset of the strategy of JaGo (the main opponent)
Outline (arranged by priority):
Attempt to save
Try to put opponent into atari
Connect weak stones
Capture opponent groups in atari
Check intersections for placing stones, in random order
Make sure no (own) liberties decrease below 2 as a result
Perform random move
(81) 20
Go Players – JaGo Player
Java-based program
Best computer player used
Not a strong player: ~16 kyu
Knows standard techniques
Mainly save & capture
Uses pattern matching
Looks at the entire board
32 patterns, with rotations and mirrors
(81) 21
Go Players – JaGo Strategy (1)
Save stones in atari
Try to decrease liberties of large groups
Find own savable larger groups
Attack opponent’s groups (in decreasing order):
With 2 or more liberties and attackable
With 2 or more stones & fewer than 3 liberties
With 2 or fewer liberties
(81) 22
Go Players – JaGo Strategy (2)
Save own groups with few liberties if savable
Start pattern matching (Response; Center)
Random move order
Seek opponent’s groups to capture in 2 moves
Perform a random move which isn’t of a bad pattern
Capture opponent’s single liberties
Connect own weak stones
PASS
(81) 23
Go Players – JaGo Patterns (1)
(81) 24
Go Players – JaGo Patterns (2)
(81) 25
Go Players – GNU Go
Advantages
5x5 to 19x19 boards
Handles handicaps well
Rated 10 kyu
Problems
5x5 is solved: open at C3 for 18.5 points (komi=5.5); always wins as Black
GNU Go passes on B3, C2-4, D3 (only correct at C3)
Premature convergence of evolution
(81) 26
ANN Player
Inform the ANN about the actual position
Evaluate the ANN output to obtain the next move
Representation is important!
Intention maps
For each Go move (including PASS): a value in [0,1]
High value = high intention to make the move (and vice versa)
Select the legal move with the highest value
To avoid predictability, also consider suboptimal moves (“creativity factor”)
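Move selection from an intention map can be sketched as follows. The slides do not say how the “creativity factor” is applied; picking the second-best legal move with probability `creativity` is an assumption made purely for illustration, as are the function name and the move indexing (0..24 for intersections, 25 for PASS).

```python
import random
from typing import Optional, Sequence

def select_move(intentions: Sequence[float], legal: Sequence[int],
                creativity: float = 0.0,
                rng: Optional[random.Random] = None) -> int:
    """Return the legal move with the highest intention value.

    With probability `creativity` (hypothetical mechanism) the second-best
    legal move is returned instead, to make play less predictable.
    """
    if not legal:
        raise ValueError("no legal move given (PASS is normally always legal)")
    rng = rng or random.Random()
    # Sort by index first so that ties are broken deterministically.
    ranked = sorted(sorted(legal), key=lambda m: intentions[m], reverse=True)
    if len(ranked) > 1 and rng.random() < creativity:
        return ranked[1]
    return ranked[0]
```

Illegal moves are simply never offered to the selector, so a high intention on an occupied intersection has no effect on play (it would only affect competence).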
(81) 27
Player Strength
Commonly, to receive a rating, unrated Go players play against rated players (same in Chess)
The strength s of a player is determined by:
The score of 1000 double games
Against each of 3 opponents: Random (R), Naïve (N), JaGo
Divided by the number of games (6,000)
1 is perfect strength
3 opponents help resist over-fitting
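As a rough sketch of the strength measure, assuming “score” means the number of games won (the evolution setup later uses win rates rather than point scores); the function name and the dictionary layout are hypothetical.

```python
def strength(wins_per_opponent, double_games=1000):
    """Fraction of games won over all opponents.

    A double game is two games, one with each color, so three opponents
    at 1000 double games each give 6,000 games in total.
    """
    total_games = 2 * double_games * len(wins_per_opponent)
    return sum(wins_per_opponent.values()) / total_games

perfect = strength({"Random": 2000, "Naive": 2000, "JaGo": 2000})
half = strength({"Random": 2000, "Naive": 1000, "JaGo": 0})
```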
(81) 28
Player Competence
Strength is not understanding of the rules (legality)
E.g. 2 players receive the same score but only one always tried legal moves first
The competence C of a player is defined as follows:
bi = games; i = moves; tij = #tried illegal moves; kij = #possible illegal moves
C is averaged over all games
(81) 29
Board Representations
19x19 boards are far too large
Even for evolved agents
Use only 5x5 boards
(81) 30
Board Representations
Should preprocess the position to make the ANN’s life easier
Tested in training experiments
Standard Input Representation (SIR)
2 neurons at each intersection: 1 for the player’s piece; 1 for the opponent’s
No distinction between B and W stones
Optional: 1 neuron to tell if B or W
(2*b^2) neurons (where b is the board size) = 50
(81) 31
Representations - NIR
Naïve Input Representation
More compact
1 neuron per intersection
Set to -1 (player’s stone) or 1 (opponent’s)
0 if empty
Uses half of SIR’s neurons = 25
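The SIR and NIR encodings can be sketched as below. The board format (list of strings with `"B"`, `"W"`, `"."`), the function names, and the interleaved neuron order in `encode_sir` are assumptions for illustration; only the neuron counts and the value conventions come from the slides.

```python
from typing import List

def _opponent(player: str) -> str:
    return "W" if player == "B" else "B"

def encode_sir(board: List[str], player: str) -> List[float]:
    """Standard Input Representation: 2 neurons per intersection.

    First neuron: own stone present; second neuron: opponent stone present.
    """
    opp = _opponent(player)
    out: List[float] = []
    for row in board:
        for cell in row:
            out.append(1.0 if cell == player else 0.0)
            out.append(1.0 if cell == opp else 0.0)
    return out

def encode_nir(board: List[str], player: str) -> List[float]:
    """Naive Input Representation: -1 own stone, 1 opponent stone, 0 empty."""
    opp = _opponent(player)
    return [-1.0 if cell == player else 1.0 if cell == opp else 0.0
            for row in board for cell in row]

# Hypothetical 5x5 position.
BOARD = [
    ".....",
    ".BW..",
    "..B..",
    ".....",
    ".....",
]
sir = encode_sir(BOARD, "B")   # 2 * 5^2 = 50 values
nir = encode_nir(BOARD, "B")   # 5^2 = 25 values
```

Because both encodings are relative to the player to move, the same net can play either color.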
(81) 32
Representations - LVIR
Limited View Input Representation
Splits the Go board into several square areas of size 3x3
Idea: the simplest way of capturing stones works within this area
E.g. capture of 1 stone by surrounding it
Areas overlap at the middle row and middle column
Coding is similar to SIR: with w the number of areas (=4), 2*9*w = 72 neurons
Could also be Naïve
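One way to enumerate the four overlapping 3x3 areas on a 5x5 board is sketched here. The function name and the placement of the window origins at the corners are assumptions, chosen so that the areas overlap exactly at the middle row and middle column as stated above.

```python
from typing import List, Tuple

def lvir_windows(board_size: int = 5, window: int = 3) -> List[List[Tuple[int, int]]]:
    """Return the coordinates covered by each corner-anchored window."""
    step = board_size - window            # 2 on a 5x5 board
    origins = [(r, c) for r in (0, step) for c in (0, step)]
    return [[(r + dr, c + dc) for dr in range(window) for dc in range(window)]
            for r, c in origins]

windows = lvir_windows()
# SIR-style coding uses 2 neurons per intersection in each window.
neurons = 2 * sum(len(w) for w in windows)
```

The center intersection appears in all four windows, which is why the total exceeds the 50 neurons of plain SIR.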
(81) 33
Clever Representations
Based on image processing and circuits
We want fewer raw inputs to allow the ANN to concentrate more on features
Manhattan distance
Used in integrated circuits where wires run parallel to the X or Y axis
Got its name from Manhattan, NY, where streets are aligned in a grid
P1 = (x1, x2)
P2 = (y1, y2)
d(P1, P2) = |x1 - y1| + |x2 - y2|
(81) 34
Clever Representations
Manhattan distance is related to the distance of Go stones (no diagonals)
distance = [#(separating stones) + 1]
1 if next to each other
2 if separated by one stone
3 for knight’s move or two separating stones
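The three cases listed above can be checked directly with a minimal sketch (the function name is illustrative; points are (row, column) pairs):

```python
def manhattan(p1, p2):
    """Sum of the absolute coordinate differences of two board points."""
    return abs(p1[0] - p2[0]) + abs(p1[1] - p2[1])

adjacent = manhattan((2, 2), (2, 3))       # next to each other
one_between = manhattan((2, 2), (2, 4))    # separated by one stone
knights_move = manhattan((2, 2), (3, 4))   # one step one way, two the other
two_between = manhattan((2, 2), (2, 5))    # two separating stones
```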
(81) 35
Representations: c-o-Matrix
Co-occurrence-matrices
Used in image processing
Many parameters are derived from it
Mean, Sd, energy, contrast, homogeneity, …
Square (quadratic) matrix
Based on a relation p between image positions (symmetric if p is)
(81) 36
Representations: c-o-Matrix
Element C[i][j] = number of times a pixel of value (color) i occurs in the relation specified by p to a pixel of value (color) j
The matrix size is the number of different colors
(81) 37
Representations: c-o-Matrix
An actual go board is an “image” with
3 different colors (including empty)
Example p1: Manhattan distance of 1 between 2 points
First matrix row:
B near B 16 times
B near W 3 times
B near empty 11 times
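A co-occurrence matrix for a Go position can be sketched as follows. Whether the original work counted each pair once or in both directions is not stated; this sketch counts ordered pairs (which makes the matrix symmetric), and the board format and function name are assumptions.

```python
from typing import List

def cooccurrence(board: List[str], distance: int = 1,
                 colors: str = "BW.") -> List[List[int]]:
    """Count ordered pairs of intersections at the given Manhattan distance.

    Entry [i][j] is the number of times color i lies at `distance`
    from color j. Rows and columns follow the order given in `colors`.
    """
    index = {color: i for i, color in enumerate(colors)}
    n = len(colors)
    matrix = [[0] * n for _ in range(n)]
    cells = [(r, c) for r in range(len(board)) for c in range(len(board[0]))]
    for r1, c1 in cells:
        for r2, c2 in cells:
            if abs(r1 - r2) + abs(c1 - c2) == distance:
                matrix[index[board[r1][c1]]][index[board[r2][c2]]] += 1
    return matrix

tiny = cooccurrence(["BW", ".."])      # 2x2 toy position
empty = cooccurrence(["....."] * 5)    # empty 5x5 board
```

The matrix has 9 entries regardless of board size, which is what makes it attractive as a compact input feature.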
(81) 38
Representations: c-o-Matrix
Does not say much about absolute positions, so it must be combined:
SIR and C for whole board
NIR and C for whole board
NIR and Cs for 3x3 areas
sLVIR and Cs for 3x3 areas
NLVIR and Cs for 3x3 areas
(81) 39
Output Representations
Only 2
Standard Output Representation (SOR)
Each intersection is represented by 1 neuron
1 for PASS
(b^2 + 1) neurons
(81) 40
Output Representations
Row Column Output Representation (RCOR)
Used to decrease ANN size
5 neurons for columns; 5 for rows
1 for PASS
(2b + 1) neurons
Intention is more complicated:
PASS intention is the square of the relevant neuron
RCOR limits the intention map: v1>v2 y1>y2 v4>v3
All values positive, non-zero
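How an RCOR output might be expanded into a full intention map is sketched below. The slides only state that the PASS intention is the square of its neuron; taking the intention of an intersection as the product of its row and column neurons is an assumption (it puts moves and PASS on the same scale), and the function name is hypothetical.

```python
from typing import List, Sequence

def rcor_intentions(rows: Sequence[float], cols: Sequence[float],
                    pass_value: float) -> List[float]:
    """Expand 2b + 1 RCOR outputs into b^2 + 1 intentions.

    Assumed rule: intention(r, c) = rows[r] * cols[c]; PASS = pass_value ** 2.
    """
    b = len(rows)
    if len(cols) != b:
        raise ValueError("need the same number of row and column neurons")
    intentions = [rows[r] * cols[c] for r in range(b) for c in range(b)]
    intentions.append(pass_value ** 2)
    return intentions

b = 5
sor_neurons = b * b + 1    # 26
rcor_neurons = 2 * b + 1   # 11
imap = rcor_intentions([0.5] * b, [0.5, 0.25, 0.5, 0.5, 0.5], 0.5)
```

Under this product rule the map cannot be arbitrary: if one row neuron exceeds another, every intersection in that row is preferred over the same column in the other row.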
(81) 41
Coevolution
Derives non-static fitness, as in nature
1 or more populations, interacting
Competitive [battle] vs. Cooperative [subtasks]
Advantages
“Who needs enemies when you got friends like these?”: saves finding opponents, especially in Go where no strong program exists
Variety in fitness: adaptive opponents
No upper bound for improvement
(81) 42
Coevolution Methods Applied
Based on work by Lubberts & Miikkulainen [2001]
Hall of Fame
Host population and Master population
Maintaining the ability of the host population to beat opponents of previous generations
Each generation, the best individual is added to the HoF
The whole population competes against a sample of the HoF
(81) 43
Coevolution - HoF
Applied in this research
HoF initially filled without competition
Individuals get their fitness by competing against the masters
When full, the host with the highest win rate (against masters) joins the HoF
Replaces the first master to lose all games
Coevolutionary progress cannot be directly seen
Both populations are constantly changing
(81) 44
Cultural Coevolution
A new approach!
Maintains a “culture” of masters resembling the HoF
To enter the culture, a host must defeat all masters
Masters are never replaced: unlimited culture size
Every individual receives a fitness score by competing against all masters
Culture growth rate decreases rapidly
Every new master is the strongest found (yet)
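The admission rule for the culture can be sketched as below. The `beats` callback stands in for actually playing games between two nets; it, the function name, and the toy players (plain integers compared by size) are all hypothetical.

```python
from typing import Callable, List, TypeVar

Player = TypeVar("Player")

def try_enter_culture(host: Player, culture: List[Player],
                      beats: Callable[[Player, Player], bool]) -> bool:
    """Add `host` to the culture only if it defeats every current master.

    Masters are never removed, so the culture can only grow.
    """
    if all(beats(host, master) for master in culture):
        culture.append(host)
        return True
    return False

# Toy stand-in: a player is a number and the larger number always wins.
culture: List[int] = []
results = [try_enter_culture(p, culture, lambda a, b: a > b)
           for p in (3, 2, 3, 5)]
```

In this toy run the second `3` is rejected because it cannot defeat the copy of itself already in the culture, and each accepted master is stronger than all previous ones.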
(81) 45
Cultural Coevolution [2]
Numerous advantages
Maintains the ability to defeat weak players
Keeps good solutions found
Same player cannot enter twice
Needs to defeat itself
Culture’s performance never decreases
Avoids focusing on a specific player’s weakness
As soon as any master is immune, the hosts have to find another way
More masters: less likely to remember all weaknesses
(81) 46
General Evolution Setup
Opponents: Random; Naïve; JaGo
Fitness = strength
Rate of wins against all 3 opponents
6,000 games of both colors
Not using scores, only win rates
Defeating more opponents is better
Generalized Multi-Layer Perceptrons (GMLPs)
All non-loop connections are permitted
Evolving: hidden neurons; connections; weights; bias (for non-input)
(81) 47
General Evolution Setup [2]
2 binary chromosomes used
1 for connections: 0 = no, 1 = yes
1 for hidden neurons (if 0, no connections either)
Number of possible connections:
ni, nh, no: number of input, hidden and output neurons
Determines the size of the chromosome
Real chromosome
Weights & bias values (seen as weights)
Size is the number of connections + number of bias values (for non-input)
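The formula for the number of possible connections did not survive on the slide. The sketch below is a reconstruction, not the author's stated formula: it assumes hidden neurons are ordered and every forward (non-loop) link is allowed, and it assumes a cap of 20 hidden neurons. Under those assumptions it reproduces the connection counts quoted elsewhere in the deck (3,010 for the GMLPs, 7,600 for the trained three-layer nets, 2,601 for the recurrent nets).

```python
def gmlp_max_connections(n_in: int, n_hidden: int, n_out: int) -> int:
    """Forward links: in->hidden, in->out, hidden->out, hidden->later hidden."""
    return (n_in * n_hidden + n_in * n_out + n_hidden * n_out
            + n_hidden * (n_hidden - 1) // 2)

def layered_connections(n_in: int, n_hidden: int, n_out: int) -> int:
    """Fully connected 3-layer feed-forward net (no skip connections)."""
    return n_in * n_hidden + n_hidden * n_out

def fully_recurrent_connections(n_neurons: int) -> int:
    """Every neuron may connect to every neuron, including itself."""
    return n_neurons * n_neurons

gmlp = gmlp_max_connections(50, 20, 26)        # SIR in, SOR out, 20 hidden (assumed)
trained = layered_connections(50, 100, 26)     # nets used in the training runs
recurrent = fully_recurrent_connections(25 + 26)  # NIR in, SOR out, no hidden
```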
(81) 48
General Evolution Setup [3]
Tournament selection (size 2)
2-point crossover
Binary mutation
Flip bits with 1/L probability
Real-chromosome mutation: multiple-σSA
Each object maintains altering “strategy” params which alter the distribution of “object” params
Normal distributions used for both
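The operators for the binary chromosomes can be sketched in generic textbook form. This is not the author's implementation; the function names and the use of Python lists for chromosomes are assumptions, and the self-adaptive mutation of the real-valued chromosome is left out.

```python
import random
from typing import Callable, List, Sequence, Tuple

def tournament_select(population: Sequence, fitness: Callable,
                      rng: random.Random, size: int = 2):
    """Draw `size` distinct individuals at random and return the fittest."""
    contenders = rng.sample(list(population), size)
    return max(contenders, key=fitness)

def two_point_crossover(a: List[int], b: List[int],
                        rng: random.Random) -> Tuple[List[int], List[int]]:
    """Swap the segment between two random cut points."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]

def bit_flip_mutation(bits: List[int], rng: random.Random) -> List[int]:
    """Flip each bit independently with probability 1/L (L = chromosome length)."""
    p = 1.0 / len(bits)
    return [1 - bit if rng.random() < p else bit for bit in bits]
```

With a flip probability of 1/L, one bit per chromosome is changed on average, independent of the chromosome length.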
(81) 49
Setup – Recurrent Nets
Difficult to learn Go without structured input
Experiments with recurrent nets included
Allow loops for input Ns
Naturally represent adjacent board intersections
No hidden Ns
Played against JaGo
Typically output changes without input change due to feedback loops
Computed output only once!
Only 2 directly connected Ns influence each other
Evolution should connect only close Ns
(81) 50
Outline
Computers and Games
The game of Go
Experimental Setup
Training of Go playing ANNs
Evolution of Go Playing ANNs
Summary and Outlook
(81) 51
Training ANNs – Setup
Testing the IRs mentioned previously
No Go-specific knowledge used
Each experiment was repeated 20 times
Nets same as Richards [1998]
3 layers; fully connected; feed forward
Linear activation for input Ns; sigmoid for the rest
50 input; 26 output; 100 hidden: 7,600 connections
Patterns: JaGo vs JaGo; 5x5 board
Rprop: resilient variant of Backprop
(81) 52
Training ANNs – Experiment 1
Determine number of training cycles
Too few cycles: weights not adjusted properly
Too many: over-fitting
Determine training pattern set
Limits the level a Go player can reach
Should include all 3 game stages
Both expert and novice moves
JaGo vs JaGo
All game stages
No distinction between winner and loser moves
1,000 .. 5,000 cycles; 50/100/200 games
(81) 53
Training ANNs – Results 1
Average of 20 runs
100 & 200 games better than 50
3,000/5,000 cycles don’t add strength
Best: 200 games; 2,000 cycles
Used hereafter
(81) 54
Training ANNs – Experiment 2
Determine number of hidden Ns
Many: diverse features
Few: fewer but stronger features (perhaps better ones)
Less time-consuming
100 Ns yielded the best results and was selected
(81) 55
Training ANNs – Experiment 3
Output representations
Standard (SOR) vs Row-Column (RCOR)
200 games; 2,000 cycles; 100 hidden Ns
Similar strength; RCOR competence slightly lower
RCOR is still expensive and adds constraints
SOR is used in the following experiments
(81) 56
Training ANNs – IR Experiments
Various input representations
Used reference ANN (RANN)
SIR & SOR; 100 hidden; 7,600 connections
Strength = 0.2908; Competence = 0.8467
200 games; 2,000 cycles
NIR (half input size) & SOR
Strength = 0.2093; Competence = 0.8031
Naïve input makes it difficult to learn Go
LVIR (3x3 windows) & SOR
Strength = 0.2755; Competence = 0.8258
Slightly lower; LVIR doesn’t add input difficulty
(81) 57
Training ANNs – IRs [2]
Whole co-occurrence matrix (dist=1,2,3); SIR & SOR
Found better strength & competence!
Knight’s-move matrix adds relevant information
Whole matrix (dist=1,2,3); NIR & SOR
21% fewer connections due to NIR
Better than standard NIR, but still low
(81) 58
Training ANNs – IRs [3]
3x3 matrices (dist=1,2,3); NIR & SOR
Low but ~20% better than previous (whole matrix) NIR
3x3 matrices (dist=1,2,3); LVIR/NLVIR
Both matrices and board views use 3x3 windows
No improvement; huge number of Ns not necessary
(81) 59
Training ANNs – IRs Summary
(81) 60
Training ANNs – IRs Summary
Trained ANNs do better against JaGo than against Naïve
Although JaGo is the better player
Some over-fitting for good players
Against Naïve, outputs are close to zero: no response
NIR ANNs generally weaker than SIR
Manhattan distance of 2 is good against Random
IR + whole matrix (dist=2) was strongest
RANN is still best; selected for evolution
(81) 61
Outline
Computers and Games
The game of Go
Experimental Setup
Training of Go playing ANNs
Evolution of Go Playing ANNs
Summary and Outlook
(81) 62
Evolving Go ANNs
Setup of Evolution experiments
Evolution of ANNs against Computer Players
Random Player; Naïve; JaGo
Recurrent against JaGo
Coevolution
Cultural
Hall of Fame
Training Evolved ANNs
(81) 63
Evolution Setup
5x5 boards; komi of 5.5
50 individuals
Described previously (3 chromosomes)
GMLPs with SIR and SOR
Max 3,010 connections
Recurrent ANNs
Using NIR (25 Ns) and SOR (26)
Max 2,601 connections
Same strength measure as training (6k games)
(81) 64
Evolution Against Random
Empirically, 64 games to determine fitness
Best ANN evolved {Str=0.4005; Comp=0.48}
After 47 gens; 929 connections
Evolved ANNs hardly reacted to different positions
Always in the middle; never in corners (creates eyes)
Unnecessary to “think” against Random
Occasionally Random places at a strategic intersection and then usually wins
Only 3 of 20 best ANNs open at optimal C3
(81) 65
Evolution Against Naïve
Better player; ANNs develop better strategies
Same setting
200 gens for ALL the population to win ½ of games (fast learning)
Best {Str=0.69; Comp=0.487} after 2915 gens
High strength and only 10 hidden!!
Win rates
Same against Naïve and Random
Low against JaGo (~0.2)
25% use the optimal opening move (still low)
Exploit Naïve’s weaknesses at endgames
(81) 66
Evolution Against JaGo
Far stronger than Naïve (85% wins)
Takes significantly more time for each move
Used distributed computing
64 games would take 32 hours per run
Only 32 games for fitness (empirically sufficient)
Best {Str=0.772; Comp=0.476} after 1909 gens
Scores 100% wins
1k gens to score 0.4; in 4 runs 100% wins in 3k gens!!!
Sd twice as large: harder for evolution
Weak against Naïve (~0.4); strong against Random
(81) 67
Evolution Against JaGo
Again, low competence ~0.5
Evolved strategies
Still connecting stones but faster (responsive)
Tenuki (abandon & play elsewhere) to distract JaGo
9 open optimally; all in the 3x3 area around the center
Strength depends heavily on the opening move
Mid games sometimes show standard Go sequences!
Take advantage of JaGo’s weakness: capturing weak stones
(81) 68
Recurrent Nets Evolution
Natural representation of the Go board
Inputs are connected
More time-consuming
Only 2 runs; 32 games; setting described previously
100% win rate within 1k generations!!!
Both nets open at C3
Strategies
1 aggressive; 1 distractive
Protect; create living groups; bad endgames
Very high relative strength
0.94 Random; 0.49 Naïve (never played before)
(81) 69
Cultural Coevolution
Until now much over-fitting was observed
Fitness
8 games against all masters (4 each color)
Few because games are quite similar
Results of a typical run (host population)
3,500 gens
90% wins at 500 gens
Stagnation around 1k
Last master added at 462
After 2k, mean fitness decreases
(81) 70
Cultural Coevolution [2]
Masters
21 ANNs
After number 8 all have R>0.8
Last obtained strength of 0.365
Strategy (both populations)
Many random move selections
Due to many saturated Ns (output=1)
Games usually similar but multiple random moves are hard to defeat
May be caused by mutation (multiple self-adaptation)
(81) 71
Cultural Coevolution [3]
Strategy (cont.)
Coevolution found an easy solution
Computer players are very difficult to beat with saturated neurons
New, extremely long experiment (60k gens!) was performed with a different mutation (single-SA)
Similar results, except:
Most culture growth now until gen 10k (last at 40k)
Fewer saturated neurons
Less fitness decrease despite increasing culture strength
(81) 72
Cultural Coevolution [4]
Culture summary
80 members
After #16 Random>0.94
After #29 all opened optimally
After #57 all Strength>0.4
Wins against JaGo ~0.5 Naïve
~15 hidden Ns – fluctuate between successive
(81) 73
Recurrent & Cultural
10k gens
Faster learning but basically the same results
R>0.9 at C11 (compared to C14)
N>0.2 at C14 (compared to C37)
Strategy
Still bad against JaGo
Bad openings! (only 2% optimal)
Only the last 5 masters open close to the center
Learned not to capture dead groups
(81) 74
Hall of Fame Coevolution
Compared to Cultural
Parameters
Important parameter is HoF size = {1,2,4,8,16}
Eight games against each master
3k gens were coevolved
After coevolution all HoF ANNs were evaluated
Every 100 gens the best ANN was evaluated
(81) 75
Hall of Fame Coevolution [2]
Results – HoF size 1
Masters: low strength of 0.3625
In gen 1k one ANN had 0.4: lost solution
HoF changed every generation (cycles)
Results – HoF size 16
Master 5: highest strength of 0.4403 in gen 400
Strength of 0.5057 was obtained and lost
One master was replaced in every generation!
Somehow weak masters remained in the HoF
Host population stagnates (cycles)
(81) 76
Hall of Fame Coevolution [3]
Strategies
All place the first stone at D4!
HoF coevolution does not encourage diversity among ANNs
(81) 77
Training Evolved ANNs
Evolution against JaGo
Strength ~0.77
4-16 hidden Ns
Training
Strength ~0.3
100 hidden Ns
Check whether the evolved structure is good
Train after evolution
Train without evolution, using only the structure
(81) 78
Training Evolved ANNs [2]
Used the best 2 ANNs evolved against JaGo
Taken from runs 11 & 17
ANN11: 10 hidden; 1178 connections
ANN17: 14 hidden; 1162 connections
Trained with 200 games; 2,000 cycles
Experiment 1 (post-evolution)
Results bad! Strength of 0.11 and 0.10
Lower than any trained ANN (RANN has 0.29)
High competence: 0.89
(81) 79
Training Evolved ANNs [3]
Experiment 2: keep only the evolved structure
Strength below 0.152 (RANN is 0.29)
Weakest against JaGo (0.05) although trained with JaGo
Against Naïve 0.11 (same as RANN)
Evolution creates efficient structures
Few hidden Ns
Difficult to learn with training
High competence because they seldom responded with the same move to different positions
(81) 80
Summary
Training could not achieve high Go playing skills
Evolved ANNs specialized in the opponent which was used during evolution
Cultural coevolution generated strong players
Strength increasing throughout the process
Perhaps an ANN stronger than amateurs can be coevolved
Recurrent nets learned faster
Recurrent nets learned faster
(81) 81
Summary [2]
2 coevolved ANNs (recurrent and feed-forward) won the grand tournament
Coevolution proved better than evolution for developing Go strategies
Recurrent ANNs would provide a field for further research
More natural board representation
Could contain a fixed input layer representing the board