Genetic Programming: Game Playing
Evolve a strategy for two-person zero-sum games.
Help the user to determine the next move.
Constructing a game tree
  Each node represents a state in the game
  Each arc represents a legal move
The minimax algorithm
Alpha-beta pruning
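The two algorithms named above can be sketched together. This is a minimal illustration, not the deck's own code: the nested-list tree representation, the function name `alphabeta`, and the sample payoffs are all assumptions made for the example. Leaves hold the payoff to player X, and inner lists hold the positions reachable by one legal move.

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Return the minimax value of `node`, skipping branches that cannot matter."""
    if not isinstance(node, list):       # a leaf holds the payoff to X
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        value = alphabeta(child, not maximizing, alpha, beta)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:                # the opponent would never allow this line
            break
    return best

# Hypothetical three-level tree: X moves at the root, then O, then X.
tree = [[[3, 5], [6, 9]], [[1, 2], [0, 7]]]
print(alphabeta(tree, maximizing=True))
```

Removing the `alpha >= beta` check turns this into plain minimax. The returned value is the same either way, only the number of visited nodes changes.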
Example: Minimax Algorithm
Game Tree: We want to maximize player X's score.
A value of 1 indicates a win for player X and a loss for player O.
A value of 0 indicates a win for player O and a loss for player X.
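[Figure: game tree. X moves at the root, O at the second level, X at the third level. Values by level, from the leaves up:]
Leaves: 1 0 0 0 1 0 0 1
X (max): 1 0 1 1
O (min): 0 1
X (max), root: 1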
Heuristics
It is not viable to generate the entire game tree, so heuristics are used to evaluate positions.
Example: Tic-Tac-Toe
Heuristic value: number of possible wins for X minus number of possible wins for O.
[Figure: two example boards, evaluated as 8 - 5 = 3 and 4 - 5 = -1.]
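The heuristic can be sketched as follows, reading "possible wins" as the number of rows, columns, and diagonals that the opponent has not yet blocked. The board encoding (a 9-character string, row by row, with `.` for an empty cell) and the two board positions are assumptions chosen for the example. Under this reading an X in a corner of an empty board scores 8 - 5 = 3, and adding an O in the centre scores 4 - 5 = -1, which agrees with the two values on the slide.

```python
# The eight winning lines of a 3x3 board, as cell indices 0..8 (row by row).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def open_lines(board, player):
    """Count the lines that contain no piece of the opponent."""
    opponent = 'O' if player == 'X' else 'X'
    return sum(all(board[i] != opponent for i in line) for line in LINES)

def heuristic(board):
    """Possible wins for X minus possible wins for O."""
    return open_lines(board, 'X') - open_lines(board, 'O')

print(heuristic("X........"))   # assumed board: X in a corner
print(heuristic("X...O...."))   # assumed board: X in a corner, O in the centre
```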
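Example: Minimax Algorithm
[Figure: game tree. X moves at the root, O at the second level, X at the third level. Values by level, from the leaves up:]
Leaves: 32 31 15 16 7 8 24 23
X (max): 32 16 8 24
O (min): 16 8
X (max), root: 16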
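The bottom-up evaluation in this example can be reproduced with a short sketch. It assumes a complete binary tree given as a flat list of leaf payoffs (so the list length is a power of two) and that X, the maximizer, makes the last move. The function name `minimax_levels` is made up for the illustration.

```python
def minimax_levels(leaves, last_mover_maximizes=True):
    """Back up values level by level, alternating max (X) and min (O)."""
    levels = [list(leaves)]
    maximize = last_mover_maximizes
    while len(levels[-1]) > 1:
        row = levels[-1]
        pick = max if maximize else min
        # Each parent takes the best value among its two children.
        levels.append([pick(row[i], row[i + 1]) for i in range(0, len(row), 2)])
        maximize = not maximize
    return levels

for row in minimax_levels([32, 31, 15, 16, 7, 8, 24, 23]):
    print(row)
```

The last row holds the single value at the root.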
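Game Tree
[Figure: the full game tree with 32 leaves, X to move at the root. Values by level, from the leaves up; "|" separates the four subtrees:]
Leaves: 32 31 15 16 7 8 24 23 | 3 4 20 19 28 27 11 12 | 1 2 18 17 26 25 9 10 | 30 29 13 14 5 6 22 21
X (max): 32 16 8 24 | 4 20 28 12 | 2 18 26 10 | 30 14 6 22
O (min): 16 8 | 4 12 | 2 10 | 14 6
X (max): 16 | 12 | 10 | 14
O (min): 12 | 10
X (max), root: 12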
Operators
Terminals: legal moves, i.e. left (L) and right (R)
Functions: CXM1, CXM2, COM1, COM2
XM1: first move made by player X
XM2: second move made by player X
OM1: first move made by player O
OM2: second move made by player O
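[Figure: each function is drawn as a node with three argument branches, Arg1, Arg2, and Arg3, labelled with the values U, R, and L of the move it tests.]
CXM1: branches XM1=U, XM1=R, XM1=L
CXM2: branches XM2=U, XM2=R, XM2=L
COM1: branches OM1=U, OM1=R, OM1=L
COM2: branches OM2=U, OM2=R, OM2=L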
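One way to read these functions is as three-way conditionals on the move history. The sketch below rests on two assumptions that the slide does not settle: that U means the tested move has not been made yet, and that the arguments are ordered (U, L, R). Programs are written as nested tuples, and the name `evaluate` is illustrative.

```python
def evaluate(program, history):
    """Evaluate a program tree given the moves made so far.

    `program` is a terminal 'L' or 'R', or a tuple (name, arg_u, arg_l, arg_r).
    `history` maps move names such as 'OM1' to 'L' or 'R'.
    """
    while isinstance(program, tuple):
        name, arg_u, arg_l, arg_r = program
        # 'COM1' tests move 'OM1'; a move missing from the history counts as U.
        move = history.get(name[1:], 'U')
        program = {'U': arg_u, 'L': arg_l, 'R': arg_r}[move]
    return program

program = ('COM1', 'L', 'L', 'R')
print(evaluate(program, {}))              # O has not moved yet
print(evaluate(program, {'OM1': 'R'}))    # O's first move was R
```

Only the selected branch is followed, so the unused arguments are never evaluated.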
Fitness Cases
The fitness cases consist of the possible combinations of L and R for the moves that O can make.
Format: XM1, OM1, XM2, OM2
LLLL
LRRR
LLLR
LRRL
Evaluation
The raw fitness of an individual is the sum of the payoffs for each fitness case.
The hits ratio is the number of fitness cases for which the individual receives a payoff at least as good as the minimax strategy.
What is the raw fitness and hits ratio of the following individuals?
Individual 1: (COM1 L L L)
Individual 2: (COM1 L L R)
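A sketch of how such individuals could be scored is given below. It depends on several assumptions that are not stated in the slides: the game has five moves (X, O, X, O, X) leading to the 32 leaves of the game tree, listed left to right with L as the left branch; function arguments are ordered (U, L, R), with U meaning "not yet moved"; and a hit is counted when the payoff is at least the payoff a reference minimax player gets on the same fitness case. `REFERENCE` is a hand-written program that follows the minimax choices on this tree under those assumptions. It is a stand-in for the example, not a result taken from the deck.

```python
from itertools import product

# Leaf payoffs to X, left to right, as listed on the Game Tree slide.
PAYOFFS = [32, 31, 15, 16, 7, 8, 24, 23, 3, 4, 20, 19, 28, 27, 11, 12,
           1, 2, 18, 17, 26, 25, 9, 10, 30, 29, 13, 14, 5, 6, 22, 21]

def evaluate(program, history):
    """Follow the branch selected by each tested move (assumed order: U, L, R)."""
    while isinstance(program, tuple):
        name, *args = program
        program = args['ULR'.index(history.get(name[1:], 'U'))]
    return program

def play(program, om1, om2):
    """Play X's program against fixed O moves and return the payoff to X."""
    history = {}
    history['XM1'] = evaluate(program, history)
    history['OM1'] = om1
    history['XM2'] = evaluate(program, history)
    history['OM2'] = om2
    xm3 = evaluate(program, history)          # assumed third move by X
    index = 0
    for move in (history['XM1'], om1, history['XM2'], om2, xm3):
        index = 2 * index + (move == 'R')     # L = left child, R = right child
    return PAYOFFS[index]

# Hypothetical reference player: copy O's latest move, open with L.
REFERENCE = ('COM2', ('COM1', 'L', 'L', 'R'), 'L', 'R')

def fitness(program):
    """Return (raw fitness, hits) over the four combinations of O's moves."""
    cases = list(product('LR', repeat=2))
    payoffs = [play(program, om1, om2) for om1, om2 in cases]
    reference = [play(REFERENCE, om1, om2) for om1, om2 in cases]
    hits = sum(p >= r for p, r in zip(payoffs, reference))
    return sum(payoffs), hits

for individual in [('COM1', 'L', 'L', 'L'), ('COM1', 'L', 'L', 'R')]:
    print(individual, fitness(individual))
```

Changing the assumed argument order or the hit criterion changes the numbers, so the printed values should be read as conditional on those choices.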
GP Parameters
Population size: 500
Max. no. of generations: 51
Initial population generation: the ramped half-and-half method with an initial tree depth of six and a depth limit of seventeen on the size of trees created by the genetic operators.
Method of Selection: Fitness proportionate selection
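The initialization and selection settings above can be sketched as follows. Several details are assumptions for illustration: depth is counted in levels of nodes with the root at level 1, the ramp starts at depth 2, the grow method picks uniformly from functions and terminals below the root, and all fitness values are non-negative. The depth limit of seventeen applies to the genetic operators, which are not shown. All function names are made up for the example.

```python
import random

FUNCTIONS = ['CXM1', 'CXM2', 'COM1', 'COM2']   # each takes three arguments
TERMINALS = ['L', 'R']

def gen_tree(depth, full, rng, root=True):
    """Full: functions down to the depth limit. Grow: may stop early."""
    p_terminal = len(TERMINALS) / (len(TERMINALS) + len(FUNCTIONS))
    stop_early = not full and not root and rng.random() < p_terminal
    if depth <= 1 or stop_early:
        return rng.choice(TERMINALS)
    children = tuple(gen_tree(depth - 1, full, rng, root=False) for _ in range(3))
    return (rng.choice(FUNCTIONS),) + children

def ramped_half_and_half(size, max_depth, rng):
    """Cycle through depths 2..max_depth, alternating full and grow trees."""
    depths = range(2, max_depth + 1)
    return [gen_tree(depths[(i // 2) % len(depths)], full=(i % 2 == 0), rng=rng)
            for i in range(size)]

def depth_of(tree):
    """Number of node levels in a tree."""
    return 1 if isinstance(tree, str) else 1 + max(depth_of(c) for c in tree[1:])

def select(population, fitnesses, rng):
    """Fitness proportionate selection: pick with probability fitness / total."""
    return rng.choices(population, weights=fitnesses, k=1)[0]

rng = random.Random(0)
population = ramped_half_and_half(500, 6, rng)
print(len(population), max(depth_of(t) for t in population))
```

In a full run, `select` would be called with the raw fitness of each individual to choose parents for the next generation.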
Evolved Solution
[Figure: the evolved program tree, built from com1, com2, cxm1, and cxm2 function nodes with L and R terminals.]
Simplified Solution
[Figure: the simplified program tree, with function nodes com2 and com1 and terminals L and R.]
Pursuer - Evader
[Figure: the pursuer P at the origin (0,0) and the evader E at position (x,y).]
Game Parameters
The payoff for the pursuer is the time it takes to catch the evader.
The payoff of the evader is the time it remains free.
The information available at each stage of the game is the position of the pursuer and the evader.
A game-playing strategy will specify the angle at which the pursuer must move in order to catch the evader.
Terminals and Functions
T = { X, Y, R }
  X - x-coordinate of the position of the evader
  Y - y-coordinate of the position of the evader
  R - ephemeral constant in the range [-1, 1]
F = { +, -, /, EXP, IFLTZ }
  EXP - the exponential function
  IFLTZ - evaluates its first argument if its second argument is less than zero, else it evaluates its third argument
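The primitives above could be interpreted along these lines. The sketch uses nested tuples for expressions and follows the slide's wording for IFLTZ (the second argument is the test). Two details are assumptions that the slide does not state: division is protected and returns 1.0 when the divisor is zero, and the argument of EXP is clamped to avoid overflow.

```python
import math

def protected_div(a, b):
    # Assumption: return 1.0 instead of raising on a zero divisor.
    return a / b if b != 0 else 1.0

def safe_exp(a):
    # Assumption: clamp the argument so math.exp cannot overflow.
    return math.exp(min(a, 50.0))

def evaluate(expr, x, y):
    """Evaluate an expression tree at evader position (x, y)."""
    if expr == 'X':
        return x
    if expr == 'Y':
        return y
    if isinstance(expr, (int, float)):        # an ephemeral constant R
        return expr
    op, *args = expr
    if op == 'IFLTZ':
        first, second, third = args
        # Only the selected branch is evaluated, as the slide describes.
        branch = first if evaluate(second, x, y) < 0 else third
        return evaluate(branch, x, y)
    values = [evaluate(a, x, y) for a in args]
    if op == '+':
        return values[0] + values[1]
    if op == '-':
        return values[0] - values[1]
    if op == '/':
        return protected_div(values[0], values[1])
    if op == 'EXP':
        return safe_exp(values[0])
    raise ValueError(f"unknown function: {op}")

# Hypothetical individual: X - Y when X < 0, otherwise Y / X.
expr = ('IFLTZ', ('-', 'X', 'Y'), 'X', ('/', 'Y', 'X'))
print(evaluate(expr, -1.0, 2.0))
```

The value returned for an individual would then be used as the pursuer's movement angle.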
Evaluation
The fitness cases consist of 20 different positions of the evader on the plane, i.e. a set of (X, Y) coordinate values.
The raw fitness of an individual is the average time required to catch the evader over the 20 fitness cases.
An upper limit is set on the maximum time permitted. The hits ratio is the number of fitness cases for which this time limit is not exceeded.
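The evaluation described above can be sketched with a simple time-stepped simulation. Almost every numeric detail here is an assumption for illustration: the speeds, the capture radius, the time step, the time limit, the evader's behaviour (it flees directly away from the pursuer), and the choice to pass the strategy the evader's position relative to the pursuer. The 20 fitness cases are placed on a unit circle purely to keep the example deterministic.

```python
import math

def time_to_capture(strategy, ex, ey, t_max=100.0, dt=0.05,
                    v_p=1.0, v_e=0.67, radius=0.1):
    """Simulate one chase; return the capture time, or t_max if not caught."""
    px, py, t = 0.0, 0.0, 0.0                 # the pursuer starts at the origin
    while t < t_max:
        dx, dy = ex - px, ey - py
        dist = math.hypot(dx, dy)
        if dist <= radius:
            return t
        angle = strategy(dx, dy)              # the individual's chosen heading
        px += v_p * dt * math.cos(angle)
        py += v_p * dt * math.sin(angle)
        ex += v_e * dt * dx / dist            # assumed evader: flee directly away
        ey += v_e * dt * dy / dist
        t += dt
    return t_max

def fitness(strategy, cases, t_max=100.0):
    """Return (average capture time, number of cases caught within t_max)."""
    times = [time_to_capture(strategy, x, y, t_max) for x, y in cases]
    hits = sum(t < t_max for t in times)
    return sum(times) / len(times), hits

# 20 assumed evader start positions on the unit circle.
cases = [(math.cos(2 * math.pi * k / 20), math.sin(2 * math.pi * k / 20))
         for k in range(20)]

def head_towards(dx, dy):
    """Hand-written strategy: move straight at the evader."""
    return math.atan2(dy, dx)

print(fitness(head_towards, cases))
```

Because the payoff to the pursuer is a time, a lower raw fitness is better here, while a higher hits count is better.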