Artificial Intelligence in Games

CA107 Topics in Computing

Dr. David Sinclair

School of Computer Applications

[email protected]

What is Artificial Intelligence?

• There are many answers, but the simplest definition is:
  A field of research whose goal is to make “machines” that do things that would require intelligence if done by a human.
  Intelligence is the ability to learn and understand, to solve problems and to make decisions.

Why add A.I. to games?

• From the A.I. “side of the house”
  – Games are excellent testbeds as they:
    • have well-defined rules generating a large search space
    • are easily represented in a computer
    • are “easy” to test
• From the Games “side of the house”
  – A.I. can make the game much more enjoyable to play.

Search

• The brute force approach of search has been highly effective in games such as Draughts and Chess.
  – Draughts/Checkers
    • Chinook (World Champion)
  – Chess
    • Best programs can hold their own with the best humans.
    • Deep Blue II
      – move generation and evaluation in hardware
      – parallel search in software

Total Search

• From the starting position:
  1. Generate every legal move for player 1.
  2. For each legal move of player 1, generate every legal move for player 2.
  3. Repeat steps 1 and 2 until the game reaches a definitive result.
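The three steps above can be sketched on a game small enough to search completely. This is only an illustration: the toy game (players alternately take 1 or 2 sticks from a pile, and whoever takes the last stick wins) and the function names are assumptions, not part of the lecture.

```python
def legal_moves(sticks):
    """Moves in the toy game: take 1 or 2 sticks (never more than remain)."""
    return [take for take in (1, 2) if take <= sticks]


def total_search(sticks):
    """Search every line of play to the end of the game.

    Returns (result, positions): result is +1 if the player to move can
    force a win and -1 otherwise; positions counts every position visited.
    """
    if sticks == 0:
        # The previous player took the last stick, so the player to move has lost.
        return -1, 1
    best, positions = -1, 1
    for take in legal_moves(sticks):
        result, visited = total_search(sticks - take)
        positions += visited
        best = max(best, -result)  # a loss for the opponent is a win for us
    return best, positions


for sticks in range(1, 7):
    print(sticks, total_search(sticks))
```

Even in this tiny game the number of positions visited grows quickly with the size of the pile, which is the problem the next slide quantifies for chess.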

Problem with Total Search

• Not practical
  – A player in chess has, on average, 36 legal moves.
  – A game could take 45 moves by each player to reach a conclusion (an underestimate).
  – Total number of positions = 36^90
  – There are only ~10^81 atoms in the universe.
• We couldn’t store all the positions in a computer the size of the universe.
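The arithmetic behind this comparison can be checked directly, since Python integers have unlimited precision. The variable names are illustrative; the figures (36 legal moves, 45 moves by each of the two players, about 10^81 atoms) are the ones quoted on the slide.

```python
average_legal_moves = 36
moves_per_player = 45
plies = 2 * moves_per_player           # 90 half-moves in the whole game

positions = average_legal_moves ** plies
atoms_in_universe = 10 ** 81

print(len(str(positions)))             # 141 digits, i.e. roughly 10^140
print(positions > atoms_in_universe)   # True
```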

Evaluation Functions

• Searching from a position to a definitive result is not practical.

• Generate all possible outcomes in a fixed number of moves from a position.
  – This builds a game tree.

• For each terminal position in the game tree calculate the likelihood that the terminal position will result in a win, loss or draw for the player moving.
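One very simple evaluation function for chess counts material, since (as noted later in these slides) the strength of a chess position correlates well with the number and quality of pieces. The sketch below is an assumption for illustration only: the piece values are the conventional 1/3/3/5/9 scale, and real programs use far richer evaluations.

```python
# Conventional piece values (an assumption, not taken from the lecture).
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}


def material_score(own_pieces, opponent_pieces):
    """Material balance from the point of view of the player moving.

    Each argument is a string of piece letters, e.g. "KQRRPP".
    Kings (and any unknown letters) count as 0.
    """
    own = sum(PIECE_VALUES.get(piece, 0) for piece in own_pieces)
    opponent = sum(PIECE_VALUES.get(piece, 0) for piece in opponent_pieces)
    return own - opponent


# The player moving has an extra rook against the opponent's bishop.
print(material_score("KQRRPP", "KQRBPP"))  # 2
```

A positive score suggests the terminal position favours the player moving, a negative score the opponent.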

Searching the Game Tree

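[Figure: game tree with alternating MAX, MIN and MAX levels. The evaluations of the terminal positions are backed up the tree: the lower MAX level takes the values -2, 4, -5, 2, -3, 1; the MIN level above it takes -2, -5, -3; the root (MAX) takes the value -2.]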

This is the Minimax Algorithm
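A minimal sketch of Minimax, using the backed-up values from the figure (-2, 4, -5, 2, -3, 1) as the leaves of a two-level tree. Representing the tree as nested lists is an assumption made to keep the example short.

```python
def minimax(node, maximizing):
    """Return the minimax value of a tree given as nested lists of numbers."""
    if isinstance(node, (int, float)):
        return node                      # terminal position: use its evaluation
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# Root is a MAX node; its three children are MIN nodes.
tree = [[-2, 4], [-5, 2], [-3, 1]]
print(minimax(tree, maximizing=True))    # -2
```

The MIN nodes evaluate to -2, -5 and -3, and MAX picks the largest of these, which matches the value at the root of the figure.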

Improving Minimax

• The Minimax Algorithm has various improvements that are used in practice.
  – Alpha-Beta
  – Principal Variation Search (PVS)
  – Transposition Tables
  – Killer Move Heuristics

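• At best they let the search go about twice as deep for the same work (with perfect move ordering, Alpha-Beta examines roughly the square root of the number of positions that plain Minimax does).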
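Alpha-Beta, the first improvement listed, skips branches that cannot change the final result. The sketch below is a textbook version on a nested-list tree, not the code of any particular chess program; the `visited` list is an addition for illustration so that the pruning can be observed.

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf"), visited=None):
    """Minimax value of a nested-list tree, pruning branches that cannot matter."""
    if visited is None:
        visited = []
    if isinstance(node, (int, float)):
        visited.append(node)             # record each leaf actually evaluated
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                    # MIN above will never allow this line
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:
            break                        # MAX above already has something better
    return value


tree = [[-2, 4], [-5, 2], [-3, 1]]
leaves_seen = []
print(alphabeta(tree, True, visited=leaves_seen))  # -2
print(leaves_seen)                                 # [-2, 4, -5, -3]
```

The result is the same as plain Minimax, but only 4 of the 6 leaves are evaluated: once the first branch guarantees MAX a value of -2, the leaves 2 and 1 cannot affect the outcome.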

Computer Chess

• Deep Blue II
  – 256 dedicated chess processors that
    • generate moves
    • evaluate positions
  – Search process in software (PVS)
  – Database of opening sequences
  – Databases of endgame sequences

• Deep Blue II can evaluate 200 million positions per second (about 36 billion in 3 minutes).

• Deep Blue II can hold its own with the best players in the world, but it is not invincible!

Learning

• Backgammon is a very interesting testing ground for computer game playing for two reasons:
  – the stochastic nature of the game; and
  – the experience that an accurate evaluation of a position is far more effective than a deep search.
• Backgammon (TD-Gammon)
  – In the top 10 (~6th)
  – One of the top human players says it has a better evaluation capability than he does.
  – Has changed the way humans play backgammon.

Learning how to evaluate a position

• TD-Gammon evaluates positions with a neural network that has trained itself by playing over 200,000 games against itself.
  – From an initial state of knowing the rules and having zero strategic/tactical knowledge:
    • the network learned a number of elementary strategies and tactics during the first few thousand training games against itself.

• After several tens of thousands of training games more sophisticated concepts began to emerge.
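The “TD” in TD-Gammon stands for temporal-difference learning: after a game, the value of each position is nudged towards the value of the position that followed it, and the final position is nudged towards the actual result. The sketch below is a deliberately simplified stand-in: it uses a lookup table instead of TD-Gammon's neural network, and the position names, starting value of 0.5 and learning rate are all assumptions.

```python
def td_update(values, positions, outcome, alpha=0.1):
    """Apply one pass of temporal-difference updates for a finished game.

    values    -- dict mapping position -> estimated chance of winning
    positions -- positions visited during the game, in order
    outcome   -- 1.0 for a win, 0.0 for a loss
    alpha     -- learning rate (assumed value)
    """
    for current, following in zip(positions, positions[1:]):
        old = values.get(current, 0.5)
        target = values.get(following, 0.5)
        values[current] = old + alpha * (target - old)
    last = positions[-1]
    old = values.get(last, 0.5)
    values[last] = old + alpha * (outcome - old)
    return values


values = {}
game = ["opening", "middle", "ending"]     # hypothetical position labels
td_update(values, game, outcome=1.0)
print(values)
td_update(values, game, outcome=1.0)
print(values)
```

After the first game only the final position has moved towards the win; after the second, the improvement starts to propagate back to the position before it. Repeated over many self-play games, this is how knowledge of the result reaches earlier and earlier positions.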

Learning how to evaluate a position (contd.)

• After 200,000 training games with a basic board encoding the network was as strong as its predecessor NeuroGammon.

• NeuroGammon was trained on a corpus of expert games and used a sophisticated board encoding.

• When TD‑Gammon was retrained using NeuroGammon’s board encoding, TD‑Gammon reached the level of strong master play.

Deep Anchors: TD-Gammon’s influence on humans

What should white play when he rolls a double 4?

Move                        Estimate   Rollout
8-4*, 8-4, 11-7, 11-7       +0.184     +0.139
8-4*, 8-4, 21-17, 21-17     +0.238     +0.221

Opponent Modeling - Poker

• Poker differs from games such as Chess and Draughts in two major respects:
  – it is a game of imperfect information
  – the game-theoretic optimal strategy does not work as well as a maximising strategy in practice
• An essential element is bluffing (betting to give the impression that a bad hand is good) and sandbagging (betting to give the impression that a good hand is bad).
  – To do this you need to model your opponent!

Properties of a World Class Poker program

• Hand Strength Assessment

• Hand Potential Assessment

• Betting Strategy

• Bluffing

• Unpredictability

• Opponent Modelling

Loki

• Plays Texas Hold’em
  – Pre‑flop: Each player is dealt two cards face down, followed by the first round of betting.
  – Flop: Three community cards are dealt face up and a second round of betting occurs.
  – Turn: A fourth community card is dealt face up and the third round of betting occurs.
  – River: A final fifth community card is dealt face up and the final round of betting occurs.

• There are 1326 possible combinations from the initial two cards.
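The figure of 1326 is the number of ways of choosing 2 cards from a deck of 52, which can be confirmed by enumeration. The card encoding (rank letter followed by suit letter) is an assumption for illustration.

```python
from itertools import combinations
from math import comb

# 13 ranks x 4 suits = 52 cards, e.g. "As" is the ace of spades.
deck = [rank + suit for rank in "23456789TJQKA" for suit in "cdhs"]

starting_hands = list(combinations(deck, 2))
print(len(deck))             # 52
print(len(starting_hands))   # 1326
print(comb(52, 2))           # 1326
```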

Hand Strength Assessment in Loki

• Loki played a million hands to calculate the approximate income rate from each starting hand.

• After the flop, there are 47 remaining unknown cards and 1081 possible hands an opponent might hold. We can calculate how many of these hands, combined with the community cards, will beat our hand, tie with our hand or lose to our hand.
  – For example, if our hand is A-Q and the flop is 3-4-J then 444 cases would beat us, 9 would tie and the remaining 628 cases would lose to our hand. Therefore our hand strength is 0.585 (58.5%).
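The numbers in this example can be reproduced as follows. The slide does not state the formula; counting a tie as half a win is an assumption, chosen because it gives the quoted 0.585. The function name is illustrative.

```python
from math import comb


def hand_strength(ahead, tied, behind):
    """Fraction of opponent hands we beat, counting a tie as half a win (assumed)."""
    return (ahead + tied / 2) / (ahead + tied + behind)


# 52 cards minus our 2 hole cards and the 3 flop cards leaves 47 unknown cards.
print(comb(47, 2))                                           # 1081 opponent hands
print(round(hand_strength(ahead=628, tied=9, behind=444), 3))  # 0.585
```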

Opponent Modeling in Loki

• A weight is assigned to each of the 1081 possible combinations of hole cards an opponent may have.
  – These weights are derived from the 169 distinct income rates found by simulation.
• There are 36 possible classes of opponent actions depending on:
  – their action (fold, call/check, bet/raise),
  – how much the action costs (bets of 0, 1, >1) and
  – when the action occurred (pre‑flop, flop, turn, river).
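Both counts on this slide follow from simple enumeration. The 36 classes are 3 actions × 3 cost levels × 4 betting rounds. The breakdown of the 169 distinct starting hands into pairs, suited and unsuited hands is standard poker arithmetic rather than something stated in the lecture.

```python
from itertools import product
from math import comb

actions = ["fold", "call/check", "bet/raise"]
costs = ["0", "1", ">1"]
rounds = ["pre-flop", "flop", "turn", "river"]

action_classes = list(product(actions, costs, rounds))
print(len(action_classes))          # 36

# Distinct starting hands once suits are treated as interchangeable:
pairs = 13                          # AA, KK, ..., 22
suited = comb(13, 2)                # two different ranks, same suit
unsuited = comb(13, 2)              # two different ranks, different suits
print(pairs + suited + unsuited)    # 169
```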

Opponent Modeling in Loki (contd.)

• Each action modifies the probabilities for each of the possible 1081 combinations of hole cards an opponent may have.

• We can make the opponent models interact so that if one player’s hand has a very high probability of containing an A, then we can reduce the weights on other players’ hands that also contain the A.
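A minimal sketch of how an observed action might modify the weights. Everything specific here is hypothetical: the three hand labels stand in for the 1081 combinations, the likelihood factors are invented, and renormalising so that the weights sum to 1 is an assumption (the slides do not say how Loki scales its weights).

```python
def reweight(weights, likelihood):
    """Scale each hand's weight by how likely the observed action is with that hand."""
    updated = {hand: weight * likelihood[hand] for hand, weight in weights.items()}
    total = sum(updated.values())
    return {hand: weight / total for hand, weight in updated.items()}


# Hypothetical starting weights for three possible opponent hands.
weights = {"A-A": 1.0, "K-Q suited": 1.0, "7-2 unsuited": 1.0}

# Hypothetical chance that the opponent would raise when holding each hand.
raise_likelihood = {"A-A": 0.9, "K-Q suited": 0.5, "7-2 unsuited": 0.1}

weights = reweight(weights, raise_likelihood)
print(weights)
```

After the raise, strong hands carry most of the weight and weak hands very little, so later hand strength calculations can count each opponent hand in proportion to how plausible it is.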

Intelligent Opponents?

• A simple way to make an opponent appear intelligent is to use a stochastic state machine.
  – Stochastic: has a random element.
  – State machine: a program that is in a definite state and only moves from state to state depending on how it is interacted with.

Example (loosely based on Civilisation II)

• There is a collection of civilisations competing for shared resources. A civilisation’s behaviour towards another civilisation is influenced by:
  – the goodwill [0...100] between the two civilisations; and
  – the expected/actual gain [-100...100].
• An action will modify the goodwill:

  Action                         Change in goodwill
  Give technological gift        +25
  Enter treaty                   +20
  Keep cease-fire                +10
  Assist this civilisation       +40
  Break a treaty with another    -20
  Break a cease-fire             -30
  Encroach on territory          -10
  Attack this civilisation       -20
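The goodwill adjustments above translate directly into a lookup table. In this sketch the function name is illustrative, and clamping the result to the range [0...100] is an assumption based on the stated range of goodwill.

```python
# Change in goodwill for each action, as listed on the slide.
GOODWILL_CHANGE = {
    "give technological gift": +25,
    "enter treaty": +20,
    "keep cease-fire": +10,
    "assist this civilisation": +40,
    "break a treaty with another": -20,
    "break a cease-fire": -30,
    "encroach on territory": -10,
    "attack this civilisation": -20,
}


def apply_action(goodwill, action):
    """Return the new goodwill, kept within 0..100 (clamping is an assumption)."""
    return max(0, min(100, goodwill + GOODWILL_CHANGE[action]))


goodwill = 50
for action in ("enter treaty", "encroach on territory", "attack this civilisation"):
    goodwill = apply_action(goodwill, action)
    print(action, "->", goodwill)
```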

State machine

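[Figure: state machine with five states; each state lists the probability of taking each action while in that state.]

State          Action probabilities
ALLIED         treaty = 0.8, assist = 0.8
COOPERATIVE    treaty = 0.4, assist = 0.2
NEUTRAL        treaty = 0.1, attack = 0.1, assist = 0.05
AGGRESSIVE     attack = 0.6, ceasefire = 0.3, peace = 0.8
HOSTILE        attack = 0.95, ceasefire = 0.05

Transition conditions labelled on the arrows between states:
goodwill > 30 or gain > -50
gain > 10
goodwill > 45
30 < goodwill < 45 or gain > 60
goodwill < 50 and gain > 90
goodwill < 85
goodwill < 65
goodwill > 65
goodwill < 65
goodwill > 85
goodwill > 85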
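A minimal sketch of such a stochastic state machine. The action probabilities are the ones shown in the diagram. The transitions are NOT: which arrow carries which condition cannot be read from the diagram here, so the `TRANSITIONS` table below is a hypothetical arrangement that merely reuses some of the thresholds, and the function names are illustrative.

```python
import random

# Probability of taking each action while in a state (from the diagram).
ACTION_PROBABILITIES = {
    "ALLIED": {"treaty": 0.8, "assist": 0.8},
    "COOPERATIVE": {"treaty": 0.4, "assist": 0.2},
    "NEUTRAL": {"treaty": 0.1, "attack": 0.1, "assist": 0.05},
    "AGGRESSIVE": {"attack": 0.6, "ceasefire": 0.3, "peace": 0.8},
    "HOSTILE": {"attack": 0.95, "ceasefire": 0.05},
}

# ASSUMPTION: hypothetical transitions, checked in order; first match wins.
TRANSITIONS = {
    "ALLIED": [(lambda goodwill, gain: goodwill < 85, "COOPERATIVE")],
    "COOPERATIVE": [
        (lambda goodwill, gain: goodwill > 85, "ALLIED"),
        (lambda goodwill, gain: goodwill < 65, "NEUTRAL"),
    ],
    "NEUTRAL": [
        (lambda goodwill, gain: goodwill > 65, "COOPERATIVE"),
        (lambda goodwill, gain: goodwill < 45, "AGGRESSIVE"),
    ],
    "AGGRESSIVE": [
        (lambda goodwill, gain: goodwill > 45 or gain > 60, "NEUTRAL"),
        (lambda goodwill, gain: goodwill < 30, "HOSTILE"),
    ],
    "HOSTILE": [(lambda goodwill, gain: goodwill > 30, "AGGRESSIVE")],
}


def next_state(state, goodwill, gain):
    """Deterministic part: move to a new state if a transition condition holds."""
    for condition, target in TRANSITIONS[state]:
        if condition(goodwill, gain):
            return target
    return state


def choose_actions(state, rng):
    """Stochastic part: each action is taken with its own probability."""
    return [action for action, p in ACTION_PROBABILITIES[state].items()
            if rng.random() < p]


rng = random.Random(0)
state = "NEUTRAL"
for goodwill in (50, 70, 90, 60):
    state = next_state(state, goodwill, gain=0)
    print(goodwill, state, choose_actions(state, rng))
```

The state changes only in response to goodwill and gain, while the random draw means the same state does not always produce the same behaviour, which is what makes the opponent appear less mechanical.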

Go

• Go represents the biggest challenge yet to the application of A.I. in games.
• None of the existing techniques has proved successful.
• Go will require a combination of techniques.
  – Pattern matching
  – Search (forward pruning and focusing)
  – Planning
  – Resolving threats and plans

Go – the game

• Go is played on a 19x19 grid made of horizontal and vertical lines. Each player, black and white, places stones on the intersection points of the grid. Once a stone is placed it cannot be moved, unless it is captured.

• Each stone or set of vertically and/or horizontally connected stones has a set of liberty points. These are the unoccupied grid points that are vertically or horizontally adjacent to the stone or group.
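Finding a group and its liberty points is a flood fill over connected stones of the same colour. The sketch below is one way to do it; the board encoding (a list of strings with "B", "W" and "." for an empty point) and the small 5x5 board are assumptions for illustration.

```python
def group_and_liberties(board, row, col):
    """Return (group, liberties) for the stone at (row, col).

    group     -- set of (row, col) points connected to the stone, same colour
    liberties -- set of empty points adjacent to any stone in the group
    """
    colour = board[row][col]
    group, liberties = set(), set()
    stack = [(row, col)]
    while stack:
        r, c = stack.pop()
        if (r, c) in group:
            continue
        group.add((r, c))
        # Only vertical and horizontal neighbours count, not diagonals.
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < len(board) and 0 <= nc < len(board[nr]):
                if board[nr][nc] == ".":
                    liberties.add((nr, nc))
                elif board[nr][nc] == colour:
                    stack.append((nr, nc))
    return group, liberties


board = [
    ".WWW.",
    "WBBB.",
    ".WWW.",
    ".....",
    ".....",
]
group, liberties = group_and_liberties(board, 1, 1)
print(sorted(group))      # [(1, 1), (1, 2), (1, 3)]
print(sorted(liberties))  # [(1, 4)]
```

Here the three black stones form one group whose only liberty is the point (1, 4).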

Go – the game (contd.)

• To capture a group of stones all you need do is reduce the group’s number of liberty points to zero.
• There are 2 restrictions on placing stones on the board.
  – The first is that you cannot place a stone on a point that would result in it having no liberties. This is called suicide.
  – The second is that you cannot immediately play a stone on a point that has just been captured. You must play the stone elsewhere on the board on the move immediately after the capture. Then you can return to the capture point.

Capture and liberties

[Figure: three black stones whose last remaining liberty is the point a.]

If white plays a stone at the point a then the three black stones will be captured and removed from the board.

Eyes

[Figure: a black group enclosing two separate empty points, a and b.]

This black group of stones can never be captured since white would have to remove both the liberties at the points a and b at the same time. But white can only play one stone at a time, and white cannot play into a or b as this is suicide.

The result

• At each turn a player has the option of placing a stone on the board or passing (skipping their move).

• The game continues with both players placing stones on the board until both players pass consecutively.

• The result is determined by each player adding up the territory they control plus the number of the opponent’s stones they have captured. Each territory point controlled and each stone captured is worth one point.

Go – the great challenge

• Huge branching factor (~180 for Go, ~36 for Chess)
• Evaluation of a position
  – in Chess there is a good correlation between the strength of the position and the number and quality of pieces.
  – in Go there is a poor correlation between the strength of the position and the number of stones and territory surrounded.
• The best Go program plays at the standard of an average player (despite a $1 million prize).
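The effect of the branching factor can be put into numbers. A minimal sketch, assuming every position has exactly the average number of legal moves quoted above (a simplification) and an arbitrary look-ahead of 4 moves:

```python
go_branching = 180
chess_branching = 36
depth = 4                                   # look-ahead in moves (arbitrary choice)

go_positions = go_branching ** depth
chess_positions = chess_branching ** depth

print(go_positions)                         # 1049760000
print(chess_positions)                      # 1679616
print(go_positions // chess_positions)      # 625
```

Looking just 4 moves ahead already needs 625 times as many positions in Go as in Chess, and the gap multiplies by 5 with every extra move.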
