To Control the Game in Early Time - A new approach to computer game play Zhang Qin December 11, 2006



To Control the Game in Early Time − A new approach to computer game play

Zhang Qin

December 11, 2006


“People assume that computers are just glorified calculating machines. Not so. It’s true that computers are often used to do calculating because they can calculate very quickly – but computer programs don’t have to have anything to do with numbers... A computer is a universal machine and I have proved how it can perform any task that can be described in symbols. I would go further. It is my view that a computer can perform any task that the human brain can carry out. Any task...”

— Alan Turing [26]


Contents

1 Introduction 1

2 The Traditional Top-down Search Strategies 3
  2.1 Minimal Tree and the Standard α−β Search 3
  2.2 The Enhancements for α−β Search 6
  2.3 The Weaknesses of α−β Search and Alternative Methods 10
  2.4 A Test of α−β Search and Its Enhancements 11

3 Control — A More General Concept 12
  3.1 The Definition of Control 14
  3.2 The Problems with Passive Control 16
  3.3 A New Passive Control Method 17
  3.4 The Framework of a New Controlled-based Algorithm 19

4 To Control Domineering in Early Time 21
  4.1 The Definition of a Convergent Board 21
  4.2 The Control Part 22
    4.2.1 The Criterion of Hot Spots 22
    4.2.2 The Field Superposition Method 22
    4.2.3 Other Assistant Considerations 24
    4.2.4 The Local Control Part 24
  4.3 The Evaluation Function 24
  4.4 The Experiment and Improvement 25

5 Further Discussion 26
  5.1 On Standard Game Theory 26
  5.2 Application to Go 27

A Introduction to the CGT System 29
  A.1 Basics 29
  A.2 Numbers 30
  A.3 And the Other Half 31
    A.3.1 Value Simplification 32
    A.3.2 Comparison and Addition 34
    A.3.3 The Temperature Theory 34
  A.4 Game Dictionaries 37

List of Tables

1 α−β search and its enhancements 12


List of Figures

1 Minimal Tree 5
2 α-cut and β-cut 5
3 The Effects of α−β Pruning 6
4 The Effects of PVS Pruning 8
5 The Spectrum of Control 15
6 The Field Superposition Method 23
7 Problem Encountered by Using Mixed Strategies 27
8 Some Configurations for Domineering 32
9 Thermograph Example: G = {2 | −1} 36
10 Thermograph Example: G = {{2 | −1}, 0 ‖ {−2 | −4}} 36
11 An Example of the THERMOSTRAT Algorithm 37
12 An Example of the Patterns for Simplification 38
13 An Example of the Rules for Simplification 38


Abstract

In this paper we bring forward a new concept, “Control”, to solve the major “Combinatorial Explosion” problem in computer game play. It is a more general concept than the notion of “Search”. We present a unified framework, Algorithm Controlled_{α−β}, to reify this concept. We also design a system to play the game Domineering as a concrete example, which shows the superiority of “Control” in practice. As a byproduct, we give a survey of most traditional tree-search algorithms, and experimentally compare these methods in an Othello program. Appendix A is a tutorial written for the CGT system, which is used as a new passive control method in my experiments.


1 Introduction

Computer game play is usually referred to as the Drosophila of Artificial Intelligence, or the “touchstone of computer intelligence”. Ever since Claude Shannon published his seminal paper that laid out the framework for building high-performance game-playing programs [21], people have incessantly been trying to build strong computer programs to defeat the best human players in all kinds of games, such as chess, Othello, Scrabble, and Go, to name a few. Alan Turing did a hand simulation of his computer chess algorithm [24] in 1951; Arthur Samuel designed a famous checkers-playing program around the 1950s; and A. Newell, J. Shaw, and H. Simon made a comprehensive investigation of the game of chess in 1958 [16]. All these early works paved the way for the development of computer game play. Through more than 40 years of research, various methods and algorithms were brought out, and the playing strength of the computer has been greatly enhanced. IBM’s Deep Blue beating world chess champion Garry Kasparov in 1997 symbolized a golden time of computer game play.

People who are familiar with von Neumann-Morgenstern game theory [25] would ask, why bother to study computer games? Because most games played by computers are two-person, zero-sum games with perfect information, like chess, Go and Othello. Therefore a rational strategy for the play is obvious: we can construct a complete game search tree and make a thorough analysis of it. Not only is it possible to predict who will win, but we are also able to figure out how much the winner could gain. Unluckily, neither human beings nor the largest and fastest computers are capable of executing the optimal strategy. It is impossible to calculate trillions of positions in a reasonable time. (It is estimated that there are 10^123 legally possible games of chess, and many more of shogi (10^230) and Go (10^360).) The “exponential explosion” is still the main barrier for both human and machine to conquer in the future.

To understand how to design a computer game-play system, human play behavior is a natural first point of reference. Psychological research on chess thinking shows that human play involves a modest amount of search in the game tree (a maximum of 100 branches, say) combined with a large amount of pattern recognition (or “chunks”), drawing upon patterns stored in memory as a result of previous chess training and experience. The stored knowledge is used to guide search in the game tree along the most profitable lines, and to evaluate the terminal positions that are reached in the search. The estimated values at the terminals of these miniature game trees are mini-maxed back to the root in order to select the best move [22]. Due to the pioneering studies of the Dutch psychologist Adriaan de Groot [8], people have realized that knowledge and pattern-recognition ability are the most valuable treasures that a human player possesses to beat his opponent. De Groot found that in chess it is very difficult to discriminate, from the statistics of search, between grandmasters and ordinary club players. But on the other hand, the grandmaster’s vast store


of chunks (a minimum of 50,000 chunks in chess, for example) seems to provide the main explanation for the ability to play many simultaneous games rapidly. Instead of searching the game tree, the grandmaster simply makes “positionally sound” moves until his opponent makes an inferior move, and then immediately uses cues from the associated memory to exploit the opponent’s mistake.

With the help of this picture of human game play, researchers realized that a typical computer program needs to incorporate the following three components: “Move Generator”, “Search Engine” and “Evaluation Function”. But there are two different paths to approach “Intelligence”. Some suggest we should emulate the human example to achieve a high level of intelligence; that is, the computer should try to learn as much as possible, and store its knowledge in a large database for retrieval in real-time competition. Others persist in using “brute force” search to obtain intelligence. They believe that, accompanied by limited knowledge, deep search can achieve intelligent behavior as well. After a systematic study of the history of computer chess, Simon and Schaeffer came to distinguish three eras [22].

1. The pioneering era, for computer chess from the early 1950s to the mid 1970s, is the period when many different approaches were tried and much domain knowledge was used.

2. The second era is the technology era, characterized by a strong correlation between machine speed and program performance. For chess this era began with the full exploitation of the α−β search algorithm in the mid 1970s, after Knuth and Moore’s extensive analysis of the method [12]. This era was dominated by brute-force searching programs using little knowledge, relying on the speed of computers for good performance. The design of special chess hardware and the parallelization of chess programs were also typical of this period.

3. The third era, the algorithm era, has just begun for computer chess. It is recognized that speed alone will reach its limits, resulting in a new appraisal of innovative search methods, and many strategies besides α−β have been proposed and investigated (see [11] for a good survey).

I would prognosticate that a fourth era – the “control era” – will come some day.

This paper is organized as follows. In Section 2, I make a survey of the traditional top-down tree search strategies and show the results of my experiments implementing all those strategies (including the mixed strategies). In Section 3, I introduce a new notion, “Control”, and propose a framework for a play system. In Section 4, I employ the general framework to design “Convergent Domineering” as a concrete application. In Appendix A, we introduce the CGT notation system, which is used to calculate the value of a Game precisely. Sections 3 and 4 are the main parts of this


thesis; readers familiar with (or not interested in) traditional tree search strategies may go directly to Section 3.

2 The Traditional Top-down Search Strategies

“Ultimate exponential explosion is not avoided – save in exceptionally highly structured situations like the algebra example – but only postponed. Hence, an intelligent system generally needs to supplement the selectivity of its solution generator with other information-using techniques to guide search”

— Allen Newell and Herbert A. Simon [17]

As mentioned in the introduction, a typical AI game program contains three distinct elements: move generation, a tree search (including pruning) strategy, and an evaluation function, all of which come from the study of how human players behave in games. Move generation is the simplest part, and will not be discussed here. The position evaluation always appears to be the most important (also the most difficult) part of an AI program. In real-time game play, a limited search must be carried out to determine the unknown potential of active moves. The evaluation process estimates the value of game-tree nodes that cannot be fully expanded. That is to say, it is “imprecise”, only roughly assessing the value of a particular board configuration. Put another way, it is precisely because evaluation is imprecise that mini-maxed back-up selection and pruning are needed. Most evaluation designs are domain-dependent and usually built in cooperation between computer scientists and domain experts, so they are difficult to discuss in general here. The third part, tree search and pruning strategies, is used to expand the tree in the most reasonable way, cutting down branches that are unpromising but would otherwise be examined in an “exhaustive” search.

This section concentrates on the tree search and pruning aspects, especially the α−β search strategy and many of its enhancements, as well as its weak points and some alternative approaches. I will describe most of them in an intuitive way, providing the main ideas behind them while avoiding details.

2.1 Minimal Tree and the Standard α − β Search

In typical two-person zero-sum perfect-information games, the aim of the traditional search is to find a path from the root to the highest-valued leaf node that can be reached, under the assumption of best play by both sides. The min-max rule is generally used to propagate the values of those outcomes back to the initial state. Here Max means the player to move at the root (or the root of any subtree) optimizes its


gains by returning the maximum of its children’s values. The other player, Min, tries to minimize Max’s gains by always choosing the minimum value. In zero-sum games, one’s gain is the other’s loss. Therefore, by evaluating the terminal nodes from the perspective of the player to move and negating the values as they back up the tree, the value at each node in the tree can be treated as the merit for the player whose turn it is to move. This framework is referred to as NegaMax [12], which is simple and uniform, and has a global vision¹.
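The negation-based back-up can be seen in a few lines of Python. This is a toy sketch of the NegaMax idea, not code from the thesis; the nested-list tree representation is my own assumption:

```python
def negamax(node):
    """Value of `node` for the player to move. A node is either an int
    (a leaf, scored from the mover's perspective) or a list of children."""
    if isinstance(node, int):
        return node
    # A child's value is from the opponent's perspective, so negate it
    # before maximizing: max(-v) replaces the separate Min/Max cases.
    return max(-negamax(child) for child in node)

# Depth-2 toy tree: Max picks a subtree, then Min picks within it.
assert negamax([[3, 5], [2, 9]]) == 3
```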

Most people would find that, theoretically, the outcome of the whole game can be found in the way described above; but as indicated in the first part, the “exponential explosion” prohibits us from doing so without limiting the search depth. Fortunately, it has been known since 1958 (maybe even earlier) that pruning is possible in a min-max search. It is not essential to expand all the branches of a game tree to find the min-max value; only a so-called minimal tree needs to be expanded. The final outcome depends only on the nodes in the minimal tree. In their seminal 1975 paper, Knuth and Moore gave a thorough analysis of the minimal tree and the α−β algorithm. In [14] Marsland and Popowich introduced a more descriptive terminology for the definition of the minimal tree², as follows:

Definition 2.1.1 (Minimal Tree) Given any game tree, we can derive a minimal tree by identifying its nodes according to the following rules:

1. The root of the game tree is a pv-node.

2. At a pv-node n, at least one child must have the min-max value −v_mm(n) (when there are several such children, pick one arbitrarily). That child is a pv-node, but the remaining children are cut-nodes.

3. At a cut-node, a child node n with a min-max value v_mm(n) < v_mm(n_pv) is an all-node, where n_pv is the most immediate pv-node predecessor of n. At least one child must have such a value; when there are several, pick one arbitrarily.

4. Every child node of an all-node is a cut-node.

Obviously, several different minimal trees may exist, according to the second sentence of rule 3.

Knuth and Moore provide an elegant formula for the number of leaves of the minimal tree, as follows:

    N_d = 2b^(d/2) − 1                       if d ≡ 0 (mod 2)
    N_d = b^((d+1)/2) + b^((d−1)/2) − 1      if d ≡ 1 (mod 2)

¹ Every traditional top-down search strategy used in small-board games has a global vision; the reason I emphasize this in this paper will be explained in Section 3.
² Whether this tree is truly minimal (when game trees have a variable branching factor and a transposition table is used) has been questioned by Aske Plaat et al.; interested readers can refer to Plaat’s PhD thesis [19].
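The leaf-count formula (equivalently N_d = b^⌈d/2⌉ + b^⌊d/2⌋ − 1, with b the fixed branching factor and d the depth) can be sanity-checked numerically. The sketch below is my own illustration, not part of the thesis: it counts the leaves a fail-soft NegaMax α−β search visits on a uniform tree whose leaves all evaluate to 0; with equal leaf values every move ordering is best-case, so exactly the minimal tree is expanded.

```python
from math import ceil, floor

def minimal_tree_leaves(b, d):
    # Knuth-Moore leaf count: N_d = b^ceil(d/2) + b^floor(d/2) - 1
    return b ** ceil(d / 2) + b ** floor(d / 2) - 1

def alpha_beta_leaf_count(b, d):
    """Leaves visited by fail-soft NegaMax alpha-beta on a uniform
    (branching b, depth d) tree whose leaves all evaluate to 0."""
    count = 0
    inf = float("inf")

    def search(depth, alpha, beta):
        nonlocal count
        if depth == 0:
            count += 1
            return 0
        best = -inf
        for _ in range(b):
            v = -search(depth - 1, -beta, -max(alpha, best))
            if v > best:
                best = v
                if best >= beta:   # cutoff: remaining siblings pruned
                    break
        return best

    search(d, -inf, inf)
    return count

for b in (2, 3, 4):
    for d in (1, 2, 3, 4, 5):
        assert alpha_beta_leaf_count(b, d) == minimal_tree_leaves(b, d)
```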


P denotes pv-node; C denotes cut-node; A denotes all-node

Figure 1: Minimal Tree

Figure 2: α-cut and β-cut

where d is the depth and b is the fixed branching factor of the game tree. One can see that the concept of the minimal tree sets a lower bound on the cost of any min-max based search method; that is to say, in order to find an optimal solution in a d-depth game tree, any search method needs to search at least a full ⌊d/2⌋-depth tree.

Most pruning algorithms try to approach the minimal tree, that is, to cut down as many branches as possible to “disinter” the minimal tree, among which comes the most important method, the α−β search. Intuitively, the idea is very simple:

1. α-cut: consider Fig. 2.1(a). The value of node B is 18, and the value of node D is 16. A is a Max-node, and C is a Min-node. Obviously, we can deduce that Value(C) ≤ 16. Because A = Max(B, C), A = 18, no matter what the values of E and F would be. Therefore, we do not need to compute the values of E and F.

2. β-cut: similarly, consider Fig. 2.1(b). The values of nodes B and D are 8 and 18, respectively. A is a Min-node, and C is a Max-node. Obviously, we can deduce that Value(C) ≥ 18. Because A = Min(B, C), A = 8, no matter what the values of E and F would be. Again, we do not need to compute the values of E and F.

a = alpha; b = beta

Figure 3: The Effects of α−β Pruning

In the NegaMax framework, the α−β algorithm can be presented like this (d means depth):

Algorithm α−β(n, d, alpha, beta) {
    S = Successors(n);
    if (d ≤ 0 or empty(S))
        return Evaluation(n);
    best = −MAXINTEGER;
    for all ni ∈ S do {
        v = −α−β(ni, d − 1, −beta, −max(alpha, best));
        if (v > best) {
            best = v;
            if (best ≥ beta) return best;
        }
    }
    return best;
}

Fig. 2.1 shows an example of α−β pruning.
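As an executable companion to the pseudocode, here is a minimal Python sketch. It is my own toy version, not code from the thesis: it assumes a nested-list tree where an integer is a leaf scored from the side to move, so no separate depth limit or Evaluation() is needed.

```python
from math import inf

def alpha_beta(node, alpha=-inf, beta=inf):
    """Fail-soft NegaMax alpha-beta. A node is an int leaf (scored from
    the side to move) or a list of child nodes."""
    if isinstance(node, int):
        return node
    best = -inf
    for child in node:
        # The child is seen from the opponent's side: negate value and window.
        v = -alpha_beta(child, -beta, -max(alpha, best))
        if v > best:
            best = v
            if best >= beta:   # beta cutoff: remaining siblings are pruned
                break
    return best

tree = [[3, 5], [2, 9]]
assert alpha_beta(tree) == 3   # same value plain min-max would return
```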

2.2 The Enhancements for α − β Search

The number of nodes that need to be expanded in the α−β search tree depends heavily on the order of the nodes at the next level. In the best case, the next move is


always expanded first at pv-nodes; at cut-nodes any move sufficiently good to cause a cutoff can be searched first. But in the worst case, it degenerates into the original min-max search.

Various heuristics are used to accomplish a good move ordering. J. Schaeffer invented the “History Heuristic” in 1989 [20], which tries to rank the “usefulness” of moves, where a good move meets at least one of two conditions: (1) it causes pruning; (2) it does not cause pruning, but is the best among its “brothers” during the search. We maintain a so-called “history table” for all the possible moves that can occur in the game. When finding a good move, we add a certain bonus to the value of the corresponding move in the history table. (Schaeffer suggests a bonus of 2^d, where d is the depth at which the move is generated, but it usually depends on the specific game.) A move that is considered a good move many times during the search gains a high score in the history table. Every time we expand a node, we first sort its children according to their scores in the history table, in order to get a good order for further search. Because a move with a high score will probably cause pruning or be the best choice at that level, this can speed up the search. An important property of this technique is the sharing of information about the effectiveness of moves throughout the tree, rather than only at nodes that appear at the same search level. In other words, it uses the “graph property” of the game tree: since we can merge identical nodes/positions in the game tree, it becomes a graph.
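A minimal sketch of this bookkeeping (my own illustration; the move names are hypothetical, and the 2^d bonus follows Schaeffer's suggestion described above):

```python
from collections import defaultdict

history = defaultdict(int)   # move -> accumulated "usefulness" score

def record_good_move(move, depth):
    # Bonus of 2^d at the depth where the move caused a cutoff or
    # turned out best among its siblings (Schaeffer's suggestion).
    history[move] += 2 ** depth

def ordered(moves):
    # Before expanding a node, try historically good moves first.
    return sorted(moves, key=lambda m: history[m], reverse=True)

record_good_move("e2e4", 5)
record_good_move("d2d4", 3)
assert ordered(["d2d4", "e2e4", "g1f3"]) == ["e2e4", "d2d4", "g1f3"]
```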

Another approach is to enhance the α−β search itself. The Principal Variation Search (PVS) is one successful variant. The basic idea behind this method is that we can use narrow windows to prove the inferiority of subtrees, leading to the pruning of some additional horizon nodes. The PVS algorithm first explores the expected pv-nodes (the true type of a node is not known until it has been searched), tries to establish a lower bound, and then visits the expected cut- and all-nodes, using the lower bound to reduce its search. One can compare the example shown below with the one employing the standard α−β algorithm shown earlier. For a detailed description of the algorithm, please refer to [13]. In order to support this idea in a more efficient fashion, the results (lower bound, upper bound, exact score, best move, status, ...) of the nodes that are searched can be kept in a large direct-access table. When a position is reached again, the corresponding table entry serves as an advisor to narrow the (alpha, beta) window and provides the best move that was found before, as well as allowing the reuse of configurations that have been completely examined before. This technique is usually called the “Transposition Table”. Actually, it is just a table containing information for later retrieval [27]. Fig. 2.2 shows an example of PVS pruning.
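The PVS idea (search the first, expected-pv child with the full window; try to refute the remaining children with null windows; re-search only when one unexpectedly fails high) can be sketched over the same kind of toy tree. This is my own illustration, assuming integer-valued positions so a null window is (alpha, alpha + 1):

```python
from math import inf

def pvs(node, alpha, beta):
    """Principal Variation Search over a toy tree: an int is a leaf
    (value for the side to move), a list is a node's children."""
    if isinstance(node, int):
        return node
    best = -inf
    for i, child in enumerate(node):
        if i == 0:
            v = -pvs(child, -beta, -alpha)           # full window
        else:
            # Null-window "proof" that this child is inferior ...
            v = -pvs(child, -alpha - 1, -alpha)
            if alpha < v < beta:
                # ... failed: re-search with the real window.
                v = -pvs(child, -beta, -v)
        if v > best:
            best = v
        alpha = max(alpha, v)
        if alpha >= beta:
            break
    return best

assert pvs([[3, 5], [2, 9]], -inf, inf) == 3
```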

a = alpha; b = beta

Figure 4: The Effects of PVS Pruning

It is commonly accepted that the fastest α−β-based pruning strategy without the aid of knowledge heuristics is the MTD(f) (Memory-enhanced Test Driver with node n and value f) algorithm, proposed by Aske Plaat et al. [19]. Its main idea is to call the standard α−β search many times, each time with a minimal window; each minimal-window search returns a bound, which can be used to tighten the global upper and lower bounds. These bounds change quickly during the search; when upper bound ≤ lower bound, we finish, and at that time the value g has converged to the best value. This idea originates from two observations: (1) null-window searches cut off more nodes than wide search windows; (2) we can use storage to glue multiple passes of null-window calls together, so that they can be used to narrow the range of the final min-max value without re-expanding nodes searched in previous passes, creating a best-first expansion sequence. The MTD(f) algorithm looks very simple, somewhat like the quicksort algorithm, as follows:

Algorithm MTD(firstguess, depth) {
    g = firstguess;
    upperbound = MAXINTEGER;
    lowerbound = MININTEGER;
    while (lowerbound < upperbound) {
        if (g == lowerbound) beta = g + 1; else beta = g;
        g = α−β(depth, beta − 1, beta);   /* null window search */
        if (g < beta)
            upperbound = g;   /* fail low */
        else
            lowerbound = g;   /* fail high */
    }
    return g;
}
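A runnable toy version of the loop above (my own sketch; it omits the transposition table, so the repeated null-window passes re-expand nodes, which a real MTD(f) avoids):

```python
from math import inf

def alpha_beta(node, alpha, beta):
    # Fail-soft NegaMax alpha-beta over a toy tree: an int is a leaf
    # (value for the side to move), a list is a node's children.
    if isinstance(node, int):
        return node
    best = -inf
    for child in node:
        v = -alpha_beta(child, -beta, -max(alpha, best))
        if v > best:
            best = v
            if best >= beta:
                break
    return best

def mtdf(node, first_guess):
    """Converge on the min-max value by repeated null-window probes."""
    g = first_guess
    lower, upper = -inf, inf
    while lower < upper:
        beta = g + 1 if g == lower else g
        g = alpha_beta(node, beta - 1, beta)   # null-window search
        if g < beta:
            upper = g   # probe failed low: g is a new upper bound
        else:
            lower = g   # probe failed high: g is a new lower bound
    return g

assert mtdf([[3, 5], [2, 9]], 0) == 3
```

A good first guess shortens the convergence, which is why the result of a (D−1)-ply iterative-deepening pass is typically fed in as `first_guess` for the D-ply search.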


Obviously, the initial “firstguess” value is also very important. If the initial value is close to the final min-max value, convergence is achieved very fast. A standard technique to further improve this strategy is called “Iterative Deepening” [23] (it can also be used with many other strategies, e.g. to control the time performance). Intuitively, an iterated series of 1-ply (“ply” is jargon for one level of depth), 2-ply, 3-ply, ... searches is carried out, with each new search first retracing the best path from the previous iteration and then extending the search by one ply. If assisted by a memory table, iterated search often requires less time than an equivalent direct search. And the best value of the (D−1)-ply search can provide a good “firstguess” for the D-ply search.

Many other techniques try to improve the α−β search algorithm further, among

which forward pruning is thought to be the most promising as well as the most “hazardous” way. This kind of strategy usually employs powerful heuristic methods to further reduce the tree size, such as: “decrease the search depth if two moves in a row by one player do not help to bring the value of a position back into the α−β search window, since in most games it is always better to make a move than to pass”. In the strong Othello program Logistello, Buro [6] introduced a probabilistic forward-cut strategy, ProbCut, based on the following idea: in order to evaluate a position using a deep search of depth d, the position can first be examined by a shallow search of depth d′ < d. The result v′ can be used to estimate the true value v and to decide, with a prescribed likelihood, whether v lies outside the current (alpha, beta) window. If so, the position is not searched more deeply and the corresponding window bound is returned. Otherwise, the search is performed to depth d, yielding the true value. Intuitively, this just emulates how human players eliminate obviously bad choices. Later this method was combined with the Multi-cut strategy proposed by Björnsson and Marsland to gain higher performance. The main idea of the Multi-cut strategy is as follows: for a new principal variation to emerge, every expected cut-node on the path from a leaf node to the root must become an all-node. But some expected cut-nodes, where many moves have good potential for causing a beta-cutoff, are less likely to become all-nodes, and consequently such lines are unlikely to become part of a new principal variation. As in ProbCut, Multi-cut uses a shallow search to decide whether pruning can be performed earlier: it establishes a constant C, and if the number of cutoffs found in the shallow search is greater than C, we need not search further. For detailed information on Multi-cut, please refer to [4]. All these techniques cannot guarantee that the best move will be found, because falsely judging a node as a “bad” choice at a shallow level and discarding it can lead to the omission of an actually good move.
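The ProbCut cut test itself can be sketched as follows. This is a hypothetical illustration: the regression coefficients A, B, the error SIGMA and the threshold T are invented here, whereas in Logistello they are fitted by linear regression between shallow and deep search results.

```python
# Model: v_deep is approximately A * v_shallow + B, with error SIGMA.
A, B, SIGMA = 1.0, 0.0, 2.0
T = 1.5   # roughly Phi^-1 of the prescribed likelihood

def probcut_prune(shallow_value, alpha, beta):
    """Return 'high'/'low' if the shallow result already shows, with the
    prescribed likelihood, that the deep value lies outside (alpha, beta);
    return None to fall through to the full-depth search."""
    estimate = A * shallow_value + B
    if estimate - T * SIGMA >= beta:
        return "high"   # deep search would very likely fail high
    if estimate + T * SIGMA <= alpha:
        return "low"    # deep search would very likely fail low
    return None         # too close to call: do the deep search

assert probcut_prune(10, -5, 5) == "high"
assert probcut_prune(-10, -5, 5) == "low"
assert probcut_prune(0, -5, 5) is None
```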


2.3 The Weaknesses of α−β Search and Alternative Methods

Simon and Schaeffer said that “In some sense, it is unfortunate that computer chess was given the gift of α−β search so early in its infancy. Although it was a valuable and powerful tool, its very power may have inadvertently slowed progress in chess programs for a decade.” [22] Indeed, no other existing method can compete with the domain-monopolizing, many-fold enhanced and fine-tuned α−β algorithm. However, we cannot neglect some limitations of α−β search. It is known that α−β search always uses an “imprecise” heuristic evaluation function; the search depth is limited; and only the maximum/minimum of the successors’ values is taken into account. All this enlarges the possibility of erroneous estimation. Some techniques based on α−β search are already used to mitigate the second defect, such as “Quiescence Search”, but the first and third seem beyond α−β search’s power. In addition, the depth-first manner employed by α−β search limits the choice of which node to expand next.

Historically, many strategies have been proposed besides α−β search. Among them B* (and its successor Probabilistic B*) and Conspiracy Numbers seem the most interesting. Both use a selective search fashion. Before describing them, I would like to repeat an interesting comparison between “selective search” and “brute-force search”, using an analogy between the game-tree search space and a surface with geological features on it, proposed by Berliner and McConnell [3]. The X and Y coordinates define the location on the surface, and the Z coordinate is the value of the evaluation function at point (Xi, Yi) (the height of the geological feature). Suppose we have a marble, and we wish to roll it into the lowest position on the surface (perhaps one can find the largest golden stone there!). There might be ridges and barriers on the surface, but one can never know what is over the next ridge until one has looked over it. In other words, we are searching online. A brute-force strategy like α−β search explores the full area within a given radius of the original position, except for some obvious forks in the road bearing a sign that says “no gold here”, where pruning is performed. A selective search strategy, in contrast, takes small steps toward the direction that appears to offer the larger probability of finding a relatively large golden stone, peering over more meaningful ridges than the brute-force strategy does. In a smooth space, searching farther performs well; but in a highly ridged space, a searcher guided by clues as to where the golden stones are to be found will win.

The basic idea of Conspiracy Numbers, proposed by McAllester [15], is to record how many leaf nodes in a subtree have to change their values in order to change the value of the root; the corresponding minimum number of such nodes is called the “conspiracy number” of the root. A specific “conspiracy threshold” is set for the root node, and the higher the threshold is, the greater is our confidence in the value of the root node. Instead of performing a fixed-depth search like α−β search, this


algorithm selectively expands nodes in the tree until the specified degree of confidence in the root value is achieved. Unfortunately, several difficulties trouble this strategy, such as: what is a suitable conspiracy threshold? When should the search terminate? The space and time requirements are also a big problem, because we need a conspiracy number for every possible value a node can have.

In a sense, the B* algorithm [3] seems the most promising strategy besides α−β search. It tries to establish a lower bound on the best successor that is greater than or equal to the upper bound of all alternatives, which is called separation. Once separation is found, the search procedure is finished. In contrast to α−β search’s depth-first manner, B* uses a best-first search strategy: it uses bound information to traverse the tree and expand the most relevant leaf nodes. The Probabilistic B* employs probability distributions to further reduce the computational price of calculating the unique scalar value [18], and it is used successfully in the famous chess program Hitech. The main difficulty of this method is that separation was found to be rather hard to achieve, making termination of the search difficult; therefore Hitech adopts a relaxed criterion called domination.

Is there any alternative method that is superior to traditional α−β search in practice? After a systematic analysis of the game of chess, Junghanns said "No" [11]. But I doubt things would remain the same for all kinds of games, although it is a fact that all those alternatives might be weak in some aspects compared with the well-equipped α−β search in the game of chess.

2.4 A Test of α−β Search and Its Enhancements

I built an Othello program to test the practical effects of α−β search and many of its enhancements. Table 1 shows the results. Here are some observations:

1. The "Transposition Table" does not work well in Othello. I think the reason is that the configuration of the game board in Othello changes too quickly after each move, so few identical entries recur. In practice, it reduces the nodes by about 20%-40% compared with the standard α−β search and achieves a 20%-50% speed-up.

2. The "History Heuristic" does not work well in Othello for the same reason as encountered by the hash table. In Othello, a grid that allows a stone to be placed now might be forbidden in the next turn, and then the "good node" will bring us nothing but misguidance. In practice, it reduces the nodes by 50% compared with standard α−β search. But to my surprise, the time cost increased by 20%-40% instead. I think this is caused by the "sort" procedure (used to choose the most useful node, which is not needed in standard α−β pruning search).

3 NS is an abbreviation for NegaScout; it is similar to PVS, another variant of null-window search.


Table 1: α−β search and its enhancements

                        Avg. #nodes searched/step    Avg. time cost/step (sec.)
Search strategy          depth = 8     depth = 10      depth = 8    depth = 10
Standard α−β                249657        3890512           3.45         56.55
α−β + TT                    140627        3133077           2.27         49.63
α−β + HH                    114502        2158547           4.25         79.17
MTD(f)                       81845         997818           1.23         15.88
α−β + TT + ID               141051              –           2.33             –
MTD(f) + HH                 64376*      1044324**           1.46         26.43
TT + NS3 + HH                69920         922427           1.48         21.90
Multi-ProbCut       600000-650000 (depth = 13)       8.07 (depth = 13)

HH = History Heuristic; TT = Transposition Table; ID = Iterative Deepening

* Actually, there is a big oscillation; #nodes searched is 50000-100000.

** #nodes searched is 600000-1300000.

Computing environment: IBM ThinkPad T20, Windows XP, P3 700 MHz processor, 256 MB main memory.

3. "MTD(f)" suffers less from the unfavorable characteristic of Othello (the quick change of configurations). In practice, it reduces the nodes by 70%-75% and speeds up the search 2.8-4.0 times.

4. Implementing "Multi-ProbCut" requires a lot of game-specific parameters. For lack of good parameters, the performance was not improved much in my experiment (compared with Michael Buro's experiments). But it is still the best strategy for Othello, since it never suffers from Othello's unfavorable characteristics, and it can search much deeper in the same amount of time.

5. "NegaScout + Hash + History Heuristic": the combination of these 3 techniques reduces the nodes by 75%-80%, but only speeds up the search by 2.3-2.6 times (because of the History Heuristic).
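For reference, the pruning scheme shared by all the variants in Table 1 can be written in a few lines. The following is an illustrative fail-soft alpha-beta sketch on a toy game tree, not the Othello program itself; the nested-list tree and the node counter are my own additions for demonstration.

```python
import math

def alphabeta(node, alpha, beta, maximizing, stats):
    """Fail-soft alpha-beta on a nested-list game tree (ints are leaves)."""
    stats["nodes"] += 1
    if isinstance(node, int):
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False, stats))
            alpha = max(alpha, best)
            if alpha >= beta:          # beta cutoff: Min will avoid this branch
                break
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True, stats))
        beta = min(beta, best)
        if beta <= alpha:              # alpha cutoff: Max will avoid this branch
            break
    return best

# Max to move at the root; the classic textbook shape where one leaf is pruned.
tree = [[3, 5], [2, 9]]
stats = {"nodes": 0}
best = alphabeta(tree, -math.inf, math.inf, True, stats)
# best is 3; the leaf 9 is never visited, so only 6 of the 7 tree nodes are searched
```

The enhancements in Table 1 all bolt onto this loop: a transposition table caches results per position, the history heuristic reorders `node`'s children, and MTD(f) drives repeated null-window calls of this routine.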

3 Control — A More General Concept

“In the knowledge lies the power”

— Edward A. Feigenbaum [10]


"What makes a problem a problem is not that a large amount of search is required for its solution, but that a large amount would be required if a requisite level of intelligence were not applied"

— Allen Newell and Herbert A. Simon [17]

In contrast to those brilliant successes against human players in chess, checkers, Othello and Scrabble, all the traditional search strategies listed in the previous section may fail in games like Go and poker. For example, after many years of exploration, the field of Go programming is still not well developed. The key point is that Go's 19 ∗ 19 board and the resulting large branching factor overwhelm the power of α−β search. In fact, at least from my point of view, chess and Go represent two extreme examples of grid-based games. A brief comparison might make that clear:

1. The state of the board changes rapidly in chess, as pieces move to new positions continuously and captures are often made, while in Go the board changes only gradually in most situations, as stones are added to the existing configurations (captures are less influential in most cases than in chess). As a result, the value curve in chess stays smooth most of the time and leaps abruptly at special points, while it is much more stable in Go.

2. In chess, there is an intimate relationship between the likelihood of winning (from any stage of the game) and the number and quality of pieces on each side. But in Go, there is no strong correlation between winning a tactical struggle over a group and winning the game (a ko fight is a good example). Thus, it seems that chess is just like a battle while Go appears as a war [5].

3. In Go, the number of legal moves and alternatives is considerably larger than in chess, which explains the poor behavior of α−β search and other depth-first brute-force search methods. A typical Go program's global look-ahead contains fewer than 5 steps.

4. Unlike chess, there are a lot of configurations in which only one sensible move exists; i.e., a series of forcing moves emerges in cases like the "ladder"4, where one should consider more than 20 steps ahead to determine success or failure, which makes breadth-first search useless.

These essential dissimilarities contribute to the difficulties involved in implementing traditional top-down tree-search strategies for Go. Therefore, Go researchers are more interested in things like "pattern matching", "formulary moves" (such as joseki in Go jargon), "local tactics", "global strategy", "contact fights", etc., most of which aim to emulate human cognitive procedures and make use of existing knowledge. To draw an analogy, they try to use local tactics to win battles and global strategies to guide the war. For instance, the early academic Go program INTERIM.2 (Reitman and Wilcox) used a hierarchical decision system to generate a sound move. It summarized information up and filtered decisions down. Information contained in low-level representations was summarized and used in high-level representations. The program made decisions at a high level about what to "focus on". These decisions were filtered down to lower levels and influenced the decisions made at those levels.

4 One can refer to the Appendix of [2].

All these facts tell us that it is impossible to rely solely on a single technique, "Search", in computer game play. New mechanisms are called for, and here comes the concept of "Control".

3.1 The Definition of Control

The rough definition of "control" is "to guide a configuration towards a more manageable situation". According to their inherent characteristics, we can divide the concrete methods of control into two types. One is "passive" control, like pattern recognition and formulary moves; in order to use these methods, the computer must wait until specific patterns emerge. The other is "active" control, like global strategy and local tactics; by employing these methods, the computer can try to reach familiar situations itself. In many cases, the two kinds of control are used together to accomplish the goal. For example, the computer could use active control to guide the board configuration towards a situation with which it is familiar, and then use passive control to find an exactly good move position. From another perspective, we could also divide them into "global control" and "local control" according to their "vision". The concept of a "manageable situation" here is a bit fuzzy, and it is hard for me to give a precise definition right now. Roughly, it is the kind of situation in which the computer has a larger possibility of creating good moves while eliminating errors.

To give the reader a more intuitive image of the difference between "search" and "control", I would like to revisit the analogy proposed by Berliner and McConnell. From a higher perspective, there are two ways to find a good low position. One is to do a brute-force search, which exhausts the area within a possible radius from the original position. Certainly it can find a good position, but it needs a lot of energy as well. An alternative approach is to try to recall analogous situations encountered before (passive control), or to use some strategies to reach situations (active control) in which we already know how to find a good position. For example, if we know in advance that there is always a deep canyon behind a ridge, and we are also familiar with the way to reach the lowest point of the canyon once across the ridge, we can head straight for the ridge, which renders the situation more manageable.

Generally speaking, "Search" is just an alternative to "Control" on small boards, and "Control" is a more promising way towards victory in large and complex settings. In the global scope, control might use various strategies and knowledge to find the way to manageable situations, and to narrow the consideration spectrum. And in the local scope, control could employ systems like CGT (see Appendix A) to find the exact best moves, or use traditional search methods to find a good move; knowledge-based techniques like "pattern recognition" are also applicable. Thus the ways to find a reasonable move (not necessarily the move with the best practical benefit) are extended enormously.

Figure 5: The Spectrum of Control

Figure 5 shows the spectrum of control. The tree-search strategy that appears in the "Global Control" part might look a little strange. Actually, it is totally different from the traditional tree-search method used to find a good position. Instead, it is a kind of strategy (there could be other strategies with the same effect). As one can see in Section 3.4, it can be used to regain the global vision which is lost in the procedure of finding "hot spots". The definition of vision here is: the size of the surrounding area that is taken into consideration when making a decision.

People have long placed too much confidence in, and attention on, the traditional top-down search and pruning methods, while neglecting their essential weakness. To some extent, the victory of Deep Blue is just a brute-force victory; it contributes little to true Artificial Intelligence. On the other hand, another famous chess program, Hitech, which uses an advanced pattern-recognition system and reaches a performance similar to Deep Blue's, is worth more credit. Let us imagine an operation in which we expand the chess board to 16 ∗ 16. What would happen? Perhaps humans (or the Hitech program after some improvements) could still play well after this change, but it would definitely become a nightmare for Deep Blue, because the number of possible moves at each step would increase greatly, and in order to play well the computer would have to search much deeper to gain "intelligence", both of which would make the "exponential explosion" problem more severe. Computer game play is just like a war: a general cannot determine everything in a war, and a wiser one would always try to control it as much as possible. I regard "intelligence" as the capability to reach a controllable situation, one which can be dealt with nicely using existing knowledge and resources.

One may ask why control is useful and applicable in games. It is obvious that the concept of control will be worthless if we can never reach a completely manageable situation, or if the procedure of control takes too much time. On the contrary, if we can completely control the game in early time, it definitely becomes a promising way towards victory! The CGT system used in Domineering, incorporated with the separation strategy, is one such example. Another interesting example occurs in chess: if one tries to exchange chessmen as quickly as possible, the board configuration will soon reach its end-game stage. An agent who has enough end-game skills would seldom lose! Of course, it might also be hard to win.

3.2 The Problems with Passive Control

Unfortunately, there are several major difficulties with regard to these traditional "passive control" approaches5. It is often the case that large databases are needed in game programs in order to perform "pattern recognition" and "formulary moves". According to David Fotland (the author of the Many Faces of Go (MFG) program), the pattern database of MFG contains around 1200 patterns of size 8 ∗ 8 and around 6900 suggested moves for these patterns, and its joseki database contains around 45000 moves. And to one's surprise, the endgame databases of the Checkers program Chinook contain all checkers positions with eight or fewer pieces: 444 billion positions compressed into six gigabytes for real-time decompression. Behind their notable success, a series of problems emerges. First of all, it can take a long time to recognize a position and to search for a solution in the database. And the size of the database is also limited by technical constraints. Second, once out of the scope of the database (i.e. when there is no exactly matching position in the database), the computer is at a loss, even if the position it encounters is just a variation of a basic one, or could be deduced easily from some basic positions. Third, if using local pattern-recognition strategies alone, the computer is limited to a local scope. It will be blind to things that have happened in other places on the board, which leads to poor decisions in cases like "ko" in Go.

Besides, another difficulty (the fourth here) seems never to have been pointed out by other researchers (perhaps it just occurs in some lesser-known games). In some games, such as Domineering, Clobber, COL and SNORT6, it is usually hard to tell a good position from a bad one, even after a considerable depth of search. Just like some affairs that happen in real life, such as the global economy and relations among nations, one can never know the best way to deal with them so as to obtain the largest immediate benefit. Similarly, the precise values of several configurations are basically the same, and may be differentiated only by an infinitesimal "number" like ↑, ∗, · · · (see Appendix A). No existing evaluation function can tell them apart if it only measures the value of positions. All those traditional tree-search methods which aim to find a good move lose their power.

5 In fact, certain problems related to "active control" might be more difficult, such as how to apply global strategies in complicated settings, but they are beyond the scope of this thesis.

The purpose of this paper is:

1. Try to propose a general framework for computer game play.

2. Try to solve the second and third problems, as well as contriving an alternative method to avoid the fourth problem, for a class of games (this will appear in Section 4).

Before showing the general framework, I would like to introduce a new passive control method, which shows its superiority in some games.

3.3 A New Passive Control Method

Since the 1970s, a group of mathematicians have made great efforts to analyze end-game configurations in a precise way. They wish to predict not only who will win, but also how much the winner can gain at the end of the game, on the premise of perfect play for both sides. In 1976, John H. Conway published his famous work On Numbers and Games. And in 1982, BCG's Winning Ways brought people into the burgeoning field of recreational mathematics7. By using some special numbers like ∗, ↑, ↓, +n, −n and certain techniques like chilling and warming (these, with many other techniques and theorems, constitute a big theory called CGT), they managed to calculate the precise value of each particular end-game configuration for a variety of games, with no error any more! And if the value is > 0, we obtain a winning strategy for the Left player (< 0 for the Right, ‖ 08 for the first player, = 0 for the second).

6 Domineering was considered by Goran Andersson and has also been called Crosscram and Dominoes. Left and Right take turns placing dominoes on a rectangular checker-board. Left orients his dominoes horizontally and Right vertically. Each domino exactly covers two squares of the board and no two dominoes may overlap. A player who can find no room for one of his dominoes loses. I decided to use it as a demo for my new strategy; see Section 4. Clobber is played by two players, White and Black, on a rectangular checker-board. In the initial position, all squares are occupied by a stone, with white stones on the white squares and black stones on the black squares. A player moves by picking up one of their stones and clobbering an opponent's stone on an adjacent square (horizontally or vertically). The clobbered stone is removed from the board and replaced by the stone that was moved. The game ends when one player, on their turn, is unable to move, and that player loses. For other games, please refer to [1, 7].

7 Both books have been revised and published by A.K. Peters. [1, 7]

It seems that we have found a new kind of tool with which to implement "passive" control. And if it can be used, it will be more powerful than traditional pattern recognition using knowledge databases. The reasons are as follows:

1. CGT enables us to calculate the precise value of a particular board configuration dynamically. Yes, the basic part of a CGT system is just an exhaustive game-tree min-max search, but the values it calculates can also be stored in a knowledge database as patterns. I just want to say that it is superior both to static min-max search and to a traditional database system. This is the key reason I use it here.

2. A lot of general patterns and rules can also be used to speed up the calculation.

3. More interestingly, these unconventional numbers can be "compared" with operators like "‖>", "<‖", ">", "<", and (the key point) "+", which enables the endgame play to escape from the local scope and obtain a global vision by deciding precisely which component to play into.
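To make the comparison and addition of such game values concrete, here is a minimal sketch of the recursive CGT definitions (a toy illustration of my own, not Siegel's toolkit): a game is a pair of option sets, negation and disjunctive sum follow the textbook recursion, and G ≥ 0 exactly when Left wins if Right moves first.

```python
class Game:
    """A combinatorial game G = {left options | right options} (toy sketch)."""
    def __init__(self, left=(), right=()):
        self.left, self.right = tuple(left), tuple(right)

    def __neg__(self):          # -G = { -G^R | -G^L }
        return Game([-gr for gr in self.right], [-gl for gl in self.left])

    def __add__(self, other):   # disjunctive sum: move in exactly one component
        return Game(
            [gl + other for gl in self.left] + [self + hl for hl in other.left],
            [gr + other for gr in self.right] + [self + hr for hr in other.right],
        )

def ge_zero(g):   # G >= 0  <=>  Left wins when Right moves first
    return all(not le_zero(gr) for gr in g.right)

def le_zero(g):   # G <= 0  <=>  Right wins when Left moves first
    return all(not ge_zero(gl) for gl in g.left)

zero = Game()                   # { | }   : second player wins
star = Game([zero], [zero])     # {0 | 0} : first player wins, fuzzy with 0
up   = Game([zero], [star])     # {0 | *} : a positive infinitesimal

star_fuzzy   = not ge_zero(star) and not le_zero(star)          # * || 0
pair_is_zero = ge_zero(star + star) and le_zero(star + star)    # * + * = 0
up_positive  = ge_zero(up) and not le_zero(up)                  # "up" > 0
```

Even this tiny recursion already verifies that ∗ is fuzzy with 0, that ∗ + ∗ = 0, and that ↑ > 0, which is exactly the kind of arithmetic on infinitesimals that no scalar evaluation function can express.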

The traditional pattern recognition cannot do any of these. But as far as I know, CGT has never been used in an AI program. The crux lies in two points. First, it is true that it cannot be applied to some games (like chess) directly9. CGT works best with "cold" games, where having to move is a liability or at most an infinitesimal boon, whereas the vast majority of chess positions (and those of some other games) are "hot". This is also true for Go, but there the value of having to move, while positive, can be kept nearly constant and then managed by "chilling operators" [2]. Second, and perhaps most important, the problem of "exponential explosion" exists, even more severely than in "deep search", if we use this method to calculate the value dynamically. To avoid this difficulty, the only thing we can do is calculate the values of small components and then add them together, if possible, which requires the end-game to be "component independent". It is therefore impossible to use this strategy in the early stage of games10.

On the other hand, as indicated by Elwyn R. Berlekamp11, mathematicians have always been somewhat skeptical of an AI approach, because they think there are very few games for which any reasonable rating system exists. What they want to do is to calculate the value of games directly, such as for n ∗ n Domineering. Much work has been accomplished, but a lot of work still needs to be done. And whether it is possible to obtain general value formulations for all kinds of configurations is still in doubt. At least I think it is unrealistic to obtain the value of an arbitrarily shaped Domineering configuration using mathematical reasoning and deduction alone instead of brute-force calculation.

8 Fuzzy with zero.

9 Actually, in some cases in chess it can be applied. Please see [9] as an illustration.

10 I observed that it took quite a long time (no result after 10 minutes) to calculate an empty 6 ∗ 6 board of Domineering using Aaron Siegel's CGT toolkit.

11 According to a letter he wrote to me.

Section 4 will show that in games like Domineering, CGT can be employed as a powerful passive control. For the details of CGT, I refer the reader to Appendix A.

3.4 The Framework of a New Controlled-based Algorithm

As discussed above, "Control" is a more general concept than "Search", and we can imitate the classic α−β search algorithm to propose a new "control-based" algorithm for most games, especially grid-based games. The framework of the new algorithm is as follows:

Algorithm Controlled-α−β(n, d, alpha, beta, side) {
    if (side == OPPONENT)  /* We should model our opponent's strategy. */
        return OpponentPlay(n, d, alpha, beta, side);
    if (d ≤ 0 or IsBoardConverge() == true)
        return Evaluation(n);
    best = −MAXINTEGER;
    /* The control part */
    HotSpots = FindHotSpot(n, side);
    for all spot_i ∈ HotSpots do {
        result = Control(spot_i, side);
        if (result.type == VALUE) {
            if (Check(result.val) == true)
                Insert(S, Successors(result.pos));
        }
        else  /* result.type == POSITION */
            Insert(S, Successors(result.pos));
    }
    /* The modified version of α−β */
    if (empty(S))
        return Evaluation(n);
    for all n_i ∈ S do {
        val = −Controlled-α−β(n_i, d − 1, −beta, −max(alpha, best), not(side));
        if (val > best) {
            best = val;
            if (best ≥ beta) return best;
        }
    }
    return best;
}

This algorithm is just a simple framework that could be employed by any game-play system. Here are some explanations:

1. We introduce the function OpponentPlay() to model our opponent, since we often don't know our opponent's strategy; for example, we can assume that his moves are based only on practical benefits.

2. The definition of “hot spots”: regions that should be given first consideration.

3. IsBoardConverge() is a function that checks whether we can control the game completely, e.g. whether we can calculate the precise value of the board.

4. FindHotSpot() is a function that typically uses active control to find some "hot spots", which need preferential consideration. It reduces the entire board to several smaller local regions, thus greatly narrowing the branching width. Here "hot spots" refer to regions/positions that favor some specific strategies, rather than those that are merely good for immediate benefits.

5. Control() is a key function that employs active or passive control to find some concrete "hot" move positions in a specific region (the definition of "hot" is the same as in FindHotSpot()). Two types of results can be returned. One is a value (precise or imprecise); we use the function Check() to determine whether this spot should be considered, that is, whether the value obtained is "good" enough. We treat the next move that leads to this value as a "hot move"; otherwise we just neglect it. The other is a good position; such positions are treated as "hot moves" directly.

6. The "hot moves" collected from all the regions form the set S. We then perform a modified search-and-pruning algorithm on this set to choose the best move position; or, we could say, we use tree search to regain a global vision.

Note that this new algorithm has three new features:

1. Extend the notion of finding a good move that maximizes the immediate "benefit" to the notion of a good move that favors specific strategies.

2. Bring forward the concept of vision.


3. If the function LocalControl() can already calculate the value of a specific region, we might perform the pruning earlier, which decreases the search cost.

4 To Control Domineering in Early Time

"The basic paradigm for the initial testing of the germ theory of disease was: identify a disease; then look for the germ. An analogous paradigm has inspired much of the research in artificial intelligence: identify a task domain calling for intelligence; then construct a program for a digital computer that can handle the tasks in that domain"

— Allen Newell and Herbert A. Simon [17]

For the game of Domineering, we can adopt a strategy called "separation" to partition the entire board as quickly as possible; CGT can then be used to calculate the precise values of all the small components while maintaining the global vision as well. At the end stage of the game, once the board is convergent (the concept will be explained soon), we can obtain the whole value of the board, whether it is > 0 or < 0 for the computer's side. The computer will never make bad moves any more, but human beings will. Thus the value will increase monotonically in the computer's favor, and once it is > 0, the computer will definitely win! Therefore, all we must consider and design is the global control strategy for reaching such configurations (I call them convergent configurations) quickly. In other words, we must speed up the convergence. Of course, it doesn't help to converge to bad positions; instead, we should speed up convergence to winning positions. But that is more difficult to implement. For Domineering in particular, it is especially hard to contrive the "separation strategy". The program would definitely be stronger if we also considered the shape of positions, so an additional selection strategy is needed.
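As a baseline for what "perfect play" means here, a brute-force solver for tiny Domineering boards can be written in a few lines (an illustrative sketch of my own, separate from the control framework; following footnote 6, the board is just the set of free cells, Left places horizontal dominoes and Right vertical ones):

```python
from functools import lru_cache

def moves(cells, horizontal):
    """All positions reachable by one legal domino placement on `cells`."""
    out = []
    for (r, c) in cells:
        other = (r, c + 1) if horizontal else (r + 1, c)
        if other in cells:
            out.append(cells - {(r, c), other})
    return out

@lru_cache(maxsize=None)
def first_player_wins(cells, horizontal):
    """True iff the player to move (horizontal=True means Left) can win.
    The player who cannot place a domino loses."""
    return any(not first_player_wins(nxt, not horizontal)
               for nxt in moves(cells, horizontal))

board_2x2 = frozenset((r, c) for r in range(2) for c in range(2))
strip_1x2 = frozenset({(0, 0), (0, 1)})
```

On the 2 ∗ 2 board (value {1 | −1}) the first player wins whichever side moves first, while the horizontal 1 ∗ 2 strip is a win for Left only, matching the sign convention that positive values favor Left. Exactly this kind of exhaustive calculation is what becomes infeasible on large components, which is why the separation strategy matters.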

The Algorithm Controlled-α−β provides a general framework for designing programs for all games. But for a particular game, it should be elaborated and modified in order to harmonize with the specific domain. In the following part, I will provide concrete methods for all the undefined functions of Algorithm Controlled-α−β that will be used in Domineering.

4.1 The Definition of a Convergent Board

The function IsBoardConverge() in Domineering returns true if we can already calculate the value of all the components precisely. Because of the time constraint, we must first set a threshold ρ; let n be the number of grids contained in a component: if n ≤ ρ, we say this component is convergent. An alternative is to use a time bound directly; that is, if the value of a component can be calculated in less than a threshold time τ, we say it is convergent. A board is convergent if and only if all components on the board are convergent. From a convergent board configuration, we can use CGT (here I will use Aaron Siegel's toolkit) to calculate the value of the board precisely. And from then on, the computer will never make errors any more, so no search is needed any more.
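A size-threshold version of this check is straightforward. The sketch below (my own illustration) finds the 4-connected components of the free cells and declares the board convergent when every component is small enough for exact CGT calculation; `rho` is the hypothetical grid-count threshold.

```python
def components(free_cells):
    """Split the set of free cells into 4-connected components."""
    free, comps = set(free_cells), []
    while free:
        stack, comp = [free.pop()], set()
        while stack:
            r, c = stack.pop()
            comp.add((r, c))
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in free:
                    free.remove(nb)
                    stack.append(nb)
        comps.append(comp)
    return comps

def is_board_convergent(free_cells, rho):
    """Convergent iff every component has at most rho grids (size criterion)."""
    return all(len(comp) <= rho for comp in components(free_cells))

# Two 2x2 regions separated by a fully occupied column: two components of size 4,
# so the board is convergent for rho = 4 but not for rho = 3.
free = {(r, c) for r in range(2) for c in range(2)} | \
       {(r, c) for r in range(2) for c in range(3, 5)}
```

The time-bound criterion with τ would simply replace the `len(comp) <= rho` test with an attempted (time-limited) exact calculation of each component.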

4.2 The Control Part

This is the key part of our new algorithm.

4.2.1 The Criterion of Hot Spots

The purpose of the function FindHotSpot() is to find some "hot spots". In Domineering, it is relatively simple. We first set another threshold ν. If a component G is convergent and has a left incentive G − GL > ν or a right incentive GR − G > ν, it is treated as a "hot spot"; if more than one convergent component meets this condition, we choose the one with the greatest incentive value. Otherwise, if no convergent component meets the condition, we treat every non-convergent component as a "hot spot". If there is only one "hot spot" at the beginning, the algorithm terminates at once12. From this one can see that a convergent component that meets the requirement always has priority.
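The selection rule just described can be sketched directly; in this illustration of mine the incentives are plain numbers standing in for the CGT quantities G − GL and GR − G, and each component is a (name, convergent, incentive) triple.

```python
def find_hot_spots(components, nu):
    """Hot-spot selection: a convergent component whose incentive exceeds nu
    wins outright (the largest such incentive if there are several); failing
    that, every non-convergent component becomes a hot spot."""
    urgent = [c for c in components if c[1] and c[2] > nu]
    if urgent:
        return [max(urgent, key=lambda c: c[2])]
    return [c for c in components if not c[1]]

comps = [("A", True, 0.5), ("B", True, 2.0), ("C", False, 0.0)]
hot_low_nu  = find_hot_spots(comps, 1.0)   # B is urgent: it alone is hot
hot_high_nu = find_hot_spots(comps, 3.0)   # nothing urgent: C (non-convergent) is hot
```

This mirrors the priority stated above: an urgent convergent component always pre-empts the separation work on non-convergent ones.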

4.2.2 The Field Superposition Method

Then how do we find the quickest path to partition a non-convergent component? Our aim is to separate a component as soon as possible while keeping good weight balance and quality among its children (I mean the shapes of the child-components should favor the computer's side). People who have learned a bit of algorithm theory will quickly recall the minimum-cut problem. Certainly, it is the best way to separate a component in the absence of extra requirements. But unfortunately, it would be meaningless if we cut the component around a corner or someplace such that the two child-components differ greatly in size: the larger one might still be far away from convergence.

Of course, a predetermined method of separation is useless here, and we must try to exploit the potential of the existing blocks in the component in real time, hoping that they can be employed in the separation procedure. For example, positions near the two ends of a long stick would be preferred, while one near its "abdomen" is inferior, just like the intensity of the magnetic field of a magnet! This is where the "field superposition" method comes from.

12 In fact, I play a trick here. One might find that two strategies are used here: one aims at maximizing the immediate benefit, while the other tries to decompose the board as quickly as possible. Therefore, two different evaluation systems are called for. But it is very hard to harmonize these two evaluation systems because they are based on different methodologies: the former uses CGT, which provides accurate calculations, while the latter just adopts rough estimations. The trick I use here is to give higher priority to the concern for immediate benefits.

Figure 6: The Field Superposition Method

Formally, we can assign a value to each empty grid in the component by emulating the analysis of a magnetic field. To do that, we first assign a charge to each occupied grid: a positive charge "+" for an even grid (i.e. Px + Py ≡ 0 (mod 2)), and a negative one "−" for an odd grid; see Figure 6. Occupied grids forming continuous blocks are treated as radiating sources. Besides, an additional "virtual point" should be added to an originally empty component (whose interior contains no occupied grid) to enable the separation procedure at the beginning. The following algorithm shows how to calculate the value V(empty_j) of each empty grid.

Algorithm CalGridValue(Com) {
    Blocks = FormBlocks(Com);
    for every block_i ∈ Blocks do {
        for every empty_j ∈ Com do {
            sum = 0;
            for every grid_k ∈ block_i do
                sum = sum + sign(grid_k) ∗ 1/d²;   /∗ d is the Euclidean distance
                                                      between grid_k and empty_j ∗/
            V(empty_j) = V(empty_j) + |sum|;
        }
    }
}
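As an illustration, the field-superposition valuation in the CalGridValue pseudocode above can be sketched in Python. The data layout (blocks and grids as lists of (x, y) tuples) and the function names are assumptions made for this sketch, not the thesis's actual representation:

```python
def sign(grid):
    """Charge of an occupied grid: '+' on even grids (Px + Py even), '-' on odd ones."""
    x, y = grid
    return 1 if (x + y) % 2 == 0 else -1

def cal_grid_values(blocks, empties):
    """Field-superposition value V(empty_j) for each empty grid: for each block,
    superpose the 'field' sign(grid)/d^2 of its occupied grids, then add the
    absolute field strength |sum| to the empty grid's score."""
    V = {e: 0.0 for e in empties}
    for block in blocks:                  # hypothetical layout: one (x, y) list per block
        for e in empties:
            total = 0.0
            for g in block:
                d2 = (g[0] - e[0]) ** 2 + (g[1] - e[1]) ** 2  # squared Euclidean distance
                total += sign(g) / d2
            V[e] += abs(total)
    return V
```

For a single occupied grid at (0, 0), an empty grid at distance 1 scores 1.0 and one at distance 2 scores 0.25, reproducing the magnet-like decay described above.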



The grid with the highest score might indicate its superiority in separating the component. Certainly, we have to make some modifications to the superposition model in order to obtain a more favorable separation tool. Details, including how to adjust the value of an empty grid that is surrounded by more than two occupied grids, are omitted here.

4.2.3 Other Auxiliary Considerations

As mentioned above, quick decomposition alone is not enough; we should also consider the weight balance as well as the quality of the children. All these are very difficult, and additional criteria are needed. For example, in order to balance the weight, we can first find the "center of gravity" of a component, and then calculate the distance between an empty grid and that center. And we can simply use the comparison between the number of 1 × n strips and m × 1 strips to determine the quality of all child-components. Let V be the value of an empty grid, D the distance between the empty grid and the "center of gravity", and Q the sum of the qualities of its child-components (if the original component is not separated, Q = 0). We can then determine the final value of a grid as

V(grid_i) = αV + βD + γQ

where the values of the linear weights α, β, γ and the way to define Q can be determined by experiment.

4.2.4 The Local Control Part

The local control for convergent components is straightforward: we can find the best move position directly using CGT.

4.3 The Evaluation Function

Perhaps this is the most difficult part. There are two kinds of components: the value of one kind can be evaluated precisely, while that of the other can only be estimated. How, then, can we mix the two different evaluation criteria together? Fortunately, we can simply ignore the convergent components. The reason is that if one of them is "urgent", that is, its benefit value satisfies G − G^L > ν or G^R − G > ν, then it will be chosen immediately and the algorithm terminates at the first iteration; otherwise, the computer does not bother about it, and can concentrate its energy on the separation procedure.

The rest of the problem is how to work out a function to estimate the "Goodness" of a board that contains only unconvergent components. Obviously, we need to consider the number of unconvergent components (the fewer the better) and their



sizes and shapes (the ease of decomposition in the future). For example, we can use the following function to determine the board's "Goodness":

Goodness(board) = λ/N + µ Σ_i S_i

where N is the number of unconvergent components and S_i can simply be determined by the size and the value of the next best decomposition grid of a particular component.
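A minimal sketch of this evaluation, where the per-component scores S_i are supplied by the caller; the thesis leaves the exact definition of S_i, like the weights λ and µ, to experiment, so the defaults below are placeholders:

```python
def goodness(component_scores, lam=1.0, mu=0.1):
    """Goodness(board) = lam/N + mu * sum(S_i), where component_scores holds
    one S_i per unconvergent component. Weights lam, mu are placeholders."""
    N = len(component_scores)
    if N == 0:
        return float("inf")  # no unconvergent components left: fully decomposed
    return lam / N + mu * sum(component_scores)
```

Note that the λ/N term rewards having fewer unconvergent components, in line with "the fewer the better" above.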

4.4 The Experiment and Improvement

I built a Domineering program using the method proposed above, but the results are not very good. Below I discuss the shortcomings of the program and how to improve it in a later version. The technical details of the program are omitted here; if you are interested in them, please contact me.

As everyone can notice, the key point of the "Control" method is to obtain long-term benefits by sacrificing certain immediate benefits. But losing too much immediate benefit leads to an inevitable difficulty: one can never turn the scales, even when the board is totally controllable. Therefore, blindly separating the board at the beginning of a Domineering game is inappropriate, and the "shape" of the resulting child-components must be considered, since Domineering is very shape-sensitive13. But it is very hard to take shape into account, especially at the very beginning. The following endeavors might be helpful.

1. Build an opening-book.

2. Add local selection (use CGT) to the separation procedure.

But we just want to use this game as a concrete example to illustrate our concept of "Control", and our "separation" method (the global control) should be highlighted. Therefore, we adopted the following tricks to improve the performance, although they might not be the best choices if performance were the only concern.

1. Accelerate the separation procedure. That is, we need not force the separating "walls" to be continuous (like solid lines); they can be like broken lines.

2. The convergence criteria can be modified accordingly.

3. We could design some maps on which the separation procedure is faster while still fair for both sides. (Of course, we could also use an opening-book instead of these somewhat special maps.)

13 That is, the values of different shapes vary greatly.



5 Further Discussion

"In Go, both Black and White can only play one move in turn. Territory is the result, and the difference in the efficiency of each move finally results in a difference in territory. Moves that you play while keeping the initiative, by attacking, have great value. Moves that simply surround territory, or defensive moves, or moves that simply erase, have a low value. So don't play moves that are too concerned with territory pure and simple."

— Takemiya Masaki14

It is likely that many people still question the practicality of the notion of "control", and perhaps they are eager to know whether there are other possible applications of the method. In fact, my ultimate goal (also the starting point) is to apply this method to the most difficult game, Go, which is almost impossible to tame by the traditional methods discussed before. Of course, it might be very difficult, but there is still some hope. To accomplish the goal, some basic systems that can play Go in a reasonable fashion (not necessarily very well) should be designed. In other words, the traditional strategy (that is, to gain basic benefits) should be established first, and then more advanced strategies (like the separation strategy in Domineering) can be added to enhance the performance.

This section provides further discussion on how to cope with the "mixed strategies" introduced by the notion of "control" and their possible application to Go.

5.1 On Standard Game Theory

With the introduction of different strategies used to control the game in computer game play, how to select and mix the strategies becomes an important issue. In the game of Domineering, the following dilemma occurs in the Computer vs. Computer setting; see Fig. 7.

Here is the explanation of Fig. 7. At first, computer A (call it Vincent) uses the mixed strategy "Separation + CGT" introduced in Section 4, and computer B (call it Kitty) uses the traditional method. Of course, Vincent's performance is superior if the two computers have the same resources and computational ability. After a while, Kitty (if clever enough) notices this situation, so she changes her strategy to "TraditionalMethod + CGT". Under these circumstances, the convergent state that Vincent tries to reach is of no use, since his rival Kitty also uses CGT, which can eliminate all the errors in the end stage of the game. Therefore, it is a waste of time

14 A professional Japanese Go player.



Figure 7: Problem Encountered by Using Mixed Strategies

and energy for Vincent to do the separation. Recognizing this situation, Vincent (if also clever enough) will change his policy to using "TraditionalMethod" only. Why not also use "T + C"? The answer is that when both sides adopt the same method "T + C", the time to reach a convergent state will be extremely long (on large boards), so it is also a waste of time to check whether the board or a particular component is convergent at each step. Therefore it might be cheaper to use "T" only. And finally, the dilemma occurs: Kitty will perhaps change her strategy again to "S + C", which leads to an infinite loop in the end.

I do not know much about standard game theory yet, so I do not know how to deal with this situation right now. It seems that there is no pure-strategy Nash equilibrium at all! Of course, for this situation to occur, both computers would have to be extremely clever, or we could say that they must have true intelligence, which might only become an issue many years from now.
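The cycle S+C → T+C → T → S+C is exactly the best-response pattern of a rock-paper-scissors-like game, which has no pure-strategy Nash equilibrium (though, by Nash's theorem, a mixed one always exists). A toy check with a hypothetical zero-sum payoff table, the numbers chosen only to reproduce the cycle described above:

```python
import itertools

STRATS = ["S+C", "T+C", "T"]
PAYOFF = [            # PAYOFF[i][j]: Vincent's payoff for strategy i vs. Kitty's j
    [ 0, -1,  1],     # S+C: beats plain T, but the separation is wasted against T+C
    [ 1,  0, -1],     # T+C: beats S+C, but its convergence checks lose time to T
    [-1,  1,  0],     # T:   beats T+C, but loses to S+C
]

def pure_nash(payoff):
    """All pure-strategy Nash equilibria of the zero-sum game
    (the row player maximizes the payoff, the column player minimizes it)."""
    n = len(payoff)
    eqs = []
    for i, j in itertools.product(range(n), repeat=2):
        row_best = all(payoff[i][j] >= payoff[k][j] for k in range(n))
        col_best = all(payoff[i][j] <= payoff[i][k] for k in range(n))
        if row_best and col_best:
            eqs.append((STRATS[i], STRATS[j]))
    return eqs
```

Here pure_nash(PAYOFF) returns an empty list: whatever pure pair the two programs settle on, one of them has an incentive to switch, which is precisely the infinite loop above.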

5.2 Application to Go

To put the notion of control into practice, one needs a pool of useful strategies. Actually, there are a great number of strategies that could readily be adopted in the game of Go. For example, Takemiya Masaki's Cosmic Style (or Natural Style, in his own words) is a typical global strategy. Here is a short description of Takemiya's Cosmic Style:



Takemiya never consciously thinks about deploying moyos15; if he did, they would be too easy to foil. They just happen "naturally" but are not a precondition for the success of his style. With deliberately planned moyos it is too difficult to cast a secure net, and with fellow pros being so good at fighting, the slightest chink in the moyos will be mercilessly exploited. But naturally developing moyos lack this brittleness and have inherent flexibility. Though people call his moyos cosmic, they are actually rooted in the soil of large territories.

Cosmic style is for attacking, natural style is beautiful.

People who are not acquainted with Go perhaps cannot thoroughly comprehend the concept illustrated above, but that is not a necessity. I just want to say that we can actually turn a genre of Go-playing into a concrete strategy usable by the computer, and this strategy will then lead the board into a controllable situation (since the computer knows how to play well once such a situation has been reached). I think this will be a trend in the future, although it will take great pains to realize those strategies well.

Acknowledgement

I would like to thank Rudolf Fleischer, my supervisor, for his many suggestions and constant support during this research. I am also thankful to Fei Ruoyu, my collaborator on the Domineering playing system.

I had the great pleasure of meeting Chen Pinjie, with whose help I got through the early days of chaos and confusion. I also thank him for his proofreading and his suggestions to improve the presentation.

Of course, I am grateful to my parents for their care and love. Without them this work would never have come into existence.

Finally, I wish to thank the following: Hong Xiangyu, Zhang Qiwei and everyonein our Theoretical Group of Computer Science.

Zhang Qin

December 11, 2006

15 Go jargon denoting large territories within one's sphere of influence.



A Introduction to the CGT System

The key technique employed as passive control in the play system comes from Combinatorial Game Theory (CGT), which was established by a group of mathematicians. BCG's Winning Ways16 and Conway's On Numbers and Games (ONAG, as it is usually called) are the two major monographs providing a panorama of the primary work done in this research field. Winning Ways is relatively intuitive and recreation-oriented, while ONAG is precise and mathematically sophisticated. I will introduce the part of CGT needed in this thesis primarily based on ONAG, because it seems more "manageable" to me17.

Before the introduction, let us first determine what is needed for our play system. Of course, we should know what the values representing various game configurations look like. Then, we should understand how to compare them. Finally, we should know how to carry out the "+" operation. That is enough. It seems easy at first glance, but later one finds that these are not as trivial as one might think. One extra remark before we continue: the values that occur in game positions have a close relationship with the so-called "Surreal Numbers", a number system containing both the Real numbers and the Ordinals. And to some extent, the class Pg formed by all values of Partizan Games is even more general than the Surreal Numbers. Readers who are interested in them can refer to ONAG.

A.1 Basics

In CGT, a game G is constructed like this: G = {L | R}, where L and R are two sets of games. We usually write G^L for a typical element of L and G^R for a typical element of R; they are called the Left and Right options of G, respectively. Now we can write G = {G^L | G^R}.

Definition A.1.1 18 Comparison, Addition and Negation

1. Comparison
G ≥ H iff (no G^R ≤ H and no H^L ≥ G); G ≤ H iff H ≥ G; G ‖> H iff G ≰ H; G <‖ H iff G ≱ H; G ‖ H iff neither G ≥ H nor G ≤ H; and G > H, G < H, G = H as usual.

2. Addition
G + H = {G^L + H, G + H^L | G^R + H, G + H^R}

3. Negation
−G = {−G^R | −G^L}

16 Written by Elwyn R. Berlekamp, John H. Conway and Richard K. Guy.
17 Note, this section is not my own work; one can treat it as a simple tutorial. Many of the materials presented here are excerpted directly from ONAG and Winning Ways.
18 From ONAG p.78.



One may notice that almost all of the definitions above are inductive. As a consequence, most of the theorems about games are proven by induction (one can find most of the theorems and proofs of CGT in ONAG).

The class Pg of all Partizan Games forms a partially ordered group under addition, with 0 as the zero and −G as the negative of G.

Definition A.1.2 19 The order-relation

1. G > H iff G−H is won by Left, whoever starts.

2. G < H iff G−H is won by Right, whoever starts.

3. G = H iff G−H is won by the second player to move, and

4. G ‖ H iff G−H is won by the first player to move.

A.2 Numbers

Now we can construct some games using these definitions. First of all, 0 = { | }. In this position L = R = ∅: neither Left nor Right has a legal move. Following this, we can define the integers: 1 = {0 | }, 2 = {1 | }, ..., n + 1 = {n | }. The last equation means that from a position with n + 1 free moves for Left, he can move so as to leave himself just n moves, while Right cannot move at all. The negative integers are constructed similarly: −1 = { | 0}, −2 = { | −1}, ..., −(n + 1) = { | −n}. Further, we can find numbers involving halves: 1/2 = {0 | 1}, 1 1/2 = {1 | 2}, ..., −1/2 = {−1 | 0}, −1 1/2 = {−2 | −1}, ... One may wonder how this can happen; here is a brief explanation. Let G = {0 | 1} and consider the game H = G + G = {0 | 1} + {0 | 1}. If Left moves first, he can move the first G to 0; then Right must move the second to 1, and Left has one more free move than Right. If Right moves first, he can move the first G to 1; then Left can move the second to 0, with the same result. So 2G = 1, that is, G = 1/2. Continuing the construction, we obtain numbers like 1/4 = {0 | 1/2}, 3/4 = {1/2 | 1}, ... In fact, we can finally get all real numbers and ordinals, and we call this class No in general. One may notice that all these numbers have been constructed with G^L < G^R; so where is the other half (those constructed with G^L ≥ G^R)? Please wait; we will show it in the next subsection.

As an example, what is the value of G = {1 1/4 | 2}? It seems that the mean value 1 5/8 fits well. Unfortunately, it is wrong. To justify our assertion, we can test this equation by playing the sum-game H = G + (−1 5/8) = {1 1/4 | 2} + {−1 3/4 | −1 1/2}, since we already know −1 5/8 = {−1 3/4 | −1 1/2}. Only if H = 0, that is, neither player has a

19 From ONAG p.78.



winning move in game H, can we say that G = 1 5/8. Now consider: if Left moves first in the first component, then Right can move in the second component, which leads to −1/4, and Left loses. The same holds if Right moves first in the first component. But Right also has a good move in the second component, namely from −1 5/8 to −1 1/2, which leads to a new game H′ = G + (−1 1/2) = {1 1/4 | 2} + {−2 | −1}. One can check that 1 1/2 still lies strictly between 1 1/4 and 2. Tracing further, we realize that neither player has a good move when moving first in H′. Therefore G = 1 1/2. The crux here is that 1 5/8 is not the simplest number strictly between 1 1/4 and 2, but 1 1/2 is. In general, we have an important property of numbers, given by the simplicity rule.

Property A.2.1 20 If all the options G^L and G^R of some game G are known to be numbers and each G^L is strictly less than each G^R, then G is itself a number, namely the simplest number x greater than every G^L and less than every G^R.

The simplicity rule can be illustrated as follows:

Theorem A.2.1 21 Suppose for x = {x^L | x^R} that some number z satisfies x^L ≱ z and z ≱ x^R for all x^L, x^R, but that no option of z satisfies the same condition. Then x = z.

The proof is omitted here; one can refer to ONAG. As a special case, we can prove that

(2p + 1)/2^(n+1) = { 2p/2^(n+1) | (2p + 2)/2^(n+1) } = { p/2^n | (p + 1)/2^n }

This rule enables us to simplify any expression with one Left option and one Right option to a unique number.

We now have everything we need for numbers. Actually, the class No is a totally ordered Field, and we can perform comparison and addition of two numbers directly. As always, we can simplify an expression easily by first eliminating the Dominated Options (see the next subsection) and then applying the simplicity rule.
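The simplicity rule is easy to mechanize when the options are dyadic rationals: try the integers of smallest magnitude, then halves, quarters, eighths, and so on, and take the first value that fits strictly between the options. A sketch (the function name is mine, not from the thesis):

```python
import math
from fractions import Fraction

def simplest_between(lo, hi):
    """The simplest number z with lo < z < hi, for dyadic rationals lo < hi,
    following the simplicity rule."""
    lo, hi = Fraction(lo), Fraction(hi)
    n = 0
    while n <= max(abs(lo), abs(hi)):       # integers first, smallest magnitude wins
        for z in (Fraction(n), Fraction(-n)):
            if lo < z < hi:
                return z
        n += 1
    den = 2
    while True:                              # then halves, quarters, eighths, ...
        num = math.floor(lo * den) + 1       # least numerator with num/den > lo
        while Fraction(num, den) < hi:
            if num % 2:                      # odd numerator: not expressible with a
                return Fraction(num, den)    # smaller power-of-two denominator
            num += 1
        den *= 2
```

For instance, simplest_between(Fraction(5, 4), 2) returns 3/2, confirming that G = {1 1/4 | 2} = 1 1/2 as computed above.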

A.3 And the Other Half

In the game of Domineering, we often meet configurations like those in Fig. 8. In Fig. 8(a) (labelled game G), each player's move leads to a new position in which neither player has a legal move, that is, a position of value 0. Then we obtain G = {0 | 0}. Obviously, it is not a number, because each player can win moving first; in other words, it is confused with 0. We label it ∗. We can verify that ∗ is larger than every negative number and smaller than every positive number. In Fig. 8(b) (labelled game H), if Left moves first, he can lead the game to a configuration with four disjoint squares, or he can lead to the game G. But he will definitely choose the former position

20 From ONAG p.81-82.
21 From ONAG p.23.




Figure 8: Some Configurations for Domineering

which leads to a triumph, while the latter would give Right a chance to win. If Right moves first, any choice leads to the game G. So H = {0 | ∗}. Again H is not a number, so we give it a new name, ↑ ("up"). It is not difficult to see that ↑ is strictly greater than 0, because Left has a winning move. But it is smaller than every positive number, since Left can win the game H = G − ↑ for every positive number G. To see that: if Left moves first, he can turn ↑ into 0 and win; and if Right moves first, she will turn ↑ into ∗, then Left turns it into 0 and wins.

We can continue to define some special values like this:

↓ = {∗ | 0}, ⇑ = ↑ + ↑, +n = {0 ‖ 0 | −n}22, −n = {n | 0 ‖ 0}, ...

In fact, we can continue our construction and obtain a large class of values like these; they are all very small and seem similar to numbers. In practice, we can make comparisons using Definition A.1.1 and obtain inequalities like ⇑ > ∗ and ↑ ‖ ∗, as well as some surprising equations like {0 | ↑} = ⇑∗.

But what is the value of Fig. 8(c) (labelled game I) shown above? It has a rather worse value: {1 | −1}! We only know that I is strictly less than all numbers greater than 1 and strictly greater than all numbers less than −1, but it is confused with all numbers between −1 and 1, inclusive. In general, it seems not easy to compare or add values that appear in such complicated forms. And it would be ridiculous to give a new name to every new expression as it comes up and to work out their relationships by hand. The following subsections are devoted to further investigation of these unconventional values.

A.3.1 Value Simplification

It appears difficult to apply addition and comparison directly to these non-numbers23. It is often the case that two non-numbers with essentially the same value appear in different forms, such as {↑ | ↑} and {0 | ↑}. Therefore the first thing we should do is to simplify all non-numbers.

22 {0 ‖ 0 | −n} is just abbreviated notation for {0 | {0 | −n}}; we can always read a game starting from the place with the most '|' strokes.
23 These values are not in No but in Pg.



We have two important notions here: Dominated Options and Reversible Options. Suppose two different Left options of G are comparable with each other, say G^L1 ≤ G^L0. Then we say G^L1 is dominated by G^L0, since Left will definitely choose G^L0 for its superiority. Similarly, if G^R1 ≥ G^R0, we say G^R1 is dominated by G^R0, since Right always prefers the smaller value. Now suppose the Left option G^L0 has itself a Right option G^L0R0 for which G^L0R0 ≤ G. Then we say that the move from G to G^L0 is a reversible move, being reversible through G^L0R0. A similar definition applies for a Right option G^R1 that has some Left option G^R1L1 ≥ G.

The following two theorems provide a powerful technique for simplifying a game expression. The first tells us that we can eliminate certain options in the sets L and R and replace reversible options.

Theorem A.3.1 24 The following changes do not affect the value of G

1. Inserting as a new Left option any A <‖ G, or as a new Right option any B ‖> G.

2. Deleting any dominated option.

3. If G^L0 is reversible through G^L0R0, replacing G^L0 as a Left option of G by all the Left options G^L0R0L of G^L0R0.

4. If G^R1 is reversible through G^R1L1, replacing G^R1 as a Right option of G by all the Right options G^R1L1R of G^R1L1.

The second theorem asserts that every short game25 has a unique simplest form.

Theorem A.3.2 26 Suppose that G and H have neither dominated nor reversible options. Then G and H are equal iff each Left and Right option of either is equal to a corresponding option of the other.

Both theorems can be proved by induction; one can refer to ONAG for details. They enable us to obtain a unique normal form for each game.

For example, what is the normal form of G = {↑ | ↑}? We already know that {↑ | ↑} ≥ {0 | ↑} = ⇑∗ = ↑ + ↑ + ∗ and ⇑ > 0; the latter inequality tells us G > 0. The Right option ↑ will only be reversible if there is some ↑^L ≥ G, that is, 0 ≥ G, which is known to be false. And the Left option ↑ will be reversible if there is some ↑^R ≤ G, that is, ∗ ≤ G, which is true. So we obtain the equation {↑ | ↑} = {0 | ↑}, and assert that the latter is the normal form. As another example, what is the value of Fig. 8(d) shown earlier? We have

24 From ONAG p.110-111.
25 A game is called "short" if it ends after finitely many moves.
26 From ONAG p.112.



Fig. 8(d) = {0, {2 | 0} | {0 | −2}, {1/2 | −2}}

Replacing Left's reversible move and eliminating Right's dominated move, we obtain the normal form {0 | {0 | −2}}, which was given the name +2 earlier.

A.3.2 Comparison and Addition

Once given the normal forms of two games, we can perform addition using Definition A.1.1 and comparison using Definition A.1.2; that is, to compare games H and G, we can calculate G − H = G + (−H) and then compare it with zero. Besides, for some simple additions we can use other techniques. If x and y are numbers with x ≥ y, then {x | y} = u + {v | −v}27 = u ± v, where u = (x + y)/2 and v = (x − y)/2. In general, we can write

z ± a ± b ± c ± ···

for

z + {a | −a} + {b | −b} + {c | −c} + ···   (a ≥ b ≥ c ≥ ··· ≥ 0)

If Left moves first, he will always pick the largest available amount, and the value of the game becomes z + a − b + c − ···; if Right moves first, the value becomes z − a + b − c + ···28.

A.3.3 The Temperature Theory

The temperature theory helps us further simplify those expressions (values) that cannot be reduced by the other rules. We can regard a game G whose Left value is greater than its Right value as a hot game. Its value vibrates between the Right value and the Left value in such a way that, on average, its center of mass is at mean(G). In order to compute the mean value, we must find some way to cool the game down so as to quench these vibrations. Put another way: in the game G = {100 | −100}, for example, each player gains 100 points by moving first. We say that this game is very "hot"29; each side wants to move if it can. But if we require that every move pay a tax worth 101 points, each player can only obtain a game worth −1 or 1 point respectively, and the value of the game is reduced to {−1 | 1} = 0. In general, we call v the self-temperature of G = u + {v | −v}; if the environment temperature (to be explained later) is bigger than G's self-temperature, the value of G ceases to vibrate and freezes at mean(G). A formal definition follows.

27 Here we use {x | y} + z = {x + z | y + z}.
28 All this does not mean that the original game equals {z + a − b + c ··· | z − a + b − c ···}.
29 Games like Go are called "hot" since each move can gain some territory, while games like Hackenbush are called "cold" since each move decreases the number of remaining choices.



Definition A.3.1 30 If G is a short game, and t a real number ≥ 0 (the environment temperature), then we define the cooled game G_t by the formula

G_t = {(G^L)_t − t | (G^R)_t + t}

For sufficiently large t, the value of the cooled game G_t turns out to be a constant.

This technique is very useful in hot games like Go. At the beginning of a game of Go the environment temperature is very high; in other words, a good move can gain a lot of territory. Therefore positions like {1 | −1} are not a good choice to move in, since we would obtain a cooled game G_t = {1 − t | −1 + t} under temperature t, which degenerates to a number for t > 1. As usual, we prefer to move in a hot position rather than in a number position. But when the game reaches its end phase, the environment temperature tends to a constant; after a complete study, Berlekamp and Wolfe found that it is in fact 1°. By using a so-called "chilling operator", they developed a system strong enough to play against professional Go players.

The Thermograph was invented to give an intuitive view of what a hot value looks like after being cooled by various degrees of temperature. It also leads to a good strategy called THERMOSTRAT31, which finds a "good move" close to the optimal one, enough to ensure victory in many situations with many components. The motivation for inventing such a strategy is that it usually takes a lot of time and energy to compute the optimal move using Definition A.1.1 alone.

Let us first visit an example of how to draw the thermograph of a game like {2 | −1}32 using Definition A.3.1. First, we can calculate

G = G_0 = {2 | −1}, G_{1/2} = {1 1/2 | −1/2}, G_1 = {1 | 0}, G_{1 1/2} = {1/2 | 1/2} = 1/2 + ∗,

G_t = 1/2 for all t > 1 1/2

Then we can draw the thermograph for {2 | −1} as in Fig. 9.

Another example: {{2 | −1}, 0 ‖ {−2 | −4}}. We can draw the thermographs for {2 | −1} and {−2 | −4} as before; the thermograph for 0 is a vertical line. Left has two options here, so which one should he choose? Since it will be Right's turn to move after Left has made a choice, for each temperature t Left should choose whichever of his options has the leftmost Right boundary at t. Similarly, Right should choose whichever of her options has the rightmost Left boundary at t. Thus we can determine the right and left boundaries of the Left and Right sets of options respectively, and merge them using Definition A.3.1 again to obtain the final thermograph (the middle "pyramid" in Fig. 10).
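For a simple switch {x | y} the cooling computation of Definition A.3.1 is mechanical; a sketch (the function name is mine, not from the literature):

```python
from fractions import Fraction

def cool_switch(x, y, t):
    """Cool G = {x | y} (x >= y numbers) by temperature t, per Definition A.3.1.
    Returns the pair (left, right) of G_t; equal entries (m, m) stand for
    {m | m} = m + *, i.e. the game has frozen at mean(G)."""
    x, y, t = Fraction(x), Fraction(y), Fraction(t)
    temperature = (x - y) / 2          # self-temperature of the switch
    if t < temperature:
        return (x - t, y + t)          # still hot: G_t = {x - t | y + t}
    mean = (x + y) / 2
    return (mean, mean)                # frozen at mean(G)
```

For example, cool_switch(2, -1, 1) gives (1, 0), matching G_1 = {1 | 0} in the calculation above, and cool_switch(2, -1, 2) gives (1/2, 1/2).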

30 From ONAG p.103.
31 See Winning Ways, Chapter 6.
32 All the examples are from Winning Ways, Volume 1, Chapter 6.



Figure 9: Thermograph Example: G = {2 | −1}

Figure 10: Thermograph Example: G = {{2 | −1}, 0 ‖ {−2 | −4}}



Figure 11: An Example of the THERMOSTRAT Algorithm

The THERMOSTRAT strategy is used to speed up addition. Let us take a look at how it works. Suppose we already know the thermographs of games A, B, C, ..., and want to compute the thermograph of the compound game G = A + B + C + ···. We can draw the right boundary of G as the sum

R_t(A) + R_t(B) + R_t(C) + ···

and the left boundary of G as

R_t(A) + R_t(B) + R_t(C) + ··· + W_t

where

W_t = max{W_t(A), W_t(B), W_t(C), ···}

is the largest width of any component at height t; see Fig. 11. We can see that the left boundary is furthest left when the temperature t ∈ [5, 7], and we call the temperature T = 5 the ambient temperature, since it is the least heat necessary for Left to get his favorite move. BCG proved in Winning Ways that THERMOSTRAT nearly provides the best choice, and we have the following fact.

Theorem A.3.3 33 When playing the sum of a large number of games, the difference between THERMOSTRAT and the optimal strategy is bounded by the largest temperature.

A.4 Game Dictionaries

Now we have all we want: we can simplify every expression, whether it contains numbers or non-numbers. But consider a little more: is there any

33 From Winning Ways, Volume 1, p.166.



Figure 12: An Example of the Patterns for Simplification


Figure 13: An Example of the Rules for Simplification

method we can use to further speed up the computation in practice? Yes: a natural way is to build a dictionary for each particular game. The dictionary might contain values for all small components (for example, all Domineering board configurations containing fewer than 25 grids) and some general patterns as well as simplification rules. What are patterns and simplification rules? For example, in Fig. 12, we can cut the edge between grids B and C; this operation will not affect the value of the configuration. We can also obtain rules like the one in Fig. 13; please refer to ONAG for the proof of this rule.



References

[1] E. R. Berlekamp, J. H. Conway, and R. K. Guy. Winning Ways for Your Mathematical Plays (2nd edition). A K Peters, Ltd., 2001.

[2] E. R. Berlekamp and David Wolfe. Mathematical Go: Chilling Gets the Last Point. A K Peters, Ltd., 1994.

[3] H. J. Berliner and C. McConnell. B* probability based search. Artificial Intelligence, 86(1):97–156, 1996.

[4] Y. Bjornsson and T. Marsland. Multi-cut αβ-pruning in game-tree search. Theoretical Computer Science, March 1999.

[5] J. Burmeister and J. Wiles. An introduction to the computer Go field and associated internet resources. Online resource.

[6] Michael Buro. ProbCut: An effective selective extension of the alpha-beta algorithm. ICCA Journal, 18(2):71–76, 1995.

[7] J. H. Conway. On Numbers and Games (2nd edition). A K Peters, Ltd., 2001.

[8] A. D. de Groot. Thought and Choice in Chess. Mouton, The Hague, 1965.

[9] Noam D. Elkies. On numbers and endgames: Combinatorial game theory in chess endgames. In Games of No Chance, volume 29, 1996.

[10] Edward A. Feigenbaum. How the “what” becomes the “how”. ACM Turing Award Lecture, 1994.

[11] A. Junghanns. Are there practical alternatives to alpha-beta in computer chess? ICCA Journal, 21(1):14–32, 1998.

[12] D. E. Knuth and R. W. Moore. An analysis of alpha-beta pruning. Artificial Intelligence, 6(4):293–326, 1975.

[13] T. A. Marsland and M. Campbell. Parallel search of strongly ordered game trees. ACM Computing Surveys, 14(4):533–551, 1982.

[14] T. Anthony Marsland and Fred Popowich. Parallel game-tree search. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-7(4):442–452, 1985.

[15] D. A. McAllester. Conspiracy numbers for min-max search. Artificial Intelligence, 35(3):287–310, 1988.

[16] A. Newell, J. Shaw, and H. A. Simon. Chess-playing programs and the problem of complexity. IBM Journal of Research and Development, 2:230–335, 1958.


[17] Allen Newell and Herbert A. Simon. Computer science as empirical inquiry: Symbols and search. ACM Turing Award Lecture, 1975.

[18] A. J. Palay. Search with Probabilities. PhD thesis, School of Computer Science, Carnegie Mellon University, 1985.

[19] Aske Plaat. Research Re: search & Re-search. PhD thesis, Erasmus University, Dept. of Computer Science, 1996.

[20] Jonathan Schaeffer. The history heuristic and alpha-beta search enhancements in practice. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-11(11):1203–1212, 1989.

[21] C. Shannon. Programming a computer for playing chess. Philosophical Magazine, 41:156–175, 1950.

[22] Herbert A. Simon and Jonathan Schaeffer. The game of chess, August 19, 1992.

[23] D. J. Slate and L. R. Atkin. CHESS 4.5 - The Northwestern University Chess Program. In P. Frey, editor, Chess Skill in Man and Machine, pages 82–118. Springer-Verlag, 1977.

[24] A. Turing. Digital computers applied to games. In B. Bowden, editor, Faster than Thought, pages 286–295. Pitman, 1952.

[25] J. von Neumann and O. Morgenstern. Theory of Games and Economic Behavior. Princeton University Press, Princeton, 1944.

[26] H. Whitemore. Breaking the Code. Samuel French, London, 1987.

[27] A. L. Zobrist. A new hashing method with applications for game playing. ICCA Journal, 13(2):169–173, 1990.
