Data: You had nothing! Geordi: He bluffed you, Data. Data: It …poker.cs.ualberta.ca/publications/Hawkin_PhD.pdf · 2018. 4. 14. · Data: You had nothing! Geordi: He bluffed you,

Data: You had nothing!Geordi: He bluffed you, Data.Data: It makes very little sense to bet when you cannot win.Riker: But I did win, I was betting that you wouldn’t call.

– Star Trek: The Next Generation, Episode “The Measure of a Man”

i

University of Alberta

AUTOMATED ABSTRACTION OF LARGE ACTION SPACES IN IMPERFECTINFORMATION EXTENSIVE-FORM GAMES

by

John Hawkin

A thesis submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Department of Computing Science

c© John Hawkin, 2014

Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesisand to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is

converted to, or otherwise made available in digital form, the University of Alberta will advise potential usersof the thesis of these terms.

The author reserves all other publication and other rights in association with the copyright in the thesis, andexcept as herein before provided, neither the thesis nor any substantial portion thereof may be printed or

otherwise reproduced in any material form whatever without the author’s prior written permission.

To my wife, Jenn.

iii

Abstract

An agent in an adversarial, imperfect information environment must sometimes decide whether

or not to take an action and, if they take the action, must choose a parameter value associated

with that action. Examples include choosing to buy or sell some amount of resources or choosing

whether or not to move combined with a distance for that movement. This problem can be expanded

to allow for mixing between multiple actions each with distinct associated parameter values and

can be expressed as an imperfect information extensive form game. This dissertation describes a

new automated method of abstracting the action space of such decision problems. It presents new

algorithms for implementing the method, provides some theory about how the method relates to

Nash equilibria in a small poker game and assesses the method using several poker games. One

of these algorithms was used in the creation of an agent that won one of the divisions of the 2012

Annual Computer Poker Competition. An improvement upon this algorithm produced an action

abstraction of two-player no-limit Texas hold’em poker that out-performs a state-of-the-art action

abstraction while requiring less than 40% of the memory. The resulting agent had the best overall

results in a round robin tournament of six top two-player no-limit Texas hold’em agents.

iv

Acknowledgements

There are many people I would like to thank for helping me during my PhD. My supervisors, Rob

and Duane, were always extremely patient, kind, and generous with their time. There were very

understanding during the many times I had to delay my work for one reason or the other. I am very

lucky to have worked under the tutelage of such skilled scientists.

The many members of the Computer Poker Research Group in my department were all extremely

helpful, and much of this work could not have been done without them. They deserve a ton of credit,

not just for guidance, but for all of the work they did that I was able to build upon. It was a joy to

work with such brilliant people. They were also great colleagues for the many years I spent in the

department. Thank you to each of you.

Thank you to my family, who saw me through the good and the bad, and put up with my answer

of “about another year” once a year, every year, since about 2010, when I was asked how much

longer I would take to finish. I am happy to be home and see them much more now.

To all my other friends in Edmonton – thanks for the memories. I enjoyed this period of my life

because I had such great people to hang out with.

Finally, I would like to thank my wife Jenn. She is always there when I need her. She is always

understanding. A better person I have not met. Without her, I’m not sure I would have been able to

do this. Jenn, I love you.

v

Table of Contents

1 Introduction 11.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Poker literature on no-limit bet sizing . . . . . . . . . . . . . . . . . . . . . . . . 21.3 Problem significance and difficulty . . . . . . . . . . . . . . . . . . . . . . . . . . 31.4 Thesis Organization and Contributions . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Poker games 82.1 Rules of two player no-limit poker . . . . . . . . . . . . . . . . . . . . . . . . . . 82.2 Kuhn poker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.2.1 Half street Kuhn poker . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102.3 Leduc poker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102.4 Texas hold’em poker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.4.1 Doyle’s game . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112.5 Bet and raise sizes in no-limit poker . . . . . . . . . . . . . . . . . . . . . . . . . 11

3 Related work 133.1 Extensive form games . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133.2 Game theory definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143.3 ε-Equilibrium solvers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3.3.1 Linear programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153.3.2 Counterfactual regret minimization . . . . . . . . . . . . . . . . . . . . . 163.3.3 Gradient-based approach . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.4 Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173.4.1 Outline of approach to solving extensive form games . . . . . . . . . . . . 17

3.5 Work in large or continuous action spaces . . . . . . . . . . . . . . . . . . . . . . 173.6 The Annual Computer Poker Competition . . . . . . . . . . . . . . . . . . . . . . 193.7 Limit Texas hold’em poker agents . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.7.1 Two player limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193.7.2 Ring limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.8 No-limit Texas hold’em poker agents . . . . . . . . . . . . . . . . . . . . . . . . . 213.8.1 Short stack no-limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223.8.2 Deep stack no-limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

4 Solving for bet sizes in small games 254.1 The multiplayer betting transformation . . . . . . . . . . . . . . . . . . . . . . . . 254.2 Transformation significance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

4.2.1 Analysis in Kuhn poker . . . . . . . . . . . . . . . . . . . . . . . . . . . 284.3 Computing multiplayer betting games strategies . . . . . . . . . . . . . . . . . . . 304.4 Computing abstractions for no-limit Kuhn poker . . . . . . . . . . . . . . . . . . . 324.5 Local maxima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

5 Dynamically Adjusted Bet-Sizing Ranges 355.1 Evaluation of action abstractions . . . . . . . . . . . . . . . . . . . . . . . . . . . 355.2 Leduc poker with limited raising . . . . . . . . . . . . . . . . . . . . . . . . . . . 365.3 Application to Texas hold’em . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375.4 Importance metric . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

5.4.1 Comparing different ranges . . . . . . . . . . . . . . . . . . . . . . . . . 395.5 Hold’em re-visited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415.6 Tree creation - the small range constraint . . . . . . . . . . . . . . . . . . . . . . . 425.7 Variable range boundaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445.8 Leduc and Texas hold’em revisited . . . . . . . . . . . . . . . . . . . . . . . . . . 50

vi

5.8.1 Texas hold’em . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505.9 ACPC 2012 competition entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

6 Multiple fixed-ratio ranges 576.1 Choosing T for BETS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576.2 Changing the default range . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 586.3 Fixed-ratio ranges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

6.3.1 The range-size constraint revisited . . . . . . . . . . . . . . . . . . . . . . 616.3.2 Choice of n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

6.4 RED–BETS algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 656.4.1 Application of RED–BETS to 5 bucket Texas hold’em . . . . . . . . . . 67

6.5 Multiple ranges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 696.5.1 REDM–BETS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 706.5.2 Application of REDM–BETS to 5 bucket Texas hold’em . . . . . . . . 71

6.6 Pruning the tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 736.7 Application of REFINED–BETS to 5 bucket Texas hold’em . . . . . . . . . . 78

6.7.1 Analysis of performance . . . . . . . . . . . . . . . . . . . . . . . . . . . 796.7.2 Size comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

6.8 Kuhn revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

7 Changing the card and opponent abstractions 847.1 Evaluation in imperfect recall games . . . . . . . . . . . . . . . . . . . . . . . . . 847.2 Results in IR− 100 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

7.2.1 REDM–BETS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 877.2.2 REFINED–BETS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 887.2.3 Size comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

7.3 Scaling the card abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 937.3.1 Bet size distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

7.4 Using action abstractions from IR− 100 in IR− 570 . . . . . . . . . . . . . . . 967.5 Larger opponent abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

7.5.1 Size comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1037.6 Scaling the card abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

7.6.1 Results against top no-limit agents . . . . . . . . . . . . . . . . . . . . . . 106

8 Conclusion 1088.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108

8.1.1 Game transformation and BETS . . . . . . . . . . . . . . . . . . . . . . 1088.1.2 RE–BETS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1098.1.3 REDM–BETS and REFINED–BETS . . . . . . . . . . . . . . . . 1098.1.4 Changing the card abstraction . . . . . . . . . . . . . . . . . . . . . . . . 109

8.2 Limitations of this work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1108.2.1 Separate runs to create action abstractions and produce strategies . . . . . . 1108.2.2 Serial computation of BETS . . . . . . . . . . . . . . . . . . . . . . . . 1108.2.3 Two strategy calculations for one agent . . . . . . . . . . . . . . . . . . . 1108.2.4 Local maxima . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1108.2.5 Lack of a convergence proof for BETS . . . . . . . . . . . . . . . . . . . 1118.2.6 Choice of variable N in BETS . . . . . . . . . . . . . . . . . . . . . . . 111

8.3 Final thoughts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

A Game value tables 113

Bibliography 124

vii

List of Figures

3.1 Typical process of solving a large extensive form game . . . . . . . . . . . . . . . 18

4.1 A decision node in a no-limit game, and the multiplayer betting version of the samedecision node with q = 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

4.2 Half-street Kuhn poker game value per bet size. . . . . . . . . . . . . . . . . . . . 304.3 Half-street Kuhn poker game value per bet size, extended to 2 pot. . . . . . . . . . 33

5.1 Bet-sizing tree for a game with stacks of 15 big blinds, with an abstraction thatallows only half pot and pot bets. . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

5.2 Bet-sizing tree for a multiplayer betting transformation of a game with stacks of 15big blinds, allowing two bets not including the bold actions, or three bets if boldactions are included. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

5.3 Bet-sizing trees of multiplayer betting games produced by RE–BETS. The valuesat each node represent the number of big blinds in the pot at that point, rounded tothe nearest 0.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

5.4 Bounds on game values of various betting abstractions for player one against anFCPA player two, in a 5 bucket perfect recall card abstraction of Texas hold’em.The Y-axis is value to player two, so player one wishes to reduce this number. . . . 52

5.5 Bounds on game values of various betting abstractions for player two against anFCPA player one, in a 5 bucket perfect recall card abstraction of Texas hold’em.The Y-axis is value to player two, so player two wishes to increase this number . . 52

5.6 Distribution of important bet sizes for abstractions P18 and P26. Bets are roundedto the nearest 0.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

5.7 Results of the 2012 ACPC no-limit bankroll instant run-off competition, in milli-bigblinds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

6.1 Bounds on game values of abstractions produced by RE–BETS, using three dif-ferent default ranges, for player one against an FCPA player two, in a 5 bucketperfect recall card abstraction of Texas hold’em. The Y-axis is value to player two,so player one wishes to reduce this number. . . . . . . . . . . . . . . . . . . . . . 59

6.2 Bounds on game values of abstractions produced by RE–BETS, using three dif-ferent default ranges, for player two against an FCPA player one, in a 5 bucketperfect recall card abstraction of Texas hold’em. The Y-axis is value to player two,so player two wishes to increase this number. . . . . . . . . . . . . . . . . . . . . 60

6.3 Bounds on game values of abstractions using fixed-ratio vs fixed-width ranges forplayer one against an FCPA player two, in a 5 bucket perfect recall card abstractionof Texas hold’em. The Y-axis is value to player two, so player one wishes to decreasethis number. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

6.4 Bounds on game values of abstractions using fixed-ratio vs fixed-width ranges forplayer two against an FCPA player one, in a 5 bucket perfect recall card abstractionof Texas hold’em. The Y-axis is value to player two, so player two wishes to increasethis number. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

6.5 Bounds on game values of abstractions using one, two and three default ranges forplayer one against an FCPA player two, in a 5 bucket perfect recall card abstractionof Texas hold’em. The Y-axis is value to player two, so player one wishes to decreasethis number. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

6.6 Bounds on game values of abstractions using one, two and three default ranges forplayer two against an FCPA player one, in a 5 bucket perfect recall card abstractionof Texas hold’em. The Y-axis is value to player two, so player two wishes to increasethis number. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

viii

6.7 Bounds on game values of abstractions using one and two default ranges for playerone against an FCPA player two, in a 5 bucket perfect recall card abstraction ofTexas hold’em. The pruned runs start with two default ranges, keeping e = 1 ande = 25 extra bets. The Y-axis is value to player two, so player one wishes to decreasethis number. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

6.8 Bounds on game values of abstractions using one and two default ranges for playertwo against an FCPA player one, in a 5 bucket perfect recall card abstraction ofTexas hold’em. The pruned runs start with two default ranges, keeping e = 1 ande = 25 extra bets. The Y-axis is value to player two, so player two wishes to increasethis number. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

6.9 Half-street Kuhn poker game value per bet size, extended to 2 pot. . . . . . . . . . 83

7.1 Absolute bounds compared with empirical 95% confidence intervals on the gamevalue of abstractions created using REDM–BETS for player one against an FCPAplayer two, in a 5 bucket perfect recall card abstraction of Texas hold’em. The Y-axisis value to player two, so player one wishes to decrease this number. . . . . . . . . 85

7.2 Absolute bounds compared with empirical 95% confidence intervals on the gamevalue of abstractions created using REDM–BETS for player two against an FCPAplayer one, in a 5 bucket perfect recall card abstraction of Texas hold’em. The Y-axisis value to player two, so player two wishes to increase this number. . . . . . . . . 86

7.3 Bounds on game values of abstractions using one, two and three default ranges forplayer one against an FCPA player two, in an IR − 100 card abstraction of Texashold’em. The Y-axis is value to player two, so player one wishes to decrease thisnumber. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

7.4 Bounds on game values of abstractions using one, two and three default ranges forplayer two against an FCPA player one, in an IR − 100 card abstraction of Texashold’em. The Y-axis is value to player two, so player two wishes to increase thisnumber. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

7.5 Bounds on game values of abstractions using one, two and three default ranges forplayer one against an FCPA player two, in an IR − 100 card abstraction of Texashold’em. The pruned runs start with three default ranges, keeping e = 1 and e = 25extra bets. The Y-axis is value to player two, so player one wishes to decrease thisnumber. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

7.6 Bounds on game values of abstractions using one, two and three default ranges forplayer two against an FCPA player one, in an IR − 100 card abstraction of Texashold’em. The pruned runs start with three default ranges, keeping e = 1 and e = 25extra bets. The Y-axis is value to player two, so player two wishes to increase thisnumber. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

7.7 Bounds on game values of abstractions using one, two and three default ranges forplayer one against an FCPA player two, in an IR − 100 card abstraction of Texashold’em. The pruned runs start with three default ranges, keeping e = 50 ande = 100 extra bets. The Y-axis is value to player two, so player one wishes todecrease this number. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

7.8 Bounds on game values of abstractions using one, two and three default ranges forplayer two against an FCPA player one, in an IR − 100 card abstraction of Texashold’em. The pruned runs start with three default ranges, keeping e = 50 ande = 100 extra bets. The Y-axis is value to player two, so player two wishes toincrease this number. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

7.9 Bounds on game values of abstractions using one, two and three default ranges forplayer one against an FCPA player two, in an IR − 570 card abstraction of Texashold’em. The Y-axis is value to player two, so player one wishes to decrease thisnumber. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

7.10 Bounds on game values of abstractions using one, two and three default ranges forplayer two against an FCPA player one, in an IR − 570 card abstraction of Texashold’em. The Y-axis is value to player two, so player two wishes to increase thisnumber. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

7.11 Bounds on game values of abstractions using one and two default ranges for playerone against an FCPA player two, in an IR−570 card abstraction of Texas hold’em.Some bet sizes are calculated using an IR−100 card abstraction before being pairedwith the IR− 570 card abstraction. The Y-axis is value to player two, so player onewishes to decrease this number. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

ix

7.12 Bounds on game values of abstractions using one and two default ranges for playertwo against an FCPA player one, in an IR−570 card abstraction of Texas hold’em.Some bet sizes are calculated using an IR−100 card abstraction before being pairedwith the IR− 570 card abstraction. The Y-axis is value to player two, so player twowishes to increase this number. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

7.13 Bounds on game values of abstractions using one and two default ranges for playerone against an FCPA player two. The Y-axis is value to player two, so player onewishes to decrease this number. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98

7.14 Bounds on game values of abstractions using one and two default ranges for playertwo against an FCPA player one. The Y-axis is value to player two, so player twowishes to increase this number. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

7.15 Bounds on game values of abstractions using one and two default ranges for playerone against an ACPC2011 player two, in an IR − 100 card abstraction of Texashold’em. The Y-axis is value to player two, so player one wishes to decrease thisnumber. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

7.16 Bounds on game values of abstractions using one and two default ranges for playertwo against an ACPC2011 player one, in an IR − 100 card abstraction of Texashold’em. The Y-axis is value to player two, so player two wishes to increase thisnumber. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

7.17 Bounds on game values of abstractions using one and two default ranges, alongwith three default range pruned with e = 100, for player one against an ACPC2011player two, in an IR − 100 card abstraction of Texas hold’em. The Y-axis is valueto player two, so player one wishes to decrease this number. . . . . . . . . . . . . 102

7.18 Bounds on game values of abstractions using one and two default ranges, along withthree default range pruned with e = 200, for player two against an ACPC2011player one, in an IR − 100 card abstraction of Texas hold’em. The Y-axis is valueto player two, so player two wishes to increase this number. . . . . . . . . . . . . . 103

x

List of Tables

4.1 Bet sizes used in Nash equilibria of various Kuhn poker variants, with correspondingbet size found by applying BETS. . . . . . . . . . . . . . . . . . . . . . . . . . . 32

5.1 Bet sizes after different length runs of the BETS algorithm. . . . . . . . . . . . . 385.2 Bet sizes after different length runs, including bets with high RT

i,|·|0 values. . . . . 415.3 Number of (I, a) pairs in various action abstractions, each with FCPA opponents,

in 5 bucket perfect recall card abstractions of Texas hold’em. . . . . . . . . . . . . 51

6.1 Default ranges used in multi-range experiments . . . . . . . . . . . . . . . . . . . 706.2 Number of (I, a) pairs in various action abstractions for player one, generated us-

ing REDM–BETS and REFINED–BETS. In each case BETS was run fivetimes. In all cases the opponent uses FCPA, and a 5 bucket perfect recall cardabstraction of Texas hold’em was used. . . . . . . . . . . . . . . . . . . . . . . . 81

6.3 Number of (I, a) pairs in various action abstractions for player two, generated us-ing REDM–BETS and REFINED–BETS. In each case BETS was run fivetimes. In all cases the opponent uses FCPA, and a 5 bucket perfect recall cardabstraction of Texas hold’em was used. . . . . . . . . . . . . . . . . . . . . . . . 81

7.1 Number of (I, a) pairs and game value in various action abstractions for player one,generated using REDM–BETS and REFINED–BETS. In each case BETSwas run five times. In all cases the opponent uses FCPA, and an IR − 100 cardabstraction of Texas hold’em was used. . . . . . . . . . . . . . . . . . . . . . . . 92

7.2 Number of (I, a) pairs and game value in various action abstractions for player two,generated using REDM–BETS and REFINED–BETS. In each case BETSwas run five times. In all cases the opponent uses FCPA, and an IR − 100 cardabstraction of Texas hold’em was used. . . . . . . . . . . . . . . . . . . . . . . . 92

7.3 Number of (I, a) pairs in various action abstractions for player one, generated us-ing REDM–BETS and REFINED–BETS. In each case BETS was run fivetimes. In all cases the opponent uses FCPA, and a 5 bucket perfect recall cardabstraction of Texas hold’em was used. . . . . . . . . . . . . . . . . . . . . . . . 95

7.4 Number of (I, a) pairs in various action abstractions for player two, generated us-ing REDM–BETS and REFINED–BETS. In each case BETS was run fivetimes. In all cases the opponent uses FCPA, and a 5 bucket perfect recall cardabstraction of Texas hold’em was used. . . . . . . . . . . . . . . . . . . . . . . . 96

7.5 Game values and number of (I, a) pairs in various action abstractions for player one,generated by completing just one run of BETS using three ranges, then pruningwith different values of e. In all cases player two uses ACPC2011, and an IR−100card abstraction of Texas hold’em was used. The game value to player two, so playerone wishes to decrease this number. . . . . . . . . . . . . . . . . . . . . . . . . . 101

7.6 Game values and number of (I, a) pairs in various action abstractions for player two,generated by completing just one run of BETS using three ranges, then pruningwith different values of e. In all cases player one uses ACPC2011, and an IR− 100card abstraction of Texas hold’em was used. The game value to player two, so playertwo wishes to increase this number. . . . . . . . . . . . . . . . . . . . . . . . . . 101

7.7 Number of (I, a) pairs in various action abstractions for player one, generated usingREDM–BETS and REFINED–BETS. The size of a symmetric fixed bettingabstraction of ACPC2011 is included for reference. In all cases the opponent usesACPC2011, and an IR− 100 card abstraction of Texas hold’em was used. . . . . 104

xi

7.8 Number of (I, a) pairs in various action abstractions for player two, generated usingREDM–BETS and REFINED–BETS. The size of a symmetric fixed bettingabstraction of ACPC2011 is included for reference. In all cases the opponent usesACPC2011, and an IR− 100 card abstraction of Texas hold’em was used. . . . . 104

7.9 One on one performance of top no-limit Texas hold’em agents in milli-big-blinds/game.Results were calculated by playing 40 duplicate matches of 100, 000 hands each(8, 000, 000 hands total). 95% confidence intervals on each result are provided. . . 106

7.10 Abstractions used by the agents in Table 7.9 . . . . . . . . . . . . . . . . . . . . . 1067.11 Descriptions of the agents in Table 7.9 . . . . . . . . . . . . . . . . . . . . . . . . 107

A.1 Value to player two and bounds for the curve shown in Figure 5.4 . . . . . . . . . 113A.2 Value to player two and bounds for the curve shown in Figure 5.5 . . . . . . . . . 113A.3 Value to player two and bounds for each curve shown in Figure 6.1 . . . . . . . . . 114A.4 Value to player two and bounds for each curve shown in Figure 6.2 . . . . . . . . . 114A.5 Value to player two and bounds for each curve shown in Figure 6.3 . . . . . . . . . 114A.6 Value to player two and bounds for each curve shown in Figure 6.4 . . . . . . . . . 115A.7 Value to player two and bounds for each curve shown in Figure 6.5 . . . . . . . . . 115A.8 Value to player two and bounds for each curve shown in Figure 6.6 . . . . . . . . . 115A.9 Value to player two and bounds for each curve shown in Figure 6.7 . . . . . . . . . 116A.10 Value to player two and bounds for each curve shown in Figure 6.8 . . . . . . . . . 116A.11 Value to player two and bounds for each curve shown in Figure 7.1 . . . . . . . . . 116A.12 Value to player two and bounds for each curve shown in Figure 7.2 . . . . . . . . . 117A.13 Value to player two and bounds for each curve shown in Figure 7.3 . . . . . . . . . 117A.14 Value to player two and bounds for each curve shown in Figure 7.4 . . . . . . . . . 117A.15 Value to player two and bounds for each curve shown in Figure 7.5 . . . . . . . . . 118A.16 Value to player two and bounds for each curve shown in Figure 7.6 . . . . . . . . . 118A.17 Value to player two and bounds for each curve shown in Figure 7.7 . . . . . . . . . 119A.18 Value to player two and bounds for each curve shown in Figure 7.8 . . . . . . . . . 119A.19 Value to player two and bounds for each curve shown in Figure 7.9 . . . . . . . . . 120A.20 Value to player two and bounds for each curve shown in Figure 7.10 . . . . . . . . 120A.21 Value to player two and bounds for each curve shown in Figure 7.11 . . . . . . . . 120A.22 Value to player two and bounds for each curve shown in Figure 7.12 . . . . . . . . 121A.23 Value to player two and bounds for each curve shown in Figure 7.13 . . . . . . . . 121A.24 Value to player two and bounds for each curve shown in Figure 7.14 . . . . . . . . 121A.25 Value to player two and bounds for each curve shown in Figure 7.15 . . . . . . . . 122A.26 Value to player two and bounds for each curve shown in Figure 7.16 . . . . . . . . 122A.27 Value to player two and bounds for each curve shown in Figure 7.17 . . . . . . . . 122A.28 Value to player two and bounds for each curve shown in Figure 7.18 . . . . . . . . 123

... ...

xii

Chapter 1

Introduction

1.1 Motivation

Games are a fantastic test bed for Artificial Intelligence (AI) research. They provide an excellent

domain in which to work, with clearly defined actions and goals. If we can learn to play different

types of games at a high level, then we can learn to behave intelligently in real world domains that

can be modelled as games. Some examples of such real world problems include scheduling port se-

curity rotations, choosing investment strategies and ordering troop movements in military conflicts.

Games can vary in difficulty from very trivial games such as tic-tac-toe to very complex games

such as chess, poker and go. This spectrum of difficulty allows researchers to develop techniques,

test them on small games, and eventually apply them to larger games. Games that are played com-

petitively among humans provide another appealing option — testing of agents created using AI

techniques against human players, from amateurs to world class professionals.

The games that have received the most attention from the AI community in the past have been

games of perfect information such as chess and checkers. In these games the entire state of the game

is known by both players. Games of imperfect information such as poker, on the other hand, present a

different type of challenge for computer scientists — the opponent’s cards are unknown, they do not

know our cards, and neither party knows what cards will be dealt in future rounds. These properties,

namely imperfect information and stochasticity, are common in real world situations. For example,

a robot navigating foreign terrain is only aware of what it can detect with its sensors (imperfect

information), and must be able to adapt to random fluctuations of the environment (stochasticity).

Two player limit Texas hold’em has been the subject of a great deal of recent work in AI [6,

37, 21, 24, 68, 65]. Though there are other popular poker games such as Omaha and Stud, Texas

hold’em is by far the most popular poker game today, partly due to its rich strategic complexity [13].

This popularity and complexity make Texas hold’em a good choice of poker variants as a testbed for

AI research. A great deal of progress has been made in recent years creating poker agents (known

as “bots”) that play two player limit Texas hold’em. In the summer of 2008 a man machine match

was held with some of the top two player limit Texas hold’em players in the world, resulting in a

1

victory for the computer [2]. While many of the AI techniques used to develop two player limit

Texas hold’em bots can be applied to two player no-limit Texas hold’em, the latter game includes

an additional challenge — it is much larger. Heads up no-limit Texas hold’em with a stack size of

200 big blinds and blinds of 50 and 100 chips (the variant played in the computer poker competition

[32]) has ∼ 10164 game states [38], while in comparison two player limit Texas hold’em has ∼ 1018

game states [6].

Interesting games such as Texas hold’em are too large to be solved directly, so first we must apply

abstraction to create a smaller game. There are two main types of abstraction — action abstraction

and state abstraction (throughout this dissertation we call state abstraction of poker games “card”

abstraction). When navigating a robot, a simple action abstraction would be something like move

backward, move forward, turn left and turn right. State abstraction might involve classifying surfaces

as either “smooth” or “rough” and treating all surfaces of one type the same way. In a poker game,

action abstraction involves limiting the actions that we allow either ourselves or the opponent to

make. For example, it is common in no-limit poker games to decide on a small number of bet sizes

that we will allow ourselves to make at all points in the game. This means that we only have to

consider what happens when we make one of these bets. An example of state (card) abstraction

in poker would be treating similar cards, say a pair of Aces and a pair of Kings, the same way

everywhere in the game.

State abstraction of large poker games is essential in any attempt to solve them. In addition to

this, the first equilibrium-based two player limit hold’em bots also used action abstraction, reducing

the number of bets allowed each in a round from four to three [6]. Modern two player limit hold’em

bots no longer take this approach, as better performance can be achieved by fully representing all

actions and only abstracting the state space of the game [6]. The explosion in the size of the no-limit

game is caused by the large number of actions each player may choose from at every point in the

game. One approach to solving the no-limit game is to abstract the action space of the game in the

way I described above, removing the majority of the actions until the game is similar in size to limit

Texas hold’em, and then applying the same solution techniques that have been so effective in the

limit game. This is the approach I use in this dissertation — I present an algorithm that takes a game

with a large continuous or near-continuous set of actions and intelligently picks a small subset of

these actions to keep, creating a much smaller abstraction of the full game. In this dissertation I focus

on the poker literature that relates to action abstraction in agents that play approximate equilibrium

strategies. For a good general review of literature related to computer poker see Rubin and Watson

[56].

1.2 Poker literature on no-limit bet sizing

There is a great deal of poker literature that deals with the issue of bet sizing in no-limit Texas

hold’em. The books No-limit hold’em: Theory and Practice [62], Heads-up No-limit Hold’em [51],

2

Professional No-limit Hold’em: Volume I [13], Harrington on Hold’em: Volume I and II [27, 28]

and Harrington on Cash Games Volume I and II [29, 30] all emphasize the value of picking different

bet sizes in different situations. While some offer general rules of thumb with regard to bet sizes

[13, 27, 28, 29, 30], others emphasize the situations in which bet sizes should be adjusted up or

down depending on the current game state [51, 62].

In Harrington on Hold’em: Volume II [28], Harrington advocates a jam-fold strategy be used

when the stack to big blind ratio hits 7.5, and sometimes when the ratio is between 7.5 and 15.

Similarly in the Mathematics of Poker by Chen and Ankenman [9] the authors suggest that using a

jam-fold strategy is close to optimal with stacks of about 11 or 12 big blinds for the small blind and

up to 13 or 14 for the big blind. Chen and Ankenman calculate near optimal jam-fold strategies for

stacks of up to 50 big blinds, and their results are in good agreement with the work of Miltersen and

Sorensen mentioned above in Section 3.8.1 (which was for 6.67 big blinds). Additionally, Chen and

Ankenman include a bound on their suboptimality [9].

Chen and Ankenman also devote a couple of chapters of The Mathematics of Poker to the study

of no-limit bet sizing in Kuhn poker [9]. They derive analytical solutions for three no-limit Kuhn

games - half street Kuhn, Kuhn with no raises allowed and Kuhn with no restrictions. They show

that Nash equilibria in these games use a variety of bet sizes, and they go on to show that in certain

simple multi-street games a bet size of pot is optimal. It should be noted that the derivation of the

analytical solution to no-limit Kuhn poker is seven pages long, and their approach does not scale

well and thus cannot be directly applied to two player no-limit Texas hold’em. I use their analytical

solutions to test our algorithm for generating action abstractions, as I will show in Section 4.4.

Chris Ferguson, a world series of poker main event champion, is well renowned in the poker

community for his bet sizing abilities. In the book The Full Tilt Poker Strategy Guide [10], he wrote

a section entitled “How to bet” in which he describes his bet sizing methodology. He advocates

betting different amounts in the pre-flop based on a player’s position only (in a nine or ten handed

game). Thus for two player games his suggestion is to use only one bet size, independent of your

private cards. On further rounds he bases his bet size on the texture of the board, what he believes

his opponent’s possible hands could be, and his best guess as to what his opponent believes he

may have. He then determines an appropriate bet size, all the while ignoring his private cards. He

only uses his private cards to determine if he will bet. An advantage of this approach is that it

incorporates information hiding — it is impossible to deduce private cards from bet sizes if the two

are completely independent.

1.3 Problem significance and difficulty

An abstraction of the legal actions in two player no-limit Texas hold’em leads to problems when

such a strategy is used to play the full game, as actions made by the opponent that are not within

this abstraction must be handled. This is known as the translation problem, and it has received some

3

attention in recent years [58, 59]. While solving the translation problem deals with interpreting bets

made by the opponent, the question of what bet sizes a player should make for themselves has been

given little attention. Many modern no-limit poker bots use simple betting schemes, such as only

ever betting the the number of chips that are currently in the pot (called a pot bet) or all of their chips

(known as an all-in bet) [25, 58]. A great deal of value can be gained from choosing appropriate bet

sizes - in the small poker game of no-limit half street Kuhn poker, for example, the betting player

has the advantage if they choose their bet size well. If, however, the betting player chooses a bet

size of pot, their advantage is reduced to zero (see Section 4.2.1). This is assuming that both players

play according to a Nash equilibrium strategy for the given bet sizes — that is, they both choose a

strategy such that neither player can deviate from this strategy and gain value. The bettor has lost

their advantage by selecting an inferior bet size [9].

Another example of the importance of picking good bet sizes comes from work done by Schni-

zlein [58]. Schnizlein did experiments in 100 and 200 big blind no-limit Texas hold’em, comparing

approximate Nash equilibrium strategies in abstractions of these games that allowed only a few

hand-picked bet sizes. In the 100 big blind game his experiments showed that adding the option of

making a half pot bet once per round can lead to a significant improvement in performance over bots

using abstractions which allow only pot and all-in bets. He then did an experiment in the 200 big

blind game keeping the size of the abstract game constant and alternating between allocating more

space to the card abstraction and the action abstraction. As with the 100 big blind game allowing

a single half pot bet each round in addition to pot and all-in bets led to the best performance, even

though this increased the size of the abstraction by 15.7 times, and thus a much smaller card ab-

straction was used. The most interesting result of this experiment, however, was that an agent which

allowed half pot, three quarter pot, pot, 1.5 pot and all-in bets performed roughly the same as an

agent allowing only pot and all-in bets but using a much more refined card abstraction [58]. The

extra bet sizes increase the size of the game by 248 times, and thus for this experiment the bot using

only pot and all-in bets used a card abstraction that was 248 times bigger.

This result suggests that the choice of bet sizes is of great importance in creating an effective

no-limit poker bot. An analytical approach to solving for bet sizes in no-limit poker games was suc-

cessfully applied to the very small game of no-limit Kuhn poker [9], however scaling such methods

to a large game such as two player no-limit Texas hold’em is intractable. One must use abstraction,

as even if we could solve a game with ∼ 10164 bet sizes, there would not be enough memory to store

a strategy to play the game. To this author’s knowledge, there has been no work on automating the

action abstraction of no-limit poker other than the work presented in this dissertation.

1.4 Thesis Organization and Contributions

The action space of no-limit Texas hold’em is extremely large. At each decision point the player

must decide between a large number of different bet sizes. This can be approximated by instead

4

having them simply choose whether or not they will bet, and if they do bet, the amount that they

wish to bet. This is now part of a more general class of problems in game theory, where an agent

in an adversarial, imperfect information environment must decide whether or not to take an action

and, if they take the action, must choose a parameter value associated with that action. Examples

include choosing to buy or sell some amount of resources or choosing whether or not to move

combined with a distance for that movement. This model can be expanded to allow for mixing

between multiple actions each with distinct associated parameter values. The contribution described

in this dissertation is an automated method of abstracting the action space of such games, with an

emphasis on no-limit poker games.

In Chapter 2, I explain the rules and terminology of the no-limit poker games we use as test-beds

throughout the dissertation. I begin Chapter 3 by defining relevant background material in the field

of game theory and discussing the game theory concepts that are important to this work. I then

introduce related work on domains with large action spaces. I finish the chapter with a summary of

the history of computer poker research and a description of the current state of the art in this field.

In Chapter 4, I show that, using a new transformation and accompanying regret-minimizing

algorithm, I can compute the bet sizes used by Nash equilibrium strategies in small no-limit games,

without solving those games directly. I introduce a new transformation that takes a game of the

type described above and creates a new multi-agent game consisting of two teams. Each team has

a single decision agent and possibly multiple parameter-setting agents, one for each parameter. The

transformation is designed such that if we define a strategy for each parameter-setting agent in the

transformed game, these map directly to parameter values in the original game. I demonstrate the

usefulness of this transformation empirically and analytically in small no-limit poker games. I show

that when this transformation is applied to half-street no-limit Kuhn poker, a Nash equilibrium of the

resulting game has the useful property that the strategy of the parameter-setting agent can be mapped

to the bet size used in a Nash equilibrium solution of the original game. I then introduce a new

regret-minimizing algorithm, which I call BETS. This algorithm is based on the CFR algorithm

[68], designed specifically to be used on games after applying my transformation. I use BETS on

the transformed version of various small no-limit games. I show empirically that, in each case, the

resulting strategies for the parameter-setting agents can be mapped to the bet sizes used in a Nash

equilibrium of the original game.

In Chapter 5, I consider the problems that arise when applying my transformation to games

where the depth of the game tree is large. The BETS algorithm, introduced in Chapter 4, defines a

range of possible values for each parameter in the game. The parameter-setting agents may select a

value from that range. In Chapter 5 I show that careful range choice is essential to creating a good

action abstraction. I explain why selecting appropriate ranges is a non-trivial problem. There are

practical reasons why large ranges cannot be used, and in general no single small range is appropri-

ate. I demonstrate a new algorithm, RE–BETS, which automatically adjusts ranges using multiple

5

short runs of BETS. I apply RE–BETS to no-limit Leduc poker and no-limit Texas hold’em by

fixing the bet sizes of one player and using parameter-setting (bet-sizing) agents to choose the bet

sizes of the other player. The resulting abstract games are both smaller than common hand-crafted

action abstractions and have higher game value for the player whose bets have been chosen by RE–

BETS. I finish Chapter 5 by describing how RE–BETS was used to create the agent that won the

no-limit bankroll instant run-off division of the 2011 Annual Computer Poker Competition (ACPC).

In Chapter 6, I consider another approach to choosing the end points of the ranges used by

the parameter-setting agents on each run of BETS. RE–BETS uses fixed-width ranges, where

the distance between the endpoints of each range remains a constant fraction of the pot size for

all parameter-setting agents. I show that setting the range width in a different way, which I call

fixed-ratio ranges, has many advantages over fixed-width ranges. Next I consider generalizing RE–

BETS to use more than one parameter-setting agent at each decision node. Combining this with

fixed-ratio ranges leads to the REDM–BETS algorithm. I show results of REDM–BETS ap-

plied to a perfect recall card abstraction of no-limit Texas hold’em, using a simple action abstraction

for the opposing player. Using two parameter-setting agents per node leads to an increase in value,

however using three parameter-setting agents does not improve over two. I conclude Chapter 6 by

introducing a technique where I allow multiple parameter-setting agents on the first run of BETS,

after which I pick the most valuable of these actions and remove the less valuable ones. This leaves

multiple parameter-setting agents at the same decision point in only a few places, where each is

useful. The resulting algorithm, REFINED–BETS, leads to a further reduction in the size of

my abstractions while maintaining the value achieved by REDM–BETS, again when applied to

a perfect recall card abstraction of no-limit Texas hold’em using a simple action abstraction for the

opposing player.

In Chapter 7, I apply the REDM–BETS and REFINED–BETS algorithms to imperfect

recall card abstractions of no-limit Texas hold’em, using two different action abstractions for the op-

posing player. I show that having more than two parameter-setting agents at each decision node does

lead to improved performance in imperfect recall games, unlike the perfect recall game of Chapter

6. I also show that increasing the size of the imperfect recall card abstraction does not significantly

change the action abstractions generated by my algorithms. This leads me to the idea of creating

action abstractions using small card abstractions, then pairing the resulting action abstractions with

larger card abstractions when creating the final production agent. I use the REFINED–BETS

algorithm and this technique to create an agent for no-limit Texas hold’em that performs very well

when compared with other state-of-the-art agents.

The research described in this dissertation makes the following contributions:

• I introduce a transformation and accompanying regret-minimizing algorithm (BETS) that is

shown experimentally to generate high value action abstractions (Chapter 4).

6

• I introduce an algorithm that iteratively applies BETS, known as RE–BETS, which re-

solves some of the issues with scaling the BETS algorithm to large games (Chapter 5).

• I further refine the RE–BETS algorithm to allow for multiple parameter-setting agents at a

single decision node. This leads to the REDM–BETS algorithm (Chapter 6).

• I introduce a metric that can be used to rank the importance of parameter-setting agents. This

is used in my final algorithm, REFINED–BETS, to prune the game tree (Chapter 6).

• I show that these algorithms scale well – that when applying these algorithms to small imper-

fect recall poker games, the resulting action abstractions perform well when paired with larger

card abstractions (Chapter 7).

• I apply RE–BETS and REFINED–BETS to create state-of-the-art no-limit poker agents

(Chapter 5, Chapter 7).

7

Chapter 2

Poker games

Limit and no-limit poker games have different betting rules. In limit poker at every point in the

game where a bet is legal there is only one bet size that can be made. In no-limit, however, a player

may bet any integer amount of chips between some minimum bet size and all of their chips. In this

chapter I will describe the basic rules of all two player no-limit poker games, and I will define in

detail a number of games that I will be using later.

2.1 Rules of two player no-limit poker

This subsection presents a set of rules that are common to all of the two player no-limit poker

variants used in this dissertation. Each player starts with the same number of chips, known as their

stack. These chips are used for betting throughout the game. Before cards are dealt each player is

required to put some number of chips in the pot. If the game uses antes, then each player places the

same number of chips in the pot. If the games uses blinds, then one player places a set number of

chips known as the small blind in the pot and the other player places the big blind, usually twice the

size of the small blind. The players are often referred to by the blind they put in the pot - the small

blind player and the big blind player. Typically the size of each player’s stack is measured in big

blinds or antes — a common stack size for no-limit Texas hold’em poker is 100 big blinds.

Next, cards are dealt to each player, at least one of which is always dealt face down and kept

private. This adds the element of imperfect information to the game. After each player receives the

same number of cards a betting round begins. If no bet has been made so far in a round then the

player first to act may choose between one of two actions:

• bet - place some number of chips in the pot, forcing the opponent to either match them (call)

or surrender the hand (fold).

• check - decline the option to bet.

If a bet has been made, a player has three possible responses (note that in some variations some

of these may be disallowed at certain times):

8

• fold - give up on the hand, surrendering the pot to the other player.

• call - match whatever bet the opponent has made and end the betting round.

• raise - match any bet made by the opponent and increase it, requiring the opponent to match

this new bet. A raise is best thought of as a call followed immediately by a new bet.

Now that I have introduced the actions I can explain an additional difference between antes and

blinds. When using antes the first player faces no bet, so the valid actions are bet and check. When

using blinds the small blind acts as a forced bet, and the big blind acts as a forced raise. Thus the

player who placed the small blind in the pot is faced with a bet, and may fold, call or raise. Note

that a special situation arises if the player in the small blind elects to call, matching the big blind.

The player in the big blind is then given the option to either check, ending the betting round, or bet,

forcing the small blind to respond.

Poker games are made up of betting rounds, which always follow the dealing of cards. Each

player is given a turn to act (in some test games I suppress this rule). If no bet is made, then the

betting round ends and the next set of cards are dealt according to the rules of the game. If a bet or

raise is made then the other player must respond to this action before continuing to the next round.

When a player elects to bet in a limit poker game, there is only one bet size they are allowed to

make, determined by what betting round they are in. In no-limit poker a player may bet any amount

of the chips remaining in their stack, with the restriction that they must at least match the last bet

that was made in that betting round. For example if the small blind raises, matching the big blind

and betting an additional 10 chips, then if the big blind wishes to raise again they must put in at

least 20 chips - calling the bet of 10 chips and betting at least an additional 10 chips. The first bet or

raise in each round must at least match the size of the big blind. In games that use antes the initial

minimum bet size varies depending on the game. The minimum bet restriction is lifted if the player

has less than this minimum number of chips left in their stack, in which case if they wish to bet they

must put all of their remaining chips in the pot (this is known as an all-in action). I will sometimes

ignore the minimum betting rule when working with small games for testing purposes.

Cards dealt before a betting round are either personal cards, which are only seen by the player

they were dealt to and may only be used by that player, or community cards, which may be used by

either player1. If we reach the end of the final round of betting and neither player has folded, the pot

is awarded to the player that can form the best poker hand using their own cards and the community

cards. I will assume the reader is familiar with the ranking of poker hands (if not see [27]).

Now that I have defined the rules I will introduce several useful no-limit poker games. Some

small no-limit games use fewer cards and thus use a modified version of the usual hand rankings,

which I will define in each case below.1In the games I will consider the personal cards are always unseen by the other player, though there are poker games

where at least one of these is made public information.

9

2.2 Kuhn poker

Kuhn poker is a small poker game invented by H. W. Kuhn [45]. It is played with three cards in the

deck - one Jack, one Queen and one King. The King has the highest value and the Jack the lowest.

Kuhn poker is usually played using antes, though blinds may also be used. In all of my experiments

I will use antes. After the antes are paid one card is dealt face down to each player, leaving one card

unseen by both players. The game’s only betting round then begins. In Kuhn poker no raises are

allowed — if one of the players bets, the other player may only fold or call. If the betting round ends

with neither player folding then the player holding the highest card wins the pot.

Kuhn poker is very useful as a testbed because it is a very small game, and yet it still retains some

of the basic strategic elements of larger games such as Texas hold’em poker [35]2. Kuhn poker can

be extended by allowing players to make an unlimited number of raises (until they run out of chips),

and I will sometimes use this extension for testing purposes. The size of the bet in the original game

is the same size as the ante, however I generalize the game to a no-limit version by allowing the bet

size to be anything up to the stack size.

2.2.1 Half street Kuhn poker

Kuhn poker can be further simplified by not allowing the second player to bet. This means if the

first player checks the game ends and the player with the highest card wins the antes. If the first

player bets then the second player may either fold or call. This variation of Kuhn poker is known as

half street Kuhn poker. As with regular Kuhn poker, I generalize this game to a no-limit version by

allowing the betting player to make any bet size up to the stack size.

2.3 Leduc poker

Leduc poker uses a deck with three different ranks and two different suits. The Jack, the Queen and

the King are used again, with the same relative values. The game can be played with either antes

or blinds. Each player receives a card face down and the first betting round begins. After the first

betting round concludes if neither player has folded a community card is dealt. Then the second

betting round commences. If neither player folds then the winner is the player who can make the

best two card hand. This means if either player has a pair (their private card is the same as the

community card) they win, and if not then the player with the highest private card wins.

By default there is no limit to the number of raises players can make in Leduc poker. I sometimes

decide to impose a limit in order to reduce the size of the game.

2Two such strategic elements are bluffing, or betting when you have a poor hand in the hopes that the opponent will fold,and slow-playing, or not betting when you have a very strong hand to induce a bet from your opponent.

10

2.4 Texas hold’em poker

The final game I will define is Texas hold’em poker. This game uses a full 52 card deck, with 13

ranks and 4 suits. There are four rounds of play known as the preflop, the flop, the turn and the river.

Each player is initially dealt two private cards face down and the preflop betting round begins. Next

three community cards are dealt face up in the middle of the table — this is known as the flop. After

a betting round another community card is dealt (the turn) and the third betting round commences.

The final community card is then dealt (the river) and one more round of betting occurs. Once this

round is over both players make the best five card poker hand they can using any five of their two

private cards and the five community cards, and the best poker hand wins. As usual if either player

folds at any point, the game ends and the pot is given to the other player.

2.4.1 Doyle’s game

One hand of poker takes very little time to play, and players usually play several hundred or thousand

hands in a row. If both players start with N chips and player 1 wins k chips on the first hand, then

typically both players will continue playing with player 1 having N + k chips and player 2 having

N−k chips. The value of the game at this point is the same as if both players had N−k chips, since

one player cannot bet more chips than the other player has. This is true whether both players are

playing a cash game format, where either player may choose to quit at any point, or a tournament

format, where both players play until one player has all the chips [9].

Instead of playing either of these formats I use another format known as Doyle’s game [32]. If

each player begins hand number 1 with a stack of size N , then after this hand we will keep track of

which player won or lost chips and reset both players stacks to size N . I do this after each hand so

that every hand begins with the same stack sizes. There are two reasons to do this. On a strategic

level, the larger the stacks the more strategically complex the game is, due to the larger number of

information sets. I wish to work with a strategically complex game. On a practical level a strategy

designed for a game with stacks of size 200 cannot be applied to a game with stacks of size 20. Thus

to design an agent to play a two player tournament, it would be necessary to solve the game for a

huge number of different stack sizes and switch between these strategies each hand depending on

how many chips the player with the least chips had. While this is certainly possible, partly because

the smaller stack games are much quicker to solve, a lot of computational work is required to play

what is considered to be a less challenging game [62].

2.5 Bet and raise sizes in no-limit poker

Bet and raise sizes in no-limit poker games are usually expressed in terms of the current size of the

pot [62, 61, 51, 13]. For example, if there are 20 chips in the pot, and a player makes a bet of 10

chips, this is called a half pot bet, or a bet of 0.5 pot. If the other player then responds by putting

11

50 chips in the pot, this is referred to as a pot-sized raise. The first 10 chips match the initial bet,

bringing the pot size to 40 chips, and thus the additional 40 chips make up a pot-sized raise. I express

bet and raise sizes in this way throughout this dissertation, and I will use the term “bet” to refer to

both bets and raises, and “bet size” to refer to both bet and raise sizes.

12

Chapter 3

Related work

I begin this chapter by defining extensive form games and various game theory terms that I will be

using. Next I introduce the most important and influential algorithms for finding ε-Nash equilibria in

extensive form games. I then discuss abstraction and outline the usual approach to solving extensive

form games. After this I provide a review of the existing literature on solving domains with large

or continuous action spaces. I move on to discuss the development of poker agents in limit Texas

hold’em. I conclude this chapter with a summary of the state of research on two player no-limit

Texas hold’em.

3.1 Extensive form games

An extensive form game consists of:

• N players.

• A set H of histories. Each history h in this set consists of a sequence of valid actions for the

game (transitions between states in the game made by players and chance).

• A set Z ⊆ H of terminal histories. A terminal history represents the end of the game — no

more actions may be made.

• An information partition Ii of all choice points in the game for each player i consisting of all

of the information sets I for that player. An information set for a player is a set of histories

that the player cannot distinguish from each another.

• A player function P (h), defined for every history h, that determines whose turn it is. This can

be P (h) = i, meaning player i’s turn, or P (h) = c, where c is chance.

• A function fc defining the probability that chance will take each of the available actions when

P = c.

• A utility function ui for each player i assigning a real value to each terminal history. This is

the payoff for that player when that terminal history is reached.

13

Extensive form games can be represented by a figure known as a game tree. A game tree is made

up of decision nodes, chance nodes, terminal nodes, and actions. The edges in the tree represent

actions, the chance and decision nodes represent information sets and the terminal nodes represent

terminal histories. At each decision node player P (h) = i chooses an action. A chance node

corresponds to a history h such that P (h) = c. At a chance node the actions are chosen according

to the function f Ic (a) for that node. At a terminal node payoffs are given to each player i according

to the utility function ui(h).

3.2 Game theory definitions

In order to play an extensive form game each player must have a strategy. A pure strategy defines

a single action to be taken at every information set in the game. A mixed strategy is a probabilistic

combination of all pure strategies, chosen prior to playing. A behavioral strategy is a probability

distribution over actions defined at each information set. Perfect recall is the assumption that a

player remembers all their own past actions, as well as any prior observations when making those

actions [67]. Any game where this is not true is known as an imperfect recall game. In perfect

recall games any mixed strategy can be represented as a behavioral strategy and vice versa [44]. In

imperfect recall games this is no longer the case.

The following are textbook definitions [26]. A strategy profile is a set of strategies, one for

each player in the game. Given a strategy profile, a best response is a strategy for one player that

maximizes that player’s expected utility when played against the strategies of the other players in

that strategy profile. A Nash equilibrium is a strategy profile such that no player can improve their

utility by changing their strategy. This means that if all players are playing according to the same

Nash equilibrium strategy profile, then each player is playing a best response to all other players. An

ε-Nash equilibrium is a strategy profile in which no player can improve by more than ε by changing

their strategy. A constant sum game is a game in which the utility of all players sums to a constant

at every terminal node. If the utility sums to zero we have a zero sum game. A constant sum game

can be converted to a zero sum game by subtracting the constant divided by the number of players

from the payoffs of all players at all terminal nodes. In a two player zero sum game if each player

plays an equilibrium strategy from any equilibrium strategy profile, each player’s expected utility is

a fixed value, known as the value of the game for that player.

3.3 ε-Equilibrium solvers

Most of the state of the art poker agents that exist today are created using an ε-equilibrium solver.

These produce ε-Nash equilibria for the games to which they are applied.1

1Note that if abstraction is used the result is an ε-Nash equilibrium for the abstract game, not necessarily the full game[66].

14

3.3.1 Linear programming

Normal form representation

Normal form is a description of a game where, instead of the graphical representation used in exten-

sive form, the game is represented by a matrix. Let us define x as a vector of length n where each

xi is the probability that player one will play pure strategy i and n is the number of pure strategies

available to player one. The equivalent vector of length m for player two is y, where player two

may choose from m pure strategies. The expected payoff to the first player can be expressed as

xTAy (3.1)

where A is an n x m payoff matrix. Each entry in this matrix Aij is the utility for player one if they

play pure strategy i and player two plays pure strategy j. The elements of vector x must add to 1,

since player 1 must choose a pure strategy each time the game is played, and similarly for y.

A Nash equilibrium can be found by solving

maxx

miny

xTAy (3.2)

subject to the constraints on x and y I have described in this section, as well as xi ≥ 0 for all

i between 1 and n and yj ≥ 0 for all j between 1 and m. This equation and these constraints

can be transformed to a standard linear programming form and can then be solved using linear

programming techniques, such as the interior point method [49] or the simplex method [52].

Sequence form representation

The development of sequence form represented a huge step forward in solving two player zero sum

games [43]. In the normal form representation of a game the payoff matrix contains an entry for

every possible combination of pure strategies for both players. This stores a great deal of redun-

dant information — consider a game where player one choses from a actions and then player two

responds by picking one of b actions, ending the game. Each of player two’s pure strategies defines

a response to all possible actions player one could make, and thus player two has ab pure strategies

and there are ab columns for player two in the normal form payoff matrix. The payoff matrix stores

a2b values, even thought there are only ab possible sequences of actions in this game. Sequence

form removes this redundancy by replacing pure strategies with sequences, which specify a player’s

moves only along a specific path in the game tree [54]. The number of sequences is thus bounded

by the number of nodes in the tree. Koller et al. [43] show that this problem can still be formulated

as a linear program, and thus solved using any linear programming method.

15

3.3.2 Counterfactual regret minimization

The counterfactual regret minimization algorithm (CFR) is currently the most successful and com-

monly used algorithm for finding ε-Nash equilibria in two player poker games [32]. The concept of

regret refers to how much more value one would have obtained by choosing a utility-maximizing

strategy instead of the strategy that was employed [37]. Note that a large regret value for a strategy

means that strategy would perform much better than our current strategy. When a game is played

repeatedly, average overall regret refers to the average of the differences in utility between the strate-

gies chosen each time the game was played, and the strategy that would have maximized utility over

all games played [37]. If the average overall regret of a strategy profile is less than ε, then this

strategy profile is a 2ε-Nash equilibrium. The CFR algorithm uses a concept known as immediate

counterfactual regret defined at an information set I , which is a player’s average regret for their ac-

tions at I over multiple games, assuming they played to reach that information set. Zinkevich et al.

show that the sum of immediate counterfactual regret over all information sets bounds the average

overall regret [68].

CFR is a self play algorithm that works by using Blackwell’s algorithm for approachability to

minimize the immediate counterfactual regret at each information set [11]. This is done by consid-

ering every action at every information set and calculating the utility that would have been gained

or lost if that action was taken instead of what was actually played. This value is added up each

iteration for each action. Zinkevich et al. call this sum the cumulative regret of each action [68]. For

each iteration, the absolute value of these cumulative regrets are added together, and both players’

strategies are changed so that they take each action at each information set with probability pro-

portional to that action’s contribution to this sum. If an action’s cumulative regret is negative it is

ignored. For example if the cumulative regrets over all iterations at a given information set are 40

for folding, 60 for calling and −25 for raising, then the strategy at that information set on the next

iteration will be P (f) = 0.4, P (c) = 0.6 and P (r) = 0. Zinkevich et al. show that if this algorithm

is applied to any perfect recall two player zero sum game, the average strategy profile converges to

an ε-Nash equilibrium, where ε shrinks as the number of iterations increases [68].

The CFR algorithm is often run using sampling of the state space, and can be generalized to

use sampling of the action space as well ([18, 40, 46]). When CFR was first applied to poker the

authors were able to solve games with ∼ 1012 game states, two orders of magnitude larger than was

previously possible using linear programming techniques on the sequence form representation of the

game [68].

3.3.3 Gradient-based approach

Gradient-based algorithms attempt to solve the sequence form representation of an extensive form

game (see Section 3.3.1) by starting from an arbitrary point in the strategy space and descend-

ing to the saddle point. There is a problem with trying to solve this equation for poker games —

16

miny xTAy and maxx x

TAy are not smooth functions. To tackle this problem Gilpin et al. [23]

smooth these functions using the excessive gap technique [53]. They then scale down the contribu-

tion of the smoothing functions as they approach the saddle point, eventually arriving at an ε-Nash

equilibrium for the game. Gilpin et al. report that using this technique they were able to handle

games that are several orders of magnitude larger than games that can be solved using conventional

linear programming solvers [23].

3.4 Abstraction

When a game is too large to solve directly, a technique known as abstraction is often used. The idea

is to create a small game with the same basic properties as the larger game and then solve the smaller

game. The process of creating the smaller game is known as abstraction, and the smaller game is

often called the abstract game. The solution to the small game is adapted to be used in the larger,

original game. A successful abstraction technique is one that creates a smaller game that retains the

important strategic features of the original game. The larger the game, the more difficult this process

is. I will discuss abstraction of no-limit Texas hold’em in Section 3.8.2.

3.4.1 Outline of approach to solving extensive form games

The typical approach to solving large extensive form games it to first perform abstraction, creating

a much smaller game. The abstract game is then solved using an equilibrium solver. Finally the

strategy for the abstract game is converted into a strategy to play the full game in a process called

translation. The translation problem is more complex when action abstraction and state abstraction

are used together (see Section 3.8.2). This process is represented graphically in Figure 3.1.

3.5 Work in large or continuous action spaces

While considerable attention has been given to deriving an optimal policy for an agent in a large

or continuous state space, little work has been done considering domains with large or continuous

action spaces [47, 31]. The two strategies that have been pursued are to either create coarse-grain

abstractions of the action space and apply standard methods for discrete action spaces [12, 48], or

to sample from the continuous action space according to some distribution [47, 31]. In this section

I discuss work outside of the domain of no-limit poker. I summarize research on creating no-limit

agents in Section 3.8.

Lazaric et al. [47] tackle domains with continuous action spaces by using an actor-critic ap-

proach in which the policy of the actor is estimated using sequential Monte Carlo methods. They

then use importance sampling on the basis of the values learned by the critic, and they apply resam-

pling to modify the actor’s policy. They show results where their algorithm performs better than

static solutions such as continuous Q-learning [47]. While this approach had some success in static

17

Figure 3.1: Typical process of solving a large extensive form game

domains, it is unlikely to scale well to domains with adversarial agents.2 Hasselt et al. [31] also

designed an actor-critic algorithm for continuous action spaces. The key difference between their

algorithm and standard actor-critic algorithms is that they use the sign of the TD-error to determine

the update to the actor instead of using the exact value. Their algorithm performs well in simple

tracking and balancing problems [31].

Madeira et al. [48] developed very coarse-grained abstractions of the state and action space of

large-scale strategy games, some of which have state spaces of size ∼ 101900 and action spaces of

size ∼ 10200. The resulting agents performed at an intermediate level on the Battleground simulator

[48]. Similarly Dean et al. [12] create coarse-grain abstractions to compactly represent MDPs

for planning problems with large state and action spaces. Their methods work most effectively in

domains where, though there are a large number of actions at each state, only a few of these actions

have much of an impact on achieving the final goal or minimizing costs [12].

Little theoretical work exists dealing with agents operating in domains with large or continuous

action spaces. Nijmegen [64] shows that for infinite stage stochastic games with very large action

spaces there exist ε-equilibrium points for all ε > 0. Antos et al. [4] provide a rigorous analysis of

a fitted Q-iteration algorithm for continuous action space MDPs, deriving what they believe to be

the first finite-time bound for value-function based algorithms in continuous state and action space

problems.

2Personal correspondence with Alessandro Lazaric, March 2009

18

3.6 The Annual Computer Poker Competition

Every year since 2006 the Annual Computer Poker Competition (ACPC) has been held in association

with a major AI conference (AAAI or IJCAI) [32]. Initially the competition was played using limit

Texas hold’em, and over time two more variants of Texas hold’em were added: limit with more

than two players and two player no-limit. Starting in 2009 each game has been divided into two

divisions, scored using different winner determination rules. The total bankroll division ranks the

entries according to the total number of chips they win or lose against all other entries. This division

is designed to emphasize opponent modeling: the amount by which an agent wins (or loses) against

another agent is very important. The other division, known as bankroll instant run-off, is scored

the same as the total bankroll division initially, then the worst performing competitor is removed

from the tournament and the scores are recalculated. This process is repeated until there is only one

competitor left. The idea of this scoring system is to try and measure which agents lose the least.

The amount of chips won or lost is less important, and instead it is more important if an agent wins

or loses against the other competitiors.

The problem of opponent modeling in poker is a very difficult one, and while the idea of the total

bankroll division is to encourage research in this area, it has often been the case that the winners of

this division did not have a modeling component. Recent work on modeling has shown promise —

Bard et al. use an implicit modeling technique to choose between a set of pre-computed strategies

[5]. They show that their implicit modeling agent would have won the 2011 two player limit total

bankroll division of the ACPC had it been employed.

3.7 Limit Texas hold’em poker agents

Here I will outline the work that has been done on solving limit Texas hold’em poker.

3.7.1 Two player limit

In 2003 Billings et al. [7] developed the abstraction techniques necessary to make the game of

two player limit Texas hold’em small enough to solve using linear programming algorithms. This

allowed for the creation of the first agents competitive with world class players. The next major

advancement was the development of the CFR algorithm by Zinkevich et al. [68], as described in

Section 3.3. In 2007, agents created using the CFR algorithm were the first to compete head to head

with world class professional poker players, losing by a small margin. Further advancements in state

abstraction [67] improved the performance of these agents, leading to a victory over some of the best

limit two player Texas hold’em players in the world in July 2008 [2].

Recently another algorithm has advanced the state of the art in two player poker. Building

on their algorithmic technique for computing best responses quickly [41], Johanson et al. created

CFR-BR, a CFR variant that is guaranteed to find, within a given state abstraction, the strategy

19

with the lowest exploitability in the full game [39]. Their algorithm uses a combination of chance-

sampled CFR with best response calculations to achieve this goal. In another paper, Johanson et

al. look at both improvements to state abstraction in poker games and proper evaluation of these

abstractions [42]. Their results showed that using imperfect recall in state abstractions results in

better performance, which is not surprising given the fact that many of the best entries in the limit

two player divisions of the ACPC in the last few years have used imperfect recall in their state

abstractions [32]. Additionally Johanson et al. show that effective state abstractions can be created

by clustering together similar hands using the k-means algorithm [42].

Bots created using chance and action-sampling variants of the CFR algorithm have been the

most successful in the ACPC, obtaining the most top three finishes [32]. The top two competitors

in the limit two player bankroll instant run-off division of the ACPC in 2012 were CFR-based.

Eric Jackson created the first place entry using a distributed disk-based implementation of chance-

sampled CFR to run a small number of iterations (1440) on a very large card abstraction [36]. The

second place entry was entered by Johanson et al., using CFR-BR applied to a state abstraction

created using k-means clustering, as described earlier.

Approaches not using the CFR algorithm (or a variant thereof), while less common, have also

enjoyed some success in the two player limit Texas hold’em domain. From 2009 to 2011 Marv

Anderson created agents that won the two-player limit total bankroll competitions of the ACPC

each year. These agents were created using a simple neural network, trained on winning entries

from previous years [32]. Rubin and Watson use a case based reasoning approach trained on strong

agents from previous years to generate agents, which has resulted in some top three finishes in recent

years [55, 57]. Gilpin et al. [16, 20, 21, 22, 23, 24] have also had success creating limit two player

agents using the gradient based approach discussed in Section 3.3. Much of their work has centered

around developing techniques to automatically abstract the state space of the game [21, 22, 24].

Another of their techniques modifies the play of their agents by solving the river independently

using a continuous approximation to Texas hold’em, resulting in an improvement in performance

when tested against the agents entered in the 2008 two player limit division of the ACPC [16].

Much of what currently differentiates the best two player limit Texas hold’em agents revolves

around work on state abstraction. It is important to note that Waugh et al. showed that simply

increasing the size of an abstraction of a poker game does not necessarily lead to a better agent — in

fact, the resulting agent can get worse [65, 66]. Nonetheless agents using larger abstractions have, in

general, beaten agents with smaller abstractions, when the abstractions were generated using similar

techniques.

3.7.2 Ring limit

From 1997 to 2001 Billings et al. worked to create Loki and its successor Poki to play multiplayer

limit Texas hold’em, a very common game played in most casinos [6, 8]. They used a combination

20

of different approaches, including expert knowledge, Monte Carlo roll-out simulations, exhaustive

enumeration algorithms to determine the value of different poker hands (hand strength) and some

basic opponent modeling [8]. Poki was a substantial improvement over Loki, and was the first poker

agent capable of beating average human players in low limit casino games [6]. The size of the

multiplayer game proved to be too great to be able to develop world class poker agents using the

technologies of the time. For this reason much of the academic poker research since the development

of Poki has focused on two player poker, which I discussed above.

A return to research on ring limit started in 2008, when the ACPC added a single six player

limit division. This was replaced with three player limit in 2009 and divided into two divisions, as

described earlier. Sweeney et al. used a gradient-based approach to create static multiplayer agents

for both the six player game in 2008 and the three player game in 2009 [63]. Park et al. applied

Monte Carlo search to create agents, also applying their technique to both game types [60]. Abou-

Risk et al. applied the CFR algorithm to a small abstraction of the three player game, obtaining a

base strategy. They then created a new agent by picking certain two player sub-games (where one

player had folded), solving them independently, then using them in place of the base three player

strategy at these points [3]. The agents created by Abou-Risk et al. were the most successful of the

time, placing first and second in both of the 2009 ACPC three player limit divisions [32]. They also

significantly out perform Poki, the winner of the six player division of the ACPC in 2008, in three

player limit [3].

Gibson et al. continued the work of Abou-Risk et al., dominating the three player limit division

of the ACPC in 2010, 2011 and 2012 [32]. Gibson et al. looked at increasing abstraction granularity

in very large games, such as three player limit hold’em, by partitioning the game tree into parts and

computing strategies for each part independently, then combining the strategies in a process known

as strategy stitching [19]. This led to the winning entries of the 2010 and 2011 three player events.

A technique for computing these strategies concurrently with the rest of the game tree was used to

create their 2012 competition entry, also leading to a victory in the three player events.3 The other

notably successful entry in the three player limit division in recent years comes from Rubin and

Watson, again applying their case based reasoning approach mentioned above [57]. The resulting

agents obtained multiple top three finishes in the limit three player division from 2010 to 2012 [32].

3.8 No-limit Texas hold’em poker agents

The work that has been done on no-limit Texas hold’em can be divided into two categories: solving

games with a very small stack size to blind ratio, and solving games with a much larger ratio. These

are known as short stack and deep stack games, respectively.

3Personal correspondence with Richard Gibson, April 2013

21

3.8.1 Short stack no-limit

Miltersen and Sorensen [50] derived a strategy for a two player no-limit Texas hold’em game where

both players have stacks of 6.67 big blinds, a very small stack size. Their strategy involves only

allowing the small blind to either fold or go all-in. This type of strategy is known in the poker com-

munity as a jam-fold strategy [9, 61]. They proved that this strategy is close to a Nash equilibrium

in the full game (with this stack to blind ratio).

Sandholm and Ganzfried devised an algorithm that obtains a Nash equilibrium solution in three

player no-limit Texas hold’em, using stacks of 7.5 big blinds for all three players [14, 15]. While

their algorithm is not guaranteed to converge, they prove that if it does converge the resulting strategy

profile is an ε-Nash equilibrium. This solution suffers from problems faced frequently in multiplayer

game theory — while the strategy they obtain is an ε-Nash equilibrium, this only guarantees that

if one player deviates from this strategy profile the other two will not lose value. If two players

simultaneously deviate from this strategy profile then the value of the third player’s strategy may

be reduced. Additionally, there may be many Nash equilibria, with different values for each player

— it may be possible for two players to implicitly coordinate their strategy to the detriment of

the third player. Such a strategy profile may be unrealistically pessimistic for some players and

overly optimistic for others [9]. Sandholm and Ganzfried state that they expect there is a unique

equilibrium, negating the former problem, and that they doubt the latter problem is significant when

players are playing jam-fold strategies [14].

3.8.2 Deep stack no-limit

Work on deep stack no-limit Texas hold’em (stacks of 100 big blinds or more) is much more com-

plex, due to the larger game size. In limit poker it is not necessary to abstract the action space,

and in short stack no-limit a simple jam-fold abstraction is sufficient to represent strategies that are

close to Nash equilibria in the full game, as mentioned above. In deep stack no-limit, abstraction

of the action space is required if the techniques described in Section 3.3 are to be used. This adds

two challenges to solving the game — creating a reasonable action abstraction, and designing an

algorithm to translate any information set in the full game to an information set in this abstraction.

The focus of my thesis is to provide an answer to the former problem. The latter problem is known

as the translation problem, and significant work on this has been done by Schnizlein et al. [58, 59],

which I discuss below.

Choice of action abstraction

Typical abstractions of the no-limit action space allow fold, call and all-in actions everywhere, but

limit the number of different sized bets players can make. The allowed bets are usually expressed in

terms of the pot size. Common bets include half pot, pot, and integer multiples of the pot.

Gilpin et al. created no-limit agents to play the 200 and 500 big blind variants using their

22

technology developed for limit [25]. As an abstraction of the action space they allowed fold, call,

pot and all-in everywhere, along with half pot at select places in the game. Their reasons for choosing

this abstraction were based in established poker theory [25].

Schnizlein created no-limit agents using the CFR algorithm and a variety of abstractions to play

the 100, 200 and 500 big blind games [58]. In addition to fold, call, pot and all-in, some of the bet

sizes he considered were half pot, quarter pot, 1.5 pot, 2 pot, 10 pot and 11 pot. He did experiments

keeping the size of the game constant and alternating between allocating more space to the card

abstraction and the action abstraction. As was discussed in Section 1.3 his results showed that

agents using small bet options such as half pot outperformed agents of the same size with simpler

betting abstractions but much more complex card abstractions [58].

Betting translation

Typically when a no-limit game is solved using an abstraction of the action space, this is done with

the intention of using the resulting strategy to play the full no-limit game. In order to do this we

must decide how we will handle actions made by the opponent that are not in our abstraction of the

game. There are multiple algorithms that can be used to map actions in the real game to actions in

the abstracted game. The simplest of these, hard translation, maps each action to the closest action

in the abstraction using a predefined metric. The most successful metric is known as the geometric

metric. If b is the real bet size and c and d are the closest bet sizes b in the abstract game such that

c < b < d, then we map b to c if c/b > b/d and we map to d otherwise [25, 58].

An alternative to hard translation, known as soft translation, again maps a real bet size b to the

closest abstract bet sizes. While hard translation simply maps to one of the abstract actions, soft

translation maps to both of them with some probability. Schnizlein et al. [58, 59] map to the lower

bet size with probability P = c/b−c/d1−c/d and the upper bet size with probability 1− P , where b, c and

d are defined as above. Ganzfried et al. perform soft translation using different probabilities - in this

example they map to c with probability P = cdb2+cd , and d with probability 1− P .4 Soft translation

has been shown to outperform hard translation [58].

In the 2009 ACPC an adaptive no-limit poker agent designed to exploit the weakness of existing

translation techniques was entered by Schnizlein [58] and dominated the two player no-limit compe-

tition. It beat some equilibrium-based agents by over 2 small blinds per hand, more than they would

lose if they simply folded every hand [32]. This is a compelling demonstration of the difficulty of

the translation problem.

In 2010 and 2011 the no-limit bankroll instant run-off division of the ACPC was won by agents

generated by applying variants of the CFR algorithm to hand-crafted action abstractions, and then

using translation of the form discussed above [32]. In the 2012 competition the action abstraction

used by the winning entry in the no-limit bankroll instant run-off division was generated by the

4Personal correspondence with Sam Ganzfried, February 2009

23

author using the techniques described in this dissertation (see Section 5.9).

24

Chapter 4

Solving for bet sizes in small games

In this chapter I introduce a transformation that can be applied to imperfect information extensive-

form games in which one or more actions at many decision points have an associated continuous

or many-valued parameter. Specifically, I show this transformation as it applies to no-limit poker

games. Section 4.1 introduces the transformation, which creates a new multi-agent game consisting

of two teams, where each team has a single decision agent and possibly multiple parameter-setting

agents, one for each parameter. In Section 4.2, I show that a known Nash equilibrium solution

for a small no-limit poker games maps to a Nash equilibrium solution of the transformed game.

Section 4.3 introduces a new regret-minimizing algorithm, called BETS, that can be applied to find

equilibria of the transformed game. In Section 4.4 I show that by performing this transformation

and using the BETS algorithm I can determine the bet-sizes made as part of a Nash equilibrium

solution of the original game. Finally, Section 4.5 introduces the local maxima problem encountered

when using this technique, which I consider in more detail in Chapter 6.

The core material in this chapter has been published in AAAI 2011 [33].

4.1 The multiplayer betting transformation

I now define the multiplayer betting transformation. It can be applied to domains with actions that

have associated real or multi-valued parameters. In the case of poker, the action in question is a bet,

and the multi-valued parameter is the size of the bet. These are combined to create a large number

of possible actions of the form “bet x”, where x is an integer between some minimum value and the

stack size of the betting player. In the case of a trading agent, each of the actions buy and sell could

have a parameter that specifies the amount. As we use poker games as our test bed, I will use poker

terminology throughout this dissertation.

The transformation is applied to a two-player no-limit poker game as follows. At each decision

node with m betting actions such that m > 1, remove all betting actions except all-in. Insert some

predefined number q of betting actions such that q < m. A different value of q may be used at each

decision node. Each new bet action is followed by a separate decision node for a new player, who I

25

will refer to as a “bet sizing player”. This player chooses between a low bet and a high bet, which

I will refer to as betting L or betting H , where these are expressed as fractions of a pot sized bet.

Every bet action is assigned a new bet sizing player. The low and high bets can be different for

each bet sizing player, with the only restrictions being that for bet sizing player i Li < Hi, Li is at

least a minimum bet and Hi is at most an all-in bet. This means that if bet sizing players i and j

branch from the same decision node, they may each have L and H values such that [Li, Hi] overlaps

[Lj , Hj ].

The new bet sizing players do not see any of the private cards, and each bet sizing player obtains

the same payoffs as the player for whom they are choosing the bet. So instead of two players, the

game has two teams. The decision of whether to bet L or H is private to the bet sizing player that

chose it — no other player in the game observes this action. I will refer to the two players that

have fold, call, bet and all-in actions as player one and player two, and I will refer to the game that

is created when this transformation is applied to game X as “multiplayer betting X”. Figure 4.1(a)

shows a decision node for player one in a no-limit game with a 4 chip pot, 6 chips left in each stack

and a minimum bet of 2. The corresponding decision node for the multiplayer betting version of this

game, with q = 1, is shown in Figure 4.1(b). Here circles represent player one, squares represent

player two, the diamond is a bet sizing player, and any values of L and H can be chosen such that

L ≥ 0.5 and L < H ≤ 1.5.

fold call bet 2 bet 3 bet 4 bet 5 bet 6

(a) Regular no-limit game

fold call

bet

L H

All-in (bet 6)

(b) Multiplayer betting game

Figure 4.1: A decision node in a no-limit game, and the multiplayer betting version of the samedecision node with q = 1.

I now define the effective bet size function B(s), where P (L) is the probability of making the

low bet L and P (H) is the probability of making the high bet H .

26

B(P (H)) = (1− P (H))L+ P (H)H. (4.1)

Here I used P (H) + P (L) = 1 to eliminate P (L).

Lemma 1. For any fixed values of L and H and probability P (H)i of player i betting H , the

expected value of the pot size after that bet is equal to p+B(P (H)i)p, where p is the pot size before

the bet.

Proof. After a bet of size L the pot size is p+ pL, and after a bet of size H the pot size is p+ pH .

So the expected pot size is P (L)i(p+ pL) + P (H)i(p+ pH) = P (L)ip+ P (H)ip+ P (L)ipL+

P (H)ipH . Since P (L)i+P (H)i = 1, this is equivalent to p+((1− P (H)i)L+ P (H)iH) p, and

using Equation 4.1 we get p+B(P (H)i)p, as required.

This leads me to define the equivalent abstract game

Definition 1. Given a no-limit poker game G, a transformed game G′ obtained using the multiplayer

betting transformation, and a strategy profile σ for G′, I define the equivalent abstract game as an

action abstraction of G created by taking G′ and replacing each bet sizing player i by a bet of size

si = B(P (H)i).

Finally, I present the following theorem.

Theorem 1. For any given strategy profile σ of a multiplayer betting game, ui(σ) where i = 1 or

i = 2 is equivalent to ui(σ′) in the equivalent abstract game, and σ′

i = σi for i = 1, 2.

Proof. For every terminal history z of the equivalent abstract game involving k betting actions there

is a corresponding set Z of 2k terminal histories in the multiplayer betting game where players one

and two make the same actions. The bet sizes in the equivalent abstract game correspond to the

probabilities of choosing L or H , so, by Lemma 1, the pot size for history z is equal to the expected

pot size for player one and player two for Z. Therefore, the utility of z equals the expected utility

for all of Z. Since the probability of z is equal to the combined probability of histories in Z, the

utility of a strategy in the abstract game is equal to the utility of the corresponding strategy in the

multiplayer game for players one and two.

4.2 Transformation significance

Theorem 1 allows us to map a given strategy profile σ for a multiplayer betting game to a strategy

profile σ′ for the equivalent abstract game with the strategy profiles having the same value in both

games. Each different choice of P (H)i for each bet sizing player i corresponds to a different equiv-

alent abstract game, making the choice of abstraction part of each team’s strategy in the multiplayer

betting game. Additionally, a strategy profile for the multiplayer betting game and a strategy pro-

file for the equivalent abstract game require almost the same memory to store, with the multiplayer

27

betting game defining exactly q extra parameters, where q is the number of bet-sizing players. Any

abstraction of the full no-limit game can be represented by a strategy in a multiplayer betting game

given appropriate values of L, H and q. Thus with careful choices of these parameters (a topic I

cover in Chapters 5 and 6), we can represent all abstractions that we are interested in using, while

requiring only slightly more memory than any one of those abstractions to store the strategy.

This is not the only value of the multiplayer betting transformation. As I demonstrate below,

in small poker games the bet sizes used in Nash equilibrium strategy profiles correspond to the

effective bet sizes used in Nash equilibria of the multiplayer betting game. This suggests that a Nash

equilibrium of the multiplayer betting game can be mapped to an excellent choice of abstraction in

the full poker game. It should be noted here that the equivalent abstract game, as given by Definition

1, is a game where any real valued number of chips can be bet, while in reality poker games (with

the exception of some toy games) are played using integer bet sizes. In practice we can round to

the nearest integer, and because in poker the pot grows exponentially with each bet [62], this is a

reasonable approximation.

4.2.1 Analysis in Kuhn poker

Here I will consider the properties of a Nash equilibrium solution of multiplayer betting no-limit

half street Kuhn poker. To do this I use analytical formulas previously derived for half street Kuhn

poker with fixed bet sizes that relate Nash equilibrium strategies to the bet size used [9].

In half street Kuhn poker the first player has two possible actions: bet or check. If the first

player bets, the second player must either fold or call, while if the first player checks the game ends

(see Section 2.2.1). In a Nash equilibrium strategy profile the first player always bets with the the

King and never bets with the Queen, while the second player always calls with the King and never

calls with the Jack [9]. Thus the game has two non-dominated strategy parameters, P1(b | j), the

probability that the first player bets with the Jack, and P2(c | q), the probability that the second

player calls with the Queen. For any version of this game with antes of 0.5 and any number of

real-valued bet sizes si such that 0 < si < 1, a strategy profile is a Nash equilibrium if and only if,

for all bet sizes si such that P1(b | j)i �= 0 [9],

P1(b | j)i = 1− P2(c | q)i2

=si

si + 1. (4.2)

Note that if si > 1, then in any Nash equilibrium strategy profile P1(b | j)i = 0, P2(c | q)i = 0

and this subgame becomes trivial — the first player only considers betting si with the King, and the

second player only calls a bet of size si with the King.

Consider a multiplayer betting version of this game. All bets in half street Kuhn poker can

be represented by a bet-sizing player using an equivalent effective bet size B(P (H)). To find a

Nash equilibrium for multiplayer betting half street Kuhn poker we must obey Equation 4.2 where

si = B(P (H)i) for every bet-sizing player i, and we must also pick P (H)i such that player i

28

cannot gain by choosing a different value. For this to be true we must have, for all 0 ≤ x ≤ 1,

u1(B(x)) ≤ u1(s) and thus u1(B(x)) − u1(s) ≤ 0. The only situations that affect this difference

in utility for a given strategy profile σ are sequences that involve a bet being made and called. Thus

we consider the probability player one bets with the King and is called (winning the hand), minus

the probability player one bets with the Jack and is called (losing the hand). So we have

u1(B(x))− u1(s) =

[P1(b | k)

3

P2(c | q)2

−(P1(b | j)

3

(P2(c | k) + P2(c | q)

2

))](B(x)− s

).

(4.3)

It is dominated to check or fold the King so P1(b | k) = 1 and P2(c | k) = 1 and we have

u1(B(x))− u1(s) =

[P2(c | q)

6− P1(b | j)

3

(1 + P2(c | q)

2

)](B(x)− s

)≤ 0. (4.4)

For this to be true for any B(x) we must have

P1(b | j) = P2(c | q)1 + P2(c | q) . (4.5)

To have a Nash equilibrium in the multiplayer betting half street Kuhn poker game we need to satisfy

Equation 4.5 and Equation 4.2, so

1− P2(c | q)2

=P2(c | q)

1 + P2(c | q) , (4.6)

a quadratic to which the positive solution is

P2(c | q) =√2− 1. (4.7)

Plugging this into Equation 4.2 and solving for s we get

s =√2− 1. (4.8)

So a Nash equilibrium strategy profile in the multiplayer betting half street Kuhn game must define

all values of P (H)i where Li < Hi < 1 such that B(P (H)i) = s =√2− 1.

Chen and Ankenman also show that the value of half street Kuhn poker to the betting player as

a function of bet size s is given by [9]

V (s)1 =s(1− s)

6(1 + s)(4.9)

Figure 4.2 shows a plot of this function from s = 0 to s = 1. Each point on this curve represents

the game value for a given bet size s. For every point on the x-axis the Nash equilibrium strategy

profile may be completely different.

Chen and Ankenman show that the curve peaks at s =√2 − 1, thus the Nash equilibrium

in no-limit half street Kuhn poker uses this bet size. This is the same as Equation 4.8, so if we

can find a Nash equilibrium for any multiplayer betting half street Kuhn poker game where we set

29

Figure 4.2: Half-street Kuhn poker game value per bet size.

Li <√2 − 1 < Hi < 1 for all bet-sizing players i, then the value of the equivalent abstract game

for player one will be the same as the value of a Nash equilibrium strategy profile of the full no-limit

game for player one. Note that the Nash equilibrium of no-limit half street Kuhn poker uses only one

bet size, so using multiple bet-sizing players in the transformation is unnecessary (for this game), as

in a Nash equilibrium strategy profile they will all set P (H)i to the same effective bet size.

4.3 Computing multiplayer betting games strategies

The multiplayer betting transformation creates an n player game, where n > 2. Unfortunately

this means that the standard techniques for minimizing regret in two-player games are no longer

guaranteed to converge. This does not mean it is not worth trying — Gibson et al. [19] have

had success applying variants of the CFR algorithm to three player poker games, as discussed in

Section 3.7.2. Unfortunately, when I applied chance-sampling CFR [37] directly to the multiplayer

betting version of some small poker games, it did not converge. Analysis of these runs shows that the

bet-sizing player mostly alternates between having positive regret for one action and negative for the

other, switching between playing pure strategies P (H) = 1 and P (H) = 0. Looking at Equation 4.2

we can see that in a Nash equilibrium strategy profile for no-limit half street Kuhn poker P1(b | j)iand P2(c | j)i are not quite linear in si. With this in mind I modified the chance-sampled CFR

algorithm as follows: I continue to use regret matching to minimize regret for players one and two

(see Section 3.3.2), but for each bet-sizing player I move each effective bet size sti = B(P (H)ti)

30

small amounts in the direction that minimizes regret for player i:

st+1i =

{(t−1)sti+srt+1

i

t if t < N(N−1)sti+srt+1

i

N otherwise.(4.10)

where N is a parameter chosen beforehand, sti is the effective bet size made by bet-sizing agent

i on iteration t defined as B(P (H)i) from Equation 4.1, and srt+1i is the effective bet size that

would be used on iteration t + 1 if I was using regret matching. We can see that up to iteration

t = N the effective bet size chosen by this algorithm is simply the average of all the effective bet

sizes that regret matching would pick each iteration. Using this average makes the movement of the

effective bet size from iteration to iteration slower and more smooth. After time t = N we still use

the average of all past choices of regret matching, but we weight the effective bet size of the latest

choice by 1/N . This ensures that the effective bet size continues to move some minimal amount in

a direction that minimizes regret.

When this new regret minimizing algorithm is applied to half street no-limit Kuhn poker using

L = 0, H = 1, N > T and run for T = 500, 000 iterations, the average of all P (H)ti choices

made by bet sizing player i converges to within 1% of B(P (H)T

i ) = 0.414. Note that this is the

same value as Equation 4.8.1 The algorithm, however, is significantly slower than CFR. In CFR,

the average (not the current) action probabilities converge to an ε-Nash equilibrium. Therefore, for

each action sequence the algorithm maintains a sum of the probability this sequence was played at

each iteration. Any time a player has 0% probability of reaching an information set, the addition

step can be skipped for all action sequences that reach the subtree beneath that information set. This

cutoff can always be used, independent of how the bet-sizing players are updated. This leads to a

new update equation:

st+1i =

{sti +

Bti (sr

t+1i −sti)

t if t < N

sti +Bt

i (srt+1i −sti)

N otherwise(4.11)

Here Bti is the probability all players play to reach this information set on iteration t. Equation

4.10 updates sti for each bet sizing player on every iteration. If Bti = 0 then Equation 4.11 implies

that st+1i = sti. In this case, the algorithm can make a single strategy update pass for each team of

bet-sizing players at the same time that it updates the totals used to calculate the average strategy of

the corresponding main player. The resulting algorithm, again with L = 0, H = 1 and N > T , also

converges to B(P (H)T

i ) = 0.414, but does so significantly faster than when Equation 4.10 is used.

I call the process of minimizing regret in a multiplayer betting game using the result to obtain

an abstraction Bet-size Extraction through Transformation Solving, or BETS, and it is outlined in

Algorithm 1.

While the choice of N > T leads to convergence in half street Kuhn poker, in larger Kuhn games

N > T no longer converges. I found that a setting of N = 10000 for all bet sizing players worked

1This was tested with one, two and three bet-sizing players

31

Algorithm 1 The BETS algorithmRequire: Multiplayer betting game G′

Require: T iterationsApply modified chance-sampled CFR to G′ as follows:for t from 1 to T do

Calculate regrets as in CFRDuring strategy-update phase:for each player i do

if i is a bet-sizing player then

P (H)t+1i = st+1

i from Equation 4.11else

Set σi as in CFRend if

end for

end for

Create abstraction A, replacing each bet sizing player i in G′ by a bet of size si = B(P (H)T

i )return A

well in all test cases, and this is the value used to produce all of the results presented throughout the

dissertation.

4.4 Computing abstractions for no-limit Kuhn poker

Applying BETS I computed abstractions for three variants of no-limit Kuhn poker for which the

Nash equilibria have been determined analytically [9]. The results are shown in Table 4.1, and

detailed below.

Kuhn Variant Bet sizes used by Nash equilibrium Bet sizes found by BETS algorithmHalf street 0.414 0.4141 bet cap 0.509 and 0.414 0.509 and 0.414

Unlimited raises 0.252, 0.542 and minimum raise 0.252, 0.542 and L for all L, H tried

Table 4.1: Bet sizes used in Nash equilibria of various Kuhn poker variants, with corresponding betsize found by applying BETS.

For multiplayer betting half street Kuhn poker with L = 0 and H = 1, a 500, 000 iteration run

of BETS converges to within 1% of B(P (H)) = 0.414, the bet size used by the Nash equilibrium

strategy profile of the full no-limit game [9]. For cases where H < 0.414 the algorithm converges

with P (H) = 1. Figure 4.2 shows that this is the bet size with the highest game value for the betting

player in that range. For choices of L > 0.414 with H ≤ 1, BETS converges to P (H) = 0, again

obtaining the maximum game value for the given range (for a discussion of what happens when

H > 1 see Section 4.5).

In the larger game of no-limit Kuhn poker where each player may make a bet but not raise,

the bet sizes used by Nash equilibrium strategies are s = 0.414 for player one and s = 0.509 for

player two [9]. Here a one million iteration run of BETS converges to within 1% of the effective

32

bet sizes of B(P (H)) = 0.414 and B(P (H)) = 0.509 for teams one and two respectively. In the

full no-limit Kuhn poker game both players may bet and raise indefinitely. The bet sizes used by a

Nash equilibrium strategy profile in this game are quite unintuitive, like the previous games. The

bet sizes are 0.252 for player one’s initial bet, 0.542 for player two’s bet if player one checks, and

the minimum legal raise if player two raises in response to a player one bet [9]. Using L = 0.1 and

H = 0.3 for player one’s bet and L = 0.4 and H = 0.6 for player two’s first bet BETS converges

to within 1% of the 0.252 and 0.542 bet sizes after two million iterations, and for all ranges that I

tried for the player two raise, BETS sets P (H) = 0.

4.5 Local maxima

In Section 4.2 I showed that in no-limit half street Kuhn poker the Nash equilibrium bet size corre-

sponds to the strategy used by the bet-sizing player in the Nash equilibrium of a multiplayer betting

game where L < 0.414 and H > 0.414. The reverse is not always true — games with more than

2 players may have many equilibria with different game values, and the multiplayer betting game is

no exception to this. An example of this can be seen in half street Kuhn poker, when H > 1. To

understand what happens at this point, I extend the game value curve from Figure 4.2 for larger bet

sizes, producing Figure 4.3.

Figure 4.3: Half-street Kuhn poker game value per bet size, extended to 2 pot.

Examining this figure you can see that for bets of size s > 1 the value of the game is 0. For all

of these bet sizes the strategies for both of the main players are the same - there is no longer any

bluffing with the Jack or calling with the Queen. The first player only bets with the King, and the

second player only calls with the King. While these points do not represent Nash equilibria of the

33

full no-limit game, they are Nash equilibria in the multiplayer betting game. This is due to the fact

that while player one and the betting player could gain value if they both changed their strategy to

the peak point where s = 0.414, they cannot gain value if just one of them changes their strategy.

This means that for any multiplayer betting version of no-limit half street Kuhn poker where H > 1,

BETS may converge to an effective bet size s ≥ 1, even if L < 1 (and indeed in some cases I

observed this behaviour). I will return to this discussion in Chapter 6.

In the next chapter I move on to discuss the application of this transformation and algorithm to

larger games, the complications that arise, and how I resolve them.

34

Chapter 5

Dynamically Adjusted Bet-Sizing

Ranges

In this chapter I consider the application of the BETS algorithm (see Section 4.3) to large poker

games, specifically Leduc and Texas hold’em. Action abstractions in these games must be evaluated

differently, which I discuss in Section 5.1. Sections 5.2 and 5.3 show that the application of BETS

to these games leads to an improvement in game value, however it is also evident that the static

bet size ranges are limiting. To investigate this limitation, I introduce a metric for ranking bets in

order of how much value could be gained by changing their range (Section 5.4). Next in Section

5.5 I apply this metric to the results of Sections 5.2 and 5.3, clearly demonstrating the potential

gains of adjusting the ranges. Section 5.6 shows that it is preferable to make these ranges as small

as possible. The desire for small, dynamic ranges led me to the RE–BETS algorithm, which I

introduce in Section 5.7. I apply RE–BETS to Leduc and Texas hold’em in Section 5.8, leading to

a significant improvement over the BETS algorithm. Finally, in Section 5.9 I describe how RE–

BETS was used to create the winning entry in the 2012 ACPC no-limit bankroll instant run-off

division.

Note that in this chapter I use one bet-sizing player at each decision node when performing

the multiplayer betting transformation (see Section 4.1). I consider multiple bet-sizing players per

decision node in Chapter 6.

The core material in this chapter has been published in AAAI 2012 [34].

5.1 Evaluation of action abstractions

The results in the previous chapter are from small games for which analytical solutions are known,

providing a gold standard for evaluating my algorithm. For larger games such as no-limit Leduc

poker and no-limit Texas hold’em poker I have neither an analytical expression for the Nash equi-

librium, nor can I solve the full no-limit game using modern techniques. Instead, I compare the

game value of different choices of action abstraction for one of the players against a fixed choice

35

(fold, call, pot and all-in in this chapter) for the other player. I call the fixed action abstraction the

“opponent action abstraction”.

Some of the action abstractions used in the comparisons I make in this chapter are generated

by the BETS algorithm from Chapter 4 and the RE–BETS algorithm (see Section 5.7), while

others are hand-crafted. I use the same card abstraction when making direct comparisons — for the

analysis presented in this chapter (excluding Section 5.9) I use a card abstraction of no-limit Texas

hold’em created using the hand strength metric [65] with 5 buckets each round and perfect recall.

This leads to an abstraction with 5 buckets on the pre-flop, 25 on the flop, 125 on the turn and 625

on the river. No card abstraction is used in my no-limit Leduc experiments.

I compute the strategies that I use for my comparisons in two steps. The first step involves

running BETS (or later RE–BETS) which creates a new action abstraction. After this I run a

chance-sampled version of the CFR algorithm [68], which adjusts only the strategy parameters, us-

ing the bet sizes that were computed in step one. The CFR algorithm produces an ε-Nash equilibrium

for this new action abstraction, and thus calculating a best response to this strategy profile gives me

bounds on the game value of the new action abstraction.1 Note that this evaluation technique is for

perfect recall card abstractions. Evaluating action abstractions that are paired with imperfect recall

card abstractions is discussed in Chapter 7.

5.2 Leduc poker with limited raising

I consider a Leduc poker game where each player has 400 chips, with antes of 1 chip each. This

game has approximately 1011 states and 1011 action sequences. Due to the large number of action

sequences, it is difficult to solve the game directly with an ε-Nash equilibrium solver, as it would

require at least 1480 gigabytes of memory. Instead I applied my methodology for evaluating large

games as follows. I created a baseline abstraction where both players use an action abstraction with

actions fold, call, pot and all-in (known as the FCPA abstraction), against which I can compare

the game value of each equivalent abstract game. I then created two multiplayer betting games,

each time replacing each of the pot bets for one of the players with a single bet sizing player with

L = 0.8 and H = 1.5. I call the player with the bet-sizing teammates the main player, and the

other player the opponent, a terminology I use throughout the rest of the dissertation. I allowed

a maximum of two raises per round for both the baseline Leduc game and the multiplayer betting

games, a restriction which I explain in Section 5.8.

I applied the BETS algorithm (Section 4.3) to each of the multiplayer betting games, running

for 15 million iterations each time. Once this completed, I applied chance-sampled CFR to the

equivalent abstract games, running for one billion iterations to assure a very small ε. Getting best

response values for the resulting strategy I obtained game values of −0.060 antes for player one’s

1I use the term “best response” throughout the dissertation to refer to a best response within the abstract game, as opposedto a best response in the full game

36

new betting abstraction and 0.078 for player two’s new betting abstraction. Running chance-sampled

CFR for one billion iterations on the baseline FCPA abstraction and getting best response values

I obtained a game value of −0.072 antes for player one (and an equivalent 0.072 antes for player

two). The abstractions chosen by my algorithm for player one and two represent respective 16.7%

and 8.3% improvements over the baseline action abstraction. We revisit these results in Section 5.8.

5.3 Application to Texas hold’em

For my application to no-limit Texas hold’em I used 200 big blind stacks and blinds of 50 and 100

chips. This is the variant of no-limit Texas hold’em played in the ACPC since 2009 (see Section 3.6).

Once again I used FCPA as the opponent action abstraction as well as the baseline abstraction, and

I used the 5 bucket perfect recall card abstraction described in Section 5.1. I consider only player

one in this section - I will look at both players one and two in Section 5.8.1.

Choosing L = 0.8 and H = 1 (suggested by experts) for all bet sizing players, I applied BETS

to player one, running for 200 million iterations, and I obtained an action abstraction. Next I ran

chance-sampled CFR for one billion iterations on both this abstraction and the FCPA baseline

abstraction. Calculating the best responses of both abstractions, I found that the game value of

the equivalent abstract game I generated for player one using BETS was greater than that of the

baseline action abstraction’s value for player one by ≈ 17%.

I also created action abstractions for player one based on a number of shorter runs of BETS.

Table 5.1 show the number of bet sizes in the given range after different length runs. This table

shows that the vast majority of bet sizes are < 0.82, which is surprising, given that previous expert

knowledge dictated that if only a single bet size is used everywhere, it should be pot sized.2 So

many bet sizes being close to the low boundary after only 1 million iterations suggests that the

ranges should be moved down. It’s possible, however, that the bets with these small values have

little effect on the game value, while the more valuable bets remain closer to 1. This could happen,

for example, if the bet-sizing players choosing these smaller bets were reached with very small

probability. To test if this was the case I developed a metric, RTi,|·|0, designed to rank the individual

bet-sizing players in order of how much value could be gained by moving their range. I discuss this

metric in the next section.

5.4 Importance metric

Here I define a metric that can be used to rank the bet-sizing players in a multiplayer betting game

using two criteria. This metric, RTi,|·|0, is designed to be calculated during a run of T iterations of

the BETS algorithm (see Section 4.3). First, it gives an indication of the importance of the sub-tree

below bet-sizing player i to the overall game value. This is indicated by the probability that player

2Personal correspondence with Darse Billings, October 2010

37

Length Bets 0.8− 0.82 Bets 0.82− 0.98 Bets 0.98− 1.01m 152 8 04m 151 9 06m 150 9 18m 145 13 2

10m 142 16 220m 138 20 2100m 126 31 3200m 126 28 6

Table 5.1: Bet sizes after different length runs of the BETS algorithm.

i will be reached on a given iteration t where all players play according to σt, denoted by πσt

−i,

combined with the expected utility to player i if bet size sti is played, given by uti(s

ti, σ

t). Second, a

difference calculation gives an indication of how much utility could be gained by moving player i’s

bet in the direction of positive regret on iteration t. If this value is high and the effective bet size is

already at the end of the range, this indicates value could be gained by adjusting player i’s range to

allow further movement in the same direction.

The metric is calculated every iteration t, and summed over all iterations T . If no movement

of a bet-sizing player’s range is indicated, this metric can be used to sort the bet-sizing players in

order of importance to the game value. If, however, there are bet-sizing players that would benefit

from adjusted ranges, their score is increased. Thus by combining RTi,|·|0 with an evaluation of how

close the effective bet size chosen by each bet-sizing player is to the edge of its range, I can get

an indication of which bet-sizing players are likely to provide the greatest increase in value if their

range is adjusted.

On each iteration t of the BETS algorithm, the bet-sizing players minimize immediate coun-

terfactual regret [68]. To do this I compute two values:

R(L)ti = πσt

−i

(uti(L, σ

t)− uti(s

ti, σ

t)

)(5.1)

and

R(H)ti = πσt

−i

(uti(H,σt)− ut

i(sti, σ

t)

), (5.2)

where sti = B(P (H)ti) (see Equation 4.1). If increasing sti would gain value, then R(H)ti > 0 and

R(L)ti < 0. The opposite is true if decreasing sti gains value. Clearly R(H)ti tells us the value that

can be gained (or lost) by maintaining the current strategy and increasing sti. The opposite is true

for R(L)ti. These regrets, however, only tell us how much value could be gained by moving the

effective bet size to the edge of the range — if sti = Hi,3 then R(H) = 0. In such cases it would be

nice to know if value could be gained by increasing the range.

Since all other probabilities are fixed during regret computation, R(H)ti is linear in sti. This is

true because the amount of money that is won or lost is directly proportional to the utility uti, which

3Note that Li and Hi are the bet sizes made by bet-sizing player i, while L and H are the actions themselves

38

is linear in sti. Therefore, the magnitudes of these values are proportional to the distance of sti from

the range boundaries. For example, as sti gets closer to Hi, |R(L)ti| increases and |R(H)ti| decreases

proportionally. This means that if sti = Hi and a lot of value would be gained if the bet size was

increased further, then while R(H)ti = 0, R(L)ti will be large and negative. For this reason I define

Rti,|·| = |R(L)ti|+ |R(H)ti|. (5.3)

By combining the absolute values of these regrets, I get an indication of the value that could have

been gained on iteration t by moving bet i in the direction of positive regret, independent of where stiis in the range. In fact, Rt

i,|·| doesn’t even depend on the values of Li and Hi, only on w = Hi−Li.

I will prove this statement formally in the next section — see Equation 5.14.

Finally, I sum this value over all iterations t to obtain a metric for the entire run:

RTi,|·| =

T∑t=1

Rti,|·|. (5.4)

There is one more step involved to normalize this metric if different bet-sizing players use different

ranges, as I discuss in the next section.

5.4.1 Comparing different ranges

In Section 5.7 I introduce RE–BETS, an algorithm that adjusts the ranges of bet-sizing players.

This leads to a multiplayer betting game where each bet-sizing player may be using a different range.

With this in mind, I consider the effect of the bet sizes Li and Hi on the utilities uti(L) and ut

i(H).

Since we evaluate these utilities keeping the strategies fixed for each player, the utility of a bet is

directly proportional to the pot size. The pot size is not, however, directly proportional to the bet

size. Given the pot size p, I define the pot-growth function of a bet of size s as

G(s) = 1 + 2s. (5.5)

If a bet of size s is made and called the new pot size p′ is

p′ = p+ 2sp

= p(1 + 2s)

= pG(s).

(5.6)

Looking at this equation it’s clear that the pot size grows proportional to G(s).

Now let us reconsider Equation 5.3. The signs of R(L)ti and R(H)ti are related. There are three

possibilities: sti = Hi, sti = Li, and Li < sti < Hi. The third case can be broken down further,

as uti is linear in sti, so one of R(L)ti and R(H)ti is positive and the other is negative. If R(H)ti is

positive for the third case we have

39

|R(L)ti|+ |R(H)ti| = πσt

−i

(uti(s

ti, σ

t)− uti(L, σ

t)

)+ πσt

−i

(uti(H,σt)− ut

i(sti, σ

t)

)

= πσt

−i

(uti(H,σt)− ut

i(L, σt)

)

= πσt

−i|uti(H,σt)− ut

i(L, σt)|

(5.7)

where the final step holds as the right hand side of the equation is guaranteed to be positive. If R(L)ti

is positive then when Li < sti < Hi we have

|R(L)ti|+ |R(H)ti| = πσt

−i

(uti(L, σ

t)− uti(s

ti, σ

t)

)+ πσt

−i

(uti(s

ti, σ

t)− uti(H,σt)

)

= πσt

−i

(uti(L, σ

t)− uti(H,σt)

)

= πσt


i(L, σt)|

(5.8)

and again the right hand side must be positive. The cases where sti = Hi or sti = Li reduce to the

same equation:

Rti,|·| = πσt


i(L, σt)|. (5.9)

I now consider the effect that changing the values of Li and Hi for bet-sizing player i have on

this equation. I define the width of the range for player i as

wi = Hi − Li. (5.10)

As utility uti(s, σ

t) is directly proportional to G(s) I define

uti(s, σ

t) = CtiG(s) (5.11)

where Cti is a constant for player i on iteration t. Applying these two equations and rearranging

Equation 5.9 I get

Rti,|·|

πσt

−iCti

= |G(Hi)−G(Li)|

= |(1 + 2Hi)− (1 + 2Li)|= |(1 + 2Hi)− (1 + 2(Hi − wi)|= 2wi.

(5.12)

I can now write Equation 5.4 as

RTi,|·| =

T∑t=1

Rti,|·|

= 2wi

T∑t=1

πσt

−iCti .

(5.13)

40

With this in mind I define

Rti,|·|0 =

Rti,|·|

2wi(5.14)

and

RTi,|·|0 =

RTi,|·|

2wi. (5.15)

Now this metric can be used to safely compare bet sizing players using different ranges. In this

chapter wi is constant4 and so RTi,|·|0 is directly proportional to RT

i,|·| for all bet-sizing players i.

In Chapter 6, however, I examine a different way of setting up ranges where the width is no longer

constant. Thus for the remainder of the thesis I use Equation 5.15 to measure the relative importance

of bet-sizing players.

5.5 Hold’em re-visited

I re-ran the analysis from Section 5.3, this time ranking the bet-sizing players according to RTi,|·|0.

The second, fourth and sixth columns of Table 5.2 show the same data as Table 5.1, while the third,

fifth and seventh columns of Table 5.2 show the distribution of the 10 bets with the highest RTi,|·|0

values. The important bets tell a slightly different story —- while 8 of these bets are close to L = 0.8,

two of them moved towards H = 1 over the first 8 million iterations, and these two bets had the

highest and third highest RTi,|·|0 values.

0.80− 0.82 0.82− 0.98 0.98− 1.0All Top 10 All Top 10 All Top 10

Length Bets Bets Bets Bets Bets Bets1m 152 8 8 2 0 04m 151 8 9 2 0 06m 150 8 9 1 1 18m 145 8 13 0 2 2

10m 142 8 16 0 2 220m 138 8 20 0 2 2

100m 126 9 31 0 3 1200m 126 9 28 0 6 1

Table 5.2: Bet sizes after different length runs, including bets with high RTi,|·|0 values.

The game value of the equivalent abstract games changed significantly during the first 8 million

iterations, while the most important bets were moving. During the final 192 million iterations,

however, the game value stayed relatively constant. This result, coupled with the fact that there are

important bets at both ends of the [0.8, 1] range, suggested that allowing some of the bets to go lower

and others to go higher could result in increased game value. The simplest way to achieve this goal

4There are a few minor exceptions to this when using RE–BETS, but only for low-importance bets at the end of somebetting sequences.

41

would be to continue using static ranges, but make them larger. Unfortunately, as I explain in the

next section, there are significant disadvantages to using large ranges.

5.6 Tree creation - the small range constraint

When creating the game tree for the multiplayer betting version of a game, there are two issues to

be considered:

• What ranges should be used?

• How many bets should I allow in each bet-sequence (what is the depth of the game tree)?

Consider an abstraction of a three round no-limit poker game with initial stacks of 15 big blinds,

where only half-pot bets, pot bets and all-in bets are legal for player one, and only pot bets and all-in

bets are legal for player two. I will consider all pot sizes in this game in terms of big blinds, and

I do not specify the size of the big blind in chips. Instead I will assume it is large enough that no

rounding of bet sizes will be required.

As I am only concerned with what happens when bet-sizing players act, I show a reduced version

of the game tree, which I call the bet-sizing tree, in Figure 5.1. In making this tree I remove all

actions sequences that involve any action other than a bet from the main player (the player to which

I will be applying the multiplayer betting transformation). Other betting sequences remain valid

and are part of the full game tree, I simply wish to highlight what happens after multiple bet-sizing

players have acted.

Each node in Figure 5.1 is labeled by the pot size in big blinds (not in chips) after all bets to this

point have been called. For example, the root node is labeled 2 since before any bet can be made,

one player must have already put in one big blind and the other player had to have matched it, for a

total pot of 2 big blinds. The right child node contains 6 big blinds, since if the main player makes

a pot-sized bet (2 big blinds) and is called, the pot would then contain: the current pot (2), plus the

bet (2), plus the call (2).

In this game the main player can make three half-pot bets, two pot bets, or two half-pot bets and

one pot bet, before an additional bet would require more than the 15 big blinds in the initial stack.

For example, to follow the node labeled 16 by a half-pot bet, the main player would have to add 8

big blinds to the pot after having already placed 16/2 = 8 big blinds into the pot, requiring an initial

stack of 16 big blinds. In applying the multiplayer betting transformation to this game, I follow

the methodology that I have been using throughout this chapter — I remove all bets from the main

player, inserting instead one bet-sizing player at each decision node. I use L = 0.5 and H = 0.5 for

all bet-sizing players. For the opponent player I use the FCPA abstraction, though I do not show

opponent bets in the bet-sizing trees.

At this point I must ask the question: does my bet-sizing tree have depth 2 or depth 3?

42

=

16

8

2

18

6

1212

24 24 24

4

0.5 1

0.5

0.5 0.5 0.5

11

1

0.5

Figure 5.1: Bet-sizing tree for a game with stacks of 15 big blinds, with an abstraction that allowsonly half pot and pot bets.

16

8

2

18

6

1212

24 24 24

4

0.5 1

0.5

0.5 0.5 0.5

11

1

0.5

1

36

1

36

1

5436

0.5

Figure 5.2: Bet-sizing tree for a multiplayer betting transformation of a game with stacks of 15 bigblinds, allowing two bets not including the bold actions, or three bets if bold actions are included.

Figure 5.2 shows the bet-sizing tree in the multiplayer betting version of this game, where the

bold edges indicate the third time a bet-sizing player acts. The dotted ovals represent information

sets, since after a bet-sizing player makes their private action, none of the other players know the

resulting pot size. Unfortunately, if I include bet-sequences where three bet-sizing players act in the

game tree, then some of the sequences will be invalid. For example, if all bet-sizing players choose a

pot bet, after three bet-sizing players have acted there are 54 big blinds in the pot, where each player

has contributed 27 big blinds. This is 12 more big blinds than the initial stack size. Alternatively,

if I create a game tree where at most two bet-sizing players can act in any bet-sequence, then some

legal bets will be missing from the tree. For example, if the first two bet-sizing players to act choose

half-pot bets, then it is legal to bet half-pot or pot, so another bet-sizing player could be added to the

game tree.

43

If I do not allow any sequences where three bet-sizing players act, the abstractions I create may

not be able to take advantage of a bet action that results in a higher game utility. Unfortunately,

the legality of the third bet is dependent on the size of the previous bets and this information is

hidden by the information sets, due to the private actions of the bet-sizing players. While missing an

opportunity to bet may lower the utility, in order to avoid allowing illegal bets I must assume all bet-

sizing players choose H when creating the game tree. This means that if the algorithm selects many

small bets, it may not be able to select as many small bets as are possible in the full game. I refer

to this disparity between the shortest and longest legal bet-sequence as the “tree-depth problem”. It

gets worse as the betting ranges get larger, and conversely it can be minimized by selecting small

ranges.

Recall that in the previous section we saw the importance of allowing bet sizes to increase or

decrease beyond fixed range boundaries. Thus I desire a way to use small ranges, but adjust these

ranges as bets become stuck at one of the ends. If I change the ranges, I must reconstruct the game

tree, as the new range may permit a different number of bets in the tree. If many of these ranges

move down, I need a mechanism for reducing both L and H so that the tree becomes deeper (due to

a smaller H) and the preferred bet size can move lower towards the new L. If the algorithm favors a

larger bet size, I need to increase both L and H and reconstruct the tree based on a larger H , which

may make the tree more shallow.

5.7 Variable range boundaries

Table 5.2 suggested that my expert-informed choice of fixed [0.8, 1] ranges in the Texas hold’em

strategy computation was likely suboptimal, since the bet sizes for the most important bets hit the

range boundaries. Some bets should be smaller than 0.8 and others should be larger than 1. However,

I showed in the previous section that large ranges cause problems due to the large variability of the

pot size for a given betting sequence. My approach is to minimize the impact of the tree-depth

problem by fixing the size of the range, while moving the two range boundaries.

The first step of this process is creating the initial game tree. I start with a single default range

[Ld, Hd]. I initialize all ranges to this default range. In the previous section I showed that I should

use Hd to determine the depth of the game tree to assure that there are no illegal bets. To do this,

the tree is created depth-first, and at each bet-sizing player the H action is taken. This gives me

my initial multiplayer betting game G′. This is passed in to my new algorithm, Range Enhanced

Bet-size Extraction through Transformation Solving, or RE–BETS. This algorithm is outlined in

Algorithm 2.

After some initialization of variables, I set G1 = G′ and apply the BETS algorithm. Note,

however, that instead of running BETS for many iterations (T = 200, 000, 000) with a single fixed

range, I perform R short runs (a few million iterations). At the end of each run r, BETS returns an

abstraction Ar. I then change all ranges of Gr so that the new range for each particular bet-sizing

44

Algorithm 2 The RE–BETS algorithmRequire: Multiplayer betting game G′

Require: R runsRequire: T iterations of BETS per runRequire: Ld

Require: Hd

Require: αSet w = Hd − Ld

Set G1 = G′

for r from 1 to R− 1 do

Ar = BETS(Gr, T )Create new multiplayer betting game Gr+1 from Gr as follows:for each bet-sizing player i in Gr do

Obtain si from Ar

Hi = si + w/2 and Li = si − w/2end for

Set N to the root node of Gr+1

AdjustTree(Gr+1, N, Ld, Hd, α)end for

AR = BETS(GR, T )return Set of abstractions A

player is centered on the effective bet size in Ar computed for that bet-sizing player by BETS. If,

therefore, the effective bet size for any bet-sizing player hits a range boundary or was near a range

boundary at the end of a run, the bet size will be able to move further in that direction on the next

run. I maintain the same width of 0.2 for all ranges in the tree (I revisit this idea in Chapter 6).

After all of the ranges have been changed, there will be some points in the tree where an extra

bet will now fit, and at other points bets that were previously legal become illegal. The next step

of the algorithm is to add and remove bet sizes for opponent nodes, as well as bet-sizing players.

Algorithm 3, AdjustTree, performs this task. Each time a bet-sizing player is encountered I must

check that the L and H actions are both legal, and if they are not I must adjust them accordingly.

This is done by Algorithm 4, AdjustRange. Note that it is not necessary to call AdjustRange on

newly added bet-sizing players, provided that I set Ld ≥ 0.5, as 0.5 is always at least a minimum

bet. I return to this discussion in Section 6.5.

AdjustTree (Algorithm 3) recursively performs a depth-first traversal of the tree, starting from

the root node. As the legality of different bet sizes does not depend on the chance nodes, I only need

to consider each betting sequence once, independent of the cards. For this reason I skip over chance

nodes. At each opponent node I cycle through all the bets in the opponent abstraction. If the bet is

not already in the tree, but is now legal, it is added to the tree, and a node below it is added with

actions fold, call and all-in. No other actions are added at the new node at this point; all other legal

betting actions will be added when that node is reached by AdjustTree. If, conversely, the bet being

considered was in the tree, but is now illegal, it and the sub-tree below it are removed. A similar

process is performed for the main player nodes, using Hd to determine if a bet-sizing player can be

45

Algorithm 3 AdjustTreeRequire: GRequire: NRequire: Ld

Require: Hd

Require: αif N is an opponent node then

for each bet size b in the opponent abstraction do

if b is not the in tree then

if b is legal then

Add bet b to G at N , and a node below b with fold, call and all-inend if

else

if b is no longer legal then

remove b and the sub-tree below from G at Nend if

end if

end for

else if N is not a chance node then

Denote the size of an all-in bet at this decision node as sall−in

if there is no bet-sizing player at N then

if Hd < sall−in − α then

Add a bet-sizing player to G at N using [Ld, Hd], and an opponent node below this withfold, call and all-in

end if

else

Denote this bet-sizing player iLet Li, Hi be the L and H values used by bet-sizing player i at N(Li, Hi, useBet) = AdjustRange(Li, Hi, α, sall−in)if useBet == false then

Remove bet-sizing player i and the sub-tree below from Gend if

end if

end if

if N is a chance node then

Sample c from chanceTake action cDenote the current node N ′

G = AdjustTree(G,N ′, Ld, Hd, α)else

for each action a at N do

Take action aif a leads to bet-sizing player i then

Take action Hi

end if

Denote the current node N ′

G = AdjustTree(G,N ′, Ld, Hd, α)end for

end if

return G

46

Algorithm 4 AdjustRangeRequire: LRequire: HRequire: αRequire: sall−in

useBet = truew = H − Lif L < (sminbet + α) then

L = sminbet + αH = L+ wif H > sall−in − α then

H = sall−in − αif H − L < w/2 then

useBet = falseend if

end if

else if H > sall−in − α then

H = sall−in − αL = H − wif L < (sminbet + α) then

L = sminbet + αif H − L < w/2 then

useBet = falseend if

end if

end if

return L,H, useBet

47

added to the tree. If there is a bet-sizing player at this node (whether it was already there or has just

been added), AdjustRange is now called.

In AdjustRange (Algorithm 4) I first check if, for the bet-sizing player i, Li has decreased to the

point where it is within a small constant α of the minimum bet size sminbet. If so, I increase Li

to Li = sminbet + α and adjust Hi upwards to maintain the constant range size w.5 Conversely,

if bet-sizing players that acted previous to player i have increased their H values, then Hi may be

larger than an all-in bet size. This may also happen after I adjust the range upwards to accommodate

the minimum bet size. If this happens, I set Hi = sall−in − α and set Li to accommodate the fixed

range size. If both constraints cannot be met while maintaining a fixed width w, I reduce the range

width as necessary. If, however, the range size decreases to less than half of the original width w, I

remove bet-sizing player i from the game, along with the sub-tree beneath that player. At this point

the minimum bet is quite close to an all-in bet, and so having a bet-sizing player here is unlikely to

add value.

Once AdjustTree finishes, I have an abstraction Ar. The entire process is repeated R times,

producing a set A containing R abstractions.

Consider applying the RE–BETS algorithm to the game from Section 5.6, but instead using

the default range of [0.8, 1]. I start with the bet-sizing tree shown in Figure 5.3(a) (in these examples

I round the pot size to the nearest 0.1 big blinds for ease of presentation, RE–BETS uses exact

values). Note that this tree has a structure identical to Figure 5.2 if the bold edges are not included.

For the purposes of this example, I will assume that the BETS algorithm always picks the L action

for all bet-sizing players. In reality BETS may (and most likely will) produce different bet sizes for

each bet-sizing player. Thus the first run of BETS yields bet sizes of 0.8 for all bet-sizing players.

The new ranges for all bet-sizing players are [0.7, 0.9]. The new bet-sizing tree is shown in Figure

5.3(b). While the pot sizes are now different, the structure of this tree is the same as that of Figure

5.3(a), as after two 0.9 bets a bet of size Hd = 1 would still be illegal.

On the next run of BETS I again assume all L actions are picked, and thus this leads to bet

sizes of 0.7. This produces the bet-sizing tree shown in Figure 5.3(c). Similarly run r = 3 produces

Figure 5.3(d). Run r = 4, however, produces a different structure, shown in Figure 5.3(e). With

ranges of [0.4, 0.6], the pot size is at most 9.7 big blinds after two bet-sizing players have acted. This

means after a bet of size Hd = 1 the maximum possible pot size is 29 big blinds, which is allowed

as each player starts with 15 big blinds.

Note that RE–BETS also adds opponent bets to the game tree when they become legal. In

this example the opponent abstraction is FCPA, so the opponent’s bet size (pot) is the same as

the default bet size for new ranges (Hd). This means that on run r = 4 after BETS completes,

for any action sequence in which two bet-sizing players have acted, a pot sized bet will be legal.

RE–BETS will add this action to the game tree for the opponent in these spots.

5A value of α = 0.01 was used in all of my experiments

48

13.5

2

18

6

15.615.6

5.2

0.8 1

0.8 11 0.8

(a) Bet-sizing tree for multiplayer betting game G1

11.5

2

15.6

5.6

13.413.4

4.8

0.7 0.9

0.7 0.90.9 0.7

(b) Bet-sizing tree for multiplayer betting game G2

9.6

2

13.5

5.2

11.411.4

4.4

0.6 0.8

0.6 0.80.8 0.6

(c) Bet-sizing tree for multiplayer betting game G3

8

2

11.5

4.8

9.69.6

4

0.5 0.7

0.5 0.70.7 0.5

(d) Bet-sizing tree for multiplayer betting game G4

16.9

6.5

2

9.7

4.4

7.97.9

19.5 20.5 20.5

3.6

0.4 0.6

0.4

0.8 0.8 0.81

0.4

1

23.7

1

23.7

1

2925.2

0.8

0.60.6

(e) Bet-sizing tree for multiplayer betting game G5

Figure 5.3: Bet-sizing trees of multiplayer betting games produced by RE–BETS. The values ateach node represent the number of big blinds in the pot at that point, rounded to the nearest 0.1.

49

5.8 Leduc and Texas hold’em revisited

I applied RE–BETS to the restricted Leduc game I introduced in Section 5.2, using default ranges

of [0.8, 1] for all bet-sizing players. Using 2, 4, 6, 8 and 10 million iterations per run of BETS,

the game value of the equivalent abstract games converged each time within 10 iterations of RE–

BETS, beating the value of the FCPA abstraction by 21% and 9% for players one and two re-

spectively, as compared to the 12% and 7.7% gains I showed in Section 5.2. The game value of the

abstractions produced by RE–BETS levelled off in each case before the 10 iterations had com-

pleted. This shows that RE–BETS can out perform a single long run of BETS, even when the

width of the ranges used for that single run (0.7) is larger than the width (0.2) used in RE–BETS.

The equivalent abstract games contained many important bet sizes outside the range [0.8, 1.5] — for

the 10 million iteration run of player one, 3 of the top 4 bets, as ranked by RTi,|·|0, were greater than

1.5.

This Leduc game has a cap of two bets per round and starting stacks of 400 antes. With a two bet

per round cap and large stacks, I was able to use large ranges in Section 5.2 and apply BETS while

avoiding the tree-depth problem described in Section 5.6. The betting sequence length is at most 4

bets and the starting stack is large enough to make 4 bets of 1.5 pot each. With RE–BETS, this

restriction is no longer necessary. So I applied RE–BETS to an unrestricted Leduc game, with 200

big blind stacks. Using the same ranges and run lengths for BETS, and FCPA as the opponent

abstraction, the abstractions I created improved over a baseline FCPA abstraction by ≈ 30% and

≈ 43% for player one and player two respectively.

5.8.1 Texas hold’em

In the Texas hold’em experiment described by Table 5.2 most bets were close to an edge of their

range after 1 million iterations, with the 10 top bets, as ranked by RTi,|·|0, being close to an edge

within 8 million iterations. With this result in mind I applied RE–BETS with various values of

T (the number of iterations BETS will execute) between 2 and 10 million, and R = 20 (R is

the number of times BETS is run). I found that for both players, the game value of the equivalent

abstract games I created would initially increase after each run of BETS, eventually peaking. While

T = 2, 000, 000 was not enough, using T = 8, 000, 000 led to good performance, being slightly

edged out by T = 10, 000, 000. Using T = 10, 000, 000 the game value of the equivalent abstract

games peaked after R = 8 for player one and R = 6 for player two, improving over the baseline

FCPA action abstraction by ≈ 48% and ≈ 88% respectively. I call these peak-value abstractions

P18 and P26 and I use these for the analysis presented in the remainder of this chapter.

Good no-limit agents typically use at least two bet sizes in their action abstractions - usually pot

and half pot. Since half pot bets are small, they increase tree depth, so the number of half pot bets

is usually restricted [25, 58]. I created a number of action abstractions that use unrestricted pot bets,

plus one half pot bet, once per round on specific rounds. I also created a larger action abstraction

50

including 0.5, 0.75, 1, 3 and 11 pot bets, allowing only one 0.5 bet and one 0.75 bets on each of the

flop, turn and river. This is the action abstraction used by the winner of the 2011 no-limit bankroll

instant run-off division of the ACPC. These abstractions were made asymmetrically - one player was

given extra betting options while the other used the opponent betting abstraction (FCPA). These

abstractions and my generated abstractions are listed in Table 5.3, along with the number of (I, a)

(information set, action) pairs they contain. The number of extra (I, a) pairs does not depend on

which player has the added bets. The amount of memory required to apply the CFR algorithm is

proportional to the number of (I, a) pairs in the game [68].

Name Pot fractions in abstraction # (I, a) pairsFCPA 1 1,781,890HPF 1, 0.5 once on pre-flop 3,867,090HF 1, 0.5 once on flop 3,820,140HT 1, 0.5 once on turn 3,646,890HR 1, 0.5 once on river 3,143,140

HFTR 1, 0.5 once on flop, turn and river 9,115,390ACPC2011 1, 3, 11, 0.5 and 0.75 once on flop, turn and river 31,135,210

P18 Bet-sizing agent, player 1, 8 runs 2,930,050P26 Bet-sizing agent, player 2, 6 runs 3,121,775

Table 5.3: Number of (I, a) pairs in various action abstractions, each with FCPA opponents, in 5bucket perfect recall card abstractions of Texas hold’em.

As all of these action abstractions share the same opponent, this allows me to directly compare

each of their game values. I used chance-sampling CFR to find ε-Nash equilibria, and then I com-

puted best responses to these equilibria, obtaining upper and lower bounds on the game value for

each player in each case. Figures 5.4 and 5.5 shows the results for some of the abstractions that were

used for players one and two, respectively.6 The Y-axis represents milli-big blinds won by player two

(the dealer), thus player one wishes to reduce this value, while player two wishes to increase it. The

bounds are very tight - for the basic FCPA abstraction a run of about 200 million iterations would

in practice be considered long enough to produce an acceptable ε for this size of game. To obtain

these bounds I ran chance-sampled CFR between 2 and 4 billion iterations on each of the bet-sizing

abstractions and 5 billion iterations on each of the other abstractions. Not all of the abstractions

listed in Table 5.3 are included in these plots. HFTR overlapped bounds with ACPC2011 for both

players, with ACPC2011 being slightly better in both cases. For clarity HFTR is not included in the

plots. For player one (Figure 5.4) HPF showed little improvement over FCPA, and HF , HT and

HR all performed roughly the same. Thus only FCPA, HF and ACPC2011 are shown. For player

two (Figure 5.5), abstractions HT and HR are not included in the plot as they were only slightly

better than FCPA.6See the appendix for the exact values of the curves and their bounds for all game value curves presented in this dissertation

51

Figure 5.4: Bounds on game values of various betting abstractions for player one against an FCPAplayer two, in a 5 bucket perfect recall card abstraction of Texas hold’em. The Y-axis is value toplayer two, so player one wishes to reduce this number.

Figure 5.5: Bounds on game values of various betting abstractions for player two against an FCPAplayer one, in a 5 bucket perfect recall card abstraction of Texas hold’em. The Y-axis is value toplayer two, so player two wishes to increase this number

As shown in Figure 5.5, abstraction P26 out performs all other abstractions, beating ACPC2011

by ≈ 15% after 6 iterations of RE–BETS. For player one, abstraction P18 had bounds on its peak

52

game value close to those of ACPC2011 and HFTR, and easily beat the rest of the abstractions, as

shown in Figure 5.4. Additionally, we can see from Table 5.3 that abstractions P18 and P26 are

about one tenth the size of ACPC2011. Smaller betting abstractions are always favoured over larger

ones with similar value, as this allows for refinement of the card abstraction. In Chapter 7 I examine

how the value of the action abstractions I create changes as the card abstraction is refined.

A histogram showing the distribution of the top 10% of bets for abstractions P18 and P26,

ranked according to RTi,|·|0, is shown in Figure 5.6. Each bet size was rounded to the nearest 0.1 —

the 0.7 bar includes bets from 0.650− 0.749. We can see that no one bet size is preferred - a variety

of bets from 0.2 to 1.5 are used. The important bet sizes are different for each player - P18’s range

from 0.2 to 0.7 along with a couple of large bets > 1, while P26’s vary from 0.4 to 1.

Figure 5.6: Distribution of important bet sizes for abstractions P18 and P26. Bets are rounded tothe nearest 0.1

5.9 ACPC 2012 competition entry

The RE–BETS algorithm was used in the creation of a no-limit Texas hold’em poker agent that

was entered in the 2012 ACPC. In creating the action abstractions for each player I used a different

opponent action abstraction and a different card abstraction from the rest of this chapter. The action

abstractions were generated using a small card abstraction, then combined with a much larger card

abstraction and passed in to a state-of-the-art distributed implementation of the Monte Carlo CFR

(MCCFR) algorithm which created the final production agent [18, 40, 46]. I defer the discussion of

the choices of card and opponent action abstraction and the merits of using a small card abstraction

to create the action abstractions and a big card abstraction to create the final agents to Chapters 6

53

and 7. In this section I simply present the results of this application.

The Texas hold’em poker agent I created won the no-limit bankroll instant run-off division and

placed second in the no-limit bankroll division. It used the ACPC2011 abstraction introduced in

Section 5.8.1 as the opponent action abstraction and a default range of [0.5, 0.7]. A value of T =

10, 000, 000 was used for each run of BETS. It uses an imperfect recall card abstraction, generated

using the k-means clustering state abstraction technique of Johanson et al. mentioned in Section

3.7.1 [42]. It has 169 buckets on the pre-flop,7 100 buckets on each of the flop and turn, 50 buckets

on the river for player one and 100 buckets on the river for player two. Note that, as this is an

imperfect recall card abstraction, it is smaller than the perfect recall 5 bucket abstraction used in the

rest of this chapter. In a perfect recall card abstraction the number of buckets is exponential in the

rounds — a perfect recall 5 bucket game has 54 = 625 bucket sequences that must be considered on

the river. Initially, I used 100 river buckets in this imperfect recall card abstraction for both players.

For player one the first few iterations of RE–BETS yielded very similar results with just 50 river

buckets, and so that smaller abstraction was used to speed convergence.

In Chapter 7 I discuss in detail the question of how to evaluate an imperfect recall card abstrac-

tion. In creating this agent, however, I used a simple technique: RE–BETS was run until the top

bets, as ranked by RTi,|·|0, were no longer pinned at the edges of their ranges after the BETS al-

gorithm had finished running. In Section 5.8.1 we saw that for a 5 bucket card abstraction and an

FCPA opponent action abstraction, it only took R = 6 and R = 8 runs of BETS for players one

and two respectively for the game value curves to peak. In this case, however, the top bets, as ranked

by RTi,|·|0, did not level off as fast. In total 24 runs of BETS were used for player 1 and 9 runs of

BETS were used for player 2.

While a very small card abstraction was used by RE–BETS to obtain my action abstraction, the

next step in the creation of the final agent was to apply this action abstraction to a game with a larger

card abstraction. The action abstractions I generate are compatible with any card abstraction, as the

bet-sizing players choose their bets based solely on the betting sequence, independent of the cards.

The size of card abstraction that can be used is directly related to the size of my action abstractions

— the smaller the action abstraction, the more room there is for refinement of the card abstraction.

To get an idea of the size of the action abstractions created by RE–BETS I compare them to a

baseline action abstraction using ACPC2011 for both players.

The winning entry of the bankroll instant run-off division of the ACPC in 2011 used this ACPC2011

action abstraction along with an imperfect recall card abstraction with 169 pre-flop buckets and

3700 buckets on the flop, turn, and river. That agent has 2, 685, 582, 334 (I, a) pairs, requiring

40 gigabytes of RAM to use a CFR-based solution technique. In contrast, if the action abstrac-

tions I generated are paired with the same card abstraction the resulting game has 787, 297, 938 and

1, 235, 465, 864 (I, a) pairs, requiring 12 and 18.5 gigabytes of RAM respectively for players one

7There are only 169 possible unique hands in the pre-flop of Texas hold’em.

54

and two. So the card abstraction was increased to have 169 pre-flop buckets and 18630 buckets on

the flop, turn and river. This combination of my action abstraction and a large card abstraction was

still small enough to require only 42 gigabytes of RAM for player 2 and 24 gigabytes of RAM for

player 1 when loaded by the MCCFR algorithm [18, 40, 46]. MCCFR was run for approximately 18

days on a cluster using nodes of 4 AMD 6172 processors with 12 cores each, running at 2.1GHz.

All 48 cores were used in the calculation. In total 141 billion iterations were run for player one,

and 93 billion iterations for player two. Each of these nodes has 256 gigabytes of RAM , and so a

much larger card abstraction could have, theoretically, been solved. The larger the card abstraction,

the longer it takes to solve the game — this 18630 bucket abstraction was used due only to time

constraints.

As with any production agent, this agent had to have a translation algorithm to account for bets

made by opponents that were not contained within the ACPC2011 opponent abstraction. I used the

translation system designed by Schnizlein et al. [59, 58], discussed in Section 3.8.2. In addition to

this there is one other component to this agent — a technique known as thresholding is used, where

we alter the strategy of the agent such that any actions it would choose less than t% of the time are

never chosen, with this probability being distributed across the rest of the actions with greater than

t% probability of being chosen. While removing some of the theoretical guarantees of an ε-Nash

equilibrium, this technique has been shown to perform better in practice in matches against other

poker agents [17].

The results of the bankroll instant run-off division of the 2012 ACPC competition are shown in

Figure 5.7, courtesy of the ACPC website [1]. This agent was entered on behalf of the University

of Alberta Computer Poker Research Group (CPRG), who use the name “hyperborean” for their

entries in these competitions. Each column of the table represents an elimination round of the

run-off competition, after which the weakest agent is removed from the group and the scores are

recalculated. The values given in each cell of this table are the totals for the agents listed in the rows

against the all agents that remain in the corresponding round of the competition. As you can see in

rounds 0 and 1 hyperborean was in third place, then in rounds 2 and 3 hyperborean rose to second

overall. During rounds 0 through 3 little rock led by a significant margin, due to the fact that it beat

the worst agents by a larger margin. However from round 4 on hyperborean was the clear leader.

Hyperborean beat every other agent in the individual one on one matches.

While this agent performs well, it is worth noting that a false assumption was made in creating

it — that T = 10, 000, 000 would be sufficient to adjust ranges in the appropriate way when using

this card abstraction and opponent action abstraction. Better action abstractions can be generated if

larger values of T are used. In addition, this reduces the value of R needed for convergence. I will

discuss this further in Chapter 7.

55

Figu

re5.

7:R

esul

tsof

the2012

AC

PCno

-lim

itba

nkro

llin

stan

trun

-off

com

petit

ion,

inm

illi-

big

blin

ds.

56

Chapter 6

Multiple fixed-ratio ranges

In this chapter I consider generalizing RE–BETS to use more than one bet-sizing agent at each

decision node, as well as ranges set by fixing the ratio of the rate at which the high and low bets

cause the pot size to grow. In Section 6.1 I outline my methodology for choosing the number of

iterations of BETS to use each time it is run. Section 6.2 explores the effect that different default

ranges have on the results of RE–BETS. This leads me to the idea of fixed-ratio ranges, detailed

in Section 6.3, which have many advantages over fixed-width ranges. I introduce the RED–BETS

algorithm, which uses fixed-ratio ranges, in Section 6.4. Next I explore using more than one bet-

sizing player at each decision node. I implement this in the REDM–BETS algorithm in Section

6.5, which leads to an improvement in game value over RED–BETS. The analysis of Section 6.6,

however, shows that we can maintain this gain while only using multiple bet-sizing players at a few

key points in the game tree. The combination of REDM–BETS and this idea culminate in my

final algorithm, REFINED–BETS. Section 6.7 shows the application of REFINED–BETS

to Texas hold’em, and Section 6.8 shows that REFINED–BETS helps avoid the local maxima

problem introduced in Section 4.5.

I continue to use the same 5 bucket perfect recall card abstraction of no-limit Texas hold’em in

this chapter as was used in Chapter 5, as well as an opponent action abstraction of FCPA. I consider

imperfect recall card abstractions and more complex opponent action abstractions in Chapter 7.

6.1 Choosing T for BETS

In Section 5.8 I applied RE–BETS (Algorithm 2) with a default range of [0.8, 1] to a 5 bucket

perfect recall abstraction of no-limit Texas hold’em. I used the movement of the most important

bet-sizing players, as ranked by RTi,|·|0, to decide how many iterations I should use when running

BETS, settling on T = 10, 000, 000. For player one this led to equivalent abstract games that

increased in game value until the peak point, maintaining the peak value after each subsequent run

of BETS. For player two, however, the game value peaked after 6 runs, then dipped slightly before

levelling off. In order to see if the dip in the curve is related to the number of iterations per run of

57

BETS, I repeated the experiment from Section 5.8 using T = 20, 000, 000. This led to a very minor

improvement in game value at the peak point. Notably, however, the peak value was maintained for

both players even after running BETS 20 times.

Now I consider why T = 10, 000, 000 is enough to reach approximately the same peak value

reached when using T = 20, 000, 000, but not enough to maintain that peak (for player two). It is

important to realize that even once the game value has peaked, the game tree continues to change

each time BETS is run. The tree size can grow if, for example, bet-sizing players with relatively

minor influence on the game value choose smaller and smaller bet sizes. This leads to extra (in-

consequential) bet-sizing players being added to the tree during the AdjustTree phase (Algorithm

3, Section 5.7) of RE–BETS. If the game tree grows, larger values of T are required to achieve

the same game value. In general, for the poker games studied in this dissertation, the size of the

abstractions generated using RE–BETS (and its variants I introduce in this chapter) increases each

time BETS is run.

With this in mind, the first time I run BETS I use a larger value of T than I expect I will need,

much as was done in Sections 5.3 and 5.5. I then use the movement of the most important bet-sizing

players over the course of this run, as ranked by RTi,|·|0, as an indication of what value of T will

be needed for subsequent runs to convergence. For clarity of presentation, I do not delve in to the

details of how I choose T in each case. As I discussed in Section 5.5, the value of the abstraction

generated on the first run of BETS changes very little once important bets are close to the edge of a

range, so using more iterations on the first run of BETS has little effect on the game value curves.

6.2 Changing the default range

In Section 5.8 I used a default range of [0.8, 1] for all bet-sizing players. Recall that the default

range is used to initialize the ranges of the bet-sizing players, both in the initial game tree and

during AdjustTree (Algorithm 3), when it is used to decide whether to add a bet-sizing player to the

tree (see Section 5.7). When considering the choice of a default range, the following questions come

to mind:

• How does the choice of default range affect the number of times BETS must be run before

the game value curve peaks?

• Does the choice of default range affect the value at the peak point?

To address these questions, I ran the experiment from Section 5.8 three times, using a different

default range for all bet-sizing players each time. For the reasons described in Section 6.1 I used

T = 20, 000, 000 for each run of BETS. The results of the first 10 runs, for player one, are shown

in Figure 6.1.

The first thing to note looking at this figure is that all three default ranges reach approximately

the same game value after the tenth run of BETS. Additionally, examining the top 10% of bets

58

Figure 6.1: Bounds on game values of abstractions produced by RE–BETS, using three differentdefault ranges, for player one against an FCPA player two, in a 5 bucket perfect recall card ab-straction of Texas hold’em. The Y-axis is value to player two, so player one wishes to reduce thisnumber.

for these final abstractions as ranked by RTi,|·|0, they were all within 0.1 of their value in P18, the

abstraction I created for player one in Section 5.8.1 in each case. This suggests that when we allow

a maximum of one bet-sizing player at every decision node, different starting conditions for RE–

BETS may lead to very similar action abstractions with approximately equivalent game values,

given the right value of T in each case. Even if there are multiple different-valued maxima of the

multiplayer betting game, as discussed in Section 4.5, RE–BETS is converging on maxima with

very similar values.

The same experiment for player two is shown in Figure 6.2. These results are very similar to

that of player one, except for the fact that the smallest default range, [0.4, 0.6], peaks at a slightly

lesser game value than the other two. This result is similar to the problem discussed in Section 6.1

— because the default range is centred on a smaller bet size, bets are being added to the game tree

more often. They are added as soon as a bet of size Hd = 0.6 becomes legal. This leads to bet

sequences which have very little influence on the game value tending to be longer, and thus the size

of the equivalent abstract games is bigger. So while T = 20, 000, 000 is enough to converge on

approximately the same game value when the default range is centred on a larger bet size, when it is

centred on a smaller bet size we need more iterations of BETS to achieve the same game value.

Something that separates the results of RE–BETS using different default ranges, for both play-

ers, is the number of short runs it takes to converge on the game value. The reason for this becomes

clear when we consider that in each case the bet sizes are converging on the same values, indepen-

59

Figure 6.2: Bounds on game values of abstractions produced by RE–BETS, using three differentdefault ranges, for player two against an FCPA player one, in a 5 bucket perfect recall card ab-straction of Texas hold’em. The Y-axis is value to player two, so player two wishes to increase thisnumber.

dent of where they start. The most that a range can move in one short run is 0.1, which would happen

if the average effective bet size is one of the edges (so either B(P (H)T

i ) = H or B(P (H)T

i ) = L).

As the majority of the important bets are < 1 (see Figure 5.6), the larger the bet size at the centre

of the default range is, the longer it will take to get to these bet sizes. While for this game the

simplest solution is to choose a default range centred on a small bet size, such as [0.4, 0.6], this

leads to larger games that require larger values of T . Additionally, changing the card abstraction

and opponent action abstraction may lead to a very different distribution of bets, so it is not clear

that a range centred on a smaller bet size will always be better (see Chapter 7). In the next section

I describe a variable-width alternative to these fixed-width ranges that, among other advantages,

causes the width of ranges to increase when the centre of the range moves towards larger bet sizes,

and decrease when it moves towards smaller bet sizes.

6.3 Fixed-ratio ranges

In all experiments to this point I have used a fixed range width of 0.2. Now I consider how the choice

of L or H made by a bet-sizing player using fixed-width ranges of 0.2 affects the growth of the pot.

To do this recall that in Section 5.4 I introduced the pot-growth function G(s) of bet size s, restated

here:

G(s) = 1 + 2s. (6.1)

60

This is defined such that if the pot size is p then after a bet of size s the new pot size p′ is

p′ = pG(s). (6.2)

I now define the pot-growth ratio Pi of a bet-sizing player i as the ratio of the pot size after the H

action to the pot size after the L action:

Pi =pG(Hi)

pG(Li)

=G(Hi)

G(Li).

(6.3)

The smallest default range from the experiments in Section 6.2 is [0.4, 0.6]. Applying Equation

6.2, the pot size p′ becomes

p′ = p(1 + 2(0.4)) = 1.8p (6.4)

after the L action and

p′ = p(1 + 2(0.6)) = 2.2p (6.5)

after the H action, with a pot-growth ratio of 2.2/1.8 = 1.22. Applying the same calculation to the

largest default range used in Section 6.2, [1.2, 1.4], produces pot sizes of 3.4p and 3.8p for L and

H actions, respectively, and a pot-growth ratio of 3.8/3.4 = 1.12. As you can see in this example,

as a fixed-width range move towards larger bets the pot-growth ratio goes down. If, however, I set

ranges such that Pi = c for all bet-sizing players i, where c is a constant, then as the range moves up

the width of the range will increase, and as the range moves down it will decrease. I call such ranges

“fixed-ratio ranges”. Next I examine how setting ranges in this way affects the tree-depth problem

from Section 5.6.

6.3.1 The range-size constraint revisited

In Section 5.6 I showed that to avoid making illegal bets, the game tree of a multiplayer betting game

must be created with the assumption that all bet-sizing players will choose the H action. If through-

out an action sequence all bet-sizing players choose action L, at the end of that action sequence it’s

possible that there will be enough chips left for an extra bet-sizing player using the default range

[Ld, Hd], but this player will not exist in the game tree. I called this the “tree-depth problem”.

Using small fixed-width ranges helps to reduce this problem, which is part of the motivation for

RE–BETS. Here I will show that I can improve upon this approach by using specially chosen

fixed-ratio ranges. This approach allows me to make certain guarantees with regard to how bad this

problem can get.

Note that the tree-depth problem was introduced in Section 5.6 using a game without all-in bets.

The existence of all-in bets helps to reduce the impact of this problem, and I assume I am working

with a game with all-in bets throughout this section, for convenience. A similar analysis applies if

there is no all-in bet in the game.

61

To calculate the bet size s that would put a player all-in on their nth bet, I get the nth root of

S, the number of big blinds in the stacks. I then use Equation 6.1 to solve for s. For example, let’s

assume S = 200, as it does in most of my experiments. The fifth root of 200 is 2001/5 ≈ 2.8853,

meaning that the pot grows by a factor of G(s) ≈ 2.8854 after each bet s. Solving this for s gives

s ≈ 0.943. Thus if five bets of size s = 0.943 are made, then the last bet will be approximately an

all-in bet (within rounding error). Similarly, to determine how many bets are required to get all-in

after six bets of equal size, I calculate 2001/6 ≈ 2.4183, leading to a bet size of s ≈ 0.709. As

before, if six bets of size s = 0.709 are made, the last bet will be approximately an all-in bet.

Before continuing, I consider the effect of bets made by an opponent using an opponent betting

abstraction (making fixed bet sizes). Any such bet will reduce the number of chips remaining in

the stacks, while the ratio of the largest and smallest possible pot sizes does not change, as the

opponent’s bet size is known by all players. Thus, opponent bets reduce the impact of the tree-

depth problem, as reducing the stack sizes means that fewer bet-sizing players can act in this action

sequence. With this in mind I start by considering only the worst-case scenario: action sequences in

which the opponent does not make any bets. In such action sequences all bets are made by bet-sizing

players.

Now I re-visit the tree-depth problem in the initial game tree that will be created by RE–BETS

if we use the default range [0.71, 0.94] for all of our bet-sizing players in this game with S = 200.

Note that I have rounded Hd down from the fifth root, and Ld up from the sixth root. Using RE–

BETS, the game tree is created by substituting a bet of size Hd for all bet-sizing players (see

Section 5.7). As Hd = 0.94 is slightly less than 0.943, at most five bet-sizing players can act in

a single action sequence. If each of these bet-sizing players chooses the H action, then there will

remain only a very small all-in bet. If, instead, each of these five bet-sizing players chooses the

L action, there will be enough chips left for an all-in bet of slightly less than L = 0.71, as six

s = 0.709 bets put a player all-in on the sixth bet. What this means is that even if I changed RE–

BETS to create the game tree assuming all bet-sizing players chose L, I would create the exact

same tree! Thus using a default range of [0.71, 0.94] has eliminated the tree-depth problem on the

initial run of BETS.

Now consider the following theorem.

Theorem 1. In any multiplayer betting game with stacks of S big blinds, if x is the nth root of S, y

is the (n+ 1)th root of S and for every bet-sizing player i Pi = x/y, then after n bet-sizing players

have acted the ratio of the largest to the smallest possible pot size is S1/(n+1) = y.

Proof. After one bet-sizing player acts the ratio between the possible pot sizes will be x/y, by the

definition of Pi. The largest ratio of possible pot sizes after n bets is (x/y)n. By the definitions of

x and y this becomesxn

yn=

S

yn=

S

yn+1/y=

S

S/y= y. (6.6)

62

If RE–BETS is modified to keep the pot growth ratio Pi constant for every bet sizing player i,

then even as ranges move up and down the ratio of the largest to smallest possible pot size is constant

for a given number of actions n by bet-sizing players. Theorem 1 guarantees that if I set the default

range [Ld, Hd] using any two roots of S, in the worst case the smaller pot size will reach the larger

pot size with a bet of size Ld, since this grows the pot by G(Ld) = S1/(n+1).

Additionally, recall that in RE–BETS ranges are never increased to the point where Hi is an

all-in for any bet-sizing player i. Instead Hi is always kept slightly smaller than an all-in bet. Bet-

sizing player i is only removed from the game tree if the range becomes so small that it is impossible

to keep Li greater than a minimum bet, Hi less than an all-in bet, and maintain a width of at least

half of Hd − Ld (this is done by AdjustRange, Algorithm 4). With this in mind I adjust Hd such

that G(Hd) is slightly less than the nth root of S, as was done in the example at the beginning of

this section. Doing this creates an all-in bet after n bet-sizing players act, which will then remain

unless the nth bet-sizing player to act is removed from the game tree (at which point there will still

be an all-in bet after the previous bet-sizing player acts). This all-in bet may be of little utility if all

H actions are used, however it provides an additional betting option of size up to slightly less than

Ld after a sequence of L actions. This guarantees the elimination of the tree-depth problem on the

first run of BETS, for arbitrary choice of n. As I did earlier in the section, I use two decimal points

for my default ranges, rounding Hd up and Ld down.

I must clarify my statement that the tree-depth problem is eliminated. I defined the tree-depth

problem as a situation in which I create a different tree if I assume all H bets are made than if I

assume all L bets are made. This is conceptually simple, and makes sense in the initial multiplayer

betting game. However, in the general case where each bet-sizing player is using different ranges, it

is helpful to consider a different metric. Instead of determining simply whether or not a bet of size

Ld would be legal after a sequence of L actions, consider the size of an all-in bet after a sequence

of L actions (even in a game with no all-in bet, simply consider how large an all-in bet would be).

Before the first run of BETS, the size of this bet is slightly less than Ld, which is reasonable. The

size of this all-in bet will change, however, as bet-sizing players move their ranges up and down.

As bet-sizing players move their ranges towards smaller bets, this all-in bet gets larger, until an

additional bet-sizing player is added. As they move their ranges towards larger bets, this all-in bet

gets smaller, unless a bet-sizing player is removed. Cases where a bet-sizing player gets added do

reduce the size of the final all-in, but increase the ratio of greatest to smallest pot size, and thus

make it possible for an even bigger all-in if bet-sizing players continue to move their ranges towards

smaller bets. Cases where a bet-sizing player is removed aren’t a concern, as we don’t remove them

unless their H bet is very small, meaning the size of the all-in is also quite small at this point.

With fixed-ratio ranges, the size of this all-in bet size only depends on the action sequence with

the most bet-sizing players in it, the fixed ratio P , and the default bet size used to add a new bet-

63

sizing player. With fixed-width ranges, it also matters if the bet-sizing players acting have ranges

centred on very small bet sizes or very big ones. Ranges centred on very small bet sizes make the

problem worse, as they increase the ratio of possible pot sizes. Ranges centred on large bet sizes

reduce the problem, as they decrease this ratio. Consider, however, that ranges centred on small bet

sizes are the problematic ones, as they may increase the number of possible bet-sizing players to act

in a single sequence, and conversely ranges centred on large bet sizes are less of a concern. So using

fixed-width ranges makes the problem worse when it is already bad, and helps alleviate the problem

when it is getting better naturally.

6.3.2 Choice of n

There are two practical considerations that must be made when choosing a value of n for multiplayer

betting games. The first of these is whether we are fixing one player’s bets and applying the trans-

formation to the other player, or applying the transformation to both players. In the former case, as

mentioned in the previous section, any bet made by the fixed-betting player reduces the problem, as

that player’s bet size is known by all players. The second consideration is the number of rounds in

the game. If there are four rounds (as is common in many poker games), then there can only be four

bets made by one player if the other player does not bet or raise. Thus, if one player uses fixed bets,

then for there to be five actions by bet-sizing players in a given betting sequence, then there must be

at least one fixed bet made in that sequence.

I consider choosing n for no-limit Texas hold’em with an (arbitrary) fixed betting opponent (in

the next chapter I will consider an opponent abstraction of ACPC2011). One option is to set the

value of n very low. This will, however, mean that the default range will be centred on a very large

bet size, and the pot growth ratio P will be quite large. For example, a choice of n = 3 corresponds

to a range of [1.39, 2.42] and a pot growth ratio of P = 1.54. With this default range, it will take

many runs to reach small bet sizes, so if we expect that many bet-sizing players will tend to choose

small bet sizes (as was the case in Section 5.7, illustrated in Figure 5.6), this range is not ideal.

Additionally, a large pot growth ratio will make the tree-depth problem worse if many bet-sizing

players choose smaller bets.

The opposite approach would be to use a large value of n, such as n = 7. This would lead to a

default range of [0.47, 0.56] and a pot growth ratio of P = 1.1. The first problem encountered here

is that Ld = 0.47 will not always be legal — when facing a bet of 11 pot (which is included in the

ACPC2011 abstraction) the minimum bet (raise) size is sminbet = 0.48. Also, for games with only

four rounds and a fixed betting opponent, this is overly pessimistic — any betting sequence with

seven bet-sizing players acting in it must also contain three opponent bets. These three opponent

bets significantly reduce the impact of the tree-depth problem.

One last consideration comes from poker theory. Once enough chips have been place in the pot,

both players become “pot committed”. It is generally accepted among poker theorists that being

64

pot committed means that it would almost always be a mistake for either player to fold, and that if

either player bets they should usually make an all-in bet [9, 10, 13, 28, 51, 62]. Miller et al. argue

that once your remaining stack contains less chips than the pot, you are pot committed [13]. Thus

once an all-in bet is sall−in < 1, adding a bet-sizing player at this point seems unlikely to improve

the game value of the abstraction. Putting all of this together, I choose a value of n = 5 for my

experiments in no-limit Texas hold’em for the remainder of this dissertation. This leads to a default

range of [0.71, 0.94], and a pot growth ratio of P = 1.19.

6.4 RED–BETS algorithm

To use fixed-ratio ranges instead of fixed-width ranges, I must alter RE–BETS accordingly. I

define the default pot-growth ratio P d as

P d =1 + 2Hd

1 + 2Ld. (6.7)

For each bet-sizing player i, I must be able to take a bet size si and determine new range boundaries

Hi and Li. To do this I rewrite Equation 6.7 as

P d =1 + 2(si + wi/2)

1 + 2(si − wi/2)(6.8)

where

Hi = si + wi/2 (6.9)

and

Li = si − wi/2. (6.10)

In Equation 6.8 P d and si are known, and I must solve for wi. I do this as follows:

P d =1 + 2si + wi

1 + 2si − wi

P d(1 + 2si − wi) = 1 + 2si + wi

P d + 2siPd − wiP

d = 1 + 2si + wi

P d + 2siPd − 1− 2si = wi(1 + P d).

(6.11)

Simplifying I get

wi =P d(1 + 2si)− (1 + 2si)

P d + 1

wi =(1 + 2si)(P

d − 1)

P d + 1.

(6.12)

I now introduce an update to the RE–BETS algorithm, which I call Range Enhanced and

Depth-fixed Bet size Extraction through Transformation Solving (RED–BETS). It is very similar

65

Algorithm 5 The RED–BETS algorithmRequire: Multiplayer betting game G′

Require: R runsRequire: T iterations of BETS per runRequire: Ld

Require: Hd

Require: α

Set P d = Hd

Ld

Set G1 = G′



Obtain si from Ar

// The line below is the significant change from RE–BETS (Algorithm 2, Section 5.7)wi =

(1+2si)(Pd−1)

Pd+1Hi = si + wi/2 and Li = si − wi/2

end for


AdjustTree(Gr+1, N, Ld, Hd, α)end for


to RE–BETS, with the difference being that instead of fixed-width ranges I use fixed-ratio ranges.

I do this by updating the width using Equation 6.12. RED–BETS is detailed in Algorithm 5.

Note that RED–BETS continues to use AdjustTree (Algorithm 3) and AdjustRange (Algo-

rithm 4, Section 5.7). It is worth noting that an adjustment could be made to AdjustRange to main-

tain fixed-ratio ranges in the event that Li < sminbet+α or Hi > all-in - α. This adds an extra level

of complexity that is very unlikely to add any value to the algorithm. To understand why, consider

that there are four possible scenarios that can be encountered in AdjustRange:

1. Li is too small so it is moved up, and Hi is moved up by the same amount

2. Hi is too large so it is moved down, and Li is moved down by the same amount

3. The range must be squeezed to accommodate both of these constraints.

4. Accommodating these constraints would squeeze the range to less than half its original width,

so we remove this bet-sizing player

If the first scenario occurs, the range is moved upwards. If AdjustRange was modified to main-

tain fixed-ratio ranges, Hi would move up by more than the Li (see Section 6.3.1). This, in turn,

makes it more likely that a bet-sizing player acting after player i will have an H action that will

now be larger than all-in. So in this case leaving AdjustRange unchanged is a conservative move,

possibly making some ranges smaller than necessary while avoiding some extra changes throughout

the tree. In the second scenario the opposite effect occurs — because the range is moving down,

66

Li must be reduced by less than Hi if I wish to maintain fixed-ratio ranges. By leaving this as is,

however, I do not cause any tree-depth problems for other bet-sizing players, because i is the last

bet-sizing player to act in the action sequence.

The third and fourth scenarios would not be affected by an attempt to maintain fixed-ratio ranges,

as the ranges are reduced in size from both ends. Also, it should be noted that any changes made

to ranges by AdjustRange are temporary — every time BETS runs I create new ranges centred on

the effective bet sizes, using Equation 6.12 to create new, fixed-ratio ranges. If, for example, the

second scenario is encountered and a range is moved down, then even if the next time BETS runs

it chooses the exact centre of the bet-sizing player in question’s current range, the new range will be

set by Equation 6.12 and will thus move the L bet size up accordingly.

For these reasons I decided to not to change AdjustRange and so I continue to use the original

version (Algorithm 4).

There is an additional advantage of RED–BETS on top of those listed in Section 6.3.1. Pro-

vided the default range starts low enough, the worst-case number of runs of BETS required to reach

a bet size decreases. I have chosen a default low bet that starts at L = 0.71 (see Section 6.3.2), and

considering that, when facing a pot bet, a min bet is sminbet ≈ 0.33, there is not far to go to reach

very small bets. The scenario that I have to be concerned about is if the range is headed towards a

large bet, say over 2 pot. This will take a minimum of one run of BETS per 0.1 change in bet size

when using RE–BETS. As the default range here starts at width 2.3 ([0.71, 0.94]), and increases

as the centre of the range moves towards larger bet sizes, the number of runs of BETS required to

reach large bets decreases.

As an example, consider the minimum number of runs it would take for each of RE–BETS

and RED–BETS to reach a bet size of s = 2.5. Using RE–BETS with a starting range of [0.8, 1]

it would take at least 15 runs — one run per 0.1 change in bet size. Using RED–BETS with a

starting range of [0.71, 0.94], it would take a minimum of 10 runs.

6.4.1 Application of RED–BETS to 5 bucket Texas hold’em

I ran RED–BETS with a default range of [0.71, 0.94], using an FCPA opponent action abstrac-

tion. I set T = 20, 000, 000 for each run of BETS. Figures 6.3 and 6.4 show the resulting game

value curves for players one and two, respectively, compared to the curves for the same game using

RE–BETS with T = 20, 000, 000 iterations for each run of BETS and a default range of [0.8, 1].

As you can see, both range types hit approximately the same game value, while the fixed-ratio ranges

converge slightly faster in both cases. I use fixed-ratio ranges in my experiments for the remainder

of the dissertation.

67

Figure 6.3: Bounds on game values of abstractions using fixed-ratio vs fixed-width ranges for playerone against an FCPA player two, in a 5 bucket perfect recall card abstraction of Texas hold’em.The Y-axis is value to player two, so player one wishes to decrease this number.

Figure 6.4: Bounds on game values of abstractions using fixed-ratio vs fixed-width ranges for playertwo against an FCPA player one, in a 5 bucket perfect recall card abstraction of Texas hold’em.The Y-axis is value to player two, so player two wishes to increase this number.

68

6.5 Multiple ranges

All of the results I have presented so far have used only one bet-sizing player at each decision node.

A natural question to ask is whether or not using multiple bet-sizing players at a single decision

node would result in a significant gain in game value. Having multiple bet-sizing players at the

same decision node allows for strategic behaviour that is not possible when there is just one bet-

sizing player. For example, it may be advantageous to mix between large and small bet sizes, and

do so with different percentages based on different private cards. For example, if I have a very good

hand, when I bet I may wish to pick a large bet size 80% of the time and a smaller bet size 20% of the

time. If I have a mid-range hand, I may adjust these percentages to 50% each. By mixing between

bets in different ways with different private cards I am, on average, revealing some information

about what cards I hold. The value gained by getting more money in the pot when my hand is better,

however, may outweigh that disadvantage. This type of mixing between different sized bets based

on private information is a common recommendation in the poker literature [10, 27, 51, 62].

Before considering how to expand RED–BETS to allow for multiple bet-sizing players at a

single decision node, I discuss how I go about choosing multiple default ranges. I denote the ith

default range [Ldi , H

di ]. For the smallest default range I use Ld

1 = 0.5. As this is the smallest bet size

that is always legal in no-limit hold’em, any time I add a bet-sizing player using [Ld1, H

d1 ], L

d1 = 0.5

it is guaranteed to be legal, independent of the opponent abstraction that is used. Using P = 1.19

and substituting in to Equation 6.3 this leads to an Hd1 value of

1 + 2Hd1

1 + 2(0.5)= 1.19

Hd1 = 0.69.

(6.13)

Thus the smallest default range is [0.5, 0.69]. For the second default range I choose Hd2 such the

pot growth ratio of an Hd2 sized bet would be slightly smaller than the fourth root of 200. This

leaves an all-in bet in the game tree after any action sequence in which four bet-sizing players using

[Ld2, H

d2 ] have acted. For a discussion of why it is advantageous to choose G(Hd) to be slightly

less than a root of the stack size in big blinds (200) see Section 6.3.1. As 2001/4 ≈ 3.7606, I set

Hd2 = 3.76−1

2 = 1.38. Again using P = 1.19 and Equation 6.3 this gives me Ld2 = 1.08. I do some

experiments where I allow two bet-sizing players throughout the tree, and in these experiments I use

default ranges of [0.5, 0.69] and [1.08, 1.38].

In some additional experiments I add a third default range along with these two. Hd3 is picked,

again, to have a pot growth ratio slightly smaller than a root of 200, this time the third root. The

third root of 200 is 2001/3 ≈ 5.848, so I set Hd3 = 5.84−1

2 = 2.42. Once more setting P = 1.19 and

using Equation 6.3 I get Ld3 = 1.95. A summary of the default ranges used for these multiple range

experiments, along with the significance of each choice of default range, is given in Table 6.1.

69

Maximum bet-sizing players per node Default ranges Significance1 [0.71, 0.94] After 5 0.94 bets there is a small all-in2 [0.5, 0.69] 0.5 is the smallest bet that is always legal

[1.08, 1.38] After 4 1.38 bets there is a small all-in3 [0.5, 0.69] 0.5 is the smallest bet that is always legal

[1.08, 1.38] After 4 1.38 bets there is a small all-in[1.95, 2.42] After 3 2.42 bets there is a small all-in

Table 6.1: Default ranges used in multi-range experiments

6.5.1 REDM–BETS

I expand RED–BETS to multiplayer betting games with multiple bet-sizing players at each deci-

sion node. I call this new algorithm Range Enhanced and Depth-fixed Multiple Bet size Extraction

through Transformation Solving (REDM–BETS). Initially, I form the game tree by adding all de-

fault ranges i whenever Hdi is legal. After each run of BETS I follow the same approach as RED–

BETS, but cycling over all ranges. I continue to use fixed-ratio ranges with the ratio of P = 1.19

calculated using n = 5 for all the reasons described in Section 6.3.2. Note that with numRanges

default ranges there can be many places in the tree with less than numRanges bet-sizing players,

as there may not be enough chips left for all Hdi bets at these points. The full algorithm is given by

Algorithm 6.

Algorithm 6 The REDM–BETS algorithmRequire: Multiplayer betting game G′

Require: R runsRequire: T iterations of BETS per runRequire: numRanges, the number of default ranges to useRequire: Ld[], an array of size numRangesRequire: Hd[], an array of size numRangesRequire: α

Set P d = Hd[1]Ld[1]

Set G1 = G′



Obtain si from Ar

wi =(1+2si)(P

d−1)Pd+1

Hi = si + wi/2 and Li = si − wi/2end for


// The line below is the significant change from RED–BETS (Algorithm 5, Section 6.4)AdjustTreeMultiBets(Gr+1, N, numRanges, Ld[], Hd[], α)

end for


This algorithm differs only slightly from RED–BETS. It now takes as input numRanges,

70

the number of default ranges to use, and arrays of Ld[] and Hd[] of size numRanges. The game

tree now has numbered branches for each bet-sizing player — if a bet-sizing player is added using

default range j, then from that point on it is “using range j”. This way I can tell if I need to try

adding a bet-sizing player using default range j, or if that player has already been added in the past

(but may be using a very different range now).

The call to AdjustTree has been replaced by a new algorithm, AdjustTreeMultiBets (Algorithm

7). AdjustTreeMultiBets is, conceptually, a simple extension of AdjustTree (Algorithm 3, Section

5.7) to multiple ranges. Where AdjustTree assumed only one range was used, AdjustTreeMulti-

Bets loops over all ranges j, either attempting to add a bet-sizing player using default range j or

calling AdjustRange on the bet-sizing player already using range j. If only one range is passed to

AdjustTreeMultiBets, it becomes equivalent to AdjustTree.

AdjustTreeMultiBets uses 1-indexed arrays; that is, Ld[] and Hd[] go from 1 to numRanges. I

use 1-indexed arrays in algorithms for the remainder of the dissertation.

6.5.2 Application of REDM–BETS to 5 bucket Texas hold’em

For one default range I used T = 20, 000, 000 as I did in Sections 6.4 and 6.1. Again I used an

opponent action abstraction of FCPA. Following the methodology outlined in Section 6.1, for two

default ranges I used T = 45, 000, 000 and for three default ranges I used T = 70, 000, 000. In all

three cases I completed 10 runs of BETS. The game value curves for player one and player two

are shown in Figures 6.5 and 6.6, respectively.

Figure 6.5: Bounds on game values of abstractions using one, two and three default ranges for playerone against an FCPA player two, in a 5 bucket perfect recall card abstraction of Texas hold’em.The Y-axis is value to player two, so player one wishes to decrease this number.

71

Algorithm 7 AdjustTreeMultiBetsRequire: G, N , αRequire: numRanges, the number of default ranges to useRequire: Ld[], an array of size numRangesRequire: Hd[], an array of size numRanges

if N is an opponent node then



if b is legal then


else



end if

end for



// The addition of this loop is the significant change from AdjustTree// (Algorithm 3, Section 5.7)for j from 1 to numRanges do

if there is no bet-sizing player using range j at N then

if Hd[j] < sall−in − α then

Add a bet-sizing player to G at N using [Ld[j], Hd[j]], and an opponent node belowthis with fold, call and all-in

end if

else

Denote this bet-sizing player iLet Li, Hi be the L and H values used by bet-sizing player i at N(Li, Hi, useBet) = AdjustRange(Li, Hi, α, sall−in)if useBet == false then


end if

end for

end if






Take action Hi

end if


G = AdjustTreeMultiBets(G,N ′, numRanges, Ld[], Hd[], α)end for

end if

return G

72

Figure 6.6: Bounds on game values of abstractions using one, two and three default ranges for playertwo against an FCPA player one, in a 5 bucket perfect recall card abstraction of Texas hold’em.The Y-axis is value to player two, so player two wishes to increase this number.

Looking at these results it is clear that using two ranges instead of one leads to a small improve-

ment in game value. The peak value of the two range runs improved over the peak value of the

one range runs by ≈ 23% for player one and ≈ 6% for player two. Additionally, the game value

achieved on the first iteration of REDM–BETS is much better when using two ranges than it is

when using one. This is not surprising, considering that even if both ranges are not of value, it is

likely that one of the two bet-sizing players using [0.5, 0.69] and [1.08, 1.38] will be closer to the

final bet size than a single bet-sizing player using [0.71, 0.94].

Interestingly, however, the two range and three range curves are virtually identical, both for

player one and for player two. As using three ranges creates a larger tree, there is no reason to

use a three range abstraction in this case, as we achieve approximately the same game value with

a smaller two range abstraction. Furthermore, while these results tell us that using more than one

range can lead to a gain in game value, it’s not clear that it is valuable to have multiple ranges all

throughout the tree. In fact, as I show in the next section, most of the gain in value comes from just

a few key extra bets, and identifying where I need these extra bets allows me to create even smaller

abstractions while maintaining the game value.

6.6 Pruning the tree

Now I introduce the idea of using the importance metric RTi,|·|0 introduced in Section 5.4 to prune

the game tree after a single run of BETS, to determine which decision nodes have more than one

73

bet-sizing player that is valuable enough to keep. At a high level, the idea is to keep only the most

important bet-sizing player at each decision node, and then in addition to this add a few extra bet-

sizing players that, among the remaining bet-sizing players, have the highest RTi,|·|0 values. I use the

parameter e to define the number of extra bet-sizing players to keep. For example, if e = 1 I keep

only one extra bet-sizing player. If I am pruning based on two ranges, then there will be exactly e

decision nodes in the tree with two bet-sizing players to choose from. So for e = 1, there will be

one decision node at which there are two bet-sizing players.

I implement this in FindImportantBets, Algorithm 8. I create a hash keepBetP layers{} map-

ping from bet-sizing player histories to a boolean indicating if they will remain in the tree. I

initialize this hash as false for all. I now go through the entire tree, and at each decision node

with actions leading to bet-sizing players, if bet-sizing player i has the largest RTi,|·|0 value, I set

keepBetP layers{h(i)} = true. Next, I sort the RTi,|·|0 values of the remaining bet-sizing players,

and I set keepBetP layers{} to true for the top e bet-sizing players. Note that the chance of a tie

in RTi,|·|0 values is extremely small, as these values are summed over millions of iterations. In the

event of a tie I simply keep the first bet-sizing player I encounter.

After running FindImportantBets, I must create a new tree removing all bet-sizing players that

are not marked to be kept. This is detailed in PruneTree, Algorithm 9.

Algorithm 8 FindImportantBetsRequire: GRequire: eRequire: Rvalues{}

// The public action sequence is the sequence of actions known to all players// Thus, this does not include the actions of bet-sizing playersDenote the public action sequence that leads to bet-sizing player i as a(i)keepBetP layers{}, a hash map from public action sequences a(i) to boolean; initially falsefor each node N in G1 do

if N is a main player node then

maxR = 0for each bet-sizing player i at N do

if Rvalues{a(i)} > maxR then

maxR = Rvalues{a(i)}keepBetP layer = a(i)

end if

end for

if maxR > 0 then

keepBetP layers{a(i)} = trueRemove bet-sizing player keepBetP layer from Rvalues{}

end if

end if

end for

Sort Rvalues{} from largest to smallest, store sorted a(i) sequences in sortedExtraBets[]for extraBet from 1 to e do

keepBetP layers{sortedExtraBets[extraBet]} = trueend for

return keepBetP layers{}

74

Algorithm 9 PruneTreeRequire: GRequire: N

// The public action sequence is the sequence of actions known to all players// Thus, this does not include the actions of bet-sizing playersDenote the public action sequence that leads to bet-sizing player i as a(i)

Require: keepBetP layers{}, a hash map from public action sequences a(i) to booleanfor each action a at N do

if a leads to bet-sizing player i then

if keepBetP layers{a(i)} then

Take action a, followed directly by action Helse


else

Take action aend if


G = PruneTree(G,N ′, keepBetP layers{})end for

Once I’ve finished pruning the tree, I now continue as I did in RED–BETS and REDM–

BETS. I center ranges on the effective bet sizes given by BETS, recalculating the range size to

maintain the fixed-ratio, and adding and removing bet-sizing players from the end of the tree. Note,

however, that I use only the bottom default range to add bet-sizing players from this point on. Recall

that, with fixed-ratio ranges, the worst-case all-in bet size depends on the default bet size used to

add a new bet-sizing player (see Sections 6.3.1 and 6.3.2). Using the bottom default range to add

bet-sizing players is a conservative move, designed to help reduce the size of the worst-case all-in.

As I use a single default range to add bet-sizing players from this point forward, this means that

there will never be more than e extra bets in the tree.

To support multiple bet-sizing players per decision node, but only add bet-sizing players based

on the bottom default range, I must once again modify the AdjustTree algorithm (Algorithm 3,

Section 5.7). The new version, AdjustPrunedTree, is shown in Algorithm 10. AdjustPrunedTree

is a hybrid of AdjustTree and AdjustTreeMultiBets (Algorithm 7, Section 6.5). Like AdjustTree,

it only takes a single default range as input. The loop over bet-sizing players that was added to

AdjustTreeMultiBets still exists, but now is used in a more restricted way. AdjustTreeMultiBets

loops over all ranges, attempting to add bet-sizing players or calling AdjustRange on existing bet-

sizing players. AdjustPrunedTree only attempts to add a bet-sizing player once, as AdjustTree does

(using the bottom default range). If bet-sizing players exist already, however, it loops over all of

them, calling AdjustRange each time.

I combine all of these steps in the final algorithm of this dissertation: Range Enhanced with

Frugal Inclusion of Notable Extra bets and Depth-fixed Bet-size Extraction through Transformation

Solving, or REFINED–BETS. It is detailed in Algorithm 11.

75

Algorithm 10 AdjustPrunedTreeRequire: GRequire: NRequire: Ld

Require: Hd

if N is an opponent node then



if b is legal then


else



end if

end for



if there is no bet-sizing player at N then

if Hd < sall−in − α then

Add a bet-sizing player to G at N using [Ld, Hd], and an opponent node below this withfold, call and all-in

end if

else

// The addition of this loop is the significant change from AdjustTree// (Algorithm 3, Section 5.7)for each bet-sizing player i at N do

Let Li, Hi be the L and H values used by bet-sizing player i at N(Li, Hi, useBet) = AdjustRange(Li, Hi, α, sall−in)if useBet == false then


end for

end if

end if






Take action Hi

end if


G = AdjustPrunedTree(G,N ′, Ld, Hd)end for

end if

return G

76

Algorithm 11 The REFINED–BETS algorithmRequire: Multiplayer betting game G′

Require: R runsRequire: T iterations of BETS per runRequire: numRanges, the number of default ranges to useRequire: Ld[], an array of size numRangesRequire: Hd[], an array of size numRangesRequire: αRequire: P d

Require: eSet G1 = G′

Denote the public action sequence that leads to bet-sizing player i as a(i)keepBetP layers{}, a hash map from public action sequences a(i), to boolean; initially falseRvalues{}, a hash map from public action sequences a(i), to RT

i,|·|0; initially 0for r from 1 to R− 1 do

(Ar, Rvalues{}) = BETS(Gr, T )if r == 1 then

keepBetP layers{} = FindImportantBets(G1, e, Rvalues{})Set N to the root node of G1

G1 = PruneTree(G1, N, keepBetP layers{})end if

Create new multiplayer betting game Gr+1 from Gr as follows:for each bet-sizing player i in Gr do

Obtain si from Ar

wi =(1+2si)(P

d−1)Pd+1

Hi = si + wi/2 and Li = si − wi/2end for


AdjustPrunedTree(Gr+1, N, Ld[1], Hd[1])end for


77

6.7 Application of REFINED–BETS to 5 bucket Texas hold’em

Figures 6.7 and 6.8 show the results of applying REFINED–BETS using e = 1 and e = 25 on

players one and two, respectively. Two default ranges are used in the creation of the initial trees that

are later pruned. After pruning the tree, I used T = 45, 000, 000 iterations of BETS for e = 1 and

e = 25 for both players, as I did for the full two ranges. For comparison, the game value curves for

REDM–BETS using one and two default ranges from Figures 6.5 and 6.6 are reproduced on these

graphs. The game value curves for three default ranges are omitted, as they are almost identical to

the two range curves (see Section 6.5.2).

Figure 6.7: Bounds on game values of abstractions using one and two default ranges for player oneagainst an FCPA player two, in a 5 bucket perfect recall card abstraction of Texas hold’em. Thepruned runs start with two default ranges, keeping e = 1 and e = 25 extra bets. The Y-axis is valueto player two, so player one wishes to decrease this number.

78

Figure 6.8: Bounds on game values of abstractions using one and two default ranges for player twoagainst an FCPA player one, in a 5 bucket perfect recall card abstraction of Texas hold’em. Thepruned runs start with two default ranges, keeping e = 1 and e = 25 extra bets. The Y-axis is valueto player two, so player two wishes to increase this number.

6.7.1 Analysis of performance

Looking at Figures 6.7 and 6.8, we can immediately see an interesting phenomenon on run r = 1.

At this point, for player one (Figure 6.7) the value of the two range abstraction and both pruned

abstractions are very similar, and both have significantly greater value than the one range abstrac-

tion. For player two (Figure 6.8) the e = 25 abstraction again equals the value of the two range

abstraction, but e = 1 does not perform quite as well as it does for player one. The e = 1 abstraction

still has a better game value than the one range abstraction, but lags behind the value of the rest of

the abstractions. It is not particularly surprising that the two range abstractions immediately perform

better than the one range abstractions, as there are two bets to choose from at most decision nodes.1

The fact that the e = 1 abstraction has significantly better game value on r = 1 than the one range

abstraction in both cases, however, shows a big advantage of the pruning process. There are more

ranges to choose from initially, and so the remaining bet-sizing players end up much closer to their

target values after just one run when compared to the bet-sizing players in the one range abstractions.

Once the curves level off for player one (Figure 6.7), the value of the e = 1 abstractions end up

approximately half way between the value of the two range abstractions and the one range abstrac-

tions. This is a notable result; half of the value gained by using two ranges everywhere can be gained

by using two ranges exactly once. Furthermore, using e = 25 makes up the remainder of this gap,

obtaining approximately the same game value as having two ranges everywhere. For player two, the

1At some decision nodes where the pot size is very large an Hd1 = 0.69 bet will be legal and an Hd

2 = 1.38 will not

79

e = 1 abstractions level off to the same value as the one range abstractions, and again the e = 25

abstractions equal the value of the two range abstractions.

To understand why the e = 1 abstractions outperform the one range abstractions for player one,

while providing no advantage at the peak point over one range for player two, I consider the bet

sizes of the extra bet in both cases. For player one, the extra bet is added directly after the opponent

chooses to call the big blind. At r = 5, the bet-sizing player using the lower range chooses a value

of 0.94 while the upper range player chooses 1.71. The strategy created with CFR using this action

abstraction uses both of these bet sizes, betting 0.94 26.5% of the time, and 1.71 16.5% of the time.

The same action sequence in the one range case is using a bet size of 1.28 at r = 5. The strategy

created with CFR using this action abstraction makes this bet 40% of the time. For player two, the

extra bet is added as the very first action of the game. At r = 5, the lower range player chooses

a value of 0.59 while the upper range player uses 0.66. These bets are made 53% and 30% of the

time, respectively. In the one range case a bet of size 0.63 is chosen here at r = 5. This bet is made

84% of the time. The bet sizes chosen by all the bet-sizing players I have mentioned here remain

relatively constant for the next 10 runs of BETS in each case.

The contrast between these two examples is notable. When a second bet-sizing player is added

for player one, the result is that this new player and the other bet-sizing player at the same decision

node choose very different bet sizes, both a significant distance from the bet size chosen if no extra

player is added, which falls in between them. In the case of player two, however, the extra bet-

sizing player chooses a bet size very similar to that chosen by the other bet-sizing player at the same

decision node. The bet size chosen at this point when no extra player is added is, again, in between

these two values. The result for player one is a notable increase in game value, while for player two

if there is an increase in game value it is insignificant.

This is not an isolated incident. If the e = 25 runs are extended to r = 15, and bets within 0.1

of each other are considered to be merged, there are only e = 16 remaining extra bet-sizing players

for player one and e = 18 remaining extra bet-sizing players for player two. There are a couple of

things to note about this result. First, the fact that many bet-sizing players converge on very similar

values when initialized with different ranges suggest that local maxima are not causing a significant

problem for those players. Second, this illustrates a limitation of pruning the bet-sizing players on

run r = 1. Until completing many runs of BETS, there is no way to know which bet-sizing players

will end up very close to one another and which ones will be separated by a significant amount. On

a practical level, the fact that abstraction sizes tend to increase over time, first mentioned in Section

6.1, dictates that I should only run until I hit the peak game value and no longer, as the abstractions

tend to get bigger after this point without adding value. The increases in abstraction size dwarf the

memory savings of removing duplicate bet sizes. It typically takes 10 − 15 runs for bets to merge,

and the game value curves usually level off well before that. For this reason I simply overestimate

the number of extra bets e that will actually be needed, and I do not try to do any duplicate detection

80

or removal.

It should be noted that another possible option would be to complete many runs of BETS

to determine which bet sizes eventually merge. I could then take the abstraction created at the

point where the game value curve levels off, and for each group of bet-sizing players that merged

together on a later run, choose just one bet-sizing player to keep and remove the rest. This would

reduce the size of the tree while, presumably, maintaining the peak game value.2 This would take

significant computational time for small memory savings, and thus would only be worthwhile if

memory resources are extremely limited.

6.7.2 Size comparison

Now I compare the sizes of the one, two and three range abstractions along with the pruned two

range abstractions. For the two range abstraction on run r = 5 there are e = 585 and e = 706 extra

bets for players one and two, respectively. Recall that the pruned trees have e extra bets after the

pruning step on run 1, and continue to have e extra bets from then on, as from this point forward

we only add bets using the bottom default range (see Section 6.6). To put these difference in terms

of tree size, I show the (information set, action) pairs of the one range, e = 1, e = 25, two range

and three range abstractions for both players one and two on run r = 5 in Tables 6.2 and 6.3,

respectively.

Abstraction # (I, a) pairsOne range (e = 0) 3,217,970

Start with two ranges, e = 1 4,678,670Start with two ranges, e = 25 7,104,940

Two ranges, no pruning (e = 585) 14,585,790Three ranges, no pruning (e = 1470) 28,734,090

Table 6.2: Number of (I, a) pairs in various action abstractions for player one, generated usingREDM–BETS and REFINED–BETS. In each case BETS was run five times. In all casesthe opponent uses FCPA, and a 5 bucket perfect recall card abstraction of Texas hold’em was used.

Abstraction # (I, a) pairsOne range (e = 0) 3,614,400

Start with two ranges, e = 1 5,223,850Start with two ranges, e = 25 7,591,300

Two ranges, no pruning (e = 706) 17,906,500Three ranges, no pruning (e = 1585) 30,797,090

Table 6.3: Number of (I, a) pairs in various action abstractions for player two, generated usingREDM–BETS and REFINED–BETS. In each case BETS was run five times. In all casesthe opponent uses FCPA, and a 5 bucket perfect recall card abstraction of Texas hold’em was used.

Examining these tables I note two things. First of all, the e = 1 abstractions are ≈ 45% larger

than the one range abstractions in both cases. This is due to the fact that the extra bet-sizing player

2This is not guaranteed — it’s possible that additional runs may then be required to reach the peak point once more

81

is added very early in the tree in both cases (see Section 6.7.1), creating a large sub-tree beneath it.

So choosing e = 1 increases the size of the abstractions greatly, while only providing an increase

in game value in the case of player two. The e = 25 abstractions are another ≈ 52% and ≈ 45%

larger than the e = 1 abstractions for players one and two, respectively. This is still less than half

the size of the two range abstractions for both players, and about one quarter the size of three range

abstractions. As we saw in Section 6.7.1, however, the e = 25 abstractions obtain approximately

the same game value as these larger abstractions that do not use pruning.

This clearly quantifies the advantage I gain by pruning — by using e = 25, I reduce the tree by

a factor of 2 from two default ranges, or 4 from three default ranges, while maintaining the game

value. If pruning is to be used, however, it is important to choose a value of e that is large enough to

capture the most important extra bets, and not just extra bets that will merge with ones that already

exist while making the tree larger. While e = 25 is enough to equal the value of two (and three)

ranges in this game, the number of extra bets needed varies as the card abstraction and opponent

abstraction change. I discuss this further in the next chapter. Additionally, for 5 bucket perfect recall

Texas hold’em with an FCPA opponent I pruned based on two default ranges, as two and three

default ranges obtain approximately equivalent game value. The pruning process can be applied

with any number of default ranges; in the next chapter I will show examples where I prune based on

three default ranges.

6.8 Kuhn revisited

In this section I show that REFINED–BETS can help to avoid local maxima in half-street Kuhn

poker (see Chapter 4), without prior knowledge of the global maxima. As I showed in Chapter 4,

running BETS with one range converges to the optimal value of 0.414 in half-street Kuhn poker as

long as 0 ≤ L ≤ 0.414 and 0.414 ≤ H < 1. If L > 0.414 and H < 1, then BETS will converge

to L. If H < 0.414, BETS will converge to H (See Section 4.4).

A problem arises when H > 1. The BETS algorithm may converge to a bet size s ≥ 1, because

all bet sizes greater than one are part of a local maxima. If s ≥ 1 then the Nash equilibrium strategy

is for the betting player to stop bluffing, and the responding player stops calling with their Queen. As

a result, both L and H have the same value, and the bet sizing player cannot gain value by changing

their strategy. I introduced this problem in Section 4.5 and showed the game value of half-street

Kuhn for bet sizes from 0 to 2. I reproduce that figure again here as Figure 6.9 for convenience.

When REFINED–BETS is applied to this game, as long as one of the initial ranges has

H < 1, we can avoid this local maxima problem and, given enough runs of BETS, arrive at

the optimal bet size of 0.414. I applied REFINED–BETS with the three default ranges I use

throughout this chapter, [0.5, 0.69], [1.08, 1.38] and [1.95, 2.42]. Running for 500, 000 iterations, as

I did in Section 4.4, I kept only the bet with the highest importance value after the first run. This

led to keeping the bottom range, with a bet size of 0.5. This range has the highest importance value

82

Figure 6.9: Half-street Kuhn poker game value per bet size, extended to 2 pot.

because both 0.5 and 0.69 have higher utility than bets in the other two ranges (see Figure 6.9). On

the next run this bet sizing player converged to within 1% of 0.414, the optimal value.

Repeating this experiment using ranges [0.1, 0.2], [0.5, 0.69] and [0.71, 0.94], and again pruning

after run 1, the second range was the only remaining bet sizing player. In this case the bottom range

chooses 0.2, the top range chooses 0.71 and the middle range chooses 0.5. As we can see from

Figure 6.9, 0.5 has the highest game value of these, and thus the highest importance value. Once

more the algorithm converged to 0.414 on the next run. This showcases the power of REFINED–

BETS - if we have at least one range in the region of the global maxima, we can avoid local

maxima. Additionally, we arrive at the correct answer very quickly if we use many ranges on the

first run, as pruning using the importance values correctly picks the bet-sizing player closest to the

global maxima.

83

Chapter 7

Changing the card and opponent

abstractions

In this chapter I apply the REDM–BETS and REFINED–BETS algorithms to imperfect re-

call card abstractions and two different opponent action abstractions. I begin by describing how

I evaluate action abstractions paired with imperfect recall card abstractions (Section 7.1). Next, I

apply REDM–BETS and REFINED–BETS from Chapter 6 to a small imperfect recall card

abstraction of no-limit Texas hold’em in Sections 7.2.1 and 7.2.2. For these experiments I continue

to use an FCPA opponent action abstraction. I discuss how using a larger card abstraction leads

to very little change in the action abstractions created in Section 7.3. This leads me to the idea

of creating action abstractions using small card abstractions, then pairing these action abstractions

with larger card abstractions, which I explore in Section 7.4. In the next section (7.5) I consider the

effect of changing the opponent action abstraction to the ACPC2011 abstraction (first introduced in

Section 5.8.1). Finally, in Section 7.6, I describe how an agent was created using one of my action

abstractions paired with a very large card abstraction, and show how this agent compares to other

top no-limit Texas hold’em agents.

7.1 Evaluation in imperfect recall games

In Chapters 5 and 6 I used a 5 bucket perfect recall card abstraction of Texas hold’em in my experi-

ments. The advantage of using a perfect recall card abstraction is that I can calculate best response

values within that card abstraction (combined with the action abstraction I’m using), and thus obtain

bounds on the possible values of a given (action abstraction, card abstraction) pair. When using

imperfect recall card abstractions things are more complicated, as computing a best response is in-

tractable [67]. While imperfect recall makes evaluation more difficult, it is generally considered

to create better performing card abstractions in comparison with perfect recall in no-limit Texas

hold’em ([32, 42, 67]). For this reason I wish to use imperfect recall card abstractions, and so in this

section I describe my methodology for evaluating them.

84

I start the evaluation process, as before, by creating an ε-Nash equilibrium strategy profile for the

abstraction I wish to evaluate. The next step when working with perfect recall card abstractions is to

determine absolute bounds on the value of the action abstraction within the given card abstraction.

Instead, a large number of hands of self-play are dealt, in the full no-limit game, with this strategy

profile. For a given number of hands, a 95% confidence interval on the result of the self-play match

can be calculated, assuming a normal distribution for the results of each match. Enough hands are

played such that this confidence interval is approximately the same width as the absolute bounds

were when using best responses on the perfect recall card abstractions. I found that achieving this

level of confidence required playing 1, 200, 000, 000 hands, leading to 95% confidence intervals

of approximately ±1.1 milli-big-blinds. Across a range of different agents the confidence intervals

remained approximately the same width. I use these empirical bounds when creating the game value

curves for the action abstractions generated using imperfect recall card abstractions.

To see how these empirical bounds compare to the absolute bounds I calculate for perfect recall

games, I performed this procedure on the abstractions created in Section 6.4. These abstractions

were generated using a 5 bucket perfect recall game, with an FCPA opponent action abstraction.

They were generated by applying REDM–BETS to one default range of [0.71, 0.94]. I ran self-

play matches of 1, 200, 000, 000 hands for each of the first 10 runs of REDM–BETS for both

player one and player two. The resulting 95% confidence intervals on the bounds of the game value

are shown in Figures 7.1 and 7.2, along with the absolute bounds.

Figure 7.1: Absolute bounds compared with empirical 95% confidence intervals on the game valueof abstractions created using REDM–BETS for player one against an FCPA player two, in a 5bucket perfect recall card abstraction of Texas hold’em. The Y-axis is value to player two, so playerone wishes to decrease this number.

85

Figure 7.2: Absolute bounds compared with empirical 95% confidence intervals on the game valueof abstractions created using REDM–BETS for player two against an FCPA player one, in a 5bucket perfect recall card abstraction of Texas hold’em. The Y-axis is value to player two, so playertwo wishes to increase this number.

Looking at both figures it is clear that playing 1, 200, 000, 000 hands leaves a small amount of

variance in the game value, small enough that the general shape of the curve remains the same.

This allows my analysis to proceed as it did when using the absolute bounds. Throughout the

remainder of this chapter I work with imperfect recall card abstractions, and thus I cannot generate

the absolute bounds. Instead I will use the empirical bounds in their place, generated each time by

running 1, 200, 000, 000 hands of self-play in the full no-limit game.

7.2 Results in IR− 100

I now repeat the experiments of Sections 6.5.2 and 6.7 with an imperfect recall card abstraction. All

imperfect recall card abstractions I use in this chapter were generated using the k-means clustering

method of Johanson et al. [42] mentioned in Section 3.7.1. I always use 169 buckets pre-flop, as

that is the number of unique pre-flop hands in Texas hold’em. I use the convention IR − Ns to

refer to an imperfect recall card abstraction with N buckets on the flop, turn and river. If the number

of buckets on each of these rounds is different, I specify IR − Ns on the flop, IR − Ms on the

turn, and so forth, where N and M are numbers of buckets. In this Section I use an IR − 100 card

abstraction, and I continue to use an FCPA opponent action abstraction.

86

7.2.1 REDM–BETS

For my first set of results I apply REDM–BETS to this game using one, two and three default

ranges. Following the methodology outlined in Section 6.1, for one default range I used T =

60, 000, 000, for two default ranges I used T = 70, 000, 000 and for three default ranges I used

T = 80, 000, 000. I use the same default ranges from the previous chapter, as detailed in Table 6.5.

Figures 7.3 and 7.4 show the results of 10 runs of BETS for players one and two, respectively.

Figure 7.3: Bounds on game values of abstractions using one, two and three default ranges for playerone against an FCPA player two, in an IR− 100 card abstraction of Texas hold’em. The Y-axis isvalue to player two, so player one wishes to decrease this number.

The most notable feature of these two graphs is that the game values are significantly larger in

magnitude than they were in the 5 bucket perfect recall game. Figures 6.5 and 6.6 show that in

the case of a 5 bucket perfect recall game, the game value (for player two) ranged from about 20

milli-big blinds to 125 milli-big blinds, depending on if this value was being minimized (player one)

or maximized (player two). In IR − 100 the values range from 105 to 200. The difference in scale

is due to the fact that in the perfect recall game, there is no way for players to distinguish the top

20% of hands that are in the top bucket. This leads to a significant number of all-ins in the pre-flop,

with hands as weak as a King and an 8 of different suits. The effect of this is to reduce the number

of decisions made throughout the game, and thus player two has less opportunities to capitalize on

the advantage they gain by acting after player one. This problem of not being able to distinguish

relatively good hands from the best hands is one of the reasons imperfect recall abstractions are

generally preferred over perfect recall abstractions in no-limit Texas hold’em [32, 67].

Looking at the shapes of Figures 7.3 and 7.4 two other differences from the perfect recall 5

87

Figure 7.4: Bounds on game values of abstractions using one, two and three default ranges for playertwo against an FCPA player one, in an IR− 100 card abstraction of Texas hold’em. The Y-axis isvalue to player two, so player two wishes to increase this number.

bucket game stand out. First, one range does not do quite as well when compared with ACPC2011.

In Figures 6.5 and 6.6 we saw that one range achieves a game value that roughly ties ACPC2011

for player one and surpasses it for player two by ≈ 18%. In IR− 100 one range continues to easily

outperform FCPA, however it does not do as well as ACPC2011 for either player. The second

difference between the two games is that in IR− 100 adding a third range leads to a small increase

in game value over two ranges, while in the 5 bucket perfect recall game adding a third range made

no difference.

Overall it seems that the number or ranges matters more to game value in this imperfect recall

game than it did in the 5 bucket perfect recall game. Next I consider how this impacts pruning of the

tree.

7.2.2 REFINED–BETS

As three ranges performs better than two, I applied REFINED–BETS to this IR − 100 game

by pruning the three range abstraction. This means that, given e > 1 extra bet-sizing players, there

may be places in the tree where two of those extra bet-sizing players are added at the same decision

node, making a total of three bet-sizing players at that node. Once again I show the results of using

e = 1 and e = 25, as I did in Section 6.7 for the 5 bucket perfect recall game (Figures 6.7 and 6.8).

I used T = 65, 000, 000 for e = 1 and T = 80, 000, 000 for e = 25. Recall from Section 7.2.1 that

T = 80, 000, 000 was used for the full three ranges as well.

The results of applying REFINED–BETS with three default ranges to IR − 100 are given

88

in Figures 7.5 and 7.6.

Figure 7.5: Bounds on game values of abstractions using one, two and three default ranges for playerone against an FCPA player two, in an IR − 100 card abstraction of Texas hold’em. The prunedruns start with three default ranges, keeping e = 1 and e = 25 extra bets. The Y-axis is value toplayer two, so player one wishes to decrease this number.

Figure 7.6: Bounds on game values of abstractions using one, two and three default ranges for playertwo against an FCPA player one, in an IR − 100 card abstraction of Texas hold’em. The prunedruns start with three default ranges, keeping e = 1 and e = 25 extra bets. The Y-axis is value toplayer two, so player two wishes to increase this number.

89

Looking at Figure 7.5, we see that e = 1 improves upon the performance of one default range

for player one, as was the case in the 5 bucket perfect recall game. While in the 5 bucket perfect

recall game the extra bet was added in the pre-flop after player two checks, here it’s added in the

pre-flop after player two bets. At that point bets of size 0.93 and 2.47 are used. In comparison,

player one uses a bet size of 1.13 at this point in the one range case. The gains of e = 1 over one

range are comparable to the gains of e = 25 over e = 1. The bounds of e = 25 and two ranges are

very similar on runs 5 and 6, and while two ranges stabilizes, e = 25 trends upwards a bit. Due to

the empirical nature of the bounds it’s hard to draw conclusions about small changes in the value. It

is of note, however, that many of these curves tend to uptick slightly in the last few runs, which is

possibly due to the phenomenon described in Section 6.1 — after each short run the games tend to

get larger, and thus using the same number T of iterations of BETS leads to slightly less accurate

bet sizes. Even if we ignore this upward trend in the last few runs we can see that e = 25 does not

perform as well as three full ranges for player one.

For player two e = 1 improves over the performance of one range, as shown in Figure 7.6. This

is an improvement over the 5 bucket perfect recall game when e = 1 did not add any value. Notably,

e = 1 is enough to equal the value of ACPC2011, averaging a slightly greater value after the curve

levels off, with empirical bounds overlapping. The bet is added at the same point in the game as in

the perfect recall case — as the first action made. In the imperfect recall game the e = 1 case keeps

the first and second default ranges, setting them to values of 0.78 and 0.93. This is different from

the perfect recall game where bet sizes of 0.59 and 0.66 are used. For the e = 25 pruning run the

value does not improve a great deal over that of e = 1, falling below the value of two ranges.

In light of the fact that 25 extra bets is outperformed in both cases by the full three ranges, I

repeated the experiment with e = 50 and e = 100, the results of which are shown in Figures 7.7

and 7.8. For player one we can see a minor improvement over e = 25, with both e = 50 and

e = 100 having very similar values. Their value curves beat those of two ranges on average by a

very small margin, within the empirical bounds. In player two’s case, again there is an improvement

over e = 25, and while it is a larger improvement than for player one, the value still falls short of

three full ranges. e = 100 beats the value of e = 50 by a very small margin on average, again within

the empirical bounds. Overall, the pruned runs are only improving in game value by a marginal

amount as we double the number of extra bets. This is very different from the perfect recall game

where two ranges pruned with just e = 25 obtained the same value as three ranges everywhere.

90

Figure 7.7: Bounds on game values of abstractions using one, two and three default ranges for playerone against an FCPA player two, in an IR − 100 card abstraction of Texas hold’em. The prunedruns start with three default ranges, keeping e = 50 and e = 100 extra bets. The Y-axis is value toplayer two, so player one wishes to decrease this number.

Figure 7.8: Bounds on game values of abstractions using one, two and three default ranges for playertwo against an FCPA player one, in an IR − 100 card abstraction of Texas hold’em. The prunedruns start with three default ranges, keeping e = 50 and e = 100 extra bets. The Y-axis is value toplayer two, so player two wishes to increase this number.

91


Here I repeat the size comparison I did in Section 6.7.2, but applied to this IR − 100 game. As

before, I use run r = 5 to compare the abstractions. The number of extra bets, along with the

number of (information set, action) pairs, are shown in Tables 7.1 and 7.2.

Abstraction Value to player two # (I, a) pairsOne range (e = 0) 130.2± 1.01 610,016

Start with three ranges, e = 1 119.9± 1.05 779,458Start with three ranges, e = 25 112.2± 0.99 1,681,840Start with three ranges, e = 50 111.3± 1.03 1,938,254

Start with three ranges, e = 100 110.7± 1.03 2,388,896Two ranges, no pruning (e = 547) 112.3± 1.11 2,701,068

Three ranges, no pruning (e = 1203) 107.7± 0.97 4,726,748

Table 7.1: Number of (I, a) pairs and game value in various action abstractions for player one,generated using REDM–BETS and REFINED–BETS. In each case BETS was run fivetimes. In all cases the opponent uses FCPA, and an IR − 100 card abstraction of Texas hold’emwas used.

Abstraction Value to player two # (I, a) pairsOne range (e = 0) 180.8± 0.98 636,016

Start with three ranges, e = 1 184.5± 1.00 1,001,686Start with three ranges, e = 25 190.7± 0.98 1,988,482Start with three ranges, e = 50 190.4± 1.06 2,264,124

Start with three ranges, e = 100 192.9± 0.96 2,616,552Two ranges, no pruning (e = 576) 193.1± 0.97 3,199,296

Three ranges, no pruning (e = 1516) 195.5± 1.02 6,086,776

Table 7.2: Number of (I, a) pairs and game value in various action abstractions for player two,generated using REDM–BETS and REFINED–BETS. In each case BETS was run fivetimes. In all cases the opponent uses FCPA, and an IR − 100 card abstraction of Texas hold’emwas used.

For player one, e = 1 is only ≈ 28% bigger than one range, as compared to ≈ 57% bigger for

player two, while it was ≈ 45% bigger for both players in the 5 bucket perfect recall game. In fact,

each of the abstractions for player two are larger than the player one equivalents, something that was

true for the 5 bucket perfect recall game in all cases except e = 1 (see Tables 6.2 and 6.3). This

trend continues when I switch to an ACPC2011 opponent action abstraction (see Section 7.5).

The size-to-value ratio of these abstractions is not as simple as it was for the 5 bucket perfect

recall game. Pruning is still advantageous — the value of e = 50 ties or beats two range for both

players, and is 72% of the size of two ranges for player one and 71% of the size for player two.

Which of these abstractions is considered best, however, depends on how much memory one is

willing to trade for diminishing returns in game value. If beating the ACPC2011 abstraction’s value

is considered to be a requirement, then I would want to use at least e = 50 for player one and at least

e = 25 for player two. If I wish to beat the value of two ranges by a significant amount, I can either

use three full ranges, or use a larger value of e to create a game between the sizes of the e = 100

92

abstractions the full three range abstractions (e = 1516) that has between their values as well. I will

consider this trade-off further in Section 7.5.

7.3 Scaling the card abstraction

As IR − 100 proved to be a very different card abstraction from the 5 bucket perfect recall card

abstraction used in Chapters 5 and 6, it is natural to wonder what happens when the number of

imperfect recall buckets is increased. Thus, I applied REDM–BETS to an IR− 570 abstraction,

generated using the same technique that was used to create the IR−100 abstraction, and maintaining

the FCPA opponent action abstraction. The reason 570 buckets are used is actually a relic from

limit Texas hold’em, where an IR − 570 abstraction is approximately the same size as a 5 bucket

perfect recall abstraction. In no-limit Texas hold’em IR− 570 with an FCPA action abstraction is

≈ 25% larger than the 5 bucket perfect recall game with the same FCPA action abstraction.

I used T = 100, 000, 000 for one default range, T = 130, 000, 000 for two default ranges, and

T = 160, 000, 000 for three default ranges. Figures 7.9 and 7.10 show the results of REDM–

BETS applied to this card abstraction using 10 runs of BETS.

Figure 7.9: Bounds on game values of abstractions using one, two and three default ranges for playerone against an FCPA player two, in an IR− 570 card abstraction of Texas hold’em. The Y-axis isvalue to player two, so player one wishes to decrease this number.

93

Figure 7.10: Bounds on game values of abstractions using one, two and three default ranges forplayer two against an FCPA player one, in an IR − 570 card abstraction of Texas hold’em. TheY-axis is value to player two, so player two wishes to increase this number.

In comparing these results with REDM–BETS applied to IR − 100 (Shown in Figures 7.3

through 7.8), the resemblance is striking. While some of the values have shifted a few milli-big

blinds, the shapes of the curves are virtually identical. To get some insights in to why this is, I

compare the most important bets in IR− 100 to those of IR− 570.

7.3.1 Bet size distributions

To see why the value curves of IR − 100 and IR − 570 are so similar, I now look at how the size

and importance of bet sequences change as the distributions change. I took the top 25 bet-sizing

players, as ranked by RTi,|·|0, for the IR − 100 three range abstraction after run r = 5. The betting

sequence of these bet sizing players, along with the bet size they use, is shown in the first and third

columns of Table 7.3 and Table 7.4. The action sequences representing bet-sizing players can be

interpreted as follows: for the main player the character ‘d’ represents the first range, ‘e’ represents

the second range and ‘g’ represents the third range. For the opponent ‘d’ represents a pot sized

bet. A ‘/’ signifies the end of a round of betting. In each case the action sequence ends with the

bet in question — for example the top three ranked bet-sizing players for player two (Table 7.4) in

IR− 100 are ‘e’, ‘d’ and ‘g’ — these are ranges two, one and three, respectively, on the first action

of the game.

Next I found the corresponding bet-sizing players in the IR − 570 three range abstraction after

run r = 5. The fourth and fifth columns of the same tables show the relative importance of each

of these bet-sizing players in the IR − 570 game, along with the bet size they use in that game.

94

Action sequence IR− 100 ranking IR− 100 bet size IR− 570 ranking IR− 570 bet sizedd 1 0.831 4 0.879dg 2 2.547 3 2.457de 3 0.808 2 0.891cg 4 1.680 5 1.709ce 5 1.539 6 1.455cd 6 1.097 7 0.977

dc/d 7 0.191 1 0.190ddc/d 8 0.210 8 0.236dec/d 9 0.230 9 0.210

dc/cc/e 10 1.073 15 0.839dc/cc/d 11 0.359 11 0.298cgc/d 12 0.251 12 0.265cdc/d 13 0.246 13 0.224dgc/d 14 0.589 14 0.447cec/d 15 0.276 10 0.226

dc/cdd 16 0.340 21 0.340dc/cc/cc/d 17 0.409 24 0.230

dc/cc/g 18 1.408 16 1.210cc/cc/d 19 0.510 26 0.510dc/dc/d 20 0.250 18 0.240

ddc/dc/d 21 0.500 37 0.526dc/e 22 0.610 17 0.600

dc/cdc/cc/d 23 0.287 34 0.190dc/cc/ec/d 24 0.381 61 0.403

ddc/e 25 0.660 25 0.600

Table 7.3: Number of (I, a) pairs in various action abstractions for player one, generated usingREDM–BETS and REFINED–BETS. In each case BETS was run five times. In all casesthe opponent uses FCPA, and a 5 bucket perfect recall card abstraction of Texas hold’em was used.

Examining these tables, it’s clear that while the order of importance of the bet-sizing players changes

slightly, in general the same bet-sizing players are important in both cases. For player one (Table

7.3) there are only four bet-sizing players in the top 25 of IR − 100 that are not in the top 25 of

IR − 570. The highest ranking of these bet-sizing players that didn’t make the top 25 of IR − 570

is bet-sizing player 19 “cc/cc/d”, which ranks just one out of the top 25 for IR − 570. Player two

(Table 7.4) is similar — there are five bet-sizing players in the top 25 of IR− 100 but not in the top

25 of IR− 570, the highest ranking of which is bet-sizing player 16 “edc/cd”.

Averaging the absolute difference in the bet sizes used by IR − 100 and IR − 570 for these

bet-sizing players, I get an average absolute difference of 0.065 for player one and 0.061 for player

two. The largest absolute difference in bet sizes is 0.198 for player one (row 18 of Table 7.3) and

0.251 for player two (row 11 of Table 7.4). It is clear from the similarity of the game value curves

for each of these card abstractions that these differences in importance ranking the bet size have a

very small impact on the game value. I explore this idea further in the next section.

95

Action sequence IR− 100 ranking IR− 100 bet size IR− 570 ranking IR− 570 bet sizee 1 0.830 2 0.843d 2 0.724 1 0.794g 3 1.210 3 1.210

ec/cd 4 0.393 5 0.441dc/cd 5 0.400 4 0.397gc/cd 6 0.440 8 0.439dc/ce 7 0.600 6 0.600ec/ce 8 0.600 7 0.600

ec/cdc/cd 9 0.493 15 0.710cc/cd 10 0.510 10 0.510

dc/cdc/cd 11 0.506 16 0.757ec/cc/cd 12 0.430 20 0.444

ec/cg 13 1.350 12 1.210gc/ce 14 0.720 9 0.600

dc/cc/cd 15 0.440 19 0.400edc/cd 16 0.495 32 0.580ddc/cd 17 0.545 23 0.541gc/cg 18 1.358 14 1.210cc/ce 19 0.600 13 0.611edd 20 0.994 46 0.973

dc/cg 21 1.420 11 1.210edc/ce 22 0.600 22 0.600

dde 23 1.199 28 1.137ec/cdc/ce 24 1.289 37 1.278dc/cdc/ce 25 1.243 45 1.308

Table 7.4: Number of (I, a) pairs in various action abstractions for player two, generated usingREDM–BETS and REFINED–BETS. In each case BETS was run five times. In all casesthe opponent uses FCPA, and a 5 bucket perfect recall card abstraction of Texas hold’em was used.

7.4 Using action abstractions from IR− 100 in IR− 570

In the previous section I showed how similar the distribution of important bet sizes of IR− 100 and

IR−570 are on run r = 5. In this section I go a step further and ask the question: if I took the action

abstractions created in the IR − 100 game, then paired them with the IR − 570 card abstraction,

what would the value be? The answer to this question is presented in Figures 7.11 and 7.12.

96

Figure 7.11: Bounds on game values of abstractions using one and two default ranges for playerone against an FCPA player two, in an IR − 570 card abstraction of Texas hold’em. Some betsizes are calculated using an IR− 100 card abstraction before being paired with the IR− 570 cardabstraction. The Y-axis is value to player two, so player one wishes to decrease this number.

Figure 7.12: Bounds on game values of abstractions using one and two default ranges for playertwo against an FCPA player one, in an IR − 570 card abstraction of Texas hold’em. Some betsizes are calculated using an IR− 100 card abstraction before being paired with the IR− 570 cardabstraction. The Y-axis is value to player two, so player two wishes to increase this number.

The game value curves of the action abstractions generated using an IR − 100 abstraction and

97

then applied to a IR − 570 abstraction are virtually identical to those created in the IR − 570 ab-

straction. For player one (Figure 7.11) there are a few points where the game value is slightly worse

than that of the abstraction generated in IR− 570, however player two actually shows the opposite

effect effect as well — on a few runs the action abstraction from IR−100 actually performs slightly

better in IR − 570 than the abstraction created in IR − 570! Overall, it is clear that there is little

benefit to the game value if I use an IR−570 card abstraction to create my action abstractions, even

if I eventually wish to pair my action abstractions with an IR−570 card abstraction. It is preferable

to use a smaller card abstraction to create my action abstractions — larger card abstractions require

more memory and a larger value of T for each run.1

Before continuing, recall that the game value curves in IR − 100, while virtually identical in

shape to those of IR − 570, were shifted by a few milli-big blinds. To further illustrate this, I plot

the game value curves from the IR−100 game and the IR−570 game on the same plot. The result

is shown in Figures 7.13 and 7.14 for players one and two, respectively. Player one’s value hits a

worse peak value in the larger game, while the value curves for player two are very similar in both

games. Combined, this indicates that the IR-570 game is slightly better for player two. I will return

to this discussion in Section 7.6.

Figure 7.13: Bounds on game values of abstractions using one and two default ranges for player oneagainst an FCPA player two. The Y-axis is value to player two, so player one wishes to decreasethis number.

1Note, however, that the time per iteration of BETS does not increase as the number of buckets in each round isincreased, as BETS uses chance sampling, and thus it only considers one bucket on each round each iteration

98

Figure 7.14: Bounds on game values of abstractions using one and two default ranges for player twoagainst an FCPA player one. The Y-axis is value to player two, so player two wishes to increasethis number.

7.5 Larger opponent abstraction

Up to this point I have used an opponent action abstraction of FCPA in all of my experiments. I

now apply my algorithms to an imperfect recall IR−100 card abstraction paired with an ACPC2011

opponent action abstraction. I start by examining the game value curve generated using one and two

default ranges in this game. I used T = 70, 000, 000 iterations per run of BETS for one default

range and T = 80, 000, 000 for two default ranges. The results are shown in Figures 7.15 and 7.16

for players one and two, respectively.

99

Figure 7.15: Bounds on game values of abstractions using one and two default ranges for player oneagainst an ACPC2011 player two, in an IR− 100 card abstraction of Texas hold’em. The Y-axis isvalue to player two, so player one wishes to decrease this number.

Figure 7.16: Bounds on game values of abstractions using one and two default ranges for player twoagainst an ACPC2011 player one, in an IR− 100 card abstraction of Texas hold’em. The Y-axis isvalue to player two, so player two wishes to increase this number.

Looking at these graphs, there is an interesting difference from all of the games with FCPA

opponents. In both IR − 100 and IR − 570, one range was never enough to beat the ACPC2011

100

abstraction’s value against FCPA. That remains the case here for player one, however for player

two we can see that one range passes the value of ACPC2011 played against itself. It is important to

note, however, that while one range for player two beats the value of ACPC2011 by a small margin,

one range for player one falls short of the value of ACPC2011 by a bigger margin. So combining the

two values, one range still loses to ACPC2011. For player one two default ranges improves upon

the performance of one, roughly equally the value of ACPC2011. For player two, again the value is

improved over the one range peak value, though not by as much.

To apply my algorithms to three default ranges in this game proved to be very computationally

demanding. Thus, instead of creating a full three range value curve, I ran the first iteration with

three ranges for T = 80, 000, 000 iterations, and then pruned the tree. Pruning based on three

ranges seemed a sensible approach, as in IR− 100 against an FCPA opponent using three default

ranges led to greater peak game value for both players one and two (see Section 7.2.2). Multiple

pruned versions of each player were created. Agents were then created for each of these pruned

abstractions on run r = 1. 1, 200, 000, 000 hands of self-play were played to obtain the value of

each pruned game before further runs of BETS. While two different abstractions having the same

value after the first run does not guarantee they will peak at the same value, if adding bets leads to a

higher value at this point, it is likely that the peak point will have a higher value as well.

The results of these self-play matches after pruning three ranges with various values of e on

run r = 1 are shown in Tables 7.5 and 7.6 for players one and two, respectively. The number of

(information set, action) pairs for each abstraction is also shown. Given the results of Section 7.2.2,

I started with e = 100.

# extra bets (e) Value to player two # (I, a) pairs100 148.57± 1.13 20, 115, 544200 148.47± 1.11 22, 371, 456500 147.69± 1.05 25, 692, 726

Table 7.5: Game values and number of (I, a) pairs in various action abstractions for player one,generated by completing just one run of BETS using three ranges, then pruning with differentvalues of e. In all cases player two uses ACPC2011, and an IR − 100 card abstraction of Texashold’em was used. The game value to player two, so player one wishes to decrease this number.

# extra bets (e) Value to player two # (I, a) pairs100 154.18± 1.08 21, 946, 414200 157.75± 1.12 23, 263, 498500 158.32± 1.15 26, 625, 940

Table 7.6: Game values and number of (I, a) pairs in various action abstractions for player two,generated by completing just one run of BETS using three ranges, then pruning with differentvalues of e. In all cases player one uses ACPC2011, and an IR − 100 card abstraction of Texashold’em was used. The game value to player two, so player two wishes to increase this number.

Examining Table 7.5, note that e = 100 and e = 200 are very close in value — any gain

from adding another 100 bets is so small that many more hands would have to be played to see

101

the difference. Moving up to e = 500 the game value moves downward (which is good, as it is

value to player two), but only by 0.78 milli-big blinds. This gain is well within the 95% confidence

intervals. Considering that e = 500 is 27.7% bigger than e = 100, I decided that this possible gain

in value was not worth the increased abstraction size (and thus computational requirements) and

choose e = 100 from this point on.

Applying the same analysis to Table 7.6, you can see there is a difference of 3.57 milli-big

blinds between e = 100 and e = 200, well outside the confidence intervals. Increasing to e = 500

leads to an even smaller gain than it did for player one, only 0.57 milli-big blinds, again within the

confidence intervals. I decided this was not worth the 14.5% increase in abstraction size and choose

e = 200 going forward.

Given these results I applied REFINED–BETS with e = 100 for player one and e = 200

for player two. The resulting game value curves, along with those of one and two ranges, are shown

in Figures 7.17 and 7.18 for players one and two, respectively.

Figure 7.17: Bounds on game values of abstractions using one and two default ranges, along withthree default range pruned with e = 100, for player one against an ACPC2011 player two, in anIR−100 card abstraction of Texas hold’em. The Y-axis is value to player two, so player one wishesto decrease this number.

102

Figure 7.18: Bounds on game values of abstractions using one and two default ranges, along withthree default range pruned with e = 200, for player two against an ACPC2011 player one, in anIR−100 card abstraction of Texas hold’em. The Y-axis is value to player two, so player two wishesto increase this number.

As was the case with an FCPA opponent, the two range curves and pruned three range curves

have levelled off by run r = 5. Additionally, the values of the two range and the pruned three range

curves are very similar. For player one (Figure 7.17), three pruned ranges performs better than two

ranges all the way, peaking at a slightly higher value and beating ACPC2011, while two ranges only

ties it. For player two (Figure 7.18) two ranges peaks at roughly the same value as the three pruned

ranges.


Now I compare the sizes of the two range abstractions to the sizes of the pruned three range abstrac-

tions. Instead of looking at run r = 5, as I have previously, I consider the sizes at various runs. The

number of extra bets, along with the (information set, action) pairs, are shown in Tables 7.7 and 7.8.

Recall that the size of the abstractions created will change as bet sizes vary, as this can lead to the

addition and removal of bet-sizing players (see Section 5.7). The purpose of this size comparison is

to identify one abstraction for each player with a very good size-to-value ratio. In the next section

I will explain how I then use these abstractions to create a no-limit Texas hold’em agent in a much

larger card abstraction. The better the size-to-value ratio, the larger I can make the card abstraction.

Examining player one (Table 7.7) I note that the size of the two range abstractions reduces from

runs r = 2 to r = 4, while in previous experiments the size of the abstractions created always

increased or stayed constant after each run. While this is an interesting result, the reduction is quite

103

Run # Abstraction Value to player two # (I, a) pairs2 Start with three ranges, e = 100 147.14± 1.01 21, 714, 3303 Start with three ranges, e = 100 146.78± 1.02 23, 895, 3304 Start with three ranges, e = 100 144.29± 1.03 24, 192, 1025 Start with three ranges, e = 100 144.49± 1.08 24, 526, 4742 Two ranges, no pruning (e = 3, 567) 152.57± 1.01 27, 366, 1163 Two ranges, no pruning (e = 3, 616) 149.02± 1.03 27, 334, 5304 Two ranges, no pruning (e = 3, 670) 148.06± 1.03 26, 008, 9445 Two ranges, no pruning (e = 3, 722) 146.98± 1.04 26, 499, 513

N/A ACPC2011 fixed bets 146.88± 1.09 72, 630, 334

Table 7.7: Number of (I, a) pairs in various action abstractions for player one, generated usingREDM–BETS and REFINED–BETS. The size of a symmetric fixed betting abstraction ofACPC2011 is included for reference. In all cases the opponent uses ACPC2011, and an IR − 100card abstraction of Texas hold’em was used.

Run # Abstraction Value to player two # (I, a) pairs2 Start with three ranges, e = 200 158.78± 1.00 26, 781, 5403 Start with three ranges, e = 200 160.45± 0.98 28, 135, 7684 Start with three ranges, e = 200 159.51± 1.05 31, 725, 5825 Start with three ranges, e = 200 160.02± 1.02 34, 146, 1962 Two ranges, no pruning (e = 4, 168) 161.93± 0.98 38, 519, 4003 Two ranges, no pruning (e = 4, 728) 161.23± 1.00 44, 841, 6284 Two ranges, no pruning (e = 5, 236) 159.31± 0.99 51, 622, 6285 Two ranges, no pruning (e = 5, 811) 158.24± 1.04 56, 321, 020

N/A ACPC2011 fixed bets 146.88± 1.09 72, 630, 334

Table 7.8: Number of (I, a) pairs in various action abstractions for player two, generated usingREDM–BETS and REFINED–BETS. The size of a symmetric fixed betting abstraction ofACPC2011 is included for reference. In all cases the opponent uses ACPC2011, and an IR − 100card abstraction of Texas hold’em was used.

small, and on r = 5 the size increases again — overall the size of the two range abstractions seems

to have stabilized over this time. More importantly, the pruned abstractions are smaller than the two

range abstractions, if only by a small amount. As the sizes of these abstractions are all quite similar,

I choose the highest value abstraction, on r = 4, to use with a larger card abstraction. Three ranges

pruned at r = 4 is a minor ≈ 2% improvement in game value over the fixed ACPC2011 abstraction

(see Figure 7.17), however it is only ≈ 33% of the size.

The size of the player two abstractions show very different characteristics from the player one

abstractions. The two range abstractions grow in size by ≈ 9− 15% each run. The pruned abstrac-

tions also grow in size, but at a slower pace, growing at ≈ 5%, ≈ 13% and ≈ 8% between runs 2,

3, 4 and 5 respectively. The two smallest peak abstractions here are r = 3 for the three range prune

runs and r = 2 for the two range runs. The two range run peaks at a slightly higher value, however

it is 37% larger than the three range pruned run at r = 2. Due to this large size difference for a very

small value difference, I choose the three range pruned run on r = 3 to pair with a larger card ab-

straction. As with player one, this abstraction performs better than the fixed ACPC2011 abstraction,

this time by ≈ 9% (see Figure 7.18), and it is only 39% of the size.

104

7.6 Scaling the card abstraction

I used the same procedure to create a poker agent in a much larger card abstraction as I did in

Section 5.9 to create the winning entry of the 2012 ACPC bankroll instant run-off division. Recall

that this involved taking the action abstraction for each player and pairing it with a much larger card

abstraction, then applying the MCCFR algorithm (the same implementation of this algorithm used

in Section 5.9) [18, 40, 46]. I choose an IR − 35000 card abstraction — that is, an imperfect recall

abstraction with 169 pre-flop buckets and 35, 000 buckets on each of the flop, turn and river. This

card abstraction was generated using the same k-means technique that generated all other imperfect

card recall abstractions used in this dissertation [42]. When paired with my action abstractions, the

resulting abstractions required 126 gigabytes of RAM for player 1 and 146 gigabytes of RAM for

player 2 when loaded by the MCCFR algorithm. MCCFR was run for approximately 68 days for

player one and 66 days for player two. I used the same cluster as I did in Section 5.9, which has

nodes of 4 AMD 6172 processors with 12 cores each, running at 2.1GHz. Again, all 48 cores

were used in the calculation. In total 309 billion iterations were run for player one, and 300 billion

iterations for player two.

To see how this agent performs, I started by following my usual technique of playing 1, 200, 000, 000

hands of self-play to obtain tight empirical bounds on the game value. The values I obtained are

155.11 ± 1.14 and 162.54 ± 1.12 for players one and two, respectively. Subtracting player one’s

value from player two, this agent beats its opponent abstraction of ACPC2011 by 7.43 ± 1.60. As

shown in Figures 7.17 and 7.18, the corresponding values of this action abstraction in the card ab-

straction I used when training it, IR − 100, are144.29± 1.15 and 160.45± 1.12, again for players

one and two respectively. Subtracting player one’s value from player two we see that in IR − 100

this action abstraction beats its opponent abstraction of ACPC2011 by 16.16± 1.61 in IR− 100.2 I

showed in Section 7.4 that while the value of abstractions trained in IR− 100 using an FCPA op-

ponent action abstraction shifted when applied to an IR−570 game (with the same opponent action

abstraction), the action abstractions created in the IR−570 game had the same value. This suggests

that the shift in game value may be influenced more by the card abstraction the action abstractions

are eventually paired with, instead of the card abstractions used to create these action abstractions.

With that in mind, it is hard to draw conclusions about whether or not action abstractions generated

in the IR−35000 game would significantly outperform these ones. What I can say is that my action

abstractions continue to have a positive overall value against the opponent action abstraction, while

being much smaller.

2The game values for player one and player two are sampled independently, so the combined confidence interval iscalculated by adding the variances together

105

7.6.1 Results against top no-limit agents

Further testing of my no-limit Texas hold’em agent was done by playing it against other world-class

Texas hold’em agents, all generated by the University of Alberta’s Computer Poker Research Group

(CPRG). As I did in Section 5.9, I used the translation system designed by Schnizlein et al. [58, 59],

discussed in Section 3.8.2, to translate opponent bets not contained within the ACPC2011 opponent

action abstraction that I used. While in Section 5.9 I described how I used thresholding in the ACPC

2012 competition entry, no thresholding was used in this experiment. A cross table showing the

result of all of these agents playing against one another is given in Table 7.9. Most agents in the

cross table were generated using the same MCCFR algorithm to find an ε-Nash equilibrium, with the

exception of 2011+, which used the CFR algorithm. All of them used the same k-means clustering

technique to create the card abstraction [42]. A key detailing the card and action abstraction of

each agent in Table 7.9 is given in Table 7.10, along with a brief description of each agent. Agents

big2011, minBetC and minBetB were all crafted to take up just under 250 gigabytes of memory

in order to fit in 256 gigabytes of RAM.

Opponents2011+ Hawkin1 Hawkin2 big2011 minBetC minBetB Average

2011+ -26 ± 9 -57 ± 8 -35 ± 9 -47 ± 8 -53 ± 7 -42Hawkin1 26 ± 9 -45 ± 8 -23 ± 7 -27 ± 8 -27 ± 9 -20Hawkin2 57 ± 8 45 ± 8 14 ± 8 1 ± 7 2 ± 8 10big2011 35 ± 9 23 ± 7 -14 ± 8 -7 ± 9 -2 ± 9 -3

minBetC 47 ± 8 27 ± 8 -1 ± 7 7 ± 9 1 ± 8 5minBetB 53 ± 7 27 ± 9 -2 ± 8 2 ± 9 -1 ± 8 7

Max 57 45 15 22 26 16

Table 7.9: One on one performance of top no-limit Texas hold’em agents in milli-big-blinds/game.Results were calculated by playing 40 duplicate matches of 100, 000 hands each (8, 000, 000 handstotal). 95% confidence intervals on each result are provided.

Bot name Action abstraction (in pot fractions) Card Abstraction2011+ ACPC2011 IR− 3700

Hawkin1 1 fixed-size range vs IR− 18600ACPC2011 opponent

Hawkin2 3 variable-size ranges pruned vs IR− 35000ACPC2011 opponent

big2011 ACPC2011 + 0.5 once pre-flop, IR− 9000 flop and turn,1.5, 6, 20, 40 IR− 3700 river

minBetC ACPC2011 + IR− 9000 flop and turn,1 minimum bet per round IR− 1175 river

minBetB ACPC2011 + 2 + IR− 3700 flop and turn,1 minimum bet per round IR− 1175 river

Table 7.10: Abstractions used by the agents in Table 7.9

Hawkin2, the agent described in Section 7.6, performs the best of all agents in the cross table,

106

Bot name Description2011+ Upgraded version of the winner of the ACPC instant run-off

division in 2011. The original agent was created using 7, 600, 000iterations of CFR, this one was run for 45, 000, 000 iterations.

Hawkin1 Bot described in Section 5.9. Winner of the 2012 ACPC instantrun-off division.

Hawkin2 Bot described in Section 7.6.big2011 Bot created by expanding 2011+ in both cards and actions.minBetC Two agents containing a maximum of one minimum bet per round.minBetB minBetC had more buckets in the card abstraction, while

minBetB had an extra bet (2 pot) in the betting abstraction.

Table 7.11: Descriptions of the agents in Table 7.9

averaging a win of 10 milli-big-blinds per game. It easily beats the winners of the 2011 and 2012

instant run-off competitions, winning over these agents by the largest margin of any agent in the

cross table. The victory over the upgraded 2011 winner makes a great deal of sense, as the bet sizes

used by that agent were the ones I used as the opponent action abstraction. As my agent wins in

self-play against the fixed bets of ACPC2011 by 7.43 ± 1.60 (see Section 7.6), the remainder of

the value comes from the combination of the fact that my agent has a much larger card abstraction

(IR − 35000 vs IR − 3700), and the advantage of causing translation problems for this agent that

only understands ACPC2011 bets. The other of these agents, the 2012 winner, was generated using

RE–BETS with one fixed-width default range, as described in Section 5.9. This agent has closer

to the same card abstraction, using IR − 18600. The advantage of translation is nullified here,

as both agents use ACPC2011 as their opponent action abstraction, but bet with their own unique

distributions of bets.

While Hawkin2 defeats big2011 with statistical significance, agents minBetC and minBetB

have similar performance to Hawkin2 overall, and the matches between these bots do not have a

clear victor — for this many more hands would be required. As Hawkin2 makes many bets that are

not within the opponent abstraction of the other bots, they must use translation when interpreting

these bets. Similarly, as minimum bets are not within the opponent abstraction used by Hawkin2

(the ACPC2011 action abstraction), it must translate these bets to a bet size it understand before it

can respond to them. If we expect to be up against an agent that uses minimum bets, then we can

include them in the opponent action abstraction, and this should significantly improve our perfor-

mance. Note that the opposite option is not available to the opponent agents - as Hawkin2 makes

a different bet size at each point in the tree, unless the opposing agent-maker has the entire list of

bet sizes available to them when creating their agent, they cannot avoid translating the bet sizes at

game-time.

107

Chapter 8

Conclusion

I conclude this dissertation with a summary of the contributions of this work and the limitations of

my approach.

8.1 Contributions

I have introduced a set of algorithms for the automated creation of action abstractions in games with

large or continuous action spaces such as no-limit hold’em. I started in Chapter 1 with a summary

of the available poker literature pertaining to bet sizing, as well as a discussion of the computational

difficulty of no-limit poker games. In Chapter 2 I gave a detailed description of all no-limit poker

games studied in this dissertation. Chapter 3 was devoted to relevant game theory background, as

well as a summary of related work. In Chapters 4 through 7 I introduced a game transformation

and multiple accompanying regret-minimizing algorithms designed to automatically abstract large

or continuous action spaces, and I showed the successful application of these algorithms to various

no-limit poker games, most notably no-limit Texas hold’em.

8.1.1 Game transformation and BETS

Abstraction of the action space is essential in games with large or continuous action spaces, such

as no-limit Texas hold’em. In Chapter 4 I introduced a transformation that can be applied to such

games. This transformation creates a multiplayer game, and the strategies of the added players in

this transformed game can be mapped to an action abstraction of the original game. I introduced

the BETS algorithm, a regret minimizing algorithm designed to find an ε-Nash equilibrium of the

transformed game. I showed that in a number of small poker games, this ε-Nash equilibrium can

then be mapped to an action abstraction of the original game that uses (approximately) the Nash

equilibrium bet sizes.

108

8.1.2 RE–BETS

Chapter 5 explained some of the shortcomings of BETS in larger games. Bounding the range

of possible actions is necessary for my game transformation, however using one range for all

parameter-setting agents does not scale well with game size. To remedy this problem I introduced

RE–BETS, an algorithm that iteratively applies BETS, adjusting the ranges of various parameter-

setting agents between each iteration. RE–BETS shows improved performance in small games,

and scales well with game size. Using RE–BETS I created a no-limit Texas hold’em poker agent

that won the 2011 ACPC no-limit bankroll instant run-off competition.

8.1.3 REDM–BETS and REFINED–BETS

In Chapter 6 I improved upon the range-adjusting done on each iteration of RE–BETS. While

RE–BETS keeps the width of the ranges of parameter-setting agents constant, I showed that in-

stead it is advantageous to keep the pot growth ratio of the upper and lower bound constant. This

led to the RED–BETS algorithm. I then generalized this algorithm, creating the REDM–BETS

algorithm, which allows for multiple parameter-setting agents at each decision node. While using

multiple parameter setting agents can improve performance, I showed that there is value in keeping

multiple agents only at key points in the game. This is accomplished using the REFINED–

BETS algorithm. When applied to a perfect recall card abstraction of no-limit Texas hold’em,

REFINED–BETS creates action abstractions that are smaller than those created by REDM–

BETS while achieving the same game value.

8.1.4 Changing the card abstraction

In Chapter 7 I studied the application of REDM–BETS and REFINED–BETS to different

card and action abstractions. When using imperfect recall card abstractions REFINED–BETS

outperforms REDM–BETS, but by a smaller margin than in perfect recall games. Additionally,

the action abstractions these algorithms create when applied to an imperfect recall card abstraction

do not change significantly as the size of the imperfect recall card abstraction is increased. This leads

to the idea of using a small imperfect recall card abstraction when generating an action abstraction,

and then pairing the resulting action abstraction with a much larger card abstraction to create the

final no-limit Texas hold’em agent. I used REFINED–BETS and this technique to generate an

agent to play no-limit Texas hold’em, and I showed that it performs very well against a collection of

world-class no-limit Texas hold’em poker agents.

109

8.2 Limitations of this work

8.2.1 Separate runs to create action abstractions and produce strategies

The output of the BETS algorithm, along with all of the other algorithms presented in this thesis,

is an action abstraction. One must apply the algorithm, extract the action abstraction, and then run

another regret-minimizing algorithm such as CFR using this action abstraction to create an agent

that can play the original game. I believe that it may be possible to combine these steps, which is an

excellent topic for future work.

8.2.2 Serial computation of BETS

The BETS algorithm has not, to this point, been implemented in parallel. This restricts the size of

games to which it can be applied. While the action abstractions I create scale well with increasing

card abstractions, parallelization would allow usage of more complex opponent action abstractions.

8.2.3 Two strategy calculations for one agent

In most of the experiments in this dissertation, the BETS algorithm is applied to only one player,

while the other player uses a hand picked action abstraction (I refer to this throughout as the oppo-

nent action abstraction). This is done because there are certain pot fractions that are used more often

than others by both poker professionals and poker agents. For this reason the algorithms are applied

twice, once for each player. A poker agent is then obtained by using the results of one calculation

when playing as one of the players, and the other calculation when playing as the other player. Thus

obtaining a strategy to play using the action abstractions I generate requires double the computation

it would if both players bet using the same pot fractions. This limitation is offset by the fact that my

action abstractions are quite small, and thus it takes less time to compute each strategy.

8.2.4 Local maxima

In Section 4.5 I showed that the BETS algorithm can get stuck in local maxima. This is possible

because my transformation creates a multiplayer game, and thus there can be multiple Nash equi-

libria with different values. I revisited this issue in Section 6.8 and discussed how the REFINED–

BETS algorithm helps in avoiding local maxima. I showed that, in half-street Kuhn poker, REFINED–

BETS produces the global maxima as long as one of the bet-sizing agents being used has a range

such that 0 ≤ L < H ≤ 1. Furthermore, in Section 6.2 I showed that multiple different initial

conditions for RE–BETS lead to action abstractions with approximately the same game value.

This is a potential indication that, even if there are multiple local maxima with different values,

RE–BETS either finds the same ones or the values of these maxima are very similar. It is by no

means a proof of this, and it is quite possible that there are global maxima with much greater values

that my algorithms are not finding.

110

8.2.5 Lack of a convergence proof for BETS

In Chapter 4 I show that the BETS algorithm converges to Nash equilibrium bet sizes for multiple

small no-limit poker games. I do not, however, have a proof of convergence for the algorithm,

nor have I proven that if the algorithm converges, it has converged to an ε-Nash equilibrium of the

transformed game. There may be certain conditions that are required for this to occur. What I can

say is that I have not seen any examples of cases where the algorithm failed to converge.

8.2.6 Choice of variable N in BETS

When I introduced BETS in Section 4.3, I used the variable N , which determines the rate at

which bet-sizing agents can change their effective bet size from iteration to iteration. After some

initial investigation in half-street Kuhn poker, I set N = 10, 000, and used that value for all of the

experiments in the remainder of the dissertation. It may be the the speed with which the algorithm

converges is directly linked to the setting of this variable. The optimal value of N may be different

for different games, or even different for each parameter-setting agent in the same game.

8.3 Final thoughts

I have shown that there is much value to be gained by automating the abstraction of the action space

in no-limit poker games. Simply adding more bet sizes to the action abstraction of both players

and solving larger and larger games is not a scalable approach. I have outlined algorithms that are

very scalable, both from a computation and storage standpoint. My experimental results show that

the resulting abstractions have greater value than larger symmetrical abstractions and take up less

memory. Additionally, as was discussed at the end of Chapter 7, the unpredictable nature of the bet

sizes my abstractions use has added advantages against computer opponents. Most approaches to

creating poker agents require a predetermined opponent action abstraction, and thus opponents of

agents created using the algorithms outlined in this dissertation will be required to use translation to

interpret these unusual bet sizes.

In summary, the research described in this dissertation has made the following contributions:

• I introduced a transformation and accompanying regret-minimizing algorithm (BETS) that

is shown experimentally to generate high value action abstractions (Chapter 4).

• I introduced an algorithm that iteratively applies BETS, known as RE–BETS, which re-

solves some of the issues with scaling the BETS algorithm to large games (Chapter 5).

• I further refined the RE–BETS algorithm to allow for multiple parameter-setting agents at a

single decision node. This led to the REDM–BETS algorithm (Chapter 6).

• I introduced a metric that can be used to rank the importance of parameter-setting agents. This

is used by my final algorithm, REFINED–BETS, to prune the game tree (Chapter 6).

111

• I showed that these algorithms scale well – that when applying these algorithms to small

imperfect recall poker games, the resulting action abstractions perform well when paired with

larger card abstractions (Chapter 7).

• I applied RE–BETS and REFINED–BETS to create state-of-the-art no-limit poker

agents (Chapter 5, Chapter 7).

112

Appendix A

Game value tables

Here I present tables showing the game values for all game value curves presented in Chapters 5, 6

and 7.

Run # RE-BETS1 51.14± 2.42 42.8± 2.593 36.04± 2.984 32.25± 3.325 30.54± 3.266 31.51± 3.317 30.68± 3.38 29.48± 3.329 30.36± 3.5210 30.36± 3.51

Table A.1: Value to player two and bounds for the curve shown in Figure 5.4

Run # RE-BETS1 77.88± 2.492 92.98± 1.413 101.84± 1.854 107.82± 1.685 110.99± 1.816 115.79± 1.877 115.38± 1.828 113.07± 2.119 111.64± 2.810 111.74± 2.8

Table A.2: Value to player two and bounds for the curve shown in Figure 5.5

113

Run # RE-BETS RE-BETS RE-BETS[0.4, 0.6] [0.8, 1.0] [1.2, 1.4]

1 39.6± 5.6 51.8± 2.2 60.84± 2.392 34.75± 5.75 43.35± 2.65 56.08± 2.743 30.25± 6.25 36.5± 2.9 51.16± 3.054 29.75± 6.25 30.25± 3.25 47.02± 2.735 29.69± 6.26 29.47± 3.15 42.97± 2.176 29.22± 6.41 28.73± 3.2 39.42± 2.587 29.73± 6.07 28.32± 3.15 36.08± 2.678 29.82± 5.71 27.6± 3.56 32.59± 2.89 29.12± 4.94 27.83± 3.73 30.16± 2.7310 29.38± 4.84 28.16± 3.57 29.43± 3.15

Table A.3: Value to player two and bounds for each curve shown in Figure 6.1

Run # RE-BETS RE-BETS RE-BETS[0.4, 0.6] [0.8, 1.0] [1.2, 1.4]

1 104.88± 2.81 85.5± 2 45.88± 2.782 110.58± 2.98 95.25± 1.75 54.31± 2.763 113± 2.65 102± 1.5 61.39± 34 114.01± 3.12 108.15± 1.65 67.02± 2.865 112.77± 2.67 113.6± 1.73 76.15± 2.436 113.35± 2.79 115.81± 1.69 90.87± 1.647 113.36± 2.86 115.89± 1.87 102.46± 1.558 111.18± 3.23 116.99± 1.62 109.11± 1.879 111.96± 3.24 116.85± 1.89 115.63± 1.7110 110.32± 3.14 117.78± 1.99 117.06± 1.54


Run # RED-BETS RE-BETS[0.71, 0.94] [0.8, 1.0]

1 47.26± 2.18 51.8± 2.22 36.55± 2.44 43.35± 2.653 31± 2.24 36.5± 2.94 28.83± 2 30.25± 3.255 28.44± 1.9 29.47± 3.156 28.09± 1.9 28.73± 3.27 27.58± 1.93 28.32± 3.158 27.07± 1.96 27.6± 3.569 26.74± 1.97 27.83± 3.7310 25.8± 1.97 28.16± 3.57


114

Run # RED-BETS RE-BETS[0.71, 0.94] [0.8, 1.0]

1 93.81± 1.89 85.5± 22 103.96± 1.35 95.25± 1.753 108± 1.38 102± 1.54 114.91± 1.28 108.15± 1.655 118.16± 1.31 113.6± 1.736 118.43± 1.26 115.81± 1.697 118.23± 1.18 115.89± 1.878 118.44± 1.2 116.99± 1.629 117.02± 1.29 116.85± 1.8910 118.15± 1.22 117.78± 1.99


Run # REDM-BETS REDM-BETS REDM-BETS1 Range 2 Ranges 3 Ranges

1 47.26± 2.18 27.22± 2.41 26.25± 2.442 36.55± 2.44 25.13± 2.34 24.31± 2.413 31± 2.24 24.51± 2.3 23.44± 2.244 28.83± 2 24.15± 2.29 23.28± 2.525 28.44± 1.9 23.19± 2.23 22.23± 2.686 28.09± 1.9 21.96± 2.16 21.03± 2.67 27.58± 1.93 20.78± 2.23 19.72± 2.458 27.07± 1.96 20.92± 2.14 19.6± 2.369 26.74± 1.97 21.19± 2.15 19.89± 2.4110 25.8± 1.97 20.34± 2.25 19.69± 2.44



1 93.81± 1.89 116.33± 2.41 116.69± 2.562 103.96± 1.35 122.51± 2.34 123.16± 2.493 108± 1.38 122.72± 2.15 125.02± 2.534 114.91± 1.28 122.99± 2.11 125.08± 2.635 118.16± 1.31 122.73± 2 124.96± 2.686 118.43± 1.26 123.35± 1.83 124.88± 2.667 118.23± 1.18 122.95± 1.82 124.66± 2.418 118.44± 1.2 123.91± 2.03 124.92± 2.599 117.02± 1.29 123.4± 1.77 124.93± 2.6110 118.15± 1.22 124.38± 2.19 124.91± 2.26


115

Run # REDM-BETS REDM-BETS REFINED-BETS REFINED-BETS1 Range 2 Ranges 2 Ranges 2 Ranges

1 Extra Bet 25 Extra Bets1 47.26± 2.18 27.22± 2.41 29.05± 2.25 27.24± 2.282 36.55± 2.44 25.13± 2.34 27.66± 2.11 25.22± 2.123 31± 2.24 24.51± 2.3 26.75± 2.09 24.45± 2.124 28.83± 2 24.15± 2.29 26.11± 2.02 23.65± 2.055 28.44± 1.9 23.19± 2.23 25.49± 1.96 22.86± 2.136 28.09± 1.9 21.96± 2.16 24.18± 1.91 22.09± 2.117 27.58± 1.93 20.78± 2.23 23.48± 1.95 21.26± 2.128 27.07± 1.96 20.92± 2.14 23.16± 1.93 20.73± 2.19 26.74± 1.97 21.19± 2.15 23.25± 1.99 21.28± 2.2110 25.8± 1.97 20.34± 2.25 23.49± 2.03 21.36± 2.19


Run # REDM-BETS REDM-BETS REFINED-BETS REFINED-BETS1 Range 2 Ranges 2 Ranges 2 Ranges

1 Extra Bet 25 Extra Bets1 93.81± 1.89 116.33± 2.41 110.28± 1.98 116.05± 2.292 103.96± 1.35 122.51± 2.34 117.24± 1.53 121.77± 2.053 108± 1.38 122.72± 2.15 118.26± 1.61 122.62± 1.964 114.91± 1.28 122.99± 2.11 118.9± 1.46 123.24± 2.025 118.16± 1.31 122.73± 2 118.79± 1.44 122.61± 1.876 118.43± 1.26 123.35± 1.83 118.74± 1.48 122.61± 1.967 118.23± 1.18 122.95± 1.82 118.99± 1.44 123.06± 1.988 118.44± 1.2 123.91± 2.03 118.53± 1.3 123.08± 1.989 117.02± 1.29 123.4± 1.77 118.39± 1.5 123.13± 2.0310 118.15± 1.22 124.38± 2.19 118.25± 1.51 122.73± 1.9


Run # REDM-BETS REDM-BETS1 Range 1 Range

Statistical bounds Absolute bounds1 45.73± 0.98 47.26± 2.182 36.05± 0.98 36.55± 2.443 31.63± 0.98 31± 2.244 28.5± 0.98 28.83± 25 28.04± 0.98 28.44± 1.96 26.2± 0.98 28.09± 1.97 27.16± 0.98 27.58± 1.938 26.08± 0.98 27.07± 1.969 24.61± 0.98 26.74± 1.9710 27.21± 0.98 25.8± 1.97


116

Run # REDM-BETS REDM-BETS1 Range 1 Range

Statistical bounds Absolute bounds1 95.53± 0.98 93.81± 1.892 103.64± 0.98 103.96± 1.353 108.33± 0.98 108± 1.384 115.56± 0.98 114.91± 1.285 117.61± 0.98 118.16± 1.316 117.92± 0.98 118.43± 1.267 119.12± 0.98 118.23± 1.188 117.22± 0.98 118.44± 1.29 117.39± 0.98 117.02± 1.2910 117.08± 0.98 118.15± 1.22



1 152.02± 0.94 123.78± 0.98 115.52± 0.962 141.47± 0.93 118.94± 0.98 113.18± 0.963 137.12± 0.94 113.21± 0.98 111± 0.964 132.34± 0.95 113.45± 0.97 109.79± 0.965 130.24± 0.97 112.34± 0.97 107.65± 0.956 128.27± 0.98 113.46± 0.97 106.97± 0.987 127.63± 0.97 112.39± 0.97 108.53± 0.988 127.25± 0.97 112.42± 0.96 108.06± 0.989 126.7± 0.98 113.35± 0.96 108.39± 0.9810 128.9± 0.98 112.71± 1.01 108.03± 1



1 165.78± 0.94 188.28± 0.98 189.05± 0.932 173.36± 0.94 193.67± 0.98 192.44± 0.943 177.63± 0.94 192.96± 0.98 194.54± 0.944 180.58± 0.94 193.3± 0.98 195.06± 0.945 180.76± 0.94 193.06± 0.98 195.47± 0.946 181.04± 0.94 193.17± 0.98 195.54± 0.947 180.22± 0.95 192.98± 0.98 195.6± 0.948 178.69± 0.94 193.39± 0.98 194.53± 0.949 180.72± 0.95 192.63± 0.98 194.54± 0.9410 180.32± 0.95 191.29± 0.98 193.6± 0.94


117

Run # REDM-BETS REDM-BETS REDM-BETS REFINED REFINED1 Range 2 Ranges 3 Ranges -BETS -BETS

3 Ranges 3 Ranges1 Extra Bet 25 Extra Bets

1 152.02± 0.94 123.78± 0.98 115.52± 0.96 129.2± 0.98 118.47± 0.982 141.47± 0.93 118.94± 0.98 113.18± 0.96 121.93± 0.98 116.74± 0.983 137.12± 0.94 113.21± 0.98 111± 0.96 119.51± 0.98 114.26± 0.984 132.34± 0.95 113.45± 0.97 109.79± 0.96 119.54± 0.98 114.78± 0.985 130.24± 0.97 112.34± 0.97 107.65± 0.95 119.89± 0.98 112.38± 0.986 128.27± 0.98 113.46± 0.97 106.97± 0.98 119.53± 0.98 112.55± 0.987 127.63± 0.97 112.39± 0.97 108.53± 0.98 121.58± 0.98 114.33± 0.988 127.25± 0.97 112.42± 0.96 108.06± 0.98 121.55± 0.98 114.76± 0.989 126.7± 0.98 113.35± 0.96 108.39± 0.98 121.74± 0.98 116.34± 0.9810 128.9± 0.98 112.71± 1.01 108.03± 1 121.16± 0.98 115.33± 0.98



3 Ranges 3 Ranges1 Extra Bet 25 Extra Bets

1 165.78± 0.94 188.28± 0.98 189.05± 0.93 172.77± 0.98 186.3± 0.982 173.36± 0.94 193.67± 0.98 192.44± 0.94 176.93± 0.98 188.33± 0.983 177.63± 0.94 192.96± 0.98 194.54± 0.94 180.15± 0.98 188.55± 0.984 180.58± 0.94 193.3± 0.98 195.06± 0.94 182.94± 0.98 188.98± 0.985 180.76± 0.94 193.06± 0.98 195.47± 0.94 184.46± 0.98 190.66± 0.986 181.04± 0.94 193.17± 0.98 195.54± 0.94 185.79± 0.98 189.62± 0.987 180.22± 0.95 192.98± 0.98 195.6± 0.94 184.46± 0.98 188.27± 0.988 178.69± 0.94 193.39± 0.98 194.53± 0.94 185.29± 0.98 188.29± 0.989 180.72± 0.95 192.63± 0.98 194.54± 0.94 185.69± 0.98 187.99± 0.9810 180.32± 0.95 191.29± 0.98 193.6± 0.94 185.09± 0.98 188.8± 0.98


118


3 Ranges 3 Ranges50 Extra Bets 100 Extra Bets

1 152.02± 0.94 123.78± 0.98 115.52± 0.96 116.13± 0.98 115.7± 0.982 141.47± 0.93 118.94± 0.98 113.18± 0.96 113.64± 0.98 113.08± 0.983 137.12± 0.94 113.21± 0.98 111± 0.96 112.13± 0.98 111.36± 0.984 132.34± 0.95 113.45± 0.97 109.79± 0.96 110.99± 0.98 111.31± 0.985 130.24± 0.97 112.34± 0.97 107.65± 0.95 110.97± 0.98 110.69± 0.986 128.27± 0.98 113.46± 0.97 106.97± 0.98 111.61± 0.98 111.59± 0.987 127.63± 0.97 112.39± 0.97 108.53± 0.98 111.9± 0.98 111.95± 0.988 127.25± 0.97 112.42± 0.96 108.06± 0.98 111.27± 0.98 111.83± 0.989 126.7± 0.98 113.35± 0.96 108.39± 0.98 113.45± 0.98 112.31± 0.9810 128.9± 0.98 112.71± 1.01 108.03± 1 114.28± 0.98 114.07± 0.98



3 Ranges 3 Ranges50 Extra Bets 100 Extra Bets

1 165.78± 0.94 188.28± 0.98 189.05± 0.93 186.83± 0.98 187.95± 0.982 173.36± 0.94 193.67± 0.98 192.44± 0.94 191.71± 0.98 190.89± 0.983 177.63± 0.94 192.96± 0.98 194.54± 0.94 191.8± 0.98 192.74± 0.984 180.58± 0.94 193.3± 0.98 195.06± 0.94 193.42± 0.98 192.3± 0.985 180.76± 0.94 193.06± 0.98 195.47± 0.94 190.44± 0.98 192.92± 0.986 181.04± 0.94 193.17± 0.98 195.54± 0.94 190.04± 0.98 192.52± 0.987 180.22± 0.95 192.98± 0.98 195.6± 0.94 191.82± 0.98 193.32± 0.988 178.69± 0.94 193.39± 0.98 194.53± 0.94 191.58± 0.98 192.92± 0.989 180.72± 0.95 192.63± 0.98 194.54± 0.94 191.95± 0.98 193.12± 0.9810 180.32± 0.95 191.29± 0.98 193.6± 0.94 191.37± 0.98 193.02± 0.98


119


1 152.63± 0.96 125.88± 0.97 119.63± 0.982 144.37± 0.96 121.32± 0.98 116.24± 0.983 136.52± 0.96 117.24± 0.97 113.19± 0.984 135.08± 0.97 115.45± 0.97 113.54± 0.985 132.66± 0.97 117.97± 0.97 113.63± 0.986 131.42± 0.98 115.79± 0.96 111.6± 0.987 132.16± 0.98 115.9± 0.96 113.05± 0.988 130.57± 0.98 116.53± 0.96 112.6± 0.989 130.84± 0.98 115.71± 0.96 113.05± 0.9810 130.41± 0.97 115.72± 0.98 112.8± 0.98



1 171.81± 0.96 191.44± 0.98 190.94± 0.982 177.29± 0.97 194.71± 0.98 193.72± 0.983 177.29± 0.97 194.7± 0.98 191.64± 0.984 179.63± 0.97 193.66± 0.98 193.36± 0.985 180.98± 0.96 193.32± 0.98 195.25± 0.986 180.91± 0.96 193.18± 0.98 195.69± 0.987 180.97± 0.96 192.27± 0.98 195.24± 0.988 180.12± 0.96 191.1± 0.98 195.35± 0.889 179.75± 0.96 191.84± 0.98 195.59± 1.0810 179.76± 0.97 192± 0.98 195.44± 0.78


Run # REDM-BETS REDM-BETS REDM-BETS REDM-BETS1 Range 2 Ranges 1 Range 2 Ranges

Bets from IR100s Bets from IR100s1 152.63± 0.96 125.88± 0.97 153.98± 0.96 127.17± 0.982 144.37± 0.96 121.32± 0.98 143.99± 0.96 122.27± 0.983 136.52± 0.96 117.24± 0.97 139.25± 0.96 118.21± 0.984 135.08± 0.97 115.45± 0.97 135.3± 0.97 118.02± 0.975 132.66± 0.97 117.97± 0.97 134.35± 0.99 118.79± 0.966 131.42± 0.98 115.79± 0.96 132.86± 0.99 116.62± 0.977 132.16± 0.98 115.9± 0.96 132.34± 0.99 116.88± 0.978 130.57± 0.98 116.53± 0.96 131.69± 0.99 116.17± 0.969 130.84± 0.98 115.71± 0.96 131.26± 0.98 118.2± 0.9710 130.41± 0.97 115.72± 0.98 132.6± 0.98 116.88± 0.97


120

Run # REDM-BETS REDM-BETS REDM-BETS REDM-BETS1 Range 2 Ranges 1 Range 2 Ranges

Bets from IR100s Bets from IR100s1 171.81± 0.96 191.44± 0.98 171.7± 0.97 190.68± 0.962 177.29± 0.97 194.71± 0.98 177.46± 0.97 193.46± 0.973 177.29± 0.97 194.7± 0.98 179.39± 0.97 196.28± 0.974 179.63± 0.97 193.66± 0.98 179.29± 0.97 195.69± 0.975 180.98± 0.96 193.32± 0.98 179.17± 0.96 195.03± 0.956 180.91± 0.96 193.18± 0.98 179.3± 0.96 193.97± 0.967 180.97± 0.96 192.27± 0.98 179.81± 0.96 195.01± 0.958 180.12± 0.96 191.1± 0.98 178.24± 0.95 194.1± 0.969 179.75± 0.96 191.84± 0.98 178.14± 0.96 194.5± 0.9610 179.76± 0.97 192± 0.98 176.93± 0.96 194.83± 0.96


Run # REDM-BETS REDM-BETS REDM-BETS REDM-BETS1 Range in IR570s 2 Ranges in IR570s 1 Range in IR100s 2 Ranges in IR100s

1 152.63± 0.96 125.88± 0.97 152.02± 0.94 123.78± 0.982 144.37± 0.96 121.32± 0.98 141.47± 0.93 118.94± 0.983 136.52± 0.96 117.24± 0.97 137.12± 0.94 113.21± 0.984 135.08± 0.97 115.45± 0.97 132.34± 0.95 113.45± 0.975 132.66± 0.97 117.97± 0.97 130.24± 0.97 112.34± 0.976 131.42± 0.98 115.79± 0.96 128.27± 0.98 113.46± 0.977 132.16± 0.98 115.9± 0.96 127.63± 0.97 112.39± 0.978 130.57± 0.98 116.53± 0.96 127.25± 0.97 112.42± 0.969 130.84± 0.98 115.71± 0.96 126.7± 0.98 113.35± 0.9610 130.41± 0.97 115.72± 0.98 128.9± 0.98 112.71± 1.01


Run # REDM-BETS REDM-BETS REDM-BETS REDM-BETS1 Range in IR570s 2 Ranges in IR570s 1 Range in IR100s 2 Ranges in IR100s

1 171.81± 0.96 191.44± 0.98 165.78± 0.94 188.28± 0.982 177.29± 0.97 194.71± 0.98 173.36± 0.94 193.67± 0.983 177.29± 0.97 194.7± 0.98 177.63± 0.94 192.96± 0.984 179.63± 0.97 193.66± 0.98 180.58± 0.94 193.3± 0.985 180.98± 0.96 193.32± 0.98 180.76± 0.94 193.06± 0.986 180.91± 0.96 193.18± 0.98 181.04± 0.94 193.17± 0.987 180.97± 0.96 192.27± 0.98 180.22± 0.95 192.98± 0.988 180.12± 0.96 191.1± 0.98 178.69± 0.94 193.39± 0.989 179.75± 0.96 191.84± 0.98 180.72± 0.95 192.63± 0.9810 179.76± 0.97 192± 0.98 180.32± 0.95 191.29± 0.98


121

Run # REDM-BETS REDM-BETS1 Range 2 Ranges

1 180.26± 0.95 155.04± 0.982 175.3± 0.96 152.57± 0.983 170.87± 0.96 149.02± 0.984 166.57± 0.96 148.02± 0.985 163.19± 0.96 147.02± 0.986 161.72± 0.96 149.02± 0.987 161.24± 0.96 148.02± 0.988 161.77± 0.95 149.52± 0.989 160.82± 0.96 148.22± 1.1910 160.63± 0.97 147.52± 1.38


Run # REDM-BETS REDM-BETS1 Range 2 Ranges

1 140.37± 0.95 159.8± 0.982 149.74± 0.96 161.93± 0.983 148.49± 0.95 161.23± 0.984 150.47± 0.96 159.23± 0.985 151.73± 0.96 158.23± 0.986 152.43± 0.96 157.23± 0.987 153.98± 0.95 159.23± 0.988 153.85± 0.95 158.23± 0.989 151.01± 0.96 159.23± 1.1810 151.55± 0.75 157.23± 1.38


Run # REDM-BETS REDM-BETS REFINED-BETS1 Range 2 Ranges 3 Ranges

100 Extra Bets1 180.26± 0.95 155.04± 0.98 149.06± 0.982 175.3± 0.96 152.57± 0.98 147.14± 0.983 170.87± 0.96 149.02± 0.98 146.78± 0.984 166.57± 0.96 148.02± 0.98 144.29± 0.985 163.19± 0.96 147.02± 0.98 144.49± 0.986 161.72± 0.96 149.02± 0.98 144.09± 0.987 161.24± 0.96 148.02± 0.98 144.59± 0.988 161.77± 0.95 149.52± 0.98 144.39± 0.989 160.82± 0.96 148.22± 1.19 143.89± 0.9810 160.63± 0.97 147.52± 1.38 144.89± 0.98


122

Run # REDM-BETS REDM-BETS REFINED-BETS1 Range 2 Ranges 3 Ranges

200 Extra Bets1 140.37± 0.95 159.8± 0.98 156.52± 0.982 149.74± 0.96 161.93± 0.98 158.78± 0.983 148.49± 0.95 161.23± 0.98 160.45± 0.984 150.47± 0.96 159.23± 0.98 159.5± 15 151.73± 0.96 158.23± 0.98 160± 1.26 152.43± 0.96 157.23± 0.98 161± 17 153.98± 0.95 159.23± 0.98 159.5± 1.18 153.85± 0.95 158.23± 0.98 158.5± 1.29 151.01± 0.96 159.23± 1.18 159.3± 110 151.55± 0.75 157.23± 1.38 159.8± 1.1


123

Bibliography

[1] The Computer Poker Competition. http://www.computerpokercompetition.org/.

[2] The Second Man-Machine Poker Competition. http://manmachinepoker.com.

[3] N. Abou Risk. Using Counterfactual Regret Minimization to Create a Competitive MultiplayerPoker Agent. Master’s thesis, University of Alberta, 2009.

[4] A. Antos, R. Munos, and C. Szepesvari. Fitted Q-iteration in Continuous Action-Space MDPs.In Advances in Neural Information Processing Systems (NIPS), pages 9–16, 2007.

[5] N. Bard, M. Johanson, N. Burch, and M. Bowling. Online Implicit Agent Modelling. InProceedings of the Twelfth International Conference on Autonomous Agents and MultiagentSystems (AAMAS), pages 255–262, 2013.

[6] D. Billings. Algorithms and Assessment in Computer Poker. Master’s thesis, University ofAlberta, 2006.

[7] D. Billings, N. Burch, A. Davidson, R. Holte, J. Schaeffer, T. Schauenberg, and D. Szafron.Approximating Game-Theoretic Optimal Strategies for Full-scale Poker. In Proceedings of theEighteenth International Joint Conference on Artifical Intelligence (IJCAI), 2003.

[8] D. Billings, A. Davidson, J. Schaeffer, and D. Szafron. The Challenge of Poker. ArtificialIntelligence Journal, 2002.

[9] B. Chen and J. Ankenman. The Mathematics of Poker. ConJelCo Publishing Company, 2007.

[10] M. Craig, editor. The Full Tlit Poker Strategy Guide. Grand Central Publishing, 2007.

[11] B. David. An Analog of the Minimax Theorem for Vector Payoffs. Pacific Journal of Mathe-matics, 6(1):1–8, 1956.

[12] T. Dean, K. Kim, and R. Givan. Solving Stochastic Planning Problems with Large State andAction Spaces. In Proceedings of the Fourth International Conference on Artificial IntelligencePlanning Systems (AIPS), pages 102–110, 1998.

[13] M. Flynn, S. Mehta, and E. Miller. Professional No-limit Hold’em. Two Plus Two Publishing,2007.

[14] S. Ganzfried and T. Sandholm. Computing an Approximate Jam/Fold Equilibrium for 3-AgentNo-Limit Texas Hold’em Tournaments. In Proceedings of the International Conference onAutonomous Agents and Multiagent Systems (AAMAS), 2008.

[15] S. Ganzfried and T. Sandholm. Computing Equilibria in Multiplayer Stochastic Games ofImperfect Information. In Proceedings of the International Joint Conference on Artificial In-telligence (IJCAI), 2009.

[16] S. Ganzfried and T. Sandholm. Computing Equilibria by Incorporating Qualitative Models.In Proceedings of the National Conference on Autonomous Agents and Multiagent Systems(AAMAS), 2010.

[17] S. Ganzfried, T. Sandholm, and K. Waugh. Strategy Purification and Thresholding: Effec-tive non-Equilibrium Approaches for Playing Large Games. In Proceedings of the EleventhInternational Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), pages871–878, 2012.

124

[18] R. Gibson, M. Lanctot, N. Burch, D. Szafron, and M. Bowling. Generalized Sampling andVariance in Counterfactual Regret Minimization. In Proceedings of the Twenty-Sixth Interna-tional Conference on Artificial Intelligence (AAAI), pages 1355–1361, 2012.

[19] R. Gibson and D. Szafron. On Strategy Stitching in Large Extensive Form Multiplayer Games.In Advances in Neural Information Processing Systems (NIPS), pages 100–108, 2011.

[20] A. Gilpin, J. Pena, and T. Sandholm. First-Order Algorithm with O(ln(1/epsilon)) Convergencefor epsilon-Equilibrium in Two-Person Zero-Sum Games. In Proceedings of the InternationalConference on Artificial Intelligence (AAAI), 2008.

[21] A. Gilpin and T. Sandholm. A Competitive Texas Hold’em Poker Player via Automated Ab-straction and Real-time Equilibrium Computation. In Proceedings of the International Con-ference on Artificial Intelligence (AAAI), 2006.

[22] A. Gilpin and T. Sandholm. Better Automated Abstraction Techniques for Imperfect Informa-tion Games, with Application to Texas Hold’em Poker. In Proceedings of the InternationalJoint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), 2007.

[23] A. Gilpin and T. Sandholm. Gradient-based Algorithms for Finding Nash Equilibria in Ex-tensive Form Games. In Proceedings of the Third International Workshop on Internet andNetwork Economics (WINE), 2007.

[24] A. Gilpin, T. Sandholm, and T. Soerensen. Potential-aware Automated Abstraction of Sequen-tial Games, and Holistic Equilibrium Analysis of Texas Hold’em Poker. In Proceedings of theInternational Conference on Artificial Intelligence (AAAI), 2007.

[25] A. Gilpin, T. Sandholm, and T. B. Sorensen. A Heads-up No-limit Texas Hold’em PokerPlayer: Discretized Betting Models and Automatically Generated Equilibrium-finding Pro-grams. In Proceedings of the International Conference on Autonomous Agents and MultiagentSystems (AAMAS), 2008.

[26] H. Gintis. Game Theory Evolving. Princeton University Press, 2000.

[27] D. Harrington and B. Robertie. Harrington on Hold’em Volume I. Two Plus Two Publishing,2005.

[28] D. Harrington and B. Robertie. Harrington on Hold’em Volume II. Two Plus Two Publishing,2005.

[29] D. Harrington and B. Robertie. Harrington on Cash Games Volume I. Two Plus Two Publish-ing, 2008.

[30] D. Harrington and B. Robertie. Harrington on Cash Games Volume II. Two Plus Two Pub-lishing, 2008.

[31] H. V. Hasselt and M. Wiering. Reinforcement Learning in Continuous Action Spaces. InIEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, pages272–279, 2007.

[32] J. Hawkin, N. Bard, J. Rubin, and M. Zinkevich. The Annual Computer Poker Competition.AI Magazine, 34(2):112–114, 2013.

[33] J. Hawkin, R. Holte, and D. Szafron. Using Sliding Windows to Generate Action Abstractionsin Extensive-Form Games. In Proceedings of the Twenty-sixth International Conference of theAssociation for the Advancement of Artificial Intelligence (AAAI), pages 1924–1930, 2011.

[34] J. Hawkin, R. Holte, and D. Szafron. Automated Action Abstraction of Imperfect InformationExtensive-form Games. In Proceedings of the Twenty-seventh International Conference of theAssociation for the Advancement of Artificial Intelligence (AAAI), pages 681–687, 2012.

[35] B. Hoehn. The Effectiveness of Opponent Modelling in a Small Imperfect Information Game.Master’s thesis, University of Alberta, 2006.

[36] E. Jackson. Slumbot: An Implementation Of Counterfactual Regret Minimization On Com-modity Hardware. In Computer Poker Symposium at the Twenty-Sixth Conference on ArtificialIntelligence (AAAI), 2012.

125

[37] M. Johanson. Robust Strategies and Counter-Strategies: Building a Champion Level ComputerPoker Player. Master’s thesis, University of Alberta, 2007.

[38] M. Johanson. Measuring the Size of Large No-limit Poker Games. Technical report, Universityof Alberta, Department of Computing Science, 02 2013.

[39] M. Johanson, N. Bard, N. Burch, and M. Bowling. Finding Optimal Abstract Strategies inExtensive Form Games. In Proceedings of the Twenty-Sixth International Conference on Arti-ficial Intelligence (AAAI), pages 1371–1379, 2012.

[40] M. Johanson, N. Bard, M. Lanctot, R. Gibson, and M. Bowling. Efficient Nash EquilibriumApproximation through Monte Carlo Counterfactual Regret Minimization. In Proceedings ofthe Eleventh International Conference on Autonomous Agents and Multi-Agent Systems (AA-MAS), pages 837–844, 2012.

[41] M. Johanson, M. Bowling, K. Waugh, and M. Zinkevich. Accelerating Best Response Cal-culation in Large Extensive Games. In Proceedings of the Twenty-Second International JointConference on Artificial Intelligence (IJCAI), pages 258–265, 2011.

[42] M. Johanson, N. Burch, R. Valenzano, and M. Bowling. Evaluating State-Space Abstrac-tions in Extensive-Form Games. In Proceedings of the Twelfth International Conference onAutonomous Agents and Multiagent Systems (AAMAS), pages 271–278, 2013.

[43] D. Koller, N. Megiddo, and B. V. Stengel. Fast Algorithms for Finding Randomized Strategiesin Game Trees. In Proceedings of the Twenty-Sixth Annual ACM Symposium on the Theory ofComputing, 1994.

[44] H. Kuhn. Extensive Games and the Problem of Information. Annals of Mathematics Studies,Volume 28, 1953.

[45] H. W. Kuhn. Simplified Two-Person Poker. Contributions to the Theory of Games, 1:97–103,1950.

[46] M. Lanctot, K. Waugh, M. Zinkevich, and M. Bowling. Monte Carlo Sampling for RegretMinimization in Extensive Games. In Advances in Neural Information Processing Systems(NIPS), pages 1078–1086, 2009.

[47] A. Lazaric, M. Restelli, and A. Bonarini. Reinforcement Learning in Continuous ActionSpaces through Sequential Monte Carlo Methods. In Advances in Neural Information Pro-cessing Systems (NIPS), pages 833–840, 2007.

[48] C. Madeira, V. Corruble, and G. Ramalho. Designing a Reinforcement Learning-based Adap-tive AI for Large-scale Strategy Games. In Proceedings of the Second International Confer-ence On Artificial Intelligence and Interactive Digital Entertainment (AIIDE), pages 121–123,2006.

[49] S. Mehrotra. On the Implementation of a Primal-dual Interior Point Method. SIAM Journal onOptimization, 2(4):575–601, 1992.

[50] P. B. Miltersen and T. B. Sorensen. A Near-Optimal Strategy for a Heads-up No-limit TexasHoldem Poker Tournament. In AAMAS ’07: Proceedings of The 6th International Conferenceon Autonomous Agents and Multiagent Systems, 2007.

[51] C. Moshman. Heads-up No-limit Hold’em. Two Plus Two Publishing, 2008.

[52] J. A. Nelder and R. Mead. A Simplex Method for Function Minimization. Computer Journal,7(4):308–313, 1965.

[53] Y. Nesterov. Excessive Gap Technique in Non-smooth Convex Minimization. SIAM J. onOptimization Volume 16, 2005.

[54] N. Nisan, T. Roughgarden, E. Tardos, and V. V. Vazirani, editors. Algorithmic Game Theory.Cambridge University Press, 2007.

[55] J. Rubin and I. Watson. Similarity-Based Retrieval and Solution Re-use Policies in the Gameof Texas Hold’em. In Proceedings of the Eighteenth International Conference on Case-BasedReasoning (ICCBR), pages 465–479, 2010.

126

[56] J. Rubin and I. Watson. Computer poker: a review. Artificial Intelligence, 175(5):958–987,2011.

[57] J. Rubin and I. Watson. Successful Performance via Decision Generalisation in No LimitTexas Hold’em. In Proceedings of the Nineteenth International Conference on Case-BasedReasoning (ICCBR), pages 467–481, 2011.

[58] D. Schnizlein. State Translation in No-limit Poker. Master’s thesis, University of Alberta,2009.

[59] D. Schnizlein, M. Bowling, and D. Szafron. Probabilistic State Translation in Extensive Gameswith Large Action Sets. In Proceedings of the Twenty-first International Joint Conference onArtifical Intelligence (IJCAI), 2009.

[60] I. Schweizer, K. Panitzek, S. Park, and J. Furnkranz. An Exploitative Monte-Carlo PokerAgent. In Proceedings of the Thirty-second Annual German Conference on AI, 2009.

[61] D. Sklansky. The Theory of Poker. Two Plus Two Publishing, 1987.

[62] D. Sklansky and E. Miller. No-limit Hold’em: Theory and Practice. Two Plus Two Publishing,2006.

[63] N. Sweeney and D. Sinclair. Learning to Play Multi-Player Limit Hold’em Poker by StochasticGradient Ascent. In Proceedings of the Sixth European Workshop on Multi-agent Systems(EUMAS), 2009.

[64] S. Tijs. Stochastic Games with One Big Action Space in Each State. Open Access Publicationsfrom Tilburg University, Tilburg University, 1980.

[65] K. Waugh. Abstraction in Large Extensive Games. Master’s thesis, University of Alberta,2009.

[66] K. Waugh, D. Schnizlein, M. Bowling, and D. Szafron. Abstraction Pathologies in ExtensiveGames. In AAMAS ’09: Proceedings of The Eighth International Conference on AutonomousAgents and Multiagent Systems, 2009.

[67] K. Waugh, M. Zinkevich, M. Johanson, M. Kan, D. Schnizlein, and M. Bowling. A PracticalUse of Imperfect Recall. In Proceedings of the Eighth Symposium on Abstraction, Reformula-tion and Approximation (SARA), 2009.

[68] M. Zinkevich, M. Johanson, M. Bowling, and C. Piccione. Regret Minimization in Gameswith Incomplete Information. In Advances in Neural Information Processing Systems (NIPS),2007.

127

Documents

Data: You had nothing! Geordi: He bluffed you, Data. Data: It …poker.cs.ualberta.ca/publications/Hawkin_PhD.pdf · 2018. 4. 14. · Data: You had nothing! Geordi: He bluffed you,