Poker for Fun and Profit (and intellectual challenge)
Robert Holte
Computing Science Dept.
University of Alberta
Poker
World Series of Poker
Poker Research Group - core
• Darse Billings (Ph.D.)
• Aaron Davidson (M.Sc.), Poki
• Neil Burch (P/A), PsOpti
• Terence Schauenberg (M.Sc.), Adapti
• Advisors: J Schaeffer, D Szafron
Poker Research Group – new arrivals
• Bret Hoehn (M.Sc.)
• Finnegan Southey (postdoc)
• Michael Bowling
• Dale Schuurmans
• Rich Sutton
• Robert Holte
Our Goal
PsOpti2 vs. “theCount”
Play Us Online
http://games.cs.ualberta.ca/poker/
Poki’s Poker Academy
http://poki-poker.com
Poker Variants
• Many different variants of poker
• Texas Hold’em the most skill-testing
• No-Limit Texas Hold’em used to determine the world champion
• Our research: Limit Texas Hold’em
• Current focus: 2-player (heads up)
2-player, limit, Texas Hold’em
[Figure: game tree, O(10^18) states. Initial deal, 2 private cards to each player: 1,624,350 outcomes. Flop, 3 community cards: 17,296 outcomes. Turn, 1 community card: 45 outcomes. River, 1 community card: 44 outcomes. Each deal is followed by a bet sequence (19 per round, 9 of 19 continuing to the next round).]
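The counts in the game-tree figure can be checked with a few lines of arithmetic. The sketch below is a rough estimate only: it assumes that 9 of the 19 bet sequences carry play into the next round and it multiplies chance outcomes by bet sequences, ignoring interior nodes. The function name `rough_tree_size` is introduced here for illustration.

```python
from math import comb

# Chance outcomes at each deal.
deals = comb(52, 2) * comb(50, 2)   # 2 private cards to each of 2 players
flops = comb(48, 3)                 # 3 community cards from the remaining 48
turns = 45                          # 1 community card from the remaining 45
rivers = 44                         # 1 community card from the remaining 44

# Assumption: 9 of the 19 bet sequences in a round continue to the next round.
CONTINUING = 9
ALL_SEQUENCES = 19

def rough_tree_size():
    """Order-of-magnitude estimate: chance outcomes times betting paths."""
    chance = deals * flops * turns * rivers
    betting = CONTINUING ** 3 * ALL_SEQUENCES   # three rounds continue, last one ends
    return chance * betting

print(deals, flops)
print(f"{rough_tree_size():.2e}")
```

The first two values reproduce the 1,624,350 and 17,296 shown in the figure, and the estimate lands within an order of magnitude of 10^18.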
Research Issues
1. Chance events
2. Imperfect information
3. Sheer size of the game tree
4. Opponent modelling is crucial
5. How best to use domain knowledge?
6. Experimental method
Variants have even more challenges:
– More than 2 players (up to 10)
– “No limit” (bet any amount)
Issues: Chance Events
• Utility of outcomes
– currently just reason about expected payoff
– short-term vs. long-term
• High variance
– was the outcome due to luck or skill?
– experiment design
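To see why high variance complicates experiment design, a standard-error calculation is enough. The sketch below assumes independent hands and a normal approximation; the function name `hands_needed` and the numbers in the usage line are illustrative assumptions, not measurements from the group.

```python
import math

def hands_needed(edge, stdev, z=1.96):
    """Hands required before an average win of `edge` per hand exceeds
    z standard errors, given a per-hand standard deviation `stdev`.
    Assumes independent hands (a simplification)."""
    return math.ceil((z * stdev / edge) ** 2)

# Hypothetical numbers: an edge of 0.05 small bets per hand against
# a per-hand standard deviation of 6 small bets.
print(hands_needed(0.05, 6.0))
```

Halving the edge quadruples the number of hands required, which is why a short match says little about which player is better.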
Issues: Imperfect Information
• Probabilistic strategies are essential
• Cannot construct your strategy in a bottom-up manner, as is done with perfect information games
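A toy game makes the need for probabilistic strategies concrete. The sketch below uses rock-paper-scissors as a stand-in for poker (an assumption made for brevity) and compares the worst-case payoff of a deterministic strategy with that of a uniformly random one.

```python
# Payoff to the row player; rows and columns are rock, paper, scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def worst_case(strategy):
    """Expected payoff when the opponent picks the best counter-action."""
    return min(
        sum(p * PAYOFF[i][j] for i, p in enumerate(strategy))
        for j in range(3)
    )

always_rock = [1.0, 0.0, 0.0]
uniform = [1 / 3, 1 / 3, 1 / 3]

print(worst_case(always_rock))  # a predictable player loses every time
print(worst_case(uniform))      # the random player cannot be exploited
```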
Issues: Size of the game
• 2-player, Limit, Texas Hold’em game tree has about 10^18 states
• Linear Programming can solve games with 10^8 states
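For reference, the textbook linear program for a two-player zero-sum matrix game is shown below. The slides do not say which formulation the group uses, so the payoff matrix A (rows are our pure strategies, columns the opponent's), the mixed strategy x, and the game value v are notation introduced here.

```latex
\begin{aligned}
\max_{x,\,v}\quad & v \\
\text{s.t.}\quad & \sum_{i} x_i A_{ij} \ge v \quad \text{for every column } j, \\
& \sum_{i} x_i = 1, \qquad x_i \ge 0 .
\end{aligned}
```

The number of unknowns and constraints grows with the size of the game description, which is why the gap between 10^8 and 10^18 matters.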
Issues: Opponent Modelling
• Nash equilibrium not good enough
– Static
– Defensive
• Even the best humans have weaknesses that should be exploited
• How to learn very quickly, with very noisy information?
– Exploitation vs. exploration
• How not to be exploited yourself ?
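The trade-off on this slide can be illustrated with a toy example. The sketch below again uses rock-paper-scissors as a stand-in (an assumption), with a hypothetical opponent who plays rock too often: the equilibrium strategy wins nothing from that opponent, while a best response to an accurate opponent model does.

```python
# Payoff to the row player; rows and columns are rock, paper, scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def expected_payoff(mine, theirs):
    """Expected payoff of mixed strategy `mine` against `theirs`."""
    return sum(p * q * PAYOFF[i][j]
               for i, p in enumerate(mine)
               for j, q in enumerate(theirs))

def best_response(theirs):
    """Pure strategy with the highest expected payoff against `theirs`."""
    values = [sum(q * PAYOFF[i][j] for j, q in enumerate(theirs))
              for i in range(3)]
    best = max(range(3), key=values.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(3)]

equilibrium = [1 / 3, 1 / 3, 1 / 3]
biased = [0.5, 0.25, 0.25]   # hypothetical opponent who overplays rock

print(expected_payoff(equilibrium, biased))            # static and defensive
print(expected_payoff(best_response(biased), biased))  # exploits the bias
```

The best response here is "always paper", which is itself fully exploitable by an opponent who notices it. That is the "how not to be exploited yourself" question in miniature.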
Issues: Using Expert Knowledge
• We are fortunate to have unlimited access to a poker-playing expert (Darse)
• How best to use his knowledge?
– Expert system (explicitly encoded knowledge) was not effective
– Used his knowledge to devise abstractions that reduced the game size with minimal impact on strategic aspects of the game
– Use him to evaluate the system
Experimental Method
• High variance
• ‘bot play not the same as human play
• Very limited access to expert humans other than our own expert
Coping with very large games
[Figure: the full game tree T is too big to solve. A (lossy) abstraction produces an abstract game tree T*, which is solved (LP) to give a strategy for T*. A reverse mapping turns that into a strategy for T.]
Abstraction
• Texas Hold'em 2-player game tree is too big for current LP solvers (1,179,000,604,565,715,751)
• Many ways of doing the abstractions
– We require coarse-grained abstractions
– Avoiding a severe loss of accuracy
• Abstract to a set of smaller problems: 10^8 states, 10^6 equations and unknowns
Alternate Game Structures
• Truncation of betting rounds
• Bypassing betting rounds
• Models with 3 rounds, 2 rounds, or 1 round
• Many-to-one mapping of game-tree nodes to single nodes in the abstract game tree
– How you do the mapping determines the overall accuracy (few good and many bad mappings)
– This is the limiting factor of the method
[Figure: the Texas Hold'em game tree, O(10^18), with alternative models marked on it: a 3-round model (expected value leaf nodes), a 3-round postflop model (single flop), and a 1-round preflop model.]
Abstractions
• Board Q – 7 – 2
• Compare 1. A–3  2. A–4  3. A–K
– Suit isomorphism (24X) (exact)
– Rank near-equivalence (small error)
• Bucketing: hands are mapped to a small set of buckets depending on
– Current hand strength
– Potential for improvement in hand strength
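Suit isomorphism can be sketched as a canonical relabelling: two hands are equivalent if some permutation of the four suits maps one onto the other, and there are 4! = 24 such permutations. The card encoding (rank character followed by suit character) and the function name `canonical` are assumptions made for this illustration.

```python
from itertools import permutations

SUITS = "cdhs"

def canonical(cards):
    """Smallest relabelling of `cards` over all 24 suit permutations.
    Each card is a two-character string such as "Ah" or "3s"."""
    best = None
    for perm in permutations(SUITS):
        relabel = dict(zip(SUITS, perm))
        key = tuple(sorted(rank + relabel[suit] for rank, suit in cards))
        if best is None or key < best:
            best = key
    return best

# Suited A-3 is the same hand in any suit, but differs from offsuit A-3.
print(canonical(["Ah", "3h"]) == canonical(["As", "3s"]))
print(canonical(["Ah", "3h"]) == canonical(["Ah", "3s"]))
```

Because the relabelling loses no strategic information, this abstraction is exact, unlike rank near-equivalence or bucketing.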
Bucketing
• Reduce branching factor at chance nodes
• Partition hands into six classes per player
• Overlaying strategically similar sub-trees
[Figure: original bucketing (bucket pairs 1,1 1,2 1,3 … 6,6) linked by transition probabilities to next round bucketing (bucket pairs 1,1 1,2 1,3 … 6,6).]
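A minimal sketch of bucketing and transition probabilities follows. It assumes each hand has already been reduced to a single score in [0, 1] and uses equal-width buckets. The slides say the real buckets depend on both hand strength and potential, so this is a simplification, and all names here are hypothetical.

```python
from collections import Counter

NUM_BUCKETS = 6

def bucket(score):
    """Map a score in [0, 1] to a bucket 1..6 (equal-width, an assumption)."""
    return min(NUM_BUCKETS, int(score * NUM_BUCKETS) + 1)

def bucket_pair(scores):
    """Bucket pair for the two players, e.g. (1, 6)."""
    return tuple(bucket(s) for s in scores)

def transition_probabilities(samples):
    """Estimate P(next pair | current pair) from sampled deals.
    Each sample is (scores_now, scores_next), one score per player."""
    counts = Counter((bucket_pair(now), bucket_pair(nxt)) for now, nxt in samples)
    totals = Counter()
    for (src, _), n in counts.items():
        totals[src] += n
    return {(src, dst): n / totals[src] for (src, dst), n in counts.items()}

# Two hypothetical samples that start in bucket pair (1, 6).
samples = [((0.1, 0.9), (0.1, 0.9)), ((0.1, 0.9), (0.5, 0.9))]
print(transition_probabilities(samples))
```

With six buckets per player there are 36 bucket pairs per round, so each chance node in the abstract game has at most 36 branches.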
[Figure: the Texas Hold'em game tree, O(10^18), beside the abstract postflop model, O(10^7), and the abstract preflop model, O(10^7). In the abstract models the chance nodes are labelled w^2, x^2, y^2, z^2 (36 each) and the betting rounds have 15 bet sequences, 7 of 15 continuing.]
Reverse Mapping
• Bucket splitting
– LP solution gives a strategy (recipe)
– Each partition class split strong / weak
– Split the randomized mixed strategy
– {0, 0.2, 0.8} => {0, 0, 1.0} & {0, 0.4, 0.6}
• Better hand selection (with some risk)
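One way to reproduce the split shown above is sketched here. It assumes the two halves of a bucket are equally likely and that actions are ordered from most passive to most aggressive; neither assumption is stated on the slide. The strong half takes the most aggressive actions first, and the weak half receives what is left.

```python
def split_mixed_strategy(probs):
    """Split a mixed strategy into (strong_half, weak_half) whose
    average equals `probs`. Actions are ordered passive -> aggressive."""
    doubled = [2.0 * p for p in probs]       # total mass 2: one unit per half
    strong = [0.0] * len(probs)
    budget = 1.0
    for i in reversed(range(len(probs))):    # most aggressive action first
        take = min(budget, doubled[i])
        strong[i] = take
        budget -= take
    weak = [d - s for d, s in zip(doubled, strong)]
    return strong, weak

strong, weak = split_mixed_strategy([0.0, 0.2, 0.8])
print(strong, weak)
```

Averaging the two halves recovers the original mixed strategy, so the split changes which hands take each action, not how often each action is taken overall.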
Putting It All Together – PsOpti1
[Figure: PsOpti1. A Selby preflop model covers the preflop round and leads into four postflop models (flop, turn, river), one for each preflop total of 2, 4, 6, or 8 bets.]
Putting It All Together – PsOpti2
[Figure: PsOpti2. A 3-round preflop model leads into seven postflop models (flop, turn, river), selected by bets + model, for 2, 4, 4, 6, 6, 8, 8 bets.]
Conclusions
• Game Theory can be applied to large problems and practical systems
• Nash Equilibrium (minimax) too defensive, does not exploit the opponent’s weaknesses
• Current work involves opponent modelling
– Preliminary results are very promising
• We hope to beat the best poker players in the world in the near future