Artificial Intelligence and Optimization with Parallelism


My habilitation thesis, 2011


1. HABILITATION: Artificial Intelligence with Parallelism. Acknowledgments: all the TAO team; people in Liège, Taiwan, LRI, Artelys, Mash, Iomca, ... Thanks a lot to the committee. Thanks and good recovery to Jonathan Shapiro. Thanks to Grid'5000. Olivier Teytaud, [email protected]

2. Introduction: What is AI? Why evolutionary optimization is a part of AI. Why parallelism? Evolutionary computation: comparison-based optimization, parallelization, noisy cases. Sequential decision making: fundamental facts, Monte-Carlo Tree Search. Conclusion.
3. AI = using computers where they are weak / weaker than humans (thanks Michèle S.). Difficult optimization (complex structure, noisy objective functions). Games (difficult ones). Key difference with many operational research works: AI = choosing a model as close as possible to reality and (very) approximately solving it; OR = choosing the best model that you can solve almost exactly.
7. Many works are about numbers: providing standard deviations, rates, etc. Other goal (more ambitious?): switching from something which does not work to something which works. E.g. vision; a computer can distinguish:
8.
But it can't distinguish so easily:
9. And it's a disaster for categorizing: children, women, panda, babies, men, bears, trucks, cars.
11. A 3-year-old: she can do it.
12. ==> AI = focus on things which do not work and (hopefully) make them work.
13. (Outline: evolutionary computation.)
14. Evolutionary optimization is a part of A.I. Often considered as bad, because many EO tools are not that hard, mathematically speaking. I've met people using randomized mutations and cross-overs, but who did not call this "evolutionary" or "genetic", because it would be bad.
15. Gives a lot of freedom: choose your operators (depending on the problem); choose your population size λ (depending on your computer/grid); choose μ (carefully), e.g. μ = min(dimension, λ/4). ==> Can work on strange domains.
16. Voronoi representation of a shape: a family of points (thanks Marc S.).
18. Voronoi representation: a family of points + their labels.
19. ==> Cross-over makes sense ==> you can optimize a shape.
21. Great substitute for averaging ("on the benefit of sex").
22. Cantilever optimization: Hamda et al., 2000.
23. (Outline.) Introduction: What is AI?
Why evolutionary optimization is a part of AI. Why parallelism? Evolutionary computation: comparison-based optimization, parallelization, noisy cases. Sequential decision making: fundamental facts, Monte-Carlo Tree Search. Conclusion.
24. Parallelism: multi-core machines, clusters, grids. Sometimes parallelization completely changes the picture.
25. (Thank you G5K.) Sometimes not. We want to know when.
26. (Outline; noisy cases: robustness, slow rates.)
27. Derivative-free optimization of f: no gradient! Only depends on the x's and the f(x)'s.
28. Why derivative-free optimization? OK, it's slower. But sometimes you have no derivative. It's simpler (by far) ==> fewer bugs. It's more robust (to noise, to strange functions...).
33. Optimization algorithms: Newton; quasi-Newton (BFGS); gradient descent; ...; derivative-free optimization (doesn't need gradients).
34. Why derivative-free optimization?
OK, it's slower; but sometimes you have no derivative; it's simpler (by far); it's more robust (to noise, to strange functions...).
35. Comparison-based optimization (coming soon): just needs comparisons — including evolutionary algorithms.
36. Comparison-based optimization: yi = f(xi). An algorithm is comparison-based if it only uses the comparisons between the yi, never their values. (Slide footer here and below: parallel evolution.)
37. Population-based comparison-based algorithms:
X(1) = (x(1,1), x(1,2), ..., x(1,λ)) = Opt()
X(2) = (x(2,1), x(2,2), ..., x(2,λ)) = Opt(X(1), signs of differences)
...
X(n) = (x(n,1), x(n,2), ..., x(n,λ)) = Opt(X(n-1), signs of differences)
38. Population-based comparison-based algorithms with an internal state:
(X(1), I(1)) = Opt()
(X(2), I(2)) = Opt(X(1), I(1), signs of differences)
...
(X(n), I(n)) = Opt(X(n-1), I(n-1), signs of differences)
39. Comparison-based algorithms are robust. Consider f: X --> R; we look for x* such that, for all x, f(x*) <= f(x). ==> What if we see g∘f (g increasing)? ==> x* is the same, but xn might change.
40. Robustness of comparison-based algorithms, formal statement: this does not depend on g for a comparison-based algorithm; hence a comparison-based algorithm is optimal for this robust criterion.
41. Complexity bounds (N = dimension): number of fitness evaluations needed for precision ε with probability at least 1−δ, for all f. exp(−convergence ratio) = convergence rate. Convergence ratio ~ 1 / computational cost ==> more convenient than the convergence rate for speed-ups.
42. Complexity bounds, basic technique: we want to know how many iterations we need for reaching precision ε in an evolutionary algorithm.
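The robustness statement of slides 39-40 can be checked numerically: a comparison-based step such as truncation selection returns exactly the same points for f and for g∘f with g increasing, since only the order of fitness values is used. A minimal sketch (function names are mine):

```python
import math

def truncation_select(points, fitness, mu):
    """Comparison-based step: keep the mu best points.

    Only the *order* of fitness values matters, never their magnitude."""
    return sorted(points, key=fitness)[:mu]

f = lambda x: (x - 3.0) ** 2                  # original objective
g_of_f = lambda x: math.exp(f(x)) + 5.0       # g increasing => same comparisons

points = [0.0, 1.0, 2.5, 3.2, 4.0, 7.0]
sel_f = truncation_select(points, f, 3)
sel_g = truncation_select(points, g_of_f, 3)
print(sel_f == sel_g)  # True: selection is invariant under increasing g
```

Any increasing g leaves every comparison-based algorithm's trajectory unchanged, which is the formal robustness of slide 40.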
Key observation: (most) evolutionary algorithms are comparison-based. Let's consider (for simplicity) a deterministic selection-based non-elitist algorithm. First idea: how many different branches do we have in a run? We select μ points among λ; therefore, at most K = λ! / (μ! (λ−μ)!) different branches per iteration. Second idea: how many different answers should we be able to give? Use packing numbers (ε-balls): at least N(ε) different possible answers.
49. Complexity bounds, basic technique (continued): we want to know how many iterations we need for reaching precision ε in an evolutionary algorithm.
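The two counting ideas combine into K^n ≥ N(ε), i.e. n ≥ log N(ε) / log K iterations. A quick numeric sketch; the packing-number model N(ε) ≈ (1/ε)^d for a unit domain is an illustrative assumption, and the function name is mine:

```python
import math

def iterations_lower_bound(lam, mu, eps, dim):
    """Lower bound on the number n of iterations, from K^n >= N(eps).

    K = C(lam, mu): number of possible selections of mu points among lam,
    i.e. branches per iteration of a deterministic selection-based algorithm.
    N(eps) ~ (1/eps)^dim: illustrative packing number of the domain."""
    K = math.comb(lam, mu)
    log_N = dim * math.log(1.0 / eps)
    return log_N / math.log(K)

# Example: lambda = 10, mu = 5, precision 1e-3, dimension 10
n = iterations_lower_bound(10, 5, 1e-3, 10)
```

The bound grows linearly with the dimension but only shrinks like 1/log C(λ,μ) with the population size — the log(λ) speed-up limit discussed later.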
Same counting argument as above; conclusion: the number n of iterations must satisfy K^n ≥ N(ε).
50. Complexity bounds on the convergence ratio. FR: full ranking (selected points are ranked); SB: selection-based (selected points are not ranked).
51. This is why I love cross-over.
52. Fournier, T., 2009; using VC-dimension.
53. Quadratic functions easier than sphere functions? But not for translation-invariant quadratic functions...
54. Covers existing results; compliant with discrete domains.
55. (Outline.) Introduction: What is AI?
Why evolutionary optimization is a part of AI. Why parallelism? Evolutionary computation: comparison-based optimization, parallelization — 1) mathematical proof that all comparison-based algorithms can be parallelized (log speed-up); 2) practical hint: simple tricks for some well-known algorithms — noisy cases. Sequential decision making: fundamental facts, Monte-Carlo Tree Search. Conclusion.
56. Speculative parallelization with branching factor 3: consider the sequential algorithm (iteration 1).
57. (Iteration 2.)
58. (Iteration 3.)
59. Parallel version for D = 2: population = union of all populations for 2 iterations.
60. Automatic parallelization: Teytaud, T., PPSN 2010.
61. (Outline repeated.)
62. Necessary condition for a log(λ) speed-up: −E log(σ*) ~ log(λ). But for many algorithms, −E log(σ*) = O(1) ==> asymptotically constant speed-up.
63. These algorithms do not reach the log(λ) speed-up: the (1+1)-ES with the 1/5th rule, standard CSA, standard EMNA, standard SA. Teytaud, T., PPSN 2010.
64. Example 1: Estimation of Multivariate Normal Algorithm (EMNA).
While (I have time) {
  Generate λ points (x1, ..., xλ) distributed as N(x, σ)
  Evaluate the fitness at x1, ..., xλ
  x = mean of the μ best points
  σ = standard deviation of the μ best points
  σ /= log(λ/7)^(1/d)
}
65. Example 2: log(λ) correction for mutative self-adaptation:
μ = min(λ/4, d).
While (I have time) {
  Generate λ step-sizes (σ1, ..., σλ) as σ × exp(−k·N(0,1))
  Generate λ points (x1, ..., xλ), with xi distributed as N(x, σi)
  Select the μ best points
  Update x (= mean), update σ (= logarithmic mean)
}
66. Log(λ) corrections (SA, dimension 3). In the discrete case (XPs): automatic parallelization is surprisingly efficient. Simple trick in the continuous case: −E log(σ*) should be linear in log(λ) (this provides corrections which work for SA and CSA).
68. SUMMARY of the EA part up to now: evolutionary algorithms are robust (with a precise statement of this robustness); evolutionary algorithms are somehow slow (precisely quantified...); evolutionary algorithms are parallel (at least until λ ~ the dimension, for the convergence rate). Now, noisy optimization.
70. (Outline: noisy cases.)
71. Many works focus on fitness functions with small noise: f(x) = ||x||² × (1 + Gaussian noise). This is because the more realistic case, f(x) = ||x||² + Gaussian noise (variance > 0 at the optimum), is too hard for publishing nice curves.
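The EMNA loop of slide 64 can be made runnable; below, a minimal sketch on the noise-free sphere function. The σ /= log(λ/7)^(1/d) correction follows the slide; the population parameters (λ = 40, μ = 10), the iteration budget, and the stopping rule are illustrative choices:

```python
import math, random

def emna(dim, lam=40, mu=10, iters=60, seed=0):
    """EMNA with the log(lambda) step-size correction of slide 64:
    sigma /= log(lam/7)^(1/dim)."""
    rng = random.Random(seed)
    sphere = lambda x: sum(t * t for t in x)       # noise-free sphere fitness
    x, sigma = [1.0] * dim, 1.0
    for _ in range(iters):
        pop = [[xi + sigma * rng.gauss(0, 1) for xi in x] for _ in range(lam)]
        best = sorted(pop, key=sphere)[:mu]        # comparison-based selection
        x = [sum(p[i] for p in best) / mu for i in range(dim)]
        # standard deviation of the selected points, averaged over coordinates
        sigma = sum(
            math.sqrt(sum((p[i] - x[i]) ** 2 for p in best) / mu)
            for i in range(dim)
        ) / dim
        sigma /= math.log(lam / 7) ** (1 / dim)    # log(lambda) correction
    return x, sphere(x)

x, fx = emna(dim=3)
print(fx)  # final fitness; the run improves on the initial f([1,1,1]) = 3
```

Without the final division, σ shrinks too slowly for large λ and the speed-up saturates; the correction restores the −E log(σ*) ~ log(λ) behavior of slide 66.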
==> See however Arnold & Beyer, 2006. ==> A tool: races (Heidrich-Meisner et al., ICML 2009): reevaluating until statistically significant differences; but we must (sometimes) limit the number of reevaluations.
73. Another difficult case: Bernoulli functions: fitness(x) = B(f(x)), with f(0) not necessarily = 0.
74. EDA + races, based on max-uncertainty (Coulom). I like this case with p = 2. We prove good results here (in both settings).
79. (Outline: sequential decision making.)
80. The game of Go is a part of AI. Computers are ridiculous in front of children. An easy situation, termed "semeai"; requires a little bit of abstraction.
81. 800 cores, 4.7 GHz, a top-level program: plays a stupid move.
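The "races" tool of slide 72 (reevaluate until statistically significant differences, with a cap on the number of reevaluations) can be sketched with a Hoeffding confidence radius; the bound, the confidence level delta, the cap, and all names are illustrative choices:

```python
import math, random

def race(candidates, noisy_fitness, delta=1e-3, max_evals=10000, seed=0):
    """Race candidates on a noisy fitness: reevaluate all of them until the
    best one is statistically significantly ahead, or the cap is reached."""
    rng = random.Random(seed)
    sums = [0.0] * len(candidates)
    n = 0
    while n < max_evals:
        n += 1
        for i, c in enumerate(candidates):
            sums[i] += noisy_fitness(c, rng)
        means = [s / n for s in sums]
        radius = math.sqrt(math.log(2 / delta) / (2 * n))   # Hoeffding radius
        ranked = sorted(range(len(candidates)), key=lambda i: -means[i])
        if means[ranked[0]] - radius > means[ranked[1]] + radius:
            return ranked[0]          # statistically significant winner
    return ranked[0]                  # cap reached: return the current best

# Bernoulli fitness as on slide 73: the candidate's quality is its success proba
bern = lambda p, rng: 1.0 if rng.random() < p else 0.0
winner = race([0.9, 0.2], bern)
```

The cap matters near the optimum of a Bernoulli landscape: when two candidates are nearly tied, the race would otherwise reevaluate forever.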
82. The game of Go is a part of AI. Computers are ridiculous in front of children. 8 years old, little training: finds the good move.
83. (Outline: Monte-Carlo Tree Search.)
84. Monte-Carlo Tree Search: 1. Games (a bit of formalism); 2. Decidability / complexity. (Slide footer: Games with simultaneous actions — Paris, 1st of February.)
85. A game is a directed graph + actions + players (White/Black) + observations (Bob, Bear, Bee...) + rewards (on leaves only!) + loops.
92. Complexity (2 players, no randomness):
                          Unbounded horizon     Exponential horizon          Polynomial horizon
  Full observability      EXP                   EXP                          PSPACE
  No obs (X = 100%)       EXPSPACE              NEXP (Hasslum et al., 2000)
  Partially observable    2EXP (Rintanen, 97)   EXPSPACE
  (X = 100%)
  Simult. actions         ?                     EXPSPACE                     ?
Deciding whether the winning probability is > 0.5 is undecidable ==> also undecidable for deciding whether the proba is > 0.6 or not: it is not just a subtle precision trouble.
104. Monte-Carlo Tree Search: the MCTS principle, but with EXP3 in the nodes for hidden information.
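EXP3, the bandit placed in the nodes for hidden information (and summarized on the "EXP3 in one slide" slide), can be sketched as follows; the exploration parameter gamma, the toy two-arm setting, and the class name are illustrative:

```python
import math, random

class Exp3Bandit:
    """EXP3: exponential weights with uniform exploration, rewards in [0, 1]."""
    def __init__(self, n_arms, gamma=0.1, seed=0):
        self.w = [1.0] * n_arms
        self.gamma = gamma
        self.rng = random.Random(seed)

    def probs(self):
        total, k = sum(self.w), len(self.w)
        return [(1 - self.gamma) * wi / total + self.gamma / k for wi in self.w]

    def draw(self):
        return self.rng.choices(range(len(self.w)), weights=self.probs())[0]

    def update(self, arm, reward):
        p = self.probs()[arm]
        x_hat = reward / p                       # importance-weighted estimate
        self.w[arm] *= math.exp(self.gamma * x_hat / len(self.w))

# Arm 0 always pays 1, arm 1 always pays 0: EXP3 concentrates on arm 0,
# while the gamma term keeps every arm played with positive probability.
b = Exp3Bandit(2)
for _ in range(200):
    a = b.draw()
    b.update(a, 1.0 if a == 0 else 0.0)
```

The importance weighting (dividing the reward by the probability of the arm played) is what makes the estimates unbiased against an adversary, and the floor of gamma/k on every probability is what makes EXP3 suitable for mixed (Nash) strategies, unlike UCB.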
105. UCT (Upper Confidence Trees): Coulom (06); Chaslot, Saito & Bouzy (06); Kocsis & Szepesvari (06).
111. Exploitation: SCORE = 5/7 + k·sqrt(log(10)/7).
115. ... or exploration? SCORE = 0/2 + k·sqrt(log(10)/2).
116. Binary win/loss games: no exploration! (Berthier, D., T., 2010.)
117. Games vs pros in the game of Go: first win in 9x9; first win over 5 games in 9x9 blind Go; first win with H2.5 in 13x13 Go; first win with H6 in 19x19 Go; first win with H7 in 19x19 Go vs a top pro.
118. Simultaneous actions: replace UCB with EXP3 / INF.
119. MCTS for simultaneous actions: "Player 1 plays" = max UCB node; "Player 2 plays" = min UCB node; "both players play" = EXP3 node; and so on down the tree.
121. MCTS for hidden information: for each player, one EXP3 node per observation set (observation sets 1, 2, 3, ...). (Incrementally + application to phantom tic-tac-toe: see D. Auger, 2010. Thanks Martin.)
123. EXP3 in one slide: Grigoriadis et al.; Auer et al.; Audibert & Bubeck, COLT 2009.
124. Application to Urban Rivals ==> (simultaneous actions).
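The scores on the exploitation/exploration slides are instances of the UCB formula: empirical mean plus an exploration bonus that shrinks with the number of visits. A minimal sketch reproducing the slides' two examples (the function name is mine):

```python
import math

def ucb_score(wins, visits, parent_visits, k=1.0):
    """UCB: empirical mean plus an exploration bonus shrinking with visits."""
    return wins / visits + k * math.sqrt(math.log(parent_visits) / visits)

exploit = ucb_score(5, 7, 10)   # the slides' exploitation example: 5/7 + ...
explore = ucb_score(0, 2, 10)   # the slides' exploration example: 0/2 + ...
```

With k = 1 the well-visited 5/7 node scores higher; with a larger k (e.g. k = 2) the rarely visited 0/2 node overtakes it, so k alone sets the exploration/exploitation trade-off — and in binary win/loss games the slides note that no exploration term at all works best.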
125. Let's have fun with Urban Rivals (4 cards). Each player has: four cards (each one can be used once), 12 pilz (each one can be used once), 12 life points. Each card has: one attack level, one damage, special effects (forget that...). Four turns: P1 attacks P2, P2 attacks P1, P1 attacks P2, P2 attacks P1.
126. First, the attacker plays: he chooses a card and chooses (PRIVATELY) a number of pilz. Attack level = attack(card) × (1 + nb of pilz). Then the defender plays: he chooses a card and a number of pilz. Defense level = attack(card) × (1 + nb of pilz). Result: if attack > defense, the defender loses Power(attacker's card); else, the attacker loses Power(defender's card).
127. ==> The MCTS-based AI is now at the best human level. Experimental (only) remarks on EXP3: discarding strategies with a small number of simulations = a better approximation of the Nash; also an improvement by taking into account the other bandit; virtual simulations (inspired by Kummer).
128. When is MCTS relevant? Robust in front of: high dimension; non-convexity of the Bellman values; complex models; delayed rewards; simultaneous actions; partial information. More difficult for: high values of H; model-free settings; highly unobservable cases (Monte-Carlo, but not Monte-Carlo Tree Search: see Cazenave et al.); lack of a reasonable baseline for the MC.
129. (References: T., Dagstuhl 2010; D. Auger, EvoStar 2011; unpublished results on undecidability; some endgames results.)
130. Conclusion. Evo.
Opt.: robustness, tight bounds, simple algorithmic modifications for a better speed-up (SA, the 1/5th rule, (CSA)). MCTS: just great (but requires a model); UCB is not necessary; extension to hidden information (remark: undecidability); PO endgames; but no abstraction power. Noisy optimization: consider high noise; use QR and learning (in all EAs, in fact). Not mentioned here: multimodal, multi-objective, GP, bandits.
131. Future? Solving semeais? Would involve great AI progress, I think... Noisy optimization: there are still things to be done ==> promoting high-noise fitness functions, even if it is less publication-efficient. "Inheritance" of the belief state in partially observable games: big progress to be done, crucial for applications. Sparse bandits / mixed stochastic-adversarial cases. Thanks for your attention. Thanks to all collaborators for all I've learnt with them.
132. Appendix 1: MCTS with hidden information.
133. MCTS with hidden information, incremental version:
While (there is time for thinking) {
  s = initial state
  os(1) = ()   os(2) = ()
  while (s not terminal) {
    p = player(s)
    b = Exp3Bandit(os(p))
    d = b.makeDecision(s, o)
    (s, o) = transition(s, d)
    os(p) = os(p) + (o)
  }
  send the reward to all bandits used in the simulation
}
Possibly refine the family of bandits between two simulations.
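The appendix loop can be instantiated on a tiny hidden-information game; below, a runnable sketch on matching pennies (player 1 hides a bit, player 2 guesses without observing it), with one EXP3 bandit per (player, observation sequence) exactly as in the pseudocode. The game, the parameters, and all names are illustrative choices:

```python
import math, random

class Exp3:
    """EXP3 bandit (rewards in [0, 1]); weights rescaled to avoid overflow."""
    def __init__(self, n, gamma=0.1, rng=None):
        self.w, self.gamma, self.rng = [1.0] * n, gamma, rng or random.Random(0)
    def probs(self):
        t, k = sum(self.w), len(self.w)
        return [(1 - self.gamma) * wi / t + self.gamma / k for wi in self.w]
    def draw(self):
        return self.rng.choices(range(len(self.w)), weights=self.probs())[0]
    def update(self, arm, reward):
        p = self.probs()[arm]
        self.w[arm] *= math.exp(self.gamma * (reward / p) / len(self.w))
        m = max(self.w)
        if m > 1e100:                      # rescaling leaves probs unchanged
            self.w = [wi / m for wi in self.w]

rng = random.Random(0)
bandits = {}                               # one bandit per (player, obs sequence)
def bandit(player, obs_seq):
    return bandits.setdefault((player, obs_seq), Exp3(2, rng=rng))

p1_ones = 0
for _ in range(2000):                      # "while there is time for thinking"
    moves, history = {}, []
    for player in (1, 2):
        b = bandit(player, ())             # here, neither player observes anything
        a = b.draw()
        moves[player] = a
        history.append((b, a))
    p1_ones += moves[1]
    r2 = 1.0 if moves[2] == moves[1] else 0.0   # player 2 wins on a correct guess
    rewards = {1: 1.0 - r2, 2: r2}
    for (b, a), player in zip(history, (1, 2)):
        b.update(a, rewards[player])       # send the reward to all bandits used

freq = p1_ones / 2000                      # player 1's empirical mixed strategy
```

In self-play the time-averaged strategies approach the Nash equilibrium of this zero-sum game (hide each side with probability 1/2), which UCB-style nodes cannot represent; this is the role of the EXP3 nodes in the loop above.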