Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

CSE473:Ar+ficialIntelligenceAdversarialSearch

Instructor:LukeZe?lemoyerUniversityofWashington

[TheseslideswereadaptedfromDanKleinandPieterAbbeelforCS188IntrotoAIatUCBerkeley.AllCS188materialsareavailableath?p://ai.berkeley.edu.]

GamePlayingState-of-the-Art§  Checkers:1950:Firstcomputerplayer.1994:First

computerchampion:Chinookended40-year-reignofhumanchampionMarionTinsleyusingcomplete8-pieceendgame.2007:Checkerssolved!

§  Chess:1997:DeepBluedefeatshumanchampionGaryKasparovinasix-gamematch.DeepBlueexamined200Mposi+onspersecond,usedverysophis+catedevalua+onandundisclosedmethodsforextendingsomelinesofsearchupto40ply.Currentprogramsareevenbe?er,iflesshistoric.

§  Go:Humanchampionsarenowstar+ngtobechallengedbymachines,thoughthebesthumanss+llbeatthebestmachines(atleastun+lonemonthago!!!).Ingo,b>300!Classicprogramsusepa?ernknowledgebases,butbigrecentadvancesuseMonteCarlo(randomized)expansionmethods.

§  Pacman

BehaviorfromComputa+on

[Demo:mysterypacman(L6D1)]

VideoofDemoMysteryPacman

AdversarialGames

§  Manydifferentkindsofgames!

§  Axes:§  Determinis+corstochas+c?§  One,two,ormoreplayers?§  Zerosum?§  Perfectinforma+on(canyouseethestate)?

§  Wantalgorithmsforcalcula+ngastrategy(policy)whichrecommendsamovefromeachstate

TypesofGames

Determinis+cGames

§  Manypossibleformaliza+ons,oneis:§  States:S(startats0)§  Players:P={1...N}(usuallytaketurns)§  Ac+ons:A(maydependonplayer/state)§  Transi+onFunc+on:SxA→S§  TerminalTest:S→{t,f}§  TerminalU+li+es:SxP→R

§  Solu+onforaplayerisapolicy:S→A

Zero-SumGames

§  Zero-SumGames§  Agentshaveoppositeu+li+es(valueson

outcomes)§  Letsusthinkofasinglevaluethatone

maximizesandtheotherminimizes§  Adversarial,purecompe++on

§  GeneralGames§  Agentshaveindependentu+li+es(valueson

outcomes)§  Coopera+on,indifference,compe++on,and

moreareallpossible§  Morelateronnon-zero-sumgames

AdversarialSearch

Single-AgentTrees

8

2 0 2 6 4 6… …

ValueofaState

Non-TerminalStates:

8

2 0 2 6 4 6… … TerminalStates:

Valueofastate:Thebestachievableoutcome(u+lity)fromthatstate

AdversarialGameTrees

-20 -8 -18 -5 -10 +4… … -20 +8

MinimaxValues

+8-10-5-8

StatesUnderAgent’sControl:

TerminalStates:

StatesUnderOpponent’sControl:

Tic-Tac-ToeGameTree

AdversarialSearch(Minimax)

§  Determinis+c,zero-sumgames:§  Tic-tac-toe,chess,checkers§  Oneplayermaximizesresult§  Theotherminimizesresult

§  Minimaxsearch:§  Astate-spacesearchtree§  Playersalternateturns§  Computeeachnode’sminimaxvalue:

thebestachievableu+lityagainstara+onal(op+mal)adversary

8 2 5 6

max

min2 5

5

Terminalvalues:partofthegame

Minimaxvalues:computedrecursively

MinimaxImplementa+on

defmin-value(state):ini+alizev=+∞ foreachsuccessorofstate:

v=min(v,max-value(successor))returnv

defmax-value(state):ini+alizev=-∞ foreachsuccessorofstate:

v=max(v,min-value(successor))returnv

MinimaxImplementa+on(Dispatch)

defvalue(state):ifthestateisaterminalstate:returnthestate’su+lityifthenextagentisMAX:returnmax-value(state)ifthenextagentisMIN:returnmin-value(state)

defmin-value(state):ini+alizev=+∞ foreachsuccessorofstate:

v=min(v,value(successor))returnv

defmax-value(state):ini+alizev=-∞ foreachsuccessorofstate:

v=max(v,value(successor))returnv

MinimaxExample

12 8 5 2 3 2 14 4 6

MinimaxEfficiency

§  Howefficientisminimax?§  Justlike(exhaus+ve)DFS§  Time:O(bm)§  Space:O(bm)

§  Example:Forchess,b≈35,m≈100§  Exactsolu+oniscompletelyinfeasible§  But,doweneedtoexplorethewhole

tree?

MinimaxProper+es

Op+malagainstaperfectplayer.Otherwise?

10 10 9 100

max

min

[Demo:minvsexp(L6D2,L6D3)]

VideoofDemoMinvs.Exp(Min)

VideoofDemoMinvs.Exp(Exp)

GameTreePruning

MinimaxExample

12 8 5 2 3 2 14 4 6

MinimaxPruning

12 8 5 2 3 2 14

Alpha-BetaPruning

§  Generalconfigura+on(MINversion)§  We’recompu+ngtheMIN-VALUEatsomenoden§  We’reloopingovern’schildren

§  n’ses+mateofthechildrens’minisdropping§  Whocaresaboutn’svalue?MAX§  LetabethebestvaluethatMAXcangetatanychoice

pointalongthecurrentpathfromtheroot

§  Ifnbecomesworsethana,MAXwillavoidit,sowecanstopconsideringn’sotherchildren(it’salreadybadenoughthatitwon’tbeplayed)

§  MAXversionissymmetric

MAX

MIN

MAX

MIN

a

n

Alpha-BetaPruningExample

12 5 1 3 2

8

14

≥8

3 ≤2 ≤1

3

α is MAX’s best alternative here or above β is MIN’s best alternative here or above

α=-∞β=+∞

α=-∞β=+∞

α=-∞β=+∞

α=-∞β=3

α=-∞β=3

α=-∞β=3

α=-∞β=3

α=8 β=3

α=3 β=+∞

α=3 β=+∞

α=3 β=+∞

α=3 β=+∞

α=3 β=2

α=3 β=+∞

α=3 β=14

α=3 β=5

α=3 β=1

Alpha-BetaImplementa+on

defmin-value(state,α,β):ini+alizev=+∞ foreachsuccessorofstate:

v=min(v,value(successor,α,β))ifv≤αreturnvβ=min(β,v)

returnv

defmax-value(state,α,β):ini+alizev=-∞ foreachsuccessorofstate:

v=max(v,value(successor,α,β))ifv≥βreturnvα=max(α,v)

returnv

α:MAX’sbestop+ononpathtorootβ:MIN’sbestop+ononpathtoroot

Alpha-BetaPruningProper+es

§  Thispruninghasnoeffectonminimaxvaluecomputedfortheroot!

§  Valuesofintermediatenodesmightbewrong§  Important:childrenoftherootmayhavethewrongvalue§  Sothemostnaïveversionwon’tletyoudoac+onselec+on

§  Goodchildorderingimproveseffec+venessofpruning

§  With“perfectordering”:§  TimecomplexitydropstoO(bm/2)§  Doublessolvabledepth!§  Fullsearchof,e.g.chess,iss+llhopeless…

§  Thisisasimpleexampleofmetareasoning(compu+ngaboutwhattocompute)

10 10 0

max

min

Alpha-BetaQuiz

Alpha-BetaQuiz2

ResourceLimits

ResourceLimits

§  Problem:Inrealis+cgames,cannotsearchtoleaves!

§  Solu+on:Depth-limitedsearch§  Instead,searchonlytoalimiteddepthinthetree§  Replaceterminalu+li+eswithanevalua+onfunc+onfor

non-terminalposi+ons

§  Example:§  Supposewehave100seconds,canexplore10Knodes/sec§  Socancheck1Mnodespermove§  α-βreachesaboutdepth8–decentchessprogram

§  Guaranteeofop+malplayisgone

§  MorepliesmakesaBIGdifference

§  Useitera+vedeepeningforanany+mealgorithm? ? ? ?

-1 -2 4 9

4

min

max

-2 4

DepthMa?ers

§  Evalua+onfunc+onsarealwaysimperfect

§  Thedeeperinthetreetheevalua+onfunc+onisburied,thelessthequalityoftheevalua+onfunc+onma?ers

§  Animportantexampleofthetradeoffbetweencomplexityoffeaturesandcomplexityofcomputa+on

[Demo:depthlimited(L6D4,L6D5)]

LimitedDepth

Depth 2:

LimitedDepth

Depth 10:

Evalua+onFunc+ons

Evalua+onFunc+ons§  Evalua+onfunc+onsscorenon-terminalsindepth-limitedsearch

§  Idealfunc+on:returnstheactualminimaxvalueoftheposi+on§  Inprac+ce:typicallyweightedlinearsumoffeatures:

§  e.g.f1(s)=(numwhitequeens–numblackqueens),etc.

Evalua+onforPacman

[Demo:thrashingd=2,thrashingd=2(fixedevalua+onfunc+on),smartghostscoordinate(L6D6,7,8,10)]

VideoofDemoThrashing(d=2)

WhyPacmanStarves

§  Adangerofreplanningagents!§  Heknowshisscorewillgoupbyea+ngthedotnow(west,east)§  Heknowshisscorewillgoupjustasmuchbyea+ngthedotlater(east,west)§  Therearenopoint-scoringopportuni+esa~erea+ngthedot(withinthehorizon,twohere)§  Therefore,wai+ngseemsjustasgoodasea+ng:hemaygoeast,thenbackwestinthenext

roundofreplanning!

VideoofDemoThrashing--Fixed(d=2)

NextTime:Uncertainty!

Documents

Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion