Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
CSE473:Ar+ficialIntelligenceAdversarialSearch
Instructor:LukeZe?lemoyerUniversityofWashington
[TheseslideswereadaptedfromDanKleinandPieterAbbeelforCS188IntrotoAIatUCBerkeley.AllCS188materialsareavailableath?p://ai.berkeley.edu.]
GamePlayingState-of-the-Art§ Checkers:1950:Firstcomputerplayer.1994:First
computerchampion:Chinookended40-year-reignofhumanchampionMarionTinsleyusingcomplete8-pieceendgame.2007:Checkerssolved!
§ Chess:1997:DeepBluedefeatshumanchampionGaryKasparovinasix-gamematch.DeepBlueexamined200Mposi+onspersecond,usedverysophis+catedevalua+onandundisclosedmethodsforextendingsomelinesofsearchupto40ply.Currentprogramsareevenbe?er,iflesshistoric.
§ Go:Humanchampionsarenowstar+ngtobechallengedbymachines,thoughthebesthumanss+llbeatthebestmachines(atleastun+lonemonthago!!!).Ingo,b>300!Classicprogramsusepa?ernknowledgebases,butbigrecentadvancesuseMonteCarlo(randomized)expansionmethods.
§ Pacman
BehaviorfromComputa+on
[Demo:mysterypacman(L6D1)]
VideoofDemoMysteryPacman
AdversarialGames
§ Manydifferentkindsofgames!
§ Axes:§ Determinis+corstochas+c?§ One,two,ormoreplayers?§ Zerosum?§ Perfectinforma+on(canyouseethestate)?
§ Wantalgorithmsforcalcula+ngastrategy(policy)whichrecommendsamovefromeachstate
TypesofGames
Determinis+cGames
§ Manypossibleformaliza+ons,oneis:§ States:S(startats0)§ Players:P={1...N}(usuallytaketurns)§ Ac+ons:A(maydependonplayer/state)§ Transi+onFunc+on:SxA→S§ TerminalTest:S→{t,f}§ TerminalU+li+es:SxP→R
§ Solu+onforaplayerisapolicy:S→A
Zero-SumGames
§ Zero-SumGames§ Agentshaveoppositeu+li+es(valueson
outcomes)§ Letsusthinkofasinglevaluethatone
maximizesandtheotherminimizes§ Adversarial,purecompe++on
§ GeneralGames§ Agentshaveindependentu+li+es(valueson
outcomes)§ Coopera+on,indifference,compe++on,and
moreareallpossible§ Morelateronnon-zero-sumgames
AdversarialSearch
Single-AgentTrees
8
2 0 2 6 4 6… …
ValueofaState
Non-TerminalStates:
8
2 0 2 6 4 6… … TerminalStates:
Valueofastate:Thebestachievableoutcome(u+lity)fromthatstate
AdversarialGameTrees
-20 -8 -18 -5 -10 +4… … -20 +8
MinimaxValues
+8-10-5-8
StatesUnderAgent’sControl:
TerminalStates:
StatesUnderOpponent’sControl:
Tic-Tac-ToeGameTree
AdversarialSearch(Minimax)
§ Determinis+c,zero-sumgames:§ Tic-tac-toe,chess,checkers§ Oneplayermaximizesresult§ Theotherminimizesresult
§ Minimaxsearch:§ Astate-spacesearchtree§ Playersalternateturns§ Computeeachnode’sminimaxvalue:
thebestachievableu+lityagainstara+onal(op+mal)adversary
8 2 5 6
max
min2 5
5
Terminalvalues:partofthegame
Minimaxvalues:computedrecursively
MinimaxImplementa+on
defmin-value(state):ini+alizev=+∞ foreachsuccessorofstate:
v=min(v,max-value(successor))returnv
defmax-value(state):ini+alizev=-∞ foreachsuccessorofstate:
v=max(v,min-value(successor))returnv
MinimaxImplementa+on(Dispatch)
defvalue(state):ifthestateisaterminalstate:returnthestate’su+lityifthenextagentisMAX:returnmax-value(state)ifthenextagentisMIN:returnmin-value(state)
defmin-value(state):ini+alizev=+∞ foreachsuccessorofstate:
v=min(v,value(successor))returnv
defmax-value(state):ini+alizev=-∞ foreachsuccessorofstate:
v=max(v,value(successor))returnv
MinimaxExample
12 8 5 2 3 2 14 4 6
MinimaxEfficiency
§ Howefficientisminimax?§ Justlike(exhaus+ve)DFS§ Time:O(bm)§ Space:O(bm)
§ Example:Forchess,b≈35,m≈100§ Exactsolu+oniscompletelyinfeasible§ But,doweneedtoexplorethewhole
tree?
MinimaxProper+es
Op+malagainstaperfectplayer.Otherwise?
10 10 9 100
max
min
[Demo:minvsexp(L6D2,L6D3)]
VideoofDemoMinvs.Exp(Min)
VideoofDemoMinvs.Exp(Exp)
GameTreePruning
MinimaxExample
12 8 5 2 3 2 14 4 6
MinimaxPruning
12 8 5 2 3 2 14
Alpha-BetaPruning
§ Generalconfigura+on(MINversion)§ We’recompu+ngtheMIN-VALUEatsomenoden§ We’reloopingovern’schildren
§ n’ses+mateofthechildrens’minisdropping§ Whocaresaboutn’svalue?MAX§ LetabethebestvaluethatMAXcangetatanychoice
pointalongthecurrentpathfromtheroot
§ Ifnbecomesworsethana,MAXwillavoidit,sowecanstopconsideringn’sotherchildren(it’salreadybadenoughthatitwon’tbeplayed)
§ MAXversionissymmetric
MAX
MIN
MAX
MIN
a
n
Alpha-BetaPruningExample
12 5 1 3 2
8
14
≥8
3 ≤2 ≤1
3
α is MAX’s best alternative here or above β is MIN’s best alternative here or above
α=-∞β=+∞
α=-∞β=+∞
α=-∞β=+∞
α=-∞β=3
α=-∞β=3
α=-∞β=3
α=-∞β=3
α=8 β=3
α=3 β=+∞
α=3 β=+∞
α=3 β=+∞
α=3 β=+∞
α=3 β=2
α=3 β=+∞
α=3 β=14
α=3 β=5
α=3 β=1
Alpha-BetaImplementa+on
defmin-value(state,α,β):ini+alizev=+∞ foreachsuccessorofstate:
v=min(v,value(successor,α,β))ifv≤αreturnvβ=min(β,v)
returnv
defmax-value(state,α,β):ini+alizev=-∞ foreachsuccessorofstate:
v=max(v,value(successor,α,β))ifv≥βreturnvα=max(α,v)
returnv
α:MAX’sbestop+ononpathtorootβ:MIN’sbestop+ononpathtoroot
Alpha-BetaPruningProper+es
§ Thispruninghasnoeffectonminimaxvaluecomputedfortheroot!
§ Valuesofintermediatenodesmightbewrong§ Important:childrenoftherootmayhavethewrongvalue§ Sothemostnaïveversionwon’tletyoudoac+onselec+on
§ Goodchildorderingimproveseffec+venessofpruning
§ With“perfectordering”:§ TimecomplexitydropstoO(bm/2)§ Doublessolvabledepth!§ Fullsearchof,e.g.chess,iss+llhopeless…
§ Thisisasimpleexampleofmetareasoning(compu+ngaboutwhattocompute)
10 10 0
max
min
Alpha-BetaQuiz
Alpha-BetaQuiz2
ResourceLimits
ResourceLimits
§ Problem:Inrealis+cgames,cannotsearchtoleaves!
§ Solu+on:Depth-limitedsearch§ Instead,searchonlytoalimiteddepthinthetree§ Replaceterminalu+li+eswithanevalua+onfunc+onfor
non-terminalposi+ons
§ Example:§ Supposewehave100seconds,canexplore10Knodes/sec§ Socancheck1Mnodespermove§ α-βreachesaboutdepth8–decentchessprogram
§ Guaranteeofop+malplayisgone
§ MorepliesmakesaBIGdifference
§ Useitera+vedeepeningforanany+mealgorithm? ? ? ?
-1 -2 4 9
4
min
max
-2 4
DepthMa?ers
§ Evalua+onfunc+onsarealwaysimperfect
§ Thedeeperinthetreetheevalua+onfunc+onisburied,thelessthequalityoftheevalua+onfunc+onma?ers
§ Animportantexampleofthetradeoffbetweencomplexityoffeaturesandcomplexityofcomputa+on
[Demo:depthlimited(L6D4,L6D5)]
LimitedDepth
Depth 2:
LimitedDepth
Depth 10:
Evalua+onFunc+ons
Evalua+onFunc+ons§ Evalua+onfunc+onsscorenon-terminalsindepth-limitedsearch
§ Idealfunc+on:returnstheactualminimaxvalueoftheposi+on§ Inprac+ce:typicallyweightedlinearsumoffeatures:
§ e.g.f1(s)=(numwhitequeens–numblackqueens),etc.
Evalua+onforPacman
[Demo:thrashingd=2,thrashingd=2(fixedevalua+onfunc+on),smartghostscoordinate(L6D6,7,8,10)]
VideoofDemoThrashing(d=2)
WhyPacmanStarves
§ Adangerofreplanningagents!§ Heknowshisscorewillgoupbyea+ngthedotnow(west,east)§ Heknowshisscorewillgoupjustasmuchbyea+ngthedotlater(east,west)§ Therearenopoint-scoringopportuni+esa~erea+ngthedot(withinthehorizon,twohere)§ Therefore,wai+ngseemsjustasgoodasea+ng:hemaygoeast,thenbackwestinthenext
roundofreplanning!
VideoofDemoThrashing--Fixed(d=2)
NextTime:Uncertainty!