43
CSE 473: Ar+ficial Intelligence Adversarial Search Instructor: Luke Ze?lemoyer University of Washington [These slides were adapted from Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at h?p://ai.berkeley.edu.]

Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

CSE473:Ar+ficialIntelligenceAdversarialSearch

Instructor:LukeZe?lemoyerUniversityofWashington

[TheseslideswereadaptedfromDanKleinandPieterAbbeelforCS188IntrotoAIatUCBerkeley.AllCS188materialsareavailableath?p://ai.berkeley.edu.]

Page 2: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

GamePlayingState-of-the-Art§  Checkers:1950:Firstcomputerplayer.1994:First

computerchampion:Chinookended40-year-reignofhumanchampionMarionTinsleyusingcomplete8-pieceendgame.2007:Checkerssolved!

§  Chess:1997:DeepBluedefeatshumanchampionGaryKasparovinasix-gamematch.DeepBlueexamined200Mposi+onspersecond,usedverysophis+catedevalua+onandundisclosedmethodsforextendingsomelinesofsearchupto40ply.Currentprogramsareevenbe?er,iflesshistoric.

§  Go:Humanchampionsarenowstar+ngtobechallengedbymachines,thoughthebesthumanss+llbeatthebestmachines(atleastun+lonemonthago!!!).Ingo,b>300!Classicprogramsusepa?ernknowledgebases,butbigrecentadvancesuseMonteCarlo(randomized)expansionmethods.

§  Pacman

Page 3: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

BehaviorfromComputa+on

[Demo:mysterypacman(L6D1)]

Page 4: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

VideoofDemoMysteryPacman

Page 5: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

AdversarialGames

Page 6: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

§  Manydifferentkindsofgames!

§  Axes:§  Determinis+corstochas+c?§  One,two,ormoreplayers?§  Zerosum?§  Perfectinforma+on(canyouseethestate)?

§  Wantalgorithmsforcalcula+ngastrategy(policy)whichrecommendsamovefromeachstate

TypesofGames

Page 7: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

Determinis+cGames

§  Manypossibleformaliza+ons,oneis:§  States:S(startats0)§  Players:P={1...N}(usuallytaketurns)§  Ac+ons:A(maydependonplayer/state)§  Transi+onFunc+on:SxA→S§  TerminalTest:S→{t,f}§  TerminalU+li+es:SxP→R

§  Solu+onforaplayerisapolicy:S→A

Page 8: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

Zero-SumGames

§  Zero-SumGames§  Agentshaveoppositeu+li+es(valueson

outcomes)§  Letsusthinkofasinglevaluethatone

maximizesandtheotherminimizes§  Adversarial,purecompe++on

§  GeneralGames§  Agentshaveindependentu+li+es(valueson

outcomes)§  Coopera+on,indifference,compe++on,and

moreareallpossible§  Morelateronnon-zero-sumgames

Page 9: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

AdversarialSearch

Page 10: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

Single-AgentTrees

8

2 0 2 6 4 6… …

Page 11: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

ValueofaState

Non-TerminalStates:

8

2 0 2 6 4 6… … TerminalStates:

Valueofastate:Thebestachievableoutcome(u+lity)fromthatstate

Page 12: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

AdversarialGameTrees

-20 -8 -18 -5 -10 +4… … -20 +8

Page 13: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

MinimaxValues

+8-10-5-8

StatesUnderAgent’sControl:

TerminalStates:

StatesUnderOpponent’sControl:

Page 14: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

Tic-Tac-ToeGameTree

Page 15: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

AdversarialSearch(Minimax)

§  Determinis+c,zero-sumgames:§  Tic-tac-toe,chess,checkers§  Oneplayermaximizesresult§  Theotherminimizesresult

§  Minimaxsearch:§  Astate-spacesearchtree§  Playersalternateturns§  Computeeachnode’sminimaxvalue:

thebestachievableu+lityagainstara+onal(op+mal)adversary

8 2 5 6

max

min2 5

5

Terminalvalues:partofthegame

Minimaxvalues:computedrecursively

Page 16: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

MinimaxImplementa+on

defmin-value(state):ini+alizev=+∞ foreachsuccessorofstate:

v=min(v,max-value(successor))returnv

defmax-value(state):ini+alizev=-∞ foreachsuccessorofstate:

v=max(v,min-value(successor))returnv

Page 17: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

MinimaxImplementa+on(Dispatch)

defvalue(state):ifthestateisaterminalstate:returnthestate’su+lityifthenextagentisMAX:returnmax-value(state)ifthenextagentisMIN:returnmin-value(state)

defmin-value(state):ini+alizev=+∞ foreachsuccessorofstate:

v=min(v,value(successor))returnv

defmax-value(state):ini+alizev=-∞ foreachsuccessorofstate:

v=max(v,value(successor))returnv

Page 18: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

MinimaxExample

12 8 5 2 3 2 14 4 6

Page 19: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

MinimaxEfficiency

§  Howefficientisminimax?§  Justlike(exhaus+ve)DFS§  Time:O(bm)§  Space:O(bm)

§  Example:Forchess,b≈35,m≈100§  Exactsolu+oniscompletelyinfeasible§  But,doweneedtoexplorethewhole

tree?

Page 20: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

MinimaxProper+es

Op+malagainstaperfectplayer.Otherwise?

10 10 9 100

max

min

[Demo:minvsexp(L6D2,L6D3)]

Page 21: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

VideoofDemoMinvs.Exp(Min)

Page 22: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

VideoofDemoMinvs.Exp(Exp)

Page 23: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

GameTreePruning

Page 24: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

MinimaxExample

12 8 5 2 3 2 14 4 6

Page 25: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

MinimaxPruning

12 8 5 2 3 2 14

Page 26: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

Alpha-BetaPruning

§  Generalconfigura+on(MINversion)§  We’recompu+ngtheMIN-VALUEatsomenoden§  We’reloopingovern’schildren

§  n’ses+mateofthechildrens’minisdropping§  Whocaresaboutn’svalue?MAX§  LetabethebestvaluethatMAXcangetatanychoice

pointalongthecurrentpathfromtheroot

§  Ifnbecomesworsethana,MAXwillavoidit,sowecanstopconsideringn’sotherchildren(it’salreadybadenoughthatitwon’tbeplayed)

§  MAXversionissymmetric

MAX

MIN

MAX

MIN

a

n

Page 27: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

Alpha-BetaPruningExample

12 5 1 3 2

8

14

≥8

3 ≤2 ≤1

3

α is MAX’s best alternative here or above β is MIN’s best alternative here or above

α=-∞β=+∞

α=-∞β=+∞

α=-∞β=+∞

α=-∞β=3

α=-∞β=3

α=-∞β=3

α=-∞β=3

α=8 β=3

α=3 β=+∞

α=3 β=+∞

α=3 β=+∞

α=3 β=+∞

α=3 β=2

α=3 β=+∞

α=3 β=14

α=3 β=5

α=3 β=1

Page 28: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

Alpha-BetaImplementa+on

defmin-value(state,α,β):ini+alizev=+∞ foreachsuccessorofstate:

v=min(v,value(successor,α,β))ifv≤αreturnvβ=min(β,v)

returnv

defmax-value(state,α,β):ini+alizev=-∞ foreachsuccessorofstate:

v=max(v,value(successor,α,β))ifv≥βreturnvα=max(α,v)

returnv

α:MAX’sbestop+ononpathtorootβ:MIN’sbestop+ononpathtoroot

Page 29: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

Alpha-BetaPruningProper+es

§  Thispruninghasnoeffectonminimaxvaluecomputedfortheroot!

§  Valuesofintermediatenodesmightbewrong§  Important:childrenoftherootmayhavethewrongvalue§  Sothemostnaïveversionwon’tletyoudoac+onselec+on

§  Goodchildorderingimproveseffec+venessofpruning

§  With“perfectordering”:§  TimecomplexitydropstoO(bm/2)§  Doublessolvabledepth!§  Fullsearchof,e.g.chess,iss+llhopeless…

§  Thisisasimpleexampleofmetareasoning(compu+ngaboutwhattocompute)

10 10 0

max

min

Page 30: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

Alpha-BetaQuiz

Page 31: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

Alpha-BetaQuiz2

Page 32: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

ResourceLimits

Page 33: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

ResourceLimits

§  Problem:Inrealis+cgames,cannotsearchtoleaves!

§  Solu+on:Depth-limitedsearch§  Instead,searchonlytoalimiteddepthinthetree§  Replaceterminalu+li+eswithanevalua+onfunc+onfor

non-terminalposi+ons

§  Example:§  Supposewehave100seconds,canexplore10Knodes/sec§  Socancheck1Mnodespermove§  α-βreachesaboutdepth8–decentchessprogram

§  Guaranteeofop+malplayisgone

§  MorepliesmakesaBIGdifference

§  Useitera+vedeepeningforanany+mealgorithm? ? ? ?

-1 -2 4 9

4

min

max

-2 4

Page 34: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

DepthMa?ers

§  Evalua+onfunc+onsarealwaysimperfect

§  Thedeeperinthetreetheevalua+onfunc+onisburied,thelessthequalityoftheevalua+onfunc+onma?ers

§  Animportantexampleofthetradeoffbetweencomplexityoffeaturesandcomplexityofcomputa+on

[Demo:depthlimited(L6D4,L6D5)]

Page 35: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

LimitedDepth

Depth 2:

Page 36: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

LimitedDepth

Depth 10:

Page 37: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

Evalua+onFunc+ons

Page 38: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

Evalua+onFunc+ons§  Evalua+onfunc+onsscorenon-terminalsindepth-limitedsearch

§  Idealfunc+on:returnstheactualminimaxvalueoftheposi+on§  Inprac+ce:typicallyweightedlinearsumoffeatures:

§  e.g.f1(s)=(numwhitequeens–numblackqueens),etc.

Page 39: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

Evalua+onforPacman

[Demo:thrashingd=2,thrashingd=2(fixedevalua+onfunc+on),smartghostscoordinate(L6D6,7,8,10)]

Page 40: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

VideoofDemoThrashing(d=2)

Page 41: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

WhyPacmanStarves

§  Adangerofreplanningagents!§  Heknowshisscorewillgoupbyea+ngthedotnow(west,east)§  Heknowshisscorewillgoupjustasmuchbyea+ngthedotlater(east,west)§  Therearenopoint-scoringopportuni+esa~erea+ngthedot(withinthehorizon,twohere)§  Therefore,wai+ngseemsjustasgoodasea+ng:hemaygoeast,thenbackwestinthenext

roundofreplanning!

Page 42: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

VideoofDemoThrashing--Fixed(d=2)

Page 43: Adversarial Search - courses.cs.washington.edu...month ago!!!). In go, b > 300! Classic programs use paern knowledge bases, but big recent advances use Monte Carlo (randomized) expansion

NextTime:Uncertainty!