27
CSE 473: Ar+ficial Intelligence Uncertainty and Expec+max Tree Search Instructors: Luke ZeDlemoyer Univeristy of Washington [These slides were adapted from Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at hDp://ai.berkeley.edu.]

CSE 473: Ar+ficial Intelligence - University of Washington · 2018. 1. 22. · § Chance nodes are like min nodes but the outcome is uncertain § Calculate their expected u+li+es

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

  • CSE473:Ar+ficialIntelligenceUncertaintyandExpec+maxTreeSearch

    Instructors:LukeZeDlemoyer

    UniveristyofWashington[TheseslideswereadaptedfromDanKleinandPieterAbbeelforCS188IntrotoAIatUCBerkeley.AllCS188materialsareavailableathDp://ai.berkeley.edu.]

  • UncertainOutcomes

  • Worst-Casevs.AverageCase

    10 10 9 100

    max

    min

    Idea:Uncertainoutcomescontrolledbychance,notanadversary!

  • Expec+maxSearch

    §  Whywouldn’tweknowwhattheresultofanac+onwillbe?§  Explicitrandomness:rollingdice§  Unpredictableopponents:theghostsrespondrandomly§  Ac+onscanfail:whenmovingarobot,wheelsmightslip

    §  Valuesshouldnowreflectaverage-case(expec+max)outcomes,notworst-case(minimax)outcomes

    §  Expec+maxsearch:computetheaveragescoreunderop+malplay§  Maxnodesasinminimaxsearch§  Chancenodesarelikeminnodesbuttheoutcomeisuncertain§  Calculatetheirexpectedu+li+es§  I.e.takeweightedaverage(expecta+on)ofchildren

    §  Later,we’lllearnhowtoformalizetheunderlyinguncertain-resultproblemsasMarkovDecisionProcesses

    10 4 5 7

    max

    chance

    10 10 9 100

    [Demo:minvsexp(L7D1,2)]

  • VideoofDemoMinvs.Exp(Min)

  • VideoofDemoMinvs.Exp(Exp)

  • Expec+maxPseudocode

    defvalue(state):ifthestateisaterminalstate:returnthestate’su+lityifthenextagentisMAX:returnmax-value(state)ifthenextagentisEXP:returnexp-value(state)

    defexp-value(state):ini+alizev=0foreachsuccessorofstate: p=probability(successor)v+=p*value(successor)

    returnv

    defmax-value(state):ini+alizev=-∞ foreachsuccessorofstate:

    v=max(v,value(successor))returnv

  • Expec+maxPseudocode

    defexp-value(state):ini+alizev=0foreachsuccessorofstate: p=probability(successor)v+=p*value(successor)

    returnv

    5 78 24 -12

    1/21/3

    1/6

    v=(1/2)(8)+(1/3)(24)+(1/6)(-12)=10

  • Expec+maxExample

    12 9 6 0 3 2 15 4 6

  • Expec+maxPruning?

    12 9 3 2

  • Depth-LimitedExpec+max

    492 362 …

    400 300Es+mateoftrueexpec+maxvalue(whichwouldrequirealotof

    worktocompute)

  • Probabili+es

  • Reminder:Probabili+es§  Arandomvariablerepresentsaneventwhoseoutcomeisunknown§  Aprobabilitydistribu+onisanassignmentofweightstooutcomes

    §  Example:Trafficonfreeway§  Randomvariable:T=whetherthere’straffic§  Outcomes:Tin{none,light,heavy}§  Distribu+on:P(T=none)=0.25,P(T=light)=0.50,P(T=heavy)=0.25

    §  Somelawsofprobability(morelater):§  Probabili+esarealwaysnon-nega+ve§  Probabili+esoverallpossibleoutcomessumtoone

    §  Aswegetmoreevidence,probabili+esmaychange:§  P(T=heavy)=0.25,P(T=heavy|Hour=8am)=0.60§  We’lltalkaboutmethodsforreasoningandupda+ngprobabili+eslater

    0.25

    0.50

    0.25

  • §  Theexpectedvalueofafunc+onofarandomvariableistheaverage,weightedbytheprobabilitydistribu+onoveroutcomes

    §  Example:Howlongtogettotheairport?

    Reminder:Expecta+ons

    0.25 0.50 0.25Probability:

    20min 30min 60minTime:35minx x x+ +

  • §  Inexpec+maxsearch,wehaveaprobabilis+cmodelofhowtheopponent(orenvironment)willbehaveinanystate§  Modelcouldbeasimpleuniformdistribu+on(rolladie)§  Modelcouldbesophis+catedandrequireagreatdealof

    computa+on§  Wehaveachancenodeforanyoutcomeoutofourcontrol:

    opponentorenvironment§  Themodelmightsaythatadversarialac+onsarelikely!

    §  Fornow,assumeeachchancenodemagicallycomesalongwithprobabili+esthatspecifythedistribu+onoveritsoutcomes

    WhatProbabili+estoUse?

    Havingaprobabilis.cbeliefaboutanotheragent’sac.ondoesnotmeanthattheagentisflippinganycoins!

  • Quiz:InformedProbabili+es

    §  Let’ssayyouknowthatyouropponentisactuallyrunningadepth2minimax,usingtheresult80%ofthe+me,andmovingrandomlyotherwise

    §  Ques+on:Whattreesearchshouldyouuse?

    0.10.9

    §  Answer:Expec+max!§  TofigureoutEACHchancenode’sprobabili+es,

    youhavetorunasimula+onofyouropponent§  Thiskindofthinggetsveryslowveryquickly§  Evenworseifyouhavetosimulateyour

    opponentsimula+ngyou…§  …exceptforminimax,whichhasthenice

    propertythatitallcollapsesintoonegametree

  • ModelingAssump+ons

  • TheDangersofOp+mismandPessimism

    DangerousOp+mismAssumingchancewhentheworldisadversarial

    DangerousPessimismAssumingtheworstcasewhenit’snotlikely

  • Assump+onsvs.Reality

    AdversarialGhost RandomGhost

    MinimaxPacman

    Expec+maxPacman

    [Demos:worldassump+ons(L7D3,4,5,6)]

    Resultsfromplaying5games

    Pacmanuseddepth4searchwithanevalfunc+onthatavoidstroubleGhostuseddepth2searchwithanevalfunc+onthatseeksPacman

    Won5/5

    Avg.Score:503

    Won5/5

    Avg.Score:493

    Won1/5

    Avg.Score:-303

    Won5/5

    Avg.Score:483

  • VideoofDemoWorldAssump+onsRandomGhost–Expec+maxPacman

  • VideoofDemoWorldAssump+onsAdversarialGhost–MinimaxPacman

  • VideoofDemoWorldAssump+onsAdversarialGhost–Expec+maxPacman

  • VideoofDemoWorldAssump+onsRandomGhost–MinimaxPacman

  • OtherGameTypes

  • MixedLayerTypes

    §  E.g.Backgammon§  Expec+minimax

    §  Environmentisanextra“randomagent”playerthatmovesaxereachmin/maxagent

    §  Eachnodecomputestheappropriatecombina+onofitschildren

  • Example:Backgammon

    §  Dicerollsincreaseb:21possiblerollswith2dice§  Backgammon≈20legalmoves§  Depth2=20x(21x20)3=1.2x109

    §  Asdepthincreases,probabilityofreachingagivensearchnodeshrinks§  Sousefulnessofsearchisdiminished§  Solimi+ngdepthislessdamaging§  Butpruningistrickier…

    §  HistoricAI:TDGammonusesdepth-2search+verygoodevalua+onfunc+on+reinforcementlearning:world-championlevelplay

    §  1stAIworldchampioninanygame!Image:Wikipedia

  • Mul+-AgentU+li+es

    §  Whatifthegameisnotzero-sum,orhasmul+pleplayers?

    §  Generaliza+onofminimax:§  Terminalshaveu+litytuples§  Nodevaluesarealsou+litytuples§  Eachplayermaximizesitsowncomponent§  Cangiverisetocoopera+onandcompe++ondynamically…

    1,6,6 7,1,2 6,1,2 7,2,1 5,1,7 1,5,2 7,7,1 5,2,5