Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
CSE473:Ar+ficialIntelligenceUncertaintyandExpec+maxTreeSearch
Instructors:LukeZeDlemoyer
UniveristyofWashington[TheseslideswereadaptedfromDanKleinandPieterAbbeelforCS188IntrotoAIatUCBerkeley.AllCS188materialsareavailableathDp://ai.berkeley.edu.]
UncertainOutcomes
Worst-Casevs.AverageCase
10 10 9 100
max
min
Idea:Uncertainoutcomescontrolledbychance,notanadversary!
Expec+maxSearch
§ Whywouldn’tweknowwhattheresultofanac+onwillbe?§ Explicitrandomness:rollingdice§ Unpredictableopponents:theghostsrespondrandomly§ Ac+onscanfail:whenmovingarobot,wheelsmightslip
§ Valuesshouldnowreflectaverage-case(expec+max)outcomes,notworst-case(minimax)outcomes
§ Expec+maxsearch:computetheaveragescoreunderop+malplay§ Maxnodesasinminimaxsearch§ Chancenodesarelikeminnodesbuttheoutcomeisuncertain§ Calculatetheirexpectedu+li+es§ I.e.takeweightedaverage(expecta+on)ofchildren
§ Later,we’lllearnhowtoformalizetheunderlyinguncertain-resultproblemsasMarkovDecisionProcesses
10 4 5 7
max
chance
10 10 9 100
[Demo:minvsexp(L7D1,2)]
VideoofDemoMinvs.Exp(Min)
VideoofDemoMinvs.Exp(Exp)
Expec+maxPseudocode
defvalue(state):ifthestateisaterminalstate:returnthestate’su+lityifthenextagentisMAX:returnmax-value(state)ifthenextagentisEXP:returnexp-value(state)
defexp-value(state):ini+alizev=0foreachsuccessorofstate: p=probability(successor)v+=p*value(successor)
returnv
defmax-value(state):ini+alizev=-∞ foreachsuccessorofstate:
v=max(v,value(successor))returnv
Expec+maxPseudocode
defexp-value(state):ini+alizev=0foreachsuccessorofstate: p=probability(successor)v+=p*value(successor)
returnv
5 78 24 -12
1/21/3
1/6
v=(1/2)(8)+(1/3)(24)+(1/6)(-12)=10
Expec+maxExample
12 9 6 0 3 2 15 4 6
Expec+maxPruning?
12 9 3 2
Depth-LimitedExpec+max
…
…
492 362 …
400 300Es+mateoftrueexpec+maxvalue(whichwouldrequirealotof
worktocompute)
Probabili+es
Reminder:Probabili+es§ Arandomvariablerepresentsaneventwhoseoutcomeisunknown§ Aprobabilitydistribu+onisanassignmentofweightstooutcomes
§ Example:Trafficonfreeway§ Randomvariable:T=whetherthere’straffic§ Outcomes:Tin{none,light,heavy}§ Distribu+on:P(T=none)=0.25,P(T=light)=0.50,P(T=heavy)=0.25
§ Somelawsofprobability(morelater):§ Probabili+esarealwaysnon-nega+ve§ Probabili+esoverallpossibleoutcomessumtoone
§ Aswegetmoreevidence,probabili+esmaychange:§ P(T=heavy)=0.25,P(T=heavy|Hour=8am)=0.60§ We’lltalkaboutmethodsforreasoningandupda+ngprobabili+eslater
0.25
0.50
0.25
§ Theexpectedvalueofafunc+onofarandomvariableistheaverage,weightedbytheprobabilitydistribu+onoveroutcomes
§ Example:Howlongtogettotheairport?
Reminder:Expecta+ons
0.25 0.50 0.25Probability:
20min 30min 60minTime:35minx x x+ +
§ Inexpec+maxsearch,wehaveaprobabilis+cmodelofhowtheopponent(orenvironment)willbehaveinanystate§ Modelcouldbeasimpleuniformdistribu+on(rolladie)§ Modelcouldbesophis+catedandrequireagreatdealof
computa+on§ Wehaveachancenodeforanyoutcomeoutofourcontrol:
opponentorenvironment§ Themodelmightsaythatadversarialac+onsarelikely!
§ Fornow,assumeeachchancenodemagicallycomesalongwithprobabili+esthatspecifythedistribu+onoveritsoutcomes
WhatProbabili+estoUse?
Havingaprobabilis.cbeliefaboutanotheragent’sac.ondoesnotmeanthattheagentisflippinganycoins!
Quiz:InformedProbabili+es
§ Let’ssayyouknowthatyouropponentisactuallyrunningadepth2minimax,usingtheresult80%ofthe+me,andmovingrandomlyotherwise
§ Ques+on:Whattreesearchshouldyouuse?
0.10.9
§ Answer:Expec+max!§ TofigureoutEACHchancenode’sprobabili+es,
youhavetorunasimula+onofyouropponent§ Thiskindofthinggetsveryslowveryquickly§ Evenworseifyouhavetosimulateyour
opponentsimula+ngyou…§ …exceptforminimax,whichhasthenice
propertythatitallcollapsesintoonegametree
ModelingAssump+ons
TheDangersofOp+mismandPessimism
DangerousOp+mismAssumingchancewhentheworldisadversarial
DangerousPessimismAssumingtheworstcasewhenit’snotlikely
Assump+onsvs.Reality
AdversarialGhost RandomGhost
MinimaxPacman
Expec+maxPacman
[Demos:worldassump+ons(L7D3,4,5,6)]
Resultsfromplaying5games
Pacmanuseddepth4searchwithanevalfunc+onthatavoidstroubleGhostuseddepth2searchwithanevalfunc+onthatseeksPacman
Won5/5
Avg.Score:503
Won5/5
Avg.Score:493
Won1/5
Avg.Score:-303
Won5/5
Avg.Score:483
VideoofDemoWorldAssump+onsRandomGhost–Expec+maxPacman
VideoofDemoWorldAssump+onsAdversarialGhost–MinimaxPacman
VideoofDemoWorldAssump+onsAdversarialGhost–Expec+maxPacman
VideoofDemoWorldAssump+onsRandomGhost–MinimaxPacman
OtherGameTypes
MixedLayerTypes
§ E.g.Backgammon§ Expec+minimax
§ Environmentisanextra“randomagent”playerthatmovesaxereachmin/maxagent
§ Eachnodecomputestheappropriatecombina+onofitschildren
Example:Backgammon
§ Dicerollsincreaseb:21possiblerollswith2dice§ Backgammon≈20legalmoves§ Depth2=20x(21x20)3=1.2x109
§ Asdepthincreases,probabilityofreachingagivensearchnodeshrinks§ Sousefulnessofsearchisdiminished§ Solimi+ngdepthislessdamaging§ Butpruningistrickier…
§ HistoricAI:TDGammonusesdepth-2search+verygoodevalua+onfunc+on+reinforcementlearning:world-championlevelplay
§ 1stAIworldchampioninanygame!Image:Wikipedia
Mul+-AgentU+li+es
§ Whatifthegameisnotzero-sum,orhasmul+pleplayers?
§ Generaliza+onofminimax:§ Terminalshaveu+litytuples§ Nodevaluesarealsou+litytuples§ Eachplayermaximizesitsowncomponent§ Cangiverisetocoopera+onandcompe++ondynamically…
1,6,6 7,1,2 6,1,2 7,2,1 5,1,7 1,5,2 7,7,1 5,2,5