Outline Introduction Single-agent decision-making Multi-agent decision-making Application Conclusions
Networks of Influence Diagrams: A Formalism for Representing Agents’ Beliefs and
Decision-Making Processes
By Ya’akov Gal and Avi Pfeffer, in JAIR (2008), AAAI08 Tutorial
Presented by Chenghui Cai, Duke University, ECE
December 8, 2008
Outline
• Introduction
• Single-agent decision-making
  − decision theory
  − influence diagrams
• Multi-agent decision-making
  − multi-agent influence diagrams
  − networks of influence diagrams
• NID application: RoShamBo
• NID relationship with economic models
• Conclusions
Introduction
• Goals: computer agents to make good decisions when interacting with
  − environments
  − other computer agents
  − people
  − networks of people and computers
• Challenges
  − large and uncertain environments
  − numerous and complex decisions
  − other decision makers (e.g., agents and people)
Graphical Models
• Can meet the challenges!
  − natural and compact representation of decision-making under uncertainty
  − decompose complex decision-making problems
  − support recursion and divide-and-conquer techniques
• Themes
  − representation: creating a probabilistic model of agents’ decision-making processes
  − inference: computing strategies for agents
Decision theory
• Basis: uncertainty (probability) + utility
• Example 1: Bob observes tomorrow’s weather forecast from an expert before deciding whether to carry an umbrella to work tomorrow. Bob wishes to stay dry, but carrying an umbrella around is annoying.
Example 1
• Set A of actions: Umbrella UM = {y, n}
• Set S of unobserved events: Weather W = {sun, rain}
• Set O of observations: Forecast F = {sun, rain}
• Probability distributions: over events, P(s); over observations given events, P(o|s)
• Utility function U maps actions and events, S × A, to the real numbers R

Prior over weather, P(W):
  W = sun   W = rain
  0.6       0.4

Forecast given weather, P(F|W):
  W      F = sun   F = rain
  sun    0.7       0.3
  rain   0.4       0.6

Utility, U(W, UM):
  W      UM   U
  sun    y    -10
  sun    n    100
  rain   y    100
  rain   n    -10
Choosing the Best Action a∗
• Let $U_a(\mathrm{Bob} \mid s)$ be Bob’s reward for taking action a ∈ A after event s ∈ S has occurred.
• The expected utility for Bob after observing o ∈ O is
  $$EU_a(\mathrm{Bob} \mid o) = \sum_{s \in S} P(s \mid o)\, U_a(\mathrm{Bob} \mid s) \qquad (1)$$
  where $P(s \mid o) = P(s)\,P(o \mid s)/P(o)$.
• Optimal behavior: given observation o, choose the action a∗ that leads to the maximum expected utility
  $$a^* = \arg\max_{a \in A} EU_a(\mathrm{Bob} \mid o) \qquad (2)$$
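Equations (1) and (2) can be sketched in a few lines of Python using the numbers from the Example 1 tables. The function and variable names are illustrative choices, not part of the original slides, and the sketch keeps unrounded posteriors.

```python
# Example 1 numbers, copied from the tables of the umbrella scenario.
P_W = {"sun": 0.6, "rain": 0.4}                     # prior P(W)
P_F_GIVEN_W = {                                     # P(F | W), outer key is W
    "sun": {"sun": 0.7, "rain": 0.3},
    "rain": {"sun": 0.4, "rain": 0.6},
}
U = {("sun", "y"): -10, ("sun", "n"): 100,          # U(W, UM)
     ("rain", "y"): 100, ("rain", "n"): -10}
ACTIONS = ("y", "n")


def posterior(forecast):
    """P(W | F = forecast) via Bayes' rule."""
    joint = {w: P_W[w] * P_F_GIVEN_W[w][forecast] for w in P_W}
    evidence = sum(joint.values())                  # P(F = forecast)
    return {w: p / evidence for w, p in joint.items()}


def expected_utility(action, forecast):
    """Equation (1): sum over s of P(s | o) * U_a(Bob | s)."""
    post = posterior(forecast)
    return sum(post[w] * U[(w, action)] for w in post)


def best_action(forecast):
    """Equation (2): the action with maximum expected utility."""
    return max(ACTIONS, key=lambda a: expected_utility(a, forecast))


strategy = {f: best_action(f) for f in ("sun", "rain")}
print(strategy)
```

A strategy here is simply a dictionary from each possible forecast to an action, which matches the idea of prescribing one choice per observation.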
Computing an Optimal Strategy for Bob
• A strategy for Bob must specify whether to take an umbrella for every possible value of the forecast.
• Suppose that F = sun:
  − Marginal probability: P(F = sun) = P(F = sun|W = sun)P(W = sun) + P(F = sun|W = rain)P(W = rain) = 0.7 × 0.6 + 0.4 × 0.4 = 0.58
  − Bayes rule: P(W = sun|F = sun) = P(F = sun|W = sun)P(W = sun)/P(F = sun) = 0.72
  − P(W = rain|F = sun) = 0.28
• Suppose that F = sun:
  − P(F = sun) = 0.58, P(W = sun|F = sun) = 0.72, P(W = rain|F = sun) = 0.28
  − EU_{UM=y}(Bob|F = sun) = P(W = sun|F = sun) × U_{UM=y}(Bob|W = sun) + P(W = rain|F = sun) × U_{UM=y}(Bob|W = rain) = 0.72 × (−10) + 0.28 × 100 = 20.8
  − EU_{UM=n}(Bob|F = sun) = P(W = sun|F = sun) × U_{UM=n}(Bob|W = sun) + P(W = rain|F = sun) × U_{UM=n}(Bob|W = rain) = 0.72 × 100 + 0.28 × (−10) = 69.2
  − Since EU_{UM=n}(Bob|F = sun) > EU_{UM=y}(Bob|F = sun), Bob chooses UM = n when F = sun
• Suppose that F = rain:
  − P(F = rain) = 0.42, P(W = sun|F = rain) = 0.43, P(W = rain|F = rain) = 0.57
  − EU_{UM=y}(Bob|F = rain) = 0.43 × (−10) + 0.57 × 100 = 52.7 (using the rounded posteriors)
  − EU_{UM=n}(Bob|F = rain) = 0.43 × 100 + 0.57 × (−10) = 37.3
  − Since EU_{UM=y}(Bob|F = rain) > EU_{UM=n}(Bob|F = rain), Bob chooses UM = y when F = rain
• Strategy for Bob:
         F = sun   F = rain
  UM     n         y
Making Sequential Decisions, Extended Example 1
The newspaper forecast is more reliable, but costs money, decreasing Bob’s utility by 10 units. Bob now faces two decisions:
• NP = {y, n} (whether to buy the newspaper)
• UM = {y, n}
Choosing the best action for one decision depends on the action for the other decision. How should Bob weigh the tradeoff between these two decisions?

Newspaper forecast given weather, P(F|W) when NP = y:
  W      F = sun   F = rain
  sun    0.8       0.2
  rain   0.2       0.8

Utility, U(W, NP, UM):
  W      NP   UM   U
  sun    y    y    -20
  sun    y    n    90
  rain   y    y    90
  rain   y    n    -20
  ...    ...  ...  ...
• When NP = y:
  − Marginal probability: P_{NP=y}(F = sun) = P_{NP=y}(F = sun|W = sun) × P(W = sun) + P_{NP=y}(F = sun|W = rain) × P(W = rain) = 0.8 × 0.6 + 0.2 × 0.4 = 0.56
  − Bayes rule: P_{NP=y}(W = sun|F = sun) = P_{NP=y}(F = sun|W = sun) × P(W = sun)/P_{NP=y}(F = sun) = 0.86
  − P_{NP=y}(W = rain|F = sun) = 0.14
  − Expected utility: EU_{NP=y,UM=y}(Bob|F = sun) = P_{NP=y}(W = sun|F = sun) × U_{NP=y,UM=y}(Bob|W = sun) + P_{NP=y}(W = rain|F = sun) × U_{NP=y,UM=y}(Bob|W = rain) = 0.86 × (−20) + 0.14 × 90 = −4.6
  − EU_{NP=y,UM=n}(Bob|F = sun) = 0.86 × 90 + 0.14 × (−20) = 74.6
Decision Tree
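[Figure: decision tree for the extended umbrella example. The root is decision NP; each branch leads to chance node F, then to decision UM. Leaves carry expected utilities such as EU_{NP=y,UM=y}(Bob|F = sun).]

  NP   F (probability)   EU for UM = y   EU for UM = n
  y    sun  (0.56)       -4.6            74.6
  y    rain (0.44)       60.3             9.7
  n    sun  (0.58)       20.8            69.2
  n    rain (0.42)       52.7            37.3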
Solving Decision Tree – Backward Induction
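[Figure: the decision tree for the extended umbrella example, solved by backward induction.]
• At each UM node, keep the better action: 74.6 (NP = y, F = sun, UM = n), 60.3 (NP = y, F = rain, UM = y), 69.2 (NP = n, F = sun, UM = n), 52.7 (NP = n, F = rain, UM = y).
• At each F node, take the expectation over the forecast:
  − NP = y: 0.56 × 74.6 + 0.44 × 60.3 = 68.3
  − NP = n: 0.58 × 69.2 + 0.42 × 52.7 = 62.3
• At the root, NP = y has the higher expected utility (68.3 > 62.3), so Bob buys the newspaper.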
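Backward induction on this tree can be sketched in Python as follows. The sketch assumes that the NP = n branch reuses the expert forecast and utilities of Example 1, and that buying the newspaper subtracts 10 from every utility; the names `solve`, `utility`, and `NEWSPAPER_COST` are illustrative.

```python
P_W = {"sun": 0.6, "rain": 0.4}
# P(F | W) per newspaper decision: expert forecast (NP = n), newspaper (NP = y).
P_F_GIVEN_W = {
    "n": {"sun": {"sun": 0.7, "rain": 0.3}, "rain": {"sun": 0.4, "rain": 0.6}},
    "y": {"sun": {"sun": 0.8, "rain": 0.2}, "rain": {"sun": 0.2, "rain": 0.8}},
}
BASE_U = {("sun", "y"): -10, ("sun", "n"): 100,
          ("rain", "y"): 100, ("rain", "n"): -10}
NEWSPAPER_COST = 10  # assumption: the cost applies to every outcome when NP = y


def utility(w, np_choice, um):
    return BASE_U[(w, um)] - (NEWSPAPER_COST if np_choice == "y" else 0)


def solve():
    """Backward induction: UM nodes first, then F nodes, then the NP root."""
    value_of_np, um_policy = {}, {}
    for np_choice in ("y", "n"):
        total = 0.0
        for f in ("sun", "rain"):
            joint = {w: P_W[w] * P_F_GIVEN_W[np_choice][w][f] for w in P_W}
            p_f = sum(joint.values())
            # UM node: expected utility of each action given the forecast.
            eu = {um: sum(joint[w] / p_f * utility(w, np_choice, um) for w in P_W)
                  for um in ("y", "n")}
            best_um = max(eu, key=eu.get)
            um_policy[(np_choice, f)] = best_um
            # F node: weight the best UM value by the forecast probability.
            total += p_f * eu[best_um]
        value_of_np[np_choice] = total
    best_np = max(value_of_np, key=value_of_np.get)
    return best_np, value_of_np, um_policy


best_np, value_of_np, um_policy = solve()
print(best_np, value_of_np, um_policy)
```

With unrounded posteriors the root values come out as 68.0 for NP = y and 62.6 for NP = n rather than 68.3 and 62.3; the ranking of the two branches is unchanged.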
Influence diagrams (ID)
ID
• ID: a compact graphical and mathematical representation of a decision situation; combines probabilistic inference with decision making to maximize expected utility
• Rectangles are decisions; ovals are chance variables; diamonds are utility functions
• Each chance node specifies a conditional probability distribution (CPD) given each value of its parents
[Figure: influence diagram with chance nodes W and F, decision node UM, and utility node U.]
• Parents of decisions (informational parents) represent observations
• Parents of chance nodes represent probabilistic dependence
• Parents of utility nodes represent the parameters of the utility functions
• A strategy for a decision is a function from its informational parents to a choice for the decision. For each observation, a pure strategy prescribes a single choice of action for an agent
[Figure: influence diagram with chance nodes W and F, decision node UM, and utility node U.]
Influence Diagram for Example 1, Umbrella Scenario
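[Figure: influence diagram with chance nodes W and F, decision node UM, and utility node U, annotated with the tables below.]

CPD of W, P(W):
  W = sun   W = rain
  0.6       0.4

CPD of F, P(F|W):
  W      F = sun   F = rain
  sun    0.7       0.3
  rain   0.4       0.6

Utility node U, U(W, UM):
  W      UM   U
  sun    y    -10
  sun    n    100
  rain   y    100
  rain   n    -10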
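Since a pure strategy maps each value of the informational parent F to a choice for UM, the umbrella ID has only four pure strategies, and they can be enumerated directly. The following Python sketch does so; it is a brute-force illustration under the tables above, not the solution algorithm used by the authors, and the names are illustrative.

```python
from itertools import product

P_W = {"sun": 0.6, "rain": 0.4}                     # CPD of chance node W
P_F_GIVEN_W = {                                     # CPD of chance node F given W
    "sun": {"sun": 0.7, "rain": 0.3},
    "rain": {"sun": 0.4, "rain": 0.6},
}
U = {("sun", "y"): -10, ("sun", "n"): 100,          # utility node U(W, UM)
     ("rain", "y"): 100, ("rain", "n"): -10}
OBSERVATIONS = ("sun", "rain")                      # values of informational parent F
CHOICES = ("y", "n")                                # values of decision UM


def pure_strategies():
    """Yield every function from observations to choices, as a dict."""
    for choices in product(CHOICES, repeat=len(OBSERVATIONS)):
        yield dict(zip(OBSERVATIONS, choices))


def strategy_value(strategy):
    """Expected utility of following the strategy, summed over W and F."""
    return sum(P_W[w] * P_F_GIVEN_W[w][f] * U[(w, strategy[f])]
               for w in P_W for f in OBSERVATIONS)


values = [(strategy_value(s), s) for s in pure_strategies()]
best_value, best_strategy = max(values, key=lambda pair: pair[0])
print(best_strategy, best_value)
```

Enumeration grows exponentially with the number of observations, which is one reason to exploit the graph structure instead of listing strategies.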
Influence Diagram for Extended Umbrella Scenario
• “No forgetting” edges added from NP to UM
• Agents remember their past decisions when they make future decisions
• Information available to past decisions is also available to future decisions
[Figure: influence diagram with decision nodes NP and UM, chance nodes W and F, and utility node U.]
Converting ID to Decision Tree: Extended Umbrella Example
[Figure: the influence diagram for the extended umbrella scenario (nodes NP, F, W, UM, U) shown next to its equivalent decision tree, whose leaves carry expected utilities such as EU_{NP=y,UM=y}(Bob|F = sun).]
Disadvantage: the decision tree loses the graph structure of the ID.
multi-agent influence diagrams
Example 2: Proposer can offer some split of 3 coins to Responder. If Responder accepts, the offer is enforced; if Responder rejects, both receive nothing. The offer may be corrupted and set to a (1,2) split (proposer/responder) by a noisy channel with probability 0.1.
MAID [Milch and Koller IJCAI01]
• Extend Influence Diagrams to the multi-agent case
• Rectangles and diamonds represent decisions and utilities associated with agents, respectively; ovals represent chance variables
• A strategy for a decision is a mapping from the informational parents of the decision to a value in its domain
• A strategy profile includes strategies for all decisions
[Figure: MAID for the ultimatum game, with decision nodes Proposer and Responder, chance node Channel, and utility nodes U(Proposer) and U(Responder).]

CPD of the Channel node, P(Channel | Proposer), with splits written (proposer, responder):
  Proposer   (0,3)   (1,2)   (2,1)   (3,0)
  (0,3)      0.9     0.1     0       0
  (1,2)      0       1       0       0
  (2,1)      0       0.1     0.9     0
  (3,0)      0       0.1     0       0.9
Solving MAID by Converting MAID to Decision Tree
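[Figure: decision tree obtained from the MAID; the channel branches carry probabilities 0.9 and 0.1, or 1 when the offer is already (1,2).]
• Solve Responder first: the strategy is to accept any split that gives Responder more than zero.
• Then solve Proposer: the offer is the largest split for Proposer that still gives Responder a positive share.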
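The two solution steps can be checked with a short Python sketch. It makes two assumptions that the slides do not state explicitly: the split that Responder receives through the channel is the one that is enforced, and Responder rejects when offered zero. All function names are illustrative.

```python
SPLITS = [(0, 3), (1, 2), (2, 1), (3, 0)]   # (proposer share, responder share)
CORRUPTED = (1, 2)                          # split produced by channel noise
NOISE = 0.1                                 # probability of corruption


def channel(offer):
    """P(received | offer), matching the channel CPD table."""
    dist = {split: 0.0 for split in SPLITS}
    dist[offer] += 1 - NOISE
    dist[CORRUPTED] += NOISE
    return dist


def responder_accepts(received):
    """Responder's solved strategy: accept any strictly positive share."""
    return received[1] > 0


def proposer_expected_utility(offer):
    """Assumption: the received split is enforced; rejection pays zero."""
    return sum(prob * received[0]
               for received, prob in channel(offer).items()
               if responder_accepts(received))


best_offer = max(SPLITS, key=proposer_expected_utility)
print(best_offer, proposer_expected_utility(best_offer))
```

Under these assumptions the Proposer's best offer is (2,1), the largest Proposer share that still leaves Responder something.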
networks of influence diagrams
Traditional Game Theory Limitations
  Game theory assumptions                   Real-world agents
  Agents are rational                       Agents may be irrational
  Common knowledge of game structure        Uncertain about the game and others’ strategies
  Agents’ beliefs are correct/consistent    Agents’ beliefs might be incorrect
What we need
A language for representing uncertainty over decision making must allow for
• a distinction between agents’ models of each other and how they actually behave
• incorrect/inconsistent beliefs; use of heuristics
• representation of belief hierarchies, e.g., “I believe that you believe that ...”
• a framework for learning
To motivate the single-agent NID, consider Example 3: Bob observes the forecast before deciding whether to take an umbrella when leaving the house. In reality, forecasters are quite trustworthy. We wish to model the fact that Bob trusts the forecaster less than he should. What is Bob’s strategy given his wrong belief about forecasters?
[Figure: NID with two blocks, the Top-level block and Bob’s block, each with its own utility node, joined by an edge.]
• Mod[UM] = Bob’s block means that Bob may be using Bob’s block to compute the strategy for decision UM.
• The edge represents Bob (the agent) at the Top-level block (source block) modeling decision UM as being made in Bob’s block (target block).
• A NID is a directed, possibly cyclic graph, in which each node is a MAID.
• The nodes of a NID are called blocks. They are different mental models. A mental model for an agent may itself use descriptions of the mental models of other agents.
• Let D be a decision belonging to agent α in block K, and let β be any agent. (In particular, β may be agent α itself.)
• Introduce a new type of node, denoted Mod[β, D], with values that range over each block L in the NID. When Mod[β, D] = L, β believes that α may be using the strategy computed in block L to make decision D.
• A Mod node is a chance node just like any other; it may influence, or be influenced by, other nodes of K.
• A NID is solved by converting it to a MAID.
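The effect of a Mod node pointing at a block with wrong beliefs can be illustrated with the umbrella scenario. In the Python sketch below, the Top-level block uses the forecast CPD of Example 1, while `BOB_CPD` is a purely hypothetical CPD invented for this illustration (the slides give no numbers for Bob’s block). The strategy is computed in Bob’s block and then evaluated under the Top-level block, which is the idea behind Mod[UM] = Bob’s block; this is not the authors’ NID-to-MAID conversion.

```python
P_W = {"sun": 0.6, "rain": 0.4}
TRUE_CPD = {"sun": {"sun": 0.7, "rain": 0.3},     # Top-level block, P(F | W)
            "rain": {"sun": 0.4, "rain": 0.6}}
BOB_CPD = {"sun": {"sun": 0.55, "rain": 0.45},    # HYPOTHETICAL numbers for Bob's
           "rain": {"sun": 0.45, "rain": 0.55}}   # block: a less trusted forecaster
U = {("sun", "y"): -10, ("sun", "n"): 100,
     ("rain", "y"): 100, ("rain", "n"): -10}
FORECASTS = ("sun", "rain")
ACTIONS = ("y", "n")


def solve_block(cpd):
    """Best action per forecast under the block's own beliefs.

    Dividing by P(F = f) would not change the argmax, so it is skipped.
    """
    strategy = {}
    for f in FORECASTS:
        def unnormalized_eu(action, f=f):
            return sum(P_W[w] * cpd[w][f] * U[(w, action)] for w in P_W)
        strategy[f] = max(ACTIONS, key=unnormalized_eu)
    return strategy


def evaluate(strategy, cpd):
    """Expected utility of a strategy when the world follows `cpd`."""
    return sum(P_W[w] * cpd[w][f] * U[(w, strategy[f])]
               for w in P_W for f in FORECASTS)


bob_strategy = solve_block(BOB_CPD)        # what Bob actually plays
rational_strategy = solve_block(TRUE_CPD)  # what a correctly informed Bob would play
print(bob_strategy, evaluate(bob_strategy, TRUE_CPD))
print(rational_strategy, evaluate(rational_strategy, TRUE_CPD))
```

With these hypothetical numbers Bob ignores the forecast and never takes the umbrella, earning 56 in expectation under the true model instead of the 62.6 available to a Bob with correct beliefs.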
Opponent Modeling
Rock Paper Scissors Competition: Multi-agent Case
In opponent modeling, agents try to learn the patterns exhibited by other players and react to their model of those players, and thus do better.
Example 4: In the game of RoShamBo (commonly referred to as Rock-Paper-Scissors), Alice and Bob simultaneously choose one of rock, paper, or scissors. If they choose the same item, the result is a tie; otherwise rock crushes scissors, paper covers rock, or scissors cut paper, as shown in the table.
Table 10: Payoff Matrix for Rock-paper-scissors (row player’s payoff, column player’s payoff)
             rock       paper      scissors
  rock       (0, 0)     (−1, 1)    (1, −1)
  paper      (1, −1)    (0, 0)     (−1, 1)
  scissors   (−1, 1)    (1, −1)    (0, 0)
Bob’s reasoning
Alice and Bob are playing rounds of rock-paper-scissors. Suppose there exists a signal S that depends on prior history, e.g. S = paper. BR denotes the best response to a move.

  Strategy for Bob                  Strategy for Alice
  BR(S) = scissors                  BR(BR(S)) = rock
  BR(BR(BR(S))) = paper             BR(BR(BR(BR(S)))) = scissors
  BR(..(S)..) = rock

This models double guessing, triple guessing, and so on, as in “I think that you think ...”
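The chain of best responses can be reproduced with a small Python sketch built on the rock-paper-scissors payoffs. The helper names `payoff`, `best_response`, and `guess_chain` are illustrative.

```python
MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}  # key beats value


def payoff(mine, theirs):
    """Payoff to the player choosing `mine` against `theirs`."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def best_response(predicted):
    """BR: the move with the highest payoff against the predicted move."""
    return max(MOVES, key=lambda move: payoff(move, predicted))


def guess_chain(signal, depth):
    """Return [BR(S), BR(BR(S)), ...] up to the given nesting depth."""
    chain, move = [], signal
    for _ in range(depth):
        move = best_response(move)
        chain.append(move)
    return chain


print(guess_chain("paper", 5))
```

Odd positions in the chain are Bob’s candidate moves and even positions are Alice’s, so the chain cycles with period three through the three moves.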
NID
[Figure: NID fragment. Labels shown: K1, Alice Automaton, Bob modeling Alice; edge label B; moves paper and rock.]
• Nodes in NIDs are called blocks. Each block represents a separate decision-making process.
• An edge represents an agent at the source block modeling a decision as being made in the target block.
• The edge leads from the modeled decision to the target block and is labeled with the modeling agent.
Alice’s double guessing Bob
[Figure: NID with blocks K1, K2, and Alice Automaton. Block descriptions: Bob modeling Alice, Alice modeling Bob; edge labels B and A; move shown: scissors.]
Bob’s double guessing Alice
[Figure: NID with blocks K1, K2, and Alice Automaton. Block descriptions: Bob modeling Alice, Alice modeling Bob, Bob modeling Alice; edge labels B, A, and B; move shown: rock.]
Alice’s triple guessing Bob
[Figure: NID with blocks K1, K2, K3, and Alice Automaton. Block descriptions: Bob modeling Alice, Alice modeling Bob, Bob modeling Alice, Alice modeling Bob; edge labels B and A; moves shown: paper, rock, scissors, rock, paper.]
RoShamBo NID
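[Figure: the full RoShamBo NID with blocks TL, K1, K2, K3, and Alice Automaton. Block descriptions: Bob modeling Alice, Alice modeling Bob, Bob modeling Alice, Alice modeling Bob; edge labels B and A; moves shown: paper, rock, scissors, paper, rock.]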
Empirical Evaluation
Nine top contestants were picked from the first automatic RoShamBo Competition; the NID player played 3000 rounds against each contestant, scoring +1 for winning a round and −1 for losing one.
[Figure 11: Difference in average outcomes between NID player and opponents. x-axis: Contestant; y-axis: Average Score Difference.]

  Opponent type                           Number
  Iocaine Powder                          1
  Probabilistic, Pattern, Exploitative    2, 9
  Deterministic, Pattern, Exploitative    3, 6, 5
  Probabilistic, Meta, Exploitative       1, 4
  Probabilistic, Pattern, Exploitative    7, 8
In a Bayesian game, each agent has a discrete type embodying its private information. Let N be a set of agents. For each agent i, a Bayesian game includes a set of possible types T_i, a set of possible actions C_i, a conditional distribution p_i and a utility function u_i. Let T = ×_{i∈N} T_i and let C = ×_{i∈N} C_i. For each agent i, let T_{−i} = ×_{j≠i} T_j denote the set of all possible types other than those of agent i. The probability distribution p_i is a function from t_i to ΔT_{−i}, that is, p_i(.|t_i) specifies for each type t_i ∈ T_i a joint distribution over the types of the other agents. The utility function u_i is a function from C × T to the real numbers. It is a standard assumption that the game, including agents’ strategies, utilities and type distributions, is common knowledge to all agents.
The solution concept most commonly associated with Bayesian games is a Bayesian Nash equilibrium. This equilibrium maps each type to a mixed strategy over its actions that is the agent’s best response to the strategies of the other agents, given its beliefs about their types. Notice that in a Bayesian game, an agent’s action can depend on its own type but not on the types of the other agents, because they are unknown to that agent when it analyzes the game. It is assumed that each agent knows its own type, and that this type subsumes all of the agent’s private information before the game begins. Because the types of other agents are unknown, each agent maximizes its expected utility given its distribution over other types.
Let N_{−i} denote all of the agents in the Bayesian game apart from agent i. Let σ_i(.|t_i) denote a random strategy for agent i given that its type is t_i. A Bayesian Nash equilibrium
Conclusions - 1
Conclusions:
• The building blocks of NIDs are MAIDs
• In NIDs, each mental model is itself a graphical model of a game
• An agent in one mental model may believe that another agent (or possibly itself) uses a different mental model to make decisions
• Relationship between NIDs and Bayesian games: they are equally expressive, but NIDs may be exponentially more compact
• NIDs can describe agents who play irrationally, represent players’ inconsistent and/or incorrect beliefs, and capture “I believe that you believe” type reasoning
Conclusions - 2
Conclusions:
• NIDs can be used to learn non-stationary strategies in rock-paper-scissors
• Models inspired by NIDs can learn people’s play in negotiation games
• The focus of our continuing work will be to develop a general method for learning models in NIDs
Chenghui’s remark: something like a Dynamic NID to represent a multiagent sequential decision process or a multiagent POMDP?
NID Converted to MAID
Conclusions:
• Any NID can be converted to a MAID
• But the MAID is hard to construct directly