Influence diagram (ID) for decision making where the state may be partially observable (decision node A_i, chance nodes S and O_i, utility node R_i)
Slide 7
How do we generalize IDs to multiagent settings?
Slide 8
Adversarial tiger problem
Slide 9
Multiagent influence diagram (MAID) (Koller & Milch 01). MAIDs
offer a richer representation for a game and may be transformed
into a normal- or extensive-form game. A strategy of an agent is an
assignment of a decision rule to every decision node of that agent.
[MAID for the adversarial tiger problem: decision nodes Open-or-Listen_i
and Open-or-Listen_j, chance nodes Growl_i, Growl_j and Tiger-loc,
utility nodes R_i and R_j]
Slide 10
The expected utility of a strategy profile to agent i is the sum of
the expected utilities at each of i's decision nodes. A strategy
profile is in Nash equilibrium if each agent's strategy in the
profile is optimal given the others' strategies.
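The equilibrium condition above can be sketched as a best-response check on a small normal-form game. The payoff numbers below are illustrative stand-ins for the adversarial tiger game (assumed zero-sum), not values from the slides:

```python
import itertools

# Hypothetical 2x2 payoffs for agent i; agent j's payoffs are the
# negation (assumed zero-sum for illustration).
ACTIONS = ["Open", "Listen"]
payoff_i = {("Open", "Open"): 2, ("Open", "Listen"): -5,
            ("Listen", "Open"): 6, ("Listen", "Listen"): 1}
payoff_j = {(ai, aj): -v for (ai, aj), v in payoff_i.items()}

def best_response_value(payoff, fixed_other, mover):
    """Best payoff the mover can obtain when the other's action is fixed."""
    if mover == "i":
        return max(payoff[(a, fixed_other)] for a in ACTIONS)
    return max(payoff[(fixed_other, a)] for a in ACTIONS)

def is_nash(ai, aj):
    """A profile is a Nash equilibrium iff no agent gains by deviating alone."""
    return (payoff_i[(ai, aj)] >= best_response_value(payoff_i, aj, "i") and
            payoff_j[(ai, aj)] >= best_response_value(payoff_j, ai, "j"))

equilibria = [p for p in itertools.product(ACTIONS, ACTIONS) if is_nash(*p)]
```

With these assumed payoffs, only the profile where both agents listen survives the check.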
Slide 11
Strategic relevance: Consider two strategy profiles that differ
only in the decision rule at D'. A decision node D strategically
relies on another node, D', if D's decision rule does not remain
optimal in both profiles.
Slide 12
Is there a way of finding all decision nodes that are
strategically relevant to D using the graphical structure? Yes:
s-reachability, analogous to d-separation for determining
conditional independence in BNs.
Slide 13
Evaluating whether a decision rule at D is optimal in a given
strategy profile involves removing the decision nodes that are not
s-relevant to D and transforming the remaining decision and utility
nodes into chance nodes.
Slide 14
What if the agents are using differing models of the same game
to make decisions, or are uncertain about the mental models others
are using?
Slide 15
Let agent i believe with probability p that j will listen, and
with probability 1 - p that j will play its best response.
Analogously, j believes that i will open a door with probability q,
and otherwise play its best response.
Slide 16
Network of IDs (NID) (Gal & Pfeffer 08). Let agent i believe with
probability p that j will likely listen, and with 1 - p that j will
play its best response; analogously, j believes that i will mostly
open a door with probability q, and otherwise play its best response.
[NID figure: a Top-level block linked with probabilities p and q to
Block L (j likely listens) and Block O (i mostly opens a door), whose
CPTs skew the action distributions toward Listen and Open, respectively]
Slide 17
Let agent i believe with probability p that j will likely listen,
and with 1 - p that j will play its best response; analogously, j
believes that i will mostly open a door with probability q, and
otherwise play its best response. [Figure: the Top-level block of
the NID, which is itself the tiger MAID]
Slide 18
MAID representation for the NID. [Figure: top-level decision nodes
Open-or-Listen_i^TL and Open-or-Listen_j^TL, chance nodes Growl_i^TL,
Growl_j^TL and Tiger-loc^TL, utilities R_i^TL and R_j^TL, best-response
nodes BR[i]^TL and BR[j]^TL, block decision nodes Open^O and Listen^L,
and the selector nodes Mod[j; D_i] and Mod[i; D_j]]
Slide 19
MAIDs and NIDs: rich languages for games based on IDs that model
problem structure by exploiting conditional independence.
Slide 20
MAIDs and NIDs: the focus is on computing equilibria, which does
not allow for a best response to a distribution of non-equilibrium
behaviors. They also do not model dynamic games.
Slide 21
Generalize IDs to dynamic interactions in multiagent
settings
Slide 22
Challenge: Other agents could be updating beliefs and changing
strategies
Slide 23
Model node: M_{j,l-1}, the models of agent j at level l-1. Policy
link (dashed arrow): the distribution over the other agent's actions
given its models. Belief on M_{j,l-1}: Pr(M_{j,l-1}|s). [Level-l
I-ID for the tiger problem: decision node Open-or-Listen_i, model
node M_{j,l-1}, chance nodes Growl_i, Tiger-loc and Open-or-Listen_j,
utility node R_i]
Slide 24
Members of the model node: the different chance nodes A_j^1, A_j^2
hold the solutions of the models m_{j,l-1}^1, m_{j,l-1}^2, and
Mod[M_j] represents the different models of agent j. The models
m_{j,l-1}^1, m_{j,l-1}^2 could be I-IDs, IDs, or simple
distributions.
Slide 25
The CPT of the chance node A_j is a multiplexer: it assumes the
distribution of one of the action nodes (A_j^1, A_j^2) depending on
the value of Mod[M_j].
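The multiplexer CPT can be sketched as follows; the model names and action distributions are illustrative assumptions, not values from the slides:

```python
# Each model m_j^k contributes an action distribution P(A_j^k);
# the value of Mod[M_j] selects which distribution A_j inherits.
action_dists = {
    "m1": {"Open": 0.1, "Listen": 0.9},   # solution of model m_j^1 (assumed)
    "m2": {"Open": 0.7, "Listen": 0.3},   # solution of model m_j^2 (assumed)
}

def multiplexer_cpt(mod_value):
    """P(A_j | Mod[M_j] = mod_value): exactly the selected model's distribution."""
    return action_dists[mod_value]

def marginal_action_dist(model_belief):
    """P(A_j) = sum_k P(Mod[M_j] = m_k) * P(A_j^k): mixing over the models."""
    dist = {}
    for m, p in model_belief.items():
        for a, q in action_dists[m].items():
            dist[a] = dist.get(a, 0.0) + p * q
    return dist

# Marginalize under a belief of 0.6 on m1 and 0.4 on m2.
mix = marginal_action_dist({"m1": 0.6, "m2": 0.4})
```

This is why the policy link yields a single predictive distribution over j's actions even though several candidate models are entertained.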
Slide 26
Could I-IDs be extended over time? We must address the
challenge
Slide 27
[Two time slices: nodes A_i^t, O_i^t, S^t, A_j^t, M_{j,l-1}^t and
R_i at time t, their counterparts at t+1, and a model update link
from M_{j,l-1}^t to M_{j,l-1}^{t+1}]
Slide 28
Interactive dynamic influence diagram (I-DID)
Slide 29
How do we implement the model update link?
Slide 30
[Figure: the model node M_{j,l-1}^t holds models m_{j,l-1}^{t,1} and
m_{j,l-1}^{t,2} with selector Mod[M_j^t], action nodes A_j^1, A_j^2
and observation nodes O_j^1, O_j^2; it expands into M_{j,l-1}^{t+1}
holding four updated models m_{j,l-1}^{t+1,1..4} with action nodes
A_j^1..A_j^4 and selector Mod[M_j^{t+1}]]
Slide 31
[Same figure as before.] The updated models differ in their initial
beliefs, each of which is the result of j updating its belief given
its action and a possible observation.
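The model update link can be sketched as enumerating one updated model per action-observation pair, which is exactly the branching shown in the figure (two models, each with two possible observations, yielding four updated models). The representation of a "model" as its history is a toy assumption:

```python
# Each model of j at time t spawns one updated model per (action,
# observation) pair j could have experienced; pruning to the actions
# the policy actually prescribes keeps the growth to |Omega_j| per model.
def update_models(models_t, actions, observations, belief_update):
    models_next = []
    for m in models_t:
        for a in actions:
            for o in observations:
                models_next.append(belief_update(m, a, o))
    return models_next

# Toy stand-in: a model is its history, the policy prescribes Listen ("L"),
# and the two tiger observations are growl-left/growl-right (assumed labels).
grow = update_models(["m0"], ["L"], ["GL", "GR"],
                     lambda m, a, o: (m, a, o))
```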
Slide 32
Recap
Slide 33
Prashant Doshi, Yifeng Zeng and Qiongyu Chen, Graphical Models
for Interactive POMDPs: Representations and Solutions, Journal of
AAMAS, 18(3):376-416, 2009.
Daphne Koller and Brian Milch, Multi-Agent Influence Diagrams for
Representing and Solving Games, Games and Economic Behavior,
45(1):181-221, 2003.
Yaakov Gal and Avi Pfeffer, Networks of Influence Diagrams: A
Formalism for Representing Agents' Beliefs and Decision-Making
Processes, Journal of AI Research, 33:109-147, 2008.
Slide 34
How large is the behavioral model space?
Slide 35
General definition: a mapping from the agent's history of
observations to its actions.
Slide 36
How large is the behavioral model space? The space of mappings
from observation histories to distributions over A_j:
uncountably infinite.
Slide 37
How large is the behavioral model space? Let's assume computable
models: countable. A very large portion of the model space is not
computable!
Slide 38
Daniel Dennett, philosopher and cognitive scientist. Intentional
stance: ascribe beliefs, preferences and intent to explain others'
actions (analogous to theory of mind, ToM).
Slide 39
Organize the mental models Intentional models Subintentional
models
Slide 40
Organize the mental models. Intentional models: e.g., POMDP =
(b_j, A_j, T_j, Omega_j, O_j, R_j, OC_j) (using DIDs); BDI, ToM.
Subintentional models. Frame (may give rise to recursive modeling).
Slide 41
Organize the mental models. Intentional models: e.g., POMDP =
(b_j, A_j, T_j, Omega_j, O_j, R_j, OC_j) (using DIDs); BDI, ToM.
Subintentional models: e.g., a distribution over A_j, a finite
state controller, a plan. Frame.
Slide 42
Finite model space grows as the interaction progresses
Slide 43
Growth in the model space: the other agent may receive any one of
|Omega_j| observations. At t = 0 there are |M_j| models; at t = 1,
|M_j||Omega_j|; at t = 2, |M_j||Omega_j|^2; ...; at time t,
|M_j||Omega_j|^t.
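The growth pattern above amounts to one line of arithmetic; the numbers below (2 initial models, 6 observations) are illustrative assumptions:

```python
# Starting from |M_j| models, each branches on any of |Omega_j|
# observations per step, giving at most |M_j| * |Omega_j|**t models
# after t steps -- exponential in the interaction length.
def model_space_size(num_models, num_observations, t):
    return num_models * num_observations ** t

# E.g., 2 initial models and 6 observations (assumed counts).
sizes = [model_space_size(2, 6, t) for t in range(4)]
```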
Slide 44
Growth in the model space Exponential
Slide 45
General model space is large and grows exponentially as the
interaction progresses
Slide 46
It would be great if we could compress this space! Lossless: no
loss in value to the modeler. Lossy: flexible loss in value for
greater compression.
Slide 47
Expansive usefulness of model space compression to many areas:
1. Sequential decision making in multiagent settings using I-DIDs;
2. Bayesian plan recognition; 3. Games of imperfect information.
Slide 48
General and domain-independent approach for compression
Establish equivalence relations that partition the model space and
retain representative models from each equivalence class
Slide 49
Approach #1: Behavioral equivalence (Rathanasabapathy et al. 06,
Pynadath & Marsella 07). Intentional models whose complete
solutions are identical are considered equivalent.
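Keeping one representative per class of behaviorally equivalent models can be sketched as a grouping pass; the solver and the policy-tree encodings below are toy assumptions standing in for a real I-DID/ID solver:

```python
# Partition candidate models by their complete solutions (policy trees),
# keeping one representative per behaviorally-equivalent class.
def behaviorally_minimal(models, solve):
    reps = {}
    for m in models:
        key = solve(m)            # hashable encoding of the full policy tree
        reps.setdefault(key, m)   # keep the first model with this behavior
    return list(reps.values())

# Toy stand-in: "solving" is a table lookup of precomputed policies
# (assumed values; m1 and m2 behave identically).
policies = {"m1": ("Listen", ("Open", "Listen")),
            "m2": ("Listen", ("Open", "Listen")),
            "m3": ("Open",   ("Listen", "Listen"))}
minimal = behaviorally_minimal(["m1", "m2", "m3"], policies.get)
```

Since the retained models cover every distinct behavior, no predictive information is lost, which is what makes the approach lossless.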
Slide 50
Approach #1: Behavioral equivalence Behaviorally minimal set of
models
Slide 51
Approach #1: Behavioral equivalence. Lossless. Works when
intentional models have differing frames.
Slide 52
Approach #1: Behavioral equivalence. Impact on I-DIDs in
multiagent settings. [Results figures: multiagent tiger and
multiagent MM problems]
Slide 53
Approach #1: Behavioral equivalence. Utilize model solutions
(policy trees) to mitigate model growth: model representatives that
are not BE may become BE from the next step onwards; preemptively
identify such models and do not update all of them.
Slide 54
Thank you for your time
Slide 55
Approach #2: Revisit BE (Zeng et al. 11, 12). Intentional models
whose partial depth-d solutions are identical and whose vectors of
updated beliefs at the leaves of the partial trees are identical
are considered equivalent. Sufficient but not necessary. Lossless
if the frames are identical.
Slide 56
Approach #2: (epsilon, d)-Behavioral equivalence. Two models are
(epsilon, d)-BE if their partial depth-d solutions are identical
and the vectors of updated beliefs at the leaves of the partial
trees differ by at most epsilon. [Example: the two models shown are
(0.33, 1)-BE.] Lossy.
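A minimal sketch of the (epsilon, d)-BE test, assuming the leaf-belief difference is measured by L1 distance (the slides do not fix the norm) and toy depth-1 solutions:

```python
# sol_*: (partial_policy, leaf_beliefs), where leaf_beliefs is a list of
# belief vectors at the leaves of the depth-d partial policy tree.
def approx_be(sol_a, sol_b, epsilon):
    policy_a, leaves_a = sol_a
    policy_b, leaves_b = sol_b
    # Partial depth-d solutions must be identical...
    if policy_a != policy_b or len(leaves_a) != len(leaves_b):
        return False
    # ...and every pair of leaf beliefs must differ by at most epsilon.
    return all(sum(abs(x - y) for x, y in zip(u, v)) <= epsilon
               for u, v in zip(leaves_a, leaves_b))

# Toy depth-1 solutions (assumed): same action, slightly different leaf beliefs.
a = ("Listen", [[0.50, 0.50], [0.85, 0.15]])
b = ("Listen", [[0.60, 0.40], [0.85, 0.15]])
```

With these values the models pass at epsilon = 0.33 but fail at a tighter 0.1, illustrating how epsilon trades compression against loss.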
Slide 57
Approach #2: epsilon-Behavioral equivalence. Lemma (Boyen &
Koller 98): the KL divergence between two distributions in a
discrete Markov stochastic process reduces or remains the same
after a transition, with the mixing rate acting as a discount
factor. The mixing rate represents the minimal amount by which the
posterior distributions agree with each other after one transition.
It is a property of the problem and may be pre-computed.
Slide 58
Approach #2: epsilon-Behavioral equivalence. Given the mixing rate
and a bound, epsilon, on the divergence between two belief vectors,
the lemma allows computing the depth, d, at which the bound is
reached. Compare two solutions up to depth d for equality.
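One way to read the lemma is that the divergence contracts by a factor of (1 - F) per transition, where F is the mixing rate; under that assumed reading, the comparison depth follows from simple logarithms:

```python
import math

# Starting from an initial divergence kl0, after t transitions the bound
# is kl0 * (1 - F)**t, so the target epsilon is reached at depth
#   d = ceil( log(epsilon / kl0) / log(1 - F) ).
def depth_for_bound(kl0, epsilon, mixing_rate):
    if mixing_rate >= 1.0:      # divergence vanishes after one step
        return 1
    if mixing_rate <= 0.0:      # divergence never contracts; caller should
        return None             # fall back to the planning horizon
    if epsilon >= kl0:
        return 0
    return math.ceil(math.log(epsilon / kl0) / math.log(1.0 - mixing_rate))

# E.g., initial divergence 1.0, bound 0.1, mixing rate 0.5 (assumed values).
d = depth_for_bound(kl0=1.0, epsilon=0.1, mixing_rate=0.5)
```

The two guard clauses mirror the degenerate cases of F = 1 and F = 0.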
Slide 59
Approach #2: epsilon-Behavioral equivalence. Impact on dt-planning
in multiagent settings. [Results: multiagent Concert problem,
discount factor F = 0.5.] On a UAV reconnaissance problem in a 5x5
grid, the approach allows the solution to scale to a 10-step
lookahead in 20 minutes.
Slide 60
Approach #2: epsilon-Behavioral equivalence. What is the value of
d when a problem exhibits F with a value of 0 or 1? F = 1 implies
that the KL divergence is 0 after one step: set d = 1. F = 0
implies that the KL divergence does not reduce: arbitrarily set d
to the horizon.
Slide 61
Approach #3: Action equivalence (Zeng et al. 09, 12). Intentional
or subintentional models whose predictions at time step t (action
distributions) are identical are considered equivalent at t.
Slide 62
Approach #3: Action equivalence
Slide 63
Approach #3: Action equivalence. Lossy. Works when intentional
models have differing frames.
Slide 64
Impact on dt-planning in multiagent settings (multiagent tiger):
AE bounds the model space at each time step to the number of
distinct actions.
Slide 65
Approach #4: Influence equivalence (related to Witwicki & Durfee
11). Intentional or subintentional models whose predictions at time
step t influence the subject agent's plan identically are
considered equivalent at t. Example: regardless of whether the
other agent opened the left or the right door, the tiger resets,
thereby affecting the agent's plan identically.
Slide 66
Approach #4: Influence equivalence. Influence may be measured as
the change in the subject agent's belief due to the action. Groups
more models at time step t than AE. Lossy.
Slide 67
Compression due to approximate equivalence may violate ACC.
Regain ACC by appending a covering model to the compressed set of
representatives.
Slide 68
Open questions
Slide 69
N > 2 agents Under what conditions could equivalent models
belonging to different agents be grouped together into an
equivalence class?
Slide 70
Can we avoid solving models by using heuristics for identifying
approximately equivalent models?
Slide 71
Modeling Strategic Human Intent
Slide 72
Yifeng Zeng, Reader, Teesside Univ. (previously Assoc. Prof.,
Aalborg Univ.); Yingke Chen, doctoral student; Hua Mao, doctoral
student; Muthu Chandrasekaran, doctoral student; Xia Qu, doctoral
student; Roi Ceren, doctoral student; Matthew Meisel, doctoral
student; Adam Goodie, Professor of Psychology, UGA.
Slide 73
Computational modeling of human recursive thinking in
sequential games Computational modeling of probability judgment in
stochastic games
Slide 74
Human strategic reasoning ("I think what you think that I
think...") is generally hobbled by low levels of recursive thinking
(Stahl & Wilson 95, Hedden & Zhang 02, Camerer et al. 04, Ficici &
Pfeffer 08).
Slide 75
You are Player I and Player II is human. Will you move or stay?
[Game-tree figure: Players I and II alternate move/stay decisions;
the payoff pairs for (I, II) are (3, 1), (1, 3), (2, 4) and (4, 2)]
Slide 76
Less than 40% of the sample population performed the rational
action!
Slide 77
Thinking about how others think (...) is hard in general
contexts
Slide 78
Move or stay? [Game-tree figure: Players I and II alternate;
payoffs for I are 0.6, 0.4, 0.2 and 0.8, with Player II's payoff
the complementary decimal]
Slide 79
About 70% of the sample population performed the rational
action in this simpler and strictly competitive game
Slide 80
Simplicity, competitiveness and embedding the task in intuitive
representations seem to facilitate human reasoning (Flobbe et
al.08, Meijering et al.11, Goodie et al.12)
Slide 81
3-stage game: myopic opponents default to staying (level 0),
while predictive opponents think about the player's decision (level
1).
Slide 82
Can we computationally model these strategic behaviors using
process models?
Slide 83
Yes! Using a parameterized Interactive POMDP framework
Slide 84
Replace the I-POMDP's normative Bayesian belief update with
Bayesian learning that underweights evidence, parameterized by a
learning weight. Notice that the achievement score increases as
more games are played, indicating learning of the opponent models;
the learning is slow and partial.
Slide 85
Replace the I-POMDP's normative expected utility maximization
with a quantal response model that selects actions proportionally
to their utilities, parameterized by a precision. Notice the
presence of rationality errors in the participants' choices (an
action inconsistent with the prediction); the errors appear to
reduce with time.
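The quantal (logit) response can be sketched in a few lines; the utility values below are illustrative, and the precision parameter is the one the slides describe fitting to participants' rationality errors:

```python
import math

# Quantal response: choose each action with probability proportional to
# exp(precision * utility). precision -> 0 gives uniform random choice;
# precision -> infinity recovers strict utility maximization.
def quantal_response(utilities, precision):
    weights = {a: math.exp(precision * u) for a, u in utilities.items()}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}

# Illustrative utilities (assumed): moving is worth 1, staying 0.
p_low = quantal_response({"Move": 1.0, "Stay": 0.0}, precision=0.0)
p_high = quantal_response({"Move": 1.0, "Stay": 0.0}, precision=10.0)
```

A low precision thus models frequent rationality errors, while a high precision approaches the normative expected-utility maximizer.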
Slide 86
Underweighting evidence during learning and quantal response
for choice have prior psychological support
Slide 87
Use participants' predictions of others' actions to learn the
learning weight, and participants' actions to learn the precision.
Slide 88
Use participants' actions to learn both parameters. Let the
precision vary linearly.
Slide 89
Insights revealed by process modeling: 1. Much evidence that
participants did not make rote use of backward induction (BI),
instead engaging in recursive thinking. 2. Rationality errors
cannot be ignored when modeling human decision making, and they may
vary. 3. Evidence that participants could be attributing surprising
observations of others' actions to their rationality errors.
Slide 90
Open questions: 1. What is the impact on strategic thinking if
action outcomes are uncertain? 2. Is there a damping effect on
reasoning levels if participants need to concomitantly think ahead
in time?
Slide 91
A suite of general and domain-independent approaches for
compressing agent model spaces based on equivalence. Computational
modeling of human behavioral data pertaining to strategic
thinking.
Slide 92
2. Bayesian plan recognition under uncertainty Plan recognition
literature has paid scant attention to finding general ways of
reducing the set of feasible plans (Carberry, 01)
Slide 93
3. Games of imperfect information (Bayesian games). Real-world
applications often involve many player types. Examples: ad hoc
coordination in a spontaneous team; an automated Poker player
agent.
Slide 94
3. Games of imperfect information (Bayesian games) Real-world
applications often involve many player types Model space
compression facilitates equilibrium computation