Networks of Influence Diagrams: A Formalism for Representing Agents’ Beliefs and Decision-Making Processes

By Ya’akov Gal and Avi Pfeffer, in JAIR (2008); AAAI08 Tutorial

Presented by Chenghui Cai, Duke University, ECE

December 8, 2008


Outline

- Introduction
- Single-agent decision-making
  - decision theory
  - influence diagrams
- Multi-agent decision-making
  - multi-agent influence diagrams
  - networks of influence diagrams
- NID application: RoShamBo
- NID relationship with economic models
- Conclusions




Introduction

- Goals: computer agents that make good decisions when interacting with
  - environments
  - other computer agents
  - people
  - networks of people and computers
- Challenges
  - large and uncertain environments
  - numerous and complex decisions
  - other decision makers (e.g., agents and people)




Graphical Models

- Can meet the challenges!
  - natural and compact representation of decision-making under uncertainty
  - decompose complex decision-making problems
  - support recursion and divide-and-conquer techniques
- Themes
  - representation: creating a probabilistic model of agents’ decision-making processes
  - inference: computing strategies for agents




Decision theory

- Basis: uncertainty (probability) + utility
- Example 1: Bob observes tomorrow’s weather forecast from an expert before deciding whether to carry an umbrella to work tomorrow. Bob wishes to stay dry, but carrying an umbrella around is annoying.




Example 1

- Set A of actions: Umbrella UM = {y, n}
- Set S of unobserved events: Weather W = {sun, rain}
- Set O of observations: Forecast F = {sun, rain}
- Probability distributions: over events, P(s); over observations given events, P(o|s)
- Utility function U maps events and actions, S × A, to the real numbers R

Prior P(W):

W = sun   W = rain
0.6       0.4

Forecast CPD P(F | W):

W      F = sun   F = rain
sun    0.7       0.3
rain   0.4       0.6

Utility U(W, UM):

W      UM   U
sun    y    -10
sun    n    100
rain   y    100
rain   n    -10




Choosing the Best Action a∗

- Let $U_a(\text{Bob} \mid s)$ be Bob’s reward for taking action $a \in A$ after event $s \in S$ has occurred.
- The expected utility for Bob after observing $o \in O$ is

  $$EU_a(\text{Bob} \mid o) = \sum_{s \in S} P(s \mid o) \times U_a(\text{Bob} \mid s) \qquad (1)$$

  where $P(s \mid o) = P(s)\,P(o \mid s)/P(o)$.
- Optimal behavior: given observation $o$, choose the action $a^*$ that leads to the maximum expected utility

  $$a^* = \arg\max_{a \in A} EU_a(\text{Bob} \mid o) \qquad (2)$$




Computing an Optimal Strategy for Bob

- A strategy for Bob must specify whether to take an umbrella for every possible value of the forecast.
- Suppose that F = sun:
  - Marginal probability: $P(F=\text{sun}) = P(F=\text{sun} \mid W=\text{sun})P(W=\text{sun}) + P(F=\text{sun} \mid W=\text{rain})P(W=\text{rain}) = 0.7 \times 0.6 + 0.4 \times 0.4 = 0.58$
  - Bayes rule: $P(W=\text{sun} \mid F=\text{sun}) = P(F=\text{sun} \mid W=\text{sun})P(W=\text{sun})/P(F=\text{sun}) = 0.42/0.58 \approx 0.72$
  - $P(W=\text{rain} \mid F=\text{sun}) = 0.28$




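- Suppose that F = sun:
  - $P(F=\text{sun}) = 0.58$, $P(W=\text{sun} \mid F=\text{sun}) = 0.72$, $P(W=\text{rain} \mid F=\text{sun}) = 0.28$
  - $EU_{UM=y}(\text{Bob} \mid F=\text{sun}) = P(W=\text{sun} \mid F=\text{sun}) \times U_{UM=y}(\text{Bob} \mid W=\text{sun}) + P(W=\text{rain} \mid F=\text{sun}) \times U_{UM=y}(\text{Bob} \mid W=\text{rain}) = 0.72 \times (-10) + 0.28 \times 100 = 20.8$
  - $EU_{UM=n}(\text{Bob} \mid F=\text{sun}) = P(W=\text{sun} \mid F=\text{sun}) \times U_{UM=n}(\text{Bob} \mid W=\text{sun}) + P(W=\text{rain} \mid F=\text{sun}) \times U_{UM=n}(\text{Bob} \mid W=\text{rain}) = 0.72 \times 100 + 0.28 \times (-10) = 69.2$
  - Since $EU_{UM=n}(\text{Bob} \mid F=\text{sun}) > EU_{UM=y}(\text{Bob} \mid F=\text{sun})$, Bob chooses UM = n when F = sun.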




- Suppose that F = rain:
  - $P(F=\text{rain}) = 0.42$, $P(W=\text{sun} \mid F=\text{rain}) = 0.43$, $P(W=\text{rain} \mid F=\text{rain}) = 0.57$
  - $EU_{UM=y}(\text{Bob} \mid F=\text{rain}) = 52.7$
  - $EU_{UM=n}(\text{Bob} \mid F=\text{rain}) = 37.3$
  - Since $EU_{UM=y}(\text{Bob} \mid F=\text{rain}) > EU_{UM=n}(\text{Bob} \mid F=\text{rain})$, Bob chooses UM = y when F = rain.
- Strategy for Bob:

       F = sun   F = rain
  UM   n         y
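The computation on the preceding slides can be sketched in Python. This is a minimal sketch, not the authors' code: `posterior`, `expected_utility`, and `best_action` are hypothetical helper names, the tables are those of Example 1, and posteriors are rounded to two decimals (an assumption made only to reproduce the slide figures; unrounded values differ slightly, e.g. about 20.3 rather than 20.8).

```python
# Tables from Example 1 (umbrella scenario).
P_W = {"sun": 0.6, "rain": 0.4}                      # prior P(W)
P_F_given_W = {"sun": {"sun": 0.7, "rain": 0.3},     # P(F | W), indexed [w][f]
               "rain": {"sun": 0.4, "rain": 0.6}}
U = {("sun", "y"): -10, ("sun", "n"): 100,           # U(W, UM)
     ("rain", "y"): 100, ("rain", "n"): -10}
ACTIONS = ["y", "n"]


def posterior(f, ndigits=2):
    """P(W | F=f) by Bayes rule; rounded to `ndigits` as on the slides (assumption)."""
    joint = {w: P_W[w] * P_F_given_W[w][f] for w in P_W}
    p_f = sum(joint.values())                        # marginal P(F=f)
    return {w: round(joint[w] / p_f, ndigits) for w in joint}


def expected_utility(a, f):
    """Equation (1): sum over s of P(s | o) * U_a(s)."""
    post = posterior(f)
    return sum(post[w] * U[(w, a)] for w in post)


def best_action(f):
    """Equation (2): argmax over actions of the expected utility."""
    return max(ACTIONS, key=lambda a: expected_utility(a, f))


strategy = {f: best_action(f) for f in ["sun", "rain"]}
for f in ["sun", "rain"]:
    eus = {a: round(expected_utility(a, f), 1) for a in ACTIONS}
    print(f, eus, "->", strategy[f])
```

The resulting `strategy` dictionary is the table above: no umbrella on a sunny forecast, umbrella on a rainy one.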




Making Sequential Decisions, Extended Example 1

The newspaper forecast is more reliable, but costs money, decreasing Bob’s utility by 10 units. There are now two decisions:

- NP = {y, n}
- UM = {y, n}

Choosing the best action for one decision depends on the action for the other decision. How should the tradeoff between these two decisions be weighed?

Newspaper forecast CPD P(F | W):

W      F = sun   F = rain
sun    0.8       0.2
rain   0.2       0.8

Utility U(W, NP, UM):

W      NP    UM    U
sun    y     y     -20
sun    y     n     90
rain   y     y     90
rain   y     n     -20
...    ...   ...   ...




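- When NP = y:
  - Marginal probability: $P_{NP=y}(F=\text{sun}) = P_{NP=y}(F=\text{sun} \mid W=\text{sun}) \times P(W=\text{sun}) + P_{NP=y}(F=\text{sun} \mid W=\text{rain}) \times P(W=\text{rain}) = 0.8 \times 0.6 + 0.2 \times 0.4 = 0.56$
  - Bayes rule: $P_{NP=y}(W=\text{sun} \mid F=\text{sun}) = P_{NP=y}(F=\text{sun} \mid W=\text{sun}) \times P(W=\text{sun})/P_{NP=y}(F=\text{sun}) = 0.48/0.56 \approx 0.86$
  - $P_{NP=y}(W=\text{rain} \mid F=\text{sun}) = 0.14$
  - Expected utility: $EU_{NP=y,UM=y}(\text{Bob} \mid F=\text{sun}) = P_{NP=y}(W=\text{sun} \mid F=\text{sun}) \times U_{NP=y,UM=y}(\text{Bob} \mid W=\text{sun}) + P_{NP=y}(W=\text{rain} \mid F=\text{sun}) \times U_{NP=y,UM=y}(\text{Bob} \mid W=\text{rain}) = 0.86 \times (-20) + 0.14 \times 90 = -4.6$
  - $EU_{NP=y,UM=n}(\text{Bob} \mid F=\text{sun}) = 0.86 \times 90 + 0.14 \times (-20) = 74.6$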
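The F = rain branch under NP = y is not worked out on this slide. Applying the same steps, with posteriors rounded to two decimals as elsewhere in these slides, gives the two values that appear as leaves of the decision tree:

```latex
\begin{align*}
P_{NP=y}(F=\text{rain}) &= 0.2 \times 0.6 + 0.8 \times 0.4 = 0.44 \\
P_{NP=y}(W=\text{sun} \mid F=\text{rain}) &= 0.2 \times 0.6 / 0.44 \approx 0.27 \\
P_{NP=y}(W=\text{rain} \mid F=\text{rain}) &\approx 0.73 \\
EU_{NP=y,UM=y}(\text{Bob} \mid F=\text{rain}) &= 0.27 \times (-20) + 0.73 \times 90 = 60.3 \\
EU_{NP=y,UM=n}(\text{Bob} \mid F=\text{rain}) &= 0.27 \times 90 + 0.73 \times (-20) = 9.7
\end{align*}
```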




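Decision Tree

[Figure: decision tree for the extended umbrella example. The root is decision NP; each branch leads to chance node F and then to decision UM. Branch probabilities such as $P_{NP=y}(F=\text{sun})$ and leaf values such as $EU_{NP=y,UM=y}(\text{Bob} \mid F=\text{sun})$ are listed below.]

NP   F (probability)   UM = y   UM = n
y    sun  (0.56)       -4.6     74.6
y    rain (0.44)       60.3     9.7
n    sun  (0.58)       20.8     69.2
n    rain (0.42)       52.7     37.3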




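Solving Decision Tree: Backward Induction

[Figure: the same decision tree, annotated bottom-up with the best expected utility at each node.]

NP   F (probability)   UM = y   UM = n   Best at UM
y    sun  (0.56)       -4.6     74.6     74.6
y    rain (0.44)       60.3     9.7      60.3
n    sun  (0.58)       20.8     69.2     69.2
n    rain (0.42)       52.7     37.3     52.7

- NP = y: 0.56 × 74.6 + 0.44 × 60.3 = 68.3
- NP = n: 0.58 × 69.2 + 0.42 × 52.7 = 62.3

Since 68.3 > 62.3, Bob buys the newspaper (NP = y).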
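Backward induction on this tree can be sketched as follows. This is an illustrative sketch under stated assumptions: the utility rows for NP = n are elided in the utility table, so the code assumes they equal the Example 1 utilities (the newspaper simply subtracts 10), and that without the newspaper Bob sees the Example 1 forecast. The function names are hypothetical. The code uses unrounded probabilities, so it yields about 68.0 and 62.6 rather than the slide values 68.3 and 62.3, which come from two-decimal rounding; the preferred choice is the same.

```python
P_W = {"sun": 0.6, "rain": 0.4}
# P(F | W) for each newspaper choice; NP="n" reuses the Example 1 forecast (assumption).
P_F_given_W = {
    "y": {"sun": {"sun": 0.8, "rain": 0.2}, "rain": {"sun": 0.2, "rain": 0.8}},
    "n": {"sun": {"sun": 0.7, "rain": 0.3}, "rain": {"sun": 0.4, "rain": 0.6}},
}
BASE_U = {("sun", "y"): -10, ("sun", "n"): 100,
          ("rain", "y"): 100, ("rain", "n"): -10}
NEWSPAPER_COST = 10


def utility(w, np_choice, um):
    """U(W, NP, UM): Example 1 utility minus 10 if the newspaper is bought."""
    return BASE_U[(w, um)] - (NEWSPAPER_COST if np_choice == "y" else 0)


def value_of_um(np_choice, f):
    """Best UM action after seeing F=f, with its value weighted by P(F=f)."""
    # sum_w P(w) P(f | w) U(...) equals P(f) * EU(um | f), so no division is needed.
    weighted = {
        um: sum(P_W[w] * P_F_given_W[np_choice][w][f] * utility(w, np_choice, um)
                for w in P_W)
        for um in ("y", "n")
    }
    best = max(weighted, key=weighted.get)
    return best, weighted[best]


def value_of_np(np_choice):
    """Expected utility of an NP choice, assuming the best UM choice afterwards."""
    return sum(value_of_um(np_choice, f)[1] for f in ("sun", "rain"))


values = {c: value_of_np(c) for c in ("y", "n")}
best_np = max(values, key=values.get)
print({c: round(v, 1) for c, v in values.items()}, "-> NP =", best_np)
```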





Influence diagrams (ID)

- ID: a compact graphical and mathematical representation of a decision situation; it combines probabilistic inference with decision making to maximize expected utility
- Rectangles are decisions; ovals are chance variables; diamonds are utility functions
- Each chance node specifies a conditional probability distribution (CPD) given each value of its parents

[Figure: influence diagram with chance nodes W and F, decision node UM, and utility node U]





- Parents of decisions (informational parents) represent observations
- Parents of chance nodes represent probabilistic dependence
- Parents of utility nodes represent the parameters of the utility functions
- A strategy for a decision is a function from its informational parents to a choice for the decision. For each observation, a pure strategy prescribes a single choice of action for an agent





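Influence Diagram for Example 1, Umbrella Scenario

[Figure: influence diagram with edges W → F, F → UM, W → U, and UM → U, annotated with the tables below]

Prior P(W):

W = sun   W = rain
0.6       0.4

Forecast CPD P(F | W):

W      F = sun   F = rain
sun    0.7       0.3
rain   0.4       0.6

Utility U(W, UM):

W      UM   U
sun    y    -10
sun    n    100
rain   y    100
rain   n    -10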
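Because a pure strategy for UM is a function from its informational parent F to an action, the four pure strategies of this diagram can be enumerated and scored directly. A minimal sketch, assuming the tables above; `pure_strategies` and `strategy_value` are hypothetical names, and probabilities are left unrounded.

```python
from itertools import product

P_W = {"sun": 0.6, "rain": 0.4}
P_F_given_W = {"sun": {"sun": 0.7, "rain": 0.3},
               "rain": {"sun": 0.4, "rain": 0.6}}
U = {("sun", "y"): -10, ("sun", "n"): 100,
     ("rain", "y"): 100, ("rain", "n"): -10}
F_VALUES = ("sun", "rain")
UM_VALUES = ("y", "n")


def pure_strategies():
    """Yield every mapping from forecast value to umbrella action."""
    for choices in product(UM_VALUES, repeat=len(F_VALUES)):
        yield dict(zip(F_VALUES, choices))


def strategy_value(strategy):
    """Expected utility: sum over (w, f) of P(w) P(f | w) U(w, strategy[f])."""
    return sum(P_W[w] * P_F_given_W[w][f] * U[(w, strategy[f])]
               for w in P_W for f in F_VALUES)


scored = [(strategy_value(s), s) for s in pure_strategies()]
best_value, best_strategy = max(scored, key=lambda pair: pair[0])
for value, s in scored:
    print(s, round(value, 1))
```

Enumeration is only feasible for small diagrams, since the number of pure strategies grows exponentially with the number of observation values.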





Influence Diagram for Extended Umbrella Scenario

- “No forgetting” edges added from NP to UM
- Agents remember their past decisions when they make future decisions
- Information available to past decisions is also available to future decisions

[Figure: influence diagram with nodes NP, W, F, UM, and U]





Converting ID to Decision Tree: Extended Umbrella Example

[Figure: the influence diagram for the extended umbrella scenario (nodes NP, W, F, UM, U) shown next to its equivalent decision tree, with root NP, chance nodes F, decisions UM, and leaf expected utilities -4.6, 74.6, 60.3, 9.7, 20.8, 69.2, 52.7, 37.3]

Disadvantage: the graph structure is lost




Multi-agent influence diagrams

Example 2: A Proposer can offer some split of 3 coins to a Responder. If the Responder accepts, the offer is enforced; if the Responder rejects, both receive nothing. The offer may be corrupted and set to a (1,2) split (proposer/responder) by a noisy channel with probability 0.1.





MAID [Milch and Koller, IJCAI01]

- Extend Influence Diagrams to the multi-agent case
- Rectangles and diamonds represent decisions and utilities associated with agents, respectively; ovals represent chance variables
- A strategy for a decision is a mapping from the informational parents of the decision to a value in its domain
- A strategy profile includes strategies for all decisions

An Ultimatum Game Example

[Figure: MAID with decision nodes Proposer and Responder, chance node channel, and utility nodes U(Proposer) and U(Responder)]

Channel CPD (rows: split offered by the Proposer; columns: split delivered by the channel):

Proposer   (0,3)   (1,2)   (2,1)   (3,0)
(0,3)      0.9     0.1     0       0
(1,2)      0       1       0       0
(2,1)      0       0.1     0.9     0
(3,0)      0       0.1     0       0.9





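Solving MAID by Converting MAID to Decision Tree

[Figure: decision tree for the ultimatum game, with channel branches of probability 0.9 and 0.1]

- Solve for the Responder first and determine its strategy: accept any split that gives it more than zero.
- Then solve for the Proposer: the best offer is the largest split for the Proposer that still offers a positive share to the Responder.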
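The two-step solution can be sketched in Python. This is an illustration under assumptions not stated on the slide: the enforced split is taken to be the one delivered by the channel, each agent's utility is its number of coins, and the Responder rejects a zero share. The channel probabilities are those of the Channel table for Example 2; the function names are hypothetical.

```python
SPLITS = [(0, 3), (1, 2), (2, 1), (3, 0)]   # (proposer coins, responder coins)

# P(delivered split | offered split), from the channel CPD.
CHANNEL = {
    (0, 3): {(0, 3): 0.9, (1, 2): 0.1},
    (1, 2): {(1, 2): 1.0},
    (2, 1): {(1, 2): 0.1, (2, 1): 0.9},
    (3, 0): {(1, 2): 0.1, (3, 0): 0.9},
}


def responder_accepts(received):
    """Responder's strategy: accept any split that gives it more than zero."""
    return received[1] > 0


def proposer_expected_utility(offer):
    """Expected coins for the Proposer, averaging over channel outcomes."""
    return sum(prob * (received[0] if responder_accepts(received) else 0)
               for received, prob in CHANNEL[offer].items())


expected = {offer: proposer_expected_utility(offer) for offer in SPLITS}
best_offer = max(expected, key=expected.get)
print(expected, "-> best offer:", best_offer)
```

Under these assumptions the best offer is the (2,1) split, in line with the rule stated above.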




Networks of influence diagrams

Traditional Game Theory Limitations

Game Theory Assumptions                    Real World Agents
Agents are rational                        May be irrational
Common knowledge of game structure         Uncertain about the game and others’ strategies
Agents’ beliefs are correct/consistent     Agents’ beliefs might be incorrect





What we need

A language for representing uncertainty over decision making must allow for

- a distinction between agents’ models of each other and how they actually behave
- incorrect/inconsistent beliefs; use of heuristics
- representation of belief hierarchies, e.g., “I believe that you believe that . . . ”
- a framework for learning





To motivate the single-agent NID, consider Example 3: Bob observes the forecast before deciding whether to take an umbrella when leaving the house. In reality, forecasters are quite trustworthy. We wish to model the fact that Bob is less trusting of the forecaster than he should be. What is Bob’s strategy given his wrong belief about forecasters?





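[Figure: NID with two blocks, the Top-level block and Bob’s block, each containing a Utility node, connected by an edge from the Top-level block to Bob’s block]

- Mod[UM] = Bob’s block means that Bob may be using Bob’s block to compute the strategy for decision UM.
- The edge represents Bob (the agent) at the Top-level block (source block) modeling decision UM as being made in Bob’s block (target block).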
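A rough sketch of Example 3 follows, under loudly stated assumptions. The slides give no numbers for Bob's mistaken belief, so the CPD in `BOBS_BLOCK_FORECAST` is hypothetical, a nearly uninformative forecast chosen only for illustration, while the Top-level block reuses the Example 1 forecast as the real one. The strategy for UM is computed in Bob's block and then evaluated in the Top-level block; all function names are hypothetical.

```python
P_W = {"sun": 0.6, "rain": 0.4}
U = {("sun", "y"): -10, ("sun", "n"): 100,
     ("rain", "y"): 100, ("rain", "n"): -10}

# Top-level block: the forecast as it really behaves (Example 1 table, by assumption).
TOP_LEVEL_FORECAST = {"sun": {"sun": 0.7, "rain": 0.3},
                      "rain": {"sun": 0.4, "rain": 0.6}}
# Bob's block: HYPOTHETICAL numbers encoding Bob's distrust of the forecaster.
BOBS_BLOCK_FORECAST = {"sun": {"sun": 0.55, "rain": 0.45},
                       "rain": {"sun": 0.45, "rain": 0.55}}


def solve_block(forecast_cpd):
    """Best umbrella action per forecast value under the given block's CPD."""
    strategy = {}
    for f in ("sun", "rain"):
        score = {um: sum(P_W[w] * forecast_cpd[w][f] * U[(w, um)] for w in P_W)
                 for um in ("y", "n")}
        strategy[f] = max(score, key=score.get)
    return strategy


def evaluate(strategy, forecast_cpd):
    """Expected utility of a strategy when the world follows forecast_cpd."""
    return sum(P_W[w] * forecast_cpd[w][f] * U[(w, strategy[f])]
               for w in P_W for f in ("sun", "rain"))


# Mod[UM] = Bob's block: the strategy used at the top level is computed in Bob's block.
bobs_strategy = solve_block(BOBS_BLOCK_FORECAST)
optimal_strategy = solve_block(TOP_LEVEL_FORECAST)
bobs_value = evaluate(bobs_strategy, TOP_LEVEL_FORECAST)
optimal_value = evaluate(optimal_strategy, TOP_LEVEL_FORECAST)
print("Bob plays", bobs_strategy, "worth", round(bobs_value, 1))
print("Optimal  ", optimal_strategy, "worth", round(optimal_value, 1))
```

With these hypothetical numbers Bob ignores the forecast and never carries an umbrella, which costs him expected utility relative to the strategy computed from the real forecast.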





- A NID is a directed, possibly cyclic graph, in which each node is a MAID.
- The nodes of a NID are called blocks. They are different mental models. A mental model for an agent may itself use descriptions of the mental models of other agents.
- Let D be a decision belonging to agent α in block K, and let β be any agent. (In particular, β may be agent α itself.)





- Introduce a new type of node, denoted Mod[β, D], whose values range over the blocks L of the NID. When Mod[β, D] = L, β believes that α may be using the strategy computed in block L to make decision D.
- A Mod node is a chance node just like any other; it may influence, or be influenced by, other nodes of K.
- A NID is solved by converting it to a MAID.




Opponent Modeling

Rock Paper Scissors Competition: Multi-agent Case

In opponent modeling, agents try to learn the patterns exhibited by other players and react to their model of others, and thus do better.

Example 4: In the game of RoShamBo (commonly referred to as Rock-Paper-Scissors), Alice and Bob simultaneously choose one of rock, paper, or scissors. If they choose the same item, the result is a tie; otherwise rock crushes scissors, paper covers rock, or scissors cut paper, as shown in the table.

Table 10 (Gal & Pfeffer): Payoff Matrix for Rock-paper-scissors

           rock       paper      scissors
rock       (0, 0)     (−1, 1)    (1, −1)
paper      (1, −1)    (0, 0)     (−1, 1)
scissors   (−1, 1)    (1, −1)    (0, 0)




Bob’s reasoning

Alice and Bob are playing rounds of rock-paper-scissors. Suppose there exists a signal S that depends on prior history, e.g., S = paper. BR denotes a best response.

Strategy for Bob            Strategy for Alice
BR(S) = scissors            BR(BR(S)) = rock
BR(BR(BR(S))) = paper       BR(BR(BR(BR(S)))) = scissors
BR(..(S)..) = rock

This models double guessing, triple guessing, and so on, as in “I think that you think . . . ”
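The chain of best responses can be sketched as follows. A minimal sketch: `PAYOFF` holds the row player's payoff from the rock-paper-scissors payoff matrix, and `best_response` and `br_chain` are hypothetical helper names.

```python
MOVES = ("rock", "paper", "scissors")

# Row player's payoff, indexed PAYOFF[mine][theirs]; the game is zero-sum.
PAYOFF = {
    "rock":     {"rock": 0,  "paper": -1, "scissors": 1},
    "paper":    {"rock": 1,  "paper": 0,  "scissors": -1},
    "scissors": {"rock": -1, "paper": 1,  "scissors": 0},
}


def best_response(predicted):
    """Move that maximizes the payoff against the predicted opposing move."""
    return max(MOVES, key=lambda mine: PAYOFF[mine][predicted])


def br_chain(signal, depth):
    """Return [BR(S), BR(BR(S)), ...] up to the given nesting depth."""
    chain, move = [], signal
    for _ in range(depth):
        move = best_response(move)
        chain.append(move)
    return chain


print(br_chain("paper", 5))
```

Best responses in rock-paper-scissors cycle with period three, so deeper levels of guessing repeat the first three.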




NID

[Figure: NID with blocks K1 (Bob modeling Alice) and Alice Automaton; edge label B; move labels paper, rock]

- Nodes in NIDs are called blocks. Each block represents a separate decision-making process.
- An edge represents an agent at the source block modeling a decision as being made in the target block.
- The edge leads from the modeled decision to the target block and is labeled with the modeling agent.




Alice’s double guessing Bob

[Figure: NID with blocks K2 (Alice modeling Bob), K1 (Bob modeling Alice), and Alice Automaton; edge labels A and B; move label scissors]




Bob’s double guessing Alice

[Figure: NID with blocks K1, K2, and Alice Automaton; block descriptions Bob modeling Alice, Alice modeling Bob, Bob modeling Alice; edge labels B, A, B; move label rock]




Alice’s triple guessing Bob

[Figure: NID with blocks K1, K2, K3, and Alice Automaton; block descriptions Bob modeling Alice, Alice modeling Bob; edge labels B, B, A, B, A; move labels paper, rock, scissors, rock, paper]




RoShamBo NID

[Figure: the full RoShamBo NID with blocks TL, K1, K2, K3, and Alice Automaton; block descriptions Bob modeling Alice, Alice modeling Bob; edge labels B, A, B, A, B, B, B; move labels paper, rock, scissors, paper, rock]




Empirical Evaluation

Pick nine top contestants from the first automatic RoShamBo Competition; play 3000 rounds against each contestant (+1 for winning a round, -1 for losing one).

[Figure 11: Difference in average outcomes between NID player and opponents. x-axis: Contestant; y-axis: Average Score Difference (0 to 400)]

Opponent type                           Number
Iocaine Powder                          1
Probabilistic, Pattern, Exploitative    2, 9
Deterministic, Pattern, Exploitative    3, 6, 5
Probabilistic, Meta, Exploitative       1, 4
Probabilistic, Pattern, Exploitative    7, 8

In a Bayesian game, each agent has a discrete type embodying its private information. Let $N$ be a set of agents. For each agent $i$, a Bayesian game includes a set of possible types $T_i$, a set of possible actions $C_i$, a conditional distribution $p_i$, and a utility function $u_i$. Let $T = \times_{i \in N} T_i$ and let $C = \times_{i \in N} C_i$. For each agent $i$, let $T_{-i} = \times_{j \neq i} T_j$ denote the set of all possible types other than those of agent $i$. The probability distribution $p_i$ is a function from $t_i$ to $\Delta T_{-i}$, that is, $p_i(\cdot \mid t_i)$ specifies for each type $t_i \in T_i$ a joint distribution over the types of the other agents. The utility function $u_i$ is a function from $C \times T$ to the real numbers. It is a standard assumption that the game, including agents’ strategies, utilities and type distributions, is common knowledge to all agents.

The solution concept most commonly associated with Bayesian games is a Bayesian Nash equilibrium. This equilibrium maps each type to a mixed strategy over its actions that is the agent’s best response to the strategies of the other agents, given its beliefs about their types. Notice that in a Bayesian game, an agent’s action can depend on its own type but not on the types of the other agents, because they are unknown to that agent when it analyzes the game. It is assumed that each agent knows its own type, and that this type subsumes all of the agent’s private information before the game begins. Because the types of other agents are unknown, each agent maximizes its expected utility given its distribution over other types.

Let $N_{-i}$ denote all of the agents in the Bayesian game apart from agent $i$. Let $\sigma_i(\cdot \mid t_i)$ denote a random strategy for agent $i$ given that its type is $t_i$. A Bayesian Nash equilibrium




Conclusions - 1

- Building blocks of NIDs are MAIDs.
- In NIDs, each mental model is itself a graphical model of a game.
- An agent in one mental model may believe that another agent (or possibly itself) uses a different mental model to make decisions.
- Relationship between NIDs and Bayesian games: they are equally expressive, but NIDs may be exponentially more compact.
- NIDs can describe agents who play irrationally, and can represent players’ inconsistent and/or incorrect beliefs as well as “I believe that you believe” type reasoning.




Conclusions - 2

- NIDs can be used to learn non-stationary strategies in rock-paper-scissors.
- Models inspired by NIDs can learn people’s play in negotiation games.
- The focus of our continuing work will be to develop a general method for learning models in NIDs.

Chenghui’s remark: something like a Dynamic NID to represent a multiagent sequential decision process or a multiagent POMDP?




NID Converted to MAID

- Any NID can be converted to a MAID.
- But the MAID is hard to construct directly.


