Emergence of Gricean Maxims from Multi-agent Decision Theory
Adam Vogel, Stanford NLP Group
Joint work with Max Bodoia, Chris Potts, and Dan Jurafsky
Decision-Theoretic Pragmatics
Gricean cooperative principle:
Make your contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged.
Decision-Theoretic Pragmatics
Gricean Maxims:
• Be truthful: speak with evidence
• Be relevant: speak in accordance with goals
• Be clear: be brief and avoid ambiguity
• Be informative: say exactly as much as needed
Emergence of Gricean Maxims
Co-operative principle
• Be truthful
• Be relevant
• Be clear
• Be informative
???
Approach: Operationalize the co-operative principle
Tool: Multi-agent decision theory
Goal: Maxims emerge from rational behavior
Joint utility Rationality
Related Work
• One-shot reference tasks
  – Generating spatial referring expressions [Golland et al. 2010]
  – Predicting pragmatic reasoning in language games [Stiller et al. 2011]
• Interpreting natural language instructions
  – Learning to read help guides [Branavan et al. 2009]
  – Learning to follow navigational directions [Vogel and Jurafsky 2010] [Artzi and Zettlemoyer 2013] [Chen and Mooney 2011] [Tellex et al. 2011]
CARDS Task
Outline
• Spatial semantics
• ListenerBot: single-agent advice taker
  – Can accept advice, never gives it
• DialogBot: multi-agent decision maker
  – Gives advice by tracking the other player's beliefs
Spatial Semantics“in the top left of the board”
“on the left side” “right in the middle”
BOARD(top;left) BOARD(left) BOARD(middle)
MaxEnt Classifier w/ Bag of Words
Estimated from Corpus Data
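The spatial semantics can be sketched as a maximum-entropy (multinomial logistic regression) classifier over bag-of-words features, mapping utterances to region predicates like BOARD(top;left). The training pairs below are hypothetical stand-ins for the corpus data the slide mentions:

```python
# Sketch, not the authors' code: MaxEnt = multinomial logistic regression;
# bag-of-words features come from CountVectorizer. The tiny corpus here is
# an invented stand-in for the annotated corpus data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

corpus = [
    ("in the top left of the board", "BOARD(top;left)"),
    ("upper left corner",            "BOARD(top;left)"),
    ("on the left side",             "BOARD(left)"),
    ("left half of the board",       "BOARD(left)"),
    ("right in the middle",          "BOARD(middle)"),
    ("center of the board",          "BOARD(middle)"),
]
texts, labels = zip(*corpus)

model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

def interpret(utterance):
    """Return a distribution over region predicates for an utterance."""
    probs = model.predict_proba([utterance])[0]
    return dict(zip(model.classes_, probs))
```

Returning a distribution rather than a single label matters later: the listener uses these probabilities as a likelihood when updating its belief over card locations.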
Complexity Ahoy
• Approximate decision making only feasible for problems with <10k states!
[Chart: candidate state-space sizes, 10² to 10¹⁰ states]
Semantic State Representation
• Divide board into 16 regions
• Cluster squares based on meanings
• Spatial semantics
• ListenerBot: single-agent advice taker
  – Can accept advice, never gives it
• DialogBot: multi-agent decision maker
  – Gives advice by tracking the other player's beliefs
Outline
Partially Observable Markov Decision Process (POMDP)
Or: An HMM you get to drive!
State space S: hidden configuration of the world
• Location of card
• Location of player

Action space A: what we can do
• Move around the board
• Search for the card

Observations Ω: sensor information + messages
• Whether we are on top of the card
• BOARD(right;top), etc.

Observation model O: sensor model
• We see the card if we search for it and are on it
• For messages, the spatial semantics classifier

Reward R(s,a): value of an action in a state
• Large reward if in the same square as the card
• Every action adds a small negative reward

Transition T(s'|a,s): dynamics of the world
• Travel actions change player location
• Card never moves

Initial belief state b₀: distribution over S
• Uniform distribution over card location
• Known initial player location
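The components above can be sketched as code; board size, numeric rewards, and grid dynamics below are assumptions for illustration, not the paper's exact values:

```python
# Sketch of the CARDS POMDP tuple (toy 4x4 grid of abstract regions).
GRID = [(r, c) for r in range(4) for c in range(4)]
ACTIONS = ["up", "down", "left", "right", "search"]
STATES = [(p, card) for p in GRID for card in GRID]   # (player, card)

STEP = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def transition(state, action):
    # Travel actions change the player location; the card never moves.
    (r, c), card = state
    if action in STEP:
        dr, dc = STEP[action]
        r, c = min(max(r + dr, 0), 3), min(max(c + dc, 0), 3)
    return ((r, c), card)

def observation(state, action):
    # We see the card only if we search while standing on its square.
    player, card = state
    return "card_here" if action == "search" and player == card else "nothing"

def reward(state, action):
    # Large reward for finding the card; small cost for every action.
    player, card = state
    return 100.0 if action == "search" and player == card else -1.0

# Initial belief: known player location, uniform over card locations.
b0 = {((0, 0), card): 1.0 / len(GRID) for card in GRID}
```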
Belief Update: Action: SEARCH, Observation: (card not here, no message)

Belief Update: Action: SEARCH, Observation: (card not here, "left side")
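Both updates illustrated above are the same Bayes rule: a failed SEARCH and a message like "left side" are each evidence that reweights the belief over card locations. A minimal sketch (the message likelihood values are hypothetical):

```python
# Bayesian belief update over card locations on a toy 4x4 board.
def update(belief, likelihood):
    """Bayes rule: b'(s) ∝ P(evidence | s) · b(s)."""
    new = {s: p * likelihood(s) for s, p in belief.items()}
    z = sum(new.values())
    return {s: p / z for s, p in new.items()}

squares = [(r, c) for r in range(4) for c in range(4)]
b = {s: 1.0 / 16 for s in squares}          # uniform prior over the card

# SEARCH at (0, 0), card not found: that square is ruled out.
b = update(b, lambda s: 0.0 if s == (0, 0) else 1.0)

# Message "left side": weight on the left two columns (assumed semantics).
b = update(b, lambda s: 0.9 if s[1] < 2 else 0.1)
```

After both updates, mass on the searched square is zero and the remaining probability concentrates on the left of the board, matching the slides' pictures.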
Decision Making
Choose a policy mapping beliefs to actions
Goal: Maximize expected reward
Solution: Perseus, an approximate value iteration algorithm [Spaan et al. 2005]
Computational complexity: PSPACE-hard!
Expected (immediate reward + future reward)
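That objective can be sketched directly: the value of a belief is the best expected immediate reward plus discounted expected future reward. Perseus approximates this with point-based backups; the naive recursion below, with an invented two-square toy task, is only feasible for tiny problems (hence the state-count worries above):

```python
# Brute-force belief-space value recursion (a sketch, not Perseus).
def value(belief, horizon, actions, step, gamma=0.95):
    """step(belief, a) -> [(outcome_prob, reward, next_belief or None)]."""
    if belief is None or horizon == 0:       # None marks "card found"
        return 0.0
    return max(
        sum(p * (r + gamma * value(b2, horizon - 1, actions, step))
            for p, r, b2 in step(belief, a))
        for a in actions)

def step(b, a):
    # Toy task: belief b = P(card at square 0); search either square.
    p_here = b if a == "search0" else 1.0 - b
    outcomes = []
    if p_here > 0:
        outcomes.append((p_here, 10.0, None))            # found the card
    if p_here < 1:
        # Not found: Bayes update rules out the searched square.
        outcomes.append((1.0 - p_here, -1.0, 0.0 if a == "search0" else 1.0))
    return outcomes
```

With a uniform belief and two steps, either search has value 0.5·10 + 0.5·(−1 + 0.95·10) = 9.25: the agent trades a guaranteed step cost against informative failure.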
• Spatial semantics
• ListenerBot: single-agent advice taker
  – Can accept advice, never gives it
• DialogBot: multi-agent decision maker
  – Gives advice by tracking the other player's beliefs
Outline
DialogBot
• (Approximately) tracks beliefs of other player
• Speech actions change beliefs of other player
• Model: Decentralized POMDP (Dec-POMDP)
  – Problem: NEXP-hard!
Top!
Each agent selects its own action
Each agent receives its own observation
Transition depends on both actions
Reward is shared between agents
Formalization of the co-operative principle
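The Dec-POMDP structure just listed can be sketched as a tuple; everything in the toy instance below is a hypothetical stand-in. Each agent selects its own action and receives its own observation, the transition depends on the joint action, and one shared reward function operationalizes the cooperative principle:

```python
# Sketch of a Dec-POMDP signature with a trivial toy instance.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class DecPOMDP:
    states: List
    actions: List                 # per-agent action set (assumed shared)
    transition: Callable          # T(s, (a1, a2)) -> s'
    observe: Callable             # O(s', (a1, a2), agent_index) -> obs
    reward: Callable              # R(s, (a1, a2)) -> float, shared by both

# Toy instance: reward only when both agents signal together.
toy = DecPOMDP(
    states=[0, 1],
    actions=["stay", "signal"],
    transition=lambda s, joint: s,
    observe=lambda s, joint, i: joint[1 - i],   # each hears the other
    reward=lambda s, joint: 1.0 if joint == ("signal", "signal") else 0.0,
)
```

Solving such models optimally is NEXP-hard, which is what motivates approximating the Dec-POMDP with a single-agent POMDP.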
Exact Multi-agent Belief Update
Approximate Multi-agent Belief Update
Single-agent POMDP Approximation
Other agent belief transition model
World transition model
Resulting POMDP state space: world states × other agent's beliefs
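A sketch of that construction (all details assumed for illustration): fold a coarse, discretized estimate of the other player's belief into the state, so a standard POMDP solver can plan speech acts that move that belief. The augmented state space is the product of world states and belief bins, which is why the resulting POMDP is so much larger than the single-agent one:

```python
# Sketch of the single-agent approximation of the Dec-POMDP.
import itertools

WORLD = [(r, c) for r in range(4) for c in range(4)]      # toy world states
BELIEF_BINS = ["uniform", "left", "right", "top", "bottom"]

# Augmented state: (world state, discretized other-agent belief).
AUG_STATES = list(itertools.product(WORLD, BELIEF_BINS))

def belief_transition(belief_bin, message):
    # Other-agent belief transition model: a region message moves the
    # listener's belief to that region (hypothetical deterministic model).
    return message if message in BELIEF_BINS else belief_bin
```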
What to say?
“Top”
“Middle”
“Right”
“Right”
Return to Grice
• Be truthful• Be relevant• Be clear• Be informative
Cooperating DialogBots
Middle of the board
Adolescent DialogBots
Top
Return to Grice
• Be truthful: DialogBot speaks with evidence
• Be relevant: DialogBot gives advice to help win the game
• Be clear
• Be informative
Experimental Results
• Evaluate pairs of agents from 197 random initial states
• Agents have 50 high-level moves to find the card

Bots                        % Success   Average High-Level Actions
ListenerBot & ListenerBot   84.4%       19.8
ListenerBot & DialogBot     87.2%       17.5
DialogBot & DialogBot       90.6%       16.6
Emergent Gricean Behavior
• Be truthful: DialogBot speaks with evidence
• Be relevant: DialogBot gives advice to help win
• Be clear: need variable costs on messages
• Be informative: requires levels of specificity
ACL 2013: Implicatures and Nested Beliefs in Approximate Decentralized-POMDPs
From joint reward, not hard-coded
Future Work: intentions, joint plans, deeper belief nesting
Thanks!