Pragmatics & Game Theory:
Learning Dynamics
Roland Mühlenbernd
WiSe 13/14
Roland Mühlenbernd Learning Dynamics
Table of Contents
1 Introduction
    Homeworks
    Review: Learning Dynamics
2 Modeling pragmatic phenomena
    Q-Implicature
    I-Implicature
    M-Implicature
Homework Question 1
What fundamental insight did evolutionary biology bring to the analysis of human relationships?
A fundamental insight from evolutionary biology is that most social relationships involve combinations of cooperation and conflict.
This insight applies to communication among organisms no less than to physical actions, and indeed animal signaling has been found to involve exploitative manipulation as well as the cooperative exchange of information.
In the human case, one has to think only of threats, dangerous secrets, contaminating leakage, and incriminating questions.
Homework Question 2
What is the advantage of bribing a police officer in an indirect way? Why is a veiled bribe a rational option even if the situation is not a legal or financial matter? What kinds of costs could be involved?
In a simple case like bribing a police officer, the appeal of a veiled bribe is intuitively clear: If some officers are corrupt and would accept the bribe, but others are honest and might arrest the driver for bribery, an indirect bribe can be detected by the corrupt cop while not being blatant enough for the honest cop to prove it beyond a reasonable doubt.
In a nonlegal situation, such as indirectly bribing a maître d' to be seated immediately in a restaurant, indirect speech can avoid a conflict between relationship types such as dominance and reciprocity.
An overt conflict of relationship types causes awkwardness, which carries social costs.
Homework Question 3
What two purposes does language serve according to Politeness Theory?
Politeness Theory proposes that language serves two purposes: to convey a proposition (e.g. a bribe, a command, an offer) and to negotiate and maintain a relationship.
People achieve these dual ends by using language at two levels. The literal form of a sentence is consistent with the safest relationship between speaker and hearer.
At the same time, by implicating a meaning between the lines, the speaker counts on the listener to infer its real intent, which may initiate a different relationship.
Homework Question 4
Name the three distinct types of human relationships according to Fiske and give a short description of each.
Alan Fiske has advanced the strong claim that human relationships in all cultures fall into only three distinct types:
The dominance or authority relationship is governed by the ethos, "Don't mess with me." It has a basis in the dominance hierarchies common in the animal kingdom, although in humans, it is based not just on brawn or seniority but on social recognition: how much others are willing to defer to you.
The communality or communal sharing relationship conforms to the ethos, "What's mine is thine; what's thine is mine."
The reciprocity or equality-matching relationship obeys the ethos, "You scratch my back; I'll scratch yours."
Homework Question 5
Consider the indirect threat: "Nice store you got here. Would be a real shame if something happened to it." What types of relationship are in conflict here? Explain.
The speaker pretends to be in a reciprocity relationship that fits the business context.
The speaker indirectly communicates a dominance relationship: "Don't mess with me."; "Do what I want, otherwise..."
If a police officer overheard the conversation, the speaker could not be accused of making a threat, as long as the wording is indirect and superficially reflects a reciprocity relationship.
Homework Question 6
What does the plausible-deniability hypothesis say about the directness of the speaker's wording? And what does the relationship-negotiation hypothesis predict about indirect speech?
The plausible-deniability hypothesis predicts that the directness of speakers' wording of a veiled bribe or other overture (assessed on linguistic grounds) is not an arbitrary social ritual, like saying "Please" and "Thank you",
but is predictable from strategic factors affecting its expected utility, such as the proportion of honest and dishonest officers in an area, the cost of a bribe, the cost of a ticket, and the cost of a bribery charge.
For the listener's part, the directness of a speech act should predict their subjective estimates of the likelihood that the speaker intended the fraught proposition as opposed to making an innocent remark.
The relationship-negotiation hypothesis predicts that indirect speech should be judged as generating less awkwardness and discomfort,
as being more respectful,
as better acknowledging the expected relationship with the hearer (such as affection, deference, or collegiality),
and as making it easier for the participants to resume their normal relationship should the offer be rebuffed.
Overview
Game Theory and Linguistics
Language Evolution
Signaling Games
GT in Lang. Use
Indirect Speech
Pragm. Reasoning
Signaling Games
IBR model SIM
Coordination & Signaling
        R    L
    R   1    0
    L   0    1

        aL   aS
    tL  1    0
    tS  0    1

Messages: One or two lanterns?

    s1: tL → m1, tS → m2
    s2: tL → m2, tS → m1
    s3: tL → m1, tS → m1
    s4: tL → m2, tS → m2

    r1: m1 → aL, m2 → aS
    r2: m1 → aS, m2 → aL
    r3: m1 → aL, m2 → aL
    r4: m1 → aS, m2 → aS
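A signaling game is a tuple $SG = \langle \{S, R\}, T, Pr, M, A, U \rangle$.
A Lewis game is defined by:
$T = \{t_L, t_S\}$
$M = \{m_1, m_2\}$
$A = \{a_L, a_S\}$
$Pr(t_L) = Pr(t_S) = .5$
$U(t_i, a_j) = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{else} \end{cases}$

        aL   aS
    tL  1    0
    tS  0    1

[Figure: extensive form of the Lewis game. Nature N chooses tL or tS with probability .5 each, the sender S chooses m1 or m2, and the receiver R chooses aL or aS. The payoffs for (aL, aS) are (1, 0) after tL and (0, 1) after tS.]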
Pure strategies
Pure strategies are contingency plans according to which players act.
sender strategy: s : T → M
receiver strategy: r : M → A
    s1: tL → m1, tS → m2
    s2: tL → m2, tS → m1
    s3: tL → m1, tS → m1
    s4: tL → m2, tS → m2

    r1: m1 → aL, m2 → aS
    r2: m1 → aS, m2 → aL
    r3: m1 → aL, m2 → aL
    r4: m1 → aS, m2 → aS
Signaling Systems
Signaling systems are combinations of pure strategies. The Lewis game has two: $L_1 = \langle s_1, r_1 \rangle$ and $L_2 = \langle s_2, r_2 \rangle$.

    L1: tL → m1 → aL, tS → m2 → aS
    L2: tL → m2 → aL, tS → m1 → aS

Signaling systems are strict Nash equilibria of the EU table:

        r1   r2   r3   r4
    s1  1    0    .5   .5
    s2  0    1    .5   .5
    s3  .5   .5   .5   .5
    s4  .5   .5   .5   .5

In signaling systems, messages associate states and actions uniquely.
Signaling systems constitute evolutionarily stable states.
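The EU table can be recomputed from the definition of the Lewis game. The following is a minimal sketch, not the lecture's own code: the dictionary encoding and the function names are assumptions, and so is the choice that s3 and s4 pool on m1 and m2 while r3 and r4 always pick aL and aS (the EU values do not depend on which pooling strategy gets which label).

```python
# Prior over states: Pr(tL) = Pr(tS) = .5
PRIOR = {"tL": 0.5, "tS": 0.5}

# Pure strategies as dictionaries (assumed labelling of the pooling strategies).
SENDERS = {
    "s1": {"tL": "m1", "tS": "m2"},
    "s2": {"tL": "m2", "tS": "m1"},
    "s3": {"tL": "m1", "tS": "m1"},
    "s4": {"tL": "m2", "tS": "m2"},
}
RECEIVERS = {
    "r1": {"m1": "aL", "m2": "aS"},
    "r2": {"m1": "aS", "m2": "aL"},
    "r3": {"m1": "aL", "m2": "aL"},
    "r4": {"m1": "aS", "m2": "aS"},
}


def utility(state, action):
    """U(t_i, a_j) = 1 if i = j, else 0 (the index is the trailing letter)."""
    return 1.0 if state[1:] == action[1:] else 0.0


def expected_utility(s, r):
    """Expected utility of the pure strategy pair (s, r) under the prior."""
    return sum(PRIOR[t] * utility(t, r[s[t]]) for t in PRIOR)


# EU table indexed by (sender name, receiver name).
EU = {
    (sn, rn): expected_utility(s, r)
    for sn, s in SENDERS.items()
    for rn, r in RECEIVERS.items()
}


def strict_nash(eu):
    """Pairs where any unilateral deviation strictly lowers the shared payoff."""
    result = []
    for (sn, rn), value in eu.items():
        sender_ok = all(value > eu[(o, rn)] for o in SENDERS if o != sn)
        receiver_ok = all(value > eu[(sn, o)] for o in RECEIVERS if o != rn)
        if sender_ok and receiver_ok:
            result.append((sn, rn))
    return result


EQUILIBRIA = strict_nash(EU)
```

Because both players receive the same payoff in this game, a single table suffices to check deviations by either player.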
Behavioral Strategies
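Behavioral strategies are functions that map choice points to probability distributions over the actions available at that choice point.
behavioral sender strategy $\sigma : T \to \Delta(M)$
behavioral receiver strategy $\rho : M \to \Delta(A)$
$\sigma = \begin{bmatrix} t_1 \mapsto \begin{bmatrix} m_1 \mapsto .9 \\ m_2 \mapsto .1 \end{bmatrix} \\ t_2 \mapsto \begin{bmatrix} m_1 \mapsto .5 \\ m_2 \mapsto .5 \end{bmatrix} \end{bmatrix}$
$\rho = \begin{bmatrix} m_1 \mapsto \begin{bmatrix} a_1 \mapsto .33 \\ a_2 \mapsto .67 \end{bmatrix} \\ m_2 \mapsto \begin{bmatrix} a_1 \mapsto 1 \\ a_2 \mapsto 0 \end{bmatrix} \end{bmatrix}$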
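One possible way to represent such strategies in code is as nested dictionaries, one probability distribution per choice point. This is only a sketch under that assumption; the helper names `is_behavioral` and `sample` are hypothetical and not part of the lecture.

```python
import random

# The example strategies from the slide, one distribution per choice point.
sigma = {"t1": {"m1": 0.9, "m2": 0.1}, "t2": {"m1": 0.5, "m2": 0.5}}
rho = {"m1": {"a1": 0.33, "a2": 0.67}, "m2": {"a1": 1.0, "a2": 0.0}}


def is_behavioral(strategy, tol=1e-9):
    """Check that every choice point maps to a probability distribution."""
    return all(
        all(p >= 0 for p in dist.values())
        and abs(sum(dist.values()) - 1.0) <= tol
        for dist in strategy.values()
    )


def sample(strategy, choice_point, rng=random):
    """Draw one option at the given choice point according to the strategy."""
    dist = strategy[choice_point]
    options = list(dist)
    weights = [dist[o] for o in options]
    return rng.choices(options, weights=weights, k=1)[0]
```

For instance, `sample(sigma, "t1")` returns m1 in roughly nine out of ten draws, while `sample(rho, "m2")` always returns a1.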
Learning Dynamics & Signaling Games
Extensions in time:
agents play the game repeatedly
agents' decisions are influenced by previous encounters
application of learning dynamics like
    reinforcement learning
    belief learning
Best Response & Expected Utility
Playing Best Response means making a choice that maximizes the Expected Utility.
$EU_S(m \mid t, \beta) = \sum_{a \in A} \beta(a \mid m) \times U(t, a)$ (1)
$EU_R(a \mid m, \beta) = \sum_{t \in T} \beta(m \mid t) \times U(t, a)$ (2)
How does an agent get belief β?
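Equations (1) and (2) translate directly into two small functions. The sketch below assumes beliefs are stored as nested dictionaries and uses the Lewis utility (1 if the indices match, 0 otherwise); the belief values are illustrative, and all names are assumptions rather than the lecture's code.

```python
STATES = ["t1", "t2"]
MESSAGES = ["m1", "m2"]
ACTIONS = ["a1", "a2"]


def utility(state, action):
    """Lewis utility: 1 if state and action carry the same index, else 0."""
    return 1.0 if state[1:] == action[1:] else 0.0


def eu_sender(message, state, belief):
    """Equation (1); belief[m][a] is the sender's belief beta(a | m)."""
    return sum(belief[message][a] * utility(state, a) for a in ACTIONS)


def eu_receiver(action, message, belief):
    """Equation (2); belief[t][m] is the receiver's belief beta(m | t)."""
    return sum(belief[t][message] * utility(t, action) for t in STATES)


# Illustrative beliefs (assumed values).
sender_belief = {"m1": {"a1": 0.8, "a2": 0.2}, "m2": {"a1": 0.35, "a2": 0.65}}
receiver_belief = {"t1": {"m1": 0.6, "m2": 0.4}, "t2": {"m1": 0.0, "m2": 1.0}}

sender_eu = {
    (t, m): eu_sender(m, t, sender_belief) for t in STATES for m in MESSAGES
}
receiver_eu = {
    (m, a): eu_receiver(a, m, receiver_belief) for m in MESSAGES for a in ACTIONS
}
```

With these beliefs, the sender's expected utility in t1 is higher for m1 than for m2, so m1 is the best response in t1.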
Belief Learning
The belief is a result of observation
Example:
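    SO   a1   a2
    m1   8    2
    m2   7    13

$\beta = \begin{bmatrix} m_1 \mapsto \begin{bmatrix} a_1 \mapsto .8 \\ a_2 \mapsto .2 \end{bmatrix} \\ m_2 \mapsto \begin{bmatrix} a_1 \mapsto .35 \\ a_2 \mapsto .65 \end{bmatrix} \end{bmatrix}$

    RO   t1   t2
    m1   6    0
    m2   4    4

$\beta = \begin{bmatrix} t_1 \mapsto \begin{bmatrix} m_1 \mapsto .6 \\ m_2 \mapsto .4 \end{bmatrix} \\ t_2 \mapsto \begin{bmatrix} m_1 \mapsto 0 \\ m_2 \mapsto 1 \end{bmatrix} \end{bmatrix}$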
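The beliefs in this example are the relative frequencies of the observation counts. A minimal sketch of that normalization follows; the function names are hypothetical, and falling back to a uniform distribution when nothing has been observed yet is an assumption not stated in the example.

```python
def normalize(counts):
    """Turn observation counts into relative frequencies.

    Assumption: with no observations at all, fall back to a uniform belief.
    """
    total = sum(counts.values())
    if total == 0:
        return {k: 1.0 / len(counts) for k in counts}
    return {k: v / total for k, v in counts.items()}


def belief_from_observations(observations):
    """Normalize the counts separately for each choice point."""
    return {point: normalize(counts) for point, counts in observations.items()}


# SO: how often each action followed each message (sender's observations).
sender_obs = {"m1": {"a1": 8, "a2": 2}, "m2": {"a1": 7, "a2": 13}}
# RO: how often each message was sent in each state, stored per state.
receiver_obs = {"t1": {"m1": 6, "m2": 4}, "t2": {"m1": 0, "m2": 4}}

sender_belief = belief_from_observations(sender_obs)
receiver_belief = belief_from_observations(receiver_obs)
```

Note that the RO table lists messages in rows and states in columns, so the receiver's counts are normalized per column, that is, per state.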
Best Response as Behavioural Strategy
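behavioural sender strategy $\sigma : T \to \Delta(M)$
$\sigma(m \mid t) = \begin{cases} \frac{1}{|BR(t)|} & \text{if } m \in BR(t) \\ 0 & \text{else} \end{cases}$
behavioural receiver strategy $\rho : M \to \Delta(A)$
$\rho(a \mid m) = \begin{cases} \frac{1}{|BR(m)|} & \text{if } a \in BR(m) \\ 0 & \text{else} \end{cases}$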
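The definition above spreads probability uniformly over all best responses, which matters when several options tie. A sketch for the sender side, assuming the Lewis utility and beliefs stored as nested dictionaries; the names and the two example beliefs are illustrative assumptions.

```python
ACTIONS = ["a1", "a2"]


def utility(state, action):
    """Lewis utility: 1 if state and action carry the same index, else 0."""
    return 1.0 if state[1:] == action[1:] else 0.0


def make_sender_eu(belief):
    """Return EU_S(m | t, belief), with belief[m][a] read as beta(a | m)."""
    def eu(message, state):
        return sum(belief[message][a] * utility(state, a) for a in ACTIONS)
    return eu


def best_response_strategy(choice_points, options, eu, tol=1e-12):
    """Uniform distribution over the best responses at each choice point."""
    strategy = {}
    for point in choice_points:
        values = {o: eu(o, point) for o in options}
        best = max(values.values())
        br = [o for o in options if best - values[o] <= tol]
        strategy[point] = {o: (1.0 / len(br) if o in br else 0.0) for o in options}
    return strategy


# A belief under which both messages are equally good in every state ...
uninformed = {"m1": {"a1": 0.5, "a2": 0.5}, "m2": {"a1": 0.5, "a2": 0.5}}
# ... and one under which exactly one message is optimal per state.
informed = {"m1": {"a1": 1.0, "a2": 0.0}, "m2": {"a1": 0.0, "a2": 1.0}}

sigma_tie = best_response_strategy(["t1", "t2"], ["m1", "m2"], make_sender_eu(uninformed))
sigma_pure = best_response_strategy(["t1", "t2"], ["m1", "m2"], make_sender_eu(informed))
```

The receiver side works the same way, with messages as choice points and actions as options.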
Reinforcement Learning
[Figure: urn diagram for sender S and receiver R over the states ts, tg, the messages m1, m2 and the actions as, ag.]
the sender has an urn for each state t ∈ T
each urn contains balls of each message m ∈ M
the sender decides by drawing from the urn for the current state t
the receiver has an urn for each message m ∈ M
each urn contains balls of each action a ∈ A
the receiver decides by drawing from the urn for the received message m
successful communication → urn update
in general a signaling system emerges over time
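The urn scheme can be sketched in a few lines. The version below is an illustration under assumptions that the slide does not fix: every urn starts with one ball per option, a successful round adds exactly one ball to the urns that were used, states are equiprobable, and success means that the state and action indices match. All names are hypothetical.

```python
import random

STATES = ["t1", "t2"]
MESSAGES = ["m1", "m2"]
ACTIONS = ["a1", "a2"]


def make_urns(choice_points, options, initial=1):
    """One urn per choice point, holding `initial` balls of every option."""
    return {c: {o: initial for o in options} for c in choice_points}


def draw(urn, rng):
    """Draw a ball; the probability is proportional to the number of balls."""
    options = list(urn)
    return rng.choices(options, weights=[urn[o] for o in options], k=1)[0]


def simulate(rounds, seed=0):
    """Play the game repeatedly and reinforce the urns after each success."""
    rng = random.Random(seed)
    sender = make_urns(STATES, MESSAGES)
    receiver = make_urns(MESSAGES, ACTIONS)
    successes = 0
    for _ in range(rounds):
        t = rng.choice(STATES)          # nature picks a state
        m = draw(sender[t], rng)        # sender draws from the urn for t
        a = draw(receiver[m], rng)      # receiver draws from the urn for m
        if t[1:] == a[1:]:              # successful communication
            sender[t][m] += 1           # urn update for the sender
            receiver[m][a] += 1         # urn update for the receiver
            successes += 1
    return sender, receiver, successes
```

Calling `simulate(1000)` and inspecting the returned urns shows which message each state has come to favor; whether and how fast a signaling system emerges depends on the random draws.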
Behavioural & Pure Strategies
Pure strategies are a subset of behavioural strategies.
Example:
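    σ2: t1 → m2, t2 → m1
    ρ2: m1 → a2, m2 → a1

$\sigma_2 = \begin{bmatrix} t_1 \mapsto \begin{bmatrix} m_1 \mapsto 0 \\ m_2 \mapsto 1 \end{bmatrix} \\ t_2 \mapsto \begin{bmatrix} m_1 \mapsto 1 \\ m_2 \mapsto 0 \end{bmatrix} \end{bmatrix}$
$\rho_2 = \begin{bmatrix} m_1 \mapsto \begin{bmatrix} a_1 \mapsto 0 \\ a_2 \mapsto 1 \end{bmatrix} \\ m_2 \mapsto \begin{bmatrix} a_1 \mapsto 1 \\ a_2 \mapsto 0 \end{bmatrix} \end{bmatrix}$
Note: If an agent plays $\sigma_2$ as sender and $\rho_2$ as receiver, we say that the agent has learned the signaling language $L_2 = \langle \sigma_2, \rho_2 \rangle$.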
Extensions in Time and Space
Extensions in time and space:
agents are placed in a network structure
agents play the game with direct neighbors
agents play both as sender and receiver
agents play the game repeatedly
agents' decisions are influenced by previous encounters:
implementation of learning dynamics
    best response + belief learning
    reinforcement learning
Example: Result in a SW network
Figure: Resulting structure after 30 simulation steps of 100 BL agents playing the Lewis game on a SW network. The colours blue and green represent the two signaling systems as target strategies.
Example: Result in a SW network
Figure: Resulting structure after 300 simulation steps of 100 RL agents playing the Lewis game (with lateral inhibition) on a SW network. The colours blue and green represent the two signaling systems as target strategies.
Belief Learning vs. Reinforcement Learning

               behavioural   rational   learning speed
    BL + BR    √             √          fast
    RL         √             -          slow
Neo-Gricean Pragmatics
A conversational implicature is a pragmatic phenomenon in which an utterance's intended meaning differs from its literal meaning.
Interlocutors can resolve the difference between the intended pragmatic interpretation (PI) and the literal interpretation (LI) by means of cooperation principles.
Levinson (2000) subdivides generalized conversational implicatures (GCIs) into:
Q-Implicature
I-Implicature
M-Implicature
Q-Implicature
(1) "Some boys came to the party."
LI: Some, maybe all boys came. $\exists = \exists\neg\forall \vee \forall$
PI: Some but not all boys came. $\exists\neg\forall$
[Figure: strategy diagrams for LI and for PI over the states t∀, t∃¬∀, the messages mall, msome, msbna and the actions a∀, a∃¬∀.]
Modelling Q-implicature
Parameter settings:
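$T = \{t_\forall, t_{\exists\neg\forall}\}$
$M = \{m_{all}, m_{some}, m_{sbna}\}$
$A = \{a_\forall, a_{\exists\neg\forall}\}$
$Pr(t_\forall) = Pr(t_{\exists\neg\forall}) = .5$
$\kappa(m_{sbna}) = 1$
$\kappa(m_{all}) = \kappa(m_{some}) > 1$
Initial LI strategy
[Figure: diagram of the initial LI strategy over t∀, t∃¬∀ and mall, msome, msbna, with six arrows labelled .5.]

    sender urn     mall   msome   msbna
    t∀             50     50      0
    t∃¬∀           0      50      50

    receiver urn   a∀     a∃¬∀
    mall           100    0
    msome          50     50
    msbna          0      100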
Simulation & Results
200 RL agents play the Q-Implicature game repeatedly on a total network with random partners
all agents start with the initial urn setting that represents LI
the simulation ends when all agents have learned a pure strategy
Results:
[Figure: percentage of agents (y-axis) for each of two resulting strategies, shown as strategy diagrams over t∀, t∃¬∀ and mall, msome, msbna, plotted against κ(msome), κ(mall) from 1 to 5 (x-axis).]
I-Implicature
"What is expressed simply is stereotypically exemplified"
(2) "Billy drank a glass of milk."
LI: A glass of any kind of milk. $t_c, t_g$
PI: A glass of cow's milk. $t_c$
[Figure: strategy diagrams for LI and for PI over the states tc, tg, the messages mcm, mm, mgm and the actions ac, ag.]
Modelling I-implicature
Parameter settings:
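$T = \{t_c, t_g\}$
$M = \{m_m, m_{cm}, m_{gm}\}$
$A = \{a_c, a_g\}$
$Pr(t_c) = .8 > Pr(t_g) = .2$
$\kappa(m_m) = 2$
$\kappa(m_{cm}) = \kappa(m_{gm}) = 1$
Initial LI strategy
[Figure: diagram of the initial LI strategy over tc, tg, the messages mcm, mm, mgm and the actions ac, ag, with arrows labelled .5, 1 − p and p.]

    sender urn     mcm       mm    mgm
    tc             100 − n   n     0
    tg             0         n     100 − n

for $n = \lfloor 100 \times p \rfloor$

    receiver urn   ac     ag
    mcm            100    0
    mm             50     50
    mgm            0      100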
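The urn contents above are fully determined by p. A small sketch that builds them, assuming urns are stored as nested dictionaries; the function name `initial_urns` and the small tolerance added before taking the floor (a guard against floating-point error in 100 × p) are assumptions of this illustration.

```python
import math


def initial_urns(p):
    """Initial LI urn setting of the I-implicature game for a given p."""
    # Assumption: a tiny tolerance guards against 100 * p landing just
    # below an integer because of floating-point representation.
    n = math.floor(100 * p + 1e-9)
    sender = {
        "tc": {"mcm": 100 - n, "mm": n, "mgm": 0},
        "tg": {"mcm": 0, "mm": n, "mgm": 100 - n},
    }
    receiver = {
        "mcm": {"ac": 100, "ag": 0},
        "mm": {"ac": 50, "ag": 50},
        "mgm": {"ac": 0, "ag": 100},
    }
    return sender, receiver


sender_urns, receiver_urns = initial_urns(0.5)
```

Every urn holds 100 balls in total, so dividing the counts by 100 gives the corresponding initial behavioural strategy.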
Simulation & Results
200 RL agents play the I-Implicature game repeatedly on a total network with random partners
all agents start with the initial urn setting that represents LI
the simulation ends when all agents have learned a pure strategy
Results:
[Figure: percentage of agents (y-axis) for each of two resulting strategies, shown as strategy diagrams over tc, tg and mcm, mm, mgm, plotted against p from .3 to .6 (x-axis).]
M-Implicature
"What's said in an abnormal way isn't normal."
(3) "Billy caused the sheriff to die."
LI: Billy killed the sheriff in any way. $t_p, t_r$
PI: Billy killed the sheriff in an abnormal way. $t_r$
[Figure: strategy diagrams for LI and for PI over the states tp, tr, the messages mk, mctd and the actions ap, ar.]
Modelling the M-implicature
Parameter settings:
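$T = \{t_p, t_r\}$
$M = \{m_k, m_{ctd}\}$
$A = \{a_p, a_r\}$
$\kappa(m_k) = 2$, $\kappa(m_{ctd}) = 1$
$Pr(t_p) > Pr(t_r)$
Initial LI strategy
[Figure: diagram of the initial LI strategy over the states tp, tr, the messages mk, mctd and the actions ap, ar.]

    sender urn     mk    mctd
    tp             50    50
    tr             50    50

    receiver urn   ap    ar
    mk             50    50
    mctd           50    50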
Simulation & Results
200 RL agents play the M-Implicature game repeatedly on a total network with random partners
all agents start with the initial urn setting that represents LI
the simulation ends when all agents have learned a pure strategy
Results:
[Figure: percentage of agents (y-axis) for each of two resulting strategies, shown as strategy diagrams over tp, tr, mk, mctd and ap, ar, plotted against Pr(tp) from .51 to .57 (x-axis).]
Conclusion
1 Analysis of the dynamics of language change and the conventionalization of linguistic behavior by
    applying evolutionary and learning dynamics to repeated signaling games
    on players (= agents) placed in a population structure
2 Concrete experiments for Q-, I- and M-implicature showed that agents that start with a literal communication strategy stabilize on the pragmatic one for most of the parameter space
3 Results reveal that pragmatic behavior can be explained by rational deliberation (IBR) as well as by population dynamics (RL, BL)
4 Results highlight the universal power of pragmatic communication behavior as a way to maximize the efficiency of communication