Cooperation and Reputation


Vincent Traag

June 29, 2010


Outline

1. Introduction

2. Cooperative Mechanisms

3. Indirect Reciprocity

4. Proposed model


Cooperation

Cooperation (and defection)

• Organizations (also Wikipedia, open source software, . . . )
  ◮ Why do people contribute?

• Worker ants in colonies
  ◮ Why do workers help without individual benefit?

• Prudent parasites in hosts
  ◮ Why do parasites not replicate faster?

• Human body
  ◮ Why do cells not replicate faster?

Central question

If defecting (not cooperating) is a real option, why (and how) has cooperation evolved?


Formal cooperation (and defection)

Prisoner’s Dilemma

• The game has two options: donating or not donating.

• Donating costs the donor c > 0 and gives someone else a benefit b > c.

• Agents are paired, and play a round of donating or not.

• Cooperators C donate, defectors D do not donate.

This can be summarized in the payoff matrix

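$$A = \begin{array}{c|cc} & C & D \\ \hline C & b - c & -c \\ D & b & 0 \end{array}$$

(rows: own strategy; columns: opponent's strategy)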

Defectors dominate

Whatever strategy you encounter (C or D), it is always better to defect.


Evolutionary Stability (static)

Definition (Nash equilibrium)

Strategy i is a Nash equilibrium if A_ii ≥ A_ji for all j,

and a strict Nash equilibrium if A_ii > A_ji for all j ≠ i. Players cannot benefit by switching away from strategy i if it is a Nash equilibrium.

Definition (ESS)

Strategy i is an Evolutionarily Stable Strategy (ESS) if, for all j ≠ i,

A_ii > A_ji or (A_ii = A_ji and A_ij > A_jj).

A population of players with strategy i cannot be ‘invaded’ by a small number of players with a different strategy.

Strict Nash ⟹ ESS ⟹ Nash
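The definitions above can be checked mechanically for pure strategies. The following is a minimal sketch, assuming the donation-game matrix from the earlier slide with illustrative values b = 3 and c = 1 (these numbers and the function names are not from the talk).

```python
def is_nash(A, i):
    """Pure strategy i is Nash if A[i][i] >= A[j][i] for every j."""
    return all(A[i][i] >= A[j][i] for j in range(len(A)))


def is_strict_nash(A, i):
    """Strict version: A[i][i] > A[j][i] for every j != i."""
    return all(A[i][i] > A[j][i] for j in range(len(A)) if j != i)


def is_ess(A, i):
    """ESS: A_ii > A_ji, or (A_ii == A_ji and A_ij > A_jj), for every j != i."""
    for j in range(len(A)):
        if j == i:
            continue
        strictly_better = A[i][i] > A[j][i]
        tie_but_resists = A[i][i] == A[j][i] and A[i][j] > A[j][j]
        if not (strictly_better or tie_but_resists):
            return False
    return True


b, c = 3.0, 1.0                      # assumed illustrative values, b > c > 0
C, D = 0, 1                          # row/column indices
A = [[b - c, -c],
     [b, 0.0]]

defect_is_ess = is_ess(A, D)         # defection resists invasion
cooperate_is_nash = is_nash(A, C)    # cooperation does not
```

With these values defection is strict Nash (and hence ESS), while cooperation is not even Nash, which is the "defectors dominate" statement in matrix form.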


Mixed strategies


• There are n different ‘pure’ strategies (e.g. Cooperate, Defect).

• Mixed strategy p is: play ‘pure’ strategy i with probability pi .

• Average payoff for ‘pure’ strategy i versus p is then (Ap)i .

• Average payoff for mixed strategy q versus p is then q⊺Ap.

Stability revisited

Strategy p is

• (Strict) Nash if p⊺Ap ≥ q⊺Ap for all q (strict: p⊺Ap > q⊺Ap for all q ≠ p).

• ESS if, for all q ≠ p, p⊺Ap > q⊺Ap, or p⊺Ap = q⊺Ap and p⊺Aq > q⊺Aq.

There always exists a mixed strategy Nash equilibrium.


Dynamical View

• Natural to model game dynamics in an evolutionary context.

• Survival of the fittest (fitness = payoff).

Definition (Replicator equation)

Population with i = 1, . . . , n different mixed strategies p_i:

• $x_i$: relative abundance (frequency)

• $\bar{p} = \sum_i p_i x_i$: average strategy

• $f_i = p_i^\top A \bar{p}$: expected payoff

• $\bar{f} = \bar{p}^\top A \bar{p}$: average payoff

Evolution of the population is given by

$$\dot{x}_i = x_i (f_i - \bar{f}) = x_i \left( (p_i - \bar{p})^\top A \bar{p} \right).$$
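As an illustration, the replicator equation can be integrated numerically. This is a minimal sketch, assuming pure strategies only (so the average strategy equals the frequency vector x), the donation-game matrix with illustrative values b = 3 and c = 1, and a plain explicit Euler step. The step size and starting point are arbitrary choices.

```python
def replicator_step(x, A, dt):
    """One explicit Euler step of dx_i/dt = x_i (f_i - f_bar) for pure strategies."""
    n = len(x)
    # Expected payoff of each pure strategy against the current population.
    f = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    # Population-average payoff.
    f_bar = sum(x[i] * f[i] for i in range(n))
    return [x[i] + dt * x[i] * (f[i] - f_bar) for i in range(n)]


b, c = 3.0, 1.0                 # assumed illustrative values
A = [[b - c, -c],               # row 0: cooperator
     [b, 0.0]]                  # row 1: defector

x = [0.9, 0.1]                  # start with 90% cooperators
for _ in range(2000):           # integrate up to t = 20
    x = replicator_step(x, A, dt=0.01)
```

Even when starting from a large cooperative majority, the cooperator frequency `x[0]` decays towards zero, as expected when defection dominates.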


Stability (dynamic)

Fixed points

• Total population is always $\sum_i x_i = 1$.

• Dynamics are restricted to the unit simplex Sn.

• At a fixed point x∗, $p_i^\top A \bar{p} = \bar{p}^\top A \bar{p}$ for all i with $x_i > 0$.

Nash and ESS vs. fixed points

• If x∗ is (strictly) Nash, then it is a (stable) fixed point.

• If the fixed point x∗ is stable, it is a Nash equilibrium.

• If x∗ is an ESS, then it is a stable fixed point.

• An interior ESS x∗ is globally stable.


Overview

What are possible mechanisms to get cooperation?

Payoff matrix

$$A = \begin{array}{c|cc} & C & D \\ \hline C & b - c & -c \\ D & b & 0 \end{array}$$

Mechanisms

• Kin selection (r > c/b)

Cooperate because your offspring benefit from your cooperation. Basis of the ‘selfish gene’, or ‘inclusive fitness’.

• Direct reciprocity (w > c/b)

Cooperate because of possible future payoffs.

• Indirect reciprocity (q > c/b)

Cooperate because someone else may cooperate with you in the future.


Kin selection

Kin and gene

• Focus is on the gene: how can the gene spread?

• If the coefficient of kinship r > c/b, the cooperative gene will spread.

Game theoretic dynamic view

• Let 0 ≤ r ≤ 1 be the assortativity.

• Average payoff (cooperators x , defectors 1 − x)

fC (x) = r(b − c) + (1 − r) (x(b − c) − (1 − x)c)

fD(x) = (1 − r)xb

• Dynamics $\dot{x} = x(1 - x)(f_C - f_D)$; x∗ = 1 is stable if r > c/b.
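The stability condition follows because the payoff gap f_C − f_D does not depend on x: expanding the two expressions gives f_C − f_D = rb − c, which is positive exactly when r > c/b. A small numerical check, assuming illustrative values b = 3, c = 1 and r = 0.5 (not from the talk):

```python
def f_C(x, r, b, c):
    """Average payoff of a cooperator at cooperator frequency x."""
    return r * (b - c) + (1 - r) * (x * (b - c) - (1 - x) * c)


def f_D(x, r, b):
    """Average payoff of a defector at cooperator frequency x."""
    return (1 - r) * x * b


b, c, r = 3.0, 1.0, 0.5          # assumed illustrative values; here r > c/b
# Payoff gap evaluated on a grid of frequencies x = 0.0, 0.1, ..., 1.0.
gaps = [f_C(k / 10, r, b, c) - f_D(k / 10, r, b) for k in range(11)]
expected_gap = r * b - c          # closed form of f_C - f_D
```

Every entry of `gaps` equals `expected_gap` up to rounding, so the sign of rb − c alone decides whether cooperators grow.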


Kin selection

Change in payoff

• Average payoff (cooperators x , defectors 1 − x)

fC (x) = r(b − c) + (1 − r) (x(b − c) − (1 − x)c)

fD(x) = (1 − r)xb

• Gives payoff matrix

$$A = \begin{array}{c|cc} & C & D \\ \hline C & b - c & rb - c \\ D & (1 - r)b & 0 \end{array}$$

• Cooperation is ESS if b − c > (1 − r)b, hence if r > c/b.


Reciprocity

Cooperate because of possible future rewards.

Iterated Prisoner’s Dilemma

• Play the PD game multiple times.

• Usually there is a probability w of playing another round.

• Huge number of possible strategies.

• No definite ESS.

Framework

• Play on average k = 1/(1 − w) rounds, then apply selection.

• Expected payoff aij of strategy i vs j .

• Then apply earlier framework (ESS, replicator).


Some strategies

Example (Always)

Defect/cooperate on all rounds

Other CDDDDCC

AllD DDDDDDD

AllC CCCCCCC

Example (Win-Stay, Lose-Shift)

Change strategy if losing, keep it otherwise.

Other CDDDDCC

WSLS CCDCDCC

Example (Tit-for-tat)

Start cooperating, then repeat opponent.

Other CDDDDCC

TFT CCDDDDC

Example (Generous Tit-for-tat)

As TFT, but cooperates after defection with probability p.

Other CDDDDCC

GTFT CCDDCDC
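The deterministic example sequences above can be reproduced with a few lines of code. This is a minimal sketch. It assumes that "winning" for WSLS means the opponent cooperated in that round (the usual reading, but not spelled out on the slide), and the function names are illustrative.

```python
def tit_for_tat(opponent_moves):
    """Cooperate first, then copy the opponent's previous move."""
    moves = []
    for t in range(len(opponent_moves)):
        moves.append("C" if t == 0 else opponent_moves[t - 1])
    return "".join(moves)


def win_stay_lose_shift(opponent_moves):
    """Start with C; keep the current move after a win, switch after a loss."""
    moves = []
    current = "C"
    for other in opponent_moves:
        moves.append(current)
        won = other == "C"   # assumption: a round is 'won' if the opponent cooperated
        if not won:
            current = "D" if current == "C" else "C"
    return "".join(moves)


other = "CDDDDCC"
tft = tit_for_tat(other)             # "CCDDDDC"
wsls = win_stay_lose_shift(other)    # "CCDCDCC"
```

GTFT is left out because it is stochastic: its sequence depends on the random draws made with probability p.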


Stability of reciprocity (TFT)

TFT vs. AllD

• Against AllD, TFT cooperates in the first round and defects in all subsequent rounds.

• Expected payoff matrix

$$A = \begin{array}{c|cc} & \text{TFT} & \text{AllD} \\ \hline \text{TFT} & (b - c)/(1 - w) & -c \\ \text{AllD} & b & 0 \end{array}$$

• TFT is ESS when (b − c)/(1 − w) > b, or w > c/b.

TFT vs. AllC

• TFT is neutral vs AllC, neither is ESS.

• Expected payoff always (b − c)/(1 − w) for both TFT and AllC.
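The threshold w > c/b can be checked directly from the expected payoff matrix. A minimal sketch, assuming illustrative values b = 3 and c = 1 (so c/b = 1/3); the function names are hypothetical.

```python
def expected_payoffs(b, c, w):
    """Expected total payoffs over a game with continuation probability w."""
    rounds = 1.0 / (1.0 - w)              # average number of rounds
    return {
        ("TFT", "TFT"): (b - c) * rounds,  # mutual cooperation every round
        ("TFT", "AllD"): -c,               # exploited once, then mutual defection
        ("AllD", "TFT"): b,                # exploits once, then mutual defection
        ("AllD", "AllD"): 0.0,
    }


def tft_resists_alld(b, c, w):
    """TFT is ESS against AllD when A[TFT, TFT] > A[AllD, TFT]."""
    A = expected_payoffs(b, c, w)
    return A[("TFT", "TFT")] > A[("AllD", "TFT")]


b, c = 3.0, 1.0                            # assumed illustrative values
above_threshold = tft_resists_alld(b, c, w=0.5)   # w > c/b
below_threshold = tft_resists_alld(b, c, w=0.2)   # w < c/b
```

With w = 0.5 the TFT population resists AllD, while with w = 0.2 a lone defector earns more than the resident TFT players.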


Cyclic behaviour

Weaknesses of TFT

• TFT population can drift towards AllC.

• TFT does not restore cooperation after errors:

TFT CCDCDCDD

TFT CCCDCDDD

• Generous TFT (GTFT) sometimes cooperates unreciprocally.

• GTFT can correct errors but is still neutral vs AllC.

[Figure: cycle diagram linking TFT, GTFT, AllC and AllD]


Introduction

Why are kin selection and direct reciprocity not sufficient?

Insufficient explanation

• Humans cooperate also with non-kin.

• Humans cooperate in non-iterative situations.

Indirect reciprocity

• Cooperate with those who have cooperated with others in the past.

• Brings reputation into play.

• How to respond to reputation?

• How to determine new reputation?


Indirect Reciprocity

Cooperate because others will return the favor.

Reputation

• Cooperation increases reputation, defection decreases it.

• Cooperate with those who have a good reputation.

• Defect against those who have a bad reputation.

Action and assessment

• Many other possible interactions between cooperation and reputation.

• Should it be ‘bad’ or ‘good’ to cooperate with ‘bad’ agents?

• Should you cooperate only to increase your own reputation?


Image score

Definition (Image score, reputation)

• Integer status −5 ≤ Si ≤ 5 known to all.

• Cooperating increases the status by 1.

• Defecting decreases the status by 1.

Definition (Discriminator Strategy)

• Cooperative threshold −5 ≤ kj ≤ 6.

• If status Si ≥ kj cooperate, otherwise defect.

• Strategy kj = −5 corresponds to AllC.

• Strategy kj = 6 corresponds to AllD.
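The two definitions above translate into a pair of small functions. This is a minimal sketch. Keeping the score inside [−5, 5] by clamping is an assumption about how the bounds are enforced, and the function names are illustrative.

```python
S_MIN, S_MAX = -5, 5   # image score bounds from the definition


def donor_cooperates(threshold_k, recipient_score):
    """Discriminator with threshold k helps a recipient whose score is at least k."""
    return recipient_score >= threshold_k


def update_score(donor_score, cooperated):
    """Raise the donor's score by 1 after cooperating, lower it by 1 after defecting.

    Assumption: the score is clamped so it never leaves [S_MIN, S_MAX].
    """
    step = 1 if cooperated else -1
    return max(S_MIN, min(S_MAX, donor_score + step))


all_scores = range(S_MIN, S_MAX + 1)
allc_like = all(donor_cooperates(-5, s) for s in all_scores)      # k = -5 always helps
alld_like = not any(donor_cooperates(6, s) for s in all_scores)   # k = 6 never helps
```

The two boundary thresholds reproduce AllC and AllD, as stated in the definition.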


Image score

Simulation

• Have n agents playing m rounds of donating.

• Each agent i has a threshold ki and reputation Si.

• Reproduce offspring proportional to payoff.

Results of simulation

• Cooperative strategies (ki ≤ 0) prevail without mutation.

• Cycles of Discriminator → AllC → AllD with mutation.


Some simple analytics

Simple image score

• Only good (1) or bad (0) reputation.

• Conditional cooperation (CC): cooperate if reputation is good.

• Probability q to know reputation of defector.

CC vs AllD

• Payoff matrix

$$A = \begin{array}{c|cc} & \text{CC} & \text{AllD} \\ \hline \text{CC} & b - c & -c(1 - q) \\ \text{AllD} & b(1 - q) & 0 \end{array}$$

• Conditional Cooperation is ESS when q > c/b.


Other reputation dynamics

Morals

• Defecting against a defector counts as bad in image scoring.

• What action should be regarded as good?

• When to cooperate, when to defect?

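            GG   GB   BG   BB
C           ∗    ∗    ∗    ∗
D           ∗    ∗    ∗    ∗
Action      ∗    ∗    ∗    ∗

• Columns: reputation of donor and recipient.

• Rows C and D: action of donor. Each entry is the donor's new reputation, which can be either Good or Bad.

• Last row: the prescribed action, which can be either Cooperate or Defect.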


Some reputation dynamics

                     GG   GB   BG   BB
Image scoring   C    G    G    G    G
                D    B    B    B    B
Standing        C    G    G    G    G
                D    B    G    B    B
Judging         C    G    B    G    B
                D    B    G    B    B
Shunning        C    G    B    G    B
                D    B    B    B    B
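The four assessment rules in the table can be stored as lookup tables. A minimal sketch, where the dictionary layout and the function name are illustrative choices rather than anything prescribed in the talk:

```python
# Column order: donor reputation followed by recipient reputation.
STATES = ("GG", "GB", "BG", "BB")

# For each rule, the donor's new reputation after C or D, one letter per column.
RULES = {
    "image scoring": {"C": "GGGG", "D": "BBBB"},
    "standing":      {"C": "GGGG", "D": "BGBB"},
    "judging":       {"C": "GBGB", "D": "BGBB"},
    "shunning":      {"C": "GBGB", "D": "BBBB"},
}


def new_reputation(rule, donor, recipient, action):
    """Look up the donor's new reputation ('G' or 'B') under the given rule."""
    column = STATES.index(donor + recipient)
    return RULES[rule][action][column]


# A good donor defects against a bad recipient:
under_image_scoring = new_reputation("image scoring", "G", "B", "D")  # "B"
under_standing = new_reputation("standing", "G", "B", "D")            # "G"
```

The example shows the key difference between image scoring and standing: only standing treats defection against a bad recipient as justified.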


Leading eight

Best strategies

• In total 2,048 different possible strategies.

• There are 8 strategies (leading eight) that perform best (highest payoff, and ESS).

            GG   GB   BG   BB
C           G    ∗    G    ∗
D           B    G    B    ∗
Action      C    D    C    ×

Properties highlighted for the leading eight:

• Maintenance of cooperation

• Mark defectors

• Punish defectors

• Forgive defectors

• Apologize


Subjective reputation

Subjective reputation

• Unrealistic that everybody knows the reputation of everybody.

• Introduce a subjective (private) reputation.

• ‘Observe’ only a few interactions.

Observing

• Probability q of observing an interaction.

• Cooperation declines with lower q.

• Diverging reputations cause further errors.

• Good agents may defect against bad ones, but not all agree on who is bad.


Synchronize reputations

Synchronizing reputations

• Spread local information to synchronize reputations.

• Players ‘gossip’ about each other to share information.

• How to start gossip, spread gossip and interpret gossip?

Lying, cheating and defecting

• ‘False’ gossip may spread.

• Spreading rumours unconditionally allows liars to invade.

• Liars cannot invade conditional rumour spreaders.


Empirical evidence

Directly observable

• Humans seem to be using image scoring.

• Norm (help if S > k) can be different across groups.

• Standing strategy might be too ‘demanding’.

• Generates trust, also in subsequent games.

With gossip

• Gossip effective to spread information on reputation.

• Even in presence of direct observation, gossip has an effect.

• More gossip increases the effect.


Current research

Research questions

• What population structure can result from gossip?

• How stable are certain population structures?

Desired properties

• Have subjective reputations.

• Influenced by ‘local’ gossip.

• In the absence of gossip, rely on own observations.

• More gossip should have more influence.

• Have an analytically tractable model.


Simple model

• Start with some simple model and obtain some results.

• Somewhat arbitrary choices, which might be varied later on.

Basics

1 Each agent i holds a reputation Sij of every other agent j.

2 Everybody plays and cooperates/defects based on reputation.

3 Everybody gossips the result of the interaction.

4 Update reputation based on own observation and gossip.


Reputation and cooperation

One interaction

• Suppose agent i and j interact

• Each agent has a reputation of the other: Sij and Sji

• Probability to cooperate αij and αji depend on reputation.

Approximation to image score

• Image score effectively uses a Heaviside step function:

$$\alpha_{ij} = \Theta(S_{ij} - k)$$

• We propose a continuous version (for now, k = 0):

$$\alpha_{ij} = \frac{1}{1 + e^{-\gamma (S_{ij} - k)}}$$
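The two cooperation rules are easy to compare numerically. A minimal sketch, assuming the convention Θ(0) = 1 for the step function (the slide does not specify the value at the threshold) and arbitrary illustrative values of γ:

```python
import math


def alpha_step(S, k=0.0):
    """Image-score rule: cooperate with certainty once S reaches the threshold k.

    Assumption: Theta(0) = 1, i.e. cooperate exactly at the threshold.
    """
    return 1.0 if S >= k else 0.0


def alpha_sigmoid(S, gamma, k=0.0):
    """Continuous version: probability to cooperate as a logistic function of S."""
    return 1.0 / (1.0 + math.exp(-gamma * (S - k)))


at_threshold = alpha_sigmoid(0.0, gamma=1.0)     # exactly 0.5 at S = k
steep_good = alpha_sigmoid(2.0, gamma=50.0)      # close to alpha_step(2.0) = 1
steep_bad = alpha_sigmoid(-2.0, gamma=50.0)      # close to alpha_step(-2.0) = 0
```

For large γ the logistic curve approaches the step function away from the threshold, while small γ makes cooperation only weakly dependent on reputation.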


Individual strategy

The four different outcomes have the following probabilities (rows: player i; columns: player j):

            j plays C              j plays D
i plays C   αij αji                αij (1 − αji)
i plays D   (1 − αij) αji          (1 − αij)(1 − αji)

Individual strategy

• Reputation changes by +1 for ‘good’ actions and by −1 for ‘bad’ actions.

• TFT-like: Consider CC and DC as good.

• We currently study WSLS-like: Consider CC and DD as good.

$$\Delta_i S_{ij}(t) = \alpha_{ij}\alpha_{ji} + (1 - \alpha_{ij})(1 - \alpha_{ji}) - (1 - \alpha_{ij})\alpha_{ji} - \alpha_{ij}(1 - \alpha_{ji}) = (2\alpha_{ij} - 1)(2\alpha_{ji} - 1)$$
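The factorisation in the last step can be confirmed numerically. A minimal sketch that evaluates the expected change for the WSLS-like rule (CC and DD count as good, CD and DC as bad) on a grid of probabilities; the function name is illustrative.

```python
def delta_individual(a_ij, a_ji):
    """Expected reputation change: +1 for CC and DD outcomes, -1 for CD and DC."""
    good = a_ij * a_ji + (1 - a_ij) * (1 - a_ji)
    bad = (1 - a_ij) * a_ji + a_ij * (1 - a_ji)
    return good - bad


grid = [k / 10 for k in range(11)]
# Largest deviation between the expanded sum and the factorised form.
max_error = max(
    abs(delta_individual(a, b) - (2 * a - 1) * (2 * b - 1))
    for a in grid
    for b in grid
)
```

The deviation stays at rounding level, and the product form makes the sign easy to read: the reputation rises when both probabilities lie on the same side of 1/2.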


Gossiping

Who gossips?

• To whom should you gossip?

• What gossip should you trust?

• Pass on the gossip?

• Currently: no further spreading, talk to cooperative people.

Gossip about what?

• Gossip about reputation?

• Gossip about last interaction?

• Currently: last interaction.


Gossiping

Consider all neighbours k when updating the reputation Sij .

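[Figure: triangle of agents i, j and k. The link from i to j is the link to be updated. The other links are annotated with the questions: Does i ‘like’ k? Will k gossip to i? What action has j taken towards k?]

Change in reputation after gossiping

$$\Delta_g S_{ij}(t) = \sum_{k \neq i,j} \alpha_{ki} (2\alpha_{ik} - 1)(2\alpha_{jk} - 1)$$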


Reputation dynamics

Reputation

• Combine change from individual strategy and from gossiping.

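• Balance the two changes with a ‘social influence’ parameter 0 ≤ λ ≤ 1.

$$\Delta S_{ij}(t) = (1 - \lambda) \underbrace{(2\alpha_{ij} - 1)(2\alpha_{ji} - 1)}_{\text{Individual strategy}} + \lambda \underbrace{\sum_{k \neq i,j} \alpha_{ki} (2\alpha_{ik} - 1)(2\alpha_{jk} - 1)}_{\text{Gossip influence}}$$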
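The combined update can be written as a single function of the matrix of cooperation probabilities. A minimal sketch, assuming `alpha[i][j]` holds the probability that i cooperates with j, stored as a nested list whose diagonal is never read. The example values are illustrative.

```python
def delta_S(alpha, i, j, lam):
    """Expected change of S_ij from the individual strategy and from gossip."""
    n = len(alpha)
    individual = (2 * alpha[i][j] - 1) * (2 * alpha[j][i] - 1)
    gossip = sum(
        alpha[k][i] * (2 * alpha[i][k] - 1) * (2 * alpha[j][k] - 1)
        for k in range(n)
        if k != i and k != j
    )
    return (1 - lam) * individual + lam * gossip


# Three agents who all cooperate with each other (diagonal entries are unused).
alpha = [[0.0, 1.0, 1.0],
         [1.0, 0.0, 1.0],
         [1.0, 1.0, 0.0]]

change_with_gossip = delta_S(alpha, 0, 1, lam=0.3)
change_without_gossip = delta_S(alpha, 0, 1, lam=0.0)
```

In this fully cooperative triangle both terms equal 1, so the reputation rises by 1 for any λ. With more neighbours the gossip sum grows with the number of agreeing gossipers.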


Analytics

Obtain differential equation

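• Assume that in an interval ∆t < 1 the probability to interact is ∆t.

• Then we can take the limit $\lim_{\Delta t \to 0} \Delta S_{ij}(t) / \Delta t$.

• The derivative $\dot{S}_{ij}$ can be written in terms of $\alpha_{ij}$; we obtain

$$\dot{S}_{ij} = \frac{\dot{\alpha}_{ij}}{\gamma (1 - \alpha_{ij}) \alpha_{ij}}$$

The differential equation becomes (with rescaled time τ = γt)

$$\dot{\alpha}_{ij} = \alpha_{ij}(1 - \alpha_{ij}) \left[ (1 - \lambda)(2\alpha_{ij} - 1)(2\alpha_{ji} - 1) + \lambda \sum_{k \neq i,j} \alpha_{ki} (2\alpha_{ik} - 1)(2\alpha_{jk} - 1) \right]$$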


No gossip


• When gossip is not present, the differential equation is simple:

$$\dot{\alpha}_{ij} = \alpha_{ij}(1 - \alpha_{ij})(2\alpha_{ij} - 1)(2\alpha_{ji} - 1)$$

• Only dependent on αij and αji.

• Only stable fixed point: α∗ij = α∗ji = 1.

[Figure: plot on the unit square, both axes ranging from 0.0 to 1.0]
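The no-gossip dynamics for a single pair can be integrated numerically. A minimal sketch using explicit Euler steps in rescaled time, assuming an illustrative starting point where both probabilities are above 1/2. The step size and number of steps are arbitrary choices.

```python
def simulate_pair(a_ij, a_ji, dt=0.01, steps=20000):
    """Integrate the no-gossip equations for one pair with explicit Euler steps."""
    for _ in range(steps):
        coupling = (2 * a_ij - 1) * (2 * a_ji - 1)
        d_ij = a_ij * (1 - a_ij) * coupling
        d_ji = a_ji * (1 - a_ji) * coupling
        a_ij, a_ji = a_ij + dt * d_ij, a_ji + dt * d_ji
    return a_ij, a_ji


# Assumed starting point: both agents are mildly inclined to cooperate.
final_ij, final_ji = simulate_pair(0.6, 0.7)
```

From this starting point both probabilities increase monotonically and approach the mutual-cooperation corner α∗ij = α∗ji = 1.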


Stability of fixed points

Two classes of fixed points

• Let Sn be the unit hypercube of dimension n.

• First class of fixed points consists of the corners of Sn.

• That is, α∗ij ∈ {0, 1} for all ij.

• Second class is outside the corners (internal points).

• That is, there is at least one α∗ij ∉ {0, 1}.

Stability of points

• Points in the corner are easily classified as (un)stable

• Internal points more difficult.

• It seems that most internal points are non-hyperbolic.

• Possibly some (limit) cycles may exist.


Corner points


• All corner points are fixed points.

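• Jacobian of $\dot{\alpha} = F(\alpha)$ defined as

$$\nabla F = \left. \begin{pmatrix} \frac{\partial f_{12}}{\partial \alpha_{12}} & \cdots & \frac{\partial f_{12}}{\partial \alpha_{n(n-1)}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_{n(n-1)}}{\partial \alpha_{12}} & \cdots & \frac{\partial f_{n(n-1)}}{\partial \alpha_{n(n-1)}} \end{pmatrix} \right|_{\alpha^*}$$

• For corner points, only ∂fij/∂αij is non-zero.

Condition for stability in corners:

$$(1 - 2\alpha^*_{ij}) \left[ (1 - \lambda)(2\alpha^*_{ij} - 1)(2\alpha^*_{ji} - 1) + \lambda (k^+_{ij} - k^-_{ij}) \right] < 0$$

where $k^{\pm}_{ij}$ is the number of matches/differences between i and j.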


Stable groups

Groups

• One special case of corner points

• Cooperate within group, defect between groups

• Working out stability conditions gives

$$n_c > \frac{1}{\lambda}$$

• Social influence λ induces a lower bound on group size.
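The group-size bound can be reproduced by evaluating the diagonal Jacobian entries of the reputation dynamics at a corner point. A minimal sketch, assuming two equally sized groups and evaluating the gossip sum directly instead of via match/difference counts. The helper names and the value λ = 0.25 are illustrative.

```python
def corner_is_stable(alpha, lam):
    """Check that every diagonal Jacobian entry is negative at a corner point."""
    n = len(alpha)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            gossip = sum(
                alpha[k][i] * (2 * alpha[i][k] - 1) * (2 * alpha[j][k] - 1)
                for k in range(n)
                if k != i and k != j
            )
            bracket = ((1 - lam) * (2 * alpha[i][j] - 1) * (2 * alpha[j][i] - 1)
                       + lam * gossip)
            if (1 - 2 * alpha[i][j]) * bracket >= 0:
                return False
    return True


def two_groups(n_c):
    """Two groups of size n_c: cooperate within a group, defect between groups."""
    n = 2 * n_c
    return [[1 if (i < n_c) == (j < n_c) else 0 for j in range(n)]
            for i in range(n)]


lam = 0.25                                            # so 1 / lam = 4
large_groups_stable = corner_is_stable(two_groups(5), lam)   # n_c > 1/lam
small_groups_stable = corner_is_stable(two_groups(3), lam)   # n_c < 1/lam
```

Within a group the condition always holds. Between groups the bracket reduces to 1 − λ n_c, which is negative only when n_c > 1/λ, so groups of 5 are stable and groups of 3 are not.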


Invasion from AllD

AllD

• Suppose system in equilibrium α∗ = (1, 1, . . . , 1).

• Add a number of defectors (AllD).

• Relationships between gossiping cooperators are unaffected.

• Only reputation of defector changes.

New reputation equilibrium

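• Let i be a cooperator and j a defector; then

$$\dot{\alpha}_{ij} = \alpha_{ij}(1 - \alpha_{ij}) \left[ (1 - \lambda)(1 - 2\alpha_{ij}) - \lambda (n_c - 1) \right]$$

• A stable fixed point $\alpha^*_{ij} = \frac{1 - \lambda n_c}{2(1 - \lambda)}$ exists if $n_c < 1/\lambda$ (otherwise $\alpha^*_{ij} = 0$).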
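The fixed point can be checked by substituting it back into the right-hand side. A minimal sketch, assuming 0 < λ < 1 and illustrative values λ = 0.2 and n_c = 3; the function names are hypothetical.

```python
def alpha_dot(a, lam, n_c):
    """Rate of change of a cooperator's probability to help a defector."""
    return a * (1 - a) * ((1 - lam) * (1 - 2 * a) - lam * (n_c - 1))


def interior_fixed_point(lam, n_c):
    """Equilibrium cooperation towards a defector (assumes 0 < lam < 1)."""
    if n_c < 1 / lam:
        return (1 - lam * n_c) / (2 * (1 - lam))
    return 0.0


lam, n_c = 0.2, 3                        # assumed values with n_c < 1 / lam = 5
a_star = interior_fixed_point(lam, n_c)  # (1 - 0.6) / 1.6 = 0.25
residual = alpha_dot(a_star, lam, n_c)   # vanishes at the fixed point
```

Just below `a_star` the rate is positive and just above it is negative, so trajectories are pushed back towards the fixed point, consistent with its stability.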


Invasion from AllD

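• In equilibrium, the expected payoff Acc of a cooperator vs. itself is

$$A_{cc} = (b - c) \frac{n_c (n_c - 1)}{n^2}$$

• The expected payoff Adc of a defector vs. a cooperator is

$$A_{dc} = b \, \frac{1 - \lambda n_c}{2(1 - \lambda)} \, \frac{n_c n_d}{n^2}$$

• The condition Acc > Adc reduces to

$$1 - \frac{(1 - \lambda n_c) n_d}{2(1 - \lambda)(n_c - 1)} > \frac{c}{b}$$

• Since c/b < 1, AllD cannot invade if the left-hand side is larger than 1. This reduces to

$$n_c > \frac{1}{\lambda}$$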


Invasion from AllD

Group size

• Two regimes of behavior: $n_c < 1/\lambda$ and $n_c > 1/\lambda$.

• In the first regime, there is some cooperation with defectors.

• The amount of cooperation decreases with group size nc and social influence λ.

• In the second regime, defectors can never invade.

• But by the earlier stability of groups, $n_c > 1/\lambda$.

• So, always stable against invasion from AllD.
