
Arguments for Recovering Cooperation

• Conclusions that some have drawn from analysis of the prisoner’s dilemma:
– the game theory notion of rational action is wrong!
– somehow the dilemma is being formulated wrongly
– defection isn’t always the rational choice: we may not defect over a few cents, but if the sucker’s payoff really hurts, defecting is more likely to be rational


Arguments to recover cooperation:

– We are not all self-centered! But sometimes we are only nice because there is a punishment: if we don’t give up our seat on the bus, we receive rude stares.

– If we really were all self-centered, honor-system places like Honor Copy would be exploited.

– The other prisoner is my twin! When I decide what to do, the other agent will do the same (but I can’t force this – if I could, the other agent wouldn’t be autonomous).

– Your mother would say, “What if everyone were to behave like that?” You say, “I would be a fool to act any other way.”

– The shadow of the future…we will meet again.


The Iterated Prisoner’s Dilemma

• One answer: play the game more than once
• If you know you will be meeting your opponent again, then the incentive to defect appears to evaporate
• Cooperation is the rational choice in the infinitely repeated prisoner’s dilemma (Hurrah!)


Backwards Induction
• But… suppose you both know that you will play the game exactly n times.
• On round n - 1, you have an incentive to defect, to gain that extra bit of payoff… But this makes round n - 2 the last “real” round, and so you have an incentive to defect there, too. This is the backwards induction problem.
• When the prisoner’s dilemma is played with a fixed, finite, pre-determined, commonly known number of rounds, defection is the best strategy.


The centipede game – What would you do? Either player can stop the game.


[Game tree: Jack and Jill alternate moves, Jack first. At each node the player to move can stop, ending the game with the payoffs shown (Jack’s payoff first), or go on. The stop payoffs run (2, 0), (1, 4), (5, 3), (4, 7), …, (94, 97), (98, 96), (97, 100); if both players go on at every node, the game ends with (99, 99).]


The centipede game


[Same centipede game tree as above.]

The solution to this game through backward induction is for Jack to stop in the first round!
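A minimal sketch of that backward-induction computation. The stop payoffs are generated from the pattern suggested by the figure ((2, 0), (1, 4), (5, 3), (4, 7), …, (98, 96), (97, 100), with (99, 99) if both players always go on); that generation rule is an assumption, since the slide only shows the first and last few nodes.

def solve_centipede(stop_payoffs, final_payoff):
    """Backward induction: stop_payoffs[k] is the (Jack, Jill) payoff if the
    mover at node k stops; Jack moves at even k, Jill at odd k."""
    outcome = final_payoff                   # value of reaching the very end
    decisions = [None] * len(stop_payoffs)
    for k in range(len(stop_payoffs) - 1, -1, -1):
        mover = k % 2                        # 0 = Jack, 1 = Jill
        if stop_payoffs[k][mover] >= outcome[mover]:
            outcome = stop_payoffs[k]        # the mover prefers to stop here
            decisions[k] = 'stop'
        else:
            decisions[k] = 'go on'
    return outcome, decisions

# Assumed payoff pattern matching the figure's first and last nodes.
stops = []
for j in range(33):
    stops.append((2 + 3 * j, 3 * j))         # Jack's nodes: (2,0), (5,3), ..., (98,96)
    stops.append((1 + 3 * j, 4 + 3 * j))     # Jill's nodes: (1,4), (4,7), ..., (97,100)

outcome, decisions = solve_centipede(stops, (99, 99))
print(outcome, decisions[0])   # (2, 0) stop  ->  Jack stops at the first node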


The centipede game

• What actually happens?
• In experiments the game usually continues for at least a few rounds, and occasionally goes all the way to the end.
• But going all the way to the (99, 99) payoff almost never happens – at some stage of the game ‘cooperation’ breaks down.
• So we still do not get sustained cooperation, even if players move away from ‘roll back’ (backward induction) as a solution.


Lessons from finite repeated games
– Finite repetition often does not help players to reach better solutions.
– Often the outcome of the finitely repeated game is simply the one-shot Nash equilibrium repeated again and again.
– There are SOME repeated games where finite repetition can create new equilibrium outcomes, but these games tend to have special properties.
– For a large number of repetitions, there are some games where the Nash equilibrium logic breaks down in practice.


Axelrod’s Tournament
• Suppose you play iterated prisoner’s dilemma against a range of opponents… What strategy should you choose, so as to maximize your overall payoff?
• Axelrod (1984) investigated this problem, with a computer tournament for programs playing the prisoner’s dilemma.


Axelrod’s tournament: he invited political scientists, psychologists, economists, and game theoreticians to submit programs to play the iterated prisoner’s dilemma.

• All-D – always defect
• Random – randomly pick a move
• Tit-for-Tat – cooperate on the first round, then do whatever your opponent did last
• Tester – defect first; if the opponent ever retaliates, use Tit-for-Tat; if the opponent does not defect, cooperate for two rounds, then defect
• Joss – Tit-for-Tat, but 10% of the time defect instead of cooperating
• Which do you think had the highest scores? (A small tournament sketch follows below.)
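A minimal sketch of how such a round-robin might be run – not Axelrod’s original tournament code. The payoff values (3, 5, 1, 0), the 200-round match length, and the omission of Tester are assumptions made for brevity; results vary from run to run because Random and Joss are stochastic.

import random

# One iterated prisoner's dilemma match; 'C' = cooperate, 'D' = defect.
# Assumed payoffs (mine, theirs): T=5, R=3, P=1, S=0.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def all_d(my_history, their_history):
    return 'D'

def rand_strategy(my_history, their_history):
    return random.choice('CD')

def tit_for_tat(my_history, their_history):
    return their_history[-1] if their_history else 'C'

def joss(my_history, their_history):
    move = tit_for_tat(my_history, their_history)
    # defect 10% of the time when Tit-for-Tat would cooperate
    return 'D' if move == 'C' and random.random() < 0.1 else move

def play_match(s1, s2, rounds=200):
    h1, h2, score1 = [], [], 0
    for _ in range(rounds):
        m1, m2 = s1(h1, h2), s2(h2, h1)
        score1 += PAYOFF[(m1, m2)][0]
        h1.append(m1)
        h2.append(m2)
    return score1

strategies = {'All-D': all_d, 'Random': rand_strategy,
              'Tit-for-Tat': tit_for_tat, 'Joss': joss}
totals = {name: 0 for name in strategies}
for name1, s1 in strategies.items():        # round-robin: everyone meets everyone
    for name2, s2 in strategies.items():
        totals[name1] += play_match(s1, s2)

for name, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(name, total)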


Best? Tit-for-Tat

• Why? Because you were averaging over all types of strategy

• If you played only against All-D, Tit-for-Tat would lose.


Two Trigger Strategies

• Grim trigger strategy
– Cooperate until a rival deviates
– Once a deviation occurs, play non-cooperatively for the rest of the game
• Tit-for-tat
– Cooperate if your rival cooperated in the most recent period
– Cheat if your rival cheated in the most recent period


Axelrod's rules for success
• Do not be envious – it is not necessary to beat your opponent in order to do well. This is not zero sum.
• Do not be the first to defect. Be nice. Start by cooperating.
• Retaliate appropriately: always punish defection immediately, but use “measured” force – don’t overdo it.
• Don’t hold grudges: always reciprocate cooperation immediately.
• Do not be too clever:
– when you try to learn from the other agent, don’t forget he is trying to learn from you
– be forgiving – one defection doesn’t mean you can never cooperate
– the opponent may be acting randomly


Threats
• Threatening retaliatory actions may help gain cooperation
• Threat needs to be believable
– “If you are late for class, I will give you an F” – credible?
– “If you break the rules, you will be grounded for a year.” – credible?
– “If you cross me, we will not go trick-or-treating.” – credible?


What is Credibility?

• “The difference between genius and stupidity is that genius has its limits.” – Albert Einstein
• You are not credible if you propose to take suboptimal actions: no rational actor would actually follow through on a strategy that earns a suboptimal payoff.
• How can one be credible?


Trigger Strategy Extremes

• Tit-for-Tat is
– most forgiving
– shortest memory
– proportional
– credible, but lacks deterrence
• Tit-for-Tat answers: “Is cooperation easy?”
• Grim trigger is
– least forgiving
– longest memory
– MAD (mutually assured destruction)
– adequate deterrence, but lacks credibility
• Grim trigger answers: “Is cooperation possible?”


Concepts of rationality [doing the rational thing]
• undominated strategy (problem: too weak – we can’t always find a single undominated strategy)
• (weakly) dominating strategy (alias “duh?”) (problem: too strong – rarely exists)
• Nash equilibrium (or double best response) (problem: a pure-strategy equilibrium may not exist)
• randomized (mixed) Nash equilibrium – players choose among their options at random, according to assigned probabilities
• Theorem [Nash, 1950]: a randomized (mixed) Nash equilibrium always exists in any finite game.


Mixed strategy equilibria

• σi(sj) is the probability player i selects strategy sj
• σi ∈ Σi defines a probability distribution over Si; (0, 0, …, 1, 0, …, 0) is a pure strategy (over n possible choices)
• Strategy profile: σ = (σ1, …, σn)
• Expected utility: the chance that each outcome occurs times its utility, summed over outcomes
• Nash Equilibrium: σ* is a (mixed) Nash equilibrium if
ui(σ*i, σ*-i) ≥ ui(σi, σ*-i)  for all σi ∈ Σi, for all i


Example: Matching Pennies – no pure strategy Nash Equilibrium


          H         T
H      -1, 1     1, -1
T       1, -1   -1, 1

Pure strategy equilibria [I make one choice].

Not all games have pure strategy equilibria. Some equilibria are mixed strategy equilibria.


Example: Matching Pennies


               q: H       1-q: T
p: H         -1, 1       1, -1
1-p: T        1, -1     -1, 1

We want to play each strategy with a certain probability. If player 2 is optimally mixing strategies, player 1 is indifferent between his own choices! Compute the expected utility given each pure possibility of the other player, and set them equal.


I reason about my choices as player 2.

Note: my concern is with how well the other player is doing, because I know he will be motivated to do what is best for himself.

If I pick q = 1/2, what is my strategy?


I am player 2. What should I do? I pick a defensive strategy.
• If player 1 picks heads, he (my opponent) gets: -q + (1-q)
• If player 1 picks tails, he gets: q - (1-q)
• I want my opponent NOT to care what I pick. The idea is: if my opponent gets excited about what my strategy is, it means I have left open an opportunity for him. When it doesn’t matter what he does, there is no way he wins big.

So: -q + (1-q) = q - (1-q), i.e. 1 - 2q = 2q - 1, so q = 1/2.
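A minimal sketch of the same indifference calculation in code; the payoff matrix is the one in the table above, and by symmetry the same computation on player 2’s payoffs gives p = 1/2.

# Player 1's payoffs in matching pennies (rows: H, T for player 1;
# columns: H, T for player 2), taken from the table above.
U1 = [[-1,  1],
      [ 1, -1]]

# Player 1's expected payoff from each pure strategy, when player 2 plays H
# with probability q, has the form a*q + b.  Setting the two equal and
# solving the linear equation gives player 2's equilibrium mix.
a, b = U1[0][0] - U1[0][1], U1[0][1]   # payoff of H: a*q + b
c, d = U1[1][0] - U1[1][1], U1[1][1]   # payoff of T: c*q + d
q = (d - b) / (a - c)
print(q)   # 0.5 -> player 2 should play heads half the time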


Example: Bach/Stravinsky


               q: B       1-q: S
p: B          2, 1        0, 0
1-p: S        0, 0        1, 2

We want to play each strategy with a certain probability. If player 2 is optimally mixing strategies, player 1 is indifferent between his own choices. Compute the expected utility given each of your pure possibilities, and set them equal.


Example: Bach/Stravinsky


(Same payoff table and mixing probabilities as above.)


p = 2(1-p), so p = 2/3  (player 1 is optimally mixing: player 2 is indifferent between B and S)
2q = (1-q), so q = 1/3  (player 2 is optimally mixing: player 1 is indifferent between B and S)
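A quick numeric check of these mixes (a sketch; payoffs from the table above): each player’s two pure strategies give equal expected payoffs against the other player’s mix.

p = 2 / 3   # probability player 1 plays B
q = 1 / 3   # probability player 2 plays B

# Player 2's expected payoff from B vs. S when player 1 mixes with p:
print(1 * p, 2 * (1 - p))   # 0.666..., 0.666...
# Player 1's expected payoff from B vs. S when player 2 mixes with q:
print(2 * q, 1 * (1 - q))   # 0.666..., 0.666...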


Mixed Strategies

• Unreasonable predictors of one-time human interaction

• Reasonable predictors of long-term proportions


Employee Monitoring
• Employees can work hard or shirk
• Salary: $100K, unless caught shirking
• Cost of effort: $50K (we are assuming that when he works the employee loses something – think of him as having to pay for resources to do his job: expensive paper, subcontracting, etc. We are also assuming that unless the employee is caught shirking, the boss can’t tell he hasn’t been working.)
• Managers can monitor or not
• Value of employee output: $200K (we assume he must be worth more than we pay him, to cover profit, infrastructure, manager time, mistakes, etc.)
• Profit if employee doesn’t work: $0
• Cost of monitoring: $10K

Give me the normal form game payoffs.


Employee Monitoring

• From the problem statement, VERIFY that the numbers in the table are correct (a sketch of this check follows the table).
• No equilibrium in pure strategies – SHOW IT.
• What do the players do in mixed strategies? DO AT SEATS.
• Please do not consider this an instruction for how to cheat your boss. Rather, think of it as advice on how to deal with employees.

                                  Manager
                        Monitor (q)      No Monitor (1-q)
Employee  Work (p)         50, 90            50, 100
          Shirk (1-p)       0, -10          100, -100
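A minimal sketch of the verification asked for above, deriving each cell from the problem parameters. The only added assumption is that an employee caught shirking receives no salary (which is how the 0 and -10 entries come out).

# All amounts in $K.
SALARY, EFFORT_COST, OUTPUT, MONITOR_COST = 100, 50, 200, 10

def payoffs(work, monitor):
    """Return (employee payoff, manager payoff) for one cell of the table."""
    caught = (not work) and monitor        # a shirker is only caught if monitored
    wage = 0 if caught else SALARY         # assumption: no salary when caught shirking
    employee = wage - (EFFORT_COST if work else 0)
    manager = (OUTPUT if work else 0) - wage - (MONITOR_COST if monitor else 0)
    return employee, manager

for work in (True, False):
    for monitor in (True, False):
        label = ('Work' if work else 'Shirk') + ', ' + ('Monitor' if monitor else 'No monitor')
        print(label, payoffs(work, monitor))
# Work, Monitor (50, 90)      Work, No monitor (50, 100)
# Shirk, Monitor (0, -10)     Shirk, No monitor (100, -100)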


• p – probability of working
• q – probability of monitoring


Employee’s Payoff
• First, find the employee’s expected payoff from each pure strategy.
• If the employee works, he receives 50:
Profit(work) = 50q + 50(1-q) = 50
• If the employee shirks, he receives 0 or 100:
Profit(shirk) = 0·q + 100(1-q) = 100 - 100q


Employee’s Best Response
• Next, calculate the best strategy for each possible strategy of the opponent (a sketch of the full calculation follows below):
• For q < 1/2: Profit(shirk) = 100 - 100q > 50 = Profit(work), so SHIRK
• For q > 1/2: Profit(shirk) = 100 - 100q < 50 = Profit(work), so WORK
• For q = 1/2: Profit(shirk) = 100 - 100q = 50 = Profit(work), so INDIFFERENT
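A minimal sketch of the full mixed-equilibrium calculation via the two indifference conditions (payoffs from the table above); it reproduces q* = 1/2 and the employee’s working probability p* = 9/10 used on the next slides.

# Employee indifference pins down the manager's monitoring probability q:
#   work : 50q + 50(1-q) = 50
#   shirk:  0q + 100(1-q) = 100 - 100q
# 50 = 100 - 100q  =>  q* = 1/2
q_star = (100 - 50) / 100

# Manager indifference pins down the employee's working probability p:
#   monitor   :  90p - 10(1-p) = 100p - 10
#   no monitor: 100p - 100(1-p) = 200p - 100
# 100p - 10 = 200p - 100  =>  p* = 9/10
p_star = (100 - 10) / (200 - 100)

print(q_star, p_star)   # 0.5 0.9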


Cycles

[Figure: best-response cycle in the (q, p) plane, with q (probability of monitoring) marked at 1/2 and p (probability of working) marked at 9/10; the regions are labelled work/shirk and monitor/no monitor. If I am not monitoring and they are working, they will change their mind – and so the best responses cycle around the mixed equilibrium.]


Properties of Equilibrium
• Both players are indifferent between any mixture over their strategies.
• E.g. the employee, facing q = 1/2:
– If shirk: ½(0) + ½(100) = 50
– If work: ½(50) + ½(50) = 50
• Regardless of what the employee does, his expected payoff is the same.
• Similarly for the employer: his expected utility is 80 either way.


Upsetting?
• This example is upsetting, as it appears to tell you, as workers, to shirk.
• Think of it from the manager’s point of view, assuming you have unmotivated (or unhappy) workers.
• A better option would be to hire dedicated workers, but if you have people who are trying to cheat you, this gives a reasonable response.
• Sometimes you are dealing with individuals who just want to beat the system. In that case, you need to play their game. For example, people who try to beat the IRS.
• On the positive side, even if you have dishonest workers, if you get too paranoid about monitoring their work, you lose! This theory tells you to lighten up!


Why Do We Mix?
• I don’t want to give my opponent an advantage. When my opponent can’t decide what to do based on my strategy, I win – as there is no way he is going to take advantage of me.


COMMANDMENT

Use the mixed strategy that keeps your opponent guessing.


• The following example is one you can work through on your own.


Mixed Strategy Equilibria

• Anyone for tennis?

– Should you serve to the forehand or the backhand?


Tennis Payoffs


                               Server's Aim
                         Forehand       Backhand
Receiver's   Forehand    .90, .10       .20, .80
Move         Backhand    .30, .70       .60, .40

(Each cell: the receiver’s payoff – the fraction of serves successfully returned – followed by the server’s payoff; each pair sums to 1.)


Tennis: Fixed Sum
If you win (the points), I lose (the points). AKA: strictly competitive.


                                    Server's Aim
                           Forehand (q)    Backhand (1-q)
Receiver's   Forehand (p)     .90, .10        .20, .80
Move         Backhand (1-p)   .30, .70        .60, .40


Solving for Server’s Optimal Mix

• What would happen if the server always served to the forehand?

– A rational receiver would always anticipate forehand and 90% of the serves would be successfully returned.


Solving for Server’s Optimal Mix

• What would happen if the server aimed at the forehand 50% of the time and the backhand 50% of the time, and the receiver always guessed forehand?

– (0.5*0.9) + (0.5*0.2) = 0.55 successful returns


Solving for Server’s Optimal Mix

• What is the best mix for each player?
• The receiver thinks: if the server serves forehand, the server gets .10·p + .70·(1-p); if the server serves backhand, the server gets .80·p + .40·(1-p).
• I want them to be the same:
.10p + .70(1-p) = .80p + .40(1-p)
.10p + .70 - .70p = .80p + .40 - .40p
-.6p + .7 = .4p + .4
p = .3
• Use a similar argument (the receiver’s indifference between moving forehand and backhand) to solve for q: q = .4.
• So the strategies are ((.3, .7), (.4, .6)): the receiver moves to the forehand 30% of the time, and the server aims at the forehand 40% of the time.
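A minimal sketch reproducing both mixes from the two indifference conditions (payoffs from the table above):

# p = probability the receiver moves to the forehand,
# q = probability the server aims at the forehand.

# Server indifferent between aiming forehand and backhand:
#   0.10p + 0.70(1-p) = 0.80p + 0.40(1-p)  =>  p = 0.3
p = (0.70 - 0.40) / ((0.70 - 0.40) + (0.80 - 0.10))

# Receiver indifferent between moving forehand and backhand:
#   0.90q + 0.20(1-q) = 0.30q + 0.60(1-q)  =>  q = 0.4
q = (0.60 - 0.20) / ((0.60 - 0.20) + (0.90 - 0.30))

print(round(p, 2), round(q, 2))   # 0.3 0.4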


Draw a graph which shows two lines:
(1) the utility of the server of “picking forehand” as a function of p
(2) the utility of the server of “picking backhand” as a function of p
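A sketch of that plot using matplotlib (the choice of plotting library is an assumption; any tool works):

import numpy as np
import matplotlib.pyplot as plt

p = np.linspace(0, 1, 101)                  # Pr(receiver moves to the forehand)
aim_forehand = 0.10 * p + 0.70 * (1 - p)    # server's expected success, aiming forehand
aim_backhand = 0.80 * p + 0.40 * (1 - p)    # server's expected success, aiming backhand

plt.plot(p, aim_forehand, label='aim forehand')
plt.plot(p, aim_backhand, label='aim backhand')
plt.xlabel('p = Pr(receiver moves forehand)')
plt.ylabel("server's expected success")
plt.legend()
plt.show()                                  # the two lines cross at p = 0.3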


[Figure: “Receiver’s view depending on opponent action. Above 1/3, backhand wins” – two lines, labelled forehand and backhand, plotted against p.]


[Figure: “Server’s view dependent on opponent action. Above .4, plan forehand wins” – two lines, labelled plan forehand and plan backhand, plotted against q.]


% of Successful Returns Given Server and Receiver Actions

% of Successful Returns

% of Serves Aimed    Receiver Anticipates    Receiver Anticipates
at Forehand          Forehand                Backhand
  0                     20                      60
 20                     34                      54
 50                     55                      45
 70                     69                      39
100                     90                      30

Where would you shoot, knowing the other player will respond to your choices? In other words, you pick the row but will likely get the smaller value in a row.
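A minimal sketch that just reproduces the table’s entries from the payoff matrix above, so you can evaluate any mix, not only the five rows shown:

# f is the percentage of serves aimed at the forehand; each entry is the
# receiver's success percentage, given what the receiver anticipates.
def success(f_percent):
    f = f_percent / 100
    anticipate_forehand = 90 * f + 20 * (1 - f)   # returns 90% vs 20% of serves
    anticipate_backhand = 30 * f + 60 * (1 - f)   # returns 30% vs 60% of serves
    return round(anticipate_forehand), round(anticipate_backhand)

for f in (0, 20, 50, 70, 100):
    print(f, success(f))
# 0 (20, 60)   20 (34, 54)   50 (55, 45)   70 (69, 39)   100 (90, 30)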