On Bounded Rationality and Computational Complexity

Christos Papadimitriou and Mihalis Yannakakis

Bounded Rationality

Economic theory usually treats computation as a free resource. As a result, it sometimes predicts that rational agents will invest a large amount of computation for small payoffs.

If we bound the ability of agents to compute, new solutions might appear.

The Prisoner’s Dilemma

                Player II
                C       D
Player I   C    3,3     0,4
           D    4,0     1,1

D strictly dominates C; therefore the only Nash equilibrium is (D,D).

Problem: This contradicts natural human behavior.
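The dominance and equilibrium claims for the one-shot game can be checked by brute force. The sketch below is illustrative only: the function names are hypothetical, and the payoff table is taken from the matrix on this slide, with player I as the row player.

```python
from itertools import product

# Payoffs from the slide: PAYOFF[(row, col)] = (payoff to I, payoff to II).
PAYOFF = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 4),
    ("D", "C"): (4, 0), ("D", "D"): (1, 1),
}
ACTIONS = ("C", "D")

def strictly_dominates(a, b):
    """True if row action a pays player I strictly more than b against every column."""
    return all(PAYOFF[(a, c)][0] > PAYOFF[(b, c)][0] for c in ACTIONS)

def pure_nash_equilibria():
    """All action pairs from which neither player gains by deviating alone."""
    result = []
    for r, c in product(ACTIONS, repeat=2):
        row_ok = all(PAYOFF[(r, c)][0] >= PAYOFF[(alt, c)][0] for alt in ACTIONS)
        col_ok = all(PAYOFF[(r, c)][1] >= PAYOFF[(r, alt)][1] for alt in ACTIONS)
        if row_ok and col_ok:
            result.append((r, c))
    return result

print(strictly_dominates("D", "C"))  # True
print(pure_nash_equilibria())        # [('D', 'D')]
```

The game is symmetric, so the same dominance check holds for player II.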

The n-round Prisoner’s Dilemma

Play the game n times. Compute the average outcome.

A strategy depends upon the history of the game.

Lemma: The only equilibrium outcome is (D,D)^n: both players defect in all n rounds.

Proof: In the last round, playing D dominates playing C, whatever happened before. Apply backward induction to the earlier rounds.
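For n = 2 the lemma can be confirmed by exhaustive search. This is a minimal sketch under stated assumptions: a strategy is encoded (hypothetically) as a triple of the opening move and the replies to the opponent's opening C or D, and payoffs are summed rather than averaged, which does not change the equilibria.

```python
from itertools import product

PAYOFF = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 4),
    ("D", "C"): (4, 0), ("D", "D"): (1, 1),
}

# Assumed encoding: (opening move, reply to opponent's C, reply to opponent's D).
STRATEGIES = list(product("CD", repeat=3))

def play(s1, s2):
    """Return the two-round action path and each player's total payoff."""
    a1, a2 = s1[0], s2[0]
    b1 = s1[1] if a2 == "C" else s1[2]
    b2 = s2[1] if a1 == "C" else s2[2]
    path = ((a1, a2), (b1, b2))
    u1 = sum(PAYOFF[p][0] for p in path)
    u2 = sum(PAYOFF[p][1] for p in path)
    return path, u1, u2

def nash_paths():
    """Action paths of all pure Nash equilibria of the 2-round game."""
    paths = set()
    for s1, s2 in product(STRATEGIES, repeat=2):
        path, u1, u2 = play(s1, s2)
        best1 = max(play(t, s2)[1] for t in STRATEGIES)
        best2 = max(play(s1, t)[2] for t in STRATEGIES)
        if u1 == best1 and u2 == best2:
            paths.add(path)
    return paths

print(nash_paths())  # {(('D', 'D'), ('D', 'D'))}
```

Several strategy pairs are equilibria (they differ in what they would do off the path of play), but all of them produce the same all-defect path.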

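[Figure: the payoff region spanned by the outcomes CC, CD, DC and DD, with the Pareto-optimal points, the threat point, and the individually rational region marked.]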

Goal:

Use bounded rationality to find an equilibrium that approximates the cooperative outcome.

Find an equilibrium in which both players play ‘C’ in most of the rounds.

Main Idea

Limit the computational powers of the players in such a way that a new equilibrium would emerge.

The players would be too ‘dumb’ to count the rounds, and therefore unable to defect in the last round.

Use the fact that the number of strategies is doubly exponential in the number of rounds.
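The doubly exponential growth can be made concrete with a short count. The sketch assumes one particular counting convention that the slides do not state: a pure strategy picks C or D after every sequence of opponent moves of length 0 to n-1. The function names are hypothetical.

```python
def num_decision_points(n):
    """Opponent move sequences of length 0..n-1: 1 + 2 + ... + 2^(n-1) = 2^n - 1."""
    return 2 ** n - 1

def num_pure_strategies(n):
    """One binary choice per decision point gives 2^(2^n - 1) strategies."""
    return 2 ** num_decision_points(n)

for n in (1, 2, 3, 4, 5):
    print(n, num_pure_strategies(n))
# 1 2
# 2 8
# 3 128
# 4 32768
# 5 2147483648
```

Counting strategies that also depend on the player's own past moves gives a larger number, but the growth is doubly exponential either way.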

Model Of Computation

• The model of computation is automata.
• Each pure strategy is played by a single automaton.
• A mixed strategy is a distribution over automata.
• The resource we limit is the number of states an automaton is allowed to have.

Automata Theory - Basics

An automaton consists of states and transitions.

Each state carries an output, i.e. the action the player plays in that state. Each transition is labelled with an input, i.e. the action played by the opponent.

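Example: Tit for Tat

[Figure: a two-state automaton. State C (plays C) loops on input C and moves to state D on input D; state D (plays D) loops on input D and moves back to state C on input C.]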
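The Tit for Tat automaton can be written down directly as a state table. The following is a minimal sketch, not the authors' formalism: the `Automaton` class, its field names, and the `ALWAYS_DEFECT` opponent are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

@dataclass
class Automaton:
    start: str
    output: dict      # state -> action played in that state
    transition: dict  # (state, opponent's action) -> next state

# Two states: "c" plays C, "d" plays D; the next state mirrors the opponent's move.
TIT_FOR_TAT = Automaton(
    start="c",
    output={"c": "C", "d": "D"},
    transition={("c", "C"): "c", ("c", "D"): "d",
                ("d", "C"): "c", ("d", "D"): "d"},
)

# One-state opponent used only to exercise the transitions.
ALWAYS_DEFECT = Automaton(
    start="d",
    output={"d": "D"},
    transition={("d", "C"): "d", ("d", "D"): "d"},
)

def play(a, b, rounds):
    """Run two automata against each other and return the list of action pairs."""
    sa, sb = a.start, b.start
    history = []
    for _ in range(rounds):
        xa, xb = a.output[sa], b.output[sb]
        history.append((xa, xb))
        sa = a.transition[(sa, xb)]
        sb = b.transition[(sb, xa)]
    return history

print(play(TIT_FOR_TAT, TIT_FOR_TAT, 3))
# [('C', 'C'), ('C', 'C'), ('C', 'C')]
print(play(TIT_FOR_TAT, ALWAYS_DEFECT, 4))
# [('C', 'D'), ('D', 'D'), ('D', 'D'), ('D', 'D')]
```

The size of an automaton in this encoding is simply `len(output)`, which is the resource the model bounds.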

Outline of Argument

• Limit the number of states an automaton is allowed to have.

• Present two complex strategies in which any deviation is punished by defecting forever.

• Show that playing these strategies uses all of the states, so that no states are left to count the rounds.

Question: What should be the bounds on the number of states?

Trivial Lower Bound

Lemma: An automaton with fewer than n states cannot count to n.

Corollary: If both automata are bounded to have fewer than n states, then ‘Tit for Tat’ is an equilibrium.

If only one of the automata has fewer than n states, then ‘Tit for Tat’ is an equilibrium when the ‘smart’ player defects in the last round.
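To see what the ‘smart’ player gains, the two cases can be compared numerically. This sketch uses plain functions instead of automata, and the explicit round check in `defect_in_last_round` stands in for the counting states a real automaton would need. All names are hypothetical; payoffs follow the matrix given earlier.

```python
PAYOFF = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 4),
    ("D", "C"): (4, 0), ("D", "D"): (1, 1),
}

def tit_for_tat(opponent_moves):
    """Cooperate first, then repeat the opponent's previous move."""
    return opponent_moves[-1] if opponent_moves else "C"

def defect_in_last_round(n):
    """Tit for Tat that defects in round n; knowing the round requires counting."""
    def strategy(opponent_moves):
        if len(opponent_moves) == n - 1:
            return "D"
        return tit_for_tat(opponent_moves)
    return strategy

def average_payoffs(s1, s2, n):
    """Average per-round payoff of each player over n rounds."""
    moves1, moves2 = [], []
    total1 = total2 = 0
    for _ in range(n):
        a1, a2 = s1(moves2), s2(moves1)
        u1, u2 = PAYOFF[(a1, a2)]
        total1 += u1
        total2 += u2
        moves1.append(a1)
        moves2.append(a2)
    return total1 / n, total2 / n

n = 10
print(average_payoffs(tit_for_tat, tit_for_tat, n))              # (3.0, 3.0)
print(average_payoffs(tit_for_tat, defect_in_last_round(n), n))  # (2.7, 3.1)
```

The player who can count gains 1/n on average and the other loses 3/n, so both averages stay close to 3 for large n.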

Upper Bound

Theorem 1: If both size bounds are at least 2^(n-1), then the only equilibrium is the one in which both players defect in all the rounds.

Proof: Bottom up dynamic programming on the decision tree of each player.

Main Result

If at least one of the state bounds in the n-round Prisoner’s Dilemma is at most 2^(O(εn)), then, for large enough n, there is a (mixed) equilibrium with average payoff for each player at least 3-ε.

Equilibrium – General Description

• The strategies are mixed. The support of each consists of 2^d pure strategies, each represented by a string in {C,D}^d, with d << n.

• First the players exchange their strings.

• The players then cooperate, while periodically checking that the opponent remembers their string.

• A deviation is punished by defecting forever.

The Equilibrium Automaton

[Figure: the automaton runs a d-round Announce segment, followed by a Cooperation + Checking segment.]

Every state also has a retaliation edge that leads to a state that defects forever.

Technical Points I

Problem: Some ‘business cards’ (the strings exchanged in the first segment) are more advantageous than others. In particular, it is better to have a ‘business card’ with many D’s.

Solution: After the first segment comes a ‘fix up’ segment: for rounds i = 1…d, both players play C iff the i-th letter in both their cards is C.

Now, in the first 2d steps, the average payoff is 1.75 for every business card.
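The 1.75 figure can be reproduced by enumeration. The sketch rests on three assumptions that the slides leave implicit: in announce round i each player plays the i-th letter of their own card, both players play D in a fix-up round where the cards are not both C, and the opponent's card is uniformly random. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

PAYOFF = {
    ("C", "C"): (3, 3), ("C", "D"): (0, 4),
    ("D", "C"): (4, 0), ("D", "D"): (1, 1),
}

def first_2d_average(my_card, their_card):
    """My exact average payoff over the announce and fix-up segments (2d rounds)."""
    total = 0
    for mine, theirs in zip(my_card, their_card):
        # Announce round: each player plays their own letter (assumption).
        total += PAYOFF[(mine, theirs)][0]
        # Fix-up round: both play C iff both letters are C, else both play D (assumption).
        both = "C" if mine == theirs == "C" else "D"
        total += PAYOFF[(both, both)][0]
    return Fraction(total, 2 * len(my_card))

def expected_average(my_card):
    """Average of first_2d_average over all equally likely opposing cards."""
    opponents = list(product("CD", repeat=len(my_card)))
    return sum(first_2d_average(my_card, o) for o in opponents) / len(opponents)

for card in product("CD", repeat=3):
    print("".join(card), expected_average(card))  # every line ends in 7/4
```

Per letter, a C earns 1.5 in the announce round and 2 in the fix-up round, while a D earns 2.5 and then 1, so each letter contributes 3.5 over two rounds regardless of its value.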

The Equilibrium Automaton

[Figure: a d-round Announce segment, then a d-round Fix up segment, then the Cooperation + Checking segment.]

Technical Points II

Problem: The Checking segment might not be fair, making one business card more advantageous than another.

Solution: The Checking segment is done by XORing the C’s and the D’s. Thus, on average, the payoffs are the same for all business cards.

Technical Points III

Problem: If the opponent plays according to his equilibrium strategy, he will never trigger the punitive edge of a state. A ‘cheating’ automaton might exploit this to save states, and then use them to defect in the last round.

Suppose player 1 has a state q in which he plays C and expects C, and another state p in which he plays C and expects D. A cheating player can merge the two states and save a state.

The ‘Honest’ Player’s Automaton

[Figure: states q and p both play C; q expects C and p expects D. An unexpected input leads to the punitive state, which plays D.]

The ‘Cheating’ Player’s Automaton

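[Figure: a single state p that plays C, with outgoing edges labelled C and D, replacing the separate states q and p.]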

The Solution

After the business card exchange, both players play in unison, i.e. whenever a player plays C he expects C, and whenever he plays D he expects D.

Therefore the problematic scenario does not occur.

Generalization

Let G be an arbitrary game and let p = (p1,p2) be a point in the individually rational region realized by pure strategies. For every ε>0 there are c>0 and N>0 such that, for all n>N, the following holds in the n-round repeated game G played by automata:

If at least one of the automata is bounded by 2^(cn) states, then there exists an equilibrium in which each player i has average payoff at least p_i-ε.
