Connections between Learning Theory, Game Theory, and Optimization Maria Florina (Nina) Balcan...

Preview:

Citation preview

Connections between Learning Connections between Learning Theory, Game Theory, and Theory, Game Theory, and

OptimizationOptimization

Maria Florina (Nina) Balcan

Lecture 14, October 7th 2010

Improved Equilibria via Public Improved Equilibria via Public Service AdvertisingService Advertising

Good equilibria, Bad equilibriaGood equilibria, Bad equilibriaMany games have both bad and good equilibria.

• In some places, everyone drives their own car. In some, everybody uses and pays for good public transit.

G

Fair cost-sharingFair cost-sharingFair cost-sharing: n players in weighted directed

graph G. Player i wants to get from si to ti, and they share cost of edges they use with others.

Fair cost-sharingFair cost-sharing

s

t

1n

• Player i wants to get from si to ti. • All players share cost of edges they use with others.

• n players in directed graph G, each edge e costs ce.

Good equilibrium: all use edge of cost 1. (paying 1/n each)

Bad equilibrium: all use edge of cost n. (paying 1 each)

• Each player wants to minimize his own cost.

Inefficiency of equilibria, PoA and Inefficiency of equilibria, PoA and PoSPoS

Price of Stability (PoS): ratio of best Nash equilibrium to OPT.

Price of Anarchy (PoA): ratio of worst Nash equilibrium to OPT.

Significant effort spent on understanding these in CS.

[Koutsoupias-Papadimitriou’99]

[Anshelevich et. al, 2004]

E.g., for fair cost-sharing, PoS is log(n), whereas PoA is n.

“Algorithmic Game Theory”, Nisan, Roughgarden, Tardos, Vazirani

Fair Cost SharingFair Cost Sharing

• Player i wants to get from si to ti, and minimize its cost.

• all players share cost of edges they use with others.

• n players in directed graph G, each edge e costs ce.

PoA is n; PoS is log(n).

s

t

1n

PoA is O(n):in any Nash no player pays more than OPT s

t

1n

PoA is (n):

Fair Cost SharingFair Cost Sharing

…1

1/2 1/n-1

s1 sn

t

0 00

1+²

PoA is n; PoS is log(n).

PoS is (log(n)):

1/n

00

• Player i wants to get from si to ti, and minimize its cost.

• all players share cost of edges they use with others.

• n players in directed graph G, each edge e costs ce.

Fair Cost SharingFair Cost Sharing

…1

1/2 1/n-1

s1 sn

t

0 00

1+²

PoA is n; PoS is log(n).

PoS is (log(n)):

1/n

00

• Player i wants to get from si to ti, and minimize its cost.

• all players share cost of edges they use with others.

• n players in directed graph G, each edge e costs ce.

Fair Cost SharingFair Cost Sharing

PoS is (log(n)):potential function argument

• Player i wants to get from si to ti, and minimize its cost.

• all players share cost of edges they use with others.

• n players in directed graph G, each edge e costs ce.

PoA is n; PoS is log(n).

Fair Cost SharingFair Cost Sharing

where

Social cost of is

• Player i wants to get from si to ti, and minimize its cost.

• all players share cost of edges they use with others.

• n players in directed graph G, each edge e costs ce.

Fair Cost SharingFair Cost Sharing

where

Social cost of is

• Player i wants to get from si to ti, and minimize its cost.

• all players share cost of edges they use with others.

• n players in directed graph G, each edge e costs ce.

A player moves, change in player’s cost = change in potential

Proof: player i moves, get from S to S’; let A be the edges in S but not in S’, and B the edges in S’ but not in S.

Its change in cost:Change in the potential

Fair Cost SharingFair Cost Sharing

where

Social cost of is

• Player i wants to get from si to ti, and minimize its cost.

• all players share cost of edges they use with others.

• n players in directed graph G, each edge e costs ce.

Fair Cost SharingFair Cost Sharing

PoS is (log(n)):potential function argument

• The potential does not increase & reach a pure Nash of cost · H(n) ¢ OPT.

• Iterate best-response dynamics starting from an optimal solution [i.e, while there is a player that can improve, pick an arbitrary such player and let him to best response].

• Player i wants to get from si to ti, and minimize its cost.

• all players share cost of edges they use with others.

• n players in directed graph G, each edge e costs ce.

• Potential always decreases, finite # of states, so reach a pure Nash.

Congestion games more generallyCongestion games more generallyGame defined by n players and m resources.

• Cost of a resource j is a function fj(nj) of the number nj of players using it.

• Each player i chooses a set of resources (e.g., a path) from collection Si of allowable sets of resources (e.g., paths from si to ti).

• Cost incurred by player i is the sum, over all resources being used, of the cost of the resource.

• Generic potential function:

Best-response dynamics always gives an equilibrium.

Congestion games more Congestion games more generallygenerally

• Always have a pure-strategy equilibrium.

• Have a potential function s.t. whenever a player switches, potential drops by exactly that player’s improvement.

• Nice general class of games with many players.

– Best-response dynamics always gives an equilibrium.

• But maybe a large gap between the quality of the best and the worst equilibrium.

• Lots of work on understanding properties of these games and quality of their equilibria.

Good equilibria, Bad equilibriaGood equilibria, Bad equilibriaMany games have both bad and good equilibria.

• In some places, everyone drives their own car. In some, everybody uses and pays for good public transit.

Guiding from Bad to GoodGuiding from Bad to Good

Standard motivation for PoS: If a central authority could suggest a low-cost Nash

(ride public transit), and everyone followed the suggestion, then this would be stable.

Price of Anarchy (PoA): ratio of worst Nash equilibrium to OPT. Price of Stability (PoS): ratio of best Nash equilibrium to OPT.

Can a helpful authority encourage (guide) behavior to move from a bad state to a good state?

What if only some fraction will pay attention?

• Can the authority guide behavior to a good state?

• Will it just snap back? How does this depend on ?

Guiding from Bad to Good

[Balcan-Blum-Mansour, SODA 2009]

Main ModelMain Model

1. Authority launches advertising, proposing joint action sad.

0. n players initially playing some arbitrary equilibrium.

…1 1 1 1

s1 sn

t

0 00

k

Main ModelMain Model

1. Authority launches advertising, proposing joint action sad. Each player i follows with probability

. Call players that follow receptive players

0. n players initially playing some arbitrary equilibrium.

…1 1 1 1

s1 sn

t

0 00

k

Main ModelMain Model

1. Authority launches advertising, proposing joint action sad.

2. Remaining (non-receptive) players fall to some arbitrary equilibrium for themselves, given play of receptive players.

3. All players follow best-response dynamics to an overall Nash equilibrium.

potential games, pure Nash eqs.

Each player i follows with probability . Call players that follow receptive players

Notes:

social cost:

0. n players initially playing some arbitrary equilibrium.

Main ResultsMain Results

• If only a constant fraction of the players follow the advice, then we can still get within O(1/) of the PoS.• Extend to cost-sharing + linear delays.

(PoS = log(n), PoA = n)

(PoS = 1, PoA = (n2))

• Threshold behavior: for > ½, can get ratio O(1), but for < ½, ratio stays (n2). (assume degrees (log n)).

Fair Cost SharingFair Cost Sharing

…1 1 1 1

s1 sn

t

0 00

k

Note: this is best you can hope for. E.g., k =2n.

If only a constant fraction of the players follow the advice, then we get within O(1/) of the PoS.

(PoS = log(n), PoA = n)

Fair Cost SharingFair Cost Sharing

If only a constant fraction of the players follow the advice, then we get within O(1/) of the PoS.

(PoS = log(n), PoA = n)

Advertiser proposes OPT (any apx also works)

random vars

Phase 1:

Fair Cost SharingFair Cost Sharing

- Moreover, this option is guaranteed to be at least as good as if other NR players didn’t exist.

If only a constant fraction of the players follow the advice, then we get within O(1/) of the PoS.

(PoS = log(n), PoA = n)

- In any NE a non-receptive player i, can’t improve by switching to his path Pi

OPT in OPT.

Cost of non-receptive players at the end of Phase 2

Fair Cost SharingFair Cost Sharing

If only a constant fraction of the players follow the advice, then we get within O(1/) of the PoS.

(PoS = log(n), PoA = n)

- In any NE a non-receptive player i, can’t improve by switching to his path Pi

OPT in OPT.

Cost of non-receptive players at the end of Phase 2

Fair Cost SharingFair Cost Sharing

If only a constant fraction of the players follow the advice, then we get within O(1/) of the PoS.

(PoS = log(n), PoA = n)

- In any NE a non-receptive player i, can’t improve by switching to his path Pi

OPT in OPT.

Cost of non-receptive players at the end of Phase 2

- Calculate total cost of these guaranteed options.

Rearrange sum...

Fair Cost SharingFair Cost Sharing

If only a constant fraction of the players follow the advice, then we get within O(1/) of the PoS.

(PoS = log(n), PoA = n)

Cost of non-receptive players at the end of Phase 2

Cost of receptive players at the end of Phase 2

Fair Cost SharingFair Cost Sharing

If only a constant fraction of the players follow the advice, then we get within O(1/) of the PoS.

(PoS = log(n), PoA = n)

Cost of non-receptive players at the end of Phase 2

Use: X ~ Bi(n,p)

Cost of receptive players at the end of Phase 2

Fair Cost SharingFair Cost Sharing

If only a constant fraction of the players follow the advice, then we get within O(1/) of the PoS.

(PoS = log(n), PoA = n)

Expected total cost at the end of Phase 2: O(OPT/). In Phase 3, potential argument shows behavior cannot get worse by more than an additional log(n) factor.

Cost Sharing, ExtensionCost Sharing, Extension

- Still get same guarantee, but proof is trickier

+ linear delays:

Problem: can’t argue as if remaining NR players didn’t exist since they add to delays

Cost Sharing, ExtensionCost Sharing, Extension

- Still get same guarantee, but proof is trickier

- Shadow game wrt non-receptieve players: pure linear latency fns. Offset defined by equilib. at end of phase 2.

# users on e at end of phase 2

- This game has good PoA (5/2) .

+ linear delays:

Cost Sharing, ExtensionCost Sharing, Extension

- Still get same guarantee, but proof is trickier

- Shadow game: pure linear latency fns

- Behavior of NR at end of phase 2 is equilib for this game too.- Show

Cost of the of nonreceptive players at the end of step 2: O(OPT/).

+ linear delays:

Cost Sharing, ExtensionCost Sharing, Extension

- Still get same guarantee, but proof is trickier

Need to still argue about the cost of the receptive players.

Edge by edge charging:

- more receptive players, loose a factor of two compared to OPT

- more non-receptive players, already paid for, loose a factor of two

Cost of the of nonreceptive players at the end of step 2: O(OPT/).

+ linear delays:

Party affiliation gamesParty affiliation games• Given graph G, each edge labeled + or -.• Vertices have two actions: RED or BLUE.

Pay 1 for each + edge with endpoints of different color, and each – edge with endpoints of same color.

• Special cases:

+

+

+

--

• All + edges is consensus game. • All – edges is cut-game.

Party affiliation gamesParty affiliation games OPT is an equilibrium so PoS = 1.

But even for consensus games, PoA = (n2)

Clique with perfect matching removed

all edges labeled plus

Party affiliation gamesParty affiliation games(PoS = 1, PoA = (n2))

- Threshold behavior: for > ½, can get ratio O(1), but for < ½, ratio stays (n2). (assume degrees (log n)).

- Same example as for consensus PoA, but sparser across cut.

(lower bound)

Degree ° n/8 across cut, °=1/2-®

Party affiliation gamesParty affiliation games(PoS = 1, PoA = (n2))

- Threshold behavior: for > ½, can get ratio O(1), but for < ½, ratio stays (n2). (assume degrees (log n)).

- Same example as for consensus PoA, but sparser across cut.

• For large n, whp all nodes have at most a 1/2-°/2fraction on neighbs in R• Initially, each node has a °/4 fraction on nodes of the other color.

• So, players “locked” into place

Degree ° n/8 across cut, °=1/2-®

(lower bound)

Party affiliation gamesParty affiliation games

(upper bound, consensus games)

- Advertising strategy = follow OPT, e.g. all red.- By Hoeffding, all nodes with degree log n/(®-1/2)2 have more than half of their neighbors in the set R, with prob. 1-1/n.- At the end of step two, all nodes are red.

(PoS = 1, PoA = (n2))

- Threshold behavior: for > ½, can get ratio O(1), but for < ½, ratio stays (n2). (assume degrees (log n)).

Note: for general cut games, OPT might not have zero cost for each player.

Party affiliation gamesParty affiliation games

- Split nodes into those incurring low-cost vs those incurring high-cost under OPT.

(upper bound, general party affiliation games)

- Advertising strategy = follow OPT.

- Show that low-cost will switch to behavior in OPT. For high-cost, don’t care.

- Cost only improves in final best-response process.

(PoS = 1, PoA = (n2))

- Threshold behavior: for > ½, can get ratio O(1), but for < ½, ratio stays (n2). (assume degrees (log n)).

Party affiliation gamesParty affiliation games

• S is a ¯-dominating if every vertex not in S has more than a ½+¯ fraction of neighbs in S.

• If ® > ½+2¯, then set R of receptive players is ¯-dominating whp

(PoS = 1, PoA = (n2))

- Threshold behavior: for > ½, can get ratio O(1), but for < ½, ratio stays (n2). (assume degrees (log n)).

• Split nodes into those incurring low-cost (less than a ¯-fraction of incident edges incur a cost in OPT) vs those incurring high-cost under OPT.

• Low-cost will switch to behavior in OPT. For high-cost, can only incur a cost of only 1/¯ more their cost in OPT.

(upper bound, general party affiliation games)

SummarySummary

Analyze ability of a central authority to guide behavior to a good equilibrium even if only ® fraction of players are paying attention.

Influencing DynamicsInfluencing Dynamics

Play Best Response

Play the Advertised Behavior

Each player has a few abstract actions.

Expert 1 Expert 2

Uses a learning, experts based alg. to decide which one to use

A more adaptive model [Balcan Blum Mansour, ICS 2010]

[no rigid separation between receptive vs non-receptive players]

Open QuestionsOpen Questions

Get around problem of natural dynamics converging to poor equilibrium without central authority by giving players more information about the game?

Recommended