Concurrent Reachability Games. Peter Bro Miltersen, Aarhus University. CTW 2009.

Page 1: Concurrent Reachability  Games

CTW 2009 1

Concurrent Reachability Games

Peter Bro Miltersen, Aarhus University

Page 2: Concurrent Reachability  Games


My apologies…

• For not getting slides ready in time for inclusion in booklet!

• Slides available at http://www.daimi.au.dk/~bromille

Page 3: Concurrent Reachability  Games


Concurrent reachability games

• Class of two-player zero-sum games generalizing simple stochastic games (Uri’s talk yesterday).

• Studied mainly by the formal methods (”Eurotheory”) community (but sometimes at such venues as FOCS and SODA).

• Very interesting and challenging algorithmic problems!

Page 4: Concurrent Reachability  Games

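Simple stochastic games (SSGs): reachability version [Condon (1992)]

Objective: MAX/min the probability of getting to the MAX-sink

Two players: MAX and min

[Figure: example SSG with MAX, min and RAND (R) vertices, a MAX-sink and a min-sink; the two edges out of a RAND vertex each carry probability 1/2. Label: ZP’96.]

Slide stolen from Uri…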

Page 5: Concurrent Reachability  Games

Simple stochastic games (SSGs): Strategies

A general strategy may be randomized and history dependent

A positional strategy is deterministic and history independent

Positional strategy for MAX: choice of an outgoing edge from each MAX vertex

Another slide stolen from Uri…..

Page 6: Concurrent Reachability  Games

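Simple stochastic games (SSGs): Values

Both players have positional optimal strategies

Every vertex i in the game has a value $v_i$:

$$v_i = \max_{\sigma\ \text{positional}}\ \min_{\tau\ \text{general}} v_i(\sigma,\tau) = \min_{\tau\ \text{positional}}\ \max_{\sigma\ \text{general}} v_i(\sigma,\tau)$$

where $v_i(\sigma,\tau)$ is the probability of reaching the MAX-sink from i when MAX plays σ and min plays τ.

There are strategies that are optimal for every starting position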

Last slide stolen from Uri (I promise!)

Page 7: Concurrent Reachability  Games


Concurrent Reachability Games

Page 8: Concurrent Reachability  Games

(Simple) concurrent reachability game

• Arena:
  – Finite directed graph.
  – One Max sink (”goal”) node.
  – Each non-sink node is assigned a 2x2 matrix of outgoing arcs.

• Play:
  – A pebble moves from node to node, as in a simple stochastic game.
  – In each step, Max chooses a row and Min simultaneously chooses a column of the matrix.
  – The pebble moves along the appropriate arc.
  – If the pebble reaches the goal node, Max wins.
  – If this never happens, Min wins.
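The arena and the rules of play can be written down in a few lines of code. The following is only a sketch under assumptions that are not in the slides: the dictionary encoding `arena[node][row][col]`, the node names, and the representation of a stationary strategy as "probability of picking index 0" are all hypothetical choices made for illustration. A losing trap is modelled as a non-sink node whose four arcs loop back to itself.

```python
import random

GOAL = "goal"  # the single Max sink

# Hypothetical encoding: ARENA[node][row][col] names the node the pebble
# moves to when Max picks `row` and Min picks `col` at `node`.
ARENA = {
    "start": [[GOAL, "start"],
              ["trap", GOAL]],
    # A trap is modelled as a non-sink node whose four arcs all loop back.
    "trap": [["trap", "trap"],
             ["trap", "trap"]],
}


def play(arena, start, max_row0, min_col0, max_steps, rng):
    """Simulate one play for at most max_steps moves.

    max_row0[node] / min_col0[node] is the probability with which Max / Min
    picks index 0 at that node (a stationary strategy).
    Returns True if the pebble reaches GOAL within the step budget.
    """
    node = start
    for _ in range(max_steps):
        if node == GOAL:
            return True
        # Both choices are drawn before either is revealed: the move is simultaneous.
        row = 0 if rng.random() < max_row0[node] else 1
        col = 0 if rng.random() < min_col0[node] else 1
        node = arena[node][row][col]
    return node == GOAL
```

A finite simulation can only certify that Max has won; "Min wins" means the goal is never reached, which the step budget merely approximates.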

Page 9: Concurrent Reachability  Games

Simulation

[Figure: simulating a MAX vertex of an SSG by a CRG node.]

Page 10: Concurrent Reachability  Games

Simulation

[Figure: simulating a min vertex of an SSG by a CRG node.]

Page 11: Concurrent Reachability  Games

Simulation

[Figure: simulating a RAND (R) vertex, whose two outgoing edges each have probability 1/2, by a CRG node.]

… That this works is somewhat more subtle!

Page 12: Concurrent Reachability  Games


”Proof” of correctness

• We want values in the CRG to be the same as in the SSG.

• In particular, the value of the node simulating a coin toss should be the average of the values of the two nodes it points to.

• If these two values are the same, this is ”clearly” the case.

• If they have different values v1, v2, the simulated coin toss node is a game of Matching Pennies with payoffs v1, v2. This game has value (v1+v2)/2.
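The value (v1+v2)/2 can be checked with a short calculation. This is a sketch assuming the gadget is the 2x2 matrix game below, that $v_1 \neq v_2$, and that Max plays the first row with probability p (the symbol p is introduced here only for the derivation). Max makes Min indifferent between the two columns:

```latex
\[
\begin{pmatrix} v_1 & v_2 \\ v_2 & v_1 \end{pmatrix},
\qquad
\underbrace{p\,v_1 + (1-p)\,v_2}_{\text{Min plays column 1}}
=
\underbrace{p\,v_2 + (1-p)\,v_1}_{\text{Min plays column 2}}
\;\Longrightarrow\;
(2p-1)(v_1 - v_2) = 0
\;\Longrightarrow\;
p = \tfrac12 ,
\]
\[
\text{value} \;=\; \tfrac12\,v_1 + \tfrac12\,v_2 \;=\; \frac{v_1+v_2}{2}.
\]
```

By symmetry, Min's equalizing strategy also puts probability 1/2 on each column, so neither player can improve on the average.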

Page 13: Concurrent Reachability  Games


Concurrent Reachability Games (CRGs)

Page 14: Concurrent Reachability  Games

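Values: from simple stochastic games (SSGs) to concurrent reachability games (CRGs)

Both players have stationary optimal strategies

Every vertex i in the game has a value $v_i$:

$$v_i = \sup_{\sigma\ \text{stationary}}\ \inf_{\tau\ \text{general}} v_i(\sigma,\tau) = \inf_{\tau\ \text{stationary}}\ \sup_{\sigma\ \text{general}} v_i(\sigma,\tau)$$

where $v_i(\sigma,\tau)$ is the probability of reaching the MAX-sink from i when MAX plays σ and min plays τ; sup and inf take the place of the max and min used for SSGs.

There are strategies that are optimal for every starting position

Stationary: As positional, except that we allow randomization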

Page 15: Concurrent Reachability  Games

Why randomized strategies?

[Figure: CRG node with arcs to the MAX-sink and the min-sink.]

0-1 matrix games can be immediately simulated

Page 16: Concurrent Reachability  Games

Why sup/inf instead of max/min?

[Figure: example game with a MAX-sink and a min-sink.]


Page 18: Concurrent Reachability  Games

Why sup/inf instead of max/min

• ”Conditionally repeated matching pennies”:
  – Min hides a penny.
  – Max tries to guess if it is heads up or tails up.
  – If Max guesses correctly, he gets the penny.
  – If Max incorrectly guesses tails, he loses (goes into min-sink/trap).
  – If Max incorrectly guesses heads, the game repeats.

• What is the value of this game? 1

Page 19: Concurrent Reachability  Games

Almost optimal strategy for Max

• Guess ”heads” with probability 1-ε and ”tails” with probability ε (every time).

• Guaranteed to win with probability 1-ε.

• But no strategy of Max wins with probability 1.
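The 1-ε guarantee can be checked numerically. The sketch below assumes Min also plays a stationary strategy, hiding the penny heads up with a fixed probability `q` in every round; the function names and the grid search are illustrative choices, not part of the talk. In each round Max wins, loses, or the round repeats, so his overall winning probability is win / (win + lose).

```python
def max_win_probability(eps, q):
    """Probability that Max eventually wins conditionally repeated matching pennies.

    eps: probability with which Max guesses "tails" in every round.
    q:   probability with which Min hides the penny heads up in every round
         (a stationary counter-strategy, assumed here for illustration).
    """
    win = q * (1 - eps) + (1 - q) * eps   # Max guesses correctly
    lose = q * eps                        # penny is heads, Max guesses tails
    # With the remaining probability (1 - q) * (1 - eps) the round repeats.
    if win + lose == 0:
        return 0.0                        # the game repeats forever: Min wins
    return win / (win + lose)


def worst_case(eps, grid=1001):
    """Smallest winning probability over a grid of stationary Min strategies."""
    return min(max_win_probability(eps, i / (grid - 1)) for i in range(grid))
```

With `eps = 0.1` the worst case over the grid is 0.9, attained when Min always hides heads. With `eps = 0` (always guess heads) Min can always hide tails and the game never ends, so the guarantee drops to 0.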

Page 20: Concurrent Reachability  Games

Values and near-optimal strategies

• Each position in a concurrent reachability game has a value.

• For any ε>0, each player has a stationary strategy guaranteeing the value within ε (an ε-optimal strategy).

• Shown in Everett, “Recursive games”, 1957.

Page 21: Concurrent Reachability  Games

Algorithmic problems

• Qualitatively solving a CRG.
  – Determining which nodes have value 1.

• Quantitatively solving a CRG.
  – Approximately computing the values of the nodes.

• Strategically solving a CRG.
  – Computing an ε-optimal stationary strategy for a given ε.

Page 22: Concurrent Reachability  Games

Qualitatively solving CRGs

• De Alfaro, Henzinger, Kupferman, FOCS 1998.
  – Beautiful algorithm!
  – Formal methods community type algorithm!
  – Fixed point computation inside a fixed point computation inside a fixed point computation…
  – Runs in time $O(n^2)$.

• Open (I think): Can this time bound be improved? (For SSGs the corresponding time is linear.)

Page 23: Concurrent Reachability  Games


Quantitatively solving CRGs

• We want to approximate the values of the positions.

• Why not compute them exactly?

Page 24: Concurrent Reachability  Games

The value of a CRG may be irrational!

[Figure: example game from Ferguson, Game Theory.]

Positive payoffs different from 1 can be simulated with scaling and coin toss gadgets. Negative payoffs are harder to simulate, but in this game we can do it by adding a constant to all payoffs.

Page 25: Concurrent Reachability  Games

Quantitatively solving CRGs

• We want to approximate the values of the positions.

• Why not compute them exactly?

• Maybe we want to look at the decision problem consisting of comparing the value to a given rational?

Page 26: Concurrent Reachability  Games

SUM-OF-SQRT hardness

• SUM-OF-SQRT: Given an expression E which is a weighted (by integers) sum of square roots (of integers), does E evaluate to a positive number?

• Not known to be in P or NP or even the polynomial hierarchy (open at least since Garey and Johnson).

• Etessami and Yannakakis, 2005: Comparing the value of a CRG to a rational number is hard for SUM-OF-SQRT.
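To make the SUM-OF-SQRT problem concrete, here is a naive attempt at it: evaluate the sum at a fixed decimal precision and report the sign only when it is clear. This is a sketch, not a decision procedure. The function name, the `(weight, radicand)` input format and the safety margin are assumptions made for illustration, and the difficulty of the problem lies exactly in not knowing how many digits are enough.

```python
from decimal import Decimal, localcontext


def sum_of_sqrt_sign(terms, digits=50):
    """Try to decide the sign of sum(w * sqrt(a)) for integer pairs (w, a).

    Returns +1 or -1 when the sign is clear at the chosen precision, and 0
    when the sum is too close to zero to call (it may be exactly zero or not).
    """
    with localcontext() as ctx:
        ctx.prec = digits
        total = sum(Decimal(w) * Decimal(a).sqrt() for w, a in terms)
        # Crude safety margin: ignore anything below 10^-(digits - 10).
        margin = Decimal(10) ** (10 - digits)
        if total > margin:
            return 1
        if total < -margin:
            return -1
        return 0
```

For example, `sum_of_sqrt_sign([(1, 2), (1, 3), (-1, 10)])` reports a negative sum, while `[(1, 2), (1, 8), (-1, 18)]`, which is exactly zero, comes back as undecided.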

Page 27: Concurrent Reachability  Games


Sketch of Proof

• We already saw how to make games whose values are the solution to certain quadratic equations, i.e., square roots + rationals.

• Once we have a bunch of such games, we can easily make a game whose value is the average by a ”coin toss gadget”.

Page 28: Concurrent Reachability  Games

Quantitatively solving CRGs

• We want to approximate the values of the positions.

• Why not compute them exactly?

• Maybe we want to compare the value to a given rational?

• Given ε, we want to compute an approximation within ε.

Page 29: Concurrent Reachability  Games

Value iteration

• Assign all nodes ”value approximation” 0.
• Replace pointers with value approximations. Each node is now a matrix game.
• Solve and replace approximations.
• Theorem: Value approximations converge to values (from below).
• Proof sketch: The value approximations are the exact values of a time limited version of the game.
• How long does it take to get within 0.01 of the actual values?
• Even for SSGs this takes exponential time (Condon’93).
• For CRGs, an open problem until recently (see later).
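The loop described on this slide can be sketched in code for 2x2 nodes. Several things here are assumptions for illustration rather than anything taken from the talk: the `arena[node][row][col]` encoding, the convention that a pointer to the goal is replaced by 1, and solving each matrix game by the textbook closed form for 2x2 zero-sum games. The example arena is the conditionally repeated matching pennies game from earlier in the talk.

```python
GOAL = "goal"  # the Max sink; pointers to it are replaced by the value 1

# Conditionally repeated matching pennies (rows: Max's guess, columns: Min's penny).
PENNIES = {
    "start": [[GOAL, "start"],
              ["trap", GOAL]],
    "trap": [["trap", "trap"],
             ["trap", "trap"]],
}


def solve_2x2(m):
    """Value of a 2x2 zero-sum matrix game for the row player (the maximizer)."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))   # best pure guarantee for the row player
    minimax = min(max(a, c), max(b, d))   # best pure guarantee for the column player
    if maximin == minimax:                # exact float test; fine for this sketch
        return maximin                    # saddle point in pure strategies
    # Otherwise both players mix; standard closed form for 2x2 games.
    return (a * d - b * c) / (a + d - b - c)


def value_iteration(arena, iterations):
    """Run the given number of rounds and return the value approximations."""
    approx = {node: 0.0 for node in arena}    # all approximations start at 0
    for _ in range(iterations):
        previous = approx
        # Replace pointers by the previous approximations, then solve each matrix game.
        approx = {
            node: solve_2x2([[1.0 if s == GOAL else previous[s] for s in row]
                             for row in matrix])
            for node, matrix in arena.items()
        }
    return approx
```

On `PENNIES` the approximation at `"start"` after k rounds is k/(k+1), so it climbs towards the true value 1 from below and needs 99 rounds to get within 0.01, even though the game has a single interesting node.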

Page 30: Concurrent Reachability  Games

Another algorithm for approximating values

• The property of being a number larger or smaller than the value of a CRG can be expressed by a polynomial length formula in the existential first order theory of the reals.
  – There exists a stationary strategy such that…
• As a corollary to Renegar’89, approximating the value is in PSPACE.
• This is the best known ”complexity class” upper bound!
• … also the best known concrete ”big-O” complexity bound (using Basu et al instead of Renegar).

Page 31: Concurrent Reachability  Games

Why no NP ∩ coNP upper bound?

• Guess a strategy and verify that it works?

• Chatterjee, Majumdar, Jurdzinski, On Nash equilibria in stochastic games, CSL’04 claims such a result.

• In 2007, Kousha Etessami found a technical issue in the proof and the authors retracted the claim.

Page 32: Concurrent Reachability  Games

Computing values vs. finding strategies

• It is not obvious that computing the values gives any information about the strategies.

• In contrast, for SSGs, optimal strategies can be computed from values in linear time (Andersson and M., ISAAC’09).

[Figure: example game with a MAX-sink.]

Page 33: Concurrent Reachability  Games

Algorithms strategically solving concurrent reachability games

Chatterjee, de Alfaro, Henzinger. Strategy improvement for concurrent reachability games. QEST’06.

Chatterjee, de Alfaro, Henzinger. Termination criteria for solving concurrent safety and reachability games, SODA’09.

Policy improvement! No time bounds given…

Page 34: Concurrent Reachability  Games

“Hardness” of solving CRGs

Theorem [Hansen, Koucky and M., LICS’09]:
– Any algorithm that manipulates ε-optimal strategies of concurrent reachability games must use exponential space (so no NP ∩ coNP algorithm comes from guessing strategies).
– Value iteration requires worst case doubly exponential time to come within non-trivial distance of actual values (in contrast, value iteration on SSGs converges in only exponential time).

Page 35: Concurrent Reachability  Games

Dante in Purgatory

[Figure: Purgatory drawn as seven terraces, numbered 1 to 7.]

Dante enters Purgatory at terrace 1.

Purgatory has 7 terraces.

Page 36: Concurrent Reachability  Games

Dante in Purgatory

[Figure: the seven terraces of Purgatory.]

While in Purgatory, once a second, Dante must play Matching Pennies with Lucifer.

Page 37: Concurrent Reachability  Games

Dante in Purgatory

[Figure: the seven terraces of Purgatory.]

If Dante wins, he proceeds to the next terrace.


Page 49: Concurrent Reachability  Games

Dante in Purgatory

[Figure: the seven terraces of Purgatory.]

If Dante wins Matching Pennies at terrace 7, he wins the game of Purgatory.


Page 51: Concurrent Reachability  Games

Dante in Purgatory

[Figure: the seven terraces of Purgatory.]

If Dante loses Matching Pennies guessing Heads, he goes back to terrace 1.


Page 54: Concurrent Reachability  Games

Dante in Purgatory

[Figure: the seven terraces of Purgatory.]

If Dante loses Matching Pennies guessing Tails…

… he loses the game of Purgatory!
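The rules of Purgatory collected over the preceding slides fit in a small arena builder. This is a sketch: the encoding `arena[node][row][col]`, the names `GOAL` and `TRAP`, and the (Heads, Tails) ordering of rows and columns are assumptions chosen for illustration. "Dante wins Matching Pennies" is read as "Dante guesses correctly", which matches the later remark that a winning Dante has guessed correctly seven times in a row.

```python
GOAL = "goal"   # Dante leaves Purgatory: the Max sink
TRAP = "trap"   # Dante has lost: modelled as a node that only loops to itself


def purgatory(n=7):
    """Build the arena for Purgatory with n terraces.

    Rows are Dante's guess, columns are what Lucifer hides, both ordered
    (Heads, Tails). arena[node][row][col] is the next node (hypothetical encoding).
    """
    arena = {TRAP: [[TRAP, TRAP], [TRAP, TRAP]]}
    for i in range(1, n + 1):
        up = GOAL if i == n else i + 1      # a correct guess moves Dante up
        arena[i] = [
            [up, 1],      # guesses Heads: right -> up; wrong -> back to terrace 1
            [TRAP, up],   # guesses Tails: wrong -> game lost; right -> up
        ]
    return arena
```

With `n = 1` this is exactly the conditionally repeated matching pennies game from earlier in the talk.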

Page 55: Concurrent Reachability  Games

Dante in Purgatory

Page 56: Concurrent Reachability  Games

Dante in Purgatory

• Is there a strategy for Dante so that he is guaranteed to win the game of Purgatory with probability at least 90%?
  – Yes.

• How long can Lucifer confine Dante to Purgatory if Dante plays by such a strategy?
  – $10^{55}$ years.

A bit surprising: when Dante wins, he has guessed correctly which hand seven times in a row!

Apply algorithm of de Alfaro, Henzinger and Kupferman

Page 57: Concurrent Reachability  Games

Purgatory is a game of doubly exponential patience.

• The patience of a mixed strategy is 1/p where p is the smallest non-zero probability used by the strategy (Everett, 1957).

• To win with probability 1-ε, Dante must choose “Heads” at terrace i with probability greater than (approximately) $1-\varepsilon^{2^{7-i}}$.

• On the other hand, choosing “Heads” with probability 1 is no good!

• To win with probability 9/10, he must choose “Heads” at terrace 1 with probability greater than $1-(1/10)^{64}$ = 0.9999999999999999999999999999999999999999999999999999999999999999. But then Lucifer can respond by always choosing “Tails” at terrace 1.
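The thresholds on this slide are easy to tabulate with exact rational arithmetic. In this sketch the function names are made up for illustration, and the formula is the approximate bound quoted on the slide, not an exact optimum.

```python
from fractions import Fraction


def heads_threshold(eps, terrace, n=7):
    """Approximate lower bound 1 - eps^(2^(n - terrace)) on P(Heads) at a terrace."""
    return 1 - eps ** (2 ** (n - terrace))


def patience(probabilities):
    """Patience of a mixed strategy: 1/p for the smallest non-zero probability p."""
    return 1 / min(p for p in probabilities if p > 0)


eps = Fraction(1, 10)
# One threshold per terrace, from terrace 1 (hardest) to terrace 7 (easiest).
thresholds = [heads_threshold(eps, i) for i in range(1, 8)]
```

At terrace 7 the threshold is 9/10; at terrace 1 it is $1-10^{-64}$, so a strategy sitting at that threshold puts probability $10^{-64}$ on “Tails” and has patience $10^{64}$.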

Page 58: Concurrent Reachability  Games

“Hardness” of solving CRGs

Theorem [Hansen, Koucky and M.]:
– Any algorithm that manipulates ε-optimal strategies of concurrent reachability games must use exponential space.

• Proof: Storing 0.9999999999999999999999999999999999999999999999999999999999999999 takes up a lot of space!

Page 59: Concurrent Reachability  Games

Time of play and value iteration

• To win Purgatory with probability 1-ε, almost all probability mass has to be assigned to strategies leading to plays of length at least $(1/\varepsilon)^{2^{n-1}}$.

• On the other hand, $(1/\varepsilon)^{2^{116n}}$ is the worst possible expected time of play for any game with n nodes.

• Corollary: To solve Purgatory quantitatively using value iteration, $(1/\varepsilon)^{2^{n-1}}$ iterations are needed to get anywhere near the correct values. But $(1/\varepsilon)^{2^{116n}}$ iterations are enough to get ε-close for any n-node game.

• Upper bounds shown (again) by appealing to the first order theory of the reals (semi-algebraic geometry), in particular Basu et al.

Page 60: Concurrent Reachability  Games

Patience of Purgatory with n terraces and ε < ½

• Upper bound: $(1/\varepsilon)^{2^{n-1}}$

• Lower bound: $((1-\varepsilon)/\varepsilon^2)^{2^{n-2}}$
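Plugging in n = 7 and ε = 1/10 gives a feel for the size of these bounds. The sketch below uses exact rationals; the function names are illustrative, and the conversion to years assumes one round per second and a 365.25-day year. Reading the lower bound as a number of rounds is an assumption of this sketch, not something the slides state.

```python
from fractions import Fraction

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: one round per second


def patience_upper(eps, n):
    """Upper bound (1/eps)^(2^(n-1)) on the patience of Purgatory."""
    return (1 / eps) ** (2 ** (n - 1))


def patience_lower(eps, n):
    """Lower bound ((1-eps)/eps^2)^(2^(n-2)); assumes n >= 2."""
    return ((1 - eps) / eps ** 2) ** (2 ** (n - 2))


eps, n = Fraction(1, 10), 7
upper = patience_upper(eps, n)    # equals 10^64
lower = patience_lower(eps, n)    # equals 90^32
years = float(lower) / SECONDS_PER_YEAR
```

Under those assumptions `years` comes out on the order of $10^{55}$, the same order of magnitude as the confinement time quoted earlier in the talk.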

Page 61: Concurrent Reachability  Games

Proof of lower bound

Page 62: Concurrent Reachability  Games

[Figure: the terraces, with quantities δ and > δ² marked.]

WLOG, the first place from above where this happens…

Page 63: Concurrent Reachability  Games

Proof of lower bound

Page 64: Concurrent Reachability  Games

Open problems

• What is the exact patience of Purgatory? Probably not a closed expression.

• Is Purgatory extremal with respect to patience among n-node CRGs?

• If yes, this gives a better upper bound on the number of iterations of value iteration for CRGs, replacing 116 with 1!

Page 65: Concurrent Reachability  Games

Compare

Condon’s example. Extremal with respect to, e.g., expected absorption time.

Page 66: Concurrent Reachability  Games

Open Problem

• The fact that the values can be approximated in PSPACE strongly suggests that PSPACE should be enough for “understanding” CRGs.

• Is there a “natural” representation of probabilities so that
  – ε-optimal strategies of CRGs can be represented succinctly and
  – ε-optimal strategies of CRGs can be computed using polynomial space?

• De Alfaro, Henzinger, Kupferman, FOCS’98: Yes, for the restricted case of CRGs where the values of all positions are 0 or 1.

• CRGs seem much harder to analyze than SSGs. Are there any formal arguments for this (beyond SUM-OF-SQRT hardness)?

Page 67: Concurrent Reachability  Games


Thank you!