
Concurrent Reachability Games
Peter Bro Miltersen, Aarhus University
CTW 2009



My apologies
For not getting slides ready in time for inclusion in the booklet!

Slides available at http://www.daimi.au.dk/~bromille

Concurrent reachability games
A class of two-player zero-sum games generalizing simple stochastic games (Uri's talk yesterday).

Studied mainly by the formal methods (Eurotheory) community (but sometimes at such venues as FOCS and SODA).

Very interesting and challenging algorithmic problems!

Simple stochastic games (SSGs): reachability version [Condon (1992)]
Two players: MAX and min.
Objective: MAX/min the probability of getting to the MAX-sink.
[Figure: game graph with MAX, min and RAND (R) vertices, a MAX-sink and a min-sink; each RAND edge has probability 1/2. ZP96]
Slide stolen from Uri.

Simple stochastic games (SSGs): strategies
A general strategy may be randomized and history dependent.
A positional strategy is deterministic and history independent.
Positional strategy for MAX: a choice of an outgoing edge from each MAX vertex.
Another slide stolen from Uri.

Simple stochastic games (SSGs): values
Both players have positional optimal strategies.
Every vertex i in the game has a value $v_i$:

$v_i = \max_{\sigma\ \text{positional}} \min_{\tau\ \text{general}} v_i(\sigma,\tau) = \min_{\tau\ \text{positional}} \max_{\sigma\ \text{general}} v_i(\sigma,\tau)$
(here $v_i(\sigma,\tau)$ is the probability of reaching the MAX-sink from i when MAX plays $\sigma$ and min plays $\tau$)
There are strategies that are optimal for every starting position.
Last slide stolen from Uri (I promise!)

Concurrent Reachability Games

(Simple) concurrent reachability game
Arena:
Finite directed graph.
One Max sink (goal) node.
Each non-sink node is assigned a 2x2 matrix of outgoing arcs.
Play:
A pebble moves from node to node as in a simple stochastic game.
In each step, Max chooses a row and Min simultaneously chooses a column of the matrix.
The pebble moves along the appropriate arc.
If Max reaches the goal node, he wins.
If this never happens, Min wins.

Simulation
[Figure: CRG gadget simulating a MAX node.]
[Figure: CRG gadget simulating a min node.]

[Figure: CRG gadget simulating a RAND (R) node with probabilities 1/2 and 1/2.]
Somewhat more subtle that this works!

Proof of correctness
We want values in the CRG to be the same as in the SSG.
In particular, the value of the node simulating a coin toss should be the average of the values of the two nodes it points to.
If these two values are the same, this is clearly the case.
If they have different values v1, v2, the simulated coin toss node is a game of Matching Pennies with payoffs v1, v2. This game has value (v1+v2)/2.
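The Matching Pennies step can be checked with a short derivation. The symbol $p$ (the probability that Max plays the first row) and the matrix layout are notation introduced here for illustration; they are not taken from the slides.

```latex
\[
\begin{pmatrix} v_1 & v_2 \\ v_2 & v_1 \end{pmatrix},
\qquad
p\,v_1 + (1-p)\,v_2 \;=\; p\,v_2 + (1-p)\,v_1
\;\Longrightarrow\; (2p-1)(v_1 - v_2) = 0
\;\Longrightarrow\; p = \tfrac12 \quad (v_1 \neq v_2),
\]
\[
\text{value} \;=\; \tfrac12\,v_1 + \tfrac12\,v_2 \;=\; \frac{v_1+v_2}{2}.
\]
```

By symmetry Min also mixes uniformly, so neither player can improve on the average.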

Concurrent Reachability Games (CRGs): values
Both players have stationary near-optimal strategies.
Every vertex i in the game has a value $v_i$:
$v_i = \sup_{\sigma\ \text{stationary}} \inf_{\tau\ \text{general}} v_i(\sigma,\tau) = \inf_{\tau\ \text{stationary}} \sup_{\sigma\ \text{general}} v_i(\sigma,\tau)$
(here $v_i(\sigma,\tau)$ is the probability of reaching the MAX-sink from i when MAX plays $\sigma$ and min plays $\tau$)
There are such strategies that work for every starting position.
Stationary: as positional, except that we allow randomization.

Why randomized strategies?
[Figure: game with a MAX-sink and a min-sink.]
0-1 matrix games can be immediately simulated.

Why sup/inf instead of max/min?
[Figure: game with a MAX-sink and a min-sink.]
Conditionally repeated matching pennies:
Min hides a penny.
Max tries to guess if it is heads up or tails up.
If Max guesses correctly, he gets the penny.
If Max incorrectly guesses tails, he loses (goes into min-sink/trap).
If Max incorrectly guesses heads, the game repeats.
What is the value of this game?

The value is 1.

Almost optimal strategy for Max
Guess heads with probability 1-ε and tails with probability ε (every time).
Guaranteed to win with probability 1-ε.
But no strategy of Max wins with probability 1.

Values and near-optimal strategies
Each position in a concurrent reachability game has a value.
For any ε>0, each player has a stationary strategy guaranteeing the value within ε (an ε-optimal strategy).

Shown in Everett, Recursive games, 1957.
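The claims about conditionally repeated matching pennies can be checked with exact arithmetic. The following is a minimal sketch, assuming Max guesses tails with probability eps in every round and Min hides heads with probability q in every round; the function names and the grid search over q are illustrative choices, not part of the talk.

```python
from fractions import Fraction

def max_win_probability(eps, q):
    """Probability that Max eventually wins when Max guesses tails with
    probability eps each round and Min hides heads with probability q."""
    win = q * (1 - eps) + (1 - q) * eps   # Max guesses correctly this round
    lose = q * eps                         # penny is heads, Max guesses tails
    if win + lose == 0:                    # the round repeats forever
        return Fraction(0)
    # The game restarts unchanged after a repeat, so W = win / (win + lose).
    return Fraction(win) / Fraction(win + lose)

def guaranteed(eps, grid=100):
    """Worst case over a grid of stationary Min strategies (illustrative only)."""
    return min(max_win_probability(eps, Fraction(k, grid)) for k in range(grid + 1))

print(guaranteed(Fraction(1, 10)))                    # worst case for eps = 1/10
print(max_win_probability(Fraction(0), Fraction(0)))  # always heads vs. always tails
```

With eps = 0 (always guessing heads), Min can hide tails forever and Max never wins, which is why the supremum is not attained.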

Algorithmic problems
Qualitatively solving a CRG: determining which nodes have value 1.
Quantitatively solving a CRG: approximately computing the values of the nodes.
Strategically solving a CRG: computing an ε-optimal stationary strategy for a given ε.

Qualitatively solving CRGs
De Alfaro, Henzinger, Kupferman, FOCS 1998.
Beautiful algorithm!
Formal methods community type algorithm!
Fixed point computation inside a fixed point computation inside a fixed point computation.
Runs in time $O(n^2)$.
Open (I think): Can this time bound be improved? (For SSGs the corresponding time is linear.)

Quantitatively solving CRGs
We want to approximate the values of the positions.
Why not compute them exactly?

The value of a CRG may be irrational!
[Figure: example game from Ferguson, Game Theory.]
Positive payoffs different from 1 can be simulated with scaling and coin toss gadgets.
Negative payoffs are harder to simulate, but in this game we can do it by adding a constant to all payoffs.

Quantitatively solving CRGs
Why not compute the values exactly?
Maybe we want to look at the decision problem consisting of comparing the value to a given rational?

SUM-OF-SQRT hardness
SUM-OF-SQRT: Given an expression E which is a weighted (by integers) sum of square roots (of integers), does E evaluate to a positive number?

Not known to be in P or NP or even the polynomial hierarchy (open at least since Garey and Johnson).

Etessami and Yannakakis, 2005: Comparing the value of a CRG to a rational number is hard for SUM-OF-SQRT.
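A naive numerical attack shows where the difficulty lies: evaluate E at a fixed working precision and read off the sign. The sketch below does this with Python's decimal module. The function name, the digits parameter and the convention of returning None when the sign is undecided are assumptions made for illustration. No polynomial bound is known on how many digits are needed before a nonzero E can be told apart from zero, so this is a heuristic, not a decision procedure.

```python
from decimal import Decimal, localcontext

def sum_of_sqrt_sign(terms, digits=50):
    """terms: list of (a, b) pairs standing for a * sqrt(b), with integers a and b >= 0.
    Returns 1 or -1 if the sign is clear at the chosen precision, else None."""
    with localcontext() as ctx:
        ctx.prec = digits + 10            # a few guard digits beyond the tolerance
        total = sum((Decimal(a) * Decimal(b).sqrt() for a, b in terms), Decimal(0))
        tolerance = Decimal(10) ** -digits
        if total > tolerance:
            return 1
        if total < -tolerance:
            return -1
        return None                       # too close to zero to decide

print(sum_of_sqrt_sign([(1, 2), (1, 3), (-1, 10)]))  # sqrt(2) + sqrt(3) - sqrt(10)
print(sum_of_sqrt_sign([(1, 8), (-2, 2)]))           # sqrt(8) - 2*sqrt(2), which is exactly 0
```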

Sketch of Proof
We already saw how to make games whose values are the solution to certain quadratic equations, i.e., square roots + rationals.

Once we have a bunch of such games, we can easily make a game whose value is the average by a coin toss gadget.

Quantitatively solving CRGs
Maybe we want to compare the value to a given rational?
Given ε, we want to compute an approximation within ε.

Value iteration
Assign all nodes value approximation 0.
Replace pointers with value approximations. Each node is now a matrix game.
Solve and replace approximations.
Theorem: Value approximations converge to values (from below).
Proof sketch: The value approximations are the exact values of a time limited version of the game.
How long does it take to get within 0.01 of the actual values?
Even for SSGs this takes exponential time (Condon93).
For CRGs, an open problem until recently (see later).
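The value iteration steps can be sketched for simple CRGs as follows. This is a minimal illustration under stated assumptions, not the speaker's implementation: each non-sink node maps to a 2x2 matrix of successor names, the sink names GOAL and TRAP are hypothetical, and the sinks keep the fixed values 1 and 0. The 2x2 matrix games are solved with the standard closed form (saddle point check, otherwise the fully mixed formula).

```python
from fractions import Fraction

GOAL, TRAP = "GOAL", "TRAP"   # hypothetical names for the MAX-sink and a min-sink

def solve_2x2(a, b, c, d):
    """Value of the zero-sum matrix game [[a, b], [c, d]] (row player maximizes)."""
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:                     # pure saddle point
        return maximin
    return (a * d - b * c) / (a + d - b - c)   # fully mixed equilibrium

def value_iteration(game, iterations):
    """game: dict mapping node -> ((s11, s12), (s21, s22)) of successor names."""
    values = {node: Fraction(0) for node in game}
    values[GOAL], values[TRAP] = Fraction(1), Fraction(0)   # sinks stay fixed
    for _ in range(iterations):
        new = dict(values)
        for node, ((a, b), (c, d)) in game.items():
            new[node] = solve_2x2(values[a], values[b], values[c], values[d])
        values = new
    return values

# Conditionally repeated matching pennies: rows are Max's guess (heads, tails),
# columns are the side Min hides (heads, tails).
game = {"A": ((GOAL, "A"), (TRAP, GOAL))}
print(value_iteration(game, 9)["A"])
```

On this one-node game the approximation after k iterations is exactly k/(k+1), so it approaches the value 1 from below and needs 99 iterations to get within 0.01.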

Another algorithm for approximating values
The property of being a number larger or smaller than the value of a CRG can be expressed by a polynomial length formula in the existential first order theory of the reals.
There exists a stationary strategy such that …
As a corollary to Renegar89, approximating the value is in PSPACE.
This is the best known complexity class upper bound!
… also the best known concrete big-O complexity bound (using Basu et al instead of Renegar).

Why no NP ∩ coNP upper bound?
Guess a strategy and verify that it works?

Chatterjee, Majumdar, Jurdzinski, On Nash equilibria in stochastic games, CSL04 claims such a result.

In 2007, Kousha Etessami found a technical issue in the proof and the authors retracted the claim.

Computing values vs. finding strategies
It is not obvious that computing the values gives any information about the strategies.
In contrast, for SSGs, optimal strategies can be computed from values in linear time (Andersson and M., ISAAC09).
[Figure: example game with a MAX-sink.]

Algorithms strategically solving concurrent reachability games
Chatterjee, de Alfaro, Henzinger. Strategy improvement for concurrent reachability games. QEST06.

Chatterjee, de Alfaro, Henzinger. Termination criteria for solving concurrent safety and reachability games, SODA09.

Policy improvement!
No time bounds given.

Hardness of solving CRGs
Theorem [Hansen, Koucky and M., LICS09]:
Any algorithm that manipulates ε-optimal strategies of concurrent reachability games must use exponential space (so no NP ∩ coNP algorithm comes from guessing strategies).
Value iteration requires worst case doubly exponential time to come within non-trivial distance of actual values (in contrast, value iteration on SSGs converges in only exponential time).

Dante in Purgatory
[Figure: Purgatory with terraces numbered 1 to 7.]

Dante enters Purgatory at terrace 1.
Purgatory has 7 terraces.
While in Purgatory, once a second, Dante must play Matching Pennies with Lucifer.
If Dante wins, he proceeds to the next terrace.
If Dante wins Matching Pennies at terrace 7, he wins the game of Purgatory.
If Dante loses Matching Pennies guessing Heads, he goes back to terrace 1.
If Dante loses Matching Pennies guessing Tails, he loses the game of Purgatory!!!!

Dante in Purgatory
Is there a strategy for Dante so that he is guaranteed to win the game of Purgatory with probability at least 90%?
Yes.
How long can Lucifer confine Dante to Purgatory if Dante plays by such a strategy?
$10^{55}$ years.

A bit surprising: when Dante wins, he has guessed correctly which hand seven times in a row!
Apply the algorithm of de Alfaro, Henzinger and Kupferman.
Purgatory is a game of doubly exponential patience.
The patience of a mixed strategy is 1/p, where p is the smallest non-zero probability used by the strategy (Everett, 1957).
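The definition of patience translates directly into code. A minimal sketch, assuming a stationary strategy is stored as a dict from node to a list of probabilities (a representation chosen here for illustration):

```python
from fractions import Fraction

def patience(strategy):
    """1/p, where p is the smallest non-zero probability used anywhere in the strategy."""
    smallest = min(p for dist in strategy.values() for p in dist if p > 0)
    return 1 / Fraction(smallest)

example = {
    1: [Fraction(1, 100), Fraction(99, 100)],
    2: [Fraction(1, 10), Fraction(9, 10)],
}
print(patience(example))
```

A deterministic (positional) strategy uses only the probabilities 0 and 1 and therefore has patience 1.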

To win with probability 1-ε, Dante must choose Heads at terrace i with probability greater than (approximately) $1-\varepsilon^{2^{7-i}}$.

On the other hand, choosing Heads with probability 1 is no good!

To win with probability 9/10, he must choose Heads at terrace 1 with probability greater than $1-(1/10)^{64}$ = 0.9999999999999999999999999999999999999999999999999999999999999999. But then Lucifer can respond by always choosing Tails at terrace 1.
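A stationary strategy for Dante can be evaluated exactly with rational arithmetic. The sketch below assumes Dante guesses Tails at terrace i with a fixed probability and takes the worst case over Lucifer's pure stationary strategies, which suffices because Lucifer faces a Markov decision process once Dante's strategy is fixed. The helper doubling_strategy, which guesses Tails with probability $\varepsilon^{2^{n-i}}$ at terrace i, is an illustrative choice modelled on the threshold above; it is not claimed to be the strategy from the paper.

```python
from fractions import Fraction
from itertools import product

def win_probability(tails_probs, lucifer):
    """Dante's winning probability from terrace 1.

    tails_probs[i]: probability that Dante guesses Tails at terrace i+1.
    lucifer[i]: 'H' or 'T', the side Lucifer hides at terrace i+1.
    """
    reach = Fraction(1)   # probability of getting this far within one attempt
    lose = Fraction(0)    # probability of losing outright within one attempt
    for p, hidden in zip(tails_probs, lucifer):
        if hidden == "H":
            lose += reach * p    # wrong Tails guess: Dante loses
            reach *= 1 - p       # correct Heads guess: next terrace
        else:
            reach *= p           # correct Tails guess: next terrace
                                 # (a wrong Heads guess restarts at terrace 1)
    win = reach
    if win + lose == 0:          # every attempt restarts: Dante never wins
        return Fraction(0)
    return win / (win + lose)    # restarts repeat the same attempt

def guaranteed(tails_probs):
    """Worst case over all pure stationary strategies of Lucifer."""
    n = len(tails_probs)
    return min(win_probability(tails_probs, s) for s in product("HT", repeat=n))

def doubling_strategy(n, eps):
    """Tails with probability eps^(2^(n-i)) at terrace i (illustrative assumption)."""
    return [eps ** (2 ** (n - i)) for i in range(1, n + 1)]

print(guaranteed(doubling_strategy(2, Fraction(1, 10))))  # two terraces, eps = 1/10
print(guaranteed([Fraction(0)] * 7))                      # always Heads on 7 terraces
```

For two terraces and ε = 1/10 the guarantee is 891/1000, slightly below 1-ε, which matches the "approximately" in the threshold. Always guessing Heads guarantees nothing, since Lucifer can hide Tails forever.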

Theorem [Hansen, Koucky and M.]:
Any algorithm that manipulates ε-optimal strategies of concurrent reachability games must use exponential space.

Proof: Storing 0.9999999999999999999999999999999999999999999999999999999999999999 takes up a lot of space!

Time of play and value iteration
To win Purgatory with probability 1-ε, almost all probability mass has to be assigned to strategies leading to plays of length at least $(1/\varepsilon)^{2^{n-1}}$.

On the other hand, $(1/\varepsilon)^{2^{116n}}$ is the worst possible expected time of play for any game with n nodes.

Corollary: To solve Purgatory quantitatively using value iteration, $(1/\varepsilon)^{2^{n-1}}$ iterations are needed to get anywhere near the correct values. But $(1/\varepsilon)^{2^{116n}}$ iterations are enough to get ε-close for any n-node game.

Upper bounds are shown (again) by appealing to the first order theory of the reals (semi-algebraic geometry), in particular Basu et al.

Patience of Purgatory with n terraces and ε < …
Upper bound: $(1/\varepsilon)^{2^{n-1}}$
Lower bound: $((1-\varepsilon)/\varepsilon^2)^{2^{n-2}}$
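As a consistency check, the two bounds can be evaluated for the seven-terrace game, taking them to be $(1/\varepsilon)^{2^{n-1}}$ and $((1-\varepsilon)/\varepsilon^2)^{2^{n-2}}$. The sketch assumes (this is not stated on the slides) that the confinement time quoted earlier comes from plugging n = 7 and ε = 1/10 into the lower bound at one game per second. The function name and the 365.25-day year are illustrative choices.

```python
from fractions import Fraction

SECONDS_PER_YEAR = 31_557_600   # 365.25 days, an assumed convention

def patience_bounds(n, eps):
    """Lower and upper bounds on the patience of Purgatory with n terraces."""
    upper = (1 / eps) ** (2 ** (n - 1))
    lower = ((1 - eps) / eps ** 2) ** (2 ** (n - 2))
    return lower, upper

lower, upper = patience_bounds(7, Fraction(1, 10))
years = lower / SECONDS_PER_YEAR   # one game of Matching Pennies per second
print(upper == 10 ** 64, lower == 90 ** 32)
print(float(years))
```

Under these assumptions the lower bound corresponds to a little over $10^{55}$ years.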

Proof of lower bound
[Figure: proof diagram, not recoverable; surviving annotation: "WLOG first place from above where this happens".]

Open problems
What is the exact patience of Purgatory? Probably not a closed expression.
Is Purgatory extremal with respect to patience among n-node CRGs? If yes, this gives a better upper bound on the number of iterations of value iteration for CRGs, replacing 116 with 1!
Compare Condon's example, which is extremal with respect to, e.g., expected absorption time.

Open problem
The fact that the values can be approximated in PSPACE strongly suggests that PSPACE should be enough for understanding CRGs.
Is there a natural representation of probabilities so that
ε-optimal strategies of CRGs can be represented succinctly, and
ε-optimal strategies of CRGs can be computed using polynomial space?

De Alfaro, Henzinger, Kupferman, FOCS98: Yes, for the restricted case of CRGs where the values of all positions are 0 or 1.

CRGs seem much harder to analyze than SSGs. Are there any formal arguments for this (beyond SUM-OF-SQRT hardness)?

Thank you!