Laurent Doyen LSV, ENS Cachan & CNRS

Laurent DoyenLSV, ENS Cachan & CNRS

Krishnendu ChatterjeeIST Austria

Luxembourg, December 8th, 2011

&

Partial-Observation Stochastic Games: How to Win when Belief Fails

Example

void main () { int got_lock = 0; do { 1: if (*) { 2: lock (); 3: got_lock++; } 4: if (got_lock != 0) { 5: unlock (); } 6: got_lock--; } while (*); }

void lock () { assert(L == 0); L = 1; }

void unlock () { assert(L == 1); L = 0; }

Example




Wrong!

Example

void main () { int got_lock = 0; do { 1: if (*) { 2: lock (); 3: s0 | s1 | inc | dec; } 4: if (got_lock != 0) { 5: unlock (); } 6: s0 | s1 | inc | dec; } while (*); }

s0 | s1 | inc | dec;


void lock () { assert(L == 0); L = 1;}

void unlock () { assert(L == 1); L = 0;}

s0 ≡ got_lock = 0s1 ≡ got_lock = 1Inc ≡ got_lock++dec ≡ got_lock--

Example

Repair/synthesis as a game:

• System vs. Environment

• Turn-based game graph

• ω-regular objective






Example






Example






A winning strategy may use the value of L to update got_lock accrodingly:

• if L==0 then play s0 (got_lock = 0)

• if L==1 then play s1 (got_lock = 1)

Example

Repair/synthesis as a game of partial observation:

visible

hidden






States that differ only by the value of L have the same observation

Example






Example


s1;

s0;



Partial-Observation Stochastic Games

Examples

• Poker

- partial-observation

- stochastic

Examples

• Poker

- partial-observation

- stochastic

• Bonneteau

Bonneteau

2 black card, 1 red card

Initially, all are face downGoal: find the red card

Bonneteau


Initially, all are face downGoal: find the red card

Rules:

1. Player 1 points a card

2. Player 2 flips one remaining black card

3. Player 1 may change his mind, wins if pointed card is red

Bonneteau


Initially, all are face down

Rules:




Goal: find the red card

Bonneteau


Initially, all are face down

Rules:




Goal: find the red card

Bonneteau: Game Model

Bonneteau: Game Model

Game Model

Game Model

Game Model

Game Model

Game Model

Observations (for player 1)





Observation-based strategy

This strategy is observation-based, e.g. after it plays

Observation-based strategy

This strategy is observation-based, e.g. after it plays

Optimal ?

This strategy is winning with probability 2/3

Game Model

This game is:

• turn-based

• (almost) non-stochastic

• player 2 has perfect observation

Interaction

General case: concurrent & stochastic

Player 1’s move

Player 2’s move

Players choose their moves simultaneously and independently

General case: concurrent & stochastic

Player 1’s move

Player 2’s move

Probability distribution on successor state

-player games

Interaction

Special cases:

• player-1 state

Turn-based games

• player-2 state

Interaction

Partial-observation

Observations: partitions induced by coloring

General case: 2-sided partial observation

Two partitions and

Partial-observation


Player 1’s view

Player 2’s view

General case: 2-sided partial observation

Two partitions and

or

Special case: 1-sided partial observation

Partial-observation


Strategies & objective

A strategy for Player is a function that maps histories (sequences of observations) to probability distribution over actions.


History-depedentrandomized



Reachability objective:

Winning probability of :


Decide if there exists a strategy for player 1 that is winning with probability at least ½.

Qualitative analysis

The following problem is undecidable:(already for probabilistic automata [Paz71])

[Paz71] Paz. Introduction to Probabilistic Automata. Academic Press 1971.

Decide if there exists a strategy for player 1 that is winning with probability at least ½.

Qualitative analysis

The following problem is undecidable:

Qualitative analysis:

• Almost-sure: … winning with probability 1

• Positive: … winning with probability > 0

(already for probabilistic automata [Paz71])

Applications in verification

• Control with inaccurate digital sensors

• multi-process control with private variables

• multi-agent protocols

• planning with uncertainty/unknown

control

tank




Example 1

Player 1 partial, player 2 perfect

Example 1


No pure strategy of Player 1 is winning with probability 1(example from [CDHR06]).

Example 1


No pure strategy of Player 1 is winning with probability 1(example from [CDHR06]).

Example 1

Player 1 wins with probability 1, and needs randomization


Belief-based-only randomized strategies are sufficient

Example 2


Example 2


Randomized action-visible strategies:

To win with probability 1, player 1 needs to observe his own actions. (example from [CDH10]).

Classes of strategies

rand. action-invisible

pure

rand. action-visible Classification

according to the power of strategies



pure



Poly-time reduction from decision problem of rand. act.-vis. to rand. act.-inv.

The model of rand. act.-inv. is more general



pure



Computational complexity(algorithms)

Strategy complexity(memory)

Known results

Almost-sureplayer 1 partialplayer 2 perfect

player 1 perfect

player 2 partial

2-sidedboth partial

rand. act.-vis.

exponential (belief) [CDHR’06]

memoryless[BGG’09]

exponential (belief) [BGG’09]

rand. act.-inv.

exponential (belief) [GS’09]


pure ? ? ?

Reachability - Memory requirement (for player 1)

Known results


player 1 perfect

player 2 partial

2-sidedboth partial

rand. act.-vis.




rand. act.-inv.

exponential (belief)

[CDHR’06(remark), GS’09]


pure ? ? ?


[BGG09] Bertrand, Genest, Gimbert. Qualitative Determinacy and Decidability of Stochastic Games with Signals. LICS’09.

[CDHR06] Chatterjee, Doyen, Henzinger, Raskin. Algorithms for ω-Regular games with Incomplete Information. CSL’06.[GS09] Gripon, Serre. Qualitative Concurrent Stochastic Games with Imperfect Information. ICALP’09.

Known results


player 1 perfect

player 2 partial

2-sidedboth partial

rand. act.-vis.




rand. act.-inv.




pure ? ? ?


Positiveplayer 1 partialplayer 2 perfect

player 1 perfect

player 2 partial

2-sidedboth partial

rand. act.-vis.

memoryless memoryless memoryless

rand. act.-inv.

memoryless memoryless

pure ? ? ?

About beliefs

• Belief is sufficient.

• Randomized action invisible or visible almost same.

• The general case memory is similar (or in some cases exponential blow up) as compared to the one-sided case.

Three prevalent beliefs:

Pure Strategies


Proofs• Doubts.

Belief

Pure Strategies


Proofs• Doubts.

Lesson: Doubt your belief and believe in your doubts !! See the unexpected.

Belief

When belief fails (1/2)

Belief-based-only pure strategies are not sufficient, both for positive and for almost-sure winning

player 1 partial

player 2 perfect



player 1 partial

player 2 perfect



not winning

player 1 partial

player 2 perfect



not winning

player 1 partial

player 2 perfect



Neither is winning !

player 1 partial

player 2 perfect



player 1 partial

player 2 perfect



player 1 partial

player 2 perfect

This strategy is almost-sure winning !

Using the trick of “repeated actions” we construct an example where belief-only randomized action-invisible strategies are not sufficient (for almost-sure winning)


player 1 partial

player 2 perfect



player 1 partial

player 2 perfect



player 1 partial

player 2 perfect

Almost-sure winning requires to play pure strategy, with more-than-belief memory !

New results


player 1 perfect

player 2 partial

2-sidedboth partial

rand. act.-vis.




rand. act.-inv.




pure ? ? ?



player 1 perfect

player 2 partial

2-sidedboth partial

rand. act.-vis.


rand. act.-inv.


pure ? ? ?

New results


player 1 perfect

player 2 partial

2-sidedboth partial

rand. act.-vis.




rand. act.-inv.

exponential (more than belief) ?

pureexponential (more

than belief)? ?



player 1 perfect

player 2 partial

2-sidedboth partial

rand. act.-vis.


rand. act.-inv.



than belief)? ?

Pure Strategies: Player 1 Perfect, Player 2 Partial

Pl1 Perfect, Pl 2 Partial : Non-stochastic, Pure. Memoryless

Pl1 Perfect, Pl 2 Partial : Stochastic, Randomized. Memoryless

Pl1 Perfect, Pl 2 Perfect: Stochastic, Pure. Memoryless

Pl1 Partial, Pl 2 Perfect: Stochastic, Pure. Exponential

Pl1 Perfect, Pl 2 Partial: Stochastic, Pure.





Pl1 Perfect, Pl 2 Partial: Stochastic, Pure.

Add probabilityRestrict to pure

Pl 2 less informedPl 1 more informed, Pl 2 less informed






Pl1 Perfect, Pl 2 Partial: Stochastic, Pure. Non-elementary complete

Add probabilityRestrict to pure

Pl 2 less informedPl 1 more informed, Pl 2 less informed


New results


player 1 perfect

player 2 partial

2-sidedboth partial

rand. act.-vis.




rand. act.-inv.



than belief)

non-elementarycomplete

?



player 1 perfect

player 2 partial

2-sidedboth partial

rand. act.-vis.


rand. act.-inv.



than belief)


?

New results


player 1 perfect

player 2 partial

2-sidedboth partial

rand. act.-vis.




rand. act.-inv.



than belief)


?



player 1 perfect

player 2 partial

2-sidedboth partial

rand. act.-vis.


rand. act.-inv.



than belief)


?

Player 1 wins from more states, but needs more memory !

Player 1 perfect, player 2 partial

• lower bound: simulation of counter systems with increment and division by 2

• upper bound: positive: non-elementary counters simulate randomized strategies almost-sure: reduction to iterated positive

Memory of non-elementary size for pure strategies

Counter systems with {+1,÷2} require non-elementary counter value for reachability


• Win from more places.

• Winning strategy is very hard to implement.

Information is useful, but ignorance is bliss !

More information:

New results


player 1 perfect

player 2 partial

2-sidedboth partial

rand. act.-vis.




rand. act.-inv.



than belief)


finite (at least non-elementary)



player 1 perfect

player 2 partial

2-sidedboth partial

rand. act.-vis.


rand. act.-inv.



than belief)



Pure randomized-invisible

Equivalence of the decision problems for almost-sure reachwith pure strategies and rand. act.-inv. strategies• Reduction of rand. act.-inv. to pure choice of a subset of actions (support of prob. dist.)

• Reduction of pure to rand. act.-inv. repeated-action trick (holds for almost-sure only)

It follows that the memory requirements for pure hold for rand. act.-inv. as well !

New results


player 1 perfect

player 2 partial

2-sidedboth partial

rand. act.-vis.




rand. act.-inv.

exponential (more than belief)



than belief)





player 1 perfect

player 2 partial

2-sidedboth partial

rand. act.-vis.


rand. act.-inv.



than belief)



Beliefs


• Randomized action invisible or visible almost same.

• The general case memory is similar (or in some cases exponential blow up) as compared to the one-sided case.

Three prevalent beliefs:

Belief fails !

Summary of our results

• player 1 partial: exponential memory, more than belief

• player 1 perfect: non-elementary memory (complete)

• 2-sided: finite, at least non-elementary memory (as compared to previously claimed exponential upper bound)

Pure strategies (for almost-sure and positive):

• player 1 partial: exponential memory, more than belief

• 2-sided: finite, at least non-elementary memory

Randomized action-invisible strategies (for almost-sure) :

More results & open questions

• Player 1 partial: reduction to Büchi game, EXPTIME-complete

Open questions:

• Player 2 partial: non-elementary complexity (note: almost-sure Büchi is poly-time equivalent to almost-sure reachability, positive Büchi is undecidable [BBG08])

Computational complexity for 1-sided:

• Whether non-elementary size memory is sufficient in 2-sided• Exact computational complexity

Details

Details can be found in:

[CD11] Chatterjee, Doyen. Partial-Observation Stochastic Games: How to Win when Belief Fails. CoRR abs/1107.2141, July 2011.

References

Details can be found in:

[BBG08] Baier, Bertrand, Grösser. On Decision Problems for Probabilistic Büchi automata. FoSSaCS’08.

[BGG09] Bertrand, Genest, Gimbert. Qualitative Determinacy and Decidability of Stochastic Games with Signals. LICS’09.

[CDHR06] Chatterjee, Doyen, Henzinger, Raskin. Algorithms for ω-Regular games with Incomplete Information. CSL’06.

[CDH10] Cristau, David, Horn. How do we remember the past in randomised strategies?. GANDALF’10.

[GS09] Gripon, Serre. Qualitative Concurrent Stochastic Games with Imperfect Information. ICALP’09.

[Paz71] Paz. Introduction to Probabilistic Automata. Academic Press 1971.

Other references:

[CD11] Chatterjee, Doyen. Partial-Observation Stochastic Games: How to Win when Belief Fails. CoRR abs/1107.2141, July 2011.

Some proof ideas

New results


player 1 perfect

player 2 partial

2-sidedboth partial

rand. act.-vis.

rand. act.-inv.


than belief)



player 1 perfect

player 2 partial

2-sidedboth partial

rand. act.-vis.

rand. act.-inv.


than belief)

When belief fails


player 1 partial

player 2 perfect

When belief fails


player 1 partial

player 2 perfect

When belief is {q1,q2}, then ensure that the target is reached from both q1 and q2.

When belief fails


When belief fails


When belief fails


belief

When belief fails


belief

When belief fails


beliefobligation

Obligation = from every state in belief, reach target with positive probability

When belief fails


beliefobligation

At this point, obligation of q2 is satisfied


When belief fails


beliefobligation



Here, obligation of q1 is satisfied

When belief fails


beliefobligation




When belief failsbeliefobligation




Empty obligation set

All obligations

fulfilled

When belief failsbeliefobligation

Empty obligation set

All obligations

fulfilled

Positive reachability: ensure empty obligation once

Reachability condition

Almost-sure reachability: ensure empty obligation infinitely often(and recharge when empty)

Büchi condition

Computing with obligations

How to update the obligation set ?

How to check satisfaction of obligations ?



Ask player 1 to decrease distance to target








Makes it harder for player 1…




Problem: obligations are chosen randomly player 1 may get “lucky”.



Cannot store all successors as obligation

Ask player to promise visit to target

Promise of player 1 may vanish if player 1 is “lucky”

(i.e., he gets an observation for which he did not promise anything)











Remove target states from obligation set

Let player 2 make the probabilistic choice










Makes it even harder for player 1









Recharge obligation set when empty

Reduces almost-sure reachability to 2-player Büchi game









Key argument of the proof:

not really harder for player 1 !!











Key argument of the proof:

not really harder for player 1 !!



New results


player 1 perfect

player 2 partial

2-sidedboth partial

rand. act.-vis.

rand. act.-inv.

purenon-

elementarycomplete



player 1 perfect

player 2 partial

2-sidedboth partial

rand. act.-vis.

rand. act.-inv.

purenon-

elementarycomplete








Flavor of a counter system with:

• increment,

• division by 2 (size of alphabet)


Show that:

1. games can simulate increment and division by 2

2. Such counter systems require non-elementary counter value for reachability

Counter system with {+1,÷2}


Reachability ?








Show that:

1. games can simulate increment and division by 2

2. Such counter systems require non-elementary counter value for reachability

Game gadgets for {idle,+1,÷2}

Game gadgets for {idle,+1,÷2}

Game that simulates 3 counters…

Pl1 Perfect, Pl 2 Partial: Stochastic, Pure. Non-elementary lower bound



• lower bound: simulation of counter systems with increment and division by 2

• upper bound: positive: non-elementary counters simulate randomized strategies almost-sure: reduction to iterated positive

Memory of non-elementary size for pure strategies

Counter systems with {+1,÷2} require non-elementary counter value for reachability

New results


player 1 perfect

player 2 partial

2-sidedboth partial

rand. act.-vis.




rand. act.-inv.

exponential (more than belief)



than belief)





player 1 perfect

player 2 partial

2-sidedboth partial

rand. act.-vis.


rand. act.-inv.



than belief)



Documents

Laurent Doyen LSV, ENS Cachan & CNRS