Games, Times, and Probabilities: Value Iteration in Verification and Control



Krishnendu Chatterjee Tom Henzinger

Graph Models of Systems

vertices = states

edges = transitions

paths = behaviors

Extended Graph Models

graph, extended by:

CONTROL: game graph

OBJECTIVE: ω-automaton → ω-regular game

PROBABILITIES: Markov decision process → stochastic game

CLOCKS: timed automaton → stochastic hybrid system

Graphs vs. Games

(figure: the same transition structure viewed as a graph and as a game graph, edges labeled a, b)

Games model Open Systems

Two players: environment / controller / input vs. system / plant / output

Multiple players: processes / components / agents

Stochastic players: nature / randomized algorithms

Example

P1:
  init x := 0
  loop
    choice
    | x := (x+1) mod 2
    | x := 0
    end choice
  end loop

Objective φ1: □(x = y)

P2:
  init y := 0
  loop
    choice
    | y := x
    | y := (x+1) mod 2
    end choice
  end loop

Objective φ2: □(y = 0)

Graph Questions

∃□(x = y)  ✓

∀□(x = y)  ✗

CTL

(figure: product graph over states 00, 10, 01, 11)

Zero-Sum Game Questions

⟨⟨P1⟩⟩□(x = y)  ✗

⟨⟨P2⟩⟩□(y = 0)  ✗

ATL [Alur/H/Kupferman]

(figure: product game graph over states 00, 10, 01, 11)

Nonzero-Sum Game Questions

⟨⟨P1⟩⟩□(x = y)

⟨⟨P2⟩⟩□(y = 0)

Secure equilibria [Chatterjee/H/Jurdzinski]

(figure: product game graph over states 00, 10, 01, 11)

Strategies

Strategies x, y: Q* → Q

From a state q, a pair (x, y) of a player-1 strategy x ∈ Π1 and a player-2 strategy y ∈ Π2 gives a unique infinite path Outcome_{x,y}(q) ∈ Q^ω.

⟨⟨P1⟩⟩φ1 = (∃x ∈ Π1) (∀y ∈ Π2) φ1(x, y)

Short for: q ⊨ ⟨⟨P1⟩⟩φ1 iff (∃x ∈ Π1) (∀y ∈ Π2) ( Outcome_{x,y}(q) ⊨ φ1 )

Secure equilibrium:

⟨⟨P1⟩⟩φ1 ∧ ⟨⟨P2⟩⟩φ2 = (∃x ∈ Π1) (∃y ∈ Π2) [ (φ1 ∧ φ2)(x, y) ∧ (∀y' ∈ Π2) ((φ2 → φ1)(x, y')) ∧ (∀x' ∈ Π1) ((φ1 → φ2)(x', y)) ]

Objectives φ1 and φ2

Qualitative: reachability; Büchi; parity (ω-regular)

Quantitative: max; lim sup; lim avg

Normal Forms of ω-Regular Sets

Borel-1:   Reachability ◇a;  Safety □a = ¬◇¬a

Borel-2:   Büchi □◇a;  coBüchi ◇□a = ¬□◇¬a

Borel-2.5: Streett ∧(□◇a → □◇b) = ∧(◇□¬a ∨ □◇b);  Rabin ∨(◇□a ∧ □◇b)

Parity: complement-closed subset of Streett/Rabin

Büchi Game

(figure: game over states q0 … q4, with target states marked G and B)

• Secure equilibrium (x, y) at q0:
  x: if q1 → q0, then q2 else q4.
  y: if q3 → q1, then q0 else q4.

• Strategies require memory.

Zero-Sum Games: Determinacy

With φ1 = ¬φ2, the state space partitions into W1 = ⟨⟨P1⟩⟩φ1 and W2 = ⟨⟨P2⟩⟩φ2.

Nonzero-sum Games

The state space partitions into four sets:

W10 = ⟨⟨P1⟩⟩(φ1 ∧ ¬φ2)

W01 = ⟨⟨P2⟩⟩(φ2 ∧ ¬φ1)

W11 = ⟨⟨P1⟩⟩φ1 ∧ ⟨⟨P2⟩⟩φ2

W00 = the rest

Objectives

Qualitative: reachability; Büchi; parity (ω-regular): Borel-1, Borel-2, Borel-3

Quantitative: max; lim sup; lim avg

Quantitative Games

(figure: game graph with edge weights 0, 2, 3, 4)

⟨⟨P1⟩⟩ lim sup = 3

⟨⟨P1⟩⟩ lim avg = 1

Solving Games by Value Iteration

Generalization of the μ-calculus: computing fixpoints of transfer functions (pre; post).

Generalization of dynamic programming: iterative optimization.

Region R: Q → V; at each step, R(q) := pre(R(q'))

Graph

Q: states
Γ: transition labels
δ: Q × Γ → Q: transition function

Regions: [Q → {0,1}], with V = B

∃pre: q ∈ ∃pre(R) iff (∃γ ∈ Γ) δ(q,γ) ∈ R

∀pre: q ∈ ∀pre(R) iff (∀γ ∈ Γ) δ(q,γ) ∈ R

(figure: graph over states a, b, c)

∃◇c = (μX) ( c ∨ ∃pre(X) )

∀◇c = (μX) ( c ∨ ∀pre(X) )

Graph Reachability

Given R ⊆ Q, find the states from which some path leads to R:

∃◇R = (μX) (R ∨ ∃pre(X))

Iteration: R;  R ∪ pre(R);  R ∪ pre(R) ∪ pre²(R);  …

Given R ⊆ Q, find the states from which all paths lead to R:

∀◇R = (μX) (R ∨ ∀pre(X))
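As a concrete illustration (not from the slides; the graph and state names below are invented), the μ-iteration for ∃◇R on a finite graph is a few lines of Python: ∃pre is one backward step, and the least fixpoint is reached as soon as adding pre changes nothing.

```python
def epre(succ, region):
    """∃pre(R): states with SOME successor in R."""
    return {q for q, nxt in succ.items() if any(r in region for r in nxt)}

def apre(succ, region):
    """∀pre(R): states with ALL successors in R."""
    return {q for q, nxt in succ.items() if nxt and all(r in region for r in nxt)}

def reach(succ, target, pre):
    """(mu X)(target ∨ pre(X)): iterate to the least fixpoint."""
    x = set(target)
    while True:
        nxt = x | pre(succ, x)
        if nxt == x:
            return x
        x = nxt

# Only q0 and q1 have SOME path to q3; q2 is a sink that never reaches it,
# and from q1 there is also a path (q1 q0 q1 ...) that avoids q3 forever.
succ = {"q0": ["q1"], "q1": ["q0", "q3"], "q2": ["q2"], "q3": ["q3"]}
print(sorted(reach(succ, {"q3"}, epre)))  # ['q0', 'q1', 'q3']
print(sorted(reach(succ, {"q3"}, apre)))  # ['q3']
```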

Value Iteration Algorithms

consist of

A. LOCAL PART: ∃pre and ∀pre computation

B. GLOBAL PART: evaluation of a fixpoint expression

We need to generalize both parts to solve games.

Turn-based Game

Q1, Q2: states (Q = Q1 ∪ Q2)
Γ: transition labels
δ: Q × Γ → Q: transition function

Regions: [Q → {0,1}], with V = B

1pre: q ∈ 1pre(R) iff [ q ∈ Q1 ∧ (∃γ ∈ Γ) δ(q,γ) ∈ R ] or [ q ∈ Q2 ∧ (∀γ ∈ Γ) δ(q,γ) ∈ R ]

2pre: q ∈ 2pre(R) iff [ q ∈ Q1 ∧ (∀γ ∈ Γ) δ(q,γ) ∈ R ] or [ q ∈ Q2 ∧ (∃γ ∈ Γ) δ(q,γ) ∈ R ]

Turn-based Game

(figure: game over states a, b, c, with player-1 and player-2 states)

⟨⟨P1⟩⟩◇c = (μX) ( c ∨ 1pre(X) )

⟨⟨P2⟩⟩◇c = (μX) ( c ∨ 2pre(X) )

Reachability Game

Given R ⊆ Q, find the states from which player 1 has a strategy to force the game to R:

⟨⟨P1⟩⟩◇R = (μX) (R ∨ 1pre(X))

Iteration: R;  R ∪ 1pre(R);  R ∪ 1pre(R) ∪ 1pre²(R);  …
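The same loop, with 1pre in place of ∃pre, solves the reachability game. A minimal sketch on an invented three-state game (player 1 owns a; player 2 owns b and c):

```python
def pre1(succ1, succ2, region):
    """1pre(R): player-1 states with SOME successor in R,
    plus player-2 states with ALL successors in R."""
    out = {q for q, nxt in succ1.items() if any(r in region for r in nxt)}
    out |= {q for q, nxt in succ2.items() if nxt and all(r in region for r in nxt)}
    return out

def attractor(succ1, succ2, target):
    """<<P1>> eventually-target = (mu X)(target ∨ 1pre(X))."""
    x = set(target)
    while True:
        nxt = x | pre1(succ1, succ2, x)
        if nxt == x:
            return x
        x = nxt

succ1 = {"a": ["b", "c"]}              # player 1 can jump straight to c
succ2 = {"b": ["a", "b"], "c": ["c"]}  # player 2 can loop at b forever
print(sorted(attractor(succ1, succ2, {"c"})))  # ['a', 'c']
```

State b stays out: player 2 simply loops there, so player 1 cannot force a visit to c from b.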

Safety Game

Given R ⊆ Q, find the states from which player 1 has a strategy to keep the game in R:

⟨⟨P1⟩⟩□R = (νX) (R ∧ 1pre(X))

Iteration: R;  R ∩ 1pre(R);  R ∩ 1pre(R) ∩ 1pre²(R);  …
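Dually, the safety value is a greatest fixpoint: start from all of R and shrink. A sketch on an invented game (player 1 owns a; player 2 owns b and c):

```python
def pre1(succ1, succ2, region):
    """1pre(R): player-1 states with SOME successor in R,
    plus player-2 states with ALL successors in R."""
    out = {q for q, nxt in succ1.items() if any(r in region for r in nxt)}
    out |= {q for q, nxt in succ2.items() if nxt and all(r in region for r in nxt)}
    return out

def safe(succ1, succ2, allowed):
    """<<P1>> always-allowed = (nu X)(allowed ∧ 1pre(X)): shrink from above."""
    x = set(allowed)
    while True:
        nxt = x & pre1(succ1, succ2, x)
        if nxt == x:
            return x
        x = nxt

succ1 = {"a": ["a", "c"]}               # player 1 may simply stay at a
succ2 = {"b": ["a", "c"], "c": ["c"]}   # player 2 at b may defect to c
print(sorted(safe(succ1, succ2, {"a", "b"})))  # ['a']
```

b drops out in the first iteration because player 2 can leave the allowed region from there, while a survives: player 1 just keeps choosing the self-loop.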

Quantitative Game

Q1, Q2: states (Q = Q1 ∪ Q2)
Γ: transition labels
δ: Q × Γ → N × Q: transition function (each move carries a weight δ1 and a target δ2)

Regions: [Q → N], with V = N

1pre(R)(q) = (max γ ∈ Γ) max( δ1(q,γ), R(δ2(q,γ)) )  if q ∈ Q1
             (min γ ∈ Γ) max( δ1(q,γ), R(δ2(q,γ)) )  if q ∈ Q2

2pre(R)(q) = (min γ ∈ Γ) max( δ1(q,γ), R(δ2(q,γ)) )  if q ∈ Q1
             (max γ ∈ Γ) max( δ1(q,γ), R(δ2(q,γ)) )  if q ∈ Q2
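With the quantitative pre operator, the same iteration runs over numeric regions. A sketch (the weighted game below is invented; moves are given as (weight, destination) pairs, and the objective is the maximal weight seen along the play):

```python
def pre1_q(moves1, moves2, R):
    """Quantitative 1pre: at player-1 states take the best move,
    at player-2 states the worst; each move contributes max(weight, R(dest))."""
    out = {}
    for q, ms in moves1.items():
        out[q] = max(max(w, R[d]) for (w, d) in ms)
    for q, ms in moves2.items():
        out[q] = min(max(w, R[d]) for (w, d) in ms)
    return out

def max_value(moves1, moves2):
    """(mu X) max(0, 1pre(X)): value of the maximizing game at each state."""
    R = {q: 0 for q in list(moves1) + list(moves2)}
    while True:
        nxt = {q: max(0, v) for q, v in pre1_q(moves1, moves2, R).items()}
        if nxt == R:
            return R
        R = nxt

moves1 = {"a": [(1, "b"), (0, "c")]}               # player 1 owns a
moves2 = {"b": [(2, "a"), (0, "c")], "c": [(0, "c")]}  # player 2 owns b, c
print(max_value(moves1, moves2))  # {'a': 1, 'b': 0, 'c': 0}
```

From a, player 1 secures the weight-1 edge; at b, player 2 escapes to c before the weight-2 edge is ever taken.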

Maximizing Game

(figure: weighted game over states a, b, c with edge weights 0, 1, 2, 3, 5)

⟨⟨P1⟩⟩max = (μX) max( 0, 1pre(X) )

Value iteration at (a, b, c): (0, 0, 0) → (1, 0, 0) → (1, 2, 0) → (2, 2, 0).

Büchi Graph

Given B ⊆ Q, find the states from which some path visits B infinitely often.

R1 = pre(B) ∪ pre²(B) ∪ …   (can reach B)
R2 = pre(B ∩ R1) ∪ …        (can reach B, and from there reach B again)
…

∃□◇B = (νY) ∃◇(B ∧ ∃pre(Y))
     = (νY) (μX) ((B ∧ ∃pre(Y)) ∨ ∃pre(X))
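The nested fixpoint can be run directly: the outer ν-loop shrinks Y, and each inner μ-loop is plain reachability to the B-states that can re-enter Y. A sketch on an invented graph:

```python
def epre(succ, region):
    """∃pre(R): states with SOME successor in R."""
    return {q for q, nxt in succ.items() if any(r in region for r in nxt)}

def reach(succ, target):
    """(mu X)(target ∨ ∃pre(X))."""
    x = set(target)
    while True:
        nxt = x | epre(succ, x)
        if nxt == x:
            return x
        x = nxt

def buechi(succ, B):
    """(nu Y)(mu X)((B ∧ ∃pre(Y)) ∨ ∃pre(X))."""
    y = set(succ)                      # outer (nu Y) starts at all states
    while True:
        core = set(B) & epre(succ, y)  # B ∧ ∃pre(Y): B-states that can re-enter Y
        nxt = reach(succ, core)        # inner (mu X): states that can reach the core
        if nxt == y:
            return y
        y = nxt

# q0 and q1 form a cycle through the Büchi state q1; q2 is a sink outside B.
succ = {"q0": ["q1"], "q1": ["q0", "q2"], "q2": ["q2"]}
print(sorted(buechi(succ, {"q1"})))  # ['q0', 'q1']
```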

Büchi Game

Given B ⊆ Q, find the states from which player 1 has a strategy to force the game to B infinitely often.

R1 = ⟨⟨P1⟩⟩◇ 1pre(B)
R2 = ⟨⟨P1⟩⟩◇ 1pre(B ∩ R1)
…

⟨⟨P1⟩⟩□◇B = (νY) (μX) ((B ∧ 1pre(Y)) ∨ 1pre(X))

From Graphs to Games

Can we use the same value iteration scheme?

Yes, iff the fixpoint expression computes correctly on all single-player (player-1 and player-2) structures.

Reachability: ∃◇p = (μX) (p ∨ ∃pre(X));  ∀◇p = (μX) (p ∨ ∀pre(X))

Hence: ⟨⟨P1⟩⟩◇p = (μX) (p ∨ 1pre(X));  ⟨⟨P2⟩⟩◇p = (μX) (p ∨ 2pre(X))

Complexity of Turn-based Games

1. Reachability, safety: linear time (P-complete)

2. Büchi: quadratic time (optimal?); on graphs: linear

3. Parity: NP ∩ coNP (in P?); on graphs: polynomial

Beyond Graphs as Finite Carrier Sets

Graph-based (finite-carrier) systems:

Q = B^m;  regions = boolean formulas [e.g. BDDs];  ∃pre = (∃x ∈ B)

Timed and hybrid systems:

Q = B^m × R^n;  regions = formulas of (Q, ≤, +) [e.g. polyhedral sets];  ∃pre = (∃x ∈ Q)

Concurrent Game

Q: states
Γ1, Γ2: moves of the two players
δ: Q × Γ1 × Γ2 → Q: transition function

Regions: [Q → {0,1}], with V = B

1pre: q ∈ 1pre(R) iff (∃γ1 ∈ Γ1) (∀γ2 ∈ Γ2) δ(q,γ1,γ2) ∈ R

2pre: q ∈ 2pre(R) iff (∃γ2 ∈ Γ2) (∀γ1 ∈ Γ1) δ(q,γ1,γ2) ∈ R
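In the concurrent 1pre, player 1 must commit to a move without seeing player 2's simultaneous choice: a whole row of the move matrix must land in R. A sketch on an invented "matching" game (this μ-iteration computes sure winning with pure moves only; almost-sure winning may additionally need randomized moves):

```python
def pre1_conc(delta, region):
    """Concurrent 1pre: player 1 has a move that lands in R
    whatever move player 2 picks simultaneously."""
    out = set()
    for q, table in delta.items():     # table[g1][g2] = successor state
        if any(all(dest in region for dest in row.values())
               for row in table.values()):
            out.add(q)
    return out

def reach1(delta, target):
    """(mu X)(target ∨ 1pre(X)): sure-winning reachability for player 1."""
    x = set(target)
    while True:
        nxt = x | pre1_conc(delta, x)
        if nxt == x:
            return x
        x = nxt

# At a, the successor depends on BOTH moves: player 2 can always mismatch.
delta = {
    "a": {1: {1: "b", 2: "c"}, 2: {1: "c", 2: "b"}},
    "b": {1: {1: "b"}},
    "c": {1: {1: "c"}},
}
print(sorted(reach1(delta, {"c"})))          # ['c']  (a is NOT surely winning)
print(sorted(pre1_conc(delta, {"b", "c"})))  # ['a', 'b', 'c']
```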

Concurrent Game

(figure: concurrent game over states a, b, c; edges labeled with move pairs (1,1), (1,2), (2,1), (2,2))

⟨⟨P2⟩⟩◇c = (μX) ( c ∨ 2pre(X) )

The last step shows randomized moves: Pr(1) = 0.5, Pr(2) = 0.5.

Extended Graph Models

graph, extended by:

CONTROL: game graph

OBJECTIVE: ω-automaton → ω-regular game

PROBABILITIES: Markov decision process → stochastic game

CLOCKS: timed automaton → stochastic hybrid system

Graph: 1 Player

Nondeterministic closed system.

(figure: states q1, q2, q3; edges labeled a, b)

MDP: 1.5 Players

Probabilistic closed system.

(figure: states q1 to q5; a probabilistic branch with probabilities 0.4 / 0.6; edges labeled a, b, c)

Turn-based Game: 2 Players

Asynchronous open system.

(figure: states q1 to q5, divided into player-1 and player-2 states; edges labeled a, b, c)

Turn-based Stochastic Game: 2.5 Players

Probabilistic asynchronous open system.

(figure: states q1 to q7; a probabilistic branch with probabilities 0.4 / 0.6; edges labeled a, b, c)

Concurrent Game

Synchronous open system.

(figure: state q1 with move pairs (1,1), (1,2), (2,1), (2,2) leading to states q2 to q5; edges labeled a, b)

Concurrent Stochastic Game

Probabilistic synchronous open system: a matrix game at each vertex.

Matrix game at q1 (rows: player-1 moves 1, 2; columns: player-2 moves 1, 2); each entry is a probability distribution over q2 to q5:

(1,1): q2: 0.3, q3: 0.2, q4: 0.5
(1,2): q2: 0.1, q3: 0.1, q4: 0.5, q5: 0.3
(2,1): q3: 0.2, q4: 0.1, q5: 0.7
(2,2): q2: 1.0

Graph: nondeterministic generator of behaviors (possibly stochastic)

Strategy: deterministic selector of behaviors (possibly randomized)

Graph + strategies for both players → behavior

Model = graph;  pure behavior = path.

Two pure strategies at q1: “left” and “right”. Two pure behaviors: ab; aa.

(figure: q1 branching to q2 and q3; edges labeled a, b)

Model = MDP;  pure behavior = probability distribution on paths = p-path.

Two pure strategies at q1: “left” and “right”. Two pure behaviors: {ab: 1}; {aac: 0.4, aaa: 0.6}.

(figure: states q1 to q5 with a 0.4 / 0.6 probabilistic branch; edges labeled a, b, c)

Model = turn-based game;  pure behavior = path.

Two pure player-1 strategies at q1: “left” and “right”. Two pure player-2 strategies at q3: “left” and “right”. Three pure behaviors: ab; aac; aaa.

General (randomized) behavior = p-path: infinitely many behaviors, e.g. {aac: 0.5, aaa: 0.5}.

(figure: states q1 to q5; edges labeled a, b, c)

The objective of each player is to find a strategy that optimizes the value of the resulting behavior.

How do we define “value”?

A. Assign a value to each path.

B. Assign a value to each behavior (expected value of A).

C. Assign a value to each state (sup over own strategies of the inf over opponent strategies of B).

A. Value of Paths

Qualitative value function ν: Q^ω → {0,1}

e.g. ω-regular subsets of Q^ω

B. Value of Behaviors

Path t:   ν(t), as in A.

p-path T: ν(T) = Exp { ν } over T  (expected value)

Example:

T = {aaa: 0.2, aab: 0.7, bbb: 0.1}

(◇b)(T) = 0.8
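The slide's expectation is one line of Python (encoding paths as strings is illustrative):

```python
def value(ppath, nu):
    """Value of a p-path: the expected path value Exp{nu}."""
    return sum(p * nu(t) for t, p in ppath.items())

T = {"aaa": 0.2, "aab": 0.7, "bbb": 0.1}
eventually_b = lambda t: 1 if "b" in t else 0  # qualitative value of ◇b
print(value(T, eventually_b))  # ≈ 0.8
```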

C. Value of States

⟨⟨1⟩⟩ν(q) = sup_x inf_y ν( Outcome_{x,y}(q) )

⟨⟨2⟩⟩ν(q) = sup_y inf_x ν( Outcome_{x,y}(q) )

Concurrent Stochastic Game

Q: states
Γ1, Γ2: moves of the two players
δ: Q × Γ1 × Γ2 → Dist(Q): probabilistic transition function

Regions: [Q → [0,1]], with V = [0,1]

1pre(R)(q) = (sup ξ1 ∈ Dist(Γ1)) (inf ξ2 ∈ Dist(Γ2)) Exp R( δ(q,ξ1,ξ2) )

2pre(R)(q) = (sup ξ2 ∈ Dist(Γ2)) (inf ξ1 ∈ Dist(Γ1)) Exp R( δ(q,ξ1,ξ2) )
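Evaluating 1pre(R)(q) means solving a matrix game whose (γ1, γ2) entry is the expected R-value of δ(q, γ1, γ2). In general this takes linear programming, but for 2×2 games there is a standard closed form; a sketch (the matrices below are illustrative, with the row player maximizing):

```python
def matrix_game_value(M):
    """Value of a 2x2 zero-sum matrix game (row player maximizes).
    If a pure saddle point exists, return it; otherwise use the
    closed form for fully mixed optimal strategies."""
    (a, b), (c, d) = M
    maximin = max(min(a, b), min(c, d))  # best guaranteed row payoff
    minimax = min(max(a, c), max(b, d))  # best guaranteed column payoff
    if maximin == minimax:               # pure saddle point
        return maximin
    return (a * d - b * c) / (a + d - b - c)

# Matching pennies over region values: no pure optimum, mixed value 0.5.
print(matrix_game_value([[1.0, 0.0], [0.0, 1.0]]))  # 0.5
# A matrix with a pure saddle point at the top-right entry.
print(matrix_game_value([[0.5, 0.2], [0.1, 0.0]]))  # 0.2
```

Larger move sets need an LP per state, but the fixpoint iteration around these local solves is unchanged.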

Concurrent Stochastic Game

(figure: concurrent stochastic game over states a, b, c; at each state the two players choose mixed moves, e.g. player 1 plays a: 0.6, b: 0.4 against player 2's a: 0.0, c: 1.0)

⟨⟨P1⟩⟩◇c = (μX) max( c, 1pre(X) )

Successive values at a: 0, 0.8, 0.96, … converging to 1 in the limit.

Solving Games by Value Iteration

Reachability / max: (μX)
Büchi / lim sup: (νY)(μX)
Parity: nested fixpoints
…

Many open questions: How do different evaluation orders compare? How fast do these algorithms converge? When are they optimal?

Summary: Classification of Games

1. Number of players: 1, 1.5, 2, 2.5

2. Alternation: turn-based or concurrent

3. Strategies: pure or randomized

4. Value of a path: qualitative (boolean) or quantitative (real)

5. Objective: Borel 1, 2, 3

6. Zero-sum vs. nonzero-sum

Summary: Zero-Sum Games

The two players have complementary path values: ν2(t) = 1 − ν1(t)

- reachability vs. safety / max vs. min
- Büchi vs. coBüchi / lim sup vs. lim inf
- Rabin vs. Streett

Main Theorem [Martin75, Martin98]: Concurrent stochastic games are determined for all Borel objectives, i.e., ⟨⟨1⟩⟩ν1(q) + ⟨⟨2⟩⟩ν2(q) = 1 (sup inf = inf sup).

Summary: Zero-Sum Games

Complexity results for parity objectives:

1.5 players: polynomial [CY98, dAl97]
2 players: [GH82, EJ88]
2.5 players: NP ∩ coNP [CdAH06]
concurrent: [dAH00, dAM01]

Concurrent Games are Difficult

- Optimal strategies may not exist.
- Limit values may not be rational.
- ε-close strategies, for fixed ε, may require infinite memory.
- No determinacy for pure strategies.

(figure: concurrent game at q1 with move pairs (1,1), (1,2), (2,1), (2,2); under pure strategies, ⟨⟨P1⟩⟩(◇a)(q1) = 0 and ⟨⟨P2⟩⟩(◇b)(q1) = 0)

Turn-based Games are More Pleasant

- Optimal strategies always exist [McIver/Morgan].
- In the non-stochastic case, pure finite-memory optimal strategies exist for ω-regular objectives [Gurevich/Harrington].
- For parity objectives, pure memoryless optimal strategies exist [Emerson/Jutla: non-stochastic Rabin; Condon: stochastic reachability; Chatterjee/deAlfaro/H: stochastic Rabin]; hence NP ∩ coNP.

Whether they are solvable in polynomial time is open for non-stochastic parity games and for stochastic reachability games.

Summary

Verification and control are very special (boolean) cases of graph-based optimization problems.

They can be generalized to solve questions that involve multiple players, quantitative resources, probabilistic transitions, and continuous state spaces.

The theory and practice of this is still wide open …
