# Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

• View
13

0

Embed Size (px)

### Text of Reachability in MDPs: Refining Convergence of Value...

• Reachability in MDPs: Refining Convergence

of Value Iteration

Serge Haddad (LSV, ENS Cachan, CNRS & Inria) and

Benjamin Monmege (ULB) !

RP 2014, Oxford

• 2

Markov Decision Processes

• What?

✦ Stochastic process with non-deterministic choices

✦ Non-determinism solved by policies/strategies

• • Where?

✦ Optimization

✦ Program verification: reachability as the basis of PCTL model-checking

✦ Game theory: 1+½ players

2

Markov Decision Processes

• What?

✦ Stochastic process with non-deterministic choices

✦ Non-determinism solved by policies/strategies

• MDPs: definition and objective

½

bc

d

½

½ e

3

• MDPs: definition and objective

½

bc

d

½

½ e

Finite number of states

3

• MDPs: definition and objective

½

bc

d

½

½ e

Finite number of states

Probabilistic states

3

• MDPs: definition and objective

½

bc

d

½

½ e

Finite number of states

Probabilistic states

Actions to be selected by the policy

3

• MDPs: definition and objective

½

bc

d

½

½ e

Finite number of states

Probabilistic states

Actions to be selected by the policy

M= (S,α,δ)δ :S×α→ Dist(S)

σ : (S ⋅α)! ⋅S → Dist(α)Policy3

• MDPs: definition and objective

½

bc

d

½

½ e

Finite number of states

Probabilistic states

Actions to be selected by the policy

Reachability objective

M= (S,α,δ)δ :S×α→ Dist(S)

σ : (S ⋅α)! ⋅S → Dist(α)Policy3

• MDPs: definition and objective

½

bc

d

½

½ e

Finite number of states

Probabilistic states

Actions to be selected by the policy

Reachability objective

M= (S,α,δ)δ :S×α→ Dist(S)

σ : (S ⋅α)! ⋅S → Dist(α)Policy

Probability to reach: Prsσ(F )

3

• MDPs: definition and objective

½

bc

d

½

½ e

Finite number of states

Probabilistic states

Actions to be selected by the policy

Reachability objective

M= (S,α,δ)δ :S×α→ Dist(S)

σ : (S ⋅α)! ⋅S → Dist(α)Policy

Probability to reach: Prsσ(F )

3

Maximal probability to reach: Pr

smax(F )= sup

σPrsσ(F )

• Optimal reachability probabilities of MDPs

• How?

✦ Linear programming

✦ Policy iteration

✦ Value iteration: numerical scheme that scales well and works in practice

4

• Optimal reachability probabilities of MDPs

• How?

✦ Linear programming

✦ Policy iteration

✦ Value iteration: numerical scheme that scales well and works in practice

4

used in the numerical PRISM model checker

[Kwiatkowska, Norman, Parker, 2011]

• Value iteration

5

½

bc

d

½

½ e

• Value iteration

5

½

bc

d

½

½ e

xs(n+1) = max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s(n)

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

• Value iteration

5

½

bc

d

½

½ e

0 0 0 0

xs(n+1) = max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s(n)

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

• Value iteration

5

½

bc

d

½

½ e

0 0 0 00 2/3 (b) 0 0

xs(n+1) = max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s(n)

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

• Value iteration

5

½

bc

d

½

½ e

0 0 0 00 2/3 (b) 0 0

1/3 2/3 (b) 0 0

xs(n+1) = max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s(n)

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

• Value iteration

5

½

bc

d

½

½ e

0 0 0 00 2/3 (b) 0 0

1/3 2/3 (b) 0 01/2 2/3 (b) 1/6 0

xs(n+1) = max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s(n)

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

• Value iteration

5

½

bc

d

½

½ e

0 0 0 00 2/3 (b) 0 0

1/3 2/3 (b) 0 01/2 2/3 (b) 1/6 07/12 13/18 (b) 1/4 0

xs(n+1) = max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s(n)

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

• Value iteration

5

½

bc

d

½

½ e

0 0 0 00 2/3 (b) 0 0

1/3 2/3 (b) 0 01/2 2/3 (b) 1/6 07/12 13/18 (b) 1/4 0… … … …

xs(n+1) = max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s(n)

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

• Value iteration

5

½

bc

d

½

½ e

0 0 0 00 2/3 (b) 0 0

1/3 2/3 (b) 0 01/2 2/3 (b) 1/6 07/12 13/18 (b) 1/4 0… … … …

0.7969 0.7988 (b) 0.3977 0

xs(n+1) = max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s(n)

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

• Value iteration

5

½

bc

d

½

½ e

0 0 0 00 2/3 (b) 0 0

1/3 2/3 (b) 0 01/2 2/3 (b) 1/6 07/12 13/18 (b) 1/4 0… … … …

0.7969 0.7988 (b) 0.3977 00.7978 0.7992 (b) 0.3984 0

xs(n+1) = max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s(n)

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

• Value iteration

5

½

bc

d

½

½ e

0 0 0 00 2/3 (b) 0 0

1/3 2/3 (b) 0 01/2 2/3 (b) 1/6 07/12 13/18 (b) 1/4 0… … … …

0.7969 0.7988 (b) 0.3977 00.7978 0.7992 (b) 0.3984 0

≤0.001

xs(n+1) = max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s(n)

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

• Value iteration: which guarantees?

6

½

½

½

½

k

k-1

…k+2k+1 2k-1 2k½

½

½

½

½½

…k-2 1½

½½

½

• Value iteration: which guarantees?

6

½

½

½

½

k

k-1

…k+2k+1 2k-1 2k½

½

½

½

½½

…k-2 1½

½½

½

State 0 1 2 3 … k-1 k k+1 … 2k

• Value iteration: which guarantees?

6

½

½

½

½

k

k-1

…k+2k+1 2k-1 2k½

½

½

½

½½

…k-2 1½

½½

½

State 0 1 2 3 … k-1 k k+1 … 2kStep 1 1 0 0 0 … 0 0 0 … 0

• Value iteration: which guarantees?

6

½

½

½

½

k

k-1

…k+2k+1 2k-1 2k½

½

½

½

½½

…k-2 1½

½½

½

State 0 1 2 3 … k-1 k k+1 … 2kStep 1 1 0 0 0 … 0 0 0 … 0Step 2 1 1/2 0 0 … 0 0 0 … 0

• Value iteration: which guarantees?

6

½

½

½

½

k

k-1

…k+2k+1 2k-1 2k½

½

½

½

½½

…k-2 1½

½½

½

State 0 1 2 3 … k-1 k k+1 … 2kStep 1 1 0 0 0 … 0 0 0 … 0Step 2 1 1/2 0 0 … 0 0 0 … 0Step 3 1 1/2 1/4 0 … 0 0 0 … 0

• Value iteration: which guarantees?

6

½

½

½

½

k

k-1

…k+2k+1 2k-1 2k½

½

½

½

½½

…k-2 1½

½½

½

State 0 1 2 3 … k-1 k k+1 … 2kStep 1 1 0 0 0 … 0 0 0 … 0Step 2 1 1/2 0 0 … 0 0 0 … 0Step 3 1 1/2 1/4 0 … 0 0 0 … 0Step 4 1 1/2 1/4 1/8 … 0 0 0 … 0

• Value iteration: which guarantees?

6

½

½

½

½

k

k-1

…k+2k+1 2k-1 2k½

½

½

½

½½

…k-2 1½

½½

½

State 0 1 2 3 … k-1 k k+1 … 2kStep 1 1 0 0 0 … 0 0 0 … 0Step 2 1 1/2 0 0 … 0 0 0 … 0Step 3 1 1/2 1/4 0 … 0 0 0 … 0Step 4 1 1/2 1/4 1/8 … 0 0 0 … 0

… … … … … … … … … … …Step k 1 1/2 1/4 1/8 … 0 0 … 01/ 2

k−1

• Value iteration: which guarantees?

6

½

½

½

½

k

k-1

…k+2k+1 2k-1 2k½

½

½

½

½½

…k-2 1½

½½

½

State 0 1 2 3 … k-1 k k+1 … 2kStep 1 1 0 0 0 … 0 0 0 … 0Step 2 1 1/2 0 0 … 0 0 0 … 0Step 3 1 1/2 1/4 0 … 0 0 0 … 0Step 4 1 1/2 1/4 1/8 … 0 0 0 … 0

… … … … … … … … … … …Step k 1 1/2 1/4 1/8 … 0 0 … 0

Step k+1 1 1/2 1/4 1/8 … 0 … 0

1/ 2k−1

1/ 2k−1 1/ 2k

• Value iteration: which guarantees?

6

½

½

½

½

k

k-1

…k+2k+1 2k-1 2k½

½

½

½

½½

…k-2 1½

½½

½

State 0 1 2 3 … k-1 k k+1 … 2kStep 1 1 0 0 0 … 0 0 0 … 0Step 2 1 1/2 0 0 … 0 0 0 … 0Step 3 1 1/2 1/4 0 … 0 0 0 … 0Step 4 1 1/2 1/4 1/8 … 0 0 0 … 0

… … … … … … … … … … …Step k 1 1/2 1/4 1/8 … 0 0 … 0

Step k+1 1 1/2 1/4 1/8 … 0 … 0≤1/2k

1/ 2k−1

1/ 2k−1 1/ 2k

• Value iteration: which guarantees?

6

½

½

½

½

k

k-1

…k+2k+1 2k-1 2k½

½

½

½

½½

…k-2 1½

½½

½

State 0 1 2 3 … k-1 k k+1 … 2kStep 1 1 0 0 0 … 0 0 0 … 0Step 2 1 1/2 0 0 … 0 0 0 … 0Step 3 1 1/2 1/4 0 … 0 0 0 … 0Step 4 1 1/2 1/4 1/8 … 0 0 0 … 0

… … … … … … … … … … …Step k 1 1/2 1/4 1/8 … 0 0 … 0

Step k+1 1 1/2 1/4 1/8 … 0 … 0≤1/2k

Real value: 1/2 (by symmetry)

1/ 2k−1

1/ 2k−1 1/ 2k

• Contributions

7

• Contributions

1. Enhanced value iteration algorithm with strong guarantees

7

• Contributions

1. Enhanced value iteration algorithm with strong guarantees

• performs two value iterations in parallel

7

• Contributions

1. Enhanced value iteration algorithm with strong guarantees

• performs two value iterations in parallel

• keeps an interval of possible optimal values

7

• Contributions

1. Enhanced value iteration algorithm with strong guarantees

• performs two value iterations in parallel

• keeps an interval of possible optimal values

• uses the interval for the stopping criterion

7

• Contributions

1. Enhanced value iteration algorithm with strong guarantees

• performs two value iterations in parallel

• keeps an interval of possible optimal values

• uses the interval for the stopping criterion

2. Study of the speed of convergence

7

• Contributions

1. Enhanced value iteration algorithm with strong guarantees

• performs two value iterations in parallel

• keeps an interval of possible optimal values

• uses the interval for the stopping criterion

2. Study of the speed of convergence

• also applies to classical value iteration

7

• Contributions

1. Enhanced value iteration algorithm with strong guarantees

• performs two value iterations in parallel

• keeps an interval of possible optimal values

• uses the interval for the stopping criterion

2. Study of the speed of convergence

• also applies to classical value iteration

3. Improved rounding procedure for exact computation

7

• Interval iteration

8

½

½

½

½

k

k-1

…k+2k+1 2k-1 2n½

½

½

½

½½

…k-2 1½

½½

½

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

xs(n+1) = max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s(n)

• Interval iteration

8

½

½

½

½

k

k-1

…k+2k+1 2k-1 2n½

½

½

½

½½

…k-2 1½

½½

½

0

0,25

0,5

0,75

1

Number of steps

1 2 3 4 5 6

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

xs(n+1) = max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s(n)

• Interval iteration

8

½

½

½

½

k

k-1

…k+2k+1 2k-1 2n½

½

½

½

½½

…k-2 1½

½½

½

0

0,25

0,5

0,75

1

Number of steps

1 2 3 4 5 6

fmax(x)

s= max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

x (n+1) = fmax(x (n))

• Interval iteration

8

½

½

½

½

k

k-1

…k+2k+1 2k-1 2n½

½

½

½

½½

…k-2 1½

½½

½

0

0,25

0,5

0,75

1

Number of steps

1 2 3 4 5 6

fmax(x)

s= max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

x (n+1) = fmax(x (n))

usual stopping criterion

• Interval iteration

8

½

½

½

½

k

k-1

…k+2k+1 2k-1 2n½

½

½

½

½½

…k-2 1½

½½

½

0

0,25

0,5

0,75

1

Number of steps

1 2 3 4 5 6

fmax(x)

s= max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

x (n+1) = fmax(x (n))

usual stopping criterion

• Interval iteration

8

½

½

½

½

k

k-1

…k+2k+1 2k-1 2n½

½

½

½

½½

…k-2 1½

½½

½

0

0,25

0,5

0,75

1

Number of steps

1 2 3 4 5 6

NEW stopping criterion

fmax(x)

s= max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

x (n+1) = fmax(x (n))

usual stopping criterion

• Interval iteration

8

½

½

½

½

k

k-1

…k+2k+1 2k-1 2n½

½

½

½

½½

…k-2 1½

½½

½

0

0,25

0,5

0,75

1

Number of steps

1 2 3 4 5 6

NEW stopping criterion

fmax(x)

s= max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

ys(0) = 0 if s =

1 otherwise

⎧⎨⎪⎪

⎩⎪⎪

x (n+1) = fmax(x (n)) y(n+1) = f

max(y(n))

usual stopping criterion

• Fixed point characterization

9

(Prsmax(F ))

s∈S is the smallest fixed point of . fmax

• Fixed point characterization

9

(Prsmax(F ))

s∈S is the smallest fixed point of . fmax

0

0,25

0,5

0,75

1

Number of steps

1 2 3 4 5 6

• Fixed point characterization

9

(Prsmax(F ))

s∈S is the smallest fixed point of . fmax

0

0,25

0,5

0,75

1

Number of steps

1 2 3 4 5 6

always converges towards (Prsmax(F ))s∈S

• Fixed point characterization

9

(Prsmax(F ))

s∈S is the smallest fixed point of . fmax

0

0,25

0,5

0,75

1

Number of steps

1 2 3 4 5 6

always converges towards (Prsmax(F ))s∈S

not always…!

• Fixed point characterization

9

(Prsmax(F ))

s∈S is the smallest fixed point of . fmax

0

0,25

0,5

0,75

1

Number of steps

1 2 3 4 5 6

always converges towards (Prsmax(F ))s∈S

not always…!

a

½½

c½g

b

e

d

0.2

0.3

f

• Fixed point characterization

9

(Prsmax(F ))

s∈S is the smallest fixed point of . fmax

0

0,25

0,5

0,75

1

Number of steps

1 2 3 4 5 6

always converges towards (Prsmax(F ))s∈S

not always…!

a

½½

c½g

b

e

d

0.2

0.3

f

1 1

0.7

0 0

1

• Fixed point characterization

9

(Prsmax(F ))

s∈S is the smallest fixed point of . fmax

0

0,25

0,5

0,75

1

Number of steps

1 2 3 4 5 6

always converges towards (Prsmax(F ))s∈S

not always…!

a

½½

c½g

b

e

d

0.2

0.3

f

0.5 0.5

0.45

0 0

1

• Solution: ensure uniqueness!

10

a

½½

c½g

b

e

d

0.2

0.3

f

Usual techniques applied for MDPs do not apply…

• Solution: ensure uniqueness!

10

a

½½

c½g

b

e

d

0.2

0.3

f

Usual techniques applied for MDPs do not apply…

Prsmax(F )= 1

Prsmax(F )= 0

• Solution: ensure uniqueness!

10

Usual techniques applied for MDPs do not apply…

Prsmax(F )= 1

a

½½

c½g

b

e

0.2

0.3

f

Prsmax(F )= 0

Still multiple fixed points!

• Solution: ensure uniqueness!

10

Usual techniques applied for MDPs do not apply…

NEW! Use Maximal End Components… (computable in polynomial time)

a

½½

c½g

b

e

d

0.2

0.3

f

[de Alfaro, 1997]

• Solution: ensure uniqueness!

10

Usual techniques applied for MDPs do not apply…

NEW! Use Maximal End Components… (computable in polynomial time)

a

½½

c½g

b

e

d

0.2

0.3

f

Bottom Maximal End Component

Bottom Maximal End Component

Trivial Maximal End Component

Maximal End Component

[de Alfaro, 1997]

• Solution: ensure uniqueness!

10

Usual techniques applied for MDPs do not apply…

NEW! Use Maximal End Components… (computable in polynomial time)and trivialize them!

½½

½g

b

e

0.2

0.3

f

Now, unicity of the fixed point

Max-reduced MDP

[de Alfaro, 1997]

• An even smaller MDP for minimal probabilities

11

a

½½

c½g

b

e

d

0.2

0.3

f

Bottom Maximal End Component

Bottom Maximal End Component

Trivial Maximal End Component

Maximal End Component

• An even smaller MDP for minimal probabilities

11

a

½½

c½g

b

e

d

0.2

0.3

f

Bottom Maximal End Component

Bottom Maximal End Component

Trivial Maximal End Component

Maximal End Component

Non-trivial (and non accepting) MEC have null minimal probability!

• An even smaller MDP for minimal probabilities

11

b

e

0.2

0.3

f

Non-trivial (and non accepting) MEC have null minimal probability!

Min-reduced MDP

• Interval iteration algorithm in reduced MDPs

12

Algorithm 1: Interval iteration algorithm for minimum reachability

Input: Min-reduced MDP M = (S,↵M, �M), convergence threshold "Output: Under- and over-approximation of PrminM (F s+)

1 xs+ := 1; xs� := 0; ys+ := 1; ys� := 02 foreach s 2 S \ {s+, s�} do xs := 0; ys := 13 repeat4 foreach s 2 S \ {s+, s�} do5 x

0s := mina2A(s)

Ps02S �M(s, a)(s

0)xs06 y

0s := mina2A(s)

Ps02S �M(s, a)(s

0) ys0

7 � := maxs2S(y0s � x0s)

8 foreach s 2 S \ {s+, s�} do x0s := xs; y0s := ys9 until � 6 "

10 return (xs)s2S , (ys)s2S

– y

(0) = y(0) and for all n 2 N, y(n+1) = fmin

(y(n));

– y

(0) = y(0) and for all n 2 N, y(n+1) = fmax

(y(n)).

Because of the new choice for the initial vector, notice that y and y are non-increasing sequences. Hence, with the same reasoning as above, we know thatthese sequences converge, and that their limit, denoted by y(1) and y(1) re-spectively, are the minimal (respectively, maximal) reachability probabilities. Inparticular, notice that x and y, as well as x and y, are adjacent sequences, andthat

x

(1) = y(1) = PrminM (F s+) and x(1) = y(1) = PrmaxM (F s

+) .

Let us first consider a min-reduced MDP M. Then, our new value iterationalgorithm computes both in the same time sequences x and y and stops as soon

as ky(n) � x(n)k 6 ". In case this criterion is satisfied, which will happen aftera finite (yet possibly large and not bounded a priori) number of iterations, wecan guarantee that we obtained over- and underapproximations of PrminM (F s

+)with precision at least " on every component. Because of the simultaneous com-putation of lower and upper bounds, we call this algorithm interval iterationalgorithm, and specify it in Algorithm 1. A similar algorithm can be designedfor maximum reachability probabilities, by considering max-reduced MDPs andreplacing min operations of lines 5 and 6 by max operations.

Theorem 15. For every min-reduced (respectively, max-reduced) MDP M, andconvergence threshold ", if the interval iteration algorithm returns the vectors xand y on those inputs, then for all s 2 S, PrminM,s(F s+) (respectively, PrmaxM,s(F s+))is in the interval [x

s

, y

s

] of length at most ".

We implemented a prototype of the algorithm in OCaml.

Example 16. For the same example as the one in Example 14, our tool convergesafter 10548 steps, and outputs, for the initial state s = n, x

n

= 0.4995 andy

n

= 0.5005, given a good confidence to the user.

14

• Interval iteration algorithm in reduced MDPs

12

Algorithm 1: Interval iteration algorithm for minimum reachability

Input: Min-reduced MDP M = (S,↵M, �M), convergence threshold "Output: Under- and over-approximation of PrminM (F s+)

1 xs+ := 1; xs� := 0; ys+ := 1; ys� := 02 foreach s 2 S \ {s+, s�} do xs := 0; ys := 13 repeat4 foreach s 2 S \ {s+, s�} do5 x

0s := mina2A(s)

Ps02S �M(s, a)(s

0)xs06 y

0s := mina2A(s)

Ps02S �M(s, a)(s

0) ys0

7 � := maxs2S(y0s � x0s)

8 foreach s 2 S \ {s+, s�} do x0s := xs; y0s := ys9 until � 6 "

10 return (xs)s2S , (ys)s2S

– y

(0) = y(0) and for all n 2 N, y(n+1) = fmin

(y(n));

– y

(0) = y(0) and for all n 2 N, y(n+1) = fmax

(y(n)).

Because of the new choice for the initial vector, notice that y and y are non-increasing sequences. Hence, with the same reasoning as above, we know thatthese sequences converge, and that their limit, denoted by y(1) and y(1) re-spectively, are the minimal (respectively, maximal) reachability probabilities. Inparticular, notice that x and y, as well as x and y, are adjacent sequences, andthat

x

(1) = y(1) = PrminM (F s+) and x(1) = y(1) = PrmaxM (F s

+) .

Let us first consider a min-reduced MDP M. Then, our new value iterationalgorithm computes both in the same time sequences x and y and stops as soon

as ky(n) � x(n)k 6 ". In case this criterion is satisfied, which will happen aftera finite (yet possibly large and not bounded a priori) number of iterations, wecan guarantee that we obtained over- and underapproximations of PrminM (F s

+)with precision at least " on every component. Because of the simultaneous com-putation of lower and upper bounds, we call this algorithm interval iterationalgorithm, and specify it in Algorithm 1. A similar algorithm can be designedfor maximum reachability probabilities, by considering max-reduced MDPs andreplacing min operations of lines 5 and 6 by max operations.

Theorem 15. For every min-reduced (respectively, max-reduced) MDP M, andconvergence threshold ", if the interval iteration algorithm returns the vectors xand y on those inputs, then for all s 2 S, PrminM,s(F s+) (respectively, PrmaxM,s(F s+))is in the interval [x

s

, y

s

] of length at most ".

We implemented a prototype of the algorithm in OCaml.

Example 16. For the same example as the one in Example 14, our tool convergesafter 10548 steps, and outputs, for the initial state s = n, x

n

= 0.4995 andy

n

= 0.5005, given a good confidence to the user.

14

Sequences x and y converge towards the minimal probability to reach . Hence, the algorithm terminates by returning an interval

of length at most ε for each state containing .Prsmax(F )

• Interval iteration algorithm in reduced MDPs

12

Algorithm 1: Interval iteration algorithm for minimum reachability

Input: Min-reduced MDP M = (S,↵M, �M), convergence threshold "Output: Under- and over-approximation of PrminM (F s+)

1 xs+ := 1; xs� := 0; ys+ := 1; ys� := 02 foreach s 2 S \ {s+, s�} do xs := 0; ys := 13 repeat4 foreach s 2 S \ {s+, s�} do5 x

0s := mina2A(s)

Ps02S �M(s, a)(s

0)xs06 y

0s := mina2A(s)

Ps02S �M(s, a)(s

0) ys0

7 � := maxs2S(y0s � x0s)

8 foreach s 2 S \ {s+, s�} do x0s := xs; y0s := ys9 until � 6 "

10 return (xs)s2S , (ys)s2S

– y

(0) = y(0) and for all n 2 N, y(n+1) = fmin

(y(n));

– y

(0) = y(0) and for all n 2 N, y(n+1) = fmax

(y(n)).

Because of the new choice for the initial vector, notice that y and y are non-increasing sequences. Hence, with the same reasoning as above, we know thatthese sequences converge, and that their limit, denoted by y(1) and y(1) re-spectively, are the minimal (respectively, maximal) reachability probabilities. Inparticular, notice that x and y, as well as x and y, are adjacent sequences, andthat

x

(1) = y(1) = PrminM (F s+) and x(1) = y(1) = PrmaxM (F s

+) .

Let us first consider a min-reduced MDP M. Then, our new value iterationalgorithm computes both in the same time sequences x and y and stops as soon

as ky(n) � x(n)k 6 ". In case this criterion is satisfied, which will happen aftera finite (yet possibly large and not bounded a priori) number of iterations, wecan guarantee that we obtained over- and underapproximations of PrminM (F s

+)with precision at least " on every component. Because of the simultaneous com-putation of lower and upper bounds, we call this algorithm interval iterationalgorithm, and specify it in Algorithm 1. A similar algorithm can be designedfor maximum reachability probabilities, by considering max-reduced MDPs andreplacing min operations of lines 5 and 6 by max operations.

Theorem 15. For every min-reduced (respectively, max-reduced) MDP M, andconvergence threshold ", if the interval iteration algorithm returns the vectors xand y on those inputs, then for all s 2 S, PrminM,s(F s+) (respectively, PrmaxM,s(F s+))is in the interval [x

s

, y

s

] of length at most ".

We implemented a prototype of the algorithm in OCaml.

Example 16. For the same example as the one in Example 14, our tool convergesafter 10548 steps, and outputs, for the initial state s = n, x

n

= 0.4995 andy

n

= 0.5005, given a good confidence to the user.

14

Sequences x and y converge towards the minimal probability to reach . Hence, the algorithm terminates by returning an interval

of length at most ε for each state containing .Prsmax(F )

Possible speed-up: only check size of interval for a given state…

• Rate of convergence

13

x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations: x

s= Pr

smin(F≤n ) ys = Prs

min(G≤n(¬ ))

½

½

½

½…

½

½

½

½

½½

…½

½½

½

• Rate of convergence

13

x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations: x

s= Pr

smin(F≤n ) ys = Prs

min(G≤n(¬ ))

½

½

½

½…

½

½

½

½

½½

…½

½½

½

2 BMECs and only trivial MECs

• Rate of convergence

13

x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations: x

s= Pr

smin(F≤n ) ys = Prs

min(G≤n(¬ ))

½

½

½

½…

½

½

½

½

½½

…½

½½

½

2 BMECs and only trivial MECsattractor decomposition: length I

• Rate of convergence

13

x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations: x

s= Pr

smin(F≤n ) ys = Prs

min(G≤n(¬ ))

½

½

½

½…

½

½

½

½

½½

…½

½½

½

2 BMECs and only trivial MECsattractor decomposition: length I

• Rate of convergence

13

x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations: x

s= Pr

smin(F≤n ) ys = Prs

min(G≤n(¬ ))

½

½

½

½…

½

½

½

½

½½

…½

½½

½

2 BMECs and only trivial MECsattractor decomposition: length I

• Rate of convergence

13

x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations: x

s= Pr

smin(F≤n ) ys = Prs

min(G≤n(¬ ))

½

½

½

½…

½

½

½

½

½½

…½

½½

½

2 BMECs and only trivial MECsattractor decomposition: length I

• Rate of convergence

13

x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations: x

s= Pr

smin(F≤n ) ys = Prs

min(G≤n(¬ ))

½

½

½

½…

½

½

½

½

½½

…½

½½

½

2 BMECs and only trivial MECsattractor decomposition: length I

• Rate of convergence

13

x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations: x

s= Pr

smin(F≤n ) ys = Prs

min(G≤n(¬ ))

½

½

½

½…

½

½

½

½

½½

…½

½½

½

2 BMECs and only trivial MECsattractor decomposition: length Ismallest positive probability: η

• Rate of convergence

13

x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations: x

s= Pr

smin(F≤n ) ys = Prs

min(G≤n(¬ ))

½

½

½

½…

½

½

½

½

½½

…½

½½

½

2 BMECs and only trivial MECsattractor decomposition: length Ismallest positive probability: η

Leaking property: ∀n ∈ ! Prsmax(G≤nI¬( ∨ ))≤ (1− ηI )n

• Rate of convergence

13

x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations: x

s= Pr

smin(F≤n ) ys = Prs

min(G≤n(¬ ))

½

½

½

½…

½

½

½

½

½½

…½

½½

½

2 BMECs and only trivial MECsattractor decomposition: length Ismallest positive probability: η

Leaking property: ∀n ∈ ! Prsmax(G≤nI¬( ∨ ))≤ (1− ηI )n

ys(nI )−x

s(nI ) = Pr

sσ(G≤nI (¬ ))−Pr

s′σ (F≤nI )≤Pr

s′σ (G≤nI (¬ ))−Pr

s′σ (F≤nI )

• Rate of convergence

13

x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations: x

s= Pr

smin(F≤n ) ys = Prs

min(G≤n(¬ ))

½

½

½

½…

½

½

½

½

½½

…½

½½

½

2 BMECs and only trivial MECsattractor decomposition: length Ismallest positive probability: η

Leaking property: ∀n ∈ ! Prsmax(G≤nI¬( ∨ ))≤ (1− ηI )n

G≤n(¬ )≡ G≤n¬( ∨ )⊕F≤nsince

= Prs′σ (G≤nI¬( ∨ ))≤ (1− ηI )n

ys(nI )−x

s(nI ) = Pr

sσ(G≤nI (¬ ))−Pr

s′σ (F≤nI )≤Pr

s′σ (G≤nI (¬ ))−Pr

s′σ (F≤nI )

• Rate of convergence

13

x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations: x

s= Pr

smin(F≤n ) ys = Prs

min(G≤n(¬ ))

½

½

½

½…

½

½

½

½

½½

…½

½½

½

2 BMECs and only trivial MECsattractor decomposition: length Ismallest positive probability: η

Leaking property: ∀n ∈ ! Prsmax(G≤nI¬( ∨ ))≤ (1− ηI )n

The interval iteration algorithm converges in at most steps.Ilogε

log(1− ηI )

⎢⎢⎢

⎥⎥⎥

• Stopping criterion for exact computation

14

MDPs with rational probabilities: d the largest denominator of transition probabilities N the number of states M the number of transitions with non-zero probabilities

• Stopping criterion for exact computation

14

MDPs with rational probabilities: d the largest denominator of transition probabilities N the number of states M the number of transitions with non-zero probabilities

[Chatterjee, Henzinger 2008] claim for exact computation possible after iterations of value iterationd 8M

• Stopping criterion for exact computation

14

MDPs with rational probabilities: d the largest denominator of transition probabilities N the number of states M the number of transitions with non-zero probabilities

[Chatterjee, Henzinger 2008] claim for exact computation possible after iterations of value iterationd 8M

Optimal probabilities and policies can be computed by the interval iteration algorithm in at most steps.O((1/ η)NN 3 logd)

• Stopping criterion for exact computation

14

MDPs with rational probabilities: d the largest denominator of transition probabilities N the number of states M the number of transitions with non-zero probabilities

[Chatterjee, Henzinger 2008] claim for exact computation possible after iterations of value iterationd 8M

Optimal probabilities and policies can be computed by the interval iteration algorithm in at most steps.O((1/ η)NN 3 logd)

Improvement since !1/ η ≤d N ≤M

• Stopping criterion for exact computation

14

MDPs with rational probabilities: d the largest denominator of transition probabilities N the number of states M the number of transitions with non-zero probabilities

[Chatterjee, Henzinger 2008] claim for exact computation possible after iterations of value iterationd 8M

Optimal probabilities and policies can be computed by the interval iteration algorithm in at most steps.O((1/ η)NN 3 logd)

Improvement since !1/ η ≤d N ≤M

Sketch of proof: • use as threshold (with α gcd of optimal probabilities)

• upper bound on α based on matrix properties of Markov chains:

ε = 1/ 2α

α =O(NNd 3N2

)

• Conclusion and related work• Framework allowing guarantees for value iteration algorithm

• General results on convergence rate

• Criterion for computation of exact value

• Future work: test of our preliminary implementation over real instances

15

• Conclusion and related work• Framework allowing guarantees for value iteration algorithm

• General results on convergence rate

• Criterion for computation of exact value

• Future work: test of our preliminary implementation over real instances

15

• [Katoen, Zapreev, 2006] On-the-fly detection of steady-state in the transient analysis of continuous-time Markov chains

• Conclusion and related work• Framework allowing guarantees for value iteration algorithm

• General results on convergence rate

• Criterion for computation of exact value

• Future work: test of our preliminary implementation over real instances

15

• [Katoen, Zapreev, 2006] On-the-fly detection of steady-state in the transient analysis of continuous-time Markov chains

• [Kattenbelt, Kwiatkowska, Norman, Parker, 2010] CEGAR-based approach for stochastic games

• Conclusion and related work• Framework allowing guarantees for value iteration algorithm

• General results on convergence rate

• Criterion for computation of exact value

• Future work: test of our preliminary implementation over real instances

15

• [Katoen, Zapreev, 2006] On-the-fly detection of steady-state in the transient analysis of continuous-time Markov chains

• [Kattenbelt, Kwiatkowska, Norman, Parker, 2010] CEGAR-based approach for stochastic games

• To be published at ATVA 2014 [Brázdil, Chatterjee, Chmelík, Forejt, Křetínský, Kwiatkowska, Parker, Ujma, 2014] same techniques in a machine learning framework with almost sure convergence and computation of non-trivial end components on-the-fly ##### Artificial Intelligence Beyond Plain MDPs** Artiï¬پcial Intelligence Beyond Plain MDPs** Marc Toussaint
Documents ##### Oscillator Veriï¬پcation with Probability Our Approach: Reachability Analysis Reachability analysis
Documents ##### Mostly-copying reachability-based orthogonal ... Mostly-copying reachability-based orthogonal persistence
Documents ##### Automatically verifying reachability and well-formedness in P4 Automatically verifying reachability
Documents ##### Abstractions for devising compact controllers for MDPs compact controllers for MDPs Research Thesis
Documents ##### Classiï¬پcation-based Approximate Reachability with ... Classiï¬پcation-based Approximate Reachability
Documents ##### Delta-Complete Reachability Analysis (Part I)reports- Delta-Complete Reachability Analysis (Part I)
Documents ##### Lecture 3. Reachability Analysis - cis.upenn. alur/Talks/hl3.pdf · Symbolic Reachability Analysis
Documents ##### Markov Decision Processes (MDPs) (cont.) guestrin/Class/10701/slides/mdps-rl.pdfآ  Markov Decision Process
Documents ##### Axiomatic Characterization of Trace Reachability for ... Axiomatic Characterization of Trace Reachability
Documents ##### Verifying Reachability for Stateful Networks Verifying Reachability for Stateful Networks Aurojit Panda,
Documents ##### Controllability, Reachability, and Stabilizability of ...· ResearchArticle Controllability, Reachability,
Documents ##### Reachability Analysis for Polynomial Dynamical Systems ... ... 132 Dang and Testylier, Reachability
Documents ##### Research Article Reachability and Controllability of ... Reachability and Controllability of Fractional
Documents Documents