89
Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad (LSV, ENS Cachan, CNRS & Inria) and Benjamin Monmege (ULB) RP 2014, Oxford

Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

  • Upload
    others

  • View
    29

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Reachability in MDPs: Refining Convergence

of Value Iteration

Serge Haddad (LSV, ENS Cachan, CNRS & Inria) and

Benjamin Monmege (ULB) !

RP 2014, Oxford

Page 2: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

2

Markov Decision Processes

• What?

✦ Stochastic process with non-deterministic choices

✦ Non-determinism solved by policies/strategies

Page 3: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

• Where?

✦ Optimization

✦ Program verification: reachability as the basis of PCTL model-checking

✦ Game theory: 1+½ players

2

Markov Decision Processes

• What?

✦ Stochastic process with non-deterministic choices

✦ Non-determinism solved by policies/strategies

Page 4: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

MDPs: definition and objective

½

bc

d

½

½ e

3

Page 5: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

MDPs: definition and objective

½

bc

d

½

½ e

Finite number of states

3

Page 6: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

MDPs: definition and objective

½

bc

d

½

½ e

Finite number of states

Probabilistic states

3

Page 7: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

MDPs: definition and objective

½

bc

d

½

½ e

Finite number of states

Probabilistic states

Actions to be selected by the policy

3

Page 8: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

MDPs: definition and objective

½

bc

d

½

½ e

Finite number of states

Probabilistic states

Actions to be selected by the policy

M= (S,α,δ)δ :S×α→ Dist(S)

σ : (S ⋅α)! ⋅S → Dist(α)Policy3

Page 9: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

MDPs: definition and objective

½

bc

d

½

½ e

Finite number of states

Probabilistic states

Actions to be selected by the policy

Reachability objective

M= (S,α,δ)δ :S×α→ Dist(S)

σ : (S ⋅α)! ⋅S → Dist(α)Policy3

Page 10: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

MDPs: definition and objective

½

bc

d

½

½ e

Finite number of states

Probabilistic states

Actions to be selected by the policy

Reachability objective

M= (S,α,δ)δ :S×α→ Dist(S)

σ : (S ⋅α)! ⋅S → Dist(α)Policy

Probability to reach: Prsσ(F )

3

Page 11: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

MDPs: definition and objective

½

bc

d

½

½ e

Finite number of states

Probabilistic states

Actions to be selected by the policy

Reachability objective

M= (S,α,δ)δ :S×α→ Dist(S)

σ : (S ⋅α)! ⋅S → Dist(α)Policy

Probability to reach: Prsσ(F )

3

Maximal probability to reach: Pr

smax(F )= sup

σPrsσ(F )

Page 12: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Optimal reachability probabilities of MDPs

• How?

✦ Linear programming

✦ Policy iteration

✦ Value iteration: numerical scheme that scales well and works in practice

4

Page 13: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Optimal reachability probabilities of MDPs

• How?

✦ Linear programming

✦ Policy iteration

✦ Value iteration: numerical scheme that scales well and works in practice

4

used in the numerical PRISM model checker

[Kwiatkowska, Norman, Parker, 2011]

Page 14: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Value iteration

5

½

bc

d

½

½ e

Page 15: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Value iteration

5

½

bc

d

½

½ e

xs(n+1) = max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s

(n)

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

Page 16: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Value iteration

5

½

bc

d

½

½ e

0 0 0 0

xs(n+1) = max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s

(n)

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

Page 17: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Value iteration

5

½

bc

d

½

½ e

0 0 0 00 2/3 (b) 0 0

xs(n+1) = max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s

(n)

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

Page 18: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Value iteration

5

½

bc

d

½

½ e

0 0 0 00 2/3 (b) 0 0

1/3 2/3 (b) 0 0

xs(n+1) = max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s

(n)

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

Page 19: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Value iteration

5

½

bc

d

½

½ e

0 0 0 00 2/3 (b) 0 0

1/3 2/3 (b) 0 01/2 2/3 (b) 1/6 0

xs(n+1) = max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s

(n)

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

Page 20: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Value iteration

5

½

bc

d

½

½ e

0 0 0 00 2/3 (b) 0 0

1/3 2/3 (b) 0 01/2 2/3 (b) 1/6 07/12 13/18 (b) 1/4 0

xs(n+1) = max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s

(n)

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

Page 21: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Value iteration

5

½

bc

d

½

½ e

0 0 0 00 2/3 (b) 0 0

1/3 2/3 (b) 0 01/2 2/3 (b) 1/6 07/12 13/18 (b) 1/4 0… … … …

xs(n+1) = max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s

(n)

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

Page 22: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Value iteration

5

½

bc

d

½

½ e

0 0 0 00 2/3 (b) 0 0

1/3 2/3 (b) 0 01/2 2/3 (b) 1/6 07/12 13/18 (b) 1/4 0… … … …

0.7969 0.7988 (b) 0.3977 0

xs(n+1) = max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s

(n)

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

Page 23: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Value iteration

5

½

bc

d

½

½ e

0 0 0 00 2/3 (b) 0 0

1/3 2/3 (b) 0 01/2 2/3 (b) 1/6 07/12 13/18 (b) 1/4 0… … … …

0.7969 0.7988 (b) 0.3977 00.7978 0.7992 (b) 0.3984 0

xs(n+1) = max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s

(n)

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

Page 24: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Value iteration

5

½

bc

d

½

½ e

0 0 0 00 2/3 (b) 0 0

1/3 2/3 (b) 0 01/2 2/3 (b) 1/6 07/12 13/18 (b) 1/4 0… … … …

0.7969 0.7988 (b) 0.3977 00.7978 0.7992 (b) 0.3984 0

≤0.001

xs(n+1) = max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s

(n)

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

Page 25: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Value iteration: which guarantees?

6

½

½

½

½

k

k-1

…k+2k+1 2k-1 2k

½

½

½

½

½½

…k-2 1½

½½

½

Page 26: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Value iteration: which guarantees?

6

½

½

½

½

k

k-1

…k+2k+1 2k-1 2k

½

½

½

½

½½

…k-2 1½

½½

½

State 0 1 2 3 … k-1 k k+1 … 2k

Page 27: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Value iteration: which guarantees?

6

½

½

½

½

k

k-1

…k+2k+1 2k-1 2k

½

½

½

½

½½

…k-2 1½

½½

½

State 0 1 2 3 … k-1 k k+1 … 2kStep 1 1 0 0 0 … 0 0 0 … 0

Page 28: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Value iteration: which guarantees?

6

½

½

½

½

k

k-1

…k+2k+1 2k-1 2k

½

½

½

½

½½

…k-2 1½

½½

½

State 0 1 2 3 … k-1 k k+1 … 2kStep 1 1 0 0 0 … 0 0 0 … 0Step 2 1 1/2 0 0 … 0 0 0 … 0

Page 29: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Value iteration: which guarantees?

6

½

½

½

½

k

k-1

…k+2k+1 2k-1 2k

½

½

½

½

½½

…k-2 1½

½½

½

State 0 1 2 3 … k-1 k k+1 … 2kStep 1 1 0 0 0 … 0 0 0 … 0Step 2 1 1/2 0 0 … 0 0 0 … 0Step 3 1 1/2 1/4 0 … 0 0 0 … 0

Page 30: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Value iteration: which guarantees?

6

½

½

½

½

k

k-1

…k+2k+1 2k-1 2k

½

½

½

½

½½

…k-2 1½

½½

½

State 0 1 2 3 … k-1 k k+1 … 2kStep 1 1 0 0 0 … 0 0 0 … 0Step 2 1 1/2 0 0 … 0 0 0 … 0Step 3 1 1/2 1/4 0 … 0 0 0 … 0Step 4 1 1/2 1/4 1/8 … 0 0 0 … 0

Page 31: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Value iteration: which guarantees?

6

½

½

½

½

k

k-1

…k+2k+1 2k-1 2k

½

½

½

½

½½

…k-2 1½

½½

½

State 0 1 2 3 … k-1 k k+1 … 2kStep 1 1 0 0 0 … 0 0 0 … 0Step 2 1 1/2 0 0 … 0 0 0 … 0Step 3 1 1/2 1/4 0 … 0 0 0 … 0Step 4 1 1/2 1/4 1/8 … 0 0 0 … 0

… … … … … … … … … … …Step k 1 1/2 1/4 1/8 … 0 0 … 01/ 2k−1

Page 32: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Value iteration: which guarantees?

6

½

½

½

½

k

k-1

…k+2k+1 2k-1 2k

½

½

½

½

½½

…k-2 1½

½½

½

State 0 1 2 3 … k-1 k k+1 … 2kStep 1 1 0 0 0 … 0 0 0 … 0Step 2 1 1/2 0 0 … 0 0 0 … 0Step 3 1 1/2 1/4 0 … 0 0 0 … 0Step 4 1 1/2 1/4 1/8 … 0 0 0 … 0

… … … … … … … … … … …Step k 1 1/2 1/4 1/8 … 0 0 … 0

Step k+1 1 1/2 1/4 1/8 … 0 … 0

1/ 2k−1

1/ 2k−1 1/ 2k

Page 33: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Value iteration: which guarantees?

6

½

½

½

½

k

k-1

…k+2k+1 2k-1 2k

½

½

½

½

½½

…k-2 1½

½½

½

State 0 1 2 3 … k-1 k k+1 … 2kStep 1 1 0 0 0 … 0 0 0 … 0Step 2 1 1/2 0 0 … 0 0 0 … 0Step 3 1 1/2 1/4 0 … 0 0 0 … 0Step 4 1 1/2 1/4 1/8 … 0 0 0 … 0

… … … … … … … … … … …Step k 1 1/2 1/4 1/8 … 0 0 … 0

Step k+1 1 1/2 1/4 1/8 … 0 … 0≤1/2k 1/ 2k−1

1/ 2k−1 1/ 2k

Page 34: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Value iteration: which guarantees?

6

½

½

½

½

k

k-1

…k+2k+1 2k-1 2k

½

½

½

½

½½

…k-2 1½

½½

½

State 0 1 2 3 … k-1 k k+1 … 2kStep 1 1 0 0 0 … 0 0 0 … 0Step 2 1 1/2 0 0 … 0 0 0 … 0Step 3 1 1/2 1/4 0 … 0 0 0 … 0Step 4 1 1/2 1/4 1/8 … 0 0 0 … 0

… … … … … … … … … … …Step k 1 1/2 1/4 1/8 … 0 0 … 0

Step k+1 1 1/2 1/4 1/8 … 0 … 0≤1/2k

Real value: 1/2 (by symmetry)

1/ 2k−1

1/ 2k−1 1/ 2k

Page 35: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Contributions

7

Page 36: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Contributions

1. Enhanced value iteration algorithm with strong guarantees

7

Page 37: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Contributions

1. Enhanced value iteration algorithm with strong guarantees

• performs two value iterations in parallel

7

Page 38: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Contributions

1. Enhanced value iteration algorithm with strong guarantees

• performs two value iterations in parallel

• keeps an interval of possible optimal values

7

Page 39: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Contributions

1. Enhanced value iteration algorithm with strong guarantees

• performs two value iterations in parallel

• keeps an interval of possible optimal values

• uses the interval for the stopping criterion

7

Page 40: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Contributions

1. Enhanced value iteration algorithm with strong guarantees

• performs two value iterations in parallel

• keeps an interval of possible optimal values

• uses the interval for the stopping criterion

2. Study of the speed of convergence

7

Page 41: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Contributions

1. Enhanced value iteration algorithm with strong guarantees

• performs two value iterations in parallel

• keeps an interval of possible optimal values

• uses the interval for the stopping criterion

2. Study of the speed of convergence

• also applies to classical value iteration

7

Page 42: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Contributions

1. Enhanced value iteration algorithm with strong guarantees

• performs two value iterations in parallel

• keeps an interval of possible optimal values

• uses the interval for the stopping criterion

2. Study of the speed of convergence

• also applies to classical value iteration

3. Improved rounding procedure for exact computation

7

Page 43: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Interval iteration

8

½

½

½

½

k

k-1

…k+2k+1 2k-1 2n

½

½

½

½

½½

…k-2 1½

½½

½

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

xs(n+1) = max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s

(n)

Page 44: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Interval iteration

8

½

½

½

½

k

k-1

…k+2k+1 2k-1 2n

½

½

½

½

½½

…k-2 1½

½½

½

0

0,25

0,5

0,75

1

Number of steps

1 2 3 4 5 6

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

xs(n+1) = max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s

(n)

Page 45: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Interval iteration

8

½

½

½

½

k

k-1

…k+2k+1 2k-1 2n

½

½

½

½

½½

…k-2 1½

½½

½

0

0,25

0,5

0,75

1

Number of steps

1 2 3 4 5 6

fmax(x)

s= max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

x (n+1) = fmax(x (n))

Page 46: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Interval iteration

8

½

½

½

½

k

k-1

…k+2k+1 2k-1 2n

½

½

½

½

½½

…k-2 1½

½½

½

0

0,25

0,5

0,75

1

Number of steps

1 2 3 4 5 6

fmax(x)

s= max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

x (n+1) = fmax(x (n))

usual stopping criterion

Page 47: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Interval iteration

8

½

½

½

½

k

k-1

…k+2k+1 2k-1 2n

½

½

½

½

½½

…k-2 1½

½½

½

0

0,25

0,5

0,75

1

Number of steps

1 2 3 4 5 6

fmax(x)

s= max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

x (n+1) = fmax(x (n))

usual stopping criterion

Page 48: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Interval iteration

8

½

½

½

½

k

k-1

…k+2k+1 2k-1 2n

½

½

½

½

½½

…k-2 1½

½½

½

0

0,25

0,5

0,75

1

Number of steps

1 2 3 4 5 6

NEW stopping criterion

fmax(x)

s= max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

x (n+1) = fmax(x (n))

usual stopping criterion

Page 49: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Interval iteration

8

½

½

½

½

k

k-1

…k+2k+1 2k-1 2n

½

½

½

½

½½

…k-2 1½

½½

½

0

0,25

0,5

0,75

1

Number of steps

1 2 3 4 5 6

NEW stopping criterion

fmax(x)

s= max

a∈αδ

′s ∈S∑ (s,a)( ′s )×x ′s

xs(0) = 1 if s =

0 otherwise

⎧⎨⎪⎪

⎩⎪⎪

ys(0) = 0 if s =

1 otherwise

⎧⎨⎪⎪

⎩⎪⎪

x (n+1) = fmax(x (n)) y(n+1) = f

max(y(n))

usual stopping criterion

Page 50: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Fixed point characterization

9

(Prsmax(F ))

s∈S is the smallest fixed point of . f

max

Page 51: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Fixed point characterization

9

(Prsmax(F ))

s∈S is the smallest fixed point of . f

max

0

0,25

0,5

0,75

1

Number of steps

1 2 3 4 5 6

Page 52: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Fixed point characterization

9

(Prsmax(F ))

s∈S is the smallest fixed point of . f

max

0

0,25

0,5

0,75

1

Number of steps

1 2 3 4 5 6

always converges towards (Prsmax(F ))s∈S

Page 53: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Fixed point characterization

9

(Prsmax(F ))

s∈S is the smallest fixed point of . f

max

0

0,25

0,5

0,75

1

Number of steps

1 2 3 4 5 6

always converges towards (Prsmax(F ))s∈S

not always…!

Page 54: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Fixed point characterization

9

(Prsmax(F ))

s∈S is the smallest fixed point of . f

max

0

0,25

0,5

0,75

1

Number of steps

1 2 3 4 5 6

always converges towards (Prsmax(F ))s∈S

not always…!

a

½½

c½g

b

e

d

0.2

0.3

f

Page 55: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Fixed point characterization

9

(Prsmax(F ))

s∈S is the smallest fixed point of . f

max

0

0,25

0,5

0,75

1

Number of steps

1 2 3 4 5 6

always converges towards (Prsmax(F ))s∈S

not always…!

a

½½

c½g

b

e

d

0.2

0.3

f

1 1

0.7

0 0

1

Page 56: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Fixed point characterization

9

(Prsmax(F ))

s∈S is the smallest fixed point of . f

max

0

0,25

0,5

0,75

1

Number of steps

1 2 3 4 5 6

always converges towards (Prsmax(F ))s∈S

not always…!

a

½½

c½g

b

e

d

0.2

0.3

f

0.5 0.5

0.45

0 0

1

Page 57: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Solution: ensure uniqueness!

10

a

½½

c½g

b

e

d

0.2

0.3

f

Usual techniques applied for MDPs do not apply…

Page 58: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Solution: ensure uniqueness!

10

a

½½

c½g

b

e

d

0.2

0.3

f

Usual techniques applied for MDPs do not apply…

Prsmax(F )= 1

Prsmax(F )= 0

Page 59: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Solution: ensure uniqueness!

10

Usual techniques applied for MDPs do not apply…

Prsmax(F )= 1

a

½½

c½g

b

e

0.2

0.3

f

Prsmax(F )= 0

Still multiple fixed points!

Page 60: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Solution: ensure uniqueness!

10

Usual techniques applied for MDPs do not apply…

NEW! Use Maximal End Components… (computable in polynomial time)

a

½½

c½g

b

e

d

0.2

0.3

f

[de Alfaro, 1997]

Page 61: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Solution: ensure uniqueness!

10

Usual techniques applied for MDPs do not apply…

NEW! Use Maximal End Components… (computable in polynomial time)

a

½½

c½g

b

e

d

0.2

0.3

f

Bottom Maximal End Component

Bottom Maximal End Component

Trivial Maximal End Component

Maximal End Component

[de Alfaro, 1997]

Page 62: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Solution: ensure uniqueness!

10

Usual techniques applied for MDPs do not apply…

NEW! Use Maximal End Components… (computable in polynomial time)and trivialize them!

½½

½g

b

e

0.2

0.3

f

Now, unicity of the fixed point

Max-reduced MDP

[de Alfaro, 1997]

Page 63: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

An even smaller MDP for minimal probabilities

11

a

½½

c½g

b

e

d

0.2

0.3

f

Bottom Maximal End Component

Bottom Maximal End Component

Trivial Maximal End Component

Maximal End Component

Page 64: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

An even smaller MDP for minimal probabilities

11

a

½½

c½g

b

e

d

0.2

0.3

f

Bottom Maximal End Component

Bottom Maximal End Component

Trivial Maximal End Component

Maximal End Component

Non-trivial (and non accepting) MEC have null minimal probability!

Page 65: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

An even smaller MDP for minimal probabilities

11

b

e

0.2

0.3

f

Non-trivial (and non accepting) MEC have null minimal probability!

Min-reduced MDP

Page 66: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Interval iteration algorithm in reduced MDPs

12

Algorithm 1: Interval iteration algorithm for minimum reachability

Input: Min-reduced MDP M = (S,↵M, �M), convergence threshold "

Output: Under- and over-approximation of Pr

minM (F s+)

1 xs+ := 1; xs� := 0; ys+ := 1; ys� := 02 foreach s 2 S \ {s+, s�} do xs := 0; ys := 13 repeat4 foreach s 2 S \ {s+, s�} do5 x

0s := mina2A(s)

Ps02S �M(s, a)(s0)xs0

6 y

0s := mina2A(s)

Ps02S �M(s, a)(s0) ys0

7 � := maxs2S(y0s � x

0s)

8 foreach s 2 S \ {s+, s�} do x

0s := xs; y

0s := ys

9 until � 6 "

10 return (xs)s2S , (ys)s2S

– y

(0) = y

(0) and for all n 2 N, y(n+1) = f

min

(y(n));

– y

(0) = y

(0) and for all n 2 N, y(n+1) = f

max

(y(n)).

Because of the new choice for the initial vector, notice that y and y are non-increasing sequences. Hence, with the same reasoning as above, we know thatthese sequences converge, and that their limit, denoted by y

(1) and y

(1) re-spectively, are the minimal (respectively, maximal) reachability probabilities. Inparticular, notice that x and y, as well as x and y, are adjacent sequences, andthat

x

(1) = y

(1) = Pr

min

M (F s+) and x

(1) = y

(1) = Pr

max

M (F s+) .

Let us first consider a min-reduced MDP M. Then, our new value iterationalgorithm computes both in the same time sequences x and y and stops as soon

as ky(n) � x

(n)k 6 ". In case this criterion is satisfied, which will happen aftera finite (yet possibly large and not bounded a priori) number of iterations, wecan guarantee that we obtained over- and underapproximations of Pr

min

M (F s+)with precision at least " on every component. Because of the simultaneous com-putation of lower and upper bounds, we call this algorithm interval iterationalgorithm, and specify it in Algorithm 1. A similar algorithm can be designedfor maximum reachability probabilities, by considering max-reduced MDPs andreplacing min operations of lines 5 and 6 by max operations.

Theorem 15. For every min-reduced (respectively, max-reduced) MDP M, andconvergence threshold ", if the interval iteration algorithm returns the vectors x

and y on those inputs, then for all s 2 S, Pr

min

M,s

(F s+

) (respectively, Pr

max

M,s

(F s+

))is in the interval [x

s

, y

s

] of length at most ".

We implemented a prototype of the algorithm in OCaml.

Example 16. For the same example as the one in Example 14, our tool convergesafter 10548 steps, and outputs, for the initial state s = n, x

n

= 0.4995 andy

n

= 0.5005, given a good confidence to the user.

14

Page 67: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Interval iteration algorithm in reduced MDPs

12

Algorithm 1: Interval iteration algorithm for minimum reachability

Input: Min-reduced MDP M = (S,↵M, �M), convergence threshold "

Output: Under- and over-approximation of Pr

minM (F s+)

1 xs+ := 1; xs� := 0; ys+ := 1; ys� := 02 foreach s 2 S \ {s+, s�} do xs := 0; ys := 13 repeat4 foreach s 2 S \ {s+, s�} do5 x

0s := mina2A(s)

Ps02S �M(s, a)(s0)xs0

6 y

0s := mina2A(s)

Ps02S �M(s, a)(s0) ys0

7 � := maxs2S(y0s � x

0s)

8 foreach s 2 S \ {s+, s�} do x

0s := xs; y

0s := ys

9 until � 6 "

10 return (xs)s2S , (ys)s2S

– y

(0) = y

(0) and for all n 2 N, y(n+1) = f

min

(y(n));

– y

(0) = y

(0) and for all n 2 N, y(n+1) = f

max

(y(n)).

Because of the new choice for the initial vector, notice that y and y are non-increasing sequences. Hence, with the same reasoning as above, we know thatthese sequences converge, and that their limit, denoted by y

(1) and y

(1) re-spectively, are the minimal (respectively, maximal) reachability probabilities. Inparticular, notice that x and y, as well as x and y, are adjacent sequences, andthat

x

(1) = y

(1) = Pr

min

M (F s+) and x

(1) = y

(1) = Pr

max

M (F s+) .

Let us first consider a min-reduced MDP M. Then, our new value iterationalgorithm computes both in the same time sequences x and y and stops as soon

as ky(n) � x

(n)k 6 ". In case this criterion is satisfied, which will happen aftera finite (yet possibly large and not bounded a priori) number of iterations, wecan guarantee that we obtained over- and underapproximations of Pr

min

M (F s+)with precision at least " on every component. Because of the simultaneous com-putation of lower and upper bounds, we call this algorithm interval iterationalgorithm, and specify it in Algorithm 1. A similar algorithm can be designedfor maximum reachability probabilities, by considering max-reduced MDPs andreplacing min operations of lines 5 and 6 by max operations.

Theorem 15. For every min-reduced (respectively, max-reduced) MDP M, andconvergence threshold ", if the interval iteration algorithm returns the vectors x

and y on those inputs, then for all s 2 S, Pr

min

M,s

(F s+

) (respectively, Pr

max

M,s

(F s+

))is in the interval [x

s

, y

s

] of length at most ".

We implemented a prototype of the algorithm in OCaml.

Example 16. For the same example as the one in Example 14, our tool convergesafter 10548 steps, and outputs, for the initial state s = n, x

n

= 0.4995 andy

n

= 0.5005, given a good confidence to the user.

14

Sequences x and y converge towards the minimal probability to reach . Hence, the algorithm terminates by returning an interval

of length at most ε for each state containing .Prsmax(F )

Page 68: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Interval iteration algorithm in reduced MDPs

12

Algorithm 1: Interval iteration algorithm for minimum reachability

Input: Min-reduced MDP M = (S,↵M, �M), convergence threshold "

Output: Under- and over-approximation of Pr

minM (F s+)

1 xs+ := 1; xs� := 0; ys+ := 1; ys� := 02 foreach s 2 S \ {s+, s�} do xs := 0; ys := 13 repeat4 foreach s 2 S \ {s+, s�} do5 x

0s := mina2A(s)

Ps02S �M(s, a)(s0)xs0

6 y

0s := mina2A(s)

Ps02S �M(s, a)(s0) ys0

7 � := maxs2S(y0s � x

0s)

8 foreach s 2 S \ {s+, s�} do x

0s := xs; y

0s := ys

9 until � 6 "

10 return (xs)s2S , (ys)s2S

– y

(0) = y

(0) and for all n 2 N, y(n+1) = f

min

(y(n));

– y

(0) = y

(0) and for all n 2 N, y(n+1) = f

max

(y(n)).

Because of the new choice for the initial vector, notice that y and y are non-increasing sequences. Hence, with the same reasoning as above, we know thatthese sequences converge, and that their limit, denoted by y

(1) and y

(1) re-spectively, are the minimal (respectively, maximal) reachability probabilities. Inparticular, notice that x and y, as well as x and y, are adjacent sequences, andthat

x

(1) = y

(1) = Pr

min

M (F s+) and x

(1) = y

(1) = Pr

max

M (F s+) .

Let us first consider a min-reduced MDP M. Then, our new value iterationalgorithm computes both in the same time sequences x and y and stops as soon

as ky(n) � x

(n)k 6 ". In case this criterion is satisfied, which will happen aftera finite (yet possibly large and not bounded a priori) number of iterations, wecan guarantee that we obtained over- and underapproximations of Pr

min

M (F s+)with precision at least " on every component. Because of the simultaneous com-putation of lower and upper bounds, we call this algorithm interval iterationalgorithm, and specify it in Algorithm 1. A similar algorithm can be designedfor maximum reachability probabilities, by considering max-reduced MDPs andreplacing min operations of lines 5 and 6 by max operations.

Theorem 15. For every min-reduced (respectively, max-reduced) MDP M, andconvergence threshold ", if the interval iteration algorithm returns the vectors x

and y on those inputs, then for all s 2 S, Pr

min

M,s

(F s+

) (respectively, Pr

max

M,s

(F s+

))is in the interval [x

s

, y

s

] of length at most ".

We implemented a prototype of the algorithm in OCaml.

Example 16. For the same example as the one in Example 14, our tool convergesafter 10548 steps, and outputs, for the initial state s = n, x

n

= 0.4995 andy

n

= 0.5005, given a good confidence to the user.

14

Sequences x and y converge towards the minimal probability to reach . Hence, the algorithm terminates by returning an interval

of length at most ε for each state containing .Prsmax(F )

Possible speed-up: only check size of interval for a given state…

Page 69: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Rate of convergence

13

x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations: x

s= Pr

smin(F≤n ) ys = Prs

min(G≤n(¬ ))

½

½

½

½…

½

½

½

½

½½

…½

½½

½

Page 70: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Rate of convergence

13

x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations: x

s= Pr

smin(F≤n ) ys = Prs

min(G≤n(¬ ))

½

½

½

½…

½

½

½

½

½½

…½

½½

½

2 BMECs and only trivial MECs

Page 71: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Rate of convergence

13

x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations: x

s= Pr

smin(F≤n ) ys = Prs

min(G≤n(¬ ))

½

½

½

½…

½

½

½

½

½½

…½

½½

½

2 BMECs and only trivial MECsattractor decomposition: length I

Page 72: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Rate of convergence

13

x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations: x

s= Pr

smin(F≤n ) ys = Prs

min(G≤n(¬ ))

½

½

½

½…

½

½

½

½

½½

…½

½½

½

2 BMECs and only trivial MECsattractor decomposition: length I

Page 73: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Rate of convergence

13

x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations: x

s= Pr

smin(F≤n ) ys = Prs

min(G≤n(¬ ))

½

½

½

½…

½

½

½

½

½½

…½

½½

½

2 BMECs and only trivial MECsattractor decomposition: length I

Page 74: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Rate of convergence

13

x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations: x

s= Pr

smin(F≤n ) ys = Prs

min(G≤n(¬ ))

½

½

½

½…

½

½

½

½

½½

…½

½½

½

2 BMECs and only trivial MECsattractor decomposition: length I

Page 75: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Rate of convergence

13

x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations: x

s= Pr

smin(F≤n ) ys = Prs

min(G≤n(¬ ))

½

½

½

½…

½

½

½

½

½½

…½

½½

½

2 BMECs and only trivial MECsattractor decomposition: length I

Page 76: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Rate of convergence

13

x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations: x

s= Pr

smin(F≤n ) ys = Prs

min(G≤n(¬ ))

½

½

½

½…

½

½

½

½

½½

…½

½½

½

2 BMECs and only trivial MECsattractor decomposition: length Ismallest positive probability: η

Page 77: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Rate of convergence

13

x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations: x

s= Pr

smin(F≤n ) ys = Prs

min(G≤n(¬ ))

½

½

½

½…

½

½

½

½

½½

…½

½½

½

2 BMECs and only trivial MECsattractor decomposition: length Ismallest positive probability: η

Leaking property: ∀n ∈ ! Prsmax(G≤nI¬( ∨ ))≤ (1− ηI )n

Page 78: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Rate of convergence

13

x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations: x

s= Pr

smin(F≤n ) ys = Prs

min(G≤n(¬ ))

½

½

½

½…

½

½

½

½

½½

…½

½½

½

2 BMECs and only trivial MECsattractor decomposition: length Ismallest positive probability: η

Leaking property: ∀n ∈ ! Prsmax(G≤nI¬( ∨ ))≤ (1− ηI )n

ys(nI )−x

s(nI ) = Pr

sσ(G≤nI (¬ ))−Pr

s′σ (F≤nI )≤Pr

s′σ (G≤nI (¬ ))−Pr

s′σ (F≤nI )

Page 79: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Rate of convergence

13

x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations: x

s= Pr

smin(F≤n ) ys = Prs

min(G≤n(¬ ))

½

½

½

½…

½

½

½

½

½½

…½

½½

½

2 BMECs and only trivial MECsattractor decomposition: length Ismallest positive probability: η

Leaking property: ∀n ∈ ! Prsmax(G≤nI¬( ∨ ))≤ (1− ηI )n

G≤n(¬ )≡ G≤n¬( ∨ )⊕F≤nsince

= Prs′σ (G≤nI¬( ∨ ))≤ (1− ηI )n

ys(nI )−x

s(nI ) = Pr

sσ(G≤nI (¬ ))−Pr

s′σ (F≤nI )≤Pr

s′σ (G≤nI (¬ ))−Pr

s′σ (F≤nI )

Page 80: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Rate of convergence

13

x stores reachability probabilities, y stores safety probabilities, i.e., after n iterations: x

s= Pr

smin(F≤n ) ys = Prs

min(G≤n(¬ ))

½

½

½

½…

½

½

½

½

½½

…½

½½

½

2 BMECs and only trivial MECsattractor decomposition: length Ismallest positive probability: η

Leaking property: ∀n ∈ ! Prsmax(G≤nI¬( ∨ ))≤ (1− ηI )n

The interval iteration algorithm converges in at most steps.Ilogε

log(1− ηI )

⎢⎢⎢

⎥⎥⎥

Page 81: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Stopping criterion for exact computation

14

MDPs with rational probabilities: d the largest denominator of transition probabilities N the number of states M the number of transitions with non-zero probabilities

Page 82: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Stopping criterion for exact computation

14

MDPs with rational probabilities: d the largest denominator of transition probabilities N the number of states M the number of transitions with non-zero probabilities

[Chatterjee, Henzinger 2008] claim for exact computation possible after iterations of value iterationd 8M

Page 83: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Stopping criterion for exact computation

14

MDPs with rational probabilities: d the largest denominator of transition probabilities N the number of states M the number of transitions with non-zero probabilities

[Chatterjee, Henzinger 2008] claim for exact computation possible after iterations of value iterationd 8M

Optimal probabilities and policies can be computed by the interval iteration algorithm in at most steps.O((1/ η)NN 3 logd)

Page 84: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Stopping criterion for exact computation

14

MDPs with rational probabilities: d the largest denominator of transition probabilities N the number of states M the number of transitions with non-zero probabilities

[Chatterjee, Henzinger 2008] claim for exact computation possible after iterations of value iterationd 8M

Optimal probabilities and policies can be computed by the interval iteration algorithm in at most steps.O((1/ η)NN 3 logd)

Improvement since !1/ η ≤d N ≤M

Page 85: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Stopping criterion for exact computation

14

MDPs with rational probabilities: d the largest denominator of transition probabilities N the number of states M the number of transitions with non-zero probabilities

[Chatterjee, Henzinger 2008] claim for exact computation possible after iterations of value iterationd 8M

Optimal probabilities and policies can be computed by the interval iteration algorithm in at most steps.O((1/ η)NN 3 logd)

Improvement since !1/ η ≤d N ≤M

Sketch of proof: • use as threshold (with α gcd of optimal probabilities)

• upper bound on α based on matrix properties of Markov chains:

ε = 1/ 2α

α =O(NNd 3N2

)

Page 86: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Conclusion and related work• Framework allowing guarantees for value iteration algorithm

• General results on convergence rate

• Criterion for computation of exact value

• Future work: test of our preliminary implementation over real instances

15

Page 87: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Conclusion and related work• Framework allowing guarantees for value iteration algorithm

• General results on convergence rate

• Criterion for computation of exact value

• Future work: test of our preliminary implementation over real instances

15

• [Katoen, Zapreev, 2006] On-the-fly detection of steady-state in the transient analysis of continuous-time Markov chains

Page 88: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Conclusion and related work• Framework allowing guarantees for value iteration algorithm

• General results on convergence rate

• Criterion for computation of exact value

• Future work: test of our preliminary implementation over real instances

15

• [Katoen, Zapreev, 2006] On-the-fly detection of steady-state in the transient analysis of continuous-time Markov chains

• [Kattenbelt, Kwiatkowska, Norman, Parker, 2010] CEGAR-based approach for stochastic games

Page 89: Reachability in MDPs: Refining Convergence of Value Iterationdi.ulb.ac.be/verif/monmege/talks/RP2014.pdf · Reachability in MDPs: Refining Convergence of Value Iteration Serge Haddad

Conclusion and related work• Framework allowing guarantees for value iteration algorithm

• General results on convergence rate

• Criterion for computation of exact value

• Future work: test of our preliminary implementation over real instances

15

• [Katoen, Zapreev, 2006] On-the-fly detection of steady-state in the transient analysis of continuous-time Markov chains

• [Kattenbelt, Kwiatkowska, Norman, Parker, 2010] CEGAR-based approach for stochastic games

• To be published at ATVA 2014 [Brázdil, Chatterjee, Chmelík, Forejt, Křetínský, Kwiatkowska, Parker, Ujma, 2014] same techniques in a machine learning framework with almost sure convergence and computation of non-trivial end components on-the-fly