
  • Reachability in MDPs: Refining Convergence of Value Iteration

    Serge Haddad (LSV, ENS Cachan, CNRS & Inria) and
    Benjamin Monmege (ULB)

    RP 2014, Oxford

  • Markov Decision Processes

    • What?

    ✦ Stochastic process with non-deterministic choices
    ✦ Non-determinism resolved by policies/strategies

    • Where?

    ✦ Optimization
    ✦ Program verification: reachability as the basis of PCTL model-checking
    ✦ Game theory: 1½ players

  • MDPs: definition and objective

    [Figure: an example MDP with probabilistic states (transitions of
    probability ½) and actions b, c, d, e to be chosen in the other states.]

    Finite number of states
    Probabilistic states
    Actions to be selected by the policy
    Reachability objective

    MDP: M = (S, α, δ) with δ : S × α → Dist(S)
    Policy: σ : (S·α)*·S → Dist(α)

    Probability to reach the target: Pr_s^σ(F target)
    Maximal probability to reach the target:
      Pr_s^max(F target) = sup_σ Pr_s^σ(F target)

  • Optimal reachability probabilities of MDPs

    • How?

    ✦ Linear programming
    ✦ Policy iteration
    ✦ Value iteration: numerical scheme that scales well and works in practice,
      used in the numerical engine of the PRISM model checker
      [Kwiatkowska, Norman, Parker, 2011]

  • Value iteration

    [Figure: the example MDP from the previous slide.]

    x_s(0)   = 1 if s is the target, 0 otherwise
    x_s(n+1) = max_{a∈α} Σ_{s'∈S} δ(s,a)(s') × x_{s'}(n)

    Successive iterates on the four non-target states
    (the maximizing action, here b, is shown in parentheses):

      0        0           0       0
      0        2/3 (b)     0       0
      1/3      2/3 (b)     0       0
      1/2      2/3 (b)     1/6     0
      7/12     13/18 (b)   1/4     0
      …        …           …       …
      0.7969   0.7988 (b)  0.3977  0
      0.7978   0.7992 (b)  0.3984  0

    Stop when successive iterates differ by at most 0.001.
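The update rule above can be sketched in a few lines of Python. The MDP below is a small hypothetical example for illustration (the slide's four-state example is not fully recoverable from the figure), and the names `value_iteration`, `states`, `actions` and `delta` are chosen here, not taken from the talk.

```python
def value_iteration(states, target, actions, delta, eps=1e-3):
    """Iterate x_s(n+1) = max_a sum_{s'} delta(s,a)(s') * x_{s'}(n)."""
    x = {s: 1.0 if s == target else 0.0 for s in states}
    while True:
        new = {}
        for s in states:
            if s == target:
                new[s] = 1.0  # the target keeps value 1
            else:
                new[s] = max(sum(p * x[t] for t, p in delta[(s, a)].items())
                             for a in actions[s])
        # usual stopping criterion: successive iterates close enough
        if max(abs(new[s] - x[s]) for s in states) <= eps:
            return new
        x = new

# Hypothetical 4-state MDP: state 3 is the target; from state 0 the single
# action reaches it with probability 1/2 and stays put otherwise; state 2
# is a sink that can never reach the target.
states = [0, 1, 2, 3]
actions = {0: ['a'], 1: ['a'], 2: ['a']}
delta = {(0, 'a'): {3: 0.5, 0: 0.5},
         (1, 'a'): {0: 1.0},
         (2, 'a'): {2: 1.0}}
x = value_iteration(states, 3, actions, delta)
```

On this example the exact values are 1, 1, 0, 1 for states 0 to 3; the iterates approach them from below, which is exactly why the stopping criterion above can fire too early in general, as the next slide shows.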

  • Value iteration: which guarantees?

    [Figure: a chain MDP over states 0, 1, …, 2k. State 0 is the target and
    state 2k is a sink; every other state moves one step left or right with
    probability ½ each.]

    State     0   1     2     3     …   k-1        k       k+1   …   2k
    Step 1    1   0     0     0     …   0          0       0     …   0
    Step 2    1   1/2   0     0     …   0          0       0     …   0
    Step 3    1   1/2   1/4   0     …   0          0       0     …   0
    Step 4    1   1/2   1/4   1/8   …   0          0       0     …   0
    …         …   …     …     …     …   …          …       …     …   …
    Step k    1   1/2   1/4   1/8   …   1/2^(k-1)  0       0     …   0
    Step k+1  1   1/2   1/4   1/8   …   1/2^(k-1)  1/2^k   0     …   0

    Between steps k and k+1, the iterates differ by at most 1/2^k, so the
    usual stopping criterion fires — yet the real value in state k is 1/2
    (by symmetry).
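The table above can be reproduced numerically. This is a minimal sketch, assuming the chain dynamics described on the slide (endpoints absorbing, interior states move left or right with probability ½); `chain_step` is a name chosen here.

```python
def chain_step(x, k):
    """One value iteration step on the chain: endpoints 0 and 2k absorbing."""
    return ([x[0]] +
            [0.5 * x[s - 1] + 0.5 * x[s + 1] for s in range(1, 2 * k)] +
            [x[-1]])

k = 10
x = [1.0] + [0.0] * (2 * k)      # x(0): 1 on the target, 0 elsewhere
for _ in range(k + 1):           # run k+1 iterations, as in the table
    x = chain_step(x, k)
x_after_few = x[k]               # only 1/2**k in the middle state

for _ in range(50000):           # keep iterating until (near) convergence
    x = chain_step(x, k)
x_limit = x[k]                   # approaches the true value 1/2
```

Any absolute stopping threshold coarser than 1/2^k accepts the early iterate, even though it is off by almost ½.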

  • Contributions

    1. Enhanced value iteration algorithm with strong guarantees

    • performs two value iterations in parallel
    • keeps an interval of possible optimal values
    • uses the interval for the stopping criterion

    2. Study of the speed of convergence

    • also applies to classical value iteration

    3. Improved rounding procedure for exact computation

  • Interval iteration

    [Figure: the chain MDP over states 0, 1, …, 2k again, together with a plot
    (values 0 to 1 against the number of steps, 1 to 6) of the lower and upper
    iterates in the middle state.]

    fmax(x)_s = max_{a∈α} Σ_{s'∈S} δ(s,a)(s') × x_{s'}

    Lower sequence (classical value iteration):
      x_s(0) = 1 if s is the target, 0 otherwise
      x(n+1) = fmax(x(n))
    with the usual stopping criterion.

    NEW: run an upper sequence in parallel, from the dual initial vector:
      y_s(0) = 0 if s is the sink, 1 otherwise
      y(n+1) = fmax(y(n))
    and stop when the interval [x_s(n), y_s(n)] is small enough:
    the NEW stopping criterion.
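The parallel lower/upper iteration can be sketched on the same chain MDP; `bellman` and `interval_iteration` are illustrative names. Note that this chain's only end components are the two absorbing endpoints, so the upper sequence is safe here (the next slides explain why that matters).

```python
def bellman(v, k):
    """One application of fmax on the chain: endpoints are absorbing."""
    return ([v[0]] +
            [0.5 * v[s - 1] + 0.5 * v[s + 1] for s in range(1, 2 * k)] +
            [v[-1]])

def interval_iteration(k, eps=1e-6):
    x = [1.0] + [0.0] * (2 * k)   # lower sequence: 1 only on the target 0
    y = [1.0] * (2 * k) + [0.0]   # upper sequence: 0 only on the sink 2k
    steps = 0
    while max(ys - xs for xs, ys in zip(x, y)) > eps:
        x, y = bellman(x, k), bellman(y, k)
        steps += 1
    return x, y, steps

x, y, steps = interval_iteration(3)
# For this chain the exact value in state s is (2k - s) / (2k),
# and each returned interval [x_s, y_s] contains it.
```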

  • Fixed point characterization

    (Pr_s^max(F target))_{s∈S} is the smallest fixed point of fmax.

    [Figure: plot of the iterates (0 to 1) against the number of steps.]

    The lower sequence x always converges towards (Pr_s^max(F target))_{s∈S}…
    but the upper sequence y does not always!

    [Figure: a counter-example MDP with states a, …, g and transition
    probabilities ½, 0.2 and 0.3. Annotating its states with the vector
    (1, 1, 0.7, 0, 0, 1) gives a fixed point of fmax, but so does
    (0.5, 0.5, 0.45, 0, 0, 1): fmax has several fixed points, and the upper
    sequence y may converge to one strictly above the maximal reachability
    probabilities.]
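The counter-example on the slide is only partially recoverable from the figure, so here is a minimal sketch of the same phenomenon under a simpler assumption: a single state whose only action loops back to itself. Its Bellman equation is x_s = x_s, so every value in [0, 1] extends to a fixed point of fmax.

```python
def fmax(x, delta, actions):
    """One Bellman step: states without listed actions keep their value."""
    new = dict(x)
    for s, acts in actions.items():
        new[s] = max(sum(p * x[t] for t, p in delta[(s, a)].items())
                     for a in acts)
    return new

# 'loop' has a single action that stays in 'loop' forever, so it can never
# reach 'target': its true maximal reachability probability is 0.
actions = {'loop': ['a']}
delta = {('loop', 'a'): {'loop': 1.0}}

for v in (0.0, 0.3, 1.0):
    x = {'loop': v, 'target': 1.0, 'sink': 0.0}
    assert fmax(x, delta, actions) == x   # every v gives a fixed point
```

Only the smallest fixed point (v = 0) is the actual probability, which is why an upper sequence started at 1 can get stuck unless such end components are removed first.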

  • Solution: ensure uniqueness!

    Usual techniques applied for MDPs do not apply…
    Removing the states with Pr_s^max(F target) = 1 and those with
    Pr_s^max(F target) = 0 still leaves multiple fixed points!

    NEW! Use Maximal End Components (computable in polynomial time
    [de Alfaro, 1997])… and trivialize them!

    [Figure: the counter-example MDP decomposed into two Bottom Maximal End
    Components, a Trivial Maximal End Component and a non-trivial Maximal End
    Component; collapsing the latter yields the Max-reduced MDP.]

    Now the fixed point of fmax is unique: the Max-reduced MDP.

  • An even smaller MDP for minimal probabilities

    [Figure: the counter-example MDP with its Maximal End Component
    decomposition, as on the previous slide.]

    Non-trivial (and non-accepting) MECs have minimal probability zero,
    so they can be removed entirely: the Min-reduced MDP.

  • Interval iteration algorithm in reduced MDPs

    Algorithm 1: Interval iteration algorithm for minimum reachability

      Input: Min-reduced MDP M = (S, α_M, δ_M), convergence threshold ε
      Output: Under- and over-approximation of Pr^min_M(F s+)

       1  x_{s+} := 1; x_{s−} := 0; y_{s+} := 1; y_{s−} := 0
       2  foreach s ∈ S \ {s+, s−} do x_s := 0; y_s := 1
       3  repeat
       4    foreach s ∈ S \ {s+, s−} do
       5      x'_s := min_{a∈A(s)} Σ_{s'∈S} δ_M(s,a)(s') x_{s'}
       6      y'_s := min_{a∈A(s)} Σ_{s'∈S} δ_M(s,a)(s') y_{s'}
       7    Δ := max_{s∈S} (y'_s − x'_s)
       8    foreach s ∈ S \ {s+, s−} do x_s := x'_s; y_s := y'_s
       9  until Δ ≤ ε
      10  return (x_s)_{s∈S}, (y_s)_{s∈S}

    From the paper (excerpt): two upper sequences are defined from the same
    initial vector y(0): y(n+1) = fmin(y(n)) for minimum reachability, and
    ȳ(n+1) = fmax(ȳ(n)) for maximum reachability (with ȳ(0) = y(0)).
    Because of this new choice of initial vector, y and ȳ are non-increasing
    sequences. Hence, by the same reasoning as above, these sequences
    converge, and their limits, denoted y(∞) and ȳ(∞) respectively, are the
    minimal (respectively, maximal) reachability probabilities. In
    particular, x and y, as well as x̄ and ȳ, are adjacent sequences, and

      x(∞) = y(∞) = Pr^min_M(F s+)   and   x̄(∞) = ȳ(∞) = Pr^max_M(F s+).

    Consider first a min-reduced MDP M. The new value iteration algorithm
    computes the sequences x and y simultaneously and stops as soon as
    ‖y(n) − x(n)‖ ≤ ε. When this criterion is satisfied, which happens after
    a finite (yet possibly large and not a priori bounded) number of
    iterations, we obtain over- and under-approximations of Pr^min_M(F s+)
    with precision at least ε on every component. Because of the
    simultaneous computation of lower and upper bounds, we call this the
    interval iteration algorithm (Algorithm 1). A similar algorithm handles
    maximum reachability probabilities, by considering max-reduced MDPs and
    replacing the min operations of lines 5 and 6 by max operations.

    Theorem 15. For every min-reduced (respectively, max-reduced) MDP M and
    convergence threshold ε, if the interval iteration algorithm returns the
    vectors x and y on those inputs, then for all s ∈ S, Pr^min_{M,s}(F s+)
    (respectively, Pr^max_{M,s}(F s+)) lies in the interval [x_s, y_s] of
    length at most ε.

    We implemented a prototype of the algorithm in OCaml.

    Example 16. For the same example as in Example 14, our tool converges
    after 10548 steps and outputs, for the initial state s = n, x_n = 0.4995
    and y_n = 0.5005, giving good confidence to the user.

    Sequences x and y converge towards the minimal probability to reach s+.
    Hence, the algorithm terminates by returning, for each state, an
    interval of length at most ε containing Pr^min_s(F s+).

    Possible speed-up: only check the size of the interval for a given state…
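Algorithm 1 can be transcribed almost line by line; the three-state MDP at the bottom is a hypothetical min-reduced input, not the paper's example, and all names here are illustrative.

```python
def interval_iteration_min(states, s_plus, s_minus, A, delta, eps):
    # lines 1-2: fix the absorbing states, initialize the other components
    x = {s: 0.0 for s in states}
    y = {s: 1.0 for s in states}
    x[s_plus] = y[s_plus] = 1.0
    x[s_minus] = y[s_minus] = 0.0
    while True:                                   # lines 3-9
        xp, yp = dict(x), dict(y)
        for s in states:                          # line 4
            if s in (s_plus, s_minus):
                continue
            xp[s] = min(sum(p * x[t] for t, p in delta[(s, a)].items())
                        for a in A[s])            # line 5
            yp[s] = min(sum(p * y[t] for t, p in delta[(s, a)].items())
                        for a in A[s])            # line 6
        gap = max(yp[s] - xp[s] for s in states)  # line 7
        x, y = xp, yp                             # line 8
        if gap <= eps:                            # line 9
            return x, y                           # line 10

# Hypothetical min-reduced MDP: from 'm', action 'a' gives a 0.6 chance of
# eventually reaching s+, action 'b' gives exactly 0.5, so the minimum is 0.5.
states = ['s_plus', 's_minus', 'm']
A = {'m': ['a', 'b']}
delta = {('m', 'a'): {'s_plus': 0.3, 's_minus': 0.2, 'm': 0.5},
         ('m', 'b'): {'s_plus': 0.5, 's_minus': 0.5}}
x, y = interval_iteration_min(states, 's_plus', 's_minus', A, delta, 1e-6)
```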

  • Rate of convergence

    x stores reachability probabilities and y stores safety probabilities,
    i.e., after n iterations:

      x_s(n) = Pr_s^min(F^≤n s+)        y_s(n) = Pr_s^min(G^≤n(¬s−))

    [Figure: the chain MDP again, with 2 BMECs and only trivial MECs.]

    attractor decomposition: length I
    smallest positive probability: η

    Leaking property: ∀n ∈ ℕ, Pr_s^max(G^≤nI ¬(s+ ∨ s−)) ≤ (1 − η^I)^n

    Since G^≤n(¬s−) ≡ G^≤n ¬(s+ ∨ s−) ⊕ F^≤n s+, taking a policy σ that
    achieves the minimum for x_s(nI):

      y_s(nI) − x_s(nI) ≤ Pr_s^σ(G^≤nI(¬s−)) − Pr_s^σ(F^≤nI s+)
                        = Pr_s^σ(G^≤nI ¬(s+ ∨ s−)) ≤ (1 − η^I)^n

    The interval iteration algorithm converges in at most
    ⌊ I log ε / log(1 − η^I) ⌋ steps.
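The bound can be checked numerically on the chain MDP. This is a rough sketch with assumed parameters: I = 2k − 1 (a coarse attractor length for the chain) and η = ½ are choices made here, not values stated on the slide; the observed number of interval iteration steps should stay below the resulting bound.

```python
import math

def step_bound(I, eta, eps):
    """The slide's bound: floor(I * log(eps) / log(1 - eta**I)) steps."""
    return math.floor(I * math.log(eps) / math.log(1.0 - eta ** I))

def chain_gap_steps(k, eps):
    """Interval iteration on the chain; count steps until the gap <= eps."""
    x = [1.0] + [0.0] * (2 * k)   # lower iterate (1 on the target)
    y = [1.0] * (2 * k) + [0.0]   # upper iterate (0 on the sink)
    steps = 0
    while max(b - a for a, b in zip(x, y)) > eps:
        x = [x[0]] + [0.5 * x[s - 1] + 0.5 * x[s + 1]
                      for s in range(1, 2 * k)] + [x[-1]]
        y = [y[0]] + [0.5 * y[s - 1] + 0.5 * y[s + 1]
                      for s in range(1, 2 * k)] + [y[-1]]
        steps += 1
    return steps

k = 3
bound = step_bound(2 * k - 1, 0.5, 1e-6)   # I and eta are assumptions here
actual = chain_gap_steps(k, 1e-6)
```

The bound is a worst case: on this small chain the actual number of steps is far below it.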

  • Stopping criterion for exact computation

    MDPs with rational probabilities:
      d  the largest denominator of transition probabilities
      N  the number of states
      M  the number of transitions with non-zero probability

    [Chatterjee, Henzinger 2008] claim that exact computation is possible
    after d^(8M) iterations of value iteration.

    Optimal probabilities and policies can be computed by the interval
    iteration algorithm in at most O((1/η)^N · N^3 · log d) steps.
    Improvement, since 1/η ≤ d and N ≤ M!

    Sketch of proof:
    • use ε = 1/(2α) as threshold (with α the gcd of optimal probabilities)
    • upper bound on α based on matrix properties of Markov chains:
      α = O(N^N · d^(3N²))
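The rounding step can be illustrated with Python's `fractions` module: once the interval is narrower than the spacing of candidate rationals, the exact value is the unique small-denominator rational inside it. The denominator bound 100 below is an arbitrary illustrative stand-in for the α bound on the slide.

```python
from fractions import Fraction

def round_to_exact(lo, hi, max_den):
    """Recover the unique rational p/q with q <= max_den inside [lo, hi]."""
    cand = Fraction((lo + hi) / 2).limit_denominator(max_den)
    assert lo <= float(cand) <= hi, "interval too wide for this bound"
    return cand

# e.g. interval iteration bracketed the true probability 1/3:
print(round_to_exact(0.3332, 0.3334, 100))   # prints 1/3
```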

  • Conclusion and related work

    • Framework allowing guarantees for the value iteration algorithm
    • General results on the convergence rate
    • Criterion for computation of exact values
    • Future work: test our preliminary implementation on real instances

    • [Katoen, Zapreev, 2006] On-the-fly detection of steady-state in the
      transient analysis of continuous-time Markov chains
    • [Kattenbelt, Kwiatkowska, Norman, Parker, 2010] CEGAR-based approach
      for stochastic games
    • [Brázdil, Chatterjee, Chmelík, Forejt, Křetínský, Kwiatkowska, Parker,
      Ujma, 2014] (to be published at ATVA 2014) the same techniques in a
      machine-learning framework, with almost-sure convergence and on-the-fly
      computation of non-trivial end components