Are you doing what I think you are doing?
Robust AI via Verification, Monitoring and Repair∗

Francesco Leofante

RWTH Aachen University, Germany

University of Genoa, Italy

Munich, May 24th, 2017

∗ joint work with E. Ábrahám, N. Jansen, L. Pulina, A. Tacchella, S. Vuotto

Why?

[motivating images; credits: www.businessinsider.com, www.science.howstuffworks.com]

What?

. . . aka my Ph.D. in 3 questions:

Q1. How to learn formal models of robots' behaviors that can be subject to automated verification?

Q2. How to employ formal methods for the automated construction of safe and effective strategies?

Q3. How to detect and repair faulty models?

How?

[pipeline diagram] Learning produces a model M, which undergoes static verification: Safe? Once verified, the model is deployed under online monitoring, which asks: Discrepancies? If discrepancies arise, we ask: Beyond repair? If not, M is repaired and sent back to verification; otherwise the system is shut down.

Case Studies

Let's get down to business:

Problem 1: Verifying learning systems
- Verifying Support Vector Machines via SMT solving [1, 2]

Problem 2: Safety at the deliberative level
- Safe Standing Up for Humanoid Robots via Model Checking [3]

Verifying Support Vector Machines via SMT solving


Learning in Physical Domains

- Machine learning is pervasive, with several applications to cyber-physical systems (CPS)
- In this context, data samples are expensive
- CPS are safety-critical

Central Question
[figure omitted; images courtesy of scandinavianstudy.com, wikipedia.com]

How?

From domain interaction... ⇒ infer automatically (active learning) ⇒ ... models as Support Vector Machines.

Kernel machines are funny beasts!
- Statistical guarantees only (at best)
- R → R functions ⇒ no (easy) verification algorithms

How? - cont'd

From concrete machines... ⇒ extract (automatically) ⇒ ... conservative abstractions.

Abstractions can be model checked!
- Quantifier-Free Linear Arithmetic over R (QF_LRA)
- Concrete machine is safe if the abstract one is safe

Building the abstraction

Support Vector Regression: y = ∑_{i=1}^{l} (α_i − α*_i) K(x_i, x) + b

Radial Basis Function: K(x_i, x) = e^(−‖x − x_i‖² / (2σ²))

With grid step p, each Gaussian G is bounded on every cell [x, x + p] of the grid over [x0, x1]:
- [min(G(x), G(x + p)), G(µ)] if the cell contains the peak µ
- [min(G(x), G(x + p)), max(G(x), G(x + p))] if the cell lies inside [x0, x1] (G is monotone there)
- [0, G(x0)] otherwise (outside [x0, x1])

And now...
- QF_LRA encoding for each K_i
- Can be verified with SMT ✓

if (x ≤ x0) then (K_i ≥ 0) ∧ (K_i ≤ G(x0))
. . .
if (0 < x) ∧ (x ≤ 0.5) then (K_i ≥ min(G(0), G(0.5))) ∧ (K_i ≤ max(G(0), G(0.5)))
. . .
if (x1 < x) then (K_i ≥ 0) ∧ (K_i ≤ G(x0))

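To make the construction concrete, here is a minimal sketch of how such a guarded interval encoding could be generated for a one-dimensional RBF kernel; the helper names (rbf, kernel_bounds, to_smtlib) and the grid handling are illustrative assumptions, not the actual implementation behind [1, 2]:

import math

def rbf(x, mu, sigma):
    """Gaussian kernel value G(x) = exp(-(x - mu)^2 / (2 sigma^2))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def kernel_bounds(mu, sigma, x0, x1, p):
    """Interval bounds [lo, hi] for G on each grid cell [x, x + p].
    Mirrors the slide: the cell containing the peak mu is capped by G(mu);
    all other cells inside [x0, x1] use their endpoint values."""
    cells, x = [], x0
    while x < x1:
        g_l, g_r = rbf(x, mu, sigma), rbf(x + p, mu, sigma)
        if x <= mu <= x + p:              # peak inside the cell
            lo, hi = min(g_l, g_r), 1.0   # G(mu) = 1 for the Gaussian
        else:                             # G is monotone on the cell
            lo, hi = min(g_l, g_r), max(g_l, g_r)
        cells.append((x, x + p, lo, hi))
        x += p
    return cells

def to_smtlib(cells, x0, x1, mu, sigma, name="k"):
    """Emit QF_LRA assertions bounding the kernel variable `name`."""
    g0 = rbf(x0, mu, sigma)
    out = ["(set-logic QF_LRA)",
           "(declare-fun x () Real)",
           f"(declare-fun {name} () Real)",
           f"(assert (=> (<= x {x0}) (and (>= {name} 0) (<= {name} {g0}))))"]
    for (l, r, lo, hi) in cells:
        out.append(f"(assert (=> (and (> x {l}) (<= x {r}))"
                   f" (and (>= {name} {lo}) (<= {name} {hi}))))")
    out.append(f"(assert (=> (> x {x1}) (and (>= {name} 0) (<= {name} {g0}))))")
    return "\n".join(out)

print(to_smtlib(kernel_bounds(2.0, 0.5, 0.0, 4.0, 0.5), 0.0, 4.0, 2.0, 0.5))

Feeding such assertions, together with the linear combination y = ∑_i (α_i − α*_i) · K_i + b, to an SMT solver gives a sound check on the abstract machine.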

What do we verify?

We are interested in the stability of the SVR ς, i.e.,

∀x1, x2 ∈ I : ||x1 − x2|| ≤ δ → ||ς(x1) − ς(x2)|| ≤ ε

and compare the results with Gaussian Process Regression (GPR).

Why? GPR is a higher-order kernel method natively providing statistical bounds on predictions.

SVR + SMT:
  Samples       MAE     p     ε     CPU Time (s)
  10 samples    0.036   0.09  0.38    465.201
  20 samples    0.032   0.09  0.40   2041.645
  40 samples    0.030   0.06  0.42   7480.110

GP:
  Samples       MAE     σ      Time (s)
  10 samples    0.106   0.291    3.336
  20 samples    0.095   0.303    6.805
  40 samples    0.099   0.280   14.035

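A hedged sketch of how this stability query can be posed to an SMT solver via its negation, here with the z3-solver Python bindings; the `cells` structure comes from the hypothetical kernel_bounds helper above, and constraining the abstract output directly through those interval bounds is a simplification of the full encoding:

from z3 import Real, Solver, And, Or, Implies, unsat  # pip install z3-solver

def bounded(x, y, cells, g0, x0, x1):
    """Constrain y to the abstraction's interval bounds at input x."""
    cs = [Implies(x <= x0, And(y >= 0, y <= g0)),
          Implies(x > x1, And(y >= 0, y <= g0))]
    for (l, r, lo, hi) in cells:
        cs.append(Implies(And(x > l, x <= r), And(y >= lo, y <= hi)))
    return And(cs)

def stable(cells, g0, x0, x1, delta, eps):
    """UNSAT of the negated property means the abstraction is stable;
    by conservativeness the concrete SVR is then stable too."""
    xa, xb, ya, yb = Real('xa'), Real('xb'), Real('ya'), Real('yb')
    s = Solver()
    s.add(bounded(xa, ya, cells, g0, x0, x1),
          bounded(xb, yb, cells, g0, x0, x1))
    s.add(xa >= x0, xa <= x1, xb >= x0, xb <= x1)   # x1, x2 in I
    s.add(xa - xb <= delta, xb - xa <= delta)       # ||x1 - x2|| <= delta
    s.add(Or(ya - yb > eps, yb - ya > eps))         # violation: ||ς(x1) - ς(x2)|| > eps
    return s.check() == unsat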

Safe Standing Up for Humanoid Robots via Model Checking


Standing up for humanoid robots

- Bipedal locomotion is a challenging task for a humanoid robot
- Reliable standing-up routines are fundamental in case of a fall
- Conventional motion planning is difficult to apply
- Scripted strategies are often used:
  - they lack flexibility (by definition)
  - they raise reliability and robustness issues
  - writing them is a daunting task
- Learning offers an elegant solution

Objectives

Problem: Synthesize a standing-up procedure that minimizes the expected number of falls, self-collisions and actions.

[figure: simulated Bioloid humanoid in V-REP]

Reinforcement learning

Goal: Learn an optimal strategy for a non-deterministic probabilistic system

Given:
- state set S, initial state s_init
- action set Act
- a possibility to observe the successor state when executing a given action in a given state
- a reward function R : S × Act × S → R

Method: Q-learning

Q-learning: Learning through simulation


Q-learning on an example

[diagram: taking action a1 in state s0 leads to state s1]

Rewards:
  R         s0    s1    s2   . . .
  (s0, a0)  -10   100   -50
  (s0, a1)  -10   100   -50
  (s1, a0)  -50   -10   100
  (s1, a1)  -50   -10   100
  . . .

Q-matrix, initially all zeros:
  Q    a0   a1   a2   . . .
  s0   0    0    0
  s1   0    0    0
  s2   0    0    0
  . . .

Taking a1 in s0 and observing s1 (reward 100) triggers one update with learning rate 0.5 and discount factor 1:

Q_{k+1}(s0, a1) = 0.5 · Q_k(s0, a1) + 0.5 · (100 + 1 · max_{ai ∈ Act} Q_k(s1, ai))

which sets Q(s0, a1) = 50:
  Q    a0   a1   a2   . . .
  s0   0    50   0
  s1   0    0    0
  s2   0    0    0
  . . .

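A minimal tabular Q-learning sketch reproducing this update; the step function is a stand-in for observing the simulator, and the reward excerpt covers only the transition shown (α = 0.5, γ = 1 as on the slide):

from collections import defaultdict

ALPHA, GAMMA = 0.5, 1.0            # learning rate and discount from the slide
R = {('s0', 'a1', 's1'): 100}      # reward excerpt; -10 as a stand-in default

def step(state, action):
    """Stand-in for observing the successor state in the simulator."""
    return 's1' if (state, action) == ('s0', 'a1') else 's0'

Q = defaultdict(float)             # Q-matrix, implicitly all zeros
actions = ['a0', 'a1', 'a2']

def update(s, a):
    s_next = step(s, a)
    r = R.get((s, a, s_next), -10)
    best_next = max(Q[(s_next, ap)] for ap in actions)
    # Q_{k+1}(s, a) = (1 - alpha) Q_k(s, a) + alpha (r + gamma max_a' Q_k(s', a'))
    Q[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * (r + GAMMA * best_next)
    return s_next

update('s0', 'a1')
print(Q[('s0', 'a1')])             # 50.0, matching the slide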

Q-learning: The action space

The robot has 18 joints → intractable action space

Simplifying assumptions:
- some joints are inhibited
- joints operate symmetrically
- the action space is discretized

We end up with 730 actions:
- 3 upper-limb and 3 lower-limb joint groups, 3 actions each → action space {−1, 0, 1}^6, i.e. 3^6 = 729 actions
- one additional action a_restart for safe restart

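Spelled out as a sketch (names illustrative):

from itertools import product

# one discrete command per symmetric joint group: {-1, 0, 1}^6
joint_actions = list(product((-1, 0, 1), repeat=6))
actions = joint_actions + ['a_restart']   # plus the safe-restart action
assert len(actions) == 730                # 3**6 + 1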

Q-learning: The state space

Robot states: s = (x, y, z, q0, q1, q2, q3, ρ1, . . . , ρ18) ∈ R^25

Infinite state space! Full grid discretization is infeasible.

- Input: a scripted trace A = (a^A_0, . . . , a^A_k) for standing up
- Explore states in a "tube" around A: from s_init, follow the scripted prefix a^A_0 . . . a^A_i, then branch over all a ∈ Act (see the sketch below)
- Discretizing the states reachable this way yields 17614 states

Still, several adaptations of Q-learning were needed to achieve convergence.

Several additional paths to the goal could be identified (some even shorter).

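A hedged sketch of this tube exploration; step and discretize are stand-ins for the simulator and the state discretization:

def tube_states(s_init, scripted, actions, step, discretize):
    """Enumerate the "tube" around the scripted trace A: at every point of
    the scripted prefix, branch once over all actions, then continue along
    the script."""
    seen = {discretize(s_init)}
    s = s_init
    for a_script in scripted:
        for a in actions:                 # one-step deviations from A
            seen.add(discretize(step(s, a)))
        s = step(s, a_script)             # follow the script
        seen.add(discretize(s))
    return seen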

Static and online methods: Our framework

Wait but... how to guarantee that our properties of interest are satisfied? That's why we combine learning with static analysis and online monitoring:

State space generation → Q-learning → Model generation → Greedy model repair → Online monitoring

A stable strategy σ is passed from Q-learning through model generation; greedy repair produces a safe stable strategy σ; online monitoring feeds new observations M and the current strategy σ back into model generation.

Model repair: Idea

How can we adapt schedulers to satisfy certain safety requirements?

- Collect statistical information during Q-learning
- Compute a Markov decision process (MDP) model of the robot
- Abstract the scheduler → parametric DTMC
- Instantiate the parametric DTMC model by the scheduler from Q-learning
- Check safety by probabilistic model checking
- Repair the scheduler if unsafe

[diagram: in state s the MDP offers action a1, leading to its two successors with probabilities 0.2 and 0.8, and action a2, with probabilities 0.6 and 0.4; the parametric DTMC weights these edges by σ(s, a1) and σ(s, a2); the learned scheduler instantiates σ(s, a1) = 0.3, σ(s, a2) = 0.7, and repair shifts this to 0.28 and 0.72 once the weighted reachability probabilities Pr_{si}(♦B) of the bad states B exceed the safety threshold]

  Reachability probability         s_fall    s_coll       s_far
  in model before repair           0.001     0.005        0.048
  in simulation before repair      0         0.003        0.046
  in model after repair            0.0003    6.8 · 10^−6  0.02
  in simulation after repair       0         0            0

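A hedged sketch of the check-and-repair loop on the induced DTMC: value iteration approximates the probability of reaching the bad states under a memoryless stochastic scheduler, and a greedy step shifts scheduler mass away from the most dangerous action. The helper names and the specific greedy rule are illustrative, not the actual repair algorithm; in practice a probabilistic model checker performs the analysis:

def reach_prob(P, sigma, bad, iters=1000):
    """Value iteration for Pr_s(eventually bad) in the DTMC induced by
    scheduler sigma on MDP P, where P[s][a] = [(s_next, prob), ...]."""
    pr = {s: 1.0 if s in bad else 0.0 for s in P}
    for _ in range(iters):
        for s in P:
            if s in bad:
                continue
            pr[s] = sum(sigma[s][a] * p * pr.get(t, 0.0)
                        for a in P[s] for (t, p) in P[s][a])
    return pr

def greedy_repair(P, sigma, bad, s, shift=0.02):
    """Move `shift` probability mass in s from the most dangerous action
    to the safest one (illustrative repair step)."""
    pr = reach_prob(P, sigma, bad)
    danger = {a: sum(p * pr.get(t, 0.0) for (t, p) in P[s][a]) for a in P[s]}
    worst, best = max(danger, key=danger.get), min(danger, key=danger.get)
    sigma[s][worst] -= shift
    sigma[s][best] += shift

P = {'s':   {'a1': [('bad', 0.2), ('ok', 0.8)], 'a2': [('ok', 0.6), ('bad', 0.4)]},
     'ok':  {'stay': [('ok', 1.0)]},
     'bad': {'stay': [('bad', 1.0)]}}
sigma = {'s': {'a1': 0.3, 'a2': 0.7}, 'ok': {'stay': 1.0}, 'bad': {'stay': 1.0}}
greedy_repair(P, sigma, {'bad'}, 's')   # shifts mass from a2 towards a1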

Online monitoring

So now we deploy our safe, repaired strategy on the real robot and everything should be fine, right?

WRONG

What if the assumptions on which the model was built change?
- environmental changes, robot failures, . . .

Looks like this is a problem we could solve using. . . Online Monitoring

Online monitoring

- We collect statistical observations during deployment
- From time to time, we update the MDP model with the new observations
- We model check and repair the scheduler if needed (a minimal sketch follows below)

We simulated that a part of the robot was broken:
- Out of 300 simulation episodes, only 2 reached the goal state
- After a feedback loop, in a further 300 episodes, 197 reached the goal

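A hedged sketch of this feedback loop, reusing the hypothetical reach_prob and greedy_repair helpers from the repair sketch; the maximum-likelihood re-estimation and the threshold are assumptions:

from collections import Counter, defaultdict

counts = defaultdict(Counter)      # (s, a) -> Counter of observed successors

def observe(s, a, s_next):
    """Record one transition seen during deployment."""
    counts[(s, a)][s_next] += 1

def refit_mdp():
    """Maximum-likelihood re-estimate of the MDP from deployment data."""
    P = defaultdict(dict)
    for (s, a), succ in counts.items():
        total = sum(succ.values())
        P[s][a] = [(t, n / total) for t, n in succ.items()]
    return P

def monitor_step(sigma, bad, s_init, threshold=0.01):
    """From time to time: rebuild the model, re-check, repair if unsafe."""
    P = refit_mdp()
    if reach_prob(P, sigma, bad).get(s_init, 0.0) > threshold:
        greedy_repair(P, sigma, bad, s_init)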

Thank you for your attention

In case you still want to speak with me after this talk, you can find me at:

[email protected]

[email protected]

References:

[1] Francesco Leofante, Luca Pulina, and Armando Tacchella. Learning with safety requirements: State of the art and open questions. In Proc. of RCRA@AI*IA 2016, pages 11–25, 2016.

[2] Francesco Leofante and Armando Tacchella. Learning in physical domains: Mating safety requirements and costly sampling. In Proc. of AI*IA 2016, pages 539–552, 2016.

[3] Francesco Leofante, Simone Vuotto, Erika Ábrahám, Armando Tacchella, and Nils Jansen. Combining static and runtime methods to achieve safe standing-up for humanoid robots. In Proc. of ISoLA 2016, Part I, pages 496–514, 2016.