
Page 1

Tutorial on Safe Reinforcement Learning

Felix Berkenkamp, Andreas Krause

@EWRL, October 1 2018

Page 2

Reinforcement Learning (RL)

Need to trade off exploration & exploitation.

Reinforcement Learning: An Introduction
R. Sutton, A.G. Barto, 1998

[Figure: agent-environment loop. The agent takes actions; the environment returns states and rewards.]

Page 3

Towards Safe RL

How can we learn to act safely in unknown environments?

Page 4

Therapeutic Spinal Cord Stimulation

S. Harkema, The Lancet, Elsevier

Safe Exploration for Optimization with Gaussian Processes
Y. Sui, A. Gotovos, J. W. Burdick, A. Krause

Stagewise Safe Bayesian Optimization with Gaussian Processes
Y. Sui, V. Zhuang, J. W. Burdick, Y. Yue

Page 5

Safe Controller Tuning

Safe Controller Optimization for Quadrotors with Gaussian Processes
F. Berkenkamp, A. P. Schoellig, A. Krause, ICRA, 2016

Page 6

Outline

Specifying safety requirements and quantifying risk

Acting safely in known environments

Acting safely in unknown environments

Safe exploration (model-free and model-based)

Page 7

Specifying safe behavior

Is this trajectory safe?

Monitoring temporal properties of continuous signals
O. Maler, D. Nickovic, FT, 2004

Safe Control under Uncertainty
D. Sadigh, A. Kapoor, RSS, 2016

Page 8

What does it mean to be safe?

Safety ≅ avoiding bad trajectories (states/actions)

Fix a policy: how do I quantify uncertainty and risk?

[Figure: agent-environment loop]

Page 9

Stochastic environment / policy

The safety function is now a random variable.

Expected safety: take the expectation of the safety function over trajectories.

Page 10

Expected safety can be misleading

Safe in expectation!

[Figure: distribution of the safety function around the threshold 0]

Page 11

Expected safety and variance

Page 12

Risk sensitivity

Even at low variance, a significant fraction of trajectories may still be unsafe.

Page 13

Value at Risk

Use a confidence lower bound instead!

Page 14

Conditional Value at Risk
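The VaR and CVaR notions from the two preceding slides can be estimated empirically from Monte Carlo rollouts. Below is a minimal illustrative sketch; the function names, the sign convention (larger safety values are better, so risk sits in the lower tail), and the sample distribution are assumptions for illustration, not taken from the slides.

```python
import numpy as np

def value_at_risk(samples, alpha=0.05):
    """Empirical VaR at level alpha: the alpha-quantile of the safety values.

    With probability at least 1 - alpha, the safety value lies above this number
    (assuming larger values are safer, so risky outcomes are in the lower tail).
    """
    return np.quantile(samples, alpha)

def conditional_value_at_risk(samples, alpha=0.05):
    """Empirical CVaR at level alpha: mean over the worst alpha-fraction of outcomes."""
    var = value_at_risk(samples, alpha)
    tail = samples[samples <= var]
    return tail.mean()

# Example: Monte Carlo rollouts of a hypothetical stochastic policy.
rng = np.random.default_rng(0)
safety_values = rng.normal(loc=1.0, scale=1.2, size=10_000)  # per-trajectory safety

print("expected safety:", safety_values.mean())                # can look fine ...
print("VaR(5%):       ", value_at_risk(safety_values))          # ... while the tail is unsafe
print("CVaR(5%):      ", conditional_value_at_risk(safety_values))
```

This also illustrates the point of the previous slides: a distribution can be safe in expectation and have low variance, yet still place non-negligible mass on unsafe outcomes.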

Page 15

Worst-case

Page 16

Notions of safety

Stochastic: expected risk, moment penalized, value at risk, conditional value at risk

Worst-case: robust control, formal verification

Page 17

Acting in a known model with safety constraints

Constrained Markov decision processes
Eitan Altman, CRC Press, 1999

Essentials of robust control
Kemin Zhou, John C. Doyle, Prentice Hall, 1998

Robust control of Markov decision processes with uncertain transition matrices
Arnab Nilim, Laurent El Ghaoui, OR, 2005

[Figure: agent-environment loop]

Page 18

Reinforcement Learning

[Figure: agent-environment loop]

Key challenge: we don't know the consequences of actions!

Page 19

How to start acting safely?

No knowledge! Now what?

Page 20

Imitation learning

Learn from a dataset of expert demonstrations.

No experience gained here!

Page 21

Imitation learning algorithms

Data aggregation: generate a state sequence with the current policy.

Policy aggregation: generate a state sequence with the current policy.

Search-based Structured Prediction
Hal Daume III, John Langford, Daniel Marcu, Machine Learning, 2009

Efficient Reductions for Imitation Learning
Stephane Ross, Drew Bagnell, AISTATS, 2010

A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
Stephane Ross, Geoffrey J. Gordon, J. Andrew Bagnell, AISTATS, 2011

Page 22

Safe imitation learning

Only apply the action from the learned policy when a confidence condition is met; otherwise defer to the expert.

EnsembleDAgger: A Bayesian Approach to Safe Imitation Learning
Kunal Menda, Katherine Driggs-Campbell, Mykel J. Kochenderfer, arXiv, 2018

Query-Efficient Imitation Learning for End-to-End Autonomous Driving
Jiakai Zhang, Kyunghyun Cho, AAAI, 2017
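A minimal sketch of this gating idea, assuming an ensemble of learned policies and an expert fallback. The names and the disagreement threshold below are hypothetical placeholders, not the exact rules from the cited papers.

```python
import numpy as np

def gated_action(state, ensemble, expert_policy, disagreement_threshold=0.1):
    """Apply the learner's action only when the ensemble agrees; otherwise query the expert.

    ensemble: list of policies, each mapping state -> action (np.ndarray)
    expert_policy: fallback policy assumed to be safe
    """
    actions = np.stack([policy(state) for policy in ensemble])
    mean_action = actions.mean(axis=0)
    disagreement = actions.std(axis=0).max()  # proxy for the learner's uncertainty

    if disagreement < disagreement_threshold:
        return mean_action, False        # confident: use the learned policy
    return expert_policy(state), True    # uncertain: defer to the expert (and record a query)
```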

Page 23

What to do with this initial policy?

We can find an initial, safe policy based on domain knowledge.

[Figure: agent-environment loop]

How do we improve it?

Page 24

Prior knowledge as backup for learning

The safety controller takes over when needed: we know what is safe, and the learner is treated as a disturbance.

Provably safe and robust learning-based model predictive control
A. Aswani, H. Gonzalez, S.S. Sastry, C. Tomlin, Automatica, 2013

Safe Reinforcement Learning via Shielding
M. Alshiekh, R. Bloem, R. Ehlers, B. Könighofer, S. Niekum, U. Topcu, AAAI, 2018

Linear Model Predictive Safety Certification for Learning-based Control
K.P. Wabersich, M.N. Zeilinger, CDC, 2018

Safe Exploration of State and Action Spaces in Reinforcement Learning
J. Garcia, F. Fernandez, JAIR, 2012

Safe Exploration in Continuous Action Spaces
G. Dalal, K. Dvijotham, M. Vecerik, T. Hester, C. Paduraru, Y. Tassa, arXiv, 2018
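A minimal sketch of the backup idea, assuming we are given a one-step safety check and a backup controller known to be safe. Both are placeholders for illustration; this is not a specific method from the cited papers.

```python
class ShieldedAgent:
    """Wrap a learning policy with a known-safe backup controller.

    next_state_is_safe(state, action) is assumed to be a conservative check,
    e.g. based on a known model or a precomputed safe set.
    """

    def __init__(self, learner, backup_controller, next_state_is_safe):
        self.learner = learner
        self.backup = backup_controller
        self.next_state_is_safe = next_state_is_safe

    def act(self, state):
        action = self.learner(state)
        if self.next_state_is_safe(state, action):
            return action          # learner acts freely inside the safe region
        return self.backup(state)  # safety controller takes over
```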

Page 25

Prior knowledge as backup for learning

The safety controller takes over; we know what is safe, and the learner is treated as a disturbance.

Need to know what is unsafe in advance.

Without learning, we need significant prior knowledge.

The learner does not know what's happening!

Page 26

Safety as improvement in performance (expected safety)

[Di Castro, Mannor, ICML 2012]

Performance objective, an initial stochastic policy, and a safety constraint requiring that the new policy does not perform worse than the initial one.

Need to estimate the performance of a new policy based only on data from the initial policy.

Page 27

Off-Policy Policy Evaluation

Given trajectories from one policy, what does this tell me about a different policy?

Importance sampling: reweight each trajectory by the ratio of its probability under the new policy to its probability under the behavior policy. (There are better ways to do this.)

Eligibility Traces for Off-Policy Policy Evaluation
Doina Precup, Richard S. Sutton, S. Singh, ICML, 2000
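A minimal sketch of the per-trajectory importance-sampling estimator described above. The trajectory format and the function names are assumptions for illustration.

```python
import numpy as np

def importance_weight(trajectory, behavior_policy, target_policy):
    """Product over time steps of pi_new(a_t | s_t) / pi_old(a_t | s_t).

    trajectory: list of (state, action, reward) tuples collected under behavior_policy.
    Each policy maps (state, action) -> probability of taking that action.
    """
    weight = 1.0
    for state, action, _ in trajectory:
        weight *= target_policy(state, action) / behavior_policy(state, action)
    return weight

def off_policy_return_estimates(trajectories, behavior_policy, target_policy):
    """Unbiased importance-sampling estimates of the target policy's return."""
    estimates = []
    for trajectory in trajectories:
        total_return = sum(reward for _, _, reward in trajectory)
        estimates.append(importance_weight(trajectory, behavior_policy, target_policy) * total_return)
    return np.array(estimates)
```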

Page 28

Guaranteeing improvement

Generate trajectories using the current (safe) policy.

Importance sampling gives an unbiased estimate of the new policy's performance, but how far can the estimate deviate from the true value?

Use a concentration inequality to obtain confidence intervals: with probability at least 1 - δ, the true performance lies above the empirical estimate minus a confidence term.
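An illustrative sketch of such a test, assuming bounded per-trajectory importance-weighted returns. Hoeffding's inequality is one concrete choice of concentration inequality, used here for illustration rather than as the bound from the cited work.

```python
import numpy as np

def hoeffding_lower_bound(samples, value_range, delta=0.05):
    """High-confidence lower bound on the mean of bounded i.i.d. samples.

    With probability at least 1 - delta, the true mean lies above this bound.
    value_range: an upper bound on (max possible value - min possible value).
    """
    n = len(samples)
    confidence_width = value_range * np.sqrt(np.log(1.0 / delta) / (2.0 * n))
    return np.mean(samples) - confidence_width

def approve_policy(is_estimates, baseline_performance, value_range, delta=0.05):
    """Deploy the candidate policy only if, with probability >= 1 - delta, its
    performance is at least as good as the current baseline."""
    return hoeffding_lower_bound(is_estimates, value_range, delta) >= baseline_performance
```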

Page 29

Overview of expected safety pipeline

Split the trajectory data: use a training set to propose a candidate policy, evaluate its safety on a held-out test set, and use the new policy only if it is certified safe.

High Confidence Policy Improvement
Philip S. Thomas, Georgios Theocharous, Mohammad Ghavamzadeh, ICML, 2015

Safe and efficient off-policy reinforcement learning
Remi Munos, Thomas Stepleton, Anna Harutyunyan, Marc G. Bellemare, NIPS, 2016

Constrained Policy Optimization
Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel, ICML, 2017

Page 30

Summary part one

Reviewed safety definitions:
- Stochastic: expected risk, moment penalized, VaR / CVaR
- Worst-case: formal verification, robust optimization

Saw how to obtain a first, safe policy

Reviewed a first method for safe learning in expectation

Second half: explicit safe exploration
- More model-free safe exploration
- Model-based safe exploration without ergodicity

Page 31

Reinforcement learning (recap)

[Figure: loop of policy, exploration, and policy update. Image: Plainicon, https://flaticon.com]

Page 32

Safe reinforcement learning

Statistical models to guarantee safety:
- Model-free: estimate and optimize
- Model-based: estimate/identify, then plan/control

Image: Plainicon, https://flaticon.com

Page 33

Model-free reinforcement learning

Objective: tracking performance, subject to a safety constraint.

Challenges: only few, noisy experiments; safety must hold for all experiments.

Page 34

Safe policy optimization

We observe a (noisy) reward J(θ) and a (noisy) constraint g(θ) for each policy parameter θ.

Goal: maximize J(θ) subject to the constraint staying above the safety threshold, with probability ≥ 1 - δ for every experiment.

Page 35

Safe policy optimization illustration

[Figure: objective J(𝛉) and constraint g(𝛉) over parameters 𝛉, with a safety threshold, a safe seed, the reachable optimum, and the global optimum]

Page 36

Starting Point: Bayesian Optimization

Acquisition functions:
- Expected / most probable improvement [Močkus et al. '78, '89]
- Information gain about the maximum [Villemonteix et al. '09]
- Knowledge gradient [Powell et al. '10]
- Predictive Entropy Search [Hernández-Lobato et al. '14]
- TruVaR [Bogunovic et al. '17]
- Max-Value Entropy Search [Wang et al. '17]
- Constraints / multiple objectives [Snoek et al. '13, Gelbart et al. '14, Gardner et al. '14, Zuluaga et al. '16]

Pages 37-41

Gaussian process

(figure builds only)
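These pages are figure-only in the original. As a minimal illustrative sketch of the underlying computation, here is Gaussian process regression with an RBF kernel and the confidence intervals used throughout the tutorial; the kernel choice, hyperparameters, and the scaling factor beta are assumptions for illustration.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(a, b) for 1-D inputs."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

def gp_posterior(x_train, y_train, x_test, noise_std=0.1):
    """Posterior mean and standard deviation of a zero-mean GP at x_test."""
    k_xx = rbf_kernel(x_train, x_train) + noise_std**2 * np.eye(len(x_train))
    k_xs = rbf_kernel(x_train, x_test)
    k_ss = rbf_kernel(x_test, x_test)
    k_inv_y = np.linalg.solve(k_xx, y_train)
    k_inv_ks = np.linalg.solve(k_xx, k_xs)
    mean = k_xs.T @ k_inv_y
    cov = k_ss - k_xs.T @ k_inv_ks
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

# Confidence intervals mean ± beta * std are the building block for safe exploration.
x_train = np.array([-1.0, 0.0, 1.5])
y_train = np.array([0.2, 0.8, -0.3])
x_test = np.linspace(-3, 3, 200)
mean, std = gp_posterior(x_train, y_train, x_test)
beta = 2.0
lower, upper = mean - beta * std, mean + beta * std
```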

Pages 42-56

SafeOPT: Constrained Bayesian optimization

(figure builds only)
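The original slides walk through SafeOpt graphically. The following is a rough sketch of the core loop over a finite candidate set, reusing the gp_posterior helper from the sketch above. It is a simplified rendering of the idea; it treats a single function as both objective and constraint and omits the Lipschitz-based expansion step and the maximizer/expander distinction of the published algorithm.

```python
import numpy as np

def safe_opt_step(candidates, x_data, y_data, safety_threshold, beta=2.0, noise_std=0.1):
    """One simplified SafeOpt-style iteration.

    Returns the next parameter to evaluate, chosen as the most uncertain point
    within the current high-probability safe set.
    """
    mean, std = gp_posterior(x_data, y_data, candidates, noise_std)
    lower, upper = mean - beta * std, mean + beta * std

    safe = lower >= safety_threshold          # safe set: lower confidence bound above threshold
    if not np.any(safe):
        raise RuntimeError("Safe set is empty; a safe seed is needed to start from.")

    width = upper - lower
    safe_indices = np.flatnonzero(safe)
    next_index = safe_indices[np.argmax(width[safe_indices])]  # most uncertain safe point
    return candidates[next_index]
```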

Page 57

SafeOPT Guarantees

Theorem (informal):
Under suitable conditions on the kernel and on the unknown function, there exists a sample-complexity function t*(ε, δ) such that for any ε > 0 and δ > 0, with probability at least 1 - δ:

1) SafeOpt never makes an unsafe decision
2) After at most t*(ε, δ) iterations, it has found an ε-optimal reachable point

Safe Exploration for Active Learning with Gaussian Processes
J. Schreiter, D. Nguyen-Tuong, M. Eberts, B. Bischoff, H. Markert, M. Toussaint

Safe Exploration for Optimization with Gaussian Processes
Y. Sui, A. Gotovos, J.W. Burdick, A. Krause

Bayesian Optimization with Safety Constraints: Safe and Automatic Parameter Tuning in Robotics
F. Berkenkamp, A.P. Schoellig, A. Krause

Page 58

Video link: http://tiny.cc/icra16_video

Page 59

Page 60

Modelling context

Additional parameters

Page 61

Page 62

Therapeutic Spinal Cord Stimulation

S. Harkema, The Lancet, Elsevier

Stagewise Safe Bayesian Optimization with Gaussian Processes
Y. Sui, V. Zhuang, J. W. Burdick, Y. Yue

Page 63

Multiple sources of information

Simulation is cheap but inaccurate; real experiments are expensive but accurate. Trade off between them automatically.

Virtual vs. Real: Trading Off Simulations and Physical Experiments in Reinforcement Learning with Bayesian Optimization
A. Marco, F. Berkenkamp, P. Hennig, A. Schoellig, A. Krause, S. Schaal, S. Trimpe, ICRA, 2017

Page 64

Modeling this in a Gaussian process

[Figure: joint GP model over parameters, with separate outputs for simulation and experiment]
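A rough sketch of one way to model two information sources in a single GP, along the lines of the Marco et al. reference above: the real experiment is modeled as the simulation plus an additional error term. The specific kernel below is an illustrative assumption, not the exact kernel from the paper.

```python
import numpy as np

def joint_kernel(theta_a, source_a, theta_b, source_b,
                 lengthscale=1.0, sim_variance=1.0, err_variance=0.2):
    """Kernel over (parameters, source) pairs, where source is 0 for simulation
    and 1 for a real experiment.

    k = k_sim(theta_a, theta_b) + [both are experiments] * k_err(theta_a, theta_b),
    so real experiments share the simulation component plus an extra error term.
    """
    sq_dist = np.sum((np.atleast_1d(theta_a) - np.atleast_1d(theta_b)) ** 2)
    k_sim = sim_variance * np.exp(-0.5 * sq_dist / lengthscale**2)
    k_err = err_variance * np.exp(-0.5 * sq_dist / lengthscale**2)
    return k_sim + (source_a == 1) * (source_b == 1) * k_err
```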

Page 65

Video at https://youtu.be/oq9Qgq1Ipp8

Page 66

Safe reinforcement learning

Statistical models to guarantee safety:
- Model-free: estimate and optimize
- Model-based: estimate/identify, then plan/control

Image: Plainicon, https://flaticon.com

Page 67

From bandits to Markov decision processes

Bandit vs. Markov decision process: we can use the same Bayesian model to determine the safety of states.

Page 68

Challenges with long-term action dependencies

Non-ergodic MDP: some mistakes cannot be undone.

Image: Plainicon, VectorsMarket, https://flaticon.com

Page 69

Rendering exploration safe

Exploration: reduce model uncertainty, but only visit states from which the agent can recover safely.

Safe Exploration in Markov Decision Processes
T.M. Moldovan, P. Abbeel, ICML, 2012

Safe Exploration in Finite Markov Decision Processes with Gaussian Processes
M. Turchetta, F. Berkenkamp, A. Krause, NIPS, 2016

Safe Exploration and Optimization of Constrained MDPs using Gaussian Processes
Akifumi Wachi, Yanan Sui, Yisong Yue, Masahiro Ono, AAAI, 2018

Safe Control under Uncertainty
D. Sadigh, A. Kapoor, RSS, 2016
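A small sketch of the recoverability requirement on a finite MDP, assuming deterministic transitions and a per-state safety lower bound (for example from a GP). The data structures and thresholds are illustrative placeholders, not the algorithm from the cited papers.

```python
def recoverable_states(transitions, safety_lower_bound, initial_safe_set, threshold):
    """States that are (i) safe with high probability and (ii) can reach the
    initial safe set again while only passing through such states.

    transitions: dict mapping state -> iterable of successor states (deterministic MDP)
    safety_lower_bound: dict mapping state -> lower confidence bound on its safety value
    """
    candidate = {s for s, lb in safety_lower_bound.items() if lb >= threshold}
    candidate |= set(initial_safe_set)

    # Backward reachability: keep only states with a path back to the initial safe set.
    recoverable = set(initial_safe_set)
    changed = True
    while changed:
        changed = False
        for state in candidate - recoverable:
            if any(succ in recoverable for succ in transitions.get(state, ())):
                recoverable.add(state)
                changed = True
    return recoverable
```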

Page 70

On a real robot

Page 71

From bandits to Markov decision processes

Bandit vs. Markov decision process.

Next: model-based reinforcement learning

Page 72

Reinforcement learning (recap)

[Figure: loop of policy, exploration, and policy update. Image: Plainicon, https://flaticon.com]

Page 73

Model-based reinforcement learning

[Figure: loop of policy, exploration, and model learning. Image: Plainicon, https://flaticon.com]

Page 74

Safe model-based reinforcement learning

[Figure: loop of policy, safe exploration, and statistical model learning. Image: Plainicon, https://flaticon.com]

Page 75

A Bayesian dynamics model

Dynamics: the transition function is unknown and modeled with a Bayesian (e.g., Gaussian process) model.

Page 76

Region of attraction

[Figure: state space with an unsafe region and the region of attraction of the baseline policy]

The baseline policy is safe.

Page 77

Linear case

Uncertainty about the entries of the system matrices. Designing safe controllers for quadratic costs is a convex optimization problem.

Safe and Robust Learning Control with Gaussian Processes
F. Berkenkamp, A.P. Schoellig, ECC, 2015

Regret Bounds for Robust Adaptive Control of the Linear Quadratic Regulator
S. Dean, H. Mania, N. Matni, B. Recht, S. Tu, arXiv, 2018

Page 78: @EWRL, October 1 2018 - las.inf.ethz.ch · Tutorial on Safe Reinforcement Learning Felix Berkenkamp, Andreas Krause @EWRL, October 1 2018. ... 12 Even at low variance, a significant

Outer approximation contains true dynamics for all time steps with probability at least

Forwards-propagating uncertain, nonlinear dynamics

78

Learning-based Model Predictive Control for Safe Exploration T. Koller, F. Berkenkamp, M. Turchetta, A. Krause, CDC, 2018

Felix Berkenkamp, Andreas Krause
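A crude illustrative sketch of forward-propagating an outer approximation, using axis-aligned intervals, a nominal model, a Lipschitz constant, and a per-step high-probability error bound. All of these are simplifying assumptions; the cited work uses more careful ellipsoidal over-approximations.

```python
import numpy as np

def propagate_interval(center, radius, nominal_dynamics, lipschitz_const,
                       confidence_bound, horizon):
    """Propagate a box [center - radius, center + radius] through uncertain dynamics.

    At each step the box is mapped through the nominal model and inflated by
    (i) the worst-case effect of the current box size (via the Lipschitz constant) and
    (ii) a high-probability bound on the model error at the center.
    """
    boxes = [(np.asarray(center, dtype=float), np.asarray(radius, dtype=float))]
    for _ in range(horizon):
        center, radius = boxes[-1]
        next_center = nominal_dynamics(center)
        next_radius = lipschitz_const * radius + confidence_bound(center)
        boxes.append((next_center, next_radius))
    return boxes
```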

Page 79

Region of attraction

[Figure: unsafe region, a safety trajectory, and an exploration trajectory whose first step is the same]

Theorem (informal):
Under suitable conditions, we can always guarantee that we are able to return to the safe set.

Page 80

Model predictive control references

Learning-based Model Predictive Control for Safe Exploration
T. Koller, F. Berkenkamp, M. Turchetta, A. Krause, CDC, 2018

Reachability-Based Safe Learning with Gaussian Processes
A.K. Akametalu, J.F. Fisac, J.H. Gillula, S. Kaynama, M.N. Zeilinger, C.J. Tomlin, CDC, 2014

Robust constrained learning-based NMPC enabling reliable mobile robot path tracking
C.J. Ostafew, A.P. Schoellig, T.D. Barfoot, IJRR, 2016

Data-Efficient Reinforcement Learning with Probabilistic Model Predictive Control
S. Kamthe, M.P. Deisenroth, AISTATS, 2018

Chance Constrained Model Predictive Control
A.T. Schwarm, M. Nikolaou, AIChE, 1999

Page 81

Example

Robust constrained learning-based NMPC enabling reliable mobile robot path tracking
C.J. Ostafew, A.P. Schoellig, T.D. Barfoot, IJRR, 2016

Video at https://youtu.be/3xRNmNv5Efk

Page 82

Region of attraction

[Figure: unsafe region, safety trajectory, and exploration trajectory with the same first step]

Exploration is limited by the size of the safe set!

Page 83

Region of attraction

[Figure: unsafe region and the region of attraction of the initial safe policy]

Theorem (informal):
Under suitable conditions, we can identify a (near-)maximal subset of the state space on which the policy is stable, while never leaving the safe set.

Safe Model-based Reinforcement Learning with Stability Guarantees
F. Berkenkamp, M. Turchetta, A.P. Schoellig, A. Krause, NIPS, 2017

Page 84

Lyapunov functions

[A.M. Lyapunov, 1892]

Lyapunov Design for Safe Reinforcement Learning
T.J. Perkins, A.G. Barto, JMLR, 2002
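A small illustrative sketch of how a Lyapunov function certifies a region of attraction in discrete time: find the largest level set on which the Lyapunov value decreases along closed-loop transitions. The grid of states and the decrease margin are placeholders, and this simplification glosses over the discretization argument used in the NIPS 2017 paper cited on the previous slide.

```python
import numpy as np

def estimated_region_of_attraction(states, dynamics, policy, lyapunov, decrease_margin=0.0):
    """Largest Lyapunov level set on which V decreases along closed-loop transitions.

    states: (n, d) array of discretized states
    dynamics(state, action) -> next state; policy(state) -> action; lyapunov(state) -> float
    """
    values = np.array([lyapunov(x) for x in states])
    next_values = np.array([lyapunov(dynamics(x, policy(x))) for x in states])
    decreases = next_values <= values - decrease_margin

    # Largest level c such that every grid state with V(x) <= c satisfies the decrease condition.
    order = np.argsort(values)
    largest_level = 0.0
    for idx in order:
        if not decreases[idx]:
            break
        largest_level = values[idx]
    return largest_level, values <= largest_level
```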

Page 85

Lyapunov functions

Page 86

Illustration of safe learning

[Figure: policy and its estimated region of attraction]

Need to explore safely!

Safe Model-based Reinforcement Learning with Stability Guarantees
F. Berkenkamp, M. Turchetta, A.P. Schoellig, A. Krause, NIPS, 2017

Page 87

Illustration of safe learning

[Figure: policy and its estimated region of attraction]

Safe Model-based Reinforcement Learning with Stability Guarantees
F. Berkenkamp, M. Turchetta, A.P. Schoellig, A. Krause, NIPS, 2017

Page 88

Lyapunov function

Finding the right Lyapunov function is difficult!

Lyapunov neural network: weights are positive definite, nonlinearities have a trivial nullspace, and the decision boundary (level set) is learned.

The Lyapunov Neural Network: Adaptive Stability Certification for Safe Learning of Dynamic Systems
S.M. Richards, F. Berkenkamp, A. Krause
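A minimal sketch of why these two structural choices yield a positive-definite function: if every layer's weight matrix has full rank (enforced here as G^T G + eps*I) and the activation is zero only at zero (e.g. tanh), then the feature map vanishes only at the origin, so V(x) = ||phi(x)||^2 is a valid Lyapunov candidate. The construction below is an illustrative simplification, not the exact parameterization from the paper.

```python
import numpy as np

class LyapunovCandidateNetwork:
    """V(x) = ||phi(x)||^2 with phi built so that phi(x) = 0 only at x = 0."""

    def __init__(self, dim, num_layers=2, eps=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        # Each weight matrix G^T G + eps*I is positive definite, hence full rank, so the
        # layer map x -> tanh(W x) has a trivial nullspace (tanh(z) = 0 iff z = 0).
        self.weights = []
        for _ in range(num_layers):
            g = rng.normal(size=(dim, dim))
            self.weights.append(g.T @ g + eps * np.eye(dim))

    def features(self, x):
        h = np.asarray(x, dtype=float)
        for w in self.weights:
            h = np.tanh(w @ h)
        return h

    def __call__(self, x):
        phi = self.features(x)
        return float(phi @ phi)  # positive definite: zero only at the origin


# The candidate is then trained so that it decreases along observed closed-loop transitions.
V = LyapunovCandidateNetwork(dim=2)
print(V(np.zeros(2)), V(np.array([0.5, -0.3])))  # 0.0 at the origin, positive elsewhere
```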

Page 89

Towards safe reinforcement learning

[Figure: unsafe region, safety trajectory, and exploration trajectory with the same first step]

How to verify the safety of a given policy?

Page 90

Summary

Reviewed safety definitions:
- Stochastic: expected risk, moment penalized, VaR / CVaR
- Worst-case: formal verification, robust optimization

Saw how to obtain a first, safe policy

Reviewed a first method for safe learning in expectation

Safe Bayesian optimization for safe exploration

How to transfer this intuition to safe exploration in MDPs

Model-based methods (reachability = safety, certification, exploration)

Page 91

Where to go from here?

Safe Reinforcement Learning sits at the intersection of machine learning, control theory, statistics, decision theory, and formal methods.

Open problems:
- Scalability (computational & statistical)
- Safe imitation learning
- Tradeoff between safety and performance (theory & practice)
- Lower bounds; define function classes that are safely learnable