
Supervised Learning of Behaviors

CS 285: Deep Reinforcement Learning, Decision Making, and Control

Sergey Levine

Class Notes

1. Homework 1 is out this evening

2. Remember to start forming final project groups
• Final project assignment document is now out!

• Proposal due Sep 25

Today’s Lecture

1. Definition of sequential decision problems

2. Imitation learning: supervised learning for decision making
a. Does direct imitation work?

b. How can we make it work more often?

3. A little bit of theory

4. Case studies of recent work in (deep) imitation learning

• Goals:
• Understand definitions & notation

• Understand basic imitation learning algorithms

• Understand tools for theoretical analysis

Terminology & notation

[Slide figure: an example observation with three candidate actions: 1. run away, 2. ignore, 3. pet]

Aside: notation

Two notational traditions: Richard Bellman writes s_t for state and a_t for action; Lev Pontryagin writes x_t for state and u_t for action, from the Russian «управление» ("control").

Imitation Learning

[Figure: training data (o_t, a_t pairs) → supervised learning → π_θ(a_t | o_t), i.e. "behavioral cloning". Images: Bojarski et al. ‘16, NVIDIA]
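Concretely, behavioral cloning is ordinary supervised learning on (o_t, a_t) pairs. Below is a minimal sketch in PyTorch; the network sizes, the `demos` dataset, and the MSE loss are illustrative assumptions, not the exact setup from the lecture.

    import torch

    # Behavioral cloning: fit pi_theta(a_t | o_t) by supervised learning on
    # expert (observation, action) pairs. All sizes and data are stand-ins.
    obs_dim, act_dim = 32, 4
    policy = torch.nn.Sequential(
        torch.nn.Linear(obs_dim, 64), torch.nn.ReLU(),
        torch.nn.Linear(64, act_dim),
    )
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

    # Stand-in for recorded demonstrations; in practice these come from a human.
    demos = [(torch.randn(obs_dim), torch.randn(act_dim)) for _ in range(1000)]
    loader = torch.utils.data.DataLoader(demos, batch_size=64, shuffle=True)

    for epoch in range(10):
        for obs, act in loader:
            loss = ((policy(obs) - act) ** 2).mean()  # regress onto expert actions
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()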

Does it work? No! In general, small mistakes compound: the learned policy drifts into states the training data never covered (the distribution mismatch problem).

Does it work? Yes! Sometimes it works well in practice, as in the NVIDIA driving demo below.

Video: Bojarski et al. ‘16, NVIDIA

Why did that work?

Bojarski et al. ‘16, NVIDIA: in addition to the center camera, they recorded left- and right-facing cameras labeled with corrective steering, so the training data teaches the policy how to recover from small drifts.

Can we make it work more often?

The lecture frames this around two ideas: the cost that a mistake incurs, and stability of the trajectory distribution (more on this later). One practical answer is to collect training data that covers mistakes and their corrections, which leads to DAgger.

DAgger: Dataset Aggregation

Ross et al. ‘11
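In pseudocode, the four steps of DAgger look like the loop below; `train`, `run_policy`, and `expert_label` are hypothetical helpers standing in for supervised learning, rollout collection, and human labeling.

    # DAgger (Ross et al. '11). train(), run_policy(), and expert_label()
    # are hypothetical helpers for the slide's four steps.
    def dagger(expert_data, n_iters):
        dataset = list(expert_data)               # human demonstration data D
        policy = train(dataset)                   # 1. train pi_theta(a_t | o_t) on D
        for _ in range(n_iters):
            observations = run_policy(policy)     # 2. run pi_theta to get on-policy observations D_pi
            labeled = expert_label(observations)  # 3. ask a human to label D_pi with actions a_t
            dataset += labeled                    # 4. aggregate: D <- D ∪ D_pi
            policy = train(dataset)               # retrain and repeat
        return policy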

DAgger Example

Ross et al. ‘11

What’s the problem? Step 3: asking a human to label the on-policy observations with actions is unnatural and labor-intensive.

Ross et al. ‘11

Can we make it work without more data?

• DAgger addresses the problem of distributional “drift”

• What if our model is so good that it doesn’t drift?

• Need to mimic expert behavior very accurately

• But don’t overfit!

Why might we fail to fit the expert?

1. Non-Markovian behavior

2. Multimodal behavior

π_θ(a_t | o_t): behavior depends only on the current observation.

If we see the same thing twice, we do the same thing twice, regardless of what happened before

Often very unnatural for human demonstrators

π_θ(a_t | o_1, …, o_t): behavior depends on all past observations.

How can we use the whole history?

Simply stacking all past frames doesn’t work: a variable number of frames, and too many weights. Instead, use an RNN: each observation is encoded and fed into a recurrent cell, the RNN state carries the history forward from step to step, and the weights are shared across time steps.

Typically, LSTM cells work better here
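A sketch of such a recurrent policy in PyTorch; the dimensions and the single LSTM layer are illustrative assumptions.

    import torch

    # A policy conditioned on the whole history o_1..o_t: an LSTM carries
    # the RNN state across time steps, with weights shared across steps.
    class RecurrentPolicy(torch.nn.Module):
        def __init__(self, obs_dim=32, act_dim=4, hidden_dim=128):
            super().__init__()
            self.lstm = torch.nn.LSTM(obs_dim, hidden_dim, batch_first=True)
            self.head = torch.nn.Linear(hidden_dim, act_dim)

        def forward(self, obs_seq):
            # obs_seq: (batch, time, obs_dim); h summarizes the history at each step
            h, _ = self.lstm(obs_seq)
            return self.head(h)  # one predicted action per time step

    policy = RecurrentPolicy()
    actions = policy(torch.randn(8, 10, 32))  # 8 histories of 10 observations each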

Aside: why might this work poorly?

“causal confusion” see: de Haan et al., “Causal Confusion in Imitation Learning”

Question 1: Does including history exacerbate causal confusion?

Question 2: Can DAgger mitigate causal confusion?

Why might we fail to fit the expert?

1. Non-Markovian behavior

2. Multimodal behavior

Three fixes for multimodal behavior, detailed below:

1. Output mixture of Gaussians

2. Latent variable models

3. Autoregressive discretization

Fix 1: output a mixture of Gaussians. Instead of a single Gaussian, the network outputs mixture weights, means, and variances, so π(a_t | o_t) = Σ_i w_i N(μ_i, Σ_i) can put probability mass on several distinct expert behaviors; a sketch follows.
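A minimal mixture-density head, assuming the observation has already been encoded into a feature vector `feat`; the component count and layer sizes are illustrative.

    import torch

    # Mixture-of-Gaussians policy head: predict weights w_i, means mu_i, and
    # (diagonal) stds sigma_i, so pi(a_t | o_t) = sum_i w_i N(a_t; mu_i, sigma_i).
    class MixtureHead(torch.nn.Module):
        def __init__(self, feat_dim=128, act_dim=4, n_comp=5):
            super().__init__()
            self.n_comp, self.act_dim = n_comp, act_dim
            self.logits = torch.nn.Linear(feat_dim, n_comp)              # mixture weights
            self.means = torch.nn.Linear(feat_dim, n_comp * act_dim)     # component means
            self.log_stds = torch.nn.Linear(feat_dim, n_comp * act_dim)  # component scales

        def log_prob(self, feat, action):
            log_w = torch.log_softmax(self.logits(feat), dim=-1)
            mu = self.means(feat).view(-1, self.n_comp, self.act_dim)
            std = self.log_stds(feat).view(-1, self.n_comp, self.act_dim).exp()
            comp = torch.distributions.Normal(mu, std)
            lp = comp.log_prob(action.unsqueeze(1)).sum(-1)  # per-component log N(a; mu_i, sigma_i)
            return torch.logsumexp(log_w + lp, dim=-1)       # log sum_i w_i N(a; mu_i, sigma_i)

Training maximizes this log-probability on expert actions, so two distinct expert modes no longer get averaged into a single intermediate action.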

Fix 2: latent variable models. Keep a simple output distribution, but feed the network an extra latent noise input z, so that different samples of z can select different modes (a sketch follows the list). Look up some of these:
• Conditional variational autoencoder
• Normalizing flow/realNVP
• Stein variational gradient descent
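A rough sketch of the test-time side of a latent-variable policy; a full conditional VAE would also train an encoder, which is omitted here, and all names and sizes are illustrative.

    import torch

    # Latent-variable policy at test time: inject noise z alongside the
    # observation so that different z samples can select different modes.
    obs_dim, act_dim, z_dim = 32, 4, 8
    decoder = torch.nn.Sequential(
        torch.nn.Linear(obs_dim + z_dim, 64), torch.nn.ReLU(),
        torch.nn.Linear(64, act_dim),
    )

    obs = torch.randn(1, obs_dim)
    z = torch.randn(1, z_dim)                      # z ~ N(0, I)
    action = decoder(torch.cat([obs, z], dim=-1))  # different z -> possibly different mode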

Fix 3: autoregressive discretization. Discretizing all action dimensions jointly needs a number of bins exponential in the dimension, so discretize one dimension at a time: predict a (discretized) distribution over dimension 1 only, sample a dim 1 value, feed it back into the network, predict the distribution over dimension 2, sample a dim 2 value, and so on; a sketch follows.
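A two-dimensional sketch of the sampling procedure; the bin count, the two heads, and the feature vector are illustrative assumptions.

    import torch

    # Autoregressive discretization over a 2-D action: sample a bin for
    # dimension 1, feed the sampled value back in, then sample dimension 2.
    n_bins, feat_dim = 21, 128
    dim1_head = torch.nn.Linear(feat_dim, n_bins)      # logits for p(dim 1 | o)
    dim2_head = torch.nn.Linear(feat_dim + 1, n_bins)  # logits for p(dim 2 | o, dim 1)
    bin_values = torch.linspace(-1.0, 1.0, n_bins)     # bin centers

    feat = torch.randn(1, feat_dim)  # stand-in features of o_t
    idx1 = torch.distributions.Categorical(logits=dim1_head(feat)).sample()
    a1 = bin_values[idx1]            # dim 1 value (discrete sampling)
    idx2 = torch.distributions.Categorical(
        logits=dim2_head(torch.cat([feat, a1.unsqueeze(-1)], dim=-1))).sample()
    a2 = bin_values[idx2]            # dim 2 value

Each step is a categorical distribution over n_bins values for one dimension, so the cost grows linearly rather than exponentially with the action dimension.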

Imitation learning: recap

• Often (but not always) insufficient by itself
  • Distribution mismatch problem

• Sometimes works well
  • Hacks (e.g. left/right images)
  • Samples from a stable trajectory distribution
  • Add more on-policy data, e.g. using DAgger
  • Better models that fit more accurately


Break

Case study 1: trail following as classification

Imitation learning: what’s the problem?

• Humans need to provide data, which is typically finite
  • Deep learning works best when data is plentiful

• Humans are not good at providing some kinds of actions

• Humans can learn autonomously; can our machines do the same?
  • Unlimited data from own experience
  • Continuous self-improvement


A cost function for imitation?
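One natural choice, in the lecture's notation: reward the log-probability of matching the expert's action, or equivalently charge one unit of cost per mistake.

    r(\mathbf{s}_t, \mathbf{a}_t) = \log p\big(\mathbf{a}_t = \pi^\star(\mathbf{s}_t) \mid \mathbf{s}_t\big)
    \qquad\text{or}\qquad
    c(\mathbf{s}_t, \mathbf{a}_t) = \begin{cases} 0 & \text{if } \mathbf{a}_t = \pi^\star(\mathbf{s}_t) \\ 1 & \text{otherwise} \end{cases}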


Ross et al. ‘11

Some analysis: how bad is it?
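A sketch of the argument: suppose the learned policy makes a mistake with probability at most ε on states from the training distribution. Once it errs, it can leave that distribution and keep erring for the rest of the horizon, so

    \text{assume } \pi_\theta(\mathbf{a} \neq \pi^\star(\mathbf{s}) \mid \mathbf{s}) \le \varepsilon \ \text{ for } \mathbf{s} \sim p_{\text{train}}(\mathbf{s}), \text{ then}

    E\Big[\sum_t c(\mathbf{s}_t, \mathbf{a}_t)\Big] \;\le\; \varepsilon T + \varepsilon (T-1) + \cdots \;=\; O(\varepsilon T^2)

The expected cost grows quadratically in the horizon T, not linearly as in ordinary supervised learning.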

More general analysis
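A sketch of the distribution-shift version of the bound, following Ross et al.: write the learned policy's state distribution as a mixture of the training distribution and an arbitrary "mistake" distribution,

    p_\theta(\mathbf{s}_t) = (1-\varepsilon)^t\, p_{\text{train}}(\mathbf{s}_t) + \big(1 - (1-\varepsilon)^t\big)\, p_{\text{mistake}}(\mathbf{s}_t)

    \big|p_\theta(\mathbf{s}_t) - p_{\text{train}}(\mathbf{s}_t)\big| \le 2\big(1 - (1-\varepsilon)^t\big) \le 2\varepsilon t

    \sum_t E_{p_\theta(\mathbf{s}_t)}[c_t] \;\le\; \sum_t \big(\varepsilon + 2\varepsilon t\big) \;=\; O(\varepsilon T^2)

DAgger removes the mismatch by making the training distribution match p_θ, which brings the bound back to O(εT).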

For more analysis, see Ross et al. “A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning”

Another imitation idea

Goal-conditioned behavioral cloning

1. Collect data
2. Train goal-conditioned policy

3. Reach goals
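A minimal sketch of these three steps, assuming goals are represented as desired final observations and relabeled from the data itself; all names and sizes are illustrative.

    import torch

    # Goal-conditioned behavioral cloning: condition the policy on a goal g
    # by concatenating it with o_t; relabel each collected trajectory with
    # the state it actually reached, so every trajectory becomes a "success".
    obs_dim, act_dim = 32, 4
    policy = torch.nn.Sequential(
        torch.nn.Linear(obs_dim * 2, 64), torch.nn.ReLU(),  # input: (o_t, g)
        torch.nn.Linear(64, act_dim),
    )
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

    def bc_update(trajectory):
        """trajectory: list of (obs, action) pairs from one rollout."""
        goal = trajectory[-1][0]  # relabel: the final state is the goal this rollout reached
        for obs, act in trajectory:
            pred = policy(torch.cat([obs, goal], dim=-1))
            loss = ((pred - act) ** 2).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    # To reach a new goal g at test time, run policy on torch.cat([o_t, g]).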


Cost/reward functions in theory and practice: reinforcement learning typically uses a reward r(s_t, a_t) to maximize, while optimal control uses a cost c(s_t, a_t) to minimize; the two are interchangeable via c = −r.
