
Supervised Learning of Behaviors

CS 294-112: Deep Reinforcement Learning

Sergey Levine

Class Notes

1. Make sure you sign up for Piazza!

2. Homework 1 is now out

3. Remember to start forming final project groups

Today’s Lecture

1. Definition of sequential decision problems

2. Imitation learning: supervised learning for decision making

a. Does direct imitation work?

b. How can we make it work more often?

3. Case studies of recent work in (deep) imitation learning

4. What is missing from imitation learning?

• Goals:

• Understand definitions & notation

• Understand basic imitation learning algorithms

• Understand their strengths & weaknesses

Terminology & notation

[Figure: an observed image of a tiger; candidate actions: 1. run away, 2. ignore, 3. pet]

Aside: notation

Richard Bellman, Lev Pontryagin

управление (Russian for "control")

Images: Bojarski et al. ‘16, NVIDIA

training data

supervised learning

Imitation Learning

behavior cloning
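Behavior cloning is just supervised regression from observations to expert actions. Below is a minimal, hedged sketch in PyTorch; the policy architecture, the demonstration loader, and the hyperparameters are assumptions for illustration, not the network from the NVIDIA work.

```python
import torch
import torch.nn as nn

class Policy(nn.Module):
    """Hypothetical policy network: maps an observation vector to a continuous
    action (e.g., a steering command)."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)

def behavior_cloning(policy, demo_loader, epochs=10, lr=1e-3):
    """Behavior cloning: fit the policy to expert (observation, action) pairs
    by plain supervised regression."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        for obs, expert_act in demo_loader:   # demo_loader yields demonstration batches
            loss = nn.functional.mse_loss(policy(obs), expert_act)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return policy
```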

Does it work? No!

Does it work? Yes!

Video: Bojarski et al. ‘16, NVIDIA

Why did that work?

Bojarski et al. ‘16, NVIDIA

Can we make it work more often?

cost

stability

Learning from a stabilizing controller

(more on this later)

Can we make it work more often?

DAgger: Dataset Aggregation

Ross et al. ‘11
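A minimal sketch of the DAgger loop from Ross et al. '11. The helpers run_policy (roll out the current policy and record observations), expert_label (query the expert for the correct action at an observation), and train (supervised learning on the aggregated dataset) are hypothetical stand-ins:

```python
def dagger(policy, demos, run_policy, expert_label, train, n_iters=10):
    """DAgger (Dataset Aggregation), Ross et al. '11, in outline:
    train on the aggregated data, roll out the *current* policy, have the
    expert label the observations the learner actually visits, aggregate, repeat."""
    dataset = list(demos)                     # start from the expert demonstrations
    for _ in range(n_iters):
        policy = train(policy, dataset)       # 1. supervised learning on D
        obs_visited = run_policy(policy)      # 2. observations from the learner's own distribution
        dataset += [(o, expert_label(o)) for o in obs_visited]   # 3-4. label and aggregate
    return policy
```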

DAgger Example

Ross et al. ‘11

What’s the problem?

Ross et al. ‘11

Can we make it work without more data?

• DAgger addresses the problem of distributional “drift”

• What if our model is so good that it doesn’t drift?

• Need to mimic expert behavior very accurately

• But don’t overfit!

Why might we fail to fit the expert?

1. Non-Markovian behavior

2. Multimodal behavior

behavior depends only on current observation

If we see the same thing twice, we do the same thing twice, regardless of what happened before

Often very unnatural for human demonstrators

behavior depends on all past observations

How can we use the whole history?

variable number of frames, too many weights

How can we use the whole history?

[Diagram: an RNN unrolled over the observation history; the RNN state is updated at each step, with weights shared across time steps]

Typically, LSTM cells work better here
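A minimal sketch of a history-conditioned policy: an LSTM reads the observation sequence with weights shared across time, and the action is predicted from the final recurrent state. Dimensions and layer sizes are assumptions:

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Policy conditioned on the whole observation history via an LSTM."""
    def __init__(self, obs_dim, act_dim, hidden_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, act_dim)

    def forward(self, obs_history):
        # obs_history: (batch, time, obs_dim); the recurrence handles a variable
        # number of frames without growing the number of weights.
        states, _ = self.lstm(obs_history)
        return self.head(states[:, -1])       # act from the final RNN state
```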

Why might we fail to fit the expert?

1. Non-Markovian behavior

2. Multimodal behavior

1. Output mixture of Gaussians

2. Latent variable models

3. Autoregressive discretization

Why might we fail to fit the expert?

1. Output mixture of Gaussians

2. Latent variable models

3. Autoregressive discretization
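A minimal sketch of option 1, a mixture-of-Gaussians output head (a mixture density network): the policy outputs mixture weights, means, and scales, so it can place probability on several distinct expert actions at once. The number of components and the dimensions are assumptions:

```python
import torch
import torch.nn as nn

class MixtureOfGaussiansHead(nn.Module):
    """Predicts a K-component Gaussian mixture over actions (mixture density network)."""
    def __init__(self, feat_dim, act_dim, n_components=5):
        super().__init__()
        self.n_components, self.act_dim = n_components, act_dim
        self.logits = nn.Linear(feat_dim, n_components)              # mixture weights
        self.means = nn.Linear(feat_dim, n_components * act_dim)     # component means
        self.log_stds = nn.Linear(feat_dim, n_components * act_dim)  # component scales

    def log_prob(self, features, action):
        K, D = self.n_components, self.act_dim
        means = self.means(features).view(-1, K, D)
        stds = self.log_stds(features).view(-1, K, D).exp()
        comp = torch.distributions.Independent(
            torch.distributions.Normal(means, stds), 1)
        mix = torch.distributions.Categorical(logits=self.logits(features))
        dist = torch.distributions.MixtureSameFamily(mix, comp)
        return dist.log_prob(action)   # maximize this over expert actions
```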

Why might we fail to fit the expert?

1. Output mixture of Gaussians

2. Latent variable models

3. Autoregressive discretization

Look up some of these:
• Conditional variational autoencoder
• Normalizing flow / realNVP
• Stein variational gradient descent
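A minimal sketch of option 2, a latent variable policy modeled as a conditional variational autoencoder (one of the models listed above): a latent z captures which mode of the expert's behavior to reproduce, an encoder q(z | obs, action) is used only at training time, and the decoder p(action | obs, z) with z drawn from the prior acts at test time. Architecture and dimensions are assumptions:

```python
import torch
import torch.nn as nn

class CVAEPolicy(nn.Module):
    """Conditional VAE over actions: p(action | observation, z)."""
    def __init__(self, obs_dim, act_dim, z_dim=8, hidden=128):
        super().__init__()
        # Encoder q(z | obs, action): training time only.
        self.enc = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * z_dim))
        # Decoder p(action | obs, z): the policy at test time (z from the prior).
        self.dec = nn.Sequential(
            nn.Linear(obs_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim))
        self.z_dim = z_dim

    def loss(self, obs, expert_act):
        mu, log_var = self.enc(torch.cat([obs, expert_act], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()      # reparameterization
        recon = self.dec(torch.cat([obs, z], -1))
        recon_loss = nn.functional.mse_loss(recon, expert_act)
        kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp()).mean()
        return recon_loss + kl                                      # negative ELBO

    def act(self, obs):
        z = torch.randn(obs.shape[0], self.z_dim)                   # sample z from the prior
        return self.dec(torch.cat([obs, z], -1))
```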

Why might we fail to fit the expert?

1. Output mixture of Gaussians

2. Latent variable models

3. Autoregressive discretization

[Diagram: autoregressive discretization: predict a (discretized) distribution over dimension 1 only, draw a dim 1 value by discrete sampling, condition on it, then predict and sample the dim 2 value, and so on per dimension.]
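A minimal sketch of option 3, autoregressive discretization: each action dimension is discretized into bins, and the distribution over dimension d is conditioned on the values already sampled for the earlier dimensions, so one softmax per dimension suffices instead of an exponentially large joint table. Bin counts, value ranges, and layer shapes are assumptions:

```python
import torch
import torch.nn as nn

class AutoregressiveDiscretePolicy(nn.Module):
    """Discretize each action dimension and predict it conditioned on the
    previously sampled dimensions (one softmax head per dimension)."""
    def __init__(self, feat_dim, act_dim, n_bins=51, low=-1.0, high=1.0):
        super().__init__()
        self.n_bins, self.low, self.high = n_bins, low, high
        # The head for dimension d sees the features plus the d values sampled so far.
        self.heads = nn.ModuleList(
            nn.Linear(feat_dim + d, n_bins) for d in range(act_dim))

    def sample(self, features):
        sampled = []
        for head in self.heads:
            inp = torch.cat([features] + sampled, dim=-1)
            bin_idx = torch.distributions.Categorical(logits=head(inp)).sample()
            # Map the sampled bin index back to a continuous value in [low, high].
            value = self.low + (bin_idx.float() + 0.5) * (self.high - self.low) / self.n_bins
            sampled.append(value.unsqueeze(-1))
        return torch.cat(sampled, dim=-1)
```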

Imitation learning: recap

• Often (but not always) insufficient by itself
• Distribution mismatch problem

• Sometimes works well
• Hacks (e.g. left/right images)

• Samples from a stable trajectory distribution

• Add more on-policy data, e.g. using DAgger

• Better models that fit more accurately

training data

supervised learning

Break

Case study 1: trail following as classification

Case study 2: Imitation with LSTMs

Follow-up: adding vision

Other topics in imitation learning

• Structured prediction

• Inverse reinforcement learning
▪ Instead of copying the demonstration, figure out the goal

▪ Will be covered later in this course

Example: x: "where are you" → y: "I'm at work" (alternative: "I'm in school")

Imitation learning: what’s the problem?

• Humans need to provide data, which is typically finite
• Deep learning works best when data is plentiful

• Humans are not good at providing some kinds of actions

• Humans can learn autonomously; can our machines do the same?
• Unlimited data from own experience

• Continuous self-improvement

Terminology & notation

[Figure: an observed image of a tiger; candidate actions: 1. run away, 2. ignore, 3. pet]

Aside: notation

Richard Bellman, Lev Pontryagin

управление (Russian for "control")

A cost function for imitation?

training data

supervised learning

Ross et al. ‘11
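One common choice, and a hedged reconstruction of the cost used in the analysis that follows, is the 0-1 cost that counts disagreements with the expert policy π*, with the expectation taken under the state distribution induced by the learned policy:

```latex
c(\mathbf{s}_t, \mathbf{a}_t) =
\begin{cases}
  0 & \text{if } \mathbf{a}_t = \pi^\star(\mathbf{s}_t) \\
  1 & \text{otherwise}
\end{cases}
\qquad
\text{goal: } \min_\theta \ \sum_{t}
  \mathbb{E}_{\mathbf{s}_t \sim p_{\pi_\theta}(\mathbf{s}_t)}
  \left[ c(\mathbf{s}_t, \pi_\theta(\mathbf{s}_t)) \right]
```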

Some analysis: how bad is it?

More general analysis

For more analysis, see Ross et al. “A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning”
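A hedged sketch of that analysis: assume the learned policy's mistake probability on states from the training (expert) distribution is at most ε. Under naive behavior cloning, one early mistake can push the learner off the training distribution, where nothing bounds further mistakes, so the expected total cost can grow quadratically in the horizon T; DAgger labels states from the learner's own distribution and recovers a linear bound:

```latex
\text{Assumption: } \mathbb{E}_{\mathbf{s} \sim p_{\mathrm{train}}(\mathbf{s})}
  \left[ \pi_\theta(\mathbf{a} \neq \pi^\star(\mathbf{s}) \mid \mathbf{s}) \right] \le \epsilon

\text{Behavior cloning: } \quad
  \mathbb{E}\left[ \sum_{t=1}^{T} c(\mathbf{s}_t, \mathbf{a}_t) \right] = O(\epsilon T^2)

\text{DAgger } (p_{\mathrm{train}} = p_{\pi_\theta}): \quad
  \mathbb{E}\left[ \sum_{t=1}^{T} c(\mathbf{s}_t, \mathbf{a}_t) \right] = O(\epsilon T)
```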

Cost/reward functions in theory and practice

The trouble with cost & reward functions

More on this later…

A note about terminology…

the “R” word

a bit of history…

Richard Sutton, Andrew Barto, Lev Pontryagin, Richard Bellman

reinforcement learning

(the problem statement)

reinforcement learning

(the method), without using the model
