Model-based Sample Efficient RL
Emma Brunskill, 15-889e, Fall 2015
Sample Efficient RL
• Objectives
  • Probably Approximately Correct
  • Minimizing regret
  • Bayes-optimal RL
• Today: Model-based data efficient RL
Model-based Sample Efficient RL
• What objective is the algorithm optimizing?
• Using function approximation for the model
• Planning with complex models
• Computational constraints
Model Based Approaches
• Linear representations are fairly limited
• Lots of powerful function approximators, e.g.:
  • Gaussian processes
  • Random forests
  • Neural networks
Exploration / Exploitation when Using Function Approximation for Models
• When learning a single policy from a batch of data, we didn’t have to address exploration vs exploitation
• Now we are doing online RL
• If using a function approximator to represent the transition/reward models, how should we address exploration/exploitation?
Gaussian Process to Model MDP
[Figure: Gaussian process fit to one-step transition data, predicting the next state s' from the current state s. Figure adjusted from Wilson et al. JMLR 2014]
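To make this concrete, here is a minimal sketch (not the model from Wilson et al.) of fitting one Gaussian process per next-state dimension to logged transitions with scikit-learn; the placeholder data, the kernel choice, and the helper name `predict_next_state` are all illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Placeholder logged transitions: (state, action) features and next states.
X_sa   = np.random.rand(200, 3)    # N x (state_dim + action_dim)
S_next = np.random.rand(200, 2)    # N x state_dim

# One independent GP per next-state dimension, a common simplification.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gps = [GaussianProcessRegressor(kernel=kernel).fit(X_sa, S_next[:, d])
       for d in range(S_next.shape[1])]

def predict_next_state(sa):
    """Posterior mean and std of s' for one (state, action) input."""
    sa = np.atleast_2d(sa)
    means, stds = zip(*(gp.predict(sa, return_std=True) for gp in gps))
    return np.concatenate(means), np.concatenate(stds)
```

The posterior standard deviation returned here is the explicit representation of model uncertainty that the next slide highlights.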
Gaussian Process: Explicit Representation of Uncertainty Over Model Parameters
[Figure: the same GP fit shown with its posterior uncertainty over the predicted s'. Figure adjusted from Wilson et al. JMLR 2014]
Slide from Ghahramani: http://www.eurandom.tue.nl/events/workshops/2010/YESIV/Prog-Abstr_files/Ghahramani-lecture2.pdf
Can Exploit Structure in Dynamics
Slide modified from Jung & Stone ECML 2010
Gaussian Processes for Sample Efficient Reinforcement Learning with RMAX-like Exploration (Jung & Stone, ECML 2010)
Slide modified from Jung & Stone ECML 2010
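One hedged reading of the RMAX-like idea, not Jung & Stone's exact algorithm: wherever the GP's predictive variance is still high, the planner is handed an optimistic outcome with maximal reward, so it is driven to collect data there. The function names, threshold, and `R_MAX` value below are illustrative assumptions.

```python
import numpy as np

R_MAX = 1.0            # assumed upper bound on the per-step reward
VAR_THRESHOLD = 0.05   # (s, a) counted as "known" below this predictive variance

def optimistic_prediction(sa, predict_next_state, reward_model):
    """RMAX-like optimism for planning: if the learned GP is still uncertain
    at this (state, action) input, pretend it leads to an absorbing state
    with maximal reward, which drives the planner to gather data there."""
    mean, std = predict_next_state(sa)       # GP posterior mean / std of s'
    if np.max(std ** 2) > VAR_THRESHOLD:
        return None, R_MAX                   # None marks the optimistic absorbing state
    return mean, reward_model(sa)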
Model Learning
Slide modified from Jung & Stone ECML 2010
Planning
Slide modified from Jung & Stone ECML 2010
Computational Cost
This is expensive! Exact GP inference scales cubically in the number of training points, and the number of points needed grows quickly with the state dimension, so it is hard to scale to large-dimensional state spaces.
Slide modified from Jung & Stone ECML 2010
Simulation Experiments
Slide modified from Jung & Stone ECML 2010
GP model with no exploration doing best
Slide modified from Jung & Stone ECML 2010
GP model + no exploration also best in larger domains
Summary
• GP models can be very useful for quickly learning a good dynamics model, especially if there’s structure in the domain
• Planning can be computationally expensive
• In the domains considered here, leveraging the GP's representation of model parameter uncertainty was not needed
Slide modified from Jung & Stone ECML 2010
Model Based Approaches
• Linear representations are fairly limited
• Lots of powerful function approximators, e.g.:
  • Gaussian processes
  • Random forests
  • Neural networks
TEXPLORE
Slide modified from Todd Hester
Decision Trees for MDP Model
Slide modified from Todd Hester
Assumption: Relative Effects
• Assume actions have similar effect across states
• s_rel = s' - s (see the toy example below)
  • in some cases may be independent of s (or be shared by many states)
• Brunskill et al. 2008, Leffler et al. 2007, Jong & Stone 2007
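A toy illustration of the assumption (the values are made up): if the action "move right" always shifts the first state coordinate by one, the relative effect is identical everywhere, so one learned effect covers many states.

```python
import numpy as np

s      = np.array([3.0, 7.0])   # current state (x, y)
s_next = np.array([4.0, 7.0])   # observed next state after "move right"
s_rel  = s_next - s             # relative effect: [1., 0.], independent of s

# The model is trained to predict s_rel from (s, a); the next state is then
# reconstructed as s + predicted_rel, which generalizes across states.
```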
Using Decision Trees for Dynamics Model
Slide modified from Todd Hester
Representing Uncertainty Over Model: Random Forest
Slide modified from Todd Hester
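A minimal sketch in the spirit of TEXPLORE, not its exact implementation: fit a random forest to predict the relative effect s' - s from (state, action) features, and read model uncertainty off the disagreement between trees. The placeholder data, forest size, and helper name are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Placeholder training data: (state, action) features and observed relative effects.
X_sa  = np.random.rand(500, 3)    # columns: state dims + action encoding
S_rel = np.random.rand(500, 2)    # columns: per-dimension change s' - s

forest = RandomForestRegressor(n_estimators=20).fit(X_sa, S_rel)

def per_tree_predictions(sa):
    """One predicted relative effect per tree (shape: n_trees x state_dim);
    the spread across trees reflects uncertainty over the dynamics model."""
    sa = np.atleast_2d(sa)
    return np.stack([tree.predict(sa)[0] for tree in forest.estimators_])
```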
Exploration/Exploitation with Random Forest Model of MDP
Slide modified from Todd Hester
TEXPLORE Planning Using a Random Forest of Models
• Essentially, compute an average model (from random forest)
• Then use that for planning (a short sketch follows below)
• Some computational advantages too
Equation from Hester & Stone JMLR 2013
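One hedged reading of that averaging step (the precise form is the equation in Hester & Stone, JMLR 2013): treat each tree as one hypothesis about the dynamics and plan against their mean prediction, e.g. reusing the per-tree helper sketched above.

```python
import numpy as np

def mean_model_prediction(per_tree_preds):
    """Average the per-tree relative-effect predictions (one row per tree)
    to obtain the single 'mean' model that planning uses."""
    return np.asarray(per_tree_preds).mean(axis=0)
```

Planning against one averaged model is far cheaper than planning over the full distribution of models, which is one of the computational advantages mentioned above.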
MCTS for TEXPLORE Planning
Slide modified from Todd Hester
TEXPLORE: Planning using UCT
Figure from Hester & Stone JMLR 2013
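For reference, a compact generic UCT sketch against a learned generative model; TEXPLORE's planner additionally reuses the tree across time steps and uses lambda-returns (next slides). The `sample_model(s, a) -> (next_state, reward)` interface, the constants, and the assumption that states are hashable (e.g. discretized) are all illustrative.

```python
import math, random
from collections import defaultdict

GAMMA, UCB_C, DEPTH, N_SIMS = 0.95, 1.0, 20, 500   # assumed planning constants

def uct_plan(root, actions, sample_model):
    """Pick an action at `root` by running UCT simulations through the
    learned model `sample_model(s, a) -> (next_state, reward)`."""
    N  = defaultdict(int)     # visit counts for (state, action)
    Ns = defaultdict(int)     # visit counts for state
    Q  = defaultdict(float)   # running mean return for (state, action)

    def simulate(s, depth):
        if depth == 0:
            return 0.0
        untried = [a for a in actions if N[(s, a)] == 0]
        if untried:
            a = random.choice(untried)             # expand an untried action first
        else:
            # UCB1: exploit high Q but add a bonus for rarely tried actions.
            a = max(actions, key=lambda a: Q[(s, a)]
                    + UCB_C * math.sqrt(math.log(Ns[s]) / N[(s, a)]))
        s_next, r = sample_model(s, a)             # one step of the learned model
        g = r + GAMMA * simulate(s_next, depth - 1)
        Ns[s] += 1
        N[(s, a)] += 1
        Q[(s, a)] += (g - Q[(s, a)]) / N[(s, a)]   # incremental mean update
        return g

    for _ in range(N_SIMS):
        simulate(root, DEPTH)
    return max(actions, key=lambda a: Q[(root, a)])
```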
TEXPLORE: Reuse Tree Across Time Steps
Figure from Hester & Stone JMLR 2013
TEXPLORE: UCT + lambda-returns
Figure from Hester & Stone JMLR 2013
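A generic sketch of the lambda-return target, not TEXPLORE's exact code: at each step of a simulated trajectory, the backup mixes the value estimate already stored for the next node (weight 1 - lambda) with the return carried back from deeper in the rollout (weight lambda).

```python
def lambda_return_targets(rewards, values, gamma=0.95, lam=0.5):
    """Backup targets for each step of a simulated trajectory.

    rewards[t] -- reward received at step t
    values[t]  -- current value estimate of the state reached after step t
    """
    g = values[-1]                      # bootstrap from the final state's value
    targets = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        # Mix the stored estimate (weight 1 - lam) with the deeper return (weight lam).
        g = rewards[t] + gamma * ((1 - lam) * values[t] + lam * g)
        targets[t] = g
    return targets
```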
TEXPLORE
Slide modified from Todd Hester
Simulations on Car Driving
Slide modified from Todd Hester
Comparing to Other Approaches
Slide modified from Todd Hester
Fuel World
Slide modified from Todd Hester
Model Accuracy
Slide modified from Todd Hester
Does it work on a real car?
Slide modified from Todd Hester
TEXPLORE
• Uses random forests to represent MDP dynamics & rewards
• Uses MCTS for planning
• In domains presented, little explicit exploration needed
Slide modified from Todd Hester
Summary: Model-based Sample Efficient RL
• What objective is the algorithm optimizing?
  • Today, empirical performance; no formal guarantees
• Using function approximation for the model can greatly speed learning (can exploit structure in dynamics model)
• Exploration / exploitation
  • Do we need to explicitly explore?
  • We'll always explore things that look promising
  • In the results we saw today, not much explicit exploration was needed
• Planning with complex models
  • Can be computationally prohibitive
  • Approximate approaches, like MCTS, are useful