Model-based Sample Efficient RL
Emma Brunskill, 15-889e, Fall 2015
Sample Efficient RL
• Objectives
  • Probably Approximately Correct
  • Minimizing regret
  • Bayes-optimal RL
• Today: Model-based data efficient RL
Model-based Sample Efficient RL
• What objective is the algorithm optimizing?
• Using function approximation for the model
• Planning with complex models
• Computational constraints
Model Based Approaches
• Linear representations are fairly limited
• Lots of powerful function approximators, e.g.:
  • Gaussian processes
  • Random forests
  • Neural networks
Exploration / Exploitation when Using Function Approximation for Models
• When learning a single policy from a batch of data, we didn’t have to address exploration vs exploitation
• Now we are doing online RL
• If using a function approximator to represent the transition/reward models, how should we address exploration/exploitation?
Gaussian Process to Model MDP
[Figure: Gaussian process fit to one-step transition data, predicting the next state s' from the current state s. Figure adjusted from Wilson et al. JMLR 2014]
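To make this concrete, here is a minimal sketch (not the model from Wilson et al.) of fitting one Gaussian process per next-state dimension to logged transitions with scikit-learn; the placeholder data, the kernel choice, and the helper name `predict_next_state` are all illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Placeholder logged transitions: (state, action) features and next states.
X_sa   = np.random.rand(200, 3)    # N x (state_dim + action_dim)
S_next = np.random.rand(200, 2)    # N x state_dim

# One independent GP per next-state dimension, a common simplification.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gps = [GaussianProcessRegressor(kernel=kernel).fit(X_sa, S_next[:, d])
       for d in range(S_next.shape[1])]

def predict_next_state(sa):
    """Posterior mean and std of s' for one (state, action) input."""
    sa = np.atleast_2d(sa)
    means, stds = zip(*(gp.predict(sa, return_std=True) for gp in gps))
    return np.concatenate(means), np.concatenate(stds)
```

The posterior standard deviation returned here is the explicit representation of model uncertainty that the next slide highlights.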
Gaussian Process: Explicit Representation of Uncertainty Over Model Parameters
[Figure: the same GP fit shown with its posterior uncertainty over the predicted s'. Figure adjusted from Wilson et al. JMLR 2014]
Slide from Ghahramani: http://www.eurandom.tue.nl/events/workshops/2010/YESIV/Prog-Abstr_files/Ghahramani-lecture2.pdf
Can Exploit Structure in Dynamics
Slide modified from Jung & Stone ECML 2010
Gaussian Processes for Sample Efficient Reinforcement Learning with RMAX-like Exploration (Jung & Stone, ECML 2010)
Slide modified from Jung & Stone ECML 2010
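One hedged reading of the RMAX-like idea, not Jung & Stone's exact algorithm: wherever the GP's predictive variance is still high, the planner is handed an optimistic outcome with maximal reward, so it is driven to collect data there. The function names, threshold, and `R_MAX` value below are illustrative assumptions.

```python
import numpy as np

R_MAX = 1.0            # assumed upper bound on the per-step reward
VAR_THRESHOLD = 0.05   # (s, a) counted as "known" below this predictive variance

def optimistic_prediction(sa, predict_next_state, reward_model):
    """RMAX-like optimism for planning: if the learned GP is still uncertain
    at this (state, action) input, pretend it leads to an absorbing state
    with maximal reward, which drives the planner to gather data there."""
    mean, std = predict_next_state(sa)       # GP posterior mean / std of s'
    if np.max(std ** 2) > VAR_THRESHOLD:
        return None, R_MAX                   # None marks the optimistic absorbing state
    return mean, reward_model(sa)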
Model Learning
Slide modified from Jung & Stone ECML 2010
Planning
Slide modified from Jung & Stone ECML 2010
Computational Cost
This is expensive! Exact GP inference scales cubically in the number of training points, and the number of points needed grows quickly with the state dimension, so it is hard to scale to large-dimensional state spaces.
Slide modified from Jung & Stone ECML 2010
Simulation Experiments
Slide modified from Jung & Stone ECML 2010
GP model with no exploration doing best
Slide modified from Jung & Stone ECML 2010
GP model + no exploration also best in larger domains
Summary
• GP models can be very useful for quickly learning a good dynamics model, especially if there’s structure in the domain
• Planning can be computationally expensive
• In the domains considered here, leveraging the GP's representation of model parameter uncertainty was not needed
Slide modified from Jung & Stone ECML 2010
Model Based Approaches
• Linear representations are fairly limited
• Lots of powerful function approximators, e.g.:
  • Gaussian processes
  • Random forests
  • Neural networks
TEXPLORE
Slide modified from Todd Hester
Decision Trees for MDP Model
Slide modified from Todd Hester
Assumption: Relative Effects
• Assume actions have similar effect across states
• s_rel = s' - s (see the toy example below)
  • in some cases may be independent of s (or be shared by many states)
• Brunskill et al. 2008, Leffler et al. 2007, Jong & Stone 2007
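A toy illustration of the assumption (the values are made up): if the action "move right" always shifts the first state coordinate by one, the relative effect is identical everywhere, so one learned effect covers many states.

```python
import numpy as np

s      = np.array([3.0, 7.0])   # current state (x, y)
s_next = np.array([4.0, 7.0])   # observed next state after "move right"
s_rel  = s_next - s             # relative effect: [1., 0.], independent of s

# The model is trained to predict s_rel from (s, a); the next state is then
# reconstructed as s + predicted_rel, which generalizes across states.
```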
Using Decision Trees for Dynamics Model
Slide modified from Todd Hester
Representing Uncertainty Over Model: Random Forest
Slide modified from Todd Hester
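A minimal sketch in the spirit of TEXPLORE, not its exact implementation: fit a random forest to predict the relative effect s' - s from (state, action) features, and read model uncertainty off the disagreement between trees. The placeholder data, forest size, and helper name are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Placeholder training data: (state, action) features and observed relative effects.
X_sa  = np.random.rand(500, 3)    # columns: state dims + action encoding
S_rel = np.random.rand(500, 2)    # columns: per-dimension change s' - s

forest = RandomForestRegressor(n_estimators=20).fit(X_sa, S_rel)

def per_tree_predictions(sa):
    """One predicted relative effect per tree (shape: n_trees x state_dim);
    the spread across trees reflects uncertainty over the dynamics model."""
    sa = np.atleast_2d(sa)
    return np.stack([tree.predict(sa)[0] for tree in forest.estimators_])
```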
Exploration/Exploitation with Random Forest Model of MDP
Slide modified from Todd Hester
TEXPLORE Planning Using a Random Forest of Models
• Essentially, compute an average model (from random forest)
• Then use that for planning (a short sketch follows below)
• Some computational advantages too
Equation from Hester & Stone JMLR 2013
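One hedged reading of that averaging step (the precise form is the equation in Hester & Stone, JMLR 2013): treat each tree as one hypothesis about the dynamics and plan against their mean prediction, e.g. reusing the per-tree helper sketched above.

```python
import numpy as np

def mean_model_prediction(per_tree_preds):
    """Average the per-tree relative-effect predictions (one row per tree)
    to obtain the single 'mean' model that planning uses."""
    return np.asarray(per_tree_preds).mean(axis=0)
```

Planning against one averaged model is far cheaper than planning over the full distribution of models, which is one of the computational advantages mentioned above.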
MCTS for TEXPLORE Planning
Slide modified from Todd Hester
TEXPLORE: Planning using UCT
Figure from Hester & Stone JMLR 2013
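For reference, a compact generic UCT sketch against a learned generative model; TEXPLORE's planner additionally reuses the tree across time steps and uses lambda-returns (next slides). The `sample_model(s, a) -> (next_state, reward)` interface, the constants, and the assumption that states are hashable (e.g. discretized) are all illustrative.

```python
import math, random
from collections import defaultdict

GAMMA, UCB_C, DEPTH, N_SIMS = 0.95, 1.0, 20, 500   # assumed planning constants

def uct_plan(root, actions, sample_model):
    """Pick an action at `root` by running UCT simulations through the
    learned model `sample_model(s, a) -> (next_state, reward)`."""
    N  = defaultdict(int)     # visit counts for (state, action)
    Ns = defaultdict(int)     # visit counts for state
    Q  = defaultdict(float)   # running mean return for (state, action)

    def simulate(s, depth):
        if depth == 0:
            return 0.0
        untried = [a for a in actions if N[(s, a)] == 0]
        if untried:
            a = random.choice(untried)             # expand an untried action first
        else:
            # UCB1: exploit high Q but add a bonus for rarely tried actions.
            a = max(actions, key=lambda a: Q[(s, a)]
                    + UCB_C * math.sqrt(math.log(Ns[s]) / N[(s, a)]))
        s_next, r = sample_model(s, a)             # one step of the learned model
        g = r + GAMMA * simulate(s_next, depth - 1)
        Ns[s] += 1
        N[(s, a)] += 1
        Q[(s, a)] += (g - Q[(s, a)]) / N[(s, a)]   # incremental mean update
        return g

    for _ in range(N_SIMS):
        simulate(root, DEPTH)
    return max(actions, key=lambda a: Q[(root, a)])
```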
TEXPLORE: Reuse Tree Across Time Steps
Figure from Hester & Stone JMLR 2013
TEXPLORE: UCT + lambda-returns
Figure from Hester & Stone JMLR 2013
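A generic sketch of the lambda-return target, not TEXPLORE's exact code: at each step of a simulated trajectory, the backup mixes the value estimate already stored for the next node (weight 1 - lambda) with the return carried back from deeper in the rollout (weight lambda).

```python
def lambda_return_targets(rewards, values, gamma=0.95, lam=0.5):
    """Backup targets for each step of a simulated trajectory.

    rewards[t] -- reward received at step t
    values[t]  -- current value estimate of the state reached after step t
    """
    g = values[-1]                      # bootstrap from the final state's value
    targets = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        # Mix the stored estimate (weight 1 - lam) with the deeper return (weight lam).
        g = rewards[t] + gamma * ((1 - lam) * values[t] + lam * g)
        targets[t] = g
    return targets
```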
TEXPLORE
Slide modified from Todd Hester
Simulations on Car Driving
Slide modified from Todd Hester
Comparing to Other Approaches
Slide modified from Todd Hester
Fuel World
Slide modified from Todd Hester
Model Accuracy
Slide modified from Todd Hester
Does it work on a real car?
Slide modified from Todd Hester
TEXPLORE
• Uses random forests to represent MDP dynamics & rewards
• Uses MCTS for planning
• In domains presented, little explicit exploration needed
Slide modified from Todd Hester
Summary: Model-based Sample Efficient RL
• What objective is the algorithm optimizing?
  • Today, empirical performance; no formal guarantees
• Using function approximation for the model can greatly speed learning (can exploit structure in dynamics model)
• Exploration / exploitation
  • Do we need to explicitly explore?
  • We'll always explore things that look promising
  • In the results we saw today, not much explicit exploration was needed
• Planning with complex models
  • Can be computationally prohibitive
  • Approximate approaches, like MCTS, are useful