Reinforcement Learning Simulations and Robotics

Reinforcement Learning Simulations and Roboticstaylorm/14_580/Hawbaker.pdf · Reinforcement Learning Simulations and Robotics. Models ... up robot learning using an imperfect simulator



Reinforcement Learning Simulations and Robotics


Models

● Partially observable – noise in sensors

● Policy search methods rather than value function-based approaches

● Isolate key parameters by choosing an appropriate representation for a value function or policy

● Incorporating prior knowledge and transfer knowledge from simulations


Safety

● Key issue of the learning process

● Doesn't apply to the rest of the RL community

● Perkins and Barto

– RL agents based on Lyapunov functions

– Switching between the underlying controllers

– Always safe and offers basic performance guarantees.
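The idea of switching between controllers that all respect a Lyapunov function can be sketched as follows. This is a toy illustration under stated assumptions, not the construction from Perkins and Barto: the one-dimensional system, the two controllers, and the names `lyapunov`, `CONTROLLERS`, and `run_episode` are hypothetical. Because every available controller shrinks V(x) = x², any switching rule the learner picks keeps V decreasing.

```python
import random


def lyapunov(x):
    """Candidate Lyapunov function V(x) = x^2 (assumed for this toy system)."""
    return x * x


# Two hypothetical base controllers for the system x' = x + u.
# Each one contracts the state, so each one decreases V on its own.
CONTROLLERS = {
    "aggressive": lambda x: -0.5 * x,
    "gentle": lambda x: -0.2 * x,
}


def step(x, name):
    """Apply the chosen controller for one time step."""
    return x + CONTROLLERS[name](x)


def run_episode(x0, choose, steps=30):
    """Let an arbitrary switching rule pick a controller at every step."""
    x, values = x0, [lyapunov(x0)]
    for _ in range(steps):
        x = step(x, choose(x))
        values.append(lyapunov(x))
    return x, values


rng = random.Random(0)
final_x, values = run_episode(1.0, lambda x: rng.choice(list(CONTROLLERS)))
```

Here the switching rule is random, standing in for an exploring RL agent. The learner is free to optimize which controller to use, while the descent of V is guaranteed by the choice of controllers rather than by the learner.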


Grid World Themed Movements

● Classical RL approaches

● Discrete states and actions

● Projected for navigational tasks

● Use actions like “move to the cell to the left”

● Use a lower level controller to take care of accelerating, moving, and stopping while ensuring precision
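The split between discrete grid actions and a lower level controller can be sketched like this. It is a minimal example under assumptions: the cell size, the proportional controller, and the names `target_position` and `low_level_move` are hypothetical and only stand in for whatever motion controller a real robot would use.

```python
# Discrete actions as seen by the RL agent.
ACTIONS = {"left": (-1, 0), "right": (1, 0), "up": (0, 1), "down": (0, -1)}
CELL_SIZE = 0.5  # metres per cell, illustrative value


def target_position(cell, action):
    """Centre of the cell reached by applying a discrete action."""
    dx, dy = ACTIONS[action]
    return ((cell[0] + dx) * CELL_SIZE, (cell[1] + dy) * CELL_SIZE)


def low_level_move(position, target, gain=0.5, tolerance=1e-3, max_steps=1000):
    """Proportional controller: close a fraction of the remaining error per tick."""
    x, y = position
    for _ in range(max_steps):
        ex, ey = target[0] - x, target[1] - y
        if (ex * ex + ey * ey) ** 0.5 < tolerance:
            break  # close enough to the cell centre, stop
        x, y = x + gain * ex, y + gain * ey
    return (x, y)


start_cell = (2, 3)
start_pos = (start_cell[0] * CELL_SIZE, start_cell[1] * CELL_SIZE)
goal = target_position(start_cell, "left")
final_pos = low_level_move(start_pos, goal)
```

The agent only ever reasons about cells and the four actions. Acceleration, motion, and stopping precision live entirely inside `low_level_move`.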


Quick Reward Shaping

● Rewards → Quick success

– Real-world experience costly

● Specifying good reward functions

– Requires domain knowledge

– Difficult in practice

● Intermediate rewards instead of binary
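The difference between a binary reward and an intermediate (shaped) reward can be sketched as below. The distance-based shaping, the threshold, and the function names are assumptions chosen for illustration. The slides do not prescribe a particular shaping function.

```python
import math


def binary_reward(distance_to_goal, threshold=0.05):
    """1 on success, 0 otherwise: gives no signal until the task is solved."""
    return 1.0 if distance_to_goal <= threshold else 0.0


def shaped_reward(distance_to_goal, scale=0.5):
    """Intermediate reward that grows smoothly as the robot approaches the goal."""
    return math.exp(-distance_to_goal / scale)


far, near = 0.5, 0.2
```

With the binary version, both `far` and `near` score 0, so an early learner cannot tell which attempt was better. The shaped version ranks `near` above `far`, which is what makes quick success possible when real-world trials are expensive.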


Tracking Solution

● Used to help convergence

● The dynamics of a robot can change

– Temperature

– Wear on gears or motors

– Other external factors
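One common way to keep tracking a quantity that drifts over time is a constant step size instead of a plain sample average. The sketch below illustrates that general idea. The drifting observations and both function names are hypothetical, and the slides do not state which tracking method was meant.

```python
def sample_average(observations):
    """Weights all history equally, so it lags behind when the system changes."""
    estimate = 0.0
    for n, obs in enumerate(observations, start=1):
        estimate += (obs - estimate) / n
    return estimate


def tracking_estimate(observations, step_size=0.2):
    """Constant step size: recent observations dominate, old ones fade out."""
    estimate = 0.0
    for obs in observations:
        estimate += step_size * (obs - estimate)
    return estimate


# Hypothetical drift: the true value jumps from 1.0 to 2.0 (for example,
# after a motor warms up) halfway through the data.
observations = [1.0] * 50 + [2.0] * 50
average = sample_average(observations)
tracked = tracking_estimate(observations)
```

The sample average ends up between the old and the new value, while the constant step size estimate settles close to the current one.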


Building an Accurate Model

● Challenging

● Requires very many data samples

● Under-modeling errors accumulate

– Simulated robot can quickly diverge from the real-world system

● Transfer requires significant modifications if model is not accurate
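How a small modeling error accumulates over a rollout can be seen in a toy example. The linear dynamics and the 2% parameter error below are assumptions picked for illustration only.

```python
def rollout(a, x0, steps):
    """Iterate the scalar dynamics x[t+1] = a * x[t]."""
    xs = [x0]
    for _ in range(steps):
        xs.append(a * xs[-1])
    return xs


real = rollout(1.00, 1.0, 50)  # hypothetical "real" system
sim = rollout(1.02, 1.0, 50)   # simulator with a 2% parameter error
errors = [abs(s - r) for s, r in zip(sim, real)]
```

After one step the simulated and real states differ by about 0.02. After 50 steps the gap is larger than the real state itself, which is why a policy tuned on long simulated rollouts may need significant modification on the robot.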


Approximate models

● Verifying and testing algorithms in simulation

● Establishing proximity to the theoretically optimal solution

● Calculating approximate gradients for local policy improvement

● Identifying strategies for collecting more data

● Performing “Mental rehearsal”


Mental Rehearsal

● Practicing in simulation

● The simulated learning step

● Used after learning a forward model from the real world

● Only the resulting policy is transferred to the robot

● Model-based methods

– Sample efficient

– Often requires a great deal of memory
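The mental rehearsal workflow above (learn a forward model from real data, practice in that model, transfer only the policy) can be sketched as follows. Everything concrete here is an assumption: the linear stand-in for the robot, the quadratic cost, the grid search over a feedback gain, and the function names.

```python
import random


def real_robot_step(x, u):
    """Stand-in for the physical system (hypothetical dynamics)."""
    return 0.9 * x + 0.5 * u


def collect_real_data(n, rng):
    """Gather a small batch of (state, action, next_state) samples."""
    data = []
    for _ in range(n):
        x, u = rng.uniform(-1, 1), rng.uniform(-1, 1)
        data.append((x, u, real_robot_step(x, u)))
    return data


def fit_forward_model(data):
    """Least-squares fit of x' = a*x + b*u via the 2x2 normal equations."""
    sxx = sum(x * x for x, u, y in data)
    sxu = sum(x * u for x, u, y in data)
    suu = sum(u * u for x, u, y in data)
    sxy = sum(x * y for x, u, y in data)
    suy = sum(u * y for x, u, y in data)
    det = sxx * suu - sxu * sxu
    return (sxy * suu - sxu * suy) / det, (sxx * suy - sxu * sxy) / det


def rollout_cost(k, a, b, x0=1.0, horizon=20):
    """Cost of the policy u = -k*x when run in the model (a, b)."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        x = a * x + b * u
        cost += x * x + 0.1 * u * u
    return cost


def rehearse(a, b):
    """Practice purely in the learned model: pick the best gain from a grid."""
    candidates = [i / 100 for i in range(0, 301)]
    return min(candidates, key=lambda k: rollout_cost(k, a, b))


rng = random.Random(0)
a_hat, b_hat = fit_forward_model(collect_real_data(20, rng))
learned_gain = rehearse(a_hat, b_hat)  # only this policy goes to the robot
```

Only 20 real samples are used. All policy search happens in the fitted model, which is where the sample efficiency of model-based methods comes from.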


Mental Rehearsal Issues

● Simulation Biases

● Stochasticity of the real world

● Efficient optimization when sampling from a simulator


Mental Rehearsal Solutions

● Add a stochastic model of distribution to your simulation

● Average results over model uncertainty

● Artificially add noise to the simulation

– Avoids policy over-fitting

– Smooths model errors

● Explicitly model uncertainty
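Two of the remedies listed above, adding artificial noise and averaging over model uncertainty, can be sketched together. This is an illustration under assumptions: the linear model family, the Gaussian spread of its parameters, the noise level, and the names `noisy_rollout_cost` and `expected_cost` are all hypothetical.

```python
import random


def noisy_rollout_cost(k, a, b, rng, noise_std=0.05, x0=1.0, horizon=20):
    """Cost of u = -k*x in model (a, b) with artificial process noise added."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        x = a * x + b * u + rng.gauss(0.0, noise_std)
        cost += x * x + 0.1 * u * u
    return cost


def expected_cost(k, models, rng, rollouts_per_model=5):
    """Average the cost over an ensemble of plausible models."""
    total = 0.0
    for a, b in models:
        for _ in range(rollouts_per_model):
            total += noisy_rollout_cost(k, a, b, rng)
    return total / (len(models) * rollouts_per_model)


# Model uncertainty represented by sampling parameters around a nominal fit.
model_rng = random.Random(0)
models = [(model_rng.gauss(0.9, 0.05), model_rng.gauss(0.5, 0.05)) for _ in range(10)]

# Each candidate gain is scored with the same random seed so that the
# comparison between gains is not dominated by sampling luck.
candidates = [i / 10 for i in range(0, 31)]
robust_gain = min(candidates, key=lambda k: expected_cost(k, models, random.Random(1)))
```

A gain chosen this way has to work across the whole ensemble and under noise, so it is less likely to exploit a quirk of any single model.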


Grounded Simulation Learning

Iterative optimization framework for speeding up robot learning using an imperfect simulator

1. Behavior is optimized in simulation

2. Behavior is tested on robot and compared to expected results from the simulation

3. Simulator is modified using a machine-learning approach to come closer to reality


GSL: Fitness Sim

● Imperfect simulation of the robot

● Evaluates the parametrized behavior of the robot

● Function must be modifiable

● Used to make the simulation better match the real robot’s behavior.


GSL Fitness Robot

● Small number of evaluations

● Evaluates the fitness of the parametrized behavior on the robot itself


GSL Explore Robot

● A small number of explorations can be run on the real robot

● Collect states and actions relevant to the current parameterization of the behavior

● While exploring


GSL Learn

● Used to learn a model of the effects of actions on the state of the real robot.

● This model will be used to modify Fitness sim to make it better reflect the behavior on the real robot.


GSL Optimize

● In simulation

● Optimization to find better parameters
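The GSL components from the preceding slides (Fitness Sim, Fitness Robot, Explore Robot, Learn, Optimize) can be combined into a loop. The sketch below is a toy under stated assumptions: the robot is a stand-in whose actuators deliver 80% of the command, the simulator is grounded through a single hypothetical `gain` parameter, and the learning step is a least-squares fit rather than the machine-learning method used in the original work.

```python
TARGET = 1.0  # desired displacement, illustrative


def robot_effect(u):
    """Stand-in for the real robot (hypothetical): 80% of the command arrives."""
    return 0.8 * u


def fitness_sim(theta, gain):
    """Imperfect, modifiable simulator: 'gain' is the part that gets grounded."""
    return -(gain * theta - TARGET) ** 2


def fitness_robot(theta):
    """Evaluate the parametrized behavior on the robot itself (used sparingly)."""
    return -(robot_effect(theta) - TARGET) ** 2


def explore_robot(theta, n=5):
    """A few real runs near the current behavior, recording (action, effect)."""
    actions = [theta * (0.8 + 0.1 * i) for i in range(n)]
    return [(u, robot_effect(u)) for u in actions]


def learn(samples):
    """Fit the effect of actions on the real robot (least-squares gain)."""
    return sum(u * e for u, e in samples) / sum(u * u for u, _ in samples)


def optimize(gain):
    """Search for better parameters, entirely in simulation."""
    candidates = [i / 100 for i in range(1, 301)]
    return max(candidates, key=lambda th: fitness_sim(th, gain))


def grounded_simulation_learning(iterations=3):
    gain = 1.0              # the ungrounded simulator assumes perfect actuators
    theta = optimize(gain)  # step 1: optimize behavior in simulation
    history = []
    for _ in range(iterations):
        history.append((theta, fitness_robot(theta)))  # step 2: test on robot
        gain = learn(explore_robot(theta))             # step 3: ground simulator
        theta = optimize(gain)                         # step 1 again
    history.append((theta, fitness_robot(theta)))
    return theta, gain, history


theta, gain, history = grounded_simulation_learning()
```

In this toy the first grounding step already recovers the actuator gain, so the behavior found in the grounded simulator also performs well on the robot. On a real system the loop would typically need several iterations.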


Ball in Cup Real Robot Example


Ball in Cup Real Robot Results

● 42-45 episodes to get the ball in the cup

● 70-80 episodes to be consistent

● Always converged to the maximum after 100 episodes


Simulation in Robot RL

● Simulation matched recorded data very well

● Policies learned in simulation usually missed on the real robot

● First improve a demonstrated policy in simulation and only perform the fine-tuning on the real robot

● Importance sampler

– Considers only the n best previous episodes
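Restricting the update to the n best previous episodes can be sketched as a return-weighted average over those episodes. The exact update rule is not given on the slide, so the weighting below is an assumption in the style of reward-weighted policy search, and `n_best_update` is a hypothetical name.

```python
def n_best_update(current_params, episodes, n_best):
    """Move the parameters toward the best episodes, weighted by their return.

    episodes: list of (params, episode_return) with non-negative returns.
    """
    best = sorted(episodes, key=lambda ep: ep[1], reverse=True)[:n_best]
    total_return = sum(ret for _, ret in best)
    if total_return == 0:
        return list(current_params)  # nothing informative to learn from
    updated = []
    for i, p in enumerate(current_params):
        weighted_step = sum(ret * (params[i] - p) for params, ret in best)
        updated.append(p + weighted_step / total_return)
    return updated


episodes = [([1.0], 1.0), ([2.0], 3.0), ([10.0], 0.1)]
new_params = n_best_update([0.0], episodes, n_best=2)
```

The low-return episode with parameter 10.0 is discarded, so it cannot drag the update toward a poor region of parameter space.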


SARSA

● Popular base RL algorithm for robotics

● Compatible with Q-Value reuse

● The mapping Q-Value Reuse function
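A tabular SARSA update, and one way it can be combined with Q-value reuse, is sketched below. The formula from the slide is not reproduced in this text, so the sketch assumes the additive form commonly given in the transfer-learning literature: the value in the target task is the source learner's Q-value at the mapped state and action plus a newly learned correction. The class `QValueReuse` and the mapping arguments are hypothetical names.

```python
from collections import defaultdict


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy TD update using the action actually taken in the next state."""
    td_target = r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (td_target - q[(s, a)])


class QValueReuse:
    """Target-task learner that reuses a frozen source-task Q-table."""

    def __init__(self, q_source, map_state, map_action):
        self.q_source = q_source        # learned in the source task, not updated
        self.map_state = map_state      # target state -> source state
        self.map_action = map_action    # target action -> source action
        self.q_target = defaultdict(float)

    def value(self, s, a):
        mapped = (self.map_state(s), self.map_action(a))
        return self.q_source.get(mapped, 0.0) + self.q_target[(s, a)]

    def update(self, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
        """SARSA step: only the target-task correction is changed."""
        td_error = r + gamma * self.value(s_next, a_next) - self.value(s, a)
        self.q_target[(s, a)] += alpha * td_error


q = defaultdict(float)
sarsa_update(q, "s0", "go", 1.0, "s1", "go")

learner = QValueReuse({("S", "A"): 2.0}, lambda s: "S", lambda a: "A")
before = learner.value("x", "y")
learner.update("x", "y", 1.0, "x2", "y2")
after = learner.value("x", "y")
```

Because the source values enter only through `value`, the target learner starts from the source task's estimates instead of from zero and refines them with ordinary SARSA updates.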


Q-Value Reuse


Transfer Methods


● Weak Transfer: Time spent in source task doesn't count against the learner in the target

● Strong Transfer: Source time does count
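The two accounting rules can be made concrete with a small calculation. The function name and the episode counts are hypothetical and serve only to illustrate the definitions above.

```python
def transfer_benefit(target_only_time, source_time, target_time_after_transfer,
                     count_source):
    """Time saved by transfer; positive means transfer helped.

    count_source=False gives the weak-transfer view (source time is free),
    count_source=True gives the strong-transfer view (source time is charged).
    """
    total = target_time_after_transfer
    if count_source:
        total += source_time
    return target_only_time - total


# Illustrative numbers: 100 episodes without transfer, 30 episodes in the
# source task, 60 episodes in the target task after transfer.
weak = transfer_benefit(100, 30, 60, count_source=False)
strong = transfer_benefit(100, 30, 60, count_source=True)
```

With these numbers transfer saves 40 episodes under the weak criterion but only 10 under the strong one. A method can succeed under the weak criterion and still fail under the strong one if the source task is expensive.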


Two Step Transfer

● Learned sequentially from multiple source tasks

● The Q-Value Reuse function for two step