Control of a Quadrotor with Reinforcement Learning

Jemin Hwangbo, Inkyu Sa, Roland Siegwart, and Marco Hutter

Robotic Systems Lab, ETH Zurich

Presented by Nicole McNabb

University of Waterloo

June 27, 2018

Overview

1 Introduction

2 The Method

3 Empirical Results

4 Summary and Future Work

Introduction

What is a quadrotor?

Figure: Quadrotor [1]

High-level goal:

Train the quadrotor to perform tasks with varying initializations

A policy optimization problem.

Related Approaches

Deep Deterministic Policy Gradient (DDPG)

Actor-critic architecture

Off-policy, model-free

Deterministic

Insufficient exploration

Very slow (if any) convergence

Trust Region Policy Optimization (TRPO)

Actor-critic architecture

On-policy, model-free

Stochastic

Computationally intensive

Slow, unreliable convergence

A New Approach

Goal: A deterministic model with

Fast and stable convergence

Model-free training

Extensive exploration

Solution: A method combining the actor-critic architecture with an on-policy deterministic policy gradient algorithm and a new exploration strategy.

The Method

Setup

Continuous State-Action Space

State Space

18-D states, model:

Orientation (or rotation)

Position

Linear velocity of system

Angular velocity of system

Action Space

4-D actions, dictating the thrust of each rotor
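The slide does not say how the 18 state dimensions are split across the four components. As a minimal sketch, assume the orientation is stored as a flattened 3x3 rotation matrix (9 values) and that position, linear velocity, and angular velocity are 3-D vectors, which adds up to 18. The names `make_state` and `check_action` are hypothetical helpers for illustration only.

```python
from typing import List, Sequence

STATE_DIM = 18   # stated on the slide
ACTION_DIM = 4   # one thrust command per rotor, as stated on the slide


def make_state(
    rotation: Sequence[Sequence[float]],   # ASSUMPTION: 3x3 rotation matrix
    position: Sequence[float],             # ASSUMPTION: 3-D vector
    linear_velocity: Sequence[float],      # ASSUMPTION: 3-D vector
    angular_velocity: Sequence[float],     # ASSUMPTION: 3-D vector
) -> List[float]:
    """Concatenate the four components into one flat state vector."""
    flat_rotation = [float(x) for row in rotation for x in row]
    state = (
        flat_rotation
        + [float(x) for x in position]
        + [float(x) for x in linear_velocity]
        + [float(x) for x in angular_velocity]
    )
    if len(state) != STATE_DIM:
        raise ValueError(f"expected {STATE_DIM} values, got {len(state)}")
    return state


def check_action(thrusts: Sequence[float]) -> List[float]:
    """Validate that an action carries exactly one thrust per rotor."""
    if len(thrusts) != ACTION_DIM:
        raise ValueError(f"expected {ACTION_DIM} thrusts, got {len(thrusts)}")
    return [float(t) for t in thrusts]


# Example: level orientation at the origin, at rest.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
example_state = make_state(identity, [0, 0, 0], [0, 0, 0], [0, 0, 0])
```

Under this assumed layout, a policy network would map the 18 values of `example_state` to the 4 values accepted by `check_action`.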

Exploration

Figure: Exploration Strategy [2]

Network Training

Figure: Value Network [2]

Value function training: Approximate with Monte-Carlo samples obtained from the current trajectory
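To make the Monte-Carlo idea concrete, here is a minimal sketch of how regression targets for a value network could be computed from a single trajectory. The discount factor `gamma`, the optional `bootstrap` value for the state after the last step, and the function name are assumptions for illustration; the slide does not specify them.

```python
from typing import List, Sequence


def discounted_returns(
    rewards: Sequence[float],
    gamma: float = 0.99,      # ASSUMPTION: discount factor, not given on the slide
    bootstrap: float = 0.0,   # ASSUMPTION: value estimate after the final step
) -> List[float]:
    """Return the discounted sum of future rewards for every time step.

    Works backwards through the trajectory so each step reuses the
    return already computed for the step after it.
    """
    returns = [0.0] * len(rewards)
    running = bootstrap
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Each visited state would then be paired with its return as a
# supervised target for the value network.
targets = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
```

If the task is expressed with costs rather than rewards, the same recursion applies to the costs unchanged.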

Figure: Policy Network [2]

Policy optimization: Same idea as TRPO, replacing KL-divergence with Mahalanobis metric
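One way to read this swap, written as a sketch rather than the paper's exact formulation: TRPO limits how far the new stochastic policy may move from the old one in mean KL-divergence, while a deterministic policy has no distribution to compare, so the distance between old and new actions is measured instead. The symbols below are assumptions: L(θ) is the surrogate objective, δ the step-size bound, and Σ a positive-definite covariance matrix (for example, that of the exploration noise).

```latex
% TRPO-style trust region for a stochastic policy
\begin{align*}
\max_{\theta}\; L(\theta)
\quad \text{s.t.} \quad
\mathbb{E}_{s}\!\left[
  D_{\mathrm{KL}}\!\left(
    \pi_{\theta_{\mathrm{old}}}(\cdot \mid s) \,\big\|\, \pi_{\theta}(\cdot \mid s)
  \right)
\right] \le \delta
\end{align*}

% Assumed deterministic analogue: squared Mahalanobis distance between actions
\begin{align*}
\max_{\theta}\; L(\theta)
\quad \text{s.t.} \quad
\mathbb{E}_{s}\!\left[
  \bigl(\pi_{\theta}(s) - \pi_{\theta_{\mathrm{old}}}(s)\bigr)^{\top}
  \Sigma^{-1}
  \bigl(\pi_{\theta}(s) - \pi_{\theta_{\mathrm{old}}}(s)\bigr)
\right] \le \delta
\end{align*}
```

The two constraints are closely related: for two Gaussians with the same covariance Σ and means μ₁ and μ₂, the KL-divergence equals one half of the squared Mahalanobis distance between the means.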

Learning Algorithm

Algorithm 1 Policy optimization

Input: initial value function approximation, initial policy
for j = 1, 2, ... do
    Perform exploration, take action
    Compute MC estimates from current trajectory
    Do approximate value function update
    Do policy gradient update
end for
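The loop in Algorithm 1 can be sketched as a small driver that only fixes the order of the four steps. Everything passed in is a hypothetical placeholder: the callables `explore`, `mc_estimates`, `update_value`, and `update_policy`, the parameter objects, and the fixed `iterations` count are assumptions for illustration, not the authors' implementation.

```python
from typing import Any, Callable


def policy_optimization(
    explore: Callable[[Any], Any],                 # policy params -> trajectory
    mc_estimates: Callable[[Any], Any],            # trajectory -> value targets
    update_value: Callable[[Any, Any, Any], Any],  # (value params, trajectory, targets) -> value params
    update_policy: Callable[[Any, Any, Any], Any], # (policy params, value params, trajectory) -> policy params
    value_params: Any,
    policy_params: Any,
    iterations: int,                               # ASSUMPTION: finite stand-in for "j = 1, 2, ..."
):
    """Run the four steps of Algorithm 1 in order for a fixed number of iterations."""
    for _ in range(iterations):
        # Perform exploration with the current policy and record the trajectory.
        trajectory = explore(policy_params)
        # Compute Monte-Carlo estimates from that trajectory.
        targets = mc_estimates(trajectory)
        # Approximate value function update toward the MC targets.
        value_params = update_value(value_params, trajectory, targets)
        # Policy gradient update using the freshly updated value function.
        policy_params = update_policy(policy_params, value_params, trajectory)
    return value_params, policy_params
```

Keeping the steps as injected callables makes the ordering explicit: the value function is refreshed before every policy update, so both are fitted on data from the current policy.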

Empirical Results

Training done in simulation

Testing on two main tasks done on a real quadrotor

Summary and Future Work

Summary

Primary contributions:

A new deterministic, model-free neural network policy for controlling a quadrotor

Stable and reliable performance on hard tasks, even under harsh initial conditions

Future Research

Compare the method against PPO

Introduce a more accurate model of the system into the simulation

Train an RNN to adapt to model errors automatically

References

[1] https://www.seeedstudio.com/Crazyflie-2.0-p-2103.html

[2] Jemin Hwangbo, Inkyu Sa, Roland Siegwart, and Marco Hutter. Control of a Quadrotor with Reinforcement Learning. IEEE Robotics and Automation Letters, June 2017.

Questions?
