Reinforcement Learning for Autonomous Quadrotor Helicopter Control

Michael Koval, Christopher Mansley, and Michael Littman

Rutgers, the State University of New Jersey

Abstract

- Quadrotor helicopters are rapidly finding use in a number of applications
- Programming a new behavior means implementing a control algorithm
- Tuning a new algorithm is costly and requires expertise in control theory
- Supervised learning is not a viable alternative: training data is hard to get
- Reinforcement learning can learn directly from the quadrotor's raw sensors
- Goal: Use reinforcement learning to teach a quadrotor helicopter how to perform complex tasks with minimal human interaction

Experimental Setup

1. Parrot AR.Drone [4]. Inexpensive quadrotor helicopter with impressive technical specifications. Outfitted with two cameras, an altitude sensor, and an IMU. On-board computer running a variant of embedded Linux. Communicates with a controlling computer over a dedicated Wi-Fi network

2. Vicon Motion Tracking System. Quadrotor is tagged with an asymmetric pattern of reflective spheres. The Vicon system is configured to track this pattern as a rigid body and follows the quadrotor's flight with its infrared cameras

3. Control Software. Exerts direct, low-level control over the AR.Drone’s flight. Closes the loop between the motion tracking system and the AR.Drone. Uses reinforcement learning to find an optimal control algorithm

4. Robot Operating System (ROS). Inter-process communication framework developed by Willow Garage. Allows the system to be easily extended for more complex experiments (a minimal control-node sketch follows below)
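As a rough illustration of how the control software could close the loop in ROS, the Python node below subscribes to a pose estimate and publishes velocity commands to the AR.Drone driver. The topic names, the pose message type, the setpoint, and the simple proportional hover law are illustrative assumptions, not the exact interfaces used in this system; in the actual system the learned policy replaces the hand-tuned feedback.

#!/usr/bin/env python
# Minimal closed-loop controller sketch: motion-capture pose in, velocity command out.
# Topic names and the proportional hover law are illustrative assumptions.
import rospy
from geometry_msgs.msg import Twist, PoseStamped

class HoverController(object):
    def __init__(self):
        self.setpoint = (0.0, 0.0, 1.0)  # hover 1 m above the origin (assumed)
        self.cmd_pub = rospy.Publisher('cmd_vel', Twist, queue_size=1)
        rospy.Subscriber('vicon/ardrone/pose', PoseStamped, self.pose_callback)

    def pose_callback(self, msg):
        # Proportional feedback on position error; a learned policy would replace this.
        k = 0.5
        cmd = Twist()
        cmd.linear.x = k * (self.setpoint[0] - msg.pose.position.x)
        cmd.linear.y = k * (self.setpoint[1] - msg.pose.position.y)
        cmd.linear.z = k * (self.setpoint[2] - msg.pose.position.z)
        self.cmd_pub.publish(cmd)

if __name__ == '__main__':
    rospy.init_node('hover_controller')
    HoverController()
    rospy.spin()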

[System diagram connecting the joystick teleoperation interface, the AR.Drone, the Vicon system, the Kalman filter, and the learning algorithm.]

Reinforcement Learning

- Agent learns from rewards earned while interacting with a stochastic environment
- Does not require annotated examples of the agent's correct actions
- Ideal for situations where rewards can be assigned automatically
- Environment is modeled as a Markov decision process where, at time t (see the interaction-loop sketch after this list):
  1. the agent is in state s_t ∈ S,
  2. chooses action a_t ∼ π(s_t, a) using policy π, and
  3. receives reward r_t
- Goal is to find the policy π* that maximizes the agent's expected reward:

  J|π = E{ Σ_{t=0}^∞ γ^t r_t }

- Popular algorithms include Q-learning, TD learning, and actor-critic
- Policy gradient and natural actor-critic are the most popular in robotics
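To make the interaction loop concrete, the sketch below rolls out one episode of a toy MDP and accumulates the discounted return J. The dynamics, policy, horizon, and discount factor here are hypothetical placeholders for illustration, not the quadrotor system.

import random

def step(state, action):
    """Toy stochastic dynamics: drift plus noise, reward penalizes distance from zero."""
    next_state = state + action + random.gauss(0.0, 0.1)
    reward = -next_state ** 2
    return next_state, reward

def policy(state):
    """Toy stochastic policy: Gaussian action centered on -0.5 * state."""
    return random.gauss(-0.5 * state, 0.2)

def discounted_return(horizon=100, gamma=0.95):
    """Roll out one episode and accumulate J = sum_t gamma^t * r_t."""
    state, J = 1.0, 0.0
    for t in range(horizon):
        action = policy(state)               # a_t ~ pi(s_t, a)
        state, reward = step(state, action)  # environment transition yields r_t
        J += (gamma ** t) * reward
    return J

print(discounted_return())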

Simulation: Linear Quadratic Regulator

[Figure: a surface plot of the long-term expected reward J(θ) over the policy parameters θ1 and θ2, and a plot of the estimated gradient of the expected reward over the weights w1 and w2.]

- Simple one-dimensional control theory problem with s ∈ R, a ∈ R, and the environment [2]:

  s_{t+1} = s_t + a_t + noise
  R_t = −s_t^2 − a_t^2

- Stochastic controller draws actions from a_t ∼ N(µ_θ, σ_θ), where µ_θ = θ1·s_t and σ_θ = (1 + e^{−θ2})^{−1} [2]
- Vanilla policy gradient was simulated using a central-difference estimator to compute gradients [3] (a small simulation sketch follows below)
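The simulated problem is small enough to sketch directly. The code below implements the dynamics and the sigmoid-parameterized Gaussian controller described above; the noise scale, horizon, initial state, and roll-out count are arbitrary choices for illustration.

import math
import random

def rollout(theta, horizon=20):
    """Total reward of one episode of the LQR problem under parameters theta = (theta1, theta2)."""
    theta1, theta2 = theta
    s, total = 1.0, 0.0
    for _ in range(horizon):
        mu = theta1 * s                           # mean of the Gaussian policy
        sigma = 1.0 / (1.0 + math.exp(-theta2))   # sigmoid-parameterized standard deviation
        a = random.gauss(mu, sigma)               # a_t ~ N(mu_theta, sigma_theta)
        total += -s ** 2 - a ** 2                 # R_t = -s_t^2 - a_t^2
        s = s + a + random.gauss(0.0, 0.1)        # s_{t+1} = s_t + a_t + noise
    return total

def expected_reward(theta, rollouts=200):
    """Average multiple noisy roll-outs to estimate J(theta)."""
    return sum(rollout(theta) for _ in range(rollouts)) / rollouts

print(expected_reward((-0.5, 0.0)))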

Results: Linear Quadratic Regulator Simulation

[Figure: the policy parameters θ1 and θ2 plotted against iteration t over 400 iterations.]

- Control theory proves that the theoretical optimum is at θ1* = −0.58 and θ2* = −∞ [2]
- Vanilla gradient ascent quickly converges to θ1 = θ1* and continues moving toward θ2 = θ2*

Policy Gradient

- Assumes the policy π(s, a | θ) is uniquely parameterized by θ
- Learning the optimal policy π* is now simplified to learning the optimal θ*
- For any given θ, ∇J|θ points in the direction of maximum increase
- Step the parameters by a small amount in the direction of ∇J|θ_i (a concrete ascent loop is sketched below):

  θ_{i+1} = θ_i + α·∇J|θ_i

- Gradually decrease α until the algorithm converges to a local maximum
- Problem: this requires an empirical estimate of the gradient ∇J|θ
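A minimal sketch of this ascent loop, assuming some gradient estimator such as the finite-difference method described in the next section; the step size, decay schedule, and iteration count are illustrative choices.

def gradient_ascent(theta, estimate_gradient, alpha=0.1, decay=0.99, iters=400):
    """Repeatedly step theta in the direction of an estimated gradient of J."""
    for _ in range(iters):
        grad = estimate_gradient(theta)                       # empirical estimate of grad J at theta
        theta = [t + alpha * g for t, g in zip(theta, grad)]  # theta_{i+1} = theta_i + alpha * grad
        alpha *= decay                                        # gradually decrease the step size
    return theta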

Finite Difference Gradient Estimate

- Simple form of gradient approximation that requires no knowledge of π
- Slightly perturb θ with several small changes Δθ_1, Δθ_2, ...
- Approximate J(θ + Δθ_i) for each Δθ_i by averaging multiple roll-outs [3]
- Construct ΔΘ and ΔJ such that row i of ΔΘ is Δθ_i, ΔJ_i = J(θ + Δθ_i) − J(θ), and solve the least-squares system (a code sketch follows below):

  ∇J|θ = (ΔΘ^T ΔΘ)^{−1} ΔΘ^T ΔJ
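A sketch of this estimator, assuming a function J that averages multiple roll-outs for a given parameter vector (for example, expected_reward from the LQR sketch above); the perturbation scale and the number of perturbations are arbitrary choices.

import numpy as np

def finite_difference_gradient(J, theta, n_perturbations=20, scale=0.05):
    """Estimate grad J at theta by regressing reward changes on random perturbations."""
    theta = np.asarray(theta, dtype=float)
    J_ref = J(theta)
    dTheta = []   # rows are the perturbations delta-theta_i
    dJ = []       # entries are J(theta + delta-theta_i) - J(theta)
    for _ in range(n_perturbations):
        delta = np.random.normal(0.0, scale, size=theta.shape)
        dTheta.append(delta)
        dJ.append(J(theta + delta) - J_ref)
    dTheta, dJ = np.array(dTheta), np.array(dJ)
    # Least-squares solution: grad J = (dTheta^T dTheta)^{-1} dTheta^T dJ
    grad, _, _, _ = np.linalg.lstsq(dTheta, dJ, rcond=None)
    return grad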

Likelihood Ratio Gradient Estimate

- Requires specific knowledge of the policy in order to estimate ∇π(s, a | θ) [2]
- Estimates ∇J|θ without requiring the policy to be perturbed:

  ∇J|θ = E{ Σ_{t=0}^∞ (r_t − b)·∇ log π(s_t, a_t | θ) }

- Faster convergence than finite-difference methods in noisy environments (a sketch for the Gaussian LQR controller follows below)
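For the Gaussian LQR controller above, ∇ log π has a closed form, so the estimator can be sketched directly. The horizon, episode count, noise scale, and constant baseline b are illustrative assumptions.

import math
import random

def likelihood_ratio_gradient(theta, episodes=100, horizon=20, baseline=0.0):
    """Estimate grad J for the LQR policy mu = theta1*s, sigma = sigmoid(theta2)."""
    theta1, theta2 = theta
    g1, g2 = 0.0, 0.0
    for _ in range(episodes):
        s = 1.0
        for _ in range(horizon):
            sigma = 1.0 / (1.0 + math.exp(-theta2))
            mu = theta1 * s
            a = random.gauss(mu, sigma)
            r = -s ** 2 - a ** 2
            # Gradient of log N(a; mu, sigma) with respect to theta1 and theta2.
            dlog_dtheta1 = (a - mu) * s / sigma ** 2
            dlog_dtheta2 = ((a - mu) ** 2 / sigma ** 2 - 1.0) * (1.0 - sigma)
            g1 += (r - baseline) * dlog_dtheta1
            g2 += (r - baseline) * dlog_dtheta2
            s = s + a + random.gauss(0.0, 0.1)
    return g1 / episodes, g2 / episodes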

Natural Policy Gradient

- The gradient is defined in terms of Euclidean distance:

  lim_{Δθ→0} ( ||J(θ + Δθ) − J(θ) − ∇J(θ)·Δθ||_2 / ||Δθ||_2 ) = 0

- Parameters have arbitrary units, making the parameter space highly non-Euclidean
- Vanilla gradient no longer points in the direction of maximum increase [3]
- Natural gradient is rotated to face the direction of maximum increase [3], where G_θ is the Fisher information matrix of the policy:

  ∇̃J|θ = G_θ^{−1}·∇J|θ

- Dramatically better convergence rates than vanilla policy gradient [1] (a short sketch of the correction follows below)
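A sketch of the natural-gradient correction, assuming G_θ is estimated as the empirical Fisher information matrix from sampled ∇ log π vectors (such as those computed inside the likelihood-ratio estimator above); the ridge term added before inversion is a numerical-stability assumption.

import numpy as np

def natural_gradient(grad_J, grad_log_samples, eps=1e-6):
    """Premultiply the vanilla gradient by the inverse empirical Fisher information matrix."""
    grads = np.asarray(grad_log_samples)           # one grad log pi(s, a | theta) per row
    G = grads.T @ grads / len(grads)               # empirical Fisher information matrix G_theta
    G += eps * np.eye(G.shape[0])                  # regularize before inverting
    return np.linalg.solve(G, np.asarray(grad_J))  # G^{-1} grad J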

References

[1] S. I. Amari. Natural gradient works efficiently in learning. Neural Computation, 10(2):251–276, 1998.

[2] H. Kimura and S. Kobayashi. Reinforcement learning for continuous action using stochastic gradient ascent. In Intelligent Autonomous Systems (IAS-5), pages 288–295, 1998.

[3] J. Peters and S. Schaal. Policy gradient methods for robotics. In Intelligent Robots and Systems, 2006 IEEE/RSJ International Conference on, pages 2219–2225. IEEE, 2006.

[4] Stephane Piskorski. AR.Drone Developers Guide SDK 1.5. Parrot, 2010.

mkoval@eden.rutgers.edu, cmansley@cs.rutgers.edu, mlittman@cs.rutgers.edu
