
Control Regularization for Reduced Variance Reinforcement Learning

Richard Cheng, Abhinav Verma, Gabor Orosz, Swarat Chaudhuri, Yisong Yue, Joel W. Burdick

Reinforcement Learning

Reinforcement learning (RL) studies how to use data from interactions with the environment to learn an optimal policy:

Policy: $\pi_\theta(a \mid s): S \times A \rightarrow [0, 1]$

Trajectory: $\tau: (s_t, a_t, \ldots, s_{t+N}, a_{t+N})$

Reward Optimization: $\max_\theta J(\theta) = \max_\theta \, \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\sum_t \gamma^t\, r(s_t, a_t)\right]$

[Figure from Sergey Levine]

Policy gradient-based optimization with no prior information (Williams, 1992; Sutton et al., 1999; Baxter and Bartlett, 2000; Greensmith et al., 2004).
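As a concrete illustration of this objective (my own sketch, not code from the paper or talk), a basic REINFORCE-style estimator in Python; the per-trajectory inputs rewards and grad_log_probs are assumed given by the sampling loop.

import numpy as np

def discounted_return(rewards, gamma):
    """R(tau) = sum_t gamma^t * r(s_t, a_t) for one sampled trajectory."""
    return sum(gamma**t * r for t, r in enumerate(rewards))

def reinforce_gradient(grad_log_probs, rewards, gamma):
    """Basic score-function estimate: (sum_t grad_theta log pi(a_t|s_t)) * R(tau).

    grad_log_probs: per-step gradients of log pi_theta(a_t | s_t) w.r.t. theta
    rewards:        per-step rewards r(s_t, a_t)
    """
    return discounted_return(rewards, gamma) * np.sum(grad_log_probs, axis=0)

This basic estimator is exactly the kind of high-variance gradient estimate the next slide is concerned with.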

Variance in Reinforcement Learning

Policy gradients allow us to optimize the policy with no prior information (only sampled trajectories from interactions).

However, RL methods suffer from high variance in learning (Islam et al., 2017; Henderson et al., 2018). See also: Greensmith et al., 2004; Zhao et al., 2012; Zhao et al., 2015; Thodoroff et al., 2018.

[Figure: inverted pendulum, 10 random seeds. Figures from Kris Hauser and Alex Irpan.]


However, is learning with no prior information at all necessary, or even desirable?

Often we already have a control prior: $a = u_{prior}(s)$.

Cartpole example: an LQR controller designed for the approximate, control-affine dynamics $s_{t+1} \approx f(s_t) + g(s_t)\, a_t$.

The nominal controller is stable, but it is based on:
• An error-prone model
• Linearized dynamics
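Below is a minimal sketch (not the paper's code) of how such an LQR prior can be computed with scipy.linalg.solve_continuous_are; the A, B matrices and the cost weights Q, R are illustrative placeholders standing in for a linearization of the cartpole about its upright equilibrium.

import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative linearized dynamics x_dot = A x + B u for a cartpole-like system
# (state: [cart position, cart velocity, pole angle, pole angular velocity]).
# The numerical values are placeholders, not the paper's model.
A = np.array([[0.0, 1.0,  0.0, 0.0],
              [0.0, 0.0, -0.7, 0.0],
              [0.0, 0.0,  0.0, 1.0],
              [0.0, 0.0, 15.0, 0.0]])
B = np.array([[0.0], [1.0], [0.0], [-2.0]])

Q = np.diag([1.0, 1.0, 10.0, 1.0])   # state cost: penalize pole angle most
R = np.array([[0.1]])                # control cost

# Solve the continuous-time algebraic Riccati equation and form the LQR gain.
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)      # K = R^{-1} B^T P

def u_prior(s):
    """Nominal LQR control prior: a = -K s (stabilizing for the linear model)."""
    return -(K @ s)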

Regularization with a Control Prior

Combine the control prior, $u_{prior}(s)$, with the learned controller, $u_{\theta_k}(s)$, sampled from $\pi_{\theta_k}(a \mid s)$, into the mixed control

$u_k(s) = \frac{1}{1+\lambda}\, u_{\theta_k}(s) + \frac{\lambda}{1+\lambda}\, u_{prior}(s)$

(related approaches: Johannink et al., 2018; Silver et al., 2019).

$\pi_{\theta_k}$ is learned in the same manner as before, with samples drawn from the new, mixed distribution.

$\lambda$ is a regularization parameter weighting the prior against the learned controller.

Under the assumption of Gaussian exploration noise (i.e., $\pi_{\theta_k}(a \mid s) = \mathcal{N}(u_{\theta_k}(s), \Sigma)$), the mixed policy is the Gaussian $\pi_k(a \mid s) = \mathcal{N}\!\left(\tfrac{u_{\theta_k}(s) + \lambda\, u_{prior}(s)}{1+\lambda},\, \tfrac{\Sigma}{(1+\lambda)^2}\right)$, which can be equivalently expressed as a constrained optimization problem.
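A minimal sketch of the mixed-action computation above (illustrative names, not the CORE-RL repository's API); it assumes the $1/(1+\lambda)$ vs. $\lambda/(1+\lambda)$ weighting written above.

import numpy as np

def mixed_action(a_theta, a_prior, lam):
    """Blend a sampled RL action with the control prior's action.

    a_theta: action sampled from the learned policy pi_theta_k(a | s)
    a_prior: action from the control prior u_prior(s)
    lam:     regularization weight (lam = 0 -> pure RL; large lam -> prior dominates)
    """
    return (a_theta + lam * a_prior) / (1.0 + lam)

# Illustrative rollout step (env, policy, and u_prior are assumed objects):
# a_theta = policy.sample(s)                      # Gaussian exploration around u_theta_k(s)
# a = mixed_action(a_theta, u_prior(s), lam=4.0)  # executed action
# s_next, r, done = env.step(a)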

Interpretation of the Prior

Theorem 1. Using the mixed policy above, the variance from each policy gradient step is reduced by the factor $\frac{1}{(1+\lambda)^2}$.

However, this may introduce bias into the policy; the bias bound is stated in terms of the total variation distance between the two policies.


Strong regularization: the control prior heavily constrains exploration; the policy stabilizes to the red trajectory but may miss the green one.

Weak regularization: greater room for exploration, but the policy may not stabilize around the red trajectory.
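A short derivation sketch in LaTeX of only the covariance-scaling step behind Theorem 1, assuming the Gaussian mixed policy written earlier (not the paper's full proof):

% If a_theta ~ N(u_{theta_k}(s), Sigma) and u_prior(s) is deterministic, the
% executed mixed action a_k has covariance shrunk by (1+lambda)^{-2}:
\[
a_k \;=\; \frac{a_\theta + \lambda\, u_{prior}(s)}{1+\lambda},
\qquad
\operatorname{Cov}(a_k) \;=\; \frac{\operatorname{Cov}(a_\theta)}{(1+\lambda)^2}
\;=\; \frac{\Sigma}{(1+\lambda)^2}.
\]
% The exploration noise entering each policy-gradient step is therefore scaled
% by 1/(1+lambda)^2, while the mean is pulled toward the prior (the bias term).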

Stability Properties from the Prior

Regularization allows us to "capture" stability properties from a robust control prior.

Theorem 2. Assume a stabilizing $\mathcal{H}_\infty$ control prior within the set $\mathcal{C}$ for the dynamical system (Eq. 14 in the paper). Then asymptotic stability and forward invariance of the set $\mathcal{S}_{st} \subseteq \mathcal{C}$ are guaranteed under the regularized policy for all $s \in \mathcal{C}$.

Cartpole: with a robust control prior, the regularized controller always remains near the equilibrium point, even during learning.
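A toy numerical illustration (not the paper's $\mathcal{H}_\infty$ analysis) of how the mixed controller inherits stability from the prior in a linear setting: as $\lambda$ grows, the closed-loop spectral radius under the mixed gain moves toward that of the stabilizing prior. All matrices and gains below are made-up placeholders.

import numpy as np

# Toy check: for x_{t+1} = A x + B u, mixing a possibly-destabilizing learned
# gain K_rl with a stabilizing prior gain K_prior pulls the closed-loop
# dynamics toward the stable prior as lambda grows.
A = np.array([[1.1, 0.2], [0.0, 0.9]])     # illustrative open-loop dynamics (unstable)
B = np.array([[0.0], [1.0]])
K_prior = np.array([[0.9, 1.1]])           # assumed stabilizing prior gain
K_rl = np.array([[-2.0, 0.5]])             # a "bad" learned gain

for lam in [0.0, 1.0, 4.0, 20.0]:
    K_mix = (K_rl + lam * K_prior) / (1.0 + lam)
    eigs = np.linalg.eigvals(A - B @ K_mix)
    print(lam, np.max(np.abs(eigs)))       # spectral radius < 1 => stable (discrete time)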

Results

Car-following: data gathered from a chain of cars following each other; the goal is to optimize the fuel efficiency of the middle car.

Racecar: the goal is to minimize the lap time of a simulated racecar.

Control Regularization helps by providing:

• Reduced variance
• Higher rewards
• Faster learning
• Potential safety guarantees

However, high regularization also leads to potential bias.

See the poster for similar results on the CartPole domain.

Code at: https://github.com/rcheng805/CORE-RL

Poster Number: 42
