Page 1:

Deep Reinforcement Learning Based Maintenance Free Control
Nathan Lawrence ~ UBC Mathematics
Collaborative R&D between UBC (Bhushan Gopaluni, Philip Loewen, Greg Stewart) and Honeywell (Michael Forbes, Johan Backstrom)

[Diagram: a Reinforcement Learning Agent layered over a conventional controller (e.g. PID or MPC).]

Page 2:

Overview

• Deep Reinforcement Learning as a (model-free) framework for control in industrial settings

➡ Utilize new and historical data

➡ Control with minimal disruptions to the plant and minimal human intervention

[Diagram: closed loop in which a controller (e.g. PID or MPC) drives the plant through actuators and sensors, with a Reinforcement Learning Agent overseeing the controller.]
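To make the picture concrete, here is a minimal runnable sketch of this arrangement: a PI controller stays in the loop on a toy first-order plant while an outer tuning loop adjusts its gains from episode rewards. The plant model and the random-search "agent" are illustrative stand-ins, not the method from the talk.

```python
import numpy as np

# Minimal sketch of the loop above: a PI controller stays online while an
# outer "agent" adjusts its gains between episodes. The first-order plant
# and the random-search agent are illustrative stand-ins.

def episode_reward(kp, ki, dt=0.1, n=200, setpoint=1.0):
    """Run a PI loop on the toy plant y' = -y + u; return -sum of e^2."""
    y, integral, cost = 0.0, 0.0, 0.0
    for _ in range(n):
        e = setpoint - y
        integral += e * dt
        u = kp * e + ki * integral   # controller computes the actuator move
        y += dt * (-y + u)           # plant responds; sensor reads new y
        cost += e ** 2
    return -cost

# Outer tuning loop: propose perturbed gains, keep improvements.
rng = np.random.default_rng(0)
gains, best = np.array([0.5, 0.1]), -np.inf
for _ in range(100):
    trial = gains + 0.05 * rng.standard_normal(2)
    r = episode_reward(*trial)
    if r > best:
        gains, best = trial, r
print("tuned (kp, ki):", gains.round(3), "reward:", round(best, 3))
```

Note the division of labour: the conventional controller handles every sample, so the plant is never left without a controller while the agent learns.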

Prior Art Tuning Techniques

(1) Prior art: semi-manual tuning

(2) Control tuning with DRL

(3) Replace controller with DRL

[Diagrams: each approach drawn as a closed loop through the plant's actuators and sensors, split into offline and online stages; in (2) and (3) a deep neural network / Reinforcement Learning Agent sits in the loop.]

Page 3:

Initial work

• Deep RL for set-point tracking

➡ Actor = controller = deep neural network

• Tracking performance and adaptability are promising, but there are issues with sample efficiency, stability, interpretability, compatibility, and tuning of hyper-parameters

[Diagram: DDPG training loop. The Actor applies action a_t to the Process and observes y_{t+1} (z^{-1} marks a unit delay); transitions ⟨s_t, a_t, r_t, s_{t+1}⟩ are stored in Replay Memory and sampled for the actor, critic, and target-network updates.]
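A compact sketch of the update step this diagram describes, following the DDPG recipe of Lillicrap et al.: sampled transitions train the critic against a bootstrapped target, the actor ascends the critic's score, and target networks trail by Polyak averaging. Network sizes, learning rates, and the smoke-test batch are illustrative choices, not values from the talk.

```python
import copy
import torch
import torch.nn as nn

obs_dim, act_dim, gamma, tau = 3, 1, 0.99, 0.005

actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_targ, critic_targ = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(s, a, r, s2):
    # Critic: regress Q(s_t, a_t) toward the bootstrapped TD target.
    with torch.no_grad():
        q_next = critic_targ(torch.cat([s2, actor_targ(s2)], dim=-1))
        target = r + gamma * q_next
    critic_loss = ((critic(torch.cat([s, a], dim=-1)) - target) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: ascend the critic's score of the actor's own actions.
    actor_loss = -critic(torch.cat([s, actor(s)], dim=-1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Target networks: slow Polyak averaging toward the online networks.
    for net, targ in ((actor, actor_targ), (critic, critic_targ)):
        for p, pt in zip(net.parameters(), targ.parameters()):
            pt.data.mul_(1 - tau).add_(tau * p.data)

# Smoke test on a random mini-batch standing in for replay-memory samples.
b = 32
ddpg_update(torch.randn(b, obs_dim), torch.rand(b, act_dim) * 2 - 1,
            torch.randn(b, 1), torch.randn(b, obs_dim))
```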

Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971. 2015.

Spielberg S, Tulsyan A, Lawrence NP, Loewen PD, Bhushan Gopaluni R. Toward self-driving processes: A deep reinforcement learning approach to control. AIChE J. 2019;e16689.

Page 4:

Back to basics

• PID fits naturally in the actor-critic framework

• Simple, industrially-accepted control structure with straightforward initialization

Lawrence NP, Stewart GE, Loewen PD, Forbes MG, Backstrom JU, Bhushan Gopaluni R. Optimal PID and antiwindup control design as a reinforcement learning problem. IFAC World Congress, submitted.

$u(t) = k_p e(t) + k_i \int_0^t e(\tau)\, d\tau + k_d \frac{d}{dt} e(t)$
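Discretized, this law is linear in three error features, so the gains (k_p, k_i, k_d) can serve directly as the trainable weights of a shallow actor. A minimal PyTorch sketch, with the sampling period dt an assumed value:

```python
import torch

class PIDActor(torch.nn.Module):
    """PID law as a differentiable actor: the gains are the only weights."""
    def __init__(self, kp=1.0, ki=0.1, kd=0.01, dt=0.1):
        super().__init__()
        self.gains = torch.nn.Parameter(torch.tensor([kp, ki, kd]))
        self.dt = dt
        self.integral, self.prev_error = 0.0, 0.0

    def forward(self, error):
        self.integral = self.integral + error * self.dt   # running integral of e
        derivative = (error - self.prev_error) / self.dt  # finite-difference de/dt
        self.prev_error = error
        features = torch.stack([error, self.integral, derivative])
        return self.gains @ features   # u = kp*e + ki*int(e) + kd*de/dt

actor = PIDActor()
u = actor(torch.tensor(0.5))   # control move for tracking error e = 0.5
u.backward()                   # gradients flow to (kp, ki, kd)
print("u =", u.item(), "du/d(gains) =", actor.gains.grad)
```

Because the gains are ordinary network parameters, the same policy-gradient machinery that trains a deep actor can tune this PID, while the controller itself remains three interpretable numbers.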

Page 5:

Anti-windup

• Integral action with actuator constraints can lead to integral windup

• PID + AW is the (nonlinear) actor


[Diagram: the PID + anti-windup parameters (k_p, k_i, k_d, ρ) define the actor (controller) mapping state s_t to action a_t, while a critic Q_t scores the performance of the pair (s_t, a_t).]

$a_t = \mathrm{sat}(k_p e + k_i I_y + k_d D + \rho I_u)$
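A small sketch of this saturated actor. Reading I_y as the integral of the error and I_u as the accumulated gap between saturated and unsaturated control (a back-calculation scheme) is an assumption about the slide's notation; ρ is the learnable anti-windup gain.

```python
import numpy as np

def sat(u, lo=-1.0, hi=1.0):
    """Actuator saturation."""
    return float(np.clip(u, lo, hi))

class PIDAntiWindupActor:
    # Sketch of a_t = sat(kp*e + ki*Iy + kd*D + rho*Iu); the reading of Iu as
    # the integrated saturation gap (back-calculation) is an assumption.
    def __init__(self, kp, ki, kd, rho, dt=0.1):
        self.kp, self.ki, self.kd, self.rho, self.dt = kp, ki, kd, rho, dt
        self.Iy = self.Iu = self.prev_e = 0.0

    def act(self, e):
        self.Iy += e * self.dt                       # Iy: integral of error
        D = (e - self.prev_e) / self.dt              # D: derivative of error
        self.prev_e = e
        v = (self.kp * e + self.ki * self.Iy
             + self.kd * D + self.rho * self.Iu)     # unsaturated control
        a = sat(v)                                   # respect actuator limits
        self.Iu += (a - v) * self.dt                 # Iu: accumulated windup gap
        return a

actor = PIDAntiWindupActor(kp=2.0, ki=1.0, kd=0.0, rho=0.5)
print([round(actor.act(e), 3) for e in (1.0, 0.8, 0.5)])
```

When the actuator saturates, the ρ·I_u term bleeds charge out of the controller, which is exactly what makes the actor nonlinear and the windup bounded.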