Fusing Machine Learning & Control Theory With Applications to Smart Buildings & ActionWebs UC Berkeley ActionWebs Meeting November 03, 2010 By Jeremy Gillula [Some rights reserved unless otherwise noted; see http://tinyurl.com/2qn665]


Page 1:

Fusing Machine Learning & Control Theory With Applications to Smart Buildings & ActionWebs

UC Berkeley ActionWebs Meeting, November 03, 2010

By Jeremy Gillula

[Some rights reserved unless otherwise noted; see http://tinyurl.com/2qn665]

Page 2:

c Nov 03, 2010 ActionWebs Talk (J. Gillula) 2

Talk Outline

Current “State of the Art”
  Reinforcement learning and apprenticeship learning
  Reachability for guaranteed safe mode switching
Motivation
Goals – Combining Machine Learning and Control Theory
Existing Approaches
Current Research
Extensions
Conclusions
Questions

Page 3:

“An Application of Reinforcement Learning to Aerobatic Helicopter Flight” (Abbeel et al., 2007)

Use linear regression to learn the parameters of a given model
Use differential dynamic programming to solve the MDP
  Generate a trajectory using the current policy and the nonlinear dynamics
  Compute a new policy using LQR and the dynamics linearized around that trajectory
Reward function generated using apprenticeship learning

[Video from Abbeel et al., 2007]

Analysis:
  Great performance
  No formal safety analysis
  Required some hand-tweaking for stability (e.g. hand-chosen reward weights)
  Easily generalizable
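
The policy-update step on this slide (LQR on dynamics linearized around a trajectory) can be illustrated with a minimal sketch. It is not the implementation from Abbeel et al.: it linearizes around a single operating point rather than along a whole trajectory, and the toy pendulum-like dynamics `f`, the step size `dt`, and the cost weights are assumptions chosen only for illustration.

```python
import numpy as np

def linearize(f, x0, u0, eps=1e-6):
    """Finite-difference Jacobians A = df/dx and B = df/du at (x0, u0)."""
    f0 = f(x0, u0)
    A = np.column_stack([(f(x0 + eps * e, u0) - f0) / eps for e in np.eye(x0.size)])
    B = np.column_stack([(f(x0, u0 + eps * e) - f0) / eps for e in np.eye(u0.size)])
    return A, B

def lqr_gains(A, B, Q, R, horizon):
    """Finite-horizon discrete-time LQR via the backward Riccati recursion.

    Returns the feedback gains in time order; the control is u_k = -K_k x_k.
    """
    P = Q.copy()  # terminal cost (assumed equal to Q here)
    gains = []
    for _ in range(horizon):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]

dt = 0.1  # hypothetical time step

def f(x, u):
    """Toy nonlinear dynamics (inverted-pendulum-like), assumed for illustration."""
    return np.array([x[0] + dt * x[1], x[1] + dt * (np.sin(x[0]) + u[0])])

A, B = linearize(f, np.zeros(2), np.zeros(1))
K = lqr_gains(A, B, np.eye(2), np.eye(1), horizon=200)[0]
```

In the full iterative scheme the linearization would be repeated at every point of the most recent rollout, giving time-varying `A`, `B`, and `K`.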

Page 4:

“Design of Guaranteed Safe Maneuvers Using Reachable Sets…” (Gillula et al., 2010)

Safe given accuracy of model and worst-case disturbances

Used reachability analysis via level-set methods to design and perform a safe backflip

Page 5:

“A…Hamilton–Jacobi Formulation of Reachable Sets for Continuous Dynamic Games” (Mitchell et al., 2005)

Create a level set function such that:
  The boundary of the keep-out set K is defined implicitly by the function's zero level set
  The function is negative inside the keep-out region and positive outside
Reachability as a game:
  The disturbance attempts to force the system into the unsafe region; the control attempts to stay safe
Solution can be found via Hamilton-Jacobi-Bellman PDE:

[Figure from Tomlin 2009]
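
The game described on this slide can be illustrated on a one-dimensional toy problem. The sketch below is a discrete-time dynamic-programming approximation, not the level-set PDE solver used by Mitchell et al.; the dynamics x' = u + d, the input bounds, the grid, and the keep-out set K = {|x| < 1} are all assumptions made for illustration. It follows the slide's sign convention: the function is negative inside the unsafe region and positive outside.

```python
import numpy as np

def avoid_value(xs, l0, u_max, d_max, dt, steps):
    """Approximate safety value function for x' = u + d on the grid xs.

    l0 describes the keep-out set implicitly (negative inside, positive
    outside). The result is negative wherever the disturbance can force the
    state into the keep-out set within steps*dt, whatever the control does.
    """
    us = (-u_max, 0.0, u_max)  # candidate controls (extremes plus zero)
    ds = (-d_max, 0.0, d_max)  # candidate disturbances
    v = l0.copy()
    for _ in range(steps):
        # For each control, the disturbance picks the worst next state...
        worst = [np.min([np.interp(xs + dt * (u + d), xs, v) for d in ds], axis=0)
                 for u in us]
        # ...and the control picks the best of those worst cases.
        best = np.max(worst, axis=0)
        # A state already inside the keep-out set stays marked unsafe.
        v = np.minimum(l0, best)
    return v

xs = np.linspace(-3.0, 3.0, 601)
l0 = np.abs(xs) - 1.0  # keep-out set K = {|x| < 1}
v = avoid_value(xs, l0, u_max=0.5, d_max=1.0, dt=0.05, steps=20)
unsafe_edge = xs[v < 0].max()  # outer edge of the computed unsafe set
```

With the disturbance stronger than the control, the unsafe set grows beyond K over the horizon; with the control stronger, it stays equal to K.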

Page 6:

“Design of Guaranteed Safe Maneuvers Using Reachable Sets…” (Gillula et al., 2010)

[Figure labels: Recovery, Drift, Impulse]

Analysis:
  Decent performance
  Formal safety analysis
  Required human input for choosing design parameters
  Difficult to generalize

Page 7:

Motivation: “Machine Learning” Techniques vs. “Control Theory” Techniques

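Modeling & System ID
  “Machine Learning” (Abbeel et al., 2007): Based on data (could be nonparametric)
  “Control Theory” (Gillula et al., 2010): Based on physics
Planning & Control
  “Machine Learning” (Abbeel et al., 2007): Based on data (could be sampling-based)
  “Control Theory” (Gillula et al., 2010): Based on heuristic reward functions (or physics)
Types of Guarantees
  “Machine Learning” (Abbeel et al., 2007): Proofs of convergence for learning algorithms
  “Control Theory” (Gillula et al., 2010): Safety and robustness guarantees for system performance
Summary
  “Machine Learning” (Abbeel et al., 2007): Data-Driven, Convergence Focused
  “Control Theory” (Gillula et al., 2010): Physics-Driven, Safety Focused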

Page 8:

Goals/Research Statement

How can we get high performance on complicated systems while still guaranteeing safety?

Take advantage of “Machine Learning” techniques for performance
  Data-driven models (potentially nonparametric)
  Data-driven, sampling-based techniques for estimation and control
While getting “Control Theory”-style safety guarantees
  Formal, principled analyses of safety
Several Possible Approaches
  Adapt data-driven methods to existing safety-analysis techniques
  Closely couple data-driven methods with techniques for generating safety guarantees
  Use data-driven techniques in the context of existing safety-analysis techniques
  Other alternatives

Page 9:

Talk Outline

Current “State of the Art”
  Reinforcement learning and apprenticeship learning
  Reachability for guaranteed safe mode switching
Motivation
Goals – Combining Machine Learning and Control Theory
Existing Approaches
Current Research
Extensions
Conclusions
Questions

Page 10:

“System Identification of Post Stall Aerodynamics for UAV Perching” (Hoburg and Tedrake, 2009)

Nonlinear and transient aerodynamics in perching
  Need to learn model from data
Use physically-inspired basis functions
  Nonlinear functions of state x, z, etc.
Compute least-squares fit for every combination of n basis functions:

[Figures from Hoburg and Tedrake 2009]

Adapt data-driven methods to existing safety-analysis techniques

Page 11:

“System Identification of Post Stall Aerodynamics for UAV Perching” (Hoburg and Tedrake, 2009)

Analysis/Extensions:
  Use standard control theory techniques to generate safety guarantees
  Use lasso or other regularization to choose basis functions

Adapt data-driven methods to existing safety-analysis techniques
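
The "least-squares fit for every combination of n basis functions" step can be sketched as an exhaustive subset search. This is a minimal illustration, not the procedure from Hoburg and Tedrake: the function name `best_subset_fit`, the candidate basis functions, and the synthetic data are all assumptions.

```python
from itertools import combinations

import numpy as np

def best_subset_fit(Phi, y, n):
    """Fit y by least squares on every n-column subset of Phi; return the best.

    Phi holds one candidate basis function per column, evaluated on the data.
    Returns (squared error, chosen column indices, fitted coefficients).
    """
    best = None
    for cols in combinations(range(Phi.shape[1]), n):
        A = Phi[:, list(cols)]
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        err = float(np.sum((A @ coef - y) ** 2))
        if best is None or err < best[0]:
            best = (err, cols, coef)
    return best

# Synthetic example: the "true" model uses only x**2 and sin(x).
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=200)
Phi = np.column_stack([np.ones_like(x), x, x**2, np.sin(x), np.cos(x)])
y = 2.0 * x**2 - 3.0 * np.sin(x)
err, cols, coef = best_subset_fit(Phi, y, n=2)
```

The search cost grows combinatorially with the number of candidate basis functions, which is one reason a regularized fit such as the lasso is attractive as a replacement.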

Page 12:

“Predictive Guidance Intercept Using The Neural EKF Tracker” (Stubberud and Kramer, 2007)

Augmented process model is:

Use an adaptive EKF to learn the error:

Let augmented state be:

Then:

NN weights

Closely couple data-driven methods with techniques for generating safety guarantees

Page 13:

“Predictive Guidance Intercept Using The Neural EKF Tracker” (Stubberud and Kramer, 2007)

Then associated Jacobian is:

so state estimation and NN training are coupled

Normal EKF analysis follows

Analysis:
  Learns model error
  Learning done online
  But combining ML and control theory tools can be tricky
    E.g. augmented system is not observable

Closely couple data-driven methods with techniques for generating safety guarantees
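
A generic form of the augmented-state construction may help make the coupling concrete. The notation below is assumed for illustration and is not taken from Stubberud and Kramer: f is the nominal process model, g is a neural network with weights w that models the error, and z stacks the state and the weights.

```latex
\begin{aligned}
x_{k+1} &= f(x_k) + g(x_k, w_k), \qquad w_{k+1} = w_k, \\
z_k &= \begin{bmatrix} x_k \\ w_k \end{bmatrix}, \qquad
z_{k+1} = \begin{bmatrix} f(x_k) + g(x_k, w_k) \\ w_k \end{bmatrix}, \\
F_k &= \frac{\partial z_{k+1}}{\partial z_k}
     = \begin{bmatrix}
         \dfrac{\partial f}{\partial x} + \dfrac{\partial g}{\partial x} & \dfrac{\partial g}{\partial w} \\
         0 & I
       \end{bmatrix}.
\end{aligned}
```

Under these assumptions, the off-diagonal block \partial g / \partial w is what ties the state estimate to the weight update inside a single EKF.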

Page 14:

Talk Outline

Current “State of the Art”
  Reinforcement learning and apprenticeship learning
  Reachability for guaranteed safe mode switching
Motivation
Goals – Combining Machine Learning and Control Theory
Existing Approaches
Current Research
Extensions
Conclusions
Questions

Page 15:

Safely Learning A Bounded System

Learning unknown dynamics of a target vehicle via observation

Limited field of view
  Safety = always keeping target in view, i.e.
Bounded system
  Assume target dynamics are autonomous and bounded, i.e.

Measurement model given by:

[Pioneer image courtesy University of Queensland, http://tinyurl.com/38dje6f]

Use data-driven techniques in the context of existing safety-analysis techniques

Page 16:

Safely Learning A Bounded System

Problem statement
  1) Learn target dynamics
  2) Minimize error:
  3) Maintain target in view:

For (1) use machine learning:
  Fixed model w/linear regression
  Physically inspired basis functions
  Neural network
(1) leads to (2) via EKF, UKF, or PF
(3) requires controlling our vehicle’s position and height

[Pioneer image courtesy University of Queensland, http://tinyurl.com/38dje6f]

Use data-driven techniques in the context of existing safety-analysis techniques

Page 17:

Safely Learning A Bounded System

For (3) use reachability:
  Unsafe set
  Treat target motion as adversarial disturbance
  Augmented system dynamics:
Result:
  Can use any learning/tracking algorithm
  Reachability only kicks in on border of unsafe sets

[Pioneer image courtesy University of Queensland, http://tinyurl.com/38dje6f]

Use data-driven techniques in the context of existing safety-analysis techniques
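
The idea that reachability "only kicks in on border of unsafe sets" can be sketched as a switching rule. This is an illustration under assumptions, not code from the talk: `value` stands for a precomputed reachability value function (positive when safe), `safe_control` for the corresponding safety-preserving input, and the one-dimensional field-of-view example and the `margin` threshold are hypothetical.

```python
import math

def safety_filter(x, u_learn, value, safe_control, margin=0.1):
    """Pass through the learning controller's command unless x is near unsafe.

    value(x) > 0 means the state is safe; near the zero level set the
    safety-preserving control takes over.
    """
    if value(x) > margin:
        return u_learn
    return safe_control(x)

# Hypothetical example: x is the target's offset from the center of the
# field of view, and the target is lost once |x| reaches 2.
def value(x):
    return 2.0 - abs(x)

def safe_control(x):
    # Drive the offset back toward zero at full authority.
    return -math.copysign(1.0, x)

u_inside = safety_filter(0.0, 0.7, value, safe_control)     # learner in control
u_at_edge = safety_filter(1.95, 0.7, value, safe_control)   # safety override
```

Because the filter never inspects how `u_learn` was produced, any learning or tracking algorithm can sit behind it.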

Page 18:

Caveat

What follows is pure brainstorming
Feedback and suggestions are welcome

Page 19:

Safely Learning A Bounded System

Possible extension: safe autonomous data collection/learning
  Attempt to learn/modify building model (or control policy) online
  Start w/basic physics model (or control policy)
  Assume bounded errors as disturbance
  Reachability enables following any exploration policies when safe

Use data-driven techniques in the context of existing safety-analysis techniques

[Image courtesy Jorge Ortiz, http://tinyurl.com/2dnz5jl]

Page 20:

Safely Learning A Bounded System

Limited acceptable range
  Safety = always keeping target states within acceptable tolerances, i.e.
Bounded system
  Assume target dynamics are bounded, i.e.

Problem statement
  1) Learn system dynamics
  2) Minimize error:
  3) Maintain target states in safe region:

Use data-driven techniques in the context of existing safety-analysis techniques

[Image courtesy Jorge Ortiz, http://tinyurl.com/2dnz5jl]

Proposed Approach
  1) Use machine learning
  2) Use the results of (1) with optimal control
  3) Use reachability

Page 21:

Safely Learning A Bounded System ActionWeb

Difficulties:
  Reachable set calculations for high dimensions
  And they need to be online

Use data-driven techniques in the context of existing safety-analysis techniques

[Image courtesy David Culler, http://tinyurl.com/2bcaqnh]

Page 22:

Safely Learning A Bounded System ActionWeb

Solution: Building decomposition
  Decompose building into separate rooms
  Model each room in parallel
  Treat interactions between rooms as bounded adversarial inputs
  Still fits in machine learning framework (can still model interactions)
  Still fits in reachability framework (can still calculate safe sets)

Use data-driven techniques in the context of existing safety-analysis techniques

[Image courtesy Claire Tomlin, http://tinyurl.com/26bpcl8]

Page 23:

Conclusions

Possible Approaches:
  Adapt data-driven methods to existing safety-analysis techniques
  Closely couple data-driven methods with techniques for generating safety guarantees
  Use data-driven techniques in the context of existing safety-analysis techniques
Extension to smart buildings and ActionWebs

Combining Machine Learning and Control Theory
  Achieving high performance on complicated systems while still guaranteeing safety

Page 24:

Questions?