Fusing Machine Learning & Control Theory With Applications to Smart Buildings & ActionWebs UC Berkeley ActionWebs Meeting November 03, 2010 By Jeremy Gillula


[Some rights reserved unless otherwise noted; see http://tinyurl.com/2qn665]

© Nov 03, 2010 ActionWebs Talk (J. Gillula)

Talk Outline

- Current “State of the Art”
  - Reinforcement learning and apprenticeship learning
  - Reachability for guaranteed safe mode switching
- Motivation
- Goals – Combining Machine Learning and Control Theory
- Existing Approaches
- Current Research
- Extensions
- Conclusions
- Questions


“An Application of Reinforcement Learning to Aerobatic Helicopter Flight” (Abbeel et al., 2007)

- Use linear regression to learn the parameters of a given model
- Use differential dynamic programming to solve the MDP:
  - Generate a trajectory using the current policy and the nonlinear dynamics
  - Compute a new policy using LQR and the dynamics linearized around that trajectory
- Reward function generated using apprenticeship learning

[Video from Abbeel et al. 2007]
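The two-step DDP loop above (roll the current policy through the nonlinear dynamics, then solve an LQR problem on the dynamics linearized around that rollout) can be sketched as a small iterative-LQR routine. This is a minimal sketch under stated assumptions, not the helicopter controller of Abbeel et al.: the pendulum model, the cost weights `Q`, `R`, `QF`, and every function name are hypothetical; the dynamics are Euler-discretized; and second-order dynamics terms are dropped (the Gauss-Newton, or iLQR, variant of DDP).

```python
import numpy as np

# Hypothetical pendulum standing in for the real vehicle model (Euler-discretized)
DT, G_OVER_L = 0.05, 9.81
Q = np.diag([1.0, 0.1])     # running state cost (hypothetical weights)
R = np.array([[0.01]])      # running control cost
QF = np.diag([10.0, 1.0])   # terminal state cost

def step(x, u):
    """Nonlinear dynamics: x = [theta, omega], u = [torque]."""
    th, om = x
    return np.array([th + DT * om, om + DT * (-G_OVER_L * np.sin(th) + u[0])])

def linearize(x):
    """Jacobians A = df/dx and B = df/du of `step` at state x."""
    A = np.array([[1.0, DT], [-DT * G_OVER_L * np.cos(x[0]), 1.0]])
    B = np.array([[0.0], [DT]])
    return A, B

def rollout(x0, us):
    xs = [x0]
    for u in us:
        xs.append(step(xs[-1], u))
    return np.array(xs)

def cost(xs, us):
    running = sum(0.5 * x @ Q @ x + 0.5 * u @ R @ u for x, u in zip(xs[:-1], us))
    return running + 0.5 * xs[-1] @ QF @ xs[-1]

def ilqr(x0, us, n_iter=50):
    xs = rollout(x0, us)
    J = cost(xs, us)
    for _ in range(n_iter):
        # Backward pass: LQR on the dynamics linearized around the current trajectory
        Vx, Vxx = QF @ xs[-1], QF
        ks, Ks = [], []
        for k in reversed(range(len(us))):
            A, B = linearize(xs[k])
            Qx, Qu = Q @ xs[k] + A.T @ Vx, R @ us[k] + B.T @ Vx
            Qxx, Quu, Qux = Q + A.T @ Vxx @ A, R + B.T @ Vxx @ B, B.T @ Vxx @ A
            kff = -np.linalg.solve(Quu, Qu)   # feedforward correction
            K = -np.linalg.solve(Quu, Qux)    # feedback gain
            Vx = Qx + K.T @ Quu @ kff + K.T @ Qu + Qux.T @ kff
            Vxx = Qxx + K.T @ Quu @ K + K.T @ Qux + Qux.T @ K
            ks.append(kff)
            Ks.append(K)
        ks.reverse()
        Ks.reverse()
        # Forward pass: run the new policy through the nonlinear dynamics (simple line search)
        for alpha in (1.0, 0.5, 0.25, 0.125):
            x, new_xs, new_us = x0, [x0], []
            for k in range(len(us)):
                u = us[k] + alpha * ks[k] + Ks[k] @ (x - xs[k])
                x = step(x, u)
                new_us.append(u)
                new_xs.append(x)
            new_J = cost(np.array(new_xs), new_us)
            if new_J < J:
                xs, us, J = np.array(new_xs), new_us, new_J
                break
        else:
            break   # no step size reduced the cost: treat as converged
    return xs, us, J

x0 = np.array([1.0, 0.0])
us0 = [np.zeros(1) for _ in range(60)]
xs, us, J = ilqr(x0, us0)
```

Starting from zero torque, each iteration linearizes around the latest rollout, so the policy is always evaluated on the nonlinear model while the optimization itself only ever solves LQR problems.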

Analysis:
- Great performance
- No formal safety analysis
- Required some hand-tweaking for stability (e.g. hand-chosen reward weights)
- Easily generalizable


“Design of Guaranteed Safe Maneuvers Using Reachable Sets…” (Gillula et al., 2010)

Safe given accuracy of model and worst-case disturbances

Used reachability analysis via level-set methods to design and perform a safe backflip


“A…Hamilton–Jacobi Formulation of Reachable Sets for Continuous Dynamic Games” (Mitchell et al., 2005)

Create a level set function $\phi_0(x)$ such that:
- The boundary of keep-out set $K$ is defined implicitly by $\phi_0(x) = 0$
- $\phi_0$ is negative inside the region and positive outside

Reachability as a game: the disturbance attempts to force the system into the unsafe region, while the control attempts to stay safe

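Solution can be found via the Hamilton–Jacobi–Isaacs (HJI) PDE, solved backward in time ($t \le 0$), where $f(x,u,d)$ are the system dynamics:

$$\frac{\partial \phi(x,t)}{\partial t} + \min\left[0,\ H\left(x, \frac{\partial \phi(x,t)}{\partial x}\right)\right] = 0, \qquad \phi(x,0) = \phi_0(x), \qquad H(x,p) = \max_{u}\ \min_{d}\ p^\top f(x,u,d)$$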

[Figure from Tomlin 2009]
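To make the level-set recipe concrete, here is a toy 1-D instance. It assumes a hypothetical system $\dot{x} = u + d$ with $|u| \le a$ (control) and $|d| \le b$ (disturbance), and a keep-out set $K = [-r, r]$ encoded by $\phi_0(x) = |x| - r$. For this system $H(x,p) = \max_u \min_d p(u+d) = (a-b)|p|$, so the unsafe set can only grow, at speed $\max(0, b-a)$. The solver is a first-order Godunov upwind scheme on a uniform grid, a deliberate simplification of the higher-order level-set methods used in the cited work; all function names are hypothetical.

```python
import numpy as np

def unsafe_set_1d(x, phi0, a, b, T, cfl=0.5):
    """Evolve phi for the toy game x' = u + d, |u| <= a, |d| <= b, over a horizon T.

    H(x, p) = (a - b)|p|, so min(0, H) = -s|p| with s = max(0, b - a), and marching
    the HJI PDE backward in time becomes d(phi)/d(tau) = -s |d(phi)/dx|.
    """
    s = max(0.0, b - a)
    phi = np.array(phi0, dtype=float)
    if s == 0.0:
        return phi                      # control dominates: the keep-out set never grows
    dx = x[1] - x[0]
    n_steps = int(np.ceil(T * s / (cfl * dx)))   # CFL-limited number of time steps
    dt = T / n_steps
    for _ in range(n_steps):
        padded = np.pad(phi, 1, mode="edge")
        d_minus = (padded[1:-1] - padded[:-2]) / dx   # backward difference
        d_plus = (padded[2:] - padded[1:-1]) / dx     # forward difference
        # Godunov upwind approximation of |phi_x| for an outward-moving front
        grad = np.maximum(np.maximum(d_minus, 0.0), -np.minimum(d_plus, 0.0))
        phi = phi - dt * s * grad
    return phi

def unsafe_radius(x, phi):
    """Largest |x| on the grid where phi < 0, i.e. inside the unsafe set."""
    return float(np.abs(x[phi < 0]).max())

x = np.linspace(-2.0, 2.0, 401)
phi0 = np.abs(x) - 0.5                                # K = [-0.5, 0.5]
grown = unsafe_set_1d(x, phi0, a=0.5, b=1.0, T=1.0)   # disturbance faster than control
held = unsafe_set_1d(x, phi0, a=1.0, b=0.5, T=1.0)    # control faster than disturbance
```

With $b > a$ the unsafe radius grows from 0.5 to about $0.5 + (b - a)T = 1.0$; with $a > b$ it stays at 0.5, up to grid resolution.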


“Design of Guaranteed Safe Maneuvers Using Reachable Sets…” (Gillula et al., 2010)

[Figure: backflip maneuver divided into Recovery, Drift, and Impulse modes]

Analysis:
- Decent performance
- Formal safety analysis
- Required human input for choosing design parameters
- Difficult to generalize


Motivation: “Machine Learning” Techniques vs. “Control Theory” Techniques

                       “Machine Learning”                        “Control Theory”
                       (Abbeel et al., 2007)                     (Gillula et al., 2010)

Modeling & System ID   Based on data (could be nonparametric)    Based on physics
Planning & Control     Based on data (could be sampling-based)   Based on heuristic reward functions (or physics)
Types of Guarantees    Proofs of convergence for learning        Safety and robustness guarantees for
                       algorithms                                system performance
Summary                Data-driven, convergence-focused          Physics-driven, safety-focused


Goals/Research Statement

How can we get high performance on complicated systems while still guaranteeing safety?

Take advantage of “Machine Learning” techniques for performance:
- Data-driven models (potentially nonparametric)
- Data-driven, sampling-based techniques for estimation and control

While getting “Control Theory”-style safety guarantees:
- Formal, principled analyses of safety

Several possible approaches:
- Adapt data-driven methods to existing safety-analysis techniques
- Closely couple data-driven methods with techniques for generating safety guarantees
- Use data-driven techniques in the context of existing safety-analysis techniques
- Other alternatives







“System Identification of Post Stall Aerodynamics for UAV Perching” (Hoburg and Tedrake, 2009)

- Nonlinear and transient aerodynamics in perching: need to learn a model from data
- Use physically inspired basis functions: nonlinear functions of state x, z, , etc.
- Compute a least-squares fit for every combination of n basis functions:

[Figures from Hoburg and Tedrake 2009]

Adapt data-driven methods to existing safety-analysis techniques
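The exhaustive search over basis-function subsets can be sketched with NumPy: for every combination of n candidate columns, solve an ordinary least-squares problem and keep the subset with the smallest residual. This is a sketch under assumptions: the candidate library (1, α, α², sin 2α, cos α, with α standing in for an angle of attack) and the synthetic data are hypothetical, not the basis functions or flight data of Hoburg and Tedrake.

```python
import itertools
import numpy as np

def best_subset_fit(Phi, y, n):
    """Fit y ~ Phi[:, cols] @ coef for every n-column subset; return (sse, cols, coef) of the best."""
    best = None
    for cols in itertools.combinations(range(Phi.shape[1]), n):
        A = Phi[:, cols]
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        sse = float(np.sum((A @ coef - y) ** 2))   # computed directly; lstsq's residual output can be empty
        if best is None or sse < best[0]:
            best = (sse, cols, coef)
    return best

# Hypothetical candidate library evaluated at sampled angles of attack
alpha = np.linspace(0.0, np.pi / 2, 200)
Phi = np.column_stack([np.ones_like(alpha), alpha, alpha**2, np.sin(2 * alpha), np.cos(alpha)])
rng = np.random.default_rng(0)
y = 0.3 * alpha**2 + 1.5 * np.sin(2 * alpha) + 0.01 * rng.standard_normal(alpha.size)

sse, cols, coef = best_subset_fit(Phi, y, n=2)
```

The number of fits grows as "p choose n" for a library of p candidates, which is what motivates replacing the exhaustive search with regularization.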



Analysis/Extensions:
- Use standard control theory techniques to generate safety guarantees
- Use lasso or other regularization to choose basis functions
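Choosing basis functions with lasso instead of exhaustive search can be sketched with cyclic coordinate descent, where soft-thresholding drives the coefficients of unneeded columns exactly to zero. This minimal sketch assumes random Gaussian columns as stand-ins for basis-function evaluations, plus a hypothetical penalty λ = 0.1 and synthetic data; in practice a library solver such as scikit-learn's `Lasso` (same objective scaling) would normally be used.

```python
import numpy as np

def lasso_cd(X, y, lam, n_sweeps=500):
    """Minimize (1/(2m))||y - X w||^2 + lam*||w||_1 by cyclic coordinate descent."""
    m, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / m
    r = y.astype(float)                    # residual y - X @ w (w starts at zero)
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * w[j]            # remove column j's current contribution
            rho = X[:, j] @ r / m
            # soft-thresholding: weak correlations are set exactly to zero
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * w[j]
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))          # columns play the role of basis-function evaluations
w_true = np.array([0.0, 2.0, 0.0, 0.0, -1.5, 0.0])
y = X @ w_true + 0.05 * rng.standard_normal(200)
w = lasso_cd(X, y, lam=0.1)
```

The nonzero entries of `w` select the basis functions to keep; the surviving coefficients are shrunk toward zero by roughly λ, so refitting them by plain least squares afterwards is a common follow-up.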



“Predictive Guidance Intercept Using The Neural EKF Tracker” (Stubberud and Kramer, 2007)

The augmented process model is:

Use an adaptive EKF to learn the error:

Let the augmented state be the system state stacked with the NN weights:

Then:

Closely couple data-driven methods with techniques for generating safety guarantees



Then the associated Jacobian is:

so state estimation and NN training are coupled

Normal EKF analysis follows

Analysis:
- Learns model error
- Learning done online
- But combining ML and control theory tools can be tricky
  - E.g. the augmented system is not observable
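The coupling between state estimation and NN training can be illustrated with a scalar toy problem: the EKF runs on the augmented state z = [x, w], and the Jacobian entry ∂x′/∂w = φ(x) is what lets position measurements update the weights. This is a minimal sketch, not Stubberud and Kramer's tracker. It assumes a hypothetical one-dimensional system, a hypothetical nominal model x′ = 0.5x + u, and a network whose hidden layer is fixed (gains `A_HID`), so only the output weights are learned; the noise levels and all names are also assumptions.

```python
import numpy as np

A_HID = np.array([0.5, 1.0, 2.0])   # hypothetical fixed hidden-layer gains
NOMINAL_GAIN = 0.5                   # hypothetical physics model: x' = 0.5 x + u

def features(x):
    """Hidden-layer outputs tanh(a_i * x) for a scalar state x."""
    return np.tanh(A_HID * x)

def true_step(x, u):
    """Hypothetical 'real' system; the sin term is the model error to be learned."""
    return NOMINAL_GAIN * x + 0.4 * np.sin(x) + u

def neural_ekf(us, ys, q_x=1e-4, q_w=1e-6, r=1e-4):
    """Jointly estimate x and the NN output weights w with an EKF on z = [x, w]."""
    n = 1 + A_HID.size
    z = np.zeros(n)
    P = np.eye(n)
    Q = np.diag([q_x] + [q_w] * A_HID.size)
    H = np.zeros((1, n))
    H[0, 0] = 1.0                                   # only x is measured
    for u, y in zip(us, ys):
        x, w = z[0], z[1:]
        phi = features(x)
        # Predict with the augmented model: x' = f(x, u) + w^T phi(x),  w' = w
        z_pred = np.concatenate(([NOMINAL_GAIN * x + u + w @ phi], w))
        F = np.eye(n)
        F[0, 0] = NOMINAL_GAIN + w @ (A_HID * (1.0 - phi**2))   # dx'/dx
        F[0, 1:] = phi                     # dx'/dw: couples state estimation and training
        P = F @ P @ F.T + Q
        # Update with the measurement y = x + v
        S = H @ P @ H.T + r
        K = P @ H.T / S
        z = z_pred + K[:, 0] * (y - z_pred[0])
        P = (np.eye(n) - K @ H) @ P
        P = 0.5 * (P + P.T)                # keep the covariance symmetric
    return z

# Simulate noisy measurements of the "real" system under random excitation
rng = np.random.default_rng(1)
us = rng.uniform(-1.0, 1.0, 3000)
x, ys = 0.0, []
for u in us:
    x = true_step(x, u)
    ys.append(x + 0.01 * rng.standard_normal())
w_learned = neural_ekf(us, ys)[1:]
```

After the run, `features(x) @ w_learned` approximates the unmodeled term 0.4 sin(x) over the range of states the excitation visited; outside that range nothing is guaranteed.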








Safely Learning A Bounded System

Learning unknown dynamics of a target vehicle via observation

Limited field of view
- Safety = always keeping target in view, i.e.

Bounded system
- Assume target dynamics are autonomous and bounded, i.e.

Measurement model given by:

[Pioneer image courtesy University of Queensland, http://tinyurl.com/38dje6f]

Use data-driven techniques in the context of existing safety-analysis techniques



Problem statement:
1) Learn target dynamics
2) Minimize error:
3) Maintain target in view:

For (1) use machine learning:
- Fixed model with linear regression
- Physically inspired basis functions
- Neural network

(1) leads to (2) via EKF, UKF, or PF

(3) requires controlling our vehicle’s position and height




For (3) use reachability:
- Unsafe set
- Treat target motion as adversarial disturbance
- Augmented system dynamics:

Result:
- Can use any learning/tracking algorithm
- Reachability only kicks in on the border of unsafe sets
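The idea that reachability only kicks in on the border of the unsafe set amounts to a least-restrictive safety filter. Below is a minimal 1-D sketch under strong assumptions: the camera/target geometry collapses to a scalar offset e (height control is ignored), the field-of-view constraint is |e| ≤ R, the target's speed bound b is below our speed a, and the safe-set boundary is known in closed form, so no PDE has to be solved. The random "learning" controller, the one-step margin rule, and all names are hypothetical.

```python
import numpy as np

def safety_filter(e, u_learn, a, R, margin):
    """Return (command, overridden). The learner is trusted until |e| reaches R - margin."""
    if abs(e) >= R - margin:
        return a * np.sign(e), True        # safe action: steer the offset back toward zero
    return float(np.clip(u_learn, -a, a)), False

def simulate(steps=5000, dt=0.01, a=1.0, b=0.6, R=1.0, seed=0):
    """Offset e = target - camera evolves as e' = d - u with |d| <= b < a (hypothetical model)."""
    rng = np.random.default_rng(seed)
    margin = dt * (a + b)                  # largest possible change of e in one step
    e, max_abs_e, n_override = 0.0, 0.0, 0
    for _ in range(steps):
        d = b if e >= 0 else -b            # adversarial target: always pushes e outward
        u_learn = rng.uniform(-a, a)       # stand-in for an arbitrary exploring/learning controller
        u, overridden = safety_filter(e, u_learn, a, R, margin)
        n_override += overridden
        e += dt * (d - u)
        max_abs_e = max(max_abs_e, abs(e))
    return max_abs_e, n_override

max_abs_e, n_override = simulate()
```

Inside the band |e| < R - margin no single step can leave the field of view, and on the band the override moves e inward by dt(a - b), so |e| ≤ R is invariant no matter what the learner commands; the learner still runs on every step where no override is needed.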



Caveat

What follows is pure brainstorming; feedback and suggestions are welcome.



Possible extension: safe autonomous data collection/learning
- Attempt to learn/modify the building model (or control policy) online
- Start with a basic physics model (or control policy)
- Assume bounded errors as a disturbance
- Reachability enables following any exploration policy when safe


[Image courtesy Jorge Ortiz, http://tinyurl.com/2dnz5jl]



Limited acceptable range
- Safety = always keeping target states within acceptable tolerances, i.e.

Bounded system
- Assume target dynamics are bounded, i.e.

Problem statement:
1) Learn system dynamics
2) Minimize error:
3) Maintain target states in safe region:



Proposed approach:
1) Use machine learning
2) Use the results of (1) with optimal control
3) Use reachability


Safely Learning A Bounded System ActionWeb

Difficulties:
- Reachable-set calculations in high dimensions
- These calculations also need to run online


[Image courtesy David Culler, http://tinyurl.com/2bcaqnh]



Solution: building decomposition
- Decompose building into separate rooms
- Model each room in parallel
- Treat interactions between rooms as bounded adversarial inputs
- Still fits in machine learning framework (can still model interactions)
- Still fits in reachability framework (can still calculate safe sets)


[Image courtesy Claire Tomlin, http://tinyurl.com/26bpcl8]


Conclusions

Possible approaches:
- Adapt data-driven methods to existing safety-analysis techniques

Extension to smart buildings and ActionWebs

Combining Machine Learning and Control Theory: achieving high performance on complicated systems while still guaranteeing safety



Questions?
