Fusing Machine Learning & Control Theory
With Applications to Smart Buildings & ActionWebs
UC Berkeley
ActionWebs Meeting
November 03, 2010
By Jeremy Gillula
[Some rights reserved unless otherwise noted; see http://tinyurl.com/2qn665]
c Nov 03, 2010 ActionWebs Talk (J. Gillula) 2
Talk Outline
Current “State of the Art”
  Reinforcement learning and apprenticeship learning
  Reachability for guaranteed safe mode switching
Motivation
Goals – Combining Machine Learning and Control Theory
Existing Approaches
Current Research
Extensions
Conclusions
Questions
“An Application of Reinforcement Learning to Aerobatic Helicopter Flight” (Abbeel et al., 2007)
Use linear regression to learn parameters of given model
Use differential dynamic programming to solve the MDP
  Generate trajectory using current policy and nonlinear dynamics
  Compute new policy using LQR and linearized dynamics around that trajectory
Reward function generated using apprenticeship learning
[Video from Abbeel et al. 2007]
Analysis:
  Great performance
  No formal safety analysis
  Required some hand-tweaking for stability (e.g. hand-chosen reward weights)
  Easily generalizable
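The "compute new policy using LQR and linearized dynamics" step can be illustrated with a finite-horizon Riccati recursion. This is only a minimal sketch: the double-integrator matrices, cost weights, and the function name `lqr_backward_pass` are assumptions made for illustration, not the helicopter model or code from Abbeel et al. A full differential dynamic programming loop would re-linearize around each new trajectory and repeat.

```python
import numpy as np

def lqr_backward_pass(A, B, Q, R, horizon):
    """Finite-horizon discrete-time LQR via backward Riccati recursion.

    Returns a list of gains K_t (earliest first) for the policy u_t = -K_t @ x_t.
    """
    P = Q.copy()  # terminal cost-to-go
    gains = []
    for _ in range(horizon):
        # K = (R + B' P B)^{-1} B' P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update of the cost-to-go matrix
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()  # computed backward in time, applied forward
    return gains

# Hypothetical linearized dynamics: a double integrator with step dt = 0.1.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2)            # assumed state cost
R = np.array([[0.1]])    # assumed input cost

gains = lqr_backward_pass(A, B, Q, R, horizon=50)

# Roll the policy out on the linear model from an initial offset.
x = np.array([1.0, 0.0])
for K in gains:
    x = A @ x - B @ (K @ x)
final_norm = float(np.linalg.norm(x))
```

In this sketch the state is driven toward the origin over the horizon; the hand-chosen `Q` and `R` play the role of the reward weights mentioned in the analysis.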
“Design of Guaranteed Safe Maneuvers Using Reachable Sets…” (Gillula et al., 2010)
Safe given accuracy of model and worst-case disturbances
Used reachability analysis via level-set methods to design and perform a safe backflip
“A…Hamilton–Jacobi Formulation of Reachable Sets for Continuous Dynamic Games” (Mitchell et al., 2005)
Create a level set function φ(x) such that:
  Boundary of keep-out set K is defined implicitly by φ(x) = 0
  φ(x) is negative inside the region and positive outside
Reachability as game:
  Disturbance attempts to force system into unsafe region, control attempts to stay safe
Solution can be found via Hamilton-Jacobi-Bellman PDE:
[Figure from Tomlin 2009]
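For reference, the PDE for the backward reachable set is commonly written in the following form. The notation is assumed here: \phi_0 is the level set function of the keep-out set K, f(x,u,d) is the system dynamics, u is the control, d is the disturbance, and time runs backward from t = 0.

```latex
\begin{aligned}
&\frac{\partial \phi}{\partial t}(x,t)
  + \min\bigl[\,0,\; H\bigl(x, \nabla_x \phi(x,t)\bigr)\bigr] = 0,
  \qquad \phi(x,0) = \phi_0(x), \\
&H(x,p) = \max_{u \in U}\; \min_{d \in D}\; p^{\top} f(x,u,d).
\end{aligned}
```

The control maximizes (it tries to keep \phi positive, i.e. stay outside K) while the disturbance minimizes, and the \min[0,\cdot] term prevents the unsafe set from shrinking as the horizon grows.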
“Design of Guaranteed Safe Maneuvers Using Reachable Sets…” (Gillula et al., 2010)
[Figure labels: Recovery, Drift, Impulse]
Analysis:
  Decent performance
  Formal safety analysis
  Required human input for choosing design parameters
  Difficult to generalize
Motivation: “Machine Learning” Techniques vs. “Control Theory” Techniques
                      | “Machine Learning” (Abbeel et al., 2007)       | “Control Theory” (Gillula et al., 2010)
Modeling & System ID  | Based on data (could be nonparametric)        | Based on physics
Planning & Control    | Based on data (could be sampling-based)       | Based on heuristic reward functions (or physics)
Types of Guarantees   | Proofs of convergence for learning algorithms | Safety and robustness guarantees for system performance
Summary               | Data-Driven, Convergence Focused              | Physics-Driven, Safety Focused
Goals/Research Statement
How can we get high performance on complicated systems while still guaranteeing safety?
Take advantage of “Machine Learning” techniques for performance
  Data-driven models (potentially nonparametric)
  Data-driven, sampling-based techniques for estimation and control
While getting “Control Theory”-style safety guarantees
  Formal, principled analyses of safety
Several Possible Approaches
  Adapt data-driven methods to existing safety-analysis techniques
  Closely couple data-driven methods with techniques for generating safety guarantees
  Use data-driven techniques in the context of existing safety-analysis techniques
  Other alternatives
“System Identification of Post Stall Aerodynamics for UAV Perching” (Hoburg and Tedrake, 2009)
Nonlinear and transient aerodynamics in perching
  Need to learn model from data
Use physically-inspired basis functions
  Nonlinear functions of the state (x, z, etc.)
Compute least-squares fit for every combination of n basis functions
[Figures from Hoburg and Tedrake 2009]
Analysis/Extensions:
  Use standard control theory techniques to generate safety guarantees
  Use lasso or other regularization to choose basis functions
Adapt data-driven methods to existing safety-analysis techniques
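The exhaustive "least-squares fit for every combination of n basis functions" can be sketched as follows. The candidate basis functions, the synthetic data, and the function name `best_basis_subset` are assumptions made for illustration; they are not the aerodynamic basis set used by Hoburg and Tedrake.

```python
from itertools import combinations

import numpy as np

def best_basis_subset(features, y, n):
    """Fit least squares on every n-column subset of `features`.

    Returns (squared_error, column_indices, coefficients) for the best subset.
    """
    best = None
    for cols in combinations(range(features.shape[1]), n):
        X = features[:, list(cols)]
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        err = float(np.sum((X @ coef - y) ** 2))
        if best is None or err < best[0]:
            best = (err, cols, coef)
    return best

# Hypothetical candidate basis functions of a scalar state x.
x = np.linspace(-2.0, 2.0, 50)
features = np.column_stack([x, x**2, np.sin(x), np.cos(x), np.ones_like(x)])

# Synthetic "measurements" generated from two of the candidates.
y = -0.5 * x**2 + 2.0 * np.sin(x)

err, cols, coef = best_basis_subset(features, y, n=2)
```

The number of fits grows combinatorially with the size of the candidate set, which is one reason the slide suggests lasso or another regularizer as an alternative way of selecting basis functions.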
“Predictive Guidance Intercept Using The Neural EKF Tracker” (Stubberud and Kramer, 2007)
Augmented process model is:
Use an adaptive EKF to learn the error:
Let augmented state be:
Then:
NN weights
Closely couple data-driven methods with techniques for generating safety guarantees
“Predictive Guidance Intercept Using The Neural EKF Tracker” (Stubberud and Kramer, 2007)
Then associated Jacobian is:
so state estimation and NN training are coupled
Normal EKF analysis follows
Analysis:
  Learns model error
  Learning done online
  But combining ML and control theory tools can be tricky
    E.g. augmented system is not observable
Closely couple data-driven methods with techniques for generating safety guarantees
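The coupling between state estimation and network training can be seen in a generic augmented-state formulation. This is a sketch under assumed notation, not necessarily the exact form used by Stubberud and Kramer: f is the nominal process model, g is a neural network with weights w that models the error of f, and the weights are treated as constant states.

```latex
\begin{aligned}
x_{k+1} &= f(x_k) + g(x_k, w_k), \qquad w_{k+1} = w_k, \\
\bar{x}_k &= \begin{bmatrix} x_k \\ w_k \end{bmatrix}, \qquad
\bar{F}_k = \frac{\partial \bar{x}_{k+1}}{\partial \bar{x}_k}
  = \begin{bmatrix}
      \dfrac{\partial f}{\partial x} + \dfrac{\partial g}{\partial x} &
      \dfrac{\partial g}{\partial w} \\[1ex]
      0 & I
    \end{bmatrix}.
\end{aligned}
```

The off-diagonal block \partial g / \partial w is what lets measurement residuals on x update the weights w inside the ordinary EKF covariance and gain equations.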
Safely Learning A Bounded System
Learning unknown dynamics of a target vehicle via observation
Limited field of view
  Safety = always keeping target in view, i.e.
Bounded system
  Assume target dynamics are autonomous and bounded, i.e.
  Measurement model given by:
[Pioneer image courtesy University of Queensland, http://tinyurl.com/38dje6f]
Use data-driven techniques in the context of existing safety-analysis techniques
Safely Learning A Bounded System
Problem statement
1) Learn target dynamics
2) Minimize error:
3) Maintain target in view:
For (1) use machine learning:
  Fixed model w/linear regression
  Physically inspired basis functions
  Neural network
(1) leads to (2) via EKF, UKF, or PF
(3) requires controlling our vehicle’s position and height
[Pioneer image courtesy University of Queensland, http://tinyurl.com/38dje6f]
Use data-driven techniques in the context of existing safety-analysis techniques
Safely Learning A Bounded System
For (3) use reachability:
  Unsafe set
  Treat target motion as adversarial disturbance
  Augmented system dynamics:
Result:
  Can use any learning/tracking algorithm
  Reachability only kicks in on border of unsafe sets
[Pioneer image courtesy University of Queensland, http://tinyurl.com/38dje6f]
Use data-driven techniques in the context of existing safety-analysis techniques
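The idea that "reachability only kicks in on border of unsafe sets" can be sketched as a switching filter around an arbitrary exploration policy. Everything concrete below is an assumption made for illustration: a 1-D relative position r between target and tracker with dynamics r' = d - u, the bounds `U_MAX` and `D_MAX`, and a safety value that is simply the distance to the edge of the field of view. A real system would use a value function computed by reachability analysis instead.

```python
import numpy as np

R_MAX = 1.0   # half-width of the field of view (assumed)
U_MAX = 1.0   # bound on our vehicle's speed (assumed)
D_MAX = 0.5   # bound on the target's speed, treated as a disturbance (assumed)
DT = 0.05     # time step
# Largest change in r over one step under any control and any disturbance.
MARGIN = (U_MAX + D_MAX) * DT

def safety_value(r):
    """Positive while the target is in view, zero on the boundary."""
    return R_MAX - abs(r)

def safe_control(r):
    """Move toward the target at full speed to shrink |r|."""
    return U_MAX * float(np.sign(r))

def filtered_control(r, u_explore):
    """Follow the exploration policy unless the state is near the unsafe set."""
    if safety_value(r) <= MARGIN:
        return safe_control(r)
    return float(np.clip(u_explore, -U_MAX, U_MAX))

def simulate(steps=2000, seed=0):
    """Random exploration against a target that always moves away."""
    rng = np.random.default_rng(seed)
    r = 0.0
    worst = safety_value(r)
    for _ in range(steps):
        u_explore = rng.uniform(-U_MAX, U_MAX)
        d = D_MAX * float(np.sign(r)) if r != 0 else D_MAX  # adversarial target
        u = filtered_control(r, u_explore)
        r = r + DT * (d - u)
        worst = min(worst, safety_value(r))
    return worst

worst_value = simulate()
```

Because `U_MAX` exceeds `D_MAX`, the safe control always reduces |r| once it engages, so the exploration policy can be arbitrary everywhere else.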
Caveat
What follows is pure brainstorming
Feedback and suggestions are welcome
Safely Learning A Bounded System
Possible extension: safe autonomous data collection/learning
  Attempt to learn/modify building model (or control policy) online
  Start w/basic physics model (or control policy)
  Assume bounded errors as disturbance
  Reachability enables following any exploration policy when safe
Use data-driven techniques in the context of existing safety-analysis techniques
[Image courtesy Jorge Ortiz, http://tinyurl.com/2dnz5jl]
Safely Learning A Bounded System
Limited acceptable range
  Safety = always keeping target states within acceptable tolerances, i.e.
Bounded system
  Assume target dynamics are bounded, i.e.
Problem statement
1) Learn system dynamics
2) Minimize error:
3) Maintain target states in safe region:
Proposed Approach
1) Use machine learning
2) Use the results of (1) with optimal control
3) Use reachability
Use data-driven techniques in the context of existing safety-analysis techniques
[Image courtesy Jorge Ortiz, http://tinyurl.com/2dnz5jl]
Safely Learning A Bounded System ActionWeb
Difficulties:
  Reachable set calculations for high dimensions
  And they need to be online
Use data-driven techniques in the context of existing safety-analysis techniques
[Image courtesy David Culler, http://tinyurl.com/2bcaqnh]
Safely Learning A Bounded System ActionWeb
Solution: Building decomposition
  Decompose building into separate rooms
  Model each room in parallel
  Treat interactions between rooms as bounded adversarial inputs
Still fits in machine learning framework (can still model interactions)
Still fits in reachability framework (can still calculate safe sets)
Use data-driven techniques in the context of existing safety-analysis techniques
[Image courtesy Claire Tomlin, http://tinyurl.com/26bpcl8]
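One way to picture the decomposition is a per-room interval computation in which the coupling with neighbouring rooms enters only as a bounded worst-case term. This is a rough sketch under stated assumptions: a first-order thermal model per room, the parameter names `a`, `b`, `d_max`, and all numeric values are hypothetical, and the interval propagation stands in for a proper reachable-set computation.

```python
from dataclasses import dataclass

@dataclass
class Room:
    name: str
    a: float      # heat leakage rate to the outside, 1/h (hypothetical)
    b: float      # heater gain, degC/h per unit input (hypothetical)
    d_max: float  # bound on heat exchange with neighbouring rooms, degC/h

def step_interval(room, lo, hi, u, t_out, dt):
    """One-step interval over-approximation of the room temperature.

    Model: T_next = T + dt * (a * (t_out - T) + b * u + d), with |d| <= d_max.
    """
    assert 0.0 <= dt * room.a < 1.0  # keeps the update monotone in T
    decay = 1.0 - dt * room.a
    drive = dt * (room.a * t_out + room.b * u)
    return (decay * lo + drive - dt * room.d_max,
            decay * hi + drive + dt * room.d_max)

def stays_safe(room, lo, hi, u, t_out, dt, steps, band):
    """Check that the worst-case interval stays inside the comfort band."""
    for _ in range(steps):
        lo, hi = step_interval(room, lo, hi, u, t_out, dt)
        if lo < band[0] or hi > band[1]:
            return False
    return True

# Each room is checked independently, so the loop below could run in parallel.
rooms = [Room("office", a=0.5, b=5.0, d_max=1.0),
         Room("lab", a=0.5, b=5.0, d_max=3.0)]
results = {
    room.name: stays_safe(room, lo=20.0, hi=21.0, u=1.0, t_out=10.0,
                          dt=0.1, steps=50, band=(18.0, 24.0))
    for room in rooms
}
```

Each check involves a single state, so the cost grows linearly with the number of rooms rather than exponentially with the dimension of the whole building. The price is conservatism: a larger coupling bound, as in the hypothetical "lab" room, can make the worst case leave the band even if the true coupled system would not.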
Conclusions
Possible Approaches:
  Adapt data-driven methods to existing safety-analysis techniques
  Closely couple data-driven methods with techniques for generating safety guarantees
  Use data-driven techniques in the context of existing safety-analysis techniques
Extension to smart buildings and ActionWebs
Combining Machine Learning and Control Theory
  Achieving high performance on complicated systems while still guaranteeing safety
Questions?