Autonomous Motion Learning for Near Optimal Control
By Alan Jennings
School of Engineering, University of Dayton
Dayton, OH, August 2012
Dissertation defense in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical Engineering




Page 1

Autonomous Motion Learning for Near Optimal Control

By Alan Jennings
School of Engineering, University of Dayton

Dayton, OH, August 2012

Dissertation defense in partial fulfillment of the requirements for the degree of doctor of philosophy in electrical engineering

Page 2

Alan Jennings, Dissertation Defense, July 2012 2

Motivation

Consider human learning:
• Intelligent system: able to solve new problems and become an expert
Consider computer accomplishments:
• Beat chess & Jeopardy! grandmasters
• But one cannot be repurposed for the other
Consider general purpose learning:
• People can grow up to be presidents, design fashion, play croquet, identify liars, train animals, predict weather…
• This has not been accomplished by computers
The foundation for general purpose learning is a developmental framework:
• Shaped by environment & experiences
• Complex value systems guide learning
• Infant stages restrict exploration until basic skills are established

IBM’s Deep Blue beat Kasparov in their second match in 1997. IBM’s Watson beat Jennings and Rutter in 2011.

In 2011, Google gained Nevada licenses for self-driving cars

Google Prius image, Flickr user Steve Jurvetson

Page 3

Context in Developmental Learning

• Developmental learning seeks to mimic the progressive learning process
  – Infant -> Toddler -> Child -> Young adult -> …
  – The solution/knowledge should be unguided by the programmer
• Learning basic tasks supports learning high-level tasks
  – Proverbial walking before running
• The robot then learns general tasks of increasing complexity at increasing proficiency
  – Does not require reasoning/understanding/consciousness

Page 4

My Contributions

Autonomous motion learning:

• General purpose rigid body motion optimization
  – Provides a novel high-level interface at the robot geometry level
  – Allows novice roboticists or computers to design motions
  – However, has high computation requirements
• Optimal inverse functions from a global search
  – Organizes motions into continuous, optimal inverse functions
  – Provides a set of reflexive responses for use online
  – Efficiently searches a high-dimension space using agents & local gradients
• Improving motions by unbounded resolution
  – Nodes are added to an interpolation, approaching the optimal continuous function in the limit
  – Efficiently collects and “understands” experiences
  – Motions are not limited by the initial programming resolution or initial training time limitation

Page 5

Motivating Example

• Use of general purpose programs to solve control problems
  – Use a CAD package to draw the robot
  – Use a kinematics program for the equations of motion
  – Use an optimal control program to solve
• The optimal control problem is introduced
  – Finding the input with the lowest cost among inputs satisfying constraints

Page 6

Motivating Example

Use of general purpose programs for solving control problems. The optimal control problem: finding the control input with the lowest cost among inputs satisfying constraints.

[Workflow diagram — What does it look like: draft the project (CAD). What are the controls: dynamics, mass & joints; set up Simulink. What is trying to be done: set up DIDO for the optimal control solve.]

Human creativity comes in at the design level, not the optimization.

Page 7

Motivating Example

Use of general purpose programs for solving control problems. The optimal control problem: finding the control input with the lowest cost among inputs satisfying constraints.

• Typically solved by discretizing over time
  – Optimize a set of variables, not the continuous function
  – Local search method
  – Applies to an isolated problem
• Changing the final value needs a new optimization

[Diagram — General Optimal Control Problem: states x(t) and input u(t) produce the running cost g(t) and total cost J; the initial state x_o lies in X_o subject to constraint ψ_o, the final state x_f lies in X_f subject to constraint ψ_f, with endpoint cost ϕ.]

Page 8

Motivating Example: Motion Primitive

Example problem:
  System: pendulum actuated at the base
  Cost: (torque)², J = ∫ u(τ)² dτ
  Output: initial disturbance, y = θ(t₀)
  Constraints:
    Reach final value: θ(t_f) = 0
    Saturation: -u_max ≤ u(t) ≤ u_max

The way forward: if the system dynamics and initial state are repeatable, then the problem is really only to find a control signal. Continuous signals can be approximated by parameterization, so motion primitives can be composed solely by a vector function of an output.
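The parameterization step can be sketched numerically. This is an illustrative sketch only, not the dissertation's implementation: the pendulum is simplified to a linearized model θ'' = -θ + u, the control is parameterized by six linearly interpolated node values, saturation is applied, and the terminal constraint θ(t_f) = 0 is enforced with a quadratic penalty. All constants (t_f, u_max, the penalty weight) are placeholder assumptions.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative model only (assumed, not the author's): linearized pendulum
# theta'' = -theta + u, released from an initial disturbance theta(0) = 0.5.
def simulate(nodes, theta0=0.5, tf=3.0, steps=300):
    t = np.linspace(0.0, tf, steps)
    # Control signal parameterized by node values, linearly interpolated.
    u = np.interp(t, np.linspace(0.0, tf, len(nodes)), nodes)
    u = np.clip(u, -2.0, 2.0)                   # saturation |u| <= u_max
    dt = t[1] - t[0]
    th, om, J = theta0, 0.0, 0.0
    for k in range(steps):                      # semi-implicit Euler
        om += (-th + u[k]) * dt
        th += om * dt
        J += u[k] ** 2 * dt                     # torque-squared cost
    return th, om, J

def cost(nodes):
    th, om, J = simulate(nodes)
    # Quadratic penalty enforcing theta(tf) = 0 (and rest at the end).
    return J + 100.0 * (th ** 2 + om ** 2)

res = minimize(cost, np.zeros(6), method="Powell")
th_f, om_f, J = simulate(res.x)
```

The optimizer works on a six-dimensional vector rather than a continuous function, which is exactly the reduction described above; the penalty weight trades constraint accuracy against control effort.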

Page 9

My Contributions

Autonomous motion learning:

• General purpose rigid body motion optimization
  – Provides a novel high-level interface at the robot geometry level
  – Allows novice roboticists or computers to design motions
  – However, has high computation requirements
• Optimal inverse functions from a global search
  – Organizes motions into continuous, optimal inverse functions
  – Provides a set of reflexive responses for use online
  – Efficiently searches a high-dimension space using agents & local gradients
• Improving motions by unbounded resolution
  – Nodes are added to an interpolation, approaching the optimal continuous function in the limit
  – Efficiently collects and “understands” experiences
  – Motions are not limited by the initial programming resolution or initial training time limitation

Page 10

Diversity and Progression in Motion Primitives

• Continuous, optimal inverse function
  – Motion primitives should be continuous so that changes in the system behavior are not abrupt
• Global search required for discovery
  – Global search offers the possibility of finding alternative motion primitives
  – Finding isolated optima requires testing candidates which local conditions indicate would give worse performance
• Progression via increasing resolution
  – After optimizing at a given resolution, the signal is limited by the optimal signal not lying in the space of the parameterization, so the resolution must be increased to improve performance

Page 11

Optimal Inverse Functions: High level concept

• The population covers a broad area and uses local gradients to improve.
• Converging agents are removed, so the number of agents quickly drops.
• Settled agents create a motion primitive and use the local gradient to expand to new outputs.
• The operator has a choice of inverse functions to select from.
  – Can use softer criteria for preference.
• Inverse functions are continuous and easily calculated, making them suited for real-time use.

[Flowchart — Optimization: initialize population → move agents (lower J(x), maintain f(x)) → check for removal or settling conditions → form cluster → set of hk(yd)’s. Execution: get yd from the operator, evaluate the hk, select an inverse function hk(yd), move to the new x*.]

Page 12

Optimal Inverse Functions: Mechanics of the method

Improving a given agent:
1. Restrict motion to the null space of the output gradient
2. Move opposite the cost gradient

• Saturation
  – If gradients are large -> limits the effect
  – If the cost gradient is small -> small step
  – If the output gradient is small -> ease the null space restriction
• Boundary constraint reduces step length
• Minimum step for settling
• Remove particles that are too close
  – Quickly reduces the population size
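A minimal sketch of the step rule above, assuming a simple two-dimensional cost J(x) = x₁² + x₂² and output f(x) = x₁ + x₂ (both hypothetical stand-ins for the real robot functions): the cost gradient is projected onto the null space of the output gradient, so each step lowers J while, to first order, holding the output fixed.

```python
import numpy as np

# Hypothetical sketch of one agent-improvement step: project the cost
# gradient onto the null space of the output gradient so the agent lowers
# cost J(x) while (to first order) maintaining the output f(x).
def null_space_step(x, grad_J, grad_f, rate=0.1):
    g = grad_f / (np.linalg.norm(grad_f) + 1e-12)   # unit output gradient
    d = grad_J - np.dot(grad_J, g) * g              # drop component along g
    return x - rate * d                             # descend cost in null space

# Example: J(x) = x1^2 + x2^2, f(x) = x1 + x2; starting on the level set
# f(x) = 1, the agent should slide along it toward the constrained minimum.
x = np.array([1.0, 0.0])
for _ in range(50):
    grad_J = 2.0 * x
    grad_f = np.array([1.0, 1.0])
    x = null_space_step(x, grad_J, grad_f)
# x converges toward [0.5, 0.5], the minimum of J on the set x1 + x2 = 1
```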

[Plot: a single optimization step shown on the output vs. cost surface.]

Page 13

Optimal Inverse Functions: Mechanics of the method

Form a cluster of optimal points:
1. Change the output by moving along the output gradient
2. Repeat the optimizing steps

• Test for continuity/optimality
  – Output changes in the expected direction
  – Not too far (discontinuity)
  – Not too close (ill-conditioned surface)
  – Settled (optimality satisfied)

[Figure: a cluster grown in both directions, decreasing yd and increasing yd.]

Page 14

Optimal Inverse Functions: Testing of the method

[Figure: four test output functions on x1, x2 ∈ [-1, 1] — Peaks, x1²(x2-c1)², (x1-c1)·x2², and x1·sin(c1·x2) — combined with quadratic, linear/quadratic, and periodic costs. Each case shows the optimal inverse functions hi, the cost J versus yd, and the final clusters (5, 11, 9, and 7 clusters across the cases).]

• Combination of functions
  – Multiple extrema, saddle points
  – 2-dimensional for verification
• Expected result
  – Clusters between output extrema

Page 15

Optimal Inverse Functions: Testing of the method

[Figure: quadratic cost with periodic-linear output — 5 final clusters; panels show the optimal inverse functions hi and the cost J versus yd.]

Page 16

Optimal Inverse Functions: Practical example

• Robot control problem
  – Precision is dependent on the pose
  – Radial precision is optimized via joint angles for varying radial distance
• Planar robot, Motoman HP-3
• Complex robot, Motoman IA-20

Page 17

Optimal Inverse Functions: Practical example

• Each link has a different radius to the tip and therefore a different sensitivity
• In addition, the direction of sensitivity is different
• The problem effectively finds the joint locations that reduce sensitivity in the radial distance

[Figure: links are shown by solid arrows; the effective length to the tip is shown by a dashed arrow; the arc showing the sensitivity for a joint is matched by color.]

Page 18

Optimal Inverse Functions: Practical example

• The output is adjusted as desired (additional task of finding the angle of the plane and the in-plane angle)
• The operator selects an inverse function

Page 19

Optimal Inverse Functions

• The method searches a large space efficiently by:
  – Having agents congregate to locally optimal solutions (increasing the effective search area of each), and
  – Eliminating neighboring points (once the locations of optima are sketched out, fewer agents are needed).
• Sets of continuous, optimal inverse functions
  – Can be used in real time, and
  – Reduce the burden on the operator without reducing optimality

Page 20

My Contributions

Autonomous motion learning:

• General purpose rigid body motion optimization
  – Provides a novel high-level interface at the robot geometry level
  – Allows novice roboticists or computers to design motions
  – However, has high computation requirements
• Optimal inverse functions from a global search
  – Organizes motions into continuous, optimal inverse functions
  – Provides a set of reflexive responses for use online
  – Efficiently searches a high-dimension space using agents & local gradients
• Improving motions by unbounded resolution
  – Nodes are added to an interpolation, approaching the optimal continuous function in the limit
  – Efficiently collects and “understands” experiences
  – Motions are not limited by the initial programming resolution or initial training time limitation

Page 21

Unbounded Resolution: High level concept

• To have continuous learning, one must have unbounded resolution.
• Unbounded resolution leads to exponential growth in complexity.
• Must make efficient use of experience.

[Diagram — Developing the memory model: queries aq yield samples (y(aq), J(aq)). Developing the reflex function: the memory model maps a to (y, J); optimization over it produces the reflex function a*(yd). Using the reflex function: the operator or higher-level planner supplies yd, the reflex function returns a*, the cubic interpolation system generates u(t), and the system returns (y, J).]

Page 22

Unbounded Resolution: Mechanics of the method

[Diagram repeated from the previous slide: developing the memory model, developing the reflex function, and using the reflex function.]

System assumptions:
• t and a are bounded
• y(a) and J(a) are in C² and constant

Page 23

Unbounded Resolution: Why cubic interpolation

• Adding a node to a cubic interpolation allows all experiences to be transferred.
• Power series parameters are ill conditioned as the effective area of the basis approaches extremes.
• Fourier series parameters typically create a less smooth optimization surface.
• The radial basis function scaling parameter is either too small at low resolutions or too large at high resolutions, and automatically changing it means data cannot be mapped exactly.
• Sigmoid neural network parameters are large with respect to the input magnitude, resulting in poor optimization scaling.
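A small sketch of the first bullet, with node locations and values assumed for illustration: sampling the current cubic spline at the new abscissa and refitting over the enlarged node set preserves every previous node value, so no stored experience is lost when resolution increases.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Illustrative node data (assumed, not the dissertation's): a coarse
# parameterization of u(t) by three node values on [0, 1].
t = np.array([0.0, 0.5, 1.0])          # current node times
v = np.array([0.2, 0.8, 0.4])          # current node values
coarse = CubicSpline(t, v)

t_new = 0.25                            # where to add resolution
t2 = np.sort(np.append(t, t_new))
v2 = coarse(t2)                         # new node inherits the old curve's value
fine = CubicSpline(t2, v2)

# The refined parameterization still passes through all old nodes exactly,
# so the experience encoded at those nodes transfers to the new resolution.
print(np.allclose(fine(t), v))
```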

Page 24

Unbounded Resolution: Why locally weighted regression

• Locally weighted regression performs a least-squared-error regression where the error is scaled by the distance to the test point.
  – Local weighting allows global nonlinear behavior
• Quadratic regression to accurately model optima
• Provides the gradient for optimization (and the Hessian)
• Directions with insufficient data are identified from eigenvalues
  – Allows for autonomously determining which samples must be tested
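A one-dimensional sketch of locally weighted quadratic regression (illustrative only; the bandwidth and kernel are assumptions, not the dissertation's settings): squared errors are weighted by a Gaussian kernel centered on the query, a quadratic is fit, and the value, gradient, and curvature at the query are read off the coefficients.

```python
import numpy as np

# Minimal locally weighted quadratic regression in 1-D: weight squared
# errors by a Gaussian kernel around the query point xq, fit a quadratic
# centered at xq, and read off value, gradient, and Hessian there.
def lwr_quadratic(xq, xs, ys, bandwidth=0.3):
    w = np.exp(-0.5 * ((xs - xq) / bandwidth) ** 2)      # kernel weights
    d = xs - xq                                          # center at the query
    A = np.stack([np.ones_like(d), d, d ** 2], axis=1)   # quadratic basis
    W = np.diag(w)
    coef = np.linalg.solve(A.T @ W @ A, A.T @ W @ ys)    # weighted least squares
    value, grad, hess = coef[0], coef[1], 2.0 * coef[2]
    return value, grad, hess

xs = np.linspace(-1.0, 1.0, 41)
ys = xs ** 2                                             # sample cost surface
value, grad, hess = lwr_quadratic(0.5, xs, ys)
# Near x = 0.5 the local model recovers the value, slope, and curvature
```

Because the local model is quadratic, its fitted coefficients directly supply the gradient (and Hessian) used by the optimization, which is the point of the second and third bullets above.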

Page 25

Unbounded Resolution: Testing of the method

Problem design:
  Cost: (distance to sine wave)², J2 = ∫ (u(τ) - (sin(2πτ) + 2)/4)² dτ
  Output: average value, y = ∫ u(τ) dτ
  Saturation applied to u(t)

Results: sinusoidal shape & saturation at the closer side.

Motivation: possibly internal resonance, distance traveled, material processed, …

Internal limitations: flattens peaks in the absolute distance -> minimize RMS.
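The test problem can be written down directly; the grid size, saturation bounds, and the chosen yd here are illustrative assumptions. The average over a uniform grid of the unit interval stands in for the integrals.

```python
import numpy as np

# Numerical sketch of the test problem: cost is the squared distance to a
# scaled sine wave; output is the average value of u over the unit interval.
tau = np.linspace(0.0, 1.0, 201)
target = (np.sin(2 * np.pi * tau) + 2.0) / 4.0    # reference waveform

def cost_and_output(u):
    J2 = float(np.mean((u - target) ** 2))        # ≈ ∫ (u - target)^2 dτ
    y = float(np.mean(u))                         # ≈ ∫ u dτ (average value)
    return J2, y

# For a desired average yd, shifting the reference by a constant meets the
# output while keeping the sinusoidal shape; for more extreme yd the clip
# would flatten the waveform against the nearer saturation bound.
yd = 0.7
u = np.clip(target - np.mean(target) + yd, 0.0, 1.0)
J2, y = cost_and_output(u)
```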

Page 26

Unbounded Resolution: Testing of the method

• Near optimal compared to direct optimization
• Exponential learning rate
• Waveform results
  – The result exploits saturation. Going from 4 to 9 nodes, the cost decreases but the shape appears identical by sight.

Page 27

Unbounded Resolution: Practical example

• Objective:
  – Control the motor voltage to spin the motor to a given speed at a set time with the minimum peak current.
• Only modifications:
  – Adjusted parameters for the range of u, y & J
  – Increased the measure of data required, to deal with process variation
  – Ideal cost based on steady state

[Diagram: the amplifier drives the motor with voltage u(t); the plant is unknown to the method. A tachometer and a current peak detector provide y and J, sampled after the run — they do not need to be sampled continuously.]

Page 28

Unbounded Resolution: Practical example

• Completely automated
• Progressive improvement
• Sizable variation
  – Direct optimization on an average of 10 trials still did not converge
  – However, LWR provided a sufficiently accurate estimate of the gradients to converge
• Thirteen sets of data
  – Multiple runs gave similar results

Page 29

Unbounded Resolution: Practical example

• 7 dimensions in 17 hours
  – About 40,000 samples
  – Method parameters were not optimized
• Results make sense
  – The final voltage determines the output
  – The initial voltage is very similar
  – The initial slope flattens

Page 30

My Contributions

Related publications and presentations:

• Journal submissions
  – “Unbounded Motion Optimization by Developmental Learning,” revision submitted to IEEE Systems, Man and Cybernetics Part B
  – “Optimal Inverse Functions Created via Population Based Optimization,” submitted to IEEE Systems, Man and Cybernetics Part B
• Conference presentations
  – “Memory-Based Motion Optimization for Unbounded Resolution,” Computational Intelligence and Bioinformatics, IASTED, 753-31, Nov 2011
  – “Population Based Optimization for Variable Operating Points,” Congress on Evolutionary Computation, IEEE, Jun 2011
  – “Constrained Near-Optimal Control Using a Numerical Kinetic Solver,” Robotics and Applications, IASTED, 706-21, Nov 2010
  – “Biomimetic Learning, Not Learning Biomimetics: A Survey of Developmental Learning,” National Aerospace and Electronics Conference (NAECON), IEEE, July 2010
• Posters
  – “Memory Based Optimization for Unbounded Learning,” 2011 Great Midwest Regional Space Grant Consortia Meeting, also NASA Futures Forum, Feb 2012
  – “Constrained Near-Optimal Control Using a Numerical Kinetic Solver,” 2009 Great Midwest Regional Space Grant Consortia, 3rd place

Page 31

My Contributions

Autonomous motion learning:

• General purpose rigid body motion optimization
  – Provides a novel high-level interface at the robot geometry level
  – Allows novice roboticists or computers to design motions
  – However, has high computation requirements
• Optimal inverse functions from a global search
  – Organizes motions into continuous, optimal inverse functions
  – Provides a set of reflexive responses for use online
  – Efficiently searches a high-dimension space using agents & local gradients
• Improving motions by unbounded resolution
  – Nodes are added to an interpolation, approaching the optimal continuous function in the limit
  – Efficiently collects and “understands” experiences
  – Motions are not limited by the initial programming resolution or initial training time limitation

Page 32

Commencement

Future applications:

• Implementation for novel locomotion
  – Implement on an inchworm
  – The challenge is automating the tests, such as defining distance traveled
  – Would be very interesting to reduce variation
• Learn a control law for regulation
  – Develop a control law for the pendulum
  – Question of what disturbance to use and what metric for cost or output (possibly response time, with the operator setting the urgency)
• Address multidimensional outputs
  – Robots are used to provide multiple outputs
  – A manifold of the output may not be representable in the output space (think of a screw thread: despite moving continuously, there are multiple surfaces with the same horizontal coordinates)