
On Movement Skill Learning and Movement Representations for Robotics

Gerhard Neumann

Graz University of Technology, Institute for Theoretical Computer Science

November 2, 2011


Modern Robotic Systems: Motivation...

Many degrees of freedom, compliant actuators, highly dynamic movements...


In principle, the advanced morphology of these robots would allow us to perform a wide range of complex movements such as

• Different forms of locomotion (walking, running, trotting)

• Jumping

• Playing tennis...

Classical control methods often fail or are very hard to use for such complex movements.

• More promising approach: let the robot learn the movement from trial and error

• Main topic of this thesis !


Movement Skill Learning for Robotics

Movement Skill Learning can be easily formulated as a Reinforcement Learning problem.

• The agent has to search for a policy which optimizes reward

So why is it challenging?

• High dimensional continuous state spaces

• High dimensional continuous action spaces

• Data is expensive : Needs to be data efficient

• Needs to be safe


Movement Skill Learning for Robotics

Learning algorithms can be roughly divided into

• Value-based methods

• Policy-search methods


Value-based methods

• Estimate the expected discounted future reward for each state s when following policy π

V^π(s) = E[ Σ_{t=0}^∞ γ^t r_t ]

• Also denoted as the value function of policy π

• Recursive form:

V^π(s) = E[ r(s,a) + γ V^π(s') ]


Value-based methods

+ The value function can be used to assess the quality of each intermediate action of an episode

• E.g. by the use of the Temporal Difference (TD) error

δ_t = r_t + γ V^π(s_{t+1}) − V^π(s_t)

• Evaluates whether the current step ⟨s_t, a_t, r_t, s_{t+1}⟩ was better or worse than expected

• We can efficiently solve the temporal credit assignment problem

- The value function is very hard to estimate in high-dimensional continuous state and action spaces
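As a minimal sketch (not part of the talk), the TD error above can be computed directly from a single transition; the linear value function below is purely illustrative:

```python
import numpy as np

def td_error(V, s, r, s_next, gamma=0.99):
    """Temporal-difference error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    return r + gamma * V(s_next) - V(s)

# illustrative linear value function V(s) = w^T s on a 2-D state
w = np.array([0.5, -0.2])
V = lambda s: float(w @ s)
delta = td_error(V, s=np.array([1.0, 0.0]), r=-1.0, s_next=np.array([0.9, 0.1]))
```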


Policy Search Methods

• Rely on a parametric representation of the policy π(a|s; w)

• w ... parameters of the policy

• Directly optimize the policy parameters by performing rollouts on the real system

- We can only assess the quality of a whole trajectory instead of single actions

+ However, as no value function is estimated, this can be done very accurately

• More successful than value-based methods

• Performance strongly depends on the used movement representation


Outline: The thesis is divided into 3 parts...

Value-based Methods

• Graph-Based Reinforcement Learning

• Fitted Q-Iteration by Advantage Weighted Regression

Movement Representations

• Kinematic Synergies

• Motion Templates

• Planning Movement Primitives

Policy Search

• Variational Inference for Policy Search in Changing Situations


Fitted Q-Iteration: Batch-Mode Reinforcement Learning (BMRL)

• Batch-Mode RL methods use the whole history H of the agent to update the value or action-value function

H = {⟨s_i, a_i, r_i, s'_i⟩}_{1≤i≤N}

• Advantage: data points are used more efficiently than in online methods


Fitted Q-Iteration: Batch-Mode Reinforcement Learning (BMRL)

• Fitted Q-Iteration (Ernst et al., 2003) approximates the state-action value function Q(s,a) by iteratively applying supervised regression techniques

• Repeat K times:

Q_{k+1}(i) = r_i + γ V_k(s'_i) = r_i + γ max_{a'} Q_k(s'_i, a')

D_k = { [ (s_i, a_i), Q_{k+1}(i) ] }_{1≤i≤N} ,   Q_{k+1} = Regress(D_k)
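A compact sketch of this iteration, assuming a batch of transitions and a scikit-learn-style regressor (the tree regressor and the finite candidate `action_set` are illustrative choices; the finite set sidesteps the continuous max_{a'} problem discussed on the next slides):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor  # any supervised regressor can be used

def fitted_q_iteration(S, A, R, S_next, action_set, gamma=0.95, K=50):
    """Iteratively regress Q_{k+1}(s_i, a_i) = r_i + gamma * max_{a'} Q_k(s'_i, a')."""
    X = np.hstack([S, A])                      # regression inputs (s_i, a_i)
    Q = None
    for _ in range(K):
        if Q is None:
            targets = R                        # first iteration: immediate rewards
        else:
            # evaluate Q_k(s'_i, a') for every candidate action and take the maximum
            q_next = np.column_stack([
                Q.predict(np.hstack([S_next, np.tile(a, (len(S_next), 1))]))
                for a in action_set])
            targets = R + gamma * q_next.max(axis=1)
        Q = ExtraTreesRegressor(n_estimators=50).fit(X, targets)   # Q_{k+1} = Regress(D_k)
    return Q
```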


Fitted Q-Iteration: Batch-Mode Reinforcement Learning (BMRL)

+ FQI has proven to outperform classical online RL methods inmany applications (Ernst et al., 2005).

+ Any type of supervised learning method can be used, e.g. neural networks (Riedmiller, 2005), regression trees (Ernst et al., 2005), Gaussian Processes

- High computational demands...


FQI for Robotics...

Continuous state spaces: ✓

Any type of supervised learning method can be used, e.g. neural networks, regression trees, Gaussian Processes

Continuous action spaces:

• We have to solve

Q_{k+1}(i) = r_i + γ max_{a'} Q_k(s'_i, a')

- Hm... how do we perform the max_{a'} operator in continuous action spaces?


FQI for Robotics...

Hm... how do we perform the max_{a'} operator in continuous action spaces?

• Discretizations become prohibitively expensive in high-dimensional spaces

• We have to solve an optimization problem for each sample! E.g. use Cross-Entropy optimization for each data point s'_i


FQI for Robotics...

Hm... how do we perform the max_{a'} operator in continuous action spaces?

• We show that an advantage-weighted regression can be used to approximate max_a Q(s,a).

• The regression uses the states s_i as input values and Q(s_i, a_i) as target values.

• The weighting w_i = exp(τ A(s_i, a_i)) of each data point is based on the advantage function A(s,a) = Q(s,a) − V(s).


FQI for Robotics...

What is a weighted regression ?

• Minimize the error function w.r.t. θ

E = Σ_{i=1}^N w_i ( V(s_i; θ) − Q(s_i, a_i) )^2

• w_i ... each data point gets an individual weighting
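For a linear value function V(s; θ) = θ^T [1, s], this weighted error has the usual weighted least-squares solution. A small sketch (variable names are illustrative):

```python
import numpy as np

def weighted_regression(S, targets, weights):
    """Fit V(s; theta) = theta^T [1, s] by minimizing sum_i w_i (V(s_i; theta) - target_i)^2."""
    X = np.hstack([np.ones((len(S), 1)), S])          # design matrix with a bias column
    sqrt_w = np.sqrt(weights)[:, None]
    theta, *_ = np.linalg.lstsq(sqrt_w * X, sqrt_w.ravel() * targets, rcond=None)
    return theta
```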


FQI for Robotics...

We prove this by applying the following two steps:

• Weighted regression for value estimation

• Soft-greedy policy improvement


Weighted regression for value estimation

• The value function of a stochastic policy π is given by V^π(s) = ∫_a π(a|s) Q(s,a) da

• We show that this can be approximated without evaluating the integral by solving a weighted regression problem

D_V = {⟨s_i, Q(s_i, a_i)⟩} ,   U = {π(a_i|s_i)} ,   V = WeightedReg(D_V, U)


Proof

We want to find an approximation V(s) of V^π(s) by minimizing the error function

Error(V) = ∫_s µ(s) ( ∫_a π(a|s) Q(s,a) da − V(s) )^2 ds

         = ∫_s µ(s) ( ∫_a π(a|s) ( Q(s,a) − V(s) ) da )^2 ds,

• µ(s) ... state distribution when following policy π(·|s).


Proof

Squared error function:

Error(V) = ∫_s µ(s) ( ∫_a π(a|s) ( Q(s,a) − V(s) ) da )^2 ds,

An upper bound of Error(V) is given by:

Error_B(V) = ∫_s µ(s) ∫_a π(a|s) ( Q(s,a) − V(s) )^2 da ds ≥ Error(V).

• Use of Jensen's inequality


Proof

It is easy to show that both error functions have the same minimum for V

• The upper bound Error_B can be approximated straightforwardly by samples {(s_i, a_i), Q(s_i, a_i)}_{1≤i≤N}

Error_B(V) ≈ Σ_{i=1}^N π(a_i|s_i) ( Q(s_i, a_i) − V(s_i) )^2     (1)

• No integral over the action space is needed!


FQI for Robotics...

We prove this by applying the following two steps:

• Weighted regression for value estimation

• Soft-greedy policy improvement


Soft-greedy policy improvement

The optimal value function V(s) = max_a Q(s,a) can be approximated without evaluating max_a Q(s,a) by solving an advantage-weighted regression problem.

D_V = {⟨s_i, Q(s_i, a_i)⟩} ,   U* = { exp(τ Ā(s_i, a_i)) } ,     (2)

V = WeightedReg(D_V, U*)     (3)

- τ ... greediness parameter of the algorithm.

- Ā(s,a) ... normalized advantage function.


Proof

We approximate the value function V^{π_1} of a soft-max policy π_1 by the use of weighted regression.

• Since a soft-max policy is an approximation of the greedy policy, we can replace V(s) = max_a Q(s,a) with V^{π_1}(s).


Proof

The used soft-max policy π_1(a|s) is based on the advantage function A(s,a) = Q(s,a) − V(s).

π_1(a|s) = exp(τ Ā(s,a)) / ∫_a exp(τ Ā(s,a)) da ,   Ā(s,a) = ( A(s,a) − m_A(s) ) / σ_A(s).

• If we assume that the advantages A(s,a) are normally distributed, the denominator of π_1 is constant.

• Thus we can use exp(τ Ā(s,a)) ∝ π_1(a|s) directly as weighting for the regression.


Concrete algorithm : LAWER

The Locally-Advantage WEighted Regression (LAWER) algorithm implements the presented theoretical results.

• It combines Locally Weighted Regression (LWR, (Atkeson et al., 1997)) and advantage-weighted regression.

• The locality weighting w_i(s) and the advantage weighting u_i = exp(τ Ā(s_i, a_i)) can be multiplicatively combined


Concrete algorithm : LAWER

• The value function is then given by a simple weighted linear regression:

V_{k+1}(s) = s̃ (S^T U S)^{-1} S^T U Q_{k+1}

• s̃ = [1, s^T]^T, S = [s̃_1, s̃_2, ..., s̃_N]^T is the state matrix.

• U = diag(w_i(s) u_i)

• In order to approximate V(s) = max_a Q_k(s,a), only the Q-values of neighboring state-action pairs are needed.
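A rough sketch of such a locally advantage-weighted prediction for a single query state s. The Gaussian locality kernel and the specific weight combination are illustrative assumptions (the thesis also uses tree-based kernels); `A_norm` stands for the normalized advantages Ā(s_i, a_i):

```python
import numpy as np

def lawer_value(s_query, S, Q, A_norm, tau=1.0, bandwidth=0.5):
    """Approximate V(s) = max_a Q(s, a) at s_query via locally advantage-weighted linear regression."""
    # locality weights w_i(s): Gaussian kernel around the query state (illustrative choice)
    w = np.exp(-np.sum((S - s_query) ** 2, axis=1) / (2.0 * bandwidth ** 2))
    # advantage weights u_i = exp(tau * normalized advantage)
    u = np.exp(tau * A_norm)
    U = np.diag(w * u)                                # multiplicatively combined weighting
    X = np.hstack([np.ones((len(S), 1)), S])          # rows are the augmented states [1, s_i]
    beta = np.linalg.solve(X.T @ U @ X, X.T @ U @ Q)  # (S^T U S)^{-1} S^T U Q
    return np.concatenate([[1.0], s_query]) @ beta    # value at the query state
```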


Approximation of the policy

For unseen states we need to approximate the soft-max policy

• Gaussian policy π(a|s) = N(a | µ(s), σ²).

• For estimating this policy we use reward-weighted regression (Peters & Schaal, 2007), but the advantage is used instead of the reward for the weighting.

• Thus, we optimize the long-term reward instead of the immediate reward


Results

• We use the Cross-Entropy (CE) optimization method (de Boer et al., 2005) as comparison to find the maximum Q-values max_a Q(s,a).

• We compare the LAWER algorithm to 3 different state-of-the-art CE-based fitted Q-iteration algorithms:

  • Tree-based FQI (Ernst et al., 2005) (CE-Tree)
  • Neural FQI (Riedmiller, 2005) (CE-Net)
  • LWR-based FQI (CE-LWR)

• After each FQI cycle new data was collected.

• The immediate reward function was quadratic in the distance to the goal position x_G and in the applied torque/force


Pendulum swing-up task

• A pendulum needs to be swung up from the position at the bottom to the top position (Riedmiller, 2005).

• 2 experiments with different torque punishment factors (c_2) were carried out.

[Figure: average reward vs. number of data collections for LAWER, CE-Tree, CE-LWR and CE-Net; (a) c_2 = 0.005, (b) c_2 = 0.025]


Comparison of torque trajectories

[Figure: torque trajectories u(t) over 5 s for LAWER, CE-Tree and CE-LWR; (c) c_2 = 0.005, (d) c_2 = 0.025]


Dynamic puddle-world

• The agent has to navigate from a start position to a goal position; it gets negative reward when going through puddles.

• Dynamic version of the puddle-world: the agent can set a force accelerating a k-dimensional point mass.

• This was done for k = 2 and k = 3 dimensions.

[Figure: puddle-world with start and goal positions]


Comparison of the algorithms

[Figure: average reward vs. number of data collections for LAWER and CE-Tree; (e) 2-D, (f) 3-D]

• The CE-Tree method learns faster, but does not manage to learn high-quality policies for the 3-D setting.

• LAWER also works for high-dimensional action spaces.


Comparison of torque trajectories

[Figure: torque trajectories u_1, u_2, u_3 over 5 s for (g) LAWER and (h) CE-Tree in the 3-D puddle-world]


Conclusion

• We have proven that the greedy operator max_a Q(s,a) can be approximated efficiently by an advantage-weighted regression.

• The resulting algorithm runs an order of magnitude faster than competing algorithms.

• Despite the resulting soft-greedy policy improvement, our algorithm was able to produce policies of higher quality.

• The Locally-Advantage Weighted Regression algorithm allows us to use fitted Q-iteration even for high-dimensional continuous action spaces.


Movement Representations for Motor Skill Learning

Directly optimize a parametric movement representation

• No value estimation is needed

• What is a good representation for learning a movement?

Episodic Tasks:

• Often it is sufficient to formulate the learning task in the episodic RL setup

• Single initial state, specified fixed duration of the movement

• Direct Policy Search can be applied easily in this setup


Movement Representations for Motor Skill Learning

Episodic setup : Use a trajectory-based representation

• We learn a parametric representation of the desired trajectory

[ q_d(t; w), q̇_d(t; w) ]

• t ... duration of the movement, no direct dependence on the high-dimensional state

• t is now a scalar; this significantly simplifies the learning problem

• Can only be used in the episodic setup (single start state)

• This trajectory is then followed by using feedback control laws

• Most common movement representations are trajectory-based...

• Dynamic Movement Primitives (Ijspeert & Schaal, 2003), Splines (Kolter & Ng, 2009), ...
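As a schematic example of the feedback-control tracking mentioned above (a sketch; the PD form and the gains Kp, Kd are assumptions, not a controller from the thesis):

```python
import numpy as np

def tracking_control(q, q_dot, q_des, q_dot_des, Kp=50.0, Kd=5.0):
    """PD feedback law that follows a desired trajectory [q_d(t; w), q̇_d(t; w)]."""
    return Kp * (np.asarray(q_des) - q) + Kd * (np.asarray(q_dot_des) - q_dot)

# at every control step: evaluate the parametric desired trajectory at time t,
# then apply u_t = tracking_control(q_t, q̇_t, q_d(t; w), q̇_d(t; w))
```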


Trajectory-Based vs. Value-Based Motor Skill Learning

Trajectory-Based:

• Can be seen as single-step decision task

• The agent chooses the parameters w as the action of a single, temporally extended step

• Only one step per episode...

Value-Based:

• One decision per time step of the agent

• The agent chooses the torque u as the action of a single, very short time step

• Up to a few hundred steps per episode...


Trajectory-Based vs. Value-Based Motor Skill Learning

Can we find a more intuitive solution in which the agent chooses new actions only at certain, characteristic time points of the movement?

• Temporal Abstraction: sequencing of temporally extended actions, also called Motion Templates


Temporal Abstractions for Motor-Skill Learning

Example : Drawing a triangle with a pen

Flat Setup

• We have to make many unessential decisions

Abstracted Setup

• The movement can be easily decomposed into 3 elemental motions


Temporal Abstractions for Motor-Skill Learning

Standard framework for temporally extended actions: Options (Sutton et al., 1999)

• Options are closed-loop policies taking actions over a period of time

• However: they are mainly used in discrete environments.

• In many applications options are discrete temporally extended actions

• E.g. “Go to another room”, “Follow the hallway” or “Frighten the poor monkey”

• For motor tasks useful options are often difficult to specify.


Temporal Abstractions for Motor-Skill Learning: Illustration

• Pendulum Swing-Up Task:

  • Standard RL benchmark task
  • Learn how to swing up and balance an inverted pendulum from the bottom position
  • We additionally want to minimize the energy consumption
  • Flat RL: choose a new action every 50 ms


Pendulum Swing-Up: Illustration

How can we decompose the trajectory into options?

[Figure: torque trajectory of the swing-up over 5 s, with positive peaks, negative peaks and the final balancing motion marked]

• We have positive and negative peaks in the torque trajectory ...

• ... followed by a final balancing motion.


Pendulum Swing-Up: Illustration

How can we decompose the trajectory into options?

[Figure: torque trajectory of the swing-up with positive peaks, negative peaks and balancing motion marked]

Specify the exact form of the peaks and the balancing motion for the options?

• Requires a lot of prior knowledge...

• The learning task becomes trivial...

However: we can specify the functional form of the options

• Use parameterized options...


Motion Templates

• Motion templates: parameterized options

• Used as our building blocks of motion.

• A motion template m_p is defined by:

  • Its k_p-dimensional parameter space Θ_p
  • Its parameterized policy u_p(s, t; θ_p)
  • Its termination condition c_p(s, t; θ_p)

• s ... state, t ... execution time, θ_p ∈ Θ_p ... parameters

• The functional form of u_p and c_p is chosen by the designer; the parameters θ_p are learned by the agent
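A minimal data structure capturing this definition might look as follows (a sketch; the layout is illustrative, not the implementation used in the thesis):

```python
from dataclasses import dataclass
from typing import Callable
import numpy as np

@dataclass
class MotionTemplate:
    """Parameterized option m_p with policy u_p(s, t; theta_p) and termination c_p(s, t; theta_p)."""
    policy: Callable[[np.ndarray, float, np.ndarray], np.ndarray]   # u_p(s, t, theta_p) -> control
    termination: Callable[[np.ndarray, float, np.ndarray], bool]    # c_p(s, t, theta_p) -> done?
    param_dim: int                                                  # k_p, dimension of Theta_p
```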


Motion Templates

• At each decision time step σ_k the agent has to choose:

  • Which motion template m_p ∈ A(σ_k) to use.
  • A(σ_k) ... set of available motion templates in decision time step σ_k.
  • Which parameterization θ_p ∈ Θ_p of m_p to use.

• Subsequently the policy u_p is executed until the termination condition c_p is fulfilled.

• Continuous time: the duration of the templates can be continuous valued

• The agent has to learn the correct sequence and parameterization of the motion templates
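Executing a chosen sequence of templates then reduces to the loop sketched below (assumptions: an `env_step(s, u, dt)` simulator function and a `schedule` of (template, θ_p) pairs supplied by the agent's policy):

```python
def run_schedule(s, schedule, env_step, dt=0.01):
    """Execute (MotionTemplate, theta) pairs in sequence; each runs until its termination fires."""
    for template, theta in schedule:
        t = 0.0
        while not template.termination(s, t, theta):
            u = template.policy(s, t, theta)   # temporally extended, parameterized action
            s = env_step(s, u, dt)             # advance the system by one control step
            t += dt                            # continuous-valued template duration
    return s
```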


Pendulum Swing-Up: Decomposition into Motion Templates

How can we decompose the trajectory into motion templates?

[Figure: torque trajectory of the swing-up with positive peaks, negative peaks and balancing motion marked]

• We have positive and negative peaks in the torque trajectory ...

• ... followed by a final balancing motion.


Pendulum Swing-Up : Templates to model the peaks

We use 2 templates per peak:

• One for the ascending part: m_1 ...

• ... and one for the descending part: m_2

• Both just depend on the execution time of the template.


Pendulum Swing-Up: Decomposition into Motion Templates

We use 2 templates per peak:

[Figure: torque profiles of the ascending template m_1 and the descending template m_2, plotted for different parameter values]

Parameters:

• a_i ... height of the template

• o_i ... curvature of the template

• d_i ... duration of the template




Pendulum Swing-Up: Decomposition into Motion Templates

• We fix the height of the descending peak template m_2 to be the height of m_1.

• m_3 and m_4 are the same templates, just for negative peaks.

[Figure: torque trajectory of the swing-up decomposed into the templates m_1, m_2, m_3, m_4 and the balancing motion]


Pendulum Swing-Up: Decomposition into Motion Templates

• The balancing template is implemented as a PD controller

  MT     Functional form       Parameters
  m_5    −k_1 θ − k_2 θ'       k_1, k_2

• k_1 and k_2 are the PD controller gains.

• m_5 always runs for 20 s; subsequently the episode is terminated
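Written out as code, the balancing template's policy is just the PD law above (a sketch):

```python
def balancing_policy(theta, theta_dot, k1, k2):
    """Balancing template m_5: u = -k1 * theta - k2 * theta_dot (PD controller on the joint angle)."""
    return -k1 * theta - k2 * theta_dot
```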


Pendulum Swing-Up : Constructing the motion

• The agent can either choose the peak templates in the predefined order (m_2, m_3, m_4, m_1, m_2, ...)

• ... or it can use the balancing template m_5 as final template.

• Thus the agent has to learn the correct number of swing-ups and the correct parameterization of the swing-ups.

[Figure: torque trajectory of the swing-up decomposed into the templates m_2, m_3, m_4, m_1 and the final balancing template m_5]


Pendulum Swing-Up : Constructing the motion

[Figure: torque trajectory of the swing-up decomposed into the templates m_2, m_3, m_4, m_1, m_5]

• Flat: approximately 50 decisions/parameters are needed to reach the top position

• Motion Templates: the whole motion consists of only 5 decisions / 13 parameters


Pendulum Swing-Up : Accuracy of the policy

• Motion templates decrease the number of necessary decisions significantly

• Overall learning task is simplified

• Ok... where is the catch?

• A single decision now has much more influence on the outcome of the whole motion.

• Therefore a single decision has to be made much more precisely than in flat RL.


Algorithm for Motion Template Learning

• An RL algorithm is needed which can learn very precise continuous-valued policies!

• For each template m_p, we use an extension of the Locally-Advantage WEighted Regression (LAWER, (Neumann & Peters, 2009)) algorithm to learn the policy π_p(θ_p|s) for selecting the parameters of m_p.


Extensions of LAWER: LAWER for Motion Template Learning

Due to the increased precision requirements of motion template learning we had to develop 2 substantial extensions of LAWER

• Adaptive tree-based kernels

• Additional optimization to improve the approximation of V(s) = max_a Q(s,a)


Extensions of LAWER : Adaptive Tree-Based Kernels

The use of a uniform weighting kernel is often problematic in the case of...

• High-dimensional input spaces ('curse of dimensionality')

• Spatially varying data densities

• Spatially varying curvatures of the regression surface

This problem can be alleviated by varying the 'shape' of the weighting kernel.

• We do this by the use of randomized regression trees...


Extensions of LAWER: Improved approximation of V(s) = max_a Q(s,a)

• In order to estimate the weightings u_i, the original LAWER needed the assumption of normally distributed advantage values.

• Often this assumption does not hold and the estimate of u_i gets imprecise.

• We improve the estimate of the u_i by an additional optimization...


Experiments

Minimum-time problems with additional energy-consumption constraints (c_2)

• Pendulum Swing-Up

• 2-link Pendulum Swing-Up

• 2-link Pendulum Balancing

Iterative learning protocol:

• We collect L episodes with the currently estimated exploration policy

• Subsequently the optimal policy is re-estimated ...

• ... and the performance (summed reward) of the optimal policy (without exploration) is evaluated.


Experiments : Pendulum Swing-Up

Comparison of learning progress for different energy punishment factors (L = 50)

Figure: Learning curves (average reward vs. number of data collections) for the Gaussian kernel (MT Gauss) and the tree-based kernel (MT Tree), compared to flat RL, for (left) c_2 = 0.025 and (right) c_2 = 0.075


Experiments : Pendulum Swing-Up

Comparison of the flat and the motion template policy

Figure: (a) Torque trajectories and motion templates learned for different energy punishment factors c_2 = 0.005, 0.025, 0.075. (b) Torque trajectories learned with flat RL.

Performance for c_2 = 0.075:

• flat RL −48.6, motion templates −38.5


Experiments : 2-Link Pendulum Swing-Up

Same templates as for the 1-dimensional task

• The peak templates now have 2 additional parameters, the height and the curvature for the second control dimension u_2.

• The parameters of the balancer template m_5 consist of two 2 × 2 matrices for the controller gains.

Figure: Comparison of motion template learning with tree-based kernels and flat RL (average reward vs. number of data collections)


Experiments : 2-Link Pendulum Swing-Up

Learned motion template policy

Figure: Left: Torque trajectories u_1, u_2 and the decomposition into the motion templates m_2, m_3, m_4, m_5. Right: Illustration of the motion. The bold postures represent the switching time points of the motion templates.


Conclusions

• We have shown that by the use of motion templates, i.e. parameterized options, many motor tasks can be decomposed into elemental movements.

• Motion templates are the first movement representation which can be sequenced in time

• While the whole motion consists of fewer decisions, a single decision has to be made more precisely.

• We propose a new algorithm for motion template learning which can cope with the precision requirements

• We have shown that learning with motion templates can produce policies of higher quality than flat RL and could even be applied to tasks where flat RL was not successful.


Policy Search for trajectory-based representations

Back to trajectory-based representations

• Only 1 decision per episode : Choose parameter vector w

• Typically w is very high dimensional (40 - 100 parameters)

How can we optimize the parameters w?

• Policy Gradient Methods (Williams, 1992; Peters & Schaal, 2006)

• EM-based Methods (Kober & Peters, 2010)

• Inference-based Methods (Vlassis et al., 2009; Theodorou et al., 2010)


Inference-based Methods: Policy Search for Changing Situations

In different situations s_{0,i} we have to choose different parameter vectors w_i

• Can we generalize between solutions to avoid relearning?

• Learn a hierarchical policy π_MP(w|s_0; θ) which chooses the parameter vector w according to the situation s_0.

• In order to do so we will use approximate inference methods


Outline

Approximate Inference for Policy Search

• Decomposition of the log-likelihood

• Monte-Carlo EM based methods

• Variational Inference based methods

Policy Search for Movement Primitives in changing situations

• 4-Link Balancing


Approximate Inference for Policy Search

Using inference or inference-based methods has proven to be very useful for policy search

• PoWER (Kober & Peters, 2010), Policy Improvement by Path Integrals (Theodorou et al., 2010)

• Reward-Weighted Regression, Cost-Regularized Kernel Regression (Kober et al., 2010)

• Monte-Carlo EM Policy Search (Vlassis et al., 2009)

• CMA-ES (Heidrich-Meisner & Igel, 2009)


All these algorithms use the Moment-projection of a certain target distribution to estimate the policy

• As we will see, this can be problematic in many cases... (multi-modal solution space, complex reward functions...)

• Here we will introduce the theory for using the Information-projection and show that this projection alleviates many of these problems


Approximate Inference for Policy Search

Formulating policy search as an inference problem...

• Observed variable:

  • Introduce a reward event p(R = 1|τ), e.g. p(R = 1|τ) ∝ exp(−C(τ))
  • C(τ) ... trajectory costs

• Latent variables: trajectories τ

• Probabilistic model: p(R = 1, τ; θ) = p(R = 1|τ) p(τ; θ)

We want to find parameters θ which maximize the log-marginal likelihood

log p(R; θ) = log ∫_τ p(R|τ) p(τ; θ) dτ
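In sample-based form the reward event simply turns trajectory costs into unnormalized weights; a sketch (the temperature `beta` and the min-shift for numerical stability are assumptions):

```python
import numpy as np

def reward_event_weights(costs, beta=1.0):
    """p(R = 1 | tau_i) ∝ exp(-beta * C(tau_i)) for a batch of sampled trajectory costs."""
    costs = np.asarray(costs, dtype=float)
    return np.exp(-beta * (costs - costs.min()))   # shift by the minimum cost for stability
```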


Approximate Inference for Policy Search

Policy search can be seen as finding the maximum likelihood (ML) solution of p(R; θ)

p(R; θ) = ∫_τ p(R|τ) p(τ; θ) dτ

• Problem: huge trajectory space, the integral is intractable


Decomposition of the log-likelihood

We can decompose the log-likelihood by introducing a variational distribution q(τ) over the latent variable τ:

log p(R; θ) = L(q, θ) + KL(q||p_R),

Lower bound L(q, θ):

L(q, θ) = ∫_τ q(τ) log p(R, τ; θ) dτ + f_1(q) = · · · = ∫_τ q(τ) log p(τ; θ) dτ + f_2(q)

• Expected complete-data log-likelihood ...


Decomposition of the log-likelihood

We can decompose the log-likelihood by introducing a variational distribution q(τ) over the latent variable τ:

log p(R; θ) = L(q, θ) + KL(q||p_R),

Kullback-Leibler divergence KL(q||p_R):

KL(q||p_R) = −∫_τ q(τ) log ( p_R(τ) / q(τ) ) dτ

• 'Distance' between the variational distribution q and the conditional distribution of the latent variable p(τ|R; θ)

• p_R(τ) = p(τ|R; θ) ∝ p(R|τ) p(τ; θ) ... reward-weighted model distribution


Decomposition of the log-likelihood

We can now iteratively increase the lower bound L(q, θ) by:

• E-Step:

  • Keep the model parameters θ fixed
  • Minimize the KL-divergence KL(q||p_R) w.r.t. q

• M-Step:

  • Keep the variational distribution q fixed
  • Maximize the lower bound L(q, θ) w.r.t. θ


Approximate Inference for Policy Search

Two types of policy search algorithms emerge from this decomposition

• Monte-Carlo EM based Policy Search (Kober et al., 2010; Kober & Peters, 2010; Vlassis et al., 2009)

• Variational Inference Policy Search


Monte-Carlo (MC) EM based Algorithms

MC-EM based algorithms use a sample-based approximation of q in the E-step.

• E-Step min_q KL(q||p_R):  q(i) = p_R(i) ∝ p(R|τ_i) p(τ_i; θ_old)

• M-Step max_θ L(q, θ): use q(i) to approximate the lower bound

L(q, θ) ≈ Σ_i p_R(i) log p(τ_i; θ) + const = −KL(p_R || p(τ; θ)) + const

This is the same lower bound as the one given for PoWER and Reward-Weighted Regression.


Monte-Carlo (MC) EM based Algorithms

Iteratively calculate the M(oment)-projection of p_R:

min_θ KL(p_R||p) = −Σ_i p_R(i) log ( p(τ_i; θ) / p_R(i) )

• The model becomes 'reward-attracted':

  • Forces the model p to have high probability in regions with high reward
  • Negatively rewarded samples are neglected!

• Minimization?

  • p can be easily calculated by matching the moments of p with the moments of p_R
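For a Gaussian model over the sampled quantities (e.g. movement parameters w), this moment matching reduces to reward-weighted sample moments; a sketch, with `weights` the (unnormalized) p_R(i):

```python
import numpy as np

def m_projection_gaussian(W, weights):
    """Fit N(mu, Sigma) to the reward-weighted sample distribution p_R by moment matching."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    mu = weights @ W                              # reward-weighted mean
    diff = W - mu
    Sigma = (weights[:, None] * diff).T @ diff    # reward-weighted covariance
    return mu, Sigma
```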


Variational Inference based Algorithms

For Variational Inference we use a parametric variational distribution q(τ) = q(τ; θ')

• E-Step min_q KL(q||p_R): use a sample-based approximation for the integral in the KL-divergence

KL(q(τ; θ')||p_R) ≈ −Σ_{τ_i} q(τ_i; θ') log ( p_R(i) / q(τ_i; θ') )

• M-Step max_θ L(q, θ): if we use the same family of distributions for p(τ; θ) and q(τ; θ'), we can simply set θ to θ'


Variational Inference based Algorithms

Iteratively calculate I(nformation)-projection of pR :

minθ

KL(p||pR) = −∑

i

p(τi; θ) log

(

pR(i)

p(τi; θ)

)

• The model becomes 'cost-averse':
  • Tries to avoid including regions with low reward in p(τ; θ)
  • Uses information from negatively and positively rewarded examples

• Minimization?
  • Non-convex optimization problem (computationally much more demanding than the M-projection)...
  • We use numerical gradient ascent
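As a hedged sketch of what such a minimization might look like on a 1-dimensional toy problem with a bimodal p_R; a generic derivative-free optimizer stands in for the numerical gradient ascent used here, and the particular densities are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Toy bimodal "reward distribution" p_R over a 1-D action a, evaluated on a grid
a = np.linspace(-4.0, 4.0, 400)
pR = np.exp(-0.5 * ((a + 2) / 0.4) ** 2) + np.exp(-0.5 * ((a - 2) / 0.4) ** 2)
pR /= pR.sum()

def kl_p_pR(theta):
    """Sample-based KL(p(a; θ) || p_R) for a Gaussian model, θ = [mu, log_std]."""
    p = norm(theta[0], np.exp(theta[1])).pdf(a)
    p /= p.sum()
    return -np.sum(p * np.log(np.maximum(pR, 1e-300) / np.maximum(p, 1e-300)))

# Non-convex objective: in practice one would use several restarts
res = minimize(kl_p_pR, x0=np.array([0.5, 0.0]), method="Nelder-Mead")
print(res.x)   # the fitted mean ends up near one mode (+2 or -2), not between them
```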


Approximate Inference for Policy Search

MC-EM : M-projection based

  min_θ KL(p_R || p) = −Σ_i p_R(i) log( p(τ_i; θ) / p_R(i) )

Variational Inference : I-projection based

  min_θ KL(p || p_R) = −Σ_i p(τ_i; θ) log( p_R(i) / p(τ_i; θ) )

Both algorithms are guaranteed to iteratively increase the lower bound...


I vs M-projection : Illustrative Examples

Let's look at the differences in more detail...


I vs M-projection : Illustrative Examples

We consider 1-step decision problems in continuous state and action spaces

• We typically use a Gaussian distribution as model distribution

  p(s, a; θ) = N( [s; a] | [µ_s; µ_a], [[Σ_ss, Σ_as], [Σ_sa, Σ_aa]] )

• with θ = {µ_s, µ_a, Σ_ss, Σ_as, Σ_aa}
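As a small sketch (the dimensions and numbers are assumptions chosen for illustration), the joint model can be assembled from these parameters and sampled from:

```python
import numpy as np

# Assumed toy dimensions: 1-D state s, 2-D action a
mu_s, mu_a = np.array([0.5]), np.array([0.0, 1.0])
Sigma_ss = np.array([[1.0]])
Sigma_as = np.array([[0.3], [0.1]])                 # Cov(a, s), shape (dim_a, dim_s)
Sigma_aa = np.array([[1.0, 0.2], [0.2, 0.5]])

mu = np.concatenate([mu_s, mu_a])
Sigma = np.block([[Sigma_ss, Sigma_as.T],
                  [Sigma_as, Sigma_aa]])

rng = np.random.default_rng(4)
samples = rng.multivariate_normal(mu, Sigma, size=5)   # draws of [s, a] from p(s, a; θ)
print(samples)
```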


I vs M-projection : Illustrative Examples

2-dimensional action space, no state variables, multimodal target distribution

[Figure: Gaussian fits to the multimodal target distribution using the M-projection (top) and the I-projection (bottom)]

• M-projection averages over all modes
• I-projection concentrates on one mode


I vs M-projection : Illustrative Examples

We also want to have state variables...

• The policy π(a|s; θ) is obtained by conditioning on the state s (see the conditioning sketch below).

• Policy π is a linear Gaussian model...

In order to get more complex policies π(a|st; θ) ...

• For each state s_t, we reestimate the model p(s, a; θ) locally (using either the M- or I-projection)

• We clamp µ_s at s_t.
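The conditioning itself is the standard Gaussian formula. A minimal, self-contained sketch (the numbers are the same illustrative assumptions as in the joint-model sketch above):

```python
import numpy as np

def condition_gaussian(mu_s, mu_a, Sigma_ss, Sigma_as, Sigma_aa, s_t):
    """Linear Gaussian policy π(a | s_t) obtained from the joint Gaussian p(s, a; θ):
       mean = µ_a + Σ_as Σ_ss^{-1} (s_t - µ_s),  cov = Σ_aa - Σ_as Σ_ss^{-1} Σ_as^T."""
    K = Sigma_as @ np.linalg.inv(Sigma_ss)      # feedback gain of the linear policy
    mean = mu_a + K @ (s_t - mu_s)
    cov = Sigma_aa - K @ Sigma_as.T
    return mean, cov

mu_s, mu_a = np.array([0.5]), np.array([0.0, 1.0])
Sigma_ss = np.array([[1.0]])
Sigma_as = np.array([[0.3], [0.1]])
Sigma_aa = np.array([[1.0, 0.2], [0.2, 0.5]])

mean, cov = condition_gaussian(mu_s, mu_a, Sigma_ss, Sigma_as, Sigma_aa, s_t=np.array([1.2]))
print(mean, cov)
```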


I vs M-projection : Illustrative Examples

• 1-dimensional state and action space

• complex reward function (dark background indicates negative reward)

• Policy is estimated for 6 different states

[Figure: policies estimated at states s1–s6 with the M-projection (top) and the I-projection (bottom)]

• M-projection includes areas of low reward in the distribution!


Policy Search for Motion Primitives

Let's apply variational inference for policy search in changing situations

• Movement representation : parametrized velocity profiles


Multi-situation setting : How can we learn θ?

• Existing algorithms are all MC-EM based and therefore use the M-projection

• Reward-Weighted Regression (Peters & Schaal, 2007), Cost-Regularized Kernel-Regression (Kober et al., 2010)

• Online learning setup : as samples we always use the history of the agent...


Experiments : Cannon-Ball Task

Learn to shoot a cannon-ball at a desired location

• State-space s0 : Desired Location, Wind Force

• Parameter-space w : Launching Angle and Velocity of the ball
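To make the task concrete, a hypothetical toy simulator of such a cannon-ball task might look as follows; the physics, the wind model and the reward are assumptions for illustration only, not the setup used in the actual experiments:

```python
import numpy as np

def cannon_reward(angle, velocity, target_x, wind, g=9.81, dt=0.01):
    """Simulate a simple ballistic shot with wind as a constant horizontal
    acceleration; return the negative squared distance to the target."""
    vx, vy = velocity * np.cos(angle), velocity * np.sin(angle)
    x = y = 0.0
    while True:
        vx += wind * dt
        vy -= g * dt
        x += vx * dt
        y += vy * dt
        if y <= 0.0 and vy < 0.0:        # ball hits the ground
            return -(x - target_x) ** 2

# One situation s_0 = (target_x, wind) and one parameter vector w = (angle, velocity)
print(cannon_reward(angle=np.pi / 4, velocity=20.0, target_x=35.0, wind=-1.0))
```

Different (angle, velocity) pairs can hit the same target (a flat, fast shot or a high, slow lob), which is what makes the solution space multi-modal.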

Comparison of the I- and M-projection :

• CRKR : Cost-Regularized Kernel Regression

• Multi-modal solution space, the I-projection performs best

[Figure: learning curves (performance vs. episodes) for the I-projection, the M-projection, and CRKR]


Experiments : 4-link pendulum balancing

4-link ’Humanoid’ robot has to counterbalance different pushes

• Situations :
  • The robot gets pushed with different forces F_i ∈ [0; 25] Ns at 4 different points of origin

• 4-dimensional state space

• Movement Primitives
  • Sequence of sigmoidal velocity profiles (39 parameters); an illustrative parameterization is sketched after the figure below...

[Figure: snapshots of a balancing movement at t = 0.10 s, 0.60 s, 1.10 s, 1.60 s, and 2.10 s]
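The exact form of these profiles is not spelled out here, so the following is only one plausible reading, given as an illustrative assumption: each profile is a sum of sigmoidal velocity segments whose amplitudes, centers and widths are the learned parameters.

```python
import numpy as np

def sigmoidal_velocity_profile(t, amplitudes, centers, widths):
    """Velocity profile built from a sequence of sigmoidal segments:
       v(t) = Σ_k a_k * σ((t - t_k) / b_k), with σ the logistic function.
    The joint trajectory is obtained by integrating v(t) over time."""
    t = np.asarray(t)[:, None]
    return np.sum(amplitudes / (1.0 + np.exp(-(t - centers) / widths)), axis=1)

t = np.linspace(0.0, 2.0, 200)
v = sigmoidal_velocity_profile(t,
                               amplitudes=np.array([1.0, -1.5, 0.5]),
                               centers=np.array([0.3, 0.9, 1.5]),
                               widths=np.array([0.05, 0.1, 0.05]))
print(v[:5])
```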


Experiments : 4-link pendulum balancing

4-link ’Humanoid’ robot has to counterbalance different pushes

• After 60000 episodes the robot has learned to balance almost every force

• The robot learns completely different balancing strategies

• We could not produce reliable results with the M-projection...

[Figure: snapshots of a balancing movement at t = 0.10 s, 0.60 s, 1.10 s, 1.60 s, and 2.10 s]


Conclusion

• We can use the M-projection or the I-projection for Policy Search

• The I-projection also uses information of bad samples, which are neglected by the M-projection!

• It therefore can be used with ease for multi-modal distributions or non-concave reward functions

• Computationally quite demanding...
  • More efficient methods to calculate the I-projection are needed

• Is there still a big difference for more complex model distributions...?


The end

Thanks for your attention!


Bibliography

Atkeson, Chris G., Moore, Andrew W., & Schaal, Stefan. 1997. Locally Weighted Learning. Artificial Intelligence Review, 11, 11–73.

de Boer, Pieter-Tjerk, Kroese, Dirk, Mannor, Shie, & Rubinstein, Reuven. 2005. A Tutorial on the Cross-Entropy Method. Annals of Operations Research, 134(1), 19–67.

Ernst, D., Geurts, P., & Wehenkel, L. 2005. Tree-Based Batch Mode Reinforcement Learning. Journal of Machine Learning Research, 6, 503–556.


Ernst, Damien, Geurts, Pierre, & Wehenkel, Louis. 2003. Iteratively Extending Time Horizon Reinforcement Learning. Pages 96–107 of: European Conference on Machine Learning (ECML).

Heidrich-Meisner, V., & Igel, C. 2009. Neuroevolution Strategies for Episodic Reinforcement Learning. Journal of Algorithms, 64(4), 152–168.


Ijspeert, Auke Jan, & Schaal, Stefan. 2003. Learning Attractor Landscapes for Learning Motor Primitives. Pages 1523–1530 of: Advances in Neural Information Processing Systems 15. Cambridge, MA: MIT Press.

Kober, J., & Peters, J. 2010. Policy Search for Motor Primitives in Robotics. Machine Learning Journal, online first, 1–33.

Kober, Jens, Oztop, Erhan, & Peters, Jan. 2010. Reinforcement Learning to Adjust Robot Movements to New Situations. In: Proceedings of the 2010 Robotics: Science and Systems Conference (RSS 2010).


Kolter, Z., & Ng, A. 2009. Task-Space Trajectories via Cubic Spline Optimization. Pages 2364–2371 of: Proceedings of the 2009 IEEE International Conference on Robotics and Automation (ICRA '09). Piscataway, NJ, USA: IEEE Press.

Neumann, G., & Peters, J. 2009. Fitted Q-Iteration by Advantage Weighted Regression. In: Advances in Neural Information Processing Systems 22 (NIPS 2008). Cambridge, MA: MIT Press.


Peters, J., & Schaal, S. 2006. Policy Gradient Methods for Robotics. In: Proceedings of the IEEE International Conference on Intelligent Robots and Systems (IROS).

Peters, J., & Schaal, S. 2007. Reinforcement Learning by Reward-Weighted Regression for Operational Space Control. In: Proceedings of the International Conference on Machine Learning (ICML).


Riedmiller, M. 2005. Neural Fitted Q-Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method. In: Proceedings of the European Conference on Machine Learning (ECML).

Sutton, Richard, Precup, Doina, & Singh, Satinder. 1999. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence, 112, 181–211.


Theodorou, E., Buchli, J., & Schaal, S. 2010. Reinforcement Learning of Motor Skills in High Dimensions: A Path Integral Approach. Pages 2397–2403 of: Proceedings of the 2010 IEEE International Conference on Robotics and Automation (ICRA).

Vlassis, Nikos, Toussaint, Marc, Kontes, Georgios, & Piperidis, Savas. 2009. Learning Model-Free Robot Control by a Monte Carlo EM Algorithm. Autonomous Robots, 27(2), 123–130.


Williams, Ronald J. 1992. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning.
