inside.mines.edu/.../Lectures/05-ImitationLearning.pdf

Learning by Demonstration


Robot Learning


• These are the learning methods we have learned so far:

Learning from data

Have we solved all problems?


How to teach a robot to play table tennis with people?

Or how to flip pancakes?

Learning from Demonstration


Definition: an end-user development technique for teaching a robot new behaviors by demonstrating the task to transfer directly instead of programming it.

Reinforcement Learning


Definition: an area of machine learning inspired by behaviorist psychology, concerned with how robots ought to take actions in an environment so as to maximize a cumulative reward.

Learning from Demonstration and Reinforcement Learning


In robotics, learning from demonstration and reinforcement learning typically go hand in hand:

• Learning from demonstration provides an initial solution

• Reinforcement learning provides the ability to adapt

Examples:


Programming by Demonstration (PbD)


Programming robots is hard!

• Huge number of possible tasks

• Unique environmental demands

• Tasks difficult to describe formally

• Expert engineering impractical


How can robots be shown how to perform tasks?

• Natural, expressive way to program

• No expert knowledge required

• Valuable human intuition

• Program new tasks as needed

Record and Replay

Then, how do we integrate multiple demonstrations?


Definition (from Wiki): In computer science, programming by demonstration (PbD) is an end-user development technique for teaching a robot new behaviors by demonstrating the task to transfer directly instead of programming it through machine commands.

Also called:

• Learning from demonstration

• Imitation learning

• Apprenticeship learning

A.G. Billard, SHS Program in Cognitive Psychology, Spring 2007

Calinon, S. and Billard, A. (2007) Incremental Learning of Gestures by Imitation in a Humanoid Robot. In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction (HRI). (Slide credits: Aude Billard)

Imitation

Level of granularity: What is copied?

Should it copy the intention, or the dynamics of the movement?

Gesture Recognition

How are actions perceived?

How is information parsed?

Motor Learning

How is information transferred and implemented on a physical robot?

(Figure: Learning by Imitation, with labels Gesture Recognition, Motor Learning, Biological Inspiration, and Robotic Implementation.)

Programming by Demonstration (PbD)


Prior to building any capability in robots, we might want to understand how the equivalent capability works in humans and other animals: BIOLOGICAL INSPIRATION

Learning by Imitation: Biological Inspiration, Gesture Recognition

Imitation Capabilities in Animals

Which species can imitate is still a major subject of discussion and debate.

One distinguishes “true” imitation from copying (flocking, schooling, following), stimulus enhancement, contagion, and emulation.


Imitation Capabilities in Animals

• Copying and Mimicry: Rats, Monkeys

• Observer rats watched companion (actor) rats perform different spatial tasks, which varied according to the experimental requirements. After the observational training, surgical ablation was performed to block any further learning.

• The observer rats displayed exploration abilities that closely matched the previously observed behaviors.


Imitation Capabilities in Monkeys

Subjects who saw the Lever demonstrations tended to use a levering movement to pop open the lid, whereas subjects who viewed the Poke demonstrations, as well as the controls, did not display this behavior at all.


Imitation Capabilities in Animals

• “True” imitation: the ability to learn new actions that are not part of the usual repertoire

• The prerogative of humans only, and possibly great apes


Developmental Stages of Imitation

• Innate facial imitation (newborns up to 3 months): tongue and lip protrusion, mouth opening, head movements, cheek and brow motion, eye blinking

• Delayed imitation up to 24 hours: imitation is mediated by a stored representation


Developmental Stages of Imitation

• Deferred and delayed imitation: 18 months (Piaget), 9-12 months (Meltzoff)

• Deferred imitation of novel behavior: 67% of the infants who saw the display reproduced the act after a one-week delay, compared to 0% of the control infants who had not seen the novel display.


Goals and Intentions

• Infants aged 14 months

• Children imitate a new action to achieve the same goal only if they consider it the most rational alternative.


Goals and Intentions

• 18-month-old infants

• Differentiate between human and machine demonstrations: they attribute intentions only to the human

• Learn from unsuccessful examples


Goals and Intentions

• Imitation is hierarchical and goal-directed

• Single-hand motions: accurate ipsilateral imitation, but 48% substitution for cross-lateral imitation

• Two-hand motions: only 10% substitution for cross-lateral imitation

• Two-phase motion eliminates mistakes

• Adding constraints on hand gestures increases mistakes


Imitation in adults

• Reaches the highest level of complexity

• Is present in all activities: social influence in establishing group norms, a collective frame of reference, transmission of phobias

• Range of imitative behaviors in animals, increasing in complexity across species

• Stages of development in children's imitation: facial and motion imitation, inferring goals, hierarchy of imitation

• Imitation in adulthood is influenced by movement observation, handedness, and the orientation of the demonstrator

• Adaptation and reinforcement (including learning from failures) go hand in hand with imitation learning

• The underlying neural mechanisms are not yet completely deciphered

Imitation Learning in Animals

Advantages: When is imitation useful?

• It is a powerful means of transferring skills

• It speeds up the learning process by showing possible solutions or, conversely, by showing bad solutions

Imitation Learning in Animals

Disadvantages: When is imitation not useful?

• Not appropriate: when a good solution for the teacher is not a possible solution for the learner (when adaptation and reinforcement are not considered)

• Disadvantageous: when it leads you into error (a bad teacher)


Programming by Demonstration (PbD)

How do we integrate multiple demonstrations?

Technical Details of PbD


• Classification vs. regression

Classification: the output variable takes discrete class labels

Regression: the output variable takes continuous values

Regression can be used to integrate multiple different demonstrations


• Gaussian (normal) distribution: a characteristic symmetric bell curve that quickly falls off towards 0 (in practice)
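For reference, the bell curve described above is the standard 1-D Gaussian density with mean $\mu$ and variance $\sigma^2$ (the symbols are our notation; the slide only shows the curve). Roughly 99.7% of its mass lies within $\mu \pm 3\sigma$, which is what "quickly falls off towards 0" amounts to in practice.

```latex
\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)
```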


• Multivariate Gaussian distributions in the n-D space
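The n-D generalization, written in our own notation with a mean vector $\boldsymbol{\mu} \in \mathbb{R}^n$ and a symmetric positive-definite $n \times n$ covariance matrix $\boldsymbol{\Sigma}$, is the standard density:

```latex
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})
  = \frac{1}{\sqrt{(2\pi)^{n} \, \lvert \boldsymbol{\Sigma} \rvert}}
    \exp\!\left( -\tfrac{1}{2} (\mathbf{x}-\boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu}) \right)
```

The off-diagonal entries of $\boldsymbol{\Sigma}$ capture correlations between dimensions, which is what later lets GMR predict one variable (e.g. position) from another (e.g. time).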


• Gaussian Mixture Models (GMM): a mixture model is a probabilistic model which assumes that the underlying data come from a mixture distribution (for a GMM, a weighted sum of Gaussian components)
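A minimal sketch of evaluating a 1-D GMM density, $p(x) = \sum_k \pi_k \, \mathcal{N}(x \mid \mu_k, \sigma_k^2)$, in plain Python. The function names and the two-component parameters are hypothetical, chosen only for illustration; the weights must be non-negative and sum to 1 so that the mixture is itself a valid density.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_pdf(x, weights, means, variances):
    """Density of a 1-D GMM: sum over k of weights[k] * N(x; means[k], variances[k])."""
    if abs(sum(weights) - 1.0) > 1e-9 or any(w < 0 for w in weights):
        raise ValueError("mixture weights must be non-negative and sum to 1")
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


# Hypothetical two-component mixture, for illustration only.
weights, means, variances = [0.3, 0.7], [0.0, 5.0], [1.0, 2.0]
for x in (0.0, 2.5, 5.0):
    print(x, gmm_pdf(x, weights, means, variances))
```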


• Gaussian Mixture Regression (GMR): given a GMM of the joint distribution of inputs and outputs, GMR computes the conditional distribution of the outputs given the inputs and uses its expected value as the regression estimate
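In formulas, a standard way to write GMR for a GMM over an input $t$ and an output $s$ is shown below. The block notation is ours: component $k$ has weight $\pi_k$, mean split as $(\mu_k^{t}, \mu_k^{s})$, and covariance split into blocks $\Sigma_k^{tt}, \Sigma_k^{ts}, \Sigma_k^{st}, \Sigma_k^{ss}$.

```latex
\hat{s}(t) = \mathbb{E}[s \mid t]
  = \sum_{k=1}^{K} h_k(t) \left( \mu_k^{s} + \Sigma_k^{st} \left(\Sigma_k^{tt}\right)^{-1} \left(t - \mu_k^{t}\right) \right),
\qquad
h_k(t) = \frac{\pi_k \, \mathcal{N}\!\left(t \mid \mu_k^{t}, \Sigma_k^{tt}\right)}
              {\sum_{j=1}^{K} \pi_j \, \mathcal{N}\!\left(t \mid \mu_j^{t}, \Sigma_j^{tt}\right)}
```

Each component contributes its own conditional mean, weighted by $h_k(t)$, i.e. by how strongly that component explains the input $t$.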


• Input trajectories from human demonstrations


• Trajectories are modeled as GMMs

The trajectory p(s,t) is encoded using a GMM, which is a continuous model.


• Trajectories are modeled as GMMs

GMR is used to retrieve the trajectory from p(s,t), namely the expected position E[s | t] at each time step.
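A minimal sketch of GMR in the 1-D case, assuming time t is the only input, a single coordinate s is the output, and the GMM parameters have already been estimated (for example with EM). The function `gmr_1d` and the example parameters are hypothetical; a real trajectory usually has several output dimensions, which turns the scalar ratio `s_st / s_tt` into a matrix product.

```python
import math


def gmr_1d(t, priors, means, covs):
    """Gaussian Mixture Regression for one input t and one output s.

    priors: mixture weights pi_k
    means:  (mu_t, mu_s) for each component
    covs:   2x2 covariance [[s_tt, s_ts], [s_st, s_ss]] for each component
    Returns the conditional expectation E[s | t].
    """
    # Activation of each component for input t: pi_k * N(t; mu_t, s_tt).
    activations = [
        pi_k * math.exp(-0.5 * (t - mu_t) ** 2 / cov[0][0]) / math.sqrt(2.0 * math.pi * cov[0][0])
        for pi_k, (mu_t, _), cov in zip(priors, means, covs)
    ]
    total = sum(activations)
    if total == 0.0:
        raise ValueError("t is too far from every component (all activations underflowed)")
    # Blend each component's conditional mean mu_s + s_st / s_tt * (t - mu_t),
    # weighted by the normalized activation h_k(t) = activation / total.
    return sum(
        (a / total) * (mu_s + cov[1][0] / cov[0][0] * (t - mu_t))
        for a, (mu_t, mu_s), cov in zip(activations, means, covs)
    )


# Hypothetical GMM over (time, position), for illustration only.
priors = [0.5, 0.5]
means = [(0.25, 0.0), (0.75, 1.0)]
covs = [[[0.02, 0.01], [0.01, 0.05]], [[0.02, -0.01], [-0.01, 0.05]]]
trajectory = [gmr_1d(i / 10, priors, means, covs) for i in range(11)]
print([round(s, 3) for s in trajectory])
```

Querying `gmr_1d` at every time step yields one smooth trajectory even though the GMM itself was fitted to several demonstrations.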


• Other examples

GMM is used to model the trajectory; GMR is used to retrieve the trajectory.


• Robustness to perturbation

GMM is used to model the trajectory; GMR is used to retrieve the trajectory.


Have we solved the problem?

• How do we estimate the parameters of a Gaussian or a GMM?

• How do we estimate the number of Gaussian components in a GMM?

• How do we align demonstrated trajectories that have different speeds?


Estimate parameters of a Gaussian

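Maximum-likelihood estimation (MLE): find the parameters under which the data are most likely for that model.

• Likelihood function, for N independent samples $X = \{x_1, \dots, x_N\}$ and parameters $\theta$:

$\mathcal{L}(\theta \mid X) = p(X \mid \theta) = \prod_{i=1}^{N} p(x_i \mid \theta)$

• The likelihood is thought of as a function of the parameters $\theta$, with the data $X$ fixed.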


Estimate parameters of a Gaussian

• In the maximum-likelihood problem, our goal is to find the $\theta$ that maximizes $\mathcal{L}$:

$\theta^{*} = \arg\max_{\theta} \mathcal{L}(\theta \mid X)$

• Often we maximize $\log \mathcal{L}(\theta \mid X)$ instead, because it is analytically easier: the product over samples becomes a sum, and since the logarithm is monotonic, the maximizer is unchanged.
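For a single 1-D Gaussian, setting the derivatives of the log-likelihood to zero gives closed-form estimates: the sample mean and the sample variance divided by N (not N - 1). A minimal sketch, with hypothetical function names and toy data:

```python
import math


def gaussian_mle(xs):
    """Closed-form maximum-likelihood estimates (mean, variance) for a 1-D Gaussian."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n  # MLE divides by n (biased), not n - 1
    return mu, var


def gaussian_log_likelihood(xs, mu, var):
    """log L(mu, var | X) = sum over i of log N(x_i; mu, var)."""
    return sum(-0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var) for x in xs)


data = [1.0, 2.0, 3.0, 4.0]  # toy data for illustration
mu, var = gaussian_mle(data)
print(mu, var)  # 2.5 1.25
```

Any other choice of mean or variance gives a lower value of `gaussian_log_likelihood(data, ...)`, which is exactly what the arg max above asserts.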


Estimate parameters of a Gaussian

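Does closed-form MLE work for GMMs?

• The answer is no.

• The data points do not all come from the same Gaussian component, and which component generated each point is unknown, so the likelihood cannot be maximized in closed form.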


Estimate parameters of a GMM

Expectation–maximization (EM): an iterative method to find maximum likelihood estimates of parameters in statistical models, where the model depends on unobserved latent variables.


Estimate GMM parameters using Expectation–maximization (EM)
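A minimal sketch of EM for a 1-D GMM in plain Python. The E-step computes each component's responsibility for each point (the unobserved latent variable), and the M-step re-estimates weights, means, and variances as responsibility-weighted MLEs. The quantile-based initialization, the fixed iteration count instead of a convergence test, and the small variance floor are our assumptions, not something the slides prescribe; the function name `em_gmm_1d` is hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(xs, k, iters=100, min_var=1e-6):
    """Fit a k-component 1-D GMM to xs with EM; returns (weights, means, variances)."""
    n = len(xs)
    ordered = sorted(xs)
    # Initialization (our assumption): means at evenly spaced quantiles, equal weights,
    # and every variance set to the overall data variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mu = sum(xs) / n
    variances = [max(min_var, sum((x - mu) ** 2 for x in xs) / n)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: resp[i][j] = posterior probability that component j generated xs[i].
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against underflow for extreme outliers
            resp.append([pj / total for pj in p])
        # M-step: responsibility-weighted maximum-likelihood updates.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300  # avoid division by zero for an empty component
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(min_var, sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj)
    return weights, means, variances


# Toy data for illustration: two evenly spaced clusters centred at 0 and 10.
data = [0.01 * i for i in range(-100, 101)] + [10 + 0.01 * i for i in range(-100, 101)]
print(em_gmm_1d(data, 2))
```

EM only finds a local maximum of the likelihood, which is why the initialization matters; trying several initializations and keeping the best log-likelihood is a common safeguard.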


Estimate the number of Gaussian components in a GMM

Model selection:

• Given different models (defined by different hyperparameter values), select the best model (i.e., the hyperparameter values resulting in the best performance).

• Many methods exist, based on different criteria:

  • Cross-validation

  • Others, such as the Bayesian information criterion and structural risk minimization
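As one concrete instance of the Bayesian information criterion, for a GMM with $K$ full-covariance components fitted to $N$ points in $D$ dimensions (the full-covariance assumption is ours), with $\hat{L}_K$ the maximized likelihood (e.g. from EM):

```latex
\mathrm{BIC}(K) = -2 \ln \hat{L}_K + p_K \ln N,
\qquad
p_K = K\left(1 + D + \frac{D(D+1)}{2}\right) - 1
```

Each component has one weight, $D$ mean entries, and $D(D+1)/2$ free covariance entries, minus one because the weights sum to 1. The $K$ with the lowest BIC is selected; the $p_K \ln N$ term penalizes extra components that only marginally raise the likelihood.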


Trajectory Alignment

Dynamic Time Warping (DTW): aims at aligning two sequences by warping the time axis until an optimal match between the two sequences is found.

• DTW is a time-series alignment algorithm originally developed for speech recognition.

• Consider two trajectories (sequences of data points).


Trajectory Alignment based on DTW

• The two sequences can be arranged on the sides of a grid, with one along the top and the other up the left-hand side. Both sequences start at the bottom left of the grid.

• Inside each cell, a distance measure can be placed, comparing the corresponding elements of the two sequences.

• To find the best match or alignment between these two sequences, one needs to find a path through the grid that minimizes the total distance between them.

• This shortest path can be found using dynamic programming.
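A minimal sketch of DTW with dynamic programming, following the grid description above: `D[i][j]` holds the minimum cumulative distance of aligning the first i+1 elements of one sequence with the first j+1 elements of the other, and the path is recovered by backtracking from the last cell. The absolute difference used as the default distance and the horizontal/vertical/diagonal step pattern are common choices, assumed here; other variants exist.

```python
def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping between sequences a and b.

    Returns (total cost, warping path as a list of (i, j) index pairs).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = minimum cumulative cost of aligning a[:i + 1] with b[:j + 1]
    D = [[inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            cost = dist(a[i], b[j])
            if i == 0 and j == 0:
                D[i][j] = cost
            else:
                best = min(
                    D[i - 1][j] if i > 0 else inf,  # advance in a only
                    D[i][j - 1] if j > 0 else inf,  # advance in b only
                    D[i - 1][j - 1] if i > 0 and j > 0 else inf,  # advance in both
                )
                D[i][j] = cost + best
    # Backtrack from the last cell to (0, 0), always moving to the cheapest predecessor.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((D[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((D[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((D[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return D[n - 1][m - 1], path


print(dtw([1, 3, 1, 2, 1, 1, 1], [1, 1, 3, 1, 2, 1, 1])[0])  # 0
```

The two sequences contain the same pattern shifted by one step, so DTW aligns them with zero total cost, whereas a point-by-point comparison would not.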


Trajectory Alignment based on DTW

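Sequence along the top (left to right): 1 3 1 2 1 1 1

Sequence up the left-hand side (bottom to top): 1 1 3 1 2 1 1

     |  1  3  1  2  1  1  1
  ---+---------------------
   1 |  0  8  0  3  0  0  0
   1 |  0  8  0  3  0  0  0
   2 |  3  5  3  0  3  3  3
   1 |  0  8  0  3  0  0  0
   3 |  8  0  8  5  8  8  8
   1 |  0  8  0  3  0  0  0
   1 |  0  8  0  3  0  0  0

Compute the distance matrix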
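The slide does not name its distance measure. The entries above are consistent with $d(a, b) = \lvert a^2 - b^2 \rvert$, so the sketch below assumes that measure, and it also assumes the left-hand sequence is read from the bottom of the grid upwards; both are inferences from the numbers, not statements from the slide. With the more common $\lvert a - b \rvert$ the non-zero entries would differ, but the zero entries, and hence the zero-cost alignment, would be the same.

```python
def abs_square_diff(a, b):
    """Assumed distance |a^2 - b^2|, inferred from the slide's numbers (not stated there)."""
    return abs(a * a - b * b)


def distance_matrix(top, left_bottom_up, dist):
    """Grid of dist(a, b), one row per element of the left-hand sequence.

    The left-hand sequence is given bottom to top, so it is reversed to list rows top to bottom.
    """
    return [[dist(a, b) for a in top] for b in reversed(left_bottom_up)]


top = [1, 3, 1, 2, 1, 1, 1]
left = [1, 1, 3, 1, 2, 1, 1]
for row in distance_matrix(top, left, abs_square_diff):
    print(" ".join(str(v) for v in row))
```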


Trajectory Alignment based on DTW

(Figure: the same distance matrix, used to find the minimum-cost warping path between the sequences 1 3 1 2 1 1 1 and 1 1 3 1 2 1 1.)


Have we solved the problem?

• How do we estimate the parameters of a Gaussian or a GMM?

• How do we estimate the number of Gaussian components in a GMM?

• How do we align demonstrated trajectories that have different speeds?

Other general data-related issues also exist in PbD, for example, the curse of dimensionality.
