Function Approximation for Imitation Learning in Humanoid Robots
Rajesh P. N. Rao
Dept of Computer Science and Engineering
University of Washington, Seattle
neural.cs.washington.edu
Students: Rawichote Chalodhorn, David Grimes
Funding: ONR, NSF, Packard Foundation
The Problem: Robotic Imitation of Human Actions
Teacher (David Grimes)
HOAP-2 Humanoid Robot (Morpheus, or "Mo")
Example of Motion Capture Data
[Side-by-side videos: Motion Capture Sequence | Attempted Imitation]
Goals
Learn from only observations of teacher states
Expert does not control the robot
Also called "implicit imitation" (Price & Boutilier, 1999)
Similar to how humans learn from imitation
Avoid hand-coded physics-based models
Learn dynamics in terms of sensory consequences of executed actions
Use teacher demonstration to restrict the search space of feasible actions
Step 1: Kinematic Mapping
Need to solve the "correspondence problem"
Solved by assuming markers are on a scaled version of the robot body
Standard inverse kinematics recovers joint angles for the motion (a minimal example follows below)
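To make the kinematic-mapping step concrete, here is a minimal Python sketch: a closed-form inverse-kinematics solve for a two-link planar limb applied to scaled marker positions. The two-link arm, link lengths, and scale factor are illustrative assumptions, not the actual HOAP-2 model or marker setup.

```python
import numpy as np

def two_link_ik(x, y, l1=0.15, l2=0.15):
    """Closed-form IK for a planar 2-DOF limb: marker position -> joint angles."""
    r2 = x**2 + y**2
    # Elbow angle from the law of cosines (clip guards against unreachable targets).
    cos_elbow = np.clip((r2 - l1**2 - l2**2) / (2 * l1 * l2), -1.0, 1.0)
    theta2 = np.arccos(cos_elbow)
    # Shoulder angle: direction to target minus the elbow's contribution.
    theta1 = np.arctan2(y, x) - np.arctan2(l2 * np.sin(theta2),
                                           l1 + l2 * np.cos(theta2))
    return theta1, theta2

# Scale a (stand-in) human marker trajectory onto the robot body,
# then solve IK frame by frame to recover a joint-angle trajectory.
human_markers = np.random.rand(100, 2)   # placeholder for motion capture data
scale = 0.4                              # assumed human-to-robot limb ratio
joint_angles = np.array([two_link_ik(*(scale * m)) for m in human_markers])
```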
Step 2: Dimensionality Reduction
Humanoid robots have a large number of DOF, making action optimization intractable
HOAP-2 has 25 DOF
Fortunately, most actions are highly redundant
Can use dimensionality reduction techniques (e.g., PCA) to represent states and actions (sketched below)
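A minimal sketch of this step using scikit-learn's PCA. The 25-DOF dimension matches HOAP-2, but the random stand-in data and the 3-D latent size are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for a recorded motion: 500 frames of 25 joint angles (HOAP-2 has 25 DOF).
poses = np.random.randn(500, 25)

# Most full-body actions are highly redundant, so a handful of principal
# components ("eigenposes") captures most of the variance.
pca = PCA(n_components=3)
latent = pca.fit_transform(poses)             # 500 x 3 latent-space trajectory
print(pca.explained_variance_ratio_.sum())    # fraction of variance retained

# Actions can be optimized in the 3-D latent space and mapped back to
# full 25-DOF joint commands for execution on the robot.
reconstructed = pca.inverse_transform(latent)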
Step 3: Learning Forward Models using Function Approximation
Basic Idea:
1. Learn a forward model in the neighborhood of the teacher demonstration
Use function approximation techniques to map actions to observed sensory consequences
2. Use the learned model to infer stable actions for imitation
3. Iterate between 1 and 2 for higher accuracy (see the toy sketch below)
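A toy, self-contained sketch of this learn/act iteration. The real system uses the RBF-network and Gaussian-process models of the following slides on a physical humanoid; the 1-D "robot" dynamics, linear fit, and trust region here are purely illustrative assumptions.

```python
import numpy as np

def robot(action):
    """Stand-in for unknown robot dynamics observed only through its sensors."""
    return 0.8 * action + 0.1 * np.random.randn(*action.shape)

teacher_actions = np.linspace(-1.0, 1.0, 50)   # demonstration to stay near
actions = teacher_actions.copy()
for iteration in range(3):
    # 1. Execute near the demonstration and record sensory consequences.
    outcomes = robot(actions)
    # 2. Fit a local forward model (here, a least-squares linear fit).
    w = np.polyfit(actions, outcomes, deg=1)   # [slope, intercept]
    # 3. Infer actions predicted to yield stable outcomes (target = 0),
    #    restricted to a small region around the teacher trajectory.
    proposed = (0.0 - w[1]) / w[0]
    actions = np.clip(proposed, teacher_actions - 0.2, teacher_actions + 0.2)
```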
Approach 1: RBF Networks for Deterministic Action Selection
A Radial Basis Function (RBF) network is used to learn the n-th order Markov forward model:
$\hat{s}_{t+1} = F(s_t, \ldots, s_{t-n+1}, a_t, \ldots, a_{t-n+1})$
$s_t$ is the sensory state vector, e.g., $s_t = \omega_t$ (3D gyroscope signal)
$a_t$ is the action vector in latent space, e.g., servo joint angle commands in the latent subspace
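A minimal numpy sketch of RBF-network regression of this kind: Gaussian basis functions centered on a subset of training inputs, with output weights fit by ridge regression. The input/output shapes, data, and hyperparameters are placeholders, not the actual model from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))   # e.g., stacked [s_t, s_{t-1}, a_t, a_{t-1}]
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)   # stand-in target s_{t+1}

# RBF features: Gaussian bumps centered on a random subset of the inputs.
centers = X[rng.choice(len(X), size=30, replace=False)]
width = 1.0

def rbf_features(inputs):
    d2 = ((inputs[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width**2))

# Fit output weights by ridge regression (a standard way to train RBF nets).
Phi = rbf_features(X)
w = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(Phi.shape[1]), Phi.T @ y)

def forward_model(inputs):
    """Predicted next sensory state, i.e., an approximation of F(...)."""
    return rbf_features(inputs) @ w
```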
Action Selection using the Learned Function
Select the optimal action for the next time step t:
$a_t^* = \arg\min_{a_t} \Phi\big(F(s_t, \ldots, s_{t-n+1}, a_t, \ldots, a_{t-n+1})\big)$
$\Phi$ measures torso stability based on the predicted gyroscope signals:
$\Phi(\omega) = \omega_x^2 + \omega_y^2 + \omega_z^2$
The search for the optimal action $a_t^*$ is limited to a local region around the teacher trajectory in the latent subspace
(Chalodhorn et al., Humanoids, 2005; IJCAI 2007; IROS, 2009)
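A sketch of this action-selection step, with a placeholder forward model standing in for the learned RBF network: sample candidate actions in a small region around the teacher's action and keep the one whose predicted gyroscope signal minimizes $\Phi$. The region radius, candidate count, and 3-D action dimension are illustrative assumptions.

```python
import numpy as np

def forward_model(state_history, action):
    """Placeholder for the learned model; returns predicted gyro (wx, wy, wz)."""
    return 0.5 * action + 0.1 * state_history[-1]

def stability_cost(omega):
    # Phi(omega) = wx^2 + wy^2 + wz^2: smaller means a steadier torso.
    return np.sum(omega**2)

def select_action(state_history, teacher_action, radius=0.1, n_candidates=200):
    rng = np.random.default_rng(0)
    # Candidates restricted to a local region around the teacher trajectory.
    candidates = teacher_action + radius * rng.uniform(-1, 1, (n_candidates, 3))
    costs = [stability_cost(forward_model(state_history, a)) for a in candidates]
    return candidates[int(np.argmin(costs))]

a_star = select_action(state_history=np.zeros((2, 3)),
                       teacher_action=np.array([0.2, 0.0, -0.1]))
```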
Example: Learning to Walk
[Videos: human motion capture data | unoptimized (kinematic) imitation]
Example: Learning to Walk
Motion scaling: take baby steps first (literally!)
Final Result
(Chalodhorn et al., IJCAI 2007)
Approach 2: Gaussian Processes for Probabilistic Action Selection
Dynamic Bayesian Network (DBN) for Imitation
[Slice at time t]
O_t are observations of the states S_t
S_t = low-dimensional joint space, gyro, and foot pressure readings
C_t are constraints on states (e.g., gyroscope values near zero)
(Grimes et al., RSS 2006; NIPS 2007; IROS 2007; IROS 2008)
DBN for Imitative Learning
Gaussian Process-based forward model (input $[s_{t-1}, a_t]$), with one GP per state dimension $i$:
$P(s_t^{(i)} \mid s_{t-1}, a_t) = \mathcal{N}\big(s_t^{(i)};\, \mu_*^{(i)}, \sigma_*^{(i)}\big)$
where $\mu_*^{(i)}$ and $\sigma_*^{(i)}$ are the GP predictive mean and variance
(Grimes, Chalodhorn, & Rao, RSS 2006)
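A minimal sketch of such a per-dimension GP forward model using scikit-learn; the input layout $[s_{t-1}, a_t]$, the stand-in data, and the kernel choice are assumptions for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.standard_normal((150, 5))    # rows are stand-ins for [s_{t-1}, a_t]
y = np.tanh(X[:, 0] + X[:, 3]) + 0.05 * rng.standard_normal(150)  # s_t^(i)

# Squared-exponential kernel plus a noise term; one such GP per state dimension.
gp = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(1e-2),
                              normalize_y=True).fit(X, y)

# The GP yields both a predictive mean and an uncertainty, which is what
# makes the probabilistic action inference on the next slide possible.
mu_star, sigma_star = gp.predict(rng.standard_normal((1, 5)), return_std=True)
```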
Action Inference using Nonparametric Belief Propagation
[DBN figure: evidence shown as blue nodes; actions chosen as the maximum marginal posterior]
Summary of Approach
Learning and action inference are interleaved to yield progressively more accurate forward models and actions
Human Action Imitation
Result after Learning
(Grimes, Rashid, & Rao, NIPS 2007)
From Planning to Policy Learning
Behaviors shown in the previous slides were open-loop, based on planning by inference
Can we learn closed-loop "reactive" behaviors?
Idea: Learn state-to-action mappings ("policies") based on the final optimized output of the planner and the resulting sensory measurements
Policy Learning using Gaussian Processes
For a parameterized task T(θ), watch demonstrations for particular values of θ
E.g., teacher lifting objects of different weight
The parameter θ is not given but is intrinsically encoded in the sensory measurements
Use inference-based planning to infer stable actions a_t and states s_t for the demonstrated values of θ
Learn a Gaussian process policy mapping s_t to a_t from the resulting pairs {s_t, a_t} (sketched below)
(Grimes & Rao, IROS 2008)
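A minimal sketch of distilling the planner's output into a reactive policy with scikit-learn's GP regressor, using random stand-in pairs in place of the planner's optimized {s_t, a_t}; the state/action dimensions and kernel are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
states = rng.standard_normal((300, 4))   # stand-in s_t (latent pose + sensors)
actions = np.sin(states[:, :2]) + 0.05 * rng.standard_normal((300, 2))  # a_t

# Fit a GP from states to the planner's optimized actions.
policy = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(1e-2),
                                  normalize_y=True).fit(states, actions)

# At run time the robot simply evaluates the policy on its current sensed
# state, so the task parameter (e.g., object weight) is never given explicitly.
a_t = policy.predict(rng.standard_normal((1, 4)))
```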
Example: Learning to Lift Objects of Different Weights
Summary and Conclusions
Stable full-body human imitation in a humanoid robot may be achievable without a physics-based model
Function approximation techniques play a crucial role in learning a forward model and in action inference
RBF networks, Gaussian processes
Function approximation is also used to learn policies for reactive behavior
Dimensionality reduction using PCA (via "eigenposes") helps keep learning and inference tractable
Challenges: scaling up to a large number of actions, smooth transitions between actions, hierarchical control