Life-long Learning in Sociable Agents:
A Hierarchical Reinforcement Learning Approach
Professor Andrea Thomaz
Peng Zhou
Sociable Agents
What are sociable agents?
Essentially, agents that must interact with humans in a social manner
Why sociable agents?
Major Issues
Natural language processing
Required for talking systems
Activity recognition
Not just in the real world
User interface
Agent-human communication, non-linguistic
Life-long learning
Teach, explore, revise
The role of emotions
Not just fluff
My Focus (for the moment)
How to build persistent agents that accumulate concepts and skills "opportunistically" from their environment
The environment includes humans (usually non-expert)
Socially guided learning
Background: Teaching Agents Through Social Interaction
Human input is a long-standing topic in machine learning (e.g., supervised learning, learning by demonstration)
Many existing techniques for "teaching" the robot
Psychological benefits
Ease of use ("how humans want to teach"), increased believability, personal investment
Previous Work: Sophie's Kitchen
Reinforcement learning; domain of ~1000 states
Autonomous exploration
Human input: guidance and state rewards (see the sketch below)
Communication channel: gazing, explicit actions
Conducted user studies
Results:
Improved learning speed
Insight into how humans like to teach
Fun for the human
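A minimal sketch (in Python) of how the human guidance and state-reward channels described above could plug into an epsilon-greedy learner. The function names, the additive reward combination, and the way guidance restricts the candidate action set are illustrative assumptions, not the original Sophie's Kitchen implementation.

```python
import random

def choose_action(Q, state, actions, guidance=None, epsilon=0.1):
    """Epsilon-greedy selection; if the human has suggested actions (guidance),
    restrict both exploration and exploitation to that suggested set."""
    candidates = guidance if guidance else actions
    if random.random() < epsilon:
        return random.choice(candidates)
    return max(candidates, key=lambda a: Q.get((state, a), 0.0))

def combined_reward(env_reward, human_reward):
    """Fold the human's state reward into the scalar reward signal.
    A simple sum is an illustrative choice, not the original scheme."""
    return env_reward + human_reward
```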
Reinforcement Learning
Basic idea: finding an optimal policy
Act in the environment, receive rewards, and modify the policy accordingly
Typical formulation: an MDP defined by (S, A, R, T) (a minimal sketch follows below)
Advantages:
Desirable statistical properties
Unsupervised, autonomous learning
Limitations:
The curse of scale
Poor transfer of knowledge
Rewards can be hard to define
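To make the (S, A, R, T) formulation concrete, here is a minimal tabular Q-learning sketch. The environment interface (reset/step), state encoding, and hyperparameter values are assumptions for illustration; this is not the specific learner used in Sophie's Kitchen.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning for an MDP (S, A, R, T).
# `env` (with reset()/step()) and the hyperparameters are illustrative assumptions.
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

def q_learning(env, actions, episodes=500):
    Q = defaultdict(float)  # Q[(state, action)] -> value estimate
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy selection over the primitive action set A
            if random.random() < EPSILON:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)  # samples T and R
            # one-step temporal-difference update
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
            state = next_state
    return Q
```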
Hierarchical Reinforcement Learning
Tackles the scaling and transfer problems
May more closely resemble human cognitive processes and therefore match humans' expectations of the agent: "I'm trying to teach you how to open doors, darn it!"
Two main components:
Hierarchical task structure
State abstraction
Learning the hierarchy (as opposed to handcrafting it)
U-trees, HEXQ, diverse density approaches, …
My Approach: Extend Sophie's Kitchen to HRL
Basic idea behind Sophie's Kitchen: unsupervised learning is great, but if non-expert supervision is available, why not make use of it?
Humans typically have insights into the domain
HRL could make very good use of those structures
Challenges in extending this to HRL:
Adapting non-expert, ambiguous input
Modifying existing HRL algorithms to use the adapted input
Skill reuse and retention, evaluation of human suggestions, improvement through practice, personality and trust issues
Current Research Status
Extended the Sophie's Kitchen domain to a tool-use grid-world domain: Sophie's Adventure
Basic features:
Navigation
Tool use
Hierarchical structure
Transferable skills
Large number of states
Current Research Status
Options
Sutton, Precup, and Singh (1999)
An HRL method that addresses hierarchical task structure
Temporally extended actions consisting of (I, π, β), where the initiation set I is a subset of S, π is a local policy, and β is a termination condition mapping states in S to [0, 1] (see the sketch below)
Learning options is a natural extension of RL
Primitive actions can be thought of as one-step options; the options framework preserves optimality when options augment the primitive action set
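A sketch of how an option (I, π, β) can be represented and how an SMDP-style Q-learning update over options looks, reusing the dictionary-based Q-table and hypothetical environment interface from the earlier sketch; the class and function names are illustrative, not from the presentation.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Set

State, Action = Any, Any  # opaque, hashable state/action types for this sketch

@dataclass(eq=False)  # identity-based hashing so options can key a Q-table
class Option:
    initiation_set: Set[State]             # I: states where the option may be invoked
    policy: Callable[[State], Action]      # pi: local policy followed inside the option
    termination: Callable[[State], float]  # beta: probability of terminating in a state

def run_option(env, state, option, gamma=0.95):
    """Execute an option until it terminates; return (discounted reward, steps, final state)."""
    total, discount, steps = 0.0, 1.0, 0
    while True:
        state, reward, done = env.step(option.policy(state))
        total += discount * reward
        discount *= gamma
        steps += 1
        if done or random.random() < option.termination(state):
            return total, steps, state

def smdp_q_update(Q, s, o, reward, k, s_next, options, alpha=0.1, gamma=0.95):
    """SMDP Q-learning: `reward` is the discounted return accumulated over the
    option's k steps, so the bootstrap term is discounted by gamma ** k."""
    best_next = max(Q.get((s_next, o2), 0.0) for o2 in options)
    Q[(s, o)] = Q.get((s, o), 0.0) + alpha * (reward + gamma ** k * best_next - Q.get((s, o), 0.0))
```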
Current Research Status
Learning Options
Feature-based "clapping" reward channel
Multi-step guidance
Intra-option learning (see the sketch below)
Keep track of successes and failures
Practice when the user is not around
Aggregate similar options
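A minimal sketch of the intra-option Q-learning update referenced above: a single primitive transition updates every option whose internal policy agrees with the action taken. It reuses the hypothetical Option interface from the previous sketch.

```python
def intra_option_update(Q, s, a, reward, s_next, options, alpha=0.1, gamma=0.95):
    """Intra-option Q-learning: one primitive transition (s, a, reward, s') updates
    every option whose internal policy would have chosen `a` in state `s`."""
    for o in options:
        if o.policy(s) != a:
            continue  # transition is inconsistent with this option's policy
        beta = o.termination(s_next)
        # U(s', o): keep following o with prob (1 - beta), else switch to the best option
        u = (1.0 - beta) * Q.get((s_next, o), 0.0) + beta * max(
            Q.get((s_next, o2), 0.0) for o2 in options)
        Q[(s, o)] = Q.get((s, o), 0.0) + alpha * (reward + gamma * u - Q.get((s, o), 0.0))
```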
In Progress
Formalize reward types:
State rewards: "doing good"
Object-specific rewards: "look at this…"
Special rewards: "that's the way to do it"
Extracting state abstractions from rewards
Object-specific reward -> make the object a feature? (a speculative sketch follows below)
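A speculative sketch of how the three reward types might be dispatched, and how an object-specific reward could promote the rewarded object into the agent's feature set. The message fields, feature-set representation, and option statistics are purely hypothetical; the actual formalization is still in progress per the slide above.

```python
def handle_human_reward(msg, relevant_features, option_stats, current_option):
    """Dispatch on a hypothetical reward-message format {"kind", "object", "value"}."""
    if msg["kind"] == "state":
        # "doing good": ordinary scalar reward on the current state
        return msg["value"]
    if msg["kind"] == "object":
        # "look at this...": promote the rewarded object into the state abstraction
        relevant_features.add(msg["object"])
        return msg["value"]
    if msg["kind"] == "special":
        # "that's the way to do it": credit the option that just completed
        option_stats[current_option]["successes"] += 1
        return msg["value"]
    return 0.0
```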
Planned Future Work
Option-level state abstraction (MAXQ, HAM, etc.)
Learning option-level state abstraction
U-trees
Involving human input, e.g., pointing out salient features of the environment
The "trust" issue: extending the user evaluation process to formulate "trust" in certain users
Planned Future Work
Actual transfer-learning experiments, and exploring how humans could facilitate the process
Carry out user studies on the system
Agent transparency in HRL: how to communicate internal state to the human
Ambiguous user signals: should the agent ask for clarification?
Conclusion
Sociable agents are, or will be, ubiquitous
These agents should be able to learn from humans
Socially guided learning can both improve learning speed and "personalize" the agent
Higher-order learning likely necessary for realistic applications
Interesting inquiry into our own social expectations and desires