A Learning-based Control Architecture for Socially Assistive Robots Providing Cognitive Interventions
by
Jeanie Chan
A thesis submitted in conformity with the requirements for the degree of Master of Applied Science
Mechanical and Industrial Engineering University of Toronto
© Copyright by Jeanie Chan 2011
A Learning-based Control Architecture for Socially Assistive
Robots Providing Cognitive Interventions
Jeanie Chan
Master of Applied Science
Mechanical and Industrial Engineering
University of Toronto
2011
Abstract
Due to the world’s rapidly growing elderly population, dementia is becoming increasingly
prevalent. This poses considerable health, social, and economic concerns as it impacts
individuals, families and healthcare systems. Current research has shown that cognitive
interventions may slow the decline of or improve brain functioning in older adults. This research
investigates the use of intelligent socially assistive robots to engage individuals in person-
centered cognitively stimulating activities. Specifically, in this thesis, a novel learning-based
control architecture is developed to enable socially assistive robots to act as social motivators
during an activity. A hierarchical reinforcement learning approach is used in the architecture so
that the robot can learn appropriate assistive behaviours based on activity structure and
personalize an interaction based on the individual’s behaviour and user state. Experiments show
that the control architecture is effective in determining the robot’s optimal assistive behaviours
for a memory game interaction and a meal assistance scenario.
Acknowledgments
I would like to thank my advisor, Professor Goldie Nejat, for her guidance and support of my
research work, and my M.A.Sc. thesis committee for their time and input. I would also like to
thank the following undergraduate students for their valuable contributions to this project: Nelson
Tran, Bijan Shahriari, Jingcong Chen, Greg Jhin, Howard Tseng, John Adler, Sean Feng, Kelly
Payette, Andy Tseung, Clarence Leung, Ray Zhao, Manav Agarwal, Amy Do, and Adib Saad.
Lastly, I would like to thank all of my lab mates, friends, and family for their motivation,
encouragement, and support during the two-year period of my research.
Table of Contents
Acknowledgments .......................................................................................................................... iii
Table of Contents ........................................................................................................................... iv
List of Tables ................................................................................................................................. vi
List of Figures ............................................................................................................................... vii
Chapter 1 Introduction .................................................................................................................... 1
1.1 Motivation ........................................................................................................................... 1
1.1.1 Cognitive Interventions ........................................................................................... 1
1.2 Robots as Assistive Aids for Cognitively Impaired Persons .............................................. 4
1.2.1 Social Robots as Therapeutic Aids in Cognitive Interventions .............................. 4
1.2.2 Activity Guidance Systems for Cognitively Impaired Persons .............................. 5
1.2.3 The Socially Assistive Robot Brian 2.0 .................................................................. 6
1.3 Problem Definition .............................................................................................................. 7
1.4 Proposed Methodology and Tasks ...................................................................................... 7
1.4.1 Literature Review .................................................................................................... 7
1.4.2 Design of Control Architecture for Socially Assistive Robots ............................... 8
1.4.3 Learning-based Decision Making for the Behaviour Deliberation Module ........... 8
1.4.4 Implementation ....................................................................................................... 8
1.4.5 Conclusion .............................................................................................................. 8
Chapter 2 Literature Review ........................................................................................................... 9
2.1 Development of a Socially Assistive Robot for HRI .......................................................... 9
2.1.1 Learning Strategies for Socially Intelligent Robots ................................................ 9
2.1.2 Strategies for Addressing Uncertainty in Social HRI ........................................... 11
Chapter 3 Design of Control Architecture for Socially Assistive Robots .................................... 13
3.1 Proposed HRI Control Architecture .................................................................................. 13
3.1.1 Methods of Inter-module Communication ............................................................ 14
3.1.2 Addressing Uncertainty ........................................................................................ 15
3.2 Memory Game Scenario ................................................................................................... 15
3.2.1 The Card Game of Memory .................................................................................. 17
3.2.2 Control Architecture for the Memory Game Scenario ......................................... 17
3.3 Meal-time Scenario ........................................................................................................... 28
3.3.1 Control Architecture for Meal-assistance Robot .................................................. 30
Chapter 4 Learning-based Decision Making for the Behaviour Deliberation Module ................. 42
4.1 Behaviour Deliberation Module ....................................................................................... 42
4.2 Model of HRI Scenario ..................................................................................................... 42
4.3 MAXQ Hierarchical Reinforcement Learning ................................................................. 43
4.3.1 Task Decomposition ............................................................................................. 43
4.3.2 Value Function Decomposition ............................................................................ 44
4.3.3 MAXQ Learning Algorithm ................................................................................. 44
4.4 Memory Game Scenario ................................................................................................... 45
4.4.1 Knowledge Clarification Layer ............................................................................. 45
4.4.2 Intelligence Layer ................................................................................................. 45
4.5 Meal-time Scenario ........................................................................................................... 54
4.5.1 Knowledge Clarification Layer ............................................................................. 54
4.5.2 Intelligence Layer ................................................................................................. 55
Chapter 5 Experiments .................................................................................................................. 64
5.1 Memory Game Scenario ................................................................................................... 64
5.1.1 Performance Assessment: Control Architecture ................................................... 64
5.1.2 HRI Study: Activity Engagement ......................................................................... 67
5.1.3 Performance Assessment: Learning-based Decision Making ............................... 70
5.1.4 HRI Study: Minimizing Task-Induced Stress ....................................................... 76
5.2 Meal-time Scenario ........................................................................................................... 80
5.2.1 Performance Assessment ...................................................................................... 80
5.2.2 Human-Robot Interaction Studies ......................................................................... 91
Chapter 6 Conclusion .................................................................................................................... 97
6.1 Summary of Contributions ................................................................................................ 97
6.1.1 Control Architecture for Socially Assistive Robots .............................................. 97
6.1.2 Learning-based Robot Assistive Behaviours ........................................................ 97
6.1.3 Metrics Explored for the Evaluation of HRI ......................................................... 98
6.2 Discussion of Future Work ............................................................................................... 98
6.3 Final Concluding Statement .............................................................................................. 99
References ................................................................................................................................... 100
Appendix ..................................................................................................................................... 106
A.1 List of My Publications ....................................................................................... 106
List of Tables
Table 1: Background Subtraction Method .................................................................................... 23
Table 2: List of Recognized Questions for the Memory Game Scenario ..................................... 25
Table 3: Task-based User States ................................................................................................... 26
Table 4: Robot Emotional State for Memory Game Scenario ...................................................... 27
Table 5: Facial Action Units for Angry [84][86] .......................................................................... 36
Table 6: Activity State Parameters for the Meal-time Scenario ................................................... 38
Table 7: Robot Emotional State for Meal-assistance Robot ......................................................... 41
Table 8: Examples of Primitive Robot Actions for Memory Game ............................................. 46
Table 9: State Functions for Memory Game Scenario .................................................................. 47
Table 10: Bi-gram User Simulation Model (Memory Game Scenario) ....................................... 50
Table 11: Speech Recognition Rates ............................................................................................ 50
Table 12: Card Identity Detection Rates ....................................................................................... 51
Table 13: Detection Rates for the Number of Cards Flipped Over .............................................. 51
Table 14: Task Termination Conditions for Meal-time Scenario ................................................. 56
Table 15: State Functions for Mealtime Scenario ......................................................................... 57
Table 16: Examples of Primitive Actions for Mealtime Scenario ................................................ 58
Table 17: Human User Model for Meal-time Scenario ................................................................ 60
Table 18: Sensor Error Model (Meal-time Scenario) ................................................................... 61
Table 19: Activity State Identification Results (Memory Game Scenario) .................................. 66
Table 20: Robot Emotion-based Behaviour Selection and Execution Results ............................. 66
Table 21: Engagement Results (Memory Game Scenario) ........................................................... 69
Table 22: Robot Behaviours Effective at Relieving Stress ........................................................... 80
Table 23: Performance Results for Activity State Module (Meal-time Scenario): Sensitivity .... 83
Table 24: Performance Results for Activity State Module (Meal-time Scenario): Specificity .... 83
Table 25: User State Module Performance (Meal-time Scenario): Sensitivity ............................ 84
Table 26: User State Module Performance (Meal-time Scenario): Specificity ............................ 84
Table 27: Performance of Behaviour Deliberation Module (Meal-time Scenario) ...................... 84
Table 28: Construct Definitions [94] ............................................................................................ 92
Table 29: Users’ Acceptance Questionnaire [94] ......................................................................... 93
Table 30: Users’ Acceptance Results ........................................................................................... 95
Table 31: Robot Behaviours Effective at Engaging the Person in the Meal-time Scenario ......... 96
Table 32: Most Liked Robot Characteristics ................................................................................ 96
List of Figures
Figure 1: Brian 2.0 .......................................................................................................................... 7
Figure 2: Control Architecture for a Socially Assistive Robot ..................................................... 13
Figure 3: Brian 2.0 in a Memory Game Scenario ......................................................................... 16
Figure 4: Sensory System for the Memory Game Scenario ......................................................... 18
Figure 5: Keypoint Identification for the Memory Game ............................................................. 20
Figure 6: Card Cluster Identification: a square is drawn symmetrically around keypoint pij (red
dot within the square) and expands in the directions denoted by the arrows. .............................. 21
Figure 7: Card Recovery: The picture card in the game (right) is matched to the database card
(left). (a) The picture card in the game is upright, (b) the picture card in the game is rotated, and
(c) the picture card in the game is partially obstructed by a person’s fingers. ............................. 22
Figure 8: Division of camera image into a 4x4 grid ..................................................................... 23
Figure 9: Brian 2.0 in a (a) happy state, (b) neutral state, and (c) sad state. ................................. 27
Figure 10: Meal-assistance Robot ................................................................................................. 30
Figure 11: Sensory System for the Meal-assistance Robot .......................................................... 31
Figure 12: Activity Sensing System (Note: load cells are under the side dish and cup) .............. 32
Figure 13: Meal Tray Sensing Platform Schematic (Courtesy of Amy Do) ................................. 33
Figure 14: Clip-on Device for Utensil Position Sensing .............................................................. 33
Figure 15: Horizontal face orientation: (a) facing right, (b) facing center, and (c) facing left. .... 36
Figure 16: Vertical face orientations: (a) facing down, (b) facing level, and (c) facing up. ........ 36
Figure 17: Two regions of interest for eyebrow sensing .............................................................. 37
Figure 18: Detection of the eyebrow and its slope: (a) neutral face and (b) angry expression..... 37
Figure 19: Hierarchical task graph for the memory game scenario (primitive robot actions on the
bottom row are defined in Table 8). .............................................................................................. 46
Figure 20: Comparison of MAXQ and flat Q-learning for the Memory Game Scenario ............. 53
Figure 21: Task Decomposition for Meal-time Scenario ............................................................. 55
Figure 22: Flowchart of Human Actions for Meal-time Scenario ................................................ 59
Figure 23: MAXQ vs. Flat-Q Comparison for Meal-time Scenario ............................................. 62
Figure 24: Experimental Set-up for Performance Assessment (Memory Game Scenario) .......... 65
Figure 25: (a) Baseline Scenario and (b) Robot Interaction Scenario .......................................... 68
Figure 26: Examples of robot behaviour during interactions: (a) Robot providing celebration in a
happy emotional state after a correct match, (b) Robot providing help in a neutral state, and (c)
Robot providing instruction in a sad state when game disengagement occurs. ............................ 70
Figure 27: Experimental Setup for Evaluation of Learning-based Decision Making .................. 71
Figure 28: Participant user states detected during the memory game .......................................... 73
Figure 29: Interaction details for all participants .......................................................................... 74
Figure 30: Rewards for the Flip Over subtask .............................................................................. 75
Figure 31: Reward for Help subtask ............................................................................................. 76
Figure 32: (a) Baseline Scenario and (b) HRI Scenario ............................................................... 77
Figure 33: The percentage of the interaction that a participant is stressed ................................... 78
Figure 34: Comparison of the percentage of the interaction that a participant is in a stressed or
positive state during the HRI scenario .......................................................................................... 79
Figure 35: Examples of robot behaviours during interactions: (a) Robot providing help in a
neutral emotional state, (b) Robot providing celebration in a happy state after a correct match,
and (c) Robot providing instruction in a sad state when game disengagement occurs. ................ 79
Figure 36: Experimental Setup for Performance Assessment (Meal-time Scenario) ................... 82
Figure 37: Details of interactions involving the main dish ........................................................... 86
Figure 38: Details of interactions involving the beverage ............................................................ 87
Figure 39: Details of interactions involving the side dish ............................................................ 88
Figure 40: Rewards for the Obtain food from main dish subtask ................................................. 89
Figure 41: Rewards for the Pick up beverage subtask .................................................................. 89
Figure 42: Rewards for the Obtain food from side dish subtask ................................................... 90
Figure 43: Rewards for the Eat food subtask ................................................................................ 90
Figure 44: Rewards for the Drink beverage subtask .................................................................... 90
Figure 45: (a) Baseline Scenario A and (b) Scenario B for HRI Study (Meal-time Scenario) ..... 91
Chapter 1 Introduction
1.1 Motivation
Cognitive impairment progressively diminishes a person’s memory, orientation, verbal skills,
visuospatial ability, abstract reasoning and attentional skills [1]. Impairment can range from mild
to severe and is associated with many disorders and disabilities that are either present at birth or
acquired later in life, e.g., as a result of an illness or accident. Common causes include brain
injury, autism spectrum disorder, learning disorders, dementia, and substance dependencies. In particular, the risk of developing cognitive impairment increases with age, and due to the world's rapidly growing elderly population, dementia is becoming progressively more prevalent worldwide. In 2010, an estimated 35.6 million people were living with dementia, and by 2050 an estimated 115.4 million people worldwide are predicted to suffer from cognitive impairment related to the condition [2]. The rapid increase
in people suffering from dementia poses considerable medical, social, and economic concerns as
it impacts individuals, families, and healthcare systems. The total worldwide cost of dementia in
2010 was estimated to be $604 billion USD, which includes the direct costs of medical and
social care in addition to the costs of informal care provided by family members [2].
With dementia, the ability to independently initiate and perform daily activities can be
compromised as specific cognitive abilities such as activity planning, problem solving, self-
initiation, attention, and memory can all be severely affected [3]. If a person is incapable of
performing these activities, assistance from other people and/or mechanical devices may be
necessary. The prevalence rate for elderly persons who have difficulty performing basic daily tasks increases significantly with advancing age and is especially high for persons aged 85 and over [4]. There is no cure for dementia, but there is hope that the use of pharmacological
interventions as well as cognitive, social, and physical interventions may help reduce the decline
of or improve brain functioning in people suffering from this condition.
1.1.1 Cognitive Interventions
Currently, a growing body of research supports the effectiveness of using non-pharmacological
interventions to reduce the decline of or improve brain functioning in people suffering from
dementia. Based on recent human studies, sustained engagement in cognitively stimulating
activities has been found to impact the neural structure in humans [5]-[6].
There is a considerable amount of literature documenting the positive effects of cognitive
training programs on the cognitive functioning of older adults [7]-[12]. These training programs
include the adult development and enrichment project (ADEPT) and the advanced cognitive
training for independent and vital elderly (ACTIVE). ADEPT is designed to improve the
reasoning capability of older adults through training activities which include solving pictorial
reasoning and numerical pattern identification problems. Results from studies using the program
have shown significant improvement in reasoning abilities after five one-hour training sessions [7],[13].
The objective of the ACTIVE program is to design training interventions that can improve the
performance of older adults in cognitive-based daily functioning tasks. Training activities
include mnemonic strategies, and pattern and object identification. The interventions have been shown to have positive effects on memory, reasoning, and speed of processing [12],[14].
Other training programs that currently exist have focused solely on the use of computer-based
interventions, i.e. [15]-[18]. For example, Hofmann et al. [15] utilized a touch-screen-based computer program simulating a walk into the center of a town for patients suffering from mild to moderate Alzheimer's disease. Patients were instructed to "move through"
simulated scenes by touching the correct picture shown on the screen and to complete tasks such
as shopping or answering multiple-choice questions. Tárraga et al. [16] used the interactive
internet-based computer program, Smartbrain, for interventions with people suffering from mild
Alzheimer's disease. The program provides stimulation exercises across the domains of memory,
attention, orientation, recognition, language, calculation and executive functions. Günther et al.
investigated the effects of computer-assisted cognitive training on older adults via the software,
Cognition I [17]. The software includes exercises involving anagrams, reading comprehension,
mental arithmetic, and the reading and setting of an analogue clock. With the aid of computers,
cognitive training programs are relatively easy to implement and sustain; however, they have
yet to be conclusively proven to be effective for the cognitively impaired and are inflexible to the
addition of crucial complementary therapies such as social interventions, which are also
beneficial in maintaining or improving brain functioning [19]-[22].
During meal-times in a long-term care facility, personal support workers (PSWs) provide one-
on-one assistance and, if necessary, feed the individual in order to provide social and cognitive
stimulation during the interaction. However, this becomes challenging to implement as PSWs
become overwhelmed with providing individual care to so many people during a short meal-
time, in addition to performing other tasks. The consequences of an inadequate number of
knowledgeable and well-trained staff during meal-times may include neglect of the social dimensions of meal-time, residents not receiving the necessary assistance, and residents being force-fed [23]. Moreover, if the residents do not consume an adequate amount of food
during meal-times, serious health problems such as malnutrition may arise. Malnutrition is
defined as faulty or inadequate nutritional status, undernourishment characterized by insufficient
dietary intake, poor appetite, muscle wasting, and weight loss [24]. Malnutrition is a serious
problem amongst the elderly living in long-term care facilities as it contributes significantly to
morbidity, decreased quality of life and mortality. Namely, malnourished elderly patients have
longer hospital stays, 2 to 20 times more health complications than healthy older adults, frequent
re-admissions to hospitals, and delayed recovery times [24]. Therefore, it is imperative to
investigate new methods to effectively promote independent eating habits and ways to aid PSWs
in addressing the needs of elderly persons during meal-times.
The aforementioned cognitive interventions show potential in reducing the decline of,
maintaining, or even improving cognitive and global functioning in persons suffering from
cognitive impairment. However, more research is required as these initiatives still have
inadequate ecological validity and unproven outcomes because they lack the experimental evidence needed to assess their effectiveness. Moreover, implementing and sustaining such therapeutic measures on a long-term basis can be very difficult and time-consuming for already busy healthcare staff, as they require considerable resources and people.
Because of fast-growing demographic trends, the available care needed to provide supervision
and coaching for cognitive therapy is already lacking and on a recognized steady decline [25].
Therefore, there exists an urgent need to further investigate the potential use of cognitive training
interventions as a tool to aid the rapidly growing number of people suffering from dementia.
1.2 Robots as Assistive Aids for Cognitively Impaired Persons
Recently, due to the fast-growing elderly population in the world, there has been great interest in
the development of social robots [26]-[31] and systems [32]-[35] as aids for cognitively impaired
persons in leisure and daily living activities. The aim of these robots and systems is to aid
healthcare workers in a variety of different scenarios by providing the necessary attention,
cognitive and social stimulation, and guidance to cognitively impaired persons who may not
otherwise receive the necessary care. The following subsections discuss various social robots and
systems utilized for engaging cognitively impaired individuals in cognitively and/or socially
stimulating activities.
1.2.1 Social Robots as Therapeutic Aids in Cognitive Interventions
To date, only a handful of research groups have focused on developing life-like social robots to
engage different individuals in varying socially and/or cognitively stimulating activities [26]-
[31]. For example, the seal-like robot Paro, [26], has been designed to engage elderly persons,
including those with dementia, in animal therapy scenarios by learning which of the robot’s
behaviours (i.e., moving its body parts and making seal sounds) are desired by the way a person
pets, holds, or speaks to it. In [27], the wirelessly controlled robotic dog, AIBO, performs dog-
like actions such as fetching objects and chasing a ball to engage persons with dementia in card
and ball games designed to improve memory, control of emotions and social skills. Bandit II,
[28], engages a person with dementia in a music game by providing assistance and
encouragement via a pre-recorded human voice, social cues like applauding, and other human-
like body movements. The game is designed to improve or maintain cognitive attention. The
robot also adapts its behaviour to a person’s task performance and disability level. KASPAR, a
child-sized tele-operated humanoid robot, engages an autistic child in imitation games by
displaying various facial expressions, waving its hand, and drumming on a tambourine [29].
Keepon, a small soft yellow snowman-like robot, is designed to perform emotional and attention
exchange with children suffering from developmental delays/disorders as well as normally
developing children. The robot acquires attention by orienting its face to a person. It expresses
emotions with sounds and by rocking or bobbing its body [30]. Lastly, the IROMEC (Interactive
Robotic Social Mediators as Companions) robot is designed to encourage the development of
communication, motor, cognitive, sensory and social interaction skills for autistic children via
various interactive play scenarios. The robot engages a child in an activity with its graphical user
interface, buttons and wireless switches [31].
1.2.2 Activity Guidance Systems for Cognitively Impaired Persons
Currently, a few automated computer-based activity guidance systems have been developed to
aid elderly persons in various activities of daily living. The aim of these systems is to reduce the
dependence of elderly persons on caregivers by monitoring their actions and providing prompts
to guide the person through the steps of an activity. For example, Hoey et al. have created a real-
time vision-based system to assist a person with dementia with washing his/her hands. Via video
inputs, assistance is given in the form of verbal or visual prompts, or through the enlistment of a
human caregiver's help if necessary [32]. In [33], the Erroneous Plan Recognition (EPR) system
monitors a person with dementia during meal-time and determines if he/she has executed a
correct or erroneous action according to a pre-defined plan. Pressure sensors, near-field Radio
Frequency Identification (RFID) antennas, Pyroelectric InfraRed (PIR) sensors, reed switches,
and accelerometers are deployed in the dining area and the kitchen to detect actions such as the
opening of cupboards and bringing food from a plate to the mouth. If the person performs an
erroneous action, the system provides audio and visual prompts to correct the action. Si et al.
have developed a guidance system for older people with dementia to support them in activities
such as tea making, shaving, and self-hairstyling [34]. Attached pressure sensors or
accelerometers are utilized to sense the usage of the tools relevant to the activity. The system is
designed to first learn the person’s routine (i.e. the order in which he/she uses the tools) and then
provide prompts in the form of blinking red or green LED lights to guide him/her through the
learned routine. In [35], the Pearl robot was designed to assist elderly individuals with mild
cognitive and physical impairments in their daily activities by providing appointment reminders
and information via audio prompts as well as physically guiding them to their appointment. The
robot uses information obtained through navigation (laser range-finder) and interaction sensors
(speech recognition and a touch-screen). The aforementioned systems demonstrate the potential of effectively guiding a person through a daily activity via instructive prompts; however, none of these systems has investigated the potential benefits of a human-like embodied robotic system.
1.2.3 The Socially Assistive Robot Brian 2.0
One of the main objectives of the Autonomous Systems and Biomechatronics Laboratory
(ASBLab) is to provide insight into the use of innovative robotic technologies to manage mild to
moderate dementia. One particular type of robot being developed is socially assistive robots,
which can provide assistance to individuals through social and cognitive interaction.
The initial work of the ASBLab in this area focused on developing the human-like socially
assistive robot Brian, who provides monitoring, reminders, and companionship to individuals in
social human-robot interaction (HRI) scenarios [36]-[38]. A person’s accessibility level towards
Brian, as determined by his/her body language and the assistive tasks to be accomplished, was
utilized by the Q-learning algorithm to determine the robot’s appropriate assistive behaviour.
Brian’s behaviour was then displayed through both verbal (e.g., speech) and non-verbal (e.g.,
facial expressions, body language) communication means.
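The one-step Q-learning update used in this earlier work can be sketched as follows. The accessibility-level states, behaviour labels, reward value, and learning parameters below are illustrative assumptions for the sketch, not the settings actually used for Brian:

```python
import random

# Hypothetical state/action encodings for the accessibility-based
# interaction described above (not the thesis's actual settings).
STATES = ["low_access", "medium_access", "high_access"]
ACTIONS = ["encourage", "remind", "companionship_dialogue"]

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # assumed learning parameters

Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def choose_action(state):
    """Epsilon-greedy selection over the learned Q-values."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    """One-step Q-learning: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Example: reward a behaviour that increased the person's accessibility level.
q_update("low_access", "encourage", reward=1.0, next_state="medium_access")
```

Over repeated interactions, updates of this form let the robot converge on the assistive behaviour with the highest expected long-term value for each accessibility level.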
Currently, the ASBLab is developing the next generation robot, Brian 2.0, Figure 1. Brian 2.0 is
being designed as a tool to engage people suffering from dementia in personalized cognitive
interventions in order to reduce their dependence on healthcare workers as well as provide them
with an avenue to interact and socialize during the course of these activities. The significance of
using a human-like social robot lies in the ability to directly incorporate a person’s existing
capabilities to communicate naturally as well as his/her ability to understand these forms of
communication. Namely, the robot will tap into remnants of already existing communication
skills of a person suffering from dementia in order to provide effective guidance as well as
cognitive and social stimuli. Another objective of this research is to study how these robots can
contribute to therapeutic protocols aimed at improving or maintaining residual social, cognitive,
affective, and global functioning in persons with dementia. Finally, the ASBLab aims to
ultimately make cognitive interventions more accessible to residents in long-term care facilities
through the aid of socially assistive robots.
Figure 1: Brian 2.0
1.3 Problem Definition
This thesis focuses on the development of a novel learning-based HRI control architecture for
Brian 2.0. This control architecture will enable the robot to effectively engage an individual in
one-on-one person-centered scenarios and provide task assistance as needed. In particular, the
architecture allows Brian 2.0 to be a social motivator that provides a variety of assistive,
instructive, encouraging, orienting, and celebratory prompts during the course of an activity. A
hierarchical reinforcement learning (HRL) approach is used in the architecture to provide the
robot with the ability to: (i) learn appropriate assistive behaviours based on the structure of the
activity, and (ii) personalize an interaction based on human actions/affect during HRI. The
control architecture will be implemented for a Memory Game Scenario and a Meal-time
Scenario.
1.4 Proposed Methodology and Tasks
The overall design of the learning-based control architecture for Brian 2.0 comprises the following components, with corresponding references to the thesis chapters:
1.4.1 Literature Review
In Chapter 2, a literature review of the following two areas, which are critical to the development
of intelligent socially assistive robots, is presented: (i) social intelligence and (ii) strategies for
addressing uncertainty in social HRI.
1.4.2 Design of Control Architecture for Socially Assistive Robots
In Chapter 3, the overall design of a novel control architecture is presented. The control
architecture will be first discussed in terms of the general functionality of each module within the
architecture, the different types of inter-module communication systems that can be utilized, and
how uncertainty is addressed. The implementation of the control architecture for the Memory
Game Scenario and the Meal-time Scenario will also be presented. Namely, the specific
functionality of each module as it pertains to each HRI scenario will be shown, as well as which
inter-module communication systems are utilized.
1.4.3 Learning-based Decision Making for the Behaviour Deliberation Module
In Chapter 4, the detailed design of the Behaviour Deliberation module is presented. A brief
background of Markov Decision Processes (MDPs) and the MAXQ HRL technique is first
presented. Then, the application of the developed MAXQ HRL technique on the Memory Game
Scenario and the Meal-time Scenario is shown. Namely, for each scenario, the task
decomposition, the state and action definitions, and a two-stage training procedure are proposed.
The 1st training stage determines the appropriate behaviours for the robot based on the structure
of the activity and the 2nd
stage focuses on developing personalized interactions based on human
actions/affect during HRI.
1.4.4 Implementation
In Chapter 5, extensive experiments are presented to evaluate the control architecture and its
learning-based decision making capabilities for the Memory Game Scenario and the Meal-time
Scenario. Two types of experiments were conducted for each scenario. Namely, the 1st set
assessed the performance of the key modules within the control architecture. The 2nd
set studied
the effect of the robot’s behaviours during HRI scenarios. Discussions to illustrate the
effectiveness of the proposed designs are also presented.
1.4.5 Conclusion
Lastly, Chapter 6 presents concluding remarks on the development of the control architecture,
highlighting the main contributions of the thesis and future work.
Chapter 2 Literature Review
2.1 Development of a Socially Assistive Robot for HRI
In order for robots to effectively engage human participants in different types of interactions,
they must possess the necessary intelligence to adapt to human behaviours in each type of
interaction. Furthermore, the robot should be capable of dealing with uncertainty due to
incomplete and inconsistent sensory data and non-deterministic human actions/behaviours. The
1st subsection will discuss the learning strategies that have been utilized by socially intelligent
robots to adapt to various social HRI settings. The 2nd
subsection will discuss strategies for
addressing uncertainty in social HRI.
2.1.1 Learning Strategies for Socially Intelligent Robots
It is envisioned that robots will need to have social intelligence in order to be effectively
integrated into human society. Social intelligence allows a robot to share information with, relate
to, and interact with humans. HRI research involves empowering a robot with the social
functionalities needed to engage human participants in different types of interactions. In order to
be socially intelligent, robots must be able to [39]: (i) perceive and interpret human activity and
behaviour, (ii) respond in a natural, appropriate, and believable manner, (iii) display
understandable social cues such as the expression of emotions, and (iv) operate at human
interaction rates. A number of these characteristics will need to be formulated via the study and
development of social learning capabilities for robots.
In general, learning strategies can be utilized by robots in order to enable them to adapt to social
HRI settings. Recently, a number of socially intelligent robots have been developed that are
capable of learning their behaviours for social HRI scenarios. A common approach has been to
utilize reinforcement learning (RL) strategies to solve HRI control problems that are modeled as
either a Markov decision process (MDP) [36]-[38],[40]-[43] or partially observable Markov
decision process (POMDP) [44], where the latter deals with noise and state uncertainty.
In [40], the Leonardo robot used a Q-learning approach to learn to turn on/off a set of buttons
through a turn-taking socially guided interaction. The interaction consisted of verbal instructions
such as “press” and “look”, task feedback such as “good” and “not quite” provided by a person
as well as the person physically demonstrating how to press the buttons. In [41], the HOPE-3
robot used Q-learning in order to learn various social body gestures through imitation. Namely,
Q-learning was used to extract optimal symbolic postures from a human and incorporate
interpolation techniques for generating the same postures on the robot. Prommer et al. [42] used
the Watkins’ Q(λ) method to manage the dialog of an early-stage robot bartender whose main
goal was to identify an object of interest, such as a bottle, plate or cup, by asking a human
customer a series of questions. The STAIR home/office robotic assistant used RL to learn the
optimal dialogue for a speaker identification scenario where the robot attempts to identify the
person with whom it was interacting through a set of questions [43]. Other approaches have
focused on utilizing policy gradient reinforcement learning (PGRL) when there is no obvious
notion of state, i.e., [28],[45]. In particular, PGRL was used in [28] to determine Bandit II’s
behaviour in terms of the amount and speed of its movements, and the type of help it should
provide based on a person’s task performance. In [45], PGRL was used to tailor a mobile robot’s
behaviour to a person’s personality during post-stroke rehabilitation exercises. Namely, learning
was used to determine the mobile robot’s behaviour in terms of interaction distance, robot speed,
and vocal content, based on the person’s introversion-extroversion level.
Hierarchical reinforcement learning (HRL) methods have also been proposed for HRI scenarios. In
the case of HRL, the decision making problem is decomposed into a collection of smaller sub-
problems so that they can be solved more efficiently [46]. This results in faster learning as the
value function requires less data to be learned. For example, in [35], a hierarchical POMDP
approach was implemented in the dialog-based guidance task of the Pearl robot in order for the
robot to perform tasks such as reminding a person of an appointment, navigation and/or
information assistance. The control policy was computed off-line, hence, during task execution,
the controller simply looked up the appropriate robot action to be implemented.
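The efficiency gain of HRL comes from this decomposition. The idea can be illustrated with a toy MAXQ-style value decomposition, in which the value of choosing a child subtask is the child's own value plus a completion value for finishing the parent afterwards: Q(parent, s, child) = V(child, s) + C(parent, s, child). The hierarchy, state, and numbers below are hypothetical:

```python
# Toy MAXQ-style hierarchy: composite tasks list their children;
# leaves are primitive actions. All names and values are invented.
HIERARCHY = {
    "root": ["navigate", "assist"],
    "navigate": ["move_fwd", "turn"],
    "assist": ["prompt", "wait"],
}

# Learned values of primitive actions (expected one-step rewards).
PRIMITIVE_V = {("move_fwd", "s0"): 1.0, ("turn", "s0"): 0.5,
               ("prompt", "s0"): 2.0, ("wait", "s0"): 0.0}

# Learned completion values C(parent, s, child); unlisted entries are 0.
C = {("root", "s0", "assist"): 1.0}

def V(task, s):
    """Value of a task: its own reward if primitive, else the best child's Q."""
    if task not in HIERARCHY:
        return PRIMITIVE_V[(task, s)]
    return max(Q(task, s, child) for child in HIERARCHY[task])

def Q(task, s, child):
    """MAXQ decomposition: value of the child plus the completion value."""
    return V(child, s) + C.get((task, s, child), 0.0)
```

Because each subtask's value function is learned over a smaller state-action space, less data is needed than for a single flat value function over the whole problem.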
As an alternative to RL, neural networks (NN) have also been implemented for learning in social
HRI scenarios, i.e., [47],[48]. For example, in [47], an NN was used to teach the receptionist
robot Arisco to convey appropriate communicative behaviours such as facial expressions based
on various input stimuli including clapping, color, movement, speech, IR signals, and a human
face. Breazeal et al. also used an NN to enable Leonardo to imitate facial expressions of a human
[48]. Correspondence between perceived facial features from the human and the robot’s own
facial features was determined through learning.
2.1.2 Strategies for Addressing Uncertainty in Social HRI
For our social HRI application, the robot should be capable of dealing with uncertainty due to
incomplete and inconsistent sensory data and non-deterministic human actions/behaviours.
Uncertainty from sensor data can be addressed at the sensor data processing level
[42],[35],[49],[50], and/or at the decision making level of the robot’s control architecture
[42],[35],[49],[51], whereas uncertainty resulting from non-deterministic human behaviours is
usually addressed at the robot decision making level [42],[44],[35],[49],[51].
The advantage of resolving sensor data uncertainty at the sensory data processing level is that
dedicated sensor-specific algorithms can be developed to directly deal with uncertainties and
noise acquired from raw sensor readings, resulting in a more accurate representation of the state
of the interaction prior to the decision making process. Common practices in addressing
uncertainty include the utilization of data filtering and data fusion techniques. For example, in
[50], the tour-guide robot RoboX deployed at the Swiss National Exhibition fused both visitors’
dialogue with the presence of visitors in close proximity of the robot (determined by a laser
scanner) via a Bayesian network framework in order to determine a visitor’s intention (e.g. if the
visitor wants to know more about the exhibit or go see the next exhibit). Namely, the laser
scanner was utilized to reduce the speech recognition errors that arise from noisy environments
consisting of crowds of people and other moving robots.
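The kind of fusion described for RoboX can be illustrated with a naive Bayes combination of the two evidence sources. All labels and probabilities below are invented for illustration, not values from [50]:

```python
# Hedged sketch: fuse a noisy speech observation with laser-based proximity
# evidence to infer a visitor's intention, assuming the two sensors are
# conditionally independent given the intention.
INTENTIONS = ["more_info", "next_exhibit"]

prior = {"more_info": 0.5, "next_exhibit": 0.5}

# P(speech result | intention): the recognizer is unreliable in crowds.
p_speech = {("heard_more", "more_info"): 0.6, ("heard_more", "next_exhibit"): 0.3}
# P(visitor stays close to the robot | intention): from the laser scanner.
p_close = {("close", "more_info"): 0.8, ("close", "next_exhibit"): 0.3}

def fuse(speech_obs, laser_obs):
    """Posterior over intentions from both evidence sources (Bayes' rule)."""
    post = {i: prior[i] * p_speech[(speech_obs, i)] * p_close[(laser_obs, i)]
            for i in INTENTIONS}
    z = sum(post.values())
    return {i: p / z for i, p in post.items()}

posterior = fuse("heard_more", "close")
```

The laser evidence sharpens a posterior that speech recognition alone would leave ambiguous, which is exactly the error-reduction role it played for RoboX.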
At the decision-making level, stochastic decision-theoretic techniques [42],[44],[35],[49],[51] are a popular approach for robot decision-making, as they can take into account the non-deterministic nature of social HRI scenarios. Non-deterministic
approaches allow for multiple paths to be taken from a given starting point. Some paths arrive at
the same outcome and some arrive at different outcomes. Nonetheless, all outcomes are valid
regardless of the choices that are made during execution. Typically, an MDP [42] or POMDP
[44],[35],[49],[51] approach is taken, where the latter approach deals directly with state
uncertainty.
Unlike at the sensory data processing level, an in-depth analysis of uncertainties acquired from
actual sensor readings cannot be performed at the decision-making level as all sensor inputs are
treated the same. However, techniques unique to the decision-making level such as state
prediction and error correction methods can be equally effective when dealing with sensor data
uncertainty. For example, state prediction methods used with a POMDP model process inputs
from the sensory level into a belief state by using observation and transition models in a
Bayesian update step [49]. On the other hand, error correction methods can be implemented by
incorporating state validation questions [42],[35],[51] or by repeating a recognition action [49].
For example, state prediction and error correction methods using clarification questions were
utilized within a POMDP model to resolve speech recognition [35],[51] and self-localization
[51] uncertainty.
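The Bayesian belief update at the heart of these POMDP methods can be sketched for a tiny two-state model: b'(s') is proportional to O(o | s', a) multiplied by the transition-weighted sum of the previous belief. The states, action, and probabilities below are illustrative assumptions:

```python
# Hedged sketch of a discrete POMDP belief update (Bayesian filter).
STATES = ["wants_help", "no_help"]

T = {  # T[(a, s, s')]: transition probabilities under action a
    ("ask", "wants_help", "wants_help"): 0.9, ("ask", "wants_help", "no_help"): 0.1,
    ("ask", "no_help", "wants_help"): 0.2,    ("ask", "no_help", "no_help"): 0.8,
}
O = {  # O[(a, s', o)]: observation probabilities (noisy speech recognition)
    ("ask", "wants_help", "yes"): 0.8, ("ask", "wants_help", "no"): 0.2,
    ("ask", "no_help", "yes"): 0.3,    ("ask", "no_help", "no"): 0.7,
}

def belief_update(belief, action, obs):
    """b'(s') ∝ O(o | s', a) * sum_s T(s' | s, a) * b(s)."""
    new_b = {}
    for s2 in STATES:
        pred = sum(T[(action, s, s2)] * belief[s] for s in STATES)
        new_b[s2] = O[(action, s2, obs)] * pred
    z = sum(new_b.values())
    return {s: b / z for s, b in new_b.items()}

b = belief_update({"wants_help": 0.5, "no_help": 0.5}, "ask", "yes")
```

Even with a noisy "yes", the posterior belief shifts toward the state that best explains the observation, which is how clarification questions reduce state uncertainty.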
Hybrid approaches also exist, for which uncertainty can be resolved at both the sensor data
processing level and at the decision making level. For example, in [49], Schmidt-Rohr et al. used
a feature filter system at the sensory data processing level to handle the abstraction of multi-
modal perception and sensor uncertainty for a service robot designed to identify if a person
would like a cup and then fetch the cup to place it where the person chooses. Each filter
processed sensory data from one or more sensors in a single sensor group (i.e. object
localization, human speech, and human activity) into a belief state probability distribution. The
output of these filters was then merged via Bayesian forward filtering into a single belief state
that was used at the robot’s decision making level, where simplified state prediction was
performed symbolically within a discrete POMDP model.
Uncertainty resulting from non-deterministic human actions/behavior can also be addressed
using a probabilistic user model for the training of MDPs and POMDPs. A simple stochastic
approach for user modeling is the n-gram model [52], which predicts the human response based on the last n system actions. The advantage of the n-gram model is that it allows for the
multiplicity (i.e. the person can perform multiple actions at the same time) and multi-modality of
user actions to be modeled. For example, in [42], a bi-gram model was used to represent user
actions defined by a person’s speech and pointing gestures in response to the robot bartender’s
actions. Other user models that have been used specifically in human-computer interaction (HCI)
applications include Levin [53], Pietquin [54], and Hidden Markov [55] models.
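A bi-gram user model amounts to sampling the simulated user's response conditioned only on the system's last action. The action labels and probabilities below are invented for illustration, not the distributions used in [42]:

```python
import random

# Hedged sketch: each system action indexes a conditional distribution
# over user responses; multiple modalities (speech, gesture) can appear
# as distinct response labels.
USER_MODEL = {
    "ask_object": {"speak_name": 0.6, "point": 0.3, "no_response": 0.1},
    "confirm":    {"say_yes": 0.5, "say_no": 0.3, "no_response": 0.2},
}

def sample_user_action(system_action, rng=random):
    """Sample a user response given the last system action (bi-gram model)."""
    dist = USER_MODEL[system_action]
    r, cum = rng.random(), 0.0
    for action, p in dist.items():
        cum += p
        if r < cum:
            return action
    return action  # numerical safety: return the last action
```

Such a model can generate unlimited simulated dialogues, which makes it practical for the off-line training of MDP and POMDP policies before a real user is involved.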
Chapter 3 Design of Control Architecture for Socially Assistive Robots
3.1 Proposed HRI Control Architecture
A generic learning-based HRI control architecture is proposed to enable the robot to monitor the
person’s user state and behaviour (i.e. verbal and physical actions) during the activity and adapt
its own emotion-based behaviour to the current interactive scenario. A modular design approach
is applied to the overall control architecture, allowing for the addition and/or substitution of
different sensor modalities as needed based on the intended activity. Figure 2 shows an overview
of the HRI control architecture for a socially assistive robot.
Figure 2: Control Architecture for a Socially Assistive Robot
Sensory information is acquired for: (i) recognizing human verbal actions via a microphone, (ii)
user state recognition via user state sensors (e.g. microphone, heart-rate sensor, camera), and (iii)
activity state monitoring using activity sensors (e.g. camera, load cell). The Activity State
module is used to monitor the state of the activity during the interaction. Human verbal intent is
recognized via the Speech Recognition and Analysis module, while user state is determined
using the User State Recognition module. The Robot Emotional State module uses the person’s
user state and the current assistive action of the robot to determine the emotional state of the
robot. The objective of the emotional state module is to determine which robot emotion will
elicit an appropriate response from the human in order to accomplish a given task while also
responding appropriately to a person’s user state. The Behaviour Deliberation module is the main
decision making module of the architecture and is utilized to determine the robot’s effective
assistive behaviour. This module requires inputs from all four of the aforementioned modules.
Robot behaviour is executed by the Actuator Control module. In particular, the module is
responsible for physically implementing the robot’s behaviour using a combination of speech,
facial expressions and gestures via the appropriate actuator hardware (e.g. speakers and motors).
Within this module, a voice synthesizer is utilized to generate the robot’s voice based on the
robot’s emotion. Sensors, actuators, and the specific functionality of each of the aforementioned
modules can be customized to the chosen activity.
In this work, the proposed generic control architecture is applied to the Memory Game Scenario
and Meal-time Scenario; however, due to its generality, it can be applied to any person-centered
guidance-type activity in HRI scenarios. This control architecture has the potential to provide
socially assistive robots with the necessary intelligence to be effective social motivators for
individuals that need assistance.
3.1.1 Methods of Inter-module Communication
There are two ways that the aforementioned modules communicate with each other in this
control architecture: (i) via a pipeline system or (ii) via a data-pool system. The pipeline system
is utilized for synchronization purposes. Namely, it enables a module to start or pause the
operation of another module by sending commands through the pipeline. Conversely, the data-
pool system is used to store state information, which can be accessed and updated by multiple
modules in the control architecture.
3.1.1.1 Pipeline System
The pipe system consists of unique unidirectional first-in, first-out (FIFO) connections between
the modules in the control architecture via a reserved area of computer memory. Conceptually, a
message can be sent from one module to another through a pipe. Once the message is read, it is
destroyed and no longer present in the pipe. Furthermore, to avoid reading from an empty pipe,
which will cause the module to be suspended by the computer operating system, the receiving
module always checks if there is data present in the pipe before reading from it. The advantage
of the pipe system is that it does not allow memory to be written and read at the same time by
two modules. Moreover, modules that do not need to be continuously running can be paused
until needed.
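The non-blocking pipe pattern described above can be sketched in Python using `multiprocessing.Pipe` in place of the thesis's actual inter-process mechanism; `poll()` plays the role of checking whether data is present, so the receiver is never suspended on an empty pipe:

```python
from multiprocessing import Pipe

# Unidirectional FIFO connection between two modules.
recv_end, send_end = Pipe(duplex=False)

# Sending module: command another module to pause its operation.
send_end.send("PAUSE")

# Receiving module: always check for data before reading, to avoid
# blocking on an empty pipe.
command = recv_end.recv() if recv_end.poll() else None

# Once read, the message is consumed and no longer present in the pipe.
pipe_empty = not recv_end.poll()
```

The same check-before-read discipline applies whatever the underlying IPC primitive is; only the synchronization command vocabulary ("PAUSE", "START", etc.) is specific to the architecture.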
3.1.1.2 Data-pool System
The data-pool system consists of a shared pool of data stored in a reserved area of computer
memory that can be accessed and updated by multiple modules in the control architecture.
Sensory input modules (e.g. the Activity State module) can update the data-pool whenever they
detect a change in state and whenever the Behaviour Deliberation module needs to observe the
state, it queries this data-pool. Flags indicate the current status of the data-pool (i.e. in-use or
free). Before querying a data-pool, a module must check its status; otherwise, if the module queries a data-pool that is in-use, it and/or the other module using it may be
suspended by the computer operating system. The advantage of the data-pool system is that
modules can operate in parallel and communicate with each other without running the risk of
being suspended by the operating system.
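The data-pool pattern can be sketched as follows. Here a lock stands in for the in-use/free flag described above, serializing access so the shared memory is never read and written at the same time; the class, key, and value names are hypothetical:

```python
import threading

class DataPool:
    """Shared state store accessed by multiple concurrently running modules."""

    def __init__(self):
        self._state = {}
        self._lock = threading.Lock()  # plays the role of the in-use/free flag

    def update(self, key, value):
        """Sensory modules write state changes here as they occur."""
        with self._lock:
            self._state[key] = value

    def query(self, key, default=None):
        """Behaviour Deliberation reads the latest state when it needs it."""
        with self._lock:
            return self._state.get(key, default)

pool = DataPool()
pool.update("activity_state", "one_card_flipped")
```

Unlike the pipe, reading does not consume the value, so any module can query the most recent state at any time while the writers continue in parallel.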
3.1.2 Addressing Uncertainty
In the proposed HRI control architecture, a hybrid approach is applied to resolve uncertainty at
both the sensor data processing level and at the decision making level. At the sensor data
processing level, sensor-specific algorithms are utilized to obtain the best possible state
representation prior to the decision making process. These algorithms directly deal with
uncertainties and noise acquired from raw sensor readings, resulting in a more accurate
representation of the state of the interaction. At the decision making level, a knowledge
clarification layer, which utilizes clarification dialogue or multi-modal fusion techniques, is
incorporated to reduce state recognition errors. Furthermore, non-deterministic human
behaviours are accounted for by the MAXQ algorithm. On-line training is also utilized to adapt
to non-deterministic scenarios as well as new users.
3.2 Memory Game Scenario
Studies have shown that individuals with dementia residing in long-term care facilities are at a
higher risk for understimulation because they lack the initiative to begin or sustain leisure
activities [56],[57]. Prolonged lack of stimulation can be harmful to these individuals as it can
increase the boredom, apathy, loneliness, and depression that accompany the progression of
dementia [58],[59]. Therefore, engagement is a critical priority for the mental and emotional
health of individuals suffering from dementia. Studies have shown that social engagement also
plays an important role in the prevention of dementia [19]-[22].
Cognitive leisure activities are activities in which individuals engage for their enjoyment or overall well-being, and may include writing, puzzles, reading, games, playing musical
instruments and participating in social discussions [60]. The goal is to design a robotic social
motivator to provide interventions that focus on strengthening the remaining cognitive abilities
of a person, while promoting engagement in the cognitively stimulating leisure activity at hand.
Brian 2.0 is designed to provide the elderly with opportunities to socialize and interact during cognitively stimulating activities. Two criteria
identified in the literature have been used to design the cognitive intervention that Brian 2.0 can
provide to individuals in order to better engage them in an activity of interest. Firstly, the focus is
on matching the stimuli provided by the robot to a person’s skill and interest level, which has been shown to significantly increase a person’s engagement and positive affect [61]. Secondly, the
robot is designed to provide one-on-one social stimuli, which has been identified as one of the
most engaging forms of stimuli [62].
In this work, the card game of memory has been chosen as the cognitively stimulating activity. In
this scenario, the proposed control architecture will enable Brian 2.0 to effectively engage an individual in a one-on-one, person-centered cognitively stimulating activity, i.e. Figure 3. In
particular, the architecture allows Brian 2.0 to be a social motivator by providing assistance,
encouragement and celebration during the course of an activity.
Figure 3: Brian 2.0 in a Memory Game Scenario
3.2.1 The Card Game of Memory
The memory game consists of 16 picture cards turned face down in a 4x4 grid formation. The
objective is for the human player to flip over pairs of cards and match the pictures on the cards
correctly. Once a pair has been matched, the two cards are removed from the game. The game is
over when all cards have been matched. Individuals play the game as single players while the
robot autonomously provides preferred amounts of social stimulation in order to keep these
individuals engaged in the game. The memory functions within the brain that are trained while
playing this card game include the visual object memory and the updating function of the central
executive component of the working memory [63].
For the memory game, the notion of winning is not as crucial as keeping the person stimulated
and engaged in the activity. In order to do this, herein, the focus is on reducing activity-induced
stress of the person. Activity-induced stress is known to result in negative moods, and lead to
disturbances in motivation (e.g., loss of task interest) and cognition (e.g., worry) [64]. Moreover,
stress has been found to progress the symptoms of dementia after its onset [65].
3.2.2 Control Architecture for the Memory Game Scenario
The proposed HRI control architecture for the Memory Game Scenario is presented in the
following subsections. The HRI control architecture focuses on determining the person’s user
state, his/her task performance, and speech during an interaction with Brian 2.0, and adjusting
the robot’s behaviour to reflect the task to be completed given a particular user state.
3.2.2.1 Sensory System
Figure 4 shows the sensory system for the Memory Game Scenario. Sensory information is
acquired for: (i) recognizing human verbal actions via a Logitech noise-cancelling microphone,
(ii) user state recognition via an emWave ear-clip heart rate sensor or a Logitech noise-cancelling
microphone, and (iii) activity state monitoring using a Logitech 1.3MP webcam. The heart rate
sensor is utilized to determine a person’s affective arousal level during activity engagement.
Figure 4: Sensory System for the Memory Game Scenario
3.2.2.2 Activity State Module
The objective of this module is to identify and analyze the state of the cards (game state) as a
person plays the memory game. 1.3-megapixel images taken by the webcam are utilized to identify,
locate and track the picture cards during the course of the game. The camera is placed above the
card game and provides a top view perspective of the game set-up. Card recovery errors can arise
during activity state recognition when cards become obstructed. This mainly occurs due to the
temporary presence of human hands. Uncertainty is minimized by capturing and analyzing n
number of images of the same activity state. A probabilistic voting system is then utilized on the
images to determine the current activity state.
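The capture-and-vote step might look like the following simple majority-vote sketch; the state labels and the agreement threshold are assumptions for illustration:

```python
from collections import Counter

def vote_on_state(recognized_states, min_agreement=0.5):
    """Return the majority state across n captured images, or None if no
    state wins at least min_agreement of the votes."""
    counts = Counter(recognized_states)
    state, votes = counts.most_common(1)[0]
    if votes / len(recognized_states) >= min_agreement:
        return state
    return None  # no confident decision; capture more images

# Two frames corrupted by an occluding hand are outvoted by four clean ones.
result = vote_on_state(["one_flipped"] * 4 + ["occluded"] * 2)
```

Voting over several frames filters out transient occlusions (such as a hand passing over a card) without requiring the recognizer itself to detect them.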
The game state is defined by five distinct classifications: (i) start of game s(nf), (ii) zero cards flipped s(c), (iii) one card flipped s(c, i, l1(x,y), lm(x,y)), (iv) two cards flipped s(c, i, l1(x,y), l2(x,y), m), and (v) end of game s(nm). nf is defined as the total number of cards flipped in the game, and c represents the number of cards flipped in a single round, where c = 0, 1, 2. i represents the identity of a card as defined by the picture on it, where i = 1 to 8. The locations of the cards that have been flipped over in the current round are denoted l1(x,y) and l2(x,y), respectively. lm(x,y) represents the location of the pair card of a flipped card; this location is only known if that pair card was already flipped over in a previous round of the game. m represents whether a matched pair has been found, and nm represents the total number of matches a person has found during a game.
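The state variables just defined can be collected into a single structure. The sketch below uses hypothetical field and label names, but the quantities mirror nf, c, i, l1, l2, lm, m, and nm from the text:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class GameState:
    nf: int = 0                           # total cards flipped in the game
    c: int = 0                            # cards flipped this round (0, 1, 2)
    i: Optional[int] = None               # identity of the flipped card (1-8)
    l1: Optional[Tuple[int, int]] = None  # location of 1st flipped card
    l2: Optional[Tuple[int, int]] = None  # location of 2nd flipped card
    lm: Optional[Tuple[int, int]] = None  # known location of the pair card
    m: bool = False                       # matched pair found this round
    nm: int = 0                           # total matches found so far

    def classification(self):
        """Map the variables onto the five game-state classifications."""
        if self.nf == 0:
            return "start_of_game"
        if self.nm == 8:                  # all 8 pairs matched
            return "end_of_game"
        return {0: "zero_flipped", 1: "one_flipped", 2: "two_flipped"}[self.c]
```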
Two card recognition and localization approaches were developed to identify the aforementioned
states: (i) a SIFT-based method, and (ii) a colour-based method. The 1st method utilizes the
Scale-Invariant Feature Transform (SIFT) technique, developed by Lowe [66], and is more
robust than the 2nd
method as card recognition is invariant to image translation, scaling, and
rotation, as well as partially invariant to illumination changes and affine or 3D projection. The
sampling time for this method (approximately 10-20 seconds) is longer than for the colour-based method (approximately 3 seconds). However, since interactions with elderly individuals with cognitive impairments proceed at a slower pace, the SIFT-based method will be sufficient. For our HRI experiments in the lab, we found the colour-based method to be more effective for our interactions. The SIFT-based method was used for the 1st set of experiments, which are presented
in Sections 0 and 5.1.2. The colour-based method was developed and used for the 2nd
set of
experiments, which are presented in Sections 5.1.3 and 5.1.4.
SIFT-based Method
The overall proposed SIFT-based approach will be discussed herein outlining its most pertinent
steps: Step 1- Identifying keypoints on cards; Step 2- Identifying card clusters; Step 3- Card
recovery; and Step 4- Card matching. By performing the four steps described below, the card(s)
of interest will be successfully located and identified. A 1024x768 resolution camera image is
used in this approach.
Keypoint Identification (Step 1)
In order to identify each of the picture cards as they are flipped over during game playing, a card
recognition and localization approach that utilizes the SIFT technique has been developed to
identify distinctive invariant features on the picture cards captured by the camera. SIFT
transforms an image provided by the camera into a large collection of local feature vectors,
which are called SIFT keypoints. Pairs of picture cards utilized in the memory game have unique
SIFT keypoints, allowing them to be distinguished from each other. Herein, SIFT is utilized to
identify the keypoints on the cards in an image taken by the camera, i.e. Figure 5. The blue dots
in the figure represent the keypoints.
Figure 5: Keypoint Identification for the Memory Game
Once the keypoints are identified, the total number of keypoints found in the image, K, is
compared with a Minimum Keypoint Threshold (Kmin). In general, when there are no cards
flipped over, there is a smaller number of keypoints found in the image. These keypoints are
mainly a result of the shadows projected by the edges of the cards, as can be seen in Figure 5.
Conversely, when there are one or more card(s) flipped over, the number of keypoints found in
an image is considerably large due to the texture provided by the pictures on the card(s). This, in
turn, results in K exceeding the threshold Kmin. When this occurs, Step 2 is implemented in order
to cluster keypoints belonging to the same cards together. Otherwise, the module notifies the
Behaviour Deliberation module that there are no cards flipped over.
Card Cluster Identification (Step 2)
A nearest neighbor search algorithm is proposed that defines regions in the 2D images containing
keypoints that may potentially represent picture cards that have been flipped over. The following
sub-steps outline the search algorithm:
Sub-Step 1: A random keypoint, pij , is chosen on the image.
Sub-Step 2: A square of length l and width w is drawn symmetrically around pij to
search for its nearest neighbour keypoints, Figure 6. The objective is to
find a starting point with a large number of nearest neighbours. If the
number of nearest neighbours initially found is small (less than a defined
threshold nsmin), Sub-Step 1 is repeated. Otherwise, each side of the square
is extended one by one (in a clockwise fashion starting with the top side)
in order to determine all the nearest neighbours of pij, i.e. Figure 6. The
extension stops when the number of keypoints in an extended area is
below a minimum threshold, nemin.
Once a boundary for the cluster of keypoints has been identified, the cluster is said to represent a
card, whose location is defined by the center of the cluster (xc,yc). This cluster is used to
determine the identity of the card. After the first card has been localized, the keypoints
belonging to this card are removed from the keypoint clustering search. At this time the
remaining keypoints in the image are utilized to determine if a subsequent cluster representing
another card exists.
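The expanding-square search in Sub-Steps 1 and 2 can be sketched in a few lines of Python. The side length, growth step, and the thresholds below are illustrative stand-ins for the tuned values (l, w, nsmin, nemin) used in this work; keypoints are treated as simple (x, y) tuples. After a cluster is found, its keypoints would be removed before the search is repeated for the next card.

```python
import random

def points_in_box(points, x0, y0, x1, y1):
    return [p for p in points if x0 <= p[0] <= x1 and y0 <= p[1] <= y1]

def find_card_cluster(keypoints, side=40, step=10, ns_min=5, ne_min=2, max_seeds=50):
    """Locate one card cluster via the expanding-square search (Sub-Steps 1-2)."""
    pts = list(keypoints)
    for _ in range(max_seeds):
        px, py = random.choice(pts)                     # Sub-Step 1: random seed keypoint
        x0, y0 = px - side / 2, py - side / 2
        x1, y1 = px + side / 2, py + side / 2
        if len(points_in_box(pts, x0, y0, x1, y1)) < ns_min:
            continue                                    # poor seed: too few neighbours
        grown = True                                    # Sub-Step 2: extend each side
        while grown:
            grown = False
            if len(points_in_box(pts, x0, y0 - step, x1, y0)) >= ne_min:
                y0 -= step; grown = True                # top
            if len(points_in_box(pts, x1, y0, x1 + step, y1)) >= ne_min:
                x1 += step; grown = True                # right
            if len(points_in_box(pts, x0, y1, x1, y1 + step)) >= ne_min:
                y1 += step; grown = True                # bottom
            if len(points_in_box(pts, x0 - step, y0, x0, y1)) >= ne_min:
                x0 -= step; grown = True                # left
        cluster = points_in_box(pts, x0, y0, x1, y1)
        xc = sum(p[0] for p in cluster) / len(cluster)  # cluster centre (xc, yc)
        yc = sum(p[1] for p in cluster) / len(cluster)
        return cluster, (xc, yc)
    return None, None
```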
Figure 6: Card Cluster Identification: a square is drawn symmetrically around keypoint pij
(red dot within the square) and expands in the directions denoted by the arrows.
Card Recovery (Step 3)
The card recovery module determines the identity of the card(s) that have been flipped over
during the game. Namely, the clusters of keypoints representing cards found in Step 2 are
matched to a database of keypoints that has been defined for each individual picture card, i.e.
Figure 7a. Matching utilizes the Best-Bin-First (BBF) method [66].
Matching is achieved by comparing the descriptors of the keypoints, which corresponds to
finding the set of nearest neighbours (NN) to a query point. Since individual SIFT keypoints are
highly distinctive, the majority of keypoints can be matched correctly to
identify the card of interest. During game playing, cards may be slightly moved and rotated from
their starting positions or partially obstructed. The robustness of the method enables a card to be
identified in all of these circumstances as long as the card stays within the viewing area of the
camera. Figure 7b shows an example of a rotated card matching correctly and Figure 7c shows
an example of a partially obstructed card matching correctly. Within this step, the identity of the
card is stored along with the location of the card for use in future rounds.
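The descriptor-matching idea in Step 3 can be illustrated as follows. The thesis uses the approximate Best-Bin-First search [66]; for clarity, this sketch performs exact nearest-neighbour search with a Lowe-style distance-ratio test, and the card names and short "descriptors" are toy stand-ins for 128-dimensional SIFT descriptors.

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def identify_card(cluster_descriptors, database, ratio=0.8):
    """Vote for the database card whose keypoints best match the cluster.
    database: card name -> list of reference descriptors."""
    votes = {}
    for d in cluster_descriptors:
        best_card, best, second = None, float("inf"), float("inf")
        for card_id, refs in database.items():
            for ref in refs:
                dist = euclidean(d, ref)
                if dist < best:
                    best_card, second, best = card_id, best, dist
                elif dist < second:
                    second = dist
        if best < ratio * second:  # keep only distinctive matches
            votes[best_card] = votes.get(best_card, 0) + 1
    return max(votes, key=votes.get) if votes else None
```

The same descriptor comparison underlies Step 4: the clusters from the two flipped cards match if the majority of their keypoint descriptors can be paired with each other.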
(a)
(b)
(c)
Figure 7: Card Recovery: The picture card in the game (right) is matched to the database
card (left). (a) The picture card in the game is upright, (b) the picture card in the game is
rotated, and (c) the picture card in the game is partially obstructed by a person’s fingers.
Card Matching (Step 4)
The card matching technique checks to see if the two cards that are flipped over match by using
the BBF method to match the card clusters found in Step 2. If the keypoints in the two clusters
are correctly matched, then the cards are considered a match; otherwise, a no-match condition is
declared.
Colour-based Method
In the colour-based method, the cards are differentiated based on the colour of the features on the
card. Card identification based on colour is a quick and accurate process as long as the colour
content of each card is unique and lighting is controlled as much as possible in terms of intensity
and tint. A 640x480 resolution camera image is used, resulting in a fast (i.e. 3 seconds) analysis
time. The method is invariant to image translation, scaling, and rotation and is sufficient for our
experiments in the lab. The overall proposed colour-based approach, which was developed by
another ASBLab member [67], consists of the following steps:
First, the camera image is divided into a 4x4 grid of 16 sections (Figure 8).
Figure 8: Division of camera image into a 4x4 grid
Within each section, the colour content is checked using RGB values. If more than half of the
section is detected to be black, the program assumes that there is no card present and moves on
to analyzing the next section. However, if there is a large non-black entity detected in the
section, the presence of a card is inferred. Black and white areas in the image are then subtracted
by setting their pixel values to 0, leaving only the coloured areas, i.e., Table 1. Lastly, to
determine the identity of the card, the RGB and CMYK (i.e., cyan, magenta, yellow and black)
values of these coloured areas are compared with a database of pre-defined colour ranges for
each card.
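A minimal sketch of the colour-based method above, assuming the image is a nested list of (R, G, B) tuples; the black/white thresholds and the colour ranges are illustrative, not the calibrated values from [67].

```python
def is_black(p, t=40):
    return all(c < t for c in p)   # threshold t is illustrative

def is_white(p, t=215):
    return all(c > t for c in p)

def grid_sections(image, rows=4, cols=4):
    """Divide the camera image into a 4x4 grid of sections."""
    h, w = len(image), len(image[0])
    return [[[image[r][c] for c in range(j * w // cols, (j + 1) * w // cols)]
             for r in range(i * h // rows, (i + 1) * h // rows)]
            for i in range(rows) for j in range(cols)]

def classify_section(section, colour_ranges):
    """Identify the card in one section, or None if no card is present.
    colour_ranges: card -> ((rmin, rmax), (gmin, gmax), (bmin, bmax))."""
    pixels = [p for row in section for p in row]
    if sum(is_black(p) for p in pixels) > len(pixels) / 2:
        return None  # mostly black: no card in this section
    # subtract black and white areas, leaving only the coloured pixels
    coloured = [p for p in pixels if not is_black(p) and not is_white(p)]
    if not coloured:
        return None
    mean = tuple(sum(c[i] for c in coloured) / len(coloured) for i in range(3))
    for card, ranges in colour_ranges.items():
        if all(lo <= mean[i] <= hi for i, (lo, hi) in enumerate(ranges)):
            return card
    return None
```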
Table 1: Background Subtraction Method
Camera image Processed Image
Unflipped card
Flipped card
3.2.2.3 Speech Recognition and Analysis Module
Human speech is recognized via the Speech Recognition and Analysis module. Recognition is
performed by Julius, a two-pass large vocabulary continuous speech recognition (LVCSR)
decoder [68]. The LVCSR utilizes the person independent VoXForge acoustic model [69], which
is composed of statistical representations, created via Hidden Markov Models, for each phoneme
in the English language to account for persons with different accents and speaking styles. The
acoustic model has been trained using 625 unique voices.
Words are recognized based on their phonemes and their approximate location in an utterance.
The software can also recognize syntax or patterns of words (i.e. sentence structure). When given
a speech input, it searches for the most likely word sequence under the constraints of the given
grammar. The sampling period is 625 nanoseconds. At the end of each sample, the program
outputs the recognized words to the speech analyzer. The speech analyzer compares the
corresponding synsets to its own database of words, which are grouped into nouns and other
lexical categories, to identify a match.
The reliability of the spoken utterance is determined using word confidence scores which are
based on a combination of predicator features (e.g., acoustic and language model scores). If the
weighted average of all the confidence scores of an utterance is low, this information is sent to
the Knowledge Clarification layer in the Behaviour Deliberation module in order to resolve the
uncertainty via the robot asking clarification questions.
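The confidence check might look as follows; the uniform weights and the rejection threshold are assumptions, since the exact weighting of the predicator features is not specified here.

```python
def utterance_confidence(word_scores, weights=None):
    """Weighted average of per-word confidence scores; uniform weights are
    used here as a stand-in for the predicator-feature weighting."""
    if weights is None:
        weights = [1.0] * len(word_scores)
    return sum(w * s for w, s in zip(weights, word_scores)) / sum(weights)

def needs_clarification(word_scores, threshold=0.5):
    # Low overall confidence triggers a clarification question from the robot
    return utterance_confidence(word_scores) < threshold
```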
For the Memory Game Scenario, the LVCSR software has been customized to support the
vocabulary, dialog and action-based context needed during game playing. In particular, the
vocabulary and grammar definitions have been configured with the syntactic constraints of a
“response” or “question” posed to Brian 2.0. Examples of “responses” include “Yes” and “No”.
Table 2 shows a list of “questions” that Brian 2.0 can recognize during the HRI scenario.
They are categorized into three types of phrases: (i) Localize, (ii) Identify, and (iii) Recall.
Table 2: List of Recognized Questions for the Memory Game Scenario
Questions
Localize
Where is the card located?
Where is this card located?
Where's the card located?
Where is the location of the card?
Identify
What is this card?
What is the card?
What is on the card?
Recall
Have I seen this card before?
Have I seen the card before?
Have I seen this card?
Have I seen the card?
Have I seen it already?
3.2.2.4 User State Recognition Module
The User State recognition module is used to determine a person’s user state during game
playing. Two user state detection approaches were developed for the Memory Game Scenario: (i)
verbal intonation and (ii) a combination of affective arousal and activity performance. The 1st
approach was used for the 1st set of performance assessment experiments and HRI studies
(Sections 5.1.1 and 5.1.2). Experimental results showed that this approach provided limited
opportunities for detecting user state during the course of the activity. Namely, a change in user
state (if present) could only be detected when the person spoke, which may not be often during
the memory game. Therefore, a 2nd approach was developed to allow for more continuous
monitoring of user state during HRI. The 2nd approach was used for the 2nd set of performance
assessment experiments and HRI studies (Sections 5.1.3 and 5.1.4).
Verbal Intonation
Verbal intonation can be used to determine user state during HRI. In particular, this approach
utilizes Layered-Voice Analysis (LVA) software developed by Nemesysco Ltd. [70]. LVA uses
wide range spectrum analysis to detect anomalies in brain activity that are revealed by minute
changes in speech waveform. In this work, the software is used to determine the following
affective states during game playing: stressed, bored, neutral or positively excited.
The “affect signature” is analyzed for each sentence uttered by the person. Numerical values for
the parameters stressed, bored, and positively excited are determined by the software for each
utterance. These values are compared with threshold values for each parameter to determine the
dominant affective state of a person. If none of the parameters exceed their threshold, then the
person is determined to be in a neutral state. The thresholds for the parameters are set based on
the input from 100 different experimental trials.
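A sketch of this thresholding logic; the parameter names and threshold values are illustrative, and the margin-based tie-break is an assumption, as the text does not specify how ties between exceeded parameters are resolved.

```python
def dominant_affect(scores, thresholds):
    """Return the dominant affective state for one utterance.
    scores/thresholds: dicts over 'stressed', 'bored', 'positively_excited'."""
    exceeded = {k: scores[k] - thresholds[k]
                for k in scores if scores[k] > thresholds[k]}
    if not exceeded:
        return "neutral"  # no parameter exceeds its threshold
    return max(exceeded, key=exceeded.get)  # largest margin wins (assumption)
```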
Affective Arousal and Activity Performance
In this approach, user state is determined using a combination of affective arousal and activity
performance. Affective arousal is the intensity with which emotional stimuli are perceived [71].
Herein, a person’s arousal level is based on his/her heart rate value. Heart rate has a long history
of being used as an index of arousal [72]. Heart rate data is gathered from the user during
interaction at a sampling rate of 2Hz. A smoothing algorithm is employed to eliminate outliers
due to sensory noise. Every data point is compared with a threshold of ±4bpm to an average of
four data points before it and an average of four data points after it. The baseline heart rate,
which is an average of 10 valid data points, is acquired before the start of the activity.
Subsequent valid heart rate readings are compared to this baseline, with a threshold of 5bpm, to
determine if the person is in a high or low affective arousal state. Activity performance is
determined by whether or not matching card pairs were found in the previous round of the
memory game by the Activity State module, Table 3. Table 3 has been developed through the
monitoring of numerous user experiments and the acquiring of participant feedback. In these
experiments, the heart-rate sensor was able to detect increased heart rate when a person was
faced with both a stressful and exciting situation in an activity. In the context of the memory
game, stress was directly related to the scenario when a matching card pair could not be found
and excitement was directly related to matching a pair of cards.
Table 3: Task-based User States

                          Activity Performance
                       No Match               Match
Arousal   High    State = 0 (Stressed)   State = 3 (Excited)
          Low     State = 1 (Neutral)    State = 2 (Pleased)
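The heart-rate validation, arousal classification, and the Table 3 mapping can be sketched as follows; the exact form of the ±4 bpm outlier comparison is an interpretation of the description above.

```python
def valid_reading(samples, i, tol=4):
    """Keep sample i only if it lies within ±tol bpm of the average of the
    four samples before it and the average of the four samples after it."""
    if i < 4 or i + 4 >= len(samples):
        return False  # not enough context to judge the sample
    before, after = samples[i - 4:i], samples[i + 1:i + 5]
    return (abs(samples[i] - sum(before) / 4) <= tol and
            abs(samples[i] - sum(after) / 4) <= tol)

def baseline(valid_samples):
    # Baseline heart rate: average of 10 valid readings taken before the activity
    return sum(valid_samples[:10]) / 10

def arousal(reading, baseline_bpm, tol=5):
    # High arousal if the valid reading exceeds the baseline by more than 5 bpm
    return "high" if reading - baseline_bpm > tol else "low"

# Table 3: task-based user states from arousal and activity performance
USER_STATE = {
    ("high", "no_match"): 0,  # Stressed
    ("low",  "no_match"): 1,  # Neutral
    ("low",  "match"):    2,  # Pleased
    ("high", "match"):    3,  # Excited
}
```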
3.2.2.5 Robot Emotional State Module
The Robot Emotional State module uses the person’s user state and the current assistive action of
the robot to determine the emotional state of the robot. A finite-state machine approach is used to
match the appropriate robot emotion to a given user state and the robot’s assistive action within
the context of the cognitively stimulating activity. For the memory game, the robot emotional
states are: happy, neutral and sad (Figure 9).
(a)
(b)
(c)
Figure 9: Brian 2.0 in a (a) happy state, (b) neutral state, and (c) sad state.
A summary of the robot’s emotional state for various scenarios is shown in Table 4. When the
person finds a matching card pair and is in an excited state, the robot celebrates with him/her by
being in a happy state. The robot is sad when it has to repeat an instruction after a long period of
waiting. In general, in all cases when the user is stressed, regardless of the robot action to be
implemented, the robot will try to improve the user state of a person by being in a happy state.
For all other cases not mentioned here, the robot’s emotional state is neutral.
Table 4: Robot Emotional State for Memory Game Scenario

Human User State    Current Robot Assistive Action
                    Instruct    Celebrate   Encourage   Help
Stressed            Happy       Happy       Happy       Happy
Neutral             Neutral     Neutral     Neutral     Neutral
Pleased             Neutral     Neutral     Neutral     Neutral
Excited             Neutral     Happy       Neutral     Neutral
Distracted          Sad         Neutral     Neutral     Neutral
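The finite-state mapping of Table 4 can be encoded directly; this sketch expresses the same lookup in rule form.

```python
def robot_emotion(user_state, action):
    """Emotional state of the robot given the user state and the current
    assistive action (instruct, celebrate, encourage, help), per Table 4."""
    if user_state == "stressed":
        return "happy"  # always try to improve a stressed user's state
    if user_state == "excited" and action == "celebrate":
        return "happy"  # celebrate a found pair together
    if user_state == "distracted" and action == "instruct":
        return "sad"    # repeating an instruction after a long wait
    return "neutral"
```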
3.2.2.6 Behaviour Deliberation Module
The Behaviour Deliberation module is the main decision making module within the HRI control
architecture and one of the main tasks of this thesis. This module requires inputs from all four of
the aforementioned modules in order to determine the robot’s effective assistive behaviour via a
MAXQ hierarchical reinforcement learning approach [46]. The detailed design of the Behaviour
Deliberation module will be discussed as it pertains to the robot engaging a person in the card
game of memory in Section 4.4.
3.2.2.7 Inter-module Communication System
Communication between the Activity State module and the Behaviour Deliberation module is
facilitated by a pipe system, which is used here to minimize the computational load of the
Activity State module. In this system, there are two pipes. The Behaviour
Deliberation module requests information from the Activity State module by sending a request
for the current game state via the first pipe. Once the Activity State module receives the request,
it performs the necessary tasks to determine the current game state as previously discussed in
Section 3.2.2.2. The response is then sent to the Behaviour Deliberation module via the second
pipe.
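A single-process sketch of this two-pipe exchange using anonymous OS pipes; the message strings are invented for illustration, and in the real system the two modules run concurrently rather than in one thread.

```python
import os

# Two anonymous pipes, mirroring the two-pipe scheme between the Behaviour
# Deliberation and Activity State modules.
req_read, req_write = os.pipe()
resp_read, resp_write = os.pipe()

def activity_state_serve_once():
    """Activity State side: read one request and reply with the game state."""
    if os.read(req_read, 64) == b"GET_GAME_STATE":
        # ...the vision processing of Section 3.2.2.2 would happen here...
        os.write(resp_write, b"CARDS_FLIPPED:2")

def deliberation_query():
    """Behaviour Deliberation side: request the current game state."""
    os.write(req_write, b"GET_GAME_STATE")
    activity_state_serve_once()  # runs in a separate process in the real system
    return os.read(resp_read, 64)
```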
All other inter-module communication is facilitated via the use of the data-pool system. A
designated data-pool is set up for each of the remaining modules. These modules update to their
respective data-pools whenever they detect a change in state and the Behaviour Deliberation
module queries these data-pools whenever it needs to observe the state. Parallel operation of the
modules is important in this application because speech and user state sensing must be performed
continuously and simultaneously. For example, the Speech Recognition and Analysis module
must sense quick actions (e.g., a person answering “Yes” in response to a question), which may
be missed if the module only monitors the activity when requested by the Behaviour Deliberation
module. The User State module monitors heart-rate continuously in order to filter out sensor
noise.
3.3 Meal-time Scenario
Recently, there have been numerous studies [73]-[77] that investigate approaches to prevent
malnutrition by improving eating habits among the elderly. In general, social interaction during
meal-time has been found to play an important role at improving dietary intake for the elderly
[73]-[76]. Results from a qualitative study of meal-time in a long-term care facility suggest that
caregivers need to socially interact and encourage meaningful activities for the older person in
order to improve the quality of the meal-time experience and his/her nutritional intake [74].
Addressing the resident was found to be an important aspect of the social interaction. For
example, personal greetings, invitations to sit down, and comments about the appearance of the
food were shown to be effective at engaging the person in the social meal-time event. Nonverbal
exchanges, such as smiles and laughter, and occasionally exchanging a few sentences on topics
unrelated to food were also effective at increasing engagement in the activity of eating.
Slowness in the activity of eating can hinder meal-time independence for cognitively impaired
elderly persons. Treatable causes of slowness include drug induced lethargy, easy distractibility,
prolonged chewing related to dry mouth, getting "lost" in repetitive behaviours such as chewing
and forgetting to swallow, and unappetizing food [75]. As an intervention to slowness, Osborn et
al. suggest providing frequent orienting information, verbal reminders, prompts, praise and
encouragement [75]. Hellen [76] suggests that when feeding elderly persons suffering from
dementia, it is important to present them with one food item at a time because they may not be
able to focus on an entire tray of food. Furthermore, they may need to be reminded to eat, chew,
or swallow in order to keep their attention on the meal and prevent choking. Findings from a
study by Coyne et al. confirm that directed verbal prompting and positive reinforcement can
increase the eating performance of elderly persons suffering from dementia [77].
The goal, herein, is to design a robot that can assist personal support workers during meal-times
by acting as a social motivator that promotes independent eating habits during the meal. The
robot’s task is to engage a person in the meal eating activity, while adding a social element to the
meal-time experience. The intelligence of Brian 2.0 is designed in this work to allow the robot
to motivate the person to consume the contents of the meal by providing meal-related cues,
reminders, encouragement, praise and orientating statements via verbal/non-verbal
communication means. Herein, the contents of the meal consist of a main dish, side dish, and
beverage, i.e. Figure 10. The robot will focus the person’s attention to one dish at a time by
referring to a specific dish until that dish is fully consumed. The order in which the dishes are
referred to is based on a meal plan. The meal plan that is implemented is to consume the meal in
this order: main dish, drink, and then side dish. To add a social element to the meal, Brian 2.0
will also greet the person by name, invite him/her to sit down, and provide various meal-related
or non-meal-related jokes and comments. Since the robot may potentially interact with elderly
persons with different interaction preferences and/or varying degrees of cognitive impairment, it
also has the ability to personalize its actions based on the person’s task compliance.
Figure 10: Meal-assistance Robot
The Meal-time Scenario was chosen as the 2nd HRI scenario for the following reasons:
(i) eating is an important daily activity for elderly persons suffering from dementia as it is
directly related to their health and quality of life, and (ii) to test the design of the learning-based
control architecture on a less structured, preference-driven, human-robot interaction.
3.3.1 Control Architecture for Meal-assistance Robot
The proposed HRI control architecture for the Meal-time Scenario is described in the following
subsections. The HRI control architecture focuses on determining the person’s user state, his/her
task performance, and speech during an interaction with Brian 2.0, and adjusting the behaviour
of the robot to reflect the task to be completed given a particular user state.
3.3.1.1 Sensory System
Figure 11 shows the sensory system for the Meal-assistance robot. Sensory information is
acquired for: (i) recognizing human verbal actions via a Logitech noise-cancelling microphone
(i.e. on the table), (ii) user state recognition via a front-facing 10MP Creative webcam (i.e. on the
robot’s left shoulder) to capture the person’s facial expressions, and (iii) activity state monitoring
using an embedded activity sensing system (i.e. on the table).
Figure 11: Sensory System for the Meal-assistance Robot
The activity sensing system (Figure 12) consists of a meal sensing tray platform and a computer
vision-based utensil tracking system. This system is designed to be easily implemented into
meal-time routines at dining halls in any long-term care facility. Non-contact sensors are utilized
within the system to minimize the disturbance of the sensors to the users and their natural eating
habits. Furthermore, the system is designed to accommodate any type of dishes, cups and
utensils used in the facilities so that it does not require extra preparation from caregivers prior to
meal time.
Figure 12: Activity Sensing System (Note: load cells are under the side dish and cup)
Figure 13 shows the schematic of the meal tray sensing platform, which was developed by
another ASBLab member [78]. The platform consists of the following embedded sensors: (i) a
DYMO M10 Postal Scale (i.e. load cells) with a capacity of 10 lb for monitoring the main plate,
and (ii) two pairs of Phidgets shear micro load cells with a 0.78 kg capacity interfaced with a 4-
input Phidgets bridge for the side dish and beverage. The resolution of weight outputs for the
postal scale and the micro load cells is 2g and 1g, respectively. The weight outputs are fed into
the Activity State module of the control architecture. Weight readings from the pair of micro
load cells for the side dish and beverage are averaged. The meal components are physically
confined to rest on a certain area of the tray via tapered supports in order to achieve optimal and
repeatable contact between the dish/cup and the sensors, resulting in an accurate weight sensing
of the food and drink.
Figure 13: Meal Tray Sensing Platform Schematic (Courtesy of Amy Do)
The utensil tracking system, which was developed by another ASBLab member [78] consists of:
(i) an IR camera from a Nintendo Wii remote and (ii) two 940nm infrared lights with a 120°
viewing angle. The Wii remote is mounted on the right shoulder of Brian 2.0, i.e. Figure 11.
Communication between the computer and the Wii remote is performed via an ASUS Bluetooth
dongle. The IR camera has a resolution of 1024x768 and can sense up to four infrared lights.
Utilizing the Wiiuse C library [79], the location of the infrared light within the camera image can
be obtained. The infrared lights are mounted on a small circuit board, which is attached to the
utensil via a clip-on attachment, i.e. Figure 14. One light is placed on each side of the circuit
board in order to accommodate left or right handed users.
FRONT BACK
Figure 14: Clip-on Device for Utensil Position Sensing
ON/OFF switch
Infrared Light
Infrared Light
Battery
TOP VIEW
3.3.1.2 User State Recognition Module
For the Meal-time Scenario, the following user states are detected: happy, neutral, angry, and
distracted. Herein, user state is determined using a combination of facial orientation and
expression detection techniques. The resolution of the video image utilized for analysis is
640x480 pixels and images are acquired at 30fps. Both techniques require the positional data and
dimensions of the person’s face and its features within the video image. The program used to
detect a person’s face and its features was developed by another ASBLab member [80]. The
presence of a face and its features is detected utilizing Haar feature-based cascade classifiers
[81]. Once the face and its features are located within the image, a tracking algorithm is then
utilized to monitor their positions during the course of the interaction. If the tracking algorithm
loses the face and its features due to them exiting the view of the camera, tracking is disabled
and detection is performed again. The following sections demonstrate how this data is used to
determine the user state.
User State: Happy
An open-source smile detection program [82], which is based on facial expression detection
algorithms developed by Bartlett et al. [83], is used to detect that the person is smiling. The video
image is further downsized to a resolution of 320x240 pixels for this program. The program first
scans the video image to detect approximately upright-frontal faces. The faces found are then scaled
into image patches of equal size, convolved with Gabor energy filters, and then passed to a facial
expressions recognition engine. The engine utilizes an Adaptive Boosting learning algorithm to
select a subset of Gabor filters and then trains Support Vector Machines (SVMs) on the outputs
of the Gabor filters. Facial expressions are recognized based on the Facial Action Coding System
(FACS) [84].
User State: Distracted
The distracted user state is defined to be when a person’s face is not oriented towards the robot
or the meal. The person is seated in front of the robot at approximately eye level and the meal is
situated on the table directly in front of him/her. Therefore, the user is distracted when he/she is
looking left, right, or up, with respect to looking straight ahead at the robot. Facial orientation is
detected by monitoring the distances between the following facial features: eyes, nose, and
mouth. For a sampling period of 0.5 seconds, the face orientation is monitored. A voting system
is used to determine the person’s detected face orientation for that sampling period. Namely, the
orientation that occurs the most during the sampling period is the detected orientation. The
algorithms utilized to detect if the person is distracted were developed by another ASBLab
member [80].
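The voting scheme can be sketched as a majority count over one 0.5 s window (about 15 frames at 30 fps):

```python
from collections import Counter

def voted_orientation(frame_orientations):
    """Majority vote over the orientations detected in one 0.5 s window."""
    return Counter(frame_orientations).most_common(1)[0][0]
```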
Horizontal face orientation is found by comparing the horizontal distance between the right eye
and the nose (xRE,N) to the horizontal distance between the left eye and the nose (xLE,N). First,
the difference of these two distances (Δx) is determined:

Δx = xRE,N − xLE,N                                                            (1)

Δx is then compared to a threshold (Δxth) to determine the horizontal face orientation of the
person for a single video image:

Horizontal Face Orientation =  Center,  if |Δx| ≤ Δxth
                               Left,    if Δx > Δxth
                               Right,   if Δx < −Δxth                         (2)

Vertical face orientation is found by comparing the average vertical distance between the eyes
and the nose (yE,N) to the vertical distance between the nose and mouth (yN,M). Since both
distances are not similar, an offset value (yoffset) is empirically determined through calibration
for each user. First, the difference of these two distances plus the offset (Δy) is determined:

Δy = yE,N − yN,M + yoffset                                                    (3)

Δy is then compared to a threshold (Δyth) to determine the vertical face orientation of the
person for a single video image:

Vertical Face Orientation =  Level,  if |Δy| ≤ Δyth
                             Up,     if Δy > Δyth
                             Down,   if Δy < −Δyth                            (4)
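The orientation thresholds can be sketched directly in code; the sign conventions mapping positive differences to left/up are assumptions, since only the threshold structure of the comparison is given.

```python
def horizontal_orientation(x_re_n, x_le_n, x_th):
    """Compare right/left eye-to-nose distances (Eqs. (1)-(2) sketch)."""
    dx = x_re_n - x_le_n
    if abs(dx) <= x_th:
        return "center"
    return "left" if dx > x_th else "right"  # sign convention assumed

def vertical_orientation(y_e_n, y_n_m, y_offset, y_th):
    """Eye-nose vs. nose-mouth distances with a per-user calibration offset
    (Eqs. (3)-(4) sketch)."""
    dy = y_e_n - y_n_m + y_offset
    if abs(dy) <= y_th:
        return "level"
    return "up" if dy > y_th else "down"     # sign convention assumed
```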
Figure 15 and Figure 16 show the different horizontal and vertical face orientations that can be
detected. For the case when the person’s face turns 45 degrees or more away from the robot and
the frontal face is lost, the person’s profile face will be detected. Since some facial features are
occluded and cannot be detected on the person’s profile face, the aforementioned face orientation
detection algorithm cannot be applied. Therefore, the orientation of the profile face will be
assumed to be the same as the last detected horizontal face orientation.
(a) (b) (c)
Figure 15: Horizontal face orientation: (a) facing right, (b) facing center, and (c) facing left.
(a) (b) (c)
Figure 16: Vertical face orientations: (a) facing down, (b) facing level, and (c) facing up.
User State: Anger
Anger detection is performed by sensing the key emotion-related facial actions as defined in the
Facial Action Coding System Affect Interpretation Database (FACSAID) [85]. Table 5 shows
the facial action units related to the emotion anger. Based on these facial action units, the
proposed detection algorithm utilizes two indicators to define anger: (i) the decrease of space
between the eyebrow and the eye and/or (ii) the increased slope of the eyebrow relative to the
center of the face (i.e. the eyebrows make a “V” shape). These indicators can be detected by
tracking two points on each eyebrow and the location of each eye.
Table 5: Facial Action Units for Anger [84][85]
Emotion Action Units Description
Anger Brow Lowerer Lowers the eyebrow
Upper Lid Raiser Widens the eye aperture
Lid Tightener Narrows eye aperture
Two regions of interest are created to find two points on each eyebrow, i.e. Figure 17. The size
and location of each region is relative to the size and location of the detected eye. The 1st region
(i.e., red rectangle) encompasses the inner portion of the eyebrow and the 2nd region (i.e., green
rectangle) encompasses the outer portion of the eyebrow.
Figure 17: Two regions of interest for eyebrow sensing
Each region is first converted into a binary image in order to highlight the contrast between the
darker eyebrow and the lighter skin. For persons with light eyebrows and darker skin, the image
is inverted before being converted into a binary image. To locate the vertical position of the
eyebrow within the region, the algorithm determines the percentage of white pixels in each row
of pixels in the region. Beginning at the top of the region, the first row of pixels with a high
percentage (i.e., 40% or higher) of white pixels is defined as the vertical location of the eyebrow.
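The row-scan localization can be sketched as follows, assuming the region has already been thresholded to a binary image in which eyebrow pixels are white (1):

```python
def eyebrow_row(binary_region, min_white=0.4):
    """Return the index of the first row (scanning top-down) whose fraction of
    white pixels reaches 40%, taken as the vertical location of the eyebrow."""
    for r, row in enumerate(binary_region):
        if sum(row) / len(row) >= min_white:
            return r
    return None  # no eyebrow found in this region
```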
Figure 18 shows the program detecting a person’s eyebrows when the person has a neutral
(Figure 18a) and an angry expression (Figure 18b). It can be seen from Figure 18b that the inner
eyebrow (i.e. red circles) moves closer to the eye and the slope of the eyebrows changes when
the person is displaying an angry expression.
(a) (b)
Figure 18: Detection of the eyebrow and its slope: (a) neutral face and (b) angry expression
3.3.1.3 Activity State Module
The objective of this module is to monitor: (i) the consumption of the meal and (ii) the state of
the utensil. Namely, with respect to meal consumption, the module keeps track of the amount of
food or drink remaining and if there is a change in weight detected in each dish or cup. The
module also detects if the cup has been taken from the tray. The state of the utensil is detected in
terms of its current location and the direction in which it is moving.
An overall activity state a is defined by two distinct classifications: (i) the person is idle or
obtaining food/beverage s(mw,sw,dt,ul,um) and (ii) the person has consumed the food item
s(ul,db). Table 6 shows a summary of all the detected activity state parameters and their
respective detected levels.
Table 6: Activity State Parameters for the Meal-time Scenario
Activity State Parameter Levels
Main plate weight level, m
Side dish weight level, s
Drink weight level, d
0: 76 - 100% full
1: 51 - 75% full
2: 26 - 50% full
3: 6 - 25% full
4: 0 - 5% full
Main plate weight change, mw
Side dish weight change, sw
Drink weight change, db
0: No weight decrease
1: Weight decrease
Presence of drink, dt 0: Drink is on meal tray
1: Drink has been picked up
Utensil location, ul 0: At tray
1: At mouth of person
Utensil movement, um 0: Not moving
1: Moving towards mouth
2: Moving towards tray
Meal Consumption
The calibration procedure for the meal sensing tray and the algorithms utilized to track meal
consumption have been developed by another ASBLab member [78]. The meal sensing tray is
designed to accommodate a variety of standard dishware that is used in long-term care facilities.
If new dishware is introduced, the program needs to be calibrated with the weight of an empty
dish/cup. This value can be obtained by placing the object on the scale for the main dish and
running a calibration program, which will ask the user if the object is the main dish, side dish, or
cup. This information is then stored in a text file, which can be read by the Activity State
module. If the calibration program is not run, the robot will assume that the same dishware is
used.
In the beginning of the interaction, the initial weight of each meal item is obtained by the weight
sensors as the user is greeted by Brian 2.0. Based on this initial reading, five consumption levels
are created for each dish or cup, which are defined in Table 6 for the main dish, drink and side
dish. During the interaction, the weight sensory data from the meal tray sensing platform is
utilized to determine the user’s current consumption level of each meal item. Namely, the
module checks to see in which consumption range the current weight reading of the food item
falls. In addition, the weight readings are used to determine if the user has obtained some food
from either the main plate or side dish by searching for a small food weight decrease of at least
10g. For the drink, a small weight change of 10g would indicate that the user has taken a sip from
the cup. Specifically for the drink, a weight reading of zero indicates that the user has picked up
the cup.
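As an illustrative sketch of this mapping (function names are hypothetical; the level boundaries from Table 6 and the 10g threshold are from the text):

```python
def consumption_level(current_weight, initial_weight):
    """Map a weight reading to one of the five consumption levels in Table 6."""
    fraction = current_weight / initial_weight * 100.0
    if fraction > 75:
        return 0   # 76 - 100% full
    elif fraction > 50:
        return 1   # 51 - 75% full
    elif fraction > 25:
        return 2   # 26 - 50% full
    elif fraction > 5:
        return 3   # 6 - 25% full
    return 4       # 0 - 5% full

def food_obtained(previous_weight, current_weight, threshold_g=10.0):
    """A decrease of at least 10 g indicates food was taken (or a sip, for the drink)."""
    return (previous_weight - current_weight) >= threshold_g

def drink_picked_up(current_weight):
    """A zero reading on the cup's load cell means the cup has been lifted."""
    return current_weight == 0
```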
The weight readings from the load cells are subjected to data acquisition delays, sensor noise and
errors caused by the user exerting pressure onto the sensor with his/her utensil. For example,
pushing on the dishware with one’s utensil is a common occurrence when obtaining food as
contact is made when one scrapes, cuts or mixes one’s food. To minimize the effect of these
aforementioned errors and to obtain accurate weight readings, sensor signals are passed through
a median filtering algorithm. This method was chosen because it was found that filters that
manipulate raw sensor data using averaging techniques, such as low-pass filters, were
unsuitable for the meal tray application. Averaging techniques would erroneously include
increased weight values caused by utensil pressure in the calculation of meal weight and weight
change. Through testing, it was found that applying the median filter on twenty-one sensor
readings per cycle was effective at minimizing the effects of utensil pressure. With this filtering
process, it takes 2 seconds to achieve an accurate weight reading.
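A minimal sketch of this filtering step (the function name and the spike values in the test are assumptions; the 21-reading window is from the text):

```python
from statistics import median

WINDOW = 21  # readings per filtering cycle, as found effective through testing

def filtered_weight(readings):
    """Return a median-filtered weight from one cycle of load-cell readings.

    A median (unlike a low-pass/averaging filter) discards brief spikes
    caused by the user pressing on the dishware with a utensil.
    """
    if len(readings) < WINDOW:
        raise ValueError("need a full window of readings")
    return median(readings[-WINDOW:])
```

For example, three large utensil-pressure spikes within one 21-sample window do not shift the median, whereas they would pull an average upward.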
State of the Utensil
The algorithms utilized to obtain utensil location have been developed by another ASBLab
member [78]. The state of the utensil is obtained by analyzing the current location of the utensil
relative to the person’s face, which is obtained by the User State module (Section 3.3.1.2), and
the utensil’s previous location. For example, if the detected location of the utensil is close to the
detected location of the face, then the utensil is identified as being at the mouth. Conversely, if
the utensil is detected to be far away and below the face, then it is identified as being at the tray.
In order to detect if the utensil is moving and in which direction, the current location of the
utensil is compared to its last five previous locations. From that, the utensil can be identified as
moving up or down or not at all. During HRI, the utensil location is tracked through the images
provided by the camera.
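A sketch of this movement classification, assuming vertical pixel coordinates with y increasing downward, a five-sample history as described above, and an illustrative noise tolerance:

```python
def classify_utensil_movement(history, current_y, tolerance=5.0):
    """Classify utensil movement from its last five vertical positions (pixels).

    Image y grows downward, so movement towards the mouth corresponds to a
    decreasing y coordinate. Levels follow Table 6: 0 = not moving,
    1 = moving towards mouth, 2 = moving towards tray. The pixel tolerance
    is an illustrative assumption.
    """
    if len(history) < 5:
        return 0
    reference = sum(history[-5:]) / 5.0  # average of the last five locations
    if current_y < reference - tolerance:
        return 1  # moving up, towards the mouth
    if current_y > reference + tolerance:
        return 2  # moving down, towards the tray
    return 0      # not moving
```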
In order to accurately compare the location of the face and the utensil, visual data from the
640x480 pixel resolution webcam and the 1024x768 pixel resolution IR camera must be related
to physical distances relative to a reference point. First a common reference point (i.e. a point in
the bottom left corner of both images) is chosen to be the [0,0] coordinate. From that reference
point, an object is moved a certain distance and the pixel location of the object is observed in
both types of images. This action is performed ten times in order to find two linear relationships:
(i) between actual distances and pixel location in the webcam image, and (ii) between actual
distances and pixel location within the IR camera image.
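One such fit can be sketched as a least-squares line through the ten observations (the numbers in the test are synthetic; the actual scale factors depend on the camera setup):

```python
def fit_linear_map(pixels, distances):
    """Least-squares fit of distance = a * pixel + b from paired observations.

    Sketch of the calibration described above: an object is moved known
    distances from the common [0, 0] reference point and its pixel location
    recorded in each camera image; one such fit is performed per camera.
    """
    n = len(pixels)
    mean_p = sum(pixels) / n
    mean_d = sum(distances) / n
    cov = sum((p - mean_p) * (d - mean_d) for p, d in zip(pixels, distances))
    var = sum((p - mean_p) ** 2 for p in pixels)
    a = cov / var
    b = mean_d - a * mean_p
    return a, b

def pixel_to_distance(pixel, a, b):
    """Convert a pixel coordinate to a physical distance using the fitted map."""
    return a * pixel + b
```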
3.3.1.4 Speech Recognition and Analysis Module
Human speech is recognized via the Speech Recognition and Analysis module. Herein, speech
recognition and analysis is performed with the same software (i.e. Julius) that was utilized for the
memory game scenario. For the Meal-time Scenario, the vocabulary definitions have been
configured to recognize the words “Yes” or “No”, which can be spoken by the user in response
to a question posed by the robot.
3.3.1.5 Robot Emotional State Module
Similar to the Memory Game Scenario, a finite-state machine approach is used to match the
appropriate robot emotion to a given user state and the robot's current assistive action within
the context of the Meal-time Scenario. The same robot emotions are also utilized in this scenario.
Table 7 shows a summary of the robot’s emotional state for various situations. The robot is in
happy state when providing encouragement. Regardless of the robot action to be implemented,
when the user is angry, the robot will try to improve the user state of a person by being in a
happy state. The robot is sad when the person is distracted. For all other cases, the robot’s
emotional state is neutral.
Table 7: Robot Emotional State for Meal-assistance Robot

Human                Current Robot Assistive Action
User State     Encourage   Cue       Orient    Monitor
Happy          Happy       Neutral   Neutral   Neutral
Neutral        Happy       Neutral   Neutral   Neutral
Angry          Happy       Happy     Happy     Happy
Distracted     Happy       Neutral   Neutral   Sad
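Since the mapping is a pure function of the user state and current action, it can be encoded directly as a lookup table. The sketch below encodes Table 7 (the string labels are assumptions):

```python
# Hypothetical encoding of Table 7: the robot's emotional state is a function
# of the detected user state and the robot's current assistive action.
EMOTION_TABLE = {
    "happy":      {"encourage": "happy", "cue": "neutral", "orient": "neutral", "monitor": "neutral"},
    "neutral":    {"encourage": "happy", "cue": "neutral", "orient": "neutral", "monitor": "neutral"},
    "angry":      {"encourage": "happy", "cue": "happy",   "orient": "happy",   "monitor": "happy"},
    "distracted": {"encourage": "happy", "cue": "neutral", "orient": "neutral", "monitor": "sad"},
}

def robot_emotion(user_state, assistive_action):
    """Return the robot's emotional state for a (user state, action) pair."""
    return EMOTION_TABLE[user_state][assistive_action]
```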
3.3.1.6 Behaviour Deliberation Module
The detailed design of the Behaviour Deliberation module will be discussed as it pertains to the
robot engaging a person in the meal-assistance activity.
3.3.1.7 Inter-module Communication System
Communication between all modules is facilitated by the use of a data-pool system. Parallel
operation of the modules is important in this application because activity, speech, and user state
sensing must be performed continuously and simultaneously. For example, the Activity State and
the Speech Recognition and Analysis modules must sense quick actions (e.g., a person picking up
the cup or answering “Yes” in response to a question), which may be missed if the module only
monitors the activity when requested by the Behaviour Deliberation module. The User State
module tracks the face and its features instead of re-detecting them in every frame, which minimizes the
module’s computational load, but requires that the module be run continuously.
Chapter 4 Learning-based Decision Making for the Behaviour Deliberation
Module
4.1 Behaviour Deliberation Module
As previously mentioned, the main decision making module of the HRI control architecture is
the Behaviour Deliberation module. The module requires inputs from all sensor data analysis
modules in order to determine the robot’s effective assistive behaviour. The module is composed
of two layers: (i) the Knowledge Clarification layer and (ii) the Intelligence layer. The role of
Knowledge Clarification layer is to clarify the current state of the activity and person before it is
sent to the Intelligence layer. Multi-modal fusion techniques and/or clarification dialogue
techniques can be integrated into this layer to ensure that the state submitted to the Intelligence
layer is as accurate as possible. The Intelligence layer consists of the MAXQ hierarchical
reinforcement learning technique which is capable of adapting the robot’s behaviour to the
current assistive interactive scenario. This layer determines the overall behaviour of the robot as
a function of both verbal (speech) and nonverbal (gestures, and facial expressions and intonation
based on the robot’s emotions) communication means. A HRL approach is utilized to provide the
robot with the ability to: (i) learn appropriate assistive behaviours based on the structure of the
activity, and (ii) personalize an interaction based on a person’s behaviour and user state during
HRI.
4.2 Model of HRI Scenario
The modelling of the HRI scenario follows a standard Markov Decision Process (MDP) setup
[46], which consists of:
S: a finite set of states of the environment.
A: a finite set of available actions for the current state s.
P: the probability that the environment will transition from the current state s to a
resulting state s’ when an action is performed.
R: a real-valued reward that the agent receives based on a, s, and s’.
The role of reinforcement learning is to solve the MDP by determining the policy, π, which maps
each state to a particular action (i.e. a = π(s)). The goal is to determine a policy that maximizes the
expected cumulative reward and minimizes the cost of the actions taken to reach a terminal state.
4.3 MAXQ Hierarchical Reinforcement Learning
MAXQ provides a hierarchical decomposition of a given reinforcement learning problem (task)
into a set of sub-problems (sub-tasks). MAXQ is able to support temporal abstraction, state
abstraction, and subtask abstraction which are important in the decision making process for the
socially assistive robot in the HRI scenario. The need for temporal abstraction exists since,
depending on the person’s skill level in the chosen activity and their interaction preferences,
some actions may take varying amounts of time to execute. State abstraction is beneficial since
not all state variables are needed for certain tasks. With state abstraction, the overall value
function for a task can be represented more compactly by utilizing only a subset of the state
variables. Subtask abstraction is also necessary because it allows subtasks to be learned only
once; the solution can then be shared by other subtasks. The next few sections outline the core
components of MAXQ.
4.3.1 Task Decomposition
Utilizing the MAXQ task decomposition method, a given MDP M is decomposed into a finite set
of subtasks {M0, M1,…, Mn} [46]. M0 is the Root Task, which is the overall assistive task for
chosen activity. Each subtask consists of a set of actions A, which can be performed to achieve
subtask Mi. These actions can be either primitive robot behavioural actions or other subtasks.
Since the MDP is decomposed into subtasks, a hierarchical policy, π, which is a set containing a
policy for each subtask (i.e. {π0, π1, …, πn}), must be determined.
The exploration policy, πx, for a given task determines if the action for that task is chosen
randomly or based on the action’s Q-value and the current state s, i.e. a = πx(s). Two examples of
exploration strategies that can be used within the MAXQ framework are: greedy and epsilon-
decreasing. With a greedy exploration strategy, the action that is chosen is always the one with
the highest Q-value. Conversely, with an epsilon-decreasing strategy, the action with the highest
Q-value is chosen with probability 1-ε; for the rest of the time, the action is chosen randomly. ε gradually
decreases over time; therefore, action selection in the beginning is highly explorative and as time
progresses, action selection becomes more and more exploitative.
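The two strategies can be sketched as follows (function names and the decay rate are illustrative; setting ε = 0 recovers the purely greedy strategy):

```python
import random

def choose_action(q_values, epsilon, rng=random):
    """Epsilon-decreasing action selection over a dict of {action: Q-value}.

    With probability 1 - epsilon the greedy action (highest Q-value) is
    chosen; otherwise an action is drawn uniformly at random.
    """
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)          # explore
    return max(actions, key=q_values.get)   # exploit

def decay_epsilon(epsilon, rate=0.995, floor=0.0):
    """Gradually reduce epsilon so selection becomes more exploitative."""
    return max(floor, epsilon * rate)
```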
4.3.2 Value Function Decomposition
At the core of MAXQ is the value function decomposition, which describes how to decompose
the overall value function (i.e. Q-value) for a policy into a collection of value functions for the
individual subtasks, recursively [86]. The Q-value for a parent task p, state s, and action a, is
decomposed into two components [86]:
Q(p,s,a) = V(a,s) + C(p,s,a)                                                (5)
V(a,s) is the value function for action a, which can be subtask Mi or a primitive action. If the
action is a subtask, V(a,s) is further decomposed into the same two components (i.e. V(a,s) and
C(p,s,a)) for its own state and actions. To terminate the recursion, V(a,s) for a primitive action is
defined as the expected one-step reward of performing the primitive action a at state s. C(p,s,a)
is the completion function. Conceptually, V(a,s) is the reward for performing action a and
C(p,s,a) is the reward for completing the parent task.
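The recursion in Equation (5) can be sketched as follows, with small hand-made V and C tables standing in for learned values (all names and numbers below are illustrative, not learned values from the thesis):

```python
# A minimal sketch of the recursive value decomposition, assuming stored
# tables of primitive one-step rewards (V_prim) and completion values (C).
V_prim = {("instruct", "s0"): -1.0}                  # V(a, s) for primitive actions
C = {("help", "s0", "instruct"): 10.0,               # C(p, s, a)
     ("root", "s0", "help"): 5.0}
CHILDREN = {"root": ["help"], "help": ["instruct"]}  # actions available to each subtask

def V(a, s):
    """V(a, s): one-step reward if a is primitive, else the best Q-value among a's children."""
    if a not in CHILDREN:        # primitive action: terminates the recursion
        return V_prim[(a, s)]
    return max(Q(a, s, child) for child in CHILDREN[a])

def Q(p, s, a):
    """Equation (5): Q(p, s, a) = V(a, s) + C(p, s, a)."""
    return V(a, s) + C[(p, s, a)]
```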
4.3.3 MAXQ Learning Algorithm
The MAXQ value function for all tasks is learned utilizing the MAXQ learning algorithm [86]:
function MAXQ(state s, subtask p) returns float
    Let TotalReward = 0
    while p is not terminated do
        Choose action a = πx(s) according to the exploration policy πx.
        Execute a.
        if a is primitive, observe one-step reward r
        else r := MAXQ(s,a), which invokes subroutine a and returns the total
             reward received while a is executed.
        TotalReward := TotalReward + r
        Observe resulting state s'
        if a is primitive
            V(a,s) := (1-α)·V(a,s) + α·r                 // update the value of the primitive action
        else // a is a subroutine
            C(p,s,a) := (1-α)·C(p,s,a) + α·max_a'[V(a',s') + C(p,s',a')]   // update the completion function
        s := s'
    end // while
    return TotalReward
end
4.4 Memory Game Scenario
The Behaviour Deliberation module for the Memory Game Scenario is composed of two layers:
(i) the Knowledge Clarification layer and (ii) the Intelligence layer.
4.4.1 Knowledge Clarification Layer
For the Memory Game Scenario, this layer is in charge of generating a clarification dialogue
between a person and the robot in order to reduce errors as a result of speech recognition.
Namely, if the average confidence score for the utterance by the person is low, as determined by
the Speech Recognition and Analysis module, the robot will state the utterance that has the
highest relative confidence score and ask the person to confirm his/her request by providing
positive/negative feedback in the form of yes or no answers. This allows the robot to match the
utterance with its own stored activity-specific utterance templates and hence, increase the
accuracy of speech recognition.
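A minimal sketch of this clarification step (function names and the confidence threshold are assumptions; the thesis does not specify a threshold value):

```python
def clarification_needed(confidence, threshold=0.6):
    """Decide if the recognized utterance must be confirmed (threshold assumed)."""
    return confidence < threshold

def clarify_utterance(candidates):
    """Return a confirmation prompt for the candidate with the highest confidence.

    candidates: list of (utterance, confidence) hypotheses from the recognizer.
    Returns None when confidence is high enough that no dialogue is needed.
    """
    best, confidence = max(candidates, key=lambda pair: pair[1])
    if clarification_needed(confidence):
        return 'Did you say "%s"? Please answer yes or no.' % best
    return None
```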
4.4.2 Intelligence Layer
For the Memory Game Scenario, the Intelligence layer utilizes the MAXQ hierarchical
reinforcement learning technique to adapt the robot’s behaviour to the current assistive
interactive scenario. The following subsections will describe how this is performed.
4.4.2.1 Task Decomposition
The proposed hierarchical task graph for the Memory Game Scenario is presented in Figure 19.
M0 is the Root Task and is defined, herein, to be the overall assistive task, which aligns with the
objective of the card game: To identify and check that cards flipped over result in a
corresponding pair match. The other subtasks are designed to determine the appropriate assistive
behaviours of the robot based on the current user state and activity state.
[Figure: hierarchical task graph. The Root Task decomposes into the 1st level subtasks Flip
Over, Remove, and Flip Back; each has a primitive action (Instruction, Celebration,
Encouragement) and the Help subtask, which further decomposes into Localize (L1-L4),
Identify (I1-I2), and Recall (R1-R4) primitive actions.]
Figure 19: Hierarchical task graph for the memory game scenario (primitive robot actions
on the bottom row are defined in Table 8).
Table 8: Examples of Primitive Robot Actions for Memory Game

Instruction:
    "Let's play a round of the memory game. Please flip over a card."

Celebration (with prompting):
    "Congratulations, you have made a successful match. Please remove the cards from the game."

Encouragement (with prompting):
    "Those are interesting cards that you have flipped, but they are not the same. Please flip
    back the cards and try again. I know you can do this."

Help: Identify (player asks robot to identify a card):
    I1 (related question): "This is a very good question. This card shows a picture of a dog."

Help: Recall (player asks robot to recall a card):
    R1 (level of difficulty = high): "Yes! You have definitely seen this card before in the game."
    R2 (level of difficulty = low): "Yes, you have seen this card before here." (Robot points at
    the location of the card)
    R3 (card location is not known): "You have not yet seen this card before."

Help: Localize (player asks robot to locate a card):
    L1 (level of difficulty = high): "The card is located in the top left corner."
    L2 (level of difficulty = low): "The card is located here." (As identified by the pointing
    gesture of the robot)
    L3 (card location is not known): "You have not yet flipped over this card."

Help: Identify/Recall/Localize:
    I2/R4/L4 (unrelated question): "I'm sorry. I cannot answer your question at this time in the
    game. Please try again later."
The three main 1st level subtasks are defined as: Flip over cards, Remove (matched) cards from
the game, and Flip back (unmatched) cards. Each 1st level subtask is divided into a primitive
action, which includes Celebration, Encouragement, and Instruction, and a 2nd
level subtask:
Help. The Help subtask is further divided into three 3rd
level subtasks: 1) Localize a particular
card in the game, 2) Recall if a card has been flipped over in a previous round, and 3) Identify the
picture on a particular card. For example, if there are no cards flipped over, the path taken in the
task graph should be: Root Task → Flip Over → Instruction. Alternatively, if there is one card
flipped over and the player has asked the robot to localize the matching pair of this card, which
was flipped over in a previous round, the path taken should be: Root Task → Flip Over →
Help→ Localize →L1; where L1 is the primitive robot action where the robot informs the person
of the location of the matching card.
Every subtask in the task graph has a termination condition. For example, for the Root Task, the
termination condition is that eight pair matches are found in the game. For the Flip Over subtask,
the termination condition is that there are two cards flipped over. If the termination condition for
this subtask is not met after i number of iterations, the robot becomes sad since the person has
become disengaged from the game. This change in robot emotion is used to re-engage the
person. The termination condition for both the Flip Back and Remove subtasks is that there are 0
cards flipped over. The termination condition for the Help, Localize, Identify, and Recall
subtasks is that there is no human speech input.
4.4.2.2 State and Action Definition
A set of states, S, has been determined for the aforementioned subtasks to be utilized within the
MAXQ framework. Specifically, the state functions for the robot's subtasks are defined in Table
9, where s ∈ S:
Table 9: State Functions for Memory Game Scenario
Task State functions
Root Task s(mc, c, m)
Flip Over cards s(c, hs, hu, re)
Remove cards s(c, hs)
Flip back cards s(c, hs)
Help s(c, hs, hu, re)
Localize s(c, hs, gd, l, I)
Recall s(c, hs, gd, l, I)
Identify s(c, hs, I)
mc represents the number of matches found in the game, c represents the number of cards flipped
over in a single round, m represents if a matched pair has been found, hu represents a person’s
user state, hs represents human speech, re represents the emotional state of the robot, l is the pair
location of the flipped over card, I is the identity of the flipped over card, and gd is the level of
difficulty of the game, which changes based on the number of incorrect matches the person has
made in the last n rounds.
Table 8 shows examples of primitive robot actions. The primitive actions for the subtasks
Localize, Recall and Identify provide varying levels of encouragement and assistance to keep a
person engaged in the game. The first primitive action for Identify (i.e. I1) is to inform the player
of the identity of the card in question. A 2nd
primitive action for Identify (i.e. I2) is used when the
person asks a question that is not related to the activity state. Similarly, for Localize and Recall,
if the person asks a question that is not related to the activity state, the fourth action (i.e. L4 or
R4) is chosen. The remaining primitive actions for Localize and Recall provide two different
levels of difficulty of the game (i.e. L1 and L2, or R1 and R2). The third action (i.e. L3 or R3) is
used to deal with the case when a card’s location is unknown since the card has yet to be flipped
over.
At the start and end of the game, the Deliberation module implements the following behavioural
actions for the robot: 1) At Game Start: “Hi, my name is Brian. I am glad you want to play the
memory game with me. Let’s start.”, and 2) At Game End: “Congratulations, you have
completed the memory game.”
4.4.2.3 MAXQ Training
A two stage training procedure has been implemented for the MAXQ approach discussed in
following subsections. In the 1st stage, the primary focus is on determining appropriate
behaviours for the robot based on the structure of the game. After the robot has learned its
optimal behaviours with respect to the card game, the 2nd
training stage focuses on developing
personalized interactions for each person utilizing his/her user state during game playing.
MAXQ Off-line Training
The objective of the 1st training stage is to learn the robot’s optimal behaviours based on human
actions and activity states. On-line training would be unrealistic to use at this stage due to the
large amount of possible states and actions that need to be explored, as well as the extensive
amount of experience required to learn the optimal strategy. Therefore, an off-line training
procedure is utilized. The procedure incorporates a human user simulation model, error models
for both speech recognition and activity state detection, and an epsilon-decreasing exploration
strategy that can provide the extensive interaction experience needed for policy learning.
Human User Simulation Model
A simple probabilistic approach for user modeling is the n-gram model, [52], which predicts
human behaviour based on the last n-1 number of system actions. n-grams have been proven to
be effective in simulating real user behaviours for learning scenarios. Their main advantage is
that they allow for the multiplicity (i.e. the person can perform multiple actions at the same time)
and multi-modality of user actions to be modeled. Furthermore, they can easily be trained and
are fully domain-independent [52]. Herein, a bi-gram (n=2) human user model is used to
represent both human verbal and physical actions during the proposed assistive HRI scenario.
Experiments consisting of ten participants, each playing the memory game while interacting with
the robot, were performed to acquire the necessary data for the bi-gram model. In this bi-gram
user model approach, a person’s action is dependent on the last robot action, i.e.
p = P(action_human | action_robot). For these experiments, an action is defined as any possible
behaviour of the robot or person related to the game. Examples of human actions include a
person flipping over a card or asking the robot a help-related question. Examples of the robot’s
actions are presented in Table 8. Full cooperation of the user during the interaction is assumed.
Namely, the user’s actions are related to the memory game, abiding by the rules of the game in
order to find all possible matches. The bi-gram user model as obtained from these experiments is
presented in Table 10.
Table 10: Bi-gram User Simulation Model (Memory Game Scenario)

                                                Human Actions
Robot Actions                 Flips over  Flips over  Flips back  Remove  Ask "What..?"  Ask "Where..?"  Ask "Have..?"
                              1st card    2nd card*   cards       cards   (Identify)     (Localize)      (Recall)
Instruct/Flip Over (0 card)   76%         21%         0%          0%      1%             1%              1%
Instruct/Flip Over (1 card)   0%          62%         0%          0%      11%            8%              19%
Encourage/Flip back           0%          0%          97%         0%      1%             1%              1%
Celebrate/Remove              0%          0%          0%          97%     1%             1%              1%
Answers Identify              0%          86%         0%          0%      1%             12%             1%
Answers Localize              0%          97%         0%          0%      1%             1%              1%
Answers Recall                0%          67%         0%          0%      1%             31%             1%

* If there are 0 cards initially flipped over, this action is described as flipping
over two cards at once.
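One conditional row of such a bi-gram model can be sampled as follows; the sketch uses the Instruct/Flip Over (1 card) row of Table 10 (function and label names are illustrative):

```python
import random

# One row of the bi-gram model in Table 10, i.e. P(action_human | action_robot)
# given that the robot has instructed the player while one card is flipped over.
ROW_INSTRUCT_1_CARD = {
    "flip 2nd card": 0.62, "ask identify": 0.11,
    "ask localize": 0.08, "ask recall": 0.19,
}

def sample_human_action(row, rng=random):
    """Draw a simulated human action from one conditional row of the bi-gram model."""
    r = rng.random()
    cumulative = 0.0
    for action, p in row.items():
        cumulative += p
        if r < cumulative:
            return action
    return action  # guard against floating-point round-off
```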
Sensor Error Model
To account for variations in recognition performance caused by noise and speaker-dependent
differences, a speech recognition error model that assumes a new speaker for every game is
utilized. For each recognition task (RT), the following equation is used to compute the
recognition rate (RR) [42]:

RR_Game(RT) = RR_Overall(RT) + Sample(N(0,1)) × σ_RR(RT)                    (6)

The recognition results of ten different speakers are used to compute the overall RR and standard
deviation for Recall, Identify and Localize (see Table 11). These errors are incorporated into the
simulation model for when the robot needs to detect a person’s verbal action.
Table 11: Speech Recognition Rates
Recall Identify Localize
Number of utterances detected 250 150 200
RROverall(RT) 82.0% 97.3% 97.0%
σRR(RT) 0.148 0.064 0.063
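A minimal sketch of Equation (6), assuming Sample(N(0,1)) is a standard normal draw and that the result is clipped to [0, 1] (the clipping is an added assumption, since a rate outside that range is not meaningful):

```python
import random

def game_recognition_rate(overall_rr, sigma, rng=random):
    """Equation (6): per-game recognition rate for one recognition task (RT).

    A new speaker is assumed each game, so the overall rate is perturbed by
    a normally distributed offset scaled by the measured standard deviation.
    """
    rr = overall_rr + rng.gauss(0.0, 1.0) * sigma
    return min(1.0, max(0.0, rr))  # clip to a valid probability (assumption)
```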
The activity state detection error is based on determining: (i) the identity of the cards in the
game, and (ii) the number of cards flipped over by the user. The card identification error is
incorporated into the simulation model for when the robot must provide help to the user. The
game area is split into a 4x4 grid, representing the location of the cards. Table 12 shows the
detection rates for each section based on the results of ten detection trials per section.
Table 12: Card Identity Detection Rates
Game Area Column
1 2 3 4
Row
1 90% 100% 90% 100%
2 100% 100% 100% 100%
3 100% 100% 100% 100%
4 100% 100% 100% 100%
Errors resulting from detecting an incorrect number of cards flipped over are also incorporated in
the simulation model for when the robot must provide the appropriate instructions based on the
activity state. Table 13 shows the detection rates for detecting if there are 0, 1 or 2 cards flipped
over.
Table 13: Detection Rates for the Number of Cards Flipped Over
Number of cards flipped: 0 1 2
Number of occurrences 75 32 142
Detection Rate 100% 94% 100%
Rewards System
The aim of the reward system is to minimize the cost of the actions taken to reach the ultimate
goal of completing the activity. Therefore, the cost of an undesired robot action is higher than a
desired action. In the memory game, a desired action is defined as an appropriate action for the
current state (e.g. the robot congratulating a person when he/she has found matching cards).
Every completed primitive action is given a negative reward of -1; whereas undesired actions are
given an additional negative reward of -20. Desired primitive actions are not further rewarded. A
positive reward of +21 is given at 1st level subtasks if a person is asking a help-related question
and the appropriate Help subtask is chosen. A game reward of +400 is given at the Root Task
when the player finds all 8 matches in the game. The reward values presented here were chosen
in a manner that allows a clear distinction between desired and undesired actions.
For example, when a player flips over the last two matching cards of the game, the correct robot
action is to celebrate the match and inform the player to remove the cards. If the person
completes the task of removing the cards, the resulting reward is r = +399 since a completed
primitive action has been implemented (-1) and the game is finished (+400). Alternatively, if
there is one card flipped over and the player asks the robot to localize a card that has been
previously flipped over, the appropriate robot action would be to inform the user of the correct
location of the card. In this case, the resulting reward is r = +20 since a primitive action has been
implemented (-1) and the appropriate Help subtask has been chosen based on the player’s
question (+21).
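The reward composition described above can be sketched as follows (the constant names are hypothetical; the values are those given in the text):

```python
STEP_COST = -1           # every completed primitive action
UNDESIRED_PENALTY = -20  # additional cost of an inappropriate action
HELP_BONUS = 21          # correct Help subtask chosen for a help-related question
GAME_REWARD = 400        # all 8 matches found in the game

def primitive_reward(desired, correct_help_chosen=False, game_complete=False):
    """Compose the reward for one primitive robot action from the terms above."""
    r = STEP_COST
    if not desired:
        r += UNDESIRED_PENALTY
    if correct_help_chosen:
        r += HELP_BONUS
    if game_complete:
        r += GAME_REWARD
    return r
```

This reproduces the two worked examples: a desired action that finishes the game yields +399, and a desired action answering a help-related question yields +20.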
Exploration Policy
An epsilon-decreasing exploration strategy is applied during off-line training. At the beginning
of off-line training, ε is set to 1 for the Root Task as well as all 1st and 2nd level subtasks to
encourage the maximum amount of exploration possible. 3rd level subtasks, which only evoke
primitive actions, employ a greedy policy (i.e. ε = 0), where the action with the highest potential
reward (Q-value) will always be chosen. Since a previously implemented action will result in a
negative reward for that action, a greedy policy is used here to ensure that all primitive actions
are explored at least once. Once all primitive actions have been explored, ε gradually reduces
to 0 at the Root Task as well as all 1st and 2nd level subtasks. This is done so
that Q-values at these subtasks will converge to their optimal values.
Performance Analysis
A study was performed to compare the rate of convergence of the MAXQ approach versus a
traditional flat Q-learning approach, [87], for the proposed memory game scenario. The same
learning rate (i.e. α=0.8), initial Q-values, state parameters, primitive actions and user simulation
model were used for both implementations. Figure 20 presents the cumulative rewards for the
overall assistive task. The results from this study show that the MAXQ method converges at a
faster rate than the flat Q-learning approach. For flat Q-learning, there were 33 state parameters
and 13 primitive actions, resulting in 20,736 unique states and 269,568 Q-values. With state and
subtask abstraction, the MAXQ approach significantly reduces the number of Q-values that need
to be stored to only 1,707, making it a considerably more efficient solution to this
decision making problem.
Figure 20: Comparison of MAXQ and flat Q-learning for the Memory Game Scenario
MAXQ On-line Training
Once the 1st training stage has determined the appropriate behaviours of the robot based on the
structure of the memory game, the 2nd
stage of the training process is implemented. This on-line
training stage is used to allow Brian 2.0 to learn its optimal assistive behaviours based on a
person’s user states during game engagement. The aim is to select the robot’s behaviours in an
attempt to maintain positive (i.e., pleased or excited) user states during game playing. It is
postulated that this will, in turn, allow a person to be more engaged in the cognitively stimulating
activity.
A novel on-line training procedure has been developed utilizing a person’s user state to explore
robot behaviours such as providing instruction or help when appropriate, and reward the
behaviours that succeed at improving user state during the memory game. Exploration of
behaviours is triggered by the robot detecting that the person is in a stressed state. At this user
state, the exploration policy, ε, is non-greedy for the Flip Over and Help subtasks. ε is gradually
reduced at every successful robot action. The robot will eventually revert back to the greedy
exploration policy when ε finally decreases to 0, where the action with the highest Q-value will
be chosen. This on-line training procedure is repeated for every new user that interacts with the
robot.
At the end of the 1st stage of training, the Instruction behaviour (at the Flip Over subtask level)
has a higher Q-value than the Help behaviour. To promote exploration of both these behaviours
in the 2nd
stage of training, the first successful Help behaviour is given a reward of +20 so that it
has the same Q-value as the Instruction behaviour. Subsequent successful Help and all successful
Instruction behaviours are given a reward of +10. At the Help subtask level, a reward of +1 is
given to a successful Localize, Identify, or Recall action.
The on-line training procedure was implemented and tested on ten healthy adult participants (20
to 35 years old) as they played the memory game twice while interacting with the robot. Results
from this experiment will be discussed in Section 5.1.3.
4.5 Meal-time Scenario
4.5.1 Knowledge Clarification Layer
For the Meal-time Scenario, the objective of the Knowledge Clarification layer is to clarify
speech recognition results and to utilize a multi-modal fusion method to determine the current human
activity state. Similar to the Memory Game Scenario, this layer generates a clarification dialogue
between a person and the robot in order to reduce errors as a result of speech recognition. Human
intent is based on multiple inputs from the Activity State and User State modules. These modules
provide complementary inputs so that there are multiple indicators for each human activity state
ha(t). For example, the following combination of inputs indicates that the person may need
assistance from a staff member: (i) the person is currently idle as determined by the activity state
module, (ii) the person is distracted or angry as determined by the user state module, and (iii) the
person has been idle for a long period of time as determined by the length of time that has passed
since the person was first at the idle state.
Utilizing inputs from various modules, three distinct human activity states are defined: (i) the
person engaged in the activity s(a,ha(t-1)), (ii) the person may need assistance from a staff
member s(a,hu,it), and (iii) the person has confirmed that he/she needs assistance from a staff
member s(ha(t-1),hs). a indicates the activity state. hu represents the user state of the person and
hs represents human speech. it indicates if the person has idled for a long period of time (i.e. 10
minutes). Lastly, ha(t-1) is the previous human activity state.
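A minimal sketch of the fusion rule for the second activity state, the person may need assistance (the state labels and boolean encoding are assumptions; the 10-minute limit is from the text):

```python
IDLE_LIMIT_S = 600  # 10 minutes, after which idling is considered prolonged

def may_need_staff_assistance(activity_state, user_state, idle_seconds):
    """Fuse the three indicators described above into one decision: the
    person is idle, is distracted or angry, and has been idle too long."""
    return (activity_state == "idle"
            and user_state in ("distracted", "angry")
            and idle_seconds >= IDLE_LIMIT_S)
```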
4.5.2 Intelligence Layer
The MAXQ HRL task hierarchy for the Meal-time Scenario is shown in Figure 21. The root task
for this scenario is defined to be the overall assistive task, which is to motivate the person to
consume the contents of the meal.
[Figure 21 depicts the task hierarchy: the root task Motivate to Eat branches into the 1st level subtasks Obtain food or drink and Put food or drink in mouth and the primitive action Monitor. Obtain food or drink branches into Obtain food from main dish (Encourage1, Cue1, Orient1), Pick up beverage (Encourage2, Cue2, Orient2), and Obtain food from side dish (Encourage3, Cue3, Orient3); Put food or drink in mouth branches into Eat food (Encourage4, Cue4) and Drink beverage (Encourage5, Cue5).]
Figure 21: Task Decomposition for Meal-time Scenario
The root task Motivate to eat or drink is divided into two subtasks and one primitive action. The
two 1st level subtasks are defined as: Obtain food/drink and Put food/drink in mouth. The 1st
level primitive action is Monitor, where the robot asks if the person wants further assistance
from a staff member. This action is only evoked when the robot has sensed that the person is
distracted or has been idling for a long time (i.e., 10 minutes). Based on the meal plan, Obtain
food/drink is a subtask designed to motivate the person to either obtain the food on a particular
dish with a utensil or pick up the beverage from the tray. Therefore, Obtain food/drink is divided
into three 2nd level subtasks: Obtain food from main dish, Pick up beverage, and Obtain food
from side dish. Each of these subtasks is further decomposed into three primitive actions:
Encourage, Cue, and Orient. Based on whether food was previously obtained or the beverage
was picked up, Put food/drink in mouth is a subtask designed to motivate the person to either eat
the food he/she has obtained or drink the beverage. Therefore, Put food/drink in mouth is divided
into two 2nd level subtasks: Put food in mouth and Drink beverage. These subtasks are each
decomposed into two primitive actions: Encourage and Cue. For example, if the
person is currently idle and the robot should motivate him/her to eat from the side dish, the path
taken in the task graph could be: Motivate to eat or drink → Obtain food or drink → Obtain food
from side dish → Encourage. Alternatively, if the person has picked up the beverage, the path
taken could be: Motivate to eat or drink → Put food or drink in mouth → Drink beverage →
Cue.
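The decomposition above can be represented as a nested structure. The dictionary below is an illustrative sketch of the hierarchy in Figure 21, not the thesis code; leaves are lists of primitive actions, and None marks a task that is itself primitive (Monitor):

```python
# Task hierarchy for the Meal-time Scenario; names follow Figure 21.
TASK_HIERARCHY = {
    "Motivate to eat or drink": {
        "Obtain food or drink": {
            "Obtain food from main dish": ["Encourage1", "Cue1", "Orient1"],
            "Pick up beverage": ["Encourage2", "Cue2", "Orient2"],
            "Obtain food from side dish": ["Encourage3", "Cue3", "Orient3"],
        },
        "Put food or drink in mouth": {
            "Put food in mouth": ["Encourage4", "Cue4"],
            "Drink beverage": ["Encourage5", "Cue5"],
        },
        "Monitor": None,  # a primitive action at the 1st level
    }
}


def count_primitives(node):
    """Count primitive actions reachable from a task node."""
    if node is None:
        return 1  # the task is itself a primitive action
    if isinstance(node, list):
        return len(node)  # leaf subtask listing its primitive actions
    return sum(count_primitives(child) for child in node.values())
```

Counting over the whole hierarchy gives the 14 primitive actions used in the scenario (Encourage 1-5, Cue 1-5, Orient 1-3, and Monitor).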
Table 14 shows the termination conditions for each task. In general, termination occurs: (i)
when the objective of a root task or subtask is completed, or (ii) if the user appears to be in need
of further assistance from a staff member. The only exception is for subtasks Obtain food from
main dish, Pick up beverage, and Obtain food from side dish. These subtasks also terminate
when the person has progressed in the interaction. For example, when the robot prompts the
person to obtain food from the main dish, but the person obtains food from the side dish, the
Obtain food/drink and Obtain food from main dish subtasks have to terminate because the
person has now moved on to consuming the obtained food. In this case, the robot must adapt to
the obtained meal item and continue to provide the appropriate prompting.
Table 14: Task Termination Conditions for Meal-time Scenario
Task Termination Conditions
Motivate to eat or drink The meal is completed.
Assistance from a staff member has been requested.
Obtain food or drink/
Obtain food from main dish/
Pick up beverage/
Obtain food from side dish
The person has obtained food or picked up their beverage.
The person is distracted or has been idling for a long time.
Put food or drink in mouth The person has brought the food or drink to their mouth.
The person is distracted or has been idling for a long time.
Put food in mouth The person has brought the food to their mouth.
The person is distracted or has been idling for a long time.
Drink beverage The person has brought the drink to their mouth.
The person is distracted or has been idling for a long time.
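The termination logic in Table 14 can be sketched as predicates over the observed state. The state field names below are assumptions chosen for illustration:

```python
def task_terminated(task, state):
    """Return True if `task` should terminate in `state` (Table 14).

    `state` fields (illustrative names): meal_done, staff_requested,
    food_or_drink_obtained, at_mouth, distracted, idle_too_long.
    """
    needs_staff = state["distracted"] or state["idle_too_long"]
    if task == "Motivate to eat or drink":
        return state["meal_done"] or state["staff_requested"]
    if task in ("Obtain food or drink", "Obtain food from main dish",
                "Pick up beverage", "Obtain food from side dish"):
        # These subtasks also end when the person progresses, e.g. by
        # obtaining food from a different dish than the one prompted.
        return state["food_or_drink_obtained"] or needs_staff
    if task in ("Put food or drink in mouth", "Put food in mouth",
                "Drink beverage"):
        return state["at_mouth"] or needs_staff
    return False
```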
4.5.2.1 State and Action Definitions
S is a set of states that have been determined for the aforementioned subtasks to be utilized
within the MAXQ framework. Table 15 shows the state functions for the robot’s subtasks, where
sS.
Table 15: State Functions for Mealtime Scenario
Task State functions
Motivate to eat or drink s(m,ha)
Obtain food or drink s(mp,ha)
Put food or drink in mouth s(ha,ms,fd)
Obtain food from main dish s(mw,m,hu)
Pick up beverage s(dt,d,hu)
Obtain food from side dish s(sw,s,hu)
Put food in mouth s(fe,hu)
Drink beverage s(db,hu)
m represents how much of the meal has been already consumed. mp represents the designated
meal plan, which is the order in which the dishes and drink should be consumed. ha is the current
human activity. ms is the meal status, which indicates if the meal is still in progress or finished.
fd represents whether a food item or drink has been obtained by the person. mw indicates that the
person has removed food from the main dish, resulting in a weight change. Similarly, sw
indicates that food has been taken from the side dish. dt represents that the person has taken the
beverage off the tray. m, d, and s represent how much of the main dish, drink, and side dish has
been consumed by the person, respectively. fe indicates that the food has been successfully
brought to the mouth and eaten by the person. db indicates that the person has drunk the
beverage, resulting in a weight change. Finally, hu represents the user state of the person.
The Monitor action is used to call a staff member to assist with the scenario at hand. The
Encourage, Cue, and Orient primitive actions represent the three different techniques used to
motivate the person to complete the given task. Encourage actions are positive reasoning tactics
to convince the person to perform the task. Namely, the robot may promote the task by informing
him/her of the positive health benefits of a certain food/drink type, providing reinforcement for
completing the previous task, adding courteous and encouraging words to cuing phrases,
promoting healthy eating habits (e.g., chewing the food or drinking slowly), or commenting on
the aroma or visual appeal of the food/drink. Cueing actions are designed to be straightforward
and direct instructions. In this case, the robot simply cues the person to eat a certain type of food,
use a utensil to obtain the food, pick up the beverage, or slow down if he/she is eating too fast.
Lastly, Orienting actions are designed to provide general awareness of the situation. Namely, the
robot may inform the person of which meal he/she is attending, his/her location, what type of
food is in the meal (e.g. pasta with sauce, juice), and the location of the food and drink items on
the tray. Table 16 shows examples of the aforementioned primitive actions.
At the start and end of the meal, the Deliberation module implements the following behavioural
actions for the robot: (i) at the beginning of the meal: “Hi, my name is Brian. You look very nice
today. I will be joining you for lunch. Let’s eat.”, and (ii) at the end of the meal: “I see that
you’ve finished your meal. Thank you for letting me join you for lunch. Have a nice day!” In
order to add more social elements to the interaction, as suggested in [73]-[76], the robot also
provides jokes and general positive statements about the interaction and the meal every 2
minutes.
Table 16: Examples of Primitive Actions for Mealtime Scenario
Action Type Example
Monitor “I detect that you have not eaten in a long time. Would you like me to ask a
staff member to help you with your meal?”
Encourage
“The main dish smells amazing. Why don’t you pick up some food with your
fork?”
“Drinking some of your beverage will make your food easier to swallow.”
“I see that you have finished eating the entire main dish. Good job. Try the
side dish.”
“What you have on your spoon looks delicious. Why don’t you take a bite and
see how it tastes?”
“Please drink your beverage slowly. You will enjoy it more.”
Cue
“Use your spoon to pick up your food from the main dish.”
“Take a drink from your cup.”
“Please chew your food before taking another bite.”
“Slow down. Please drink your beverage slowly.”
Orient
“The menu today includes chicken breast, vegetable medley, and rice.”
“We are in the dining room and you are enjoying lunch with me.”
“Your side dish is located at the bottom right corner of your tray.”
4.5.2.2 MAXQ Training
Similar to the Memory Game Scenario, the proposed MAXQ training procedure for the Meal-
time Scenario is performed in two stages. In the 1st stage, appropriate robot behaviours are
learned based on the structure of the meal-time activity. In the 2nd stage, the interaction is
personalized by learning which motivating technique is effective at convincing the person to
complete a specific task.
MAXQ Off-line Training
MAXQ off-line training for the Meal-time Scenario is presented in the following subsections.
Human User Simulation Model
Similar to the MAXQ training setup used for the Memory Game, a bi-gram human user model is
used to represent human activity stages during the proposed assistive HRI scenario. Examples of
the robot’s actions are presented in Table 16. This model assumes full cooperation of the user
during the interaction. Namely, the user’s actions are related to the meal activity, there is
generally a higher probability that the user will act as directed by the robot, and he/she has the
objective of consuming the entire meal. Figure 22 and Table 17 present the proposed bi-gram
model used for off-line training.
[Figure 22 depicts the human activity stages (Idle; Obtain from main dish / Take beverage / Obtain from side dish; Put food in mouth / Drink beverage; Needs Attention; Get Staff) and the robot actions (Encourage 1-3, Cue 1-3, Orient 1-3, Encourage 4-5, Cue 4-5, and Monitor) that drive transitions between them.]
Figure 22: Flowchart of Human Actions for Meal-time Scenario
Table 17: Human User Model for Meal-time Scenario

Robot Action | Idle | Obtain food from main dish | Take drink | Obtain food from side dish | Put food in mouth | Drink beverage | Needs Attention
Encourage 1 | 10% | 75% | 5% | 5% | 0% | 0% | 5%
Cue 1 | 10% | 75% | 5% | 5% | 0% | 0% | 5%
Orient 1 | 10% | 75% | 5% | 5% | 0% | 0% | 5%
Encourage 2 | 10% | 5% | 75% | 5% | 0% | 0% | 5%
Cue 2 | 10% | 5% | 75% | 5% | 0% | 0% | 5%
Orient 2 | 10% | 5% | 75% | 5% | 0% | 0% | 5%
Encourage 3 | 10% | 5% | 5% | 75% | 0% | 0% | 5%
Cue 3 | 10% | 5% | 5% | 75% | 0% | 0% | 5%
Orient 3 | 10% | 5% | 5% | 75% | 0% | 0% | 5%
Encourage 4 | 0% | 0% or 10%* | 0% | 0% or 10%* | 85% | 0% | 5%
Cue 4 | 0% | 0% or 10%* | 0% | 0% or 10%* | 85% | 0% | 5%
Encourage 5 | 0% | 0% | 10% | 0% | 0% | 85% | 5%
Cue 5 | 0% | 0% | 10% | 0% | 0% | 85% | 5%
Monitor | 95% | 0% | 0% | 0% | 0% | 0% | 5%

*0% or 10% depending on where food was obtained from. For example, if food was obtained from the main dish,
then there would be a 10% chance that Encourage 4 and Cue 4 would result in the person staying in the same state (i.e., the
person has not brought the food to their mouth yet) and a 0% chance that the person obtained food from the side dish.
Sensor Error Model
The activity state detection error model is presented in Table 18. Detection rates are determined
via repeatability testing.
Table 18: Sensor Error Model (Meal-time Scenario)
Activity State Parameters Detection Rate
Consumption Levels
Main Dish 99%
Beverage 100%
Side Dish 96%
Weight Change
Main Dish 100%
Beverage 97%
Side Dish 98%
Beverage Picked up from tray 100%
Utensil Location At tray 99%
At mouth 99%
Utensil Movement
Not moving 100%
Moving to mouth 99%
Moving to tray 100%
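During off-line training, sensed activity-state parameters can be corrupted according to the detection rates in Table 18. A sketch, where the parameter keys and the fallback behaviour on a missed detection are illustrative assumptions:

```python
import random

# A subset of the detection rates from Table 18 (Meal-time Scenario).
DETECTION_RATE = {
    ("consumption", "main dish"): 0.99,
    ("consumption", "side dish"): 0.96,
    ("utensil_location", "at mouth"): 0.99,
    ("beverage", "picked up"): 1.00,
}


def sense(param, true_value, erroneous_value, rng=random):
    """Return the true reading with the probability given in Table 18;
    otherwise return an erroneous reading (the fallback is an assumption)."""
    if rng.random() < DETECTION_RATE[param]:
        return true_value
    return erroneous_value
```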
Reward System
In the Meal-time Scenario, a desired action is defined as an appropriate action for the current
state (e.g. the robot encouraging the person to eat the food that he/she has obtained from the
main dish). Every completed primitive action is given a negative reward of -1; whereas
undesired actions are given an additional negative reward of -20. Desired primitive actions are
not further rewarded. A reward of +300 is given at the root task if the activity ends with the
person finishing his/her meal. Conversely, a reward of +50 is given if the activity ends with the
person requesting further assistance from a staff member. The reward values presented here
were chosen in a manner that allows a clear distinction between desired and undesired actions.
For example, at the end of the meal, the person has obtained the last remaining portion of food or
drink in the meal and the correct robot action is to encourage or cue the person to consume what
he/she has obtained. If the person performs the action as directed, and thus, ends the activity by
completing the meal, the resulting reward is r = 299, since a completed primitive action has been
implemented (-1) and the meal is completed (+300). Alternatively, if the person is idle and the
robot performs the incorrect action of cuing the person to eat the food he/she has obtained from
the main dish, the reward is r = -21, since a primitive action has been implemented (-1) and an
undesired action is performed (-20).
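The reward computation described above can be sketched as follows; the function names and boolean encoding are illustrative assumptions:

```python
STEP_COST = -1        # every completed primitive action
UNDESIRED = -20       # additional penalty for an inappropriate action
MEAL_DONE = 300       # root reward when the meal is finished
STAFF_CALLED = 50     # root reward when staff assistance is requested


def primitive_reward(action_desired):
    """Reward for one completed primitive action."""
    return STEP_COST + (0 if action_desired else UNDESIRED)


def root_reward(meal_finished, staff_requested):
    """Terminal reward at the root task."""
    if meal_finished:
        return MEAL_DONE
    if staff_requested:
        return STAFF_CALLED
    return 0
```

This reproduces the worked examples in the text: a desired final action yields -1 + 300 = 299, while an undesired action yields -1 - 20 = -21.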
Exploration Policy
An epsilon-decreasing exploration strategy is applied during off-line training. ε is initially set
to 1 for the root task and the 1st level subtasks to encourage the maximum amount of exploration
at the beginning of off-line training. The 2nd level subtasks, which only evoke primitive
actions, employ a greedy policy (i.e., ε = 0), where the action with the highest potential reward
(Q-value) is always chosen. A greedy policy still ensures that all primitive actions
are explored at least once, since a previously implemented action receives a negative reward and
thus a lower Q-value than an as-yet-unexplored action.
Over time, ε at the root task and the 1st level subtasks gradually reduces to 0,
which enables their Q-values to ultimately converge to their optimal values.
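Epsilon-greedy action selection with gradual decay can be sketched as below. The decay schedule itself is an assumption, as the thesis does not specify the exact rate:

```python
import random


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy selection over a dict of action -> Q-value."""
    if rng.random() < epsilon:
        return rng.choice(sorted(q_values))   # explore uniformly
    return max(q_values, key=q_values.get)    # exploit the best Q-value


def decay_epsilon(epsilon, rate=0.999):
    """One gradual decay step; the rate is an illustrative choice."""
    return epsilon * rate
```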
Performance Analysis
A convergence study for the Meal-time Scenario was also performed to confirm that the MAXQ
approach is more efficient than a traditional flat Q-learning approach. The same learning rate
(i.e. α=0.8), initial Q-values, state parameters, primitive actions and user simulation model were
used for both implementations. Figure 23 shows the convergence rates for the MAXQ method
and the flat Q-learning approach for this scenario. For flat Q-learning, there were 6,000,000
unique states and 78,000,000 Q-values. With state and subtask abstraction, the MAXQ approach
significantly reduces the number of Q-values that need to be stored to only 650, making it
a considerably more efficient solution to this decision-making problem.
Figure 23: MAXQ vs. Flat-Q Comparison for Meal-time Scenario
[Figure 23 plots cumulative reward against the number of primitive actions executed (×10^6) for the MAXQ and flat Q-learning implementations.]
MAXQ On-line Training
The aim of the 2nd stage of MAXQ training is to learn the optimal assistive robot behaviours that
will motivate the person to ultimately consume his/her entire meal. Herein, the effectiveness of
three motivation styles (i.e., Encourage, Cue, and Orient) is investigated for each person in terms
of convincing the person to consume a portion of his/her meal.
In this on-line procedure, up to three levels of customization can be applied. Namely, the robot
can learn the preferred motivation style for each 2nd level subtask, user state, and consumption
level of the dish/drink (for the 2nd level subtasks of Obtain food or drink only). This is performed
by first setting a non-greedy exploration policy at the 2nd level subtasks to trigger the exploration of
assistive behaviours. At each 2nd level subtask, there are up to three possible assistive behaviours
from which the robot can choose: Encourage, Cue, and Orient (the latter only for the subtasks of
Obtain food or drink). Based on the exploration policy, an assistive behaviour is chosen and
executed, and the resulting human action is observed. If the robot is successful in motivating the
person to complete the task, the action is rewarded with an on-line training reward of +1. The
exploration policy is gradually reduced at every successful robot action. The robot
eventually reverts to the greedy policy when ε finally decreases to 0, after which the
action with the highest Q-value is always chosen. This on-line training procedure is repeated for
every new user.
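One on-line personalization step can be sketched as below. The learning-rate handling is simplified to a direct reward accumulation, and the ε decay step size is an assumption:

```python
import random


def online_step(q_values, epsilon, observe_success, rng=random):
    """One personalization step at a 2nd level subtask.

    q_values: dict mapping behaviour name -> Q-value.
    observe_success: callable(behaviour) -> bool, True if the person
    completed the task after the behaviour was executed.
    Returns the behaviour chosen and the updated epsilon.
    """
    if rng.random() < epsilon:
        behaviour = rng.choice(sorted(q_values))   # explore a style
    else:
        behaviour = max(q_values, key=q_values.get)
    if observe_success(behaviour):
        q_values[behaviour] += 1                   # on-line reward of +1
        epsilon = max(0.0, epsilon - 0.1)          # decay step is assumed
    return behaviour, epsilon
```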
The on-line training procedure was implemented and tested on five healthy adult participants as
they each interacted with the robot during two meals. Results from this experiment will be
discussed in Section 5.2.1.2.
Chapter 5 Experiments
Experiments were performed to evaluate the proposed control architecture and its learning-based
decision making capabilities for the Memory Game Scenario and the Meal-time Scenario. The
following two types of experiments were conducted for each scenario: (i) performance
assessment of the key modules within the control architecture and (ii) HRI studies that
investigate the effect of the robot’s behaviours during HRI scenarios.
5.1 Memory Game Scenario
For the Memory Game Scenario, four experiments were performed to evaluate: (i) the
performance of the individual modules within the control architecture, (ii) the robot's ability to
engage an individual in a cognitively stimulating activity, (iii) the performance of the overall
proposed MAXQ on-line training procedure, and (iv) the robot's ability to minimize task-
induced stress during a cognitively stimulating activity. The following subsections
discuss the findings from all experiments.
5.1.1 Performance Assessment: Control Architecture
In this experiment, ten healthy adult participants (21 to 40 years old) were instructed to play the
memory game with Brian 2.0 in its entirety with the aim of utilizing all its instruction and help
functions during game playing. The objective of this experiment is to evaluate the ability of the
control architecture to: (i) accurately detect human behaviours during HRI and (ii) choose the
appropriate robot behaviour based on the current interactive scenario and the policy learned from
MAXQ offline training. The experimental set-up is shown in Figure 24. All participants were
familiar with the memory game. 16 picture cards were used in total for the game. The SIFT-
based card recognition and localization method, as described in Section 3.2.2.2, was utilized in
this set of experiments. Moreover, voice intonation was utilized to determine user state.
Figure 24: Experimental Set-up for Performance Assessment (Memory Game Scenario)
5.1.1.1 Results and Discussions
Results of the experiments are presented in Table 19 and Table 20. Table 19 presents the
performance of the Activity State module and Table 20 shows the robot’s ability to successfully
determine and execute the appropriate behaviour (as determined by the Behaviour Deliberation
module) for a given game state. The number of trials represents the number of opportunities that
existed to observe a particular activity state parameter or robot behaviour.
As shown in Table 19, high success rates were achieved for all the tested game state parameters
during the experiments. It is interesting to note that there were significantly more opportunities
to detect the presence of two cards flipped over, compared to one card flipped over, as most
participants preferred to flip two cards at once rather than one card at a time. Typically, it was
found that human hands have an insignificant number of features associated with them compared
to a picture card; therefore, during game state analysis, if there is a hand present in an image, the
hand is ignored the majority of the time. Instead, the system focuses on locating clusters with a
large number of features. However, on two occasions during the experiments, each occurring
within a different trial, a participant's hand was erroneously identified as a 2nd card that had
been flipped over when only one card was flipped over in that trial. In general, a hand has the
potential to be falsely identified if a person is wearing nail polish, jewellery, or anything else that
provides additional texture to the hand. False identification of a card also occurred twice during
game play in two different trials with separate participants. Both participants had accidentally
moved the card beyond the defined game area, leaving only a small portion of the card
visible to the 2D camera. In one case, this false identification subsequently led to the incorrect
recognition of a "match" due to the lack of a sufficient number of keypoints.
Table 19: Activity State Identification Results (Memory Game Scenario)
Activity State Parameters No. of Trials No. of Failures Success Rate
Number of cards flipped
0 cards flipped 75 0 100%
1 card flipped 32 2 94%
2 cards flipped 142 0 100%
Localization and identification of cards 160 2 99%
“Match” recognition 80 1 99%
“No match” recognition 62 0 100%
From Table 20, it can be seen that the robot was successful at selecting and executing
appropriate emotion-based behaviours throughout the interactions. The two previously
mentioned types of errors resulted in the robot also implementing the wrong behavioural actions
(i.e., instructions and celebration (with prompting)). Failures related to Help behavioural actions
were primarily a result of the speech recognition software incorrectly recognizing the intent of
the participants. Namely, the acoustic model utilized for speech recognition was sensitive to the
different word pronunciations and speaking styles amongst the 10 participants, as well as to
background noise during the interactions. The two most difficult words to recognize were
"What" and "Have". The acoustic model utilized in these experiments was not trained to be
participant-dependent and thus, as is a general limitation of person-independent speech
recognition techniques, it had difficulty correctly recognizing different pronunciations of the
same words. This limitation is reflected in the results shown in Table 20.
Table 20: Robot Emotion-based Behaviour Selection and Execution Results
Expected Robot Behaviour No. of Trials No. of Failures Success Rate
Game Start 10 0 100%
Instruction 111 2 98%
Celebration (with prompting) 80 1 99%
Encouragement (with prompting) 62 0 100%
Help: Player asks to recall card 42 9 79%
Help: Player asks to locate a card 43 3 93%
Help: Player asks to identify a card 44 8 82%
Game End 10 0 100%
5.1.2 HRI Study: Activity Engagement
To assess the ability of Brian 2.0 using the proposed control architecture to engage people in the
memory game, an AB test design was implemented. One of the main symptoms of dementia is
the potential for people suffering from the disease to be easily distracted from a particular task
due to limited attention span and concentration [88]. The objective of this experiment is to
simulate scenarios in which a person can be distracted and show that, in such situations, the robot
can be effectively used to engage the person in the memory game.
5.1.2.1 Study Procedure
The experiment consisted of ten healthy adult participants (21 to 40 years old) interacting in a
one-on-one scenario with the robot in a laboratory setting. All participants were familiar with the
memory game. 16 picture cards were used in total for the game. Each participant was situated in
an environment in which they would be easily distracted from the memory game. The specific
aim was to assess and validate the robot’s ability to provide positive interactions during HRI-
based person-directed activities prior to initiating pilot testing at the ASBLab’s collaborative
healthcare facility with persons suffering from mild cognitive impairment. Namely, within this
specific aim the following hypothesis was addressed: The robot Brian 2.0 with social interaction
capabilities will increase the likelihood of a person engaging in a specific cognitively stimulating
activity.
The definition of activity engagement used by Brenske et al. [89] is applied to the cognitively
stimulating activity performed during the HRI scenario. Within the context of the memory game,
engagement is identified to consist of any of the following activities performed by the person: (i)
manipulation of the cards, (ii) looking towards the game area or the robot, and (iii) partaking in
verbal dialogue with the robot.
During the baseline (i.e., experimental set A), a table and chair set-up was utilized, where the
memory card game was arranged in the center of the table, in front of the person. Distractions in
the form of a tennis ball, magazines, a toy robot and a robotic dog were placed around the game,
i.e. Figure 25a. Each participant was instructed that the memory game was available to them to
play for a 20 minute time period. For the robot interaction scenario (i.e., experimental set B), the
sessions were designed to be the exact same as in the baseline except that the robot was present
to provide social engagement, i.e. Figure 25b. The baseline scenario was conducted first and the
robot interaction scenario was conducted the following day. The objective of the overall
experiment was not revealed to the participants until the experiment was completed.
Figure 25: (a) Baseline Scenario and (b) Robot Interaction Scenario
Observations were targeted at the activities related to engagement in the memory game. Namely,
engagement was noted only if the person's attention was solely directed towards the memory
game or Brian 2.0. A participant was observed every 30 seconds, during which the presence or
non-presence of engagement indicators was recorded. The percentage of intervals in which the
participants engaged in the memory game was determined and used to identify a participant’s
level of engagement. All observations were recorded using a small webcam to monitor the
participants indirectly in order to relieve any potential psychological pressure instilled by the
presence of an observer. In this experiment the SIFT-based method was used for card recognition
and voice intonation was utilized to determine user state.
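The interval-based engagement measure described above can be computed directly; the function name is an illustrative assumption:

```python
def engagement_percentage(intervals):
    """Percentage of 30-second observation intervals showing engagement.

    intervals: list of bools, True if any engagement indicator (card
    manipulation, gaze towards the game/robot, or dialogue with the
    robot) was present during that interval.
    """
    if not intervals:
        return 0.0
    return 100.0 * sum(intervals) / len(intervals)
```

A 20-minute session observed every 30 seconds yields 40 intervals, so, for example, 27 engaged intervals would give 67.5%.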
5.1.2.2 Results and Discussions
Table 21 presents the results for the baseline and robot interaction scenario for all 10
participants. A one-tailed, paired t-test was performed to validate the hypothesis that the robot
Brian 2.0 with social interaction capabilities will increase the likelihood of a person engaging in
a cognitively stimulating activity. A statistically significant difference (p<0.001) between the mean
engagement values in the no-robot and robot sessions was found. Engagement in the memory
game was noticeably greater in the robot interaction scenario, which supports the hypothesis.
Table 21: Engagement Results (Memory Game Scenario)
Participant | Baseline Scenario "A" (%) | Robot Interaction Scenario "B" (%)
1 | 38 | 45
2 | 10 | 100
3 | 25 | 60
4 | 10 | 47.5
5 | 15 | 100
6 | 0 | 82.5
7 | 10 | 55
8 | 0 | 65
9 | 30 | 45
10 | 28 | 72.5
Mean | 16.5 | 67.3
Standard Dev. | 12.9 | 21.1
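The one-tailed paired t-test reported above can be reproduced from the Table 21 data using only the standard library; a sketch, noting that the one-tailed critical value for p = 0.001 at df = 9 is approximately 4.297:

```python
import math

baseline = [38, 10, 25, 10, 15, 0, 10, 0, 30, 28]
robot = [45, 100, 60, 47.5, 100, 82.5, 55, 65, 45, 72.5]


def paired_t(a, b):
    """One-tailed paired t statistic for H1: mean(b) > mean(a)."""
    d = [y - x for x, y in zip(a, b)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)  # sample variance
    return mean_d / math.sqrt(var_d / n)


t = paired_t(baseline, robot)
# t ≈ 5.5 exceeds the critical value 4.297 (df = 9), consistent
# with the reported p < 0.001.
```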
It is interesting to note that two participants had zero engagement in the memory game during the
baseline. Instead of playing the memory game, the two participants read the magazines and
played with the toys for the full 20 minute duration. In general, during the baseline, the
distracters worked well in causing the participants to deviate from (and in two cases ignore) the
memory game. Namely, on average the participants only played the memory game for 3.3
minutes during the baseline. In short, participants found the memory game, on its own, engaging
for only a short period of time. Even during game playing, they were often distracted by the
other objects and hence, at times, not fully concentrating on playing the game.
Conversely, for the robot interaction scenario, participants were more engaged in the memory
game due to the presence of the social robot. The robot’s instructions and help functions were the
most effective at keeping a participant’s attention on the game during the interaction period. On
average, the robot kept the participants engaged in the memory game for approximately 13.5
minutes. For two participants, the robot was even able to keep them engaged in the game for the
full 20 minutes. If the participants did not manipulate the cards or interact with the robot for a 1
minute interval, the robot would provide instructions or help to direct the participant’s attention
back to the game. This would re-engage the participants in the game, and their
affective states would also improve. Because of this, the participants never deviated from the
memory game for long durations, but rather only for short time intervals. Examples of the
robot’s behaviour during the interactions are presented in Figure 26.
Figure 26: Examples of robot behaviour during interactions: (a) Robot providing
celebration in a happy emotional state after a correct match, (b) Robot providing help in a
neutral state, and (c) Robot providing instruction in a sad state when game disengagement
occurs.
5.1.2.3 Participant Survey Results
Once the participants were finished interacting with the robot, they were asked to provide their
feedback on the robot’s behaviour during game playing via a Likert scale survey. They were
asked to answer questions regarding the effectiveness of the robot’s communication and
intelligence attributes in terms of engaging them and enhancing their experience of the game.
80% of the participants stated that the instruction, celebration, and encouragement phrases
provided by the robot helped keep them engaged and interested in the game. 70% of the
participants found the help functions of the robot to be very useful, especially how these
functions assisted them in finding a match and provided specific details about the location and
identity of the cards. Furthermore, 70% of participants explicitly commented on how they liked
the robot’s ability to communicate with them in a clear and natural manner as they played the
game. Specifically, they liked that the robot was intelligent enough to recognize what questions
they were asking and give the appropriate answer, the variety of different phrases it was capable
of saying, and how its facial movements complemented its speech.
5.1.3 Performance Assessment: Learning-based Decision Making
The aim of this experiment is to evaluate the proposed on-line training procedure, which was
discussed in Section 4.5.2.2. In this experiment, ten healthy adult participants (20 to 35 years
old) played the memory game twice while interacting with the robot. A baseline heart rate was
obtained for each participant prior to game initiation. A successful action is defined as a robot
action that improves a person’s user state from a stressed state to a non-stressed state. Figure 27
shows the experimental setup.
Observations from the first two experiments showed that voice intonation (i.e. Section 3.2.2.4)
provided limited opportunities in detecting user state during the course of the activity. Therefore,
the heart-rate based method (i.e. Section 3.2.2.4) was utilized from this point forward to allow
for a more continuous monitoring of user state during HRI. Moreover, the colour-based card
recognition and localization method (Section 3.2.2.2) was utilized from this point forward to
improve the sampling time for the activity state module.
Figure 27: Experimental Setup for Evaluation of Learning-based Decision Making
In this experiment, a scenario involving activity-induced stress is simulated in order to
demonstrate that, in such situations, the robot can be effectively used to minimize this type of
stress. As the participants are healthy adults, the following constraint was imposed on the game:
each participant must try to win the game with five or less incorrect matches.
5.1.3.1 Results and Discussions
Preliminary experiments demonstrate that the proposed on-line training procedure allows the
robot to learn its optimal assistive behaviours during personalized interactions. Namely, the robot
successfully detects a participant’s user state at every interaction, explores different behaviours,
and is rewarded when its behaviours improve user states.
Figure 28 shows the user states of all ten participants during the two games. One interaction is
defined to include a robot detecting a user’s action (which updates the activity state) as well as
the robot’s reaction during game playing. For example, an interaction may include the robot
detecting a person has flipped over one card and the robot providing instruction to the person to
flip over another card. From Figure 28, it can be seen that the robot was successful at detecting
the participants’ change in user state throughout the activity.
Figure 29 provides a more detailed view of two sets of ten interactions for each participant, one set from each game. The robot was able to explore and determine appropriate behaviours during game playing utilizing the proposed MAXQ control architecture and on-line training procedure, based on the participants' user states and activity states. For example, for
Participant A, the robot was able to detect that the person was in a stressed state at interactions 8,
12 and 35, and provided assistance via the Identify and Locate help actions. Similarly, the
Identify action was explored for Participant D at interactions 7 and 15, and for Participant H at
interaction 12. The Recall help action was explored for Participant F at interaction 28. The
Instruction action was also explored and found to be effective at improving user states for
Participant C at interactions 10 and 35; Participant E at interactions 3, 8, and 40; Participant F at
interaction 22; Participant G at interactions 3, 5, 9, and 39; Participant I at interactions 14, 42 and
47; and Participant J at interactions 10 and 68. During the second game, the policy resulted in the
robot performing more exploitation than exploration, due to the decrease in ε. For example, this
was observed with Participants B, E and I, where the Instruction action was repeatedly chosen
during game 2 at interactions 65 and 69; 36 and 40; and 42 and 47, respectively. In general, the
robot was, on average, 70% successful at improving user state with its Instruction and Help
actions during the experiments.
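The explore-reward-exploit cycle described above can be sketched as follows; the action set, reward magnitudes, learning rate, and function names are illustrative assumptions rather than the thesis's actual MAXQ implementation:

```python
import random

ACTIONS = ["Instruction", "Help", "Encouragement", "Celebration"]

def reward(prev_state: str, new_state: str) -> int:
    """Positive reward when an action improves a stressed user state."""
    if prev_state == "stressed" and new_state != "stressed":
        return 10
    return -1  # small penalty otherwise, so ineffective actions decay

def select_action(q: dict, epsilon: float) -> str:
    """Epsilon-greedy: explore with probability epsilon, else exploit."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[a])

def update(q: dict, action: str, r: int, alpha: float = 0.5) -> None:
    """Move the action's value estimate toward the observed reward."""
    q[action] += alpha * (r - q[action])

q = {a: 0.0 for a in ACTIONS}
update(q, "Instruction", reward("stressed", "neutral"))   # improved state
update(q, "Help", reward("stressed", "stressed"))         # no improvement
assert select_action(q, 0.0) == "Instruction"  # Instruction now has highest value
```

With ε = 0 the selection is purely exploitative, mirroring the behaviour observed in the second game as ε decreased.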
Figure 28: Participant user states detected during the memory game
[Ten panels, one per participant (A to J), each plotting user state (0 to 3) against interaction number over the two games.]
Figure 29: Interaction details for all participants
[Two panels of ten consecutive interactions per participant (A to J), one from each game, showing the detected user state (Stressed, Neutral, Pleased, Excited) and the game state at each interaction.
Game state legend: 0 = zero cards flipped; 1 = one card flipped; 2 = two cards flipped, no match; 2M = two cards flipped, match.
Robot behaviour legend: Instruction (flip over card); Encouragement (flip back cards); Celebration (remove cards); Help (Localize, Recall, Identify).]
Figure 30 shows the rewards for the Flip Over subtask for all ten participants during the experiment to illustrate how the rewarding of actions is implemented. After the two games, one can observe which robot behaviours obtained the highest rewards and thus became the current optimal behaviour for each participant. For the majority of the participants, the Instruction action had the highest rewards, followed by the Help action.
Figure 30: Rewards for the Flip Over subtask
[Ten panels, one per participant (A to J), plotting reward against interaction number for the Instruction and Help actions.]
Figure 31 shows the rewards for the Help subtask for Participants A and D. For Participant A, at
the end of the two games, the reward for Localize is higher than the rewards for the other Help
actions, and for Participant D, Identify has the highest reward.
Figure 31: Rewards for the Help subtask
[Two panels (Participants A and D) plotting reward against interaction number for the Localize, Recall, and Identify help actions.]
5.1.4 HRI Study: Minimizing Task-Induced Stress
The goal of this work is to design robotic interventions focusing on cognitively stimulating
leisure activities in order to strengthen the remaining cognitive abilities of a person and promote
meaningful engagement in these activities. Therefore, the notion of completing the activity is not
as essential as keeping the person stimulated and engaged in the activity. To accomplish this,
herein, a person’s activity-induced stress must be reduced. In general, activity/task-induced stress
is known to result in negative moods and lead to disturbances in motivation (e.g., loss of task
interest) and cognition (e.g., worry) [90]. Moreover, stress has been found to accelerate the progression of dementia symptoms [91]. Hence, a social robot motivator is utilized to keep a person engaged
in a cognitively stimulating activity so that he/she may potentially receive the positive benefits
from the directed stimulus.
A second HRI study was conducted to verify the use of Brian 2.0 in minimizing activity-induced stress during cognitively stimulating activities. In this study, the proposed MAXQ on-line training procedure was performed during HRI. The study consisted of one-on-one interaction scenarios between six healthy adult participants (21 to 35 years of age) and the robot in a laboratory setting. The memory game was used as the stimulating activity. All participants had past experience with the game.
The specific aim of this study is to assess and validate the robot’s influence on the user state of a
person during HRI-based person-directed activities. Namely, within this specific aim the
following hypothesis was addressed: The robot Brian 2.0 with social interaction capabilities will
minimize activity-induced stress during a specific cognitively stimulating activity.
The study simulated a scenario in which the participants could experience activity-induced stress in order to demonstrate that, in such situations, the robot can be effectively used to minimize this type of stress. As the participants were healthy adults, the following constraint was imposed on the game: each participant had to try to win the game with five or fewer incorrect matches.
5.1.4.1 Brian 2.0's Influence on User State
The robot's ability to improve user state is evaluated by examining the following two performance indices: (i) the duration of the game (measured as a percentage of game rounds) that the participant is in a stressed state, and (ii) the average number of times during game playing that the robot is actively able to improve user state when the participant is stressed. The first index is calculated to validate the hypothesis and the second index is calculated to further investigate the effectiveness of the robot's behaviours.
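As a minimal sketch, the two indices can be computed from a per-round log as follows; the log format and field names are hypothetical:

```python
# Hypothetical per-round log: detected user state and whether the robot's
# action improved the state that round; field names are illustrative.
rounds = [
    {"state": "stressed", "robot_improved": True},
    {"state": "neutral",  "robot_improved": False},
    {"state": "stressed", "robot_improved": False},
    {"state": "pleased",  "robot_improved": False},
    {"state": "stressed", "robot_improved": True},
]

# Index (i): percentage of game rounds spent in a stressed state.
stressed_rounds = sum(1 for r in rounds if r["state"] == "stressed")
pct_stressed = 100.0 * stressed_rounds / len(rounds)

# Index (ii): times the robot improved the user state while the person
# was stressed (averaged over participants in the study).
improvements = sum(1 for r in rounds
                   if r["state"] == "stressed" and r["robot_improved"])

assert pct_stressed == 60.0
assert improvements == 2
```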
5.1.4.2 Study Procedure
An AB test design was used to evaluate the ability of the robot to detect and improve the task-based user state of a person during the memory game. During the baseline scenario, experimental set A, a table and chair set-up was used, where the card game was arranged in the center of the table, in front of the person, Figure 32a. Each participant was instructed to play the memory card game with the aforementioned constraint imposed. Conversely, for the HRI scenario, experimental set B, the sessions were designed to be identical to the baseline except that the robot was present to provide encouragement, help, and instructions during game playing, Figure 32b. A baseline heart rate was obtained for each participant prior to each experiment. All tests were recorded using a small webcam in order to monitor the participants indirectly, relieving any potential psychological pressure instilled by the presence of an observer.
(a)
(b)
Figure 32: (a) Baseline Scenario and (b) HRI Scenario
78
5.1.4.3 Results and Discussions
In order to calculate the performance indices defined in Section 5.1.4.1, the changes in user state
during the baseline scenario and the HRI scenario were analyzed. In particular, the task-based
user state of a participant was detected and recorded at every round of the game. Figure 33
presents the results for the baseline and HRI scenarios with respect to the percentage of game
rounds that a participant was in a stressed state.
Figure 33: The percentage of the interaction that a participant is stressed
In the baseline scenario, participants spent, on average, 42.7% (σ = 11.5%) of the total number of rounds in a stressed state. In the HRI scenario, participants spent an average of 21.8% (σ = 9.6%) of the total number of rounds in a stressed state. To validate the hypothesis, a one-tailed, paired t-test was performed. The difference between the mean percentages of rounds spent in a stressed state in the two scenarios was statistically significant (p < 0.005). It can therefore be confirmed that the number of game rounds spent in a stressed state was noticeably lower in the HRI scenario, which supports the hypothesis.
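The one-tailed, paired t-test above can be sketched in plain Python; the per-participant percentages below are illustrative values, not the study's raw data:

```python
from math import sqrt

# Hypothetical per-participant percentages of stressed rounds.
baseline = [48.0, 35.0, 52.0, 40.0, 30.0, 51.0]
hri      = [25.0, 18.0, 30.0, 22.0, 10.0, 26.0]

diffs = [h - b for h, b in zip(hri, baseline)]   # negative if HRI reduces stress
n = len(diffs)
mean_d = sum(diffs) / n
var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
t = mean_d / sqrt(var_d / n)                     # paired t statistic, df = n - 1

# For df = 5, the one-tailed critical t at p = 0.005 is about -4.03;
# a t statistic below that supports the hypothesis at this level.
assert t < -4.03
```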
Figure 34 shows a comparison of the percentage of rounds in the HRI scenario for a stressed or
positive state (i.e., pleased or excited). On average, the participants were in a positive state
57.2% (σ = 5.6%) of the total number of rounds compared to the 21.8% for the stressed state.
This result shows that the robot’s behaviours can help promote positive user states during game
playing. The most effective robot behaviours in relieving stress were providing instructions and helping a participant locate a corresponding card pair. It was observed that when utilizing these behaviours, the robot was able to directly improve user state an average of four times per participant during game playing when the participant was stressed. Examples of the robot's behaviour during the interactions are presented in Figure 35.
Figure 34: Comparison of the percentage of the interaction that a participant is in a
stressed or positive state during the HRI scenario
(a)
(b)
(c)
Figure 35: Examples of robot behaviours during interactions: (a) Robot providing help in a
neutral emotional state, (b) Robot providing celebration in a happy state after a correct
match, and (c) Robot providing instruction in a sad state when game disengagement
occurs.
5.1.4.4 Participant Survey Results
A post-experiment survey was administered to the participants after the HRI scenario to obtain
feedback on the robot’s behaviour during game playing. The types of questions that were asked
in the survey pertained to the robot’s social intelligence attributes as well as the participants’
overall impressions of the robot. The participants were asked to choose their responses from a
list of robot behaviours.
The participants were first asked to identify the robot behaviours they found most effective at relieving stress during game playing. Table 22 summarizes their responses, ranked by the total number of responses for each behaviour. The robot providing instructions
was ranked the highest by the participants, which concurs with the quantitative experimental
results discussed above.
Table 22: Robot Behaviours Effective at Relieving Stress
Ranking  Robot Behaviour
1st      Providing instructions
2nd      Celebrating with you when you get a match
3rd      Prompting you to continue the game
4th      Encouraging you when you do not get a match
5th      Providing help
With respect to their overall impressions of the robot, ninety percent of the participants stated that the robot's life-like appearance and its ability to express different emotions via facial expressions and tone of voice while providing instruction, celebration, and encouragement phrases helped keep them engaged and interested in the game. Seventy percent stated that they liked the fact that the robot provided companionship by just being present during the activity.
5.2 Meal-time Scenario
For the Meal-time Assistance scenario, the performance of the control architecture was validated
and HRI experiments were also performed.
5.2.1 Performance Assessment
Two sets of experiments were performed in order to individually evaluate the performance of: (i)
the control architecture and (ii) its learning-based decision making capabilities. The following
subsections will discuss the findings from both sets.
81
5.2.1.1 Control Architecture
Experiments were conducted to assess the performance of the proposed control architecture for
the Meal-assistance robot. Namely, the following three key modules were evaluated: (i) Activity State, (ii) User State, and (iii) Behaviour Deliberation.
Activity State Module
For the Activity State module, a series of repeatability tests were conducted to evaluate the
module’s ability to detect changes in activity state. Each activity state parameter was tested by
“consuming” a meal multiple times. Pasta was used as the main dish food, fresh fruit was used as
the side dish food, and water was used as the beverage.
User State Module
The User State module was tested utilizing images from two validated facial expression
databases as well as real-time natural experiences from three volunteer participants. To test the
recognition for the user state angry, the User State module was tested with 115 front-facing
images of 31 different subjects displaying an angry expression. 88 images of 9 female and 13
male subjects were chosen from the Cohn-Kanade facial expression database [92]. 27 images of
9 female subjects were also chosen from the Japanese Female Facial Expression (JAFFE)
database [93]. Similarly, in order to test the recognition for the user state happy, the User State
module was tested with 124 front-facing images of 33 different subjects displaying a happy
expression. 96 images of 10 female and 14 male subjects were chosen from the Cohn-Kanade
facial expression database [92]. 28 images of 9 female subjects were chosen from the JAFFE
database [93]. All images used to test the user states angry and happy were coded with a
validated emotional label, which was determined by the database’s own coding system. The
distracted user state was tested using 824 images of the three participants. These participants
were instructed to look toward or away from the robot when directed to do so.
Behaviour Deliberation Module
Lastly, the Behaviour Deliberation module was tested via an in-lab HRI experiment consisting of six healthy adults (21 to 33 years old). Participants were instructed to mimic eating two meals while interacting with Brian 2.0. Figure 36 shows the experimental setup. Participants were asked to mimic eating by removing the food off the plates and out of the cup and pretending to eat/drink by bringing the food to their mouths. This experiment was used to test the robot behaviours determined for the proposed scenario. Behaviours determined from offline training were used in this experiment.
Figure 36: Experimental Setup for Performance Assessment (Meal-time Scenario)
Results and Discussions
Results from this set of experiments are presented in Tables 23 to 27. As shown in Table 23 and Table 24, high sensitivity and specificity were achieved for all the tested activity state parameters during the repeatability tests. The false negatives related to the consumption levels were mainly caused by undesired forces applied to the dishware during the act of obtaining food with the utensil. In general, when food is scooped or pierced with a utensil, a pressure is applied to the dish. If there is prolonged contact (i.e., 5 seconds), the module incorrectly detects that food has been added to the dish. In this test, the side dish was most sensitive to these actions. Errors related to weight change may be attributed to an insufficient amount of food/drink being obtained from the dish or cup. In these tests, the system could sense weight changes of 10 g or more. Utensil location errors are the result of inaccurate face location data, which may occur when the User State module cannot detect the location of the face. In this test, the person looked downwards while eating, causing the camera view of the facial features to be skewed. Namely, the User State module could not detect the facial features at this perspective and thus temporarily lost the detection of the face. In Table 24, the higher number of false positives for the detection of weight change in the side dish is attributed to vibrations caused by aggressively placing the cup into its holder.
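A minimal sketch of the weight-change logic discussed above, using the 10 g resolution and 5-second contact window from the text; the function and data format are hypothetical:

```python
WEIGHT_THRESHOLD_G = 10.0   # stated sensor resolution
CONTACT_WINDOW_S = 5.0      # stated prolonged-contact threshold

def detect_weight_change(samples):
    """samples: list of (timestamp_s, weight_g) over an observation window.
    Returns 'added', 'removed', or None for the net change."""
    if len(samples) < 2:
        return None
    t0, w0 = samples[0]
    t1, w1 = samples[-1]
    delta = w1 - w0
    if abs(delta) < WEIGHT_THRESHOLD_G:
        return None           # below sensor resolution
    if delta > 0 and (t1 - t0) < CONTACT_WINDOW_S:
        return None           # transient utensil pressure, not added food
    return "added" if delta > 0 else "removed"

# A brief 15 g increase from utensil pressure is filtered out,
# while a sustained 15 g decrease registers as consumption.
assert detect_weight_change([(0.0, 200.0), (1.0, 215.0)]) is None
assert detect_weight_change([(0.0, 200.0), (2.0, 185.0)]) == "removed"
```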
Table 23: Performance Results for Activity State Module (Meal-time Scenario): Sensitivity
Activity State Parameter                 False Negatives  True Positives  Sensitivity
Consumption Levels: Main Dish            5                402             99%
Consumption Levels: Beverage             11               1476            99%
Consumption Levels: Side Dish            25               567             96%
Weight Change: Main Dish                 0                121             100%
Weight Change: Beverage                  3                100             97%
Weight Change: Side Dish                 2                106             98%
Beverage: Picked up from tray            0                103             100%
Utensil Location: At tray                1                100             99%
Utensil Location: At mouth               1                99              99%
Utensil Movement: Not moving             3                803             100%
Utensil Movement: Moving to mouth        2                230             99%
Utensil Movement: Moving to tray         0                243             100%
Table 24: Performance Results for Activity State Module (Meal-time Scenario): Specificity
Activity State Parameter                 False Positives  True Negatives  Specificity
Consumption Levels: Main Dish            0                2079            100%
Consumption Levels: Beverage             0                999             100%
Consumption Levels: Side Dish            0                1894            100%
Weight Change: Main Dish                 0                314             100%
Weight Change: Beverage                  3                329             99%
Weight Change: Side Dish                 27               327             92%
Beverage: Picked up from tray            0                332             100%
Utensil Location: At tray                1                99              99%
Utensil Location: At mouth               1                100             99%
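For reference, the sensitivity and specificity values in Tables 23 and 24 follow the standard definitions, illustrated here with two rows taken from the tables:

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of actual positives that were detected."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Fraction of actual negatives that were correctly rejected."""
    return tn / (tn + fp)

# Main dish consumption level (Table 23): 402 true positives, 5 false negatives.
assert round(100 * sensitivity(402, 5)) == 99
# Side dish weight change (Table 24): 327 true negatives, 27 false positives.
assert round(100 * specificity(327, 27)) == 92
```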
Table 25 and Table 26 show high sensitivity and specificity for detecting the angry, happy, and distracted user states. False negatives in detecting the correct facial expression were caused by the limitation of the User State module in detecting very subtle expressions. This can be corrected by changing certain detection thresholds; however, it was found that adjusting these thresholds to detect subtle expressions also results in the detection of more false positives. Errors in detecting the distracted user state are attributed to losing the tracking or detection of key facial features. For example, the module may lose the location of an eye if the person tilts his/her head too far in one direction and the User State module can no longer detect the eye at that perspective.
Table 25: User State Module Performance (Meal-time Scenario): Sensitivity
User State No. of False Negatives No. of True Positives Sensitivity
Angry 13 102 89%
Happy 14 110 89%
Distracted 78 768 91%
Table 26: User State Module Performance (Meal-time Scenario): Specificity
User State No. of False Positives No. of True Negatives Specificity
Angry 1 123 99%
Happy 4 111 97%
Distracted 60 1632 96%
As seen in Table 27, the Behaviour Deliberation module has a high success rate in choosing the appropriate robot behaviour based on the person's current action. Sources of error that were found during this experiment include: (i) the person resting his/her hand on the dish, causing a false weight change, (ii) the person covering the infrared light with his/her hand, causing a false utensil location reading, (iii) vibration from slamming down the cup, causing a false weight change in the side dish, and (iv) the person leaving the utensil on the dish, causing a false consumption level reading.
Table 27: Performance of Behaviour Deliberation Module (Meal-time Scenario)
Expected Robot Behaviour No. of Trials No. of Failures Success Rate
Meal Start 12 0 100%
Obtain food from main dish 150 3 98%
Pick up beverage 78 2 97%
Obtain food from side dish 102 5 95%
Eat food 183 4 98%
Drink beverage 59 2 97%
Monitor 2 2 100%
Meal End 12 0 100%
85
5.2.1.2 Learning-based Decision Making
On-line training was utilized to adapt the system on the fly to the preferred motivation style of a person eating a meal. Five healthy adult participants (21 to 33 years old) interacted with the robot during two meals. The experimental setup is similar to the one utilized for the evaluation of the offline training procedure (i.e., Figure 36). In these experiments, the first level of customization is performed. Namely, during HRI, the robot learns the preferred motivation style for each second-level subtask. A successful action is defined to be when a person completes the requested task. For example, if the robot asks a person to obtain food from the main dish, then a successful action is when the person complies with the robot's request. For this set of experiments, the exploration policy for all subtasks is set to gradually reduce the exploration rate from 1 to 0 over 15 successful actions for each subtask.
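The exploration schedule can be sketched as follows; a linear decay is assumed as an illustration, since the text specifies only the endpoints (ε = 1 initially, ε = 0 after 15 successful actions per subtask):

```python
def epsilon(successes: int, horizon: int = 15) -> float:
    """Exploration rate after a given number of successful actions
    for a subtask; decays linearly from 1 to 0 over the horizon."""
    return max(0.0, 1.0 - successes / horizon)

assert epsilon(0) == 1.0       # fully explorative at the start
assert epsilon(15) == 0.0      # fully exploitative after 15 successes
assert 0.0 < epsilon(7) < 1.0  # mixed behaviour mid-way through
```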
Results and Discussions
The experimental results demonstrate the ability of the proposed on-line training procedure to allow the robot to learn its optimal motivation styles for each participant during HRI. Namely, the robot successfully explores different behaviours at every interaction based on the current exploration policy, and is rewarded when its behaviours enable the person to complete the requested action. Figures 37 to 39 show the details of the interactions for the main dish, beverage, and side dish, respectively, during the 1st and 2nd meals. The left column of each figure shows ten interactions during the 1st meal and the right column shows ten interactions during the 2nd meal. It can be seen that there is, as predicted, explorative behaviour during the first few interactions with the dish or beverage, where the robot is trying different behaviours. As the HRI scenario progressed into the 2nd meal, robot behaviours became more exploitative (i.e., >50% exploitative actions), which can be seen in the later interactions for all participants except Participants A, B, and C. For Participant A, the robot's behaviour is still very explorative for the Pick up beverage subtask since there had only been four successful actions prior to the interactions shown for the 2nd meal. For Participants A, B, and C, the robot's behaviour is still very explorative (i.e., >40% explorative actions) for the Drink beverage subtask.
Figure 37: Details of interactions involving the main dish
[Two panels per participant (A to E): the subtask requested and the person's activity state (Needs attention, Eating/Drinking, Obtained Food/Drink, Idle) at ten interactions during the 1st meal (left) and ten during the 2nd meal (right).
Subtask legend: M = Obtain food from main dish; B = Pick up beverage; S = Obtain food from side dish; E = Eat food; D = Drink beverage. Persuasion styles: Encourage, Cue, Orient.]
Figure 38: Details of interactions involving the beverage
[Two panels per participant (A to E): the subtask requested and the person's activity state at ten interactions during the 1st meal (left) and ten during the 2nd meal (right).
Subtask legend: M = Obtain food from main dish; B = Pick up beverage; S = Obtain food from side dish; E = Eat food; D = Drink beverage. Persuasion styles: Encourage, Cue, Orient.]
Figure 39: Details of interactions involving the side dish
[Two panels per participant (A to E): the subtask requested and the person's activity state at ten interactions during the 1st meal (left) and ten during the 2nd meal (right).
Subtask legend: M = Obtain food from main dish; B = Pick up beverage; S = Obtain food from side dish; E = Eat food; D = Drink beverage. Persuasion styles: Encourage, Cue, Orient.]
Figure 40 to Figure 44 show the rewards of all second-level subtasks during the experiment to illustrate how the rewarding of actions is implemented. After the two meals, one can observe which robot behaviours obtained the highest rewards and thus became the current optimal behaviours for each participant. For example, all of the participants preferred the persuasion style Encourage for the subtask Obtain food from main dish (Figure 40). For the Pick up beverage subtask (Figure 41), Encourage was the most effective for Participants C, D, and E, and Orient was the most effective for Participant A. For Participant B, there was no preference of persuasion style indicated for this subtask; namely, at the end of the two meals, the reward for all three robot actions is the same. Participants B to E preferred the Encourage persuasion style for the Obtain food from side dish subtask (Figure 42); whereas Participant A preferred the Orient actions more for
89
this subtask. For the Eat food subtask (Figure 43), Participants B to E preferred the Encourage
persuasion style and Participant A preferred the Cue persuasion style. Lastly, for the Drink
beverage subtask (Figure 44), the Cue style was preferred by Participants B-D and the
Encourage style was preferred by Participants A and E.
Figure 40: Rewards for the Obtain food from main dish subtask
Figure 41: Rewards for the Pick up beverage subtask
Figure 42: Rewards for the Obtain food from side dish subtask
Figure 43: Rewards for the Eat food subtask
Figure 44: Rewards for the Drink beverage subtask
[Each figure: five panels, one per participant (A to E), plotting reward against interaction number for the Encourage, Cue, and Orient motivation styles.]
5.2.2 Human-Robot Interaction Studies
In order to investigate the potential benefits of having the human-like embodied Brian 2.0
provide meal assistance, a comparison experiment between HRI and human-computer interaction
(HCI) was conducted. The specific aim of this experiment was to study users' acceptance of the
robot, as well as the effects of embodiment, for the socially assistive robot Brian 2.0 in the
Meal-time Scenario. The study consisted of one-on-one interaction scenarios between six healthy
adult participants (21 to 33 years of age) and the robot in a laboratory setting. An A/B test
design was used to compare users' acceptance of the meal-assistance robot across two conditions:
(i) a screen agent presented on a computer screen, and (ii) the embodied socially assistive
robot. Within this specific aim, the following hypothesis was addressed: the embodied robot
Brian 2.0, with both verbal and non-verbal (e.g., facial expressions, body language)
communication means, will have improved user acceptance over a screen agent.
During the baseline scenario, experimental set A, a table and chair set-up was used: the meal
tray with food was placed on the table in front of the person, and a still image of Brian 2.0
was shown on a computer screen, Figure 45a. The robot communicated only verbally in this
condition. Conversely, for the HRI scenario, experimental set B, the sessions were designed to
be identical to the baseline except that the physical robot Brian 2.0 was used, Figure 45b.
For both scenarios, each participant was invited to eat a meal while interacting with Brian 2.0.
Pasta and fresh fruit were used as the "food" to be consumed by the participants during HRI,
and water was used as the beverage. Participants were asked to mimic eating and drinking as in
the previous experiments. After each scenario, participants were directed to fill out a users'
acceptance questionnaire.
Figure 45: (a) Baseline Scenario A and (b) Scenario B for HRI Study (Meal-time Scenario)
The questionnaire utilized in this experiment is based on the technology acceptance model
developed by Heerink et al. [94] to evaluate the acceptance of social robots by elderly users.
The questions are grouped into eleven constructs, which are defined in Table 28. The complete
questionnaire is presented in Table 29. Participants were instructed to indicate their agreement
with each statement using a five-point Likert scale (5=strong agreement, 3=neutral, 1=strong
disagreement). The scale is reversed for statements in the Anxiety construct (i.e., 1=strong
agreement, 3=neutral, 5=strong disagreement).
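The per-construct scoring implied above can be sketched as follows. This is a minimal illustration, not the thesis's analysis code: the `construct_mean` helper, the excerpted statement-to-construct mapping, and the sample ratings are all assumptions introduced here; only the five-point scale and the reverse-scoring of the Anxiety items come from the text.

```python
# Sketch of construct scoring on a 5-point Likert scale, where the Anxiety
# statements (1-4) are reverse-scored so that 5 is always the favourable end.

REVERSED = {1, 2, 3, 4}                               # Anxiety statement numbers
CONSTRUCTS = {"ANX": [1, 2, 3, 4], "ATT": [5, 6, 7]}  # excerpt of the mapping

def construct_mean(responses, construct):
    """Average one participant's 1-5 ratings for a construct,
    flipping reverse-scored items (rating r becomes 6 - r)."""
    items = CONSTRUCTS[construct]
    scores = [6 - responses[i] if i in REVERSED else responses[i]
              for i in items]
    return sum(scores) / len(scores)

ratings = {1: 2, 2: 1, 3: 1, 4: 2, 5: 4, 6: 5, 7: 4}  # hypothetical answers
print(construct_mean(ratings, "ANX"))  # anxiety items flipped before averaging
print(construct_mean(ratings, "ATT"))
```

With reversal, low raw anxiety ratings yield a high (favourable) construct score, which keeps all eleven constructs directionally comparable.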
Table 28: Construct Definitions [94]
Code | Construct | Definition
ANX | Anxiety | Evoking anxious or emotional reactions when using the system.
ATT | Attitude | Positive or negative feelings about the appliance of the technology.
FC | Facilitating conditions | Objective factors in the environment that facilitate using the system.
ITU | Intention to use | The outspoken intention to use the system over a longer period in time.
PAD | Perceived adaptability | The perceived ability of the system to be adaptive to the changing needs of the user.
PENJ | Perceived enjoyment | Feelings of joy or pleasure associated by the user with the use of the system.
PEOU | Perceived ease of use | The degree to which the user believes that using the system would be free of effort.
PS | Perceived sociability | The perceived ability of the system to perform sociable behaviour.
PU | Perceived usefulness | The degree to which a person believes that using the system would enhance his or her daily activities.
SP | Social presence | The experience of sensing a social entity when interacting with the system.
Trust | Trust | The belief that the system performs with personal integrity and reliability.
Table 29: Users’ Acceptance Questionnaire [94]
Construct No. Statement
ANX
1 If I should use the robot, I would be afraid to make mistakes with it
2 If I should use the robot, I would be afraid to break something
3 I find the robot scary
4 I find the robot intimidating
ATT
5 I think it’s a good idea to use the robot
6 The robot would make my life more interesting
7 It’s good to make use of the robot
FC 8 I have everything I need to make good use of the robot.
9 I know enough of the robot to make good use of it.
ITU
10 I think I’ll use the robot again
11 I am certain to use the robot again
12 I’m planning to use the robot again
PAD
13 I think the robot can be adaptive to what I need
14 I think the robot will only do what I need at that particular moment
15 I think the robot will help me when I consider it to be necessary
PENJ
16 I enjoy the robot talking to me
17 I enjoy doing things with the robot
18 I find the robot enjoyable
19 I find the robot fascinating
20 I find the robot boring
PEOU
21 I think I will know quickly how to use the robot
22 I find the robot easy to use
23 I think I can use the robot without any help
24 I think I can use the robot when there is someone around to help me
25 I think I can use the robot when I have a good manual
PS
26 I consider the robot a pleasant conversational partner
27 I find the robot pleasant to interact with
28 I feel the robot understands me
29 I think the robot is nice
PU
30 I think the robot is useful to me
31 It would be convenient for me to have the robot
32 I think the robot can help me with many things
SP
33 When interacting with the robot I felt like I’m talking to a real person
34 It sometimes felt as if the robot was really looking at me
35 I can imagine the robot to be a living creature
36 I often think the robot is not a real person.
37 Sometimes the robot seems to have real feelings
Trust 38 I would trust the robot if it gave me advice
39 I would follow the advice the robot gives me
5.2.2.1 Results and Discussions
Table 30 shows the descriptive statistics for Scenarios A and B. In general, the robot in
Scenario B scored high, on average, on questions within the Attitude, Intention to Use,
Perceived Enjoyment, and Perceived Sociability constructs. Table 30 also shows the results of a
paired two-tailed t-test, performed to evaluate the differences between the results of the two
scenarios. Even though significance testing should be conducted with a larger group of
participants, the results here indicate whether any potential relationships exist. Statements
with a t-test result of "n/a" had exactly the same agreement scores in both scenarios. The
highlighted questions in Table 30 are those with a higher agreement score in Scenario B than in
Scenario A within this group, with a statistical significance of p≤0.20. Based on these results,
the overall hypothesis is supported for these particular questions. Namely, the embodied robot
Brian 2.0, with both verbal and non-verbal communication means, showed improved user acceptance
over a screen agent with respect to these particular statements.
Table 30: Users’ Acceptance Results
Columns: Construct | No. | Scenario A (HCI): Min, Max, Mean, Std. Dev. | Scenario B (HRI): Min, Max, Mean, Std. Dev. | Paired t-test: t-stat, p-value (two-tail)
ANX
1 1.0 4.0 1.8 1.2 1.0 5.0 2.7 1.6 -2.712 0.04
2 1.0 4.0 2.7 1.5 1.0 5.0 2.8 1.7 -0.307 0.77
3 1.0 3.0 1.7 1.0 1.0 4.0 2.7 1.2 -1.732 0.14
4 1.0 4.0 1.8 1.2 1.0 4.0 2.2 1.2 -0.542 0.61
ATT
5 3.0 5.0 4.2 0.8 4.0 5.0 4.3 0.5 -0.542 0.61
6 3.0 5.0 4.3 0.8 3.0 5.0 4.2 0.8 -0.542 0.61
7 3.0 5.0 4.2 0.8 3.0 5.0 4.2 0.8 0.000 1.00
FC 8 2.0 4.0 3.2 1.0 1.0 4.0 3.2 1.3 0.000 1.00
9 1.0 4.0 2.7 1.2 1.0 5.0 3.5 1.6 -1.746 0.14
ITU
10 2.0 5.0 3.7 1.2 3.0 5.0 4.5 0.8 -1.536 0.19
11 3.0 5.0 3.8 1.0 3.0 5.0 4.3 0.8 -1.464 0.20
12 2.0 5.0 3.7 1.2 3.0 5.0 4.2 1.0 -0.889 0.42
PAD
13 3.0 5.0 3.7 0.8 3.0 4.0 3.5 0.5 0.542 0.61
14 2.0 4.0 3.3 1.0 1.0 5.0 3.5 1.4 -0.542 0.61
15 2.0 5.0 3.7 1.2 3.0 5.0 3.8 0.8 -0.542 0.61
PENJ
16 3.0 5.0 4.2 0.8 4.0 5.0 4.3 0.5 -0.542 0.61
17 2.0 5.0 3.8 1.2 4.0 5.0 4.3 0.5 -1.168 0.30
18 2.0 5.0 3.8 1.2 3.0 5.0 4.3 0.8 -1.464 0.20
19 2.0 5.0 3.2 1.2 3.0 5.0 4.3 0.8 -2.445 0.06
20 1.0 4.0 2.3 1.4 1.0 3.0 1.7 0.8 1.348 0.24
PEOU
21 2.0 5.0 3.7 1.0 2.0 5.0 3.8 1.2 -1.000 0.36
22 3.0 5.0 4.0 0.6 1.0 5.0 3.8 1.5 0.415 0.70
23 1.0 5.0 3.7 1.5 1.0 5.0 3.2 1.6 1.464 0.20
24 4.0 5.0 4.8 0.4 4.0 5.0 4.8 0.4 n/a n/a
25 3.0 5.0 4.7 0.8 3.0 5.0 4.7 0.8 n/a n/a
PS
26 2.0 5.0 3.7 1.2 3.0 5.0 4.2 0.8 -1.464 0.20
27 2.0 5.0 3.7 1.2 3.0 5.0 4.2 0.8 -1.464 0.20
28 2.0 4.0 2.5 0.8 2.0 4.0 3.2 0.8 -3.162 0.03
29 2.0 5.0 4.0 1.1 3.0 5.0 4.2 0.8 -1.000 0.36
PU
30 2.0 5.0 3.5 1.2 2.0 5.0 3.3 1.2 0.542 0.61
31 2.0 5.0 3.3 1.5 2.0 5.0 3.7 1.0 -0.791 0.47
32 2.0 5.0 2.8 1.2 1.0 5.0 3.0 1.4 -0.542 0.61
SP
33 2.0 4.0 3.0 0.9 1.0 5.0 2.7 1.9 0.598 0.58
34 2.0 5.0 3.7 1.0 2.0 5.0 3.7 1.4 0.000 1.00
35 1.0 5.0 2.7 1.6 1.0 5.0 2.8 1.8 -0.349 0.74
36 2.0 5.0 4.3 1.2 1.0 5.0 3.2 1.7 1.941 0.11
37 1.0 4.0 3.0 1.1 2.0 4.0 3.3 0.8 -0.598 0.58
Trust 38 3.0 4.0 3.7 0.5 3.0 5.0 4.0 0.6 -1.581 0.18
39 3.0 4.0 3.5 0.5 3.0 5.0 3.8 0.8 -1.581 0.18
5.2.2.2 Participant Survey Results
A post-experiment survey was administered to the participants after the HRI scenario to obtain
feedback on the robot’s behaviour during the Meal-time Scenario. The types of questions that
were asked in the survey pertained to the robot’s social intelligence attributes as well as the
participants’ overall impressions of the robot. The participants were asked to choose their
responses from a list of robot behaviours.
The participants were first asked to identify the robot behaviours they found most effective at
engaging them in the meal-assistance activity. They were also asked which characteristics of
the robot they liked the most and which helped keep them engaged in the interaction during the
meal. Table 31 and Table 32 summarize the responses to these two questions, ranked by the total
number of responses for each behaviour and characteristic, respectively.
Table 31: Robot Behaviours Effective at Engaging the Person in the Meal-time Scenario
Ranking | Robot Behaviours
1st | Providing verbal prompts/cues
2nd | Providing jokes
3rd | Providing comments about the food
4th | Providing non-verbal prompts/cues (e.g. robot pointing to the dish)
4th | Robot's facial expressions in reaction to your actions
5th | Providing encouragement
5th | Providing orienting statements
6th | Providing non-meal-related commentary about the interaction
Table 32: Most Liked Robot Characteristics
Ranking | Robot Characteristics
1st | The robot's human voice
2nd | The companionship the robot provides by just being there
3rd | The robot's life-like appearance and demeanour
4th | The robot expressing different emotions through facial expressions and different tones of voice
Chapter 6 Conclusion
6.1 Summary of Contributions
The primary contributions of this work are summarized in the following subsections.
6.1.1 Control Architecture for Socially Assistive Robots
A novel HRI control architecture has been developed to enable socially assistive robots to
effectively engage a person in cognitively stimulating person-centred activities. A modular
design approach has been applied to the overall control architecture, allowing for the addition
and/or substitution of different sensor modalities as needed based on the intended activity.
During HRI, the control architecture monitors the person’s user state and behaviour during the
activity and adapts the robot’s emotion-based behaviour to the current interactive scenario. This
control architecture has been successfully applied to the Memory Game and Meal-time HRI
scenarios.
6.1.2 Learning-based Robot Assistive Behaviours
A learning-based decision making module for the proposed control architecture has been
developed to determine the robot’s effective assistive behaviour during HRI. The module is
composed of two layers: (i) the Knowledge Clarification layer and (ii) the Intelligence layer.
The role of the Knowledge Clarification layer is to clarify the current state of the activity
and the person. Based on the current assistive interactive scenario, the Intelligence layer then
uses the MAXQ hierarchical reinforcement learning technique to determine the robot's behaviour.
An HRL approach is utilized to provide the robot with the ability to: (i) learn appropriate assistive
behaviours based on the structure of the activity, and (ii) personalize an interaction based on a
person’s behaviour or user state during HRI. The learning-based decision making module has
been successfully applied to the Memory Game and Mealtime HRI scenarios.
6.1.2.1 MAXQ for Assistive HRI Scenarios
To date, this work presents the first application of MAXQ for assistive HRI scenarios. The
MAXQ technique is an efficient solution compared to the traditional Q-learning approach for
multimodal HRI, where the latter requires the exploration of a large number of states and actions
and an extensive amount of experience to learn the optimal policy. MAXQ solves the
reinforcement learning problem more efficiently for assistive scenarios by decomposing the
overall assistive problem into a set of sub-problems. MAXQ also supports temporal abstraction,
state abstraction, and subtask abstraction which are important in the decision making process for
the socially assistive robot in an HRI scenario.
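The decomposition described above can be sketched in a few lines. This is an illustrative toy, not the thesis's implementation: the subtask names are loosely modelled on the meal-time scenario, and the value tables are invented. It shows only the core MAXQ identity from Dietterich [46], where the value of taking child subtask a within parent p splits into the child's own value V and a completion value C.

```python
# Toy MAXQ-style hierarchy: Q(parent, s, child) = V(child, s) + C(parent, s, child).
# Hypothetical subtasks and hand-picked values, for illustration only.

HIERARCHY = {
    "MealTime": ["ObtainFood", "EatFood", "DrinkBeverage"],  # composite root
}

# V(child, state): expected reward earned while executing the child subtask
V = {("ObtainFood", "s0"): 2.0,
     ("EatFood", "s0"): 1.0,
     ("DrinkBeverage", "s0"): 0.5}

# C(parent, state, child): expected reward for completing the parent afterwards
C = {("MealTime", "s0", "ObtainFood"): 3.0,
     ("MealTime", "s0", "EatFood"): 4.0,
     ("MealTime", "s0", "DrinkBeverage"): 5.0}

def q(parent, state, child):
    """MAXQ value decomposition for one level of the task hierarchy."""
    return V[(child, state)] + C[(parent, state, child)]

def greedy_child(parent, state):
    """Select the subtask maximizing the decomposed Q-value."""
    return max(HIERARCHY[parent], key=lambda a: q(parent, state, a))

print(greedy_child("MealTime", "s0"))
```

Because each subtask only sees the state variables relevant to it, this decomposition is what enables the temporal and state abstraction the text credits with making learning tractable for multimodal HRI.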
6.1.2.2 Affect-based Learning
In this work, affect-based learning is an essential part of the personalization of HRI. In affect-
based learning, the robot learns its optimal assistive behaviours based on the person’s affective
state during HRI. This is a novel addition to decision making processes of socially assistive
robots that engage individuals in varying socially and/or cognitively stimulating activities.
Typically, robots developed for these purposes only adapt their behaviours to task performance.
The aim in utilizing affect-based learning in this scenario is to select the robot’s behaviours in an
attempt to maintain positive affective states during interactions. It is postulated that this will, in
turn, enable the person to be more engaged in the activity.
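A reward signal of the kind described above might be shaped as in this sketch. The weighting scheme, function name, and numeric values are assumptions for illustration; the thesis specifies only that rewards depend on both task performance and the person's affective state, not this particular form.

```python
# Illustrative affect-based reward: combine a task outcome with an estimated
# affective valence so behaviours keeping the user positive earn more reward.

def assistive_reward(task_success, valence, w_task=1.0, w_affect=0.5):
    """task_success: +1 (completed) or -1 (not completed);
    valence: estimated affect in [-1, 1]; weights are hypothetical."""
    return w_task * task_success + w_affect * valence

print(assistive_reward(1, 0.8))   # success while the user is in a positive state
print(assistive_reward(1, -0.6))  # success, but the user is distressed
```

Under such a shaping, two robot behaviours that both lead to task completion are ranked by the affective state they induce, which is the personalization mechanism the paragraph describes.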
6.1.3 Metrics Explored for the Evaluation of HRI
Different evaluation metrics for HRI with Brian 2.0 were explored in this work. Namely, three
HRI studies were performed to investigate: (i) the effectiveness of the robot in engaging a person
in a cognitively stimulating activity, (ii) the ability of the robot to minimize task-induced stress
during a cognitively stimulating activity, and (iii) user acceptance of an embodied robot versus a
screen agent in a meal-time scenario.
6.2 Discussion of Future Work
Future work consists of optimizing the overall robotic system in order to perform a pilot study
with the robot at the ASBLab’s collaborative healthcare facility with elderly persons suffering
from mild cognitive impairment. Experimental findings and participant feedback have provided
insight into how the capabilities of Brian 2.0 can be further improved for future studies. With
respect to the development of the control architecture, the main area that should be focused on is
improving user state detection.
In both the Memory Game and Meal-time scenarios, only one sensor was used to detect the
affect component of user state. One sensor provided enough information to evaluate the proposed
control architecture; however, to obtain a more accurate indication of a person’s affect during
HRI, more inputs should be added. For example, additional sensors can be added to detect more
affect indicators (e.g. vocal intonation, body language, etc.). Moreover, the detection of more
human affective states (e.g. sad, surprised, fear, etc.) for existing inputs (i.e. heart-rate and facial
expression) can also be explored. In general, the detection of more affective states will allow the
robot to learn how to respond appropriately to human affect, resulting in more opportunities for
bidirectional emotion-based HRI.
6.3 Final Concluding Statement
This work provides valuable insight into the use of innovative robotic technologies as therapeutic
or assistive aids to manage dementia. Specifically, the proposed learning-based control
architecture provides a socially assistive robot with the necessary abilities to be an effective
social motivator in cognitively stimulating activities. As a social motivator, the robot can
effectively engage individuals in important activities to promote task completion, and thus,
reduce dependence on caregivers. Moreover, the architecture’s learning capabilities can increase
activity engagement over time by personalizing the robot’s behaviours. Namely, it can enable the
robot to learn which of its behaviours will promote positive user states and/or increase task
compliance. The affect detection aspect of this control architecture provides opportunities for
bidirectional emotion-based HRI, which are critical for cognitively impaired individuals who
have begun to lose their ability to communicate verbally. Proposed improvements to the User
State within the control architecture can potentially enhance the quality of the bidirectional
emotion-based HRI, resulting in HRIs that are more natural and believable.
References
[1] T.K. Tatemichi et al., "Cognitive impairment after stroke: frequency, patterns, and
relationship to functional abilities," Journal of Neurology, Neurosurgery & Psychiatry,
vol. 57, no. 2, pp. 202-207, 1994.
[2] A. Wimo and M. Prince, "World Alzheimer's Report 2010: The Global Economic Impact
of Dementia," Alzheimer's Disease International, London, 2010.
[3] E.F. LoPresti, R.C. Simpson, N. Kirsch, D. Schreckenghost, and S. Hayashi, "Distributed
cognitive aid with scheduling and interactive task guidance," Journal of Rehabilitation
Research and Development, vol. 45, no. 4, pp. 505-522, 2008.
[4] J.M. Wiener, R.J. Hanley, R. Clark, and J.F. Van Nostrand, “Measuring the Activities of
Daily Living: Comparisons across National Surveys,” Journal of Gerontology, vol. 45,
no. 6, pp. S229-S237, 1990.
[5] J.D. Churchill, R. Galvez, S. Colcombe, R.A. Swain, and W.T. Greenough, "Exercise,
experience, and the aging brain," Neurobiology of Aging, vol. 23, no. 5, pp. 941-955,
2002.
[6] G.H. Recanzone, "Cerebral cortical plasticity: perception and skill acquisition," in The
New Cognitive Neurosciences, M.S. Gazzaniga, Ed. Cambridge: MIT Press, 2000, pp.
237-247.
[7] P.B. Baltes and S.L. Willis, "Plasticity and enhancement of intellectual functioning in old
age: Penn State's Adult Development and Enrichment Project (ADEPT)," in Aging and
Cognitive Processes, F.I.M. Craik and S. Trehub, Eds. New York: Plenum Press, 1982,
pp. 353-390.
[8] C. Greenburg and S.M. Powers, "Memory improvement among adult learners,"
Educational Gerontology, vol. 13, no. 3, pp. 263-280, 1987.
[9] G. Rebok and L.J. Bacerak, "Memory self-efficacy and performance differences in young
and old adults: effects of mnemonic training," Developmental Psychology, vol. 25, no. 5,
pp. 714-721, 1989.
[10] S.L. Willis, "Cognitive training and everyday competence," Annual Review of
Gerontology and Geriatrics, vol. 7, no. 1, pp. 159-188, 1987.
[11] J.A. Yesavage, "Nonpharmacologic treatments for memory losses and normal aging," The
American Journal of Psychiatry, vol. 142, no. 1, pp. 600-605, 1985.
[12] K. Ball, D. Berch, and K. Helmers, "Effects of cognitive training interventions with older
adults," The Journal of the American Medical Association, vol. 288, no. 18, pp. 2271-
2281, 2002.
[13] A.T. Cianciolo and R.J. Sternberg, Intelligence: A Brief History. Malden, MA, USA:
Wiley-Blackwell, 2004.
[14] J.B. Jobe et al., "ACTIVE: A cognitive intervention trial to promote independence in
older adults," Controlled Clinical Trials, vol. 22, no. 4, pp. 453-479, 2001.
[15] M. Hofmann, C. Hock, A. Kuhler, and F. Muller-Spahn, “Interactive computer-based
cognitive training in patients with Alzheimer's disease,” Journal of Psychiatric Research,
vol. 30, no. 6, pp. 493-501, 1996.
[16] L. Tárraga et al., “A randomised pilot study to assess the efficacy of an interactive,
multimedia tool of cognitive stimulation in Alzheimer’s disease,” Journal of Neurology,
Neurosurgery & Psychiatry, vol. 77, no. 10, pp. 1116–1121, 2006.
[17] V.K. Günther et al., “Long-term improvements in cognitive performance through
computer-assisted cognitive training: a pilot study in a residential home for older people,”
Aging & Mental Health, vol. 7, no. 3, pp. 200–206, 2003.
[18] G. Cipriani, A. Bianchette, M. Trabucchi, “Outcomes of a computer-based cognitive
rehabilitation program on Alzheimer’s disease patients compared with those on patients
affected by mild cognitive impairment,” Archives of Gerontology and Geriatrics, vol. 43,
no. 3, pp. 327–335, 2006.
[19] L.E. Middleton and K. Yaffe, “Promising strategies for the prevention of dementia,”
Archives of Neurology, vol. 66, no. 10, pp. 1210-1215, 2009.
[20] L. Fratiglioni and H.X. Wang, “Brain reserve hypothesis in dementia,” Journal of
Alzheimer’s Disease, vol. 12, no. 1, pp. 11-22, 2007.
[21] J.S. Saczynski et al., "The effect of social engagement on incident dementia: the
Honolulu-Asia aging study," American Journal of Epidemiology, vol. 163, no. 5, pp. 433-
440, 2006.
[22] A. Karp et al., “Mental, physical and social components in leisure activities equally
contribute to decrease dementia risk,” Dementia and Geriatric Cognitive Disorders, vol.
21, no. 2, pp. 65-73, 2005.
[23] J. Kayser-Jones and E.S. Schell, “Staffing and the meal-time experience of long-term
care facility residents on a Special Care Unit," American Journal of Alzheimer's Disorder
and other Dementias, vol. 12, no. 2, pp. 67-72, 1997.
[24] C.C. Chen, L.S. Schilling, and C.H. Lyder, "A concept analysis of malnutrition in the
elderly," Journal of Advanced Nursing, vol. 36, no. 1, pp. 131-142, 2001.
[25] M. Matarić, A. Okamura and H. Christensen, “A Research Roadmap for Medical and
Healthcare Robotics,” NSF/CCC/CRA Roadmapping for Robotics Workshop,
Arlington/Washington, DC, 2008, pp. 1-30.
[26] S. Shibata and K. Wada, “Robot therapy-a new approach for mental healthcare of the
elderly,” Gerontology, vol. 57, no. 4, pp. 378-386, 2010.
[27] T. Hamada et al., “Robot therapy as for recreation for elderly people with dementia -
Game recreation using a pet-type robot,” IEEE International Symposium on Robot and
Human Interactive Communication, Munich, 2008, pp.174-179.
[28] A. Tapus, C. Tapus, and M.J. Mataric, “Long term learning and on-line robot behaviour
adaptation for individuals with physical and cognitive impairments,” Field and Service
Robotics: Springer Tracts in Advanced Robotics, 1st ed., vol. 62, A. Howard et al., Eds.,
Berlin/Heidelberg: Springer, 2010, pp. 389-398.
[29] B. Robins et al., “Human-centred design methods: developing scenarios for robot assisted
play informed by user panels and field trials,” International Journal of Human-Computer
Studies, vol. 68, no. 12, pp. 873-898, 2010.
[30] H. Kozima and C. Nakagawa, “Social robots for children: practice in communication-
care,” IEEE International Workshop on Advanced Motion Control, Istanbul, 2006, pp.
768-773.
[31] E. Ferrari, B. Robins, and K. Dautenhahn, “Therapeutic and educational objectives in
robot assisted play for children with autism,” IEEE International Symposium on Robot
and Human Interactive Communication, Toyama, 2009, pp. 108-114.
[32] J. Hoey, A. Von Bertoldi, T. Craig, P. Poupart, and A. Mihailidis, “Automated
Handwashing Assistance For Persons With Dementia Using Video and A Partially
Observable Markov Decision Process,” Computer Vision and Image Understanding, vol.
114, no. 5, pp. 503-519, 2010.
[33] K. Sim et al. "Improving the accuracy of erroneous-plan recognition system for Activities
of Daily Living," IEEE International Conference on e-Health Networking Applications
and Services (Healthcom), Lyon, 2010, pp.28-35.
[34] H. Si, S.J. Kim, N. Kawanishi, and H. Morikawa, "A Guidance System Based on Q-
Learning for Supporting Dementia Patient's Activities of Daily Living,” IEEE Consumer
Communications and Networking Conference, Las Vegas, NV, 2007, pp. 1199-1200.
[35] J. Pineau, M. Montemerlo, M. Pollack, N. Roy and S. Thrun, “Towards robotic assistants
in long-term care facilities: Challenges and results,” Robotics and Autonomous Systems,
vol. 42, no. 3-4, pp. 271-281, 2003.
[36] Z. Zhang and G. Nejat, “Human affective state recognition and classification during
human-robot interaction,” ASME Design Engineering Technical Conference, San Diego,
CA, 2009, DETC2009-87647.
[37] G. Nejat and M. Ficocelli, “Can I be of assistance? The intelligence behind an assistive
robot,” IEEE International Conference on Robotics and Automation, Pasadena, CA,
2008, pp. 3564–3569.
[38] J. Terao, L. Trejos, Z. Zhang, and G. Nejat, “The design of an intelligent socially
assistive robot for elderly care,” ASME International Mechanical Engineering Congress
and Exposition, Boston, MA, 2008, IMECE2008-67678.
[39] T. Fong, I. Nourbakhsh, and K. Dautenhahn, "A survey of socially interactive robots,"
Robotics and Autonomous Systems, vol. 42, no. 3-4, pp. 143-166, 2003.
[40] A. Lockerd and C. Breazeal, "Tutelage and socially guided robot learning," IEEE/RSJ
International Conference on Intelligent Robots and Systems, Sendai, 2004, pp. 3475-
3480.
[41] P. Ravindra, S. De Silva, T. Matsumoto, S.G. Lambacher, and M. Higashi, "Development
of a social learning mechanism for a humanoid robot," International Conference on
Intelligent Sensors, Sensor Networks and Information Processes, 2008, Sydney, pp. 243-
248.
[42] T. Prommer, H. Holzapfel, and A. Waibel, “Rapid simulation-driven reinforcement
learning of multimodal dialog strategies in human-robot interaction”, INTERSPEECH,
Pittsburgh, PA, 2006, pp. 1918-1921.
[43] F. Krsmanovic, C. Spencer, D. Jurafsky, and A.Y. Ng, “Have we met? MDP based
speaker ID for robot dialogue,” INTERSPEECH, Pittsburgh, PA, 2006, pp. 461-464.
[44] S.R. Schmidt-Rohr, M. Losch, R. Dillmann, “Human and robot behaviour modeling for
probabilistic cognition of an autonomous service robot,” IEEE International Symposium
on Robot and Human Interactive Communication, Munich, 2008, pp. 635-640.
[45] A. Tapus, C. Tapus, and M.J. Mataric, "Hands-off therapist robot behaviour adaptation to
user personality for post-stroke rehabilitation therapy," IEEE International Conference
on Robotics and Automation, Roma, 2007, pp. 1547-1553.
[46] T.G. Dietterich, “Hierarchical reinforcement learning with the MAXQ value function
decomposition,” Journal of Artificial Intelligence Research, vol. 13, no. 1, pp. 227-303,
2000.
[47] S. Dominguez, E. Zalama, J.G. Garcia-Bermejo, and J. Pulido, "Motivation and
competitive learning in a social robot," IEEE/RSJ International Conference on Intelligent
Robots and Systems, Nice, 2008, pp. 3826-3831.
[48] C. Breazeal et al., “Learning from and about others: towards using imitation to bootstrap
the social understanding of others by robots," Artificial Life, vol. 11, no. 1-2, pp. 31-62,
2006.
[49] S. Schmidt-Rohr, S. Knoop, M. Lösch, and R. Dillmann, "Reasoning for a multi-modal
service robot considering uncertainty in human-robot interaction," ACM/IEEE
International Conference on Human-Robot Interaction: Living with Robots, Amsterdam,
2008, pp. 249-254.
[50] P. Prodanov and A. Drygajlo, "Bayesian networks based multi-modality fusion for error
handling in human-robot dialogues under noisy conditions," Speech Communication, pp.
231-248, 2005.
[51] F. Doshi and N. Roy, "Spoken language interaction with model uncertainty: an adaptive
human-robot interaction system," Connection Science, vol. 20, no. 4, pp. 299-318, 2008.
[52] K. Georgila, J. Henderson, and O. Lemon, “User simulation for spoken dialogue systems:
learning and evaluation,” INTERSPEECH, Pittsburgh, PA, 2006, pp. 1065-1068.
[53] E. Levin, R. Pieraccini, and W. Eckert, “A stochastic model of human- machine
interaction for learning dialog strategies,” IEEE Trans. Speech Audio Process., vol. 8, no.
1, pp. 11–23, 2000.
[54] O. Pietquin, “Probabilistic framework for dialog simulation and optimal strategy
learning,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no.
2, pp. 589-599, 2006.
[55] H. Cuayahuitl, S. Renals, O. Lemon, and H. Shimodaira, “Human-computer dialogue
simulation using hidden Markov models,” IEEE Workshop on Automatic Speech
Recognition Understanding (ASRU), San Juan, PR, 2005.
[56] B.A. Shore, D.C. Lerman, R.G. Smith, B.A. Iwata, and I.G. DeLeon, “Direct assessment
of quality of care in a geriatric long term care facility,” Journal of Applied Behaviour
Analysis, vol. 28, pp. 435–448, 1995.
[57] D. Kuhn, B.F. Fulton, and P. Edelman, “Factors influencing participation in activities in
dementia care settings,” Alzheimer’s Care Quarterly, vol. 5, pp. 144–152, 2004.
[58] J. T. Cacioppo, M.E. Hughes, L.J. Waite, L.C Hawkley, and R.A. Thisted, “Loneliness as
a specific risk factor for depressive symptoms: Cross-sectional and longitudinal
analyses,” Psychology and Aging, vol. 21, no. 1, pp. 140–151, 2006.
[59] L. Buettner et al, “Therapeutic recreation as an intervention for persons with dementia
and agitation: an efficacy study,” American Journal of Alzheimer’s Disease and Other
Dementias, vol. 11, no. 5, pp. 4-12, 1996
[60] C. B. Hall et al., “Cognitive activities delay onset of memory decline in persons who
develop dementia,” Neurology, vol. 73, pp. 356-361, 2009.
[61] A.M. Kolanowski et al, “Capturing interests: therapeutic recreation activities for persons
with dementia,” Therapeutic Recreation Journal, vol. 35, pp. 220-235, 2001.
[62] J. Cohen-Mansfield, M.S. Marx, M. Dakheel-Ali, N.G. Regier, and K. Thein, “Can
persons with dementia be engaged with stimuli?” American Journal of Geriatric
Psychology, December 2009.
[63] S. Jeffery. (2008, July). Cognitive stimulation technique may prevent decline in healthy
elderly. Medscape Medical News [On-line]. Available:
http://www.medscape.com/viewarticle/577373
[64] G. Matthews et al., “Emotional intelligence, personality, and task-induced stress,”
Journal of Experimental Psychology: Applied, vol. 12, no. 2, pp. 96-107, 2006.
[65] M.C. Pardon, “Therapeutic potential of some stress mediators in early Alzheimer's
disease,” Experimental Gerontology, vol. 46, no. 2-3, pp. 170-173, 2010.
[66] D.G. Lowe, “Distinctive image features from scale-invariant keypoints,” International
Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[67] M. Agarwal, "2D/3D Object Detection Techniques," University of Toronto, B.A.Sc.
Thesis 2011.
[68] A. Lee, T. Kawahara, “Recent Development of Open-Source Speech Recognition Engine
Julius,” Asia-Pacific Signal and Information Processing Association Annual Summit and
Conference, Sapporo, 2009, pp. 131-137.
[69] VoxForge. (2010, Dec. 17). VoxForge Downloads [On-line]. Available:
http://www.voxforge.org/home/downloads
[70] Nemesysco. Ltd. (2006) Voice-Analysis Tools for Security & Commercial Use [On-line].
Available: http://www.nemesysco.com
[71] A. Heinzel et al., “Differential modulation of valence and arousal in high-alexithymic and
low-alexithymic individuals,” Neuroreport, vol. 21, no. 15, pp. 998-1002, 2010.
[72] P.G. Jorna, “Spectral analysis of heart rate and psychological state: A review of its
validity as a workload index,” Biological Psychology, vol. 34, no. 2-3, pp. 237-257, 1992.
[73] R.G. Hansson, “Considering social nutrition in assessing geriatric nutrition,” Geriatrics,
vol. 33, no. 3, pp. 49-51, 1978.
[74] E.S. Schell and J. Kayser-Jones, “The effect of role-taking on caregiver-resident meal-
time interaction,” Applied Nursing Research, vol. 12, no. 1, pp. 38-44, 1999.
[75] C.L. Osborn and M. Marshall, "Promoting meal-time independence," Geriatric Nursing,
vol. 13, no. 5, pp. 254-258, 1992.
[76] C.R. Hellen, "Eating: An Alzheimer's activity," American Journal of Alzheimer’s
Disorder and other Dementias, vol. 5, no. 2, pp. 5-9, 1990.
[77] M. Coyne and L. Hoskins, "Improving Eating Behaviors in Dementia Using Behavioral
Strategies," Clinical Nursing Research, vol. 6, no. 3, pp. 275-290, 1997.
[78] A. Do and A. Saad, "Development of a Sensory System for a Meal-assistance Socially
Assistive Robot," Autonomous Systems and Biomechatronics Laboratory, University of
Toronto, Internal Report 2011.
[79] SourceForge. (2008). Wiiuse [Online]. Available: http://sourceforge.net/projects/wiiuse/
[80] K. Choi, ""Face Tracking Program"," Autonomous Systems and Biomechatronics
Laboratory, University of Toronto, Internal Report 2010.
[81] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple
features,” IEEE Conference on Computer Vision and Pattern Recognition, 2001, pp. 511-
518.
[82] T. Watson. (2010). Auto Smiley – Computer vision smile generator. FAT LAB: free art &
technology [Online]. Available: http://fffff.at/auto-smiley/
[83] M. Bartlett, G. Littlewort, C. Lainscsek, I. Fasel, and J. Movellan, “Machine learning
methods for fully automatic recognition of facial expressions and facial actions,” IEEE
International Conference on Systems, Man & Cybernetics, 2004, pp. 592-597.
[84] P. Ekman and W. Friesen. Facial Action Coding System: A Technique for the
Measurement of Facial Movement. Consulting Psychologists Press, Palo Alto, CA, 1978.
[85] P. Ekman and J. Hager. (1998). Facial Action Coding System Affect Interpretation
Database (FACSAID) [Online]. Available: http://face-and-
emotion.com/dataface/facsaid/description.jsp
[86] T.G. Dietterich, "An Overview of MAXQ Hierarchical Reinforcement Learning," in
Abstraction, Reformulation, and Approximation, B. Choueiry and T. Walsh, Eds.
Berlin/Heidelberg: Springer, 2000, pp. 26-44.
[87] C. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279-292,
1992.
[88] M.C. Silveri, G. Reali, C. Jenner, and M. Puopolo, “Attention and memory in the
preclinical stage of dementia,” Journal of Geriatric Psychiatry and Neurology, vol. 20,
no. 2, pp. 67-75, 2007.
[89] B. Brenske, E.H. Rudrud, K.A. Schulze, and J.T. Rapp, “Increasing activity attendance
and engagement in individuals with dementia using descriptive prompts,” Journal of
Applied Behavior Analysis, vol. 41, no. 2, pp. 273-277, 2008.
[90] G. Matthews et al., “Emotional intelligence, personality, and task-induced stress,”
Journal of Experimental Psychology: Applied, vol. 12, no. 2, pp. 96-107, 2006.
[91] M.C. Pardon, “Therapeutic potential of some stress mediators in early Alzheimer's
disease,” Experimental Gerontology, vol. 46, no. 2-3, pp. 170-173, 2010.
[92] T. Kanade, J.F. Cohn, and Y. Tian, “Comprehensive database for facial expression
analysis,” IEEE International Conference on Automatic Face and Gesture Recognition,
Grenoble, 2000, pp. 46-53.
[93] M.J. Lyons, S. Akamatsu, M. Kamachi, and J. Gyoba, "Coding Facial Expressions with
Gabor Wavelets," IEEE International Conference on Automatic Face and Gesture
Recognition, Nara, 1998, pp. 200-205.
[94] M. Heerink, B. Krose, V. Evers, and B. Wielinga, “Measuring acceptance of an assistive
social robot: a suggested toolkit,” IEEE International Symposium on Robot and Human
Interactive Communication, Toyama, 2009, pp. 528-533.
Appendix
A.1 List of My Publications
[1] J. Chan and G. Nejat, “Designing intelligent socially assistive robots as effective tools in
cognitive interventions,” International Journal of Humanoid Robotics, vol. 8, no. 1, pp.
103-126, 2011.
[2] J. Chan and G. Nejat, “Minimizing Task-Induced Stress in Cognitively Stimulating
Activities using an Intelligent Socially Assistive Robot,” IEEE International Symposium
on Robot and Human Interactive Communication, Atlanta, GA, 2011, In print.
[3] J. Chan and G. Nejat, “A learning-based control architecture for an assistive robot
providing social engagement during cognitively stimulating activities,” IEEE
International Conference on Robotics and Automation, Shanghai, 2011, In print.
[4] J. Chan and G. Nejat, “The design of an intelligent socially assistive robot for person-
centered cognitive interventions,” ASME International Design Engineering Technical
Conferences, Montreal, 2010, IDETC2010-26861.
[5] J. Chan and G. Nejat, "Promoting engagement in cognitively stimulating activities using
an intelligent socially assistive robot," IEEE/ASME International Conference on
Advanced Intelligent Mechatronics, Montreal, 2010, pp. 533-538.