LONG PAPER
A cognitive framework for robot guides in art collections
Dimitrios Vogiatzis • Vangelis Karkaletsis
Published online: 15 July 2010
© Springer-Verlag 2010
Abstract A basic goal in human–robot interaction is to
establish such a communication mode between the two
parties that the humans perceive it as effective and natural;
effective in the sense of being responsive to the informa-
tion needs of the humans, and natural in the sense of
communicating information in modes familiar to humans.
This paper sets the framework for a robot guide to visitors
in art collections and other assistive environments, which
incorporates the principles of effectiveness and naturalness.
The human–robot interaction takes place in natural lan-
guage in the form of a dialogue session during which the
robot describes exhibits, but also recommends exhibits that
might be of interest to the visitors. It is also possible for the
robot to explain its reasoning to the visitors, with a view to
increasing transparency and consequently trust in the
robot’s suggestions. Furthermore, the robot leads the visi-
tors to the location of the desired exhibit. The framework is
general enough to be implemented in different hardware,
including portable computational devices. The framework
is based on a cognitive model comprised of four modules: a
reactive, a deliberative, a reflective and an affective one.
An initial implementation of a dialogue system realising
this cognitive model is presented.
Keywords HCI · Dialogue system · Cognitive architecture · Recommender system · Explanations
1 Introduction
The present work aims to set the specifications for the
dialogue system of intelligent mobile robots that interact
with humans, while they operate and provide services in
populated environments. In particular, the specifications
are focused on robots serving as guides in museums, art
collections or cultural foundations, but they can be exten-
ded to other domains also, since the specifications are quite
general. Robots have been deployed as guides in art col-
lections for a long time (at least since 1999), but their focus
has been mostly on navigation, image recognition and
keeping track of people [5, 7, 31]. In human–robot inter-
action, the support of natural language (at least up to a
certain degree) has been somewhat neglected; most exist-
ing systems offer only prerecorded information
about the exhibits. A museum visitor, after entering the
premises, is in the following situation: there are too many
exhibits to visit, some of which might be of interest,
whereas many others are indifferent. It seems that some
guidance is necessary in order to personalise the visit and
make it better fulfil the goals and preferences of the visitor.
A mobile robot that interacts with visitors, offering
personalised information about specific exhibits and
advice about which exhibits to visit, could play this
guidance role. Interaction occurs
through a system that is equipped with a dialogue manager
enacting a dialogue session. The dialogue manager is based
on a modular cognitive model that comprises a reactive
module, a deliberative module and finally a reflective
module, starting from the bottom up to the top. These modules
perform the "thinking" process of the system, each acting
at a different level of cognition. In addition, the architec-
ture is enriched with an affective module that is able to
detect user emotions and express robot emotions. Thus, the
D. Vogiatzis (✉) · V. Karkaletsis
Institute of Informatics and Telecommunications, NCSR
"Demokritos", 153 10, Agia Paraskevi, Athens, Greece
e-mail: [email protected]
V. Karkaletsis
e-mail: [email protected]
Univ Access Inf Soc (2011) 10:179–193
DOI 10.1007/s10209-010-0199-3
aim is to address human–robot communication from two
sides: by enabling robots to correctly perceive and under-
stand natural human behaviour and by making them act in
ways that are familiar to humans. The proposed cognitive-
based model is a solid framework for an effective and
natural robot.
This paper is organised as follows. The next section
provides an overview of related work, as well as of this
paper’s contribution. Section 3 provides an overview of the
cognitive-based framework. The constituent parts are then
analysed. The reactive, deliberative and reflective modules
of the framework are elaborated in Sects. 4, 5, and 6,
respectively. An implementation of the framework as well
as some early evaluation results are exposed in Sect. 7.
Finally, Sect. 8 presents the conclusions and future
research directions.
2 Related work
The research presented here falls at the crossroads of dia-
logue, cognitive and recommender systems. While a robot
guide will integrate speech and gesture recognition as well
as text to speech generation, and it will be able to map the
surrounding space and navigate, these subjects are not
covered in the current work.
Among humans, dialogue represents perhaps the most
natural form of communication. A spoken dialogue sys-
tem (SDS) enables humans to communicate with com-
puters using speech as input, and the computer responds
with speech generation (see [17] for an overview of
spoken dialogue systems). Spoken dialogue systems are
divided into three major categories depending on who has
the control of the dialogue. In system-directed dialogue,
the system asks a sequence of questions to elicit the
required parameters of the task from the user. In user-
directed dialogue, the dialogue is controlled by the user,
who asks the system questions in order to obtain infor-
mation. Finally, in mixed-initiative dialogue, the dialogue
control alternates between the two participants. The user
can ask questions at any time, but the system can also
take control to elicit the required information or to clarify
ambiguous information. In all types of dialogue systems,
the dialogue has to be managed from the robot’s side, in
order to determine what questions the system should ask,
when and how it should respond to the requests of the
users.
Concerning dialogue management techniques, there are
three main types, i.e., state based, frame based and plan
based. The most common category is the state based which
is based on graphs. Dialogue is expressed as a network of
states connected by edges. The system at each state can
perform one of the following steps:
1. Ask the user for specific information, either providing
the expected answers or asking a specific question,
2. Generate a response to the user, or
3. Access an external application.
The assumption here is that the user should provide a
response among a limited number of particular answers,
thus the input at each state should be specific. This leads to
system-directed dialogue systems. The advantages of this
technique are faster development and more robust systems.
On the other hand, the usage of this technique results in
limited flexibility in the dialogue structure. CSLU,1
Dipper2 and TrindiKit3 are representative examples of
platforms that support the building of state-based dialogue
systems.
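The state-graph idea above can be sketched in a few lines of Python. The states, prompts and class names below are illustrative inventions, not taken from CSLU, Dipper or TrindiKit:

```python
# Minimal sketch of a state-based dialogue manager: dialogue is a graph of
# states, and each state asks a question with a limited set of expected
# answers, each answer leading along an edge to the next state.

class State:
    def __init__(self, prompt, expected):
        self.prompt = prompt        # question the system asks in this state
        self.expected = expected    # expected answer -> next state name

class StateDialogueManager:
    def __init__(self, states, start):
        self.states = states
        self.current = start

    def ask(self):
        return self.states[self.current].prompt

    def answer(self, user_input):
        """Advance along an edge only if the answer is one of the expected ones."""
        expected = self.states[self.current].expected
        if user_input in expected:
            self.current = expected[user_input]
            return True
        return False                # unexpected input: stay in state, re-ask

states = {
    "greet": State("Would you like a tour? (yes/no)",
                   {"yes": "pick_period", "no": "bye"}),
    "pick_period": State("Which period: archaic or classical?",
                         {"archaic": "bye", "classical": "bye"}),
    "bye": State("Goodbye!", {}),
}

dm = StateDialogueManager(states, "greet")
print(dm.ask())      # Would you like a tour? (yes/no)
dm.answer("yes")
print(dm.ask())      # Which period: archaic or classical?
```

The restriction to expected answers is exactly what makes such systems system-directed, robust and fast to build, at the cost of dialogue flexibility.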
In frame-based approaches, each frame represents a task
or subtask and it has slots representing the pieces of
information that the system needs in order to complete the
task. Frame-based dialogue managers ask the user ques-
tions in order to fill in the slots of the frame in a certain
order, but they also allow the user to guide the dialogue
by providing information that fills in slots according to
individual preference. Therefore, with this technique,
mixed-initiative systems can be built. This leads to shorter
dialogues compared to state-based approaches. However,
user utterances become less restricted and hence harder to
predict, which increases the time needed to develop a
robust system.
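A minimal slot-filling sketch (all frame and slot names are hypothetical) shows how the user's freedom to volunteer slots in any order gives rise to mixed initiative:

```python
# Sketch of frame-based (slot-filling) dialogue management: the system asks
# for unfilled slots in order, but the user may volunteer any slot at any
# time, which shortens the dialogue compared to a fixed state graph.

class Frame:
    def __init__(self, slots):
        self.slots = {s: None for s in slots}

    def next_question(self):
        """Ask for the first unfilled slot; None means the frame is complete."""
        for slot, value in self.slots.items():
            if value is None:
                return f"Please tell me the {slot}."
        return None

    def fill(self, **user_given):
        """Accept whichever slots the user chose to provide, in any order."""
        for slot, value in user_given.items():
            if slot in self.slots:
                self.slots[slot] = value

tour = Frame(["period", "artefact_type", "duration"])
print(tour.next_question())      # system initiative: asks for the period
tour.fill(artefact_type="pottery", period="archaic")   # user volunteers two
print(tour.next_question())      # only the duration is still missing
```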
Plan-based techniques are used in building dialogue
systems, where the pieces of information or actions needed
to perform a task are hard to predict in advance. This type
of technique, in contrast with the other two types, does not
depend on task modelling (using graphs or frames). It
concentrates instead on identifying the user’s plan and
determining how the system can contribute towards the
execution of that plan. This is a dynamic process, where
new information from the user may either contribute to the
system’s perception of the user’s plan or force the system
to modify it. Consequently, more initiatives are allowed by
the user, at the cost of far more complex implementation
and maintenance compared to the previously mentioned
approaches.
Recent research has started to examine the efficient use
of domain knowledge in dialogue systems [18]. Domain
knowledge is knowledge about the environment in which
the system operates. It contains domain-specific concepts,
concept properties and relations, such as concept x is-part-
of concept y. The ontology in ontology-based dialogue
systems drives interpretation, generation and the interac-
tion of the dialogue system with the user. For instance, the
1 http://www.cslu.cse.ogi.edu/toolkit/
2 http://www.ltg.ed.ac.uk/dipper/
3 http://www.ling.gu.se/projekt/trindi/trindikit/
domain-specific lexicon and grammar of the automatic
speech recognition (ASR) component can be partially
derived from the ontology. Furthermore, the natural language
generation (NLG) component can generate descriptions of the ontology
objects. Dialogue systems with a domain knowledge
backbone benefit from reduced reliance on hand-crafted
linguistic components, more flexible dialogues and, in
general, they can be ported more easily to new application
domains.
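How an ontology can feed both the ASR lexicon and the NLG component can be sketched as follows; the toy ontology and function names are invented for illustration and do not reflect any particular system:

```python
# Sketch: a domain ontology as the backbone of a dialogue system. The ASR
# vocabulary is partially derived from concept names and property values,
# and a template-based NLG routine turns property triples into object
# descriptions. The tiny ontology below is hypothetical.

ontology = {
    "amphora": {"is-part-of": "pottery-collection",
                "made-by": "Exekias", "period": "archaic"},
    "kouros": {"is-part-of": "sculpture-collection", "period": "archaic"},
}

def derive_lexicon(onto):
    """Collect concept names and property values as ASR vocabulary."""
    words = set()
    for concept, props in onto.items():
        words.add(concept)
        words.update(str(v) for v in props.values())
    return words

def describe(onto, concept):
    """Template-based NLG over the concept's properties."""
    props = onto[concept]
    parts = [f"{p.replace('-', ' ')} {v}" for p, v in props.items()]
    return f"The {concept} ({'; '.join(parts)})."

print(sorted(derive_lexicon(ontology)))
print(describe(ontology, "amphora"))
```

Because both components read the same ontology, porting the system to a new collection mainly means swapping the ontology, which is the portability benefit noted above.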
Building artificial systems that interact intelligently and
naturally with humans is a difficult task. The field of
cognitive systems contributes in this effort by creating
systems that are based on the cognitive processes or
architectures in humans. Cognition can be understood as a
sequence of steps, each step feeding the next one. It begins
with a perception of the outside world and continues with
the representation of the perceived information. Then the
newly acquired information is processed along with older
pieces of information to produce actions towards the out-
side world, as well as to change the internal state of the
system. Thus, a cognitive system reacts upon events in the
environment, and deliberates anticipating future events.
Moreover, it can handle situations that have not been
preprogrammed and is robust upon unexpected events.
Additionally, it interacts intelligently with other social
agents, including humans. Cognitive systems are at the
crossroads of artificial intelligence, linguistics and psy-
chology. An overview of cognitive system architectures
can be found in [15] and [34].
There are two main stances for implementing a cogni-
tive system. The first, known as cognitivist approach,
involves symbols and rules. Symbols are related to entities
of the surrounding world, and rules tell how to process the
symbols. Ultimately, it is based on the physical symbol
system hypothesis that was stated by Newell and Simon
[21]. The second major approach is described as emergent
and incorporates techniques from connectionism and
dynamic systems. In such systems, knowledge is repre-
sented in a more opaque, subsymbolic manner. The sub-
symbolic representation is usually defined by the origin of
the information (e.g., sensors or database), rather than by
the content of the information. Moreover, in an emergent
system with multiple streams of information, knowledge is
usually stored in a distributed fashion. The need for com-
bination of cognitivist and emergent systems arises from
the need for both symbolic and subsymbolic processing.
This combination leads to hybrid systems.

A key enabling
technology for next generation robots for the service,
domestic and entertainment market is human–robot inter-
action. A robot that collaborates with humans on a daily
basis (be this in care applications, in a professional or
private context) requires interactive skills that go beyond
keyboards, button clicks or even natural language. For this
class of robots, human-like interactivity is a fundamental
part of their functionality. To this end, virtual emotion
expression and human emotion recognition must be inte-
grated in a cognitive system. Handling emotions in a
computational system has been advocated for a long time
[26]. Essentially, the argument for affective computers is
that they can exhibit a more natural and intelligent
behaviour in interaction with humans. The idea of incor-
porating emotions in dialogue systems is gaining momen-
tum [2].
The authors were involved in the projects Xenios4 and
Indigo,5 which both involved robots operating in museums
and serving as guides to the exhibits. In particular, in both
projects robots were deployed at the "Foundation of
the Hellenic World", which is a cultural institution for the
presentation of Hellenic history.6 In the Xenios project, the
robotic system involved a state-based spoken dialogue
system that could talk about specific exhibits of a museum,
upon the user’s request. In the Indigo project (a continua-
tion and extension of Xenios), the robotic platform enabled
a more sophisticated human–robot interaction. The dia-
logue form was more complex, and the platform included
an affective module that could express robotic emotions by
altering the robot’s facial features.
The present work, based on the experience from those
projects, advocates the introduction of a complex cog-
nitive architecture (inspired by the work of Minsky [19])
to support a more natural form of robot–human interac-
tion, which is enhanced with an affective module. There
is also a recommender module, which aims at making
suggestions to the visitor about exhibits that might be of
individual interest, based on analysing his past prefer-
ences and his similarity to other users. In addition, the
reasoning behind the suggestions can be explained to the
user.
Another line of research work is the conversational
recommender systems, which actively involve the user in
the formation of a recommendation of a product or service.
The user’s participation can take the form of a dialogue
session or simply of a selection of the most relevant items
from a list of n items; thus, the user provides feedback that
can be extremely useful, especially in differentiating
between his short-term interests and his longer-term
interests. Conversational recommender systems were ini-
tially developed as an enhancement to content-based sys-
tems, but recently they have found a fertile ground in
collaborative systems [13, 27].
4 http://www.ics.forth.gr/xenios/
5 http://www.ics.forth.gr/indigo/contact.html
6 http://www.fhw.gr/fhw/en/home/index.html
2.1 Contribution
This paper sets a framework of intelligent interaction
between a machine agent and a human. The suggested
framework incorporates many modalities of interaction,
creating a whole, which is perceived as natural and intel-
ligent by humans. The modalities are separate modules
connected to the framework, and they include natural
language generation, natural language understanding and
robot navigation.
The framework is based on an enhanced variation of a
three-level cognitive architecture that has been proposed in
[30], and it allows the engineering of many possible
machine agents in different incarnations. In the current
work, the framework is implemented as a mobile robot, but
the strength of the framework is that it is flexible enough to
be implemented on different hardware, such as PDA
devices. Moreover, the framework reduces what is con-
sidered intelligent interaction to the reaction, delibera-
tion and reflection layers. The current form of the
framework clarifies the interplay of its different constitu-
ents, and the ways they are combined to create a natural
form of interaction. The framework models the user that
interacts with it. The modelling aims to predict user future
preferences. The model is quite complex, as it includes past
user interaction, user similarity to previous users and user
emotional state. In addition, the robot is able to express
emotions. It also differentiates between what is to be
expressed by the machine agent, and how it will be
expressed. Finally, the framework provides for explana-
tions of the robot’s utterances and is able to adapt to the
desires of the current user. This is achieved with the aid of
a recommendation engine that forms part of the
deliberative level of the framework.
The framework was implemented in a workable proto-
type, and its feasibility has been demonstrated at the
premises of an art collection.
3 Robot framework
The dialogue system according to the proposed framework
consists of resources and modules (see Fig. 1). The mod-
ules are the Dialogue System Manager (DSM), the Natural
Language Generation (NLG) engine, the Personalisation
Server (PServer), the Automatic Speech Recognition (ASR)
engine, the Text To Speech (TTS) synthesiser, a display
depicting a robot face and the Gesture Recogniser. Finally,
there is a communication server, which enables the inter-
module communication.

[Fig. 1 Modules and resources of the robotic platform: ASR, NLI, NLG,
TTS, robot face, gesture recognition, navigation, dialogue manager &
communication server, recommendation engine, explanation engine,
user models (PServer), domain ontology, robot personality, application
databases]
The DSM is the "actor" of the whole dialogue system,
in the sense that it is the module that invokes and coordi-
nates all the other modules. In order to decide the next
dialogue state and the text it will utter (through the TTS
unit), it takes into account the cognitive model, the inter-
action history of the user, as well as the location of the
robot. All the above contribute to creating more natural
dialogues. The DSM also communicates with the robot
navigation module, which controls the robotic movement,
and the gesture recogniser for understanding simple ges-
tures from the visitor. The DSM is the cognitive centre of
the entire system, and its constituent parts are the reactive,
deliberative, reflective and affective modules, which are
discussed separately in the next sections (see Fig. 2 for a
conceptual view of the cognitive structure). The cognitive
structure has been influenced by the work of Sloman in
[30]. The modules are not domain specific; consequently,
they can be easily transferred to other domains.
The resources of the dialogue system are as follows: the
robotic personality, which influences the expression of
robotic emotions; the resources of the NLG engine; user
models; databases recording the interaction history of each
user; and some databases that hold canned text and other
information used during the dialogue.
4 Reactive module
The reactive module is the lowest in terms of cognitive
abilities. It involves no decision and no planning. It just
reacts to the user's requests without any deliberation. It has rules
of the form <if> condition <do>, or rules of the form if
<situation and goal> then <do>. For instance, the first type
of rules might capture the fact that when the user talks, the
dialogue system tries to understand. The second type of
rules, refer to acting, provided that the action matches a user
set goal. For instance, the user wants to be informed about an
exhibit that is located far away from his current location, and
places this query to the robot. As the robot traverses the
museum rooms, guiding the user to the exhibit of interest, it
encounters other exhibits that it completely ignores. Other
functions performed by the reactive module include:
– if there is an obstacle in the course of the robot, it tries
to avoid it
– if the robot is close to an exhibit of interest to the user,
it stops and talks about it
– quits the tour upon relevant request by the user.
In order to achieve the above, the robot needs to form a
mapping of the surrounding space and also to be able to
recognise human gestures. The ASR module is also part
of the reactive module.
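The two rule forms of the reactive module might be sketched as follows; the situation keys and action names are hypothetical:

```python
# Sketch of the reactive module's two rule forms:
#   <if> condition <do>                 (fires on the situation alone)
#   if <situation and goal> then <do>   (fires only if it serves a user goal)
# An exhibit that does not match the user-set goal is simply ignored.

def reactive_step(situation, goal):
    # Type 1: condition -> action, no goal involved
    if situation.get("obstacle_ahead"):
        return "avoid_obstacle"
    if situation.get("user_said_quit"):
        return "quit_tour"
    # Type 2: situation AND goal -> action
    if situation.get("near_exhibit") == goal.get("target_exhibit"):
        return "stop_and_describe"
    # Exhibits encountered en route that the user did not ask about
    return "keep_moving"

goal = {"target_exhibit": "amphora"}
print(reactive_step({"near_exhibit": "kouros"}, goal))   # keep_moving
print(reactive_step({"near_exhibit": "amphora"}, goal))  # stop_and_describe
print(reactive_step({"obstacle_ahead": True}, goal))     # avoid_obstacle
```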
Next, two examples of a user’s request that invoke the
reactive module are presented.
On the other hand, a request such as the following
cannot be handled by the reactive module:
U: I want something interesting.
5 Deliberative module
The deliberative module allows the robot to exhibit a
sophisticated and complex behaviour in a museum that
renders the interaction with a human more natural. Delib-
eration is placed at a "higher level" than reaction. The
point of focus here is the dialogue system of the robotic
platform. This is not to say that navigation, machine vision
and speech recognition do not require sophisticated algo-
rithms and possibly planning, but they do not form an
integral part of the proposed framework.

[Fig. 2 Cognitive system architecture of the dialogue manager]
In the current framework, the dialogue form is dynamic
in the sense that the next robot utterance depends on the
history of interaction of the user, his current location, his
stereotype, the museum’s ontology and some other factors
that are analysed below. That is, the dialogue form is not
preset; rather, the next dialogue state is generated dynam-
ically by a complex deliberation process. It should be
mentioned that the dialogue manager is in control of the
dialogue session. In particular, the deliberation takes
decisions regarding the following (see also Fig. 3):
WHAT to talk about: It concerns the exhibit or the
museum program to talk about next to a specific user. It
depends on the ontology of the museum, the user’s past
interaction with the robot, other users’ interaction with the
robot, the visitor assimilation factor related to exhibits and,
finally, the robot’s preference.
HOW to express what is to be uttered. This depends on
the user’s model and on the robot’s emotional state.
The functionality of these modules is described next.
5.1 WHAT to talk about
5.1.1 Ontology based
Based on the museum's ontology, suggestions for exhibits
that are spatially related to the current position of the user
can be provided. In particular, suggestions are provided for
exhibits that are in close physical proximity to the location
of the user (close-to, next-to, right-to, left-to, above, below
the user).
This mechanism also provides suggestions for items that
are semantically related to what the user has seen. Some
interesting semantic similarities that can be exploited include
the following: subclasses of the same classes and entities of
the same class are related. Similar time periods are also
related, even if they do not refer to entities of the same
class. Finally, exhibits by the same designer are also
semantically related.
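The relatedness criteria listed above can be sketched as a simple predicate; the exhibit data and the 50-year period window are invented for illustration:

```python
# Sketch of ontology-based semantic relatedness: entities of the same class,
# entities from similar time periods, and exhibits by the same designer are
# considered related and become candidate suggestions.

exhibits = {
    "amphora": {"class": "pottery",   "year": -600, "designer": "Exekias"},
    "krater":  {"class": "pottery",   "year": -580, "designer": "unknown"},
    "kouros":  {"class": "sculpture", "year": -590, "designer": "unknown"},
    "frieze":  {"class": "sculpture", "year": -430, "designer": "Phidias"},
}

def related(a, b, period_window=50):
    ea, eb = exhibits[a], exhibits[b]
    if ea["class"] == eb["class"]:
        return True                                  # same class
    if abs(ea["year"] - eb["year"]) <= period_window:
        return True                                  # similar time period
    if ea["designer"] == eb["designer"] and ea["designer"] != "unknown":
        return True                                  # same designer
    return False

def suggest(seen):
    """Candidate next exhibits, semantically related to what was just seen."""
    return [e for e in exhibits if e != seen and related(seen, e)]

print(suggest("amphora"))   # krater (same class), kouros (similar period)
```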
Let us assume that the visitor has seen some pottery
samples from the archaic period. Then the robot makes the
following semantically related suggestion:
5.1.2 Recommendation
The DSM records the interaction history of each user, i.e.,
what each user has seen. The above information, along
with the user demographic data, constitute the user model.
The interaction history of all users is kept in a structured
form, which facilitates extraction of personal preferences
and the derivation of similarities among users. The
module that realises the above is the PServer. In partic-
ular, the PServer is equipped with a recommender module
that suggests items of possible interest to the current user
(see [1] for an overview of recommender systems). It can
provide content-based recommendations, that is, sugges-
tions that are similar to past user preferences. Thus, the
suggestion of an unseen exhibit to a particular user is
estimated based on similar items he has seen in the past.
In doing so, the system tries to discover commonalities
between exhibits seen in the past, which have been pos-
itively rated.
In particular, some of following features (fields) of
exhibits (entities) are used to detect commonalities: author,
historical period, type of artefact, etc.
Also, a collaborative recommender system is in place.
Such a system can suggest interesting exhibits to a user
based on items judged as interesting by a group of "similar-
minded" users. For instance, it might be discovered that
visitors interested in architecture are also interested in
sculpture. Thus, a new visitor, after having received
information about an architectural exhibit, might be offered
advice for seeing the collection of sculptures. A more
detailed exposition of the capabilities of the PServer is
provided in Sect. 7.3.
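The two recommendation modes can be illustrated with a toy ratings matrix; this is a sketch of the general techniques, not of the actual PServer interface:

```python
# Sketch of the two recommendation modes. Content-based: score unseen
# exhibits by feature overlap with positively rated ones. Collaborative:
# recommend what "similar-minded" visitors liked. All data are invented.

features = {
    "temple-model": {"architecture"}, "column": {"architecture"},
    "kouros": {"sculpture"}, "frieze": {"sculpture"},
}
# visitor -> {exhibit: rating in [0, 1]}
ratings = {
    "v1": {"temple-model": 1.0, "kouros": 0.9},
    "v2": {"temple-model": 0.9, "frieze": 1.0},
    "me": {"temple-model": 1.0},
}

def content_based(user):
    """Suggest the unseen exhibit most similar in features to liked ones."""
    liked = {e for e, r in ratings[user].items() if r >= 0.5}
    liked_feats = set().union(*(features[e] for e in liked))
    unseen = [e for e in features if e not in ratings[user]]
    return max(unseen, key=lambda e: len(features[e] & liked_feats))

def collaborative(user):
    """Suggest an unseen exhibit liked by the most similar other visitor."""
    overlap = lambda v: len(ratings[user].keys() & ratings[v].keys())
    neighbours = sorted((v for v in ratings if v != user),
                        key=overlap, reverse=True)
    for v in neighbours:
        for e, r in ratings[v].items():
            if r >= 0.5 and e not in ratings[user]:
                return e
    return None

print(content_based("me"))    # column: shares the architecture feature
print(collaborative("me"))    # an exhibit liked by a similar visitor
```

The architecture-to-sculpture example in the text corresponds to the collaborative path: the visitor "me" liked the temple model, and neighbours who liked it also rated sculptures highly.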
The system also takes into account acceptance of pre-
vious suggestions. For example, assuming that the user has
seen in the past a similar programme to the one suggested,
the following is a case of content-based recommendation,
where the user expresses aversion to the suggestion.

[Fig. 3 Functional relation of the modules of the dialogue manager]
In another example, the user has just seen a few exhibits
of the museum. A recommendation is produced from simi-
larly minded users' preferences:
5.1.3 Assimilation
Assimilation is a variable that controls whether text will be
generated and consequently uttered for an exhibit. As
already mentioned, the whole museum collection of
exhibits is represented by an OWL ontology. For every
entity of the ontology, the value of a parameter known as
assimilation variable can be set. This captures the degree
to which a particular entity has been comprehended by a
certain visitor stereotype. Thus, each time text is generated
for an exhibit, a counter is decreased by one, until it
reaches zero. After that, the exhibit is considered known to
a user and no more text can be generated even if the user
explicitly asks for it.
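The assimilation counter described above can be sketched as follows; in practice the initial counter values would be set per entity and visitor stereotype in the ontology:

```python
# Sketch of the assimilation variable: each ontology entity carries a
# counter; every text generation for that entity decrements it, and once
# it reaches zero the exhibit counts as assimilated, so no further text
# is generated even on explicit request. Values are illustrative.

class AssimilationTracker:
    def __init__(self, initial):
        self.counters = dict(initial)    # entity -> remaining generations

    def may_generate(self, entity):
        return self.counters.get(entity, 0) > 0

    def generated(self, entity):
        """Record that text was generated for this entity."""
        if self.counters.get(entity, 0) > 0:
            self.counters[entity] -= 1

tracker = AssimilationTracker({"amphora": 2})
for _ in range(3):
    if tracker.may_generate("amphora"):
        tracker.generated("amphora")     # text would be generated here
print(tracker.may_generate("amphora"))   # False: exhibit is assimilated
```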
5.2 HOW to express
Assuming the existence of certain plausible user stereo-
types, such as children, adults and experts, different user
stereotypes have different needs concerning a museum
tour. For example, children might favour shorter descrip-
tions of museum items with simpler words, and experts
might favour more elaborate descriptions. The user ste-
reotype can be inferred as follows. Assuming that child-
and adult-specific tickets are tagged with RFID, then a
robot equipped with an RFID reader could infer the user
stereotype.
The following is an example of a description of an exhibit
by the robot when addressing an adult and a child,
respectively. In the case of the child, the text is shorter and
uses simpler vocabulary.
5.2.1 Affective module
In the cognitive architecture, the affective module is
the "emotional" centre of the system. It is capable of rec-
ognising certain types of emotions, but also of expressing
emotions.
Thus, the robot is able to recognise the emotional con-
tent in the user’s utterance as he interacts. Currently, the
emotional recognition is limited to recognising polite,
impolite or neutral requests by the user. Recognition is
based on shallow parsing of the user's utterance. The emo-
tional content of the user’s utterance creates an emotional
impulse to the robot, which is represented according to the
OCC model [23].
Then, the next step is the expression of robot emotion. In
doing so, the emotional impulse is processed along with the
robot’s personality represented by an OCEAN vector and
the previous mood of the robot (represented as an OCC
vector). This will result in a new robotic emotion, which
can be directed towards influencing the facial robot fea-
tures on an LCD display. It is also possible to influence the
intonation of the text to speech system of the robot in order
to differentiate between positive and negative emotions.
The robot’s facial features change to reflect high interest
following the emotional response triggered by a motivated
user’s utterance.
The robot’s facial features can also change to reflect
disappointment, upon an unwelcome user’s request.
U: I am bored, I want to quit.
R: Ok. Let me take you to the exit.
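A drastically simplified sketch of the emotion-update step: a valence impulse derived from the recognised politeness class is blended with the previous mood, scaled by a personality trait. The scalar representation, the trait name and the weights are illustrative; the OCC and OCEAN models define far richer structures:

```python
# Sketch of the affective update: emotional impulse (from the user's
# utterance) + personality (OCEAN-style trait) + previous mood -> new mood.
# The new mood can then drive the robot's facial features and intonation.

def emotional_impulse(utterance_politeness):
    """Map the recognised politeness class to a valence impulse."""
    return {"polite": +1.0, "neutral": 0.0, "impolite": -1.0}[utterance_politeness]

def update_mood(prev_mood, impulse, personality, inertia=0.7):
    """New mood = inertia * old mood + (1 - inertia) * personality-scaled impulse."""
    # an agreeableness trait in [0, 1] scales how strongly impulses move the robot
    scaled = impulse * personality["agreeableness"]
    return inertia * prev_mood + (1 - inertia) * scaled

personality = {"agreeableness": 0.8}
mood = 0.0
mood = update_mood(mood, emotional_impulse("polite"), personality)
print(round(mood, 3))    # mood rises after a polite request
mood = update_mood(mood, emotional_impulse("impolite"), personality)
print(round(mood, 3))    # and falls after an impolite one
```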
6 Reflective module
The reflective module stands on top of the deliberation
and reactive modules and provides a management mech-
anism for the cognitive system. There are many reasons
for the existence of such a mechanism, and in particular
for revising the policies implemented by the two lower
level modules. There are cases when the deliberation
module fails to provide suggestions about exhibits that are
interesting to the users. As mentioned, the robotic sug-
gestions are based on semantic similarity, or on a rec-
ommender system that has two versions (a content-based
and a collaborative one). Hence, there are three strate-
gies for generating the next dialogue state with a sug-
gestion, and it is necessary to identify which one is most
adequate to a certain user. Whereas some a priori
assumptions can be stated about when each suggestion is
more applicable, these assumptions will sooner or later
turn out to be insufficient. This is a case where the reflective
mechanism is needed to revise or evolve the policy for
selecting the most appropriate strategy to generate the
next state of the dialogue, or in other words to decide the
content of an utterance. A simple and efficient way to set
up the management mechanism is to assign weights to the
different suggestions, and then to evolve the weights by
means of a genetic algorithm. This algorithm revises
the weights based on approval or disapproval detected
in users' utterances, which creates a positive or negative
reinforcement signal.
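The weight-revision idea can be sketched as a single evolutionary update step; a full genetic algorithm would maintain and recombine a population of weight vectors, and the step size here is illustrative:

```python
# Sketch of evolving the weights of the three suggestion strategies
# (semantic, content-based, collaborative) from the reinforcement signal
# extracted from the user's utterance. Approval grows the weight of the
# strategy that produced the suggestion; disapproval shrinks it.

import random

STRATEGIES = ["semantic", "content", "collaborative"]

def pick_strategy(weights, rng):
    """Choose the strategy for the next dialogue state, weight-proportionally."""
    return rng.choices(STRATEGIES, weights=[weights[s] for s in STRATEGIES])[0]

def evolve(weights, strategy, reinforcement, step=0.1):
    """Apply one +/- reinforcement update and renormalise the weights."""
    new = dict(weights)
    new[strategy] = max(0.01, new[strategy] + step * reinforcement)
    total = sum(new.values())
    return {s: w / total for s, w in new.items()}

rng = random.Random(0)
weights = {s: 1 / 3 for s in STRATEGIES}
# the collaborative suggestion was approved twice, the semantic one rejected
weights = evolve(weights, "collaborative", +1)
weights = evolve(weights, "collaborative", +1)
weights = evolve(weights, "semantic", -1)
print(max(weights, key=weights.get))   # collaborative now dominates
```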
Another important function of the reflective module is to
provide explanations of its suggestions.
6.1 Explanations
6.1.1 Opaque explanations
There are cases in which the visitor will request an
explanation of why the robot suggested an exhibit. First, it
might be on a purely informative basis, or the request
might be an indirect signal to the robot that it has
failed to provide adequate suggestions. Thus, the robot
needs to revise its policy. In any case, explainability
increases the visitor’s trust in the system.
Two different kinds of explanation are offered. The first
is the neighbour style explanation, which essentially pro-
vides a histogram of the similar users’ preferences (see
Fig. 4 and [11] for reference). This type of explanation is
pertinent in the case that the system employs the collabo-
rative-based recommendations.
Another type of explanation is based on the system’s
past-credibility, with respect to the current user.
The two aforementioned types of explanation are char-
acterised as opaque, since the reasoning is not based on the
features of the exhibit but rather on preferences of users.
6.1.2 Transparent explanations
In this case, the explanation is based on the features (fields)
of the suggested exhibit, which are matched against the
features of the user’s model (profile). For instance, if
the user’s model is that of Table 1, then the features of the
model (those below the occupation) will be matched
against the features of the suggested exhibit, and the fields
that provide a close match will be reported as an
explanation.

[Fig. 4 Similar users' preferences: histogram over Liked / Neutral / Disliked]

Table 1 Individual user model, comprising three attributes and 8
features. Numbers denote user's ratings or preference of the corre-
sponding features

  Age                          23
  Expertise                    Non-expert
  Gender                       Female
  Occupation                   Engineer
  Dedicated-to.athena          0.2
  Dedicated-to.zeus            0.2
  Dedicated-to.athena          0.2
  Located-in.ancient-agora     0.8
  Located-in.acropolis         0.1
  Located-in.keramikos         0.3
  Constructed-by.phedias       0.9
  Constructed-by.kallikrates   0.1
If the user asks a follow-up question requesting further
explanation of how the values of the user model fields
(features) were acquired in the first place, then the relevant
exhibits that the user has visited can be cited to back up the
current form of his model. This type of explanation is based
on the work reported in [20].
6.1.3 Explanations and natural language generation
As has been mentioned, the text that is communicated to
the user is generated by the NLG engine and is based on
resources such as the domain ontology and the microplans
(which are templates used by the NLG engine). Apart from the
exhibits that are described by the domain ontology, the
types of explanation that are offered are also included in
the domain ontology. The opaque and the transparent types
of explanation are subtypes of the explanation type and are
represented in OWL as follows:
<owl:Class rdf:ID="Transparent">
  <rdfs:subClassOf rdf:resource="#Explanations"/>
</owl:Class>

<owl:Class rdf:ID="Opaque">
  <rdfs:subClassOf rdf:resource="#Explanations"/>
</owl:Class>
The opaque type has two instances, namely the past-
credibility and the neighbour style, whereas the transparent
type has the feature-based instance.
<a:owl:Opaque rdf:ID="past-credibility">
  <a:owl:number xml:lang="EN">singular</a:owl:number>
</a:owl:Opaque>

<a:owl:Opaque rdf:ID="neighbour-style">
  <a:owl:number xml:lang="EN">singular</a:owl:number>
</a:owl:Opaque>

<a:owl:Transparent rdf:ID="feature-based">
  <a:owl:number xml:lang="EN">singular</a:owl:number>
</a:owl:Transparent>
The following are two datatype properties of the opaque
explanation type, namely the number of similar neighbours and the
percentage of past successes.

<owl:DatatypeProperty rdf:ID="nSimilarNeighbors"/>
<owl:DatatypeProperty rdf:ID="perPastSuccess"/>
The following is a microplan employed by the NLG module for
generating text for the field perPastSuccess. A microplan is a
succession of slots. In this case, the name of the microplan is
p1, and it is made of the following slots: the
pronoun I, the verb have been, the string correct,
a filler for the field value, which is a number, and
the string of the time.

<owlnl:MicroplanName>p1</owlnl:MicroplanName>
<owlnl:Pron>
  <owlnl:Val xml:lang="en">I</owlnl:Val>
</owlnl:Pron>
<owlnl:Verb>
  <owlnl:voice>active</owlnl:voice>
  <owlnl:tense>present perfect</owlnl:tense>
  <owlnl:Val xml:lang="en">have been</owlnl:Val>
</owlnl:Verb>
<owlnl:string>
  <owlnl:Val xml:lang="en">correct</owlnl:Val>
</owlnl:string>
<owlnl:Filler>
  <owlnl:case>nominative</owlnl:case>
</owlnl:Filler>
<owlnl:string>
  <owlnl:Val xml:lang="en">of the time</owlnl:Val>
</owlnl:string>
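The realisation of such a slot-based microplan amounts to concatenating the slot strings with the field value substituted in. A sketch of this step follows; the list encoding of the slots and the function name are hypothetical, not the NaturalOWL API:

```python
def realise_microplan(slots, field_value):
    """Surface-realise a template microplan: concatenate the slot
    strings, substituting the field value at the filler slot."""
    parts = []
    for kind, value in slots:
        parts.append(str(field_value) if kind == 'filler' else value)
    return ' '.join(parts)

# hypothetical encoding of microplan p1 for the field perPastSuccess
p1 = [('pron', 'I'), ('verb', 'have been'), ('string', 'correct'),
      ('filler', None), ('string', 'of the time')]
print(realise_microplan(p1, '85%'))
# → I have been correct 85% of the time
```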
7 Implementation and evaluation
A part of the proposed framework has been implemented
with a view to evaluating it in real conditions. This section
describes the modules and resources that have been
implemented and integrated in an early test case.
7.1 Design of the dialogue manager
This subsection describes how the dialogue manager imple-
ments the content module that was mentioned in a previous
subsection. TrindiKit [33] was chosen as the starting point of
a model for dialogue management. TrindiKit is based on two
very simple notions: the current state of the dialogue and
a set of update rules, each comprising a condition part and an
action part. Once a rule whose condition matches the cur-
rent state is found, the state is updated according to
the action part of the rule. The original TrindiKit had some
limitations, as it did not allow for dynamic rules in the sense
that is described below. Thus, TrindiKit was extended in order
to integrate a recommendation module.
The system tries to suggest items, which in this case are
museum exhibits or documentaries that are relevant to the
user’s desires, interests, etc. The suggestion depends on the
factors mentioned in Sect. 5.1 regarding the WHAT mod-
ule. Namely, the suggestion can be ontology based or based
on the suggestion of a recommender system. Thus, the
update rule of the TrindiKit should include in the action
part an invocation of the recommendation engine. In that
sense, the update rules are dynamic—the same rule yields
different responses based on the user’s interaction history.
The following is an example of information state (IS) of
the dialogue, as well as of an update rule, following the
formal representation of TrindiKit. The IS is split into three
parts. The first part is called Objects and refers to the
objects, or exhibits (in the current setting), that the user has
already seen. Position refers to the current position of the
visitor with reference to a specific object. The second part
is the dialogue history (DH) and refers to what the user
(usr) and the robot (sys) have uttered in the past. Finally,
the third part represents the last utterance (LU), which
came from the user or the system. The user in this example
has already visited objects o2, o4 and o16. He stands now
close to o16; the user liked o16, and the last one to talk was
the user. The above information is represented in accor-
dance with the TrindiKit formalism as follows:
IS
Objects [Visited: o2, o4, o16]
[Position: o16]
DH [...likes(usr, o16)]
LU Speaker: usr
At this point, an update rule is applied that matches the
current information state. In particular, if the last speaker
was the user, and the meaning of his utterance was that he
liked the visited object, then the recommender module is
called to suggest an item that might be of interest to the
user. This is the action part of the rule, realised in the
context of TrindiKit with the push command, which will
insert in a stack the result of the recommendation.
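The rule just described can be sketched as follows. The dictionary-based state, the rule function and the stub recommender are illustrative stand-ins, not TrindiKit code:

```python
def recommend(visited):
    """Stand-in for the recommendation engine: suggest the first
    catalogue item the visitor has not yet seen."""
    catalogue = ['o2', 'o4', 'o16', 'o21']   # hypothetical exhibit ids
    return next(o for o in catalogue if o not in visited)

# information state mirroring the IS example above
state = {
    'objects': {'visited': ['o2', 'o4', 'o16'], 'position': 'o16'},
    'dh':      [('usr', 'likes', 'o16')],    # dialogue history
    'lu':      'usr',                        # last utterance speaker
    'agenda':  [],                           # stack of planned moves
}

def rule_suggest(state):
    """Condition: the user spoke last and liked the current object.
    Action: call the recommender and push a suggestion (TrindiKit's
    "push"), which makes the rule dynamic."""
    condition = (state['lu'] == 'usr' and
                 ('usr', 'likes', state['objects']['position']) in state['dh'])
    if condition:
        item = recommend(state['objects']['visited'])
        state['agenda'].append(('suggest', item))
    return state

rule_suggest(state)
print(state['agenda'])   # → [('suggest', 'o21')]
```

The same rule yields different suggestions as the visitor's history grows, which is precisely the dynamic behaviour that motivated the extension of TrindiKit.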
7.2 Ontology authoring and NLG
The ontology is represented in OWL,7 the Semantic Web
standard language for specifying ontologies, using the
ELEON8 ontology authoring and enrichment tool [6].
ELEON is an open-source project developed at NCSR
‘‘Demokritos’’.
In particular, ELEON allows the authoring of OWL
ontologies, and the enrichment of ontologies with linguistic
resources, such as a lexicon and microplans. Microplans
provide instruction to an NLG module about the production
of text. Microplans are templates, which are simpler than
grammar rules, but far easier to develop by non-experts. An
NLG engine does not form an integral part of ELEON.
ELEON exports the authored ontology in OWL format, and
the linguistic resources in the RDF format. Moreover,
ELEON is able to support linguistic resources for English,
Italian and Greek. Finally, ELEON can be connected to
an inference engine for consistency checking of the authored
ontology.
The NLG engine is NaturalOWL [10], which
generates text based on an enriched ontology. The
ontology was authored based on information provided by
the archaeologists of the FHW. NaturalOWL is heavily
based on ideas from ILEX [22] and M-PIRO [12]. Unlike
its predecessors, NaturalOWL is simpler (e.g., it is
entirely template-based), and it provides native support for
OWL ontologies. NaturalOWL adopts the typical pipeline
architecture of NLG systems [28]. It produces texts in
three sequential stages: document planning, microplanning
7 http://www.w3.org/TR/owl-features/
8 ELEON is available at http://www.iit.demokritos.gr/~eleon/
and surface realisation. In document planning, the system
first selects the logical facts (OWL triples), which will be
conveyed to the user and specifies the document structure.
In microplanning, it constructs abstract forms of sen-
tences, then it aggregates them into more complex peri-
ods, and finally selects appropriate referring expressions.
In surface realisation, the abstract forms of sentences are
transformed into real text, and appropriate syntactic and
semantic annotations can be added, for example to help
the TTS produce more natural prosody. The system is also
able to compare the described entity to other entities of
the same collection (e.g., unlike all the vessels that you
saw, which were decorated with the black-figure tech-
nique, this amphora was decorated with the red-figure
technique).
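The three stages can be illustrated with a toy pipeline. The triples, templates and function names below are hypothetical, not the NaturalOWL implementation:

```python
# Toy three-stage NLG pipeline in the spirit described above
# (hypothetical facts and templates; not the NaturalOWL implementation).

FACTS = [  # OWL-style triples about an exhibit
    ('amphora1', 'type', 'amphora'),
    ('amphora1', 'decorated-with', 'the red-figure technique'),
    ('amphora1', 'located-in', 'the ancient agora'),
]

def document_planning(facts, max_facts=2):
    """Stage 1: select which facts to convey and fix their order."""
    return facts[:max_facts]

def microplanning(selected):
    """Stage 2: build abstract clauses and aggregate them into one period."""
    clauses = ['is an ' + o if p == 'type'
               else 'is ' + p.replace('-', ' ') + ' ' + o
               for _, p, o in selected]
    return ' and '.join(clauses)

def surface_realisation(entity, body):
    """Stage 3: turn the abstract form into real text."""
    return 'This ' + entity + ' ' + body + '.'

text = surface_realisation('exhibit', microplanning(document_planning(FACTS)))
print(text)
# → This exhibit is an amphora and is decorated with the red-figure technique.
```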
7.3 PServer and user models
This section provides an extensive description of PServer
because its functionality is crucial to the implementation of
the framework. It stores and handles user models, and it
supports user stereotypes, as well as groupings (commu-
nities) of users and exhibits. The recommendation engine,
as well as the explanation engine, are implemented as
clients to the PServer. In particular, the recommendation
engine relies on the existence of user and exhibit com-
munities. Also, the user stereotypes are essential to the
NLG engine, because the generated text is user stereotype
dependent. The PServer is a tool under development at
NCSR ‘‘Demokritos’’ [24].
In the context of PServer, it is necessary to define and
distinguish between attributes and features. Attributes are
independent of the current application domain (i.e.,
museum), and they primarily refer to user characteristics
such as: age, gender, occupation, level of expertise,
address, etc. Naturally, attribute values do not change in
the course of user interactions. Features are application-
dependent and consequently refer to characteristics of the
exhibits of the current art collection. Thus, they may refer
to the following: archaeological period, architect, location,
type of the exhibit, etc. The value of each feature reflects
the user’s preference of this feature, or the user’s frequency
of visit of this feature.
In particular, the PServer distinguishes between the
following entities: individual user models, user stereo-
types and user or feature communities. All of these entities
aim to capture and represent users, and to group users and
features with a view to detecting similar users, with the
ultimate purpose of offering recommendations. Let A be
the set of all user attributes, and F be the set of
features.
7.3.1 Individual user models
Each user is represented by an individual user model that is
comprised of attributes and features (see Table 1).
7.3.2 Stereotypes
User modelling via stereotypes was introduced for the first
time in [29]. In the context of PServer, a stereotype SC
is defined over a subset of attributes, i.e.,
SC(SA), where SA ⊆ A (see Table 2). Stereotypes can be
hand crafted by experts and represent some plausible
assumptions about users. Stereotypes are very useful for
novel users, i.e., for users who have no or little interaction
history and thus no exhibit suggestions can be made from
their past behaviour or their similarity to other users. Ste-
reotypes essentially link users with characteristics of
exhibits. Thus, in the provided example the first stereotype
in Table 2 might be associated with shorter descriptions of
exhibits, and the second stereotype might be associated
with the omission of certain exhibit descriptions that are
too evident for experts.
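Stereotype selection for a novel user can be sketched as follows, using the two stereotypes of Table 2 (the matching function and the stereotype names are hypothetical):

```python
def matching_stereotype(user_attrs, stereotypes):
    """Return the first stereotype whose attribute subset SA matches
    the user's attributes; useful for novel users, who have no
    interaction history to base suggestions on."""
    for name, attrs in stereotypes:
        if all(user_attrs.get(k) == v for k, v in attrs.items()):
            return name
    return None

# the two stereotypes of Table 2, with hypothetical names
stereotypes = [
    ('child-non-expert', {'age': 'child', 'expertise': 'non-expert'}),
    ('adult-expert',     {'age': 'adult', 'expertise': 'expert'}),
]
user = {'age': 'adult', 'expertise': 'expert', 'gender': 'female'}
print(matching_stereotype(user, stereotypes))   # → adult-expert
```

The matched stereotype then drives the presentation choices mentioned above, such as shorter descriptions or the omission of facts that are evident to experts.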
7.3.3 Communities
Two types of communities are supported, user and feature
types. Both types represent groupings of users or features
based exclusively on feature values. A user community
UC, in particular, represents a group of users who have
similar feature values with respect to certain features.
UC = (f_{k(1)}, f_{k(2)}, …, f_{k(n)}). Thus, users who belong to the
same community are deemed to be similar.
Table 2 A stereotype of child non-expert users (visitors) and a
stereotype of expert users

  Stereotype 1:  Age: Child;  Expertise: Non-expert
  Stereotype 2:  Age: Adult;  Expertise: Expert
Table 3 Two feature communities: the first expresses the fact
that users interested in the Acropolis are also interested in the Ancient
Agora; the second expresses a common interest in the architects
"Phedias" and "Kallikrates"

  Community 1:  Located-in.acropolis,  Located-in.ancient-agora
  Community 2:  Constructed-by.phedias,  Constructed-by.kallikrates
On the other hand, a feature community FC represents a
collective of features that have similar feature values.
FC = (f_{j(1)}, f_{j(2)}, …, f_{j(m)}). Thus, features belonging to the
same community are deemed to be similar, and can be
suggested in the sense of: ‘‘users who selected this item
have also expressed interest in these items’’. Communities
of both types are built with the cluster mining algorithm
[25]. Since user communities group similar users, they can
be used for listing similar users for explaining the sug-
gestions (see relevant section). Feature communities are
most useful in suggesting new items to users, once a user
has expressed interest in an item of the community.
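Suggestion from feature communities can be sketched as follows, using the two communities of Table 3 (the function name is hypothetical; in the actual system the communities are built by cluster mining [25]):

```python
def suggest_from_communities(selected_feature, communities, already_seen):
    """Given a feature the user has shown interest in, suggest the
    other members of its community: 'users who selected this item
    have also expressed interest in these items'."""
    for community in communities:
        if selected_feature in community:
            return [f for f in community
                    if f != selected_feature and f not in already_seen]
    return []

# the two feature communities of Table 3
communities = [
    {'located-in.acropolis', 'located-in.ancient-agora'},
    {'constructed-by.phedias', 'constructed-by.kallikrates'},
]
print(suggest_from_communities('located-in.acropolis', communities, set()))
# → ['located-in.ancient-agora']
```

A visitor who lingers at an Acropolis exhibit would thus be steered towards the Ancient Agora exhibits of the same community.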
7.4 Other modules
The rest of the modules of the platform can be split into
user interface modules and platform-specific modules.
7.4.1 User interface modules
A TTS and an ASR system from Acapela9 have been used.
In particular, the ASR module, apart from the voice rec-
ognition, also includes natural language understanding for
both English and Greek. Moreover, the TTS is able to
produce speech in English and in Greek.
The emotion generation and expression module can be
implemented according to the work in [9] and in [14]. In
the future, it is planned to also influence the intonation of
the text-to-speech system.
7.4.2 Platform-specific modules
The communication server that transfers messages between
the different modules has been provided by the Computa-
tional Vision and Robotics Laboratory of Institute of
Computer Science (ICS) of the Foundation for Research
and Technology–Hellas (FORTH).10 The same group has
provided the hardware for the robot platform as well as the
navigation module. In particular, the robot is the MP-470
model of GPS GmbH Neobotix11 and is depicted in
Fig. 5.
Moreover, the mapping of the premises of the FHW has
been performed automatically by the robot according to the
work described in [32]. The gesture recognition module is
able to comprehend three gestures (yes, no and quit) and is
described extensively in [4].
Finally, the cognitive structure of the dialogue manager
has been developed by the authors for the purpose of the
evaluation.
7.5 Resources
The resources of the present implementation include the
domain ontology, the user stereotypes and the application
databases.
7.5.1 Domain ontology
The domain ontology describes the area to be visited
(buildings, rooms and programs). Thus, there are exhibit
classes (such as sculptures and paintings), subclasses
(such as sculptures from a certain archaeological site) and
instances (such as the statue of Zeus). For each instance,
there is additional information in the form of fields that
describe the exhibit, state its author, the historical period
in which it belongs, etc. The ontology can be updated by
adding visiting areas, new exhibits or by updating
information on already existing areas and exhibits. The
use of the ontology enables the DSM to describe the
newly added or updated objects without further
configuration.
Authors using the ELEON can record lexical informa-
tion in the form of nouns and verbs that form the domain-
specific dictionary. This is required by the natural language
generation process. In addition, authors can specify the
degree of appropriateness of each noun for each user ste-
reotype. For instance, some nouns would be marked as
appropriate for adults only, whereas other, simpler words
would be substituted for children.
Authors, can add a new noun by providing an identity
name. Then, the author has to specify the forms that the
noun assumes in various languages (English, Italian and
Greek), which depend on the idiosyncrasy of each lan-
guage. For instance, the singular and plural form across
Fig. 5 Robot at the foundation of the Hellenic world
9 http://www.acapela-group.com/index.asp
10 http://www.ics.forth.gr/~xmpalt/research/orca/
11 http://www.neobotix.de/en/
cases can be specified, in addition to the gender and
whether it is countable or uncountable.
7.5.2 User stereotypes
User stereotypes such as adult, child, expert, can be defined
with the aid of the PServer (described in Sect. 7.3). User
stereotypes are quite useful since they permit extensive
personalisation of the information that users receive [3,
16]. Thus, user stereotypes determine the user’s interest in
the ontology entities (e.g., some facts about painting
techniques may be too elementary for experts and thus they
can be omitted). Also, the number of times a fact has to be
repeated before the system can assume that a user has
assimilated it depends on whether the user is an expert or a
layman (e.g., how many times the robot has to repeat a
description of the temple of Hephestos). In addition, user
stereotypes specify the appropriateness of certain linguistic
elements (e.g., it might employ simpler terms for children
than for adults), as well as parameters that control the
maximum desired length of exhibit description. Finally,
different speech synthesiser intonations can be chosen for
different user stereotypes.
7.5.3 Application databases
There is also a Canned Text Database, which contains
fixed text that will be spoken at the commencement, at the
end, or at an intermediate stage of the visitor’s interaction
with the dialogue system. Canned text also contains some
string variables that are instantiated during the dialogue
session. Moreover, there is a Domain-Specific Database
that in effect contains instances of the ontology, for
example, particular buildings, programs and rooms. This
information is extracted from the ontology that the NLG
module uses [10]. Finally, the robot personality is included
in the Domain-Specific Database and is represented
according to the OCEAN [8] model.
7.6 Evaluation
The first phase of the evaluation involved a usability study
of the robot guide at the premises of the Foundation of the
Hellenic world (FHW). According to the experimental
protocol, the robot was left wandering in the foyer of FHW,
with a view to attracting the attention of the visitors. Then,
upon the request of a visitor the experiment supervisor
would briefly explain the purpose of this robot, as well as
its capabilities and would let the visitor interact for about
15 minutes. After that the experiment supervisor filled in a
predefined questionnaire based on the visitor’s answers.
In this phase of the experiment, 76 visitors were
involved, 37 men and 39 women; 34 of them were up to
15 years old. The point was to evaluate the robot guide as a
platform, but also the individual modules, such as the voice
recognition module, the NLG engine, the TTS module, and
the dialogue system from a usability perspective.
According to the results, the robot platform encourages
visitors to visit the exhibits of the FHW. Another finding is
that visitors would like to find similar robots in other
museums. Considering the effect of different platform
modules on the robot–human interaction, it was discovered
that speech recognition had the most profound effect, fol-
lowed by the form of the robot itself and by the TTS
module. Considering the user-friendliness of different
modes of communication, speech recognition comes first,
followed by the use of the robot's touch screen (as
expected).
The produced text was deemed satisfactory. However,
it should be noted that the users were unaware that the
text was produced dynamically and that it was based on an
ontology, on microplans and on user stereotypes. The
produced text can be improved by extending the ontology
and the microplans.
Finally, the views about the sophistication of the dia-
logue system seem to be split between satisfactory and
mediocre.
8 Conclusions and future directions
This article aimed to introduce a framework for the dia-
logue system of intelligent robots. Such robots interact with
humans in a quite natural way as they describe or suggest
exhibits.
The framework proposes a new way of designing dia-
logue managers. In particular, it introduces a cognitive
architecture into dialogue management. The architecture is
based on a reactive, a deliberative and a reflective module.
The reactive module handles the simplest human–robot
interactions. The deliberative module is able to suggest
concepts of interest to the user. A domain ontology allows
the deliberative module to suggest items that are semanti-
cally related to what the user has just seen. In addition,
user’s past interaction as well as similar users’ interaction
can be used to suggest popular items. Finally, the delib-
erative module employs an affective module for emotion
recognition and emotion expression. At the topmost level is
the reflective module that acts as the management mech-
anism of the cognitive architecture. Necessary resources
for such a system include a domain ontology, a robot
personality and the user types.
A full evaluation of the proposed framework, which is
based on a cognitive architecture, is a complex issue, which
will be the subject of future research. However, the eval-
uation process has started, by examining the effect of
certain modules as well as that of the robot as a whole on
the perceived usability. The evaluation was based on users’
perception of the interaction with the robot, and it was
captured through answers to a questionnaire.
Certain modules are still to be evaluated; in particular,
the recommendation engine and the explanation engine,
which form part of the PServer, have not been evaluated
yet. Another criterion that can be tested is the robustness
of the system: how does it respond to unforeseen situations?
For instance, how does the system behave if it
does not understand the user’s utterance? Or what if
it does not have enough knowledge to respond to a
user’s request? These situations must be handled when
striving for a system that operates in an open-ended
environment.
Acknowledgments This work was partially supported by the
research programmes XENIOS (Information Society, 3.3, Greek
national project) and INDIGO (FP6, IST-045388, EU project).
References
1. Adomavicius, G., Tuzhilin, A.: Toward the next generation of
recommender systems: a survey of the state-of-the-art and pos-
sible extensions. IEEE Trans. Knowl. Data Eng. 17(6), 734–749
(2005)
2. Andre, E., Dybkjær, L., Minker, W., Heisterkamp, P. (eds):
Affective Dialogue Systems. Springer, Berlin (2004)
3. Androutsopoulos, I., Oberlander, J., Karkaletsis, V.: Source
authoring for multilingual generation of personalised object
descriptions. Nat. Lang. Eng. 13(3), 191–233 (2007)
4. Baltzakis, H., Argyros, A., Lourakis, M., Trahanias, P.: Tracking
of human hands and faces through probabilistic fusion of multiple
visual cues. In: Proceedings of International Conference on
Computer Vision Systems (ICVS) (2008)
5. Bennewitz, M., Faber, F., Schreiber, M., Behnke, S.: Towards a
humanoid museum guide robot that interacts with multiple per-
sons. In: Proceedings of the 5th IEEE-RAS International Con-
ference on Humanoid Robots (2005)
6. Bilidas, D., Theologou, M., Karkaletsis, V.: Enriching OWL
ontologies with linguistic and user-related annotations: the ELEON
system. In: Proceeding of International Conference on Tools with
Artificial Intelligence (ICTAI). IEEE Computer Society Press
(2007)
7. Chiu, C.: The Bryn Mawr tour guide robot. PhD thesis, Bryn
Mawr College (2004)
8. Costa, P.T., McCrae, R.R.: Normal personality assessment in
clinical practice: the NEO Personality Inventory. Psychol. Assess.
4(1), 5–13 (1992)
9. Egges, A., Zhang, X., Kshiragar, S., Thalmann, N.M.: Emotional
communication with virtual humans. In: Proceedings of the 9th
International Conference on Multimedia Modelling (2003)
10. Galanis, D., Androutsopoulos, I.: Generating multilingual
descriptions from linguistically annotated OWL ontologies: the
NATURALOWL system. In: Proceedings of the 11th European
Workshop on Natural Language Generation, Schloss Dagstuhl,
Germany (2007)
11. Herlocker, J.L., Konstan, J.A., Riedl, J.: Explaining collaborative
filtering recommendations. In: CSCW: Proceedings of the ACM
Conference on Computer Supported Cooperative Work,
pp. 241–250. ACM, New York (2000)
12. Isard, A., Oberlander, J., Matheson, C., Androutsopoulos, I.:
Special issue advances in natural language processing: speaking
the users’ languages. IEEE Intell. Syst. Mag. 1(18), 40–45 (2003)
13. Kelly, J.P., Bridge, D.: Enhancing the diversity of conversational
collaborative recommendations: a comparison. Artif. Intell. Rev.
25(1–2), 79–95 (2003)
14. Kshirsagar, S., Garchery, S., Sannier, G., Magnenat-Thalmann,
N.: Synthetic faces: analysis and applications. Int. J. Imaging
Syst. Technol. 13(e1), 65–73 (2003)
15. Langley, P., Laird, J.E., Rogers, S.: Cognitive architectures:
research issues and challenges. Technical report, Computational
Learning Laboratory, Stanford University (2006)
16. McEleney, B., Hare, G.O.: Efficient dialogue using a probabilistic
nested user model. In: 19th International Joint Conference on
Artificial Intelligence, Edinburgh, Scotland (2005)
17. McTear, M.F.: Spoken Dialogue Technology. Towards the
Conversational User Interface. Springer, Berlin (2004)
18. Milward, D., Beveridge, M.: Ontology-based dialogue systems.
In: Proceedings of IJCAI Workshop on Knowledge and Rea-
soning in Practical Dialogue Systems (2003)
19. Minsky, M.: The Emotion Machine. Simon and Shuster, New
York City (2006)
20. Mooney, R.J., Roy, L.: Content-based book recommending using
learning for text categorization. In: DL: Proceedings of the 5th
ACM conference on Digital Libraries, pp. 195–204, ACM, New
York (2000)
21. Newell, A., Simon, H.A.: Computer science as empirical inquiry:
symbols and search. Commun. ACM 19(3), 113–126 (1976)
22. O’Donnell, M., Mellish, C., Oberlander, J., Knott, A.: ILEX: an
architecture for a dynamic hypertext generation system. Nat.
Lang. Eng. 7(3), 225–250 (2001)
23. Ortony, A., Glore, G., Collins, A.: The Cognitive Structure of
Emotions. Cambridge University Press, Cambridge (1988)
24. Paliouras, G., Mouzakidis, A., Ntoutsis, C., Alexopoulos, A.,
Skourlas, C.: PNS: personalized multi-source news delivery. In:
Proceedings of the 10th International Conference on Knowledge-
Based & Intelligent Information & Engineering Systems (KES),
UK (2006)
25. Paliouras, G., Papatheodorou, C., Karkaletsis, V., Spyropoulos,
C.D.: Clustering the users of large web sites into communities. In:
ICML : Proceedings of the 17th International Conference on
Machine Learning, pp. 719–726. Morgan Kaufmann, San Fran-
cisco (2000)
26. Picard, R.W.: Affective Computing. MIT Press, Cambridge
(1997)
27. Rafter, R., Smyth, B.: Conversational collaborative recommen-
dation—an experimental analysis. Artif. Intell. Rev. 24(3–4),
301–318 (2005)
28. Reiter, E., Dale, R.: Building Natural Language Generation
Systems. Cambridge University Press, Cambridge (2000)
29. Rich, E.: User modeling via stereotypes. Cogn. Sci. 3(4),
329–354 (1979)
30. Sloman, A.: Requirements for a fully-deliberative architecture (or
component of an architecture). http://www.cs.bham.ac.uk/
research/projects/cosy/papers/#dp0604, COSY-DP-0604 (HTML)
(2006)
31. Thrun, S., Bennewitz, M., Burgard, W., Cremers, A.B., Dellaert,
F., Fox, D., Haehnel, D., Rosenberg, C., Roy, N., Schulte, J.,
Schulz, D.: MINERVA: a second generation mobile tour-guide
robot. In: Proceedings of the IEEE International Conference on
Robotics and Automation (ICRA) (1999)
32. Trahanias, P., Burgard, W., Argyros, A., Hahnel, D., Baltzakis,
H., Pfaff, P., Stachniss, C.: TOURBOT and WebFAIR: web-
operated mobile robots for tele-presence in populated exhibitions.
IEEE Robot. Autom. Mag., Spec. Issue Robot. Autom. Eur.: Proj.
Funded Comm. Eur. Union 12(2), 77–89 (2005)
33. Traum, D., Larsson, S.: The information state approach to dia-
logue management. In: van Kuppevelt, J., Smith, R. (eds.) Current and
New Directions in Discourse and Dialogue. Kluwer, The Neth-
erlands (2003)
34. Vernon, D., Metta, G., Sandini, G.: A survey of artificial cogni-
tive systems: implications for the autonomous development of
mental capabilities in computational agents. IEEE Trans. Evol.
Comput., Spec. Issue Auton. Ment. Dev. 11(2), 151–180 (2007)