Visual-Topological Mapping:
An approach for indoor robot
navigation
Master Thesis Project
Jorge Davila Chacon (s1846965)
Faculty of Mathematics and Natural Sciences
Artificial Intelligence Department
University of Groningen
June 2011
Internal supervisors:
Prof. Dr. Lambert Schomaker
(ALICE, University of Groningen, the Netherlands)
Dr. Tijn van der Zant
(ALICE, University of Groningen, the Netherlands)
External supervisor:
Dr. Peter Ford Dominey
(SBRI, Institut National de la Sante et de la Recherche Medicale, France)
Abstract
Using visual features for robotic navigation in natural and changing environments remains an unsolved problem. This paper describes the use of a nearest-neighbor algorithm on salient visual features for indoor mapping. The mapping procedure consists of defining an arbitrary number of vector fields (or paths) in a domestic environment, pointing towards a set of target locations. From these specified paths, the robot constantly estimates its most probable location, adjusts its orientation and moves forward a predefined distance, in order to converge to a target location. A SURF (Speeded Up Robust Features) implementation was used for object detection and scene recognition. For every location in a path, the perceived scene was represented by the set of detected objects and their geometric properties relative to the robot's body. A first experiment confirmed that the deviation from the trained paths, due to odometric error, decreased as the number of trained intermediate locations increased. A second experiment showed promising results when testing the effectiveness of nearest-neighbor search for estimating the robot's location. Finally, in a third experiment, a satisfactory convergence rate of the robot towards predefined target locations was observed under real-world circumstances.
Keywords: Topological robot-mapping, visual object-recognition, nearest
neighbor, vector-field navigation.
Preface
The following pre-introductory paragraphs describe the research goals of the scientists supporting this thesis project and explain how their shared objectives gave rise to it. An acknowledgments section is also included.
Shared efforts towards long living robots
This thesis project originated from a collaboration between the Artificial Intelligence and Cognitive Engineering Institute (ALICE) at the University of Groningen (RUG) in the Netherlands, and the Stem-Cell and Brain-Research Institute (SBRI), a branch of the Institut National de la Sante et de la Recherche Medicale (INSERM) in France. In order to clarify the objectives of the present work, it is useful to know the research objectives of both institutes.
The ALICE institute's objective is concisely stated by its research director, Lambert Schomaker:
The research in Artificial Intelligence and Cognitive Engineering is in-
herently interdisciplinary in nature. With Computing Science, an interest
in formal modeling is shared, where these models may be symbolic, statisti-
cal or hybrid in nature. With Psychology and Biology an interest in natural
cognition is shared, where the goal is to describe fundamental qualities of
cognition in such an explicit way that computational models of perceptual,
cognitive, and behavioral control functions can be constructed.1
1. http://www.rug.nl/ai/onderzoek/index

While working at the ALICE institute, Tijn van der Zant, together with Thomas Wisspeintner, published in 2005 the paper “RoboCup X: A Proposal for a new League where RoboCup goes real world”. This paper was the ideological foundation of the
RoboCup@Home league. RoboCup@Home is part of the RoboCup1 initiative, and it is
the largest international competition for autonomous domestic-service robots. Its main
objective is:
. . . to develop service and assistive robot technology with high relevance
for future personal domestic applications. [. . . ] A set of benchmark tests
is used to evaluate the robot’s abilities and performance in a realistic non-
standardized home environment setting. Focus lies on the following domains
but is not limited to: Human-Robot-Interaction and Cooperation, Naviga-
tion and Mapping in dynamic environments, Computer Vision and Object
Recognition under natural light conditions, Object Manipulation, Adaptive
Behaviors, Behavior Integration, Ambient Intelligence, Standardization and
System Integration.2
The Cortical Networks for Cognitive Interaction (CNCI) team, from SBRI, repre-
sents the other side of the collaborative group. Their research is based on Artificial
Intelligence (AI) and Embodied Cognition paradigms. It is described by Peter Ford Dominey, the CNCI research director, as follows:
Our previous research has addressed sensory-motor integration and higher
cognition including spatial integration and language comprehension through
a variety of methods including human functional imagery and clinical neu-
roscience, neural network simulation and robotics. [. . . ] One of the crucial
issues that has emerged in this research is the realization that the nature
of internal representations will be determined by the physiological structure
of the system and its direct link with the nervous system, thus bringing us
towards the “embodied cognition” stance. In this context we will develop a
“hybrid” embodied cognitive system for a humanoid robot, the iCub.3
1. http://www.robocup.org/about-robocup/a-brief-history-of-robocup/
2. http://www.ai.rug.nl/robocupathome/
3. http://www.sbri.fr/teams/human-and-robot-interactive-cognitive-systems.html

Intelligent robots @Home
The link between RoboCup@Home research goals [34] and the iCub1 robot can
be found by exploring the Cooperative Human-Robot Interaction Systems (CHRIS)
project. CHRIS is a European effort that supported the CNCI research on embodied
cognitive systems. The CHRIS project addresses:
. . . the problem of a human and a robot performing co-operative tasks
in a co-located space, such as in the kitchen where your service robot stirs
the soup as you add the cream. [. . . ] The project is based on the essential
premise that it will be ultimately beneficial to our socioeconomic welfare to
generate service robots capable of safe co-operative physical interaction with
humans. The key hypothesis is that safe interaction between human and
robot can be engineered physically and cognitively for joint physical tasks
requiring co-operative manipulation of real world objects.2
During the first semester of 2010, as a research member of the CNCI team, van
der Zant integrated the CHRIS project research goals with the RoboCup@Home com-
petition. He created a team to participate for the first time with a humanoid robotic
platform. One of the requirements for the competition was a navigation system and
students at the ALICE institute were offered the opportunity to join the venture.
The first stage of this thesis project was conducted in Lyon, France, at the SBRI.
The second and third stages were carried out in Groningen, the Netherlands, at the
ALICE institute. The different stages originate from significant changes in the hard-
ware and software architecture. It will often be useful to explain the work done in the earlier stages to better understand the later ones. They will be referred to as the first, second and third stages of the thesis project throughout the rest of this document.
1. http://www.robotcub.org/index.php/robotcub
2. http://www.chrisfp7.eu/

Acknowledgments

This work was carried out under the internal supervision of Prof. Dr. Lambert Schomaker, Director of Research at the ALICE institute, and Dr. Colin Martijn van der Zant, researcher at the same institute and founding member of the RoboCup@Home competi-
tion. As an external supervisor, I was supported by Dr. Peter Ford Dominey, Research Director at SBRI.
Both RUG and INSERM greatly supported this thesis project and made the empirical research possible by providing academic guidance, access to equipment and facilities, and travel expenses when required.
Concerning the grant that allowed me to take part in the master's program at RUG, I benefited from the agreement between the Benemerita Universidad Autonoma de Puebla (BUAP) and the Erasmus Mundus External Cooperation Window - Mexico (ECW-Mexico). I want to thank both institutions for the opportunity to study at a university of excellence.
I want to give special thanks to some friends who supported me academically at various stages during my studies: Olexandr Murov, Herke van Hoof, and the BORG1 and Radical Dudes2 team members. They all made my time in Europe one of the most memorable experiences of my life. Thank you for giving me a deeper insight into the diversity of cultures around the world.
Finally, I am deeply grateful to my family for their support in every aspect during
these two years. They clarified my thoughts in moments of uncertainty, and gave me
the mettle to materialize a dream.
In summary, for all the moral, intellectual, and economic support, I want to express my gratitude to the aforementioned institutions, researchers, friends, colleagues and family. Behind RUG, INSERM, BUAP and ECW-Mexico lies a vast amount of work that makes it possible to spread high-quality education around the globe. You are creating a better future for society by extending education opportunities without frontiers.
Jorge Davila Chacon
July 2011, Groningen, the Netherlands.
1. http://www.ai.rug.nl/crl/People/People
2. https://sites.google.com/site/pfdominey/robocup@home
Contents

List of Figures
List of Tables

1 Introduction
1.1 Robotic-navigation systems
1.2 Animal navigation
1.3 Behavior based robotics
1.4 Technical challenges for autonomous robot mobility
1.5 Research goals
1.6 Project overview

2 Background
2.1 Current approaches for autonomous mobility
2.2 Visual-topological navigation
2.3 Robot localization
2.4 Obstacle avoidance

3 Platform
3.1 First stage: Nao with low computing power
3.1.1 First stage hardware platform
3.1.2 First stage software platform
3.2 Second stage: Nao with high computing power
3.2.1 Second stage hardware platform
3.2.2 Second stage software platform
3.3 Third stage: Pioneer with distributed computing power
3.3.1 Third stage hardware platform
3.3.2 Third stage software platform

4 An Algorithm for Visual-Topological Mapping
4.1 Experimental setup
4.1.1 Robot niche
4.1.2 Lighting conditions
4.1.3 Obstacles cluttering
4.1.4 Landmarks location
4.1.5 Camera properties
4.2 Machine learning techniques
4.3 Visual navigation algorithm
4.3.1 Object detection: Spikenet
4.3.2 Object detection: SURF
4.3.3 Calculation of an object's angle
4.3.4 Detect surroundings
4.3.5 Location estimation
4.3.5.1 Location disambiguation
4.3.6 Correcting orientation

5 Experiments
5.1 Experiment 1: Path formation
5.1.1 Conclusion for path formation
5.2 Experiment 2: Location estimation
5.2.1 Conclusion for location estimation
5.3 Experiment 3: Visual navigation
5.3.1 Elevator path
5.3.1.1 Conclusion for elevator path
5.3.2 Lounge path
5.3.2.1 Conclusion for lounge path

6 Discussion
6.1 Future work
6.2 Conclusion

References
List of Figures

2.1 Edge detection
2.2 Certainty grid
2.3 Potential field
3.1 Nao with additional mini-PC
3.2 Nao hardware diagram
3.3 Additional mini-PC
3.4 Used graphical user interfaces
3.5 Nao mounted on pioneer platform
4.1 Projection on camera sensor
4.2 Nao's camera angle
4.3 Spikenet detections
4.4 SURF detections
4.5 Full images considered as objects
4.6 Object detection components
4.7 Example path
5.1 Trained "U" path
5.2 6-locations path
5.3 11-locations path
5.4 Estimation of the robot's location in a vector field
5.5 Stored vector field
5.6 Robot trajectory
5.7 Confidence limits for Elevator path
5.8 Lounge path
5.9 Confidence limits for Lounge path
List of Tables

3.1 Nao specifications
3.2 Additional mini-PC specifications
3.3 Desk PC specifications
3.4 Robot's laptops specifications
3.5 Pioneer specifications
4.1 Default SURF parameters
5.1 Visual navigation results
5.2 Optimized navigation results
Chapter 1
Introduction
As a starting point, this chapter addresses the importance of navigation algorithms for autonomous systems. It then provides an overview of some navigation methods found in nature, explains the influence of animal navigation on artificial systems, and outlines the current challenges for robotic navigation. Finally, the research goals and the project overview are detailed.
1.1 Robotic-navigation systems
The goal of providing autonomy and physical awareness to intelligent systems has been pursued for decades. More specifically, the case of a robot discovering and describing its environment has been a challenging task. This is a relevant goal, as it could provide the means for exploring environments that are harmful or unreachable for humans. Autonomous mobility is also a requirement when it is not possible or desirable to instruct a robot, in real time, about which path to follow. This could be the case in natural disasters or interplanetary exploration. On the other hand, robot experimentation could provide insights into the way animals navigate dynamic environments in real time.
In the case of indoor scenarios, one important goal is to achieve a smooth interaction between robots and people. In typical human environments, such as homes, offices, hospitals or homes for the elderly, a robot can be useful for entertainment, for reducing stress, or even for safety surveillance. Examples include alerting medical personnel when a person suffers an accident, or patrolling a neighborhood during the night. Basic tasks expected from a robot companion include localizing spots in its surroundings, after spoken commands or through exploration, and moving objects between different locations. A reliable navigation system is a priority when being realistic about the deployment of robots in human niches.
Achieving an accurate map description with a robot may be based on different kinds of measurements, such as distances traveled, turning angles, or the topology of landmarks detected in the environment. All of these measurements rely on a variety of sensors, such as odometers, cameras or proximity sensors (sonars or range finders). The economic resources available to different research programs are reflected in the variety of sensors used, which in turn affects the approach taken. What remains constant across the navigation approaches found in the literature is the common goal of providing a robot with a fast and accurate system for describing its environment [30].
Before providing further insight into the technical challenges for robot mobility, it is useful to have an overview of the navigation strategies observed in some animal species.
1.2 Animal navigation
Many navigation behaviors found in the animal kingdom depend on reference landmarks that provide a compass sense in the environment [22]. For example, the desert ant Cataglyphis bicolor uses the relative sizes and angles of visual landmarks for navigation. These ants do not retrace their return journey to the nest by following a pheromone trail. Once they reach a point near the nest, they start searching for it in a spiral-like motion of constantly widening circles [15]. This is an example of the functionality of dead reckoning in animal navigation. It suggests the hard coding of exploratory behaviors by evolutionary means. Sometimes a dead-reckoning behavior is the best solution for navigation.
Landmarks can be represented in several ways in nature. Studies on the navigation system of some bees (Apis mellifera) and birds (Passerina cyanea) show the use of the sun's movement, forest border lines and individual trees as reference landmarks for navigation [17, 33]. Another example involves human navigation without the use of compasses or geometric maps. The inhabitants of the Polynesian islands were able to navigate successfully over several thousand kilometers of open sea between Tahiti and Hawaii [33].
The variety of landmarks used to achieve such an odyssey includes the sun, the moon, the night stars, and unique wind and water patterns determined by near, although not necessarily visible, islands. From a canoe the view reaches around 15 km, but clouds may form above islands and make them visible from much greater distances. This is because land heats air masses faster than the surrounding water, creating rising air currents that form clouds when cooling down. Furthermore, these white clouds reflect a green light when formed above atolls or coral islands encircling a lagoon.
In all the cases mentioned above, the common denominator is the formation of canonical paths. These paths greatly simplify the navigation of animals between their nest and a food source, that is, between their current location and a target location. Depending on the animal, these paths may be learned or "hard coded". Nehmzow explains concisely the advantage of canonical paths:
The use of unchangeable (canonical) paths improves navigation reliabil-
ity greatly: the task of navigating to a goal location is simplified to one of
staying on a (possibly learned) path. [22]
For some animals, such as the honey bee, these paths seem to be acquired throughout their lives [22]. For others, these paths must be genetically preprogrammed. For example, first-year juvenile terns do not rely on the example of an adult to find their migratory paths [22].
As seen in the previous examples, the animal niche plays a fundamental role in navigation. The specific characteristics of different environments determine the nature of the objects used for orientation and path traversal. Nevertheless, the intelligent use of reference landmarks would not make successful navigation possible if the environment were not favorable. Nehmzow relates this principle to autonomous robotics:
This principle applies to robotics as well: we believe that robot perfor-
mance (and therefore navigation as one aspect of robot performance) always
has to be seen with respect to the robot’s environment. A general purpose
robot does not exist. [22]
Mobile robots are not an exception, and the characteristics of the environment are an important consideration when designing navigation algorithms. In the case of indoor domestic environments, it may be assumed that the environment is filled with visual landmarks such as decorative objects and furniture. This topic is elaborated further in section 4.1.
Another important factor for the interaction between robots and the real world is reaction time. Behavior-based robotics is a promising alternative for the successful deployment of robots in dynamic environments. This topic is addressed in the following section.
1.3 Behavior based robotics
In the past decades, alternatives have been studied to provide machines with an animal-like interaction with real environments. By the end of the 1940s, Norbert Wiener led the development of cybernetics. Arkin defines the field and concisely explains the importance of feedback loops for reproducing natural behaviors:
. . . a marriage of control theory, information science, and biology that
seeks to explain the common principles of control and communication in
both animals and machines [. . . ] Ashby and Wiener furthered this view of
an organism as a machine by using the mathematics developed for feedback
control systems to express natural behavior. This affirmed the notion of
situatedness, that is, a strong two-way coupling between an organism and
its environment. [3]
In the case of autonomous robots, situatedness requires a fast response-loop in or-
der to cope with the environment dynamics. A reactive system is therefore essential to
control the robot behavior. Reactive control involves a tight coupling between percep-
tion and action. More specifically, in the context of robotic behaviors, reactive control
aims to produce a timely motor response in dynamic and unstructured environments.
Due to the scientific experience accumulated over the past years, the advantages provided by reactive hardware and software architectures for autonomous robotics are now clear. In our case, this approach was taken in the third stage of the project by developing a behavior-based software architecture, in which a navigation system was
required. In the following section, the technical challenges for robotic navigation sys-
tems are explained in more detail.
1.4 Technical challenges for autonomous robot mobility
As a starting point, it is useful to mention some of the main difficulties in trying to provide a robot with physical awareness. Certainly, the creation of effective and efficient navigation algorithms requires an appropriate interpretation of the information provided by the sensors. But it is also necessary to consider the perceptual limitations of the chosen sensors. These limitations make it necessary for a robot to explore its surroundings. The noise affecting the sensed information needs to be included in the sensors' noise models, which may have unknown and/or variable distributions [29]. In the case of motion, the rotational and translational odometric errors accumulate, making the sensed data less reliable.
The robot niche generates important constraints on the robot's mobility. It is necessary to consider both static physical constraints (stairs, different rooms, fireplaces) and dynamic objects perceived as static (animals, chairs, other robots). Finally, it is fundamental to understand how a robot perceives its environment from its particular embodiment. A small biped robot and a tall wheeled robot require different strategies to interact with the world.
Problems with traditional topological algorithms arise from the assumption that the environment is static. Conservative approaches constrain the use of a robot to artificial niches [30]. Likewise, the use of active distance-based sensors such as laser range finders imposes strict spatial limitations. Other methods capable of handling dynamic environments assume the lack of occlusions from an aerial perspective [19]. In a typical domestic setting these algorithms tend to fail. They require extensive retraining (precise maps must be updated constantly), are costly to implement, or are simply not suitable for most indoor environments. Items such as furniture and plants can change position frequently. Wall lines might be obscured from view, only to reappear after a while. Navigation algorithms should be able to deal with changing environments and retrain automatically when new information is available.
Approaches using temporal consistency or odometry mostly rely on Bayesian models [1, 2, 10, 23, 35]. Their drawback is that the information required for this modeling becomes unavailable as soon as the user turns off the robot or moves it from one room to another.
Some visual approaches to navigation can be found in the literature. One alternative is the use of omnidirectional cameras [6, 36]. They have the disadvantage of reduced image resolution at relatively short distances compared to monocular cameras.
The algorithm described in this paper shows that navigation can be implemented in a dynamic environment by building a visual-topological map using salient image features, that is, a map describing the spatial and geometric relations between several visual landmarks in a given environment. As the system relies on several landmarks, it is possible to remove some of them without affecting the navigation behavior. The underpinnings of this approach will be addressed in the following chapters.
Now that the motivations for exploring visual-topological navigation have been introduced, it is possible to state the general objective and research questions of the project.
1.5 Research goals
The general objective of this project was to implement and explore the robustness of the proposed visual-topological mapping algorithm for indoor navigation with the robot Nao. Consequently, the following research questions emerged:
1. Is it possible to implement a visual-topological mapping algorithm with the robot
Nao, relying on its embedded motor, sensing and computing capabilities?
2. Is it possible to improve the mapping algorithm after customizing it to Nao's incorporated sensing devices, and with the aid of additional computational power?
3. Is this algorithm providing sufficient capabilities to the robot Nao to perform
SLAM (Simultaneous Localization and Mapping) [27] in an indoor location in
a reasonable amount of time, relative to the RoboCup@Home [34] competition
requirements?
1.6 Project overview
The proposed mapping algorithm relies on the visual detection of landmarks. Two
different approaches were tested for object detection. The first one was a commercial
application based on artificial neural networks [11]. The second one was a SURF
implementation based on [4].
An algorithm was implemented to calculate spatial and geometric properties of the detected objects. Such properties include the angle of the objects relative to the robot's body, and abstract features that depend on the distance and rotation of the objects. Assuming a large enough number of visual landmarks, these features allow the robot to describe any location in a room as a vector of distinct values.
The constraints of the environment in which the robot is required to navigate are defined by the RoboCup@Home rule book. It is assumed that a typical human indoor environment can provide the robot with enough visual landmarks to differentiate several locations in a room. It is hypothesized that the difference between locations is large enough to generate a fine-grained vector field. Such a vector field should allow the robot to move from any location near the trained vectors towards a target location.
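As an illustration only, the following minimal Python sketch shows one possible way to represent such a trained path: each trained location is reduced to a feature vector paired with the heading the robot should adopt there, and the nearest trained vector determines the next orientation. All names and values below are hypothetical and do not correspond to the implementation detailed in chapter 4.

    import math

    class Waypoint:
        """One trained location of a path: a feature vector plus a desired heading."""
        def __init__(self, features, heading_deg):
            self.features = features        # e.g. per-landmark angles and sizes, flattened
            self.heading_deg = heading_deg  # orientation to adopt before walking forward

    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def desired_heading(path, observed_features):
        """Return the heading stored with the trained location closest to the observation."""
        nearest = min(path, key=lambda w: euclidean(w.features, observed_features))
        return nearest.heading_deg

    # A toy three-waypoint path; the feature values are purely illustrative.
    path = [
        Waypoint([0.9, 0.1, 30.0], heading_deg=0.0),
        Waypoint([0.5, 0.4, 10.0], heading_deg=45.0),
        Waypoint([0.1, 0.8, -20.0], heading_deg=90.0),
    ]
    print(desired_heading(path, [0.6, 0.35, 12.0]))  # -> 45.0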
Three experiments were carried out to test the performance of the mapping algorithm.
The first experiment was performed in a constrained environment, in order to observe
the relation between the density of the vector field and the accumulation of odometric
error. The second experiment tested the performance of the nearest neighbor statistic
for estimating the location of the robot in a room. The last experiment tested the
robustness of the mapping algorithm in the real world.
The results of the experiments are detailed in chapter 5. A global discussion of the project results and possible future work is given in chapter 6.
Chapter 2
Background
Many of the indoor autonomous robot mobility procedures found in the literature consist of three steps: mapping, localization and navigation [30]. Navigation, in turn, can be subdivided into path planning and obstacle avoidance. This chapter provides an overview of the current challenges for autonomous robot mobility, followed by a more detailed description of the alternatives we propose to tackle each of the required components.
2.1 Current approaches for autonomous mobility
A large proportion of the navigation algorithms found in the scientific literature rely on probabilistic models of the robot's motion control (movements commanded to the robot) and localization (its actual position after moving). A topological SLAM algorithm in [26] used a Partially Observable Markov Decision Process (POMDP) to model the probability distribution of the position of a robot. In the experiments in [31] this approach was adapted to perform hybrid topological-metric SLAM with a laser scanner. However, POMDPs are not well suited for on-line mapping, since they are learned through an off-line process [25] or built manually from earlier information concerning the environment's geometry, appearance and topology. For example, in some methods [26, 31] the environment geometry is standardized, e.g. room shapes, floor distribution and hall-intersection angles.
Inference methods have been investigated in [24] and [23] to create topological SLAM algorithms. Topologies are built in a space of relational maps. When sufficient information has been collected, maps are correlated by commanding the robot to perform extra actions [24] or measurements [23], in order to estimate the validity of each individually created map. Nevertheless, the acquisition of large amounts of sensor information, and the complexity of sampling through all the images, does not allow such methods to map environments with more than 15 nodes.
Other topological SLAM approaches are based on the global appearance of omnidirectional-camera images [6, 16, 32, 36]. Images are described by regions or points of interest, and the similarity distance between images in a given dataset defines the nodes in the map. Images with a distance between them below a chosen threshold, i.e. "similar images", are considered to originate from the same location in the environment. These appearance-based approaches efficiently segment the environment by recognizing spots from distant locations with omnidirectional images. However, none of these approaches performs on-line or in real time. Either the input images are processed beforehand in an off-line step [6, 16, 36], or, when using large datasets, the computation of image similarity becomes intractable for a real-time implementation [32].
The approach taken in [14] presents a real-time framework performing topological SLAM based on vision with a single monocular camera. It is based on the bag-of-words method [9]: the collected images are measured by a set of elementary features, called visual words, which are stored in a dictionary. This dictionary is constructed by grouping visual words, i.e. similar visual descriptors in the images. Once a dictionary is built, the images are represented according to the frequency of the words they contain. These representations, or visual-word vectors, are taken from the visual vocabulary built off-line, and a voting process finds the previous images closest to the current one. Nevertheless, the authors do not mention whether the algorithm's performance was tested under occlusion, e.g. when a chair or a person blocks some region of interest in the image.
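As a rough illustration of the bag-of-words representation described above, the sketch below assumes that image descriptors have already been quantized into a small visual vocabulary; the vocabulary size, image names and overlap-based voting are illustrative assumptions, not the procedure of [14].

    from collections import Counter

    def to_word_histogram(descriptor_word_ids, vocabulary_size):
        """Represent an image by the frequency of each visual word it contains."""
        counts = Counter(descriptor_word_ids)
        return [counts[w] for w in range(vocabulary_size)]

    def vote_for_closest(query_hist, stored_hists):
        """Voting step: every shared word votes for a stored image; the image
        with the largest overlap is taken as the closest previously seen place."""
        def overlap(a, b):
            return sum(min(x, y) for x, y in zip(a, b))
        scores = {name: overlap(query_hist, h) for name, h in stored_hists.items()}
        return max(scores, key=scores.get)

    VOCAB = 5  # toy vocabulary of five visual words
    stored = {
        "kitchen_door": to_word_histogram([0, 0, 1, 3], VOCAB),
        "lounge_sofa":  to_word_histogram([2, 2, 4, 4, 4], VOCAB),
    }
    query = to_word_histogram([0, 1, 3, 0], VOCAB)
    print(vote_for_closest(query, stored))  # -> kitchen_door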
The requirements of the RoboCup@Home competition demand a navigation system capable of performing in dynamic indoor environments. The algorithms described above cannot tackle such a challenge. For this reason, an algorithm combining the advantages of visual and topological mapping was considered an appropriate approach.
2.2 Visual-topological navigation
The main motivation for building a visually based navigation system is the large accumulation of odometric error in the humanoid robot Nao, relative to the odometric error found in wheeled robotic platforms. When a Nao robot is commanded to move n meters forward, it will never cover exactly the specified distance nor move in a straight line. The accumulated error depends on several factors, such as the floor material, dust, the robot's battery level and the number of hours it has been used during the day. The temperature of the robot's joints tends to increase faster than it decreases. Another important consideration is the use of passive sensors for navigation, due to energy constraints.
The advantages of a visual navigation system can be considered from a practical perspective and from a scientific point of view. In comparison to metric approaches, topological maps provide a more compact representation that scales better with the size of the environment. From a scientific point of view, topological maps rely on a higher level of representation than metric mapping, allowing for semantically transparent planning and navigation. At the same time, visual-topological navigation is biologically more plausible, as it is closer to animal behavior [22], providing a proof of concept.
A review was made of different SLAM (Simultaneous Localization and Mapping) implementations suitable for the embodiment of the robot Nao. There are integrated approaches such as [20], which implements the creation of nodes and loop closing based on SIFT (Scale Invariant Feature Transform) [21] image features.
The proposed paradigm for mapping an indoor area using the android Nao includes the collaboration of parallel projects on image processing and computer vision. Current cognitive and perceptual paradigms were considered as guidelines to provide the robot with the required physical awareness. It is known from the literature that context plays a fundamental role [12, 18]. Our method assumes a dynamic indoor human environment where there is enough visual texture to define fixed regions of interest, e.g. furniture, decoration, windows, etc.
Finally, some algorithms that perform localization based on visual features lack the implementation of navigation paths [13, 35]. In our approach, paths can be constructed by describing a vector field, where the direction of every vector corresponds to the desired orientation of the robot before moving towards a target location.
Using a nearest-neighbor implementation, the humanoid is able to find its way around a domestic environment even when some landmarks have changed position or disappeared. The paper discusses the use of this method in a controlled environment in detail, to demonstrate the workings of the algorithm. Since the real world changes all the time, it is difficult to assert the exactness of the method. Nevertheless, when tested in natural environments it performed satisfactorily, in the sense that the robot reached the location it was commanded to find.
2.3 Robot localization
In robotics there are two common and related problems concerning localization [7]: the kidnapped-robot and the wake-up-robot problems. They are important for testing a robot's capability to recover from catastrophic localization failures. The kidnapped-robot problem refers to the situation where an autonomous robot is moved to an arbitrary location while operating, and then has to infer its current location. The wake-up-robot problem refers to the case where the robot is taken to an arbitrary location and put into operation; it then has to infer its current location without any prior information. Notice that the wake-up problem is the special case of the kidnapped problem in which the robot is "told" that it was moved. These two problems create significant issues for many localization systems, and only a subset of localization algorithms can deal with the resulting uncertainty.1

1. http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume11/fox99a-html/node1.html

Our proposed approach relies on the detection of objects, or visual landmarks. It is assumed that, from different locations, the objects perceived by the robot have different geometric properties, e.g. size and orientation. The algorithm was tested under different levels of control over the environment, and proved robust enough to tackle the challenges of both the kidnapped-robot and the wake-up-robot problems. As expected, when the landmarks are different at every location, the robot always infers its actual location correctly, even when it perceives only a single one. When different locations share some landmarks, at least one extra distinct landmark is needed to infer the current location correctly. Furthermore, the algorithm also showed satisfactory performance when the only difference between the trained locations was the perceived size of the visual landmarks. In this case the algorithm increased its accuracy with the number of objects associated with each location. This behavior will be explained in detail in section 5.2.
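The following toy example only illustrates the disambiguation behavior just described, under the simplifying assumption that each location is represented by landmark names and perceived sizes; the names and values are invented, and the actual procedure is given in chapter 4.

    def location_score(observed, trained):
        """Score a trained location by the number of shared landmarks; ties are
        broken by how close the perceived sizes are (smaller difference is better)."""
        shared = set(observed) & set(trained)
        size_difference = sum(abs(observed[k] - trained[k]) for k in shared)
        return (len(shared), -size_difference)

    trained_locations = {
        "corridor": {"fire_extinguisher": 40, "poster": 25},
        "lounge":   {"poster": 60, "plant": 30},   # shares the poster with the corridor
    }
    observed = {"poster": 55, "plant": 28}          # the plant disambiguates the lounge

    best = max(trained_locations,
               key=lambda name: location_score(observed, trained_locations[name]))
    print(best)  # -> lounge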
2.4 Obstacle avoidance
This component of navigation can be subdivided into two parts: collision avoidance and path recovery. Collision avoidance can be tackled with sensors such as sonars, range finders or cameras. The goal is to prevent the robot from hitting any static or moving object in its environment. Different methods vary in their degree of reactivity, depending on the expected speed of the robot and the expected noise coming from the sensors [5]. Common obstacle-avoidance algorithms found in the literature include edge detection, certainty grids and potential-field methods [5].
Edge detection extracts the vertical edges of obstacles. This information allows the robot to move around any of the detected edges. Depending on the sensors used, shortcomings of this method include poor directionality, frequent misreadings, and specular reflections. The Hough transform is a common approach for edge extraction in images. An example can be seen in figure 2.1.
Figure 2.1: Edge detection - Example of a sensed image (left) and its Hough transform
(right) showing the found edges.
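As a minimal sketch of this kind of edge extraction, the following snippet uses the OpenCV Python bindings (cv2) on a synthetic image; the parameter values are illustrative and are not the ones used in this project.

    import cv2
    import numpy as np

    # Synthetic grayscale image with one rectangular "obstacle" (stands in for a camera frame).
    image = np.zeros((200, 200), dtype=np.uint8)
    cv2.rectangle(image, (60, 40), (140, 160), 255, 2)

    # Canny edge detection followed by a probabilistic Hough transform, which
    # returns line segments that can serve as candidate vertical obstacle edges.
    edges = cv2.Canny(image, 50, 150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=50,
                            minLineLength=40, maxLineGap=5)

    vertical = []
    if lines is not None:
        for x1, y1, x2, y2 in lines.reshape(-1, 4):
            if abs(x1 - x2) < 5:  # keep only roughly vertical segments
                vertical.append((x1, y1, x2, y2))
    print(len(vertical), "vertical edge segments found")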
The certainty grid is a map representation well suited to the accumulation and fusion of sensor data. This probabilistic method locates obstacles in a grid-type model. The world is modeled as a 2D array of squares, or cells. Every cell has a certainty value indicating the confidence that an obstacle lies inside it (see figure 2.2). This value is a probability function that depends on the sensor properties. Each cell continuously updates its certainty value according to the sensor readings. After some time of moving around, the robot can build an accurate map of a given area. There is a trade-off between accuracy and computational cost in this method, as both depend on the cell size.
Figure 2.2: Certainty grid - Example of the detection of an obstacle by a robot (left)
and the obstacle representation in a certainty grid (right). The values in the grid cells
represent the probability of containing an obstacle.
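A minimal certainty-grid sketch is given below, assuming an idealized sensor model with fixed hit probabilities; the Bayesian cell update is a simplified version of the methods surveyed in [5], not the module used in this project.

    import numpy as np

    class CertaintyGrid:
        """2D grid of cells, each holding the probability that it contains an obstacle."""
        def __init__(self, width, height, prior=0.5):
            self.p = np.full((height, width), prior)

        def update(self, x, y, hit, p_hit_given_occ=0.8, p_hit_given_free=0.2):
            """Bayes update of one cell given a sensor reading (hit = obstacle sensed)."""
            prior = self.p[y, x]
            if hit:
                l_occ, l_free = p_hit_given_occ, p_hit_given_free
            else:
                l_occ, l_free = 1.0 - p_hit_given_occ, 1.0 - p_hit_given_free
            self.p[y, x] = l_occ * prior / (l_occ * prior + l_free * (1.0 - prior))

    grid = CertaintyGrid(10, 10)
    for _ in range(3):             # three consecutive sonar hits on the same cell
        grid.update(4, 2, hit=True)
    print(round(grid.p[2, 4], 2))  # the certainty value grows with repeated detections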
In potential- or vector-field algorithms, the goal or target location applies an attractive force to the robot, while obstacles produce imaginary repulsive forces. These imaginary forces are represented by vector arrows distributed in the plane. The robot's movement direction is computed by adding all attractive and repulsive force vectors. An example of a computed path in a vector field is shown in figure 2.3.
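The following minimal sketch illustrates the idea, with purely illustrative gains and radii; it is not the potential-field implementation of any of the cited works.

    import math

    def attractive(pos, goal, gain=1.0):
        """Vector pulling the robot towards the goal, proportional to the distance."""
        return (gain * (goal[0] - pos[0]), gain * (goal[1] - pos[1]))

    def repulsive(pos, obstacle, radius=1.5, gain=2.0):
        """Vector pushing the robot away from an obstacle, active only inside `radius`."""
        dx, dy = pos[0] - obstacle[0], pos[1] - obstacle[1]
        d = math.hypot(dx, dy)
        if d > radius or d == 0.0:
            return (0.0, 0.0)
        strength = gain * (1.0 / d - 1.0 / radius)
        return (strength * dx / d, strength * dy / d)

    def step_direction(pos, goal, obstacles):
        """Movement direction = normalized sum of attractive and repulsive forces."""
        fx, fy = attractive(pos, goal)
        for obs in obstacles:
            rx, ry = repulsive(pos, obs)
            fx, fy = fx + rx, fy + ry
        norm = math.hypot(fx, fy) or 1.0
        return (fx / norm, fy / norm)

    print(step_direction(pos=(0.0, 0.0), goal=(5.0, 0.0), obstacles=[(1.0, 0.4)]))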
In our approach, the modules used for obstacle avoidance differed between the last two stages of the project. During the second stage, the Nao's default behaviors included a reactive collision-avoidance module using input from its front sonars. For the third stage, a parallel project was developed by some of the faculty's students to implement a collision-avoidance module based on an infrared range finder.
In both cases, unless explicitly overridden, the navigation system gives priority to the collision-avoidance commands over the path-tracking commands that indicate where to move in order to reach the target location.
Figure 2.3: Potential field - Example of a potential field containing two obstacles (circle and thick line). The small arrows represent the repulsion vectors around the obstacles and the attraction vectors around the goal. The green line shows the computed path from the starting point to the goal.

The path recovery in our method comes from a vector field built around the main path, which allows the robot to keep moving towards the target location. This approach successfully solves the problem, although it has the disadvantage of slowing down the training procedure.
Chapter 3
Platform
This chapter describes the changes made to the hardware and software platforms used during this research. The evolution of the visual-topological navigation system can be divided into three stages, according to the hardware and software architecture used. This chapter also reveals the importance of developing the hardware and software architectures as a single parallel project.
3.1 First stage: Nao with low computing power
The choice of the android Nao was made for pragmatic reasons. It has been the standard platform of RoboCup since 2007, confirming it as a reliable robot for experimentation. Nevertheless, it had never been used in the RoboCup@Home league, and it was an exciting challenge to participate in the competition with a humanoid robot for the first time. Its robustness and reusability are also important factors. Dominey concisely summarizes the advantages of commercial hardware platforms:
The research strategy is to use high quality commercially available robot
platforms, and software [. . . ] in order to provide a high performance baseline
system. From this baseline we then implement human-robot cooperation
capabilities inspired by contemporary results in human cognitive development
research. Because much of the technology that we use is off-the-shelf it is
robust, versatile and reusable.1
1. http://www.sbri.fr/robocup-home.html
The robot's external configuration was modified in order to increase its computing power. This modification consisted of a mini-PC attached to the front, an extra battery set attached to the back, and an extra unidirectional microphone. These changes did not impede the mobility of the robot, but they did decrease its walking stability. The computer attached to the front blocked the robot's sonars, making it impossible to use them for obstacle avoidance.
3.1.1 First stage hardware platform
The initial configuration of the robot platform can be seen in figure 3.1. It consisted of
the following components:
Nao robot. Humanoid robot produced by the French company Aldebaran Robotics.
Its detailed specifications are in figure 3.2 and table 3.1.
Additional computer. A fit-PC2i mini-PC was attached in front of the robot for the
additional processing power required for computer vision. This hardware struc-
ture was adopted to provide autonomy to the robot during the RoboCup@Home
competition, without depending on unstable and slower wireless communication.
Its detailed specifications can be found in figure 3.3 and table 3.2.
Additional battery. An external battery set was attached to the back, in order to
provide energy to the additional computer.
Additional microphone. This unidirectional microphone was incorporated to reduce
the background noise detected during the competitions, and increase the perfor-
mance of the speech recognition modules.
3.1.2 First stage software platform
During this stage the modules developed by the team were written mainly in Python, with some in C++ and C#. The set of open-source libraries, middleware and commercial software packages included in the architecture is described in the following list:

YARP. Middleware from the CHRIS project, used to control the information traffic between the sensors' raw data, the CPUs and the robot's end effectors.1

1. http://eris.liralab.it/yarpdoc/index.html
Figure 3.1: Nao with additional mini-PC - Hardware platform used during the
first stage of the project. The unidirectional microphone reduced the background noise.
The external computer provided immediate processing power for computer vision, without
depending on unstable and slower wireless communication.
Figure 3.2: Nao hardware diagram - Location of the principal body components of the Nao robot. With permission from Aldebaran Robotics.
Table 3.1: Nao specifications
Component Specification
Height 580 mm
Weight 4.3 kg
Energy autonomy 15-45 min1
DoF 17 (Distributed in joints)
Speakers 2 - 2W
Microphones 4 (Square array)
Networking Ethernet/Wi-Fi (IEEE 802.11g)
Sonars 4 (2 emitters, 2 receivers)
Gyrometer 2 - 1 axis
Accelerometer 1 - 3 axis
IR 2 (1 emitter, 1 receiver)
Cameras 2 (VGA 640x480)2
Focus range 30 cm - Infinity
Vision field 58 degrees on the diagonal
CPU x86 AMD-GEODE 500MHz
RAM 256 MB
OS Linux 32 bit, x86 ELF
1. Autonomy depends on use conditions such as the activity of the wireless card or LAN port, the current joint temperatures and the intensity of the robot's movement (standing or continuously walking).
2. The cameras are arranged in a non-overlapping vertical array. Maximum 30 fps, depending on the chosen image resolution and communication channel. Wired communication is faster than wireless.
Figure 3.3: Additional mini-PC - Devices available in the mini-PC attached to the
front of Nao robot.
Table 3.2: Additional mini-PC specifications
Component Specification
Dimensions 101 x 115 x 27 mm
Weight 0.370 kg
CPU Intel Atom Z530 1.6GHz
RAM 1 GB
OS Linux Mint/Windows 7
Audio Line-out, line-in, mic
Networking Ethernet/Wi-Fi (IEEE 802.11g)
USB 4 USB 2.0 High Speed ports
OpenCV. An open-source set of optimized libraries for image processing and the development of computer vision techniques.1

Choregraphe. A graphical user interface (GUI) from Aldebaran included with the robot Nao. It facilitates the construction of behaviors, control of the robot's motion, and access to sensor data and the robot's operating system (OS). Choregraphe's text-to-speech and voice-recognition modules were also used for human-machine communication. A snapshot can be seen in figure 3.4.

Spikenet. A commercial package implementing spiking artificial neural networks for visual object recognition.2 A snapshot can also be seen in figure 3.4.
3.2 Second stage: Nao with high computing power
During this stage a newer version of the Nao robot was used, and the complementary computing power was provided by desktop computers. The experiments were being carried out as part of a robotics course, and the robot-autonomy constraint was no longer present. Therefore, it was possible to communicate with the robot via a long LAN cable.
It is also important to highlight the drastic changes in the software architecture between the first and second stages of the project. During the first stage, the architecture was mainly focused on integrating available commercial software with the modules developed by the team. During the second stage, the architecture evolved towards a more flexible and customized functionality than the one provided by YARP.
The changes made in this stage were the first steps towards a behavior-based software architecture. This approach should provide the robot with a more reactive interaction with the world. The transition was completed in the third stage, and it is explained in more detail in section 3.3.
On the other hand, these benefits came at the expense of spending a significant amount of time building the new structure. The development of the architecture was a continuous loop, in which new requirements were discovered along the way and modules had to be re-engineered on several occasions.
1. http://opencv.willowgarage.com/wiki/Welcome
2. http://www.spikenet-technology.com/technology.htm
Figure 3.4: Used graphical user interfaces - GUIs used during the first stage of the thesis project: Choregraphe (top) and Spikenet (bottom). Choregraphe was also used during the second and third stages.
Table 3.3: Desk PC specifications
Component Specification
CPU Intel Quad Core 2.67GHz
RAM 3.4 GB
OS Linux 2.6.32-5-686 (Debian 6.0)
3.2.1 Second stage hardware platform
The Nao version used in the second and third stages was more recent than the one used in the first stage; nevertheless, the differences between the two versions were software improvements. The main change in the hardware architecture was the increased external computing power. The specifications of the desktop computers used at this stage, instead of the mini-PC, are detailed in table 3.3.
3.2.2 Second stage software platform
For the customized architecture built at this stage, the programming language used was Python. As mentioned before, the middleware was developed entirely by our team. YARP was replaced in order to solve some fundamental problems, including a long time delay in image transmission.

A Speeded-Up Robust Features (SURF) [4] implementation was used for object detection. In comparison to Spikenet, it is faster, more robust to light changes, and based on open-source libraries. Specifically, we used the OpenCV libraries from Willow Garage.1 The detailed functioning of the SURF implementation is described in section 4.1.

1. http://www.willowgarage.com/pages/about-us/overview
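As a minimal sketch of how SURF detection and matching can be driven from the OpenCV Python bindings, the snippet below detects keypoints in a trained object image and in a camera frame, and filters the matches with a ratio test. The file names are placeholders, the threshold values are illustrative, and this is not the exact implementation described in chapter 4; note also that the SURF entry point differs between OpenCV versions.

    import cv2

    # SURF is exposed differently across OpenCV versions: cv2.SURF in the 2.x
    # series that was current at the time, cv2.xfeatures2d.SURF_create later on.
    try:
        surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    except AttributeError:
        surf = cv2.SURF(400)

    # Placeholder file names: a trained object image and the current camera frame.
    reference = cv2.imread("landmark.png", cv2.IMREAD_GRAYSCALE)
    scene = cv2.imread("camera_frame.png", cv2.IMREAD_GRAYSCALE)

    kp_ref, des_ref = surf.detectAndCompute(reference, None)
    kp_scene, des_scene = surf.detectAndCompute(scene, None)

    # Brute-force matching with a ratio test: keep matches whose best distance is
    # clearly better than the second best, a common way of filtering false positives.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des_ref, des_scene, k=2)
    good = [m for m, n in matches if m.distance < 0.7 * n.distance]
    print(len(good), "reliable matches")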
3.3 Third stage: Pioneer with distributed computing power
This final stage is part of the current work being done at the Artificial Intelligence department of the University of Groningen. It will be referred to mainly in the discussion section. However, it is important to provide a description of this stage's hardware and software components, in order to understand the possibilities for future work on visual navigation.
The Nao robot was placed on top of a Pioneer platform from the company Mobile Robots.1 The idea was to ensure robust mobility and to incorporate a set of five laptops in order to parallelize the computational processes (see figure 3.5). The Nao was mainly used for human-machine interaction: it has a more familiar appearance to people, and its embodiment allows it to produce human-like body language.
During this stage the final structure of the behavior-based software architecture was conceived. All the modules previously built were migrated to this new paradigm, and the underpinnings of the behaviors became completely different in nature. The details are explained in subsection 3.3.2.
3.3.1 Third stage hardware platform
The functionality of the new hardware components included in the final robot configuration is described in the following list (see tables 3.4 and 3.5 for the technical specifications):
HD-cameras. Used for visual mapping, object detection and person identification.
Kinect. Utilized for obstacle avoidance, object detection and person identification. It is an infrared range finder that provides a 3D point cloud with a range of approximately 0.3 to 5 meters.
Unidirectional microphone. Required for speech recognition. It was located at a
height that could capture the voice coming from a person standing in front of the
robot.
Laptops. Five additional laptops provided the required computing power. The processing of the raw data coming from the sensors and the execution of the designed behaviors were done in parallel.
Pioneer. Wheel-based robot platform2 used to carry the processing units, the Nao robot and the sensors.

1. http://www.mobilerobots.com/ResearchRobots/ResearchRobots.aspx
2. http://www.mobilerobots.com/researchrobots/researchrobots/PioneerP3DX.aspx
Figure 3.5: Nao mounted on pioneer platform - Principal body components of Nao-
Pioneer robot:
Kinect. Infrared range finder providing a 3D point-cloud. Used for obstacle avoidance,
object detection and person identification.
Cameras. High definition monocular cameras used for visual mapping, object detection
and person identification.
Unidirectional microphone. Used for speech recognition.
Nao. Humanoid robot used for human-machine interaction.
Laptops. They provided computing power for the parallel processing of sensor data and
robot control.
Pioneer. Wheel based robot-platform used for stable mobility.
Table 3.4: Robot’s laptops specifications
Component Specification
CPU Intel Core Duo T2400 1.83GHz
RAM 2 GB
OS Linux 2.6.32-31-generic (Ubuntu 10.04.2)
Table 3.5: Pioneer specifications
Component Specification
CPU Intel Pentium M 1.40GHz
RAM 1 GB
OS Linux 2.6.32-5-686 (Debian 6.0)
3.3.2 Third stage software platform
The modules developed for object detection and mapping were adapted to the behavior-based architecture; however, their core functionality remained the same. The main challenge was to obtain the same response, proven to work successfully in real environments, with much faster performance.

The details of the navigation algorithm are provided in chapter 4. Nevertheless, the navigation behavior is briefly described here to justify the changes made. This example makes it possible to understand the reasons for the changes in the software platform between the second and third stages.
During the first stage, in order to move between two consecutive points in a room,
the robot performed the following steps:
1. Detect surroundings. While standing at a given location, the robot would
move the head on the horizontal axis (head yaw) to “look around”. The robot
head yaw allowed a movement of 120 degrees. Every 30 degrees on the yaw axis the robot would look at ±10 degrees of head pitch to analyze the incoming image from its camera. The actual processing of the image took between 2 and 12 seconds, depending on the amount of texture found in the image. As moving the neck
produced motion blur, a 2-second pause was needed at every step to stabilize the image. At each location, including the time required for moving the head, 20-80 seconds were required to detect the surroundings.
2. Estimate location. Depending on the detected objects (their size and orien-
tation), the robot would estimate its most probable location in the room with
respect to the trained locations.
3. Correct and walk. Finally, if required, it would correct its orientation by a variable number of degrees -according to the estimated location in the trained vector field- and walk a predefined distance (around 25 cm).
This process was repeated until the robot estimated that its current location was the target location in the room.
During the third stage the intention was to make navigation a reactive behavior, i.e.
to have a fast loop between perception and action. A smooth, continuous movement of the robot was required until the target location was reached. For this reason, it was no longer necessary for the robot to correct its orientation by a variable number of degrees at different locations. Instead, it could continuously correct its orientation by a small fixed amount (5 degrees) in the required direction, and then move forward a smaller distance than the one used in the second stage (5 cm).
Whenever the required orientation correction was smaller than a small range (± 5 degrees), the robot would simply move straight forward the predefined distance. With the help of the distributed computing power, this more fine-grained motion could provide the required smoothness for a reactive behavior.
The use of two cameras mounted on the Pioneer platform, instead of the single camera on the Nao robot, eliminated the need to "look around" at every location, saving valuable time. However, the processing of the incoming images from the HD cameras required around 3 seconds, even when the image was reduced to a size similar to the one used with the Nao camera. This lag may be due to the lower processing power of the laptops on the robot, in comparison with the desktop computers in the robot lab.
From the five available laptops, only one had a graphics accelerator card that could speed up this processing. CUDA (Compute Unified Device Architecture) is a parallel computing architecture that enables dramatic increases in computing performance
by harnessing the power of the GPU (Graphics Processing Unit). Nevertheless, the architecture of this particular card model was not capable of handling CUDA 1 implementations.
Some feasible possibilities to speed up the navigation algorithm include:
1. Module optimization. Re-implementation of the algorithm in a lower-level language (C++).
2. Parallel processing. Starting multiple instances of the same vision modules, where each works with a subset of the trained images. This division of the processing tasks allows parallel processing on multi-core CPUs (see the sketch below).
3. Planning algorithm. Implementing a planner module that splits a building into sub-areas. The planner would instruct the object detection modules to search only for the objects found in the sub-area where the robot is currently located.
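The second possibility can be sketched with Python's multiprocessing module. The following is a hypothetical illustration, assuming the trained models are kept as a dictionary mapping image names to SURF descriptor arrays; the matching function, names and threshold are assumptions, not the thesis code:

from multiprocessing import Pool
import numpy as np

def count_matches(current_desc, model_desc, max_dist=0.25):
    # Brute-force nearest-neighbour matching between two descriptor arrays:
    # count how many model descriptors have a close match in the current image.
    d = np.linalg.norm(model_desc[:, None, :] - current_desc[None, :, :], axis=2)
    return int((d.min(axis=1) < max_dist).sum())

def match_subset(args):
    # One worker handles one subset of the trained models.
    current_desc, model_subset = args
    return {name: count_matches(current_desc, desc) for name, desc in model_subset.items()}

def parallel_detection(current_desc, models, workers=4):
    # Split the trained models into 'workers' subsets and match them in parallel.
    names = list(models)
    chunks = [{n: models[n] for n in names[i::workers]} for i in range(workers)]
    with Pool(workers) as pool:
        results = pool.map(match_subset, [(current_desc, c) for c in chunks])
    merged = {}
    for part in results:
        merged.update(part)
    return merged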
The reactive navigation system is part of the current work at the Artificial Intelligence department of the University of Groningen. Future-work possibilities for a reactive visual-topological navigation system are addressed in the last chapter of this document.
1 http://www.nvidia.com/object/what_is_cuda_new.html
Chapter 4
An Algorithm for
Visual-Topological Mapping
This chapter is dedicated to the technical explanation of the proposed visual-topological mapping algorithm. The design of the testing environments and the methodology used for the experiments are described. Finally, a description of each module of the mapping algorithm provides the necessary information to analyze the experimental results.
4.1 Experimental setup
A set of conditions remained constant during all the experiments performed in this project. They are described in the following subsections.
4.1.1 Robot niche
As the origin of the visual-navigation project is related to the RoboCup@Home
competitions, the robot niche is clearly defined in the tournament rule book. It is
intended to represent a typical indoor human environment. The main concept is defined
in the rule book as follows:
As uncertainty is part of the concept, no standard scenario will be pro-
vided in the RoboCup@Home League. [. . . ] The scenario is something that
people encounter in daily life. It can be a home environment, such as a
living room and a kitchen, but also an office space, garden, supermarket,
restaurant etc. The scenario should change from year to year. . . [8]
The experiments described in the Experiments chapter were performed in two different kinds of scenarios. The first kind was a maze where the robot's motion and the mapping algorithm could be observed with high precision. The second kind consisted of real human indoor environments: a corridor towards an elevator, and a lounge with kitchen and living room facilities.
4.1.2 Lighting conditions
During the first stage of the thesis project, the lighting conditions at the SBRI laboratory remained constant throughout the day. During the second stage, at the ALICE laboratory, the lighting conditions were variable.
The lighting conditions varied within the same range during the training and testing phases. In order to focus on the performance of the navigation algorithm, the experiments were carried out during the same hours of the day. In this way the performance of the vision algorithm remained at the same level.
4.1.3 Obstacles cluttering
During the first and second stages of the project, the environment was free of moving obstacles. The study of the visual-topological mapping required avoiding the influence of the obstacle avoidance behavior. Collision avoidance was implemented during the third stage. Its implementation in a full navigation system is work in progress.
4.1.4 Landmarks location
The environment for the first experiment was designed according to the Nao robot's embodiment. The visual landmarks were located at a height that allowed the robot to detect them between 30 and 250 cm, without being disturbed by the ceiling lights.
4.1.5 Camera properties
All models of the objects and visual features were built with the camera integrated
at the top-front of the Nao’s head. The chosen image size was 640 x 480 pixels. The
navigation system can infer the angle of a detected object, relative to the X and Y axes of the image, independently of the image size (See figure 4.1).
Figure 4.1: Projection on camera sensor - Projection of an object in the camera
sensor. The dashed square represents the perceived world and the opposite square the
camera sensor-plate. The point where all the projections converge represents the center of
the camera lens. The distance from this point to the sensor-plate is the focal-distance.
The origin of the arc projected onto the sensor plate of the camera is at the center of the camera lens. The origin of the X-Y coordinates is located at the center of the image. These coordinates are given in pixel units.
Once the focal distance is obtained, the angle of the detected object's centroid can be computed. The detailed explanation of these computations can be found in subsection 4.3.3. For example, if the centroid of a detected object is in the center of the image, its coordinates will be (0,0) and both of its X and Y component angles will be 0 degrees.
The maximum angle on the Y axis, for the Nao camera, is depicted in figure 4.2.
4.2 Machine learning techniques
The proposed visual-topological mapping belongs to the category of supervised learning algorithms. Specifically, two modules of the mapping algorithm are suitable for the use of machine learning algorithms: object detection and location estimation.
Figure 4.2: Nao’s camera angle - Nao’s camera maximum angle along Y axis direction.
With permission from Aldebaran Robotics.
The success of the mapping algorithm was measured at two levels in the experiments: the success rate for estimating the exact location of the robot, and the success rate for reaching the target location.
During the first stage of the project, Spikenet [28] was used for visual object detection. Spikenet is an artificial neural network algorithm that allows the construction of object models for later recognition. During the second and third stages, a SURF [4] implementation was used instead for visual object detection. The SURF implementation used for this project is based on the OpenCV computer vision libraries.
For location estimation, a nearest neighbor implementation was used to categorize the input vectors. Initially such vectors consisted of binary values indicating the presence/absence of the trained objects. Later, the input vectors were constructed with continuous values describing the distance and rotation of the detected objects.
The methodology used for object detection and location estimation is explained in detail in subsections 4.3.2 and 4.3.5.
4.3 Visual navigation algorithm
The visual navigation method consists of behaviors and sub-behaviors coupled in a sequential loop. The implementation can be summarized in the following pseudo code:
while searching_location:
    detect_surroundings()  # Composed of three sub-behaviors
    estimate_location()
    if current_location == target_location:
        searching_location = False
    correct_orientation()
1. detect_surroundings() Detection of trained objects in the surrounding space of the robot. This module is composed of three sub-behaviors.
(a) neck_motion() Searching behavior that allows the robot to perceive its surroundings by moving the neck from left to right.
(b) store_detections() Detection of trained objects using computer vision techniques.
(c) compute_properties() Computation of the geometric properties of the de-
tected objects.
2. estimate_location() Estimation of the robot’s current location in the trained
path, or vector field.
3. correct_orientation() Correction of the robot’s orientation to match the po-
sition in the estimated current location.
Each of the elements in the previous steps is described in detail in the following
subsections. The store_detections() method was in charge of object detection. Both
of the object detection approaches introduced in the Platform chapter are explained in
the following subsections: Spikenet and SURF. They are explained separately in order
to compare their capabilities and justify the transition.
4.3.1 Object detection: Spikenet
During the first stage of the project, the object detection process was achieved with
Spikenet [11]. Spikenet is fundamentally an event-driven algorithm that efficiently models large networks of spiking neurons by processing only the output of firing neurons. The set of parameters that could be specified for the neural network included a confidence threshold for positive detections, and a color space (RGB or B&W) specifying the dimension of the vector space.
The construction of object models was done manually during an off-line training process. The complete procedure is described in the following sequence:
1. Collection of images from the environment. First, a dataset of images from the required room was gathered. The robot camera was used to collect the images, by turning the robot -in standing position- 360 degrees at spots distributed all around the room. This approach was intended to collect the images required to build paths between any two points inside the room. Nevertheless, further experimentation revealed that it was sufficient to collect images close to the desired path, with the robot turning its head over a range of 120 degrees instead of 360 while acquiring images.
2. Creation of models from objects. After the images were collected, they were opened with the Spikenet Graphical User Interface (GUI) in order to manually encircle the desired objects in each image. The objects were assigned descriptive names such as TV, Computer, Microwave, etc. In addition, the range of model sizes and rotations to be generated was specified. For example, Spikenet could search for a model between 50-200% of the size of the original model, and within a rotation of -12 to +12 degrees in steps of 6 degrees.
3. Continuous detection of objects. A script was then run to initialize the robot architecture, and the object-models file was loaded into the Spikenet object detection module. Every incoming image was then searched for the presence of any of the built models. In the case of a detection, the location of the centroid of the detected object, its size relative to the trained model, and its rotation angle on the image plane were calculated. This computation was done for every detection -in the case of multiple simultaneous detections- and the results were sent to the navigation modules. An example of Spikenet detections can be seen in figure 4.3.
4.3.2 Object detection: SURF
SURF (Speeded-Up Robust Features) [4] is a feature detector and descriptor. It has the advantage of being an on-line, scale- and rotation-invariant method. A K-Nearest Neighbors (K-NN) implementation from OpenCV was used for matching the detected keypoints or descriptors. SURF has shown greater robustness when tested against other popular algorithms, including SIFT (Scale-Invariant Feature Transform) and GLOH (Gradient Location and Orientation Histogram) [4].
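A minimal sketch of this detection step is given below. It assumes the OpenCV 2.4-style Python bindings (in OpenCV 3 and later, SURF lives in the cv2.xfeatures2d contrib module); the ratio test and function names are illustrative assumptions rather than the exact thesis code:

import cv2

def extract_surf(image, hessian_threshold=300):
    # Grayscale conversion followed by SURF keypoint/descriptor extraction.
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    surf = cv2.SURF(hessian_threshold)  # cv2.xfeatures2d.SURF_create(...) in OpenCV 3+
    return surf.detectAndCompute(gray, None)

def detect_object(current_desc, model_desc, min_matches=7, ratio=0.7):
    # K-NN matching (k=2) of the model descriptors against the current image;
    # the detection is positive when enough distinctive matches survive the
    # ratio test (the thesis itself uses a plain minimum-matches threshold).
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = matcher.knnMatch(model_desc, current_desc, k=2)
    good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good) >= min_matches, good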
Figure 4.3: Spikenet detections - Visualization of detections made on incoming images
from Nao’s camera. Notice the low resolution of the used images. This resolution was
required to handle the expensive computations required by Spikenet.
Initially, images were manually trimmed to contain only the desired objects (See figure 4.4). Then all the images of the same object were moved to a folder with a descriptive name, such as fridge, computer, sofa, etc. This process was tedious and slow. The creation of a GUI was considered to speed up this process. Nevertheless, an alternative approach -one that could be implemented much faster- was sound and worth trying: to consider every full image from the training dataset as a single object. No trimming would be done anymore.
Figure 4.4: SURF detections - Visual detection of trained objects using the SURF
implementation. Notice the algorithm’s robustness against rotation. The lines in the
bottom-left corner show some of the projections of the object’s keypoints on the X (hori-
zontal) and Y (vertical) image axis.
A drawback was that the trimmed images had an average size of 100 x 150 pixels, whereas the full images were 640 x 480 pixels, making their processing more computationally expensive.
An example of SURF detections considering full images as single objects can be seen in figure 4.5. The figure depicts two consecutive images taken by the robot, with the neck pitch at -10 and 10 degrees (top-left and bottom-left images respectively). The neck yaw angle is the same in both images. The detected objects are surrounded by squares, and some of them were found in both images.
Figure 4.5: Full images considered as objects - Visualization of detections consid-
ering full images as single objects. The images on the left are the incoming images while
navigating. The images on the right show the detected objects, or what the robot “re-
members” from the trained images. The squares on the left images surround the objects
detected. The deformation in the squares occurs when the detected object is rotated (See
the dashed squares in the bottom images). The dots are located on the keypoints found
by the SURF algorithm.
The SURF method from OpenCV requires the specification of some parameters. For
Table 4.1: Default SURF parameters
Parameter Value
Descriptors size 64
Hessian threshold 300
Octaves 3
Octave layers 4
the experiments described in this work the default values were used, as they provided
a satisfactory performance. The parameters are described next, and the default values
can be found in table 4.1:
1. Descriptors size. Basic descriptors contain 64 elements each, and extended descriptors 128 elements each. Increasing this parameter enhances accuracy at the cost of computational power.
2. Hessian threshold. Only features with a Hessian response larger than the specified value are extracted.
3. Octaves. Number of octaves to be used for extraction. With each successive octave the feature size is doubled.
4. Octave layers. Number of layers within each octave.
5. Minimum matches threshold. This threshold indicates the minimum number of keypoints in the current detection that must match a stored model in order to consider the detection positive.
The construction of object models from the image dataset was done only once, during the training of a path. The created models were stored at the end of the first training run. In subsequent training and testing runs the models were loaded instead of being built again. Loading the models was about 10 times faster than creating them at every run.
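A minimal sketch of this caching step, assuming the models are plain NumPy descriptor arrays and an illustrative folder layout (the file names and paths are assumptions):

import os
import numpy as np

def save_models(models, folder="trained_models"):
    # models: {image_name: SURF descriptor array of shape (num_keypoints, 64)}
    os.makedirs(folder, exist_ok=True)
    for name, desc in models.items():
        np.save(os.path.join(folder, name + ".npy"), desc)

def load_models(folder="trained_models"):
    # Reload the stored descriptors; roughly an order of magnitude faster than
    # re-extracting SURF features from every training image.
    models = {}
    for fname in os.listdir(folder):
        if fname.endswith(".npy"):
            models[fname[:-4]] = np.load(os.path.join(folder, fname))
    return models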
The complete procedure is described next:
1. Collection of images from the environment. Similar to the procedure followed for Spikenet, a dataset of images was gathered. This time the area for
collecting images was reduced to the path to be trained, instead of locations spread around the entire room. The robot camera was used to collect the images. The robot turned its head 120 degrees, in steps of 30 degrees from left to right, while in the default standing position from Choregraphe.
The spots for taking images were distributed along the path to be trained. The distance between them was twice the distance the robot would walk while navigating. That is, if during the testing phase the robot was required to walk 25 cm every time it looked around to estimate its location, then the images were taken every 50 cm.
2. Creation of models from objects. The models were built with the SURF im-
plementation by creating a set of descriptors for each of the collected images. The
training process was performed in the initializer of the object detection module.
Then the system continuously extracted the SURF features of the incoming images and compared them with the trained models.
3. Continuous detection of objects. After the robot had moved its head, the SURF module had to wait around 1 second to obtain a stable image. Without this pause the image was blurry, and the SURF module was not able to detect salient features in blurry images. Salient features depend on the contour lines found by the Hough transform, and in blurry or swept images few -if any- contour lines are found.
The system concludes the presence of a trained object if the number of detected features matching a particular model is above a user-specified threshold. In the case of several trained objects matching the same detected SURF features, it displays all of them.
4.3.3 Calculation of an object’s angle
For the calculation of the angle of a detection in an incoming image, it is necessary to
know the camera’s focal distance. This value is not included in the hardware documen-
tation. Nevertheless, it is mentioned that the camera has a fixed focal distance, and
that the camera’s diagonal angle is 58 degrees. Knowing the camera’s diagonal angle
and the image size (in pixels), the focal distance can be calculated with the following
procedure:
h = \sqrt{\left(\frac{x}{2}\right)^2 + \left(\frac{y}{2}\right)^2} = \sqrt{\frac{x^2}{4} + \frac{y^2}{4}} = \sqrt{\frac{x^2 + y^2}{4}} = \frac{\sqrt{x^2 + y^2}}{2} \qquad (4.1)
Where h is the hypotenuse (pixels from the center of the image to the corner), x is
half the length of the X axis, and y is half the length of the Y axis (See figure 4.6). With
the value of the hypotenuse and the angle of the image diagonal, the focal distance is
computed as follows:
\tan\theta = \frac{h}{f_{dist}} \qquad (4.2)
Where θ is half of the angle of the image diagonal, i.e. the angle from the center of
the image to the corner. fdist is the camera’s focal distance. By substituting equation
4.1 in equation 4.2 we obtain:
f_{dist} = \frac{\sqrt{x^2 + y^2}/2}{\tan\theta} = \frac{\sqrt{x^2 + y^2}}{2\tan\theta} \qquad (4.3)
Figure 4.6: Object detection components - Angle of the X and Y components of the
detected object centroid. The blue square represents the camera sensor-plate. The object
is represented by the circle and the origin of the angle is measured from the focal point of
the camera lens.
Once the focal distance is known, it is possible to calculate the angle of any pixel along the X (horizontal) or Y (vertical) axis. For the calculation of the angle θY of a given pixel along the Y axis, the following method was used:
\tan\theta_Y = \frac{Y - y}{f_{dist}}, \qquad \theta_Y = \arctan\frac{Y - y}{f_{dist}} \qquad (4.4)
By substituting equation 4.3 in equation 4.4 we obtain:
\theta_Y = \arctan\frac{Y - y}{\sqrt{x^2 + y^2}/(2\tan\theta)} \qquad (4.5)
Where Y is the total distance from the bottom of the image to the given pixel (in
pixels), and y is half the length of the Y axis (also in pixels). Therefore, the computed
angle can take positive and negative values. The same procedure was followed for the
calculation of the angle θX along the X axis of a given pixel.
Finally, the current head yaw and pitch angles were respectively added to the centroid's X and Y angles in the image. This computation provided the object angles relative to the robot's front.
4.3.4 Detect surroundings
The creation of a vector field, or a "path", consists of dividing the route that the robot should learn into a discrete number of intermediate points. At each point the robot "looks around" and finds which of the learned objects are present. The detection of several objects facilitates the disambiguation between multiple locations that share landmarks with similar properties.
The "look around" behavior made use of the full mobility range of the neck's yaw (horizontal) and pitch (vertical). The behavior was performed by placing the robot -in the default stand position- at each location, and making it move its head from left to right. The yaw movement was from -60 to 60 degrees, in steps of 30 degrees. At each of the yaw steps it searched for models at 10 and -10 degrees of head pitch. In total the "look around" behavior consisted of 10 head positions. This behavior can be summarized in the following pseudo algorithm:
declare head_yaw_start   # Initial head angle
declare head_yaw_steps   # Number of horizontal movements
declare head_yaw_change  # Angle to move the head at every step
declare head_pitch_up    # Angle to look up
declare head_pitch_down  # Angle to look down

look_around():
    head_yaw_position = head_yaw_start
    for every head_yaw_steps:
        set_head_yaw( head_yaw_position )
        set_head_pitch( head_pitch_up )
        detect_objects_in_image()
        set_head_pitch( head_pitch_down )
        detect_objects_in_image()
        head_yaw_position -= head_yaw_change
    store_detected_objects()
    compute_geometric_properties_in_detections()
After looking around, the robot relates the detected objects to the current location and stores them as a point in a Euclidean vector space. In this space, each dimension corresponds to one of the properties described in subsection 4.3.2. The storage of a location where the robot detected two objects, a painting and a computer, would look as follows:
painting_23: body_x=0.7215, body_y=-0.1789, cov_x=7560.8802,
cov_y=8944.1634, cov_xy=-5521.9053
computer_17: body_x=0.7564, body_y=-0.0618, cov_x=9383.1553,
cov_y=736.9513, cov_xy=-501.8768
The number after the object’s name is the specific instance detected. body x and
body y are respectively the horizontal and vertical angles of the image centroid, in
relation to the robot’s body. cov x, cov y and cov xy are the values of the covariance
matrix of the object’s keypoints, with respect to their projections on the image’s X and
Y axis (See figure 4.4).
4.3.5 Location estimation
A location is any point on the floor of a given room where the robot can walk. It is assumed that from any location in a room, it is possible for the robot to detect salient features in the incoming images from its camera. Such salient features, or keypoints, are generated by the texture produced in the images by furniture, decoration or the geometry of the room itself (lines between the floor, walls, ceiling and windows). The little "dots" created by the SURF algorithm represent the keypoints found on each detected object (See figure 4.4).
The following properties can be computed from the keypoints describing a trained object or image:
1. Variance. Variance of the keypoints' projections on the X and Y axes of the image.
2. Covariance. Covariance correlating the keypoints' projections on the X and Y axes of the image.
3. Centroid. Pixel coordinates of the keypoints' center of mass.
4. Angles. Angles of the keypoints' center of mass: relative to the camera lens, i.e. in the image, and relative to the robot's body.
5. Square coordinates. Pixel coordinates of the corners of a square fitted around the keypoints.
Each of the object’s properties represent a different dimension in its own vector
space. When storing the properties of the objects that the robot detected at a given
location, it is said that the location was stored. Finally, when storing a set of locations
in a vector field -between two desired points in a room- it is said that a path was trained
between such two points. The locations in the area at the end of the trained path are
called target locations. The rest of the locations in the trained path are called path
locations.1
During the testing phase, the robot should manage to re-run a trained path by
looking at the properties of the objects it can detect from its current location. Using
1 The terminology used for describing the mapping procedure was considered adequate for our algorithm, and is not standard in scientific literature.
this information it has to find its most probable location in the trained path. Then it has to correct its orientation according to the location found to be closest, and walk the same distance as between the locations in the trained vector field.
The representation of every location as a set of vectors allows the robot to estimate its most probable position when situated in a random place near the stored vector field. The vector describing the robot's current detections lies in an n-dimensional Euclidean space. The dimensionality n of this space depends on the number of features chosen to describe every object. For every detected object in a current detections vector, the nearest neighbor algorithm can find another instance of the same object in the trained location vectors, with the smallest Euclidean distance. Formally, the Euclidean distance is measured as follows:
d(p,q) = d(q,p) = \sqrt{(q_1 - p_1)^2 + (q_2 - p_2)^2 + \ldots + (q_n - p_n)^2} = \sqrt{\sum_{j=1}^{n} (q_j - p_j)^2} \qquad (4.6)
Where q is the vector describing a detected object, p is its counterpart stored in a trained location, and n is the total number of such features. Therefore, the most probable location will have the smallest average distance between the currently detected objects and their counterparts stored in the vector field. The formal representation of this method is given in the following formula:
\arg\min \frac{d(q,p_1) + d(q,p_2) + \ldots + d(q,p_k)}{k} = \arg\min \frac{\sum_{i=1}^{k} d(q,p_i)}{k} = \arg\min \frac{\sum_{i=1}^{k} \sqrt{\sum_{j=1}^{n} (q_j - p_{ij})^2}}{k} \qquad (4.7)
Where k is the number of detected objects. For each detected object, the robot looks up the closest match in the stored location vectors using only the keypoints' covariance matrix. More precisely, the variance of an object's keypoints along the X and Y axes of the image reflects the perceived size and orientation of the object. The location of keypoints in an image depends on its texture. When considering full images as objects, the distribution of the keypoints is uncorrelated. Therefore, the x
and y components of the keypoints were considered independent, and the covariance
matrix was calculated using an unbiased estimator:
\sigma_{ij} = \frac{\sum_{n=1}^{N} (x_{in} - \mu_i)(x_{jn} - \mu_j)}{N - 1} \qquad (4.8)
The values of the keypoint’s covariance matrix are descriptive features. They reflect
the appearance of visual landmarks at specific locations, irrespective of the robot’s ori-
entation when reaching those locations (or near them). The example on sub-subsection
4.3.5.1 clarifies the specific selection of this features.
In summary, if the robot is commanded to bring a drink, it will search for its current location in the paths it trained to go to the kitchen. Furthermore, inside those paths it will only compare the objects it is currently detecting with the locations containing those same objects. Even if a stored location contains more or fewer objects, it will consider only the matching objects. This makes the system robust against occlusion or displacement of some of the objects it was trained to perceive at a given location.
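A minimal sketch of this location-estimation step (equations 4.7 and 4.8) is given below. It assumes that every stored location maps object names to a small feature vector built from the keypoints' covariance values; the data layout and function names are illustrative, not the thesis code:

import numpy as np

def covariance_features(keypoints):
    # Unbiased (co)variance of the keypoints' x and y projections (equation 4.8).
    pts = np.array([kp.pt for kp in keypoints])         # (N, 2) pixel coordinates
    cov = np.cov(pts, rowvar=False)                      # 2x2 matrix, divides by N - 1
    return np.array([cov[0, 0], cov[1, 1], cov[0, 1]])   # cov_x, cov_y, cov_xy

def estimate_location(current, trained_path):
    # current: {object_name: feature vector}; trained_path: list of such dicts.
    # The most probable location minimizes the mean Euclidean distance over the
    # objects that both the current view and the stored location contain.
    best, best_score = None, float("inf")
    for idx, location in enumerate(trained_path):
        shared = set(current) & set(location)
        if not shared:
            continue                                     # no common objects, skip
        score = np.mean([np.linalg.norm(current[o] - location[o]) for o in shared])
        if score < best_score:
            best, best_score = idx, score
    return best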
4.3.5.1 Location disambiguation
Let us assume a path on a regular street, going towards a "STOP" traffic sign and then around it. If we saw the sign far away at the end of the street, it would look small and the keypoints describing it would be close to each other (with little variance on the X and Y axes). On the other hand, if we were standing close in front of it, it would look big in our field of view and its keypoints would be spread all over the perceived image (with large variance on the X and Y axes). Finally, if we were standing close with the "STOP" sign on our right, it would look like a long and thin vertical blob (with little variance on the X axis and large variance on the Y axis). However, it would look the same if we were standing close with the "STOP" sign on our left. How could both scenarios be disambiguated if no other landmarks are available?
It is appealing to use the angle of the object in our field of view to differentiate between the two cases mentioned above. At the locations 2 and 3 meters in front of the "STOP" sign, we would perceive it at the same angle. In such cases the location estimation is not misled by the use of the object's angle in the robot's field of view; it is enough to use the variance of the keypoints. Nevertheless, the locations near the
sign require us to turn in order to walk around the “STOP” sign. In this case the use
of angles may mislead the estimation of the current location. Why?
In the case of a location 0.5 meter in front of the sign (let us call it location B), we want to turn, for example, to the left. Hence, when training this location in the path we placed the robot facing 45 degrees to the left. Due to the nature of the training method, when the robot reaches location B, even if it perceives the sign with the right size (keypoints covariance), it may estimate the location 1 meter behind (let us call it location A) as closer. In location A the sign was trained to be in front of the robot (as currently perceived), whereas location B was trained with the sign on its right side (as it was facing 45 degrees to the left). The discrepancy between the currently perceived object angle and the angle stored in location B may be greater than the discrepancy between the currently perceived object size and the size stored in location A, leading the nearest neighbor algorithm to consider location A closer than location B. This confusion is avoided by feeding only the covariance matrix of the keypoints to the nearest neighbor algorithm.
4.3.6 Correcting orientation
At every location, the angle of the detected objects is used to re-align the robot accord-
ing to the trained vector field. The robot is assumed to always perceive several objects
and/or visual features from any location. Therefore, if a few objects have changed
their position in the domestic setting, the average angle correction will not change
significatively. This is a key feature of this algorithm.
An obstacle avoidance module will prevent the robot from colliding with any moving
objects during navigation. Therefore, it is only important that periodically the robot
corrects its orientation to avoid fixed obstacles (walls, large pieces of furniture). There
is no need to follow a path with high precision. If the robot is aligned imprecisely
towards the goal, this will not avoid converging to the target area.
During the training of a path, at every location the robot was positioned in the direction required to avoid static obstacles. Consider the example "U"-shaped path depicted in figure 4.7. The path starts at the circle and ends at the cross. The green squares represent the training locations and the pink stars represent the objects it could detect at each point.
Figure 4.7: Example path - Example “U” shape path that the robot learnt. It starts
at the circle and ends at the cross. The green squares represent the training locations and
the pink stars represent the objects it could detect at each point.
At the 3rd and 4th locations from the starting point, the robot was trained to turn 90 degrees to the left. This turning behavior was achieved by storing those locations with the robot facing in the direction it would have after finishing the turn.
During the testing phase, the robot was placed at the starting location. It "looked around" to determine which objects it could detect from that location. Once the angles of the detected objects were computed, they were compared with the objects corresponding to the first location in the trained path. The correction angle was the average difference between the angles of the matching objects. This angle allowed the robot to correct its current orientation and face the direction it had when training the path. The formal representation of this algorithm is as follows:
g(q,p) = \frac{(q_1 - p_1) + (q_2 - p_2) + \ldots + (q_k - p_k)}{k} = \frac{\sum_{i=1}^{k} (q_i - p_i)}{k} \qquad (4.9)
Where q is the vector containing the detected objects' angles (relative to the robot's body), p is its counterpart stored in a trained location, and k is the number of detected objects.
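A minimal sketch of this orientation correction (equation 4.9), assuming the same dictionary layout as the location-estimation sketch above; the names are illustrative:

def correction_angle(current_angles, stored_angles):
    # Both arguments map object names to angles (in degrees) relative to the
    # robot's body; the correction is the mean difference over shared objects.
    shared = set(current_angles) & set(stored_angles)
    diffs = [current_angles[o] - stored_angles[o] for o in shared]
    return sum(diffs) / len(diffs) if diffs else 0.0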
Finally, the robot walked a predefined distance between each pair of locations. This process was repeated until the final -or target- location was reached.
Chapter 5
Experiments
This chapter describes the set of experiments performed to test the robustness of the visual-topological mapping algorithm. These tests explore the underpinnings of the computational methods and their effectiveness in real-world environments.
5.1 Experiment 1: Path formation
As a starting point, the robot learnt to traverse a simple "U"-shaped path. Such a path is depicted in figure 5.1. The robot started at the circle and finished at the cross. The squares represent the trained locations and the stars the positions of the objects it could recognize. During the training procedure, the robot was manually placed at each of the locations to be stored.
Two effects are explored in this first experiment:
1. Convergence of the robot to path and target locations. This experiment explores the consequences of varying the number of intermediate locations stored in a path between two points. In particular, the effect of accumulated odometric error between locations was observed.
2. Correction of the robot's orientation. It is necessary to test the effectiveness of the correction of the robot's orientation, given that the robot knows its actual current location in the path. The precise algorithm for the correction of the robot's orientation was detailed in subsection 4.3.6.
The objectives of this experiment require the following constraints to be observed:
Figure 5.1: Trained “U” path - “U” shape path that the robot has to memorize. The
robot starts at the circle and ends at the cross. The green squares represent the training
locations and the pink stars represent the objects it could detect at each point.
1. The robot was provided with its correct current location. Providing this information to the robot allowed testing the performance of the correct-orientation behavior. During the training phase, each location was written on one line of a file, so the file contained as many lines as intermediate locations in the path. For example, at location 3 the robot would compare its current detections with the ones stored in line 3 of the trained path file, and so on.
2. The trained path was tested in a maze free of obstacles. In this way the variance of the Nao's position, with respect to the trained positions, could be observed at every location without interference from obstacle avoidance.
3. Objects placed along the path were never occluded. For this experiment thirty objects were placed along the path. The only factors preventing the robot from detecting them would be having the objects further than 1.5 m away, perceiving them turned more than 40-50 degrees, or simply having them out of the field of view.
A high-precision measurement system, such as a zenithal camera orthogonal to the floor, was not available in the laboratory at the time this experiment was run. Nevertheless, a
set of images was taken at the end of each run. These images allowed the comparison of the accumulated odometric error at every location in both paths.
Two paths were stored during the training procedure: the first one using 6 intermediate locations (90 cm between each), and the second one using 11 intermediate locations (45 cm between each).
The starting position of the robot was the same for all runs in both paths. The position of the robot was marked at each of the locations where it stopped by drawing a line behind its feet. These lines showed the variation in the position of the robot -in distance and angle- between the training and testing phases.
Figures 5.2 and 5.3 show the variance in the position of the robot at every location in both paths. The robot re-ran the 6-locations path 3 times, and the 11-locations path 2 times.
Figure 5.2: 6-locations path - Variance at each location for the 6-locations path. The
white arrows show the training places and the robot orientation in them. The black arrows
indicate the robot position at each location during testing. Each foot was placed on top of
each arrow.
5.1.1 Conclusion for path formation
It was corroborated that correcting the robot's orientation is a robust approach for traversing a path. If the output of the location estimation algorithm is close enough to the actual location of the robot, the robot can be expected to effectively adjust its orientation.
Figure 5.3: 11-locations path - Variance at each location for the 11-locations path. The white arrows show the training places and the robot orientation in them. The black arrows indicate the robot position at each location during testing. Each foot was placed on top of each arrow.
It can be observed in figures 5.2 and 5.3 that the variance in the angle of the robot when arriving at the intermediate locations does not improve drastically between the 6- and the 11-locations paths. Nevertheless, the variance in the distance to the trained spots clearly decreased when increasing the number of training locations from 6 to 11.
This phenomenon is expected, as the error in the robot's orientation accumulates as the traveled intermediate distance increases. The results observed in this experiment are promising for accurately following a path, especially if the time for location estimation can be reduced. For example, if the current location of the robot were estimated 10 times per second, instead of every 45 cm, the precision in following a trained path should increase dramatically.
5.2 Experiment 2: Location estimation
During the first stage of the project, a small experiment was carried out to verify the robustness of our approach to the kidnapped robot problem. Its results provided a guideline for formulating the current location estimation experiment. We will call this first-stage experiment where am I?. Its steps are described as follows:
1. Assign objects to different rooms. The names and properties of several rooms were specified in advance in a file. Such rooms were represented by binary presence/absence (1 or 0 respectively) vectors describing the state of all the learned objects. Most objects differed between rooms, although some rooms shared a few objects in common.
2. Estimate current location. Then the robot was shown a set of the learned objects. It would send a feature vector describing its current detections to the nearest neighbor module. The system would infer that the current location corresponded to the room vector closest to the current detections vector, and the robot would speak the room's name out loud.
3. Forget objects no longer detected. Finally, the robot had a "forgetting" period for every object, based on a predefined time. If the robot did not detect an object after some seconds, it adjusted the corresponding value in its current detections vector, affecting in this way the output of the nearest neighbor module.
In the particular case of not detecting any object, the robot stated the need for more information to estimate its location. In a navigation system, when the robot finds itself in this state, it can start an exploration behavior supported by the obstacle-avoidance module.
This second experiment was intended as a deeper exploration of the functionality of the nearest neighbor method in the proposed mapping system. The trained location vectors and the current detections vector no longer consisted of binary values describing the status of all the trained objects at once. Now the vectors contained the object properties described in subsection 4.3.4: body_x, body_y, cov_x, cov_y and cov_xy.
The robot calculated its most probable current location by searching the vector space of a specific path. The output was the trained location vector with the smallest Euclidean distance to the current detections vector.
In the previous experiment (section 5.1), all of the robot's detections were stored in separate files during the training and testing phases. Therefore, it was possible to compare each of the detection vectors with the trained location vectors, and to find whether the nearest neighbor output was exactly the actual location of the robot, or one of the neighboring ones.
To provide homogeneity to the data, a new 9-locations path was trained in the same maze. The path was tested 30 times. The trained locations file and the current detections files were now used as training and testing datasets respectively. Each of the 9 observations -or detection vectors- from the 30 runs was fed to the nearest neighbor algorithm. A classification was considered successful when:
1. It was exactly the location of the robot when that detection vector was stored.
2. It was within n locations before or after the actual location of the robot when that detection vector was stored.
Using the nearest neighbor method defined in formula 4.7, and considering the
constraints defined above, the success rate for the location estimation behaved as seen
in figure 5.4.
When zero neighbors are considered for success, i.e. only when the output of the nearest neighbor algorithm is identical to the current location, the success rate has its lowest value. When n neighbors adjacent to the current location are considered as successful outputs, the success rate increases dramatically.
The different curves indicate which object properties were removed before sending the detections vector to the nearest neighbor algorithm. The object properties used in all four curves were the variances of the keypoints' X and Y components.
5.2.1 Conclusion for location estimation
It can be seen in figure 5.4 that even in the case of 0 neighbors, the algorithm's performance is well above the randomness threshold. The accuracy improves further with the use of the first pair of neighbors. These results provide the ground for the following experiment, where the path formation and location estimation methods are tested together.
Figure 5.4: Estimation of the robot's location in a vector field - The graph shows the effect of increasing the number of neighboring locations -relative to the actual position of the robot- that are considered successful estimations. When zero neighbors are considered for success, i.e. only when the output of the nearest neighbor algorithm is identical to the current location, the success rate has its lowest value. When n neighbors adjacent to the current location are considered as successful outputs, the success rate increases dramatically. The different curves indicate which object properties were removed before sending the detections vector to the nearest neighbor algorithm. All four curves used the individual variances of the keypoints' X and Y components for the estimation of the robot's location. In a different setting, the use of angles may decrease the success rate more dramatically than in this experiment. Sub-subsection 4.3.5.1 describes an example of a setting sensitive to the use of angles for location estimation.
It is possible that these optimistic results are due to the particular configuration of the maze used for this experiment. While the robot was walking along the path, some of the closest objects on its side were only detectable from specific locations.
On the other hand, these results demonstrate the effectiveness of the method when there are enough objects spread in the environment to disambiguate a location from the rest. The results found in this experiment showed a satisfactory performance in an artificial environment. It was now time to test the mapping algorithm in the real world.
5.3 Experiment 3: Visual navigation
At this point it was possible to test the combined performance of the robot’s visual
perception, the correct orientation behavior, and the location estimation provided by
the nearest neighbor algorithm. Two real scenarios were used for testing purposes. The
first one was a path towards an elevator with a few visual landmarks in the surroundings.
The second path was built in a lounge with kitchen and living room facilities.
Both scenarios were used to test the convergence of the robot to the target location. In the second scenario, an optimized version of the navigation algorithm was tested. This optimization made it possible to conduct four times more trials.
The training procedure, for both scenarios, was similar to the procedure used in
the first experiment (Section 5.1). The main difference was that the full images coming
from the robot’s camera were considered as objects. The images were not manually
trimmed anymore. The procedure consisted of the following steps:
1. First a set of images was taken with the robot, at several points along the path
to be created. 431 images were taken for the elevator path and 352 for the lounge
path.
2. After creating models of the set of images, the path or vector field was trained.
The robot was manually placed at different locations facing in the desired direction. In this way the vector field leading to the target location was formed. See
figure 5.5. 84 locations were trained for the elevator path and 64 for the lounge
path.
3. Finally the robot could be placed at any point near the vector field, and it would
start moving towards the indicated target area.
4. The stopping criterion was defined by the set of trained locations around the target area. Such locations were stored as target locations and the rest of the trained points as path locations. If the current detections vector was closest to one of the target location vectors, the robot assumed it had reached the target area and stopped.
The robot described trajectories with some deviations from the shortest line between the starting and target points, as seen in figure 5.6. A comparison of the characteristics and results of both experiments can be seen in table 5.1.
Figure 5.5: Stored vector field - Area where the vector field was trained. The elevator
door was the target location.
Figure 5.6: Robot trajectory - One of the zig-zag paths described by the robot during
testing phase of the Elevator path.
Table 5.1: Visual navigation results

 | Elevator path | Lounge path
Trained objects | 431 | 352
Vector field locations | 84 | 64
Vector field density | 9 locations/m2 | 6-14 locations/m2
Convergence to target | 0.8 | 0.8
5.3.1 Elevator path
On the first scenario, the robot was trained to reach the elevator door from one of the
corridors at the Artificial Intelligence department. This location was specially challeng-
ing for the SURF algorithm. The walls lacked enough visual texture from decoration
to generate keypoints, or regions of interest. Nevertheless, the SURF algorithm created
enough models of the scene images to conduct the robot to the goal.
The path locations had no ordered sequence. For this reason it was not possible to measure the location-estimation success rate as in the second experiment (Section 5.2). Nevertheless, this procedure provided the desired speed and flexibility for the training procedure. The success rate was now defined as the number of times the robot converged to the target location, with respect to the total number of trials.
The robot converged to the target location 4 out of 5 times (see table 5.1). The single unsuccessful run resulted from situational factors: while taking a corner, the robot got an arm stuck and, consequently, lost its balance and fell. The small number of trials is due to the time required for training and testing the mapping algorithm. It took approximately 1.5 minutes to conclude each location-estimation step, and 40-50 steps before reaching the goal location. This meant around 60-75 minutes per run.
5.3.1.1 Conclusion for elevator path
Even though the result shown in figure 5.7 seems satisfactory, it has a low statistical
significance due to the low number of trials. The time required for each experiment was
impractical. It is common to find a small number of trials for experiments in robotics
papers. Nevertheless, having only five trials yielded unreliable results.
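The confidence limits shown in figures 5.7 and 5.9 can be approximated with a binomial interval. The following is a hypothetical sketch using a Wilson score interval; the thesis does not state which interval was used, so the exact numbers are illustrative:

import math

def wilson_interval(successes, n, z=2.576):
    # Wilson score interval for a binomial proportion; z = 2.576 corresponds
    # roughly to a 99% two-sided confidence level (alpha = 0.01).
    p = successes / float(n)
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half

print(wilson_interval(4, 5))    # about (0.28, 0.98): wide limits for 5 trials
print(wilson_interval(16, 20))  # about (0.51, 0.94): tighter for 20 trials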
Figure 5.7: Confidence limits for Elevator path - Confidence limits for a recognition
rate p=0.8 as a function of N=5 samples and level of confidence alpha=0.01.
For this reason the training and testing procedures were improved by dynamically reducing the search space. By searching only for the objects near the estimated location of the robot, the look-around behavior was sped up 4-5 times. The complete description of this improvement is given in the following subsection.
5.3.2 Lounge path
For this scenario the training and testing procedures were improved. The nearest
neighbor algorithm was used to reduce the search space of the robot while navigating.
One of the trajectories described by the robot in this scenario can be seen in figure 5.8.
The optimization procedure consisted of the following steps:
1. The first location estimation at the beginning of every experiment was performed in the same way as in the previous scenario. The robot compared its current detections against the full set of trained locations. At the same time it searched for all of the trained images in the incoming image. The output of the algorithm was the most probable location of the robot.
2. After the first location estimation, the robot calculated the l closest locations to the location found in the previous step, with l = L/4, where L is the total number of trained locations in the path and l is the subset of L closest to the location estimated at the beginning of every run.
Figure 5.8: Lounge path - Example of one of the trajectories described by the robot when searching for the fuzzy target location Livingroom. The top four images show an uncluttered section of the trained path. Here the robot did not require a dense vector field to successfully describe the desired route (6 locations/m2). In the bottom 10 images the environment was more cluttered, and the robot required a dense vector field to navigate smoothly (14 locations/m2).
3. For the second and subsequent location estimation steps, the robot searched only for the objects contained in the l closest locations. It then compared its current detections vector only against the same l closest locations.
4. Step three was repeated continuously until one of the closest locations was a target location. In that case all the target locations were included in the search space at every step, and the value of l was increased to l = L/2 (a sketch of this procedure is given below).
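The search-space reduction can be sketched as follows; closeness between locations is approximated here by index order along the path, which is an assumption of this sketch rather than a detail from the thesis, and the data layout follows the location-estimation sketch in section 4.3.5:

import numpy as np

def mean_distance(current, location):
    # Mean Euclidean distance over the objects shared by both views.
    shared = set(current) & set(location)
    if not shared:
        return float("inf")
    return float(np.mean([np.linalg.norm(current[o] - location[o]) for o in shared]))

def restricted_estimate(current, trained_path, prev_idx, target_ids):
    # Only the l stored locations nearest to the previous estimate are searched;
    # once a target location enters the neighborhood, all targets are included
    # and the neighborhood is widened from L/4 to L/2.
    L = len(trained_path)
    order = sorted(range(L), key=lambda i: abs(i - prev_idx))
    candidates = set(order[:L // 4])
    if candidates & target_ids:
        candidates = set(order[:L // 2]) | target_ids
    return min(candidates, key=lambda i: mean_distance(current, trained_path[i]))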
The reasons for the inclusion of all the target locations and the increase of l under the final-step conditions were experimental. Sometimes the robot was not able to find the objects at the end of the path. On the other hand, it would sometimes find itself in a target location while still being a bit far from the target area. What were the reasons for this behavior of the algorithm?
Sometimes the robot was not looking for some of the objects in the target area, because the l estimated closest locations did not contain them. For this reason all the target locations were included in the search space as soon as one of the closest locations was a target location.
The inclusion of all the target locations gave a greater weight to the target locations in proportion to the included path locations. For this reason, the robot sometimes found itself in a target location while still being a bit far from the target area. Therefore, the increase of l balanced the search space again.
The parameters and performance of the mapping algorithm in the lounge path experiment can be seen in table 5.2. The first 10 runs used a SURF minimum matches threshold of 7 and the last 10 runs a threshold of 4 (for a description of this parameter see subsection 4.3.2). The intention of this change was to test the effect of an increased number of detections on the mapping algorithm. There was no significant difference in the results when changing the value of this parameter.
5.3.2.1 Conclusion for lounge path
Of the 20 runs completed in this scenario, 4 were not considered successful. On 2 occasions the robot could not find the objects near the target area. This problem motivated the inclusion of all target locations and the increase of l, as explained in the fourth step of the optimization procedure. The other 2 unsuccessful runs were due to the robot walking towards a white shelf.
Table 5.2: Optimized navigation results

Run | Success | Distance between locations (m) | SURF matches threshold | Total steps | Total time (min) | Comments
1 | good | 0.25 | 7 | 18 | 13.00 |
2 | bad | 0.25 | 7 | 29 | 21.00 | Walked towards white object shelf. Obstacle avoidance from sonars can avoid this fault.
3 | good | 0.25 | 7 | 49 | 36.00 |
4 | bad | 0.25 | 7 | 26 | 19.00 | Reached target area but did not stop. Late and getting dark.
5 | good | 0.25 | 7 | 24 | 24.28 |
6 | good | 0.25 | 7 | 22 | 20.00 |
7 | good | 0.25 | 7 | 17 | 16.08 |
8 | good | 0.25 | 7 | 25 | 21.67 |
9 | bad | 0.25 | 7 | 24 | 18.00 | Reached target area but did not stop.
10 | good | 0.25 | 7 | 17 | 16.18 |
11 | good | 0.25 | 4 | 22 | 19.07 | SURF threshold reduced for darker conditions.
12 | good | 0.25 | 4 | 22 | 20.32 |
13 | good | 0.25 | 4 | 19 | 21.09 |
14 | good | 0.25 | 4 | 19 | 14.89 | Finished a bit far.
15 | good | 0.25 | 4 | 35 | 26.26 |
16 | good | 0.25 | 4 | 21 | 22.57 |
17 | good | 0.25 | 4 | 37 | 22.35 |
18 | good | 0.25 | 4 | 20 | 20.83 |
19 | bad | 0.25 | 4 | 15 | 18.74 | Walked towards white object shelf.
20 | good | 0.25 | 4 | 18 | 20.17 | Obstacle avoidance implemented.
Average | | | | 23.95 | 19.925 |
The interruption of navigation when the robot is facing a flat surface can be overcome
with obstacle avoidance. Obstacle-avoidance behavior was only implemented at the end of
the trials: initially it was not considered necessary for this experimental setup, but
the experiments showed it to be a useful complement to the proposed visual-topological
mapping algorithm.
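The kind of check that was added is sketched below: a step is only taken when the ultrasound sensors report enough free space ahead. The helper functions passed in (for reading the sonars and commanding motion) are hypothetical placeholders, not the interface actually used on the Nao, and the safety distance is an illustrative value.

```python
# Minimal obstacle-avoidance sketch (hypothetical sensor/motion interface).
SAFETY_DISTANCE = 0.35  # meters; illustrative value, not the tuned threshold

def safe_to_step_forward(get_sonar_distances):
    """True when both ultrasound readings report enough free space ahead."""
    left, right = get_sonar_distances()
    return left > SAFETY_DISTANCE and right > SAFETY_DISTANCE

def step_with_avoidance(move_forward, turn, get_sonar_distances, step_length=0.25):
    """Take one navigation step, turning away from the closer obstacle if needed."""
    if safe_to_step_forward(get_sonar_distances):
        move_forward(step_length)
    else:
        left, right = get_sonar_distances()
        # Turn towards the side with more free space before trying again.
        turn(0.5 if left > right else -0.5)  # radians; sign convention assumed
```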
Even though the number of trials increased, the success rate remained the same (see
figure 5.9). Nevertheless, the confidence limits became tighter, giving more statistical
support to the estimated success rate. A global discussion of the results from the
previous experiments is given in the final chapter.
Figure 5.9: Confidence limits for Lounge path - Confidence limits for a recognition
rate p=0.8 as a function of N=20 samples and level of confidence alpha=0.01.
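The confidence limits in figure 5.9 can be reproduced with a standard binomial interval. The exact formula used for the figure is not restated here; the sketch below uses the Wilson score interval as one common choice, with 16 successes out of N = 20 runs (p = 0.8) and alpha = 0.01.

```python
# Wilson score confidence interval for a binomial success rate (one common choice;
# not necessarily the formula used to produce figure 5.9).
from math import sqrt
from scipy.stats import norm

def wilson_interval(successes, n, alpha=0.01):
    """Return (lower, upper) confidence limits for a success proportion."""
    z = norm.ppf(1.0 - alpha / 2.0)
    p = successes / n
    denom = 1.0 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

print(wilson_interval(16, 20, alpha=0.01))  # 16 successful runs out of 20
```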
Chapter 6
Discussion
The experiments conducted in this research project explored the robustness of the
proposed visual-topological mapping algorithm for indoor navigation with the robot
Nao. The results suggest that the mapping algorithm is a promising approach that
can be expanded into a full navigation system. Such a system could be realized as a
hybrid architecture that incorporates obstacle avoidance, exploration and dynamic
motion planning.
The visual-topological mapping algorithm can be applied in settings other than the
domestic one. It should work in any place with salient visual features, where the
environment is not too uniform. The only places where traditional approaches should
outperform this method are constrained (indoor) environments without enough texture in
the surroundings, e.g. plain white walls with scarce decoration. In such circumstances,
it is feasible to complement the visual-topological mapping with existing alternatives
based on odometry, proximity sensors (sonars or range finders providing 3D point
clouds) or visual detection of the environment geometry (detection of corridors).
During the RoboCup@Home championships, the RUG team will not only use the proposed
visual algorithm, but will merge it with a more traditional approach using odometry
and a 3D point-cloud sensor. This combination is an attempt to overcome the possible
difficulties described in the previous paragraph. Autonomous robots should be able to
navigate in diverse indoor environments, including scenarios with plain walls and
scarce decoration.
It is important to stress the robustness of this visual algorithm against changes in
the environment. It assumes a dynamic, yet smoothly changing, world: some objects in
the environment may change and the robot is still able to estimate its current
location. At the moment of writing this thesis, the author did not find other cases in
the scientific literature of visual algorithms coping with this kind of occlusion. On
the other hand, an abrupt change in the environment's visual features leads to
uncertainty about the current location. Some possibilities for studying the degree of
robustness against visual occlusion are given in the following subsection.
As in any other research project, the adequacy of testing platforms and locations is
reflected in the time required to obtain experimental results. During this project,
the availability of robots increased from one Nao to five Naos, three wheel-based
platforms and hardware for parallel processing. This significant improvement could be
enhanced even further by a designated area with domestic facilities. Such an
experimental setup would allow greater control over the environment (lighting,
furniture, people walking around) and guaranteed access to testing facilities all year
round.
6.1 Future work
Machine learning algorithms can be added at several steps of the visual-topological
navigation algorithm: from exploring behaviors, improved localization and mapping, and
automated discrimination between useful landmarks (a sofa, windows) and misleading
objects (a suitcase, a person), to a fully automated visual SLAM. After all, the
long-term goal in the field of autonomous robotics is to achieve the autonomy found in
natural systems.
It is also important for the robot to learn which regions are useful for visual
exploration, i.e. where to search for landmarks. In different settings, it could be
more informative to search near the ceiling, around the horizon or even on the floor.
The effect of visual occlusion on the proposed algorithm is an important aspect that
could be studied further. In the case of scene sections considered as single objects,
what proportion of the field of view does the algorithm require to successfully estimate
the robot's location? Further, what is the precise effect of the number of keypoints in
a scene when testing different degrees of occlusion?
A possible alternative to the use of full images as objects is automatic clustering.
At the moment of writing, research was being conducted on the use of salient visual
features that do not necessarily correspond to a specific object. The robot builds
visual models by applying clustering algorithms to the image keypoints, removing the
need for tedious, handcrafted training images.
Preliminary results show that a hierarchical (dendrogram-based) clustering algorithm
performs very well for this automatic clustering. Since this bottom-up procedure
creates many more models than those used in this work, the robot is expected to be more
robust against changes in the environment. On the other hand, the increase in the
number of models requires more time or computational power for processing the incoming
images.
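As an illustration of this direction, the sketch below groups keypoint descriptors with SciPy's agglomerative (dendrogram-based) clustering and cuts the tree at a distance threshold to obtain candidate visual models. The descriptor source and the threshold value are illustrative assumptions and do not reproduce the ongoing work referred to above.

```python
# Sketch: building visual models by bottom-up clustering of keypoint descriptors.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def cluster_descriptors(descriptors, distance_threshold=2.0):
    """Group keypoint descriptors into candidate visual models (bottom-up clustering)."""
    tree = linkage(descriptors, method="ward")       # builds the dendrogram
    labels = fcluster(tree, t=distance_threshold,    # cut the dendrogram at a height
                      criterion="distance")
    return {k: descriptors[labels == k] for k in np.unique(labels)}

# Stand-in for SURF descriptors (64-dimensional) extracted from training images.
models = cluster_descriptors(np.random.rand(200, 64))
print(len(models), "candidate visual models")
```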
In the next months, automated pruning algorithms will be added to the system to reduce
the number of vectors searched in the nearest-neighbor space. There is no need to
constantly estimate the robot's precise location in an entire building if the robot
already knows it is located in a certain room or subsection. At the moment there is no
need for this procedure, but the necessity will arise when the robot starts moving in
areas other than in and around our robot laboratory. The pruning algorithm is expected
to keep the nearest-neighbor vector space lean and searchable in real time.
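A minimal sketch of such pruning is given below, assuming every trained location vector is tagged with the room it belongs to; the room tags and the per-room KD-tree are illustrative choices rather than the planned implementation.

```python
# Sketch: pruning the nearest-neighbor search space by room (illustrative only).
import numpy as np
from scipy.spatial import cKDTree

class PrunedLocationIndex:
    """Keeps one KD-tree per room so that queries only touch the relevant subset."""

    def __init__(self, location_vectors, room_labels):
        self.vectors, self.trees = {}, {}
        for room in set(room_labels):
            subset = np.array([v for v, r in zip(location_vectors, room_labels) if r == room])
            self.vectors[room] = subset
            self.trees[room] = cKDTree(subset)

    def closest_locations(self, detection_vector, current_room, l=3):
        """Indices of the l closest trained locations, searched only within the current room."""
        k = min(l, len(self.vectors[current_room]))
        _, idx = self.trees[current_room].query(detection_vector, k=k)
        return np.atleast_1d(idx)
```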
The nearest-neighbor success rate should have an optimum with respect to the location
and landmark densities in a given scenario, that is, the chosen distance between trained
locations in a vector field and the spread of the landmarks in the room. Answering this
optimization problem could greatly speed up the training procedure.
6.2 Conclusion
The algorithm described in this text provides a promising alternative for navigation in
a natural and changing environment. It offers a method in which a humanoid robot uses a
passive sensor to detect visual features, and these features allow the recognition of
trained objects for localization and mapping. These characteristics were observed
through experimentation, and the experiments answered the initial research questions.
Is it possible to implement a visual-topological mapping algorithm with the robot
Nao, relying on its embedded motor, sensing and computing capabilities? Certainly, the
nature of computer vision algorithms requires more computing power than that found in
the Nao robot. The initial attempt to navigate with low-resolution images proved
insufficient to recognize objects a few meters away from the robot.
Is it possible to improve the mapping algorithm after customizing it to the Nao's
incorporated sensing devices, and with the aid of additional computational power?
Optimizing the use of the Nao's sensing devices improved the performance of the
algorithm. Nevertheless, the main boost came from the use of additional computational
power: the inclusion of external processors, mainly to handle the computer vision
modules, allowed a greater exploitation of the robot's embodiment.
Does this algorithm provide sufficient capabilities for the robot Nao to perform SLAM
in an indoor location in a reasonable amount of time, relative to the RoboCup@Home
competition requirements? The main problem in fulfilling the navigation requirements
of the @Home competition is the Nao's embodiment. The robot can explore only a
limited proportion of the space at a time, so it spends valuable time "looking
around" at every step of the navigation procedure. Additionally, the robot's motion
was found to be too slow to traverse the required distances in the time provided at
the competitions. Nevertheless, the mapping algorithm has proven to be a satisfactory
approach that could be expanded into a complete SLAM algorithm.
The current use of a wheel-based platform (Pioneer) by the RUG robot team has
overcome all of the above-mentioned difficulties. The Pioneer platform can meet the
@Home competition time constraints: its displacement inside a domestic environment
is fast, and it permits the incorporation of multiple cameras that can perceive
several regions around the robot at once. Finally, the Pioneer is capable of carrying
the greater computational power required for computer vision algorithms.
A large amount of work clearly remains before navigation becomes a reality in
commercially available domestic robots. For the particular case of visual-topological
navigation, the expectations are high: machine learning techniques can be implemented
in several modules, eventually leading to a fully automated procedure.
The creation of a localization and mapping system based on biological and cognitive
principles has proven to be a good starting point for a full navigation system.
Artificial intelligence certainly is a promising source of inspiration for tackling
problems in complex and dynamic environments.