POLITECNICO DI TORINO

Doctorate School
Doctorate in Information and System Engineering

Doctoral Thesis

Assistive Technology and Applications
for the independent living
of severe motor disabled users

Emiliano CASTELLINA

Tutor: prof. Fulvio Corno

April 2009
Contents
1 Introduction
  1.1 Motivation
  1.2 Contribution
  1.3 Structure of the Thesis
2 Eye tracking
  2.1 Historical Overview
  2.2 State-of-the-art
  2.3 Experimentation
    2.3.1 Methodology
    2.3.2 Experimental settings
    2.3.3 Case studies
    2.3.4 Quantitative Results
  2.4 User comments
  2.5 Overall results
3 Software Applications
  3.1 Web Browsing
    3.1.1 Accessible Surfing Extension (ASE)
    3.1.2 Preliminary Tests
  3.2 Multimodal interaction
    3.2.1 State of the art
    3.2.2 Speech Recognition Background
    3.2.3 Proposed Solution
    3.2.4 System Overview
    3.2.5 System architecture
    3.2.6 Case Study
    3.2.7 Name Ambiguity
    3.2.8 Role Ambiguity
  3.3 Gaze interaction in 3D environments
    3.3.1 Control and Interaction Techniques
    3.3.2 Experimentation
4 Domotics
  4.1 Domotics Background
  4.2 General architecture
  4.3 Intelligent Domotic Environments
  4.4 OSGi framework
  4.5 DOG Architecture
    4.5.1 Ring 0
    4.5.2 Ring 1
    4.5.3 Ring 2
    4.5.4 Ring 3
  4.6 Ontology-based interoperation in DOG
    4.6.1 Start-up
    4.6.2 Runtime command validation
    4.6.3 Inter-network scenarios
  4.7 Case study
    4.7.1 Dynamic Startup Configuration
    4.7.2 Complex Command Execution
    4.7.3 Adding a new device
    4.7.4 Comparison of DOG to related works
    4.7.5 Mediated Interaction
    4.7.6 Direct interaction
  4.8 Guidelines
5 Conclusions
A Abbreviations
B Convention on the Rights of Persons with Disabilities
  B.1 Guiding Principles of the Convention
C Publications
List of Tables

2.1 Eye tracking techniques comparison
3.1 Object classification
3.2 Control and Interaction Techniques
3.3 Game 1 Test: Precision and Time
3.4 Games 2 and 3: Precision and Time
4.1 Requirements for Home Gateways in IDEs
4.2 Interfaces defined by the DOG library bundle
4.3 Requirements satisfied by related works, in comparison with DOG
List of Figures

1.1 Research Topics
2.1 Eye tracking systems classification
2.2 Infrared light reflections
2.3 Quality of Life (McGill Scale)
2.4 Depression (ZDS) and self-estimated burden (SPBS)
2.5 SWLS (satisfaction with life scale)
2.6 ALS Centre questionnaire
3.1 MyTobii Web Browser
3.2 Accessible surfing on Cogain.org
3.3 ASE architecture
3.4 The general architecture of the proposed system
3.5 Scenario
3.6 System Overview
3.7 Name Ambiguity: Unique and ambiguous (Indiscriminable and Discriminable) objects vs. GW size
3.8 Role Ambiguity: Unique and ambiguous (Indiscriminable and Discriminable) objects vs. GW size
3.9 Screenshots of the games
4.1 The general architecture of the proposed domotic system
4.2 DOG architecture
4.3 An overview of the DogOnt ontology
4.4 SPARQL queries for retrieving all BTicino OpenWebNet (a) and all KNX (b) in DogOnt
4.5 The SPARQL query needed to retrieve the commands that can be issued to a specific device, e.g. a DimmerLamp
4.6 The switch-all-lights-off rule, in Turtle notation
4.7 The demonstration cases used to perform functional tests on DOG
4.8 Complex Command Execution
4.9 Adding a new device
4.10 The control application with a quite accurate tracker
4.11 The control application with a low-cost visible light tracker
4.12 ASL 501 headband attaching the two optics system
4.13 The ART system control interface
4.14 Typical stages of the ART system (a. Stability of eye gaze captured; b. Gaze on object detected; c. Control initiated)
4.15 ART system flow chart
Chapter 1
Introduction
"Disability is an evolving concept, and ... disability results from the interaction between persons with impairments and attitudinal and environmental barriers that hinders full and effective participation in society on an equal basis with others."

Preamble of the Convention on the Rights of Persons with Disabilities [1].
1.1 Motivation

An estimated 10% of the world's population, approximately 650 million people, experience some form of disability. The number of people with disabilities increases depending on factors such as population growth, aging, and medical advances that preserve and prolong life. These factors are creating considerable demands for health and rehabilitation services. Furthermore, the lives of people with disabilities are made more difficult by the way society interprets and reacts to disability, which requires environmental and attitudinal changes. The UN Convention on the Rights of Persons with Disabilities (further details are provided in Appendix B) emphasizes the importance of mainstreaming disability issues for sustainable development. Attention to health and its social determinants is essential to promote and protect people with disabilities and for greater fulfillment of human rights. There is also a need for strong, evidence-based information: despite the magnitude of the issue, awareness of and scientific information on disability issues are lacking. There is no agreement on definitions and little internationally comparable information on the incidence, distribution and trends of disability or impairments. Despite the significant changes achieved over the past two decades in the field of disability and rehabilitation, there is no comprehensive evidence base.
Disability in Italy
Disabled people in Italy count about 3 million and 600 thousand individuals. Disability
has a huge impact on the quality of life, limiting people independence. Is is predomi-
nantly spread among elderly, since with aging the persons are afflicted by some chronic
invalidating diseases at the same time.
Among disabled population the ratio of people affected by one (59,4%) or more
(62,2%) severe chronic deseases is noticeably higher than the ratio of non-disabled per-
sons (respectevely 11,6% and 12,3%).
In Italy, 2,1% of the population, i.e. one million and 130 thousands persons, are
confined at home. Confinement means that a person is bedridden, or obliged to stay
home because of physical or psychological impediments. Among the elderlies the ratio
of confinded persons is 8,7%: the women present a ratio (10,9%) double than men (5,6%).
More than 500 thousand persons (1,1% of the population) haveproblems involving the
communication sphere, such as partial or total blindness, deafness or mutism. The greater
part of disabled persons, 1 million and 374 thousands (52,7%), is afflicted by more than
one disability.
Typically the family is the caregiver of the person with disability and it represents an
essential resource for facing the limitation and troubles caused by disability. The ratio of
Italian families with a least one disabled person is 10,3%. The greater part of families
(58,3 %) include a non-disabled person that can take care of the relatives with disabilities.
Almost the 80% of families with disabled persons at home is not assisted by public health
2
1.1 – Motivation
services.
This lack of assistance is not even supported by private domicile services: as a matter
of fact the 70% of the families do not make use of neither public or private assistance.
Severe motor disability

Some chronic degenerative diseases lead to communication problems and individual confinement. The progressive and total loss of motor functions induces in these patients anarthria, i.e., the inability to use normal Augmentative and Alternative Communication (AAC) systems, and also the impossibility of interacting with their surrounding environment. These are in particular the subjects with Amyotrophic Lateral Sclerosis (ALS), who progressively lose all muscular function (speech, use of upper and lower limbs, respiration) due to a progressive degeneration of spinal and cortical motor neurons, the neurons that control the movement of spinal muscles. Another pathology that can cause similar damage, although over longer times, is multiple sclerosis, which affects all nervous functional systems, with a severe impairment of speech due to dysarthria, and the loss of functional use of the limbs due to ataxia and spasticity. In this context, the loss of speech capacity and of the skills necessary for augmentative/alternative communication strategies is a cause of total isolation of patients. Subjects affected by neurodegenerative disorders in an advanced phase have several symptoms and problems spanning a long period of time, with a very negative impact on their quality of life. Furthermore, many other diseases or traumatic injuries can limit people's independence and communication capabilities. Many research efforts are analyzing these issues, trying to identify the various aspects involved in the care of these patients.

Palliative approaches have a lot to offer to patients affected by neurodegenerative disorders, since, in general, the older population has chronic non-oncologic disorders. Projects improving (or enabling) effective communication in persons with neurological disorders are included in the field of palliative care.

In patients affected by ALS and MS, even if other communication forms are damaged or lost, the ability to control eye movements is typically maintained. Since the beginning of the twentieth century, researchers have investigated techniques for studying the movements and the physiology of the human eyes. Many of these techniques involve recording and tracking eye movements, and estimating the zone gazed at by the user. Nowadays such techniques are more and more advanced and are mainly used in the military, advertising and medical fields (psychological analysis and assistive technologies). Devices that use gaze tracking techniques are called eye trackers; they are able to determine the direction of the user's gaze and to use it as an input channel. Communication instruments based on eye tracking are for some patients a possible solution, both to have useful communication with family members or at longer distance, using the Internet, and for the possibility of controlling the home environment through a computer driven by their eye movements.
1.2 Contribution

The main objectives of this doctoral thesis are to study, design and experiment with new solutions, based on eye tracking tools and environmental control applications, for improving the quality of life of patients with severe motor disabilities. These objectives involve four different research topics (Figure 1.1):

• Computer Vision: the basis of most eye tracking technologies.

• Human Computer Interaction: research, development and testing of assistive software.

• Ubiquitous Computing: research, design and development of integrated domotic solutions.

• Ambient Intelligence: improving domotic environments with intelligent behaviors.

Figure 1.1. Research Topics

This thesis tackles, even if in a partial and limited way, the communication and independence needs of severely motor disabled persons. With such a purpose, a great part of the research has been done in close collaboration with the Neuroscience Department of the S. Giovanni Battista Hospital of Turin. This collaboration has been part of the wider context of COGAIN, a network of excellence on COmmunication by GAze INteraction. Members of the network are researchers, eye tracker developers and people who work directly with users with disabilities, in user centers and hospitals. The COGAIN consortium includes widely recognized experts, from research groups and companies, working on advancements of this cutting-edge technology. More than 100 researchers from more than 10 countries are involved in the activities of the COGAIN Network, and its impact is growing. The COGAIN consortium is supported by two advisory bodies, a Board of User Communities (BUC) and a Board of Industrial Advisors (BIA), in order to ensure the best outcome and take-up of the results. COGAIN was launched with EU funding in 2004, and its goal is to become self-supporting by the end of 2009, when EU funding will cease.

Although many very accurate commercial eye trackers are available on the market, they are not widespread: less than 10% of the persons who could benefit from eye tracking technologies are actually using them. There are three main factors that reduce the diffusion of eye trackers:

• they are too expensive (from 5000 to more than 20000 euros);

• they have few advanced and effective applications;

• they are not known by most caregivers (this was true especially in 2006, at the starting phase of this research).
The initial phase of the thesis regarded the analysis of state-of-the-art eye tracking algorithms, looking for techniques suitable for a low-cost eye tracking solution. After that, a benchmark platform for eye tracking algorithms has been designed and implemented. This platform made it possible to obtain objective and repeatable measurements of the accuracy and performance of the investigated methods.

In the context of the COGAIN project and of the collaboration with the ALS center of Turin (Neuroscience Department), an experiment has been performed on the impact of eye tracking technology on the quality of life of ALS patients. Furthermore, the research pursued the design and development of assistive software for aided Internet navigation and multimodal control of the operating system. In the field of gaze technologies not only related to disabled persons, the research investigated new interaction patterns, based on eye tracking, for the navigation of 3D virtual environments.

Finally, the last part of the PhD thesis concentrated on the fulfillment of the independence needs of severely motor disabled users, i.e., it focused on the domotic field. The investigation of domotics involved both low-level issues, i.e., the design of an intelligent residential gateway for interoperation among different domotic technologies, and the study of novel interaction patterns for controlling the domotic environment through gaze.
1.3 Structure of the Thesis

The remainder of this thesis is organized as follows:

Chapter 2 describes the history and state of the art of eye tracking devices. Furthermore, it reports the methodology and results of an experiment involving ALS patients.

Chapter 3 delineates the architecture, design and development of three assistive software applications based on gaze and multimodal interaction patterns.

Chapter 4 gives a description of the domotic field for assisted living. It investigates the design of the Domotic OSGi Gateway, which allows transforming a simple domotic house into an Intelligent Domotic Environment. It also explains two different paradigms for controlling the domotic environment through gaze interaction.

Chapter 5 eventually concludes the thesis and provides an overview of possible future work.
Chapter 2
Eye tracking
This chapter presents an overview of eye tracking techniques, from their dawn to possible future developments. In addition to an in-depth study of the specific literature, it reports an experiment on the impact of eye tracking technologies on the quality of life of ALS patients.
2.1 Historical Overview

Eye tracking research started at least 100 years before the spread of personal computers. The first research on eye tracking and the analysis of ocular movement dates back to 1878 and Émile Javal [2]. The tracking methods described in this work were invasive, requiring direct contact with the cornea.

The first non-invasive eye tracking technique was developed by Dodge and Cline in 1901 [3]. This technique allowed tracking the horizontal position of the eye on a photographic film by analyzing the light reflected by the cornea, though the patient was required to be completely immobilized. The use of cinematographic techniques, which allowed recording the ocular movement over a time interval, dates back to the same period [4]. This technique analyzes the reflection produced by the incident light on a white spot inserted into the eye. These and other research efforts concerning the study of ocular movement made further headway during the first half of the 20th century, as techniques were combined in different ways.
In the 1930s, Miles Tinker and Paterson began to apply photographic techniques to study eye movements in reading [5]. They varied typeface, print size, page layout, etc., and studied the resulting effects on reading speed and patterns of eye movements. In 1947, Paul Fitts and his colleagues [6] began using motion picture cameras to study the movements of pilots' eyes as they used cockpit controls and instruments to land an airplane. The Fitts et al. study represents the earliest application of eye tracking to what is now known as usability engineering: the systematic study of users interacting with products to improve product design. Around that time, Hartridge and Thompson [7] invented the first head-mounted eye tracker. Crude by current standards, this innovation served as a start to freeing eye tracking study participants from tight constraints on head movement. In the 1960s, Shackel [8] and Mackworth and Thomas [9] advanced the concept of head-mounted eye tracking systems, making them somewhat less obtrusive and further reducing restrictions on participant head movement. In another significant advance relevant to the application of eye tracking to human-computer interaction, Mackworth devised a system to record eye movements superimposed on the changing visual scene viewed by the participant. Eye movement research and eye tracking flourished in the 1970s, with great advances in both eye tracking technology and psychological theory to link eye tracking data to cognitive processes [10, 11, 12].

Much of the relevant work in the 1970s focused on technical improvements to increase accuracy and precision and to reduce the impact of the trackers on those whose eyes were tracked. The discovery that multiple reflections from the eye could be used to dissociate eye rotations from head movement [13] increased tracking precision and also prepared the ground for developments resulting in greater freedom of participant movement. Using this discovery, two joint military/industry teams (U.S. Air Force / Honeywell Corporation and U.S. Army / EG&G Corporation) each developed a remote eye tracking system that dramatically reduced tracker obtrusiveness and its constraints on the participant [14, 15]. These joint military/industry development teams and others made even more important contributions with the automation of eye tracking data analysis. The advent of the minicomputer in that general timeframe provided the necessary resources for high-speed data processing. This innovation was an essential precursor to the use of eye tracking data in real time as a means of human-computer interaction [16]. Nearly all eye tracking work prior to this used the data only retrospectively, rather than in real time (in early work, analysis could only proceed after film was developed). The technological advances in eye tracking during the 1960s and 70s are still reflected in most commercially available eye tracking systems today [17].
Psychologists who studied eye movements and fixations prior to the 1970s generally attempted to avoid cognitive factors such as learning, memory, workload, and deployment of attention. Instead, their focus was on relationships between eye movements and simple visual stimulus properties such as target movement, contrast, and location. Their solution to the problem of higher-level cognitive factors had been to ignore, minimize or postpone their consideration in an attempt to develop models of the supposedly simpler lower-level processes, namely, sensorimotor relationships and their underlying physiology [18]. But this attitude began to change gradually in the 1970s. While engineers improved eye tracking technology, psychologists began to study the relationships between fixations and cognitive activity. This work resulted in some rudimentary theoretical models for relating fixations to specific cognitive processes. Of course, scientific, educational, and engineering laboratories provided the only home for computers during most of this period, so eye tracking was not yet applied to the study of human-computer interaction at this point. Teletypes for command-line entry, punched paper cards and tapes, and printed lines of alphanumeric output served as the primary form of human-computer interaction.

As Senders [19] pointed out, the use of eye tracking has persistently come back to solve new problems in each decade since the 1950s. Senders likens eye tracking to a Phoenix rising from the ashes again and again, with each new generation of engineers designing new eye tracking systems and each new generation of cognitive psychologists tackling new problems. The 1980s were no exception. As personal computers proliferated, researchers began to investigate how the field of eye tracking could be applied to issues of human-computer interaction. The technology seemed particularly handy for answering questions about how users search for commands in computer menus [20, 21, 22]. The 1980s also ushered in the start of eye tracking in real time as a means of human-computer interaction. Early work in this area initially focused primarily on disabled users [23, 24, 25]. In addition, work in flight simulators attempted to simulate a large, ultra-high resolution display by providing high resolution wherever the observer was fixating and lower resolution in the periphery (Tong, 1984). The combination of real-time eye movement data with other, more conventional modes of user-computer communication was also pioneered during the 1980s [26, 27, 25, 28, 29].

In more recent times, eye tracking in human-computer interaction has shown huge growth, both as a means of studying the usability of computer interfaces and as a means of interacting with the computer.
2.2 State-of-the-art

Recent eye tracking systems can be classified according to three main aspects:

• the adopted User Interface. Eye trackers can be:

– intrusive, i.e., in direct contact with the users' eyes (e.g., contact lenses, electrodes);

– remote, e.g., a system including a personal computer with one or more video cameras;

– wearable, e.g., small video cameras mounted on a helmet or glasses.

Remote eye trackers are the most widespread systems. Wearable eye trackers have recently been gaining some popularity, in contrast to intrusive methods, typically used in the medical field, which are continuously losing favor.

• the Applications that use them as input devices. Eye tracker systems have a great variety of application fields. The first research efforts involving eye trackers were mostly related to the medical and cognitive fields, as they concerned the study of human vision and eye physiology. Recent studies, instead, are rather related to Assistive Technologies and, in general, to Human Computer Interaction, while current eye tracker killer applications seem to be mainly focused on advertising and marketing, aiming at analyzing customers' areas of interest inside either commercial videos or advertising posters.

• the adopted technology. Eye trackers can be based on at least six different gaze estimation algorithms, which are detailed in the following sections.

The aforementioned classification of eye tracking systems is summarized in Figure 2.1.

Figure 2.1. Eye tracking systems classification

The main eye tracking methods found in the literature are six:

• image analysis

– cornea and pupil reflection;

– pupil tracking;

– tracking of the corneal reflection through the dual Purkinje image;

– limbus tracking: tracing of the limbus, the border between iris and sclera;

• physical properties analysis

– electro-oculography: measurement of the eye's electrical potential;

– scleral coil: measurement of the magnetic field produced by the movement of a coil inserted into a user's eye.
Image Analysis

Most computer vision techniques for gaze tracking are based on finding and analyzing the reflections produced by infrared light incident on various parts of the eye. Figure 2.2 shows the reflections on the eye produced by an infrared light source.

Figure 2.2. Infrared light reflections

Corneal and pupil reflection tracking [30, 31, 32] allows determining the gaze direction by comparing the infrared corneal reflection with the pupil position. The corneal reflection tends to stay fixed during pupil movements and is used as a spatial reference both for tracking pupil movements and for compensating head movements. An eye tracking system using this technique typically consists of a single device composed of one or more infrared light emitters and an infrared-sensitive video camera. Some examples of eye trackers using this technique are described in [33, 34, 35].

The pupil tracking method [36], conversely, uses just the position and the shape of the pupil to infer the gaze direction. This technique is sensitive to head movements and is less accurate in estimating the observed point. To compensate for head movements, systems adopting this technique use cameras mounted on glasses worn by the user. Most of the eye tracking systems that use these techniques are composed of one or more infrared light sources to illuminate the eye(s), and one or more infrared cameras to capture video of the eye(s).
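To make the pupil-corneal-reflection idea concrete, the difference vector between the pupil center and the corneal glint is commonly mapped to screen coordinates through a regression fitted on a few calibration fixations. The following is a generic sketch, not the specific method of any tracker cited above; the function names and the second-order polynomial model are illustrative assumptions.

```python
import numpy as np

def fit_gaze_mapping(pupil_minus_glint, screen_points):
    """Fit a second-order polynomial mapping from pupil-glint difference
    vectors (dx, dy), collected while the user fixates known calibration
    targets, to screen coordinates (sx, sy)."""
    d = np.asarray(pupil_minus_glint, dtype=float)
    dx, dy = d[:, 0], d[:, 1]
    # Design matrix with terms: 1, dx, dy, dx*dy, dx^2, dy^2
    A = np.column_stack([np.ones_like(dx), dx, dy, dx * dy, dx**2, dy**2])
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(screen_points, dtype=float),
                                 rcond=None)
    return coeffs  # shape (6, 2): one column per screen axis

def estimate_gaze(coeffs, pupil, glint):
    """Estimated on-screen gaze point for one pupil/glint measurement."""
    dx, dy = np.subtract(pupil, glint)
    features = np.array([1.0, dx, dy, dx * dy, dx**2, dy**2])
    return features @ coeffs  # (sx, sy)
```

With nine calibration points the six-coefficient model is over-determined, so the least-squares fit can absorb mild optical distortions; higher-order terms or separate per-eye models are common refinements.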
Physical properties analysis

Electro-oculography [30, 31, 32, 37] is based on measuring the electrical potential difference between the cornea and the retina. Typically, pairs of electrodes are placed around the eye. When the eye moves, a potential difference occurs between the electrodes and, considering that the resting potential is constant, the recorded potential measures the eye position.

Scleral coil techniques use a contact lens with a coil to track the eye movements. The coil induces a magnetic field variation while the eye is moving. Scleral coil eye trackers are very intrusive because, typically, the contact lenses are connected to wires. Only recently has a wireless scleral coil eye tracker been developed [38].

Eye trackers that adopt physical properties analysis are usually composed of a DSP, necessary for data processing, connected to an output channel which provides the captured samples.
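Because the EOG potential is roughly proportional to the horizontal gaze angle over a moderate range, position estimation can be reduced to a linear calibration: record the voltage at two known fixation angles, then invert the line. The sketch below illustrates this; the function names and the ±15° calibration values are assumptions for illustration, not figures from the text.

```python
def eog_calibrate(v1_uV, angle1_deg, v2_uV, angle2_deg):
    """Fit a linear voltage-to-angle model from two calibration fixations
    at known gaze angles (assumes the EOG response is linear in between)."""
    gain = (v2_uV - v1_uV) / (angle2_deg - angle1_deg)  # microvolts per degree
    offset = v1_uV - gain * angle1_deg                  # voltage at 0 degrees
    return gain, offset

def eog_angle(v_uV, gain, offset):
    """Estimated horizontal gaze angle for a measured EOG voltage."""
    return (v_uV - offset) / gain

# Calibrate on fixations at -15 and +15 degrees, then decode a sample.
gain, offset = eog_calibrate(-300.0, -15.0, 300.0, 15.0)
angle = eog_angle(100.0, gain, offset)  # → 5.0 degrees
```

In practice the resting potential drifts with electrode contact and lighting, so real systems re-calibrate the offset periodically rather than relying on a single fit.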
Comparing Eye Tracking Techniques

Table 2.1 shows a comparison among the gaze tracking techniques previously discussed. It reports, for each eye tracking technology, the type of devices that can adopt it, the accuracy, and the sample frequency. The accuracy is evaluated as the minimum visual angle, measured in degrees, discriminable by the technique. A visual angle of 0.5° allows estimating approximately a 20x20 pixel gazed area at a distance of 70 cm.

Technology                     Device Type         Accuracy     Sample Frequency
Pupil and Corneal reflection   desktop, wearable   < 0.1°       50 - 100 Hz
Electro-potential              intrusive           0.1 - 0.5°   > 100 Hz
Pupil Tracking                 desktop, wearable   0.1 - 0.5°   50 - 100 Hz
Scleral Coil                   intrusive           < 0.1°       > 100 Hz
Dual Purkinje Image            desktop, wearable   0.1 - 0.5°   > 100 Hz
Limbus                         desktop, wearable   > 0.5°       > 100 Hz

Table 2.1. Eye tracking techniques comparison
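The 0.5° figure quoted above can be sanity-checked with basic trigonometry: the side of the gazed area is 2·d·tan(θ/2), converted to pixels by the pixel pitch. The 0.3 mm pitch below is an assumed typical value for displays of that period, not a figure from the text.

```python
import math

def gaze_area_pixels(angle_deg, distance_cm, pixel_pitch_mm=0.3):
    """Side length, in pixels, of the screen area subtended by a visual
    angle `angle_deg` at viewing distance `distance_cm`."""
    size_mm = 2 * distance_cm * 10 * math.tan(math.radians(angle_deg) / 2)
    return size_mm / pixel_pitch_mm

side = gaze_area_pixels(0.5, 70)  # ≈ 20 pixels, matching the text
```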
2.3 Experimentation
The potential of eye tracking in ALS is extremely high, sincethese patients retain their
full cognitive capabilities, and while paralysis progresses, in most cases eye movements
are still controllable.
There are few research results on using eye trackers with ALS or MS patients. An earlier study [39] identified some fundamental requirements for Augmentative and Alternative Communication (AAC) systems in these patients: communicating instructions, achieving the satisfaction of their needs, clarifying their needs, having an "affective" communication and transferring information. The results of that research, even if they offer significant information, are limited by the lack of direct involvement of patients.
A deeper knowledge of real patient needs, and of those of their caregivers, is therefore necessary to define and evaluate effective tools for AAC through eye tracking devices. This chapter reports the trials performed, over a span of two years, on a significant fraction of Italian ALS patients. The trials were conducted in collaboration between Politecnico di Torino, the hospital San Giovanni Battista of Torino, and the University of Torino (Dept. of Neuroscience).
The main aim of the experimentation is to evaluate if and when eye tracking technologies have a positive impact on patients' lives.
2.3.1 Methodology
The research is based on the following main principles:
• Adoption of Quality of Life (QoL) assessment scales
• Experimentation with off-the-shelf devices
• Involving a large user base.
A multi-disciplinary team, composed of neurologists, psychologists, speech therapists and computer science engineers, led the experimentation.
The neurologists select the patients according to the following recruitment criteria:
• Ethical: patients who are able to understand the aim of the study and to give an informed consent.
• Motivational: patients who are unable to speak intelligibly and have various degrees of hand function impairment.
• Efficacy: patients who have a basic to good level of computer literacy.
During the trial, each patient uses an eye tracking system for a week in their own domestic environment.
The research team schedules two visits and one telephone contact for each patient during the eye tracker lending period. The speech therapists train patients and their caregivers to calibrate and use the eye tracking system. The training also includes a brief course on using applications for writing, communication and Internet browsing in eye tracking mode. Other applications are installed according to users' needs and interests.
The psychologists fill in the patients' assessment questionnaires just before the training phase. The questionnaires measure the quality of life, the satisfaction with life, the depression level, and the perception of being a burden.
The following internationally recognized quantitative scales have been adopted:
• McGill scale (MGS). This scale, developed at McGill University [40, 41], analyzes five factors: physical comfort, physical symptoms, psychological symptoms, existential comfort and support.
• Satisfaction With Life Scale (SWLS) [42, 43], which evaluates the satisfaction with life.
• Zung scale: a self-rating depression scale [44]; it is fast, simple and gives quantitative results.
• Self-Perceived Burden Scale (SPBS) [45, 46]: this questionnaire consists of 25 statements about feelings the patients may or may not have about their relationships with caregivers.
The same questionnaires are proposed again at the end of the evaluation period with
the purpose of verifying the impact of the eye tracker usage on the measured parameters.
A further questionnaire, developed by the ALS center, is additionally proposed at the end
of the lending period. The ALS questionnaire focuses on qualitative aspects and feelings,
and analyzes the time spent with the system, the training process, subjective satisfaction,
and influence on life quality.
2.3.2 Experimental settings
The eye tracker used in the experimentation was the Eye Response Technologies' ERICA Standard System, equipped with assistive and communication software such as the ERICA keyboard, mouse emulators, and Sensory Software's The Grid. Standard Windows and Internet applications were also used in the tests.
The experimentation involved 16 patients (12 men, 4 women) from April 2006 to August 2007. The patients' average age was 45 years (min 32, max 78). The patients were in the advanced phase of the disease; in detail, 7 of them were tracheotomized, 8 had a percutaneous endoscopic gastrostomy (PEG) tube, 6 patients were anarthric and 7 had severe dysarthria.
2.3.3 Case studies
Three particular case studies are hereafter reported to give a qualitative outlook of the impact of eye tracking technology on ALS patients. Permission to publish this information, in a partially anonymized form, has been obtained.
Marco – Marco is 47 and lives in his house with his family. Before the disease he was a traveling salesman, frequently traveling around the country. At the time of the experimentation he was using a communication system (virtual keyboard) with a computer and a foot switch (in scanning mode). When he tried the eye control system he was very excited; he used a screen keyboard for communication and for sending emails quickly and easily. More recently, he started having many problems with his current system because he has less and less movement in his feet. He really wants an eye control system, but the Piemonte Regional Government denied him a grant. He later succeeded, thanks to the help of the Italian ALS Association, in raising funds to buy an eye tracker, and is currently collaborating with the device manufacturers. He is also in the process of writing a book, in collaboration with other ALS patients who use eye trackers.
Paolo – Paolo is 52 and lives at home with his wife. Before the illness, he was a web designer, and he still is. During the experimentation he was using two mouse devices, one for moving the cursor and the other for clicking. He needs the eye tracker only for his work, because he still successfully uses labial movements for communication with his family. He uses many programs for his work, and tried them all on the Erica system. The results were positive and he wants to buy the software and camera as an add-on to his computer. In the past he tried other eye tracking systems but he didn't like them because "they didn't work well with web design programs." He recently lost the ability to use mice as input devices, and he is waiting for the national health system to fund his purchase of an eye tracker.
Domenico – Domenico is a young man who lives at home with his wife. He was eager to try the eye control system to be able to speak, for the first time, with his 2-year-old nephew, and also to be able to express his feelings with bad words! When he tried the eye tracker, he could finally speak with his nephew, who could hear his "voice" for the first time.
2.3.4 Quantitative Results
During the initial and final meetings of each trial, the responses of the patients to the various questionnaires were recorded, and SPSS 12.0 was used to analyze them statistically. The test results showed a clear improvement in the perceived quality of life, on both the MGS and SWLS scales.
A particularly noticeable improvement was shown in the patients' perception of their condition overall, including their psychological well-being and physical symptoms, although the amount of support required by each patient and their perceived depression did not show a significant change (at the 0.05 significance level). However, it must be remembered that these results were achieved over a relatively short trial period of seven days.
In more detail, Figures 2.3, 2.4 and 2.5 report the main results on the four main scales. The McGill scale (Fig. 2.3) measures a slight, but generalized, improvement in all aspects of the quality of life that may be attributed to the eye control equipment. On the other hand, Fig. 2.4 shows that there were no significant modifications in depression and burden scores, while we could measure an improvement on the satisfaction with life scale (Fig. 2.5).
Specific evaluations of the eye control device are analyzed using the ALS Center questionnaire. In particular, we may notice that the vast majority of users are quite satisfied with eye control devices (Fig. 2.6(d)); they use them quite often (Fig. 2.6(a)), and find them easy to use (Fig. 2.6(b)) and to learn (Fig. 2.6(c)).
Figure 2.3. Quality of Life (McGill Scale)
Figure 2.4. Depression (ZDS) and self-estimated burden (SPBS)
2.4 User comments
Users agree that the system is efficient and effective, and allows more complex communication beyond the primary needs. In fact, the majority of patients used the system every day with a high level of satisfaction. They felt that eye control was comfortable and flexible, and required relatively little effort. A great perceived advantage is that, after calibration, the user is independent in using applications (compared with the Plexiglas tables commonly used for eye-contact dialogs, which rely on a communication partner). For
Figure 2.5. SWLS (satisfaction with life scale)
(a) Time of use (b) Ease of use
(c) Learning rate (d) Satisfaction
Figure 2.6. ALS Centre questionnaire
typing applications, users appreciated the prediction dictionary and the voice synthesis features. On the other hand, some patients expressed negative comments, mainly due to loss of motivation after initial technical problems, or to difficulties with calibration or the need to repeat the calibration procedure too often. Some patients
using multifocal lenses could not calibrate the system, but this was solved by changing their glasses. For users less expert with computers, learning to use the screen keyboard was somewhat difficult. Finally, some patients with residual mobility in some parts of their body had difficulties keeping their head perfectly still.
2.5 Overall results
All patients showed a strong interest in eye tracking systems, and most of them had already looked for information about this technology. The Erica system has generally been well accepted and considered easy enough to be used by ALS patients with severe disabilities. The patients in worse clinical conditions showed better acceptance.
Eye tracking benefits are lower for patients with residual arm mobility, while tracheotomized patients had stronger motivation, probably for two main reasons: anarthria represents the first motivation for communicating, and tracheotomized patients have better ventilation, and therefore brain oxygenation, than patients with dyspnea. The patients who tried the eye tracker system perceived an improvement of QoL because they were able to communicate independently and the communication was easier, faster and less laborious.
Chapter 3
Software Applications
A major shortcoming of current commercial eye trackers is the scarce amount of available software. Each vendor provides a proprietary software package that includes the basic software to communicate and, in some cases, to surf the Internet. Such software, in most cases, presents a very simplified interface suitable for non-expert users. Users familiar with computers, conversely, are limited by the poor and overly simple features of the software.
As mentioned before, one of the objectives of this thesis is the research and development of applications, based on eye tracking, that can go beyond the limits of current software and work with various eye tracker devices. This chapter describes the design, the development and the experimentation of three gaze-based applications:
• a Mozilla Firefox extension that allows aided web surfing;
• an application for the control of the operating system based on multimodal interaction between speech recognition and gaze tracking;
• a collection of three simple computer games for studying new multimodal interaction modalities involving gaze interaction in 3D environments.
3.1 Web Browsing
The Web is an increasingly important resource in many aspects of life: education, employment, government, commerce, health care, recreation, and more. However, not everyone can equally exploit the Web's potential, especially people with disabilities. For these people, Web accessibility provides valuable means for perceiving, understanding, navigating, and interacting with the Web, allowing them to actively contribute and participate in the Web. Much of the focus on Web accessibility has been on the responsibilities of Web developers. Yet, Web software also has a vital role in Web accessibility. Software needs to help developers produce and evaluate accessible Web sites, and be usable by people with disabilities [47].
In 1999 the World Wide Web Consortium (W3C) began the Web Accessibility Initiative (WAI) to improve the accessibility of the Web. The WAI has developed a number of guidelines, concerning both Web contents [48, 49] and user agents (Web browsers, media players, and assistive technologies) [50], that can help Web designers and developers make Web sites more accessible, especially from the point of view of physically disabled people. There are many applications (screen readers, voice control, etc.) that make Web browsing more accessible for blind or deaf people, but only a few applications, typically provided with commercial eye trackers, allow Web navigation for disabled people who need gaze tracking devices.
Following the WAI guidelines, a Mozilla Firefox (the most used open source Web browser) extension, named Accessible Surfing Extension (ASE), has been developed. Firefox extensions are installable enhancements to the Mozilla Foundation's projects that add features to the application or allow existing features to be modified. ASE implements a novel approach to Web site navigation also suitable for low-resolution gaze tracking devices.
3.1.1 Accessible Surfing Extension (ASE)
Many aids developed for eye-tracking-based web browsing try to cope with the basic difficulties caused by current web sites. In particular, the three main activities when browsing the web are, in decreasing order of frequency: link selection, page scrolling, and form filling. Link selection is a difficult task due to the small font sizes currently used, which require high pointing precision. In some cases link accessibility is also decreased when client-side scripting is used (e.g., in the case of pop-up menus created in Javascript, or with Flash interfaces) or time-dependent behaviors are programmed (e.g., the user has limited time to select a link before it disappears); such situations are incompatible with the current WAI guidelines for web page creation (WCAG).
Most approaches, such as those provided in the Erica System [51] and in MyTobii 2.3 [52] (Figure 3.1), tend to facilitate link selection by compensating for the limited precision that can be attained with eye tracking systems: zooming is a common feature that increases the size of links near the fixation point (either by screen magnification, or with widely spaced pop-ups) to facilitate their selection in a second fixation step.
Figure 3.1. MyTobii Web Browser
In this work a different interaction paradigm has been explored, which decouples page reading from link selection. In a first phase, when the user is on a new web page, he is mainly interested in reading it, and perhaps needs to scroll it. Only when an interesting link is identified should the user be concerned with the mechanism for activating it. If the link is large enough (e.g., a button image), usually no help is needed (and the zoom interface would only interfere with user intentions). If, on the other hand, the link is too
small, then a separate selection method is available.
At all times, the browsing window is complemented by a sidebar containing a link-selection interface that is always synchronized with the currently displayed web page. When the user wants to select a link, he may use the sidebar, which features large and easily accessible buttons.
This interaction paradigm has been developed as a Mozilla Firefox extension. This browser has been selected instead of others (Internet Explorer, Opera, Konqueror, Safari, etc.) because it is open source, cross-platform (Windows, Linux and Mac OS X), customizable and expandable, and has a simplified user interface.
The Accessible Surfing Extension (ASE) is a sidebar application inside the browser window (see Figure 3.2): whenever a new Web page is loaded, ASE analyzes its contents, modifies the page layout and refreshes the graphical user interface.
Figure 3.2. Accessible surfing on Cogain.org
According to user preferences and skills, ASE allows users to navigate Web pages in two modalities:
• Numeric mode: each link in the Web page is tagged with a consecutive
small integer, shown beside the link text or image. Such links are always visible, non-intrusive, and usually don't disrupt the page layout. The integer numbers are used to uniquely identify the link the user is interested in. At this point, the user turns his attention to the sidebar, where ASE displays a numeric on-screen keyboard and a selection confirmation button. Users select the link by dialing its number with the on-screen buttons, and then confirming with the selection button. Feedback is continuous: the selection button reports the text of the currently dialed link number, and that link is also highlighted in the web page.
• Browsing mode: a different, simpler modality can be activated for users less familiar with web browsing. In this case, page scrolling and link selection are blended in the ASE interface through a "Next" button, which selects the next group of 5 links in the web page, highlights them in the web page, and updates 5 big buttons to select them, while simultaneously scrolling down the page to the region containing them. Thus, the main focus of the user is now on the ASE, to control web page scrolling. When the user finds an interesting link, chances are that it is already present in one of the 5 buttons and can be directly selected with one fixation.
The ASE also allows page zooming, and the number and sizes of the buttons can be customized to adapt to the precision of the specific eye tracking system.
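The Browsing-mode grouping described above can be sketched as follows (a simplified, hypothetical helper in Python, not the extension's actual code):

```python
def link_groups(links, group_size=5):
    """Browsing mode: partition the page's links into successive groups;
    each press of the "Next" button advances to the following group."""
    return [links[i:i + group_size] for i in range(0, len(links), group_size)]

links = [f"link{n}" for n in range(12)]   # a page with 12 links
groups = link_groups(links)
# groups[0] holds the first 5 links, groups[2] the last 2
```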
ASE architecture
ASE is composed of five modules (see Figure 3.3):
Web Page Parser: when a new Web page is loaded, this module receives a page change event, captures the Web links through the DOM (Document Object Model) interface and stores them into the Web Links Database. The DOM is a platform- and language-independent standard object model for representing and
Figure 3.3. ASE architecture
interacting with HTML or XML. The Web Page Parser sends an update message to the GUI Generator and to the Web Page Tagger when it has finished parsing the Web page.
GUI Generator: this module retrieves Web links from the database, then prepares and displays the graphical user interface according to the selected navigation modality.
Web Page Tagger: it tags each Web page link, retrieved from the database, with progressive numbers and then sends a re-render page message to the Web browser.
ASE GUI: users can interact with the browser through this XUL1 graphical interface. When a user presses a button, a command message is sent to the Command Parser.
Command Parser: this module translates ASE user commands (i.e. link selection, zoom in, etc.) into Mozilla Firefox action (page change) commands.
1 XUL, the XML User Interface Language, is an XML user interface markup language developed by the Mozilla project for use in its cross-platform applications.
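As an illustration of the parser/tagger pipeline, link extraction and numbering could look like the following (a simplified Python sketch, not the actual XUL/JavaScript implementation):

```python
from html.parser import HTMLParser

class WebPageParser(HTMLParser):
    """Collects <a href> links, mimicking ASE's Web Page Parser module."""
    def __init__(self):
        super().__init__()
        self.links = []          # plays the role of the Web Links Database
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self._in_link = True
            self.links.append({"href": href, "text": ""})

    def handle_data(self, data):
        if self._in_link:
            self.links[-1]["text"] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

def tag_links(links):
    """Web Page Tagger: assign the consecutive numbers used by Numeric mode."""
    return {i + 1: link for i, link in enumerate(links)}

parser = WebPageParser()
parser.feed('<p><a href="/news">News</a> and <a href="/mail">Mail</a></p>')
numbered = tag_links(parser.links)
# numbered[1]["text"] == "News", numbered[2]["href"] == "/mail"
```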
Figure 3.4. The general architecture of the proposed system
3.1.2 Preliminary Tests
A preliminary usability experimentation has been conducted on an ALS user in partnership with the "Molinette Hospital of Turin". Mozilla Firefox (version 1.5) and the ASE extension have been installed on the ERICA eye-gaze system. The Molinette experimentation has involved, so far, twenty people with ALS, yet only one had the opportunity to connect to the Internet so as to test our software. The aim of the Molinette tests is to understand how, and how much, a communication device like ERICA can improve the quality of life of terminally ill patients. The ERICA system has been tested by each patient for a week. Psychological questionnaires have been proposed to the users before and after the trial. These tests show that the psychological condition is significantly improved after the trials. The
user who tried our software considers it fairly good and comfortable to use. When browsing the Internet, he actually ended up preferring the ASE (numeric mode) to the ERICA zooming interaction.
3.2 Multimodal interaction
In recent years, various alternatives to the classical input devices such as keyboard and mouse, together with novel interaction paradigms, have been proposed and experimented with. Haptic devices, head-mounted devices, and open-space virtual environments are just a few examples. With these futuristic technologies, although still far from perfect, people may receive immediate feedback from a remote computing unit while manipulating common objects. Special cameras, usually mounted on special glasses, allow tracking either the eye or the environment so as to provide visual hints and remote control of the objects in the surrounding space. In other approaches special gloves are used to interact with the environment through gestures in space [53].
In parallel with computer-vision-based techniques, voice interaction is also adopted as an alternate or complementary channel for natural human-computer interaction, allowing the user to issue voice commands. Speech recognition engines of different complexities are used to identify words from a vocabulary and to interpret the user's intentions. These functionalities are often integrated with context knowledge in order to reduce recognition errors or command ambiguity. For instance, several mobile phones currently provide speech-driven dialing of a number in the contacts list, which is by itself a reduced contextual vocabulary for this application. Information about the possible "valid" commands in the current context is essential for trimming down the vocabulary size and enhancing the recognition rate. At the same time, the current vocabulary might be inherently ambiguous, as the same command might apply to different objects or the same object might support different commands: also in this case, contextual information may be used to infer user intentions.
In general most interaction channels, taken alone, are inherently ambiguous, and far-
from-intuitive interfaces are usually necessary to deal with this issue [54].
To keep the interaction simple and efficient, multimodal interfaces have been pro-
posed, which try to exploit the peculiar advantages of each input technique, while com-
pensating for their disadvantages. Among these, gaze-, gestures- and speech-based ap-
proaches are considered the most natural, especially for people with disabilities.
Particularly, unobtrusive techniques are the most preferred, as they aim at enhancing
the interaction experience in the most transparent way, avoiding the introduction of wear-
able “gadgets” which usually make the user uncomfortable. Unfortunately, this is still
a strong constraint, which is often softened by some necessary trade-offs. For instance,
while speech recognition may simply require wearing a microphone, eye tracking is usually constrained to using some fixed reference point (e.g., either a head-mounted or wall-mounted camera), making it suitable only for applications in limited areas. Additionally, environmental conditions render eye tracking unusable with current mobile devices, which are instead more appropriate for multimodal gesture- and speech-based interaction.
Indeed, the ambient conditions play a major role when choosing the technologies to
use and the strategies to adopt, always taking into account the final cost of the proposed
solution.
In this context, the section discusses a gaze- and speech-based approach for the inter-
action with the existing GUI widgets provided by the Operating System. While various
studies already explored the possibility of integrating gaze and speech information in lab-
oratory experiments, we aim at extending those results to realistic desktop environments.
The statistical characteristics (size, labels, commands, ...) of the widgets in a modern
GUI are extremely different from those of specialized applications, and different disam-
biguation approaches are needed.
One further assumption of this work is the necessity of working with inaccurate eye
tracking information: this may be due to a low cost tracking device, or to low-resolution
mobile (glass-mounted) cameras, or to calibration difficulties, varying environmental
lighting conditions, etc. Gaze information is therefore regarded as a very noisy source
of information.
The general approach proposed is based on the following principles:
1. gaze information is used to roughly estimate the point fixated by the user;
2. all objects in the neighborhood of the fixated point are candidates for selection by the user, and are the only ones to be recognized by the vocal system;
3. actual command selection is done by speaking the appropriate command.
As a consequence, and contrary to related works, the grammar for voice recognition is generated dynamically, depending on the currently gazed area. This improves speech recognition accuracy and speed, and also opens the door to novel disambiguation approaches. In this chapter, a method for analyzing real ambiguity sources in desktop application usage, and an algorithm for disambiguation of vocal commands, will be discussed.
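A minimal sketch of these principles — restricting the recognition vocabulary to the widgets near the fixation estimate — could look like the following (the widget layout, labels and the 120-pixel radius are illustrative assumptions, not the system's actual values):

```python
import math

def nearby_widgets(widgets, gaze, radius=120):
    """Principle 2: widgets close to the (noisy) fixation estimate are
    the only candidates for vocal selection."""
    gx, gy = gaze
    return [w for w in widgets
            if math.hypot(w["x"] - gx, w["y"] - gy) <= radius]

def build_grammar(candidates):
    """Dynamically generated vocabulary: only the labels of the gazed
    widgets are valid voice commands."""
    return {w["label"].lower(): w for w in candidates}

widgets = [{"label": "Save",  "x": 40,  "y": 50},
           {"label": "Print", "x": 70,  "y": 60},
           {"label": "Quit",  "x": 800, "y": 600}]

# a fixation near the top-left corner keeps "save" and "print" as valid
# commands, while the distant "quit" is excluded from the grammar
grammar = build_grammar(nearby_widgets(widgets, gaze=(60, 55)))
```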
Eye-gaze pattern-based interaction systems, like any other recognition-based system (e.g., voice), can produce both false alarms and misses. Some of these limitations can be overcome by developing more advanced techniques such as statistical learning, but more importantly, ambiguity will be dramatically reduced when multiple modalities are combined, thanks to mutual disambiguation effects. Even if eye-gaze patterns alone can be expected to succeed most of the time, their role can be expected to be even more powerful when combined with other modalities such as speech recognition.
The goal of high-level multimodal speech systems is to obtain the same ease and robustness as human communication, by integrating automatic speech recognition with other non-verbal methods, and integrating non-verbal methods with speech synthesis to improve the output of a multimodal application.
3.2.1 State of the art
While most earlier approaches to multimodal interfaces were based on gestures and speech recognition [55, 56, 57], various speech- and gaze-driven multimodal systems
have also been proposed.
In [58] an approach combining gaze and speech inputs is described. An ad-hoc program displays a matrix of fictitious buttons which become colored when spotted through fixation. The test users can then name the color of the desired button to select it via speech recognition, thus going beyond the gaze-tracking limits. However, differently from the approach proposed in this thesis, the technique has not been applied to real programs, and the color-coding system proved to be somewhat confusing for various users. Results suggest that, in terms of a footprint-accuracy tradeoff, pointing performance is best (about 93%) for targets subtending 0.85 degrees with 0.3-degree gaps between them.
In [59] gaze and speech are integrated in a multimodal system to select differently sized, shaped and colored figures in an ad-hoc application. The distance from the fixation point is used to rank the n-best candidates, while a grammar composed of color, color+shape or size+color+shape is used for speech recognition. The integrated use of both gaze and speech proved to be more robust than their unimodal counterparts, thanks to mutual disambiguation, yet the tests are not based on everyday applications.
Some theoretical directions toward the conversion of unimodal inputs into an integrated multimodal interface are proposed in [60]. The context here is more focused on gaze and speech inputs as Augmentative and Alternative Communication (AAC) channels, which can be the only ones available to several differently abled people. The tests are based on earlier studies which do not involve existing off-the-shelf applications of everyday use.
The COVIRDS (COnceptual VIRtual Design) system, described in [61], provides a 3D environment with a multimodal interface for Virtual Reality-based CAD. Speech and gesture input were subsequently used to develop an intuitive interface for concept shape creation. A series of tasks were implemented using different modalities (zoom-in, viewpoint translation/rotation, selection, resizing, translation, etc.). Evaluation of the interface was based on user questionnaires. Voice was intuitive to use in abstract commands like viewpoint zooming and object creation/deletion. Hand gestures were effective in spatial tasks (resizing, moving). Some tasks (resizing, zooming in a particular direction) were performed better when combining voice and hand input. The command language was very
simple and the integration of modalities was implemented at the syntax level. Therefore, in some cases users showed preference for a simple input device (a wand with 5 buttons) rather than for multimodal input.
A multimodal framework for object manipulation in Virtual Environments is presented in [57]. Speech, gesture and gaze input were integrated in a multimodal architecture aiming at improving virtual object manipulation. Speech input uses a Hidden Markov Model (HMM) recognizer, while the hand gesture input module uses two cameras and HMM-based recognition software. Speech and gesture are integrated using a fixed syntax: < action >< object >< modifier >. The user command language is rigid so as to allow easier synchronization of input modalities. The synchronization process assumes modality overlapping: the lag between the speech and gesture input is considered to be at most one word. The functionality of the gaze input is reduced to providing complementary information for gesture recognition. The direction of gaze, for example, can be exploited for disambiguating object selection. A test bench using speech and hand gesture input was implemented for visualization and interactive manipulation of complex molecular structures. The multimodal interface allows much better interactivity and user control compared with the unimodal, joystick-based input.
In contrast with most of the mentioned approaches, our experimentation is based on a system for multimodal interaction with everyday, off-the-shelf desktop environments. In particular, we want to improve the performance of low-cost eye-gaze trackers and of speech recognition systems used alone, by means of a grammar generated in real time from the integration of both input channels.
3.2.2 Speech Recognition Background
The first studies on speech recognition technologies began as early as 1936 at Bell Labs. In 1939, Bell Labs [62] demonstrated a speech synthesis machine, simulating talking, at the World Fair in New York. Bell Labs later ceased research on speech recognition, based on the incorrect belief that artificial intelligence would ultimately be necessary for success [63].
Early attempts to design systems for automatic speech recognition were mostly guided
by the theory of acoustic-phonetics, which describes the phonetic elements of speech (the
basic sounds of the language) and tries to explain how they are acoustically realized in
a spoken utterance. These elements include the phonemes and the corresponding place
and manner of articulation used to produce the sound in various phonetic contexts. For
example, in order to produce a steady vowel sound, the vocal cords need to vibrate (to
excite the vocal tract), and the air that propagates through the vocal tract results in sound
with natural modes of resonance similar to what occurs in an acoustic tube. These natural
modes of resonance, called the formants or formant frequencies, are manifested as major
regions of energy concentration in the speech power spectrum. In 1952, Davis, Biddulph,
and Balashek of Bell Laboratories built a system for isolated digit recognition for a single
speaker [64], using the formant frequencies measured (or estimated) during vowel regions
of each digit.
In the 1960s, the phoneme recognizer of Sakai and Doshita at Kyoto University [65]
involved the first use of a speech segmenter for the analysis and recognition of speech in
different portions of the input utterance. In contrast, an isolated digit recognizer implicitly
assumed that the unknown utterance contained a complete digit (and no other speech
sounds or words) and thus did not need an explicit “segmenter”. Kyoto University's work
could be considered a precursor to continuous speech recognition systems.
In 1966, Leonard Baum of Princeton University proposed a statistical method [66, 67],
namely the Hidden Markov Model (HMM), which was later applied to speech recognition.
Today, most practical speech recognition systems are based on the statistical framework
and results developed in the 1980s, later significantly improved [68, 69].
In the late 1980s the first platforms were finally commercialized, thanks to exponentially
growing computer processing power. Still, only discrete utterances were successfully
recognized until the mid-1990s, when recognizers no longer required pauses between
words. The so-called “continuous speech recognition systems” reached an accuracy of
90% or more under ideal conditions.
In the last decade the computational power of personal computers has dramatically in-
creased, making it possible to implement automatic speech recognizers that in earlier
attempts were hardly feasible even with dedicated hardware devices, such as DSP boards.
Currently, speech recognition technologies offer an effective interaction channel in
several application fields, such as:
• Telephone Services. Many telephone companies replace call center operators with
speech recognizers to offer real-time information services (e.g., weather forecasts,
train and flight timetables, and reservations).

• Computer control. Recent operating systems integrate native speech recognition en-
gines that allow disabled people to control the personal computer by vocal com-
mands. In addition, many commercial companies provide special-purpose speech
recognition applications, e.g., voice-to-text editors or converters.

• Mobile device control. Several mobile devices can be controlled by simple vocal
commands. At present, speech recognition in mobile devices is limited to specific
functions like contact list management, phone calls, etc., but is expected to enable
more complex activities as the underlying systems become more powerful and less
demanding in terms of energy consumption.

• Automotive control. For about five years, several automobile manufacturers have
integrated speech recognition systems into their cars. These embedded systems allow
drivers to remotely control devices, such as a mobile phone, while keeping their
attention on the road and without taking their hands off the steering wheel.

• Language learning. Speech recognizers can be used as automatic pronunciation cor-
rectors. Some commercial systems already propose the correct pronunciation
when the spoken words differ too much from the reference samples.
The existing technologies which are currently used in modern applications of ASR
have greatly evolved since their infancy. Yet, a number of factors still make ASR algo-
rithms seriously complex:
Speaker independence Most ASR algorithms require intensive training, which can hardly
cover the entire spectrum of human voices. An ideal application would require minimal or
no training at all to recognize a user's speech.
Continuous speech It is desirable to allow a user to speak normally, rather than forcing
the insertion of pauses to facilitate the identification of word boundaries.
Vocabulary size The range of vocabulary sizes varies greatly with the application. For
instance, only a few words need to be recognized when dealing with simple and
limited controls (e.g., an audio player). In contrast, a large vocabulary is necessary
for complex communications, although it leads to less accurate recognition, as a
greater number of similar words may occur in the vocabulary.
Accuracy Environmental conditions like noise and even minimal reverberation are likely
to lessen accuracy.
Delay The recognition process is not instantaneous. The lag introduced by the algorithms
usually grows with the complexity of the application, yielding delayed feedback to
the users, which is often annoying.
Hardware Requirements Typically the microphone has to be placed very near the mouth
for the ASR system to provide accurate results, thus limiting the application range.
User Interface Two commercial applications for individual voice recognition are already
available at low cost and with a simple interface (namely Dragon NaturallySpeak-
ing and IBM ViaVoice). Not all applications provide a practical user interface,
though.
Speaker and Listener Variables People do not always speak clearly, nor in complete
sentences. Moreover, people with hearing loss often use their eyes to get cues from a
speaker's face and gestures, i.e., to lip-read, which might be difficult to do while watch-
ing a screen.
Given these premises, a multimodal system is presented, which tries to combine the
most promising features of the available input channels, while compensating for their
disadvantages.
3.2.3 Proposed Solution
Human-computer interfaces have traditionally been developed to make the use of the
machine easy and robust, at the cost of inflexible solutions: despite being labeled usable,
these interfaces are often complex, rigid and hierarchical. A new concept of multimodal
interfaces in human-computer interaction can open new possibilities in information ex-
change, yielding more usable systems that offer the user more information, ease decisions
and leave the user free to perform other tasks. Motivated by these considerations, our
work proposes a multimodal architecture that is easy to use and open-source.
In order to take advantage of the concurrent visual and vocal systems, a few basic
elements have been defined:
Objects The widgets available on the screen. These may be files represented by an icon
and a name, labeled buttons, menu items, window bars and buttons, etc. Each
object is characterized by a few properties, such as name, role, state and actions. In
particular, each object has a default action defined by the system (e.g., open the file
with the associated program, show the pop-up menu, etc.).
Context The area spotted by the tracking system, also referred to as Gaze Window (GW),
identifies the context of interaction for the vocal system: only the objects within
such a context will be considered by the vocal system. The context varies as the user
focuses on different areas of the screen.
Commands The words captured by the microphone and recognized by the speech recog-
nition engine. Valid commands are described in the grammar, which is composed
of the list of possible commands, corresponding to object names or action names
(within the current GW context).
Through the eye motion we track the direction of the gaze as a fixation point on the
screen, i.e., the point on which the user is focusing his/her gaze. This point is normally
affected by some displacement error due to various factors, so the eye tracker actually
identifies an area on the screen rather than a precise point. Thus, the result of the track-
ing is a GW that may contain several objects. The height and width of the GW around
the fixation point are defined by a customizable parameter GWsize. In the performed
experiments this parameter has been varied automatically, to simulate eye trackers with
different accuracy.
While gazing, the user also interacts with the system by uttering a command, i.e., by
pronouncing an object name (for the default system action) or a specific action.
The vocal platform manages spoken words through a VXML interpreter, guided by the
VoiceXML unit so as to interpret the result completely and accurately. The VoiceXML
unit interprets messages sent by the vocal platform, processes them and sends the result
to the main application unit. The VoiceXML unit is developed in VXML, the W3C
standard XML format for specifying interactive voice dialogues between a human and
a computer, which allows voice applications to be developed and deployed in a way
analogous to HTML for visual applications. The application was developed using a
speech processing subsystem based on VoxNauta Lite 6.0 by Loquendo [70], an Italian
company that is a leader in the field of vocal applications and platforms. After receiving
the recognition results, the application matches the received command with the objects
selected by the eye tracker.
The rest of this section describes in detail the various system modules and their func-
tionalities. In particular, I describe a mutual disambiguation algorithm that is based on dy-
namic grammar generation and is suitable for realistic desktop environments. Experimental
results will later show quantitative data proving the effectiveness of the disambiguation
method in real desktop usage scenarios.
In particular, the steps required for command recognition and execution can be sum-
marized as follows (Figure 3.6):
1. Definition of a context as the screen area spotted by the eye-tracking system.
2. Enumeration of the available objects within a given context.
3. Retrieval of object properties, such as name, role, position (with respect to the
fixation point), state, and default action.

4. Disambiguation of objects having the same name by exploiting positional information.

5. Matching of a pronounced command against object names or actions within a given
context.

6. Retrieval of the corresponding object and execution of the related action.
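As an illustrative sketch only (the actual implementation obtains its objects from Operating System accessibility libraries; all names below are hypothetical), the steps above can be outlined as follows:

```python
# Hypothetical sketch of the command recognition pipeline; the object
# model and helper names are illustrative, not the thesis code.
from dataclasses import dataclass

@dataclass
class ScreenObject:
    name: str
    role: str
    position: tuple      # (x, y) screen coordinates
    default_action: str

def objects_in_context(all_objects, fixation, gw_size):
    """Steps 1-2: keep only the objects inside the Gaze Window."""
    fx, fy = fixation
    half = gw_size / 2
    return [o for o in all_objects
            if abs(o.position[0] - fx) <= half
            and abs(o.position[1] - fy) <= half]

def match_command(command, context):
    """Steps 5-6: match a spoken command to an object name, returning
    the default action to execute, or None if ambiguous/unknown."""
    hits = [o for o in context if o.name.lower() == command.lower()]
    if len(hits) == 1:
        return hits[0].default_action
    return None
```

For instance, with a fixation near an icon named “firefox”, only that icon survives the context filter, and the spoken word “firefox” resolves directly to its default action.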
3.2.4 System Overview
The proposed system (Figure 3.5), described in this section, aims at extracting the most
useful information from the two supported modalities (gaze estimation and voice com-
mands), while at the same time enabling their mutual disambiguation. Gaze is used for
setting up a “context” composed of the objects that the user is currently focusing on. Track-
ing precision is not sufficient to quickly and reliably identify a single widget, but it is
sufficient to identify an area on the screen and to filter the contained objects. This
filtering greatly reduces the ambiguity of voice commands, by ruling out most of the se-
lectable actions (since they lie outside the user's focus) and by reducing the dictionary size
(thus enhancing the recognition rate).
The user task considered in this study consists in specifying an object (any selectable
element on the screen, i.e., windows, menus, buttons, icons, . . . ) or a command (any
action on an object, i.e., open, close, click, drag, . . . ).

Figure 3.5. Scenario

3.2.5 System architecture

The system is organized as a set of five functional modules, as shown in Figure 3.6:
Eye Tracker, Screen Reader, Grammar Generator, Vocal Unit and Action Executor. Each
module is described in the appropriate sub-section. In particular, the Screen Reader
and the Grammar Generator handle object filtering and disambiguation, and real-time
generation of the VoiceXML grammar.
Figure 3.6. System Overview
Eye Tracker
This module is responsible for the identification of an area of interest on the screen, i.e.,
of a Gazed Window. The eye tracking system, in fact, provides an estimated fixation point
that may be affected by some displacement error, strongly dependent on the hardware
and software components of the tracker. The actual area location and size are therefore
dependent on the fixation point and on the displacement error. In practice, the cursor
coordinates at the time of a fixation are used, and are collected as follows:

• if the cursor remains within a small area (a few pixels wide) for at least D seconds
(dwell time), a fixation event is raised at the cursor position;

• if the cursor position varies too much before reaching the dwell time threshold, no
event is raised.

In case of a fixation, the Eye Tracker module defines the Gazing Window as a rectangle of
size GWsize centered on the fixation coordinates and then calls the Screen Reader
unit.
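The dwell-time rules above can be sketched as follows; the sampling format and parameter names are illustrative assumptions, not the thesis implementation:

```python
# Minimal dwell-time fixation detector, assuming periodic (t, x, y)
# cursor samples; dwell_time (D) and the area radius are the tunable
# parameters described in the text.
def detect_fixation(samples, dwell_time, radius):
    """Return the fixation point (x, y), or None if the cursor moved
    too much before the dwell-time threshold was reached."""
    start_t, ax, ay = samples[0]
    for t, x, y in samples[1:]:
        if abs(x - ax) > radius or abs(y - ay) > radius:
            # Cursor left the small area: restart from this sample.
            start_t, ax, ay = t, x, y
        elif t - start_t >= dwell_time:
            return (ax, ay)          # fixation event raised here
    return None
```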
Screen Reader
The Screen Reader receives the fixated area (GW) as input from the Eye Tracker and
retrieves the set of on-screen objects in that area, by interacting with libraries at the Op-
erating System level. This unit enumerates the objects within the eye-tracking context and
defines for each of them the corresponding name, role, state, default action, and position.
Nameless or invisible (background) objects are discarded so as to obtain exactly what
the user sees on the screen. The retrieved objects are finally collected into a memory
structure and passed to the Grammar Generator unit.
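The discarding of nameless or invisible objects can be sketched as follows; the object record format is an illustrative assumption:

```python
# Illustrative filter for the Screen Reader step: keep only objects that
# the user can actually see and name; nameless or invisible (background)
# objects are discarded.
def visible_named(objects):
    return [o for o in objects
            if o.get("name") and o.get("visible", False)]
```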
Grammar Generator
This unit generates an appropriate VXML grammar for the speech recognition module of
the Vocal Platform by using the objects spotted through the eye-tracking system and the
Screen Reader. Basically, the grammar defines a set of possible vocal commands based
on the object names or actions.
The grammar is generated according to the following approach:
• if the object name is unique, a single vocal command is generated, corresponding
exactly to that name;

• when 2 to 4 objects share the same name or action, the corresponding commands
are disambiguated by exploiting the object locations (left, right, top, bottom). In
such a case, the commands entered into the grammar are the disambiguated names,
composed of the object name followed by the location direction, for example
“firefox left” and “firefox right”. Additionally, a final command is
also added to the grammar, containing the ambiguous name (e.g., “firefox”):
when recognized, the VXML interpreter synthesizes a vocal warning message ask-
ing the user to disambiguate it (e.g., “firefox is ambiguous, please specify left or
right”), giving proper auditory feedback to the user;

• when more than 4 objects are ambiguous, the location-based disambiguation method
is ineffective; in this case a single command is generated with the correspond-
ing name, causing the Vocal Unit to synthesize an error message. The limit of
4 disambiguation cases is due to the choice of using only 4 relative positions: top,
right, left, bottom.
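These three rules can be sketched as follows; the input format (name, position) and the function name are illustrative assumptions, not the thesis code:

```python
# Sketch of the grammar-generation rules above. Positions follow the
# four relative directions used in the text: left, right, top, bottom.
from collections import defaultdict

def generate_commands(objects):
    """objects: list of (name, position) pairs within the current GW."""
    by_name = defaultdict(list)
    for name, pos in objects:
        by_name[name].append(pos)
    commands = []
    for name, positions in by_name.items():
        if len(positions) == 1:
            commands.append(name)                 # unique: plain command
        elif len(positions) <= 4:
            for pos in positions:                 # discriminable variants
                commands.append(f"{name} {pos}")
            commands.append(name)                 # triggers warning prompt
        else:
            commands.append(name)                 # indiscriminable: error msg
    return commands
```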
Vocal Unit
The Vocal Platform receives as input the set of possible contextual commands defined by
the Grammar Generator, and supplies as output the command pronounced by the user.
Every spoken word is processed and interpreted on the basis of the VXML grammar and,
still according to the grammar, a vocal message can be synthesized to notify the user of
the recognition result: command recognized, ambiguous command identified, or wrong
command. When a command is correctly identified, it is passed to the Action Executor
unit.
Action Executor
It receives as input the command recognized by the Vocal Platform and executes the
associated action. Basically, the object corresponding to the command is retrieved from
the data structure previously created by the Screen Reader, by matching the command
name with the object name or the available object actions (also considering disambigua-
tion). Then, the specified action (or the default action of the object) is executed.
3.2.6 Case Study
The proposed multimodal system, and in particular the interactive location-based disam-
biguation mechanism, has been designed for interacting with a real desktop environment.
To prove the effectiveness of the approach, we report some experimental results gathered
on the Windows XP operating system.
The performed tests have a twofold purpose:
• to analyse the relation between the gaze block size and the number of ambiguous
objects and commands, in a realistic desktop environment;
• to analyse the disambiguation efficiency of the location-based method.
The experimentation is based on data about classic Windows XP widgets (e.g., buttons,
menu items, etc.) and their locations on the screen, gathered during both work-related and
personal use of the computer. Unlike other works, which make use of static pre-generated
object dispositions or of simple and unusual objects [58], this work is based on real ex-
perimental data. A test-oriented version of the screen reader module has been developed
to store screen-shots of the computer desktop taken at predefined time intervals (e.g., every
3 minutes, provided the user was not idle during that period). Each screen-shot includes
a complete list of objects, each object being described by four properties: Name, Role,
Rectangle and Window Order.

• The name property contains all the text referring to the object, e.g., the button title,
text area contents, etc.
• The role property specifies the object type, e.g., command button, list item, menu
item, etc.

• The rectangle property represents the location and dimensions of the object.

• The window order property indicates the z-order location of the object.
The trials involved 5 unpaid people for about a week. Each person installed the screen
reader on his/her own computer and ran it for a week. The gathered data amounts to 468
screen-shots involving 144,618 objects, including hidden ones. These objects have
been filtered down through a simple overlap detection algorithm, keeping only the 42,372
foreground visible objects (i.e., 29.3% of the total), used in all the subsequent test phases.
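A possible form of such an overlap filter (purely an illustrative sketch; the text only describes the actual algorithm as “simple”) is to drop every object whose rectangle is fully covered by an object nearer to the foreground:

```python
# Hypothetical overlap filter: an object is kept unless some object with
# a lower window order (nearer the foreground) fully covers its rectangle.
# Rectangles are assumed to be (x, y, width, height) tuples.
def covers(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax <= bx and ay <= by and ax + aw >= bx + bw and ay + ah >= by + bh

def foreground_visible(objects):
    """objects: list of dicts with 'rect' and 'z' (lower z = foreground)."""
    kept = []
    for o in objects:
        hidden = any(other["z"] < o["z"] and covers(other["rect"], o["rect"])
                     for other in objects)
        if not hidden:
            kept.append(o)
    return kept
```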
The tests determine how often the speech recognition system is effective in disam-
biguating objects, as a function of the Gazing Window size. To speed up the analysis, the
eye-tracking accuracy has been simulated by considering GWs with variable dimensions
(from 10px to 800px) instead of precise coordinates identifying the object positions. The
maximum GW dimension has been chosen to cover the corner case of a very inaccurate
eye tracker with a precision of only two zones (left/right) on a 1600x1200 screen resolution.
This corresponds to having practically no useful information from the eye tracker.
Two different tests have been performed, each using a different object property to
define object similarity. In the first test (Name Ambiguity), two objects are considered
similar if the name of the first object is included in (or equal to) the name of the second
one. In the second test (Role Ambiguity), two objects are considered similar if they have
the same role (e.g., both objects are buttons).
The tests were executed according to Algorithm 1. The classification of an object
inside a GW (line 6) follows the rules shown in Table 3.1.
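The classification rules of Table 3.1 can be expressed as a small function (an illustrative sketch, not the thesis code):

```python
def classify(similar_count):
    """similar_count: number of mutually similar objects in the GW,
    target object included (per Table 3.1)."""
    if similar_count <= 1:
        return "Unique"
    if similar_count <= 4:
        return "Discriminable"   # ambiguous, but resolvable by left/right/top/bottom
    return "Indiscriminable"
```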
3.2.7 Name Ambiguity
This test aims at evaluating the number of ambiguous objects having a name similar to the
target object, within differently sized GWs centered on the object. We neglect the effect
Algorithm 1 Test Application
1: for each screen-shot S do
2:   for each target object O in S do
3:     for GWsize = 10px . . . 800px step 10px do
4:       Generate a GW around O with size GWsize × GWsize
5:       Find the objects (OS) in the GW similar to O
6:       Classify OS
7:       Store statistics
8:     end for
9:   end for
10: end for
Table 3.1. Object classification
Unique          There is no other similar object inside the GW.
Ambiguous       The GW contains two or more objects which are mutually similar.
Discriminable   The ambiguous objects within the GW are at most four.
Indiscriminable The ambiguous objects within the GW are more than four.
of speech recognition errors, and the only sources of imprecision are command ambigu-
ity and large GWs. In this case we reach 100% accuracy if and only if a single
object with the given name is found in the considered GW. The test application generated
79 GWs (square, from 10px to 800px wide) for each object and calculated the number of
ambiguous objects. Thanks to the vocal localization-based feedback mechanism, discrim-
inable objects may be selected with full precision. Figure 3.7 illustrates the trend of both
the indiscriminable and discriminable ambiguous objects as a function of the GW size.

Experimental results show that the ideal recognition rate is quite high (about 80%)
even in the case of an inaccurate eye-tracking device (i.e., a wide GW) and no disambiguation.
Precision is significantly increased by the localization-based disambiguation method,
up to 98% in the worst case. A deeper analysis of the results showed that indiscriminable
objects are not uniformly distributed: only 19.1% of the screen-shots present
indiscriminable objects, and most of them are found in Internet browsing windows. In fact,
Figure 3.7. Name Ambiguity: Unique and ambiguous (Indiscriminable and Discriminable) objects vs. GW size
they are mostly hyperlinks in either Internet Explorer or Mozilla Firefox.
3.2.8 Role Ambiguity
This test aims at evaluating the number of objects having a similar role, i.e., those objects
which support the execution of the same commands (e.g., all file icons). Also in this case,
the test application generated 79 GWs (square, from 10px to 800px wide) for each object
and computed the number of visible objects with an ambiguous role. Figure 3.8 shows the
trend of both the indiscriminable and discriminable ambiguous object roles for various
GW sizes.
In this case the ambiguous objects are far more numerous than those obtained by name
similarity. Therefore, specifying actions as commands rather than object names can be more
error prone in the case of low-precision eye trackers: even a 40px GW reduces precision to
below 50%. Also in this case we see the significant effect of location-based disambiguation,
which is able to recover all Discriminable cases. Here, the 50% recognition threshold
Figure 3.8. Role Ambiguity: Unique and ambiguous (Indiscriminable and Discriminable) objects vs. GW size
is reached with a much wider GW, around 150px, corresponding to a 4x increase in the
system's robustness to gaze-tracking errors.
3.3 Gaze interaction in 3D environments
Nowadays the computer game industry is developing more and more innovative interac-
tion and control methods for user input. Nevertheless, gaze tracking, which is a fast, natural
and intuitive input channel, is not yet exploited in any commercial computer game.
In recent years several research groups have started to study gaze-tracking devices applied to
computer games. In [71] and [72] we find a comparison of different input methods, in-
cluding gaze tracking, for a first-person shooter game. The study in [73] shows that
gaze tracking beats mouse control as an input modality during a tournament of the classic
Breakout game.
In [74] several usage modes enabling mouse emulation with gaze have been designed and
tested, avoiding the well-known Midas Touch problem. The methods proposed in that pa-
per have been trialled in Second Life, an Internet-based 3D virtual world where users can
interact with each other through avatars.
This section presents 6 different control methods for navigation and interaction in 3D
games and reports a usability study on those techniques. Unlike previous works,
the present research does not restrict its attention to a particular technique or a particular ap-
plication/game, but extends the evaluation to three different games that require various
skills and input schemes. This kind of research aims at spreading the study and develop-
ment of games and applications based on gaze-tracking devices and addressed to the general
public. Spreading gaze tracking could have a relevant impact on the reduction of device
costs. Decreasing costs could in turn benefit the devices with a nobler purpose,
i.e., the eye trackers used as Assistive Technologies for disabled people [75].
3.3.1 Control and Interaction Techniques
The control scheme for navigation and interaction in 3D Virtual Environments should
allow users to control their avatars, in particular the direction they are looking in and
the direction in which they are moving. Depending on the application type, the
game controller should allow more complex actions such as running, jumping, shooting
and interacting with environment objects.

Most 3D virtual applications provide a control scheme based on a combination of key-
board and mouse inputs. Typically the gaze direction, called Free Look or Camera View,
is controlled by moving the mouse around, while the movement direction is controlled by
the keyboard.

In this context, adding a further input channel, complementary rather than alternative, such
as gaze control, can revolutionize interaction methods and user experiences. Our re-
search has designed, developed and tested six different control techniques that involve
gaze tracking and traditional input devices for navigation in 3D virtual environments.
Table 3.2 shows the control and interaction methods developed and tested by our research
group.
Technique                               Movements  Free Look  Actions
Multimodal interaction
  Gaze and keyboard (GTK)                   K          G         G
  Gaze and keyboard button (GKB)            K          G         K
  Independent Gaze and Movements (IGM)      K          G         G
Gaze interaction
  Direct Gaze Control (DGC)                 G          G         G
  Virtual Keyboard (VK)                     G          G         G
  Gaze to Target (GT)                       G          G         G
K = Keyboard, G = Gaze

Table 3.2. Control and Interaction Techniques
Multimodal interaction
Gaze tracking and keyboard (GTK) The user controls the Free Look by gaze interac-
tion: when the gaze is directed to one of the 4 screen edges, the camera view rotates in the
same direction, with a speed proportional to the nearness of the gazed zone to the border.
When the gaze direction comes back to the screen center, the rotation stops. Gaze control
also allows interacting with the objects in the environment. The start and end of the
camera view rotation and the activation of actions are set by dwell time.

The keyboard is used to handle the movements of the user's avatar through the arrow keys.
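The proportional edge-based rotation can be sketched, for the horizontal axis, as follows; the edge-zone width and maximum speed are illustrative parameters, not values from the thesis:

```python
# Hedged sketch of the GTK free-look rule: the camera rotates when the
# gaze nears a screen edge, with speed proportional to edge proximity.
def rotation_speed(gaze_x, screen_w, edge_zone=100, max_speed=90.0):
    """Horizontal rotation speed in deg/s: negative = left, positive = right."""
    if gaze_x < edge_zone:                        # left edge zone
        return -max_speed * (edge_zone - gaze_x) / edge_zone
    if gaze_x > screen_w - edge_zone:             # right edge zone
        return max_speed * (gaze_x - (screen_w - edge_zone)) / edge_zone
    return 0.0                                    # center: no rotation
```

The same rule, applied vertically, would handle the top and bottom edges.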
Gaze and keyboard button (GKB) This control technique manages the user's avatar move-
ments and free-look rotation with the same scheme as the previous method. The interac-
tion with the environment is handled by a keyboard key (for example, space) that replaces
the dwell-time selection of objects.
Independent Gaze and Movement (IGM) In typical 3D environment navigation
schemes, free look and avatar movements are strictly bound, so the center of the camera
shows the moving direction. This control scheme, instead, completely separates the con-
trol of movements from the control of the camera. This behavior makes it possible to
simulate a person who walks in one direction while turning his/her head (right or left).
The direction of movement, controlled by the keyboard, is indicated on the screen by an
arrow, while the rotation of the camera is defined by gaze tracking.
Gaze interaction
Direct Gaze Control (DGC) This method allows controlling both navigation and
interaction in 3D environments by using only gaze-tracking input. The free look
is managed with the same technique described above, whereas forward movement is
handled by selecting, through dwell time, the central zone of the screen (highlighted with
a viewfinder). In this scheme the direction of navigation is strictly bound to the direction
of the camera. In order to interact with the environment objects, contextual menus are
displayed after a dwell time.
Virtual Keys (VK) This scheme displays four semitransparent buttons in the middle of
the screen edges. The left and right buttons control the rotation of the camera, while the
upper and bottom buttons allow navigating forward and backward. Each button is activated
by dwell-time selection.
Gaze to Target (GT) This modality binds the user's avatar movements to predefined
paths. The environment is enriched with anchor objects that define the locations that can be
reached by the user. When the user selects an anchor object by dwell time, the avatar
autonomously walks towards the selected point. After the anchor selection, a confirmation
menu is displayed in order to reduce Midas Touch errors. The user can navigate the envi-
ronment from anchor to anchor, while the camera view is free and is controlled similarly
to the GTK method.
3.3.2 Experimentation
The experimentation aimed to test the accuracy, speed and usability of the designed con-
trol and interaction techniques. It was divided into two phases. The first phase, involving
6 users, had the purpose of selecting the most promising techniques. The second phase ex-
tended the test of the selected methods to 15 users.
Three simple 3D games have been developed in order to test the control techniques. These
games use the ETU driver [76] to interact with Erica [51], the eye tracker used for the exper-
imentation. The first game shows a 3D home environment where the user has to execute
two kinds of tasks. In the first task (Figure 3.9(a)), the user has to navigate in the home
and select a particular picture among the four pictures located in the environment. In the
second task (Figure 3.9(b)), the user has to take the requested food from the fridge. The
other two games aim at testing pointing precision and speed: the user is required to
shoot target men in a shooting range (Figure 3.9(c)), and to shoot enemies, avoiding good
guys, along a war path (Figure 3.9(d)).

The users played each game for 6 minutes, divided into sessions of 30 seconds, while
their speed and precision were measured.
The first round of preliminary tests highlighted that the most promising techniques were
GKB, DGC and VK. During the second part of the experimentation the users tested the
selected methods with the first game. The VK method has not been tested with the other
games because they did not allow free avatar movements but only free camera posi-
tioning.

Table 3.3 reports the precision percentage and the average elapsed time in the execution
of the 2 tasks of the first game. The most precise technique was VK in both tasks, while
the method with the lowest elapsed time was VK in the first task and DGC in the
second task. Table 3.4 shows a comparison of average elapsed time and precision in games
2 and 3 among DGC, GKB and the mouse. Mouse control allowed better precision
in both games, while the average elapsed time was equal for DGC and the mouse in game 2,
and DGC had the lowest elapsed time in game 3.
At the end of the test, each user filled in an evaluation questionnaire in order to assess the
usability of the proposed gaze-based control techniques and also to gather personal opin-
ions and suggestions. The analysis of the questionnaires shows that the VK
Figure 3.9. Screenshots of the games: (a) Game 1/Task 1; (b) Game 1/Task 2; (c) Game 2; (d) Game 3
Method   Find Picture                    Take food
         Precision (%)   Avg Time (s)    Precision (%)   Avg Time (s)
DGC           89             7.1              93             8.2
GKB           79             7.8              84             8.8
VK            95             6.5              97             8.6

Table 3.3. Game 1 Test: Precision and Time
method was perceived as the most accurate and fastest control type. User perceptions
clearly differed from the objective data reported in Tables 3.3 and 3.4, probably because
the amazement and immersion provided by gaze control gave the players a more
complete and engaging game experience that outweighed the performance shortcomings. The exper-
imentation showed a strong user interest in gaze-based control applied to virtual 3D
                 Game 2                       Game 3
  Method   Precision (%)  Avg Time (s)   Precision (%)  Avg Time (s)
  DGC           68            0.94            45            0.47
  GKB           49            1.04            52            0.67
  Mouse         90            0.94            68            0.51

Table 3.4. Games 2 and 3: Precision and Time
environment navigation and game control.
Chapter 4
Domotics
Domotic systems, also known as home automation systems, have been available on the
market for several years; however, only in the last few years have they started to spread
to residential buildings, thanks to the increasing availability of low-cost devices and
driven by emerging needs in home comfort, energy saving, security, communication
and multimedia services.
The challenge of an intuitive and comprehensive eye-based environmental control system
requires innovative solutions in different fields: user interaction, domotic¹ system control
and image processing. The currently available solutions can be seen as “isolated” attempts at
tackling partial subsets of the problem space, each providing interesting solutions in its own
sub-domain.
This chapter seeks to devise a new-generation system, able to exploit state-of-the-art
technologies in each of these fields and to anticipate interaction modalities that might be
supported by future technical solutions, in a single integrated environment. In particular,
this chapter presents a comprehensive solution in which integration is sought along two
main axes:
• integrating various domotic systems
¹ The term domotic is a contraction of domus (the Latin word for home) and informatics.
• integrating various interaction methodologies
Current domotic solutions suffer from two main drawbacks: they are produced and
distributed by various electric component manufacturers, each having different functional
goals and marketing policies; and they are mainly designed as an evolution of traditional
electric components (such as switches and relays), and are thus unable to natively provide
intelligence beyond simple automation scenarios. The first drawback causes interopera-
tion problems that prevent different domotic plants or components from interacting with each
other, unless specific gateways or adapters are used. While this was acceptable in the
first evolution phase, when installations were few and isolated, it now becomes a very
strong issue, as many large buildings such as hospitals, hotels and universities are mixing
different domotic components, possibly realized with different technologies, and need to
coordinate them as a single system. On the other hand, the roots of domotic systems
in simple electric automation prevent them from satisfying the current requirements of home inhab-
itants, who are becoming more and more accustomed to technology and require more
complex interaction possibilities.
In the literature, solutions to these issues usually propose smart homes [77], i.e.,
homes pervaded by sensors and actuators and equipped with dedicated hardware and
software tools that implement intelligent behaviors. Smart homes have been actively re-
searched since the late 90s, pursuing a revolutionary approach to the home concept, from
the design phase to the final deployment. The costs involved are very high and have so far
prevented a real diffusion of such systems, which still retain an experimental and futuristic
connotation.
The approach proposed in this thesis lies somewhat outside the smart home concept,
and is based on extending current domotic systems by adding hardware devices and
software agents that support interoperation and intelligence. Our solution takes an
evolutionary approach, in which commercial domotic systems are extended with a low-
cost device (an embedded PC) allowing interoperation and supporting more sophisticated
automation scenarios. In this case, the domotic system in the home evolves into a more
powerful integrated system, which we call an Intelligent Domotic Environment (IDE). IDEs
promise to achieve intelligent behaviors comparable to smart homes, at a fraction of the
cost, by reusing and exploiting available technology, and by providing solutions that may
be deployed even today.
On the other hand, interaction methodologies should take into account the latest re-
sults in human-environment interaction, as opposed to human-computer interaction. The
paradigm of “direct interaction”, so familiar in desktop environments and now also ex-
tended to the Internet with Web 2.0 applications, is not so natural when applied to en-
vironmental control. Selecting a user interface element that represents a physical object,
an object that is also within the user’s field of view, is quite an indirect interaction method.
Directly “selecting” objects by staring at them would be considerably more direct and intuitive.
Besides the technical difficulty of detecting the object(s) gazed at by the user, there is a de-
sign trade-off between the more direct selection and the traditional mediated interaction.
While direct interaction eases object identification but leaves few options for specifying
the desired action, mediated selection, where the object is selected on a computer screen,
complicates object selection but allows an easy selection of the desired commands. In
addition, mediated selection allows interaction with objects that are not directly perceiv-
able by the user, such as thermal control, automated activation of home appliances, or objects
in other rooms. The comprehensive solution proposed in this chapter seeks the appropriate
trade-off between these opposing interaction methods, proposing a system able to support
both, and to integrate them with the aid of portable devices.
The overall vision is centered on DOG (Domotic OSGi Gateway), which, on one side,
builds an abstract and operable model of the environment (described in Section 4.5) by
speaking with different domotic systems according to their native protocols, and with any
additional existing device. On the other side, it offers the necessary APIs to develop
any kind of user interface and user interaction paradigm. In particular, this chapter
explores eye-based interaction, comparing “mediated” menu-driven interaction
(Section 4.7.5) with innovative “direct” interaction.
Most solutions rely on a hardware component called a residential [78] or home gate-
way [79], originally conceived for providing Internet connectivity to the smart appliances
available in a given home. In our approach, this component evolves into DOG, an
interoperation system where connectivity and computational capabilities are exploited to
bridge, integrate and coordinate different domotic networks. DOG exploits OSGi as a
coordination framework for supporting dynamic module activation, hot-plugging of new
components and reaction to module failures. These basic features are integrated with an
ontology model of domotic systems and surrounding environments, named DogOnt. The
combination of DOG and DogOnt supports the evolution of domotic systems into IDEs
by providing means to integrate different domotic systems, to implement inter-network
automation scenarios, to support logic-based intelligence and to access domotic systems
through a neutral interface. Cost and flexibility concerns play a significant part in the
platform design, and we propose an open-source solution capable of running on low-cost
hardware such as an ASUS eeePC 701.
Ontology-based modeling in DOG is not limited to interoperation; it can also be leveraged
by applications to enhance the capabilities of controlled environments. For example, it
can support learning of user habits and reasoning about the home state and context, or it can
be exploited to provide automatic and proactive security, and to implement comfort and
energy-saving policies.
The chapter is organized as follows: Section 4.1 discusses some relevant related works,
reporting state-of-the-art solutions for gaze-based home interaction. Section
4.2 introduces the general architecture of the proposed approach. Section 4.5 describes the DOG
platform, starting from high-level design issues and including the description
of the platform components and their interactions, while Section 4.6 describes ontology-driven tasks
in DOG. Sections 4.7.5 and 4.7.6 compare the two gaze-based interac-
tion modalities, highlighting the pros and cons of each and analyzing how the two can be
successfully integrated.
4.1 Domotics Background
Vision is a primary sense for human beings; through gaze, people can sense the environ-
ment in which they live, and can interact with objects and other living entities [80]. The
ability to see is so important that even inanimate things can exploit this sense to im-
prove their utility. Intelligent environments, for example, can exploit artificial vision
techniques to track the user’s gaze and to understand whether someone is staring at them.
In this case they become “attentive”, being able to detect the user’s desired interaction
through vision [81].
Home automation is a fairly old discipline that is today gaining new momentum
thanks to the ever increasing diffusion of electronic devices and network technologies.
Currently, many research groups are involved in the development of new architectures,
protocols, appliances and devices [82]. Commercial solutions are also increasing their
presence on the market, and many brands propose very sophisticated domotic sys-
tems, such as BTicino MyHome [83], EIB/KNX [84] (the result of a joint
effort of more than twenty international partners), X10 [85] and LonWorks [86].
Many research works are evolving towards the concept of Intelligent Domotic
Environment, adopting either centralized or distributed approaches that extend current
domotic systems with suitable devices or agents. The decreasing cost of hardware, to-
gether with the constant increase in computational power and connection capabilities, is
a major driving force, which currently drives research efforts towards systems based on simple,
embedded PCs able to bridge the interconnection gap between domotic systems and to
bring intelligence to homes. In this context, Miori et al. [87] defined a framework called
DomoNet for domotic interoperability based on Web Services, XML and Internet proto-
cols. DomoNet defines so-called TechManagers, one for each integrated network, that
expose the domotic network capabilities as Web Services and act as proxies for the capa-
bilities of other networks (exposed by the corresponding TechManagers). TechManagers dy-
namically discover, register and de-register services through standard Web Service facili-
ties such as UDDI. DomoNet differs from DOG in several aspects: first, it replicates many
functionalities in virtual devices (proxies) available on each TechManager, requiring
considerable synchronization work, while DOG only allocates the necessary resources
by implementing a centralized approach. Second, DomoNet, although based on an ontology
model (DomoML), does not exploit facilities such as device abstraction/categorization,
functionality description and matching, and simply uses the ontology as a common vo-
cabulary for high-level XML messages.
Moon et al. [88] worked on a so-called Universal Middleware Bridge (UMB) for allowing
interoperability of heterogeneous home networks. Similarly to DOG, UMB adopts a cen-
tralized architecture where each device/network is integrated by means of a proper UMB
Adaptor. The Adaptor converts device-specific communication protocols and data (status,
functions, etc.) into a global, shared representation, which is then used to support inter-
operability. Differently from DOG, devices are described and abstracted by means of a
Universal Device Template, without an attached formal semantics. This prevents the sys-
tem from automatically performing device generalization, functionality abstraction/matching
and reasoning. UMB does not define a standard access point for the networks it manages,
and requires applications to implement a proper connection logic (UMB Adaptor) to inter-
act with home devices and networks. The home server, in UMB, is seen more as a router,
which correctly delivers messages between different Adaptors (i.e., networks or devices),
than as an intelligent component able to coordinate the connected devices/networks to reach
some goal.
Tokunaga et al. [89] defined a framework for connecting home computing middle-
ware, which tackles the problem of switching from one-to-one protocol bridges to one-to-
many conversions. In their work, Tokunaga et al. defined home computing middlewares
able to abstract/control physical devices, allowing several appliances to be coordinated
without a specific notion of the underlying networks or protocols. Protocol conversion is performed by the
so-called Protocol Conversion Manager, which translates local information into a newly de-
fined Virtual Service Gateway protocol. Different protocol conversions can be combined
to achieve multi-network or multi-device interoperability.
Recently, the literature reports some research about eye-gaze-controlled intelligent envi-
ronments. In these studies, two main interaction modalities are foreseen: direct interaction
and mediated interaction. In direct interaction paradigms, gaze is used to select and con-
trol devices and appliances, either with head-mounted devices that can recognize objects
[90] or through “intelligent” devices that can detect when people stare at them [81]. Using
mediated interaction, instead, people control a software application (hosted on desktop or
portable PCs) through gaze, thus being able to control all home appliances and devices
[91].
While interesting and sometimes very effective, the currently available solu-
tions only try to solve specific sub-problems of human-environment interaction, focusing
on single interaction patterns and interfacing a single, or few, home automation technologies.
This chapter, instead, aims at integrating different interaction patterns, possibly exploiting
the advantages of all, and aspires to interoperate with virtually every domotic network and
appliance. The final goal is to provide a complete environment where users can inter-
act with their house using the most efficient interaction pattern, depending on their abilities
and on the kind of activities they want to perform.
4.2 General architecture
Combining gaze interaction and home automation requires an open and extensible logic
architecture, for easily supporting different interaction modalities on one side, and dif-
ferent domotic systems and devices on the other. Several aspects must be
mediated in some way, including different communication protocols, different communication means,
and different interface objects. Mediation implies, in a sense, centralization, i.e., defining a
logic hub in which specific, low-level aspects are unified and harmonized into a common
high-level specification.
In the proposed approach, the unification point is materialized by the concept of a
“house manager”, which is the heart of the whole logic architecture (Figure 4.1) and acts
as a gateway between the user and the home environment.
Figure 4.1. The general architecture of the proposed domotic system
On the “home side”, DOG interfaces both domotic systems and isolated devices
capable of communicating over some network, through the appropriate low-level proto-
cols (different for each system). Every message on this side is abstracted according to
a high-level semantic representation of the home environment and of the functions pro-
vided by each device. The state of home devices is tracked, and the local interactions
are converted to a common event-based paradigm. As a result, low-level, local events
and commands are translated into high-level, unified messages which can be exchanged
according to a common protocol.
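The abstraction step described above can be sketched as follows. All class names, frame formats and device identifiers are illustrative assumptions, not DOG's actual API: a network-specific event (here, a fictitious KNX-style frame) is mapped to a single network-neutral representation before being forwarded.

```java
import java.util.Map;

public class EventAbstraction {

    // Unified high-level event: device identifier plus state,
    // independent of the source network technology.
    record HighLevelEvent(String device, String state) { }

    // Illustrative per-network translation table; a real driver would
    // parse the native protocol instead of using a static map.
    static final Map<String, HighLevelEvent> KNX_EVENTS = Map.of(
            "1/0/7:0x01", new HighLevelEvent("KitchenLight", "ON"),
            "1/0/7:0x00", new HighLevelEvent("KitchenLight", "OFF"));

    // Translate a raw network frame into the common event representation.
    static HighLevelEvent abstractKnxFrame(String rawFrame) {
        HighLevelEvent e = KNX_EVENTS.get(rawFrame);
        if (e == null) {
            throw new IllegalArgumentException("unknown frame: " + rawFrame);
        }
        return e;
    }

    public static void main(String[] args) {
        // A raw frame on group address 1/0/7 becomes a neutral event.
        System.out.println(abstractKnxFrame("1/0/7:0x01"));
    }
}
```

The same neutral event type would be produced regardless of whether the originating network is KNX, OpenWebNet or a simulator, which is what makes the common protocol possible.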
On the application side, the high-level protocol provided by the manager gives home
access to several interface models, based on either direct or mediated interaction. Two
main models are discussed in this chapter: the first based on attentive devices and the second
based on a more classical menu-based interface. The interested reader may find further
details in [91].
4.3 Intelligent Domotic Environments
An Intelligent Domotic Environment (IDE, Figure 4.1) is usually composed of one², or
more, domotic systems, a variable set of (smart) home appliances, and a Home
Gateway that allows interoperation policies to be implemented and intelligent be-
haviors to be provided.
Domotic systems usually include domotic devices, such as plugs, lights, door and
shutter actuators, etc., and a so-called network-level gateway that allows tunneling low-
level protocol messages over more versatile, application-independent interconnection
technologies, e.g., Ethernet. These gateways are not suitable for implementing the features
needed by IDEs, as they have reduced computational power and are usually closed,
i.e., they cannot be programmed to provide more than factory-default functionalities.
However, they play a significant role in an IDE architecture, as they offer an easy-to-exploit
access point to domotic systems.
Appliances can be either “dumb” devices, which can only be controlled by switching on
and off the plugs to which they are connected, or “smart” devices, able to provide complex
functionalities and to control (or be controlled by) other devices, through a specific, often
IP-based, communication protocol.
The Home Gateway is the key component for achieving interoperation and intelli-
gence in IDEs; it is designed to respond to different requirements, ranging from simple
² In this case interoperation may not be needed, but intelligence still needs to be supported.
bridging of network-specific protocols to complex interaction support. These require-
ments can be attributed to three priority levels: level 1 priorities include all the features needed
to control different domotic systems using a single, high-level communication protocol
and a single access point; level 2 priorities define all the functionalities needed for defin-
ing inter-network automation scenarios and allowing inter-network control, e.g., enabling
a Konnex switch to control an OpenWebNet light; and level 3 requirements are related
to intelligent behaviors, user modeling and adaptation. Table 4.1 summarizes the
requirements, grouped by priority.
Table 4.1. Requirements for Home Gateways in IDEs.

R1 Interoperability
  R1.1 Domotic network connection: Interconnection of several domotic networks.
  R1.2 Basic interoperability: Translation/forwarding of messages across different networks.
  R1.3 High-level network protocol: Technology-independent, high-level network protocol for allowing neutral access to domotic networks.
  R1.4 API: Public API to allow external services to easily interact with home devices.

R2 Automation
  R2.1 Modeling: Abstract models to describe the house devices, their states and functionalities, to support effective user interaction and to provide the basis for home intelligence.
  R2.2 Complex scenarios: Ability to define and operate scenarios involving different networks/components.

R3 Intelligence
  R3.1 Offline intelligence: Ability to detect misconfigurations, structural problems, security issues, etc.
  R3.2 Online intelligence: Ability to implement runtime policies such as energy saving or fire prevention.
  R3.3 Adaptation: Learning of frequent interaction patterns to ease users' everyday activities.
  R3.4 Context-based intelligence: Proactive behavior driven by the current house state and context, aimed at reaching specific goals such as safety, energy saving, robustness to failures.
A domotic home equipped with a home gateway can be defined as an Intelligent Domotic
Environment if the gateway satisfies at least level 1 and level 2 priorities. Level 3 priorities
can be considered advanced functionalities and may impose tighter constraints on the
gateway, both from the software architecture and from the computational power points of
view.
4.4 OSGi framework
OSGi technology is a universal middleware that provides a service-oriented, component-
based environment for developers and offers standardized ways to manage the software
life cycle. It provides a general-purpose, secure and managed framework that supports
the deployment of extensible service applications known as bundles.
The OSGi platform consists of four layers: Security, Module, Life-cycle and Service.
The Security Layer is similar to standard Java security. This layer uses policy files to de-
termine what software bundles can and cannot do. It allows permissions to be manipulated
dynamically, i.e., changing policies on the fly or adding new policies for newly installed
components, and it optionally allows bundles to be signed.
The Module Layer hosts bundles, which can contain Java packages and resources. A bun-
dle exports zero or more packages, which can be imported by other bundles, while keeping
the other packages private. The importing bundle has to specify a range of compatible versions,
and the framework resolves such dependencies at run time.
The Life-cycle Layer manages the life cycle of bundles within the platform. Bundles can
be installed, started, stopped and uninstalled.
The Service Layer contains a registry where services are published. Bundles can register
their services, also specifying service properties, in the service registry. The service registry
provides search functionality over registered services, based on an LDAP query language.
Every bundle can interact with other bundles just by using and supplying services, re-
specting the specification constraints included in every bundle.
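The Service Layer idea can be illustrated with a toy registry. This is a simplified sketch, not the real OSGi API (which uses `BundleContext.registerService` and LDAP filter strings such as `(network=knx)`); here a plain predicate over the property map stands in for the LDAP filter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class MiniServiceRegistry {

    // A registration pairs the published service object with its properties.
    public record Registration(Object service, Map<String, String> properties) { }

    private final Map<String, List<Registration>> services = new HashMap<>();

    // A bundle publishes a service under an interface name, with properties.
    public void register(String interfaceName, Object service, Map<String, String> props) {
        services.computeIfAbsent(interfaceName, k -> new ArrayList<>())
                .add(new Registration(service, props));
    }

    // A consumer looks services up by interface name plus a property filter
    // (the stand-in for OSGi's LDAP filter syntax).
    public List<Object> lookup(String interfaceName, Predicate<Map<String, String>> filter) {
        return services.getOrDefault(interfaceName, List.of()).stream()
                .filter(r -> filter.test(r.properties()))
                .map(Registration::service)
                .toList();
    }

    public static void main(String[] args) {
        MiniServiceRegistry reg = new MiniServiceRegistry();
        reg.register("CommandExecutor", "knxDriver", Map.of("network", "knx"));
        reg.register("CommandExecutor", "openWebNetDriver", Map.of("network", "openwebnet"));
        // Equivalent of the LDAP filter "(network=knx)".
        System.out.println(reg.lookup("CommandExecutor", p -> "knx".equals(p.get("network"))));
    }
}
```

Decoupling consumers from providers through such a registry is what lets DOG bundles interact only through declared services.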
4.5 DOG Architecture
DOG is a domotic gateway designed to respond to different requirements, ranging from
simple bridging of network-specific protocols to complex interaction support.
Design principles include versatility, addressed through the adoption of an OSGi [92]
based architecture; advanced intelligence support, tackled by formally modeling the home
environment and by defining suitable reasoning mechanisms; and accessibility to external
applications, through a well-defined, standard API.
DOG is organized in a layered architecture with four rings, each dealing with different
tasks and goals, ranging from low-level interconnection issues to high-level modeling and
interfacing (Figure 4.2). Each ring includes several OSGi bundles, corresponding to the
functional modules of the platform.
Ring 0 includes the DOG common library and the bundles necessary to control and
manage the interactions between the OSGi platform and the other DOG bundles. At this level,
system events related to runtime configurations, errors or failures are generated and for-
warded to the entire DOG platform. Ring 1 encompasses the DOG bundles that provide
an interface to the various domotic networks to which DOG can be connected. Each net-
work technology is managed by a dedicated driver, similar to device drivers in operating
systems, which abstracts network-specific protocols into a common, high-level represen-
tation that allows different devices to be driven uniformly (thus satisfying requirement R1.1).
Ring 2 provides the routing infrastructure for messages travelling between network drivers
and the other DOG bundles. Ring 2 also hosts the core intelligence of DOG, based on
an abstract formal model of the domotic environment (the DogOnt ontology), imple-
mented in the House Model bundle (R1.2, R1.3, R2.1 and, partially, R2.2³). Finally, Ring
3 hosts the DOG bundles offering access to external applications, either by means of an
API bundle, for OSGi applications, or through an XML-RPC endpoint for applications based
on other technologies (R1.4).
In the following subsections each DOG bundle is described in more detail, focusing
on the provided services and functionalities.
³ In the currently implemented version, external applications can control many domotic networks as a single home automation system, while network-to-network integration is still being implemented.
Figure 4.2. DOG architecture. Ring 0: DOG Library, Configuration Registry, Platform Manager; Ring 1: Network Drivers (MyHome, Konnex, Simulator); Ring 2: Message Dispatcher, Executor, House Model, Status; Ring 3: API, XML-RPC.
4.5.1 Ring 0
DOG library This bundle acts as a library repository for all the other DOG bundles. It de-
fines the interfaces (Table 4.2) through which bundles can interact, either by providing
or consuming services, and the DogMessage objects exchanged by bundles during runtime
operations. DogMessages are composed of a type declaration, which identifies the type of
content, and a payload that stores the content. A subset of DogMessages is also available
to external applications, either through OSGi integration or XML-RPC calls: such exposed
messages, called DogML messages, are encoded in XML and must be valid according to
the DogML Schema (XSD) provided by the DOG Library bundle.
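The type-plus-payload structure described above might be sketched as follows; the class and field names are illustrative assumptions (DOG's actual DogMessage definition is not reproduced here), with a plain string standing in for the payload.

```java
public class DogMessageSketch {

    // A message carries a type declaration identifying the kind of content,
    // plus a payload storing the content itself (here an XML fragment).
    public record DogMessage(String type, String payload) {
        public DogMessage {
            if (type == null || type.isBlank()) {
                throw new IllegalArgumentException("a message must declare its type");
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical command message; the XML shape is illustrative only.
        DogMessage cmd = new DogMessage(
                "command", "<command device='KitchenLight' name='ON'/>");
        System.out.println(cmd.type() + " -> " + cmd.payload());
    }
}
```

Keeping the type separate from the payload is what later allows the Executor to check that the declared type matches the actual content.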
Platform manager This bundle handles the correct start-up of the whole system and
manages the life cycle of the DOG bundles. The platform manager coordinates bundle acti-
vations, enforcing the correct start-up order, and manages bundle errors with a two-stage
strategy. First, it attempts to restart modules brought down by uncaught exceptions; then,
if after re-bootstrapping the bundles are still in error, it notifies the unavailability of the in-
terrupted services to all other DOG bundles. The platform manager can be extended to
integrate principles from autonomic computing, exploiting more advanced techniques to
react to failures and keep DOG operational as long as possible. Bundle failure management
is the basis of the more complex autonomic behaviour that the next version of DOG should
support, making it possible to build a decentralized engine that can continue working even
if some of its parts become unusable. When a bundle becomes available, DOG is immediately
notified and can start using it: the availability of its services can trigger the availability of other
bundles and services, all without restarting the system.
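The two-stage error strategy described above can be sketched as follows; the interface and method names are illustrative, not the platform manager's actual code. Stage 1 attempts a restart, stage 2 broadcasts the unavailability only if the restart fails.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoStageRecovery {

    // Minimal stand-in for a managed bundle.
    interface Bundle {
        boolean restart(); // true if the bundle came back up
        String name();
    }

    // Stand-in for the notifications sent to the other DOG bundles.
    static final List<String> notifications = new ArrayList<>();

    // Stage 1: try to restart the failed bundle.
    // Stage 2: if still in error, notify that its services are unavailable.
    static void handleFailure(Bundle b) {
        if (!b.restart()) {
            notifications.add("unavailable: " + b.name());
        }
    }

    public static void main(String[] args) {
        handleFailure(new Bundle() {
            public boolean restart() { return false; } // simulated persistent failure
            public String name() { return "KonnexDriver"; }
        });
        System.out.println(notifications);
    }
}
```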
Configuration Registry The Configuration Registry implements the Configurator in-
terface by maintaining and exporting bundle configuration parameters. Typical examples
of such parameters are the IP addresses and ports of the network-level gateways interfaced
by the network drivers, the ontology repository location for the House Model bundle, the
bundle versions needed to manage compatibility issues, and so on.
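A Configurator-style registry might look like the following sketch; the class name, key scheme and sample values (including the KNXnet/IP port) are illustrative assumptions, not the actual DOG configuration format.

```java
import java.util.Map;

public class ConfiguratorSketch {

    // Flat key-value store of start-up parameters, keyed by "<bundle>.<parameter>".
    private final Map<String, String> parameters;

    public ConfiguratorSketch(Map<String, String> parameters) {
        this.parameters = parameters;
    }

    // A bundle retrieves its own parameters during the start-up phase.
    public String get(String bundle, String key) {
        return parameters.get(bundle + "." + key);
    }

    public static void main(String[] args) {
        ConfiguratorSketch cfg = new ConfiguratorSketch(Map.of(
                "KonnexDriver.gateway.ip", "192.168.0.10",
                "KonnexDriver.gateway.port", "3671",
                "HouseModel.ontology.location", "file:dogont.owl"));
        System.out.println(cfg.get("KonnexDriver", "gateway.ip"));
    }
}
```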
4.5.2 Ring 1
Network Drivers In order to interface domotic networks, DOG provides a set of Net-
work Drivers, one for each technology (e.g., KNX, OpenWebNet, X10, etc.).
Every network driver implements a “self-configuration” phase, in which it interacts with
the House Model (through the HouseModeling interface) to retrieve the list of devices to
be managed, together with a description of their functionalities, in the DogOnt format.
Every device description carries all the needed low-level information, such as the device ad-
dress, according to the network-dependent addressing format (simple in OpenWebNet,
subdivided into group and individual addresses in KNX, etc.).
Network Drivers translate messages back and forth between DOG bundles and network-
level gateways. They implement the CommandExecutor interface, which supports queries
and commands issued by other DOG bundles, and they use the services defined
by the StatusListener interface to propagate state changes to registered listeners (the DOG
Status bundle, for example), e.g., “light A11 is now OFF”. Monitoring, at the Network Driver
level, can either be done by listening to network events or by performing polling cycles
when domotic networks do not propagate state-change messages. In both cases, the driver
bundles provide state updates to the other DOG bundles by using the typical event-based
interaction paradigm supported by OSGi.
Currently, three Network Drivers have been developed: Konnex, OpenWebNet (BTicino)
and Simulator. During start-up, the Network Drivers interact with the HouseModeler
services to retrieve information about the devices they control; this information includes
both device typology and low-level data, such as physical addresses, group numbers, etc.
The Konnex and BTicino bundles translate the high-level commands coming from the
applications into the low-level signals and messages of their respective networks; they
also convert the low-level events originating in their networks into the DogMessage
format and forward them to all the bundles that export the StatusListener service. The
Simulator network driver controls a virtual domestic environment containing fictitious
devices and simulates their behaviour, either by randomly generating events or by
executing a pre-recorded list of commands.
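The driver pattern described above can be sketched as follows; the interface and class names mirror the text but are simplified assumptions, not DOG's actual signatures. A driver executes a high-level command and propagates the resulting state change to registered listeners.

```java
import java.util.ArrayList;
import java.util.List;

public class DriverSketch {

    // Simplified stand-ins for the interfaces described in the text.
    interface StateListener { void stateChanged(String device, String state); }
    interface CommandExecutor { void execute(String device, String command); }

    // A toy simulator driver: no real network, commands take effect directly.
    static class SimulatorDriver implements CommandExecutor {
        private final List<StateListener> listeners = new ArrayList<>();

        void addListener(StateListener l) { listeners.add(l); }

        @Override
        public void execute(String device, String command) {
            // A real driver would emit a network frame here; the simulator
            // just applies the command and notifies listeners,
            // e.g. "light A11 is now OFF".
            for (StateListener l : listeners) {
                l.stateChanged(device, command);
            }
        }
    }

    public static void main(String[] args) {
        SimulatorDriver driver = new SimulatorDriver();
        driver.addListener((device, state) -> System.out.println(device + " is now " + state));
        driver.execute("A11", "OFF");
    }
}
```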
Table 4.2. Interfaces defined by the DOG library bundle

ApiConnector: Allows OSGi bundles to control the domotic environment through a technology-independent set of functionalities. More precisely, it allows to get the IDE configuration, to send commands to connected devices, to query the device states and to receive state-change notifications.
Configurable: Supports runtime configuration of bundles. Every Configurable bundle can be tuned by external applications, which can adjust the parameters exposed through this interface.
CommandExecutor: Provides means to propagate commands to the proper bundles.
HouseModeling: Provides access to the house formal model (DogOnt). It is used to retrieve the house configuration, which is propagated to network drivers, to get device re-mapping, allowing to automatically recognize new devices, to semantically check commands and notifications, and to resolve group commands (scenarios).
StateAndCommandChecker: Defines methods for validating commands and states in DOG. Validation is both syntactic, ensuring that received messages (in XML) are well formed and valid, and semantic, thus guaranteeing that every device is driven by using the appropriate commands according to the HouseModeling interface.
StateListener: Allows bundles to be notified when managed devices change their states.
StateProvider: Provides information about the current state of devices.
Configurator: Defines a repository of start-up bundle configurations. Each bundle accesses the services exposed through this interface in the start-up phase, to retrieve the needed configuration parameters.
DevicesListUpgrade: Permits to update the routing tables and the list of devices currently managed by DOG. In conjunction with the Configurable interface, it allows runtime changes of the DOG configuration, supporting dynamic scenarios where new devices/networks can be hot-plugged into the system.
4.5.3 Ring 2
Message Dispatcher
This bundle acts as an internal router, delivering messages to the correct destinations, be they the network drivers (commands and state polls) or other DOG bundles (notifications). Routing is driven by a routing table in which network drivers are associated with high-level device identifiers, enabling DOG to deliver commands to the right domotic network. For example, if a high-level DogMessage specifies that the kitchen light must be lit, and the House Model reports that the light belongs to the Konnex plant, then the Message Dispatcher routes the message to the Konnex network driver. The routing table, dynamically built through the DevicesListUpgrade service, is initially provided by the Configurator bundle during the start-up phase and constantly updated by the Network Drivers.
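The routing-table lookup described above can be sketched as follows. This is a simplified illustration in plain Java, not DOG's actual code; the class and method names are invented.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the Message Dispatcher's routing table: it maps
// high-level device identifiers to the name of the network driver
// responsible for them.
public class RoutingTable {
    private final Map<String, String> deviceToDriver = new HashMap<>();

    // Entries come from the Configurator at start-up and from the
    // DevicesListUpgrade service at runtime.
    public void register(String deviceId, String driverName) {
        deviceToDriver.put(deviceId, driverName);
    }

    // Resolve the driver that must receive a command for the given device.
    public String route(String deviceId) {
        String driver = deviceToDriver.get(deviceId);
        if (driver == null) {
            throw new IllegalArgumentException("No driver for device: " + deviceId);
        }
        return driver;
    }
}
```

In the kitchen-light example above, the table entry for the kitchen lamp would point at the Konnex driver, so the DogMessage is handed to that driver only.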
4.5 – DOG Architecture
Executor
The Executor validates commands received from the API bundle, either directly or through the XML-RPC protocol. Commands are syntactically validated, by checking the relation between the DogMessage type declaration and the DogMessage content, and semantically validated, by checking them against the set of commands modeled by the HouseModel ontology. If all checks are passed, the Executor forwards messages to the Message Dispatcher for final delivery; otherwise messages are dropped, avoiding an inconsistent platform state. Thanks to its role in filtering and checking high-level messages in DOG, the Executor is a suitable candidate for future implementations of rule-based, runtime automation scenarios or safety policies, including command prioritization, required for implementing security and manual-override mechanisms. This bundle listens for commands from the API bundle, validates them using the Status bundle services and, finally, if the validation was successful, forwards the commands to the Message Dispatcher bundle. An alternative system architecture could merge the Executor and Message Dispatcher bundles, which currently perform similar functions, i.e., they both forward commands from the higher levels of the system to the lower ones. Nevertheless, these bundles have been kept separate because in future versions of DOG their roles will diverge more and more: the Executor bundle will manage the smart forwarding of messages, based on priority rules.

Rule Engine and Virtual Device Manager
These bundles are the core of the DOG intelligence. The Rule Engine allows the management of user-defined scenarios and automated reactions to alarms. The Virtual Device Manager provides device-level abstraction for real devices, adding capabilities through software emulation.
Status
The Status bundle caches the states of all devices controlled by DOG by listening for notifications coming from network drivers. This state cache is extensively used in DOG to reduce network traffic on domotic busses and to filter out unnecessary commands, e.g., commands whose effects would leave the state of the destination devices unchanged. Since missed network-level messages or other network-related errors may result in an occasional state-cache inconsistency, the Status bundle performs a low-priority polling cycle, in which suitable DogMessages are generated for querying the Network Drivers and
consequently updating the state cache. The same query messages can also be generated by the API bundle to take a real-time snapshot of the house state directly, bypassing the Status module.
In addition, the Status bundle offers the StatusChecker service, which validates the commands sent by external applications. The validation involves two steps: a syntactic and a semantic check of the commands. The first step verifies that the DogMessage containing the command is well formed, while the second step checks:

• the existence of the devices referenced by the command;

• the usefulness of the command, i.e., whether the command would actually produce a change in the device states;

• the semantic correctness of the command, i.e., whether the devices referenced by the command support its execution. For example, the validation fails if the command attempts to OPEN a LAMP.
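The two-step check can be sketched as follows. This is an illustrative fragment with invented names, not DOG's actual API, and the syntactic step is reduced to a trivial well-formedness test.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of the StatusChecker validation steps.
public class CommandChecker {
    private final Map<String, Set<String>> supportedCommands = new HashMap<>();
    private final Map<String, String> currentStates = new HashMap<>();

    // Register a device with its current state and the commands it supports.
    public void registerDevice(String device, String state, String... commands) {
        supportedCommands.put(device, new HashSet<>(Arrays.asList(commands)));
        currentStates.put(device, state);
    }

    // Step 1: syntactic check, reduced here to a trivial well-formedness test.
    public boolean isWellFormed(String message) {
        return message != null
            && message.startsWith("<dogMessage>")
            && message.endsWith("</dogMessage>");
    }

    // Step 2: semantic checks - the device exists, the command is supported
    // (e.g. OPEN on a LAMP fails), and the command is useful, i.e. it would
    // actually change the device state.
    public boolean isSemanticallyValid(String device, String command) {
        Set<String> commands = supportedCommands.get(device);
        if (commands == null || !commands.contains(command)) {
            return false;
        }
        return !command.equals(currentStates.get(device));
    }
}
```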
House Model
The House Model is the core of intelligence of the DOG platform. It is based on a formal model of the house defined by instantiating the DogOnt ontology [93]. According to Gruber's definition [94], an ontology is an "explicit specification of a conceptualization," which is, in turn, "the objects, concepts, and other entities that are presumed to exist in some area of interest and the relationships that hold among them". Today's W3C Semantic Web standard suggests a specific formalism for encoding ontologies (OWL), in several variants that vary in expressive power [95]. DogOnt is an OWL meta-model for the domotics domain describing where a domotic device is located, the set of its capabilities, the technology-specific features needed to interface it, and the possible configurations it can assume. Additionally, it models how the home environment is composed and what kind of architectural elements and furniture are placed inside the home. It is organized along 5 main hierarchy trees: Building Thing, modeling available things (either controllable or not); Building Environment, modeling where things are located; State, modeling the stable configurations that controllable things can
assume; Functionality, modeling what controllable things can do; and Domotic Network Component, modeling the peculiar features of each domotic plant (or network).

Figure 4.3. An overview of the DogOnt ontology.

The Building Thing tree subsumes the Controllable concept and its descendants, which are used to model devices belonging to domotic systems or that can be controlled by them.
Devices are described in terms of capabilities (Functionality concept) and possible configurations (State concept). Functionalities are mainly divided into Continuous and Discrete, the former describing capabilities that can be varied continuously and the latter referring to the ability to change device configurations in a discontinuous manner, e.g., to switch on a light. In addition, they are also categorized depending on their goal, i.e., whether they allow controlling a device (Control Functionality), querying a device condition (Query Functionality) or notifying a condition change (Notification Functionality).

Each functionality instance defines the set of associated commands and, for continuous functionalities, the range of allowed values, thus enabling runtime validation of commands issued to devices. Devices also possess a state instance deriving from a State subclass, which describes the stable configurations that a device can assume. Each State class defines the set of allowed state values; states, like functionalities, are divided into Continuous and Discrete.
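For illustration, the runtime range check that a continuous functionality definition enables could look like the following sketch; the class name and representation are ours, not DogOnt's.

```java
// Illustrative sketch: a continuous functionality carries the range of
// allowed command values declared in the ontology, and a command value is
// accepted only if it falls inside that range (e.g. a dimmer SET command
// between 0 and 100 percent).
public class ContinuousFunctionality {
    private final double min;
    private final double max;

    public ContinuousFunctionality(double min, double max) {
        this.min = min;
        this.max = max;
    }

    // Runtime validation of a command value against the declared range.
    public boolean accepts(double value) {
        return value >= min && value <= max;
    }
}
```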
DOG uses the DogOnt ontology to implement several functionalities (Section 4.6), encompassing command validation at run-time, using information encoded in functionalities; stateful operation, using the state instances associated to each device; and device abstraction, leveraging the hierarchy of classes in the Controllable subtree. The last
operation, in particular, makes it possible to treat unknown devices as instances of a more generic type, e.g., a dimmer lamp can be controlled as a simple on/off lamp. Ontology instances modeling controlled environments are created off-line by means of proper editing tools, some of which are currently being designed by the authors, and may leverage auto-discovery facilities provided by the domotic systems interfaced by DOG.
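A minimal sketch of this device-abstraction walk, with invented names and plain Java maps standing in for the ontology hierarchy, might look like:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the generalization step: if a driver has no mapping for a
// device type, the class hierarchy is walked upwards until a supported
// ancestor is found. Type names follow the thesis examples; the code is
// illustrative, not DOG's implementation.
public class TypeGeneralizer {
    private final Map<String, String> parentOf = new HashMap<>();
    private final Set<String> supportedByDriver = new HashSet<>();

    public void declareSubClass(String child, String parent) {
        parentOf.put(child, parent);
    }

    public void declareSupported(String type) {
        supportedByDriver.add(type);
    }

    // Returns the most specific supported type, or null if none exists.
    public String resolve(String type) {
        String t = type;
        while (t != null && !supportedByDriver.contains(t)) {
            t = parentOf.get(t); // generalize: e.g. DimmerLamp -> Lamp
        }
        return t;
    }
}
```

With this fallback, a DimmerLamp unknown to a driver is still controllable through the commands of its Lamp ancestor, at the price of losing the dimming capability.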
4.5.4 Ring 3
API
Services provided by DOG are exposed to external OSGi-based applications by means of the API bundle, which allows retrieving the house configuration, sending commands to devices managed by DOG and receiving house events.

• getConfiguration. The ability to request the house configuration, including the possible states and the allowed commands of each device managed by DOG.

• setCommand. The ability to send single and multiple commands to devices managed by DOG, independently of the domotic network to which they are connected (thus supporting inter-network scenarios).

• setListener. The possibility to register an application as an event listener, thus enabling event-based interaction with IDEs, even if the managed networks natively require a polling-based interaction.

• getDeviceStatus. The ability to directly check the state of house devices, bypassing the internal cache hosted by the Status bundle.
The API bundle is able to directly interact with the House Model every time a complex, multiple command must be resolved into a set of single commands to be issued to the proper network drivers. For example, the command for switching all lights off is converted by the API module into a set of "Switch OFF" commands issued to all devices modeled as Lamp in the House Model ontology. Messages exchanged between the API bundle and external OSGi applications must conform to the DogML schema defined by the DOG Library bundle.
XmlRPC
The XmlRPC bundle simply provides an XML-RPC endpoint for the services offered by the API bundle, thus enabling non-OSGi applications to exploit DOG services. It implements a light-weight web server able to listen for remote procedure calls and to map such calls to API calls. Similarly to the API bundle, all messages exchanged by the XML-RPC bundle must conform to the DogMessage XSD; as a consequence, all exported methods require a single string parameter holding the request message in XML. This bundle is strictly connected to the API bundle and makes use of the ApiConnector service in order to allow external applications to control the domotic environment. It runs an XML-RPC server that wraps and exports the four control methods offered by the API bundle. All the published methods accept strings as parameters and return strings; those strings contain DogML XML messages. Applications can retrieve the device states by a polling technique or by registering themselves as status listeners.
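As a hedged illustration of this single-string convention, the fragment below builds a command request the way a non-OSGi client might before posting it to the XML-RPC endpoint. The element and attribute names are invented for this sketch; real messages must conform to the DogMessage XSD.

```java
// Hypothetical sketch: every exported XML-RPC method takes one XML string
// and returns one XML string, so a client assembles a message like this
// and passes it as the single parameter of the remote call.
public class DogMessageFactory {
    // Build a (simplified, invented) command message for one device.
    public static String command(String deviceId, String commandName) {
        return "<dogMessage type=\"command\">"
             + "<device id=\"" + deviceId + "\"/>"
             + "<command name=\"" + commandName + "\"/>"
             + "</dogMessage>";
    }
}
```

The string returned by `command(...)` would then be sent as the single argument of the wrapped setCommand method, and the reply would be another XML string to be parsed by the client.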
4.6 Ontology-based interoperation in DOG
4.6.1 Start-up
In the start-up phase, the information contained in the DogOnt ontology instantiation that models the controlled environment, exposed through the House Model bundle, is queried to configure network drivers and to deal with unknown device types. When a DOG instance is run, the DOG bundles are started, with a bootstrap order defined by the Platform Manager bundle. The House Model is one of the first available services and is used by network drivers to get the list of their managed devices. The first interaction step involves querying a DogOnt instantiation, using SPARQL, to extract device descriptions filtered by technology (e.g., searching specific DomoticNetworkComponent instances). Each device description contains the device name as well as all the low-level details needed by drivers to communicate with the corresponding real device.
Once the complete device list is received, each driver builds a mapping table for translating high-level commands and states defined in DogOnt into suitable sequences of protocol-specific messages.

SELECT ?x WHERE { ?x a d:OpenWebNetComponent }
(a)

SELECT ?x WHERE { ?x a d:KonnexComponent }
(b)

Figure 4.4. SPARQL queries for retrieving all BTicino OpenWebNet (a) and all KNX (b) components in DogOnt.

In this phase, drivers can possibly find unsupported devices, i.e., devices that they cannot control as no mapping between high- and low-level messages is defined yet. In this case, a further interaction with the House Model requests a generalization step for instances of unknown devices. For each unknown device, the House Model retrieves the super-classes and provides their descriptions back to the network drivers. In this way specific devices (e.g., a dimmer lamp) can be treated as more generic and simpler ones (e.g., a lamp), for which network drivers have the proper mapping information. This automatic reconfiguration capability to deal with unsupported devices sustains DOG scalability: even if devices (and their formalization) evolve more rapidly than drivers, they can still be controlled by DOG, although in a restricted manner.
4.6.2 Runtime command validation
At runtime, the DogOnt instantiation exposed by the House Model is exploited to semantically validate received requests and internally generated commands. For each DogMessage requiring the execution of a command, i.e., requiring an action on a given domotic component, the command value is validated against the set of possible values defined in DogOnt for that component type. Validation proceeds as follows: when a DogMessage containing a command needs validation, the House Model queries DogOnt for the allowed commands (Figure 4.5) and, if necessary, retrieves the allowed range associated to each of its parameters. If the DogMessage command complies with the constraints extracted from the ontology model, the command is considered valid and propagation to the DOG Message Dispatcher is allowed; otherwise the command is rejected and the message dropped without any further consequences (except logging).
4.6.3 Inter-network scenarios
Together with validation and automatic generalization of devices, the House Model assumes a crucial role in the definition of scenarios and commands involving more than one domotic network. This is a typical case for home scenarios, i.e., for commands that
SELECT ?h WHERE {
  d:DimmerLamp rdfs:subClassOf ?q .
  ?q rdfs:subClassOf ?y .
  ?y rdf:type owl:Restriction .
  ?y owl:onProperty d:hasFunctionality .
  ?y owl:hasValue ?z .
  ?z a ?p .
  ?p rdfs:subClassOf ?l .
  ?l rdf:type owl:Restriction .
  ?l owl:onProperty d:commands .
  ?l owl:hasValue ?h
}

Figure 4.5. The SPARQL query needed to retrieve the commands that can be issued to a specific device, e.g. a DimmerLamp.
coordinate different devices to reach a given high-level goal, for example to set up a comfortable environment for watching TV.

If a scenario involves devices belonging to different domotic plants, the abstraction introduced by DogOnt allows operations to be defined in a technology-neutral way and the corresponding DogMessages to be properly generated and then converted into network-specific calls.
Example
A very common scenario, available in almost all domotic environments, is the "switch-all-lights-off" scenario, which turns off all the lights of a given domotic home. If the sample home contains more than one domotic plant, DOG allows the scenario to be implemented easily, by means of its House Model bundle. Thanks to the abstraction provided by the DogOnt instantiation managed by the House Model, the "switch-all-lights-off" scenario can simply be modeled by a rule stating that all Lamp devices shall receive an OFF command, defined by the basic OnOffFunctionality instance associated to each Lamp (Figure 4.6).
Lamp(?x) ^ hasState(?x,?y) ^ DiscreteState(?y) ^ valueDiscrete(?y,?z) ^ equals(?z,"ON") -> valueDiscrete(?y,"OFF")

Lamp(?x) ^ hasState(?x,?y) ^ ContinuousState(?y) ^ valueContinuous(?y,?z) ^ ge(?z,0) -> valueContinuous(?y,0)

Figure 4.6. The switch-all-lights-off rule, in SWRL notation.
This rule, when triggered by a call to the API bundle, requires a reasoning step, called Transitive Closure, that propagates properties (e.g., functionalities) along the ontology hierarchy, thus allowing all the instances of Lamp descendants to be recognized as
Lamps. For each of them, a suitable DogMessage is generated, carrying detailed information about the destination device, modeled in the ontology by subclassing the proper DomoticNetworkComponent. The resulting DogMessages are then propagated by the Message Dispatcher to the network drivers, which, in turn, power off the corresponding lamp devices, be they simple lamps, dimmers or very complex illumination systems.
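The fan-out step can be sketched as follows. This is an illustrative fragment, assuming the House Model reasoning has already flattened every Lamp descendant into a plain list of Lamp instances; the message format shown is invented.

```java
// Sketch of the "switch-all-lights-off" fan-out: after the transitive-
// closure step has recognized every Lamp descendant as a Lamp, one
// technology-neutral OFF message is generated per instance, to be routed
// later by the Message Dispatcher.
public class ScenarioExpander {
    public static String[] allLightsOff(String... lampInstances) {
        String[] messages = new String[lampInstances.length];
        for (int i = 0; i < lampInstances.length; i++) {
            messages[i] = lampInstances[i] + ":OFF";
        }
        return messages;
    }
}
```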
4.7 Case study
In order to test the DOG functionality in the field, an experimental setup involving two different domotic networks has been deployed, using a BTicino MyHome demo-box and a Konnex demo-box crafted by the authors using off-the-shelf Siemens and Merten KNX devices (Figure 4.7).
Figure 4.7. The demonstration cases used to perform functional tests on DOG.
DOG has been implemented in Java, as a set of OSGi bundles running on the Equinox open source framework [96]. The DogOnt ontology is managed by the House Model using the HP Jena API, while the XML-RPC module exploits the Apache XML-RPC API [97]. DOG is currently released under the Apache license and can run on very cheap devices such as the ASUS eeePC (used in the experiments), a sub-notebook based on an Intel Celeron processor, whose cost is about 300 euros (much less than the cost of the demonstration cases, which is around 3000 euros each).

Experiments aimed only at qualitatively testing the functionalities described in this chapter, while a more sound evaluation through performance benchmarks and user studies
is planned as future work. Experiments showed a generally responsive management of domotic devices, which always reacted within a reasonable time window (delays were unnoticeable), even when executing complex inter-network scenarios. Reaction to driver failures was satisfactory: even when disconnecting one of the two demonstration cases, the other continued to work. Moreover, the automatic driver detection mechanism of DOG effectively managed hot-plugging of the cases, exposing their device functionalities a few seconds after a demonstration case was plugged in.
4.7.1 Dynamic Startup Configuration
During the initial phase of the system start-up, the Platform Manager checks the list of correctly installed bundles, then starts them according to the order stored in its configuration files. The House Model bundle is started just after two system modules, i.e., the DOG library and the configuration registry bundles. The network driver bundles first check the connections with the physical network and then communicate with the House Model to retrieve the list of devices that they can control. The device list includes the device names, the device types and the low-level configuration parameters (address, port, group number, . . . ). The network drivers have to map the abstract types provided by the ontology, included in the House Model bundle, onto real device types. In detail, they translate each abstract command or state (e.g. ON, OFF, CLOSE, . . . ) into the proper signals or messages according to the network protocol, and vice versa. The DOG architecture allows controlling recent devices even if the Network Driver does not completely support them. When a network driver bundle receives an unknown device type from the House Model, it immediately queries the hierarchy of that device type: if the driver supports at least one parent of the unknown type, the bundle can control the device by using the functionalities of that parent. For example, suppose a new flashing lamp has been installed and configured (see Section 4.7.3: Adding a new device) in the domotic environment and, unfortunately, the network driver provided with DOG has not been updated. In the start-up phase the network driver bundle learns that a device of the Flashing Lamp type exists. It cannot directly
control the device because it has not stored the translation mapping for that device type. The network driver bundle then queries the House Model for information about Flashing Lamp and finally learns that a Flashing Lamp is also a Lamp.
4.7.2 Complex Command Execution
A diagram of the complex command execution inside the DOG system is provided in Figure 4.8.
Figure 4.8. Complex Command Execution
The command execution involves 9 steps:

1. An external application sends to DOG a command in DogML format, for example "switch off all the lights in the house".

2. The API bundle parses the DogML message, envelopes it into a DogMessage, and then forwards that message to the House Model bundle.

3. The House Model performs reasoning on the DogOnt ontology (see Section 4.6) and retrieves the list of devices that match the query. In the considered example the House Model should return the Simple Lamps, the Dimmer Lamps and the Flashing Lamps.

4. The API generates a DogMessage containing the device list and the related command.

5. The Command Executor forwards the received DogMessage to the Status bundle.

6. The Status bundle validates and verifies the command included in the DogMessage and sends it back to the Command Executor.

7. If the command passed the validation, the Command Executor sends it to the Message Dispatcher; otherwise it generates an error message.

8. The Message Dispatcher routes the received DogMessages to the relevant Network Drivers, according to the routing table.

9. The Network Drivers translate the DogMessages into the respective low-level protocols and execute the commands.
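Steps 5 through 9 can be condensed into the following illustrative fragment; all names are invented, and the real bundles exchange DogMessages rather than plain strings.

```java
// Condensed sketch of the tail of the command path: validation gates the
// command, then the routing-table entry selects the destination driver.
public class CommandPath {
    // Steps 5-7: the Command Executor forwards the command only if the
    // Status bundle reports it valid; otherwise an error is produced.
    public static String executorStep(String command, boolean valid) {
        return valid ? "dispatch:" + command : "error:" + command;
    }

    // Steps 8-9: the Message Dispatcher hands the command to the driver
    // found in the routing table, which translates it into the low-level
    // protocol of its network.
    public static String dispatchStep(String command, String driver) {
        return driver + " <- " + command;
    }
}
```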
4.7.3 Adding a new device
There are different alternatives for installing a new device in the domotic environment, as shown in the diagram in Figure 4.9. This process may involve four categories of people:

Electricians: They are responsible for the physical connection and configuration of the new device.

Domotic technicians: They configure the House Model by modifying the DogOnt ontology. Typically, domotic technicians are trained electricians.

Network Driver Developers: They are specialized computer programmers who develop and upgrade the network drivers.

Users: They can handle easy configuration issues through a graphical user interface.
Figure 4.9. Adding a new device
When a new device has to be installed, an electrician provides the physical connections, then a DogOnt ontology update is necessary. DogOnt can be configured either by users, through a graphical interface, or by the domotic technicians. The intervention of a domotic technician is necessary when the device needs a complex configuration (e.g. a multimedia workstation that could control several other devices). If the new device belongs to a type already present in DogOnt, the ontology update consists in adding a new device instance to DogOnt. Otherwise a new device type has to be added to the ontology. If such a device type has a parent in DogOnt, the new device can be partially controlled; otherwise an upgrade of the relative network driver is necessary. The partial control is provided by DogOnt reasoning upon device-type inheritance. The Network Driver Developers have to develop up-to-date drivers to control devices of a completely new type. The DOG developers can develop a new device-type driver in
about 2 hours. The current version of DogOnt already includes several kinds of device types, so it is quite common to find at least one parent for the new device type; hence a new device can be controlled, even if only partially, just after the physical installation and the ontology configuration.
4.7.4 Comparison of DOG to related works
Table 4.3 shows the degree to which DOG and the related works described in this section satisfy the requirements for IDE domotic gateways (listed in Table 4.1).

Table 4.3. Requirements satisfied by related works, in comparison with DOG.

Req.   DomoNet   UMB   [89]   DOG
R1.1     ++       ++    ++     ++
R1.2     ++       ++    ++     ++
R1.3     ++       ++    ++     ++
R1.4     +        -     +      ++
R2.1     +        -     --     ++
R2.2     +        +     +      +
R3.1     --       --    --     -
R3.2     --       --    --     -
R3.3     --       --    --     -
R3.4     --       --    --     --

Legend: ++ completely satisfied; + partially satisfied; - easily satisfiable in the current architecture; -- requires significant platform reengineering.
4.7.5 Mediated Interaction
Configuring, activating or simply monitoring complex appliances, as well as complex scenarios, can become really difficult by only gazing at them. In these cases a mediated interaction, which allows users to control the several aspects involved in these operations through a menu-based PC application, can be more effective.

In the mediated interaction paradigm, gaze-based actions and reactions are accomplished through a menu-driven control application that allows users to fully interact with the domotic environment.
Such applications shall satisfy some constraints related to the different categories
of expected users. When users need a different application layout, related for example to the evolution of their impairment, they shall not be compelled to learn a different way of interacting with the application. In other words, the way in which commands are issued shall persist even if the layout, the calibration phase or the tracking mode changes.
To reach this goal, the interaction pattern that drives the command composition has to be very natural and shall be aware of the context of the application deployment. For example, in the real world, if users want to switch on the kitchen light, they go into that room, then they search for the proper switch and finally confirm the desired state change by actually switching on the light. This behaviour has to be preserved in the control application command composition, and the three involved steps must remain unvaried even if the application layout changes according to the eye tracker accuracy.
Figure 4.10. The control application with a quite accurate tracker.
In this work, mediated interaction can either be driven by infrared eye trackers (maximum accuracy/resolution) or by visible-light trackers (web-cam or videoconference cameras, minimum accuracy/resolution). These two extremes clearly require different visual layouts for the control application, due to differences in tracking resolution and movement granularity.
In the infrared tracking mode, the system is able to drive the computer mouse directly, thus allowing the user to select graphical elements as large as normal system icons (32x32 pixels wide). On the other hand, in the visible-light tracking mode only a few areas (6, as an example) on the screen can be selected (on a 1024x768 screen this means that each selectable area is approximately 341x384 pixels). As a consequence, the visual layout cannot remain the same in the two modes, but the interaction pattern shall persist in order to avoid forcing the user to re-learn the command composition process, which is usually annoying.
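The quoted area size follows from simple integer division, assuming the six selectable areas are arranged as a 3-by-2 grid (the grid shape is our assumption, consistent with the 341x384 figure above):

```java
// Back-of-the-envelope check of the selectable-area size: the screen is
// split into a grid of equally sized cells (integer division, matching
// the approximate figures quoted in the text).
public class LayoutAreas {
    public static int[] areaSize(int screenWidth, int screenHeight, int cols, int rows) {
        return new int[] { screenWidth / cols, screenHeight / rows };
    }
}
```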
Figure 4.11. The control application with a low-cost visible light tracker.
As a first experiment, we created the two possible layouts presented in Figures 4.10 and 4.11, in which the active areas are 32 pixels and 300 pixels wide, respectively. Despite a 10x change in tracking resolution, the information architecture remains coherent; only the layout disposition and the scanning sequences are affected. The prototype user interface has been designed to minimize screen clutter, thus easing eye tracking selection.
The complete interaction pattern implemented by the control application can be subdivided into two main components, referred to as the active and the passive interface. The former takes place when the user wants to explicitly issue a "command" to the house environment. Such a command can either be an actuation command (open the door, play the CD,
etc.) or a query command (is the fridge on?, ...).
The second part, instead, is related to alert messages or actions forwarded by the House Manager and the Interaction Manager for the general perception of the house status. Alerts and actions must be managed so that the user can timely notice what is happening and provide the proper responses. They are passive from the user's point of view, since the user is not required to actively perform a "check" operation, polling the house for possibly threatening situations or for detecting automatic actions; instead, the system pro-activity takes care of them. House state perception shall be passive, as the user cannot query every installed device to monitor the current home status.
The alerting mechanism and the status updates are priority-based: in normal operating conditions, status information is displayed on a banner, carefully positioned on the periphery of the visual interface to avoid capturing the user's attention too much, and kept out of the selectable area of the screen to avoid the so-called "Midas Touch" problem [98], where every element fixated by the user gets selected. In addition, the availability of a well-known rest position for the eyes to fixate is a tangible added value for the interface, which can therefore support user pauses and, at the same time, maximize the provided environment information.

Whenever high-priority information (alerts and Rule Engine actions) has to be conveyed to the user, the banner is highlighted and the control application plays an alert sound that requires immediate user attention.
4.7.6 Direct interaction
When the objects to be controlled or actuated are simple enough, a direct interaction approach can avoid the drawbacks of a conventional environmental control system, which typically utilises eye interaction with representative icons displayed on a 2D computer screen. In order to maximize the interface efficiency in these cases, a novel approach using direct eye interaction with real objects (environmental devices) in the 3D world has been developed. Looking directly at the object that the user wishes to control is an extremely intuitive form of user interaction, and by employing this approach the system
does not inherently need the user to sit incessantly before a computer monitor. This then makes it suitable for implementation in a wider range of situations and by users with a variety of abilities. For example, it immediately removes the need for the user first to be able to distinguish small icons or words, representative of environmental controllable devices, on a monitor before making a selection. The approach is termed ART (Attention Responsive Technology) [99]. For many individuals with a disability, the ability to control environmental devices without the help of a family member or carer is important, as it increases their independence. ART allows anyone who can control their saccadic eye movements to operate devices easily. A second advantage of the ART approach is that it simplifies the operation of such devices by removing the need to always present the user with an array of all potential controllable environmental devices every time the user wishes to operate one device. ART only presents the user with interface options directly related to a specific environmental device, that device being the one that the user has looked at.
Attention Responsive Technology (ART)
With the ART approach the user can sit or stand anywhere in the environment and indeed move about the environment quite freely. If s/he wants to change an environmental device's status, for instance to switch on a light, the user simply visually attends to (looks at) the light briefly. The ART system constantly monitors the user's eye movements and ascertains the allocation of visual attention within the environment, determining whether the user's gaze falls on any controllable device. The environment surrounding the user is imaged by a computer vision system, which identifies and locates any pre-known device falling within the user's point of gaze. If a device is identified as being gazed at, then the system presents a simple dialogue to ask the user to confirm his/her intention. The actual interface dialogue can be of any form, for instance a window on a touch-sensitive screen or any tailor-made approach depending on the requirements of the disabled users. Finally the user would execute an appropriate control to operate the device.
ART development with a head-mounted eye tracker
A laboratory-based prototype system and its software control interface have been devel-
oped [100, 101]. A head-mounted ASL 501 eye tracker, as shown in Figure 4.12, is used
to record a user’s saccadic eye movements. This comprises a control unit and a headband,
on which both a compact eye camera, which images one eye of the user, and a scene cam-
era, which images the environment in front of the user, are mounted. Eye movement data
are recorded at 50Hz from which fixation points of varying time periods can be derived.
In order to calibrate the eye movement recording system appropriately the user dons the
ASL system and then must first look at a calibration chart comprising a series of known
spatially arrayed points. The relationship between the eye gaze data
and their corresponding positions in the scene camera is built up by projecting the same
physical point in both coordinate systems using an affine transformation. Eye data are
therefore related to the scene camera image.
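The mapping just described can be sketched as a least-squares affine fit between the eye-camera and scene-camera coordinate systems. The sketch below is illustrative only; the function names and point values are hypothetical, not the ASL software's actual implementation:

```python
import numpy as np

def fit_affine(eye_pts, scene_pts):
    """Fit an affine map (scene = A @ eye + b) from calibration point pairs."""
    eye = np.asarray(eye_pts, dtype=float)
    scene = np.asarray(scene_pts, dtype=float)
    # Augment each eye-camera point with a constant 1: [x, y, 1]
    X = np.hstack([eye, np.ones((len(eye), 1))])
    # Least-squares solution of X @ M = scene; the 3x2 matrix M stacks A and b
    M, *_ = np.linalg.lstsq(X, scene, rcond=None)
    return M

def map_gaze(M, point):
    """Project an eye-camera gaze point into scene-camera coordinates."""
    x, y = point
    return np.array([x, y, 1.0]) @ M
```

With at least three non-collinear calibration points the affine parameters are fully determined; additional points over-determine the system, and the least-squares solution averages out measurement noise.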
Figure 4.12. The ASL 501 headband with the two optical systems attached.
In order for the ART system to recognise an object in the environment all controllable
devices are first imaged by the system. To do this each device is presented to the scene
camera and imaged at regularly spaced angles when their image SIFT features [102] are
extracted. These features are then stored in a database. New devices can easily be added,
as these simply need to be imaged by the ART system and their SIFT features automat-
ically added to the database. To complement each device added, the available device
control operations for it are added to the system, so that when that device is recognised by
the ART system those controls are proffered to the user.
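The database lookup can be illustrated with a small sketch. Descriptor extraction itself (SIFT [102], typically done with an image-processing library) is outside the scope of this sketch, which only shows the per-device descriptor store and a Lowe-style nearest-neighbour ratio test; all descriptor values, thresholds and device names below are hypothetical:

```python
import numpy as np

class DeviceDatabase:
    """Minimal sketch of the ART device store: SIFT-like descriptors per device."""

    def __init__(self):
        self.devices = {}  # device name -> (N, D) array of feature descriptors

    def add_device(self, name, descriptors):
        self.devices[name] = np.asarray(descriptors, dtype=float)

    def identify(self, query, ratio=0.75, min_matches=2):
        """Ratio test [102]: a query descriptor matches a device when its nearest
        stored descriptor is much closer than the second nearest. The device with
        the most good matches (at least min_matches) is reported."""
        best_name, best_count = None, 0
        for name, descs in self.devices.items():
            count = 0
            for q in np.asarray(query, dtype=float):
                d = np.linalg.norm(descs - q, axis=1)
                if len(d) >= 2:
                    first, second = np.sort(d)[:2]
                    if first < ratio * second:
                        count += 1
            if count > best_count:
                best_name, best_count = name, count
        return best_name if best_count >= min_matches else None
```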
In order to operate a device the user gazes steadily at the device in question. The pro-
totype system is currently driven by a control interface as seen in Figure 4.13. The ART
system recognises the user’s steady gaze behaviour, this can comprise several similarly
spatially-located eye fixations with the overall dwell timeand spatial parameters being
user-specified, which is recorded and a stabilised point of gaze in 3D space is determined
as shown in Figure 4.14(a). This gaze location information is then analysed with respect
to the scene camera image to determine whether or not it falls on any controllable object
of interest. Figure 4.14(b) shows the detection of such a purposeful gaze. A simple in-
terface dialogue, as illustrated in Figure 4.14(c), then appears (in the laboratory prototype
this is on a computer display) asking for the user to make his/her control input and the
system then implements the control action necessary.
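The steady-gaze recognition described above is essentially a dispersion-based dwell test over the 50 Hz gaze samples. A simplified 2D sketch, with the dwell time and dispersion thresholds user-specified as in the text (the function below is an illustration, not the prototype's Matlab code):

```python
def detect_dwell(samples, max_dispersion, min_dwell):
    """samples: list of (t, x, y) gaze samples, e.g. at 50 Hz.
    Returns the centroid of the first run of samples whose spatial spread
    stays within max_dispersion for at least min_dwell seconds, else None."""
    start = 0
    for end in range(len(samples)):
        window = samples[start:end + 1]
        xs = [x for _, x, _ in window]
        ys = [y for _, _, y in window]
        if (max(xs) - min(xs) > max_dispersion or
                max(ys) - min(ys) > max_dispersion):
            start = end  # dispersion exceeded: restart the candidate fixation
            continue
        if window[-1][0] - window[0][0] >= min_dwell:
            return (sum(xs) / len(xs), sum(ys) / len(ys))
    return None
```

Requiring a minimum dwell time, as discussed below, is exactly what prevents every incidental glance from being treated as a purposeful gaze.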
Figure 4.13. The ART system control interface
There are two parts to this control interface: the information and feedback offered
to the user, and the input that the user can make to the system. The former is currently a
computer display but could easily be something else, such as a head-down display or audio
menu rather than a visual display. The input function can also comprise tailor-designed
inputs, e.g. touchable switches, a chin-controlled joystick, a sip/puff switch, or gaze dwell
time on the display’s buttons, depending on the capabilities of the disabled user. In the
first ART development the actual device operation was controlled by an implementation
of the X10 protocol; in this work, instead, the ART system has been connected to the
House Manager, enabling users to issue commands to almost every device available in
their homes, without being bound to adopt a specific domotic infrastructure.
Figure 4.14. Typical stages of the ART system (a. Stability of eye gaze captured; b. Gaze on object detected; c. Control initiated)
One issue of an eye controlled system is the potential false operation of a device sim-
ply because the user’s gaze is recorded as falling upon it. Inherently the user’s gaze must
always fall on something in the environment. There are two built-in system parameters
to overcome this. Firstly, the user must actively gaze at an object for a pre-determined
time period; this is necessary both for the software to identify the object in the scene
camera image and to prevent the ART system from constantly attempting to identify
objects unnecessarily. Secondly, the user’s eye gaze does not (of itself) initiate device
operation but instead initiates the presentation of a dedicated interface just for that de-
vice. This permits a check on whether or not the user does in fact wish to operate the
device. The ART system work flow is illustrated in Figure 4.15. The ART system has
been designed using a head-mounted eye tracking device for ease of system development.
The overall operational time of the system can be much improved in various ways, such
as programming in C rather than in Matlab as at present. However, there is a generic
weakness of using such a head-mounted eye tracker for prolonged periods of time as the
equipment can potentially cause user discomfort and fatigue. Consequently, eye tracking
systems which are either head-mounted but smaller and lighter, or which
are physically remote from the user and have no attachment to the user’s head, are being
investigated for long term usage.
Figure 4.15. ART system flow chart
4.8 Guidelines
In close collaboration within the COGAIN project we defined a set of guidelines explaining
how to make control applications for home automation systems accessible. The guide-
lines are intended for all eye tracking application developers. The primary goal of these
guidelines is to promote safety and accessibility. The structure of this section is strongly
inspired by W3C Web Content Accessibility Guidelines.
Each guideline has a priority level based on the impact on safety and accessibility:
• Priority 1 - A smart house application developer must satisfy these guidelines.
• Priority 2 - A smart house application developer should satisfy these guidelines.
Category 1: Control applications safety
Guideline 1.1 Provide a fast, easy to understand and multimodal alarm notification. A
user needs to notice as soon as possible that the environmental control system has
sent an alarm. The control application should notify the alarms in several ways,
e.g., with sounds, flashing icons, text messages.
Guideline 1.2 Provide the user only a few clear options to handle alarm events. Several
gaze trackers are less accurate when the users are agitated; therefore, in case of alarm
the control application should propose only a limited but clear set of options (3 at
most).
Guideline 1.3 Provide a default safety action to overcome an alarm event when the user
does not decide. In case of emergency the user could lose control of the input
device, therefore the control application should take the safest decision after a time-
out. The time-out length should be dependent on the alarm type.
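A minimal sketch of this guideline using a cancellable timer; the alarm types, timeout lengths, class and handler names below are purely illustrative:

```python
import threading

class AlarmHandler:
    """Guideline 1.3 sketch: if the user does not choose an option within a
    per-alarm-type timeout, execute a default safety action automatically."""

    TIMEOUTS = {"fire": 10.0, "intrusion": 30.0}  # seconds, by alarm type

    def __init__(self, default_action):
        self.default_action = default_action  # callable taking the alarm type
        self._timer = None

    def raise_alarm(self, alarm_type):
        timeout = self.TIMEOUTS.get(alarm_type, 15.0)
        self._timer = threading.Timer(timeout, self.default_action, [alarm_type])
        self._timer.start()

    def user_responded(self):
        if self._timer:
            self._timer.cancel()  # the user took control: no default action
            self._timer = None
```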
Guideline 1.4 Provide a confirmation request for critical and possibly dangerous opera-
tions. With inaccurate or badly configured gaze trackers the Midas touch error can
be frequent, i.e., each object/command gazed at by the user is selected/executed;
therefore the control application should request a confirmation for possibly danger-
ous operations.
Guideline 1.5 Provide a STOP functionality that interrupts any operation. In some occa-
sions, the environmental control system can operate actions that the user does not
want, e.g., the selection of a wrong command or an automated, prescheduled sce-
nario, or because the user has changed his/her mind. The control application should allow a STOP
method for interrupting any operation.
Category 2: Input methods for control application
Guideline 2.1 Provide a connection with the Cogain ETU-driver. The Cogain ETU-
driver described in deliverable D2.3 is a single gaze communication standard that
allows any third party application to be driven by a range of different eye tracking
hardware systems. By using the driver, there is no need for any third party appli-
cation to be changed or recompiled when switching between differing eye tracking
hardware systems.
Guideline 2.2 Support several input methods. The gaze tracker, unfortunately, can break
down; therefore the control application should also support alternative input meth-
ods, e.g. switches (scanning mode selection), keyboards, mice, etc.
Guideline 2.3 Provide reconfigurable layouts, appropriate for different eye tracking per-
formances and user capabilities. Eye trackers have a very wide performance range;
therefore, a control application should have a reconfigurable visual interface adapt-
able to different resolutions and precisions of the eye trackers.
Guideline 2.4 Support multiple input methods at the same time (multimodal interaction).
The user may be able to use alternative input channels beyond gaze, e.g. voice,
finger movements, etc. The control application should support the combination of
multiple input methods at the same time, for example selection with gaze and click
with the mouse.
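As an illustration of combining two channels, the sketch below pairs continuous gaze updates (pointing) with a discrete click from a second device (confirmation); the class name, target names and coordinates are hypothetical:

```python
class MultimodalSelector:
    """Guideline 2.4 sketch: selection by gaze, confirmation by an independent
    switch or mouse click at the most recent gaze position."""

    def __init__(self, targets):
        self.targets = targets  # name -> (x, y, width, height) screen regions
        self.gaze = None

    def on_gaze(self, x, y):
        self.gaze = (x, y)  # pointing channel: continuous gaze updates

    def on_click(self):
        """Confirmation channel: returns the target under the current gaze."""
        if self.gaze is None:
            return None
        gx, gy = self.gaze
        for name, (x, y, w, h) in self.targets.items():
            if x <= gx <= x + w and y <= gy <= y + h:
                return name
        return None
```

Keeping selection and confirmation on separate channels also mitigates the Midas touch problem mentioned under Guideline 1.4.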
Guideline 2.5 Manage the loss of input control by providing automated default actions.
The control application should understand when the user has lost control of the eye
tracker and should provide default actions (e.g. recalibration, playing an alarm, etc.).
Category 3: Control applications operative features
Guideline 3.1 Respond to environmental control events and commands at the right time.
The control application should be responsive: it should manage events and com-
mands in an acceptable time slot.
Guideline 3.2 Manage events with different time critical priority. The control application
should distinguish between events with different priority. The time critical events
must be acted upon within a short fixed period (e.g. fire alarm, intrusion detection).
Guideline 3.3 Execute commands with different priority. Home automation systems
commonly receive several commands at the same time (e.g. from different users, sce-
narios, etc.). The control application should discriminate commands with different
priority and should adopt a predefined management policy.
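Guidelines 3.2 and 3.3 together suggest a priority-ordered dispatch queue for events and commands. A minimal sketch with a binary heap, where lower numbers mean more urgent and ties preserve arrival order (priority values and command names are illustrative):

```python
import heapq
import itertools

class CommandQueue:
    """Sketch of Guidelines 3.2/3.3: commands carry a priority; time-critical
    ones are dispatched first, ties resolved in arrival (FIFO) order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # breaks priority ties by arrival order

    def submit(self, priority, command):
        heapq.heappush(self._heap, (priority, next(self._counter), command))

    def dispatch(self):
        """Return the most urgent pending command, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```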
Guideline 3.4 Provide feedback when automated operations or commands are executing.
Scenarios, selected by the user, could include several scheduled commands. The
control application should show the actions in progress and inform the user when a
scenario is terminated.
Guideline 3.5 Manage (create, modify, delete) scenarios. Repeating a long sequence
of commands to perform a frequent task could be tedious for the user. It is therefore
necessary to gather lists of commands and manage them as a single one. The control
application should allow creation, modification and deletion of scenarios.
Guideline 3.6 Know the current status of all devices and appliances. The control appli-
cation should know the current status of all devices and appliances of the home, in
order to show that information and to take smart automated decisions (e.g. preventing
a dangerous condition, activating an energy saving plan, etc.).
Category 4: Control applications usability
Guideline 4.1 Provide a clear visualization of what is happening in the house. In ac-
cordance with the guidelines of categories 1 and 3, the control application interface
should provide a clear, easily understandable visualization of the execution progress
of the commands.
Guideline 4.2 Provide a graceful and intelligible user interface. Consistent page layout,
easy to understand language, and recognizable graphics benefit all users. The con-
trol application should provide a graceful and intelligible user interface, possibly
using both images and clear texts.
Guideline 4.3 Provide a visualization of status and location of the house devices. The
control application should show the house map containing, for each room, a repre-
sentation of the devices and their status.
Guideline 4.4 Use colours, icons and text to highlight a change of status. The control
application interface should highlight the device status change using images, texts
and sounds.
Guideline 4.5 Provide an easy-to-learn selection method. Although the control applica-
tion may present complex features and functionality, it should provide a usable,
easy-to-learn interaction method.
Chapter 5
Conclusions
The research carried out during the three years of the doctorate, described in this thesis,
led to 2 publications in international journals and 9 publications in international
conferences (see Appendix C).
Good results have been obtained both concerning the development of gaze-based as-
sistive software and the design of the domotic gateway DOG. Domotics is an increasingly
relevant research field involving two main aspects: the assisted living of disabled
and elderly persons, and the study of advanced solutions for energy saving. Future
research stemming from this thesis will concentrate on this last field.
The e-lite research group, where I worked during these years, is planning to re-engineer
the DOG platform, with the purpose of making it:
• more scalable and capable of running on devices with very limited computational
capacity.
• more reliable and safe, i.e., able to handle alarms and misbehaviors through the
application of autonomic solutions.
• smarter: by researching and implementing novel algorithms allowing optimized
management of domestic resources (energy saving).
• more accessible: by developing a gaze-based control interface for domotics sys-
tems, compliant with the COGAIN Recommendations.
Since the beginning of my study in 2006, the context of assistive technologies
has changed considerably. Thanks to the collaboration with COGAIN and the San Giovanni
hospital of Turin, our research group succeeded in spreading information about new,
advanced, gaze-based assistive technologies and their benefits and impact on the quality
of life of persons affected by ALS. In 2006 the Piedmont Region Health Board did not
fund the purchase of eye tracking devices for severely motor disabled people; caregivers
were therefore obliged to bear considerable expenses or to raise funds. I cannot be sure
whether the merit belongs to my colleagues and me (I like to think so), but after the
dissemination of our research results and the organization of an international conference
in Turin [103], the Piedmont Region Health Board started to fund the purchase of eye
trackers for people with ALS.
Bibliography
[1] United nations enable. http://www.un.org/disabilities/.
[2] Louis Émile Javal. Physiologie de la lecture et de l’écriture. Annales d’Oculistique,
pages 137–187, 1907.
[3] R. Dodge and T.S. Cline. The angle velocity of eye movements. Psychological Review,
8:145–157, 1901.
[4] Judd C.H., McAllister C.N., and Steel W.M. General introduction to a series of
studies of eye movements by means of kinetoscopic photographs. Psychological
Review, Monograph Supplements, 7:1–16, 1905.
[5] D.G. Paterson and M.A. Tinker. Studies of typographical factors influencing speed
of reading 10: Style of typeface. Journal of Applied Psychology, 16:605–613,
1932.
[6] P.M. Fitts, R.E. Jones, and J.L. Milton. Eye movements of pilots during instrument
landing approach. Aeronautical Engineering Review, 9:1–7, 1950.
[7] H. Hartridge and L. C. Thomson. Method of investigating eye movements.British
Journal Ophthalmology, 32:581–591, 1948.
[8] B. Shackel. Pilot study in electro-oculography. The British Journal of Ophthal-
mology, 44:88–113, 1960.
[9] Norman H. Mackworth and Edward Llewellyn Thomas. Head-mounted eye-marker
camera. J. Opt. Soc. Am., 52:713–716, 1962.
[10] R. A. Monty and J. W. Senders. Eye movements and psychological processes.
Hillsdale, New Jersey: Erlbaum Associates, 1976.
[11] J. W. Senders, D. F. Fisher, and R. A. Monty. Eye movements and higher psycho-
logical processes. Hillsdale, New Jersey: Erlbaum Associates, 1978.
[12] D.F. Fisher, R.A. Monty, and J.W. Senders. Eye movements: cognition and visual percep-
tion. L. Erlbaum Associates, 1981.
[13] T. N. Cornsweet and H. D. Crane. Accurate two-dimensional eye tracker using first
and fourth Purkinje images. J. Opt. Soc. Am., 63:921–928, 1973.
[14] R.H. Lambert, R.A. Monty, and R.J. Hall. High-speed data processing and unobtru-
sive monitoring of eye movements. Behavioral Research Methods & Instrumenta-
tion, 6:525–530, 1974.
[15] R.A. Monty. An advanced eye-movement measuring and recording system. Amer-
ican Psychologist, 30:331–335, 1975.
[16] J. Anliker. Eye movements: on-line measurement, analysis, and control. Eye
Movements and Psychological Processes, 1976.
[17] H. Collewijn. Eye movement recording. Vision Research: A Practical Guide to
Laboratory Methods, pages 245–285, 1999.
[18] E. Kowler. The role of visual and cognitive processes in the control of eye move-
ment. Elsevier Science Publishers, 1990.
[19] J.W. Senders. Four theoretical and practical questions. In Eye Tracking Research
and Applications Symposium, 2000.
[20] S. K. Card. Visual search of computer command menus. Attention and Perfor-
mance X, Control of Language Processes, 1984.
[21] J.J. Hendrickson. Performance, preference, and visual scan patterns on a menu-
based system: implications for interface design. In Proceedings of the ACM CHI89
Human Factors in Computing Systems Conference, 1989.
[22] A. Aaltonen, A. Hyrskykari, and K.-J. Räihä. 101 spots, or how do users read menus? In
Proceedings of CHI 98 Human Factors in Computing Systems, 1998.
[23] T.E. Hutchinson, K.P. White, W.N. Martin, K.C. Reichert, and L.A. Frey. Human-
computer interaction using eye-gaze input. IEEE Transactions on Systems, Man,
and Cybernetics, 19:1527–1534, 1989.
[24] J.L. Levine. An eye-controlled computer. Technical report, IBM, Thomas J. Watson
Research Center, 1981.
[25] J.L. Levine. Performance of an eyetracker for office use. Comput. Biol. Med.,
14:77–89, 1984.
[26] R.A. Bolt. Gaze-orchestrated dynamic windows. Computer Graphics, 15:109–119,
1981.
[27] R.A. Bolt. Eyes at the interface. In Proceedings of the ACM Human Factors in
Computer Systems Conference, 1982.
[28] F.A. Glenn et al. Eye-voice-controlled interface. In Proceedings of the 30th Annual
Meeting of the Human Factors Society, 1986.
[29] C. Ware and H.H. Mikaelian. An evaluation of an eye tracker as a device for computer
input. In Proceedings of the ACM CHI+GI87 Human Factors in Computing Sys-
tems Conference, 1987.
[30] A. T Duchowski. A breadth-first survey of eye tracking applications. Behavior
Research Methods, Instruments, & Computers, 4:455–470, 2002.
[31] Young and Sheena. Survey of eye movement recording methods. Behavior Re-
search Methods and Instrumentation, 7, 1975.
[32] A. Glenstrup and T. Engell-Nielsen. Eye controlled media: Present and future state,
1995. http://www.diku.dk/panic/eyegaze/article.html.
[33] MetroVision System. Mon vog from metrovision systems.
[34] EyeTech Systems. Quick glance from eyetech systems.
[35] SensoMotoric Instruments. Sensomotoric instruments.
[36] Dan Witzner Hansen and Arthur E. C. Pece. Eye tracking in the wild. Comput. Vis.
Image Underst., 98(1):155–181, 2005.
[37] J. Gips, P. Olivieri, and J. Tecce. Direct control of the computer through electrodes
placed around the eyes. In Fifth International Conference on Human-Computer
Interaction, pages 630–635, 1993.
[38] Dale Roberts, Mark Shelhamer, and Aaron Wong. A new wireless search-coil system. In
Proceedings of Eye tracking research and applications Symposium (ETRA 2008),
pages 197–204, 2008.
[39] Armen R Kherlopian, Joseph P Gerrein, Minerva Yue, Kristina E Kim, Ji Won
Kim, Madhav Sukumaran, and Paul Sajda. Electrooculogram based system for
computer control using a multiple feature classification model. Conf Proc IEEE
Eng Med Biol Soc, 1:1295–8, 2006.
[40] S R Cohen, S A Hassan, B J Lapointe, and B M Mount. Quality of life in HIV disease
as measured by the McGill Quality of Life Questionnaire. AIDS, 10(12):1421–7,
October 1996.
[41] S R Cohen, B M Mount, M G Strobel, and F Bui. The McGill Quality of Life
Questionnaire: a measure of quality of life appropriate for people with advanced
disease. A preliminary study of validity and acceptability. Palliat Med, 9(3):207–
19, July 1995.
[42] E. Diener, R. Emmons, J. Larsen, and S. Griffin. The satisfaction with life scale.
Personality Assesment, 49(1):71–75, 1985.
[43] W. Pavot and E. Diener. Review of the satisfaction with life scale. Psycholog-
ical Assessment, 5:164–172, 1993.
[44] W. Zung. A self-rating depression scale.Archives of General Psychiatry, 12:63–70,
1965.
[45] M Novak and C Guest. Application of a multidimensional caregiver burden inven-
tory. Gerontologist, 29(6):798–803, December 1989.
[46] John F Deeken, Kathryn L Taylor, Patricia Mangan, K RobinYabroff, and Jane M
Ingham. Care for the caregivers: a review of self-report instruments developed
to measure the burden, needs, and quality of life of informal caregivers. J Pain
Symptom Manage, 26(4):922–53, October 2003.
[47] W3c web accessibility initiative. http://www.w3.org/WAI/intro/accessibility.php.
[48] W3C. Web content accessibility guidelines 1.0.http://www.w3.org/TR/
WAI-WEBCONTENT/, 1999.
[49] W3C. Web content accessibility guidelines 2.0. http://www.w3.org/TR/
WCAG20/, 2007.
[50] W3C. User agent accessibility guidelines 1.0. http://www.w3.org/TR/
UAAG10/, 2002.
[51] Eye Response Technologies. Erica system. http://www.eyeresponse.com/, 2005.
[52] Tobii Technology. MyTobii 2.3. http://www.tobii.com, 2006.
[53] B. H. Thomas and W. Piekarski. Glove based user interaction techniques for aug-
mented reality in an outdoor environment. Virtual Reality, pages 167–180, 2002.
[54] S. L. Oviatt. Mutual disambiguation of recognition errors in a multimodal architec-
ture. In Proceedings of the Conference on Human Factors in Computing Systems
(CHI’99), pages 576–583. ACM Press, 1999.
[55] R. A. Bolt. “Put-that-there”: Voice and gesture at the graphics interface. In SIG-
GRAPH ’80: Proceedings of the 7th annual conference on Computer graphics and
interactive techniques, pages 262–270, New York, NY, USA, 1980. ACM Press.
[56] P. R. Cohen, M. Johnston, D. McGee, S. Oviatt, J. Pittman, I. Smith, L. Chen,
and J. Chow. Quickset: Multimodal interaction for distributed applications. In
Proceedings of the Fifth ACM International Multimedia Conference, pages 31–40.
ACM Press, 1997.
[57] Rajeev Sharma, Vladimir Pavlovic, and Thomas Huang. Toward multimodal
human-computer interface. InProceedings of the IEEE, volume 86, pages 853–
869, May 1998.
[58] D. Miniotas, O. Spakov, I. Tugoy, and I. S. MacKenzie. Speech-augmented eye
gaze interaction with small closely spaced targets. In Proceedings of the 2006
symposium on Eye tracking research and applications, pages 66–72. ACM Press,
2006.
[59] Q. Zhang, A. Imamiya, K. Go, and X. Gao. Overriding errors in speech and gaze
multimodal architecture. In Proc. 9th International Conference on Intelligent User-
Interfaces (2004), pages 346–348. ACM Press, 2004.
[60] Melanie Baljko. The information-theoretic analysis ofunimodal interfaces and
their multimodal counterparts. In Assets ’05: Proceedings of the 7th international
ACM SIGACCESS conference on Computers and accessibility, pages 28–35, New
York, NY, USA, 2005. ACM Press.
[61] Rajarathinam Arangarasan, Tushar H. Dani, Chi-Cheng Chu, Xiaochun Liu, and
Rajit Gadh. Geometric modeling in multi-modal, multi-sensory virtual environ-
ment. In Proceedings of 2000 NSF Design and Manufacturing Research Confer-
ence, pages 3–6, 2000.
[62] H. Dudley. The vocoder. Bell Labs Record, 17:122–126, 1939.
[63] H. Dudley, R. R. Riesz, and S. A. Watkins. A synthetic speaker. J. Franklin
Institute, 227:739–764, 1939.
[64] K. H. Davis, R. Biddulph, and S. Balashek. Automatic recognition of spoken digits.
J. Acoust. Soc. Am, 24:627–642, 1952.
[65] T. Sakai and S. Doshita. The phonetic typewriter. Information Processing 1962,
Proc. IFIP, 1962.
[66] L. E. Baum and T. Petrie. Statistical inference for probabilistic functions of finite
state Markov chains. Annals of Mathematical Statistics, 37:1554–1563, 1966.
[67] L. E. Baum, T. Petrie, G. Soules, and N. Weiss. A maximization technique occur-
ring in the statistical analysis of probabilistic functions of Markov chains. Annals
of Mathematical Statistics, 41:164–171, 1970.
[68] F. Jelinek. Continuous speech recognition by statistical methods. Proc. IEEE,
64:532–536, 1976.
[69] S. E. Levinson, L. R. Rabiner, and M. M. Sondhi. An introduction to the application
of the theory of probabilistic functions of a Markov process to automatic speech
recognition.Bell Syst. Tech. J., 62:1035–1074, 1983.
[70] Claudia Romellini and Daniele Sereno. The essential role of a speech platform in
deploying effective and reliable speech enabled applications, 2004.
[71] Poika Isokoski and Benoit Martin. Eye tracker input in first person shooter games.
The 2nd Conference on Communication by Gaze Interaction – COGAIN 2006:
Gazing into the Future, pages 79–81, 2006.
[72] Poika Isokoski, Aulikki Hyrskykari, Sanna Kotkaluoto, and Benoit Martin.
Gamepad and eye tracker input in FPS games: Data for the first 50 minutes. The 3rd
Conference on Communication by Gaze Interaction – COGAIN 2007: Gaze-based
Creativity and Interacting with Games and On-line Communities, pages 11–15,
2007.
[73] Michael Dorr, Martin Bohme, Thomas Martinetz, and Erhardt Barth. Gaze beats
mouse: a case study.The 3rd Conference on Communication by Gaze Interaction
– COGAIN 2007: Gaze-based Creativity and Interacting with Games and On-line
Communities, pages 18–21, 2007.
[74] Howell Istance, Richard Bates, Aulikki Hyrskykari, and Stephen Vickers. Snap
clutch, a moded approach to solving the midas touch problem.In ETRA ’08: Pro-
ceedings of the 2008 symposium on Eye tracking research applications, pages 221–
228, New York, NY, USA, 2008. ACM.
[75] John Paulin Hansen, Dan Witzner Hansen, Anders Sewerin Johansen, and John
Elvesjö. Mainstreaming gaze interaction towards a mass market for the ben-
efit of all. 3rd international conference on universal access in human-computer
interaction, 2005.
[76] Richard Bates and Oleg Spakov. Implementation of cogain gaze tracking standards.
Communication by Gaze Interaction (COGAIN), 2006.
[77] Scott Davidoff, Min Kyung Lee, John Zimmerman, and Anind Dey. Principle of
Smart Home Control. InProceedings of the Conference on Ubiquitous Computing,
pages 19–34. Springer, 2006.
[78] Sheng-Luen Chung and Wen-Yuan Chen. MyHome: A ResidentialServer for
Smart Homes.Knowledge-Based Intelligent Information and Engineering Systems,
4693/2007:664–670, 2007.
[79] Home gateway technical requirements: Residential profile. Technical report, Home
Gateway Initiative, 2008.
[80] M.A. Just and P.A. Carpenter. Eye fixations and cognitiveprocesses. InCognitive
Psychology 8, pages 441–480, 1976.
[81] R. Vertegaal, A. Mamuji, C. Sohn, and D. Cheng. Media eyepliances: using eye track-
ing for remote control focus selection of appliances. In CHI Extended Abstracts,
pages 1861–1864, 2005.
[82] L. Jiang, D. Liu, and B. Yang. Smart home research. InProceedings of the Third
Conference on Machine Learning and Cybernetics SHANGHAI, pages 659–664,
August 2004.
[83] The BTicino MyHome system.http://www.myhome-bticino.it .
[84] The Konnex association.http://www.konnex-knx.com .
[85] Dave Rye. The X10 PowerHouse powerline interface. Technical report, X10 Pow-
erHouse, 2001.
[86] The LonWorks platform. http://www.echelon.com/developers/
lonworks/default.htm .
[87] Vittorio Miori, Luca Tarrini, Maurizio Manca, and Gabriele Tolomei. An Open
Standard Solution for Domotic Interoperability.IEEE Transactions on Consumer
Electronics, 52:97–103, 2006.
[88] Kyeong-Deok Moon, Young-Hee Lee, Chang-Eun Lee, and Young-Sung Son. De-
sign of a Universal Middleware Bridge for Device Interoperability in Heteroge-
neous Home Network Middleware.IEEE Transactions on Consumer Electronics,
51:314–318, 2005.
[89] Eiji Tokunaga, Hiro Ishikawa, Makoto Kurahashi, Yasunobu Morimoto, and Tatsuo
Nakajima. A Framework for Connecting Home Computing Middleware. In Inter-
national Conference on Distributed Computing Systems Workshops (ICDCSW02),
2002.
[90] F.Shi, A. Gale, and K. Purdy. Direct Gaze-Based Environmental Controls. InThe
2nd Conference on Communication by Gaze Interaction, pages 36–41, 2006.
[91] D. Bonino and A. Garbo. An Accessible Control Applicationfor Domotic Environ-
ments. InFirst International Conference on Ambient Intelligence Developments,
pages 11–27, 2006.
[92] OSGi alliance.http://www.osgi.org/ .
[93] D. Bonino and F. Corno. DogOnt - ontology modeling for intelligent domotic en-
vironments. In7th International Semantic Web Conference, 2008.
[94] T.R. Gruber. Toward principles for the design of ontologies used for knowledge
sharing.International Journal Human-Computer Studies, 43(5-6):907–928, 1995.
[95] D. L. McGuinness and F. van Harmelen. OWL Web Ontology Language. W3C Rec-
ommendation,http://www.w3.org/TR/owl-features/ , February 2004.
[96] The Eclipse Equinox project.http://www.eclipse.org/equinox/ .
[97] The Apache XML-RPC API.http://ws.apache.org/xmlrpc/ .
[98] R.J.K. Jacob and K.S. Karn. Eye tracking in human computer interaction and
usability research: Ready to deliver the promises. In The Mind’s Eye: Cognitive
and Applied Aspects of Eye Movement Research, pages 573–605, 2003.
[99] A.G. Gale. Attention responsive technology and ergonomics. In Bust P.D. and
McCabe P.T., editors, Contemporary Ergonomics 2005, pages 273–276, 2005.
[100] F. Shi, A.G. Gale, and K.J. Purdy. Eye-centric ICT control. In Contemporary Er-
gonomics 2006 (Taylor and Francis, London), pages 215–218, 2006.
[101] F. Shi, A.G. Gale, and K.J. Purdy. Helping people with ICT device control by eye
gaze. In Miesenberger K., Klaus J., Zagler W. and Karshmer A. (Eds.), Lecture
Notes in Computer Science (Springer Verlag Berlin), pages 480–487, 2006.
[102] D.G. Lowe. Distinctive image features from scale-invariant keypoints. Interna-
tional Journal of Computer Vision, 60(2):91–110, 2004.
[103] COGAIN 2006: Gazing into the Future. The 2nd Conference on Communication
by Gaze Interaction, 2006.
Appendix A
Abbreviations
AAC Augmentative and Alternative Communication systems
ALS Amyotrophic Lateral Sclerosis
ART Attention Responsive Technology
ASE Accessible Surfing Extension
COGAIN COmmunication by GAze INteraction
DGC Direct Gaze Control
DOG Domotic OSGi Gateway
DOM Document Object Model
GKB Gaze and keyboard button
GT Gaze to Target
GTK Gaze tracking and keyboard
GW Gaze Window
IDE Intelligent Domotic Environment
IGM Independent Gaze and Movement
MS Multiple Sclerosis
MGS McGill scale
SPBS Self-Perceived Burden Scale
SWLS Satisfaction With Life Scale
VK Virtual Keyboard
Appendix B
Convention on the Rights of Persons
with Disabilities
The Convention is a response to an overlooked development challenge: approximately
10% of the world’s population are persons with disabilities (over 650 million persons),
approximately 80% of whom live in developing countries. It is a response to the fact that,
although pre-existing human rights conventions offer considerable potential to promote
and protect the rights of persons with disabilities, this potential was not being tapped.
Persons with disabilities continued being denied their human rights and were kept on the
margins of society in all parts of the world. The Convention sets out the legal obligations
on States to promote and protect the rights of persons with disabilities. It does not create
new rights.
B.1 Guiding Principles of the Convention
There are eight guiding principles that underlie the Convention and each one of its specific
articles:
1. Respect for inherent dignity, individual autonomy including the freedom to make
one’s own choices, and independence of persons
2. Non-discrimination
3. Full and effective participation and inclusion in society
4. Respect for difference and acceptance of persons with disabilities as part of human
diversity and humanity
5. Equality of opportunity
6. Accessibility
7. Equality between men and women
8. Respect for the evolving capacities of children with disabilities and respect for the
right of children with disabilities to preserve their identities
Appendix C
Publications
• BONINO D., CASTELLINA E., CORNO F., GALE A., GARBO A., PURDY K., SHI
F. (in press). A blueprint for integrated eye-controlled environments. UNIVER-
SAL ACCESS IN THE INFORMATION SOCIETY, ISSN: 1615-5289
• BONINO D., CASTELLINA E., CORNO F. (2008). The DOG Gateway: Enabling
Ontology-based Intelligent Domotic Environments. IEEE TRANSACTIONS ON
CONSUMER ELECTRONICS, vol. 54/4, ISSN: 0098-3063, doi: 10.1109/TCE.2008.4711217
• BONINO D., CASTELLINA E., CORNO F. (2008). DOG: an Ontology-Powered
OSGi Domotic Gateway. In: 20th IEEE Int'l Conference on Tools with Artificial
Intelligence, Dayton, Ohio, USA, November 3-5, IEEE
• CALVO A., CHIÒ A., CASTELLINA E., CORNO F., FARINETTI L., GHIGLIONE P.,
PASIAN V., VIGNOLA A. (2008). Eye Tracking Impact on Quality-of-Life of ALS
Patients. In: Lecture Notes in Computer Science 5105, K. Miesenberger et al. (eds.),
Linz, Austria, 9/7/2008 - 11/7/2008, p. 70-77
• CASTELLINA E., CORNO F. (2008). Multimodal Gaze Interaction in 3D Virtual
Environments. In: COGAIN 2008: Communication, Environment and Mobility Con-
trol by Gaze, Prague, September 2-3, p. 34-38, ISBN/ISSN: 978-80-01-04151-2
• CASTELLINA E., CORNO F., PELLEGRINO P. (2008). Integrated Speech and
Gaze Control for Realistic Desktop Environments. In: ETRA '08 Symposium on Eye
Tracking Research & Applications, Savannah, GA, USA, March 26-28, 2008, ACM
Press
• BONINO D., CASTELLINA E., CORNO F. (2008). Uniform Access to Domotic En-
vironments through Semantics. In: SWAP 2008 - Fifth Workshop on Semantic Web
Applications and Perspectives, Rome, 16-17 December 2008
• CASTELLINA E., CORNO F. (2007). Accessible Web Surfing through Gaze Inter-
action. In: Gaze-based Creativity, Interacting with Games and On-line Communi-
ties, Leicester, UK, 3-4 September 2007, p. 74-77
• BONINO D., CASTELLINA E., CORNO F., GARBO A. (2006). Control Applica-
tion for Smart House through Gaze Interaction. In: Proceedings of COGAIN 2006:
Gazing into the Future, Torino, 03/09/2006, p. 32-35