POLITECNICO DI TORINO

Doctorate School
Doctorate in Information and System Engineering

Doctoral Thesis

Assistive Technology and Applications
for the independent living
of severe motor disabled users

Emiliano CASTELLINA

Tutor: prof. Fulvio Corno

April 2009
Contents
1 Introduction
  1.1 Motivation
  1.2 Contribution
  1.3 Structure of the Thesis
2 Eye tracking
  2.1 Historical Overview
  2.2 State-of-the-art
  2.3 Experimentation
    2.3.1 Methodology
    2.3.2 Experimental settings
    2.3.3 Case studies
    2.3.4 Quantitative Results
  2.4 User comments
  2.5 Overall results
3 Software Applications
  3.1 Web Browsing
    3.1.1 Accessible Surfing Extension (ASE)
    3.1.2 Preliminary Tests
  3.2 Multimodal interaction
    3.2.1 State of the art
    3.2.2 Speech Recognition Background
    3.2.3 Proposed Solution
    3.2.4 System Overview
    3.2.5 System architecture
    3.2.6 Case Study
    3.2.7 Name Ambiguity
    3.2.8 Role Ambiguity
  3.3 Gaze interaction in 3D environments
    3.3.1 Control and Interaction Techniques
    3.3.2 Experimentation
4 Domotics
  4.1 Domotics Background
  4.2 General architecture
  4.3 Intelligent Domotic Environments
  4.4 OSGi framework
  4.5 DOG Architecture
    4.5.1 Ring 0
    4.5.2 Ring 1
    4.5.3 Ring 2
    4.5.4 Ring 3
  4.6 Ontology-based interoperation in DOG
    4.6.1 Start-up
    4.6.2 Runtime command validation
    4.6.3 Inter-network scenarios
  4.7 Case study
    4.7.1 Dynamic Startup Configuration
    4.7.2 Complex Command Execution
    4.7.3 Adding a new device
    4.7.4 Comparison of DOG to related works
    4.7.5 Mediated Interaction
    4.7.6 Direct interaction
  4.8 Guidelines
5 Conclusions
A Abbreviations
B Convention on the Rights of Persons with Disabilities
  B.1 Guiding Principles of the Convention
C Publications
List of Tables

2.1 Eye tracking techniques comparison
3.1 Object classification
3.2 Control and Interaction Techniques
3.3 Game 1 Test: Precision and Time
3.4 Games 2 and 3: Precision and Time
4.1 Requirements for Home Gateways in IDEs
4.2 Interfaces defined by the DOG library bundle
4.3 Requirements satisfied by related works, in comparison with DOG
List of Figures

1.1 Research Topics
2.1 Eye tracking systems classification
2.2 Infrared light reflections
2.3 Quality of Life (McGill Scale)
2.4 Depression (ZDS) and self-estimated burden (SPBS)
2.5 SWLS (satisfaction with life scale)
2.6 ALS Centre questionnaire
3.1 MyTobii Web Browser
3.2 Accessible surfing on Cogain.org
3.3 ASE architecture
3.4 The general architecture of the proposed system
3.5 Scenario
3.6 System Overview
3.7 Name Ambiguity: Unique and ambiguous (Indiscriminable and Discriminable) objects vs. GW size
3.8 Role Ambiguity: Unique and ambiguous (Indiscriminable and Discriminable) objects vs. GW size
3.9 Screenshots of the games
4.1 The general architecture of the proposed domotic system
4.2 DOG architecture
4.3 An overview of the DogOnt ontology
4.4 SPARQL queries for retrieving all BTicino OpenWebNet (a) and all KNX (b) in DogOnt
4.5 The SPARQL query needed to retrieve the commands that can be issued to a specific device, e.g. a DimmerLamp
4.6 The switch-all-lights-off rule, in Turtle notation
4.7 The demonstration cases used to perform functional tests on DOG
4.8 Complex Command Execution
4.9 Adding a new device
4.10 The control application with a quite accurate tracker
4.11 The control application with a low-cost visible light tracker
4.12 ASL 501 headband attaching the two optics system
4.13 The ART system control interface
4.14 Typical stages of the ART system (a. Stability of eye gaze captured; b. Gaze on object detected; c. Control initiated)
4.15 ART system flow chart
Chapter 1
Introduction
"Disability is an evolving concept, and ... disability results from the interaction between persons with impairments and attitudinal and environmental barriers that hinders full and effective participation in society on an equal basis with others."

Preamble of the Convention on the Rights of Persons with Disabilities [1].
1.1 Motivation

An estimated 10% of the world's population, approximately 650 million people, experience some form of disability. The number of people with disabilities increases depending on factors such as population growth, aging, and medical advances that preserve and prolong life. These factors are creating considerable demands for health and rehabilitation services. Furthermore, the lives of people with disabilities are made more difficult by the way society interprets and reacts to disability, which requires environmental and attitudinal changes. The UN Convention on the Rights of Persons with Disabilities (further details are provided in Appendix B) emphasizes the importance of mainstreaming disability issues for sustainable development. Attention to health and its social determinants is essential to promote and protect people with disabilities and for greater fulfillment of human rights. There is also a need for strong, evidence-based information: despite the magnitude of the issue, awareness of and scientific information on disability issues are lacking. There is no agreement on definitions and little internationally comparable information on the incidence, distribution and trends of disability or impairments. Despite the significant changes achieved over the past two decades in the field of disability and rehabilitation, there is no comprehensive evidence base.
Disability in Italy
Disabled people in Italy count about 3 million and 600 thousand individuals. Disability
has a huge impact on the quality of life, limiting people independence. Is is predomi-
nantly spread among elderly, since with aging the persons are afflicted by some chronic
invalidating diseases at the same time.
Among disabled population the ratio of people affected by one (59,4%) or more
(62,2%) severe chronic deseases is noticeably higher than the ratio of non-disabled per-
sons (respectevely 11,6% and 12,3%).
In Italy, 2,1% of the population, i.e. one million and 130 thousands persons, are
confined at home. Confinement means that a person is bedridden, or obliged to stay
home because of physical or psychological impediments. Among the elderlies the ratio
of confinded persons is 8,7%: the women present a ratio (10,9%) double than men (5,6%).
More than 500 thousand persons (1,1% of the population) haveproblems involving the
communication sphere, such as partial or total blindness, deafness or mutism. The greater
part of disabled persons, 1 million and 374 thousands (52,7%), is afflicted by more than
one disability.
Typically the family is the caregiver of the person with disability and it represents an
essential resource for facing the limitation and troubles caused by disability. The ratio of
Italian families with a least one disabled person is 10,3%. The greater part of families
(58,3 %) include a non-disabled person that can take care of the relatives with disabilities.
Almost the 80% of families with disabled persons at home is not assisted by public health
2
1.1 – Motivation
services.
This lack of assistance is not even supported by private domicile services: as a matter
of fact the 70% of the families do not make use of neither public or private assistance.
Severe motor disability

Some chronic degenerative diseases lead to communication problems and individual confinement. The progressive and total loss of motor functions induces in these patients anarthria, i.e., the inability to use normal Augmentative and Alternative Communication (AAC) systems, and also the impossibility of interacting with their surrounding environment. These are in particular the subjects with Amyotrophic Lateral Sclerosis (ALS), who progressively lose all muscular function (speech, use of upper and lower limbs, respiration) due to a progressive degeneration of spinal and cortical motor neurons, the neurons that control the movement of spinal muscles. Another pathology that can cause similar damage, although over longer times, is multiple sclerosis, which affects all nervous functional systems, with a severe impairment of speech due to dysarthria, and the loss of functional use of the limbs due to ataxia and spasticity. In this context, the loss of speech capacity and of the skills necessary for augmentative/alternative communication strategies is a cause of total isolation of patients. Subjects affected by neurodegenerative disorders in an advanced phase have several symptoms and problems spanning a long period of time, with a very negative impact on their quality of life. Furthermore, many other diseases or traumatic injuries can limit people's independence and communication capabilities. Many research efforts are analyzing these issues, trying to identify the various aspects involved in the care of these patients.

Palliative approaches have a lot to offer to patients affected by neurodegenerative disorders, since, in general, the older population has chronic non-oncologic disorders. Projects improving (or enabling) effective communication in persons with neurological disorders are included in the field of palliative care.

In patients affected by ALS and MS, even if other communication forms are damaged or lost, the ability to control eye movements is typically maintained. Since the beginning of the twentieth century, researchers have investigated techniques for studying the movements and the physiology of the human eyes. Many of these techniques involve recording and tracking eye movements, and estimating the zone gazed at by the user. Nowadays such techniques are more and more advanced and are mainly used in the military, advertising and medical fields (psychological analysis and assistive technologies). Devices that use gaze tracking techniques are called eye trackers; they are able to determine the direction of the user's gaze and to use it as an input channel. Communication instruments based on eye tracking are for some patients a possible solution, both to have useful communication with family members or at longer distance, using the Internet, and for the possibility of controlling the home environment through a computer driven by their eye movements.
1.2 Contribution

The main objectives of this doctoral thesis are to study, design and experiment with new solutions, based on eye tracking tools and environmental control applications, for improving the quality of life of patients with severe motor disabilities. These objectives involve four different research topics (Figure 1.1):

• Computer Vision: the basis of most eye tracking technologies.

• Human Computer Interaction: research, development and testing of assistive software.

• Ubiquitous Computing: research, design and development of integrated domotic solutions.

• Ambient Intelligence: improving domotic environments with intelligent behaviors.

Figure 1.1. Research Topics

This thesis tackles, even if in a partial and limited way, the communication and independence needs of severely motor disabled persons. With such a purpose, a great part of the research has been done in close collaboration with the Neuroscience Department of the S. Giovanni Battista Hospital of Turin. This collaboration has been part of the wider context of COGAIN, a network of excellence on COmmunication by GAze INteraction. Members of the network are researchers, eye tracker developers and people who work directly with users with disabilities, in user centers and hospitals. The COGAIN consortium includes widely recognized experts, from research groups and companies, working on advancements of this cutting-edge technology. More than 100 researchers from more than 10 countries are involved in the activities of the COGAIN Network, and its impact is growing. The COGAIN consortium is supported by two advisory bodies, a Board of User Communities (BUC) and a Board of Industrial Advisors (BIA), in order to ensure the best outcome and take-up of the results. COGAIN was launched with EU funding in 2004, and its goal is to become self-supporting by the end of 2009, when EU funding will cease.

Although many very accurate commercial eye trackers are available on the market, they are not widespread: less than 10% of the persons who could benefit from eye tracking technologies are actually using them. There are three main factors that reduce the diffusion of eye trackers:

• they are too expensive (from 5000 to more than 20000 euros);

• they have few advanced and effective applications;

• they are not known by most caregivers (this was true especially in 2006, at the starting phase of this research).
The initial phase of the thesis regarded the analysis of state-of-the-art eye tracking algorithms, looking for techniques suitable for a low-cost eye tracking solution. After that, a benchmark platform for eye tracking algorithms has been designed and implemented. This platform made it possible to obtain objective and repeatable measurements of the accuracy and performance of the investigated methods.

In the context of the COGAIN project and of the collaboration with the ALS center of Turin (Neuroscience Department), an experiment has been performed on the impact of eye tracking technology on the quality of life of ALS patients. Furthermore, the research pursued the design and development of assistive software for aided Internet navigation and multimodal control of the operating system. In the field of gaze technologies not only related to disabled persons, the research investigated new interaction patterns, based on eye tracking, for the navigation of 3D virtual environments.

Finally, the last part of the PhD thesis concentrated on the fulfillment of the independence needs of severely motor disabled users, i.e., it focused on the domotic field. The investigation of domotics involved both low-level issues, i.e., the design of an intelligent residential gateway for interoperation among different domotic technologies, and the study of novel interaction patterns for controlling the domotic environment through gaze.
1.3 Structure of the Thesis

The remainder of this thesis is organized as follows:

Chapter 2 describes the history and state of the art of eye tracking devices. Furthermore, it reports the methodology and results of an experiment involving ALS patients.

Chapter 3 delineates the architecture, design and development of three assistive software applications based on gaze and multimodal interaction patterns.

Chapter 4 gives a description of the domotic field for assisted living. It investigates the design of the Domotic OSGi Gateway, which allows transforming a simple domotic house into an Intelligent Domotic Environment. It also explains two different paradigms for controlling the domotic environment through gaze interaction.

Chapter 5 eventually concludes the thesis and provides an overview of possible future work.
Chapter 2
Eye tracking
This chapter presents an overview of eye tracking techniques, from their dawn to possible future developments. In addition to an in-depth study of the specific literature, it reports an experiment on the impact of eye tracking technologies on the quality of life of ALS patients.
2.1 Historical Overview

Eye tracking research started at least 100 years before the spread of personal computers. The first research on eye tracking and the analysis of ocular movement dates back to 1878 and Émile Javal [2]. The tracking methods described in this work were invasive, requiring direct contact with the cornea.

The first non-invasive eye tracking technique was developed by Dodge and Cline in 1901 [3]. This technique allowed tracking the horizontal position of the eye on a photographic film by analyzing the light reflected by the cornea, though the patient was required to be completely immobilized. The use of cinematographic techniques, which allowed recording the ocular movement over a time interval, dates back to the same period [4]. This technique analyzes the reflection produced by the incident light on a white spot inserted into the eye. These and other research efforts concerning the study of ocular movement made further headway during the first half of the 20th century, as techniques were combined in different ways.
In the 1930s, Miles Tinker and Paterson began to apply photographic techniques to study eye movements in reading [5]. They varied typeface, print size, page layout, etc., and studied the resulting effects on reading speed and patterns of eye movements. In 1947, Paul Fitts and his colleagues [6] began using motion picture cameras to study the movements of pilots' eyes as they used cockpit controls and instruments to land an airplane. The Fitts et al. study represents the earliest application of eye tracking to what is now known as usability engineering: the systematic study of users interacting with products to improve product design. Around that time, Hartridge and Thompson [7] invented the first head-mounted eye tracker. Crude by current standards, this innovation served as a start to freeing eye tracking study participants from tight constraints on head movement. In the 1960s, Shackel [8] and Mackworth and Thomas [9] advanced the concept of head-mounted eye tracking systems, making them somewhat less obtrusive and further reducing restrictions on participant head movement. In another significant advance relevant to the application of eye tracking to human-computer interaction, Mackworth devised a system to record eye movements superimposed on the changing visual scene viewed by the participant. Eye movement research and eye tracking flourished in the 1970s, with great advances in both eye tracking technology and psychological theory to link eye tracking data to cognitive processes [10, 11, 12].

Much of the relevant work in the 1970s focused on technical improvements to increase accuracy and precision and to reduce the impact of the trackers on those whose eyes were tracked. The discovery that multiple reflections from the eye could be used to dissociate eye rotations from head movement [13] increased tracking precision and also prepared the ground for developments resulting in greater freedom of participant movement. Using this discovery, two joint military/industry teams (U.S. Air Force / Honeywell Corporation and U.S. Army / EG&G Corporation) each developed a remote eye tracking system that dramatically reduced tracker obtrusiveness and its constraints on the participant [14, 15]. These joint military/industry development teams and others made even more important contributions with the automation of eye tracking data analysis. The advent of the minicomputer in that general timeframe provided the necessary resources for high-speed data processing. This innovation was an essential precursor to the use of eye tracking data in real time as a means of human-computer interaction [16]. Nearly all eye tracking work prior to this used the data only retrospectively, rather than in real time (in early work, analysis could only proceed after film was developed). The technological advances in eye tracking during the 1960s and 70s are still reflected in most commercially available eye tracking systems today [17].
Psychologists who studied eye movements and fixations prior to the 1970s generally attempted to avoid cognitive factors such as learning, memory, workload, and deployment of attention. Instead, their focus was on relationships between eye movements and simple visual stimulus properties such as target movement, contrast, and location. Their solution to the problem of higher-level cognitive factors had been to ignore, minimize or postpone their consideration in an attempt to develop models of the supposedly simpler lower-level processes, namely, sensorimotor relationships and their underlying physiology [18]. But this attitude began to change gradually in the 1970s. While engineers improved eye tracking technology, psychologists began to study the relationships between fixations and cognitive activity. This work resulted in some rudimentary theoretical models for relating fixations to specific cognitive processes. Of course, scientific, educational, and engineering laboratories provided the only home for computers during most of this period, so eye tracking was not yet applied to the study of human-computer interaction at this point. Teletypes for command-line entry, punched paper cards and tapes, and printed lines of alphanumeric output served as the primary form of human-computer interaction.

As Senders [19] pointed out, the use of eye tracking has persistently come back to solve new problems in each decade since the 1950s. Senders likens eye tracking to a Phoenix rising from the ashes again and again, with each new generation of engineers designing new eye tracking systems and each new generation of cognitive psychologists tackling new problems. The 1980s were no exception. As personal computers proliferated, researchers began to investigate how the field of eye tracking could be applied to issues of human-computer interaction. The technology seemed particularly handy for answering questions about how users search for commands in computer menus [20, 21, 22]. The 1980s also ushered in the start of eye tracking in real time as a means of human-computer interaction. Early work in this area initially focused primarily on disabled users [23, 24, 25]. In addition, work in flight simulators attempted to simulate a large, ultra-high resolution display by providing high resolution wherever the observer was fixating and lower resolution in the periphery (Tong, 1984). The combination of real-time eye movement data with other, more conventional modes of user-computer communication was also pioneered during the 1980s [26, 27, 25, 28, 29].

In more recent times, eye tracking in human-computer interaction has shown huge growth, both as a means of studying the usability of computer interfaces and as a means of interacting with the computer.
2.2 State-of-the-art

Recent eye tracking systems can be classified according to three main aspects:

• the adopted User Interface. Eye trackers can be:

– intrusive, i.e., in direct contact with the users' eyes (e.g., contact lenses, electrodes);

– remote, e.g., a system including a personal computer with one or more video cameras;

– wearable, e.g., small video cameras mounted on a helmet or glasses.

Remote eye trackers are the most widespread systems. Wearable eye trackers have recently been gaining some popularity, in contrast to intrusive methods, typically used in the medical field, which are continuously losing favor.

• the Applications that use them as input devices. Eye tracker systems have a great variety of application fields. The first research efforts involving eye trackers were mostly related to the medical and cognitive fields, as they concerned the study of human vision and eye physiology. Recent studies, instead, are rather related to Assistive Technologies and, in general, to Human Computer Interaction, while current eye tracker killer applications seem to be mainly focused on advertising and marketing, aiming at analyzing customers' areas of interest inside either commercial videos or advertising posters.

• the adopted technology. Eye trackers can be based on at least six different gaze estimation algorithms, which are detailed in the following sections.

The aforementioned classification of eye tracking systems is summarized in Figure 2.1.

Figure 2.1. Eye tracking systems classification

The main eye tracking methods found in the literature are six:

• image analysis

– cornea and pupil reflection;

– pupil tracking;

– tracking of the corneal reflection through the dual Purkinje image;

– limbus tracking: tracing of the limbus, the border between iris and sclera;

• physical properties analysis

– electro-oculography: measurement of the eye's electrical potential;

– scleral coil: measurement of the magnetic field produced by the movement of a coil inserted into a user's eye.
Image Analysis

Most computer vision techniques for gaze tracking are based on finding and analyzing the reflections produced by infrared light incident on various parts of the eye. Figure 2.2 shows the reflections on the eye produced by an infrared light source.

Figure 2.2. Infrared light reflections

Corneal and pupil reflection tracking [30, 31, 32] allows determining the gaze direction by comparing the infrared corneal reflection with the pupil position. The corneal reflection tends to stay fixed during pupil movements and is used as a spatial reference both for tracking pupil movements and for compensating head movements. An eye tracking system using this technique typically consists of a single device composed of one or more infrared light emitters and an infrared-sensitive video camera. Some examples of eye trackers using this technique are described in [33, 34, 35].

The pupil tracking method [36], conversely, uses just the position and the shape of the pupil to infer the gaze direction. This technique is sensitive to head movements and is less accurate in estimating the observed point. To compensate for head movements, systems adopting this technique use cameras mounted on glasses worn by the user. Most of the eye tracking systems that use these techniques are composed of one or more infrared light sources to illuminate the eye(s), and one or more infrared cameras to capture video of the eye(s).
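To make the pupil-corneal-reflection idea concrete, the difference vector between the pupil center and the corneal glint is commonly mapped to screen coordinates through a regression fitted on a few calibration fixations. The following is a generic sketch, not the specific method of any tracker cited above; the function names and the second-order polynomial model are illustrative assumptions.

```python
import numpy as np

def fit_gaze_mapping(pupil_minus_glint, screen_points):
    """Fit a second-order polynomial mapping from pupil-glint difference
    vectors (dx, dy), collected while the user fixates known calibration
    targets, to screen coordinates (sx, sy)."""
    d = np.asarray(pupil_minus_glint, dtype=float)
    dx, dy = d[:, 0], d[:, 1]
    # Design matrix with terms: 1, dx, dy, dx*dy, dx^2, dy^2
    A = np.column_stack([np.ones_like(dx), dx, dy, dx * dy, dx**2, dy**2])
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(screen_points, dtype=float),
                                 rcond=None)
    return coeffs  # shape (6, 2): one column per screen axis

def estimate_gaze(coeffs, pupil, glint):
    """Estimated on-screen gaze point for one pupil/glint measurement."""
    dx, dy = np.subtract(pupil, glint)
    features = np.array([1.0, dx, dy, dx * dy, dx**2, dy**2])
    return features @ coeffs  # (sx, sy)
```

With nine calibration points the six-coefficient model is over-determined, so the least-squares fit can absorb mild optical distortions; higher-order terms or separate per-eye models are common refinements.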
Physical properties analysis

Electro-oculography [30, 31, 32, 37] is based on measuring the electrical potential difference between the cornea and the retina. Typically, pairs of electrodes are placed around the eye. When the eye moves, a potential difference occurs between the electrodes and, considering that the resting potential is constant, the recorded potential measures the eye position.

Scleral coil techniques use a contact lens with a coil to track the eye movements. The coil induces a magnetic field variation while the eye is moving. Scleral coil eye trackers are very intrusive because, typically, the contact lenses are connected to wires. Only recently has a wireless scleral coil eye tracker been developed [38].

Eye trackers that adopt physical properties analysis are usually composed of a DSP, necessary for data processing, connected to an output channel which provides the captured samples.
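Because the EOG potential is roughly proportional to the horizontal gaze angle over a moderate range, position estimation can be reduced to a linear calibration: record the voltage at two known fixation angles, then invert the line. The sketch below illustrates this; the function names and the ±15° calibration values are assumptions for illustration, not figures from the text.

```python
def eog_calibrate(v1_uV, angle1_deg, v2_uV, angle2_deg):
    """Fit a linear voltage-to-angle model from two calibration fixations
    at known gaze angles (assumes the EOG response is linear in between)."""
    gain = (v2_uV - v1_uV) / (angle2_deg - angle1_deg)  # microvolts per degree
    offset = v1_uV - gain * angle1_deg                  # voltage at 0 degrees
    return gain, offset

def eog_angle(v_uV, gain, offset):
    """Estimated horizontal gaze angle for a measured EOG voltage."""
    return (v_uV - offset) / gain

# Calibrate on fixations at -15 and +15 degrees, then decode a sample.
gain, offset = eog_calibrate(-300.0, -15.0, 300.0, 15.0)
angle = eog_angle(100.0, gain, offset)  # → 5.0 degrees
```

In practice the resting potential drifts with electrode contact and lighting, so real systems re-calibrate the offset periodically rather than relying on a single fit.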
Comparing Eye Tracking Techniques

Table 2.1 shows a comparison among the gaze tracking techniques previously discussed. It reports, for each eye tracking technology, the type of devices that can adopt it, the accuracy, and the sample frequency. The accuracy is evaluated as the minimum visual angle, measured in degrees, discriminable by the technique. A visual angle of 0.5° allows estimating approximately a 20x20 pixel gazed area at a distance of 70 cm.

Technology                     Device Type         Accuracy     Sample Frequency
Pupil and Corneal reflection   desktop, wearable   < 0.1°       50 - 100 Hz
Electro-potential              intrusive           0.1 - 0.5°   > 100 Hz
Pupil Tracking                 desktop, wearable   0.1 - 0.5°   50 - 100 Hz
Scleral Coil                   intrusive           < 0.1°       > 100 Hz
Dual Purkinje Image            desktop, wearable   0.1 - 0.5°   > 100 Hz
Limbus                         desktop, wearable   > 0.5°       > 100 Hz

Table 2.1. Eye tracking techniques comparison
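The 0.5° figure quoted above can be sanity-checked with basic trigonometry: the side of the gazed area is 2·d·tan(θ/2), converted to pixels by the pixel pitch. The 0.3 mm pitch below is an assumed typical value for displays of that period, not a figure from the text.

```python
import math

def gaze_area_pixels(angle_deg, distance_cm, pixel_pitch_mm=0.3):
    """Side length, in pixels, of the screen area subtended by a visual
    angle `angle_deg` at viewing distance `distance_cm`."""
    size_mm = 2 * distance_cm * 10 * math.tan(math.radians(angle_deg) / 2)
    return size_mm / pixel_pitch_mm

side = gaze_area_pixels(0.5, 70)  # ≈ 20 pixels, matching the text
```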
2.3 Experimentation
The potential of eye tracking in ALS is extremely high, sincethese patients retain their
full cognitive capabilities, and while paralysis progresses, in most cases eye movements
are still controllable.
There are few research results on using eye trackers with ALS or MS patients. An earlier study [39] identified some fundamental requirements for Augmentative and Alternative Communication (AAC) systems in these patients: communicating instructions, achieving the satisfaction of their needs, clarifying their needs, having an "affective" communication and transferring information. The results of that research, even if they offer significant information, are limited by the lack of direct involvement of patients.
A deeper knowledge of real patient needs, and of those of their caregivers, is therefore necessary to define and evaluate effective tools for AAC through eye tracking devices. This chapter reports the trials performed, over a span of two years, on a significant fraction of Italian ALS patients. The trials were conducted in collaboration between Politecnico di Torino, the hospital San Giovanni Battista of Torino, and the University of Torino (Dept. of Neuroscience).
The main aim of the experimentation is to evaluate if and when eye tracking technologies have a positive impact on patients' lives.
2.3.1 Methodology
The research is based on the following main principles:
• Adoption of Quality of Life (QoL) assessment scales
• Experimentation with off-the-shelf devices
• Involving a large user base.
A multi-disciplinary team, composed of neurologists, psychologists, speech therapists and computer science engineers, led the experimentation.
The neurologists select the patients according to the following recruitment criteria:
• Ethical: patients who are able to understand the aim of the study and to give an informed consent.
• Motivational: patients who are unable to speak intelligibly and have various degrees of hand function impairment.
• Efficacy: patients who have a basic to good level of computer literacy.
During the trial, each patient uses an eye tracking system for a week in their own domestic environment.
The research team schedules two visits and one telephone contact for each patient during the eye tracker lending period. The speech therapists train patients and their caregivers to calibrate and use the eye tracking system. The training also includes a brief course on using applications for writing, communication and Internet browsing in eye tracking mode. Other applications are installed according to users' needs and interests.
The psychologists fill in the patients' assessment questionnaires just before the training phase. The questionnaires measure the quality of life, the satisfaction with life, the depression level, and the perception of being a burden.
The following internationally recognized quantitative scales have been adopted:
• McGill scale (MGS). This scale, developed at McGill University [40, 41], analyzes five factors: physical comfort, physical symptoms, psychological symptoms, existential comfort and support.
• Satisfaction With Life Scale (SWLS) [42, 43], which evaluates the satisfaction with life.
• Zung scale: a self-rating depression scale [44]; it is fast, simple and gives quantitative results.
• Self-Perceived Burden Scale (SPBS) [45, 46]: this questionnaire consists of 25 statements about feelings the patients may or may not have about their relationships with caregivers.
The same questionnaires are proposed again at the end of the evaluation period with
the purpose of verifying the impact of the eye tracker usage on the measured parameters.
A further questionnaire, developed by the ALS center, is additionally proposed at the end
of the lending period. The ALS questionnaire focuses on qualitative aspects and feelings,
and analyzes the time spent with the system, the training process, subjective satisfaction,
and influence on life quality.
2.3.2 Experimental settings
The eye tracker used in the experimentation was the Eye Response Technologies' ERICA Standard System, equipped with assistive and communication software such as the ERICA keyboard, mouse emulators, and Sensory Software's The Grid. Standard Windows and Internet applications were also used in the tests.
The experimentation involved 16 patients (12 men, 4 women) from April 2006 to August 2007. The patients' average age was 45 years (min 32, max 78). The patients were in the advanced phase of the disease; in detail, 7 of them were tracheotomized, 8 had a percutaneous endoscopic gastrostomy (PEG) tube, 6 patients were anarthric and 7 had severe dysarthria.
2.3.3 Case studies
Three particular case studies are hereafter reported to give a qualitative outlook of the impact of eye tracking technology on ALS patients. Permission to publish this information, in a partially anonymized form, has been obtained.
Marco – Marco is 47 and lives in his house with his family. Before the disease he was a traveling salesman, frequently traveling around the country. At the time of the experimentation he was using a communication system (virtual keyboard) with a computer and a foot switch (in scanning mode). When he tried the eye control system he was very excited; he used a screen keyboard for communication and for sending emails quickly and easily. More recently, he started having many problems with his current system because he has less and less movement in his feet. He really wants an eye control system, but the Piemonte Regional Government denied him a grant. He later succeeded, thanks to the help of the Italian ALS Association, in raising funds to buy an eye tracker, and is currently collaborating with the device manufacturers. He is also in the process of writing a book, in collaboration with other ALS patients who use eye trackers.
Paolo – Paolo is 52 and lives at home with his wife. Before the illness, he was a web designer, and he still is. During the experimentation he was using two mouse devices, one for moving the cursor and the other for clicking. He needs the eye tracker only for his work, because he still successfully uses labial movements for communication with his family. He uses many programs for his work, and tried them all on the Erica system. The results were positive and he wants to buy the software and camera as an add-on to his computer. In the past he tried other eye tracking systems but he didn't like them because "they didn't work well with web design programs." He recently lost the ability to use mice as input devices, and he is waiting for the national health system to fund his purchase of an eye tracker.
Domenico – Domenico is a young man who lives at home with his wife. He was eager to try the eye control system to be able to speak, for the first time, with his 2-year-old nephew, and also to be able to express his feelings with bad words! When he tried the eye tracker, he could finally speak with his nephew, who could hear his "voice" for the first time.
2.3.4 Quantitative Results
During the initial and final meetings of each trial, the responses of the patients to the various questionnaires were recorded, and SPSS 12.0 was used to analyze them statistically. The test results showed a clear improvement in the perceived quality of life, on both the MGS and SWLS scales.
A particularly noticeable improvement was shown in the patients' perception of their condition overall, including their psychological well-being and physical symptoms, although the amount of support required by each patient and their perceived depression did not show a significant change (at the 0.05 significance level). However, it must be remembered that these results were achieved over a relatively short trial period of seven days.
In more detail, Figures 2.3, 2.4 and 2.5 report the main results on the four main scales. The McGill scale (Fig. 2.3) measures a slight, but generalized, improvement in all aspects of the quality of life that may be attributed to the eye control equipment. On the other hand, Fig. 2.4 shows that there were no significant modifications in depression and burden scores, while we could measure an improvement on the satisfaction with life scale (Fig. 2.5).
Specific evaluations of the eye control device are analyzed using the ALS Center questionnaire. In particular, we may notice that the vast majority of users are quite satisfied with eye control devices (Fig. 2.6(d)); they use them quite often (Fig. 2.6(a)), and find them easy to use (Fig. 2.6(b)) and to learn (Fig. 2.6(c)).
Figure 2.3. Quality of Life (McGill Scale)
Figure 2.4. Depression (ZDS) and self-estimated burden (SPBS)
2.4 User comments
Users agree that the system is efficient and effective, and allows more complex communication beyond the primary needs. In fact, the majority of patients used the system every day with a high level of satisfaction. They felt that eye control was comfortable and flexible, and required relatively little effort. A great perceived advantage is that, after calibration, the user is independent in using applications (compared with the Plexiglas tables commonly used for eye-contact dialogs, which rely on a communication partner). For
Figure 2.5. SWLS (satisfaction with life scale)
(a) Time of use (b) Ease of use
(c) Learning rate (d) Satisfaction
Figure 2.6. ALS Centre questionnaire
typing applications, users appreciated the prediction dictionary and the voice synthesis features. On the other hand, some patients expressed negative comments, mainly due to loss of motivation after initial technical problems, or to difficulties with calibration or the need to repeat the calibration procedure too often. Some patients
using multifocal lenses could not calibrate the system, but this was solved by changing their glasses. For users less expert with computers, learning to use the screen keyboard was somewhat difficult. Finally, some patients with residual mobility in some parts of their body had difficulties keeping their head perfectly still.
2.5 Overall results
All patients showed a strong interest in eye tracking systems, and most of them had already looked for information about this technology. The Erica system has generally been well accepted and considered easy enough to be used by ALS patients with severe disabilities. The patients in worse clinical conditions showed better acceptance.
Eye tracking benefits are lower for patients with residual arm mobility, while tracheotomized patients had stronger motivation, probably for two main reasons: anarthria represents the first motivation for communicating, and tracheotomized patients have better ventilation, and therefore brain oxygenation, than patients with dyspnea. The patients who tried the eye tracker system perceived an improvement of QoL because they were able to communicate independently and the communication was easier, faster and less laborious.
Chapter 3
Software Applications
A major shortcoming of current commercial eye trackers is the scarce amount of available software. Each vendor provides a proprietary software package that includes the basic software to communicate and, in some cases, to surf the Internet. Such software, in most cases, presents a very simplified interface suitable for non-expert users. Users familiar with computers, conversely, are limited by the poor and overly simple features of the software.
As mentioned before, one of the objectives of this thesis is the research and development of applications, based on eye tracking, that can go beyond the limits of current software and work with various eye tracker devices. This chapter describes the design, the development and the experimentation of three gaze-based applications:
• a Mozilla Firefox extension that allows aided web surfing;
• an application for the control of the operating system based on multimodal interaction between speech recognition and gaze tracking;
• a collection of three simple computer games for studying new multimodal interaction modalities involving gaze interaction in 3D environments.
3.1 Web Browsing
The Web is an increasingly important resource in many aspects of life: education, employment, government, commerce, health care, recreation, and more. However, not everyone can equally exploit the Web's potential, especially people with disabilities. For these people, Web accessibility provides valuable means for perceiving, understanding, navigating, and interacting with the Web, allowing them to actively contribute and participate in the Web. Much of the focus on Web accessibility has been on the responsibilities of Web developers. Yet, Web software also has a vital role in Web accessibility. Software needs to help developers produce and evaluate accessible Web sites, and be usable by people with disabilities [47].
In 1999 the World Wide Web Consortium (W3C) began the Web Accessibility Initiative (WAI) to improve the accessibility of the Web. The WAI has developed a number of guidelines, concerning both Web contents [48, 49] and user agents (Web browsers, media players, and assistive technologies) [50], that can help Web designers and developers make Web sites more accessible, especially from the point of view of physically disabled people. There are many applications (screen readers, voice control, etc.) that make Web browsing more accessible for blind or deaf people, but only a few applications, typically provided with commercial eye trackers, allow Web navigation for disabled people who need gaze tracking devices.
Following the WAI guidelines, a Mozilla Firefox (the most used open source Web browser) extension, named Accessible Surfing Extension (ASE), has been developed. Firefox extensions are installable enhancements to the Mozilla Foundation's projects that add features to the application or allow existing features to be modified. ASE implements a novel approach to Web site navigation also suitable for low-resolution gaze tracking devices.
3.1.1 Accessible Surfing Extension (ASE)
Many aids developed for eye-tracking-based web browsing try to cope with the basic difficulties caused by current web sites. In particular, the three main activities when browsing the web are, in decreasing order of frequency: link selection, page scrolling, and form filling. Link selection is a difficult task due to the small font sizes currently used, which require high pointing precision. In some cases link accessibility is also decreased when client-side scripting is used (e.g., in the case of pop-up menus created in Javascript, or with Flash interfaces) or time-dependent behaviors are programmed (e.g., the user has limited time to select a link before it disappears); such situations are incompatible with the current WAI guidelines for web page creation (WCAG).
Most approaches, such as those provided in the Erica System [51] and in MyTobii 2.3 [52] (Figure 3.1), tend to facilitate link selection by compensating for the limited precision that can be attained with eye tracking systems: zooming is a common feature that increases the size of links near the fixation point (either by screen magnification, or with widely spaced pop-ups) to facilitate their selection in a second fixation step.
Figure 3.1. MyTobii Web Browser
In this work a different interaction paradigm has been explored, which decouples page reading from link selection. In a first phase, when the user is on a new web page, he is mainly interested in reading it, and perhaps needs to scroll it. Only when an interesting link is identified should the user be concerned with the mechanism for activating it. If the link is large enough (e.g., a button image), usually no help is needed (and the zoom interface would only interfere with user intentions). If, on the other hand, the link is too
small, then a separate selection method is available.
At all times, the browsing window is complemented by a sidebar containing a link-selection interface that is always synchronized with the currently displayed web page. When the user wants to select a link, he may use the sidebar, which features large and easily accessible buttons.
This interaction paradigm has been developed as a Mozilla Firefox extension. This browser has been selected instead of others (Internet Explorer, Opera, Konqueror, Safari, etc.) because it is open source, cross-platform (Windows, Linux and Mac OS X), customizable and expandable, and has a simplified user interface.
The Accessible Surfing Extension (ASE) is a sidebar application inside the browser window (see Figure 3.2): whenever a new Web page is loaded, ASE analyzes its contents, modifies the page layout and refreshes the graphical user interface.
Figure 3.2. Accessible surfing on Cogain.org
According to user preferences and skills, ASE allows users to navigate Web pages in two modalities:
• Numeric mode: each link in the Web page is tagged with a consecutive
small integer, shown beside the link text or image. Such links are always visible, non-intrusive, and usually don't disrupt the page layout. The integer numbers are used to uniquely identify the link the user is interested in. At this point, the user turns his attention to the sidebar, where ASE displays a numeric on-screen keyboard and a selection confirmation button. Users select the link by dialing its number with the on-screen buttons, and then confirming with the selection button. Feedback is continuous: the selection button reports the text of the currently dialed link number, and that link is also highlighted in the web page.
• Browsing mode: a different, simpler modality can be activated for users less familiar with web browsing. In this case, page scrolling and link selection are blended in the ASE interface through a "Next" button, which selects the next group of 5 links in the web page, highlights them in the web page, and updates 5 big buttons to select them, while simultaneously scrolling down the page to the region containing them. Thus, the main focus of the user is now on the ASE, to control web page scrolling. When the user finds an interesting link, chances are that it is already present in one of the 5 buttons and can be directly selected with one fixation.
The ASE also allows page zooming, and the number and sizes of the buttons can be customized to adapt to the precision of the specific eye tracking system.
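The Browsing-mode grouping described above can be sketched as follows (a simplified, hypothetical helper in Python, not the extension's actual code):

```python
def link_groups(links, group_size=5):
    """Browsing mode: partition the page's links into successive groups;
    each press of the "Next" button advances to the following group."""
    return [links[i:i + group_size] for i in range(0, len(links), group_size)]

links = [f"link{n}" for n in range(12)]   # a page with 12 links
groups = link_groups(links)
# groups[0] holds the first 5 links, groups[2] the last 2
```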
ASE architecture
ASE is composed of five modules (see Figure 3.3):
Web Page Parser: when a new Web page is loaded, this module receives a page change event, captures the Web links through the DOM (Document Object Model) interface and stores them into the Web Links Database. The DOM is a platform- and language-independent standard object model for representing and
Figure 3.3. ASE architecture
interacting with HTML or XML. The Web Page Parser sends an update message to the GUI Generator and to the Web Page Tagger when it has finished parsing the Web page.
GUI Generator: this module retrieves Web links from the database, then prepares and displays the graphical user interface according to the selected navigation modality.
Web Page Tagger: it tags each Web page link, retrieved from the database, with progressive numbers and then sends a re-render page message to the Web browser.
ASE GUI: users can interact with the browser through this XUL1 graphical interface. When a user presses a button, a command message is sent to the Command Parser.
Command Parser: this module translates ASE user commands (i.e. link selection, zoom in, etc.) into Mozilla Firefox action (page change) commands.
1 XUL, the XML User Interface Language, is an XML user interface markup language developed by the Mozilla project for use in its cross-platform applications.
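As an illustration of the parser/tagger pipeline, link extraction and numbering could look like the following (a simplified Python sketch, not the actual XUL/JavaScript implementation):

```python
from html.parser import HTMLParser

class WebPageParser(HTMLParser):
    """Collects <a href> links, mimicking ASE's Web Page Parser module."""
    def __init__(self):
        super().__init__()
        self.links = []          # plays the role of the Web Links Database
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self._in_link = True
            self.links.append({"href": href, "text": ""})

    def handle_data(self, data):
        if self._in_link:
            self.links[-1]["text"] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

def tag_links(links):
    """Web Page Tagger: assign the consecutive numbers used by Numeric mode."""
    return {i + 1: link for i, link in enumerate(links)}

parser = WebPageParser()
parser.feed('<p><a href="/news">News</a> and <a href="/mail">Mail</a></p>')
numbered = tag_links(parser.links)
# numbered[1]["text"] == "News", numbered[2]["href"] == "/mail"
```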
Figure 3.4. The general architecture of the proposed system
3.1.2 Preliminary Tests
A preliminary usability experimentation has been conducted on an ALS user in partnership with the "Molinette Hospital of Turin". Mozilla Firefox (version 1.5) and the ASE extension have been installed on the ERICA eye-gaze system. The Molinette experimentation has involved, so far, twenty people with ALS, yet only one had the opportunity to connect to the Internet so as to test our software. The aim of the Molinette tests is to understand how, and how much, a communication device like ERICA can improve the quality of life of terminally ill patients. The ERICA system has been tested by each patient for a week. Psychological questionnaires have been proposed to the users before and after the trial. These tests show that the psychological condition is significantly improved after the trials. The
user who tried our software considers it fairly good and comfortable to use. When browsing the Internet, he actually ended up preferring the ASE (numeric mode) to the ERICA zooming interaction.
3.2 Multimodal interaction
In recent years, various alternatives to the classical input devices such as keyboard and mouse, together with novel interaction paradigms, have been proposed and experimented with. Haptic devices, head-mounted devices, and open-space virtual environments are just a few examples. With these futuristic technologies, although still far from perfect, people may receive immediate feedback from a remote computing unit while manipulating common objects. Special cameras, usually mounted on special glasses, allow tracking either the eye or the environment so as to provide visual hints and remote control of the objects in the surrounding space. In other approaches special gloves are used to interact with the environment through gestures in space [53].
In parallel with computer-vision-based techniques, voice interaction is also adopted as an alternate or complementary channel for natural human-computer interaction, allowing the user to issue voice commands. Speech recognition engines of different complexities are used to identify words from a vocabulary and to interpret the user's intentions. These functionalities are often integrated with context knowledge in order to reduce recognition errors or command ambiguity. For instance, several mobile phones currently provide speech-driven dialing of a number in the contacts list, which is by itself a reduced contextual vocabulary for this application. Information about the possible "valid" commands in the current context is essential for trimming down the vocabulary size and enhancing the recognition rate. At the same time, the current vocabulary might be inherently ambiguous, as the same command might apply to different objects or the same object might support different commands: also in this case, contextual information may be used to infer user intentions.
In general most interaction channels, taken alone, are inherently ambiguous, and far-
from-intuitive interfaces are usually necessary to deal with this issue [54].
To keep the interaction simple and efficient, multimodal interfaces have been pro-
posed, which try to exploit the peculiar advantages of each input technique, while com-
pensating for their disadvantages. Among these, gaze-, gestures- and speech-based ap-
proaches are considered the most natural, especially for people with disabilities.
Particularly, unobtrusive techniques are the most preferred, as they aim at enhancing
the interaction experience in the most transparent way, avoiding the introduction of wear-
able “gadgets” which usually make the user uncomfortable. Unfortunately, this is still
a strong constraint, which is often softened by some necessary trade-offs. For instance,
while speech recognition may simply require wearing a microphone, eye tracking is usually constrained to using some fixed reference point (e.g., either a head-mounted or wall-mounted camera), making it suitable only for applications in limited areas. Additionally, environmental conditions render eye tracking unusable with current mobile devices, which are instead more appropriate for multimodal gesture- and speech-based interaction.
Indeed, the ambient conditions play a major role when choosing the technologies to
use and the strategies to adopt, always taking into account the final cost of the proposed
solution.
In this context, the section discusses a gaze- and speech-based approach for the inter-
action with the existing GUI widgets provided by the Operating System. While various
studies already explored the possibility of integrating gaze and speech information in lab-
oratory experiments, we aim at extending those results to realistic desktop environments.
The statistical characteristics (size, labels, commands, ...) of the widgets in a modern
GUI are extremely different from those of specialized applications, and different disam-
biguation approaches are needed.
One further assumption of this work is the necessity of working with inaccurate eye
tracking information: this may be due to a low cost tracking device, or to low-resolution
mobile (glass-mounted) cameras, or to calibration difficulties, varying environmental
lighting conditions, etc. Gaze information is therefore regarded as a very noisy source
of information.
The general approach proposed is based on the following principles:
1. gaze information is used to roughly estimate the point fixated by the user;
2. all objects in the neighborhood of the fixated point are candidates for selection by the user, and are the only ones to be recognized by the vocal system;
3. actual command selection is done by speaking the appropriate command.
As a consequence, and contrary to related works, the grammar for voice recognition is generated dynamically, depending on the currently gazed area. This improves speech recognition accuracy and speed, and also opens the door to novel disambiguation approaches. In this chapter, a method for analyzing real ambiguity sources in desktop application usage, and an algorithm for disambiguation of vocal commands, will be discussed.
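A minimal sketch of these principles — restricting the recognition vocabulary to the widgets near the fixation estimate — could look like the following (the widget layout, labels and the 120-pixel radius are illustrative assumptions, not the system's actual values):

```python
import math

def nearby_widgets(widgets, gaze, radius=120):
    """Principle 2: widgets close to the (noisy) fixation estimate are
    the only candidates for vocal selection."""
    gx, gy = gaze
    return [w for w in widgets
            if math.hypot(w["x"] - gx, w["y"] - gy) <= radius]

def build_grammar(candidates):
    """Dynamically generated vocabulary: only the labels of the gazed
    widgets are valid voice commands."""
    return {w["label"].lower(): w for w in candidates}

widgets = [{"label": "Save",  "x": 40,  "y": 50},
           {"label": "Print", "x": 70,  "y": 60},
           {"label": "Quit",  "x": 800, "y": 600}]

# a fixation near the top-left corner keeps "save" and "print" as valid
# commands, while the distant "quit" is excluded from the grammar
grammar = build_grammar(nearby_widgets(widgets, gaze=(60, 55)))
```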
Eye-gaze pattern-based interaction systems, like any other recognition-based system (e.g., voice), can produce both false alarms and misses. Some of these limitations can be overcome by developing more advanced techniques such as statistical learning, but more importantly, ambiguity will be dramatically reduced when multiple modalities are combined, thanks to mutual disambiguation effects. Even if eye-gaze patterns alone can be expected to succeed most of the time, their role can be expected to be even more powerful when combined with other modalities such as speech recognition.
The goal of high-level multimodal speech systems is to obtain the same ease and robustness as human communication, by integrating automatic speech recognition with other non-verbal methods, and integrating non-verbal methods with speech synthesis to improve the output of a multimodal application.
3.2.1 State of the art
While most earlier approaches to multimodal interfaces were based on gestures and speech recognition [55, 56, 57], various speech- and gaze-driven multimodal systems
have also been proposed.
In [58] an approach combining gaze and speech inputs is described. An ad-hoc program displays a matrix of fictitious buttons which become colored when spotted through fixation. The test users can then name the color of the desired button to select it via speech recognition, thus going beyond the gaze-tracking limits. However, differently from the approach proposed in this thesis, the technique has not been applied to real programs, and the color-coding system proved to be somewhat confusing for various users. Results suggest that, in terms of a footprint-accuracy tradeoff, pointing performance is best (about 93%) for targets subtending 0.85 degrees with 0.3-degree gaps between them.
In [59] gaze and speech are integrated in a multimodal system to select differently sized, shaped and colored figures in an ad-hoc application. The distance from the fixation point is used to rank the n-best candidates, while a grammar composed of color, color+shape or size+color+shape is used for speech recognition. The integrated use of both gaze and speech proved to be more robust than their unimodal counterparts, thanks to mutual disambiguation, yet the tests are not based on everyday applications.
Some theoretical directions toward the conversion of unimodal inputs into an integrated multimodal interface are proposed in [60]. The context here is more focused on gaze and speech inputs as Augmentative and Alternative Communication (AAC) channels, which can be the only ones available to several differently abled people. The tests are based on earlier studies which do not involve existing off-the-shelf applications of everyday use.
The COVIRDS (COnceptual VIRtual Design) system, described in [61], provides a 3D environment with a multimodal interface for Virtual Reality-based CAD. Speech and gesture input were subsequently used to develop an intuitive interface for concept shape creation. A series of tasks were implemented using different modalities (zoom-in, viewpoint translation/rotation, selection, resizing, translation, etc.). Evaluation of the interface was based on user questionnaires. Voice was intuitive to use in abstract commands like viewpoint zooming and object creation/deletion. Hand gestures were effective in spatial tasks (resizing, moving). Some tasks (resizing, zooming in a particular direction) were performed better when combining voice and hand input. The command language was very
simple and the integration of modalities was implemented at the syntax level. Therefore, in some cases users showed preference for a simple input device (a wand with 5 buttons) rather than for multimodal input.
A multimodal framework for object manipulation in Virtual Environments is presented in [57]. Speech, gesture and gaze input were integrated in a multimodal architecture aiming at improving virtual object manipulation. Speech input uses a Hidden Markov Model (HMM) recognizer, while the hand gesture input module uses two cameras and HMM-based recognition software. Speech and gesture are integrated using a fixed syntax: < action >< object >< modifier >. The user command language is rigid so as to allow easier synchronization of input modalities. The synchronization process assumes modality overlapping: the lag between the speech and gesture input is considered to be at most one word. The functionality of the gaze input is reduced to providing complementary information for gesture recognition. The direction of gaze, for example, can be exploited for disambiguating object selection. A test bench using speech and hand gesture input was implemented for visualization and interactive manipulation of complex molecular structures. The multimodal interface allows much better interactivity and user control compared with the unimodal, joystick-based input.
In contrast with most of the mentioned approaches, our experimentation is based on a system for multimodal interaction with everyday, off-the-shelf desktop environments. In particular, we want to improve the performance of low-cost eye-gaze trackers and of speech recognition systems used alone, by means of a grammar generated in real time from the integration of both input channels.
3.2.2 Speech Recognition Background
The first studies on speech recognition technologies began as early as 1936 at Bell Labs. In 1939, Bell Labs [62] demonstrated a speech synthesis machine, simulating talking, at the World Fair in New York. Bell Labs later ceased research on speech recognition, based on the incorrect belief that artificial intelligence would ultimately be necessary for success [63].
Early attempts to design systems for automatic speech recognition were mostly guided
by the theory of acoustic-phonetics, which describes the phonetic elements of speech (the
basic sounds of the language) and tries to explain how they are acoustically realized in
a spoken utterance. These elements include the phonemes and the corresponding place
and manner of articulation used to produce the sound in various phonetic contexts. For
example, in order to produce a steady vowel sound, the vocal cords need to vibrate (to
excite the vocal tract), and the air that propagates through the vocal tract results in sound
with natural modes of resonance similar to what occurs in an acoustic tube. These natural
modes of resonance, called the formants or formant frequencies, are manifested as major
regions of energy concentration in the speech power spectrum. In 1952, Davis, Biddulph,
and Balashek of Bell Laboratories built a system for isolated digit recognition for a single
speaker [64], using the formant frequencies measured (or estimated) during vowel regions
of each digit.
In the 1960s, the phoneme recognizer of Sakai and Doshita at Kyoto University [65]
involved the first use of a speech segmenter for the analysis and recognition of speech in
different portions of the input utterance. In contrast, an isolated digit recognizer implicitly
assumed that the unknown utterance contained a complete digit (and no other speech
sounds or words) and thus did not need an explicit “segmenter”. Kyoto University's work
could be considered a precursor to continuous speech recognition systems.
In 1966, Leonard Baum of Princeton University proposed a statistical method [66, 67],
namely the Hidden Markov Model (HMM), which was later applied to speech recognition.
Today, most practical speech recognition systems are based on the statistical framework
and results developed in the 1980s, later significantly improved [68, 69].
In the late 1980s the first platforms were finally commercialized, thanks to exponentially
growing computer processing power. Still, only discrete utterances were successfully
recognized until the mid-1990s, when recognizers no longer required pauses between
words. The so-called “continuous speech recognition systems” reached an accuracy of
90% or more under ideal conditions.
In the last decade the computational power of personal computers has dramatically in-
creased, making it possible to implement automatic speech recognizers that in earlier
attempts were hardly feasible even with dedicated hardware devices, such as DSP boards.
Currently, speech recognition technologies offer an effective interaction channel in
several application fields, such as:
• Telephone Services. Many telephone companies replace call center operators with
speech recognizers to offer real-time information services (e.g., weather forecasts,
train and flight timetables, and reservations).

• Computer control. Recent operating systems integrate native speech recognition en-
gines that allow disabled people to control the personal computer by vocal com-
mands. In addition, many commercial companies provide special-purpose speech
recognition applications, e.g., voice-to-text editors or converters.

• Mobile device control. Several mobile devices can be controlled by simple vocal
commands. At present, speech recognition in mobile devices is limited to specific
functions like contact list management, phone calls, etc., but is expected to enable
more complex activities as the underlying systems become more powerful and less
demanding in terms of energy consumption.

• Automotive control. For about five years, several automobile manufacturers have
integrated speech recognition systems into their cars. These embedded systems allow
drivers to remotely control devices, such as a mobile phone, while keeping their
attention on the road and without taking their hands off the steering wheel.

• Language learning. Speech recognizers can be used as automatic pronunciation cor-
rectors. Some commercial systems already propose the correct pronunciation
when the spoken words differ too much from the reference samples.
The existing technologies which are currently used in modern applications of ASR
have greatly evolved since their infancy. Yet, a number of factors still make ASR algo-
rithms seriously complex:
Speaker independence Most ASR algorithms require intensive training, which can hardly
cover the entire spectrum of human voices. An ideal application would require minimal or
no training at all to recognize a user's speech.
Continuous speech It is desirable to allow a user to speak normally, rather than forcing
the insertion of pauses to facilitate the identification of word boundaries.
Vocabulary size The range of vocabulary sizes varies greatly with the application. For
instance, only a few words need to be recognized when dealing with simple and
limited controls (e.g., an audio player). In contrast, a large vocabulary is necessary
for complex communications, although it leads to less accurate recognition, as a
greater number of similar words may occur in the vocabulary.
Accuracy Environmental conditions like noise and even minimal reverberation are likely
to lessen accuracy.
Delay The recognition process is not instantaneous. The lag introduced by the algorithms
usually grows with the complexity of the application, yielding delayed feedback to
the users, which is often annoying.
Hardware Requirements Typically the microphone has to be placed very near the mouth
for the ASR system to provide accurate results, thus limiting the application range.
User Interface Two commercial applications for individual voice recognition are already
available at low cost and with a simple interface (namely Dragon NaturallySpeak-
ing and IBM ViaVoice). Not all applications provide a practical user interface,
though.
Speaker and Listener Variables People do not always speak clearly, nor in complete
sentences. Moreover, people with hearing loss often use their eyes to get cues from a
speaker's face and gestures, i.e., to lip-read, which might be difficult to do while watch-
ing a screen.
Given these premises, a multimodal system is presented, which tries to combine the
most promising features of the available input channels, while compensating for their
disadvantages.
3.2.3 Proposed Solution
Human-computer interfaces have traditionally been developed to make the use of the
machine easy and robust, at the cost of inflexible solutions: despite being labeled usable,
these interfaces are often complex, rigid and hierarchical. A new concept of multimodal
interfaces in human-computer interaction can open new possibilities in information ex-
change, yielding more usable systems that offer the user more information, ease decisions
and leave the user free to perform other tasks. Motivated by these considerations, our
work proposes a multimodal architecture that is easy to use and open-source.
In order to take advantage of the concurrent visual and vocal systems, a few basic
elements have been defined:
Objects The widgets available on the screen. These may be files represented by an icon
and a name, labeled buttons, menu items, window bars and buttons, etc. Each
object is characterized by a few properties, such as name, role, state and actions. In
particular, each object has a default action defined by the system (e.g., open the file
with the associated program, show the pop-up menu, etc.).
Context The area spotted by the tracking system, also referred to as Gaze Window (GW),
identifies the context of interaction for the vocal system: only the objects within
such a context will be considered by the vocal system. The context varies as the user
focuses on different areas of the screen.
Commands The words captured by the microphone and recognized by the speech recog-
nition engine. Valid commands are described in the grammar, which is composed
of the list of possible commands, corresponding to object names or action names
(within the current GW context).
Through the eye motion we track the direction of the gaze as a fixation point on the
screen, i.e., the point on which the user is focusing his/her gaze. This point is normally
affected by some displacement error due to various factors, so the eye tracker actually
identifies an area on the screen rather than a precise point. Thus, the result of the track-
ing is a GW that may contain several objects. The height and width of the GW around
the fixation point are defined by a customizable parameter GWsize. In the performed
experiments this parameter has been varied automatically, to simulate eye trackers with
different accuracy.
While gazing, the user also interacts with the system by uttering a command, i.e., by
pronouncing an object name (for the default system action) or a specific action.
The vocal platform manages spoken words through a VXML interpreter, guided by the
VoiceXML unit so as to interpret the result completely and accurately. The VoiceXML
unit interprets messages sent by the vocal platform, processes them and sends the result
to the main application unit. The VoiceXML unit is developed in VXML, the W3C
standard XML format for specifying interactive voice dialogues between a human and
a computer, which allows voice applications to be developed and deployed in a way
analogous to HTML for visual applications. The application was developed using a
speech processing subsystem based on VoxNauta Lite 6.0 by Loquendo [70], an Italian
company that is a leader in the field of vocal applications and platforms. After receiving
the recognition results, the application matches the received command with the objects
selected by the eye tracker.
The rest of this section describes in detail the various system modules and their func-
tionalities. In particular, I describe a mutual disambiguation algorithm that is based on dy-
namic grammar generation and is suitable for realistic desktop environments. Experimental
results will later show quantitative data proving the effectiveness of the disambiguation
method in real desktop usage scenarios.
In particular, the steps required for command recognition and execution can be sum-
marized as follows (Figure 3.6):
1. Definition of a context as the screen area spotted by the eye-tracking system.
2. Enumeration of the available objects within a given context.
3. Retrieval of object properties, such as name, role, position (with respect to the
fixation point), state, and default action.

4. Disambiguation of objects having the same name by exploiting positional information.

5. Matching of a pronounced command against object names or actions within a given
context.

6. Retrieval of the corresponding object and execution of the related action.
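As an illustrative sketch only (the actual implementation obtains its objects from Operating System accessibility libraries; all names below are hypothetical), the steps above can be outlined as follows:

```python
# Hypothetical sketch of the command recognition pipeline; the object
# model and helper names are illustrative, not the thesis code.
from dataclasses import dataclass

@dataclass
class ScreenObject:
    name: str
    role: str
    position: tuple      # (x, y) screen coordinates
    default_action: str

def objects_in_context(all_objects, fixation, gw_size):
    """Steps 1-2: keep only the objects inside the Gaze Window."""
    fx, fy = fixation
    half = gw_size / 2
    return [o for o in all_objects
            if abs(o.position[0] - fx) <= half
            and abs(o.position[1] - fy) <= half]

def match_command(command, context):
    """Steps 5-6: match a spoken command to an object name, returning
    the default action to execute, or None if ambiguous/unknown."""
    hits = [o for o in context if o.name.lower() == command.lower()]
    if len(hits) == 1:
        return hits[0].default_action
    return None
```

For instance, with a fixation near an icon named “firefox”, only that icon survives the context filter, and the spoken word “firefox” resolves directly to its default action.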
3.2.4 System Overview
The proposed system (Figure 3.5), described in this section, aims at extracting the most
useful information from the two supported modalities (gaze estimation and voice com-
mands), while at the same time enabling their mutual disambiguation. Gaze is used for
setting up a “context” composed of the objects that the user is currently focusing on. Track-
ing precision is not sufficient to quickly and reliably identify a single widget, but it is
sufficient to identify an area on the screen and to filter the contained objects. This
filtering greatly reduces the ambiguity of voice commands, by ruling out most of the se-
lectable actions (since they lie outside the user's focus) and by reducing the dictionary size
(thus enhancing the recognition rate).
The user task considered in this study consists in specifying an object (any selectable
element on the screen, i.e., windows, menus, buttons, icons, . . . ) or a command (any
action on an object, i.e., open, close, click, drag, . . . ).

Figure 3.5. Scenario

3.2.5 System architecture

The system is organized as a set of five functional modules, as shown in Figure 3.6:
Eye Tracker, Screen Reader, Grammar Generator, Vocal Unit and Action Executor. Each
module is described in the appropriate sub-section. In particular, the Screen Reader
and the Grammar Generator handle object filtering and disambiguation, and real-time
generation of the VoiceXML grammar.
Figure 3.6. System Overview
Eye Tracker
This module is responsible for the identification of an area of interest on the screen, i.e.,
of a Gazed Window. The eye tracking system, in fact, provides an estimated fixation point
that may be affected by some displacement error, strongly dependent on the hardware
and software components of the tracker. The actual area location and size are therefore
dependent on the fixation point and on the displacement error. In practice, the cursor
coordinates at the time of a fixation are used, and are collected as follows:

• if the cursor remains within a small area (a few pixels wide) for at least D seconds
(dwell time), a fixation event is raised at the cursor position;

• if the cursor position varies too much before reaching the dwell time threshold, no
event is raised.

In case of a fixation, the Eye Tracker module defines the Gazing Window as a rectangle of
size GWsize centered on the fixation coordinates and then calls the Screen Reader
unit.
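The dwell-time rules above can be sketched as follows; the sampling format and parameter names are illustrative assumptions, not the thesis implementation:

```python
# Minimal dwell-time fixation detector, assuming periodic (t, x, y)
# cursor samples; dwell_time (D) and the area radius are the tunable
# parameters described in the text.
def detect_fixation(samples, dwell_time, radius):
    """Return the fixation point (x, y), or None if the cursor moved
    too much before the dwell-time threshold was reached."""
    start_t, ax, ay = samples[0]
    for t, x, y in samples[1:]:
        if abs(x - ax) > radius or abs(y - ay) > radius:
            # Cursor left the small area: restart from this sample.
            start_t, ax, ay = t, x, y
        elif t - start_t >= dwell_time:
            return (ax, ay)          # fixation event raised here
    return None
```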
Screen Reader
The Screen Reader receives the fixated area (GW) as input from the Eye Tracker and
retrieves the set of on-screen objects in that area, by interacting with libraries at the Op-
erating System level. This unit enumerates the objects within the eye-tracking context and
defines for each of them the corresponding name, role, state, default action, and position.
Nameless or invisible (background) objects are discarded so as to obtain exactly what
the user sees on the screen. The retrieved objects are finally collected into a memory
structure and passed to the Grammar Generator unit.
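The discarding of nameless or invisible objects can be sketched as follows; the object record format is an illustrative assumption:

```python
# Illustrative filter for the Screen Reader step: keep only objects that
# the user can actually see and name; nameless or invisible (background)
# objects are discarded.
def visible_named(objects):
    return [o for o in objects
            if o.get("name") and o.get("visible", False)]
```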
Grammar Generator
This unit generates an appropriate VXML grammar for the speech recognition module of
the Vocal Platform by using the objects spotted through the eye-tracking system and the
Screen Reader. Basically, the grammar defines a set of possible vocal commands based
on the object names or actions.
The grammar is generated according to the following approach:
• if the object name is unique, a single vocal command is generated, corresponding
exactly to that name;

• when 2 to 4 objects share the same name or action, the corresponding commands
are disambiguated by exploiting the object locations (left, right, top, bottom). In
such a case, the commands entered into the grammar are the disambiguated names,
composed of the object name followed by the location direction, for example
“firefox left” and “firefox right”. Additionally, a final command is
also added to the grammar, containing the ambiguous name (e.g., “firefox”):
when recognized, the VXML interpreter synthesizes a vocal warning message ask-
ing the user to disambiguate it (e.g., “firefox is ambiguous, please specify left or
right”), giving proper auditory feedback to the user;

• when more than 4 objects are ambiguous, the location-based disambiguation method
is ineffective; in this case a single command is generated with the correspond-
ing name, causing the Vocal Unit to synthesize an error message. The limit of
4 disambiguation cases is due to the choice of using only 4 relative positions: top,
right, left, bottom.
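These three rules can be sketched as follows; the input format (name, position) and the function name are illustrative assumptions, not the thesis code:

```python
# Sketch of the grammar-generation rules above. Positions follow the
# four relative directions used in the text: left, right, top, bottom.
from collections import defaultdict

def generate_commands(objects):
    """objects: list of (name, position) pairs within the current GW."""
    by_name = defaultdict(list)
    for name, pos in objects:
        by_name[name].append(pos)
    commands = []
    for name, positions in by_name.items():
        if len(positions) == 1:
            commands.append(name)                 # unique: plain command
        elif len(positions) <= 4:
            for pos in positions:                 # discriminable variants
                commands.append(f"{name} {pos}")
            commands.append(name)                 # triggers warning prompt
        else:
            commands.append(name)                 # indiscriminable: error msg
    return commands
```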
Vocal Unit
The Vocal Platform receives as input the set of possible contextual commands defined by
the Grammar Generator, and supplies as output the command pronounced by the user.
Every spoken word is processed and interpreted on the basis of the VXML grammar and,
still according to the grammar, a vocal message can be synthesized to notify the user of
the recognition result: command recognized, ambiguous command identified, or wrong
command. When a command is correctly identified, it is passed to the Action Executor
unit.
Action Executor
It receives as input the command recognized by the Vocal Platform and executes the
associated action. Basically, the object corresponding to the command is retrieved from
the data structure previously created by the Screen Reader, by matching the command
name with the object name or the available object actions (also considering disambigua-
tion). Then, the specified action (or the default action of the object) is executed.
3.2.6 Case Study
The proposed multimodal system, and in particular the interactive location-based disam-
biguation mechanism, has been designed for interacting with a real desktop environment.
To prove the effectiveness of the approach, we report some experimental results gathered
on the Windows XP operating system.
The performed tests have a twofold purpose:
• to analyse the relation between the gaze block size and the number of ambiguous
objects and commands, in a realistic desktop environment;
• to analyse the disambiguation efficiency of the location-based method.
The experimentation is based on data about classic Windows XP widgets (e.g., buttons,
menu items, etc.) and their locations on the screen, gathered during both work-related and
personal use of the computer. Unlike other works, which make use of static pre-generated
object dispositions or of simple and unusual objects [58], this work is based on real ex-
perimental data. A test-oriented version of the screen reader module has been developed
to store screen-shots of the computer desktop taken at predefined time intervals (e.g., every
3 minutes, provided the user was not idle during that period). Each screen-shot includes
a complete list of objects, each object being described by four properties: Name, Role,
Rectangle and Window Order.

• The name property contains all the text referring to the object, e.g., the button title,
text area contents, etc.
• The role property specifies the object type, e.g., command button, list item, menu
item, etc.

• The rectangle property represents the location and dimensions of the object.

• The window order property indicates the z-order location of the object.
The trials involved 5 unpaid people for about a week. Each person installed the screen
reader on his/her own computer and ran it for a week. The gathered data amounts to 468
screen-shots involving 144,618 objects, including hidden ones. These objects have
been filtered down through a simple overlap detection algorithm, keeping only the 42,372
foreground visible objects (i.e., 29.3% of the total), used in all the subsequent test phases.
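A possible form of such an overlap filter (purely an illustrative sketch; the text only describes the actual algorithm as “simple”) is to drop every object whose rectangle is fully covered by an object nearer to the foreground:

```python
# Hypothetical overlap filter: an object is kept unless some object with
# a lower window order (nearer the foreground) fully covers its rectangle.
# Rectangles are assumed to be (x, y, width, height) tuples.
def covers(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax <= bx and ay <= by and ax + aw >= bx + bw and ay + ah >= by + bh

def foreground_visible(objects):
    """objects: list of dicts with 'rect' and 'z' (lower z = foreground)."""
    kept = []
    for o in objects:
        hidden = any(other["z"] < o["z"] and covers(other["rect"], o["rect"])
                     for other in objects)
        if not hidden:
            kept.append(o)
    return kept
```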
The tests determine how often the speech recognition system is effective in disam-
biguating objects, as a function of the Gazing Window size. To speed up the analysis, the
eye-tracking accuracy has been simulated by considering GWs with variable dimensions
(from 10px to 800px) instead of precise coordinates identifying the object positions. The
maximum GW dimension has been chosen to cover the corner case of a very inaccurate
eye tracker with a precision of only two zones (left/right) on a 1600x1200 screen resolution.
This corresponds to having practically no useful information from the eye tracker.
Two different tests have been performed, each using a different object property to
define object similarity. In the first test (Name Ambiguity), two objects are considered
similar if the name of the first object is included in (or equal to) the name of the second
one. In the second test (Role Ambiguity), two objects are considered similar if they have
the same role (e.g., both objects are buttons).
The tests were executed according to Algorithm 1. The classification of an object
inside a GW (line 6) follows the rules shown in Table 3.1.
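The classification rules of Table 3.1 can be expressed as a small function (an illustrative sketch, not the thesis code):

```python
def classify(similar_count):
    """similar_count: number of mutually similar objects in the GW,
    target object included (per Table 3.1)."""
    if similar_count <= 1:
        return "Unique"
    if similar_count <= 4:
        return "Discriminable"   # ambiguous, but resolvable by left/right/top/bottom
    return "Indiscriminable"
```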
3.2.7 Name Ambiguity
This test aims at evaluating the number of ambiguous objects having a name similar to the
target object, within differently sized GWs centered on the object. We neglect the effect
Algorithm 1 Test Application
1: for each screen-shot S do
2:   for each target object O in S do
3:     for GWsize = 10px . . . 800px step 10px do
4:       Generate a GW around O with size GWsize × GWsize
5:       Find the objects (OS) in the GW similar to O
6:       Classify OS
7:       Store statistics
8:     end for
9:   end for
10: end for
Table 3.1. Object classification
Unique          There is no other similar object inside the GW.
Ambiguous       The GW contains two or more objects which are mutually similar.
Discriminable   The ambiguous objects within the GW are at most four.
Indiscriminable The ambiguous objects within the GW are more than four.
of speech recognition errors, and the only sources of imprecision are command ambigu-
ity and large GWs. In this case we reach 100% accuracy if and only if a single
object with the given name is found in the considered GW. The test application generated
79 GWs (square, from 10px to 800px wide) for each object and calculated the number of
ambiguous objects. Thanks to the vocal localization-based feedback mechanism, discrim-
inable objects may be selected with full precision. Figure 3.7 illustrates the trend of both
the indiscriminable and discriminable ambiguous objects as a function of the GW size.

Experimental results show that the ideal recognition rate is quite high (about 80%)
even in the case of an inaccurate eye-tracking device (i.e., a wide GW) and no disambiguation.
Precision is significantly increased by the localization-based disambiguation method,
up to 98% in the worst case. A deeper analysis of the results showed that indiscriminable
objects are not uniformly distributed: only 19.1% of the screen-shots present
indiscriminable objects, and most of them are found in Internet browsing windows. In fact,
Figure 3.7. Name Ambiguity: Unique and ambiguous (Indiscriminable and Discriminable) objects vs. GW size
they are mostly hyperlinks in either Internet Explorer or Mozilla Firefox.
3.2.8 Role Ambiguity
This test aims at evaluating the number of objects having a similar role, i.e., those objects
which support the execution of the same commands (e.g., all file icons). Also in this case,
the test application generated 79 GWs (square, from 10px to 800px wide) for each object
and computed the number of visible objects with an ambiguous role. Figure 3.8 shows the
trend of both the indiscriminable and discriminable ambiguous object roles for various
GW sizes.
In this case the ambiguous objects are far more numerous than those obtained by name
similarity. Therefore, specifying actions as commands rather than object names can be more
error prone in the case of low-precision eye trackers: even a 40px GW reduces precision to
below 50%. Also in this case we see the significant effect of location-based disambiguation,
which is able to recover all Discriminable cases. Here, the 50% recognition threshold
Figure 3.8. Role Ambiguity: Unique and ambiguous (Indiscriminable and Discriminable) objects vs. GW size
is reached with a much wider GW, around 150px, corresponding to a 4x increase in the
system's robustness to gaze-tracking errors.
3.3 Gaze interaction in 3D environments
Nowadays the computer game industry is developing more and more innovative interac-
tion and control methods for user input. Nevertheless, gaze tracking, which is a fast, natural
and intuitive input channel, is not yet exploited in any commercial computer game.
In recent years several research groups have started to study gaze-tracking devices applied to
computer games. In [71] and [72] we find a comparison of different input methods, in-
cluding gaze tracking, for a first-person shooter game. The study in [73] shows that
gaze tracking beats mouse control as an input modality during a tournament of the classic
Breakout game.
In [74] several usage modes enabling mouse emulation with gaze have been designed and
tested, avoiding the well-known Midas Touch problem. The methods proposed in that pa-
per have been trialled in Second Life, an Internet-based 3D virtual world where users can
interact with each other through avatars.
This section presents 6 different control methods for navigation and interaction in 3D
games and reports a usability study on those techniques. Unlike previous works,
the present research does not restrict its attention to a particular technique or a particular ap-
plication/game, but extends the evaluation to three different games that require various
skills and input schemes. This kind of research aims at spreading the study and develop-
ment of games and applications based on gaze-tracking devices and addressed to the general
public. Spreading gaze tracking could have a relevant impact on the reduction of device
costs. Decreasing costs could in turn benefit the devices with a nobler purpose,
i.e., the eye trackers used as Assistive Technologies for disabled people [75].
3.3.1 Control and Interaction Techniques
The control scheme for navigation and interaction in 3D Virtual Environments should
allow users to control their avatars, in particular the direction they are looking in and
the direction in which they are moving. Depending on the application type, the
game controller should allow more complex actions such as running, jumping, shooting
and interacting with environment objects.

Most 3D virtual applications provide a control scheme based on a combination of key-
board and mouse inputs. Typically the gaze direction, called Free Look or Camera View,
is controlled by moving the mouse around, while the movement direction is controlled by
the keyboard.

In this context, adding a further input channel, complementary rather than alternative, such
as gaze control, can revolutionize interaction methods and user experiences. Our re-
search has designed, developed and tested six different control techniques that involve
gaze tracking and traditional input devices for navigation in 3D virtual environments.
Table 3.2 shows the control and interaction methods developed and tested by our research
group.
Technique                               Movements  Free Look  Actions
Multimodal interaction
  Gaze and keyboard (GTK)                   K          G         G
  Gaze and keyboard button (GKB)            K          G         K
  Independent Gaze and Movements (IGM)      K          G         G
Gaze interaction
  Direct Gaze Control (DGC)                 G          G         G
  Virtual Keyboard (VK)                     G          G         G
  Gaze to Target (GT)                       G          G         G
K = Keyboard, G = Gaze

Table 3.2. Control and Interaction Techniques
Multimodal interaction
Gaze tracking and keyboard (GTK) The user controls the Free Look by gaze interac-
tion: when the gaze is directed to one of the 4 screen edges, the camera view rotates in the
same direction, with a speed proportional to the nearness of the gazed zone to the border.
When the gaze direction comes back to the screen center, the rotation stops. Gaze control
also allows interacting with the objects in the environment. The start and end of the
camera view rotation and the activation of actions are set by dwell time.

The keyboard is used to handle the movements of the user's avatar through the arrow keys.
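The proportional edge-based rotation can be sketched, for the horizontal axis, as follows; the edge-zone width and maximum speed are illustrative parameters, not values from the thesis:

```python
# Hedged sketch of the GTK free-look rule: the camera rotates when the
# gaze nears a screen edge, with speed proportional to edge proximity.
def rotation_speed(gaze_x, screen_w, edge_zone=100, max_speed=90.0):
    """Horizontal rotation speed in deg/s: negative = left, positive = right."""
    if gaze_x < edge_zone:                        # left edge zone
        return -max_speed * (edge_zone - gaze_x) / edge_zone
    if gaze_x > screen_w - edge_zone:             # right edge zone
        return max_speed * (gaze_x - (screen_w - edge_zone)) / edge_zone
    return 0.0                                    # center: no rotation
```

The same rule, applied vertically, would handle the top and bottom edges.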
Gaze and keyboard button (GKB) This control technique manages the user's avatar move-
ments and free-look rotation with the same scheme as the previous method. The interac-
tion with the environment is handled by a keyboard key (for example, space) that replaces
the dwell-time selection of objects.
Independent Gaze and Movement (IGM) In typical 3D environment navigation
schemes, free look and avatar movements are strictly bound, so the center of the camera
shows the moving direction. This control scheme, instead, completely separates the con-
trol of movements from the control of the camera. This behavior makes it possible to
simulate a person who walks in one direction while turning his/her head (right or left).
The direction of movement, controlled by the keyboard, is indicated on the screen by an
arrow, while the rotation of the camera is defined by gaze tracking.
Gaze interaction
Direct Gaze Control (DGC) This method allows controlling both navigation and
interaction in 3D environments by using only gaze-tracking input. The free look
is managed with the same technique described above, whereas forward movement is
handled by selecting, through dwell time, the central zone of the screen (highlighted with
a viewfinder). In this scheme the direction of navigation is strictly bound to the direction
of the camera. In order to interact with the environment objects, contextual menus are
displayed after a dwell time.
Virtual Keys (VK) This scheme displays four semitransparent buttons in the middle of
the screen edges. The left and right buttons control the rotation of the camera, while the
upper and bottom buttons allow navigating forward and backward. Each button is activated
by dwell-time selection.
Gaze to Target (GT) This modality binds the user's avatar movements to predefined
paths. The environment is enriched with anchor objects that define the locations that can be
reached by the user. When the user selects an anchor object by dwell time, the avatar
autonomously walks towards the selected point. After the anchor selection, a confirmation
menu is displayed in order to reduce Midas Touch errors. The user can navigate the envi-
ronment from anchor to anchor, while the camera view is free and is controlled similarly
to the GTK method.
3.3.2 Experimentation
The experimentation aimed to test the accuracy, speed and usability of the designed con-
trol and interaction techniques. It was divided into two phases. The first phase, involving
6 users, had the purpose of selecting the most promising techniques. The second phase ex-
tended the test of the selected methods to 15 users.
Three simple 3D games have been developed in order to test the control techniques. These
games use the ETU driver [76] to interact with Erica [51], the eye tracker used for the exper-
imentation. The first game shows a 3D home environment where the user has to execute
two kinds of tasks. In the first task (Figure 3.9(a)), the user has to navigate in the home
and select a particular picture among the four pictures located in the environment. In the
second task (Figure 3.9(b)), the user has to take the requested food from the fridge. The
other two games aim at testing pointing precision and speed: the user is required to
shoot target men in a shooting range (Figure 3.9(c)), and to shoot enemies, avoiding good
guys, along a war path (Figure 3.9(d)).

The users played each game for 6 minutes, divided into sessions of 30 seconds, while
their speed and precision were measured.
The first round of preliminary tests highlighted that the most promising techniques were
GKB, DGC and VK. During the second part of the experimentation the users tested the
selected methods with the first game. The VK method has not been tested with the other
games because they did not allow free avatar movements but only free camera posi-
tioning.

Table 3.3 reports the precision percentage and the average elapsed time in the execution
of the 2 tasks of the first game. The most precise technique was VK in both tasks, while
the method with the lowest elapsed time was VK in the first task and DGC in the
second task. Table 3.4 shows a comparison of average elapsed time and precision in games
2 and 3 among DGC, GKB and the mouse. Mouse control allowed better precision
in both games, while the average elapsed time was equal for DGC and the mouse in game 2,
and DGC had the lowest elapsed time in game 3.
At the end of the test, each user filled in an evaluation questionnaire in order to assess the
usability of the proposed gaze-based control techniques and also to gather personal opin-
ions and suggestions. The analysis of the questionnaires shows that the VK
Figure 3.9. Screenshots of the games: (a) Game 1/Task 1; (b) Game 1/Task 2; (c) Game 2; (d) Game 3
Method   Find Picture                    Take food
         Precision (%)   Avg Time (s)    Precision (%)   Avg Time (s)
DGC           89             7.1              93             8.2
GKB           79             7.8              84             8.8
VK            95             6.5              97             8.6

Table 3.3. Game 1 Test: Precision and Time
method was perceived as the most accurate and fastest control type. User perceptions
clearly differed from the objective data reported in Tables 3.3 and 3.4, probably because
the amazement and immersion provided by gaze control gave the players a more
complete and engaging game experience that outweighed the performance shortcomings. The exper-
imentation showed a strong user interest in gaze-based control applied to virtual 3D
                 Game 2                       Game 3
  Method   Precision (%)  Avg Time (s)   Precision (%)  Avg Time (s)
  DGC           68            0.94            45            0.47
  GKB           49            1.04            52            0.67
  Mouse         90            0.94            68            0.51

Table 3.4. Games 2 and 3: Precision and Time
environment navigation and game control.
Chapter 4
Domotics
Domotic systems, also known as home automation systems, have been available on the
market for several years; however, only in the last few years have they started to spread
to residential buildings, thanks to the increasing availability of low-cost devices and
driven by emerging needs in home comfort, energy saving, security, communication
and multimedia services.
The challenge of an intuitive and comprehensive eye-based environmental control system
requires innovative solutions in different fields: user interaction, domotic¹ system control
and image processing. The currently available solutions can be seen as “isolated” attempts at
tackling partial subsets of the problem space, each providing interesting solutions in its own
sub-domain.
This chapter seeks to devise a new-generation system, able to exploit state-of-the-art
technologies in each of these fields and to anticipate interaction modalities that might be
supported by future technical solutions, in a single integrated environment. In particular,
this chapter presents a comprehensive solution in which integration is sought along two
main axes:
• integrating various domotic systems
¹ The term domotic is a contraction of domus (the Latin word for home) and informatics.
• integrating various interaction methodologies
Current domotic solutions suffer from two main drawbacks: they are produced and
distributed by various electric component manufacturers, each having different functional
goals and marketing policies; and they are mainly designed as an evolution of traditional
electric components (such as switches and relays), and are thus unable to natively provide
intelligence beyond simple automation scenarios. The first drawback causes interopera-
tion problems that prevent different domotic plants or components from interacting with each
other, unless specific gateways or adapters are used. While this was acceptable in the
first evolution phase, when installations were few and isolated, it now becomes a very
strong issue, as many large buildings such as hospitals, hotels and universities are mixing
different domotic components, possibly realized with different technologies, and need to
coordinate them as a single system. On the other hand, the roots of domotic systems
in simple electric automation prevent them from satisfying the current requirements of home inhab-
itants, who are becoming more and more accustomed to technology and require more
complex interaction possibilities.
In the literature, solutions to these issues usually propose smart homes [77], i.e.,
homes pervaded by sensors and actuators and equipped with dedicated hardware and
software tools that implement intelligent behaviors. Smart homes have been actively re-
searched since the late 90s, pursuing a revolutionary approach to the home concept, from
the design phase to the final deployment. The costs involved are very high and have so far
prevented a real diffusion of such systems, which still retain an experimental and futuristic
connotation.
The approach proposed in this thesis lies somewhat outside the smart home concept,
and is based on extending current domotic systems by adding hardware devices and
software agents that support interoperation and intelligence. Our solution takes an
evolutionary approach, in which commercial domotic systems are extended with a low-
cost device (an embedded PC) allowing interoperation and supporting more sophisticated
automation scenarios. In this case, the domotic system in the home evolves into a more
powerful integrated system, which we call an Intelligent Domotic Environment (IDE). IDEs
promise to achieve intelligent behaviors comparable to smart homes, at a fraction of the
cost, by reusing and exploiting available technology, and by providing solutions that may
be deployed even today.
On the other hand, interaction methodologies should take into account the latest re-
sults in human-environment interaction, as opposed to human-computer interaction. The
paradigm of “direct interaction”, so familiar in desktop environments and now also ex-
tended to the Internet with Web 2.0 applications, is not so natural when applied to en-
vironmental control. Selecting a user interface element that represents a physical object,
an object that is also within the user’s field of view, is quite an indirect interaction method.
Directly “selecting” objects by staring at them would be considerably more direct and intuitive.
Besides the technical difficulty of detecting the object(s) gazed at by the user, there is a de-
sign trade-off between the more direct selection and the traditional mediated interaction.
While direct interaction eases object identification but leaves few options for specifying
the desired action, mediated selection, where the object is selected on a computer screen,
complicates object selection but allows an easy selection of the desired commands. In
addition, mediated selection allows interaction with objects that are not directly perceiv-
able by the user, such as thermal control, automated activation of home appliances, or objects
in other rooms. The comprehensive solution proposed in this chapter seeks the appropriate
trade-off between these opposing interaction methods, proposing a system able to support
both, and to integrate them with the aid of portable devices.
The overall vision is centered on DOG (Domotic OSGi Gateway), which, on one side,
builds an abstract and operable model of the environment (described in Section 4.5) by
speaking with different domotic systems according to their native protocols, and with any
additional existing device. On the other side, it offers the necessary APIs to develop
any kind of user interface and user interaction paradigm. In particular, this chapter
explores eye-based interaction, comparing “mediated” menu-driven interaction
(Section 4.7.5) with innovative “direct” interaction.
Most solutions rely on a hardware component called a residential [78] or home gate-
way [79], originally conceived for providing Internet connectivity to the smart appliances
available in a given home. In our approach, this component evolves into DOG, an
interoperation system where connectivity and computational capabilities are exploited to
bridge, integrate and coordinate different domotic networks. DOG exploits OSGi as a
coordination framework for supporting dynamic module activation, hot-plugging of new
components and reaction to module failures. These basic features are integrated with an
ontology model of domotic systems and surrounding environments, named DogOnt. The
combination of DOG and DogOnt supports the evolution of domotic systems into IDEs
by providing means to integrate different domotic systems, to implement inter-network
automation scenarios, to support logic-based intelligence and to access domotic systems
through a neutral interface. Cost and flexibility concerns play a significant part in the
platform design, and we propose an open-source solution capable of running on low-cost
hardware such as an ASUS eeePC 701.
Ontology-based modeling in DOG is not limited to interoperation; it can also be leveraged
by applications to enhance the capabilities of controlled environments. For example, it
can support learning of user habits and reasoning about the home state and context, or it can
be exploited to provide automatic and proactive security, and to implement comfort and
energy-saving policies.
The chapter is organized as follows: Section 4.1 discusses some relevant related works,
reporting state-of-the-art solutions for gaze-based home interaction. Section
4.2 introduces the general architecture of the proposed approach. Section 4.5 describes the DOG
platform, starting from high-level design issues and including the description
of the platform components and their interactions, while Section 4.6 describes ontology-driven tasks
in DOG. Sections 4.7.5 and 4.7.6 compare the two gaze-based interac-
tion modalities, highlighting the pros and cons of each and analyzing how the two can be
successfully integrated.
4.1 Domotics Background
Vision is a primary sense for human beings; through gaze, people can sense the environ-
ment in which they live, and can interact with objects and other living entities [80]. The
ability to see is so important that even inanimate things can exploit this sense to im-
prove their utility. Intelligent environments, for example, can exploit artificial vision
techniques to track the user’s gaze and to understand whether someone is staring at them.
In this case they become “attentive”, being able to detect the user’s desired interaction
through vision [81].
Home automation is a fairly old discipline that is today gaining new momentum
thanks to the ever increasing diffusion of electronic devices and network technologies.
Currently, many research groups are involved in the development of new architectures,
protocols, appliances and devices [82]. Commercial solutions are also increasing their
presence on the market, and many brands propose very sophisticated domotic sys-
tems, such as BTicino MyHome [83], EIB/KNX [84] (the result of a joint
effort of more than twenty international partners), X10 [85] and LonWorks [86].
Many research works are evolving towards the concept of Intelligent Domotic
Environment, adopting either centralized or distributed approaches that extend current
domotic systems with suitable devices or agents. The decreasing cost of hardware, to-
gether with the constant increase in computational power and connection capabilities, is
a major driving force, which currently drives research efforts towards systems based on simple,
embedded PCs able to bridge the interconnection gap between domotic systems and to
bring intelligence to homes. In this context, Miori et al. [87] defined a framework called
DomoNet for domotic interoperability based on Web Services, XML and Internet proto-
cols. DomoNet defines so-called TechManagers, one for each integrated network, that
expose the domotic network capabilities as Web Services and act as proxies for the capa-
bilities of other networks (exposed by the corresponding TechManagers). TechManagers dy-
namically discover, register and de-register services through standard Web Service facili-
ties such as UDDI. DomoNet differs from DOG in several aspects: first, it replicates many
functionalities in virtual devices (proxies) available on each TechManager, requiring
considerable synchronization work, while DOG only allocates the necessary resources
by implementing a centralized approach. Second, DomoNet, although based on an ontology
model (DomoML), does not exploit facilities such as device abstraction/categorization,
functionality description and matching, and simply uses the ontology as a common vo-
cabulary for high-level XML messages.
Moon et al. [88] worked on a so-called Universal Middleware Bridge (UMB) for allowing
interoperability of heterogeneous home networks. Similarly to DOG, UMB adopts a cen-
tralized architecture where each device/network is integrated by means of a proper UMB
Adaptor. The Adaptor converts device-specific communication protocols and data (status,
functions, etc.) into a global, shared representation, which is then used to support inter-
operability. Differently from DOG, devices are described and abstracted by means of a
Universal Device Template, without an attached formal semantics. This prevents the sys-
tem from automatically performing device generalization, functionality abstraction/matching
and reasoning. UMB does not define a standard access point for the networks it manages,
and requires applications to implement a proper connection logic (UMB Adaptor) to inter-
act with home devices and networks. The home server, in UMB, is seen more as a router,
which correctly delivers messages between different Adaptors (i.e., networks or devices),
than as an intelligent component able to coordinate the connected devices/networks to reach
some goal.
Tokunaga et al. [89] defined a framework for connecting home computing middle-
ware, which tackles the problem of switching from one-to-one protocol bridges to one-to-
many conversions. In their work, Tokunaga et al. defined home computing middlewares
able to abstract/control physical devices, allowing several appliances to be coordinated
without a specific notion of the underlying networks or protocols. Protocol conversion is performed by the
so-called Protocol Conversion Manager, which translates local information into a newly de-
fined Virtual Service Gateway protocol. Different protocol conversions can be combined
to achieve multi-network or multi-device interoperability.
Recently, the literature reports some research about eye-gaze-controlled intelligent envi-
ronments. In these studies, two main interaction modalities are foreseen: direct interaction
and mediated interaction. In direct interaction paradigms, gaze is used to select and con-
trol devices and appliances, either with head-mounted devices that can recognize objects
[90] or through “intelligent” devices that can detect when people stare at them [81]. Using
mediated interaction, instead, people control a software application (hosted on desktop or
portable PCs) through gaze, thus being able to control all home appliances and devices
[91].
While interesting and sometimes very effective, the currently available solu-
tions only try to solve specific sub-problems of human-environment interaction, focusing
on single interaction patterns and interfacing a single, or few, home automation technologies.
This chapter, instead, aims at integrating different interaction patterns, possibly exploiting
the advantages of all, and aspires to interoperate with virtually every domotic network and
appliance. The final goal is to provide a complete environment where users can inter-
act with their house using the most efficient interaction pattern, depending on their abilities
and on the kind of activities they want to perform.
4.2 General architecture
Combining gaze interaction and home automation requires an open and extensible logic
architecture, for easily supporting different interaction modalities on one side, and dif-
ferent domotic systems and devices on the other. Several aspects must be
mediated in some way, including different communication protocols, different communication means,
and different interface objects. Mediation implies, in a sense, centralization, i.e., defining a
logic hub in which specific, low-level aspects are unified and harmonized into a common
high-level specification.
In the proposed approach, the unification point is materialized by the concept of a
“house manager”, which is the heart of the whole logic architecture (Figure 4.1) and acts
as a gateway between the user and the home environment.
Figure 4.1. The general architecture of the proposed domotic system
On the “home side”, DOG interfaces both domotic systems and isolated devices
capable of communicating over some network, through the appropriate low-level proto-
cols (different for each system). Every message on this side is abstracted according to
a high-level semantic representation of the home environment and of the functions pro-
vided by each device. The state of home devices is tracked, and the local interactions
are converted to a common event-based paradigm. As a result, low-level, local events
and commands are translated into high-level, unified messages which can be exchanged
according to a common protocol.
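The abstraction step described above can be sketched as follows. All class names, frame formats and device identifiers are illustrative assumptions, not DOG's actual API: a network-specific event (here, a fictitious KNX-style frame) is mapped to a single network-neutral representation before being forwarded.

```java
import java.util.Map;

public class EventAbstraction {

    // Unified high-level event: device identifier plus state,
    // independent of the source network technology.
    record HighLevelEvent(String device, String state) { }

    // Illustrative per-network translation table; a real driver would
    // parse the native protocol instead of using a static map.
    static final Map<String, HighLevelEvent> KNX_EVENTS = Map.of(
            "1/0/7:0x01", new HighLevelEvent("KitchenLight", "ON"),
            "1/0/7:0x00", new HighLevelEvent("KitchenLight", "OFF"));

    // Translate a raw network frame into the common event representation.
    static HighLevelEvent abstractKnxFrame(String rawFrame) {
        HighLevelEvent e = KNX_EVENTS.get(rawFrame);
        if (e == null) {
            throw new IllegalArgumentException("unknown frame: " + rawFrame);
        }
        return e;
    }

    public static void main(String[] args) {
        // A raw frame on group address 1/0/7 becomes a neutral event.
        System.out.println(abstractKnxFrame("1/0/7:0x01"));
    }
}
```

The same neutral event type would be produced regardless of whether the originating network is KNX, OpenWebNet or a simulator, which is what makes the common protocol possible.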
On the application side, the high-level protocol provided by the manager gives home
access to several interface models, based on either direct or mediated interaction. Two
main models are discussed in this chapter: the first based on attentive devices and the second
based on a more classical menu-based interface. The interested reader may find further
details in [91].
4.3 Intelligent Domotic Environments
An Intelligent Domotic Environment (IDE, Figure 4.1) is usually composed of one², or
more, domotic systems, a variable set of (smart) home appliances, and a Home
Gateway that allows interoperation policies to be implemented and intelligent be-
haviors to be provided.
Domotic systems usually include domotic devices, such as plugs, lights, door and
shutter actuators, etc., and a so-called network-level gateway that allows tunneling low-
level protocol messages over more versatile, application-independent interconnection
technologies, e.g., Ethernet. These gateways are not suitable for implementing the features
needed by IDEs, as they have reduced computational power and are usually closed,
i.e., they cannot be programmed to provide more than factory-default functionalities.
However, they play a significant role in an IDE architecture, as they offer an easy-to-exploit
access point to domotic systems.
Appliances can be either “dumb” devices, which can only be controlled by switching on
and off the plugs to which they are connected, or “smart” devices, able to provide complex
functionalities and to control (or be controlled by) other devices, through a specific, often
IP-based, communication protocol.
The Home Gateway is the key component for achieving interoperation and intelli-
gence in IDEs; it is designed to respond to different requirements, ranging from simple
² In this case interoperation may not be needed, but intelligence still needs to be supported.
bridging of network-specific protocols to complex interaction support. These require-
ments can be attributed to three priority levels: level 1 priorities include all the features needed
to control different domotic systems using a single, high-level communication protocol
and a single access point; level 2 priorities define all the functionalities needed for defin-
ing inter-network automation scenarios and allowing inter-network control, e.g., enabling
a Konnex switch to control an OpenWebNet light; and level 3 requirements are related
to intelligent behaviors, user modeling and adaptation. Table 4.1 summarizes the
requirements, grouped by priority.
Table 4.1. Requirements for Home Gateways in IDEs.

R1 Interoperability
  R1.1 Domotic network connection: Interconnection of several domotic networks.
  R1.2 Basic interoperability: Translation/forwarding of messages across different networks.
  R1.3 High-level network protocol: Technology-independent, high-level network protocol for allowing neutral access to domotic networks.
  R1.4 API: Public API to allow external services to easily interact with home devices.

R2 Automation
  R2.1 Modeling: Abstract models to describe the house devices, their states and functionalities, to support effective user interaction and to provide the basis for home intelligence.
  R2.2 Complex scenarios: Ability to define and operate scenarios involving different networks/components.

R3 Intelligence
  R3.1 Offline intelligence: Ability to detect misconfigurations, structural problems, security issues, etc.
  R3.2 Online intelligence: Ability to implement runtime policies such as energy saving or fire prevention.
  R3.3 Adaptation: Learning of frequent interaction patterns to ease users' everyday activities.
  R3.4 Context-based intelligence: Proactive behavior driven by the current house state and context, aimed at reaching specific goals such as safety, energy saving, robustness to failures.
A domotic home equipped with a home gateway can be defined as an Intelligent Domotic
Environment if the gateway satisfies at least level 1 and level 2 priorities. Level 3 priorities
can be considered advanced functionalities and may impose tighter constraints on the
gateway, both from the software architecture and from the computational power points of
view.
4.4 OSGi framework
OSGi technology is a universal middleware that provides a service-oriented, component-
based environment for developers and offers standardized ways to manage the software
life cycle. It provides a general-purpose, secure and managed framework that supports
the deployment of extensible service applications known as bundles.
The OSGi platform consists of four layers: Security, Module, Life-cycle and Service.
The Security Layer is similar to standard Java security. This layer uses policy files to de-
termine what software bundles can and cannot do. It allows permissions to be manipulated
dynamically, i.e., changing policies on the fly or adding new policies for newly installed
components, and it optionally allows bundles to be signed.
The Module Layer hosts bundles, which can contain Java packages and resources. A bun-
dle exports zero or more packages, which can be imported by other bundles, while keeping
the other packages private. The importing bundle has to specify a range of compatible versions,
and the framework resolves such dependencies at run time.
The Life-cycle Layer manages the life cycle of bundles within the platform. Bundles can
be installed, started, stopped and uninstalled.
The Service Layer contains a registry where services are published. Bundles can register
their services, also specifying service properties, in the service registry. The service registry
provides search functionality over registered services, based on an LDAP query language.
Every bundle can interact with other bundles just by using and supplying services, re-
specting the specification constraints included in every bundle.
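The Service Layer idea can be illustrated with a toy registry. This is a simplified sketch, not the real OSGi API (which uses `BundleContext.registerService` and LDAP filter strings such as `(network=knx)`); here a plain predicate over the property map stands in for the LDAP filter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class MiniServiceRegistry {

    // A registration pairs the published service object with its properties.
    public record Registration(Object service, Map<String, String> properties) { }

    private final Map<String, List<Registration>> services = new HashMap<>();

    // A bundle publishes a service under an interface name, with properties.
    public void register(String interfaceName, Object service, Map<String, String> props) {
        services.computeIfAbsent(interfaceName, k -> new ArrayList<>())
                .add(new Registration(service, props));
    }

    // A consumer looks services up by interface name plus a property filter
    // (the stand-in for OSGi's LDAP filter syntax).
    public List<Object> lookup(String interfaceName, Predicate<Map<String, String>> filter) {
        return services.getOrDefault(interfaceName, List.of()).stream()
                .filter(r -> filter.test(r.properties()))
                .map(Registration::service)
                .toList();
    }

    public static void main(String[] args) {
        MiniServiceRegistry reg = new MiniServiceRegistry();
        reg.register("CommandExecutor", "knxDriver", Map.of("network", "knx"));
        reg.register("CommandExecutor", "openWebNetDriver", Map.of("network", "openwebnet"));
        // Equivalent of the LDAP filter "(network=knx)".
        System.out.println(reg.lookup("CommandExecutor", p -> "knx".equals(p.get("network"))));
    }
}
```

Decoupling consumers from providers through such a registry is what lets DOG bundles interact only through declared services.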
4.5 DOG Architecture
DOG is a domotic gateway designed to respond to different requirements, ranging from
simple bridging of network-specific protocols to complex interaction support.
Design principles include versatility, addressed through the adoption of an OSGi [92]
based architecture; advanced intelligence support, tackled by formally modeling the home
environment and by defining suitable reasoning mechanisms; and accessibility to external
applications, through a well-defined, standard API.
DOG is organized in a layered architecture with four rings, each dealing with different
tasks and goals, ranging from low-level interconnection issues to high-level modeling and
interfacing (Figure 4.2). Each ring includes several OSGi bundles, corresponding to the
functional modules of the platform.
Ring 0 includes the DOG common library and the bundles necessary to control and
manage the interactions between the OSGi platform and the other DOG bundles. At this level,
system events related to runtime configurations, errors or failures are generated and for-
warded to the entire DOG platform. Ring 1 encompasses the DOG bundles that provide
an interface to the various domotic networks to which DOG can be connected. Each net-
work technology is managed by a dedicated driver, similar to device drivers in operating
systems, which abstracts network-specific protocols into a common, high-level represen-
tation that allows different devices to be driven uniformly (thus satisfying requirement R1.1).
Ring 2 provides the routing infrastructure for messages travelling between network drivers
and the other DOG bundles. Ring 2 also hosts the core intelligence of DOG, based on
an abstract formal model of the domotic environment (the DogOnt ontology), imple-
mented in the House Model bundle (R1.2, R1.3, R2.1 and, partially, R2.2³). Finally, Ring
3 hosts the DOG bundles offering access to external applications, either by means of an
API bundle, for OSGi applications, or through an XML-RPC endpoint for applications based
on other technologies (R1.4).
In the following subsections each DOG bundle is described in more detail, focusing
on the provided services and functionalities.
³ In the currently implemented version, external applications can control many domotic networks as a single home automation system, while network-to-network integration is still being implemented.
Figure 4.2. DOG architecture. Ring 0: DOG Library, Configuration Registry, Platform Manager; Ring 1: Network Drivers (MyHome, Konnex, Simulator); Ring 2: Message Dispatcher, Executor, House Model, Status; Ring 3: API, XML-RPC.
4.5.1 Ring 0
DOG library This bundle acts as a library repository for all the other DOG bundles. It de-
fines the interfaces (Table 4.2) through which bundles can interact, either by providing
or consuming services, and the DogMessage objects exchanged by bundles during runtime
operations. DogMessages are composed of a type declaration, which identifies the type of
content, and a payload that stores the content. A subset of DogMessages is also available
to external applications, either through OSGi integration or XML-RPC calls: such exposed
messages, called DogML messages, are encoded in XML and must be valid according to
the DogML Schema (XSD) provided by the DOG Library bundle.
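The type-plus-payload structure described above might be sketched as follows; the class and field names are illustrative assumptions (DOG's actual DogMessage definition is not reproduced here), with a plain string standing in for the payload.

```java
public class DogMessageSketch {

    // A message carries a type declaration identifying the kind of content,
    // plus a payload storing the content itself (here an XML fragment).
    public record DogMessage(String type, String payload) {
        public DogMessage {
            if (type == null || type.isBlank()) {
                throw new IllegalArgumentException("a message must declare its type");
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical command message; the XML shape is illustrative only.
        DogMessage cmd = new DogMessage(
                "command", "<command device='KitchenLight' name='ON'/>");
        System.out.println(cmd.type() + " -> " + cmd.payload());
    }
}
```

Keeping the type separate from the payload is what later allows the Executor to check that the declared type matches the actual content.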
Platform manager This bundle handles the correct start-up of the whole system and
manages the life cycle of the DOG bundles. The platform manager coordinates bundle acti-
vations, enforcing the correct start-up order, and manages bundle errors with a two-stage
strategy. First, it attempts to restart modules brought down by uncaught exceptions; then,
if after re-bootstrapping the bundles are still in error, it notifies the unavailability of the in-
terrupted services to all other DOG bundles. The platform manager can be extended to
integrate principles from autonomic computing, exploiting more advanced techniques to
react to failures and keep DOG operational as long as possible. Bundle failure management
is the basis of the more complex autonomic behaviour that the next version of DOG should
support, making it possible to build a decentralized engine that can continue working even
if some of its parts become unusable. When a bundle becomes available, DOG is immediately
notified and can start using it: the availability of its services can trigger the availability of other
bundles and services, all without restarting the system.
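The two-stage error strategy described above can be sketched as follows; the interface and method names are illustrative, not the platform manager's actual code. Stage 1 attempts a restart, stage 2 broadcasts the unavailability only if the restart fails.

```java
import java.util.ArrayList;
import java.util.List;

public class TwoStageRecovery {

    // Minimal stand-in for a managed bundle.
    interface Bundle {
        boolean restart(); // true if the bundle came back up
        String name();
    }

    // Stand-in for the notifications sent to the other DOG bundles.
    static final List<String> notifications = new ArrayList<>();

    // Stage 1: try to restart the failed bundle.
    // Stage 2: if still in error, notify that its services are unavailable.
    static void handleFailure(Bundle b) {
        if (!b.restart()) {
            notifications.add("unavailable: " + b.name());
        }
    }

    public static void main(String[] args) {
        handleFailure(new Bundle() {
            public boolean restart() { return false; } // simulated persistent failure
            public String name() { return "KonnexDriver"; }
        });
        System.out.println(notifications);
    }
}
```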
Configuration Registry The Configuration Registry implements the Configurator in-
terface by maintaining and exporting bundle configuration parameters. Typical examples
of such parameters are the IP addresses and ports of the network-level gateways interfaced
by the network drivers, the ontology repository location for the House Model bundle, the
bundle versions needed to manage compatibility issues, and so on.
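A Configurator-style registry might look like the following sketch; the class name, key scheme and sample values (including the KNXnet/IP port) are illustrative assumptions, not the actual DOG configuration format.

```java
import java.util.Map;

public class ConfiguratorSketch {

    // Flat key-value store of start-up parameters, keyed by "<bundle>.<parameter>".
    private final Map<String, String> parameters;

    public ConfiguratorSketch(Map<String, String> parameters) {
        this.parameters = parameters;
    }

    // A bundle retrieves its own parameters during the start-up phase.
    public String get(String bundle, String key) {
        return parameters.get(bundle + "." + key);
    }

    public static void main(String[] args) {
        ConfiguratorSketch cfg = new ConfiguratorSketch(Map.of(
                "KonnexDriver.gateway.ip", "192.168.0.10",
                "KonnexDriver.gateway.port", "3671",
                "HouseModel.ontology.location", "file:dogont.owl"));
        System.out.println(cfg.get("KonnexDriver", "gateway.ip"));
    }
}
```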
4.5.2 Ring 1
Network Drivers In order to interface domotic networks, DOG provides a set of Net-
work Drivers, one for each technology (e.g., KNX, OpenWebNet, X10, etc.).
Every network driver implements a “self-configuration” phase, in which it interacts with
the House Model (through the HouseModeling interface) to retrieve the list of devices to
be managed, together with a description of their functionalities, in the DogOnt format.
Every device description carries all the needed low-level information, such as the device ad-
dress, according to the network-dependent addressing format (simple in OpenWebNet,
subdivided into group and individual addresses in KNX, etc.).
Network Drivers translate messages back and forth between DOG bundles and network-
level gateways. They implement the CommandExecutor interface, which supports queries
and commands issued by other DOG bundles, and they use the services defined
by the StatusListener interface to propagate state changes to registered listeners (the DOG
Status bundle, for example), e.g., “light A11 is now OFF”. Monitoring, at the Network Driver
level, can either be done by listening to network events or by performing polling cycles
when domotic networks do not propagate state-change messages. In both cases, the driver
bundles provide state updates to the other DOG bundles by using the typical event-based
interaction paradigm supported by OSGi.
Currently, three Network Drivers have been developed: Konnex, OpenWebNet (BTicino)
and Simulator. During start-up, the Network Drivers interact with the HouseModeler
services to retrieve information about the devices they control; this information includes
both device typology and low-level data, such as physical addresses, group numbers, etc.
The Konnex and BTicino bundles translate the high-level commands coming from the
applications into the low-level signals and messages of their respective networks; they
also convert the low-level events originating in their networks into the DogMessage
format and forward them to all the bundles that export the StatusListener service. The
Simulator network driver controls a virtual domestic environment containing fictitious
devices and simulates their behaviour, either by randomly generating events or by
executing a pre-recorded list of commands.
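The driver pattern described above can be sketched as follows; the interface and class names mirror the text but are simplified assumptions, not DOG's actual signatures. A driver executes a high-level command and propagates the resulting state change to registered listeners.

```java
import java.util.ArrayList;
import java.util.List;

public class DriverSketch {

    // Simplified stand-ins for the interfaces described in the text.
    interface StateListener { void stateChanged(String device, String state); }
    interface CommandExecutor { void execute(String device, String command); }

    // A toy simulator driver: no real network, commands take effect directly.
    static class SimulatorDriver implements CommandExecutor {
        private final List<StateListener> listeners = new ArrayList<>();

        void addListener(StateListener l) { listeners.add(l); }

        @Override
        public void execute(String device, String command) {
            // A real driver would emit a network frame here; the simulator
            // just applies the command and notifies listeners,
            // e.g. "light A11 is now OFF".
            for (StateListener l : listeners) {
                l.stateChanged(device, command);
            }
        }
    }

    public static void main(String[] args) {
        SimulatorDriver driver = new SimulatorDriver();
        driver.addListener((device, state) -> System.out.println(device + " is now " + state));
        driver.execute("A11", "OFF");
    }
}
```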
Table 4.2. Interfaces defined by the DOG library bundle

ApiConnector: Allows OSGi bundles to control the domotic environment through a technology-independent set of functionalities. More precisely, it allows to get the IDE configuration, to send commands to connected devices, to query the device states and to receive state-change notifications.
Configurable: Supports runtime configuration of bundles. Every Configurable bundle can be tuned by external applications, which can adjust the parameters exposed through this interface.
CommandExecutor: Provides means to propagate commands to the proper bundles.
HouseModeling: Provides access to the house formal model (DogOnt). It is used to retrieve the house configuration, which is propagated to network drivers, to get device re-mapping, allowing to automatically recognize new devices, to semantically check commands and notifications, and to resolve group commands (scenarios).
StateAndCommandChecker: Defines methods for validating commands and states in DOG. Validation is both syntactic, ensuring that received messages (in XML) are well formed and valid, and semantic, thus guaranteeing that every device is driven by using the appropriate commands according to the HouseModeling interface.
StateListener: Allows bundles to be notified when managed devices change their states.
StateProvider: Provides information about the current state of devices.
Configurator: Defines a repository of start-up bundle configurations. Each bundle accesses the services exposed through this interface in the start-up phase, to retrieve the needed configuration parameters.
DevicesListUpgrade: Permits to update the routing tables and the list of devices currently managed by DOG. In conjunction with the Configurable interface, it allows runtime changes of the DOG configuration, supporting dynamic scenarios where new devices/networks can be hot-plugged into the system.
4.5.3 Ring 2
Message Dispatcher
This bundle acts as an internal router, delivering messages to the correct destinations, be they the network drivers (commands and state polls) or other DOG bundles (notifications). Routing is driven by a routing table in which network drivers are associated with high-level device identifiers, enabling DOG to deliver commands to the right domotic network. For example, if a high-level DogMessage specifies that the kitchen light must be lit, and the House Model reports that the light belongs to the Konnex plant, then the Message Dispatcher routes the message to the Konnex network driver. The routing table, dynamically built through the DevicesListUpgrade service, is initially provided by the Configurator bundle during the start-up phase and constantly updated by the Network Drivers.
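The routing-table lookup described above can be sketched as follows. This is a simplified illustration in plain Java, not DOG's actual code; the class and method names are invented.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the Message Dispatcher's routing table: it maps
// high-level device identifiers to the name of the network driver
// responsible for them.
public class RoutingTable {
    private final Map<String, String> deviceToDriver = new HashMap<>();

    // Entries come from the Configurator at start-up and from the
    // DevicesListUpgrade service at runtime.
    public void register(String deviceId, String driverName) {
        deviceToDriver.put(deviceId, driverName);
    }

    // Resolve the driver that must receive a command for the given device.
    public String route(String deviceId) {
        String driver = deviceToDriver.get(deviceId);
        if (driver == null) {
            throw new IllegalArgumentException("No driver for device: " + deviceId);
        }
        return driver;
    }
}
```

In the kitchen-light example above, the table entry for the kitchen lamp would point at the Konnex driver, so the DogMessage is handed to that driver only.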
4.5 – DOG Architecture
Executor
The Executor validates commands received from the API bundle, either directly or through the XML-RPC protocol. Commands are syntactically validated, by checking the relation between the DogMessage type declaration and the DogMessage content, and semantically validated, by checking them against the set of commands modeled by the HouseModel ontology. If all checks are passed, the Executor forwards messages to the Message Dispatcher for final delivery; otherwise messages are dropped, avoiding an inconsistent platform state. Thanks to its role in filtering and checking high-level messages in DOG, the Executor is a suitable candidate for future implementations of rule-based, runtime automation scenarios or safety policies, including command prioritization, required for implementing security and manual-override mechanisms. This bundle listens for commands from the API bundle, validates them using the Status bundle services and, finally, if the validation was successful, forwards the commands to the Message Dispatcher bundle. An alternative system architecture could merge the Executor and Message Dispatcher bundles, which currently perform similar functions, i.e., they both forward commands from the higher levels of the system to the lower ones. Nevertheless, these bundles have been kept separate because in future versions of DOG their roles will diverge more and more: the Executor bundle will manage the smart forwarding of messages, based on priority rules.

Rule Engine and Virtual Device Manager
These bundles are the core of the DOG intelligence. The Rule Engine allows the management of user-defined scenarios and automated reactions to alarms. The Virtual Device Manager provides device-level abstraction for real devices, adding capabilities through software emulation.
Status
The Status bundle caches the states of all devices controlled by DOG by listening for notifications coming from network drivers. This state cache is extensively used in DOG to reduce network traffic on domotic busses and to filter out unnecessary commands, e.g., commands whose effects would leave the state of the destination devices unchanged. Since missed network-level messages or other network-related errors may result in an occasional state-cache inconsistency, the Status bundle performs a low-priority polling cycle, in which suitable DogMessages are generated for querying the Network Drivers and
consequently updating the state cache. The same query messages can also be generated by the API bundle to take a real-time snapshot of the house state directly, bypassing the Status module.
In addition, the Status bundle offers the StatusChecker service, which validates the commands sent by external applications. The validation involves two steps: a syntactic and a semantic check of the commands. The first step verifies that the DogMessage containing the command is well formed, while the second step checks:

• the existence of the devices referenced by the command;

• the usefulness of the command, i.e., whether the command would actually produce a change in the device states;

• the semantic correctness of the command, i.e., whether the devices referenced by the command support its execution. For example, the validation fails if the command attempts to OPEN a LAMP.
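The two-step check can be sketched as follows. This is an illustrative fragment with invented names, not DOG's actual API, and the syntactic step is reduced to a trivial well-formedness test.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of the StatusChecker validation steps.
public class CommandChecker {
    private final Map<String, Set<String>> supportedCommands = new HashMap<>();
    private final Map<String, String> currentStates = new HashMap<>();

    // Register a device with its current state and the commands it supports.
    public void registerDevice(String device, String state, String... commands) {
        supportedCommands.put(device, new HashSet<>(Arrays.asList(commands)));
        currentStates.put(device, state);
    }

    // Step 1: syntactic check, reduced here to a trivial well-formedness test.
    public boolean isWellFormed(String message) {
        return message != null
            && message.startsWith("<dogMessage>")
            && message.endsWith("</dogMessage>");
    }

    // Step 2: semantic checks - the device exists, the command is supported
    // (e.g. OPEN on a LAMP fails), and the command is useful, i.e. it would
    // actually change the device state.
    public boolean isSemanticallyValid(String device, String command) {
        Set<String> commands = supportedCommands.get(device);
        if (commands == null || !commands.contains(command)) {
            return false;
        }
        return !command.equals(currentStates.get(device));
    }
}
```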
House Model
The House Model is the core of intelligence of the DOG platform. It is based on a formal model of the house defined by instantiating the DogOnt ontology [93]. According to Gruber's definition [94], an ontology is an "explicit specification of a conceptualization," which is, in turn, "the objects, concepts, and other entities that are presumed to exist in some area of interest and the relationships that hold among them". Today's W3C Semantic Web standard suggests a specific formalism for encoding ontologies (OWL), in several variants that vary in expressive power [95]. DogOnt is an OWL meta-model for the domotics domain describing where a domotic device is located, the set of its capabilities, the technology-specific features needed to interface it, and the possible configurations it can assume. Additionally, it models how the home environment is composed and what kind of architectural elements and furniture are placed inside the home. It is organized along 5 main hierarchy trees: Building Thing, modeling available things (either controllable or not); Building Environment, modeling where things are located; State, modeling the stable configurations that controllable things can
assume; Functionality, modeling what controllable things can do; and Domotic Network Component, modeling the peculiar features of each domotic plant (or network).

Figure 4.3. An overview of the DogOnt ontology.

The Building Thing tree subsumes the Controllable concept and its descendants, which are used to model devices belonging to domotic systems or that can be controlled by them.
Devices are described in terms of capabilities (Functionality concept) and possible configurations (State concept). Functionalities are mainly divided into Continuous and Discrete, the former describing capabilities that can be varied continuously and the latter referring to the ability to change device configurations in a discontinuous manner, e.g., to switch on a light. In addition, they are also categorized depending on their goal, i.e., whether they allow controlling a device (Control Functionality), querying a device condition (Query Functionality) or notifying a condition change (Notification Functionality).

Each functionality instance defines the set of associated commands and, for continuous functionalities, the range of allowed values, thus enabling runtime validation of commands issued to devices. Devices also possess a state instance deriving from a State subclass, which describes the stable configurations that a device can assume. Each State class defines the set of allowed state values; states, like functionalities, are divided into Continuous and Discrete.
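For illustration, the runtime range check that a continuous functionality definition enables could look like the following sketch; the class name and representation are ours, not DogOnt's.

```java
// Illustrative sketch: a continuous functionality carries the range of
// allowed command values declared in the ontology, and a command value is
// accepted only if it falls inside that range (e.g. a dimmer SET command
// between 0 and 100 percent).
public class ContinuousFunctionality {
    private final double min;
    private final double max;

    public ContinuousFunctionality(double min, double max) {
        this.min = min;
        this.max = max;
    }

    // Runtime validation of a command value against the declared range.
    public boolean accepts(double value) {
        return value >= min && value <= max;
    }
}
```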
DOG uses the DogOnt ontology to implement several functionalities (Section 4.6), encompassing command validation at run-time, using information encoded in functionalities; stateful operation, using the state instances associated to each device; and device abstraction, leveraging the hierarchy of classes in the Controllable subtree. The last
operation, in particular, makes it possible to treat unknown devices as instances of a more generic type, e.g., a dimmer lamp can be controlled as a simple on/off lamp. Ontology instances modeling controlled environments are created off-line by means of proper editing tools, some of which are currently being designed by the authors, and may leverage auto-discovery facilities provided by the domotic systems interfaced by DOG.
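A minimal sketch of this device-abstraction walk, with invented names and plain Java maps standing in for the ontology hierarchy, might look like:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the generalization step: if a driver has no mapping for a
// device type, the class hierarchy is walked upwards until a supported
// ancestor is found. Type names follow the thesis examples; the code is
// illustrative, not DOG's implementation.
public class TypeGeneralizer {
    private final Map<String, String> parentOf = new HashMap<>();
    private final Set<String> supportedByDriver = new HashSet<>();

    public void declareSubClass(String child, String parent) {
        parentOf.put(child, parent);
    }

    public void declareSupported(String type) {
        supportedByDriver.add(type);
    }

    // Returns the most specific supported type, or null if none exists.
    public String resolve(String type) {
        String t = type;
        while (t != null && !supportedByDriver.contains(t)) {
            t = parentOf.get(t); // generalize: e.g. DimmerLamp -> Lamp
        }
        return t;
    }
}
```

With this fallback, a DimmerLamp unknown to a driver is still controllable through the commands of its Lamp ancestor, at the price of losing the dimming capability.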
4.5.4 Ring 3
API
Services provided by DOG are exposed to external OSGi-based applications by means of the API bundle, which allows retrieving the house configuration, sending commands to devices managed by DOG and receiving house events.

• getConfiguration. The ability to request the house configuration, including the possible states and the allowed commands of each device managed by DOG.

• setCommand. The ability to send single and multiple commands to devices managed by DOG, independently of the domotic network to which they are connected (thus supporting inter-network scenarios).

• setListener. The possibility to register an application as an event listener, thus enabling event-based interaction with IDEs, even if the managed networks natively require a polling-based interaction.

• getDeviceStatus. The ability to directly check the state of house devices, bypassing the internal cache hosted by the Status bundle.
The API bundle is able to directly interact with the House Model every time a complex, multiple command must be resolved into a set of single commands to be issued to the proper network drivers. For example, the command for switching all lights off is converted by the API module into a set of "Switch OFF" commands issued to all devices modeled as Lamp in the House Model ontology. Messages exchanged between the API bundle and external OSGi applications must conform to the DogML schema defined by the DOG Library bundle.
XmlRPC
The XmlRPC bundle simply provides an XML-RPC endpoint for the services offered by the API bundle, thus enabling non-OSGi applications to exploit DOG services. It implements a light-weight web server able to listen for remote procedure calls and to map such calls to API calls. Similarly to the API bundle, all messages exchanged by the XML-RPC bundle must conform to the DogMessage XSD; as a consequence, all exported methods require a single string parameter holding the request message in XML. This bundle is strictly connected to the API bundle and makes use of the ApiConnector service in order to allow external applications to control the domotic environment. It runs an XML-RPC server that wraps and exports the four control methods offered by the API bundle. All the published methods accept strings as parameters and return strings; those strings contain DogML XML messages. Applications can retrieve the device states by a polling technique or by registering themselves as status listeners.
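As a hedged illustration of this single-string convention, the fragment below builds a command request the way a non-OSGi client might before posting it to the XML-RPC endpoint. The element and attribute names are invented for this sketch; real messages must conform to the DogMessage XSD.

```java
// Hypothetical sketch: every exported XML-RPC method takes one XML string
// and returns one XML string, so a client assembles a message like this
// and passes it as the single parameter of the remote call.
public class DogMessageFactory {
    // Build a (simplified, invented) command message for one device.
    public static String command(String deviceId, String commandName) {
        return "<dogMessage type=\"command\">"
             + "<device id=\"" + deviceId + "\"/>"
             + "<command name=\"" + commandName + "\"/>"
             + "</dogMessage>";
    }
}
```

The string returned by `command(...)` would then be sent as the single argument of the wrapped setCommand method, and the reply would be another XML string to be parsed by the client.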
4.6 Ontology-based interoperation in DOG
4.6.1 Start-up
In the start-up phase, the information contained in the DogOnt ontology instantiation that models the controlled environment, exposed through the House Model bundle, is queried to configure network drivers and to deal with unknown device types. When a DOG instance is run, the DOG bundles are started, with a bootstrap order defined by the Platform Manager bundle. The House Model is one of the first available services and is used by network drivers to get the list of their managed devices. The first interaction step involves querying a DogOnt instantiation, using SPARQL, to extract device descriptions filtered by technology (e.g., searching specific DomoticNetworkComponent instances). Each device description contains the device name as well as all the low-level details needed by drivers to communicate with the corresponding real device.
Once the complete device list is received, each driver builds a mapping table for translating high-level commands and states defined in DogOnt into suitable sequences of protocol-specific messages.

SELECT ?x WHERE { ?x a d:OpenWebNetComponent }
(a)

SELECT ?x WHERE { ?x a d:KonnexComponent }
(b)

Figure 4.4. SPARQL queries for retrieving all BTicino OpenWebNet (a) and all KNX (b) components in DogOnt.

In this phase, drivers can possibly find unsupported devices, i.e., devices that they cannot control as no mapping between high- and low-level messages is defined yet. In this case, a further interaction with the House Model requests a generalization step for instances of unknown devices. For each unknown device, the House Model retrieves the super-classes and provides their descriptions back to the network drivers. In this way specific devices (e.g., a dimmer lamp) can be treated as more generic and simpler ones (e.g., a lamp), for which network drivers have the proper mapping information. This automatic reconfiguration capability to deal with unsupported devices sustains DOG scalability: even if devices (and their formalization) evolve more rapidly than drivers, they can still be controlled by DOG, although in a restricted manner.
4.6.2 Runtime command validation
At runtime, the DogOnt instantiation exposed by the House Model is exploited to semantically validate received requests and internally generated commands. For each DogMessage requiring the execution of a command, i.e., requiring an action on a given domotic component, the command value is validated against the set of possible values defined in DogOnt for that component type. Validation proceeds as follows: when a DogMessage containing a command needs validation, the House Model queries DogOnt for the allowed commands (Figure 4.5) and, if necessary, retrieves the allowed range associated to each of its parameters. If the DogMessage command complies with the constraints extracted from the ontology model, the command is considered valid and propagation to the DOG Message Dispatcher is allowed; otherwise the command is rejected and the message dropped without any further consequences (except logging).
4.6.3 Inter-network scenarios
Together with validation and automatic generalization of devices, the House Model assumes a crucial role in the definition of scenarios and commands involving more than one domotic network. This is a typical case for home scenarios, i.e., for commands that
SELECT ?h WHERE {
  d:DimmerLamp rdfs:subClassOf ?q .
  ?q rdfs:subClassOf ?y .
  ?y rdf:type owl:Restriction .
  ?y owl:onProperty d:hasFunctionality .
  ?y owl:hasValue ?z .
  ?z a ?p .
  ?p rdfs:subClassOf ?l .
  ?l rdf:type owl:Restriction .
  ?l owl:onProperty d:commands .
  ?l owl:hasValue ?h
}

Figure 4.5. The SPARQL query needed to retrieve the commands that can be issued to a specific device, e.g. a DimmerLamp.
coordinate different devices to reach a given high-level goal, for example to set up a comfortable environment for watching TV.

If a scenario involves devices belonging to different domotic plants, the abstraction introduced by DogOnt allows operations to be defined in a technology-neutral way and the corresponding DogMessages to be properly generated and then converted into network-specific calls.
Example
A very common scenario, available in almost all domotic environments, is the "switch-all-lights-off" scenario, which turns off all the lights of a given domotic home. If the sample home contains more than one domotic plant, DOG allows the scenario to be implemented easily, by means of its House Model bundle. Thanks to the abstraction provided by the DogOnt instantiation managed by the House Model, the "switch-all-lights-off" scenario can simply be modeled by a rule stating that all Lamp devices shall receive an OFF command, defined by the basic OnOffFunctionality instance associated to each Lamp (Figure 4.6).
Lamp(?x) ^ hasState(?x,?y) ^ DiscreteState(?y) ^ valueDiscrete(?y,?z) ^ equals(?z,"ON") -> valueDiscrete(?y,"OFF")

Lamp(?x) ^ hasState(?x,?y) ^ ContinuousState(?y) ^ valueContinuous(?y,?z) ^ ge(?z,0) -> valueContinuous(?y,0)

Figure 4.6. The switch-all-lights-off rule, in SWRL notation.
This rule, when triggered by a call to the API bundle, requires a reasoning step, called Transitive Closure, that propagates properties (e.g., functionalities) along the ontology hierarchy, thus allowing all the instances of Lamp descendants to be recognized as
Lamps. For each of them, a suitable DogMessage is generated, carrying detailed information about the destination device, modeled in the ontology by subclassing the proper DomoticNetworkComponent. The resulting DogMessages are then propagated by the Message Dispatcher to the network drivers, which, in turn, power off the corresponding lamp devices, be they simple lamps, dimmers or very complex illumination systems.
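The fan-out step can be sketched as follows. This is an illustrative fragment, assuming the House Model reasoning has already flattened every Lamp descendant into a plain list of Lamp instances; the message format shown is invented.

```java
// Sketch of the "switch-all-lights-off" fan-out: after the transitive-
// closure step has recognized every Lamp descendant as a Lamp, one
// technology-neutral OFF message is generated per instance, to be routed
// later by the Message Dispatcher.
public class ScenarioExpander {
    public static String[] allLightsOff(String... lampInstances) {
        String[] messages = new String[lampInstances.length];
        for (int i = 0; i < lampInstances.length; i++) {
            messages[i] = lampInstances[i] + ":OFF";
        }
        return messages;
    }
}
```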
4.7 Case study
In order to test the DOG functionality in the field, an experimental setup involving two different domotic networks has been deployed, using a BTicino MyHome demo-box and a Konnex demo-box crafted by the authors using off-the-shelf Siemens and Merten KNX devices (Figure 4.7).
Figure 4.7. The demonstration cases used to perform functional tests on DOG.
DOG has been implemented in Java, as a set of OSGi bundles running on the Equinox open source framework [96]. The DogOnt ontology is managed by the House Model using the HP Jena API, while the XML-RPC module exploits the Apache XML-RPC API [97]. DOG is currently released under the Apache license and can run on very cheap devices such as the ASUS eeePC (used in the experiments), a sub-notebook based on an Intel Celeron processor, whose cost is about 300 euros (much less than the cost of the demonstration cases, which is around 3000 euros each).

Experiments aimed only at qualitatively testing the functionalities described in this chapter, while a more sound evaluation through performance benchmarks and user studies
is planned as future work. Experiments showed a generally responsive management of domotic devices, which always reacted within a reasonable time window (delays were unnoticeable), even when executing complex inter-network scenarios. Reaction to driver failures was satisfactory: even when disconnecting one of the two demonstration cases, the other continued to work. Moreover, the automatic driver detection mechanism of DOG effectively managed hot-plugging of the cases, exposing their device functionalities a few seconds after a demonstration case was plugged in.
4.7.1 Dynamic Startup Configuration
During the initial phase of the system start-up, the Platform Manager checks the list of correctly installed bundles, then starts them according to the order stored in its configuration files. The House Model bundle is started just after two system modules, i.e., the DOG library and the configuration registry bundles. The network driver bundles first check the connections with the physical network and then communicate with the House Model to retrieve the list of devices that they can control. The device list includes the device names, the device types and the low-level configuration parameters (address, port, group number, . . . ). The network drivers have to map the abstract types provided by the ontology, included in the House Model bundle, onto real device types. In detail, they translate each abstract command or state (e.g. ON, OFF, CLOSE, . . . ) into the proper signals or messages according to the network protocol, and vice versa. The DOG architecture allows controlling recent devices even if the Network Driver does not completely support them. When a network driver bundle receives an unknown device type from the House Model, it immediately queries the hierarchy of that device type: if the driver supports at least one parent of the unknown type, the bundle can control the device by using the functionalities of that parent. For example, suppose a new flashing lamp has been installed and configured (see Section 4.7.3: Adding a new device) in the domotic environment and, unfortunately, the network driver provided with DOG has not been updated. In the start-up phase the network driver bundle learns that a device of the Flashing Lamp type exists. It cannot directly
control the device because it has not stored the translation mapping for that device type. The network driver bundle then queries the House Model for information about Flashing Lamp and finally learns that a Flashing Lamp is also a Lamp.
4.7.2 Complex Command Execution
A diagram of the complex command execution inside the DOG system is provided in Figure 4.8.
Figure 4.8. Complex Command Execution
The command execution involves 9 steps:

1. An external application sends to DOG a command in DogML format, for example "switch off all the lights in the house".

2. The API bundle parses the DogML message, envelopes it into a DogMessage, and then forwards that message to the House Model bundle.

3. The House Model performs reasoning on the DogOnt ontology (see Section 4.6) and retrieves the list of devices that match the query. In the considered example the House Model should return the Simple Lamps, the Dimmer Lamps and the Flashing Lamps.

4. The API generates a DogMessage containing the device list and the related command.

5. The Command Executor forwards the received DogMessage to the Status bundle.

6. The Status bundle validates and verifies the command included in the DogMessage and sends it back to the Command Executor.

7. If the command passed the validation, the Command Executor sends it to the Message Dispatcher; otherwise it generates an error message.

8. The Message Dispatcher routes the received DogMessages to the relevant Network Drivers, according to the routing table.

9. The Network Drivers translate the DogMessages into the respective low-level protocols and execute the commands.
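Steps 5 through 9 can be condensed into the following illustrative fragment; all names are invented, and the real bundles exchange DogMessages rather than plain strings.

```java
// Condensed sketch of the tail of the command path: validation gates the
// command, then the routing-table entry selects the destination driver.
public class CommandPath {
    // Steps 5-7: the Command Executor forwards the command only if the
    // Status bundle reports it valid; otherwise an error is produced.
    public static String executorStep(String command, boolean valid) {
        return valid ? "dispatch:" + command : "error:" + command;
    }

    // Steps 8-9: the Message Dispatcher hands the command to the driver
    // found in the routing table, which translates it into the low-level
    // protocol of its network.
    public static String dispatchStep(String command, String driver) {
        return driver + " <- " + command;
    }
}
```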
4.7.3 Adding a new device
There are different alternatives for installing a new device in the domotic environment, as shown in the diagram in Figure 4.9. This process may involve four categories of people:

Electricians: They are responsible for the physical connection and configuration of the new device.

Domotic technicians: They configure the House Model by modifying the DogOnt ontology. Typically, domotic technicians are trained electricians.

Network Driver Developers: They are specialized computer programmers who develop and upgrade the network drivers.

Users: They can handle easy configuration issues through a graphical user interface.
Figure 4.9. Adding a new device
When a new device has to be installed, an electrician provides the physical connections, then a DogOnt ontology update is necessary. DogOnt can be configured either by users, through a graphical interface, or by the domotic technicians. The intervention of a domotic technician is necessary when the device needs a complex configuration (e.g. a multimedia workstation that could control several other devices). If the new device belongs to a type already present in DogOnt, the ontology update consists in adding a new device instance to DogOnt. Otherwise a new device type has to be added to the ontology. If such a device type has a parent in DogOnt, the new device can be partially controlled; otherwise an upgrade of the relative network driver is necessary. The partial control is provided by DogOnt reasoning upon device-type inheritance. The Network Driver Developers have to develop up-to-date drivers to control devices of a completely new type. The DOG developers can develop a new device-type driver in
about 2 hours. The current version of DogOnt already includes several kinds of device types, so it is quite common to find at least one parent for the new device type; hence a new device can be controlled, even if only partially, just after the physical installation and the ontology configuration.
4.7.4 Comparison of DOG to related works
Table 4.3 shows the degree to which DOG and the related works described in this section satisfy the requirements for IDE domotic gateways (listed in Table 4.1).

Table 4.3. Requirements satisfied by related works, in comparison with DOG.

Req.   DomoNet   UMB   [89]   DOG
R1.1     ++       ++    ++     ++
R1.2     ++       ++    ++     ++
R1.3     ++       ++    ++     ++
R1.4     +        -     +      ++
R2.1     +        -     --     ++
R2.2     +        +     +      +
R3.1     --       --    --     -
R3.2     --       --    --     -
R3.3     --       --    --     -
R3.4     --       --    --     --

Legend: ++ completely satisfied; + partially satisfied; - easily satisfiable in the current architecture; -- requires significant platform reengineering.
4.7.5 Mediated Interaction
Configuring, activating or simply monitoring complex appliances, as well as complex scenarios, can become really difficult by only gazing at them. In these cases a mediated interaction, which allows users to control the several aspects involved in these operations through a menu-based PC application, can be more effective.

In the mediated interaction paradigm, gaze-based actions and reactions are accomplished through a menu-driven control application that allows users to fully interact with the domotic environment.
Such applications shall satisfy some constraints related to the different categories
of expected users. When users need a different application layout, related for example to the evolution of their impairment, they shall not be compelled to learn a different way of interacting with the application. In other words, the way in which commands are issued shall persist even if the layout, the calibration phase or the tracking mode changes.
To reach this goal, the interaction pattern that drives the command composition has to be very natural and shall be aware of the context of the application deployment. For example, in the real world, if users want to switch on the kitchen light, they go into that room, then they search for the proper switch and finally confirm the desired state change by actually switching on the light. This behaviour has to be preserved in the control application command composition, and the three involved steps must remain unvaried even if the application layout changes according to the eye tracker accuracy.
Figure 4.10. The control application with a quite accurate tracker.
In this work, mediated interaction can either be driven by infrared eye trackers (maximum accuracy/resolution) or by visible-light trackers (web-cam or videoconference cameras, minimum accuracy/resolution). These two extremes clearly require different visual layouts for the control application, due to differences in tracking resolution and movement granularity.
In the infrared tracking mode, the system is able to drive the computer mouse directly, thus allowing the user to select graphical elements as large as normal system icons (32x32 pixels wide). On the other hand, in the visible-light tracking mode only a few areas (6, as an example) on the screen can be selected (on a 1024x768 screen this means that each selectable area is approximately 341x384 pixels). As a consequence, the visual layout cannot remain the same in the two modes, but the interaction pattern shall persist in order to avoid forcing the user to re-learn the command composition process, which is usually annoying.
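The quoted area size follows from simple integer division, assuming the six selectable areas are arranged as a 3-by-2 grid (the grid shape is our assumption, consistent with the 341x384 figure above):

```java
// Back-of-the-envelope check of the selectable-area size: the screen is
// split into a grid of equally sized cells (integer division, matching
// the approximate figures quoted in the text).
public class LayoutAreas {
    public static int[] areaSize(int screenWidth, int screenHeight, int cols, int rows) {
        return new int[] { screenWidth / cols, screenHeight / rows };
    }
}
```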
Figure 4.11. The control application with a low-cost visible light tracker.
As a first experiment, we created the two possible layouts presented in Figures 4.10 and 4.11, in which the active areas are 32 pixels and 300 pixels wide, respectively. Despite a 10x change in tracking resolution, the information architecture remains coherent; only the layout disposition and the scanning sequences are affected. The prototype user interface has been designed to minimize screen clutter, thus easing eye tracking selection.
The complete interaction pattern implemented by the control application can be subdivided into two main components, referred to as the active and the passive interface. The former takes place when the user wants to explicitly issue a "command" to the house environment. Such a command can either be an actuation command (open the door, play the CD,
etc.) or a query command (is the fridge on?, ...).
The second part, instead, is related to alert messages or actions forwarded by the House Manager and the Interaction Manager for the general perception of the house status. Alerts and actions must be managed so that the user can timely notice what is happening and provide the proper responses. They are passive from the user's point of view, since the user is not required to actively perform a "check" operation, polling the house for possibly threatening situations or for detecting automatic actions; instead, the system pro-activity takes care of them. House state perception shall be passive, as the user cannot query every installed device to monitor the current home status.
The alerting mechanism and the status updates are priority-based: in normal operating conditions, status information is displayed on a banner, carefully positioned on the periphery of the visual interface to avoid capturing the user's attention too much, and kept out of the selectable area of the screen to avoid the so-called "Midas Touch" problem [98], where every element fixated by the user gets selected. In addition, the availability of a well-known rest position for the eyes to fixate is a tangible added value for the interface, which can therefore support user pauses and, at the same time, maximize the provided environment information.

Whenever high-priority information (alerts and Rule Engine actions) has to be conveyed to the user, the banner is highlighted and the control application plays an alert sound that requires immediate user attention.
4.7.6 Direct interaction
When the objects to be controlled or actuated are simple enough, a direct interaction approach can avoid the drawbacks of a conventional environmental control system, which typically utilises eye interaction with representative icons displayed on a 2D computer screen. In order to maximize the interface efficiency in these cases, a novel approach using direct eye interaction with real objects (environmental devices) in the 3D world has been developed. Looking directly at the object that the user wishes to control is an extremely intuitive form of user interaction, and by employing this approach the system
does not inherently need the user to sit incessantly before a computer monitor. This then makes it suitable for implementation in a wider range of situations and by users with a variety of abilities. For example, it immediately removes the need for the user first to be able to distinguish small icons or words, representative of environmental controllable devices, on a monitor before making a selection. The approach is termed ART (Attention Responsive Technology) [99]. For many individuals with a disability, the ability to control environmental devices without the help of a family member or carer is important, as it increases their independence. ART allows anyone who can control their saccadic eye movements to operate devices easily. A second advantage of the ART approach is that it simplifies the operation of such devices by removing the need to always present the user with an array of all potential controllable environmental devices every time the user wishes to operate one device. ART only presents the user with interface options directly related to a specific environmental device, that device being the one that the user has looked at.
Attention Responsive Technology (ART)
With the ART approach the user can sit or stand anywhere in the environment and indeed move about the environment quite freely. If s/he wants to change an environmental device's status, for instance to switch on a light, the user simply visually attends to (looks at) the light briefly. The ART system constantly monitors the user's eye movements and ascertains the allocation of visual attention within the environment, determining whether the user's gaze falls on any controllable device. The environment surrounding the user is imaged by a computer vision system, which identifies and locates any pre-known device falling within the user's point of gaze. If a device is identified as being gazed at, then the system presents a simple dialogue to ask the user to confirm his/her intention. The actual interface dialogue can be of any form, for instance a window on a touch-sensitive screen or any tailor-made approach depending on the requirements of the disabled users. Finally the user would execute an appropriate control to operate the device.
ART development with a head-mounted eye tracker
A laboratory-based prototype system and its software control interface have been devel-
oped [100, 101]. A head-mounted ASL 501 eye tracker, as shown in Figure 4.12, is used
to record a user’s saccadic eye movements. This comprises a control unit and a headband,
on which both a compact eye camera, which images one eye of the user, and a scene cam-
era, which images the environment in front of the user, are mounted. Eye movement data
are recorded at 50Hz from which fixation points of varying time periods can be derived.
In order to calibrate the eye movement recording system appropriately the user dons the
ASL system and then must first look at a calibration chart comprising a series of known
spatially arrayed points. The relationship between the eye gaze data
and their corresponding positions in the scene camera is built up by projecting the same
physical point in both coordinate systems using an affine transformation. Eye data are
therefore related to the scene camera image.
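The mapping just described can be sketched as a least-squares affine fit between the eye-camera and scene-camera coordinate systems. The sketch below is illustrative only; the function names and point values are hypothetical, not the ASL software's actual implementation:

```python
import numpy as np

def fit_affine(eye_pts, scene_pts):
    """Fit an affine map (scene = A @ eye + b) from calibration point pairs."""
    eye = np.asarray(eye_pts, dtype=float)
    scene = np.asarray(scene_pts, dtype=float)
    # Augment each eye-camera point with a constant 1: [x, y, 1]
    X = np.hstack([eye, np.ones((len(eye), 1))])
    # Least-squares solution of X @ M = scene; the 3x2 matrix M stacks A and b
    M, *_ = np.linalg.lstsq(X, scene, rcond=None)
    return M

def map_gaze(M, point):
    """Project an eye-camera gaze point into scene-camera coordinates."""
    x, y = point
    return np.array([x, y, 1.0]) @ M
```

With at least three non-collinear calibration points the affine parameters are fully determined; additional points over-determine the system, and the least-squares solution averages out measurement noise.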
Figure 4.12. The ASL 501 headband with the two optical systems attached.
In order for the ART system to recognise an object in the environment all controllable
devices are first imaged by the system. To do this each device is presented to the scene
camera and imaged at regularly spaced angles when their image SIFT features [102] are
extracted. These features are then stored in a database. New devices can easily be added,
as these simply need to be imaged by the ART system and their SIFT features automat-
ically added to the database. To complement each device added, the available device
control operations for it are added to the system, so that when that device is recognised by
the ART system those controls are proffered to the user.
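The database lookup can be illustrated with a small sketch. Descriptor extraction itself (SIFT [102], typically done with an image-processing library) is outside the scope of this sketch, which only shows the per-device descriptor store and a Lowe-style nearest-neighbour ratio test; all descriptor values, thresholds and device names below are hypothetical:

```python
import numpy as np

class DeviceDatabase:
    """Minimal sketch of the ART device store: SIFT-like descriptors per device."""

    def __init__(self):
        self.devices = {}  # device name -> (N, D) array of feature descriptors

    def add_device(self, name, descriptors):
        self.devices[name] = np.asarray(descriptors, dtype=float)

    def identify(self, query, ratio=0.75, min_matches=2):
        """Ratio test [102]: a query descriptor matches a device when its nearest
        stored descriptor is much closer than the second nearest. The device with
        the most good matches (at least min_matches) is reported."""
        best_name, best_count = None, 0
        for name, descs in self.devices.items():
            count = 0
            for q in np.asarray(query, dtype=float):
                d = np.linalg.norm(descs - q, axis=1)
                if len(d) >= 2:
                    first, second = np.sort(d)[:2]
                    if first < ratio * second:
                        count += 1
            if count > best_count:
                best_name, best_count = name, count
        return best_name if best_count >= min_matches else None
```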
In order to operate a device the user gazes steadily at the device in question. The pro-
totype system is currently driven by a control interface as seen in Figure 4.13. The ART
system recognises the user’s steady gaze behaviour, this can comprise several similarly
spatially-located eye fixations with the overall dwell timeand spatial parameters being
user-specified, which is recorded and a stabilised point of gaze in 3D space is determined
as shown in Figure 4.14(a). This gaze location information is then analysed with respect
to the scene camera image to determine whether or not it falls on any controllable object
of interest. Figure 4.14(b) shows the detection of such a purposeful gaze. A simple in-
terface dialogue, as illustrated in Figure 4.14(c), then appears (in the laboratory prototype
this is on a computer display) asking for the user to make his/her control input and the
system then implements the control action necessary.
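The steady-gaze recognition described above is essentially a dispersion-based dwell test over the 50 Hz gaze samples. A simplified 2D sketch, with the dwell time and dispersion thresholds user-specified as in the text (the function below is an illustration, not the prototype's Matlab code):

```python
def detect_dwell(samples, max_dispersion, min_dwell):
    """samples: list of (t, x, y) gaze samples, e.g. at 50 Hz.
    Returns the centroid of the first run of samples whose spatial spread
    stays within max_dispersion for at least min_dwell seconds, else None."""
    start = 0
    for end in range(len(samples)):
        window = samples[start:end + 1]
        xs = [x for _, x, _ in window]
        ys = [y for _, _, y in window]
        if (max(xs) - min(xs) > max_dispersion or
                max(ys) - min(ys) > max_dispersion):
            start = end  # dispersion exceeded: restart the candidate fixation
            continue
        if window[-1][0] - window[0][0] >= min_dwell:
            return (sum(xs) / len(xs), sum(ys) / len(ys))
    return None
```

Requiring a minimum dwell time, as discussed below, is exactly what prevents every incidental glance from being treated as a purposeful gaze.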
Figure 4.13. The ART system control interface
There are two parts to this control interface: the information and feedback offered
to the user, and the input that the user can make to the system. The former is currently a
computer display but could easily be something else, such as a head-down display or audio
menu rather than a visual display. The input function can also comprise tailor-designed
inputs, e.g. touchable switches, a chin-controlled joystick, a sip/puff switch, or gaze dwell
time on the display’s buttons, depending on the capabilities of the disabled user. In the
first ART development the actual device operation was controlled by an implementation
of the X10 protocol; in this work, instead, the ART system has been connected to the
House Manager, enabling users to issue commands to almost every device available in
their homes, without being bound to adopt a specific domotic infrastructure.
Figure 4.14. Typical stages of the ART system (a. Stability of eye gaze captured; b. Gaze on object detected; c. Control initiated)
One issue of an eye controlled system is the potential false operation of a device sim-
ply because the user’s gaze is recorded as falling upon it. Inherently the user’s gaze must
always fall on something in the environment. There are two built-in system parameters
to overcome this. Firstly, the user must actively gaze at an object for a pre-determined
time period; this is necessary both for the software to identify the object in the scene
camera image and to prevent the ART system from constantly attempting to identify
objects unnecessarily. Secondly, the user’s eye gaze does not (of itself) initiate device
operation but instead initiates the presentation of a dedicated interface just for that de-
vice. This permits a check on whether or not the user does in fact wish to operate the
device. The ART system work flow is illustrated in Figure 4.15. The ART system has
been designed using a head-mounted eye tracking device for ease of system development.
The overall operational time of the system can be much improved in various ways, such
as programming in C rather than in Matlab as at present. However, there is a generic
weakness of using such a head-mounted eye tracker for prolonged periods of time as the
equipment can potentially cause user discomfort and fatigue. Consequently, eye tracking
systems which are either head-mounted but smaller and lighter, or which
are physically remote from the user and have no attachment to the user’s head, are being
investigated for long term usage.
Figure 4.15. ART system flow chart
4.8 Guidelines
In close collaboration within the COGAIN project we defined a set of guidelines explaining
how to make control applications for home automation systems accessible. The guide-
lines are intended for all eye tracking application developers. The primary goal of these
guidelines is to promote safety and accessibility. The structure of this section is strongly
inspired by W3C Web Content Accessibility Guidelines.
Each guideline has a priority level based on the impact on safety and accessibility:
• Priority 1 - A smart house application developer must satisfy these guidelines.
• Priority 2 - A smart house application developer should satisfy these guidelines.
Category 1: Control applications safety
Guideline 1.1 Provide a fast, easy to understand and multimodal alarm notification. A
user needs to notice as soon as possible that the environmental control system has
sent an alarm. The control application should notify the alarms in several ways,
e.g., with sounds, flashing icons, text messages.
Guideline 1.2 Provide the user only a few clear options to handle alarm events. Several
gaze trackers are less accurate when the users are agitated; therefore, in case of alarm
the control application should propose only a limited but clear set of options (3 at
most).
Guideline 1.3 Provide a default safety action to overcome an alarm event when the user
does not decide. In case of emergency the user could lose control of the input
device, therefore the control application should take the safest decision after a time-
out. The time-out length should be dependent on the alarm type.
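A minimal sketch of this guideline using a cancellable timer; the alarm types, timeout lengths, class and handler names below are purely illustrative:

```python
import threading

class AlarmHandler:
    """Guideline 1.3 sketch: if the user does not choose an option within a
    per-alarm-type timeout, execute a default safety action automatically."""

    TIMEOUTS = {"fire": 10.0, "intrusion": 30.0}  # seconds, by alarm type

    def __init__(self, default_action):
        self.default_action = default_action  # callable taking the alarm type
        self._timer = None

    def raise_alarm(self, alarm_type):
        timeout = self.TIMEOUTS.get(alarm_type, 15.0)
        self._timer = threading.Timer(timeout, self.default_action, [alarm_type])
        self._timer.start()

    def user_responded(self):
        if self._timer:
            self._timer.cancel()  # the user took control: no default action
            self._timer = None
```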
Guideline 1.4 Provide a confirmation request for critical and possibly dangerous opera-
tions. With inaccurate or badly configured gaze trackers the Midas touch error can
be frequent, i.e., each object/command gazed at by the user is selected/executed;
therefore the control application should request a confirmation for possibly danger-
ous operations.
Guideline 1.5 Provide a STOP functionality that interrupts any operation. In some occa-
sions, the environmental control system can operate actions that the user does not
want, e.g., the selection of a wrong command or an automated, prescheduled sce-
nario, or because the user has changed his/her mind. The control application should allow a STOP
method for interrupting any operation.
Category 2: Input methods for control application
Guideline 2.1 Provide a connection with the Cogain ETU-driver. The Cogain ETU-
driver described in deliverable D2.3 is a single gaze communication standard that
allows any third party application to be driven by a range of different eye tracking
hardware systems. By using the driver, there is no need for any third party appli-
cation to be changed or recompiled when switching between differing eye tracking
hardware systems.
Guideline 2.2 Support several input methods. The gaze tracker, unfortunately, can break
down; therefore the control application should also support alternative input meth-
ods, e.g. switches (scanning mode selection), keyboards, mice, etc.
Guideline 2.3 Provide reconfigurable layouts, appropriate for different eye tracking per-
formances and user capabilities. Eye trackers have a very wide performance range;
therefore, a control application should have a reconfigurable visual interface adapt-
able to different resolutions and precisions of the eye trackers.
Guideline 2.4 Support multiple input methods at the same time (multimodal interaction).
The user may be able to use alternative input channels beyond gaze, e.g. voice,
finger movements, etc. The control application should support the combination of
multiple input methods at the same time, for example selection with gaze and click
with the mouse.
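As an illustration of combining two channels, the sketch below pairs continuous gaze updates (pointing) with a discrete click from a second device (confirmation); the class name, target names and coordinates are hypothetical:

```python
class MultimodalSelector:
    """Guideline 2.4 sketch: selection by gaze, confirmation by an independent
    switch or mouse click at the most recent gaze position."""

    def __init__(self, targets):
        self.targets = targets  # name -> (x, y, width, height) screen regions
        self.gaze = None

    def on_gaze(self, x, y):
        self.gaze = (x, y)  # pointing channel: continuous gaze updates

    def on_click(self):
        """Confirmation channel: returns the target under the current gaze."""
        if self.gaze is None:
            return None
        gx, gy = self.gaze
        for name, (x, y, w, h) in self.targets.items():
            if x <= gx <= x + w and y <= gy <= y + h:
                return name
        return None
```

Keeping selection and confirmation on separate channels also mitigates the Midas touch problem mentioned under Guideline 1.4.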
Guideline 2.5 Manage the loss of input control by providing automated default actions.
The control application should understand when the user has lost control of the eye
tracker and should provide default actions (e.g. recalibration, playing an alarm, etc.).
Category 3: Control applications operative features
Guideline 3.1 Respond to environmental control events and commands at the right time.
The control application should be responsive: it should manage events and com-
mands in an acceptable time slot.
Guideline 3.2 Manage events with different time critical priority. The control application
should distinguish between events with different priority. The time critical events
must be acted upon within a short fixed period (e.g. fire alarm, intrusion detection).
Guideline 3.3 Execute commands with different priority. Home automation systems
commonly receive several commands at the same time (e.g. from different users, sce-
narios, etc.). The control application should discriminate commands with different
priority and should adopt a predefined management policy.
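Guidelines 3.2 and 3.3 together suggest a priority-ordered dispatch queue for events and commands. A minimal sketch with a binary heap, where lower numbers mean more urgent and ties preserve arrival order (priority values and command names are illustrative):

```python
import heapq
import itertools

class CommandQueue:
    """Sketch of Guidelines 3.2/3.3: commands carry a priority; time-critical
    ones are dispatched first, ties resolved in arrival (FIFO) order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # breaks priority ties by arrival order

    def submit(self, priority, command):
        heapq.heappush(self._heap, (priority, next(self._counter), command))

    def dispatch(self):
        """Return the most urgent pending command, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```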
Guideline 3.4 Provide feedback when automated operations or commands are executing.
Scenarios, selected by the user, could include several scheduled commands. The
control application should show the actions in progress and inform the user when a
scenario is terminated.
Guideline 3.5 Manage (create, modify, delete) scenarios. Repeating a long sequence
of commands to perform a frequent task could be tedious for the user. It is therefore
necessary to gather lists of commands and manage them as a single one. The control
application should allow creation, modification and deletion of scenarios.
Guideline 3.6 Know the current status of all devices and appliances. The control appli-
cation should know the current status of all devices and appliances of the home, in
order to show that information and to take smart automated decisions (e.g. preventing
a dangerous condition, activating an energy saving plan, etc.).
Category 4: Control applications usability
Guideline 4.1 Provide a clear visualization of what is happening in the house. In ac-
cordance with the guidelines of categories 1 and 3, the control application interface
should provide a clear, easily understandable visualization of the execution progress
of the commands.
Guideline 4.2 Provide a graceful and intelligible user interface. Consistent page layout,
easy to understand language, and recognizable graphics benefit all users. The con-
trol application should provide a graceful and intelligible user interface, possibly
using both images and clear texts.
Guideline 4.3 Provide a visualization of status and location of the house devices. The
control application should show the house map containing, for each room, a repre-
sentation of the devices and their status.
Guideline 4.4 Use colours, icons and text to highlight a change of status. The control
application interface should highlight the device status change using images, texts
and sounds.
Guideline 4.5 Provide an easy-to-learn selection method. Although the control applica-
tion may present complex features and functionality, it should provide a usable,
easy-to-learn interaction method.
Chapter 5
Conclusions
The research carried out during the three years of the doctorate, described in this thesis,
led to 2 publications in international journals and 9 publications in international
conferences (see Appendix C).
Good results have been obtained both concerning the development of gaze-based as-
sistive software and the design of the domotic gateway DOG. Domotics is an increasingly
relevant research field involving two main aspects: the assisted living of disabled
and elderly persons, and the study of advanced solutions for energy saving. Future
research stemming from this thesis will concentrate on this last field.
The e-lite research group, where I worked during these years, is planning to re-engineer
the DOG platform, with the purpose of making it:
• more scalable and capable of running on devices with very limited computational
capacity.
• more reliable and safe, i.e., able to handle alarms and misbehaviors through the
application of autonomic solutions.
• smarter: by researching and implementing novel algorithms allowing optimized
management of domestic resources (energy saving).
• more accessible: by developing a gaze-based control interface for domotics sys-
tems, compliant with the COGAIN Recommendations.
Since the beginning of my study in 2006, the context of assistive technologies
has changed considerably. Thanks to the collaboration with COGAIN and the San Giovanni
hospital of Turin, our research group succeeded in spreading information about new,
advanced, gaze-based assistive technologies and their benefits and impact on the quality
of life of persons affected by ALS. In 2006 the Piedmont Region Health Board did not
fund the purchase of eye tracking devices for severely motor disabled people; caregivers
were therefore obliged to bear considerable expenses or to raise funds. I cannot be sure
whether the merit belongs to my colleagues and me (I like to think so), but after the
dissemination of our research results and the organization of an international conference
in Turin [103], the Piedmont Region Health Board started to fund the purchase of eye
trackers for people with ALS.
Bibliography
[1] United nations enable. http://www.un.org/disabilities/.
[2] Louis Émile Javal. Physiologie de la lecture et de l’écriture. Annales d’Oculistique,
pages 137–187, 1907.
[3] R. Dodge and T.S. Cline. The angle velocity of eye movements. Psychological Review,
8:145–157, 1901.
[4] Judd C.H., McAllister C.N., and Steel W.M. General introduction to a series of
studies of eye movements by means of kinetoscopic photographs. Psychological
Review, Monograph Supplements, 7:1–16, 1905.
[5] D.G. Paterson and M.A. Tinker. Studies of typographical factors influencing speed
of reading 10: Style of typeface. Journal of Applied Psychology, 16:605–613,
1932.
[6] P.M. Fitts, R.E. Jones, and J.L. Milton. Eye movements of pilots during instrument
landing approach. Aeronautical Engineering Review, 9:1–7, 1950.
[7] H. Hartridge and L. C. Thomson. Method of investigating eye movements.British
Journal Ophthalmology, 32:581–591, 1948.
[8] B. Shackel. Pilot study in electro-oculography. The British Journal of Ophthal-
mology, 44:88–113, 1960.
[9] Norman H. Mackworth and Edward Llewellyn Thomas. Head-mounted eye-marker
camera. J. Opt. Soc. Am., 52:713–716, 1962.
[10] R. A. Monty and J. W. Senders. Eye movements and psychological processes.
Hillsdale, New Jersey: Erlbaum Associates, 1976.
[11] J. W. Senders, D. F. Fisher, and R. A. Monty. Eye movements and higher psycho-
logical processes. Hillsdale, New Jersey: Erlbaum Associates, 1978.
[12] D.F. Fisher, R.A. Monty, and J.W. Senders. Eye movements: cognition and visual percep-
tion. L. Erlbaum Associates, 1981.
[13] T. N. Cornsweet and H. D. Crane. Accurate two-dimensional eye tracker using first
and fourth Purkinje images. J. Opt. Soc. Am., 63:921–928, 1973.
[14] R.H. Lambert, R.A. Monty, and R.J. Hall. High-speed data processing and unobtru-
sive monitoring of eye movements. Behavioral Research Methods & Instrumenta-
tion, 6:525–530, 1974.
[15] R.A. Monty. An advanced eye-movement measuring and recording system. Amer-
ican Psychologist, 30:331–335, 1975.
[16] J. Anliker. Eye movements: on-line measurement, analysis, and control. Eye
Movements and Psychological Processes, 1976.
[17] H. Collewijn. Eye movement recording. Vision Research: A Practical Guide to
Laboratory Methods, pages 245–285, 1999.
[18] E. Kowler. The role of visual and cognitive processes in the control of eye move-
ment. Elsevier Science Publishers, 1990.
[19] J.W. Senders. Four theoretical and practical questions. In Eye Tracking Research
and Applications Symposium, 2000.
[20] S. K. Card. Visual search of computer command menus. Attention and Perfor-
mance X, Control of Language Processes, 1984.
[21] J.J. Hendrickson. Performance, preference, and visual scan patterns on a menu-
based system: implications for interface design. In Proceedings of the ACM CHI89
Human Factors in Computing Systems Conference, 1989.
[22] A. Aaltonen, A. Hyrskykari, and K.-J. Räihä. 101 spots, or how do users read menus? In
Proceedings of CHI 98 Human Factors in Computing Systems, 1998.
[23] T.E. Hutchinson, K.P. White, W.N. Martin, K.C. Reichert, and L.A. Frey. Human-
computer interaction using eye-gaze input. IEEE Transactions on Systems, Man,
and Cybernetics, 19:1527–1534, 1989.
[24] J.L. Levine. An eye-controlled computer. Technical report, IBM, Thomas J. Watson
Research Center, 1981.
[25] J.L. Levine. Performance of an eyetracker for office use. Comput. Biol. Med.,
14:77–89, 1984.
[26] R.A. Bolt. Gaze-orchestrated dynamic windows. Computer Graphics, 15:109–119,
1981.
[27] R.A. Bolt. Eyes at the interface. In Proceedings of the ACM Human Factors in
Computer Systems Conference, 1982.
[28] F.A. Glenn et al. Eye-voice-controlled interface. In Proceedings of the 30th Annual
Meeting of the Human Factors Society, 1986.
[29] C. Ware and H.H. Mikaelian. An evaluation of an eye tracker as a device for computer
input. In Proceedings of the ACM CHI+GI87 Human Factors in Computing Sys-
tems Conference, 1987.
[30] A. T Duchowski. A breadth-first survey of eye tracking applications. Behavior
Research Methods, Instruments, & Computers, 4:455–470, 2002.
[31] Young and Sheena. Survey of eye movement recording methods. Behavior Re-
search Methods and Instrumentation, 7, 1975.
[32] A. Glenstrup and T. Engell-Nielsen. Eye controlled media: Present and future state,
1995. http://www.diku.dk/panic/eyegaze/article.html.
[33] MetroVision System. Mon vog from metrovision systems.
[34] EyeTech Systems. Quick glance from eyetech systems.
[35] SensoMotoric Instruments. Sensomotoric instruments.
[36] Dan Witzner Hansen and Arthur E. C. Pece. Eye tracking in the wild. Comput. Vis.
Image Underst., 98(1):155–181, 2005.
[37] J. Gips, P. Olivieri, and J. Tecce. Direct control of the computer through electrodes
placed around the eyes. In Fifth International Conference on Human-Computer
Interaction, pages 630–635, 1993.
[38] Dale Roberts, Mark Shelhamer, and Aaron Wong. A new wireless search-coil system. In
Proceedings of Eye tracking research and applications Symposium (ETRA 2008),
pages 197–204, 2008.
[39] Armen R Kherlopian, Joseph P Gerrein, Minerva Yue, Kristina E Kim, Ji Won
Kim, Madhav Sukumaran, and Paul Sajda. Electrooculogram based system for
computer control using a multiple feature classification model. Conf Proc IEEE
Eng Med Biol Soc, 1:1295–8, 2006.
[40] S R Cohen, S A Hassan, B J Lapointe, and B M Mount. Quality of life in HIV disease
as measured by the McGill Quality of Life Questionnaire. AIDS, 10(12):1421–7,
October 1996.
[41] S R Cohen, B M Mount, M G Strobel, and F Bui. The McGill Quality of Life
Questionnaire: a measure of quality of life appropriate for people with advanced
disease. A preliminary study of validity and acceptability. Palliat Med, 9(3):207–
19, July 1995.
[42] E. Diener, R. Emmons, J. Larsen, and S. Griffin. The satisfaction with life scale.
Personality Assesment, 49(1):71–75, 1985.
[43] W. Pavot and E. Diener. Review of the satisfaction with life scale. Psycholog-
ical Assessment, 5:164–172, 1993.
[44] W. Zung. A self-rating depression scale.Archives of General Psychiatry, 12:63–70,
1965.
[45] M Novak and C Guest. Application of a multidimensional caregiver burden inven-
tory. Gerontologist, 29(6):798–803, December 1989.
[46] John F Deeken, Kathryn L Taylor, Patricia Mangan, K RobinYabroff, and Jane M
Ingham. Care for the caregivers: a review of self-report instruments developed
to measure the burden, needs, and quality of life of informal caregivers. J Pain
Symptom Manage, 26(4):922–53, October 2003.
[47] W3c web accessibility initiative. http://www.w3.org/WAI/intro/accessibility.php.
[48] W3C. Web content accessibility guidelines 1.0.http://www.w3.org/TR/
WAI-WEBCONTENT/, 1999.
[49] W3C. Web content accessibility guidelines 2.0. http://www.w3.org/TR/
WCAG20/, 2007.
[50] W3C. User agent accessibility guidelines 1.0. http://www.w3.org/TR/
UAAG10/, 2002.
[51] Eye Response Technologies. Erica system. http://www.eyeresponse.com/, 2005.
[52] Tobii Technology. MyTobii 2.3. http://www.tobii.com, 2006.
[53] B. H. Thomas and W. Piekarski. Glove based user interaction techniques for aug-
mented reality in an outdoor environment. Virtual Reality, pages 167–180, 2002.
[54] S. L. Oviatt. Mutual disambiguation of recognition errors in a multimodal architec-
ture. In Proceedings of the Conference on Human Factors in Computing Systems
(CHI’99), pages 576–583. ACM Press, 1999.
[55] R. A. Bolt. “Put-that-there”: Voice and gesture at the graphics interface. In SIG-
GRAPH ’80: Proceedings of the 7th annual conference on Computer graphics and
interactive techniques, pages 262–270, New York, NY, USA, 1980. ACM Press.
[56] P. R. Cohen, M. Johnston, D. McGee, S. Oviatt, J. Pittman, I. Smith, L. Chen,
and J. Chow. Quickset: Multimodal interaction for distributed applications. In
Proceedings of the Fifth ACM International Multimedia Conference, pages 31–40.
ACM Press, 1997.
[57] Rajeev Sharma, Vladimir Pavlovic, and Thomas Huang. Toward multimodal
human-computer interface. InProceedings of the IEEE, volume 86, pages 853–
869, May 1998.
[58] D. Miniotas, O. Spakov, I. Tugoy, and I. S. MacKenzie. Speech-augmented eye
gaze interaction with small closely spaced targets. In Proceedings of the 2006
symposium on Eye tracking research and applications, pages 66–72. ACM Press,
2006.
[59] Q. Zhang, A. Imamiya, K. Go, and X. Gao. Overriding errors in speech and gaze
multimodal architecture. In Proc. 9th International Conference on Intelligent User-
Interfaces (2004), pages 346–348. ACM Press, 2004.
[60] Melanie Baljko. The information-theoretic analysis ofunimodal interfaces and
their multimodal counterparts. In Assets ’05: Proceedings of the 7th international
ACM SIGACCESS conference on Computers and accessibility, pages 28–35, New
York, NY, USA, 2005. ACM Press.
[61] Rajarathinam Arangarasan, Tushar H. Dani, Chi-Cheng Chu, Xiaochun Liu, and
Rajit Gadh. Geometric modeling in multi-modal, multi-sensory virtual environ-
ment. In Proceedings of 2000 NSF Design and Manufacturing Research Confer-
ence, pages 3–6, 2000.
[62] H. Dudley. The vocoder. Bell Labs Record, 17:122–126, 1939.
[63] H. Dudley, R. R. Riesz, and S. A. Watkins. A synthetic speaker. J. Franklin
Institute, 227:739–764, 1939.
[64] K. H. Davis, R. Biddulph, and S. Balashek. Automatic recognition of spoken digits.
J. Acoust. Soc. Am, 24:627–642, 1952.
[65] T. Sakai and S. Doshita. The phonetic typewriter. Information Processing 1962,
Proc. IFIP, 1962.
[66] L. E. Baum and T. Petrie. Statistical inference for probabilistic functions of finite
state Markov chains. Annals of Mathematical Statistics, 37:1554–1563, 1966.
[67] L. E. Baum, T. Petrie, G. Soules, and N. Weiss. A maximization technique occur-
ring in the statistical analysis of probabilistic functions of Markov chains. Annals
of Mathematical Statistics, 41:164–171, 1970.
[68] F. Jelinek. Continuous speech recognition by statistical methods. Proc. IEEE,
64:532–536, 1976.
[69] S. E. Levinson, L. R. Rabiner, and M. M. Sondhi. An introduction to the application
of the theory of probabilistic functions of a Markov process to automatic speech
recognition.Bell Syst. Tech. J., 62:1035–1074, 1983.
[70] Claudia Romellini and Daniele Sereno. The essential role of a speech platform in
deploying effective and reliable speech enabled applications, 2004.
[71] Poika Isokoski and Benoit Martin. Eye tracker input in first person shooter games.
The 2nd Conference on Communication by Gaze Interaction – COGAIN 2006:
Gazing into the Future, pages 79–81, 2006.
[72] Poika Isokoski, Aulikki Hyrskykari, Sanna Kotkaluoto, and Benoit Martin.
Gamepad and eye tracker input in FPS games: Data for the first 50 minutes. The 3rd
Conference on Communication by Gaze Interaction – COGAIN 2007: Gaze-based
Creativity and Interacting with Games and On-line Communities, pages 11–15,
2007.
[73] Michael Dorr, Martin Bohme, Thomas Martinetz, and Erhardt Barth. Gaze beats
mouse: a case study.The 3rd Conference on Communication by Gaze Interaction
– COGAIN 2007: Gaze-based Creativity and Interacting with Games and On-line
Communities, pages 18–21, 2007.
[74] Howell Istance, Richard Bates, Aulikki Hyrskykari, and Stephen Vickers. Snap
clutch, a moded approach to solving the midas touch problem.In ETRA ’08: Pro-
ceedings of the 2008 symposium on Eye tracking research applications, pages 221–
228, New York, NY, USA, 2008. ACM.
[75] John Paulin Hansen, Dan Witzner Hansen, Anders Sewerin Johansen, and John
Elvesjö. Mainstreaming gaze interaction towards a mass market for the ben-
efit of all. 3rd international conference on universal access in human-computer
interaction, 2005.
[76] Richard Bates and Oleg Spakov. Implementation of cogain gaze tracking standards.
Communication by Gaze Interaction (COGAIN), 2006.
[77] Scott Davidoff, Min Kyung Lee, John Zimmerman, and Anind Dey. Principle of
Smart Home Control. InProceedings of the Conference on Ubiquitous Computing,
pages 19–34. Springer, 2006.
[78] Sheng-Luen Chung and Wen-Yuan Chen. MyHome: A ResidentialServer for
Smart Homes.Knowledge-Based Intelligent Information and Engineering Systems,
4693/2007:664–670, 2007.
[79] Home gateway technical requirements: Residential profile. Technical report, Home
Gateway Initiative, 2008.
[80] M.A. Just and P.A. Carpenter. Eye fixations and cognitiveprocesses. InCognitive
Psychology 8, pages 441–480, 1976.
[81] R. Vertegaal, A. Mamuji, C. Sohn, and D. Cheng. Media eyepliances: using eye track-
ing for remote control focus selection of appliances. In CHI Extended Abstracts,
pages 1861–1864, 2005.
[82] L. Jiang, D. Liu, and B. Yang. Smart home research. InProceedings of the Third
Conference on Machine Learning and Cybernetics SHANGHAI, pages 659–664,
August 2004.
[83] The BTicino MyHome system.http://www.myhome-bticino.it .
[84] The Konnex association.http://www.konnex-knx.com .
[85] Dave Rye. The X10 PowerHouse powerline interface. Technical report, X10 Pow-
erHouse, 2001.
[86] The LonWorks platform. http://www.echelon.com/developers/
lonworks/default.htm .
[87] Vittorio Miori, Luca Tarrini, Maurizio Manca, and Gabriele Tolomei. An Open
Standard Solution for Domotic Interoperability.IEEE Transactions on Consumer
Electronics, 52:97–103, 2006.
[88] Kyeong-Deok Moon, Young-Hee Lee, Chang-Eun Lee, and Young-Sung Son. De-
sign of a Universal Middleware Bridge for Device Interoperability in Heteroge-
neous Home Network Middleware.IEEE Transactions on Consumer Electronics,
51:314–318, 2005.
[89] Eiji Tokunaga, Hiro Ishikawa, Makoto Kurahashi, Yasunobu Morimoto, and Tatsuo
Nakajima. A Framework for Connecting Home Computing Middleware. In Inter-
national Conference on Distributed Computing Systems Workshops (ICDCSW02),
2002.
[90] F.Shi, A. Gale, and K. Purdy. Direct Gaze-Based Environmental Controls. InThe
2nd Conference on Communication by Gaze Interaction, pages 36–41, 2006.
[91] D. Bonino and A. Garbo. An Accessible Control Applicationfor Domotic Environ-
ments. InFirst International Conference on Ambient Intelligence Developments,
pages 11–27, 2006.
[92] OSGi alliance.http://www.osgi.org/ .
[93] D. Bonino and F. Corno. DogOnt - ontology modeling for intelligent domotic en-
vironments. In7th International Semantic Web Conference, 2008.
[94] T.R. Gruber. Toward principles for the design of ontologies used for knowledge
sharing.International Journal Human-Computer Studies, 43(5-6):907–928, 1995.
[95] D. L. McGuinness and F. van Harmelen. OWL Web Ontology Language. W3C Rec-
ommendation,http://www.w3.org/TR/owl-features/ , February 2004.
[96] The Eclipse Equinox project.http://www.eclipse.org/equinox/ .
[97] The Apache XML-RPC API.http://ws.apache.org/xmlrpc/ .
[98] R.J.K. Jacob and K.S. Karn. Eye tracking in human computer interaction and
usability research: Ready to deliver the promises. In The Mind’s Eye: Cognitive
and Applied Aspects of Eye Movement Research, pages 573–605, 2003.
[99] A.G. Gale. Attention responsive technology and ergonomics. In Bust P.D. and
McCabe P.T., editors, Contemporary Ergonomics 2005, pages 273–276, 2005.
[100] F. Shi, A.G. Gale, and K.J. Purdy. Eye-centric ICT control. In Contemporary Er-
gonomics 2006 (Taylor and Francis, London), pages 215–218, 2006.
[101] F. Shi, A.G. Gale, and K.J. Purdy. Helping people with ICT device control by eye
gaze. In Miesenberger K., Klaus J., Zagler W. and Karshmer A. (Eds.), Lecture
Notes in Computer Science (Springer Verlag Berlin), pages 480–487, 2006.
[102] D.G. Lowe. Distinctive image features from scale-invariant keypoints. Interna-
tional Journal of Computer Vision, 60(2):91–110, 2004.
[103] COGAIN 2006: Gazing into the Future. The 2nd Conference on Communication
by Gaze Interaction, 2006.
Appendix A
Abbreviations
AAC Augmentative and Alternative Communication systems
ALS Amyotrophic Lateral Sclerosis
ART Attention Responsive Technology
ASE Accessible Surfing Extension
COGAIN COmmunication by GAze INteraction
DGC Direct Gaze Control
DOG Domotic OSGi Gateway
DOM Document Object Model
GKB Gaze and keyboard button
GT Gaze to Target
GTK Gaze tracking and keyboard
GW Gaze Window
IDE Intelligent Domotic Environment
IGM Independent Gaze and Movement
MS Multiple Sclerosis
MGS McGill scale
SPBS Self-Perceived Burden Scale
SWLS Satisfaction With Life Scale
VK Virtual Keyboard
Appendix B
Convention on the Rights of Persons
with Disabilities
The Convention is a response to an overlooked development challenge: approximately
10% of the world’s population are persons with disabilities (over 650 million persons),
approximately 80% of whom live in developing countries. It is a response to the fact that,
although pre-existing human rights conventions offer considerable potential to promote
and protect the rights of persons with disabilities, this potential was not being tapped.
Persons with disabilities continued being denied their human rights and were kept on the
margins of society in all parts of the world. The Convention sets out the legal obligations
on States to promote and protect the rights of persons with disabilities. It does not create
new rights.
B.1 Guiding Principles of the Convention
There are eight guiding principles that underlie the Convention and each one of its specific
articles:
1. Respect for inherent dignity, individual autonomy including the freedom to make
one’s own choices, and independence of persons
2. Non-discrimination
3. Full and effective participation and inclusion in society
4. Respect for difference and acceptance of persons with disabilities as part of human
diversity and humanity
5. Equality of opportunity
6. Accessibility
7. Equality between men and women
8. Respect for the evolving capacities of children with disabilities and respect for the
right of children with disabilities to preserve their identities
Appendix C
Publications
• BONINO D., CASTELLINA E., CORNO F., GALE A., GARBO A., PURDY K., SHI
F. (in press). A blueprint for integrated eye-controlled environments. UNIVER-
SAL ACCESS IN THE INFORMATION SOCIETY, ISSN: 1615-5289
• BONINO D., CASTELLINA E., CORNO F. (2008). The DOG Gateway: Enabling
Ontology-based Intelligent Domotic Environments. IEEE TRANSACTIONS ON
CONSUMER ELECTRONICS, vol. 54/4, ISSN: 0098-3063, doi: 10.1109/TCE.2008.4711217
• BONINO D., CASTELLINA E., CORNO F. (2008). DOG: an Ontology-Powered
OSGi Domotic Gateway. In: 20th IEEE Int'l Conference on Tools with Artificial
Intelligence, Dayton, Ohio, USA, November 3-5, IEEE
• CALVO A., CHIÒ A., CASTELLINA E., CORNO F., FARINETTI L., GHIGLIONE P.,
PASIAN V., VIGNOLA A. (2008). Eye Tracking Impact on Quality-of-Life of ALS
Patients. In: Lecture Notes in Computer Science 5105, K. Miesenberger et al. (eds.),
Linz, Austria, 9/7/2008 - 11/7/2008, p. 70-77
• CASTELLINA E., CORNO F. (2008). Multimodal Gaze Interaction in 3D Virtual
Environments. In: COGAIN 2008: Communication, Environment and Mobility Con-
trol by Gaze, Prague, September 2-3, p. 34-38, ISBN/ISSN: 978-80-01-04151-2
• CASTELLINA E., CORNO F., PELLEGRINO P. (2008). Integrated Speech and
Gaze Control for Realistic Desktop Environments. In: ETRA '08 Symposium on Eye
Tracking Research & Applications, Savannah, GA, USA, March 26-28, 2008, ACM
Press
• BONINO D., CASTELLINA E., CORNO F. (2008). Uniform Access to Domotic En-
vironments through Semantics. In: SWAP 2008 - Fifth Workshop on Semantic Web
Applications and Perspectives, Rome, 16-17 December 2008
• CASTELLINA E., CORNO F. (2007). Accessible Web Surfing through Gaze Inter-
action. In: Gaze-based Creativity, Interacting with Games and On-line Communi-
ties, Leicester, UK, 3-4 September 2007, p. 74-77
• BONINO D., CASTELLINA E., CORNO F., GARBO A. (2006). Control Applica-
tion for Smart House through Gaze Interaction. In: Proceedings of COGAIN 2006:
Gazing into the Future, Torino, 03/09/2006, p. 32-35