
Gaze directed displays as an enabling technology for attention aware systems





Computers in Human Behavior 22 (2006) 615–647

www.elsevier.com/locate/comphumbeh



Alexander Toet *

TNO Human Factors, Vision Group, Kampweg 5, 3769 DE Soesterberg, The Netherlands

Available online 7 February 2006

Abstract

Visual information can in principle be dynamically optimised by monitoring the user's state of attention, e.g. by tracking eye movements. Gaze directed displays are therefore an important enabling technology for attention aware systems. We present a state-of-the-art review of both (1) techniques to register the direction of gaze and (2) display techniques that can be used to optimally adjust visual information presentation to the capabilities of the human visual system and the momentary direction of viewing. We focus particularly on evaluation studies that were performed to assess the added value of these displays. We identify promising application areas and directions for further research.
© 2005 Elsevier Ltd. All rights reserved.

Keywords: Eye movements; Attentive displays; Gaze registration; Gaze contingent displays

1. Introduction

The human gaze reveals information about the user's intention and attention. It is a potential porthole into his current cognitive processes. The human fixation behaviour over time reveals information on the cognitive state of the user, such as confusion or fatigue, or on his degree of expertise. If a computer knows where the user fixates, it can react by taking appropriate actions like presenting the information in an adapted form, presenting additional information, or activating fixated items. Gaze directed displays are therefore an important enabling technology for attention aware systems.

0747-5632/$ - see front matter © 2005 Elsevier Ltd. All rights reserved.

doi:10.1016/j.chb.2005.12.010

* Tel.: +31 346 356237; fax: +31 346 363977/353977. E-mail addresses: [email protected], [email protected].



Attentive gaze directed interfaces may optimize the user's attentive resources by (1) reducing the attentional visual load (clutter reduction), (2) enhancing their attentive capacity, and (3) attracting or guiding the user's attention.

Operators in complex event-driven domains face considerable and often competing attentional demands. The capacity to attend to several objects simultaneously decreases as a function of visual complexity (Alvarez & Cavanagh, 2004). Adding more detail to a display (thereby increasing its complexity or the amount of clutter) places a greater demand on the user to actively ignore task-irrelevant features, which in turn can lead to a decreased awareness of all unattended features (Most et al., 2001; Simons & Chabris, 1999). In situations where multiple events occur simultaneously, operators can fail to detect important changes even when they are not fatigued, stressed or multitasking (change blindness, e.g. Durlach, 2004). A gaze contingent display could help diminish these effects by temporarily eliminating less relevant peripheral details (reducing the amount of clutter) during changes, thereby increasing the chance that they will be noticed by the user. In general, dynamically filtering information on the basis of user interest allows the cognitive load associated with complex displays to be managed more effectively.

Nowadays there is a range of dynamic visual information display techniques that enhance the user's visual capacity by optimally tuning the displayed visual information to the characteristics and limitations of the human visual system. The results from several evaluation studies indicate that new interface technologies and new display techniques can indeed enhance observer performance for certain applications.

Simple and unobtrusive video-based gaze tracking systems are now available that enable free viewing with high accuracy over a large tracking volume. These systems can simultaneously be used for posture and gesture based interaction. Simple and inexpensive eye contact sensors can be deployed to construct displays and devices that adapt their state whenever a user looks at them. All these devices can be deployed to transform the computer into an agent that continuously monitors the user's state of attention, and that uses this knowledge to attract or guide the user's attention.

In this study we will present a state-of-the-art review of computer interfaces that use information about human viewing behaviour to optimise visual information transfer to the user. The key components of these interfaces are (1) gaze registration techniques and (2) adaptive information display techniques. We will also identify several promising application areas that involve the integration of multiple interface modalities and presentation techniques.

1.1. Rationale for monitoring eye movements

Eye gaze implicitly indicates the area of the user's attention. People continuously gaze at the world while performing other tasks. It is clear that while fixating the eyes on one location, attention may be directed at other locations (Posner, 1980). However, it is most likely that spatial attention and saccade planning are closely coupled during natural unconstrained eye movements (Findlay & Gilchrist, 1998). Thus, covert attention appears to supplement, not substitute for, actual eye movements (Findlay & Gilchrist, 2001). Eye movements are extremely fast and require little conscious effort. Eye gaze has therefore long been considered a promising candidate input method for interactive visual displays (Jacob & Karn, 2004; Stampe & Reingold, 1995). Gaze controlled display interfaces promise important benefits, such as freeing the user's hands for other tasks, and faster reaction times compared to other pointing devices.

1.2. Information from eye movements

The amount of information that can be derived from registering the human eye is limited to

(1) the instantaneous direction of gaze or fixation,
(2) the fixation behaviour over time, and
(3) variations in the pupil size.

1.2.1. Individual fixations

Fixations are pauses in the eye scanning process over informative regions of interest (Salvucci & Goldberg, 2000). Measures of fixation include two attributes: the fixation duration (i.e. the time spent investigating a local area of the visual field, also known as dwell time) and the number of fixations (i.e. the number of times the eye stops on a certain area of the visual field). A longer fixation duration implies more time spent on interpreting, processing or associating a target with its internalized representation. Fixation duration is negatively correlated with the efficiency of task execution (Goldberg & Kotval, 1998; Kotval & Goldberg, 1998). A larger number of fixations implies that more information is required to process a given task (Backs & Walrath, 1992).
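To make these fixation measures concrete, the following is a minimal sketch of dispersion-threshold fixation detection in the spirit of the algorithms surveyed by Salvucci and Goldberg (2000); the thresholds and the sample format are illustrative assumptions, not values taken from the cited work.

```python
from typing import List, Tuple

def _dispersion(window) -> float:
    xs = [s[1] for s in window]
    ys = [s[2] for s in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def detect_fixations(samples: List[Tuple[float, float, float]],
                     max_dispersion: float = 1.0,
                     min_duration: float = 0.100) -> List[dict]:
    """Group raw gaze samples (t, x, y) into fixations.

    A window of samples counts as a fixation when its spatial dispersion
    (width + height of its bounding box, in degrees) stays below
    max_dispersion for at least min_duration seconds.
    """
    fixations = []
    i = 0
    while i < len(samples):
        # Grow the window until it spans at least the minimum duration.
        j = i
        while j < len(samples) and samples[j][0] - samples[i][0] < min_duration:
            j += 1
        if j >= len(samples):
            break
        if _dispersion(samples[i:j + 1]) <= max_dispersion:
            # Extend the window while the dispersion stays small.
            while j + 1 < len(samples) and _dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            xs = [s[1] for s in window]
            ys = [s[2] for s in window]
            fixations.append({
                "start": window[0][0],
                "duration": window[-1][0] - window[0][0],   # dwell time
                "x": sum(xs) / len(xs),
                "y": sum(ys) / len(ys),
            })
            i = j + 1
        else:
            i += 1
    return fixations
```

The number of detected fixations and their mean duration can then serve directly as the efficiency and processing-load measures discussed above.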

1.2.2. Fixation behaviour

Eye movements can be classified, according to the situations in which they occur, as (Kahneman, 1973):

(1) Spontaneous, when the subject views a scene without any specific task in mind, i.e. when he is "just watching" the scene.
(2) Task-relevant, when the observer views the scene with a particular question or task in mind, and is therefore searching for a specific type of information.
(3) Orientation of thought looking, when the observer is not paying much attention to where he is looking, but is attending to some "inner thought".

Glenstrup added a new type of looking that has become more prevalent through the use of eye-gaze media:

(4) Intentional manipulatory looking is the observer's act of directing the eyes to a specific part of the scene or in a specific way, with the intention of manipulating objects in the scene.

Generally, the eyes are not merely attracted by the physical qualities of the items in the scene, but rather by how important the viewer would rate them to be. Thus, when viewing faces, the eyes of the viewer will be attracted mostly by the eyes, lips and nose. It is reasonable to assume that during spontaneous or task-relevant looking (and for intentional manipulatory looking this is trivially true), the direction of gaze is indicative of what the observer is interested in (e.g. Barber & Legge, 1976; Bolt, 1984; Ware & Mikaelian, 1987). Eye movement analysis has successfully been applied to design, evaluate and optimise visual information displays (e.g. Lin, Zhang, & Watson, 2003). However, it has also been shown that observers do not necessarily attend to what they are looking at, and do not necessarily look at what they are attending to (Barber & Legge, 1976).

The eye-gaze pattern is determined partly by the composition of the scene, and partly by the observer's thoughts and stored knowledge of the items in the scene (memory). The working memory (Baddeley, 1981) is where these two determinants meet, and research has shown that the actual control of the eye movements is probably performed by some process-monitoring system. When a scene is observed, it is initially scanned for important elements, and this scanpath is then more or less repeated in successive cycles; the observer does not to any great extent attend to the remaining, less scanned parts of the scene (Yarbus, 1967). Finally, some studies have shown that not only are the reaction times for detecting target objects indicative of the difficulty of the processing task, but also the duration of the fixations reflects the time it takes to register and process the fixated information (Fitts, Jones, & Milton, 1950; Goldberg & Kotval, 1998; Hooge & Erkelens, 1998; Jacob & Karn, 2004).

1.2.3. Pupil size

Pupil diameter usually increases with an increase in mental workload (May, Kennedy, Williams, Dunlap, & Brannan, 1990). A change in pupil diameter reflects the difficulty level of a task; greater complexity, either in the task itself or in the interface, leads to greater pupil dilation (Stern, 1997). There is also suggestive evidence that the pupil constricts as a physiological sign of 'fatigue', or as a function of the time spent on a given task (Stern, 1997). However, the relative sensitivity of changes in pupil diameter as a measure of mental workload is highly debatable (Lin et al., 2003; Stern, 1997). For example, the pupil response to arousing visual stimuli is controversial (Hess & Polt, 1964). Of special interest is also that pupil dilation reaches a maximum as maximum effective storage is reached, and pupils may even decrease in diameter when memory is overloaded (Peavler, 1974).

1.3. Quality of eye movement data

The collection of useful eye movement data involves several challenges. Problems that occur in practice are (Schnipke & Todd, 2000):

• the pupil does not reflect enough light,
• the pupil is occluded by eyelashes or eyelids,
• the iris is too light in color to be distinguished from the pupil reflection,
• the subject's eyes are too dry.

All of these effects make it hard to track the eye. While some eye-tracking systems use various restraining methods to keep the subject's head stationary, remote systems allow subjects to move their heads and perform tasks in a more natural way. However, remote eye-tracking systems introduce their own specific problems, such as (Schnipke & Todd, 2000):

• delays that occur when the eye-tracker has to reacquire the eye after momentarily losing it when the head has moved,
• a possible loss of calibration.


1.4. Using eye movement data to optimise visual information transfer

Eye movement information can be used both for interactive and for diagnostic purposes (Duchowski, 2001).

Interactive eye tracking systems typically respond in some way to the location of the user's gaze. These systems can be classified as selective and gaze-contingent. In selective applications the user's gaze acts as an alternate mode of input (e.g. Bates, 2002; Hyrskykari, 1997; Magee, Scott, Waber, & Betke, 2004; Ohno, 1998). In gaze-contingent applications the system achieves maximal information transfer by using information on the user's direction of gaze to optimally adjust the visual display mode (1) to the human visual system (e.g. Parkhurst & Niebur, 2002; Reingold, Loschky, McConkie, & Stampe, 2003) and (2) to the momentary information requirements (e.g. Starker & Bolt, 1990).

Diagnostic eye tracking systems register the human visual scanning behaviour over a longer period of time, to refine their model of the user's behaviour. They improve the quality of interaction by tuning their response to the user's overt visual attention (e.g. Rothrock, Koubek, Fuchs, Haas, & Salvendy, 2002). In addition, approaches that predict the next gaze location from the one immediately prior may be combined with prediction based on salient areas in the image (Parkhurst, Law, & Niebur, 2002; Rajashekar, Cormack, & Bovik, 2003) to improve speed and accuracy.

1.5. Challenges of gaze directed interfacing

The design and implementation of gaze-based computer input has, however, faced two major challenges: the inherently limited accuracy of the method, and the fact that it is unnatural to overload the visual channel with motor control tasks. First, given the one-degree size of the fovea and the subconscious jittery motions that the eyes constantly produce, eye gaze is not precise enough to operate user interface widgets such as scrollbars, hyperlinks, and slider handles. A remedy for this problem can for instance be to interactively enlarge the inspected region of a display (Bates & Istance, 2002; Miniotas & Spakov, 2004). Second, and perhaps more importantly, the eye, as one of our primary perceptual devices, has not evolved to be a control organ. Sometimes its movements are voluntarily controlled, while at other times they are driven by external events. When dwell time is used to select the target (which is considered more natural than selection by blinking: Jacob, 1993), one has to be conscious of where one looks and how long one looks at an object. If one stares at an object for more than a set threshold (e.g., 200 ms), the object will be selected, regardless of the user's intention. This effect, known as the "Midas Touch" problem (Jacob, 1990), can be annoying and counter-productive (causing, for example, unintended jumps to a web page). People expect to be able to look at an object without having the look "mean" anything (i.e. they certainly do not expect the look to initiate any action). Moreover, systems that implement dwell time run the risk of disallowing visual scanning behaviour, requiring users to control their eye movements for the purpose of output rather than input (e.g. Glenstrup & Engell-Nielsen, 1995). Furthermore, dwell time can only substitute for a single mouse click. In practice, often two steps are needed to activate a target: a single click usually selects the target (e.g., an application icon) and a double click (or a different physical button click) opens it (e.g., launches the application). Performing both steps with dwell time is even more difficult. In short, overloading the visual perception channel with a motor control task seems fundamentally at odds with users' natural mental model, in which the eye searches for and takes in information and the hand produces output that manipulates external objects. Other than for disabled users, who have no alternative, using eye gaze for practical pointing does not appear to be very promising (Bates & Istance, 2003).
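As a concrete illustration of dwell-time selection, and of why the threshold matters for the Midas Touch problem, the following is a minimal sketch; the 500 ms threshold and the class interface are illustrative assumptions rather than values from the cited studies.

```python
import time

class DwellSelector:
    """Select the widget under gaze only after an uninterrupted dwell.

    A longer threshold reduces unintended Midas Touch activations
    at the cost of slower selection.
    """

    def __init__(self, dwell_threshold_s: float = 0.5):
        self.dwell_threshold_s = dwell_threshold_s
        self._current_widget = None
        self._dwell_start = None

    def update(self, widget_under_gaze):
        """Feed the widget currently hit by the gaze point; returns a widget
        once it has been dwelled on long enough, otherwise None."""
        now = time.monotonic()
        if widget_under_gaze is not self._current_widget:
            # Gaze moved to another widget (or to empty space): restart timer.
            self._current_widget = widget_under_gaze
            self._dwell_start = now
            return None
        if widget_under_gaze is None or self._dwell_start is None:
            return None
        if now - self._dwell_start >= self.dwell_threshold_s:
            self._dwell_start = None      # avoid repeated triggering
            return widget_under_gaze
        return None
```

In a real interface the returned widget would typically only be highlighted, with activation deferred to an explicit confirmation (e.g. a key press), which is one common way to soften the Midas Touch effect.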

1.6. Overview

In the rest of this study we will first give a brief overview of the state-of-the-art of gaze registration techniques. We will focus particularly on non-obtrusive techniques that register the direction of gaze, and on simple techniques that merely detect eye contact. Next we will present a brief overview of different ways to optimise visual information transfer on limited screen areas. Given the instantaneous direction of gaze, these display methods can in principle be deployed to achieve optimal visual information transfer to the user. Then we will present an overview of the range of techniques that have been developed to interact with the different visual information displays. Finally, we will discuss some potential applications of gaze directed displays in attention aware systems.

2. Gaze registration techniques

The ideal gaze-tracking system should have the following specifications (e.g. Glenstrup & Engell-Nielsen, 1995; Morimoto & Mimica, 2005):

(1) plug and play, i.e. simple to operate;
(2) non-obtrusive, i.e. it should not obstruct the field of view and should preferably be a remote system with no physical attachment to the observer;
(3) accurate (e.g., less than 0.5° error);
(4) high temporal resolution (e.g., a 500-Hz sampling rate) to allow a real time response;
(5) high spatial resolution and low noise;
(6) able to determine gaze position in a wraparound 360° field of view; and
(7) affordable.

As adequately summarised by Morimoto and Mimica (2005), the ideal gaze-tracking system should work anywhere, for everyone, all the time, with any application, without the need for setup, and should cause the user no harm or discomfort. In contrast, current gaze-tracking technologies tend to involve trade-offs among factors such as ease of operation, comfort, accuracy, spatial and temporal resolution, field of view, and cost.

Over the last few years a lot of progress has been made in the development of free-head gaze tracking systems. Some approaches use position sensors attached to the user to determine the 3-D position of his head. These methods still restrict the user's freedom, and are therefore not practical for many applications. Machine vision based approaches have resulted in fast and accurate gaze-tracking systems that require no physical contact with the user (e.g., Atienza & Zelinsky, 2003; Haro, Flickner, & Essa, 2000; Matsumoto & Zelinsky, 2000; Stiefelhagen, Yang, & Waibel, 1997b).

Gaze tracking techniques can be classified into the following three categories: two-dimensional techniques, model-based 3-D techniques, and three-dimensional techniques.



2.1. Two-dimensional techniques

Gaze tracking techniques which cannot provide full information about the 3-D line-of-sight are classified as 2-D techniques. In 2-D gaze tracking systems, the 3-D position of the eye is usually unknown, and only the relative orientation of the user's eye with respect to the user's head is measured. In general, 2-D techniques require users to hold their head very still. Furthermore, if highly accurate gaze tracking results are required, some auxiliary apparatus, such as a head rest, chin rest or bite bar, has to be used in conjunction with the 2-D gaze tracking technique. Two-dimensional gaze tracking techniques include the electro-oculographic potential (EOG) technique (Gips, Olivieri, & Tecce, 1993), the pupil tracking technique (Zhu, Fujimura, & Ji, 2002), the artificial neural network technique (Baluja & Pomerleau, 1994; Ji & Zhu, 2002), the pupil and corneal reflection tracking technique (Frey, White, & Hutchinson, 1990; Hutchinson, White, Martin, Reichert, & Frey, 1989; White, Hutchinson, & Carley, 1993), the dual Purkinje image tracking technique (Cornsweet & Crane, 1973), and the sclera coil (or search coil) technique (Bour, 1997).

The least obtrusive and most commonly applied of these methods is the pupil and corneal reflection tracking technique (Ebisawa, 1989; Haro et al., 2000; Hutchinson et al., 1989; Stampe & Reingold, 1995; White et al., 1993), which uses a low-intensity near-infrared source to illuminate the eye and an infrared-sensitive camera to register an image of the eye. The fraction of light reflected from the surface of the cornea appears in the image as a small bright spot (a glint). The fraction of light that enters the pupil is reflected by the retina and appears in the image as an area of light called the bright eye. The bright eye is less intense than the glint but brighter than the dark image of the surrounding iris, from which it can therefore easily be segmented. The direction of gaze can be computed from the relative positions of the pupil center and the glint on the cornea. However, the method does not take head movements into account, and the results depend on the position of the eye. As a result, the approach only works when the head of the observer is fixated. Under those conditions the method can achieve an accuracy of less than 1′ (Daunys & Ramanauskas, 2004).
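In practice, the mapping from the pupil–glint offset to screen coordinates is usually obtained from a short calibration in which the user fixates a set of known points. The sketch below fits a second-order polynomial mapping with least squares; the polynomial form and the use of nine or more calibration targets are common practice, but they are assumptions here rather than details taken from the cited systems.

```python
import numpy as np

def _features(v):
    """Polynomial feature vector of a pupil-glint difference (dx, dy)."""
    dx, dy = v
    return np.array([1.0, dx, dy, dx * dy, dx * dx, dy * dy])

def fit_gaze_mapping(pupil_glint_vectors, screen_points):
    """Fit screen = features(pupil - glint) @ coeffs from calibration data.

    pupil_glint_vectors: list of (dx, dy) pupil-centre minus glint positions.
    screen_points:       list of (x, y) fixated calibration targets.
    """
    F = np.array([_features(v) for v in pupil_glint_vectors])   # N x 6
    S = np.array(screen_points)                                  # N x 2
    coeffs, *_ = np.linalg.lstsq(F, S, rcond=None)               # 6 x 2
    return coeffs

def estimate_gaze(coeffs, pupil_glint_vector):
    """Map a new pupil-glint offset to an estimated screen position."""
    return _features(pupil_glint_vector) @ coeffs
```

With an overdetermined set of calibration points the least-squares fit averages out measurement noise; as noted above, the mapping remains valid only as long as the head stays where it was during calibration.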

Magee et al. (2004) recently presented a simple real time gaze directed vision interface based on an average PC with video input from an inexpensive USB webcam. The system tracks the face of the user through multi-scale template correlation. The left and right eyes are detected and compared to determine whether the user is looking to the centre, left or right side of the display. The output can be used to control applications. In its current implementation head rotations can cause the face tracker or the eye direction classification to break down. No further details about the accuracy of the system are supplied.

Any 2-D gaze tracking system can be extended to a 3-D tracking system if the absolute 3-D position of the eye (which could be the curvature center of the cornea, the rotation center of the eyeball, or any other fixed point on the head) can be determined.

2.2. Model-based 3-D techniques

As implied by the name, these techniques use a 3-D model of some facial feature points to derive 3-D information about the line-of-sight from a monocular image.

The gaze tracking system implemented by Stiefelhagen, Yang, and Waibel (1997a) tracks six facial feature points (eyes, lip corners and nostrils) to estimate the head pose and the gaze direction.



Collet, Finkel, and Gherbi (1997) have developed a similar gaze tracking system. They proposed to improve the robustness of their facial feature tracking subsystem by introducing a recognition module to verify the tracking results. In order to track the facial features, the field of view of the tracking camera has to be made sufficiently large to cover the entire head of a user. The advantage of this approach is that users are allowed to move their head freely in a specified area, but using a camera with a large field of view yields a lower gaze tracking accuracy. To overcome this drawback, one can use an additional camera to acquire a high-definition image of the user's eye.

Kim and Ramakrishna (1999) estimate the direction of gaze in the presence of slight head movements from a single camera image. The subject has to calibrate the system by gazing at a number of predefined screen points. The eye gaze is computed by finding correspondences between points in a geometric model of the face and points in the camera image.

Matsumoto and Zelinsky (2000) introduced a real-time 3-D stereo vision based head pose and gaze direction measurement system. The system determines the head pose by stereo matching a 3-D facial feature model to the stereo video images. The corners of the eyes and the mouth are used as anchor points, and need to be indicated by the user through mouse pointing. The gaze direction is determined from the pose of the head and the position of the irises of the eyes, using a 3-D eye model. The tracking method works well even in daylight conditions. Because it is robust, it has several application areas, such as gaze tracking while driving a car. However, the accuracy of the gaze vector is only about 3°.

2.3. Three-dimensional techniques

Pastoor et al. have implemented a gaze tracking system which consists of a head tracker for tracking the 3-D eye locations and a pan-tilt-zoom camera for tracking the gaze direction using corneal reflection and pupil images (Pastoor, Liu, & Renault, 1999). Sugioka et al. have also developed a gaze tracking system with a pan-tilt-zoom camera (Sugioka, Ebisawa, & Ohtani, 1996). An individual calibration process is required in both systems. The main difference between the two gaze tracking systems is that Sugioka et al. utilized an ultrasonic device to estimate the distance between the pan-tilt-zoom camera and the eye. However, the direction of the line-of-sight is still estimated by 2-D techniques using pupil and corneal reflection, as done in (Frey et al., 1990; Hutchinson et al., 1989; White et al., 1993). Also, the accuracy of the system is limited because the distance measured by an ultrasonic sensor is inaccurate.

Yoo (2004), Yoo and Chung (2004, 2005) and Yoo, Kim, Kim, and Chung (2002) developed a gaze tracking system based on five IR LEDs and a CCD camera. Four of the LEDs are attached to the corners of a monitor. Their reflections on the surface of the eye's cornea represent the projection of the monitor plane and can be used to determine the direction of gaze. The fifth LED is located at the center of the camera lens. The light emitted by this LED goes straight into the eye through the pupil and is reflected by the retina. This reflection makes the pupil appear bright in the camera image. The resulting difference between the intensity levels of the pupil and the iris becomes so large that it is easy to segment the pupil region. The center of the pupil is computed from an ellipse that is fitted to the boundary of the pupil region. The backprojection of the pupil center onto the monitor is computed from the geometry of the four LEDs in the corners of the monitor and their corneal projections. This projection corresponds to the point on the monitor screen that is fixated by the user. The method is simple and fast, does not require knowledge of the geometry of the cameras, monitor and eyes, the cameras do not need to be calibrated, and the method is robust for large head movements. The average accuracy of the method is about 0.65° (Yoo & Chung, 2004, 2005).
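The geometric core of this approach can be approximated as a plane-to-plane mapping between the four corner-LED glints and the four monitor corners. Yoo et al. actually use a cross-ratio formulation; the homography below is a simplified stand-in for it, and the pixel and screen coordinates in the example are purely illustrative.

```python
import numpy as np

def fit_homography(src_pts, dst_pts):
    """Direct linear transform for a 3x3 homography from 4+ correspondences."""
    rows = []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.array(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def point_of_regard(glints_img, monitor_corners_mm, pupil_center_img):
    """Map the pupil centre (image coords) to monitor coordinates (mm)
    via the homography defined by the four corner-LED glints."""
    H = fit_homography(glints_img, monitor_corners_mm)
    p = H @ np.array([pupil_center_img[0], pupil_center_img[1], 1.0])
    return p[:2] / p[2]

# Illustrative numbers only: glint pixel positions and a 400 x 300 mm screen.
glints = [(310, 240), (350, 242), (348, 270), (312, 268)]
corners = [(0, 0), (400, 0), (400, 300), (0, 300)]
print(point_of_regard(glints, corners, pupil_center_img=(330, 255)))
```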

Shih and Liu (2004) recently presented a novel approach to real-time three-dimensional (3-D) gaze tracking using 3-D computer vision techniques. This method determines the optical axis of the eye (defined as the 3-D line connecting the center of the pupil and the curvature center of the cornea) using just two cameras and two point light sources located at known positions. Before using the system, each user only has to stare at a target point for a few (2–3) seconds so that the constant angle between the 3-D line of sight and the optical axis can be estimated. If the geometric relation between the cameras and the screen has been calibrated, the optical axis of the eye can then be transformed into the screen coordinate system to obtain the fixation point on the screen. Hence, users no longer have to participate in an intensive user-dependent calibration process as required by other methods. This gaze tracking method differs from other existing methods in the following ways.

(1) A pair of calibrated stereo cameras is used to track the gaze information of one eye, whereas other methods use one camera for tracking one eye.
(2) The positions of the two point light sources are calibrated beforehand.
(3) Both the position and the orientation of the optical axis of the user's eye can be explicitly determined by solving linear equations. Thus, the gaze tracking results can be used not only for inferring fixation points on the screen but also for inferring 3-D fixation points in the real world.
(4) Because the proposed method employs a 3-D computer vision technique, users neither have to keep their head still nor spend a lot of time calibrating the gaze tracking system.

Talmi and Liu (1999) proposed a gaze tracking system that used a stereo camera for eye positioning and a separate gaze detection camera. They apply Principal Component Analysis to detect the user's eye position from the stereo camera output. Corneal reflection is used for gaze detection. However, no actual gaze data is presented, and nothing is mentioned about personal calibration.

Park and Lim (2004) proposed a reflection marker based head-direction detection method for human–computer interaction. Their method uses three infrared reflective markers that are attached to the user's eyeglasses. While the user can move his head freely with this method, the use of the markers still hinders intermittent use of the system.

Beymer and Flickner (2003) proposed a head-free gaze tracking system which combines a wide angle stereo system for eye positioning with a narrow angle stereo system for gaze detection. For high-resolution tracking, the eye is modelled in 3-D, including the corneal ball, pupil and fovea. In their system, the pan and tilt directions of the narrow angle camera are controlled using rotating mirrors with galvano motors. They attained an accuracy of 0.6° when the subject calibrated the system by looking at nine points on the screen from two different head positions.

Wang and Sung (2004) presented an approach to measure eye gaze via images of the two irises. Their method uses a pose camera and a gaze camera, the latter built on a pan-tilt unit, to obtain a high resolution image of the iris. An evaluation test showed that the method is not very accurate. The problem with these approaches is that the accuracy of gaze detection is low, which limits their application area. More recently they optimised their approach by using a high-resolution image of a single iris obtained from a zoom camera that is driven by a head pose estimation system (Wang, Sung, & Venkateswarlu, 2005). With this new approach the accuracy of gaze detection is better than 1°.

Ohno and Mukawa (2004) recently presented a stereo vision based gaze tracking system that offers free viewing in combination with a simple and fast calibration procedure. The system consists of an eye positioning unit and a gaze detection unit. The eye positioning unit detects the user's eye position using a stereo camera and controls the direction of the gaze detection unit accordingly. The gaze detection unit, which consists of a near-infrared camera and a near-infrared light-emitting diode (LED) array, detects the user's gaze direction and his point of gaze. Personal calibration of the system is performed by looking at two markers on the screen. The accuracy of the implemented system is about 1.0° (view angle).

Morimoto, Amir, and Flickner (2002) proposed a new technique for remote eye gaze tracking and detection of the point of regard that eliminates two of the major problems of most current remote eye gaze tracking systems, namely (1) the need for user calibration before each session and (2) the degradation of accuracy with head movement. The new method uses a single calibrated camera, several (a minimum of two) light sources with known positions, and a physical model of the eye to estimate the 3-D position of the eye and its gaze direction. The method has only been simulated using ray tracing. The simulation results indicate its feasibility. However, the accuracy is still rather low (2.5°).

Ruddarraju et al. (2003) developed a fast vision-based eye tracking method that locates and tracks the user's eyes as they interact with an application. Multiple cameras with infrared lighting are used to track the position of the subject's eyes. In each of the camera images the positions of the two corners of the mouth are then estimated from the tracked eye locations. The user's 3-D head pose is then estimated using the mouth corners and eye positions from all cameras as low-level input features. The use of multiple cameras provides a large tracking volume: when the user's eyes are no longer visible or are partly occluded in one camera image, another camera can take over. The position estimates of the eyes and mouth corners are used to compute a head pose vector. The method is robust to variations in lighting conditions.

Zhu and Ji (2004) presented a vision based real-time gaze tracking system using active IR illumination. They used a new gaze calibration procedure that identifies the mapping from pupil parameters to screen coordinates through generalized regression neural networks. The resulting non-analytical mapping function generalizes to individuals not used in the training process, and explicitly accounts for head movements. As a result, their system does not need user calibration and performs robust and accurate gaze estimation under rather significant head movements. However, the angular gaze accuracy is currently only about 5° horizontally and 8° vertically.

2.4. Summary

Simple and unobtrusive video-based gaze tracking systems are now available. These systems enable free viewing in combination with simple and fast calibration procedures. The accuracy of video-based eye trackers is much improved compared with previous methods and is often better than 0.5°. The use of multiple cameras can provide a large tracking volume. An additional advantage of these methods is that the camera images also convey other relevant visual information about the user, such as identity, facial expression, posture and gestures.

3. Visual information presentation techniques

Effective visual information presentation strategies may help to improve observer performance in tasks that depend critically on visual information transfer (e.g. visual surveillance, monitoring, inspection). The total amount of information that can be presented simultaneously by a display medium is restricted by the area of the screen. Traditional display techniques apply windowing techniques and scroll/zoom features in an attempt to overcome this problem. A serious drawback of these methods is that the zoomed regions are displayed in separate windows. This may result in a loss of context (situational awareness), which may in turn degrade user performance in, for instance, navigational tasks. The amount of detail needed to perform a certain task may vary between different locations on the display. Regions of interest (focus points) should be displayed with more detail. The background, on the other hand, serves to provide the global context needed to relate the focus regions to each other, and can therefore be displayed with less detail. This suggests the use of variable scale displays that simultaneously present enlarged views of the regions of interest together with a less detailed (but topologically correct) representation of their context. The scale (the level of detail) of the presented information may vary either gradually or discontinuously over the screen area. Because the display area is fixed, variable scale displays inherently induce image distortions. In this section we first give a brief overview of approaches that have been proposed to optimise visual information transfer on limited screen areas. Then we present an overview of the different techniques that have been developed to interact with the displayed information.

3.1. Display modes

In this section we will briefly review a range of visual information presentation modes that were designed to optimise visual information transfer on displays with limited screen sizes.

3.1.1. Non-distortion oriented techniques

Initially, two main strategies were designed to cope with limited display size. The first approach is to partition or tile the screen into a number of non-overlapping windows. This strategy is not feasible when a proliferation of information items needs to be displayed (e.g. windows, menus, dialog boxes, or tool palettes): a non-overlapping tiling may result in items that are simply too small to resolve. The second strategy is to use overlapping windows. Only the top one is visible at any given time, and a mechanism is provided to rapidly change which window is visible (temporal sequencing). This strategy is also undesirable, since overlapping opaque objects obscure portions of information we may need to see. Frequently, a hybrid of these two strategies is used.

Another approach to optimise visual information presentation on limited display areas is the use of semi-transparency or multi-layer displays (Beverly, Harrison, & Vicente, 1996; Harrison, Ishii, Vicente, & Buxton, 1995a; Harrison, Kurtenbach, & Vicente, 1995b). Semi-transparency can create the impression of multiple interface layers on a conventional display. Semi-transparency is traditionally achieved by using a technique called α-blending (Porter & Duff, 1984), which computes a weighted mean of two or more image layers (or parts thereof, representing the objects of interest) using uniform weight maps. The drawback of this method is that the contents of the fore- and background images appear to interfere in the resulting image. A newer technique called multi-blending alleviates this problem by using separate weight maps for individual image features like colour and texture (Baudisch & Gutwin, 2004). It was recently shown that the degree of perceived transparency of an object or layer is to a certain extent determined by the amount of attention paid to it by the observer (Tse, 2005).
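For reference, classical α-blending of two layers reduces to a per-pixel weighted mean. The sketch below shows the uniform-weight case described above, plus a spatially varying variant that illustrates the per-pixel weighting multi-blending builds on; the array shapes and the single global α are illustrative assumptions.

```python
import numpy as np

def alpha_blend(foreground: np.ndarray, background: np.ndarray,
                alpha: float = 0.6) -> np.ndarray:
    """Uniform alpha-blending: every foreground pixel is mixed with the
    background using the same weight. Inputs are float RGB images in
    [0, 1] of equal shape."""
    return alpha * foreground + (1.0 - alpha) * background

def weighted_blend(foreground: np.ndarray, background: np.ndarray,
                   weight_map: np.ndarray) -> np.ndarray:
    """Spatially varying blend: a per-pixel weight map (H x W, values in
    [0, 1]) replaces the single alpha."""
    w = weight_map[..., np.newaxis]          # broadcast over RGB channels
    return w * foreground + (1.0 - w) * background
```

Multi-blending as described by Baudisch and Gutwin (2004) derives separate weight maps for individual image features (e.g. colour, texture); the second function above only illustrates the per-pixel weighting that such schemes share.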

In applications with a transparent overview in the foreground superimposed on a detailed construction map in the background, subjects obtained optimal performance with overviews that were 50–70% transparent (Cox, Chugh, Gutwin, & Greenberg, 1998). People were able to shift their focus rapidly between the two views, to the point of initiating an action in the foreground layer and continuing it in the background layer. Semi-transparency was readily and early adopted in computer games like Diablo II (Cox et al., 1998) and Everquest (Beverly et al., 1996). In this type of game the degree of transparency of a component depends on the amount of attention paid to it by the user.

Semi-transparency has been used to create a multi-layer zoomable focus + context display (Pook, Lecolinet, Vaysseix, & Barillot, 2000). In this approach the user can selectively enlarge interesting parts of the display, while the initial top-level overview remains visible as a transparent overlay. Thus, the user remains aware of the context while zooming and scrolling through the display.

Semi-transparency also effectively encodes the spatial relations (depth cues) between objects in 3-D space. As a result, human performance in interactive 3-D computer graphics environments improves when semi-transparent tools are used (Zhai, Buxton, & Milgram, 1996).

Special (auto-)stereoscopic displays can be deployed to present semi-transparent interface layers on different perceived depth planes. In this case, the user looks through the interface presented in the foreground to see the interface presented in the background. Ideally, more information will be simultaneously visible using this approach. However, overlapping windows may still interfere with each other.

3.1.2. Distortion-oriented techniques

Focus + context displays combine a detailed (full size or enlarged) representation of the regions of interest with a less detailed (compressed) representation of the remaining regions (the background). The low-resolution representation of the background provides the context for the focus regions. Several different types of focus + context displays have been developed, including fisheye, bifocal, Perspective Wall, and polyfocal displays (see Fig. 1; for an overview and taxonomy of distortion oriented display techniques see: Carpendale & Montagnese, 2001; Leung & Apperley, 1994). Most focus + context displays are software driven, although some hardware implementations have also been suggested (e.g. Baudisch, Good, & Stewart, 2001). The principle has recently been extended to 3-D information displays (Carpendale, Cowperthwaite, & Fracchia, 1997).

Fig. 1. Illustration of the effect of two types of focus + context displays on a map of the London Underground (copyright Transport for London). (a) Bifocal fisheye display with two locally enlarged regions, the left one centered on Earl's Court and the right one centered on Bank. (b) Perspective wall display.

Fisheye displays were originally introduced as a presentation strategy for information having an inherently hierarchical structure (Furnas, 1982, 1986). The most relevant information is presented in the center of the display in great detail, whereas less relevant information is presented in the peripheral regions of the display in less detail. A threshold function is applied to determine what information is to be presented or suppressed. Fisheye view representations have successfully been applied to visualize program and calendar data (Bederson, Clamage, Czerwinski, & Robertson, 2003; Furnas, 1982, 1986), subway networks (Hollands, Carey, Matthews, & McCann, 1989), aircraft maintenance data (Mitta, 1990), and graph structures (Sarkar & Brown, 1992).
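Furnas formulated this selection as a degree-of-interest function that trades a priori importance against distance from the current focus. The sketch below follows that general additive form; the distance measure, the threshold value, and the example items are illustrative assumptions.

```python
def degree_of_interest(a_priori_importance: float,
                       distance_to_focus: float) -> float:
    """Furnas-style degree of interest: DOI(x) = API(x) - D(x, focus).
    Important items survive at larger distances from the focus;
    nearby items survive even if unimportant."""
    return a_priori_importance - distance_to_focus

def visible_items(items, focus, threshold=0.0):
    """Keep only the items whose degree of interest exceeds the threshold.

    `items` is a list of (name, api, position); the absolute difference
    used here stands in for whatever structural distance the application
    actually defines.
    """
    return [name for name, api, pos in items
            if degree_of_interest(api, abs(pos - focus)) > threshold]

# Illustrative example: a high-importance root stays visible far from the
# focus, while minor items are suppressed unless the focus is near them.
items = [("root", 10.0, 0), ("chapter", 3.0, 2), ("footnote", 0.5, 3)]
print(visible_items(items, focus=3))   # ['root', 'chapter', 'footnote']
print(visible_items(items, focus=9))   # ['root']
```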

The bifocal display was originally a one-dimensional mapping that combines a detailed central view with two distorted side views (Spence & Apperley, 1982). The concept has been extended to two dimensions in an implementation of the London Underground map (Leung, 1989), and has been applied to web browsing (Pilgrim & Leung, 1996) and calendar planning (Bederson et al., 2003).

The Perspective Wall (Mackinlay, Robertson, & Card, 1991) is a way of densely displaying large amounts of information by placing the information on a flat plane which is tilted into the screen so that it shrinks back toward infinity. More important items (such as upcoming dates on a calendar) can be displayed at a closer location and thus appear larger than the less important items that are further away.

Polyfocal displays are essentially fisheye views with multiple focus regions (Kadmon & Shlomi, 1978). They were originally proposed for the projection of statistical data on cartographic maps.

When focus + context displays are used in an interactive mode, users can navigate through the information space by selecting and moving (dragging) the regions of interest, for instance by using a pointing device (Flider & Bailey, 2004) or by finger pointing or rubbing on a touchscreen (Olwal & Feiner, 2003).

Subjects can perform route planning and steering tasks on large graphical displays (Gutwin & Skopik, 2003; Hollands et al., 1989), information extraction tasks on large documents or web pages (Baudisch, Lee, & Hanna, 2004), and driving tasks (Baudisch, Good, Bellotti, & Schraedley, 2002) significantly faster with focus + context representations than with scrolling and zooming techniques.

A comparative study of user interaction with large interfaces represented on small screens showed that subjects perform navigation tasks significantly faster with focus + context representations, perform surveillance tasks better with zooming techniques, and attain the worst performance with standard panning techniques (Gutwin & Fedak, 2004). Subjects use the context to perform large scale navigation tasks and switch attention to the focus region to perform local (small scale) navigation tasks (Flider & Bailey, 2004).

3.1.3. Hybrid techniques: magic lens filters

Magic lenses are small arbitrarily shaped windows which the user can move around (e.g. by mouse, keyboard or gaze control) over the display area in order to inspect certain areas of the display in more detail or to locally reveal additional information (Stone, Fishkin, & Bier, 1994). The operators associated with these windows or magic lenses can range from simple image processing operations like local image enlargement (in order to see more detail), contrast enhancement, and noise or clutter reduction, to quite general operations, for instance to view different types of information related to the area that is being inspected (see Fig. 2). The latter type of operation is frequently applied in geographical information systems, military mission planning tools and roadmaps. For instance, an engineer using a geographical information system to plan some major construction works in a city center may need to view the exact location of underground pipelines and/or cables relative to some building blocks. In this case these additional details can be shown through a magic lens positioned on the building blocks, whereas the rest of the map remains unchanged (to prevent severe clutter of the display: the map merely serves to provide the context in this case). Multiple operators can be applied in a single magic lens. In the previous example, the magic lens showing the location of pipelines and cables can be used in combination with an edge-enhancement and a contrast reduction operator that produces a low-contrast outline image of the city map, which can serve as a background for the schematic map of the pipelines and cables, such that the latter are more clearly visible in the context of the city map.

Fig. 2. Illustration of the Magic Lens concept. Left column: visible light satellite image of North America. Right column: geographical map of North America. The white circle in the images in the left column represents the border of the magic lens support. The white circle in the images in the right column represents the position of the magic lens relative to the geographical map. From top to bottom: black outline of state boundaries (a,b), white outline of Air Route Traffic Control Center boundaries (c,d), yellow triangles representing the VOR aids to navigation (e,f), light blue diamonds of the major airports (g,h). In each case the image on the left represents the information presented through a magic lens moving over the satellite image, and the image on the right shows the overall information map residing in the underlying database. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)

Magic lenses have a number of potential advantages over traditional methods of generating alternate views and filtering information. Binding the filter to a spatially bounded, movable region creates an easily understood user model based on experience with physical lenses (using these filters is just like moving a looking glass over a newspaper). Limiting the view to a local region preserves context and can reduce clutter. Lenses can be parameterized, and can have arbitrary shapes. The user can apply different operators simultaneously over different parts of the displayed information to get multiple, simultaneous views. Lenses that overlap combine their effects (e.g. by multiplying their magnification), making it easy to create visual macros (Carpendale, Ligh, & Pattison, 2004). These macros can be temporary, or can be "welded together" to create a compound lens that encapsulates a set of operators and parameters. Finally, the magic lens metaphor can be used uniformly across applications.
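The following is a minimal sketch of the magic lens idea: an arbitrary operator is applied only inside a circular support, and the result is composited back onto the untouched context. The grayscale image, circular support and example 'invert' operator are illustrative choices, not details from the cited systems.

```python
import numpy as np

def apply_magic_lens(image: np.ndarray, center, radius: float, operator):
    """Apply `operator` (an image -> image function) only inside a circular
    lens; outside the lens the original image provides the context."""
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    inside = (xs - center[0]) ** 2 + (ys - center[1]) ** 2 <= radius ** 2
    result = image.copy()
    transformed = operator(image)
    result[inside] = transformed[inside]
    return result

# Example: reveal an alternate information layer (here simply an inverted
# view) under a lens that follows, e.g., the mouse or the gaze position.
image = np.random.rand(480, 640)
lensed = apply_magic_lens(image, center=(320, 240), radius=80.0,
                          operator=lambda img: 1.0 - img)
```

Because the operator is a plain function, overlapping lenses or compound lenses amount to composing such functions over the intersected supports.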

Another application of a gaze interactive display which is closely related to the magic lens concept is a system to view multiple overlaid images from different types of sensors, e.g. registered two-dimensional CT (depicting bone structures) and MR (showing soft tissue) images (Nikolov, Bull, Canagarajah, Jones, & Gilchrist, 2002). Such a system can for instance simultaneously display one image modality in the central part of the visual field and another modality in the periphery. Provided that the system parameters are correctly chosen, the human visual system fuses the two images into a single consistent percept. In the abovementioned medical application such a system allows the observer to continuously investigate the soft tissue from the MR image in the central part of vision, while viewing the bone structure as context in peripheral vision. Applications of multisensor displays are also found in geographic information systems and satellite image viewing systems (Nikolov, Gilchrist, Bull, Canagarajah, & Jones, 2003).

3.1.4. Multiple screens

An interesting application of gaze interactive displays is the use of multiple screens. In this case, the main or central screen presents the global overview of the data that is inspected, and the system analyses the eye movements of the user on the central screen to select the additional and/or detailed information that is presented on the peripheral screens. A typical application is a naval information display in which a single screen is divided into two halves: the right half is gaze sensitive and displays a naval map on which the positions of ships are marked, and the left half displays information about the ship that was last fixated on the right (gaze sensitive) part of the screen (Hyrskykari, 1997; Jacob, 1991). Using a similar setup, Pomplun, Ivanovic, Reingold, and Shen (2001) found that user performance with a gaze controlled display in a comparative visual search task that requires zooming in and out is similar to performance with a mouse.

In a typical application (e.g. surveillance, crowd monitoring, inspection) the user can for instance request additional (detailed) information on a visual detail that he inspects in the main scene. This information can either be presented automatically by the system when the observer inspects the detail longer than a threshold dwell time, or the user can actively request information on a currently fixated detail by pressing a button or giving voice commands.

A system with multiple screens may also be useful for the display of multimodal night vision images. For instance, when viewing a night vision surveillance scene in an intensified image mode on the central screen, the observer can inspect suspect locations (identified through his prolonged fixations) in another image modality (e.g. IR or LADAR) on an adjacent screen.

An obvious advantage of a gaze interactive system with multiple screens is the fact that additional task supporting information can easily be provided on peripheral screens, while the observer is performing his task using the main or central screen. Crucial information on the central screen remains visible at all times, whereas the supporting information can be updated and altered on demand. A potential disadvantage of such a system is the fact that the observer has to make eye movements to the peripheral displays in order to perceive the additional information presented there. During the time intervals in which the observer inspects the peripheral displays, changes in the central display may go unnoticed. This may be remedied by signalling changes in the centrally displayed visual information through other sensory channels, e.g. by using auditory warning signals. However, since the peripheral displays can be placed adjacent to the main central display, the impact of this effect is probably negligible.

3.1.5. Gaze contingent multiresolution displays

Gaze-contingent multiresolutional displays center high-resolution information on the user's gaze position, matching the user's area of interest (Parkhurst & Niebur, 2002; Reingold et al., 2003; for recent overviews see Duchowski, Cournia, & Murphy, 2004). Image resolution and detail outside the area of interest are reduced, lowering the requirements for processing resources and transmission bandwidth in demanding display and imaging applications. Gaze-contingent displays integrate a system for tracking viewer gaze position (by combined eye and head tracking) with a display that can be modified in real time to center the area of interest at the point of gaze. Applications of this research include flight, medical, and driving simulators; virtual reality; remote piloting and teleoperation; infrared and indirect vision; image transmission and retrieval; telemedicine; video tele-conferencing; and artificial vision systems. The size of the central high-resolution part of a gaze contingent multiresolution display should be at least about 4° to obtain search times that are equal to those found in normal viewing conditions (Loschky & McConkie, 2000; Parkhurst, Culurciello, & Niebur, 2000; Shioiri & Ikeda, 1989). Decreasing either the central region size or the peripheral resolution leads to longer search times and decreasing saccade sizes (Loschky & McConkie, 2000; Shioiri & Ikeda, 1989; van Diepen & Wampers, 1998). However, there are indications that target saliency degrades with decreasing peripheral resolution (Reingold & Loschky, 2002), making gaze contingent displays probably less suitable for visual search and detection applications.
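A common way to drive such a display is to let the allowed level of detail fall off with eccentricity from the tracked gaze point, roughly following the decline of visual acuity. The sketch below does this with a simple reciprocal falloff; the constant, the discrete pyramid levels and the printed example are illustrative assumptions rather than parameters from the cited studies.

```python
import math

def relative_resolution(eccentricity_deg: float, e2: float = 2.3) -> float:
    """Fraction of full display resolution that is still perceptually useful
    at a given eccentricity, using a reciprocal acuity falloff
    R(e) = e2 / (e2 + e). At the gaze point (e = 0) this is 1.0."""
    return e2 / (e2 + eccentricity_deg)

def pyramid_level(deg_from_gaze: float, num_levels: int = 5) -> int:
    """Choose a resolution-pyramid level for a pixel: level 0 is full
    resolution, and each subsequent level halves the resolution."""
    needed = relative_resolution(deg_from_gaze)
    level = int(math.floor(math.log2(1.0 / needed)))
    return max(0, min(num_levels - 1, level))

# Pixels within a few degrees of the gaze point stay sharp, while pixels at
# 20 degrees eccentricity can be rendered several pyramid levels coarser.
for ecc in (0.0, 2.0, 5.0, 10.0, 20.0):
    print(ecc, pyramid_level(ecc))
```

The same per-pixel level map can equally be used to allocate transmission bandwidth or to steer geometric simplification, which is the idea behind the foveated video and rendering applications discussed below.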

Real-time interactive gaze-contingent foveation is an increasingly popular approach to reduce the size of video streams that need to be transmitted over bandwidth-limited communication channels. Gaze-contingent video transmission typically uses an eye-tracking device to record eye position from a human observer on the receiving end, and applies a real time foveation filter to the video contents at the source (Perry & Geisler, 2002). Thus, most of the communication bandwidth is allocated to high-fidelity transmission of a small region around the viewer's current point of regard, while peripheral image regions are highly degraded and transmitted over the little remaining bandwidth. This approach is particularly effective, with observers often not noticing any degradation of the signal, if it is well matched to their visual system and viewing conditions. Further, online analysis of the observer's patterns of eye movements may allow more sophisticated interactions than simple foveation, like zooming-in and other computer interface controls (Goldberg & Schryver, 1995).
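
As a rough illustration of this kind of foveation filtering (not the Perry & Geisler implementation itself), the sketch below blends a sharp and a blurred copy of each frame as a function of distance from the gaze point; the window radius and blur strength are arbitrary assumptions, and OpenCV is assumed to be available.

```python
import numpy as np
import cv2  # OpenCV, assumed available

def foveate(frame, gaze_xy, fovea_radius_px=130, blur_ksize=31):
    """Keep full resolution near the gaze point, blur the periphery.

    frame is assumed to be an H x W x 3 colour image. A real codec would
    allocate bits instead of blurring, but the principle (fidelity falls
    off with eccentricity) is the same.
    """
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.sqrt((xs - gaze_xy[0]) ** 2 + (ys - gaze_xy[1]) ** 2)
    # 1.0 inside the foveal window, falling smoothly to 0.0 in the periphery
    weight = np.clip(1.0 - (dist - fovea_radius_px) / fovea_radius_px, 0.0, 1.0)
    weight = weight[..., None]  # broadcast over the colour channels
    blurred = cv2.GaussianBlur(frame, (blur_ksize, blur_ksize), 0)
    return (weight * frame + (1.0 - weight) * blurred).astype(frame.dtype)
```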

Interactive rendering of large-scale geometric datasets is an enabling technology for many far-flung fields, ranging from scientific and medical visualization to entertainment, architecture, military training, and industrial design. Despite tremendous strides in computer graphics hardware, the growth of large-scale models continues to outstrip our capability to render them interactively. Gaze directed simplification can be used to degrade the scene more aggressively in the viewer's peripheral vision than at the center of their gaze, such that the overall simplification is imperceptible to the user (Luebke, Hallen, Newfield, & Watson, 2000; Williams, Luebke, Cohen, Kelley, & Schubert, 2003).
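
The underlying selection rule can be sketched as follows: pick a coarser level of detail for objects that lie at larger angular distances from the gaze direction. The thresholds below are made-up example values, not those used in the cited work.

```python
import math

def angular_distance_deg(gaze_dir, obj_dir):
    """Angle in degrees between two 3-D unit direction vectors."""
    dot = sum(g * o for g, o in zip(gaze_dir, obj_dir))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

def choose_lod(gaze_dir, obj_dir, thresholds_deg=(5.0, 15.0, 30.0)):
    """Return 0 (finest) .. len(thresholds) (coarsest) based on eccentricity."""
    ecc = angular_distance_deg(gaze_dir, obj_dir)
    for level, limit in enumerate(thresholds_deg):
        if ecc <= limit:
            return level
    return len(thresholds_deg)

# Example: an object 20 deg away from the line of sight gets level-of-detail 2
print(choose_lod((0.0, 0.0, 1.0),
                 (math.sin(math.radians(20)), 0.0, math.cos(math.radians(20)))))
```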

3.1.6. Gaze contingent stereoscopic displays

A novel gaze contingent display technique uses real-time stereoscopic gaze-tracking to measure the observer's 3-D fixation point and enhance the appearance of the volumetric image representation of the region around this point (Jones & Nikolov, 2000). The method has been demonstrated for the display of both integrated and fused multimodal medical volumetric data (Jones & Nikolov, 2004; Nikolov, Jones, Agrafiotis, Bull, & Canagarajah, 2001). Region-enhanced volume rendering was employed in cases where volumes are viewed from the outside, while combined surface and volume rendering was used when viewing the volumes from the inside, e.g. in virtual endoscopy applications or more generally in volume navigation.

3.1.7. Gaze contingent attention guiding displays

Gaze contingent displays can also be deployed to actively guide the attention of observers to certain parts of a scene. For instance, by overlaying red dots or looming icons over video clips it is possible to direct the visual scan path (Dorr, 2004; Dorr, Martinetz, Gegenfurtner, & Barth, 2004). The saliency of the items used to attract the observer's attention can be adjusted to the visual angle between the current direction of gaze of the observer and the next desired fixation location. In this way, entire prescribed scan paths can be imposed on observers inspecting a scene. Important application areas are surveillance and advertising. In surveillance applications the observer's attention can be drawn to unusual activities. In advertising the focus of attention can be directed towards the product that needs to be promoted.
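
A minimal sketch of such a guiding scheme is given below, assuming a cue whose saliency grows with the angular (here, on-screen) distance between the current gaze position and the next desired fixation; the scaling constant and scan path are illustrative only.

```python
def cue_strength(gaze_px, target_px, full_strength_dist=400.0):
    """Return a cue saliency in [0, 1] that grows with gaze-to-target distance.

    Far-away targets get a conspicuous cue; once the gaze is close, the cue
    fades so that it does not mask the scene content.
    """
    dx = target_px[0] - gaze_px[0]
    dy = target_px[1] - gaze_px[1]
    dist = (dx * dx + dy * dy) ** 0.5
    return min(1.0, dist / full_strength_dist)

scan_path = [(100, 120), (640, 300), (900, 700)]  # desired fixation sequence
gaze = (110, 118)
next_target = next((p for p in scan_path if cue_strength(gaze, p) > 0.05), None)
print(next_target, cue_strength(gaze, next_target))  # -> (640, 300) 1.0
```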

3.2. Interface modalities

Visual information display techniques may be more effective when the user can dynamically adjust the display mode to his momentary information requirements during each stage of a task. The interactive adjustment can for instance be done through keyboard commands, mouse or hand pointing, voice commands, or gaze directing. The use of multimodal interfaces, which combine two or more input modalities in a coordinated way, may help to reduce the ambiguity of the individual input devices and to enhance the interaction speed (Oviatt, 1996, 2003; Oviatt & Cohen, 2000). In this section we will give a brief overview of the different input modalities that have recently been applied in the literature to interact with dynamic visual information displays.

Interactive fisheye lenses allow for varying magnification and a movable focus point. Unfortunately, magnification effects may make it seem as if objects in the fisheye view are moving when the view is changed, thereby complicating target acquisition. These problems become even more evident with touchscreen-controlled fisheye views, where there is no continuous navigation and pointing is inaccurate. However, it has been shown that adding transparency and additional stabilized representations of the view facilitates rapid targeting in fisheye views, even on touchscreens (Olwal & Feiner, 2003).

While eye gaze tracking has long been used as an input technique, eye contact as an interaction mechanism has been less explored. Using eye contact, a user simply looks at a device to address it, the device reciprocates eye contact to communicate attention, and a user's intended actions are mapped to that device (Kembel, 2003). Gaze or eye contact can be established through devices that detect the presence of human eyes in their field of view. Eye blinks can be detected from the differences between successive frames (Kawato & Tetsutami, 2004; Ohno, Mukawa, & Kawato, 2003). Recently, gaze or eye contact sensors have been developed that are cheap, unobtrusive and tolerant to user head movement (e.g. Amir, Zimet, Sangiovanni-Vincentelli, & Kao, 2005; Morimoto, Koons, Amir, & Flickner, 2000; Selker, Lockerd, & Martinez, 2001). They are therefore highly suitable to build into small electronic devices like cameras, PDA's and cellphones, thus making it possible to initiate contact with a device simply by looking at it (Shell, Vertegaal, & Skaburskis, 2003, 2004). In a typical application, a device equipped with an eye contact sensor may choose an unobtrusive method to signal the user (e.g. by vibration in case of a mobile phone) when it has not received the user's attention for a while, and it may open a communication channel once the user has acknowledged the device's request for attention by looking at it.

Object selection by gaze control is significantly faster than by mouse control (Ohno, 1998), especially with large display screens and virtual environments and for relatively large targets (Sibert & Jacob, 2000; Sibert, Jacob, & Templeman, 2001). However, functions like dragging and double clicking are better performed by manual pointing (Jacob, 1993). It is far more natural to manipulate an object with the hand than with gaze. This suggests that gaze and mouse control should be used in a complementary fashion (Salvucci & Anderson, 2000). In this way, mouse pointing may serve to disambiguate user input, e.g. when selecting small targets (Yamato, Monden, Matsumoto, Inoue, & Torii, 2000). In this case eye gaze controls the "ballistic" movements of the cursor from one screen location to a region of tolerance in which the final target point is located, whereas manual control serves to compensate for over- and undershoots and lock the cursor onto the target (Zhai, 2003; Zhai, Morimoto, & Ihde, 1999).
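
This division of labour can be sketched as follows; the snippet is an illustrative approximation of "MAGIC"-style pointing in the spirit of Zhai et al. (1999), with an arbitrary tolerance radius rather than the parameters of the original system.

```python
def combined_cursor(gaze_px, mouse_delta_px, cursor_px, tolerance_px=120):
    """Warp the cursor to the gaze point for large movements ('ballistic' phase),
    then let small mouse movements do the fine positioning.
    """
    dx = gaze_px[0] - cursor_px[0]
    dy = gaze_px[1] - cursor_px[1]
    if (dx * dx + dy * dy) ** 0.5 > tolerance_px:
        cursor_px = gaze_px                      # gaze performs the coarse jump
    return (cursor_px[0] + mouse_delta_px[0],    # mouse performs the fine adjustment
            cursor_px[1] + mouse_delta_px[1])

print(combined_cursor(gaze_px=(800, 420), mouse_delta_px=(-3, 2), cursor_px=(100, 100)))
# -> (797, 422): the cursor jumps to the gaze region, then the mouse nudges it onto the target
```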

Eyesight and speech are two channels that humans naturally use to communicate, especially when their hands are occupied. Recognition ambiguities of speech and gaze inputs are inevitable. Combining the two overcomes imperfections of the individual recognition techniques, compensates for the drawbacks of a single mode, supports mutual correction of the individual input signals, reduces the error rate and improves interaction efficiency (Zhang, Imamiya, Go, & Mao, 2004). For instance, eye gaze provides additional information that can be used to disambiguate speech, allowing an operator's verbal commands to be directed to the appropriate receiver (Glenn et al., 1986; Maglio, Matlock, Campbell, Zhai, & Smith, 2000b). Another benefit of a combined gaze and speech multimodal input device is that it eliminates the need for the lengthy definite descriptions that would be necessary for unnamed objects if only speech were used. Instead, a terse description accompanied by the user's natural gaze behaviour can be used (Zhang et al., 2004). The use of simplified speech contributes to both error avoidance and user acceptance. Voice commands in combination with speech recognition can for instance be used to zoom and scroll images, or direct the focus of interest in focus + context displays.
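
A toy sketch of this kind of fusion is shown below: a spoken command without an explicit object name ("zoom in on that") is resolved to the on-screen object closest to the gaze position at the moment of speaking. The object list and distance threshold are invented for the example.

```python
def resolve_referent(command, gaze_px, objects, max_dist_px=150):
    """Map an underspecified spoken command onto the fixated object, if any."""
    def dist(obj):
        cx, cy = obj["center"]
        return ((cx - gaze_px[0]) ** 2 + (cy - gaze_px[1]) ** 2) ** 0.5

    candidate = min(objects, key=dist)
    if dist(candidate) <= max_dist_px:
        return command, candidate["name"]
    return command, None  # gaze does not disambiguate; ask the user instead

objects = [{"name": "vehicle", "center": (320, 240)},
           {"name": "building", "center": (900, 500)}]
print(resolve_referent("zoom in on that", gaze_px=(310, 255), objects=objects))
# -> ('zoom in on that', 'vehicle')
```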

Gaze direction and facial expression, which are subserved by underlying facial muscle activity, are used frequently and fully automatically in human interaction. Some recent studies investigate the feasibility of combining voluntarily directed eye movements (i.e., voluntarily controlled fixations and gaze direction) and voluntarily produced changes in the level of electrical activity of facial muscles as a new human–computer interaction technique (Partala, Aula, & Surakka, 2001; Surakka, Illi, & Isokoski, 2004). The voluntary use of the facial muscle corrugator supercilii works well for clicking, as a counterpart of the mouse button press. With the new technique the Midas touch problem and the use of a hardware button press are totally avoided. This means that the user's hands are left free for other purposes.

Pointing in 3-D to interact with objects or to change the viewpoint of cameras can be done with 3-D tracked mouse devices, with data gloves, or through hand tracking using 3-D computer vision techniques. In virtual environments hand-based pointing is faster than gaze-based pointing for distant objects (Cournia, Smith, & Duchowski, 2003).

Dynamically adjusting the distortion level based on pointer velocity and acceleration improves the usability of fisheye views without requiring the user to manipulate any additional view controls (Gutwin, 2002).
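
A hedged sketch of this idea (in the spirit of Gutwin's speed-coupled flattening, not his exact formulation) is shown below: the fisheye magnification is reduced as the pointer moves faster and restored when it slows down. The maximum magnification and flattening speed are arbitrary example values.

```python
def distortion_level(pointer_speed_px_s, max_magnification=4.0, flatten_speed=1500.0):
    """Scale fisheye magnification down with pointer speed.

    At rest the full magnification is available; during fast pointer travel
    the view flattens towards 1.0 (no distortion), which eases targeting.
    """
    flatten = min(1.0, pointer_speed_px_s / flatten_speed)
    return max_magnification - (max_magnification - 1.0) * flatten

print(distortion_level(0.0))      # 4.0  (stationary pointer: full fisheye)
print(distortion_level(750.0))    # 2.5  (moderate speed: partially flattened)
print(distortion_level(3000.0))   # 1.0  (fast movement: flat view)
```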

Computer vision is an effective input modality for human–computer interaction (Turk, 2004; Turk & Kolsch, 2004). Video-based sensing is passive and non-intrusive, and does not require physical contact with the user or any special purpose devices. The sensor itself can be used for other imaging purposes as well. A vision based interface can support a range of functionalities in an interactive system, by conveying relevant visual information about the user, such as:

• Face detection and location: How many people are in the scene and where are they?
• Face recognition: Who is it?
• Head and face tracking: Where is the user's head, and what is the specific position and orientation (pose) of the face?
• Facial expression analysis: Is the user smiling, laughing, frowning, speaking, sleepy?
• Audiovisual speech recognition: Using lip-reading and face-reading along with speech processing, what is the user saying?
• Eye-gaze tracking: Specifically where are the user's eyes looking?
• Body tracking: Where is the user's body and what is its articulation?
• Hand tracking: Where are the user's hands, in 2-D or 3-D? What are the specific hand configurations?
• Gait recognition: Whose style of walking/running is this?
• Recognition of postures, gestures, and activity: What is this person doing?

Gesture recognition has for instance successfully been applied to perform basic window management tasks (Wilson & Oliver, 2003) and to play a virtual game of solitaire without any further computer interaction (Parker & Baumback, 2003). Body posture recognition has been used to interact with virtual environments (Tollmar, Demirdjian, & Darrell, 2003). Facial feature tracking has been applied to construct a lip-reading system (Yang, Stiefelhagen, Meier, & Waibel, 1998). Frowning has been used to replace selection by mouse clicking (Partala et al., 2001; Surakka et al., 2004).

3.3. Attentive interfaces

An attentive (or reactive) interface dynamically prioritizes the information it presents to its users, such that the information processing resources of both user and system are optimally distributed across a set of tasks (Vertegaal, 2002, 2003). More precisely, attentive user interfaces:

• monitor user behaviour,
• model user goals and interests,
• anticipate user needs,
• provide users with information, and
• interact with users.

Attentive user interfaces are related to perceptual user interfaces, which incorporate multimodal input, multimedia output, and human-like perceptual capabilities to create systems with natural human–computer interactions (Oviatt & Cohen, 2000; Turk & Robertson, 2000). Whereas the emphasis of perceptual user interfaces is on coordinating perception in human and machine, the emphasis of attentive user interfaces is on directing attention in human and machine. For a system to attend to a user, it must not only perceive the user but it must also anticipate the user. The key lies not in how it picks up information from the user or how it displays information to the user; rather, the key lies in how the user is modelled and what inferences are made about the user.

Attentive displays can be particularly useful in domains with tasks that require visual, spatial and causal reasoning. These domains share five characteristics (Narayanan & Yoon, 2003):

(1) objects of the domain are spatially distributed;
(2) the domain is dynamic, i.e. objects and their properties change over time;
(3) objects causally interact with each other;
(4) such interactions can be traced along chains of cause-effect relationships that branch and merge in spatial and temporal dimensions; and
(5) predicting the future evolution of a system in the domain requires reasoning from a given set of initial conditions and inferring these causal chains of events.

Examples of domains satisfying these criteria include mechanics, meteorology and military planning. An example of a task in meteorology is understanding the weather conditions, and then making a forecast, from an interactive display with various kinds of weather maps, satellite imagery and other information. A military example is mission planning, where a range of intelligence information from a variety of different sources is used to draw up the final plan. In these types of tasks, an attentive interface must leverage knowledge about the task that the user is engaged in and the trajectory of the user's attention shifts in order to provide the right information in the right place and at the right time.

User behaviour may be monitored, for example, by video cameras to watch for certain sorts of user actions such as eye movements (Jacob, 1993; Zhai et al., 1999) or hand gestures (Bolt, 1980), by microphones to listen for speech or other sounds (Oviatt & Cohen, 2000), by monitoring heart rate variability or motor activity using electroencephalogram analysis (Chen & Vertegaal, 2004), or by a computer's operating system to track keystrokes, mouse input, and application use (Horvitz, Breese, Heckerman, Hovel, & Rommelse, 1998; Kotval & Goldberg, 1998; Linton, Joy, & Schaefer, 1999; Maglio, Barrett, Campbell, & Selker, 2000a). User goals and interests may be modelled using Bayesian networks (Horvitz et al., 1998), predefined knowledge structures (Selker, 1994), or heuristics (Maglio et al., 2000a; Maglio, Campbell, Barrett, & Selker, 2001). User needs may be anticipated by modeling task demands (Selker, 1994). Information may be delivered to users by speech or by text (Maglio et al., 2000a; Oviatt & Cohen, 2000), and users may interact directly through eye gaze, gestures or speech (Bolt, 1980; Jacob, 1993; Starker & Bolt, 1990; Zhai et al., 1999). By statistically modeling the interactive user behaviour, attentive displays may establish the urgency and relevance of the displayed information in the context of the current activity. They may use this information to adjust their renderings to provide peripheral context in support of focused activity. The optimal allocation of attentional resources requires careful interruption management (Chen & Vertegaal, 2004). Poorly designed attentive interfaces can be counterproductive if they distract the user or make false inferences about the user's needs and goals.
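
As a purely illustrative sketch (not a published model), the snippet below combines an urgency and relevance estimate for a notification with an estimate of the user's current mental load to decide whether to interrupt now, defer, or suppress the message; the weights and thresholds are arbitrary assumptions.

```python
def interruption_policy(urgency, relevance, mental_load):
    """All inputs in [0, 1]; returns 'notify now', 'defer', or 'suppress'."""
    value = 0.6 * urgency + 0.4 * relevance   # how much the message matters
    cost = mental_load                        # how disruptive an interruption would be
    if value > cost + 0.2:
        return "notify now"
    if value > 0.3:
        return "defer"                        # queue until the load drops
    return "suppress"

print(interruption_policy(urgency=0.9, relevance=0.8, mental_load=0.4))  # notify now
print(interruption_policy(urgency=0.5, relevance=0.4, mental_load=0.9))  # defer
```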

Examples of attentive displays are gaze-assisted applications for reading electronic documents written in a foreign language (Hyrskykari, Majaranta, & Raiha, 2003; Khiat, Matsumoto, & Ogasawara, 2004). The system observes the reader's eye movements and proactively provides help when the user appears to have comprehension problems.

Perceptual intelligent interfaces are attentive multimodal interfaces that learn. They adapt their behaviour to suit the user, rather than the other way around, and they do so by paying attention to the user and his surroundings in the same way another person would do (Pentland, 2004). They enable rich, natural, and efficient interaction with computers, by leveraging sensing (input) and rendering (output) technologies (Turk & Kolsch, 2004). These interfaces learn by tracking the observer and his interactions with the environment over time. As a result they can anticipate the user's actions and his information requirements.

4. Application of gaze directed displays in attention aware systems

In this section we will identify some promising applications of gaze directed displays in attention aware systems. In each case we will argue how the application may serve to support human attentional processes.

4.1. Multi-layer displays

In combination with a gaze tracking device, a semi-transparent or multi-layer display (e.g. Deep Video Imaging Ltd., 2004) can be converted into an adaptive user interface.

In the case of semi-transparent displays, the transparency level of regions on which an observer dwells may automatically change (becoming more opaque) in response to the user's viewing behaviour.


In the case of multiple depth layer displays, the depth layer on which the observer actually fixates may turn opaque, whereas the layers in front may become fully transparent. Also, fixated objects may be transferred to a different (more prominent) depth plane. This may be a useful feature for surveillance systems and complex control displays. For instance, when monitoring a large crowd, previously inspected suspect individuals may be transferred to a front plane, making them easier to track. Another example is a complex plant control display, where it may be useful to make control items that require frequent inspection more prominent by transferring them to separate depth planes.
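
A minimal sketch of such dwell-driven layer management is given below, assuming the display exposes a per-layer opacity and that a fixation longer than some dwell threshold counts as deliberate interest; the threshold and opacity values are illustrative only.

```python
import time

class LayerManager:
    """Bring the fixated depth layer forward after a sustained dwell."""

    def __init__(self, n_layers, dwell_threshold_s=0.6):
        self.opacity = [1.0 if i == 0 else 0.3 for i in range(n_layers)]
        self.dwell_threshold_s = dwell_threshold_s
        self._current = None
        self._since = None

    def on_fixation(self, layer_index, now=None):
        now = time.monotonic() if now is None else now
        if layer_index != self._current:
            self._current, self._since = layer_index, now
        elif now - self._since >= self.dwell_threshold_s:
            # sustained dwell: make the fixated layer opaque, fade the others
            self.opacity = [1.0 if i == layer_index else 0.2
                            for i in range(len(self.opacity))]
        return self.opacity

mgr = LayerManager(n_layers=2)
mgr.on_fixation(1, now=0.0)
print(mgr.on_fixation(1, now=0.7))   # [0.2, 1.0] after 0.7 s of dwell on layer 1
```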

Depending on the nature of the task a user either needs to focus his attention on a single item (e.g. inspection tasks) or he needs to share or divide his attention between several items of interest (e.g. crowd monitoring). Several key design issues need to be investigated if users are expected to focus on or divide attention between superimposed images:

• Can users selectively attend to a chosen "layer" without visual interference from the other?
• Are there certain display characteristics or task properties which facilitate or preclude overlapping displays?
• How do these design choices affect attentional performance?

A potential benefit of the application of a multi-layer display in a maritime setting is the fact that it will enable the operator to more easily track multiple objects. A possible disadvantage is the fact that the operator may miss temporal changes in the status of certain objects when concentrating on different depth layers.

4.2. Eye contact

While eye contact sensors give less information than a conventional eye tracker, the devices are extremely compact and inexpensive and can therefore be used in many situations where a conventional eye tracker would not be feasible (Kembel, 2003; Selker et al., 2001). Devices (e.g. PDA's, cellphones, displays) equipped with eye contact sensors can switch from an unattended to an attentive state whenever a user looks at them. In an unattended state they may use auditory signals or flashing lights to alert the user whenever they need to convey urgent information. In an attended state they may switch to a subdued or silent mode, because they already have the user's attention. Displays equipped with eye contact sensors can even personalise the displayed information, because eye contact sensors can also be used to detect the identity of the observer (Selker et al., 2001). Similarly, devices equipped with eye contact sensors can also personalise their response (e.g. the light levels in case of dimmers, the sound and image settings in case of televisions and stereos). The combination of eye contact sensors with speech recognition seems very promising, since it allows devices to be activated and to respond to spoken commands only when they are fixated by the user.
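
The sketch below captures this attended/unattended behaviour as a small state machine; the signalling channels (vibrate, display) are hypothetical stand-ins for whatever output the appliance actually has.

```python
class EyeContactDevice:
    """Switch between an unattended (alerting) and an attended (subdued) mode."""

    def __init__(self):
        self.attended = False
        self.pending_message = None

    def on_eye_contact(self, detected):
        self.attended = detected
        if detected and self.pending_message:
            print("display:", self.pending_message)       # user is looking: show it quietly
            self.pending_message = None

    def notify(self, message, urgent=False):
        if self.attended:
            print("display:", message)                     # already has the user's attention
        elif urgent:
            print("vibrate / flash, then hold:", message)  # unattended: raise attention first
            self.pending_message = message
        else:
            self.pending_message = message                 # wait until the user looks

phone = EyeContactDevice()
phone.notify("meeting in 5 minutes", urgent=True)
phone.on_eye_contact(True)   # the user looks at the phone; the held message is shown
```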

A potentially promising application appears to be equipping public systems with a combination of Bluetooth receivers and eye contact sensors. If visually impaired or handicapped people carry Bluetooth devices signalling their status, the public visual displays they inspect may for instance optimize the displayed information for the particular visual impairment of the observer, and doors may be opened simply by looking at them.


4.3. Gaze contingent displays

Displays with integrated eye trackers are currently available (e.g. the Tobii 1750 eye tracker; see www.tobii.se). Such a display would be highly suitable to implement a system for gaze contingent display of fused multimodal images. An observer can for instance use such a system to inspect a night vision surveillance scene. His fixation location on the screen is registered by the system and can for instance drive a magic lens that can locally display each of the individual image modalities or fused combinations of the individual input image modalities. In combination with another input interface modality (e.g. speech or gesture recognition, a pointing device or just keyboard buttons) the observer can indicate what information about the fixated region should be displayed (e.g. IR, II, LADAR, color fused, grayscale fused, edge density, etc.).

Another option is to replace the screen of a commercially available integrated eye tracker display like the Tobii 1750 with a two-layer display. With such a configuration the observer can easily indicate the region of interest merely by looking at it and tell the system (e.g. through speech or by pressing some buttons) to place this region in a different depth layer to facilitate later retrieval.

An integrated eye tracker display like the Tobii 1750 can also be used to boost observer performance in visual search and detection tasks (e.g. surveillance) by (1) driving the human visual scanning process, (2) alerting the observer in case his attention fades (which the computer can assess by monitoring his fixation behaviour over time), or (3) providing additional information related to the inspected location. Suppose the observer is scanning the image. The system can then indicate a potential target or interesting event (e.g. by drawing a circle around it, by flashing it, or by any other means to raise its visual conspicuity). To inspect the target the subject will make a saccade to the indicated region. Once the subject has inspected the indicated region the system may suggest a next potential target location, etc. The system can keep track of all regions that have been inspected to prevent repeated inspection of the same zones. It is also possible to provide additional information related to the inspected target region, e.g. by combining the overview image of the surveyed scene with a high resolution inset representing a close-up of the inspected location. This close-up can be obtained from a camera with a telephoto lens that is aimed at the inspected location. The steering of the telephoto lens itself can be performed under computer control (i.e. it can be gaze directed), when a calibrated mapping exists between the screen coordinates and the surveyed scene.
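
A simplified sketch of the bookkeeping such a system needs is shown below: it marks candidate regions as inspected once the gaze has come close enough to them and proposes the nearest uninspected candidate as the next cue location. The dwell criterion and region list are invented for the example.

```python
class ScanAssistant:
    """Suggest uninspected candidate regions and remember which were visited."""

    def __init__(self, candidates, dwell_radius_px=80):
        self.remaining = list(candidates)      # (x, y) centres of potential targets
        self.dwell_radius_px = dwell_radius_px

    def update(self, gaze_px):
        # Mark any candidate the observer is currently fixating as inspected.
        self.remaining = [c for c in self.remaining
                          if ((c[0] - gaze_px[0]) ** 2 +
                              (c[1] - gaze_px[1]) ** 2) ** 0.5 > self.dwell_radius_px]

    def next_suggestion(self, gaze_px):
        if not self.remaining:
            return None
        # Cue the nearest not-yet-inspected region to keep saccades short.
        return min(self.remaining,
                   key=lambda c: (c[0] - gaze_px[0]) ** 2 + (c[1] - gaze_px[1]) ** 2)

assistant = ScanAssistant([(200, 150), (640, 480), (1000, 200)])
assistant.update(gaze_px=(205, 148))          # the first region is inspected and removed
print(assistant.next_suggestion((205, 148)))  # -> (640, 480)
```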

Gaze contingent displays may also be applied in intelligent tutoring software (Beal, 2004). It is well known that experts make different types of eye movements than novices (Aaltonen, Hyrskykari, & Raiha, 1998; Canham & Hegarty, 2004; Card, 1984; Crosby & Peterson, 1991; Fitts et al., 1950; Goldberg & Kotval, 1998; Underwood, Chapman, Brocklehurst, Underwood, & Crundall, 2003). A display system that monitors the user's fixation behaviour may react by providing novices with more or different types of guidance than experts, e.g. by drawing their attention to important details which they may have failed to inspect.

In interactive webpages the direction of gaze may be used as input for agents that engage and direct the user's attention while providing both generalized system help and advice when giving specific product information (Witkowski, Arafa, & de Bruijn, 2001).

Another potential application area is computer games. Games can become more exciting if the system knows the attentional status of the player. For instance, the computer may let events occur preferably in regions that are less well attended.


In tele-communication and tele-work tasks it is important to know the state of attention of the participants. This can be deduced from their gaze directions (Velichkovsky, 1995; Velichkovsky & Hansen, 2004). An example of such an application is the GAZE groupware system, which is a multiparty mediated tele-conference system that conveys the gaze direction of the participants (Velichkovsky, 1995; Vertegaal, 1999; Vertegaal, Vons, & Slagter, 1998). In this context it has also been observed that a look-to-talk interface is a natural alternative to push-to-talk systems (Oh et al., 2002).

4.4. Three-dimensional gaze registration systems

Three-dimensional gaze registration systems can establish the point in space that is fixated by the observer. This information can be used to direct an active vision system to search and identify the object a user is looking at (Atienza & Zelinsky, 2003). Using such an active vision based interface a user can for instance direct a remotely controlled robot to pick up an object in space simply by looking at it. Potential application areas are devices to assist the handicapped and the operation of remotely piloted vehicles. Another interesting but slightly futuristic application would be to install such a system in cars, together with a video system that analyses the road ahead. A computer can then continuously monitor the driver's direction of gaze and alert the driver in case he does not pay attention to oncoming and potentially hazardous traffic.
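
The geometric core of such systems is the intersection of the two eyes' gaze rays, which in practice never meet exactly; a common approximation is the midpoint of the shortest segment connecting the two rays. The sketch below illustrates that computation; it is generic geometry, not the algorithm of any particular cited system, and the example eye positions and gaze directions are invented.

```python
import numpy as np

def fixation_point(origin_l, dir_l, origin_r, dir_r):
    """Midpoint of the shortest segment between two (possibly skew) gaze rays.

    Each ray is given by an eye position (origin) and a unit gaze direction.
    """
    o_l, d_l = np.asarray(origin_l, float), np.asarray(dir_l, float)
    o_r, d_r = np.asarray(origin_r, float), np.asarray(dir_r, float)
    w = o_l - o_r
    a, b, c = d_l @ d_l, d_l @ d_r, d_r @ d_r
    d, e = d_l @ w, d_r @ w
    denom = a * c - b * b
    if abs(denom) < 1e-9:          # parallel rays: no usable intersection
        return None
    t_l = (b * e - c * d) / denom
    t_r = (a * e - b * d) / denom
    p_l = o_l + t_l * d_l
    p_r = o_r + t_r * d_r
    return (p_l + p_r) / 2.0

# Both eyes (3 cm left/right of the midline) converge on a point ~50 cm straight ahead.
print(fixation_point([-3, 0, 0], [0.06, 0, 0.998], [3, 0, 0], [-0.06, 0, 0.998]))
```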

5. Concluding remarks

We presented a state-of-the-art review of both (1) techniques to register the direction of gaze and (2) display techniques that can be used to optimally adjust visual information presentation to the capabilities of the human visual system and the momentary direction of viewing. These techniques can be used to develop gaze or attention aware systems that may serve to:

• reduce the attentional workload of users,
• personalise the displayed information or system actions,
• attract or guide the user's attention,
• adapt information systems or games to the user's degree of expertise,
• convey information on the user's attention in tele-conference systems,
• assist disabled persons.

Promising application areas include surveillance systems, public information display systems, computer games, intelligent tutoring systems, tele-conferencing systems, devices to assist the handicapped, driver assistance systems, and remotely piloted vehicles.

References

Aaltonen, A., Hyrskykari, A., & Raiha, K. (1998). 101 Spots, or how do users read menus? In Proceedings of CHI

98 human factors in computing systems (pp. 132–139). New York, USA: ACM Press.Alvarez, G. A., & Cavanagh, P. (2004). The capacity of visual short-term memory is set both by visual

information load and by number of objects. Psychological Science, 15(2), 106–111.Amir, A., Zimet, L., Sangiovanni-Vincentelli, A., & Kao, S. (2005). An embedded system for an eye-detection

sensor. Computer Vision and Image Understanding, 98(1), 104–123.

Page 26: Gaze directed displays as an enabling technology for attention aware systems

640 A. Toet / Computers in Human Behavior 22 (2006) 615–647

Atienza, R., & Zelinsky, A. (2003). Interactive skills using active gaze tracking. In Proceedings of the fifth

international conference on multimodal interfaces (pp. 188–195). New York, USA: ACM Press.Backs, R. W., & Walrath, L. C. (1992). Eye movement and pupillary response indices of mental workload during

visual search of symbolic displays. Applied Ergonomics, 23(4), 243–254.Baddeley, A. (1981). The concept of working memory: a view of its current state and probable future

development. Cognition, 10(1), 17–23.Baluja, S., & Pomerleau, D. (1994). Non-intrusive gaze tracking using artificial neural networks (Report Technical

Report CMU-CS-94-102). Pittsburgh, PA, USA: School for Computer Science, Carnegie Mellon University.Barber, P. J., & Legge, D. (1976). Information acquisition. In Anonymous, perception and information (pp. 54–66).

London, UK: Methuen.Bates, R. (2002). Have patience with your eye mouse! eye-gaze interaction with computers can work. In S. Keates,

P. Langdon, P. J. Clarkson, & P. Robinson (Eds.), Proceedings of the first Cambridge workshop on universal

access and assistive technology (CWUAAT) (pp. 33–38). Cambridge, UK: University of Cambridge.Bates, R., & Istance, H. O. (2002). Zooming interfaces! Enhancing the performance of eye controlled pointing

devices. In Proceedings of ASSETS 2002, the fifth international ACM SIGCAPH conference on assistive

technologies (pp. 119–126). New York, USA: ACM Press.Bates, R., & Istance, H. O. (2003). Why are eye mice unpopular? A detailed comparison of head and

eye controlled assistive technology pointing devices. Universal Access in the Information Society, 2(3),280–290.

Baudisch, P., Good, N., Bellotti, V., & Schraedley, P. (2002). Keeping things in context: a comparative evaluationof focus plus context screens. In Proceedings of the SIGCHI conference on human factors in computing systems:

Changing our world, changing course (pp. 259–266). New York, USA: ACM Press.Baudisch, P., Good, N., & Stewart, P. (2001). Focus plus context screens: combining display technology with

visualization techniques. In Proceedings of the 14th annual ACM symposium on user interface software and

technology (pp. 31–40). New York, USA: ACM Press.Baudisch, P., & Gutwin, C. (2004). Multiblending: displaying overlapping windows simultaneously without the

drawbacks of alpha blending. In Proceedings of the ACM conference on human factors in computing systems

(CHI’04) (pp. 367–374). New York, USA: ACM Press.Baudisch, P., Lee, B., & Hanna, L. (2004). Fishnet, a fisheye web browser with search term popouts: a

comparative evaluation with overview and linear view. In Proceedings of the working conference on advanced

visual interfaces (pp. 133–140). New York, USA: ACM Press.Beal, C. R. (2004). Adaptive user displays for intelligent tutoring software. CyberPsychology & Behavior, 7(6),

689–693.Bederson, B. B., Clamage, A., Czerwinski, M. P., & Robertson, G. G. (2003). A fisheye calendar interface for

PDAs: providing overviews for small displays. In Proceedings of the ACM conference on human factors in

computing systems (pp. 618–619). New York, USA: ACM Press.Beverly, L., Harrison, B. L., & Vicente, K. J. (1996). An experimental evaluation of transparent menu usage. In

Proceedings of the SIGCHI conference on human factors in computing systems: common ground (pp. 391–398).New York, USA: ACM Press.

Beymer, D., & Flickner, M. (2003). Eye gaze tracking using an active stereo head. Proceedings of the 2003 IEEE

computer society conference on computer vision and pattern recognition (Vol. 2, pp. 451–458). Washington,USA: IEEE Press.

Bolt, R. A. (1980). Put-that-there: Voice and gesture at the graphics interface. In Proceedings of the seventh annual

conference on computer graphics and interactive techniques (pp. 262–270). New York, USA: ACM Press.Bolt, R. A. (1984). The human interface: where people and computers meet. Belmont, CA: Lifetime Learning

Publications.Bour, L. (1997). DMI-search scleral coil (Report Technical Report H2-214). Amsterdam, The Netherlands:

Academic Medical Center AZUA.Canham, M., & Hegarty, M. (2004). Influences of knowledge on eye fixations while interpreting weather maps. In

Proceedings of the 26th annual meeting of the cognitive science society (CogSci 2004 9pp. 286-1-Boston).Massachusetts, USA: Cognitive Science Society.

Card, S. K. (1984). Visual search of computer command menus. In H. Bouma & D. G. Bouwhuis (Eds.),Attention and performance X, control of language processes (pp. 97–108). Lawrence Erlbaum Associates:London, UK.

Carpendale, M. S. T., Cowperthwaite, D. J., & Fracchia, F. D. (1997). Extending distortion viewing from 2D to3D. IEEE Computer Graphics and Applications, 17(4), 42–51.

Page 27: Gaze directed displays as an enabling technology for attention aware systems

A. Toet / Computers in Human Behavior 22 (2006) 615–647 641

Carpendale, S., Ligh, J., & Pattison, E. (2004). Achieving higher magnification in context. In Proceedings of the

17th annual ACM symposium on user interface software and technology (pp. 71–80). New York, USA: ACMPress.

Carpendale, M. S. T., & Montagnese, C. (2001). A framework for unifying presentation space. In Proceedings of

ACM conference on user-interface software technology (pp. 61–70). New York, USA: ACM Press.Chen, D., & Vertegaal, R. (2004). Using mental load for managing interruptions in physiologically attentive user

interfaces. In Extended Abstracts of ACM CHI 2004 conference on human factors in computing systems

(pp. 1513–1516). New York, USA: ACM Press.Collet, C., Finkel, A., & Gherbi, R. (1997). CapRe: a gaze tracking system in man-machine interaction. In IEEE

international conference on intelligent engineering systems (pp. 577–581). Washington, USA: IEEE Press.Cornsweet, T., & Crane, H. (1973). Accurate two-dimensional eye tracker using first and fourth purkinje images.

Journal of the Optical Society of America, 63(8), 921–928.Cournia, N., Smith, J. D., & Duchowski, A. T. (2003). Gaze- vs. hand-based pointing in virtual environments. In

Proceedings of the conference on human factors in computing systems (pp. 772–773). New York, USA: ACMPress.

Cox, D. A., Chugh, J. S., Gutwin, C., & Greenberg, S. (1998). The usability of transparent overview layers. InCHI 98 conference summary on human factors in computing systems (pp. 301–302). New York, USA: ACMPress.

Crosby, M. E., & Peterson, W. W. (1991). Eye movements and interface components grouping: an evaluationmethod. In Proceedings of the 35th annual meeting of the human factors and ergonomics society

(pp. 1476–1480). Santa Monica, CA: Human Factors and Ergonomics Society.Daunys, G., & Ramanauskas, N. (2004). The accuracy of eye tracking using image processing. In Proceedings of

the third Nordic conference on human–computer interaction (pp. 377–380). New York, USA: ACM Press.Deep Video Imaging Ltd., (2004). Interactive dual plane imagery. Website: http://www.Deepvideo.com.Dorr, M. (2004). Effects of gaze-contingent stimuli on eye movements. Lubeck, Germany: Institut fur Neuro- und

Bioinformatik, Universitat zu Lubeck.Dorr, M., Martinetz, T., Gegenfurtner, K. R., & Barth, E. (2004). Guidance of eye movements on a gaze-

contingent display. In U. J. Illg, H. H. Bulthoff & H. A. Mallot (Eds.), Proceedings of the fifth workshop on

dynamic perception 2004 (pp. 89–94). Tubingen, Germany.Duchowski, A. T. (2001). Eye tracking techniques for perceptually adaptive graphics. In Proceedings of the ACM

SIGGRAPH/EUROGRAPHICS campfire conference on perceptual adaptive graphics. New York, USA: ACMPress.

Duchowski, A. T., Cournia, N., & Murphy, H. (2004). Gaze-contingent displays: a review. CyberPsychology &

Behavior, 7(6), 621–634.Durlach, P. J. (2004). Change blindness and its implications for complex monitoring and control systems design

and operator training. Human–Computer Interaction, 19(4), 423–451.Ebisawa, Y. (1989). Unconstrained pupil detection technique using two light sources and the image difference

method. In C. A. Brebbia & S. Hernandez (Eds.), Visualization and intelligent design in engineering and

architecture II (pp. 79–89). Southampton, UK: WIT Press.Findlay, J. M., & Gilchrist, I. D. (1998). Eye guidance and visual search. In G. Underwood (Ed.), Eey guidance in

reading and scene perception (pp. 295–312). Oxford, UK: Elsevier Science Ltd.Findlay, J. M., & Gilchrist, I. D. (2001). Visual attention: the active vision perspective. In L. R. Harris & M.

Jenkin (Eds.), Vision and attention (pp. 85–106). Berlin, Germany: Springer.Fitts, P. M., Jones, R. E., & Milton, J. L. (1950). Eye movements of aircraft pilots during instrument-landing

approaches. Aeronautical Engineering Review, 9(2), 24–29.Flider, M. J., & Bailey, B. P. (2004). An evaluation of techniques for controlling focus + context screens. In

Proceedings of the 2004 conference on graphics interface (pp. 135–144). New York, USA: ACM Press.Frey, L. A., White, K. P., & Hutchinson, T. E. (1990). Eye-gaze word processing. IEEE Transactions on Systems,

Man and Cybernetics, 20(4), 944–950.Furnas, G. W. (1982). The FISHEYE view: A new look at structured files (Report Technical Memorandum 82-

11221-22). Bell Laboratories.Furnas, G. W. (1986). Generalized fisheye views. In Human factors in computing systems CHI ’86, Boston, USA,

(pp. 16–23).Gips, J., Olivieri, P., & Tecce, J. (1993). Direct control of the computer through electrodes placed around the eyes.

In M. J. Smith & G. Salvendy (Eds.), Proceedings of the fifth international conference on human–computer

interaction: applications and case studies (pp. 630–635). Amsterdam, The Netherlands: Elsevier.

Page 28: Gaze directed displays as an enabling technology for attention aware systems

642 A. Toet / Computers in Human Behavior 22 (2006) 615–647

Glenn, F. A., Iavecchia, H. P., Ross, L. V., Stokes, J. M., Weiss, D., & Zakland, A. L. (1986). Eye-voicecontrolled interface. Proceedings of the human factors society, 322–326.

Glenstrup, A. J., & Engell-Nielsen, T. (1995). Eye controlled media: present and future state. Copenhagen,Denmark: Institute of Computer Science, University of Copenhagen.

Goldberg, J. H., & Kotval, X. P. (1998). Eye movement-based evaluation of the computer interface. In S. K.Kumar (Ed.), Advances in occupational ergonomics and safety (pp. 529–532). Amsterdam, The Netherlands:ISO Press.

Goldberg, J. H., & Schryver, J. C. (1995). Eye-gaze-contingent control of the computer interface: Methodologyand example for zoom detection. Behavior Research Methods, Instruments and Computers, 27(3), 338–350.

Gutwin, C. (2002). Improving focus targeting in interactive fisheye views. In Proceedings of the ACM conference

on human factors in computing systems (CHI’02) (pp. 267–274). New York, USA: ACM Press.Gutwin, C., & Fedak, C. (2004). Interacting with big interfaces on small screens: a comparison of fisheye, zoom,

and panning techniques. In Proceedings of the 2004 conference on graphics interface (pp. 145–152). New York,USA: ACM Press.

Gutwin, C., & Skopik, A. (2003). Fisheye views are good for large steering tasks. In Proceedings of the ACM

conference on human factors in computing systems (CHI’03) (pp. 201–208). New York, USA: ACM Press.Haro, A., Flickner, M., & Essa, I. A. (2000). Detecting and tracking eyes by using their physiological properties,

dynamics, and appearance. In Proceedings of the IEEE conference on computer vision and pattern recognition

2000 (pp. 163–168). Washington, USA: IEEE Press.Harrison, B. L., Ishii, H., Vicente, K. J., & Buxton, W. A. S. (1995a). Transparent layered user interfaces: an

evaluation of a display design to enhance focused and divided attention. In Human factors in computing

systems: Proceedings of CHI’95 (pp. 317–324). New York, USA: ACM Press.Harrison, B. L., Kurtenbach, G., & Vicente, K. J. (1995b). An experimental evaluation of transparent user

interface tools and information content. In Proceedings of the eighth annual ACM symposium on user interface

and software technology (pp. 81–90). New York, USA: ACM Press.Hess, E. H., & Polt, J. M. (1964). Pupil size in relation to mental activity during simple problem-solving. Science,

143, 1190–1192.Hollands, J. G., Carey, T. T., Matthews, M. L., & McCann, C. A. (1989). Presenting a graphical network: a

comparison of performance using fisheye and scrolling views. In G. Salvendy & M. J. Smith (Eds.),Proceedings of the third international conference on human–computer interaction on designing and using human–

computer interfaces and knowledge based systems (2nd ed.) (pp. 313–320). New York, NY, USA: ElsevierScience Inc.

Hooge, I. T., & Erkelens, C. J. (1998). Adjustment of fixation duration in visual search. Vision Research, 38(9),1295–1302.

Horvitz, E., Breese, J., Heckerman, D., Hovel, D., & Rommelse, K. (1998). The Lumiere project: Bayesian usermodeling for inferring the goals and needs of software users. In Proceedings of the 14th conference on

uncertainty in artificial intelligence (pp. 256–265). San Francisco, CA: Morgan Kaufmann.Hutchinson, T. E., White, K. P., Martin, W. N., Reichert, K. C., & Frey, L. A. (1989). Human–computer

interaction using eye-gaze input. IEEE Transactions on Systems, Man and Cybernetics, 19(6), 1527–1534.Hyrskykari, A. (1997). Gaze control as an input device. In K.-J., Raiha (Ed.), Proceedings of ACHCI’97:

Advanced course on human–computer interaction.Hyrskykari, A., Majaranta, P., & Raiha, K.-J. (2003). Proactive response to eye movements. In M. Rauterberg,

M. Menozzi, & J. Wesson (Eds.), Proceedings of the ninth IFIP TC13 international conference on human–

computer interaction (pp. 129–136). Laxenburg, Austria: International Federation for Information Processing.Jacob, R. J. K. (1990). What you look at is what you get: eye movement-based interaction techniques. In

Proceedings of the SIGCHI conference on human factors in computing systems: empowering people (pp. 11–18).New York, USA: ACM Press.

Jacob, R. J. K. (1991). The use of eye movements in human–computer interaction techniques: what you look at iswhat you get. ACM Transactions on Information Systems, 9(2), 152–169.

Jacob, R. J. K. (1993). Eye movement-based human–computer interaction techniques: toward non-commandinterfaces. In H. R. Hartson & D. Hix (Eds.), Advances in human–computer interaction (pp. 151–190).Norwood, NJ, USA: Ablex Publishing Co..

Jacob, R. J. K., & Karn, K. S. (2004). Eye tracking in human–computer interaction and usability research: readyto deliver the promises (section commentary). In J. Hyona, R. Radach, & H. Deubel (Eds.), The mind’s eye:

cognitive and applied aspects of eye movement research (pp. 573–605). Amsterdam, The Netherlands: ElsevierScience.

Page 29: Gaze directed displays as an enabling technology for attention aware systems

A. Toet / Computers in Human Behavior 22 (2006) 615–647 643

Ji, Q., & Zhu, Z. (2002). Eye and gaze tracking for interactive graphic display. In Proceedings of the second

international symposium on smart graphics (pp. 79–85). New York, USA: ACM Press.Jones, M. G., & Nikolov, S. G. (2000). Volume visualisation via region-enhancement around an observer’s

fixation point. In Proceedings of the international conference on advances in medical signal

and information processing (MEDSIP 2000) (pp. 305–312). Herts, UK: The Institution of ElectricalEngineers IEE.

Jones, M. G., & Nikolov, S. G. (2004). Region-enhanced volume visualization and navigation. In S. K. Mun(Ed.), Medical imaging 2000: Image display and visualization, SPIE-3976 (pp. 454–465). Bellingham, WA,USA: The International Society for Optical Engineering.

Kadmon, N., & Shlomi, E. (1978). A polyfocal projection for statistical surfaces. The Cartographic Journal, 15(1),36–41.

Kahneman, D. (1973). Attention and effort. Englewood Cliffs, NJ: Prentice-Hall Inc.Kawato, S., & Tetsutami, N. (2004). Detection and tracking of eyes for gaze-camera control. Image and Vision

Computing, 22(12), 1031–1038.Kembel, J. A. (2003). Reciprocal eye contact as an interaction technique. In Proceedings of the ACM conference

on human factors in computing systems (pp. 952–953). New York, USA: ACM Press.Khiat, A., Matsumoto, Y., & Ogasawara, T. (2004). Task specific eye movements understanding for a gaze-

sensitive dictionary. In Proceedings of the ninth international conference on intelligent user interface

(pp. 265–267). New York, USA: ACM Press.Kim, K.-N., & Ramakrishna, R. S. (1999). Vision-based eye-gaze tracking for human computer interface.

Proceedings of the IEEE international conference on systems, man, and cybernetics.Kotval, X. P., & Goldberg, J. H. (1998). Eye movements and interface components grouping: an evaluation

method. In Proceedings of the 42nd annual meeting of the human factors and ergonomics society (pp. 486–490).Santa Monica, CA: Human Factors and Ergonomics Society.

Leung, Y. K. (1989). Human–computer interface techniques for map based diagrams. In G. Salvendy &M. Smith(Eds.), Proceedings of the third international conference on human–computer interaction (pp. 361–368).Amsterdam, The Netherlands: Elsevier.

Leung, Y. K., & Apperley, M. D. (1994). A review and taxonomy of distortion-oriented presentation techniques.ACM Transactions on Computer–Human Interaction, 1(2), 126–160.

Lin, Y., Zhang, W. J., & Watson, L. G. (2003). Using eye movement term parameters for evaluating human–machine interface frameworks under normal control operation and fault detection situations. InternationalJournal of Human–Computer Studies, 59(6), 837–873.

Linton, F., Joy, D., & Schaefer, H. (1999). Building user and expert models by long-term observation ofapplication usage. In Proceedings of the seventh international conference on user modeling (pp. 129–138).Available from http://www.cs.usask.ca/UM99/papers.shtml.

Loschky, L. C., & McConkie, G. W. (2000). User performance with gaze contingent displays. In A. T. Duchowski(Ed.), Proceedings of the eye tracking research & applications symposium 2000 (pp. 97–103). New York, USA:ACM Press.

Luebke, D., Hallen, B., Newfield, D., & Watson, B. (2000). Perceptually driven simplification using gaze-directedrendering. In S. Gortler & K. Myszkowski (Eds.), Proceedings of the 2001 eurographics workshop on rendering

techniques (pp. 223–234). Vienna, Austria: Springer-Verlag.Mackinlay, J. D., Roberston, G. G., & Card, S. K. (1991). The perspective wall: detail and context smoothly

integrated. In Proceedings of the conference on human factors in computing systems ‘91 (CHI ’91)

(pp. 173–179). New York, USA: ACM Press.Magee, J. J., Scott, M. R., Waber, B. N., & Betke, M. (2004). EyeKeys: real-time vision interface based on gaze

detection from a low-grade video camera. Proceedings of the 2004 conference on computer vision and pattern

recognition workshop (CVPRW’04), workshop on real-time vision for human-computer interaction (RTV4HCI)

(Vol. 10, pp. 159–166). Washington, USA: IEEE Press.Maglio, P. P., Barrett, R., Campbell, C. S., & Selker, T. (2000a). SUITOR: an attentive information system. In

Proceedings of the fifth international conference on intelligent user interfaces (pp. 169–176). New York, USA:ACM Press.

Maglio, P. P., Campbell, C. S., Barrett, R., & Selker, T. (2001). An architecture for developing attentiveinformation systems. Knowledge-Based Systems, 14(1–2), 103–110.

Maglio, P. P., Matlock, T., Campbell, C. S., Zhai, S., & Smith, B. A. (2000b). Gaze and speech in attentive userinterfaces. In Proceedings of the international conference on multimodal interfaces, LNCS Series Springer-Verlag.

Page 30: Gaze directed displays as an enabling technology for attention aware systems

644 A. Toet / Computers in Human Behavior 22 (2006) 615–647

Matsumoto, Y., & Zelinsky, A. (2000). An algorithm for real-time stereo vision implementation of head pose andgaze direction measurement. In Proceedings of IEEE fourth international conference on face and gesture

recognition (pp. 499–505). Piscataway, NJ: IEEE.May, J. G., Kennedy, R. S., Williams, M. C., Dunlap, W. P., & Brannan, J. R. (1990). Eye movement indices of

mental workload. Acta Psychologica, 75, 75–89.Miniotas, D., & Spakov, O. (2004). Target expansion as a means to facilitate eye-based selection. Elektronika ir

Elektrotechnika, 3(52), 13–17.Mitta, D. A. (1990). A fisheye presentation strategy: aircraft maintenance data. In Proceedings of the IFIP TC13

third international conference on human–computer interaction (pp. 875–880). Amsterdam, The Netherlands:North-Holland Publishing Company.

Morimoto, C. H., Amir, A., & Flickner, M. (2002). Free head motion eye gaze tracking without calibration. InConference on human factors in computing systems (pp. 586–587). New York, USA: ACM Press.

Morimoto, C. H., Koons, D., Amir, A., & Flickner, M. (2000). Pupil detection and tracking using multiple lightsources. Image and Vision Computing, 18(4), 331–335.

Morimoto, C., & Mimica, M. R. M. (2005). Eye gaze tracking techniques for interactive applications. Computer

Vision and Image Understanding, 98(1), 4–24.Most, S. B., Simons, D. J., Scholl, B. J., Jimenez, R., Clifford, E., & Chabris, C. F. (2001). How not to be seen: the

contribution of similarity and selective ignoring to sustained in attentional blindness. Psychological Science,12(1), 9–17.

Narayanan, N. H., & Yoon, D. (2003). Reactive information displays. In Proceedings of INTERACT 2003: Ninth

IFIP TC 13 international conference on human–computer interaction (pp. 244–251). Amsterdam, TheNetherlands: IOS Press.

Nikolov, S. G., Bull, D., Canagarajah, C. N., Jones, M., & Gilchrist, I. D. (2002). Multi-modality gaze-contingentdisplays for image fusion. In Proceedings of the fifth international conference on information fusion

(pp. 1213–1220). Sunnyvale, CA: International Society of Information Fusion.Nikolov, S. G., Gilchrist, I. D., Bull, D. R., Canagarajah, C. N., & Jones, M. G. (2003). A system for gaze-

contingent image analysis and multi-sensorial image display. In Proceedings of the sixth international

conference on information fusion (FUSION 2003) (pp. 749–756). Fairborn, OH, USA: International Society ofInformation Fusion.

Nikolov, S. G., Jones, M., Agrafiotis, D., Bull, D. R., & Canagarajah, C. N. (2001). Focus + contextvisualisation for fusion of volumetric medical images. In Proceedings of the fourth international conference

on information fusion (Fusion 2001), I (pp. WeC3-3–WeC3-10). Sunnyvale, CA: International Society ofInformation Fusion.

Oh, A., Fox, H., van Kleek, M., Adler, A., Gajos, K., Morency, L.-P., et al. (2002). Evaluating look-to-talk: agaze-aware interface in a collaborative environment. In Proceedings of the CHI ’02 conference on human

factors in computing systems (pp. 650–651). New York, USA: ACM Press.Ohno, T. (1998). Features of eye gaze interface for selection tasks. In Proceedings of the third Asia Pacific

computer human interaction APCHI’98 (pp. 1–6). Washington, USA: IEEE Computer Society.Ohno, T., & Mukawa, N. (2004). A free-head, simple calibration, gaze tracking system that enables gaze-based

interaction. In Proceedings of the symposium on ETRA 2004: eye tracking research & application symposium

(pp. 115–122). New York, USA: ACM Press.Ohno, T., Mukawa, N., & Kawato, S. (2003). Just blink your eyes: a head-free gaze tracking system. In

Proceedings of the ACM conference on human factors in computing systems (CHI2003) (pp. 950–951). NewYork, USA: ACM Press.

Olwal, A., & Feiner, S. (2003). Rubbing the fisheye: precise touch-screen interaction with gestures and fisheyeviews. Conference supplement of UIST ’03 (ACM symposium on user interface software and technology) (Vol.2, pp. 83–84). New York, USA: ACM Press.

Oviatt, S. (1996). Multimodal interfaces for dynamic interactive maps. In Proceedings of the SIGCHI conference

on human factors in computing systems: common ground (pp. 95–102). New York, USA: ACM Pres.Oviatt, S. (2003). Multimodal interfaces. In J. Jacko & A. Sears (Eds.), The human–computer interaction

handbook: fundamentals, evolving technologies and emerging applications (pp. 286–304). Mahwah, NJ, USA:Lawrence Erlbaum Associates Inc..

Oviatt, S., & Cohen, P. (2000). Perceptual user interfaces: multimodal interfaces that process what comesnaturally. Communications of the ACM, 43(3), 45–53.

Park, K. S., & Lim, C. J. (2004). A simple vision-based head tracking method for eye-controlled human/computerinterface. International Journal of Human–Computer Studies, 54(3), 319–332.

Page 31: Gaze directed displays as an enabling technology for attention aware systems

A. Toet / Computers in Human Behavior 22 (2006) 615–647 645

Parker, J., & Baumback, M. (2003). Creating an enhanced reality user interface – ERSolitaire. In Proceedings of

the ACM conference on human factors in computing systems (pp. 958–959). New York, USA: ACM Press.Parkhurst, D., Culurciello, E., & Niebur, E. (2000). Evaluating variable resolution displays with visual search:

task performance and eye movements. In Proceedings of the eye tracking research and applications symposium

(pp. 105–109). New York, USA: ACM Press.Parkhurst, D., Law, K., & Niebur, E. (2002). Modeling the role of salience in the allocation of overt visual

attention. Vision Research, 42(1), 107–123.Parkhurst, D. J., & Niebur, E. (2002). Variable-resolution displays: a theoretical, practical and behavioral

evaluation. Human Factors, 44(4), 611–629.Partala, T., Aula, A., & Surakka, V. (2001). Combined voluntary gaze direction and facial muscle activity as a

new pointing technique. In M. Hirose (Ed.), Proceedings of INTERACT 2001 (pp. 100–107). Amsterdam, TheNetherlands: IOS Press.

Pastoor, S., Liu, J., & Renault, S. (1999). An experimental multimedia system allowing 3-D visualization and eye-controlled interaction without user-worn devices. IEEE Transactions on Multimedia, 1(1), 41–52.

Peavler,W. S. (1974). Pupil size, informationoverload, andperformance differences.Psychophysiology, 11, 559–566.Pentland, A. (2004). Perceptual intelligence. Communications of the ACM, 43(3), 35–44.Perry, J. S., & Geisler, W. S. (2002). Gaze-contingent real-time simulation of arbitrary visual fields. In B. E.

Rogowitz & T. N. Pappas (Eds.), Human vision and electronic imaging VII, SPIE-4662 (pp. 57–69).Bellingham, WA, USA: The International Society for Optical Engineering.

Pilgrim, C., & Leung, Y. (1996). Applying bifocal displays to enhance WWW navigation. In Proceedings of the second Australian World Wide Web conference. Available from http://ausweb.scu.edu.au/aw96/tech/pilgrim/.
Pomplun, M., Ivanovic, N., Reingold, M., & Shen, J. (2001). Empirical evaluation of a novel gaze-controlled zooming interface. In M. J. Smith, G. Salvendy, D. Harris, & R. J. Koubek (Eds.), Usability evaluation and design: cognitive engineering, intelligent agents and virtual reality. Proceedings of the ninth international conference on human–computer interaction 2001, Vol. 2, New Orleans, USA.
Pook, S., Lecolinet, E., Vaysseix, G., & Barillot, E. (2000). Context and interaction in zoomable user interfaces. In Proceedings of the working conference on advanced visual interfaces (pp. 227–231). New York, USA: ACM Press.
Porter, T., & Duff, T. (1984). Compositing digital images. Computer Graphics, 18(3), 253–259.
Posner, M. I. (1980). Orienting of attention. Quarterly Journal of Experimental Psychology, 32(1), 3–25.
Rajashekar, U., Cormack, L. K., & Bovik, A. C. (2003). Image features that draw fixations. In Proceedings of the IEEE international conference on image processing (Vol. 2, pp. III-313–III-316). Washington, USA: IEEE.
Reingold, E. M., & Loschky, L. C. (2002). Reduced saliency of peripheral targets in gaze-contingent multi-resolutional displays: blended versus sharp boundary windows. In Proceedings of the symposium on eye tracking research & applications (pp. 89–93). New York, USA: ACM Press.
Reingold, E. M., Loschky, L. C., McConkie, G. W., & Stampe, D. M. (2003). Gaze-contingent multiresolutional displays: an integrative review. Human Factors, 45(2), 307–328.
Rothrock, L., Koubek, R., Fuchs, F., Haas, M., & Salvendy, G. (2002). Review and reappraisal of adaptive interfaces: toward biologically inspired paradigms. Theoretical Issues in Ergonomics Science, 3(1), 47–84.
Ruddarraju, R., Haro, A., Nagel, K., Tran, Q. T., Essa, I. A., Abowd, G., et al. (2003). Perceptual user interfaces using vision-based eye tracking. In Proceedings of the fifth international conference on multimodal interfaces (pp. 227–233). New York, USA: ACM Press.
Salvucci, D. D., & Anderson, J. R. (2000). Intelligent gaze-added interfaces. In Proceedings of the SIGCHI conference on human factors in computing systems (pp. 273–280). New York, USA: ACM Press.
Salvucci, D. D., & Goldberg, J. H. (2000). Identifying fixations and saccades in eye-tracking protocols. In Proceedings of the eye tracking research and applications symposium 2000, Palm Beach Gardens, FL, USA (pp. 71–78).
Sarkar, M., & Brown, M. H. (1992). Graphical fisheye views of graphs (Report SRC-RR-84A). Digital Equipment Corporation.
Schnipke, S. K., & Todd, M. W. (2000). Trials and tribulations of using an eye-tracking system. In Conference on human factors in computing systems (pp. 273–274). New York, USA: ACM Press.
Selker, T. (1994). COACH: A teaching agent that learns. Communications of the ACM, 37(1), 92–99.
Selker, T., Lockerd, A., & Martinez, J. (2001). Eye-R, a glasses-mounted eye motion detection interface. In Proceedings of the ACM CHI 2001 human factors in computing systems conference (pp. 179–180). New York, USA: ACM Press.
Shell, J. S., Vertegaal, R., Cheng, D., Skaburskis, A. W., Sohn, C., Stewart, A. J., et al. (2004). ECSGlasses and EyePliances: using attention to open sociable windows of interaction. In Proceedings of ACM eye tracking research and applications symposium 2004 (pp. 93–100). New York, USA: ACM Press.


Shell, J. S., Vertegaal, R., & Skaburskis, A. W. (2003). EyePliances: attention-seeking devices that respond to visual attention. In Proceedings of the conference on human factors in computing systems (pp. 770–771). New York, USA: ACM Press.
Shih, S.-W., & Liu, J. (2004). A novel approach to 3-D gaze tracking using stereo cameras. IEEE Transactions on Systems, Man and Cybernetics – Part B: Cybernetics, 34(1), 234–245.
Shioiri, S., & Ikeda, M. (1989). Useful resolution for picture perception as a function of eccentricity. Perception, 18, 347–361.
Sibert, L. E., & Jacob, R. J. K. (2000). Evaluation of eye gaze interaction. In Proceedings of the ACM CHI 2000 human factors in computing systems conference (pp. 281–288). New York, USA: Addison-Wesley/ACM Press.
Sibert, L. E., Jacob, R. J. K., & Templeman, J. N. (2001). Evaluation and analysis of eye gaze interaction (NRL Report NRL/FR/5513–01-9990). Washington, DC: Naval Research Laboratory.
Simons, D. J., & Chabris, C. F. (1999). Gorillas in our midst: sustained inattentional blindness for dynamic events. Perception, 28(9), 1059–1074.
Spence, R., & Apperley, M. (1982). Database navigation: an office environment for the professional. Behaviour and Information Technology, 1, 43–54.
Stampe, D. M., & Reingold, E. M. (1995). Selection by looking: a novel computer interface and its application to psychological research. In J. M. Findlay, R. Walker, & R. W. Kentridge (Eds.), Eye movement research: Mechanisms, processes and applications (pp. 467–478). Amsterdam, The Netherlands: Elsevier Science Publishers.
Starker, I., & Bolt, R. A. (1990). A gaze-responsive self-disclosing display. In Proceedings of the SIGCHI conference on human factors in computing systems: empowering people (pp. 3–10). New York, USA: ACM Press.
Stern, J. A. (1997). The pupil of the eye: what can it tell us about ’mental processes’? Human Engineering for Quality of Life (HQL) (pp. 1–2). Available from http://www.hql.or.jp/gpd/eng/www/nwl/n08/ergo.html.
Stiefelhagen, R., Yang, J., & Waibel, A. (1997a). A model-based gaze-tracking system. International Journal of Artificial Intelligence Tools, 6(2), 193–209.
Stiefelhagen, R., Yang, J., & Waibel, A. (1997b). Tracking eyes and monitoring eye gaze. In Proceedings of the workshop on perceptual user interfaces (pp. 98–100). New York, USA: ACM Press.
Stone, M. C., Fishkin, K., & Bier, E. A. (1994). The movable filter as a user interface tool. In Proceedings of the SIGCHI conference on human factors in computing systems: celebrating interdependence (pp. 306–312). New York, USA: ACM Press.
Sugioka, A., Ebisawa, Y., & Ohtani, M. (1996). Noncontact video-based eye-gaze detection method allowing large head displacements. In Proceedings of the 18th annual international conference of the IEEE Engineering in Medicine and Biology Society (Vol. 2, pp. 526–528). Washington, USA: IEEE Press.
Surakka, V., Illi, M., & Isokoski, P. (2004). Gazing and frowning as a new human–computer interaction technique. ACM Transactions on Applied Perception, 1(1), 40–56.
Talmi, K., & Liu, J. (1999). Eye and gaze tracking for visually controlled interactive stereoscopic displays. Signal Processing: Image Communication, 14(10), 799–810.
Tollmar, K., Demirdjian, D., & Darrell, T. (2003). Gesture + play: full-body interaction for virtual environments. In Proceedings of the ACM conference on human factors in computing systems (pp. 620–621). New York, USA: ACM Press.
Tse, P. U. (2005). Voluntary attention modulates the brightness of overlapping transparent surfaces. Vision Research, 45(9), 1095–1098.
Turk, M. (2004). Computer vision in the interface. Communications of the ACM, 47(1), 60–67.
Turk, M., & Kolsch, M. (2004). Perceptual interfaces. In G. Medioni & S. B. Kang (Eds.), Emerging topics in computer vision (pp. 455–519). Englewood Cliffs, NJ: Prentice Hall PTR.
Turk, M., & Robertson, G. (2000). Perceptual user interfaces. Communications of the ACM, 43(3), 32–34.
Underwood, G., Chapman, P., Brocklehurst, N., Underwood, J., & Crundall, D. (2003). Visual attention while driving: sequences of eye fixations made by experienced and novice drivers. Ergonomics, 46(6), 629–646.
van Diepen, P. M., & Wampers, M. (1998). Scene exploration with Fourier-filtered peripheral information. Perception, 27(10), 1141–1151.
Velichkovsky, B. M. (1995). Communicating attention: gaze position transfer in cooperative problem solving. Pragmatics and Cognition, 3(2), 122–199.
Velichkovsky, B. M., & Hansen, J. P. (2004). New technological windows into mind: there is more in eyes and brains for human–computer interaction. Available from <http://rcswww.urz.tu-dresden.de/~cogsci/welcome_g.html?/~cogsci/velich/boris.html>.


Vertegaal, B. J. H. (1999). The GAZE groupware system: mediating joint attention in multiparty communication and collaboration. In Proceedings of the SIGCHI conference on human factors in computing systems: the CHI is the limit (pp. 294–301). New York, USA: ACM Press.
Vertegaal, R. (2002). Designing attentive interfaces. In Proceedings of the symposium on eye tracking research & applications (pp. 23–30). New York, USA: ACM Press.
Vertegaal, R. (2003). Attentive user interfaces. Communications of the ACM, 46(3), 30–33.
Vertegaal, B. J. H., Vons, H., & Slagter, R. (1998). Look who’s talking: the GAZE groupware system. In CHI 98 conference summary on human factors in computing systems (pp. 293–294). New York, USA: ACM Press.
Wang, J.-G., & Sung, E. (2004). Gaze determination via images of irises. Image and Vision Computing, 19(12), 891–911.
Wang, J.-G., Sung, E., & Venkateswarlu, R. (2005). Estimating the eye gaze from one eye. Computer Vision and Image Understanding, 98(1), 88–103.
Ware, C., & Mikaelian, H. H. (1987). An evaluation of an eye tracker as a device for computer input. In J. M. Carroll & P. P. Tanner (Eds.), CHI + GI 1987 conference proceedings, SIGCHI Bulletin (pp. 183–188). New York, USA: ACM Press.
White, K. P., Hutchinson, T. E., & Carley, J. M. (1993). Spatially dynamic calibration of an eye-tracking system. IEEE Transactions on Systems, Man and Cybernetics, 23(4), 1162–1168.
Williams, N., Luebke, D., Cohen, J. D., Kelley, M., & Schubert, B. (2003). Perceptually guided simplification of lit, textured meshes. In Proceedings of the 2003 symposium on interactive 3D graphics (pp. 113–121). New York, USA: ACM Press.
Wilson, A., & Oliver, N. (2003). GWindows: robust stereo vision for gesture-based control of windows. In Proceedings of the fifth international conference on multimodal interfaces (ICMI’03) (pp. 211–218). New York, USA: ACM Press.
Witkowski, M., Arafa, Y., & de Bruijn, O. (2001). Evaluating user reaction to character agent mediated displays using eye-tracking technology. In Proceedings of the AISB’01 symposium on information agents for electronic commerce (pp. 79–87). York, UK: University of York.
Yamato, M., Monden, A., Matsumoto, K., Inoue, K., & Torii, K. (2000). Button selection for general GUIs using eye and hand together. In Proceedings of the working conference on advanced visual interfaces (pp. 270–273). New York, USA: ACM Press.
Yang, J., Stiefelhagen, R., Meier, U., & Waibel, A. (1998). Visual tracking for multimodal human–computer interaction. In Proceedings of the conference on human factors in computing systems (pp. 140–147). New York, USA: ACM Press.
Yarbus, A. (1967). Eye movements and vision. New York, USA: Plenum Press.
Yoo, D. H. (2004). Non-contact eye gaze estimation system using robust feature extraction and mapping of corneal reflections. Daejeon, Republic of Korea: Korea Advanced Institute of Science and Technology.
Yoo, D. H., & Chung, J. W. (2004). Non-intrusive eye gaze estimation without knowledge of eye pose. In Proceedings of the sixth IEEE international conference on automatic face and gesture recognition (pp. 785–790). Washington, USA: IEEE Press.
Yoo, D. H., & Chung, M. J. (2005). A novel non-intrusive eye gaze estimation using cross-ratio under large head motion. Computer Vision and Image Understanding, 98(1), 25–51.
Yoo, D. H., Kim, J. H., Kim, D. H., & Chung, M. J. (2002). A human–robot interface using vision-based eye gaze estimation system. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (IROS 02), Vol. 2 (pp. 1196–1201).

Zhai, S. (2003). What’s in the eyes for attentive input. Communications of the ACM, 46(3), 34–39.
Zhai, S., Buxton, W., & Milgram, P. (1996). The partial-occlusion effect: utilizing semitransparency in 3D human–computer interaction. ACM Transactions on Computer–Human Interaction, 3(3), 254–284.
Zhai, S., Morimoto, C., & Ihde, S. (1999). Manual and gaze input cascaded (MAGIC) pointing. In Proceedings of the CHI’99: ACM conference on human factors in computing systems (pp. 246–253). New York, USA: ACM Press.
Zhang, Q., Imamiya, A., Go, K., & Mao, X. (2004). A gaze and speech multimodal interface. In Proceedings of the 24th international conference on distributed computing systems workshops (ICDCSW’04) (pp. 208–214). IEEE Computer Society.

Zhu, Z., & Ji, Q. (2004). Eye and gaze tracking for interactive graphic display. Machine Vision and Applications, 15(3), 139–148.
Zhu, Z., Fujimura, K., & Ji, Q. (2002). Real-time eye detection and tracking under various light conditions. In Proceedings of the symposium on eye tracking research & applications (pp. 139–144). New York, USA: ACM Press.