ORIGINAL PAPER
Dogs do look at images: eye tracking in canine cognition research
Sanni Somppi • Heini Törnqvist • Laura Hänninen • Christina Krause • Outi Vainio

S. Somppi (corresponding author) · L. Hänninen · O. Vainio
Faculty of Veterinary Medicine, University of Helsinki, P. O. Box 57, 00014 Helsinki, Finland
e-mail: [email protected]

H. Törnqvist · C. Krause
Faculty of Behavioral Sciences, University of Helsinki, P. O. Box 9, 00014 Helsinki, Finland
Received: 11 February 2011 / Revised: 6 July 2011 / Accepted: 1 August 2011 / Published online: 23 August 2011
© Springer-Verlag 2011
DOI 10.1007/s10071-011-0442-1
Abstract Despite intense research on the visual com-
munication of domestic dogs, their cognitive capacities
have not yet been explored by eye tracking. The aim of the
current study was to expand knowledge on the visual
cognition of dogs using contact-free eye movement track-
ing under conditions where social cueing and associative
learning were ruled out. We examined whether dogs
spontaneously look at actual objects within pictures and
can differentiate between pictures according to their nov-
elty or categorical information content. Eye movements of
six domestic dogs were tracked during presentation of
digital color images of human faces, dog faces, toys, and
alphabetic characters. We found that dogs focused their
attention on the informative regions of the images without
any task-specific pre-training, and their gazing behavior
depended on the image category. Dogs preferred the facial
images of conspecifics over other categories and fixated on
a familiar image longer than on novel stimuli regardless of
the category. Dogs’ attraction to conspecifics over human
faces and inanimate objects might reflect their natural
interest, but further studies are needed to establish whether
dogs possess picture object recognition. Contact-free eye
movement tracking is a promising method for the broader
exploration of processes underlying special socio-cognitive
skills in dogs previously found in behavioral studies.
Keywords Canine · Domestic dogs · Eye movement tracking · Visual cognition
Introduction
Close co-evolution, sharing the same living habitat, and
similarities in canine and human socio-cognitive skills
make the dog a unique model for comparative cognition
studies (Hare and Tomasello 2005; Miklosi et al. 2007;
Topal et al. 2009). Domestication has equipped dogs with
sophisticated sensitivity to respond to human visual com-
municative cues, including gaze direction and pointing
gestures (Hare and Tomasello 2005; Miklosi et al. 2003;
Soproni et al. 2002; Viranyi et al. 2004). Despite the
intense research around the visual communication of
domestic dogs, their cognitive capacities have not yet been
explored by eye tracking.
Animals with developed visual systems control their
gaze direction using eye movements (Land 1999). Eye
movements are linked to attention (Buswell 1935); the
visual-cognitive system directs a gaze toward important
and informative objects, and vice versa, gaze direction
affects several cognitive processes (Henderson 2003).
Based on behavioral observations, dogs can perceive two-
dimensional visual information and can be trained to per-
form visual tasks, such as classifying photographs of nat-
ural stimuli by means of a perceptual response rule (Range
et al. 2008). They can match photographs of their owner with the owner's voice (Adachi et al. 2007) and are able to distin-
guish the facial images of two individual humans or dogs
(Racca et al. 2010).
However, in previous experiments, dogs have been
either manually restrained (Adachi et al. 2007; Guo et al.
2009; Racca et al. 2010) or their choices were reinforced
(Range et al. 2008), which may have affected the out-
come. Dogs are highly sensitive and responsive to human gestures and attentional states (reviewed in Miklosi et al. 2007; Topal et al. 2009), so the owner or experimenter may even subconsciously influence the dog's expectations and behavior, a phenomenon called the Clever Hans effect (Pfungst 1907). In previous studies, this effect has been minimized by avoiding human interference with the dogs or by keeping the restrainer unaware of the purpose of the experiment (Adachi et al. 2007; Guo et al. 2009; Racca et al. 2010). The Clever Hans effect might
be particularly prevalent in dogs (Lit et al. 2011); thus,
tasks that are performed without human presence could
be useful.
In traditional visual discrimination tests, looking behavior is scored manually from the dog's orienting behavior (Adachi et al. 2007; Guo et al. 2009; Racca et al. 2010; Farago et al. 2010), which may include both bouts of active information processing and blank stares (Aslin 2007). When only behavioral measures are used, it is impossible to determine which features draw the dogs' attention. Is the dog actually looking at the picture? Eye
movement tracking, a technique for directly assessing
gazing behavior, enables measurement of the sub-elements
of the looking behavior, which likely indicate the under-
lying mechanisms of the performance better than tradi-
tional paradigms (Quinn et al. 2009).
Eye tracking provides an objective method for under-
standing the ongoing cognitive and emotional processes,
such as visual-spatial attention, semantic information pro-
cessing, and motivational states of animals (Henderson
2003; Kano and Tomonaga 2009). Eye movement tracking
in dogs has been used to diagnose ocular motor abnor-
malities such as nystagmus (Jacobs et al. 2009; Dell’Osso
et al. 1998), and a head-mounted eye-tracking camera has
been recently tested on a single dog (Williams et al. 2011),
but to date, the method has not been utilized in canine
cognitive studies. Although dogs’ visual acuity is lower
than that of humans and they have comparatively reduced
color perception (dichromatic color vision), the visual
system of the dog does not impose crucial limitations on
the use of the eye movement tracking method. The canine
optic chiasm has a crossover of about 75%, which allows
for good binocular vision (Miller and Murphy 1995). Dogs
can detect details, even in very small photographs pre-
sented at a short distance (Range et al. 2008).
The spontaneous visual discrimination abilities of human
infants and non-human primates have traditionally been
tested with novelty preference paradigms (Fantz 1964;
Joseph et al. 2006; Dahl et al. 2007; Colombo and Mitchell
2009). In a commonly used protocol, a single stimulus is
first presented repeatedly (familiarization) and then chan-
ged to a novel stimulus. If the subject recognizes the similarity of the repeated presentations, looking declines across the familiarization phase, and if the subject discriminates the novel stimulus from the familiar one, looking rebounds when the stimulus is changed (Houston-Price and Nakai 2004). Recently, it
has been demonstrated that dogs exhibit an orienting
behavior toward novel pictures, but their preferences are
dependent on image category (Racca et al. 2010). This
finding indicates that dogs may have the ability to catego-
rize visual objects. Forming categories by sorting objects
into associative classes is one of the fundamental cognitive
abilities of humans, but little is known about spontaneous
categorization in animals (Murai et al. 2005).
The purpose of our study was to expand knowledge on
the cognitive abilities of dogs using a new method, contact-
free eye movement tracking. Our aim was to assess how
dogs view two-dimensional photographs under free view-
ing conditions where social cueing is ruled out. To study
this, we tested whether dogs differentiate between pictures
according to their novel or categorical information content
without any task-specific pre-training.
Materials and methods
All experimental procedures were approved by the Ethical
Committee for the use of animals in experiments at the
University of Helsinki.
Animals and experimental setup
Six privately owned dogs, aged 1–5 years (Table 1), par-
ticipated in the study. All experiments were conducted at
the Veterinary Faculty of the University of Helsinki.
For 1–2 months prior to the experiments, the dogs were trained by their owners as instructed by the first author. Using positive operant conditioning (clicker training), the dogs were trained to lie on a 10-cm-thick Styrofoam mattress and lean their jaw on a purpose-designed
u-shaped chin rest for up to 60 s (see Fig. 1 for experi-
mental details). During the last 2–3 weeks of the training
period, the dogs and their owners visited the experimental room 2–9 times until they were fully familiar with the experimental environment and instrumentation setup.
Table 1 Breed, age, and gender of the dogs participating in the experiment

ID  Breed            Age (years)  Gender
D1  Beauce shepherd  1            Female
D2  Beauce shepherd  2            Female
D3  Beauce shepherd  2.5          Female
D4  Rough collie     1            Female
D5  Hovawart         4            Castrated
D6  Great Pyrenees   5            Female
The
criterion for passing the training period was that the dog
took the pre-trained position without being commanded
and remained in that position for at least 30 s, while the
owner was positioned behind an opaque barrier. During the
training, the dogs were not encouraged to fix their eyes on
any monitor or images.
At the beginning of the test, the dog was released from
the leash at the door of the test room. The door was closed,
the owner went behind the opaque barrier, and the dog
settled down to the pre-trained position and was rewarded
with a titbit following a clicker sound. During the experi-
ments, the dog stayed in the trained position while
Experimenter 1, Experimenter 2, and the owner sat quietly
behind the opaque barrier (Fig. 1). The subject was mon-
itored through the webcam and rewarded after every trial.
If the dog left the position, but returned on its own initia-
tive, it was rewarded after the trial. The dogs were neither
restrained nor forced to perform the task.
Illumination was measured at four points to the side and in front of the dog's eyes before each calibration, calibration check trial, and experimental trial. The mean
illumination intensity was 1,050 lx, ranging from 850 to
1,250 lx depending on the size of the individual dog’s
pupils.
Eye-tracking system
The eye movements of the dogs were recorded with an infrared-based eye-tracking system (iView X™ RED, SensoMotoric Instruments GmbH, Germany) integrated with a 22″ LCD monitor (1,680 × 1,050 px) placed at a distance of 0.7 m from the dog's eyes. The device recorded binocular pupil position 250 times per second using a corneal reflection method.
Calibration of the eye tracker
The eye tracker was calibrated for each dog’s eyes using a
five-point procedure modified from the method used by
Joseph et al. (2006). The calibrated area was a visual field of 40.5° × 24.4° (equal to the size of the monitor).
For calibration, the monitor was replaced with a card-
board wall with five 30-mm holes in the positions of the
calibration points. Each hole was covered by a flap, which
Experimenter 1 lifted up and showed a titbit in the hole to
catch the dog’s attention. When the dog had gazed at a
certain point for a minimum of 5 s, Experimenter 2
accepted the calibration point at the operator computer (iView X™). After accepting five points, Experimenter 1
served the titbit to the dog through the hole at the end of a
pointer stick (0.7 m long).
After the successful five-point calibration, the calibra-
tion was checked by recording a dog’s eye movements with
the same procedure. The criterion for an adequate calibration was that the dog fixed its gaze on the central calibration point and on at least three of the four distal points. To
get an optimal calibration, 3–6 calibration sessions were
required for each dog.
The optimal calibration was saved and used for the
experimental trials. The accuracy of the calibration was
verified with five additional calibration check trials on
different days using the same procedure as in the original
calibration session. The head position, illumination, and the
position of the eye tracker were kept the same during the
calibration, calibration check trials, and experimental trials.
During all the calibration check trials, all the dogs fixed their gaze within a 1° radius of the central calibration point. The average hit ratio for all five calibration points within a 1° radius was 84% (SD 13%, ranging between 60 and 100%) (Table 2). One degree of gaze angle corresponded to approximately 1 cm or 40 px on the screen.
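As a rough cross-check of this correspondence, the short Python sketch below (not part of the original study; the constants are taken from the setup described above, and the authors' own conversion may have been done differently) converts 1° of visual angle into centimetres and pixels at the 0.7-m viewing distance.

```python
import math

VIEWING_DISTANCE_CM = 70.0   # monitor placed 0.7 m from the dog's eyes
SCREEN_WIDTH_PX = 1680       # horizontal resolution of the 22-inch monitor
SCREEN_WIDTH_DEG = 40.5      # calibrated horizontal field of view

# Width subtended by 1 degree at the viewing distance, and the pixel density
# of the calibrated field expressed per degree.
cm_per_degree = 2 * VIEWING_DISTANCE_CM * math.tan(math.radians(0.5))
px_per_degree = SCREEN_WIDTH_PX / SCREEN_WIDTH_DEG

print(f"1 degree of gaze angle ~ {cm_per_degree:.2f} cm ~ {px_per_degree:.1f} px")
# prints roughly 1.22 cm and 41.5 px, consistent with the ~1 cm / 40 px above
```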
To maintain vigilance and to prevent frustration in the
dogs, the calibration and experimental sessions were run on
separate days. However, each experimental session was
initiated by a visual evaluation of the calibration where a
dog’s eye movements were tracked while Experimenter 1
pointed with a pointer stick to calibration points appearing
on the presentation monitor and Experimenter 2 simulta-
neously ensured via the operator computer (iView X™) that the visualized gaze cursors hit the right points.
Fig. 1 a A schematic picture of
the experimental setup for eye
movement tracking of dogs
(n = 6) using a monitor
integrated eye-tracking system
without restraining, b the chin
rest, on which the dogs were
trained to keep their head still.
The dog stayed in the pre-
trained position while the
experimenters and dog’s owner
sat quietly behind the opaque
barrier
Stimuli
Digital color photographs of four stimulus categories, faces
of humans (HUMAN, n = 29, all Caucasian), faces of dogs
(DOG, n = 27), children’s toys (ITEM, n = 12, manu-
factured by IKEA), and alphabetic characters (LETTER,
n = 15) were used. Sample pictures with their typical eye
gazing paths are shown in the Results (Fig. 2). All individuals
and items in the stimulus images were unfamiliar to the
participating dogs. In 50% of the HUMAN and DOG
images, the teeth were visible; the human subjects were smiling, and the images of dogs with bared teeth were taken during play.
Images of 750 × 536 px (corresponding to a visual field of 18.1° × 12.9°) were presented on a gray background using Presentation® software (Neurobehavioral Systems, San Francisco, USA). The stimulus images were manipu-
lated such that the sizes of all the objects were 40% of the
size of the images.
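This stimulus preparation could be scripted roughly as in the following hedged Pillow sketch. The file names, the gray value, and the interpretation of "40% of the size of the images" as 40% of the image area are illustrative assumptions, not the authors' actual pipeline.

```python
from PIL import Image

CANVAS_SIZE = (750, 536)   # stimulus image size in px, as described above
GRAY = (128, 128, 128)     # assumed gray background value

def make_stimulus(object_path, out_path, area_fraction=0.40):
    """Paste an object photo, scaled to ~40% of the image area, onto a gray canvas."""
    obj = Image.open(object_path).convert("RGBA")
    canvas = Image.new("RGB", CANVAS_SIZE, GRAY)

    # Scale the object so that its area is roughly 40% of the canvas area.
    target_area = area_fraction * CANVAS_SIZE[0] * CANVAS_SIZE[1]
    scale = (target_area / (obj.width * obj.height)) ** 0.5
    obj = obj.resize((int(obj.width * scale), int(obj.height * scale)))

    # Centre the object on the background and save the composite.
    offset = ((CANVAS_SIZE[0] - obj.width) // 2, (CANVAS_SIZE[1] - obj.height) // 2)
    canvas.paste(obj, offset, obj)
    canvas.save(out_path)

make_stimulus("dog_face.png", "stimulus_dog_face.png")   # hypothetical file names
```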
Experimental procedure
Five dogs participated in eight experimental sessions and one dog (D4) in four sessions; for practical reasons, the owner of D4 was not able to bring the dog to the experiments on all the test days. The interval between experimental sessions was 24–48 h.
Each experimental session consisted of three experi-
mental trials, each representing one of the four image cat-
egories (DOG, HUMAN, ITEM, or LETTER) (Fig. 3). In
one trial, two stimulus images of the same category were presented in a total of six frames, each shown for 2 s, with a 500-ms blank screen between frames. A frame corresponds to a single presentation of a stimulus image. Every
experimental trial was followed by a 30- to 60-s break,
when the dog was rewarded regardless of its gazing
behavior to maintain its motivation to participate in the test.
The experimental trial consisted of two phases. During
the familiarization phase, the same stimulus image was
presented in three to five frames. After that, the novelty
phase began and the image was shifted to another stimulus
image from the same category, which was repeated in
the remaining one to three frames, forming a total of six consecutive frames.
Table 2 Descriptive data of the mean calibration accuracy % (SD) of the eye tracker during the calibration check trials and the mean eye movement tracking ratios % (SD) during experimental sessions for each dog

ID  Calibration check trials  Calibration accuracy % (SD)^a  Experimental sessions  Tracking ratio % (SD)^b
D1  6                         70 (11)                        8                      29 (19)
D2  6                         93 (10)                        8                      33 (24)
D3  6                         80 (13)                        8                      53 (29)
D4  4                         90 (12)                        4                      25 (25)
D5  6                         83 (20)                        8                      24 (16)
D6  6                         87 (10)                        8                      30 (25)

^a The mean percentage of hits within a 1° radius of the five calibration points
^b The mean percentage of the time the pupils were detected during the experimental sessions
Fig. 2 Examples of the four stimulus image categories presented in the study with their typical binocular scan paths: a dog face (DOG), b human face (HUMAN), c children's toy (ITEM), and d alphabetic character (LETTER). The scan paths of the gaze during a 2-s viewing by five dogs are drawn in a different color for each dog. The lines trace the path of the eye across the image. Circles represent fixed points of gaze (fixations); the larger the point, the longer the subject fixated on the corresponding point. The original images a and b by Microsoft® Office and c by IKEA
The novelty status of the last frame of the familiarization phase was encoded as FAMILIAR and that of the first frame of the novelty phase as NOVEL (Fig. 3). To prevent anticipatory behavior, the timing of the shift from the familiar to the novel stimulus varied (FAMILIAR + NOVEL orders 3 + 3, 4 + 2, or 5 + 1; Fig. 3). The categories were semi-randomized across trial positions and FAMILIAR + NOVEL orders; thus, each category was presented at least once in each trial position (1st, 2nd, or 3rd) and at least once with each FAMILIAR + NOVEL order.
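A purely illustrative sketch of how one trial's six-frame sequence could be assembled under this design is given below; the image identifiers, the category selection, and the helper names are assumptions and do not reproduce the authors' Presentation scripts.

```python
import random

def build_trial(familiar_img, novel_img, n_familiar):
    """Six-frame sequence for one trial; n_familiar in (3, 4, 5) gives the
    FAMILIAR + NOVEL orders 3 + 3, 4 + 2, or 5 + 1."""
    assert n_familiar in (3, 4, 5)
    return [familiar_img] * n_familiar + [novel_img] * (6 - n_familiar)

# Hypothetical stimulus pairs for the four categories.
stimuli = {"DOG": ("dog_a.png", "dog_b.png"),
           "HUMAN": ("human_a.png", "human_b.png"),
           "ITEM": ("toy_a.png", "toy_b.png"),
           "LETTER": ("letter_a.png", "letter_b.png")}

# One session: three trials, each from a different category, with the shift
# point varied so the dog cannot anticipate when the image will change.
categories = random.sample(list(stimuli), 3)
orders = random.sample([3, 4, 5], 3)

session = []
for category, n_fam in zip(categories, orders):
    familiar, novel = stimuli[category]
    session.append((category, build_trial(familiar, novel, n_fam)))

# Each frame was shown for 2 s with a 500-ms blank screen in between; the last
# familiarization frame is FAMILIAR and the first post-shift frame is NOVEL.
print(session)
```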
The FAMILIAR and NOVEL stimuli were paired by
background color, brightness, contrast, and general shape,
to make the images quite similar but easily distinguishable
to humans. Human face pictures were also paired by gen-
der, age, hair color, and expression, and dog pictures by ear
shape and expression (mouth closed vs. open).
If the dog moved, and the eye tracker therefore lost the
eyes for more than three frames in a particular trial, the
whole trial was repeated in the same session. During such a
repeated trial, new images of the same category were
presented in the same order as in the original trial. A total
of 132 original trials (12–24 trials per dog) and ten replaced
trials (0–3 trials per dog) were conducted.
In addition to the experimental trials, baseline data, i.e.,
dogs watching a blank gray screen (BLANK), were gath-
ered from three to four trials per dog. The baseline data
were gathered in two randomly selected separate sessions
before or after image-viewing trials.
Data analysis
Eye movement data were obtained from six dogs for a total of 857 frames, on average 143 frames per dog (SD 28 frames, ranging between 89 and 161 frames), including the baseline recordings. Due to technical problems with the software, 61 frames of unreadable data were lost.
The tracking ratio, i.e., the mean percentage of the time
a pupil was detected during the experimental session, was
32% (SD 27%, ranging between 0.6 and 96%) (Table 2).
The tracking of eye movements succeeded better for one
eye than the other. For the better eye, the average tracking
ratio was 45% (SD 45%, ranging between 3.5 and 96%).
The tracking ratio also includes the breaks between experimental trials.
The raw binocular eye movement data were analyzed using BeGaze 2™ software (SensoMotoric Instruments GmbH, Berlin, Germany). Gaze fixations were scored with a low-speed event detection algorithm that identifies potential fixations with a moving window spanning consecutive data points. A fixation was coded if its duration was at least 75 ms and its dispersion did not exceed D = 250 px, where D = [max(x) − min(x)] + [max(y) − min(y)]. Otherwise, the recorded sample was classified as part of a saccade.
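This criterion can be illustrated with a simplified dispersion-threshold (I-DT-style) detector such as the sketch below. BeGaze's exact implementation is not public, so this is only an approximation of the 75-ms / 250-px rule at the 250-Hz sampling rate.

```python
SAMPLE_RATE_HZ = 250
MIN_DURATION_MS = 75
MAX_DISPERSION_PX = 250
MIN_SAMPLES = int(MIN_DURATION_MS / 1000 * SAMPLE_RATE_HZ)  # 18 samples

def dispersion(window):
    """D = [max(x) - min(x)] + [max(y) - min(y)] over a list of (x, y) samples."""
    xs = [p[0] for p in window]
    ys = [p[1] for p in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def detect_fixations(samples):
    """samples: list of (x, y) gaze positions; returns (start, end) index pairs."""
    fixations, i = [], 0
    while i + MIN_SAMPLES <= len(samples):
        window_end = i + MIN_SAMPLES
        if dispersion(samples[i:window_end]) <= MAX_DISPERSION_PX:
            # Grow the window while the dispersion criterion still holds.
            while window_end < len(samples) and \
                    dispersion(samples[i:window_end + 1]) <= MAX_DISPERSION_PX:
                window_end += 1
            fixations.append((i, window_end - 1))
            i = window_end
        else:
            i += 1   # samples outside any fixation are treated as saccade samples
    return fixations
```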
Statistical analyses
Each stimulus image was divided into three areas of interest (AOI): monitor, image, and object. AOIs of corresponding size and placement were also
defined for the blank screen. From the binocular raw data,
number of fixations, duration of a single fixation, total
duration of fixations, and relative fixation duration were
averaged per frame for each AOI. In the baseline com-
parisons, the relative fixation duration is the duration of the
fixations targeted to the image area as a percentage of the total duration of all fixations in the monitor area.
Fig. 3 A general outline of the experimental session where the
novelty paradigm is tested in three experimental trials. In one trial,
two stimulus images from one of the four categories (DOG, HUMAN,
ITEM, or LETTER) were presented as a total of six frames. The
timing of the shift from familiarization phase to novelty phase varied
through the trials. An example of encoding of the novelty status is
given below Trial 3; the last frame of the familiarization phase
was encoded as FAMILIAR and the first frame of the novelty phase as
NOVEL
In other
comparisons, the relative fixation duration is the duration
of the fixations targeted to the object area as a percentage
of the total duration of all fixations in an image area.
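As an illustration only, the following sketch computes the per-frame gaze parameters for nested rectangular AOIs; the column names and the object AOI are invented, and the image AOI is simply the 750 × 536 px image centred on the 1,680 × 1,050 px monitor.

```python
import pandas as pd

def in_aoi(x, y, aoi):
    """Element-wise test whether points fall inside a rectangular AOI."""
    left, top, right, bottom = aoi
    return (left <= x) & (x <= right) & (top <= y) & (y <= bottom)

def frame_metrics(fixations, image_aoi, object_aoi):
    """fixations: one frame's fixations with columns x, y, duration_ms."""
    on_image = fixations[in_aoi(fixations.x, fixations.y, image_aoi)]
    on_object = fixations[in_aoi(fixations.x, fixations.y, object_aoi)]
    total_image = on_image.duration_ms.sum()
    return {
        "n_fixations": len(on_image),
        "single_fixation_ms": on_image.duration_ms.mean(),
        "total_fixation_ms": total_image,
        # fixation time on the object as % of all fixation time on the image
        "relative_fixation_pct": (100 * on_object.duration_ms.sum() / total_image
                                  if total_image else float("nan")),
    }

fix = pd.DataFrame({"x": [820, 870, 400], "y": [500, 530, 200],
                    "duration_ms": [120, 250, 90]})
# Image AOI: 750 x 536 px image centred on the 1,680 x 1,050 px monitor;
# the object AOI coordinates are made up for the example.
print(frame_metrics(fix, image_aoi=(465, 257, 1215, 793),
                    object_aoi=(600, 350, 1080, 700)))
```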
The differences in measured gaze parameters between
the blank screen and image-viewing frames were analyzed
using a linear mixed-effects model with repeated measures. The model
incorporated the monitor status (BLANK or IMAGES) and
session as fixed effects. Dog and the interaction between
dog and session were random effects.
The same linear mixed-effects model approach was also used to assess the differences between the familiar and
the novel images and the differences between image cat-
egories (DOG, HUMAN, LETTER, and ITEM). The model
included category, session, trial and frame as fixed effects.
The random effects were dog, dog × session, and dog × trial. The real object size was used as a covariate. Analysis
of the novelty effect was limited to the critical frames: the
first frame of the trial (1st FRAME), the last frame of the
familiarization phase (FAMILIAR), and the first frame of
the novelty phase (NOVEL) (see Fig. 3 for details).
Interaction between novelty status and category was also
tested, but was excluded as being non-significant. In all
models, session, trial, and frame were used as repeated factors with a first-order autoregressive (AR(1)) covariance structure.
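In Python, a rough approximation of the category model could look like the statsmodels sketch below. The data frame and column names are assumptions, and statsmodels' MixedLM has no AR(1) repeated-measures covariance structure, so this only approximates the SPSS analysis reported here.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("fixations_per_frame.csv")   # hypothetical per-frame data

model = smf.mixedlm(
    "total_fix_dur ~ C(category) + C(session) + C(trial) + C(frame) + object_size",
    data=df,
    groups=df["dog"],                              # random effect of dog
    vc_formula={"dog_session": "0 + C(session)",   # approximates dog x session
                "dog_trial": "0 + C(trial)"},      # approximates dog x trial
)
result = model.fit()
print(result.summary())
```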
Results of the linear mixed models are reported as
estimated averages with a standard error of the mean
(SEM), except for the duration of a single fixation, which is reported as an estimated average with a 95% confidence interval (CI) because it was back-transformed from the logarithmic scale. The significance level was set at alpha = 0.05. All the statistical analyses were conducted in PASW Statistics 17 (SPSS Inc., Chicago, IL, USA).
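The back-transformation works as in the small illustration below (the numbers are made up): an estimate and its confidence limits obtained on the log scale are exponentiated, which is why that variable is reported with an asymmetric 95% CI rather than a SEM.

```python
import math

log_mean, log_se = math.log(205), 0.10   # hypothetical estimate on the log scale
estimate = math.exp(log_mean)
ci_low = math.exp(log_mean - 1.96 * log_se)
ci_high = math.exp(log_mean + 1.96 * log_se)
print(f"estimate {estimate:.0f} ms, 95% CI {ci_low:.0f}-{ci_high:.0f} ms")
```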
Results
We obtained successful eye movement recordings from all
six dogs even though they were neither restrained nor
trained to look at the screen or 2D pictures. Examples of
typical scan paths of the dogs are shown in Fig. 2.
The statistical significances for the fixed effects and
repeated factors are displayed in Table 3.
Blank screen versus images
Statistically significant differences between BLANK
screen (n = 126) and IMAGES (n = 857) were found for
the number of fixations, duration of a single fixation, total
fixation duration, and relative fixation duration (Table 3).
The dogs fixated on the monitor more frequently in the IMAGES condition (2.3 ± 0.4 vs. 1.1 ± 0.5, P = 0.000), and
the duration of a single fixation was longer (205 ms, 95%
CI 137–307 ms) than in the BLANK screen conditions
(128 ms, 95% CI 85–193 ms, P = 0.000). When the
IMAGES were displayed, the total duration of fixations
(543 ± 118 ms vs. 209 ± 124 ms, P = 0.000) and rela-
tive fixation duration for the image were longer compared
with the corresponding area on the BLANK screen
(63.0 ± 6.3% vs. 46.8 ± 8.4%, P = 0.007). Examples of
the focus maps are illustrated in Fig. 4.
Novelty effect
Statistically significant differences between the 1st
FRAME (n = 121), FAMILIAR (n = 122), and NOVEL
(n = 119) frames were found for the number of fixations
and total duration of fixations (Table 3). Dogs fixated on the 1st FRAME more often (1.8 ± 0.3) than on FAMILIAR (1.3 ± 0.3, P = 0.003) or NOVEL frames (0.1 ± 0.3,
P = 0.000). Total duration of fixations decreased after the
shift (373 ± 80 ms vs. 232 ± 81 ms, P = 0.006; Fig. 5).
The relative fixation duration of the object was
56.3 ± 6.5% with no statistically significant differences
between frames.
Effect of repeated measures
Statistically significant differences between the six consecutively presented frames (1st n = 121, 2nd n = 123, 3rd
n = 122, 4th n = 121, 5th n = 121, and 6th n = 123)
were found for the number of fixations, duration of a single
fixation, and total duration of fixations (Table 3). The
number of fixations decreased (P = 0.016) and the dura-
tion of a single fixation increased (P = 0.004) after the first
frame (Fig. 6a). At the 6th frame, the number (P = 0.047;
Fig. 6a) and total duration of fixations decreased
(P = 0.002; Fig. 6b) compared with the previous frame.
The relative fixation duration of the object was
55.5 ± 6.8% with no statistically significant difference
between frames.
Furthermore, statistically significant differences
between experimental sessions were found in the number
of fixations (Table 3). The session had a slight effect on the mean number of fixations (P = 0.016); it was lowest during the 5th session (0.4 ± 0.4) and highest during the 8th session (2.2 ± 0.5), otherwise ranging mostly between 1.1 and 1.7 fixations
per frame. The relative fixation duration of the object was
55.5 ± 8.7%, with no statistically significant differences
between sessions.
Image categories
Significant differences between the four image categories (DOG n = 238, HUMAN n = 249, ITEM n = 131, and LETTER n = 113) were found for the number of fixations, the total duration of fixations, and the relative fixation duration (Table 3).
Table 3 Statistical results from the linear mixed-effect model analysis of the effects of image category and repeated factors (frame, trial, and session) on the number of fixations, the duration of a single fixation, the total duration of fixations, and the relative fixation duration

Baseline (AOI: monitor)
  Number of fixations: Category^a F(1, 835.3) = 37.7, P = 0.000; Frame –; Trial –; Session F(7, 30.0) = 3.1, P = 0.015
  Duration of a single fixation: Category^a F(1, 612.0) = 26.5, P = 0.000; Frame –; Trial –; Session ns
  Total duration of fixations: Category^a F(1, 836.1) = 58.5, P = 0.000; Frame –; Trial –; Session F(7, 30.6) = 3.3, P = 0.010
  Relative fixation duration of the image area: Category^a F(1, 615.2) = 7.3, P = 0.007; Frame –; Trial –; Session F(7, 30.6) = 2.6, P = 0.034

Novelty effect (AOI: image)
  Number of fixations: Category^a F(3, 180.3) = 3.7, P = 0.012; Frame F(2, 218.9) = 11.0, P = 0.000; Trial ns; Session F(7, 101.5) = 2.4, P = 0.023
  Duration of a single fixation: Category^a ns; Frame ns; Trial ns; Session ns
  Total duration of fixations: Category^a F(3, 164.0) = 3.5, P = 0.017; Frame F(2, 228.2) = 7.1, P = 0.001; Trial ns; Session ns
  Relative fixation duration of the image area: Category^a ns; Frame ns; Trial ns; Session ns

Image categories (AOI: image)
  Number of fixations: Category^a F(3, 263.1) = 5.2, P = 0.002; Frame F(5, 460.8) = 4.5, P = 0.000; Trial ns; Session F(7, 70.0) = 2.7, P = 0.016
  Duration of a single fixation: Category^a ns; Frame F(5, 349.1) = 2.7, P = 0.021; Trial ns; Session ns
  Total duration of fixations: Category^a F(3, 267.5) = 6.2, P = 0.000; Frame F(5, 476.8) = 3.6, P = 0.003; Trial ns; Session ns
  Relative fixation duration of the image area: Category^a F(3, 189.3) = 2.8, P = 0.042; Frame ns; Trial ns; Session ns

Analyses of the baseline, novelty effect, and image categories were conducted separately. The results are presented in the form F(numerator df, denominator df) = F, observed significance level; ns = non-significant
^a In the baseline comparison, category means monitor status, i.e., blank screen versus images. In the other comparisons, the categories are DOG, HUMAN, ITEM, and LETTER
The dogs fixated on DOG images more
often (2.0 ± 0.3) than on HUMAN (1.6 ± 0.3, P = 0.014),
ITEM (1.2 ± 0.3, P = 0.006), or LETTER images
(0.5 ± 0.5, P = 0.002). LETTER images gathered fewer
fixations than other categories (LETTER vs. DOG P =
0.000; LETTER vs. HUMAN P = 0.025; LETTER vs.
ITEM P = 0.043; Fig. 7).
The mean duration of a single fixation was 214 ms (95% CI 154–289 ms) and did not differ between the categories. However, the total fixation duration differed statistically significantly among the four image categories (P = 0.000; Table 3). Dogs fixated longest on the images of dogs (534 ± 80 ms) and shortest (94 ± 120 ms) on the images of alphabetic characters (DOG vs. HUMAN P = 0.024; DOG vs. ITEM P = 0.001; DOG vs. LETTER P = 0.000; LETTER vs. HUMAN P = 0.005; LETTER vs. ITEM P = 0.029; Fig. 7).
Fig. 4 An example of eye gaze patterns during presentation of the
DOG image (a) and the BLANK screen (b). The focus map represents
averaged fixations of the right eye of five dogs and three repetitions
presented consecutively. The color coding represents the average of
fixation durations; minimum 5 ms indicated by light blue and the
maximum of 100 ms or over by bright red. The area corresponding to
the image size and placement is overlaid on the BLANK screen as a
dashed line
Fig. 5 The effect of novelty status of the image on the number of
fixations per frame (mean ± SEM) and the total duration of fixations
per frame (mean in ms ± SEM) in six dogs. In the novelty paradigm,
the same stimulus image repeats 3–5 times (last frame is considered
as FAMILIAR) and then changes (SHIFT) to another stimulus image
from the same stimulus category (NOVEL). Statistically significant
differences between the 1st, FAMILIAR, and NOVEL frames are
presented as different letters (MIXED, P < 0.05)
Fig. 6 The effect of the repeated image presentation on a the number
of fixations (mean ± SEM) and the duration of a single fixation
(mean in ms ± SEM) per frame b the total duration of fixations per
frame (mean in ms ± SEM) and the relative fixation duration of the
fixations targeted to the object (% ± SEM) in six dogs during an
experimental trial where a total of six frames of two different images
from the same stimulus category are shown as a novelty paradigm.
Statistically significant differences between the frames are presented
as different letters (MIXED, P < 0.05)
The main effect of the image
category on the relative fixation duration of the object was
statistically significant (P = 0.042), but the pairwise
comparisons did not specify which categories differed from
each other (DOG 65.4 ± 6.4%; HUMAN 56.2 ± 6.7%;
ITEM 60.4 ± 8.4%; and LETTER 39.8 ± 13.3%).
Discussion
The current study produced evidence on canine visual cognition with a new method: contact-free eye movement tracking. Dogs focused their attention on the informative
regions of the images without any task-specific pre-train-
ing, and their gazing behavior depended on the image
category.
Dogs focused most of their fixation duration on the
actual image compared with the surrounding monitor and
on the actual object compared with the background image, as has previously been reported for humans, chimpanzees, and monkeys (Yarbus 1967; Nahm et al. 1997; Kano and Tomonaga 2009). Dogs spontaneously preferred images of conspecifics over human faces and inanimate objects, suggesting that they might be able to discriminate between images of different categories.
Various animal species can form visual categories when
trained using a match-to-sample procedure (Bovet and
Vauclair 2000). However, animals quickly learn to repeat
the reinforced behaviors (Skinner 1938), even uncharac-
teristic ones, and hence, the natural looking behavior could
remain hidden (Dahl et al. 2007). Explicit rewarding for a certain pre-defined criterion could lead to atypical response strategies and limit comparability with studies done with naïve human subjects (Murai et al. 2005; Dahl et al.
2009). In the current study, the differences between stim-
ulus categories arise from unprompted attention measured
directly as fixational eye movements.
The categories gathered different numbers of fixations,
while the average duration of a single fixation was the same
in all categories. The role of the fixation is to keep the gaze
stable enough for stimulus encoding in the photoreceptors (Land 1999; Yarbus 1967); thus, a sufficiently long single-fixation duration is needed to identify the object. In
humans, targets interpreted as informative or interesting
attract more re-fixations (Buswell 1935; Henderson and
Hollingworth 1999). In the current study, the images of letters received the fewest fixations and thus the shortest overall fixation duration. This finding corresponds with behavioral observations in a recent study by Farago et al. (2010), who suggested that dogs might consider nat-
ural objects more interesting than abstract ones. On the
other hand, these complex pictures might contain more
information to process. At the lowest level, picture perception can occur on the basis of physical features (i.e., color, intensity, contrast, or edge orientation), which does not require understanding the representational content of the image (Fagot et al. 1999; Bovet and Vauclair 2000).
In eye movements, the targets of the fixations could be
driven not only by higher level conceptual information of
the objects but also by low-level visual saliency (Hender-
son 2003; Henderson and Hollingworth 1999). In fact, the
more complex the image, the more it contains details that
might attract fixations and hence increase the time it takes
to view and process the image. Therefore, the category-
dependent gazing behavior could also be the result of dif-
ferences in physical complexity. In humans, the high- and
low-level mechanisms guiding eye movements are task dependent and can alternate (Einhauser et al. 2008a, b). Thus, based only on eye
movement data, we cannot yet draw any conclusions as to
whether the attention of dogs was mainly directed by
stimulus features or semantic information, or both.
The letter images were much simpler in their features than the other categories, but the images of dog faces were not apparently more complex in their physical properties than the human faces presented. Nevertheless, dogs fixated more often and for a longer total duration on canine faces than on human faces. A preference for conspecific
faces has been suggested to indicate expertise in the per-
ception of the faces of own species (Dahl et al. 2009;
Hattori et al. 2010). Recent behavioral studies in dogs also
found species-dependent looking behavior when viewing
human and canine faces (Guo et al. 2009; Racca et al.
2010). Face detection plays an important role in non-verbal
communication in primates and probably also in other social mammals (Leopold and Rhodes 2010).
Fig. 7 The effect of the image category (DOG, HUMAN, ITEM, and
LETTER) on the number of fixations (mean ± SEM) and the total
duration of fixations (mean in ms ± SEM) per frame in six dogs.
Statistically significant differences between the image categories are
presented as different letters (MIXED, P < 0.05)
The dog
might perceive conspecific faces to be more interesting or
informative than human faces due to differences in social relevance and communication strategies between dog–dog and dog–human interactions. One factor that might have affected category preferences is variation within a category: the dog faces represented many breeds, whereas the human faces were all Caucasian.
As has been found in human infants (Fantz 1964;
reviewed in Colombo and Mitchell 2009) and monkeys
(Joseph et al. 2006), the first frame attracted the highest
total looking time, which decreased when the image was
repeated, probably indicating habituation to the stimulus.
However, we could not detect the expected rebound in looking when the novel picture was presented. Instead, the total duration of fixations decreased after the stimulus changed. It is likely that the dogs did not notice the change, or they may have generalized the familiar pictures to the novel ones. In gen-
eralization, two stimuli, even if readily distinguishable, are
similar enough with respect to their physical properties to
evoke the same response (Shepard 1994; Ghirlanda and Enquist 2003; generalization of natural 2D pictures has
been demonstrated in dogs by Range et al. 2008). Most of the familiar–novel image pairs were rather similar; for example, humans were paired according to gender,
age, hair color, and expression. When the stimuli are
complex, dogs’ performance could be limited as compared
with other species with better visual acuity and color per-
ception (Range et al. 2008). It is possible that the dogs do
not perceive the minor details of the picture and thus
cannot discriminate the images within the same category.
An interaction between category and novelty status was not established, which partly contradicts the recent study of Racca et al. (2010), in which dogs likewise directed shorter looking times toward novel canine faces but longer looking times toward novel human faces and objects. Category-
dependent novelty responses have also been reported for
human observers, who prefer familiarity in human faces but novelty in natural scenes, and have no clear preference for geometric figures (Park et al. 2010). The different
results may also be due to methodological differences; we examined fixational eye movements, not overall orienting time as in Racca et al. (2010). The overall orienting time also includes inactive viewing ("blank stares"; Aslin 2007). Moreover, the presentation setup differed. In
the current study, the design was a modified version of that
used for monkeys (Joseph et al. 2006). The stimulus was
repeated several times, and dogs were rewarded after the
sixth frame. All dogs were highly motivated, but the decrease in the number and total duration of fixations during the last frames of the trial suggests that they might have tired of the monotony of the task. It is also likely that they anticipated the reward and therefore were not focusing their attention on the monitor at the end of the trial.
Varying the trial length might be a better design for canine research. Dogs might detect the difference between familiar and novel images better if the images were presented side by side, as in the visual paired-comparison study of Racca et al. (2010). Different methodologies have also led to contradictory novelty responses in infant studies (Houston-Price and Nakai 2004).
The absence of preferential looking does not necessarily
mean the absence of discrimination (Aslin 2007). Novelty
preferences can vary individually; the subjects might have
perceived certain categories more attractive or might have
used individual strategies for detecting a new stimulus, as
suggested previously for monkeys and humans (Joseph
et al. 2006). Preference may even focus on certain details of the object, for example, the head area in whole-body pictures (Quinn et al. 2009). The physical similarity of, and differences in attractiveness between, consecutive images could also affect novelty responses. A rebound in looking is
likely to occur when consecutive images differ more from
each other and when the novel image is more interesting in
content than the previous one (Dahl et al. 2007).
The dogs in the current study performed the tasks
independently while the owners and experimenters
remained hidden behind an opaque barrier. The dogs were
trained neither to fixate on the monitor nor to discriminate
images. However, all measured eye movement variables
indicated that the dogs were more interested in looking at
the monitor when images were displayed than when the
screen was blank. This finding confirms that the dogs had
not learned to fix their gaze on the monitor in anticipation
of a reward or a response to social cueing. To the authors’
knowledge, blank screen viewing has not been previously
measured in animal studies. Dogs targeted some fixations
toward the blank screen, which is typical also in human
looking behavior because the re-activation of memory
representations drives the eyes to previously viewed loca-
tions (Ferreira et al. 2008).
Clearly, we cannot yet say unequivocally what the dogs
see in the pictures. In humans, a picture is something in
which objects can be recognized, even though the objects
themselves are not actually present. It is under debate
whether animals recognize pictures as representations of
real-world objects (Bovet and Vauclair 2000; Jitsumori
2010). Dogs can associate visual images with acoustic information, suggesting that they are capable of forming mental representations from pictures (Adachi et al. 2007; Far-
ago et al. 2010). It has also been demonstrated through a
fetching task that at least some dogs are able to match
photographs of items to actual objects (Kaminski et al.
2009). Nevertheless, eye movement tracking is a promising
method for comparing visual perception strategies and
abilities between humans and dogs.
In conclusion, contact-free eye movement tracking can
be used to assess canine visual cognition. This promising
method represents a tool for the broader exploration of
processes underlying special socio-cognitive skills in dogs
previously established through behavioral studies. Dogs’
attraction to conspecific and human faces over inanimate
objects might reflect the natural interests of dogs, but further studies are needed to establish whether dogs possess picture object recognition.
Acknowledgments This work was financially supported by the
Academy of Finland and University of Helsinki. The authors are
grateful to Antti Flyckt, Matti Pastell, Aleksander Alafuzoff, Teemu
Peltonen, Jaana Simola, Timo Murtonen, and Kristian Tornqvist for
their support in conducting the experiment. The authors also thank the IKEA group for permission to use the photos of children's toys.
Conflict of interest The authors declare that they have no conflict
of interest.
References
Adachi I, Kuwahata H, Fujita K (2007) Dogs recall their owner’s face
upon hearing the owner’s voice. Anim Cogn 10:17–21
Aslin RN (2007) What’s in a look? Dev Sci 10:48–53
Bovet D, Vauclair J (2000) Picture recognition in animals and
humans. Behav Brain Res 109:143–165
Buswell GT (1935) How people look at pictures; a study of the
psychology of perception in art. University of Chicago Press,
Chicago
Ghirlanda S, Enquist M (2003) A century of generalization. Anim
Behav 66:15–36
Colombo J, Mitchell DW (2009) Infant visual habituation. Neurobiol
Learn Mem 92:225–234
Dahl CD, Logothetis NK, Hoffman KL (2007) Individuation and
holistic processing of faces in rhesus monkeys. Proc R Soc B
Biol Sci 274:2069–2076
Dahl CD, Wallraven C, Bulthoff HH, Logothetis NK (2009) Humans
and macaques employ similar face-processing strategies. Curr
Biol 19:509–513
Dell’Osso LF, Williams RW, Jacobs JB, Erchul DM (1998) The
congenital and see-saw nystagmus in the prototypical achiasma
of canines: comparison to the human achiasmatic prototype. Vis
Res 38:1629–1641
Einhauser W, Rutishauser U, Koch C (2008a) Task-demands can
immediately reverse the effects of sensory-driven saliency in
complex visual stimuli. J Vis 8:1–19
Einhauser W, Spain M, Perona P (2008b) Objects predict fixations
better than early saliency. J Vis 8:1–26
Fagot J, Martin-Malivel J, Depy D (1999) What is the evidence for an
equivalence between objects and pictures in birds and nonhuman
primates? Curr Psychol Cogn 18:923–949
Fantz RL (1964) Visual experience in infants: decreased attention to
familiar patterns relative to novel ones. Science 146:668–670
Farago T, Pongracz P, Miklosi A, Huber L, Viranyi Z, Range F
(2010) Dogs’ expectation about signalers’ body size by virtue of
their growls. PLoS One 12:1–8
Ferreira F, Apel J, Henderson JM (2008) Taking a new look at
looking at nothing. Trends Cogn Sci 12:405–410
Guo K, Meints K, Hall C, Hall S, Mills D (2009) Left gaze bias in humans,
rhesus monkeys and domestic dogs. Anim Cogn 12:409–418
Hare B, Tomasello M (2005) Human-like social skills in dogs?
Trends Cogn Sci 9:439–444
Hattori Y, Kano F, Tomonaga M (2010) Differential sensitivity to
conspecific and allospecific cues in chimpanzees and humans: a
comparative eye-tracking study. Biol Lett 6:610–613
Henderson JM (2003) Human gaze control during real-world scene
perception. Trends Cogn Sci 7:498–504
Henderson JM, Hollingworth A (1999) High-level scene perception.
Annu Rev Psychol 50:243–271
Houston-Price C, Nakai S (2004) Distinguishing novelty and
familiarity effects in infant preference procedures. Infant Child
Dev 13:341–348
Jacobs JB, Dell’Osso LF, Wang ZI, Acland GM, Bennett J (2009)
Using the NAFX to measure the effectiveness over time of
gene therapy in canine LCA. Invest Ophthalmol Vis Sci
50:4685–4692
Jitsumori M (2010) Do animals recognize pictures as representations
of 3D objects? Comp Cogn Behav Rev 5:136–138
Joseph JE, Powell DK, Andersen AH, Bhatt RS, Dunlap MK, Foldes
ST, Forman E, Hardy PA, Steinmetz NA, Zhang Z (2006) fMRI
in alert, behaving monkeys: an adaptation of the human infant
familiarization novelty preference procedure. J Neurosci Meth-
ods 157:10–24
Kaminski J, Tempelmann S, Call J, Tomasello M (2009) Domestic
dogs comprehend human communication with iconic signs. Dev
Sci 12:831–837
Kano F, Tomonaga M (2009) How chimpanzees look at pictures: a
comparative eye-tracking study. Proc Biol Sci 276:1949–1955
Land MF (1999) Motion and vision: why animals move their eyes.
J Comp Physiol A Neuroethol Sens Neural Behav Physiol
185:341–352
Leopold DA, Rhodes G (2010) A comparative view of face
perception. J Comp Psychol 124:233–251
Lit L, Schweitzer JB, Oberbauer AM (2011) Handler beliefs affect
scent detection dog outcomes. Anim Cogn 14:387–394
Miklosi A, Kubinyi E, Topal J, Gacsi M, Viranyi Z, Csanyi V (2003)
A simple reason for a big difference: wolves do not look back at
humans, but dogs do. Curr Biol 13:763–766
Miklosi A, Topal J, Csanyi V (2007) Big thoughts in small brains?
dogs as a model for understanding human social cognition.
Neuroreport 18:467–471
Miller PE, Murphy CJ (1995) Vision in dogs. J Am Vet Med Assoc 207:1623–1634
Murai C, Kosugi D, Tomonaga M, Tanaka M, Matsuzawa T, Itakura S
(2005) Can chimpanzee infants (Pan troglodytes) form categorical
representations in the same manner as human infants (Homo sapiens)? Dev Sci 8:240–254
Nahm FKD, Perret A, Amaral DG, Albright TD (1997) How do
monkeys look at faces? J Cogn Neurosci 9:611–623
Park J, Shimojo E, Shimojo S (2010) Roles of familiarity and novelty
in visual preference judgments are segregated across object
categories. Proc Natl Acad Sci USA 107:14552–14555
Pfungst O (1907) Das Pferd des Herrn von Osten (der Kluge Hans):
Ein Beitrag zur experimentellen Tier-und Menchenpsychologie.
Johann Ambrosius Barth, Leipzig
Quinn PC, Doran MM, Reiss JE, Hoffman JE (2009) Time course of
visual attention in infant categorization of cats versus dogs:
evidence for a head bias as revealed through eye tracking. Child
Dev 80:151–161
Racca A, Amadei E, Ligout S, Guo K, Meints K, Mills D (2010)
Discrimination of human and dog faces and inversion responses
in domestic dogs (Canis familiaris). Anim Cogn 13:525–533
Range F, Aust U, Steurer M, Huber L (2008) Visual categorization of
natural stimuli by domestic dogs. Anim Cogn 11:339–347
Shepard R (1994) Perceptual-cognitive universals as reflections of the
world. Psychon Bull Rev 1:2–28
Skinner BF (1938) The behavior of organisms: an experimental analysis. D. Appleton-Century Company, New York, p 457
Soproni K, Miklosi A, Topal J, Csanyi V (2002) Dogs’ (Canis familiaris) responsiveness to human pointing gestures. J Comp
Psychol 116:27–34
Topal J, Miklosi A, Gacsi M, Doka A, Pongracz P, Kubinyi E,
Viranyi Z, Csanyi V (2009) The dog as a model for
understanding human social behavior. In: Brockmann HJ, Roper
TJ, Naguib M, Wynne-Edwards KE, Mitani JC, Simmons LW
(eds) Advances in the study of behavior, vol 39. Academic Press,
Burlington, pp 71–116
Viranyi Z, Topal J, Gacsi M, Miklosi A, Csanyi V (2004) Dogs
respond appropriately to cues of humans’ attentional focus.
Behav Process 66:161–172
Williams FJ, Mills DS, Guo K (2011) Development of a head-
mounted, eye-tracking system for dogs. J Neurosci Methods
94:259–265
Yarbus AL (1967) Eye movements and vision. Plenum Press, New York