ORIGINAL PAPER
Dogs do look at images: eye tracking in canine cognition research
Sanni Somppi • Heini Törnqvist • Laura Hänninen • Christina Krause • Outi Vainio

S. Somppi (corresponding author) · L. Hänninen · O. Vainio
Faculty of Veterinary Medicine, University of Helsinki, P. O. Box 57, 00014 Helsinki, Finland
e-mail: [email protected]

H. Törnqvist · C. Krause
Faculty of Behavioral Sciences, University of Helsinki, P. O. Box 9, 00014 Helsinki, Finland
Received: 11 February 2011 / Revised: 6 July 2011 / Accepted: 1 August 2011 / Published online: 23 August 2011
© Springer-Verlag 2011
DOI 10.1007/s10071-011-0442-1
Abstract Despite intense research on the visual com-
munication of domestic dogs, their cognitive capacities
have not yet been explored by eye tracking. The aim of the
current study was to expand knowledge on the visual
cognition of dogs using contact-free eye movement track-
ing under conditions where social cueing and associative
learning were ruled out. We examined whether dogs
spontaneously look at actual objects within pictures and
can differentiate between pictures according to their nov-
elty or categorical information content. Eye movements of
six domestic dogs were tracked during presentation of
digital color images of human faces, dog faces, toys, and
alphabetic characters. We found that dogs focused their
attention on the informative regions of the images without
any task-specific pre-training, and their gazing behavior
depended on the image category. Dogs preferred the facial
images of conspecifics over other categories and fixated on
a familiar image longer than on novel stimuli regardless of
the category. Dogs’ attraction to conspecifics over human
faces and inanimate objects might reflect their natural
interest, but further studies are needed to establish whether
dogs possess picture object recognition. Contact-free eye
movement tracking is a promising method for the broader
exploration of processes underlying special socio-cognitive
skills in dogs previously found in behavioral studies.
Keywords Canine · Domestic dogs · Eye movement tracking · Visual cognition
Introduction
Close co-evolution, sharing the same living habitat, and
similarities in canine and human socio-cognitive skills
make the dog a unique model for comparative cognition
studies (Hare and Tomasello 2005; Miklosi et al. 2007;
Topal et al. 2009). Domestication has equipped dogs with
sophisticated sensitivity to respond to human visual com-
municative cues, including gaze direction and pointing
gestures (Hare and Tomasello 2005; Miklosi et al. 2003;
Soproni et al. 2002; Viranyi et al. 2004). Despite the
intense research around the visual communication of
domestic dogs, their cognitive capacities have not yet been
explored by eye tracking.
Animals with developed visual systems control their
gaze direction using eye movements (Land 1999). Eye
movements are linked to attention (Buswell 1935); the
visual-cognitive system directs a gaze toward important
and informative objects, and vice versa, gaze direction
affects several cognitive processes (Henderson 2003).
Based on behavioral observations, dogs can perceive two-
dimensional visual information and can be trained to per-
form visual tasks, such as classifying photographs of nat-
ural stimuli by means of a perceptual response rule (Range
et al. 2008). They can match photographs of their owner with the owner's voice (Adachi et al. 2007) and are able to distin-
guish the facial images of two individual humans or dogs
(Racca et al. 2010).
However, in previous experiments, dogs have been
either manually restrained (Adachi et al. 2007; Guo et al.
2009; Racca et al. 2010) or their choices were reinforced
(Range et al. 2008), which may have affected the out-
come. Dogs are highly sensitive and responsive to human gestures and attentional states (reviewed in Miklosi et al. 2007; Topal et al. 2009), so the owner or experimenter may even subconsciously influence the dog's expectations and behavior, a phenomenon called the Clever Hans effect (Pfungst 1907). In previous studies, this effect has been minimized by avoiding human interference with the dogs or by keeping the restrainer unaware of the purpose of the experiment (Adachi et al. 2007; Guo et al. 2009; Racca et al. 2010). The Clever Hans effect might
be particularly prevalent in dogs (Lit et al. 2011); thus,
tasks that are performed without human presence could
be useful.
In traditional visual discrimination tests, looking behavior is scored manually from the dog's orienting behavior (Adachi et al. 2007; Guo et al. 2009; Racca et al. 2010; Farago et al. 2010), which may include both bouts of active information processing and blank stares (Aslin 2007). When only behavioral measures are used, it is impossible to determine which features draw the dogs' attention. Is the dog actually looking at the picture? Eye
movement tracking, a technique for directly assessing
gazing behavior, enables measurement of the sub-elements
of the looking behavior, which likely indicate the under-
lying mechanisms of the performance better than tradi-
tional paradigms (Quinn et al. 2009).
Eye tracking provides an objective method for under-
standing the ongoing cognitive and emotional processes,
such as visual-spatial attention, semantic information pro-
cessing, and motivational states of animals (Henderson
2003; Kano and Tomonaga 2009). Eye movement tracking
in dogs has been used to diagnose ocular motor abnor-
malities such as nystagmus (Jacobs et al. 2009; Dell’Osso
et al. 1998), and a head-mounted eye-tracking camera has
been recently tested on a single dog (Williams et al. 2011),
but to date, the method has not been utilized in canine
cognitive studies. Although dogs’ visual acuity is lower
than that of humans and they have comparatively reduced
color perception (dichromatic color vision), the visual
system of the dog does not impose crucial limitations on
the use of the eye movement tracking method. The canine
optic chiasm has a crossover of about 75%, which allows
for good binocular vision (Miller and Murphy 1995). Dogs
can detect details, even in very small photographs pre-
sented at a short distance (Range et al. 2008).
The spontaneous visual discrimination abilities of human
infants and non-human primates have traditionally been
tested with novelty preference paradigms (Fantz 1964;
Joseph et al. 2006; Dahl et al. 2007; Colombo and Mitchell
2009). In a commonly used protocol, a single stimulus is
first presented repeatedly (familiarization) and then chan-
ged to a novel stimulus. If the subject recognizes the similarity of the repeated presentations, looking declines across the familiarization phase, and if the subject discriminates the novel stimulus from the familiar one, looking rebounds when the stimulus is changed (Houston-Price and Nakai 2004). Recently, it
has been demonstrated that dogs exhibit an orienting
behavior toward novel pictures, but their preferences are
dependent on image category (Racca et al. 2010). This
finding indicates that dogs may have the ability to catego-
rize visual objects. Forming categories by sorting objects
into associative classes is one of the fundamental cognitive
abilities of humans, but little is known about spontaneous
categorization in animals (Murai et al. 2005).
The purpose of our study was to expand knowledge on
the cognitive abilities of dogs using a new method, contact-
free eye movement tracking. Our aim was to assess how
dogs view two-dimensional photographs under free view-
ing conditions where social cueing is ruled out. To study
this, we tested whether dogs differentiate between pictures
according to their novel or categorical information content
without any task-specific pre-training.
Materials and methods
All experimental procedures were approved by the Ethical
Committee for the use of animals in experiments at the
University of Helsinki.
Animals and experimental setup
Six privately owned dogs, aged 1–5 years (Table 1), par-
ticipated in the study. All experiments were conducted at
the Veterinary Faculty of the University of Helsinki.
For 1–2 months prior to the experiments, the dogs were trained by their owners as instructed by the first author. Using positive operant conditioning (clicker training), the dogs were trained to lie on a 10-cm-thick Styrofoam mattress and lean their jaw on a purpose-designed
u-shaped chin rest for up to 60 s (see Fig. 1 for experi-
mental details). During the last 2–3 weeks of the training
period, the dogs and their owners visited the experimental room 2–9 times until they were fully familiar with the experimental environment and instrumentation setup.
Table 1 Breed, age, and gender of the dogs participating in the experiment

ID  Breed            Age (years)  Gender
D1  Beauce shepherd  1            Female
D2  Beauce shepherd  2            Female
D3  Beauce shepherd  2.5          Female
D4  Rough collie     1            Female
D5  Hovawart         4            Castrated
D6  Great Pyrenees   5            Female
The
criterion for passing the training period was that the dog
took the pre-trained position without being commanded
and remained in that position for at least 30 s, while the
owner was positioned behind an opaque barrier. During the
training, the dogs were not encouraged to fix their eyes on
any monitor or images.
At the beginning of the test, the dog was released from
the leash at the door of the test room. The door was closed,
the owner went behind the opaque barrier, and the dog
settled down to the pre-trained position and was rewarded
with a titbit following a clicker sound. During the experi-
ments, the dog stayed in the trained position while
Experimenter 1, Experimenter 2, and the owner sat quietly
behind the opaque barrier (Fig. 1). The subject was mon-
itored through the webcam and rewarded after every trial.
If the dog left the position, but returned on its own initia-
tive, it was rewarded after the trial. The dogs were neither
restrained nor forced to perform the task.
Illumination was measured at four points to the side and in front of the dog's eyes before each calibration, calibration check trial, and experimental trial. The mean
illumination intensity was 1,050 lx, ranging from 850 to
1,250 lx depending on the size of the individual dog’s
pupils.
Eye-tracking system
The eye movements of the dogs were recorded with an infrared-based eye-tracking system (iView X™ RED, SensoMotoric Instruments GmbH, Germany) integrated with a 22″ LCD monitor (1,680 × 1,050 px) placed at a distance of 0.7 m from the dog's eyes. The device recorded binocular pupil position 250 times per second using a corneal reflection method.
Calibration of the eye tracker
The eye tracker was calibrated for each dog’s eyes using a
five-point procedure modified from the method used by
Joseph et al. (2006). The calibrated area was a visual field of 40.5° × 24.4° (equal to the size of the monitor).
For calibration, the monitor was replaced with a card-
board wall with five 30-mm holes in the positions of the
calibration points. Each hole was covered by a flap, which
Experimenter 1 lifted up and showed a titbit in the hole to
catch the dog’s attention. When the dog had gazed at a
certain point for a minimum of 5 s, Experimenter 2
accepted the calibration point at the operator computer (iView X™). After accepting five points, Experimenter 1
served the titbit to the dog through the hole at the end of a
pointer stick (0.7 m long).
After the successful five-point calibration, the calibra-
tion was checked by recording a dog’s eye movements with
the same procedure. The criterion for an adequate calibration was that the dog fixed its gaze on the central calibration point and on at least three of the four distal points. To
get an optimal calibration, 3–6 calibration sessions were
required for each dog.
The optimal calibration was saved and used for the
experimental trials. The accuracy of the calibration was
verified with five additional calibration check trials on
different days using the same procedure as in the original
calibration session. The head position, illumination, and the
position of the eye tracker were kept the same during the
calibration, calibration check trials, and experimental trials.
During all the calibration check trials, all the dogs fixed their gaze within a 1° radius of the central calibration point. The average hit ratio for all five calibration points within a 1° radius was 84% (SD 13%, ranging between 60 and 100%) (Table 2). One degree of gaze angle corresponded to approximately 1 cm or 40 px on the screen.
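As a rough cross-check of this correspondence, the short Python sketch below (not part of the original study; the constants are taken from the setup described above, and the authors' own conversion may have been done differently) converts 1° of visual angle into centimetres and pixels at the 0.7-m viewing distance.

```python
import math

VIEWING_DISTANCE_CM = 70.0   # monitor placed 0.7 m from the dog's eyes
SCREEN_WIDTH_PX = 1680       # horizontal resolution of the 22-inch monitor
SCREEN_WIDTH_DEG = 40.5      # calibrated horizontal field of view

# Width subtended by 1 degree at the viewing distance, and the pixel density
# of the calibrated field expressed per degree.
cm_per_degree = 2 * VIEWING_DISTANCE_CM * math.tan(math.radians(0.5))
px_per_degree = SCREEN_WIDTH_PX / SCREEN_WIDTH_DEG

print(f"1 degree of gaze angle ~ {cm_per_degree:.2f} cm ~ {px_per_degree:.1f} px")
# prints roughly 1.22 cm and 41.5 px, consistent with the ~1 cm / 40 px above
```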
To maintain vigilance and to prevent frustration in the
dogs, the calibration and experimental sessions were run on
separate days. However, each experimental session was
initiated by a visual evaluation of the calibration where a
dog’s eye movements were tracked while Experimenter 1
pointed with a pointer stick to calibration points appearing
on the presentation monitor and Experimenter 2 simulta-
neously ensured via the operator computer (iView X™) that the visualized gaze cursors hit the right points.
Fig. 1 a A schematic picture of
the experimental setup for eye
movement tracking of dogs
(n = 6) using a monitor
integrated eye-tracking system
without restraining, b the chin
rest, on which the dogs were
trained to keep their head still.
The dog stayed in the pre-
trained position while the
experimenters and dog’s owner
sat quietly behind the opaque
barrier
Stimuli
Digital color photographs of four stimulus categories, faces
of humans (HUMAN, n = 29, all Caucasian), faces of dogs
(DOG, n = 27), children’s toys (ITEM, n = 12, manu-
factured by IKEA), and alphabetic characters (LETTER,
n = 15) were used. Sample pictures with their typical eye
gazing paths are shown in the Results (Fig. 2). All individuals
and items in the stimulus images were unfamiliar to the
participating dogs. In 50% of the HUMAN and DOG
images, the teeth were visible; the human subjects were smiling, and the images of dogs with bared teeth were taken during play.
Images of 750 × 536 px (corresponding to a visual field of 18.1° × 12.9°) were presented on a gray background using Presentation® software (Neurobehavioral Systems, San Francisco, USA). The stimulus images were manipu-
lated such that the sizes of all the objects were 40% of the
size of the images.
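This stimulus preparation could be scripted roughly as in the following hedged Pillow sketch. The file names, the gray value, and the interpretation of "40% of the size of the images" as 40% of the image area are illustrative assumptions, not the authors' actual pipeline.

```python
from PIL import Image

CANVAS_SIZE = (750, 536)   # stimulus image size in px, as described above
GRAY = (128, 128, 128)     # assumed gray background value

def make_stimulus(object_path, out_path, area_fraction=0.40):
    """Paste an object photo, scaled to ~40% of the image area, onto a gray canvas."""
    obj = Image.open(object_path).convert("RGBA")
    canvas = Image.new("RGB", CANVAS_SIZE, GRAY)

    # Scale the object so that its area is roughly 40% of the canvas area.
    target_area = area_fraction * CANVAS_SIZE[0] * CANVAS_SIZE[1]
    scale = (target_area / (obj.width * obj.height)) ** 0.5
    obj = obj.resize((int(obj.width * scale), int(obj.height * scale)))

    # Centre the object on the background and save the composite.
    offset = ((CANVAS_SIZE[0] - obj.width) // 2, (CANVAS_SIZE[1] - obj.height) // 2)
    canvas.paste(obj, offset, obj)
    canvas.save(out_path)

make_stimulus("dog_face.png", "stimulus_dog_face.png")   # hypothetical file names
```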
Experimental procedure
Five dogs participated in eight experimental sessions and one dog (D4) in four sessions; for practical reasons, the owner of D4 was not able to bring the dog to the experiments on all the test days. The interval between experimental sessions was 24–48 h.
Each experimental session consisted of three experi-
mental trials, each representing one of the four image cat-
egories (DOG, HUMAN, ITEM, or LETTER) (Fig. 3). In
one trial, two stimulus images of the same category were presented in a total of six frames, each shown for 2 s, with a 500-ms blank screen between frames. A frame corresponds to a single presentation of a stimulus image. Every
experimental trial was followed by a 30- to 60-s break,
when the dog was rewarded regardless of its gazing
behavior to maintain its motivation to participate in the test.
The experimental trial consisted of two phases. During
the familiarization phase, the same stimulus image was
presented in three to five frames. After that, the novelty
phase began and the image was shifted to another stimulus
image from the same category, which was repeated in
the remaining one to three frames, forming a total of six consecutive frames.
Table 2 Descriptive data of the mean calibration accuracy % (SD) of the eye tracker during the calibration check trials and the mean eye movement tracking ratios % (SD) during experimental sessions for each dog

ID  Calibration check trials  Calibration accuracy % (SD)^a  Experimental sessions  Tracking ratio % (SD)^b
D1  6                         70 (11)                        8                      29 (19)
D2  6                         93 (10)                        8                      33 (24)
D3  6                         80 (13)                        8                      53 (29)
D4  4                         90 (12)                        4                      25 (25)
D5  6                         83 (20)                        8                      24 (16)
D6  6                         87 (10)                        8                      30 (25)

^a The mean percentage of hits within a 1° radius of the five calibration points
^b The mean percentage of the time the pupils were detected during the experimental sessions
Fig. 2 Examples of the four stimulus image categories presented in the study with their typical binocular scan paths: a dog face (DOG), b human face (HUMAN), c children's toy (ITEM), and d alphabetic character (LETTER). The scan paths of the gaze during a 2-s viewing by five dogs are drawn in a different color for each dog. The lines trace the path of the eye across the image. Circles represent fixed points of gaze (fixations); the larger the point, the longer the subject fixated on the corresponding point. The original images a and b by Microsoft® Office and c by IKEA
The novelty status of the last frame of the familiarization phase was encoded as FAMILIAR and that of the first frame of the novelty phase as NOVEL (Fig. 3). To prevent anticipatory behavior, the timing of the shift from the familiar to the novel stimulus varied (FAMILIAR + NOVEL orders 3 + 3, 4 + 2, or 5 + 1; Fig. 3). The categories were semi-randomized across trial positions and FAMILIAR + NOVEL orders; thus, each category was presented at least once in each trial position (1st, 2nd, or 3rd) and at least once with each FAMILIAR + NOVEL order.
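A purely illustrative sketch of how one trial's six-frame sequence could be assembled under this design is given below; the image identifiers, the category selection, and the helper names are assumptions and do not reproduce the authors' Presentation scripts.

```python
import random

def build_trial(familiar_img, novel_img, n_familiar):
    """Six-frame sequence for one trial; n_familiar in (3, 4, 5) gives the
    FAMILIAR + NOVEL orders 3 + 3, 4 + 2, or 5 + 1."""
    assert n_familiar in (3, 4, 5)
    return [familiar_img] * n_familiar + [novel_img] * (6 - n_familiar)

# Hypothetical stimulus pairs for the four categories.
stimuli = {"DOG": ("dog_a.png", "dog_b.png"),
           "HUMAN": ("human_a.png", "human_b.png"),
           "ITEM": ("toy_a.png", "toy_b.png"),
           "LETTER": ("letter_a.png", "letter_b.png")}

# One session: three trials, each from a different category, with the shift
# point varied so the dog cannot anticipate when the image will change.
categories = random.sample(list(stimuli), 3)
orders = random.sample([3, 4, 5], 3)

session = []
for category, n_fam in zip(categories, orders):
    familiar, novel = stimuli[category]
    session.append((category, build_trial(familiar, novel, n_fam)))

# Each frame was shown for 2 s with a 500-ms blank screen in between; the last
# familiarization frame is FAMILIAR and the first post-shift frame is NOVEL.
print(session)
```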
The FAMILIAR and NOVEL stimuli were paired by
background color, brightness, contrast, and general shape,
to make the images quite similar but easily distinguishable
to humans. Human face pictures were also paired by gen-
der, age, hair color, and expression, and dog pictures by ear
shape and expression (mouth closed vs. open).
If the dog moved, and the eye tracker therefore lost the
eyes for more than three frames in a particular trial, the
whole trial was repeated in the same session. During such a
repeated trial, new images of the same category were
presented in the same order as in the original trial. A total
of 132 original trials (12–24 trials per dog) and ten replaced
trials (0–3 trials per dog) were conducted.
In addition to the experimental trials, baseline data, i.e.,
dogs watching a blank gray screen (BLANK), were gath-
ered from three to four trials per dog. The baseline data
were gathered in two randomly selected separate sessions
before or after image-viewing trials.
Data analysis
Eye movement data were obtained from six dogs for a total of 857 frames, on average 143 frames per dog (SD 28 frames, ranging between 89 and 161 frames), including the baseline recordings. Due to technical problems with the software, 61 frames of unreadable data were lost.
The tracking ratio, i.e., the mean percentage of the time
a pupil was detected during the experimental session, was
32% (SD 27%, ranging between 0.6 and 96%) (Table 2).
The tracking of eye movements succeeded better for one
eye than the other. For the better eye, the average tracking
ratio was 45% (SD 45%, ranging between 3.5 and 96%).
The tracking ratio also includes the breaks between experimental trials.
The raw binocular eye movement data were analyzed using BeGaze 2™ software (SensoMotoric Instruments GmbH, Berlin, Germany). Gaze fixations were scored with a low-speed event detection algorithm that identifies potential fixations with a moving window spanning consecutive data points. A fixation was coded if its duration was at least 75 ms and its dispersion did not exceed D = 250 px, where D = [max(x) − min(x)] + [max(y) − min(y)]. Otherwise, the recorded sample was classified as part of a saccade.
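This criterion can be illustrated with a simplified dispersion-threshold (I-DT-style) detector such as the sketch below. BeGaze's exact implementation is not public, so this is only an approximation of the 75-ms / 250-px rule at the 250-Hz sampling rate.

```python
SAMPLE_RATE_HZ = 250
MIN_DURATION_MS = 75
MAX_DISPERSION_PX = 250
MIN_SAMPLES = int(MIN_DURATION_MS / 1000 * SAMPLE_RATE_HZ)  # 18 samples

def dispersion(window):
    """D = [max(x) - min(x)] + [max(y) - min(y)] over a list of (x, y) samples."""
    xs = [p[0] for p in window]
    ys = [p[1] for p in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def detect_fixations(samples):
    """samples: list of (x, y) gaze positions; returns (start, end) index pairs."""
    fixations, i = [], 0
    while i + MIN_SAMPLES <= len(samples):
        window_end = i + MIN_SAMPLES
        if dispersion(samples[i:window_end]) <= MAX_DISPERSION_PX:
            # Grow the window while the dispersion criterion still holds.
            while window_end < len(samples) and \
                    dispersion(samples[i:window_end + 1]) <= MAX_DISPERSION_PX:
                window_end += 1
            fixations.append((i, window_end - 1))
            i = window_end
        else:
            i += 1   # samples outside any fixation are treated as saccade samples
    return fixations
```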
Statistical analyses
Each stimulus image was divided into three areas of interest (AOI): monitor, image, and object. AOIs of corresponding size and placement were also
defined for the blank screen. From the binocular raw data,
number of fixations, duration of a single fixation, total
duration of fixations, and relative fixation duration were
averaged per frame for each AOI. In the baseline com-
parisons, the relative fixation duration is the duration of the
fixations targeted to the image area as a percentage of the total duration of all fixations in the monitor area.
Fig. 3 A general outline of the experimental session where the
novelty paradigm is tested in three experimental trials. In one trial,
two stimulus images from one of the four categories (DOG, HUMAN,
ITEM, or LETTER) were presented as a total of six frames. The
timing of the shift from familiarization phase to novelty phase varied
through the trials. An example of encoding of the novelty status is
given below Trial 3; the last frame of the familiarization phase
was encoded as FAMILIAR and the first frame of the novelty phase as
NOVEL
In other
comparisons, the relative fixation duration is the duration
of the fixations targeted to the object area as a percentage
of the total duration of all fixations in an image area.
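As an illustration only, the following sketch computes the per-frame gaze parameters for nested rectangular AOIs; the column names and the object AOI are invented, and the image AOI is simply the 750 × 536 px image centred on the 1,680 × 1,050 px monitor.

```python
import pandas as pd

def in_aoi(x, y, aoi):
    """Element-wise test whether points fall inside a rectangular AOI."""
    left, top, right, bottom = aoi
    return (left <= x) & (x <= right) & (top <= y) & (y <= bottom)

def frame_metrics(fixations, image_aoi, object_aoi):
    """fixations: one frame's fixations with columns x, y, duration_ms."""
    on_image = fixations[in_aoi(fixations.x, fixations.y, image_aoi)]
    on_object = fixations[in_aoi(fixations.x, fixations.y, object_aoi)]
    total_image = on_image.duration_ms.sum()
    return {
        "n_fixations": len(on_image),
        "single_fixation_ms": on_image.duration_ms.mean(),
        "total_fixation_ms": total_image,
        # fixation time on the object as % of all fixation time on the image
        "relative_fixation_pct": (100 * on_object.duration_ms.sum() / total_image
                                  if total_image else float("nan")),
    }

fix = pd.DataFrame({"x": [820, 870, 400], "y": [500, 530, 200],
                    "duration_ms": [120, 250, 90]})
# Image AOI: 750 x 536 px image centred on the 1,680 x 1,050 px monitor;
# the object AOI coordinates are made up for the example.
print(frame_metrics(fix, image_aoi=(465, 257, 1215, 793),
                    object_aoi=(600, 350, 1080, 700)))
```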
The differences in measured gaze parameters between
the blank screen and image-viewing frames were analyzed
using a linear mixed-effects model with repeated measures. The model
incorporated the monitor status (BLANK or IMAGES) and
session as fixed effects. Dog and the interaction between
dog and session were random effects.
The same linear mixed-effects model approach was also used to assess the differences between the familiar and
the novel images and the differences between image cat-
egories (DOG, HUMAN, LETTER, and ITEM). The model
included category, session, trial and frame as fixed effects.
The random effects were dog, dog × session, and dog × trial. The real object size was used as a covariate. Analysis
of the novelty effect was limited to the critical frames: the
first frame of the trial (1st FRAME), the last frame of the
familiarization phase (FAMILIAR), and the first frame of
the novelty phase (NOVEL) (see Fig. 3 for details).
Interaction between novelty status and category was also
tested, but was excluded as being non-significant. In all
models, session, trial, and frame were used as repeated factors with a first-order autoregressive (AR(1)) covariance structure.
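In Python, a rough approximation of the category model could look like the statsmodels sketch below. The data frame and column names are assumptions, and statsmodels' MixedLM has no AR(1) repeated-measures covariance structure, so this only approximates the SPSS analysis reported here.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("fixations_per_frame.csv")   # hypothetical per-frame data

model = smf.mixedlm(
    "total_fix_dur ~ C(category) + C(session) + C(trial) + C(frame) + object_size",
    data=df,
    groups=df["dog"],                              # random effect of dog
    vc_formula={"dog_session": "0 + C(session)",   # approximates dog x session
                "dog_trial": "0 + C(trial)"},      # approximates dog x trial
)
result = model.fit()
print(result.summary())
```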
Results of the linear mixed models are reported as
estimated averages with a standard error of the mean
(SEM), except for the duration of a single fixation, which is reported as an estimated average with a 95% confidence interval (CI) because it was back-transformed from the logarithmic scale. The significance level was set at alpha = 0.05. All the statistical analyses were conducted in PASW Statistics 17 (SPSS Inc., Chicago, IL, USA).
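The back-transformation works as in the small illustration below (the numbers are made up): an estimate and its confidence limits obtained on the log scale are exponentiated, which is why that variable is reported with an asymmetric 95% CI rather than a SEM.

```python
import math

log_mean, log_se = math.log(205), 0.10   # hypothetical estimate on the log scale
estimate = math.exp(log_mean)
ci_low = math.exp(log_mean - 1.96 * log_se)
ci_high = math.exp(log_mean + 1.96 * log_se)
print(f"estimate {estimate:.0f} ms, 95% CI {ci_low:.0f}-{ci_high:.0f} ms")
```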
Results
We obtained successful eye movement recordings from all
six dogs even though they were neither restrained nor
trained to look at the screen or 2D pictures. Examples of
typical scan paths of the dogs are shown in Fig. 2.
The statistical significances for the fixed effects and
repeated factors are displayed in Table 3.
Blank screen versus images
Statistically significant differences between BLANK
screen (n = 126) and IMAGES (n = 857) were found for
the number of fixations, duration of a single fixation, total
fixation duration, and relative fixation duration (Table 3).
The dogs fixated on the monitor more frequently in the IMAGES condition (2.3 ± 0.4 vs. 1.1 ± 0.5, P = 0.000), and
the duration of a single fixation was longer (205 ms, 95%
CI 137–307 ms) than in the BLANK screen conditions
(128 ms, 95% CI 85–193 ms, P = 0.000). When the
IMAGES were displayed, the total duration of fixations
(543 ± 118 ms vs. 209 ± 124 ms, P = 0.000) and rela-
tive fixation duration for the image were longer compared
with the corresponding area on the BLANK screen
(63.0 ± 6.3% vs. 46.8 ± 8.4%, P = 0.007). Examples of
the focus maps are illustrated in Fig. 4.
Novelty effect
Statistically significant differences between the 1st
FRAME (n = 121), FAMILIAR (n = 122), and NOVEL
(n = 119) frames were found for the number of fixations
and total duration of fixations (Table 3). Dogs fixated on the 1st FRAME more often (1.8 ± 0.3) than on FAMILIAR (1.3 ± 0.3, P = 0.003) or NOVEL frames (0.1 ± 0.3,
P = 0.000). Total duration of fixations decreased after the
shift (373 ± 80 ms vs. 232 ± 81 ms, P = 0.006; Fig. 5).
The relative fixation duration of the object was
56.3 ± 6.5% with no statistically significant differences
between frames.
Effect of repeated measures
Statistically significant differences between the six consecutively presented frames (1st n = 121, 2nd n = 123, 3rd
n = 122, 4th n = 121, 5th n = 121, and 6th n = 123)
were found for the number of fixations, duration of a single
fixation, and total duration of fixations (Table 3). The
number of fixations decreased (P = 0.016) and the dura-
tion of a single fixation increased (P = 0.004) after the first
frame (Fig. 6a). At the 6th frame, the number (P = 0.047;
Fig. 6a) and total duration of fixations decreased
(P = 0.002; Fig. 6b) compared with the previous frame.
The relative fixation duration of the object was
55.5 ± 6.8% with no statistically significant difference
between frames.
Furthermore, statistically significant differences
between experimental sessions were found in the number
of fixations (Table 3). The session had a slight effect on the mean number of fixations (P = 0.016); it was lowest during the 5th session (0.4 ± 0.4) and highest during the 8th session (2.2 ± 0.5), otherwise ranging mostly between 1.1 and 1.7 fixations
per frame. The relative fixation duration of the object was
55.5 ± 8.7%, with no statistically significant differences
between sessions.
Image categories
Significant differences between the four image categories (DOG n = 238, HUMAN n = 249, ITEM n = 131, and LETTER n = 113) were found for the number of fixations, the total duration of fixations, and the relative fixation duration (Table 3).
Table 3 Statistical results from the linear mixed-effect model analysis of the effects of image category and repeated factors (frame, trial, and session) on the number of fixations, the duration of a single fixation, the total duration of fixations, and the relative fixation duration

Baseline (AOI: monitor)
  Number of fixations: Category^a F(1, 835.3) = 37.7, P = 0.000; Frame –; Trial –; Session F(7, 30.0) = 3.1, P = 0.015
  Duration of a single fixation: Category^a F(1, 612.0) = 26.5, P = 0.000; Frame –; Trial –; Session ns
  Total duration of fixations: Category^a F(1, 836.1) = 58.5, P = 0.000; Frame –; Trial –; Session F(7, 30.6) = 3.3, P = 0.010
  Relative fixation duration of the image area: Category^a F(1, 615.2) = 7.3, P = 0.007; Frame –; Trial –; Session F(7, 30.6) = 2.6, P = 0.034

Novelty effect (AOI: image)
  Number of fixations: Category^a F(3, 180.3) = 3.7, P = 0.012; Frame F(2, 218.9) = 11.0, P = 0.000; Trial ns; Session F(7, 101.5) = 2.4, P = 0.023
  Duration of a single fixation: Category^a ns; Frame ns; Trial ns; Session ns
  Total duration of fixations: Category^a F(3, 164.0) = 3.5, P = 0.017; Frame F(2, 228.2) = 7.1, P = 0.001; Trial ns; Session ns
  Relative fixation duration of the image area: Category^a ns; Frame ns; Trial ns; Session ns

Image categories (AOI: image)
  Number of fixations: Category^a F(3, 263.1) = 5.2, P = 0.002; Frame F(5, 460.8) = 4.5, P = 0.000; Trial ns; Session F(7, 70.0) = 2.7, P = 0.016
  Duration of a single fixation: Category^a ns; Frame F(5, 349.1) = 2.7, P = 0.021; Trial ns; Session ns
  Total duration of fixations: Category^a F(3, 267.5) = 6.2, P = 0.000; Frame F(5, 476.8) = 3.6, P = 0.003; Trial ns; Session ns
  Relative fixation duration of the image area: Category^a F(3, 189.3) = 2.8, P = 0.042; Frame ns; Trial ns; Session ns

Analyses of the baseline, novelty effect, and image categories were conducted separately. The results are presented in the form F(numerator df, denominator df) = F, observed significance level; ns = non-significant
^a In the baseline comparison, category means monitor status, i.e., blank screen versus images. In the other comparisons, the categories are DOG, HUMAN, ITEM, and LETTER
The dogs fixated on DOG images more
often (2.0 ± 0.3) than on HUMAN (1.6 ± 0.3, P = 0.014),
ITEM (1.2 ± 0.3, P = 0.006), or LETTER images
(0.5 ± 0.5, P = 0.002). LETTER images gathered fewer
fixations than other categories (LETTER vs. DOG P =
0.000; LETTER vs. HUMAN P = 0.025; LETTER vs.
ITEM P = 0.043; Fig. 7).
The mean duration of a single fixation was 214 ms (95% CI 154–289 ms) and did not differ between the categories. However, the total fixation duration differed statistically significantly among the four image categories (P = 0.000; Table 3). Dogs fixated longest on the images of dogs (534 ± 80 ms) and shortest (94 ± 120 ms) on the images of alphabetic characters (DOG vs. HUMAN P = 0.024; DOG vs. ITEM P = 0.001; DOG vs. LETTER P = 0.000; LETTER vs. HUMAN P = 0.005; LETTER vs. ITEM P = 0.029; Fig. 7).
Fig. 4 An example of eye gaze patterns during presentation of the
DOG image (a) and the BLANK screen (b). The focus map represents
averaged fixations of the right eye of five dogs and three repetitions
presented consecutively. The color coding represents the average of
fixation durations; minimum 5 ms indicated by light blue and the
maximum of 100 ms or over by bright red. The area corresponding to
the image size and placement is overlaid on the BLANK screen as a
dashed line
Fig. 5 The effect of novelty status of the image on the number of
fixations per frame (mean ± SEM) and the total duration of fixations
per frame (mean in ms ± SEM) in six dogs. In the novelty paradigm,
the same stimulus image repeats 3–5 times (last frame is considered
as FAMILIAR) and then changes (SHIFT) to another stimulus image
from the same stimulus category (NOVEL). Statistically significant
differences between the 1st, FAMILIAR, and NOVEL frames are
presented as different letters (MIXED, P < 0.05)
Fig. 6 The effect of the repeated image presentation on a the number
of fixations (mean ± SEM) and the duration of a single fixation
(mean in ms ± SEM) per frame b the total duration of fixations per
frame (mean in ms ± SEM) and the relative fixation duration of the
fixations targeted to the object (% ± SEM) in six dogs during an
experimental trial where a total of six frames of two different images
from the same stimulus category are shown as a novelty paradigm.
Statistically significant differences between the frames are presented
as different letters (MIXED, P < 0.05)
The main effect of the image
category on the relative fixation duration of the object was
statistically significant (P = 0.042), but the pairwise
comparisons did not specify which categories differed from
each other (DOG 65.4 ± 6.4%; HUMAN 56.2 ± 6.7%;
ITEM 60.4 ± 8.4%; and LETTER 39.8 ± 13.3%).
Discussion
The current study produced evidence on canine visual cognition with a new method: contact-free eye movement tracking. Dogs focused their attention on the informative
regions of the images without any task-specific pre-train-
ing, and their gazing behavior depended on the image
category.
Dogs focused most of their fixation duration on the
actual image compared with the surrounding monitor and
on the actual object compared with the background image, as has previously been reported for humans, chimpanzees, and monkeys (Yarbus 1967; Nahm et al. 1997; Kano and Tomonaga 2009). Dogs spontaneously preferred images of conspecifics over human faces and inanimate objects, suggesting that they might be able to discriminate between images of different categories.
Various animal species can form visual categories when
trained using a match-to-sample procedure (Bovet and
Vauclair 2000). However, animals quickly learn to repeat
the reinforced behaviors (Skinner 1938), even uncharac-
teristic ones, and hence, the natural looking behavior could
remain hidden (Dahl et al. 2007). Explicit rewarding for a certain pre-defined criterion could lead to atypical response strategies and limit comparability with studies done with naïve human subjects (Murai et al. 2005; Dahl et al.
2009). In the current study, the differences between stim-
ulus categories arise from unprompted attention measured
directly as fixational eye movements.
The categories gathered different numbers of fixations,
while the average duration of a single fixation was the same
in all categories. The role of the fixation is to keep the gaze
stable enough for stimulus encoding in the photoreceptors (Land 1999; Yarbus 1967); thus, a sufficiently long single-fixation duration is needed to identify the object. In
humans, targets interpreted as informative or interesting
attract more re-fixations (Buswell 1935; Henderson and
Hollingworth 1999). In the current study, the images of letters received the fewest fixations and thus the shortest overall fixation duration. This finding corresponds with behavioral observations in a recent study by Farago et al. (2010), who suggested that dogs might consider nat-
ural objects more interesting than abstract ones. On the
other hand, these complex pictures might contain more
information to process. At the lowest level, picture perception can occur on the basis of physical features (i.e., color, intensity, contrast, or edge orientation), which does not require understanding the representational content of the image (Fagot et al. 1999; Bovet and Vauclair 2000).
In eye movements, the targets of the fixations could be
driven not only by higher level conceptual information of
the objects but also by low-level visual saliency (Hender-
son 2003; Henderson and Hollingworth 1999). In fact, the
more complex the image, the more it contains details that
might attract fixations and hence increase the time it takes
to view and process the image. Therefore, the category-
dependent gazing behavior could also be the result of dif-
ferences in physical complexity. In humans, the high- and
low-level mechanisms guiding eye movements are task dependent and can alternate (Einhauser et al. 2008a, b). Thus, based only on eye
movement data, we cannot yet draw any conclusions as to
whether the attention of dogs was mainly directed by
stimulus features or semantic information, or both.
The letter images were much simpler in their features than the other categories, but the images of dog faces were not apparently more complex in their physical properties than the human faces presented. Nevertheless, dogs fixated more often and for a longer total duration on canine faces than on human faces. A preference for conspecific
faces has been suggested to indicate expertise in the per-
ception of the faces of own species (Dahl et al. 2009;
Hattori et al. 2010). Recent behavioral studies in dogs also
found species-dependent looking behavior when viewing
human and canine faces (Guo et al. 2009; Racca et al.
2010). Face detection plays an important role in non-verbal
communication in primates and probably also in other social mammals (Leopold and Rhodes 2010).
Fig. 7 The effect of the image category (DOG, HUMAN, ITEM, and
LETTER) on the number of fixations (mean ± SEM) and the total
duration of fixations (mean in ms ± SEM) per frame in six dogs.
Statistically significant differences between the image categories are
presented as different letters (MIXED, P < 0.05)
The dog
might perceive conspecific faces to be more interesting or
informative than human faces due to differences in social relevance and communication strategies between dog–dog and dog–human interactions. One factor that might have affected category preferences is variation within a category: the dog faces represented many breeds, whereas the human faces were all Caucasian.
As has been found in human infants (Fantz 1964;
reviewed in Colombo and Mitchell 2009) and monkeys
(Joseph et al. 2006), the first frame attracted the highest
total looking time, which decreased when the image was
repeated, probably indicating habituation to the stimulus.
However, we could not detect the expected rebound in looking when the novel picture was presented. Instead, the total duration of fixations decreased after the stimulus changed. It is likely that the dogs did not notice the change, or they may have generalized the familiar pictures to the novel ones. In gen-
eralization, two stimuli, even if readily distinguishable, are
similar enough with respect to their physical properties to
evoke the same response (Shepard 1994; Ghirlanda and Enquist 2003; generalization of natural 2D pictures has
been demonstrated in dogs by Range et al. 2008). Most of the familiar–novel image pairs were rather similar; for example, humans were paired according to gender,
age, hair color, and expression. When the stimuli are
complex, dogs’ performance could be limited as compared
with other species with better visual acuity and color per-
ception (Range et al. 2008). It is possible that the dogs do
not perceive the minor details of the picture and thus
cannot discriminate the images within the same category.
An interaction between category and novelty status was not established, which partly contradicts the recent study of Racca et al. (2010), in which dogs likewise directed shorter looking times toward novel canine faces but longer looking times toward novel human faces and objects. Category-
dependent novelty responses have also been reported for
human observers, who prefer familiarity in human faces but novelty in natural scenes, and have no clear preference for geometric figures (Park et al. 2010). The different
results may also be due to methodological differences; we examined fixational eye movements, not overall orienting time as in Racca et al. (2010). The overall orienting time also includes inactive viewing ("blank stares"; Aslin 2007). Moreover, the presentation setup differed. In
the current study, the design was a modified version of that
used for monkeys (Joseph et al. 2006). The stimulus was
repeated several times, and dogs were rewarded after the
sixth frame. All dogs were highly motivated, but the decrease in the number and total duration of fixations during the last frames of the trial suggests that they might have tired of the monotony of the task. It is also likely that they anticipated the reward and therefore were not focusing their attention on the monitor at the end of the trial.
Varying the trial length might be a better design for canine research. Dogs might detect the difference between familiar and novel images better if the images were presented side by side, as in the visual paired-comparison study of Racca et al. (2010). Different methodologies have also led to contradictory novelty responses in infant studies (Houston-Price and Nakai 2004).
The absence of preferential looking does not necessarily
mean the absence of discrimination (Aslin 2007). Novelty
preferences can vary individually; the subjects might have
perceived certain categories more attractive or might have
used individual strategies for detecting a new stimulus, as
suggested previously for monkeys and humans (Joseph
et al. 2006). Preference may even focus on certain details of the object, for example, the head area in whole-body pictures (Quinn et al. 2009). The physical similarity of, and differences in attractiveness between, consecutive images could also affect novelty responses. A rebound in looking is
likely to occur when consecutive images differ more from
each other and when the novel image is more interesting in
content than the previous one (Dahl et al. 2007).
The dogs in the current study performed the tasks
independently while the owners and experimenters
remained hidden behind an opaque barrier. The dogs were
trained neither to fixate on the monitor nor to discriminate
images. However, all measured eye movement variables
indicated that the dogs were more interested in looking at
the monitor when images were displayed than when the
screen was blank. This finding confirms that the dogs had
not learned to fix their gaze on the monitor in anticipation
of a reward or a response to social cueing. To the authors’
knowledge, blank screen viewing has not been previously
measured in animal studies. Dogs targeted some fixations
toward the blank screen, which is typical also in human
looking behavior because the re-activation of memory
representations drives the eyes to previously viewed loca-
tions (Ferreira et al. 2008).
Clearly, we cannot yet say unequivocally what the dogs
see in the pictures. In humans, a picture is something in
which objects can be recognized, even though the objects
themselves are not actually present. It is under debate
whether animals recognize pictures as representations of
real-world objects (Bovet and Vauclair 2000; Jitsumori
2010). Dogs can associate visual images with acoustic information, suggesting that they are capable of forming mental representations from pictures (Adachi et al. 2007; Far-
ago et al. 2010). It has also been demonstrated through a
fetching task that at least some dogs are able to match
photographs of items to actual objects (Kaminski et al.
2009). Nevertheless, eye movement tracking is a promising
method for comparing visual perception strategies and
abilities between humans and dogs.
In conclusion, contact-free eye movement tracking can
be used to assess canine visual cognition. This promising
method represents a tool for the broader exploration of
processes underlying special socio-cognitive skills in dogs
previously established through behavioral studies. Dogs’
attraction to conspecific and human faces over inanimate
objects might reflect the natural interests of dogs, but further studies are needed to establish whether dogs possess picture object recognition.
Acknowledgments This work was financially supported by the
Academy of Finland and University of Helsinki. The authors are
grateful to Antti Flyckt, Matti Pastell, Aleksander Alafuzoff, Teemu
Peltonen, Jaana Simola, Timo Murtonen, and Kristian Tornqvist for
their support in conducting the experiment. The authors also thank the IKEA group for permission to use the photos of children's toys.
Conflict of interest The authors declare that they have no conflict
of interest.
References
Adachi I, Kuwahata H, Fujita K (2007) Dogs recall their owner’s face
upon hearing the owner’s voice. Anim Cogn 10:17–21
Aslin RN (2007) What’s in a look? Dev Sci 10:48–53
Bovet D, Vauclair J (2000) Picture recognition in animals and
humans. Behav Brain Res 109:143–165
Buswell GT (1935) How people look at pictures; a study of the
psychology of perception in art. University of Chicago Press,
Chicago
Ghirlanda S, Enquist M (2003) A century of generalization. Anim
Behav 66:15–36
Colombo J, Mitchell DW (2009) Infant visual habituation. Neurobiol
Learn Mem 92:225–234
Dahl CD, Logothetis NK, Hoffman KL (2007) Individuation and
holistic processing of faces in rhesus monkeys. Proc R Soc B
Biol Sci 274:2069–2076
Dahl CD, Wallraven C, Bulthoff HH, Logothetis NK (2009) Humans
and macaques employ similar face-processing strategies. Curr
Biol 19:509–513
Dell’Osso LF, Williams RW, Jacobs JB, Erchul DM (1998) The
congenital and see-saw nystagmus in the prototypical achiasma
of canines: comparison to the human achiasmatic prototype. Vis
Res 38:1629–1641
Einhauser W, Rutishauser U, Koch C (2008a) Task-demands can
immediately reverse the effects of sensory-driven saliency in
complex visual stimuli. J Vis 8:1–19
Einhauser W, Spain M, Perona P (2008b) Objects predict fixations
better than early saliency. J Vis 8:1–26
Fagot J, Martin-Malivel J, Depy D (1999) What is the evidence for an
equivalence between objects and pictures in birds and nonhuman
primates? Curr Psychol Cogn 18:923–949
Fantz RL (1964) Visual experience in infants: decreased attention to
familiar patterns relative to novel ones. Science 146:668–670
Farago T, Pongracz P, Miklosi A, Huber L, Viranyi Z, Range F
(2010) Dogs’ expectation about signalers’ body size by virtue of
their growls. PLoS One 12:1–8
Ferreira F, Apel J, Henderson JM (2008) Taking a new look at
looking at nothing. Trends Cogn Sci 12:405–410
Guo K, Meints K, Hall C, Hall S, Mills D (2009) Left gaze bias in humans,
rhesus monkeys and domestic dogs. Anim Cogn 12:409–418
Hare B, Tomasello M (2005) Human-like social skills in dogs?
Trends Cogn Sci 9:439–444
Hattori Y, Kano F, Tomonaga M (2010) Differential sensitivity to
conspecific and allospecific cues in chimpanzees and humans: a
comparative eye-tracking study. Biol Lett 6:610–613
Henderson JM (2003) Human gaze control during real-world scene
perception. Trends Cogn Sci 7:498–504
Henderson JM, Hollingworth A (1999) High-level scene perception.
Annu Rev Psychol 50:243–271
Houston-Price C, Nakai S (2004) Distinguishing novelty and
familiarity effects in infant preference procedures. Infant Child
Dev 13:341–348
Jacobs JB, Dell’Osso LF, Wang ZI, Acland GM, Bennett J (2009)
Using the NAFX to measure the effectiveness over time of
gene therapy in canine LCA. Invest Ophthalmol Vis Sci
50:4685–4692
Jitsumori M (2010) Do animals recognize pictures as representations
of 3D objects? Comp Cogn Behav Rev 5:136–138
Joseph JE, Powell DK, Andersen AH, Bhatt RS, Dunlap MK, Foldes
ST, Forman E, Hardy PA, Steinmetz NA, Zhang Z (2006) fMRI
in alert, behaving monkeys: an adaptation of the human infant
familiarization novelty preference procedure. J Neurosci Meth-
ods 157:10–24
Kaminski J, Tempelmann S, Call J, Tomasello M (2009) Domestic
dogs comprehend human communication with iconic signs. Dev
Sci 12:831–837
Kano F, Tomonaga M (2009) How chimpanzees look at pictures: a
comparative eye-tracking study. Proc Biol Sci 276:1949–1955
Land MF (1999) Motion and vision: why animals move their eyes.
J Comp Physiol A Neuroethol Sens Neural Behav Physiol
185:341–352
Leopold DA, Rhodes G (2010) A comparative view of face
perception. J Comp Psychol 124:233–251
Lit L, Schweitzer JB, Oberbauer AM (2011) Handler beliefs affect
scent detection dog outcomes. Anim Cogn 14:387–394
Miklosi A, Kubinyi E, Topal J, Gacsi M, Viranyi Z, Csanyi V (2003)
A simple reason for a big difference: wolves do not look back at
humans, but dogs do. Curr Biol 13:763–766
Miklosi A, Topal J, Csanyi V (2007) Big thoughts in small brains?
dogs as a model for understanding human social cognition.
Neuroreport 18:467–471
Miller PE, Murphy CJ (1995) Vision in dogs. J Am Vet Med Assoc 207:1623–1634
Murai C, Kosugi D, Tomonaga M, Tanaka M, Matsuzawa T, Itakura S
(2005) Can chimpanzee infants (Pan troglodytes) form categorical
representations in the same manner as human infants (Homo sapiens)? Dev Sci 8:240–254
Nahm FKD, Perret A, Amaral DG, Albright TD (1997) How do
monkeys look at faces? J Cogn Neurosci 9:611–623
Park J, Shimojo E, Shimojo S (2010) Roles of familiarity and novelty
in visual preference judgments are segregated across object
categories. Proc Natl Acad Sci USA 107:14552–14555
Pfungst O (1907) Das Pferd des Herrn von Osten (der Kluge Hans):
Ein Beitrag zur experimentellen Tier-und Menchenpsychologie.
Johann Ambrosius Barth, Leipzig
Quinn PC, Doran MM, Reiss JE, Hoffman JE (2009) Time course of
visual attention in infant categorization of cats versus dogs:
evidence for a head bias as revealed through eye tracking. Child
Dev 80:151–161
Racca A, Amadei E, Ligout S, Guo K, Meints K, Mills D (2010)
Discrimination of human and dog faces and inversion responses
in domestic dogs (Canis familiaris). Anim Cogn 13:525–533
Range F, Aust U, Steurer M, Huber L (2008) Visual categorization of
natural stimuli by domestic dogs. Anim Cogn 11:339–347
Shepard R (1994) Perceptual-cognitive universals as reflections of the
world. Psychon Bull Rev 1:2–28
Skinner BF (1938) The behavior of organisms: an experimental analysis. D. Appleton-Century Company, New York, p 457
Soproni K, Miklosi A, Topal J, Csanyi V (2002) Dogs’ (Canis familiaris) responsiveness to human pointing gestures. J Comp
Psychol 116:27–34
Topal J, Miklosi A, Gacsi M, Doka A, Pongracz P, Kubinyi E,
Viranyi Z, Csanyi V (2009) The dog as a model for
understanding human social behavior. In: Brockmann HJ, Roper
TJ, Naguib M, Wynne-Edwards KE, Mitani JC, Simmons LW
(eds) Advances in the study of behavior, vol 39. Academic Press,
Burlington, pp 71–116
Viranyi Z, Topal J, Gacsi M, Miklosi A, Csanyi V (2004) Dogs
respond appropriately to cues of humans’ attentional focus.
Behav Process 66:161–172
Williams FJ, Mills DS, Guo K (2011) Development of a head-
mounted, eye-tracking system for dogs. J Neurosci Methods
94:259–265
Yarbus AL (1967) Eye movements and vision. Plenum Press, New York