
Slovak University of Technology in Bratislava

Faculty of Informatics and Information Technologies

FIIT-5212-5737

Patrik Polatsek

BLINK RATE TRACKING OF COMPUTER USER

Bachelor thesis

Degree Course: Informatics

Field of study: 9.2.1 Informatics

Place of development: Institute of Informatics and Software Engineering, FIIT STU Bratislava

Supervisor: Ing. Andrej Fogelton

May 2013


ANOTÁCIA

Slovenská technická univerzita v Bratislave

FAKULTA INFORMATIKY A INFORMAČNÝCH TECHNOLÓGIÍ

Študijný odbor: INFORMATIKA

Autor: Patrik Polatsek

Bakalárska práca: Sledovanie frekvencie žmurknutí používateľa počítača

Vedúci bakalárskej práce: Ing. Andrej Fogelton

máj, 2013

Táto bakalárska práca sa zaoberá sledovaním frekvencie žmurknutí oka používateľa pri práci s počítačom. Používateľ má pred obrazovkou počítača tendenciu znižovať frekvenciu žmurknutí, čím sa nedokonale nanáša slzný film na očnú rohovku. Nižšia frekvencia žmurknutí tým pádom spôsobuje začervenanie a vysychanie oka. Tento často vyskytujúci sa problém používateľov počítača sa nazýva syndróm suchého oka. Cieľom bakalárskej práce je vytvorenie algoritmu na detekciu žmurkania. V budúcnosti môže byť použitý v aplikácii na prevenciu suchého oka, ktorá bude detekovať používateľove žmurknutia. Analyzovali sme dostupné techniky na detekciu žmurknutia a vytvorili naše vlastné riešenia založené na metódach histogramovej spätnej projekcie, optického toku, snímkových rozdieloch a FREAK deskriptoroch. Naše algoritmy sme testovali na rôznych datasetoch pri rozličných svetelných podmienkach. Metóda nazvaná Detekcia centrálne zarovnaného pohybu funguje lepšie ako ostatné metódy. Dosahujeme vyššiu mieru detekcie a omnoho nižšiu mieru falošnej detekcie pri datasete Talking Face Video než metóda, ktorú predstavili Divjak a Bischof v roku 2009.


ANNOTATION

Slovak University of Technology in Bratislava

FACULTY OF INFORMATICS AND INFORMATION TECHNOLOGIES

Degree Course: INFORMATICS

Author: Patrik Polatsek

Bachelor Thesis: Blink Rate Tracking of Computer User

Supervisor: Ing. Andrej Fogelton

2013, May

This bachelor thesis deals with eye blink rate tracking of the user while working with a computer. A user tends to decrease the blink rate in front of a computer screen, due to which the tear film is inadequately applied on the eye cornea. A lower blink rate causes eye redness and dryness. This commonly occurring problem of computer users is called Dry Eye. The goal of the bachelor thesis is to design an eye blink detection algorithm. In the future it can be used in a dry eye prevention application, which will detect the user's blinks. We have analysed available techniques for eye blink detection and designed our own solutions based on the histogram backprojection, optical flow, frame difference and FREAK descriptor methods. We have tested our algorithms on different datasets under various lighting conditions. The Centre Aligned Movement Detection method based on optical flow performs better than the other ones. We achieve a higher recognition rate and a much lower false positive rate on the Talking Face Video dataset than the state-of-the-art technique presented by Divjak and Bischof in 2009.


Declaration of Honour

I honestly declare that I wrote this thesis independently, under the professional supervision of Ing. Andrej Fogelton, and that I have cited all used bibliography.

May 2013, Bratislava                                                     signature


Acknowledgement

My thanks mostly belong to my supervisor Ing. Andrej Fogelton for his willingness, professional support and helpful advice during the work on this thesis. I also want to thank my family and friends for being so supportive.


Contents

1 Introduction 1

1.1 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 Computer Vision Syndrome 5

2.1 Symptoms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.1.1 Dry Eye . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.2 Prevention and Treatment . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3 Related Work 11

3.1 Optical Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3.2 Normal Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3.3 Gabor Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.4 Variance Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3.5 Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.6 Intensity Distribution in Eye Halves . . . . . . . . . . . . . . . . . . . . . 18

3.7 Deformable Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

3.8 Eyelid’s State Detecting Value . . . . . . . . . . . . . . . . . . . . . . . . 19

3.9 Circular Hough Transformation . . . . . . . . . . . . . . . . . . . . . . . . 20

3.10 Infrared Sensor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.11 Electrooculography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.12 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4 Analysis of Used Principles 25

4.1 Histogram Backprojection . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4.1.1 Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . 26

4.2 Centre Aligned Movement Detection . . . . . . . . . . . . . . . . . . . . . 28


4.2.1 Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . 30

4.3 Frame Difference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4.3.1 Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . 31

4.4 FREAK Descriptor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4.4.1 Implementation Details . . . . . . . . . . . . . . . . . . . . . . . . 33

5 Evaluation and Discussion 35

6 Conclusion 39

A Technical Documentation 45

B User Guide 49

C IIT.SRC 2013 paper 53

D BMVC 2013 paper 61

E Resumé 71

F DVD Contents 77



Chapter 1

Introduction

Today we are surrounded by technologies. We spend more and more hours with visual display unit (VDU) devices such as computers, laptops, TV screens, mobile phones and tablets. Use of VDU devices is often associated with eye and visual problems.

Most individuals blink 10 – 15 times per minute. However, the rate of spontaneous eye blinking while using a computer reduces significantly (up to a 60% reduction). Blinking helps us to spread the tear film and to moisten and disinfect the surface of the eye, due to which the reduced blink rate causes dryness of our eyes. Typical ocular complaints experienced during intensive computer work (more than 3 hours a day) include dryness, redness, burning, sandy-gritty eye irritation, sensitivity to light and eye fatigue. These symptoms are also known as Dry Eye, which is a major part of the Computer Vision Syndrome (CVS). CVS is a set of problems related to computer use, including dry eyes, eyestrain, headache, blurred vision, neck pain and backache. The easiest way to avoid the symptoms of Dry Eye is to blink regularly (Yan et al., 2008; Blehm et al., 2005).

There are only a few available hardware or software solutions which try to prevent dry eye symptoms. One of the devices that aims to protect the user's eyes is the Blink Now1 (Figure 1.1). Unfortunately, it does not use eye blink detection. It is just a small external screen projecting a blinking eye, which should remind individuals to blink regularly. We have serious doubts about the efficiency of such a solution.

Figure 1.1: Blink Now device uses a screen to remind eye blinking1.

Eye blink detection has a wide variety of applications. Google improved the Face Unlock application to unlock the screen by face detection. Mobile devices with Android 4.1 Jelly Bean

1http://www.blinknow.co.uk/



require an eye blink to verify that the device is unlocked by a real person and not a picture2. At the industrial exhibition Internationale Funkausstellung Berlin (IFA), the eye-controlled technology Gaze TV by Haier3 was introduced. Users can adjust the settings, change the volume or switch channels by looking at a specific point on the screen, and they confirm their choice by eye blinks.

Figure 1.2: Eye tracking system for physically disabled people4.

Volkswagen has developed a prototype of a driver fatigue detection system5 based on eye blink frequency, which prevents the driver's micro-sleep (Figure 1.3). Eye tracker systems also help people with physical disabilities (Figure 1.2). They give them the ability to control a computer with eye movements6. Their blinks are used to simulate the left mouse button click.

Figure 1.3: Anti micro-sleep system5.

The movie Mission Impossible - Ghost Protocol (2011) brings a new way of applying blink recognition. Bionic contact lenses presented in the movie allow the user to take pictures and send them to a computer wirelessly by blinking.

Our aim is to design an algorithm which detects the user's blinks from images captured by a webcam. In the future it can be used in an application which will analyse the user's blink rate. If the blink rate is lower than the average, it will remind the user to blink and thus protect the eyes.

2http://www.android.com/about/jelly-bean/
3http://www.bbc.co.uk/news/technology-19441860
4http://www.eyetechds.com
5https://www.volkswagen-media-services.com/medias_publish/ms/content/en/pressemitteilungen/2007/08/30/the_battle_against.standard.gid-oeffentlichkeit.html
6http://www.gateway2at.org/page.php?page_ID=3&gen_ID=12&mensub_ID=4&submen_ID=1&AtDet_ID=22



1.1 Requirements

We decided to create an eye blink detection algorithm which will be used in an application available to most users. It should not require any special or expensive hardware. The only peripheral device our solution uses for detecting the blinks is a webcam with a sufficient frame rate to track the face and eyes in real-time. Low hardware requirements make this solution accessible to most people.

According to (Rodriguez et al., 2013), a typical blink lasts from 100 to 400 msec. During the blink, eyes are fully closed for 10 to 80 msec. An eye blink consists of a closing and an opening part. The fastest eye closure takes about 50 msec (Stern et al., 1984). Thus we suggest using a webcam with at least a 25 frames per second (fps) rate (i.e. a webcam which captures a frame every 40 msec), which should be sufficient to capture at least the partially closed eyes of the eye blink sequence.
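The frame-rate argument can be written out as a small calculation (a sketch in Python; the function name is ours, while the 50 msec closure and 25 fps figures are those cited above):

```python
def min_fps_to_sample(event_ms: float, samples: int = 1) -> float:
    """Minimum camera frame rate needed to land `samples` frames
    inside an event lasting `event_ms` milliseconds."""
    return samples * 1000.0 / event_ms

# The fastest eye closure takes about 50 msec (Stern et al., 1984):
print(min_fps_to_sample(50.0))   # -> 20.0 fps puts one frame inside it
# A 25 fps webcam delivers a frame every 40 msec, below the 50 msec closure:
print(1000.0 / 25)               # -> 40.0
```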

We focused only on the design of the eye blink detection algorithm, which can in the future be used in a dry eye prevention application with an intuitive and easy-to-use user interface. It will gently notify the user to blink regularly with increased frequency and intensity.

We determined several requirements for the eye blink detection algorithm:

• detect blinks of a user who looks at the computer screen,

• high blink detection rate and low number of missed blinks,

• low hardware requirements,

• low computational requirements,

• acceptable tradeoff between the speed of the algorithm and the quality of the detection.




Chapter 2

Computer Vision Syndrome

The number of people using computers increases every day. A high percentage of them suffer from symptoms collectively called CVS. It is a relatively new medical term for a group of symptoms that result from computer use (Comeau – Godnig, 1999). According to (Divjak – Bischof, 2008), more than 40% of the European population uses a computer at work and 70% of computer workers worldwide experience some complaints related to CVS.

2.1 Symptoms

CVS includes a wide variety of ocular, vision and non-ocular problems. The major categories of symptoms and common diagnoses are listed in Table 2.1.

Table 2.1: The most common symptoms and diagnoses of CVS (Blehm et al., 2005).

Symptom category         Symptoms                    Diagnosis
Asthenopic               Eyestrain                   Binocular vision
                         Tired eyes                  Accommodation
                         Sore eyes
Ocular surface-related   Dry eyes
                         Watery eyes
                         Irritated eyes
                         Contact lens problems
Visual                   Blurred vision              Refractive error
                         Slowness of focus change    Accommodation
                         Double vision               Binocular vision
                         Presbyopia
Extraocular              Neck pain                   Presbyopic correction
                         Back pain                   Computer screen location
                         Headache
                         Shoulder pain

The most frequent health problems among VDU users are eye-related symptoms. Eyestrain is a subjective complaint about painful and irritated vision experiences during prolonged VDU usage. Work with VDU devices causes diminished power of accommodation, removal of the near point of convergence and deviation of phoria for near vision (Blehm et al., 2005).

Eyestrain can lead to blurred vision. It creates an understimulation or a lag of accommodation. Blurred vision is a result of refractive error or a poor computer working environment (a dirty computer screen, a high viewing angle, reflected glare or a poor quality of the screen) (Yan et al., 2008; Blehm et al., 2005).

Headache often accompanies other CVS symptoms. It usually occurs in the middle or at the end of the day. An improper viewing position while using VDUs causes neck pain, backache or shoulder pain (Yan et al., 2008).

2.1.1 Dry Eye

The surface of the eye must be covered with lacrimal (tear) film. With each blink, tears are spread over the whole surface of the eye to provide lubrication and to wash away dust and microorganisms1.

The lacrimal film consists of a mucus, an aqueous (watery) and a lipid (oil) layer (Figure 2.1). The inner mucus layer is produced by goblet cells on the ocular surface. The middle and thickest layer, the aqueous layer, is produced by the lacrimal glands. The lipid layer is the outer layer produced by the meibomian glands. It prevents evaporation of tears and increases the stability of the lacrimal film. Computer users who blink less frequently have a thinner lipid layer (Cernák – Cernák, 2007).

Figure 2.1: Tear film consists of mucus, aqueous (watery) and lipid (oil) layer2.

1http://www.aoa.org/x4717.xml
2http://theralife.com/images/eyediagram.jpg



Disorders of any lacrimal film layer or an inadequate amount of tears lead to one of the most typical CVS symptoms – dry eye (keratoconjunctivitis sicca), which is a cause of ocular fatigue. This term was used for the first time by the Swedish ophthalmologist Sjögren in 1933. Nowadays it is a problem of 15% – 17% of the population (Cernák – Cernák, 2007).

Dry eye is affected by many external and internal factors (Rosenfield, 2011; Blehm et al., 2005):

1. Environmental factors producing corneal drying: low ambient humidity, high forced-air heating, the use of air conditioning or ventilation fans, airborne dust.

2. Reduced blink rate: While using a VDU, the blink rate is reduced. When eyes blink, tears moisten their surface and wash away dust and microorganisms, so reduced blinking contributes to a poor film quality and stresses the cornea. According to studies in (Blehm et al., 2005), the normal blink rate is between 10 – 15 blinks per min and it can be reduced by up to 60% due to computer use. A research study mentioned in (Rosenfield, 2011) compares the blink rate while relaxing, reading a book and reading text on a VDU. While relaxing, the mean blink rate was 22 per min; while reading a book it was 10 per min, and during viewing a VDU it was only 7 per min.

3. Incomplete blinking: It was noticed that incomplete blinks are common while working with a computer. However, it is unclear if the incompleteness of blinks is undesirable. (Rosenfield, 2011) refers to a study which suggests that incomplete blinking may also be a possible cause of CVS.

4. Increased exposure: When reading a hardcopy text, the eyes are usually looking downwards, whereas computer users view text in a horizontal gaze.

5. Age and gender: The prevalence of dry eye is higher in females and increases with age.

6. Contact lens use: Wearing contact lenses alters the blink rate significantly. The cause may be irritation by the lens or a more unstable tear film.

7. Systemic diseases and medications: Diseases such as arthritis, allergy or thyroid disease may contribute to ocular drying. Dry eye may also have an association with various systemic medications including diuretics, anti-histamines, psychotropics and anti-hypertensives.

8. Ocular conditions: Dry eye may be a result of a dysfunction of the glands which produce the tear film.

Figure 2.2: Dilated blood vessels near the surface of the eye are one of the symptoms of Dry Eye3.

3http://anthonyliska.com.au/wp-content/uploads/2011/01/800px-Pink_eye.jpg



Dry eyes may cause burning or itchy eyes (Figure 2.2), discomfort while wearing contact lenses, increased sensitivity to light or blurred vision (Yan et al., 2008).

2.2 Prevention and Treatment

In this section we focus on several strategies which deal with CVS (Yan et al., 2008). The computer screen should be placed at a minimum distance of 20 inches (50.8 cm) from the user's eyes and 10◦ – 20◦ below the eye level. The viewing angle of the screen should be adjusted to around 15◦ lower than the horizontal level, which reduces visual and musculoskeletal discomfort such as dry eyes, neck and back pain (Figure 2.3). The brightness affects the visual fatigue of the user. It is recommended to use screens with a minimal refresh rate of 75 Hz; a higher refresh rate decreases ocular symptoms. Screen reflections should be eliminated, for example by using anti-glare filters.

Figure 2.3: Proper distance and viewing angle of the computer screen can help users to reduce CVS symptoms4.

Room conditions (e.g. humidity and dust), the screen lighting and the room lighting should be checked before computer use. The source of light within the user's field of view should be less than three times the mean screen luminance. Proper conditions for computer usage may reduce the risk of CVS. It is also important to have a proper desk and chair. A good sitting position can prevent neck pain, back pain and headache.

4http://www.lakeridgeeye.com/index-3.php



It is also recommended to follow the 20/20/20 rule. After 20 min of computer use, the user should look at something 20 ft (6.1 m) away for at least 20 seconds.
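The 20/20/20 rule translates directly into scheduling logic; a minimal sketch (the function and constant names are ours, not from any cited application):

```python
WORK_SECONDS = 20 * 60   # remind after every 20 minutes of screen time
BREAK_SECONDS = 20       # ...to look ~20 ft (6.1 m) away for 20 seconds

def next_break(session_start: float, now: float) -> float:
    """Seconds remaining until the next 20/20/20 reminder is due."""
    elapsed = now - session_start
    return WORK_SECONDS - (elapsed % WORK_SECONDS)

# 25 minutes into a session, the next reminder is due in 15 minutes
print(next_break(0.0, 25 * 60))  # -> 900.0
```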

According to (Comeau – Godnig, 1999), users should have a 15 min break after 2 hours of computer use. According to another study, users should, after 45 min of computer work, close their eyes, cover them gently with their palms for 20 – 30 sec and repeat this activity regularly5.

A warm eyelid massage is suggested for users who work with a computer more than 3 hours (Yan et al., 2008). The massage stimulates the tear glands, increases the blood circulation within the eyes and reduces the occurrence of dry eyes. Computer users might place a warm towel over their closed eyes and massage the upper and lower eyelid against the bone above and under the eye for 10 seconds (Figure 2.4).

Figure 2.4: Regular eyelid massage can help to relieve dry eye symptoms6.

Treatment suggestions of dry eyes include (Yan et al., 2008):

• blink frequently when using VDUs,

• artificial tears: support natural tears with artificial tears (Figure 2.5),

• surgical intervention: plug the eye’s drain to block the drainage.

Figure 2.5: Application of artificial tears moderates dry eyes7.

5CERNÁK, A. Nenechajte počítač zničiť vaše oči. Oči a zrak, MEDIAPLANET, vol. 4, March 2011, page 2. Available from: http://doc.mediaplanet.com/all_projects/6973.pdf.

6http://getdryeyetreatment.com/blog/warm-eyelid-compress-memboitis-travelling/quick-hot-eyelid-compress/



The application of artificial tears is the most common treatment of dry eye. It substitutes for the lack of natural tears and helps to keep the eyes moistened. The biggest disadvantage of most artificial tears is the presence of preservatives and additives, which can irritate the eyes or worsen the ocular drying (Cernák – Cernák, 2007).

7http://www.aoa.org/x4717.xml



Chapter 3

Related Work

In this chapter we present several methods for eye blink detection. We focus on their benefits, disadvantages and efficiency. The algorithms we focus on achieve good blink detection results and have the ability to work in real-time. Most of these techniques consist of three steps (Figure 3.1):

• face detection,

• eye detection,

• eye blink detection.

Figure 3.1: General process of blink detection algorithms.

According to the state-of-the-art in this field, the algorithms presented in Table 3.1 can be used to detect eye blinks.

Blink detection and frequency measuring are used by applications with the aim to protect users from dry eye syndrome. Some approaches are designed to be used by disabled people and do not assume any head movements. Other algorithms use blinks to localise the eyes for eye tracking. The change from an open to a closed eye is important for a vehicle system as a prevention from micro-sleep.

The only commercial solution we found which detects blinks in order to protect the eyes from CVS is iVision Guard1. It is an application which analyses the frequency of blinking from images obtained by a webcam to detect eye fatigue. In the case of a reduced blink rate, it recommends the user to have a short break via balloon notifications. It provides information about the actual blink rate and statistics of blinking (Figure 3.2).

1http://www.ivisionguard.com/


Table 3.1: Overview of blink detection algorithms.

Method                                  Description
optical flow                            track eyelid motion
normal flow                             track eyelid motion
Gabor filter                            extract eyelid contours
variance map                            compute the change in the intensity
correlation                             compare an eye with a template
intensity distribution in eye halves    compare mean intensities in eye halves
deformable model                        extract eye shape
eyelid's state detecting value          find a minimum threshold value
Circular Hough Transformation           detect the iris within the eye
infrared sensor                         detect the light reflected from eyes
electrooculography                      detect the electrooculographic signal

Figure 3.2: Statistics of eye blink rate obtained from iVision Guard.

3.1 Optical Flow

The optical flow method is used to locate new feature positions in the following frame. It is often used to estimate the motion between two images. The most common motion detection algorithm is the Lucas-Kanade (KLT) feature tracker (Tomasi – Kanade, 1991). Optical flow can be used to track eyelid movements and subsequently to detect the eventual eye blinks.

If the brightness constancy of a feature point is assumed, the displacement (u, v) satisfies:

I_t(x, y) = I_{t+1}(x + u, y + v),    (3.1)

where u and v are the displacements in the directions x and y, I_t is the brightness of the current frame and I_{t+1} is the brightness of the next one. Using a Taylor series expansion we obtain the equation:

I_{t+1}(x + u, y + v) ≈ I_t(x, y) + (∂I/∂x)u + (∂I/∂y)v + ∂I/∂t    (3.2)

The adjustment of Equation 3.2 gives the optical flow constraint equation (Laganière, 2011), formulated in Equation 3.3:

(∂I/∂x)u + (∂I/∂y)v = −∂I/∂t    (3.3)

The purpose of the computation is to calculate the velocity vector. For every feature point, a neighbourhood is determined and searched for a point with the closest intensity value.
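Equation 3.3 provides one constraint per pixel with two unknowns (u, v), which is why Lucas-Kanade collects it over a window and solves it in the least-squares sense. A minimal numpy sketch of that step (illustrative only; it is not the GPU or OpenCV implementation used by the cited papers):

```python
import numpy as np

def lucas_kanade_window(Ix, Iy, It):
    """Least-squares solution of the constraint Ix*u + Iy*v = -It
    (Equation 3.3) collected over every pixel of one window."""
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)  # one constraint per pixel
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Synthetic check on an image I = x^2 + y^2 translated by (u, v) = (1, 2):
x, y = np.meshgrid(np.arange(8.0), np.arange(8.0))
Ix, Iy = 2 * x, 2 * y                  # spatial gradients of I
It = -(Ix * 1.0 + Iy * 2.0)            # temporal difference implied by Eq. 3.3
u, v = lucas_kanade_window(Ix, Iy, It)
print(round(float(u), 6), round(float(v), 6))  # -> 1.0 2.0
```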

(Lalonde et al., 2007) propose an eye blink detector based on the optical flow algorithm. The eye area is detected in the initial phase by analysing the horizontal and vertical profiles of the image. The area is afterwards described by a SIFT (scale-invariant feature transform) descriptor computed on the GPU. First, motion is detected inside the eye regions using a thresholded frame difference. Consequently, these regions are used to calculate the optical flow. When the user blinks, the eyelids move up and down and the dominant motion is in the vertical direction. This method detects 97% of blinks on their own dataset. Most of the false detections are caused by gaze lowering and vertical head movements.

A method based on optical flow estimation is also presented in (Divjak – Bischof, 2009) as an eye fatigue detector to prevent CVS. It locates the eye and face positions by 3 different classifiers. Due to the use of the Haar Cascade Classifier (Viola – Jones, 2001), the algorithm is successful mostly when the head is directly facing the camera. The KLT tracker is used to track the detected feature points. This blink detector uses GPU-based optical flow in the face region. This approach is based on their previous normal flow algorithm (Divjak – Bischof, 2008); therefore we suppose that all parts of the proposed algorithm which are not explained remain unchanged. The flow within the eyes is compensated for the global face movements, normalised and corrected in rotation when the eyes are in a non-horizontal position. Afterwards the dominant angle of the optical flow orientation can be estimated. During the blink, the flow perpendicular to the line connecting both eyes should be dominant (the downward motion represents angles between 200◦ and 340◦). The flow data are processed by an unspecified adaptive threshold. The results of the left and right eye are merged to detect eye blinks; however, the authors do not specify how the results are combined. The authors report a 95% blink detection rate on the Talking Face Video dataset from INRIA2 (Chapter 5). Unfortunately, this approach has problems detecting eye blinks during quick head movements up and down.

(Bhaskar et al., 2003) use blink detection in an eye tracking algorithm. The user's blinks are recognised at the beginning of the proposed algorithm to locate the eye positions. A frame difference serves to determine the possible eye candidates. The most suitable ones are chosen according to the region size. The algorithm computes the optical flow in the region of the chosen eye candidates and determines the dominant direction of motion. Depending on this direction, blinks are detected and the eye position is located. The eyes are eventually tracked with the KLT tracker. Some feature points are lost during a blink or a drastic eye motion. If the number of lost features is greater than 20% of the points, the tracker is reinitialised after the next blink. A 97% success rate was obtained on their own dataset.

2http://www-prima.inrialpes.fr/FGnet/data/01-TalkingFace/talking_face.html
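The thresholded frame difference that (Lalonde et al., 2007) and (Bhaskar et al., 2003) use to find candidate moving regions can be sketched in a few lines of numpy (a simplification; the papers additionally filter candidates by region size and track them over time):

```python
import numpy as np

def motion_mask(prev: np.ndarray, curr: np.ndarray, thresh: int = 25) -> np.ndarray:
    """Boolean mask of pixels whose grayscale intensity changed by more
    than `thresh` between two consecutive frames."""
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    return diff > thresh

prev = np.zeros((4, 4), dtype=np.uint8)
curr = prev.copy()
curr[1:3, 1:3] = 200                     # simulated eyelid movement
print(motion_mask(prev, curr).sum())     # -> 4 changed pixels
```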



3.2 Normal Flow

Eyelid movements are estimated by normal flow instead of optical flow in (Heishman – Duric, 2007) (Figure 3.3). Optical flow integrates information over image regions and calculates the overall image movement, whereas normal flow can be computed using only local information. The authors claim that the normal flow calculation is more effective than the previous method and can also be applied in real-time.

Optical flow can be decomposed into its normal and tangential components. From Equation 3.3 we can define the normal component, i.e. the normal flow. It is the component of optical flow that is orthogonal to the image brightness gradient or edges:

u_n = -\frac{I_t}{|\nabla I|}, \qquad (3.4)

where I_t = \frac{\partial I}{\partial t} and \nabla I = \left(\frac{\partial I}{\partial x}, \frac{\partial I}{\partial y}\right) is the image gradient (Denteneer – Jasinschi, 2004).
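As an illustration, the normal flow magnitude of Equation 3.4 can be computed densely with NumPy. This is a minimal sketch, not the windowed edge-based estimator of (Heishman – Duric, 2007); `normal_flow` and its parameters are illustrative names:

```python
import numpy as np

def normal_flow(prev, curr, eps=1e-3):
    """Dense normal flow magnitude u_n = -I_t / |grad I| (Equation 3.4).

    prev, curr: consecutive grayscale frames as float arrays.
    """
    It = curr - prev                       # temporal derivative I_t
    Iy, Ix = np.gradient(curr)             # spatial derivatives
    mag = np.sqrt(Ix ** 2 + Iy ** 2)       # gradient magnitude |grad I|
    un = np.zeros_like(curr)
    mask = mag > eps                       # skip near-zero gradients
    un[mask] = -It[mask] / mag[mask]
    return un
```

For a linear intensity ramp translating by one pixel per frame, the estimate recovers the true speed of one pixel per frame at every pixel.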

For each edge element found with the Canny detector, a small window is created with its rows parallel to the gradient direction, and a larger window for the next frame. The intensity difference is computed using a sliding window. The difference is equal to zero at the distance u_n from the origin of the second window. The estimated normal flow vector is then -u_n. For colour images, the intensity difference is computed in each colour component.

Figure 3.3: Normal flow displacement of closing and opening eyes (Heishman – Duric, 2007).

The resulting data from the previous computation, the magnitude and direction of the normal flow, are used in a deterministic finite state machine (DFSM) to estimate blink parameters. The DFSM has three states to determine in which phase the subject's eyes are: steady (open), opening and closing. The disadvantage is that the threshold strategy used in this algorithm requires various thresholds to be set manually depending on the subjects and conditions.

A similar method is applied in (Divjak – Bischof, 2008). The face and eyes of a subject are detected and tracked using a classifier from the OpenCV library. If the detection is unsuccessful, feature points detected by the FAST feature detector are tracked by the KLT tracker. Normal flow vectors computed in the eye region are used to detect eye blinks. The flow is corrected and normalised in the same way as described in (Divjak – Bischof, 2009). This approach utilises and improves the DFSM from the previous work by adding a new state, closed, to detect


more variations in eye movements, for example holding the eyes closed. For blink extraction the authors use a threshold defined as T = 6 × standard_deviation(n), where n is the flow magnitude in the stationary eye state. In some cases this threshold must be set manually.

3.3 Gabor Filter

The Gabor filter is a linear filter used in image processing for edge detection. For our purposes it can be used to extract contours within the eye regions. Blinks are detected by measuring the distance between the upper and lower eyelid, which differs between closed and open eyes.

The 2D Gabor filter has the following form:

G(x, y) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right)\cos\left(2\pi\frac{x'}{\lambda}\right), \qquad (3.5)

where x' = x cos θ + y sin θ and y' = -x sin θ + y cos θ. σ is the standard deviation of the Gaussian envelope, λ represents the wavelength of the sinusoidal function, θ represents the orientation of the normal to the parallel stripes of the Gabor function, and the last parameter, γ, called the spatial aspect ratio, specifies the ellipticity of the Gabor function. The Gabor function is the result of modulating a sinusoidal plane wave with a Gaussian. A Gabor-filtered image is obtained by convolving the image with the Gabor kernel.
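A Gabor kernel following Equation 3.5 can be sketched directly in NumPy (OpenCV's cv2.getGaborKernel provides an equivalent; `gabor_kernel` and its parameter names are ours):

```python
import numpy as np

def gabor_kernel(ksize, sigma, theta, lam, gamma):
    """Real 2D Gabor kernel per Equation 3.5 (ksize should be odd)."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xp = x * np.cos(theta) + y * np.sin(theta)     # x'
    yp = -x * np.sin(theta) + y * np.cos(theta)    # y'
    return (np.exp(-(xp ** 2 + gamma ** 2 * yp ** 2) / (2 * sigma ** 2))
            * np.cos(2 * np.pi * xp / lam))
```

The kernel equals 1 at its centre (exp(0)·cos(0)) and is symmetric under point reflection, since both factors are even in (x', y').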

The Gabor filter-based method mentioned above appears in (Arai – Mardiyanto, 2011) (Figure 3.4). Eye region detection is based on deformable template matching. If it fails, the eyes are localised using the Haar Cascade Classifier (Viola – Jones, 2001). After applying the filter, the distance between the detected top and bottom arcs in the eye region is measured. Different distances indicate a closed or open eye. The problem of arc extraction arises while the person is looking down; then the difference in distance between closed and open eyes is too small.

Figure 3.4: Application of Gabor filter on image (Arai – Mardiyanto, 2011).

3.4 Variance Map

Variance measures how values deviate from the mean. If we apply variance in image processing, a variance map specifies the deviation of intensities from the mean value in an image sequence. The intensity of pixels located in the eye region changes during a blink, which can be used in the detection process.

Let I_i(x, y) be the intensity of the pixel at coordinates (x, y) in image i. The recursive equations for the mean image μ and the variance map σ² are calculated as follows:

\mu_{j+1}(x, y) = \frac{j\,\mu_j(x, y) + I_{j+1}(x, y)}{j + 1}, \qquad (3.6)


\sigma^2_{j+1}(x, y) = \left(1 - \frac{1}{j}\right)\sigma^2_j(x, y) + (j + 1)\left(\mu_{j+1}(x, y) - \mu_j(x, y)\right)^2, \qquad (3.7)

where \mu_1(x, y) = I_1(x, y) and \sigma^2_1(x, y) = 0.
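With these initial conditions, the recursion of Equations 3.6 and 3.7 reproduces the unbiased sample variance of the frames seen so far. One update step can be sketched in NumPy (`update_variance_map` is an illustrative name):

```python
import numpy as np

def update_variance_map(mu, var, frame, j):
    """One step of Equations 3.6-3.7: fold frame number j+1 into the
    running mean image mu and variance map var over frames 1..j (j >= 1)."""
    mu_next = (j * mu + frame) / (j + 1)                          # Eq. 3.6
    var_next = (1 - 1 / j) * var + (j + 1) * (mu_next - mu) ** 2  # Eq. 3.7
    return mu_next, var_next
```

Folding frames in one by one gives the same result as computing np.var(frames, axis=0, ddof=1) on the whole stack at once.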

A variance map is part of the blink detection technique in (Morris et al., 2002), which locates the user's eyes in a real-time eye tracker. Due to unconscious head movements, the face can be detected using the thresholded accumulated frame difference obtained from several frame differences of consecutive grayscale frames. Blink detection is based on the variance map described above. After the creation of a variance map (Figure 3.5), the ratio of thresholded pixels is calculated and if it exceeds a specific value, a blink is detected. Possible eye blink regions are then filtered by their size, horizontal symmetry, mutual position in the face and mutual similarity, and the selected eye blink pair is adjusted to the same size.

Figure 3.5: Two sample images from an image sequence, the resulting variance map, the thresholded variance map, eye-blink regions and the detected face and eye positions (Morris et al., 2002).

A system very similar to the previous work, described in (Nawaz – Sircar, 2003), applies the same variance map method for blink detection. The position of the eyes is approximated by the following technique. First the user's face is detected: image pixels are classified as skin or non-skin according to their chromatic red and blue values. Eye localisation exploits the fact that open eyes have a different colour than the skin. The eyes are then tracked using an optical flow algorithm. The resulting variance map of the eye region is thresholded and the ratio of the sum of thresholded pixels to the total number of pixels indicates whether the eyes are open or not.

The iris tracking and blink detection system in (Khilari, 2010), designed to be used by paralysed users, detects blinks via a variance map. When the face is in a stabilised position, the frame difference is created and filtered. Due to the paralysis of the users, the system does not assume any head movements. After a blink, it can estimate the eye position. Head tracking is based on tracking the Between-the-Eyes feature, because of its good visibility. Eye tracking uses the correlation between the actual frame and an open eye template. When the correlation coefficient is lower than a specific value, the blink detection procedure is started. The


difference is detected by a variance projection function defined by:

\sigma^2_H(y) = \frac{1}{w}\sum_{i=1}^{w}\left[I(x_i, y) - H(y)\right]^2, \qquad (3.8)

where I(x_i, y) is the intensity in the red channel at point (x_i, y), H(y) is the mean value of row y and w is the width of the image. A median filter is used to smooth the variance projection. The absolute sum of the extreme values of the first derivative of the variance projection is computed, due to the decrease of the score while the eyes are closing.
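Equation 3.8 is simply the per-row variance of the (red-channel) eye image; a minimal NumPy sketch (`variance_projection` is an illustrative name):

```python
import numpy as np

def variance_projection(I):
    """Variance projection function of Equation 3.8: for each row y,
    the variance of the intensities in that row."""
    H = I.mean(axis=1, keepdims=True)        # row means H(y)
    return ((I - H) ** 2).mean(axis=1)       # (1/w) sum_i [I(x_i,y)-H(y)]^2
```

A constant row yields zero, and the result agrees with NumPy's built-in row-wise variance.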

3.5 Correlation

Correlation in statistics describes the degree of dependence between two variables. The strength of correlation is expressed by a correlation coefficient, which takes values within the interval 〈-1, 1〉 and measures the degree of linear dependence (relationship) between the variables. A correlation of +1 means perfect positive correlation and a correlation of -1 means perfect negative correlation; a correlation of 0 denotes independence between the variables. The normalised correlation coefficient denotes the similarity between a template with w × h pixels and the image rectangle with its top left corner at position (x, y) and size equal to the template size. It is computed as follows3:

R(x, y) = \frac{\sum_{y'=0}^{h-1}\sum_{x'=0}^{w-1} \tilde{T}(x', y')\,\tilde{I}(x + x', y + y')}{\sqrt{\sum_{y'=0}^{h-1}\sum_{x'=0}^{w-1} \tilde{T}(x', y')^2 \,\sum_{y'=0}^{h-1}\sum_{x'=0}^{w-1} \tilde{I}(x + x', y + y')^2}}, \qquad (3.9)

where \tilde{T}(x', y') = T(x', y') - \bar{T}, \tilde{I}(x + x', y + y') = I(x + x', y + y') - \bar{I}(x, y), I(x, y) is the brightness at coordinates (x, y), \bar{I} is the average brightness within the search region, T(x, y) is the template image brightness at point (x, y) and finally \bar{T} denotes the average value of the template (Grauman et al., 2001).
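Equation 3.9 can be sketched at a single position as follows. Note one simplification: this sketch centres the image patch on its own mean, whereas the thesis centres it on the mean of the search region; `ncc` is an illustrative name:

```python
import numpy as np

def ncc(image, template, x, y):
    """Normalised correlation coefficient of Equation 3.9 for the image
    patch whose top-left corner is at (x, y)."""
    h, w = template.shape
    patch = image[y:y + h, x:x + w].astype(float)
    T = template.astype(float) - template.mean()   # mean-centred template
    I = patch - patch.mean()                       # mean-centred patch
    denom = np.sqrt((T ** 2).sum() * (I ** 2).sum())
    return float((T * I).sum() / denom) if denom > 0 else 0.0
```

Because both sides are mean-centred, the score is invariant to affine brightness changes of the patch: a patch equal to 2·T + 5 still scores 1.0.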

A score close to 0 means a low level of similarity and a score close to 1 indicates a probable match with the template. Closed eyes during a blink cause the correlation coefficient to decrease. If we measure the correlation between the actual eye and an open eye template, the correlation coefficient represents "the openness" of the eyes in a frame.

The correlation method is used in (Chau – Betke, 2005) as part of a system for physically disabled people. The user's eyes are localised with a frame difference which is filtered by the kernel of an opening morphological operation. The proposed system serves people with paralysis who are relatively still, so the system does not assume many motion candidates. The most likely eye blink pair is extracted as the candidate with the best match of parameters such as size and distance. A template of the open eyes is subsequently captured for the correlation computation. The equation above is used to calculate the correlation coefficient to track the eyes and perform blink detection. Since the system has to perform the computation quickly, the search region for correlation is reduced to a small area around the eyes. If the correlation coefficient is very high, the eyes are open. If it falls within the interval 〈0.5, 0.55〉, the user is assumed to be blinking, and finally, if the similarity is low, the tracker has probably lost the user's eyes.

3http://itee.uq.edu.au/~iris/CVsource/OpenCVreferencemanual.pdf


Blink detection via correlation for immobile people is also used in the real-time system from (Grauman et al., 2001). The first step, which locates the user's eyes after an initial blink, consists of a frame difference which is thresholded and processed by erosion. The resulting eye blink pair is selected from the candidates by various filters. After the localisation, an open eye template is captured and used in a simple eye tracker based on the correlation between the actual eye region and the template. The proposed system detects blinks when the computed correlation score ranges from 0.55 to 0.8.

3.6 Intensity Distribution in Eye Halves

In (Moriyama et al., 2002) the user's face is continuously tracked using dynamic template matching - the face in the previous frame serves as the template for the next one.

The eye region is extracted during the initialisation by manually marking the feature points in the left, right, bottom and top corners of the eye. The blink detection algorithm is based on the fact that the upper and lower parts of the eye have different distributions of mean intensities during open eyes and blinks (Figure 3.6). These intensities cross during eyelid closing and opening. Reflections from eyeglasses cause difficulties for the blink detector.

Figure 3.6: Luminance curves in the upper (red curve) and lower (blue curve) half of the eye during different states. The left image represents the eye blink state, the middle image the multiple blink (flutter) state and the right one characterises the non-blink state (Moriyama et al., 2002).

3.7 Deformable Model

Deformable models are curves or shapes which are deformed in order to fit an object boundary or an image feature (Xu – Prince, 2000). The real-time blink detector in (Liting et al., 2009) detects blinks using a deformable model.

The detection starts with face detection based on the Haar Cascade Classifier; the eye detector works on the same principle. Afterwards the extraction of eye contours is performed. The technique uses a deformable model, an Active Shape Model (ASM), represented by several landmarks as an eye contour shape. The proposed model learns the appearance around each landmark and fits it to the actual frame to obtain the new eye shape. This procedure estimates the eyelid positions. Blinks are detected by measuring the distance between the upper and lower eyelid (Figure 3.7).


Figure 3.7: Eye contour extraction - measurement of the distance between the eyelids and results with open and closed eyes (Liting et al., 2009).

3.8 Eyelid’s State Detecting Value

(Ayudhaya – Srinark, 2009) proposed a real-time algorithm which detects open or closed eyelids with the eyelid's state detecting (ESD) value. It increases the threshold until the resulting binary image has at least one black pixel after applying the median blur filter. This minimum threshold value is the ESD value, and it differs while the user blinks. Figure 3.8 shows a graph of the multiplied ESD values of the left and right eye. Peaks in the graph represent the user's blinks. The algorithm for computing the ESD value is listed in Listing 3.1.

The face is detected and tracked using the Haar Cascade Classifier and the Camshift algorithm (Bradski, 1998). Eye detection is also based on Haar features. The slope of the graph of the computed ESD values within the eye region determines the state of the eyelid. This method achieves 92.6% detection accuracy on their own videos.

In the case of dark eyelashes or dark glasses frames the blink detector will not work properly, which is why (Panning et al., 2011) uses a modified ESD value (ESDm), defined as the mean value of all pixels within the eye. At the beginning of blink detection, users are required to blink. The captured frames are then analysed to compute the threshold value used to detect blinks. It is defined as:

th = \alpha\left(\min(ESD_m) + \frac{\max(ESD_m) - \min(ESD_m)}{2}\right) \qquad (3.10)

The authors obtained the best result with α = 0.65.

Listing 3.1: The ESD value algorithm (Ayudhaya – Srinark, 2009).

input: img - half-bottom eye image

initialisation: t = -1 (threshold value)

ESD computing:
do
    t = t + 1
    threshold img with value t
    apply median blur filter to thresholded image
while sum of black pixels in the image == 0
ESD = t

output: ESD - the ESD value
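A runnable version of Listing 3.1 can be sketched with NumPy alone; the hand-rolled 3×3 median filter stands in for OpenCV's medianBlur, and `median3x3`/`esd_value` are illustrative names:

```python
import numpy as np

def median3x3(img):
    """3x3 median filter; border pixels keep their value
    (a stand-in for cv2.medianBlur)."""
    h, w = img.shape
    out = img.copy()
    windows = np.stack([img[dy:h - 2 + dy, dx:w - 2 + dx]
                        for dy in range(3) for dx in range(3)])
    out[1:h - 1, 1:w - 1] = np.median(windows, axis=0)
    return out

def esd_value(img):
    """Listing 3.1: the smallest threshold t for which the thresholded,
    median-filtered image still contains a black pixel."""
    t = -1
    while True:
        t += 1
        binary = np.where(img > t, 255, 0)    # pixels <= t become black
        if (median3x3(binary) == 0).any():
            return t
```

The median filter removes isolated black pixels, so the loop only stops once a coherent dark region (such as the iris or closed eyelid) survives thresholding.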


Figure 3.8: ESD value graph. Blinks and closing eyelid frames are marked by circles (Ayudhaya – Srinark, 2009).

3.9 Circular Hough Transformation

Some approaches use circle detection algorithms to detect the iris within the eye. The absence of the iris is considered a blink.

The Circular Hough Transformation (CHT) is a method for extracting circles from images. Each circle can be defined by the equation:

r^2 = (x - a)^2 + (y - b)^2, \qquad (3.11)

where r is the radius of the circle and a and b are the coordinates of the centre. Thus CHT uses a 3-dimensional accumulator space (a, b, r). In order to simplify the algorithm, the value of the radius is often fixed or minimum and maximum thresholds are defined. For each edge point (x, y) a circle is drawn around the edge point with radius r. We increment the values of the accumulator matrix corresponding to the coordinates of the points on the perimeter of the drawn circle. Each value in the matrix denotes how many circles pass through the individual coordinates. The matrix cells with the highest values represent the most likely candidates for the centres of the circles4.
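The voting scheme described above can be sketched for a fixed radius (an illustrative NumPy sketch, not the implementation of (Yunqi et al., 2009); `hough_circle_centres` is our name):

```python
import numpy as np

def hough_circle_centres(edge_points, radius, shape, n_angles=360):
    """Fixed-radius Circular Hough Transform (Equation 3.11): every edge
    point votes for all centres (a, b) lying at the given radius from it."""
    acc = np.zeros(shape, dtype=int)                 # accumulator over (b, a)
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    for x, y in edge_points:
        a = np.round(x - radius * np.cos(angles)).astype(int)
        b = np.round(y - radius * np.sin(angles)).astype(int)
        ok = (a >= 0) & (a < shape[1]) & (b >= 0) & (b < shape[0])
        np.add.at(acc, (b[ok], a[ok]), 1)            # cast the votes
    return acc
```

For edge points sampled from a circle, the accumulator peak lands at (or next to) the true centre, which is exactly the "obvious peak" that signals a visible iris.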

The CHT method is presented in (Yunqi et al., 2009). An obvious peak in the accumulator array from CHT indicates a visible iris (Figure 3.9). The closed eye state is detected by computing the direction of the upper eyelid: if the curve of the eyelid is downwards, a blink is identified. If the actual eye state is not determined by the previous two methods, the distance between the eyelids decides the state.

4PEDERSEN, S.J.K. Circular Hough Transform, Aalborg University, Vision, Graphics and Interactive Systems, November 2007. Available from: http://www.cvmt.dk/education/teaching/e07/MED3/IP/Simon_Pedersen_CircularHoughTransform.pdf


Figure 3.9: Accumulated array from CHT (Yunqi et al., 2009).

3.10 Infrared Sensor

The following method is suited for measuring the eye blink response of a rabbit. The eye blink detector in (Ryan et al., 2006) uses an infrared (IR) sensor. It consists of an IR light emitting diode (LED) pulsed at a high frequency and an IR photodiode. It transmits infrared rays into the rabbit's eye and receives the reflected rays (Figure 3.10). If the eye is closed, maximum light is reflected and detected, since the eyelid has a higher reflectivity than the cornea. The IR signal is then amplified and digitised. The authors claim that the detector could be applied in experiments with human subjects.

Figure 3.10: Eye blink detector with IR sensor (Ryan et al., 2006).

The method described in the previous paper can be used in cars as a prevention against microsleep. IR light is invisible to the eye and does not disturb the driver. The device proposed in (Castro, 2008) is placed in the temple of the glasses (Figure 3.11). The IR signal received by the


sensor is digitised and sent to a computer. The signal is analysed and if the driver falls asleep, an alarm starts to ring.

Figure 3.11: Wearable IR sensor placed on the glasses (Castro, 2008).

3.11 Electrooculography

(Pander et al., 2008) presents electrooculography as a method for blink detection. The electrooculographic (EOG) signal is based on the electrical potential difference between the cornea and the retina. The potential creates an electrical field whose orientation depends on eye movements: the movements cause a change in the orientation. When the eyelids close during a blink, the potential around the eye changes as a result of the movements of the eyelid muscles.

The EOG signal is measured by electrodes placed near the eyes (Figure 3.12). The signal is filtered to remove noise and used in a detection function. Peaks in the shape of the function represent the user's blinks.

Figure 3.12: The placement of electrodes for the measurement of EOG signals (Pander et al., 2008).


3.12 Summary

After analysing the available techniques for blink detection, we designed our own solutions based on histogram backprojection, optical flow, frame difference and a descriptor method. Because our goal is a CVS prevention system, our main focus is on the model situation when the user is facing the computer screen. For this reason, a high recognition rate with efficient computation is necessary.


Chapter 4

Analysis of Used Principles

Our aim is to create an eye blink detector which could be used in a real-time blink detection system. In the case of a low blink rate it could notify the user to blink more frequently.

In this chapter we propose four different methods of blink detection. The first of the presented algorithms computes backprojection from a 1D saturation and a 2D hue-saturation histogram. The method called Centre Aligned Movement Detection detects eyelid motion using the KLT feature tracker. The next technique is based on the frame difference of grayscale and RGB (Red Green Blue) images. The last one uses the FREAK descriptor (Alahi et al., 2012) to detect blinks.

4.1 Histogram Backprojection

Histograms characterise the distribution of data organised into bins. Each bin denotes a certain interval of data values. Histograms may also be multidimensional. In computer vision they can be used to represent the colour distribution of a given picture (Bradski – Kaehler, 2008).

Histogram backprojection creates a probability map over the image. In other words, backprojection determines how well the pixels of an image fit the distribution of a given histogram. A higher value of a pixel in a backprojected image denotes a more likely location of the given object. The backprojected image is a single channel image, due to which we have to normalise the histogram in order to get all probability values within the interval 〈0, 255〉 (Bradski – Kaehler, 2008; Laganière, 2011).
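For a 1D histogram, backprojection reduces to a table lookup per pixel. The following is a minimal NumPy sketch (OpenCV's calcBackProject offers the full functionality; `backproject_1d` and the assumed bin layout of 32 bins over 〈0, 255〉 are ours):

```python
import numpy as np

def backproject_1d(channel, hist, n_bins=32, vmax=256):
    """1D histogram backprojection: replace each pixel by the (normalised)
    histogram value of its bin, producing a probability map."""
    bins = channel.astype(int) * n_bins // vmax    # bin index per pixel
    return hist[bins]                              # lookup per pixel
```

Here hist would be, for example, the face-region saturation histogram scaled so that its values fit into 〈0, 255〉.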

The result of backprojection can be adjusted by morphological operations. The basic operations are (Laganière, 2011):

1. Erode replaces the current pixel with the local minimum found in the neighbourhood of the pixel.

2. Dilate is the complementary operator: it finds the local maximum in the neighbourhood of the current pixel and replaces the pixel with that maximum.

3. Opening is a combination of the two: the erosion followed by dilation.

4. Closing is opening performed in reverse: the dilation followed by erosion.
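All four operations can be sketched with a 3×3 structuring element in NumPy (OpenCV's erode, dilate and morphologyEx provide the production versions; the function names here are ours):

```python
import numpy as np

def _stack3x3(img):
    """All nine 3x3-shifted views of img, edge-padded."""
    p = np.pad(img, 1, mode='edge')
    h, w = img.shape
    return np.stack([p[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)])

def erode(img):
    """Local minimum of the 3x3 neighbourhood."""
    return _stack3x3(img).min(axis=0)

def dilate(img):
    """Local maximum of the 3x3 neighbourhood."""
    return _stack3x3(img).max(axis=0)

def opening(img):
    """Erosion followed by dilation: removes small bright specks."""
    return dilate(erode(img))

def closing(img):
    """Dilation followed by erosion: fills small dark holes."""
    return erode(dilate(img))
```

Opening removes isolated bright pixels (noise in the probability map), while closing fills isolated dark holes inside otherwise solid regions.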


The HSV (Hue Saturation Value) colour model is a 3-channel model. The Hue (H) component describes the dominant colour, Saturation (S) expresses the amount of white in the colour and brightness is described in the separate Value (V) component (Hassanpour et al., 2008; Garnavi et al., 2009):

H = \arccos\frac{\frac{1}{2}\left((R - G) + (R - B)\right)}{\sqrt{(R - G)^2 + (R - B)(G - B)}}, \qquad (4.1)

S = 1 - \frac{3\min(R, G, B)}{R + G + B}, \qquad (4.2)

V = \frac{1}{3}(R + G + B), \qquad (4.3)

where R, G and B are the red, green and blue values.
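Equations 4.1-4.3 for a single RGB colour can be transcribed directly. Two caveats of the printed formulas: Equation 4.1 yields H in [0, π] only, and the hue of achromatic colours (zero denominator) is undefined; the sketch returns 0 in that case (`rgb_to_hsv` is an illustrative name):

```python
import numpy as np

def rgb_to_hsv(R, G, B):
    """H, S, V of one RGB colour per Equations 4.1-4.3 (H in radians)."""
    den = np.sqrt((R - G) ** 2 + (R - B) * (G - B))
    H = float(np.arccos(0.5 * ((R - G) + (R - B)) / den)) if den > 0 else 0.0
    S = 1 - 3 * min(R, G, B) / (R + G + B)           # Eq. 4.2
    V = (R + G + B) / 3                              # Eq. 4.3
    return H, S, V
```

For pure red the formulas give H = 0 and full saturation, while a grey pixel gets zero saturation, which is why the Value channel can be dropped for partial luminance invariance.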

We use the histogram to represent the skin colour of the user (Figure 4.1). Backprojection computes the probability of skin presence for each pixel. We detect closed eyes as a high percentage of skin colour pixels within the eye region, otherwise we consider the eyes open (Figure 4.2).

(a) Source image.


(b) Resulting 1D saturation histogram (32 bins).

Figure 4.1: 1D saturation histogram of a face.

We use the HSV colour model to achieve partial luminance invariance by omitting the Value channel. We experimented with various histograms:

• 2D hue-saturation histogram,

• 1D hue histogram,

• 1D saturation histogram.

According to our experiments, the Hue channel alone does not provide sufficient information about the actual eye state in worse lighting conditions. Therefore we propose our histogram backprojection method only with the hue-saturation and saturation histograms. The saturation histogram consists of 32 bins. The hue-saturation histogram is a grid of 30 bins for the Hue and 32 bins for the Saturation channel.

4.1.1 Implementation Details

First we detect the user's face by the Haar Cascade Classifier (Viola – Jones, 2001) in the image converted to the HSV colour space. We calculate the skin colour histogram (hue-saturation or saturation


(a) Source images.

(b) 1D saturation histogram. (c) Backprojected images (saturation histogram).

(d) 2D hue-saturation histogram. (e) Backprojected images (hue-saturation histogram).

Figure 4.2: Histogram backprojection for a person with open and closed eyes using differenthistograms.

histogram) from a sequence of images of the facial region. Other parts of the image are not used, in order to obtain as precise a skin colour histogram as possible. The histogram is normalised afterwards and regularly updated. In the case of a dominant colour in the background, e.g. a white wall, the normalised histogram may not represent the skin distribution properly. Because of that, the normalisation function scales all values such that the second maximum value is 255; thus the final histogram will contain two maxima.

For every input image we calculate the backprojection with this histogram. Subsequently the resulting backprojected image is modified using the morphological operation Erode and a threshold (threshold_HS = 10 for the hue-saturation and threshold_S = 25 for the saturation histogram, obtained by experiments) to amplify the small difference between open and closed eyelids, caused by the lower skin probability of eyelids due to shadows in the eye areas or make-up. Finally, the average value of the probabilities is calculated from the region of the user's eyes. A significant


increase is considered an eye blink of the user. The threshold for an eye blink is computed as:

\mathrm{avg}(BP_{t-8}, BP_{t-7}, \ldots, BP_{t-4}) + \alpha, \qquad (4.4)

where avg is the average value and BP_i indicates the ith backprojected image (BP_t represents the actual backprojection). α equals 8 for the saturation and 10 for the hue-saturation histogram.
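The decision rule of Equation 4.4 can be sketched as follows, with `bp_history` assumed to hold the average backprojection values of the previous frames, oldest first (the function name is ours):

```python
def blink_detected(bp_history, bp_now, alpha=8):
    """Equation 4.4: compare the current average backprojection value
    against the mean of BP_{t-8}..BP_{t-4} plus alpha (8 for the
    saturation, 10 for the hue-saturation histogram)."""
    baseline = sum(bp_history[-8:-3]) / 5    # BP_{t-8} ... BP_{t-4}
    return bp_now > baseline + alpha
```

Using frames t-8 to t-4 as the baseline deliberately skips the most recent frames, so the rising edge of the blink itself does not inflate the reference value.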

Figures 4.3(a) and 4.3(b) illustrate the results of backprojection using different histograms.

(a) The average values of backprojection using 2D hue-saturation histogram.

(b) The average values of backprojection using 1D saturation histogram.

Figure 4.3: The average values of modified backprojection using different types of histograms. Backprojection is computed from an image sequence of a user while sitting at a computer. The user blinks at frames 18, 33, 50, 69 and 90. Detected blinks are represented by circles on the graph.

4.2 Centre Aligned Movement Detection

We introduce our own Centre Aligned Movement Detection algorithm based on optical flow (Section 3.1). Optical flow locates the new feature position in the following frame. One of the most common methods is the KLT feature tracker (Tomasi – Kanade, 1991).

The Lucas-Kanade tracker, also known as the KLT (Kanade-Lucas-Tomasi) tracker, is based on two papers. The original Lucas-Kanade tracker was first published in 1981 by Lucas and Kanade (Lucas – Kanade, 1981). In a later paper from 1991, Tomasi and Kanade improved the tracker (Tomasi – Kanade, 1991).

The KLT method makes three assumptions for tracking feature points:

1. Brightness constancy: the brightness of the tracked point does not change in the subsequent frame.

2. Temporal persistence: Feature points move slowly in time.


3. Spatial coherence: Points in the neighbourhood have similar motion.

The KLT tracker selects features suitable for tracking, i.e. with high intensity changes in both directions, such as corners (Shi – Tomasi, 1994). Then it measures the similarity between the selected points and their most likely locations in the next frame. If the correlation is below a threshold, the points are considered lost. A pyramid implementation of the tracker improves the efficiency of the algorithm. Image pyramids consist of images at different sizes. The tracker iteratively searches for the position of the feature at increasingly higher resolutions of the image. This allows it to track larger motions than the original algorithm (Bradski – Kaehler, 2008; Laganière, 2011).

If a user blinks, the mean displacement of the feature points within the eye region should be greater than the displacement of the rest of the points within the face area (Figure 4.4).

(a) Source images. (b) Displacement of feature points.

(c) Source images1. (d) Displacement of feature points.

Figure 4.4: Tracking of feature points using the KLT feature tracker during an eye blink. The displacements of the (yellow) ocular points are represented by green lines. The displacements of the (blue) nose points are represented by red lines.

1http://www-prima.inrialpes.fr/FGnet/data/01-TalkingFace/talking_face.html


4.2.1 Implementation Details

The first step consists of localising the user's face and eyes using the Haar Cascade Classifier (Viola – Jones, 2001) on a grayscale image. We initialise random KLT features within the eye and nose regions and classify them as left ocular, right ocular or non-ocular. These features are tracked by the KLT tracker. Tracking is reinitialised at regular intervals or in the case of the loss of many feature points.

We compute the average displacement separately for the three groups of points. Afterwards we compare the difference between the left or right ocular and the non-ocular movement displacements. If this difference exceeds a threshold value (threshold_diff = face.height/165, where face.height is the height of the face detected in the initial phase), a movement within the eye region is anticipated.

Consequently we count the ratio of ocular points that moved down at least a specific distance in the direction of the y-axis (distance_y = face.height/110) in order to exclude false positives caused by horizontal eye movements. For the proper computation of the ratio we eliminate the vertical ocular displacement caused by head movements: the ocular points are shifted by a distance equal to the average displacement of the non-ocular points. If the ratio is higher than a threshold (5% of the displacements of one group of ocular points and 2% of the displacements of the second one), we consider it a blink.

Figure 4.5 represents a graph of values defined as:

max(abs(avg(left)− avg(non)), abs(avg(right)− avg(non))), (4.5)

where max and abs are the maximum and absolute value, avg indicates the average movement within a given region and left, right and non denote the left ocular, right ocular and non-ocular regions.

Figure 4.5: The differences between the average ocular and non-ocular movement within the face area computed from an image sequence of a user while sitting at a computer. The user blinks at frames 18, 33, 50, 69 and 90. Detected blinks are represented by circles on the graph.

This method has a blink detection process similar to the method in (Divjak – Bischof, 2009). It is also based on optical flow and tries to detect eyelid motions while excluding the global head motions. (Divjak – Bischof, 2009) defines the downward motion via the dominant angle of the flow vectors, close to perpendicular to the line connecting the eyes. However, our method classifies the downward motion without computing the dominant angle: we define it as a certain downward movement in the direction of the y-axis. Our method does not combine the detected


motion within the left and right eye regions, which is advantageous for example when a reflection from glasses worsens the visibility of one eye. In order to extract the eyelid motion we do not use feature points from the whole face region. Such points can be influenced by facial mimicry such as smiling or talking, so our solution tracks only points within the nose region.

4.3 Frame Difference

Frame difference returns the difference between two images. Each pixel location (i, j) in the resulting image is computed as:

image_diff(i, j) = abs(image1(i, j) − image2(i, j)), (4.6)

where abs is the absolute value. For a multichannel image the difference is computed separately for each channel. The higher the value of a pixel, the greater the difference detected.

The following blink detection method uses frame difference on grayscale and RGB colour images. We found that when a user blinks, the difference within the eye area is usually several times greater than within the rest of the facial image (Figure 4.6).

(a) Source images. (b) Grayscale difference. (c) Colour difference.

Figure 4.6: Result of frame difference using grayscale and colour images on the INRIA Talking Face Video dataset.

4.3.1 Implementation Details

In the initial phase we detect the user's face and eyes with the Haar Cascade Classifier (Viola – Jones, 2001). Afterwards, their positions are maintained by tracking KLT features in a grid within the eyes. The tracker is reinitialised at regular intervals or when many features are lost.

Within the facial area we compute the frame difference between the current image and the one before the last. In order to eliminate noise and make the results more significant we apply a threshold (threshold = 15) and erode the result. Subsequently we compute, separately for the frame difference image, the average value within the left and the right eye regions and within the rest of the face. For colour images it is computed as the mean of the R, G and B values. The average difference within the eyes is computed as the mean of the values


(a) The differences in the frame difference using grayscale images.

(b) The differences in the frame difference using RGB colour images.

Figure 4.7: The differences in the frame difference within the face and eye areas, computed from an image sequence of a user sitting at a computer. The user blinks at frames 18, 33, 50, 69 and 90. Detected blinks are represented by circles on the graph.

within the left and right eye. We detect an eye blink when the average differences within the eyes (diff(eye)) and within the rest of the face (diff(face)) satisfy the following conditions: diff(eye) > 8 and diff(eye) − 4 × diff(face) > 4. The results of this blink detection method are shown in Figure 4.7.

4.4 FREAK Descriptor

Object recognition is often based on detecting special points within the object and matching them with points extracted from the image. These points, also called feature points or keypoints, can be described by feature descriptors, which represent the neighbourhood of the point by an n-dimensional vector. The representation should be invariant to light changes and small image deformations (Laganière, 2011).

Binary descriptors are binary vectors which can be easily compared using the Hamming distance, i.e. the number of positions at which the bits of two bit vectors differ. A small Hamming distance indicates probable similarity between the keypoints.

FREAK (Fast REtinA Keypoint) is a binary keypoint descriptor (Alahi et al., 2012). It is created by comparing 43 Gaussian-smoothed intensities at locations in the neighbourhood of the keypoint, arranged in a pattern inspired by the human retina (Figure 4.8):

• the density of sampling points increases as the distance to the keypoint centre decreases,


• each sample point is smoothed by the corresponding Gaussian kernel (a receptive field).

Figure 4.8: FREAK sampling pattern with a topology inspired by the eye retina. A receptive field, described by a circle, represents the Gaussian kernel which is applied to the sample point (Alahi et al., 2012).

Each bit of the descriptor F represents the difference between a pair of receptive fields:

F = ∑_{0 ≤ α < N} 2^α T(P_α), (4.7)

where P_α is a pair of receptive fields, N represents the size of the descriptor and

T(P_α) = { 1 if I(P_α^{r1}) − I(P_α^{r2}) > 0,
         { 0 otherwise, (4.8)

where I(P_α^{r1}) is the smoothed intensity of the first receptive field of the pair.

Not all possible pairs of receptive fields are used to create the descriptor, in order to increase efficiency and avoid high correlation between the pairs. FREAK therefore uses a method for selecting the 512 most discriminant pairs.

For the selected keypoints FREAK computes a matrix where each row represents the descriptor of one keypoint.

An eye blink changes the neighbourhood of keypoints detected within the eyes, so the descriptors computed in the blink and non-blink states differ significantly. Thus the Hamming distance between the descriptors computed from ocular feature points in subsequent images increases during a blink.

4.4.1 Implementation Details

This method first detects the eye area using the Haar Cascade Classifier (Viola – Jones, 2001) on a grayscale image. The regions of interest are subsequently tracked by the KLT feature


tracker. The algorithm is reinitialised at regular intervals, after a blink, or when many tracked feature points are lost.

For each eye region we set keypoints in a grid with a step of 3 to 4 pixels and calculate the FREAK descriptors. Consequently, for each descriptor we find the closest one in the next image according to their Hamming distance.

The threshold for blink detection is based on the Hamming distance. We detect a blink when at least 15% of the matched descriptors differ in more than 100 bits and the sum of the Hamming distances between all matched descriptors is higher than 12%. Figure 4.9 shows a graph of the total Hamming distance between matched descriptors.

Figure 4.9: The sum of the Hamming distances between matched FREAK descriptors (expressed as a percentage), computed from an image sequence of a user sitting at a computer. The user blinks at frames 18, 33, 50, 69 and 90. Detected blinks are represented by circles on the graph.


Chapter 5

Evaluation and Discussion

Our blink detection algorithms are evaluated on three datasets. Our own dataset includes 8 individuals (5 males and 3 females, one person wearing glasses) under different lighting conditions, who sit in front of a computer screen mostly in a stable position and look directly at the screen. It consists of 7569 frames (640×480 size) and 128 blinks. The second image sequence, the Talking Face Video (TALKING), is publicly available from INRIA1. It includes 5000 images (720×576 size) of a person engaged in conversation who blinks 61 times. The last public dataset is the ZJU Eyeblink Database (ZJU) (Pan et al., 2007). It contains 80 short video clips (10876 frames of 320×240 size) of 20 individuals (13 males and 7 females). Subjects are captured in frontal and upward views, with and without glasses. There are 255 complete blinks in total.

We had a Logitech C905 webcam available for testing; it can record up to 30 fps even in poor light conditions.

We have tested our algorithms and compared their blink detection abilities to the optical flow method in (Divjak – Bischof, 2009) (Section 3.1). We expected our results to be more similar, so we have doubts about the proper computation of the false positive rate in that method. We think the authors computed the false discovery rate, which is the ratio of false positives to the total number of detections; this would explain their high false positive rate values. The authors do not explain how they computed the detection rate on the ZJU Eyeblink Database. The results are shown in Table 5.1. The best overall true and false positive rates are achieved by Centre Aligned Movement Detection (CAMD). It detects 93.75% of blinks on our own dataset, 98.36% of blinks on the Talking Face Video and 89.85% of blinks on the ZJU Eyeblink Database.

Backprojection using hue-saturation (Hist. HS) and saturation histograms (Hist. S) provides similar accuracies on our own dataset and the Talking Face Video. The values of the saturation channel differ between skin and pupil under various light conditions, so it often provides reliable and sufficient information about the user's blinks. However, images in the ZJU Eyeblink Database differ only slightly in the saturation channel within the pupil, so the 1D saturation histogram cannot be used for blink detection on this dataset. On the other hand, the hue channel often differs over the whole eye region, but sometimes shows no significant changes when the eye blinks, mostly in very dark images. We suggest using both the hue and saturation channels in order to cover more cases where the eye area can be extracted by

1http://www-prima.inrialpes.fr/FGnet/data/01-TalkingFace/talking_face.html


Table 5.1: Comparison of our blink detection algorithms to the method in (Divjak – Bischof, 2009). TP represents the true positive rate and FP the false positive rate.

Method                       | Own dataset     | TALKING         | ZJU
                             | TP (%) | FP (%) | TP (%) | FP (%) | TP (%) | FP (%)
Backprojection (Hist. S)     | 81.25  | 0.40   | 88.52  | 0.49   | -      | -
Backprojection (Hist. HS)    | 75.00  | 0.32   | 85.25  | 0.47   | 61.65  | 3.63
CAMD                         | 93.75  | 0.05   | 98.36  | 0.20   | 89.85  | 0.02
Frame difference (grayscale) | 85.16  | 0.35   | 88.52  | 1.17   | 96.24  | 0.22
Frame difference (RGB)       | 86.72  | 0.43   | 90.16  | 0.83   | 95.86  | 0.21
FREAK descriptor             | 85.94  | 0.52   | 93.44  | 0.00   | 82.33  | 0.07
(Divjak – Bischof, 2009)     | -      | -      | 95     | 19     | 95±12  | 2±6

backprojection. False detections are the result of luminance changes, poor light conditions, changes in gaze direction, reflections from glasses, a very small distance between the eyelids, facial mimicry such as smiling, and eyelid makeup. In such cases it is very difficult to recognise whether a user blinks or not. Backprojection using the hue-saturation histogram misses many blinks when an individual wears glasses.

CAMD, our best method, has only 16 false positive cases. False detections are caused mainly by rapid head movements, lowering the gaze and reflections from glasses. The very small distance between the eyelids of some Chinese subjects in the ZJU Eyeblink Database decreases the true positive rate on this dataset. CAMD has the best trade-off between the true and false positive rates and the fewest false positives for users wearing glasses. This method has only one missed blink in the Talking Face Video, caused by gaze lowering.

The frame difference method using colour and grayscale images achieves almost the same detection rate. Despite the attempt to reduce false detections, it is still difficult to distinguish open and closed eyes during fast head and pupil movements or facial mimicry. Missed blinks also occur in poor lighting conditions, especially using grayscale images. Many false detections are also caused by wearing glasses. This method has the highest true positive rate on the ZJU Eyeblink Database; however, it has too many false positives, so it is reliable only when the user is relatively calm.

The method based on the FREAK descriptor has the most false detections, due to glasses and pupil movements. Very poor lighting conditions cause some missed blinks. This method could achieve a better detection rate with more precise pupil detection: the Haar Cascade Classifier detects the whole eye area, so we often choose skin instead of the eye as a keypoint.

The use of the Haar Cascade Classifier causes problems for all proposed blink detection methods. Haar-based eye detection often fails on dark images, when the user wears glasses, or when the user does not look directly at the computer screen. Exact eye or pupil detection would increase the true positive rate of all algorithms.

We evaluated the average image processing time of our blink detection algorithms. We implemented them in a C++ console application using the OpenCV2 library, which detects blinks in a selected dataset (frames of 640×480 size). According to the results in Table 5.2

2http://opencv.willowgarage.com


obtained on our computer (Intel Core i7-2670QM, 2.20 GHz processor and 6 GB RAM), all algorithms run in real-time.

The CAMD method processes one image in 13.28 msec on average. Reinitialisation, which detects the face and eyes, requires more time than the average, but is still fast enough for real-time operation. The processing time is also influenced by the size of the tracked face and eyes, since the number of tracked features depends on the size of the user's eyes. If reading the images from the hard drive is not included in the average time, the CAMD algorithm needs only about 5 msec to process an image. Our webcam captures frames every 33 msec, so CAMD is able to process all 30 frames per second from the webcam.

Table 5.2: Average processing time per image of our blink detection algorithms, tested on several image sequences of 640×480 size from our dataset.

Method                       | Average time per image
Backprojection (Hist. S)     | 18.89 msec
Backprojection (Hist. HS)    | 19.25 msec
CAMD                         | 13.28 msec
Frame difference (grayscale) | 12.51 msec
Frame difference (RGB)       | 13.82 msec
FREAK descriptor             | 14.35 msec


Chapter 6

Conclusion

In this thesis we have focused on designing an eye blink detection algorithm. We have proposed four different blink detection techniques, all of which can run in real-time. The first method detects blinks by backprojection using a saturation or hue-saturation histogram. The second method is based on the KLT feature tracker, which tracks eyelid motions. The next method computes a frame difference in grayscale and colour images, and the last one is based on FREAK descriptors.

All algorithms are implemented in C++ using the OpenCV library.

We have compared all methods and evaluated them on different datasets. The model situation is a user looking at the computer screen. After analysing all proposed techniques, we consider Centre Aligned Movement Detection to be our best method. It outperforms the method in (Divjak – Bischof, 2009) with a 3% better true positive rate and an about 18% lower false positive rate on the Talking Face Video dataset. This method represents the best compromise between a high true positive and a low false positive rate. In the future it can be extended to cover more complex user behaviour in front of the camera. This algorithm is well suited for deployment in a desktop application that detects the user's blinks. Its blink detection ability can be used to analyse the frequency of the user's blinks, warn the user to blink regularly, and in this way protect the eyes from drying.

Acknowledgement: This project is partially supported by Tatra bank foundation E-Talent 2012et009.


Bibliography

ALAHI, A. – ORTIZ, R. – VANDERGHEYNST, P. FREAK: Fast Retina Keypoint. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), CVPR '12, pages 510–517, Washington, DC, USA, 2012. IEEE Computer Society. Available from: http://dl.acm.org/citation.cfm?id=2354409.2354903. ISBN 978-1-4673-1226-4.

ARAI, K. – MARDIYANTO, R. Comparative Study on Blink Detection and Gaze Estimation Methods for HCI, in Particular, Gabor Filter Utilized Blink Detection Method. In Proceedings of the 2011 Eighth International Conference on Information Technology: New Generations, ITNG '11, pages 441–446, Washington, DC, USA, 2011. IEEE Computer Society. doi: 10.1109/ITNG.2011.84. Available from: http://dx.doi.org/10.1109/ITNG.2011.84. ISBN 978-0-7695-4367-3.

AYUDHAYA, C. D. N. – SRINARK, T. A method for a real time eye blink detection and its applications. In The 6th International Joint Conference on Computer Science and Software Engineering (JCSSE), pages 25–30, May 2009. Available from: http://www.cpe.ku.ac.th/~jeab/papers/chinnawat_JCSSE2009.pdf.

BHASKAR, T. N. et al. Blink detection and eye tracking for eye localization. In TENCON 2003. Conference on Convergent Technologies for Asia-Pacific Region, volume 2, pages 821–824, October 2003. doi: 10.1109/TENCON.2003.1273293.

BLEHM, C. et al. Computer Vision Syndrome: A Review. Survey of Ophthalmology. 2005, 50, 3, pages 253–262. ISSN 0039-6257. doi: 10.1016/j.survophthal.2005.02.008. Available from: http://www.sciencedirect.com/science/article/pii/S0039625705000093.

BRADSKI, G. Computer Vision Face Tracking For Use in a Perceptual User Interface. In Intel Technology Journal, Q2, 1998.

BRADSKI, G. – KAEHLER, A. Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly Media, 1st edition, 2008. ISBN 0-596-51613-4.

CASTRO, F. L. Class I infrared eye blinking detector. Sensors and Actuators A: Physical. 2008, 148, 2, pages 388–394. ISSN 0924-4247. doi: 10.1016/j.sna.2008.09.005. Available from: http://www.sciencedirect.com/science/article/pii/S0924424708004718.

CHAU, M. – BETKE, M. Real Time Eye Tracking and Blink Detection with USB Cameras. Boston University Computer Science. 2005, pages 1–10. Available from: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.73.3862&rep=rep1&type=pdf.


COMEAU, J. D. – GODNIG, E. C. Computer Use and Vision. In Proceedings of the 27th Annual ACM SIGUCCS Conference on User Services: Mile High Expectations 1999, Denver, Colorado, USA, pages 31–34. ACM, 1999. doi: http://doi.acm.org/10.1145/337043.337078. ISBN 1-58113-144-5.

CERNÁK, A. – CERNÁK, M. Suché oko [Dry eye]. Ocná klinika SZU, FNsP, Bratislava, nemocnica sv. Cyrila a Metoda. 2007. Available from: http://www.solen.sk/index.php?page=pdf_view&pdf_id=2880&magazine_id=12.

DENTENEER, T. J. J. – JASINSCHI, R. S. Statistics of the Visual Normal Flow. Technical Note TN-2004/01133, Philips Research Europe, 2004. Available from: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.68.5574.

DIVJAK, M. – BISCHOF, H. Real-time video-based eye blink analysis for detection of low blink-rate during computer use. In First International Workshop on Tracking Humans for the Evaluation of their Motion in Image Sequences (THEMIS 2008), pages 99–107, September 2008.

DIVJAK, M. – BISCHOF, H. Eye blink based fatigue detection for prevention of Computer Vision Syndrome. In IAPR Conference on Machine Vision Applications (MVA 2009), pages 350–353, May 2009.

GARNAVI, R. et al. Skin Lesion Segmentation Using Color Channel Optimization and Clustering-based Histogram. World Academy of Science, Engineering and Technology. 2009, 36.

GRAUMAN, K. et al. Communication via eye blinks - detection and duration analysis in real time. In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, volume 1, pages I-1010–I-1017, 2001. doi: 10.1109/CVPR.2001.990641.

HASSANPOUR, R. – SHAHBAHRAMI, A. – WONG, S. Adaptive Gaussian Mixture Model for Skin Color Segmentation. World Academy of Science, Engineering and Technology. 2008, 41.

HEISHMAN, R. – DURIC, Z. Using Image Flow to Detect Eye Blinks in Color Videos. In Applications of Computer Vision, 2007. WACV '07. IEEE Workshop on, page 52, February 2007. doi: 10.1109/WACV.2007.61.

KHILARI, R. Iris tracking and blink detection for human-computer interaction using a low resolution webcam. In Proceedings of the Seventh Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP '10, pages 456–463, New York, NY, USA, 2010. ACM. doi: 10.1145/1924559.1924620. Available from: http://doi.acm.org/10.1145/1924559.1924620. ISBN 978-1-4503-0060-5.

LAGANIÈRE, R. OpenCV 2 Computer Vision Application Programming Cookbook. Packt Publishing Ltd., 2011. ISBN 1849513244.

LALONDE, M. et al. Real-time eye blink detection with GPU-based SIFT tracking. In Proceedings of the Fourth Canadian Conference on Computer and Robot Vision, CRV '07, pages 481–487, Washington, DC, USA, 2007. IEEE Computer Society. doi: 10.1109/CRV.2007.54. Available from: http://dx.doi.org/10.1109/CRV.2007.54. ISBN 0-7695-2786-8.


LITING, W. et al. Eye Blink Detection Based on Eye Contour Extraction. In Image Processing: Algorithms and Systems, page 72450R. SPIE Electronics Imaging, 2009. Available from: http://link.aip.org/link/?PSISDG/7245/72450R/1.

LUCAS, B. D. – KANADE, T. An iterative image registration technique with an application to stereo vision. In Proceedings of the 7th international joint conference on Artificial intelligence - Volume 2, IJCAI'81, pages 674–679, San Francisco, CA, USA, 1981. Morgan Kaufmann Publishers Inc. Available from: http://dl.acm.org/citation.cfm?id=1623264.1623280.

MORIYAMA, T. et al. Automatic recognition of eye blinking in spontaneously occurring behavior. In Proceedings of the 16th International Conference on Pattern Recognition (ICPR 2002), volume 4, pages 78–81, August 2002.

MORRIS, T. – BLENKHORN, P. – ZAIDI, F. Blink detection for real-time eye tracking. J. Netw. Comput. Appl. April 2002, 25, 2, pages 129–143. ISSN 1084-8045. doi: 10.1016/S1084-8045(02)90130-X. Available from: http://dx.doi.org/10.1016/S1084-8045(02)90130-X.

NAWAZ, Y. – SIRCAR, S. Real Time Eye Tracking and Blink Detection Using Low Resolution Web Cam. In IEEE NECEC, St Johns, Canada, November 2003.

PAN, G. et al. Eyeblink-based Anti-Spoofing in Face Recognition from a Generic Webcamera. In The 11th IEEE International Conference on Computer Vision (ICCV'07), Rio de Janeiro, Brazil, October 2007.

PANDER, T. – PRZYBYLA, T. – CZABANSKI, R. An application of detection function for the eye blinking detection. In Human System Interactions, 2008 Conference on, pages 287–291, May 2008. doi: 10.1109/HSI.2008.4581450.

PANNING, A. – AL-HAMADI, A. – MICHAELIS, B. A color based approach for eye blink detection in image sequences. In Signal and Image Processing Applications (ICSIPA), 2011 IEEE International Conference on, pages 40–45, 2011. doi: 10.1109/ICSIPA.2011.6144085.

RODRIGUEZ, J. D. et al. Investigation of extended blinks and interblink intervals in subjects with and without dry eye. Clinical Ophthalmology (Auckland, N.Z.). January 2013, pages 337–342. doi: 10.2147/OPTH.S39356.

ROSENFIELD, M. Computer vision syndrome: a review of ocular causes and potential treatments. Ophthalmic and Physiological Optics. 2011, 31, 5, pages 502–515. ISSN 1475-1313. doi: 10.1111/j.1475-1313.2011.00834.x. Available from: http://dx.doi.org/10.1111/j.1475-1313.2011.00834.x.

RYAN, S. B. et al. A long-range, wide field-of-view infrared eyeblink detector. Journal of Neuroscience Methods. 2006, 152, pages 74–82. ISSN 0165-0270. doi: 10.1016/j.jneumeth.2005.08.011. Available from: http://www.sciencedirect.com/science/article/pii/S0165027005003018.

SHI, J. – TOMASI, C. Good features to track. In Computer Vision and Pattern Recognition, 1994. Proceedings CVPR '94., 1994 IEEE Computer Society Conference on, pages 593–600, 1994. doi: 10.1109/CVPR.1994.323794.


STERN, J. A. – WALRATH, L. C. – GOLDSTEIN, R. The Endogenous Eyeblink. Psychophysiology. 1984, 21, 1, pages 22–33. ISSN 1469-8986. doi: 10.1111/j.1469-8986.1984.tb02312.x. Available from: http://dx.doi.org/10.1111/j.1469-8986.1984.tb02312.x.

TOMASI, C. – KANADE, T. Detection and Tracking of Point Features. Technical Report CMU-CS-91-132. Image Rochester NY. 1991, 91, April, pages 1–22. Available from: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.131.5899&rep=rep1&type=pdf.

VIOLA, P. A. – JONES, M. J. Rapid Object Detection using a Boosted Cascade of Simple Features. In CVPR, pages 511–518. IEEE Computer Society, 2001. Available from: http://dblp.uni-trier.de/db/conf/cvpr/cvpr2001-1.html#ViolaJ01. ISBN 0-7695-1272-0.

XU, C. – PRINCE, J. L. Handbook of medical imaging. Orlando, FL, USA: Academic Press, Inc., 2000. Gradient vector flow deformable models, pages 159–169. Available from: http://dl.acm.org/citation.cfm?id=374166.374181. ISBN 0-12-077790-8.

YAN, Z. et al. Computer Vision Syndrome: A widely spreading but largely unknown epidemic among computer users. Computers in Human Behavior. 2008, 24, 5, pages 2026–2042. ISSN 0747-5632. doi: 10.1016/j.chb.2007.09.004. Available from: http://www.sciencedirect.com/science/article/pii/S0747563207001501.

YUNQI, L. et al. Recognition of Eye States in Real Time Video. In Computer Engineering and Technology, 2009. ICCET '09. International Conference on, volume 1, pages 554–559, 2009. doi: 10.1109/ICCET.2009.105.


Appendix A

Technical Documentation

Our application is implemented in the C++ language using the OpenCV library. In this chapter we describe some interesting parts of our algorithms. The complete source code of our application is attached on the DVD.

We describe the most important OpenCV functions we have used in the application1:

• CascadeClassifier::detectMultiScale: detects objects of different sizes in the image,

• calcHist: calculates a histogram,

• calcBackProject: calculates the backprojection of a histogram,

• absdiff: computes an absolute difference between two images,

• calcOpticalFlowPyrLK: computes the optical flow using the pyramid implementation of the KLT tracker,

• DescriptorExtractor::compute: computes descriptors for keypoints,

• DescriptorMatcher::match: finds the best match for each descriptor.

The main ideas of our blink detection algorithms are presented in:

• Listing A.1: histogram backprojection method,

• Listing A.2: frame difference method,

• Listing A.3: FREAK descriptor method,

• Listing A.4: Centre Aligned Movement Detection.

1http://opencv.willowgarage.com


Listing A.1: Histogram backprojection method.

if(!faceRect(img, face, faceCascade) && !init) { i++; continue; }
img = img(face); // use only the face

cvtColor(img, hsv, CV_BGR2HSV);

/* we use a histogram computed from an image sequence (the sum of 50 histograms) */
if(init < 50 || i % frequency == 0) // regularly update the histogram
{
    if(init < 50) init++;
    Mat small = hsv(Rect(hsv.size().width/6, 0,
        hsv.size().width - 2*hsv.size().width/6,
        hsv.size().height)); // the background of the facial area is cropped more
    hists[pos++] = h.getHistogram(&small, 1);
    if(pos == 50) pos = 0;
    for(int j = 0; j < 50; j++)
        if(j == 0) hist = hists[j];
        else if(!hists[j].empty()) hist += hists[j];
    normalizeHistogram(hist, 255, HS); // normalize the histogram by the second maximum value
}

Scalar col;
MatND backproj;
calcBackProject(&hsv, 1, h.channels, hist, backproj, h.ranges); // create a backprojected image using the histogram
morphologyEx(backproj, backproj, MORPH_ERODE, Mat(), Point(-1, -1), 1);

Mat roi, thresholded;
threshold(backproj, thresholded, 10, 255, CV_THRESH_BINARY);
roi = thresholded(Rect(thresholded.size().width/5,
    thresholded.size().height/4,
    thresholded.size().width - thresholded.size().width*2/5,
    thresholded.size().height/5)); // the eye area
col = mean(roi); // the mean of the backprojection
val = col.val[0];


Listing A.2: Frame difference method.

Mat diff;
absdiff(frame_prev, frame_next, diff); // frame difference
threshold(diff, diff, 15, 255, CV_THRESH_BINARY);
morphologyEx(diff, diff, MORPH_ERODE, Mat(), Point(-1, -1), 1);

...

eyeArea = diff(leftRect); // frame difference within the left eye
double leftDiff = mean(eyeArea).val[0];
eyeArea = diff(rightRect); // frame difference within the right eye
double rightDiff = mean(eyeArea).val[0];
eyeDiff = (leftDiff + rightDiff) / 2.; // the average ocular difference

rectangle(diff, leftRect, Scalar(0), -1); // remove the frame difference within the left eye
rectangle(diff, rightRect, Scalar(0), -1); // remove the frame difference within the right eye
diff = diff(faceRect); // frame difference within the rest of the face (without the eyes)
faceDiff = mean(diff).val[0];

Listing A.3: FREAK descriptor method.

freak.compute(frame_next, keys, des_next); // compute a matrix of FREAK descriptors

BruteForceMatcher<Hamming> matcher;
vector<DMatch> matches;
matcher.match(des_prev, des_next, matches); // match descriptors

int hamming = 0, rowHamming = 0;
for(int i = 0; i < matches.size(); i++) // compute the Hamming distance
{
    hamming += (int) matches[i].distance;
    if(matches[i].distance >= 100) rowHamming++;
}


Listing A.4: Centre Aligned Movement Detection.

calcOpticalFlowPyrLK(frame, frame_next, points[1], points[2], status, err); // compute the optical flow

for(int i = 0; i < points[2].size(); i++) { // compute the Euclidean distance and the number of points within each group of points
    if(status[i]) {
        dx = points[2][i].x - points[0][i].x;
        dy = points[2][i].y - points[0][i].y;
        distance = sqrt(dx*dx + dy*dy); // Euclidean distance
        if(eyePoints[i] == LEFTEYE) { // left eye
            leftCount++;
            leftAvg += distance;
        } else if(eyePoints[i] == RIGHTEYE) { // right eye
            rightCount++;
            rightAvg += distance;
        } else { // nose
            count++;
            avg += distance;
            yDist += points[2][i].y - points[0][i].y;
        }
    } else lost++; // lost points
}

/* the average displacement of feature points */
if(count) avg /= count;                 // average displacement of non-ocular points (Euclidean distance)
if(leftCount) leftAvg /= leftCount;     // average displacement of left ocular points (Euclidean distance)
if(rightCount) rightAvg /= rightCount;  // average displacement of right ocular points (Euclidean distance)
if(yDist) yDist /= count;               // average displacement of non-ocular points in the y-axis

double finalDiff = max(abs(leftAvg - avg), abs(rightAvg - avg)); // difference between ocular and non-ocular displacement


Appendix B

User Guide

In order to run our application, the following requirements must be satisfied:

1. Microsoft Visual Studio 2010,

2. OpenCV 2.4.2 compiled for Microsoft Visual Studio 2010:

• the default directory: C:\opencv 2.4.2,

• Library Directories: C:\opencv 2.4.2\build\Qt_gpu_tbb\lib,

3. Intel Threading Building Blocks (TBB) (the default directory is C:\tbb40_20120408oss),

4. Copy the whole solution directory to your hard drive,

5. If OpenCV and Intel TBB are installed in directories other than the default ones, the paths to the Include and Library Directories must be adjusted in the project property files: configOpenCVDebug.props and configOpenCVRelease.props.

You can then open the project in Microsoft Visual Studio 2010. It contains all the mentioned blink detection algorithms.

The "offline" versions of the algorithms read images from a selected folder. The folder must contain jpg image files named image-XXXX.jpg, where XXXX is the ordinal number (starting with 0001). The application processes the images one by one and displays relevant information about the current image. It reports each detected blink through a console output. At the end of processing (stopped by the user by pressing ESC, or when no unprocessed images are left) it displays statistics including the number of blinks, the ordinal numbers of the frames where blinks were detected and a graph of the values computed from the processed images, which our application compared with the threshold.
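For reference, the expected file naming can be generated with a small helper; makeFrameName is a hypothetical function, not part of the shipped project:

```cpp
#include <cstdio>
#include <string>

// Builds the file name expected by the "offline" algorithms:
// image-0001.jpg, image-0002.jpg, ... (four-digit, zero-padded ordinal).
std::string makeFrameName(int ordinal) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "image-%04d.jpg", ordinal);
    return std::string(buf);
}
```

A capture tool only needs to increment the ordinal for every stored frame to produce a valid input folder.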

Our best algorithm, Centre Aligned Movement Detection, is also implemented in an "online" version which processes images obtained from the webcam. We recommend using a webcam with at least 25 fps (preferably 30 fps). When it detects a blink, it prints the time of detection on the console output.

After running the project, the displayed menu offers the following features:

• Backprojection (Histogram HS): It runs the "offline" blink detection algorithm based on backprojection using a 2D hue-saturation histogram.

• Backprojection (Histogram S): It runs the "offline" blink detection algorithm based on backprojection using a 1D saturation histogram.

• Frame difference (grayscale): It runs the "offline" blink detection algorithm based on frame difference using grayscale images.

• Frame difference (RGB): It runs the "offline" blink detection algorithm based on frame difference using colour RGB images.

• FREAK descriptor: It runs the "offline" blink detection algorithm based on FREAK descriptors.

• CAMD: offline: It runs the "offline" blink detection algorithm called Centre Aligned Movement Detection, based on the KLT tracker.

• CAMD: online: It runs the "online" blink detection algorithm called Centre Aligned Movement Detection, based on the KLT tracker.

• Capture images: It captures images from the webcam and stores them in a selected folder (creating the folder if it does not exist). You can use them as input images for the "offline" blink detection algorithms.

• Convert video to images: It converts a video file (.avi) to a sequence of images (.jpg). You can use them as input images for the "offline" blink detection algorithms.

For testing our algorithms, you can also use image sequences from our own dataset (attached on DVD).

Backprojection

The backprojection method displays, for each frame, the original input image, separate colour channels of the input (according to the used histogram) and the adjusted backprojected image within the face and eye areas. The application writes the filename and the average value of the backprojection within the eyes to the console.

Frame difference

The frame difference method displays the input and the resulting frame difference of the whole image, within the eyes and within the rest of the face. It writes the current filename, the average value of the frame difference within the eyes and within the rest of the face, and the difference between them to the console.

FREAK Descriptor

The method which computes FREAK descriptors displays the input as well as the positions of the keypoints described by the descriptors (yellow points). It writes the filename to the console together with the number of descriptors, the summed Hamming distance of all matched descriptors and the number of descriptors with more than 100 differing bits.
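FREAK descriptors are binary, so the Hamming distance between two descriptors is simply the number of differing bits. The following standalone sketch (without OpenCV, with an illustrative function name) shows what a Hamming-based matcher computes per descriptor pair:

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hamming distance between two binary descriptors (a FREAK descriptor is
// a fixed-length byte string): the number of bits in which they differ.
int hammingDistance(const std::vector<std::uint8_t>& a,
                    const std::vector<std::uint8_t>& b) {
    int dist = 0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i)
        dist += (int)std::bitset<8>(a[i] ^ b[i]).count(); // differing bits per byte
    return dist;
}
```

Summing this value over all matched pairs, and counting pairs whose distance exceeds 100, reproduces the two statistics printed by the application.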


CAMD: offline

CAMD displays the original image and the image with the tracked KLT features and their displacement. The points within the eyes are represented by yellow points and the points within the nose by blue points. Green lines represent the displacement of ocular points and red lines the displacement of non-ocular points. The application writes to the console the name of the current image, the number of features within the nose and eyes, the difference between the average displacements of both groups of points, the number of ocular points which moved downwards and all threshold values.

CAMD: online

The "online" version of CAMD requires a webcam. It displays the currently captured image from the webcam with the tracked features (using the same representation as in CAMD: offline) (Figure B.1) and writes the time of each detected blink to the console.

Figure B.1: Example of tracking feature points in the "online" version of CAMD.



Appendix C

IIT.SRC 2013 paper

IIT.SRC 2013 is the student research conference in Informatics and Information Technologies organised by the Faculty of Informatics and Information Technologies of the Slovak University of Technology in Bratislava. Our full paper was accepted for this conference and presented in the poster session.


Eye Blink Detection

Patrik POLATSEK∗

Slovak University of Technology in Bratislava
Faculty of Informatics and Information Technologies

Ilkovicova 3, 842 16 Bratislava, [email protected]

Abstract. Nowadays, people spend more time in front of electronic screens like computers, laptops, TV screens, mobile phones or tablets, which causes eye blink frequency to decrease. Each blink spreads the tears on the eye cornea to moisten and disinfect the eye. A reduced blink rate causes eye redness and dryness, also known as Dry Eye, which belongs to the major symptoms of the Computer Vision Syndrome. The goal of this work is to design an eye blink detector which can be used in a dry eye prevention system. We have analyzed available techniques for blink detection and designed our own solutions based on histogram backprojection and optical flow methods. We have tested our algorithms on different datasets under various lighting conditions. The inner movement detection method based on optical flow performs better than the histogram-based ones. We achieve a higher recognition rate and a much lower false positive rate than the state-of-the-art technique presented by Divjak and Bischof.

1 Introduction

The number of people using computers every day increases. There are also more people who suffer from symptoms collectively called Computer Vision Syndrome (CVS), a set of problems related to computer use. The rate of unconscious eye blinking while looking at luminous objects within close distance reduces significantly (up to a 60% reduction). Blinking helps us to spread the tear film and moisten the surface of the eye, which is why a reduced rate of blinking leads to Dry Eye. Typical ocular complaints experienced during intensive computer work (more than 3 hours per day) include dryness, redness, burning, sandy-gritty eye irritation, sensitivity to light and eye fatigue. The easiest way to avoid the symptoms of Dry Eye is to blink regularly [3, 13].

Our aim is to create an eye blink detector which could be used in a real-time blink detection system. In case of a low blink rate, it will notify the user to blink more frequently. This paper proposes two different methods of blink detection. The first of the presented algorithms computes backprojection from a 1D saturation and a 2D hue-saturation histogram. The second method, addressed as inner movement detection, detects eyelid motion using the Lucas-Kanade (KLT) feature tracker [11].

∗ Bachelor study programme in field: Informatics. Supervisor: Andrej Fogelton, Institute of Applied Informatics, Faculty of Informatics and Information Technologies STU in Bratislava

IIT.SRC 2013, Bratislava, April 23, 2013, pp. 1–8.


2 Related Work

Optical flow in [7] tracks eyelid movements to detect eye blinks. Detection is based on matching SIFT (scale-invariant feature transform) descriptors computed on the GPU. First, a thresholded frame difference inside the eye region locates motion regions. Consequently, these regions are used to calculate the optical flow. While the user blinks, the eyelids move up and down and the dominant motion is in the vertical direction. This method detects 97% of blinks on their dataset. Most of the false positive detections are the result of gaze lowering and vertical head movements.

A method based on optical flow estimation is also presented in [4]. It locates the eye and face positions with 3 different classifiers. The algorithm is successful mostly when the head is directly facing the camera. The KLT tracker is used to track the detected feature points. This blink detector uses GPU-based optical flow in the face region. The flow within the eyes is compensated for the global face movement, normalized and corrected in rotation when the eyes are in a non-horizontal position. Afterwards the dominant orientation of the flow is estimated. The flow data are processed by an adaptive threshold to detect eye blinks. The authors report a good blink detection rate (more than 90%). However, this approach has problems with detecting blinks when the eyes are quickly moving up and down.

The eyelid movements are estimated by normal flow instead of optical flow in [6]. Normal flow is the component of optical flow that is orthogonal to the image gradient. The authors claim that the computation of normal flow is more effective than the previous method.

Arai et al. present a Gabor filter-based method for blink detection in [1]. The Gabor filter is a linear filter used for extracting contours within the eye. After applying the filter, the distance between the detected top and bottom arcs in the eye region is measured. Different distances indicate a closed or an open eye. The problem of arc extraction arises while the person is looking down.

A variance map specifies the distribution of intensities around the mean value in an image sequence. The intensity of pixels located in the eye region changes during a blink, which can be used in the detection process as in [10].

Correlation measures the similarity between the current eye image and an open eye image. As one closes the eyes during a blink, the correlation coefficient decreases. Blink detection via correlation for immobile people is presented in [5].

A blink detection algorithm in [9] is based on the fact that the upper and lower parts of the eye have different distributions of mean intensities for open eyes and blinks. These intensities cross during the eyelid closing and opening.

Liting et al. [8] use a deformable model, the Active Shape Model, represented by several landmarks as the eye contour shape. The model learns the appearance around each landmark and fits it in the current frame to obtain a new eye shape. Blinks are detected by measuring the distance between the upper and lower eyelid.

Ayudhaya et al. [2] detect blinks by calculating the eyelid's state detecting (ESD) value. The threshold is increased until the resulting image has at least one black pixel after applying median blur filtering. This threshold value (ESD) differs while the user blinks.

3 Proposed Algorithms

Since our goal is a CVS prevention system, our main focus is on the model situation where the user is facing the computer screen. Because of this, a high recognition rate together with efficient computation is necessary.

We introduce two methods based on histogram backprojection and the inner movement detection method based on the KLT feature tracker.

3.1 Histogram Backprojection

We use a histogram to represent the skin color of the user. Histogram backprojection creates a probability map over the image. In other words, backprojection determines how well the pixels from the image


fit the distribution of a given histogram. A higher value in a backprojected image denotes a more likely location of the given object. We detect closed eyes as a high percentage of skin color pixels within the eye region; otherwise we consider the eyes open (Figure 1).

(a) Source images. (b) Backprojected images.

Figure 1. Histogram backprojection for a person with open and closed eyes.

We use the HSV (Hue Saturation Value) color model to achieve partial luminance invariance by omitting the Value channel. We experimented with two different histograms:

– 1D saturation histogram (histogram S),

– 2D hue-saturation histogram (histogram HS).

First we detect the user's face by the Haar Cascade Classifier [12]. We calculate the skin color histogram from a sequence of images of face regions. Other parts of the image are not used, to obtain as precise a skin color histogram as possible. The histogram is normalized afterwards and regularly updated. For every input image we calculate the backprojection with this histogram. Subsequently, the resulting backprojected image is modified using morphological operations (Open and Erode) and a threshold (thresholdHS = 10 for the hue-saturation and thresholdS = 25 for the saturation histogram, obtained by experiments) to increase the small difference between open and closed eyelids, caused by the lower skin probability of eyelids due to shadows in the eye areas or make-up. Finally, the average value of the probabilities is calculated from the region of the user's eyes. A significant increase is considered an eye blink of the user. Figures 3(a) and 3(b) illustrate the results of backprojection using the different histograms.
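The backprojection step itself can be sketched without OpenCV; the following is a minimal illustration of the idea for the 1D saturation histogram (names are illustrative, not from the thesis code):

```cpp
#include <cstddef>
#include <vector>

// Backprojects a normalized 1-D saturation histogram onto pixels: every
// pixel is replaced by the value of its histogram bin, i.e. the
// likelihood that the pixel shows skin.
std::vector<double> backproject(const std::vector<int>& saturation,  // 0..255 per pixel
                                const std::vector<double>& hist) {   // normalized bins
    std::vector<double> prob(saturation.size());
    int binWidth = 256 / (int)hist.size();
    for (std::size_t i = 0; i < saturation.size(); ++i)
        prob[i] = hist[saturation[i] / binWidth];
    return prob;
}

// Mean probability over the eye region; a significant increase of this
// value is interpreted as skin covering the eye, i.e. closed eyelids.
double meanProbability(const std::vector<double>& prob) {
    double sum = 0.0;
    for (double p : prob) sum += p;
    return prob.empty() ? 0.0 : sum / prob.size();
}
```

In the real pipeline the probability map is additionally thresholded and cleaned by morphological operations before the eye-region mean is taken.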

3.2 Inner Movement Detection

We introduce our own Inner Movement Detection algorithm based on optical flow. Optical flow locates the new feature positions in the following frame. One of the most common methods, the KLT tracker [11], selects features suitable for tracking, i.e. those with high intensity changes in both directions.

If a user blinks, the mean displacement of feature points within the eye region should be greaterthan the displacement of the rest of the points within the face area (Figure 2).

The first step consists of localizing the user's face and eyes using the Haar Cascade Classifier [12] on a grayscale image. We initialize random KLT features within the eye and nose regions and classify them as left ocular, right ocular or non-ocular. These features are tracked by the KLT tracker. Tracking is reinitialized at regular intervals or in case of the loss of many feature points. We compute the average displacement separately for the three groups of points. Afterwards we compare the difference between the left or right ocular and the non-ocular movement displacements. If this difference exceeds a threshold value (thresholddiff = face.height/165, where face.height is the height of the detected face in the initial phase), a movement within the eye region is anticipated. Consequently, we count the ratio of ocular points that moved down at least a specific


distance in the direction of the y-axis (distancey = face.height/110) in order to exclude false positives caused by horizontal eye movements. For the proper computation of the ratio, we eliminate the vertical ocular displacement caused by head movements: the ocular points are shifted by a distance equal to the average displacement of the non-ocular points. If the ratio is higher than a threshold (5% of the displacements of one group of ocular points and 2% of the displacements of the second one), we consider it a blink. Figure 3(c) represents a graph of values defined as max(abs(avg(left) − avg(non)), abs(avg(right) − avg(non))), where max and abs are the maximum and absolute value, avg indicates the average movement within a given region, and left, right and non denote the left ocular, right ocular and non-ocular regions.
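The decision logic described above can be condensed into two small helpers; this is a sketch under the stated constants, with hypothetical function and parameter names:

```cpp
#include <algorithm>
#include <cmath>

// Sketch of the movement test: faceHeight is the height of the face
// detected in the initial phase; leftAvg/rightAvg/avg are the average
// displacements of the left ocular, right ocular and non-ocular points.
bool eyeMovementDetected(double leftAvg, double rightAvg, double avg,
                         double faceHeight) {
    double threshold = faceHeight / 165.0; // threshold_diff from the text
    double diff = std::max(std::fabs(leftAvg - avg), std::fabs(rightAvg - avg));
    return diff > threshold;
}

// A detected movement is accepted as a blink when enough ocular points
// moved downwards: at least 5% in one eye and 2% in the other (ratios
// computed after compensating for the non-ocular displacement).
bool isBlink(double downRatioLeft, double downRatioRight) {
    return (downRatioLeft >= 0.05 && downRatioRight >= 0.02) ||
           (downRatioRight >= 0.05 && downRatioLeft >= 0.02);
}
```

For a 330-pixel face, for example, the movement threshold is 2 pixels of ocular-versus-non-ocular displacement difference.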

(a) Source images. (b) Displacement of feature points.

Figure 2. Tracking of feature points using the KLT feature tracker during an eye blink.

4 Evaluation

Our blink detection algorithms are evaluated on two datasets. Our own dataset includes 8 individuals (5 males and 3 females, one person wearing glasses) recorded under different lighting conditions, who sit in front of a computer screen mostly in a stable position, looking directly at the screen. It consists of 7569 frames and 128 blinks. The second image sequence, the Talking Face Video (TALKING), is publicly available from Inria1. It includes 5000 images of a person engaged in conversation who blinks 61 times.

We have tested our algorithms and compared their blink detection abilities to the optical flow method mentioned in [4]. The results are shown in Table 1. The best true and false positive rates are achieved by inner movement detection. It detects 93.75% of blinks on our own dataset and 98.36% of blinks on the Talking Face Video.
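The reported rates presumably correspond to raw counts: 93.75% of the 128 annotated blinks is 120 detections, and 98.36% of 61 blinks is 60. A trivial helper (hypothetical, for illustration only) makes the conversion explicit:

```cpp
#include <cmath>

// True positive rate in percent: detected blinks / annotated blinks.
double truePositiveRate(int detectedBlinks, int totalBlinks) {
    return 100.0 * detectedBlinks / totalBlinks;
}
```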

Backprojections using the hue-saturation and the saturation histogram provide similar accuracies. The values of the saturation channel of an image differ between skin and pupil in most light conditions, thus it provides reliable information about the user's blinks. However, the hue channel is often uniform across the whole eye region; sometimes it shows no significant change when the eye blinks, mostly in very dark images. False detections are the results of luminance changes, poor light conditions,

1 http://www-prima.inrialpes.fr/FGnet/data/01-TalkingFace/talking_face.html


changes in gaze direction, facial mimicry such as smiling, and eyelid makeup. In such cases it is very difficult to recognize whether a user blinks or not. Backprojection using the hue-saturation histogram misses many blinks when an individual wears glasses.

Inner movement detection, our best method, has 14 false positive and 9 false negative cases, caused mainly by rapid head movements, lowering the gaze and reflections from glasses.

(a) Graph of the average values of backprojection using 2D hue-saturation histogram.

(b) Graph of the average values of backprojection using 1D saturation histogram.

(c) Graph of the differences between the average ocular and non-ocular movement within the face area.

Figure 3. Graphs produced by our eye blink detection algorithms. They are computed from an image sequence of a user working in front of a computer. The user blinks at frames 18, 33, 50, 69 and 90. Detected blinks are represented by circles on the graphs.

Table 1. Comparison of our blink detection algorithms to the method in [4]. TP represents the true positive rate and FP the false positive rate.

                               |  Own dataset   |    TALKING
Method                         |  TP      FP    |  TP      FP
Backprojection (Histogram S)   |  81.25%  0.40% |  88.52%  0.49%
Backprojection (Histogram HS)  |  75.00%  0.32% |  85.25%  0.47%
Inner movement detection       |  93.75%  0.05% |  98.36%  0.20%
Method in [4]                  |  -       -     |  95%     19%


5 Conclusion

In this paper we proposed two techniques for eye blink detection. The first method detects blinks by backprojection using a saturation or hue-saturation histogram. The second method is based on the KLT feature tracker, which tracks eyelid motions. The model situation is a user looking at a computer screen. The inner movement detection method outperforms the method in [4]: it provides an over 3% better true positive rate and an about 18% lower false positive rate.

Acknowledgement: This project is supported by Tatra bank foundation E-Talent 2012et009.

References

[1] Arai, K., Mardiyanto, R.: Comparative Study on Blink Detection and Gaze Estimation Methods for HCI, in Particular, Gabor Filter Utilized Blink Detection Method. In: Proceedings of the 2011 Eighth International Conference on Information Technology: New Generations. ITNG '11, Washington, DC, USA, IEEE Computer Society, 2011, pp. 441–446.

[2] Ayudhaya, C., Srinark, T.: A method for a real time eye blink detection and its applications. In: The 6th International Joint Conference on Computer Science and Software Engineering (JCSSE), 2009, pp. 25–30.

[3] Blehm, C., Vishnu, S., Khattak, A., Mitra, S., Yee, R.W.: Computer Vision Syndrome: A Review. Survey of Ophthalmology, 2005, vol. 50, no. 3, pp. 253–262.

[4] Divjak, M., Bischof, H.: Eye blink based fatigue detection for prevention of Computer Vision Syndrome. In: IAPR Conference on Machine Vision Applications (MVA 2009), 2009, pp. 350–353.

[5] Grauman, K., Betke, M., Gips, J., Bradski, G.: Communication via eye blinks - detection and duration analysis in real time. In: Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on. Volume 1, 2001, pp. I-1010–I-1017.

[6] Heishman, R., Duric, Z.: Using Image Flow to Detect Eye Blinks in Color Videos. In: Applications of Computer Vision, 2007. WACV '07. IEEE Workshop on, 2007, p. 52.

[7] Lalonde, M., Byrns, D., Gagnon, L., Teasdale, N., Laurendeau, D.: Real-time eye blink detection with GPU-based SIFT tracking. In: Proceedings of the Fourth Canadian Conference on Computer and Robot Vision. CRV '07, Washington, DC, USA, IEEE Computer Society, 2007, pp. 481–487.

[8] Liting, W., Xiaoqing, D., Changsong, L., Wang, K.: Eye Blink Detection Based on Eye Contour Extraction. In: Image Processing: Algorithms and Systems, SPIE Electronic Imaging, 2009, p. 72450.

[9] Moriyama, T., Kanade, T., Cohn, J., Xiao, J., Ambadar, Z., Gao, J., Imanura, M.: Automatic recognition of eye blinking in spontaneously occurring behavior. In: Proceedings of the 16th International Conference on Pattern Recognition (ICPR 2002). Volume 4, 2002, pp. 78–81.

[10] Morris, T., Blenkhorn, P., Zaidi, F.: Blink detection for real-time eye tracking. Journal of Network and Computer Applications, 2002, vol. 25, no. 2, pp. 129–143.

[11] Tomasi, C., Kanade, T.: Detection and Tracking of Point Features. Technical Report CMU-CS-91-132, Carnegie Mellon University, April 1991, pp. 1–22.

[12] Viola, P.A., Jones, M.J.: Rapid Object Detection using a Boosted Cascade of Simple Features. In: CVPR, IEEE Computer Society, 2001, pp. 511–518.

[13] Yan, Z., Hu, L., Chen, H., Lu, F.: Computer Vision Syndrome: A widely spreading but largely unknown epidemic among computer users. Computers in Human Behavior, 2008, vol. 24, no. 5, pp. 2026–2042.


Appendix D

BMVC 2013 paper

The British Machine Vision Conference (BMVC) is an international conference on computer vision which will be held in Bristol in 2013. All accepted papers will be published in the proceedings and made available by Springer. We have submitted our paper and it is in the review process.


Eye Blink Detection

Patrik [email protected]

Andrej [email protected]

Vision and Graphics Group
Institute of Applied Informatics
Faculty of Informatics and Information Technology
Slovak University of Technology
Bratislava, Slovakia

Abstract

Nowadays, people spend more time in front of electronic screens like computers, laptops, TVs, mobile phones or tablets. Looking at them causes eye blink frequency to decrease, because they are close (usually half a meter) and produce light; moreover, we are focused on our work and our subconscious is distracted. Each blink spreads the tears on the eye cornea to moisten and disinfect the eye. A reduced blink rate causes eye redness and dryness, also known as Dry Eye. We have analysed available techniques for eye blink detection and designed our own solutions based on optical flow, frame difference and FREAK descriptor methods. We have tested our algorithms on different datasets under various lighting conditions. The Centre Aligned Movement Detection method based on optical flow achieves a higher recognition rate and a much lower false positive rate than the state-of-the-art technique presented by Divjak and Bischof on the Talking Face Video dataset.

1 Introduction

The number of people using computers increases every day. There are also more people who suffer from symptoms collectively called Computer Vision Syndrome (CVS), a set of problems related to computer use. The rate of unconscious eye blinks reduces significantly (up to a 60% reduction) while looking at luminous objects within close distance. Blinking helps us to spread the tear film, moisten the surface of the eye and disinfect the eye. A reduced blink rate often leads to Dry Eye. Typical ocular complaints experienced during intensive computer work (more than 3 hours per day) include dryness, redness, burning, sandy-gritty eye irritation, sensitivity to light and eye fatigue. The easiest way to avoid the symptoms of Dry Eye is to blink regularly [4, 18].

Our aim is to detect human eye blinks in real time; that is the reason to focus only on relevant principles. In this paper we present 3 different methods of eye blink detection. The first method, addressed as Centre Aligned Movement Detection, detects eyelid motion using the Lucas-Kanade (KLT) feature tracker [16]. The second technique is based on frame difference using grayscale or colour images. The last one uses the FREAK descriptor [1] to detect blinks. Unfortunately, most of the methods evaluated their results on private datasets. We compare all the proposed methods with one of the state-of-the-art algorithms on two publicly available datasets.

© 2013. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.


2 State of the art

Optical flow is used in [10] to track eyelid movements and detect eye blinks. The eye area is detected in the initial phase by analysing the horizontal and vertical profiles of the image. Afterwards it is described by a SIFT (scale-invariant feature transform) descriptor computed on the GPU. Motion is detected within the eye regions using a threshold on the frame difference. Consequently, these regions are used to calculate the optical flow. While the user blinks, the eyelids move up and down and the dominant motion is in the vertical direction. This method detects 97% of blinks on their own dataset. Most of the false detections are caused by gaze lowering and vertical head movements.

A method based on optical flow estimation is also presented in [7] as an eye fatigue detector to prevent CVS. It locates the eye and face positions with 3 different classifiers. The algorithm is successful mostly when the head is directly facing the camera. A GPU KLT tracker is used to track the detected feature points. This approach is based on the normal flow algorithm mentioned in [6]; therefore we suppose that all parts of the proposed algorithm which are not explained remain unchanged. The flow within the eyes is compensated for the global face movement, normalized and corrected in rotation when the eyes are in a non-horizontal position. Afterwards the dominant angle of the optical flow orientation can be estimated. During the blink, the flow perpendicular to the line connecting both eyes should be dominant (the downward motion corresponds to angles between 200° and 340°). The flow data are processed by an adaptive threshold. The results of the left and right eye are merged to detect eye blinks; however, the authors do not specify how the results are combined. The authors report a 95% blink detection rate on the Talking Face Video dataset from INRIA. Unfortunately this approach has problems detecting eye blinks during quick head movements up and down.

The eyelid movements are estimated by normal flow instead of optical flow in [9]. Normal flow is the component of optical flow that is orthogonal to the image gradient. The magnitude and direction of the normal flow are used in a deterministic finite state machine (DFSM) to estimate the blink parameters. The DFSM has three states to determine in which phase the subject's eyes are: steady (open), opening and closing. The disadvantage is that the threshold strategy used in this algorithm requires various thresholds to be set manually depending on the subjects and conditions. The authors claim that the computation of normal flow is more effective than the previous method.

A similar method is applied in [6]. The face and eyes of a subject are detected and tracked using the Viola and Jones classifier. If detection fails, feature points detected by the FAST feature detector are tracked by the KLT tracker. Normal flow vectors computed in the eye region are used to detect eye blinks. The flow is corrected and normalized in the same way as described in [7]. This approach utilizes and improves the DFSM from the previous work by adding a new state, closed, to capture more variations in eye movement, for example holding the eyes closed. For blink extraction the authors use a threshold defined as T = 6 × standard_deviation(n), where n is the flow magnitude in the stationary eye state. In some cases this threshold must be set manually.
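The state machine with the added closed state and the T = 6 × std threshold can be sketched as a toy four-state DFSM. The function name, the direction labels and the exact transitions are illustrative assumptions, not the authors' implementation:

```python
import statistics

def detect_blinks(mags, directions, calib):
    """Toy open / closing / closed / opening machine driven by per-frame
    normal-flow magnitude in the eye region.

    `directions` holds 'down', 'up' or None per frame; `calib` is a run of
    magnitudes recorded while the eye is stationary, from which the
    threshold T = 6 * std is derived as in [6].
    """
    T = 6 * statistics.pstdev(calib)
    state, blinks = "open", 0
    for m, d in zip(mags, directions):
        moving = m > T
        if state == "open" and moving and d == "down":
            state = "closing"
        elif state == "closing" and not moving:
            state = "closed"                      # eye may stay closed
        elif state in ("closing", "closed") and moving and d == "up":
            state = "opening"
        elif state == "opening" and not moving:
            state, blinks = "open", blinks + 1    # full blink completed
    return blinks

calib = [0.1, 0.3, 0.2, 0.25, 0.15]               # stationary-eye magnitudes
mags = [0.1, 2.0, 2.1, 0.1, 2.0, 0.1, 0.1]
dirs = [None, "down", "down", None, "up", None, None]
print(detect_blinks(mags, dirs, calib))  # 1
```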

Arai et al. present a Gabor filter-based method for blink detection in [2]. The Gabor filter is a linear filter used here to extract contours within the eye. After applying the filter, the distance between the detected top and bottom arcs in the eye region is measured; different distances indicate a closed or an open eye. Arc extraction becomes problematic while the person is looking down.

POLATSEK, FOGELTON: EYE BLINK DETECTION

A variance map specifies the distribution of intensities around their mean value in an image sequence. The intensity of pixels located within the eye region changes during a blink, which can be used in the detection process [13]. Because of unconscious head movements, the face is detected using a thresholded accumulated difference frame obtained from several difference frames of consecutive grayscale images. After the creation of a variance map, the rate of thresholded pixels is calculated, and if it exceeds a specific value, a blink is detected. Candidate eye blink regions are then filtered by their size, horizontal symmetry, mutual position in the face and similarity. The selected eye blink pair is adjusted to the same size.
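The variance-map step can be illustrated with a few lines of numpy; the two thresholds are placeholders of ours, since the text above does not fix them:

```python
import numpy as np

def variance_map(frames):
    """Per-pixel intensity variance across a stack of grayscale frames."""
    stack = np.stack([f.astype(float) for f in frames])
    return stack.var(axis=0)

def blink_by_variance(frames, var_thresh=50.0, rate_thresh=0.3):
    """Report a blink in the eye region when the fraction of high-variance
    pixels exceeds `rate_thresh`. Both thresholds are illustrative."""
    vmap = variance_map(frames)
    return (vmap > var_thresh).mean() > rate_thresh

# Static region, then one frame where the "eyelid" darkens everything.
f0 = np.full((6, 6), 200.0)
frames = [f0, f0, np.full((6, 6), 40.0), f0]
print(blink_by_variance(frames))  # True
```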

The correlation between the current eye state and an open-eye image can also be measured: the closing eyelid during a blink decreases the correlation coefficient. Blink detection via correlation for immobile people is presented in [8]. The first step, which locates the user's eyes after an initial blink, consists of a frame difference that is thresholded and processed by erosion. The resulting eye blink pair is selected from the candidates by various filters. After localization, an open-eye template is captured and used in a simple eye tracker based on the correlation between the current eye region and the template. The proposed system detects blinks when the computed correlation score ranges from 0.55 to 0.8.
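A hedged sketch of the correlation test: the score is the normalized correlation coefficient between the current eye patch and the stored open-eye template, and the 0.55–0.8 band is the one reported in [8]. The toy 2×2 patches are ours:

```python
import numpy as np

def ncc(patch, template):
    """Normalized correlation coefficient between two equally sized
    grayscale patches (1.0 = identical up to brightness/contrast)."""
    a = patch.astype(float).ravel()
    a -= a.mean()
    b = template.astype(float).ravel()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def is_blinking(score, lo=0.55, hi=0.8):
    """Blink band from [8]: a half-closed eye still resembles the open-eye
    template somewhat, but far less than a fully open eye."""
    return lo <= score <= hi

template = np.array([[0, 1], [0, 1]])        # toy "open eye" template
open_eye = template.copy()
half_closed = np.array([[0, 1], [1, 1]])
print(is_blinking(ncc(open_eye, template)),
      is_blinking(ncc(half_closed, template)))  # False True
```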

A blink detection algorithm in [12] is based on the fact that the upper and lower parts of the eye have different distributions of mean intensities in the open-eye state and during blinks. These intensities cross during eyelid closing and opening.

Liting et al. [11] use a deformable model, an Active Shape Model represented by several landmarks forming the eye contour shape. The model learns the appearance around each landmark and fits it in the current frame to obtain a new eye shape. Blinks are detected by measuring the distance between the upper and lower eyelids.

Ayudhaya et al. [3] detect blinks by calculating the eyelid's state detecting (ESD) value. The threshold is increased until the resulting image contains at least one black pixel after median blur filtering. This threshold value (ESD) differs while the user blinks.
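The ESD computation can be sketched as follows. The 3×3 median filter stands in for the median blur mentioned above, and the scan order and binarization rule are our reading of the method, not the authors' code:

```python
import numpy as np

def median3x3(img):
    """3x3 median filter with edge replication (stand-in for median blur)."""
    p = np.pad(img, 1, mode="edge")
    windows = [p[r:r + img.shape[0], c:c + img.shape[1]]
               for r in range(3) for c in range(3)]
    return np.median(np.stack(windows), axis=0)

def esd_value(img):
    """Lowest threshold at which the binarized, median-blurred image still
    contains at least one black pixel; this value shifts during a blink."""
    for t in range(256):
        binary = np.where(img < t, 0, 255)   # pixels below t become black
        if (median3x3(binary) == 0).any():
            return t
    return 255

img = np.full((9, 9), 200)
img[3:6, 3:6] = 50                            # a dark iris-sized blob
print(esd_value(img))  # 51
```

The isolated dark blob first survives the median blur at threshold 51, one above its intensity; a closed eye removes the blob and pushes the ESD value up.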

Yunqi et al. [19] use the Circular Hough Transformation (CHT) to detect the iris within the eye. The absence of the iris is considered a blink.

Castro [5] uses an IR sensor placed in the temple of glasses. It transmits infra-red light towards the eye and receives the reflected amount. If the eye is closed, the amount of reflected light changes. This method can be used in cars to prevent microsleep.

Pander et al. [15] present electrooculography as a method for blink detection. The electrooculographic (EOG) signal is based on the electrical potential difference between the cornea and the retina. When the eyelids close during a blink, the potential around the eye changes as a result of the eyelid muscle movements.

3 Real-time eye blink detection

To build a CVS prevention system, our main focus is the model situation of a user facing the computer screen. Because of this, a high recognition rate together with efficient computation is necessary. We introduce three methods based on the frame difference, the FREAK descriptor and the KLT feature tracker.

3.1 Centre Aligned Movement Detection

We introduce our own Centre Aligned Movement Detection algorithm based on optical flow. We use the KLT tracker [16] to observe movement inside the face region. If a user blinks, the mean displacement of feature points within the eye region should be greater than the displacement of the points within the nose area (Figure 1).

Figure 1: Tracking of feature points using the KLT feature tracker during an eye blink. (a) Source images. (b) The feature displacement.

The first step consists of localizing the user's face and eyes using the Haar Cascade Classifier [17] on a grayscale image. We initialize random KLT features within the eye and nose regions and classify them as left ocular, right ocular or non-ocular. These features are tracked by the KLT tracker. Tracking is reinitialized at regular intervals or when many feature points are lost.

We compute the average displacement separately for the three groups of points. Afterwards we compare the difference between the left (or right) ocular and the non-ocular displacement. If this difference exceeds a threshold value (threshold_diff = face.height / 165, where face.height is the height of the face detected in the initial phase), movement within the eye region is anticipated. (We obtained all our thresholds empirically, through observation.) Consequently, we count the ratio of ocular points that moved down at least a specific distance along the y-axis (distance_y = face.height / 110) in order to prevent false positives caused by horizontal eye movements. To compute this ratio precisely, we eliminate the vertical ocular displacement caused by head movements: the ocular points are shifted by a distance equal to the average displacement of the non-ocular points. If the ratio is higher than a threshold (5% of the displacements of one group of ocular points and 2% of the other), we consider it a blink. Figure 3(a) shows a graph of values defined as max(abs(avg(left) − avg(non)), abs(avg(right) − avg(non))), where max and abs denote the maximum and absolute value, avg indicates the average movement within a given region, and left, right and non denote the left ocular, right ocular and non-ocular regions. This method has a blink detection process similar to the method in [7]. It is also based on optical flow and tries to detect eyelid motion while excluding global head motion. [7] defines downward motion by the angle of flow vectors being close to perpendicular to the line connecting the eyes. Our method, however, classifies downward motion without computing the dominant angle: we define it as a certain downward movement along the y-axis. Our method does not combine the motion detected within the left and right eye regions, which is advantageous, for example, when reflections from glasses worsen the visibility of one eye. To extract the eyelid motion we do not use feature points from the whole face region, since such points can be influenced by facial mimicry such as smiling or talking; our solution therefore tracks only points within the nose region.
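The core decision of CAMD can be summarized in a short sketch. It simplifies the two-ratio rule (5% and 2%) to a single ratio per eye and assumes the per-group point displacements are already available; all names are ours:

```python
import numpy as np

def camd_blink(left, right, non, face_height, ratio_thresh=0.05):
    """Simplified Centre Aligned Movement Detection decision.

    `left`, `right`, `non` are (N, 2) per-point displacements with the
    y-axis pointing down; `non` holds the nose (non-ocular) points used to
    compensate for global head motion.
    """
    thresh_diff = face_height / 165.0      # ocular-vs-nose difference
    dist_y = face_height / 110.0           # minimal downward displacement
    head = non.mean(axis=0)                # global head motion estimate
    for eye in (left, right):
        diff = np.linalg.norm(eye.mean(axis=0) - head)
        if diff <= thresh_diff:
            continue                       # no significant ocular movement
        moved_down = (eye[:, 1] - head[1]) >= dist_y   # head-compensated
        if moved_down.mean() > ratio_thresh:
            return True
    return False

face_height = 220.0
non = np.zeros((3, 2))                                   # still head
blink_left = np.array([[0.0, 3.0], [0.0, 3.0], [0.0, 2.5]])
calm = np.full((3, 2), 0.1)
print(camd_blink(blink_left, calm, non, face_height),
      camd_blink(calm, calm, non, face_height))  # True False
```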

3.2 Frame difference

The frame difference returns the difference between two images. The following blink detection method uses the frame difference of grayscale and RGB colour images. We found that when a user blinks, the difference within the eye area is mostly several times greater than within the rest of the facial image (Figure 2).

In the initial phase we detect the user's face and eyes with the Haar Cascade Classifier [17].



Figure 2: Result of the frame difference on the INRIA Talking Face Video dataset. (a) Source images. (b) The frame difference.

Afterwards, their positions are maintained by the KLT feature tracker using a grid of points within the eyes. The tracker is reinitialized at regular intervals or when many features are lost. Within the facial area we compute the frame difference between the current image and the last but one. To eliminate noise and make the results more significant, we apply a threshold (threshold = 15) and erode the result.

Subsequently, we compute from the frame difference image the average value separately for the left eye region, the right eye region and the rest of the face. For colour images it is computed as the mean value of every colour channel. The average difference within the eyes is computed as the mean of the values within the left and right eye. We detect an eye blink when the average differences within the eyes (diff(eye)) and the rest of the face (diff(face)) satisfy the following conditions: diff(eye) > 8 and diff(eye) − 4 × diff(face) > 4. The results of this blink detection method are shown in Figures 3(b) and 3(c).
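The final decision rule reduces to two inequalities and can be captured directly:

```python
def frame_diff_blink(diff_eye, diff_face):
    """Decision rule of the frame-difference method: the mean difference
    inside the eyes must be high in absolute terms and must clearly
    dominate the mean difference over the rest of the face."""
    return diff_eye > 8 and diff_eye - 4 * diff_face > 4

# Blink: strong change in the eyes, little change elsewhere.
print(frame_diff_blink(20.0, 1.5), frame_diff_blink(20.0, 6.0))  # True False
```

The second condition rejects frames where the whole face moves, since global motion raises diff(face) four times faster than the margin allows.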

3.3 FREAK descriptor

FREAK (Fast REtinA Keypoint) is a binary keypoint descriptor [1]. It is created by comparing 43 Gaussian-smoothed intensities sampled in the neighbourhood of the keypoint, with a sampling pattern inspired by the human retina. An eye blink changes the appearance of the neighbourhood of keypoints detected within the eyes, due to which descriptors computed during the blink and non-blink states differ significantly. Thus the Hamming distance between descriptors computed from ocular feature points in subsequent images increases during a blink.

This method first detects the eye area using the Haar Cascade Classifier [17] on a grayscale image. The regions of interest are subsequently tracked by the KLT feature tracker. The algorithm is reinitialized at regular intervals, after a blink, or when many tracked feature points are lost. For each eye region we place keypoints on a grid with a step of 3 to 4 pixels and calculate their FREAK descriptors. Consequently, we find for each descriptor the closest one in the next image according to their Hamming distance.

The threshold for blink detection is based on the Hamming distance. We detect a blink when at least 15% of the matched descriptors differ in more than 100 bits and the sum of the Hamming distances over all matched descriptors is higher than 12%. Figure 3(d) shows a graph of the total Hamming distance between matched descriptors.
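A sketch of the descriptor-matching test, assuming 64-byte (512-bit) FREAK descriptors already matched row-to-row; interpreting "higher than 12%" as a fraction of the total descriptor bits is our assumption:

```python
import numpy as np

def freak_blink(desc_a, desc_b, bit_frac=0.15, sum_frac=0.12):
    """Blink test over matched binary descriptors.

    `desc_a` and `desc_b` are (N, 64) uint8 arrays whose rows are matched
    descriptor pairs. A blink is reported when at least `bit_frac` of the
    matches differ in more than 100 bits and the summed Hamming distance
    exceeds `sum_frac` of the total number of descriptor bits.
    """
    xor = np.bitwise_xor(desc_a, desc_b)
    dists = np.unpackbits(xor, axis=1).sum(axis=1)   # Hamming per match
    total_bits = desc_a.shape[0] * desc_a.shape[1] * 8
    return (dists > 100).mean() >= bit_frac and dists.sum() / total_bits > sum_frac

a = np.zeros((4, 64), dtype=np.uint8)    # four 512-bit descriptors
b = a.copy()
b[:, :16] = 0xFF                         # 128 bits flipped in every match
print(freak_blink(a, b), freak_blink(a, a))  # True False
```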

4 Evaluation

Our blink detection algorithms are evaluated on three datasets. Our own dataset includes 8 individuals (5 males and 3 females, one person wearing glasses) under different lighting conditions, who sit in front of a computer screen mostly in a stable position, looking directly at the screen. It consists of 7569 frames (640×480 pixels) and 128 blinks. The second



Figure 3: Graphs produced by our eye blink detection algorithms, computed from an image sequence of a user working in front of a computer. The user blinks at frames 18, 33, 50, 69 and 90; detected blinks are represented by circles on the graphs. (a) Centre Aligned Movement Detection: differences between the average ocular and non-ocular movement within the face area. (b) Frame difference in grayscale: difference between ocular and non-ocular movement. (c) Frame difference in colour: difference between ocular and non-ocular movement. (d) Sum of the Hamming distances between matched FREAK descriptors.

image sequence, the Talking Face Video (TALKING), is publicly available from INRIA1. It includes 5000 images (720×576 pixels) of a person engaged in conversation who blinks 61 times. The last public dataset is the ZJU Eyeblink Database (ZJU) [14]. It contains 80 short video clips (10876 frames of 320×240 pixels) of 20 individuals (13 males and 7 females). Subjects are captured in frontal and upward views, with and without glasses. There are 255 complete blinks in total.

We have tested our algorithms and compared their blink detection abilities to the optical flow method presented in [7]. We expected our results to be more similar to theirs, so we have doubts about the proper computation of the false positive rate in that method. We believe the authors computed the false discovery rate, i.e. the ratio of false positives to the total number of detections, which would explain their high false positive values. The authors also do not explain how they computed the detection rate on the ZJU Eyeblink Database. The results are shown in Table 1. The best overall true and false positive rates are achieved by Centre Aligned Movement Detection. It detects 93.75% of blinks on our own dataset, 98.36% on the Talking Face Video and 89.85% on the ZJU Eyeblink Database.

Centre Aligned Movement Detection (CAMD), our best method, has only 16 false positive cases. False detections are caused mainly by rapid head movements, gaze lowering and reflections from glasses. A very small distance between the eyelids of some subjects in the ZJU Eyeblink Database decreases the true positive rate on this dataset. The method misses only one blink in the Talking Face Video, caused by gaze lowering. CAMD has the best trade-off between the true and false positive rates and the fewest false positives for users wearing glasses.

The frame difference method achieves almost the same detection rate with colour or grayscale images. Despite the attempt to reduce false detections, this method still has difficulty distinguishing open and closed eyes during fast head and pupil movements or facial mimicry. Missed blinks also occur in poor lighting conditions, especially with grayscale images. Many false detections are also caused by glasses. This method has the

1http://www-prima.inrialpes.fr/FGnet/data/01-TalkingFace/talking_face.html



Table 1: Comparison of our blink detection algorithms with the method in [7]. TP represents the true positive rate and FP the false positive rate.

Method                             | Own dataset      | TALKING          | ZJU
                                   | TP(%)    FP(%)   | TP(%)    FP(%)   | TP(%)    FP(%)
Backprojection (Hist. S)           | 81.25    0.40    | 88.52    0.49    | -        -
Backprojection (Hist. HS)          | 75.00    0.32    | 85.25    0.47    | 61.65    3.63
Centre Aligned Movement Detection  | 93.75    0.05    | 98.36    0.20    | 89.85    0.02
Frame difference (grayscale)       | 85.16    0.35    | 88.52    1.17    | 96.24    0.22
Frame difference (RGB)             | 86.72    0.43    | 90.16    0.83    | 95.86    0.21
FREAK descriptor                   | 85.94    0.52    | 93.44    0.00    | 82.33    0.07
Method presented in [7]            | -        -       | 95       19      | 95±12    2±6

highest true positive rate on the ZJU Eyeblink Database. However, it produces too many false positives, so it is reliable only when the user is relatively calm.

The method based on the FREAK descriptor has the most false positive detections, caused by glasses and pupil movements. Very poor lighting conditions cause some missed blinks. This method could achieve a better detection rate with more precise pupil detection: the Haar Cascade Classifier detects the whole eye area, due to which keypoints are often placed on the skin instead of the eye.

The use of the Haar Cascade Classifier causes problems for all proposed blink detection methods. Haar-based eye detection often fails in dark images, when the user wears glasses, or when the user does not look directly at the computer screen. Exact eye or pupil detection could increase the true positive rate of all presented algorithms.

5 Conclusion

We have presented three different techniques for eye blink detection, all of which run in real time. The first method is based on the KLT feature tracker, which tracks eyelid motion. The next method computes the frame difference of grayscale and colour images, and the last one uses FREAK descriptors to detect blinks.

We have compared all methods and tested them on different datasets. The model situation is a user looking towards the computer screen. After analysing all the proposed techniques, we consider Centre Aligned Movement Detection the best method. It outperforms the method in [7]: it provides an over 3% better true positive rate (on this dataset, the difference of a single blink detection) and an about 18% lower false positive rate on the Talking Face Video dataset. This method represents the best compromise between a high true positive and a low false positive rate. In the future, the blink detection ability of this algorithm can be used in a dry eye prevention system that warns the user to blink regularly and thus protects the eyes from drying.

References

[1] Alexandre Alahi, Raphael Ortiz, and Pierre Vandergheynst. FREAK: Fast retina keypoint. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), CVPR '12, pages 510–517, Washington, DC, USA, 2012. IEEE Computer Society. ISBN 978-1-4673-1226-4. URL http://dl.acm.org/citation.cfm?id=2354409.2354903.

[2] Kohei Arai and Ronny Mardiyanto. Comparative study on blink detection and gaze estimation methods for HCI, in particular, Gabor filter utilized blink detection method. In Proceedings of the 2011 Eighth International Conference on Information Technology: New Generations, ITNG '11, pages 441–446, Washington, DC, USA, 2011. IEEE Computer Society. ISBN 978-0-7695-4367-3. doi: 10.1109/ITNG.2011.84. URL http://dx.doi.org/10.1109/ITNG.2011.84.

[3] C. Ayudhaya and T. Srinark. A method for a real time eye blink detection and its applications. In The 6th International Joint Conference on Computer Science and Software Engineering (JCSSE), pages 25–30, May 2009. URL http://www.cpe.ku.ac.th/~jeab/papers/chinnawat_JCSSE2009.pdf.

[4] Clayton Blehm, Seema Vishnu, Ashbala Khattak, Shrabanee Mitra, and Richard W. Yee. Computer vision syndrome: A review. Survey of Ophthalmology, 50(3):253–262, 2005. ISSN 0039-6257. doi: 10.1016/j.survophthal.2005.02.008. URL http://www.sciencedirect.com/science/article/pii/S0039625705000093.

[5] Fabio Lo Castro. Class I infrared eye blinking detector. Sensors and Actuators A: Physical, 148(2):388–394, 2008. ISSN 0924-4247. doi: 10.1016/j.sna.2008.09.005. URL http://www.sciencedirect.com/science/article/pii/S0924424708004718.

[6] M. Divjak and H. Bischof. Real-time video-based eye blink analysis for detection of low blink-rate during computer use. In First International Workshop on Tracking Humans for the Evaluation of their Motion in Image Sequences (THEMIS 2008), pages 99–107, September 2008.

[7] M. Divjak and H. Bischof. Eye blink based fatigue detection for prevention of computer vision syndrome. In IAPR Conference on Machine Vision Applications (MVA 2009), pages 350–353, May 2009.

[8] K. Grauman, M. Betke, J. Gips, and G. R. Bradski. Communication via eye blinks – detection and duration analysis in real time. In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, volume 1, pages I-1010–I-1017, 2001. doi: 10.1109/CVPR.2001.990641.

[9] R. Heishman and Z. Duric. Using image flow to detect eye blinks in color videos. In Applications of Computer Vision, 2007. WACV '07. IEEE Workshop on, page 52, February 2007. doi: 10.1109/WACV.2007.61.

[10] Marc Lalonde, David Byrns, Langis Gagnon, Normand Teasdale, and Denis Laurendeau. Real-time eye blink detection with GPU-based SIFT tracking. In Proceedings of the Fourth Canadian Conference on Computer and Robot Vision, CRV '07, pages 481–487, Washington, DC, USA, 2007. IEEE Computer Society. ISBN 0-7695-2786-8. doi: 10.1109/CRV.2007.54. URL http://dx.doi.org/10.1109/CRV.2007.54.

[11] Wang Liting, Ding Xiaoqing, Liu Changsong, and Kongqiao Wang. Eye blink detection based on eye contour extraction. In Image Processing: Algorithms and Systems, page 72450R. SPIE Electronic Imaging, 2009. URL http://link.aip.org/link/?PSISDG/7245/72450R/1.

[12] Tsuyoshi Moriyama, Takeo Kanade, Jeffrey Cohn, Jing Xiao, Z. Ambadar, Jiang Gao, and M. Imamura. Automatic recognition of eye blinking in spontaneously occurring behavior. In Proceedings of the 16th International Conference on Pattern Recognition (ICPR 2002), volume 4, pages 78–81, August 2002.

[13] T. Morris, P. Blenkhorn, and Farhan Zaidi. Blink detection for real-time eye tracking. Journal of Network and Computer Applications, 25(2):129–143, April 2002. ISSN 1084-8045. doi: 10.1016/S1084-8045(02)90130-X. URL http://dx.doi.org/10.1016/S1084-8045(02)90130-X.

[14] Gang Pan, Lin Sun, Zhaohui Wu, and Shihong Lao. Eyeblink-based anti-spoofing in face recognition from a generic webcamera. In The 11th IEEE International Conference on Computer Vision (ICCV 2007), Rio de Janeiro, Brazil, October 2007.

[15] T. Pander, T. Przybyla, and R. Czabanski. An application of detection function for the eye blinking detection. In Human System Interactions, 2008 Conference on, pages 287–291, May 2008. doi: 10.1109/HSI.2008.4581450.

[16] Carlo Tomasi and Takeo Kanade. Detection and tracking of point features. Technical Report CMU-CS-91-132, Carnegie Mellon University, April 1991. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.131.5899&rep=rep1&type=pdf.

[17] Paul A. Viola and Michael J. Jones. Rapid object detection using a boosted cascade of simple features. In CVPR, pages 511–518. IEEE Computer Society, 2001. ISBN 0-7695-1272-0. URL http://dblp.uni-trier.de/db/conf/cvpr/cvpr2001-1.html#ViolaJ01.

[18] Zheng Yan, Liang Hu, Hao Chen, and Fan Lu. Computer vision syndrome: A widely spreading but largely unknown epidemic among computer users. Computers in Human Behavior, 24(5):2026–2042, 2008. ISSN 0747-5632. doi: 10.1016/j.chb.2007.09.004. URL http://www.sciencedirect.com/science/article/pii/S0747563207001501.

[19] Lei Yunqi, Yuan Meiling, Song Xiaobing, Liu Xiuxia, and Ouyang Jiangfan. Recognition of eye states in real time video. In Computer Engineering and Technology, 2009. ICCET '09. International Conference on, volume 1, pages 554–559, 2009. doi: 10.1109/ICCET.2009.105.


Appendix E

Resumé


Appendix E. Resumé Patrik Polatsek


1 Introduction

Nowadays we are surrounded by technology. We spend more and more hours with display devices such as computers, notebooks, mobile phones and tablets. The use of these devices is often connected with sight and vision problems.

Most people blink 10 to 15 times per minute. However, the frequency of spontaneous blinking decreases significantly (by up to 60%) while using a computer. Blinking helps to spread the tear film and to moisten and disinfect the surface of the eye, so a reduced blink rate causes our eyes to dry out. Typical eye problems caused by intensive computer work (more than 3 hours a day) include dryness, redness, burning, a gritty feeling in the eyes, sensitivity to light and eye fatigue. These symptoms are also known as dry eye, which is the most significant part of the so-called Computer Vision Syndrome (CVS). CVS is a set of problems related to computer use, such as dry eyes, eye strain, headache, blurred vision, and neck and back pain. The simplest way to avoid dry eye symptoms is to blink regularly.

Our goal is to design an algorithm that detects the user's blinks in frames captured by a webcam. In the future it may become part of an application that analyses the user's blink rate. If the rate is lower than average, the application will warn the user to blink and thus protect their eyes.

1.1 Requirements

We decided to create an application accessible to most users. It requires no special or expensive hardware. The only peripheral device our solution uses for blink detection is a webcam with a frame rate sufficient for tracking the face and eyes in real time.

We defined several requirements for the blink detection algorithm:

• detection of blinks of a user looking at the computer screen,
• a high blink detection rate and a small number of missed blinks,
• low hardware requirements,
• low computational demands,
• an acceptable trade-off between the speed of the algorithm and the quality of detection.

2 Computer Vision Syndrome

Pocet l’udí, ktorí používajú pocítac, sa zvyšuje každým dnom. Vysoké percento z nich trpíCVS. Ide o relatívne nový lekársky termín. Je to skupina symptómov vyplývajúcich z práce




with a computer. More than 40% of Europeans use a computer, and 70% of computer users worldwide experience some of the difficulties associated with CVS.

2.1 Dry eye

The surface of the eye must be covered by a tear film. With every blink, tears cover the whole surface of the eye, moistening it and washing away dust and microorganisms.

Disorders of any layer of the tear film, or an insufficient amount of tears, lead to one of the most typical symptoms of CVS, dry eye, which is a cause of eye fatigue. Today this problem affects 15 to 17% of the population.

3 Analysis of existing solutions

In this section we present several blink detection methods. Most of these techniques consist of three steps:

• face detection,
• eye detection,
• blink detection.

The optical flow method is used to determine the positions of features in the next frame. It is often used to estimate the motion between two images. Optical flow can be used to track eyelid movements and consequently to detect a potential blink.

Eyelid movements can also be tracked by normal flow. Optical flow combines information from image regions and computes the overall motion in the image, whereas normal flow can be computed using only local information. Normal flow is the component of optical flow that is perpendicular to the image brightness gradient, i.e. to an edge.

The Gabor filter is a linear filter used in image processing for edge detection. For our purposes it can be used to extract contours in the eye area. Blinks are then detected by measuring the distance between the upper and lower eyelid; this distance differs between a closed and an open eye.

A variance map specifies the deviation of intensities from the mean value in an image sequence. The intensity of pixels in the eye area changes during a blink, which can be exploited for blink detection.

Correlation describes the degree of dependence between two values. The normalized correlation coefficient indicates the similarity between a template and an image patch of the same size. If we measure the correlation between the current eye and an open-eye template, this coefficient represents the "openness" of the eye.

Another detection approach is based on the fact that the upper and lower parts of the eye have different distributions of mean intensities during a blink.

A deformable model, the Active Shape Model, which represents the eye shape, can also be used for blink detection. The model is fitted in the current frame, yielding a new eye shape. A blink can then be detected from the measured distance between the eyelids.




The following technique detects blinks by computing the so-called eyelid's state detecting (ESD) value. The algorithm increases the threshold until the resulting frame contains at least one black pixel after applying a median blur filter. This value, called ESD, changes during a blink.

Some methods detect blinks using the Circular Hough Transformation, which extracts circles from an image. In this way the iris can be detected; its absence is considered a blink.

A blink detector can use an infra-red (IR) sensor. It transmits IR rays towards the eye and receives the reflected rays. The amount of reflected light differs between an open and a closed eye.

Electrooculography can also be used for detection. The electrooculographic signal is based on the electrical potential between the cornea and the retina. As the eyes close during a blink, the potential around the eyes changes as a result of the eyelid muscle movements.

4 Analysis of the applied principles

In this section we describe four different blink detection methods.

4.1 Histogram backprojection

In this method we use a histogram to represent the colour of the user's skin. Histogram backprojection creates a probability map from an image; in other words, backprojection determines how well the pixels of an image fit the given histogram. A higher backprojection value indicates a more probable region of the given object. We detect a blink as a higher percentage of skin-coloured pixels in the eye area; otherwise we consider the eyes open.

For this algorithm we used a 2D hue-saturation and a 1D saturation histogram.

First we detect the face using Haar-like features. We compute the skin histogram from face regions; it is normalized and updated at regular intervals. For every image we compute the backprojection from this histogram. We then refine it using the Erode morphological operation and a threshold. Finally, we compute the average probability value in the eye area. A significant increase of this value is considered a blink.

4.2 Centre Aligned Movement Detection

We present our own method called Centre Aligned Movement Detection, which is based on optical flow. One of the best-known motion detection algorithms is the Lucas-Kanade (KLT) tracker. The KLT tracker selects features suitable for tracking, with high intensity changes in both directions. It then measures the similarity between the selected point and its most probable position in the next image. If the correlation is lower than a threshold, we consider the points lost.

If the user blinks, the average displacement of features in the eye area should be higher than the displacement in the other parts of the face.



Appendix E. Resumé Patrik Polatsek

At the beginning we detect the face and eyes using Haar features on a grayscale image. In the eye and nose regions we randomly select points, classify them, and track them with the KLT tracker. Tracking is re-initialized at regular intervals or when many features are lost.

For each group of points we compute the average displacement and compare the difference between the groups. If it exceeds a threshold, motion occurred in the eyes. We then compute the ratio of eye points that moved a certain distance downwards along the y axis. If this ratio is high enough, a blink is detected.
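The decision logic of the two paragraphs above (tracking itself aside) can be sketched like this. The function name and the threshold values are hypothetical; the point arrays stand in for the coordinates the KLT tracker would report for two consecutive frames.

```python
import numpy as np

def detect_blink(eye_pts_prev, eye_pts_cur, face_pts_prev, face_pts_cur,
                 motion_thresh=1.0, down_dist=0.5, ratio_thresh=0.5):
    """Compare the average displacement of the eye points with the rest of the
    face; if the eyes moved noticeably more, check how many eye points moved
    downwards (image y axis grows downwards)."""
    eye_disp = eye_pts_cur - eye_pts_prev            # (N, 2) arrays of (x, y)
    face_disp = face_pts_cur - face_pts_prev
    eye_mag = np.linalg.norm(eye_disp, axis=1).mean()
    face_mag = np.linalg.norm(face_disp, axis=1).mean()
    if eye_mag - face_mag < motion_thresh:
        return False                                 # no eye-specific motion
    # fraction of eye points that moved at least down_dist in +y (downwards)
    moved_down = (eye_disp[:, 1] >= down_dist).mean()
    return bool(moved_down >= ratio_thresh)

# Toy example: eyelid points move down while the nose points barely move.
prev_eye = np.zeros((4, 2)); cur_eye = prev_eye + np.array([0.0, 2.0])
prev_face = np.zeros((4, 2)); cur_face = prev_face + np.array([0.1, 0.0])
assert detect_blink(prev_eye, cur_eye, prev_face, cur_face)
```

Requiring a downward direction, not just magnitude, is what separates a closing eyelid from horizontal gaze shifts.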

4.3 Frame Difference

Frame differencing computes the difference between two images. This method applies the frame difference to grayscale and color images.

We found that during a blink this difference is several times higher in the eyes than in the rest of the face.

The method first detects the face and eyes using Haar features and then maintains their position with the KLT tracker. In the face region we compute the frame difference between the current frame and the frame before last, which we refine by thresholding and the Erode operation.

For each frame difference we compute the average value separately for the eyes and for the rest of the face. If these values satisfy the defined conditions, a blink is registered.
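The core of this per-frame computation can be sketched in a few lines of numpy. This is a simplified illustration, not the thesis code: the Erode step is omitted, the function name and thresholds are hypothetical, and a single "several times higher" ratio stands in for the thesis's conditions.

```python
import numpy as np

def region_diff_means(prev_frame, cur_frame, eye_region, diff_thresh=20):
    """Absolute frame difference, thresholded, then averaged separately over
    the eye region and the rest of the face (Erode step omitted for brevity)."""
    diff = np.abs(cur_frame.astype(np.int16) - prev_frame.astype(np.int16))
    diff = np.where(diff >= diff_thresh, diff, 0)    # suppress sensor noise
    mask = np.zeros(diff.shape, dtype=bool)
    y0, y1, x0, x1 = eye_region
    mask[y0:y1, x0:x1] = True
    return diff[mask].mean(), diff[~mask].mean()

# Toy example: only the eye area changes between the two frames.
prev = np.full((12, 12), 100, dtype=np.uint8)
cur = prev.copy()
cur[3:6, 4:8] = 160                      # eyelid closing changes intensity
eye_mean, rest_mean = region_diff_means(prev, cur, (3, 6, 4, 8))
blink = eye_mean > 5 * rest_mean + 1     # eyes change several times more
assert blink
```

Casting to a signed type before subtracting avoids the unsigned wrap-around that a naive `uint8` difference would produce.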

4.4 FREAK Descriptor

The last method uses binary FREAK (Fast Retina Keypoint) descriptors for detection. A FREAK descriptor is built by comparing intensities around a keypoint in regions inspired by the human retina.

During a blink, the surroundings of the points change significantly, which also increases the Hamming distance between descriptors computed from an open and a closed eye.

This method tracks the eyes in the same way as the previous one. In these regions we place keypoints on a grid and compute FREAK descriptors from them. For each descriptor we then find the most similar one in the following frame based on their Hamming distance. The Hamming distance then also serves as the threshold for blink detection.
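The Hamming distance used above is simply the number of differing bits between two binary descriptors. A minimal sketch, with random byte arrays standing in for real FREAK descriptors (a FREAK descriptor is 512 bits, i.e. 64 bytes) and a hypothetical threshold value:

```python
import numpy as np

def hamming(d1, d2):
    """Hamming distance between two binary descriptors stored as uint8 arrays
    (64 bytes = 512 bits): XOR the bytes, then count the set bits."""
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())

rng = np.random.default_rng(0)
open_eye = rng.integers(0, 256, size=64, dtype=np.uint8)   # stand-in descriptor
closed_eye = np.bitwise_not(open_eye)                      # maximally different
same_eye = open_eye.copy()

assert hamming(open_eye, same_eye) == 0
assert hamming(open_eye, closed_eye) == 512                # all 512 bits flipped

BLINK_THRESH = 150   # hypothetical threshold on the descriptor distance
assert hamming(open_eye, closed_eye) > BLINK_THRESH        # -> blink detected
```

Because the distance is computed with XOR and a bit count, matching binary descriptors is much cheaper than comparing floating-point descriptors such as SIFT.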

5 Evaluation and Discussion

We tested all the described algorithms on three different datasets and evaluated their strengths and weaknesses. Based on the results, we recommend using Centrally Aligned Motion Detection for blink detection, as it achieves the best overall rate of true and false blink detections. On the publicly available Talking Face Video dataset from INRIA it detects more blinks than the method proposed by Divjak and Bischof.


Based on the tests, we found that all the proposed blink detection algorithms work in real time.

6 Conclusion

In this work we created four different blink detection algorithms based on backprojection, optical flow, frame differences, and FREAK descriptors. We consider Centrally Aligned Motion Detection the best method, as it offers the best trade-off between the number of detected and false blinks. The model situation was a user looking at a monitor. In the future, this algorithm can be adapted to work in other situations as well, and it can later become part of a dry eye prevention system. Its ability to detect blinks can be used to analyze the blink rate and, if necessary, warn the user to blink more often and thus protect their eyes from drying out.


Appendix F

DVD Contents

/source/ - source code of the project solution
/dataset/ - image sequences from our own dataset
/pdf/ - PDF version of this bachelor thesis