UNIVERSITI PUTRA MALAYSIA GESTURE RECOGNITION USING …psasir.upm.edu.my/5907/1/FK_2004_25(1-24).pdf · UNIVERSITI PUTRA MALAYSIA GESTURE RECOGNITION USING WEB CAMERA LEW YUAN POK

UNIVERSITI PUTRA MALAYSIA

GESTURE RECOGNITION USING WEB CAMERA

LEW YUAN POK.

FK 2004 25


By

LEW YUAN POK

n 3 1 . , * , 4': t

Thesis Submitted to the School of Graduate Studies. . - Universiti Putra Malaysia in Fulfillment of the

Requirements for the Degree of Master of Science

February 2004

DEDICATION

My parents : Lew Kwan Shin & Lee Se Moy

My brothers and sister : Yuan Huai. Yuan Sing & Yuan Yee

and Yien Yien

Abstract of thesis presented to the Senate of Universiti Putra Malaysia in fulfillment of the requirements for the degree of Master of Science


LEW YUAN POK

February 2004

Chairman : Associate Professor Hj. Abdul Rahman Ramli, Ph.D.

Faculty : Engineering

Gesture recognition represents an important ability by which a computer is able to

directly accept human gesture as input to trigger different actions just like

conventional input devices such as keyboard. mouse, joystick and etc. As the Human

Computer Interaction (HCI) progresses over the years, emphasis is placed more on

developing input devices which are most convenient and easy to use. Human gesture

is not only natural and intuitive to a user, but can also represent motions of high

degree of freedom which is of utmost importance in many applications especially in

virtual reality.

This thesis presents the design of an offline system which is capable of recognizing

hand postures fiom the visual input of a web camera. The hand segmentation is based

on image subtraction technique and a skin color modeling process. Fourier

iii

descriptors are used as the features to describe the geometry of different hand

postures while the recognition process is based on minimum distance classifier. The

results obtained indicate that the system is able to recognize hand postures with

reasonable accuracy.

Abstrak tesis yang dikemukakan kepada Senat Universiti Putra Malaysia untuk memenuhi keperluan untuk Ijazah Master Sains

PENGECAMAN PERGERAKAN BERPANDUKAN KAMERA WEB

Oleb

LEW YUAN POK

Februari 2004

Pengerusi

Fakulti

: Profesor Madya Hj. Abdul Rahman Ramli, Ph.D.

: Kejuruteraan

Pengecaman pergerakan merupakan sesuatu keupayaan yang penting di mana sebuah

komputer dapat menerirna pergerakan manusia sebagai input untuk melanwkan

pelbagai tindakan seperti yang dilakukan oleh peranti input tradisional misalnya

papan kekunci. tetikus, kayu bedik dan sebagainya. Seiring dengan perkembangan

cam interaksi antara manusia dengan komputer selama ini, lebih banyak penekanan

diberikan kepada pembangunan peranti input yang berguna dan senang dipakai.

Pergerakan manusia bukan sahaja semulajadi clan intuitif kepada seorang pengguna

komputer, tetapi juga &pat mewakili pergerakan yang berdarjah kebebasan tinggi,

menjadikannya arnat penting dalam banyak aplikasi terutamanya kenyataan maya.

Tesis ini membentangkan proses rekabentuk sebuah sistem luar talian yang mampu

mengecam keadaan tangan daripada input visual sebuah karnera web. Segmentasi

tangan adalah berdasarkan kaedah penolakan irnej dan sebuah proses permodelan

v

warna kulit. Deskriptor Fourier pula digunakan sebagai penghurai untuk

menggambarkan geometri keadaan tangan yang beriainan manakala proses

pengecaman adalah berpandukan pengelas jarak minimum. Keputusan yang

diperoleh menunjukkan bahawa sistem ini dapat mengecam keadaan tangan dengan

ketepatan yang rnunasabah.

ACKNOWLEDGEMENTS

My deepest thanks to my supervisor, Dr. Hj. Abdul Rahman Ramli, without

whom this project would never have materialized in the first place. He has constantly

encouraged me and given me clear guidance to make sure that I am able to complete

the thesis within a reasonable time frame. Also to both my co-supervisors, Dr.

Veeraraghavan Prakash and Mrs Roslizah Ali who have given me valuable advices

throughout the whole period of the writing of this thesis.

A very big thanks to Qussay Abbas Salih al-Badri. my senior in the research

group, who has been most generous in sharing his ideas and experiences. Also to my

colleagues. Koay Su Yeong and Beh Kok Siang who have provided me valuable

feedbacks on my thesis.

Not to mention also all the lecturers and staffs of Department of Computer

and Communication Systems Engineering as well as members of Multimedia and

Intelligent Systems Research Group who have all played their supporting roles.

Last but not least. 1 would like to thank my parents and family members who

have been most supportive in each and every way I could think of. Also, I owe many

thanks to Yien Yien, who has stood behind me no matter how harsh the situation is.

vii

I certify that an Examination Committee met on loth February 2004 to conduct the final examination of Lew Yuan Pok on his Master of Science thesis entitled "Gesture Recognition Using Web Camera" in accordance with Universiti Pertanian Xlalaysia (Higher Degree) Act 1980 and Universiti Pertanian Malaysia (Higher Degree) Regulations 1981. The Committee recommends that the candidate be awarded the relevant degree. Members of the Examination Committee are as follows:

Mohamad Khazani Abdullah, Ph.D. Faculty of Engineering Universiti Putra Malaysia (Chairman)

Hj. Abdul Rahman Ramli, Ph.D. Associate Professor Faculty of Engineering Universiti Putra Malaysia (Member)

Veeraraghavan Prakash, Ph.D. Faculty of Engineering Universiti Putra Malaysia (Member)

Roslizah .41i Faculty of Engineering Universiti Putra Malaysia (Member)

~ r o f e s s o r . ~ e ~ u t ) \ ~ e ~ School of Graduate Studies Universiti Putra 3lala~sia

Date: 1 7 JUN 2004

This thesis submitted to the Senate of Universiti Putra Malaysia and has been accepted as fi~lfillment of the requirements for the degree of Master of Science. The members of the Supervisory Committee are as follows:

Associate Professor Hj. Abdul Rahrnan Ramli, Ph.D. Faculty of Engineering Universiti Putra Malaysia (Chairman)

Veeraraghavan Prakash, Ph.D. Faculty of Engineering Universiti Putra Malaysia (Member)

Roslizah Ali Faculty of Engineering Universiti Putra Malaysia (Member)

Y AINI IDERIS, Ph.D. Professor/Dean School of Graduate Studies Universiti Putra Malaysia

Date : 0 8 SEP 2004

DECLARATION

1 hereby declare that the thesis is based on my original work except for quotations and citations which have been duly acknowledged. I also declare that it has not been previously or con-currently submitted for any other degree at UPM or other tertiary institution.

LEW YUAN POK

TABLE OF CONTENTS

DEDICATION ABSTRACT ABSTRAK ACKNOWLEDGEMENTS APPROVAL DECLARATION LIST OF FIGURES LIST OF TABLES LIST OF ABBREVIATIONS

CHAPTER

INTRODLTCTION 1 . 1 Background

1 . 1 . 1 Gesture Recognition 1.1.2 Web Camera Objectives Thesis Organization

LITERATURE REVIEW 2.1 Machine Vision

2.1.1 Working Principles of Machine Vision 2.1.2 Examples of Machine Vision Applications Computer Vision Gesture Recognition 2.3.1 Vision Based Gesture Recognition 2.3.2 Gesture Recognition Using Other Sensors Static Hand Gesture Recognition Dynamic Hand Gesture Recognition Hand Segmentation Methods Feature Extraction Methods Gesture Recognition Methods 2.8.1 Template Matching 2.8.2 Neural Network Methods 2.8.3 Statistical Methods 2.8.3 Rule-based Methods

3 METHODOLOGY 3.1 Overview 3.2 Hand Segmentation

3.2.1 lmage Subtraction 3.2.2 Segmentation for Gray Level Image 3.2.3 Segmentation for RGB Color Image 3.2.4 Segmentation for Normalized RGB Color Image 3.2.5 Skin Color Modeling

ii . . . I l l

v vii . . . Vl l l

X ... Xll l

xv svi

3.2.6 Segmentation with Skin Color Model ~e~resentation and Description 3.3.1 Preprocessing

3.3.1.1 Holes Filling 3.3. 1 2 Morphological Opening 3.3.1.3 Median Filtering Hand Representation 3.3.2.1 Boundary Extraction 3.3.2.2 Boundary Correction 3.3.2.3 Boundary Points Selection

3.3.3 Fourier Descriptors Recognition 3.4.1 Cluster Analysis 3 -4.2 K-means Clustering 3.4.3 Training Phase

3.4.3. I Preparation 3.4.3.2 Clustering Recognition Phase 3.4.4.1 Minimum Distance Classifier

RESULTS AND DISCUSSIONS 4. I Comparison of Segmentation Techniques

4.1.1 Segmentation for Gray Level Image 4.1.2 Segmentation for RGB Color Image 4.1.3 Segmentation for Normalized RGB Color Image 4.1.4 Summary Segmentation with Skin Color Model Preprocessing 4.3.1 Holes Filling 4.3.2 Morphological Opening 4.3.3 Median Filtering Hand Representation 4.4.1 Boundary Extraction 4.4.2 Boundary Correction 4.4.3 Boundary Points Selection Fourier Descriptors K-means Clustering of Training Samples Classification of Training Samples Classification of Testing Samples

COSCLUSION AND FUTURE RESEARCH 5.1 Summaq C 7 d .- Future Research Directions

REFEREWES APPEhDICES BIODATA OF THE AUTHOR

LIST OF FIGURES

Figure

2.1 Fiber Optic Inspection System (developed by Dynamic Motion Control)

Multiple Wavelength Images of A Single Apple

Elements Of Computer Vision Process

Static Hand Gesture Recognition System Block Diagram

Hand Segmentation Flow Chart

Original RGB Color Image

Illustration of the Concept in Image Subtraction

Gra). Level Image

Skin Color Modeled As Normal Distributions

Hand Representation and Description Flow Chart

Examples of Six Hand Postures

Illustration of the Block Operations Used in Holes Filling

Elements in Morphological Opening

Median Filtering Operation

Examples of Image with High Potential of Being Misclassified

Parameters Used in Correcting the Boundary

A Digital Boundary and Its Representation As A Complex Sequence

Hand Posture Recognition Process Flow Chart

Samples of Five Hand Images Used for Segmentation

Samples of Hand Images Representing Six Different Postures

Page

14

Segmentation Result Using Skin Color Modeling for Sample 6 - 106

xiii

Sample I I

Holes Filling Result for Sample 6 - Sample I 1

Morphological Opening Result for Sample 6 - Sample 1 1

Median Filtering Result for Sample 6 - Sample 1 1

Boundary Extraction Result for Sample 6 - Sample 1

Boundary Correction Result for Sample 6 - Sample I I

Hand B o u n d q Represented by Selected Points for Sample 6 - Sample 1 1

Training Samples for K-means Clustering

Testing Samples Used to Analyze the Recognition Performance

Euclidean Distance between Cluster Centers and Testing Images (a) - (h) in Table 4.9

Distortion in Hand Boundary Extracted fiom Testing Image in Table 4.9(f)

Euclidean Distance between Cluster Centers and Testing Images (a)-(f) in Table 4.8

xiv

LIST OF TABLES

Table

4.1

Page

Segmentation for Samples of Five Gray Level Images Based on 97 the Threshold Value of 10%. 15% and 20% Respectively

Segmentation for Samples of Five Images in the RGB Color Space 99 Based on the Threshold Value of 10% 15% and 20% Respectively

Segmentation for Samples of Five Images in the Normalized RGB 10 1 Color Space Based on the Threshold Value of 10%. 15% and 20% Respectively

Segmentation for Samples of Five Images Based on the Skin 103 Color Model Applying Threshold Value of 10%, 15% and 20% Respectivel).

Coordinates of Boundq Points for Sample 6

Steps for Deriving Fourier Descriptors

Total Squared Error (TSE) for Individual Cluster

Examples of Classification Result

Examples of Falsely Classified Testing Images

2 D 3 D m A1 CCD CMOS DFT DTW FSM FD GUI HMM HCI IR IC MIPS NIR PC QE RBFNN Rx's RNN RGB ROI USB VLSI VE

LIST OF ABBREVIATIONS

Two Dimension Three Dimension Analogue to Digital Artificial Intelligence Charge Couple Device Complex Metal Oxide Semiconductor Discrete Fourier Transform Dynamic Time Warping Finite State Machine Fourier Descriptors Graphical User Interface Hidden Markov Model Human Computer Interaction Infrared Integrated Circuit Millions of Instructions per Second Near Infrared Personal Computer Quantum Eficiency Radial Basis Function Neural Network Receivers Recurrent Neural Network Red, Green, Blue Region of Interest Universal Serial Bus Very Large Scale Integration Virtual Environment

xvi

CHAPTER 1

1.1 Background

The evolution of the interaction bemeen computer and human or better

known as Human Computer Interaction (HCI) has always been interesting. The first

generation computer user interface is a text based interface where user interact by

typing relevant commands. By typing the commands through keyboard, the computer

will perform relevant actions and inform the user of the outcome.

Then came the Graphical User Interface (GUI) where the main medium of

interaction are symbolic graphics called icons. Each icon is assigned a specific

meaning intended for specific application or certain action (Webopedia, 2003). In

order to carry out a specific action or to run a specific application, the user can just

click on the relevant icon by using a mouse. Texc is still dominant in this second

generation computer user interface as it is impossible to convert every line of

command into icon, however, the use of text is minimized or avoided when

necessap.

Nevertheless, it was soon realized that no matter how powerful the two kinds

of user interface are, they will never be able to replace the way human communicate

most naturally, i.e. through the use of gesture. As applications shift fiom 2D to 3D

and eventually Virtual Environment (VE), the use of conventional input devices such

as keyboard, mouse, trackball or joystick becomes more awkward and inconvenient

than ever. It was then understood that nothing is able to navigate better in such

applications than the human gesture itself. The human gesture is not only natural and

intuitive to a user, it can also represent motions of high degree of freedom impossible

to achieve using other input devices.

Owing to the many advantages displayed by the use of human gesture as a

powerful input device and the potential applications, gesture recognition has become

an important research field in recent years. Covering the diverse fields of motion

modeling, motion analysis, pattern recognition, machine learning and even

psycholinguistic studies, gesture recognition focuses on the study of human posture

and movement of different parts of body such as hand and head (Wu and Huang,

1999). These also become the major motivation of this thesis to study various aspects

of the vision based static hand gesture recognition system and to implement a system

which can eventually be used as input in VE.

1.1.1 Gesture Recognitioo

Gestures are loosely defined by different researchers in different disciplines.

From the point of view of biologists, the notion of gesture is to embrace all kinds of

instances where an individual engages in movements whose communicative intent is

paramount, manifest, and openly acknowledged (Nespoulous, 1986). This includes

pointing at an object to indicate a reference, putting a thumb up to indicate consent or

2

approval or conveying abstract meaning to one another using sign language. Gesture

recognition on the other hand can be defined as the ability to identify specific human

gestures by a machine or computer and using them to convey information or for

controlling a device.

Gestures are important part of human communication where large amount of

information is encoded in vet compact form such as specific hand shape or

orientation accompanied by certain movement. The ability to recognize these

gestures represents an oppommity to decode the information conveyed most

naturally by human. It is an alternative way of interaction between human and

computer where little learning is required on the user as gesture is part of human's

daily life communication.

Teaching a computer to recopize human gestures has been a goal of pursuit

by various researchers since the 1980's. Earlier efforts in this research are focused on

using a mechanical glove to capture the human gestures. On the mechanical glove,

sensors are used to measure the bending angle of various joints of the hand and the

pressure exerted b) the palm. The different bending angle measurement will

correspond to different geometric representation of the hand. In order to determine

the position and orientation of the forearm in space, tracking sensors are used.

However, the use of the mechanical glove is considered intrusive and unnatural to a

user.

The focus in gesture recognition research later shifis to vision based gesture

in conjunction with the advances made in the field of machine vision and

3

computer vision. Vision based gesture recognition is a gesture recognition approach

based on visual input. Imaging device such as camera is used to capture image

sequences of human motion. Geometric properties of human hand or head together

with their trajectories are then extracted from the image sequences to reconstruct

human gesture. The vision based approach proves to be an attractive alternative since

it is natural and not intrusive.

Earlier vision based gesture recognition system uses gray level image

sequences as the input. In 1992, first attempt was made to recognize hand gesture

from color video signals in real time. This was no coincidence as it is also the year

where the first frame gabber for color video signal is available (Kohler, 2002). This

marks the gradual shift of gesture recognition research from gray level video input to

the color video input. The presence of color provides an additional and powerful cue

for researchers especially in the area of segmentation.

Presently, gesture recognition still remains an active area of research which is

still in infancy. As different applications are put to the drawing board, more gesture

recognition techniques are still being improved, developed and created. This includes

areas in segmentation, tracking, motion modeling, feature extraction and recognition

techniques. Since there is no single known technique which can cater for all the

situations to be encountered, gesture recognition is highly problem orientated.

RRPUSTAKAAN SULTAN ABDUL StW4fI UNiVERBCTl PUTRA MALAYW

1.1.2 Web Camera

As explained in the Memam-Webster Online dictionary (2003). a web

camera or webcam is a camera used in transmitting live images over the World Wide

Web. Generally speaking, any video camera or digital camera that can perform this

function qualifies as a web camera. The term refers more specifically to the bct ion

perform rather than the hardware itself as suggested by the definition.

Over the years, as hosting live images in the World Wide Web becomes more

and more popular, a class of camera which is dedicated to this fiurction mainly plus

other additional features begins to emerge in the market and redefine the term, web

camera. It is a class of low cost camera usuilly in the range from above RMlOO to

RM 400 which is connected to a PC through parallel port or USB port also known as

pc camera.

Web camera is a class of camera which either use Charge Couple Device

(CCD) or Complex Metal Oxide Semiconductor (CMOS) as the photo sensor. It

usually has optical resolution of not more than 640x480 pixels and can produce

digital still image or video which can be directly stored inside a pc. Compared to the

conventional CCD camera which still use analogue signal, web camera proves to be

more convenient as no extra image capture card is needed to do the Analogue to

Digital (AID) conversion. However, lower sensitivity and higher noise level are the

two major drawbacks of web camera compared to higher quality CCD camera as

justified by its lower cost.

Despite the shortcomings in the ability of web camera, it is still an attractive

tool for performing gesture recognition especially when it is readily available in the

market at a substantially lower price compared to other CCD camera. Web camera is

still adequate to perform real-time gesture recognition as the key to its successful

implementation lies not in the camera but in the processing speed of the computer

carrying out the necessary manipulation of the image and the algorithm which

determines how to manipulate the image effectively and efficiently.

1.2 Objectives

Vision based static hand gesture recognition is a complex task which is

consisted of a series of structured steps. The aim of this thesis is to study the

necessary steps of vision based static hand gesture recognition which will eventually

lead to successful recognition of the human hand gestures off line using web camera.

The main objective of this thesis is to design a vision based static hand

gesture recognition system using MATLAB which is able to perform recognition of

six different sets of predefined hand postures.

The secondary objective of this thesis is to study the techniques involved in

various stages of hand gesture recognition task and to propose modification

whenever necessary.

1.3 Organization

The main issue to be addressed in this thesis is vision based static hand

gesture recognition. Omine recognition of six different sets of hand posture is

implemented using MATLAB 6.1 on a Pentium 111 667 with 256MB of memory. A

web camera is employed as the imaging device for capturing different hand postures.

The thesis is organized as follows:

Chapter 2 lays a background on the whole subject of computer vision and

machine vision. Detailed discussions are then given one gesture recognition

namely on vision based gesture recognition including facial gestures and hand

gestures. Hand gesture recognition is then divided into two major areas which

are static hand gesture recognition and dynamic hand gesture recognition

respectively. Major differences between the two areas are highlighted

especially in terms of the different approaches adopted. Different techniques

used for the various stages of the hand gesture recognition ranging from

segmentation, features extraction and gesture recognition technique are then

treated in details.

Chapter 3 presents the methodology involved in constructing the vision based

static hand gesture recognition system. Discussions however focus on the

three most important rntijor steps of the system namely: (1)hand

segmentation; (2)representation and description and (3)recognition. Hand

segmentation technique is adopted to extract hand region from the visual

input. Segmentation is done in gray level image, Red, Green, Blue (RGB)

color space and normalized RGB color space. A skin color modeling

techniques using the property of human skin color in color space is then

proposed. Representation and description on the other hand explains the

methods used starting from preprocessing of the extracted hand region until a

set of usehl features are extracted to represent the different hand postures.

Recognition 5 presents the techniques used in classifying the feature vectors

for the six different hand postures. Details on the steps used for implementing

K-means clustering and the recognition method used are presented as well.

Chapter 4 shows the results obtained from segmentation in different color

space. The results for segmentation using the proposed skin color model are

illustrated followed by the preprocessing, representation and eventually

recognition based on the set of features proposed.

Chapter 5 summarizes the thesis by restating the common problems arise in

different stages of the gesture recognition task, thus laying down the

requirements for a successful hand gesture recognition task as a whole and to

check how much this thesis has helped to answer these requirements. For the

questions and problems not addressed in other part of the thesis, they will be

addressed in the future research directions.

Documents

UNIVERSITI PUTRA MALAYSIA GESTURE RECOGNITION USING …psasir.upm.edu.my/5907/1/FK_2004_25(1-24).pdf · UNIVERSITI PUTRA MALAYSIA GESTURE RECOGNITION USING WEB CAMERA LEW YUAN POK