Upload
hakhue
View
216
Download
0
Embed Size (px)
Citation preview
UNIVERSITI PUTRA MALAYSIA
GESTURE RECOGNITION USING WEB CAMERA
LEW YUAN POK.
FK 2004 25
GESTURE RECOGNITION USING WEB CAMERA
By
LEW YUAN POK
n 3 1 . , * , 4': t
Thesis Submitted to the School of Graduate Studies. . - Universiti Putra Malaysia in Fulfillment of the
Requirements for the Degree of Master of Science
February 2004
DEDICATION
My parents : Lew Kwan Shin & Lee Se Moy
My brothers and sister : Yuan Huai. Yuan Sing & Yuan Yee
and Yien Yien
Abstract of thesis presented to the Senate of Universiti Putra Malaysia in fulfillment of the requirements for the degree of Master of Science
GESTURE RECOGNITION USING WEB CAMERA
LEW YUAN POK
February 2004
Chairman : Associate Professor Hj. Abdul Rahman Ramli, Ph.D.
Faculty : Engineering
Gesture recognition represents an important ability by which a computer is able to
directly accept human gesture as input to trigger different actions just like
conventional input devices such as keyboard. mouse, joystick and etc. As the Human
Computer Interaction (HCI) progresses over the years, emphasis is placed more on
developing input devices which are most convenient and easy to use. Human gesture
is not only natural and intuitive to a user, but can also represent motions of high
degree of freedom which is of utmost importance in many applications especially in
virtual reality.
This thesis presents the design of an offline system which is capable of recognizing
hand postures fiom the visual input of a web camera. The hand segmentation is based
on image subtraction technique and a skin color modeling process. Fourier
iii
descriptors are used as the features to describe the geometry of different hand
postures while the recognition process is based on minimum distance classifier. The
results obtained indicate that the system is able to recognize hand postures with
reasonable accuracy.
Abstrak tesis yang dikemukakan kepada Senat Universiti Putra Malaysia untuk memenuhi keperluan untuk Ijazah Master Sains
PENGECAMAN PERGERAKAN BERPANDUKAN KAMERA WEB
Oleb
LEW YUAN POK
Februari 2004
Pengerusi
Fakulti
: Profesor Madya Hj. Abdul Rahman Ramli, Ph.D.
: Kejuruteraan
Pengecaman pergerakan merupakan sesuatu keupayaan yang penting di mana sebuah
komputer dapat menerirna pergerakan manusia sebagai input untuk melanwkan
pelbagai tindakan seperti yang dilakukan oleh peranti input tradisional misalnya
papan kekunci. tetikus, kayu bedik dan sebagainya. Seiring dengan perkembangan
cam interaksi antara manusia dengan komputer selama ini, lebih banyak penekanan
diberikan kepada pembangunan peranti input yang berguna dan senang dipakai.
Pergerakan manusia bukan sahaja semulajadi clan intuitif kepada seorang pengguna
komputer, tetapi juga &pat mewakili pergerakan yang berdarjah kebebasan tinggi,
menjadikannya arnat penting dalam banyak aplikasi terutamanya kenyataan maya.
Tesis ini membentangkan proses rekabentuk sebuah sistem luar talian yang mampu
mengecam keadaan tangan daripada input visual sebuah karnera web. Segmentasi
tangan adalah berdasarkan kaedah penolakan irnej dan sebuah proses permodelan
v
warna kulit. Deskriptor Fourier pula digunakan sebagai penghurai untuk
menggambarkan geometri keadaan tangan yang beriainan manakala proses
pengecaman adalah berpandukan pengelas jarak minimum. Keputusan yang
diperoleh menunjukkan bahawa sistem ini dapat mengecam keadaan tangan dengan
ketepatan yang rnunasabah.
ACKNOWLEDGEMENTS
My deepest thanks to my supervisor, Dr. Hj. Abdul Rahman Ramli, without
whom this project would never have materialized in the first place. He has constantly
encouraged me and given me clear guidance to make sure that I am able to complete
the thesis within a reasonable time frame. Also to both my co-supervisors, Dr.
Veeraraghavan Prakash and Mrs Roslizah Ali who have given me valuable advices
throughout the whole period of the writing of this thesis.
A very big thanks to Qussay Abbas Salih al-Badri. my senior in the research
group, who has been most generous in sharing his ideas and experiences. Also to my
colleagues. Koay Su Yeong and Beh Kok Siang who have provided me valuable
feedbacks on my thesis.
Not to mention also all the lecturers and staffs of Department of Computer
and Communication Systems Engineering as well as members of Multimedia and
Intelligent Systems Research Group who have all played their supporting roles.
Last but not least. 1 would like to thank my parents and family members who
have been most supportive in each and every way I could think of. Also, I owe many
thanks to Yien Yien, who has stood behind me no matter how harsh the situation is.
vii
I certify that an Examination Committee met on loth February 2004 to conduct the final examination of Lew Yuan Pok on his Master of Science thesis entitled "Gesture Recognition Using Web Camera" in accordance with Universiti Pertanian Xlalaysia (Higher Degree) Act 1980 and Universiti Pertanian Malaysia (Higher Degree) Regulations 1981. The Committee recommends that the candidate be awarded the relevant degree. Members of the Examination Committee are as follows:
Mohamad Khazani Abdullah, Ph.D. Faculty of Engineering Universiti Putra Malaysia (Chairman)
Hj. Abdul Rahman Ramli, Ph.D. Associate Professor Faculty of Engineering Universiti Putra Malaysia (Member)
Veeraraghavan Prakash, Ph.D. Faculty of Engineering Universiti Putra Malaysia (Member)
Roslizah .41i Faculty of Engineering Universiti Putra Malaysia (Member)
~ r o f e s s o r . ~ e ~ u t ) \ ~ e ~ School of Graduate Studies Universiti Putra 3lala~sia
Date: 1 7 JUN 2004
This thesis submitted to the Senate of Universiti Putra Malaysia and has been accepted as fi~lfillment of the requirements for the degree of Master of Science. The members of the Supervisory Committee are as follows:
Associate Professor Hj. Abdul Rahrnan Ramli, Ph.D. Faculty of Engineering Universiti Putra Malaysia (Chairman)
Veeraraghavan Prakash, Ph.D. Faculty of Engineering Universiti Putra Malaysia (Member)
Roslizah Ali Faculty of Engineering Universiti Putra Malaysia (Member)
Y AINI IDERIS, Ph.D. Professor/Dean School of Graduate Studies Universiti Putra Malaysia
Date : 0 8 SEP 2004
DECLARATION
1 hereby declare that the thesis is based on my original work except for quotations and citations which have been duly acknowledged. I also declare that it has not been previously or con-currently submitted for any other degree at UPM or other tertiary institution.
LEW YUAN POK
TABLE OF CONTENTS
DEDICATION ABSTRACT ABSTRAK ACKNOWLEDGEMENTS APPROVAL DECLARATION LIST OF FIGURES LIST OF TABLES LIST OF ABBREVIATIONS
CHAPTER
INTRODLTCTION 1 . 1 Background
1 . 1 . 1 Gesture Recognition 1.1.2 Web Camera Objectives Thesis Organization
LITERATURE REVIEW 2.1 Machine Vision
2.1.1 Working Principles of Machine Vision 2.1.2 Examples of Machine Vision Applications Computer Vision Gesture Recognition 2.3.1 Vision Based Gesture Recognition 2.3.2 Gesture Recognition Using Other Sensors Static Hand Gesture Recognition Dynamic Hand Gesture Recognition Hand Segmentation Methods Feature Extraction Methods Gesture Recognition Methods 2.8.1 Template Matching 2.8.2 Neural Network Methods 2.8.3 Statistical Methods 2.8.3 Rule-based Methods
3 METHODOLOGY 3.1 Overview 3.2 Hand Segmentation
3.2.1 lmage Subtraction 3.2.2 Segmentation for Gray Level Image 3.2.3 Segmentation for RGB Color Image 3.2.4 Segmentation for Normalized RGB Color Image 3.2.5 Skin Color Modeling
ii . . . I l l
v vii . . . Vl l l
X ... Xll l
xv svi
3.2.6 Segmentation with Skin Color Model ~e~resentation and Description 3.3.1 Preprocessing
3.3.1.1 Holes Filling 3.3. 1 2 Morphological Opening 3.3.1.3 Median Filtering Hand Representation 3.3.2.1 Boundary Extraction 3.3.2.2 Boundary Correction 3.3.2.3 Boundary Points Selection
3.3.3 Fourier Descriptors Recognition 3.4.1 Cluster Analysis 3 -4.2 K-means Clustering 3.4.3 Training Phase
3.4.3. I Preparation 3.4.3.2 Clustering Recognition Phase 3.4.4.1 Minimum Distance Classifier
RESULTS AND DISCUSSIONS 4. I Comparison of Segmentation Techniques
4.1.1 Segmentation for Gray Level Image 4.1.2 Segmentation for RGB Color Image 4.1.3 Segmentation for Normalized RGB Color Image 4.1.4 Summary Segmentation with Skin Color Model Preprocessing 4.3.1 Holes Filling 4.3.2 Morphological Opening 4.3.3 Median Filtering Hand Representation 4.4.1 Boundary Extraction 4.4.2 Boundary Correction 4.4.3 Boundary Points Selection Fourier Descriptors K-means Clustering of Training Samples Classification of Training Samples Classification of Testing Samples
COSCLUSION AND FUTURE RESEARCH 5.1 Summaq C 7 d .- Future Research Directions
REFEREWES APPEhDICES BIODATA OF THE AUTHOR
LIST OF FIGURES
Figure
2.1 Fiber Optic Inspection System (developed by Dynamic Motion Control)
Multiple Wavelength Images of A Single Apple
Elements Of Computer Vision Process
Static Hand Gesture Recognition System Block Diagram
Hand Segmentation Flow Chart
Original RGB Color Image
Illustration of the Concept in Image Subtraction
Gra). Level Image
Skin Color Modeled As Normal Distributions
Hand Representation and Description Flow Chart
Examples of Six Hand Postures
Illustration of the Block Operations Used in Holes Filling
Elements in Morphological Opening
Median Filtering Operation
Examples of Image with High Potential of Being Misclassified
Parameters Used in Correcting the Boundary
A Digital Boundary and Its Representation As A Complex Sequence
Hand Posture Recognition Process Flow Chart
Samples of Five Hand Images Used for Segmentation
Samples of Hand Images Representing Six Different Postures
Page
14
Segmentation Result Using Skin Color Modeling for Sample 6 - 106
xiii
Sample I I
Holes Filling Result for Sample 6 - Sample I 1
Morphological Opening Result for Sample 6 - Sample 1 1
Median Filtering Result for Sample 6 - Sample 1 1
Boundary Extraction Result for Sample 6 - Sample 1
Boundary Correction Result for Sample 6 - Sample I I
Hand B o u n d q Represented by Selected Points for Sample 6 - Sample 1 1
Training Samples for K-means Clustering
Testing Samples Used to Analyze the Recognition Performance
Euclidean Distance between Cluster Centers and Testing Images (a) - (h) in Table 4.9
Distortion in Hand Boundary Extracted fiom Testing Image in Table 4.9(f)
Euclidean Distance between Cluster Centers and Testing Images (a)-(f) in Table 4.8
xiv
LIST OF TABLES
Table
4.1
Page
Segmentation for Samples of Five Gray Level Images Based on 97 the Threshold Value of 10%. 15% and 20% Respectively
Segmentation for Samples of Five Images in the RGB Color Space 99 Based on the Threshold Value of 10% 15% and 20% Respectively
Segmentation for Samples of Five Images in the Normalized RGB 10 1 Color Space Based on the Threshold Value of 10%. 15% and 20% Respectively
Segmentation for Samples of Five Images Based on the Skin 103 Color Model Applying Threshold Value of 10%, 15% and 20% Respectivel).
Coordinates of Boundq Points for Sample 6
Steps for Deriving Fourier Descriptors
Total Squared Error (TSE) for Individual Cluster
Examples of Classification Result
Examples of Falsely Classified Testing Images
2 D 3 D m A1 CCD CMOS DFT DTW FSM FD GUI HMM HCI IR IC MIPS NIR PC QE RBFNN Rx's RNN RGB ROI USB VLSI VE
LIST OF ABBREVIATIONS
Two Dimension Three Dimension Analogue to Digital Artificial Intelligence Charge Couple Device Complex Metal Oxide Semiconductor Discrete Fourier Transform Dynamic Time Warping Finite State Machine Fourier Descriptors Graphical User Interface Hidden Markov Model Human Computer Interaction Infrared Integrated Circuit Millions of Instructions per Second Near Infrared Personal Computer Quantum Eficiency Radial Basis Function Neural Network Receivers Recurrent Neural Network Red, Green, Blue Region of Interest Universal Serial Bus Very Large Scale Integration Virtual Environment
xvi
CHAPTER 1
1.1 Background
The evolution of the interaction bemeen computer and human or better
known as Human Computer Interaction (HCI) has always been interesting. The first
generation computer user interface is a text based interface where user interact by
typing relevant commands. By typing the commands through keyboard, the computer
will perform relevant actions and inform the user of the outcome.
Then came the Graphical User Interface (GUI) where the main medium of
interaction are symbolic graphics called icons. Each icon is assigned a specific
meaning intended for specific application or certain action (Webopedia, 2003). In
order to carry out a specific action or to run a specific application, the user can just
click on the relevant icon by using a mouse. Texc is still dominant in this second
generation computer user interface as it is impossible to convert every line of
command into icon, however, the use of text is minimized or avoided when
necessap.
Nevertheless, it was soon realized that no matter how powerful the two kinds
of user interface are, they will never be able to replace the way human communicate
most naturally, i.e. through the use of gesture. As applications shift fiom 2D to 3D
and eventually Virtual Environment (VE), the use of conventional input devices such
as keyboard, mouse, trackball or joystick becomes more awkward and inconvenient
than ever. It was then understood that nothing is able to navigate better in such
applications than the human gesture itself. The human gesture is not only natural and
intuitive to a user, it can also represent motions of high degree of freedom impossible
to achieve using other input devices.
Owing to the many advantages displayed by the use of human gesture as a
powerful input device and the potential applications, gesture recognition has become
an important research field in recent years. Covering the diverse fields of motion
modeling, motion analysis, pattern recognition, machine learning and even
psycholinguistic studies, gesture recognition focuses on the study of human posture
and movement of different parts of body such as hand and head (Wu and Huang,
1999). These also become the major motivation of this thesis to study various aspects
of the vision based static hand gesture recognition system and to implement a system
which can eventually be used as input in VE.
1.1.1 Gesture Recognitioo
Gestures are loosely defined by different researchers in different disciplines.
From the point of view of biologists, the notion of gesture is to embrace all kinds of
instances where an individual engages in movements whose communicative intent is
paramount, manifest, and openly acknowledged (Nespoulous, 1986). This includes
pointing at an object to indicate a reference, putting a thumb up to indicate consent or
2
approval or conveying abstract meaning to one another using sign language. Gesture
recognition on the other hand can be defined as the ability to identify specific human
gestures by a machine or computer and using them to convey information or for
controlling a device.
Gestures are important part of human communication where large amount of
information is encoded in vet compact form such as specific hand shape or
orientation accompanied by certain movement. The ability to recognize these
gestures represents an oppommity to decode the information conveyed most
naturally by human. It is an alternative way of interaction between human and
computer where little learning is required on the user as gesture is part of human's
daily life communication.
Teaching a computer to recopize human gestures has been a goal of pursuit
by various researchers since the 1980's. Earlier efforts in this research are focused on
using a mechanical glove to capture the human gestures. On the mechanical glove,
sensors are used to measure the bending angle of various joints of the hand and the
pressure exerted b) the palm. The different bending angle measurement will
correspond to different geometric representation of the hand. In order to determine
the position and orientation of the forearm in space, tracking sensors are used.
However, the use of the mechanical glove is considered intrusive and unnatural to a
user.
The focus in gesture recognition research later shifis to vision based gesture
in conjunction with the advances made in the field of machine vision and
3
computer vision. Vision based gesture recognition is a gesture recognition approach
based on visual input. Imaging device such as camera is used to capture image
sequences of human motion. Geometric properties of human hand or head together
with their trajectories are then extracted from the image sequences to reconstruct
human gesture. The vision based approach proves to be an attractive alternative since
it is natural and not intrusive.
Earlier vision based gesture recognition system uses gray level image
sequences as the input. In 1992, first attempt was made to recognize hand gesture
from color video signals in real time. This was no coincidence as it is also the year
where the first frame gabber for color video signal is available (Kohler, 2002). This
marks the gradual shift of gesture recognition research from gray level video input to
the color video input. The presence of color provides an additional and powerful cue
for researchers especially in the area of segmentation.
Presently, gesture recognition still remains an active area of research which is
still in infancy. As different applications are put to the drawing board, more gesture
recognition techniques are still being improved, developed and created. This includes
areas in segmentation, tracking, motion modeling, feature extraction and recognition
techniques. Since there is no single known technique which can cater for all the
situations to be encountered, gesture recognition is highly problem orientated.
RRPUSTAKAAN SULTAN ABDUL StW4fI UNiVERBCTl PUTRA MALAYW
1.1.2 Web Camera
As explained in the Memam-Webster Online dictionary (2003). a web
camera or webcam is a camera used in transmitting live images over the World Wide
Web. Generally speaking, any video camera or digital camera that can perform this
function qualifies as a web camera. The term refers more specifically to the bct ion
perform rather than the hardware itself as suggested by the definition.
Over the years, as hosting live images in the World Wide Web becomes more
and more popular, a class of camera which is dedicated to this fiurction mainly plus
other additional features begins to emerge in the market and redefine the term, web
camera. It is a class of low cost camera usuilly in the range from above RMlOO to
RM 400 which is connected to a PC through parallel port or USB port also known as
pc camera.
Web camera is a class of camera which either use Charge Couple Device
(CCD) or Complex Metal Oxide Semiconductor (CMOS) as the photo sensor. It
usually has optical resolution of not more than 640x480 pixels and can produce
digital still image or video which can be directly stored inside a pc. Compared to the
conventional CCD camera which still use analogue signal, web camera proves to be
more convenient as no extra image capture card is needed to do the Analogue to
Digital (AID) conversion. However, lower sensitivity and higher noise level are the
two major drawbacks of web camera compared to higher quality CCD camera as
justified by its lower cost.
Despite the shortcomings in the ability of web camera, it is still an attractive
tool for performing gesture recognition especially when it is readily available in the
market at a substantially lower price compared to other CCD camera. Web camera is
still adequate to perform real-time gesture recognition as the key to its successful
implementation lies not in the camera but in the processing speed of the computer
carrying out the necessary manipulation of the image and the algorithm which
determines how to manipulate the image effectively and efficiently.
1.2 Objectives
Vision based static hand gesture recognition is a complex task which is
consisted of a series of structured steps. The aim of this thesis is to study the
necessary steps of vision based static hand gesture recognition which will eventually
lead to successful recognition of the human hand gestures off line using web camera.
The main objective of this thesis is to design a vision based static hand
gesture recognition system using MATLAB which is able to perform recognition of
six different sets of predefined hand postures.
The secondary objective of this thesis is to study the techniques involved in
various stages of hand gesture recognition task and to propose modification
whenever necessary.
1.3 Organization
The main issue to be addressed in this thesis is vision based static hand
gesture recognition. Omine recognition of six different sets of hand posture is
implemented using MATLAB 6.1 on a Pentium 111 667 with 256MB of memory. A
web camera is employed as the imaging device for capturing different hand postures.
The thesis is organized as follows:
Chapter 2 lays a background on the whole subject of computer vision and
machine vision. Detailed discussions are then given one gesture recognition
namely on vision based gesture recognition including facial gestures and hand
gestures. Hand gesture recognition is then divided into two major areas which
are static hand gesture recognition and dynamic hand gesture recognition
respectively. Major differences between the two areas are highlighted
especially in terms of the different approaches adopted. Different techniques
used for the various stages of the hand gesture recognition ranging from
segmentation, features extraction and gesture recognition technique are then
treated in details.
Chapter 3 presents the methodology involved in constructing the vision based
static hand gesture recognition system. Discussions however focus on the
three most important rntijor steps of the system namely: (1)hand
segmentation; (2)representation and description and (3)recognition. Hand
segmentation technique is adopted to extract hand region from the visual
input. Segmentation is done in gray level image, Red, Green, Blue (RGB)
color space and normalized RGB color space. A skin color modeling
techniques using the property of human skin color in color space is then
proposed. Representation and description on the other hand explains the
methods used starting from preprocessing of the extracted hand region until a
set of usehl features are extracted to represent the different hand postures.
Recognition 5 presents the techniques used in classifying the feature vectors
for the six different hand postures. Details on the steps used for implementing
K-means clustering and the recognition method used are presented as well.
Chapter 4 shows the results obtained from segmentation in different color
space. The results for segmentation using the proposed skin color model are
illustrated followed by the preprocessing, representation and eventually
recognition based on the set of features proposed.
Chapter 5 summarizes the thesis by restating the common problems arise in
different stages of the gesture recognition task, thus laying down the
requirements for a successful hand gesture recognition task as a whole and to
check how much this thesis has helped to answer these requirements. For the
questions and problems not addressed in other part of the thesis, they will be
addressed in the future research directions.