
Page 1: Final Project Report

Hand Gesture Recognition Using Neural Networks

A Project Report On

“Hand Gesture Recognition using Neural Networks”

Submitted by

Kiran P V Exam No. B3223021
Vidit Mediratta Exam No. B3223027
Gaurav Sharma Exam No. B3223042

Under the Guidance of Prof. Vijay Karra

For the partial fulfillment of

B.E. (Electronics & Telecommunication) 2008-2009

To

DEPARTMENT OF ELECTRONICS & TELECOMMUNICATION

ARMY INSTITUTE OF TECHNOLOGY
DIGHI HILLS, PUNE-411015

Under

University of Pune

CERTIFICATE

This is to certify that Kiran P V, Gaurav Sharma and Vidit Mediratta have

successfully submitted the seminar report on

“HAND GESTURE RECOGNITION USING NEURAL NETWORK”

during the academic year 2008-2009, in partial fulfillment of the requirements for the

Bachelor’s Degree Program in Engineering (Electronics and Telecommunication)

under the University of Pune.

Mrs. Surekha K.S, Head of Department, Electronics and Telecommunication
Prof. Vijay Karra, Project Guide, Electronics and Telecommunication

Mrs. Surekha K.S, Principal, Army Institute of Technology, Dighi Hills, Pune-411015


ACKNOWLEDGEMENT

We wish to express our sincere gratitude to our guide Prof. Vijay Karra for his valuable

guidance at all stages of our project. We acknowledge the whole hearted, unreserved and positive

encouragement on his part, which helped us to tackle all our problems to ensure successful

completion of the project.

We are thankful to the Staff of Department of Electronics & Telecommunication, A.I.T., for

all the direct and indirect help and for making available the resources of the department for the

timely completion of our project.

We are also thankful to Prof. Surekha K S for her valuable suggestions.

We sincerely believe that our guides were the motivating forces behind the project. It was their

constant encouragement and constructive criticism that has made our project achieve its present

form.

We would be ungrateful if we did not acknowledge our family and friends who were always by

our side.

Finally, to those we have named, we express our deep gratitude; and to those we

haven’t, please know that even though you are unnamed, you are appreciated more than you know.

Kiran P V

Vidit Mediratta

Gaurav Sharma


Table of contents

1. Abstract
2. Introduction
   A. Brief Description
   B. Literature Survey
   C. Software Engineering Approach
3. Problem Definition
4. Design
   I.
      A. Hand Gesture Recognition
      B. Image Database
      C. Image Processing
      D. Matlab
      E. Neural Network
      F. Block Diagram
   II. Defining the Different Issues
      A. Database Creation
      B. Counting the Fingers
      C. Matlab Operations
      D. Neuron Model
      E. Microcontroller & Robot
5. Stepwise Procedure Flow
6. Time Activity Chart
7. Conclusion
8. Future Scope
9. Bibliography


ABSTRACT

Hand gesture recognition techniques have been studied for more than two decades. Several

solutions have been developed; however, little attention has been paid to the human factors, e.g.

the intuitiveness of the applied hand gestures. This study was inspired by the movie Minority

Report, in which a gesture-based interface was presented to a

large audience. In the movie, a video-browsing application was controlled by hand gestures.

Nowadays the tracking of hand movements and the computer recognition of gestures is

realizable; however, for a usable system it is essential to have an intuitive set of gestures. The

system functions used in Minority Report were reverse engineered and a user study was

conducted, in which participants were asked to express these functions by means of hand

gestures. We were interested in how people formulate gestures and whether we could find any

pattern in these gestures. In particular, we focused on the types of gestures in order to study

intuitiveness, and on the kinetic features to discover how they influence computer recognition.

We found that there are typical gestures for each function, and these are not necessarily related to

the technology people are used to. This result suggests that an intuitive set of gestures can be

designed, which is not only usable in this specific application, but can be generalized for other

purposes as well. Furthermore, directions are given for computer recognition of gestures

regarding the number of hands used and the dimensions of the space where the gestures are

formulated.


INTRODUCTION


BRIEF DESCRIPTION

Several successful approaches to spatio-temporal signal processing such as speech recognition

and hand gesture recognition have been proposed. Most of them involve time alignment which

requires substantial computation and considerable memory storage. In this paper, we present a

neural-network-based approach to spatio-temporal pattern recognition. This approach employs

a powerful method based on Hyper Rectangular Composite Neural Networks (HRCNNs) for

selecting templates; therefore, the otherwise considerable memory requirement is alleviated.

Due to congenital malfunctions, diseases, head injuries, or virus infections, deaf or

non-vocal individuals are unable to communicate with hearing persons through speech. They

use sign language or hand gestures to express themselves, however, most hearing persons do

not have the special sign language expertise. Hand gestures can be classified into two classes:

(1) static hand gestures, which rely only on information about the angles of the fingers, and (2)

dynamic hand gestures, which rely not only on the fingers’ flex angles but also on the hand

trajectories and orientations. The dynamic hand gestures can be further divided into two

subclasses. The first subclass consists of hand gestures involving hand movements, and the

second subclass consists of hand gestures involving finger movements without changing

the position of the hands. That is, it requires at least two different hand shapes connected

sequentially to form a particular hand gesture. Therefore samples of these hand gestures are

spatio-temporal patterns. The basic idea of our method for recognizing these spatio-temporal

hand gestures is as follows. We generate templates for each basic hand shape by training a

Hyper Rectangular Composite Neural Network (HRCNN). Templates for each hand shape are

then represented in the form of crisp IF-THEN rules, which are extracted from the values of


synaptic weights of the corresponding trained HRCNN. The accumulated similarity associated

with all samples of the input is computed for each hand gesture in the vocabulary, and the

unknown gesture is classified as the gesture yielding the highest accumulated similarity.
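Concretely, a crisp rule extracted from an HRCNN is a hyperrectangle: it fires when every feature of a sample lies between the rule’s lower and upper bounds. The following Python sketch illustrates this classification scheme; the rule format, the toy vocabulary and the similarity count are simplified assumptions, not the exact HRCNN formulation.

```python
# Illustrative sketch only: classifying a feature vector with crisp
# hyperrectangle rules, in the spirit of rules extracted from a
# trained HRCNN. The bounds and vocabulary below are made up.

def matches(rule, x):
    """A crisp IF-THEN rule fires when every feature of x lies
    inside the rule's hyperrectangle (lower/upper bound pair)."""
    lo, hi = rule
    return all(l <= v <= h for l, v, h in zip(lo, x, hi))

def similarity(rules, x):
    """Accumulated similarity: here, simply how many rules of a
    gesture class the sample satisfies."""
    return sum(matches(r, x) for r in rules)

def classify(vocabulary, x):
    """Pick the gesture whose rules yield the highest accumulated
    similarity for the input sample x."""
    return max(vocabulary, key=lambda g: similarity(vocabulary[g], x))

# Toy vocabulary of two hand shapes, one rule each
vocabulary = {
    "one":  [((0.0, 0.0), (0.5, 0.5))],
    "five": [((0.5, 0.5), (1.0, 1.0))],
}
print(classify(vocabulary, (0.8, 0.9)))  # -> five
```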

Developing sign language applications for deaf people can be very important, as many of them,

being unable to speak a language, are also unable to read or write a spoken language.

Ideally, a translation systems would make it possible to communicate with deaf people.

Compared to speech commands, hand gestures are advantageous in noisy environments, in

situations where speech commands would be disturbing, as well as for communicating

quantitative information and spatial relationships.

A gesture is a form of non-verbal communication made with a part of the body and used instead

of verbal communication (or in combination with it). Most people use gestures and body

language in addition to words when they speak. A sign language is a language which uses

gestures instead of sound to convey meaning combining hand-shapes, orientation and movement

of the hands, arms or body, facial expressions and lip-patterns. Similar to automatic speech

recognition (ASR), we focus in gesture recognition which can be later translated to a certain

machine movement.

The goal of this project is to develop a program implementing real time gesture recognition. At

any time, a user can exhibit his hand doing a specific gesture in front of a video camera linked to

a computer. However, the user is not supposed to be exactly at the same place when showing his

hand. The program has to collect pictures of this gesture thanks to the video camera, to analyze it

and to identify the sign. It has to do it as fast as possible, given that real time processing is

required. To simplify the project, it has been decided that the identification would consist

in counting the number of fingers that are shown by the user in the input picture.

We propose a fast algorithm for automatically recognizing a limited set of gestures from hand

images for a robot control application. Hand gesture recognition is a challenging problem in its

general form. We consider a fixed set of manual commands and a reasonably structured

environment, and develop a simple, yet effective, procedure for gesture recognition. Our

approach contains steps for segmenting the hand region, locating the fingers and finally

classifying the gesture. The algorithm is invariant to translation, rotation, and scale of the

hand. We demonstrate the effectiveness of the technique on real imagery.


LITERATURE SURVEY

Objective:

Our objective is to identify requirements (i.e., quality attributes and functional

requirements) for Gesture Based Recognition. We especially focus on requirements

for research tools that target the domains of visualization for software maintenance,

reengineering, and reverse engineering.

Method:

The requirements are identified with a comprehensive literature survey based on relevant

publications in journals, conference proceedings, and theses. We have referred to

documents and journals available on the net for the same. Most of the data has been taken

from the IEEE website. As our library has an online subscription to the IEEE journals, it

provided immense help in locating the resources.

The various journals referred are:

1) Implementation of adaptive feed-forward algorithm by Jaroslaw Szewinski and Wojciech

Jalmuzna, University of Technology, Institute of Electronic Systems, Warsaw, Poland.

This deals with the description of the various algorithms used in neural networks, viz.

feed-forward (FF), feedback (FB), and adaptive feed-forward (AFF).

2) Gesture Based Robot Control by V. S. Rao and C. Mahanta, Department of

Electronics and Communication Engineering, Indian Institute of Technology, Guwahati.


This journal deals with past and recent developments in gesture recognition systems. It

surveys the work of different scientists in different parts of the globe working on the

same aim: a visual gesture recognition system for controlling robots.

3) A Fast Algorithm For Vision-Based Hand Gesture Recognition For Robot

Control by Asanterabi Malima, Erol Özgür, and Müjdat Çetin, Faculty of Engineering and

Natural Sciences, Sabancı University, Tuzla, İstanbul, Turkey.

The approach contains steps for segmenting the hand region, locating the fingers, and

finally classifying the gesture. The algorithm is invariant to translation, rotation, and scale of

the hand.

4) A Gesture controlled robot for object perception and Manipulation by Mark

Batcher, Institute of Neuroinformatics, Germany.

Gripsee is the name of the robot whose design is discussed in the paper; it is used

for identifying an object, grasping it, and moving it to a new position. It serves as a

multipurpose robot which can perform a number of tasks, and it is used as a service robot.

5) Programming-By-Example Gesture Recognition by Kevin Gabayan, Steven Lansel .

Machine learning and hardware improvements to a programming-by-example rapid

prototyping system are proposed. This paper deals with the dynamic time warping gesture

recognition approach involving single signal channels.


SOFTWARE ENGINEERING APPROACH

For developing the code, and the whole algorithm, it was preferable to use Matlab. Indeed, in this

environment, image displaying, graphical analysis and image processing turn into a simple

enough issue concerning the coding, because Matlab has a huge and very complete “Image

Processing Toolbox”, and the fact that Matlab is optimized for matrix-based calculus makes any

image treatment easier, given that any image can be considered as a matrix.

That’s why the whole code was developed first in the Matlab environment. Only the

code of the Neural Network Method and of the Weighted Averaging Analysis method is

provided. Indeed, given that the latter is a kind of combination of the Pixel Counting Method

and of the Edge Counting Method, their respective codes may be extracted from the code of the

Weighted Averaging Method.

For the movement of the robot, the program has been written in assembly language, since it is

most suitable and we are well aware of the subject. The IC used is the 8051 microcontroller;

hence the code was written and tested in the RIDE software.


PROBLEM DEFINITION

The experimental setup consists of a digital camera used to take the images. The camera

is interfaced to a computer, which is used to create the database and to analyze the

images. The computer runs a program prepared in MATLAB for the various

operations on the images. Using the Neural Network Toolbox, analysis of the images is done.

The initial step is to create the database of the images which are used for training &

testing. The image database can have different formats. Images can be either hand-drawn,

digitized photographs or three-dimensional hand models. Photographs were used, as they are the

most realistic approach. Two operations were carried out in all of the images. They were

converted to grayscale and the background was made uniform. The images with internet

databases already had uniform backgrounds but the ones taken with the digital camera

had to be processed in Photoshop. The pattern recognition system that will be used

consists of some transformation T, which converts an image into a feature vector, which

will then be compared with feature vectors of a training set of gestures.
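The pipeline above (a transformation T that maps an image to a feature vector, followed by comparison against the training set’s feature vectors) can be sketched as follows. The particular choice of T (per-row pixel counts of a binary image) and the squared-Euclidean nearest-neighbour match are hypothetical stand-ins, not the report’s actual transformation.

```python
# Illustrative sketch of the T-then-compare pattern recognition
# scheme. T and the training vectors below are made up.

def T(image):
    """Feature vector: number of 'on' pixels in each row of a
    binary image (an illustrative choice of transformation)."""
    return [sum(row) for row in image]

def nearest_gesture(training, image):
    """Return the label of the training example whose feature
    vector is closest (squared Euclidean distance) to T(image)."""
    f = T(image)
    def dist(entry):
        label, feat = entry
        return sum((a - b) ** 2 for a, b in zip(f, feat))
    return min(training, key=dist)[0]

training = [
    ("one", [1, 1, 2, 2]),   # made-up feature vectors
    ("two", [2, 2, 3, 3]),
]
sample = [[0, 1, 0], [0, 1, 1], [1, 1, 1], [1, 1, 1]]
print(nearest_gesture(training, sample))  # -> two
```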


DESIGN


HAND GESTURE RECOGNITION

Consider a robot navigation problem, in which a robot responds to the hand pose signs given by

a human, visually observed by the robot through a camera. We are interested in an algorithm that

enables the robot to identify a hand pose sign in the input image, as one of five possible

commands (or counts). The identified command will then be used as a

control input for the robot to perform a certain action or execute a certain task. For examples of

the signs to be used in our algorithm, see the figure below. The signs could be associated with various

meanings depending on the function of the robot. For example, a “one” count could mean “move

forward”, a “five” count could mean “stop”. Furthermore, “two”, “three”, and “four” counts

could be interpreted as “reverse”, “turn

right,” and “turn left.”
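The association of counts with commands described above amounts to a simple lookup table; a minimal Python sketch (the "no action" fallback for an unrecognized count is our own assumption):

```python
# Sketch of the count-to-command mapping described in the text.
COMMANDS = {
    1: "move forward",
    2: "reverse",
    3: "turn right",
    4: "turn left",
    5: "stop",
}

def command_for(count):
    """Translate a recognized finger count into a robot command."""
    return COMMANDS.get(count, "no action")

print(command_for(1))  # -> move forward
print(command_for(5))  # -> stop
```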


Set of hand gestures, or “counts”, considered in our work.

IMAGE DATABASE

The starting point of the project was the creation of a database with all the images that

would be used for training and testing.

The image database can have different formats. Images can be either hand-drawn,

digitized photographs or three-dimensional hand models. Photographs were used, as they are the

most realistic approach.

Images came from two main sources: various ASL databases on the Internet and

photographs we took with a digital camera. This meant that they had different sizes,

different resolutions and sometimes almost completely different angles of shooting.

Images belonging to the last case were very few, and they were discarded, as there was no

chance of classifying them correctly. Two operations were carried out in all of the

images. They were converted to grayscale and the background was made uniform. The

internet databases already had uniform backgrounds but the ones we took with the digital

camera had to be processed in Adobe Photoshop.

Drawn images can still simulate translational variances with the help of an editing

program (e.g. Adobe Photoshop).

The database itself was constantly changing throughout the completion of the project, as it

was the database that would determine the robustness of the algorithm. Therefore, it had to be built in

such a way that different situations could be tested, and the thresholds above which the

algorithm failed to classify correctly could be determined.

The construction of such a database is clearly dependent on the application. If the

application is, for example, a crane controller operated by the same person for long periods,

the algorithm doesn’t have to be robust to different persons’ images. In this case, noise

and motion blur should be tolerable.

IMAGE PROCESSING

Image processing is any form of signal processing for which the input is an image, such as

photographs or frames of video; the output of image processing can be either an image or a set of

characteristics or parameters related to the image. Most image-processing techniques involve

treating the image as a two-dimensional signal and applying standard signal-processing

techniques to it.

Typical operations

Among many other image processing operations are:

• Geometric transformations such as enlargement, reduction, and rotation

• Color corrections such as brightness and contrast adjustments, quantization, or conversion to a different color space

• Digital compositing or optical compositing (combination of two or more images), used in filmmaking to make a “matte”

• Image editing (e.g., to increase the quality of a digital image)

• Image registration (alignment of two or more images), differencing and morphing

• Image segmentation

• Extending dynamic range by combining differently exposed images

• 2-D object recognition with affine invariance

Applications

• Computer vision

• Face detection

• Feature detection

• Lane departure warning system

• Non-photorealistic rendering

• Medical image processing

• Microscope image processing

• Morphological image processing

• Remote sensing

MATLAB

The name MATLAB stands for matrix laboratory.

MATLAB is a high-performance language for technical computing. It integrates

computation, visualization, and programming in an easy-to-use environment where

problems and solutions are expressed in familiar mathematical notation. Typical uses

include:

• Math and computation

• Algorithm development

• Modeling, simulation, and prototyping

• Data analysis, exploration, and visualization

• Scientific and engineering graphics

• Application development, including Graphical User Interface building

MATLAB is an interactive system whose basic data element is an array that does not

require dimensioning. This allows you to solve many technical computing problems,

especially those with matrix and vector formulations, in a fraction of the time it would

take to write a program in a scalar non-interactive language such as C or Fortran.

MATLAB has evolved over a period of years with input from many users. In university

environments, it is the standard instructional tool for introductory and advanced courses

in mathematics, engineering, and science. In industry, MATLAB is the tool of choice for

high-productivity research, development, and analysis.


The reason that we have decided to use MATLAB for the development of this project is its

toolboxes. Toolboxes allow you to learn and apply specialized technology. Toolboxes

are comprehensive collections of MATLAB functions (M-files) that extend the

MATLAB environment to solve particular classes of problems. These include, among others, the

image processing and neural network toolboxes.

NEURAL NETWORK

An artificial neural network (ANN), also called a simulated neural network (SNN) or commonly

just neural network (NN) is an interconnected group of artificial neurons that uses

a mathematical or computational model for information processing based on a 

connectionist approach to computation. In most cases an ANN is an adaptive system that

changes its structure based on external or internal information that flows through the network.

In more practical terms neural networks are non-linear statistical data modeling or decision

making tools. They can be used to model complex relationships between inputs and outputs or

to find patterns in data.

An artificial neural network involves a network of simple processing elements (artificial

neurons) which can exhibit complex global behavior, determined by the connections between the

processing elements and element parameters. One classical type of artificial neural network is the

Hopfield net.

In a neural network model simple nodes, which can be called variously "neurons", "neurodes",

"Processing Elements" (PE) or "units", are connected together to form a network of nodes —

hence the term "neural network". While a neural network does not have to be adaptive per se, its

practical use comes with algorithms designed to alter the strength (weights) of the connections in

the network to produce a desired signal flow.


In modern software implementations of artificial neural networks the approach inspired by

biology has more or less been abandoned for a more practical approach based on statistics and

signal processing. In some of these systems neural networks, or parts of neural networks (such as

artificial neurons) are used as components in larger systems that combine both adaptive and

non-adaptive elements.

Neural networks are composed of simple elements operating in parallel. These elements

are inspired by biological nervous systems. As in nature, the network function is

determined largely by the connections between elements. We can train a neural network

to perform a particular function by adjusting the values of the connections (weights)

between elements.

Commonly neural networks are adjusted, or trained, so that a particular input leads to a

specific target output. There, the network is adjusted, based on a comparison of the output and the

target, until the network output matches the target.
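The adjust-until-it-matches loop described above can be sketched with a single artificial neuron trained by the classic perceptron rule. This illustrates the principle of supervised weight adjustment only; it is not the project’s actual network, and the AND task below is a made-up example.

```python
# Minimal sketch of supervised weight adjustment: a single neuron
# whose connection weights are nudged by the difference between
# the target and the actual output (perceptron learning rule).

def train(samples, epochs=20, rate=0.1):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            out = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = target - out            # compare output and target
            w[0] += rate * err * x[0]     # adjust connection weights
            w[1] += rate * err * x[1]
            b += rate * err
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Learn the logical AND of two inputs
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(data)
print([predict(w, b, x) for x, _ in data])  # -> [0, 0, 0, 1]
```

Because AND is linearly separable, the loop converges: after training, the network output matches the target for every sample.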

Figure: Neural Net block diagram

Neural networks have been trained to perform complex functions in various fields of

application including pattern recognition, identification, classification, speech, vision and

control systems.

Today neural networks can be trained to solve problems that are difficult for conventional

computers or human beings. The supervised training methods are commonly used, but


other networks can be obtained from unsupervised training techniques or from direct

design methods. Unsupervised networks can be used, for instance, to identify groups of

data. Certain kinds of linear networks and Hopfield networks are designed directly. In

summary, there are a variety of kinds of design and learning techniques that enrich the

choices that a user can make.

Applications

The utility of artificial neural network models lies in the fact that they can be used to infer a

function from observations and also to use it. This is particularly useful in applications where the

complexity of the data or task makes the design of such a function by hand impractical.

Real life applications

The tasks to which artificial neural networks are applied tend to fall within the following broad

categories:

• Function approximation, or regression analysis, including time series prediction and modelling.

• Classification, including pattern and sequence recognition, novelty detection and sequential decision making.

• Data processing, including filtering, clustering, blind signal separation and compression.

Application areas include system identification and control (vehicle control, process control),

game-playing and decision making (backgammon, chess, racing), pattern recognition (radar


systems, face identification, object recognition, etc.), sequence recognition (gesture, speech,

handwritten text recognition), medical diagnosis, financial applications, data mining (or

knowledge discovery in databases, "KDD"), visualization and e-mail spam filtering.

BLOCK DIAGRAM

(Block diagram: a PC with MATLAB performs sampling, generation of templates, pattern recognition and decision logic, turning the pattern to recognize into a recognized pattern; the resulting command goes to an 8051 microcontroller, which drives the motor through a motor driver.)


DEFINING THE DIFFERENT ISSUES

Collecting the pictures

First of all, and obviously, it will be necessary to collect pictures. There is a choice to be made concerning the way we want to collect these pictures, given that it depends on how we implement the main program. Running in the MATLAB environment requires the pictures to be saved in memory and called back when running the program, because the Image Acquisition Toolbox is not available on the MATLAB version used for the design of the program.

That’s why, for a real time processing, it will be necessary to implement the program in a

C or C++ environment. So, the easiest way to collect pictures is to use VideoOCX for example,

assuming encoding in C++.

However, to develop the body of the program, there are no real time constraints: it is

possible to work on typical and representative pictures previously chosen and saved. The whole

MATLAB program has been developed using such saved pictures. Then, it has been modified so

that it can be used in real time C++ stand-alone functions.

Finding the hand

Now, let’s suppose that a set of representative pictures is provided. We need then to

analyze the picture, and to find the relevant part of the picture. Indeed the user will never put his

hand in the same area of the picture. Here are a few examples of the same sign made in

different areas, all of which have to lead to the same identification result, ‘2’:


Analysis and identification

Then, the real work can start: Let’s suppose we got the relevant part of the image, which

contains only the hand. How can we “guess” the type of sign? To make the problem easier, we

can consider that we are interested only in the number of fingers exhibited by the user. So, we

can sum up the problem: How can we count the number of fingers in a picture of hand?

There are plenty of ways to do it. In the following pages, the advantages and drawbacks

of a few of them will be described. There are some geometrical methods that solve the problem

by counting numbers of blocks within a picture, and some more sophisticated methods,

such as neural networks or Laplacian filtering, which can lead to interesting results.

Examples of Allowed pictures


It has been already explained that the position of the hand in the picture is not important.

Given that the background is known, it is possible to build a new picture that corresponds to the

difference between the current picture of hand and the background. So it is possible to collect a

picture that contains only the hand, and some noise.

After noise removal, the resulting picture will be black almost everywhere except

where the hand is. So, zooming can then be easily realized by cropping areas whose pixel values

are close to 0.

Picture of the difference with the background

The difference with the background can be done using the Matlab function “imabsdiff”.

After that, to make all the preprocessing easier, it is better to create a binary picture. To do so, it

is necessary to choose a threshold: pixels with value lower than this threshold will be set to 0

(black) and others will be set to 1. The choice of this threshold depends on the video camera

properties: if we consider that the camera provides pixels coded on bytes, pixel values will be

from 0 to 255. Some measurements have proven that in this case, the presence of the hand will

imply a variation of pixel values bigger than 20 units. Of course, the optimal threshold depends

on the background; nevertheless, this threshold is correct in most cases.
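The differencing and thresholding steps can be sketched in plain Python (the report itself uses MATLAB’s imabsdiff); the 0-255 pixel range and the threshold of 20 units follow the text, while the tiny frames below are made-up data:

```python
# Sketch of background differencing and thresholding: absolute
# difference with the background, then a binary decision per pixel.

THRESHOLD = 20  # variation larger than ~20 units indicates the hand

def binarize(frame, background, threshold=THRESHOLD):
    """1 where the scene changed enough (hand), 0 elsewhere."""
    return [
        [1 if abs(p - q) > threshold else 0 for p, q in zip(fr, br)]
        for fr, br in zip(frame, background)
    ]

background = [[100, 100, 100],
              [100, 100, 100]]
frame      = [[100, 180, 175],   # bright pixels: the hand
              [105, 100, 100]]   # small variation: noise, ignored
print(binarize(frame, background))  # -> [[0, 1, 1], [0, 0, 0]]
```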

Then it is necessary to execute noise-removal functions; otherwise, every noisy pixel whose

value is too high may be considered as part of the hand and will be included in the zoomed-in


picture. For example, if we suppose that the hand is in one corner of the picture and that there is a

noisy pixel in the opposite corner of the picture of differences, the zooming function will

keep it, and the resulting picture, after zooming, will not be very different from the initial picture!

That’s why it is necessary to use noise removal functions.

The noise removal is processed using the function bwmorph with the ‘open’ operation, which erodes and then

dilates the noisy picture. This way, lonely pixels disappear during the erosion; other elements

are restored to their initial shape by the subsequent dilation.
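The erode-then-dilate behaviour can be sketched in plain Python with a 3x3 structuring element; this illustrates the principle only, not the toolbox implementation, and the noisy test image is made up:

```python
# Sketch of morphological opening (erosion followed by dilation)
# on a binary image, with a 3x3 neighborhood clipped at the border.

def _neighbors(img, r, c):
    h, w = len(img), len(img[0])
    return [img[i][j]
            for i in range(max(0, r - 1), min(h, r + 2))
            for j in range(max(0, c - 1), min(w, c + 2))]

def erode(img):
    """A pixel survives only if its whole neighborhood is 1, so
    isolated noisy pixels disappear."""
    return [[1 if all(_neighbors(img, r, c)) else 0
             for c in range(len(img[0]))] for r in range(len(img))]

def dilate(img):
    """A pixel turns on if any neighbor is 1, restoring the shapes
    that survived erosion."""
    return [[1 if any(_neighbors(img, r, c)) else 0
             for c in range(len(img[0]))] for r in range(len(img))]

def open_image(img):
    return dilate(erode(img))

noisy = [[1, 0, 0, 0, 0],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 1]]
# The lone pixel at (0, 0) is removed; the solid block is restored.
print(open_image(noisy))
```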

Here are a few examples of resulting pictures.

Background Input Picture Binary Picture


Standard Re-sizing

According to the requirements, the user is not supposed to be systematically at the same

distance from the video camera. The consequences are obvious: if he is close to it, the hand will

occupy a large part of the input picture. On the contrary, when he is far from it, the hand will

appear quite small in the picture. So, the pictures of the hands after cropping may have

very different sizes. That is to say, it is necessary to resize all the pictures to a standard size so

that we can process them all the same way.

It is evidently not useful to resize a picture to a size larger than the original one, since that would add no information. Worse, it would be a serious drawback, because it would increase the amount of computation, contrary to the constraint of real-time processing. For these reasons, it is preferable to reduce the size, but not too much: in an excessively reduced picture, some fingers can disappear, and the space between two fingers may also disappear so that two fingers seem to be one.

After a few tests and measurements, it has been decided that a size of 30 x 30 is small enough to make the computation fast, and large enough to avoid any major damage to the initial picture.

In these conditions, the average dimensions of a finger are:

- width: 3~5 pixels

- length: 15~20 pixels

Of course, different users will all have different hands, hence different absolute

measurements. Nevertheless, such standard re-sizing will provide relative measurements: if the

size of the real thumb and ring fingers depend on the user, the ratio will be generally constant.

For almost all users, these ratios are nearly identical.

That is why this re-sizing operation can be considered a standardization process: for any user, the final re-sized image will have almost identical properties concerning the dimensions of its elements.
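The standardization step can be sketched as a nearest-neighbour re-sampling. This Python sketch only illustrates the idea; the project itself would use MATLAB's resizing facilities:

```python
# Nearest-neighbour re-sizing of a binary picture to a fixed 30 x 30 grid,
# as a stand-in for the report's MATLAB resizing step.
def resize_nearest(img, out_h=30, out_w=30):
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w]
             for x in range(out_w)] for y in range(out_h)]

# Tiny demonstration: a 4 x 4 picture reduced to 2 x 2.
print(resize_nearest([[1, 1, 0, 0],
                      [1, 1, 0, 0],
                      [0, 0, 1, 1],
                      [0, 0, 1, 1]], 2, 2))  # [[1, 0], [0, 1]]
```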


Finally, the fact that the width of a finger is only a few pixels implies that, in the resulting picture, two close fingers may merge into a single block.

A schematic example

A real example:

Input picture Binary picture Zoom-in Resizing

In these conditions, for any input picture and any hand gesture that involves the thumb, the preprocessing algorithm provides a standard-sized binary picture that corresponds to a zoom on the hand. Once this preprocessing is finished, the real processing, that is to say the identification process, can be launched.


Initial picture (240 x 320) -> cropped hand (size varies) -> re-sized hand (30 x 30)


Counting the fingers

Simple Pixel Counting Analysis

The first immediate idea is the following: a picture that contains only the hand of the user is provided to the program. In this picture, if only one or two fingers are exhibited, the number of pixels with value '1' will be small; if all five fingers are shown, there will be more pixels at '1'. There is therefore a strong link between the number of fingers and the number of pixels set to '1'. The easiest way to classify an image is then to compute the sum of the pixels of the re-sized hand picture and to compare the resulting value to different ranges:

If sum < range_1, then 0 fingers
If range_1 < sum < range_2, then 1 finger
If range_2 < sum < range_3, then 2 fingers
If range_3 < sum < range_4, then 3 fingers
If range_4 < sum < range_5, then 4 fingers
If range_5 < sum, then 5 fingers
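The comparison chain above can be sketched as follows; the range_k values used here are made-up placeholders that would have to be tuned per user, not values taken from the report:

```python
# Pixel-counting classifier sketch: the finger count is the number of
# range bounds that the pixel sum exceeds. The bounds are illustrative
# placeholders, not calibrated values.
def count_fingers_by_sum(binary_img, ranges=(40, 90, 140, 190, 240)):
    total = sum(sum(row) for row in binary_img)   # pixels set to '1'
    return sum(total > bound for bound in ranges) # ranges sorted ascending

img = [[1] * 10 for _ in range(10)]   # 100 pixels at '1'
print(count_fingers_by_sum(img))      # 2 with these placeholder ranges
```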

The advantage of this method is huge: such a program is quite easy to write and very fast. However, it is not very reliable. According to the previous sections, the width of a finger will generally be 4 to 5 pixels, and its length 15 to 20 pixels, depending on the user. So let's consider that for User 1, each finger has a dimension:


For User 1, four fingers will lead to about 200 pixels. Let's suppose that for User 2, the width of a finger is 5 and its length is 20. The finger dimension is:

For User 2, two fingers will also lead to 200 pixels. The consequence is that the program may get confused and tell User 2 that he is exhibiting five fingers when he actually shows only three of them (two fingers and the thumb)!

Another issue is that even if it is always the same user who makes the signs, and the different ranges have been optimized for his average finger size, errors will probably occur if he does not open the hand widely. Indeed, if the hand is fully open, let's assume no error will occur; but if some fingers are slightly curled ("closed"), then for each such finger the sum of its pixels will be smaller, and if several fingers are curled, the global sum may lead to a mistake. An example of this phenomenon is given here:

The program answers ‘5’ The program answers ‘4’

In this example, when the two last fingers are curled, the sum of their pixels makes the program consider there are only four fingers, because the global sum is almost the same as the one the program would obtain if four fingers were exhibited on a fully opened hand.

This very simple method is efficient for a single user, provided he accepts additional constraints on the allowed signs. Such a solution is not acceptable for this project, if only because it has to work with several users. It is therefore necessary to consider more sophisticated solutions.


Simple Block Counting Analysis

The program has to count the number of fingers, so let's create a picture in which only the fingers remain. This is easy to do, given that the orientation of the hand is known: cropping the left part of the picture (including the thumb) ensures that only the fingers remain in the picture.

In such cases, the number of fingers is the number of blocks in the cropped picture, plus 1, because the thumb has to be counted even though it has been cropped.

This method offers a huge advantage: its simplicity. Indeed, no heavy computation or special treatment is required; the only operation is to count the number of blocks in the shortened image. Using the MATLAB function bwlabel, from the Image Processing Toolbox, makes the coding very easy.
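What bwlabel provides can be sketched as a flood-fill labelling. This Python version illustrates the principle (4-connected blocks of 1-pixels), not the toolbox implementation:

```python
# Count 4-connected blocks of 1-pixels, as a stand-in for bwlabel.
# The finger count is this number plus one for the cropped-out thumb.
def count_blocks(img):
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    blocks = 0
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] == 1 and not seen[sy][sx]:
                blocks += 1
                seen[sy][sx] = True
                stack = [(sy, sx)]          # flood-fill one block
                while stack:
                    y, x = stack.pop()
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if 0 <= ny < h and 0 <= nx < w \
                           and img[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return blocks

fingers = [[0, 1, 0, 1, 0],
           [0, 1, 0, 1, 0],
           [0, 1, 0, 1, 0]]
print(count_blocks(fingers) + 1)  # 2 blocks + thumb -> 3
```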


However, this method also has some major drawbacks. Indeed, the re-sizing operation can make some well-separated fingers turn into two joined fingers that look like one single big finger, which causes an error in the evaluation of the number of fingers.

If the user wants to avoid such problems, he has to open the hand widely; in this way, any confusion becomes impossible. The problem is that if the user opens the hand too widely, the index finger or the little finger (the fifth one) may not be present in the last columns of the picture. So the user has to open the hand widely, but not too much, and he may need time to find the best opening for each of the different signs he wants to make. And even supposing that he succeeds in doing so, another phenomenon occurs:

If the user opens the hand just enough for the sign he makes, some noisy pixels that remain despite the noise removal may join two fingers. The function bwlabel will then consider them as a single block, which implies an error in the estimation of the number of fingers.

This method is very interesting and efficient considering its low level of complexity and its simple coding. However, there is room for improvement, because the error rate can be reduced. With this method, around 70-75 percent of the allowed signs (i.e., those that include the thumb) are successfully classified.


Weighted Averaging Analysis

In order to understand the basic idea discussed here, let's consider the differences and the common points between the methods that have already been introduced:

- The Pixel Counting method and the Edges Counting method were very simple solutions, but they were not efficient enough. Their advantage was their low implementation complexity, given that they were geometrical solutions.

- The Neural Networks solution has proven to be more efficient, but it requires training, and special management and processing of the binary picture. Moreover, when looking at the weights of the input layer, it appears that the neural network just realizes a kind of weighted averaging.

Hence, the motivation in this section is to try to realize weighted averaging in a simpler way.

Choosing the weights

In this section, the explanations refer to the following picture, which was already introduced in the section "Edge Counting Analysis". This picture was an example that led to a classification error:


First of all, let's suppose no weights are used, that is, all weights are set to the same value, one for example. When averaging the pixel values, all the pixels have the same importance. Given that the left part of the picture is not relevant for computing the number of fingers (except for the thumb, all the fingers are in the right part of the picture), the only columns considered are, for example, columns 15 to 25.

It has been shown previously that edge counting alone in this area is not efficient in this case, and that merely counting the number of pixels set to 1 may lead to incoherent results, given that the relative dimensions of a finger depend on the user; the picture shown here will lead to '4'.

One solution is to mix these two methods, that is, to realize a weighted averaging in which the weight of each pixel set to 1 is half the number of edges in that pixel's column. For example, in the picture provided at the beginning of this section, the pixel at line 19, column 16 is set to 1 and its weight is 6, given that there are 12 edges in column 16.

A fast approximate calculation leads to the following results:

If only the thumb is in the picture, no pixel will be set to 1 in columns 15 to 25, and the weighted averaging will lead to 0.

If the thumb and one finger are in the picture, about 60 to 100 pixels will be set to 1 in columns 15 to 25, and the weight of each pixel in this area should be 1. So the weighted averaging should lead to values from 60 to 100.

If the thumb and two fingers are in the picture, about 2*60 to 2*100 pixels will be set to 1 in columns 15 to 25, and the weight of each pixel in this area should be 2. So the weighted averaging should lead to values from 2*60*2 to 2*100*2, that is, 240 to 400.

If the thumb and three fingers are in the picture, about 3*60 to 3*100 pixels will be set to 1 in columns 15 to 25, and the weight of each pixel in this area should be 3. So the weighted averaging should lead to values from 3*60*3 to 3*100*3, that is, 540 to 900.


If the thumb and four fingers are in the picture, about 4*60 to 4*100 pixels will be set to 1 in columns 15 to 25, and the weight of each pixel in this area should be 4. So the weighted averaging should lead to values from 4*60*4 to 4*100*4, that is, 960 to 1600.

According to these values, let's create the following bounds:

Bound between 1 and 2 fingers: (0 + 60) / 2 = 30
Bound between 2 and 3 fingers: (100 + 240) / 2 = 170
Bound between 3 and 4 fingers: (400 + 540) / 2 = 470
Bound between 4 and 5 fingers: (900 + 960) / 2 = 930

That is to say that the algorithm has to realize the following operations:

1) Calculate the weighted average WA of the pixels in columns 15 to 25, each pixel at '1' being weighted by half the number of edges in its column.

2) Estimate the number of fingers in the picture by comparing WA to the bounds above.
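These two steps can be sketched as follows. The column limits and the bounds follow the values derived above, while the test image is a made-up one-finger example:

```python
# Weighted-averaging classifier sketch: in the right-hand columns, each
# pixel at '1' is weighted by half the number of edges (0/1 transitions)
# in its column, and the total is compared to the bounds derived in the
# text. Python stands in for the report's MATLAB code.
def weighted_average(img, first_col=15, last_col=25):
    wa = 0.0
    for x in range(first_col, last_col + 1):
        col = [row[x] for row in img]
        edges = sum(a != b for a, b in zip([0] + col, col + [0]))
        wa += sum(col) * (edges / 2)
    return wa

def classify(wa, bounds=(30, 170, 470, 930)):
    return 1 + sum(wa > b for b in bounds)   # 1 finger = thumb alone

# Made-up 30 x 30 example: one finger, 4 pixels wide, in columns 16-19.
img = [[0] * 30 for _ in range(30)]
for y in range(5, 25):
    for x in range(16, 20):
        img[y][x] = 1
wa = weighted_average(img)
print(wa, classify(wa))  # 80.0 and 2 (thumb + one finger)
```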


The consequence is that the distance between typical WA values (values of the weighted averaging) increases at an exponential rate, which makes the classification less sensitive to errors. Indeed, in this case, the margin between two close possibilities is always large: for example, it has been said that the typical WA for 5 fingers is (960+1600)/2 = 1280. An error can occur only if the calculated WA, which should be about 1280, falls under 930; the calculation error therefore has to be bigger than 350. This can happen only if there are many errors on the number of edges in each column and if the relative dimensions of the fingers are "strange": for example, one very thick finger, three very thin fingers and the thumb.

In order to appreciate the efficiency of this method, let's compare it to the bound that would have been used in a simple pixel-counting algorithm: for four fingers, the sum of the pixels would be about 3*60 = 180, and for five fingers, about 4*60 = 240. The bound between 4 and 5 fingers would be (180+240)/2 = 210. An error on five fingers happens when fewer than 210 pixels are counted in columns 15 to 25. The margin is 240 - 210 = 30.

When comparing the error margins, it appears that without any weights the margin is equal to 30, whereas with weights chosen as half the number of edges in the column of the analyzed pixel, this margin rises to about 350, more than 10 times the previous one! That is why this method is much better than simple pixel counting: different numbers of fingers lead to different ranges that are separated by very large gaps that only huge errors can cross, and such errors are not very frequent.

Without weights, confusion may occur when several fingers are exhibited (three, four or five). The use of weights makes these confusions much rarer, because three-, four- and five-finger pictures produce WA values that are very distant from one another.


Matlab Operations

Building GUI interfaces in Matlab

This example shows how to build a user GUI in MATLAB.

Start the GUI builder by typing

>>guide

Select "Blank GUI" and click OK

The GUI window will open


Resize the design window.

Using the palette on the left, drag and drop, resize and position the canvas, buttons, and static text windows.


Double-click on an object to open the properties dialog. Change the captions on the buttons and remove the "Static Text" string from the text window. Set the font size to 30 for the text windows and change the horizontal alignment to "right".

The GUI is finished. Save the work.

The rest of the design process consists of attaching the required functionality to each GUI component.

Neural Network Toolbox


Neural Network Toolbox extends MATLAB with tools for designing, implementing, visualizing, and simulating neural networks. Neural networks are invaluable for applications where formal analysis would be difficult or impossible, such as pattern recognition and nonlinear system identification and control. Neural Network Toolbox software provides comprehensive support for many proven network paradigms, as well as graphical user interfaces (GUIs) that enable you to design and manage your networks. The modular, open, and extensible design of the toolbox simplifies the creation of customized functions and networks.

Neural Network Toolbox GUIs make it easy to work with neural networks. The Neural

Network Fitting Tool is a wizard that leads you through the process of fitting data using

neural networks. You can use the tool to import large and complex data sets, quickly

create and train networks, and evaluate network performance.

Key features

- GUI for creating, training, and simulating neural networks
- Support for the most commonly used supervised and unsupervised network architectures
- Comprehensive set of training and learning functions
- Dynamic learning networks, including time-delay, nonlinear autoregressive (NARX), layer-recurrent, and custom dynamic networks
- Simulink blocks for building neural networks and advanced blocks for control-system applications
- Support for automatically generating Simulink blocks from neural network objects
- Preprocessing and postprocessing functions and Simulink blocks for improving network training and assessing network performance


Network Architectures

The Neural Network Toolbox supports both supervised and unsupervised networks.

Supervised Networks

Supervised neural networks are trained to produce desired outputs in response to

sample inputs, making them particularly well suited to modeling and controlling dynamic

systems, classifying noisy data, and predicting future events.

Neural Network Toolbox supports four kinds of supervised networks: feedforward, radial basis, dynamic, and learning vector quantization (LVQ).

Feedforward networks have one-way connections from input to output layers. They are most

commonly used for prediction, pattern recognition, and nonlinear function fitting. Supported

feedforward networks include feedforward backpropagation, cascade-forward backpropagation, feedforward input-delay backpropagation, linear, and perceptron networks.

Radial basis networks provide an alternative, fast method for designing nonlinear feedforward networks. Supported variations include generalized regression and probabilistic neural networks.

Dynamic networks use memory and recurrent feedback connections to recognize spatial and

temporal patterns in data. They are commonly used for time-series prediction, nonlinear dynamic

system modeling, and control system applications. Prebuilt dynamic networks in the toolbox

include focused and distributed time-delay, nonlinear autoregressive (NARX), layer-recurrent,

Elman, and Hopfield networks. The toolbox also supports dynamic training of custom networks

with arbitrary connections.

LVQ is a powerful method for classifying patterns that are not linearly separable. LVQ lets you

specify class boundaries and the granularity of classification.

Unsupervised Networks

Unsupervised neural networks are trained by letting the network continually adjust itself to new inputs. They find relationships within data and can automatically define classification schemes.

Neural Network Toolbox supports two types of self-organizing, unsupervised networks: competitive layers and self-organizing maps.

Competitive layers recognize and group similar input vectors. By using these groups, the

network automatically sorts the inputs into categories.

Training and Learning Functions

Training and learning functions are mathematical procedures used to automatically adjust the

network’s weights and biases. The training function dictates a global algorithm that affects all the

weights and biases of a given network. The learning function can be applied to individual weights

and biases within a network.


Neuron Model

Simple Neuron

A neuron with a single scalar input and no bias is shown on the left below.

Figure : Neuron

The scalar input p is transmitted through a connection that multiplies its strength by the

scalar weight w, to form the product wp, again a scalar. Here the weighted input wp is the

only argument of the transfer function f, which produces the scalar output a. The neuron

on the right has a scalar bias, b. You may view the bias as simply being added to the

product wp as shown by the summing junction or as shifting the function f to the left by

an amount b. The bias is much like a weight, except that it has a constant input of 1. The net input n of the transfer function, again a scalar, is the sum of the weighted input wp and the bias b. This sum is the argument of the transfer function f. Here f is a transfer function,

typically a step function or a sigmoid function, that takes the argument n and produces

the output a. Examples of various transfer functions are given in the next section. Note

that w and b are both adjustable scalar parameters of the neuron. The central idea of

neural networks is that such parameters can be adjusted so that the network exhibits some

desired or interesting behavior.

Thus, we can train the network to do a particular job by adjusting the weight or bias

parameters, or perhaps the network itself will adjust these parameters to achieve some

desired end. All of the neurons in the program written in MATLAB have a bias.


Feed forward Neural Networks

Feedforward neural networks (FF networks) are the most popular and most widely used models in many practical applications. They are known by many different names, such as "multi-layer perceptrons".

The figure illustrates a one-hidden-layer FF network with inputs x1, ..., xn and output ŷ. Each arrow in the figure symbolizes a parameter in the network. The network is divided into layers. The input layer consists of just the inputs to the network. Then follows a hidden layer, which consists of any number of neurons, or hidden units, placed in parallel. Each neuron performs a weighted summation of the inputs, which then passes through a nonlinear activation function σ, also called the neuron function.

A feedforward network with one hidden layer and one output.

Mathematically, the functionality of hidden neuron j is described by

    h_j = σ( Σ_i w1_(j,i) x_i + b1_j ),   i = 1, ..., n

where the weights {w1_(j,i), b1_j} are symbolized by the arrows feeding into the neuron.

The network output is formed by another weighted summation of the outputs of the neurons in

the hidden layer. This summation on the output is called the output layer. In Figure there is only

one output in the output layer since it is a single-output problem. Generally, the number of output

neurons equals the number of outputs of the approximation problem.


The output of this network is given by

    ŷ(θ) = Σ_j w2_j σ( Σ_i w1_(j,i) x_i + b1_j ) + b2,   j = 1, ..., nh

where n is the number of inputs and nh is the number of neurons in the hidden layer. The variables {w1_(j,i), b1_j, w2_j, b2} are the parameters of the network model, represented collectively by the parameter vector θ.

Note that the sizes of the input and output layers are defined by the number of inputs and outputs of the network; therefore, only the number of hidden neurons has to be specified when the network is defined.

In training the network, its parameters are adjusted incrementally until the training data satisfy the desired mapping as well as possible; that is, until ŷ(θ) matches the desired output y as closely as possible, up to a maximum number of iterations.

The FF network in the figure is just one possible FF architecture. You can modify the architecture in various ways by changing the options; for example, you can change the activation function to any differentiable function you want.
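The forward pass of such a one-hidden-layer network can be sketched as follows; the weights below are made-up illustrative values, and tanh stands in for the sigmoid-like activation:

```python
# One-hidden-layer feedforward pass: each hidden neuron computes a weighted
# sum of the inputs plus its bias through tanh; the output layer is another
# weighted sum. Weights here are arbitrary illustration values.
import math

def forward(x, w1, b1, w2, b2):
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]                  # hidden layer
    return sum(w * h for w, h in zip(w2, hidden)) + b2    # linear output

x  = [0.5, -1.0]                  # n = 2 inputs
w1 = [[1.0, 0.5], [-0.3, 0.8]]    # nh = 2 hidden neurons
b1 = [0.0, 0.1]
w2 = [0.7, -0.2]
b2 = 0.05
print(forward(x, w1, b1, w2, b2))
```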


Advantages of Neural Computing

There are a variety of benefits that an analyst realizes from using neural networks in their

work.

Pattern recognition is a powerful technique for harnessing the information in

the data and generalizing about it. Neural nets learn to recognize the patterns

which exist in the data set.

The system is developed through learning rather than programming.

Programming is much more time consuming for the analyst and requires the

analyst to specify the exact behavior of the model. Neural nets teach

themselves the patterns in the data freeing the analyst for more interesting

work.

Neural networks are flexible in a changing environment. Rule based systems

or programmed systems are limited to the situation for which they were

designed--when conditions change, they are no longer valid. Although neural

networks may take some time to learn a sudden drastic change, they are

excellent at adapting to constantly changing information.

Neural networks can build informative models where more conventional

approaches fail. Because neural networks can handle very complex

interactions they can easily model data which is too difficult to model with

traditional approaches such as inferential statistics or programming logic.

Performance of neural networks is at least as good as classical statistical

modeling, and better on most problems. The neural networks build models

that are more reflective of the structure of the data in significantly less time.


Limitations of Neural Computing

There are some limitations to neural computing. The key limitation is the neural

network's inability to explain the model it has built in a useful way. Analysts often want

to know why the model is behaving as it is. Neural networks get better answers but they

have a hard time explaining how they got there.

There are a few other limitations that should be understood. First, it is difficult to extract rules from neural networks. This is sometimes important to people who have to explain

their answer to others and to people who have been involved with artificial intelligence,

particularly expert systems which are rule-based.

As with most analytical methods, you cannot just throw data at a neural net and get a

good answer. You have to spend time understanding the problem or the outcome you are

trying to predict. And, you must be sure that the data used to train the system are

appropriate and are measured in a way that reflects the behavior of the factors. If the data

are not representative of the problem, neural computing will not produce good results.

This is a classic situation where "garbage in" will certainly produce "garbage out."

Finally, it can take time to train a model from a very complex data set. Neural techniques

are computer intensive and will be slow on low end PCs or machines without math

coprocessors. It is important to remember though that the overall time to results can still

be faster than other data analysis approaches, even when the system takes longer to train.

Processing speed alone is not the only factor in performance, and neural networks do not require the programming, debugging, or assumption-testing time that other analytical approaches do.


MICROCONTROLLER AND ROBOT

Power Supply

We directly provide a 12 V DC supply, which is converted into 5 V DC: 12 V is required for driving the motor and 5 V for the microcontroller assembly. The 12 V is stepped down to 5 V with a 7805 regulator and capacitor combination.

Microcontroller(8051)

A microcontroller has a CPU together with a fixed amount of RAM, ROM, I/O ports, and timers, all embedded on one chip; such devices are used in embedded systems. We have used a member of the 80C51 8-bit Flash microcontroller family, the AT89C5124PI, with 64 kB of Flash memory and 1 kB of RAM. The 89C5124PI device contains a non-volatile 64 kB Flash program memory that is both parallel-programmable and serial In-System and In-Application programmable. In-System Programming (ISP) allows the user to download new code while the microcontroller sits in the application.

In-Application Programming (IAP) means that the microcontroller fetches new program code and reprograms itself while in the system. This allows for remote programming over a modem link. A default serial loader (boot loader) program in ROM allows serial In-System programming of the Flash memory via the UART without the need for a loader in the Flash code. For In-Application Programming, the user program erases and reprograms the Flash memory by use of standard routines contained in ROM.


This device is a single-chip 8-bit microcontroller manufactured in an advanced CMOS process and is a derivative of the 80C51 microcontroller family. The instruction set is 100% compatible with the 80C51 instruction set. The device also has four 8-bit I/O ports, three 16-bit timer/event counters, a multi-source, four-priority-level, nested interrupt structure, an enhanced UART, and on-chip oscillator and timing circuits. The added features of the AT89C5124PI make it a powerful microcontroller for applications that require pulse-width modulation, high-speed I/O and up/down counting capabilities, such as motor control.

Features :-

a) 80C51 Central Processing Unit
b) On-chip Flash program memory with In-System Programming (ISP) and In-Application Programming (IAP) capability
c) Boot ROM contains low-level Flash programming routines for downloading via the UART
d) Can be programmed by the end-user application (IAP)
e) 6 clocks per machine cycle operation (standard)
f) 12 clocks per machine cycle operation (optional)
g) Speed up to 20 MHz with 6 clock cycles per machine cycle (40 MHz equivalent performance); up to 33 MHz with 12 clocks per machine cycle
h) Fully static operation
i) RAM expandable externally to 64 kB
j) 4-level priority interrupt
k) 8 interrupt sources
l) Four 8-bit I/O ports
m) Full-duplex enhanced UART
n) Framing error detection
o) Automatic address recognition
p) Power control modes: clock can be stopped and resumed, Idle mode, Power-down mode
q) Programmable clock out
r) Second DPTR register
s) Asynchronous port reset
t) Low EMI (inhibit ALE)
u) Programmable Counter Array (PCA): PWM, Capture/Compare


PIN DESCRIPTION :

a) Ground: 0 V reference.

b) Power Supply(Vcc): This is the power supply voltage for normal, idle, and power- down operation.

c) Port 0(8 I/O pins from 39-32): Port 0 is an open-drain, bidirectional I/O port. Port 0 pins that have 1s written to them float and can be used as high-impedance inputs. Port 0 is also the multiplexed low-order address and data bus during accesses to external program and data memory. In this application, it uses strong internal pull-ups when emitting 1s.

d) Port 1(8 I/O numbered 1-8):

Port 1 is an 8-bit bidirectional I/O port with internal pull-ups on all pins except P1.6 and P1.7, which are open-drain. Port 1 pins that have 1s written to them are pulled high by the internal pull-ups and can be used as inputs. As inputs, port 1 pins that are externally pulled low will source current because of the internal pull-ups.

Alternate functions for the 89C51RB2/RC2/RD2 Port 1 include:

1) T2 (P1.0): Timer/Counter 2 external count input / clock-out
2) T2EX (P1.1): Timer/Counter 2 reload/capture/direction control
3) ECI (P1.2): External clock input to the PCA
4) CEX0 (P1.3): Capture/Compare external I/O for PCA module 0
5) CEX1 (P1.4): Capture/Compare external I/O for PCA module 1
6) CEX2 (P1.5): Capture/Compare external I/O for PCA module 2
7) CEX3 (P1.6): Capture/Compare external I/O for PCA module 3
8) CEX4 (P1.7): Capture/Compare external I/O for PCA module 4

e) Port 2(21-28):

Port 2 is an 8-bit bidirectional I/O port with internal pull-ups. Port 2 pins that have 1s written to them are pulled high by the internal pull-ups and can be used as inputs. As inputs, port 2 pins that are externally being pulled low will source current because of the internal pull-ups. Port 2 emits the high-order address byte during fetches from external program memory and during accesses to external data memory that use 16-bit addresses (MOVX @DPTR).


f) Port 3(10-17):

Port 3 is an 8-bit bidirectional I/O port with internal pull-ups. Port 3 pins that have 1s written to them are pulled high by the internal pull-ups and can be used as inputs. As inputs, port 3 pins that are externally being pulled low will source current because of the pull-ups.

Port 3 also serves the special features of the 89C51RB2/RC2/RD2, as listed below:

I. RxD (P3.0): Serial input port
II. TxD (P3.1): Serial output port
III. INT0 (P3.2): External interrupt
IV. INT1 (P3.3): External interrupt
V. T0 (P3.4): Timer 0 external input
VI. T1 (P3.5): Timer 1 external input
VII. WR (P3.6): External data memory write strobe
VIII. RD (P3.7): External data memory read strobe

g) RST (Reset, pin 9): A high on this pin for two machine cycles while the oscillator is running resets the device. An internal diffused resistor to VSS permits a power-on reset using only an external capacitor to VCC.

h) ALE (Address Latch Enable, pin 30): Output pulse for latching the low byte of the address during an access to external memory. In normal operation, ALE is emitted twice every machine cycle and can be used for external timing or clocking. Note that one ALE pulse is skipped during each access to external data memory. ALE can be disabled by setting bit 0 of the AUXR SFR; with this bit set, ALE is active only during a MOVX instruction.

i) PSEN (Program Store Enable, pin 29): The read strobe to external program memory. When executing code from external program memory, PSEN is activated twice each machine cycle, except that two PSEN activations are skipped during each access to external data memory. PSEN is not activated during fetches from internal program memory.

j) EA/VPP (External Access Enable / Programming Supply Voltage, pin 31): EA must be externally held low to enable the device to fetch code from external program memory locations. If EA is held high, the device executes from internal program memory. The value on the EA pin is latched when RST is released, and any subsequent changes have no effect. This pin also receives the programming supply voltage (VPP) during Flash programming.

k) XTAL1 and XTAL2 (pins 19 & 18): XTAL1 is the input to the inverting oscillator amplifier and to the internal clock generator circuits; XTAL2 is the output of the inverting oscillator amplifier.

To avoid a latch-up effect at power-on, the voltage on any pin (other than VPP) must not be higher than VCC + 0.5 V or lower than VSS - 0.5 V.

Motor Driver (ULN2004A)

The ULN2004A is a high-voltage, high-current Darlington array containing seven open-collector Darlington pairs with common emitters. Each channel is rated at 500 mA and can withstand peak currents of 600 mA. Suppression diodes are included for driving inductive loads, and the inputs are pinned opposite the outputs to simplify board layout.

These versatile devices are useful for driving a wide range of loads, including solenoids, relays, DC motors, LED displays, filament lamps, thermal print-heads and high-power buffers. The maximum output voltage is 50 V.

The ULN2004A is supplied in a 16-pin plastic DIP package with a copper lead frame to reduce thermal resistance.


Robot

The robot is a two-wheel robot with a castor wheel provided for support. The ULN2004A IC is used for driving the motors. A stepper motor has been used. As the name suggests, stepper motors do not spin freely like DC motors; they rotate in discrete steps under the command of a controller. This makes them easier to control, as the controller knows exactly how far they have rotated without having to use a sensor, which is why they are used on many robots. The stepper motor used is a unipolar motor and hence has six wires coming out of it. Four of them receive drive signals from the microcontroller, while the remaining two are shorted together and connected to a 12 V DC supply.

To move the motor, its windings are excited one after another in a fixed repeating sequence by the assembly code.

Stepwise procedure / flow:


Input pattern to be recognized → Sampling → Generation of templates → Template matching with the input pattern → Best match → Recognized pattern


TIME ACTIVITY CHART:

(Gantt-style chart: activities 1-5 plotted against months 3, 4, 7, 10 and 12; chart not reproduced here.)

Activities:

1 - Literature review
2 - Selection of application and specification of the equipment required for it
3 - Making an experimental set-up
4 - Conducting trials, plotting results and drawing conclusions
5 - Preparation of the report

CONCLUSION


We proposed a fast and simple algorithm for the hand gesture recognition problem. Given observed images of the hand, the algorithm segments the hand region and then makes an inference about the activity of the fingers involved in the gesture. We have demonstrated the effectiveness of this computationally efficient algorithm on real images we acquired. Based on our motivating robot-control application, we considered only a limited number of gestures, and our algorithm can be extended in a number of ways to recognize a broader set. The segmentation portion of our algorithm is quite simple and would need to be improved if the technique were to be used in challenging operating conditions; we should note, however, that segmentation in a general setting is an open research problem in itself. Reliable performance of hand gesture recognition techniques in a general setting requires dealing with occlusions, temporal tracking for recognizing dynamic gestures, and 3D modeling of the hand, all of which are still mostly beyond the current state of the art.


FUTURE SCOPE

Even with limited processing power, it will be possible to design very efficient algorithms in order to:

• Track people
• (Re-)identify them
• Understand their (static) gestures
• Control a robot

Our software has been designed to be reusable, and many more complex behaviors may be added to our work. Because we limited ourselves to low processing power, our work could easily be made more performant by adding a state-of-the-art processor, and the use of a real embedded OS could improve our system in terms of speed and stability. In addition, implementing more sensor modalities would improve robustness even in very complex scenes. Our system has shown that interaction with machines through gestures is a feasible task, and the set of detected gestures could be extended to more commands by implementing a more complex model of a human being. In the future, service robots executing many different tasks, from housemaid work to nuclear power plant services, might arise and become as common a part of everyday life as computers are nowadays.


BIBLIOGRAPHY

Books and references

Matlab by R P Singh

The 8051 Microcontroller by Mazidi

Image Processing book by Bijith Marakarkandy

Digital Image Processing: An Algorithmic Approach by Joshi M A

Neural Network by Gonzales Cenelia

www.wikipedia.com

www.google.com

ieeexplore.ieee.org
