
PROJECT REPORT

ON

Design & Implementation of a Person Authenticating & Commands Following Robot

Submitted

In Partial Fulfillment of the Requirements

for the Degree of

BACHELOR OF TECHNOLOGY

Under the able guidance of By Dr. Haranath Kar Aditya Agarwal

(2002519) Subhayan Banerjee

(2003516)

Nilesh Goel (2003560)

Chandra Veer Singh

(2002577)


Department of Electronics and Communication Engineering

MOTILAL NEHRU NATIONAL INSTITUTE OF TECHNOLOGY

ALLAHABAD 211004, INDIA

MAY 2007

Abstract

In this project, algorithms for speech recognition and face recognition have been developed and implemented in MATLAB 7.0.1. These algorithms can be used in any security system in which person authentication is required. A security system built on these two algorithms first recognizes the person to be authenticated using the face recognition algorithm and, after proper authentication, follows the commands of that person using the speech recognition algorithm. The speech recognition algorithm uses speech templates, and the face recognition algorithm uses Fourier descriptors for identification. The proposed algorithms are simpler, faster and more economical than previously reported algorithms. They can be easily implemented on DSP kits (Texas Instruments or Analog Devices) to develop an autonomous wireless security system. The security system has been mounted on SODDRO (Sound Direction Detection Robot), developed by the same group in November 2006.


Acknowledgments

We take this opportunity to express our deep sense of gratitude and regard to Dr. Haranath Kar, Asst. Professor, Department of Electronics and Communication Engineering, MNNIT, Allahabad, for the continuous encouragement and able guidance we needed to complete this project.

We are indebted to Dr. T. N. Sharma, Dr. Sudarshan Tiwari, Mr. Asim Mukherji, Mr. Arvind Kumar and Mr. Rajeev Gupta of MNNIT, Allahabad, for their valuable comments and suggestions, which have helped us make this project a success. The valuable and fruitful discussions with them were of immense help, without which it would have been difficult to present this robot in its present form.

We also wish to thank Mrs. Vijya Bhadauria, Project In-charge, Dr. V. K. Srivastava and Romesh Nandwana (B.Tech 2nd Yr., ECE) for their very kind support throughout the project. Finally, we are grateful to P. P. Singh, staff, Project and PCB Lab; Chandra Vali Tiwari and Ram Sajivan, staff, Basic Electronics Lab; Ram ji, staff, Computer Architecture Lab of MNNIT, Allahabad; and the administration of MNNIT, Allahabad, for providing the help required.

Aditya Agarwal

Subhayan Banerjee

Nilesh Goel Chandra Veer Singh


Certificate

TO WHOM IT MAY CONCERN

This is to certify that the project titled

“Design & Implementation of a Person Authenticating &

Commands Following Robot”

Submitted by:

1. Aditya Agarwal

2. Subhayan Banerjee 3. Nilesh Goel

4. Chandra Veer Singh

of B.Tech 8th semester, Electronics & Communication Engineering, in partial fulfillment of the requirements for the degree of Bachelor of Technology in Electronics & Communication Engineering, MNNIT (Deemed University), Allahabad, during the academic year 2006-07, is their original endeavor, carried out under my supervision and guidance, and has not been presented anywhere else.

Dr. Haranath Kar Department of Electronics and Communication Engineering

Motilal Nehru National Institute of Technology Allahabad

Allahabad 211004

05 MAY 2007


Table of Contents

Page

Abstract ii
Acknowledgments iii
Certificate iv
Table of Contents v
List of Tables vi
List of Figures vii
Chapter 1: Introduction 1
1.1 Purpose of This Document 2
Chapter 2: Algorithm for Face Recognition 3
Chapter 3: Algorithm for Speech Recognition 6
Chapter 4: System Description & Hardware Implementation 8
4.1a Person to be Authenticated 8
4.1b Web Camera 8
4.1c Image Acquisition & Processing Toolbox 8
4.2a Voice Commands 9
4.2b Microphone 9
4.2c Sampler 10
4.2d Band Pass Filter 10
4.2e Processing & Decision Making Unit 10
4.2f Microcontroller & Motor Controller Unit 10
4.2g Mechanical Assembly 11
4.3 List of Components 13
Chapter 5: Software Section 14
5.1 MATLAB Code for Face Recognition 14
5.2 MATLAB Code for Speech Recognition 21
5.3 Assembly Code for Sound Detection 33
Chapter 6: Results 51
Chapter 7: Summary & Conclusion 52
7.1 Summary 52
7.2 Conclusion 52
Chapter 8: Future Scope 53
References 54
Appendix A 55


List of Tables

Table Title Page

6.1 Table for Results of Face Recognition 51
6.2 Table for Results of Speech Recognition 51


List of Figures

Figure Title page

2.1 A Simple Binary Image 3
2.2 Result of Structuring Element on Fig. 2.1 4
4.1 Block Diagram of Face Recognition System 8
4.2 Block Diagram of Speech Recognition System 9
4.2a Microcontroller & Motor Controller Unit 11
4.2b Bottom View of Mechanical Assembly 12


Chapter 1

Introduction

Speech recognition and face recognition are two important areas that have drawn the attention of many researchers in recent years. Face recognition in a real-time application with sufficient efficacy is still a challenge, keeping in mind the constraints imposed by memory availability and processing time. Here a two-dimensional approach to face recognition is introduced. In the proposed algorithm, the face is first detected in the image using edge detection techniques, and then recognized with the help of Fourier descriptors. The main advantage of Fourier descriptors is that they are invariant to translation, rotation and scaling of the observed object. For speech recognition, speech templates are used that basically depend upon the intensity and the accent of the speech.

The Speech Recognition and Face Recognition modules are the most important stages of any humanoid robot that needs

proper authentication of the person before following any instruction. Apart from the humanoid robot the proposed

algorithms can be also used in different real time industrial applications.

The rest of the report is organized as follows. The description of the algorithm for face recognition is given in Chapter 2. In Chapter 3, an algorithm for reliable recognition of speech is proposed. Chapter 4 consists of the system description and hardware implementation. The software section is given in Chapter 5, and results are depicted in Chapter 6. To wind up the report, the summary and conclusion are presented in Chapter 7. Chapter 8 deals with the future scope, and references are given at the end.


1.1 Purpose of this Document

This project report has been prepared as part of the B.Tech final year project in the Electronics and Communication Engineering Department, MNNIT, Allahabad. Its purpose is to give a detailed description of the algorithms used for speech recognition and face recognition, the hardware for speech recognition, and the software programs used to develop the security system, which first authenticates a person using face recognition and, after proper authentication, follows the predefined commands given by that person.


Chapter 2

Algorithm for Face Recognition

This algorithm consists of both face detection and recognition parts. First the edge of the face is detected using a morphological boundary extraction algorithm. The image is first converted into a binary image, and then erosion is performed with a structuring element of 1's of suitable dimensions (generally 5 × 5).

The edge of the face image can be obtained by first eroding the binary image A by a structuring element B and then taking the set difference between A and its erosion. If the edge of A is denoted by E(A), then

E(A) = A − (A ⊖ B)

where A ⊖ B denotes the erosion of image A by structuring element B.

Applying this 5 × 5 structuring element to Fig. 2.1 results in an edge between 2 and 3 pixels thick, as shown in Fig. 2.2.

Figure 2.1: A simple binary image.
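This boundary-extraction step can be sketched in Python (an illustrative sketch using SciPy's morphology routines rather than the report's MATLAB toolbox; the 3 × 3 structuring element and the small filled square standing in for a binary face image are assumptions made to keep the example self-contained):

```python
import numpy as np
from scipy.ndimage import binary_erosion

# A: a simple binary image -- a 5x5 filled square, standing in for Fig. 2.1
A = np.zeros((9, 9), dtype=bool)
A[2:7, 2:7] = True

# B: structuring element of 1's (3x3 here, to keep the toy example small)
B = np.ones((3, 3), dtype=bool)

# E(A) = A - (A erode B): the set difference between A and its erosion
E = A & ~binary_erosion(A, structure=B)

print(E.sum())  # number of boundary pixels
```

Eroding the filled square shrinks it by one pixel on each side, so the set difference keeps exactly the one-pixel-thick outline.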


Figure 2.2: Result of the structuring element applied to Fig. 2.1

After obtaining the edge, the Fourier descriptors of the edge are calculated. Fourier descriptors are used for the detection of any object found in the input image. Their main advantage is invariance to translation, rotation and scaling of the observed object.

Let the complex array

z(m) = x(m) + j y(m), m = 0, 1, ..., n − 1

represent the n edge points belonging to the face, where x(m) and y(m) are the coordinates of the m-th boundary point. The value of n depends upon the size of the face and the dimensions of the image matrix obtained during image acquisition.

The Fourier transform coefficients are calculated by

c(k) = Σ (m = 0 to n − 1) z(m) exp(−j 2π k m / n), k = 0, 1, ..., n − 1.

The Fourier descriptors are obtained from the sequence c(k) by truncating the element c(0), then taking the absolute value of the remaining elements and dividing every element of the array thus obtained by |c(1)|. To summarize, the Fourier descriptors are

d(k − 2) = |c(k)| / |c(1)|, k = 2, 3, ..., n − 1.

The Fourier descriptors of each face edge are invariant to rotation, translation and scaling. Idealized translation affects only c(0), so c(0) is truncated while evaluating the Fourier descriptors. Idealized rotation only multiplies each element by a constant of unit magnitude, so the absolute value is taken while calculating the descriptors. Idealized scaling multiplies every element by a constant C, and this effect can be nullified by dividing all coefficients by one of the calculated Fourier transform coefficients; as c(0) has already been truncated, a good choice is c(1), so every element is divided by |c(1)|.

All of the properties described are exact in the idealized case of translation, rotation and scaling. However, since the input images acquired by our acquisition system are spatially sampled, and all of the transformations occur before image sampling, the assumptions made concerning the transformations (translation, rotation and scaling) hold only approximately. In practical usage, however, this does not cause any difficulties.
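The descriptor computation can be sketched in Python (an illustrative sketch; the function name `fourier_descriptors`, the `keep` parameter and the circle test shape are assumptions for this example, and NumPy's FFT replaces the explicit sum of the formula above):

```python
import numpy as np

def fourier_descriptors(boundary, keep=100):
    # boundary: (n, 2) array of (row, col) edge points in traversal order
    z = boundary[:, 0] + 1j * boundary[:, 1]   # complex boundary sequence z(m)
    c = np.fft.fft(z)                          # Fourier coefficients c(k)
    # truncate c(0) (translation), take magnitudes (rotation), divide by |c(1)| (scale)
    return np.abs(c[2:2 + keep]) / np.abs(c[1])

# translation and scaling leave the descriptors unchanged
theta = np.linspace(0, 2 * np.pi, 256, endpoint=False)
shape = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # unit circle boundary
moved = 3.0 * shape + np.array([5.0, -2.0])                # scaled and translated copy
d1 = fourier_descriptors(shape)
d2 = fourier_descriptors(moved)
```

Because the translation only enters c(0) and the scale factor cancels in the division by |c(1)|, d1 and d2 agree to numerical precision.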

For face recognition, a library of images of the persons to be authenticated is created. First, 3 images of each person are taken with the help of the web camera and the image acquisition toolbox as described before, and converted to binary images. The edges of the binary images and their corresponding Fourier descriptors are calculated according to the algorithm using the image processing toolbox.

For authentication of a person, the same procedure is repeated to calculate the edge of the face and its Fourier descriptors.

For recognition, the absolute differences between the first 100 Fourier descriptors of the sample image and the Fourier descriptors of the dictionary images are calculated and summed using the following formula:

Sum = Σ (k = 1 to 100) |d_dictionary(k) − d_sample(k)|

For a sample image, 3 such sums are calculated for the images of each person saved in the dictionary, as 3 images are stored per person. After obtaining these sums, the two highest sums are taken, and the average of these two is the final sum used for recognition. The number of final sums depends upon the number of authenticated persons whose images are saved in the dictionary.

For recognition, the minimum final sum is found, and this minimum corresponds to the image of the authenticated person.
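This matching rule can be sketched compactly (a hypothetical Python sketch; `match_person` and the dictionary layout as three stored descriptor arrays per person are illustrative assumptions):

```python
import numpy as np

def match_person(sample_fd, library):
    # library: {name: [fd1, fd2, fd3]} -- three stored descriptor arrays per person
    finals = {}
    for name, templates in library.items():
        # summed absolute difference over the first 100 descriptors, per stored image
        sums = sorted(float(np.sum(np.abs(t[:100] - sample_fd[:100])))
                      for t in templates)
        finals[name] = (sums[-1] + sums[-2]) / 2.0  # average of the two highest sums
    # the minimum final sum identifies the authenticated person
    return min(finals, key=finals.get)
```

With well-separated descriptor sets, the person whose stored images lie closest to the sample yields the smallest final sum and wins.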


Chapter 3

Algorithm for Speech Recognition

To recognize the voice commands efficiently, different parameters of speech such as pitch, amplitude pattern or power/energy can be used. Here, the power of the speech signal is used to recognize the voice commands.

First the voice commands are taken with the help of a microphone that is directly connected to the PC. Then the analog voice signals are sampled using MATLAB 7.0.1. As speech signals generally lie in the range of 300 Hz - 4000 Hz, according to the Nyquist sampling theorem the minimum sampling rate required is 8000 samples/second.

After sampling, the discrete data obtained is passed through a band pass filter with a pass band of 300 - 4000 Hz. The basic purpose of the band pass filter is to eliminate the noise that lies at low frequencies (below 300 Hz); above 4000 Hz there is generally no speech signal.

This algorithm for voice recognition is based on speech templates. The templates basically consist of the power of the discrete signals. To create a template, the power of each sample is calculated and the accumulated power of 250 subsequent samples is represented by one value. For example, in the implemented algorithm 16000 samples are taken, so the power of the discrete data is represented by 64 discrete values, as the power of each group of 250 subsequent samples (i.e. 1-250, 251-500, ..., 15751-16000) is accumulated and represented by one value. The number of samples taken and grouped is entirely flexible and can be changed keeping in mind the required accuracy, the memory space available and the processing time.

For recognition of commands, first a dictionary is created that consists of templates of all the commands that the robot has to follow (in our case 'Turn Left', 'Move Right', 'Come Forward' and 'Go Back'). For creating the dictionary, the same command is recorded several times (15 in this case) and a template is created each time. The final template is the average of all these templates, which is then stored.
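The template construction described above can be sketched as (an illustrative Python sketch; `power_template` and `dictionary_entry` are hypothetical names for the block-power and averaging steps):

```python
import numpy as np

def power_template(x, block=250):
    # x: 1-D array of sampled speech (e.g. 16000 samples of a one-second command)
    n = (len(x) // block) * block
    p = x[:n] ** 2                       # power of each sample
    return p.reshape(-1, block).sum(1)   # accumulated power of each 250-sample group

def dictionary_entry(takes):
    # average the templates of several recordings of the same command
    return np.mean([power_template(t) for t in takes], axis=0)

tmpl = power_template(np.ones(16000))    # dummy one-second recording -> 64 values
```

With 16000 samples and blocks of 250, the template is a 64-value vector, exactly as in the text.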


After creating the dictionary of templates, the command to be followed is taken with the help of the microphone, and the template of the input command signal is created using the same procedure as mentioned earlier.

Now the template of the received command is compared with the templates in the dictionary using the Euclidean distance. This is the accumulation of the squared differences between the value of the dictionary template and that of the command template at each sample point. The formula can be given as

Euclidean Distance = Σ (i = 1 to N) (T_dictionary(i) − T_command(i))²

where the sum runs over all sample points of the template, of which there are 64 in the proposed algorithm. After calculating the Euclidean distance for each dictionary template, these distances are sorted in ascending order to find the smallest among them. This distance corresponds to a particular dictionary template, which belongs to a particular dictionary command. The robot then detects that particular command given by the operator and performs the task accordingly. If the command given by the operator does not match any dictionary command, the robot should not follow it. To incorporate this feature in the system, an individual maximum range of Euclidean distance values has been set for each dictionary command. If the calculated Euclidean distance of the received command does not lie in the range of any dictionary command, the received command is considered a strange one and the robot requests a familiar command. The efficiency of the proposed algorithm depends on the mechanism of dictionary creation, the method of comparing the dictionary templates with the received command template, and the range of values chosen for the Euclidean distance. If the number of times the same command is recorded for creating the dictionary is increased, the efficiency of the proposed algorithm will go up.
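The comparison-and-rejection logic can be sketched as (a hypothetical Python sketch; the per-command thresholds represented as a simple dict, and the name `recognize`, are assumptions for illustration):

```python
import numpy as np

def recognize(command_t, dictionary, max_dist):
    # dictionary: {command: 64-value template}; max_dist: {command: rejection threshold}
    dists = {c: float(np.sum((t - command_t) ** 2)) for c, t in dictionary.items()}
    best = min(dists, key=dists.get)     # smallest Euclidean distance
    if dists[best] > max_dist[best]:
        return None                      # strange command: request a familiar one
    return best
```

A received template that is far from every dictionary template exceeds even the best match's threshold and is rejected rather than acted upon.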


Chapter 4

System Description & Hardware Implementation

The block diagram of the proposed system for the face recognition is shown in Figure 4.1.

Figure 4.1: Block diagram of the face recognition system.

4.1a Person to be Authenticated

The person who wants the robot to follow his/her commands should be an authenticated one. To prove his/her authenticity, the person has to come in front of the web camera for a snapshot.

4.1b Web Camera

The web camera used for taking the snapshots may be any simple web camera with an appropriate resolution. The resolution generally used is 640 × 480, but a higher resolution can also be used, keeping in mind the processing time of the acquired image and the complexity of the system.

4.1c Image Acquisition and Processing Toolbox

The image acquisition toolbox and image processing toolbox are modules of MATLAB 7.0.1 installed on the personal computer. The image acquisition toolbox is used to acquire the image of the person to be observed with the help of the web camera. The image processing toolbox processes the image, which is obtained in matrix form using the web camera and the image acquisition toolbox. The main functions of this toolbox are to first detect the face in the image using the edge detection technique and then recognize it using Fourier descriptors.
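The preprocessing chain used in the MATLAB code of Chapter 5 (grayscale, threshold, hole filling, erosion then dilation) can be approximated in Python (an illustrative sketch: the mean-based threshold and the square structuring element are stand-ins for `graythresh` and `strel('disk',5)`):

```python
import numpy as np
from scipy.ndimage import binary_fill_holes, binary_erosion, binary_dilation

def preprocess(rgb):
    gray = rgb.mean(axis=2)              # rgb2gray stand-in (channel average)
    bw = gray > gray.mean()              # global threshold (graythresh stand-in)
    filled = binary_fill_holes(bw)       # imfill(...,'holes')
    se = np.ones((5, 5), dtype=bool)     # square stand-in for strel('disk',5)
    eroded = binary_erosion(filled, se)  # imerode
    return binary_dilation(eroded, se)   # imdilate: erosion + dilation = opening

img = np.zeros((32, 32, 3))
img[8:24, 8:24, :] = 1.0                 # bright square on a dark background
mask = preprocess(img)
```

The erosion-then-dilation pair is a morphological opening: small noise specks are removed while large connected regions such as the face keep their size and shape.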

The block diagram of the proposed system for the speech

recognition is shown in Figure 4.2.

Figure 4.2: Block diagram of the speech recognition system.

A brief description of various modules of the speech recognition system is given below.

4.2a Voice Commands

The voice commands are given by the person who has been authenticated by the face recognition algorithm. In the proposed algorithm, a limit on the number of voice commands has been imposed to make the system useful for real-world applications.

4.2b Microphone

The microphone takes the commands from the authenticated person. It is directly connected to the personal computer. Commands given by the person are taken as analog inputs using the Data Acquisition Toolbox of MATLAB.


4.2c Sampler

The speech signal obtained is sampled to convert it into discrete form. The sampling is done in MATLAB. As speech signals lie in the range of 300 Hz - 4000 Hz, according to the Nyquist sampling theorem the minimum sampling rate required is 8000 samples/second. To obtain the required accuracy, the sampling rate is chosen as 16000 samples/second.

4.2d Band Pass Filter

After sampling, the discrete signal obtained is passed through a band pass filter. Here a fourth-order Chebyshev band pass filter with a pass band of 300 Hz - 4000 Hz is used. The band pass filter removes the noise existing outside the pass band.
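A SciPy sketch of such a filter (illustrative; the 0.05 dB ripple follows the `Rp = 0.05` used in the report's MATLAB code, while the nominal 300-4000 Hz band edges and the test tones are assumptions for this example):

```python
import numpy as np
from scipy.signal import cheby1, lfilter

fs = 16000                                   # sampling rate, samples/second
# 4th-order Chebyshev type-I band pass, 300-4000 Hz pass band, 0.05 dB ripple
b, a = cheby1(4, 0.05, [300, 4000], btype='bandpass', fs=fs)

t = np.arange(fs) / fs
speech_tone = np.sin(2 * np.pi * 1000 * t)   # in-band component (passed)
hum = np.sin(2 * np.pi * 50 * t)             # low-frequency noise, below 300 Hz
y = lfilter(b, a, speech_tone + hum)         # the hum is strongly attenuated in y
```

The 1000 Hz component sits well inside the pass band and survives almost unchanged, while the 50 Hz hum falls deep in the lower stop band.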

4.2e Processing and Decision Making Unit

The processing unit does all the processing of the speech signals required for voice command recognition. Here a personal computer with MATLAB 7.0.1 is used as the processing and decision making unit.

4.2f Microcontroller & Motor Controller Unit

Here the processing and decision making unit for sound detection (see Figure 4.2a) is ATMEL's AVR-family ATmega32L microcontroller. This microcontroller has 32K bytes of in-system self-programmable flash memory (endurance: 10,000 write/erase cycles) and a maximum clock frequency of 8 MHz. It has 32 × 8 general purpose working registers, 1024 bytes of EEPROM (endurance: 100,000 write/erase cycles), 2K bytes of internal SRAM, two 8-bit timer/counters with separate prescalers and compare modes, and one 16-bit timer/counter with separate prescaler, compare mode and capture mode. The operating voltage is 2.7 - 5.5 V and the speed grade is 0 - 8 MHz for the ATmega32L. It has an inbuilt 8-channel, 10-bit A/D converter.

The A/D converter (Port A) is used for converting the analog signal from the output of the band pass filter into a digital signal, which is processed by the processing unit of the microcontroller. Accordingly, it generates appropriate control signals (at Port D


of the microcontroller, for 'SODDRO') to drive the motors used in the mechanical assembly.

A dual full H-bridge motor driver IC, the L298 (shown in Figure 4.2a), is used to control the movement of the motors. Each H-bridge can drive its motor clockwise or anticlockwise depending upon the direction of current flow through the circuit. Using the L298, it is also possible to 'jam' or 'free' the motors if required. Basically, the L298 acts as an interface between the low-power control signals generated by the microcontroller and the motor assembly, which requires relatively high power to drive the motors. In this system the logic supply voltage is 5 V and the motor supply voltage is 6 V.

Figure 4.2a: Microcontroller and motor controller unit

4.2g Mechanical Assembly

This module mainly consists of two 3 V brushed DC motors, gear boxes and the vehicle chassis. The side-steering mechanism implemented in the mechanical assembly can effectively control the motors for taking sharp turns. Motor 1 controls the motion of the left wheels and motor 2 controls the right wheels, as shown in Figure 4.2b.
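The side-steering logic can be summarized as a small mapping from recognized commands to the two motors (a hypothetical Python sketch, not firmware from the report; the signal encoding and command strings are illustrative assumptions):

```python
# +1 = forward, -1 = reverse, 0 = stop; motor 1 drives the left wheels,
# motor 2 drives the right wheels, so opposite signs give a sharp turn.
DRIVE = {
    'Come Forward': (+1, +1),
    'Go Back':      (-1, -1),
    'Turn Left':    (-1, +1),
    'Move Right':   (+1, -1),
}

def control_signals(command):
    # unrecognized commands stop (jam) both motors, as the L298 allows
    return DRIVE.get(command, (0, 0))
```

Counter-rotating the two sides pivots the chassis in place, which is what makes the sharp turns possible without a steering axle.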


Figure 4.2b: Bottom view of mechanical assembly


4.3 List of Components Used for Sound Detection, Face Recognition and Speech Recognition

Semiconductors:

1. 7805 +5V DC Regulator (Qty. 1)
2. LM324 Quad Op-amp (Qty. 3)
3. 1N4007 Rectifier Diode (Qty. 1)
4. 5mm Light Emitting Diode (Qty. 1)
5. ATMEGA32L AVR Microcontroller (Qty. 1)
6. L298 Dual Full Bridge Driver (Qty. 1)
7. 1N4148 Switching Diode (Qty. 8)

Resistors (all ¼ watt, ±5% carbon):

1. 1 K ohm (Qty. 4)
2. 4.7 K ohm (Qty. 12)
3. 10 K ohm (Qty. 15)
4. 100 K ohm (Qty. 3)
5. 470 K ohm pots (Qty. 3)

Capacitors:

1. 470 μF electrolytic (Qty. 1)
2. 0.1 μF ceramic disk (Qty. 8)
3. 0.01 μF ceramic disk (Qty. 10)

Miscellaneous:

1. 9V Battery (Qty. 1)
2. 5V Battery Pack (Qty. 1)
3. Battery Connector (Qty. 2)
4. Bread Board (Qty. 1)
5. Condenser Microphone (Qty. 4)
6. General Purpose PCB (Qty. 1)
7. Connecting Wires
8. PCB Wire Connectors (Qty. 4)
9. IC Base, 14 pin (Qty. 3)
10. DC Motors from Toy Car (Qty. 2)
11. Gear Reduction Box from Toy Car (Qty. 2)
12. Plywood Sheet for Vehicle Chassis
13. Web Camera
14. Personal Computer


Chapter 5

Software Section

In this section, a description of the MATLAB 7.0.1 code for speech and face recognition is given.

5.1 MATLAB Code for Face Recognition

5.1A MATLAB Code for Image Capturing

out=imaqhwinfo('winvideo');

vid=videoinput('winvideo',1,'RGB24_640x480');

for i=1:3

preview(vid)

a=getsnapshot(vid);

closepreview;

imshow(a)

b=rgb2gray(a);

imshow(b)

level=graythresh(b);

c=im2bw(b,level);

imshow(c)

e = imfill(c,'holes');

imshow(e)

se = strel('disk',5);

f = imerode(e,se);

imshow(f)

f=imdilate(f,se);

image1(1,i)={f};

end

for i=1:3

preview(vid)

a=getsnapshot(vid);

closepreview;

imshow(a)

b=rgb2gray(a);

imshow(b)

level=graythresh(b);

c=im2bw(b,level);

imshow(c)

e = imfill(c,'holes');

imshow(e)

se = strel('disk',5);

f = imerode(e,se);

imshow(f)

f=imdilate(f,se);

image2(1,i)={f};

end

for i=1:3

preview(vid)

a=getsnapshot(vid);


closepreview;

imshow(a)

b=rgb2gray(a);

imshow(b)

level=graythresh(b);

c=im2bw(b,level);

imshow(c)

e = imfill(c,'holes');

imshow(e)

se = strel('disk',5);

f = imerode(e,se);

imshow(f)

f=imdilate(f,se);

image3(1,i)={f};

end

for i=1:3

preview(vid)

a=getsnapshot(vid);

closepreview;

imshow(a)

b=rgb2gray(a);

imshow(b)

level=graythresh(b);

c=im2bw(b,level);

imshow(c)

e = imfill(c,'holes');

imshow(e)

se = strel('disk',5);

f = imerode(e,se);

imshow(f)

f=imdilate(f,se);

image4(1,i)={f};

end

save('radhe','image1','image2','image3','image4')

5.1B MATLAB Code for Fourier Descriptors & Dictionary Creation

load('radhe','image1','image2','image3','image4');

for i=1:3

temp=cell2mat(image1(1,i));

imshow(temp)

s=~temp;

imshow(s);

t1=imfill(s,'holes');

imshow(t1);

bound = bwboundaries(t1);

[row1,col1]=size(bound);

for j=1:row1

bound3=cell2mat(bound(j,1));

[row(1,j),col(1,j)]=size(bound3);

end

temp1=row(1,1);

num=1;


for j=2:row1

if(row(1,j)>temp1)

temp1=row(1,j);

num=j;

end

end

bound1=bound(num);

bound2=cell2mat(bound1);

n1=size(bound2);

n=n1(1,1);

for k=1:102

z(k)=0;

end

for k=1:102

for m=1:n

p=bound2(m,1);

q=bound2(m,2);

t=complex(p,q);

z(k)=z(k)+ t*exp(-(1i*2*pi*k*m)/n); % 1i: the loop index i shadows MATLAB's imaginary unit

end

end

d=abs(z(2));

c=abs(z);

c=c/d;

for k=3:102

pl(k-2)=c(k);

end

dict1(1,i)={pl};

end

for i=1:3

temp=cell2mat(image2(1,i));

imshow(temp)

s=~temp;

imshow(s);

t1=imfill(s,'holes');

imshow(t1);

imshow(t1)

bound = bwboundaries(t1);

[row1,col1]=size(bound);

for j=1:row1

bound3=cell2mat(bound(j,1));

[row(1,j),col(1,j)]=size(bound3);

end

temp1=row(1,1);

num=1;

for j=2:row1

if(row(1,j)>temp1)

temp1=row(1,j);

num=j;

end

end

bound1=bound(num);

bound2=cell2mat(bound1);

n1=size(bound2);

n=n1(1,1);


for k=1:102

z(k)=0;

end

for k=1:102

for m=1:n

p=bound2(m,1);

q=bound2(m,2);

t=complex(p,q);

z(k)=z(k)+ t*exp(-(1i*2*pi*k*m)/n); % 1i: the loop index i shadows MATLAB's imaginary unit

end

end

d=abs(z(2));

c=abs(z);

c=c/d;

for k=3:102

pl(k-2)=c(k);

end

dict1(1,(i+3))={pl};

end

for i=1:3

temp=cell2mat(image3(1,i));

imshow(temp)

s=~temp;

imshow(s);

t1=imfill(s,'holes');

imshow(t1);

bound = bwboundaries(t1);

[row1,col1]=size(bound);

for j=1:row1

bound3=cell2mat(bound(j,1));

[row(1,j),col(1,j)]=size(bound3);

end

temp1=row(1,1);

num=1;

for j=2:row1

if(row(1,j)>temp1)

temp1=row(1,j);

num=j;

end

end

bound1=bound(num);

bound2=cell2mat(bound1);

n1=size(bound2);

n=n1(1,1);

for k=1:102

z(k)=0;

end

for k=1:102

for m=1:n

p=bound2(m,1);

q=bound2(m,2);

t=complex(p,q);

z(k)=z(k)+ t*exp(-(1i*2*pi*k*m)/n); % 1i: the loop index i shadows MATLAB's imaginary unit

end


end

d=abs(z(2));

c=abs(z);

c=c/d;

for k=3:102

pl(k-2)=c(k);

end

dict1(1,i+6)={pl};

end

for i=1:3

temp=cell2mat(image1(1,i));

imshow(temp)

s=~temp;

imshow(s);

t1=imfill(s,'holes');

imshow(t1);

bound = bwboundaries(t1);

[row1,col1]=size(bound);

for j=1:row1

bound3=cell2mat(bound(j,1));

[row(1,j),col(1,j)]=size(bound3);

end

temp1=row(1,1);

num=1;

for j=2:row1

if(row(1,j)>temp1)

temp1=row(1,j);

num=j;

end

end

bound1=bound(num);

bound2=cell2mat(bound1);

n1=size(bound2);

n=n1(1,1);

for k=1:102

z(k)=0;

end

for k=1:102

for m=1:n

p=bound2(m,1);

q=bound2(m,2);

t=complex(p,q);

z(k)=z(k)+ t*exp(-(1i*2*pi*k*m)/n); % 1i: the loop index i shadows MATLAB's imaginary unit

end

end

d=abs(z(2));

c=abs(z);

c=c/d;

for k=3:102

pl(k-2)=c(k);

end

dict1(1,i+9)={pl};

end


save('dictionary','dict1');

5.1C MATLAB Code for Image Comparison

load('dictionary','dict1');

out=imaqhwinfo('winvideo');

vid=videoinput('winvideo',1,'RGB24_640x480');

preview(vid)

a=getsnapshot(vid);

closepreview;

imshow(a)

b=rgb2gray(a);

imshow(b)

level=graythresh(b);

c=im2bw(b,level);

imshow(c)

e = imfill(c,'holes');

imshow(e)

se = strel('disk',3);

f = imerode(e,se);

imshow(f)

f=imdilate(f,se);

samp=descriptor(f);

m=0;

for k=1:66

sum2(1,k)=0;

end

for i=1:11

temp =cell2mat(dict1(1,i));

for j=i+1:12

temp1 =cell2mat(dict1(1,j));

m=m+1;

for k=1:100

sum2(1,m)=sum2(1,m)+abs(temp(1,k)-temp1(1,k));

end

end

end

flag=sum2(1,1);

for i=2:66

if flag<sum2(1,i)

flag = sum2(1,i);

end

end

for i=1:12

temp=cell2mat(dict1(1,i));

sum1=0;

for j=1:100

sum1=sum1 + abs(temp(1,j)-samp(1,j));

end

sum(1,i)=sum1;

end

for i=1:4

suma(1,i)=0;

end


for i=1:4

for j=3*i-2:3*i

suma(1,i)=suma(1,i)+sum(1,j);

end

end

flag1=suma(1,1);

seq=1;

for i=2:4

if flag1 > suma(1,i)

flag1 = suma(1,i);

seq=i;

end

end

if seq==1

disp('Hi Mr.Aditya')

end

if seq==2

disp('Hi Mr.Subhayan')

end

if seq==3

disp('Hi Mr. Nilesh')

end

if seq==4

disp('Hi Mr.Chandra Veer')

end

5.1D MATLAB Code for the Descriptor Function (called from 5.1C)

function[pl]=descriptor(f)

load('sound','aditya','subhayan','nilesh','chandraveer','stranger');

imshow(f)

s=~f;

imshow(s);

t=imfill(s,'holes');

imshow(t);

bound = bwboundaries(t);

[row1,col1]=size(bound);

for j=1:row1

bound3=cell2mat(bound(j,1)); % extract the boundary array before taking its size
[row(1,j),col(1,j)]=size(bound3);

end

temp1=row(1,1);

num=1;

for j=2:row1

if(row(1,j)>temp1)

temp1=row(1,j);

num=j;

end

end

bound1=bound(num);

bound2=cell2mat(bound1);

n1=size(bound2);


n=n1(1,1);

for k=1:102

z(k)=0;

end

for k=1:102

for m=1:n

p=bound2(m,1);

q=bound2(m,2);

t=complex(p,q);

z(k)=z(k)+ t*exp(-(1i*2*pi*k*m)/n); % 1i is MATLAB's imaginary unit

end

end

d=abs(z(2));

c=abs(z);

c=c/d;

for k=3:102

pl(k-2)=c(k);

end

5.2 MATLAB Code for Speech Recognition

5.2A MATLAB Code for Dictionary Creation

ai = analoginput('winsound');

chan=addchannel(ai,1);

set(ai,'SampleRate',16000);

set(ai,'SamplesPerTrigger',16000);

set(ai,'TriggerChannel',chan)

set(ai,'TriggerType','Software')

set(ai,'TriggerCondition','Rising')

set(ai,'TriggerConditionValue',0.1)

set(ai,'TriggerDelayUnits','Samples')

set(ai,'TriggerDelay',-50)

start(ai)

noinput(:,1) = getdata(ai);

set(ai,'TriggerConditionValue',0.3)

n = 4; Rp = 0.05;

Wn = [300 3400]/4000;

[b,a] = cheby1(n,Rp,Wn);

for p = 1:2

start(ai)

left(:,p) = getdata(ai);

% subplot(3,4,p);

a=plot(left(:,p));

end

left11=sum(left')/2;

left1=left11';

left1=filter(b,a,left1); % keep the filtered signal (the result was previously discarded)

for p = 1:2

start(ai)

right(:,p) = getdata(ai);

% subplot(2,4,p+2);


a=plot(right(:,p));

end

right11=sum(right')/2;

right1=right11';

right1=filter(b,a,right1);

for p = 1:2

start(ai)

forward(:,p) = getdata(ai);

% subplot(2,4,p+4);

a=plot(forward(:,p));

end

forward11=sum(forward')/2;

forward1=forward11';

forward1=filter(b,a,forward1);

for p = 1:2

start(ai)

backward(:,p) = getdata(ai);

% subplot(2,4,p+6);

a=plot(backward(:,p));

end

backward11=sum(backward')/2;

backward1=backward11';

backward1=filter(b,a,backward1);

for k=1:16000

left2(k)=left1(k)*left1(k);

noinput1(k)=noinput(k)*noinput(k);

end

for j=1:64

left_final(j)=0;

noinput_final(j)=0;

end

for j=0:63

for l=(250*j+1):(250*j+250)

left_final(j+1)=left_final(j+1)+left2(l);

noinput_final(j+1)=noinput_final(j+1)+noinput1(l);

end

end

% noise=sort(noinput_final,'descend');

noise=0.04;

for k=1:16000

right2(k)=right1(k)*right1(k);

end

for j=1:64

right_final(j)=0;

end

for j=0:63

for l=(250*j+1):(250*j+250)

right_final(j+1)=right_final(j+1)+right2(l);

end

end

for k=1:16000

forward2(k)=forward1(k)*forward1(k);

end


for j=1:64

forward_final(j)=0;

end

for j=0:63

for l=(250*j+1):(250*j+250)

forward_final(j+1)=forward_final(j+1)+forward2(l);

end

end

for k=1:16000

backward2(k)=backward1(k)*backward1(k);

end

for j=1:64

backward_final(j)=0;

end

for j=0:63

for l=(250*j+1):(250*j+250)

backward_final(j+1)=backward_final(j+1)+backward2(l);

end

end

for j=1:64

left_final2(j)=left_final(j)-noise;

right_final2(j)=right_final(j)-noise;

forward_final2(j)=forward_final(j)-noise;

backward_final2(j)=backward_final(j)-noise;

end

save('dictionary','left_final','right_final','forward_final','backward_final')

5.2B MATLAB Code for Correlation

function coff = correlation(input,dict,num)

i_ = mean(input);

d_ = mean(dict);

for j=1:64

ID(j)=0;

sqi(j)=0;

sqd(j)=0;

I(j)=input(j)-i_;

D(j)=dict(j)-d_;

end

for j=1:num

sqi(j)=I(j)*I(j);

sqd(j)=D(j)*D(j);

ID(j)=I(j)*D(j);


end

num1=0;

r=0;

for j=1:num

num1=num1+ID(j);

end

den1=sqrt((sum(sqi)*sum(sqd)));

if den1~=0

r=num1/den1;

end

coff=r;

5.2C MATLAB Code for Template Comparison

load('dictionary','left_final','right_final','forward_final','backward_final')

aj = analoginput('winsound');

chan1=addchannel(aj,1);

set(aj,'SampleRate',16000);

set(aj,'SamplesPerTrigger',16000);

set(aj,'TriggerChannel',chan1)

set(aj,'TriggerType','Software')

set(aj,'TriggerCondition','Rising')

set(aj,'TriggerConditionValue',0.001)

set(aj,'TriggerDelayUnits','Samples')

set(aj,'TriggerDelay',-50)

start(aj)

noiseinput(:,1) = getdata(aj);

set(aj,'TriggerConditionValue',0.2)

start(aj)

compare(:,1) = getdata(aj);

subplot(2,2,1);

p=plot(compare(:,1));

n = 4; Rp = 0.05;

Wn = [300 3400]/4000;

[b,a] = cheby1(n,Rp,Wn);

compare=filter(b,a,compare);

subplot(2,2,2);

p=plot(compare(:,1));

for k=1:16000

compare1(k)=compare(k)*compare(k);

noinput1(k)=noiseinput(k)*noiseinput(k);

end

for j=1:64

compare_final(j)=0;

noinput_final(j)=0;

end

for j=0:63

for l=(250*j+1):(250*j+250)


compare_final(j+1)=compare_final(j+1)+compare1(l);

noinput_final(j+1)=noinput_final(j+1)+noinput1(l);

end

end

noise=0.3;

for j=1:64

left_final2(j)=left_final(j)-noise;

right_final2(j)=right_final(j)-noise;

forward_final2(j)=forward_final(j)-noise;

backward_final2(j)=backward_final(j)-noise;

end

% noise=sort(noinput_final,'descend');

ccom1=0;

ccom2=0;

cleft1=0;

cleft2=0;

cright1=0;

cright2=0;

cfor1=0;

cfor2=0;

cback1=0;

cback2=0;

number1=0;

number2=0;

number3=0;

number4=0;

number5=0;

number6=0;

number7=0;

number8=0;

for j=1:64

compare_final3(j)=0;

compare_final4(j)=0;

left_final3(j)=0;

left_final4(j)=0;

right_final3(j)=0;

right_final4(j)=0;

forward_final3(j)=0;

forward_final4(j)=0;

backward_final3(j)=0;

backward_final4(j)=0;

end

l=1;

for j=1:64

compare_final2(j)=compare_final(j)-noise;

end

for j=1:64

if (((j==1)|(j==64))&(compare_final2(j)>0))

compare_final3(l)=compare_final2(j);


l=l+1;

ccom1=ccom1+1;

elseif ((compare_final2(j)>0)&(compare_final2(j-1)>0)&(compare_final2(j+1)>0))

compare_final3(l)=compare_final2(j);

l=l+1;

ccom1=ccom1+1;

elseif ((j~=1)&(j~=64)&(compare_final2(j-1)<0)&(compare_final2(j+1)<0))

break

end

end

l=1;

for k=j+1:64

if ((k==64|k==63)& compare_final2(k)>0)

compare_final4(l)=compare_final2(k);

l=l+1;

ccom2=ccom2+1;

elseif ((compare_final2(k)>0)&(compare_final2(k-1)>0)&(compare_final2(k+1)>0)&(compare_final2(k-2)>0)&(compare_final2(k+2)>0))

compare_final4(l)=compare_final2(k);

l=l+1;

ccom2=ccom2+1;

% elseif ((k~=1)&(k~=64)&(compare_final2(k-1)<0)&(compare_final2(k+1)<0))

% break

end

end

l=1;

for j=1:64

if ((left_final2(j)>0)&((j==1) |(j==64)))

left_final3(l)=left_final2(j);

l=l+1;

cleft1=cleft1+1;

elseif ((left_final2(j)>0)&(left_final2(j-1)>0)&(left_final2(j+1)>0))

left_final3(l)=left_final2(j);

l=l+1;

cleft1=cleft1+1;

elseif ((j~=1)&(j~=64)&(left_final2(j-1)<0)&(left_final2(j+1)<0))

break

end

end


l=1;

for k=j+1:64

if ((k==64|k==63)&(left_final2(k)>0))

left_final4(l)=left_final2(k);

l=l+1;

cleft2=cleft2+1;

elseif ((left_final2(k)>0)&(left_final2(k-1)>0)&(left_final2(k+1)>0)&(left_final2(k-2)>0)&(left_final2(k+2)>0))

left_final4(l)=left_final2(k);

l=l+1;

cleft2=cleft2+1;

% elseif ((k~=1)&(k~=64)&(left_final2(k-1)<0)&(left_final2(k+1)<0))

% break

end

end

l=1;

for j=1:64

if ((right_final2(j)>0)&((j==1) |(j==64)))

right_final3(l)=right_final2(j);

l=l+1;

cright1=cright1+1;

elseif ((right_final2(j)>0)&(right_final2(j-1)>0)&(right_final2(j+1)>0))

right_final3(l)=right_final2(j);

l=l+1;

cright1=cright1+1;

elseif ((j~=1)&(j~=64)&(right_final2(j-1)<0)&(right_final2(j+1)<0))

break

end

end

l=1;

for k=j+1:64

if ((k==64|k==63)& right_final2(k)>0)

right_final4(l)=right_final2(k);

l=l+1;

cright2=cright2+1;

elseif ((right_final2(k)>0)&(right_final2(k-1)>0)&(right_final2(k+1)>0)&(right_final2(k-2)>0)&(right_final2(k+2)>0))

right_final4(l)=right_final2(k);

l=l+1;

cright2=cright2+1;

% elseif ((k~=1)&(k~=64)&(right_final2(k-1)<0)&(right_final2(k+1)<0))


% break


end

end

l=1;

for j=1:64

if ((forward_final2(j)>0)&((j==1)|(j==64)))

forward_final3(l)=forward_final2(j);

l=l+1;

cfor1=cfor1+1;

elseif ((forward_final2(j)>0)&(forward_final2(j-1)>0)&(forward_final2(j+1)>0))

forward_final3(l)=forward_final2(j);

l=l+1;

cfor1=cfor1+1;

elseif ((j~=1)&(j~=64)&(forward_final2(j-1)<0)&(forward_final2(j+1)<0))

break

end

end

l=1;

for k=j+1:64

if ((k==64|k==63)& forward_final2(k)>0)

forward_final4(l)=forward_final2(k);

l=l+1;

cfor2=cfor2+1;

elseif ((forward_final2(k)>0)&(forward_final2(k-1)>0)&(forward_final2(k+1)>0)&(forward_final2(k-2)>0)&(forward_final2(k+2)>0))

forward_final4(l)=forward_final2(k);

l=l+1;

cfor2=cfor2+1;

% elseif ((k~=1)&(k~=64)&(forward_final2(k-1)<0)&(forward_final2(k+1)<0))

% break

end

end

l=1;

for j=1:64

if ((backward_final2(j)>0)&((j==1)|(j==64)))

backward_final3(l)=backward_final2(j);

l=l+1;

cback1=cback1+1;

elseif ((backward_final2(j)>0)&(backward_final2(j-1)>0)&(backward_final2(j+1)>0))

backward_final3(l)=backward_final2(j);


l=l+1;

cback1=cback1+1;

elseif ((j~=1)&(j~=64)&(backward_final2(j-1)<0)&(backward_final2(j+1)<0))

break

end

end

l=1;

for k=j+1:64

if ((k==64|k==63)& backward_final2(k)>0)

backward_final4(l)=backward_final2(k);

l=l+1;

cback2=cback2+1;

elseif ((backward_final2(k)>0)&(backward_final2(k-1)>0)&(backward_final2(k+1)>0)&(backward_final2(k-2)>0)&(backward_final2(k+2)>0))

backward_final4(l)=backward_final2(k);

l=l+1;

cback2=cback2+1;

% elseif ((k~=1)&(k~=64)&(backward_final2(k-1)<0)&(backward_final2(k+1)<0))

% break

end

end

plot(compare_final2)

for j=1:4

t(j)=0;

end

if ccom1<=cleft1

number1=cleft1;

else

number1=ccom1;

end

if ccom2<=cleft2

number2=cleft2;

else

number2=ccom2;

end

if ccom1<=cright1

number3=cright1;

else

number3=ccom1;

end


if ccom2<=cright2

number4=cright2;

else

number4=ccom2;

end

if ccom1<=cfor1

number5=cfor1;

else

number5=ccom1;

end

if ccom2<=cfor2

number6=cfor2;

else

number6=ccom2;

end

if ccom1<=cback1

number7=cback1;

else

number7=ccom1;

end

if ccom2<=cback2

number8=cback2;

else

number8=ccom2;

end

t11=correlation((compare_final3)',(left_final3)',number1);

t12=correlation((compare_final4)',(left_final4)',number2);

t21=correlation((compare_final3)',(right_final3)',number3);

t22=correlation((compare_final4)',(right_final4)',number4);

t31=correlation((compare_final3)',(forward_final3)',number5);

t32=correlation((compare_final4)',(forward_final4)',number6);

t41=correlation((compare_final3)',(backward_final3)',number7);

t42=correlation((compare_final4)',(backward_final4)',number8);

t(1)=t11+t12

t(2)=t21+t22

t(3)=t31+t32

t(4)=t41+t42

for j=1:4

q(j)=t(j);

end

for j=0:3

for m=1:3-j

if q(m)>q(m+1)

temp=q(m);

q(m)=q(m+1);

q(m+1)=temp;

end


end

end

q(4)
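The bubble sort above exists only to leave the largest of the four correlation sums in q(4). In a higher-level sketch the same selection is a single argmax (pure Python; the command labels are illustrative):

```python
t = [0.61, 0.84, 0.47, 0.32]                      # correlation sums t(1)..t(4)
commands = ["LEFT", "RIGHT", "FORWARD", "BACK"]
best = commands[max(range(len(t)), key=t.__getitem__)]
print(best)   # RIGHT
```

The decision logic that follows simply compares q(4) against each t(j) to recover which command produced the maximum.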

if cright1<=cleft1

number1=cleft1;

else

number1=cright1;

end

if cright2<=cleft2

number2=cleft2;

else

number2=cright2;

end

if cfor1<=cleft1

number3=cleft1;

else

number3=cfor1;

end

if cfor2<=cleft2

number4=cleft2;

else

number4=cfor2;

end

if cback1<=cleft1

number5=cleft1;

else

number5=cback1;

end

if cback2<=cleft2

number6=cleft2;

else

number6=cback2;

end

if cfor1<=cright1

number7=cright1;

else

number7=cfor1;

end

if cfor2<=cright2

number8=cright2;

else

number8=cfor2;

end

if cback1<=cright1


number9=cright1;

else

number9=cback1;

end

if cback2<=cright2

number10=cright2;

else

number10=cback2;

end

if cback1<=cfor1

number11=cfor1;

else

number11=cback1;

end

if cback2<=cfor2

number12=cfor2;

else

number12=cback2;

end

s11=correlation((right_final3)',(left_final3)',number1);

s12=correlation((right_final4)',(left_final4)',number2);

s21=correlation((forward_final3)',(left_final3)',number3);

s22=correlation((forward_final4)',(left_final4)',number4);

s31=correlation((backward_final3)',(left_final3)',number5);

s32=correlation((backward_final4)',(left_final4)',number6);

s41=correlation((forward_final3)',(right_final3)',number7);

s42=correlation((forward_final4)',(right_final4)',number8);

s51=correlation((backward_final3)',(right_final3)',number9);

s52=correlation((backward_final4)',(right_final4)',number10);

s61=correlation((backward_final3)',(forward_final3)',number11);

s62=correlation((backward_final4)',(forward_final4)',number12);

s(1)=s11+s12;

s(2)=s21+s22;

s(3)=s31+s32;

s(4)=s41+s42;

s(5)=s51+s52;

s(6)=s61+s62;

for j=1:6

p(j)=s(j);

end

for j=0:5

for m=1:5-j

if p(m)>p(m+1)

temp=p(m);

p(m)=p(m+1);

p(m+1)=temp;

end


end

end

a=[04 02 01 03]; b=[2 2 2 2];

if q(4)==t(1)
    data=a(1); y=b(1);
    putvalue(parport,data); pause(y);
    disp('The given command is LEFTSIDE');
end

if q(4)==t(2)
    data=a(2); y=b(2);
    putvalue(parport,data); pause(y);
    disp('The given command is RIGHT');
end

if q(4)==t(3)
    data=a(3); y=b(3);
    putvalue(parport,data); pause(y);
    disp('The given command is GO FORWARD');
end

if q(4)==t(4)
    data=a(4); y=b(4);
    putvalue(parport,data); pause(y);
    disp('The given command is MOVE BACK');
end

delete(parport); clear parport;
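Each recognised command above writes a 4-bit code (from a=[04 02 01 03]) to the parallel port and pauses while the robot moves. As a hedged sketch, the dispatch reduces to a table lookup; `put_value` below is a hypothetical stand-in for the Data Acquisition Toolbox call `putvalue(parport, data)`:

```python
# Port data values taken from a=[04 02 01 03] in the MATLAB code above.
CODES = {"LEFT": 4, "RIGHT": 2, "FORWARD": 1, "BACK": 3}

def dispatch(command, put_value):
    """Send the command's parallel-port code through put_value."""
    data = CODES[command]
    put_value(data)       # stand-in for putvalue(parport, data)
    return data
```

For example, `dispatch("FORWARD", port.write)` would place the value 1 on the port, which the robot's microcontroller reads on port B.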

5.3 Assembly Code for Sound Detection

.include "m32def.inc"

.cseg

.org 0

rjmp INITIALISATION

INITIALISATION:

;=========INITIALISATION STARTS HERE=========


ldi r16,low(RAMEND) ;initialising stack pointer starts

out SPL,r16

ldi r16,high(RAMEND)

out SPH,r16 ;stack initialisation ends here

ldi r16,$00

mov r3,r16 ;reset the register r3

out tcnt1h,r16 ;reset the timer1

out tcnt1l,r16

out tccr1a,r16

out tccr1b,r16

ldi r16,0b10000000

out adcsra,r16 ;enable ADC

out ddrb,r16 ;configure port b as i/p port

ldi r16,$ff

out ddrc,r16 ;configure port c as o/p port

out ddrd,r16 ;configure port d as o/p port

; INITIALISATION ENDS HERE

;================================

;====save data in data memory====

;================================

ldi r19,$00

ldi r18,$00

sts $0060,r18

sts $0061,r19

ldi r19,$00

ldi r18,$2a

sts $0062,r18

sts $0063,r19

ldi r19,$00

ldi r18,$53

sts $0064,r18

sts $0065,r19

ldi r19,$00

ldi r18,$7d

sts $66,r18

sts $67,r19

ldi r19,$00

ldi r18,$a7

sts $68,r18

sts $69,r19

ldi r19,$00

ldi r18,$d1

sts $6a,r18

sts $6b,r19

ldi r19,$00

ldi r18,$fa

sts $6c,r18

sts $6d,r19

ldi r19,$01

ldi r18,$24

sts $6e,r18

sts $6f,r19

ldi r19,$01


ldi r18,$4e

sts $70,r18

sts $71,r19

ldi r19,$01

ldi r18,$78

sts $72,r18

sts $73,r19

ldi r19,$01

ldi r18,$a2

sts $74,r18

sts $75,r19

ldi r19,$01

ldi r18,$cc

sts $76,r18

sts $77,r19

ldi r19,$01

ldi r18,$f6

sts $78,r18

sts $79,r19

ldi r19,$02

ldi r18,$20

sts $7a,r18

sts $7b,r19

ldi r19,$02

ldi r18,$4a

sts $7c,r18

sts $7d,r19

ldi r19,$02

ldi r18,$74

sts $7e,r18

sts $7f,r19

ldi r19,$02

ldi r18,$9e

sts $80,r18

sts $81,r19

ldi r19,$02

ldi r18,$c8

sts $82,r18

sts $83,r19

ldi r19,$02

ldi r18,$f3

sts $84,r18

sts $85,r19

ldi r19,$03

ldi r18,$1d

sts $86,r18

sts $87,r19

ldi r19,$03

ldi r18,$47

sts $88,r18

sts $89,r19

ldi r19,$03

ldi r18,$72

sts $8a,r18

sts $8b,r19

ldi r19,$03

ldi r18,$9d


sts $8c,r18

sts $8d,r19

ldi r19,$03

ldi r18,$c7

sts $8e,r18

sts $8f,r19

ldi r19,$03

ldi r18,$f2

sts $90,r18

sts $91,r19

ldi r19,$04

ldi r18,$1d

sts $92,r18

sts $93,r19

ldi r19,$04

ldi r18,$48

sts $94,r18

sts $95,r19

ldi r19,$04

ldi r18,$73

sts $96,r18

sts $97,r19

ldi r19,$04

ldi r18,$9e

sts $98,r18

sts $99,r19

ldi r19,$04

ldi r18,$ca

sts $9a,r18

sts $9b,r19

ldi r19,$04

ldi r18,$f5

sts $9c,r18

sts $9d,r19

ldi r19,$05

ldi r18,$21

sts $9e,r18

sts $9f,r19

ldi r19,$05

ldi r18,$4d

sts $a0,r18

sts $a1,r19

ldi r19,$05

ldi r18,$78

sts $a2,r18

sts $a3,r19

ldi r19,$05

ldi r18,$a4

sts $a4,r18

sts $a5,r19

ldi r19,$05

ldi r18,$d1

sts $a6,r18

sts $a7,r19

ldi r19,$05

ldi r18,$fd

sts $a8,r18


sts $a9,r19

ldi r19,$06

ldi r18,$2a

sts $aa,r18

sts $ab,r19

ldi r19,$06

ldi r18,$56

sts $ac,r18

sts $ad,r19

ldi r19,$06

ldi r18,$83

sts $ae,r18

sts $af,r19

ldi r19,$06

ldi r18,$b0

sts $b0,r18

sts $b1,r19

ldi r19,$06

ldi r18,$dd

sts $b2,r18

sts $b3,r19

ldi r19,$07

ldi r18,$0b

sts $b4,r18

sts $b5,r19

ldi r19,$07

ldi r18,$39

sts $b6,r18

sts $b7,r19

ldi r19,$07

ldi r18,$66

sts $b8,r18

sts $b9,r19

ldi r19,$07

ldi r18,$95

sts $ba,r18

sts $bb,r19

ldi r19,$07

ldi r18,$c3

sts $bc,r18

sts $bd,r19

ldi r19,$07

ldi r18,$f2

sts $be,r18

sts $bf,r19

ldi r19,$08

ldi r18,$20

sts $c0,r18

sts $c1,r19

ldi r19,$08

ldi r18,$4f

sts $c2,r18

sts $c3,r19

ldi r19,$08

ldi r18,$7f

sts $c4,r18

sts $c5,r19


ldi r19,$08

ldi r18,$af

sts $c6,r18

sts $c7,r19

ldi r19,$08

ldi r18,$de

sts $c8,r18

sts $c9,r19

ldi r19,$09

ldi r18,$0f

sts $ca,r18

sts $cb,r19

ldi r19,$09

ldi r18,$3f

sts $cc,r18

sts $cd,r19

ldi r19,$09

ldi r18,$70

sts $ce,r18

sts $cf,r19

ldi r19,$09

ldi r18,$a1

sts $d0,r18

sts $d1,r19

ldi r19,$09

ldi r18,$d3

sts $d2,r18

sts $d3,r19

ldi r19,$0a

ldi r18,$05

sts $d4,r18

sts $d5,r19

ldi r19,$0a

ldi r18,$37

sts $d6,r18

sts $d7,r19

ldi r19,$0a

ldi r18,$6a

sts $d8,r18

sts $d9,r19

ldi r19,$0a

ldi r18,$9d

sts $da,r18

sts $db,r19

ldi r19,$0a

ldi r18,$d1

sts $dc,r18

sts $dd,r19

ldi r19,$0b

ldi r18,$05

sts $de,r18

sts $df,r19

ldi r19,$0b

ldi r18,$3a

sts $e0,r18

sts $e1,r19

ldi r19,$0b


ldi r18,$6f

sts $e2,r18

sts $e3,r19

ldi r19,$0b

ldi r18,$a4

sts $e4,r18

sts $e5,r19

ldi r19,$0b

ldi r18,$da

sts $e6,r18

sts $e7,r19

ldi r19,$0c

ldi r18,$11

sts $e8,r18

sts $e9,r19

ldi r19,$0c

ldi r18,$48

sts $ea,r18

sts $eb,r19

ldi r19,$0c

ldi r18,$80

sts $ec,r18

sts $ed,r19

ldi r19,$0c

ldi r18,$b9

sts $ee,r18

sts $ef,r19

ldi r19,$0c

ldi r18,$f2

sts $f0,r18

sts $f1,r19

ldi r19,$0d

ldi r18,$2c

sts $f2,r18

sts $f3,r19

ldi r19,$0d

ldi r18,$67

sts $f4,r18

sts $f5,r19

ldi r19,$0d

ldi r18,$a3

sts $f6,r18

sts $f7,r19

ldi r19,$0d

ldi r18,$df

sts $f8,r18

sts $f9,r19

ldi r19,$0e

ldi r18,$1c

sts $fa,r18

sts $fb,r19

ldi r19,$0e

ldi r18,$5b

sts $fc,r18

sts $fd,r19

ldi r19,$0e

ldi r18,$9a


sts $fe,r18

sts $ff,r19

ldi r19,$0e

ldi r18,$da

sts $100,r18

sts $101,r19

ldi r19,$0f

ldi r18,$1c

sts $102,r18

sts $103,r19

ldi r19,$0f

ldi r18,$5f

sts $104,r18

sts $105,r19

ldi r19,$0f

ldi r18,$a3

sts $106,r18

sts $107,r19

ldi r19,$0f

ldi r18,$e8

sts $108,r18

sts $109,r19

ldi r19,$10

ldi r18,$2f

sts $10a,r18

sts $10b,r19

ldi r19,$10

ldi r18,$78

sts $10c,r18

sts $10d,r19

ldi r19,$10

ldi r18,$c3

sts $10e,r18

sts $10f,r19

ldi r19,$11

ldi r18,$0f

sts $110,r18

sts $111,r19

ldi r19,$11

ldi r18,$5e

sts $112,r18

sts $113,r19

ldi r19,$11

ldi r18,$af

sts $114,r18

sts $115,r19

;================================

;====ends====

;================================

ldi r18,$00

ldi r19,$00

ldi r17,$64 ;run threshold setting loop 100 times

THRESHOLD_SETTING:


call SAMPLE_ADC0

or r3,r16

call SAMPLE_ADC1

or r3,r16

call SAMPLE_ADC2

or r3,r16

dec r17

brne THRESHOLD_SETTING

ldi r16,$4f

; mov r3,r16

or r3,r16

lsl r3 ;setting the threshold value
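The threshold-setting loop above OR-accumulates 100 ambient ADC samples from each microphone into r3, forces a floor of $4f, and then doubles the result with a logical shift left. A hedged pure-Python sketch of that 8-bit arithmetic (`set_threshold` is an illustrative name):

```python
def set_threshold(samples, floor=0x4F):
    """OR-accumulate ambient samples, apply floor, then double (8-bit)."""
    acc = 0
    for s in samples:
        acc |= s & 0xFF
    return ((acc | floor) << 1) & 0xFF   # lsl r3 wraps within 8 bits

print(hex(set_threshold([0x03, 0x0C, 0x10])))   # 0xbe
```

Any subsequent microphone sample at or above this value is treated as a sound event in the main loop's `cp r16,r3` / `brsh` comparisons.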

;===============================

;======Main loop starts here====

;===============================

MAIN:

ldi r16,$0f

out portc,r16 ;signal for ready to get sound signal

call SAMPLE_ADC0

cp r16,r3

brsh MIC_1_JUMP

call SAMPLE_ADC1

cp r16,r3

brsh MIC_2_JUMP

call SAMPLE_ADC2

cp r16,r3

brsh MIC_3_JUMP

rjmp MAIN

MIC_1_JUMP: rjmp MIC_1

MIC_2_JUMP: rjmp MIC_2

MIC_3_JUMP: rjmp MIC_3

MIC_1:

ldi r20,$00 ; lower nibble for 0 degree offset

ldi r21,$00 ; higher nibble for 0 degree offset

call SAMPLE_ADC1

cp r16,r3

brsh MIC_1_CW

call SAMPLE_ADC2

cp r16,r3

brsh MIC_1_CCW

rjmp MIC_1

MIC_1_CW:

ldi r25,$ff ; angle is to be added in offset

ldi r29,$65 ;generate control signals for clockwise

ldi r16,$00

out tcnt1h,r16 ;reset the timer1

out tcnt1l,r16 ;reset the timer


ldi r16,$01

out tccr1b,r16 ;start the timer

CONTINUE_10:

call SAMPLE_ADC2

cp r16,r3

brlo CONTINUE_10

ldi r16,$00

out tccr1b,r16 ;stop the timer

in r26,tcnt1l ; read the timer low byte

in r27,tcnt1h ; read the timer high byte

ldi r16,$01

rjmp ANGLE_CALCULATE

MIC_1_CCW:

ldi r25,$ff ;angle is to be added in offset

ldi r29,$56 ;generate control signals for CCW

ldi r16,$00

out tcnt1h,r16 ;reset the timer1

out tcnt1l,r16 ;reset the timer

ldi r16,$01

out tccr1b,r16 ;start the timer

CONTINUE_11:

call SAMPLE_ADC1

cp r16,r3

brlo CONTINUE_11

ldi r16,$00

out tccr1b,r16 ;stop the timer

in r26,tcnt1l ; read the timer low byte

in r27,tcnt1h ; read the timer high byte

ldi r16,$02

rjmp ANGLE_CALCULATE

MIC_2:

ldi r20,$9b ;lower nibble for 120 degree offset

ldi r21,$23 ;higher nibble for 120 degree offset

call SAMPLE_ADC2

cp r16,r3

brsh MIC_2_CW

call SAMPLE_ADC0

cp r16,r3

brsh MIC_2_CCW

rjmp MIC_2

MIC_2_CW:

ldi r25,$ff ;angle is to be added in offset


ldi r29,$65 ;generate control signals for clockwise

ldi r16,$00

out tcnt1h,r16 ;reset the timer1

out tcnt1l,r16 ;reset the timer

ldi r16,$01

out tccr1b,r16 ;start the timer

CONTINUE_20:

call SAMPLE_ADC0

cp r16,r3

brlo CONTINUE_20

ldi r16,$00

out tccr1b,r16 ;stop the timer

in r26,tcnt1l ; read the timer low byte

in r27,tcnt1h ; read the timer high byte

ldi r16,$04

rjmp ANGLE_CALCULATE

MIC_2_CCW:

ldi r25,$00 ; angle is to be subtracted from offset

ldi r29,$65 ;generate control signals for clockwise

ldi r16,$00

out tcnt1h,r16 ;reset the timer1

out tcnt1l,r16 ;reset the timer

ldi r16,$01

out tccr1b,r16 ;start the timer

CONTINUE_21:

call SAMPLE_ADC2

cp r16,r3

brlo CONTINUE_21

ldi r16,$00

out tccr1b,r16 ; stop the timer

in r26,tcnt1l ; read the timer low byte

in r27,tcnt1h ; read the timer high byte

ldi r16,$08

rjmp ANGLE_CALCULATE

MIC_3:

ldi r20,$9b ;lower nibble for 120 degree offset

ldi r21,$23 ;higher nibble for 120 degree offset

call SAMPLE_ADC0

cp r16,r3

brsh MIC_3_CW

call SAMPLE_ADC1

cp r16,r3

brsh MIC_3_CCW

rjmp MIC_3


MIC_3_CW:

ldi r25,$00 ; angle is to be subtracted from offset

ldi r29,$56 ;generate control signals for CCW

ldi r16,$00

out tcnt1h,r16 ;reset the timer1

out tcnt1l,r16 ;reset the timer

ldi r16,$01

out tccr1b,r16 ;start the timer


CONTINUE_30:

call SAMPLE_ADC1

cp r16,r3

brlo CONTINUE_30

ldi r16,$00

out tccr1b,r16 ; stop the timer

in r26,tcnt1l ; read the timer low byte

in r27,tcnt1h ; read the timer high byte

ldi r16,$10

rjmp ANGLE_CALCULATE

MIC_3_CCW:

ldi r25,$ff ;angle is to be added in offset

ldi r29,$56 ;generate control signals for CCW

ldi r16,$00

out tcnt1h,r16 ;reset the timer1

out tcnt1l,r16 ;reset the timer

ldi r16,$01

out tccr1b,r16 ;start the timer

CONTINUE_31:

call SAMPLE_ADC0

cp r16,r3

brlo CONTINUE_31

ldi r16,$00

out tccr1b,r16 ; stop the timer

in r26,tcnt1l ; read the timer low byte

in r27,tcnt1h ; read the timer high byte

ldi r16,$20

rjmp ANGLE_CALCULATE

;===============================

;======Main loop ends here======

;===============================


SAMPLE_ADC0:

;====FOR ADC Ch.0====

ldi r16,$60 ;ref. volt. is AVCC with cap. at AREF

out admux,r16 ;left adjusted ADC result and Ch.0

sbi adcsra,6 ; start conversion

LOOP0:

sbis adcsra,4

rjmp LOOP0

in r16,adch

ret

SAMPLE_ADC1:

; ====FOR ADC Ch. 1====

ldi r16,$61 ;ref. volt. is AVCC with cap. at AREF

out admux,r16 ;left adjusted ADC and Ch.1

sbi adcsra,6 ;start conversion

LOOP1:

sbis adcsra,4

rjmp LOOP1

in r16,adch

ret

SAMPLE_ADC2:

; ====FOR ADC Ch. 2====

ldi r16,$62 ;ref. volt. is AVCC with cap. at AREF

out admux,r16 ;left adjusted ADC and Ch.2

sbi adcsra,6 ;start conversion

LOOP2:

sbis adcsra,4

rjmp LOOP2

in r16,adch

ret

ANGLE_CALCULATE:

andi r26,$f0

clc

ror r27

ror r26

ror r27

ror r26

ror r27

ror r26

adiw r27:r26,$30

adiw r27:r26,$30

ld r18,X+


ld r19,X

cpi r19,$12 ;if timer value N exceeds 1140

brlo CONTINUE ;then r19 and r18 will store $ff

ldi r19,$11 ;if this occurs then store value

ldi r18,$af ;corresponding to 60 degrees ($11af)

CONTINUE: cpi r25,$ff

brlo SUBTRACT

ADDITION:

add r20,r18

adc r21,r19

rjmp NEXT

SUBTRACT:

sub r20,r18

sbc r21,r19

NEXT:

out portd,r29 ;give control signal to port D to rotate

; =============================

; delay loop of

; 100,000 cycles:

; -----------------------------

; delaying 99990 cycles:

ldi R22, $A5

WGLOOPdd0: ldi R23, $C9

WGLOOPdd1: dec R23

brne WGLOOPdd1

dec R22

brne WGLOOPdd0

; -----------------------------

; delaying 9 cycles:

ldi R22, $03

WGLOOPdd2: dec R22

brne WGLOOPdd2

; -----------------------------

; delaying 1 cycle:

nop

; =============================

LOOPCC:

; =============================

; delay loop of

; 64 cycles:

; -----------------------------

; delaying 63 cycles:


ldi R22, $15

WGLOOPcc0: dec R22

brne WGLOOPcc0

; -----------------------------

; delaying 1 cycle:

nop

; =============================

dec r20

brne LOOPCC

cpi r21,$00

breq LOOP_END

dec r21

brne LOOPCC

LOOP_END:

ldi r29,$55

out portd,r29 ;send ctrl signal to port d to move FWD

; =============================

; delay loop of

; 1,750,000 cycles:

; -----------------------------

; delaying 1749993 cycles:

ldi R22, $A7

WGLOOP10: ldi R23, $12

WGLOOP11: ldi R24, $C1

WGLOOP12: dec R24

brne WGLOOP12

dec R23

brne WGLOOP11

dec R22

brne WGLOOP10

; -----------------------------

; delaying 6 cycles:

ldi R22, $02

WGLOOP13: dec R22

brne WGLOOP13

; -----------------------------

; delaying 1 cycle:

nop

; =============================

ldi r29,$ff ; jam both motors

out portd,r29


; =============================

; delay loop generator

; 15,000,000 cycles:

; -----------------------------

; delaying 14999280 cycles:

ldi R21, $50

WGLOOPqq0: ldi R22, $F8

WGLOOPqq1: ldi R23, $FB

WGLOOPqq2: dec R23

brne WGLOOPqq2

dec R22

brne WGLOOPqq1

dec R21

brne WGLOOPqq0

; -----------------------------

; delaying 720 cycles:

ldi R21, $F0

WGLOOPqq3: dec R21

brne WGLOOPqq3

; =============================

;===============================================================

;==now wait for the command signal from the PC and then act on it==

;===============================================================

ldi r16,$ff

out portc,r16 ;signal for ready to get command i/p

READ_PORTB:

in r16,pinb ;read port B on which signal from pc is present

andi r16,$07 ; masking the upper five bits

cpi r16,$00

breq READ_PORTB

cpi r16,$01

breq MOVE_FORWARD

cpi r16,$02

breq TURN_LEFT

cpi r16,$03

breq GO_BACK

cpi r16,$04

breq TURN_RIGHT

rjmp READ_PORTB

MOVE_FORWARD:

ldi r29,$55 ;data to move robo FWD

out portd,r29 ;give control signal to port D to move

ldi r20,$b4

ldi r21,$1a

call MOVE

rjmp MAIN

TURN_LEFT:

ldi r29,$56 ;data to move robo CCW


out portd,r29 ;give control signal to port D to move

ldi r20,$b4

ldi r21,$1a

call MOVE

rjmp MAIN

GO_BACK:

ldi r29,$66 ;data to move robo back

out portd,r29 ;give control signal to port D to move

ldi r20,$b4

ldi r21,$1a

call MOVE

rjmp MAIN

TURN_RIGHT:

ldi r29,$65 ;data to move robo cw

out portd,r29 ;give control signal to port D to move

ldi r20,$b4

ldi r21,$1a

call MOVE

rjmp MAIN

MOVE:

; =============================

; delay loop of

; 100,000 cycles:

; -----------------------------

; delaying 99990 cycles:

ldi R22, $A5

WGLOOPdd00: ldi R23, $C9

WGLOOPdd01: dec R23

brne WGLOOPdd01

dec R22

brne WGLOOPdd00

; -----------------------------

; delaying 9 cycles:

ldi R22, $03

WGLOOPdd02: dec R22

brne WGLOOPdd02

; -----------------------------

; delaying 1 cycle:

nop

; =============================

LOOPABCD:

; =============================

; delay loop of

; 64 cycles:


; -----------------------------

; delaying 63 cycles:

ldi R22, $15

WGLOOPcc000: dec R22

brne WGLOOPcc000

; -----------------------------

; delaying 1 cycle:

nop

; =============================

dec r20

brne LOOPABCD

cpi r21,$00

breq END_LOOP

dec r21

brne LOOPABCD

END_LOOP:

ldi r29,$ff ; jam both motors

out portd,r29

; =============================

; delay loop generator

; 5,000,000 cycles:

; -----------------------------

; delaying 4999995 cycles:

ldi R22, $21

WGLOOPqqq0: ldi R23, $D6

WGLOOPqqq1: ldi R25, $EB

WGLOOPqqq2: dec R25

brne WGLOOPqqq2

dec R23

brne WGLOOPqqq1

dec R22

brne WGLOOPqqq0

; -----------------------------

; delaying 3 cycles:

ldi R22, $01

WGLOOPqqq3: dec R22

brne WGLOOPqqq3

; -----------------------------

; delaying 2 cycles:

nop

nop

; =============================

ret


Chapter 6

Results

The algorithms for face recognition and speech recognition have been successfully implemented in MATLAB 7.0.1, and the results are given below.

6.1 Face Recognition Results

Person          Recognition Efficiency
Aditya          85%
Subhayan        85%
Nilesh          82%
Chandra Veer    90%

6.2 Speech Recognition Results

Words           Recognition Efficiency
TURN LEFT       85%
MOVE RIGHT      80%
COME FORWARD    85%
GO BACK         75%

The above results were obtained under very stringent environmental conditions. For speech recognition, it is essential to maintain the same environmental conditions during dictionary creation and while recording the sample voice commands. For face recognition, the same lighting conditions and image background must be maintained during dictionary creation and while capturing the sample image. The efficiency of the proposed algorithms can be improved significantly if strict and suitable laboratory conditions are provided.


Chapter 7

Summary and Conclusion

7.1 Summary

Algorithms for speech recognition and face recognition are of utmost importance to any security system. A reliable security system incorporates at least two stages of security, and so does this one: first the person is authenticated using face recognition, and only after proper authentication are the person's voice commands accepted and acted upon. The system can be used as the authentication module of a humanoid robot or for security in real-time industrial applications. It is faster, simpler and more economical than previously reported algorithms.

7.2 Conclusion

A system for reliable recognition of speech and faces has been designed and developed. It can be made highly efficient and effective if stringent environmental conditions are maintained; the setup for maintaining these conditions is a one-time investment for any real-life application. Beyond this, the system is highly efficient and economical compared with other systems generally used for providing security, and its running cost is much lower than that of other systems used for the same purpose.


Chapter 8

Future Scope

The proposed system is highly efficient and effective. Its accuracy can be improved remarkably if a setup providing stringent environmental conditions is available. In the proposed system, the accent is used as the distinguishing parameter for speech recognition, and the boundary points of the face are used as the distinguishing parameters for face recognition. Other speech and image parameters could be used to improve the efficiency of these algorithms, but this would definitely increase the system complexity and the processing time. The efficiency of the same algorithms can be improved significantly if the concept of neural networks is also incorporated in their implementation.


REFERENCES

[1] A. U. Batur, B. E. Flinchbaugh and M. H. Hayes, "A DSP based approach for the implementation of face recognition algorithms," Proc. of ICASSP, pp. 253-256, 2003.

[2] A. U. Batur and M. H. Hayes, "Linear subspace for illumination-robust face recognition," Proc. of IEEE Conf. Computer Vision and Pattern Recognition, pp. 296-301, 2001.

[3] M. A. Turk and A. P. Pentland, "Face recognition using eigenfaces," Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 586-591, 1991.

[4] B. Moghaddam and A. Pentland, "Probabilistic visual learning for object representation," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, pp. 696-710, July 1997.

[5] C. Kotropoulos and I. Pitas, "Rule-based face detection in frontal views," Proc. Int'l Conf. Acoustics, Speech and Signal Processing, vol. 4, pp. 2537-2540, 1997.

[6] K. K. Sung and T. Poggio, "Example-based learning for view-based human face detection," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 1, pp. 39-51, Jan. 1998.

[7] R. F. Estrada and E. A. Starr, "50 years of acoustic signal processing for detection: coping with the digital revolution," IEEE Annals of the History of Computing, vol. 27, no. 2, pp. 65-78, April-June 2005.

[8] R. C. Gonzalez and R. E. Woods, Digital Image Processing, Pearson Education (Singapore) Pte. Ltd., Delhi, India, pp. 519-560, 2004.

[9] A. Mishra and A. Jain, Programming in MATLAB 7.0.1, Agarwal Publication, Delhi, India.


Appendix A

Pin Diagrams of Different Electronic Components

PIN DIAGRAM OF ATMEGA32L
