SPEECH RECOGNITION BASED SWITCHING SYSTEM

UNIVERSITY OF NAIROBI

FACULTY OF ELECTRICAL AND INFORMATION ENGINEERING

PROJECT: SPEECH RECOGNITION BASED SWITCHING SYSTEM

PROJECT INDEX: PRJ085

NAME: MBARAK HAJI CHIMWANI

REG. NO: F17/39705/2011

SUPERVISOR: DR. G. KAMUCHA

EXAMINER: DR. W. MWEMA

Project report submitted to the Department of Electrical and Information Engineering in partial

fulfillment of the requirements of BSc. Electrical and Electronic Engineering of University of Nairobi

Date of submission:

17/05/2016

DECLARATION

FACULTY/SCHOOL/INSTITUTE: Engineering

DEPARTMENT: Electrical and Information Engineering

COURSE: Bachelor of Science in Electrical & Electronic Engineering

NAME OF STUDENT: MBARAK HAJI CHIMWANI

REGISTRATION NUMBER: F17/39705/2011

PROJECT TITLE: SPEECH RECOGNITION BASED SWITCHING SYSTEM

I hereby declare and confirm that:

1) I understand what plagiarism is and I am aware of the university policy in this regard.

2) The work embodied in this report I am submitting is entirely my own work, under the supervision of Dr G. Kamucha, and has not been submitted elsewhere for examination, award of a degree or publication. Where other people’s work or my own work has been used, this has properly been acknowledged and referenced in accordance with the University of Nairobi’s requirements.

3) I have not sought or used the services of any professional agencies to produce this work.

4) I have not allowed, and shall not allow anyone to copy my work with the intention of passing it off as his/her own work.

5) I understand that any false claim in respect of this work shall result in disciplinary action, in accordance with University anti-plagiarism policy.

AUTHOR: MBARAK HAJI CHIMWANI

SIGNATURE: ………………………………………..

DATE: …………………………………………………


DEDICATION

I dedicate this work to my parents, who have been a pillar to me throughout the duration of my studies. From the first day of school I found in them all the support I needed to get through Engineering School.

I also dedicate this work to my teachers. I am a product of all their grooming and preparation

instilled in me for the five years at University of Nairobi.

I finally dedicate this work to all my friends, family and loved ones. It is their unconditional

love and motivation that spurs me to attain greater heights in life every day.


ACKNOWLEDGEMENTS

During the development of this project, many people took their time to provide important input, help and a great deal of support. First of all, I would like to express my sincere thanks and gratitude to Dr. G. Kamucha (University of Nairobi) for his continuous guidance throughout the project. His supervision was critical to the development of the project. I would also like to thank family, relatives and friends, not forgetting all other members of the UoN fraternity, including lecturers, fellow students, lab technicians and non-teaching staff, who contributed in one way or another to the development of the project. On no occasion was I turned down when I sought assistance.

Finally, I would like to thank Mr. Nicholus Kimali (UoN Fablab), who helped me develop the skills to tackle the project. Without him it is likely I would not have been up to the task and would still be struggling to reach my objective.


ABSTRACT

As observed in the fast-changing world of technology, automation is becoming an important part of working systems that perform specific tasks to attain desired goals. Speech recognition cannot be overlooked when we speak of automation, since with further advancements in technology it will undoubtedly be an integral part of the creation of automated smart solutions to many technological problems of today’s world.

This project studies the design and implementation of a microcontroller based speech recognition system that allows specific persons to perform simple tasks such as turning lights on/off and opening/closing an electric door. The work was centered on Electronics and Microprocessors, specifically the microcontroller. The mode of investigation applied was hardware implementation involving physical fabrication of a working system.


TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
ACRONYMS
LIST OF FIGURES
LIST OF TABLES
Chapter 1
INTRODUCTION
1.1 Background
1.2 Main Objective
1.3 Justifications
1.4 Scope of the Project
1.5 Project Organization
Chapter 2
LITERATURE REVIEW
2.1 Speech Recognition Implementation Techniques
2.2 Selection of the Technique to Be Used
2.3 Coding Theory
2.3.1 Error Correction
2.4 Literature of Main Components Implemented
2.4.1 Microcontroller
2.4.2 Microphone
Chapter 3
DESIGN & IMPLEMENTATION
3.1 System Operation
3.1.1 Analysis
3.1.2 Feature Extraction
3.1.3 Modeling
3.1.4 Testing
3.2 Circuits & Mathematical Descriptions
3.3 Software Development
Chapter 4
ANALYSIS & RESULTS
4.1 Microphone and Amplifier Tests
4.2 Microcontroller Tests
4.3 Speech Recognition System Tests
Chapter 5
CONCLUSIONS & RECOMMENDATIONS
References


ACRONYMS

ASR    Automatic Speech Recognition
HMM    Hidden Markov Model
DSP    Digital Signal Processing
ADC    Analogue to Digital Converter
GND    Ground
DTW    Dynamic Time Warping
MCU    Microcontroller
CRC    Cyclic Redundancy Check


LIST OF FIGURES

Figure 1: Single-Bit Error
Figure 2: Burst Error
Figure 3: ATmega328P
Figure 4: Condenser Microphone
Figure 8: Amplifier First Stage
Figure 9: Amplifier Second Stage
Figure 10: Low Pass Filter
Figure 11: Circuit Diagram
Figure 12: Overall Program Flowchart
Figure 13: Speech Recognition Process Flowchart


LIST OF TABLES

Table 2: Results from Recorded Speech
Table 3: Results from Spoken Speech


Chapter 1

INTRODUCTION

1.1 Background

Speech recognition has been in development since the 1960s, with computer scientists researching ways and means to give computers the ability to record, interpret and understand human speech. This has not been an easy feat to achieve. Even the most rudimentary problem, such as digitizing (sampling) voice, was a huge challenge in the early years. It was not until the 1980s that the first systems arrived which could actually decipher speech. Since the field was still in its early stages, these systems were very limited in scope and power. Later on, more complex systems with greater scope, power and accuracy were developed, and this gave rise to the various techniques and methodologies now available for implementing automatic speech recognition systems.

1.2 Main Objective

The main objective of this project is to design an embedded system that can detect and recognize human voice commands, using only a microcontroller as the intelligence of the system, and use them to toggle the respective loads.

1.3 Justifications

Automatic speech recognition today finds widespread application in tasks that require a human-machine interface, such as automatic call processing [1], virtual reality, multimedia searches, auto-attendants, travel information and reservation, translators, natural language understanding and many more applications [2,3].

Systems on the market with the ability to recognize speech are, however, very expensive, and there is a need to produce a system that serves many of these applications at fairly low cost so as to challenge the existing market with lower prices.

1.4 Scope of the Project

The system is designed to remain in standby mode, waiting for an input from the user. Once input is detected, it is analyzed by the speech recognition module. If a known command is detected, the speech recognition system sends the respective digital representation to the microcontroller. The microcontroller interprets these data signals and compares them with a database, thus identifying the referred load and its desired state. According to the load state identified, control signals are sent to the respective relay circuits, thus actuating the appropriate loads. The processing results are then displayed on the LCD, which is used to display system states.

1.5 Project Organization

This project is organized into five chapters each specified as outlined below:

• Chapter 1: This is the introduction, which gives more information about the project, including the problem statement, main objectives and scope of the project.

• Chapter 2: This chapter covers the literature review and provides the theoretical background of the main units of the project.

• Chapter 3: Design and implementation. This chapter presents the system operation and the mathematical descriptions of how the various components used were arrived at.

• Chapter 4: This chapter presents the results obtained, together with their analysis.

• Chapter 5: This chapter outlines the conclusions and recommendations of the project. It also contains appendices and references of the documents used in the project.


Chapter 2

LITERATURE REVIEW

2.1 Speech Recognition Implementation Techniques

The following are the modeling techniques which can be implemented in speech recognition processes.

I. The Acoustic-phonetic Approach

The acoustic-phonetic approach is based on finding speech sounds and providing appropriate labels to these sounds. This method postulates that there exist finite, distinctive phonetic units (phonemes) in spoken language and that these units are broadly characterized by a set of acoustic properties that are manifested in the speech signal over time [13].

Even though the acoustic properties of phonetic units are highly variable, both with speakers and with neighboring sounds, it is assumed in the acoustic-phonetic approach that the rules governing the variability are straightforward and can be readily learned by a machine [13].

Weaknesses

a) Limited success has been obtained because of the lack of a good knowledge of

acoustic phonetics and other related areas [15].

b) Researchers are yet to uncover proper acoustic properties for features and

therefore they have not been able to extract this information for phonetic

recognition [15].

c) All aspects of feature assimilation are not understood [15].

II. Pattern Recognition Approach

This approach involves two essential steps, namely pattern training and pattern comparison. Its essential feature is that it uses a well formulated mathematical framework and establishes consistent speech pattern representations, for reliable pattern comparison, from a set of labeled training samples [13].

A speech pattern representation can be in the form of a speech template or a statistical model and can be applied to a sound (smaller than a word), a word, or a phrase.

In the pattern-comparison stage of the approach, a direct comparison is made between the unknown speech (the speech to be recognized) and each possible pattern learned in the training stage in order to determine the identity of the unknown according to the goodness of match of the patterns.

Strengths

a) This system is usually quite fast, since once pattern comparison is effected the system output is almost instantaneous.

b) Cost is much lower. The components required to implement this sort of system are readily available and comparatively cheap.

c) Uses a well formulated mathematical framework and establishes consistent

speech pattern representations, for reliable pattern comparison.

Weaknesses

a) Different accents produce different patterns and thus the accuracy of the

system is minimized when the accent of the prerecorded pattern is different

from the sample pattern.

III. Template Based Approaches

Unknown speech is compared against a set of pre-recorded words (templates) in order to find the best match. A collection of prototypical speech patterns is stored as reference patterns representing the dictionary of candidate words.

Recognition is then carried out by matching an unknown spoken utterance with each

of these reference templates and selecting the category of the best matching pattern.

Usually templates for entire words are constructed.
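The matching step described above can be sketched as a nearest-template search. This is only an illustration under stated assumptions: each word is assumed to be reduced to a fixed-length feature vector, Euclidean distance is assumed as the similarity measure, and the template values, the `recognize` function and the optional `threshold` parameter are hypothetical and not taken from the project.

```python
import math

# Hypothetical reference templates: one fixed-length feature vector per word.
TEMPLATES = {
    "on": [0.9, 0.1, 0.3],
    "off": [0.2, 0.8, 0.5],
}


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(features, templates, threshold=None):
    """Return (best_word, distance); best_word is None if no template is close enough."""
    word, dist = min(
        ((w, euclidean(features, t)) for w, t in templates.items()),
        key=lambda pair: pair[1],
    )
    if threshold is not None and dist > threshold:
        return None, dist  # reject: utterance does not match any stored word
    return word, dist


word, dist = recognize([0.85, 0.15, 0.35], TEMPLATES)
print(word)  # on
```

A rejection threshold of this kind is one simple way to avoid switching a load when the utterance matches none of the stored words.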

Strengths

a) Perfectly accurate word models are used.

b) Errors due to segmentation or classification of smaller acoustically more

variable units such as phonemes can be avoided.

c) Very efficient and cost effective for small systems with few words.


Weaknesses

a) Each word must have its own full reference template; template preparation and

matching become prohibitively expensive or impractical as vocabulary size

increases.

b) Pre-recorded templates are fixed, so variations in speech can only be modeled by using many templates per word, which eventually becomes impractical.

c) While template based approaches have been very effective in the design of a variety of speech recognition systems, they provide little insight about human speech processing, thereby making error analysis and knowledge-based system enhancement difficult [13].

IV. Dynamic Time Warping

Dynamic time warping is an algorithm for measuring similarity between two

sequences which may vary in time or speed. For instance, similarities in walking

patterns would be detected, even if in one video, the person was walking slowly and if

in another, he or she were walking more quickly, or even if there were accelerations

and decelerations during the course of one observation. A well known application has

been automatic speech recognition, to cope with different speaking speeds [16].

DTW is a method that allows a computer to find an optimal match between two given sequences. The sequences are "warped" non-linearly in the time dimension to determine a measure of their similarity independent of certain non-linear variations in the time dimension.

The optimization process is performed using dynamic programming, hence the name.
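The dynamic programming recurrence behind DTW can be sketched as follows. This is a minimal textbook form, assuming one-dimensional numeric sequences and absolute difference as the local cost; the function name `dtw_distance` and the sample sequences are illustrative only.

```python
def dtw_distance(a, b):
    """Accumulated DTW cost between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


slow = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]  # same shape, spoken at half speed
fast = [0, 1, 2, 1, 0]
flat = [2, 2, 2, 2, 2]

print(dtw_distance(slow, fast))  # 0.0
print(dtw_distance(flat, fast))  # 6.0
```

The slow and fast sequences align at zero cost even though their lengths differ, which is the property that makes DTW useful for words spoken at different speeds.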

Strengths

a) Continuity is less important in DTW than in other pattern matching

algorithms.

b) DTW is an algorithm particularly suited to matching sequences with missing

information provided there are long enough segments for matching to occur.

c) DTW is quite efficient for isolated word recognition and can be adapted to

connected word recognition.


Weaknesses

a) Restrictions are imposed on the matching of the sequences.

b) The mapping is constrained to be monotonic in the time dimension.

c) Very complex mathematical and software development background is

required.

V. Knowledge Based Approaches

Expert knowledge about variations in speech is hand coded into a system. This has the advantage of explicitly modeling variations in speech; unfortunately, such expert knowledge is difficult to obtain and use successfully. Thus this approach was judged to be impractical and automatic learning procedures were sought instead [13].

Since this method was judged impractical, its strengths and weaknesses are not discussed further.

VI. Statistical Based Approaches

Variations in speech are modeled statistically, using automatic statistical learning procedures, typically Hidden Markov Models (HMMs). This approach represents the current state of the art.

Modern general-purpose speech recognition systems are based on Hidden Markov

Models. These are statistical models that output a sequence of symbols or quantities.

HMMs are used in speech recognition because a speech signal can be viewed as a

piecewise stationary signal or a short-time (10 ms) stationary signal. In this model,

each phoneme is like a link in a chain, and the completed chain is a word. However,

the chain branches off in different directions as the program attempts to match the

digital sound with the phoneme that's most likely to come next.

During this process, a program assigns a probability score to each phoneme, based on

its built-in dictionary and user training [17].

HMMs output a sequence of n-dimensional real-valued vectors (with n being a small integer, such as 10), outputting one of these every 10 milliseconds. This process is even more complicated for phrases and sentences, since the system has to figure out where each word starts and stops [17].


These statistical systems need large amounts of exemplary training data to reach their optimal performance. These training data are used to create acoustic models of words, word lists, and multi-word probability networks. While the software developers who set up the system's initial vocabulary perform much of this training, the end user must also spend some time training it on their particular speech patterns.

Users must also train the system to recognize terms and acronyms particular to their setting. There are special editions of speech recognition programs for certain settings, such as medical or legal offices, which have terms commonly used in those fields already trained into them [17].

Decoding of the speech (the term for what happens when the system is presented with a new utterance and must compute the most likely source sentence) typically uses the Viterbi algorithm to find the best path. Here there is a choice between dynamically creating a combination hidden Markov model, which includes both the acoustic and language model information, and combining it statically beforehand (the finite state transducer, or FST, approach) [16].
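As an illustration of how the Viterbi algorithm finds the best path, the sketch below decodes a toy two-state HMM. All state names, observation symbols and probabilities are made-up values chosen for the example; a real recognizer would use phoneme-level states and probabilities learned from training data.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state sequence, its probability) for the observations."""
    # best[t][s] = (probability of the best path ending in state s at time t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Toy model (hypothetical values): frame energy observed as low/mid/high.
states = ["silence", "speech"]
start_p = {"silence": 0.6, "speech": 0.4}
trans_p = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.4, "speech": 0.6},
}
emit_p = {
    "silence": {"low": 0.5, "mid": 0.4, "high": 0.1},
    "speech": {"low": 0.1, "mid": 0.3, "high": 0.6},
}

path, prob = viterbi(["low", "mid", "high"], states, start_p, trans_p, emit_p)
print(path)  # ['silence', 'silence', 'speech']
```

Only the best predecessor of each state is kept at every time step, so the work grows linearly with the length of the utterance rather than with the number of possible paths.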

A possible improvement to decoding is to keep a set of good candidates instead of just keeping the best candidate, and to use a better scoring function (rescoring) to rate these good candidates so that we may pick the best one according to this refined score. The set of candidates can be kept either as a list (the N-best list approach) or as a subset of the models referred to as a lattice. Efficient algorithms have been devised to rescore lattices represented as weighted finite state transducers [16].

Strengths

a) In the speech recognition area, HMMs have been applied with great success to problems such as part-of-speech classification [13].

b) A well-tuned HMM generally provides better compression than other models,

allowing more sequences to be significantly found.

c) Quite efficient for random inputs.

d) They can be trained automatically and are computationally feasible to use.


Weaknesses

a) Statistical methods must make a priori modeling assumptions which are liable to be inaccurate, handicapping the system performance.

b) HMM algorithms are expensive, both in terms of memory and computing

time.

c) Very complex statistical knowledge is required that takes a lot of time and

dedication to master.

d) Coming up with effective working systems would most likely require steep

time investments.

VII. Learning Based Approaches

These approaches were introduced to overcome the disadvantages of HMMs. They are machine learning methods such as neural networks and genetic algorithm programming. Here explicit rules or other domain expert knowledge do not need to be given; they can be learned automatically through emulation or an evolutionary process.

VIII. The Artificial Intelligence Approach

The artificial intelligence approach attempts to mechanize the recognition procedure according to the way a person applies intelligence in visualizing, analyzing, and finally making a decision on the measured acoustic features. Expert systems are widely used in this approach.

The Artificial Intelligence approach is a hybrid of the acoustic phonetic approach and

pattern recognition approach. In this, it exploits the ideas and concepts of Acoustic

phonetic and pattern recognition methods. In its pure form, knowledge engineering

design involves the direct and explicit incorporation of expert speech knowledge into

a recognition system. This knowledge is usually derived from careful study of

spectrograms and is incorporated using rules or procedures.

In more indirect forms, knowledge has also been used to guide the design of models

and algorithms of other techniques such as template matching and stochastic

modeling. This form of knowledge application makes an important distinction

between knowledge and algorithms. Algorithms enable us to solve problems.

Knowledge enables the algorithms to work better.


Strengths

a) This form of knowledge based system enhancement has contributed

considerably to the design of all successful strategies reported.

b) It plays an important role in the selection of a suitable input representation, the

definition of units of speech, or the design of the recognition algorithm itself.

Weaknesses

a) This approach had only limited success, largely due to the difficulty in

quantifying expert knowledge.

b) The integration of many levels of human knowledge (phonetics, phonotactics, lexical access, syntax, semantics and pragmatics) is very difficult.

c) Combining independent and asynchronous knowledge sources optimally

remains an unsolved problem.

2.2 Selection of the Technique to Be Used

The goal is a low cost, fully functioning microcontroller based speech recognition system that allows specific persons to perform simple tasks such as turning lights on/off and opening/closing an electric door, and that is simple and achievable with the available resources. In light of the above discussion, the Template Based Approach was chosen for this project, bearing in mind that the processing power needed by the chip driving a speech recognition system increases with the complexity of the chosen approach. The processing power of locally available microcontrollers largely restricts the choice to this approach.

This approach also has all required inputs readily available and affordable, as are the materials, software development tools and online debugging support essential for the project.


2.3 Coding Theory

Coding theory is the study of the properties of codes and their fitness for a specific

application. Codes are used for data compression, cryptography, error-correction and

more recently also for network coding. Codes are studied by various scientific

disciplines such as information theory, electrical engineering,

mathematics, linguistics, and computer science for the purpose of designing efficient

and reliable data transmission methods. This typically involves the removal of

redundancy and the correction (or detection) of errors in the transmitted data. [22]

There are four types of coding, namely:

1. Data compression/Source coding

2. Error correction/ Channel coding

3. Cryptographic coding

4. Line coding

Source encoding attempts to compress the data from a source in order to transmit it more efficiently, e.g. zip data compression for transmission of smaller files over a network. Channel encoding adds extra data bits to make the transmission of data more robust to disturbances present on the transmission channel. For example, a typical music CD uses this to correct for scratches and dust; in this application the transmission channel is the CD itself. Cell phones also use coding techniques to correct for the fading and noise of high frequency radio transmission.

The main area of interest in coding theory for this project is error correction. The program employed for speech recognition would have to employ some error correction techniques for better accuracy in execution.

2.3.1 Error Correction

Error correction is a technique that enables reliable delivery of digital data over

unreliable communication channels. Many communication channels are subject

to channel noise, and thus errors may be introduced during transmission.


Error detection techniques allow detecting such errors, while error correction enables reconstruction of the original data in many cases [23].

An error is a deviation from a correct value, or an unauthorized change in the content being transmitted, caused by a malfunction in a system or a functional unit (for example, the signal is attenuated or overwhelmed by noise). An example would be the occurrence of a wrong bit caused by an equipment malfunction. It is important to note that an error can completely change the meaning of the transmitted data [24].

The type of error may be either a single-bit error or a burst error. A single-bit error means that only one bit of a given data unit is changed from 1 to 0 or from 0 to 1. This is the least likely type of error in serial data transmission; it is more probable in parallel data transmission. A burst error, on the other hand, means that two or more bits of the data unit change from 1 to 0 or from 0 to 1; the corrupted bits do not necessarily have to be consecutive. The length of the burst is measured from the first corrupted bit to the last corrupted bit. This is the most likely type of error in serial data transmission.

Figure 1: Single-Bit Error

Figure 2: Burst Error
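The distinction between the two error types, and the way burst length is measured, can be illustrated by comparing a sent and a received bit string. The function names and the example bit strings below are illustrative only.

```python
def error_positions(sent, received):
    """Indices at which the received bits differ from the sent bits."""
    return [i for i, (s, r) in enumerate(zip(sent, received)) if s != r]


def classify_error(sent, received):
    """Return (error type, length in bits) for two equal-length bit strings."""
    positions = error_positions(sent, received)
    if not positions:
        return "none", 0
    if len(positions) == 1:
        return "single-bit", 1
    # Burst length runs from the first corrupted bit to the last one,
    # including any uncorrupted bits in between.
    return "burst", positions[-1] - positions[0] + 1


sent = "0100010001000011"
received = "0101110101100011"  # bits 3, 4, 7 and 10 are corrupted

print(classify_error(sent, received))            # ('burst', 8)
print(classify_error("00000010", "00001010"))    # ('single-bit', 1)
```

In the first example only four bits are wrong, yet the burst length is 8 because it spans from the first corrupted bit to the last.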


2.3.1.1 Error Detection Schemes

Error detection codes, including all error-detection-and-correction codes, transmit more bits than were in the original data. The transmitter sends a fixed number of original data bits, followed by a fixed number of check bits, also referred to as redundancies, which are derived from the data bits by some deterministic algorithm. The receiver applies the same algorithm to the received data bits and compares its output to the received check bits; if the values do not match, an error has occurred at some point during the transmission.

2.3.1.1.1 Repetition schemes

Given a stream of data that is to be sent, the data is broken up into blocks of bits, and each block is sent some predetermined number of times. For example, to send "1011", we may repeat the block three times. Suppose we send "1011 1011 1011" and this is received as "1010 1011 1011". As one group is not the same as the other two, we can determine that an error has occurred. This scheme is not very efficient, and it fails if the error occurs in exactly the same place in each group: in the example above, "1010 1010 1010" would be accepted as correct [24].
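The detection rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the project's code: the function names `split_blocks` and `repetition_error_detected`, and the choice to pass the received bits as a string, are assumptions made for the example.

```python
def split_blocks(received, block_len):
    """Split a received bit string into consecutive blocks of block_len bits."""
    return [received[i:i + block_len] for i in range(0, len(received), block_len)]


def repetition_error_detected(received, block_len):
    """Return True when the repeated copies of the block are not all identical."""
    return len(set(split_blocks(received, block_len))) > 1


if __name__ == "__main__":
    # "1011" sent three times, with the first copy corrupted to "1010".
    print(repetition_error_detected("101010111011", 4))
    # The same bit corrupted in every copy: the scheme cannot see this error.
    print(repetition_error_detected("101010101010", 4))
```

The second call shows the weakness described in the text: identical corruption of every copy leaves the copies in agreement, so no error is reported.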

2.3.1.2 Parity schemes

Also called even parity or odd parity, this is an error detection mechanism whereby a parity bit is added to a group of source bits to ensure that the number of set bits (i.e., bits with value 1) in the outcome is even or odd [24]. The stream of data is broken up into blocks of bits, and the number of 1 bits in each block is counted. For even parity, the parity bit is set if the number of 1 bits is odd and cleared if it is even; odd parity does the opposite. If the tested blocks overlap, then the parity bits can be used to isolate the error, and even correct it if the error affects a single bit. The limitation of parity schemes is that a parity bit is only guaranteed to detect an odd number of bit errors. If an even number of bits is flipped, the parity bit appears to be correct, even though the data is corrupt.
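A minimal sketch of even parity, assuming the parity bit is appended to the end of the block (the position of the parity bit and the function names are assumptions for illustration):

```python
def even_parity_bit(bits):
    """Return the bit that makes the total number of 1s even."""
    return "1" if bits.count("1") % 2 else "0"


def passes_even_parity(codeword):
    """A codeword passes the check when it contains an even number of 1s."""
    return codeword.count("1") % 2 == 0


if __name__ == "__main__":
    data = "1011"
    codeword = data + even_parity_bit(data)
    print(codeword, passes_even_parity(codeword))
    print(passes_even_parity("10011"))  # one bit flipped
    print(passes_even_parity("10001"))  # two bits flipped
```

The last call illustrates the limitation noted above: a codeword with two flipped bits still contains an even number of 1s and so passes the check.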

2.3.1.3 Checksum

A checksum of a message is an arithmetic sum of message code words of a certain word length, for example byte values, together with their carry value. The sum is negated by taking its one's complement and is stored or transferred as an extra code word, extending the message. On the receiver side, a new checksum is calculated from the extended message. If the new checksum is not 0, an error is detected.
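The procedure can be sketched as follows, assuming 8-bit code words and an end-around carry (each carry out of the top bit is added back into the sum). The function names and the sample bytes are hypothetical.

```python
def ones_complement_sum(words, bits=8):
    """Add the words, folding any carry out of the top bit back into the sum."""
    mask = (1 << bits) - 1
    total = 0
    for word in words:
        total += word
        total = (total & mask) + (total >> bits)  # end-around carry
    return total


def checksum(words, bits=8):
    """One's complement of the one's-complement sum of the words."""
    mask = (1 << bits) - 1
    return ~ones_complement_sum(words, bits) & mask


if __name__ == "__main__":
    message = [0x12, 0x34, 0xF0]           # hypothetical payload
    extended = message + [checksum(message)]
    print(hex(extended[-1]))
    print(checksum(extended))               # receiver side, intact message
    print(checksum([0x12, 0x35, 0xF0, extended[-1]]))  # one byte corrupted
```

The receiver applies the same `checksum` function to the extended message; a result of 0 means no error was detected.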

2.3.1.4 Cyclic redundancy checks

The cyclic redundancy check treats a block of data as the coefficients of a polynomial and divides it by a fixed, predetermined polynomial. The coefficients of the remainder of the division are taken as the redundant data bits, the CRC. On reception, one can recompute the CRC from the payload bits and compare it with the CRC that was received. A mismatch indicates that an error occurred.
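As a sketch of the division step, the following Python performs modulo-2 long division on bit strings. The generator "1011" (x^3 + x + 1) and the sample message are arbitrary illustrative values, not taken from the project, and the function names are assumptions.

```python
def crc_remainder(message, generator):
    """Return the CRC of message: the remainder of modulo-2 division by generator."""
    n = len(generator) - 1                  # number of CRC bits
    bits = list(message + "0" * n)          # append n zero bits before dividing
    for i in range(len(message)):
        if bits[i] == "1":
            # Subtract (XOR) the generator aligned at position i.
            for j, g in enumerate(generator):
                bits[i + j] = "0" if bits[i + j] == g else "1"
    return "".join(bits[-n:])


def crc_matches(message, received_crc, generator):
    """Recompute the CRC from the payload and compare it with the received one."""
    return crc_remainder(message, generator) == received_crc


if __name__ == "__main__":
    crc = crc_remainder("11010011101100", "1011")
    print(crc)
    print(crc_matches("11010011101100", crc, "1011"))
    print(crc_matches("01010011101100", crc, "1011"))  # first bit corrupted
```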

2.3.1.5 Hamming distance based checks

This is the error correction scheme applied in this project, mainly because it is simple to implement and effective in giving the system the intelligence to distinguish the commands issued to it. It works as follows. If the goal is to detect d bit errors in an n-bit word, every n-bit word can be mapped into a bigger (n+d+1)-bit word such that the minimum Hamming distance between any two valid mappings is d+1. This way, a received (n+d+1)-bit word that has been corrupted in up to d bit positions cannot match any word in the mapping, and it is successfully detected as an error.

2.3.1.5.1 Hamming Distance

The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. Put another way, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other [25].

As an example the Hamming distance between:

• "karolin" and "kathrin" is 3[25].

• "karolin" and "kerstin" is 3[25].

• 1011101 and 1001001 is 2[25].

• 2173896 and 2233796 is 3[25].
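The definition translates directly into a short function. The sketch below is illustrative (the name `hamming_distance` is an assumption, not the routine used in the project code) and reproduces the four examples above.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for equal-length strings")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    pairs = [("karolin", "kathrin"), ("karolin", "kerstin"),
             ("1011101", "1001001"), ("2173896", "2233796")]
    for first, second in pairs:
        print(first, second, hamming_distance(first, second))
```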

This sort of analysis on data will be an integral part of the software developed for the system, and it will be the basis on which the running program distinguishes between commands.

2.4 Literature of Main Components Implemented

2.4.1 Microcontroller

Figure 3: ATmega328P

Microcontrollers are basically small computers for embedded applications with

programmable capabilities and features such as input and output pins, serial

communication interfaces, memory for data storage (RAM), memory for program

storage and analog-to-digital converters. The number and availability of these and

other features vary from model to model, as does the programming language and

interface.

In selecting a microcontroller for the project, there are not only a number of technical features to consider but also economic issues, such as cost and lead times, that can cripple a project. In making my selection, I started with the justifications that would rule out a part family.

It is important to note that most manufacturers' products offer the same peripherals, which makes the choice of a particular microcontroller an even bigger dilemma. Deadlines can also influence MCU choice. Very short timelines make allocating time to learning a new architecture difficult, or even impossible, to justify. In that case, I would keep to a family I have used before, or something similar.

At the start of a project it is never good practice to jump in and begin selecting a microcontroller before the details of the system have been worked out. Before any thought is given to the microcontroller, the high-level system block diagram and flowchart must be determined; only then is there enough information to make a rational decision on microcontroller selection. Once this point was reached, the following were the justifications for selecting the ATmega328P as the MCU for the project.

Required hardware interfaces: All the external interfaces such as communication

interfaces and digital inputs and outputs as well as analog to digital inputs that the

microcontroller will need to support were available for this family. These interfaces

dictate the number of pins that will be required by the microcontroller.

Software architecture: An estimate of how long and how often each task needs to run, together with an order-of-magnitude feel for how much processing power is needed, was considered. The computing power and clock frequency required appeared to be well within the response times expected of a microcontroller driving such a project. An 8-bit MCU such as the ATmega328P could comfortably handle this task.

Memory Needs: Flash and RAM are two very critical components of any microcontroller. Making sure that program space and variable space are sufficient was a high priority. The ATmega328P provides 32 Kilobytes of Flash program memory, together with 2 Kilobytes of SRAM and 1 Kilobyte of EEPROM. However, an external SD card could still be used to supplement any memory requirements that might crop up. So the memory needs were well catered for.

Costs and Power Constraints: The ATmega328P is powered by a 5V Vcc which

can easily be provided by batteries which are available at fairly affordable prices. The

chip’s overall cost is also affordable.

Part availability: This chip is readily available in the country and the institution

could easily provide it. So this was also a key point in settling for ATmega328P.

Compilers and tools: The last consideration was to examine the compiler and tools

that are available. Most microcontrollers have a number of choices for compilers,

example code and debugging tools. All the necessary tools were available for the

part. Without the right tools the development process could have become tedious and

expensive.

Principle of Operation

The ATmega328P features a 10-bit successive approximation ADC. The ADC is connected to an 8-channel Analog Multiplexer which allows up to 8 single-ended voltage inputs; on the 28-pin package, six of these (ADC0 to ADC5) are available on the pins of Port C. The single-ended voltage inputs refer to 0V (GND).

The ADC converts an analog input voltage to a 10-bit digital value through successive approximation. The minimum value represents GND and the maximum value represents the voltage on the AREF pin minus 1 LSB. Optionally, AVCC or an internal 1.1V reference voltage may be connected to the AREF pin by writing to the REFSn bits in the ADMUX Register. The internal voltage reference may thus be decoupled by an external capacitor at the AREF pin to improve noise immunity.
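The relationship between input voltage and the 10-bit result can be illustrated with the ideal transfer function, code = Vin × 1024 / Vref, truncated and limited to the range 0 to 1023. The sketch below models an ideal converter only (no offset, gain or quantisation-noise errors); the function names and the 5 V reference in the usage lines are assumptions for illustration.

```python
def adc_code(v_in, v_ref, bits=10):
    """Ideal ADC result for a single-ended input, clamped to the valid range."""
    levels = 1 << bits                      # 1024 steps for a 10-bit ADC
    code = int(v_in * levels / v_ref)
    return max(0, min(levels - 1, code))


def code_to_voltage(code, v_ref, bits=10):
    """Lower edge of the voltage step represented by an ADC code."""
    return code * v_ref / (1 << bits)


if __name__ == "__main__":
    print(adc_code(2.5, 5.0))               # mid-scale input
    print(adc_code(5.0, 5.0))               # full-scale input saturates at the top code
    print(code_to_voltage(1023, 5.0))       # the reference minus 1 LSB
```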

The analog input channel is selected by writing to the MUX bits in ADMUX. Any of the ADC input pins, as well as GND and a fixed band gap voltage reference, can be selected as single-ended inputs to the ADC.

The ADC generates a 10-bit result which is presented in the ADC Data Registers,

ADCH and ADCL. By default, the result is presented right adjusted, but can

optionally be presented left adjusted by setting the ADLAR bit in ADMUX. If the

result is left adjusted and no more than 8-bit precision is required, it is sufficient to

read ADCH. Otherwise, ADCL must be read first, then ADCH, to ensure that the

content of the Data Registers belongs to the same conversion. Once ADCL is read,

ADC access to data registers is blocked. This means that if ADCL has been read, and

a conversion completes before ADCH is read, neither register is updated and the

result from the conversion is lost. When ADCH is read, ADC access to the ADCH

and ADCL Registers is re-enabled.
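The two result layouts can be modelled in software as below. This is a simulation for illustration, not AVR register access code: `to_registers` and `from_registers` are hypothetical helper names, and the registers are represented as plain integers.

```python
def to_registers(result, left_adjust=False):
    """Split a 10-bit result into (ADCH, ADCL) as the ADC would present it."""
    word = (result << 6) if left_adjust else result
    return (word >> 8) & 0xFF, word & 0xFF


def from_registers(adch, adcl, left_adjust=False):
    """Rebuild the 10-bit result from the two 8-bit data registers."""
    word = (adch << 8) | adcl
    return (word >> 6) if left_adjust else word


if __name__ == "__main__":
    sample = 0x2A7                           # arbitrary 10-bit value
    print([hex(r) for r in to_registers(sample)])
    adch, adcl = to_registers(sample, left_adjust=True)
    print(hex(adch), hex(adcl))
    # With left adjustment, ADCH alone holds the top 8 bits of the result.
    print(adch == sample >> 2)
```

Reading only ADCH in the left-adjusted layout discards the two least significant bits, which is why it suffices when no more than 8-bit precision is required.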

Noise Cancellation Techniques

Digital circuitry inside and outside the device generates EMI which might affect the

accuracy of analog measurements. If conversion accuracy is critical, the noise level

can be reduced by applying the following techniques:

1. Keep analog signal paths as short as possible. Make sure analog tracks run

over the analog ground plane, and keep them well away from high-speed

switching digital tracks

2. The AVCC pin on the device should be connected to the digital Vcc supply

voltage via an LC network as shown in Figure 106 of its data sheet.

3. Use the ADC noise canceller function to reduce induced noise from the CPU.

4. If any ADC port pins are used as digital outputs, it is essential that these do

not switch while a conversion is in progress.

2.4.2 Microphone

Figure 4: Condenser Microphone

An electret microphone is a type of electrostatic capacitor-based microphone, which

eliminates the need for a polarizing power supply by using a permanently charged

material [18].

An electret is a stable dielectric material with a permanently embedded static electric

charge (which, due to the high resistance and chemical stability of the material, will

not decay for hundreds of years). The name comes from electrostatic and magnet;

drawing analogy to the formation of a magnet by alignment of magnetic domains in a

piece of iron [18]. Electrets are commonly made by first melting a

suitable dielectric material such as a plastic or wax that contains polar molecules, and

then allowing it to re-solidify in a powerful electrostatic field. The polar molecules of

the dielectric align themselves to the direction of the electrostatic field, producing a

permanent electrostatic "bias" [18].

Electret Microphones come in three forms:

Foil-type or diaphragm-type

A film of electret material is used as the diaphragm itself. This is the most common

type, but also the lowest quality, since the electret material does not make a

particularly good diaphragm [18].

Back electret

An electret film is applied to the back plate of the microphone capsule and the

diaphragm is made of an uncharged material, which may be mechanically more

suitable for the transducer design being realized [18].

Front electret

In this newer type, the back plate is eliminated from the design, and the capacitor is

formed by the diaphragm and the inside surface of the capsule. The electret film is

adhered to the inside front cover and the metalized diaphragm is connected to the

input of the FET. It is equivalent to the back electret in that any conductive film may

be used for the diaphragm [18].

Microphone sensitivity is typically measured with a 1 kHz sine wave at a 94 dB sound

pressure level (SPL), or 1 Pascal (Pa) pressure. The magnitude of the analog or digital

output signal from the microphone with that input stimulus is a measure of its

sensitivity [19].

Sensitivity is typically specified in logarithmic units of dBV (decibels with respect to 1 V), and it tells how many volts the output signal will be for a given SPL. For an analog microphone, sensitivity in linear units of mV/Pa can be expressed logarithmically in decibels:

$$\text{Sensitivity}_{dBV} = 20 \log_{10}\left(\frac{\text{Sensitivity}_{mV/Pa}}{\text{Output}_{AREF}}\right)$$

Where:

$\text{Output}_{AREF}$ is the 1000 mV/Pa (1 V/Pa) reference output ratio.
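As a worked example of the conversion between linear and logarithmic sensitivity, assuming the usual relation of 20 times the base-10 logarithm of the ratio to the 1000 mV/Pa reference, the sketch below converts in both directions. The 10 mV/Pa figure is a hypothetical microphone, not the part used in this project.

```python
import math

REFERENCE_MV_PER_PA = 1000.0                 # 1 V/Pa reference output ratio


def sensitivity_dbv(mv_per_pa):
    """Convert a linear sensitivity in mV/Pa to dBV."""
    return 20.0 * math.log10(mv_per_pa / REFERENCE_MV_PER_PA)


def sensitivity_mv_per_pa(dbv):
    """Convert a sensitivity in dBV back to mV/Pa."""
    return REFERENCE_MV_PER_PA * 10.0 ** (dbv / 20.0)


if __name__ == "__main__":
    print(sensitivity_dbv(10.0))             # hypothetical 10 mV/Pa microphone
    print(sensitivity_mv_per_pa(-40.0))
```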

A high sensitivity microphone isn’t always better than a low sensitivity microphone.

Sensitivity tells something about the characteristics of the microphone but not

necessarily about its quality [19]. A balance between the microphone’s noise level,

clipping point, distortion, and sensitivity determines whether a microphone is a good

fit for a particular application.

A microphone with high sensitivity may need less preamp gain before the analog-to-

digital conversion, but it may have less headroom before clipping than a microphone

with lower sensitivity [19].

In near-field applications, such as cell phones, where the microphone is close to the

sound source, a microphone with higher sensitivity is more likely to reach the

maximum acoustic input, clip, and cause distortion.

On the other hand, a higher sensitivity may be desirable in far-field applications, such

as conference phones and security cameras, where the sound is attenuated as the

distance from the source to the microphone increases [19].

Chapter 3

DESIGN & IMPLEMENTATION

3.1 System Operation

The speech recognition system may be viewed as working in four stages:

1. Analysis

2. Feature extraction

3. Modeling

4. Testing

3.1.1 Analysis

Speech data contains speaker specific information due to vocal tract, excitation source

and behavioral features of the speaker which makes the data unique in each instance.

This information embedded in the signal is analyzed for speech recognition.

3.1.2 Feature Extraction

In order to find some statistically relevant information from incoming data, it is

important to have mechanisms for reducing the information of each segment in the

audio signal into a relatively small number of parameters, or features. These features

should describe each segment in such a characteristic way that other similar segments

can be grouped together by comparing their features.

The information in speech signal is represented by short term amplitude spectrum of

the speech wave form. This allows us to extract features based on the short term

amplitude spectrum from speech (phonemes). Digital signal processing (DSP)

techniques are the core of this speech recognition system. DSP methods are used in

speech analysis, synthesis, coding and recognition.

This is all executed by pre-amplification of the input signal, followed by analogue-to-digital conversion by the microcontroller (MCU) and a software implementation that achieves recognition by carrying out microcontroller PORT logic and employing Hamming distance to increase accuracy.

Theoretically, it should be possible to recognize speech directly from the digitized

waveform. However, because of the large variability of the speech signal, it is better

to perform some feature extraction that would reduce that variability.

3.1.3 Modeling

The speech recognition modeling techniques are divided into two classifications:

1. Speaker Identification: Automatically identifies who is speaking on the basis of individual information contained in the speech signal. The main aim of speaker identification is to compare speech from an unknown speaker with a database of known speakers. The system can recognize the speaker once it has been trained with a number of speakers.

2. Speaker Recognition: This aspect of the field can be looked at from the following four angles: speaker independent, speaker dependent, text dependent and text independent, whereby:

• Speaker independent: The computer should ignore the speaker-specific characteristics of the speech signal and extract the intended message.

• Speaker dependent: The machine should extract speaker characteristics from the acoustic signal.

• Text dependent: The speaker says key words or sentences having the same text for both training and recognition trials.

• Text independent: The system does not rely on a specific text being spoken.

This system is a Speaker Dependent Speech Recognition System.

3.1.4 Testing

The testing phase and matching go hand in hand. In this phase we test whether a processed output is in line with the required output. For this system, a conclusion of SUCCESS is reached when the system output (load) selected is the one intended by the user.

System Block Diagram

[Block diagram: Microphone voice input from user → Input signal amplification → Speech analysis → Control signal to controller → Controller identifies the load → Load (BLUE LED) and Load (GREEN LED)]

3.2 Circuits & Mathematical Descriptions

The amplification employed is OP AMP based. The main purpose of this circuit is pre-amplification, i.e. to make weak signals strong enough for further processing.

The microcontroller employed for the circuit is the ATmega328P of the AVR series by ATMEL. This microcontroller is the intelligence of the whole system and its operation controls the activities of the system. The system uses the built-in analogue-to-digital conversion capabilities of this chip to convert an analogue speech signal to an equivalent digital capture, which is held in one of its registers as the sample to be processed and stored in EEPROM as a template for comparison with the input signals that arrive as a user operates the system.

Principles Employed For Circuit Design.

Bearing in mind that any external DC voltage alters the biasing conditions and affects performance, capacitor C1 was placed as the microphone input coupling capacitor to block any DC component present. Cout prevents any DC voltage from entering the succeeding stage, hence preventing clamping of the output signal by the DC level present at the OP AMP output. R3 and R4 provide the necessary voltage to drive the OP AMP, whereby we require that VCC is dropped before the input. R1 is simply the microphone load resistor.

Figure 5: Amplifier First Stage

The output of an electret microphone, as specified in the manufacturer's data sheet, usually falls in the range of millivolts. This means that a gain of about 1000 is required. Amplifying the signal with this gain in a single stage may be disastrous for the system, since the accompanying noise would also be amplified by the same gain. To curb this, a two-stage amplifier is employed: the first stage has a gain of 10 and is followed by a low pass filter, which feeds a second stage of amplification with a gain of 100, whose output is finally fed to the microcontroller ADC input.
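The gain budget of the cascade can be checked numerically: the gains of cascaded stages multiply, and the corresponding decibel figures add. The sketch below is illustrative only; it treats the low pass filter between the stages as having unity gain in the pass band, which is an assumption, and the 1 mV input is a hypothetical value.

```python
import math


def cascade_gain(stage_gains):
    """Overall voltage gain of amplifier stages connected in series."""
    total = 1.0
    for gain in stage_gains:
        total *= gain
    return total


def gain_db(voltage_gain):
    """Express a voltage gain in decibels."""
    return 20.0 * math.log10(voltage_gain)


if __name__ == "__main__":
    stages = [10, 100]                        # first and second stage gains
    total = cascade_gain(stages)
    print(total, gain_db(total))
    print(0.001 * total)                      # hypothetical 1 mV input, output in volts
```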

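For the first stage (gain of 10), writing the two gain-setting resistances as $R_f$ and $R_{in}$:

Choosing $R_f = 100\,\text{k}\Omega$

$$10 = \frac{100 \times 10^{3}}{R_{in}}$$

$$R_{in} = \frac{100 \times 10^{3}}{10} = 1.0 \times 10^{4}\,\Omega = 10\,\text{k}\Omega$$

For the second stage (gain of 100):

Choosing $R_f = 100\,\text{k}\Omega$

$$100 = \frac{100 \times 10^{3}}{R_{in}}$$

$$R_{in} = \frac{100 \times 10^{3}}{100} = 1.0 \times 10^{3}\,\Omega = 1\,\text{k}\Omega$$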

Figure 6: Amplifier Second Stage

Condition Five: The main consideration in choosing the value of capacitor C1 is to ensure that its capacitive reactance is low enough, compared with the input impedance of the amplifier or of any load connected to the output, to allow signals at all the required frequencies to pass. The reactance of a capacitor is greatest at low frequencies, therefore the coupling capacitor values chosen must give a low reactance at the lowest frequencies the amplifier is designed to amplify. A generally accepted value for coupling capacitors in an audio amplifier is between 1µF and 10µF.

For our system, an electret microphone has an operating frequency response of about 0 to 30 kHz.
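The dependence of reactance on frequency follows from Xc = 1 / (2πfC). The sketch below evaluates it for the two ends of the 1µF to 10µF range; the 20 Hz lower frequency is an assumed figure for the bottom of the audio band, not a value from the design.

```python
import math


def capacitive_reactance(frequency_hz, capacitance_f):
    """Reactance in ohms of a capacitor at the given frequency."""
    return 1.0 / (2.0 * math.pi * frequency_hz * capacitance_f)


if __name__ == "__main__":
    lowest_frequency = 20.0                   # assumed lowest frequency of interest
    for capacitance in (1e-6, 10e-6):
        print(capacitance, capacitive_reactance(lowest_frequency, capacitance))
    # The same 1 µF capacitor at a higher frequency presents far less reactance.
    print(capacitive_reactance(1000.0, 1e-6))
```

Choosing the larger capacitor reduces the reactance at the lowest frequency tenfold, which is the reason for checking the coupling capacitor against the bottom of the band rather than the top.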

Figure 7 : Low Pass Filter

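For the active filter design, the following principles and procedures were used.

Choosing $C_5 = 0.47\,\mu\text{F}$;

Calculating $C_6 = 2 \times C_5 \cong 1\,\mu\text{F}$

Calculating $R_8 = R_9 = \dfrac{1}{2\sqrt{2}\,\pi\,C_5\,f} \cong 10\,\Omega$, where $f$ is the cut-off frequency of the filter

$C_4 = C_8 = 100$ to $1000$ times $C_1$ (not critical) $= 220\,\mu\text{F}$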
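The resistor calculation can be sketched numerically under two stated assumptions: that the design rule is R = 1 / (2√2 · π · C · f) for equal resistors, and that C5 is 0.47 µF (the unit is an assumption, chosen because it is consistent with the 10 Ω result). The function names are hypothetical, and the cut-off frequency printed is what those assumptions imply rather than a figure stated in the report.

```python
import math


def resistor_for_cutoff(capacitance_f, cutoff_hz):
    """Equal resistor value for the assumed design rule R = 1 / (2*sqrt(2)*pi*C*f)."""
    return 1.0 / (2.0 * math.sqrt(2.0) * math.pi * capacitance_f * cutoff_hz)


def cutoff_for_resistor(capacitance_f, resistance_ohm):
    """Cut-off frequency implied by a given capacitor and resistor value."""
    return 1.0 / (2.0 * math.sqrt(2.0) * math.pi * capacitance_f * resistance_ohm)


if __name__ == "__main__":
    c5 = 0.47e-6                              # assumed to be 0.47 µF
    print(cutoff_for_resistor(c5, 10.0))      # cut-off implied by 10 ohm resistors
    print(resistor_for_cutoff(c5, 30e3))      # resistor needed for a 30 kHz cut-off
```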

Figure 8 : Circuit Diagram

3.3 Software Development

Figure 9: Overall Program Flowchart

[Flowchart: Start → DISPLAY: STORE BLUE COMMAND → Speech input from microphone → DISPLAY: BLUE COMMAND STORED → DISPLAY: STORE GREEN COMMAND → DISPLAY: GREEN COMMAND STORED → DISPLAY: STORE OFF COMMAND → DISPLAY: OFF COMMAND STORED → DISPLAY: SYSTEM READY → DISPLAY: SPEAK WHEN RED OFF → three "Speech Recognized" decisions (YES: SWITCH ON BLUE LED, SWITCH ON GREEN LED, SWITCH OFF ALL LEDs; NO: DISPLAY: NO MATCH, TRY AGAIN) → DISPLAY: SYSTEM TASK ACHIEVED]

Figure 10: Speech Recognition Process Flowchart

[Flowchart: Start → Compare speech input with stored template through measuring Hamming Distance → HAMMING DISTANCE LESS THAN 3: SPEECH RECOGNISED; HAMMING DISTANCE GREATER THAN 3: DISPLAY NO MATCH MESSAGE]
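The decision in Figure 10 can be sketched as a threshold test on the bit-level Hamming distance between a captured sample and each stored template. This is an illustrative sketch, not the code in the appendix: the 8-bit template values, the names `TEMPLATES` and `recognise`, and the treatment of a distance of exactly 3 as "no match" are all assumptions.

```python
THRESHOLD = 3                                 # a match requires a distance below this

# Hypothetical 8-bit templates standing in for the patterns stored in EEPROM.
TEMPLATES = {"BLUE": 0b10110010, "GREEN": 0b01001101, "OFF": 0b11100001}


def bit_distance(a, b):
    """Hamming distance between two integers: the number of differing bits."""
    return bin(a ^ b).count("1")


def recognise(sample, templates):
    """Return the name of the closest template, or None when nothing is close enough."""
    best_name, best_distance = None, None
    for name, template in templates.items():
        distance = bit_distance(sample, template)
        if best_distance is None or distance < best_distance:
            best_name, best_distance = name, distance
    if best_distance is not None and best_distance < THRESHOLD:
        return best_name
    return None


if __name__ == "__main__":
    print(recognise(0b10110110, TEMPLATES))   # one bit away from the BLUE template
    print(recognise(0b00000000, TEMPLATES))   # far from every template
```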

Chapter 4

ANALYSIS & RESULTS

4.1 Microphone and Amplifier Tests

The system was connected up to the output stage of the filter of the second stage amplifier, i.e. the system circuit excluding the microcontroller and the loads connected to the system, with the output being fed to the channel two input of a digital oscilloscope for preliminary analysis of the system. There was also provision for the output of the system power supply to be fed to channel one of the same oscilloscope, to ensure that the system was receiving the required power supply all through the tests. Two test speech samples containing the same spoken word “TESTING” were analyzed: the first a pre-recorded speech sample obtained using a mobile phone for the voice recording, and the second the same word spoken live during the experiment. The following results were obtained:

Oscilloscope Settings

Channel One
Volts per Division: 1V
Time base or Time/Division: 2ms
Coupling: DC
Impedance: 1MΩ

Channel Two
Volts per Division: 1V
Time base or Time/Division: 2ms
Coupling: AC
Impedance: 1MΩ

Table 1: Results from Recorded Speech

ITERATION     WAVEFORM                  AMPLITUDE (Volts p-p)
First Try     [oscilloscope capture]    1.8
Second Try    [oscilloscope capture]    2.3
Third Try     [oscilloscope capture]    2.69

Table 2: Results from Spoken Speech

ITERATION     WAVEFORM                  AMPLITUDE (Volts p-p)
First Try     [oscilloscope capture]    3.03
Second Try    [oscilloscope capture]    2.01
Third Try     [oscilloscope capture]    2.99

NB: When no input was applied to the microphone, a voltage output of 986 mVp-p was measured at the amplifier output. This was recorded as the "fairly silent" measurement, since the lab was open to the air and there was never total silence in the lab.

The above tests conformed to the expected amplifier specifications, since an output in the range of volts was measured for a microphone input in the range of millivolts. This was taken as sufficient evidence that the two-stage audio amplifier was properly designed to give a combined gain of 1000, which was the requirement for a signal to undergo analogue-to-digital conversion adequately on an ATmega328P as per the manufacturer's data sheet.

4.2 Microcontroller Tests

The ATmega328P employed for the system was first tested for any timing issues, and to ensure that all the initial requirements of setting up a new microcontroller, such as burning of fuse bits, had been done properly.

The microcontroller was powered and a simple program that blinks one of the LEDs every five seconds was loaded onto it.

The LED was observed to blink every five seconds, as expected from the program specification, signaling that the clock requirements were properly set up.

4.3 Speech Recognition System Tests

The code (see Appendix E) was compiled and programmed to the microcontroller. The system was powered on and the following observations were made when a short spoken speech sample was issued to the system:

• The LCD screen came on and displayed the “WELCOME” message on the

first row followed by “STORE BLUE COMMAND” on the second row.

• The RED LED came on to alert the user to be ready for recording (training) the templates that would be stored in EEPROM for the feature comparison processes.

• The system, through the LCD screen, prompted the user sequentially to store the commands for ‘TURNING ON BLUE LED’, then ‘GREEN’, then finally ‘TURN OFF’ for both LEDs, i.e. BLUE and GREEN.

• After successful storage, the system confirmed to the user through the LCD screen that all three commands had been stored in the EEPROM.

• The system then prompted the user to start operating the system by issuing commands to sequentially light the BLUE LED, followed by the GREEN LED, then finally turn OFF both LEDs at once.

• Once the above process terminated, the LCD screen displayed “FINAL

SPEECH RECOGNISED” thereby signaling the end of the system’s process.

• Pressing a push button prompted the system to restart this process all over

again from the point it displayed the “WELCOME” message.

The above observations followed directly from the algorithm implemented in the code for the system. The microcontroller performed each task it was programmed to do well, at the delays specified in the installed software.

It is important to note that the system seemed to be unstable when commands longer than three seconds were issued to it. This was a result of the accuracy of the error correction technique used by the system: Hamming distance offers a fairly low degree of accuracy in its operation.

The system also forced the user to perform its tasks sequentially as per the program. Any deviation from the algorithm was NOT recognized, and the system prompted the user to RESTART issuing the lighting commands from the point where it prompts the user to “SPEAK IF RED LED ON”.

When the code (see Appendix C) was compiled and programmed to the microcontroller, the system powered on, and recorded speech samples issued to the system instead of spoken ones, the same observations as above were made, but with higher accuracy. Accuracy here means the number of tries a user required to move through the steps of lighting the lights. With spoken speech, a user had to try about six times to operate the system successfully to the end.

With recorded speech, it took four tries for a user to move through all the system steps of sequentially lighting the two LEDs and switching them both off.

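Percentage Difference

$$\%\ \text{difference} = \frac{\text{tries}_{spoken} - \text{tries}_{recorded}}{\text{tries}_{spoken} + \text{tries}_{recorded}} \times 100 = (6 - 4) \div (6 + 4) \times 100 = 20\%$$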
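The 20% figure is consistent with dividing the difference in the number of tries by the total number of tries. The sketch below assumes that definition (the original working is not fully legible, so this is an assumption) and uses the six spoken and four recorded tries reported above; the function name is hypothetical.

```python
def percentage_difference(spoken_tries, recorded_tries):
    """Difference in tries as a percentage of the total tries (assumed definition)."""
    return 100 * (spoken_tries - recorded_tries) / (spoken_tries + recorded_tries)


if __name__ == "__main__":
    print(percentage_difference(6, 4))
```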

Chapter 5

CONCLUSIONS & RECOMMENDATIONS

Conclusion: Feasibility

With a lot of programming and some knowledge of error correction, it is possible to implement a microcontroller-based speech recognition system using a simple 8-bit IC such as an ATmega328P.

Recommendation: A strong background in programming must be attained before attempting this implementation.

Conclusion: Cost

A simple microcontroller is a powerful device that gives value for money by enabling one to build a complex system doing powerful tasks at an affordable cost. The project amounted to a total cost of about Kshs 1 500.

Recommendation: More emphasis should be placed on the many opportunities that a small device such as a microcontroller can open to willing minds. The possibilities are limited only by one's own imagination.

Conclusion: Project Specifications

The project objective and specifications were successfully met. However, this was not an easy feat to achieve, owing to the limited kinds of microcontroller available locally when features are considered. It was tedious to implement the system with an 8-bit microcontroller because of the limitations it presents in terms of features. It was not possible to acquire a microcontroller that could do a better job, and this contributed to the inaccuracies of the system.

Recommendation: Future project specifications can be matched with devices that are readily available, or the family of device driving the project can be changed completely to a stronger family, for example by moving a task from a microcontroller implementation to a microprocessor implementation.

References

[1] R. Klevans and R. Rodman, “Voice Recognition”, Artech House, Boston, London, 1997.

[2] Kevin Brady, Michael Brandstein, Thomas Quatieri, Bob Dunn “An

Evaluation Of Audio-Visual person Recognition on the XM2VTS corpus using the

Lausanne protocol” MIT Lincoln Laboratory, 244 Wood St., Lexington MA

[3] W. M. Campbell, D. E. Sturim, W. Shen, D. A. Reynolds, J. Navrátil, “The MIT-LL/IBM Speaker recognition System using High performance reduced Complexity recognition”, MIT Lincoln Laboratory, IBM, 2006.

[4] GIN-DER WU AND YING LEI “ A Register Array based Low power FFT

Processor for speech recognition” Department of Electrical engineering national Chi

Nan university Puli ,545 Taiwan.

[5] Nicolás Morales, John H. L. Hansen and Doroteo T. Toledano, “MFCC Compensation for improved recognition of filtered and band limited speech”, Center for Spoken Language Research, University of Colorado at Boulder, Boulder (CO), USA.

[6] M.A.Anusuya ,S.K.Katti “Speech Recognition by Machine: A Review”

International journal of computer science and Information Security 2009.

[7] Kenneth Thomas Schutte “Parts-based Models and Local Features for

Automatic Speech Recognition” B.S., University of Illinois at Urbana-Champaign

(2001) S.M., Massachusetts Institute of Technology (2003).

[8] Sannella, M., “Speaker Recognition Project Report”. From http://cs.joensuu.fi/pages/tkinnu/research/index.html. Viewed 23 Feb. 2010.

[9] S. Katagiri, “Speech Pattern Recognition using Neural Networks”.

[10] L. R. Rabiner and B. H. Juang, “Fundamentals of Speech Recognition”, Prentice-Hall, Englewood Cliffs, New Jersey, 1993.

[11] D. R. Reddy, “An Approach to Computer Speech Recognition by Direct Analysis of the Speech Wave”, Tech. Report No. C549, Computer Science Dept., Stanford Univ., September 1966.

[12] International Journal of Computer Applications (0975 – 8887)Volume 10– No.3,

November 2010

[13] Santosh K.Gaikwad, Bharti W.Gawali, Pravin Yannawar,"A Review on Speech

Recognition Technique", Department of CS& IT ,Dr.Babasaheb Ambedkar

MarathwadaUniversity,Aurangabad(2010)

[14] Urmila Shrawankar, Dr. Vilas Thakare," Techniques for Feature Extraction in

Speech Recognition system: a comparative study", Dept. of Computer Science, SGB

Amravati University, Amravati.

[15] Carol Yvonne Espy-Wilson, “An Acoustic-Phonetic Approach to Speech

Recognition: Application to the Semivowels”, Research Laboratory of Electronics

Massachusetts Institute of Technology Cambridge, MA02139USA(1987).

[16] "Speech recognition," in Wikipedia, Wikimedia Foundation, 2016. [Online].

Available: https://en.wikipedia.org/wiki/Speech_recognition. Accessed: Apr. 6, 2016.

[17] E. Grabianowski, "How speech recognition works," HowStuffWorks, 2006. [Online]. Available: http://electronics.howstuffworks.com/gadgets/high-tech-gadgets/speech-recognition2.htm. Accessed: Apr. 6, 2016.

[18] "Electret microphone," in Wikipedia, Wikimedia Foundation, 2016. [Online]. Available: https://en.wikipedia.org/wiki/Electret_microphone. Accessed: Apr. 6, 2016.

[19] J. Lewis, "Understanding microphone sensitivity." [Online]. Available: http://www.analog.com/library/analogDialogue/archives/46-05/understanding_microphone_sensitivity.html. Accessed: Apr. 6, 2016.

[20] Bruce Carter, “Filter Design in Thirty Seconds”, Texas Instruments, Dallas,

Texas 75265(2001).

[21] "LM358," in Wikipedia, Wikimedia Foundation, 2016. [Online]. Available:

https://en.wikipedia.org/wiki/LM358. Accessed: Apr. 6, 2016.

[22] "Coding theory," in Wikipedia, Wikimedia Foundation, 2016. [Online]. Available: https://en.wikipedia.org/wiki/Coding_theory. Accessed: May 8, 2016.

[23] S. Inc, "ERROR DETECTION AND CORRECTION USING HAMMING CODE," Scribd, 2016. [Online]. Available: https://www.scribd.com/doc/25715552/ERROR-DETECTION-AND-CORRECTION-USING-HAMMING-CODE. Accessed: May 8, 2016.

[24] "Error detection and correction," in Wikipedia, Wikimedia Foundation, 2016. [Online]. Available: https://en.wikipedia.org/wiki/Error_detection_and_correction. Accessed: May 8, 2016.

[25] "Hamming distance," in Wikipedia, Wikimedia Foundation, 2016. [Online]. Available: https://en.wikipedia.org/wiki/Hamming_distance. Accessed: May 8, 2016.

Appendix A

BILL OF MATERIALS

Appendix B

PCB Schematic Drawing Using Microwin

Appendix C

PCB Schematic

Appendix D

ATmega328P Overall Features

• High-performance, Low-power AVR® 8-bit Microcontroller
• Advanced RISC Architecture
  - 131 Powerful Instructions – Most Single Clock Cycle Execution
  - 32 x 8 General Purpose Working Registers
  - Fully Static Operation
  - Up to 20 MIPS Throughput at 20MHz
  - On-chip 2-cycle Multiplier
• High Endurance Non-volatile Memory Segments
  - 4/8/16/32KBytes of In-System Self-Programmable Flash program memory
  - 256/512/512/1KBytes EEPROM
  - 512/1K/1K/2KBytes Internal SRAM
  - Write/Erase Cycles: 10,000 Flash/100,000 EEPROM
  - Data retention: 20 years at 85°C/100 years at 25°C
  - Optional Boot Code Section with Independent Lock Bits
      In-System Programming by On-chip Boot Program
      True Read-While-Write Operation
  - Programming Lock for Software Security
• Atmel® QTouch® library support
  - Capacitive touch buttons, sliders and wheels
  - QTouch and QMatrix® acquisition
  - Up to 64 sense channels
• Peripheral Features
  - Two 8-bit Timer/Counters with Separate Prescaler and Compare Mode
  - One 16-bit Timer/Counter with Separate Prescaler, Compare Mode, and Capture Mode
  - Real Time Counter with Separate Oscillator
  - Six PWM Channels
  - 8-channel 10-bit ADC in TQFP and QFN/MLF package
      Temperature Measurement
  - 6-channel 10-bit ADC in PDIP Package
      Temperature Measurement
  - Programmable Serial USART
  - Master/Slave SPI Serial Interface
  - Byte-oriented 2-wire Serial Interface (Philips I2C compatible)
  - Programmable Watchdog Timer with Separate On-chip Oscillator
  - On-chip Analog Comparator
  - Interrupt and Wake-up on Pin Change
• Special Microcontroller Features
  - Power-on Reset and Programmable Brown-out Detection
  - Internal Calibrated Oscillator
  - External and Internal Interrupt Sources
  - Six Sleep Modes: Idle, ADC Noise Reduction, Power-save, Power-down, Standby, and Extended Standby
• I/O and Packages
  - 23 Programmable I/O Lines
  - 28-pin PDIP, 32-lead TQFP, 28-pad QFN/MLF and 32-pad QFN/MLF
• Operating Voltage: 1.8 - 5.5V
• Temperature Range: -40°C to 85°C
• Speed Grade: 0 - 4MHz @ 1.8 - 5.5V, 0 - 10MHz @ 2.7 - 5.5V, 0 - 20MHz @ 4.5 - 5.5V
• Power Consumption at 1MHz, 1.8V, 25°C
  - Active Mode: 0.2mA
  - Power-down Mode: 0.1µA
  - Power-save Mode: 0.75µA (Including 32kHz RTC)

Pin Descriptions

ADC7:6 (TQFP and QFN/MLF Package Only)

In the TQFP and QFN/MLF package, ADC7:6 serve as analog inputs to the A/D converter. These pins are powered from the analog supply and serve as 10-bit ADC channels.

Port B (PB7:0) XTAL1/XTAL2/TOSC1/TOSC2

Port B is an 8-bit bi-directional I/O port with internal pull-up resistors (selected for each bit). The Port B output buffers have symmetrical drive characteristics with both high sink and source capability. As inputs, Port B pins that are externally pulled low will source current if the pull-up resistors are activated. The Port B pins are tristated when a reset condition becomes active, even if the clock is not running.

Depending on the clock selection fuse settings, PB6 can be used as input to the inverting Oscillator amplifier and input to the internal clock operating circuit.

Depending on the clock selection fuse settings, PB7 can be used as output from the inverting Oscillator amplifier.

If the Internal Calibrated RC Oscillator is used as chip clock source, PB7...6 is used as TOSC2...1 input for the Asynchronous Timer/Counter2 if the AS2 bit in ASSR is set.
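A minimal sketch, in host-side C, of how a single port pin is tested through its input register. The helper names `pin_is_high` and `switch_is_pressed` are hypothetical, and the register value is passed in as a plain byte so the logic can be exercised off-target; on the ATmega328P the byte would come from PINB (or PINC, PIND). It assumes a switch wired between the pin and ground with the internal pull-up enabled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if bit `bit` of a port input snapshot is
   high and 0 if it is low. On the AVR the snapshot would be read from PINx. */
static int pin_is_high(uint8_t pin_register, uint8_t bit)
{
    assert(bit < 8);                       /* an 8-bit port has bits 0..7 */
    return (pin_register & (uint8_t)(1u << bit)) != 0;
}

/* Assumption: the switch connects the pin to ground and the pull-up is on,
   so an open switch reads high and a pressed switch reads low. */
static int switch_is_pressed(uint8_t pin_register, uint8_t bit)
{
    return !pin_is_high(pin_register, bit);
}
```

On the target, a call such as `switch_is_pressed(PINC, PINC5)` would test PC5. The bit-name constant is only the bit number (5 here), so it has to be shifted into a mask before it is compared with the register contents.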

AVcc

AVcc is the supply voltage pin for the A/D Converter, PC3:0, and ADC7:6. It should be externally connected to Vcc, even if the ADC is not used. If the ADC is used, it should be connected to Vcc through a low-pass filter. Note that PC6...4 use digital supply voltage, Vcc.

AREF

AREF is the analog reference pin for the A/D Converter.

Analogue to Digital Converter Features

• 10-bit Resolution
• 0.5 LSB Integral Non-linearity
• ±2 LSB Absolute Accuracy
• 13 - 260 µs Conversion Time
• Up to 76.9kSPS (Up to 15kSPS at Maximum Resolution)
• 6 Multiplexed Single Ended Input Channels
• 2 Additional Multiplexed Single Ended Input Channels (TQFP and QFN/MLF Package only)
• Temperature Sensor Input Channel
• Optional Left Adjustment for ADC Result Readout
• 0 – VCC ADC Input Voltage Range
• Selectable 1.1V ADC Reference Voltage
• Free Running or Single Conversion Mode
• Interrupt on ADC Conversion Complete
• Sleep Mode Noise Canceler
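The timing and resolution figures above can be related to the clock settings used in Appendix E (16 MHz system clock, ADC prescaler of 128). The sketch below is host-side C with hypothetical helper names. It assumes the datasheet figure of 13 ADC clock cycles for a normal conversion, and the reference voltage is a parameter (5 V is used in the example, which is an assumption about the supply).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: clock and prescaler as in the Appendix E listing;
   13 ADC clock cycles per normal conversion is the datasheet figure. */
#define F_CPU_HZ       16000000UL
#define ADC_PRESCALER  128UL
#define ADC_CYCLES     13UL
#define ADC_STEPS      1024UL   /* 2^10 codes for a 10-bit converter */

/* ADC clock in Hz: system clock divided by the prescaler. */
static unsigned long adc_clock_hz(void)
{
    return F_CPU_HZ / ADC_PRESCALER;
}

/* Duration of one normal conversion in microseconds. */
static unsigned long adc_conversion_time_us(void)
{
    return (ADC_CYCLES * 1000000UL) / adc_clock_hz();
}

/* Convert a 10-bit result to millivolts for a reference of vref_mv. */
static unsigned long adc_code_to_mv(uint16_t code, unsigned long vref_mv)
{
    assert(code < ADC_STEPS);   /* only 0..1023 are valid 10-bit codes */
    return ((unsigned long)code * vref_mv) / ADC_STEPS;
}
```

Under these assumptions the ADC clock is 125 kHz and one conversion takes 104 µs, which lies inside the 13 - 260 µs range listed above. With a 5 V reference, `adc_code_to_mv(512, 5000)` gives 2500 mV, i.e. mid-scale.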

APPENDIX E

Code:

/*
 * Final Project Code.c
 *
 * Created: 14-May-16 10:38:18 AM
 * Author : F17/39705/2011
 */

#ifndef F_CPU
#define F_CPU 16000000UL // Define clock frequency
#endif

#include <avr/io.h> // Input/output registers and their bits for all AVR microcontrollers
#include <util/delay.h> // Defines _delay_ms (millisecond delay) and _delay_us (microsecond delay)
#include <avr/eeprom.h> // Functions for EEPROM manipulation
#include "lcd.h" // Functions for the alphanumeric LCD
#include <avr/interrupt.h> // ISR macro and interrupt control functions

volatile uint8_t start_recording = 0;
uint8_t start_second_recording = 0;
uint8_t start_third_recording = 0;
uint8_t start_lisening = 0;
uint8_t nextGreen = 0;
uint8_t nextOFF = 0;

//Global Variables
uint8_t theLowADC;
volatile uint16_t theTenBitResults; // written in the ADC ISR and read in main(), hence volatile
volatile uint8_t tot_overflow;

void timer1_init(void);

//Function to set up the ADC registers
void adc_enable(void)
{
    // Pre-scaler selection. The division factor is 128, thus F_ADC = 16 MHz/128 = 125 kHz.
    ADCSRA = (1<<ADPS2)|(1<<ADPS1)|(1<<ADPS0);
    ADMUX = (1<<REFS0); // voltage reference = AVcc
    ADMUX |= (1<<MUX2); // choose ADC channel 4 as the ADC input, keeping the reference selection
    ADCSRA |= 1<<ADIE; //ADC Interrupt enable
    ADCSRA |= 1<<ADEN; //Enable ADC
}

//The Interrupt Service routine for ADC
ISR(ADC_vect)
{
    theLowADC = ADCL; // ADCL must be read before ADCH
    theTenBitResults = ADCH<<8 | theLowADC;
    ADCSRA |= 1<<ADSC; // Start the next conversion
}

int main(void)
{
    DDRC = 0b00000000; // port C pins as inputs
    DDRD = 0b11111111; // initialize port D
    DDRB = 0b00000111; // initialize port B
    lcd_init(LCD_DISP_ON_CURSOR); /* initialize lcd, display on, cursor on */
    lcd_clrscr(); /* clear screen of lcd */
    lcd_home(); /* bring cursor to 0,0 */
    lcd_puts("WELCOME"); /* display message */
    lcd_gotoxy(0,1); /* go to 2nd row 1st col */
    lcd_puts("STORE BLUE CMND"); /* display something */
    _delay_ms(3000); /* wait 3 seconds */

    //timer1_init();
    start_recording = 1;

    while (1)
    {
        // Test the level on PC5. PINC5 alone is only the bit number, so it is
        // shifted into a mask and compared against the PINC register.
        if ((PINC & (1<<PINC5)) == 0)
        {
            PORTB = 0b11111110; // RED LED on.
        }

        if (start_recording == 1)
        {
            PORTB = 0b11111111; // RED LED off.
            lcd_init(LCD_DISP_ON_CURSOR); /* initialize lcd, display on, cursor on */
            lcd_clrscr();
            lcd_home();
            lcd_puts("BLUE COMMAND"); /* display something */
            lcd_gotoxy(0,1);
            lcd_puts("STORED"); /* display something */
            start_second_recording = 1;
        }

        if (start_second_recording == 1) // if lcd displays "blue command stored"
        {
            start_second_recording = 0;
            lcd_init(LCD_DISP_ON_CURSOR);
            lcd_clrscr();
            lcd_home();
            lcd_puts("NEXT");
            lcd_gotoxy(0,1);
            lcd_puts("STORE GREEN COMMAND");
            _delay_ms(3000); // display "Next Store Green Command"
            _delay_ms(5000); /* wait 5s */

            adc_enable();
            sei(); //Enable global interrupts
            ADCSRA |= 1<<ADSC; // Start conversion

            uint16_t WordOfData_B; //Define type and size of data
            WordOfData_B = theTenBitResults;
            eeprom_update_word((uint16_t *) 2, WordOfData_B); // update the EEPROM location "2"

            tot_overflow++; // keep a track of number of overflows
            // The writing process needs 3.4 ms; we add a few more overflows to give
            // time for the EEPROM write, then we signal the microcontroller we are done.
            if (tot_overflow >= 2)
            {
                start_third_recording = 1;
            }
        }

        if (start_third_recording == 1) // if lcd displays "green command stored"
        {
            start_third_recording = 0;
            lcd_clrscr();
            lcd_home();
            lcd_puts("NEXT");
            lcd_gotoxy(0,1);
            lcd_puts("STORE OFF COMMAND");
            _delay_ms(3000);

            adc_enable();
            ADCSRA |= 1<<ADSC; // Start conversion (adc_enable() rewrites ADCSRA)

            uint16_t WordOfData_C; //Define type and size of data
            WordOfData_C = theTenBitResults;
            eeprom_update_word((uint16_t *) 4, WordOfData_C); // update the EEPROM location "4"

            tot_overflow++; // keep a track of number of overflows
            // Allow time for the EEPROM write (about 3.4 ms), then signal that storing is done.
            if (tot_overflow >= 2)
            {
                start_lisening = 1;
            }
        }

        if (start_lisening == 1)
        {
            start_lisening = 0;
            lcd_clrscr();
            lcd_home();
            lcd_puts("SYSTEM READY");
            lcd_gotoxy(0,1);
            lcd_puts("SPEAK IF RED OFF");
            _delay_ms(3000);

            adc_enable();
            ADCSRA |= 1<<ADSC; // Start conversion (adc_enable() rewrites ADCSRA)

            uint16_t WordOfData_D;
            WordOfData_D = eeprom_read_word((uint16_t *) 0);

            //Perform hamming distance
            int dist = 0;
            unsigned val = WordOfData_D ^ theTenBitResults;
            // Count the number of bits set
            while (val != 0)
            {
                // A bit is set, so increment the count and clear the bit
                dist++;
                val &= val - 1;
            }

            if (dist <= 3)
            {
                PORTB = 0b11111101;
                lcd_init(LCD_DISP_ON_CURSOR); /* initialize lcd, display on, cursor on */
                lcd_clrscr(); /* clear screen of lcd */
                lcd_home();
                lcd_puts("SPEECH"); /* display something */
                lcd_gotoxy(0,1);
                lcd_puts("RECOGNISED"); /* display something */
                nextGreen = 1;
            }
            else
            {
                PORTB = 0b11111111;
                lcd_clrscr();
                lcd_home();
                lcd_puts("NO MATCH"); /* display something */
                lcd_gotoxy(0,1);
                lcd_puts("TRY AGAIN"); /* display something */
                start_lisening = 1;
            }
        }

        if (nextGreen == 1)
        {
            uint16_t WordOfData_E;
            WordOfData_E = eeprom_read_word((uint16_t *) 2);

            //Evaluate hamming distance
            int dist = 0;
            unsigned val = WordOfData_E ^ theTenBitResults;
            while (val != 0)
            {
                dist++;
                val &= val - 1;
            }

            if (dist <= 3)
            {
                PORTB = 0b11111001;
                lcd_clrscr();
                lcd_home();
                lcd_puts("SPEECH"); /* display something */
                nextOFF = 1;
            }
            else
            {
                PORTB = 0b11111101;
                start_lisening = 1;
            }
        }

        if (nextOFF == 1)
        {
            uint16_t WordOfData_F;
            WordOfData_F = eeprom_read_word((uint16_t *) 4);

            //Evaluate Hamming Distance
            int dist = 0;
            unsigned val = WordOfData_F ^ theTenBitResults;
            while (val != 0)
            {
                dist++;
                val &= val - 1;
            }
            // dist now holds the number of differing bits

            if (dist <= 3)
            {
                PORTB = 0b11111111;
                lcd_clrscr();
                lcd_home();
                lcd_puts(" FINAL SPEECH"); /* display something */
                lcd_gotoxy(0,1);
                lcd_puts("TASK COMPLETE"); /* display something */
            }
            else
            {
                PORTB = 0b11111001;
                _delay_ms(3000);
                start_lisening = 1;
            }
        }
    }
}

//Function to set up the TIMER registers
void timer1_init(void)
{
    TCCR1B |= (1 << CS11); // set up timer with prescaler = 8
    TCNT1 = 0; // initialize counter
    TIMSK1 |= (1 << TOIE1); // enable overflow interrupt
    sei(); // enable global interrupts
    tot_overflow = 0; // initialize overflow counter variable
}

void recordCommand(void)
{
    tot_overflow = 0; // reset overflow counter
    adc_enable();
}



void eeprom(void)
{
    uint16_t WordOfData_A; //Define type and size of data
    WordOfData_A = theTenBitResults;
    eeprom_update_word((uint16_t *) 0, WordOfData_A); // update the EEPROM location "0"
    //tot_overflow++; // keep a track of number of overflows
    // The writing process needs 3.4 ms; we add a few more overflows to give
    // time for the EEPROM write, then we signal the microcontroller.
    if (tot_overflow >= 2)
    {
        start_second_recording = 1;
    }
}
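The matching step in the listing above compares a stored 10-bit value with a new ADC reading by counting the bits in which they differ. A self-contained sketch of that comparison in portable C follows. The function `match_template` and the template values used to exercise it are hypothetical and only illustrate the idea; the threshold of 3 bits is the one used in the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bit positions in which a and b differ. */
static int hamming_distance(uint16_t a, uint16_t b)
{
    uint16_t val = (uint16_t)(a ^ b);   /* set bits mark the differences */
    int dist = 0;
    while (val != 0) {
        dist++;
        val &= (uint16_t)(val - 1);     /* clear the lowest set bit */
    }
    return dist;
}

/* Hypothetical helper: index of the first stored template that lies within
   max_dist bits of sample, or -1 if none does. */
static int match_template(const uint16_t *templates, size_t count,
                          uint16_t sample, int max_dist)
{
    assert(templates != NULL);
    for (size_t i = 0; i < count; i++) {
        if (hamming_distance(templates[i], sample) <= max_dist)
            return (int)i;
    }
    return -1;
}
```

With three stored words read back from EEPROM addresses 0, 2 and 4, a call such as `match_template(stored, 3, sample, 3)` would return which command, if any, the new sample is closest enough to. The loop runs once per differing bit, so it is cheap on an 8-bit device.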


