UNIVERSITY OF NAIROBI
FACULTY OF ELECTRICAL AND INFORMATION ENGINEERING
PROJECT: SPEECH RECOGNITION BASED SWITCHING SYSTEM
PROJECT INDEX: PRJ085
NAME: MBARAK HAJI CHIMWANI
REG. NO: F17/39705/2011
SUPERVISOR: DR. G. KAMUCHA
EXAMINER: DR. W. MWEMA
Project report submitted to the Department of Electrical and Information Engineering in partial
fulfillment of the requirements for the degree of BSc. Electrical and Electronic Engineering of the University of Nairobi
Date of submission:
17/05/2016
DECLARATION
FACULTY/SCHOOL/INSTITUTE: Engineering
DEPARTMENT: Electrical and Information Engineering
COURSE: Bachelor of Science in Electrical & Electronic Engineering
NAME OF STUDENT: MBARAK HAJI CHIMWANI
REGISTRATION NUMBER: F17/39705/2011
PROJECT TITLE: SPEECH RECOGNITION BASED SWITCHING SYSTEM
I hereby declare and confirm that:
1) I understand what plagiarism is and I am aware of the university policy in this
regard.
2) The work embodied in this report I am submitting is entirely my own work, carried out under
the supervision of Dr. G. Kamucha, and has not been submitted elsewhere for
examination, award of a degree or publication. Where other people’s work or my
own work has been used, this has properly been acknowledged and referenced in
accordance with the University of Nairobi’s requirements.
3) I have not sought or used the services of any professional agencies to produce this
work.
4) I have not allowed, and shall not allow, anyone to copy my work with the
intention of passing it off as his/her own work.
5) I understand that any false claim in respect of this work shall result in disciplinary
action, in accordance with the University anti-plagiarism policy.
AUTHOR: MBARAK HAJI CHIMWANI
SIGNATURE: ………………………………………..
DATE: …………………………………………………
DEDICATION
I dedicate this work to my parents, who have been a pillar to me throughout the duration of
my studies; from the first day of school I found in them all the support I needed to get
through engineering school.
I also dedicate this work to my teachers. I am a product of all the grooming and preparation
they instilled in me over the five years at the University of Nairobi.
I finally dedicate this work to all my friends, family and loved ones. It is their unconditional
love and motivation that spurs me to attain greater heights in life every day.
ACKNOWLEDGEMENTS
Throughout the development of this project, many people took their time to
provide important input, help and a great deal of support. First of all, I would like to express
my sincere thanks and gratitude to Dr. G. Kamucha (University of Nairobi) for his
continuous guidance throughout the project. His supervision was very critical to the
development of the project. I would also like to thank family, relatives and friends, not
forgetting all other members of the UoN fraternity, including lecturers, fellow students, lab
technicians and non-teaching staff, who contributed in one way or another to the
development of the project. On no occasion did I seek assistance and get turned
down.
Finally, I would like to thank Mr. Nicholus Kimali (UoN Fablab), who helped me develop the
skills to tackle the project; without him it is likely I would not have been up to the task and
would still be struggling to reach my objective.
ABSTRACT
As observed in the rapidly changing world of technology, automation is becoming an
important part of working systems, performing specific tasks to attain desired goals.
Speech recognition cannot be overlooked when we speak of automation, since with
further advancements in technology there is no doubt it will be an integral part of the creation
of automated smart solutions to many technological problems of today’s world.
We seek to study the design and implementation of a microcontroller based speech
recognition system that allows specific persons to perform simple tasks such as turning
lights on/off and opening/closing an electric door. The work was centered on electronics and
microprocessors, specifically the microcontroller. The mode of investigation applied was
hardware implementation involving physical fabrication of a working system.
TABLE OF CONTENTS
ACKNOWLEDGEMENTS............................................................................................ iv
ABSTRACT ................................................................................................................... v
ACRONYMS ............................................................................................................... viii
LIST OF FIGURES ....................................................................................................... ix
LIST OF TABLES .......................................................................................................... x
Chapter 1 ........................................................................................................................ 1
INTRODUCTION .......................................................................................................... 1
1.1 Background ........................................................................................................... 1
1.2 Main Objective ...................................................................................................... 1
1.3 Justifications .......................................................................................................... 1
1.4 Scope of the Project ............................................................................................... 2
1.5 Project Organization .............................................................................................. 2
Chapter 2 ........................................................................................................................ 3
LITERATURE REVIEW ................................................................................................ 3
2.1 Speech Recognition Implementation Techniques ................................................... 3
2.2 Selection of the Technique to Be Used ................................................................... 9
2.3 Coding Theory ..................................................................................................... 10
2.3.1 Error Correction ............................................................................................... 10
2.4 Literature of Main Components Implemented ...................................................... 14
2.4.1 Microcontroller ................................................................................................ 14
2.4.2 Microphone ..................................................................................................... 18
Chapter 3 ...................................................................................................................... 21
DESIGN & IMPLEMENTATION ................................................................................. 21
3.1 System Operation ................................................................................................ 21
3.1.1 Analysis ........................................................................................................... 21
3.1.2 Feature Extraction ............................................................................................ 21
3.1.3 Modeling ......................................................................................................... 22
3.1.4 Testing ............................................................................................................. 22
3.2 Circuits & Mathematical Descriptions ................................................................. 23
3.3 Software Development......................................................................................... 28
Chapter 4 ...................................................................................................................... 30
ANALYSIS & RESULTS ............................................................................................. 30
4.1 Microphone and Amplifier Tests.......................................................................... 30
4.2 Microcontroller Tests ........................................................................................... 33
4.3 Speech Recognition System Tests ........................................................................ 33
Chapter 5 ...................................................................................................................... 36
CONCLUSIONS & RECOMMENDATIONS ................................................................ 36
References .................................................................................................................... 37
ACRONYMS
ASR   Automatic Speech Recognition
HMM   Hidden Markov Model
DSP   Digital Signal Processing
ADC   Analogue to Digital Converter
GND   Ground
DTW   Dynamic Time Warping
MCU   Microcontroller
CRC   Cyclic Redundancy Check
LIST OF FIGURES
Figure 1: Single-Bit Error ............................................................................................. 11
Figure 2: Burst Error .................................................................................................... 11
Figure 3: ATmega328P ................................................................................................. 14
Figure 4: Condenser Microphone .................................................................................. 18
Figure 5: Amplifier First Stage ...................................................................................... 24
Figure 6: Amplifier Second Stage .................................................................................. 25
Figure 7: Low Pass Filter .............................................................................................. 26
Figure 8: Circuit Diagram ............................................................................................. 27
Figure 9: Overall Program Flowchart ............................................................................ 28
Figure 10: Speech Recognition Process Flowchart ........................................................ 29
LIST OF TABLES
Table 1: Results from Recorded Speech ........................................................................ 31
Table 2: Results from Spoken Speech ........................................................................... 32
Chapter 1
INTRODUCTION
1.1 Background
Since the 1960s, speech recognition has been in development over decades, with
computer scientists researching ways and means to give computers the ability
to record, interpret and understand human speech. This has not been an easy feat to
achieve. Even the most rudimentary problem, such as digitizing (sampling) voice,
was a huge challenge in the early years. It was not until the 1980s that the first
systems arrived which could actually decipher speech. Since the field was still in its early
stages, these early systems were very limited in scope and power. Later on, more
complex systems that were stronger in scope, power and accuracy were developed,
and this gave rise to the various techniques and methodologies that are now available
for implementing automatic speech recognition systems.
1.2 Main Objective
The main objective of this project is to design an embedded system which can be used
to detect and recognize human voice commands, using only a microcontroller as the
intelligence of the system, which is in turn used to toggle respective loads.
1.3 Justifications
Automatic speech recognition today finds widespread application in tasks that
require a human machine interface, such as automatic call processing [1], virtual reality,
multimedia searches, auto-attendants, travel information and reservation, translators,
natural language understanding and many more applications [2,3].
Systems in the market with the ability to recognize speech are, however, very
expensive, and there is a need to produce a system that serves these many useful applications
at fairly low cost so as to challenge the existing market with lower prices.
1.4 Scope of the Project
The system is designed to be essentially in standby mode,
waiting for an input from the user. Once input is detected, it is analyzed by the speech
recognition module. If a known command is detected, the speech recognition system
sends the respective digital representation to the microcontroller. The microcontroller
interprets these data signals, compares them with a database and thus identifies the
referred load and its desired state. According to the load state identified, control
signals are sent to the respective relay circuits, thus actuating the appropriate loads. The
processing results are then displayed on the LCD, which is particularly used to display
system states.
1.5 Project Organization
This project is organized into five chapters, each as outlined below:
• Chapter 1: Introduction, giving more information about the project,
including the problem statement, main objectives and scope of the project.
• Chapter 2: This chapter covers the literature review and provides the theoretical
background of the main units of the project.
• Chapter 3: Design and implementation, presenting the system operation and the
mathematical descriptions of how the various components used were arrived at.
• Chapter 4: Results obtained, including analysis of the same.
• Chapter 5: Conclusions and recommendations of the project are outlined. This
chapter also contains the appendices and references of the documents used in the
project.
Chapter 2
LITERATURE REVIEW
2.1 Speech Recognition Implementation Techniques
The following are the modeling techniques which can be applied in speech
recognition processes.
I. The Acoustic-phonetic Approach
The acoustic-phonetic approach is based on finding speech sounds and
providing appropriate labels to these sounds. This method postulates that there exist
finite, distinctive phonetic units (phonemes) in spoken language and that these units
are broadly characterized by a set of acoustic properties that are manifested in the
speech signal over time [13].
Even though the acoustic properties of phonetic units are highly variable, both with
speakers and with neighboring sounds, it is assumed in the acoustic-phonetic
approach that the rules governing the variability are straightforward and can be
readily learned by a machine [13].
Weaknesses
a) Limited success has been obtained because of the lack of a good knowledge of
acoustic phonetics and other related areas [15].
b) Researchers are yet to uncover proper acoustic properties for features and
therefore they have not been able to extract this information for phonetic
recognition [15].
c) Not all aspects of feature assimilation are understood [15].
II. Pattern Recognition Approach
It involves two essential steps, namely pattern training and pattern comparison. The
essential feature of this approach is that it uses a well formulated mathematical
framework and establishes consistent speech pattern representations, for reliable
pattern comparison, from a set of labeled training samples [13].
A speech pattern representation can be in the form of a speech template or a statistical
model and can be applied to a sound (smaller than a word), a word, or a phrase.
In the pattern-comparison stage of the approach, a direct comparison is made
between the unknown speech (the speech to be recognized) and each possible
pattern learned in the training stage in order to determine the identity of the unknown
according to the goodness of match of the patterns.
Strengths
a) This system is usually quite fast, since once pattern comparison is effected the
system output is almost instantaneous. This makes timing quite good.
b) Cost is much lower. The requirements to implement this sort of system are
readily available and cheaper to a large extent.
c) It uses a well formulated mathematical framework and establishes consistent
speech pattern representations for reliable pattern comparison.
Weaknesses
a) Different accents produce different patterns, and thus the accuracy of the
system is reduced when the accent of the pre-recorded pattern differs
from that of the sample pattern.
III. Template Based Approaches
Unknown speech is compared against a set of pre-recorded words (templates) in order
to find the best match. A collection of prototypical speech patterns is stored as
reference patterns representing the dictionary of candidate words.
Recognition is then carried out by matching an unknown spoken utterance with each
of these reference templates and selecting the category of the best matching pattern.
Usually templates for entire words are constructed.
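In code, the selection step of a template based recognizer might be sketched as below (an illustrative sketch in C only, not this project's implementation; distance() stands in for whatever dissimilarity measure is chosen):

    #include <stddef.h>

    /* Illustrative template matching: pick the stored reference
     * pattern closest to the unknown utterance. distance() is a
     * placeholder for any dissimilarity measure (e.g. a DTW cost). */
    extern double distance(const double *a, const double *b, size_t len);

    static size_t best_template(const double *unknown,
                                const double *const *templates,
                                size_t num_templates, size_t len)
    {
        size_t best = 0;
        double best_cost = distance(unknown, templates[0], len);
        for (size_t i = 1; i < num_templates; i++) {
            double cost = distance(unknown, templates[i], len);
            if (cost < best_cost) {   /* keep the closest match so far */
                best_cost = cost;
                best = i;
            }
        }
        return best;                  /* index of the best matching word */
    }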
Strengths
a) Perfectly accurate word models are used.
b) Errors due to segmentation or classification of smaller acoustically more
variable units such as phonemes can be avoided.
c) Very efficient and cost effective for small systems with few words.
Weaknesses
a) Each word must have its own full reference template; template preparation and
matching become prohibitively expensive or impractical as vocabulary size
increases.
b) Pre-recorded templates are fixed, so variations in speech can only be modeled
by using many templates per word, which eventually becomes impractical.
c) While template based approaches have been very effective in the design
of a variety of speech recognition systems, they provide little insight into
human speech processing, thereby making error analysis and knowledge-based
system enhancement difficult [13].
IV. Dynamic Time Warping
Dynamic time warping is an algorithm for measuring similarity between two
sequences which may vary in time or speed. For instance, similarities in walking
patterns would be detected, even if in one video, the person was walking slowly and if
in another, he or she were walking more quickly, or even if there were accelerations
and decelerations during the course of one observation. A well known application has
been automatic speech recognition, to cope with different speaking speeds [16].
DTW is a method that allows a computer to find an optimal match between two given
sequences. The sequences are "warped" non-linearly in the time dimension to
determine a measure of their similarity independent of certain non-linear variations in
the time dimension.
The optimization process is performed using dynamic programming, hence the name.
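To make the recurrence concrete, the following is a minimal C sketch of the DTW cost computation, assuming one scalar feature per frame for brevity (a real recognizer would compare multi-dimensional feature vectors):

    #include <stdlib.h>
    #include <math.h>

    /* Accumulated DTW cost between sequences a (length n) and b
     * (length m), filled in by dynamic programming. */
    static double dtw_cost(const double *a, size_t n,
                           const double *b, size_t m)
    {
        double *d = malloc((n + 1) * (m + 1) * sizeof *d);
    #define D(i, j) d[(i) * (m + 1) + (j)]
        for (size_t i = 0; i <= n; i++)       /* only (0,0) starts     */
            for (size_t j = 0; j <= m; j++)   /* reachable, at cost 0  */
                D(i, j) = HUGE_VAL;
        D(0, 0) = 0.0;
        for (size_t i = 1; i <= n; i++)
            for (size_t j = 1; j <= m; j++) {
                double cost = fabs(a[i - 1] - b[j - 1]);
                double best = fmin(D(i - 1, j - 1),
                                   fmin(D(i - 1, j), D(i, j - 1)));
                D(i, j) = cost + best;  /* cheapest warped alignment */
            }
        double total = D(n, m);
    #undef D
        free(d);
        return total;
    }

A smaller total cost means the two utterances are more similar once timing differences are warped away.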
Strengths
a) Continuity is less important in DTW than in other pattern matching
algorithms.
b) DTW is an algorithm particularly suited to matching sequences with missing
information provided there are long enough segments for matching to occur.
c) DTW is quite efficient for isolated word recognition and can be adapted to
connected word recognition.
Weaknesses
a) Restrictions are imposed on the matching of the sequences.
b) The mapping is constrained to be monotonic in the time dimension.
c) A very complex mathematical and software development background is
required.
V. Knowledge Based Approaches
Expert knowledge about variations in speech is hand-coded into the system. This has
the advantage of explicitly modeling variations in speech; but unfortunately such expert
knowledge is difficult to obtain and use successfully. Thus this approach was judged
to be impractical, and automatic learning procedures were sought instead [13].
Since this method was rendered impractical, there is no need to dwell on its
strengths and weaknesses.
VI. Statistical Based Approaches
Variations in speech are modeled statistically, using automatic, statistical learning
procedures, typically Hidden Markov Models, or HMMs. This approach represents
the current state of the art.
Modern general-purpose speech recognition systems are based on Hidden Markov
Models. These are statistical models that output a sequence of symbols or quantities.
HMMs are used in speech recognition because a speech signal can be viewed as a
piecewise stationary signal or a short-time (10 ms) stationary signal. In this model,
each phoneme is like a link in a chain, and the completed chain is a word. However,
the chain branches off in different directions as the program attempts to match the
digital sound with the phoneme that's most likely to come next.
During this process, a program assigns a probability score to each phoneme, based on
its built-in dictionary and user training [17].
HMMs would output a sequence of n-dimensional real-valued vectors (with n being a
small integer, such as 10), outputting one of these every 10 milliseconds. This process
is even more complicated for phrases and sentences since the system has to figure out
where each word stops and starts [17].
These statistical systems need lots of exemplary training data to reach their optimal
performance. These training data are used to create acoustic models of words, word
lists, and multi-word probability networks. While the software developers who set up
the system's initial vocabulary perform much of this training, the end user must also
spend some time training it on their particular speech patterns.
They must also train the system to recognize terms and acronyms particular to their
setting. There are usually special editions of speech recognition programs for
certain environmental settings, such as medical or legal offices, with terms
commonly used in those fields already trained into them [17].
Decoding of the speech, the term for what happens when the system is presented with
a new utterance and must compute the most likely source sentence, would probably
use the Viterbi algorithm to find the best path, and here there is a choice between
dynamically creating a combination hidden Markov model, which includes both the
acoustic and language model information, and combining it statically beforehand (the
finite state transducer, or FST, approach) [16].
A possible improvement to decoding is to keep a set of good candidates instead of just
keeping the best candidate, and to use a better scoring function (rescoring) to rate
these good candidates so that we may pick the best one according to this refined
score. The set of candidates can be kept either as a list, the N-best list approach, or as
a subset of the models referred to as a lattice. Efficient algorithms have been devised
to rescore lattices represented as weighted finite state transducers [16].
Strengths
a) In the speech recognition area, HMMs have been applied with great success to
problems such as part-of-speech classification [13].
b) A well-tuned HMM generally provides better compression than other models,
allowing more sequences to be found as significant.
c) Quite efficient for random inputs.
d) They can be trained automatically and are computationally feasible to use.
Weaknesses
a) Statistical methods must make a priori modeling assumptions, which are
liable to be inaccurate, handicapping the system performance.
b) HMM algorithms are expensive, both in terms of memory and computing
time.
c) Very complex statistical knowledge is required that takes a lot of time and
dedication to master.
d) Coming up with effective working systems would most likely require steep
time investments.
VII. Learning Based Approaches
These are approaches introduced to overcome the disadvantages of the HMMs; they are
machine learning methods such as neural networks and genetic algorithm
programming. Here, explicit rules or other domain expert knowledge need not be
given; they can be learned automatically through emulation or an evolutionary process.
VIII. The Artificial Intelligence Approach
The artificial intelligence approach attempts to mechanize the recognition procedure
according to the way a person applies his or her intelligence in visualizing, analyzing, and
finally making a decision on the measured acoustic features. Expert systems are used
widely in this approach.
The artificial intelligence approach is a hybrid of the acoustic-phonetic approach and
the pattern recognition approach, exploiting the ideas and concepts of both
methods. In its pure form, knowledge engineering
design involves the direct and explicit incorporation of expert speech knowledge into
a recognition system. This knowledge is usually derived from careful study of
spectrograms and is incorporated using rules or procedures.
In more indirect forms, knowledge has also been used to guide the design of models
and algorithms of other techniques such as template matching and stochastic
modeling. This form of knowledge application makes an important distinction
between knowledge and algorithms. Algorithms enable us to solve problems.
Knowledge enables the algorithms to work better.
Strengths
a) This form of knowledge based system enhancement has contributed
considerably to the design of all successful strategies reported.
b) It plays an important role in the selection of a suitable input representation, the
definition of units of speech, or the design of the recognition algorithm itself.
Weaknesses
a) This approach had only limited success, largely due to the difficulty in
quantifying expert knowledge.
b) The integration of many levels of human knowledge (phonetics, phonotactics,
lexical access, syntax, semantics and pragmatics) is a big difficulty.
c) Combining independent and asynchronous knowledge sources optimally
remains an unsolved problem.
2.2 Selection of the Technique to Be Used
The finish line is a low cost, fully functioning implementation of a microcontroller based
speech recognition system that allows specific persons to perform simple tasks, such as
turning lights on/off and opening/closing an electric door, and that is simple and
achievable with the available resources. In light of the above
discussions, it seems fit to approach this project using the Template Based Approach,
bearing in mind that the processing power needed from the chip driving a
speech recognition system increases as the complexity of the chosen approach
increases. The processing power of locally available microcontrollers largely
befits only this approach.
This approach also has all required inputs readily available and affordable, not
forgetting the availability of materials and software development tools, together
with online debugging support, essential for the project.
2.3 Coding Theory
Coding theory is the study of the properties of codes and their fitness for a specific
application. Codes are used for data compression, cryptography, error correction and,
more recently, also for network coding. Codes are studied by various scientific
disciplines such as information theory, electrical engineering,
mathematics, linguistics, and computer science for the purpose of designing efficient
and reliable data transmission methods. This typically involves the removal of
redundancy and the correction (or detection) of errors in the transmitted data [22].
There are four types of coding, namely:
1. Data compression/Source coding
2. Error correction/ Channel coding
3. Cryptographic coding
4. Line coding
Source encoding attempts to compress the data from a source in order to transmit it
more efficiently, e.g. zip data compression for transmission of smaller files over a
network. Channel encoding adds extra data bits to make the transmission of data
more robust to disturbances present on the transmission channel; a typical music
CD uses this to correct for scratches and dust. In this application the transmission
channel is the CD itself. Cell phones also use coding techniques to correct for the
fading and noise of high frequency radio transmission.
The main area of interest in coding theory for this project is the error correction
aspect. The program employed for speech recognition would, for better accuracy in
execution, have to employ some error correction techniques.
2.3.1 Error Correction
Error correction is a technique that enables reliable delivery of digital data over
unreliable communication channels. Many communication channels are subject
to channel noise, and thus errors may be introduced during transmission.
Error detection techniques allow detecting such errors, while error correction
enables reconstruction of the original data in many cases [23].
An error is a deviation from a correct value, or an unauthorized change in the content
being transmitted, caused by a malfunction in a system or a functional unit
(the signal gets attenuated or overwhelmed by noise). An example would be the occurrence
of a wrong bit caused by an equipment malfunction. It is important to note that an
error can completely change the meaning of the transmitted data [24].
The type of error may be either a Single-Bit Error or a Burst Error. The term single-bit error
means that only one bit of a given data unit is changed from 1 to 0 or from 0 to 1; this is
the least likely type of error in serial data transmission, though higher
occurrence probabilities exist in parallel data transmission. A burst error, on the other
hand, is one where two or more bits of the data unit change from 1 to 0 or from 0 to 1,
not necessarily in consecutive bits. The length of the
burst is measured from the first corrupted bit to the last corrupted bit, and this is the
most likely type of error in serial data transmission.
Figure 1: Single-Bit Error
Figure 2: Burst Error
2.3.1.1 Error Detection Schemes
Error detection codes, including all error-detection-and-correction
codes, transmit more bits than were in the original data. The transmitter sends a fixed
number of original data bits, followed by a fixed number of check bits, also referred to
as redundancies, which are derived from the data bits by some deterministic
algorithm. The receiver applies the same algorithm to the received data bits and
compares its output to the received check bits; if the values do not match, an error has
occurred at some point during the transmission.
2.3.1.1.1 Repetition schemes
Given a stream of data that is to be sent, the data is broken up into blocks of bits, and
in sending, each block is sent some predetermined number of times. For example, if it
is required to send "1011", we may repeat this block three times. Suppose we
send "1011 1011 1011", and this is received as "1010 1011 1011". As one group is not
the same as the other two, we can determine that an error has occurred. This scheme is
not very efficient, and is susceptible to problems if the error occurs in exactly the
same place for each group: "1010 1010 1010" in the example above would be
detected as correct in this scheme [24].
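As an illustration, the detection above extends naturally to correction by a bitwise majority vote over the three received copies; the sketch below assumes 8-bit blocks and is not part of the project's code:

    #include <stdint.h>

    /* Each output bit takes the value appearing in at least two of the
     * three copies, which out-votes a single corrupted copy. */
    static uint8_t majority3(uint8_t r1, uint8_t r2, uint8_t r3)
    {
        return (r1 & r2) | (r1 & r3) | (r2 & r3);
    }
    /* Example: majority3(0x0A, 0x0B, 0x0B) returns 0x0B (binary 1011),
     * recovering the block from the example above. */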
2.3.1.1.2 Parity schemes
Also called Even Parity or Odd Parity, this is an error detection mechanism whereby
a parity bit is added to a group of source bits to ensure that the number of set bits (i.e.,
bits with value 1) in the outcome is even or odd [24]. The stream of data is broken up
into blocks of bits, and the number of 1 bits is counted. Then, a "parity bit" is set (or
cleared) if the number of one bits is odd (or even). If the tested blocks overlap, then
the parity bits can be used to isolate the error, and even correct it if the error affects a
single bit. There is a limitation to parity schemes in that a parity bit is only guaranteed
to detect an odd number of bit errors. If an even number of bits is flipped, the parity
bit appears to be correct, even though the data is corrupt.
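A minimal sketch of computing a single even-parity bit over one byte (illustrative only):

    #include <stdint.h>

    /* Returns 1 when the number of set bits in data is odd, so that
     * appending the bit makes the total count of 1s even. */
    static uint8_t even_parity_bit(uint8_t data)
    {
        uint8_t parity = 0;
        while (data) {
            parity ^= data & 1u;  /* toggle once for every 1 bit */
            data >>= 1;
        }
        return parity;
    }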
2.3.1.1.3 Checksum
A checksum of a message is an arithmetic sum of message code words of a certain
word length, for example byte values, and their carry value. The sum is negated by
one’s complement and stored or transferred as an extra code word, extending the
message. On the receiver side, a new checksum may be calculated from the extended
message. If the new checksum is not 0, an error is detected.
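A hedged sketch of such a one's-complement checksum, assuming byte-wide code words:

    #include <stdint.h>
    #include <stddef.h>

    /* Sum the message bytes, folding each carry back in, then negate. */
    static uint8_t checksum8(const uint8_t *msg, size_t len)
    {
        uint16_t sum = 0;
        for (size_t i = 0; i < len; i++) {
            sum += msg[i];
            sum = (sum & 0xFFu) + (sum >> 8);  /* end-around carry */
        }
        return (uint8_t)~sum;  /* extra code word appended to the message */
    }
    /* Running checksum8() over the message plus the appended checksum
     * byte yields 0, matching the zero test described above. */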
2.3.1.1.4 Cyclic redundancy checks
The cyclic redundancy check considers a block of data as the coefficients of a
polynomial and then divides by a fixed, predetermined polynomial. The coefficients
of the result of the division are taken as the redundant data bits, the CRC. On
reception, one can recompute the CRC from the payload bits and compare this with
the CRC that was received. A mismatch indicates that an error occurred.
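A bitwise sketch of this division is given below; the generator polynomial 0x07 (x^8 + x^2 + x + 1) is an assumed, illustrative choice rather than one specified by this project:

    #include <stdint.h>
    #include <stddef.h>

    /* Shift each message bit through the remainder of a polynomial
     * division by the generator; the final remainder is the CRC. */
    static uint8_t crc8(const uint8_t *msg, size_t len)
    {
        uint8_t crc = 0x00;
        for (size_t i = 0; i < len; i++) {
            crc ^= msg[i];
            for (int bit = 0; bit < 8; bit++) {
                if (crc & 0x80u)
                    crc = (uint8_t)((crc << 1) ^ 0x07); /* subtract divisor */
                else
                    crc = (uint8_t)(crc << 1);
            }
        }
        return crc;  /* appended on transmission; recomputed on reception */
    }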
2.3.1.1.5 Hamming distance based checks
This is the scheme of error correction applied for this project, mainly due to the
simplicity of its execution and its effectiveness in giving the system the intelligence
to distinguish commands issued to it. It basically works as follows: if the goal is to
detect d bit errors in an n bit word, we can map every n bit word into a
bigger n+d+1 bit word so that the minimum Hamming distance between each valid
mapping is d+1. This way, if one receives an n+d+1 bit word that does not match any
word in the mapping, it can successfully be detected as an error (such a word lies within
Hamming distance d of the transmitted word, so it cannot coincide with any other valid word).
2.3.1.1.5.1 Hamming Distance
The Hamming distance between two strings of equal length is the number of positions at
which the corresponding symbols are different. Put another way, it measures the
minimum number of substitutions required to change one string into the other, or the
minimum number of errors that could have transformed one string into the other [25].
As an example, the Hamming distance between:
• "karolin" and "kathrin" is 3 [25].
• "karolin" and "kerstin" is 3 [25].
• 1011101 and 1001001 is 2 [25].
• 2173896 and 2233796 is 3 [25].
This sort of analysis on data will be an integral part of the software developed for the
system, and it will be the basis of decision making for the purpose of distinguishing
commands in the running program.
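For illustration, the distance computation itself is compact; the sketch below assumes 8-bit command words (the project's actual routine may differ):

    #include <stdint.h>

    /* XOR marks the positions where the two words differ; counting the
     * set bits gives the Hamming distance. */
    static uint8_t hamming_distance(uint8_t a, uint8_t b)
    {
        uint8_t diff = a ^ b;
        uint8_t count = 0;
        while (diff) {
            count += diff & 1u;
            diff >>= 1;
        }
        return count;
    }
    /* Example: hamming_distance(0x5D, 0x49) returns 2, matching the
     * 1011101 vs 1001001 example above. */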
2.4 Literature of Main Components Implemented
2.4.1 Microcontroller
Figure 3: ATmega328P
Microcontrollers are basically small computers for embedded applications with
programmable capabilities and features such as input and output pins, serial
communication interfaces, memory for data storage (RAM), memory for program
storage and analog-to-digital converters. The number and availability of these and
other features vary from model to model, as does the programming language and
interface.
For the selection of which microcontroller to use for the project, not only are there a
number of technical features to consider; there are also economic issues, such as cost
and lead times, that can cripple a project. In making my selection for this project, I
started with the justifications that would rule out a part family.
It is important to note that most manufacturers’ products offer the same peripherals,
which makes the decision to pick a certain microcontroller an even bigger dilemma. Deadlines
can also influence MCU choice. Very short timelines make allocation of time to
learning a new architecture difficult, or even impossible, to justify. In that case, I
would conform to a family I have used before, or something similar.
At the start of a project it is never good practice to jump in and begin selecting a
microcontroller before the details of the system have been hashed out. Before any
thought is given to the microcontroller, the high level system block diagram
and flowchart must be determined; only then is there enough information to start
making a rational decision on microcontroller selection. When this point was reached,
the following were the justifications for selecting the ATmega328P as the MCU for
the project.
Required hardware interfaces: All the external interfaces such as communication
interfaces and digital inputs and outputs as well as analog to digital inputs that the
microcontroller will need to support were available for this family. These interfaces
dictate the number of pins that will be required by the microcontroller.
Software architecture: An estimate of how long and how often each task will need to
run, together with an order-of-magnitude feel for how much processing power will be
needed, is what was considered. The amount of computing power required and the
frequency of the microcontroller seemed to be well inside the allowable response times
expected from a microcontroller driving such a project. An 8-bit MCU such as the
ATmega328P could comfortably handle this task.
Memory Needs: Flash and RAM are two very critical components of any
microcontroller. Making sure program space and variable space are sufficient was a high
priority. The ATmega328P provides 32 kilobytes of memory for operation.
However, an external SD card could still be used to supplement any memory
requirements that might crop up, so the memory needs were well catered for.
Costs and Power Constraints: The ATmega328P is powered by a 5V Vcc, which
can easily be provided by batteries available at fairly affordable prices. The
chip’s overall cost is also affordable.
Part availability: This chip is readily available in the country and the institution
could easily provide it, so this was also a key point in settling for the ATmega328P.
Compilers and tools: The last consideration was to examine the compiler and tools
that are available. Most microcontrollers have a number of choices for compilers,
example code and debugging tools. All the necessary tools were available for the
part. Without the right tools the development process could have become tedious and
expensive.
Principle of Operation
The ATmega328P features a 10-bit successive approximation ADC. The ADC is
connected to an analog multiplexer which allows up to eight single-ended voltage
inputs, six of which are constructed from the pins of Port C. The single-ended voltage
inputs refer to 0V (GND).
The ADC converts an analog input voltage to a 10-bit digital value through successive
approximation. The minimum value represents GND and the maximum value
represents the voltage on the AREF pin minus 1 LSB. Optionally, AVCC or an
internal 1.1V reference voltage may be connected to the AREF pin by writing to the
REFSn bits in the ADMUX register. The internal voltage reference may then be
decoupled by an external capacitor at the AREF pin to improve noise immunity.
The analog input channel is selected by writing to the MUX bits
in ADMUX. Any of the ADC input pins, as well as GND and a fixed band gap
voltage reference, can be selected as single-ended inputs to the ADC.
The ADC generates a 10-bit result which is presented in the ADC Data Registers,
ADCH and ADCL. By default, the result is presented right adjusted, but can
optionally be presented left adjusted by setting the ADLAR bit in ADMUX. If the
result is left adjusted and no more than 8-bit precision is required, it is sufficient to
read ADCH. Otherwise, ADCL must be read first, then ADCH, to ensure that the
content of the Data Registers belongs to the same conversion. Once ADCL is read,
ADC access to data registers is blocked. This means that if ADCL has been read, and
a conversion completes before ADCH is read, neither register is updated and the
result from the conversion is lost. When ADCH is read, ADC access to the ADCH
and ADCL Registers is re-enabled.
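Following the register sequence described above, a minimal single-ended read might be sketched as follows (the reference, channel and prescaler choices are illustrative assumptions, not the project's exact settings):

    #include <avr/io.h>
    #include <stdint.h>

    static void adc_init(void)
    {
        ADMUX = (1 << REFS0);        /* AVCC as reference, channel 0 */
        ADCSRA = (1 << ADEN)         /* enable the ADC               */
               | (1 << ADPS2) | (1 << ADPS1) | (1 << ADPS0);
                                     /* ADC clock = F_CPU / 128      */
    }

    static uint16_t adc_read(uint8_t channel)
    {
        ADMUX = (ADMUX & 0xF0) | (channel & 0x0F); /* select channel */
        ADCSRA |= (1 << ADSC);       /* start the conversion         */
        while (ADCSRA & (1 << ADSC)) /* ADSC clears on completion    */
            ;
        uint8_t low  = ADCL;         /* ADCL must be read first,     */
        uint8_t high = ADCH;         /* then ADCH, as noted above    */
        return (uint16_t)(high << 8) | low;
    }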
Noise Cancellation Techniques
Digital circuitry inside and outside the device generates EMI which might affect the
accuracy of analog measurements. If conversion accuracy is critical, the noise level
can be reduced by applying the following techniques:
1. Keep analog signal paths as short as possible. Make sure analog tracks run
over the analog ground plane, and keep them well away from high-speed
switching digital tracks.
2. The AVCC pin on the device should be connected to the digital Vcc supply
voltage via an LC network as shown in Figure 106 of its data sheet.
3. Use the ADC noise canceller function to reduce induced noise from the CPU.
4. If any ADC port pins are used as digital outputs, it is essential that these do
not switch while a conversion is in progress.
2.4.2 Microphone
Figure 4: Condenser Microphone
An electret microphone is a type of electrostatic capacitor-based microphone, which
eliminates the need for a polarizing power supply by using a permanently charged
material [18].
An electret is a stable dielectric material with a permanently embedded static electric
charge (which, due to the high resistance and chemical stability of the material, will
not decay for hundreds of years). The name comes from "electrostatic" and "magnet",
drawing an analogy to the formation of a magnet by alignment of magnetic domains in a
piece of iron [18]. Electrets are commonly made by first melting a
suitable dielectric material such as a plastic or wax that contains polar molecules, and
then allowing it to re-solidify in a powerful electrostatic field. The polar molecules of
the dielectric align themselves to the direction of the electrostatic field, producing a
permanent electrostatic "bias" [18].
Electret Microphones come in three forms:
Foil-type or diaphragm-type
A film of electret material is used as the diaphragm itself. This is the most common
type, but also the lowest quality, since the electret material does not make a
particularly good diaphragm [18].
Back electret
An electret film is applied to the back plate of the microphone capsule and the
diaphragm is made of an uncharged material, which may be mechanically more
suitable for the transducer design being realized [18].
Front electret
In this newer type, the back plate is eliminated from the design, and the capacitor is
formed by the diaphragm and the inside surface of the capsule. The electret film is
adhered to the inside front cover and the metalized diaphragm is connected to the
input of the FET. It is equivalent to the back electret in that any conductive film may
be used for the diaphragm [18].
Microphone sensitivity is typically measured with a 1 kHz sine wave at a 94 dB sound
pressure level (SPL), or 1 Pascal (Pa) pressure. The magnitude of the analog or digital
output signal from the microphone with that input stimulus is a measure of its
sensitivity [19].
Typically specified in logarithmic units of dBV (decibels with respect to 1 V), it tells
how many volts the output signal will be for a given SPL. For an analog microphone,
sensitivity, in linear units of mV/Pa, can be expressed logarithmically in decibels:

Sensitivity (dBV) = 20 × log10( Sensitivity (mV/Pa) / Output_REF )

where Output_REF is the 1000 mV/Pa (1 V/Pa) reference output ratio.
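As a worked example of the formula (with an assumed sensitivity of 6.3 mV/Pa, a typical electret figure):

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        double mv_per_pa = 6.3;      /* assumed microphone sensitivity      */
        double output_ref = 1000.0;  /* 1000 mV/Pa (1 V/Pa) reference ratio */
        double dbv = 20.0 * log10(mv_per_pa / output_ref);
        printf("Sensitivity: %.1f dBV\n", dbv);  /* about -44.0 dBV */
        return 0;
    }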
A high sensitivity microphone isn’t always better than a low sensitivity microphone.
Sensitivity tells something about the characteristics of the microphone but not
necessarily about its quality [19]. A balance between the microphone’s noise level,
clipping point, distortion, and sensitivity determines whether a microphone is a good
fit for a particular application.
A microphone with high sensitivity may need less preamp gain before the analog-to-
digital conversion, but it may have less headroom before clipping than a microphone
with lower sensitivity [19].
In near-field applications, such as cell phones, where the microphone is close to the
sound source, a microphone with higher sensitivity is more likely to reach the
maximum acoustic input, clip, and cause distortion.
On the other hand, a higher sensitivity may be desirable in far-field applications, such
as conference phones and security cameras, where the sound is attenuated as the
distance from the source to the microphone increases [19].
Chapter 3
DESIGN & IMPLEMENTATION
3.1 System Operation
The speech recognition system may be viewed as working in four stages:
1. Analysis
2. Feature extraction
3. Modeling
4. Testing
3.1.1 Analysis
Speech data contains speaker-specific information due to the vocal tract, excitation source
and behavioral features of the speaker, which makes the data unique in each instance.
This information embedded in the signal is analyzed for speech recognition.
3.1.2 Feature Extraction
In order to find some statistically relevant information from incoming data, it is
important to have mechanisms for reducing the information of each segment in the
audio signal into a relatively small number of parameters, or features. These features
should describe each segment in such a characteristic way that other similar segments
can be grouped together by comparing their features.
The information in a speech signal is represented by the short-term amplitude spectrum of
the speech waveform. This allows us to extract features based on the short-term
amplitude spectrum from speech (phonemes). Digital signal processing (DSP)
techniques are the core of this speech recognition system. DSP methods are used in
speech analysis, synthesis, coding and recognition.
This is all executed by input signal pre-amplification, followed by analogue to
digital conversion by the microcontroller (MCU) and a software implementation that
achieves recognition through carrying out microcontroller PORT logic and employing
knowledge of the Hamming distance to increase accuracy.
Theoretically, it should be possible to recognize speech directly from the digitized
waveform. However, because of the large variability of the speech signal, it is better
to perform some feature extraction that would reduce that variability.
3.1.3 Modeling
The speech recognition modeling techniques are divided into two classifications:
1. Speaker Identification: Automatically identifies who is speaking on the basis
of individual information integrated in the speech signal. The main aim of speaker
identification is to compare speech from an unknown speaker to a database of known
speakers. The system can recognize the speaker when it has been trained with a
number of speakers.
2. Speaker Recognition: This aspect of the field can be looked at from the
following four angles: speaker independent, speaker dependent, text dependent and
text independent. Whereby:
• Speaker Independent - The computer should ignore the speaker-specific
characteristics of the speech signal and extract the intended message.
• Speaker Dependent - The machine should extract the speaker’s characteristics from
the acoustic signal.
• Text dependent - The speaker says key words or sentences having the same
text for both training and recognition trials.
• Text independent - The system does not rely on a specific text being spoken.
This system is a Speaker Dependent Speech Recognition System.
3.1.4 Testing
The testing phase and matching go hand in hand. In this phase we test whether a
processed output is in line with the required output. For this system, a conclusion of
SUCCESS is reached by the correct selection of a system output (load) as intended by
the user.
System Block Diagram

[Block diagram: microphone voice input from user → input signal amplification → speech analysis → control signal to the microcontroller → microcontroller identifies the load → loads (BLUE LED, GREEN LED)]
3.2 Circuits & Mathematical Descriptions
The amplification employed is op-amp based. The main purpose of this circuit is pre-
amplification, i.e. to make weak signals strong enough for further processing.
The microcontroller employed for the circuit is the ATmega328P microcontroller of
the AVR series by ATMEL. This microcontroller is the intelligence of the whole
system, and its operation controls the activities of the system. The system
utilizes the inbuilt analogue to digital conversion capabilities of this chip to convert
an analogue speech signal to an equivalent digital capture and store it in one of its
registers as the sample that shall be processed for EEPROM storage, as a template for
comparison with the input signal that arrives as a user uses the system.
Principles Employed For Circuit Design.
Baring in mind any extended DC voltage alters the biasing conditions and affects
performance, capacitor C1 was placed as the input Microphone coupling capacitor
which blocks any DC component if present. Cout prevents any DC voltage from
entering into the succeeding stage hence preventing clamping of this output signal by
the DC level present at the OP AMP output. R3 and R4 provide necessary voltage to
drive the OP AMP, whereby we require that VCC drops before the input. R1 is just a
microphone load resistor.
Figure 5: Amplifier First Stage
Now, the output of an electret microphone, as specified by the manufacturer’s data
sheet, usually falls in the range of millivolts. This means an amplification
gain of about 1000 is required. Obviously, amplifying this signal with this gain directly may
render the system disastrous, since the accompanying noise will also be amplified by
this gain. To curb this, a two stage amplifier is employed, with the first stage having a gain
of 10, followed by a low pass filter which then acts as input to a second stage of
amplification having a gain of 100, whose output is finally fed to the microcontroller
ADC input.
For the first stage (gain of 10):
Choosing Rf = 100 kΩ,
Gain = Rf / Rin  →  Rin = 100 kΩ / 10 = 10 kΩ

For the second stage (gain of 100):
Choosing Rf = 100 kΩ,
Gain = Rf / Rin  →  Rin = 100 kΩ / 100 = 1 kΩ
Figure 6: Amplifier Second Stage
Condition Five: The main consideration in choosing capacitor C1 value is to ensure
that its capacitive reactance is low enough, compared with the input impedance of the
amplifier, or any load connected to the output, to allow signals at all the required
frequencies to pass. The reactance of a capacitor is greatest at low frequencies,
therefore the choice of coupling capacitor values must allow for a low reactance at the
lowest frequencies the amplifier is designed to amplify. A generally accepted value
for coupling capacitors in an audio amplifier would be between 1µF and 10µF.
For our system we know that an electret microphone has an operating frequency
response of about 0–30 kHz.
Figure 7: Low Pass Filter
For the active filter design, the following principles and procedures were used.
Choosing C5 = 0.47;
Calculating C6 = 2 × C5 ≅ 1
Calculating R8 = R9 = √2 / (4π × fc × C5) ≅ 10 kΩ
C4 = C8 = 100 to 1000 times C1 (not critical) = 220
Figure 8: Circuit Diagram
3.3 Software Development

Figure 9: Overall Program Flowchart

[Flowchart summary: the program first trains the system, prompting the user via the LCD to speak and store the BLUE, GREEN and OFF commands in turn (displaying NO MATCH, TRY AGAIN and repeating a step on failure, and confirming each command as it is stored). Once SYSTEM READY is displayed, each spoken input is tested: recognized speech switches on the BLUE LED, then the GREEN LED, then switches off all LEDs, after which SYSTEM TASK ACHIEVED is displayed.]
Figure 10: Speech Recognition Process Flowchart

[Flowchart summary: speech input from the microphone is compared with the stored template by measuring the Hamming distance; a distance less than 3 yields SPEECH RECOGNISED, while a distance greater than 3 displays the NO MATCH message.]
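In C, the decision block of Figure 10 might be sketched roughly as follows; capture_features() is a hypothetical helper standing in for the sampling and feature reduction steps, and the routine is a simplified illustration rather than the project's actual code:

    #include <avr/eeprom.h>
    #include <stdint.h>

    #define HAMMING_THRESHOLD 3  /* per Figure 10: distance < 3 is a match */

    /* Hypothetical: samples one utterance via the ADC and reduces it
     * to an 8-bit feature word. */
    extern uint8_t capture_features(void);
    /* Bit-counting routine as sketched in Chapter 2. */
    extern uint8_t hamming_distance(uint8_t a, uint8_t b);

    /* Compare a fresh utterance against a command template stored in
     * EEPROM and report whether it matches. */
    static uint8_t command_matches(const uint8_t *eeprom_addr)
    {
        uint8_t template_bits = eeprom_read_byte(eeprom_addr);
        uint8_t input_bits = capture_features();
        return hamming_distance(input_bits, template_bits)
               < HAMMING_THRESHOLD;
    }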
Chapter 4
ANALYSIS & RESULTS
4.1 Microphone and Amplifier Tests
The system was connected up to the output stage of the filter of the second stage
amplifier, i.e. the system circuit excluding the microcontroller and the loads connected
to the system, with the output fed to the channel two input of a digital oscilloscope for
preliminary analysis. There was also provision for the output of the system power
supply to be fed to channel one of the same oscilloscope, to ensure the system was
receiving the required power supply throughout the tests. Two test speech samples
containing the same spoken word "TESTING" were analyzed: the first a pre-recorded
speech sample obtained using a mobile phone for the voice recording, and the second
the same word spoken live during the experiment. The following results were obtained:

Oscilloscope Settings
Channel One
Volts per Division: 1V
Time base or Time/Division: 2ms
Coupling: DC
Impedance: 1MΩ
Channel Two
Volts per Division: 1V
Time base or Time/Division: 2ms
Coupling: AC
Impedance: 1MΩ
Table 1: Results from Recorded Speech

ITERATION      AMPLITUDE (V p-p)
First Try      1.8
Second Try     2.3
Third Try      2.69

[The WAVEFORM column of oscilloscope captures is not reproduced here.]
Table 2: Results from Spoken Speech

ITERATION      AMPLITUDE (V p-p)
First Try      3.03
Second Try     2.01
Third Try      2.99

[Waveform captures not reproduced.]
NB: When no input was applied to the microphone, a voltage output of 986 mVp-p was
measured at the amplifier output. This was recorded as the fairly-silent measurement,
since the lab was open air and there was no total silence in it.
The above tests conformed to the expected amplifier specifications, since an output in
the range of volts was measured after a microphone input in the range of millivolts
was fed in. This sufficiently proved that the two stage audio amplifier was properly
designed to give a combined gain of 1000, which was the requirement necessary for a
signal to sufficiently undergo analogue to digital conversion on an ATmega328P as
per the manufacturer’s data sheet.
4.2 Microcontroller Tests
The ATmega328P employed for the system was first tested for any timing issues,
ensuring that all initial requirements of setting up a new microcontroller, such as
burning of fuse bits, were done properly.
The microcontroller was powered, and a simple program implementing the blinking of one
of the LEDs every five seconds was programmed to it.
The LED was observed to blink every five seconds as expected from the small
program’s specifications, signaling that the clock requirements were properly set up.
4.3 Speech Recognition System Tests
The code (see Appendix E) was compiled and programmed to the microcontroller. The
system was powered on and the following observations were made when a short
spoken speech sample was issued to the system:
• The LCD screen came on and displayed the “WELCOME” message on the
first row followed by “STORE BLUE COMMAND” on the second row.
• The RED LED came on to alert the user to be ready for recording (training) the
templates that would be stored in EEPROM for feature comparison processes.
• The system, through the LCD screen, prompted the user sequentially to store the
commands for ‘TURNING ON BLUE LED’, then ‘GREEN’, and finally
‘TURN OFF’ for both LEDs, i.e. BLUE and GREEN.
• After successful storage, the system confirmed to the user through the LCD
screen that all three commands had been stored in the EEPROM.
• The system then prompted the user to start operating the system by issuing
commands to sequentially light the BLUE LED, followed by the GREEN LED, and
finally turn OFF both LEDs at once.
• Once the above process terminated, the LCD screen displayed “FINAL
SPEECH RECOGNISED” thereby signaling the end of the system’s process.
• Pressing a push button prompted the system to restart this process all over
again from the point it displayed the “WELCOME” message.
These observations resulted from the
algorithm implemented in the code for the system. The microcontroller performed
each task it was programmed to do very well, at the specified delays, as per the
software installed.
It is important to note that the system seemed to be unstable when commands longer than
three seconds were issued to it. This was a result of the accuracy of the error
correction technique used by the system: Hamming distance has a fairly low degree of
accuracy in this operation.
The system also forced the user to do its tasks in a sequential manner as per the program,
and any deviation from the algorithm was NOT recognized; the user was prompted to
RESTART issuing the lighting commands from the point where the system prompts
"SPEAK IF RED LED ON".
When the code (see Appendix C) was compiled and programmed to the
microcontroller, with the system powered on, and recorded speech samples were
issued to the system instead of spoken ones, the same observations as above were
seen, but with higher accuracy. Accuracy here is the number of tries it required a
user to move through the steps of lighting the lights. With spoken speech, a user had
to try about six times to be able to successfully operate the system to the end.
With recorded speech, it took four tries, and a user could move through the whole
sequence of lighting the two LEDs and putting them both off.
Percentage Difference

% difference = (difference in tries ÷ total tries) × 100
             = ((6 − 4) ÷ (6 + 4)) × 100
             = 20%
Chapter 5
CONCLUSIONS &
RECOMMENDATIONS
Conclusion: Feasibility
With a lot of programming and some knowledge of error correction, it is possible to
implement a microcontroller based speech recognition system using a simple 8-bit IC
such as the ATmega328P.
Recommendation: A strong background in programming must be
attained before attempting this implementation.
Conclusion: Cost
A simple microcontroller is a powerful device that can give you value for your money
by enabling you to build a complex system doing powerful tasks at affordable cost. The
project amounted to a total cost of about Kshs 1,500.
Recommendation: More emphasis should be placed on the many
opportunities such a small device as a microcontroller can open up to willing minds.
The possibilities are limited only by one’s own imagination.
Conclusion: Project Specifications
The project objective and specifications were successfully met. However, this was not an easy feat, owing to the limited range of microcontrollers available locally once their features are taken into account. Implementing the system on an 8-bit microcontroller was tedious because of the limitations it presents, and it was not possible to acquire a microcontroller that could do a better job; this contributed to the inaccuracies of the system.
Recommendation: Future project specifications can be matched to devices that are readily available, or the device family driving the project can be changed to a more capable one altogether, for example moving a task from a microcontroller implementation to a microprocessor implementation.
References
[1] R. Klevans and R. Rodman, "Voice Recognition", Artech House, Boston, London, 1997.
[2] K. Brady, M. Brandstein, T. Quatieri and B. Dunn, "An Evaluation of Audio-Visual Person Recognition on the XM2VTS Corpus Using the Lausanne Protocol", MIT Lincoln Laboratory, 244 Wood St., Lexington, MA.
[3] W. M. Campbell, D. E. Sturim, W. Shen, D. A. Reynolds and J. Navrátil, "The MIT-LL/IBM Speaker Recognition System Using High Performance Reduced Complexity Recognition", MIT Lincoln Laboratory / IBM, 2006.
[4] G.-D. Wu and Y. Lei, "A Register Array Based Low Power FFT Processor for Speech Recognition", Department of Electrical Engineering, National Chi Nan University, Puli, 545 Taiwan.
[5] N. Morales, J. H. L. Hansen and D. T. Toledano, "MFCC Compensation for Improved Recognition of Filtered and Band-Limited Speech", Center for Spoken Language Research, University of Colorado at Boulder, Boulder (CO), USA.
[6] M. A. Anusuya and S. K. Katti, "Speech Recognition by Machine: A Review", International Journal of Computer Science and Information Security, 2009.
[7] K. T. Schutte, "Parts-Based Models and Local Features for Automatic Speech Recognition", B.S., University of Illinois at Urbana-Champaign (2001); S.M., Massachusetts Institute of Technology (2003).
[8] M. Sannella, "Speaker Recognition Project Report". From http://cs.joensuu.fi/pages/tkinnu/research/index.html. Viewed 23 Feb. 2010.
[9] S. Katagiri, "Speech Pattern Recognition Using Neural Networks".
[10] L. R. Rabiner and B. H. Juang, "Fundamentals of Speech Recognition", Prentice-Hall, Englewood Cliffs, New Jersey, 1993.
[11] D. R. Reddy, "An Approach to Computer Speech Recognition by Direct Analysis of the Speech Wave", Tech. Report No. C549, Computer Science Dept., Stanford Univ., September 1966.
[12] International Journal of Computer Applications (0975-8887), Volume 10, No. 3, November 2010.
[13] S. K. Gaikwad, B. W. Gawali and P. Yannawar, "A Review on Speech Recognition Technique", Department of CS & IT, Dr. Babasaheb Ambedkar Marathwada University, Aurangabad (2010).
[14] U. Shrawankar and V. Thakare, "Techniques for Feature Extraction in Speech Recognition System: A Comparative Study", Dept. of Computer Science, SGB Amravati University, Amravati.
[15] C. Y. Espy-Wilson, "An Acoustic-Phonetic Approach to Speech Recognition: Application to the Semivowels", Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA (1987).
[16] "Speech recognition," in Wikipedia, Wikimedia Foundation, 2016. [Online]. Available: https://en.wikipedia.org/wiki/Speech_recognition. Accessed: Apr. 6, 2016.
[17] E. Grabianowski, "How speech recognition works," HowStuffWorks, 2006. [Online]. Available: http://electronics.howstuffworks.com/gadgets/high-tech-gadgets/speech-recognition2.htm. Accessed: Apr. 6, 2016.
[18] "Electret microphone," in Wikipedia, Wikimedia Foundation, 2016. [Online]. Available: https://en.wikipedia.org/wiki/Electret_microphone. Accessed: Apr. 6, 2016.
[19] J. Lewis, "Understanding microphone sensitivity". [Online]. Available: http://www.analog.com/library/analogDialogue/archives/46-05/understanding_microphone_sensitivity.html. Accessed: Apr. 6, 2016.
[20] B. Carter, "Filter Design in Thirty Seconds", Texas Instruments, Dallas, Texas 75265 (2001).
[21] "LM358," in Wikipedia, Wikimedia Foundation, 2016. [Online]. Available: https://en.wikipedia.org/wiki/LM358. Accessed: Apr. 6, 2016.
[22] "Coding theory," in Wikipedia, Wikimedia Foundation, 2016. [Online]. Available: https://en.wikipedia.org/wiki/Coding_theory. Accessed: May 8, 2016.
[23] S. Inc, "Error Detection and Correction Using Hamming Code," Scribd, 2016. [Online]. Available: https://www.scribd.com/doc/25715552/ERROR-DETECTION-AND-CORRECTION-USING-HAMMING-CODE. Accessed: May 8, 2016.
[24] "Error detection and correction," in Wikipedia, Wikimedia Foundation, 2016. [Online]. Available: https://en.wikipedia.org/wiki/Error_detection_and_correction. Accessed: May 8, 2016.
[25] "Hamming distance," in Wikipedia, Wikimedia Foundation, 2016. [Online]. Available: https://en.wikipedia.org/wiki/Hamming_distance. Accessed: May 8, 2016.
Appendix A
BILL OF MATERIALS
Appendix B
PCB Schematic Drawing Using Microwin
Appendix C
PCB Schematic
Appendix D
ATmega328P Overall Features
• High-performance, Low-power AVR® 8-bit Microcontroller
• Advanced RISC Architecture
- 131 Powerful Instructions – Most Single Clock Cycle Execution
- 32 x 8 General Purpose Working Registers
- Fully Static Operation
- Up to 20 MIPS Throughput at 20MHz
- On-chip 2-cycle Multiplier
• High Endurance Non-volatile Memory Segments
- 4/8/16/32KBytes of In-System Self-Programmable Flash program memory
- 256/512/512/1KBytes EEPROM
- 512/1K/1K/2KBytes Internal SRAM
- Write/Erase Cycles: 10,000 Flash / 100,000 EEPROM
- Data retention: 20 years at 85°C / 100 years at 25°C
- Optional Boot Code Section with Independent Lock Bits; In-System Programming by On-chip Boot Program; True Read-While-Write Operation
- Programming Lock for Software Security
• Atmel® QTouch® Library Support
- Capacitive touch buttons, sliders and wheels
- QTouch and QMatrix® acquisition
- Up to 64 sense channels
• Peripheral Features
- Two 8-bit Timer/Counters with Separate Prescaler and Compare Mode
- One 16-bit Timer/Counter with Separate Prescaler, Compare Mode, and Capture Mode
- Real Time Counter with Separate Oscillator
- Six PWM Channels
- 8-channel 10-bit ADC in TQFP and QFN/MLF package (with Temperature Measurement)
- 6-channel 10-bit ADC in PDIP Package (with Temperature Measurement)
- Programmable Serial USART
- Master/Slave SPI Serial Interface
- Byte-oriented 2-wire Serial Interface (Philips I2C compatible)
- Programmable Watchdog Timer with Separate On-chip Oscillator
- On-chip Analog Comparator
- Interrupt and Wake-up on Pin Change
• Special Microcontroller Features
- Power-on Reset and Programmable Brown-out Detection
- Internal Calibrated Oscillator
- External and Internal Interrupt Sources
- Six Sleep Modes: Idle, ADC Noise Reduction, Power-save, Power-down, Standby, and Extended Standby
• I/O and Packages
- 23 Programmable I/O Lines
- 28-pin PDIP, 32-lead TQFP, 28-pad QFN/MLF and 32-pad QFN/MLF
• Operating Voltage: 1.8 – 5.5V
• Temperature Range: -40°C to 85°C
• Speed Grade: 0 – 4MHz @ 1.8 – 5.5V, 0 – 10MHz @ 2.7 – 5.5V, 0 – 20MHz @ 4.5 – 5.5V
• Power Consumption at 1MHz, 1.8V, 25°C
- Active Mode: 0.2mA
- Power-down Mode: 0.1µA
- Power-save Mode: 0.75µA (Including 32kHz RTC)
Pin Descriptions
ADC7:6 (TQFP and QFN/MLF Package Only)
In the TQFP and QFN/MLF package, ADC7:6 serve as analog inputs to the A/D
converter. These pins are powered from the analog supply and serve as 10-bit ADC
channels.
Port B (PB7:0) XTAL1/XTAL2/TOSC1/TOSC2
Port B is an 8-bit bi-directional I/O port with internal pull-up resistors (selected for
each bit). The Port B output buffers have symmetrical drive characteristics with both
high sink and source capability. As inputs, Port B pins that are externally pulled low
will source current if the pull-up resistors are activated. The Port B pins are tristated
when a reset condition becomes active, even if the clock is not running.
Depending on the clock selection fuse settings, PB6 can be used as input to the
inverting Oscillator amplifier and input to the internal clock operating circuit.
Depending on the clock selection fuse settings, PB7 can be used as output from the
inverting Oscillator amplifier.
If the Internal Calibrated RC Oscillator is used as chip clock source, PB7...6 is used as
TOSC2...1 input for the Asynchronous Timer/Counter2 if the AS2 bit in ASSR is set.
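As a brief illustration of the internal pull-ups described above (a generic AVR sketch with an assumed active-low input on PB0, not part of the project code):

    #include <avr/io.h>

    int main(void)
    {
        DDRB &= ~(1 << DDB0);   // configure PB0 as an input
        PORTB |= (1 << PORTB0); // enable PB0's internal pull-up resistor

        while (1)
        {
            if (!(PINB & (1 << PINB0)))
            {
                // PB0 has been pulled low externally (e.g. a switch to ground)
            }
        }
    }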
AVcc
AVcc is the supply voltage pin for the A/D Converter, PC3:0, and ADC7:6. It should
be externally connected to Vcc, even if the ADC is not used. If the ADC is used, it
should be connected to Vcc through a low-pass filter. Note that PC6...4 use the digital
supply voltage, Vcc.
AREF
AREF is the analog reference pin for the A/D Converter.
Analogue to Digital Converter Features
• 10-bit Resolution
• 0.5 LSB Integral Non-linearity
• ±2 LSB Absolute Accuracy
• 13 - 260 µs Conversion Time
• Up to 76.9kSPS (Up to 15kSPS at Maximum Resolution)
• 6 Multiplexed Single Ended Input Channels
• 2 Additional Multiplexed Single Ended Input Channels (TQFP and QFN/MLF
Package only)
• Temperature Sensor Input Channel
• Optional Left Adjustment for ADC Result Readout
• 0 – VCC ADC Input Voltage Range
• Selectable 1.1V ADC Reference Voltage
• Free Running or Single Conversion Mode
• Interrupt on ADC Conversion Complete
• Sleep Mode Noise Canceler
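To relate these features to the way the project code samples the microphone, here is a minimal single-conversion sketch for the ATmega328P (AVcc reference and polling instead of interrupts; the function name and channel argument are illustrative, not taken from Appendix E):

    #include <avr/io.h>
    #include <stdint.h>

    /* Read one 10-bit sample from the given ADC channel (0-7). */
    static uint16_t adc_read(uint8_t channel)
    {
        ADMUX = (1 << REFS0) | (channel & 0x0F); // AVcc reference, select channel
        ADCSRA = (1 << ADEN)                     // enable the ADC
               | (1 << ADPS2) | (1 << ADPS1) | (1 << ADPS0); // prescaler 128: 16MHz/128 = 125kHz
        ADCSRA |= (1 << ADSC);                   // start a single conversion
        while (ADCSRA & (1 << ADSC))             // ADSC clears when the conversion completes
            ;
        return ADC;                              // combined ADCL/ADCH 10-bit result
    }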
APPENDIX E
Code:
/*
 * Final Project Code.c
 *
 * Created: 14-May-16 10:38:18 AM
 * Author : F17/39705/2011
 */
#ifndef F_CPU
#define F_CPU 16000000UL // Define clock frequency
#endif
#include <avr/io.h>        // Input/Output registers and their bits for AVR microcontrollers
#include <util/delay.h>    // _delay_ms (millisecond delay) and _delay_us (microsecond delay)
#include <avr/eeprom.h>    // EEPROM manipulation functions
#include "lcd.h"           // Alphanumeric LCD driver functions
#include <avr/interrupt.h> // ISR definitions and interrupt control
uint8_t volatile start_recording = 0;
uint8_t start_second_recording = 0;
uint8_t start_third_recording = 0;
uint8_t start_listening = 0;
uint8_t nextGreen = 0;
uint8_t nextOFF = 0;

// Global variables
uint8_t theLowADC;
uint16_t theTenBitResults;
volatile uint8_t tot_overflow;

void timer1_init();

// Function to set up the ADC registers
void adc_enable(void)
{
    ADCSRA = (1 << ADPS2) | (1 << ADPS1) | (1 << ADPS0); // Prescaler = 128, so F_ADC = 16M/128 = 125kHz
    ADMUX = (1 << REFS0);  // AREF = AVcc
    ADMUX |= (1 << MUX2);  // select ADC channel 4 (OR in, so the reference bits are kept)
    ADCSRA |= 1 << ADIE;   // ADC interrupt enable
    ADCSRA |= 1 << ADEN;   // Enable ADC
}
// The interrupt service routine for the ADC
ISR(ADC_vect)
{
    theLowADC = ADCL;
    theTenBitResults = ADCH << 8 | theLowADC;
    ADCSRA |= 1 << ADSC; // Start the next conversion
}
int main(void)
{
    DDRC = 0b00000000; // port C as inputs (push button and ADC channel)
    DDRD = 0b11111111; // initialize port D as outputs
    DDRB = 0b00000111; // initialize port B: PB0-PB2 as outputs (LEDs)

    lcd_init(LCD_DISP_ON_CURSOR); /* initialize lcd, display on, cursor on */
    lcd_clrscr();                 /* clear screen of lcd */
    lcd_home();                   /* bring cursor to 0,0 */
    lcd_puts("WELCOME");          /* display message */
    lcd_gotoxy(0,1);              /* go to 2nd row 1st col */
    lcd_puts("STORE BLUE CMND");  /* display message */
    _delay_ms(3000);              /* wait 3 seconds */
    //timer1_init();
    start_recording = 1;
    while (1)
    {
        if (!(PINC & (1 << PINC5))) // push button (active low) pressed: restart the process
        {
            start_recording = 1;
            PORTB = 0b11111110; // RED LED on.
        }

        if (start_recording == 1)
        {
            PORTB = 0b11111111; // RED LED off.
            lcd_init(LCD_DISP_ON_CURSOR); /* initialize lcd, display on, cursor on */
            lcd_clrscr();                 /* clear screen of lcd */
            lcd_home();                   /* bring cursor to 0,0 */
            lcd_puts("BLUE COMMAND");     /* display message */
            lcd_gotoxy(0,1);              /* go to 2nd row 1st col */
            lcd_puts("STORED");           /* display message */
            _delay_ms(3000);              /* wait 3 seconds */
            start_second_recording = 1;
        }
        if (start_second_recording == 1) // LCD has displayed "blue command stored"
        {
            start_second_recording = 0;
            lcd_init(LCD_DISP_ON_CURSOR);
            lcd_clrscr();
            lcd_home();
            lcd_puts("NEXT");
            lcd_gotoxy(0,1);
            lcd_puts("STORE GREEN COMMAND");
            _delay_ms(3000); // display "Next: Store Green Command"

            PORTB = 0b11111110; // RED LED on.
            _delay_ms(5000);    /* wait 5 s while the command is spoken */
            PORTB = 0b11111111; // RED LED off.

            adc_enable();
            sei();               // Enable global interrupts
            ADCSRA |= 1 << ADSC; // Start conversion

            uint16_t WordOfData_B;           // Define type and size of data
            WordOfData_B = theTenBitResults;
            eeprom_update_word((uint16_t *) 2, WordOfData_B); // update EEPROM location 2

            tot_overflow++;        // keep track of the number of overflows
            if (tot_overflow >= 2) // the write needs ~3.4ms; the extra overflows give the
            {                      // EEPROM time to finish before we signal the next stage
                start_third_recording = 1;
            }
        }
        if (start_third_recording == 1) // LCD has displayed "green command stored"
        {
            start_third_recording = 0;
            lcd_init(LCD_DISP_ON_CURSOR);
            lcd_clrscr();
            lcd_home();
            lcd_puts("NEXT");
            lcd_gotoxy(0,1);
            lcd_puts("STORE OFF COMMAND");
            _delay_ms(3000);

            PORTB = 0b11111110; // RED LED on.
            _delay_ms(5000);    /* wait 5 s while the command is spoken */
            PORTB = 0b11111111; // RED LED off.

            adc_enable();
            sei();               // Enable global interrupts
            ADCSRA |= 1 << ADSC; // Start conversion

            //void eeprom3 (void)
            uint16_t WordOfData_C;           // Define type and size of data
            WordOfData_C = theTenBitResults;
            eeprom_update_word((uint16_t *) 4, WordOfData_C); // update EEPROM location 4

            tot_overflow++;        // keep track of the number of overflows
            if (tot_overflow >= 2) // allow time for the ~3.4ms EEPROM write to finish
            {
                start_listening = 1;
            }
        }
        if (start_listening == 1)
        {
            start_listening = 0;
            lcd_init(LCD_DISP_ON_CURSOR);
            lcd_clrscr();
            lcd_home();
            lcd_puts("SYSTEM READY");
            lcd_gotoxy(0,1);
            lcd_puts("SPEAK IF RED OFF");
            _delay_ms(3000);

            PORTB = 0b11111110; // RED LED on.
            _delay_ms(5000);    /* wait 5 s while the command is spoken */
            PORTB = 0b11111111; // RED LED off.

            adc_enable();
            sei();               // Enable global interrupts
            ADCSRA |= 1 << ADSC; // Start conversion

            uint16_t WordOfData_D;
            WordOfData_D = eeprom_read_word((uint16_t *) 0); // stored BLUE template

            // Compute the Hamming distance between the template and the new sample
            int dist = 0;
            unsigned val = WordOfData_D ^ theTenBitResults;
            // Count the number of bits set
            while (val != 0)
            {
                // A bit is set, so increment the count and clear the bit
                dist++;
                val &= val - 1;
            }

            if (dist <= 3) // match: at most 3 differing bits
            {
                PORTB = 0b11111101; // BLUE LED on
                lcd_init(LCD_DISP_ON_CURSOR); /* initialize lcd, display on, cursor on */
                lcd_clrscr();                 /* clear screen of lcd */
                lcd_home();                   /* bring cursor to 0,0 */
                lcd_puts("SPEECH");
                lcd_gotoxy(0,1);
                lcd_puts("RECOGNISED");
                _delay_ms(3000);
                nextGreen = 1;
            }
            else
            {
                PORTB = 0b11111111; // all LEDs off
                lcd_init(LCD_DISP_ON_CURSOR);
                lcd_clrscr();
                lcd_home();
                lcd_puts("NO MATCH");
                lcd_gotoxy(0,1);
                lcd_puts("TRY AGAIN");
                _delay_ms(3000);
                start_listening = 1; // listen for the BLUE command again
            }
        }
        if (nextGreen == 1)
        {
            uint16_t WordOfData_E;
            WordOfData_E = eeprom_read_word((uint16_t *) 2); // stored GREEN template

            // Evaluate the Hamming distance
            int dist = 0;
            unsigned val = WordOfData_E ^ theTenBitResults;
            // Count the number of bits set
            while (val != 0)
            {
                // A bit is set, so increment the count and clear the bit
                dist++;
                val &= val - 1;
            }

            if (dist <= 3)
            {
                PORTB = 0b11111001; // BLUE and GREEN LEDs on
                lcd_init(LCD_DISP_ON_CURSOR);
                lcd_clrscr();
                lcd_home();
                lcd_puts("SPEECH");
                lcd_gotoxy(0,1);
                lcd_puts("RECOGNISED");
                _delay_ms(3000);
                nextOFF = 1;
            }
            else
            {
                PORTB = 0b11111101; // keep only the BLUE LED on
                lcd_init(LCD_DISP_ON_CURSOR);
                lcd_clrscr();
                lcd_home();
                lcd_puts("NO MATCH");
                lcd_gotoxy(0,1);
                lcd_puts("TRY AGAIN");
                _delay_ms(3000);
                start_listening = 1;
            }
        }
        if (nextOFF == 1)
        {
            uint16_t WordOfData_F;
            WordOfData_F = eeprom_read_word((uint16_t *) 4); // stored OFF template

            // Evaluate the Hamming distance
            int dist = 0;
            unsigned val = WordOfData_F ^ theTenBitResults;
            // Count the number of bits set
            while (val != 0)
            {
                // A bit is set, so increment the count and clear the bit
                dist++;
                val &= val - 1;
            }

            if (dist <= 3) // the number of differing bits is within the threshold
            {
                PORTB = 0b11111111; // all LEDs off
                lcd_init(LCD_DISP_ON_CURSOR);
                lcd_clrscr();
                lcd_home();
                lcd_puts(" FINAL SPEECH");
                lcd_gotoxy(0,1);
                lcd_puts("TASK COMPLETE");
                _delay_ms(3000);
                lcd_clrscr();
                lcd_home();
                lcd_puts(" FINAL SPEECH");
                lcd_gotoxy(0,1);
                lcd_puts("RECOGNISED");
                _delay_ms(3000);
            }
            else
            {
                PORTB = 0b11111001; // leave BLUE and GREEN LEDs on
                lcd_init(LCD_DISP_ON_CURSOR);
                lcd_clrscr();
                lcd_home();
                lcd_puts("NO MATCH");
                lcd_gotoxy(0,1);
                lcd_puts("TRY AGAIN");
                _delay_ms(3000);
                start_listening = 1;
            }
        }
    }
}
// Function to set up the TIMER registers
void timer1_init()
{
    TCCR1B |= (1 << CS11);  // set up timer with prescaler = 8
    TCNT1 = 0;              // initialize counter
    TIMSK1 |= (1 << TOIE1); // enable overflow interrupt
    sei();                  // enable global interrupts
    tot_overflow = 0;       // initialize overflow counter variable
}
void recordCommand()
{
    tot_overflow = 0;    // reset overflow counter
    adc_enable();
    sei();               // Enable global interrupts
    ADCSRA |= 1 << ADSC; // Start conversion
}
void eeprom(void)
{
    uint16_t WordOfData_A;           // Define type and size of data
    WordOfData_A = theTenBitResults;
    eeprom_update_word((uint16_t *) 0, WordOfData_A); // update EEPROM location 0

    //tot_overflow++; // keep track of the number of overflows
    if (tot_overflow >= 2) // allow time for the ~3.4ms EEPROM write to finish
    {
        start_recording = 0;
    }
}