AUTHENTICATION SYSTEM FOR SECURITY ENHANCEMENT USING VOICE
RECOGNITION
NORSHUHADA HASZARI
BACHELOR OF COMPUTER SCIENCE
(NETWORK SECURITY)
UNIVERSITI SULTAN ZAINAL ABIDIN
2017
AUTHENTICATION SYSTEM FOR SECURITY ENHANCEMENT USING
VOICE RECOGNITION
NORSHUHADA BINTI HASZARI
Bachelor of Computer Science (Network Security)
Faculty of Informatics and Computing
Universiti Sultan Zainal Abidin, Terengganu, Malaysia
MAY 201
DECLARATION
I hereby declare that this report is based on my original work except for quotations and
citations, which have been duly acknowledged. I also declare that it has not been
previously or concurrently submitted for any other degree at Universiti Sultan Zainal
Abidin or other institutions.
________________________________
Name : ..................................................
Date : ..................................................
CONFIRMATION
This is to confirm that:
The research conducted and the writing of this report were carried out under my supervision.
________________________________
Name : ..................................................
Date : ..................................................
DEDICATION
First and foremost, praise be to Allah, because of His love and the strength that He has given me to complete my final year project entitled “Authentication System for Security Enhancement using Voice Recognition.” I am thankful for His blessings in my daily life: good health, a healthy mind and good ideas, although I had to go through some difficulties along the way.
I take this opportunity to express my gratitude and deep regards to those who contributed to the completion of this project. First, I would like to express my sincere gratitude to my supervisor, Puan Siti Dhalila binti Mohamad Satar, for her motivation, support, advice and experience. Her guidance helped me throughout the work on this final year project.
Last but not least, I would also like to thank my family members for their concern, encouragement and understanding.
Finally, thanks to those who have contributed directly or indirectly to the success of this project whose names I have not mentioned specifically. Without them, this project would not have been successful.
ABSTRACT
Voice recognition is one of the biometric technologies used in security systems to reduce cases of fraud. We present an authentication system that uses voice recognition to enhance security when entering a system. Voice recognition provides a significant increase in security. The underlying problem is that many individuals have too many passwords. One reason for this is the World Wide Web (WWW): an increasing number of Web sites ask users to register, requesting both a user name and a password. Since most of these registrations are free and are used for marketing purposes rather than security, the result is a growing number of passwords. With the proposed authentication system, a hacker or attacker will not be able to get into the main page and access the user’s information. Voice recognition is more secure because our voices are unique. The system cannot be cheated by mimicking a voice, and will recognize the voice even if the user has a cold or is in a loud office. The system is user-friendly, quick to learn and easy to understand. It eliminates the problems related to ID PIN authentication systems. Three processing stages are used to analyze the voice signal: feature extraction, vector quantization and feature matching. After the voice is analyzed, the speaker whose voice has the highest percentage match with the database is allowed to enter the system.
ABSTRAK
Voice recognition is one of the biometric technologies used in security systems to reduce cases of fraud. We introduce an authentication system that uses voice recognition to improve security when entering a system. To authenticate users, this project proposes voice recognition as the authentication scheme. Voice recognition provides a significant increase in security. The problem arises when some individuals have too many passwords. One reason is the World Wide Web (WWW). More and more websites ask users to register, requiring a user name and a password. Because most of these registrations are free and are used only for marketing purposes rather than security, the result is a large increase in the number of passwords. With the authentication system, hackers will not be able to enter the main page and access user information. Voice recognition is more secure because our voices are unique. The system cannot be fooled by imitating a voice, and will recognize the voice even if you have a fever or are in a noisy area. The system built is user-friendly, quick to learn and easy to understand. It eliminates the problems associated with ID PIN authentication systems. Three processing stages are used to analyze the voice signal: feature extraction, vector quantization and feature matching. After the voice is analyzed, the highest percentage match with the database is allowed to enter the system.
TABLE OF CONTENTS
DECLARATION
CONFIRMATION
DEDICATION
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER I INTRODUCTION
1.1 Background of study
1.2 Problem statement
1.3 Objectives
1.4 Scopes
1.5 Limitation of work
CHAPTER II LITERATURE REVIEW
2.1 Introduction
2.2 Benefits of Biometric Technology
2.3 Classes of Biometric Technology
2.3.1 Face recognition
2.3.2 Iris recognition
2.3.3 Fingerprint recognition
2.3.4 Voice recognition
2.3.5 Text dependent and text independent
2.4 Feature extraction and feature matching
2.4.1 Mel Frequency Cepstrum Coefficient (MFCC)
2.4.2 Linear Predictive Coding (LPC)
2.4.3 Perceptual Linear Predictive (PLP)
2.4.4 Vector Quantization (VQ)
2.4.5 Hidden Markov Model (HMM)
2.4.6 Gaussian Mixture Model (GMM)
2.5 Conclusion
CHAPTER III METHODOLOGY
3.1 Introduction
3.2 Waterfall Methodology
3.3 Software implementation
3.3.1 MATLAB
3.3.2 Mel Frequency Cepstrum Coefficient (MFCC) Approach
3.3.3 Frame Blocking
3.3.4 Windowing
3.3.5 Fast Fourier Transform (FFT)
3.3.6 Mel frequency wrapping
3.3.7 Cepstrum
3.4 Vector Quantization Approach
3.5 Feature Matching
3.6 Project overview
3.7 Conclusion
LIST OF FIGURES
Figure 1: Classes of biometric
Figure 2: Waterfall model
Figure 3: Framework of voice recognition
Figure 4: The capture of voice signal in 5 seconds
LIST OF ABBREVIATIONS
DFT - Discrete Fourier Transform
FFT - Fast Fourier Transform
FORTRAN - Formula Translation
GMM - Gaussian Mixture Model
HMM - Hidden Markov Model
LPC - Linear Predictive Coding
MATLAB - Matrix Laboratory
MFCC - Mel Frequency Cepstrum Coefficient
PLP - Perceptual Linear Predictive
RFID - Radio Frequency Identification
VQ - Vector Quantization
CHAPTER 1
INTRODUCTION
1.1 Background of Study
Security is the degree of resistance to, or protection from, harm or risk. It applies to any vulnerable and valuable asset, such as a dwelling, a community, a nation or a government. Security can also be described as defending important things against threat or danger, theft and fraud. Nowadays, security systems are widely used in various sectors, such as manufacturing and housing. In addition, many applications require reliable and secure authentication methods to confirm the identity of the individual requesting access.
In a computer security system, human factors are considered the weakest link, for example as users become ever more dependent on devices. There are three major areas where interaction between human and computer is important: authentication, developing secure systems and security operations. Here, we focus on the authentication problem.
Passwords and user identity PIN numbers (ID PINs) are the most commonly used means of authentication. The basic protocol of password and user ID PIN protection is that the user remembers a secret for security access control. Longer and more complex passwords are created to maintain security. However, a long, complex password can be forgotten. The Radio Frequency Identification (RFID) tag is another technology used in security systems, but some factors can damage an RFID tag, such as electrostatic charge. Technology improves every day. As time passes, alternative means of authentication other than passwords and user PINs have emerged, and research on authentication systems continues to develop. Developers have proposed using voice biometrics as the medium in an authentication scheme for security access control. For example, electrical appliances can be controlled using voice as the medium. Therefore, in this project the voice is used as the medium in the authentication system. More importantly, computers and handheld devices do not need a keyboard to be useful; voice can be used everywhere and in any language (to the computer it is just patterns).
1.2 Problem Statement
Nowadays, security systems are important for preventing the loss of belongings, property and confidential transactions. Conventional security systems are widely used as authentication methods for access control: magnetic cards such as RFID tags are used for physical access, attendance and identification, while passwords and user ID PINs allow the user to enter certain premises.
Passwords and user ID PINs need to be long and complex to increase security, but they can then be forgotten. Typically, users choose a common password such as a birthday or anniversary date to help them memorize it. Such common passwords can be easily guessed. Moreover, passwords, user ID PINs and magnetic cards can be stolen, duplicated (if the user uses a key) or shared, which allows an impostor to access a restricted area without the user's authorization, because a conventional security system cannot detect whether the person accessing the system is the user or an impostor.
Therefore, this project applies a biometric technology to raise the level of security above that of conventional security systems. Biometric technology cannot be fooled or stolen. It also cannot be guessed like a password. The basic protocol of biometric technology is to determine an unknown identity by comparing the claimed identifier with the database. It is a more convenient method of authentication than conventional systems.
1.3 Objectives of Project
The objectives are:
1. To identify features of the voice by using feature extraction and feature matching.
2. To design and develop a security system based on voice recognition.
3. To test and analyze the voice recognition security system.
1.4 Scope of Project
Several scopes need to be identified in order to achieve the objectives of this project. This project requires software implementation, for which MATLAB is used to write the source code.
In this project, several techniques were implemented in MATLAB.
1.5 Limitation of Work
This system has high accuracy and efficiency in a noiseless environment. Noise is still detected if the system is used in a noisy environment. The algorithm is simple and easy to implement, but it can only reduce a small amount of noise.
CHAPTER 2
LITERATURE REVIEW
2.1 Introduction
French anthropologist Alphonse Bertillon created the first real biometric system. Using detailed body measurements, physical descriptions and photographs, he developed an identification system [norulashikin,2014]. The word biometric is taken from Greek, in which bio means life and metric means measure. Combining these two words, biometrics can be defined as the measurement (study) of life, which includes humans, animals and plants. It can also be called life measurement.
Biometrics is an automated method of identifying an individual based on unique physiological and behavioral characteristics. Behavioral characteristics are based on the unique way people do things, such as saying their name or singing. In other words, each person has a different voice pitch or keystroke pattern. Physiological characteristics relate to parts of the human body, such as fingerprints, facial features and retina patterns.
A biometric system can operate in two modes: verification or identification. Verification is based on a unique identifier, which singles out a particular person, together with that individual’s biometric, while identification is based only on biometric measurements, which are compared with the entire database of recorded measurements.
Biometric authentication is more reliable in verifying and recognizing a person's identity and is widely used in applications. Nowadays, many applications need a reliable and secure authentication system to confirm the identity of individuals when they request a service. Biometric systems are also widely used in security settings such as law enforcement, database access services, confidential transactions and others.
2.2 Benefits of Biometric Technology
Biometric technology is used in security systems because it offers a higher security level: a biometric cannot be stolen, created, forgotten, shared or lost. This technology has several properties that contribute to its high security level. Universality means that every person has the characteristic; for example, everybody has a face, a mouth, a voice and so on. Distinctiveness means that the characteristic differs enough between persons to distinguish them from each other. Acceptability means that people accept the method and do not find it bothersome or invasive. In terms of performance, biometric technology is mostly accurate and successful, and nearly impossible to forge. Finally, circumvention means that the ability of fraudulent people and techniques to fool the biometric system should be negligible [2]. Compared with conventional security systems, biometric technology is more convenient to use.
2.3 Classes of Biometric Technology
Figure 1 : Classes of biometric
Figure 1 above shows the classes of biometric technology. Fingerprint and handwriting are among the earliest biometric authentication methods, while more recent ones include face print, iris or eye print, fingerprint and voice print. Generally, face print is the process of automatically identifying or verifying a person from a digital image or a video source. An image of the face is captured and analyzed in order to derive a template.
2.3.1 Face Recognition
Face recognition can be divided into two main modules: the geometric module and the photometric module. The geometric module looks at distinguishing features of the human face such as the eyes and mouth, while the photometric module is a statistical approach that distills an image into values and compares the values with templates to eliminate variances [5]. This biometric technology is suitable for small applications, where it only works with photograph databases, video tape and cameras. However, face recognition needs sufficient light in order to get a good identification result. The basic weaknesses of this method appear when a person changes their hairstyle, wears makeup or grows facial hair. Furthermore, exposure to the sun may cause false detection during the identification phase.
2.3.2 Iris Recognition
Iris recognition is based on high-resolution images of the iris of an individual’s eye. This method needs high-quality camera technology, such as a kiosk-based system, which is the most expensive option but the easiest to operate [2]. This biometric technology has a high capability to differentiate between individuals, even between a user’s left and right eyes [2]. One benefit of this method is that it has the smallest outlier population (people who cannot use or enroll) of all biometric technologies. The iris pattern also has longevity and can last a lifetime. However, performance may be impaired by glasses, sunglasses and contact lenses. It also has few legacy databases.
2.3.3 Fingerprint Recognition
Fingerprinting is a classical, long-established method. The basic protocol of fingerprint recognition is verifying a match between two fingerprints. This method needs a specific device for fingerprint scanning. Its advantages include being practical in forensic investigation, and it has a very large legacy database of fingerprints. However, fingerprint recognition can be impossible when there is permanent or temporary damage to the fingerprint. In addition, fingerprint scanners often cannot read oily or dry fingerprints.
2.3.4 Voice Recognition
Voice is a common and natural way of communicating our ideas to others in our immediate surroundings. Voice recognition represents an important biometric field. It is the process of recognizing and identifying an unknown speaker by comparing information in each individual's speech signal. This technology makes it possible to verify a person’s identity by voice in order to access various services, such as database access services, information services, voice mail, telephone banking and security control for confidential information areas. Voice recognition is one of the fastest growing biometric technologies and is widely used by companies, partly because its data are reusable: voice data can be handled like any other data, which means it can be deleted and replaced with other data. For example, in telephone banking, it provides an easy and comfortable way to identify customers without them coming to the organization's premises. Besides, each person has a different voice pitch, including twins. A speaker can also be prompted in many different ways. This is in contrast with fingerprints, for instance, where the system can only ask the user to put a finger on the surface, or perhaps ask for a specific finger, which will be just one of 10 possibilities [2]. Previously, systems were unable to reduce background noise to make the recording as clean as possible. However, with new algorithms and techniques the noise can be reduced. Furthermore, these systems require a person to speak loudly. The best features for voice recognition are robust against noise and distortion, and the dimension of the feature should be low, to avoid high cost in the development of the system.
Voice recognition can be classified into two categories: voice verification and voice identification. Voice verification is the process of accepting or rejecting the identity claim of a speaker. When an unknown speaker claims an identity, his or her voice print is compared with the database model for that identity. The claim is accepted if the voice print of the unknown speaker matches the database model. In voice identification, the input speech signal from the unknown speaker is analyzed and compared with speech samples of all known speakers. The unknown speaker is identified as the speaker whose database model matches best. The difference between verification and identification is the number of decision alternatives. Verification has two choices, accept or reject, while in identification the number of possible results equals the size of the population. Thus, voice verification does not depend on the population size, whereas identification does.
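The difference in decision alternatives can be illustrated with a minimal sketch. It is written in Python purely for illustration (the project itself uses MATLAB), and the speaker names, two-dimensional feature vectors and threshold value are hypothetical, not taken from this project.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def verify(claimed_id, features, database, threshold):
    """Verification: compare with the claimed speaker only (accept or reject)."""
    return distance(features, database[claimed_id]) <= threshold

def identify(features, database):
    """Identification: compare with every enrolled speaker, return the closest."""
    return min(database, key=lambda speaker: distance(features, database[speaker]))

# Hypothetical enrolled models and an unknown sample.
database = {"alice": [1.0, 2.0], "bob": [4.0, 6.0]}
sample = [1.2, 2.1]

print(verify("alice", sample, database, threshold=0.5))
print(verify("bob", sample, database, threshold=0.5))
print(identify(sample, database))
```

Verification performs one comparison regardless of how many speakers are enrolled, while identification performs one comparison per enrolled speaker.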
2.3.5 Text Dependent and Text Independent
A text-dependent system requires a person to speak exactly the enrolled or given password during both enrollment and identity verification. The system then compares the voice print from verification with the database model from enrollment. Text-dependent recognition seeks to associate an unknown speaker with a member of a (registered) population, given a textual transcription of the phrases uttered by the speaker [1]. Text-dependent systems include fixed password systems, user-specific text-dependent systems, vocabulary-dependent systems, speech-event-dependent systems and machine-driven text-independent systems (Merlin, 1991). Such a system can only recognize the speaker when the expected word has been used. Nowadays, most voice biometric systems are text dependent, and this approach is suitable for real applications. A text-independent system, by contrast, verifies the unknown speaker without limitation on the speech content. It needs appropriate training and testing to attain good performance, and it requires a longer enrollment process. In general, text-independent recognition is more challenging than text-dependent recognition.
2.4 Feature Extraction and Feature Matching
There are two main modules in voice recognition: feature extraction and feature matching. Feature extraction is the process by which the database is built: it extracts a small amount of data from the voice signal that can later be used to characterize each speaker [2]. In particular, it discards various sources of information, such as whether the sound is voiced or unvoiced and, if voiced, the effects of periodicity or pitch, the amplitude of the excitation signal and the fundamental frequency [9]. In this project, the short-term spectrum was chosen because it is easy to implement, easy to extract and does not need a large amount of data. There are several feature extraction methods, namely Mel Frequency Cepstrum Coefficient (MFCC), Linear Predictive Coding (LPC) and Perceptual Linear Predictive (PLP).
2.4.1 Mel Frequency Cepstrum Coefficient (MFCC)
This algorithm is the most popular and best known in the voice recognition industry. MFCC mimics the human ear, the idea being that the computer should perceive speech the way we do, since our own understanding of speech also comes through our ears. MFCC-based feature vectors are extracted from the voice signal. Several steps are taken to calculate the MFCC: framing, Hamming windowing, the Fast Fourier Transform (FFT) and mel frequency wrapping, after which the MFCCs are calculated.
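For reference, a commonly used formulation of the last two steps is sketched below. It is an assumption about the usual textbook form, not necessarily the exact equation used in this project: f is a frequency in Hz, S_k is the output power of the k-th mel filter, K is the number of mel filters and L is the number of coefficients kept.

```latex
\[
\mathrm{mel}(f) = 2595 \,\log_{10}\!\left(1 + \frac{f}{700}\right)
\]
\[
c_n = \sum_{k=1}^{K} \left(\log S_k\right)
      \cos\!\left[\, n \left(k - \tfrac{1}{2}\right) \frac{\pi}{K} \right],
\qquad n = 1, 2, \ldots, L
\]
```

The first expression maps linear frequency to the mel scale. The second applies a discrete cosine transform to the log mel spectrum to obtain the cepstral coefficients.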
2.4.2 Linear Predictive Coding (LPC)
LPC is one of the traditional methods of feature extraction and was a very popular feature in early voice recognition. The basic principle of LPC is the assumption that a speech signal is produced by a buzzer at the end of a tube, with occasional added hissing and popping sounds. Although this model seems crude, it is actually close to the reality of speech production. LPC analyzes the speech signal by estimating the intensity and frequency of the buzz. This feature extraction is not suitable for real-time applications, because comparing LPC vectors requires the Itakura-Saito distance to measure similarity, which is an expensive measurement.
2.4.3 Perceptual Linear Predictive (PLP)
This feature extraction is similar to LPC analysis in that it is based on the short-term spectrum. The basic principle of this feature is to minimize the differences and preserve the important features of the speech signal information. In contrast to LPC, PLP modifies the short-term spectrum using psychophysically based transformations. It consists of three concepts:
The critical band spectral resolution
The equal-loudness curve
The intensity-loudness power law
Feature matching is the process of identifying the unknown speaker by comparing the unknown speaker's features with the voice recognition database. In other words, feature matching is the actual procedure that identifies the unknown speaker by comparing the features extracted from his or her voice input with those from a set of known speakers. Vector Quantization (VQ), Hidden Markov Model (HMM), Gaussian Mixture Model (GMM) and Dynamic Time Warping (DTW) are examples of feature matching, or pattern matching, techniques.
2.4.4 Vector Quantization (VQ)
For feature matching, this research uses vector quantization (VQ), a traditional quantization technique. It works by dividing a large set of points (vectors) into groups having approximately the same number of points closest to them, where each group is represented by its centroid point. This method is powerful for identifying the density of large, high-dimensional data. Commonly occurring data have low error, while rare data have high error, which also makes VQ suitable for lossy data compression. In vector quantization, the set of code vectors is called a codebook. In the training process, a VQ codebook is created for each speaker by clustering the acoustic vectors of that speaker in the database.
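As an illustration of codebook training, the sketch below clusters a few acoustic vectors with plain Lloyd (k-means) iterations. This is an assumption made for the example: the text does not state which clustering algorithm the project uses, and the vectors and initial codewords are made up. Python is used only for illustration, since the project itself is written in MATLAB.

```python
import math

def nearest(vector, codebook):
    """Index of the codeword closest to `vector` (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(vector, codebook[i]))

def train_codebook(vectors, codebook, iterations=10):
    """Refine an initial codebook with Lloyd (k-means) iterations."""
    for _ in range(iterations):
        # Assign every training vector to its nearest codeword.
        clusters = [[] for _ in codebook]
        for v in vectors:
            clusters[nearest(v, codebook)].append(v)
        # Move each codeword to the centroid of its cluster;
        # a codeword with an empty cluster is left unchanged.
        codebook = [
            [sum(col) / len(cluster) for col in zip(*cluster)] if cluster else word
            for cluster, word in zip(clusters, codebook)
        ]
    return codebook

# Hypothetical two-dimensional acoustic vectors for one speaker.
vectors = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
codebook = train_codebook(vectors, [[1.0, 1.0], [9.0, 9.0]])
print(codebook)
```

Each resulting codeword is the centroid of one group of vectors, which matches the description of a group being represented by its centroid point.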
2.4.5 Hidden Markov Model (HMM)
An HMM is a statistical Markov model in which the system being modeled is assumed to be a Markov process with hidden states. In voice recognition, HMM is a modern approach and is popular for feature matching. It is applicable because a speech signal can be approximated as a stationary process over short periods. Besides, HMMs can be trained automatically and are simple and practical to use.
2.4.6 Gaussian Mixture Model (GMM)
GMM is a modern technique that is more accurate than other feature matching methods. Two principles motivate this technique. First, the individual Gaussian components can represent speaker-dependent vocal tract configurations. Second, a linear combination of Gaussian functions is capable of representing a large class of distributions.
GMM can be interpreted differently depending on which principle is chosen. More specifically, the distribution of the feature vectors extracted from the speech signal is modeled as a Gaussian mixture density.
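For reference, the standard form of a Gaussian mixture density is sketched below. The notation is an assumption for illustration: x is a D-dimensional feature vector, M is the number of components, w_i are the mixture weights, and mu_i and Sigma_i are the mean vector and covariance matrix of component i.

```latex
\[
p(\mathbf{x} \mid \lambda) = \sum_{i=1}^{M} w_i \, g(\mathbf{x} \mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i),
\qquad \sum_{i=1}^{M} w_i = 1
\]
\[
g(\mathbf{x} \mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i) =
\frac{1}{(2\pi)^{D/2} \, |\boldsymbol{\Sigma}_i|^{1/2}}
\exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^{\mathsf{T}}
\boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \right)
\]
```

Here lambda denotes the full set of model parameters (weights, means and covariances) for one speaker.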
2.5 Conclusion
Voice recognition is one of the best biometric technologies for authentication. Two modules are involved in voice recognition: feature extraction and feature matching. In this project, the MFCC and VQ algorithms were used to implement these modules.
CHAPTER 3
METHODOLOGY
3.1 Introduction
This project consists of development work, namely software development. For the software development, MATLAB was used to execute the voice recognition process.
Since this project has a system as its main element, it is important that the stages of the project are carefully planned and executed, to ensure that the project is completed effectively and within the time given by the faculty. The design method that I have adopted is the ‘Waterfall Model’.
3.2 Waterfall Model
The waterfall model is a popular version of the systems development life cycle model. It is so named because each element of it ‘flows’ into the next, much as water flows down a real waterfall. Waterfall development has distinct goals for each phase of development, where each phase is completed before the next one is started and there is no turning back.
Figure 2. Waterfall model
The stages of the waterfall methodology that this project is concerned with are, in order: Requirements, Design, Implementation, Verification and Maintenance. The requirements consist of a list of things that the completed project must do. In the design phase, we design the system and the framework. The Mel Frequency Cepstrum Coefficient (MFCC) approach, the Vector Quantization approach and feature matching are used to design the system. The implementation stage is the stage in which the design is turned into a real working system. The verification stage ensures that the system meets the requirements set in the earlier stage and that there are no problems with the system. Finally, the maintenance stage is concerned with the installation of the system and keeping it in working use. The perceived advantages of the waterfall process are that it allows for departmentalization and managerial control. A schedule is set with deadlines for each stage of development, and the project proceeds through the development process stage by stage. In theory, this process leads to the project being delivered on time because each phase has been planned in detail.
The stages of this project cycle are:
3.2.1 Requirement Phase
In the requirement phase, once the voice signal is captured, it goes through feature extraction using the MFCC approach in MATLAB to build the voice recognition database. This approach converts the speech signal and extracts the important parameters for further analysis. The MFCC process consists of several steps: framing, Hamming windowing, the Fast Fourier Transform (FFT), mel frequency wrapping and the cepstrum, which yields the MFCCs. The result of MFCC is a set of acoustic vectors, which are stored in the database. A speaker-specific codebook is generated by clustering the acoustic vectors from feature extraction. The distance between a vector and the nearest codeword of a codebook is called the VQ distortion.
The main process of the matching phase is to identify an unknown speaker by comparing the unknown speech input with the acoustic vectors in the voice recognition database. The match decision depends on the VQ distortion and a threshold Euclidean distance. As mentioned previously, the value of the Euclidean distance may vary. When the VQ distortion is less than the threshold Euclidean distance, the output is matched, while when the VQ distortion is greater than the threshold, the output is unmatched.
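The match decision can be sketched as follows. This is a minimal illustration under stated assumptions: the distortion is taken as the average Euclidean distance from each acoustic vector to its nearest codeword, and the codebook, test vectors and threshold value are hypothetical. Python is used only for illustration, since the project itself is written in MATLAB.

```python
import math

def vq_distortion(vectors, codebook):
    """Average distance from each acoustic vector to its nearest codeword."""
    total = sum(min(math.dist(v, word) for word in codebook) for v in vectors)
    return total / len(vectors)

def is_match(vectors, codebook, threshold):
    """Matched when the VQ distortion falls below the chosen threshold."""
    return vq_distortion(vectors, codebook) < threshold

codebook = [[0.0, 1.0], [10.0, 11.0]]   # hypothetical enrolled speaker
genuine = [[0.0, 0.0], [10.0, 12.0]]    # each vector lies 1.0 from a codeword
impostor = [[5.0, 5.0], [6.0, 6.0]]     # far from both codewords

print(vq_distortion(genuine, codebook))
print(is_match(genuine, codebook, threshold=2.0))
print(is_match(impostor, codebook, threshold=2.0))
```

In practice the threshold would have to be tuned on recorded data, since a value that is too high accepts impostors and a value that is too low rejects genuine speakers.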
3.2.2 Design Phase
In the design phase, we design the system and the framework.
Figure 3. Framework of voice recognition.
The captured voice signal is analyzed and its features are extracted. The Mel Frequency Cepstrum Coefficient (MFCC) approach captures the important parameters of the speech signal. Vector Quantization is one of the techniques for feature matching. It is a traditional method, but it is commonly used in voice recognition. Vector Quantization is used for its high accuracy and ease of implementation. In general, it is a process of mapping vectors from a large vector space to a finite number of regions in that space. Lastly, feature matching identifies the unknown speaker by comparing the unknown speech input with the acoustic vectors in the voice recognition database.
3.2.3 Implementation Phase
With inputs from the system design, the system is developed in order: capturing the voice signal, feature extraction, vector quantization and lastly feature matching. Each part is developed and tested for its functionality. I am using MATLAB for all of these features.
3.2.4 Maintenance Phase
Some issues will come up in the user environment. To fix those issues, we need to know the needs of the user, and the system must be user friendly. Maintenance is done to provide the user with the best and easiest-to-use system.
3.3 Software Implementation
3.3.1 MATLAB
MATLAB can be described as a high-performance language for technical computing, which integrates computation, visualization and a programming environment. MATLAB stands for MATrix LABoratory, and it was originally written to provide easy access to matrix software. MATLAB is widely used because it has sophisticated data structures, contains built-in editing and debugging tools and supports object-oriented programming. Besides, MATLAB compares favorably with conventional computer languages such as C and FORTRAN for solving technical problems. Its graphics commands make results immediately available and easy to understand. Moreover, MATLAB is known as platform-independent software because it runs on multiple platforms.
In industry, MATLAB is the tool of choice for high-productivity research, development and analysis, while in universities MATLAB is used as a standard instructional tool in mathematics, engineering and science. MATLAB features a family of application-specific packages referred to as toolboxes. Signal processing, audio processing, symbolic computation, control theory and simulation are examples of toolboxes in MATLAB. The major tools of the MATLAB desktop are the command window, command history window, workspace, current directory, help browser and start button.
3.3.2 Mel Frequency Cepstrum Coefficient (MFCC) Approach
The MFCC approach is used to develop a database for voice recognition. MFCC is a
feature extraction method, used to capture the important parameters of the
speech signal. The Mel Frequency Cepstrum Coefficients (MFCC) are a short-term
spectral representation of a sound. Typically, a sampling rate above 10000 Hz is
used when recording the speech signal. The purpose is to minimize the effect of
aliasing in the analog-to-digital conversion. In this system, 22050 Hz has been
chosen as the sampling rate. MFCC is designed to mimic the behavior of the human
ear. The MFCC process produces a number of coefficients that identify the
processed speech, and these parameters are used in speaker recognition or
speaker verification systems. The MFCC process consists of several steps:
framing, Hamming windowing, Fast Fourier Transform (FFT), mel frequency
wrapping, cepstrum and MFCC. Each step has its own function and analysis.
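The aliasing remark can be made concrete with a little arithmetic: a signal sampled at a given rate can only represent frequencies up to half that rate (the Nyquist frequency). The sketch below is written in Python rather than Matlab purely for illustration, and the function names are hypothetical, not taken from the project code.

```python
def nyquist_frequency(sample_rate_hz):
    """Highest frequency (Hz) that can be represented at this sampling rate."""
    return sample_rate_hz / 2.0


def would_alias(tone_hz, sample_rate_hz):
    """True if a pure tone at tone_hz lies above the Nyquist frequency."""
    return tone_hz > nyquist_frequency(sample_rate_hz)


# At the 10000 Hz lower bound mentioned in the text, content above 5000 Hz aliases.
print(nyquist_frequency(10000))
print(would_alias(6000, 10000), would_alias(4000, 10000))
```

A higher sampling rate raises the Nyquist frequency, so more of the speech spectrum is captured without aliasing.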
3.3.3 Frame Blocking
Frame blocking, or framing, is applied after the continuous voice signal is
captured. The signal is blocked into frames of N samples, with adjacent frames
separated by M samples (M < N). Typically, each frame covers between 10 ms and
60 ms of speech. The purpose of frame blocking is to analyze the speech signal
over a short period of time. Over a short period, the speech signal is nearly
stationary, which makes it easy to analyze. Over a long period, the
characteristics of the speech signal may change. The figure below shows a sample
of a speech signal of an unknown speaker.
Figure 5: The captured voice signal over 5 seconds
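Frame blocking can be sketched as follows. This is a minimal Python illustration (the project itself uses Matlab); the function name and the choice to drop a trailing partial frame are assumptions made for the example.

```python
def frame_signal(samples, frame_len, hop):
    """Split samples into frames of frame_len (N) samples, hop (M) samples apart.

    Assumption for this sketch: a trailing partial frame is dropped.
    """
    if not 0 < hop < frame_len:
        raise ValueError("expected 0 < hop < frame_len, i.e. M < N")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(list(samples[start:start + frame_len]))
        start += hop
    return frames


# Ten samples, N = 4, M = 2: consecutive frames overlap by N - M = 2 samples.
for frame in frame_signal(list(range(10)), frame_len=4, hop=2):
    print(frame)
```

Because M < N, adjacent frames overlap by N - M samples, so no part of the signal falls only on a frame boundary.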
3.3.4 Windowing
The next step is windowing, in which each individual frame is multiplied by a
window function to minimize the signal discontinuities at the beginning and end
of the frame. It also minimizes the spectral distortion by using the window to
taper the signal toward zero at the beginning and end of the frame. The Hamming
window can be defined as in equation 3.1.
$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1 \qquad (3.1)$$
where N is the number of samples in each frame.
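The windowing step can be sketched as below, assuming the standard Hamming window w(n) = 0.54 - 0.46 cos(2*pi*n / (N - 1)). The code is Python for illustration only (the project uses Matlab), and the function names are hypothetical.

```python
import math


def hamming_window(n_samples):
    """Standard Hamming window of length n_samples (N)."""
    if n_samples < 2:
        return [1.0] * n_samples
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def apply_window(frame):
    """Multiply a frame sample by sample with a Hamming window of the same length."""
    return [x * w for x, w in zip(frame, hamming_window(len(frame)))]


# The edges of the frame are attenuated while the middle is left nearly unchanged.
print(apply_window([1.0, 1.0, 1.0, 1.0, 1.0]))
```

Note that the Hamming window tapers the frame edges to a small value (0.08) rather than exactly to zero.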
3.3.5 Fast Fourier Transform (FFT)
The Fast Fourier Transform (FFT) algorithm is due to Cooley and Tukey. Its
purpose is to convert each frame from the time domain to the frequency domain.
The FFT is a fast algorithm for computing the Discrete Fourier Transform (DFT),
and it improves the performance of the system.
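As an illustration of the idea, a recursive radix-2 Cooley-Tukey FFT can be written in a few lines. This is a Python sketch for frames whose length is a power of two, not the Matlab routine used in the project; the function names are hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # DFT of the even-indexed samples
    odd = fft(x[1::2])   # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def power_spectrum(frame):
    """Squared magnitude of each frequency bin of the frame."""
    return [abs(c) ** 2 for c in fft(frame)]


# A cosine with one cycle per 8 samples puts its energy in bins 1 and 7.
frame = [math.cos(2 * math.pi * n / 8) for n in range(8)]
print([round(p, 6) for p in power_spectrum(frame)])
```

Splitting each transform into two half-size transforms is what reduces the cost from the order of N^2 operations for a direct DFT to the order of N log N.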
3.3.6 Mel Frequency wrapping
The next processing step is mel frequency wrapping. Psychophysical studies have
shown that human perception of the frequency content of sounds in speech
signals does not follow a linear scale. Thus, for each tone with an actual
frequency f, measured in Hz, a subjective pitch is measured on a scale known as
the 'mel' scale. The mel value for a given frequency can be computed using
equation 3.2.
$$\mathrm{mel}(f) = 2595\,\log_{10}\left(1 + \frac{f}{700}\right) \qquad (3.2)$$
As can be seen here, the parameter extracted in voice recognition is the pitch
of the voice. The pitch of a voice rises with its frequency, although not
linearly.
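The mapping between Hz and mels can be sketched as below, assuming the commonly used formula mel(f) = 2595 log10(1 + f / 700). The code is Python for illustration (the project uses Matlab) and the function names are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (commonly used formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Doubling the frequency does not double the mel value: the scale is non-linear.
for f in (500, 1000, 2000, 4000):
    print(f, round(hz_to_mel(f), 1))
```

With this formula, 1000 Hz maps to approximately 1000 mels, and higher frequencies are compressed progressively more strongly.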
3.3.7 Cepstrum
The final step is converting the log mel spectrum back to the time domain. The
cepstrum representation of the speech spectrum provides a good representation
of the local spectral properties of the signal for the given frame. The result
of this conversion is the set of Mel Frequency Cepstrum Coefficients (MFCC), and
the MFCCs of a frame form an acoustic vector. This acoustic vector is stored in
the train folder, or database of voice recognition, and is used in the feature
matching process. All of these processes were implemented in Matlab. Matlab has
a built-in function for the MFCC approach. All the source code is shown in the
Appendix.
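The cepstrum step is commonly computed as a discrete cosine transform (DCT) of the logarithm of the mel filter bank energies. The sketch below assumes that formulation and assumes the mel energies have already been computed; it is Python for illustration, not the Matlab function used in the project, and all names are hypothetical.

```python
import math


def mfcc_from_mel_energies(mel_energies, n_coeffs):
    """Cepstral coefficients c_1..c_n from mel filter bank energies.

    Uses c_n = sum_k log(S_k) * cos(n * (k + 0.5) * pi / K) with k = 0..K-1.
    The zeroth coefficient (overall energy) is left out, a common convention.
    """
    k_total = len(mel_energies)
    # Floor the energies to avoid log(0) for silent filter bank channels.
    log_energies = [math.log(max(e, 1e-12)) for e in mel_energies]
    coeffs = []
    for n in range(1, n_coeffs + 1):
        c = sum(log_energies[k] * math.cos(n * (k + 0.5) * math.pi / k_total)
                for k in range(k_total))
        coeffs.append(c)
    return coeffs


# Hypothetical energies from an 8-channel mel filter bank for one frame.
acoustic_vector = mfcc_from_mel_energies([4.0, 9.0, 7.5, 3.2, 1.1, 0.6, 0.4, 0.2], 5)
print([round(c, 3) for c in acoustic_vector])
```

The list returned for one frame plays the role of the acoustic vector described above.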
3.4 Vector Quantization Approach
Vector quantization (VQ) is one of the techniques used in feature matching. It
is a traditional method, but it is commonly used in voice recognition. The
Vector Quantization method is used for its high accuracy and ease of
implementation. In general, VQ maps vectors from a large vector space into a
finite number of regions in that space. Each region is called a cluster and is
represented by its center, called a code word (also known as a code vector or
centroid). A speaker-specific codebook is generated for each known speaker by
clustering his or her acoustic vectors from feature extraction. A codebook is
the collection of the code words, or centroids, of the regions in the acoustic
vector space.
Figure 3.3 shows the resulting code words, or centroids. The circle and triangle
at the center represent speaker 1 and speaker 2 respectively. The distance
between a vector (sample) and a code word is called the VQ distortion.
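Codebook generation and VQ distortion can be sketched as follows. The text does not state which clustering algorithm builds the codebook, so this example uses plain k-means as an assumption; it is Python for illustration (the project uses Matlab) and all function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(vector, codebook):
    """Index of the code word closest to vector."""
    return min(range(len(codebook)), key=lambda i: euclidean(vector, codebook[i]))


def train_codebook(vectors, size, iterations=20):
    """Plain k-means; the first `size` vectors seed the code words (assumption)."""
    codebook = [list(v) for v in vectors[:size]]
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for v in vectors:
            clusters[nearest(v, codebook)].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous code word
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def vq_distortion(vectors, codebook):
    """Average distance from each vector to its nearest code word."""
    total = sum(euclidean(v, codebook[nearest(v, codebook)]) for v in vectors)
    return total / len(vectors)


# Hypothetical 2-D acoustic vectors forming two clusters.
training = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
codebook = train_codebook(training, size=2)
print(codebook, round(vq_distortion(training, codebook), 3))
```

A speaker whose vectors lie close to the code words of a codebook produces a small VQ distortion against it, which is the quantity compared in feature matching.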
3.5 Feature Matching
The purpose of feature matching is to identify the unknown speaker by comparing
the unknown speech input with the acoustic vectors in the database of voice
recognition. The matching process depends on the VQ distortion and a Euclidean
distance threshold. The Euclidean distance threshold is set at the start of the
code. It can be varied, and a suitable value has to be found by trial and error.
The output is matched, and the unknown speaker is identified, when the VQ
distortion is less than the Euclidean distance threshold, while the output is
unmatched when the VQ distortion is more than the threshold.
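The matching rule can be sketched as a simple comparison, with the preset Euclidean distance value acting as a threshold. This Python example assumes the VQ distortion of the unknown input against each enrolled speaker's codebook has already been computed; the function name, speaker labels and numbers are hypothetical.

```python
def identify_speaker(distortions, threshold):
    """Return the best-matching speaker, or None when the output is unmatched.

    distortions maps each enrolled speaker to the VQ distortion of the unknown
    input against that speaker's codebook.
    """
    if not distortions:
        return None
    best = min(distortions, key=distortions.get)
    # Matched only when the smallest distortion is below the preset threshold.
    return best if distortions[best] < threshold else None


print(identify_speaker({"speaker1": 2.5, "speaker2": 6.0}, threshold=4.0))
print(identify_speaker({"speaker1": 2.5, "speaker2": 6.0}, threshold=2.0))
```

Lowering the threshold makes the system stricter (fewer false acceptances, more false rejections), which is why the value has to be tuned.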
3.7 Project overview
The figure above shows a system overview of voice recognition. This system is
text dependent, which means a speaker needs to speak a specific word. In the
first stage of this system, an unknown speaker gives an input speech signal
using a microphone. The speech signal is recorded in a .wav (wave) file. Then,
the entire recorded speech signal is stored in the database of voice
recognition, or train folder.
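Loading a recorded .wav file for further processing can be sketched as below. This Python example uses only the standard library and assumes 16-bit mono PCM recordings, which the text does not specify; the function name is hypothetical.

```python
import struct
import wave


def read_wav_mono16(path):
    """Read a 16-bit mono PCM .wav file; return (sample_rate, samples in [-1, 1))."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("this sketch expects 16-bit mono PCM")
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    count = len(raw) // 2
    # WAV PCM samples are little-endian signed 16-bit integers.
    samples = [s / 32768.0 for s in struct.unpack("<%dh" % count, raw)]
    return sample_rate, samples
```

The returned sample rate and sample list are what the later stages (framing, windowing and so on) would take as input.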
The database of voice recognition then goes through feature extraction using
the MFCC approach in Matlab. This approach converts the speech signal and
extracts the important parameters for further analysis. The MFCC process
consists of several steps: framing, Hamming windowing, Fast Fourier Transform
(FFT), mel frequency wrapping, cepstrum and MFCC. The result of MFCC is an
acoustic vector, from which a codebook is built and the VQ distortion is
computed.
The main task of the matching phase is to identify an unknown speaker by
comparing the unknown speech input with the acoustic vectors in the database of
voice recognition. The matched output depends on the VQ distortion and the
Euclidean distance threshold. As mentioned previously, the value of the
Euclidean distance threshold may vary. When the VQ distortion is less than the
Euclidean distance threshold, the output is matched; otherwise, the output is
unmatched. When the output is matched, the user can enter the system. For an
unmatched output, the user cannot enter the system.
3.8 Conclusion
In conclusion, the system consists mainly of software development, which
focuses on developing a database of voice recognition. An Arduino
microcontroller was used as the main hardware in this project. The purpose of
the hardware development is to show the functionality of the voice recognition
security system.
REFERENCES
1. Campbell, J.P., Jr., "Speaker recognition: a tutorial," Proceedings of the IEEE
September 1997. 85(9): 1437-1412.
2. Gupta, Dipmoy, et al. "Isolated Word Speech Recognition Using Vector
Quantization (VQ)." International Journal 2.5 May 2012. 2(1):114-118.
3. Thomas F. Quatieri. “Speech Signal Processing: principles and practice”. New
York: Prentice Hall PTR. 2002.
4. http://www.rfidjournal.com/articles/view?1980Oct2013 (Internet Source)
5. Majekodunmi, Tiwalade O., and Francis E. Idachaba.. "A Review of the
Fingerprint, Speaker Recognition, Face Recognition and Iris Recognition Based
Biometric Identification Technologies." Proceedings of the World Congress on
Engineering. 2011 Vol. 2.
6. Z. Sony, RM4mil stolen within first three months, The Star Online, 2011
7. http://www.anu.edu.au/people/Roger.Clarke/DV/HumanID.html(Internet
Sources)
8. Faundez-Zanuy, M. "Biometric security technology," Aerospace and Electronic
Systems Magazine, IEEE,2001. 21(1): 15-21.
9. Shrawankar, Urmila, and Vilas M. Thakare. "Techniques for Feature Extraction
in Speech Recognition System: A Comparative Study." 2013. 1305-1145.