AUTHENTICATION SYSTEM FOR SECURITY ENHANCEMENT USING VOICE RECOGNITION

NORSHUHADA HASZARI

BACHELOR OF COMPUTER SCIENCE

(NETWORK SECURITY)

UNIVERSITI SULTAN ZAINAL ABIDIN

2017

AUTHENTICATION SYSTEM FOR SECURITY ENHANCEMENT USING VOICE RECOGNITION

NORSHUHADA BINTI HASZARI

Bachelor of Computer Science (Network Security)

Faculty of Informatics and Computing

Universiti Sultan Zainal Abidin, Terengganu, Malaysia

MAY 201


DECLARATION

I hereby declare that this report is based on my original work except for quotations and

citations, which have been duly acknowledged. I also declare that it has not been

previously or concurrently submitted for any other degree at Universiti Sultan Zainal

Abidin or other institutions.

________________________________

Name : ..................................................

Date : ..................................................


CONFIRMATION

This is to confirm that:

The research conducted and the writing of this report were carried out under my supervision.

________________________________

Name : ..................................................

Date : ..................................................

DEDICATION

First and foremost, praise be to Allah for the love and strength He has given me to complete my final year project entitled “Authentication System for Security Enhancement using Voice Recognition.” I thank Him for His blessings in my daily life, good health, a healthy mind and good ideas, although I had to go through some difficulties along the way.

I take this opportunity to express my gratitude and deep regards to those who contributed to the completion of this project. First, I would like to express my sincere gratitude to my supervisor, Puan Siti Dhalila binti Mohamad Satar, for her motivation, support, advice and experience. Her guidance helped me at all times in completing this final year project.

Last but not least, I would also like to thank my family members for their concern, encouragement and understanding.

Finally, thanks to those who have contributed directly or indirectly to the success of this project whose names I have not mentioned specifically. Without them, this project would not have been successful.

ABSTRACT

Voice recognition is one of the biometric technologies used in security systems to reduce cases of fraud. This project presents an authentication system that uses voice recognition to enhance security when entering a system. Voice recognition provides a significant increase in security. The problem arises when individuals have too many passwords, and one reason for this is the World Wide Web (WWW). An increasing number of Web sites ask users to register, requesting both a user name and a password. Since most of these registrations are free and are used for marketing purposes rather than security, the result is a growing number of passwords. With the proposed authentication system, a hacker or attacker will not be able to get into the main page and access the user’s information. Voice recognition is more secure because our voices are unique. The system cannot be cheated by mimicking a voice, and it will recognize the voice even if the speaker has a cold or is in a loud office. The system built is user-friendly, quick to learn and easy to understand. It eliminates the problems associated with ID PIN authentication systems. Three processing stages are used to analyze the voice signal: feature extraction, vector quantization and feature matching. After the voice is analyzed, the speaker whose voice has the highest match percentage against the database is allowed to enter the system.

ABSTRAK

Voice recognition is one of the biometric technologies used in security systems to reduce cases of fraud. We introduce an authentication system that uses voice recognition to improve security for entering a system. To be authenticated, this project plans to provide an authentication scheme, namely voice recognition. Voice recognition provides a significant increase in security. The problem occurs when some individuals have too many passwords. One of the reasons is the World Wide Web (WWW). More and more websites ask users to register, requesting both a user name and a password. Because most of these registrations are free and are used only for marketing purposes rather than security, the result is a large increase in passwords. So, with the use of an authentication system, hackers will not be able to get into the main page and access user information. Voice recognition is more secure because our voices are unique. The system cannot be fooled by imitating a voice, and it will recognize the voice even if you have a fever or are in a noisy area. The system built is user-friendly, quick to learn and easy to understand. It eliminates the problems associated with ID PIN authentication systems. Three processing stages were used to analyze the voice signal: feature extraction, vector quantization and feature matching. After the voice is analyzed, the highest correct percentage matching the database will be allowed to enter the system.

CHAPTER    TITLE

DECLARATION
CONFIRMATION
DEDICATION
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF ABBREVIATIONS

CHAPTER I    INTRODUCTION
    1.1 Background of study
    1.2 Problem statement
    1.3 Objectives
    1.4 Scopes
    1.5 Limitation of work

CHAPTER II    LITERATURE REVIEW
    2.1 Introduction
    2.2 Benefits of Bio-metric Technology
    2.3 Classes of Bio-metric Technology
        2.3.1 Face Recognition
        2.3.2 Iris Recognition
        2.3.3 Fingerprint recognition
        2.3.4 Voice recognition
        2.3.5 Text dependent and text independent
    2.4 Feature extraction and feature matching
        2.4.1 Mel frequency Cepstrum coefficient (MFCC)
        2.4.2 Linear predictive coding (LPC)
        2.4.3 Perceptual linear predictive (PLP)
        2.4.4 Vector Quantization (VQ)
        2.4.5 Hidden Markov Model (HMM)
        2.4.6 Gaussian Mixture Model (GMM)
    2.5 Conclusion

CHAPTER III    METHODOLOGY
    3.1 Introduction
    3.2 Waterfall Methodology
    3.3 Software implementation
        3.3.1 MATLAB
        3.3.2 Mel frequency Cepstrum coefficient (MFCC) Approach
        3.3.3 Frame Blocking
        3.3.4 Windowing
        3.3.5 Fast Fourier Transform (FFT)
        3.3.6 Mel frequency wrapping
        3.3.7 Cepstrum
    3.4 Vector Quantization Approach
    3.5 Feature Matching
    3.6 Project overview
    3.7 Conclusion

FIGURE NO.    TITLE

1    Classes of biometric
2    Waterfall model
3    Framework of voice recognition
4    The capture of voice signal in 5 seconds

LIST OF ABBREVIATIONS

DFT - Discrete Fourier Transform
FFT - Fast Fourier Transform
FORTRAN - Formula Translation
GMM - Gaussian Mixture Model
HMM - Hidden Markov Model
LPC - Linear Predictive Coding
MATLAB - Matrix Laboratory
MFCC - Mel Frequency Cepstrum Coefficient
PLP - Perceptual Linear Predictive
RFID - Radio Frequency Identification
VQ - Vector Quantization

CHAPTER 1

INTRODUCTION

1.1 Background Of Study

Security is the degree of resistance to, or protection from, harm or risk. It applies to any vulnerable and valuable asset, such as a dwelling, community, nation or government. Security can also be described as defending important things from threat or danger, theft and fraud. Nowadays, security systems are widely used in various sectors, such as industry and housing. Besides, many applications require reliable and secure authentication methods to confirm the identity of an individual requesting a service.

In a computer security system, human factors are considered the weakest link, particularly as users become ever more dependent on devices. There are three major areas where interaction between human and computer is important: authentication, developing secure systems and security operations. Here, we focus on the authentication problem.

Passwords and user identity PIN numbers (ID PINs) are the most commonly used authentication methods. The basic protocol of password and ID PIN protection is to remember a secret for security access control. Longer and more complex passwords are created to maintain security. However, a long and complex password is easily forgotten. The Radio Frequency Identification (RFID) tag is another technology used in security systems, but an RFID tag can be damaged, for example by electrostatic charge. Technology improves with every passing day, and there are now alternative means of authentication other than password and PIN protection. Research on authentication systems has progressed, and developers have proposed using voice biometrics as the medium in the authentication scheme for security access control. For example, electrical appliances can be controlled using voice as the medium. Therefore, in this project voice is used as the medium in the authentication system. More importantly, with voice as the input, computers and handheld devices do not need a keyboard to be useful; they can be used anywhere and with any language, since to the computer speech is just a pattern.

1.2 Problem Statement

Nowadays, security systems are important for preventing the loss of belongings, property and confidential transactions. Conventional security systems are widely used as the authentication method for access control: magnetic cards and RFID tags are used for physical access, attendance and identification, while passwords and user ID PINs allow a user to enter certain premises.

Passwords and user ID PINs need to be long and complex to increase security, but they can then be forgotten. Typically, users choose a common password such as a birthday or anniversary date to help them memorize it, and such common passwords are easily guessed. Moreover, passwords, ID PINs and magnetic cards can be stolen, duplicated (if the user uses a key) or shared. This allows an imposter to access a restricted area without the user's authorization, because a conventional security system cannot detect whether the person accessing the system is the genuine user or an imposter.

Therefore, this project applies a biometric technology to raise the level of security above that of conventional security systems. Biometric technology is hard to fool and cannot be stolen or guessed like a password. The basic protocol of biometric technology is to determine an unknown identity by comparing the claimed identifier against the database. It is a more convenient method of authentication than the conventional system.

1.3 Objectives Of Project

Objectives are:

1. To identify features of the voice by using feature extraction and feature matching.

2. To design and develop a security system based on voice recognition.

3. To test and analyze the voice recognition security system.

1.4 Scope of Project

Several scopes need to be identified in order to achieve the objectives of this project. The project requires a software implementation, for which MATLAB is used to write the source code. Several techniques were implemented in MATLAB.

1.5 Limitation of Work

The system achieves high accuracy and efficiency in a noiseless environment. In a noisy environment, noise is still picked up. The algorithm is simple and easy to implement, but it can suppress only a small amount of noise.

CHAPTER 2

LITERATURE REVIEW

2.1 Introduction

The French anthropologist Alphonse Bertillon created the first real biometric system. Using detailed body measurements, physical descriptions and photographs, he developed an identification system [norulashikin,2014]. The word biometric is taken from Greek, in which bio means life and metric means measure. Combining these two words, biometrics can be defined as the measurement (study) of life, which includes humans, animals and plants. It can also be called life measurement.

Biometrics is an automated method of establishing an individual's identity based on unique physiological and behavioral characteristics. Behavioral characteristics are based on the unique way people do things, such as saying their name or singing. In other words, each person has a different voice pitch or keystroke pattern. Physiological characteristics relate to parts of the human body, such as fingerprints, facial features and retina patterns.

A biometric system can operate in two modes: verification or identification. Verification is based on a unique identifier that singles out a particular person together with that individual’s biometric, whereas identification is based only on the biometric measurement, which is compared against the entire database of recorded measurements.

Biometric authentication is more reliable in verifying and recognizing a person's identity and is widely used in applications. Nowadays, many applications need a reliable and secure authentication system to confirm the identity of individuals when they request a service. Biometric systems are also widely used in security-related fields such as law enforcement, database access services, confidential transactions and others.

2.2 Benefits of Biometric Technology

Biometric technology is used in security systems because it offers a higher level of security: a biometric cannot be stolen, recreated, forgotten, shared or lost. The technology has several properties that account for this high security level. Universality means that every person has the characteristic; for example, everybody has a face, a mouth and a voice. Distinctiveness means that people differ enough in the characteristic to be distinguished from one another. Acceptability means that people accept the method and do not find it bothersome or invasive. In terms of performance, biometric technology is mostly accurate and successful, and it is nearly impossible to forge. Finally, circumvention, the ability of fraudulent people and techniques to fool the biometric system, should be negligible [2]. Compared with conventional security systems, biometric technology is more convenient to use.

2.3 Classes of Biometric Technology

Figure 1 : Classes of biometric

Figure 1 above shows the classes of biometric technology. Fingerprints and handwriting are the earliest biometric authentication methods, while more recent ones include face print, iris or eye print, fingerprint and voice print. Generally, face print is a process of automatically identifying or verifying a person from a digital image or a video source. An image of the face is captured and analyzed in order to derive a template.

2.3.1 Face Recognition

Face recognition can be divided into two main approaches: geometric and photometric. The geometric approach looks at distinguishing features of the human face, such as the eyes and mouth, while the photometric approach is a statistical approach that distills an image into values and compares the values with templates to eliminate variances [5]. This biometric technology is suitable for small applications, since it works only with photograph databases, video tape and cameras. However, face recognition needs sufficient light in order to give a good identification result. The basic weaknesses of this method appear when a person changes hairstyle, wears makeup or grows facial hair. Furthermore, when a person is exposed to the sun, false detection may occur during the identification phase.

2.3.2 Iris Recognition

Iris recognition is based on high-resolution images of the iris of an individual’s eye. This method needs high-quality camera technology, such as a kiosk-based system, which is the most expensive option but also the easiest to operate [2]. This biometric technology has a high capability to differentiate between individuals, and even between a user’s left and right eyes [2]. Among its benefits, it has the smallest outlier group (people who cannot use or enroll in the system) of all biometric technologies. It also has longevity: an iris pattern can last a lifetime. However, performance may be impaired by glasses, sunglasses and contact lenses. It also has few legacy databases.

2.3.3 Fingerprint Recognition

Fingerprint recognition is a classical, long-established method. Its basic protocol is to verify a match between two fingerprints. This method needs a specific device for scanning the fingerprint. Its advantages are that it is very practical in forensic investigation and that it has a very large legacy database of fingerprints. However, fingerprint recognition can be impossible when there is permanent or temporary damage to the fingerprint. In addition, a fingerprint scanner typically cannot read an oily or dry finger.

2.3.4 Voice Recognition

Voice is the common and natural way we communicate our ideas to others in our immediate surroundings. Voice recognition represents an important biometric field. It is the process of recognizing and identifying an unknown speaker by comparing the information in each individual's speech signal. This technology makes it possible to verify a person’s identity by voice in order to access various services, such as database access services, information services, voice mail, telephone banking and security control for confidential information areas. Voice recognition is one of the fastest-growing biometric technologies and is widely used by companies. One reason is that its data are reusable: voice data can be handled like any other data, which means a recording can be deleted and replaced with another. In telephone banking, for example, it provides an easy and comfortable way to identify customers without requiring them to come to the organization's premises. Besides, each person has a different voice pitch, including twins. This is in contrast with fingerprints, for instance, where the system can only ask the user to put a finger on the surface, or perhaps ask for a specific finger, which is just one of 10 possibilities [2].

Previously, systems were unable to reduce background noise enough to make the recording as clean as possible. However, with new algorithms and techniques the noise can be reduced. Furthermore, these systems require a person to speak loudly. The best feature for voice recognition is one that is robust against noise and distortion and has a low dimension, which avoids high cost in developing the system.

Voice recognition can be classified into two categories: voice verification and voice identification. Voice verification is the process of accepting or rejecting the identity claim of a speaker. When an unknown speaker claims an identity, his or her voice print is compared with the database model for that identity, and the claim is accepted if the two match. In voice identification, when an input speech signal from an unknown speaker reaches the system, the system analyzes the voice and compares it with speech samples of all known speakers. The unknown speaker is identified as the best-matching model in the database. The difference between verification and identification is the number of decision alternatives. Verification has two choices, accept or reject, while in identification the number of possible results equals the size of the population. Thus, voice verification does not depend on the size of the population, whereas identification does.
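The difference in the number of decision alternatives can be illustrated with a minimal Python sketch. The speaker names, distance values and threshold below are hypothetical and chosen only for illustration; a lower distance is assumed to mean a closer match.

```python
def verify(distance_to_claimed_model, threshold):
    """Verification: accept or reject one claimed identity (two outcomes)."""
    return distance_to_claimed_model <= threshold


def identify(distances_by_speaker):
    """Identification: pick the enrolled speaker with the smallest distance.

    The number of possible outcomes equals the number of enrolled speakers.
    """
    return min(distances_by_speaker, key=distances_by_speaker.get)


# Hypothetical distances between an unknown voice and three enrolled models.
distances = {"alice": 4.2, "bob": 1.3, "carol": 2.9}

# Verification of the claim "I am bob" against a hypothetical threshold.
print(verify(distances["bob"], threshold=2.0))

# Identification over the whole enrolled population.
print(identify(distances))
```

Verification touches only the claimed speaker's model, so its cost stays constant as the population grows, while identification has to score every enrolled model.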

2.3.5 Text Dependent and Text Independent

A text-dependent system requires a person to speak exactly the enrolled or given password during both enrollment and identity verification. The system then compares the voice print from verification with the database model from enrollment. In addition, text-dependent recognition seeks to associate an unknown speaker with a member of a (registered) population, given a textual transcription of the phrases uttered by the speaker [1]. Several kinds of system have been listed: fixed password systems, user-specific text-dependent systems, vocabulary-dependent systems, speech-event-dependent systems and machine-driven text-independent systems (Merlin, 1991). A text-dependent system can recognize the speaker only when the expected words are used. Nowadays, most voice biometric systems are text dependent, which suits real applications. In contrast, a text-independent system verifies the unknown speaker without any limitation on the speech content. It needs appropriate training and testing to attain good performance, and it requires a longer enrollment process. In general, text-independent recognition is more challenging than text-dependent recognition.

2.4 Feature Extraction and Feature Matching

There are two main modules in voice recognition: feature extraction and feature matching. Feature extraction can be described as the process by which the database is built. It extracts a small amount of data from the voice signal that can later be used to characterize each speaker [2]. In particular, it discards various sources of information, such as whether the sound is voiced or unvoiced and, if voiced, the effects of periodicity or pitch, the amplitude of the excitation signal and the fundamental frequency [9]. In this project, the short-term spectrum was chosen because it is easy to implement, easy to extract and does not necessarily need a large amount of data. There are several feature extraction methods, namely Mel Frequency Cepstrum Coefficient (MFCC), Linear Predictive Coding (LPC) and Perceptual Linear Predictive (PLP).

2.4.1 Mel Frequency Cepstrum Coefficient (MFCC)

This algorithm is the most popular and best known in the voice recognition industry. MFCC mimics the human ear, on the reasoning that the computer should perceive sound in the way we do, since our own understanding of speech also comes through our ears. MFCC-based feature vectors are extracted from the voice signal. Several steps are taken to calculate the MFCC: framing, Hamming windowing, the Fast Fourier Transform (FFT) and Mel frequency wrapping, after which the MFCC can be calculated using the equation below.
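As an illustration of the last two steps, the following Python sketch shows one common way to map frequencies onto the mel scale and to turn log filter-bank energies into cepstral coefficients with a cosine transform. It is a sketch under assumptions, not the report's own equation: the mel formula (2595 and 700 constants), the function names and the number of coefficients are standard textbook choices introduced here for illustration.

```python
import math


def hz_to_mel(freq_hz):
    """Map a frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_cepstrum(log_energies, num_coeffs):
    """Cosine transform of K log filter-bank energies.

    c_n = sum_{k=1..K} log_energies[k-1] * cos(n * (k - 0.5) * pi / K)
    for n = 1 .. num_coeffs (the n = 0 term, the mean level, is skipped).
    """
    k_total = len(log_energies)
    coeffs = []
    for n in range(1, num_coeffs + 1):
        c_n = sum(
            log_energies[k - 1] * math.cos(n * (k - 0.5) * math.pi / k_total)
            for k in range(1, k_total + 1)
        )
        coeffs.append(c_n)
    return coeffs


# The mel scale is roughly linear below 1 kHz and logarithmic above it.
print(hz_to_mel(1000.0), hz_to_mel(4000.0))
```

A perfectly flat set of log energies carries no spectral shape, so all of its coefficients for n ≥ 1 come out as zero; this is a quick sanity check on the transform.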

2.4.2 Linear Predictive Coding (LPC)

LPC is one of the traditional methods of feature extraction and was a very popular feature in early voice recognition. Its basic principle is the assumption that a speech signal is produced by a buzzer at the end of a tube, with occasional added hissing and popping sounds. Although this seems crude, it is actually close to the reality of how speech is produced. LPC analyzes the speech signal by estimating the intensity and frequency of the buzz. This feature extraction method is not well suited to real-time applications, because comparing LPC vectors requires the Itakura-Saito distance to measure similarity, which is an expensive measurement.

2.4.3 Perceptual Linear Predictive (PLP)

This feature extraction method is similar to LPC analysis in that it is based on the short-term spectrum. Its basic principle is to minimize the differences between speakers while preserving the important information in the speech signal. In contrast to LPC, PLP modifies the short-term spectrum using psychophysically based transformations. It relies on three concepts:

The critical-band spectral resolution

The equal-loudness curve

The intensity-loudness power law

Feature matching is the process of identifying an unknown speaker by comparing that speaker with the voice recognition database. More precisely, feature matching involves the actual procedure of identifying the unknown speaker by comparing the features extracted from his or her voice input with those from a set of known speakers. Vector Quantization (VQ), Hidden Markov Model (HMM), Gaussian Mixture Model (GMM) and Dynamic Time Warping (DTW) are examples of feature matching, or pattern matching, techniques.

2.4.4 Vector Quantization (VQ)

For feature matching, this project uses vector quantization (VQ), a classical quantization technique. It works by dividing a large set of points (vectors) into groups having approximately the same number of points closest to them, where each group is represented by its centroid point. This method is powerful for identifying the density of large, high-dimensional data. Commonly occurring data have low error, while rare data have high error, which also makes VQ suitable for lossy data compression. In VQ, the set of code vectors is called a codebook. In the training process, a VQ codebook is generated for each speaker by clustering that speaker's acoustic vectors in the database.
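Codebook training by clustering can be sketched in Python as follows. The sketch uses plain k-means with a naive deterministic start, swapped in purely for illustration; the text above does not say which clustering algorithm builds the codebook, and the function names, codebook size and iteration count are assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vector, codebook):
    """Index of the codeword closest to the given vector."""
    return min(
        range(len(codebook)),
        key=lambda i: squared_distance(vector, codebook[i]),
    )


def train_codebook(vectors, size, iterations=10):
    """Cluster acoustic vectors into `size` centroids with plain k-means."""
    # Naive start: the first `size` vectors serve as initial codewords.
    codebook = [list(v) for v in vectors[:size]]
    for _ in range(iterations):
        groups = [[] for _ in codebook]
        for v in vectors:
            groups[nearest(v, codebook)].append(v)
        for i, group in enumerate(groups):
            if group:  # keep the old codeword if no vector fell in this cell
                codebook[i] = [sum(col) / len(group) for col in zip(*group)]
    return codebook


# Toy 2-D "acoustic vectors" forming two obvious clusters.
training = [[0.0, 0.0], [10.0, 10.0], [0.0, 2.0], [10.0, 12.0]]
print(train_codebook(training, size=2))
```

Each speaker would get a separate codebook trained only on that speaker's vectors, so a codebook acts as a compact model of one voice.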

2.4.5 Hidden Markov Model (HMM)

A Hidden Markov Model is a statistical Markov model in which the system being modeled is assumed to be a Markov process with hidden states. In voice recognition, HMM is a modern approach that is popular for feature matching. It suits speech because, over short time spans, a speech signal can be approximated as a stationary process. Besides, HMMs can be trained automatically and are simple and practical to use.

2.4.6 Gaussian Mixture Model (GMM)

GMM is a modern technique that is more accurate than the other feature matching methods. Two principles motivate this technique. First, the component densities make it possible to represent speaker-dependent vocal tract configurations. Second, a linear combination of Gaussian functions is capable of representing a large class of sample distributions.

GMM can be interpreted in different ways depending on which principle is emphasized. More specifically, the distribution of the feature vectors extracted from the speech signal is modeled by a Gaussian mixture density, shown in the equation below.
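For reference, the standard textbook form of a Gaussian mixture density is given here in LaTeX. The symbols are assumptions introduced for illustration and are not taken from the report: x is a D-dimensional feature vector, M is the number of mixture components, w_i are the mixture weights, and μ_i and Σ_i are the mean vector and covariance matrix of component i, collected in the speaker model λ.

```latex
p(\mathbf{x} \mid \lambda) = \sum_{i=1}^{M} w_i \, g(\mathbf{x} \mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i),
\qquad \sum_{i=1}^{M} w_i = 1,

g(\mathbf{x} \mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)
  = \frac{1}{(2\pi)^{D/2} \, |\boldsymbol{\Sigma}_i|^{1/2}}
    \exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^{\top}
    \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \right)
```

In this form each speaker is represented by one parameter set λ, and an unknown utterance is scored by evaluating the density of its feature vectors under each speaker's model.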

2.5 Conclusion

Voice recognition is one of the best biometric technologies in the authentication industry. Two modules are involved in voice recognition: feature extraction and feature matching. In this project, the MFCC and VQ algorithms were used to implement these modules.

CHAPTER 3

METHODOLOGY

3.1 Introduction

This project consists of development work, namely software development. MATLAB was used to carry out the voice recognition process.

Since this project has a system as its main element, it is important that the stages of the project are carefully planned and executed, to ensure that the project is completed effectively and within the time given by the faculty. The design method adopted is the ‘Waterfall Model’.

3.2 Waterfall Model

The waterfall model is a popular version of the systems development life cycle model. It is so named because each element ‘flows’ into the next, much as water flows down a real waterfall. Waterfall development has distinct goals for each phase of development, where each phase is completed before the next one is started and there is no turning back.

Figure 2. Waterfall model

The stages of the waterfall methodology that this project is concerned with are, in order: Requirements, Design, Implementation, Verification and Maintenance. The requirements consist of a list of things that the finished project must do in order to be considered complete. In the design phase, the system and its framework are designed; the Mel Frequency Cepstrum Coefficient (MFCC) approach, the Vector Quantization approach and feature matching are used in the design. The implementation stage is the stage in which the design is actually turned into a real working system. The verification stage ensures that the system meets the requirements set in the earlier stage and that there are no problems with the system. Finally, the maintenance stage is concerned with installing the system and keeping it in working use. The perceived advantages of the waterfall process are that it allows for departmentalization and managerial control. A schedule is set with deadlines for each stage of development, and a project can proceed through the development process. In theory, this process leads to the project being delivered on time because each phase has been planned in detail.

The stages of this project's cycle are:

3.2.1 Requirement Phase

In the requirement phase, when a voice signal is captured for the voice recognition database, it goes through feature extraction using the MFCC approach in MATLAB. This approach converts the speech signal and extracts the important parameters for further analysis. The steps of the MFCC process are framing, Hamming windowing, the Fast Fourier Transform (FFT), Mel frequency wrapping, cepstrum and MFCC. The result of MFCC is a set of acoustic vectors, which are stored in the database. A speaker-specific codebook is generated by clustering the acoustic vectors produced by feature extraction. The distance between a vector and the closest codeword of a codebook is called the VQ distortion.
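The first three MFCC steps (framing, Hamming windowing and the Fourier transform) can be sketched in Python as follows. This is an illustration only, not the project's MATLAB code: the frame length, hop size and test signal are hypothetical, and a naive discrete Fourier transform stands in for a real FFT routine.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Frame blocking: overlapping frames of frame_len, advancing by hop."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def hamming(n):
    """Hamming window: w[i] = 0.54 - 0.46 * cos(2*pi*i / (n - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame):
    """|DFT|^2 of one frame (naive O(n^2) DFT standing in for an FFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n):
        x_k = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(x_k) ** 2)
    return spectrum


# Toy input: a sinusoid completing 2 cycles in every 16 samples.
signal = [math.sin(2 * math.pi * 2 * t / 16) for t in range(32)]
frames = frame_signal(signal, frame_len=16, hop=8)
window = hamming(16)
windowed = [[s * w for s, w in zip(frame, window)] for frame in frames]
spectra = [power_spectrum(frame) for frame in windowed]
```

The window tapers each frame towards zero at its edges, which reduces the spectral leakage caused by cutting the signal into blocks. The resulting power spectra are what the mel filter bank would then be applied to.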

The main task of the matching phase is to identify an unknown speaker by comparing the unknown speech input with the acoustic vectors in the voice recognition database. The match decision depends on the VQ distortion and the Euclidean distance. As mentioned previously, the value of the Euclidean distance may vary. When the VQ distortion is less than the Euclidean distance, the output is a match; when the VQ distortion is greater than the Euclidean distance, the output is no match.
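A minimal Python sketch of this decision is shown below. It rests on two assumptions that the text does not state explicitly: the VQ distortion is taken as the average Euclidean distance from each input vector to its nearest codeword, and the quantity it is compared with is treated as a fixed distance threshold. The function names and numbers are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def vq_distortion(vectors, codebook):
    """Average distance from each input vector to its nearest codeword."""
    total = sum(min(euclidean(v, c) for c in codebook) for v in vectors)
    return total / len(vectors)


def is_match(vectors, codebook, threshold):
    """Match when the distortion falls below the (assumed) threshold."""
    return vq_distortion(vectors, codebook) < threshold


# Hypothetical enrolled codebook and acoustic vectors of an unknown speaker.
codebook = [[0.0, 0.0], [10.0, 10.0]]
unknown = [[0.0, 1.0], [10.0, 9.0]]

print(vq_distortion(unknown, codebook))
print(is_match(unknown, codebook, threshold=2.0))
```

A small distortion means the unknown speaker's vectors fall close to the enrolled codewords. How strictly the threshold is set trades false acceptances against false rejections.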

3.2.2 Design Phase

In the design phase, we design the system and its framework.

Figure 3. Framework of voice recognition.

The captured voice signal is analyzed and its features are extracted. The Mel Frequency Cepstrum Coefficient (MFCC) approach captures the important parameters of the speech signal. The Vector Quantization approach is one of the techniques for feature matching. It is a traditional method, but it is commonly used in voice recognition because of its high accuracy and ease of implementation. In general, it is the process of mapping vectors from a large vector space into a finite number of regions in that space. Lastly, feature matching identifies the unknown speaker by comparing the unknown speech input with the acoustic vectors in the voice recognition database.

3.2.3 Implementation Phase

With inputs from the system design, the system is developed in stages, starting from capturing the voice signal, then feature extraction, vector quantization and lastly feature matching. Each stage is developed and tested for its functionality. MATLAB is used for all of these features.

3.2.4 Maintenance phase

Some issues will come up in the user environment. To fix those issues, we need to know the needs of the user, and the system must be user friendly. Maintenance is done to provide the user with the best and easiest-to-use system.

3.3 Software implementation

3.3.3 MATLAB

MATLAB can be described as a high-performance language for technical computing that integrates computation, visualization and a programming environment. MATLAB stands for MATrix LABoratory, and it was originally written to provide easy access to matrix software. MATLAB is widely used because it has sophisticated data structures, contains built-in editing and debugging tools and supports object-oriented programming. These features make MATLAB an excellent tool for solving technical problems compared with conventional programming languages such as C and FORTRAN. Its graphics commands make results immediately available and easy to understand. Moreover, MATLAB is platform independent because it runs on multiple platforms.

In industry, MATLAB is the tool of choice for high-productivity research, development and analysis, while in universities it is used as a standard instructional tool in mathematics, engineering and science. MATLAB also features a family of application-specific packages referred to as toolboxes. Signal processing, audio processing, symbolic computation, control theory and simulation are examples of MATLAB toolboxes. The major tools of the MATLAB desktop are the command window, command history window, workspace, current directory, help browser and start button.

3.3.2 Mel Frequency Cepstrum Coefficient (MFCC) Approach

The MFCC approach is used to develop the database for voice recognition. MFCC is the feature extraction method in the voice recognition process; it is used to capture the important parameters of the speech signal. The Mel Frequency Cepstrum Coefficients (MFCC) are a representation of the short-term spectrum of a sound. A sampling rate above 10000 Hz is used when recording the speech signal in order to minimize the effect of aliasing in the analog-to-digital conversion. In this system, a sampling rate of 22050 Hz was chosen. MFCC is designed to mimic the behavior of the human ear. The MFCC process produces a number of coefficients that characterize the processed speech, and these parameters are used in speaker recognition or speaker verification systems. The MFCC process consists of several steps: framing, Hamming windowing, Fast Fourier Transform (FFT), mel frequency wrapping, cepstrum and MFCC. Each step has its own function and analysis.

3.3.3 Frame Blocking

Frame blocking, or framing, is applied after the continuous voice signal is captured. The signal is blocked into frames of N samples, with adjacent frames separated by M samples (M < N), so that consecutive frames overlap. Typically, each frame covers 10 ms to 60 ms of speech. The purpose of frame blocking is to analyze the speech signal over short periods of time. Over a short period, the characteristics of the speech signal are nearly stationary, which makes the signal easy to analyze. Over a long period, the characteristics of the speech signal may change. The figure below shows a sample of the speech signal of an unknown speaker.

Figure 5: Capture of the voice signal over 5 seconds.
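The frame blocking step can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the MATLAB code from the Appendix. The function name `frame_signal` and the choice to drop trailing samples that do not fill a whole frame are assumptions made for this example.

```python
def frame_signal(samples, N, M):
    """Block a signal into frames of N samples, advancing by M samples (M < N).

    Trailing samples that do not fill a complete frame are dropped
    (an assumption of this sketch, not necessarily the project's behaviour).
    """
    if not 0 < M <= N:
        raise ValueError("M must satisfy 0 < M <= N")
    frames = []
    start = 0
    while start + N <= len(samples):
        frames.append(samples[start:start + N])
        start += M
    return frames


# Ten samples, frames of N = 4 samples, adjacent frames M = 2 samples apart.
frames = frame_signal(list(range(10)), N=4, M=2)
print(len(frames))  # 4 frames, each sharing 2 samples with its neighbour
```

With M smaller than N, each frame overlaps the previous one by N - M samples, so no part of the signal falls only on a frame boundary.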

3.3.4 Windowing

The next step is windowing, in which each individual frame is multiplied by a window to minimize the signal discontinuities at the beginning and the end of the frame. This also minimizes spectral distortion, because the window tapers the signal toward zero at the beginning and the end of the frame. The Hamming window is defined in equation 3.1.

$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1 \qquad (3.1)$$

where N is the number of samples in each frame.
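As a hedged illustration of the windowing step, the sketch below builds a Hamming window using its standard definition, w(n) = 0.54 - 0.46 cos(2πn/(N-1)), and applies it to one frame. It is written in Python and is not the MATLAB code from the Appendix. The function names `hamming` and `apply_window` are hypothetical.

```python
import math


def hamming(N):
    """Return the N-point Hamming window (standard definition)."""
    if N < 1:
        raise ValueError("N must be at least 1")
    if N == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def apply_window(frame):
    """Multiply each sample of a frame by the matching window value."""
    window = hamming(len(frame))
    return [x * w for x, w in zip(frame, window)]


# A constant frame is tapered at both ends and left unchanged in the middle.
print(apply_window([1.0, 1.0, 1.0, 1.0, 1.0]))
```

Note that the Hamming window tapers the frame edges to 0.08 of their original value rather than exactly to zero.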

3.3.5 Fast Fourier Transform (FFT)

The FFT algorithm is due to Cooley and Tukey. Its purpose here is to convert each frame from the time domain to the frequency domain. The FFT is a fast algorithm for computing the Discrete Fourier Transform (DFT), and it improves the performance of the system.
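To show how a frame moves from the time domain to the frequency domain, the following is a minimal sketch of a recursive radix-2 Cooley-Tukey FFT. It is written in Python for illustration only. It assumes the frame length is a power of two, and the function names `fft` and `magnitude_spectrum` are hypothetical; a numerical environment's built-in FFT routine would normally be used in practice.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("frame length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # DFT of the even-indexed samples
    odd = fft(x[1::2])   # DFT of the odd-indexed samples
    spectrum = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        spectrum[k] = even[k] + twiddle
        spectrum[k + n // 2] = even[k] - twiddle
    return spectrum


def magnitude_spectrum(frame):
    """Magnitude of each frequency bin of one frame."""
    return [abs(c) for c in fft(frame)]


# A constant frame has all of its energy in bin 0 (zero frequency).
print(magnitude_spectrum([1, 1, 1, 1]))
```

Splitting the frame into even and odd samples at each level is what reduces the cost from the order of N² operations for a direct DFT to the order of N log N.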

3.3.6 Mel Frequency wrapping

The next processing step is mel frequency wrapping. Psychophysical studies have shown that human perception of the frequency content of sounds for speech signals does not follow a linear scale. Thus, for each tone with an actual frequency f, measured in Hz, a subjective pitch is measured on a scale known as the mel scale. The mel value for a given frequency can be computed using equation 3.2.

$$\mathrm{mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right) \qquad (3.2)$$

As can be seen, the parameter extracted for voice recognition is related to the pitch of the voice, and the perceived pitch increases with frequency.
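The conversion between Hz and mels can be sketched as follows. The sketch assumes the commonly used mapping mel(f) = 2595 log10(1 + f/700). It is written in Python for illustration, and the function names `hz_to_mel` and `mel_to_hz` are hypothetical.

```python
import math


def hz_to_mel(f):
    """Convert a frequency in Hz to mels (commonly used 2595/700 form)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse mapping: convert mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


# The scale is roughly linear at low frequencies and logarithmic above 1 kHz,
# so equal steps in Hz become smaller steps in mels as frequency rises.
for f in (100, 1000, 4000, 8000):
    print(f, round(hz_to_mel(f), 1))
```

Under this mapping, 1000 Hz corresponds to approximately 1000 mels, which is the usual reference point of the scale.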

3.3.7 Cepstrum

The final step is to convert the log mel spectrum back to the time domain. The cepstral representation of the speech spectrum provides a good representation of the local spectral properties of the signal for the given frame. The result of this conversion is the set of Mel Frequency Cepstrum Coefficients (MFCC), and the MFCCs of each frame form an acoustic vector. The acoustic vectors are stored in the train folder, that is, the voice recognition database, and are used in the feature matching process.

All of these steps were implemented in MATLAB, which provides a built-in function for the MFCC approach. All the source code is shown in the Appendix.
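The cepstrum step can be illustrated with a short sketch that takes the mel filter bank energies of one frame, applies a logarithm and then a discrete cosine transform (DCT) to obtain the coefficients. It is written in Python and is not the Appendix code. The function name `mel_cepstrum`, the small floor applied before the logarithm, and the inclusion of the zeroth coefficient are assumptions of this example.

```python
import math


def mel_cepstrum(mel_energies, num_coeffs):
    """Compute cepstral coefficients from the mel energies of one frame.

    c[n] = sum_k log(S[k]) * cos(n * (k + 0.5) * pi / K), for k = 0..K-1
    """
    K = len(mel_energies)
    # Floor the energies to avoid log(0); the floor value is an assumption.
    log_energies = [math.log(max(e, 1e-12)) for e in mel_energies]
    return [
        sum(log_energies[k] * math.cos(n * (k + 0.5) * math.pi / K) for k in range(K))
        for n in range(num_coeffs)
    ]


# Hypothetical mel energies for one frame (not measured data).
acoustic_vector = mel_cepstrum([2.0, 4.0, 3.0, 1.5, 1.0, 0.5], num_coeffs=4)
print(acoustic_vector)
```

The list returned for each frame plays the role of the acoustic vector described above. In many systems the zeroth coefficient is discarded because it mainly reflects the overall signal level.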

3.4 Vector Quantization Approach

Vector quantization (VQ) is one of the techniques used for feature matching. It is a traditional method, but it is still commonly used in voice recognition because it offers high accuracy and is easy to implement. In general, VQ is the process of mapping vectors from a large vector space onto a finite number of regions in that space. Each region is called a cluster and is represented by its center, called a code word or centroid. The collection of all code words is called a codebook. A speaker-specific codebook is generated for each known speaker by clustering his or her acoustic vectors obtained from feature extraction.
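As a hedged sketch of how a speaker-specific codebook could be generated, the example below clusters acoustic vectors with a plain k-means loop. This report does not state which clustering algorithm was used, so k-means is a stand-in chosen for brevity. The code is in Python, and the function name `train_codebook`, the initialisation from the first vectors and the fixed iteration count are all assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, size, iterations=20):
    """Cluster acoustic vectors into `size` code words using plain k-means."""
    if not 0 < size <= len(vectors):
        raise ValueError("size must be between 1 and the number of vectors")
    # Deterministic start: the first `size` vectors (an assumption).
    codebook = [list(v) for v in vectors[:size]]
    for _ in range(iterations):
        clusters = [[] for _ in codebook]
        for v in vectors:
            # Assign each vector to its nearest code word.
            nearest = min(range(size), key=lambda i: squared_distance(v, codebook[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old code word if its cluster is empty
                dim = len(members[0])
                codebook[i] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return codebook


# Hypothetical two-dimensional acoustic vectors forming two groups.
vectors = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
print(train_codebook(vectors, size=2))
```

Each code word ends up at the mean of the vectors assigned to it, which matches the description of a code word as the centroid of its region.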

Figure 3.3 shows the resulting code words, or centroids. The circle and the triangle at the center represent speaker 1 and speaker 2 respectively. The distance between a vector (sample) and its nearest code word is called the VQ distortion.
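The VQ distortion can be made concrete with a minimal sketch. It is written in Python and is not the Appendix code. It assumes that the distortion of an utterance is the average Euclidean distance from each acoustic vector to its nearest code word. The function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def vq_distortion(vectors, codebook):
    """Average distance from each vector to its nearest code word."""
    if not vectors or not codebook:
        raise ValueError("vectors and codebook must be non-empty")
    total = sum(min(euclidean(v, c) for c in codebook) for v in vectors)
    return total / len(vectors)


# Two hypothetical vectors against a one-word codebook at the origin.
print(vq_distortion([[0, 0], [3, 4]], [[0, 0]]))  # (0 + 5) / 2 = 2.5
```

A small distortion means the utterance lies close to the speaker's code words, which is the basis for deciding whether two voices match.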

3.5 Feature Matching

The purpose of feature matching is to identify the unknown speaker by comparing the unknown speech input with the acoustic vectors in the voice recognition database. The matching process depends on the VQ distortion and a Euclidean distance threshold. The threshold is set at the beginning of the code. It can be varied, and a suitable value has to be found by trial and error. The output is matched, and the unknown speaker is identified, when the VQ distortion is less than the threshold, while the output is unmatched when the VQ distortion is greater than the threshold.
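The matching decision can be sketched as follows. The example is in Python and is not the Appendix code. It assumes one codebook per enrolled speaker and a single threshold. The speaker names, vectors and threshold values are hypothetical, and a real threshold would have to be tuned by trial and error as described above.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def vq_distortion(vectors, codebook):
    """Average distance from each vector to its nearest code word."""
    return sum(min(euclidean(v, c) for c in codebook) for v in vectors) / len(vectors)


def match_speaker(test_vectors, codebooks, threshold):
    """Return (speaker, distortion); speaker is None when unmatched.

    The enrolled speaker with the lowest distortion is matched only if
    that distortion is strictly less than the threshold.
    """
    best_name, best_distortion = None, float("inf")
    for name, codebook in codebooks.items():
        d = vq_distortion(test_vectors, codebook)
        if d < best_distortion:
            best_name, best_distortion = name, d
    if best_distortion < threshold:
        return best_name, best_distortion
    return None, best_distortion


# Hypothetical enrolled codebooks and test utterance.
codebooks = {"speaker1": [[0.0, 0.0]], "speaker2": [[10.0, 10.0]]}
utterance = [[1.0, 0.0], [0.0, 1.0]]
print(match_speaker(utterance, codebooks, threshold=2.0))  # matched: speaker1
print(match_speaker(utterance, codebooks, threshold=0.5))  # unmatched: None
```

A lower threshold makes the system stricter, rejecting more impostors but also more genuine attempts. A higher threshold does the opposite.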

3.7 Project overview

The figure above shows a system overview of voice recognition. The system is text dependent, which means the speaker needs to speak a specific word. In the first stage, an unknown speaker gives an input speech signal using a microphone. The speech signal is recorded in a .wav (wave) file. Then, all the recorded speech signals are stored in the voice recognition database, or train folder.

The recordings in the voice recognition database go through feature extraction using the MFCC approach in MATLAB. This approach converts the speech signal and extracts the important parameters for further analysis. The MFCC process consists of several steps: framing, Hamming windowing, Fast Fourier Transform (FFT), mel frequency wrapping, cepstrum and MFCC. The result of MFCC is a set of acoustic vectors, from which a codebook is generated; the distance between an acoustic vector and a code word is called the VQ distortion.

The main task of the matching phase is to identify an unknown speaker by comparing the unknown speech input with the acoustic vectors in the voice recognition database. The matched output depends on the VQ distortion and a Euclidean distance threshold. As mentioned previously, the value of this threshold may vary. When the VQ distortion is less than the threshold, the output is matched; when the VQ distortion is greater than the threshold, the output is unmatched.

When the output is matched, the user is allowed to enter the system. When the output is unmatched, the user is denied entry.

3.8 Conclusion

In conclusion, the system consists mainly of software development, which focuses on developing a database for voice recognition. An Arduino microcontroller was used as the main hardware component in this project. The purpose of the hardware development is to demonstrate the functionality of the voice recognition security system.
