7/28/2019 BIOEN 303 Final Project Report
Voice Recognition: A Discourse Between Man and Machine
BIOEN 303 Final Project
8 March 2007
Andy Chang
Charlie Huang
Kwang Kim
Jae Hyung Lee
Ali Ziadloo
Contents:
Abstract
Introduction
Background Information
Methods
Results
Discussion
Conclusion
References
Appendix (MATLAB code)
Abstract
The objective of this project was to develop a voice recognition program in MATLAB that accurately identifies speakers by their voice. The program is divided into three stages: (1) recording of the voice signal, (2) filtering of the signal, and (3) analysis and comparison of the signal with stored values in a pre-made database. In the analysis portion, formants were used to represent the power spectral density of the password phrase for each speaker. A technique using cepstrums was also implemented to determine the pitch of the speaker's voice and to differentiate between male and female speakers. The comparison section of the program differentiated between the members of this project group and also between group and non-group members. In comparing the input with the database of group member voices, points were given based on how many matches were found. A threshold score was set, depending on the security level wanted, and speakers were identified based on the difference between their total score and the threshold score. Over several test trials, group members were successfully identified 80% of the time, and non-group members (unknown speakers) were correctly identified as such 100% of the time. Based on these results, it was concluded that the methods employed in this project could be improved if given more time, but are nonetheless a good stepping stone for the development of a more advanced speaker identification system based on voice printing.
Introduction
The development of voice recognition systems began as early as the 1960s. Voice printing is a biometric method that compares the spectral content of the voice, which is characteristic of each individual and therefore difficult for others to imitate. In the future, this technique may replace normal access control means such as keys, locks, access cards, or password combinations, which unlock doors and grant access to the bearer regardless of whether or not the bearer is supposed to have access to the restricted area.
Speaker identification research continues today in the field of digital signal processing, where many advances have been made in recent years. The concept of a human-computer interface is gradually entering the mainstream as it has proven its usefulness for a variety of applications. Speech recognition plays an important role in an increasing number of our daily activities, such as speech-to-text programs and voice-activated household appliances. Through further development of this emerging technology, everyone may have the opportunity to participate in the discourse between man and machine. [1]
For our current design project, a digital speaker identification program was developed to differentiate each member of the project team by voice. In addition, the program detects when the speaker's voice does not belong to a member of the project team, and also distinguishes between male and female speakers. Although many improvements can still be made, the methods employed in our program may be used as a stepping stone for the design of more flexible and accurate voice recognition software. Such software may then be integrated into larger structures such as voice-activated security systems or appliances.
Background Information
Our voice recognition program was designed to use the acoustic features of speech and to emphasize the characteristics that are particular to each individual. Comparative methods using energy spectral density, speech pattern analysis, and voiceprints of pitch were employed to distinguish each individual's voice traits.
A formant [2] is a peak in the acoustic frequency spectrum (here, the energy spectral density of the voice) that results from the resonant frequencies of an acoustical system. Formants are determined by the resonant frequencies of each individual's vocal tract, so they may be used to distinguish the energy spectrum of one voice signal from another. The information that humans require to distinguish between vowels can be represented purely by the energy content of the vowel sounds. The formant with the lowest frequency is called f1, the second f2, and the third f3. The first two formant frequencies are usually enough to disambiguate individual vowels. Vowels usually have four formant peaks, but sometimes may have up to six. [3]
Pitch analysis in our program was performed with a method involving cepstrums. A cepstrum is the result of taking the Fourier Transform (FT) of the decibel (log-magnitude) spectrum:
Cepstrum = FT of the logarithm of the magnitude of the FT of the original signal.
The cepstrum technique operates in the domain of quefrency. Quefrency has units of time, but it measures the spacing of periodic structure in the spectrum rather than position in the time-domain signal. A peak in the quefrency domain indicates the presence of a harmonic series, and therefore of a pitch: it arises from the regularly spaced harmonics in the spectrum, and its quefrency equals the pitch period. [4]
Methods: Theory of Operation
Filtering:
After studying different passwords and analyzing their vowels, we decided to make our password a three-word phrase: "Let Me In." Using MATLAB's wavrecord function, the speaker's input voice signal was recorded and stored as a one-dimensional sequence of data. The first step in the filtering process was to separate the vowels from each word of the password and to determine the indices of the signal where the vowels start and end. We designed a low-pass (smoothing) filter by convolving a triangular window with the rectified voice signal to suppress background noise and to smooth out the signal. The function that implements this filter, filterZ2, has two inputs: the voice signal and a flag that is used to report whether the spoken phrase contains the expected number of vowels. Our chosen password consisted of three vowels, although filterZ2 is flexible and the number of vowels (the variable number in the code) can be changed if the password changes.
To distinguish between the vowels and consonants in each word of the password phrase, we removed the offset of the signal by subtracting its mean, took the absolute value, smoothed the result with the triangular window, and normalized it by dividing by its maximum value. Then, we defined a threshold cutoff value (0.2 on a scale from 0 to 1) so that the vowels would pass the threshold and the quieter, noisy parts would be removed from the signal by setting them to zero. After filtering the signal, the start and stop points of the vowels were found by identifying the indices of the first and last nonzero values in each vowel segment. These indices were returned as a matrix to be used by other functions in the next stages of our program.
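The segmentation steps described above can be sketched in a few lines of MATLAB. This is a minimal illustration rather than the filterZ2 code from the Appendix: the synthetic test signal, the hand-built triangular window (used so that no toolbox function is needed), and all variable names are assumptions made for the example.

```matlab
% Sketch of vowel segmentation by envelope thresholding (illustrative only).
Fs = 11025;                          % sampling rate used in the report
t  = (0:Fs-1)' / Fs;                 % one second of sample times
x  = 0.01 * sin(2*pi*60*t);          % weak hum standing in for background noise
burst = (t >= 0.2 & t < 0.4) | (t >= 0.6 & t < 0.8);
x(burst) = x(burst) + sin(2*pi*200*t(burst));   % two loud "vowel" bursts (assumed)

env = abs(x - mean(x));              % remove the offset, then rectify
N   = 512;                           % window length, as in filterZ2
w   = 1 - abs((1:N)' - (N+1)/2) / (N/2);   % hand-built triangular window
env = conv(env, w, 'same');          % smooth the rectified signal (low-pass)
env = env / max(env);                % normalize to the range 0..1

mask   = env > 0.2;                  % threshold from the report
d      = diff([0; mask; 0]);         % +1 where a segment starts, -1 just after it ends
starts = find(d == 1);
stops  = find(d == -1) - 1;
segments = [starts stops];           % one row per detected segment
```

Each row of segments holds the first and last sample index of one above-threshold stretch, which plays the role of the index matrix returned to the later stages.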
Formants:
Our program's main function, recordMain2, checks whether the password has three syllables, and if so, the signal data is passed to the comparison function comparor2. In comparor2, we used our formant-calculation function, formantgen, to get the spectra of different voices and to distinguish between speakers. In formantgen, the MATLAB function pyulear is called, which estimates the power spectral density (PSD) of the voice signal with an autoregressive (Yule-Walker) model. The peaks of this smooth spectrum are the formants of the voice, which arise from the resonances of the vocal tract and are used to differentiate between human voices. The order of the autoregressive model for the signal was set to 20 by trial and error to obtain the best results possible. We converted the results into decibels by taking the base-10 logarithm of the power spectrum and multiplying by 10, and passed the formants back to comparor2 for further evaluation.
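As a rough illustration of what an autoregressive spectrum estimate involves, the sketch below solves the Yule-Walker equations with base MATLAB functions and converts the resulting spectrum to decibels. It is not the pyulear implementation; the synthetic two-resonance signal, the resonance frequencies, and the variable names are assumptions chosen so that the expected peak locations are known in advance.

```matlab
% Sketch: Yule-Walker AR spectrum with base functions (not pyulear itself).
Fs = 11025;  p = 20;  nfft = 1024;   % model order 20, as in the report
r  = 0.97;                           % pole radius (assumed)
f0 = [700 1200];                     % hypothetical resonance frequencies in Hz
a_true = conv([1 -2*r*cos(2*pi*f0(1)/Fs) r^2], ...
              [1 -2*r*cos(2*pi*f0(2)/Fs) r^2]);
x = filter(1, a_true, [1; zeros(2047, 1)]);   % impulse response of the resonator

% Biased autocorrelation estimate for lags 0..p
N   = length(x);
rxx = zeros(p+1, 1);
for k = 0:p
    rxx(k+1) = sum(x(1:N-k) .* x(1+k:N)) / N;
end

% Yule-Walker equations: R * a = -r
R = toeplitz(rxx(1:p));
a = R \ (-rxx(2:p+1));
sigma2 = rxx(1) + rxx(2:p+1)' * a;   % prediction error variance

% AR power spectrum on the lower half of the frequency axis, in dB
A      = fft([1; a], nfft);
psd_db = 10*log10(sigma2 ./ abs(A(1:nfft/2)).^2);
freqs  = (0:nfft/2-1)' * Fs / nfft;

% Local maxima of the spectrum play the role of formant peaks
isPeak    = psd_db(2:end-1) > psd_db(1:end-2) & psd_db(2:end-1) > psd_db(3:end);
peakFreqs = freqs(find(isPeak) + 1);
```

For this test signal the spectral peaks fall close to the two assumed resonance frequencies, which is the property the formant comparison relies on.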
Cepstrums:
Besides the formant approach, a method involving cepstrums was also used to determine the pitch of the voice. In the pitchfinder function, we determined the cepstrum of the voice signal and smoothed the cepstrum using a lowpass Butterworth filter. Next, the pitch of the voice was found by dividing the sampling frequency by the index of the first maximum (the indices of the cepstrum represent quefrency, in samples). This method was used specifically to determine the sex of the subject, since most women have a higher pitch than men. If the pitch was calculated to be higher than 185 Hz, then the speaker was determined to be female.
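The pitch calculation can be illustrated with a short sketch. It is not the pitchfinder function: the Butterworth smoothing step is omitted, an impulse train stands in for a voiced sound, and the 80 Hz to 300 Hz search range and the variable names are assumptions made for the example.

```matlab
% Sketch: pitch from the real cepstrum of a synthetic periodic signal.
Fs = 11025;                          % sampling rate used in the report
N  = 4096;
x  = zeros(N, 1);
x(1:75:end) = 1;                     % impulse train, period 75 samples (147 Hz)

% Real cepstrum; eps guards against log(0). c(q+1) holds quefrency q.
c = real(ifft(log(abs(fft(x)) + eps)));

qmin = round(Fs / 300);              % shortest pitch period searched (assumed)
qmax = round(Fs / 80);               % longest pitch period searched (assumed)
[~, i] = max(c(qmin+1 : qmax+1));    % strongest cepstral peak in that range
q     = qmin + i - 1;                % quefrency of the peak, in samples
pitch = Fs / q;                      % pitch in Hz

isFemale = pitch > 185;              % decision threshold quoted in the text
```

Restricting the search to a plausible range of pitch periods keeps the low-quefrency part of the cepstrum, which describes the spectral envelope, from being mistaken for the pitch peak.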
Data Bank and Comparison:
At this point in our program, all necessary calculations have been made to compare the input
voice signal with a pre-made data bank and to determine whether or not the speaker is a member
of the group. The data bank stores several copies of the password audio of all group members.
Each word of the password phrase was recorded several times separately, and the formant of each recorded vowel and its peaks were gathered. The mean of the indices and the magnitude of the peaks of the formants of each vowel were also collected in this data bank for each group member.
After finding the formants of the subject's voice, the peakfinder4 function was used to obtain the peaks of each vowel's formant and to compare them with the average of the peaks of the formant for each group member in the data bank. For each matching index, the subject gained one point. The magnitude of the first formant peak (the "ae" vowel in "Let") was then compared to the corresponding magnitude value in the database. If the magnitude was within one standard deviation of the corresponding magnitude data in the database, the speaker gained two additional points. This process was repeated for the other two vowels ("ee" for "Me" and "i" for "In") of the subject's voice signal. After the input voice was compared to all the entries in the database, the individual scores were summed to give the total score for each person in the database. After testing the program several times and analyzing the scores, we set the passing threshold to a score of 18 so that we could get the right match without making it too hard to pass the test.
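The point system can be sketched as follows for a single vowel and a single database entry. The template values, the measured peaks, and the column layout are hypothetical and are not taken from the project's data bank; only the scoring rule (one point for an index match, two more for a magnitude within one standard deviation) follows the description above.

```matlab
% Sketch of the point-based peak comparison (hypothetical data).
% Template columns: [peak index, index tolerance, mean magnitude (dB), magnitude std (dB)]
template = [ 12 1 -20.0 1.5
             30 2 -28.0 2.0
             47 2 -35.0 1.0 ];
% Peaks measured from the input voice: [peak index, magnitude (dB)]
measured = [ 13 -21.0
             31 -31.5
             60 -35.2 ];

score = 0;
for k = 1:size(measured, 1)
    for l = 1:size(template, 1)
        if abs(measured(k,1) - template(l,1)) <= template(l,2)
            score = score + 1;       % index (x-axis) match: one point
            if abs(measured(k,2) - template(l,3)) <= template(l,4)
                score = score + 2;   % magnitude (y-axis) match: two more points
            end
        end
    end
end
```

Here the first measured peak matches in both index and magnitude (3 points), the second matches in index only (1 point), and the third matches nothing, for a score of 4.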
To add more security to our program, we extended our analysis to cases where the final score is slightly below the threshold. If the input voice signal got a total score just below the threshold, the program moved on to a second method in which the first two peaks of each vowel were compared in terms of indices and magnitudes, and the percentage difference was calculated. If the second method agreed with the result of the first method, the person was admitted. Any voice input that did not pass the two tests was rejected.
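The two-stage decision can be summarized in a short sketch. The passing score of 18 is the threshold stated above and the lower bound of 10 is the value that appears in the Appendix code; the example scores, the result attributed to the second method, and the variable names are hypothetical.

```matlab
% Sketch of the pass / second-chance / reject decision (hypothetical scores).
sums           = [12; 16; 9; 7];     % total score for each database entry
passScore      = 18;                 % passing threshold from the report
retryScore     = 10;                 % lower bound for a second chance (Appendix)
secondMethodId = 2;                  % entry chosen by the second method (assumed)

[best, candidate] = max(sums);
if best >= passScore
    identity = candidate;            % clear match
elseif best >= retryScore && secondMethodId == candidate
    identity = candidate;            % borderline score confirmed by second method
else
    identity = 0;                    % unknown speaker
end
```

In this example the best score (16) falls short of the threshold, but the second method points to the same entry, so the speaker is admitted as entry 2.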
Figure 1 shows a flowchart that summarizes our methods. All of our program's MATLAB functions are presented in the Appendix.
Methods: Test Protocol
To create the database for our group, each member spoke the password at least ten times, and the averages and ranges of each vowel's formant index and peak were recorded and stored. Next, each group member went through ten test trials where recordMain2 determined the identity of the speaker each time. These tests demonstrated the voice recognition abilities of our program.
Our program was also tested on seven classmates whose voices were not stored in our database.
These tests demonstrated the security capabilities of our program; if the speaker was not a
member of our group, our program indicated so. Our pitch detection method was concurrently
tested; if the speaker was female, our program identified her as an unknown female subject.
Figure 1: Summary of our voice recognition program.
Results
Figure 2 shows an example of the original input voice signal before filtering. Note that background noise was present in the signal.
Figure 2: Original input voice signal before any filtering.
The filtered signal is shown in Figure 3. Only the words of the password were passed; all other
background noise was zeroed out. Also, as shown in the figure, the signal was normalized to
show the difference in magnitude between the words of the voice signal.
Figure 3: Filtered and normalized input voice signal.
In Figure 4, the formants for two different subjects are shown. As one can see, the location and magnitude of the peaks were different and distinguishable. The formants of each word were determined separately and compared with the data bank. The best matches were used to identify the speaker.
Figure 4: Formant comparison between two speakers for each vowel in the password phrase.
A sample result from our pitch detection method is shown in Figure 5. The first peak in this example was at index 50, and the fundamental frequency was calculated by dividing the sampling frequency by the index of the peak (pitch = 11025/50 = 220.5 Hz).
Figure 5: Example of the cepstrum of a speaker's voice for the whole password phrase.
Table 1 shows the results of running 10 test trials of our program for each group member. A successfully identified speaker is denoted by a one (1), and a non-identified or wrongly identified speaker is denoted by a zero (0).
Table 1: Results from running the voice recognition program for each group member.
Trial       Ali     Andy    Kwang   Charlie   Total
1           0       1       1       0
2           1       0       1       1
3           1       1       1       1
4           1       0       0       1
5           1       1       0       1
6           1       1       1       1
7           1       1       1       1
8           1       1       1       0
9           1       0       1       1
10          1       0       1       1
Average     90%     60%     80%     80%       80%
Std. Dev.   10%     16.3%   13.3%   13.3%     13.2%
The average percentage of successful trials among group members is shown in Figure 6. The
lowest average percentage was 60%.
Figure 6: Average percentage of successful trials among group members. The error bars show a range of one standard deviation from the average.
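The summary rows of Table 1 can be recomputed from the individual trial outcomes. The sketch below assumes that the "Std. Dev." row was obtained as the sample standard deviation divided by the square root of the number of trials (the standard error of the mean), an assumption that reproduces the tabulated values; the variable names are ours.

```matlab
% Sketch: recomputing the summary rows of Table 1 from the trial outcomes.
% Columns: Ali, Andy, Kwang, Charlie; rows: trials 1 to 10.
results = [0 1 1 0
           1 0 1 1
           1 1 1 1
           1 0 0 1
           1 1 0 1
           1 1 1 1
           1 1 1 1
           1 1 1 0
           1 0 1 1
           1 0 1 1];

nTrials = size(results, 1);
avgPct  = 100 * mean(results);                  % success rate per member
semPct  = 100 * std(results) / sqrt(nTrials);   % assumed formula for "Std. Dev."
overall = mean(avgPct);                         % mean of the four averages
```

With these data avgPct is 90, 60, 80 and 80 percent, and the overall mean is 77.5%, which the table reports rounded to 80%.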
Table 2 shows the results of testing our program on students who were not in our group. For male speakers, a one (1) means that the program successfully identified the speaker as a non-group member. For female speakers, a one (1) means that the program identified the speaker as female in addition to being a non-group member.
Table 2: Results from running the voice recognition program on non-group members.
Speaker       Chun   Albert   Adrienne   Jason   Joshua   Kimberly   Christina
Successful?   1      1        1          1       1        1          1
Our program successfully identified each non-group member on the first try, so we did not
conduct more than one test trial for each speaker.
Discussion
The first challenge we faced in writing our program was isolating the vowels in the spoken password so that each vowel could be processed accurately to obtain its formant. The filter function (filterZ2) that we designed to perform the isolation task was effective because its filtering method was simple yet powerful, as can be seen from its output in Figure 3. However, the designed filter was not capable of overcoming an excessive amount of noise. This drawback could have been reduced by processing the signal with more delicate and extensive filtering methods. In our program, the problem was instead avoided by prompting the speaker to say the password again whenever too much background noise was present.
With the individual vowels separated, the formant of each vowel showed distinctive features that clearly characterized its sound. In addition, the formants for the same vowel vary among individuals. This variability and specificity of the formants allowed for high-resolution comparison between different individuals. As shown in Figure 4, the x-axis positions and y-axis magnitudes of the peaks were different for the two speakers when they both spoke the same vowels. Although it was obvious that the formants vary between individuals, it was not easy to find the peaks that uniquely describe the speaker because, to some degree, peaks overlapped. This required an extensive analysis of the formant patterns of our members. By conducting a thorough statistical analysis on a large number of voice samples obtained within our group, we managed to construct a comprehensive database that contained the characteristic peaks associated with each individual and their variability compared with others.
The biggest challenge was designing the comparison method (comparor2) that determines the speaker's identity. This stage was especially difficult because the formants varied even for the same person, depending on the environment the speaker was in and their physical condition. However, we were confident that with 18 different peak positions (six from each vowel), we could accurately distinguish between various individuals.
Finding the right threshold values was the key to the success of the comparison method. There were two competing factors that made the fine-tuning process problematic: a higher threshold made it too hard for the speaker to pass the test even though he was a group member; on the other hand, a threshold that was too low made the program vulnerable to false identification of a non-member. To overcome this dilemma, our program was designed to give bonus points when the number of matches exceeded a certain value for each vowel. Additionally, when there was no match for a vowel, a penalty was applied to the final score. Furthermore, by separating the comparison process into two sequential levels (comparing x-axis then y-axis values), the program was able to produce a wider range of similarity scores. With the implementation of the bonus-penalty points system, the overall comparison method successfully determined the identity of the speaker.
Even with the method designed as described above, we still observed cases where the right person got a final score below the threshold value. We addressed this problem by giving the speaker a second chance so as not to sacrifice the rigorous nature of the comparison method. As an extra precaution, this additional comparison method was placed in series with the main comparison method. Provided that the final score was just a few points below the threshold and that the second comparison method produced the same result as the first, the determined identity of the speaker was confirmed.
The test trials with the members of our group resulted in an average success rate of approximately 80% (31 of 40 trials in Table 1, or 77.5%). Test trials with non-member speakers were also conducted, and all seven speakers were correctly identified as non-members. However, that test was done with only seven volunteers. The test trials and the scores obtained from each trial revealed that the accuracy was heavily influenced by the environment and the physical condition of the speaker, and as a result, there was a big difference between the success rates of our members. For example, as seen in Table 1, Andy was not recognized or was falsely recognized four out of ten times, whereas the other three subjects passed the test with 80% or 90% accuracy. This disparity in accuracy between group members shows that there is still much room for improvement. In general, the performance was hampered when there was an excessive amount of noise or when the subject was tired and therefore spoke with a lower than normal voice.
Although the formant comparison method was useful for distinguishing between vowels of different individuals, it did not suggest anything about the pitch of the sound. Therefore, we added one final method using cepstrums to our program, because the most conspicuous information that can be extracted from the cepstrum is the fundamental frequency. We used the cepstrum to find the overall pitch of the speaker's voice and to decide the gender of the speaker. However, since our group consisted of all male members, the cepstrum method was not used as a factor in determining the identity of the speaker.
There are many parts of our program where much improvement could have been made if more time had been allotted. First, in the filtering stage, the function could have been made capable of removing noise components outside the normal frequency range of the human voice. We suspect that, to some extent, the unfiltered noise interfered with the calculation of the formants. This interference might have contributed largely to the undesirable variability in formant values that significantly reduced the accuracy of our program.
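One way to remove components outside the voice band, as suggested above, is to zero the corresponding FFT bins. The sketch below is illustrative only: the 80 Hz to 4000 Hz pass band, the test tones, and the variable names are assumptions, and an abrupt FFT mask is used in place of a properly designed band-pass filter for brevity.

```matlab
% Sketch: band-limiting a signal to an assumed voice band with an FFT mask.
Fs = 11025;                          % sampling rate used in the report
N  = 2205;                           % 0.2 s of samples, 5 Hz per FFT bin
n  = (0:N-1)';
voiceTone = sin(2*pi*300*n/Fs);      % component inside the assumed voice band
x  = voiceTone + sin(2*pi*5000*n/Fs) + 0.5;   % plus a high tone and an offset

X = fft(x);
f = (0:N-1)' * Fs / N;               % frequency of each bin, 0 .. Fs
f = min(f, Fs - f);                  % fold so that mirrored bins share a frequency
keep = (f >= 80) & (f <= 4000);      % assumed voice band
y = real(ifft(X .* keep));           % band-limited signal
```

Because the mask is symmetric about Fs/2 the output remains real, and for this test signal only the 300 Hz component survives.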
In addition, it was observed that the formants varied greatly between people for certain vowels, whereas the variability was minimal for other vowels. Choosing an appropriate password phrase that contains the vowels that are easier to analyze seems to be the key to higher accuracy in identifying the speaker. In the comparison stage, a more in-depth and comprehensive statistical analysis of the formants is needed in order to make the method more reliable. In particular, identifying the peaks that are unique to each speaker seems to be the most essential part of the process. In any future voice recognition design projects, we would consider each of the aforementioned problems more closely.
Overall, we strove to make our program code flexible so that it could be modified with ease later on. To make editing as convenient as possible, we used variables for the important recurring values and wrote sub-functions that handled smaller tasks whenever applicable. It was especially hard to manage the size of each function because many variables had to be passed on to other functions, so stepping back and dividing the whole voice recognition program into pieces was the most difficult task. Once each sub-function's role was ironed out, the work was divided evenly among our group members, and the rest of the project came together smoothly as a result of careful planning.
Conclusion
With the variability and the uncertainty associated with our voices, it is most likely impossible to design an identification method based solely on a voice signal that is as accurate as fingerprinting. However, in this project, we demonstrated that, to a certain degree, our voices do convey unique features that may enable us to accurately identify the speaker. Despite the time constraint and our limited knowledge of the vocal system, we managed to design a voice recognition program that identified group members in about 80% of the test trials. This indicates that if enough effort and time were spent, the program could be improved into a viable application. The improvements needed mostly lie in the statistical analysis of the formants of each vowel. Understanding the characteristics of the formants is the most crucial part of the development of a voice recognition system. In the future, with more knowledge of our vocal systems and voices, a much improved speaker recognition system could be designed and used for many applications.
References
[1] Propper, Ryan. "Speech recognition: Enabling tomorrow's breakthroughs in human-computer interaction." Retrieved February 16, 2007.
[2] Pasich, Chris. "Introduction to Speaker Identification." Retrieved February 16, 2007.
[3] Neel, Amy T. "Formant detail needed for vowel identification." Acoustics Research Letters Online. Vol. 5, Issue 4 (2004): 125-131.
[4] Childers, D.G., D.P. Skinner, R.C. Kemerait. "The cepstrum: a guide to processing." Proceedings of the IEEE. Vol. 65, Issue 10 (1977): 1428-1443.
Appendix (MATLAB code)
% recordMain2 is the main function that runs our voice recognition program.
% This function also calls filterZ2, comparor2, and pitchfinder.
% Output: identifies who the speaker is and the score found by comparing with
% the data bank
load everything.mat

% passTheSignalTest is the variable that allows us to break out of the while
% loop
passTheSignalTest = -1;

% this while loop keeps on running until the speaker has a noise-free input
% signal
while passTheSignalTest == -1
    % Prompt for password
    wavplay(promptPW, fsprompt);
    % Say the password
    display('BEGIN IN');
    pause(1);
    display('3');
    pause(1);
    display('2');
    pause(1);
    display('1');
    pause(1);
    display('GO!');
    Fs = 11025;
    orig_sig = wavrecord(5*Fs, Fs, 'double');
    display('STOP!');
    pause(1);
    display(' ');
    % Currently inspecting
    display('Currently inspecting...');
    wavplay(inspectingPW, fsinspecting);
    pause(0.5);
    wavplay(orig_sig, Fs);

    [filtered_sig, thre2, passTheSignal] = filterZ2(orig_sig, passTheSignalTest);

    if passTheSignal == 1
        break
    end

    % speaker did not speak with a clear voice or had significant
    % background noise
    if passTheSignalTest == -1
        display('Please say it again... loud, clear and SLOWLY')
    end
end

% compare testBank with stored bank.
[identity scores indice peak] = comparor2(orig_sig, thre2, kwang_data, dataBank2);

display('Finished inspecting.');
% plays the wav file "speaker is"
wavplay(speakerID, fsspeakerID);
pause(0.5);

% calls the pitchfinder to determine the pitch
pitch = pitchfinder(orig_sig, Fs);

if (pitch > 180)
    identity = 6;
    % identity = 6 means when pitch is higher than 180 Hz, we can say the
    % voice is from a girl
end

% Tells who the speaker is
% Ali
if identity == 2
    wavplay(aliID, fsaliID);
    display('Welcome home!');
% Andy
elseif identity == 3
    wavplay(andyID, fscharlieID);
    display('Welcome home!');
% Charlie
elseif identity == 4
    wavplay(charlieID, fskwangID);
    display('Welcome home!');
% Kwang
elseif identity == 1
    wavplay(kwangID, fsaliID);
    display('Welcome home!');
% Jae
elseif identity == 5
    wavplay(jaeID, fsjaeID);
    display('Welcome home!');
% Unknown speaker
elseif identity == 0
    wavplay(unknownID, fsunknownID);
    display('Step away from the door. The police have been notified.');
% Unknown female speaker
elseif identity == 6
    wavplay(girl, girlID);
    display('Sorry you are a girl...')
end
% filterZ2 filters and normalizes the input signal.

function [filtered_sig, thre2, passTheSignalTest] = filterZ2(password, passTheSignal)
%
% Inputs: password = the input signal which is the voice signal
%         passTheSignal = the variable that can be 1 or -1 and checks if
%         the spoken phrase has the same number of words as the password
%
% Outputs: filtered_sig = the filtered signal
%          thre2 = the matrix that contains the start and end indices of
%          the words
%          passTheSignalTest = this variable also is 1 when the voice
%          signal has the same number of words and -1 when it doesn't
% Operation: this function filters the voice signal and returns the start
% and end indices of the words
%

% setting the offset of the signal to zero
mean_orig_sig = mean(password);
password = abs(password - mean_orig_sig);

% convolving the signal with a triangle window to filter the high
% frequencies
tri = triang(512);
filtered_sig = conv(password, tri);

% normalizing the signal by dividing the signal by the maximum value
filtered_sig = filtered_sig/max(filtered_sig);

% more filtering by setting a threshold = 0.2
thre1 = filtered_sig > 0.2;

start = 0;
finish = 0;
filtered_sig = filtered_sig.*thre1;

% the matrix that holds the start and end points of the vowels
thre2 = [];

% this variable makes sure that the length of each vowel is reasonable
l_limit = 1500;

% number of words
number = 3;

% returns the original flag unless the word count matches below
passTheSignalTest = passTheSignal;

% the following code finds the start and end points of the vowels and stores
% the indices in thre2
for i = 1:(length(filtered_sig)-1)
    if (filtered_sig(i) == 0 && filtered_sig(i+1) >= 0)
        start = (i+1);
    elseif (filtered_sig(i) >= 0 && filtered_sig(i+1) == 0)
        finish = (i);
    end
    if ((finish - start) > l_limit)
        thre2 = [thre2; [start finish]];
    else
        filtered_sig(start:finish) = 0;
    end
end

% comparing the number of spoken words (rows of thre2) with the number of
% words in the password
if (size(thre2, 1) == number)
    passTheSignalTest = 1;
end
% formantgen finds the formants of the password phrase.

function [formant] = formantgen(signal, order)
%
% Inputs: signal = the input signal
%         order = order of autoregressive model (AR) used to estimate PSD
% Outputs: formant = the power spectral density (PSD) of the signal in dB
%          scale

% finding PSD of the signal using Yule-Walker method with specified order
formant = pyulear(signal, order);

% converting the PSD to dB scale
formant = 10*log10(formant);
% plot PSD
plot(formant);
% comparor2 compares the formants from the input voice signal to the
% formants in the database and determines the identity of the speaker.
%
% signal = input voice signal
% indices = beginning and ending of each vowel segment
%
% databank = database of index and magnitude of all the peaks of the
% formants of each vowel obtained from the members of our group
%
% databank2 = database of index and magnitude of the first two peaks of the
% formants

function [identity scores indice peak] = comparor2(signal, indices, databank, databank2)

% Allowable range in peak position comparison. Determines the security level.
n2 = 1;

% Number of vowels in the voice signal
n = length(indices(:,1));
% Identity of the speaker. Set to be undetermined (denoted as -1)
identity = -1;
if n < 3 || n > 3
    % If more than 3 syllables, identity remains undetermined.
    identity = -1;
else
    [row column] = size(databank);   % Size of databank is obtained.
    scores = [];   % scores for all three vowels and all members
    indice = [];   % indices of the peaks in the formants of the input signal.
    peak = [];     % magnitudes of the peaks in the formants of the input signal.

    % Repeat the process as many times as the number of vowels in the input
    % signal.
    for i = 1:n
        % Gets the segment that contains a single vowel.
        signal_seg = signal(indices(i,1):indices(i,2));

        % Gets the formant from the segment.
        formant = formantgen(signal_seg, 20);

        % Gets indices and magnitudes of all peaks of the formant.
        [index peaks] = peakmaster3(formant, 2);
        % Saves the indices for viewing purpose.
        indice = [indice index'];

        % Saves the magnitudes for viewing purpose.
        peak = [peak peaks'];

        if length(index) < 3   % If there are less than 3 peaks in one formant
            identity = -1;     % set the identity as undetermined and
            break;             % break
        end

        % scores obtained by comparing the formants of the input signal
        % with the formants of different people for the selected vowel.
        score = [];

        % repeat as many times as the number of rows in databank
        for j = 1:row
            s = 0;   % saves points from x-axis comparison
            p = 0;   % saves points from y-axis comparison

            % repeat as many times as the number of peaks in the formant
            for k = 1:length(index)
                % repeat as many times as the number of peaks for the vowel
                % of the person in databank
                for l = 1:length(databank{j,i}(:,1))
% i f t he x- axi s posi t i ons mat chi f( i ndex( k) = ( dat abank{j , i }( l , 1) -cei l ( dat abank{j , i }( l , 2) ) ) )
% Add 1 t o the t ot al scor e f or t he x- axi s compari sons = s + 1;
                        % If the magnitude of the peak that passed the x-axis
                        % comparison falls in the range in y-axis
                        % (only the lower bound of this test survives in the source
                        % listing; the upper bound is assumed to be symmetric)
                        if (peaks(k) <= (databank{j,i}(l,3) + n2 + databank{j,i}(l,4))) && ...
                           (peaks(k) >= (databank{j,i}(l,3) - n2 - databank{j,i}(l,4)))
                            % Add 2 to the total score for the y-axis
                            % comparison
                            p = p + 2;
                        end
                    end
                end
            end
            if (l - s) < 3       % If there are fewer than 3 mismatches in x-axis
                s = s + 1;       % bonus 1 point
            % (the condition on the next line is incomplete in the source listing)
            elseif (l - s ) 5    % If there are more than 3 matches in y-axis
                p = p + 3;       % bonus 3 points
            elseif p > 7         % If there are more than 4 matches in y-axis
                p = p + 4;       % bonus 4 points
            end
            s = s + p;           % Final score
            score = [score; s];  % Save the final scores for each vowel
        end
        scores = [scores score];   % Save the final scores for all the vowels
    end
end
sums = [];   % sum for each person in the database
% By summing each row in scores, the total score for each person in the
% database is obtained. A row containing a zero score for any vowel is
% penalized by 3 points.
for z = 1:length(scores(:,1))
    if length(find(scores(z,:)==0)) > 0
        sums = [sums; (sum(scores(z,:)) - 3)];
    else
        sums = [sums; sum(scores(z,:))];
    end
end
% Find entries in the database that reach the threshold.
absolute = find(sums >= 18);
% Find entries that might need a second chance
notsure = find(sums >= 10);
% Run the second comparison method
[identity2 points averages] = comparor(signal, indices, databank2);
% If any one of the entries in the database reaches the threshold
if length(absolute) > 0
    if length(absolute) > 1   % If there is more than one match
        % The one who got the maximum score is determined as the match.
        identity = find(max(sums) == sums);
    else
        % If there is only one, identity is determined.
        identity = absolute(1);
    end
% If no one passed the threshold, but some were close
elseif length(notsure) > 0
    % If the second comparison method agrees with any one of the candidates
    if length(find(notsure == identity2)) > 0
        % Identity is determined.
        identity = notsure(find(notsure == identity2));
    else
        % If the above criteria are not met, the speaker is an unknown
        % person.
        identity = 0;
    end
else
    identity = 0;   % If there's no one at or above 10, the speaker is unknown.
end
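
The two-tier decision at the end of comparor2 can be tried in isolation. The following is a minimal sketch, not the authors' code: the score totals and the second-method votes are invented, and only the thresholds 18 and 10 are taken from the listing. Unlike the listing, which uses find(max(sums) == sums) and can therefore return several indices on a tie, the sketch keeps the first maximum.

```matlab
% Hypothetical test cases: {total scores per enrolled speaker, vote of the
% second comparison method}. All numbers are made up for illustration.
cases = {[12; 19; 7; 21], 2;    % two speakers reach 18: highest total wins
         [12;  9; 7; 15], 4;    % nobody reaches 18, second method backs speaker 4
         [12;  9; 7; 15], 3;    % second method backs a speaker below 10
         [ 3;  9; 7;  5], 1};   % nobody reaches 10

results = zeros(size(cases, 1), 1);
for c = 1:size(cases, 1)
    sums = cases{c, 1};
    identity2 = cases{c, 2};
    absolute = find(sums >= 18);   % clear matches
    notsure = find(sums >= 10);    % candidates that need a second opinion
    if ~isempty(absolute)
        [~, identity] = max(sums); % first speaker with the highest total
    elseif any(notsure == identity2)
        identity = identity2;      % second method agrees with a candidate
    else
        identity = 0;              % unknown speaker
    end
    results(c) = identity;
end
disp(results')
```

With these inputs the four cases resolve to speaker 4, speaker 4, unknown (0), and unknown (0).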
% pitchfinder is called by recordMain2 to find the pitch of the signal.
% This function finds the average pitch of the speaker.
% Input = signal (original wave signal), fs (sampling frequency)
% Output = pitch (in Hz) and the filtered cepstrum
function [pitch, cepstrum] = pitchfinder(signal, fs)
% rceps is a MATLAB function. It returns the real cepstrum: the inverse
% Fourier transform of the log magnitude of the Fourier transform of the
% original signal.
signal_ceps = rceps(signal);
% Setting the limits for the human voice pitch (in Hz) and converting them
% to sample positions in the cepstrum
upperlimit = 300;
lowerlimit = 70;
threshold = round(fs/upperlimit);
limit = round(fs/lowerlimit);
% taking the part of the cepstrum that lies between those positions
cepstrum_orig = signal_ceps(threshold:limit);
% applying a Butterworth filter to the cepstrum
[b a] = butter(10, 0.1);
cepstrum = filtfilt(b, a, cepstrum_orig);
% pitch is the sampling frequency divided by the position of the maximum
% peak in the cepstrum domain. The maximum peak in the cepstrum domain
% marks the period of the fundamental frequency.
pitch = fs/(find(max(cepstrum_orig) == cepstrum_orig) + threshold);
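
The cepstral pitch estimate can be checked on a signal whose pitch is known in advance. The sketch below is an illustration under stated assumptions rather than the project code: it builds the real cepstrum from fft and ifft directly instead of calling rceps, it uses a synthetic impulse train in place of recorded speech, and the sampling frequency, the spectral floor floorVal, and all variable names are chosen for the example. It also skips the Butterworth smoothing step.

```matlab
fs = 8000;                 % sampling frequency in Hz (assumed for the example)
f0 = 100;                  % true pitch of the synthetic signal in Hz
period = fs / f0;          % 80 samples per pitch period
x = zeros(fs, 1);          % one second of signal
x(1:period:end) = 1;       % impulse train, a crude stand-in for voiced speech

% Real cepstrum: inverse FFT of the log magnitude spectrum.
% The small floor avoids log(0) at bins where the spectrum vanishes.
floorVal = 1e-6;
c = real(ifft(log(abs(fft(x)) + floorVal)));

% Restrict the search to lags that correspond to pitches of 70 to 300 Hz.
minLag = round(fs / 300);  % 27 samples
maxLag = round(fs / 70);   % 114 samples

% MATLAB indices start at 1, so a lag of q samples is stored in c(q + 1).
[~, k] = max(c(minLag + 1 : maxLag + 1));
lag = minLag + k - 1;      % lag of the strongest cepstral peak, in samples
pitchHz = fs / lag;        % estimated pitch in Hz
fprintf('Estimated pitch: %.1f Hz\n', pitchHz);
```

For this input the strongest peak inside the search window falls at a lag of 80 samples, which corresponds to the 100 Hz used to build the signal. Real speech gives a broader, noisier peak, which is presumably why the listing also produces a low-pass filtered copy of the cepstrum.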
% peakmaster3 finds the peaks of the input signal.
% input: signal of the wave that we want to find peaks for; Thold is the
% threshold value
% output: gives the index and the height of the peaks
function [Ta Tb] = peakmaster3(signal, Thold)
% starting out with threshold value
v = 0;
x = 1;
T = [];
prev = signal(1);
minV = min(signal);
max = [0, minV];   % current peak candidate: [index, height]
l = length(signal);
% loop that compares each value: while the values keep rising, the highest
% one so far is kept as the peak candidate; once the signal drops more than
% Thold below the candidate, the candidate is counted as a peak
while x < l
    if signal(x) > max(2) && signal(x) > prev
        max = [x, signal(x)];
        v = v + signal(x) - prev;
    elseif (max(2) ~= minV) && (signal(x) < (max(2) - Thold))
        if v > Thold - 3.9
            T = [T; max];
        end
        max(2) = minV;
        v = 0;
    end
    prev = signal(x);
    x = x + 1;
end
% The source listing ends without assigning the outputs. The lines below
% assume that Ta is the column of peak indices and Tb the column of peak
% heights, as the header comment describes.
Ta = zeros(0, 1);
Tb = zeros(0, 1);
if ~isempty(T)
    Ta = T(:, 1);
    Tb = T(:, 2);
end
% comparor takes an input voice signal and extracts segments that contain
% vowels. Then gets the formants for each vowel and compares the first two
% peaks with the data in databank.
% signal = input voice signal
% indices = beginnings and endings of each vowel
% databank = database of the first two peaks of the formant of each vowel
% for all the members of our group.
function [identity scores averages] = comparor(signal, indices, databank)
scores = [];   % Final scores for all three vowels for all the members
n = length(indices(:,1));   % number of vowels in the input signal
identity = -1;   % identity is initially set to be undetermined
% repeats the comparison process as many times as the number of vowels in
% the input signal
for i = 1:n
    signal_seg = signal(indices(i,1):indices(i,2));   % gets the i-th segment
    formant = formantgen(signal_seg, 20);             % gets the formant of the segment
    [index peaks] = peakmaster3(formant, 2);          % finds the peaks in the formant
    index = index';
    peaks = peaks';
    % If there are fewer than three peaks, identity is undetermined and
    % break out.
    if length(index) < 3
        identity = -1;
        break;
    end
    vowelscore = [];   % scores for all entries for the vowel being compared.
    for k = 1:length(databank(:,1))
        % relative difference of the x-axis positions and the y-axis
        % positions of the two peaks (1 corresponds to 100%)
        s = abs((index(1,1) - databank(k,(4*i-3)))/databank(k,(4*i-3)));
        s = s + abs((peaks(1,1) - databank(k,(4*i-2)))/databank(k,(4*i-2)));
        s = s + abs((index(1,2) - databank(k,(4*i-1)))/databank(k,(4*i-1)));
        s = s + abs((peaks(1,2) - databank(k,(4*i)))/databank(k,(4*i)));
        vowelscore = [vowelscore; s];   % sums the relative differences
    end
    scores = [scores vowelscore];   % saves all the sums
end
% computes the average of the relative differences in the three vowels for
% each person (the third vowel is weighted twice)
averages = [];
for h = 1:length(scores(:,1))
    average1 = abs((scores(h,1) + scores(h,2) + 2*scores(h,3))/4);
    averages = [averages; average1];
end
% If the minimum average relative difference is below 1 (threshold), the
% speaker is identified.
if min(averages) < 1
    identity = find(averages == min(averages));
else
    identity = 0;
end
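
The scoring used by comparor can be followed on a toy example. This is a sketch under assumptions, not the project data: the databank values and the measured peaks are invented, and the peaks are supplied directly instead of being extracted with formantgen and peakmaster3. The column layout (four columns per vowel: position and magnitude of the first peak, then of the second) and the weights 1, 1, 2 follow the listing.

```matlab
% Toy databank: one row per enrolled speaker, four columns per vowel
% [x1 y1 x2 y2]. All numbers are made up for illustration.
databank = [10 5 30 4,  12 6 28 3,   9 7 33 5;    % speaker 1
            14 3 40 2,  18 4 35 6,  15 2 45 3];   % speaker 2

% First two formant peaks of a hypothetical input, one row per vowel.
measured = [10 5 30 4;
            12 6 28 3;
             9 7 33 5];

nSpeakers = size(databank, 1);
nVowels = size(measured, 1);
scores = zeros(nSpeakers, nVowels);
for i = 1:nVowels
    cols = (4*i - 3):(4*i);            % columns that belong to vowel i
    for k = 1:nSpeakers
        ref = databank(k, cols);
        % sum of the four relative differences for this speaker and vowel
        scores(k, i) = sum(abs((measured(i, :) - ref) ./ ref));
    end
end

weights = [1; 1; 2] / 4;               % third vowel counts twice
averages = scores * weights;
[best, who] = min(averages);
if best < 1
    identity = who;                    % closest speaker is accepted
else
    identity = 0;                      % unknown speaker
end
disp(averages')
```

Because the measured peaks equal the first databank row, speaker 1 scores an average of 0 and is accepted, while speaker 2 lands well above the threshold of 1.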