Speaker Recognition
Searching & Decoding in SR
ECE5527
Wilson Burgos
Outline
Introduction
Objective
Implementation
Simulation, Test and Result
Conclusion
Introduction
Speaker recognition aims to recognize speakers from their voices.
It is divided into identification and verification.
Identification determines which registered speaker provided the input.
Verification determines whether the speaker really is the person he or she claims to be.
Speaker recognition can be text-dependent or text-independent.
Introduction
Text-dependent: speakers say the same key words for training and recognition.
Text-independent: the system identifies the speaker regardless of what is said.
The goal of this study is a real-time, text-dependent identification system that compares signals from unknown speakers against a database of known speakers.
Objective
A real-time, text-dependent tool for speaker identification using Sphinx4.
The tool has two modes: training, and detection (recognition).
During training mode, feature models are created from user voices.
The detection phase uses those models to identify the speaker.
Concept of Operation
The system uses the Mel Frequency Cepstral Coefficients (MFCC) and Vector Quantization (VQ) algorithms.
The k-means clustering algorithm was used for clustering.
Concept of Operation
Feature extraction using MFCC:
Pre-emphasis → Windowing → FFT → Mel filter bank → log → IFFT → Energy → Deltas
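The first stage of the pipeline above can be sketched in a few lines. This is a generic illustration, not the project's code: pre-emphasis boosts high frequencies with y[n] = x[n] - a·x[n-1], and the coefficient 0.97 is a conventional default rather than a value stated in the slides.

```java
// Illustrative pre-emphasis filter: y[n] = x[n] - a * x[n-1].
// The coefficient a = 0.97 is a common default (an assumption here,
// not a value taken from the slides).
public class PreEmphasis {
    static double[] apply(double[] x, double a) {
        double[] y = new double[x.length];
        y[0] = x[0]; // first sample has no predecessor
        for (int n = 1; n < x.length; n++) {
            y[n] = x[n] - a * x[n - 1];
        }
        return y;
    }
}
```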
MFCC feature vector (39 coefficients):
MFCC: 12 coefficients
Energy: 1 coefficient
∆ MFCC: 12 coefficients
∆ Energy: 1 coefficient
∆∆ MFCC: 12 coefficients
∆∆ Energy: 1 coefficient
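The 39-dimensional vector above can be assembled by appending first and second differences to the 13 static values (12 MFCCs + energy). The sketch below uses a simple frame-to-frame difference for illustration; Sphinx4 computes deltas over a regression window, and the class and method names here are our own.

```java
// Hypothetical sketch: build 39-dim feature vectors from 13-dim static
// vectors (12 MFCCs + 1 energy) by appending deltas and delta-deltas.
// Uses a simple central difference, not Sphinx4's actual regression window.
public class DeltaFeatures {
    // delta[t] = (static[t+1] - static[t-1]) / 2, with clamped edges
    static double[][] deltas(double[][] frames) {
        int n = frames.length, d = frames[0].length;
        double[][] out = new double[n][d];
        for (int t = 0; t < n; t++) {
            double[] prev = frames[Math.max(t - 1, 0)];
            double[] next = frames[Math.min(t + 1, n - 1)];
            for (int i = 0; i < d; i++) out[t][i] = (next[i] - prev[i]) / 2.0;
        }
        return out;
    }

    // Concatenate static, delta, and delta-delta: 13 + 13 + 13 = 39 dims
    static double[][] assemble(double[][] statics) {
        double[][] d1 = deltas(statics);
        double[][] d2 = deltas(d1);
        double[][] out = new double[statics.length][];
        for (int t = 0; t < statics.length; t++) {
            int d = statics[t].length;
            double[] v = new double[d * 3];
            System.arraycopy(statics[t], 0, v, 0, d);
            System.arraycopy(d1[t], 0, v, d, d);
            System.arraycopy(d2[t], 0, v, 2 * d, d);
            out[t] = v;
        }
        return out;
    }
}
```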
Concept of Operation
Feature matching:
Training enrolls the speaker, creating a unique model based on his or her features.
Testing computes a score and matches the input to the speaker with the minimum matching score.
Concept of Operation
Vector Quantization:
A large number of feature vectors is reduced while maintaining the speaker's characteristics.
A codebook is generated for each speaker.
K-means partitions the feature vectors into a chosen number of centroids.
Concept of Operation
Demonstration of the standard k-means algorithm:
1) k initial "means" (in this case k = 3) are randomly selected from the data set.
2) k clusters are created by associating every observation with the nearest mean.
3) The centroid of each of the k clusters becomes the new mean.
4) Steps 2 and 3 are repeated until convergence has been reached.
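The steps above can be sketched as plain Java. This is a minimal Lloyd's-algorithm illustration over low-dimensional points, not the project's implementation; for determinism it initializes from the first k points, whereas the slides describe random selection.

```java
// Minimal k-means (Lloyd's algorithm) sketch. The project applies the
// same idea to 39-dim MFCC vectors; names here are illustrative.
public class KMeansSketch {
    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return s;
    }

    static double[][] cluster(double[][] data, int k, int iters) {
        int d = data[0].length;
        double[][] means = new double[k][];
        // Step 1: pick k initial means (first k points here for determinism;
        // the slides describe random selection from the data set)
        for (int j = 0; j < k; j++) means[j] = data[j].clone();
        int[] assign = new int[data.length];
        for (int it = 0; it < iters; it++) {
            // Step 2: associate every observation with the nearest mean
            for (int n = 0; n < data.length; n++) {
                int best = 0;
                for (int j = 1; j < k; j++)
                    if (dist2(data[n], means[j]) < dist2(data[n], means[best])) best = j;
                assign[n] = best;
            }
            // Step 3: each cluster's centroid becomes the new mean
            double[][] sums = new double[k][d];
            int[] counts = new int[k];
            for (int n = 0; n < data.length; n++) {
                counts[assign[n]]++;
                for (int i = 0; i < d; i++) sums[assign[n]][i] += data[n][i];
            }
            for (int j = 0; j < k; j++)
                if (counts[j] > 0)
                    for (int i = 0; i < d; i++) means[j][i] = sums[j][i] / counts[j];
        }
        return means;
    }
}
```

In practice, step 4's convergence test replaces the fixed iteration count used here.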
Concept of Operation
In the detection phase, the unknown speaker's feature vectors are compared to all the codebook vectors in the database.
The speaker with the lowest score is chosen.
The score is defined as the average of the Euclidean distances.
Concept of Operation
Sphinx4 DataProcessor:
Sphinx4 uses front ends to perform the specific signal processing.
The base class implements a DataProcessor to get the MFCC coefficients from the front-end chain.
Concept of Operation
The configuration file was updated to reflect this:

<item>dct</item>
<item>liveCMN</item>
<item>featureExtraction</item>
<item>featureStore</item>
Implementation
The MFCC data gets stored in a Vector<float[]>.
The array of all the MFCC coefficients is used as the input to the k-means algorithm to generate the clusters.
The codebook that gets generated is stored in a Hashtable<Integer,Speaker>, which is serialized to a file for later retrieval.
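The serialize-to-file step could look like the following. The slides do not show the project's Speaker class, so the one here (a name plus codebook centroids) and the method names are assumptions; only the Hashtable<Integer,Speaker> container is from the slides.

```java
import java.io.*;
import java.util.Hashtable;

// Sketch of persisting the codebook database via Java serialization.
// The Speaker fields below are hypothetical; the project's actual class
// is not shown in the slides.
public class CodebookStore {
    static class Speaker implements Serializable {
        final String name;
        final double[][] codebook; // k-means centroids for this speaker
        Speaker(String name, double[][] codebook) { this.name = name; this.codebook = codebook; }
    }

    // Serialize the Hashtable<Integer,Speaker> to a file
    static void save(Hashtable<Integer, Speaker> db, File f) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(db);
        }
    }

    // Deserialize it back for the detection phase
    @SuppressWarnings("unchecked")
    static Hashtable<Integer, Speaker> load(File f) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(f))) {
            return (Hashtable<Integer, Speaker>) in.readObject();
        }
    }
}
```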
Concept of Operation
[Diagram: Sphinx4 featureStore → KMeans → Speaker clusters (codebook) → disk]
Simulation Test and Result
Results use wave files of recorded speech from 10 speakers.
The MFCC features used are the 39 coefficients from Sphinx4.
The number of filter banks is set automatically by Sphinx4: 40 for 16 kHz.
The codebook size was set to 62.
Simulation Test and Result
Average Euclidean distance between speakers (K=8, cb=62):

      A    B    C    D    E    F    G    H    I    J
A   0.0  2.6  2.4  2.4  2.2  2.1  1.8  2.2  2.0  2.2
B   2.3  0.0  1.9  2.3  2.1  1.9  2.1  2.0  2.1  1.9
C   1.5  1.5  0.1  1.7  1.4  1.1  1.5  1.2  1.3  1.1
D   2.1  2.0  1.9  0.0  1.5  1.5  1.6  1.7  2.0  1.7
E   1.7  1.6  1.3  1.5  0.8  1.2  1.2  1.2  1.4  1.1
F   1.8  1.6  1.2  1.5  1.2  0.4  1.3  1.2  1.4  1.3
G   1.8  1.6  1.5  1.5  1.1  1.2  0.4  1.4  1.3  1.2
H   1.5  1.7  1.2  1.6  1.3  1.3  1.3  0.5  1.0  1.3
I   1.6  1.7  1.1  1.7  1.3  1.3  1.4  1.2  0.2  1.3
Simulation Test and Result
[Figure: Number of Clusters vs. Identification Rate; x-axis: number of clusters (0 to 64), y-axis: identification rate (0 to 1)]
References
http://cmusphinx.sourceforge.net/sphinx4/
International Journal, "EE Independent Speaker Identification", Vol. 3, 2011