Speaker Recognition
Searching & Decoding in SR
ECE5527
Wilson Burgos
Outline
Introduction
Objective
Implementation
Simulation, Test and Result
Conclusion
Introduction
Speaker recognition aims to recognize speakers from their voices.
It is divided into identification and verification.
Identification determines which registered speaker provided the input.
Verification determines whether the speaker really is the person he or she claims to be.
Speaker recognition can be text-dependent or text-independent.
Introduction
Text-dependent: speakers say the same key words for training and recognition.
Text-independent: the system identifies the speaker regardless of what is said.
The goal of this study is a real-time, text-dependent identification system that compares signals from unknown speakers against a database of known speakers.
Objective
A real-time, text-dependent tool for speaker identification using Sphinx4.
The tool has two modes: training, and detection (recognition).
During training mode, feature models are created from user voices.
The detection phase uses those models to identify the speaker.
Concept of Operation
The system uses the Mel Frequency Cepstral Coefficients (MFCC) and Vector Quantization (VQ) algorithms.
The k-means clustering algorithm was used for clustering.
Concept of Operation
Feature extraction using MFCC:
Pre-emphasis → Windowing → FFT → Mel filter bank → log → IFFT → Energy → Deltas
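The first stage of the pipeline above can be sketched in a few lines. This is a generic illustration, not the project's code: pre-emphasis boosts high frequencies with y[n] = x[n] - a·x[n-1], and the coefficient 0.97 is a conventional default rather than a value stated in the slides.

```java
// Illustrative pre-emphasis filter: y[n] = x[n] - a * x[n-1].
// The coefficient a = 0.97 is a common default (an assumption here,
// not a value taken from the slides).
public class PreEmphasis {
    static double[] apply(double[] x, double a) {
        double[] y = new double[x.length];
        y[0] = x[0]; // first sample has no predecessor
        for (int n = 1; n < x.length; n++) {
            y[n] = x[n] - a * x[n - 1];
        }
        return y;
    }
}
```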
MFCC feature vector (39 coefficients):
MFCC: 12 coefficients
Energy: 1 coefficient
∆ MFCC: 12 coefficients
∆ Energy: 1 coefficient
∆∆ MFCC: 12 coefficients
∆∆ Energy: 1 coefficient
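The 39-dimensional vector above can be assembled by appending first and second differences to the 13 static values (12 MFCCs + energy). The sketch below uses a simple frame-to-frame difference for illustration; Sphinx4 computes deltas over a regression window, and the class and method names here are our own.

```java
// Hypothetical sketch: build 39-dim feature vectors from 13-dim static
// vectors (12 MFCCs + 1 energy) by appending deltas and delta-deltas.
// Uses a simple central difference, not Sphinx4's actual regression window.
public class DeltaFeatures {
    // delta[t] = (static[t+1] - static[t-1]) / 2, with clamped edges
    static double[][] deltas(double[][] frames) {
        int n = frames.length, d = frames[0].length;
        double[][] out = new double[n][d];
        for (int t = 0; t < n; t++) {
            double[] prev = frames[Math.max(t - 1, 0)];
            double[] next = frames[Math.min(t + 1, n - 1)];
            for (int i = 0; i < d; i++) out[t][i] = (next[i] - prev[i]) / 2.0;
        }
        return out;
    }

    // Concatenate static, delta, and delta-delta: 13 + 13 + 13 = 39 dims
    static double[][] assemble(double[][] statics) {
        double[][] d1 = deltas(statics);
        double[][] d2 = deltas(d1);
        double[][] out = new double[statics.length][];
        for (int t = 0; t < statics.length; t++) {
            int d = statics[t].length;
            double[] v = new double[d * 3];
            System.arraycopy(statics[t], 0, v, 0, d);
            System.arraycopy(d1[t], 0, v, d, d);
            System.arraycopy(d2[t], 0, v, 2 * d, d);
            out[t] = v;
        }
        return out;
    }
}
```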
Concept of Operation
Feature matching:
Training enrolls the speaker, creating a unique model based on his or her features.
Testing computes a score and matches the input to the speaker with the minimum matching score.
Concept of Operation
Vector Quantization:
A large number of feature vectors is reduced while maintaining the speaker's characteristics.
A codebook is generated for each speaker.
K-means partitions the feature vectors into a chosen number of centroids.
Concept of Operation
Demonstration of the standard k-means algorithm:
1) k initial "means" (in this case k = 3) are randomly selected from the data set.
2) k clusters are created by associating every observation with the nearest mean.
3) The centroid of each of the k clusters becomes the new mean.
4) Steps 2 and 3 are repeated until convergence has been reached.
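The steps above can be sketched as plain Java. This is a minimal Lloyd's-algorithm illustration over low-dimensional points, not the project's implementation; for determinism it initializes from the first k points, whereas the slides describe random selection.

```java
// Minimal k-means (Lloyd's algorithm) sketch. The project applies the
// same idea to 39-dim MFCC vectors; names here are illustrative.
public class KMeansSketch {
    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return s;
    }

    static double[][] cluster(double[][] data, int k, int iters) {
        int d = data[0].length;
        double[][] means = new double[k][];
        // Step 1: pick k initial means (first k points here for determinism;
        // the slides describe random selection from the data set)
        for (int j = 0; j < k; j++) means[j] = data[j].clone();
        int[] assign = new int[data.length];
        for (int it = 0; it < iters; it++) {
            // Step 2: associate every observation with the nearest mean
            for (int n = 0; n < data.length; n++) {
                int best = 0;
                for (int j = 1; j < k; j++)
                    if (dist2(data[n], means[j]) < dist2(data[n], means[best])) best = j;
                assign[n] = best;
            }
            // Step 3: each cluster's centroid becomes the new mean
            double[][] sums = new double[k][d];
            int[] counts = new int[k];
            for (int n = 0; n < data.length; n++) {
                counts[assign[n]]++;
                for (int i = 0; i < d; i++) sums[assign[n]][i] += data[n][i];
            }
            for (int j = 0; j < k; j++)
                if (counts[j] > 0)
                    for (int i = 0; i < d; i++) means[j][i] = sums[j][i] / counts[j];
        }
        return means;
    }
}
```

In practice, step 4's convergence test replaces the fixed iteration count used here.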
Concept of Operation
In the detection phase, the unknown speaker's feature vectors are compared to all the codebook vectors in the database.
The speaker with the lowest score is chosen.
The score is defined as the average of the Euclidean distances.
Concept of Operation
Sphinx4 DataProcessor:
Sphinx4 uses front ends to perform the specific signal processing.
The base class implements a DataProcessor to get the MFCC coefficients from the front-end chain.
Concept of Operation
The configuration file was updated to reflect this:

<item>dct</item>
<item>liveCMN</item>
<item>featureExtraction</item>
<item>featureStore</item>
Implementation
The MFCC data gets stored in a Vector<float[]>.
The array of all the MFCC coefficients is used as the input to the k-means algorithm to generate the clusters.
The codebook that gets generated is stored in a Hashtable<Integer,Speaker>, which is serialized to a file for later retrieval.
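The serialize-to-file step could look like the following. The slides do not show the project's Speaker class, so the one here (a name plus codebook centroids) and the method names are assumptions; only the Hashtable<Integer,Speaker> container is from the slides.

```java
import java.io.*;
import java.util.Hashtable;

// Sketch of persisting the codebook database via Java serialization.
// The Speaker fields below are hypothetical; the project's actual class
// is not shown in the slides.
public class CodebookStore {
    static class Speaker implements Serializable {
        final String name;
        final double[][] codebook; // k-means centroids for this speaker
        Speaker(String name, double[][] codebook) { this.name = name; this.codebook = codebook; }
    }

    // Serialize the Hashtable<Integer,Speaker> to a file
    static void save(Hashtable<Integer, Speaker> db, File f) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(f))) {
            out.writeObject(db);
        }
    }

    // Deserialize it back for the detection phase
    @SuppressWarnings("unchecked")
    static Hashtable<Integer, Speaker> load(File f) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(f))) {
            return (Hashtable<Integer, Speaker>) in.readObject();
        }
    }
}
```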
Concept of Operation
[Diagram: Sphinx4 featureStore → KMeans → Speaker clusters (codebook) → disk]
Simulation Test and Result
Results use wave files of recorded speech from 10 speakers.
The MFCC features used are the 39 coefficients from Sphinx4.
The number of filter banks is set automatically by Sphinx4: 40 for 16 kHz.
The codebook size was set to 62.
Simulation Test and Result
Average Euclidean distance between speakers (K=8, cb=62):

      A    B    C    D    E    F    G    H    I    J
A   0.0  2.6  2.4  2.4  2.2  2.1  1.8  2.2  2.0  2.2
B   2.3  0.0  1.9  2.3  2.1  1.9  2.1  2.0  2.1  1.9
C   1.5  1.5  0.1  1.7  1.4  1.1  1.5  1.2  1.3  1.1
D   2.1  2.0  1.9  0.0  1.5  1.5  1.6  1.7  2.0  1.7
E   1.7  1.6  1.3  1.5  0.8  1.2  1.2  1.2  1.4  1.1
F   1.8  1.6  1.2  1.5  1.2  0.4  1.3  1.2  1.4  1.3
G   1.8  1.6  1.5  1.5  1.1  1.2  0.4  1.4  1.3  1.2
H   1.5  1.7  1.2  1.6  1.3  1.3  1.3  0.5  1.0  1.3
I   1.6  1.7  1.1  1.7  1.3  1.3  1.4  1.2  0.2  1.3
Simulation Test and Result
[Figure: Number of Clusters vs. Identification Rate; x-axis: number of clusters (0 to 64), y-axis: identification rate (0 to 1)]
References
http://cmusphinx.sourceforge.net/sphinx4/
International Journal, "EE Independent Speaker Identification", Vol. 3, 2011