Speaker Recognition Searching & Decoding in SR ECE5527 Wilson Burgos

Page 1: Speaker Recognition

Speaker Recognition
Searching & Decoding in SR

ECE5527

Wilson Burgos

Page 2: Speaker Recognition

Outline
• Introduction
• Objective
• Implementation
• Simulation Test and Result
• Conclusion

Page 3: Speaker Recognition

Introduction
• Speaker recognition aims to recognize speakers from their voices.
• It is divided into identification and verification.
• Identification determines which registered speaker provided the input.
• Verification determines whether the speaker really is the person they claim to be.
• Speaker recognition can be text-dependent or text-independent.

Page 4: Speaker Recognition

Introduction
• Text-dependent: speakers say the same key words for training and recognition.
• Text-independent: the system identifies the speaker regardless of what is said.
• The goal of the study is a real-time, text-dependent identification system that compares signals from unknown speakers to a database of known speakers.

Page 5: Speaker Recognition

Objective
• Real-time, text-dependent tool for speaker identification using Sphinx4
• The tool will have two modes:
  - Training
  - Detection (recognition)
• During training mode, feature models are created from user voices.
• The detection phase uses those models to identify the speaker.

Page 6: Speaker Recognition

Concept of Operation
• The system uses Mel Frequency Cepstral Coefficients (MFCC) for features and Vector Quantization (VQ) for speaker models.
• The K-means clustering algorithm was used.

Page 7: Speaker Recognition

Concept of Operation
Feature Extraction using MFCC
Pipeline: Pre-emphasis -> Windowing -> FFT -> Mel filter bank -> log -> IFFT -> MFCC (12 coefficients), with energy and delta stages
Feature vector:
• MFCC: 12 coefficients
• Energy: 1 coefficient
• ∆ MFCC: 12 coefficients
• ∆ Energy: 1 coefficient
• ∆∆ MFCC: 12 coefficients
• ∆∆ Energy: 1 coefficient
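The layout of the feature vector listed on this slide can be illustrated with a small Java sketch that stacks static, delta, and delta-delta coefficients for each frame. The class name `DeltaFeatureSketch` and the simple one-frame difference with edge clamping are assumptions made for illustration; Sphinx4's own delta computation may use a different window.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: stacks static, delta and delta-delta features per frame. */
final class DeltaFeatureSketch {

    /** Difference between the next and previous frame, clamped at the edges (an assumption). */
    static List<float[]> deltas(List<float[]> frames) {
        List<float[]> out = new ArrayList<>();
        int n = frames.size();
        for (int t = 0; t < n; t++) {
            float[] prev = frames.get(Math.max(t - 1, 0));
            float[] next = frames.get(Math.min(t + 1, n - 1));
            float[] d = new float[prev.length];
            for (int i = 0; i < d.length; i++) {
                d[i] = next[i] - prev[i];
            }
            out.add(d);
        }
        return out;
    }

    /** Concatenates static, delta and delta-delta coefficients for every frame. */
    static List<float[]> stack(List<float[]> staticFrames) {
        List<float[]> d1 = deltas(staticFrames);
        List<float[]> d2 = deltas(d1);
        List<float[]> out = new ArrayList<>();
        for (int t = 0; t < staticFrames.size(); t++) {
            float[] s = staticFrames.get(t);
            float[] v = new float[3 * s.length];
            System.arraycopy(s, 0, v, 0, s.length);
            System.arraycopy(d1.get(t), 0, v, s.length, s.length);
            System.arraycopy(d2.get(t), 0, v, 2 * s.length, s.length);
            out.add(v);
        }
        return out;
    }
}
```

Each input frame here would hold the 13 static values (12 MFCC plus energy); `stack` then appends the 13 delta and 13 delta-delta values in the same order.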

Page 8: Speaker Recognition

Concept of Operation
Feature Matching
• Training enrolls the speaker, creating a unique model based on its features.
• Testing computes a score against each model and matches the input to the speaker with the minimum matching score.

Page 9: Speaker Recognition

Concept of Operation
Vector Quantization
• A large number of feature vectors is reduced to a smaller set while maintaining their characteristics.
• A codebook is generated for each speaker.
• K-means partitions the feature vectors into a chosen number of clusters, each represented by its centroid.

Page 10: Speaker Recognition

Concept of Operation
Demonstration of the standard algorithm

1) k initial "means" (in this case k = 3) are randomly selected from the data set.

Page 11: Speaker Recognition

Concept of Operation

2) k clusters are created by associating every observation with the nearest mean.

Page 12: Speaker Recognition

Concept of Operation

3) The centroid of each of the k clusters becomes the new mean.

Page 13: Speaker Recognition

Concept of Operation

4) Steps 2 and 3 are repeated until convergence has been reached.
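The four steps above can be sketched in Java roughly as follows. This is a minimal illustration, not the code used in the project: the class name `KMeansSketch`, the `maxIter` and `seed` parameters, and the choice to leave an empty cluster's mean unchanged are all assumptions.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch of the standard k-means iteration. */
final class KMeansSketch {

    static float[][] fit(float[][] data, int k, int maxIter, long seed) {
        if (k < 1 || k > data.length) {
            throw new IllegalArgumentException("k must be between 1 and data.length");
        }
        int dim = data[0].length;
        Random rng = new Random(seed);

        // Step 1: pick k distinct observations as the initial means (partial shuffle).
        int[] idx = new int[data.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        float[][] means = new float[k][];
        for (int i = 0; i < k; i++) {
            int j = i + rng.nextInt(data.length - i);
            int tmp = idx[i]; idx[i] = idx[j]; idx[j] = tmp;
            means[i] = data[idx[i]].clone();
        }

        int[] assign = new int[data.length];
        Arrays.fill(assign, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            // Step 2: associate every observation with the nearest mean.
            boolean changed = false;
            for (int n = 0; n < data.length; n++) {
                int best = nearest(means, data[n]);
                if (best != assign[n]) { assign[n] = best; changed = true; }
            }
            // Step 4: stop once the assignments no longer change.
            if (!changed) break;

            // Step 3: the centroid of each cluster becomes the new mean.
            float[][] sums = new float[k][dim];
            int[] counts = new int[k];
            for (int n = 0; n < data.length; n++) {
                counts[assign[n]]++;
                for (int d = 0; d < dim; d++) sums[assign[n]][d] += data[n][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // assumption: an empty cluster keeps its old mean
                for (int d = 0; d < dim; d++) means[c][d] = sums[c][d] / counts[c];
            }
        }
        return means;
    }

    /** Index of the mean closest to x by squared Euclidean distance. */
    static int nearest(float[][] means, float[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < means.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < x.length; d++) {
                double diff = x[d] - means[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) { bestDist = dist; best = c; }
        }
        return best;
    }
}
```

The returned means would serve as one speaker's codebook when `data` holds that speaker's feature vectors.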

Page 14: Speaker Recognition

Concept of Operation
• In the detection phase, the unknown speaker's feature vectors are compared to all the codebook vectors in the database.
• The speaker with the lowest score is chosen.
• The score is defined as the average of the Euclidean distances.
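One common way to read this scoring rule is sketched below in Java: for each frame, take the Euclidean distance to the nearest codeword in a speaker's codebook, average those distances over all frames, and pick the speaker with the smallest average. The slide does not say whether the nearest codeword or every codeword enters the average, so the nearest-codeword choice is an assumption, as are the names `VqScoringSketch`, `score`, and `identify`.

```java
import java.util.Map;

/** Hypothetical sketch of VQ-based speaker scoring. */
final class VqScoringSketch {

    /** Average Euclidean distance from each frame to its nearest codeword (assumed rule). */
    static double score(float[][] frames, float[][] codebook) {
        double total = 0.0;
        for (float[] frame : frames) {
            double best = Double.POSITIVE_INFINITY;
            for (float[] codeword : codebook) {
                double sum = 0.0;
                for (int i = 0; i < frame.length; i++) {
                    double diff = frame[i] - codeword[i];
                    sum += diff * diff;
                }
                best = Math.min(best, Math.sqrt(sum));
            }
            total += best;
        }
        return total / frames.length;
    }

    /** Returns the id of the speaker whose codebook gives the lowest score, or -1 if none. */
    static int identify(float[][] frames, Map<Integer, float[][]> codebooks) {
        int bestId = -1;
        double bestScore = Double.POSITIVE_INFINITY;
        for (Map.Entry<Integer, float[][]> entry : codebooks.entrySet()) {
            double s = score(frames, entry.getValue());
            if (s < bestScore) {
                bestScore = s;
                bestId = entry.getKey();
            }
        }
        return bestId;
    }
}
```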

Page 15: Speaker Recognition

Concept of Operation
Sphinx4 DataProcessor
• Sphinx uses front ends to perform the specific signal processing.
• The base class implements a DataProcessor to get the MFCC coefficients from the chain.

Page 16: Speaker Recognition

Concept of Operation
The configuration file was updated to reflect this:

    <item>dct</item>
    <item>liveCMN</item>
    <item>featureExtraction</item>
    <item>featureStore</item>

Page 17: Speaker Recognition

Implementation
• The MFCC data is stored in a Vector<float[]>.
• The array of all the MFCC coefficients is used as the input to the K-means algorithm to generate the clusters.
• The generated codebook is stored in a Hashtable<Integer,Speaker> that is serialized to a file for later retrieval.
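Storing and retrieving the codebook table could look roughly like the Java sketch below, which uses standard Java object serialization. The fields of `Speaker` (a name and a codebook array) and the helper class `CodebookStoreSketch` are assumptions; the slides name the `Hashtable<Integer,Speaker>` type but do not show the class itself.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Hashtable;

/** Hypothetical speaker record; the real class's fields are not shown in the slides. */
final class Speaker implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final float[][] codebook; // one row per centroid

    Speaker(String name, float[][] codebook) {
        this.name = name;
        this.codebook = codebook;
    }
}

/** Hypothetical helper that writes the speaker table to disk and reads it back. */
final class CodebookStoreSketch {

    static void save(Hashtable<Integer, Speaker> db, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(db);
        }
    }

    @SuppressWarnings("unchecked")
    static Hashtable<Integer, Speaker> load(File file)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (Hashtable<Integer, Speaker>) in.readObject();
        }
    }
}
```

Training mode would call `save` after building each codebook, and detection mode would call `load` once at startup.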

Page 18: Speaker Recognition

Concept of Operation
[Diagram: Sphinx4 featureStore -> KMeans -> Codebook (speaker clusters) -> Disk]

Page 19: Speaker Recognition

Simulation Test and Result
• Results use wave files of recorded speech from 10 speakers.
• The MFCC features used are 38 coefficients from Sphinx4.
• The number of filter banks is set automatically by Sphinx: 40 for 16 kHz.
• The codebook size was set to 62.

Page 20: Speaker Recognition

Simulation Test and Result
Average Euclidean Distance of speakers (K = 8, cb = 62)

     A    B    C    D    E    F    G    H    I    J
A   0.0  2.6  2.4  2.4  2.2  2.1  1.8  2.2  2.0  2.2
B   2.3  0.0  1.9  2.3  2.1  1.9  2.1  2.0  2.1  1.9
C   1.5  1.5  0.1  1.7  1.4  1.1  1.5  1.2  1.3  1.1
D   2.1  2.0  1.9  0.0  1.5  1.5  1.6  1.7  2.0  1.7
E   1.7  1.6  1.3  1.5  0.8  1.2  1.2  1.2  1.4  1.1
F   1.8  1.6  1.2  1.5  1.2  0.4  1.3  1.2  1.4  1.3
G   1.8  1.6  1.5  1.5  1.1  1.2  0.4  1.4  1.3  1.2
H   1.5  1.7  1.2  1.6  1.3  1.3  1.3  0.5  1    1.3
I   1.6  1.7  1.1  1.7  1.3  1.3  1.4  1.2  0.2  1.3

Page 21: Speaker Recognition

Simulation Test and Result
[Figure: Number of Clusters vs Identification Rate; x-axis: number of clusters (0 to 64), y-axis: identification rate (0 to 1)]

Page 22: Speaker Recognition

References
• http://cmusphinx.sourceforge.net/sphinx4/
• International Journal an EE Independent Speaker Identification, Vol. 3, 2011