Text Independent Speaker Identification Using Gaussian Mixture Model
International Conference on Intelligent and Advanced Systems 2007
Chee-Ming Ting, Sh-Hussain Salleh, Tian-Swee Tan, A. K. Ariff
Jain-De, Lee
INTRODUCTION
GMM SPEAKER IDENTIFICATION SYSTEM
EXPERIMENTAL EVALUATION
CONCLUSION
OUTLINE
Speaker recognition is generally divided into two tasks
◦ Speaker Verification (SV)
◦ Speaker Identification (SI)
Speaker models fall into two categories
◦ Text-Dependent (TD)
◦ Text-Independent (TI)
INTRODUCTION
Many approaches have been proposed for TI speaker recognition
◦ VQ-based methods
◦ Hidden Markov Models
◦ Gaussian Mixture Models
VQ based method
INTRODUCTION
Hidden Markov Models
◦ State probability
◦ Transition probability
In the TI task, acoustic events corresponding to HMM states are classified to characterize each speaker
TI performance is unaffected by discarding the transition probabilities in the HMM models
INTRODUCTION
Gaussian Mixture Model
◦ Corresponds to a single-state continuous ergodic HMM
◦ Discards the transition probabilities of the HMM models
The use of GMM for speaker identity modeling is motivated by
◦ The Gaussian components represent general speaker-dependent spectral shapes
◦ The capability of Gaussian mixtures to model arbitrary densities
INTRODUCTION
The GMM speaker identification system consists of the following elements
◦ Speech processing
◦ Gaussian mixture model
◦ Parameter estimation
◦ Identification
GMM SPEAKER IDENTIFICATION SYSTEM
Mel-frequency cepstral coefficient (MFCC) extraction is used in the front-end processing
Speech Processing
Mel-scale cepstral feature analysis:
Input speech signal → Pre-emphasis → Framing → Hamming window → FFT → Triangular band-pass filters → Logarithm → DCT
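The front-end pipeline above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation: the sample rate, frame length, hop size, pre-emphasis coefficient, and filter-bank size are assumed values.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, frame_len=400, hop=160,
         n_filters=20, n_ceps=12):
    """Sketch of the MFCC front end: pre-emphasis, framing, Hamming
    window, FFT, triangular mel filter bank, logarithm, DCT."""
    # Pre-emphasis: boost high frequencies
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Framing with a Hamming window
    n_frames = 1 + (len(sig) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(frame_len)
    # Magnitude spectrum via FFT (zero-padded to n_fft)
    spec = np.abs(np.fft.rfft(frames, n_fft))
    # Triangular band-pass filters spaced on the mel scale
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0), mel(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Logarithm of filter-bank energies, then DCT to the cepstral domain
    logE = np.log(spec @ fbank.T + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return logE @ dct.T  # shape: (n_frames, n_ceps)
```

With these assumed parameters, one second of 16 kHz speech yields 98 frames of 12 coefficients each.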
The Gaussian mixture model is a weighted linear combination of M unimodal Gaussian component densities
The mixture weights satisfy a sum-to-one constraint
Gaussian mixture model
p(x \mid \lambda) = \sum_{i=1}^{M} w_i \, b_i(x)
where x is a D-dimensional feature vector, b_i(x), i = 1, …, M, are the component densities, and w_i, i = 1, …, M, are the mixture weights, with
\sum_{i=1}^{M} w_i = 1
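Evaluating this mixture density can be sketched directly in NumPy. Diagonal covariances are an assumption of this illustration (consistent with the per-dimension variance re-estimation used later), and the log-domain evaluation is a standard numerical-stability choice, not something from the paper.

```python
import numpy as np

def gmm_density(x, weights, means, variances):
    """p(x|lambda) = sum_i w_i * b_i(x) for a diagonal-covariance GMM.
    x: (D,), weights: (M,), means/variances: (M, D)."""
    D = x.shape[0]
    diff = x - means                                   # (M, D)
    # log b_i(x): D-variate Gaussian with diagonal covariance
    log_b = (-0.5 * np.sum(diff ** 2 / variances, axis=1)
             - 0.5 * D * np.log(2 * np.pi)
             - 0.5 * np.sum(np.log(variances), axis=1))
    return float(np.sum(weights * np.exp(log_b)))
```

For a single standard-normal component in one dimension, the density at x = 0 is 1/sqrt(2*pi).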
Each component density is a D-variate Gaussian function of the form
Gaussian mixture model
b_i(x) = \frac{1}{(2\pi)^{D/2} |\Sigma_i|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) \right\}
where \mu_i is the mean vector and \Sigma_i is the covariance matrix
The Gaussian mixture density model is denoted as
\lambda = \{ w_i, \mu_i, \Sigma_i \}, \quad i = 1, \ldots, M
Conventional GMM training process
Parameter estimation
Input training vectors → LBG algorithm → EM algorithm → convergence test → end
LBG algorithm
Input training vectors → compute overall average → split each centroid → clustering → compute cluster averages → calculate distortion D; if (D − D')/D < δ, then if m < M split again, otherwise end; else set D' = D and re-cluster
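The LBG flow above can be sketched as follows. This is a simplified illustration: it assumes M is a power of two, uses the classic multiplicative centroid split, and the split factor and distortion threshold are assumed values.

```python
import numpy as np

def lbg(X, M, eps=0.01, delta=1e-3):
    """LBG codebook training sketch: start from the overall average,
    split each centroid in two, then re-cluster until the relative
    distortion change falls below delta; repeat until M centroids."""
    centroids = X.mean(axis=0, keepdims=True)            # overall average
    while centroids.shape[0] < M:
        # Split: perturb each centroid in two directions
        centroids = np.vstack([centroids * (1 + eps), centroids * (1 - eps)])
        prev_dist = np.inf
        while True:
            # Clustering: assign each vector to the nearest centroid
            d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
            labels = d.argmin(axis=1)
            dist = d.min(axis=1).mean()                  # average distortion D
            if (prev_dist - dist) / dist < delta:        # (D' - D)/D test
                break
            prev_dist = dist                             # D' = D
            # Cluster averages become the new centroids
            for i in range(centroids.shape[0]):
                if np.any(labels == i):
                    centroids[i] = X[labels == i].mean(axis=0)
    return centroids
```

The resulting M centroids serve as initial component means for EM training.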
Speaker model training is to estimate the GMM parameters via maximum likelihood (ML) estimation
Expectation-maximization (EM) algorithm
EM Algorithm
The likelihood of the training vectors X = \{x_1, \ldots, x_T\} is
p(X \mid \lambda) = \prod_{t=1}^{T} p(x_t \mid \lambda)
The re-estimation formulas are
\bar{w}_i = \frac{1}{T} \sum_{t=1}^{T} p(i \mid x_t, \lambda)
\bar{\mu}_i = \frac{\sum_{t=1}^{T} p(i \mid x_t, \lambda) \, x_t}{\sum_{t=1}^{T} p(i \mid x_t, \lambda)}
\bar{\sigma}_i^2 = \frac{\sum_{t=1}^{T} p(i \mid x_t, \lambda) \, x_t^2}{\sum_{t=1}^{T} p(i \mid x_t, \lambda)} - \bar{\mu}_i^2
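One EM re-estimation step following these formulas can be sketched in NumPy for a diagonal-covariance GMM. The array shapes and the E-step/M-step organization are conventions of this illustration, not taken from the paper.

```python
import numpy as np

def em_step(X, w, mu, var):
    """One EM re-estimation step for a diagonal-covariance GMM.
    X: (T, D); w: (M,); mu, var: (M, D)."""
    T = X.shape[0]
    # E-step: posterior p(i | x_t, lambda) for every frame and component
    diff = X[:, None, :] - mu[None]                       # (T, M, D)
    log_b = (-0.5 * np.sum(diff ** 2 / var, axis=2)
             - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1))
    post = w * np.exp(log_b)                              # (T, M)
    post /= post.sum(axis=1, keepdims=True)
    # M-step: weight, mean, and variance re-estimation formulas
    n = post.sum(axis=0)                                  # (M,)
    w_new = n / T
    mu_new = (post.T @ X) / n[:, None]
    var_new = (post.T @ (X ** 2)) / n[:, None] - mu_new ** 2
    return w_new, mu_new, var_new
```

Iterating this step increases the likelihood p(X | lambda) until convergence.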
This paper proposes a training algorithm consisting of two steps
Parameter estimation
Cluster the training vectors to the mixture component with the highest likelihood
Re-estimate the parameters of each component
Parameter estimation
i^* = \arg\max_{1 \le i \le M} b_i(x)
\bar{w}_i = number of vectors classified in cluster i / total number of training vectors
\bar{\mu}_i = sample mean of the vectors classified in cluster i
\bar{\Sigma}_i = sample covariance matrix of the vectors classified in cluster i
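One iteration of this two-step procedure can be sketched as a hard assignment by highest component likelihood followed by per-cluster re-estimation. Diagonal covariances and the small variance floor are assumptions of this illustration.

```python
import numpy as np

def hard_cluster_step(X, w, mu, var):
    """Highest mixture likelihood clustering training (one iteration).
    X: (T, D); w: (M,); mu, var: (M, D)."""
    # Step 1: assign each vector to the component with the highest b_i(x)
    diff = X[:, None, :] - mu[None]
    log_b = (-0.5 * np.sum(diff ** 2 / var, axis=2)
             - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1))
    labels = log_b.argmax(axis=1)           # i* = argmax_i b_i(x_t)
    # Step 2: re-estimate each component from its own cluster
    M, T = mu.shape[0], X.shape[0]
    for i in range(M):
        Xi = X[labels == i]
        if len(Xi) == 0:
            continue                        # keep old parameters for empty clusters
        w[i] = len(Xi) / T                  # fraction of vectors in cluster i
        mu[i] = Xi.mean(axis=0)             # sample mean of cluster i
        var[i] = Xi.var(axis=0) + 1e-6      # sample (diagonal) variance of cluster i
    return w, mu, var
```

Unlike EM, each vector contributes to exactly one component per iteration, which is where the computational saving comes from.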
The feature vectors are classified to the speaker whose model likelihood is the highest
The above can be formulated in logarithmic terms
Identification
\hat{S} = \arg\max_{1 \le k \le S} p(X \mid \lambda_k)
\hat{S} = \arg\max_{1 \le k \le S} \sum_{t=1}^{T} \log p(x_t \mid \lambda_k)
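This decision rule can be sketched as follows, assuming each speaker model is a diagonal-covariance GMM stored as a (weights, means, variances) tuple; that representation is a choice of this illustration.

```python
import numpy as np

def identify(X, models):
    """Return the index of the speaker model with the highest total
    log-likelihood over all frames of the test utterance.
    X: (T, D); models: list of (w, mu, var) tuples."""
    scores = []
    for w, mu, var in models:
        diff = X[:, None, :] - mu[None]
        log_b = (-0.5 * np.sum(diff ** 2 / var, axis=2)
                 - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1))
        frame_ll = np.log(np.sum(w * np.exp(log_b), axis=1) + 1e-300)
        scores.append(frame_ll.sum())       # sum_t log p(x_t | lambda_k)
    return int(np.argmax(scores))
```

Summing log-likelihoods over frames is equivalent to the product form but avoids numerical underflow on long utterances.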
Database and Experiment Conditions
◦ 7 male and 3 female speakers
◦ 40 sentence utterances per speaker, each with different text
◦ The average sentence duration is approximately 3.5 s
Performance Comparison between EM and Highest Mixture Likelihood Clustering Training
◦ The number of Gaussian components is 16
◦ 16-dimensional MFCCs
◦ 20 utterances are used for training
EXPERIMENTAL EVALUATION
Convergence condition
EXPERIMENTAL EVALUATION
| p(X \mid \lambda^{(k+1)}) - p(X \mid \lambda^{(k)}) | < 0.03
EXPERIMENTAL EVALUATION
The comparison between EM and highest likelihood clustering training on identification rate
◦ 10 sentences were used for training
◦ 25 sentences were used for testing
◦ 4 Gaussian components
◦ 8 iterations
EXPERIMENTAL EVALUATION
Effect of Different Numbers of Gaussian Mixture Components and Amounts of Training Data
◦ The MFCC feature dimension is fixed to 12
◦ 25 sentences are used for testing
EXPERIMENTAL EVALUATION
Effect of Feature Set on Performance for Different Numbers of Gaussian Mixture Components
◦ Combinations with first- and second-order difference coefficients were tested
◦ 10 sentences are used for training
◦ 30 sentences are used for testing
The proposed training performs comparably to conventional EM training but requires less computation time
First-order difference coefficients are sufficient to capture the transitional information with reasonable dimensional complexity
A 16-component GMM with 12-dimensional features, trained on 5 sentences, achieved a 98.4% identification rate
CONCLUSION