Mel-Frequency Cepstral Coefficient (MFCC) Implementation in C# based on HTK
Hanbat National University – IISPL Laboratory
Professor Kim Yoon Joong, PhD Hanbat National University – Department of Computer Engineering
Introduction: What is MFCC?
• Mel-Frequency Cepstral Coefficients (MFCCs) are a feature widely used in automatic speech and speaker recognition.
• Introduced by Davis and Mermelstein in the 1980s, they have been the state of the art ever since
• Before their introduction, the main feature types for automatic speech recognition (ASR) were:
– Linear Prediction Coefficients (LPCs)
– Linear Prediction Cepstral Coefficients (LPCCs)
Introduction: An overview
• An audio signal is constantly changing
• To simplify this, we assume that over short time scales the audio signal does not change much
– Frame the signal into 20-40 ms frames
– Longer frames: the signal changes too much within the frame
– Shorter frames: not enough samples to get a reliable spectral estimate
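The framing step can be sketched in C# as follows. This is a minimal illustration, not the presenters' code: the class `Framing`, the method `Split`, and the choice to drop a trailing partial frame are assumptions. As example values only, a 25 ms window with a 10 ms shift at 16 kHz would be `frameSize = 400` and `frameShift = 160`.

```csharp
using System;
using System.Collections.Generic;

public static class Framing
{
    // Splits a signal into frames of frameSize samples, starting a new frame
    // every frameShift samples (frames overlap when frameShift < frameSize).
    // Trailing samples that cannot fill a whole frame are dropped; this is a
    // choice made for the sketch, not necessarily what the C# port does.
    public static List<float[]> Split(float[] signal, int frameSize, int frameShift)
    {
        if (frameSize <= 0 || frameShift <= 0)
            throw new ArgumentException("frameSize and frameShift must be positive.");

        var frames = new List<float[]>();
        for (int start = 0; start + frameSize <= signal.Length; start += frameShift)
        {
            var frame = new float[frameSize];
            Array.Copy(signal, start, frame, 0, frameSize);
            frames.Add(frame);
        }
        return frames;
    }
}
```

With 1000 samples, `Split(signal, 400, 160)` yields four frames starting at samples 0, 160, 320 and 480.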
Introduction: An overview
• Next, we calculate the power spectrum of each frame.
This is motivated by the human cochlea, which vibrates at different spots depending on the frequency of the incoming sound
• The periodogram estimate does a similar job by identifying which frequencies are present in the frame
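A periodogram-style power spectrum of one frame can be sketched with a direct DFT. This is an illustration only: the class `PowerSpectrum` is hypothetical, and it uses an O(N²) DFT for readability, whereas the implementation itself uses FFT routines (`Realft`/`FFT`). It returns S = Xr² + Xi² for bins 0..N/2, without the 1/N scaling that some periodogram definitions include.

```csharp
using System;

public static class PowerSpectrum
{
    // Returns |X[k]|^2 = Re(X[k])^2 + Im(X[k])^2 for k = 0..N/2,
    // computed with a direct O(N^2) DFT. No 1/N normalisation is applied.
    public static double[] Compute(float[] frame)
    {
        int n = frame.Length;
        var power = new double[n / 2 + 1];
        for (int k = 0; k <= n / 2; k++)
        {
            double re = 0.0, im = 0.0;
            for (int t = 0; t < n; t++)
            {
                double angle = 2.0 * Math.PI * k * t / n;
                re += frame[t] * Math.Cos(angle);
                im -= frame[t] * Math.Sin(angle);
            }
            power[k] = re * re + im * im;
        }
        return power;
    }
}
```

For an 8-sample cosine that completes exactly two cycles, all of the energy lands in bin 2 (|X[2]|² = (8/2)² = 16) and the other bins are essentially zero.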
Introduction: An overview
• The periodogram spectral estimate still contains a lot of information not required for ASR
– The cochlea cannot discern the difference between two closely spaced frequencies
– For this reason, we sum the energies in various frequency regions
• This is performed by a Mel filterbank: we are only concerned with how much energy occurs in each region
Introduction: An overview
The first filter is very narrow and indicates how much energy exists near 0 Hz. The filters get wider as frequency increases, because we become less concerned about variations at higher frequencies.
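The narrow-to-wide pattern follows from spacing the filter edges evenly on the mel scale, Mel(f) = 2595 log10(1 + f/700). The sketch below is illustrative: the class `MelSpacing` and the example of 10 filters between 0 and 8000 Hz are assumptions, not the parameters of the implementation described here.

```csharp
using System;

public static class MelSpacing
{
    public static double HzToMel(double f) => 2595.0 * Math.Log10(1.0 + f / 700.0);
    public static double MelToHz(double m) => 700.0 * (Math.Pow(10.0, m / 2595.0) - 1.0);

    // Returns numFilters + 2 edge frequencies (Hz), equally spaced on the mel
    // scale between lowHz and highHz. Triangular filter i spans
    // edges[i]..edges[i + 2] and peaks at edges[i + 1].
    public static double[] FilterEdgesHz(int numFilters, double lowHz, double highHz)
    {
        double lowMel = HzToMel(lowHz), highMel = HzToMel(highHz);
        var edges = new double[numFilters + 2];
        for (int i = 0; i < edges.Length; i++)
            edges[i] = MelToHz(lowMel + (highMel - lowMel) * i / (numFilters + 1));
        return edges;
    }
}
```

Because equal steps in mel correspond to ever larger steps in Hz, each filter's bandwidth in Hz is larger than that of the filter before it.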
Steps in MFCC Feature Extraction
• Mel-Frequency Cepstral Coefficient (MFCC) extraction consists of the following steps:
1. Pre-emphasis: $s'_n(m) = s_n(m) - 0.97\,s_n(m-1)$
2. Hamming window: $h(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right)$
3. Power spectrum (not dB scale): $S = X_r^2 + X_i^2$
4. Mel-scale filter banks (triangular filters): $\mathrm{Mel}(f) = 2595\log_{10}\left(1 + \frac{f}{700}\right)$
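As a worked example of the mel formula (arithmetic only, rounded), using $\mathrm{Mel}(f) = 2595\log_{10}(1 + f/700)$:

```latex
\begin{aligned}
\mathrm{Mel}(1000) &= 2595\log_{10}\!\left(1 + \tfrac{1000}{700}\right) \approx 2595 \times 0.3854 \approx 1000 \\
\mathrm{Mel}(8000) &= 2595\log_{10}\!\left(1 + \tfrac{8000}{700}\right) \approx 2595 \times 1.0944 \approx 2840
\end{aligned}
```

The 0-1000 Hz band thus maps to about 1000 mels, while the much larger 1000-8000 Hz band maps to only about 1840 mels. This compression is why filters that are equally spaced in mel become wider in Hz as frequency increases.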
Steps in MFCC Feature Extraction
• MFCC extraction continues with the following steps:
5. Compute the log spectrum from the filter banks: $10\log_{10}(S)$
6. Convert the log energies from the filter banks to cepstral coefficients: $c_i = \sum_{j=1}^{N} m_j \cos\left(\frac{\pi i}{N}(j - 0.5)\right)$, where $m_j$ = log energy values and $N$ = number of filter banks
7. Weight the cepstral coefficients: $c'_n = \exp(k n)\,c_n$, with $k = 0.6$
Implementation in C#: Data Preparation
Data source: WAV file, 1 channel (mono), 16 bits/sample
Test file: “PBW3001.wav”
WAV header layout:
Endian | File Offset (bytes) | Field Name | Field Size (bytes)
Big | 0 | ChunkID | 4
Little | 4 | ChunkSize | 4
Big | 8 | Format | 4
Big | 12 | Subchunk1ID | 4
Little | 16 | Subchunk1Size | 4
Little | 20 | AudioFormat | 2
Little | 22 | NumChannels | 2
Little | 24 | SampleRate | 4
Little | 28 | ByteRate | 4
Little | 32 | BlockAlign | 2
Little | 34 | BitsPerSample | 2
Big | 36 | Subchunk2ID | 4
Little | 40 | Subchunk2Size | 4
Little | 44 | Data | –
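A minimal reader for this header layout might look like the following. It is a sketch, not the presenters' `GetDataSamples()`: the class `WavReader` and the method `ReadPcm16` are hypothetical, and the sketch assumes the canonical 44-byte header with no extra chunks, rejecting anything else. `BinaryReader` always reads integers as little-endian, which matches the little-endian fields. The "big-endian" ID fields are simply four ASCII characters.

```csharp
using System;
using System.IO;
using System.Text;

public static class WavReader
{
    // Reads a canonical 44-byte-header PCM WAV (mono, 16 bits/sample) and returns
    // its samples. Assumes no extra chunks (e.g. "LIST") before "data".
    public static short[] ReadPcm16(Stream stream, out int sampleRate)
    {
        using var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);
        string chunkId = new string(reader.ReadChars(4));   // offset 0
        reader.ReadInt32();                                  // 4: ChunkSize
        string format = new string(reader.ReadChars(4));    // 8
        string fmtId = new string(reader.ReadChars(4));     // 12
        int fmtSize = reader.ReadInt32();                    // 16
        short audioFormat = reader.ReadInt16();              // 20 (1 = PCM)
        short numChannels = reader.ReadInt16();              // 22
        sampleRate = reader.ReadInt32();                     // 24
        reader.ReadInt32();                                  // 28: ByteRate
        reader.ReadInt16();                                  // 32: BlockAlign
        short bitsPerSample = reader.ReadInt16();            // 34
        string dataId = new string(reader.ReadChars(4));    // 36
        int dataSize = reader.ReadInt32();                   // 40

        if (chunkId != "RIFF" || format != "WAVE" || fmtId != "fmt " || fmtSize != 16 ||
            dataId != "data" || audioFormat != 1 || numChannels != 1 || bitsPerSample != 16)
            throw new InvalidDataException("Expected a canonical mono 16-bit PCM WAV file.");

        var samples = new short[dataSize / 2];
        for (int i = 0; i < samples.Length; i++)
            samples[i] = reader.ReadInt16();                 // 44 onward: Data
        return samples;
    }
}
```

Real recordings often carry extra chunks, so a production reader would walk the chunk list instead of assuming fixed offsets.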
Implementation in C#: Feature Extraction
MFCC(string filename)
Computes the MFCC features
(short[]) GetDataSamples()
Returns the data samples from the WAV file filename
(List<float[]>) GetFeatures()
Returns the MFCC features of each frame as a list of float arrays: List<float[]>
(bool) SaveAs(string filename)
Saves the features in HTK format (.mfc) as filename.mfc. Returns true if saving is successful, otherwise false.
(float) ExtractFeatureFrames(float[] frames)
Extracts the features from a frame of size config.FRAMESIZE = 160
(void) PreEmphasis(float[] s, float k)
Applies pre-emphasis to the frame samples float[] s with the pre-emphasis coefficient float k
Pre-emphasis coefficient = 0.97
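Below is a sketch of what a pre-emphasis routine with this signature does, written as a standalone helper. The class name `PreEmphasisFilter` is hypothetical. The filter is applied in place from the last sample backwards so that each step still sees the original previous sample. Scaling the first sample by (1 - k) mirrors HTK's `PreEmphasise` routine; other implementations leave the first sample unchanged.

```csharp
public static class PreEmphasisFilter
{
    // In-place pre-emphasis: s'[i] = s[i] - k * s[i-1] for i >= 1.
    // Processing from the end keeps s[i-1] unmodified when s[i] is updated.
    // The first sample has no predecessor; as in HTK, it is scaled by (1 - k).
    public static void Apply(float[] s, float k)
    {
        if (s.Length == 0) return;
        for (int i = s.Length - 1; i >= 1; i--)
            s[i] -= s[i - 1] * k;
        s[0] *= 1.0f - k;
    }
}
```

With the coefficient from the slide, the call would be `PreEmphasisFilter.Apply(frame, 0.97f)`.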
Hamming Window
(void) Ham(float[] s)
Applies Hamming windowing to the samples in float[] s
(void) GenHamWindow (int frameSize)
Generates the window values using the formula
$h(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right)$, where $N$ is the frame size and $n = 0, \dots, N-1$
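The window generation and its application can be sketched in the spirit of `GenHamWindow` and `Ham`. The class `HammingWindow` and its method names are hypothetical. Generating the window once per frame size and reusing it for every frame avoids recomputing the cosines.

```csharp
using System;

public static class HammingWindow
{
    // h(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1)), n = 0..N-1.
    public static float[] Generate(int frameSize)
    {
        if (frameSize < 2) throw new ArgumentOutOfRangeException(nameof(frameSize));
        var w = new float[frameSize];
        double a = 2.0 * Math.PI / (frameSize - 1);
        for (int n = 0; n < frameSize; n++)
            w[n] = (float)(0.54 - 0.46 * Math.Cos(a * n));
        return w;
    }

    // Multiplies the frame samples by the window, in place.
    public static void Apply(float[] s, float[] window)
    {
        if (s.Length != window.Length) throw new ArgumentException("Length mismatch.");
        for (int n = 0; n < s.Length; n++) s[n] *= window[n];
    }
}
```

The window is symmetric, equals 0.08 at both ends, and reaches 1.0 at the centre when N is odd.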
Mel-Filterbanks
(FBankInfo) InitFBank(int frameSize, long sampPeriod, int numChans,
float lopass, float hipass, Boolean usePower, Boolean takeLogs,
Boolean doubleFFT, float alpha, float warpLowCut,
float warpUpCut)
Builds the filter bank information and precomputes a table for filter bank computation; it must be called before Wave2FBank
$\mathrm{Mel}(f) = 2595\log_{10}\left(1 + \frac{f}{700}\right)$
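A generic triangular mel filterbank can be precomputed as a weight table, which is the role `InitFBank` plays here. This sketch is not a port of HTK's `InitFBank`. The class `MelFilterbank`, its parameters, and the exact interpolation between bins are assumptions, and the sketch ignores the `usePower`, `doubleFFT`, and warping options of the real function. The triangles are linear in mel, and their centres are equally spaced on the mel scale.

```csharp
using System;

public static class MelFilterbank
{
    static double HzToMel(double f) => 2595.0 * Math.Log10(1.0 + f / 700.0);

    // Builds numChans triangular filters over FFT bins 0..fftSize/2.
    // weights[c][k] is the gain of channel c at bin k (0 outside the triangle).
    public static double[][] Build(int fftSize, double sampleRate, int numChans,
                                   double lowHz, double highHz)
    {
        int numBins = fftSize / 2 + 1;
        double lowMel = HzToMel(lowHz), highMel = HzToMel(highHz);

        // Mel positions of the numChans + 2 filter edges.
        var edgeMel = new double[numChans + 2];
        for (int i = 0; i < edgeMel.Length; i++)
            edgeMel[i] = lowMel + (highMel - lowMel) * i / (numChans + 1);

        var weights = new double[numChans][];
        for (int c = 0; c < numChans; c++)
        {
            weights[c] = new double[numBins];
            double left = edgeMel[c], centre = edgeMel[c + 1], right = edgeMel[c + 2];
            for (int k = 0; k < numBins; k++)
            {
                double m = HzToMel(k * sampleRate / fftSize); // bin frequency in mels
                if (m > left && m <= centre)
                    weights[c][k] = (m - left) / (centre - left);   // rising edge
                else if (m > centre && m < right)
                    weights[c][k] = (right - m) / (right - centre); // falling edge
            }
        }
        return weights;
    }
}
```

For example, `MelFilterbank.Build(512, 16000.0, 26, 0.0, 8000.0)` gives 26 channels over 257 bins. These values are illustrative and not taken from the slides.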
(float[]) Wave2FBank(float[] s, FBankInfo info)
Converts the speech frame in float[] s into mel-frequency filterbank coefficients. FBankInfo info contains the precomputed filter weights and must be set up by calling InitFBank before Wave2FBank is used. The log of the filtered samples is also computed in this method: $10\log_{10}(S)$
(void) Realft(float[] s)
(void) FFT(float[] s, int invert)
Perform the Fast Fourier Transform of the frame samples
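After the FFT, the remaining work is to weight the spectrum with each filter, sum, and take the log. The sketch below assumes a weight table indexed as `weights[channel][bin]`. The class `FilterbankEnergies` and its parameters are hypothetical. The `useDecibels` flag covers both the 10 log10 form on the slide and the natural log with a floor of 1.0 that HTK's own `Wave2FBank` applies.

```csharp
using System;

public static class FilterbankEnergies
{
    // Applies precomputed filter weights (weights[c][k]) to a power spectrum,
    // clamps each channel energy to at least 'floor', then takes logs:
    // useDecibels = true gives 10*log10(E); false gives ln(E).
    public static double[] Compute(double[] powerSpectrum, double[][] weights,
                                   bool useDecibels, double floor = 1.0)
    {
        var fbank = new double[weights.Length];
        for (int c = 0; c < weights.Length; c++)
        {
            double e = 0.0;
            for (int k = 0; k < powerSpectrum.Length; k++)
                e += weights[c][k] * powerSpectrum[k];
            e = Math.Max(e, floor); // avoid log(0) for silent channels
            fbank[c] = useDecibels ? 10.0 * Math.Log10(e) : Math.Log(e);
        }
        return fbank;
    }
}
```

The floor matters in practice: without it, a channel with zero energy (for example, digital silence) would produce negative infinity.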
void FBank2MFCC(float[] fbank, float[] c, int n)
Applies the Discrete Cosine Transform (DCT) to the filterbank values float[] fbank and stores the first int n cepstral coefficients in float[] c
$c_i = \sum_{j=1}^{N} m_j \cos\left(\frac{\pi i}{N}(j - 0.5)\right)$
where $m_j$ = log energy values and $N$ = number of filter banks
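The DCT step can be sketched as follows. One caveat applies: HTK's `FBank2MFCC` (and the HTK Book formula) also multiplies the sum by $\sqrt{2/N}$. The sketch includes that factor so its values would be on HTK's scale; remove it to match the bare summation exactly. The class `Cepstrum` is hypothetical, and it returns a new array with 0-based indices instead of filling `float[] c`.

```csharp
using System;

public static class Cepstrum
{
    // c[i-1] = sqrt(2/N) * sum_{j=1..N} m_j * cos(pi * i / N * (j - 0.5)), i = 1..n,
    // where m_j = fbank[j-1] are the log filterbank energies and N = fbank.Length.
    public static float[] FromFilterbank(float[] fbank, int n)
    {
        int numChans = fbank.Length;
        double norm = Math.Sqrt(2.0 / numChans); // HTK's scale factor
        var c = new float[n];
        for (int i = 1; i <= n; i++)
        {
            double sum = 0.0;
            for (int j = 1; j <= numChans; j++)
                sum += fbank[j - 1] * Math.Cos(Math.PI * i / numChans * (j - 0.5));
            c[i - 1] = (float)(norm * sum);
        }
        return c;
    }
}
```

Because the cosine basis is orthogonal, a filterbank vector shaped like the i = 1 basis function produces a non-zero value only in the first coefficient.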
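void WeightCepstrum(float[] c, int start, int count, int cepLiftering)
Weights (lifters) count cepstral coefficients of float[] c, beginning at index start, using the liftering parameter int cepLiftering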
Zeroth Cepstral Coefficient (C0)
float FBank2C0(float[] fbank)
Computes the zeroth cepstral coefficient (C0) by summing the log filterbank energies of all channels; HTK scales this sum by $\sqrt{2/N}$
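Both post-processing routines can be sketched in the style of HTK, whose results the C# port reproduces. HTK's `WeightCepstrum` applies a sinusoidal lifter, $c'_n = \left(1 + \frac{L}{2}\sin\frac{\pi n}{L}\right)c_n$, where L is the lifter parameter (`CEPLIFTER`, default 22). This is not an exponential weighting. HTK's `FBank2C0` returns $\sqrt{2/N}$ times the sum of the N log filterbank values. The class `CepstralPost` is hypothetical, and it uses 0-based array indices where HTK uses 1-based vectors.

```csharp
using System;

public static class CepstralPost
{
    // HTK-style sinusoidal lifter: c'_n = (1 + L/2 * sin(pi * n / L)) * c_n for
    // n = 1..count, applied to c[start], c[start + 1], ... (0-based indices).
    public static void WeightCepstrum(float[] c, int start, int count, int cepLiftering)
    {
        if (cepLiftering <= 0) return; // no liftering
        double lifter = cepLiftering;
        for (int n = 1; n <= count; n++)
            c[start + n - 1] *= (float)(1.0 + lifter / 2.0 * Math.Sin(n * Math.PI / lifter));
    }

    // C0 as in HTK's FBank2C0: sqrt(2/N) times the sum of the N log filterbank values.
    public static float FBank2C0(float[] fbank)
    {
        double sum = 0.0;
        foreach (float v in fbank) sum += v;
        return (float)(Math.Sqrt(2.0 / fbank.Length) * sum);
    }
}
```

The lifter boosts the middle-order coefficients, which otherwise have much smaller magnitudes than the low-order ones.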
Implementation in C#: Output Format
HTK File Format (.mfc)
Endian | File Offset (bytes) | Field Name | Field Size (bytes)
Little | 0 | nSamples | 4
Little | 4 | sampPeriod | 4
Little | 8 | sampSize | 2
Little | 10 | parmKind | 2
Little | 12 | Data Chunk | –
parmKind = 6-bit parameter kind code + additional qualifier bits
(Figure: tables of Parameter Kind codes and Additional Qualifiers)
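Writing this header is straightforward with `BinaryWriter`, which writes little-endian integers and so matches this table. Stock HTK tools, however, read big-endian files by default unless `NATURALREADORDER` is enabled. The class `HtkWriter` is hypothetical. The parameter kind is hard-wired to MFCC_0 (basic code 6 for MFCC plus the `_0` qualifier, octal 020000) as an assumption suggested by the 13-column results. `sampPeriod` is in HTK's 100 ns units, so 100000 means 10 ms.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class HtkWriter
{
    const short Mfcc = 6;            // basic parameter kind code for MFCC
    const short Qualifier0 = 0x2000; // _0 qualifier (octal 020000): C0 included

    // Writes nSamples, sampPeriod (100 ns units), sampSize (bytes per frame)
    // and parmKind, followed by the frames as 32-bit floats, little-endian.
    public static void Write(Stream stream, IList<float[]> frames, int sampPeriod)
    {
        int dim = frames.Count > 0 ? frames[0].Length : 0;
        using var w = new BinaryWriter(stream, System.Text.Encoding.ASCII, leaveOpen: true);
        w.Write(frames.Count);                 // nSamples (int32)
        w.Write(sampPeriod);                   // sampPeriod (int32)
        w.Write((short)(dim * sizeof(float))); // sampSize (int16)
        w.Write((short)(Mfcc | Qualifier0));   // parmKind (int16): MFCC_0
        foreach (var frame in frames)
        {
            if (frame.Length != dim) throw new ArgumentException("Inconsistent frame length.");
            foreach (float v in frame) w.Write(v);
        }
    }
}
```

With 13 values per frame, sampSize is 52 bytes, and the file is 12 header bytes plus 52 bytes per frame.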
Implementation in C#: Results
------------------------------------ Samples: 0->-1 ------------------------------------
0: -1.022 -1.213 -7.195 -3.397 2.617 -3.719 -5.103 -8.504 4.003 1.078 6.698 0.812 61.568
1: 1.292 -7.186 -8.138 -1.217 0.970 2.471 3.538 -6.511 2.651 2.122 1.411 -5.272 58.030
2: -0.139 -6.042 -6.751 -1.635 6.527 -0.841 5.195 -5.353 -2.609 -0.694 -1.879 -4.337 59.042
3: -0.828 -4.023 -7.125 -4.043 1.380 0.624 8.837 -8.095 -1.895 3.497 1.440 -5.241 60.955
4: -1.707 -4.215 -7.265 -4.286 2.585 -1.638 1.916 -11.040 4.112 3.503 2.758 -2.860 60.343
5: -2.483 -5.382 -6.464 -7.302 5.775 0.474 -1.112 -13.445 3.988 4.483 1.517 -2.423 60.570
6: -1.693 -4.832 -6.464 -3.432 4.879 -0.349 6.912 -10.826 -3.371 8.082 8.802 -2.008 59.807
7: -1.985 -6.790 -5.654 -2.350 2.901 -0.573 3.631 -8.504 -2.682 3.165 5.513 -0.371 60.429
8: -1.328 -6.246 -5.088 -4.363 1.940 1.905 3.247 -10.004 2.388 4.935 8.520 1.038 60.738
Result from HTK
------------------------------------ Samples: 0->-1 ------------------------------------
0: -1.022 -1.213 -7.195 -3.397 2.617 -3.719 -5.103 -8.504 4.003 1.077 6.698 0.812 61.568
1: 1.292 -7.186 -8.138 -1.217 0.971 2.471 3.538 -6.511 2.651 2.122 1.411 -5.272 58.030
2: -0.139 -6.042 -6.752 -1.635 6.527 -0.841 5.195 -5.353 -2.609 -0.694 -1.879 -4.337 59.042
3: -0.828 -4.023 -7.125 -4.043 1.380 0.624 8.837 -8.095 -1.894 3.497 1.440 -5.241 60.955
4: -1.707 -4.215 -7.265 -4.286 2.585 -1.638 1.916 -11.040 4.112 3.503 2.758 -2.860 60.343
5: -2.483 -5.382 -6.464 -7.302 5.775 0.474 -1.112 -13.445 3.988 4.483 1.517 -2.423 60.570
6: -1.693 -4.832 -6.464 -3.432 4.879 -0.349 6.912 -10.826 -3.372 8.082 8.802 -2.008 59.806
7: -1.985 -6.790 -5.654 -2.350 2.901 -0.573 3.631 -8.505 -2.682 3.165 5.513 -0.371 60.429
8: -1.328 -6.246 -5.088 -4.363 1.940 1.905 3.247 -10.004 2.388 4.935 8.520 1.038 60.738
.NET Implementation
THANK YOU! (Korean: 감사합니다)