Mel-Frequency Cepstral Coefficient (MFCC) Implementation in C# based on HTK

Hanbat National University – IISPL Laboratory

Professor Kim Yoon Joong, PhD Hanbat National University – Department of Computer Engineering

Introduction: What is MFCC?

• Mel-Frequency Cepstral Coefficients (MFCCs) are a feature widely used in automatic speech and speaker recognition.

• They were introduced by Davis and Mermelstein in 1980 and have remained in widespread use ever since.

• Prior to their introduction, the main feature types for automatic speech recognition (ASR) were:

– Linear Prediction Coefficients (LPCs)

– Linear Prediction Cepstral Coefficients (LPCCs)

Introduction: An overview

• An audio signal is constantly changing

• To simplify this, we assume that over short time scales the signal does not change much

– Frame the signal into 20-40 ms frames

– With longer frames, the signal changes too much within the frame

– Shorter frames don’t have enough samples to get a reliable spectral estimate
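The framing step can be sketched as follows. This is a minimal illustration, not the slides' implementation: the class name `Framing`, the method `Split`, and the choice to drop an incomplete trailing frame are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

public static class Framing
{
    // Splits a signal into overlapping frames of frameLength samples,
    // advancing by frameShift samples each time. Trailing samples that
    // do not fill a whole frame are dropped (an assumption of this sketch).
    public static List<float[]> Split(short[] samples, int frameLength, int frameShift)
    {
        if (frameLength <= 0 || frameShift <= 0)
            throw new ArgumentException("frameLength and frameShift must be positive");

        var frames = new List<float[]>();
        for (int start = 0; start + frameLength <= samples.Length; start += frameShift)
        {
            var frame = new float[frameLength];
            for (int i = 0; i < frameLength; i++)
                frame[i] = samples[start + i];
            frames.Add(frame);
        }
        return frames;
    }
}
```

For example, at an assumed 16 kHz sampling rate a 25 ms frame is 400 samples and a 10 ms shift is 160 samples, so `Framing.Split(samples, 400, 160)` yields 98 frames for one second of audio.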

• Next, we calculate the power spectrum of each frame.

– This is motivated by the human cochlea, which vibrates at different spots depending on the frequency of the incoming sound

• A periodogram estimate does a similar job by identifying which frequencies are present in the frame

• The periodogram spectral estimate still contains a lot of information not required for ASR

– The cochlea cannot discern the difference between two closely spaced frequencies

– For this reason, we sum the energies in various frequency regions

• This is performed by a Mel filterbank, where we are only concerned with how much energy occurs in each region

The first filter is very narrow and indicates how much energy exists near 0 Hz. At higher frequencies the filters become wider, because we are less concerned about small variations there.

Steps in MFCC Feature Extraction

• MFCC extraction is composed of the following steps:

1. Pre-emphasis: $s'_n(m) = s_n(m) - 0.97\, s_n(m-1)$

2. Hamming window: $h(n) = 0.54 - 0.46 \cos\left(\frac{2\pi n}{N-1}\right)$

3. Power spectrum (not dB scale): $S = X_r^2 + X_i^2$

4. Mel scale filter banks (triangular filters): $\mathrm{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$

• MFCC extraction is composed of the following steps (continued):

5. Compute the log spectrum from the filter banks: $10 \log_{10}(S)$

6. Convert the log energies from the filter banks to cepstral coefficients: $c_i = \sum_{j=1}^{N} m_j \cos\left(\frac{\pi i}{N}(j - 0.5)\right)$

m_j = log energy values

N = number of filter banks

7. Weight the cepstral coefficients: $c'_n = \exp(n k)\, c_n$, with $k = 0.6$

Implementation in C#: Data Preparation

The data source is a WAV file. Number of channels: 1 (mono), 16 bits/sample.

Testing file: “PBW3001.wav”

Endian   File Offset (bytes)   Field name       Field Size (bytes)
Big      0                     ChunkID          4
Little   4                     ChunkSize        4
Big      8                     Format           4
Big      12                    Subchunk1 ID     4
Little   16                    Subchunk1 Size   4
Little   20                    AudioFormat      2
Little   22                    NumChannels      2
Little   24                    SampleRate       4
Little   28                    ByteRate         4
Little   32                    BlockAlign       2
Little   34                    BitsPerSample    2
Big      36                    SubChunk2ID      4
Little   40                    SubChunk2Size    4
Little   44                    Data
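A reader for this header layout might look like the sketch below. The `WavHeader` class and its members are hypothetical names chosen for the example, and the code assumes the canonical 44-byte PCM header with no extra chunks before the data chunk. `BinaryReader` always reads integers as little-endian; the "Big" fields are four-character ASCII tags, so they are read byte by byte.

```csharp
using System.IO;
using System.Text;

public sealed class WavHeader
{
    public short AudioFormat, NumChannels, BlockAlign, BitsPerSample;
    public int SampleRate, ByteRate, DataSize;

    // Reads the 44-byte header and leaves the reader positioned at the data (offset 44).
    public static WavHeader Read(BinaryReader r)
    {
        var h = new WavHeader();
        Expect(r, "RIFF");                 // ChunkID
        r.ReadInt32();                     // ChunkSize (unused here)
        Expect(r, "WAVE");                 // Format
        Expect(r, "fmt ");                 // Subchunk1 ID
        r.ReadInt32();                     // Subchunk1 Size (unused here)
        h.AudioFormat = r.ReadInt16();
        h.NumChannels = r.ReadInt16();
        h.SampleRate = r.ReadInt32();
        h.ByteRate = r.ReadInt32();
        h.BlockAlign = r.ReadInt16();
        h.BitsPerSample = r.ReadInt16();
        Expect(r, "data");                 // SubChunk2ID
        h.DataSize = r.ReadInt32();        // SubChunk2Size, in bytes
        return h;
    }

    // Reads the sample data that follows the header (16-bit mono only).
    public static short[] ReadSamples(BinaryReader r, WavHeader h)
    {
        if (h.NumChannels != 1 || h.BitsPerSample != 16)
            throw new InvalidDataException("expected 16-bit mono PCM");
        var samples = new short[h.DataSize / 2];
        for (int i = 0; i < samples.Length; i++)
            samples[i] = r.ReadInt16();
        return samples;
    }

    private static void Expect(BinaryReader r, string tag)
    {
        string found = Encoding.ASCII.GetString(r.ReadBytes(4));
        if (found != tag)
            throw new InvalidDataException("expected '" + tag + "' but found '" + found + "'");
    }
}
```

Real-world WAV files sometimes carry additional chunks (for example a LIST chunk) before the data chunk; this sketch rejects such files rather than skipping over them.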

Implementation in C#: Feature Extraction

(void) MFCC(string filename)
Computes the MFCC features.

(short[]) GetDataSamples()
Returns the data samples from the wave file filename.

(List<float[]>) GetFeatures()
Returns the MFCC features of each frame as a list of float arrays: List<float[]>.

(bool) SaveAs(string filename)
Saves the features in HTK format (.mfc) as filename.mfc. Returns true if saving is successful, otherwise false.

(float) ExtractFeatureFrames(float[] frames)
Extracts the features from a frame of size config.FRAMESIZE = 160.

(void) PreEmphasis(float[] s, float k)
Applies pre-emphasis to the frame samples float[] s with the pre-emphasis coefficient float k.

Pre-emphasis coefficient = 0.97
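A minimal in-place sketch of this step is shown below. The class name `PreEmphasisSketch` is hypothetical, and the handling of the first sample (scaling it by 1 - k, as HTK's own routine does) is an assumption about how the C# port treats the frame boundary.

```csharp
public static class PreEmphasisSketch
{
    // In-place s'[m] = s[m] - k * s[m-1]. Iterating from the end of the
    // frame ensures s[m-1] still holds its original value when it is read.
    public static void Apply(float[] s, float k)
    {
        if (s.Length == 0) return;
        for (int m = s.Length - 1; m >= 1; m--)
            s[m] -= k * s[m - 1];
        // Assumption: the first sample has no predecessor in the frame,
        // so it is treated as if s[-1] == s[0].
        s[0] *= 1.0f - k;
    }
}
```

Calling `PreEmphasisSketch.Apply(frame, 0.97f)` uses the coefficient given on the slide.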

Hamming Window

(void) Ham(float[] s)
Applies a Hamming window to the samples in float[] s.

(void) GenHamWindow(int frameSize)
Generates the window values using the formula

$h(n) = 0.54 - 0.46 \cos\left(\frac{2\pi n}{N-1}\right)$
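The window generation and its application could be sketched as below, with n running from 0 to N-1. The class `HammingSketch` and its method names are hypothetical and do not reproduce the slides' `Ham` and `GenHamWindow` signatures exactly.

```csharp
using System;

public static class HammingSketch
{
    // Builds h(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1)) for n = 0..N-1.
    public static float[] Generate(int frameSize)
    {
        if (frameSize < 2)
            throw new ArgumentException("frameSize must be at least 2");
        var window = new float[frameSize];
        double a = 2.0 * Math.PI / (frameSize - 1);
        for (int n = 0; n < frameSize; n++)
            window[n] = (float)(0.54 - 0.46 * Math.Cos(a * n));
        return window;
    }

    // Multiplies each sample by the matching window value, in place.
    public static void Apply(float[] s, float[] window)
    {
        if (s.Length != window.Length)
            throw new ArgumentException("frame and window lengths differ");
        for (int n = 0; n < s.Length; n++)
            s[n] *= window[n];
    }
}
```

Generating the window once and reusing it for every frame avoids recomputing the cosine per sample.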

Mel-Filterbanks

(FBankInfo) InitFBank(int frameSize, long sampPeriod, int numChans, float lopass, float hipass, Boolean usePower, Boolean takeLogs, Boolean doubleFFT, float alpha, float warpLowCut, float warpUpCut)
Builds the filter bank information and generates a table for the filter bank computation. Call it before calling Wave2FBank.

$\mathrm{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$
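To illustrate how the Mel formula drives the filter layout, the sketch below converts between Hz and mel and places filter centres at equal steps on the mel scale. The class `MelSketch`, the inverse function, and `CentreFrequencies` are hypothetical helpers written for this example; they are not part of `InitFBank` as shown on the slide.

```csharp
using System;

public static class MelSketch
{
    // Mel(f) = 2595 * log10(1 + f / 700)
    public static double HzToMel(double hz)
    {
        return 2595.0 * Math.Log10(1.0 + hz / 700.0);
    }

    // Inverse of HzToMel.
    public static double MelToHz(double mel)
    {
        return 700.0 * (Math.Pow(10.0, mel / 2595.0) - 1.0);
    }

    // Centre frequencies (Hz) of numChans filters spaced evenly on the
    // mel scale between loHz and hiHz (band edges excluded).
    public static double[] CentreFrequencies(int numChans, double loHz, double hiHz)
    {
        double melLo = HzToMel(loHz), melHi = HzToMel(hiHz);
        var centres = new double[numChans];
        for (int ch = 1; ch <= numChans; ch++)
            centres[ch - 1] = MelToHz(melLo + (melHi - melLo) * ch / (numChans + 1));
        return centres;
    }
}
```

Because the spacing is uniform in mel, the centres are close together near 0 Hz and progressively further apart at high frequencies, which is why the first filter is the narrowest.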

(float[]) Wave2FBank(float[] s, FBankInfo info)
Converts the given speech frame in float[] s into mel-frequency filterbank coefficients. The FBankInfo info contains precomputed filter weights and should be set up before using Wave2FBank by calling InitFBank. The log of the filtered samples is also computed in this method:

$10 \log_{10}(S)$

(void) Realft(float[] s)

(void) FFT(float[] s, int invert)
Perform the Fast Fourier Transform on the samples.
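The power spectrum and log steps can be illustrated with the sketch below. Note the substitution: for readability it uses a direct O(N²) discrete Fourier transform rather than the FFT routines named on the slide, and it omits the triangular filter weighting. The class `SpectrumSketch`, its method names, and the `floor` guard against log(0) are assumptions of this example.

```csharp
using System;

public static class SpectrumSketch
{
    // Power spectrum S[k] = Xr[k]^2 + Xi[k]^2 for bins k = 0..N/2,
    // computed with a direct DFT (not an FFT).
    public static double[] PowerSpectrum(float[] frame)
    {
        int n = frame.Length;
        var power = new double[n / 2 + 1];
        for (int k = 0; k < power.Length; k++)
        {
            double re = 0.0, im = 0.0;
            for (int t = 0; t < n; t++)
            {
                double angle = 2.0 * Math.PI * k * t / n;
                re += frame[t] * Math.Cos(angle);
                im -= frame[t] * Math.Sin(angle);
            }
            power[k] = re * re + im * im;
        }
        return power;
    }

    // 10 * log10(S), with a small floor so that S = 0 does not give -infinity.
    public static double ToLog(double s, double floor = 1e-10)
    {
        return 10.0 * Math.Log10(Math.Max(s, floor));
    }
}
```

In a full pipeline the power values would first be weighted and summed per Mel filter, and the log would be taken of each filter output rather than of each bin.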

(void) FBank2MFCC(float[] fbank, float[] c, int n)
Applies the Discrete Cosine Transform (DCT) to the filterbank values float[] fbank and stores the first int n cepstral coefficients in float[] c.

$c_i = \sum_{j=1}^{N} m_j \cos\left(\frac{\pi i}{N}(j - 0.5)\right)$

m_j = log energy values

N = number of filter banks
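A direct transcription of this sum might look like the sketch below. The class `DctSketch` and the optional `htkScale` flag are hypothetical. The flag reflects that HTK's own routine additionally multiplies each coefficient by sqrt(2/N); whether the C# port does the same is an assumption, so the default follows the formula as written on the slide.

```csharp
using System;

public static class DctSketch
{
    // c[i-1] = scale * sum_{j=1..N} fbank[j-1] * cos(pi * i / N * (j - 0.5)), for i = 1..n
    public static void FBankToCepstrum(float[] fbank, float[] c, int n, bool htkScale = false)
    {
        int numChans = fbank.Length;
        if (c.Length < n)
            throw new ArgumentException("c must hold at least n coefficients");
        // Assumption: the optional sqrt(2/N) factor mirrors HTK's normalisation.
        double scale = htkScale ? Math.Sqrt(2.0 / numChans) : 1.0;
        for (int i = 1; i <= n; i++)
        {
            double x = Math.PI * i / numChans;
            double sum = 0.0;
            for (int j = 1; j <= numChans; j++)
                sum += fbank[j - 1] * Math.Cos(x * (j - 0.5));
            c[i - 1] = (float)(sum * scale);
        }
    }
}
```

With 13 values per frame in the results (12 cepstra plus the zeroth coefficient), a typical call would be `DctSketch.FBankToCepstrum(fbank, c, 12)`.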

(void) WeightCepstrum(float[] c, int start, int count, int cepLiftering)

Zeroth Cepstral Coefficient

(float) FBank2C0(float[] fbank)
Computes the zeroth cepstral coefficient by summing the log energies of all filterbank channels. This is the DCT with i = 0, where every cosine term equals 1.
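Setting i = 0 in the DCT makes every cosine term equal to 1, so the zeroth coefficient reduces to a sum over the filterbank channel values. The sketch below assumes the sqrt(2/N) scaling that HTK applies to this sum; the class `C0Sketch` is a hypothetical name, and the scaling is an assumption about the C# port rather than something stated on the slide.

```csharp
using System;

public static class C0Sketch
{
    // c0 = sqrt(2 / N) * sum_{j=1..N} fbank[j-1]
    public static float FromFBank(float[] fbank)
    {
        if (fbank.Length == 0)
            throw new ArgumentException("fbank must not be empty");
        double sum = 0.0;
        foreach (float m in fbank)
            sum += m;
        // Assumption: same sqrt(2/N) normalisation as HTK's cepstral routines.
        return (float)(sum * Math.Sqrt(2.0 / fbank.Length));
    }
}
```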

Implementation in C#: Output Format

HTK File Format (.mfc)

Endian   File Offset (bytes)   Field name   Field Size (bytes)
Little   0                     nSamples     4
Little   4                     sampPeriod   4
Little   8                     sampSize     2
Little   10                    parmKind     2
Little   12                    Data Chunk

parmKind = parameter kind (6-bit code) + additional qualifiers

[Tables: Parameter Kind; Additional Qualifiers]
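Writing this layout could be sketched as follows. The class `HtkWriterSketch` and its `Write` method are hypothetical and are not the slides' `SaveAs`. The sketch follows the little-endian layout in the table above, which is what `BinaryWriter` produces, and it assumes every feature vector has the same length.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class HtkWriterSketch
{
    // Writes the 12-byte header followed by the feature vectors as 4-byte floats.
    public static void Write(Stream output, List<float[]> features, int sampPeriod, short parmKind)
    {
        if (features.Count == 0)
            throw new ArgumentException("no feature vectors to write");
        int dim = features[0].Length;

        var w = new BinaryWriter(output);
        w.Write(features.Count);                 // nSamples   (4 bytes)
        w.Write(sampPeriod);                     // sampPeriod (4 bytes)
        w.Write((short)(dim * sizeof(float)));   // sampSize   (2 bytes): bytes per vector
        w.Write(parmKind);                       // parmKind   (2 bytes)
        foreach (float[] vector in features)
        {
            if (vector.Length != dim)
                throw new ArgumentException("all feature vectors must have the same length");
            foreach (float value in vector)
                w.Write(value);
        }
        w.Flush();
    }
}
```

As a usage example, HTK expresses sampPeriod in units of 100 ns, so a 10 ms frame shift is 100000. In HTK's parameter-kind encoding, MFCC has base code 6 and the zeroth-coefficient qualifier is octal 020000, giving a parmKind of 8198 for 13-dimensional vectors like those in the results.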

Implementation in C#: Results

------------------------------------ Samples: 0->-1 ------------------------------------
0: -1.022 -1.213 -7.195 -3.397 2.617 -3.719 -5.103 -8.504 4.003 1.078 6.698 0.812 61.568
1: 1.292 -7.186 -8.138 -1.217 0.970 2.471 3.538 -6.511 2.651 2.122 1.411 -5.272 58.030
2: -0.139 -6.042 -6.751 -1.635 6.527 -0.841 5.195 -5.353 -2.609 -0.694 -1.879 -4.337 59.042
3: -0.828 -4.023 -7.125 -4.043 1.380 0.624 8.837 -8.095 -1.895 3.497 1.440 -5.241 60.955
4: -1.707 -4.215 -7.265 -4.286 2.585 -1.638 1.916 -11.040 4.112 3.503 2.758 -2.860 60.343
5: -2.483 -5.382 -6.464 -7.302 5.775 0.474 -1.112 -13.445 3.988 4.483 1.517 -2.423 60.570
6: -1.693 -4.832 -6.464 -3.432 4.879 -0.349 6.912 -10.826 -3.371 8.082 8.802 -2.008 59.807
7: -1.985 -6.790 -5.654 -2.350 2.901 -0.573 3.631 -8.504 -2.682 3.165 5.513 -0.371 60.429
8: -1.328 -6.246 -5.088 -4.363 1.940 1.905 3.247 -10.004 2.388 4.935 8.520 1.038 60.738

Result from HTK

------------------------------------ Samples: 0->-1 ------------------------------------
0: -1.022 -1.213 -7.195 -3.397 2.617 -3.719 -5.103 -8.504 4.003 1.077 6.698 0.812 61.568
1: 1.292 -7.186 -8.138 -1.217 0.971 2.471 3.538 -6.511 2.651 2.122 1.411 -5.272 58.030
2: -0.139 -6.042 -6.752 -1.635 6.527 -0.841 5.195 -5.353 -2.609 -0.694 -1.879 -4.337 59.042
3: -0.828 -4.023 -7.125 -4.043 1.380 0.624 8.837 -8.095 -1.894 3.497 1.440 -5.241 60.955
4: -1.707 -4.215 -7.265 -4.286 2.585 -1.638 1.916 -11.040 4.112 3.503 2.758 -2.860 60.343
5: -2.483 -5.382 -6.464 -7.302 5.775 0.474 -1.112 -13.445 3.988 4.483 1.517 -2.423 60.570
6: -1.693 -4.832 -6.464 -3.432 4.879 -0.349 6.912 -10.826 -3.372 8.082 8.802 -2.008 59.806
7: -1.985 -6.790 -5.654 -2.350 2.901 -0.573 3.631 -8.505 -2.682 3.165 5.513 -0.371 60.429
8: -1.328 -6.246 -5.088 -4.363 1.940 1.905 3.247 -10.004 2.388 4.935 8.520 1.038 60.738

.NET Implementation

The two outputs agree to within 0.001 on every coefficient shown.

THANK YOU!