
Pitch Prediction from Mel-frequency Cepstral Coefficients Using Sparse Spectrum Recovery

Achuth Rao MV, Prasanta Kumar Ghosh

SPIRE LAB, Electrical Engineering,

Indian Institute of Science (IISc), Bangalore, India

SPIRE LAB, IISc, Bangalore


1 Introduction

2 Proposed approach

3 Previous work and baseline

4 Experiments and results
    Database
    Experimental setup
    Evaluation

5 Conclusion and future work


Introduction


Motivation

Automatic speech recognition (ASR) systems are very common on mobile devices.

Running the ASR models on the mobile device itself can be challenging because of the device's computational and memory constraints.

Distributed speech recognition (DSR) allows ASR applications to be used on mobile devices [2].

Such systems replace low bit-rate speech codecs with feature vectors (such as MFCCs).

Removing the speech codec increases recognition accuracy, particularly in the presence of acoustic noise or channel errors [3].

[2] Choi, “14-2: Invited Paper: Enabling Technologies for Wearable Smart Headsets”, 2016

[3] Shao and Milner, “Pitch prediction from MFCC vectors for speech reconstruction”, 2004


HMM-based recognizers performed ASR directly on the features [4].

[4] Gales, “Maximum likelihood linear transformations for HMM-based speech recognition”, 1998


Recently, in many practical scenarios, end-to-end deep architectures have brought speech recognition accuracy close to human level [5][6][7].

[5] Xiong et al., “The Microsoft 2016 Conversational Speech Recognition System”, 2016

[6] Zweig et al., “Advances in All-Neural Speech Recognition”, 2016

[7] Chan et al., “Listen, attend and spell: A neural network for large vocabulary conversational speech recognition”, 2016


One way to use such recognizers in a DSR setting is to reconstruct the speech from the transmitted features.


In HMM-based ASR, the features used are in most cases Mel-frequency cepstral coefficients (MFCCs), so we need a way to reconstruct speech from MFCCs alone. We propose predicting the pitch from the MFCCs as a first step in speech reconstruction.


How do Mel-frequency cepstral coefficients (MFCCs) encode the pitch information?


Source-filter model of speech [8]

Note the sparse nature of the speech spectrum.

[8] Fant, Acoustic theory of speech production: with calculations based on X-ray studies of Russian articulations, 1971


MFCC computation [9]

w(n) is the window signal.

H_m[k], 0 ≤ k ≤ N − 1, is the frequency response of the m-th filter.

[9] Huang et al., Spoken language processing: A guide to theory, algorithm, and system development, 2001


Proposed approach


Proposed approach: Pitch prediction from MFCC

What are the blocks to be inverted?


The speech magnitude spectrum is enough to predict the pitch.


Which blocks are non-invertible?

We propose a three-step method to estimate the pitch from the MFCCs:

1 Estimate the mel filter-bank energies (MFBEs) from the MFCCs.
2 Recover the spectrum from the estimated MFBEs.
3 Estimate the pitch from the spectrum.


Proposed approach: Estimation of the spectrum from the MFBEs


(1) Estimate the MFBE from the MFCC


The DCT operation is invertible only if the number of MFBEs (M) and the number of MFCCs (K) are the same. When K < M, we use two methods to recover the MFBEs:

1 ZDCT: zero padding of the MFCC vector.
2 DNNDCT: DNN-based estimation.
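The ZDCT idea can be sketched in a few lines. This is a minimal illustration, assuming the MFCCs are the first K coefficients of an orthonormal DCT-II of the log MFBEs; the slides do not state the DCT normalisation, and the function names `dct_matrix`, `mfcc_from_log_mfbe` and `zdct` are hypothetical.

```python
import numpy as np

def dct_matrix(M):
    """Orthonormal DCT-II matrix of size M x M (rows are basis vectors)."""
    n = np.arange(M)
    k = n[:, None]
    D = np.sqrt(2.0 / M) * np.cos(np.pi * (n[None, :] + 0.5) * k / M)
    D[0, :] /= np.sqrt(2.0)
    return D

def mfcc_from_log_mfbe(log_mfbe, K):
    """Keep the first K DCT coefficients of the log filter-bank energies."""
    D = dct_matrix(len(log_mfbe))
    return (D @ log_mfbe)[:K]

def zdct(mfcc, M):
    """Zero-pad the K MFCCs to length M and apply the inverse DCT."""
    padded = np.zeros(M)
    padded[:len(mfcc)] = mfcc
    # D is orthonormal, so its inverse is its transpose.
    return dct_matrix(M).T @ padded

rng = np.random.default_rng(0)
log_mfbe = rng.normal(size=26)                           # stand-in for real log MFBEs
exact = zdct(mfcc_from_log_mfbe(log_mfbe, 26), 26)       # K = M: invertible
approx = zdct(mfcc_from_log_mfbe(log_mfbe, 13), 26)      # K < M: smoothed estimate
```

With K = M the round trip is exact; with K < M the zero-padded inverse returns a smoothed version of the log MFBEs, which is the information loss that the DNN-based alternative tries to reduce.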


(2) Recover the spectrum from the estimated MFBEs


(2a) Recover the spectrum from the estimated MFBEs

The voiced speech spectrum is sparse, and the pitch can be determined from it.

The values around the harmonics are determined by the spectrum of the window.

We model the voiced speech spectrum as

\[ Y[k] \approx W[k] \ast \left( \sum_{l=1}^{L} x_l \, \delta(k - N_0 l) \right) = \sum_{l=1}^{L} x_l \, W[k - N_0 l] \]

where \ast denotes convolution, W[k] is the spectrum of the window, N_0 is the harmonic spacing in DFT bins and x_l is the amplitude of the l-th harmonic. This can be written compactly as Y ≈ Wx, where x is a sparse vector.
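A small numerical sketch of the compact form Y ≈ Wx may help. It assumes W is the matrix whose columns are shifted copies of the window spectrum; the five-point lobe, the sizes N, N0 and L, and the amplitudes are illustrative values, not taken from the slides.

```python
import numpy as np

N = 64    # number of frequency bins (illustrative)
N0 = 8    # harmonic spacing in bins (illustrative)
L = 5     # number of harmonics

# Hypothetical window spectrum: a short symmetric main lobe centred at lag 0.
lobe = np.array([0.25, 0.5, 1.0, 0.5, 0.25])
half = len(lobe) // 2

# Column j of W holds the lobe centred on bin j, i.e. W[k, j] = W(k - j).
W = np.zeros((N, N))
for j in range(N):
    for offset, value in zip(range(-half, half + 1), lobe):
        k = j + offset
        if 0 <= k < N:
            W[k, j] = value

# Sparse amplitude vector: non-zero only at the harmonic bins N0 * l.
x = np.zeros(N)
amplitudes = np.array([1.0, 0.8, 0.6, 0.4, 0.2])
x[N0 * np.arange(1, L + 1)] = amplitudes

# Modelled voiced spectrum: a copy of the lobe at every harmonic.
Y = W @ x
```

The product W @ x is the convolution of the harmonic impulse train with the window spectrum, which is why only L of the N entries of x need to be non-zero.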


(2b) Recover the spectrum from the estimated MFBEs

The model carries an error because some blocks of the MFCC computation are non-invertible.

The estimated MFBE vector can be written as

\[ f = HWx + \gamma \]

where H is the matrix of mel filter responses H_m[k] and \gamma is the sum of the model and estimation noise.

We propose two methods to recover the spectrum from the MFBEs:

1 Direct estimation of the spectrum under the noise model given above.
2 Estimation of the spectrum with a sparsity constraint on the spectrum.


(2c) Recover the spectrum from the estimated MFBEs.

Given f = HWx + \gamma, the maximum likelihood estimate of the spectrum (assuming \gamma is i.i.d. Gaussian) is given by

\[ x^{*}_{PINV} = \arg\min_{x} \| f - HWx \|_2^2 \qquad (1) \]

The solution has a closed form and can be written using the pseudo-inverse (PINV) of HW:

\[ x^{*}_{PINV} = \left( (HW)^{T} HW \right)^{-1} (HW)^{T} f \qquad (2) \]
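The pseudo-inverse estimate can be sketched with NumPy as follows. The random matrices stand in for HW and are purely illustrative. One point worth noting: the inverse in equation (2) exists only when HW has full column rank; with fewer MFBEs than spectral bins it does not, and `np.linalg.pinv` then returns the minimum-norm least-squares solution instead.

```python
import numpy as np

rng = np.random.default_rng(0)

def pinv_estimate(A, f):
    """Least-squares estimate of x in f = A x + noise via the pseudo-inverse."""
    return np.linalg.pinv(A) @ f

# Case 1: A has full column rank, so equation (2) applies literally.
A_tall = rng.normal(size=(26, 10))
x_true = rng.normal(size=10)
f = A_tall @ x_true
x_normal = np.linalg.solve(A_tall.T @ A_tall, A_tall.T @ f)
x_pinv = pinv_estimate(A_tall, f)

# Case 2: fewer measurements than unknowns (M < N). A^T A is singular and
# the pseudo-inverse gives the minimum-norm solution that fits f.
A_wide = rng.normal(size=(26, 100))
f_wide = A_wide @ rng.normal(size=100)
x_min_norm = pinv_estimate(A_wide, f_wide)
```

In the second case many spectra explain the same MFBEs, which is the ambiguity the sparsity constraint is meant to resolve.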


(2d) Recover the spectrum from the estimated MFBEs.

We impose non-negativity and sparsity constraints on x. This results in the following optimization problem:

\[ x^{*}_{S} = \arg\min_{x \ge 0} \| f - HWx \|_2^2 + \lambda \| x \|_1 \]

Since x is constrained to be non-negative, the l1 norm of x equals the sum of its elements. The equivalent optimization problem becomes:

\[ x^{*}_{S} = \arg\min_{x \ge 0} \| f - HWx \|_2^2 + \lambda \mathbf{1}^{T} x \qquad (3) \]

This optimization is posed as a quadratic programming problem [10].

Note that \lambda is a hyper-parameter that controls the sparsity.

[10] Koh, Kim, and Boyd, “An interior-point method for large-scale l1-regularized logistic regression”, 2007
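Problem (3) can be prototyped without a dedicated QP solver. The sketch below uses projected gradient descent, which is a swapped-in technique and not the interior-point method cited on the slide; the matrix A is a random placeholder for HW, and the names `nonneg_sparse_recover` and `objective` are hypothetical.

```python
import numpy as np

def nonneg_sparse_recover(A, f, lam, n_iter=5000):
    """Minimise ||f - A x||_2^2 + lam * sum(x) subject to x >= 0
    by projected gradient descent (a stand-in for a QP solver)."""
    # Step size 1 / Lipschitz constant of the gradient of the smooth term.
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ x - f) + lam
        x = np.maximum(x - step * grad, 0.0)  # project onto x >= 0
    return x

def objective(A, f, lam, x):
    """Value of the cost in problem (3)."""
    return np.sum((f - A @ x) ** 2) + lam * np.sum(x)

rng = np.random.default_rng(0)
A = np.abs(rng.normal(size=(26, 100)))   # placeholder for HW
x_true = np.zeros(100)
x_true[[10, 20, 30]] = [1.0, 0.7, 0.4]   # sparse, non-negative ground truth
f = A @ x_true

x_hat = nonneg_sparse_recover(A, f, lam=0.1)
```

Larger values of `lam` push more entries of the estimate to exactly zero, at the cost of a larger residual; this is the trade-off the hyper-parameter \lambda controls.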


(3) Estimation of the pitch from the estimated spectrum


We use the subharmonic-to-harmonic ratio (SHR) [11] to estimate the pitch from the spectrum.

Given the magnitude spectrum X^{*}(f'), the pitch range S and the number of harmonics Q, the pitch value p^{*} is obtained from the following optimization:

\[ p^{*} = \arg\max_{f \in S} \int_{0}^{\infty} \log\left( X^{*}(f') \right) \sum_{k=1}^{Q} \left[ \delta(f' - kf) - \delta\left( f' - (k - 1/2) f \right) \right] df' \qquad (4) \]

[11] Sun, “Pitch determination and voice quality analysis using subharmonic-to-harmonic ratio”, 2002
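Because of the delta functions, the integral in equation (4) reduces to the log-spectrum sampled at the harmonics minus the log-spectrum sampled half-way between them. The sketch below is a simplified discretisation of that score, not Sun's full SHR algorithm; the synthetic spectrum, the 1 Hz grids and the function names are assumptions made for illustration.

```python
import numpy as np

def pitch_score(freqs, log_spec, f0, Q=4):
    """Log-spectrum at the first Q harmonics minus at the half-way points."""
    k = np.arange(1, Q + 1)
    harmonics = np.interp(k * f0, freqs, log_spec)
    subharmonics = np.interp((k - 0.5) * f0, freqs, log_spec)
    return np.sum(harmonics - subharmonics)

def estimate_pitch(freqs, log_spec, candidates, Q=4):
    """Return the candidate in the search range that maximises the score."""
    scores = [pitch_score(freqs, log_spec, f0, Q) for f0 in candidates]
    return candidates[int(np.argmax(scores))]

# Synthetic spectrum: narrow peaks at multiples of 120 Hz on a small floor.
freqs = np.arange(0.0, 4000.0, 1.0)
true_f0 = 120.0
spec = 1e-3 + sum(np.exp(-0.5 * ((freqs - h * true_f0) / 5.0) ** 2)
                  for h in range(1, 21))

candidates = np.arange(70.0, 351.0, 1.0)   # pitch search range S in Hz
p_star = estimate_pitch(freqs, np.log(spec), candidates)
```

The subtraction of the half-way terms is what penalises octave errors: a candidate at twice the true pitch has its half-way points on real harmonics, so its score collapses.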


Previous work and baseline


Several works in the literature predict the pitch from MFCCs using statistical models such as Gaussian mixture models (GMMs) and hidden Markov models (HMMs) [12][13].

As the baseline, we use a deep neural network (DNN) based method to predict the pitch from MFCCs; DNNs have been successful in many fields. We refer to this DNN as DNNb.

[12] Milner and Shao, “Prediction of fundamental frequency and voicing from Mel-frequency cepstral coefficients for unconstrained speech reconstruction”, 2007

[13] Shao and Milner, “Pitch prediction from MFCC vectors for speech reconstruction”, 2004


Experiments and results


Database

We use two databases: CMU-ARCTIC [14] and KEELE [15].

CMU-ARCTIC database:

one male (JMK) and one female (SLT) speaker, ∼48 min each.

KEELE database:

one male and one female speaker, ∼4 min each.

We randomly choose 80% of the CMU-ARCTIC data from each speaker as the training set and use the rest as the test set. The entire KEELE database is used as a test set to evaluate how well the algorithms generalize.

[14] Kominek and Black, “The CMU ARCTIC speech databases”, 2004

[15] Plante, Meyer, and Ainsworth, “A pitch extraction reference database”, 1995


The histograms of the pitch distribution for the different training and test sets are shown below.

[Figure: pitch histograms of the training and test sets]

Note that the histogram mismatch is larger for the male speakers.


Experimental setup: MFCC and Pitch computation

MFCC computation:

1 A Hamming window of 40 ms with a shift of 10 ms is applied.
2 A 2048-point DFT is computed.
3 The MFBEs are computed by placing M = 26 filters uniformly on the mel scale from 50 to 3700 Hz [16].
4 The DCT with K = 26, 21, 16, 13 is computed to investigate the estimation error due to different amounts of truncation of the DCT coefficients.

Pitch: We use the autocorrelation method from Praat [17] on the EGG signal available with the database to determine the ground truth of the fundamental frequency and voicing. Unvoiced frames are removed from the data for the experiments.

[16] The velocity and acceleration coefficients of the MFCCs are not used.

[17] Boersma and Weenink, “Praat: doing phonetics by computer”, 2010
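The filter-bank step of this setup can be sketched as below. Several details are assumptions that the slides do not state: a 16 kHz sampling rate, the common mel formula 2595 log10(1 + f/700), triangular filters with unit peak, and MFBEs computed from the magnitude (not power) spectrum. The function names are hypothetical and the random frame is a placeholder for speech.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=26, n_fft=2048, fs=16000, f_lo=50.0, f_hi=3700.0):
    """Triangular filters with edges uniformly spaced on the mel scale."""
    edges_hz = mel_to_hz(
        np.linspace(hz_to_mel(f_lo), hz_to_mel(f_hi), n_filters + 2))
    bin_freqs = np.arange(n_fft // 2 + 1) * fs / n_fft
    H = np.zeros((n_filters, len(bin_freqs)))
    for m in range(n_filters):
        lo, centre, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        rising = (bin_freqs - lo) / (centre - lo)
        falling = (hi - bin_freqs) / (hi - centre)
        # Triangle: the smaller of the two slopes, clipped at zero.
        H[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return H

H = mel_filterbank()

# One 40 ms frame at the assumed 16 kHz rate is 640 samples.
rng = np.random.default_rng(0)
frame = np.hamming(640) * rng.normal(size=640)
magnitude = np.abs(np.fft.rfft(frame, n=2048))
mfbe = H @ magnitude                      # 26 mel filter-bank energies
```

Each row of `H` plays the role of one filter response H_m[k]; taking the logarithm of `mfbe` and applying the DCT would complete the MFCC computation.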


Experimental setup: Hyper parameter selection

Proposed method: The sparse spectrum estimation has a hyper-parameter λ, which is found experimentally on 10% of randomly selected training data so as to minimize the pitch error for each training set.

Pitch estimation: We use 4 harmonics to compute the pitch score using SHR. The pitch search ranges (S) for CM, CF and CM+CF are chosen to be 70-150 Hz, 150-350 Hz and 70-350 Hz, respectively.


Evaluation

We use the root mean squared error (RMSE) [18] as the metric for pitch estimation performance. It is computed from the estimated pitch p^{*}_i and the original pitch p_i at the i-th frame, over the entire test set of N_{tot} voiced frames:

\[ \mathrm{RMSE} = \sqrt{ \frac{1}{N_{tot}} \sum_{i=1}^{N_{tot}} \left( p_i - p^{*}_i \right)^2 } \]

[18] Tabrikian, Dubnov, and Dickalov, “Maximum a-posteriori probability pitch tracking in noisy environments using harmonic model”, 2004
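The metric is straightforward to compute over the voiced frames. A minimal sketch, with a hypothetical function name and made-up pitch values used only to exercise it:

```python
import numpy as np

def pitch_rmse(p_true, p_est):
    """Root mean squared error between reference and estimated pitch tracks."""
    p_true = np.asarray(p_true, dtype=float)
    p_est = np.asarray(p_est, dtype=float)
    return np.sqrt(np.mean((p_true - p_est) ** 2))

# Illustrative values in Hz; the per-frame errors are 3, 4 and 0.
example = pitch_rmse([100.0, 120.0, 140.0], [103.0, 116.0, 140.0])
```

Both inputs are expected to contain only frames that are voiced in the reference, matching the removal of unvoiced frames in the experimental setup.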


Sample recovered spectrum

The sparsity constraint helps in recovering the lower-pitch spectrum with higher accuracy.


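Set of models trained

Subscript z indicates DCT reconstruction using zero padding, and subscript D indicates DCT reconstruction using a DNN.

| Model | CM | CF | CM+CF |
| --- | --- | --- | --- |
| DNN | DNN_b^{CM} | DNN_b^{CF} | DNN_b^{CM+CF} |
| PINV_z | PINV_z (one entry for all three columns) | | |
| PS_z | PS_z^{CM} | PS_z^{CF} | PS_z^{CM+CF} |
| PINV_D | PINV_D^{CM} | PINV_D^{CF} | PINV_D^{CM+CF} |
| PS_D | PS_D^{CM} | PS_D^{CF} | PS_D^{CM+CF} |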


RMSE with matched test data: MALE

DNN outperforms all the other methods.

| Model \ truncation (test: CM) | 0 | 1 | 5 | 10 | 13 |
| --- | --- | --- | --- | --- | --- |
| DNN_b^{CM} | 4.80 | 4.48 | 6.29 | 9.99 | 11.26 |
| DNN_b^{CF} | 57.48 | 111.37 | 100.73 | 44.2 | 62.20 |
| DNN_b^{CM+CF} | 4.95 | 4.90 | 6.69 | 9.57 | 11.28 |
| PINV_z | 19.94 | 19.57 | 21.8 | 18.2 | 16.24 |
| PS_z^{CM} | 8.73 | 8.69 | 10.46 | 14.49 | 15.18 |
| PS_z^{CF} | 8.73 | 9.89 | 10.46 | 16.86 | 16.69 |
| PS_z^{CM+CF} | 8.73 | 13.47 | 15.72 | 16.86 | 16.69 |
| PINV_D | - | 10.87 | 12.44 | 14.07 | 16.15 |
| PS_D^{CM} | - | 8.86 | 9.24 | 11.54 | 13.35 |


The pitch RMSE generally increases as the truncation increases.


The gender-mismatched DNN model performs poorly.


The PS and PINV methods are not affected much by the gender mismatch.


The DNN-based DCT estimation helps in nearly all cases.


RMSE with matched test data: FEMALE

The same observations apply. The RMSE is lower compared to the MALE case.

| Model \ truncation (test: CF) | 0 | 1 | 5 | 10 | 13 |
| --- | --- | --- | --- | --- | --- |
| DNN_b^{CM} | 67.57 | 59.9 | 67.98 | 73.07 | 77.9 |
| DNN_b^{CF} | 2.36 | 2.37 | 2.57 | 4.69 | 5.98 |
| DNN_b^{CM+CF} | 2.78 | 2.95 | 3.38 | 6.86 | 7.50 |
| PINV_z | 6.46 | 7.89 | 12.02 | 40.65 | 46.33 |
| PS_z^{CM} | 7.34 | 12.02 | 12.75 | 42.71 | 49.64 |
| PS_z^{CF} | 7.34 | 11.11 | 12.75 | 37.58 | 47.35 |
| PS_z^{CM+CF} | 7.34 | 12.60 | 12.02 | 40.65 | 46.33 |
| PINV_D | - | 6.5 | 6.52 | 7.31 | 9.05 |
| PS_D^{CF} | - | 7.19 | 7.22 | 7.97 | 28.33 |


RMSE with matched test data: MALE+FEMALE

The same observations apply. The RMSE is higher than in the gender-dependent cases.

| Model \ truncation (test: CM+CF) | 0 | 1 | 5 | 10 | 13 |
| --- | --- | --- | --- | --- | --- |
| DNN_b^{CM} | 50.21 | 44.52 | 50.59 | 54.60 | 58.30 |
| DNN_b^{CF} | 38.60 | 74.73 | 67.59 | 29.86 | 41.96 |
| DNN_b^{CM+CF} | 3.91 | 3.95 | 5.14 | 8.19 | 9.39 |
| PINV_z | 23.17 | 23.52 | 27.8 | 38.53 | 39.4 |
| PS_z^{CM} | 12.4 | 14.51 | 16.83 | 35.28 | 41.2 |
| PS_z^{CF} | 12.4 | 16.91 | 16.83 | 33.17 | 39.2 |
| PS_z^{CM+CF} | 12.4 | 14.24 | 16.91 | 33.28 | 38.85 |
| PINV_D | - | 14.12 | 15.7 | 17.72 | 20.67 |
| PS_D^{CM+CF} | - | 12.45 | 12.85 | 15.52 | 27.52 |


RMSE with mismatched test data: MALE

DNN performs poorly because of the histogram mismatch.

| Model \ truncation (test: KM) | 0 | 1 | 5 | 10 | 13 |
| --- | --- | --- | --- | --- | --- |
| DNN_b^{CM} | 27.44 | 24.71 | 27.33 | 26.74 | 27.25 |
| DNN_b^{CF} | 75.62 | 65.07 | 79.75 | 69.46 | 61 |
| DNN_b^{CM+CF} | 31.13 | 43.42 | 41.58 | 38.54 | 37.33 |
| PINV_z | 63.2 | 63.4 | 69.73 | 56.47 | 43.33 |
| PS_z^{CM} | 19.51 | 18.15 | 19.61 | 17.14 | 17.44 |
| PS_z^{CF} | 19.51 | 17.43 | 19.61 | 16.59 | 16.88 |
| PS_z^{CM+CF} | 19.51 | 17.56 | 17.43 | 16.88 | 17.70 |
| PINV_D | - | 26.3 | 27.6 | 25.7 | 26.34 |
| PS_D^{CM} | - | 18.32 | 18.84 | 19.82 | 18.12 |


The PS method outperforms all the other methods.


The RMSE of PS_D is higher than that of PS_z at most truncations because the DNN-DCT prediction is also poor; in this table PINV_D is nevertheless lower than PINV_z.


RMSE with mismatched test data: FEMALE

The PINV_z method outperforms all the other methods at lower values of truncation.

| Model \ truncation (test: KF) | 0 | 1 | 5 | 10 | 13 |
| --- | --- | --- | --- | --- | --- |
| DNN_b^{CM} | 85.13 | 80.87 | 95.76 | 71.13 | 73.21 |
| DNN_b^{CF} | 31.22 | 26.91 | 29.45 | 31.84 | 26.1 |
| DNN_b^{CM+CF} | 27.29 | 18.40 | 18.89 | 21.83 | 23.57 |
| PINV_z | 11.77 | 12 | 12.82 | 29.59 | 42.97 |
| PS_z^{CM} | 28.48 | 29.09 | 29.94 | 31.51 | 32.02 |
| PS_z^{CF} | 28.48 | 26.23 | 29.94 | 29.27 | 29.25 |
| PS_z^{CM+CF} | 28.48 | 28.53 | 29.25 | 28.78 | 27.09 |
| PINV_D | - | 27.96 | 28.53 | 29.96 | 29.82 |
| PS_D^{CF} | - | 27.71 | 28.13 | 29.5 | 28.7 |


The DNN method is better at higher truncation, because the histogram mismatch is smaller for the FEMALE data.


RMSE with mismatched test data: MALE+FEMALE

To evaluate the performance of the algorithm in a general unseen scenario, we evaluate the gender-independent model of each method (DNN, PINV and PS). The average RMSE on the unseen KEELE database is shown below.

| Model \ truncation (AVG) | 0 | 1 | 5 | 10 | 13 |
| --- | --- | --- | --- | --- | --- |
| DNN_b^{CM+CF} | 29.21 | 30.91 | 30.23 | 30.18 | 30.45 |
| PINV_z | 37.48 | 37.70 | 41.27 | 43.03 | 43.15 |
| PS_z^{CM+CF} | 23.99 | 23.04 | 23.34 | 22.83 | 22.39 |


Pitch prediction with the sparsity constraint outperforms the other two methods on general unseen data with unknown gender.
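The AVG values are consistent with a plain mean of the KM and KF RMSEs of the same model at each truncation. The short check below is a sketch that uses only numbers copied from the KM, KF and AVG tables; the variable names are chosen here for illustration, and the averaging rule itself is an inference from the numbers rather than something the slides state.

```python
# RMSE of the three gender-independent models at truncations 0, 1, 5, 10, 13,
# copied from the KM (male) and KF (female) mismatched-test tables.
km = {
    "DNN_b^{CM+CF}": [31.13, 43.42, 41.58, 38.54, 37.33],
    "PINV_z": [63.2, 63.4, 69.73, 56.47, 43.33],
    "PS_z^{CM+CF}": [19.51, 17.56, 17.43, 16.88, 17.70],
}
kf = {
    "DNN_b^{CM+CF}": [27.29, 18.40, 18.89, 21.83, 23.57],
    "PINV_z": [11.77, 12, 12.82, 29.59, 42.97],
    "PS_z^{CM+CF}": [28.48, 28.53, 29.25, 28.78, 27.09],
}
# Values reported in the AVG table.
reported_avg = {
    "DNN_b^{CM+CF}": [29.21, 30.91, 30.23, 30.18, 30.45],
    "PINV_z": [37.48, 37.70, 41.27, 43.03, 43.15],
    "PS_z^{CM+CF}": [23.99, 23.04, 23.34, 22.83, 22.39],
}

def mean_of_km_kf(model):
    """Unweighted mean of the male and female RMSE at each truncation."""
    return [(m + f) / 2 for m, f in zip(km[model], kf[model])]

# Largest absolute gap between the recomputed mean and the reported average.
max_gap = max(
    abs(mean - rep)
    for model in reported_avg
    for mean, rep in zip(mean_of_km_kf(model), reported_avg[model])
)
print(f"largest gap: {max_gap:.3f}")
```

The largest gap stays below 0.01, which is what rounding to two decimals would produce.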


Conclusion and future work


We proposed a three-step method to estimate pitch from MFCC vectors.

We showed that the sparsity constraint helps in recovering the pitch value more accurately for MALE subjects and generalizes well across databases.

It might be possible to train a DNN with many speakers to get a better model that generalizes well on unseen test cases. However, obtaining data from many speakers with EGG could be challenging.

Future work may include:

Imposing the periodicity constraint along with the sparsity constraint on the spectrum.

Reconstruction of speech using the estimated pitch, and evaluation of ASR performance and the naturalness of the synthesized speech.


THANK YOU


Experimental setup: Deep neural network

1 The structure of DNNs is defined recursively on the layer index $l$. The input vector, $z_l \in \mathbb{R}^{d_1}$, is mapped to the representation vector $z_{l+1} \in \mathbb{R}^{d_2}$ through an activation function $f_l$ as follows:

$$z_{l+1} = f_l(W_l z_l + b_l), \quad 0 \le l \le L-1 \tag{5}$$

where

$$f_l(x) = \begin{cases} \tanh(x), & 0 \le l \le L-2 \\ x, & l = L-1. \end{cases}$$

$d_1$, $d_2$ are the input and output dimensions of the $l$th layer. $W_l$ and $b_l$ are the parameters of the network. These parameters are estimated by backpropagation and stochastic gradient descent.
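The recursion in equation (5) amounts to a short loop. The NumPy sketch below is an illustration only: the function name `dnn_forward` and the list-of-arrays layout for the parameters are assumptions, not the authors' implementation.

```python
import numpy as np

def dnn_forward(z0, weights, biases):
    """Apply z_{l+1} = f_l(W_l z_l + b_l) for l = 0, ..., L-1.

    weights[l] has shape (d2, d1) and biases[l] has shape (d2,) for layer l.
    Hidden layers (l <= L-2) use tanh; the last layer (l = L-1) is linear.
    """
    z = np.asarray(z0, dtype=float)
    L = len(weights)
    for l, (W, b) in enumerate(zip(weights, biases)):
        a = W @ z + b
        z = np.tanh(a) if l <= L - 2 else a
    return z
```

The linear last layer lets the network output unbounded real values, which suits a regression target.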


1 The DNNs for both DNN_DCT and DNN_b have the same architecture and training procedure, except for the number of hidden units in each layer. We use a 4-layer network with 256 units in each layer for DNN_b and 512 units for DNN_DCT.

2 The input data is normalized to zero mean and unit variance.

3 The network is initialized using Glorot initialization.

4 Training is performed using stochastic gradient descent with a batch size of 256 and a momentum of 0.9. 20% of the training data is used to monitor the validation loss at each epoch, and the weight updates are stopped when there is no improvement in the validation loss.
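The individual pieces of this training setup can be sketched in NumPy as follows. Several details are assumptions made for illustration and are not given in the slides: the uniform variant of Glorot initialization, the learning rate, the classical form of the momentum update, and stopping at the first epoch without improvement. All function names are hypothetical.

```python
import numpy as np

def normalize(x, mean=None, std=None):
    """Scale each feature to zero mean and unit variance.

    Statistics are computed from x unless they are passed in (for example,
    training-set statistics reused on validation or test data).
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0) if mean is None else mean
    std = x.std(axis=0) if std is None else std
    std = np.where(std == 0, 1.0, std)  # guard against constant features
    return (x - mean) / std, mean, std

def glorot_uniform(fan_in, fan_out, rng):
    """Glorot (Xavier) uniform initialization; the uniform variant is assumed."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_out, fan_in))

def momentum_step(param, grad, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update; lr is a placeholder value."""
    velocity = momentum * velocity - lr * grad
    return param + velocity, velocity

def stop_epoch(val_losses):
    """Index of the first epoch whose validation loss does not improve on
    the best loss seen so far, or None if the loss always improves."""
    best = float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss >= best:
            return epoch
        best = loss
    return None
```

In a full training loop, `momentum_step` would be applied to every weight matrix and bias after each batch of 256 examples, and `stop_epoch` would be evaluated on the loss of the held-out 20% after each epoch.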
