8. Cepstral Analysis - Xavier Anguera

8. Cepstral Analysis

(most slides taken from MIT course by Glass and Zue)

Homomorphic filtering

Homomorphic filtering (or homomorphic transformation) is a nonlinear transformation, usually applied in image and speech processing, that converts a signal obtained from the convolution of two original signals into the sum of two signals.

In speech processing it can be applied to separate the filter from the excitation in the source-filter model.

The cepstrum is one such homomorphic transformation that allows us to perform this separation. It is an alternative to the linear prediction analysis seen before.

$$x[n] = e[n] * h[n] \;\rightarrow\; \hat{x}[n] = \hat{e}[n] + \hat{h}[n]$$

$$\hat{x}[n] = D(x[n])$$

$$s[n] = u[n] * h[n]$$


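Basics of cepstral analysis

Cepstral analysis is based on the observation that a convolution in time is a product in the z domain:

$$x[n] = e[n] * h[n] \;\Longleftrightarrow\; X(z) = E(z)\,H(z)$$

By taking the log of X(z), the product becomes a sum:

$$\hat{X}(z) = \log X(z) = \log E(z) + \log H(z)$$

If the complex log is unique and $\hat{X}(z)$ is a valid z-transform then, by applying the inverse z-transform $Z^{-1}$,

$$\hat{x}[n] = \hat{e}[n] + \hat{h}[n]$$

the two convolved signals are now additive.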
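The additivity can be checked numerically. The sketch below is only an illustration, assuming NumPy and using the real cepstrum (log magnitude only) computed with a zero-padded FFT. The helper name `real_cepstrum` and both test signals are illustrative choices, not taken from the slides.

```python
import numpy as np

def real_cepstrum(x, n_fft):
    """Real cepstrum: inverse DFT of the log-magnitude of the DFT (illustrative helper)."""
    spectrum = np.fft.fft(x, n_fft)
    return np.fft.ifft(np.log(np.abs(spectrum))).real

rng = np.random.default_rng(0)
e = rng.standard_normal(64)          # stand-in "excitation" (assumed, random)
h = 0.9 ** np.arange(32)             # stand-in "filter" impulse response (assumed)
x = np.convolve(e, h)                # x[n] = e[n] * h[n], length 64 + 32 - 1 = 95

n_fft = 256                          # >= len(x), so the zero-padded DFTs multiply exactly
c_x = real_cepstrum(x, n_fft)
c_sum = real_cepstrum(e, n_fft) + real_cepstrum(h, n_fft)

# Convolution in time should become a sum in the cepstral domain.
print(np.allclose(c_x, c_sum))
```

The FFT length is chosen to be at least the length of the convolved signal. Otherwise the DFT product would correspond to a circular rather than a linear convolution, and the comparison would no longer be exact.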

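Basics of cepstral analysis (II)

Consider now that we restrict the z-transform of x[n] to the unit circle, i.e. $z = e^{j\omega}$:

$$X(\omega) = |X(\omega)|\, e^{j \arg(X(\omega))}$$

then

$$\hat{X}(\omega) = \log X(\omega) = \log|X(\omega)| + j \arg(X(\omega))$$

This is the complex logarithm of $X(\omega)$.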

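Definition of Cepstrum

The real cepstrum is defined as:

$$c_x[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \log|X(\omega)|\, e^{j\omega n}\, d\omega$$

The magnitude $|X(\omega)|$ is real and non-negative.

And the complex cepstrum:

$$\hat{x}[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left[\log|X(\omega)| + j \arg(X(\omega))\right] e^{j\omega n}\, d\omega$$

where arg() represents the phase. We call it complex because it uses the complex logarithm, not because of the sequence, which can also be real. In fact, the complex cepstrum of a real sequence is also real.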


Guido Geßl 21.01.2004

original definition - real cepstrum

Inverse Fourier transform of the logarithm of the magnitude of the Fourier transform:

$$c_x[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \log|X(\omega)|\, e^{j\omega n}\, d\omega$$

magnitude is real and nonnegative

not invertible as the phase is missing


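more general definition: complex cepstrum

$$\hat{x}[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \log[X(\omega)]\, e^{j\omega n}\, d\omega = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left[\log|X(\omega)| + j \arg(X(\omega))\right] e^{j\omega n}\, d\omega$$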

arg is the continuous (unwrapped) phase function.

The term complex cepstrum refers to the use of the complex logarithm, not to the sequence.

The complex cepstrum of a real sequence is also a real sequence.

Definition of cepstrum (II)

It can be shown that the real cepstrum is the even part of the complex cepstrum:

$$c_x[n] = \frac{\hat{x}[n] + \hat{x}[-n]}{2}$$

In speech processing we generally use the real cepstrum, which is obtained by applying an inverse Fourier transform to the log-magnitude spectrum of the signal.

In fact, the name “cepstrum” comes from reversing the first four letters of the word “spectrum”. Similarly, the variable n in $c_x[n]$ is called “quefrency”, a rearrangement of the letters of “frequency”.
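The even-part relation between the real and the complex cepstrum can be checked numerically. The sketch below assumes NumPy, a DFT-based approximation of both cepstra, and a toy minimum-phase sequence, so that the unwrapped phase contains no linear-phase term. The helper names and the test sequence are illustrative, not from the slides.

```python
import numpy as np

def complex_cepstrum(x, n_fft):
    """Complex cepstrum from log-magnitude plus unwrapped phase.

    Sketch only: assumes the sequence has no linear-phase (pure delay) component.
    """
    spectrum = np.fft.fft(x, n_fft)
    log_spectrum = np.log(np.abs(spectrum)) + 1j * np.unwrap(np.angle(spectrum))
    return np.fft.ifft(log_spectrum).real

def real_cepstrum(x, n_fft):
    """Real cepstrum: inverse DFT of the log-magnitude spectrum."""
    spectrum = np.fft.fft(x, n_fft)
    return np.fft.ifft(np.log(np.abs(spectrum))).real

n_fft = 512
x = np.array([1.0, 0.5])                     # toy minimum-phase sequence (assumed example)

x_hat = complex_cepstrum(x, n_fft)
x_hat_reversed = np.roll(x_hat[::-1], 1)     # x_hat[-n], with indices taken modulo n_fft
even_part = 0.5 * (x_hat + x_hat_reversed)

# The even part of the complex cepstrum should match the real cepstrum.
print(np.allclose(even_part, real_cepstrum(x, n_fft)))
```

For general signals the phase must be unwrapped carefully and any linear-phase component removed first. This is one practical reason the real cepstrum is preferred in speech processing.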


The real cepstrum is the inverse transform of the real part of $\hat{X}(\omega)$ and therefore is equal to the conjugate-symmetric part of $\hat{x}[n]$:

$$c_x[n] = \frac{\hat{x}[n] + \hat{x}^*[-n]}{2}$$

Properties of cepstrums


From this we can derive the following general properties:

1) the complex cepstrum decays at least as fast as 1/|n|
2) it has infinite duration, even if x[n] has finite duration
3) it is real if x[n] is real (poles and zeros are in complex conjugate pairs)

NOTE: from 1) and 2) we see why a finite number of cepstral coefficients is usually used in speech processing (12-20 is sufficient), as very high order coefficients have very small values.
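As a small worked illustration of these properties, take the two-sample sequence $x[n] = \delta[n] + a\,\delta[n-1]$ with real $|a| < 1$. This signal is an assumption chosen for simplicity and is not necessarily the one used in the slides. Expanding the logarithm of its z-transform as a power series gives:

$$X(z) = 1 + a z^{-1}, \qquad \log X(z) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1} a^{n}}{n}\, z^{-n} \quad (|z| > |a|)$$

$$\hat{x}[n] = \begin{cases} \dfrac{(-1)^{n+1} a^{n}}{n}, & n \ge 1 \\ 0, & n \le 0 \end{cases}$$

Here $|\hat{x}[n]| \le 1/|n|$, the cepstrum never becomes exactly zero for $n \ge 1$ although x[n] has only two samples, and it is real because a is real.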


An example

(Using Taylor series expansion and several tricks)

Given the “r” in the denominator, it is an infinite train of deltas that converges to 0


(...)


Computational considerations: using DFT

In digital signals we replace the Fourier transform by the N-point Discrete Fourier Transform (DFT).

Sampling the spectrum causes aliasing in the cepstral domain: the computed cepstrum is the true cepstrum repeated with period N.

When N is well above the number of cepstral coefficients used, which is usually the case, we do not have a problem.
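The effect of the DFT length can be seen on a toy case. The sketch below assumes NumPy and the sequence $x[n] = \delta[n] + a\,\delta[n-1]$, whose real cepstrum is $(-1)^{n+1} a^{n} / (2n)$ for $n \ge 1$. The sequence, the value of a, and the helper name are illustrative assumptions.

```python
import numpy as np

def real_cepstrum_dft(x, n_fft):
    """Real cepstrum computed with an n_fft-point DFT (aliased with period n_fft)."""
    spectrum = np.fft.fft(x, n_fft)
    return np.fft.ifft(np.log(np.abs(spectrum))).real

a = 0.9
x = np.array([1.0, a])                           # toy sequence with a known cepstrum (assumed)

n = np.arange(1, 6)
c_true = (-1.0) ** (n + 1) * a ** n / (2 * n)    # analytic real cepstrum for n = 1..5

# Compare the first coefficients for a short and a long DFT.
err_small = np.max(np.abs(real_cepstrum_dft(x, 16)[1:6] - c_true))
err_large = np.max(np.abs(real_cepstrum_dft(x, 1024)[1:6] - c_true))

print(f"N=16:   max error {err_small:.2e}")
print(f"N=1024: max error {err_large:.2e}")
```

With the short DFT, the slowly decaying tail of the cepstrum folds back onto the low-order coefficients. With the long DFT the folded terms are negligible.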

Cepstral analysis of speech

As pointed out at the beginning, we would like to separate the excitation from the vocal tract filter h[n] by using a homomorphic transformation.

We can do so easily because the filter parameters usually reside in the lower quefrencies, while the excitation parameters have higher quefrencies.

HOMOMORPHIC TRANSFORMATIONS

A homomorphic transformation converts a convolution into a sum:

Consider the problem of recovering a filter's response from a periodic signal (such as a voiced excitation):

The filter response can be recovered if we can separate the output of the homomorphic transformation using a simple filter:

Note that the process of separating the signals is essentially a windowing process. Is this useful for speech processing?
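The windowing step (often called liftering) can be sketched as follows. This is a minimal illustration under stated assumptions: NumPy, a synthetic voiced frame built from an impulse train and a two-pole resonator, and a rectangular lifter. The sampling rate, pitch period, resonance and cutoff `n_cut` are all hypothetical values chosen for the example.

```python
import numpy as np

fs = 8000                            # sampling rate in Hz (assumed)
period = 80                          # pitch period in samples, i.e. 100 Hz (assumed)
n_fft = 1024

# Synthetic voiced frame: impulse train through a two-pole resonator (toy "vocal tract").
excitation = np.zeros(1200)
excitation[::period] = 1.0
r, theta = 0.9, 2 * np.pi * 700 / fs                     # pole radius and angle (700 Hz resonance)
k = np.arange(100)
h = r ** k * np.sin((k + 1) * theta) / np.sin(theta)     # resonator impulse response
frame = np.convolve(excitation, h)[400:800] * np.hamming(400)

log_mag = np.log(np.abs(np.fft.fft(frame, n_fft)) + 1e-12)   # small floor avoids log(0)
cepstrum = np.fft.ifft(log_mag).real

# Low-quefrency lifter: keep |n| < n_cut, with n_cut below the pitch period.
n_cut = 30
lifter = np.zeros(n_fft)
lifter[:n_cut] = 1.0
lifter[-(n_cut - 1):] = 1.0          # mirrored part, so the kept cepstrum stays symmetric

smoothed_log_mag = np.fft.fft(cepstrum * lifter).real             # filter (envelope) part
excitation_log_mag = np.fft.fft(cepstrum * (1.0 - lifter)).real   # excitation part
```

Because the two lifters are complementary, the two log spectra add up to the original log spectrum. The low-quefrency part varies slowly with frequency, while the high-quefrency part carries the harmonic ripple.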


Cepstral analysis of speech


[Figure: cepstrum of a generic voiced signal. Panels: magnitude spectrum, cepstrum]

•  Contributions to the cepstrum due to periodic excitation occur at integer multiples of the fundamental period. NOTE that for children and high-pitched female voices we might have a problem: the fundamental period is short, so the first excitation peak can fall in the low-quefrency region occupied by the filter.

•  Contributions due to parameters usually modeled by the filter concentrate in the low-quefrency region and decay quickly with n.

Cepstral analysis of speech (voiced signals)

SOURCE-FILTER SEPARATION VIA THE CEPSTRUM

[Figure: an example of source-filter separation using voiced speech. Panels: (a) Windowed Signal, (b) Log Spectrum, (c) Filtered Cepstrum (n < N), (d) Smoothed Log Spectrum, (e) Excitation Signal, (f) Log Spectrum (high freq.)]



The reason this works is simple: the fundamental frequency for the speaker produces a peak in the cepstrum sequence that is far removed (n > N) from the influence of the vocal tract (n < N). You can also demonstrate this using an autocorrelation function. What happens for an extremely high-pitched female or child?
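The position of the excitation peak can be used as a rough pitch estimate. The sketch below assumes NumPy and a synthetic voiced frame (impulse train through a two-pole resonator). The sampling rate, true period, resonance and the 60-400 Hz search range are hypothetical values, and real speech would need more care.

```python
import numpy as np

fs = 8000                            # sampling rate in Hz (assumed)
period = 80                          # true pitch period in samples, 100 Hz (assumed)
n_fft = 1024

# Synthetic voiced frame: impulse train through a two-pole resonator.
excitation = np.zeros(1200)
excitation[::period] = 1.0
r, theta = 0.9, 2 * np.pi * 700 / fs
k = np.arange(100)
h = r ** k * np.sin((k + 1) * theta) / np.sin(theta)
frame = np.convolve(excitation, h)[400:800] * np.hamming(400)

log_mag = np.log(np.abs(np.fft.fft(frame, n_fft)) + 1e-12)   # small floor avoids log(0)
cepstrum = np.fft.ifft(log_mag).real

# Look for the excitation peak only in a plausible pitch range (60-400 Hz assumed),
# which keeps the search away from the low quefrencies dominated by the filter.
n_min, n_max = fs // 400, fs // 60
peak = n_min + int(np.argmax(cepstrum[n_min:n_max + 1]))

print(f"estimated period: {peak} samples, f0 = {fs / peak:.1f} Hz")
```

For this synthetic frame the peak is expected at, or within a sample of, the true period. For a very high-pitched voice the period would move toward the lower end of the search range and closer to the filter contribution, which is the difficulty raised in the question above.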

Cepstral analysis of speech (unvoiced signals)

SOURCE-FILTER SEPARATION VIA THE CEPSTRUM

[Figure: an example of source-filter separation using unvoiced speech. Panels: (a) Windowed Signal, (b) Log Spectrum]





Cepstral analysis of vowel (rectangular window)


Cepstral analysis of vowel (tapering window)


Cepstral analysis of fricative (rectangular window)


Cepstral analysis of fricative (tapering window)


Use in Speech Recognition


Statistical properties of cepstral coefficients


Mel-frequency cepstral representation


MFCC computation diagram


Mel-filter bank processing
