By Sarita Jondhale

Signal preprocessor: “conditions” the speech signal s(n) into a new form that is more suitable for analysis.

Postprocessor: operates on the output x(m) to give a processed output that is more suitable for recognition.

Generalizations of Filter Bank Analyzer

(Figure labels: ŝ(n), the preprocessed speech signal; x̂(m), the postprocessed output.)

Preprocessor Operations
• Signal preemphasis: higher frequencies are increased in amplitude
• Noise elimination
• Signal enhancement (to make the formant peaks more prominent)

The purpose of the preprocessor is to make the speech signal as clean as possible.


Postprocessor Operations
• Temporal smoothing of sequential filter bank output vectors
• Frequency smoothing of individual filter bank output vectors
• Normalization of each filter bank output vector
• Thresholding and/or quantization of the filter bank output vectors
• Principal components analysis of the filter bank output vector

The purpose of the postprocessor is to clean up the output so as to best represent the spectral information in the speech signal.



Spectral Analysis

Two methods:

The Filter Bank spectrum

The Linear Predictive coding (LPC)


Linear Predictive Coding

LPC estimates the current speech sample from a linear combination of past speech samples.


Linear Predictive Coding (or “LPC”) is a method of predicting a sample of a speech signal based on several previous samples.

We can use the LPC coefficients to separate a speech signal into two parts: the transfer function (which contains the vocal quality) and the excitation (which contains the pitch and the sound).


We can predict that the nth sample in a sequence of speech samples is represented by the weighted sum of the p previous samples:

\[ \hat{s}(n) = \sum_{k=1}^{p} a_k\, s(n-k) \]


The LPC Model

The speech sample at time n can be approximated as a linear combination of the past p speech samples,

\[ s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \cdots + a_p s(n-p) \]

where the coefficients a_1, a_2, ..., a_p are assumed constant over the speech analysis frame. Including an excitation term gives

\[ s(n) = \sum_{i=1}^{p} a_i\, s(n-i) + G\,u(n) \]

where u(n) is the excitation and G is the gain of the excitation.


The number of samples (p) is referred to as the “order” of the LPC.

As p approaches infinity, we should be able to predict the nth sample exactly.

However, p is usually on the order of ten to twenty, where it can provide an accurate enough representation with a limited cost of computation.

The weights on the previous samples (a_k) are chosen to minimize the squared error between the real sample and its predicted value.

Thus, we want the error signal e(n), which is sometimes referred to as the LPC residual, to be as small as possible.


Error signal

It is defined as the difference between the actual signal and the predicted signal:

\[ e(n) = s(n) - \hat{s}(n) = s(n) - \sum_{k=1}^{p} a_k\, s(n-k) \]

where s(n) is the actual signal and ŝ(n) is the predicted signal.
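As an illustrative sketch (not part of the original slides), the prediction and the error signal can be computed directly from their definitions. The function names `lpc_predict` and `lpc_residual` are hypothetical, and samples before the start of the signal are assumed to be zero.

```python
def lpc_predict(s, a):
    """Predicted samples: s_hat(n) = sum over k=1..p of a[k-1] * s(n-k).

    Assumption: samples before the start of the signal are zero.
    """
    p = len(a)
    s_hat = []
    for n in range(len(s)):
        acc = 0.0
        for k in range(1, p + 1):
            if n - k >= 0:
                acc += a[k - 1] * s[n - k]
        s_hat.append(acc)
    return s_hat


def lpc_residual(s, a):
    """Error signal e(n) = s(n) - s_hat(n)."""
    return [actual - predicted for actual, predicted in zip(s, lpc_predict(s, a))]
```

For a geometrically decaying sequence such as 1, 0.5, 0.25, 0.125, a single coefficient a_1 = 0.5 predicts every sample after the first exactly, so the residual is zero from the second sample on.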


LPC Analysis Equations

Two methods of defining the range of speech samples (m) used in the analysis:
• The autocorrelation method
• The covariance method


The Autocorrelation Method

The speech signal s(m+n) is multiplied by a finite window, w(m), which is zero outside the range 0 ≤ m ≤ N−1:

\[ s_n(m) = \begin{cases} s(m+n)\,w(m), & 0 \le m \le N-1 \\ 0, & \text{otherwise} \end{cases} \]

The purpose of the window in the above equation is to taper the signal near m = 0 and near m = N−1 so as to minimize the errors at section boundaries.



Upper panel shows the running speech waveform s(m)

Middle panel shows the weighted section of speech

Bottom panel shows the resulting error signal e_n(m), based on optimum selection of the predictor parameters.

For m < 0, the prediction error e_n(m) is zero, since s_n(m) = 0 for all m < 0.

For m > N−1+p there is no prediction error, because s_n(m) = 0 for all m > N−1.


Examples of LPC Analysis

The figure shows the effect of LPC prediction order, p, on the prediction error E_n, for both voiced and unvoiced speech.
• For small values of p (1-4) there is a sharp decrease in prediction error.
• As p increases further, the error decreases much more slowly.
• The prediction error for unvoiced speech, for a given value of p, is significantly higher than for voiced speech, i.e. unvoiced speech is less linearly predictable than voiced speech.


LPC Processor for Speech Recognition


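1. Preemphasis: the digital system (a first-order FIR filter) used in the preemphasizer is either fixed or slowly adaptive (e.g. to average transmission conditions or noise backgrounds). The output is related to the input by a first-order difference equation, commonly written as

\[ \tilde{s}(n) = s(n) - a\, s(n-1) \]

where a is the preemphasis coefficient (a value close to 1).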
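A minimal sketch of a fixed first-order FIR preemphasizer follows. It assumes the common form in which each output sample is the input sample minus a times the previous input sample; the function name `preemphasize` and the default a = 0.95 are assumptions chosen for illustration, not values given in the slides.

```python
def preemphasize(s, a=0.95):
    """First-order FIR preemphasis: out(n) = s(n) - a * s(n-1).

    Assumptions: a is fixed (not adaptive) and s(-1) is taken as zero.
    """
    out = []
    previous = 0.0
    for sample in s:
        out.append(sample - a * previous)
        previous = sample
    return out
```

A constant (zero-frequency) input is strongly attenuated after the first sample, while rapidly alternating input is boosted, which is the high-frequency emphasis described above.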



2. Frame Blocking: In this step the preemphasized speech signal is blocked into frames of N samples, with adjacent frames being separated by M samples.


Typical LPC Analysis Parameters
• N: number of samples in the analysis frame.
• M: number of samples of shift between analysis frames.
• p: LPC analysis order.
• Q: dimension of the LPC-derived cepstral vector.
• K: number of frames over which cepstral time derivatives are computed.



• In the above figure M = (1/3)N.
• The first frame consists of the first N speech samples.
• The second frame begins M samples after the first frame and overlaps it by N−M samples.
• Similarly, the third frame begins 2M samples after the first frame (or M samples after the second frame) and overlaps the first frame by N−2M samples.
• This process continues until all the speech is accounted for within one or more frames.
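The frame-blocking step can be sketched as follows. The function name `block_frames` is hypothetical, and this version keeps only complete frames of N samples; dropping any trailing partial frame is an assumption, since the slides do not say how the end of the signal is handled.

```python
def block_frames(s, N, M):
    """Split s into frames of N samples, with consecutive frames M samples apart.

    Frame l holds samples s[l*M : l*M + N].
    Assumption: a trailing partial frame is discarded.
    """
    if N <= 0 or M <= 0:
        raise ValueError("N and M must be positive")
    frames = []
    start = 0
    while start + N <= len(s):
        frames.append(s[start:start + N])
        start += M
    return frames
```

With N = 6 and M = 2 (so M = N/3, as in the figure), a 10-sample signal yields three frames, and each frame shares N − M = 4 samples with the frame before it.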



1. If M < N, then adjacent frames overlap, and the resulting LPC spectral estimates will be correlated from frame to frame.

2. If M << N, then LPC spectral estimates from frame to frame will be quite smooth.

3. If M > N, there will be no overlap between adjacent frames; therefore some of the speech signal will be totally lost. The LPC spectral estimates will contain a noisy component whose magnitude increases as M increases (as more speech is omitted from the analysis).

4. This situation is intolerable in any practical LPC analysis for speech recognition.


3. Windowing: the next step in the processing is to window each individual frame so as to minimize the signal discontinuities at the beginning and end of each frame (the same idea as in short-time spectrum analysis in the frequency domain), i.e. we use the window to taper the signal to zero at the beginning and end of each frame.

If x_l(n) is the l-th frame of speech, where l = 0, 1, 2, ..., L−1 and L is the total number of frames within the entire speech signal, and w(n) is the window, 0 ≤ n ≤ N−1, then the result of windowing is the signal

\[ \tilde{x}_l(n) = x_l(n)\,w(n), \quad 0 \le n \le N-1 \]


A “typical” window used for the autocorrelation method of LPC is the Hamming window:

\[ w(n) = 0.54 - 0.46 \cos\!\left( \frac{2\pi n}{N-1} \right), \quad 0 \le n \le N-1 \]
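The Hamming window and its application to one frame can be sketched in a few lines. The function names `hamming` and `apply_window` are hypothetical; the window values follow the standard 0.54 − 0.46 cos(2πn/(N−1)) definition.

```python
import math


def hamming(N):
    """Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)) for n = 0..N-1."""
    if N < 2:
        raise ValueError("N must be at least 2")
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (N - 1)) for n in range(N)]


def apply_window(frame):
    """Multiply a frame sample-by-sample by a Hamming window of the same length."""
    window = hamming(len(frame))
    return [x * w for x, w in zip(frame, window)]
```

The window is symmetric, close to 1 at the centre of the frame and 0.08 at both ends, so the frame edges are strongly attenuated rather than set exactly to zero.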


4. Autocorrelation Analysis: each frame of windowed signal is next autocorrelated to give

\[ r_l(m) = \sum_{n=0}^{N-1-m} \tilde{x}_l(n)\,\tilde{x}_l(n+m), \quad m = 0, 1, 2, \ldots, p \]

where p, the highest autocorrelation lag, is the order of the LPC analysis (values of p from 8 to 16 are used).
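The autocorrelation of one windowed frame can be sketched as below. The function name `autocorrelate` is hypothetical; the sum for lag m runs over n = 0, ..., N−1−m, with p the LPC order.

```python
def autocorrelate(x, p):
    """Return [r(0), r(1), ..., r(p)] for one windowed frame x.

    r(m) = sum over n = 0..N-1-m of x(n) * x(n+m).
    """
    N = len(x)
    if p >= N:
        raise ValueError("p must be smaller than the frame length")
    return [sum(x[n] * x[n + m] for n in range(N - m)) for m in range(p + 1)]
```

The zeroth value r(0) is the energy of the frame, and the result always has p + 1 entries.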



5. LPC Analysis: the next step is the LPC analysis, which converts each frame of p+1 autocorrelations into an “LPC parameter set”.

The set might be:
• The LPC coefficients
• The reflection (or PARCOR) coefficients
• The log area ratio coefficients
• The cepstral coefficients
• Or any desired transformation of the above sets

The method for converting from autocorrelation coefficients to an LPC parameter set is known as Durbin’s method.
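Durbin’s method is usually stated as the Levinson-Durbin recursion, sketched below. The function name `levinson_durbin` and the returned tuple layout are choices made for this illustration; the sign convention assumes the predictor s(n) ≈ a_1 s(n−1) + ... + a_p s(n−p).

```python
def levinson_durbin(r, p):
    """Solve for LPC coefficients from autocorrelations r[0..p].

    Returns (a, k, E): a = [a_1..a_p] predictor coefficients,
    k = [k_1..k_p] reflection (PARCOR) coefficients,
    E = final prediction error energy.
    """
    if len(r) < p + 1:
        raise ValueError("need at least p + 1 autocorrelation values")
    if r[0] <= 0:
        raise ValueError("r[0] must be positive (frame has no energy)")
    a = [0.0] * (p + 1)  # a[i] holds a_i; a[0] is unused
    k = [0.0] * (p + 1)
    E = float(r[0])
    for i in range(1, p + 1):
        # Part of r(i) that the order-(i-1) predictor fails to explain
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k[i] = acc / E
        updated = a[:]
        updated[i] = k[i]
        for j in range(1, i):
            updated[j] = a[j] - k[i] * a[i - j]
        a = updated
        E *= 1.0 - k[i] * k[i]
    return a[1:], k[1:], E
```

For autocorrelations of the form 1, 0.5, 0.25 (a first-order process), the recursion returns a_1 = 0.5 and a_2 = 0: the second coefficient adds nothing because the first already captures all the linear predictability.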


6. LPC Parameter Conversion to Cepstral Coefficients: the cepstral coefficients are the coefficients of the Fourier transform representation of the log magnitude spectrum. They are a more robust and reliable feature set for speech recognition than the LPC coefficients, the PARCOR coefficients, or the log area ratio coefficients.
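The slides do not show how cepstral coefficients are obtained from the LPC coefficients. One standard route, given here as a sketch under that caveat, is the recursion c_m = a_m + Σ_{k=1}^{m−1} (k/m) c_k a_{m−k}, with a_m taken as zero for m > p. The function name `lpc_to_cepstrum` is hypothetical, the gain term c_0 is omitted, and the predictor sign convention s(n) ≈ Σ a_k s(n−k) is assumed.

```python
def lpc_to_cepstrum(a, Q):
    """Convert LPC coefficients a = [a_1..a_p] to cepstral coefficients c_1..c_Q.

    Recursion: c_m = a_m + sum over k=1..m-1 of (k/m) * c_k * a_(m-k),
    where a_j is treated as zero for j > p. The gain term c_0 is not computed.
    """
    p = len(a)
    c = [0.0] * (Q + 1)  # c[m] holds c_m; c[0] is unused here
    for m in range(1, Q + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(1, m):
            if m - k <= p:
                acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c[1:]
```

For a single coefficient a_1 = 0.5 the recursion gives c_m = 0.5^m / m, which matches the power-series expansion of −ln(1 − 0.5 z^{-1}).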


7. Parameter Weighting: because of the sensitivity of the low-order cepstral coefficients to overall spectral slope and the sensitivity of the high-order cepstral coefficients to noise, it is necessary to weight the cepstral coefficients by a tapered window to minimize these sensitivities.
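The slides do not name a specific tapered window. A commonly used choice, assumed here purely for illustration, is the raised-sine window w(m) = 1 + (Q/2) sin(πm/Q) for m = 1, ..., Q, which de-emphasizes both the lowest-order and the highest-order coefficients relative to the middle ones. The function names below are hypothetical.

```python
import math


def lifter_weights(Q):
    """Raised-sine cepstral window w(m) = 1 + (Q/2)*sin(pi*m/Q), m = 1..Q.

    Assumption: this particular window shape is not specified in the slides.
    """
    if Q < 1:
        raise ValueError("Q must be at least 1")
    return [1.0 + (Q / 2.0) * math.sin(math.pi * m / Q) for m in range(1, Q + 1)]


def weight_cepstrum(c):
    """Weight cepstral coefficients c = [c_1..c_Q] by the tapered window."""
    weights = lifter_weights(len(c))
    return [cm * wm for cm, wm in zip(c, weights)]
```

With Q = 12 the weight peaks at 7 for m = 6 and falls back to about 1 at m = 12.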


8. Temporal Cepstral Derivative: the cepstral representation of the speech spectrum provides a good representation of the local spectral properties of the signal for the given frame.

A better representation can be obtained by also including information about the temporal derivative of the cepstrum.

\[ \log \left| S(e^{j\omega}, t) \right| = \sum_{m=-\infty}^{\infty} c_m(t)\, e^{-j\omega m} \]

where c_m(t) is the m-th cepstral coefficient at time t.
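Because c_m(t) is only available at discrete frame times, the time derivative has to be approximated. One common approximation, assumed here since the slides do not give one, is a least-squares slope over 2K+1 neighbouring frames: Δc_m(t) ≈ Σ_{k=−K}^{K} k c_m(t+k) / Σ_{k=−K}^{K} k². The function name `delta_cepstrum` is hypothetical, and repeating the first and last frames at the edges is a further assumption.

```python
def delta_cepstrum(frames, K=2):
    """Approximate the temporal derivative of each cepstral coefficient.

    frames: list of cepstral vectors, one per frame (all the same length).
    Uses a least-squares slope over frames t-K..t+K.
    Assumption: frame indices outside the signal are clamped to the edges.
    """
    if K < 1:
        raise ValueError("K must be at least 1")
    T = len(frames)
    norm = float(sum(k * k for k in range(-K, K + 1)))
    deltas = []
    for t in range(T):
        slope = [0.0] * len(frames[t])
        for k in range(-K, K + 1):
            idx = min(max(t + k, 0), T - 1)
            for m, value in enumerate(frames[idx]):
                slope[m] += k * value
        deltas.append([x / norm for x in slope])
    return deltas
```

For a coefficient that grows by exactly one unit per frame, the estimate away from the edges is 1, as expected of a slope; near the edges the clamping biases it towards zero.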
