Upload
kristin-floyd
View
221
Download
0
Embed Size (px)
DESCRIPTION
By Sarita Jondhale 3 Generalizations of Filter Bank Analyzer Postprocessor Operations Temporal smoothing of sequential filter bank output vectors. Frequency smoothing of individual filter bank output vectors. Normalization of each filter bank output vector Thresholding and/or quantization of the filter-bank outputs vectors Principal components analysis of the filter bank output vector. The purpose of postprocessor is to clean up the output so as to best represent the spectral information in the speech signal. sp
Citation preview
By Sarita Jondhale 1
Signal preprocessor: “conditions” the speech signal s(n) to newform which is more suitable for the analysis
Postprocessor: operate on the ouptut x(m) to give the processedoutput that are more suitable for recognition
Generalizations of Filter Bank Analyzer
)(ˆ mx
)(ˆ ns
By Sarita Jondhale 2
Generalizations of Filter Bank Analyzer
Preprocessor Operations Signal preemphasis: higher frequencies are increased in
amplitude Noise elimination Signal enhancement (to make the formant peaks more
prominent)
The purpose of pre processor is to make the speech signal as clean as possible
By Sarita Jondhale 3
Generalizations of Filter Bank AnalyzerPostprocessor Operations Temporal smoothing of sequential filter bank output vectors. Frequency smoothing of individual filter bank output vectors. Normalization of each filter bank output vector Thresholding and/or quantization of the filter-bank outputs
vectors Principal components analysis of the filter bank output vector.
The purpose of postprocessor is to clean up the output so as to best represent the spectral information in the speech signal.
sp
By Sarita Jondhale 4
Spectral Analysis
Two methods:
The Filter Bank spectrum
The Linear Predictive coding (LPC)
By Sarita Jondhale 5
Linear Predictive Coding
Estimating the parameters of the current speech sample by using the parameter values from linear combinations of past speech samples.
By Sarita Jondhale 8
Linear Predictive Coding (or “LPC”) is a method of predicting a sample of a speech signal based on several previous samples.
We can use the LPC coefficients to separate a speech signal into two parts: the transfer function (which contains the vocal quality) and the excitation (which contains the pitch and the sound).
By Sarita Jondhale 9
We can predict that the nth sample in a sequence of speech samples is represented by the weighted sum of the p previous samples:
P
k
k knSas1
)(
By Sarita Jondhale 10
The LPC Model
The speech sample at time n can be approximated as a linear combination of the past p speech samples,
.excitation theofgain theisG
and excitation is u(n) where)()()(
frame analysisspeech over theconstant assumed area,......,a,a tscoefficien thewhere
)(.....)2()1()(
1
p21
21
P
ii
p
nGuinsans
pnsansansans
By Sarita Jondhale 11
The number of samples (p) is referred to as the “order” of the LPC.
As p approaches infinity, we should be able to predict the nth sample exactly.
However, p is usually on the order of ten to twenty, where it can provide an accurate enough representation with a limited cost of computation.
The weights on the previous samples (ak) are chosen in order to minimize the squared error between the real sample and its predicted value.
Thus, we want the error signal e(n), which is sometimes referred to as the LPC residual, to be as small as possible
By Sarita Jondhale 18
Error signal
It is defined as difference between predicted sample and actual signal
S[n] is actual signal S[n] is predicted signal
p
kns1k
k ][a-s[n]s[n]-s[n]e[n]
By Sarita Jondhale 25
LPC Analysis Equations
Two methods of defining the range of speech (m) The autocorrelation method The covariance method
By Sarita Jondhale 26
The Autocorrelation Method
otherwise 01-N0 ),().(
{)(
mmwmns
msn
The speech signal s(m+n) is multiplied by a finite window, w(m)Which is zero outside the range 0≤m ≤N-1The purpose of window in the above equation is to taper the signal near m=0 and near m=N-1 so as to minimize the errors at section boundaries
By Sarita Jondhale 27
The Autocorrelation Method
Upper panel shows the running speech waveform s(m)
Middle panel shows the weighted section of speech
Bottom panel shows the resulting error signal en(m), based on optimum selection of predictor parameters
By Sarita Jondhale 30
The Autocorrelation Method
For m<0 , the prediction error i.e en(m)=0 since sn(m)=0 for all m<0
For m>N-1+p there is no prediction error because sn(m)=0 for all m>N-1
By Sarita Jondhale 38
Examples of LPC AnalysisThe figure shows the effect of LPC prediction order, p, on the prediction error En, for both voiced and unvoiced speech.•For small values of p (1-4) a sharp decrease in prediction error•As p increases error decreases much more slowly•The prediction error for unvoiced speech for a given value of p, is significantly higher than for voiced speech.•i.e. the unvoiced speech is less linearly predictable than voiced speech.
By Sarita Jondhale 40
LPC Processor for Speech Recognition
By Sarita Jondhale 42
LPC Processor for Speech Recognition1. Preemphasis: the digital system (First order FIR filter)
used in the preemphasizer is either fixed or slowly adaptive to average transmission conditions, noise background.The output is related to the input by
sp
By Sarita Jondhale 43
LPC Processor for Speech Recognition
2. Frame Blocking: In this step the preemphasized speech signal is blocked into frames of N samples, with adjacent frames being separated by M samples.
By Sarita Jondhale 44
Typical LPC Analysis Parameters N: number of samples in analysis frame. M: number of samples shift between analysis frames. p: LPC analysis order. Q: dimension of LPC derived cepstral vector. K: number of frames over which cepstral time derivatives are
computed.
By Sarita Jondhale 45
LPC Processor for Speech Recognition
• In the above figure M=(1/3)N• First frame consist of first N speech samples.• The second frame begins M samples after the first frame
and overlaps it by N-M samples• Similarly third frame begins 2M samples after the first frame (or M
samples after the first samples) and overlaps it by N-2M samples• This process continues until all the speech is accounted for within
one or more frames.
By Sarita Jondhale 46
LPC Processor for Speech Recognition
n.recognitiospeech for analysis LPC practicalany in eintolarabl issituation This 4.
analysis). from omitted isspeech more (as increases M as increasesmagnitude whoescomponent noisy acontain willspecral LPC
lost. totally be willsignalspeech theof some thereforeframes,adjacent between overlap no be will thereN,M If 3.
smooth. quite be willframe toframe from estimates spectral LPC then N,M If 2.
frame. toframe from correlated be willestimates spectral LPC resulting and overlap, framesadjacent then N,M If 1.
By Sarita Jondhale 47
LPC Processor for Speech Recognition3. Windowing: the next step in the processing is to window
each individual frame so as to minimize the signal discontinuities at the beginning and end of each frame (same as short time spectrum in frequency domain)
i.e. we use the window to taper the signal to zero at the beginning and end of each frame
10 ,)()()(~signal theis windowingofresult The10 window, theis )(
signalspeech entire e within thframes ofnumber totalL1......,2,1,0 respeech whe of frame theis )( th
Nnnwnn
Nnnw
Llln
ll
l
By Sarita Jondhale 48
LPC Processor for Speech Recognition A “typical” window used for the autocorrelation method of
LPC is the Hamming window
10 ,1
2cos46.054.0)(
Nn
Nnnw
By Sarita Jondhale 49
LPC Processor for Speech Recognition4. Autocorrelation Analysis: Each frame of windowed signal
is next autocorrelated to give
used) are 16 to8 from p of (values analysis LPC theoforder theis n value,correlatiohighest theis p
p,0,1,2,....m ,)(~)(~)(1
0
mN
nlll mnxnxmr
By Sarita Jondhale 50
LPC Processor for Speech Recognition
5. LPC Analysis: the next step is the LPC analysis, which converts each frame of P+1 autocorrelations into an “LPC parameter set”.
The set might be The LPC coefficients The reflection (or PARCOR) coefficients The log area ratio coefficients The cepstral coefficients Or any desired transformations of the above sets
The method for converting from autocorrelation coefficients to an LPC parameter set is known as Durbin’s method
By Sarita Jondhale 53
LPC Processor for Speech Recognition The cepstral coefficients are the coefficients of the FT
representation of the log magnitude spectrum. The cepstral coefficients are more robust reliable feature set for
speech recognition than the LPC coefficients, the PARCOR coefficients, or the log area ratio coefficients.
By Sarita Jondhale 54
LPC Processor for Speech Recognition7. Parameter Weighting: because of the sensitivity of the low-
order cepstral coefficients to overall spectral slope and sensitivity of the high-order cepstral coefficients to noise, it is necessary to weight the cepstral coefficients by a tapered window to minimize these sensitivities.
By Sarita Jondhale 57
8. Temporal Cepstral Derivative: the Cepstral representation of the speech spectrum provides a good representation of the local spectral properties of the signal for the given frame
Better representation can be obtained by including the information about temporal Cepstral derivative.
tat timet coefficien cepstral m theis )(c where
)(),(log
thm t
ettcteS
t m
mjmj