Alternative Algorithmic Methods in the
Acoustical Noise and Function Testing
(alternatives to the classical Fourier Transformation)
Christoph Lauer – www.christoph-lauer.de
Alternative Signal Analysis Algorithms and Methods
Major:
• "Linear Predictive Coding" (LPC) and allied techniques
• Algorithms and statistical methods from "Natural Language Processing" (NLP)
Minor:
• Wavelet Transformation
• Jitter Analysis
• Autocorrelation-based "information content" extraction
• Magnitude Spectrum
• Boxcar Smoothing, adjustable logarithmization, various normalizations, Zero Crossing Rate, Polygonal Chain...
Linear Predictive Coding
• What is LPC
• How to extract the LPC parameters
• The LPC Error and the application in the acoustical function test technique
• Alternative methods compared with the LPC-Error method
• Spectral Analysis via LPC
What is LPC?
• An "autoregressive Gaussian process" can be described with the linear prediction model.
• Future samples are predicted with the LPC coefficients.
• Generative LP parameters.
• LPC gets its name from the fact that it predicts the current sample as a linear combination of its past p samples.
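Written out, the prediction the slide refers to takes the standard LPC form (reconstructed here, since the slide's equation image did not survive extraction):

```latex
\hat{s}[n] = \sum_{k=1}^{p} a_k\, s[n-k]
```

i.e. the current sample is estimated as a weighted sum of the past p samples, with the a_k as the LP coefficients.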
How to estimate the predictor coefficients and predict the future
• Coefficient estimation: Starting from a given set of input samples, we extract the coefficients that minimize the sum of the squared error. A complex method; code adapted from the "Numerical Recipes". A standard Levinson-Durbin recursion can be used to solve the Yule-Walker equations for the LP coefficients.
• Future prediction: Given the LP coefficients, an IIR (all-pole) filter predicts the future samples.
The LPC Error
• The forward prediction error for the p-th order prediction can be written as follows.
• The prediction error is the difference between the predicted future samples and the real samples.
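In the standard formulation (reconstructed here, since the slide's equation image did not survive extraction), the forward prediction error is:

```latex
e_p[n] = s[n] - \hat{s}[n] = s[n] - \sum_{k=1}^{p} a_k\, s[n-k]
```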
Application of the LPC Error in Acoustical Noise and Function Testing
• Based on the LPC error equation, we implement a window-shifted version.
• Base parameters of the implementation:
  previous prediction points: samples used to extract the coefficients
  future prediction points: samples to predict into the future
  number of coefficients: the number of coefficients to compute
  window shift: the step size
Results with the LPC-Error Method (defective gear motor)
previous prediction points: 50 ms -> 1250 samples
future prediction points: 10 ms -> 250 samples
number of coefficients: 1249 LP coefficients
window shift: 1 ms -> 250 samples
Advantages and Disadvantages of the LPC Error
• Advantages: Seems to be a very robust method.
• Disadvantages: Silent parts drive the algorithm into artificial wrong predictions, so a pre-cut signal is necessary. Slow at high precision!
• Notes: The number of coefficients cannot be greater than the number of previous samples from which the coefficients are extracted. The best precision is achieved when the number of coefficients is at the maximum possible. Because the prediction is hand-coded, compiler optimizations speed it up by a factor of roughly 5-6.
LPC Error Compared with Alternative Methods
• Starting from the so far unsolved problems in acoustical noise and function testing, we developed two further methods:
1. Autocorrelation-based information content extraction.
2. Tracking micro changes in the frequency domain, called jitter analysis.
Autocorrelation-based "Information Content" Extraction
• We implement our own standard autocorrelation with a variable inner summation loop length M.
• We then generate the autocorrelation result over time.
• The autocorrelation corresponds to the information content (German: Informationsgehalt) over time. Places where a lot of change happens have high values.
Results from the Autocorrelation
Inner summation loop length M = 128 samples
Advantages and Disadvantages of the Autocorrelation Method
• Disadvantages: We see empirically that the autocorrelation does not fit this task best.
• Advantages: This algorithm is very fast.
The LPC Spectrum Envelope
• The predicted future samples can be transformed into the frequency domain.
• The LPC spectrum interpolates an envelope of the power spectrum.
LPC Spectrum Example with 10 and 54 Filter Coefficients
Jitter Analysis based on the LPC Spectrum
• Jitter is defined as variations of the whole signal in the frequency domain.
• We track the peak of the signal in the frequency domain!
How to compute the Jitter
• There is no real specification or calculation rule; our method:
(1) band-pass filter
(2) track the peak frequency
(3) compute the derivative of the peak-frequency function
Results from the Jitter Analysis
previous prediction points: 8192 samples
future prediction points: 1000 samples
number of coefficients: 900 LP coefficients
window shift: 100 samples
lower band border: 3000 Hz
upper band border: 3500 Hz
Advantages and Disadvantages of the Jitter Analysis
• Advantages: Very precise for signals with a constrained small-band character.
• Disadvantages: The signal must be pre-filtered to prevent disruptions. The signal must lie in the selected band. Not so fast.
• Notes: A successful method for the detection of defective motor transmissions.
Three Techniques
LPC Error
Autocorrelation
Jitter Analysis
Example: a defective gear motor
Wavelet Transformation
• The classical FT differs from the WT in its time localization capability, which arises from the kernel function. Compared with the STFT, the wavelet transformation scales octave-wise in the frequency domain and doubles the time resolution for each octave.
• We currently have a multiresolution version with 4 different base kernels (Daubechies, Coiflet, Beylkin, Vaidyanathan) running. A problem could be finding a proper time start point for a classifier.
Wavelet Packet Decomposition
• Unlike the classical multiresolution wavelet pyramid algorithm, the high-pass results are also decomposed and used for further analysis.
• We avoid the acoustical uncertainty relation!
Magnitude Spectrum
• DC subtraction
• Peak-spectrum extraction
• A real resampler with LP filtering for the demodulation routines
• Power spectrum
• Logarithmized level spectrum
• Own independent implementation of the core FFT algorithm
• Reusable/structured/documented source code
Three Resampling Algorithms
•Zero-Order-Hold Converter
•Linear Interpolation Converter
•Sinc Bandlimited Interpolator
Zero-Order-Hold Converter
• The interpolated value is equal to the last value while upsampling; downsampling acts as comb filtering.
• Poor quality, but the speed is blindingly fast.
Linear Converter
• Included/excluded samples are linearly interpolated.
• No anti-aliasing post-filtering!
• Conversion speed is blindingly fast.
Sinc Bandlimited Interpolation in Theory
• Perfect reconstruction corresponds to applying an ideal low-pass filter with cutoff at the Nyquist frequency, i.e. it corresponds to convolving with a sinc function.
• The sinc function has a response that extends from -infinity to +infinity, so it cannot be used directly in practice, except for periodic signals.
• Multiplication by a low-pass filter in the frequency domain corresponds to a convolution in the time domain with a sinc function.
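The bandlimited reconstruction described above can be written out in the standard formulation:

```latex
x(t) = \sum_{n=-\infty}^{\infty} x[n]\,\operatorname{sinc}\!\left(\frac{t - nT}{T}\right),
\qquad \operatorname{sinc}(u) = \frac{\sin(\pi u)}{\pi u}
```

where T is the sampling period; the infinite support of the sinc is exactly why practical converters truncate it to a finite filter bank, as the next slides show.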
Sinc Bandlimited Interpolation in Practice
• The precision of the convolution rises and falls with the number of convolution coefficients.
• We currently have three filter banks generated with lengths of 2464, 22438 and 340239 coefficients, for Low, Medium and High quality.
Antialiasing Filter Results of the SINC Interpolators
Low Quality with 2464 supporting points
Medium Quality with 22438 supporting points
High Quality with 340239 supporting points
Speed Comparison

SRC TYPE     48000 -> 64000   48000 -> 8000   Factor
ZOH          0.0066           0.0061          163
LINEAR       0.0064           0.0064          156
LOW_SINC     0.12             0.12            8.33
MEDIUM_SINC  0.29             0.29            3.45
HIGH_SINC    1.0              1.0             1.0
Additional Algorithms in the Algorithm Collection
• Own FFT
• Own Autocorrelation
• Linear Predictive Coding (LPC spectrum, prediction and prediction error)
• Jitter Analysis
• Sample Rate Converter
• Lin/Log function with scalable reference point and log base (in-place and out-of-place)
• WaveFile writer/reader
• Zero Crossing Rate
• Polygonal Chain
• Wavelet Multiresolution and Packet (in preparation) Transformation
• Smoothing (boxcar algorithm)
• Normalization (to AVG, RMS and interval)
• Auto zero-padding to a power-of-two length
• Asynchronous exponential window function
The Algorithm Collection - Programming Notes
• All algorithms are implemented platform-independently in C++ with GCC/Make.
• Generic typing (templates) is used where possible.
• Clearly structured, well documented and reusable code pieces.
• Namespaces clauer::math:: and clauer::io::
Algorithms Borrowed from Natural Language Processing for the Acoustical Noise and Function Test Technique
Roadmap:
• Introduction to speech recognition techniques.
• Speech recognition algorithms applied to noise and function testing.
• Problems with the resonance analysis.
• Alternatives for the feature extraction.
• Other tools developed.
HMM/GMM Speech Recognition Overview
Feature Extraction and MFCCs
• After the power spectrum estimation of the windowed signal, the logarithmic mel filter bank best matches the distribution of the hair cells in the cochlea.
• The final DCT is applied to decorrelate the speech spectrum, which eases the later post-processing with Gaussian mixtures. More on this later.
• [100, 200, 300, 400, 500, 600, 700, 800, 900, 1200, 1500, 1800, 2400, 2900, 2600] Hz
Speech Features
• We have 12 band-energy representations corresponding to the mel spectrum.
• Adding the summed energy of the bands as C0 gives 13 features.
• The Δ and ΔΔ of this representation.
• A 39-dimensional feature vector.
(1) Gaussian Mixture Models (GMMs)
• The normal distribution.
• The multivariate normal distribution.
(2) Gaussian Mixture Models (GMMs)
• During training we have groups of well-known time windows, and for each window a 39-dimensional vector.
• For each group we cluster a Gaussian probability distribution, called a Gaussian Mixture Model, which consists of mean vectors and covariance matrices.
(3) Gaussian Mixture Models (GMMs)
• The need for the final decorrelating DCT in the feature extraction can be seen in the covariance matrix.
Hidden Markov Models (HMMs)
• The progression of the GMMs over time.
Training
• For the extraction of the transition probabilities of the HMM, the Viterbi or Baum-Welch algorithm is used.
• The extraction of the GMM parameters can be done with the forward-backward algorithm.
• This is the most complex part of the implementation; over 20 years of development have gone into these algorithms.
• The result is the so-called acoustic model.
The Acoustic Model
Decoding/Classification
Language Model
• The language model is usually based on so-called statistical N-grams.
• Bigram and trigram statistical models.
• In the case of our simplified model, we have only a dictionary of 2 words (NOK and OK). We do not need a language model or linguist, because we do not want to detect any concatenation of words into sentences.
Implementation
• Two fully implemented recognizers are available: HTK (Cambridge University; license forbids redistribution) and Sphinx (Carnegie Mellon University; redistribution for commercial purposes allowed).
• HTK has a cleaner structure and is easier to modify for my purposes, but the license doesn't match our needs.
• Sphinx was originally developed under ARPA funding and is available in 4 versions, Sphinx 1/2/3/4.
• We use the newer versions 3 and 4.
Sphinx III
• Programmed in C++.
• 5-10x real-time processing.
• 3- or 5-state continuous HMM topologies.
• Live and batch/file operation.
• Statistical and binary models.
• FST decoding; refactored and re-architected.
Sphinx IV
• Programmed in Java.
• Faster than Sphinx III.
• Continuous and semi-continuous density acoustic models with an arbitrary number of states.
• Bigram, trigram or finite-state grammar language model.
• Fast Viterbi search.
SphinxTrain
• The Sphinx 3/4 decoders have no trainer included.
• For Sphinx 3 the SphinxTrain package is available; it allows the training of acoustic models with the Baum-Welch algorithm.
• Sphinx 3 uses the same feature extraction as the SphinxTrain package.
• The Sphinx 4 trainer is not finished so far, but there is a wrapper for Sphinx 3 models trained with the SphinxTrain package.
Research Recognizer
• Our research prototype is implemented on MS Windows with CYGWIN; it should run on other systems too.
• Programming languages: Python, Perl, C++, Bash, Ruby…
• The whole recognizer is a script-controlled collection of over 200 small command-line programs.
• The trainer and the decoder are separate programs.
• The whole recognizer environment has a size of 700 MB.
• We use phoneme models instead of word models!
Research Recognizer
• Phoneme models instead of word models.
• Acoustic model training with 3-state HMMs and the standard MFCCs with 39 features ((12+1) * 3).
• Uses the Sphinx 4 (Java) and Sphinx 3 (C++) decoders.
Live Demo and Paper
OK / NOK (German: IO / NIO)
Results
• The spectrum of the asparagus matches the mel scale very well.
• The length of the input samples (50 ms) matches the 3-state HMM phoneme.
Applied to Acoustical Noise Testing
Our research recognition system works very well for problems in acoustical noise testing where the spectral distribution of the impulse responses matches the MFCC spectrum distribution, for example the impulse response of asparagus.
Problems
• In the acoustical resonance analysis we have to concentrate only on small regions of interest.
• We run into a frequency-time resolution problem, because the time window for the feature extraction is only 10 ms long. With the classical Fourier transformation we run into the uncertainty principle.
• We have to modify the feature extraction: in the acoustical resonance analysis we have to exchange the MFCCs for features suited to our special case, i.e. build our own feature extraction.
[Figure: level spectrum in dB (-50 to 20 dB) over frequency 0 to 50000 Hz]
Acoustic Uncertainty Principle Example
1. ws = 10 ms, sr = 8000 Hz, nf = 4000 Hz -> 80 samples -> 40 spectral points -> Δf = 4000 Hz / 40 = 100 Hz
2. ws = 10 ms, sr = 50000 Hz, nf = 25000 Hz -> 500 samples -> 250 spectral points -> Δf = 25000 Hz / 250 = 100 Hz
The frequency resolution Δf = 1/ws depends only on the window length, not on the sample rate.
Modification of SphinxTrain
• In the case of the resonance analysis we need to modify the feature extraction, because unlike human voice recordings we have only a few samples from the impulse responses.
• In the case of the Sphinx 4 decoder we have to modify the feature extraction in two different places, because of the lack of a Sphinx 4 trainer.
Avoiding the Uncertainty Principle
• We need an alternative to the Fourier transformation to avoid the uncertainty principle, because we have only a few samples.
• Two possible methods to avoid the uncertainty principle:
1. The Wavelet Packet Decomposition
2. The Wigner-Ville Distribution
Wigner-Ville Distribution
• A time-frequency distribution of a signal with very high time and frequency resolution.
• The Wigner-Ville transformation originates from physics (Wigner, 1932), where it was introduced to add quantum corrections to classical mechanics.
Continuous form:
X_WVD(t, ω) = ∫ s(t + τ/2) · s*(t − τ/2) · e^(−jωτ) dτ
Discrete form:
X_WVD[n, m] = 2 · Σ_{k = −N/2}^{N/2 − 1} s[n+k] · s*[n−k] · e^(−j4πkm/N)
Pseudo-Smoothed-Wigner-Ville-Distribution
PSWVD Implementation
DISLIN Data Plotter Tool
• A data visualization programming library from the MPI in Lindau, with bindings for many languages (C/C++, Fortran, Java, Perl, Python...).
• Prints to the screen, to a printer and to any image format (PNG, GIF, WMF, JPEG, BMP, XFIG).
• Commercial license: 120 €
Data Plotter
Contour Plot
Waterfall Plot
Wavelet Transformation
• The difference between the wavelet transformation and the classical FT is the localization, which arises from a different kernel function. Compared with the STFT, the wavelet transformation automatically scales the time resolution depending on the frequency.
• We currently have a multiresolution version with 4 different base kernels (Daubechies, Coiflet, Beylkin, Vaidyanathan), but currently no application for the wavelet transformation, because we need a time calibration/reference point to be able to build a classifier.
Wavelet Packet Decomposition
• The DWT is based on orthogonal filter banks, where the low-pass output is used to calculate the next wavelet octave.
• We have a multiresolution pyramid implementation.
Wavelet Packet Decomposition
Conclusion
Development Roadmap
• Nice to have: a fully automatic classification system based on language technology.
• Need for new algorithms because of the constraints of the Fourier transformation.
• An alternative to the mechanical impulse response extraction, e.g. for the asparagus project: the sine sweep technique.
Development Roadmap
Development Roadmap: Basic Building Blocks
Damping Factor
• Calculation of the envelope via the Hilbert transformation.
• Extraction of the damping constant via exponential regression.
Sine Sweep
• The impulse response can also be extracted from the recorded noise response.
• With a sine sweep as excitation signal and a special deconvolution, the nonlinearities can be separated.
Thank You
Q & A