Wavelets for Sound Analysis
and Re-Synthesis
Graham Self
2nd May 2001
Project Supervisor: Dr. Guy Brown
Second Marker: Dr. Joab Winkler
This report is submitted in partial fulfilment of the requirements for the Bachelor of
Science Dual Honours in Computer Science and Mathematics by Graham Self
Declaration
All sentences or passages quoted in this dissertation from other people's work have
been specifically acknowledged by clear cross-referencing to author, work and page(s).
Any illustrations which are not the work of the author of this dissertation have been
used with the explicit permission of the originator and are specifically acknowledged. I
understand that failure to do this amounts to plagiarism and will be considered grounds
for failure in this dissertation and the degree examination as a whole.
Name:
Signature:
Date:
Abstract
This paper first introduces the main branch of mathematics that led to the discovery of Wavelets and their transforms: Fourier Analysis. The paper gives a detailed account of the development of the different Fourier transforms and describes the motivation for a different kind of transform arising from the multiresolution problem. The theory of Wavelets is then introduced as a solution to this problem, together with a detailed account of the Continuous and Discrete Wavelet Transforms.
The paper also documents the development, testing and evaluation of two Computer Assisted
Learning Tools that students could use to learn about wavelet theory.
Acknowledgments
Thanks go to
My Parents, for being supportive
Alice for being there every step of the way
Guy for endless proof reading
And to all those involved in the testing and evaluation
Contents
Introduction
1: Fourier Analysis
1.1 The Statement that Changed Mathematics
1.2 Applications of Fourier Analysis
1.3 The Fourier Transform: Finding the frequency content of a signal
1.4 The Discrete Fourier Transform
1.5 The Fast Fourier Transform
1.6 The Time/Frequency problem
1.7 The Short Term Fourier Transform
1.8 Technical Definitions for Chapter 1
2: Introduction to Wavelets
2.1 Where did they come from?
2.2 The Mother Wavelet
2.3 Wavelets achieve Multiresolution
2.4 How do you create a Wavelet?
3: The Continuous Wavelet Transform
3.1 Theory
3.2 Computation of the CWT
3.3 Visualising the CWT: 3D Plot
3.4 Visualising the CWT: Scalograms
4: The Discrete Wavelet Transform
4.1 Why not use the CWT?
4.2 Discretizing the Continuous Wavelet Transform
4.3 Subband Coding
4.4 Example of Subband Coding
5: Computer Assisted Learning Tools
5.1 Computer Assisted Learning (CAL)
5.2 Which programming language?
6: Requirements Analysis
6.1 The initial requirement
6.2 The Client and Developer scenario
6.3 The Matlab Auditory Demos
7: Software Development
7.1 Getting familiar with MATLAB
7.1.1 MATLAB's GUI
7.1.2 Coding in MATLAB
7.2 Wavelet Learning Tool - Interface Design
7.3 The coding of the Wavelet Learning Tool
7.3.1 Coding the CWT
7.3.2 Coding the Inverse CWT
7.3.3 Input error recovery
7.4 Development problems
7.4.1 Surf vs Imagesc
7.4.2 The disappearing cursor
7.5 Development of the Subband Learning Tool
7.6 Screen Shots
7.7 Functionality of the Wavelet Learning Tool
8: Software Testing
8.1 Motivation
8.2 Functional Testing
8.3 Testing the Wavelet Learning Tool
8.3.1 Random Testing
8.3.2 Testing Structural Synthesis
8.3.3 The Category-Partition method
8.4 Testing Summary
9: Evaluation
9.1 Questionnaire Construction
9.2 User Guide
9.3 Questionnaire Results
9.4 Evaluation Summary
9.5 Evaluation of the Subband Learning Tool
9.6 Future Work
10: Conclusions
References
Appendix A: Questionnaire
Appendix B: Wavelet Learning Tool User Guide
Appendix C: Software Development Log
Introduction
Wavelets have many historical roots, as they can be used in many different scientific areas. This dissertation documents the usefulness and importance of wavelets when performing a frequency analysis of a signal. The main route to the discovery of wavelets was a natural progression from the Short Term Fourier Transform, as there was a need for a better analysis technique. The problem with using Fourier Transforms is that a frequency analysis can't give both good frequency and good time resolution at the same time. This is known as the multiresolution problem, and it was solved by using wavelets instead of a fixed analysis window. Wavelets have the important property that they don't have to be fixed in size. They can be easily controlled by parameters, which compress and dilate the wavelet depending on which frequency in the signal is being analysed. An account of how to create wavelets using the mother wavelet formula, and how they can be applied to signal analysis, is also given in this paper.
The free movement of the wavelets was exploited in the Continuous Wavelet Transform to give a detailed representation of the frequencies in a signal. The signal could now be represented with both good time and good frequency resolution, so, unlike before, the user could tell not only what frequencies occurred in the signal but also when they occurred. The transform introduces the notion of a scale, which is equivalent to the inverse of the frequency being analysed. The continuous wavelet transform produces a coefficient matrix, which can be visualised by the use of a scalogram; this is also investigated. The problem with the continuous wavelet transform is that it calculates a coefficient for each sample in the signal for every scale that is to be calculated. For long signals, this can be time consuming and uneconomical when applying the transform computationally.
This problem was solved by the development of a discrete version of the wavelet transform. The dissertation details the reasons for wanting to use a discrete version, along with arguments showing how this can be done without loss of data. The Discrete Wavelet Transform is based on the ideas behind Nyquist's Sampling Theorem, and a method for calculating the transform, called Subband Coding, is given.
A large part of this dissertation gives a detailed account of the development, testing and evaluation of two Computer Assisted Learning (CAL) Tools. Because wavelet theory is a fairly new topic, lecturers are limited in the ways they can teach it to new students who are unfamiliar with the concepts involved. Therefore, CAL tools were developed to aid in the teaching of wavelets and to provide an interactive demonstration of what the transforms can do and how they benefit signal analysis. The dissertation provides evidence for the benefits of using CAL tools alongside a series of lectures, and also sets out what requirements were needed for them to be successful.
The main tool developed alongside this dissertation is called the Wavelet Learning Tool. This was developed to demonstrate visually how wavelets are used in the continuous version of the wavelet transform. The user is able to compare the effects of using different wavelets and to see a visual representation of the coefficient matrix produced, called a scalogram. The scalogram effectively shows the signal in the frequency domain. As the title of the dissertation suggests, the user can also re-synthesise a signal from a scalogram that has been altered. The tool allows the user to remove scales from the scalogram and then listen to the effect this has on the overall signal. This is effectively a frequency filtering technique.
The second tool, called the Subband Learning Tool, was developed to show visually the process undergone during Subband Coding, which is the discrete version of the wavelet transform. Subband coding is, in its simplest terms, a series of filtering and sub-sampling operations, and the user can see what effect each stage of the process has on a speech signal.
1 : Fourier Analysis
1.1 The Statement that Changed Mathematics
Before 1930, the main branch of mathematics leading to Wavelet Theory formed as a result of a natural progression from the world of Fourier analysis. Fourier analysis is a significant discovery and has influenced many different areas of application including science, mathematics, engineering and, most important of all, signal processing. Joseph Fourier started the Fourier revolution by introducing a simple mathematical statement:

"Any periodic function can be represented as a sum of sines and cosines"

This discovery had fundamental importance for the signal processing world, as a complex signal could be visualised as a combination of smaller signals about which we know a great deal. Studying these smaller signals, i.e. sine waves and cosine waves, directly allows you to infer properties of the more complex signal. We will see later how the amplitudes of the sine and cosine waves in a complex signal enable you to calculate the frequencies in the signal. Fourier's statement can be expressed mathematically as follows.
f(x) = \tfrac{1}{2} a_0 + \sum_{k=1}^{\infty} \left( a_k \cos kx + b_k \sin kx \right)   [EQ 1]

where the Fourier coefficients a_0, a_k, b_k are defined as

a_0 = \frac{1}{\pi} \int_0^{2\pi} f(x)\,dx, \qquad a_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos(kx)\,dx, \qquad b_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin(kx)\,dx

[M 1993]
This is known as the Fourier Series and says that any curve that periodically repeats itself can
be expressed as the sum of perfectly smooth oscillations i.e. the sines and cosines.
Barbara Burke Hubbard described how the Fourier Series can be represented by a process of multiplying various sinusoidal waves (sines and cosines) by certain amplitude coefficients and then shifting them so they either add or cancel [H 1995]. An example of this procedure is shown in figure 1.1.1.
Figure 1.1.1
a) sin(x), b) sin(x) + sin(2x), c) sin(x) + sin(2x) + sin(3x)
The sinusoidal waves are called basis signals, as they form a basis of any function. They are also the basis for the Fourier Transform (see Section 1.3), which is derived from the property described above. See Section 1.8 for an explanation of a basis.
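The build-up shown in figure 1.1.1 can be reproduced numerically. The following is a minimal sketch in Python (rather than the MATLAB used elsewhere in this project); `partial_sum` is a hypothetical helper that sums the first few sine terms with unit amplitudes assumed.

```python
import math

def partial_sum(x, n_terms):
    """Sum of sin(k*x) for k = 1..n_terms (unit amplitudes assumed, as in figure 1.1.1)."""
    return sum(math.sin(k * x) for k in range(1, n_terms + 1))

# Sampling the partial sums over one period shows the waveform growing richer
# (more wiggles) as each higher-frequency term is added.
xs = [i * 2 * math.pi / 100 for i in range(100)]
one_term = [partial_sum(x, 1) for x in xs]
three_terms = [partial_sum(x, 3) for x in xs]
```

Plotting `one_term` against `three_terms` reproduces the progression from panel a) to panel c).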
1.2 Applications of Fourier Analysis
Not only did the Fourier Series aid mathematicians in differentiating difficult functions; it also opened a new door to frequency analysis. Before Fourier, raw signals were always represented in the time domain. The problem with a signal being represented in the time domain is that no information, except amplitude, can be given at a specific time. It is often more convenient to be able to see what frequencies occur at different time intervals.

As described before, the Fourier Series consists of a combination of sinusoids. If you extract the amplitudes of each sinusoidal component, you obtain the Fourier Coefficients. As the sinusoids are equally spaced in frequency, i.e. sin x, cos x, sin 2x, cos 2x, ..., knowing these coefficients gives us information on which frequencies are present in the function. As sound signals are effectively represented as functions, the Fourier Series enables us to extract the frequencies present in a signal.

This leads us to the Fourier Transform, which converts a signal from the time-domain into the frequency-domain.
1.3 The Fourier Transform: Finding the frequency content of a signal
The Fourier Transform, as introduced above, converts information about a signal in the time-domain into a signal in the frequency-domain. It also allows you to go back without loss of information.

Figure 1.3.1: The Fourier Transform maps the time domain to the frequency domain; the Inverse Fourier Transform maps back.

The only frequencies that contribute to the Fourier Series of a periodic function (see 1.1) are the integer multiples of the function's fundamental frequency. This fundamental frequency, also known as the base frequency, is the inverse of the period of the signal. For example, if the signal has a period of 2 ms (= 0.002 s), the fundamental frequency will be 1/0.002 = 500 Hz. The Fourier Transform (FT) also allows certain non-periodic functions (those that decrease fast enough that the area under their graphs is finite) to be converted, i.e. it is still possible to describe them in terms of their frequencies. But to do this, you need to compute coefficients for all possible frequencies.
The Fourier Transform can be derived from the Fourier series, where the coefficients for a and
b are defined as follows.
a(\omega) = \int f(x) \cos(2\pi\omega x)\,dx, \qquad b(\omega) = \int f(x) \sin(2\pi\omega x)\,dx   [H 1995] [EQ 2]
Since we are now dealing with a non-periodic function, we must consider the interval between minus infinity and infinity.

Transforming into the complex plane

In Fourier Analysis, complex numbers make it possible to have a single coefficient for each frequency. This means that you no longer need separate coefficients for the sines and cosines, just one, which gives you the information of both. It is calculated using the fact that a complex number z = x + iy can be represented by the point (x, y) in the complex plane, the x-axis representing the real part of the number z, and the y-axis representing the imaginary (i) part (see figure 1.3.2). The phase of the complex number is simply the angle that is created when
a line joins the point (0, 0) to the point (x, y). The magnitude is the length of this line, calculated using Pythagoras' Theorem: |z| = \sqrt{x^2 + y^2} (see Figure 1.3.2, The Complex Plane).
To write a Fourier series or transform using complex numbers, you can use the formula

e^{i\theta} = \cos\theta + i\sin\theta

which is derived from the complex plane. The formula for the Fourier Series of a periodic function (Equation 1) can now be written

f(x) = \sum_{k=-\infty}^{\infty} c_k e^{ikx}

where the formulas for the coefficients (Equation 2) become

c_k = \frac{1}{2\pi} \int_0^{2\pi} f(x)\, e^{-ikx}\,dx

Using this knowledge, the formula for the Fourier Transform \hat{f}(\omega) and its inverse f(x) are

\hat{f}(\omega) = \int_{-\infty}^{\infty} f(x)\, e^{-i\omega x}\,dx   [EQ 3]

f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat{f}(\omega)\, e^{i\omega x}\,d\omega
1.4 The Discrete Fourier Transform
Digital computers are finite machines: any desired computation can only use a finite number of
operations. No digital computer, then, can deal with real-(or complex)-valued functions of real
numbers. The Discrete Fourier Transform (DFT) samples the function in order for it to be
represented on a computer. Instead of having f(x) at all x, we have only the values of f at a
finite number of points f(x1), f(x2), For convenience, this is usually sample points at regular
intervals, of say. If you start sampling at x1=0, the sequence of sample points becomes 0, ,2,,(n-1) and the sample values are f(0), f(), f(2),,f((n-1) )Because the DFT is to enable you to use digital computers in Fourier Analysis, we need a
finite analogue of the Fourier Transform given in equation 3.
The integral is over all x, and is calculated for every real value of \omega. In the finite analogue, the integral becomes the sum

\sum_{r=0}^{n-1} f(r\Delta)\, e^{-ir\Delta\omega}   [C 1990] [EQ 4]

This is still defined for all complex numbers e^{-i\Delta\omega} of absolute value 1, so the equation needs restricting further so that we have a finite number of values of \omega. We use the fact that e^{-ir\Delta\omega}, for fixed r\Delta, is periodic in \omega with period 2\pi/\Delta. Thus the range can be restricted to values of \omega between 0 and 2\pi/\Delta. Equation 4 now becomes the formula for the Discrete Fourier Transform (DFT):

f_D\!\left(\frac{2\pi k}{n\Delta}\right) = \frac{1}{n} \sum_{r=0}^{n-1} f(r\Delta)\, e^{-2\pi i k r / n}, \qquad k = 0, 1, \ldots, n-1
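The DFT sum translates almost directly into code. Below is a minimal Python sketch (the project itself used MATLAB); the function name `dft` is hypothetical, and the 1/n normalisation follows the formula in the text.

```python
import cmath
import math

def dft(samples):
    """Naive DFT, matching the text's formula: (1/n) * sum_r f(r) e^{-2*pi*i*k*r/n}."""
    n = len(samples)
    return [sum(samples[r] * cmath.exp(-2j * cmath.pi * k * r / n)
                for r in range(n)) / n
            for k in range(n)]

# A cosine with 3 cycles across the n samples concentrates its energy in
# bins k = 3 and k = n - 3 (the conjugate pair), each with magnitude 1/2.
n = 64
tone = [math.cos(2 * math.pi * 3 * r / n) for r in range(n)]
spectrum = dft(tone)
```

The two mirrored peaks reflect the pair of complex exponentials that make up a real cosine, exactly as in the complex form of the Fourier Series above.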
1.5 The Fast Fourier Transform
The Fast Fourier Transform is of special interest because it has made the DFT faster and therefore more commonly used. When calculating the DFT, a small number of Fourier Coefficients is often adequate for an accurate Fourier Transform. This means that, with the aid of computers, a Discrete Fourier Transform can be processed quickly.

An example of this is the fft command in Matlab. The program uses the FFT technique to give a fast, effective result, which can then be plotted in the frequency domain. Figure 1.5.1 shows the result of using this command.

Figure 1.5.1: Matlab plots of (left) a speech signal and (right) its Fast Fourier Transform

When the Fourier Transform is plotted, only the amplitude data is shown and the phase is discarded.
The Fast Fourier Transform (FFT) was introduced by Cooley and Tukey in the mid 1960s. Its effects have been revolutionary, turning Fourier Analysis from merely a mathematical tool into a practical one, as more and more people turned to Fourier Transforms as an effective means of frequency analysis. Cooley and Tukey developed an algorithm to speed up the DFT for signals whose length is a power of 2, e.g. 2, 4, 8, 16, 32, 64, ... This is because the method recursively halves the data.

The algorithm is a divide and conquer algorithm that systematically divides the data into two halves, performs the FFT on each half, and then splices the two halves back together.

The FFT cuts down the number of computations needed from n^2 to n log n. Therefore, the larger the value of n, the more impressive the gain in speed of calculating the FT. Since the Cooley and Tukey algorithm was developed, new FFT algorithms have been discovered where the signal need not have a length that is a power of 2.
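The divide and conquer recursion described above can be sketched as follows. This is a hedged Python illustration of the radix-2 idea (unnormalised, and assuming the input length is a power of 2), not the exact published Cooley-Tukey implementation.

```python
import cmath

def fft(x):
    """Radix-2 divide and conquer FFT; len(x) must be a power of 2. Unnormalised."""
    n = len(x)
    if n == 1:
        return list(x)
    evens = fft(x[0::2])            # transform the even-indexed half...
    odds = fft(x[1::2])             # ...and the odd-indexed half, recursively
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odds[k]
        out[k] = evens[k] + twiddle          # splice the two halves back together
        out[k + n // 2] = evens[k] - twiddle
    return out
```

The result agrees term by term with direct summation of the DFT, but uses O(n log n) operations rather than O(n^2).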
1.6 The Time/Frequency problem
There is a fundamental problem with the Fourier Transform that led to the search for a more suitable transform, such as the Wavelet Transform.

A function and its Fourier Transform are two faces of the same information. The function displays the time information and hides the information about frequencies, while the Fourier Transform displays the frequency information but no information about time. The question that is raised is: is it necessary to have both the time and the frequency information at the same time?

The answer depends on the particular application and the nature of the signal in hand. The Fourier Transform gives the frequency information of the signal, which means that it tells us how much of each frequency exists in the signal, but it does not tell us when in time these frequency components exist.

Olivier Rioul and Martin Vetterli state that a transform such as this one can only be effectively applied to stationary signals [R 1991].
Definition: Signals whose frequency content doesn't vary significantly in time are stationary. The only strictly stationary signal is one with constant frequency.

Here is an example. Robi Polikar gave the following example in his wavelet tutorial, which shows the problems with non-stationary signals [P 1996].

Figure 1.6.1 is a plot of a stationary signal and its Fast Fourier Transform, zoomed in. The signal is x(t) = cos(2*pi*10*t) + cos(2*pi*25*t) + cos(2*pi*50*t) + cos(2*pi*100*t) and so contains frequencies of 10, 25, 50 and 100 Hz at every time instant.
Figure 1.6.1: Plot of a stationary signal and its FFT

Figure 1.6.2: Plot of a non-stationary signal and its FFT

Figure 1.6.2 is a plot of a signal with four different frequency components at four different time intervals, i.e. it is non-stationary. This signal contains the same frequencies as the signal used for the stationary example. The Fast Fourier Transform is also given.

The ripples in the FFT are due to sudden changes from one frequency to another and are irrelevant in this example. Apart from the ripples, the two plots of the FFT look identical; they both have large peaks at frequencies 10, 25, 50 and 100 Hz. The problem is apparent when you ask yourself when these frequencies actually occurred. It is straightforward for the stationary example, but not so for the non-stationary signal. This example shows that any change in the frequency of a signal, over even the slightest time interval, will affect the overall FFT. As a result, you could get completely false information.
As this project involves non-stationary signals, a further approach is needed.
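Polikar's point can be checked numerically. The sketch below (Python, with an assumed 1000 Hz sample rate over one second) builds a stationary and a non-stationary signal from the same four frequencies and evaluates the DFT magnitude at those bins; both spectra peak in the same places.

```python
import math, cmath

fs = n = 1000                       # assumed: 1000 Hz sample rate, 1 second of signal
freqs = [10, 25, 50, 100]

# Stationary signal: all four frequencies present at every instant.
stat = [sum(math.cos(2 * math.pi * f * t / fs) for f in freqs) for t in range(n)]

# Non-stationary signal: one frequency per quarter-second interval.
nonstat = [math.cos(2 * math.pi * freqs[min(3, 4 * t // n)] * t / fs) for t in range(n)]

def mag(x, k):
    """Magnitude of DFT bin k (here bin k corresponds to k Hz, since n = fs)."""
    return abs(sum(x[r] * cmath.exp(-2j * cmath.pi * k * r / len(x))
                   for r in range(len(x))))

# Both spectra have large peaks at the same four bins; the transform alone
# cannot say *when* each frequency occurred.
stat_peaks = [mag(stat, f) for f in freqs]
nonstat_peaks = [mag(nonstat, f) for f in freqs]
```

Apart from amplitude differences and leakage ripple, the two sets of peaks sit at identical frequencies, which is exactly the ambiguity described above.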
1.7 The Short Term Fourier Transform
Fourier analysis doesn't work equally well for all kinds of signals or for all kinds of problems. Hubbard describes the naive approach of some scientists when applying Fourier:

"In some cases, scientists using it are like the man looking for a dropped coin under a lamppost, not because that is where he dropped it, but because that's where the light is" [H 1995]

If the signal is non-periodic, the summation of the periodic functions, sine and cosine, doesn't accurately represent the signal. Research into artificially extending the signal to make it
periodic revealed that you would require what is called continuity at the endpoints of the function.
Dennis Gabor introduced the Short Term Fourier Transform (STFT), also known as the Windowed Fourier Transform, as an attempt to overcome the problem of identifying when a frequency occurred in a non-stationary signal. The STFT introduces the notion of time dependency into Fourier Analysis.

Gabor's basic idea: introduce a local frequency parameter (local in time) so that the Fourier Transform looks at the signal through a window over which the signal is approximately stationary.

Figure 1.7.1

Studying the frequencies of the signal segment by segment limits the span of time during which something is happening.
Formal Definition

Given a signal x(t), Gabor recognised that to be accurate in time, a two-dimensional time-frequency representation S(t, f) is needed, where f is the local frequency. Recall that the signal is stationary when it is seen through a window g(t) (figure 1.7.1 shows the analysis window g(t), centred at time T, within which the wave x(t) is approximately stationary).

The signal viewed in the window is represented by the product

x(t)\, g(t - T)

where T is the time location at which the window is centred, and g(t) is the window function (the closely related operation of convolution is defined in Section 1.8). The Fourier Transform of these windowed signals is then obtained by applying the Fourier Transform given in Equation 3:

STFT(T, f) = \int x(t)\, g(t - T)\, e^{-2\pi i f t}\,dt   [EQ 5]

As you can see from the formula, the STFT relies heavily on the choice of the window. In figure 1.7.1 the window was a basic rectangular window, but for more accurate results differently shaped windows can be used, such as the preferred Hamming window.

Another factor is the size of the window. Although the size of the window is fixed for the entire process of calculating the STFT, different STFTs can be calculated using different sized windows. A small window is effectively blind to low frequencies, which are too large for the window, but with a large window, information is lost about brief changes. We will see later how Wavelets have combated this problem, as an attempt to see both the wood and the trees.

Figure 1.7.2 illustrates the windowing of a signal in the STFT. It shows Gabor's two-dimensional principle and gives two alternative views.
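A discretised form of EQ 5 can be sketched as follows. This Python illustration uses a basic rectangular window, as in figure 1.7.1, and the helper name `stft_bin` is hypothetical.

```python
import math, cmath

def stft_bin(x, win_len, start, k):
    """One STFT coefficient: rectangular window of win_len samples placed at
    `start`, analysed at DFT bin k (a discretised form of EQ 5)."""
    seg = x[start:start + win_len]
    return sum(seg[r] * cmath.exp(-2j * cmath.pi * k * r / win_len)
               for r in range(win_len)) / win_len

# A signal that switches frequency halfway: 4 cycles per 128 samples, then 16.
n = 256
x = [math.cos(2 * math.pi * 4 * t / 128) if t < 128
     else math.cos(2 * math.pi * 16 * t / 128) for t in range(n)]

# Windowing localises the frequencies in time: bin 4 dominates the first
# window and bin 16 the second, unlike a single whole-signal transform.
first = abs(stft_bin(x, 128, 0, 4))
second = abs(stft_bin(x, 128, 128, 16))
```

A smoother window (such as the Hamming window mentioned above) would reduce the leakage at the segment boundaries; the rectangular window is used here only for clarity.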
Figure 1.7.2 [R 1991]

Figure 1.7.2 shows vertical stripes in the time-frequency plane. They illustrate the windowing-of-the-signal view of the STFT: windowing at time t, it computes all frequencies of the STFT. The alternative view is shown by the horizontal stripes. It is based on a filter bank interpretation of the STFT process: at a given frequency f, the STFT amounts to filtering the signal at every value of t, using a bandpass filter whose window function is modulated to the given frequency.

The time/frequency resolution problem with the STFT

In 1975, Jean Morlet recognised a problem: unlike Fourier Analysis, the STFT has the disadvantage of being imprecise about time at high frequencies, because the size of the window is fixed. If you instead make the window very small, you lose all the information about low frequencies.
So Morlet took another approach, which led to the discovery of the WAVELET.
1.8 Technical Definitions for Chapter 1
Basis Functions

A group of functions such as y = sin x, y = sin 2x, etc. forms a basis if:

i) they are all linearly independent from each other;
ii) they can form any other linear function, i.e. they span a vector space.

[Adapted from PMA211 course notes]

Linear Independence: Formal Definition

Let v_1, ..., v_r be functions over a field F (e.g. the real numbers), and suppose a_1 v_1 + a_2 v_2 + ... + a_r v_r = 0 for some coefficients a_1, ..., a_r in F. If the only way this can occur is with a_1 = a_2 = ... = a_r = 0, then v_1, ..., v_r are linearly independent. Basically, this means that you can't make one of the linearly independent functions by combining multiples of the others in the same basis.

Part ii) states that, given any linear function in the same field, you can make it up by using combinations of the basis functions.
In the case of the Fourier Transform, the basis is made up of sinusoids. According to the
definition of a basis, this means that all sinusoids are linearly independent from each other.
The following is a proof to show this.
A proof of the linear independence of sinusoids

Firstly, we need to explore orthogonality and inner products. The inner product of two functions is a mapping <.,.> : V x V -> R, where V is a space of functions and the result is a real number. We only need to concern ourselves with the mapping for continuous functions, as sin nt and cos nt are both continuous. Given the space of all continuous functions on the closed interval [a, b], the inner product is defined by

\langle f, g \rangle = \int_a^b f(t)\, g(t)\,dt

Two functions are orthogonal if their inner product is zero, and a set of mutually orthogonal non-zero functions is linearly independent. Here we take [a, b] = [0, 2\pi].

Step 1: proof that cos mt and cos nt are orthogonal, where m \neq n

\langle \cos mt, \cos nt \rangle = \int_0^{2\pi} \cos mt \cos nt \,dt = \frac{1}{2}\int_0^{2\pi} \cos((m-n)t)\,dt + \frac{1}{2}\int_0^{2\pi} \cos((m+n)t)\,dt

= \left[ \frac{\sin((m-n)t)}{2(m-n)} + \frac{\sin((m+n)t)}{2(m+n)} \right]_0^{2\pi} = 0

so cos mt and cos nt are orthogonal.

Step 2: proof that sin mt and sin nt are orthogonal, where m \neq n

\langle \sin mt, \sin nt \rangle = \int_0^{2\pi} \sin mt \sin nt \,dt = \frac{1}{2}\int_0^{2\pi} \cos((m-n)t)\,dt - \frac{1}{2}\int_0^{2\pi} \cos((m+n)t)\,dt

= \left[ \frac{\sin((m-n)t)}{2(m-n)} - \frac{\sin((m+n)t)}{2(m+n)} \right]_0^{2\pi} = 0

so sin mt and sin nt are orthogonal.

Step 3: proof that cos nt and sin mt are orthogonal

\langle \cos nt, \sin mt \rangle = \int_0^{2\pi} \cos nt \sin mt \,dt = \frac{1}{2}\int_0^{2\pi} \sin((m+n)t)\,dt + \frac{1}{2}\int_0^{2\pi} \sin((m-n)t)\,dt

If n \neq m:

= \left[ -\frac{\cos((m+n)t)}{2(m+n)} - \frac{\cos((m-n)t)}{2(m-n)} \right]_0^{2\pi} = 0

If n = m, the second integrand vanishes and:

= \left[ -\frac{\cos((m+n)t)}{2(m+n)} \right]_0^{2\pi} = 0

so cos nt and sin mt are orthogonal.

All possible cases have been exhausted, and in each case the inner product equals zero. Therefore all sinusoids are orthogonal to each other, hence they are linearly independent and form a basis for the Fourier Transform.
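The inner products computed above can also be checked numerically. This Python sketch approximates the inner product on [0, 2*pi] with the midpoint rule; the helper name `inner` is hypothetical.

```python
import math

def inner(f, g, n=20000):
    """Approximate <f, g> = integral of f(t)*g(t) dt over [0, 2*pi] (midpoint rule)."""
    h = 2 * math.pi / n
    return sum(f((i + 0.5) * h) * g((i + 0.5) * h) for i in range(n)) * h

cos2 = lambda t: math.cos(2 * t)
cos3 = lambda t: math.cos(3 * t)
sin2 = lambda t: math.sin(2 * t)
sin3 = lambda t: math.sin(3 * t)

# Distinct sinusoids are (numerically) orthogonal, as the proof predicts.
orthogonal = abs(inner(cos2, cos3)) < 1e-6 and abs(inner(cos2, sin2)) < 1e-6
```

As a sanity check, `inner(cos2, cos2)` comes out close to pi, matching the exact value of the integral of cos^2(2t) over one full period.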
Convolution

An operation of the form x(n) * h(n) is called convolution; the * symbol is often omitted. The Matlab command is conv(x, h). Convolution is used to calculate the response of a system to an arbitrary input signal by convolving it with the system's impulse response.

You can think of convolutions geometrically, but it is best to explain them mathematically, as this is what computers do when they calculate the convolution. The convolution of two sequences a and b is given by
(a * b)_k = \sum_j a_j\, b_{k-j}

where (a * b)_k is the kth element of the resulting sequence. If the a_j and b_j are non-zero only for j \geq 0, then

(a * b)_k = \sum_{j=0}^{k} a_j\, b_{k-j}   [H 1995]
The convolution property is most useful when applied in a transformed domain (such as frequency, the transformed domain of time in Fourier Analysis). It is very hard to visualise what is happening to two signals after convolution while still in the time domain, but in the transformed domain, convolution becomes multiplication:

T[x * y] = T[x] \cdot T[y]

So this is effectively where corresponding values of points along the x-axis of both graphs are multiplied together to form the new point.
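The relation T[x * y] = T[x] . T[y] can be verified with a short sketch. Below, a direct convolution is compared against products of (unnormalised) DFT values; both inputs are zero-padded to the output length so that the circular convolution computed by the DFT coincides with the linear one. All names are illustrative only.

```python
import cmath

def conv(a, b):
    """Direct convolution: (a*b)_k = sum_j a_j * b_(k-j)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for i, bi in enumerate(b):
            out[j + i] += aj * bi
    return out

def dft_raw(x):
    """Unnormalised DFT, so the convolution theorem takes the clean product form."""
    n = len(x)
    return [sum(x[r] * cmath.exp(-2j * cmath.pi * k * r / n) for r in range(n))
            for k in range(n)]

a, b = [1.0, 2.0, 3.0], [4.0, 5.0]
c = conv(a, b)                      # [4.0, 13.0, 22.0, 15.0]
pad = len(c)
A = dft_raw(a + [0.0] * (pad - len(a)))
B = dft_raw(b + [0.0] * (pad - len(b)))
C = dft_raw(c)                      # C[k] equals A[k] * B[k] for every bin k
```

The zero-padding step matters: without it, the DFT product corresponds to a circular convolution rather than the linear one defined above.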
Figure 1.8.1: Convolution in the time domain corresponds, via the Fourier Transform, to multiplication in the frequency domain.
2 : Introduction to Wavelets
2.1 Where did they come from?
Wavelets were discovered as a result of engineering, and not from mathematics like most applications in signal processing. Yves Meyer was one of the first people to realise the importance of wavelets, and he recognised that most researchers had already been using a process resembling the wavelet process without knowing its history or theoretical background. Wavelet theory developed independently in a large number of areas, and it was Meyer who made the connection. He made the following comment:

"Tracing the history of wavelets is almost a job for an archeologist. I have found at least 15 distinct roots of the theory, some going back to the 1930s" [H 1995]
This dissertation focuses on the discovery of wavelets as an approach to solving the time/frequency resolution problem presented in the previous chapter.

The first use of wavelets came when Morlet was using Short Term Fourier Analysis. He was using the STFT while working on a system that processed echo signals, used to aid the localisation of oil for excavation. Big windows were placed at different places on the signal; then, as the price of computing dropped further, windows were placed closer and closer together, even overlapping. Morlet's problem was that, no matter what he did, the process didn't get any better. Morlet wanted a finer local definition.

As mentioned in section 1.7, the STFT has the disadvantage of being imprecise about time at high frequencies (unless you make the window very small, which means losing all the information about low frequencies).

So Morlet decided on another technique. Instead of keeping the size of the window fixed and filling it with oscillations of different frequencies, he did the reverse: he kept the number of oscillations in the window constant and varied the width of the window.
This window is called a WAVELET.
Figure 2.1.1
STFT Vs Wavelets
When the wavelet is stretched, the oscillations inside it are stretched, decreasing their frequency. When the wavelet is compressed, higher frequencies are produced. Figure 2.1.1 shows the difference between the STFT and Wavelets.

Top row (STFT): the size of the window is fixed and the number of oscillations varies. A small window is blind to low frequencies, which are too large for the window.
The large window loses information about a brief change within the entire interval corresponding to the window.

Bottom row (Wavelets): a mother wavelet (left) is stretched or compressed to change the size of the window. This makes it possible to analyse a signal at different scales.
2.2 The Mother Wavelet
The mother wavelet is the building block for all other wavelets. All wavelets are generated from a single wavelet function by a series of simple scaling and translation procedures. This two-dimensional parameterisation is obtained from the function \psi(t) by

\psi_{j,k}(t) = 2^{j/2}\, \psi(2^j t - k), \qquad j, k \in \mathbb{Z}   [B 1998]

where \mathbb{Z} is the set of all integers. The factor 2^{j/2} maintains a constant norm independent of the scale j; k parameterises the time or space location, and j the frequency or scale.

The function \psi(t) is called the generating wavelet or Mother Wavelet, and it defines the wavelet basis. The term basis means the same here as it did in the Fourier case. Looking at the formula, it is clear that there are infinitely many possible mother wavelets, which form the foundations of the Wavelet Transforms.
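The parameterisation \psi_{j,k}(t) = 2^{j/2} \psi(2^j t - k) can be illustrated numerically. The sketch below uses the Mexican-hat function as an example mother wavelet (an assumption made for illustration; any admissible \psi would do) and checks that the 2^{j/2} factor keeps the L2 norm constant across scales and translations.

```python
import math

def mexican_hat(t):
    """An example mother wavelet (second derivative of a Gaussian, up to a constant)."""
    return (1 - t * t) * math.exp(-t * t / 2)

def psi(j, k, t):
    """psi_{j,k}(t) = 2**(j/2) * psi(2**j * t - k): compressed/stretched and shifted."""
    return 2 ** (j / 2) * mexican_hat(2 ** j * t - k)

def l2_norm(j, k, lo=-40.0, hi=40.0, n=160000):
    """Approximate the L2 norm of psi_{j,k} by the midpoint rule
    (the wavelet's effective support lies well inside [lo, hi])."""
    h = (hi - lo) / n
    return math.sqrt(sum(psi(j, k, lo + (i + 0.5) * h) ** 2 for i in range(n)) * h)
```

For example, `l2_norm(0, 0)` and `l2_norm(2, 3)` agree to several decimal places, illustrating that compressing by 2^j and renormalising by 2^{j/2} leaves the wavelet's energy unchanged.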
2.3 Wavelets achieve Multiresolution
Figure 2.3.1 [G 1995]: a) the time/frequency tiling of the STFT; b) the time/frequency tiling of a wavelet analysis (time on the horizontal axis, frequency on the vertical axis).
Amara Graps describes the benefits of using wavelets in signal processing. Wavelets overcome
the time/frequency resolution problem because of their ability to be stretched and compressed
(see section 2.2).
Part a of figure 2.3.1 shows an STFT, where the window is simply a square. Because a single window is used for all frequencies in the STFT, the resolution of the analysis is the same at all locations in the time/frequency plane.
An advantage of using wavelets in a transform is that the width of the windows can vary: you can have short high-frequency windows and long low-frequency windows. Part b shows the coverage of the time/frequency plane with a wavelet function.
2.4 How do you create a Wavelet?
As mentioned earlier, there are infinitely many wavelets. Unlike the basis functions for the Fourier Transform, i.e. sinusoids, wavelets can contain many sharp corners or discontinuities.
Wavelets are obtained by altering the variables j and k given in the mother wavelet formula in section 2.2. These variables are integers that scale and dilate the mother function to generate different wavelet families such as the Daubechies family (see below). The scale index j indicates the wavelet's width, and the translation index k gives its position.
The term position is used in the same sense as it is for the STFT; it is related to the location of
the window, as it is shifted through the signal.
Figure 2.4.1
Figure 2.4.1 shows the Daubechies wavelet family with different scalings and translations. They were created by a Matlab function that used the rules described by Ingrid Daubechies in her book Ten Lectures on Wavelets. [D 1992]
Within each family of wavelets (such as the Daubechies family) are wavelet subclasses that
are distinguished by the number of coefficients and by the level of integration. These wavelets
are classified within a family often by the number of vanishing moments. [G 1995]
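The classification by vanishing moments can be checked numerically. The following sketch is a Python illustration (function names and integration limits are my own choices), using the Mexican hat ("sombrero") wavelet as a convenient closed-form example: its moments for p = 0 and p = 1 vanish, while the p = 2 moment does not, so it has exactly two vanishing moments.

```python
import math

def sombrero(t):
    """Mexican hat ("sombrero") wavelet, unnormalised: (1 - t^2) * exp(-t^2 / 2)."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)

def moment(psi, p, a=-10.0, b=10.0, n=200000):
    """Midpoint-rule approximation of the p-th moment: integral of t^p * psi(t) dt."""
    dt = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * dt
        total += (t ** p) * psi(t) * dt
    return total

print(abs(moment(sombrero, 0)) < 1e-6)   # True: zero mean
print(abs(moment(sombrero, 1)) < 1e-6)   # True: first moment vanishes
print(abs(moment(sombrero, 2)) > 1.0)    # True: second moment does not
```

More vanishing moments allow a wavelet to ignore higher-order polynomial trends in a signal, which is why families such as Daubechies are indexed by this number.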
3 : The Continuous Wavelet Transform
The Continuous Wavelet Transform (CWT) was developed as an alternative approach to the
Short Term Fourier Transform (STFT) to overcome the time/frequency resolution problem
(section 1.7).
3.1 Theory
The CWT is computed in a similar way to the STFT, in the sense that the signal is multiplied by a function, which in this case is the wavelet introduced in the previous chapter. Also, like the STFT, the transform is computed separately for different segments of the time-domain signal. The main difference between the CWT and the STFT is that the width of the window is changed as the transform is computed for every single spectral component, which is probably the most significant characteristic of the CWT.
In the STFT computation, because the window had a constant shape and size throughout the
analysis, the frequency responses of the window were regularly spaced over the frequency
axis. Figure 3.1.1 (a) shows the filter bank that the STFT produces. A filter bank is a term used to describe the filtering effect on the frequencies in the signal as the window moves along the signal.
Figure 3.1.1
[R 1991]
In the CWT case, instead of the frequency responses of the analysis filter being regularly
spaced over the frequency axis, they are regularly spread in a logarithmic scale (figure 3.1.1
(b) ).
Olivier Rioul and Martin Vetterli describe how this logarithmic approach in the filter banks is used to model the frequency response of the cochlea, situated in the inner ear, and is therefore well adapted to auditory perception. [R 1991]
We have already introduced Wavelets as the basis function for the CWT, and that these are
scaled and translated versions of the mother wavelet. The following formula expresses the
CWT in terms of the signal applied to the wavelet:

CWT_x^ψ(τ, s) = (1/√|s|) ∫ x(t) ψ*((t − τ)/s) dt   [EQ 6]

where * denotes complex conjugation. This shows that the transformed signal is a function of two variables, τ and s, the translation and scale parameters respectively, and ψ(t) is the mother wavelet.
(a) Constant Bandwidth (STFT)   (b) Constant Relative Bandwidth (CWT)
Notice that we do not have a frequency parameter, as we had with the STFT (EQ 5, page 7); instead, we have a scale parameter, which is defined as 1/frequency.
Robi Polikar made the following analogy:
"The scale parameter in the wavelet analysis is similar to the scale used in maps. As is the case of maps, high scales correspond to a non-detailed global view (of the signal), and low scales correspond to a detailed view." [P 1996]
3.2 Computation of the CWT
This section explains the formula given above and shows some applications.
Let x(t) be the signal that is to be analysed. Firstly, you need to choose a wavelet to act as the analysing window. There are several candidates (Morlet, Sombrero, Daubechies), which are all derived from a mother wavelet. Once the wavelet is chosen, the computation starts at s = 1 and the CWT is computed for all values of s, smaller and larger than 1. It is conventional for the value of s (the scale) to start at 1, but this doesn't always have to be the case. The procedure then continues for increasing values of s, i.e. the analysis starts from high frequencies and proceeds towards low frequencies. As the value of s increases, the wavelet dilates, so the first value of s corresponds to the most compressed wavelet.
The wavelet is placed at the beginning of the signal, at the point corresponding to t = 0. The wavelet function at scale 1 is multiplied by the signal and then integrated. The result of this integration is then multiplied by the constant 1/√s. This is for normalisation purposes only, so that the transformed signal will have the same energy at every scale. The final result is the value of the CWT at time zero (t = 0) and scale s = 1 in the time-scale plane.
The value of the transform is calculated every time the wavelet is shifted towards the right by τ. Therefore the values of the CWT are obtained at t = 0, t = τ, t = 2τ, etc., with scale s = 1, as the wavelet is shifted. This procedure repeats until the wavelet reaches the end of the signal; one row of points on the time-scale plane is then completed. Sections 3.3 and 3.4 show how these rows are represented.
s is then increased by a small value and the above procedure is repeated for every value of s,
where each value of s fills the corresponding row of the time-scale plane.
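The step-by-step procedure just described translates almost directly into code. The sketch below approximates EQ 6 with a Riemann sum; it is written in Python rather than the project's MATLAB, the sombrero wavelet is used because it is real-valued and has a simple closed form, and all names are illustrative. A constant signal gives coefficients near zero (the wavelet has zero mean), while a Gaussian bump gives a large coefficient at a matching shift and scale.

```python
import math

def sombrero(t):
    """Mexican hat ("sombrero") wavelet: (1 - t^2) * exp(-t^2 / 2), real-valued."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)

def naive_cwt(x, dt, scales, shifts, psi=sombrero):
    """Riemann-sum approximation of
    CWT(tau, s) = (1 / sqrt(s)) * integral of x(t) * psi((t - tau) / s) dt
    for a signal sampled as x[n] = x(n * dt)."""
    rows = []
    for s in scales:
        row = []
        for tau in shifts:
            acc = sum(x[n] * psi((n * dt - tau) / s) for n in range(len(x)))
            row.append(acc * dt / math.sqrt(s))   # 1/sqrt(s) normalisation
        rows.append(row)        # one completed row of the time-scale plane
    return rows

dt = 0.01
ts = [n * dt for n in range(2001)]                     # t from 0 to 20
constant = [1.0 for t in ts]
bump = [math.exp(-((t - 10.0) ** 2) / 2.0) for t in ts]

flat = naive_cwt(constant, dt, [1.0], [10.0])[0][0]
peak = naive_cwt(bump, dt, [1.0], [10.0])[0][0]
print(abs(flat) < 1e-6)                              # True: no detail in a constant
print(abs(peak - math.sqrt(math.pi) / 2) < 1e-3)     # True: bump matches the wavelet
```

This brute-force double loop is exactly why the CWT is slow in practice, a point taken up again in chapter 4.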
Figure 3.2.1 shows the CWT process with s = 1. The wavelet is the Morlet wavelet and is shown in yellow. Here, t represents the value of time where the centre of the wavelet is positioned.
Figure 3.2.2 shows the CWT process with s = 5.
Figure 3.2.1: s = 1; a) t = 2, b) t = 40, c) t = 90, d) t = 140
Figure 3.2.2: s = 5; a) t = 2, b) t = 40, c) t = 90, d) t = 140
3.3 Visualising the CWT - 3D Plot
Section 3.2 showed how the CWT is calculated by moving the wavelet window along the signal at different wavelet scales. Each time the scale is increased, a new row is added to a matrix. The matrix produced by the CWT process has the following dimensions:
x-axis: translation (depends on the value of τ)
y-axis: the number of different values of s used
Figure 3.3.1 shows a typical plot of a CWT.
Figure 3.3.1
As described earlier, the scale parameter s in equation 6 is actually the inverse of frequency. In other words, frequency decreases as scale increases, so the portion of the graph in figure 3.3.1 with scales around zero actually corresponds to the highest frequencies in the analysis.
3.4 Visualising the CWT - Scalograms
The Scalogram is a very common tool in signal analysis, as it provides a distribution of the energy of the signal in the time-scale plane. Olivier Rioul and Martin Vetterli recognised that the CWT is isometric and therefore preserves energy. They proved this with the following formula:

∫∫ |CWT_x^ψ(τ, s)|² (dτ ds / s²) = E_x   [R 1991]

where E_x = ∫ |x(t)|² dt is the energy of the signal x(t).
This result led to the definition of the scalogram as the squared modulus of the CWT. Figure 3.4.1 shows an example of a typical scalogram.
Figure 3.4.1
The Scalogram is the visual representation used in the Wavelet Learning Tool being developed as part of this dissertation.
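Given a matrix of CWT coefficients, producing the scalogram is a one-line operation, since each entry is simply the squared modulus of the corresponding coefficient. A minimal Python sketch (the names and the small coefficient matrix are illustrative, not taken from the tool):

```python
def scalogram(cwt_rows):
    """Squared modulus of each (possibly complex) CWT coefficient."""
    return [[abs(c) ** 2 for c in row] for row in cwt_rows]

# Hypothetical 2 x 2 coefficient matrix: rows are scales, columns are shifts.
coefficients = [[3 + 4j, 1 + 0j],
                [0 + 2j, -2 + 0j]]
print(scalogram(coefficients))   # [[25.0, 1.0], [4.0, 4.0]]
```

Because the squared modulus discards phase, the scalogram shows where the signal's energy lies in the time-scale plane but cannot by itself be inverted back to the signal.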
4 : The Discrete Wavelet Transform
4.1 Why not use the CWT?
As described in section 3, a signal can be transformed from the time domain to the frequency domain using the Continuous Wavelet Transform (CWT) while reducing the loss of time and frequency resolution. That section explained how the CWT is calculated by changing the shape of the wavelet, which acts as the analysis window, for each analysis frequency. The wavelet's shape is governed by the scale parameter s, where a larger s represents a more dilated wavelet. The wavelet moves along the signal at each scale, and the CWT coefficient is calculated at each step; the size of the steps is governed by the parameter τ.
The s and τ parameters are continuous, i.e. they can take any value, and hence the transform is called the Continuous Wavelet Transform. Because these parameters are continuous, the CWT is not well suited to computer implementation [A 1996]. The Wavelet Learning Tool being developed alongside this dissertation does use the CWT, to show how the process works and how the scalograms are produced, but it is not the quickest or most practical transform to use. The tool will only allow small values of s and τ; larger values (i.e. a signal with too many samples) increase computation time dramatically.
4.2 Discretizing the Continuous Wavelet Transform
In the Continuous Wavelet Transform, the wavelet coefficients were calculated using equation 6. As mentioned above, the CWT can't practically be computed, because it contains an integral over continuous variables. It is therefore necessary to discretize the transform.
The most intuitive way of doing this is simply to sample the time-frequency plane. With most transforms, the natural choice would be to sample the plane at a uniform sampling rate, but in the case of wavelet transforms the change of scale can be used to reduce the sampling rate. Nyquist developed the following rule, which explains the reason for using scale to reduce the sampling rate.
Nyquist's Sampling Theorem: "If the range of frequencies of a signal measured in cycles per second is n, then the signal can be represented with complete accuracy by measuring its amplitude 2n times a second." [H 1995]
This theorem describes how a curve with a finite number of frequencies can be represented
exactly by a finite number of samples. Usually you would need an infinite number of samples
in order to represent the curve exactly.
Nyquist's Sampling Theorem can be interpreted as follows: if the time-scale plane needs to be sampled with a sampling rate of n1 at scale s1, the same plane can be sampled with a sampling rate of n2 at scale s2, where s1 < s2 (corresponding to frequencies f1 > f2) and n2 < n1.
Figure 4.2.1: Dyadic Sampling Grid [V 1995]
The Dyadic Sampling Grid shown in figure 4.2.1 is a pictorial representation of the relationship between sampling frequency and scale. As scale increases down the graph, the frequency being analysed decreases. Nyquist's rule says that the further you go down the graph, the lower the sampling rate that is needed. In the figure, the sampling rate is represented by the dots: the more dots, the higher the sampling rate. Each dot corresponds to a wavelet coefficient calculated using the Continuous Wavelet Transform. The larger the scale parameter, the fewer the coefficients that are needed, and therefore the quicker the computation.
You could think of the area covered by the axes as the entire time-scale plane. The CWT assigns a value to the continuum of points on this plane, so there are obviously an infinite number of CWT coefficients. Considering the discretization of the scale axis first: among the infinite number of points, only a finite number will actually be calculated, using a logarithmic rule. The base of the logarithm depends on the application, but the most common is 2 because of its convenience; an application called Subband Coding (see section 4.3) uses this base. If base 2 is chosen, only the values 2, 4, 8, 16, 32, etc. are used for the scale parameter. The time axis is then discretized according to the discretization of the scale axis: since the discrete scale changes by a factor of 2, the sampling rate of the time axis is reduced by a factor of 2 at every scale. You can see at each stage of the Dyadic Grid that the sample rate is halved. As a consequence, the Discrete Wavelet Transform uses only wavelets whose scale is of the form s = 2^j, where j is a whole number (see the formula for the mother wavelet in section 2.2).
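The base-2 discretization just described can be made concrete with a small sketch (Python; the function name is illustrative). For each level j the scale is 2^j and, by Nyquist's rule, the number of time positions that need to be sampled at that scale halves:

```python
def dyadic_grid(n_samples, levels):
    """For each level j = 1..levels, the scale is 2**j and the number of
    time positions sampled at that scale halves (Nyquist allows this)."""
    grid = []
    remaining = n_samples
    for j in range(1, levels + 1):
        remaining //= 2
        grid.append((2 ** j, remaining))   # (scale, coefficients at that scale)
    return grid

print(dyadic_grid(1024, 4))  # [(2, 512), (4, 256), (8, 128), (16, 64)]
```

Summing the coefficient counts over all levels, plus the final coarse approximation, gives back exactly n_samples, which is why the DWT can represent the signal without redundancy.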
4.3 Subband Coding
There are two well-documented methods for calculating the Discrete Wavelet Transform based on the ideas expressed in section 4.2: the Multiresolution Pyramid and Subband Coding. This section gives a detailed explanation of the latter of the two, which will also be used as part of a second piece of software showing the visual effects the method has on a signal.
Driven by applications such as speech and image compression, a method called Subband Coding was proposed by Croisier, Esteban and Galand in the late 1970s, using a special class of filters called quadrature mirror filters. [V 1995]
The Subband coding scheme, first popularised in speech compression, uses a combination of
high-pass and low-pass filters to reduce the sample rate of the transform. Filters of different
cut-off frequencies are used to analyse the signal at different scales. The whole Subband
process consists of a series of these filters known as a filter bank. High-pass filters are used to
analyse the high frequencies in the signal, and the signal is passed through a series of low-pass
filters to analyse the low frequencies.
The resolution of the signal, which is a measure of the amount of detail information in the
signal, is changed by the filtering operations, and the scale is changed by downsampling (sub-
sampling) operations. Sub-sampling a signal corresponds to reducing the sampling rate, which
is equivalent to removing some of the samples of the signal. For example, subsampling by two
refers to dropping every other sample of the signal (see figure 4.2.1). Subsampling by a factor
n reduces the number of samples in the signal n times.
Figure 4.3.1: The Subband Coding scheme shown as a filter bank tree [R 1991].
h(n): high-pass filter; g(n): low-pass filter; ↓2: subsampling by 2.
The procedure starts by creating a low-pass filtered version of the signal, by passing the signal through a half-band digital low-pass filter. This is done by convolving the signal with an impulse response function g[n], which represents the low-pass filter (see figure 4.3.1). A half-band low-pass filter eliminates exactly half the frequencies, those in the upper half of the frequency range. For example, if a signal has a maximum component of 1000 Hz, then half-band low-pass filtering removes all the frequencies above 500 Hz.
There is an important point to consider when talking about frequency in the discrete case, explained as follows.
In discrete signals, frequency is usually expressed in terms of radians. As a result, the sampling frequency of the signal is equal to 2π radians in terms of radial frequency. Therefore, the highest frequency component that exists in a signal will be π radians, if the signal is sampled at the Nyquist rate (which is twice the maximum frequency that exists in the signal, see page 17); that is, the Nyquist rate corresponds to π rad/s in the discrete frequency domain. Therefore using Hz is not appropriate for discrete signals. However, Hz is used whenever it is needed to clarify a discussion, since it is very common to think of frequency in terms of Hz. It should always be remembered that the unit of frequency for discrete-time signals is radians.
After passing the signal through a half-band low-pass filter, half of the samples can be eliminated, according to Nyquist's rule, since the signal now has a highest frequency of π/2 radians instead of π radians. Simply discarding every other sample subsamples the signal by two, and the signal will then have half the number of data points. The low-pass filtering removes the high-frequency information but leaves the scale unchanged; only the subsampling process changes the scale (see figure 4.3.2). Resolution, on the other hand, is related to the amount of information in the signal, and is therefore affected by the filtering operations. Half-band low-pass filtering removes half of the frequencies, which can be
interpreted as losing half of the information. Therefore, the resolution is halved after the filtering operation. Half the samples can then be discarded without any loss of information. Basically, the low-pass filtering halves the resolution but leaves the scale unchanged. The signal is then subsampled by 2, since half of the samples are redundant; this doubles the scale.
This completes one level of the Subband decomposition, which can be repeated for further decomposition. At every level, the filtering and subsampling result in half the number of samples (and hence half the time resolution) and half the frequency band spanned (and hence double the frequency resolution).
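One level of this decomposition can be sketched with the shortest possible filter pair. This is a hedged illustration: the two-tap (Haar) filters below stand in for the longer quadrature mirror filters a real subband coder would use, and all names are made up. The check at the end confirms the claim above that filtering and subsampling lose no information; here, energy is preserved across the two half-rate bands.

```python
import math

# Illustrative 2-tap (Haar) half-band filter pair; practical subband coders use
# longer quadrature mirror filter pairs, but the structure is the same.
G = [1 / math.sqrt(2),  1 / math.sqrt(2)]   # g(n): low-pass
H = [1 / math.sqrt(2), -1 / math.sqrt(2)]   # h(n): high-pass

def convolve(x, f):
    """Full linear convolution of sequence x with filter f."""
    y = [0.0] * (len(x) + len(f) - 1)
    for n, xn in enumerate(x):
        for m, fm in enumerate(f):
            y[n + m] += xn * fm
    return y

def subband_level(x):
    """One level: filter with g(n) and h(n), then subsample each output by 2."""
    low  = convolve(x, G)[1::2]   # keep every other sample (drop the rest)
    high = convolve(x, H)[1::2]
    return low, high

low, high = subband_level([1.0, 2.0, 3.0, 4.0])
# The two half-length bands together carry all of the input's energy.
energy = sum(v * v for v in low) + sum(v * v for v in high)
print(round(energy, 9))  # 30.0  (= 1 + 4 + 9 + 16)
```

From the Haar pair the original samples can also be reconstructed exactly as (low + high)/√2 and (low − high)/√2, which is the basis of the inverse transform.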
Figure 4.3.2: Resolution and scale changes in discrete time
4.4 Example of Subband Coding
We have shown how to decompose a sequence into two sub-sequences at half rate by using a bank of half-band filters. This process can be iterated on the sequence from the lower band to achieve finer frequency resolution at lower frequencies. Repeating the process once on the first low band creates a new low band, which corresponds to the lower quarter of the frequency spectrum. Each further iteration halves the frequency band of the signal.
Figure 4.4.1 shows the result of applying a signal to the Subband Coding scheme; each stage shows how the signal is subsampled and how the frequency band is reduced by half.
The signal in blue is a sound wave made up of two tones produced by someone whistling. The first region is a low tone and the second is a distinctly higher tone. Its scalogram image, produced by the Wavelet Learning Tool being developed alongside this dissertation, is also given below. This was computed using the Morlet wavelet. It clearly shows the two distinct frequencies.
Figure 4.4.1: Outputs from the Subband Coding scheme. Input: 4000 samples; high-pass filter 1: 2000 samples, f ∈ (π/2, π); high-pass filter 2: 1000 samples, f ∈ (π/4, π/2); high-pass filter 3: 500 samples, f ∈ (π/8, π/4); high-pass filter 4: 250 samples, f ∈ (π/16, π/8).
Figure 4.4.1 shows the results from each stage (iteration) of the Subband Coding scheme. The example given is of a signal comprised of a low-frequency tone followed by a high-frequency tone. At each stage of the scheme the signal has been filtered and then down-sampled, and the outputs show the signal at the different stages. It is clearly visible that at the start of the process only the high-frequency components are present but, as the process goes on, each successive output covers a lower and lower frequency band until only the very low frequencies are left. This is demonstrated by the fact that the high-frequency tone has been completely filtered out after 4 filters.
The output signals given in figure 4.4.1 are taken from the Subband Learning Tool developed alongside this dissertation.
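The stage-by-stage halving shown in figure 4.4.1 (4000 samples down to 250) can be reproduced by iterating the low-pass branch of the filter bank. The following is a hedged Python sketch: pairwise averages and differences stand in for real half-band filters, and all names are illustrative.

```python
def iterate_subbands(x, levels):
    """Iterate the low-pass branch; collect the high-pass (detail) output of
    each level, as in the filter bank tree of the Subband Coding scheme."""
    details = []
    for _ in range(levels):
        # Simple 2-tap half-band split followed by subsampling by 2.
        low  = [(x[2 * i] + x[2 * i + 1]) / 2.0 for i in range(len(x) // 2)]
        high = [(x[2 * i] - x[2 * i + 1]) / 2.0 for i in range(len(x) // 2)]
        details.append(high)
        x = low                    # only the low band is decomposed further
    return details, x

signal = [float(i % 7) for i in range(4000)]   # any 4000-sample signal
details, final_low = iterate_subbands(signal, 4)
print([len(d) for d in details])   # [2000, 1000, 500, 250]
print(len(final_low))              # 250
```

The lengths of the detail sequences match the sample counts labelled on the successive stages of figure 4.4.1.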
5 : Computer Assisted Learning Tools
5.1 Computer Assisted Learning (CAL)
As this dissertation researches the fairly new field of Wavelet Theory, a software package is being developed to aid understanding of the processes involved. The fact that there hasn't been much software development in this field gives all the more reason to develop a package now, which can be used to complement the teaching of the subject. Generally speaking, the software package being developed belongs to the family of Computer Assisted Learning (CAL) tools. This chapter discusses the advantages and disadvantages of using CAL and whether a CAL tool is appropriate in this situation.
Computer Assisted Learning means (in a broad sense) using computers in education for all
kinds of purposes. [ICASSP vol2 1995]
When constructing CAL tools, it has been recognised that they should provide flexibility for
the student involved and also be stimulating enough so that the student can construct private
concepts rather than reproducing given explanations. [K 1996]
The advantage of having such a tool is that users can control their own access to the information being taught. In this way, the flexibility of the system allows students to adapt the available information streams to their mental needs at any given moment. The disadvantage of this free access to the CAL tool is that lecturers have no control over the flexibility of the system. Lecturers need to anticipate how much the students will take advantage of this free access, which may lead to students not fully understanding the basic concepts that the CAL tool was developed for in the first place. Because of this flexibility problem, CAL tools cannot be used on their own as a method for teaching a new subject, but there is plenty of evidence below to suggest that there are great advantages in using a CAL tool alongside a series of lectures.
The lecturers Martin Cooke and Guy Brown have researched possible situations where a CAL tool is useful in teaching speech and hearing, and may therefore aid in the teaching of Wavelets. Their development of the Matlab Auditory Demos (MAD), to which the Wavelet Learning Tool being developed as part of this dissertation would contribute, has given the two authors a deep understanding of when a CAL tool is appropriate.
They initially recognised the following problem:
"The courses in speech and hearing typically introduce large amounts of unfamiliar material to participants with backgrounds almost as varied…the domains of speech and hearing involve intangible signals, ill-suited to traditional styles of presentation…the possibilities for misinterpretation are immense and, in our experience, difficult to predict" [C 1999]
Recognising this problem, it was clear that CAL tools would be an appropriate solution, due to
the scope for interaction and experimentation.
Matti Karjalainen and Martti Rahkila also recognised the fruitfulness of using a CAL tool in teaching Signal Processing. They also understood that a CAL tool must be in some way more useful than ordinary teaching methods. [ICASSP vol2 1995]
Kommers, Grabinger and Dunlap recognised that there are 3 main areas of most CAL tools where there are significant advantages to learning: Resource, Communication and Exploration. [K 1996]
Resource: Paper-based documents are restricted to text, tables, schematic line drawings and pictures, whereas hypermedia allows sound and video sequences as well. CAL tools can provide multiple dimensions in the meanings of expressed ideas, e.g. hearing a property of a sound wave provides a better and more natural understanding than a picture of the sound wave.
However, it is important that the resources provided by the tool match those provided by the lecturer. If the tool and the lectures use the same word for different meanings, the learning experience is weakened.
Communication: This is based on the idea that a system should be programmed so that a dialogue can evolve between the machine and the learner. The actual bandwidth of communication in a CAL tool is very low and will probably not feature greatly in the tools being developed. However, a user guide will be developed to help users use the tools to best effect (see Appendix B).
Exploration: Computer simulation programs are themselves convincing demonstrations of their educational value. Confronting a student with a simulation allows more drastic, flexible and critical manipulations. In a book, only a few examples may be given of a particular property, but with a CAL tool, explorations into many other instances of that property can be achieved. The exploration property allows students to learn by discovery.
The CAL tools for this dissertation are developed with the teaching of the subject in mind, and with a view to how they can complement and reinforce a lecture course by providing hands-on experience. The software has to be fairly easy to use, to avoid the early frustrations of an inexperienced user, and must also be visually suitable, making it clear what is happening. Just looking at the previous chapters demonstrates how mathematical the theory of Wavelets is, and it may seem very demanding to a computer science student, who may or may not have a good mathematical background. The tool will help students to see visually what the maths represents and enhance a student's willingness to learn more about the subject, instead of being intimidated by the mathematical content. After all, the actual use of Wavelets is to aid Signal Processing, so audio and visual demonstrations are an obvious key to teaching the subject.
5.2 Which programming language?
There are many different programming languages in which a learning tool could be developed, and they all have their advantages and disadvantages, so it is sometimes difficult to choose which one to use without seeing the benefits.
Matti Karjalainen and Martti Rahkila constructed a CAL tool using the QuickSig object-oriented environment. This provides signal-processing tools for many application domains, using the concept that signals and related concepts are represented as objects and the operations on them are typically implemented by method functions. Another advantage is that a wide range of functions, such as filtering and transforms (like the FFT), are built in. However, this programming environment has a problem with portability: QuickSig is Macintosh-specific and also requires the languages Lisp and CLOS. QuickSig is therefore ill-suited to large class sizes, and students would not be able to run the software on their home machines.
The CAL tools being developed in this dissertation use MATLAB.
MATLAB is a high-level programming language which provides many facilities for data visualisation and numerical computation. It doesn't have the problem with portability, as a version of MATLAB is available for most operating systems. MATLAB lends itself to prototype programming, as it provides good facilities for quick interface construction. It also has high-level support for sound handling and signal processing. As it is a mathematical language, it lends itself to the use of vectors and matrices and makes it very straightforward to plot graphs of signals. Martin Cooke and Guy Brown recognised that MATLAB is a sensible choice compared to languages such as Java. They noted that an application in Java would be time-consuming to build, because the Java application programming interface (API) has no equivalent of the signal processing toolbox available in MATLAB. A Java application for signal processing would probably be too slow for adequate user interaction.
6 : Requirements Analysis
The previous chapter introduced the notion of a Computer Assisted Learning tool and how such tools can be used to assist in the teaching of Wavelet Theory. This dissertation will now focus on the development of such a tool.
6.1 The initial requirement
At the very beginning, before any work was carried out, there was a short brief on what the project should cover. This brief also contained a short paragraph on the basic requirements of the system that was to be developed:
"A MATLAB application will be designed and implemented that allows a wavelet representation of sound to be generated using the DWT and modified by direct on-screen manipulation (e.g. by removing components at certain scales); the inverse DWT will then be applied to resynthesize a sound waveform which can be played to the listener."
This statement details the basic ideas of the system and was considered to be its backbone. It is clear that the system needs to include a function to compute the DWT, and also the inverse DWT (IDWT), and some user interaction to alter the scales on screen before the IDWT is calculated. There are no clues as to how the GUI should look, or how the user interacts with it. Even though the initial requirement states that the DWT should be calculated, there was no obvious indication of how the results of these calculations were to be displayed. Also, which wavelet is to be used to calculate the DWT?
All of these questions needed to be answered, so an interview was set up between myself (the developer) and the client, to attempt to make clear what is needed.
6.2 The Client and Developer scenario
Before starting the design of any software system, it is important to have all the requirements clear first. The best way to do this is for the developer to ask the client a multitude of questions, in order to extract important information about the system that the client may not have previously given.
In any software development programme, the client will have a thorough understanding of the problem, as he is the one with expertise in the subject. It is all too common for clients to assume that a software developer is also an expert in their particular field of work. However, this is almost never the case: the software developer will have a limited understanding of the problem and may be confused by the client's initial requirements. It may be that the initial requirements are vague or even impossible to implement, so the idea of the interview is to explore and develop the client's narrow goals and to fill in any gaps in understanding.
The interview session took place in the first week of the project, and some of the questions that were raised are detailed below. Most of the questions were a result of studying the initial requirement and of early research into Wavelet Theory before the project began. The answers given are not direct quotes, but they summarise the discussion.
Q. Do you want a visual representation of the effects that different wavelets have?
This two-part question asks the client to clarify how the result of the DWT should be presented, and to find out whether the user should be able to compare the effects of different wavelets.
A. The result of the wavelet transform process should be represented by a scalogram, so the user can see all the coefficients calculated on a time-frequency axis. This scalogram can then be manipulated ready for the inverse transform.
The user should be able to choose from a selection of wavelets so that they can compare the different effects they have on the process. A good idea would be to have a side-by-side comparison.
After the discussion, it was realised that the answer given by the client to this question contradicted the initial requirement, and also the texts that had been studied. In the requirements, it was clear that the system should use a Discrete Wavelet Transform to analyse the signal. However, the answer to the question suggested that the result of the transform should be presented visually using a scalogram. Section 3.4 explains that a scalogram is calculated by taking a Continuous Wavelet Transform matrix and taking the squared magnitudes of the coefficients. This means that the requirements should be changed to use the CWT and not the DWT. After discussion, this became the new requirement, and it was agreed that another piece of software would be developed to show how the DWT could be calculated using Subband Coding.
Q. Do you want a visual breakdown of the CWT process?
After recognising that it was in fact the CWT that would be used in the software, this question approached the subject of using animation to picture the CWT process, as well as the scalogram showing the results it gives.
A. Yes. Showing the wavelet as it compresses or dilates to calculate each scale of the CWT would be beneficial.
Q. How do you want the on-screen manipulation to work?
The initial requirement suggested that the user should be able to use on-screen manipulations to remove scales from the CWT in order to re-compute the inverse transform. However, it was unclear how this could be done, especially in Matlab, where graphics are more limited than in, say, Java.
A. It was suggested that the user could directly manipulate the scalogram by using some sort of
cursor. The cursor could move up and down the scalogram and a window would show the
CWT coefficients at that scale. The cursor could then be used to select certain scales to be
removed.
Q. Would you like to be able to play back the sound waves?
A. Yes. You need to be able to play the original sound wave that is to be analysed, and also
you need to hear the effects of the re-synthesis of the scalogram using the ICWT.
Q. Do you want to be able to actively control the scaling and translation coefficients of the wavelet used?
The CWT coefficients are calculated at every scale by using the scaling coefficient in the mother wavelet formula (see section 2.2). The number of rows in the CWT matrix depends on the number of scales calculated, while the number of columns depends on the size of the steps the wavelet takes as it moves along the signal between each coefficient calculation. These steps are controlled by the translation coefficient. Altering the values that these coefficients take will change the number of CWT coefficients to be calculated, which may be a useful property.
A. Yes. You could use a slider.
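As a rough numerical illustration of this relation, the matrix dimensions follow directly from the two step sizes. All names in the sketch below are hypothetical and do not come from the actual system.

```matlab
% Hypothetical sketch: how the scaling and translation step sizes
% determine the size of the CWT coefficient matrix.
signal    = rand(1, 1000);                % a 1000-sample test signal
maxscale  = 64;                           % largest scale to compute
scalestep = 2;                            % scales 1, 3, 5, ..., 63
transstep = 4;                            % wavelet moves 4 samples per step
nrows = numel(1:scalestep:maxscale);      % one row per scale computed
ncols = floor(length(signal)/transstep);  % one column per translation step
% Coarser steps shrink the matrix; finer steps give more coefficients.
```

Halving either step size doubles the corresponding dimension, which is why exposing both coefficients (e.g. via sliders) gives a direct trade-off between speed and resolution.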
6.3 The Matlab Auditory Demos
The software tools being developed alongside this dissertation will form part of the Matlab Auditory Demo (MAD) CAL tools. These tools have been developed to aid in the teaching of Computer Speech and Hearing courses at the University of Sheffield [C 1999]. Because these MADs are Computer Assisted Learning tools, they need to fulfil the requirements mentioned in chapter 5. The MADs consist of many different tools that enable a user to understand many different concepts in the subject of Computer Speech and Hearing. They vary from producing spectrogram representations of speech waveforms to complex modelling of the Basilar Membrane in the ear.
Even though there is a wide variety of software systems comprising the MADs, they all have
many things in common that enable them to be successful CAL tools. The research into the
MADs produced the following further requirements.
Speed
The software tools that are being developed must be quick enough to allow sufficient user interaction and provide meaningful animation. A quick system will maintain the user's interest, which will naturally enhance the user's understanding and learning of Wavelets.
Ease of use
Research into the MADs showed that all the systems were user-friendly, in such a way that the user could see almost straight away what functionality was available. GUI objects are clearly labelled and are only accessible at the appropriate times. Axes are labelled appropriately, and on-screen instructions appear when appropriate to guide the user. None of the systems are cluttered with too many buttons, sliders, etc., which would only confuse a new user, especially one who is new to the subject being investigated.
Aesthetics
It is important that the interface is pleasing to look at. If the system is dull in appearance, the user will not be as interested in using it, so the teaching of the subject would suffer. The system should provide a suitable amount of colour to aid the user's understanding; for example, the system could be colour coded so that different components have their own colour. Also, the tools being developed should match the appearance of the MADs so that it is clear that they belong to that group. The tools should have similar headers, size, background colour, etc.
Input error recovery
It is important that a CAL tool recovers well from input error. Ideally, the system shouldn't allow a wrong sequence of inputs, i.e. by disabling GUI objects, but it is often the case that a user will input something wrong, either by mistake or through a lack of understanding of the material or the system. It is here that the system should recognise that a user has entered a wrong value, and correct the error appropriately.
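A minimal sketch of this kind of recovery for a numeric edit box might look as follows; the handle name editbox and the legal range are assumptions for illustration, not part of the actual system.

```matlab
% Hypothetical sketch: recovering from a bad value typed into an edit box.
maxscale = 64;                             % assumed legal upper bound
val = str2double(get(editbox, 'String'));  % read what the user typed
if isnan(val)
    val = 1;                               % non-numeric input: fall back
end
val = min(max(round(val), 1), maxscale);   % clamp into the legal range
set(editbox, 'String', num2str(val));      % show the corrected value
```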
7 : Software Development
After the Requirements Analysis, it was proposed that two pieces of software would be developed. The main one, called the Wavelet Learning Tool (WLT), will be developed and documented in detail and will comprise all the requirements set out in the previous chapter. The second piece of software will be a basic tool, called the Subband Learning Tool (SLT), which will show the different stages in the Subband Coding scheme.
7.1 Getting familiar with MATLAB
In simplest terms, MATLAB is a computer environment for performing calculations [R 1998]. MATLAB is a contraction of Matrix Laboratory, and it is primarily used as a convenient tool for the manipulation of matrices. Since it was first created, it has gained more and more functionality and remains a leading tool for scientific computation. While simple problems can be solved interactively with MATLAB, its real power shows when you give it calculations that are extremely cumbersome or tedious to do by hand. Because the Continuous Wavelet Transform involves constructing a very large matrix, and then displaying that matrix, MATLAB seemed to be a natural choice.
MATLAB also allows graphics to be displayed easily and with little programming, as it combines an efficient programming structure with a multitude of pre-defined mathematical commands. Therefore, before any software development, it was beneficial to become familiar with what MATLAB has to offer in terms of commands and interface construction.
7.1.1 MATLABs GUI
Matlab provides many different GUI objects to make user interaction as easy as possible. Each object has its own advantages depending on the type of operation. Below is a comprehensive list of what is available.
Push buttons: the software user can press buttons for instant execution of a particular function.
Pop-up menus: contain a list from which one item can be selected, e.g. a wavelet type. After selection, a process can be initiated.
Edit boxes: used to alter a numerical parameter in a function.
Check boxes: often used when there is an option to have a particular property in the system or not.
Radio buttons: similar to check boxes, except they usually come in pairs where you select one or the other.
Sliders: also used to alter a parameter in a function, but unlike edit boxes, a value doesn't need to be known and the function can update in real time as the slider moves.
List boxes: similar to pop-up menus, except that the whole of the list can't be visualised at once, to save space.
Matlab also allows the construction of axes, lines, text, figures and other graphical devices, which were taken into consideration when sketching the initial interface design.
Each GUI object has attached to it a series of handles. These handles define many properties of the object, which are set and controlled by the developer. The properties are set when the software is first executed, but can be changed at any time while it runs. Controlling the handles of the GUI objects controls how the system will look, how it will behave, and how efficiently it will run. Therefore, a good understanding of the available handles and how to use them appropriately was an important stage in software development.
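For example, a push button's properties can be set at creation and changed later through its handle. This fragment is a generic illustration of the mechanism, not code from the WLT itself.

```matlab
% Hypothetical sketch: creating a GUI object and controlling its handle.
fig = figure('Name', 'Demo');
btn = uicontrol(fig, 'Style', 'pushbutton', ...
                'String', 'Process', ...
                'Position', [20 20 80 30], ...
                'Callback', 'disp(''processing...'')');
% Properties can be read and changed at any time via the handle:
set(btn, 'Enable', 'off');       % e.g. disable until a signal is loaded
state = get(btn, 'Enable');      % reads back 'off'
```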
7.1.2 Coding in MATLAB
Matlab is a procedural programming language rather than an object-oriented one. Functions call other functions, which are executed in a sequential fashion. Each function is read downwards, unless a for or while loop, or an if clause, is encountered.
The Wavelet Learning Tool will consist of many different functions, which will call each other appropriately. The Matlab system will have to cope with the flow of control between these different functions, which are written in m-files that can be called at any time within a program. However, keeping track of all these m-files can be very difficult and looks very untidy. The solution to this is the case statement. Instead of having many different m-files containing the different functions of the system, it is better practice to place all the functions into one m-file, separated by a case statement.
Figure 7.1.2.1
Every time the m-file containing the switch statement is called, it must be called with an appropriate argument corresponding to which case is to be read.
Another advantage of this case-switching approach is that you can have an initialisation case where all the global parameters are set when the system is first executed.
Matlab programming revolves around designing separate functions to do separate jobs, and then plugging them into the overall system to interact with the interface. This plug-in nature of programming allows each function to be tested independently before being encapsulated into the system. Most of the functions being developed involve matrix manipulation and can be tested simply by running them in the Matlab command window to see if they produce correct results. Some functions need not be tested separately from the system if their sole purpose is just to change global variables or alter the interface in some way. The functions constructed from mathematical knowledge followed the sequence of development shown in figure 7.1.2.2.
[Figure 7.1.2.1: several separate m-files (CWT, LoadSignal, ZoomIn) replaced by a single m-file containing a switch statement, with one case per function (case CWT, case Load Signal, case Zoom in).]
Figure 7.1.2.2
One of the most useful GUI object handles is the enabling handle. This allows the software developer to control when a user can use a particular GUI object. To stop a wrong sequence of inputs into the system, a particular object can be disabled at a time when it shouldn't be used. These restrictions make the software more robust and more likely to recover from user input error.
7.2 Wavelet Learning Tool - Interface Design
The main system to be developed is the one identified in the requirements analysis, the
Wavelet Learning Tool (WLT). This tool will allow users to choose different wavelets to
compute a CWT and corresponding scalogram. A side by side comparison of different
wavelets was recommended, as was a way of manipulating the scalogram to produce a re-
synthesised signal from an Inverse CWT function.
Matlab allows you to construct an interface very simply and effectively with very little code. It
is very beneficial to do interface development in the early stages of development, as it will aid
in the understanding of what functionality is needed in the final system. Knowing which GUI
objects were available, an initial sketch of the interface was made and presented for comment.
Figure 7.2.1
The sketch shown in figure 7.2.1 is an early representation of the proposed look of the software tool. The original idea was to have the following GUI objects.
[Figure 7.1.2.2: the sequence of development for mathematically based functions: mathematical formula; code into Matlab; test separately on the command screen; introduce appropriate interface code and handle commands; plug into the system using the case clause; test the system.]
5 graphs: a plot of the loaded signal; 2 for showing the animated wavelet as it changes shape to calculate the different scales of the scalogram; a plot of the CWT coefficients from the selected scale of a scalogram; and a plot of the reprocessed signal.
2 scalogram images, each with its own cursor.
Pop-up menus for selecting a wavelet.
Zoom controls for zooming in on the plot of CWT coefficients.
Process buttons for re-processing the CWT.
Play buttons for playing back the original and altered signals.
7.3 The coding of The Wavelet Learning Tool
Section 7.2 showed how an initial interface was designed to accommodate the requirements established in the requirements analysis. This initial interface was then coded into Matlab with the help of the Guide tool. The Guide tool is a Matlab interface construction tool that allows a user to code an interface quickly and effectively without the tedious task of setting all the handles of each GUI object. Although the Guide tool writes the interface code for you, it is in a format that didn't suit the interactive nature of the system; therefore it was only used to set the basic handles such as position and colour, whilst other handles such as callbacks were coded by hand.
Having developed the interface first, it was then logical to develop the main functions, which would be called by the user via the interface. There are two main functions: the Continuous Wavelet Transform (CWT) and the Inverse Continuous Wavelet Transform (ICWT).
This section describes the main coding developments that were undergone during the coding of the Wavelet Learning Tool. By no means is this a fully comprehensive description of all the coding, as it does not include full details, e.g. of global parameters or handle properties. See appendix C for the full software development log.
7.3.1 Coding the CWT
The requirements state that the user must be able to compare the effects of using different wavelets with the CWT. The interface contains pop-up menus, which contain a list of the different wavelets available for the computation. The main job of the CWT function is to calculate the CWT coefficients at each scale of the wavelet transform and to place these into a CWT matrix. Chapter 3 described how this matrix can then be used to produce a scalogram of the CWT by squaring each coefficient independently. The scalogram has time on the x-axis and scale on the y-axis.
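The scalogram step described above amounts to one line of matrix arithmetic. The variable names here are assumptions for illustration, not those used in the WLT.

```matlab
% Hypothetical sketch: turning a CWT matrix into a scalogram display.
% cwtmatrix has one row per scale and one column per time step.
scalogram = abs(cwtmatrix).^2;   % square each coefficient independently
imagesc(scalogram);              % time on the x-axis, scale on the y-axis
axis xy;                         % put scale 1 at the bottom of the image
xlabel('Time'); ylabel('Scale');
```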
There are three separate CWT functions, one for each wavelet (Gauss, Morlet and Sombrero). Each calculates the CWT matrix and sets it as a global variable. The CWT matrix is calculated in the usual way, by changing the shape of the wavelet at each scale of the transform. The CWT functions first calculate how many scales to compute, depending on the length of the signal. Recalling from chapter 3, as the scale value increases, the frequency decreases and the wavelet therefore needs to dilate. So that the user can see the final shape of the wavelet, the CWT function calculates the largest scale first and then compresses the wavelet until the scale equals 1. The different shapes of the wavelet are stored in another matrix, which is also set as a global variable for other functions to use.
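A direct CWT of this kind can be sketched as below. This is a generic illustration using a real-valued, Morlet-style wavelet, looping from the largest scale down to 1 as the text describes; the wavelet formula and normalisation are assumptions, not the WLT's exact code.

```matlab
function coeffs = sketchcwt(x, maxscale)
% Hypothetical sketch of a direct CWT: correlate the signal with a
% dilated wavelet at each scale, largest scale first.
n = length(x);
coeffs = zeros(maxscale, n);
for a = maxscale:-1:1                    % compress towards scale = 1
    t = -4*a:4*a;                        % support grows with the scale
    psi = cos(5*t/a) .* exp(-t.^2/(2*a^2)) / sqrt(a);  % dilated wavelet
    coeffs(a, :) = conv(x, fliplr(psi), 'same');       % slide along signal
end
```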
When the user selects a wavelet from a pop-up menu, either the function waveselecta or waveselectb is called, depending on which pop-up menu is used. The waveselecta and waveselectb functions are the main functions in the WLT program. These functions control all the data handling and graphics handling of the system when a new CWT is calculated. Each function controls its own part of the system, either A or B, corresponding to which pop-up menu is selected. The interface is split into A and B to allow for the side-by-side comparisons. Both functions were developed simultaneously, as they contain similar functionality. The functions were carefully developed to provide the following sequence of operations:
The function reads which wavelet has been selected,
then calls the appropriate CWT function.
(The CWT function calculates the CWT matrix and the matrix of wavelet shapes; see above.)
A graph is animated with the different forms that the wavelet takes at each scale of the CWT.
The function then presents the CWT matrix as a scalogram.
The scalogram's cursor is drawn onto scale number 2.
The PlotCWT function is called.
The above list gives a summary of the functionality within the two main functions of the system, namely waveselecta and waveselectb. These functions were the first to be developed after the coding of the CWT function, as there needed to be a way of calling that function with different arguments.
The summary states that, after displaying the scalogram, the main function calls a function named PlotCWT. This function plots a graph of the CWT coefficients from a selected scale of the scalogram. The scale is selected by the cursor and, as the cursor is re-drawn after the construction of a new scalogram, the PlotCWT function needs to be called to update the plot. The PlotCWT function is also called every time the cursor is moved, to update the plot.
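The core of such a plotting function can be pictured as a one-row slice of the CWT matrix; the names below are assumed for illustration.

```matlab
% Hypothetical sketch of PlotCWT: plot the coefficients of the scale
% currently selected by the scalogram cursor.
function plotcwtsketch(cwtmatrix, scale)
plot(cwtmatrix(scale, :));               % one row = one scale over time
xlabel('Time (samples)');
ylabel(sprintf('CWT coefficients at scale %d', scale));
```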