Wavelets for Sound Analysis



Wavelets for Sound Analysis and Re-Synthesis

    Graham Self

2nd May 2001

    Project Supervisor: Dr. Guy Brown

    Second Marker: Dr. Joab Winkler

This report is submitted in partial fulfilment of the requirement for the Bachelor of Science Dual Honours in Computer Science and Mathematics by Graham Self


    Page I

    Declaration

All sentences or passages quoted in this dissertation from other people's work have been specifically acknowledged by clear cross-referencing to author, work and page(s).

    Any illustrations which are not the work of the author of this dissertation have been

    used with the explicit permission of the originator and are specifically acknowledged. I

    understand that failure to do this amounts to plagiarism and will be considered grounds

    for failure in this dissertation and the degree examination as a whole.

    Name:

    Signature:

    Date:


    Page II

    Abstract

This paper first introduces Fourier analysis, the main branch of mathematics that led to the discovery of wavelets and their transforms. It gives a detailed account of the development of the different Fourier transforms and describes the motivation for a different transform arising from the multiresolution problem. Wavelet theory is then introduced as a solution to this problem, together with a detailed account of the Continuous and Discrete Wavelet Transforms.

The paper also documents the development, testing and evaluation of two Computer Assisted Learning tools that students could use to learn about wavelet theory.


    Page III

    Acknowledgments

Thanks go to:

My parents, for being supportive;

Alice, for being there every step of the way;

Guy, for endless proofreading;

And to all those involved in the testing and evaluation.


    Page IV

    Contents

Introduction 1

1 : Fourier Analysis 2
1.1 The Statement that Changed Mathematics 2
1.2 Applications of Fourier Analysis 3
1.3 The Fourier Transform - Finding the frequency content of a signal 3
1.4 The Discrete Fourier Transform 4
1.5 The Fast Fourier Transform 5
1.6 The Time/Frequency Problem 5
1.7 The Short Term Fourier Transform 6
1.8 Technical Definitions for Chapter 1 8

2 : Introduction to Wavelets 11
2.1 Where did they come from? 11
2.2 The Mother Wavelet 12
2.3 Wavelets achieve Multiresolution 12
2.4 How do you create a Wavelet? 13

3 : The Continuous Wavelet Transform 14
3.1 Theory 14
3.2 Computation of the CWT 15
3.3 Visualising the CWT - 3D Plot 16
3.4 Visualising the CWT - Scalograms 16

4 : The Discrete Wavelet Transform 17
4.1 Why not use the CWT? 17
4.2 Discretizing the Continuous Wavelet Transform 17
4.3 Subband Coding 18
4.4 Example of Subband Coding 20

5 : Computer Assisted Learning Tools 22
5.1 Computer Assisted Learning (CAL) 22
5.2 Which programming language? 23

6 : Requirements Analysis 24
6.1 The initial requirement 24
6.2 The Client and Developer scenario 24
6.3 The Matlab Auditory Demos 26

7 : Software Development 27
7.1 Getting familiar with MATLAB 27
7.1.1 MATLAB's GUI 27
7.1.2 Coding in MATLAB 28
7.2 Wavelet Learning Tool - Interface Design 29
7.3 The coding of the Wavelet Learning Tool 30
7.3.1 Coding the CWT 30
7.3.2 Coding the Inverse CWT 32
7.3.3 Input error recovery 32
7.4 Development problems 33
7.4.1 Surf vs Imagesc 33
7.4.2 The disappearing cursor 34


    Page V

    7.5 Development of the Subband Learning Tool 35

    7.6 Screen Shots 35

    7.7 Functionality of the Wavelet Learning Tool 36

8 : Software Testing 38
8.1 Motivation 38
8.2 Functional Testing 39
8.3 Testing the Wavelet Learning Tool 39
8.3.1 Random Testing 40
8.3.2 Testing Structural Synthesis 41
8.3.3 The Category-Partition method 44
8.4 Testing Summary 45

9 : Evaluation 46
9.1 Questionnaire Construction 46
9.2 User Guide 47
9.3 Questionnaire Results 47
9.4 Evaluation Summary 49
9.5 Evaluation of the Subband Learning Tool 49
9.6 Future Work 49

    9.6 Future Work 49

    10: Conclusions 51

    References 52

    Appendix A Questionnaire 53

    Appendix B Wavelet Learning Tool User Guide 55

    Appendix C Software Development Log 57



    Page 1

    Introduction

Wavelets have many historical roots, as they can be applied in many different scientific areas. This dissertation documents the usefulness and importance of wavelets in the frequency analysis of a signal. The main route to the discovery of wavelets was a natural progression from the Short Term Fourier Transform, driven by the need for a better analysis technique. The problem with Fourier transforms is that a frequency analysis can't give both good frequency and good time resolution at the same time. This is known as the multiresolution problem, and it was solved by using wavelets instead of a fixed analysis window. Wavelets have the important property that they don't have to be fixed in size: they are controlled by parameters that compress and dilate the wavelet depending on which frequency in the signal is being analysed. An account of how to create wavelets using the mother wavelet formula, and how they can be applied to signal analysis, is also given in this paper.

The free movement of wavelets was exploited in the Continuous Wavelet Transform to give a detailed representation of the frequencies in a signal. The signal could now be represented with both good time and good frequency resolution, so unlike before, the user could tell not only which frequencies occurred in the signal, but also when they occurred. The transform introduces the notion of a scale, which is equivalent to the inverse of the frequency being analysed. The continuous wavelet transform produces a coefficient matrix, which can be visualised by the use of a scalogram; this is also investigated. The problem with the continuous wavelet transform is that it calculates a coefficient for each sample in the signal at every scale. For long signals this can be time consuming and uneconomical when applying the transform computationally.

This problem was solved by the development of a discrete version of the wavelet transform. The dissertation details the reasons for wanting a discrete version, along with arguments showing how this can be done without loss of data. The Discrete Wavelet Transform is based on the ideas behind Nyquist's Sampling Theorem, and a method for calculating the transform, called Subband Coding, is given.

A large part of this dissertation gives a detailed account of the development, testing and evaluation of two Computer Assisted Learning (CAL) tools. Because wavelet theory is a fairly new topic, lecturers are limited in the ways they can teach it to students who are unfamiliar with the concepts involved. Therefore, CAL tools were developed to aid the teaching of wavelets and to provide an interactive demonstration of what the transforms can do and how they benefit signal analysis. The dissertation provides evidence for the benefits of using CAL tools alongside a series of lectures, and also sets out the requirements for them to be successful.

The main tool developed alongside this dissertation is called the Wavelet Learning Tool. It was developed to demonstrate visually how wavelets are used in the continuous version of the wavelet transform. The user is able to compare the effects of using different wavelets and to see a visual representation of the coefficient matrix produced, called a scalogram. The scalogram effectively shows the signal in the frequency domain. As the title of the dissertation suggests, the user can also re-synthesise a signal from a scalogram that has been altered. The tool allows the user to remove scales from the scalogram and then listen to the effect this has on the overall signal; this is effectively a frequency filtering technique.

The second tool, called the Subband Learning Tool, was developed to show visually the process undergone during Subband Coding, which is the discrete version of the wavelet transform. Subband coding is, in its simplest terms, a series of filtering and sub-sampling operations, and the user can see what effect each stage of the process has on a speech signal.


    Page 2

    1 : Fourier Analysis

    1.1 The Statement that Changed Mathematics

Before 1930, the main branch of mathematics leading to wavelet theory was Fourier analysis, from which wavelets formed as a natural progression. Fourier analysis is a significant discovery and has influenced many different areas of application, including science, mathematics, engineering and, most important of all, signal processing. Joseph Fourier started the Fourier revolution by introducing a simple mathematical statement:

    Any periodic function can be represented as a sum of sines and cosines

This discovery had fundamental importance to the signal processing world, as a complex signal could be visualised as a combination of smaller signals about which we know a great deal. Studying these smaller signals, i.e. sine waves and cosine waves, directly allows you to infer properties of the more complex signal. We will see later how the amplitudes of the sine and cosine waves in a complex signal enable you to calculate the frequencies in the signal.

Fourier's statement can be expressed mathematically as follows.

f(x) = \frac{a_0}{2} + \sum_{k=1}^{\infty} (a_k \cos kx + b_k \sin kx)  [EQ 1]

Where the Fourier coefficients a_0, a_k, b_k are defined as

a_0 = \frac{1}{\pi} \int_0^{2\pi} f(x)\,dx, \quad a_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos(kx)\,dx, \quad b_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin(kx)\,dx

[M 1993]

This is known as the Fourier Series: any curve that periodically repeats itself can be expressed as the sum of perfectly smooth oscillations, i.e. sines and cosines.
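As a concrete sketch (illustrative Python, not part of the original report, whose software was written in MATLAB), the Fourier coefficients of a periodic function can be approximated numerically; a square wave recovers the well-known 4/pi sine amplitudes. The function names are chosen for illustration only.

```python
import math

def fourier_coeffs(f, k, n=20000):
    """Approximate a_k and b_k of a 2*pi-periodic function f with the
    rectangle rule: a_k = (1/pi) * integral of f(x)cos(kx) over one period."""
    h = 2 * math.pi / n
    a = sum(f(i * h) * math.cos(k * i * h) for i in range(n)) * h / math.pi
    b = sum(f(i * h) * math.sin(k * i * h) for i in range(n)) * h / math.pi
    return a, b

# Square wave: +1 on (0, pi), -1 on (pi, 2*pi).
def square(x):
    return 1.0 if (x % (2 * math.pi)) < math.pi else -1.0

a1, b1 = fourier_coeffs(square, 1)
a3, b3 = fourier_coeffs(square, 3)
# The known series is (4/pi)(sin x + sin 3x / 3 + sin 5x / 5 + ...),
# so b1 should come out close to 4/pi and b3 close to 4/(3*pi),
# while the cosine coefficients stay near zero.
print(round(b1, 3), round(b3, 3), round(a1, 3))
```

Summing the first few of these terms reproduces the kind of build-up toward the target waveform sketched in figure 1.1.1.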

Barbara Burke Hubbard described how the Fourier Series can be represented by a process of multiplying various sinusoidal waves (sines and cosines) by certain amplitude coefficients and then shifting them so they either add or cancel [H 1995]. An example of this procedure is shown in figure 1.1.1.

    Figure 1.1.1

a) sin(x), b) sin(x) + sin(2x), c) sin(x) + sin(2x) + sin(3x)

The sinusoidal waves are called basis signals, as they form a basis for any function. They are also the basis for the Fourier Transform (see Section 1.3), which is derived from the property described above. See page 8 for an explanation of a basis.



    Page 3

    1.2 Applications of Fourier Analysis

Not only did the Fourier Series help mathematicians to differentiate difficult functions; it also opened a new door to frequency analysis. Before Fourier, raw signals were always represented in the time domain. The problem with a signal represented in the time domain is that no information, except amplitude, is given at a specific time. It is often more convenient to see what frequencies occur at different time intervals.

As described before, the Fourier Series consists of a combination of sinusoids. If you extract the amplitude of each sinusoidal component you obtain the Fourier Coefficients. As the sinusoids are equally spaced in frequency, i.e. sin x, cos x, sin 2x, cos 2x, ..., knowing these coefficients gives us information on which frequencies are present in the function. As a sound signal is effectively represented as a function, the Fourier Series enables us to extract the frequencies present in the signal.

    This leads us to the Fourier Transform, which converts a signal from the time-domain into the

    frequency-domain.

1.3 The Fourier Transform - Finding the frequency content of a signal

    The Fourier Transform, as introduced above, converts information about a signal in the time-

    domain into a signal in the frequency-domain. It also allows you to go back without loss of

    information.

    Figure 1.3.1

The only frequencies that contribute to the Fourier Series of a periodic function (see 1.1) are the integer multiples of the function's fundamental frequency. This fundamental frequency, also known as the base frequency, is the inverse of the period of the signal. For example, if the signal has a period of 2 ms (= 0.002 s), the fundamental frequency will be 1/0.002 = 500 Hz. The Fourier Transform (FT) also allows certain non-periodic functions (those that decrease fast enough that the area under their graphs is finite) to be converted, i.e. it is still possible to describe such a function in terms of its frequencies. But to do this, you need to compute coefficients for all possible frequencies.

    The Fourier Transform can be derived from the Fourier series, where the coefficients for a and

    b are defined as follows.

a(\xi) = \int f(x) \cos(2\pi\xi x)\,dx \qquad b(\xi) = \int f(x) \sin(2\pi\xi x)\,dx  [H 1995] [EQ 2]

    Since we are now dealing with a non-periodic function, we must consider the interval between

    minus-infinity and infinity.

Transforming into the complex plane

In Fourier analysis, complex numbers make it possible to have a single coefficient for each frequency. You no longer need separate coefficients for the sines and cosines, just one, which carries the information of both. It is calculated using the fact that a complex number z = x + iy can be represented by the point (x, y) in the complex plane, the x-axis representing the real part of the number z and the y-axis representing the imaginary (i) part (see figure 1.3.2). The phase of the complex number is simply the angle that is created when



    Page 4

Figure 1.3.2: The Complex Plane

a line joins the point (0,0) to the point (x, y). The magnitude is the length of this line, \sqrt{x^2 + y^2}, calculated using Pythagoras' Theorem.

To write a Fourier series or transform using complex numbers, you can use the formula

e^{i\theta} = \cos\theta + i\sin\theta

which is derived from the complex plane. The formula for the Fourier Series of a periodic function (equation 1) can now be written

f(x) = \sum_{k=-\infty}^{\infty} c_k e^{ikx}

Where the formulas for the coefficients (equation 2) become

c_k = \frac{1}{2\pi} \int_0^{2\pi} f(x) e^{-ikx}\,dx

Using this knowledge, the formula for the Fourier Transform \hat{f}(\xi) and its inverse f(x) are

\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x) e^{-i\xi x}\,dx  [EQ 3]

f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat{f}(\xi) e^{i\xi x}\,d\xi

    1.4 The Discrete Fourier Transform

    Digital computers are finite machines: any desired computation can only use a finite number of

    operations. No digital computer, then, can deal with real-(or complex)-valued functions of real

    numbers. The Discrete Fourier Transform (DFT) samples the function in order for it to be

represented on a computer. Instead of having f(x) at all x, we have only the values of f at a finite number of points f(x_1), f(x_2), ... For convenience, these are usually sample points at regular intervals of \Delta, say. If you start sampling at x_1 = 0, the sequence of sample points becomes 0, \Delta, 2\Delta, ..., (n-1)\Delta and the sample values are f(0), f(\Delta), f(2\Delta), ..., f((n-1)\Delta). Because the DFT is to enable you to use digital computers in Fourier analysis, we need a finite analogue of the Fourier Transform given in equation 3.

The integral is over all x, and is calculated for every real value of \xi. In the finite analogue, the integral becomes the sum

\sum_{r=0}^{n-1} f(r\Delta)\, e^{-i r \Delta \xi}  [C 1990] [EQ 4]

This is still defined for every real \xi, therefore the equation needs restricting further so that we have a finite number of values of \xi. We use the fact that e^{-ir\Delta\xi}, for fixed \Delta, is periodic in \xi with period 2\pi/\Delta. Thus the range can be restricted to values of \xi between 0 and 2\pi/\Delta. So equation 4 now becomes the formula for the Discrete Fourier Transform (DFT):

D_{n,\Delta} f(2\pi k / (n\Delta)) = \frac{1}{n} \sum_{r=0}^{n-1} f(r\Delta)\, e^{-2\pi i k r / n}
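The DFT sum translates almost directly into code. The following is an illustrative sketch in Python rather than the MATLAB used in the project, keeping the 1/n normalisation used in the formula above:

```python
import cmath
import math

def dft(samples):
    """Naive O(n^2) DFT with 1/n normalisation:
    X[k] = (1/n) * sum over r of x[r] * exp(-2*pi*i*k*r/n)."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * k * r / n)
                for r, x in enumerate(samples)) / n
            for k in range(n)]

# A cosine completing exactly 3 cycles over n samples puts all of its
# energy in bin 3 and its mirrored negative-frequency bin n - 3.
n = 32
signal = [math.cos(2 * math.pi * 3 * r / n) for r in range(n)]
spectrum = dft(signal)
peaks = [k for k, X in enumerate(spectrum) if abs(X) > 0.1]
print(peaks)  # [3, 29]
```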



    Page 5

    1.5 The Fast Fourier Transform

The Fast Fourier Transform is of special interest because it has made the DFT faster and therefore more commonly used. When calculating the DFT, a small number of Fourier coefficients is often adequate for an accurate Fourier Transform, meaning that, with the aid of computers, a Discrete Fourier Transform can be processed quickly. An example of this is the fft command in Matlab, which uses the FFT technique to give a fast, effective result that can then be plotted in the frequency domain.

    Figure 1.5.1 shows the result of using this command.

Figure 1.5.1: Matlab plots of (left) a speech signal and (right) its Fast Fourier Transform

    When the Fourier Transform is plotted, only the amplitude data is shown and the phase is

    discarded.

The Fast Fourier Transform (FFT) was introduced by Cooley and Tukey in the mid 1960s. Its effects have been revolutionary, turning Fourier analysis from merely a mathematical tool into a practical one, as more and more people turned to Fourier transforms as an effective means of frequency analysis. Cooley and Tukey developed an algorithm to speed up the DFT for signals whose length is a power of 2, e.g. 2, 4, 8, 16, 32, 64, ... This is because the method recursively halves the data.

The algorithm is a divide and conquer algorithm that systematically divides the data into two halves, performs the FFT on each half, and then splices the two halves back together.

The FFT cuts down the number of computations needed from n^2 to n log n; the larger the value of n, the more impressive the gain in speed of calculating the FT. Since the Cooley-Tukey algorithm was developed, new FFT algorithms have been discovered in which the signal need not have a length that is a power of 2.
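A minimal sketch of that divide-and-conquer idea (illustrative Python, not the project's code; unnormalised, so it omits the 1/n factor of the DFT formula above):

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of 2.
    Computes the unnormalised sum X[k] = sum_r x[r] * exp(-2*pi*i*k*r/n)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])        # transform of the even-indexed samples
    odd = fft(x[1::2])         # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):    # splice the two halves back together
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

# Check agreement with the direct O(n^2) sum on a small test vector.
xs = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, 2.5, 3.5]
direct = [sum(v * cmath.exp(-2j * cmath.pi * k * r / len(xs))
              for r, v in enumerate(xs)) for k in range(len(xs))]
print(all(abs(a - b) < 1e-9 for a, b in zip(fft(xs), direct)))  # True
```

Each level of recursion does O(n) splicing work and there are log n levels, which is where the n log n count comes from.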

    1.6 The Time/Frequency problem

There is a fundamental problem with the Fourier Transform that led to the search for a more suitable transform, such as the Wavelet Transform. A function and its Fourier Transform are two faces of the same information: the function displays the time information and hides the information about frequencies, while the Fourier Transform displays the frequency information but no information about time. The question this raises is: is it necessary to have both the time and the frequency information at the same time?

The answer depends on the particular application and the nature of the signal in hand. The Fourier Transform gives the frequency information of the signal: it tells us how much of each frequency exists in the signal, but it does not tell us when in time these frequency components exist. Olivier Rioul and Martin Vetterli state that a transform such as this one can only be effectively applied to stationary signals [R 1991].


    Page 6

Definition: signals whose frequency content doesn't vary significantly in time are stationary. The only strictly stationary signal is one with constant frequency.

Here is an example. Robi Polikar gave the following example in his wavelet tutorial, which shows the problem with non-stationary signals [P 1996].

Figure 1.6.1 is a plot of a stationary signal and its Fast Fourier Transform, zoomed in. The signal is x(t) = cos(2*pi*10*t) + cos(2*pi*25*t) + cos(2*pi*50*t) + cos(2*pi*100*t) and so contains frequencies of 10, 25, 50 and 100 Hz at every time instant.

Figure 1.6.1: Plot of a stationary signal and its FFT

Figure 1.6.2: Plot of a non-stationary signal and its FFT

Figure 1.6.2 is a plot of a signal with four different frequency components at four different time intervals, i.e. it is non-stationary. This signal contains the same frequencies as the signal used for the stationary example. The Fast Fourier Transform is also given. The ripples in the FFT are due to sudden changes from one frequency to another and are irrelevant in this example.

Apart from the ripples, the two plots of the FFT look identical: they both have large peaks at frequencies 10, 25, 50 and 100 Hz. The problem becomes apparent when you ask yourself when these frequencies actually occurred. The answer is straightforward for the stationary example, but not for the non-stationary signal. Any change of frequency in a signal, at even the slightest of time intervals, affects the overall FFT; if you assumed each frequency was present throughout, you would get completely false information.
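Polikar's observation can be reproduced in miniature. The sketch below (with illustrative sampling choices, not taken from the report) compares the magnitude spectrum of two simultaneous tones with that of the same two tones played one after the other; both spectra peak at the same bins, so the spectrum alone can't distinguish the two cases.

```python
import cmath
import math

def mag_spectrum(x):
    """Magnitudes of the (1/n-normalised) DFT of a real signal."""
    n = len(x)
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * r / n)
                    for r, v in enumerate(x)) / n) for k in range(n)]

fs = 200                                   # one second sampled at 200 Hz
t = [r / fs for r in range(fs)]
stationary = [math.cos(2 * math.pi * 10 * u) + math.cos(2 * math.pi * 25 * u)
              for u in t]
switching = [math.cos(2 * math.pi * 10 * u) if u < 0.5
             else math.cos(2 * math.pi * 25 * u) for u in t]  # non-stationary

def peaks(spec, thresh=0.1):
    # Only scan the first half of the bins (positive frequencies).
    return [k for k in range(len(spec) // 2) if spec[k] > thresh]

print(peaks(mag_spectrum(stationary)))  # [10, 25]
p = peaks(mag_spectrum(switching))      # 10 and 25 again, plus leakage ripples
print(10 in p and 25 in p)              # True
```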

    As this project involves non-stationary signals, a further approach is needed.

    1.7 The Short Term Fourier Transform

Fourier analysis doesn't work equally well for all kinds of signals or for all kinds of problems. Hubbard describes the naive approach of some scientists when applying Fourier:

"In some cases, scientists using it are like the man looking for a dropped coin under a lamppost, not because that is where he dropped it, but because that's where the light is." [H 1995]

If the signal is non-periodic, the summation of the periodic functions, sine and cosine, doesn't accurately represent the signal. Research into artificially extending the signal to make it


    Page 7

periodic revealed that you would require what is called continuity at the endpoints of the function.

Dennis Gabor introduced the Short Term Fourier Transform (STFT), also known as the Windowed Fourier Transform, as an attempt to overcome the problem of identifying when a frequency occurred in a non-stationary signal. The STFT introduces the notion of time dependency into Fourier analysis.

Gabor's basic idea: introduce a local frequency parameter (local in time) so that the Fourier Transform looks at the signal through a window over which the signal is approximately stationary.

    Figure 1.7.1

    Studying the frequencies of the signal segment by segment limits the span of time during

    which something is happening.

Formal definition

Given a signal x(t), Gabor recognised that to be accurate in time, a two-dimensional time-frequency representation S(t, f) is needed, where f is the local frequency. Recall that the signal is stationary when it is seen through a window g(t) (see figure 1.7.1). The signal viewed in the window is represented by

x(t)\, g(t - T)

where T is the time location at which the window is centred and g(t) is the window function. (Convolution, a closely related operation for combining two signals, is described on page 10.) The Fourier Transform of these windowed signals is then obtained by applying the Fourier Transform given in Equation 3 (page 4).

As you can see from the formula, the STFT relies heavily on the choice of window. In figure 1.7.1 the window is a basic rectangular window, but for more accurate results differently shaped windows, such as the preferred Hamming window, can be used.

Another factor is the size of the window. Although the window size is fixed for the entire process of calculating the STFT, different STFTs can be calculated using different sized windows. A small window is effectively blind to low frequencies, which are too large for the window, while with a large window information is lost about brief changes. We will see later how wavelets have combated this problem in an attempt to see both the wood and the trees.

Figure 1.7.2 illustrates the windowing of a signal in the STFT. It shows Gabor's two-dimensional principle and gives two alternative views.

STFT(T, f) = \int x(t)\, g(t - T)\, e^{-j 2\pi f t}\,dt  [EQ 5]
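A discrete sketch of the idea behind [EQ 5] (illustrative Python with a rectangular window and non-overlapping hops; practical STFTs usually overlap the windows and use smoother shapes such as the Hamming window):

```python
import cmath
import math

def stft_frames(x, win_len):
    """STFT with a rectangular window hopping by a full window length:
    returns one (1/n-normalised) magnitude spectrum per window position."""
    frames = []
    for start in range(0, len(x) - win_len + 1, win_len):
        seg = x[start:start + win_len]
        frames.append([abs(sum(v * cmath.exp(-2j * math.pi * k * r / win_len)
                               for r, v in enumerate(seg)) / win_len)
                       for k in range(win_len // 2)])
    return frames

fs = 64
t = [r / fs for r in range(2 * fs)]        # two seconds of signal
# 8 Hz in the first second, 16 Hz in the second: non-stationary.
x = [math.cos(2 * math.pi * 8 * u) if u < 1.0
     else math.cos(2 * math.pi * 16 * u) for u in t]

first, second = stft_frames(x, fs)         # one-second windows
print(max(range(len(first)), key=first.__getitem__))    # 8
print(max(range(len(second)), key=second.__getitem__))  # 16
```

Unlike the whole-signal FFT of section 1.6, each spectrum belongs to one window position, so the STFT tells us roughly when each frequency occurred.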


    Page 8

Figure 1.7.2 [R 1991]

Figure 1.7.2 shows vertical stripes in the time-frequency plane, illustrating the windowing-of-the-signal view of the STFT: windowing at time t, it computes all frequencies of the STFT. The alternative view is shown by the horizontal stripes and is based on a filter bank interpretation of the STFT: at a given frequency f, the STFT amounts to filtering the signal at every value of t using a bandpass filter whose window function is modulated to the given frequency.

The time/frequency resolution problem with the STFT

In 1975, Jean Morlet recognised a problem: unlike Fourier analysis, the STFT has the disadvantage of being imprecise about time in the high frequencies, because the size of the window is fixed. If you then make the window very small, you lose all the information about low frequencies. So Morlet took another approach, which led to the discovery of the WAVELET.

    1.8 Technical Definitions for Chapter 1

Basis Functions

A group of functions such as y = sin x, y = sin 2x, etc. forms a basis if:

i) they are all linearly independent from each other;
ii) they can form any other function in the space, i.e. they span a vector space.

[Adapted from PMA211 course notes]

Linear Independence - Formal Definition

Let v_1, ..., v_r be functions over a field F (e.g. the real numbers), and suppose there exist coefficients a_1, ..., a_r in F such that a_1 v_1 + a_2 v_2 + ... + a_r v_r = 0. If the only way this can occur is with a_1 = a_2 = ... = a_r = 0, then v_1, ..., v_r are linearly independent. Basically, this means that you can't make one of the linearly independent functions by combining multiples of the others in the same basis. Part ii) states that any function in the same space can be made up from combinations of the basis functions.



    Page 9

In the case of the Fourier Transform, the basis is made up of sinusoids. According to the definition of a basis, this means that all sinusoids are linearly independent from each other. The following is a proof of this.

A proof of the linear independence of sinusoids

First, we need to explore orthogonality and inner products. The inner product of two functions is a mapping < , > : V x V -> R, where V is a vector space of functions and R is the real numbers. We only need to concern ourselves with the mapping for continuous functions, as both sin nx and cos nx are continuous. Given the space of all continuous functions on the closed interval [a, b], the inner product < , > is defined by

< f, g > = \int_a^b f(t)\, g(t)\,dt

Two functions are orthogonal if their inner product is zero.

Step 1 - proof that cos mx and cos nx are orthogonal, where m ≠ n

< \cos mt, \cos nt > = \int_{-\pi}^{\pi} \cos mt \cos nt\,dt = \frac{1}{2}\int_{-\pi}^{\pi} \cos((m-n)t)\,dt + \frac{1}{2}\int_{-\pi}^{\pi} \cos((m+n)t)\,dt

= \left[ \frac{\sin((m-n)t)}{2(m-n)} + \frac{\sin((m+n)t)}{2(m+n)} \right]_{-\pi}^{\pi} = 0

so cos mx and cos nx are orthogonal

Step 2 - proof that sin mx and sin nx are orthogonal, where m ≠ n

< \sin mt, \sin nt > = \int_{-\pi}^{\pi} \sin mt \sin nt\,dt = \frac{1}{2}\int_{-\pi}^{\pi} \cos((m-n)t)\,dt - \frac{1}{2}\int_{-\pi}^{\pi} \cos((m+n)t)\,dt

= \left[ \frac{\sin((m-n)t)}{2(m-n)} - \frac{\sin((m+n)t)}{2(m+n)} \right]_{-\pi}^{\pi} = 0

    so sin mx and sin nx are orthogonal

Step 3 - proof that cos nx and sin mx are orthogonal

< \cos nt, \sin mt > = \int_{-\pi}^{\pi} \cos nt \sin mt\,dt = \frac{1}{2}\int_{-\pi}^{\pi} \sin((m+n)t)\,dt + \frac{1}{2}\int_{-\pi}^{\pi} \sin((m-n)t)\,dt

if n ≠ m

= \left[ -\frac{\cos((m+n)t)}{2(m+n)} - \frac{\cos((m-n)t)}{2(m-n)} \right]_{-\pi}^{\pi} = 0

if n = m, the second integrand is sin(0) = 0, leaving

= \left[ -\frac{\cos(2nt)}{4n} \right]_{-\pi}^{\pi} = 0


    Page 10

so cos nx and sin mx are orthogonal

All possible cases have been exhausted, and in each case the inner product equals zero. Therefore all sinusoids are orthogonal to each other; hence they are linearly independent and form a basis for the Fourier Transform.
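The integrals in the proof can also be checked numerically. This is an illustrative sanity check (not from the report): the rectangle rule over one full period evaluates the inner products of a few sinusoids.

```python
import math

def inner(f, g, n=4096):
    """Approximate <f, g> = integral of f(t)g(t) over one period [0, 2*pi]."""
    h = 2 * math.pi / n
    return sum(f(i * h) * g(i * h) for i in range(n)) * h

cos3 = lambda t: math.cos(3 * t)
cos5 = lambda t: math.cos(5 * t)
sin3 = lambda t: math.sin(3 * t)
sin5 = lambda t: math.sin(5 * t)

print(abs(inner(cos3, cos5)) < 1e-9)  # True: distinct cosines are orthogonal
print(abs(inner(sin3, sin5)) < 1e-9)  # True: distinct sines are orthogonal
print(abs(inner(cos3, sin3)) < 1e-9)  # True: the n = m case of step 3
print(abs(inner(cos3, cos3) - math.pi) < 1e-9)  # True: <f, f> is pi, not 0
```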

Convolution

An operation of the form x(n)*h(n) is called convolution, written x(n) * h(n), though the * symbol is often omitted. The Matlab command is conv(x, h). Convolution is used to calculate the response of a system to an arbitrary input signal by convolving it with the system's impulse response.

You can think of convolutions geometrically, but it is best to explain them mathematically, as this is what computers do when they calculate a convolution. The convolution of two sequences a and b is given by

(a * b)_k = \sum_j a_j b_{k-j}

where (a * b)_k is the k-th element of the resulting sequence. If a_j and b_j are nonzero only for j >= 0, then

(a * b)_k = \sum_{j=0}^{k} a_j b_{k-j}  [H 1995]
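The finite sum is a couple of loops in code. A small sketch (Python rather than Matlab's conv, but it computes the same sequence for the same inputs):

```python
def convolve(a, b):
    """(a * b)_k = sum over j of a_j * b_(k-j),
    for sequences that are nonzero only at indices j >= 0."""
    out = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for i, bi in enumerate(b):
            out[j + i] += aj * bi  # this pair contributes to element k = j + i
    return out

print(convolve([1, 2, 3], [1, 1]))  # [1, 3, 5, 3]
```

Matlab's conv([1 2 3], [1 1]) returns the same four values.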

The convolution property is most useful when applied in a transformed domain (such as frequency, the transformed domain of time in Fourier analysis). It is very hard to visualise what is happening to two signals being convolved in the time domain, but in the transformed domain convolution becomes multiplication:

T[x * y] = T[x] \cdot T[y]

That is, corresponding values of points along the x-axis of the two transformed graphs are multiplied together to form each new point.

Figure 1.8.1: Convolution in the time domain corresponds, via the Fourier Transform, to multiplication in the frequency domain.


    Page 11

    2 : Introduction to Wavelets

    2.1 Where did they come from?

Wavelets were discovered as a result of engineering, not of mathematics like most applications in signal processing. Yves Meyer was one of the first people to realise the importance of wavelets, recognising that most researchers had already been using a process resembling the wavelet process without knowing its history or functional background. Wavelet theory developed independently in a large number of areas, and it was Meyer who made the connection. He made the following comment:

"Tracing the history of wavelets is almost a job for an archeologist. I have found at least 15 distinct roots of the theory, some going back to the 1930s." [H 1995]

This dissertation focuses on the discovery of wavelets as an approach to solving the time/frequency resolution problem presented in the previous chapter.

The first use of wavelets came when Morlet was using Short Term Fourier Analysis. He was applying the STFT in a system that processed echo signals, used to aid the localisation of oil for excavation. Big windows were placed at different places on the signal; then, as the price of computing dropped further, windows were placed closer and closer together, even overlapping. Morlet's problem was that, no matter what he did, the process didn't get any better. Morlet wanted a finer local definition.

As mentioned in section 1.7, the STFT system has the disadvantage of being imprecise about time in the high frequencies (unless you make the window very small, which means losing all the information about low frequencies).

So Morlet decided on another technique. Instead of keeping the size of the window fixed and filling it with oscillations of different frequencies, he did the reverse: he kept the number of oscillations in the window constant and varied the width of the window.

    This window is called a WAVELET.

    Figure 2.1.1

    STFT Vs Wavelets

    When the wavelet is stretched, the oscillations inside of it are stretched, decreasing their

    frequency. When the wavelet is compressed, higher frequencies are produced.

    Figure 2.1.1 shows the difference between STFT and Wavelets.

    Top Row: STFT

The size of the window is fixed and the number of oscillations varies. A small window is blind to low frequencies, which are too large for the window.


Page 12

The large window loses information about a brief change, smearing it across the entire interval corresponding to the window.

    Bottom Row: WAVELETS

A mother wavelet (left) is stretched or compressed to change the size of the window. This makes it possible to analyse a signal at different scales.

    2.2 The Mother Wavelet

The mother wavelet is the building block for all other wavelets.

All wavelets are generated from a single wavelet function by a series of simple scaling and translation procedures. This two-dimensional parameterisation is obtained from the function ψ(t) by

$\psi_{j,k}(t) = 2^{j/2}\,\psi(2^{j}t - k), \qquad j, k \in \mathbb{Z}$  [B 1998]

Z is the set of all integers. The factor $2^{j/2}$ maintains a constant norm independent of scale j.

k: parameterises the time or space location.
j: parameterises the frequency or scale.

The function ψ(t) is called the generating wavelet or Mother Wavelet and defines the wavelet basis. The term basis is the same here as it was in the FT case.

    Looking at the formula, it is clear to see that there are infinitely many mother wavelets, which

    form the foundations of the Wavelet Transforms.
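As an illustration (not part of the original text), the sketch below builds members of such a family from one assumed mother wavelet, the Mexican hat, and checks numerically that the factor 2^(j/2) keeps the norm constant across scales:

```python
import numpy as np

def mother(t):
    """Mexican hat (second derivative of a Gaussian): one common choice
    of mother wavelet; any admissible psi(t) could be used instead."""
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def psi_jk(t, j, k):
    """psi_{j,k}(t) = 2^(j/2) * psi(2^j * t - k)."""
    return 2.0**(j / 2.0) * mother(2.0**j * t - k)

# Sample the wavelets on a fine grid and verify that the L2 norm is
# (approximately) the same at every scale j, as the 2^(j/2) factor intends.
t = np.linspace(-50.0, 50.0, 200001)
dt = t[1] - t[0]
norms = [np.sqrt(np.sum(psi_jk(t, j, 0)**2) * dt) for j in (0, 1, 2, 3)]
assert np.allclose(norms, norms[0], rtol=1e-3)
```

Increasing j compresses the wavelet (higher frequency) while the amplitude factor compensates, so every member of the family carries the same energy.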

    2.3 Wavelets achieve Multiresolution

Figure 2.3.1: coverage of the time/frequency plane by (a) the STFT and (b) a wavelet transform. [G 1995]

Page 13

    Amara Graps describes the benefits of using wavelets in signal processing. Wavelets overcome

    the time/frequency resolution problem because of their ability to be stretched and compressed

    (see section 2.2).

Part a of figure 2.3.1 shows an STFT, where the window is simply a square. Because a single

    window is used for all frequencies in the STFT, the resolution of the analysis is the same at all

    locations in the time/frequency plane.

An advantage of using wavelets in a transform is that the width of the window can vary: you can have short high-frequency windows and long low-frequency windows. Part b shows the coverage of the time/frequency plane with a wavelet function.

    2.4 How do you create a Wavelet?

As mentioned earlier, there are infinitely many wavelets. Unlike the basis functions for the Fourier Transform (i.e. sinusoids), wavelets can contain many sharp corners or discontinuities.

Wavelets are obtained by altering the variables j and k given in the mother wavelet formula in section 2.2. These integer variables scale and dilate the mother function to generate different wavelet families such as the Daubechies family (see below). The scale index j indicates the wavelet's width, and the translation index k gives its position. The term position is used in the same sense as it is for the STFT; it is related to the location of the window as it is shifted through the signal.

    Figure 2.4.1

Figure 2.4.1 shows the Daubechies wavelet family with different scalings and translations. They were created by a Matlab function that used the rules described by Ingrid Daubechies in her book Ten Lectures on Wavelets. [D 1992]

    Within each family of wavelets (such as the Daubechies family) are wavelet subclasses that

    are distinguished by the number of coefficients and by the level of integration. These wavelets

are often classified within a family by the number of vanishing moments. [G 1995]

Page 14

    3 : The Continuous Wavelet Transform

    The Continuous Wavelet Transform (CWT) was developed as an alternative approach to the

    Short Term Fourier Transform (STFT) to overcome the time/frequency resolution problem

(section 1.7).

    3.1 Theory

The CWT is performed in a similar way to the STFT, in the sense that the signal is multiplied with a function, which in this case is the wavelet introduced in the previous chapter. Also, like the STFT, the transform is computed separately for different segments of the time-domain signal. The main difference between the CWT and STFT is that the width of the window is changed as the transform is computed for every single spectral component, which is probably the most significant characteristic of the CWT.

    In the STFT computation, because the window had a constant shape and size throughout the

    analysis, the frequency responses of the window were regularly spaced over the frequency

    axis. Figure 3.1.1 (a) shows what filter bank the STFT produces. A filter bank is a term used to

describe the filtering effects on the frequencies in the signal as the window moves along the signal.

    Figure 3.1.1

    [R 1991]

    In the CWT case, instead of the frequency responses of the analysis filter being regularly

    spaced over the frequency axis, they are regularly spread in a logarithmic scale (figure 3.1.1

    (b) ).

Olivier Rioul and Martin Vetterli describe how this logarithmic spacing of the filter bank models the frequency response of the cochlea, situated in the inner ear, and is therefore adapted to auditory perception. [R 1991]

    We have already introduced Wavelets as the basis function for the CWT, and that these are

    scaled and translated versions of the mother wavelet. The following formula expresses the

CWT in terms of the signal applied to the wavelet:

$CWT_x^{\psi}(\tau, s) = \frac{1}{\sqrt{|s|}} \int x(t)\, \psi^{*}\!\left(\frac{t-\tau}{s}\right) dt$  [EQ 6]

This shows that the transformed signal is a function of two variables, τ and s, the translation and scale parameters respectively, and ψ(t) is the mother wavelet.

Figure 3.1.1 panels (horizontal axis: frequency f): (a) Constant Bandwidth (STFT); (b) Constant Relative Bandwidth (CWT).

Page 15

Notice that we do not have a frequency parameter as we had with the STFT (EQ 5, page 7); instead, we have a scale parameter, which is defined as 1/frequency.

    Robi Polikar made the following analogy:

"The scale parameter in the wavelet analysis is similar to the scale used in maps. As is the case of maps, high scales correspond to a non-detailed global view (of the signal), and low scales correspond to a detailed view." [P 1996]

    3.2 Computation of the CWT

    This section explains the formula given above and shows some applications.

Let x(t) be the signal to be analysed. Firstly, you need to choose a wavelet to act as the analysing window. There are several candidates (Morlet, Sombrero, Daubechies), all derived from a mother wavelet. Once the wavelet is chosen, the computation starts at s = 1 and the CWT is computed for all values of s, smaller and larger than 1. It is conventional for the value of s (the scale) to start at 1, but this doesn't always have to be the case. The procedure then continues for increasing values of s, i.e. the analysis will start from high frequencies and proceed towards low frequencies. As the value of s increases, the wavelet dilates further, so the first value of s corresponds to the most compressed wavelet.

The wavelet is placed at the beginning of the signal, at the point corresponding to t = 0. The wavelet function at scale 1 is multiplied by the signal and then integrated. The result of this integration is then multiplied by the constant 1/sqrt(s). This is for normalisation purposes only, so that the transformed signal will have the same energy at every scale. The final result is the value of the CWT at time zero (t = 0) and scale s = 1 in the time-scale plane.

The value of the transformation is calculated every time the wavelet is shifted towards the right by τ. Therefore values of the CWT are obtained at t = 0, t = τ, t = 2τ, etc. with scale s = 1 as the wavelet is shifted. This procedure repeats until the wavelet reaches the end of the signal; one row of points on the time-scale plane is then completed. Sections 3.3 and 3.4 show how these rows are represented. s is then increased by a small value and the above procedure is repeated for every value of s, where each value of s fills the corresponding row of the time-scale plane.
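The procedure just described maps directly to code. The following Python/NumPy sketch (the learning tools themselves use MATLAB; the Mexican hat wavelet and the two-tone test signal here are stand-ins) computes one row of the time-scale plane per scale, shifting the wavelet along the signal in fixed steps; squaring the modulus of the result gives the scalogram discussed in section 3.4:

```python
import numpy as np

def mexican_hat(t):
    # Stand-in analysing wavelet (real-valued, so no conjugate is needed).
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def cwt(x, dt, scales, tau_step=1):
    """Direct (slow) CWT: one row per scale s, one column per shift tau.
    W[i, m] = (1/sqrt(s_i)) * sum_n x[n] * psi((t_n - tau_m) / s_i) * dt."""
    n = len(x)
    t = np.arange(n) * dt
    taus = t[::tau_step]
    W = np.zeros((len(scales), len(taus)))
    for i, s in enumerate(scales):
        for m, tau in enumerate(taus):
            W[i, m] = np.sum(x * mexican_hat((t - tau) / s)) * dt / np.sqrt(s)
    return W

# Example: a low tone followed by a higher tone (synthetic stand-in signal).
dt = 0.01
t = np.arange(0.0, 4.0, dt)
x = np.where(t < 2.0, np.sin(2*np.pi*1.0*t), np.sin(2*np.pi*4.0*t))

W = cwt(x, dt, scales=[0.25, 0.5, 1.0, 2.0], tau_step=10)
scalogram = np.abs(W)**2   # section 3.4: squared modulus of the CWT
```

This direct double loop makes the cost of the continuous transform obvious: every extra scale or shift adds a full pass over the signal, which is exactly the practicality problem raised in section 4.1.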

Figure 3.2.1 shows the CWT process with s = 1. The wavelet is the Morlet wavelet and is shown in yellow. Here, t represents the value of time at which the centre of the wavelet is positioned.

    Figure 3.2.2 shows the CWT process with s=5.

Figure 3.2.1: s = 1; a) t = 2, b) t = 40, c) t = 90, d) t = 140
Figure 3.2.2: s = 5; a) t = 2, b) t = 40, c) t = 90, d) t = 140


Page 16

3.3 Visualising the CWT - 3D Plot

Section 3.2 showed how the CWT is calculated by moving the wavelet window along the signal at different wavelet scales. Each time the scale is increased, a new row of a matrix is added. The matrix produced by the CWT process has the following dimensions:

x-axis: translation (depends on the value of tau)
y-axis: the number of different values of s used

Figure 3.3.1 shows a typical plot of a CWT.

    Figure 3.3.1

As described earlier, the scale parameter s in equation 6 is actually the inverse of frequency. In other words, frequency decreases as scale increases. So the portion of the graph in figure 3.3.1 with scales around zero actually corresponds to the highest frequencies in the analysis.

    3.4 Visualising the CWT - Scalograms

The Scalogram is a very common tool in signal analysis, as it provides a distribution of the energy of the signal in the time-scale plane. Olivier Rioul and Martin Vetterli recognised that the CWT is isometric and therefore preserves energy. They proved this with the following formula:

$\int\!\!\int \left|CWT_x(\tau, s)\right|^{2} \,\frac{d\tau\, ds}{s^{2}} = E_x$  [R 1991]

where $E_x = \int |x(t)|^{2}\, dt$ is the energy of the signal x(t).

This discovery led to the definition of the scalogram as the squared modulus of the CWT. Figure 3.4.1 shows an example of a typical scalogram.

    Figure 3.4.1

The Scalogram is the visual representation used in the Wavelet Learning Tool being developed as part of this dissertation.

Page 17

    4 : The Discrete Wavelet Transform

    4.1 Why not use the CWT?

As described in section 3, a signal can be transformed from the time domain to the frequency domain using the Continuous Wavelet Transform (CWT) while reducing the loss of time and

    frequency resolution. The section explained how the CWT was calculated by changing the

    shape of the wavelet, which acts as the analysis window, for each analysis frequency. The

    wavelet shape was governed by the scale parameter s where a larger s would represent a

    more dilated wavelet. The wavelet would move along the signal with each scale and calculate

the CWT coefficient for each step. The size of the steps is governed by the parameter τ.

The s and τ parameters are continuous, i.e. their values can be incremented to any value, and hence the transform is called the Continuous Wavelet Transform. Because these parameters are continuous, the CWT is not well suited to computer implementation [A 1996]. Although the Wavelet Learning Tool, being developed alongside this dissertation, uses the CWT to show how the process works and how the scalograms are produced, it is not the quickest or most practical transform to use. The tool will only allow small values of s and τ; larger values (i.e. a signal with too many samples) increase computation time dramatically.

    4.2 Discretizing the Continuous Wavelet Transform

In the Continuous Wavelet Transform, the wavelet coefficients were calculated using equation 6. As mentioned above, the CWT can't practically be computed because it contains an integral over continuous variables. It is therefore necessary to discretize the transform. The most intuitive way of doing this is simply to sample the time-frequency plane. With most transforms, the natural choice would be to sample the plane at a uniform sampling rate, but in the case of Wavelet Transforms, the scale change can be used to reduce the sampling rate. Nyquist developed the following rule, which explains the reasons for using scale to reduce the sampling rate.

Nyquist's Sampling Theorem: "If the range of frequencies of a signal measured in cycles per second is n, then the signal can be represented with complete accuracy by measuring its amplitude 2n times a second." [H 1995]

    This theorem describes how a curve with a finite number of frequencies can be represented

    exactly by a finite number of samples. Usually you would need an infinite number of samples

    in order to represent the curve exactly.

Nyquist's Sampling Theorem can be interpreted so that if the time-scale plane needs to be sampled at a rate of n1 at scale s1, the same plane can be sampled at a rate of n2 at scale s2, where s1 < s2 (corresponding to frequencies f1 > f2) and n2 < n1.

Page 18

Figure 4.2.1: Dyadic Sampling Grid [V 1995]

The Dyadic Sampling Grid shown in figure 4.2.1 is a pictorial representation of the relationship between sampling frequency and scale. As scale increases down the graph, the frequency being analysed decreases. Nyquist's rule says that the further you go down the graph, the lower the sampling rate that is needed. In the figure, the sampling rate is represented by the dots: the more dots, the higher the sampling rate. Each dot corresponds to a wavelet coefficient calculated using the Continuous Wavelet Transform. The larger the scale parameter, the fewer coefficients are needed, and therefore the quicker the computation time.

You could think of the area covered by the axes as the entire time-scale plane. The CWT assigns a value to the continuum of points on this plane, so there are an infinite number of CWT coefficients. Considering the discretization of the scale axis, among this infinite number of points only a finite number will actually be calculated, using a logarithmic rule. The base of the logarithm depends on the application, but the most common is 2 because of its convenience. An application called Subband Coding (see section 4.3) uses such a base. If base 2 is chosen, only the values 2, 4, 8, 16, 32, etc. are used for the scale parameter. The time axis is then discretized according to the discretization of the scale axis. Since the discrete scale changes by a factor of 2, the sampling rate of the time axis is reduced by a factor of 2 at every scale. You can see at each stage of the Dyadic Grid that the sample rate is halved. As a consequence, the Discrete Wavelet Transform uses only wavelets whose scale is of the form s = 2^k, where k is a whole number (see the formula for the mother wavelet on page 12).
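The bookkeeping implied by dyadic sampling is simple: each doubling of the scale halves the number of coefficients. A small sketch of this arithmetic (the 4000-sample input matches the example that appears later in figure 4.4.1):

```python
# Number of coefficients per level when the time axis is subsampled by 2
# at every doubling of scale (dyadic sampling).
def dyadic_counts(n_samples, n_levels):
    counts = []
    for _ in range(n_levels):
        n_samples //= 2          # sampling rate halves at each scale
        counts.append(n_samples)
    return counts

# A 4000-sample signal needs 2000, 1000, 500 and 250 coefficients
# at the first four dyadic scales.
print(dyadic_counts(4000, 4))   # → [2000, 1000, 500, 250]
```

Summing these counts shows why the dyadic scheme is so economical: all levels together never need more coefficients than the original signal has samples.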

    4.3 Subband Coding

There are two well-documented methods for calculating the Discrete Wavelet Transform based on the ideas expressed in section 4.2: the Multiresolution Pyramid and Subband Coding. This section gives a detailed explanation of the latter of the two, which will also be used as part of a second piece of software showing the visual effects the method has on the signal.

Driven by applications such as speech and image compression, a method called Subband Coding was proposed by Croisier, Esteban and Galand in the late 1970s, using a special class of filters called quadrature mirror filters. [V 1995]

    The Subband coding scheme, first popularised in speech compression, uses a combination of

    high-pass and low-pass filters to reduce the sample rate of the transform. Filters of different

    cut-off frequencies are used to analyse the signal at different scales. The whole Subband

    process consists of a series of these filters known as a filter bank. High-pass filters are used to

    analyse the high frequencies in the signal, and the signal is passed through a series of low-pass

    filters to analyse the low frequencies.


Page 19

The resolution of the signal, which is a measure of the amount of detail information in the signal, is changed by the filtering operations, and the scale is changed by downsampling (subsampling) operations. Subsampling a signal corresponds to reducing the sampling rate, which is equivalent to removing some of the samples of the signal. For example, subsampling by two refers to dropping every other sample of the signal (see figure 4.2.1). Subsampling by a factor n reduces the number of samples in the signal n times.

Figure 4.3.1: The Subband Coding scheme shown as a filter bank tree. [R 1991]
h(n): high-pass filter; g(n): low-pass filter; ↓2: subsampling by 2.

The procedure starts by creating a low-pass filtered version of the signal, passing the signal through a half-band digital low-pass filter. This is done by convolving the signal with an impulse response function h[n] which represents the low-pass filter. A half-band low-pass filter eliminates exactly the upper half of the signal's frequency range. For example, if a signal has a maximum component of 1000 Hz, then half-band low-pass filtering removes all the frequencies above 500 Hz.

There is an important thing to consider when talking about frequency in the discrete case, which is explained as follows.

In discrete signals, frequency is usually expressed in terms of radians. As a result of this, the sampling frequency of the signal is equal to 2π radians in terms of radial frequency. Therefore, the highest frequency component that exists in a signal will be π radians if the signal is sampled at the Nyquist rate (which is twice the maximum frequency that exists in the signal, see page 17); that is, the Nyquist rate corresponds to π rad/s in the discrete frequency domain. Therefore, using Hz is not appropriate for discrete signals. However, Hz is used whenever it is needed to clarify a discussion, since it is very common to think of frequency in terms of Hz. It should always be remembered that the unit of frequency for discrete-time signals is radians.
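The conversion between Hz and discrete radial frequency is ω = 2πf/f_s, so the sampling frequency always maps to 2π radians and the Nyquist frequency f_s/2 to π radians. A quick check with illustrative numbers:

```python
import math

def discrete_omega(f_hz, fs_hz):
    # omega = 2 * pi * f / fs  (radians per sample)
    return 2 * math.pi * f_hz / fs_hz

# The sampling frequency itself corresponds to 2*pi radians,
# and the Nyquist frequency fs/2 to pi radians.
assert math.isclose(discrete_omega(2000, 2000), 2 * math.pi)
assert math.isclose(discrete_omega(1000, 2000), math.pi)
assert math.isclose(discrete_omega(500, 2000), math.pi / 2)
```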

After passing the signal through a half-band low-pass filter, half of the samples can be eliminated. This follows Nyquist's rule, since the signal now has a highest frequency of π/2 radians instead of π radians. Simply discarding every other sample subsamples the signal by two, and the signal will then have half the number of data points. The low-pass filtering removes the high-frequency information but leaves the scale unchanged. Only the subsampling process changes the scale (see figure 4.3.2). Resolution, on the other hand, is related to the amount of information in the signal, and is therefore affected by the filtering operations. Half-band low-pass filtering removes half of the frequencies, which can be


Page 20

interpreted as losing half of the information. Therefore, the resolution is halved after the filtering operation. Half the samples can then be discarded without any loss of information, since they are redundant. Basically, the low-pass filtering halves the resolution but leaves the scale unchanged. The signal is then subsampled by 2, since half of the samples are redundant; this doubles the scale.

This completes one level of the Subband decomposition.

This can be repeated for further decomposition. At every level, the filtering and subsampling result in half the number of samples (and hence half the time resolution) and half the frequency band spanned (and hence double the frequency resolution).
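One level of this filter-and-subsample step can be sketched with the simplest quadrature mirror pair, the normalised Haar filters (chosen here purely for illustration; practical subband coders use longer filters):

```python
import numpy as np

# Haar half-band filter pair (the simplest choice; illustrative only).
g = np.array([1.0, 1.0]) / np.sqrt(2.0)   # g(n): low-pass
h = np.array([1.0, -1.0]) / np.sqrt(2.0)  # h(n): high-pass

def subband_level(x):
    """One level of subband decomposition: filter, then subsample by 2."""
    low = np.convolve(x, g)[1::2]    # approximation (low-pass) coefficients
    high = np.convolve(x, h)[1::2]   # detail (high-pass) coefficients
    return low, high

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
low, high = subband_level(x)

# Each output branch has half the number of samples of the input.
assert len(low) == len(high) == len(x) // 2
```

Iterating `subband_level` on the `low` output reproduces the filter bank tree of figure 4.3.1, halving the number of samples and the frequency band spanned at every level.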

    Figure 4.3.2

    Resolution and scale changes in discrete time

    4.4 Example of Subband Coding

We have shown how to decompose a sequence into two sub-sequences at half rate by using a bank of half-band filters. This process can be iterated on the lower-band sequence to achieve finer frequency resolution at lower frequencies. Repeating the process once on the first low band creates a new low band, which corresponds to the lower quarter of the frequency spectrum. Each further iteration halves the frequency band of the signal. Figure 4.4.1 shows the result of applying a signal to the Subband Coding scheme; each stage shows how the signal is subsampled and how the frequency band is reduced by half.

The signal in blue is a sound wave made up of two tones produced by someone whistling. The first region is a low tone and the second is a distinctively higher tone. Its scalogram image, produced by the Wavelet Learning Tool being developed alongside this dissertation, is also given below. This was computed using the Morlet wavelet, and it clearly shows the two distinct frequencies.

Figure 4.4.1: Outputs from the Subband Coding Scheme

(Figure 4.3.2: half-band low-pass filter: resolution halved, scale unchanged; followed by subsampling by 2: resolution halved, scale doubled.)

Input: 4000 samples.
High-pass filter 1: 2000 samples, f in (π/2, π)
High-pass filter 2: 1000 samples, f in (π/4, π/2)
High-pass filter 3: 500 samples, f in (π/8, π/4)
High-pass filter 4: 250 samples, f in (π/16, π/8)

Page 21

Figure 4.4.1 shows the results from each stage (iteration) of the Subband Coding scheme. The example given is of a signal comprised of a low-frequency tone followed by a high-frequency tone. Throughout the different stages of the Subband Coding scheme, the signal has been high-pass filtered and then downsampled. The outputs show the signal at the different stages of the scheme. It is clearly visible that at the start of the process only the high-frequency components are present but, as the process goes on, each successive output covers a lower frequency band until only the very low frequencies are left. This is demonstrated by the fact that the high-frequency tone has been completely filtered out after 4 filters.

The output signals given in figure 4.4.1 are taken from the Subband Learning Tool developed alongside this dissertation.

Page 22

    5 : Computer Assisted Learning Tools

    5.1 Computer Assisted Learning (CAL)

As this dissertation researches the fairly new field of Wavelet Theory, a software package is being developed to aid understanding of the processes involved. The fact that there hasn't been much software development in this field gives all the more reason to develop a package now, which can be used to complement the teaching of the subject. Generally speaking, the software package being developed belongs to the family of Computer Assisted Learning (CAL) tools. This chapter discusses the advantages and disadvantages of using CAL and whether a CAL tool is appropriate in this situation.

"Computer Assisted Learning means (in a broad sense) using computers in education for all kinds of purposes." [ICASSP vol2 1995]

    When constructing CAL tools, it has been recognised that they should provide flexibility for

    the student involved and also be stimulating enough so that the student can construct private

    concepts rather than reproducing given explanations. [K 1996]

The advantage of having such a tool is that users can control their own access to the information being taught. In this way, the flexibility of the system allows students to adapt the available information streams to their mental needs at any given moment. The disadvantage of this free access to the CAL tool is that lecturers have no control over the flexibility of the system. Lecturers need to anticipate how much the students will take advantage of this free access, which may lead to students not fully understanding the basic concepts that the CAL tool was developed for in the first place. Because of this flexibility problem, CAL tools cannot solely be used as a method for teaching a new subject, but there is plenty of evidence below to suggest that there are great advantages to using a CAL tool alongside a series of lectures.

Lecturers Martin Cooke and Guy Brown have researched possible situations where a CAL tool is useful in teaching speech and hearing, which may therefore aid in the teaching of Wavelets. Their development of the Matlab Auditory Demos (MAD), to which the Wavelet Learning Tool being developed as part of this dissertation would contribute, has given the two authors a deep understanding of when a CAL tool is appropriate.

They initially recognised the following problem:

"The courses in speech and hearing typically introduce large amounts of unfamiliar material to participants with backgrounds almost as varied… the domains of speech and hearing involve intangible signals, ill-suited to traditional styles of presentation… the possibilities for misinterpretation are immense and, in our experience, difficult to predict" [C 1999]

    Recognising this problem, it was clear that CAL tools would be an appropriate solution, due to

    the scope for interaction and experimentation.

Matti Karjalainen and Martti Rahkila also recognised the fruitfulness of using a CAL tool in teaching Signal Processing. They also understood that a CAL tool must in some way be more useful than ordinary teaching methods. [ICASSP vol2 1995]

Kommers, Grabinger and Dunlap recognised that there are three main areas of most CAL tools in which there are significant advantages to learning: Resource, Communication and Exploration. [K 1996]

Resource: Paper-based documents are restricted to text, tables, schematic line drawings and pictures, whereas hypermedia allows sound and video sequences as well. CAL tools can provide multiple dimensions in the meanings of expressed ideas, e.g. hearing a property of a sound wave provides a better and more natural understanding than a picture of the sound wave.

Page 23

However, it is important that the resources provided by the tool match those provided by the lecturer. If the tool and the lectures use the same word for different meanings, the learning experience is weakened.

Communication: This is based on the idea that a system should be programmed so that a dialogue can evolve between the machine and the learner. The actual bandwidth of communication in a CAL tool is very low and will probably not feature greatly in the tools being developed. However, a user guide will be produced to help users use the tools to best effect; see Appendix B.

Exploration: Computer simulation programs are themselves convincing demonstrations of their educational value. Confronting a student with a simulation allows more drastic, flexible and critical manipulations. In a book, only a few examples may be given for a particular property, but with a CAL tool, explorations into many other instances of that property can be carried out. The exploration property allows students to learn by discovery.

The CAL tools for this dissertation are developed with the teaching of the subject in mind, and with a view to how they can complement and reinforce a lecture course by providing hands-on experience. The software has to be fairly easy to use, to avoid early frustration due to a user's inexperience, and must also be visually suitable to make it clear what is happening. Just looking at the previous chapters demonstrates how mathematical the theory of Wavelets is, and to a computer science student, who may or may not have a good mathematical background, it may seem very demanding. The tool will help students to see visually what the maths represents and enhance their willingness to learn more about the subject, instead of being intimidated by the mathematical content. After all, the actual use of Wavelets is to aid Signal Processing, so audio and visual demonstrations are an obvious key to teaching the subject.

    5.2 Which programming language?

There are many different programming languages from which you can develop a learning tool, and they all have their advantages and disadvantages, so it is sometimes difficult to choose which one to use without seeing the benefits.

Matti Karjalainen and Martti Rahkila constructed a CAL tool using the QuickSig object-oriented environment. This provides signal-processing tools for many application domains, using the concept that signals and related concepts are represented as objects and the operations on them are typically implemented by method functions. Another advantage is that a wide range of functions such as filtering and transforms (like the FFT) are built in. However, this programming environment has a problem with portability: QuickSig is Macintosh-specific and also requires the languages Lisp and CLOS. So QuickSig is ill suited to large class sizes, and students would not be able to run the software on their home machines.

    The CAL tools being developed in this dissertation use MATLAB.Matlab is a high level programming language, which provides many facilities for data

    visualisation and numerical computation. It doesnt have the problem with portability, as a

    version of MATLAB is available for most operating systems. MATLAB lends itself to

    prototype programming, as it provides good facilities for quick interface construction. MATLAB

    also has high-level support for sound handling and signal processing. As it is a mathematical

    language, it lends itself to the use of vectors and matrices and makes it very straightforward to

    plot graphs of signals. Martin Cooke and Guy Brown recognised that MATLAB is a sensible choice compared to languages such as Java. They noted that developing an application in Java would

    be time-consuming because the Java application programming interface (API) has no equivalent of the

    signal processing toolbox available in MATLAB. A Java application for signal processing

    would probably be too slow for any adequate user interaction.


    6 : Requirements Analysis

    The previous chapter introduced the notion of a Computer Assisted Learning tool and how

    they can be used to assist in the teaching of Wavelet Theory. This dissertation will now focus

    on the development of such a tool.

    6.1 The initial requirement

    At the very beginning, before any work was carried out, there was a small brief on what the

    project should cover. This brief also contained a short paragraph on the basic requirements of

    the system that was also to be developed:

    A MATLAB application will be designed and implemented that allows a wavelet

    representation of sound to be generated using the DWT and modified by direct on-screen

    manipulation (e.g. by removing components at certain scales); the inverse DWT will then be applied to resynthesize a sound waveform which can be played to the listener.

    This statement details the basic ideas of the system and was considered to be the backbone of

    the system. It is clear that the system needs to include a function to work out the DWT, and

    also the IDWT, and some user interaction to alter the scales on screen before the IDWT is

    calculated. There are no clues to how the GUI should look, or how the user interacts with it.

    Even though the initial requirement states that the DWT should be calculated, there was no

    obvious indication of how the results of these calculations were to be displayed. Also, which

    wavelet is to be used to calculate the DWT?

    All of these questions needed to be answered, so an interview was set up between myself (the developer) and the client, to attempt to make clear what was needed.

    6.2 The Client and Developer scenario

    Before starting the design of any software system, it is important that you have all the

    requirements clear first. The best way to do this is for the developer to ask a multitude of

    questions to the client in order to extract important information about the system that the client

    may not have previously given.

    In any software development program, the client will have a total understanding of the

    problem, as he is the one with the expertise on the subject. It is all too common for clients to

    assume that a software developer is also an expert on their particular field of work. However, this is almost never the case, and the software developer will have a limited understanding of

    the problem and may be confused by the client's initial requirements. It may be the case that the initial requirements are vague or even impossible to implement, so the idea of having an

    interview is to explore and develop the client's narrow goals and to fill in any gaps of

    understanding.

    The interview session took place in the first week of development of this project and some of

    the questions that were raised are detailed below. Most of the questions were a result of studying the initial requirement and of early research into the topics of Wavelet Theory

    before the project began. The answers given are not direct quotes, but they summarise the

    discussion.


    Q. Do you want a visual representation of the effects that different wavelets have?

    This two-part question seeks to clarify how the result of the DWT should be

    presented and to find out whether the user should be able to compare the effects of different

    wavelets.

    A. The result of the process of using the Wavelet Transform should be represented by a scalogram, so the user can see all the coefficients calculated on time-frequency axes. This

    scalogram can then be manipulated ready for the inverse transform.

    The user should be able to choose from a selection of wavelets so that they can compare the different effects they have on the process. A good idea would be to have a side-by-side

    comparison.

    After the discussion, it was realised that the answer given by the client to this question

    contradicted the initial requirement, and also the text that had been studied. In the

    requirements, it was clear that the system should use a Discrete Wavelet Transform to analyse

    the signal. However, the answer to the question suggested that the result of the transform should be presented visually using a scalogram. Section 3.4 explains that a scalogram is calculated by taking a Continuous Wavelet Transform matrix and squaring the magnitudes of the coefficients. This means that the requirements should be changed to use the CWT and not the DWT. After discussing this, it became the new requirement, and it was

    agreed that another piece of software would be developed to show how the DWT could be

    calculated using Subband Coding.
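The relationship between the two representations described in section 3.4 is simple enough to state in one line. The sketch below uses a small made-up matrix to show it; Python and NumPy are used here purely for illustration, as the tool itself is written in MATLAB:

```python
import numpy as np

# A tiny, made-up CWT coefficient matrix: rows are scales, columns are times.
cwt_matrix = np.array([[1.0, -2.0],
                       [0.5,  3.0]])

# Scalogram: the square of the magnitude of each coefficient (section 3.4).
scalogram = np.abs(cwt_matrix) ** 2
# scalogram is [[1.0, 4.0], [0.25, 9.0]]
```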

    Q. Do you want a visual breakdown of the CWT process?

    After recognising that it was in fact the CWT that would be used in the software, this question approached the subject of using animation to picture the CWT process, as well as the scalogram showing the results it gives.

    A. Yes. Showing the wavelet as it compresses or dilates to calculate each scale of the CWT

    would be beneficial.

    Q. How do you want the on-screen manipulation to work?

    The initial requirement suggested that the user should be able to use on-screen manipulations to remove scales from the CWT in order to re-compute the inverse transform. However, it is unclear how this could be done, especially in MATLAB, where graphics are more limited than in

    Java, say.

    A. It was suggested that the user could directly manipulate the scalogram by using some sort of

    cursor. The cursor could move up and down the scalogram and a window would show the

    CWT coefficients at that scale. The cursor could then be used to select certain scales to be

    removed.

    Q. Would you like to be able to play back the sound waves?

    A. Yes. You need to be able to play the original sound wave that is to be analysed, and also

    you need to hear the effects of the re-synthesis of the scalogram using the ICWT.

    Q. Do you want to be able to actively control the scaling and translation coefficients of

    the wavelet used?

    The CWT coefficients are calculated at every scale by using the scaling coefficient in the

    mother wavelet formula (see section 2.2). The number of rows in the CWT matrix depends on

    the number of scales calculated. Also, the number of columns in the CWT matrix depends on

    the size of the steps the wavelet takes as it moves along the signal between each coefficient

    calculation. The steps are controlled by the translation coefficient. Altering the values that

    these coefficients take will change the number of CWT coefficients to be calculated, which

    may be a useful property.

    A. Yes. You could use a slider.
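As a rough numerical illustration of the point above, assume one row per integer scale up to some maximum, and one column per position of the wavelet as it steps along the signal. Both the sizing rule and the function name below are assumptions made for this sketch, not code from the tool:

```python
import math

def cwt_matrix_shape(signal_length, max_scale, translation_step):
    # Hypothetical sizing rule: scales 1..max_scale give the rows; the
    # number of wavelet positions at step size translation_step gives
    # the columns.
    rows = max_scale
    cols = math.floor((signal_length - 1) / translation_step) + 1
    return rows, cols

cwt_matrix_shape(1000, 64, 1)  # (64, 1000)
cwt_matrix_shape(1000, 64, 4)  # (64, 250)
```

Increasing the translation step or reducing the number of scales shrinks the matrix, which is why exposing these parameters (e.g. via a slider) is useful.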


    6.3 The Matlab Auditory Demos

    The software tools being developed alongside this dissertation will form part of the Matlab

    Auditory Demo (MAD) CAL tools. These tools have been developed to aid in the teaching of Computer Speech and Hearing courses at the University of Sheffield. [C 1999]

    Because these MADs are Computer Assisted Learning tools, they need to fulfil the requirements mentioned in chapter 5. The MADs consist of many different tools that enable a

    user to understand many different concepts in the subject of Computer Speech and Hearing.

    They vary from producing spectrogram representations of speech waveforms to the complex

    modelling of the Basilar Membrane in the ear.

    Even though there is a wide variety of software systems comprising the MADs, they all have

    many things in common that enable them to be successful CAL tools. The research into the

    MADs produced the following further requirements.

    Speed

    The software tools that are being developed must be quick enough to allow sufficient user

    interaction and provide meaningful animation. A quick system will maintain the user's

    interest, which will naturally enhance the user's understanding and learning of Wavelets.

    Ease of use

    Research into the MADs showed that all the systems were user-friendly, in such a way that the user could see almost straight away what functionality was available. GUI objects are clearly

    labelled and are only accessible at the appropriate times. Axes are labelled appropriately and

    on-screen instructions appear when appropriate to guide the user. None of the systems are too

    cluttered with too many buttons, sliders, etc., which would only confuse a new user, especially

    if the user is new to the subject being investigated.

    Aesthetics

    It is important that the interface is pleasing to look at. If the system is dull in appearance, the

    user would not be as interested in using the system, so the teaching of the subject would suffer. The system should provide a suitable amount of colour to aid in the user's understanding; e.g. the system could be colour-coded so that different components have their own colour. Also, the

    tools being developed should match the appearance of the MADs so that it is clear that they

    belong to that group. The tools should have similar headers, size, background colour etc.

    Input error recovery

    It is important that a CAL tool recovers well from input error. Ideally, the system shouldn't allow

    a wrong sequence of inputs, i.e. by disabling GUI objects, but it is often the case that a user would input something wrong either by mistake or by their lack of understanding about the

    material or the system. It is here that the system should recognise that a user has entered the

    wrong value, and correct the error appropriately.
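One simple way such a correction could work, sketched here in Python purely for illustration (the tool itself handles this in MATLAB), is to round and clamp a user-entered scale number into the valid range rather than rejecting it:

```python
def corrected_scale(raw_value, n_scales):
    # Round non-integer input and clamp it into the valid range 1..n_scales;
    # fall back to the lowest scale if the input isn't numeric at all.
    try:
        value = round(float(raw_value))
    except (TypeError, ValueError):
        return 1
    return min(max(value, 1), n_scales)

corrected_scale("7.6", 32)  # 8
corrected_scale(-3, 32)     # 1
corrected_scale("abc", 32)  # 1
```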


    7 : Software Development

    After the Requirements Analysis, it was proposed that two pieces of software would be

    developed. The main one, called the Wavelet Learning Tool (WLT), will be developed and

    documented in detail, and will incorporate all the requirements set out in the previous chapter. The second piece of software will be a basic tool, which will show the different stages in the

    Subband Coding scheme and is called the Subband Learning Tool (SLT).

    7.1 Getting familiar with MATLAB

    In simplest terms, MATLAB is a computer environment for performing calculations. [R 1998]

    MATLAB is a contraction of Matrix Laboratory, and is primarily a convenient tool

    for the manipulation of matrices. Since it was first created, it has added more and more

    functionality and remains a leading tool for scientific computation. While simple problems can be solved interactively with MATLAB, its real power shows when you give it calculations

    that are extremely cumbersome or tedious to do by hand. Because the Continuous Wavelet

    Transform involves constructing a very large matrix, and then displaying that matrix,

    MATLAB seemed to be a natural choice.

    MATLAB also allows graphics to be displayed with ease and with little programming as it

    combines an efficient programming structure with a multitude of pre-defined mathematical

    commands. Therefore, before any software development, it was beneficial to become familiar with

    what MATLAB has to offer in terms of commands and interface construction.

    7.1.1 MATLAB's GUI

    MATLAB provides many different GUI objects to make user interaction as easy as possible. Each object has its own advantages depending on the type of operation. Below is a

    comprehensive list of what is available.

    Push Buttons: The software user can press buttons for instant execution of a particular function.

    Pop-up Menus: Contain a list from which one item can be selected, e.g. a wavelet type. After selection, a process can be initiated.

    Edit Boxes: Used to alter a numerical parameter in a function.

    Check Boxes: Often used when there is an option to have a particular property in the system or not.

    Radio Buttons: Similar to check boxes, except they usually come in pairs where you select one or the other.

    Sliders: Also used to alter a parameter in a function but, unlike edit boxes, a value doesn't need to be known, and the function can update in real time as the slider moves.

    List Boxes: Similar to pop-up menus, except that the whole of the list can't be visualised at once, to save space.


    MATLAB also allows the construction of axes, lines, text, figures and other graphical devices, which

    were taken into consideration when sketching the initial interface design.

    Each GUI object has attached to it a series of handles. These handles define many properties of

    the objects, which are set and controlled by the developer. The properties are set when the

    software is first executed, but can be changed at any time during software development.

    Controlling the handles to the GUI objects controls how the system will look and behave, and how

    efficiently it will run. Therefore, a good understanding of the handles available and how to use them appropriately was an important stage in software development.

    7.1.2 Coding in MATLAB

    MATLAB is a function-based programming language rather than an object-oriented one. Functions call

    other functions, which are executed in a sequential fashion. Each function is read downwards unless a for or while loop or an if clause is encountered.

    The Wavelet Learning Tool will consist of many different functions, which will call each other appropriately. The MATLAB system will have to cope with the flow of control between these

    different functions, which are written in m-files, which can be called at any time within a

    program. However, keeping track of all these m-files can be very difficult and looks very untidy. The solution to this is the case statement. Instead of having many different m-files

    containing the different functions of the system, it is better practice to place all the functions

    into one m-file separated by a case statement.

    Figure 7.1.2.1

    Every time the m-file containing all the switch statements is called, it must be called with an

    appropriate argument corresponding to which case is to be read.

    Another advantage to this case switching approach is that you can have an initialisation case

    where all the global parameters can be set when the system is first executed.
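The same single-entry-point idea can be sketched outside MATLAB as well. The Python fragment below mirrors the pattern of one file dispatching on an action argument; the action names and the `state` dictionary (standing in for MATLAB global variables) are illustrative assumptions, not the tool's actual cases:

```python
state = {}  # stands in for MATLAB's global variables

def wlt(action, *args):
    # One entry point, dispatching on the action string, like a MATLAB
    # m-file built around a single switch/case statement.
    if action == "initialise":
        state["signal"] = []             # set global parameters once
    elif action == "load_signal":
        state["signal"] = list(args[0])  # store the loaded waveform
    elif action == "signal_length":
        return len(state["signal"])      # placeholder for real processing
    else:
        raise ValueError("unknown action: " + action)

wlt("initialise")
wlt("load_signal", [0.0, 0.5, 1.0])
wlt("signal_length")  # 3
```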

    Matlab programming revolves around designing separate functions to do separate jobs, and

    then plugging them into the overall system to interact with the interface. This plug-in nature of

    programming allows each function to be tested independently before being encapsulated into

    the system. Most of the functions being developed involve matrix manipulation and can be

    tested simply by running them on the Matlab command window to see if they produce correct

    results. Some functions need not be tested separately from the system if their sole purpose is to

    just change global variables or alter the interface in some way. The functions that are

    constructed from mathematical knowledge followed the sequence of development shown in

    figure 7.1.2.2.

    [Figure 7.1.2.1: separate m-files (CWT, LoadSignal, ZoomIn) replaced by a single m-file containing a switch statement with a case for each function: case CWT, case Load Signal, case Zoom in, end.]


    Figure 7.1.2.2

    One of the most useful GUI object handles is the enable handle. This allows the software

    developer to control when a user can use a particular GUI object. To stop the wrong sequence

    of inputs into the system, a particular object can be disabled at a time when it shouldn't be

    used. These restrictions will make the software more secure and more likely to recover from

    user input error.

    7.2 Wavelet Learning Tool - Interface Design

    The main system to be developed is the one identified in the requirements analysis, the

    Wavelet Learning Tool (WLT). This tool will allow users to choose different wavelets to

    compute a CWT and corresponding scalogram. A side by side comparison of different

    wavelets was recommended, as was a way of manipulating the scalogram to produce a re-

    synthesised signal from an Inverse CWT function.

    Matlab allows you to construct an interface very simply and effectively with very little code. It

    is very beneficial to design the interface in the early stages of development, as it will aid

    in the understanding of what functionality is needed in the final system. Knowing which GUI

    objects were available, an initial sketch of the interface was made and presented for comment.

    Figure 7.2.1

    The sketch shown in figure 7.2.1 is an early representation of the proposed look of the software tool. The original idea was to have the following GUI objects.

    [Figure 7.1.2.2: development sequence for the mathematically derived functions: mathematical formula; code into MATLAB; test separately on the command window; introduce appropriate interface code and handle commands; plug into the system using the case clause; test the system.]


    5 graphs: a plot of the loaded signal; 2 for showing the animated wavelet as it changes shape to calculate the different scales of the scalogram; a plot of the CWT coefficients from the selected scale of a scalogram; and a plot of the reprocessed signal.

    2 scalogram images, each with their own cursor.

    Pop-up menus for selecting a wavelet.

    Zoom controls for zooming in on the plot of CWT coefficients.

    Process buttons for re-processing the CWT.

    Play buttons for playing back the original and altered signals.

    7.3 The coding of The Wavelet Learning Tool

    Section 7.2 shows how an initial interface was designed to accommodate the requirements

    established in the requirements analysis. This initial interface was then coded into MATLAB with

    the help of the Guide tool. The Guide tool is a Matlab interface construction tool, which allows

    a user to code an interface quickly and effectively without the tedious task of setting all the

    handles to each GUI object. Although the Guide tool writes the interface code for you, it is in a

    format which didn't suit the interactive nature of the system; therefore it was only used to set the basic handles such as position and colour, whilst other handles such as call-backs were

    coded by hand.

    Having developed the interface first, it was then logical to develop the main functions, which would be called by the user via the interface. There are two main functions: the Continuous

    Wavelet Transform (CWT) and the Inverse Continuous Wavelet Transform (ICWT).

    This section describes the main coding developments that were undertaken during the coding of the Wavelet Learning Tool. By no means is this a fully comprehensive description of all the

    coding, as it does not include full details e.g. of global parameters or handle properties. To see

    a full development log, see appendix C.


    7.3.1 Coding the CWT

    The requirements state that the user must be able to compare the effects of using different

    wavelets with the CWT. The interface contains pop-up menus, which contain a list of the

    different wavelets available for the computation. The main job of the CWT function is to

    calculate the CWT coefficients at each scale of the wavelet transform and to plot these into a

    CWT matrix. Chapter 3 described how this matrix could then be used to produce a scalogram

    of the CWT by squaring each coefficient independently. The scalogram will have time on the x-axis and scale on the y-axis.

    There are three separate CWT functions, one for each wavelet (Gauss, Morlet and Sombrero).

    Each calculates the CWT matrix and sets this as a global variable. The CWT matrix is

    calculated in the usual way by changing the shape of the wavelet at each scale of the

    transform. The CWT functions first calculate how many scales to compute depending on the length of the signal. Recalling from chapter 3, as the scale value increases, the frequency

    decreases and therefore the wavelet needs to dilate. So that the user can see the final shape of the

    wavelet, the CWT function calculates the largest scale first and then compresses the wavelet

    until the scale=1. The different shapes of the wavelet are stored in another matrix, which is

    also set as a global variable for other functions to use.
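The computation described above can be sketched compactly. The Python/NumPy fragment below is a naive illustration under stated assumptions (a Mexican hat, i.e. "sombrero", wavelet; integer scales; unit translation steps); it is not the tool's MATLAB code:

```python
import numpy as np

def sombrero(t):
    # Mexican hat ("sombrero") wavelet: second derivative of a Gaussian.
    return (1.0 - t**2) * np.exp(-(t**2) / 2.0)

def cwt(signal, scales, wavelet=sombrero):
    # Naive CWT: one row per scale, one column per translation position.
    n = len(signal)
    coeffs = np.zeros((len(scales), n))
    t = np.arange(n)
    for i, s in enumerate(scales):
        for b in range(n):  # slide the wavelet along the signal
            w = wavelet((t - b) / s) / np.sqrt(s)
            coeffs[i, b] = np.dot(signal, w)
    return coeffs

sig = np.sin(2 * np.pi * np.arange(128) / 16.0)  # a simple test tone
C = cwt(sig, range(1, 33))                       # scales 1..32
scalogram = np.abs(C) ** 2                       # squared magnitudes
```

Each row of `scalogram` corresponds to what the tool plots when the cursor selects a particular scale.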


    When the user selects a wavelet from a pop-up menu, either the function waveselecta or waveselectb is called, depending on which pop-up menu is executed. The waveselecta and

    waveselectb functions are the main functions in the WLT program. These functions control all

    the data handling and graphics handling of the system when a new CWT is calculated. Each

    function controls its own part of the system, either A or B, which corresponds to which pop-up

    menu is selected. The interface is split into A and B to allow for the side-by-side comparisons.

    Both functions were developed simultaneously as they both contain similar functionality. The

    functions were carefully developed to provide the following sequence of operations:

    The function reads which wavelet has been selected,

    then calls the appropriate CWT function.

    (The CWT function calculates the CWT matrix and the matrix of wavelet shapes; see above.)

    A graph is animated with the different forms that the wavelet takes at each scale of the CWT.

    The function then presents the CWT matrix as a scalogram.

    The scalogram's cursor is drawn onto scale number 2.

    The PlotCWT function is called.

    The above list gives a summary of the functionality within the two main functions of the

    system, namely waveselecta and waveselectb. These functions were the first to be developed

    after the coding of the CWT function, as there needed to be a way of calling the function with

    different arguments.

    The summary states that after the displaying of the scalogram, the main function then calls a

    function called PlotCWT. This function plots a graph of the CWT coefficients from a selected

    scale of the scalogram. The scale is selected by the cursor and, as the cursor is re-drawn after

    the construction of a new scalogram, the CWTPlot function needs to be called to update the

    plot. The CWTPLot function is also called every time the cursor is moved, to update the plot