158
Aalborg Universitet Sparse Nonstationary Gabor Expansions - with Applications to Music Signals Ottosen, Emil Solsbæk DOI (link to publication from Publisher): 10.5278/VBN.PHD.ENG.00041 Publication date: 2018 Document Version Publisher's PDF, also known as Version of record Link to publication from Aalborg University Citation for published version (APA): Ottosen, E. S. (2018). Sparse Nonstationary Gabor Expansions - with Applications to Music Signals. Aalborg Universitetsforlag. Ph.d.-serien for Det Ingeniør- og Naturvidenskabelige Fakultet, Aalborg Universitet https://doi.org/10.5278/VBN.PHD.ENG.00041 General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. ? Users may download and print one copy of any publication from the public portal for the purpose of private study or research. ? You may not further distribute the material or use it for any profit-making activity or commercial gain ? You may freely distribute the URL identifying the publication in the public portal ? Take down policy If you believe that this document breaches copyright please contact us at [email protected] providing details, and we will remove access to the work immediately and investigate your claim. Downloaded from vbn.aau.dk on: December 05, 2020

Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

  • Upload
    others

  • View
    10

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Aalborg Universitet

Sparse Nonstationary Gabor Expansions - with Applications to Music Signals

Ottosen, Emil Solsbæk

DOI (link to publication from Publisher):10.5278/VBN.PHD.ENG.00041

Publication date:2018

Document VersionPublisher's PDF, also known as Version of record

Link to publication from Aalborg University

Citation for published version (APA):Ottosen, E. S. (2018). Sparse Nonstationary Gabor Expansions - with Applications to Music Signals. AalborgUniversitetsforlag. Ph.d.-serien for Det Ingeniør- og Naturvidenskabelige Fakultet, Aalborg Universitethttps://doi.org/10.5278/VBN.PHD.ENG.00041

General rightsCopyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright ownersand it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

? Users may download and print one copy of any publication from the public portal for the purpose of private study or research. ? You may not further distribute the material or use it for any profit-making activity or commercial gain ? You may freely distribute the URL identifying the publication in the public portal ?

Take down policyIf you believe that this document breaches copyright please contact us at [email protected] providing details, and we will remove access tothe work immediately and investigate your claim.

Downloaded from vbn.aau.dk on: December 05, 2020

Page 2: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen
Page 3: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

EMIL SO

LSBÆ

K O

TTOSEN

SPAR

SE NO

NSTATIO

NA

RY GA

BO

R EXPA

NSIO

NS

SPARSE NONSTATIONARYGABOR EXPANSIONS

WITH APPLICATIONS TO MUSIC SIGNALS

BYEMIL SOLSBÆK OTTOSEN

DISSERTATION SUBMITTED 2018

Page 4: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen
Page 5: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Sparse NonstationaryGabor Expansions

with Applications to Music Signals

Ph.D. DissertationEmil Solsbæk Ottosen

Dissertation submitted February 12, 2018

Page 6: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Dissertation submitted: February 12, 2018

PhD supervisor: Prof. Morten Nielsen Dept. of Mathematical Sciences Aalborg University

PhD committee: Professor Horia Cornean (Chairman) Aalborg University

Associate Professor Jakob Lemvig Technical University of Denmark

Professor Hans G. Feichtinger University of Vienna

PhD Series: Faculty of Engineering and Science, Aalborg University

Department: Department of Mathematical Sciences

ISSN (online): 2446-1636 ISBN (online): 978-87-7210-153-8

Published by:Aalborg University PressLangagervej 2DK – 9220 Aalborg ØPhone: +45 [email protected]

© Copyright: Emil Solsbæk Ottosen

Printed in Denmark by Rosendahls, 2018

Page 7: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Preface

The work presented in this thesis was carried out at the Department of Math-ematical Sciences, Aalborg University, within the period February 2015 toFebruary 2018. The Ph.D. project has been supervised by professor MortenNielsen from the Department of Mathematical Sciences, Aalborg University.Parts of the research was accomplished during a research stay at the Facultyof Mathematics, University of Vienna, within the period September 2016 toDecember 2016. During the research stay, the Ph.D. project was supervisedby Dr. Monika Dörfler from the research group NuHAG (Numeric HarmonicAnalysis Group).

The central part of this thesis consists of a collection of four scientific pa-pers. Three of these papers have been published in peer reviewed journalswhile the remaining one is undergoing review at the time of submission.The first part of the thesis gives an introduction to time-frequency analy-sis and provides some background material necessary for understanding theproblems considered in the scientific papers. The second part of the the-sis presents the four scientific papers as self-contained articles with separateabstracts, introductions, bibliographies and so on.

First of all, I would like to thank my supervisor Morten Nielsen for in-troducing me to the field of time-frequency analysis and for his guidancethroughout the Ph.D. project. I would also like to thank Monika Dörfler,and the other members of NuHAG, for an excellent and constructive stay atthe University of Vienna. My colleges at the Department of MathematicalSciences also deserves recognition for their help, especially Søren Vilsen forassistance on various computer problems. Finally, I would like to thank myfriends and family for their support during the Ph.D. project.

Emil Solsbæk OttosenAalborg University, February 12, 2018

iii

Page 8: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Preface

iv

Page 9: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Thesis Details

Thesis title: Sparse Nonstationary Gabor Expansions— with Applications to Music Signals

PhD student: Emil Solsbæk OttosenDept. of Mathematical SciencesAalborg University

PhD Supervisor: Prof. Morten NielsenDept. of Mathematical SciencesAalborg University

The thesis is based on the following four scientific papers:

(A) E. S. Ottosen and M. Nielsen. Weighted Thresholding and NonlinearApproximation. Submitted to Indian Journal of Pure and Applied Mathe-matics, 2017.

(B) E. S. Ottosen and M. Nielsen. A Characterization of Sparse Nonstation-ary Gabor Expansions. Accepted for publication in Journal of FourierAnalysis and Applications, 2017. Available: https://link.springer.com/article/10.1007%2Fs00041-017-9546-6.

(C) E. S. Ottosen and M. Nielsen. Nonlinear Approximation with Nonsta-tionary Gabor Frames. Accepted for publication in Advances in Com-putational Mathematics, 2017. Available: https://doi.org/10.1007/s10444-017-9577-1

(D) E. S. Ottosen and M. Dörfler. A Phase Vocoder Based on NonstationaryGabor Frames. Published in IEEE/ACM Transactions on Audio, Speech,and Language Processing, vol 25, no. 11, pp. 2199–2208, 2017. Available:http://ieeexplore.ieee.org/document/8031036/.

v

Page 10: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Thesis Details

vi

Page 11: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

English Summary

In this thesis we consider sparseness properties of classical Gabor expansionsand adaptive nonstationary Gabor expansions. A classical Gabor expansiondecomposes a signal into a convergent sum of time-frequency (TF) localizedatoms, which are constructed as TF-shifts of a single fixed window function.In contrast, nonstationary Gabor expansions allow for the usage of multi-ple window functions to obtain an adaptive TF resolution. Both classicaland nonstationary Gabor expansions have proven to be extremely useful forrepresenting music signals as they provide sparse and structured TF repre-sentations. Sparseness of a TF representation is desirable since it reduces thecomputational cost involved in processing the expansion coefficients. Also,the sparseness property allows for efficient approximations of the signal bythresholding the associated expansion coefficients. This kind of approxima-tion belongs to the area of approximation theory known as nonlinear approx-imation.

In paper A of this thesis we consider sparseness properties of classical Ga-bor expansions in the general framework of nonlinear approximation theory.The main contribution of this paper is an upper bound (a Jackson inequal-ity) on the approximation error occurring when thresholding the expansioncoefficients with respect to certain weight functions. These weight functionsseek to exploit the coherence between expansion coefficients which is notaccounted for in traditional greedy algorithms.

Nonstationary Gabor frames extend the concept of classical Gabor framesby allowing for a flexible TF resolution along either the time or the frequencyaxis. In paper B of this thesis we use decomposition spaces to characterizethose signals which permit sparse expansions with respect to certain nonsta-tionary Gabor frames with flexible frequency resolution. In paper C we con-sider a similar approach for nonstationary Gabor frames with flexible timeresolution and provide a numerical analysis of the associated approximationrate. Finally, in paper D we consider a practical application and constructa new time-stretching algorithm based on the theory of nonstationary Ga-bor frames. Time-stretching is the task of modifying the length of a signalwithout affecting its frequencies.

vii

Page 12: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

English Summary

viii

Page 13: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Dansk Resumé

I denne afhandling undersøger vi sparseness egenskaber ved klassiske Gaborudviklinger og adaptive ikke-stationære Gabor udviklinger. En klassisk Ga-bor udvikling nedbryder et signal til en konvergent sum af tids-frekvens (TF)lokaliserede atomer, der er konstruerede som TF-skift af en enkelt vindues-funktion. I modsætning hertil så anvender ikke-stationære Gabor udviklingerflere vinduesfunktioner til at opnå en adaptiv TF opløsning. Både klassiskeog ikke-stationære Gabor udviklinger har vist sig at være særligt nyttige tilat repræsentere musiksignaler, da de producerer sparse og strukturerede TFrepræsentationer. Sparseness af en TF repræsentation er ønskværdigt, da detnedsætter mængden af computerkraft, der skal bruges til at bearbejde koef-ficienterne. Desuden medfører sparseness egenskaben, at man effektivt kanapproksimere signalet ved at thresholde de tilhørende koefficienter. Dennetype approksimation tilhører det felt af approksimationsteorien, der kendessom ikke-linear approksimation.

I artikel A af denne afhandling undersøger vi sparseness egenskaber vedklassiske Gabor udviklinger i det generelle framework, som benyttes i ikke-lineær approksimationsteory. Hovedresultatet af denne artikel er en øvregrænse (en Jackson ulighed) på den approximationsfejl, der opstår, når manthresholder koefficienterne mht. visse vægtfunktioner. Disse vægtfunktionerforsøger at udnytte sammenhængen mellem koefficienterne, hvilket der al-mindeligvis ikke tages højde for i traditionelle grådige algoritmer.

Ikke-stationære Gabor frames generaliserer klassiske Gabor frames vedat tillade en fleksibel TF opløsning langs enten tids- eller frekvensaksen.I artikel B af denne afhandling bruger vi dekompositionsrum til at karak-terisere de signaler, der har sparse udviklinger mht. ikke-stationære Gaborframes med fleksibel opløsning i frekvens. I artikel C benytter vi en lignendetilgang for ikke-stationære Gabor frames med fleksibel opløsning i tid oginkluderer desuden en numeriske analyse af den tilhørende approksimation-srate. Til sidst, i artikel D, ser vi på en praktisk anvendelse og konstruereren ny time-stretching algoritme baseret på teorien om ikke-stationære Gaborframes. Time-stretching er anvendelsen, hvor man modificerer længden af etsignal uden at ændre ved dets frekvenser.

ix

Page 14: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Dansk Resumé

x

Page 15: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Contents

Preface iii

Thesis Details v

English Summary vii

Dansk Resumé ix

I Introduction 1

Introduction 31 Frame theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 Gabor frames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 Nonstationary Gabor frames . . . . . . . . . . . . . . . . . . . . 6

3.1 NSGFs in the time domain . . . . . . . . . . . . . . . . . 73.2 NSGFs in the frequency domain . . . . . . . . . . . . . . 8

4 Time-frequency analysis . . . . . . . . . . . . . . . . . . . . . . . 94.1 Stationary Gabor analysis . . . . . . . . . . . . . . . . . . 114.2 Nonstationary Gabor analysis . . . . . . . . . . . . . . . 16

5 Nonlinear approximation with frames . . . . . . . . . . . . . . . 226 Connection with papers A-D . . . . . . . . . . . . . . . . . . . . 25References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

II Papers 35

A Weighted Thresholding and Nonlinear Approximation 371 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392 Elements from nonlinear approximation theory . . . . . . . . . 413 Weighted thresholding . . . . . . . . . . . . . . . . . . . . . . . . 434 Modulation spaces and Gabor frames . . . . . . . . . . . . . . . 485 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . . 49

xi

Page 16: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Contents

6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

B A Characterization of Sparse Nonstationary Gabor Expansions 571 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 592 Decomposition spaces . . . . . . . . . . . . . . . . . . . . . . . . 61

2.1 Structured covering and BAPU . . . . . . . . . . . . . . . 612.2 Definition of decomposition spaces . . . . . . . . . . . . 63

3 Nonstationary Gabor frames . . . . . . . . . . . . . . . . . . . . 654 Decomposition spaces based on nonstationary Gabor frames . 695 Characterization of decomposition spaces . . . . . . . . . . . . . 726 Banach frames for decomposition spaces . . . . . . . . . . . . . 757 Application to nonlinear approximation theory . . . . . . . . . 77A Proof of Theorem 2.1 . . . . . . . . . . . . . . . . . . . . . . . . . 79References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

C Nonlinear Approximation with Nonstationary Gabor Frames 851 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 872 Decomposition spaces . . . . . . . . . . . . . . . . . . . . . . . . 893 Nonstationary Gabor frames . . . . . . . . . . . . . . . . . . . . 924 Characterization of decomposition spaces . . . . . . . . . . . . . 955 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . . 99

5.1 Single experiment . . . . . . . . . . . . . . . . . . . . . . 1005.2 Large scale experiment . . . . . . . . . . . . . . . . . . . 102

6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103A Proof of Theorem 2.1 . . . . . . . . . . . . . . . . . . . . . . . . . 104References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

D A Phase Vocoder Based on Nonstationary Gabor Frames 1111 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

1.1 State of the art . . . . . . . . . . . . . . . . . . . . . . . . 1142 Discrete Gabor theory . . . . . . . . . . . . . . . . . . . . . . . . 1163 The phase vocoder . . . . . . . . . . . . . . . . . . . . . . . . . . 119

3.1 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1193.2 Modification . . . . . . . . . . . . . . . . . . . . . . . . . . 1203.3 Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1213.4 Drawbacks . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

4 A phase vocoder based on nonstationary Gabor frames . . . . . 1224.1 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1224.2 Modification . . . . . . . . . . . . . . . . . . . . . . . . . . 1244.3 Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1274.4 Advances . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

5 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131

xii

Page 17: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Contents

5.1 Synthetic signals . . . . . . . . . . . . . . . . . . . . . . . 1315.2 Real world signals . . . . . . . . . . . . . . . . . . . . . . 133

6 Conclusion and perspectives . . . . . . . . . . . . . . . . . . . . 135References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136

xiii

Page 18: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Contents

xiv

Page 19: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Part I

Introduction

1

Page 20: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen
Page 21: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Introduction

This thesis investigates sparseness properties of time-frequency (TF) repre-sentations obtain with classical Gabor frames and adaptive nonstationary Ga-bor frames. In particular, the theory is relevant for analyzing music signalsas these signals tend to produce sparse TF representations when expandedin a (nonstationary) Gabor dictionary. In this first part of the thesis we givean introduction to TF analysis in the general framework provided by frametheory. Both classical Gabor frames and nonstationary Gabor frames are (asthe words suggest) special cases of frames and it therefore make sense toconsider them in a unified framework. To illustrate the concepts from TFanalysis we include a real world example and analyze the associated TF rep-resentations. With this approach, the abstract theory presented in the papersin the second part of the thesis will hopefully be easier to grasp as many ofthe deep theoretical concepts take their roots in practical problems from TFanalysis.

1 Frame theory

The main property of a frame gkk∈N in a separable infinite dimensionalHilbert space H is that every f ∈ H has a stable expansion of the formf = ∑k∈N ckgk with ckk∈N ⊂ C. In general, frames are redundant systemswith non-unique expansion coefficients ckk∈N. The notion of frames wasfirst introduced in 1952 by Duffin and Schaeffer [35] who studied the prop-erties of overcomplete families of exponential functions. Much later in 1986,Daubechies, Grossmann, and Meyer [19] revisited the theory and considerednew types of frames not restricted to the framework of nonharmonic Fourierseries. We also mention the work by Young [109], Heil and Walnut [63], andDaubechies [16, 17] as examples of some of the early work done on frametheory. A modern and thorough introduction to frame theory can be foundin the book by Christensen [13].

Formally, a sequence gkk∈N ⊂ H is a frame for H if there exist frame

3

Page 22: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

bounds A, B > 0 such that

A ‖ f ‖2H ≤ ∑

k∈N

|〈 f , gk〉|2 ≤ B ‖ f ‖2H , ∀ f ∈ H.

If A = B then the frame is said to be tight. It follows from Parseval’s identitythat any orthonormal basis is a tight frame with A = B = 1. For a givenframe gkk∈N ⊂ H, we define the associated frame operator S : H → H by

S f = ∑k∈N

〈 f , gk〉 gk, f ∈ H.

The frame property implies that S is a bounded, invertible, self-adjoint, andpositive operator, which permits a dual frame gkk∈N := S−1gkk∈N whichis also a frame but with frame-bounds B−1, A−1 > 0 [13, 55, 61, 86]. It followsthat every f ∈ H has a frame expansion of the form

f = SS−1 f = ∑k∈N

⟨S−1 f , gk

⟩gk = ∑

k∈N

〈 f , gk〉 gk,

with unconditional convergence in H. Similarly, f possesses an expansionwith respect to the dual frame gkk∈N in the sense that

f = S−1S f = ∑k∈N

〈 f , gk〉 S−1gk = ∑k∈N

〈 f , gk〉 gk. (1)

We choose to work mainly with the expansion given in (1). The frame coeffi-cients 〈 f , gk〉k∈N can be shown to minimize the `2−norm among all possi-ble reconstruction coefficients ckk∈N satisfying f = ∑k∈N ck gk, see [109].

For a tight frame we note that 〈S f , f 〉 = ∑k∈N |〈 f , gk〉|2 = A‖ f ‖2, whichimplies S = AI with I denoting the identity operator on H. Hence, for anorthonormal basis we obtain S = I and the frame expansion in (1) reduces tothe well-known unique reconstruction formula f = ∑k∈N〈 f , gk〉gk. However,for many practical purposes it is beneficial to consider redundant frameswhere the reconstruction coefficients are not uniquely determined [14, 27, 95].

2 Gabor frames

In this section we consider an important kind of frames for L2(Rd) known asGabor frames. These frames are named after D. Gabor [52] who in 1946 con-sidered a new approach for signal decomposition using TF localized atoms.These decompositions was later studied by Janssen [70, 71] in the early 1980sand was combined with frame theory by Daubechies, Grossmann, and Meyer[19] in their paper from 1986. Further studies on the fundamental proper-ties of Gabor frames were performed in the late 1980s by Feichtinger and

4

Page 23: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

2. Gabor frames

Gröchenig [40, 42, 45, 46], in the early 1990s by Daubechies [16] and Heiland Walnut [63, 106], and later in the 1990s by several other authors (see[20, 73, 96] and references therein). We also mention the two books [47, 48]by Feichtinger and Strohmer which contain surveys on various topics of Ga-bor analysis.

Gabor frames are based on two classes of unitary operators on L2(Rd),namely translation and modulation. For α, β ∈ Rd we define the translationoperator Tα : L2(Rd)→ L2(Rd) and the modulation operator Mβ : L2(Rd)→L2(Rd) by

Tα f (x) = f (x− α) and Mβ f (x) = f (x)e2πiβ·x.

Given lattice parameters a, b > 0, and a fixed window function g ∈ L2(Rd),we define the Gabor system gm,nm,n∈Zd := MmbTnagm,n∈Zd . If the sys-tem constitutes a frame for L2(Rd) then it is referred to as a Gabor frame.Explicitly written, the frame elements of a Gabor frame are given by

gm,n(x) = g(x− na)e2πimb·x, m, n ∈ Zd,

and the associated frame operator by

S f = ∑m,n∈Zd

〈 f , gm,n〉 gm,n, f ∈ L2(Rd).

By cumbersome calculations it can be shown that S (and consequently S−1)commutes with translation and modulation [55]. The dual frame of gm,nm,nis therefore given by

gm,nm,n = S−1gm,nm,n = MmbTnaS−1gm,n = MmbTna gm,n,

with g := S−1g denoting the so-called dual window of g. The fact that thedual frame is also a Gabor frame is a special property of Gabor frames notshared by general frames. The frame expansions in (1) take the form

f = ∑m,n∈Zd

〈 f , gm,n〉 gm,n, f ∈ L2(Rd).

The Gabor frames we consider here are often referred to as regular Gaborframes since the sampling points na, mbm,n∈Zd form a separable latticeaZd × bZd in R2d. The lattice parameters a, b > 0 determine the densityof the Gabor frame, which is usually divided into three cases:

1. ab > 1: This is called undersampling and implies that gm,nm,n cannotform a frame for L2(Rd). This was proved for the rational case ab ∈ Q

by Daubechies [16] and for the general case by Baggett [5] (see also thepaper by Janssen [72]).

5

Page 24: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

2. ab = 1: This is called critical sampling. In this case, gm,nm,n is a framefor L2(Rd) if and only if it is also a Riesz basis, i.e., a basis of the formUekk∈N with ekk∈N being an orthonormal basis and U a boundedbijective operator on L2(Rd), see [13]. Additionally, if gm,nm,n is aframe for L2(Rd) then the Balian-Low theorem states that g cannot bewell localized in both time and frequency [7, 9, 84]. We elaborate furtheron this point in Section 4.1.

3. ab < 1: This is called oversampling. In this case, if gm,nm,n is a framefor L2(Rd) then it cannot be a Riesz basis. A frame which is not aRiesz basis is called redundant since there exist coefficients cm,nm,n in`2 \ 0 with ∑m,n cm,ngm,n = 0, see [13].

The original expansion considered by D. Gabor in 1946 was in the crit-ical case with ab = 1 and g(x) = e−x2/2 being the Gaussian function [52].This expansion has later been shown to be unstable as the Gaussian windowposses the special property that the corresponding Gabor system gm,nm,nis a frame for L2(Rd) if and only if ab < 1. This result was first conjecturedby Daubechies and Grossmann [18] in 1988 and later proven in 1992 inde-pendently by Lyubarskiı [85] and Seip and Wallstén [99, 100]. For a generalwindow function g ∈ L2(Rd) it is difficult to find an exact range of parame-ters a, b > 0 guaranteeing that gm,nm,n is a frame for L2(Rd) [56, 75, 81]. It istherefore important to note that the terminology introduced above is slightlymisleading. Even with heavy oversampling ab < 1 we are not guaranteedthat gm,nm,n forms a frame for L2(Rd).

We now consider a simple construction of Gabor frames, which is ofgreat practical importance. Let a, b > 0 and assume g ∈ L2(Rd) satisfiessupp(g) ⊆ [0, 1/b]d. With G(x) := b−d ∑n∈Zd |g(x− na)|2, the frame opera-tor for gm,nm,n turns out to be the multiplication operator [19]

S f (x) = G(x) f (x), f ∈ L2(Rd).

Consequently, gm,nm,n is a frame for L2(Rd) with frame bounds A, B > 0 ifand only if

A ≤ G(x) ≤ B, for a.e. x ∈ Rd.

Furthermore, if gm,nm,n is a frame for L2(Rd) then the dual frame is givenby gm,n(x) = G−1(x)gm,n(x). Such frames are traditionally referred to aspainless nonorthogonal expansions [19]. To be consistent with the notationin Section 3 we choose to simply call them painless Gabor frames.

3 Nonstationary Gabor frames

One of the shortcomings of classical Gabor frames is the stationarity resultingfrom applying only one window function. A straightforward generalization

6

Page 25: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

3. Nonstationary Gabor frames

of the theory is to choose a countable set of window functions gnn∈Zd ⊂L2(Rd), with corresponding sampling parameters bnn∈Zd ⊂ R+, and thendefine the system gm,nm,n∈Zd by

gm,n(x) = Mmbn gn(x) = gn(x)e2πimbn ·x, m, n ∈ Zd. (2)

We obtain classical Gabor systems by choosing, for all n ∈ Zd, gn := Tnagand bn := b with g ∈ L2(Rd) and a, b > 0.

The generalized Gabor systems defined in (2) were originally studied byHernández, Labate, and Weiss [64] and later by Ron and Shen [97] under thename of generalized-shift invariant systems (see also [13, 69, 74, 80]). Techni-cally, generalized-shift invariant systems are obtained by applying a Fouriertransform to the systems in (2), thereby producing the equivalent systemsdefined in Section 3.2. The systems in (2) are defined in the time domainwhereas the systems in Section 3.2 are defined in the frequency domain. Theterm nonstationary Gabor frames (NSGFs) was introduced by Jaillet [68] forframes of the form (2) and further used by Balazs et al. [6], Holighaus [65],and Dörfler and Matusiak [33, 34]. We choose to work with the terminologyof [68] to emphasize the connection with classical Gabor frames. Likewise,we will often refer to a classical Gabor frame as a stationary Gabor frame.

The important paper [6] by Balazs, Dörfler, Jaillet, Holighaus and Velascocontains the first practical implementations of NSGFs. This paper is accom-panied by a Matlab toolbox, which provides source code for construction NS-GFs in both the time domain and the frequency domain. The main author ofthis toolbox is Holighaus and the source code applies routines from the LargeTime-Frequency Analysis Toolbox (LTFAT), which was originally founded bySøndergaard [101] and is currently maintained by Pruša [94]. The practi-cal implementations presented in [6] shows that NSGFs can be used to createfast adaptive TF representations, which for certain signal classes outperformsclassical (stationary) Gabor frames.

Finally, it should be noted that NSGFs can be considered a special case ofthe more general concept of multi-window Gabor frames [30, 108, 110]. How-ever, so far NSGFs are the only realization of multi-window Gabor frameswhich permit perfect reconstruction and a fast implementation based on theFFT [83]. These properties makes NSGFs particular interesting from a practi-cal point of view.

3.1 NSGFs in the time domain

We first describe NSGFs in the time domain, i.e., NSGFs of the form (2). Theframe operator for a NSGF gm,nm,n is defined by

S f = ∑m,n∈Zd

〈 f , gm,n〉 gm,n, f ∈ L2(Rd),

7

Page 26: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

and the resulting frame expansions take the form

f = ∑m,n∈Zd

〈 f , gm,n〉 S−1gm,n, f ∈ L2(Rd).

In general, the dual frame S−1gm,nm,n does not produce a new NSGFas is the case for stationary Gabor frames [65]. However, by generalizingthe painless condition to the nonstationary case we do obtain dual frameswith this property. Let bnn∈Zd ⊂ R+ and assume gnn∈Zd ⊂ L2(Rd)satisfy supp(gn) ⊆ [0, 1/bn]d + an with an ∈ Rd for all n ∈ Zd. WithG(x) := ∑n∈Zd b−d

n |gn(x)|2, the frame operator for gm,nm,n is the multi-plication operator [6, 19]

S f (x) = G(x) f (x), f ∈ L2(Rd).

It follows that gm,nm,n is a frame for L2(Rd) with frame bounds A, B > 0 ifand only if

A ≤ G(x) ≤ B, for a.e. x ∈ Rd.

If gm,nm,n is a frame for L2(Rd) then the dual frame is given by

gm,n(x) = Mmbn

gn(x)G(x)

, x ∈ Rd.

We note that this result completely generalizes the painless case for stationaryGabor frames. For this reason we refer to the system gm,nm,n as a painlessNSGF [6].

3.2 NSGFs in the frequency domain

As mentioned in the beginning of Section 3, NSGFs have an equivalent im-plementation in the frequency domain. We use the following normalizationfor the Fourier transform

f (ξ) =∫

Rdf (x)e−2πix·ξdx, f ∈ L1(Rd),

which by standard arguments extends to a unitary operator on L2(Rd), see[82]. Given hmm∈Zd ⊂ L2(Rd) and amm∈Zd ⊂ R+, we define the systemhm,nm,n∈Zd by

hm,n(x) = Tnam hm(x) = hm(x− nam), m, n ∈ Zd.

In complete analogy with the painless condition in the time domain, we de-fine H(ξ) := ∑m∈Zd a−d

m |hm(ξ)|2 and assume supp(hm) ⊆ [0, 1/am]d + bmwith bm ∈ Rd for all m ∈ Zd. The frame operator is then given by [6, 19]

S f (x) = (F−1H ∗ f )(x), f ∈ L2(Rd),

8

Page 27: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

4. Time-frequency analysis

which corresponds to a multiplication operator in the frequency domain.Consequently, hm,nm,n is a frame for L2(Rd) with frame bounds A, B > 0 ifand only if

A ≤ H(ξ) ≤ B, for a.e. ξ ∈ Rd.

If hm,nm,n is a frame for L2(Rd) then the dual frame is given by

hm,n(x) = TnamF−1(

H−1hm

)(x), x ∈ Rd.

In the next section we consider applications of stationary Gabor frames andNSGFs in connection with TF analysis.

4 Time-frequency analysis

The purpose of TF analysis is to combine the information of a signal f ∈L2(Rd) and its Fourier transform f ∈ L2(Rd) into a 2d−dimensional TF rep-resentation V(x, ξ) containing information about the frequencies ξ occurringat time x. A common analogy for a TF representation is the musical score [21].In Fig. 1 we see the musical score for the first 4 bars of a piece of piano music.

Fig. 1: First 4 bars of "Moanin" by Bobby Timmons.

A musician reads the musical score from left to right and the verticalposition of each note determines the pitch of the note. A note on a pianocorresponds to a fundamental frequency and overtones with frequencies thatare integer multiples of the fundamental frequency. It is the presence ofovertones that determines the particular timbre of the instrument. The firstnote of the musical score in Fig. 1 is F4 which has a fundamental frequencyof 350 Hz and overtones of frequencies 700 Hz, 1050 Hz, 1400 Hz, and soon. Suppose the piano music has been recorded with a microphone. Thisproduces a signal f (x) describing the changes over time in air pressure asshown in Fig. 2.

We might be able to determine some rhythmical patterns or maybe eventhe tempo ( ˇ “ = 126) by analyzing f (x) directly but we cannot say anythingabout the melody. On the other hand, calculating the (normalized) values of| f (ξ)| we obtain the plot in Fig. 3.

9

Page 28: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

0 1 2 3 4 5 6 7Time (s)

-1

0

1

Am

plit

ude

Fig. 2: Time-domain plot of the piano signal.

0 100 200 300 400 500 600 700 800 900 1000Frequency (Hz)

0

0.01

0.02

0.03

Pow

er

Fig. 3: Frequency-domain plot of the piano signal.

By studying Fig. 3 we might conclude that the most dominating frequencyis 350 Hz. Since this frequency corresponds the fundamental frequency of F4we might further deduce that the key of the music piece is F (major or minor)which is correct. Hence, analyzing f (x) and f (ξ) separately we can deter-mine some global properties of the music (such as tempo and key) but weare unable to say anything about its melody. In some sense, this problemis counterintuitive. Even though f (x) and f (ξ) contain all possible informa-tion of the signal, neither representation provides the relevant information.For this reason we want to construct a TF representation which imitates themusical score and provides a description of the frequencies as a function oftime. The connection between TF analysis and music has been studied byseveral authors [2, 4, 29, 92] and has applications within areas such as trans-position [32], transcription [76], beat tracking [89], and compression [57]. Ournotation is mainly inspired by the book of Gröchenig [55], which provides anintroduction to TF analysis in the framework of Gabor theory.

In order to obtain local TF information we apply a smooth cutoff functiong called a window function. Multiplying the signal and the window functionwe obtain a segment of the signal. If we then take the Fourier transform ofthis segment we obtain a description of the frequencies of the signal occur-ring within the segment. This procedure is known as the short-time Fouriertransform (STFT) [1, 8, 55]. Formally, the STFT of f ∈ L2(Rd) with respect to

10

Page 29: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

4. Time-frequency analysis

g ∈ L2(Rd) \ 0 is defined as

Vg f (x, ξ) =∫

Rdf (t)g(t− x)e−2πit·ξ dt, x, ξ ∈ Rd.

For practical purposes we want to consider a discretization of the STFT andthis is where Gabor frames come into play.

4.1 Stationary Gabor analysis

The connection between the STFT and Gabor frames is straightforward. Letf ∈ L2(Rd) and assume gm,nm,n∈Zd is a stationary Gabor frame for L2(Rd).We can rewrite the frame coefficients of f as

〈 f , gm,n〉 = 〈 f , MmbTnag〉 = Vg f (na, mb), m, n ∈ Zd.

Hence, the frame coefficients are samples of the STFT along the lattice aZd ×bZd of the TF plane. Likewise we may write the Gabor expansion of f as

f = ∑m,n∈Zd

Vg f (na, mb)gm,n.

The TF resolution provided by the STFT depends crucially on the choice ofwindow function. To obtain a good time resolution the window functionneeds to be well localized in time. As a first attempt, one might thereforechoose g = χ[0,1]d , with χ[0,1]d denoting the indicator function on the cube[0, 1]d. The Fourier transform of g is then given by

g(ξ) =∫[0,1]d

e−2πix·ξ dx =d

∏k=1

1− e−2πiξk

2πiξk.

With d = 1 we get the plots shown in Fig. 4.

-10 -5 0 5 10

0

0.5

1

Indicator Function

-10 -5 0 5 10

0

0.5

1

Fourier Transform

Fig. 4: Plots of g = χ[0,1] and (the real part of) g.

The plot of g in Fig. 4 reveals that the Fourier transform of g = χ[0,1]d

decays slowly. This produces a new problem for our analysis as the STFT

11

Page 30: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

then provides a poor frequency resolution. One way of realizing this is thefollowing: Using the facts that Tx f = M−x f and Mξ f = Tξ f , we can applyPlancherel’s theorem to obtain the identity

Vg f (x, ξ) = 〈 f , Mξ Txg〉 = 〈 f , Tξ M−x g〉= 〈 f , e2πix·ξ M−xTξ g〉 = e−2πix·ξVg f (ξ,−x), f ∈ L2(Rd). (3)

This identity is often referred to as the fundamental identity of TF analysis[55]. We can think of the expression on the right hand-side of (3) as a rotationof the TF plane by 90 degrees. It follows that if g decays slowly then thefrequency resolution of the STFT becomes poor. Therefore, g = χ[0,1]d is notwell suited as window function. It turns out that the problem noticed forg is due to a fundamental property of functions known as the uncertaintyprinciple.

The uncertainty principle

The uncertainty principle states roughly that g and g cannot both be sup-ported on arbitrary small sets. The exact formulation of the principle can bestated in many different forms [51, 59] and we have chosen to present theformulation of Donoho and Stark [28].

We say that f ∈ L2(Rd) is ε−concentrated on a measurable set T ⊆ Rd

if ‖χTc f ‖2 ≤ ε‖ f ‖2 with Tc denoting the complement of T. In particular, ifε ∈ [0, 1/2) then T is called the essential support of f . We note that if ε = 0then supp( f ) ⊆ T. Assume f ∈ L2(Rd) \ 0 is εT−concentrated on T ⊆ Rd

and f is εΩ−concentrated on Ω ⊆ Rd. The uncertainty principle then states

|T| |Ω| ≥ (1− εT − εΩ)2.

As a corollary, if supp( f ) ⊆ T and supp( f ) ⊆ Ω then |T||Ω| ≥ 1. Under theseadditional assumptions, another version of the uncertainty principle [3, 10]further states that ∞ > |T||Ω| implies f = 0.

The uncertainty principle has a particularly interesting consequence forGabor frames known as the Balian-Low theorem [7, 84]. This result reveals acrucial disadvantage of sampling at the critical density, i.e. ab = 1, and thusexplains why frames are more appropriate than orthonormal bases for Gaboranalysis. Let W0(R

d) denote the Wiener space [37, 60, 107] of continuousfunctions h ∈ L∞(Rd) with ∑n∈Zd ‖h · Tnχ[0,1]d‖∞ < ∞. The Balian-Lowtheorem states that if Mma−1 Tnagm,n∈Zd is a Gabor frame (and thus a Rieszbasis) for L2(Rd) then both g /∈ W0(R

d) and g /∈ W0(Rd) [9, 62]. Hence, it is

not possible to construct a Gabor frame, sampled at the critical density, usinga window function that is well localized in both time and frequency.

The uncertainty principle implies that the idea of simultaneous time andfrequency information is unobtainable. It is not possible to construct an

12

Page 31: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

4. Time-frequency analysis

ideal TF representation which contains exact information of the frequen-cies occurring at each time instant. In practice one instead uses a win-dow function such that both g and g are decaying rapidly, for instance aSchwartz function g ∈ S(Rd). In particular, one can choose the Gaussiang(x) = e−‖x‖

22/2 ∈ S(Rd) as shown in Fig. 5.

-10 -5 0 5 10

0

0.5

1

Gaussian

-10 -5 0 5 10

0

0.5

1

Fourier Transform

Fig. 5: Plots of g(x) = e−x2/2 (to the left) and g(x) =√

2πe−2π2x2(to the right).

The Fourier transform of a Gaussian ϕa(x) := e−π‖x‖22/a, with a > 0, is

simply given by another Gaussian ϕa(ξ) = ad/2 ϕ1/a(ξ), see [55]. The Gaus-sian also posses the important property that it minimizes the uncertainty inthe Heisenberg-Pauli-Weyl inequality (a classical formulation of the uncer-tainty principle) [15]. As the resolution of the STFT depends on the choice ofwindow function, the study of suitable window classes are of great impor-tance [36, 55, 79]. However, it is outside the scope of this introduction to gointo further details.

The Gabor expansion as a sum of building blocks

In this section we take a more intuitive approach for understanding the (sta-tionary) Gabor expansions

f = ∑m,n∈Zd

Vg f (na, mb)gm,n, f ∈ L2(Rd). (4)

We will assume that the window function is a Schwartz function g ∈ S(Rd)such that both g and g decay rapidly. The dual window g = S−1g does notnecessarily have the same shape as g [44, 73, 104] but it can be shown that

lim(a,b)→(0,0)

1ab

g = g.

This result was originally proved by Feichtinger and Zimmermann in [49].Hence, as the sampling density increases the dual window starts to resemblethe original window. This is illustrated in Fig. 6 for the special case when gis a Gaussian.

13

Page 32: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

ab=0.8 ab=0.5 ab=0.2

Fig. 6: Dual windows of the Gaussian with increasing density.

Let us now analyze the structure of the dual frame gm,nm,n by investi-gating the actions of MmbTna on g. In Fig. 7 we have plotted a translated and(the real part of) a modulated version of the Gaussian.

GaussianTranslated Gaussian

GaussianModulated Gaussian

Fig. 7: Translation and (the real part of) modulation of the Gaussian.

Since translation Tna corresponds to a horizontal displacement of the func-tion we often refer to this operator as a time shift. On the other hand, sinceMξ f = Tξ f we refer to modulation as a frequency shift. The compositionMmbTna is called a TF shift and corresponds to a displacement by na in thehorizontal direction and mb in the vertical direction of the TF plane. Hence,we can think of the dual frame MmbTna gm,n as horizontal and vertical dis-placements of g in the TF plane [31]. This is illustrated in Fig. 8.

The Gabor expansion in (4) can therefore be interpreted as a weightedsum of building blocks gm,nm,n centered at the lattice points aZd × bZd inthe TF plane. The weights are samples of the STFT Vg f (na, mb)m,n deter-mining the contribution of the building blocks to the signal [29]. This point ofview can also be found in the more general theory of atomic decompositionsdeveloped by Feichtinger and Gröchenig [42, 45, 46].

14

Page 33: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

4. Time-frequency analysis

Time

Frequency

Time

Frequency

a

b

Fig. 8: TF shift MbTa of g in the TF plane.

The spectrogram

Based on the TF information contained in the (complex valued) Gabor coeffi-cients Vg f (na, mb)m,n we can visualize the TF contents of f by plotting thevalues of |Vg f (na, mb)|2m,n in the TF plane. This produces a TF represen-tation called a spectrogram [86]. In Fig. 9 we have plotted a spectrogram forthe piano music associated with the musical score in Fig. 1.

To produce such a spectrogram one has to go to the finite settings asthe computer can only process vectors of finite lengths. It is outside thescope of this introduction to provide such details and we refer the reader to[91, 101, 103]. In paper D of this thesis we also give a brief summary of Gabortheory in the finite settings. For the associated MATLAB implementation werefer the reader to the toolboxes by Holighaus [6] and Søndergaard [94].

The spectrogram in Fig. 9 has a redundancy of four meaning there arefour times as many Gabor coefficients as signal samples. The 22 vertical linesin the spectrogram correspond to the onsets of the music piece and the hor-izontal lines correspond to the frequencies of the fundamental frequenciesand the overtones. The lengths of the horizontal lines correspond to the du-rations of the notes. We recall that the first note is F4 with a fundamentalfrequency of 350 Hz and overtones of frequencies 700 Hz, 1050 Hz, 1400 Hz,which is reflected in the spectrogram. By varying the length of the windowfunction we can change the resolution of the spectrogram. A shorter win-dow corresponds to an improved time resolution (and worsened frequencyresolution) whereas a longer window corresponds to an improved frequencyresolution (and worsened time resolution), see Fig. 10.

The optimal window size depends on the application. For instance, ashort window might be preferable for determining the tempo whereas a

15

Page 34: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Fig. 9: Spectrogram of the piano music from Fig. 1.

longer window might be better suited for estimating the fundamental fre-quencies [83]. Once the window function and the lattice parameters are cho-sen, the spectrogram provides a TF resolution which is independent of thesignal under consideration. This stationarity can be seen both as an advan-tage and a disadvantage of Gabor frames. On the one hand, the stationarityimplies a fast and easy to handle implementation [101]. On the other hand,it is not possible to assign different TF resolutions to different regions in theTF plane [31]. In particular, it is not possible for the TF resolution to adapt tothe particular characteristics of the signal. The idea behind NSGFs (and themore general multi-window Gabor frames) is to allow the usage of severaldifferent window functions to provide a more flexible TF resolution.

4.2 Nonstationary Gabor analysis

As described in Section 3, NSGFs can be implemented in either the timedomain or the frequency domain. In other words, the TF resolution obtainedthrough NSGFs can vary along either the time or the frequency axis [6]. Thisis a limitation compared to general multi-window Gabor frames where a

16

Page 35: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

4. Time-frequency analysis

Fig. 10: Spectrograms with different window functions.

particular TF resolution can be assigned to any given region of the TF plane[30]. However, the structure of NSGFs implies a fast implementation basedon the FFT which is not the case for general multi-window Gabor frames[83]. In the next two sections we present two practical implementations ofNSGFs, one in the time domain and one in the frequency domain. Bothimplementations were presented in [6] and included in the associated Matlabtoolbox.

Scale frames

We first consider an implementation of NSGFs in the time domain. We recallthat the atoms are given by

gm,n(x) = Mmbn gn(x) = gn(x)e2πimbn ·x, m, n ∈ Zd,

with gnn∈Zd ⊂ L2(Rd) and bnn∈Zd ⊂ R+. For a fixed time point n,the associated frequency sampling parameter bn determines the distance be-tween the vertical sampling points in the TF plane. It is this regularity thatallows for an implementation based on the FFT [6]. Hence, the sampling gridassociated with a NSGF in the time domain is irregular over time but regularover frequency at each fixed time point as shown in Fig. 11.

The algorithm presented in [6] is based on the idea of applying shortwindow functions around the onsets of a music piece and longer windowfunctions between the onsets. The onsets are estimated using a spectral fluxalgorithm which applies a preliminary Gabor transform and calculates thesum of (positive) change in magnitude for all frequency bins [26]. The spacebetween two onsets is spanned in such a way that the window length firstincreases (as we move away from the first onset) and then decreases (as weapproach the second onset). More precisely, with g ∈ C∞

c ([0, 1]) denoting a

17

Page 36: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Time

Frequency

Fig. 11: Sampling grid associated with a NSGF in the time domain.

fixed prototype function, the window functions are given by the dilations

gn(x) :=1√2sn

g( x

2sn

), n ∈ Z.

The parameters snn∈Z constitute an integer-valued scale sequence satisfy-ing |sn − sn−1| ∈ 0, 1. In this way, adjacent window functions are eitherof the same length or one is twice as long as the other. To obtain a low re-dundancy and stable reconstruction, the overlap between adjacent windowfunctions is chosen as 1/3 of the length for equal windows and 2/3 of thelength of the shorter window for different windows. This guarantees a non-zero overlap between adjacent window function and at most two non-zerowindows for any fixed time point. Finally, the numbers of frequency chan-nels are chosen such that the resulting system constitutes a painless NSGF(cf. Section 3.1). Such a NSGF is called a scale frame [6] to emphasize thedependency on the scale sequence snn∈Z. By construction, scale framesapply a redundancy of ≈ 5/3. In Fig. 12 we have plotted a spectrogram,based on a scale frame, for the piano music in Fig. 1.

We note from Fig. 12 how the adaptive behavior of the scale frame pro-duces a spectrogram which is significantly different from the one in Fig. 9.Between each pair of onsets, the horizontal lines become very thin as a conse-quence of the improved frequency resolution. On the other hand, the verticallines associated with the onsets are very sharp due to the good time res-olution. Considering the scale frame applies less than half the number ofcoefficients as used in Fig. 9, the result is very impressive. The reason for thisimpressive performance is do to the characteristics of the signal. Besides forthe chords at the end of bar 2 and bar 4 (cf. Fig. 1), only one note is played ata time and the adaptation procedure therefore correctly detects and separatesall 22 onsets. However, for a more complicated music piece (for instance apiano piece with different melodies played by the left and the right hand), the

18

Page 37: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

4. Time-frequency analysis

Fig. 12: Spectrogram based on a NSGF in the time domain.

onset detection algorithm will have to choose the most "significant" onsets.For onsets not detected by the algorithm, the associated time resolution willbe poor which makes it difficult to determine the correct time instants. Weconsider this problem in more depth in paper D of this thesis.

The constant Q-transform

In this section we consider an implementation of NSGFs in the frequencydomain known as the constant Q-transform [6, 11]. We recall the atoms ofthe frame are given by

hm,n(x) = Tnam hm(x) = hm(x− nam), m, n ∈ Zd,

with hmm∈Zd ⊂ L2(Rd) and amm∈Zd ⊂ R+. The associated samplinggrid is irregular over frequency but regular over time for each fixed frequencychannel as shown in Fig. 13.

At any given point in the TF plane we can define the Q−factor by

Q-factor :=center frequency

windows bandwidth

19

Page 38: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Time

Frequency

Fig. 13: Sampling grid associated with a NSGF in the frequency domain.

For a stationary Gabor expansion, the Q−factor is constant along time but in-creases with frequency. The idea behind the constant Q-transform is to keepthe Q−factor constant over the entire TF plane. To do so one needs to applya NSGF in the frequency domain such that the bandwidths of the windowfunctions increase with frequency. This construction results in a TF repre-sentation with good frequency resolution at the lower frequencies and goodtime resolution at the higher frequencies, similarly to wavelets [17, 58, 87].Such a resolution might be preferable for music signals since it facilitatesthe identification of the fundamental frequencies while keeping a good timeresolution at higher frequencies for determining the locations of the onsets.The constant Q−transform was originally suggested by Brown [11] (in a lessefficient form) and implemented in the framework of NSGFs in [6]. The ef-ficient implementation presented in [6] has subsequently been used for var-ious practical applications such as beat tracking and transposition of musicsignals [32, 66, 67]. In Fig. 14 we have plotted a spectrogram (with dB scaledfrequency axis), based on a constant Q−transform, for the piano music inFig. 1.

The spectrogram in Fig. 14 has a redundancy of ≈ 4 and thus appliesthe same number of coefficients as the spectrogram in Fig. 9. We note the"Christmas tree"-shape of the vertical lines resulting from the varying fre-quency resolution. The fundamental frequencies are easily accessible andthe locations of the associated onsets can be found by looking at the higherfrequencies. However, just as for scale frames, the impressive performance isdue to the simplistic structure of the music piece. For a more advanced pianopiece, the task of determining the locations of the onsets is not at all trivial.Finally, let us note that the constant Q−transform is actually a stationarytransform as the TF resolution is independent of the signal under considera-tion. Hence, the terminology of a NSGF is slightly misleading in this case asthe TF resolution follows a predetermined rule just as for stationary Gabor

20

Page 39: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

4. Time-frequency analysis

Fig. 14: Spectrogram based on a NSGF in the frequency domain (dB scaled frequency axis).

frames.We have presented three different approaches for designing TF represen-

tations of music signals. The stationary Gabor expansion, scale frames andthe constant Q−transform. Each TF representation comes with its own ad-vantages and disadvantages and the best choice of TF representation usuallydepends on the characteristics of the particular signal and the application. Asseen from the spectrograms in Fig. 9, Fig. 12, and Fig. 14, music signals tendto produce structured TF representations. This is because music signals aregenerated from components which are sparse in either time or frequency [93],thereby allowing for horizontal and vertical structures of the spectrograms.We call such spectrograms sparse, since many of the coefficients are closeto zero. The sparseness property allows for powerful approximations thatwith only few non-zero coefficients (compared to the signal length) can pro-duce small reconstruction errors with almost no audible artifacts. In the nextsection we describe this application in more detail.

21

Page 40: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

5 Nonlinear approximation with frames

Let D := gkk∈N be a frame for H with ‖gk‖H = 1 for all k ∈ N. Inapproximation theory, such a frame is often referred to as a dictionary for H.Given a possible complicated function f ∈ H we want to approximate f usinglinear combinations of the simpler functions from D. We call f the targetfunction and the members of spangkk∈N the approximants. We define theset of all linear combinations of at most m ∈N elements from D by

Σm :=

∑k∈∆

ckgk

∣∣∣ ∆ ⊂N, card(∆) ≤ m

.

In general, a sum of two elements from Σm will need 2m terms in its repre-sentation by gkk∈N. Therefore, the set Σm is nonlinear and the associatedapproximation theory is known as nonlinear approximation [23, 24, 53]. Theprocedure of approximating f with members of Σm is called m−term approx-imation with m measuring the complexity of the approximation [90, 98]. Theapproximation error associated with Σm is measured by

σm( f )H = infh∈Σm

‖ f − h‖H , f ∈ H.

We note that the sequence σm( f )Hm∈N is non-increasing since Σm ⊆ Σm+1for all m ∈ N. In other words, we can increase the resolution of the targetfunction by increasing the complexity of the approximants. This trade-offbetween resolution and complexity is the main study of approximation the-ory [22]. For practical purposes, we are especially interested in characterizingfunctions f ∈ H with approximation errors decaying as

σm( f )H ≤ Cm−α, ∀m ∈N, (5)

for some C > 0. Therefore, we define the approximation spaces

Aατ(H) :=

f ∈ H

∣∣∣ ∑m∈N

(mασm( f )H)τ 1

m< ∞

, α, τ ∈ (0, ∞),

together with the associated (quasi-)norms

‖ f ‖Aατ(H) :=

∥∥∥mα−1/τσm( f )Hm∈N

∥∥∥`τ+ ‖ f ‖H , f ∈ Aα

τ(H).

We note that f ∈ Aατ(H) is a slightly stronger condition than (5). We also

note that the approximation spaces Aατ(H) are "large" function spaces, in

particular they contain all the spaces Σmm∈N. One of the goals of nonlin-ear approximation theory is to characterize the elements of Aα

τ(H), thereby

22

Page 41: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

5. Nonlinear approximation with frames

providing a description of the elements f ∈ H which permit good m−termapproximations with respect to D [24].

If D forms an orthonormal basis for H then every f ∈ H has a uniqueexpansion

f = ∑k∈N

〈 f , gk〉 gk, with ‖ f ‖2H = ∑

k∈N

|〈 f , gk〉|2 .

From this one deduces that the best m−term approximation to f is obtainedby choosing an index set ∆m corresponding to m terms in the expansionfor which |〈 f , gk〉| is largest. The associated approximation error is givenby σm( f )H = (∑k/∈∆m | 〈 f , gk〉 |2)1/2. For this special case have a completecharacterization of the approximation space

f ∈ Aατ(H)⇔ ‖〈 f , gk〉k∈N‖`τ < ∞,

with α ∈ (0, ∞) and 0 < τ := (α + 1/2)−1 < 2. This characterization wasproved by Stechkin [102] for the case α = 1/2 and for general α by DeVoreand Temlyakov [25].

When D is a redundant frame it becomes much more difficult to provide acomplete characterization ofAα

τ(H). It is often possible to construct a simplerspace K with K → Aα

τ(H), i.e., K ⊂ Aατ(H) and ‖ f ‖Aα

τ(H) ≤ C ‖ f ‖K forsome C > 0 and all f ∈ K, see [53]. Unfortunately, the converse embeddingis in general very hard to prove for redundant dictionaries [54]. We callK → Aα

τ(H) a Jackson embedding and Aατ(H) → K a Bernstein embedding

[12]. The space K is referred to as a smoothness or sparsity space. We notethat a Jackson embedding provides an upper bound on the approximationerror σm( f )Hm∈N for all f ∈ K whereas a Bernstein embedding provides alower bound.

An important result by Gröchenig and Samarah [57] shows that modula-tion spaces can be used as smoothness spaces for stationary Gabor frames.Modulation spaces were introduced by Feichtinger [38] in 1983 and furtherstudied in [40, 43, 44]. Given 1 ≤ p < ∞ and γ ∈ S(Rd) \ 0, the modulationspace Mp is defined as those f ∈ S ′(Rd) satisfying

‖ f ‖Mp :=(∫

Rd

∫Rd|Vγ f (x, ξ)|p dxdξ

)1/p< ∞.

It can be shown that Mp is independent of the particular choice of windowfunction γ and different choices yield equivalent norms [38, 55]. For a givenGabor frame gm,nm,n∈Zd = MmbTnagm,n∈Zd , the result by Gröchenig andSamarah states that for 1 ≤ τ < p < ∞ then

Mτ → Aατ(Mp), with α = 1/τ − 1/p.

23

Page 42: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Hence, the approximation error associated with f ∈ Mτ ⊆ Mp (cf. [55] for aproof of this embedding) decays as

σm( f )Mp ≤ Cm−α ‖ f ‖Mτ , ∀m ∈N, (6)

for some C > 0 and α = 1/τ − 1/p. To illustrate the applications of thetheory we return to the piano music from Fig. 1. Since music signals arecontinuous signals of finite energy, it make sense to consider them in theframework of modulation spaces. The result in (6) therefore indicates thatwe can expect good approximations of such signals when thresholding theassociated Gabor frame expansions. The spectrogram in Fig. 9 is constructedfrom a Gabor expansion with 655680 Gabor coefficients. Performing hardthresholding and keeping only the 65568 largest coefficients (10% of the totalnumber of coefficients) we obtain the spectrogram in Fig. 15.

Fig. 15: Original and thresholded spectrogram of the piano music from Fig. 1.

The associated reconstruction error is ‖ f − frec‖2/‖ f ‖2 ≈ 0.005 and theresulting sound is almost indistinguishable from the original sound. Some ofthe overtones with high frequencies have been removed, which produces aless colorful timbre of the instrument (one needs good headphones to noticethis). One the other hand, some of the low background noises have beensuppressed, which results in a more clean sound. In conclusion, with only10% of the expansion coefficients we obtain an approximation of the signalwithout almost any decrease in audio quality. It is important to rememberthat it is the sparsity of the signal which implies such convincing results —for less sparse signals we need more than just 10% of the coefficients forobtaining a reasonable audio quality.

We conclude this introduction by separately addressing each of the fourscientific papers and explaining the connection with the theory presented inthe introduction.

24

Page 43: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

6. Connection with papers A-D

6 Connection with papers A-D

In Paper A of this thesis we consider nonlinear approximation with generalredundant dictionaries (not necessarily restricted to frames). The idea is togeneralize the traditional greedy approach [25, 105] for m−term approxima-tion (cf. Fig. 15) by including weight functions in the thresholding proce-dure. For many real world signals the expansions coefficients are correlatedand organized in structured sets [77, 78] (see also Fig. 9) and the threshold-ing procedure should ideally take this dependency into account. The mainresult of this article is a Jackson embedding for the proposed algorithm un-der rather general conditions. As an application we generalize the result byGröchenig and Samarah [57] and provide a numerical comparison betweenthe proposed method and the traditional greedy approach by thresholdingmusic signals expanded in a Gabor dictionary.

In paper B we prove a Jackson embedding for NSGFs in the frequencydomain. Whereas modulation spaces works as smoothness spaces for Gaborframes, we need a more general framework for the nonstationary case. Such aframework is provided by decomposition spaces as introduced by Feichtingerand Gröbner [39, 41]. Decomposition spaces are based on a flexible partitionof the frequency domain compatible with the structure of NSGFs. The mainresult of this article is a characterization of the decomposition spaces in termsof the frame coefficients from the NSGFs.

In paper C we consider an approach similar to that of paper B. We use de-composition spaces on the time side to provide a stability result of Hausdorff-Young type for NSGFs in the time domain and prove an associated Jacksoninequality for specific choices of parameters. It should be noted that decom-position spaces on the time side are significantly different from decomposi-tion spaces on the frequency side.

Finally, in paper D we consider a practical application and construct anew time-stretching algorithm based on NSGFs in the time domain. Time-stretching is the operation of changing the length of a signal without affect-ing its frequencies [88]. The paper presents a classical technique known asthe phase vocoder [50] in the framework of Gabor theory and extends thistechniques to the nonstationary case. The theory is described in the finitesettings and the corresponding algorithm is implemented in MATLAB withassociated source code and sound files available on-line.

25

Page 44: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

References

References

[1] J. B. Allen and L. R. Rabiner. A unified approach to short-time Fourieranalysis and synthesis. Proceedings of the IEEE, 65(11):1558–1564, Nov1977.

[2] J. F. Alm and J. S. Walker. Time-frequency analysis of musical instru-ments. SIAM Rev., 44(3):457–476, 2002.

[3] W. O. Amrein and A. M. Berthier. On support properties of Lp-functions and their Fourier transforms. J. Functional Analysis, 24(3):258–267, 1977.

[4] G. Assayag, H. G. Feichtinger, and J. F. Rodrigues, editors. Mathematicsand music. Springer-Verlag, Berlin, 2002.

[5] L. W. Baggett. Processing a radar signal and representations of thediscrete Heisenberg group. Colloq. Math., 60/61(1):195–203, 1990.

[6] P. Balazs, M. Dörfler, F. Jaillet, N. Holighaus, and G. Velasco. Theory,implementation and applications of nonstationary Gabor frames. J.Comput. Appl. Math., 236(6):1481–1496, 2011.

[7] R. Balian. Un principe d’incertitude fort en théorie du signal ou enmécanique quantique. C. R. Acad. Sci. Paris Sér. II Méc. Phys. Chim. Sci.Univers Sci. Terre, 292(20):1357–1362, 1981.

[8] M. Bastiaans. On the sliding-window representation in digital signalprocessing. IEEE Transactions on Acoustics, Speech, and Signal Processing,33(4):868–873, Aug 1985.

[9] J. J. Benedetto, C. Heil, and D. F. Walnut. Differentiation and the Balian-Low theorem. J. Fourier Anal. Appl., 1(4):355–402, 1995.

[10] M. Benedicks. On Fourier transforms of functions supported on sets offinite Lebesgue measure. J. Math. Anal. Appl., 106(1):180–183, 1985.

[11] J. C. Brown. Calculation of a constant Q spectral transform. Journal ofthe Acoustical Society of America, 89(1):425–434, 1991.

[12] P. L. Butzer and K. Scherer. Jackson and Bernstein-type inequalities forfamilies of commutative operators in Banach spaces. J. ApproximationTheory, 5:308–342, 1972.

[13] O. Christensen. An introduction to frames and Riesz bases. Applied andNumerical Harmonic Analysis. Birkhäuser/Springer, [Cham], secondedition, 2016.

26

Page 45: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

References

[14] C. K. Chui and X. L. Shi. Bessel sequences and affine frames. Appl.Comput. Harmon. Anal., 1(1):29–49, 1993.

[15] M. G. Cowling and J. F. Price. Bandwidth versus time concentration:the Heisenberg-Pauli-Weyl inequality. SIAM J. Math. Anal., 15(1):151–165, 1984.

[16] I. Daubechies. The wavelet transform, time-frequency localization andsignal analysis. IEEE Trans. Inform. Theory, 36(5):961–1005, 1990.

[17] I. Daubechies. Ten lectures on wavelets, volume 61 of CBMS-NSF RegionalConference Series in Applied Mathematics. Society for Industrial and Ap-plied Mathematics (SIAM), Philadelphia, PA, 1992.

[18] I. Daubechies and A. Grossmann. Frames in the Bargmann space ofentire functions. Comm. Pure Appl. Math., 41(2):151–164, 1988.

[19] I. Daubechies, A. Grossmann, and Y. Meyer. Painless nonorthogonalexpansions. J. Math. Phys., 27(5):1271–1283, 1986.

[20] I. Daubechies, H. J. Landau, and Z. Landau. Gabor time-frequencylattices and the Wexler-Raz identity. J. Fourier Anal. Appl., 1(4):437–478,1995.

[21] N. G. de Bruijn. Uncertainty principles in Fourier analysis. Inequalities,pages 57–71, 1967.

[22] R. A. DeVore. Nonlinear approximation. In Acta numerica, 1998, vol-ume 7 of Acta Numer., pages 51–150. Cambridge Univ. Press, Cam-bridge, 1998.

[23] R. A. DeVore. Nonlinear approximation and its applications. In Mul-tiscale, nonlinear and adaptive approximation, pages 169–201. Springer,Berlin, 2009.

[24] R. A. DeVore and G. G. Lorentz. Constructive approximation, volume 303of Grundlehren der Mathematischen Wissenschaften [Fundamental Principlesof Mathematical Sciences]. Springer-Verlag, Berlin, 1993.

[25] R. A. DeVore and V. N. Temlyakov. Some remarks on greedy algo-rithms. Advances in Computational Mathematics, 5(1):173–187, Dec 1996.

[26] S. Dixon. Onset detection revisited. In Proc. of the Int. Conf. on DigitalAudio Effects (DAFx-06), pages 133–137, Montreal, Quebec, Canada, sep2006.

[27] D. L. Donoho and X. Huo. Uncertainty principles and ideal atomic de-composition. IEEE Transactions on Information Theory, 47(7):2845–2862,Nov 2001.

27

Page 46: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

References

[28] D. L. Donoho and P. B. Stark. Uncertainty principles and signal recov-ery. SIAM J. Appl. Math., 49(3):906–931, 1989.

[29] M. Dörfler. Time-frequency analysis for music signals: A mathematicalapproach. Journal of New Music Research, 30(1):3–12, 2001.

[30] M. Dörfler. Quilted Gabor frames—a new concept for adaptive time-frequency representation. Adv. in Appl. Math., 47(4):668–687, 2011.

[31] M. Dörfler. What time-frequency analysis can do to music signals(and what it can’t do. . .). In Mathematics and culture. III, pages 39–51.Springer, Heidelberg, 2012.

[32] M. Dörfler, N. Holighaus, T. Grill, and G. Velasco. Constructing aninvertible constant-Q transform with nonstationary Gabor frames. Inn Proceedings of the 14th International Conference on Digital Audio Effects(DAFx 11), 2011.

[33] M. Dörfler and E. Matusiak. Nonstationary Gabor frames—existenceand construction. Int. J. Wavelets Multiresolut. Inf. Process., 12(3):1450032,18, 2014.

[34] M. Dörfler and E. Matusiak. Nonstationary Gabor frames—approximately dual frames and reconstruction errors. Adv. Comput.Math., 41(2):293–316, 2015.

[35] R. J. Duffin and A. C. Schaeffer. A class of nonharmonic Fourier series.Trans. Amer. Math. Soc., 72:341–366, 1952.

[36] H. G. Feichtinger. On a new Segal algebra. Monatsh. Math., 92(4):269–289, 1981.

[37] H. G. Feichtinger. Banach convolution algebras of Wiener type. InFunctions, series, operators, Vol. I, II (Budapest, 1980), volume 35 of Col-loq. Math. Soc. János Bolyai, pages 509–524. North-Holland, Amsterdam,1983.

[38] H. G. Feichtinger. Modulation spaces on locally compact abeliangroups. Technical report, University of Vienna, 1983.

[39] H. G. Feichtinger. Banach spaces of distributions defined by decompo-sition methods. II. Math. Nachr., 132:207–237, 1987.

[40] H. G. Feichtinger. Atomic characterizations of modulation spacesthrough Gabor-type representations. Rocky Mountain J. Math.,19(1):113–125, 1989.

28

Page 47: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

References

[41] H. G. Feichtinger and P. Gröbner. Banach spaces of distributions de-fined by decomposition methods. I. Math. Nachr., 123:97–120, 1985.

[42] H. G. Feichtinger and K. Gröchenig. A unified approach to atomic de-compositions via integrable group representations. In Function spacesand applications (Lund, 1986), volume 1302 of Lecture Notes in Math.,pages 52–73. Springer, Berlin, 1988.

[43] H. G. Feichtinger and K. Gröchenig. Gabor wavelets and the Heisen-berg group: Gabor expansions and short time Fourier transform fromthe group theoretical point of view. In Wavelets, volume 2 of WaveletAnal. Appl., pages 359–397. Academic Press, Boston, MA, 1992.

[44] H. G. Feichtinger and K. Gröchenig. Gabor frames and time-frequencyanalysis of distributions. J. Funct. Anal., 146(2):464–495, 1997.

[45] H. G. Feichtinger and K. H. Gröchenig. Banach spaces related to in-tegrable group representations and their atomic decompositions. I. J.Funct. Anal., 86(2):307–340, 1989.

[46] H. G. Feichtinger and K. H. Gröchenig. Banach spaces related to in-tegrable group representations and their atomic decompositions. II.Monatsh. Math., 108(2-3):129–148, 1989.

[47] H. G. Feichtinger and T. Strohmer, editors. Gabor analysis and algorithms.Applied and Numerical Harmonic Analysis. Birkhäuser Boston, Inc.,Boston, MA, 1998. Theory and applications.

[48] H. G. Feichtinger and T. Strohmer, editors. Advances in Gabor analysis.Applied and Numerical Harmonic Analysis. Birkhäuser Boston, Inc.,Boston, MA, 2003.

[49] H. G. Feichtinger and G. Zimmermann. A Banach space of test func-tions for Gabor analysis. In Gabor analysis and algorithms, Appl. Numer.Harmon. Anal., pages 123–170. Birkhäuser Boston, Boston, MA, 1998.

[50] J. L. Flanagan and R. M. Golden. Phase vocoder. The Bell System Tech-nical Journal, 45(9):1493–1509, Nov 1966.

[51] G. B. Folland and A. Sitaram. The uncertainty principle: a mathemati-cal survey. J. Fourier Anal. Appl., 3(3):207–238, 1997.

[52] D. Gabor. Theory of communication. J. IEE, 93(26):429–457, Nov. 1946.

[53] R. Gribonval and M. Nielsen. Nonlinear approximation with dictionar-ies. I. Direct estimates. J. Fourier Anal. Appl., 10(1):51–71, 2004.

29

Page 48: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

References

[54] R. Gribonval and M. Nielsen. Nonlinear approximation with dictionar-ies. II. Inverse estimates. Constr. Approx., 24(2):157–173, 2006.

[55] K. Gröchenig. Foundations of time-frequency analysis. Applied and Nu-merical Harmonic Analysis. Birkhäuser Boston, Inc., Boston, MA, 2001.

[56] K. Gröchenig. The mystery of Gabor frames. J. Fourier Anal. Appl.,20(4):865–895, 2014.

[57] K. Gröchenig and S. Samarah. Nonlinear approximation with localFourier bases. Constr. Approx., 16(3):317–331, 2000.

[58] A. Grossmann and J. Morlet. Decomposition of Hardy functions intosquare integrable wavelets of constant shape. SIAM J. Math. Anal.,15(4):723–736, 1984.

[59] V. Havin and B. Jöricke. The uncertainty principle in harmonic analysis,volume 28 of Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Resultsin Mathematics and Related Areas (3)]. Springer-Verlag, Berlin, 1994.

[60] C. Heil. An introduction to weighted Wiener amalgams. Wavelets andtheir Applications (Chennai, January 2002), pages 183–216, 2003.

[61] C. Heil. A basis theory primer. Applied and Numerical Harmonic Anal-ysis. Birkhäuser/Springer, New York, expanded edition, 2011.

[62] C. E. Heil. Wiener amalgam spaces in generalized harmonic analysis andwavelet theory. PhD thesis, University of Maryland, College Park, 1990.

[63] C. E. Heil and D. F. Walnut. Continuous and discrete wavelet trans-forms. SIAM Rev., 31(4):628–666, 1989.

[64] E. Hernández, D. Labate, and G. Weiss. A unified characterization ofreproducing systems generated by a finite family. II. J. Geom. Anal.,12(4):615–662, 2002.

[65] N. Holighaus. Structure of nonstationary Gabor frames and their dualsystems. Appl. Comput. Harmon. Anal., 37(3):442–463, 2014.

[66] N. Holighaus, M. Dörfler, G. A. Velasco, and T. Grill. A framework forinvertible, real-time constant-Q transforms. IEEE Transactions on Audio,Speech, and Language Processing, 21(4):775–785, April 2013.

[67] A. Holzapfel, G. A. Velasco, N. Holighaus, M. Dörfler, and A. Flexer.Advantages of nonstationary Gabor transforms in beat tracking. InProceedings of the 1st International ACM Workshop on Music InformationRetrieval with User-centered and Multimodal Strategies, MIRUM ’11, pages45–50, New York, NY, USA, 2011. ACM.

30

Page 49: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

References

[68] F. Jaillet. Représentation et traitement temps-fréquence des signaux audion-umériques pour des applications de design sonore. PhD thesis, Universitéd’Aix-Marseille II. Faculté des sciences, 2005.

[69] M. S. Jakobsen and J. Lemvig. Reproducing formulas for generalizedtranslation invariant systems on locally compact abelian groups. Trans.Amer. Math. Soc., 368(12):8447–8480, 2016.

[70] A. J. E. M. Janssen. Gabor representation of generalized functions. J.Math. Anal. Appl., 83(2):377–394, 1981.

[71] A. J. E. M. Janssen. Bargmann transform, Zak transform, and coherentstates. J. Math. Phys., 23(5):720–731, 1982.

[72] A. J. E. M. Janssen. Signal analytic proofs of two basic results on latticeexpansions. Appl. Comput. Harmon. Anal., 1(4):350–354, 1994.

[73] A. J. E. M. Janssen. Duality and biorthogonality for Weyl-Heisenbergframes. J. Fourier Anal. Appl., 1(4):403–436, 1995.

[74] A. J. E. M. Janssen. The duality condition for Weyl-Heisenberg frames.In Gabor analysis and algorithms, Appl. Numer. Harmon. Anal., pages33–84. Birkhäuser Boston, Boston, MA, 1998.

[75] A. J. E. M. Janssen. Zak transforms with few zeros and the tie. InAdvances in Gabor analysis, Appl. Numer. Harmon. Anal., pages 31–70.Birkhäuser Boston, Boston, MA, 2003.

[76] A. Klapuri and M. Davy, editors. Signal Processing Methods for MusicTranscription. Springer, New York, 1st edition, 2006.

[77] M. Kowalski. Sparse regression using mixed norms. Appl. Comput.Harmon. Anal., 27(3):303–324, 2009.

[78] M. Kowalski and B. Torrésani. Sparsity and persistence: mixed normsprovide simple signal models with dependent coefficients. Signal, Imageand Video Processing, 3(3):251–264, Sep 2009.

[79] W. Kozek. Matched generalized Gabor expansion of nonstationary pro-cesses. In Proceedings of 27th Asilomar Conference on Signals, Systems andComputers, volume 1, pages 499–503, Nov 1993.

[80] G. Kutyniok and D. Labate. The theory of reproducing systems onlocally compact abelian groups. Colloq. Math., 106(2):197–220, 2006.

[81] J. Lemvig. On some Hermite series identities and their applications toGabor analysis. Monatsh. Math., 182(4):899–912, 2017.

31

Page 50: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

References

[82] E. H. Lieb and M. Loss. Analysis, volume 14 of Graduate Studies inMathematics. American Mathematical Society, Providence, RI, secondedition, 2001.

[83] M. Liuni, A. Röbel, E. Matusiak, M. Romito, and X. Rodet. Automaticadaptation of the time-frequency resolution for sound analysis and re-synthesis. IEEE Transactions on Audio, Speech, and Language Processing,21(5):959–970, May 2013.

[84] F. Low. Complete sets of wave packets. In C. DeTar, editor, A Passion forPhysics - Essay in Honor of Geoffrey Chew, pages 17–22. World Scientific,Singapore, 1985.

[85] Y. I. Lyubarskiı. Frames in the Bargmann space of entire functions. InEntire and subharmonic functions, volume 11 of Adv. Soviet Math., pages167–180. Amer. Math. Soc., Providence, RI, 1992.

[86] S. Mallat. A wavelet tour of signal processing. Elsevier/Academic Press,Amsterdam, third edition, 2009. The sparse way, With contributionsfrom Gabriel Peyré.

[87] Y. Meyer. Wavelets. Society for Industrial and Applied Mathematics(SIAM), Philadelphia, PA, 1993. Algorithms & applications, Translatedfrom the French and with a foreword by Robert D. Ryan.

[88] E. Moulines and J. Laroche. Non-parametric techniques for pitch-scaleand time-scale modification of speech. Speech Commun., 16(2):175–205,feb 1995.

[89] M. Müller. Fundamentals of Music Processing: Audio, Analysis, Algorithms,Applications. Springer, New York, 1st edition, 2015.

[90] K. I. Oskolkov. Polygonal approximation of functions of two variables.Mat. Sb. (N.S.), 107(149)(4):601–612, 639, 1978.

[91] G. E. Pfander. Gabor frames in finite dimensions. In Finite frames,Appl. Numer. Harmon. Anal., pages 193–239. Birkhäuser/Springer,New York, 2013.

[92] W. J. Pielemeier, G. H. Wakefield, and M. H. Simoni. Time-frequencyanalysis of musical signals. Proceedings of the IEEE, 84(9):1216–1230, Sep1996.

[93] M. D. Plumbley, T. Blumensath, L. Daudet, R. Gribonval, and M. E.Davies. Sparse representations in audio and music: From coding tosource separation. Proceedings of the IEEE, 98(6):995–1005, June 2010.

32

Page 51: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

References

[94] Z. Pruša, P. L. Søndergaard, N. Holighaus, C. Wiesmeyr, and P. Bal-azs. The Large Time-Frequency Analysis Toolbox 2.0. In M. Aramaki,O. Derrien, R. Kronland-Martinet, and S. Ystad, editors, Sound, Mu-sic, and Motion, Lecture Notes in Computer Science, pages 419–442.Springer International Publishing, 2014.

[95] Z. Pruša and N. Holighaus. Phase vocoder done right. In 2017 25thEuropean Signal Processing Conference (EUSIPCO), pages 976–980, Aug2017.

[96] A. Ron and Z. Shen. Weyl-Heisenberg frames and Riesz bases inL2(Rd). Duke Math. J., 89(2):237–282, 1997.

[97] A. Ron and Z. Shen. Generalized shift-invariant systems. Constr. Ap-prox., 22(1):1–45, 2005.

[98] E. Schmidt. Zur Theorie der linearen und nichtlinearen Integralgle-ichungen. Math. Ann., 63(4):433–476, 1907.

[99] K. Seip. Density theorems for sampling and interpolation in theBargmann-Fock space. I. J. Reine Angew. Math., 429:91–106, 1992.

[100] K. Seip and R. Wallstén. Density theorems for sampling and interpola-tion in the Bargmann-Fock space. II. J. Reine Angew. Math., 429:107–113,1992.

[101] P. L. Søndergaard. Finite discrete Gabor analysis. PhD thesis, TechnicalUniversity of Denmark, 2007.

[102] S. B. Stechkin. On absolute convergence of orthogonal series. Dokl.Akad. Nauk SSSR (N.S.), 102:37–40, 1955.

[103] T. Strohmer. Numerical algorithms for discrete Gabor expansions. InGabor analysis and algorithms, Appl. Numer. Harmon. Anal., pages 267–294. Birkhäuser Boston, Boston, MA, 1998.

[104] T. Strohmer. Approximation of dual Gabor frames, window decay, andwireless communications. Appl. Comput. Harmon. Anal., 11(2):243–262,2001.

[105] V. N. Temlyakov. Greedy approximation. Acta Numer., 17:235–409, 2008.

[106] D. F. Walnut. Continuity properties of the Gabor frame operator. J.Math. Anal. Appl., 165(2):479–504, 1992.

[107] N. Wiener. Tauberian theorems. Ann. of Math. (2), 33(1):1–100, 1932.

33

Page 52: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

References

[108] P. J. Wolfe, S. J. Godsill, and M. Dörfler. Multi-Gabor dictionaries foraudio time-frequency analysis. In Proceedings of the 2001 IEEE Work-shop on the Applications of Signal Processing to Audio and Acoustics (Cat.No.01TH8575), pages 43–46, 2001.

[109] R. M. Young. An introduction to nonharmonic Fourier series, volume 93of Pure and Applied Mathematics. Academic Press, Inc. [Harcourt BraceJovanovich, Publishers], New York-London, 1980.

[110] M. Zibulski and Y. Y. Zeevi. Analysis of multiwindow Gabor-typeschemes by frame methods. Appl. Comput. Harmon. Anal., 4(2):188–221,1997.

34

Page 53: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Part II

Papers

35

Page 54: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen
Page 55: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper A

Weighted Thresholding and NonlinearApproximation

Emil Solsbæk Ottosen and Morten Nielsen

The paper has been submitted toIndian Journal of Pure And Applied Mathematics

Page 56: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

c© 2017 SpringerThe layout has been revised.

Page 57: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

1. Introduction

Abstract

We present a new method for performing nonlinear approximation with redundantdictionaries. The method constructs an m−term approximation of the signal bythresholding with respect to a weighted version of its canonical expansion coeffi-cients, thereby accounting for dependency between the coefficients. The main resultis an associated strong Jackson embedding, which provides an upper bound on thecorresponding reconstruction error. To complement the theoretical results, we com-pare the proposed method to the pure greedy method and the Windowed-Group Lassoby denoising music signals with elements from a Gabor dictionary.

1 Introduction

Let X be a Banach space equipped with a norm ‖ · ‖X . We consider the prob-lem of approximating a possibly complicated function f ∈ X using linearcombinations of simpler functions D := gkk∈N. We assume D forms a com-plete dictionary for X such that ‖gk‖X = 1, for all k ∈ N, and spangkk∈N

is dense in X. A natural way of performing the approximation is to constructan m-term approximation fm to f using a linear combination of at most melements from D [22, 27, 32]. This leads us to consider the set

Σm(D) :=

∑k∈∆

ckgk

∣∣∣ ∆ ⊂N, #∆ ≤ m

, m ∈N.

We note that Σm(D) is nonlinear since a sum of two elements from Σm(D)will in general need 2m terms in its representation by gkk∈N. We measurethe approximation error associated to Σm(D) by

σm( f ,D)X := infh∈Σm(D)

‖ f − h‖X , f ∈ X. (A.1)

One of the main challenges of nonlinear approximation theory is to charac-terize the elements f ∈ X, which have a prescribed rate of approximationα > 0 [10, 21, 29]. This is usually done by defining an approximation spaceA ⊆ X with the property

σm( f ,D)X = O(m−α

), ∀ f ∈ A. (A.2)

It is often difficult to characterize the elements of A directly and a standardapproach is therefore to construct a simpler space K ⊆ X together with acontinuous embedding K → A. The space K is referred to as a smoothnessor sparseness space and the continuous embedding as a Jackson embedding[2, 11].

39

Page 58: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper A.

In the special case gkk∈N forms an orthonormal basis for a Hilbert spaceH, it follows from Parseval’s identity that the best m−term approximation tof is obtained by thresholding the (unique) expansion coefficients and keepingonly the m largest coefficients. For a redundant dictionary D, the problem ofconstructing the best m−term approximation is in general computationallyintractable [7]. For this reason, various algorithms have been constructed toproduce fast and good (but not necessarily the best) m−term approximations[6, 12, 19]. By "good approximations" we mean an algorithm Am : f → fm forwhich there exists an approximation space T ⊆ X with

‖ f − fm‖X = O(m−α

), ∀ f ∈ T .

An associated embedding of the type K → T is referred to as a strongJackson embedding. Whereas a standard Jackson embedding only shows thatthe error of best m-term approximation decays as in (A.2), a strong Jacksonembedding also provides an associated constructive algorithm with this rateof approximation [19].

Traditionally, the expansion coefficients are processed individually withan implicit assumption of independence between the coefficients [12]. How-ever, for many signals such an assumption is not valid as the coefficientsare often correlated and organized in structured sets [23, 25]. For instance,it is known that many music signals are generated by components, whichare sparse in either time or frequency [30], resulting in sparse and struc-tured time-frequency representations (cf. Fig. A.1 on page 50). In this articlewe present a thresholding algorithm, which incorporates a weight functionto account for dependency between expansion coefficients. The main resultis given in Theorem 3.1 and provides a strong Jackson embedding for theproposed method. The idea of exploiting the structure of the expansion co-efficients is somewhat similar to the one found in social sparsity [24, 26, 33].Social sparsity can be seen as a generalization of the classical Lasso [38] (alsoknown as basis pursuit denoising [4]) where a weighted neighborhood ofeach coefficient is considered for deciding whether or not to keep the coeffi-cient. However, whereas social sparsity searches through the dictionary forsparse reconstruction coefficients, the purposed method considers weightedthresholding of the canonical frame coefficients. We compare the proposedalgorithm to the Windowed-Group-Lasso (WGL) [25, 34] from social sparsityand the greedy thresholding approach [12, 19] from nonlinear approximationtheory by denoising music signals expanded in a Gabor dictionary [5, 20].

The structure of this article is as follows. In Section 2 we introduce thenecessary tools from nonlinear approximation theory and in Section 3 wepresent the proposed algorithm and prove Theorem 3.1. In Section 4 weprovide the link to modulation spaces and Gabor frames and in Section 5we provide the numerical experiments. Finally, in Section 6 we give theconclusions.

40

Page 59: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

2. Elements from nonlinear approximation theory

2 Elements from nonlinear approximation theory

In this section we define the concepts from nonlinear approximation theorythat we will use throughout this article. We refer the reader to [8, 9, 11, 19] forfurther details. For amm∈N ⊂ C, we let a∗mm∈N denote a non-increasingrearrangement of amm∈N such that |a∗m| ≥ |a∗m+1| for all m ∈N.

Definition 2.1 (Lorentz spaces). Given τ ∈ (0, ∞), q ∈ (0, ∞], we define theLorentz space `τ

q as the collection of amm∈N ⊂ C satisfying that

‖amm∈N‖`τq

:=

(

∑∞m=1

[m1/τ |a∗m|

]q 1m

)1/q, 0 < q < ∞

supm≥1 m1/τ |a∗m| , q = ∞

is finite.

We note that ‖ · ‖`ττ= ‖ · ‖`τ for any τ ∈ (0, ∞). The Lorentz spaces are

rearrangement invariant (quasi-)Banach spaces (Banach spaces if 1 ≤ q ≤τ < ∞) satisfying the continuous embeddings

`τ1q1 → `τ2

q2 if τ1 < τ2 or if τ1 = τ2 and q1 ≤ q2, (A.3)

see [3, 8] for details. Let D := gkk∈N be a complete dictionary for a Banachspace X and define σm( f ,D)Xm∈N as in (A.1). We will use the followingapproximation spaces [11].

Definition 2.2 (Approximation spaces). Given α ∈ (0, ∞), q ∈ (0, ∞], we de-fine

Aαq (D, X) :=

f ∈ X

∣∣∣ ‖ f ‖Aαq (D,X) := ‖σm( f ,D)Xm∈N‖`1/α

q+ ‖ f ‖X < ∞

.

The quantity ‖ · ‖Aαq (D,X) forms a (quasi-)norm on Aα

q (D, X), and from(A.3) we immediately obtain the continuous embeddings

Aα1q1 (D, X) → Aα2

q2 (D, X) if α1 > α2 or if α1 = α2 and q1 ≤ q2.

We note the f ∈ Aαq (D, X) implies the decay in (A.2) as desired. Following

the approach in [12], we define smoothness spaces as follows.

Definition 2.3 (Smoothness spaces). Given τ ∈ (0, ∞), q ∈ (0, ∞], M > 0, let

Kτq (D, X, M) := closX

∑k∈∆

ckgk ∈ X∣∣∣∆ ⊂N, #∆ < ∞, ‖ckk∈∆‖`τ

q≤ M

.

We then define Kτq (D, X) := ∪M>0Kτ

q (D, X, M) with

| f |Kτq (D,X) := inf

M > 0

∣∣∣ f ∈ Kτq (D, X, M)

.

41

Page 60: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper A.

It can be shown that | · |Kτq (D,X) is a (semi-quasi-)norm on Kτ

q (D, X) and a(quasi-)norm if τ ∈ (0, 1).

Remark 2.1. We note that for general D and X, f ∈ Kτq (D, X) does not

imply the existence of ckk∈N ∈ `τq with f = ∑k∈N ckgk. All realizations of

Kτq (D, X) considered in this article will, however, guarantee the existence of

such reconstruction coefficients (cf. Proposition 2.1 below).

Example 2.1. If B = gkk∈N is an orthonormal basis for a Hilbert space H,we have the characterization

Aαq (B,H) = Kτ

q (B,H) =

f = ∑

k∈N

〈 f , gk〉 gk ∈ H∣∣∣ ‖〈 f , gk〉k∈N‖`τ

q< ∞

,

with α ∈ (0, ∞), q ∈ (0, ∞] and 0 < τ = (α + 1/2)−1 < 2 [12, 36]. 4Denoting the space of finite sequences on N by `0, we define the recon-

struction operator R : `0 → X by

R : ckk∈∆ → ∑k∈∆

ckgk, ckk∈∆ ∈ `0. (A.4)

Recalling that ‖gk‖X = 1, for all k ∈ N, we can extend this operator to abounded operator from `1 to X since

‖Rckk∈∆‖X ≤ ∑k∈∆|ck| ‖gk‖X = ‖ckk∈∆‖`1 , ∀ckk∈∆ ∈ `0. (A.5)

Following the approach in [19] we introduce the following class of dictionar-ies.

Definition 2.4 (Hilbertian dictionary). Let D = gkk∈N be a dictionary in aBanach space X. Given τ ∈ (0, ∞), q ∈ (0, ∞], we say that D is `τ

q−hilbertianif the reconstruction operator R given in (A.4) is bounded from `τ

q to X.

It follows from (A.5) and (A.3) that every D is `τq−hilbertian if τ < 1.

According to [19, Proposition 3] we have the following characterization.

Proposition 2.1. Assume D is `p1−hilbertian with p ∈ (1, ∞). Let τ ∈ (0, p) and

q ∈ (0, ∞]. For all f ∈ Kτq (D, X), there exists some c := cτ,q( f ) ∈ `τ

q with f = Rcand ‖c‖`τ

q = | f |Kτq (D,X). If 1 < q ≤ τ < ∞, then c is unique. Consequently

| f |Kτq (D,X) = min

c∈`τq , f=Rc

‖c‖`τq

,

and

Kτq (D, X) =

k∈N

ckgk ∈ X

∣∣∣∣∣ ‖ckk∈N‖`τq< ∞

is a (quasi-)Banach space with Kτ

q (D, X) → X.

In the next section we describe the proposed algorithm using the frame-work presented in this section.

42

Page 61: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

3. Weighted thresholding

3 Weighted thresholding

Let D := gkk∈N be a complete dictionary for a Banach space X and letR be the reconstruction operator defined in (A.4). Given ckk∈N ⊂ C, welet π : N → N denote a bijective mapping such that |cπ(k)|k∈N is non-increasing, i.e, cπ(k)k∈N = c∗kk∈N. For f ∈ X, ckk∈N ⊂ C, and m ∈ N,a standard way of constructing an m-term approximant to f from D is bythresholding

fm := fm(π, ckk∈N,D) := Rcπ(k)mk=1 =

m

∑k=1

cπ(k)gπ(k). (A.6)

For all practical purpose we choose ckk∈N ⊂ C as reconstruction coeffi-cients for f such that f = Rckk∈N (cf. Remark 2.1). With this choice, the ap-proximants fmm∈N converge to f as m→ ∞. The thresholding approach in(A.6) chooses the m elements from D corresponding to the m largest of the co-efficients ckk∈N. As mentioned in the introduction, many real world signalshave an inherent structure between the expansion coefficients, which shouldbe accounted for in the approximation procedure. Therefore, we would liketo construct an algorithm which preserves local coherence, such that a smallcoefficient c1 might be preserved, in exchange for a larger (isolated) coeffi-cient c2, if c1 belongs to a neighborhood with many large coefficients. Thisleads us to consider banded Toeplitz matrices.

Let Λ denote a banded non-negative Toeplitz matrix with bandwidth Ω ∈N, i.e., the non-zero entries Λ(i,j) satisfy |i − j| ≤ Ω. We assume the scalaron the diagonal of Λ is positive. Given ckk∈N ⊂ C we define cΛ

k k∈N :=Λ(|ck|k∈N).

Example 3.1. With Ω = 2 we obtain

cΛk k∈N =

λ0 λ1 λ2 0 0 0 · · ·

λ−1 λ0 λ1 λ2 0 0 · · ·λ−2 λ−1 λ0 λ1 λ2 0 · · ·

0 λ−2 λ−1 λ0 λ1 λ2 · · ·...

......

......

.... . .

|c1||c2||c3||c4|

...

,

with λ0 > 0 and λl ≥ 0 for all l ∈ −2,−1, 1, 2. We note that

cΛk = λ−2 |ck−2|+ · · ·+ λ0 |ck|+ · · ·+ λ2 |ck+2| =

k+Ω

∑j=k−Ω

λj−k∣∣cj∣∣ ,

for all k ∈N. 4

43

Page 62: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper A.

Given ckk∈N ⊂ C, we let πΛ : N→N denote a bijective mapping suchthat the sequence cΛ

πΛ(k)k∈N is non-increasing. If Λ is the identity map, wewrite π instead of πΛ to be consistent with the notation of (A.6). We shalluse the following technical result in the proof of Theorem 3.1.

Lemma 3.1. Let Λ be a banded non-negative Toeplitz matrix with bandwidth Ω ∈N and let p ∈ (0, ∞), q ∈ (0, ∞]. There exists a constant C > 0, such that for anynon-increasing sequence ckk∈N ∈ `

pq we have the estimate∥∥∥cπΛ(k)∞

k=m+1

∥∥∥`

pq≤ C

∥∥ck∞k=m+1−Ω

∥∥`

pq

, ∀m ≥ Ω.

Proof. Fix m ∈ N and let a := |cπΛ(k)|∞k=m+1 and b := |ck|∞

k=m+1. If a

contains the same coefficients as b, then there is nothing to prove. If this isnot the case, then there exists k′ ≤ m, with |ck′ | ∈ a, and k′′ ≥ m + 1, with|ck′′ | /∈ a, satisfying

k′+Ω

∑j=k′−Ω

λj−k′∣∣cj∣∣ ≤ k′′+Ω

∑j=k′′−Ω

λj−k′′∣∣cj∣∣⇒ |ck′ | ≤

1λ0

k′′+Ω

∑j=k′′−Ω

λj−k′′∣∣cj∣∣ .

Denoting the maximum value of the λl’s by λmax and the maximum value ofthe

∣∣cj∣∣’s, for j ∈ [k′′ −Ω, k′′ + Ω], by

∣∣ck

∣∣, we thus get

|ck′ | ≤λmax

λ0(2Ω + 1)

∣∣ck

∣∣ .

Since k ∈ [k′′ −Ω, k′′ + Ω], and k′′ ≥ m + 1, then k ∈ [m + 1−Ω, ∞). Weconclude that ck ∈ ck∞

k=m+1−Ω. The lemma follows directly from this ob-servation.

Given f ∈ X, a set of reconstruction coefficients ckk∈N ⊂ C, and m ∈N,we generalize the notation of (A.6) and construct an m-term approximant tof by

f Λm := f Λ

m (πΛ, ckk∈N,D) := RcπΛ(k)mk=1 =

m

∑k=1

cπΛ(k)gπΛ(k). (A.7)

If Λ is the identity map, we just obtain the approximant fm given in (A.6). Ifnot, we obtain an approximant which chooses the elements of gkk∈N corre-sponding to the indices of the m largest of the weighted coefficients cΛ

k k∈N.We generalize the notation of [19] and define weighted thresholding approx-imation spaces as follows.

44

Page 63: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

3. Weighted thresholding

Definition 3.1. Let Λ be a banded non-negative Toeplitz matrix with band-width Ω ∈N. Given α ∈ (0, ∞), q ∈ (0, ∞], we define

T αq (D, X, Λ) :=

f ∈ X

∣∣∣ ‖ f ‖T αq (D,X,Λ) := | f |T α

q (D,X,Λ) + ‖ f ‖X < ∞

,

with

| f |T αq (D,X,Λ) :=

infπΛ ,ckk∈N

(∑∞

m=1[mα∥∥ f − f Λ

m∥∥

X

]q 1m

)1/q, 0 < q < ∞

infπΛ ,ckk∈N

(supm≥1 mα

∥∥ f − f Λm∥∥

X

), q = ∞

When Λ is the identity mapping, we write T αq (D, X) instead of T α

q (D, X, Λ).

Remark 3.1. We note that the expression for | f |T αq (D,X,Λ) cannot be written

using the Lorentz norm, since the sequence ‖ f − f Λm ‖Xm∈N might not be

non-increasing.

In order to prove Theorem 3.1 we need to impose further assumptionson the dictionary D. Given τ ∈ (0, ∞), q ∈ (0, ∞], we call D an atomic de-composition (AD) [16, 17] for X, with respect to `τ

q , if there exists a sequencegkk∈N, in the dual space X′, such that

1. There exist 0 < C′ ≤ C′′ < ∞ with

C′ ‖〈 f , gk〉k∈N‖`τq≤ ‖ f ‖X ≤ C′′ ‖〈 f , gk〉k∈N‖`τ

q, ∀ f ∈ X.

2. The reconstruction operator R given in (A.4) is bounded from `τq onto

X and we have the expansions

R(〈 f , gk〉k∈N) = ∑k∈N

〈 f , gk〉 gk = f , ∀ f ∈ X.

Example 3.2. Standard examples of ADs are Gabor frames for modulationspaces [20] and wavelets for Besov spaces [10]. However, many other ex-amples have been constructed in the general framework of decompositionspaces [1, 14, 15]. For instance curvelets, shearlets and nonstationary Gaborframes (or generalized shift-invariant systems) [1, 28, 40]. 4

We can now present the main result of this article.

Theorem 3.1. Let Λ be a banded non-negative Toeplitz matrix with bandwidth Ω ∈N and let p ∈ (1, ∞), τ ∈ (0, p), q ∈ (0, ∞]. If D = gkk∈N is `

p1−hilbertian

and forms an AD for X, with respect to `τq , then

Kτq (D, X) → T α

q (D, X, Λ) → Aαq (D, X) ,

with α = 1/τ − 1/p > 0.

45

Page 64: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper A.

Proof. The embedding T αq (D, X, Λ) → Aα

q (D, X) follows directly from thedefinitions of these spaces. Let us now show that Kτ

q (D, X) → T αq (D, X, Λ).

Since the Lorentz spaces are rearrangement invariant, we may assume thecanonical coefficients d := dkk∈N := 〈 f , gk〉k∈N form a non-increasingsequence. Given f ∈ Kτ

q (D, X), the `p1−hilbertian property of D implies

‖ f ‖X = ‖Rd‖X ≤ C1 ‖d‖`p1

.

Defining f Λm (πΛ,d,D) = RdπΛ(k)m

k=1 as in (A.7), we thus get

‖ f ‖T αq (D,X,Λ) = inf

πΛ ,ckk∈N

(∞

∑m=1

[mα∥∥∥ f − f Λ

m

∥∥∥X

]q 1m

)1/q

+ ‖ f ‖X

≤(

∑m=1

[mα∥∥∥ f − f Λ

m (πΛ,d,D)∥∥∥

X

]q 1m

)1/q

+ C1 ‖d‖`p1

. (A.8)

Now, since∥∥∥ f − f Λm (πΛ,d,D)

∥∥∥X=∥∥∥RdπΛ(k)∞

k=m+1

∥∥∥X

≤ C1

∥∥∥dπΛ(k)∞k=m+1

∥∥∥`

p1

≤ C1 ‖d‖`p1

,

we get the following estimate for the first Ω terms in (A.8)

Ω

∑m=1

[mα∥∥∥ f − f Λ

m (πΛ,d,D)∥∥∥

X

]q 1m≤ C2 ‖d‖

q`

p1

. (A.9)

For m ≥ Ω + 1, Lemma 3.1 implies∥∥∥ f − f Λm (πΛ,d,D)

∥∥∥X≤ C1

∥∥∥dπΛ(k)∞k=m+1

∥∥∥`

p1

≤ C3∥∥dk∞

k=m+1−Ω∥∥`

p1

= C3

∥∥∥d− dkm−Ωk=1

∥∥∥`

p1

= C3σm−Ω(d,B)`p1,

with B denoting the canonical basis of `p1 . Hence,

∑m=Ω+1

[mα∥∥∥ f − f Λ

m (πΛ,d,D)∥∥∥

X

]q 1m≤ C3

∑m=Ω+1

[mασm−Ω(d,B)`p

1

]q 1m

= C3

∑m=1

[(m + Ω)ασm(d,B)`p

1

]q

m + Ω

≤ C4

∥∥∥σm(d,B)`p1m∈N

∥∥∥q

`1/αq

. (A.10)

46

Page 65: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

3. Weighted thresholding

Combining (A.9) and (A.10), then (A.8) yields

‖ f ‖T αq (D,X,Λ) ≤ C5

(∥∥∥σm(d,B)`p1m∈N

∥∥∥`1/α

q+ ‖d‖`p

1

)= C5 ‖d‖Aα

q(B,`p1)

.

Applying [18, Theorem 3.1] thus yields

‖ f ‖T αq (D,X,Λ) ≤ C6 |d|Kτ

q (B,`p1 )

= C6 ‖d‖`τq≤ C7 ‖ f ‖X . (A.11)

Finally, since D is `p1−hilbertian, Proposition 2.1 states that we can find c ∈ `τ

qwith f = Rc and ‖c‖`τ

q = | f |Kτq (D,X). Combining this with (A.11) we arrive at

‖ f ‖T αq (D,X,Λ) ≤ C7 ‖Rc‖X ≤ C8 ‖c‖`τ

q= C8 | f |Kτ

q (D,X) .

This completes the proof.

Remark 3.2. The assumption in Theorem 3.1 of D being `p1−hilbertian di-

rectly implies the boundedness of R : `τq → X in the definition of an AD

since τ ∈ (0, p). It should also be noted that if Λ is the identity operator,then Theorem 3.1 holds without the assumption of an AD — this was provenin [19, Theorem 6]. However, in contrast to the proof presented here, thereis no constructive way of obtaining the sparse expansion coefficients in theproof of [19, Theorem 6].

The Jackson embedding in Theorem 3.1 is strong in the sense that there isan associated algorithm, which obtains the approximation rate. Given f ∈ X,the algorithm goes as follows:

1. Calculate the canonical coefficients dkk∈N = 〈 f , gk〉k∈N.

2. Construct the weighted coefficients dΛk k∈N according to Λ by

dΛk =

k+Ω

∑j=k−Ω

λj−k∣∣dj∣∣ , k ∈N.

3. Choose πΛ : N→N such that dΛπΛ(k)k∈N is non-increasing.

4. Construct an m-term approximation to f by

f Λm (πΛ, dkk∈N,D) =

m

∑k=1

dπΛ(k)gπΛ(k).

With this construction, Theorem 3.1 states that∥∥∥ f − f Λm (πΛ, dkk∈N,D)

∥∥∥X= O(m−α), ∀ f ∈ Kτ

q (D, X),

with α = 1/τ − 1/p > 0. In the next section we consider the particularcase where the dictionary is a Gabor frame and the smoothness space is amodulation space.

47

Page 66: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper A.

4 Modulation spaces and Gabor frames

In this section we choose X as the modulation space Mp [13], with 1 ≤ p < ∞,consisting of all f ∈ S ′(Rd) satisfying

‖ f ‖Mp :=(∫

Rd

∫Rd|Vγ f (x, y)|p dxdy

)1/p< ∞.

Here, Vγ f (x, y) denotes the short-time Fourier transform of f with respect toa window function γ ∈ S(Rd) \ 0 (it can be shown that Mp is independentof the particular choice of window function [20]). Given g ∈ S(Rd) \ 0 andlattice parameters a, b ∈ (0, ∞), we consider the Gabor system gm,nm,n∈Zd

defined bygm,n(t) := g(t− na)e2πimb·t, t ∈ Rd.

If gm,nm,n is a frame for L2(Rd) (see [5] for details) then there exists a dualframe gm,nm,n such that gm,nm,n forms an AD for Mp with respect to `p

for all 1 ≤ p < ∞ [20]. We have the following version of [21, Proposition 3].

Proposition 4.1. Let Λ be a banded non-negative Toeplitz matrix with bandwidthΩ ∈ N. Let 1 ≤ τ < p < ∞ and g ∈ S(Rd) \ 0. If D = gm,nm,n∈Zd is aGabor frame for L2(Rd), with ‖gm,n‖Mp = 1 for all m, n ∈ Zd, then

Mτ(Rd) = Kττ(D, Mp) → T α

τ (Λ,D, Mp) → Aατ (D, Mp) , α = 1/τ − 1/p,

where the first equality is with equivalent norms.

Proof. We first note that D simultaneously forms an AD for both Mp(Rd) andMτ(Rd). Since D constitutes an AD for Mp(Rd), and p > 1, we get

‖Rc‖Mp ≤ C1 ‖c‖`p ≤ C2 ‖c‖`p1

, ∀c ∈ `p1 ,

which shows that D is a `p1−hilbertian dictionary for Mp. Hence, the embed-

dings in Proposition 4.1 follows from Theorem 3.1. Let us now prove thatMτ(Rd) = Kτ

τ(D, Mp) with equivalent norms. According to Proposition 2.1then

Kττ(D, Mp) =

m∈Zd∑

n∈Zd

cm,ngm,n ∈ Mp∣∣∣ ∥∥∥cm,nm,n∈Zd

∥∥∥`τ

< ∞

with

| f |Kττ (D,Mp) = min

c∈`τ , f=Rc‖c‖`τ , f ∈ Kτ

τ(D, Mp).

Since D constitutes an AD for Mτ(Rd), then for f ∈ Mτ(Rd) we have

f = ∑m∈Zd

∑n∈Zd

〈 f , gm,n〉 gm,n, with∥∥∥〈 f , gm,n〉m,n∈Zd

∥∥∥`τ≤ C ‖ f ‖Mτ .

48

Page 67: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

5. Numerical experiments

As τ < p then Mτ ⊆ Mp (cf. [20]), which shows that f ∈ Kττ(D, Mp) and

| f |Kττ (D,Mp) ≤ C‖ f ‖Mτ . The converse embedding follows from

‖ f ‖Mτ = ‖Rc‖Mτ ≤ C′ ‖c‖`τ = C′| f |Kττ (D,Mp), f ∈ Kτ

τ(D, Mp).

This completes the proof.

In the next section we apply the proposed method for denoising musicsignals with elements from a Gabor dictionary. For an introduction to Gabortheory in the finite settings, we refer the reader to [35, 37].

5 Numerical experiments

For the implementation we use MATLAB 2017B and apply the routines fromthe following two toolboxes: The "Large time-frequency analysis toolbox"(LTFAT) version 2.2.0 [31] avaliable from http://ltfat.sourceforge.net/and the StrucAudioToolbox [33] available from http://homepage.univie.ac.at/monika.doerfler/StrucAudio.html. All Gabor transforms are con-structed using 1024 frequency channels, a hop size of 256, and a Hanningwindow of length 1024 (this is the default settings in the StrucAudioTool-box). These settings lead to transforms of redundancy of four, meaning thereare four times as many time-frequency coefficients as signal samples. Themusic signals we consider are part of the EBU-SQAM database [39], whichconsists of 70 test sounds sampled at 44 kHz. The database contains a largevariety of different music sounds including single instruments, vocal, and or-chestra. We measure the reconstruction error of an algorithm by the relativeroot mean square (RMS) error

RMS( f , frec) :=‖ f − frec‖2‖ f ‖2

.

We begin by analyzing the first 524288 samples of signal 8 in the EBU-SQAMdatabase, which consists of an increasing melody of 10 tones played by aviolin. A noisy version of the signal is constructed by adding white Gaussiannoise and the resulting spectrograms can be found in Fig. A.1.

For the task of denoising we first compare the greedy thresholding ap-proach from nonlinear approximation theory (cf. (A.6)) with the Windowed-Group-Lasso (WGL) from social sparsity. For the WGL we use the defaultsettings of the StrucAudioToolbox, which applies a horizontal asymmetricneighborhood for the shrinkage operator, see [33] for further details. TheWGL constructs a denoised version of the spectrogram using only 74739 non-zero coefficients (out of a total of 1050624 coefficients). Using the same num-ber of non-zero coefficients for the greedy thresholding approach we obtainthe spectrograms shown in Fig. A.2.

49

Page 68: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper A.

Clean spectrogram

0 5 10Time (s)

0

5000

10000

15000

Freq

uency

(H

z)Noisy spectrogram

0 5 10Time (s)

0

5000

10000

15000

Freq

uency

(H

z)Fig. A.1: Clean and noisy spectrograms of violin music.

Greedy thresholding

0 5 10Time (s)

0

5000

10000

15000

Frequency

(H

z)

Windowed-Group-Lasso

0 5 10Time (s)

0

5000

10000

15000

Frequency

(H

z)

Fig. A.2: Greedy thresholding and the WGL with 74739 non-zero coefficients.

We note from Fig. A.2 that the greedy thresholding approach includesmore coefficients at higher frequencies than the WGL. On the other hand, theWGL includes more coefficients at lower frequencies, resulting in a smootherresolution for the fundamental frequencies and the first harmonics. This illus-trates the way the WGL is designed, namely that a large isolated coefficientmay be discarded in exchange for a smaller coefficient with large neighbors.The RMS error is ≈ 0.084 for the WGL and ≈ 0.034 for the greedy threshold-ing algorithm. To visualize the performance of the proposed algorithm wechoose a rather extreme (horizontal) weight with

cΛm,n = |cm,n−2|+ |cm,n−1|+ |cm,n|+ |cm,n+1|+ |cm,n+2| . (A.12)

We then choose the 74739 coefficients with largest weighted magnitudes ac-cording to (A.12). In Fig. A.3 we have compared the resulting spectrogramagainst the spectrogram obtained using the greedy thresholding approach.

50

Page 69: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

5. Numerical experiments

Greedy thresholding

0 5 10Time (s)

0

5000

10000

15000

Freq

uency

(H

z)Weighted thresholding

0 5 10Time (s)

0

5000

10000

15000

Freq

uency

(H

z)Fig. A.3: Greedy thresholding and weighted thresholding with 74739 non-zero coefficients.

We can see from Fig. A.3 that the horizontal weight in (A.12) enforces thestructures at higher frequencies even further than the greedy thresholdingapproach. This might be desirable for some applications as the timbre ofthe instrument is determined by the harmonics. The RMS error associatedwith the weighted thresholding approach is ≈ 0.074, which is higher thanfor greedy thresholding but lower than the WGL. Applying a more moderateweight (for instance Weight 2 defined in (A.13) below) we obtain an RMSerror of ≈ 0.031, which is lower than for greedy thresholding.

We now extend the experiment described above to the entire EBU-SQAMdatabase. For the weighted thresholding we consider the following threeweights

Weight 1: cΛm,n = |cm,n|+ (|cm−1,n|+ |cm+1,n|)/2,

Weight 2: cΛm,n = |cm,n|+ (|cm,n−1|+ |cm,n+1|)/2, (A.13)

Weight 3: cΛm,n = |cm,n|+ (|cm,n−1|+ |cm−1,n|+ |cm,n+1|+ |cm+1,n|)/4.

For each of the 70 test sounds in the EBU-SQAM database, we applythe WGL and calculate the associated RMS reconstruction error and numberof non-zero coefficients. Using the same number of non-zero coefficients,we then apply the greedy thresholding approach and the three weightedthresholding approaches defined in (A.13). The resulting averaged valuescan be found in in Table A.1.

Table A.1: Average RMS errors over the EBU-SQAM database for the WGL, the greedy thresh-olding approach, and the three weighted thresholding approaches defined in (A.13).

Algorithm: WGL Greedy Weight 1 Weight 2 Weight 3Average RMS error.: 0.1031 0.0462 0.0511 0.0453 0.0472

51

Page 70: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper A.

The average number of coefficients was 68983, which corresponds to 6.5%of the total number of coefficients. We note that the RMS error associatedwith the WGL is roughly twice as large as for the various thresholding algo-rithms. We also note that the smallest RMS error is obtained by the weightedthresholding approach, which applies a horizontal weight (Weight 2). This islikely due to the horizontal structure of the harmonics as seen in Fig. A.1.

Let us mention that it might be possible to reduce the error for the WGLby tuning the parameters instead of using the default settings (cf. [34] for adetailed analysis of the parameter settings for the WGL). On the other hand,the same holds true for the thresholding algorithms. In the experiment de-scribed above we have used the same number of non-zero coefficients as waschosen by the WGL. This is not likely to be optimal for the thresholding algo-rithms since the optimal number of non-zero coefficients usually depends onthe sparsity of the signal (few coefficients for sparse signals and vice versa).Finally, we have not addressed the resulting audio quality of the denoisedsounds. In general, it is very hard to decide which algorithm sounds "thebest" as this depends on the application and the subjective opinion of the lis-tener. However, there are indeed audible differences between the algorithms.For the violin music in Fig. A.1, the WGL does the best job of removingthe noise, but at the price of a poor timbre of the resulting sound. As weinclude more coefficients at the higher frequencies, the original timbre of theinstrument improves together with an increase in noise.

6 Conclusion

We have presented a new thresholding algorithm and proven an associatedstrong Jackson embedding under rather general conditions. The algorithmextends the classical greedy approach by incorporating a weight function,which exploits the structure of the expansion coefficients. In particular, the al-gorithm applies to approximation in modulation spaces using Gabor frames.As an application we have considered the task of denoising music signalsand compared the proposed method with the greedy thresholding approachand the WGL from social sparsity. The numerical experiments show that theproposed method can be used both for improving the time-frequency resolu-tion and for reducing the RMS error compared to the other two algorithms.The experiments also show that the performance of the algorithm dependscrucially on the choice of weight function, which should be adapted to theparticular signal class under consideration.

52

Page 71: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

References

References

[1] L. Borup and M. Nielsen. Frame decomposition of decompositionspaces. J. Fourier Anal. Appl., 13(1):39–70, 2007.

[2] P. L. Butzer and K. Scherer. Jackson and Bernstein-type inequalities forfamilies of commutative operators in Banach spaces. J. ApproximationTheory, 5:308–342, 1972.

[3] M. a. J. Carro, J. A. Raposo, and J. Soria. Recent developments in thetheory of Lorentz spaces and weighted inequalities. Mem. Amer. Math.Soc., 187(877):xii+128, 2007.

[4] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decompositionby basis pursuit. SIAM Rev., 43(1):129–159, Jan. 2001.

[5] O. Christensen. An introduction to frames and Riesz bases. Applied and Nu-merical Harmonic Analysis. Birkhäuser/Springer, [Cham], second edi-tion, 2016.

[6] C. Darken, M. Donahue, L. Gurvits, and E. Sontag. Rate of approxima-tion results motivated by robust neural network learning. In Proceedingsof the Sixth Annual Conference on Computational Learning Theory, COLT ’93,pages 303–309, New York, NY, USA, 1993. ACM.

[7] G. Davis, S. Mallat, and M. Avellaneda. Adaptive greedy approxima-tions. Constructive Approximation, 13(1):57–98, Mar 1997.

[8] R. A. DeVore. Nonlinear approximation. In Acta numerica, 1998, volume 7of Acta Numer., pages 51–150. Cambridge Univ. Press, Cambridge, 1998.

[9] R. A. DeVore. Nonlinear approximation and its applications. In Mul-tiscale, nonlinear and adaptive approximation, pages 169–201. Springer,Berlin, 2009.

[10] R. A. DeVore, B. Jawerth, and V. Popov. Compression of wavelet decom-positions. Amer. J. Math., 114(4):737–785, 1992.

[11] R. A. DeVore and G. G. Lorentz. Constructive approximation, volume 303of Grundlehren der Mathematischen Wissenschaften [Fundamental Principlesof Mathematical Sciences]. Springer-Verlag, Berlin, 1993.

[12] R. A. DeVore and V. N. Temlyakov. Some remarks on greedy algorithms.Advances in Computational Mathematics, 5(1):173–187, Dec 1996.

[13] H. G. Feichtinger. Modulation spaces on locally compact abelian groups.Technical report, University of Vienna, 1983.

53

Page 72: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

References

[14] H. G. Feichtinger. Banach spaces of distributions defined by decompo-sition methods. II. Math. Nachr., 132:207–237, 1987.

[15] H. G. Feichtinger and P. Gröbner. Banach spaces of distributions definedby decomposition methods. I. Math. Nachr., 123:97–120, 1985.

[16] H. G. Feichtinger and K. H. Gröchenig. Banach spaces related to in-tegrable group representations and their atomic decompositions. I. J.Funct. Anal., 86(2):307–340, 1989.

[17] H. G. Feichtinger and K. H. Gröchenig. Banach spaces related tointegrable group representations and their atomic decompositions. II.Monatsh. Math., 108(2-3):129–148, 1989.

[18] R. Gribonval and M. Nielsen. Some remarks on non-linear approxima-tion with Schauder bases. East J. Approx., 7(3):267–285, 2001.

[19] R. Gribonval and M. Nielsen. Nonlinear approximation with dictionar-ies. I. Direct estimates. J. Fourier Anal. Appl., 10(1):51–71, 2004.

[20] K. Gröchenig. Foundations of time-frequency analysis. Applied and Nu-merical Harmonic Analysis. Birkhäuser Boston, Inc., Boston, MA, 2001.

[21] K. Gröchenig and S. Samarah. Nonlinear approximation with localFourier bases. Constr. Approx., 16(3):317–331, 2000.

[22] R. S. Ismagilov. Diameters of sets in normed linear spaces, and theapproximation of functions by trigonometric polynomials. Uspehi Mat.Nauk, 29(3(177)):161–178, 1974.

[23] M. Kowalski. Sparse regression using mixed norms. Appl. Comput. Har-mon. Anal., 27(3):303–324, 2009.

[24] M. Kowalski, K. Siedenburg, and M. Dörfler. Social sparsity! neighbor-hood systems enrich structured shrinkage operators. IEEE Transactionson Signal Processing, 61(10):2498–2511, May 2013.

[25] M. Kowalski and B. Torrésani. Sparsity and persistence: mixed normsprovide simple signal models with dependent coefficients. Signal, Imageand Video Processing, 3(3):251–264, Sep 2009.

[26] M. Kowalski and B. Torrésani. Structured sparsity: from mixed normsto structured shrinkage. In R. Gribonval, editor, SPARS’09 - Signal Pro-cessing with Adaptive Sparse Structured Representations, Saint Malo, France,Apr. 2009. Inria Rennes - Bretagne Atlantique.

[27] K. I. Oskolkov. Polygonal approximation of functions of two variables.Mat. Sb. (N.S.), 107(149)(4):601–612, 639, 1978.

54

Page 73: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

References

[28] E. S. Ottosen and M. Nielsen. A characterization of sparse nonstationaryGabor expansions. Journal of Fourier Analysis and Applications, May 2017.

[29] P. P. Petrushev. Direct and converse theorems for spline and rational ap-proximation and Besov spaces. In Function spaces and applications (Lund,1986), volume 1302 of Lecture Notes in Math., pages 363–377. Springer,Berlin, 1988.

[30] M. D. Plumbley, T. Blumensath, L. Daudet, R. Gribonval, and M. E.Davies. Sparse representations in audio and music: From coding tosource separation. Proceedings of the IEEE, 98(6):995–1005, June 2010.

[31] Z. Pruša, P. L. Søndergaard, N. Holighaus, C. Wiesmeyr, and P. Bal-azs. The Large Time-Frequency Analysis Toolbox 2.0. In M. Aramaki,O. Derrien, R. Kronland-Martinet, and S. Ystad, editors, Sound, Music,and Motion, Lecture Notes in Computer Science, pages 419–442. SpringerInternational Publishing, 2014.

[32] E. Schmidt. Zur Theorie der linearen und nichtlinearen Integralgleichun-gen. Math. Ann., 63(4):433–476, 1907.

[33] K. Siedenburg and M. Dörfler. Structured sparsity for audio signals. InProc. of the 14th Int. Conference on Digital Audio Effects (DAFx-11), Paris,France, 2011.

[34] K. Siedenburg and M. Dörfler. Audio denoising by generalized time-frequency thresholding. In Proceedings of the AES 45th Conference on Ap-plications of Time-Frequency Processing, Helsinki, Finland, 2012.

[35] P. L. Søndergaard. Finite discrete Gabor analysis. PhD thesis, TechnicalUniversity of Denmark, 2007.

[36] S. B. Stechkin. On absolute convergence of orthogonal series. Dokl. Akad.Nauk SSSR (N.S.), 102:37–40, 1955.

[37] T. Strohmer. Numerical algorithms for discrete Gabor expansions. InGabor analysis and algorithms, Appl. Numer. Harmon. Anal., pages 267–294. Birkhäuser Boston, Boston, MA, 1998.

[38] R. Tibshirani. Regression shrinkage and selection via the lasso. Journalof the Royal Statistical Society (Series B), 58:267–288, 1996.

[39] E. B. Union. Sound quality assessment material: Recordings for sub-jective tests ; User’s handbook for the EBU - SQAM compact disc, Sep2008.

[40] F. Voigtlaender and A. Pein. Analysis vs. synthesis sparsity for α-shearlets. ArXiv preprint, arXiv:1702.03559v1, 2017.

55

Page 74: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

References

56

Page 75: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper B

A Characterization of Sparse Nonstationary GaborExpansions

Emil Solsbæk Ottosen and Morten Nielsen

The paper has been accepted for publication inJournal of Fourier Analysis and Applications, 2017.

Page 76: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

c© 2017 SpringerThe layout has been revised.

Page 77: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

1. Introduction

Abstract

We investigate the problem of constructing sparse time-frequency representationswith flexible frequency resolution, studying the theory of nonstationary Gabor framesin the framework of decomposition spaces. Given a painless nonstationary Gaborframe, we construct a compatible decomposition space and prove that the nonstation-ary Gabor frame forms a Banach frame for the decomposition space. Furthermore,we show that the decomposition space norm can be completely characterized by asparseness condition on the frame coefficients and we prove an upper bound on theapproximation error occurring when thresholding the frame coefficients for signalsbelonging to the decomposition space.

1 Introduction

Redundant Gabor frames play an essential role in time-frequency (TF) analy-sis as these frames provide expansions with good TF resolution [6, 27]. Gaborframes are based on translation and modulation of a single window functionaccording to lattice parameters which largely determine the redundancy ofthe frame. By varying the support of the window function one can changethe overall resolution of the frame, but it is in general not possible to changethe resolution in specific regions of the TF plane. For signals with varying TFcharacteristics, a fixed resolution is often undesirable. To overcome this prob-lem, the usage of multi-window Gabor frames has been proposed [10, 34, 42, 43].As opposed to standard Gabor frames, multi-window Gabor frames use awhole catalogue of window functions of different shapes and sizes to createadaptive representations. A recent example is the nonstationary Gabor frames(NSGFs) which have shown great potential in capturing the essential TF in-formation of music signals [1, 11, 32, 33]. These frames use different windowfunctions along either the time- or the frequency axes and guarantee perfectreconstruction and an FFT-based implementation in the painless case. Orig-inally, NSGFs were studied by Hernández, Labate & Weiss [30] and laterby Ron & Shen [40] who named them generalized shift-invariant systems. Wechoose to work with the terminology introduced in [1] as we will only con-sider frames in the painless case for which several practical implementationshave been constructed under the name of NSGFs [1, 11, 32]. We considerpainless NSGFs with flexible frequency resolution, corresponding to a sam-pling grid in the TF plane which is irregular over frequency but regular overtime at each fixed frequency position. This construction is particularly use-ful in connection with music signals since the NSGF can be set to coincidewith the semitones used in Western music. Based on the nature of musicaltones [9, 38], we expect music signals to permit sparse expansions relative tothe redundant NSGF dictionaries.

59

Page 78: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper B.

The main contribution of this paper is a theoretical characterization of thesignals with sparse expansions relative to the NSGF dictionaries. By a sparseexpansion we mean an expansion for which the original signal can be approx-imated at a certain rate by thresholding the expansion coefficients. To provesuch a characterization, we follow the approach in [22, 23, 28] and searchfor a smoothness space compatible with the structure of the frame. Classi-cal smoothness spaces such as modulation spaces [15] or Besov spaces [41]cannot be expected to be linked with sparse expansions relative to the NSGFdictionaries since these smoothness spaces are not compatible with the flex-ible frequency resolution of the NSGFs. Modulation spaces correspond to auniform partition of the frequency domain while Besov spaces correspondto a dyadic partition. Therefore, we study NSGFs in the framework of de-composition spaces. Decomposition spaces were introduced by Feichtinger& Gröbner in [17], and further studied by Feichtinger in [16], and form a largeclass of function spaces on Rd including smoothness spaces such as modu-lation spaces, Besov spaces, and the intermediate α−modulation spaces asspecial cases [2, 3, 25]. We construct the decomposition spaces using struc-tured coverings, as introduced by Borup & Nielsen in [3], which leads toa partition of the frequency domain obtained by applying invertible affinetransformations Ak(·) + ckk∈N on a fixed set Q ⊂ Rd.

Given a painless NSGF, we provide a method for constructing a compat-ible structured covering and the associated decomposition space. We thenshow that the NSGF forms a Banach frame for the decomposition space andprove that signals belong to the decomposition space if and only if they per-mit sparse frame expansions. Based on the sparse expansions, we prove anupper bound on the approximation error occurring when thresholding theframe coefficients for signals belonging to the decomposition space. All theseresults are based on the characterization given in Theorem 5.1 which is themain contribution of this article. This theorem yields the existence of con-stants 0 < C1, C2 < ∞ such that all signals f , belonging to the decompositionspace D(Q, Lp, `q

ωs), satisfy

C1 ‖ f ‖D(Q,Lp ,`qωs )≤∥∥∥∥⟨ f , hp

T,n

⟩T,n

∥∥∥∥d(Q,`p ,`q

ωs )≤ C2 ‖ f ‖D(Q,Lp ,`q

ωs ),

with hpT,nT,n denoting Lp−normalized elements from the NSGF and

d(Q, `p, `qωs) an associated sequence space. In this way we completely char-

acterize the decomposition space using the frame coefficients from the NSGF.The outline of the article is as follows. In Section 2 we define decomposi-

tion spaces based on structured coverings and in Section 3 we define NSGFsin the notation of [1]. We construct the compatible decomposition space inSection 4 and in Section 5 we prove Theorem 5.1. In Section 6 we show that

60

Page 79: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

2. Decomposition spaces

the NSGF forms a Banach frame for the compatible decomposition space andin Section 7 we provide the link to nonlinear approximation theory.

Let us now introduce some of the notation used throughout this arti-cle. We let f (ξ) :=

∫Rd f (x)e−2πix·ξ dx denote the Fourier transform with the

usual extension to L2(Rd). By F G we mean that there exist two constants0 < C1, C2 < ∞ such that C1F ≤ G ≤ C2F. For two (quasi-)normed vectorspaces X and Y, X → Y means that X ⊂ Y and ‖ f ‖Y ≤ C ‖ f ‖X for someconstant C and all f ∈ X. We say that a non-empty open set Ω′ ⊂ Rd iscompactly contained in an open set Ω ⊂ Rd if Ω′ ⊂ Ω and Ω′ is compact. Wedenote the matrix norm max

∣∣aij∣∣ by ‖A‖`∞(Rd×d) and we call ξii∈I ⊂ Rd

a δ−separated set if infj,k∈I ,j 6=k ‖ξ j − ξk‖2 = δ > 0. Finally, by Id we denotethe identity operator on Rd and by χQ we denote the indicator function for aset Q ⊂ Rd.

2 Decomposition spaces

In order to construct decomposition spaces, we first need the notion of astructured covering with an associated bounded admissible partitions ofunity (BAPU) as defined in Section 2.1. A BAPU defines a (flexible) parti-tion of the frequency domain corresponding to the structured covering. Weuse the notation of [3] but with slightly modified definitions for both thestructured coverings and the BAPUs.

2.1 Structured covering and BAPU

For an invertible matrix A ∈ GL(Rd), and a constant c ∈ Rd, we define theaffine transformation

Tξ := Aξ + c, ξ ∈ Rd.

For a subset Q ⊂ Rd we let QT := T(Q), and for notational convenience wedefine |T| := |det(A)|. Given a family T = Ak(·) + ckk∈N of invertibleaffine transformations on Rd, and a subset Q ⊂ Rd, we set Q := QTT∈Tand

T :=

T′ ∈ T∣∣ QT′ ∩QT 6= ∅

, T ∈ T . (B.1)

We say thatQ is an admissible covering of Rd if⋃

T∈T QT = Rd and there existsn0 ∈ N such that |T| ≤ n0 for all T ∈ T . We note that the (minimal) numbern0 is the degree of overlap between the sets constituting the covering.

Definition 2.1 (Q−moderate weight). Let Q := QTT∈T be an admissiblecovering. A function u : Rd → (0, ∞) is called Q−moderate if there existsC > 0 such that u(x) ≤ Cu(y) for all x, y ∈ QT and all T ∈ T . AQ−moderateweight (derived from u) is a sequence ωTT∈T := u(ξT)T∈T with ξT ∈ QTfor all T ∈ T .

61

Page 80: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper B.

For the rest of this article we shall use the explicit choice u(ξ) := 1+ ‖ξ‖2for the function u in Definition 2.1. We now define the concept of a structuredcovering, first considered in [3]. To ensure that the resulting decompositionspaces are complete, we consider an extended version of the definition givenin [3].

Definition 2.2 (Structured covering). Given a family T = Ak(·) + ckk∈N

of invertible affine transformations on Rd, suppose there exist two boundedopen sets P ⊂ Q ⊂ Rd, with P compactly contained in Q, such that

1. PTT∈T and QTT∈T are admissible coverings.

2. There exists K > 0, such that∥∥∥A−1

k′ Ak

∥∥∥`∞(Rd×d)

≤ K holds whenever

(Ak′Q + ck′) ∩ (AkQ + ck) 6= ∅.

3. There exists K∗ > 0, such that∥∥∥A−1

k

∥∥∥`∞(Rd×d)

≤ K∗ holds for all k ∈N.

4. There exists a δ−separated set ξTT∈T ⊂ Rd, with ξT ∈ QT for allT ∈ T , such that ωTT∈T := u(ξT)T∈T is a Q−moderate weight.

5. There exists γ > 0, such that |QT | ≤ ωγT for all T ∈ T .

Then we call Q = QTT∈T a structured covering.

Remark 2.1. Definition 2.2(3)-(5) are new additions compared to the defini-tion given in [3] and are necessary for proving Theorem 2.1 page 65. We notethat Definition 2.2(2) implies |QT′ | |QT | uniformly for all T ∈ T and allT′ ∈ T, and Definition 2.2(3) implies a uniform lower bound on |QT |.

For a structured covering we have the associated concept of a BAPU, firstconsidered in [3, 17]. With a small modification of the proof of [3, Proposition1] we have the following result.

Proposition 2.1. Given a structured covering Q = QTT∈T , there exists a familyof non-negative functions ψTT∈T ⊂ C∞

c (Rd) satisfying

1. supp(ψT) ⊂ QT for all T ∈ T .

2. ∑T∈T

ψT(ξ) = 1 for all ξ ∈ Rd.

3. supT∈T|QT |1/p−1

∥∥∥F−1ψT

∥∥∥Lp

< ∞ for all 0 < p ≤ 1.

4. For all α ∈Nd0, there exists Cα > 0 such that |∂αψT(ξ)| ≤ CαχQT (ξ), for all

ξ ∈ Rd and all T ∈ T .

We say that ψTT∈T is a BAPU subordinate to Q.

62

Page 81: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

2. Decomposition spaces

Remark 2.2. Proposition 2.1(3) is necessary to ensure that the decompositionspaces under consideration will be well-defined for 0 < p < 1. This case is ofspecific interest since it plays an essential role in connection with nonlinearapproximation theory (cf. Section 7).

Remark 2.3. Proposition 2.1(4) is a new addition compared to [3, Proposition1] and is necessary for proving Theorem 2.1 page 65. The proof of Proposition2.1(4) follows easily from the arguments in the proof of [3, Proposition 1]and Definition 2.2(3). Finally, it should be noted that the assumptions inDefinition 2.2(4)-(5) are not necessary for proving Proposition 2.1, however,these assumptions are needed for the proof of Theorem 2.1.

The proof of [3, Proposition 1] is constructive and provides a methodfor constructing the associated BAPU. Given a structured covering QTT∈T(with P being compactly contained in Q), the method goes as follows:

1. Pick a non-negative function Φ ∈ C∞c (Rd) with Φ(ξ) = 1 for all ξ ∈ P

and supp(Φ) ⊂ Q.

2. For all T ∈ T , define

ψT(ξ) =Φ(T−1ξ)

∑T′∈T Φ(T′−1ξ).

3. Then ψTT∈T is a BAPU subordinate to Q = QTT∈T .

In the next section we define decomposition spaces based on structured cov-erings.

2.2 Definition of decomposition spaces

Given a structured covering Q = QTT∈T with corresponding Q−moderateweight ωTT∈T and BAPU ψTT∈T . For s ∈ R and 0 < q ≤ ∞, we definethe associated weighted sequence space `

qωs(T ) as the sequences of complex

numbers aTT∈T satisfying

‖aTT∈T ‖`qωs

:= ‖ωsTaTT∈T ‖`q < ∞.

Given aTT∈T ∈ `qωs(T ), we define a+T T∈T by a+T := ∑T′∈T aT′ . Since

ωTT∈T is Q−moderate, aTT∈T → a+T T∈T defines a bounded operatoron `

qωs(T ) [17, Remark 2.13 and Lemma 3.2]. Denoting its operator norm by

C+, we have∥∥∥a+T

T∈T

∥∥∥`

qωs≤ C+

∥∥aTT∈T∥∥`

qωs

, ∀ aTT∈T ∈ `qωs(T ). (B.2)

63

Page 82: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper B.

We will use (B.2) several times throughout this article. Using the notationof [3] we define the Fourier multiplier ψT(D) by

ψT(D) f := F−1(ψTF f ), f ∈ L2(Rd).

Combining Proposition 2.1(3) with Lemma A.2 page 81 and [3, Lemma 1] wecan show the existence of a uniform constant C > 0 such that all band-limitedfunctions f ∈ Lp(Rd) satisfy

‖ψT(D) f ‖Lp ≤ C ‖ f ‖Lp ,

for all T ∈ T and all 0 < p ≤ ∞. That is, ψT(D) extends to a boundedoperator on the band-limited functions in Lp(Rd), uniformly in T ∈ T . Letus now give the definition of decomposition spaces on the Fourier side.

Definition 2.3 (Decomposition space). Let Q = QTT∈T be a structuredcovering of Rd with corresponding Q−moderate weight ωTT∈T and sub-ordinate BAPU ψTT∈T . For s ∈ R and 0 < p, q < ∞, we define the decom-position space D(Q, Lp, `q

ωs) as the set of distributions f ∈ S ′(Rd) satisfying

‖ f ‖D(Q,Lp ,`qωs )

:=∥∥‖ψT(D) f ‖LpT∈T

∥∥`

qωs

< ∞.

Remark 2.4. According to [17, Theorem 3.7], two different BAPUs yield thesame decomposition space with equivalent norms so D(Q, Lp, `q

ωs) is in factwell defined and independent of the choice of BAPU. Actually, the resultsin [17] show that decomposition spaces are invariant under certain geometricmodifications of the covering Q, but we will not go into detail here.

Remark 2.5. In their most general form, decomposition spaces D(Q, B, Y) areconstructed using a local component B and a global component Y [17]. This con-struction is similar to the construction of Wiener amalgam spaces W(B, C)with local component B and global component C [14, 29, 39]. However,Wiener amalgam spaces are based on bounded uniform partitions of unity,which corresponds to a uniform upper bound on the size of the membersof the covering. We do not find such an assumption natural in relation toNSGFs (cf. Section 3) and have therefore chosen the more general frameworkof decomposition spaces.

In Theorem 2.1 below we prove that D(Q, Lp, `qωs) is in fact a (quasi-

)Banach space. Before presenting this result, let us first consider some ex-amples of familiar decomposition spaces. By standard arguments, one caneasily show that D(Q, L2, `2) = L2(Rd) with equivalent norms for any struc-tured covering Q. The next two examples are not as straightforward anddemand some structure on the covering. Recall that ξTT∈T denotes theδ−separated set from Definition 2.2(4).

64

Page 83: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

3. Nonstationary Gabor frames

Example 2.1 (Modulation spaces). Let Q ⊂ Rd be an open cube with center0 and side length r > 1. Define T := Tkk∈Zd , with Tkξ := ξ − k, and setξTk := k for all k ∈ Zd. With Q := QTT∈T then D(Q, Lp, `q

ωs) = Msp,q(R

d)for s ∈ R and 0 < p, q < ∞, see [15, Section 4] for further details. 4

Example 2.2 (Besov spaces). Let E2 := ±1,±2, E1 := ±1 and E :=Ed

2 \ Ed1 . For j ∈N and k ∈ E define cj,k := 2j(v(k1), . . . , v(kd)), where

v(x) := sgn(x) ·

1/2 for x = ±13/2 for x = ±2

Let Q ⊂ Rd be an open cube with center 0 and side length r > 2. DefineT := I, Tj,kj∈N,k∈E, with Tj,kξ := 2jξ + cj,k, and set ξTj,k := cj,k for all j ∈ N

and k ∈ E. With Q := QTT∈T then D(Q, Lp, `qωs) = Bs

p,q(Rd) for s ∈ R and

0 < p, q < ∞, see [41, Section 2.5.4] for further details. 4

Let us now study some important properties of decomposition spaces, inparticular completeness.

Theorem 2.1. Given a structured covering Q = QTT∈T with Q−moderateweight ωTT∈T and subordinate BAPU ψTT∈T . For s ∈ R and 0 < p, q < ∞,

1. S(Rd) → D(Q, Lp, `qωs) → S ′(Rd).

2. D(Q, Lp, `qωs) is a (quasi-)Banach space (Banach space if 1 ≤ p, q < ∞).

3. S(Rd) is dense in D(Q, Lp, `qωs).

Remark 2.6. As was pointed out in [21], the definition of decompositionspaces given in [3] cannot guarantee completeness in the general case. How-ever in [4], this problem was fixed by imposing certain weight conditions onthe structured covering. Our proof of Theorem 2.1 is based on the approachtaken in [4].

In Appendix A we have provided a sketch of the proof for Theorem 2.1.The underlying ideas for the proof are similar to those of [4, Proposition 5.2]and several references are made to results in [4]. However, in [4] the authorsconsidered only coverings made up from open balls and not all argumentscarry over to the general case of an arbitrary structured covering.

3 Nonstationary Gabor frames

In this section we define nonstationary Gabor frames with flexible frequencyresolution using the notation of [1]. Given a set of window functionshmm∈Zd in L2(Rd), with corresponding time sampling steps am > 0, form, n ∈ Zd we define atoms of the form

65

Page 84: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper B.

hm,n(x) := hm(x− nam), x ∈ Rd.

The choice of Zd as index set for m is only a matter of notational convenience;any countable index set would do. If ∑m,n |〈 f , hm,n〉|2 ‖ f ‖2

2 for all f ∈L2(Rd), we refer to hm,nm,n as a nonstationary Gabor frame (NSGF). For anNSGF hm,nm,n, the frame operator

S f = ∑m,n∈Zd

〈 f , hm,n〉 hm,n, f ∈ L2(Rd),

is invertible and we have the expansions

f = ∑m,n∈Zd

〈 f , hm,n〉 hm,n, f ∈ L2(Rd),

with hm,nm,n := S−1hm,nm,n being the canonical dual frame of hm,nm,n.An NSGF with flexible frequency resolution corresponds to a grid in thetime-frequency plane which is irregular over frequency but regular over timeat each frequency position. This property allows for adaptive time-frequencyrepresentations as opposed to standard Gabor frames. According to [1, Corol-lary 2], we have the following important result for NSGFs with band-limitedwindow functions.

Theorem 3.1. Let hmm∈Zd ⊂ L2(Rd) with time sampling steps amm∈Zd ,am > 0 for all m ∈ Zd. Assuming supp(hm) ⊆ [0, 1

am]d + bm, with bm ∈ Rd for

all m ∈ Zd, the frame operator for the system

hm,n(x) = hm(x− nam), ∀m, n ∈ Zd, x ∈ Rd,

is given by

S f (x) =

(F−1

(∑

m∈Zd

1ad

m

∣∣∣hm

∣∣∣2) ∗ f

)(x), f ∈ L2(Rd).

The system hm,nm,n∈Zd constitutes a frame for L2(Rd), with frame-bounds0 < A ≤ B < ∞, if and only if

A ≤ ∑m∈Zd

1ad

m

∣∣∣hm(ξ)∣∣∣2 ≤ B, for a.e. ξ ∈ Rd, (B.3)

and the canonical dual frame is then given by

hm,n(x) = F−1

hm

∑l∈Zd1ad

l

∣∣∣hl

∣∣∣2 (x− nam), x ∈ Rd. (B.4)

66

Page 85: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

3. Nonstationary Gabor frames

Remark 3.1. We note that the canonical dual frame in (B.4) posses the samestructure as the original frame, which is a property not shared by generalNSGFs. We also note that the canonical tight frame can be obtained by takingthe square root of the denominator in (B.4).

Traditionally, an NSGF satisfying the assumptions of Theorem 3.1 is calleda painless NSGF, referring to the fact that the frame operator is simply a mul-tiplication operator (in the frequency domain) and therefore easily invertible.This terminology is adopted from the classical painless nonorthogonal expan-sions [7], which corresponds to the painless case for standard Gabor frames.

By slight abuse of notation we use the term "painless" to denote the NSGFssatisfying Definition 3.1 below. In order to properly formulate this definition,we first need some preliminary notation. Let hmm∈Zd ⊂ L2(Rd) satisfy theassumptions in Theorem 3.1. Given C∗ > 0 we denote by Imm∈Zd the opencubes

Im :=(−εm,

1am

+ εm

)d+ bm, m ∈ Zd, (B.5)

with εm := C∗/am for all m ∈ Zd. We note that supp(hm,n) ⊂ Im for allm, n ∈ Zd. For m ∈ Zd we define

m :=

m′ ∈ Zd ∣∣ Im′ ∩ Im 6= ∅

,

using the notation of (B.1). With this definition, |m| denotes the number ofcubes overlapping with Im. Finally, we recall the choice u(ξ) := 1 + ‖ξ‖2 forthe function u in Definition 2.1.

Definition 3.1 (Painless NSGF). Let hmm∈Zd ⊂ S(Rd) satisfy the assump-tions in Theorem 3.1 and assume that

1. hmm∈Zd ⊂ C∞c (Rd) and for β ∈Nd

0 there exists Cβ > 0, such that

supξ∈Rd

∣∣∣∂βξ hm(ξ)

∣∣∣ ≤ Cβad/2+|β|m , for all m ∈ Zd.

2. supm∈Zd am := a < ∞.

3. There exists C∗ > 0 and n0 ∈ N, such that the open cubes Imm∈Zd

satisfy |m| ≤ n0 and am′ am uniformly for all m ∈ Zd and all m′ ∈ m.

4. The centerpoints bmm∈Zd forms a δ−separated set and the sequenceωmm∈Zd := u(bm)m∈Zd constitutes a Imm∈Zd−moderate weight.

5. There exists γ > 0 such that |Im| ≤ ωγm for all m ∈ Zd.

Then we refer to hm,nm,n∈Zd as a painless NSGF.

67

Page 86: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper B.

Remark 3.2. Definition 3.1(2) implies a uniform lower bound on |Im| andDefinition 3.1(4) guarantees a minimum distance between the center of thecubes. Furthermore, Definition 3.1(3) implies that each cube Im has at mostn0 overlap with other cubes and that the side-length of Im is equivalent to theside-length of any overlapping cube.

Remark 3.3. The assumptions in Definition 3.1 are natural in relation to de-composition spaces and are easily satisfied. However, the support conditionsfor hmm∈Zd given in Theorem 3.1 are rather restrictive and deserves a dis-cussion. In fact, compact support of the window functions is not a necessaryassumption for characterizing modulation spaces [27], Besov spaces [20], oreven general decomposition spaces [37]. However, a certain structure of thedual frame is needed and general NSGFs does not provide such structure.We choose to work with the painless case and base our argument on the factthat the dual frame posses the same structure as the original frame. We ex-pect that it is possible to extend the theory developed in this paper to a moregeneral setting by applying existence results for general NSGFs [12, 13, 31]or generalized shift invariant systems [30, 35, 36, 40]. In particular, the pa-per [31] by Holighaus seems to provide interesting results in this regard. Inthis paper, it is shown that for compactly supported window functions, thesampling density in Theorem 3.1 can (under mild assumptions) be relaxedsuch that the dual frame posses a structure similar to that of the originalframe. However, it is outside the scope of this paper to include such resultsand we will not go into further details.

We now provide a simple example of a set of window functions satisfyingDefinition 3.1(1).

Example 3.1. Let ϕ ∈ C∞c (Rd) \ 0 with supp(ϕ) ⊆ [0, 1]d and for m ∈ Zd

definehm(ξ) := ad/2

m ϕ (am(ξ − bm)) , ∀ξ ∈ Rd,

with bm ∈ Rd and am > 0. Then supp(hm) ⊆ [0, 1am]d + bm. Furthermore,

with w := am(ξ − bm) the chain rule yields∣∣∣∂βξ hm(ξ)

∣∣∣ = ∣∣∣[∂βξ ϕ](w)

∣∣∣ ad/2+|β|m ≤ Cβad/2+|β|

m χ[0, 1am ]d+bm

(ξ), ∀ξ ∈ Rd.

This shows Definition 3.1(1). 4

In the next section we consider painless NSGFs in the framework of de-composition spaces in order to characterize signals with sparse expansionsrelative to the NSGF dictionaries.

68

Page 87: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

4. Decomposition spaces based on nonstationary Gabor frames

4 Decomposition spaces based on nonstationaryGabor frames

We first provide a method for constructing a structured covering which iscompatibly with a given painless NSGF hm,nm,n∈Zd ⊂ S(Rd). We recallthe definition of εm = C∗/am used in the construction of Imm∈Zd in (B.5).Define Q := (0, 1)d together with the set of affine transformations T :=Am(·) + cmm∈Zd with

Am :=(

2εm +1

am

)· Id and (cm)j := −εm + (bm)j, 1 ≤ j ≤ d.

Then Q := QTT∈T = Imm∈Zd and, furthermore, we have the followingresult.

Lemma 4.1. Q is a structured covering of Rd.

Proof. Define the set

P :=(

C∗2C∗ + 1

,C∗ + 1

2C∗ + 1

)d.

By straightforward calculations, it is easy to show that P is compactly con-tained in Q and P := PTT∈T = (0, 1

am)d + bmm∈Zd . Let us now show that

P and Q satisfy the five conditions of Definition 2.2 page 62.

1. First we show that P covers Rd. We note that this immediately impliesthat Q also covers Rd. Assume P does not cover Rd, i.e., that thereexists some ξ ′ ∈ Rd such that ξ ′ /∈ (0, 1

am)d + bm for all m ∈ Zd. Since

supp(hm) ⊆ [0, 1am]d + bm, and hm is continuous, we get hm(ξ ′) = 0 for

all m ∈ Zd. This contradicts the inequality in (B.3) concerning the lowerframe bound and thus shows that P covers Rd. Now, Definition 3.1(3)is precisely the admissibility condition for Q and thus guarantees thatboth P and Q are admissible coverings. This shows Definition 2.2(1).

2. If (Am′Q + cm′) ∩ (AmQ + cm) 6= ∅, then am′ am according to Def-inition 3.1(3). Furthermore, since A−1

m′ Am is a diagonal matrix andεm = C∗/am we get∥∥∥A−1

m′ Am

∥∥∥`∞(Rd×d)

=am′

am≤ Kam

am= K,

for some K > 0, so Definition 2.2(2) is satisfied.

3. To show Definition 2.2(3) we note that∥∥∥A−1m

∥∥∥`∞(Rd×Rd)

=am

2C∗ + 1≤ a

2C∗ + 1, ∀m ∈ Zd,

according to Definition 3.1(2).

69

Page 88: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper B.

4. Finally, Definition 2.2(4)-(5) follow directly from Definition 3.1(4)-(5).

Since Q is a structured covering, Proposition 2.1 applies and we ob-tain a BAPU ψTT∈T subordinate to Q. Given parameters s ∈ R and0 < p, q < ∞ we may, therefore, construct the associated decompositionspace D(Q, Lp, `q

ωs). For notational convenience, we change notation andwrite hT,nT∈T ,n∈Zd , such that supp(hT,n) ⊂ QT for all T ∈ T and alln ∈ Zd. Since AT = (2εT + a−1

T ) · Id, the chain rule and Definition 3.1(1)yield

∣∣∣∂βξ

[hT(Tξ)

]∣∣∣ = ∣∣∣[∂βξ hT

](Tξ)

∣∣∣ ·(2εT +1aT

)|β|≤ Cβad/2

T · (2C∗ + 1)|β| χQT (Tξ) = C′βad/2T χQ(ξ), ∀ξ ∈ Rd.

(B.6)

Using (B.6) we can prove the following decay property of hT,nT,n.

Proposition 4.1. For every N ∈ N there exists a constant CN > 0 such that forT = AT(·) + cT ∈ T and n ∈ Zd,

|hT,n(x)| ≤ CN |T|1/2 (1 + ‖AT(x− naT)‖2)−N , ∀x ∈ Rd.

Proof. We will use the fact that

u(ξ)N = (1 + ‖ξ‖2)N ∑

|β|≤N

∣∣∣ξβ∣∣∣ , ξ ∈ Rd, (B.7)

for any N ∈ N with β ∈ Nd0. Let gT(ξ) := hT(Tξ) such that supp(gT) ⊂ Q

for all T ∈ T . Using (B.7) we get

|gT(x)| ≤ C1(1 + ‖x‖2)−N ∑|β|≤N

∣∣∣xβgT(x)∣∣∣

= C1(1 + ‖x‖2)−N ∑|β|≤N

∣∣∣F−1[∂

βξ gT

](x)∣∣∣

≤ C1(1 + ‖x‖2)−N ∑|β|≤N

∫Rd

∣∣∣∂βξ gT(ξ)

∣∣∣ dξ, x ∈ Rd.

Applying (B.6) we may continue and write

|gT(x)| ≤ C2ad/2T (1 + ‖x‖2)

−N ∑|β|≤N

∫Rd

χQ(ξ)dξ = C3ad/2T (1 + ‖x‖2)

−N .

(B.8)

70

Page 89: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

4. Decomposition spaces based on nonstationary Gabor frames

Now, since εT = C∗/aT then

|T| |Q| = |QT | =(

2εT +1aT

)d= (2C∗ + 1)d(aT)

−d. (B.9)

Hence, ad/2T = C|T|−1/2 so (B.8) yields

|gT(x)| ≤ C4 |T|−1/2 (1 + ‖x‖2)−N , x ∈ Rd. (B.10)

Using the fact that AT is a diagonal matrix, we obtain the relationship

hT(x) =∫

RdhT(ξ)e2πiξ·xdξ = |T|

∫Rd

gT(u)e2πi(ATu+cT)·xdu

= e2πicT ·x |T|∫

RdgT(u)e2πiu·AT xdu = e2πicT ·x |T| gT(ATx), x ∈ Rd.

(B.11)

Combining (B.11) and (B.10) we arrive at

|hT,n(x)| = |hT(x− naT)| = |T| |gT(AT(x− naT))|

≤ C4 |T|1/2 (1 + ‖AT(x− naT)‖2)−N , x ∈ Rd.

This proves the proposition.

As a direct consequence of Proposition 4.1 we can prove the followinglemma.

Lemma 4.2. For 0 < p < ∞, we have

supx∈Rd

∥∥hT,n(x)n∈Zd

∥∥`p

≤ C |T|1/2 , and (B.12)

supn∈Zd

‖hT,n‖Lp ≤ C′ |T|1/2−1/p , (B.13)

with constants C, C′ > 0 independent of T ∈ T .

Proof. We will use the fact that∫Rd

u(ξ)−mdξ =∫

Rd(1 + ‖ξ‖2)

−m dξ < ∞, (B.14)

for any m > d. Choosing N > d/p in Proposition 4.1, then (B.14) yields

∥∥hT,n(x)n∈Zd

∥∥`p ≤ C1 |T|1/2

(∑

n∈Zd

(1 + ‖AT(x− naT)‖2)−Np

)1/p

≤ C2 |T|1/2 (adT |T|)−1/p. (B.15)

71

Page 90: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper B.

According to (B.9), adT = C|T|−1, which inserted into (B.15) yields (B.12). To

show (B.13), we again let N > d/p in Proposition 4.1, so (B.14) yields

‖hT,n‖Lp ≤ C1 |T|1/2(∫

Rd(1 + ‖AT(x− naT)‖2)

−Npdx)1/p

≤ C2 |T|1/2−1/p .

This proves (B.6).

In the next section we use the painless NSGF hT,nT,n to prove a completecharacterization of the corresponding decomposition space D(Q, Lp, `q

ωs).

5 Characterization of decomposition spaces

The main result of this section is the characterization given in Theorem 5.1.To prove this result, we follow the approach taken in [3] where the authorsproved a similar result for a certain type of tight frames for Rd (see [3, Propo-sition 3]). Since the frames we consider are not assumed to be tight we needto modify the arguments given in [3]. We start with the following observa-tions.

Lemma 5.1. For 0 < p < ∞, the Fourier multiplier

ψhT(D) f := F−1

(ψh

TF f)

:= F−1

(ψT

∑l∈T1ad

l

∣∣∣hl

∣∣∣2F f

)(B.16)

is bounded on the band-limited functions in Lp(Rd) uniformly in T ∈ T . Further,

supx∈Rd

∥∥∥ψhT(D)hT′ ,n

n∈Zd

∥∥∥`p

≤ C |T|1/2 , T ∈ T , T′ ∈ T, (B.17)

with a constant C > 0 independent of T ∈ T .

Proof. Let ψh′T (ξ) := ψh

T(T(ξ)). For N > d/p, (B.14) and (B.7) imply∥∥∥F−1ψh′T

∥∥∥Lp≤ C1

∥∥∥u(·)NF−1ψh′T

∥∥∥L∞≤ C2 ∑

|β|≤N

∥∥∥(·)βF−1ψh′T

∥∥∥L∞

= C2 ∑|β|≤N

∥∥∥F−1(

∂βψh′T

)∥∥∥L∞≤ C2 ∑

|β|≤N

∥∥∥∂βψh′T

∥∥∥L1

. (B.18)

Since εT = C∗/aT , the chain rule yields

∂βψh′T (ξ) =

(∂βψh

T

)(Tξ)

(2εT +

1aT

)|β|= Ca−|β|T

(∂βψh

T

)(Tξ). (B.19)

72

Page 91: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

5. Characterization of decomposition spaces

For estimating ∂βψhT we use the quotient rule. Because all derivatives of ψT

are bounded according to Proposition 2.1(4), we need only to consider thederivatives of the denominator of ψh

T . The sum in the denominator consists ofat most n0 terms and for each term in the sum, the chain rule and Definition3.1(1)-(2) imply an upper bound of Ca|β|T . Therefore, (B.19) yields |∂βψh′

T (ξ)| ≤C′χQ(ξ), since supp(ψh′

T ) ⊂ Q for all T ∈ T . Combing this with (B.18)we get ‖F−1ψh′

T ‖Lp ≤ C3. It now follows from Lemma A.2 page 81 thatf → F−1(ψh′

T F f ) defines a bounded operator on the band-limited functionsin Lp(Rd) uniformly in T ∈ T . Finally, applying [3, Lemma 1] we obtain thesame statement for ψh

T(D).We now prove (B.17). Repeating the arguments from the proof of Propo-

sition 4.1 (using (B.3) and Definition 3.1(1)) we can prove the same decayproperty for ψh

T(D)hT′ ,n. The result therefore follows from the arguments inthe proof of (B.12).

The statement in Theorem 5.1 follows directly once we have proven thefollowing technical lemma. We use the notation ψT := ∑T′∈T ψT′ .

Lemma 5.2. Given f ∈ S(Rd) and 0 < p < ∞. For all T ∈ T ,∥∥〈 f , hT,n〉n∈Zd

∥∥`p ≤ C |T|1/p−1/2 ∥∥ψT(D) f

∥∥Lp , and (B.20)

‖ψT(D) f ‖Lp ≤ C′ |T|1/2−1/p ∑T′∈T

∥∥∥⟨ f , hT′ ,n⟩

n∈Zd

∥∥∥`p

, (B.21)

with constants C, C′ > 0 independent of T ∈ T .

Proof. The proof of (B.20) follows directly from (B.12) and the arguments forthe first part of the proof for [3, Lemma 2]. To prove (B.21) we first assumep ≤ 1 and note

‖ψT(D) f ‖Lp ≤ C1 ∑T′∈T

∑n∈Zd

∣∣⟨ f , hT′ ,n⟩∣∣ ∥∥ψT(D)hT′ ,n

∥∥Lp (B.22)

≤ C2 ∑T′∈T

(∑

n∈Zd

∣∣⟨ f , hT′ ,n⟩∣∣p ∥∥ψT(D)hT′ ,n

∥∥pLp

)1/p

,

with hT,nT,n being the dual frame given in (B.4) page 66. Applying (B.16)and (B.13) this proves (B.21) for the case p ≤ 1. For p > 1, we note that

73

Page 92: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper B.

Hölders inequality (with p′ being the conjugate index of p) yields∥∥∥∥∥ ∑n∈Zd

⟨f , hT′ ,n

⟩ψT(D)hT′ ,n

∥∥∥∥∥p

Lp

≤∫

Rd ∑n∈Zd

∣∣⟨ f , hT′ ,n⟩∣∣p ∣∣ψT(D)hT′ ,n(x)

∣∣ ( ∑n′∈Zd

∣∣ψT(D)hT′ ,n′(x)∣∣)p/p′

dx

≤ C1 |T|p/2p′−1/2 ∑n∈Zd

∣∣⟨ f , hT′ ,n⟩∣∣p ,

according to Lemma 5.1 and (B.13). Taking the p’th root on both sides andapplying (B.22) finishes the proof of (B.21) for p > 1.

Using the notation of [3] we define Lp−normalized atoms hpT,n :=

|T|1/2−1/p hT,n, for all T ∈ T , n ∈ Zd and 0 < p < ∞. We also define thecoefficient space d(Q, `p, `q

ωs) as the set of coefficients cT,nT∈T ,n∈Zd ⊂ C

satisfying

∥∥∥cT,nT∈T ,n∈Zd

∥∥∥d(Q,`p ,`q

ωs ):=∥∥∥∥∥cT,nn∈Zd

∥∥`p

T∈T

∥∥∥`

qωs

< ∞.

Combining Lemma 5.2 with the fact that S(Rd) is dense in D(Q, Lp, `qωs) we

obtain a characterization similar to that of [3, Proposition 3].

Theorem 5.1. For s ∈ R and 0 < p, q < ∞ we have the equivalence

‖ f ‖D(Q,Lp ,`qωs )∥∥∥∥⟨ f , hp

T,n

⟩T∈T ,n∈Zd

∥∥∥∥d(Q,`p ,`q

ωs ),

for all f ∈ D(Q, Lp, `qωs).

Remark 5.1. The characterization in Theorem 5.1 differs from the one givenin [3, Proposition 3] in two ways. In [3] the frame elements are obtaineddirectly from the structured covering such that the resulting system forms atight frame. In our framework we take the "reverse" approach and explicitlystate sufficient conditions which guarantee the existence of a compatible de-composition space for a given NSGF (cf. Definition 3.1). More importantly,we show that the assumption on tightness of the frame can be replaced withthe structured expression for the dual frame given in (B.4) page 66.

In the next section we use the characterization given in Theorem 5.1 toprove that hp

T,nT,n forms a Banach frame for D(Q, Lp, `qωs) with respect to

d(Q, `p, `qωs) for s ∈ R and 0 < p, q < ∞.

74

Page 93: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

6. Banach frames for decomposition spaces

6 Banach frames for decomposition spaces

Let us start by giving the general definition of a Banach frame [26, 27]. Tradi-tionally, Banach frames are only defined for Banach spaces but we will alsouse the concept for (quasi-)Banach spaces.

Definition 6.1 (Banach Frame). Let X be a (quasi-)Banach space and let Xdbe an associated (quasi-)Banach sequence space on N. A Banach frame forX, with respect to Xd, is a sequence ynn∈N in the dual space X′, such that

1. The coefficient operator CX : f → 〈 f , yn〉n∈N is bounded from X intoXd.

2. Norm equivalence:

‖ f ‖X ‖〈 f , yn〉n∈N‖Xd, ∀ f ∈ X.

3. There exists a bounded operator RXd from Xd onto X, called a recon-struction operator, such that

RXd CX f = RXd (〈 f , yn〉n∈N) = f , ∀ f ∈ X.

Remark 6.1. We will actually prove that hpT,nT,n forms an atomic decom-

position [5, 18, 19] for D(Q, Lp, `qωs) as the reconstruction operator takes the

form f = ∑T,n〈 f , hpT,n〉xT,n with xT,n ⊂ D(Q, Lp, `q

ωs) (see Theorem 6.1below).

In order to show that that hpT,nT,n forms a Banach frame for

D(Q, Lp, `qωs), we first note that

hpT,nT∈T ,n∈Zd ⊂ S(Rd) ⊂ D′(Q, Lp, `q

ωs)

as required by Definition 6.1. Furthermore, the equivalence in Theorem 5.1implies that Definition 6.1(2) is satisfied and the corresponding proof revealsthat Definition 6.1(1) is satisfied. What remains to be shown is the existenceof a bounded reconstruction operator such that Definition 6.1(3) holds. ForcT,nT,n ∈ d(Q, `p, `q

ωs), we define the reconstruction operator as

Rd(Q,`p ,`qωs )

(cT,nT,n

)= ∑

T∈T ,n∈Zd

cT,n |T|1/p−1/2 hT,n, (B.23)

with hT,nT∈T ,n∈Zd being the dual frame given in (B.4) page 66. We nowprovide the main result of this section.

75

Page 94: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper B.

Theorem 6.1. Given s ∈ R and 0 < p, q < ∞, hpT,nT∈T ,n∈Zd forms a Banach

frame for D(Q, Lp, `qωs). Furthermore, we have the expansions

f = ∑T∈T ,n∈Zd

〈 f , hT,n〉 hT,n, ∀ f ∈ D(Q, Lp, `qωs), (B.24)

with unconditional convergence.

Proof. Let R and C denote the reconstruction- and coefficient operator, respec-tively. We first prove that R is bounded from d(Q, `p, `q

ωs) onto D(Q, Lp, `qωs).

For cT,nT,n ∈ d(Q, `p, `qωs) we let g := R(cT,nT,n). For T ∈ T , Lemma 5.1

implies

‖ψT(D)g‖Lp =

∥∥∥∥∥∥∥∥F−1

ψT · ψT

∑l∈T1ad

l

∣∣∣hl

∣∣∣2 · ∑T′∈T ,n∈Zd

cT′ ,n∣∣T′∣∣1/p−1/2 hT′ ,n

∥∥∥∥∥∥∥∥

Lp

≤ C1

∥∥∥∥∥∥ψT(D)

∑T′∈T ,n∈Zd

cT′ ,n∣∣T′∣∣1/p−1/2 hT′ ,n

∥∥∥∥∥∥Lp

. (B.25)

Repeating the arguments from the proof of [3, Lemma 4] we can show that∥∥∥∥∥∥ ∑T∈T ,n∈Zd

cT,n |T|1/p−1/2 hT,n

∥∥∥∥∥∥D(Q,Lp ,`q

ωs )

≤ C ‖cT,nT,n‖d(Q,`p ,`qωs )

. (B.26)

Applying (B.2) page 63 to (B.25) and then using (B.26) we get

‖g‖D(Q,Lp ,`qωs )≤ C2

∥∥∥∥∥∥ ∑T∈T ,n∈Zd

cT,n |T|1/p−1/2 hT,n

∥∥∥∥∥∥D(Q,Lp ,`q

ωs )

≤ C3

∥∥∥cT,nT∈T ,n∈Zd

∥∥∥d(Q,`p ,`q

ωs ). (B.27)

This proves that R is bounded from d(Q, `p, `qωs) onto D(Q, Lp, `q

ωs). Let usnow show the unconditional convergence of (B.24). Given f ∈ D(Q, Lp, `q

ωs),we can find a sequence fkk≥1, with fk ∈ S(Rd) for all k ≥ 1, such thatfk → f in D(Q, Lp, `q

ωs) as k → ∞. Furthermore, since hT,nT,n forms aframe for L2(Rd), for each k ≥ 1 we have the expansion

fk = ∑T∈T ,n∈Zd

〈 fk, hT,n〉 hT,n = RC( fk),

with unconditional convergence. Since RC : D(Q, Lp, `qωs)→ D(Q, Lp, `q

ωs) iscontinuous, letting k→ ∞ yields

f = RC( f ) = ∑T∈T ,n∈Zd

〈 f , hT,n〉 hT,n. (B.28)

76

Page 95: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

7. Application to nonlinear approximation theory

Given ε > 0, (B.27) implies that we can find a finite subset F0 ⊂ T ×Zd, suchthat∥∥∥∥∥∥ f − ∑

(T,n)∈F〈 f , hT,n〉 hT,n

∥∥∥∥∥∥D(Q,Lp ,`q

ωs )

≤ C∥∥∥〈 f , hT,n〉(T,n)/∈F

∥∥∥d(Q,Lp ,`q

ωs )< ε,

for all finite sets F ⊇ F0. According to [27, Proposition 5.3.1], this property isequivalent to unconditional convergence.

We close this section by discussing the implications of the achieved re-sults. According to Theorem 5.1 and Theorem 6.1, every f ∈ D(Q, Lp, `q

ωs)has an expansion of the form

f = ∑T∈T ,n∈Zd

⟨f , hp

T,n

⟩|T|1/p−1/2 hT,n, with

‖ f ‖D(Q,Lp ,`qωs )∥∥∥∥⟨ f , hp

T,n

⟩T,n

∥∥∥∥d(Q,`p ,`q

ωs ).

Now, assume there exists another set of reconstruction coefficients cT,nT,n ∈d(Q, `p, `q

ωs) which is sparser than 〈 f , hpT,n〉T,n when sparseness is mea-

sured by the d(Q, `p, `qωs)-norm. Since the reconstruction operator R is

bounded we get

‖cT,nT,n‖d(Q,`p ,`qωs )≤∥∥∥∥⟨ f , hp

T,n

⟩T,n

∥∥∥∥d(Q,`p ,`q

ωs )≤ C1 ‖ f ‖D(Q,Lp ,`q

ωs )

= C1 ‖R(cT,nT,n)‖D(Q,Lp ,`qωs )

≤ C2 ‖cT,nT,n‖d(Q,`p ,`qωs )

.

We conclude that the canonical coefficients 〈 f , hpT,n〉T,n are (up to a con-

stant) the sparsest possible choice for expanding f as

f = ∑T∈T ,n∈Zd

cT,n |T|1/p−1/2 hT,n,

when sparseness of the coefficients is measured by the d(Q, `p, `qωs)-norm.

Furthermore, f ∈ D(Q, Lp, `qωs) if and only if f permits a sparse expansion

relative to the dictionary |T|1/p−1/2hT,nT,n .

7 Application to nonlinear approximation theory

In this section we provide the link to nonlinear approximation theory. An im-portant property of the sparse expansions obtained in Theorem 6.1 is that we

77

Page 96: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper B.

can obtain a good compression by simply thresholding the coefficients fromthe expansion. As mentioned in the introduction, NSGFs can create adap-tive time-frequency representations as opposed to standard Gabor frames.Such adaptive representations can be constructed to fit the particular natureof a given signal, thereby producing a more precise (and hopefully sparser)time-frequency representation. In particular, NSGFs have proven to be use-ful in connection with music signals. For instance, in [1, 11] the authors useNSGFs to construct an invertible constant-Q transform with good frequencyresolution at the lower frequencies and good time resolution at the higher fre-quencies. Such a time-frequency resolution is often more natural for musicsignals than the uniform resolution provided by Gabor frames.

The main result of this section is given in (B.30) below. The correspondingproof follows directly from the results obtained in Sections 5 and 6 togetherwith standard arguments from nonlinear approximation theory [23]. Let f ∈D(Q, Lτ , `τ

ωs), with 0 < τ < ∞, and let 0 < p < ∞ satisfy α := 1/τ −1/p > 0. Write the frame expansion of f with respect to the Lp−normalizedcoefficients

f = ∑T∈T ,n∈Zd

⟨f , hp

T,n

⟩|T|1/p−1/2 hT,n. (B.29)

Let θmm∈N be a decreasing rearrangement of the frame coefficients and letfN be the N-term approximation to f obtained by extracting the coefficientsin (B.29) corresponding to the N largest coefficients θmN

m=1. Then, we canprove the existence of C > 0 such that for f ∈ D(Q, Lτ , `τ

ωs) and N ∈N,

‖ f − fN‖D(Q,Lp ,`pωs )≤ CN−α ‖ f ‖D(Q,Lτ ,`τ

ωs ). (B.30)

In other words, for f ∈ D(Q, Lτ , `τωs) we obtain good approximations in

D(Q, Lp, `pωs) by thresholding the Lp−normalized frame coefficients. We note

that for 0 < τ < 2 we obtain good approximations in L2(Rd) with respect tothe original coefficients 〈 f , hT,n〉T,n.

We now explain the obtained results in the general framework ofJackson- and Bernstein inequalities [8]. Let D denote the dictionary|T|1/p−1/2hT,nT,n and define the nonlinear set of all linear combinationsof at most N elements from D as

ΣN(D) :=

T,n∈∆cT,n |T|1/p−1/2 hT,n

∣∣∣∣∣ #∆ ≤ N

.

For any f ∈ D(Q, Lp, `pωs), the error of best N-term approximation to f is

σN ( f ,D) := infh∈ΣN(D)

‖ f − h‖D(Q,Lp ,`pωs )

.

Since fN ∈ ΣN(D), (B.30) yields

σN ( f ,D) ≤ CN−α ‖ f ‖D(Q,Lτ ,`τωs )

.

78

Page 97: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

A. Proof of Theorem 2.1

This is a so-called Jackson inequality for nonlinear N-term approximationwith D. It provides us with an upper bound for the error obtained by ap-proximating f with the best possible choice of linear combinations of at mostN elements from the dictionary. The converse inequality is called a Bernsteininequality and is in general much more difficult to obtain for redundant sys-tems [24]. The existence of a Bernstein inequality would provide us with alower bound and hence a full characterization of the error of best N-termapproximation to f with respect to the dictionary D. However, for this par-ticular system (and for many other redundant systems), the existence of aBernstein inequality is still an open question.

Acknowledgements

We thank the two anonymous reviewers for their constructive comments onthe original manuscript. Their valuable suggestions have helped improve themanuscript considerably.

A Proof of Theorem 2.1

Proof. To simplify notation we let Dsp,q := D(Q, Lp, `q

ωs). Let us first proveTheorem 2.1(1). Allowing the extension q = ∞, and repeating the argumentsfrom the proof of [4, Proposition 5.7], we can show that

Ds+εp,∞ → Ds

p,q → Dsp,∞, ε > d/q,

for any s ∈ R and 0 < p < ∞ using Definition 2.2(4). It therefore suffice toshow that S(Rd) → Ds

p,∞ → S ′(Rd) for any s ∈ R and 0 < p < ∞. ForN ∈N, we define semi-norms on S(Rd) by

pN(g) := supξ∈Rd

u(ξ)N ∑|β|≤N

∣∣∣∂β g(ξ)∣∣∣ , g ∈ S(Rd),

with u(ξ) = 1 + ‖ξ‖2 as usual. Following the approach in [4, Page 149], andapplying Proposition 2.1(4), we get

‖ f ‖Dsp,∞≤ CpN( f ) and ‖ f ‖Ds

1,1≤ C′pN′( f ),

for sufficiently large N and N′. This proves that S(Rd) → Dsp,∞ and

S(Rd) → Ds1,1. To show that Ds

p,∞ → S ′(Rd) we need to take a differentapproach than in [4]. Setting ψT := ∑T′∈T ψT′ , we first note that for f ∈ Ds

p,∞

and ϕ ∈ S(Rd),

79

Page 98: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper B.

|〈 f , ϕ〉| ≤ ∑T∈T

∥∥ψT(D) f ψT(D)ϕ∥∥

L1 ≤ ∑T∈T‖ψT(D) f ‖L∞

∥∥ψT(D)ϕ∥∥

L1 .

Using Lemma A.1 below (with g = F−1ψT f (Tξ)) we thus get

|〈 f , ϕ〉| ≤ C1 ∑T∈T|T|1/p ‖ψT(D) f ‖Lp

∥∥ψT(D)ϕ∥∥

L1

≤ C1 ‖ f ‖Dsp,∞ ∑

T∈T|T|1/p ω−s

T

∥∥ψT(D)ϕ∥∥

L1

≤ C2 ‖ f ‖Dsp,∞

∥∥∥∥∥ψT(D)ϕ∥∥

L1

T∈T

∥∥∥`1

ωγ/p−s

, (B.31)

since |T| = |Q|−1|QT | ≤ |Q|−1ωγT according to Definition 2.2(5). Applying

(B.2) page 63 we may continue on (B.31) and write

|〈 f , ϕ〉| ≤ C3 ‖ f ‖Dsp,∞‖ϕ‖

Dγ/p−s1,1

≤ C4 ‖ f ‖Dsp,∞

pN(ϕ),

for sufficiently large N since S(Rd) → Ds1,1. We conclude that Ds

p,∞ →S ′(Rd) which proves Theorem 2.1(1).

The proof of Theorem 2.1(2) follows directly from Theorem 2.1(1) and thearguments in [4, Page 150].

To prove Theorem 2.1(3) we let f ∈ Dsp,q and choose I ∈ C∞

c (Rd) with0 ≤ I(ξ) ≤ 1 and I(ξ) ≡ 1 in a neighborhood of ξ = 0. Also, we define( f ) := I f and

fε := F−1

ϕε ∗(

f) ∈ S(Rd),

with ϕε(ξ) := ε−d ϕ(ξ/ε) and ϕ being a compactly supported mollifier. Sincesupp(I) is compact, we may choose a finite subset T∗ ⊂ T with supp(I) ⊂∪T∈T∗QT and ∑T∈T∗ ψT(ξ) ≡ 1 on supp(I). Using Lemma A.2 below weobtain

‖ f ‖Lp =

∥∥∥∥∥F−1 IF(F−1

(∑

T∈T∗ψT · f

))∥∥∥∥∥Lp

≤ C ∑T∈T∗

∥∥∥F−1 I∥∥∥

L p‖ψT(D) f ‖Lp < ∞,

with p = min1, p. The dominated convergence theorem thus yields∥∥∥ f − fε

∥∥∥Ds

p,q≤ C

∥∥∥∥∥∥∥ f − fε

∥∥∥Lp

T∈T

∥∥∥∥`

qωs

→ 0, as ε→ 0,

so the proof is done if we can show that ‖ f − f ‖Dsp,q can be made arbitrary

small by choosing f appropriately. To show this, we define the set

80

Page 99: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

References

T := T ∈ T | I(ξ) ≡ 1 on supp(ψT). Denoting the complement Tc,

Lemma A.2 below yields∥∥∥ f − f∥∥∥q

Dsp,q

= ∑T∈Tc

ωsqT

∥∥∥F−1(

ψT

(f − I f

))∥∥∥q

Lp

≤ C1 ∑T∈Tc

ωsqT

(‖ψT(D) f ‖Lp +

∥∥∥F−1 IF (ψT(D) f )∥∥∥

Lp

)q

≤ C2 ∑T∈Tc

ωsqT ‖ψT(D) f ‖q

Lp .

Finally, since f ∈ Dsp,q we can choose supp(I) large enough, such that

‖ f − f ‖Dsp,q < ε, for any given ε > 0. This proves Theorem 2.1(3).

In the proof of Theorem 2.1 we used the following two lemmas. A proofof Lemma A.1 can be found in [3, Lemma 3] and a proof of Lemma A.2 canbe found in [41, Proposition 1.5.1].

Lemma A.1. Let g ∈ Lp(Rd) and supp(g) ⊂ Γ, with Γ ⊂ Rd compact. Given aninvertible affine transformation T, let gT(ξ) := g(T−1ξ). Then for 0 < p ≤ q ≤ ∞,

‖gT‖Lq≤ C |T|1/p−1/q ‖gT‖Lp

,

for a constant C independent of T.

Lemma A.2. Let Ω and Γ be compact subsets of Rd. Let 0 < p ≤ ∞ and p =min1, p. Then there exists a constant C such that∥∥∥F−1MF f

∥∥∥Lp≤ C

∥∥∥F−1M∥∥∥

L p‖ f ‖Lp

for all f ∈ Lp(Rd) with supp( f ) ⊂ Ω and all F−1M ∈ L p(Rd) with supp(M) ⊂Γ.

References

[1] P. Balazs, M. Dörfler, F. Jaillet, N. Holighaus, and G. Velasco. Theory, im-plementation and applications of nonstationary Gabor frames. J. Comput.Appl. Math., 236(6):1481–1496, 2011.

[2] L. Borup and M. Nielsen. Banach frames for multivariate α-modulationspaces. J. Math. Anal. Appl., 321(2):880–895, 2006.

[3] L. Borup and M. Nielsen. Frame decomposition of decompositionspaces. J. Fourier Anal. Appl., 13(1):39–70, 2007.

81

Page 100: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

References

[4] L. Borup and M. Nielsen. On anisotropic Triebel-Lizorkin type spaces,with applications to the study of pseudo-differential operators. J. Funct.Spaces Appl., 6(2):107–154, 2008.

[5] P. Casazza, D. Han, and D. Larson. Frames for Banach spaces. Contem-porary Mathematics, 247:149–182, 1999.

[6] O. Christensen. An introduction to frames and Riesz bases. Applied and Nu-merical Harmonic Analysis. Birkhäuser/Springer, [Cham], second edi-tion, 2016.

[7] I. Daubechies, A. Grossmann, and Y. Meyer. Painless nonorthogonalexpansions. J. Math. Phys., 27(5):1271–1283, 1986.

[8] R. A. DeVore and G. G. Lorentz. Constructive approximation, volume 303of Grundlehren der Mathematischen Wissenschaften [Fundamental Principlesof Mathematical Sciences]. Springer-Verlag, Berlin, 1993.

[9] M. Dörfler. Time-frequency analysis for music signals: A mathematicalapproach. Journal of New Music Research, 30(1):3–12, 2001.

[10] M. Dörfler. Quilted Gabor frames—a new concept for adaptive time-frequency representation. Adv. in Appl. Math., 47(4):668–687, 2011.

[11] M. Dörfler, N. Holighaus, T. Grill, and G. Velasco. Constructing aninvertible constant-Q transform with nonstationary Gabor frames. Inn Proceedings of the 14th International Conference on Digital Audio Effects(DAFx 11), 2011.

[12] M. Dörfler and E. Matusiak. Nonstationary Gabor frames—existenceand construction. Int. J. Wavelets Multiresolut. Inf. Process., 12(3):1450032,18, 2014.

[13] M. Dörfler and E. Matusiak. Nonstationary Gabor frames—approximately dual frames and reconstruction errors. Adv. Comput.Math., 41(2):293–316, 2015.

[14] H. G. Feichtinger. Banach convolution algebras of Wiener type. In Func-tions, series, operators, Vol. I, II (Budapest, 1980), volume 35 of Colloq. Math.Soc. János Bolyai, pages 509–524. North-Holland, Amsterdam, 1983.

[15] H. G. Feichtinger. Modulation spaces on locally compact abelian groups.Technical report, University of Vienna, 1983.

[16] H. G. Feichtinger. Banach spaces of distributions defined by decompo-sition methods. II. Math. Nachr., 132:207–237, 1987.

82

Page 101: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

References

[17] H. G. Feichtinger and P. Gröbner. Banach spaces of distributions definedby decomposition methods. I. Math. Nachr., 123:97–120, 1985.

[18] H. G. Feichtinger and K. H. Gröchenig. Banach spaces related to in-tegrable group representations and their atomic decompositions. I. J.Funct. Anal., 86(2):307–340, 1989.

[19] H. G. Feichtinger and K. H. Gröchenig. Banach spaces related tointegrable group representations and their atomic decompositions. II.Monatsh. Math., 108(2-3):129–148, 1989.

[20] M. Frazier and B. Jawerth. Decomposition of Besov spaces. Indiana Univ.Math. J., 34(4):777–799, 1985.

[21] H. Führ and F. Voigtlaender. Wavelet coorbit spaces viewed as decom-position spaces. J. Funct. Anal., 269(1):80–154, 2015.

[22] R. Gribonval and M. Nielsen. Some remarks on non-linear approxima-tion with Schauder bases. East J. Approx., 7(3):267–285, 2001.

[23] R. Gribonval and M. Nielsen. Nonlinear approximation with dictionar-ies. I. Direct estimates. J. Fourier Anal. Appl., 10(1):51–71, 2004.

[24] R. Gribonval and M. Nielsen. Nonlinear approximation with dictionar-ies. II. Inverse estimates. Constr. Approx., 24(2):157–173, 2006.

[25] P. Gröbner. Banachräume glatter Funktionen und Zerlegungsmethoden. Pro-Quest LLC, Ann Arbor, MI, 1992. Thesis (Dr.natw.)–Technische Univer-sität Wien (Austria).

[26] K. Gröchenig. Describing functions: atomic decompositions versusframes. Monatsh. Math., 112(1):1–42, 1991.

[27] K. Gröchenig. Foundations of time-frequency analysis. Applied and Nu-merical Harmonic Analysis. Birkhäuser Boston, Inc., Boston, MA, 2001.

[28] K. Gröchenig and S. Samarah. Nonlinear approximation with localFourier bases. Constr. Approx., 16(3):317–331, 2000.

[29] C. Heil. An introduction to weighted Wiener amalgams. Wavelets andtheir Applications (Chennai, January 2002), pages 183–216, 2003.

[30] E. Hernández, D. Labate, and G. Weiss. A unified characterization ofreproducing systems generated by a finite family. II. J. Geom. Anal.,12(4):615–662, 2002.

[31] N. Holighaus. Structure of nonstationary Gabor frames and their dualsystems. Appl. Comput. Harmon. Anal., 37(3):442–463, 2014.

83

Page 102: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

References

[32] N. Holighaus, M. Dörfler, G. A. Velasco, and T. Grill. A framework forinvertible, real-time constant-Q transforms. IEEE Transactions on Audio,Speech, and Language Processing, 21(4):775–785, April 2013.

[33] A. Holzapfel, G. A. Velasco, N. Holighaus, M. Dörfler, and A. Flexer. Ad-vantages of nonstationary Gabor transforms in beat tracking. In Proceed-ings of the 1st International ACM Workshop on Music Information Retrievalwith User-centered and Multimodal Strategies, MIRUM ’11, pages 45–50,New York, NY, USA, 2011. ACM.

[34] F. Jaillet and B. Torrésani. Time-frequency jigsaw puzzle: adaptive multi-window and multilayered Gabor expansions. Int. J. Wavelets Multiresolut.Inf. Process., 5(2):293–315, 2007.

[35] M. S. Jakobsen and J. Lemvig. Reproducing formulas for generalizedtranslation invariant systems on locally compact abelian groups. Trans.Amer. Math. Soc., 368(12):8447–8480, 2016.

[36] G. Kutyniok and D. Labate. The theory of reproducing systems on lo-cally compact abelian groups. Colloq. Math., 106(2):197–220, 2006.

[37] M. Nielsen and K. N. Rasmussen. Compactly supported frames for de-composition spaces. J. Fourier Anal. Appl., 18(1):87–117, 2012.

[38] W. J. Pielemeier, G. H. Wakefield, and M. H. Simoni. Time-frequencyanalysis of musical signals. Proceedings of the IEEE, 84(9):1216–1230, Sep1996.

[39] H. Rauhut. Wiener amalgam spaces with respect to quasi-Banach spaces.Colloq. Math., 109(2):345–362, 2007.

[40] A. Ron and Z. Shen. Generalized shift-invariant systems. Constr. Approx.,22(1):1–45, 2005.

[41] H. Triebel. Theory of function spaces, volume 38 of Mathematik und ihreAnwendungen in Physik und Technik [Mathematics and its applications inphysics and technology]. Akademische Verlagsgesellschaft Geest & PortigK.-G., Leipzig, 1983.

[42] P. J. Wolfe, S. J. Godsill, and M. Dörfler. Multi-Gabor dictionaries foraudio time-frequency analysis. In Proceedings of the 2001 IEEE Work-shop on the Applications of Signal Processing to Audio and Acoustics (Cat.No.01TH8575), pages 43–46, 2001.

[43] M. Zibulski and Y. Y. Zeevi. Analysis of multiwindow Gabor-typeschemes by frame methods. Appl. Comput. Harmon. Anal., 4(2):188–221,1997.

84

Page 103: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper C

Nonlinear Approximation with Nonstationary GaborFrames

Emil Solsbæk Ottosen and Morten Nielsen

The paper has been accepted for publication inAdvances in Computational Mathematics, 2017.

Page 104: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

c© 2017 SpringerThe layout has been revised.

Page 105: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

1. Introduction

Abstract

We consider sparseness properties of adaptive time-frequency representations ob-tained using nonstationary Gabor frames (NSGFs). NSGFs generalize classical Ga-bor frames by allowing for adaptivity in either time or frequency. It is known that theconcept of painless nonorthogonal expansions generalizes to the nonstationary case,providing perfect reconstruction and an FFT based implementation for compactlysupported window functions sampled at a certain density. It is also known that forsome signal classes, NSGFs with flexible time resolution tend to provide sparser ex-pansions than can be obtained with classical Gabor frames. In this article we show,for the continuous case, that sparseness of a nonstationary Gabor expansion is equiv-alent to smoothness in an associated decomposition space. In this way we characterizesignals with sparse expansions relative to NSGFs with flexible time resolution. Basedon this characterization we prove an upper bound on the approximation error occur-ring when thresholding the coefficients of the corresponding frame expansions. Wecomplement the theoretical results with numerical experiments, estimating the rateof approximation obtained from thresholding the coefficients of both stationary andnonstationary Gabor expansions.

1 Introduction

The field of Gabor theory [6, 19, 41] is concerned with representing signalsas atomic decompositions using time-frequency localized atoms. The atomsare constructed as time-frequency shifts of a fixed window function, accord-ing to some lattice parameters, such that the resulting system constitutes aframe and, therefore, guarantees stable expansions [5, 31, 35]. Such framesare known under the name of Weyl-Heisenberg frames or Gabor frames andhave been proven useful in a variety of applications [10, 22, 33]. The structureof Gabor frames implies a time-frequency resolution which depends only onthe lattice parameters and the window function. In particular, the resolutionis independent of the signal under consideration, which makes the corre-sponding implementation fast and easy to handle. The usage of a prede-termined time-frequency resolution naturally raises the question of whetheran improvement can be obtained by taking the signal class into considera-tion? This question has lead to many interesting approaches for construct-ing adaptive time-frequency representations [11, 27, 40, 42]. Unfortunately,for representations with resolution varying in both time and frequency thereseems to be a trade-off between perfect reconstruction and fast implemen-tation [30]. In this article, we therefore consider time-frequency represen-tations with resolution varying in either time or frequency. The idea is togeneralise the theory of painless nonorthogonal expansions [6] to the situ-ation where multiple window functions are used along either the time- or

87

Page 106: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper C.

the frequency axis. The resulting systems, which allow for perfect recon-struction and an FFT based implementation, are called painless generalisedshift-invariant systems [25, 36] or painless nonstationary Gabor frames (pain-less NSGFs) [1, 26]. As already noted in [1], painless NSGFs tend to producesparser representations than classical Gabor frames for certain classes of mu-sic signals. Sparseness of a time-frequency representation is desirable forseveral reasons, mainly because it may reduce the computational cost formanipulating and storing the coefficients [8, 18]. Additionally, many signalclasses are characterized by some kind of sparseness in time or frequencyand the corresponding signals are, therefore, best described by a sparse time-frequency representation. For such signals, the task of feature identificationalso benefits from a sparse representation as the particular characteristics ofthe signal becomes easier to identify.

In this article we consider sparseness properties of painless NSGFs withresolution varying in time. Whereas modulation spaces [14, 16, 22] haveturned out to be the proper function spaces for analyzing sparseness prop-erties of classical Gabor frames [23], we need a more general framework forthe nonstationary case. A painless NSGF with flexible time resolution cor-responds to a sampling grid which is irregular over time but regular overfrequency for each fixed time point. We therefore search for a smoothnessspace which is compatible with a (more or less) arbitrary partition of the timedomain. Such a flexibility can be provided by decomposition spaces, as in-troduced by Feichtinger and Gröbner in [15, 17]. Decomposition spaces maybe viewed as a generalization of the classical Wiener amalgam spaces [13, 24]but with no assumption of an upper bound on the measure of the membersof the partition. Another way of stating this is that decomposition spacesare constructed using bounded admissible partitions of unity [17] instead ofbounded uniform partitions of unity [13]. The partitions we consider are ob-tained by applying a set of invertible affine transformations Ak(·) + ckk∈N

on a fixed set Q ⊂ Rd [2].We use decomposition spaces to characterize signals with sparse expan-

sions relative to painless NSGFs with flexible time resolution. We measuresparseness of an expansion by a mixed norm on the coefficients and showthat the sparseness property implies an upper bound on the approximationerror obtained by thresholding the expansion. Using the terminology fromnonlinear approximation, such an upper bound is also known as a Jacksoninequality [4, 8]. A similar characterization for classical Gabor frames usingmodulation spaces was proven by Gröchenig and Samarah in [23]. For thenonstationary case, we provided a characterization in [32] for painless NSGFswith flexible frequency resolution using decomposition spaces. A differentapproach to this problem is considered by Voigtlaender in [39], where thepainless assumption is replaced with a more general analysis of the sam-pling parameter. The decomposition spaces considered in both [32] and [39]

88

Page 107: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

2. Decomposition spaces

are based on partitions of the frequency domain, which is not a natural choicefor NSGFs with flexible time resolution. In this article we consider decom-positions of the time domain, which allow for compactly supported windowfunctions sampled at a low density (compared to the general theory formu-lated in [39]). It is worth noting that there is a significant mathematical dif-ference between decomposition spaces in time and in frequency.

The structure of this this article is as follows. In Section 2 we formally in-troduce decomposition spaces in time and prove several important propertiesof these spaces. Then, based on the ideas in [32], we show in Section 3 howto construct a suitable decomposition space for a given painless NSGF withflexible time resolution. In Section 4 we prove that the suitable decomposi-tion space characterizes signals with sparse frame expansions and we providean upper bound on the approximation rate occurring when thresholding theframe coefficients. Finally, in Section 5 we present the numerical results andin Section 6 we give the conclusions.

Let us now briefly go through our notation. By f (ξ) :=∫

Rd f (x)e−2πix·ξdxwe denote the Fourier transform with the usual extension to L2(Rd). WithF G we mean that there exist two constants 0 < C1, C2 < ∞ such thatC1F ≤ G ≤ C2F. For two normed vector spaces X and Y, X → Y means thatX ⊂ Y and ‖ f ‖Y ≤ C ‖ f ‖X for some constant C and all f ∈ X. We say thata non-empty open set Ω′ ⊂ Rd is compactly contained in an open set Ω ⊂ Rd

if Ω′ ⊂ Ω and Ω′ is compact. We call xii∈I ⊂ Rd a δ−separated set ifinfj,k∈I ,j 6=k ‖xj − xk‖2 = δ > 0. Finally, by Id we denote the identity operatoron Rd and by χQ we denote the indicator function for a set Q ⊂ Rd.

2 Decomposition spaces

In this section we define decomposition spaces [17] based on structured cov-erings [2]. For an invertible matrix A ∈ GL(Rd), and a constant c ∈ Rd, wedefine the affine transformation Tx = Ax + c with x ∈ Rd. Given a familyT = Ak(·) + ckk∈N of invertible affine transformations on Rd, and a subsetQ ⊂ Rd, we let QTT∈T := T(Q)T∈T and

T :=

T′ ∈ T∣∣ QT′ ∩QT 6= ∅

, T ∈ T . (C.1)

We say that Q := QTT∈T is an admissible covering of Rd if⋃

T∈T QT = Rd

and there exists n0 ∈N such that |T| ≤ n0 for all T ∈ T .

Definition 2.1 (Q−moderate weight). Let Q := QTT∈T be an admissiblecovering. A function u : Rd → (0, ∞) is called Q−moderate if there existsC > 0 such that u(x) ≤ Cu(y) for all x, y ∈ QT and all T ∈ T . AQ−moderateweight (derived from u) is a sequence ωTT∈T := u(xT)T∈T with xT ∈ QTfor all T ∈ T .

89

Page 108: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper C.

For the rest of this article, we shall use the explicit choice u(x) := 1+ ‖x‖2for the function u in Definition 2.1. Let us now define structured coverings [2]of the time domain.

Definition 2.2 (Structured covering). Given a family T = Ak(·) + ckk∈N

of invertible affine transformations on Rd, suppose there exist two boundedopen sets P ⊂ Q ⊂ Rd, with P compactly contained in Q, such that

1. PTT∈T and QTT∈T are admissible coverings.

2. There exists a δ−separated set xTT∈T ⊂ Rd, with xT ∈ QT for all T ∈T , such that ωTT∈T := 1 + ‖xT‖2T∈T is a Q−moderate weight.

Then we call Q = QTT∈T a structured covering.

For a structured covering we have the associated concept of a boundedadmissible partition of unity (BAPU) [17].

Definition 2.3 (BAPU). Let Q = QTT∈T be a structured covering of Rd.A BAPU subordinate to Q is a family of non-negative functions ψTT∈T ⊂C∞

c (Rd) satisfying

1. supp(ψT) ⊂ QT , ∀T ∈ T .

2. ∑T∈T

ψT(x) = 1, ∀x ∈ Rd.

We note that the assumptions in Definition 2.3 imply that the membersof the BAPU are uniformly bounded, i.e., supT∈T ‖ψT‖L∞ ≤ 1. Given astructured covering Q = QTT∈T , we can always construct a subordinateBAPU. Choose a non-negative function Φ ∈ C∞

c (Rd), with Φ(x) = 1 for allx ∈ P and supp(Φ) ⊂ Q, and define

ψT(x) :=Φ(T−1x)

∑T′∈T Φ(T′−1x), x ∈ Rd,

for all T ∈ T . With this construction, it is clear that Definition 2.3(1) issatisfied. Furthermore, since PTT∈T is an admissible covering, then 1 ≤∑T′∈T Φ(T′−1x) ≤ n0 for all x ∈ Rd which shows that Definition 2.3(2) holds.

Remark 2.1. We note that the assumption in Definition 2.2(2) is not necessaryfor constructing a subordinate BAPU, however, the assumption is needed forproving Theorem 2.1.

90

Page 109: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

2. Decomposition spaces

Let Q = QTT∈T be a structured covering with Q−moderate weightωTT∈T = 1 + ‖xT‖2T∈T and BAPU ψTT∈T . For s ∈ R and 1 ≤ q ≤ ∞,we define the associated weighted sequence space

`qωs(T ) :=

aTT∈T ⊂ C

∣∣∣ ‖aTT∈T ‖`qωs

:= ‖ωsTaTT∈T ‖`q < ∞

.

Given aTT∈T ∈ `qωs(T ), we define a+T T∈T by a+T := ∑T′∈T aT′ . Since

ωTT∈T is Q−moderate, aTT∈T → a+T T∈T defines a bounded opera-tor on `

qωs(T ) according to [17, Remark 2.13 and Lemma 3.2]. Denoting its

operator norm by C+, we have∥∥∥a+T

T∈T

∥∥∥`

qωs≤ C+

∥∥aTT∈T∥∥`

qωs

, ∀ aTT∈T ∈ `qωs(T ). (C.2)

We now define decomposition spaces as first introduced in [17].

Definition 2.4 (Decomposition space). Let Q = QTT∈T be a structuredcovering with Q−moderate weight ωTT∈T = 1 + ‖xT‖2T∈T and BAPUψTT∈T . For s ∈ R and 1 ≤ p, q ≤ ∞, we define the decomposition spaceD(Q, Lp, `q

ωs) as the set of distributions f ∈ S ′(Rd) satisfying

‖ f ‖D(Q,Lp ,`qωs )

:=∥∥‖ψT f ‖LpT∈T

∥∥`

qωs

< ∞.

Remark 2.2. According to [17, Theorem 3.7], D(Q, Lp, `qωs) is independent of

the particular choice of BAPU and different choices yield equivalent norms.Actually the results in [17] show that D(Q, Lp, `q

ωs) is invariant under certaingeometric modifications of Q, but we will not go into detail here.

Remark 2.3. In contrast to the approach taken in [32] (where the decomposi-tion is performed on the frequency side), we do not allow p, q < 1 in Defini-tion 2.4 since a simple consideration shows that the resulting decompositionspaces would not be complete in this case.

We now consider some familiar examples of decomposition spaces. Bystandard arguments it is easy to verify that D(Q, L2, `2) = L2(Rd) with equiv-alent norms for any structured covering Q. The next example shows how toconstruct Wiener amalgam spaces.

Example 2.1. Let Q ⊂ Rd be an open cube with center 0 and side-lengthr > 1. Define T := Tkk∈Zd , with Tkx := x − k for all k ∈ Zd, and letωTkTk∈T = 1 + ‖k‖2Tk∈T . With Q := QTkTk∈T , then D(Q, Lp, `q

ωs)

corresponds to the Wiener amalgam space W(Lp, `qωs) for s ∈ R and 1 ≤

p, q ≤ ∞, see [13] for further details. 4

Let us now prove the following important properties of decompositionspaces.

91

Page 110: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper C.

Theorem 2.1. Let Q = QTT∈T be a structured covering with Q−moderateweight ωTT∈T = 1+ ‖xT‖2T∈T and subordinate BAPU ψTT∈T . For s ∈ R

and 1 ≤ p, q ≤ ∞,

1. S(Rd) → D(Q, Lp, `qωs) → S ′(Rd).

2. D(Q, Lp, `qωs) is a Banach space.

3. If p, q < ∞, then S(Rd) is dense in D(Q, Lp, `qωs).

4. If p, q < ∞, then the dual space of D(Q, Lp, `qωs) can be identified with

D(Q, Lp′ , `q′

ω−s) with 1/p + 1/p′ = 1 and 1/q + 1/q′ = 1.

The proof of Theorem 2.1 can be found in Appendix A. In the next sectionwe construct decomposition spaces, which are compatible with the structureof painless NSGFs with flexible time resolution.

3 Nonstationary Gabor frames

In this section, we construct NSGFs with flexible time resolution using thenotation of [1]. Given a set of window functions gnn∈Zd ⊂ L2(Rd), withcorresponding frequency sampling steps bn > 0, then for m, n ∈ Zd we defineatoms of the form

gm,n(x) := gn(x)e2πimbn ·x, x ∈ Rd.

The choice of Zd as index set for n is only a matter of notational convenience;any countable index set would do.

Example 3.1. With gn(x) := g(x− na) and bn := b for all n ∈ Zd we get

gm,n(x) := g(x− na)e2πimb·x, x ∈ Rd,

which just corresponds to a standard Gabor system. 4

If ∑m,n |〈 f , gm,n〉|2 ‖ f ‖22 for all f ∈ L2(Rd), we refer to gm,nm,n as an

NSGF. For an NSGF gm,nm,n, the frame operator

S f = ∑m,n∈Zd

〈 f , gm,n〉 gm,n, f ∈ L2(Rd),

is invertible and we have the expansions

f = ∑m,n∈Zd

〈 f , gm,n〉 gm,n, f ∈ L2(Rd),

with gm,nm,n := S−1gm,nm,n being the canonical dual frame of gm,nm,n

[5]. For notational convenience we define G(x) := ∑n∈Zd 1/bdn |gn(x)|2. With

this notation we have the following result [1, Theorem 1].

92

Page 111: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

3. Nonstationary Gabor frames

Theorem 3.1. Let gnn∈Zd ⊂ L2(Rd) with frequency sampling steps bnn∈Zd ,bn > 0 for all n ∈ Zd. Assuming supp(gn) ⊆ [0, 1

bn]d + an, with an ∈ Rd for all

n ∈ Zd, the frame operator for the system

gm,n(x) = gn(x)e2πimbn ·x, ∀m, n ∈ Zd, x ∈ Rd,

is given byS f (x) = G(x) f (x), f ∈ L2(Rd).

The system gm,nm,n∈Zd constitutes a frame for L2(Rd), with frame-bounds0 < A ≤ B < ∞, if and only if

A ≤ G(x) ≤ B, for a.e. x ∈ Rd, (C.3)

and the canonical dual frame is then given by

gm,n(x) =gn(x)G(x)

e2πimbn ·x, x ∈ Rd. (C.4)

Remark 3.1. We note that the canonical dual frame in (C.4) posses the samestructure as the original frame, which is a property not shared by generalNSGFs. We also note that the canonical tight frame can be obtained by takingthe square root of the denominator in (C.4).

Traditionally, an NSGF satisfying the assumptions of Theorem 3.1 is calleda painless NSGF, referring to the fact that the frame operator is a simple mul-tiplication operator. This terminology is adopted from the classical painlessnonorthogonal expansions [6], which corresponds to the painless case for clas-sical Gabor frames. By slight abuse of notation we use the term "painless"to denote the NSGFs satisfying Definition 3.1 below. In order to properlyformulate this definition, we first need some preliminary notation which weadopt from [32].

Let gnn∈Zd ⊂ L2(Rd) satisfy the assumptions in Theorem 3.1. GivenC∗ > 0 we denote by Inn∈Zd the open cubes

In :=(−εn,

1bn

+ εn

)d+ an, ∀n ∈ Zd, (C.5)

with εn := C∗/bn for all n ∈ Zd. We note that supp(gm,n) ⊂ In for allm, n ∈ Zd. For n ∈ Zd we define

n :=

n′ ∈ Zd ∣∣ In′ ∩ In 6= ∅

,

using the notation of (C.1).

93

Page 112: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper C.

Definition 3.1 (Painless NSGF). Let gnn∈Zd ⊂ L2(Rd) satisfy the assump-tions in Theorem 3.1, and assume further that,

1. There exists C∗ > 0 and n0 ∈ N, such that the open cubes Inn∈Zd ,given in (C.5), satisfy |n| ≤ n0 uniformly for all n ∈ Zd.

2. ann∈Zd is a δ−separated set and 1 + ‖an‖2n∈Zd constitutes aInn∈Zd−moderate weight.

3. The gn’s are continuous, real valued and satisfy

gn(x) ≤ Cbd/2n χIn(x), for all n ∈ Zd,

for some uniform constant C > 0.

Then we refer to gm,nm,n∈Zd as a painless NSGF.

The assumptions in Definition 3.1 are easily satisfied, but the support con-dition in Theorem 3.1 is rather restrictive and implies a certain redundancy ofthe system. Nevertheless, we must assume some structure on the dual frame,which is not provided by general NSGFs. We choose the framework of pain-less NSGFs and base our arguments on the fact that the dual frame possessthe same structure as the original frame. We expect it is possible to extendthe theory developed in this article to a more general settings by imposinggeneral existence results for NSGFs [12, 26, 39]. We now provide a simpleexample of a set of window functions satisfying Definition 3.1(3).

Example 3.2. Choose a continuous real valued function ϕ ∈ L2(Rd) \ 0with supp(ϕ) ⊆ [0, 1]d. For n ∈ Zd define

gn(x) := bd/2n ϕ(bn(x− an)), x ∈ Rd,

with an ∈ Rd and bn > 0. Then supp(gn) ⊆ [0, 1bn] + an and Definition 3.1(3)

is satisfied. 4

Following the approach taken in [32], we define Q := (0, 1)d together withthe set of affine transformations T := An(·) + cnn∈Zd with

An :=(

2εn +1bn

)· Id, and (cn)j := −εn + (an)j, 1 ≤ j ≤ d.

It is then easily shown that Q := QTT∈T = Inn∈Zd forms a structuredcovering of Rd [32, Lemma 4.1]. Given s ∈ R and 1 ≤ p, q ≤ ∞, we maytherefore construct the associated decomposition space D(Q, Lp, `q

ωs) withωTT∈T := 1 + ‖an‖2n∈Zd .

94

Page 113: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

4. Characterization of decomposition spaces

Example 3.3. Let gm,nm∈Zd ,n∈Zd be a painless NSGF according to Definition3.1. Assume additionally that K := infbnn∈Zd > 0 and that Definition 3.1(1)and Definition 3.1(2) hold for the larger cubes Kn := (−ε, 1/K + ε)d + an forsome ε > 0. Defining Q := (0, 1)d and T := An(·) + cnn∈Zd , with

An :=(

2ε +1K

)· Id, and (cn)j := −ε + (an)j, 1 ≤ j ≤ d,

we obtain the structured covering Q := Knn∈Zd . In this special case theassociated decomposition space is the Wiener amalgam space W(Lp, `q

ωs) fors ∈ R and 1 ≤ p, q ≤ ∞ (cf. Example 2.1). 4

For the rest of this article, we write gm,Tm∈Zd ,T∈T for a painless NSGFwith associated structured covering Q := QTT∈T . With this notation,then supp(gm,T) ⊂ QT for all m ∈ Zd and all T ∈ T . Similarly we writeωTT∈T = 1 + ‖aT‖2T∈T for the associated weight function.

4 Characterization of decomposition spaces

Using the notation of [2] we define the sequence space d(Q, `p, `qωs) as the set

of coefficients cm,Tm∈Zd ,T∈T ⊂ C satisfying∥∥∥cm,Tm∈Zd ,T∈T

∥∥∥d(Q,`p ,`q

ωs ):=∥∥∥∥∥cm,Tm∈Zd

∥∥`p

T∈T

∥∥∥`

qωs

< ∞,

for s ∈ R and 1 ≤ p, q ≤ ∞. We can now prove the following importantstability result.

Theorem 4.1. Let gm,Tm∈Zd ,T∈T be a painless NSGF with associated structuredcovering Q = QTT∈T and weight function ωTT∈T = 1 + ‖aT‖2T∈T . Fixs ∈ R, 1 ≤ p ≤ 2 and let p′ := p/(p− 1). For f ∈ D(Q, Lp, `q

ωs) and 1 ≤ q ≤ ∞,∥∥∥〈 f , gm,T〉m∈Zd ,T∈T

∥∥∥d(Q,`p′ ,`q

ωs )≤ C ‖ f ‖D(Q,Lp ,`q

ωs ), (C.6)

and for h ∈ D(Q, Lp′ , `qωs) and 1 ≤ q < ∞,

‖h‖D(Q,Lp′ ,`qωs )≤ C′

∥∥∥〈h, gm,T〉m∈Zd ,T∈T

∥∥∥d(Q,`p ,`q

ωs ). (C.7)

Proof. We first prove (C.6). Given f ∈ D(Q, Lp, `qωs), since ψT := ∑T′∈T ψT ≡

1 on QT , then

∥∥〈 f , gm,T〉m∈Zd

∥∥`p′ =

(∑

m∈Zd

∣∣∣⟨ψT f , gm,T

⟩∣∣∣p′)1/p′

= b−d/2T

(∑

m∈Zd

∣∣∣∣bd/2T

∫Rd

ψT(x) f (x)gT(x)e−2πimbT ·xdx∣∣∣∣p′)1/p′

,

95

Page 114: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper C.

with bT > 0 being the frequency sampling step. Since 1 ≤ p ≤ 2 we can usethe Hausdorff-Young inequality [28, Theorem 2.1 on page 98], which togetherwith Definition 3.1(3) imply∥∥〈 f , gm,T〉m∈Zd

∥∥`p′ ≤ b−d/2

T

∥∥∥ψT f gT

∥∥∥Lp≤ C1

∥∥∥ψT f∥∥∥

Lp.

Hence, using (C.2) we get∥∥∥〈 f , gm,T〉m∈Zd ,T∈T

∥∥∥d(Q,`p′ ,`q

ωs )≤ C1

∥∥∥∥∥∥∥ψT f∥∥∥

Lp

T∈T

∥∥∥∥`

qωs

≤ C2 ‖ f ‖D(Q,Lp ,`qωs )

.

Let us now prove (C.7). Given h ∈ D(Q, Lp′ , `qωs) we may write the norm as

‖h‖D(Q,Lp′ ,`qωs )

= supσ∈S(Rd),‖σ‖

D(Q,Lp ,`q′ω−s )

=1|〈h, σ〉| , q′ := q/(q− 1), (C.8)

since the dual space of D(Q, Lp′ , `qωs) can be identified with D(Q, Lp, `q′

ω−s)

and since S(Rd) is dense in D(Q, Lp, `q′

ω−s). Given σ ∈ S(Rd), with‖σ‖

D(Q,Lp ,`q′ω−s )

= 1, we write the frame expansion of σ with respect to

gm,Tm,T and apply Hölder’s inequality twice to obtain

|〈h, σ〉| ≤ ∑T∈T

∑m∈Zd

|〈σ, gm,T〉 〈h, gm,T〉|

≤ ∑T∈T

∥∥〈σ, gm,T〉m∈Zd

∥∥`p′∥∥〈h, gm,T〉m∈Zd

∥∥`p

≤∥∥∥〈σ, gm,T〉m∈Zd ,T∈T

∥∥∥d(Q,`p′ ,`q′

ω−s )

∥∥∥〈h, gm,T〉m∈Zd ,T∈T

∥∥∥d(Q,`p ,`q

ωs ).

(C.9)

According to (C.6) then

‖〈σ, gm,T〉m,T‖d(Q,`p′ ,`q′ω−s )≤ C1 ‖σ‖D(Q,Lp ,`q′

ω−s )= C1,

which combined with (C.9) and (C.3) yield

|〈h, σ〉| ≤ C1

∥∥∥〈h, gm,T〉m∈Zd ,T∈T

∥∥∥d(Q,`p ,`q

ωs )

≤ C2

∥∥∥〈h, gm,T〉m∈Zd ,T∈T

∥∥∥d(Q,`p ,`q

ωs ), (C.10)

96

Page 115: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

4. Characterization of decomposition spaces

with C2 := C1/A. Finally, combining (C.8) and (C.10) we arrive at

‖h‖D(Q,Lp′ ,`qωs )≤ C2

∥∥∥〈h, gm,T〉m∈Zd ,T∈T

∥∥∥d(Q,`p ,`q

ωs ),

which proves (C.7).

We note that for s ∈ R, 1 ≤ q < ∞ and p = 2, Theorem 4.1 yields theequivalence

‖ f ‖D(Q,L2,`qωs )∥∥∥〈 f , gm,T〉m∈Zd ,T∈T

∥∥∥d(Q,`2,`q

ωs ), f ∈ D(Q, L2, `q

ωs).

It follows that the coefficient operator C : f → 〈 f , gm,T〉m,T is bounded fromD(Q, L2, `q

ωs) into d(Q, `2, `qωs). We define the corresponding reconstruction

operator as

R(cm,Tm∈Zd ,T∈T

)= ∑

T∈T∑

m∈Zd

cm,T gm,T , ∀cm,Tm,T ∈ d(Q, `2, `qωs).

With this notation we have the following result.

Proposition 4.1. Let gm,Tm∈Zd ,T∈T be a painless NSGF with associated struc-tured covering Q = QTT∈T and weight function ωTT∈T = 1+ ‖aT‖2T∈T .Given s ∈ R and 1 ≤ q < ∞, the reconstruction operator R is bounded fromd(Q, `2, `q

ωs) onto D(Q, L2, `qωs) and we have the expansions

f = RC( f ) = ∑m∈Zd ,T∈T

〈 f , gm,T〉 gm,T , f ∈ D(Q, L2, `qωs), (C.11)

with unconditional convergence.

Proof. We first prove that R is bounded. Given cm,Tm,T ∈ d(Q, `2, `qωs), (C.2)

and (C.3) yield

‖R(cm,Tm,T)‖D(Q,L2,`qωs )

=

∥∥∥∥∥∥∥∥∥∥∥ψT

(∑

T′∈T∑

m∈Zd

cm,T′ gm,T′

)∥∥∥∥∥L2

T∈T

∥∥∥∥∥∥`

qωs

≤ C1

∥∥∥∥∥∥∥∥∥∥∥ ∑

m∈Zd

cm,T gm,T

∥∥∥∥∥L2

T∈T

∥∥∥∥∥∥`

qωs

. (C.12)

Applying Definition 3.1(3) and the Hausdorff-Young inequality [28, Theorem2.2 on page 99] we get∥∥∥∥∥ ∑

m∈Zd

cm,T gm,T

∥∥∥∥∥2

L2

≤ C∫

Rd

∣∣∣∣∣bd/2T ∑

m∈Zd

cm,Te2πimbT ·x∣∣∣∣∣2

dx ≤ C∥∥cm,Tm∈Zd

∥∥2`2 .

(C.13)

97

Page 116: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper C.

Combining (C.12) and (C.13) we arrive at

‖R(cm,Tm,T)‖D(Q,L2,`qωs )≤ C2

∥∥∥∥∥cm,Tm∈Zd

∥∥`2

T∈T

∥∥∥`

qωs

= C2 ‖cm,Tm,T‖d(Q,`2,`qωs )

, (C.14)

which shows the boundedness of R. Let us now prove the unconditionalconvergence of (C.11). Given f ∈ D(Q, L2, `q

ωs) we can find a sequence fkk∈N ⊂ S(Rd) such that fk → f in D(Q, L2, `q

ωs). For each k we havethe expansion fk = RC( fk) and by continuity of RC we get f = RC( f ). Givenε > 0, (C.14) implies that we can find a finite subset F0 ⊆ Zd × T , such thatfor all finite sets F ⊇ F0,∥∥∥∥∥∥ f − ∑

(m,T)∈F〈 f , gm,T〉 gm,T

∥∥∥∥∥∥D(Q,L2,`q

ωs )

≤ C2

∥∥∥〈 f , gm,T〉(m,T)/∈F

∥∥∥d(Q,`2,`q

ωs )< ε.

According to [22, Proposition 5.3.1 on page 98], this property is equivalent tounconditional convergence.

Based on Proposition 4.1, we can show some important properties ofgm,Tm∈Zd ,T∈T in connection with nonlinear approximation theory [7, 8].Assume f ∈ D(Q, L2, `2

ωs), for s ∈ R, and write the frame expansion

f = ∑m∈Zd ,T∈T

〈 f , gm,T〉 gm,T . (C.15)

Let θkk∈N be a rearrangement of the frame coefficients 〈 f , gm,T〉m,T suchthat |θk|k∈N constitutes a non-increasing sequence. Also, let fN be theN-term approximation to f obtained by extracting the terms in (C.15) corre-sponding to the N largest coefficients θkN

k=1. Since R is bounded, [20, The-orem 6] implies that for each 1 ≤ τ < 2,

‖ f − fN‖D(Q,L2,`2ωs )≤ C1

∥∥θkk>N∥∥

d(Q,`2,`2ωs )≤ C2N−α

∥∥θkk∈N

∥∥d(Q,`τ ,`τ

ωs )

= C2N−α∥∥〈 f , gm,T〉k∈N

∥∥d(Q,`τ ,`τ

ωs ), α := 1/τ − 1/2.

(C.16)

We conclude that for f ∈ D(Q, L2, `2ωs), with frame coefficients in

d(Q, `τ , `τωs), we obtain good approximations in D(Q, L2, `2

ωs) by threshold-ing the frame coefficients in (C.15). The rate of the approximation is given byα ∈ (0, 1/2].

98

Page 117: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

5. Numerical experiments

5 Numerical experiments

In this section we provide the numerical experiments, thresholding coef-ficients of both stationary and nonstationary Gabor expansions. We notethat analysis with a stationary Gabor frame corresponds to analysis withthe short-time Fourier transform (STFT) as the Gabor coefficients can be re-written as

〈 f , gm,n〉 =∫

Rdf (t)g(t− na)e−2πimb·tdt = Vg f (na, mb), f ∈ L2(Rd),

with Vg f (na, mb) denoting the STFT of f , with respect to g, at time na andfrequency mb.

For the implementation we use MATLAB 2017B and in particular we usethe following two toolboxes: The LTFAT [34] (version 2.2.0 or above) avail-able from http://ltfat.github.io/ and the NSGToolbox [1] (version 0.1.0or above) available from http://nsg.sourceforge.net/. The sound files weconsider are part of the EBU-SQAM database [38], which consists of 70 testsounds sampled at 44.1 kHz. The test sounds form a large variety of speechand music including single instruments, classical orchestra, and pop music.Since music signals are continuous signals of finite energy, it make sense toconsider them in the framework of decomposition spaces. Moreover, the de-composition space norm constitutes a natural measure for such nonstationarysignals, capable of detecting local signal changes as opposed to the standardLp−norm.

We divide the numerical analysis into two sections. In Section 5.1 wecompare the performance of an adaptive nonstationary Gabor expansion tothat of a classical Gabor expansion by analyzing spectrograms, reconstruc-tion errors, and approximation rates associated to a particular music signal(signal 39 of the EBU-SQAM database). Then, in Section 5.2 we extend theexperiment to cover the entire EBU-SQAM database and compare the averagereconstruction errors and approximation rates, taken over the 70 test signals,for the two methods. To analyse the performance of an expansion we use therelative root mean square (RMS) reconstruction error

RMS( f , frec) :=‖ f − frec‖2‖ f ‖2

.

As a general rule of thumb, an RMS error below 1% is hardly noticeable tothe average listener. We measure the redundancy of a transform by

number of coefficientslength of signal

.

The redundancy of the adaptive NSGF is approximately 5/3 and we havechosen parameters for the stationary Gabor frame, which mathes this redun-dancy.

99

Page 118: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper C.

5.1 Single experiment

In this experiment we consider sample 22000-284143 of signal 39 in the EBU-SQAM database. This signal is a piece of piano music consisting of an in-creasing melody of 10 individual tones (taken from an F major chord) startingat F2 (87 Hz fundamental frequency) and ending at F5 (698 Hz fundamentalfrequency). We construct the Gabor expansion using 1536 frequency chan-nels and a hop size of 1024. The window function is chosen as a Hanningwindow of length 1536 such that the resulting system constitutes a painlessGabor frame. The Gabor transform has a redundancy of ≈ 1.51 and the to-tal number of Gabor coefficients is 198402 (of which 195326 are non-zero).We only work with the coefficients of the positive frequencies since the sig-nal is real valued. Performing hard thresholding, and keeping only the 15800largest coefficients, we obtain a reconstructed signal with an RMS reconstruc-tion error just below 1%.

For the adaptive NSGF, we choose to follow the adaptation procedurefrom [1], resulting in the construction of so-called scale frames. The idea isto calculate the onsets of the music piece, using a separate algorithm [9],and then to use short window functions around the onsets and long windowfunctions between the onsets. The space between two onsets is spanned insuch a way that the window length first increases (as we move away fromthe first onset) and then decreases (as we approach the second onset). Toobtain a smooth resolution, the construction is such that adjacent windowsare either of the same length or one is twice as long as the other. We re-fer the reader to [1] for further details. For the actual implementation, weuse 8 different Hanning windows with lengths varying from 192 (around theonsets) to 192 · 27 = 24576. For the particular signal, the nonstationary Ga-bor transform has a redundancy of ≈ 1.66, which is comparable to that ofthe Gabor transform. The total number of coefficients is 217993 (of which216067 are non-zero). Again, we only consider the coefficients of the positivefrequencies. Keeping the 13100 largest coefficients we obtain an expansionwith an RMS reconstruction error just below 1%. This is considerably fewercoefficients than needed for the stationary Gabor expansion, which shows anatural sparseness of scale frames for this particular signal class. This prop-erty was already noted by the authors in [1]. Spectrograms based on theoriginal expansions and the thresholded expansions can be found in Fig. C.1.

The 10 "vertical stripes" in the spectrograms correspond to the onsets ofthe 10 tones in the melody and the "horizontal stripes" correspond to thefrequencies of the harmonics. We note that the adaptive behaviour of theNSGF is clearly visible in the spectrograms, resulting in a good time resolu-tion around the onsets and a good frequency resolution between the onsets.In contrast to this behaviour, the stationary Gabor frame uses a uniform res-

100

Page 119: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

5. Numerical experiments

Fig. C.1: Spectrograms based on the original and thresholded Gabor- and nonstationary Gabor(NSG) expansions with RMS errors just below 1%.

olution over the whole time-frequency plane.Based on the results from Section 4 (in particular (C.16)), we expect the

RMS error E(N) to decrease as N−α, for some α > 0, with N being thenumber of non-zero coefficients. Calculating E(N) for different values of Nand performing power regression, we obtain the plots shown in Fig. C.2.

The results in Fig. C.2 show that both the RMS error E(N) and the ap-proximation rate α are lower for the nonstationary Gabor expansion than forthe stationary Gabor expansion. Clearly, a small RMS error is more importantthan a fast approximation rate. Also, the fast approximation rate for the sta-tionary Gabor frame is caused mainly by the high RMS error associated withsmall values of N. We note that both approximation rates are considerablyfaster than the rate given in (C.16) (which belongs to (0, 1/2]). This illustratesthat (C.16) only provides us with an upper bound on the approximation error— the actual error might be much smaller. It also illustrates that both meth-ods work extremely well for this kind of sparse signal. In the next section we

101

Page 120: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper C.

0 1 2 3 4N x104

0

0.05

0.1

0.15

E(N)

Stationary Gabor

0 1 2 3 4N x104

0

0.02

0.04

0.06

0.08

0.1

E(N)

Nonstationary Gabor

α≈2.45 α≈2.15

Fig. C.2: RMS error E(N) as a function of N, the number of non-zero coefficients, for bothstationary and nonstationary Gabor expansions. Also, an estimated power function is plottedfor each expansion together with the associated value of the exponent α.

extend the analyzis presented here to cover the entire EBU-SQAM database.

5.2 Large scale experiment

For this experiment we consider the first 524288 samples of each of the 70test sounds avaliable in the EBU-SQAM database. For each test sound weconstruct a nonstationary Gabor expansion, with parameters as described inSection 5.1, and three stationary Gabor expansions with different parame-ter settings. Using the notation (hopsize,number of frequency channels), weuse the parameter settings (1024, 2048), (1536, 2048), and (1024, 1536) for thethree Gabor expansions. The window function associated to a Gabor expan-sion is chosen as a Hanning window with length equal to the correspondingnumber of frequency channels (resulting in a painless Gabor frame). For eachof the four expansions we calculate for each test sound

1. The redundancy of the (non-thresholded) expansion.

2. Thresholded expansions with respect to N, the number of non-zerocoefficient, where N takes on the values

N ∈ 10000, 11000, · · · , 29000, 30000, 35000, · · · , 195000, 200000 .

3. The sum of RMS errors ∑N E(N) taken over all the 55 possible valuesof N.

4. The value α of the estimated power function.

Repeating the experiment for all 70 test sounds we get the averaged valuesshown in Table C.1.

102

Page 121: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

6. Conclusion

Table C.1: Average redundancies, sum of RMS errors, and approximation rates taken over the 70test signals in the EBU-SQAM database. The experiment includes three stationary Gabor frames,with different parameters settings, and one NSGF.

Transform: G(1024, 2048) G(1536, 2048) G(1024, 1536) NSGFAverage redun.: 2.0020 1.3451 1.5049 1.6206Average error: 2.1448 1.9128 1.9492 1.7367Average α: 1.3088 1.4455 1.4278 1.2606

The results in Table C.1 show the same behaviour as the experiment inSection 5.1 — The NSGF provides the smallest RMS error and the slowestapproximation rate. We note that the approximation rates all belong to theinterval [1.25; 1.45], which is much lower than the rates obtained in Section5.1. This is due to the fact that the piano signal in Section 5.1 has a verysparse expansion, which is not true for all 70 test signals in the database.At first glance, the Gabor frame which seems to provide the best results isthe one with parameter settings (1536, 2048) — it produces the smallest RMSerror and the largest approximation rate. However, this is mainly due to thelow redundancy of the frame, which is only around 1.35. A low redundancyimplies fewer Gabor coefficients (with more time-frequency information con-tained in each coefficient), which implies good results in terms of RMS errorand approximation rate. However, a low redundancy also implies a worsenedtime-frequency resolution, which is not desirable for practical purposes. Fi-nally, it is worth noting that the NSGF produces a significantly lower RMSerror than the Gabor frame with parameters (1536, 2048) even with a higherredundancy.

6 Conclusion

We have provided a self-contained description of decomposition spaces onthe time side and proven several important properties of such spaces. Givena painless NSGF with flexible time resolution, we have shown how to con-struct an associated decomposition space, which characterizes signals withsparse expansions relative to the NSGF. Based on this characterization wehave proven an upper bound on the approximation error occurring whenthresholding the coefficients of the frame expansions. The theoretical re-sults have been complemented with numerical experiments, illustrating thatthe approximation error is indeed smaller than the theoretical upper bound.Using terminology from nonlinear approximation theory, we have proven aJackson inequality for nonlinear approximation with certain NSGFs. It couldbe interesting to consider the inverse estimate, a so-called Bernstein inequal-ity, providing us with a lower bound on the approximation error. The numer-

103

Page 122: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper C.

ical experiments indeed suggest that the approximation error acts as a powerfunction of the number of non-zero coefficients. Unfortunately, obtaining aBernstein inequality for such a redundant dictionary is in general beyond thereach of current methods [21].

A Proof of Theorem 2.1

Proof. We will use the well known fact that∫Rd

(1 + ‖x‖2)−mdx < ∞, m > d. (C.17)

We prove each of the four statements separately and we write Dsp,q :=

D(Q, Lp, `qωs) to simplify notation.

1. Repeating the arguments from [3, Proposition 5.7], using Definition2.2(2), we can show that

Ds+εp,∞ → Ds

p,q → Dsp,∞, ε > d/q, (C.18)

for any s ∈ R and 1 ≤ p, q ≤ ∞. Hence, to prove Theorem 2.1(1) itsuffices to show that S(Rd) → Ds

p,∞ → S ′(Rd) for any s ∈ R and1 ≤ p ≤ ∞. We first show that S(Rd) → Ds

p,∞. Since ωTT∈T =1 + ‖xT‖2T∈T is Q−moderate, and ψT is uniformly bounded, thisresult follows from (C.17) since

ωsT ‖ψT f ‖Lp ≤ C1 ‖(1 + ‖·‖2)

sψT f ‖Lp ≤ C1 ‖(1 + ‖·‖2)s f ‖Lp

≤ C2∥∥(1 + ‖·‖2)

s+r f∥∥

L∞

≤ C2 max|β|≤N

supx∈Rd

∣∣∣(1 + ‖x‖2)N∂

βx f (x)

∣∣∣ , f ∈ S(Rd),

for r > d/p and N ≥ s + r. To show that Dsp,∞ → S ′(Rd), we define

ψT := ∑T′∈T ψT′ . Given f ∈ Dsp,∞ and ϕ ∈ S(Rd), Hölder’s inequality

yields

|〈 f , ϕ〉| =∣∣∣∣∣ ∑T∈T

⟨ψT f , ψT ϕ

⟩∣∣∣∣∣ ≤ ∑T∈T

∥∥∥ψT f ψT ϕ∥∥∥

L1

≤ ∑T∈T‖ψT f ‖Lp

∥∥∥ψT ϕ∥∥∥

Lp′≤ ‖ f ‖Ds

p,∞ ∑T∈T

ω−sT

∥∥∥ψT ϕ∥∥∥

Lp′, (C.19)

104

Page 123: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

A. Proof of Theorem 2.1

with 1/p + 1/p′ = 1. Applying (C.2) we get

∑T∈T

ω−sT

∥∥∥ψT ϕ∥∥∥

Lp′≤

∥∥∥∥∥∥

∑T′∈T

‖ψT′ϕ‖Lp′

T∈T

∥∥∥∥∥∥`1

ω−s

=

∥∥∥∥(‖ψT ϕ‖Lp′

)+T∈T

∥∥∥∥`1

ω−s

≤ C+

∥∥∥‖ψT ϕ‖Lp′

T∈T

∥∥∥`1

ω−s

= C+ ‖ϕ‖D−sp′ ,1

. (C.20)

Now, (C.18) implies ‖ϕ‖D−sp′ ,1≤ C ‖ϕ‖Dε−s

p′ ,∞for ε > d. Hence, since we

have already shown that S(Rd) → Dsp,∞, we conclude from (C.19) and

(C.20) that Dsp,∞ → S ′(Rd). This proves Theorem 2.1(1).

2. Theorem 2.1(2) follows from Theorem 2.1(1) and the arguments in [3,Page 150].

3. To prove Theorem 2.1(3) we let f ∈ Dsp,q and choose a function I ∈

C∞c (Rd) satisfying 0 ≤ I(x) ≤ 1 and I(x) ≡ 1 on some neighbourhood

of x = 0. Since supp(I) is compact we can choose a finite subset T∗ ⊂ Tsuch that supp(I) ⊂ ∪T∈T∗QT and ∑T∈T∗ ψT(x) ≡ 1 on supp(I). Hence,with f := I f we get∥∥∥ f

∥∥∥Lp

=

∥∥∥∥∥ ∑T∈T∗

ψT I f

∥∥∥∥∥Lp

≤ ∑T∈T∗

‖ψT f ‖Lp < ∞, (C.21)

since f ∈ Dsp,q. Let ϕ ∈ C∞

c (Rd) with 0 ≤ ϕ(x) ≤ 1 and∫

Rd ϕ(x)dx = 1.

Also, for ε > 0 define ϕε(x) := ε−d ϕ(x/ε) and let fε := ϕε ∗ f ∈ S(Rd).It follows from (C.21) and a standard result on Lp-spaces [29, Theorem2.16 on page 64] that∥∥∥ f − fε

∥∥∥Ds

p,q≤∥∥∥∥∥∥∥ f − fε

∥∥∥Lp

T∈T

∥∥∥∥`

qωs

→ 0

as ε → 0. Hence, the proof is done, if we can show that ‖ f − f ‖Dsp,q

can be made arbitrary small by choosing f appropriately. To showthis, we define T := T ∈ T

∣∣ I(x) ≡ 1 on supp(ψT). Denoting itscomplement by Tc

we get∥∥∥ f − f∥∥∥

Dsp,q≤ 2

∥∥∥‖ψT f ‖LpT∈Tc

∥∥∥`

qωs

.

Finally, since f ∈ Dsp,q, we can choose supp(I) large enough, such that

‖ f − f ‖Dsp,q < ε for any given ε > 0. This proves Theorem 2.1(3).

105

Page 124: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper C.

4. To prove Theorem 2.1(4) we first note that (Dsp,q)′ ⊂ S ′(Rd) since

S(Rd) ⊂ Dsp,q. Furthermore, by Remark 2.2 we may assume the same

BAPU ψTT∈T is used for both Dsp,q and D−s

p′ ,q′ . Let us first show that

D−sp′ ,q′ ⊆ (Ds

p,q)′. Given σ ∈ D−s

p′ ,q′ and f ∈ Dsp,q, applying (C.2) and

Hölder’s inequality twice yield

|〈 f , σ〉| =∣∣∣∣∣ ∑T∈T

⟨ψT f , ψTσ

⟩∣∣∣∣∣ ≤ ∑T∈T

∥∥∥ψT f∥∥∥

Lp‖ψTσ‖Lp′

≤ ∑T∈T

(ωs

T ∑T′∈T

‖ψT′ f ‖Lp

)(ω−s

T ‖ψTσ‖Lp′

)≤∥∥∥∥∥∥∥(ψT f )+

∥∥∥Lp

T∈T

∥∥∥∥`

qωs

∥∥∥‖ψTσ‖Lp′

T∈T

∥∥∥`

q′ω−s

≤ C+ ‖ f ‖Dsp,q‖σ‖D−s

p′ ,q′.

To prove that (Dsp,q)′ ⊆ D−s

p′ ,q′ we define the space `q (Lp) as those

fTT∈T ⊂ S ′(Rd) satisfying

‖ fTT∈T ‖`q(Lp) :=∥∥‖ fT‖LpT∈T

∥∥`q < ∞.

With this notation we get

‖ f ‖Dsp,q

=∥∥ωs

T ‖ψT f ‖LpT∈T∥∥`q = ‖ωs

TψT f T∈T ‖`q(Lp) ,

for all f ∈ Dsp,q. Since f → ωs

TψT f T∈T defines an injective mappingfrom Ds

p,q onto a subspace of `q (Lp), every σ ∈ (Dsp,q)′ can be inter-

preted as a functional on that subspace. By the Hahn-Banach theorem,σ can be extended to a continuous linear functional on `q (Lp) wherethe norm of σ is preserved. It thus follows from [37, Proposition 2.11.1on page 177] that for f ∈ Ds

p,q we may write

σ( f ) =∫

Rd ∑T∈T

σT(x)ωsTψT(x) f (x)dx, where (C.22)

σT(x)T∈T ∈ `q′(

Lp′)

, and ‖σ‖∗ = ‖σTT∈T ‖`q′(Lp′) , (C.23)

with ‖σ‖∗ := sup‖hT‖`q(Lp)=1 |σ(hT)| denoting the standard norm

on (`q (Lp))′. From (C.22) we conclude that the proof is done if we can

106

Page 125: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

References

show that ∑T∈T σT(x)ωsTψT(x) ∈ D−s

p′ ,q′ . This follows from (C.2) since

∥∥∥∥∥ ∑T∈T

σTωsTψT

∥∥∥∥∥D−s

p′ ,q′

=

∥∥∥∥∥∥∥∥∥∥∥ψT

(∑

T′∈T

σT′ωsT′ψT′

)∥∥∥∥∥Lp′

T∈T

∥∥∥∥∥∥`

q′ω−s

≤ C∥∥∥‖σTωs

TψT‖Lp′

T∈T

∥∥∥`

q′ω−s

≤ C∥∥∥‖σT‖Lp′

T∈T

∥∥∥`q′

= C∥∥σTT∈T

∥∥`q′ (Lp′ ) = C ‖σ‖∗ ,

where we use (C.23) in the last equation. This proves Theorem 2.1(4).

References

[1] P. Balazs, M. Dörfler, F. Jaillet, N. Holighaus, and G. Velasco. Theory, im-plementation and applications of nonstationary Gabor frames. J. Comput.Appl. Math., 236(6):1481–1496, 2011.

[2] L. Borup and M. Nielsen. Frame decomposition of decompositionspaces. J. Fourier Anal. Appl., 13(1):39–70, 2007.

[3] L. Borup and M. Nielsen. On anisotropic Triebel-Lizorkin type spaces,with applications to the study of pseudo-differential operators. J. Funct.Spaces Appl., 6(2):107–154, 2008.

[4] P. L. Butzer and K. Scherer. Jackson and Bernstein-type inequalities forfamilies of commutative operators in Banach spaces. J. ApproximationTheory, 5:308–342, 1972.

[5] O. Christensen. An introduction to frames and Riesz bases. Applied and Nu-merical Harmonic Analysis. Birkhäuser/Springer, [Cham], second edi-tion, 2016.

[6] I. Daubechies, A. Grossmann, and Y. Meyer. Painless nonorthogonalexpansions. J. Math. Phys., 27(5):1271–1283, 1986.

[7] R. A. DeVore. Nonlinear approximation. In Acta numerica, 1998, volume 7of Acta Numer., pages 51–150. Cambridge Univ. Press, Cambridge, 1998.

[8] R. A. DeVore and G. G. Lorentz. Constructive approximation, volume 303of Grundlehren der Mathematischen Wissenschaften [Fundamental Principlesof Mathematical Sciences]. Springer-Verlag, Berlin, 1993.

107

Page 126: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

References

[9] S. Dixon. Onset detection revisited. In Proc. of the Int. Conf. on DigitalAudio Effects (DAFx-06), pages 133–137, Montreal, Quebec, Canada, sep2006.

[10] M. Dörfler. Time-frequency analysis for music signals: A mathematicalapproach. Journal of New Music Research, 30(1):3–12, 2001.

[11] M. Dörfler. Quilted Gabor frames—a new concept for adaptive time-frequency representation. Adv. in Appl. Math., 47(4):668–687, 2011.

[12] M. Dörfler and E. Matusiak. Nonstationary Gabor frames—existenceand construction. Int. J. Wavelets Multiresolut. Inf. Process., 12(3):1450032,18, 2014.

[13] H. G. Feichtinger. Banach convolution algebras of Wiener type. In Func-tions, series, operators, Vol. I, II (Budapest, 1980), volume 35 of Colloq. Math.Soc. János Bolyai, pages 509–524. North-Holland, Amsterdam, 1983.

[14] H. G. Feichtinger. Modulation spaces on locally compact abelian groups.Technical report, University of Vienna, 1983.

[15] H. G. Feichtinger. Banach spaces of distributions defined by decompo-sition methods. II. Math. Nachr., 132:207–237, 1987.

[16] H. G. Feichtinger. Modulation spaces: looking back and ahead. Sampl.Theory Signal Image Process., 5(2):109–140, 2006.

[17] H. G. Feichtinger and P. Gröbner. Banach spaces of distributions definedby decomposition methods. I. Math. Nachr., 123:97–120, 1985.

[18] S. Foucart and H. Rauhut. A mathematical introduction to compressive sens-ing. Applied and Numerical Harmonic Analysis. Birkhäuser/Springer,New York, 2013.

[19] D. Gabor. Theory of communication. J. IEE, 93(26):429–457, Nov. 1946.

[20] R. Gribonval and M. Nielsen. Nonlinear approximation with dictionar-ies. I. Direct estimates. J. Fourier Anal. Appl., 10(1):51–71, 2004.

[21] R. Gribonval and M. Nielsen. Nonlinear approximation with dictionar-ies. II. Inverse estimates. Constr. Approx., 24(2):157–173, 2006.

[22] K. Gröchenig. Foundations of time-frequency analysis. Applied and Nu-merical Harmonic Analysis. Birkhäuser Boston, Inc., Boston, MA, 2001.

[23] K. Gröchenig and S. Samarah. Nonlinear approximation with localFourier bases. Constr. Approx., 16(3):317–331, 2000.

108

Page 127: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

References

[24] C. Heil. An introduction to weighted Wiener amalgams. Wavelets andtheir Applications (Chennai, January 2002), pages 183–216, 2003.

[25] E. Hernández, D. Labate, and G. Weiss. A unified characterization ofreproducing systems generated by a finite family. II. J. Geom. Anal.,12(4):615–662, 2002.

[26] N. Holighaus. Structure of nonstationary Gabor frames and their dualsystems. Appl. Comput. Harmon. Anal., 37(3):442–463, 2014.

[27] F. Jaillet and B. Torrésani. Time-frequency jigsaw puzzle: adaptive multi-window and multilayered Gabor expansions. Int. J. Wavelets Multiresolut.Inf. Process., 5(2):293–315, 2007.

[28] Y. Katznelson. An introduction to harmonic analysis. Dover Publications,Inc., New York, corrected edition, 1976.

[29] E. H. Lieb and M. Loss. Analysis, volume 14 of Graduate Studies in Math-ematics. American Mathematical Society, Providence, RI, second edition,2001.

[30] M. Liuni, A. Röbel, E. Matusiak, M. Romito, and X. Rodet. Automaticadaptation of the time-frequency resolution for sound analysis and re-synthesis. IEEE Transactions on Audio, Speech, and Language Processing,21(5):959–970, May 2013.

[31] S. Mallat. A wavelet tour of signal processing. Elsevier/Academic Press,Amsterdam, third edition, 2009. The sparse way, With contributionsfrom Gabriel Peyré.

[32] E. S. Ottosen and M. Nielsen. A characterization of sparse nonstationaryGabor expansions. Journal of Fourier Analysis and Applications, May 2017.

[33] G. E. Pfander, H. Rauhut, and J. A. Tropp. The restricted isometry prop-erty for time-frequency structured random matrices. Probab. Theory Re-lated Fields, 156(3-4):707–737, 2013.

[34] Z. Pruša, P. L. Søndergaard, N. Holighaus, C. Wiesmeyr, and P. Bal-azs. The Large Time-Frequency Analysis Toolbox 2.0. In M. Aramaki,O. Derrien, R. Kronland-Martinet, and S. Ystad, editors, Sound, Music,and Motion, Lecture Notes in Computer Science, pages 419–442. SpringerInternational Publishing, 2014.

[35] A. Ron and Z. Shen. Weyl-Heisenberg frames and Riesz bases in L2(Rd).Duke Math. J., 89(2):237–282, 1997.

[36] A. Ron and Z. Shen. Generalized shift-invariant systems. Constr. Approx.,22(1):1–45, 2005.

109

Page 128: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

References

[37] H. Triebel. Theory of function spaces, volume 38 of Mathematik und ihreAnwendungen in Physik und Technik [Mathematics and its applications inphysics and technology]. Akademische Verlagsgesellschaft Geest & PortigK.-G., Leipzig, 1983.

[38] E. B. Union. Sound quality assessment material: Recordings for sub-jective tests ; User’s handbook for the EBU - SQAM compact disc, Sep2008.

[39] F. Voigtlaender. Structured, compactly supported Banach frame decom-positions of decomposition spaces. ArXiv preprint, arXiv:1612.08772,2016.

[40] P. J. Wolfe, S. J. Godsill, and M. Dörfler. Multi-Gabor dictionaries foraudio time-frequency analysis. In Proceedings of the 2001 IEEE Work-shop on the Applications of Signal Processing to Audio and Acoustics (Cat.No.01TH8575), pages 43–46, 2001.

[41] R. M. Young. An introduction to nonharmonic Fourier series, volume 93of Pure and Applied Mathematics. Academic Press, Inc. [Harcourt BraceJovanovich, Publishers], New York-London, 1980.

[42] M. Zibulski and Y. Y. Zeevi. Analysis of multiwindow Gabor-typeschemes by frame methods. Appl. Comput. Harmon. Anal., 4(2):188–221,1997.

110

Page 129: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper D

A Phase Vocoder Based on Nonstationary GaborFrames

Emil Solsbæk Ottosen and Monika Dörfler

The paper has been published inIEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 25(11),

pp. 2199–2208, 2017.

Page 130: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

c© 2017 IEEEThe layout has been revised.

Page 131: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

1. Introduction

Abstract

We propose a new algorithm for time stretching music signals based on the theory ofnonstationary Gabor frames (NSGFs). The algorithm extends the techniques of theclassical phase vocoder (PV) by incorporating adaptive time-frequency (TF) represen-tations and adaptive phase locking. The adaptive TF representations imply good timeresolution for the onsets of attack transients and good frequency resolution for thesinusoidal components. We estimate the phase values only at peak channels and theremaining phases are then locked to the values of the peaks in an adaptive manner.During attack transients we keep the stretch factor equal to one and we propose anew strategy for determining which channels are relevant for reinitializing the cor-responding phase values. In contrast to previously published algorithms we use anon-uniform NSGF to obtain a low redundancy of the corresponding TF representa-tion. We show that with just three times as many TF coefficients as signal samples,artifacts such as phasiness and transient smearing can be greatly reduced comparedto the classical PV. The proposed algorithm is tested on both synthetic and real worldsignals and compared with state of the art algorithms in a reproducible manner.

1 Introduction

The task of time stretching or pitch shifting music signals is fundamental incomputer music and has many applications within areas such as transcrip-tion, mixing, transposition and auto-tuning [15, 25]. Time stretching is theoperation of changing the length of a signal, without affecting its spectralcontent, while pitch shifting is the operation of raising or lowering the orig-inal pitch of a sound without affecting its length. As pitch shifting can beperformed by combining time stretching and sampling rate conversion, weshall only focus on time stretching in this paper.

Introduced by Flanagan and Golden in [12], the phase vocoder (PV)stretches a signal by modifying its short time Fourier transform (STFT) insuch a way that a stretched version can be obtain by reconstructing with re-spect to a different hop size. Through the years many improvements havebeen made and the PV is today a well-established technique [13, 17, 21, 22].Unfortunately, it is known that the PV induces artifacts known as "phasi-ness" and "transient smearing" [17]. Phasiness is perceived as a characteristiccolouration of the sound while transient smearing is heard as a lack of sharp-ness at the transients. Many modern techniques exist for dealing with theseissues [10, 17, 26], but with only few exceptions [3, 7, 19], they are all basedon the traditional idea of modifying a time-frequency (TF) representation ob-tained through the STFT. The STFT applies a sampling grid corresponding toa uniform TF resolution over the whole TF plane. For music signals it is oftenmore appropriate to use good time resolution for the onset of attack tran-

113

Page 132: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper D.

sients and good frequency resolution for the sinusoidal components. We willconsider the task of time stretching in the framework of Gabor theory [5, 14].Applying nonstationary Gabor frames (NSGFs) [1, 9] we extend the theory ofthe PV to incorporate TF representations with the above-mentioned adaptiveTF resolution.

In Section 1.1 of this article we describe some related work and explainthe contributions of the proposed algorithm in relation to state of the art. InSection 2 we introduce the necessary tools from Gabor theory, including thepainless condition for NSGFs. We use this framework to present the classicalPV in Section 3 and the proposed algorithm in Section 4. We include thederivation of the classical PV for two reasons: Firstly, because it makes thetransition to the nonstationary case easier and secondly, because we have notfound any other thorough derivation in the literature that uses the frameworkof Gabor theory. Finally, in Section 5 we provide the numerical experimentsand in Section 6 we give the conclusions.

1.1 State of the art

Traditionally, time-stretching algorithms are categorized into time-domainand frequency-domain techniques [21]. While time-domain techniques suchas synchronous overlap-add (SOLA) [27] (and its extension PSOLA [4]) are ca-pable of producing good results for monophonic signals, at a low computa-tional cost, they tend to perform poorly when applied to polyphonic signalssuch as music. In contrast, frequency-domain methods, such as the PV [12],also work for polyphonic signals but with induced artifacts of their own,namely phasiness and transient smearing. As a first improvement to reducephasiness, Puckette [24] suggested to use phase-locking to keep phase coher-ence intact over neighbouring frequency channels. This method was furtherstudied by Laroche and Dolson [17] who proposed to separate the frequencyaxis into regions of influences, located around peak channels, and to lock thephase values of channels in a given region according to the phase value ofthe corresponding peak.

To deal with the issue of transient smearing, Bonada [3] proposed to keepthe stretch factor equal to one during attack transients and then reinitializeall phase values for channels above a certain frequency cut, i.e. the phasevalues of these channels are set equal to the original phase values. In thisway, the original timbre is kept intact without ruining the phase coherencefor stationary partials at the lower frequencies. A more advanced approachfor reducing transient smearing was presented by Röbel in [26]. Here, thetransient detection algorithm works on the level of frequency channels andthe reinitialization of a detected channel is performed for all time instantsinfluenced by the transient. In this way, there is no need to set the stretchfactor equal to one, which is a great advantage in regions with a dense set of

114

Page 133: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

1. Introduction

transients.More recent techniques have successfully reduced the PV artifacts by ap-

plying more sophisticated TF representations than the STFT. Bonada pro-posed the application of different FFTs for each time instant, which results ina TF representation with good frequency resolution at the lower frequenciesand good time resolution at the higher frequencies. Derrien [7] suggestedto construct an adaptive TF representation by choosing TF coefficients froma multi-scale Gabor dictionary under a matching constraint. A more recentalgorithm, based on the theory of NSGFs, was proposed by Liuni et al. [19].The idea behind their algorithm is to choose a fixed number of frequencybands and to apply, in each band, a NSGF with resolution varying in time.The window functions corresponding to the NSGFs are adapted to the signalby minimizing the Rényi entropy, which ensures a sparse TF representation.The techniques described in [26] and [19] are both implemented in the (com-mercialized) super phase vocoder (SuperVP) from IRCAM1.

Contributions to state of the art

In order to generalize the techniques from the classical PV to the case wherethe TF representation is obtained through a NSGF, it is necessary to use thesame number of frequency channels for each time instant. This constructioncorresponds to a uniform NSGF and, since the number of frequency channelsmust be at least equal to the length of the largest window function, necessar-ily leads to a high redundancy of the resulting transform.

In this paper we propose an algorithm, which fully exploits the potentialof NSGFs to provide adaptivity while keeping a redundancy similar to theclassical PV. This is achieved by letting the number of frequency channelsfor a given time instant equal the length of the window function selectedfor that particular time instant. This approach allows for using very longwindow functions, which is an advantage in regions with stationary partials.We summarize the contributions of this article as follows:

1. We explain the classical PV and the proposed algorithm in a unifiedframework using discrete Gabor theory.

2. We present a new time stretching algorithm, which uses an adaptiveTF representation of lower redundancy than any other previously pub-lished algorithm.

3. While the proposed algorithm combines various familiar techniquesfrom the literature, several new techniques are introduced in order totackle the challenges arising from the application of non-uniform NS-GFs. Hence, the proposed algorithm relies on techniques such as phase

1http://anasynth.ircam.fr/home/english/software/supervp

115

Page 134: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper D.

locking [17], transient detection [8], and quadratic interpolation [2] andintegrates new methods for dealing with attack transients (cf. Sec-tion 4.2), for determining the phase values from frequencies estimatedby quadratic interpolation (cf. Section 4.2), and for constructing thestretched signal from the modified (non-uniform) NSGF (cf. Section4.3).

4. We provide a collection of sound files on-line (cf. Section 5) and includeall source code necessary for reproducing the results.

2 Discrete Gabor theory

We write f = ( f [0], . . . , f [L− 1])T for a vector f ∈ CL and ZL = 0, . . . , L− 1for the cyclic group. Given a, b ∈ ZL, we define the translation operatorTa : CL → CL and the modulation operator Mb : CL → CL as

Ta f [l] := f [l − a] and Mb f [l] := f [l]e2πibl

L ,

for l = 0, . . . , L− 1 and with translation performed modulo L. For g ∈ CL

and a, b ∈ ZL, we define the Gabor system gm,nm∈ZM ,n∈ZN as

gm,n[l] := TnaMmbg[l] = g[l − na]e2πimb(l−na)

L ,

with Na = Mb = L for some N, M ∈ N [28, 29]. If gm,nm,n spans CL, thenit is called a Gabor frame. The associated frame operator S : CL → CL, definedby

S f :=M−1

∑m=0

N−1

∑n=0〈 f , gm,n〉 gm,n, ∀ f ∈ CL,

is invertible if and only if gm,nm,n is a Gabor frame [5]. If S is invertible,then we have the expansions

f =M−1

∑m=0

N−1

∑n=0〈 f , gm,n〉 gm,n, ∀ f ∈ CL, (D.1)

with gm,n := TnaMmbS−1g. We say that gm,nm,n is the canonical dual frame ofgm,nm,n and that S−1g is the canonical dual window of g. The discrete Gabortransform (DGT) of f ∈ CL is the matrix c ∈ CM×N given by the coefficients〈 f , gm,n〉m,n in the expansion (D.1). Finally, the ratio MN/L is called theredundancy of gm,nm,n.

116

Page 135: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

2. Discrete Gabor theory

Nonstationary Gabor frames

In this section we extend the classical Gabor theory to the nonstationarycase [1]. Just as for the stationary case, we denote the total number of sam-pling points in time by N ∈N, however, we do not assume these points to beuniformly distributed. Further, instead of using just one window function,we apply Nw ≤ N different window functions gnn∈ZNw

to obtain a flex-ible resolution. The window function corresponding to time point n ∈ ZNis denoted by gj(n) with j : ZN → ZNw being a surjective mapping. Thenumber of frequency channels corresponding to time point n ∈ ZN is de-noted by Mn ∈ ZL and the resulting frequency hop size by bn := L/Mn. Fi-nally, the window functions gnn∈ZNw

are assumed to be symmetric aroundzero and we use translation parameters ann∈ZN ⊂ ZL to obtain to theproper support. With this notation, the nonstationary Gabor system (NSGS)gm,nm∈ZMn ,n∈ZN is defined as

gm,n[l] := Tan Mmbn gj(n)[l] = gj(n)[l − an]e2πimbn(l−an)

L .

If gm,nm,n spans CL, then it is called a NSGF. If Mn := M, for all n ∈ ZN ,then it is called a uniform NSGS (or uniform NSGF if it is also a frame). InFigure D.1 we see an example of a simple (non-uniform) NSGS with Nw = 2and N = 4.

Time

Mnbn

Tangj(n)

Freq

uen

cy

an n

Fig. D.1: Illustration of a NSGS with Nw = 2 and N = 4.

Let us now show that the theory of NSGFs extends the theory of standardGabor frames.

117

Page 136: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper D.

Example 2.1. Let g ∈ CL and a, b ∈ ZL satisfy Na = Mb = L for someN, M ∈ N. Then, with gj(n) := g, an := na, and bn := b for all n ∈ ZN , weobtain the NSGS

gm,n[l] = TnaMmbg[l], m ∈ ZM, n ∈ ZN ,

which just corresponds to a standard Gabor system. 4

The total number of elements in a NSGS gm,nm,n is given by P =

∑N−1n=0 Mn and the redundancy is therefore P/L. The associated frame op-

erator S : CL → CL, defined by

S f :=N−1

∑n=0

Mn−1

∑m=0〈 f , gm,n〉 gm,n, ∀ f ∈ CL,

is invertible if and only if gm,nm,n constitutes a NSGF. If S is invertible, thenwe have the expansions

f =N−1

∑n=0

Mn−1

∑m=0〈 f , gm,n〉 gm,n, ∀ f ∈ CL, (D.2)

with gm,nm,n := S−1gm,nm,n being the canonical dual frame of gm,nm,n.The nonstationary Gabor transform (NSGT) of f ∈ CL is given by the coefficientscn(m)m,n := 〈 f , gm,n〉m,n in the expansion (D.2). We note that thesecoefficients do not form a matrix in the general case. We now consider animportant case for which the calculation of gm,nm,n is particularly simple.

Painless NSGFs

If supp(gj(n)) ⊆ [cj(n), dj(n)] and dj(n) − cj(n) ≤ Mn for all n ∈ Zn, thengm,nm,n is called a painless NSGS (or painless NSGF if it is also a frame). Inthis case we have the following result [1].

Proposition 2.1. If gm,nm,n is a painless NSGS, then the frame operator S is anL× L diagonal matrix with entries

Sll =N−1

∑n=0

Mn

∣∣∣gj(n)[l − an]∣∣∣2 , ∀l ∈ ZL.

The system gm,nm,n is a frame for CL if and only if ∑N−1n=0 Mn

∣∣∣gj(n)[l − an]∣∣∣2 > 0

for all l ∈ ZL, and in this case the canonical dual frame gm,nm,n is given by

gm,n[l] =gm,n[l]

∑N−1n′=0 Mn′

∣∣∣gj(n′)[l − an′ ]∣∣∣2 ,

for all n ∈ ZN and all m ∈ ZMn .

118

Page 137: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

3. The phase vocoder

We note that the canonical dual frame is also a painless NSGF, whichis a property not shared by general NSGFs. An immediate consequence ofProposition 2.1 is the classical result for painless nonorthogonal expansions[6], which just corresponds to the painless case for standard Gabor frames.

3 The phase vocoder

In this section we explain the classical PV [17] in the framework of Gabortheory. The PV stretches the length of a signal by means of modifying itsdiscrete STFT. Since the discrete STFT corresponds to a DGT, this techniquecan be perfectly well explained using Gabor theory. The main idea is toconstruct a DGT of the signal with respect to an analysis hop size a, modifyingthe DGT and then reconstructing from the modified DGT using a differentsynthesis hop size a∗. We only consider the case a∗ = ra, for a constantmodification rate r > 0. The case r > 1 corresponds to slowing down thesignal by extending its length, while r < 1 corresponds to speeding it upby shortening its length. The PV is a classic analysis-modification-synthesistechnique, and we will explain each of these three steps separately in thefollowing sections.

3.1 Analysis

Let gm,nm,n be a painless Gabor frame for CL. Given a real valued signalf ∈ RL, we calculate the DGT c ∈ CM×N of f with respect to gm,nm,n as

cm,n = 〈 f , gm,n〉 =L−1

∑l=0

f [l]g[l − na]e−2πimb(l−na)

L , (D.3)

for all m ∈ ZM and n ∈ ZN . Let us explain the consequences of the phaseconvention used in (D.3). Define Ωm := 2πm/M as the center frequency ofthe m’th channel and assume that g is real and symmetric around zero. Then,since gm,nm,n is painless and b/L = 1/M, we may write (D.3) as

cm,n =M−1

∑l=0

f [l]g[na− l]e−iΩm(l−na) = eiΩmna ( fm ∗ g) [na], (D.4)

with fm[l] := f [l]e−iΩm l . If g and g are both well-localized around zero, theconvolution in (D.4) extracts the baseband spectrum of fm at time na. Recall-ing that fm is just a version of f that has been modulated down by m, thisbaseband spectrum corresponds to the spectrum of f in a neighbourhood offrequency m at time na. Finally, modulating back by m we obtain the band-pass spectrum of f in a neighbourhood of frequency m at time na. This phaseconvention is the traditional one used in the PV [17, 18, 21].

119

Page 138: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper D.

3.2 Modification

To explain the modification step of the PV, we refer to a quasi-stationarysinusoidal model that f is assumed to satisfy [16, 20]. This model is not usedexplicitly anywhere in the derivation of the PV, but it serves an importantrole for explaining the underlying ideas. We assume that f can be written asa finite sum of sinusoids

f (t) = ∑k

Ak(t)eiθk(t), (D.5)

in which Ak(t) is the amplitude, θk(t) is the phase, and θ′k(t) is the frequency ofthe k’th sinusoid at time t. Since the model is quasi-stationary, Ak(t) and θ′k(t)are assumed to be slowly varying functions. In particular, they are assumedto be almost constant over the duration of g. Based on (D.5), the perfectlystretched signal f∗ at time na∗ = nra is given by

f∗[na∗] = ∑k

Ak(na)eirθk(na). (D.6)

We note that the amplitudes and frequencies of the stretched signal f∗ at timena∗ equal the amplitudes and frequencies of the original signal f at time na.

The idea behind the modification step is to construct a new DGT d ∈CM×N , based on c ∈ CM×N , such that reconstruction from d, with respectto a∗, yields a time stretched version of f in the sense of (D.6). Since theamplitudes need to be preserved we set

dm,n = |cm,n| ei∠dm,n , m ∈ ZM, n ∈ ZN ,

using polar coordinates. Estimating the phases ∠dm,nm,n involves a taskcalled phase unwrapping [17].

Phase unwrapping

Assume there is a sinusoid of frequency ω in the vicinity of channel m attime na. Then, we make the estimate

ei∠dm,n = ei(∠dm,n−1+ωa∗), (D.7)

since the two DGT samples dm,n−1 and dm,n are a∗ time samples apart. Usingthe same argument we may write ei∠cm,n = ei(∠cm,n−1+ωa). Setting ω = ∆ω +Ωm, and isolating the deviation ∆ω, yields

princarg ∆ωa = princarg ∠cm,n −∠cm,n−1 −Ωma ,

with "princarg" denoting the principal argument in the interval ]−π, π]. As-suming ω is close to the center frequency Ωm, such that ∆ω ∈]− π/a, π/a],we arrive at

∆ω =princarg ∠cm,n −∠cm,n−1 −Ωma

a.

120

Page 139: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

3. The phase vocoder

We can now calculate ω as ∆ω + Ωm and use (D.7) to determine ∠dm,nm,nby initializing dm,0 = cm,0 for all m ∈ ZM.

3.3 Synthesis

The final step of the PV is to construct a time stretched version of f in thesense of (D.6) from the modified DGT d ∈ CM×N . This is done by recon-structing from d with respect to the synthesis hop size a∗. According to(D.1), such a reconstruction yields

f∗[l] =M−1

∑m=0

N−1

∑n=0

dm,nTna∗MmbS−1∗ g[l], (D.8)

with S∗ : CL → CL being the modified frame operator

S∗x[l] =M−1

∑m=0

N∗−1

∑n=0〈x, Tna∗Mmbg〉Tna∗Mmbg[l],

where N∗ := L/a∗. The length of the reconstructed signal f∗ is given byL∗ = Na∗ = Lr and translation is performed modulo L∗ in (D.8). In practice,the reconstruction formula (D.8) is realized by applying an inverse FFT andoverlap-add.

Traditionally, a DGT with 75% overlap is used in the analysis step, whichallows for modification factors r ≤ 4. We note that if no modifications aremade (r = 1), we recover the original signal. In the next section we considersome of the problems connected with the PV.

3.4 Drawbacks

The idea behind the PV is intuitive and easily implementable, which makesit attractive from a practical point of view. Unfortunately, the assumptionsmade in the modification part are not easily satisfied. This is true even forsignals constructed explicitly from the sinusoidal model (D.5). We now listthree main issues to be considered.

1. Vertical coherence: The PV ensures horizontal coherence [17] within eachfrequency channel but no attempt is made to ensure vertical coherence[17] across the frequency channels. If a sinusoid moves from one chan-nel to another, the corresponding phase estimate might change dramat-ically. This is undesirable since a small change in frequency should onlyimply a small change in phase.

2. Resolution: In practice, we cannot assume that the sinusoids constitut-ing f are well resolved in the DGT in the sense at most one sinusoid is

121

Page 140: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper D.

present within each frequency channel. The channels will only providea "blurred" image of various neighbouring sinusoids. Furthermore, theamplitudes and frequencies of each sinusoid will often not be constantover the entire duration of g. As a consequence, the estimates made inthe modification part will be subject to error.

3. Transients: The presence of attack transients is not well modelledwithin the PV as the phase values at such time instants cannot be pre-dicted from previous estimates. Also, for music signals we often wantthe onsets to stay intact after time stretching, which is not accountedfor in the PV approach.

In the next section we construct a new PV, which addresses the problemsmentioned above.

4 A phase vocoder based on nonstationary Gaborframes

As mentioned in the introduction, the DGT is not always preferable for rep-resenting music signals as it corresponds to a uniform resolution over thewhole TF plane. A poor TF resolution conflicts with the fundamental idea ofwell resolved sinusoids and therefore causes problems for the PV. In this sec-tion we change the TF representation from the DGT to an adaptable NSGT,which better matches the sinusoidal model (D.5). To be consistent with thedescription of the PV in Section 3 we separately explain the analysis, modifi-cation, and synthesis steps of the proposed algorithm.

4.1 Analysis

First of all, an adaptation procedure must be chosen for the NSGT. We chooseto work with the procedure described in [1] since it is suitable for represent-ing signals, which consist mainly of transient and sinusoidal components.The adaptation procedure is based on the idea that window functions withsmall support should be used around the onsets of attack transients, whilewindow functions with longer support should be used between these onsets.

Remark 4.1. The construction presented here necessarily yields the problemof a coarse frequency resolution for the transient regions. However, as wepropose to keep the stretch factor equal to one during attack transients (cf.Section 4.2), the impact of this problem is limited.

The onsets are calculated using a separate algorithm [8] and the windowfunctions are constructed as scaled versions of a single window prototype(a Hanning window or similar). The resulting system is referred to as a

122

Page 141: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

4. A phase vocoder based on nonstationary Gabor frames

scale frame. In the following paragraphs we explain the construction of scaleframes in details.

Transient detection

To perform the transient detection we use a spectral flux (SF) onset functionas described in [1, 8]. This function is computed with a DGT of redundancy16, and it measures the sum of (positive) change in magnitude for all fre-quency channels. A time instant, corresponding to a local maximum of theSF function, is determined as an onset if its SF value is larger than the SFmean value in a certain neighbourhood of time frames. Hence, for regionwith a dense set of transients, only the most significant onsets are calculated.It is clear that such an approach must be taken to avoid an undesirably lowfrequency resolution in such regions. The redundant DGT used for the SFonset function is not used anywhere else in our algorithm and does not con-tribute significantly to the overall complexity.

Constructing the window functions

After a set of onsets has been extracted, the window functions are constructedfollowing the rule that the space between two onsets is spanned in such a waythat the window length first increases (as we get further away from the firstonset) and then decreases (as we approach the next onset). The constructionis performed in a smooth way such that the change from one step to the nextcorresponds to a window function that is either half as long, twice as longor of the same length. For details see [1]. The overlap between the windowfunctions is chosen such that at most one onset is present within each timeframe, we shall elaborate further on this particular construction in Section4.3.

Constructing the NSGT

Once the window functions gnn∈ZNwhave been constructed, we choose

the numbers of frequency channels Mnn∈ZN such that the resulting sys-tem constitutes a painless NSGF. Additionally, we choose a lower bound onMnn∈ZN to avoid an undesirably low number of channels around the on-sets (explicit choices of parameters are described in Section 6). Given a realvalued signal f ∈ RL, we calculate the NSGT cn(m)m∈ZMn ,n∈ZN of fwith respect to the scale frame gm,nm,n as

cn(m) = 〈 f , gm,n〉 =L−1

∑l=0

f [l]gj(n)[l − an]e−2πimbn(l−an)

L ,

for all n ∈ ZN and all m ∈ ZMn . We note that the phase convention is thesame as used in the PV (cf. Section 3.1).

123

Page 142: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper D.

4.2 Modification

The idea behind the modification step is the same as for the PV. We assumef satisfies (D.5), and we construct a modified NSGT dn(m)m,n, based oncn(m)m,n, such that reconstruction from dn(m)m,n, with respect toa set of synthesis translation parameters, yields a time stretched version off in the sense of (D.6). Given a stretch factor r > 0, the distance betweensynthesis time sample n and n + 1 is

a∗n := r(an+1 − an), n ∈ ZN . (D.9)

Since we do not want the transients to be stretched, we let r = 1, when ancorresponds to the onset of a transient, and then stretch with a correspond-ingly larger factor r′ > r in remaining regions. Using polar coordinates weset

dn(m) = |cn(m)| ei∠dn(m), n ∈ ZN , m ∈ ZMn ,

with ∠d0(m) = ∠c0(m) for all m ∈ ZM0 . Hence, in complete analogywith the approach in the PV, the problem boils down to estimating the phasevalues ∠dn(m)m,n.

Making the transition from stationary Gabor frames to NSGFs, we arefacing a fundamental problem. The DGT corresponds to a uniform samplinggrid over the TF plane, while the NSGF corresponds to a sampling grid whichis irregular over time but regular over frequency for each fixed time position.This is illustrated in Figure D.2.

Time

Freq

uency

DGT

Time

Freq

uency

NSGT

Fig. D.2: Sampling grids corresponding to a DGT and a NSGT.

As a consequence, we cannot guarantee that each sampling point has ahorizontal neighbour that can be used for estimating the frequency as in the

124

Page 143: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

4. A phase vocoder based on nonstationary Gabor frames

PV (cf. Section 3). We therefore generalize the approach from [2] to thenonstationary case and calculate the frequencies using quadratic interpolation.

Calculating the frequencies

For fixed n ∈ ZN , we define channel mp as a peak if its magnitude∣∣cn(mp)

∣∣is larger than the magnitudes of its two vertical neighbors, i.e.

∣∣cn(mp)∣∣ >∣∣cn(mp ± 1)

∣∣. If there is a sinusoid of frequency ω in the vicinity of peakchannel mp, the "true" peak position will differ from mp unless ω is exactlyequal to 2πmp/Mn. The idea is thus to interpolate the true peak position,using the neighboring channels mp ± 1, and then to apply this value as anestimate for ω. To describe the setup we set the position of the peak channelmp to 0, and the positions of its two neighbors to −1 and 1, respectively. Also,we denote the true peak position by p and define

α := |cn(−1)| , β := |cn(0)| , and γ := |cn(1)| .

The situation is illustrated in Figure D.3, with y denoting the parabola to beinterpolated.

Channel

Magnitude

-1 p 0 1

y

α

γ

β

Fig. D.3: Illustration of quadratic interpolation.

Writing y(x) = a(x− p)2 + b and solving for p yields

p =12· α− γ

α− 2β + γ∈(−1

2,

12

).

The value of p determines the deviation from the peak channel to the truepeak proportional to the size of the channel. After p has been determined,

125

Page 144: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper D.

we calculate the frequency as

ω =2π(mp + p)

Mn. (D.10)

In practice, the calculations are done on a dB scale for higher accuracy ofthe quadratic interpolation. Let us now explain how the frequency estimate(D.10) is used to calculate the corresponding phase value ∠dn(mp).

Calculating the phases

Between each pair of peaks we define the (lowest) channel with smallest mag-nitude as a valley and then use these valleys to separate the frequency axisinto regions of influence. As noted in [17], if a peak switches from channelmp′ at time n− 1 to channel mp at time n, the corresponding phase estimateshould take this behavior into account. A simple way of determining theprevious peak mp′ is to choose the peak of the corresponding region of in-fluence that channel mp would have belonged to in time frame n− 1. This isillustrated in Figure D.4.

Time

Freq

uency

Region ofinfluence

Valley

Peak

nn-1

mp'

mp

a*n-1

Fig. D.4: Illustration of peak, valley and region of influence.

Based on this construction, with a∗n−1 given in (D.9), the phase estimate atpeak channel mp is

dn(mp) =∣∣cn(mp)

∣∣ ei(∠dn−1(mp′ )+ωa∗n−1). (D.11)

126

Page 145: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

4. A phase vocoder based on nonstationary Gabor frames

For the neighboring channels in the corresponding region of influence, thephase values will be locked to the phase of the peak. Following the approachin [17], we let

ei∠dn(m) = ei(∠dn(mp)+∠cn(m)−∠cn(mp)),

for all channels m in the region of influence corresponding to peak channelmp. Hence, the phase locking is such that the difference in synthesis phase isthe same as the difference in analysis phase. It is important to note that theactual phase estimates are done only at peak channels, which allows for a fastimplementation. As mentioned in Section 3.4, the phase estimate (D.11) is notwell suited for modelling attack transients. In the next section we explain ourapproach for dealing with this issue.

Transient preservation

Since the phase values ∠dn(m) at transients locations cannot be predictedfrom previous estimates, one might choose to simply reinitialize all phasevalues at such locations ∠dn(m) = ∠cn(m). However, for stationary par-tials passing through the transient, such a reinitialization completely destroysthe horizontal phase coherence, thereby producing undesirable artifacts inthe resulting sound. To deal with this problem, we propose the followingrule for phase estimation at transient locations: Assume time-instant n cor-responds to the onset of an attack transient. Consider channel m, belongingto the region of influence dominated by a peak channel mp, and let mp′ de-note the peak channel of the region of influence that channel mp would havebelonged to in time frame n− 1 (same notation as in (D.11), see also FigureD.4). Then, given a tolerance ε > 0, we reinitialize ∠dn(m) = ∠cn(m) ifand only if

|cn(m)| >∣∣∣cn− 1(mp′)

∣∣∣+ ε. (D.12)

For the implementation, the calculations are done on a dB scale with ε = 2dB.We note that in contrast to previously proposed techniques for onset reini-tialization [3, 7, 26], our algorithm has the advantage that it tracks sinusoidsacross frequency channels.

4.3 Synthesis

Before we can provide the actual synthesis formula, we need to return tothe issue of choosing the overlap between window functions (cf. Section4.1). Originally, scale frames were invented with the intention of construc-tion adaptive TF representations with a very low redundancy. To ensure alow redundancy, and a stable reconstruction, the overlap between adjacent

127

Page 146: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper D.

window functions is chosen as 1/3 of the length for equal windows and 2/3of the length of the shorter window for different windows [1].

This construction makes sense in the general settings, since the result-ing system constitutes a frame for CL as long as the painless condition fromProposition 2.1 is satisfied. However, in the case of time-stretching with afactor r > 1, this construction cannot guarantee that the dual windows (cf.Proposition 2.1) overlap coherently when placed at the synthesis time in-stants. To tackle this issue, we have chosen the overlap between windowfunctions in the following way:

1. First the onsets of attack transients are calculated (using the onset de-tection algorithm from Section 4.1).

2. Then these onsets are relocated such that the distance between the re-located onsets is r times the distance between the original onsets.

3. The window functions are now calculated according to the relocatedonsets, using the approach in [1], and afterwards centered at the origi-nal time instants.

While this approach might give the impression that we just stretch the win-dow lengths by a factor of r, this is not the case. Calculating the windowswith respect to the relocated onsets still produce a sequence of windowsfunctions of the same lengths as if the original onsets had been used. This isillustrated in Figure D.5.

With this choice of overlap, we can construct the stretched signal f∗ usingthe synthesis formula

f∗ = ∑n∈ZN

∑m∈ZMn

dn(m)gm,n, (D.13)

with gm,nm,n being the canonical dual frame from Proposition 2.1 con-structed using the synthesis time instants. In practice, the reconstructionformula (D.13) is realized by applying an inverse FFT and overlap-add as inthe classical PV.

4.4 Advances

In this section we explain how the proposed algorithm improves the tech-niques of the PV. We do so by separately addressing the three drawbacksdescribed in Section 3.4.

1. Vertical coherence: If a sinusoid moves from channel mp′ at time n− 1to channel mp at time n, then the corresponding peak channel alsochanges from mp′ to mp. The estimate given in (D.11) therefore en-sures that the corresponding phase increment takes this behavior into

128

Page 147: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

4. A phase vocoder based on nonstationary Gabor frames

Time

Frequency

Window functions constructed from relocated onsets

Time

Frequency

Window functions centered at original time instants

Relocated onset Relocated onset

Original onsetOriginal onset

Fig. D.5: Construction of the overlap between window functions.

account. In this way we get coherence across the various frequencychannels in contrast to the standard PV which only provides coherencewithin each frequency channel.

2. Resolution: Changing the representation from that of a DGT to anadaptable NSGT automatically improves the TF resolution for signals,which are well represented by the sinusoidal model (D.5). Further-more, calculating the phase increment only at peak channels replacesthe underlying assumption of well resolved sinusoids in each frequencychannel with the weaker assumption of well resolved sinusoids in eachregion of influence.

3. Transients: To reduce transient smearing, we keep the stretch factorequal to one during attack transients and we reinitialize the phase val-ues of relevant channels according to (D.12).

While the PV serves as a good starting point for understanding the funda-mental concepts behind the proposed algorithm, it is not the main goal ofthis article only to improve the resulting sound quality compared to this clas-sical technique. The main advantage of the proposed algorithm is the abilityto produce good results, when compared to state of the art, while keeping alow redundancy of the applied TF transform.

129

Page 148: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper D.

Redundancy of the NSGT

As mentioned in Section 3.3, the classical PV applies an overlap of 75% cor-responding to a redundancy of 4 in the DGT. There is some mathematicaljustification to this choice [17], but mainly the overlap is chosen to ensure agood TF resolution. It should be noted that the redundancy of the DGT isindependent of the signal under consideration — it only depends on the anal-ysis hop size and the length of the window function (assuming the painlesscondition is satisfied).

For multi-resolution methods, the situation changes as the TF resolutionadapts to the particular signal. A standard approach for multi-resolutionmethods is to choose non-uniform sampling points in time, with correspond-ing window functions, and a uniform number of frequency channels corre-sponding to the length of the largest window function [19, 21]. This con-struction corresponds to applying a uniform NSGF (cf. Section 2). Such anapproach is desirable from a practical point of view as the coefficients thenform a matrix and the standard techniques from the PV (and its improve-ments) immediately apply. However, the choice of a uniform NSGF naturallyimplies a high redundancy of the transform as the sampling density is muchhigher than needed for the painless case (cf. Proposition 2.1). For real worldsignals, such a high redundancy is undesirable as it implies a high computa-tional cost for the time-stretching algorithm.

In contrast to previously suggested methods, our algorithm takes full ad-vantage of the painless condition and produces good results with a redun-dancy of ≈ 3 for a stretch factor of r = 1.5. It is important to note thatthe redundancy of the proposed algorithm depends both on the signal underconsideration and the stretch factor (at least in the case where r > 1). Fordifferent signals, the onset detection algorithm calculates different onsets,which results in different time sampling points and different numbers of fre-quency channels. As for the stretch factor, we recall the choice of overlap asdescribed in Section 4.3. For a large stretch factor, we need a large overlapbetween the window functions to guarantee that the synthesis formula (D.13)makes sense. We do not consider the dependency between the redundancyand the stretch factor a problem, since the redundancy is still manageableeven for large stretch factors. For a stretch factor of r = 3, the redundancy is≈ 5 and for a stretch factor of r = 4, the redundancy is ≈ 7.

In the next section we present the numerical experiments and comparethe proposed algorithm with state of the art algorithms for time stretching(cf. Section 1.1).

130

Page 149: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

5. Experiments

5 Experiments

The proposed algorithm has been implemented in MATLAB R2017A and thecorresponding source code is available at

http://homepage.univie.ac.at/monika.doerfler/NSPV.html

The source code depends on the following two toolboxes: The LTFAT [23](version 2.1.2 or above) freely available from http://ltfat.github.io/ andthe NSGToolbox [1] (version 0.1.0 or above) freely available from http://nsg.sourceforge.net/.

For the classical PV, we use an implementation by Ellis [11], which in-cludes some improvements to the procedure described in Section 3 (in par-ticular, interpolation of magnitudes). As these improvements result in a sig-nificantly improved audio quality, we have chosen this implementation forcomparison.

In Section 5.1 we compare the proposed algorithm to the classical PV bystretching synthetic (music) signals and in Section 5.2 we turn to the analysisof real world signals and compare the proposed algorithm with the algo-rithms from Derrien [7] and Liuni et al. [19].

5.1 Synthetic signals

Analysing synthetic signals has the advantage that the perfect stretched ver-sion is available and can be used as ground truth. For this experiment, weconstruct a large number of synthetic signals and compared the performanceof the proposed algorithm with the classical PV for each of these signals.More precisely, the approach is as follows:

1. For each synthetic melody we choose a random number of notes be-tween 4 and 10. Each note has a randomly chosen duration of either0.5 or 1 second and the corresponding tone consists of a fundamentalfrequency and three harmonics of decreasing amplitudes. The funda-mental frequencies are set to coincide with those of a piano and themelody is allowed to move either 1 or 2 half notes up or down (ran-domly chosen) per step. A randomly chosen envelope ensures that thetones have both an attack and a release. The sampling frequency of theresulting signal s is 16000 Hz.

2. A stretch factor 0.5 ≤ r ≤ 3.75 is chosen at random and another syn-thetic signal sper f is constructed, such that sper f corresponds to a per-fectly time stretched version of s in the sense of (D.6). The classical PVand the proposed algorithm are applied to the original signal s, withrespect to the stretch factor r, resulting in the time stretched versionsspv and snsgt.

131

Page 150: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper D.

3. Three DGTs Sper f , Spv and Snsgt are constructed from the time stretchedversions sper f , spv and snsgt, using the same parameter settings for eachsignal. With |S| denoting a vector consisting of the absolute values of aDGT S, we use the following error measure

E(Sper f , S) =

∥∥∥∣∣∣Sper f

∣∣∣− |S|∥∥∥2∥∥∥∣∣∣Sper f

∣∣∣∥∥∥2

, (D.14)

with S being either Spv or Snsgt.

Note that we cannot apply a sample by sample error measure in the timedomain, since in this case a small change in phase for the stretched signalsmight cause a large error, which does not reflect the actual sound quality.We therefore choose to compare the stretched versions using the magnitudedifference of their DGTs. Let us now define the parameters used for the TFrepresentations in this experiment.

Choice of parameters

For the DGT used in the PV, we apply two different parameter settings. Usingthe notation (hopsize, number of frequency channels) we use the parameters(256, 1024) and (128, 512). For the first parameter setting we use a Han-ning window of length 1024 and for the second parameter setting we usea Hanning window of length 512. In this way we obtain painless DGTs ofredundancy 4.

For the NSGT used in the proposed algorithm, we use 5 different Hanningwindows with lengths varying from 96 samples (at attack transients) to 96 ·24 = 1536 samples. The lower bound on the number of frequency channels isset to 96 · 23 = 768, corresponding to the length of the second largest windowfunctions.

For the DGT used for computing Sper f , Spv, and Snsgt, we use parameters(128, 2048) and a Hanning window of 2048 samples. This results in a painlessDGT of redundancy 16.

Results

Repeating the experiment described above for 1000 synthetic test signals weget the average results shown in Table D.1.

The results in Table D.1 show that the proposed algorithm outperformsthe classical PV, with respect to the error measure in (D.14), while keep-ing a comparable redundancy of the applied transform. For a visualizationof the performances of the algorithms we have plotted, in Figure D.6, thespectrograms corresponding to the three DGTs Sper f , Spv (with parameters(128, 512)), and Snsgt for one particular synthetic test signal (with r = 1.5).

132

Page 151: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

5. Experiments

Table D.1: Average results for 1000 synthetic test signals.

Algorithm: PV(256, 1024) PV(128, 512) Proposed algorithmAverage red.: 3.954813 3.977300 3.637370Average error: 0.439982 0.415139 0.095104

We can easily see how the proposed algorithm more accurately repro-duces the onsets, and how it reduces the noisy components between the har-monics compared to the PV. However, we can also see how the frequenciescorresponding to the harmonics are better reproduced with the PV than withthe proposed algorithm. The proposed algorithm induces a certain amplitudemodulation due to the peak detection and phase locking approach describedin Section 4.2.

We have provided sound files on-line for the test signal shown in FigureD.6 with respect to the stretch factors r = [0.75, 1.25, 1.5, 2.25, 3.0, 3.75]. Thesound files are available for the perfect stretched version, the PV(128, 512),and the proposed algorithm. It is important to note that the error measuregiven in (D.14) is not a direct reflection of the actual audio quality — it isfor instance not true that the proposed algorithm consistently performs 4times as good as the classical PV. The results for the proposed algorithm areparticularly convincing for stretch factors r ≤ 2, where the timbre at attacktransients is nicely preserved in contrast to the classical PV. However forlarger stretch factors r ≥ 2, the impact of the amplitude modulation, and ofthe coarse frequency resolution around onsets, becomes audible. Eventually,this results in an overall sound quality comparable to the PV (or even belowfor very large stretch factors r ≥ 3).

Since the authors do not have access to the source code of the more so-phisticated algorithms as proposed in [3, 7, 19], the comparison for syntheticsignals could only be done for the PV and the proposed algorithm. However,as the authors from [7] and [19] kindly provided us with sound files for realworld signals, we have included these algorithms for the comparison in thenext section.

5.2 Real world signals

For this experiment we consider three real world signals, each of length ≈ 4seconds and with a sampling frequency of 44100 Hz. The signals are chosensuch that they challenge different aspects of the time stretching algorithms:

1. The first signal is a glockenspiel signal with few transients and manyharmonics at the higher frequencies.

2. The second signal is a piece of piano music consisting of a dense set of

133

Page 152: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

Paper D.

Perfect stretch

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5

Time (s)

0

500

1000Fr

equency

(Hz)

PV(128,512)

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5

Time (s)

0

500

1000

Frequency

(Hz)

Proposed algorithm

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5

Time (s)

0

500

1000

Frequency

(Hz)

Fig. D.6: Spectrograms for stretched versions of a synthetic signal with r = 1.5.

transients with most of the energy concentrated at the lower frequen-cies.

3. The third signal is from a rock song played by a full band, therebyproducing a complex polyphonic sound.

We chose to work with the stretch factors 0.75, 1.25, 1.5 and 2.25 for thecomparison. The algorithms we include are:

1. The PV as described in Section 3 and implemented in [11]. For the DGTused in the PV, we use parameters (512, 2048) and a Hanning windowof length 2048.

2. The proposed algorithm from Section 4. We use 5 Hanning windowswith lengths varying from 384 to 384 · 24 = 6144 and with 384 · 22 =

134

Page 153: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

6. Conclusion and perspectives

1536 being the lower bound on the number of frequency channels.

3. The matching pursuit algorithm by Derrien [7].

4. The SuperVP from IRCAM based on the theory of Röbel [26] and Liuniet al. [19]. The algorithm uses only one frequency band and choosesbetween window lengths of 1024, 2048, 3072, and 4096 samples for theadaptive (uniform) NSGT. We refer the reader to [19] for details.

Since all the stretched sounds are available on-line, we only give the mainconclusions. The classical PV and the algorithm by Derrien are rather similarin performance — they both produce a good overall sound quality but withsignificant transient smearing. The proposed algorithm, on the other hand,does a much better job of preserving the original timbre at attack transient,but induces a certain "roughness" to the sounds (mainly for r = 2.25). Also,some of the weaker transients, which are not detected by the onset detec-tion algorithm, suffer from transient smearing for the proposed algorithm(in particular, the "tapping" noises in the background of the piano music).The SuperVP does not have this problem as the transient detection algorithmworks on the level of spectral bins. Overall, the SuperVP provides the bestaudio quality for the three signals, which is to be expected as it applies a TFrepresentation of much higher redundancy than the other algorithms. Cal-culating the average redundancies for the proposed algorithm (over the 4stretch factors) for each signal we get 2.40, 2.90 and 2.65. Finally, let us notethat the third signal (the rock band signal) reveals a fundamental issue withthe application of NSGFs. For r = 2.25, neither the proposed algorithm northe SuperVP are capable of maintaining a steady bass, which results fromthe changing window lengths. This particular issue is better resolved by theclassical PV as well as the algorithm by Derrien.

6 Conclusion and perspectives

Using discrete Gabor theory we have presented the classical PV and proposeda new time stretching algorithm in a unified framework. This approach hasallowed us to address and improve on the disadvantages of the classical PV,while preserving the mathematical structure provided by Gabor theory. Theproposed algorithm is the first attempt to use non-uniform NSGFs for time-stretching, which allows for a low redundancy of the adaptive TF represen-tation and leads to a fast implementation. The proposed algorithm has beencompared to other multi-resolution methods, in a reproducible manner, andwe have discussed its advantages and its shortcomings. As a future improve-ment it could be interesting to connect the techniques presented in this articlewith the ideas proposed by Röbel in [26], possibly allowing for an algorithm

135

Page 154: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

References

that uses non-uniform NSGFs without the need for fixing the stretch factorduring attack transients.

Acknowledgement

We would like to thank the three anonymous reviewers for their suggestions,which clearly improved the over-all presentation of this manuscript. Also aspecial thanks to Olivier Derrien and Marco Liuni for providing us with thesound files used for comparison.

References

[1] P. Balazs, M. Dörfler, F. Jaillet, N. Holighaus, and G. Velasco. Theory, im-plementation and applications of nonstationary Gabor frames. J. Comput.Appl. Math., 236(6):1481–1496, 2011.

[2] G. T. Beauregard, M. Harish, and L. Wyse. Single pass spectrogram in-version. In 2015 IEEE International Conference on Digital Signal Processing(DSP), pages 427–431, 2015.

[3] J. Bonada. Automatic technique in frequency domain for near-losslesstime-scale modification of audio. In Proceedings of International ComputerMusic Conference, pages 396–399, 2000.

[4] F. Charpentier and M. Stella. Diphone synthesis using an overlap-addtechnique for speech waveforms concatenation. In ICASSP ’86. IEEEInternational Conference on Acoustics, Speech, and Signal Processing, vol-ume 11, pages 2015–2018, Apr 1986.

[5] O. Christensen. An introduction to frames and Riesz bases. Applied and Nu-merical Harmonic Analysis. Birkhäuser/Springer, [Cham], second edi-tion, 2016.

[6] I. Daubechies, A. Grossmann, and Y. Meyer. Painless nonorthogonalexpansions. J. Math. Phys., 27(5):1271–1283, 1986.

[7] O. Derrien. Time-scaling of audio signals with multi-scale Gabor anal-ysis. In DAFx’07, pages CD–ROM (6 pages), Bordeaux, France, Sept.2007.

[8] S. Dixon. Onset detection revisited. In Proc. of the Int. Conf. on DigitalAudio Effects (DAFx-06), pages 133–137, Montreal, Quebec, Canada, sep2006.

136

Page 155: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

References

[9] M. Dörfler and E. Matusiak. Nonstationary Gabor frames—existenceand construction. Int. J. Wavelets Multiresolut. Inf. Process., 12(3):1450032,18, 2014.

[10] D. Dorran and R. Lawlor. An efficient phasiness reduction technique formoderate audio time-scale modification. In in Proc. of the 7 th InternationalConference on Digital Audio Effects (DAFx-04), 2004.

[11] D. P. W. Ellis. A phase vocoder in Matlab, 2002. Web resource.

[12] J. L. Flanagan and R. M. Golden. Phase vocoder. The Bell System TechnicalJournal, 45(9):1493–1509, Nov 1966.

[13] D. Griffin and J. Lim. Signal estimation from modified short-time Fouriertransform. IEEE Transactions on Acoustics, Speech, and Signal Processing,32(2):236–243, Apr 1984.

[14] K. Gröchenig. Foundations of time-frequency analysis. Applied and Nu-merical Harmonic Analysis. Birkhäuser Boston, Inc., Boston, MA, 2001.

[15] H. Ishizaki, K. Hoashi, and Y. Takishima. Full-automatic dj mixing sys-tem with optimal tempo adjustment based on measurement functionof user discomfort. In ISMIR, pages 135–140. International Society forMusic Information Retrieval, 2009.

[16] J. Laroche. Time and Pitch Scale Modification of Audio Signals, pages 279–309. Springer US, Boston, MA, 2002.

[17] J. Laroche and M. Dolson. Improved phase vocoder time-scale modifica-tion of audio. IEEE Transactions on Speech and Audio Processing, 7(3):323–332, May 1999.

[18] M. Liuni and A. Röbel. Phase vocoder and beyond. Musica/Tecnologia,7:73–89, 2013.

[19] M. Liuni, A. Röbel, E. Matusiak, M. Romito, and X. Rodet. Automaticadaptation of the time-frequency resolution for sound analysis and re-synthesis. IEEE Transactions on Audio, Speech, and Language Processing,21(5):959–970, May 2013.

[20] R. McAulay and T. Quatieri. Speech analysis/synthesis based on a sinu-soidal representation. IEEE Transactions on Acoustics, Speech, and SignalProcessing, 34(4):744–754, Aug 1986.

[21] E. Moulines and J. Laroche. Non-parametric techniques for pitch-scaleand time-scale modification of speech. Speech Commun., 16(2):175–205,feb 1995.

137

Page 156: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

References

[22] M. Portnoff. Implementation of the digital phase vocoder using the fastFourier transform. IEEE Transactions on Acoustics, Speech, and Signal Pro-cessing, 24(3):243–248, Jun 1976.

[23] Z. Pruša, P. L. Søndergaard, N. Holighaus, C. Wiesmeyr, and P. Bal-azs. The Large Time-Frequency Analysis Toolbox 2.0. In M. Aramaki,O. Derrien, R. Kronland-Martinet, and S. Ystad, editors, Sound, Music,and Motion, Lecture Notes in Computer Science, pages 419–442. SpringerInternational Publishing, 2014.

[24] M. Puckette. Phase-locked vocoder. In Proceedings of 1995 Workshop onApplications of Signal Processing to Audio and Accoustics, pages 222–225,Oct 1995.

[25] J.-C. Risset. Examples of the musical use of digital audio effects. Journalof New Music Research, 31(2):93–97, 2002.

[26] A. Röbel. A new approach to transient processing in the phase vocoder.In 6th International Conference on Digital Audio Effects (DAFx), pages 344–349, London, United Kingdom, Sept. 2003.

[27] S. Roucos and A. Wilgus. High quality time-scale modification forspeech. In ICASSP ’85. IEEE International Conference on Acoustics, Speech,and Signal Processing, volume 10, pages 493–496, Apr 1985.

[28] P. L. Søndergaard. Finite discrete Gabor analysis. PhD thesis, TechnicalUniversity of Denmark, 2007.

[29] T. Strohmer. Numerical algorithms for discrete Gabor expansions. InGabor analysis and algorithms, Appl. Numer. Harmon. Anal., pages 267–294. Birkhäuser Boston, Boston, MA, 1998.

138

Page 157: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen
Page 158: Aalborg Universitet Sparse Nonstationary Gabor Expansions ...€¦ · Sparse Nonstationary Gabor Expansions with Applications to Music Signals Ph.D. Dissertation Emil Solsbæk Ottosen

EMIL SO

LSBÆ

K O

TTOSEN

SPAR

SE NO

NSTATIO

NA

RY GA

BO

R EXPA

NSIO

NS

ISSN (online): 2446-1636 ISBN (online): 978-87-7210-153-8