DIPLOMARBEIT
Titel der Diplomarbeit
Gabor Analysis of Structured Sparsity and some Applications
Verfasser
Dominik Fuchs
angestrebter akademischer Grad
Magister der Naturwissenschaften (Mag. rer. nat.)
Studienkennzahl: A 405
Studienrichtung: Mathematik
Betreuer: Ao. Univ.-Prof. Dr. Hans Georg Feichtinger
Wien, April 2013
Acknowledgments
I want to thank my supervisor Prof. Hans Georg Feichtinger from the Numerical Harmonic Analysis Group (NuHAG) for his guidance, support and experienced advice during my work.
Special thanks to Monika Dörfler for giving me an understanding of Structured Sparsity and for her time and patience while guiding me through this theory, to Kai Siedenburg for his help concerning MATLAB™ routines, and to Markus Schaner for sharing his experience in the field of audio engineering with me.
Last but not least I want to thank my study colleagues and friends as well as my beloved
family for their support and for believing in me.
I should also thank my guitar which always psyched me up when I was dejected.
Abstract
“Thresholding” is well known in the field of audio engineering and is often used to obtain noise-suppressed signals. There are many possibilities to secure the analog path of a signal and erase the disturbing noise, but once the signal has already been recorded, it becomes a very difficult task to suppress resp. remove the noise afterwards. This motivates the idea for this thesis, which is mainly based on [Siedenburg1]. This thesis differs from the latter mainly in the aspect that the access to structured sparsity is simplified for newcomers to signal processing, i.e. more prior knowledge will be elucidated. In addition a new operator type called empirical Wiener estimation is introduced.
‘Sparsity’ is a very new field in mathematics which provides powerful tools for applications in signal processing. Denoising is the most researched application for this, but it turns out that structured sparsity entails many other advantages which offer solutions for further applications such as declipping or signal decomposition. In this context the background of sparsity is the representation of a signal using as few non-zero coefficients as possible. It stands to reason to connect this task with Gabor analysis, i.e. with frames, since this field is based on redundant representations of signals. Frames generalize bases with the consequence of non-uniqueness while the energy stays preserved.
The first chapter gives a fast overview of the mathematical tools needed to understand the basics of audio processing. The focus is on the Fourier transform, which maps a signal into its frequency space, i.e. a more exact investigation of signals concerning their inherent frequencies can be done. Furthermore a typical application in audio engineering is explained to show the importance of convolution, which is an operation for filtering a signal. Finally the essential short-time Fourier transform is explained. It uses window functions shifted over the signal in order to perform a FT in each step and build an image in which time and frequency are juxtaposed.
One special kind of STFT is the Gabor transform, which is discussed in Chapter 2. Gabor analysis uses frames instead of bases. The dictionary, which is an alternative name for a frame, is built from time-frequency shifts. It has turned out that windowed Fourier and cosine bases perform very well in audio processing and are additionally easy to interpret. Since frames are redundant, they are a good playground for structured sparsity.
The third chapter gives an understanding of sparsity, regularization and thresholding, first for bases in order to generalize them subsequently to frames. This chapter is all about the minimization of the so-called Lagrangian. This problem contains the two aspects of minimizing the discrepancy of the synthesis and minimizing the number of non-zero coefficients depending on a threshold function. For the latter problem the ℓ1 norm can be used, which yields the Lasso (least absolute shrinkage and selection operator), but it is shown that replacing this norm by so-called weighted mixed norms provides further flexibility and opportunities. Finally the ISTA (iterative soft-thresholding algorithm) and its improved version FISTA are introduced for frames, while in the case of a basis only one thresholding step is necessary.
In the fourth chapter an additional improvement is presented which includes the coefficients' neighborhoods. They can be taken into account by convolution with the threshold function. As an alternative to the discussed operators, the empirical Wiener estimation, which is examined at the end of this chapter, works with estimation risks between the original and the reconstructed signal. Additionally we obtain some suggestions for choosing the threshold level.
At the end of this thesis, in Chapters 5 and 6, some examples of applications are presented as well as a small outlook on further research possibilities.
Some papers and experiments as well as the download link for the “StrucAudioToolbox”
used in this thesis are collected on the webpage:
http://homepage.univie.ac.at/monika.doerfler/StrucAudio.html.
Zusammenfassung
“Thresholding” ist im Gebiet der Tontechnik weit verbreitet und wird oft benutzt, um rauschunterdrückte Signale zu erhalten. Es gibt viele Möglichkeiten, den analogen Weg eines Signals zu sichern und das störende Rauschen zu entfernen. Ist das Signal aber bereits aufgenommen, wird es eine schwierige Aufgabe, das Rauschen im Nachhinein zu unterdrücken bzw. herauszunehmen. Dies motiviert die Idee hinter dieser Arbeit, die sich an [Siedenburg1] orientiert. Diese Arbeit unterscheidet sich von letzterer hauptsächlich in dem Aspekt, dass der Zugang zu Structured Sparsity für Neueinsteiger in der Signalverarbeitung erleichtert wird, im Speziellen werden mehr Vorkenntnisse erläutert. Zusätzlich wird ein neuer Operatortyp namens ‘Empirical Wiener Estimation’ eingeführt.
‘Structured Sparsity’ ist ein neues Gebiet der Mathematik, das mächtige Werkzeuge für Anwendungen in der Signalverarbeitung zur Verfügung stellt. Entrauschung ist die am meisten untersuchte Aufgabe dafür, aber es wird sich herausstellen, dass ‘Structured Sparsity’ viele Vorteile mit sich bringt, die Lösungen für weitere Anwendungen wie ‘Declipping’ oder Signalzerlegungen bieten. In diesem Zusammenhang besteht der Hintergrund von ‘Sparsity’ darin, ein Signal durch Verwendung von so wenigen Nicht-Null-Koeffizienten wie möglich darzustellen. Es macht Sinn, diese Aufgabe mit ‘Gabor Analysis’, insbesondere ‘Frames’, in Verbindung zu bringen, da dieses Gebiet auf redundanten Signaldarstellungen basiert. Frames verallgemeinern Basen mit der Folge der Nicht-Eindeutigkeit, während aber die Energie erhalten bleibt.
Das erste Kapitel gibt einen schnellen Überblick über mathematische Methoden, um die Grundlagen der Audioverarbeitung zu verstehen. Das Hauptaugenmerk liegt auf der Fourier-Transformation, die ein Signal in seinen Frequenzraum abbildet, oder genauer: wir können damit Signale bezüglich ihrer enthaltenen Frequenzen genauer untersuchen. Des Weiteren wird eine typische Anwendung aus der Tontechnik erklärt, um die Bedeutsamkeit der Faltung zu zeigen, welche eine Operation zur Filterung von Signalen ist. Schließlich wird die essentielle Short-Time-Fourier-Transformation erklärt. Diese benutzt Fensterfunktionen, die über das Signal geschoben werden, um dann die FT in jedem Schritt durchzuführen, sodass ein Bild erzeugt wird, in dem Zeit und Frequenz gegenübergestellt werden.
Eine spezielle Form der STFT ist die Gabor-Transformation, die in Kapitel 2 behandelt wird. Gabor Analysis benutzt Frames anstelle von Basen. Das ‘Dictionary’, das eine alternative Bezeichnung für Frame ist, wird durch Zeit-Frequenz-Verschiebungen konstruiert. Es hat sich herausgestellt, dass gefensterte Fourier- und Kosinus-Basen sich sehr gut in der Audioverarbeitung behaupten und dazu noch einfach zu interpretieren sind. Da Frames redundant sind, bieten sie eine gute Grundlage für ‘Structured Sparsity’.
Das dritte Kapitel vermittelt das Grundverständnis von ‘Sparsity’, ‘Regularization’ und ‘Thresholding’, zuerst für Basen, um sie in Folge für Frames zu verallgemeinern. Dieses Kapitel dreht sich um das Minimierungsproblem des sogenannten ‘Lagrangian’. Dieses Problem beinhaltet die zwei Aspekte der Minimierung der Diskrepanz der Synthese wie auch die Minimierung der Anzahl der Nicht-Null-Koeffizienten, die von der ‘Threshold-Funktion’ abhängt. Für das letztere Problem kann die ℓ1-Norm benutzt werden, woraus sich das Lasso (‘least absolute shrinkage and selection operator’) ergibt, aber es zeigt sich, dass sich weitere Flexibilität und Möglichkeiten ergeben, indem wir diese Norm durch sogenannte gewichtete gemischte Normen ersetzen. Schließlich werden der ISTA (‘iterative soft-thresholding algorithm’) und seine verbesserte Version FISTA für Frames eingeführt, während im Fall von Basen nur ein ‘Thresholding’-Schritt notwendig ist.
Im vierten Kapitel wird eine zusätzliche Verbesserungsmöglichkeit präsentiert, welche die Nachbarschaft der Koeffizienten miteinbezieht. Diese können durch Faltung mit der ‘Threshold-Funktion’ miteinberechnet werden. Als Alternative zu den behandelten Operatoren arbeitet die Empirische Wiener-Schätzung, die am Ende dieses Kapitels betrachtet wird, mit dem Schätzrisiko zwischen dem Original und dem rekonstruierten Signal. Zusätzlich erhalten wir ein paar Vorschläge, den ‘Threshold Level’ zu wählen.
Am Ende dieser Arbeit, in den Kapiteln 5 und 6, werden ein paar Beispiele für Anwendungen präsentiert wie auch ein kleiner Ausblick auf Untersuchungsmöglichkeiten.
Einige Artikel und Experimente, wie auch der Downloadlink für die “StrucAudioToolbox”, die in dieser Arbeit benutzt wird, wurden auf folgender Webseite gesammelt:
http://homepage.univie.ac.at/monika.doerfler/StrucAudio.html.
Contents
1 Basics 11
1.1 Banach and Hilbert Spaces 11
1.2 Introduction to Signals and Processing 14
2 Introduction to Gabor Analysis 23
2.1 Frames and Bases 25
2.2 Gabor Frames 27
2.3 Discrete Gabor analysis 29
3 Structured Sparse Recovery 31
3.1 Sparsity 32
3.2 Sparse Regularization 33
3.3 Thresholding with Mixed Norms 35
3.4 Thresholding Algorithms for Frames 43
4 Improvements and Threshold Selection 49
4.1 Persistence and Neighborhoods 49
4.2 Empirical Wiener Estimation 52
5 Applications and Experiments 55
6 Conclusion 65
Bibliography 67
Appendix 71
Curriculum Vitae 77
List of Figures
1.1 Sine and cosine oscillation with frequency k and wavelength λ 14
1.2 Frequency-shift: sine oscillation with original and doubled frequency 15
1.3 Time-shift: original sine oscillation and phase-shifted signal 15
1.4 Logarithmic sine sweep with frequency 20-22000 Hz 18
1.5 System chain for generating an impulse. The grey boxes show the system components 19
1.6 Signal with 3 different frequencies represented in the time label. [Dopfner] 20
1.7 Signal from 1.6 represented in the frequency label. [Dopfner] 20
1.8 Signal from 1.6 represented in the time-frequency plane. [Dopfner] 21
1.9 Signal short-time-Fourier transformed with a wide resp. a small window. [Dopfner] 22
2.1 Chamber pitch ‘a’ with 440 Hz 23
2.2 Spectrogram of the famous Star Wars Theme 24
3.1 Unit balls of ℓq for: (a) q = 0, (b) q = 0.5, (c) q = 1, (d) q = 1.5, (e) q = 2 34
3.2 Unit balls of the mixed norms, with horizontal group Γ1 = {x, y} and elevation group Γ2 = {z}. Top left: ℓ1 (Lasso), top right: ℓ2 (Tykhonov); bottom left: ℓ2,1 (Group-Lasso), bottom right: ℓ1,2 (Elitist-Lasso). [Siedenburg1] 36
3.3 (a) Standard Gaussian, (b) Standard Gaussian with thresholding value λ = 0.2, (c) after soft-thresholding, (d) after hard-thresholding 38
3.4 Sketch of the stepwise intended partition of Γ 44
4.1 Sketch of different neighborhoods. Rectangular or triangular windows can be used as well as differently chosen centers 50
4.2 Non-rectangular neighborhood. Not implemented in the Toolbox 51
5.1 Clean, noisy and denoised signal coefficients 56
5.2 Transformation in every step with bad shift causes increasing number of coefficients and therefore noise 57
5.3 Different shift values for fixed window length w = 1024 58
5.4 Lasso with alternating shift 58
5.5 Relative error in each iteration step for different shrinkage levels 59
5.6 Table of necessary iteration steps until algorithm aborts 59
5.7 SNR in each iteration step for different shrinkage levels 60
5.8 Reconstructed voice with loud background 61
5.9 Reconstructed voice with loud background; group label in time 62
5.10 Multi-layered decomposition from noisy ‘musical clock’ 63
1 Basics
We will start with some basic knowledge of signals. This chapter deals with the definition of a signal in the one-dimensional case, together with its most important properties, and leads to the most important tools of simple signal and image processing. For this we first need some statements and definitions from functional analysis, in order to know on which spaces we operate. These results can be found in nearly every work covering the field of functional analysis; a good reference book is e.g. [Heuser].
1.1 Banach and Hilbert Spaces
We consider a vector space over the scalar field K. In this work the relevant scalar fields are ℝ and ℂ, hence we assume K to be one of them.
Definition 1.1. A mapping ‖.‖ : V → R+ with the properties
• ‖x‖ = 0⇒ x = 0
• ‖α · x‖ = |α| · ‖x‖
• ‖x+ y‖ ≤ ‖x‖+ ‖y‖
for all vectors x, y ∈ V and scalars α ∈ K, is called a norm.

A vector space V together with a norm is called a normed space, denoted (V, ‖.‖V).
Definition 1.2. For a subset W ⊆ V of a vector space we call
(i) W a subspace of V, if for all x, y ∈ W and α ∈ K we have x + y ∈ W and αx ∈ W.
(ii) W a dense subspace of V, if for any x ∈ V there is a sequence a_n ∈ W with lim_n a_n = x.
(iii) the linear span of W, span(W), the smallest subspace of V which contains W. This coincides with the set of all finite linear combinations of elements in W. The closed linear span of W is the closure of span(W).
Definition 1.3. A normed space (X, ‖.‖) is said to be separable if there exists a countable dense subset Y ⊆ X.
Definition 1.4. Let a_n be a sequence in V.
(i) a_n converges to a in V, if for all ε > 0 there exists N ∈ ℕ such that ‖a_n − a‖ < ε for all n > N.
(ii) a_n is a Cauchy sequence in V, if for all ε > 0 there exists N ∈ ℕ such that ‖a_n − a_m‖ < ε for all n, m > N.
Definition 1.5 (Banach Space). Let (B, ‖.‖_B) be a normed space. B is said to be complete if every Cauchy sequence in B converges in B. A Banach space B is a complete normed space. It is called separable if it contains a countable dense subset.
Two examples of important Banach spaces in the context of the basic Fourier transform are:

(i) ℓ^p(I) := {a = (a_i)_{i∈I} : a_i ∈ ℂ, ‖a‖_p < ∞} with ‖a‖_p := (∑_{i∈I} |a_i|^p)^{1/p}. We call this the space of p-summable sequences. We obtain a special case by setting p = ∞, the space of all bounded sequences ℓ^∞(I), equipped with the infinity norm ‖a‖_∞ := sup_{i∈I} |a_i|.

(ii) L^p(ℝᵈ) := {f : ‖f‖_p < ∞} with ‖f‖_p := (∫_{ℝᵈ} |f(x)|^p dx)^{1/p}. These Banach spaces are called the spaces of p-integrable functions, referring to the Lebesgue integral.
We will see that some of the most important signal spaces are Banach spaces, as for example Feichtinger's algebra S₀(ℝᵈ) from Definition 1.16.
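For finite sequences the p-norms above are easy to compute and compare. Although the computations in this thesis are done in MATLAB™, the following sketch uses Python/NumPy as a stand-in; the vector a and the exponents are hypothetical choices, made only to illustrate the definitions (including the fact that ‖a‖_p approaches the sup-norm ‖a‖_∞ as p grows).

```python
import numpy as np

a = np.array([3.0, -4.0, 1.0, 0.5])

def lp_norm(a, p):
    """p-norm of a finite sequence: ||a||_p = (sum_i |a_i|^p)^(1/p)."""
    return np.sum(np.abs(a) ** p) ** (1.0 / p)

# The infinity norm ||a||_inf = sup_i |a_i| is the limit of ||a||_p for p -> inf.
sup_norm = np.max(np.abs(a))

n1 = lp_norm(a, 1)      # = 3 + 4 + 1 + 0.5 = 8.5
n2 = lp_norm(a, 2)      # = sqrt(26.25)
n100 = lp_norm(a, 100)  # already very close to sup_norm = 4
```

For this vector ‖a‖₁ = 8.5, ‖a‖₂ = √26.25 ≈ 5.1235, and ‖a‖₁₀₀ agrees with ‖a‖_∞ = 4 to many digits.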
Definition 1.6 (Inner Product). For a vector space V over ℝ resp. ℂ, we call a mapping 〈.|.〉 : V × V → K an inner product, if it has the following properties:

• 〈v|w〉 = \overline{〈w|v〉}, where the bar denotes complex conjugation
• 〈v₁ + λv₂|w〉 = 〈v₁|w〉 + λ〈v₂|w〉
• 〈v|v〉 ≥ 0, and 〈v|v〉 = 0 ⇔ v = 0

for vectors v, w ∈ V and scalars λ ∈ K.
Definition 1.7 (Hilbert Space). A vector space V together with an inner product 〈.|.〉 is called an inner product space. With the induced norm ‖x‖_V := √〈x|x〉, V is also a normed space. An inner product space that is additionally complete with respect to the induced norm is called a Hilbert space.
In fact Hilbert spaces are special cases of Banach spaces. We can also see this fact by considering two very important examples.

(i) ℓ²(I) := {x = (x_i)_{i∈I} : x_i ∈ ℂ, ∑_{i∈I} |x_i|² < ∞} with the inner product 〈x|y〉 := ∑_{i∈I} x_i \overline{y_i}.

(ii) L²(ℝᵈ) := {f : ∫_{ℝᵈ} |f|² < ∞} with the inner product 〈f|g〉 := ∫_{ℝᵈ} f(x) \overline{g(x)} dx.
The most important Hilbert spaces we will need in the field of structured sparsity are the synthesis space H_s and the coefficient space H_c. But first we will discuss some other very important spaces in the context of the Fourier transform and the Short-Time Fourier transform, such as the space S₀(ℝᵈ).
Definition 1.8 (Bounded linear operators). Let V, W be normed spaces. A linear operator T : V → W with finite operator norm

|||T||| := sup_{‖x‖_V ≤ 1} ‖Tx‖_W = sup_{‖x‖_V = 1} ‖Tx‖_W < ∞

is called bounded. We denote the set of these operators by L(V, W).
Theorem 1.1. Let (V, ‖.‖_V) and (W, ‖.‖_W) be normed spaces over K and T : V → W a linear operator. Then the following statements are equivalent:
(i) T is continuous.
(ii) T is continuous in 0.
(iii) T is bounded.
Proof: Can be found in [Heuser].
Definition 1.9 (Self-adjoint Operator). Let V, W be Hilbert spaces and T ∈ L(V, W). Then there exists a unique operator T* ∈ L(W, V) satisfying

〈Tx|y〉_W = 〈x|T*y〉_V for all x ∈ V, y ∈ W.

T* is called the adjoint operator of T. In the case V = W, if T = T*, then T is called self-adjoint.
1.2 Introduction to Signals and Processing
In this section the fundamental properties of signals and of their processing are given. The definitions and results are based on [Haltmeier] and [Grochenig].

A 1-dimensional signal is a mapping

f : D → ℂ

where D ⊂ ℝ. The choice of D determines which kind of signal we observe:

• For D = ℝ, f is a continuous signal.
• For D = [a, b] bounded, f : [a, b] → ℂ is a finite signal.
• For D = ℤ, f : ℤ → ℂ is a discrete signal.

Analogously, we can look at signals of higher dimension. For example, dimension 2 gives us an image. In this work we consider only 1-dimensional signals for audio processing.
Two of the simplest signals we can think of are the sine and cosine functions.
Figure 1.1: Sine and cosine oscillation with frequency k and wavelength λ
It is important to understand how and with which parameters a signal is built, so we will take a short look at these. We describe signals by the following values:
• Wavelength: after a distance λ the function repeats itself, i.e. one period has length λ.

• Frequency: defined as k = 1/λ, the number of oscillation periods in [0, 1] (also known as wavenumber). See Figure 1.2.

• Phase: φ = (Δx/λ) · 2π is a constant we add to the argument to get a shift in time. See Figure 1.3.

• Amplitude: by multiplying with the parameter r we scale the function (change the volume of the sound).
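These four parameters can be made concrete in a few lines of code. The thesis works in MATLAB™; the following Python/NumPy sketch is an equivalent stand-in, and all parameter values (400 Hz, amplitude 0.5, phase π/2, sampling rate 8000 Hz) are hypothetical choices picked only so that one wavelength is a whole number of samples.

```python
import numpy as np

# Hypothetical parameter choices, to make the role of each value visible.
k   = 400.0          # frequency in Hz (number of periods per second)
lam = 1.0 / k        # wavelength (period): the signal repeats after lam seconds
r   = 0.5            # amplitude: scales the function (volume of the sound)
phi = np.pi / 2      # phase: constant added to the argument, a shift in time

fs = 8000                      # sampling rate in Hz
t = np.arange(fs) / fs         # one second of time samples
f = r * np.sin(2 * np.pi * k * t + phi)

# Periodicity: shifting by one wavelength (lam * fs = 20 samples) reproduces f.
shift = int(round(lam * fs))
```

Comparing `f[shift:]` with `f[:-shift]` confirms the periodicity, and the maximum of |f| equals the amplitude r.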
Figure 1.2: Frequency-shift: sine oscillation with original and doubled frequency
Figure 1.3: Time-shift: original sine oscillation and phase-shifted signal
We see that we can modify a given signal with simple operations. The operators realizing
these shifts are defined in the following way.
Definition 1.10. Let x, ω ∈ ℝᵈ.
(i) The translation operator (time shift) T_x : L²(ℝᵈ) → L²(ℝᵈ) is defined by

T_x f(t) := f(t − x) (1.1)

(ii) The modulation operator (frequency shift) M_ω : L²(ℝᵈ) → L²(ℝᵈ) is defined by

M_ω f(t) := e^{2πitω} f(t) (1.2)
They satisfy the commutation relation

T_x M_ω f(t) = e^{−2πixω} M_ω T_x f(t).
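The commutation relation can be checked numerically in the discrete, cyclic setting (translation becomes a cyclic shift, modulation a pointwise multiplication by a discrete exponential). This Python/NumPy sketch is a stand-in for the thesis's MATLAB™ environment; the signal length and the shift parameters are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)

def T(x, f):
    """Cyclic translation (time shift) by x samples: (T_x f)(n) = f(n - x)."""
    return np.roll(f, x)

def M(w, f):
    """Modulation (frequency shift) by w bins: (M_w f)(n) = e^{2πi n w / N} f(n)."""
    n = np.arange(len(f))
    return np.exp(2j * np.pi * n * w / len(f)) * f

x, w = 5, 7
lhs = T(x, M(w, f))                                    # T_x M_w f
rhs = np.exp(-2j * np.pi * x * w / N) * M(w, T(x, f))  # e^{-2πi x w / N} M_w T_x f
```

`lhs` and `rhs` agree, while `T(x, M(w, f))` and `M(w, T(x, f))` themselves differ by exactly that phase factor: the two shift operators do not commute.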
The combination of these two operators will later be used to define a so-called 'atom'. Atoms are the smallest building blocks for the Gabor transform discussed in Chapter 2; that is the reason for the choice of their name. For completeness, we define the time-frequency shift operator.

Definition 1.11. Let λ = (x, ω) ∈ ℝᵈ × ℝᵈ = ℝ²ᵈ. The time-frequency shift π(λ) : L²(ℝᵈ) → L²(ℝᵈ) is defined by

π(λ) := M_ω T_x.

Therefore

π(λ₂)π(λ₁) = e^{−2πix₂ω₁} π(λ₁ + λ₂).
Of course this is not sufficient to do useful processing. We need more information about
the signal. This leads us to the most important definition to deal with signals, the
Fourier transform.
Definition 1.12 (Fourier transform). For f ∈ L¹(ℝᵈ) and ω ∈ ℝᵈ we set

(Ff)(ω) := f̂(ω) = ∫_{ℝᵈ} f(x) e^{−2πixω} dx. (1.3)

The mapping f̂ : ω ↦ f̂(ω), f ∈ L¹(ℝᵈ), is called the Fourier transform of f.
We observe signals in terms of complex analysis, thus each point x ∈ ℝᵈ is associated to a complex value f(x) ∈ ℂ. For applications it might be more useful to consider a signal as a mapping f : time → amplitude. To get the additional information of frequency, the Fourier transform has been developed. This tool gives us the possibility to pass from the time domain to a frequency domain and change our mapping into f : frequency → magnitude. This tells us which frequencies occur in the whole signal and what their magnitudes are.
Remark 1.2.1. The time values t are measured in seconds and the frequency values ω inhertz.
Remark 1.2.2. Note that the frequency space tells us which frequencies generally exist,but not at which time! For this task consider the Short-Time Fourier transform below.
In other words we could say we partition our signal into its basic oscillations. The proofs
for all following results can be found in [Haltmeier].
Theorem 1.2. The Fourier transform

F : L¹(ℝᵈ) → C₀(ℝᵈ), f ↦ f̂

is well-defined, linear and continuous, and f̂ vanishes at infinity. Furthermore it satisfies the inequality

‖f̂‖_∞ ≤ ‖f‖₁.
Theorem 1.3 (Inversion Formula). For f ∈ L¹(ℝᵈ), if f̂ ∈ L¹(ℝᵈ), then

f(x) = ∫_{ℝᵈ} f̂(ω) e^{2πixω} dω (1.4)

in all points where f is continuous.
It turns out that by extending F to a unitary operator on L²(ℝ) we get an energy-preserving operation on signals. This is ensured by the theorem of Plancherel.

Theorem 1.4 (Plancherel). For f ∈ L¹(ℝ) ∩ L²(ℝ),

‖f‖₂ = ‖f̂‖₂,

and for f, g ∈ L²(ℝ) Parseval's formula holds:

〈f|g〉 = 〈f̂|ĝ〉.
The Fourier transform is usually first defined on L¹. By Plancherel's theorem it is shown that we can extend it to L². It is even possible to extend the definition to Lᵖ or to even bigger classes of functions, e.g. the tempered distributions (the dual of the Schwartz class). In [Feichtinger2] the Fourier transform is first defined for bounded measures in order to cover the full background. This one is often referred to as the Fourier-Stieltjes transform, since it can be performed over ℝ by using Riemann-Stieltjes integrals.
We should keep in mind that for applications we always use the discrete analogue. Recalling (1.3), the Fourier transform on ℂⁿ is normalized as a unitary operator ν ↦ ν̂ in ℂⁿ,

ν̂(k) = (1/√n) ∑_{l=0}^{n−1} ν(l) e^{−2πikl/n} (1.5)

for k = 0, ..., n − 1, which is called the discrete Fourier transform (DFT). An algorithm has been implemented to compute the discrete Fourier transform in a very fast way; this algorithm is called the Fast Fourier Transform (FFT).
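The unitary normalization in (1.5) can be checked against a standard FFT routine. The thesis works in MATLAB™; the Python/NumPy sketch below is an equivalent stand-in (note that NumPy's `fft`, like MATLAB's, omits the 1/√n factor, so the unitary DFT is obtained by dividing by √n). The vector length n = 8 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Unitary DFT exactly as in (1.5): v_hat(k) = n^{-1/2} sum_l v(l) e^{-2πikl/n}.
k = np.arange(n)
F = np.exp(-2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)   # DFT matrix
v_hat = F @ v

# NumPy's FFT uses the non-normalized convention, so divide by sqrt(n).
same = np.allclose(v_hat, np.fft.fft(v) / np.sqrt(n))

# Plancherel: the unitary DFT preserves the energy (2-norm) of the signal.
energy_preserved = np.isclose(np.linalg.norm(v_hat), np.linalg.norm(v))
```

Both checks succeed: the matrix form of (1.5) matches the scaled FFT, and the 2-norm is preserved, which is the discrete counterpart of Theorem 1.4.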
Definition 1.13 (Convolution). The convolution of two functions f, g ∈ L¹(ℝᵈ) is defined by

(f ∗ g)(x) := ∫_{ℝᵈ} f(y) g(x − y) dy (1.6)

and satisfies

‖f ∗ g‖₁ ≤ ‖f‖₁ ‖g‖₁

and

F(f ∗ g) = Ff · Fg. (1.7)

Remark 1.2.3. The convolution is commutative:

(f ∗ g)(x) := ∫_{ℝᵈ} f(y) g(x − y) dy = ∫_{ℝᵈ} f(x − y) g(y) dy.
Equation (1.7) tells us that convolution in the time domain corresponds to multiplication in the Fourier/frequency domain. The same statement holds with the roles reversed: multiplication in time corresponds to convolution in frequency.
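The convolution theorem (1.7) holds exactly in the discrete, cyclic setting, where it can be verified directly. The following Python/NumPy sketch (a stand-in for the thesis's MATLAB™ routines; the length n = 64 and the random signals are arbitrary) computes a cyclic convolution by its definition and compares its DFT with the product of the individual DFTs.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 64
f = rng.standard_normal(n)
g = rng.standard_normal(n)

# Cyclic convolution, the discrete analogue of (1.6):
# (f * g)(x) = sum_y f(y) g(x - y), with indices taken mod n.
conv = np.array([np.dot(f, g[(x - np.arange(n)) % n]) for x in range(n)])

# Convolution theorem (1.7): FFT of the convolution equals the product of FFTs.
theorem_holds = np.allclose(np.fft.fft(conv), np.fft.fft(f) * np.fft.fft(g))
```

In practice one exploits this in the reverse direction: instead of the O(n²) sum above, a convolution is computed as `ifft(fft(f) * fft(g))` in O(n log n) operations.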
The convolution is a very important tool for modifying a signal. A well known example is the so-called "convolution reverb". In this special case a (music) signal gets convolved with an impulse obtained from the recorded response of a room, which captures its reverb characteristics, giving us the original signal with additional computed reverb. In fact many plug-ins for audio recording and engineering programs work with impulses obtained this way. The following example shows in more detail how this can be used in the field of audio engineering.
Example 1.2.1. It is possible to generate an impulse of an existing system in order to simulate a 'real-life' system. To realize this one can use for example the freeware "Voxengo Deconvolver" (http://www.voxengo.com). This program gives us the opportunity to capture the system behavior, e.g. the before-mentioned convolution reverb or the frequency response of a guitar power amp with speaker cabinet. For demonstration purposes we take a look at the latter.
First we build a sine sweep with the help of the Voxengo Deconvolver (it is also possible to compute it in MATLAB™ with the routine 'chirp') and import the obtained audio file into the digital audio workstation. A sine sweep is a sine oscillation whose frequency increases over time; here we sweep with logarithmic behavior through the human audible range from 20 Hz to 22 kHz. The corresponding spectrogram (see short-time Fourier transform) is shown in Figure 1.4.
Figure 1.4: Logarithmic sine sweep with frequency 20-22000 Hz.
Figure 1.5: System chain for generating an impulse. The grey boxes show the system components.
Figure 1.5 lists the ordered components (and the reference models used for this experiment) that can be used to generate an impulse. It is necessary to have a full duplex audio interface for playing and recording audio simultaneously. The recorded sweep response is then used to create the intended impulse by deconvolution of the sweep with the sweep response. That means for the sine sweep s and its response h, we are looking for the impulse g in

s ∗ g = h.

Thus in plug-ins like "Poulin LeCab" (http://lepouplugins.blogspot.co.at/) our impulse g can be loaded in order to be convolved with every recorded signal f. By performing this convolution f ∗ g we have finally found a way to simulate the system characteristics for a signal. So when a suited impulse is given, this is the most promising opportunity to get non-linear sounds, e.g. guitar distortion, in the digital way.
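By the convolution theorem, solving s ∗ g = h for g amounts to a division in the frequency domain, which is essentially what deconvolution tools do. The following Python/NumPy sketch illustrates this in a simplified cyclic setting; the "sweep" s is idealized as a signal with flat spectrum (|ŝ| = 1, so the division is well conditioned — a real tool must regularize near-zero bins), and the impulse response g is a hypothetical toy example, not data from the experiment.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 256

# Idealized "sweep" s with flat spectrum: a sweep excites all frequencies,
# here we enforce |fft(s)| = 1 exactly via random unit-modulus phases.
s = np.fft.ifft(np.exp(2j * np.pi * rng.random(n)))

# A short hypothetical impulse response g of the system.
g = np.zeros(n, dtype=complex)
g[:5] = [1.0, 0.5, 0.25, 0.1, 0.05]

# The recorded "sweep response" h = s * g (cyclic convolution via the FFT).
h = np.fft.ifft(np.fft.fft(s) * np.fft.fft(g))

# Deconvolution: solve s * g = h for the impulse by division in frequency.
g_rec = np.fft.ifft(np.fft.fft(h) / np.fft.fft(s))
```

Here `g_rec` recovers g up to floating-point error; with a measured (noisy) response the division must be damped, e.g. by Wiener-type regularization.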
Now it is time to fish for more information out of our given signal. We have seen so far that we are able to look at the signal either in time or in frequency. Although it is good to know which frequencies appear, it would also be nice to know when these frequencies occur. This makes sense since, for example, if we have a piano piece, we want to know when a tone 'C' is played, not just how often it is played in this piece. In order to obtain this information, Time-Frequency analysis is the right setting for us.
The idea is quite simple: localize f in time at a point x by multiplying with a suitable window function g centered at x, and then perform the already known Fourier transform. In this way it should be possible to obtain information about the frequencies contained in the area around x. Bringing this into a mathematical statement, we get the Short-Time Fourier transform.
Definition 1.14 (Short-Time Fourier transform). Let g ∈ L²(ℝᵈ), g ≠ 0, be fixed. Then the short-time Fourier transform (STFT) of a function f with respect to g is defined as

V_g f(x, ω) := F(f · \overline{T_x g})(ω) (1.8)
= ∫_{ℝᵈ} f(t) \overline{g(t − x)} e^{−2πitω} dt (1.9)
= ∫_{ℝᵈ} f(t) \overline{π(λ)g(t)} dt (1.10)

for x, ω ∈ ℝᵈ and λ = (x, ω).
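A minimal discrete version of (1.9) just slides the window over the signal and applies an FFT to each windowed segment. The Python/NumPy sketch below is a stand-in for the thesis's MATLAB™/LTFAT routines; the window length, hop size and Gaussian sampling grid are arbitrary illustrative choices, not the thesis's settings.

```python
import numpy as np

def stft(f, win, hop):
    """Discrete STFT: slide the window over f and FFT each windowed segment."""
    w = len(win)
    starts = range(0, len(f) - w + 1, hop)
    return np.array([np.fft.fft(f[s:s + w] * win) for s in starts])

fs = 8000
t = np.arange(fs) / fs
f = np.sin(2 * np.pi * 440 * t)                     # one second of a 440 Hz tone

w = 256
win = np.exp(-np.pi * np.linspace(-3, 3, w) ** 2)   # sampled Gaussian window
V = stft(f, win, hop=64)                            # rows: time, columns: frequency

# For a pure 440 Hz tone, |V| should peak near bin 440 / fs * w ≈ 14 in each row.
peak_bins = np.argmax(np.abs(V[:, : w // 2]), axis=1)
```

The magnitude |V| (usually displayed in decibels) is exactly the spectrogram shown in the figures of this chapter; here the peak stays at the same frequency bin in every time frame, since the tone is stationary.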
Figure 1.6: Signal with 3 different frequencies represented in the time label. [Dopfner]
Figure 1.7: Signal from 1.6 represented in the frequency label. [Dopfner]
Figure 1.8: Signal from 1.6 represented in the time-frequency plane. [Dopfner]
What the different representations of a signal look like is shown in Figures 1.6, 1.7 and 1.8. The x-axis corresponds to time, the y-axis to frequency, and each pixel represents the value of the coefficient's energy in decibel (= 10 log₁₀(c_{k,j}) in the discrete Gabor notation discussed in Chapter 2). For the sake of continuity and to avoid unwanted noise the standard window used for the STFT is the smooth Gaussian. Furthermore the Gaussian minimizes the uncertainty principle mentioned below.
Definition 1.15 (Gaussian). Let

φ_a(x) = e^{−πx²/a} (1.11)

denote the non-normalized Gaussian function with width a > 0, a ∈ ℝ.
So does this give us the full and exact information about time and frequency? Unfortunately not. There is a theorem about the relation of time and frequency in the signal's picture, called the uncertainty principle.
Theorem 1.5 (Uncertainty Principle). For f ∈ L²(ℝ) and a, b ∈ ℝ,

(∫_ℝ (x − a)² |f(x)|² dx)^{1/2} (∫_ℝ (ω − b)² |f̂(ω)|² dω)^{1/2} ≥ (1/(4π)) ‖f‖₂² (1.12)

with equality if and only if f is a multiple of T_a M_b φ_c(x) = e^{2πib(x−a)} · e^{−π(x−a)²/c} for some a, b ∈ ℝ and c > 0.
Proof: See [Grochenig].
The consequences of the uncertainty principle are represented in Figure 1.9. If we choose a wide window, the frequency approximation is more accurate but unfortunately the accuracy in time gets smeared. Conversely, for small windows the time representation gets better and the frequency representation suffers.
The last definition in this chapter, which will not be discussed here in detail, is Feichtinger's algebra S₀(ℝᵈ) ⊂ L², which is the appropriate window class for time-frequency analysis. For more information, see [FeichGroch] or [Feichtinger2].
Figure 1.9: Signal short-time-Fourier transformed with a wide resp. a small window. [Dopfner]
Definition 1.16. Let g be the normalized Gaussian g(x) = φ₁(x) = e^{−πx²}; then the so-called Feichtinger's algebra S₀(ℝᵈ) is defined by

S₀(ℝᵈ) := {f ∈ L¹(ℝᵈ) : ‖V_g f‖_{L¹(ℝᵈ×ℝᵈ)} < ∞}. (1.13)
Remark 1.2.4. This definition works equivalently when an arbitrary fixed Schwartz-class window g ≠ 0 is used. See [Dopfner].
We will stop here, since it was our aim to give a fundamental comprehension of signals. Further important definitions and properties, e.g. for the Schwartz space S(ℝᵈ) resp. S₀(ℝᵈ), can be found in [Grochenig]. Another important tool would be the Banach-Gelfand triple, which is discussed for example in [CorFeiLu], [Feichtinger1] or [Bannert]. We jump to the next chapter, discussing the fundamentals of Gabor analysis, which provides the transform used in the further context of structured sparsity.
2 Introduction to Gabor Analysis
We learned in the first chapter about the Short-Time Fourier transform, which represents a signal simultaneously in both time and frequency. We will now discuss a specialization of this, the so-called Gabor analysis theory. The further theory of structured sparsity will be discussed on the "playground" of Gabor analysis. Of course this field is very copious, so we try to concentrate on the most important definitions and their fundamental grasp. To get a short introduction, we show one of the implementations from Peter Søndergaard's 'LTFAT' toolbox (http://ltfat.sourceforge.net/). The contained function 'sgram', which provides the spectrogram of a signal with respect to its settings, uses a discrete Gabor transform. The following example uses this function in combination with a small excursion on how to generate synthetic signals in MATLAB™. The MATLAB™ code can be found at the end of this thesis.
Example 2.0.1. First of all we can provide a simple signal by determining a sinusoid. If we do this with a frequency a human is able to hear, we can hear this sound in MATLAB™ by using the command 'sound'. The spectrogram is shown in Figure 2.1.
Figure 2.1: Chamber pitch ‘a’ with 440Hz
We can build any kind of melody by composing different sinusoids. The chamber pitch 'a' may be computed as

a = sin(2π · 440 · H/8000)

where 440 (Hz) is the chosen frequency, the divisor 8000 is the sampling rate, and H contains the sequence points. H should be chosen based on the desired tone length; a whole note in this setting corresponds to the sequence H = (h_i), i = 1, 2, 3, ..., 8000.
An example for this is the famous 'Star Wars' theme depicted in Figure 2.2. We see that we can clearly identify every note of the melody as a frequency.
It should be noted that at the beginning (the onset) and the end of a digitally generated signal there will always be a blur. By generating more than one sinusoidal signal and putting them together in one matrix we also adopt the onset blurs, a small side effect we have to accept. We could try to mitigate this problem by operating with envelopes, but this will not be considered here. Still, this is a very simple method to obtain a simple melody just by programming.
Figure 2.2: Spectrogram of the famous Star Wars Theme
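The MATLABTM one-liner above can be mirrored in Python; the function name `sine_tone` and the whole-note length of 8000 samples are our own illustrative choices, not part of the thesis code.

```python
import math

def sine_tone(freq_hz, num_samples, sample_rate=8000):
    """Generate a[i] = sin(2*pi*freq*i/rate) for i = 1, ..., num_samples."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(1, num_samples + 1)]

# Chamber pitch 'a' (440 Hz), one whole note at an 8 kHz sampling rate:
a = sine_tone(440, 8000)
```

Writing the list `a` to a WAV file (or passing it to MATLAB's 'sound') plays the chamber pitch; concatenating several such lists gives a melody, with the onset blur mentioned above at each note boundary.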
2.1 Frames and Bases
The definitions and results given in this and the next section are mainly taken from [Grochenig] and [Dopfner]. Another good reference for Gabor analysis is [Dorfler3].
We start right away with the definition of frames.
Definition 2.1 (Frame). Let H be a separable Hilbert space and Γ a countable index set. The sequence of functions (ϕγ)γ∈Γ ⊂ H is called a frame for H if there exist constants A, B > 0 such that

A‖f‖² ≤ Σγ∈Γ |⟨f|ϕγ⟩|² ≤ B‖f‖² ∀f ∈ H. (2.1)

The scalars A and B are called frame bounds. A frame is called tight if A = B. If the frame additionally constitutes a basis, i.e. the (ϕγ)γ∈Γ are linearly independent and the expansion coefficients are unique, the frame is called a Riesz basis. We could say: if a frame is a basis, then it is a Riesz basis.
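The frame inequality (2.1) can be checked numerically for a small finite frame. The following sketch uses the 'Mercedes-Benz' frame of three unit vectors in R², an illustrative example of our own choosing; the frame bounds come out as the extreme eigenvalues of the frame operator.

```python
import math

# Three unit vectors in R^2 at 120-degree spacing (the "Mercedes-Benz" frame).
angles = [math.pi / 2, math.pi / 2 + 2 * math.pi / 3, math.pi / 2 + 4 * math.pi / 3]
frame = [(math.cos(t), math.sin(t)) for t in angles]

# Frame operator S = sum of outer products phi phi^T (a 2x2 symmetric matrix).
s11 = sum(x * x for x, y in frame)
s12 = sum(x * y for x, y in frame)
s22 = sum(y * y for x, y in frame)

# Eigenvalues of a symmetric 2x2 matrix: the optimal frame bounds A and B.
mean = (s11 + s22) / 2
disc = math.sqrt(((s11 - s22) / 2) ** 2 + s12 ** 2)
A, B = mean - disc, mean + disc
# Here A = B = 3/2: the frame is tight, but not a basis (3 vectors in R^2).
```

Since A = B, this frame is tight, yet it is redundant, illustrating that frames strictly generalize bases.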
Definition 2.2 (Riesz basis). A Riesz basis for a Hilbert space H is a family of the form {f_k}_{k=1}^∞ = {U e_k}_{k=1}^∞, where {e_k}_{k=1}^∞ is an orthonormal basis for H and U : H → H is a bounded bijective operator.
Remark 2.1.1. Note that this is not the usual definition taught in lectures. Commonly an equivalent one, cf. [Heuser], is given for a complete sequence {f_k}_{k=1}^∞ ⊂ H and constants A, B > 0 such that for every finite scalar sequence (c_k) we have

A Σ|c_k|² ≤ ‖Σ c_k f_k‖² ≤ B Σ|c_k|².
We add a small discussion to specify in what sense frames generalize bases. In fact the energy stays preserved in the expansion. Furthermore the inequality (2.1) implies span{(ϕγ)γ∈Γ} = H. The idea behind frames is to gain more flexibility than with bases, since we might not always find a basis. Unfortunately frames have the negative connotation that we lose the uniqueness of the analysis coefficients, but the expansion property stays preserved. We know that we obtain a tight frame for equal frame bounds A and B. Any orthonormal basis is a tight frame with A = B = 1, a union of two orthonormal bases is a tight frame with A = B = 2 and so on. In particular, the union of finitely many frames is again a frame.
We now introduce the earlier mentioned signal space Hs resp. coefficient space Hc. These two play a decisive role in this work. Both are assumed to be separable Hilbert spaces. In continuous time we take Hs = L2(R) as the function space and Hc = ℓ2(Γ) as the sequence space of finite energy. In discrete terms we think of Hs = CL and Hc = Cp with p ≥ L.
The corresponding operators that will help us in the further context of structured
sparsity are given in the following definition.
Definition 2.3. For the frame (ϕγ)γ∈Γ with atoms ϕγ, the synthesis operator Φ : Hc → Hs with c = (cγ)γ∈Γ ∈ Hc is defined as

Φc = Σγ∈Γ cγϕγ. (2.2)

Its adjoint operator Φ* : Hs → Hc with

Φ*f = (⟨f|ϕγ⟩)γ∈Γ (2.3)

is called the analysis operator. These operators fulfill

⟨f|Φc⟩ = ⟨f|Σγ∈Γ cγϕγ⟩ = Σγ∈Γ c̄γ⟨f|ϕγ⟩ = ⟨Φ*f|c⟩.
Since it is much easier to handle coefficients in the sense of processing, it is common first to analyze a signal, then to do the computations in the coefficient space, e.g. the sparse recovery introduced in Chapter 3, and finally to synthesize these coefficients to recover the modified signal.
Definition 2.4. For the synthesis operator Φ and the analysis operator Φ*, the frame operator is defined by

S := ΦΦ* : Hs → Hs (2.4)

and fulfills the frame inequality

A‖f‖² ≤ ⟨Sf|f⟩ ≤ B‖f‖² ∀f ∈ Hs. (2.5)
Remark 2.1.2. In the finite/discrete case we can treat all these operators as (possibly infinite) matrices. In this sense the atoms ϕγ constitute the columns of Φ. The smallest and largest eigenvalues of S represent the optimal frame bounds, and the ratio B/A corresponds to the condition number, which is responsible for numerical stability.
We will later consider constrained minimization problems to obtain sparsity of a signal. They are of the form

min_c ½‖c‖² s.t. f = Φc

and have the solution

c = Φ*S⁻¹f. (2.6)
Solution (2.6) is often called the 'method of frames'. In fact this minimization problem is analogous to the Tykhonov regularization introduced in Chapter 3.
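The formula (2.6) can be verified on a toy example; the 2×3 synthesis matrix and data below are our own illustration, not taken from the thesis.

```python
# Minimal-norm expansion via the 'method of frames': c = Phi* S^{-1} f.
phi = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # atoms = columns of Phi
f = (1.0, 2.0)

# Frame operator S = Phi Phi^T (2x2) and its inverse.
s11 = sum(x * x for x, y in phi); s12 = sum(x * y for x, y in phi)
s22 = sum(y * y for x, y in phi)
det = s11 * s22 - s12 * s12
sinv = ((s22 / det, -s12 / det), (-s12 / det, s11 / det))

# h = S^{-1} f, then c = Phi* h, i.e. c_gamma = <h | phi_gamma>.
h = (sinv[0][0] * f[0] + sinv[0][1] * f[1],
     sinv[1][0] * f[0] + sinv[1][1] * f[1])
c = [x * h[0] + y * h[1] for x, y in phi]

# c reproduces f, i.e. sum_gamma c_gamma phi_gamma = f, with minimal l2 norm.
recon = (sum(cg * x for cg, (x, y) in zip(c, phi)),
         sum(cg * y for cg, (x, y) in zip(c, phi)))
```

Any other expansion of f, e.g. c′ = (1, 2, 0), has a strictly larger ℓ2 norm than the method-of-frames coefficients.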
We look at one more corollary before we finally discuss Gabor frames.
Corollary 2.1.1. If (ϕγ)γ∈Γ is a tight frame with bound A, then the canonical dual frame is (A⁻¹ϕγ)γ∈Γ and

f = A⁻¹ΦΦ*f ∀f ∈ Hs.
Remark 2.1.3. Analogous to the canonical dual frame, there exists a canonical tight frame.
This corollary tells us that for tight frames analysis and synthesis use the same frame. This is very useful since it reduces the computational effort. To be more precise, g/√A for example is a tight atom reproducing the identity, which is convenient in the case of Gabor multipliers. In fact constant functions provide multiples of the identity operator, i.e. the real case gives symmetric operators, for which we can find eigenvalues resp. eigenvectors. If it were necessary to use a different dual window for synthesis, the Gabor multipliers would no longer be symmetric. For more details see [Dopfner].
Proposition 2.1.1. Let (ϕγ)γ∈Γ be a frame and S the associated frame operator. Then (S^{−1/2}ϕγ)γ∈Γ is a tight frame with frame bounds A = B = 1, which is called the canonical tight frame.

Remark 2.1.4. Note that S^{−1/2} is self-adjoint, and S^{−1/2} S S^{−1/2} = Id.
For Gabor frames this result corresponds to the canonical tight window g^t := S_{α,β}^{−1/2} g⁰, which is closest to the original window g⁰ for the frame operator S_{α,β} with given lattice values.
2.2 Gabor Frames
From the point of view of time-frequency analysis the two most important specifications are wavelet analysis and Gabor analysis. Two differences should be mentioned: while wavelet transforms focus on time-scale representations, the Gabor transform gives us superpositions of time-frequency atoms. The other important point is that the wavelet case can be generated with orthonormal bases, in contrast to Gabor analysis, where we depend on frames. It is said that wavelets may be better suited for image processing and Gabor analysis for audio processing, but this of course depends on the respective application.
We already defined in (1.1) and (1.2) the two main tools that generate a Gabor frame, the translation and the modulation operator, or more precisely their combination TaMb. But this time we apply a fixed window function along a discrete subset of Rd, i.e. Hs = L2(R) and Hc = ℓ2(Γ) for Γ = αZ × βZ.
Definition 2.5. Let g ∈ L2(R) be a non-zero window function. The set of functions
G(g, α, β) := (TαnMβmg)(n,m)∈Z×Z
is called Gabor system. If G(g, α, β) constitutes a frame it is called a Gabor frame.
27
2 Introduction to Gabor Analysis
Remark 2.2.1. The window function g is typically non-negative, centered at the origin and symmetric.
In connection with the STFT we can write Vgf(x, ω) = ⟨f, MωTxg⟩ = e^{−2πixω}⟨f, TxMωg⟩, restricted to a discrete lattice (x, ω) ∈ αZ × βZ. The frame operator takes the form

Sf = Σ_{m,n∈Z} ⟨f|TαnMβmg⟩ TαnMβmg = Σ_{m,n∈Z} Vgf(αn, βm) MβmTαng.
Notation: Usually we denote atoms corresponding to a Gabor frame by ϕm,n = TαnMβmg.
Proposition 2.2.1. Let G(g, α, β) be a frame for L2(R). Then there exists a dual window g̃ = S⁻¹g such that G(g̃, α, β) is the dual frame of G(g, α, β). Furthermore every f ∈ L2(R) satisfies

f = Σ_{m,n∈Z} ⟨f|TαnMβm g̃⟩ TαnMβm g = Σ_{m,n∈Z} ⟨f|TαnMβm g⟩ TαnMβm g̃.

In particular the dual frame is again a Gabor frame, generated by the single dual window g̃.
Theorem 2.1. Let g ∈ L2(R), α, β > 0 and let G(g, α, β) be a frame. Then αβ ≤ 1. If αβ = 1,then G(g, α, β) is a Riesz basis.
This theorem gives us the connection between frames and Riesz bases. It seems ideal to use this connection for finding bases, but it turns out that Riesz bases do not have the expected good properties. The theorem of Balian and Low (see [Grochenig]) is a kind of uncertainty principle for Gabor frames which shows the weakness of a basis in this context. It makes us aware of the importance of redundancy in frames, since there exists no basis of L2(R) with the structure of a Gabor system whose window has good time-frequency resolution.
So the conclusion is that if a window g is well time-frequency localized, then we must have αβ < 1. In particular, αβ < 1 holds for all Gabor frames whose window belongs to Feichtinger's algebra S0(Rd). In [Kaiblinger] it is discussed to which extent the discrete setting can be used to approximate the continuous one.
Remark 2.2.2. There exists a generalization of Gabor frames, the so-called non-stationary Gabor frames. Their special advantage is that they allow adaptivity of windows and lattice in either time or frequency. It is clear that this offers more flexibility and better results. They are discussed for example in [DorfMatu] or [VelHoliDorfGrill].
2.3 Discrete Gabor analysis
Since for applications we can only perform computations in the discrete setting, we translate some facts into their discrete analogues.
For this task we usually consider signals f ∈ CL, L ∈ N. This makes more sense when we imagine it as an embedding into the space of L-periodic sequences with complex values, i.e. CL ≅ ℓ2(Z/LZ).
Definition 2.6. For f ∈ CL the

• discrete translation operator is defined by (T_k f)[n] := f[n − k],

• discrete modulation operator is defined by (M_j f)[n] := e^{2πijn/L} f[n]

for k, j ∈ Z.
By time-frequency shifting a single window we generate a discrete Gabor system with atoms

ϕ_{k,j} := M_{jb} T_{ka} g

for g ∈ CL, k = 0, ..., K−1, j = 0, ..., J−1 and Ka = Jb = L. Note that a gives the hop size in samples, i.e. the window gets shifted K times by a over the signal, and J = L/b determines the number of frequency channels obtained at each position by the Fourier transform. Summarizing, with our Gabor atoms we obtain the Gabor coefficients c_{k,j} = ⟨f, ϕ_{k,j}⟩.
Remark 2.3.1. In case ab = L the discrete Gabor frame corresponds to a basis, and if ab < L the frame is oversampled with redundancy L/(ab). The discrete analogue of the STFT is the Gabor transform with a = b = 1, i.e. maximal redundancy L.
It is advisable to choose windows of length l ≪ L. We then get a frequency step of L/l and a reduced redundancy L/(ab) = l/a with a ≤ l. In practical applications the best results have been achieved with redundancies of 2 to 8.
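A discrete Gabor system of this kind can be sketched in a few lines of Python; the parameters L = 8, a = b = 2 and the short window are our own illustrative choices.

```python
import cmath

L, a, b = 8, 2, 2          # K = L/a time shifts, J = L/b frequency channels
K, J = L // a, L // b      # here K = J = 4, redundancy L/(a*b) = 2
g = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # a short window

def atom(k, j):
    """phi_{k,j}[n] = exp(2*pi*i*j*b*n/L) * g[n - k*a]  (indices mod L)."""
    return [cmath.exp(2j * cmath.pi * j * b * n / L) * g[(n - k * a) % L]
            for n in range(L)]

def coeff(f, k, j):
    """Gabor coefficient c_{k,j} = <f, phi_{k,j}>."""
    phi = atom(k, j)
    return sum(fn * pn.conjugate() for fn, pn in zip(f, phi))

f = [1.0] * L
c = [[coeff(f, k, j) for j in range(J)] for k in range(K)]
# K*J = 16 coefficients for a length-8 signal: a redundancy-2 representation.
```

The K·J = 16 coefficients of the length-8 signal illustrate the oversampling by a factor L/(ab) = 2 mentioned above.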
In order to constitute a frame from a discrete Gabor system it is necessary that r := KJ ≥ L and that the matrix Φ = (ϕ_{k,j})_{k,j} ∈ C^{L×r} has full rank. Furthermore the frame operator S has diagonal form if supp(g) ≤ J, where g denotes the window and J the length of the FFT (mentioned in the first chapter). Corresponding to the Walnut representation given in [Dorfler1] (original: [HeilWalnut])

S_{p,q} = J Σ_{k=0}^{K−1} T_{ka}g(p) \overline{T_{ka}g(q)} if |p − q| mod J = 0, and S_{p,q} = 0 else,
we get

Sf[n] = J ( Σ_{k=0}^{K−1} T_{ka}|g|²[n] ) f[n]

for n = 1, ..., L, and thus the dual window can be calculated as

g̃[n] = g[n] / ( J Σ_{k=0}^{K−1} T_{ka}|g|²[n] ).
For more details on the complexity of the discrete Gabor transform, see [Holighaus]. Note that in applications all operations are implemented on the basis of the FFT.
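The diagonal (Walnut) form of the frame operator and the resulting dual-window formula can be verified numerically in the 'painless' case supp(g) ≤ J; all parameters below are our own illustrative choices.

```python
import cmath

L, a, J = 8, 2, 2
K, b = L // a, L // J            # K*a = J*b = L
g = [1.0, 0.5] + [0.0] * (L - 2) # support length 2 <= J: the 'painless' case

def atom(k, j):
    return [cmath.exp(2j * cmath.pi * j * b * n / L) * g[(n - k * a) % L]
            for n in range(L)]

def frame_op(f):
    """Sf = sum_{k,j} <f, phi_{k,j}> phi_{k,j}, computed by brute force."""
    out = [0j] * L
    for k in range(K):
        for j in range(J):
            phi = atom(k, j)
            c = sum(fn * pn.conjugate() for fn, pn in zip(f, phi))
            for n in range(L):
                out[n] += c * phi[n]
    return out

f = [float(n + 1) for n in range(L)]
sf = frame_op(f)

# Diagonal form: Sf[n] = J * (sum_k |g[n - k*a]|^2) * f[n], so the dual
# window is g~[n] = g[n] / (J * sum_k |g[n - k*a]|^2).
d = [J * sum(abs(g[(n - k * a) % L]) ** 2 for k in range(K)) for n in range(L)]
gdual = [g[n] / d[n] for n in range(L)]
```

The brute-force frame operator agrees pointwise with the diagonal formula, which is exactly why the dual window can be computed by a simple division instead of a matrix inversion.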
3 Structured Sparse Recovery
Up to now we have explained how to represent a given signal. These results will now be used to derive methods for processing realistic signals. If you have ever tried to record an instrument or someone's singing with a simple microphone, you might have noticed that there always seems to be a background noise, the well-known 'air rustle'. So we now have to consider a signal with additive noise e. Let f be the wanted signal and y the observed data from the recording. Then we can model the signal as
y = f + e (3.1)
The aim of our further work is the recovery of the signal f, at least approximately. We will only operate on two special Hilbert spaces:
• Hs... signal space
• Hc... coefficient space
Hs in applications can be seen as L2(R) or CL. We represent f as a linear combination of frame elements, f = Σγ∈Γ cγϕγ, with coefficients c ∈ Hc and index set Γ. So Hc can be thought of as ℓ2(Γ), or as CL in applications. Φ : Hc → Hs is known as the synthesis operator with Φ = (ϕ1, ..., ϕγ, ...). We need to find the coefficients which minimize the discrepancy

Δ(c) := ½‖y − Φc‖²₂ (3.2)

between the given data y and the image of c. As [[Siedenburg1], p. 16] says: “This task can be
considered as a linear inverse problem, as we seek to infer c from its noisy image under
the linear operator Φ.” Unfortunately the linear problem does not have a unique solution
and furthermore does not depend continuously on the data. So we are led to the notion of regularization. By adding constraints on the coefficients in the form of a penalty measure Ψ : Hc → R₀⁺ we obtain the regularized functional, the so-called Lagrangian

L(c) := L_{y,λ}(c) := Δ(c) + λΨ(c) (3.3)

and seek c ∈ Hc such that

c = argmin_c L(c). (3.4)
The value λ > 0 is called the Lagrange multiplier. A more common name for λ is sparsity level, since it gives the weight assigned to the penalty term. The bigger λ, the more the penalty is taken into account, and vice versa.
3.1 Sparsity
The idea behind structured sparsity is to represent a signal with as little information as necessary, which is what 'sparse' actually means.
Example 3.1.1. As an example for sparsity we can look at a matrix containing many zeros:

2 0 0 0
0 7 0 0
0 0 5 0
3 0 14 1

Since it is unnecessary to store the whole matrix (it would need more memory, and thus an application might get slower), we can store only the relevant entries, which are non-zero. In MATLABTM the command sparse(matrix) yields the representation

(1,1) 2
(4,1) 3
(2,2) 7
(3,3) 5
(4,3) 14
(4,4) 1
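In Python the same idea can be sketched with a plain dictionary of nonzero entries, an illustrative stand-in for MATLAB's sparse():

```python
# Storing only the nonzero entries of the 4x4 example matrix, keyed by
# (row, col) with 1-based indices, in the column-major order MATLAB prints.
dense = [
    [2, 0, 0, 0],
    [0, 7, 0, 0],
    [0, 0, 5, 0],
    [3, 0, 14, 1],
]

nonzeros = {(i + 1, j + 1): dense[i][j]
            for j in range(4) for i in range(4) if dense[i][j] != 0}
# 6 stored values instead of 16: the essence of a sparse representation.
```

Only 6 of the 16 entries are stored; for large matrices with few nonzeros the savings in memory and computation become substantial.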
As in the matrix example, we try to compute an approximation of the signal by using as few atoms ϕγ as possible. Indeed, large classes of signals can be sparsely represented in a suitable dictionary, which is what makes this method so important.
A useful example for sparsity in this sense is the separation of an audio signal according to the model signal = tonal + transient + noise. We call this a multi-layer decomposition. Intuitively, we can think of the tonal part as a melody composed of frequencies, and of the transient part as a composition of drum beats or the attack of an electric guitar. The tonal layer corresponds to the stationary part of the sound; it is sparsely represented in a dictionary built from a window with long support. In contrast, the transients, which are the non-stationary parts, are sparsely represented in a Gabor frame with short support. This kind of decomposition is shown with an example in Chapter 5.
A more natural example is the already mentioned application of denoising. In this thesis our focus will lie on this application, which is directly connected to the notion of structured sparsity.
This application works wonders if one is trying to clean up a 'noisy' signal. Just think of the crunchy sound known from old phonograph records.
A third very important example is declipping. Clipping is a kind of waveform distortion that occurs when an amplifier is overdriven and attempts to output a signal beyond its maximum capability. This engenders an unwanted 'click' where the signal is clipped. But often deliberate overdriving of a signal is exactly what is wanted: many electric guitar players intentionally overdrive their amplifiers to cause clipping in order to get a desired sound, the so-called 'overdrive' resp. 'distortion' effect.
The fact is that penalty measures Ψ which promote sparsity offer solutions that appear more natural and provide a higher resolution of the recovery. In this chapter this kind of problem is discussed based on [Siedenburg1]. An overview of how research arrived at structured sparsity can also be found there.
3.2 Sparse Regularization
The problem we have to solve was stated in (3.4). The next step is the task of finding a suitable penalty measure Ψ for the Lagrangian defined in (3.3). In fact this is the point for which we will define all operators in the further context.
To realize this with respect to a sparse representation, it is clear that the aim is the minimization of the number of non-zero coefficients, given by ‖c‖₀ := #{cγ : cγ ≠ 0}, i.e. Ψ(c) = ‖c‖₀. "This approach was also called ideal atomic decomposition by [Donoho] since it yields the most efficient representation of a signal", as mentioned in [Siedenburg1]. But there is one little problem: using the ℓ0 penalty leads to a non-convex problem which is NP-hard. We make a small digression in the next paragraph, which can be skipped for our further work.
A decision problem is a question in some formal system whose answer can only be 'yes' or 'no', depending on the values of some input parameters; for example, "given two numbers x and y, does x evenly divide y?". NP stands for non-deterministic polynomial time and is the set of all such decision problems for which the instances where the answer is 'yes' have efficiently verifiable proofs of that fact, i.e. proofs verifiable in polynomial time. NP-hard is a class of problems that are "at least as hard as the hardest problems in NP". For the detailed theory of NP-completeness, [GareyJohnson] is recommended.
The only thing we should keep in mind is that no efficient (polynomial-time) algorithm is known for this non-convex problem. The ℓ0 norm will nevertheless remain important for comparing sparsity properties.
We continue looking for a solution. To obtain an easier and more robust computation, we use the ℓ1 norm instead of the ℓ0 norm, i.e. Ψ(c) = ‖c‖₁. It was shown that the ℓ0 solution can be recovered uniquely by ℓ1 minimization under the condition of a sufficiently sparse signal; see [Donoho].
Figure 3.1: Unit balls of `q for: (a) q = 0, (b) q = 0.5, (c) q = 1, (d) q = 1.5, (e) q = 2
We could say the ℓ1 norm is a convexification of ℓ0. This point may not be immediately clear in [Siedenburg1], but it can be visualized by approximating the ℓ0 norm by the ℓq quasi-norm

‖c‖_q = ( Σγ |cγ|^q )^{1/q} (3.5)

which is non-convex for 0 ≤ q < 1, and convex and thus a norm for q ≥ 1. If we take a look at Figure 3.1 we observe in two dimensions that for decreasing q the unit balls {c : ‖c‖_q ≤ 1} approach the ℓ0 'unit ball', which corresponds to the two axes. Analogously, the ℓ1 Lagrangian minimization convexifies the ℓ0 Lagrangian minimization. Based on these considerations our problem (3.4) becomes the convex problem

min_c { ½‖y − Φc‖²₂ + λ‖c‖₁ }. (3.6)
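The claim that the ℓq quasi-norms approach the ℓ0 count can be checked numerically: for fixed c, ‖c‖_q^q = Σ|cγ|^q tends to the number of nonzero entries as q → 0. The vector below is an illustrative choice of our own.

```python
# As q -> 0, sum |c_gamma|^q approaches ||c||_0, the number of nonzero
# entries, which is why the l^q unit balls shrink toward the axes (Figure 3.1).
c = [0.0, 3.0, 0.0, -0.5, 2.0]

def lq_q(c, q):
    """||c||_q^q, with the convention 0^q = 0."""
    return sum(abs(x) ** q for x in c if x != 0)

l0 = sum(1 for x in c if x != 0)          # 3 nonzero entries
print([round(lq_q(c, q), 3) for q in (1.0, 0.5, 0.1, 0.01)])
```

Each printed value moves closer to ℓ0 = 3 as q decreases, while for q = 1 we recover the ordinary ℓ1 norm.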
An equivalent formulation of this problem was given by [Tibshirani] in the field of statistics under the names Basis Pursuit Denoising (BPDN) and LASSO (least absolute shrinkage and selection operator). [Siedenburg1] tells us that the deterministic linear-inverse problem and the stochastically motivated regression problem coincide. We will discuss this problem from our deterministic point of view.
Remark 3.2.1. To get a sense for BP, we take a look at another notion, which is mentioned in [WeinWakin]. One of the most promising fields for recovering certain signal information is compressive sensing (see [Mallat] or [Candles]). Compressive sensing (also known as compressed sensing, compressive sampling, or sparse sampling) is a signal processing technique for efficiently acquiring and reconstructing a signal by finding solutions to underdetermined linear systems. It takes advantage of the signal's sparseness or compressibility in some domain, allowing the entire signal to be determined from relatively few measurements. So we can conclude that this plays an important role in the application of denoising. Knowing this, we see BP as the canonical CS method for recovering a sparse signal.
The main difference between BP and the Lasso is that the former only works for underdetermined systems, whereas the Lasso is more tailored to overdetermined systems. The latter is realized by minimizing the squared error rather than constraining it to be equal to zero. As mentioned before, the problem equivalent to the LASSO is BPDN.
3.3 Thresholding with Mixed Norms
With the Lasso we have our first operator that provides sparse regularization. However, there may be a more natural way of considering (3.6). Since we are treating problems from the point of view of Gabor analysis, with atoms ϕγ generated by translation and modulation and therefore ordered along two dimensions, it makes sense to split our indices into groups and members. The way to realize this is to replace the ℓ1 penalty by a weighted mixed norm. As a result, the global (group) level and the local (member) level are treated differently.
Definition 3.1 (Weighted Mixed Norm). Let Γ be a doubly labelled index set and K, J1, J2, ..., Jk, ... countable index sets such that with Γk := {(k, j) : j ∈ Jk} ∀k ∈ K we have Γ = ⋃_{k∈K} Γk, in other words Γ is the disjoint union of the groups of indices Γk.
Let w = (wγ)γ∈Γ be a positive sequence of weights. The weighted mixed norm ℓ_{w,p,q} on Hc for 1 ≤ p, q < ∞ is defined by

‖c‖_{w,p,q} := ( Σ_{k∈K} ( Σ_{j∈Jk} w_{k,j} |c_{k,j}|^p )^{q/p} )^{1/q}. (3.7)
So one of these groups consists of one fixed index k together with all associated indices j. We will see that these groups can be ordered along the time as well as the frequency axis, to achieve different results.
Remark 3.3.1. We assume our index sets always to be countable.
Note that weighted mixed norms are a generalization of the ordinary weighted norms; for p = q we have

ℓ_{w,p,q} = ℓ_{w,p} = ℓ_{w,q}.

These norms fulfill the norm properties from Definition 1.1, since they are equivalent to a composition of ℓp and ℓq:

‖c‖_{w,p,q} = ‖ { ‖ (c_{k,j} (w_{k,j})^{1/p})_{j∈Jk} ‖_p }_{k∈K} ‖_q. (3.8)
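A direct Python implementation of (3.7) makes the group/member structure explicit; the groups, weights and coefficient values below are our own illustration.

```python
def mixed_norm(groups, weights, p, q):
    """||c||_{w,p,q} with c and w given group-wise as lists of lists."""
    outer = 0.0
    for ck, wk in zip(groups, weights):
        inner = sum(w * abs(x) ** p for x, w in zip(ck, wk))  # inner l^p part
        outer += inner ** (q / p)                             # outer l^q part
    return outer ** (1.0 / q)

c = [[3.0, -4.0], [1.0, 2.0, 2.0]]
w = [[1.0, 1.0], [1.0, 1.0, 1.0]]

lp2 = mixed_norm(c, w, 2, 2)   # p = q: the ordinary l^2 norm, sqrt(34)
l21 = mixed_norm(c, w, 2, 1)   # Group-Lasso norm: ||(3,-4)||_2 + ||(1,2,2)||_2 = 8
```

The p = q = 2 call reproduces the plain weighted ℓ2 norm, while ℓ_{2,1} sums the ℓ2 norms of the groups, exactly the composition described in (3.8).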
Notation: The notation of the indices will be used depending on their reference: γ denotes an element of the non-partitioned set Γ, while (k, j) refers to the explicit group-and-member form Γ = ⋃_k Γk.
With help of this structure we can emphasize sparsity on either member or group level.
The penalties we will use further on are the already known regular Lasso ℓ_{w,1,1} = ℓ_{w,1} and the new mixed penalties
Figure 3.2: Unit balls of the mixed norms, with horizontal group Γ1 = {x, y} and elevation group Γ2 = {z}. Top left: ℓ1 (Lasso); top right: ℓ2 (Tykhonov); bottom left: ℓ2,1 (Group-Lasso); bottom right: ℓ1,2 (Elitist-Lasso). [Siedenburg1]
• ℓ_{w,2,1} ... Group Lasso
• ℓ_{w,1,2} ... Elitist Lasso

ℓ_{w,2,1}, which was first discussed in [YuanLin], enforces sparsity on the group level while retaining diversity on the level of members. It is plausible that ℓ_{w,1,2} does the opposite. The name 'Elitist Lasso' first appears in [Kowalski]. As the names suggest, either a whole group of coefficients is considered important and kept or discarded, or the coefficients are treated completely individually and discarded only according to their size. For the latter we could say 'only the strongest survive'. It turns out that each of these strategies has advantages and disadvantages. To get an impression, Figure 3.2 sketches the unit balls corresponding to p, q ∈ {1, 2}.
Reformulating the Lagrangian by setting Ψ(c) = (1/q)‖c‖^q_{w,p,q} gives

L_{w,p,q}(c) := ½‖y − Φc‖²₂ + (1/q)‖c‖^q_{w,p,q}.

Thus our sparse recovery problem (3.4) becomes

min_c L_{w,p,q}(c). (3.9)
Remark 3.3.2. The sparsity level λ is, w.l.o.g., absorbed into the weights w.
The following definition is the most elementary tool for proceeding with sparse regularization. After defining the generalized thresholding operator we will derive the four cases with their underlying mixed norms.

Definition 3.2 (Generalized Thresholding Operator). For z ∈ Hc, weights wγ > δ > 0 and a non-negative function ξ = ξγ,w : Hc → [0, ∞], the generalized thresholding operator is defined component-wise by

Sξ(zγ) := zγ (1 − ξγ,w(z))₊ (3.10)

where for b ∈ R, b₊ := max(b, 0). ξ is called the threshold function.

Notation: We usually write Sξ(z) := (Sξ(zγ))γ∈Γ and the subscript is adjusted to the respective dependencies. For example, S_{w,p,q} highlights the relation to the ξγ,w corresponding to the weighted mixed norm ℓ_{w,p,q} (see below).
In fact there are two common ways of defining a threshold operator. The one we already know is the so-called soft-thresholding operator S^{ST}, which can be written as

S^{ST}_λ(z) := z (1 − λ/|z|)₊ =
  z − λ : z ≥ λ
  0 : |z| < λ
  z + λ : z ≤ −λ,

i.e. ξ^{ST} = λ/|z|, for z ∈ C.
The second type is the hard-thresholding operator S^{HT}:

S^{HT}_λ(z) :=
  z : |z| > λ
  0 : |z| ≤ λ

An interesting connection between the two operators is

ξ^{HT}(z) = lim_{k→∞} (ξ^{ST}(z))^k.
Figure 3.3 shows how the two thresholding operators work. While the hard-thresholding operator simply sets every value below the shrinkage level λ to zero and leaves the rest untouched, the soft-thresholding operator additionally lowers the remaining values by λ. Soft thresholding gives a smoother, and in most cases better, result.
Figure 3.3: (a) Standard-Gaussian, (b) Standard-Gaussian with thresholding value λ =0.2, (c) after soft-thresholding, (d) after hard-thresholding
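Both operators are one-liners in code; the following sketch (our own, with the illustrative value λ = 0.2 from Figure 3.3) reproduces the behavior described above.

```python
def soft(z, lam):
    """S^ST_lambda(z) = z * (1 - lam/|z|)_+ : shrinks z toward zero by lam."""
    if abs(z) <= lam:
        return 0.0
    return z * (1.0 - lam / abs(z))

def hard(z, lam):
    """S^HT_lambda(z): keep z if |z| > lam, else set it to zero."""
    return z if abs(z) > lam else 0.0

vals = [-0.5, -0.15, 0.0, 0.1, 0.3]
lam = 0.2
print([round(soft(v, lam), 2) for v in vals])   # [-0.3, 0.0, 0.0, 0.0, 0.1]
print([hard(v, lam) for v in vals])             # [-0.5, 0.0, 0.0, 0.0, 0.3]
```

Note how hard thresholding keeps surviving values untouched, while soft thresholding also shrinks them by λ, which is the smoothing effect visible in Figure 3.3.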
[[Kowalski], Theorem 3] provides the following theorem with the thresholding operators corresponding to each of our weighted norms; a proof can also be found there.
38
3.3 Thresholding with Mixed Norms
Theorem 3.1. Let Φ be a unitary synthesis operator and w = (wγ)γ a strictly positive sequence of weights. For γ = (k, j), let z_k := (z_{k,j})_j denote the subsequence of members and ‖z_k‖_p its ℓp norm. Then the minimizer of L_{w,p,q} from (3.9) is given by the generalized soft-thresholding operation

c = Sξ(Φ*y), (3.11)

which is defined component-wise by ξw for z_{k,j} ≠ 0 in the following cases:

(i) p = q = 1: ξw(zγ) = wγ/|zγ| (Lasso)

(ii) p = q = 2: ξw(zγ) = wγ/(1 + wγ) (Tykhonov regularization)

(iii) p = 2, q = 1; w_{k,j} = w_k ∀k, j: ξw(z_{k,j}) = √w_k / ‖z_k‖₂ (Group-Lasso)

(iv) p = 1, q = 2: ξw(z_{k,j}) = w_{k,j}/(1 + W_{w_k}) · |||z_k|||_{w_k}/|z_{k,j}| (Elitist-Lasso)

where W_{w_k} := Σ_{j_k=1}^{J_k} w²_{k,j_k} and |||z_k|||_{w_k} := Σ_{j_k=1}^{J_k} w_{k,j_k}|z_{k,j_k}|; for every k, (j_k) is a sequence of indices such that r_{k,j_k} := |z_{k,j_k}|/w_{k,j_k} is decreasing in j_k, and J_k is the quantity verifying

r_{k,J_k+1} ≤ Σ_{j_k=1}^{J_k+1} w²_{k,j_k}(r_{k,j_k} − r_{k,J_k+1}) and r_{k,J_k} > Σ_{j_k=1}^{J_k} w²_{k,j_k}(r_{k,j_k} − r_{k,J_k}).
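Case (iii) is easy to implement for constant group weights; the sketch below (with illustrative groups of our own) shows the 'keep or discard whole groups' behavior of the Group-Lasso.

```python
import math

# Group-Lasso thresholding from case (iii) of Theorem 3.1 (unitary Phi,
# weight w_k constant over each group):
#   c_{k,j} = z_{k,j} * (1 - sqrt(w_k) / ||z_k||_2)_+
def group_lasso(groups, w):
    out = []
    for zk, wk in zip(groups, w):
        norm = math.sqrt(sum(z * z for z in zk))
        factor = max(0.0, 1.0 - math.sqrt(wk) / norm) if norm > 0 else 0.0
        out.append([factor * z for z in zk])
    return out

# One 'strong' and one 'weak' group: the weak group is discarded entirely,
# while every member of the strong group survives (shrunk by a common factor).
z = [[3.0, 4.0], [0.1, 0.2]]
shrunk = group_lasso(z, w=[1.0, 1.0])
```

With these numbers the first group has ℓ2 norm 5 and is kept with the common factor 1 − 1/5 = 0.8, while the second group falls entirely below the threshold and is set to zero, exactly the group-level sparsity described above.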
Proof: The proof is taken from [[Siedenburg1], Theorem 2.6]. The aim is the minimization of

½‖y − Φc‖²₂ + (1/q)‖c‖^q_{w,p,q}. (3.12)

The cases of subsequences c_k = 0 for p = 2, q = 1 and of components c_{k,j} = 0 for p = 1, q ∈ {1, 2} will be treated separately, since problem (3.12) becomes non-differentiable there.
We define y′ := Φ*y, θγ := arg(y′γ) and ϑγ := arg(cγ). From the unitarity of Φ it follows that

‖y − Φc‖²₂ = ‖y′ − c‖²₂ = Σγ |y′γ − cγ|² = Σγ ( |y′γ|² + |cγ|² − 2|y′γ||cγ| cos(θγ − ϑγ) ).

This tells us that we can carry out the minimization component-wise. Next we fix γ = (k, j) with c_{k,j} ≠ 0 and differentiate (3.12), with respect to modulus and argument of cγ, to obtain

|c_{k,j}| = |y′_{k,j}| cos(θ_{k,j} − ϑ_{k,j}) − w_{k,j}|c_{k,j}|^{p−1} ( Σ_k Σ_j w_{k,j}|c_{k,j}|^p )^{(q−p)/p} (3.13)

0 = 2|cγ||y′γ| sin(θγ − ϑγ) (3.14)

for θγ = ϑγ + lπ with l ∈ {0, 1}. From (3.13) it can be derived that l = 1 is impossible, so θγ = ϑγ must hold.
Combining the sums over k and j on the right-hand side of (3.13) yields the variational equations

|c_{k,j}| = |y′_{k,j}| − w_{k,j}|c_{k,j}|^{p−1}‖c_k‖^{q−p}_{w_k,p} (3.15)

arg(cγ) = arg(y′γ). (3.16)

We now treat the four cases separately:
(i) p = q = 1. For wγ < |y′γ| we have

|cγ| = |y′γ| − wγ. (3.17)

For wγ ≥ |y′γ|, (3.15) cannot be satisfied, which implies cγ = 0. Combining this with (3.16) and (3.17) finally yields cγ = y′γ(1 − wγ/|y′γ|)₊.

(ii) p = q = 2. This is the simplest case, since the problem is differentiable everywhere, and we obtain cγ = y′γ/(1 + wγ) = y′γ(1 − wγ/(1 + wγ))₊.
(iii) p = 2, q = 1. Since the weights w are assumed constant over each subsequence, we can drop the second index j and write w_k = w_{k,j}. For c_k ≠ 0 (the case c_k = 0 was set aside at the beginning of the proof), (3.15) implies

|c_{k,j}| = |y′_{k,j}| − w_k |c_{k,j}| ‖c_k‖⁻¹_{w_k,2}. (3.18)

The index j can w.l.o.g. be replaced by an independent index l, because w_k is constant within the group. So we can write

w_k ‖c_k‖⁻¹_{w_k,2} = ( |y′_{k,l}| − |c_{k,l}| ) / |c_{k,l}|.

If we insert this term into

|c_{k,j}| = |y′_{k,j}| / ( 1 + w_k ‖c_k‖⁻¹_{w_k,2} )

we obtain

|c_{k,j}| = ( |y′_{k,j}| / |y′_{k,l}| ) |c_{k,l}| ⟺ |c_{k,l}| = |c_{k,j}| |y′_{k,l}| / |y′_{k,j}| ∀j, l.

We use the last identity to decouple the coefficients in the group norm of c:

‖c_k‖_{w_k,2} = √( Σ_l w_k |c_{k,l}|² ) = √( Σ_l w_k |y′_{k,l}|² |c_{k,j}|² / |y′_{k,j}|² ) = √w_k ( |c_{k,j}| / |y′_{k,j}| ) ‖y′_k‖₂.
Inserting into equation (3.18) yields
|c_{k,j}| = |y′_{k,j}| ( 1 − √w_k / ‖y′_k‖₂ ),

and since (3.18) does not hold for √w_k > ‖y′_k‖₂ except when c_k = 0, we obtain

c_{k,j} = y′_{k,j} ( 1 − √w_k / ‖y′_k‖₂ )₊.
(iv) p = 1, q = 2. In this case we decouple the entries of c in order to re-sort the coefficients; in this way an explicit expression for the thresholding can be achieved. We start by rewriting (3.15) for all k and j, which takes the form

|c_{k,j}| = |y′_{k,j}| − w_{k,j} ‖c_k‖_{w_k,1}. (3.19)

Rearranging, this equation becomes

‖c_k‖_{w_k,1} = ( |y′_{k,j}| − |c_{k,j}| ) / w_{k,j} ∀k, j. (3.20)

Then, ∀k, j, l with c_{k,j} ≠ 0 and c_{k,l} ≠ 0:

|c_{k,l}| = |y′_{k,l}| − w_{k,l} ( |y′_{k,j}| − |c_{k,j}| ) / w_{k,j}.

To keep the overview we use the abbreviations

W_{w_k} := Σ_{l:|c_{k,l}|≠0} w²_{k,l}, |||y′_k|||_{w_k} := Σ_{l:|c_{k,l}|≠0} w_{k,l}|y′_{k,l}| and ρ_k := ‖c_k‖_{w_k,1},

and consider

ρ_k = Σ_l w_{k,l}|c_{k,l}|
    = Σ_{l:|c_{k,l}|≠0} w_{k,l} ( |y′_{k,l}| − w_{k,l}( |y′_{k,j}| − |c_{k,j}| )/w_{k,j} ) (3.21)
    = ( Σ_{l:|c_{k,l}|≠0} w_{k,l}|y′_{k,l}| ) − W_{w_k} ( |y′_{k,j}| − |c_{k,j}| ) / w_{k,j}
    = |||y′_k|||_{w_k} − W_{w_k} ρ_k,

which is equivalent to

ρ_k = |||y′_k|||_{w_k} / ( 1 + W_{w_k} ). (3.22)

Note that this is still an implicit form which depends on the support set {l : |c_{k,l}| ≠ 0} = {l : |y′_{k,l}| > w_{k,l}ρ_k}. To obtain an explicit expression for the support, we consider for every k a sequence of indices j_k such that the sequence r_{k,j_k} := |y′_{k,j_k}|/w_{k,j_k} decreases. Next, we choose J_k such that r_{k,J_k+1} ≤ ρ_k < r_{k,J_k}, i.e. j_k = 1, ..., J_k should belong to the support of c. Now we use (3.21) to obtain
r_{k,J_k+1} ≤ Σ_{j_k=1}^{J_k+1} w²_{k,j_k}(r_{k,j_k} − ρ_k) and r_{k,J_k} > Σ_{j_k=1}^{J_k} w²_{k,j_k}(r_{k,j_k} − ρ_k)

and, with ρ_k inserted,

r_{k,J_k+1} ≤ Σ_{j_k=1}^{J_k+1} w²_{k,j_k}(r_{k,j_k} − r_{k,J_k+1}) and r_{k,J_k} > Σ_{j_k=1}^{J_k} w²_{k,j_k}(r_{k,j_k} − r_{k,J_k}).

We have finally attained an expression for J_k which is independent of c. We consider that

W_{w_k} = Σ_{j_k=1}^{J_k} w²_{k,j_k} and |||y′_k|||_{w_k} = Σ_{j_k=1}^{J_k} w_{k,j_k}|y′_{k,j_k}|.

Combining (3.22), (3.20) and (3.16), the formula we were looking for follows:

c_{k,j} = y′_{k,j} ( 1 − w_{k,j}/(1 + W_{w_k}) · |||y′_k|||_{w_k}/|y′_{k,j}| )₊.
Remark 3.3.3. The Tykhonov regularization with p = q = 2 is in fact no thresholding in our sense, since ξw(zγ) = wγ/(1 + wγ) < 1, i.e. Sξ,w(z) ≠ 0 ∀z ≠ 0. That means no non-zero coefficients are set to zero, and thus the solution exhibits no sparsity. But it has an advantage: a closed-form solution for general linear operators, c = (Φ*Φ + I_w)⁻¹Φ*y.
These formulas seem quite complicated, but it is not essential to understand them in detail if we view them merely as a tool for thresholding, i.e. for the algorithms discussed later. The following corollary from [Kowalski] compares mixed-norm regularization with weighted ℓ¹ regularization, which simplifies the handling of mixed norms.
Corollary 3.3.1. Let Φ be unitary and w = (w_γ) > 0. Then there exists a strictly positive sequence u = (u_γ)_{γ∈Γ}, depending on y' := Φ*y, such that the minimizers and minima of L_{w,p,q} and L_{u,1} coincide; in particular, for the common minimizer c,

‖c‖_{u,1} = ‖c‖^q_{w,p,q}.  (3.23)
The proof can also be found in [Siedenburg1].
3.4 Thresholding Algorithms for Frames
It is time to present some methods for obtaining the intended minimizer of (3.9) for frames. The most obvious way seems to be finding algorithms which converge to this solution. For the case of orthonormal bases there is an algorithm called block coordinate relaxation for finite dimensions, proposed by [Sardy]. Unfortunately, Gabor analysis, in which we want to operate, does not rest on an orthonormal basis as, for example, the modified discrete cosine transform (MDCT) does. We already know that we can use an alternative, namely Gabor frames. Therefore we want to generalize the soft-thresholding results to frames or general linear operators. We should keep in mind that in the case of an orthonormal basis we only need one step resp. operation, while in the Gabor representation we need an iterative process.

To get the desired iterations, we start with the so-called Landweber iteration for the approximate solution of inverse integral operator problems and proceed by modifying it, since it does not contain a thresholding step. First we discuss the case of the LASSO with ℓ¹ penalty and then generalize to the other cases. The following theorem, similar to [Daubechies], guarantees the convergence of the sequence generated by the iterative algorithm.
Theorem 3.2. Let Φ : H_c → H_s be a bounded linear operator with ‖Φ‖ < 1, and let the weights w = (w_γ)_γ be uniformly bounded from below, i.e. w_γ ≥ w̲ > 0. Then, for arbitrary c⁰ ∈ H_c, the sequence (χ^n(c⁰))_{n∈ℕ} ⊂ H_c with

χ(c) = S_w(c + Φ*(y − Φc))  (3.24)

converges to a minimizer of the Lagrangian L_{w,1}.
Adapting this result we obtain the iterative soft-thresholding algorithm (ISTA), also called thresholded Landweber iteration:

c^{n+1} = χ(c^n) = S_w(c^n + Φ*(y − Φc^n))  (3.25)
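As a concrete illustration, here is a minimal Python sketch of the thresholded Landweber iteration (3.25) for the ℓ¹ case, with Φ a matrix of operator norm below 1. The variable names and the toy problem are ours, not taken from any referenced code.

```python
import numpy as np

def soft(z, w):
    """Component-wise soft-thresholding S_w(z) = z * (1 - w/|z|)_+ ."""
    return z * np.clip(1.0 - w / np.maximum(np.abs(z), 1e-300), 0.0, None)

def ista(Phi, y, w, n_iter=500):
    """ISTA: c^{n+1} = S_w(c^n + Phi^T (y - Phi c^n)), assuming ||Phi|| < 1."""
    c = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        c = soft(c + Phi.T @ (y - Phi @ c), w)
    return c
```

For example, with Φ = 0.5·I and y = 2, the minimizer of ½(y − 0.5c)² + 0.25|c| is c = 3, which the iteration reproduces.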
We will not cite the proof of Theorem 3.2, and hence of the convergence of the ISTA for the case of ℓ¹ penalization, in this work, since this is a long and technical task. But the idea should be sketched. In fact, [Opial] gives the following theorem, which serves as a tool for proving Theorem 3.2. In further consequence, the generalized ISTA, called multi-layer decomposition, can be treated analogously. All of these proofs can be read in [Siedenburg1].
Theorem 3.3. Let H be a Hilbert space and let χ : H → H satisfy the conditions

(i) χ is non-expansive, i.e. ‖χ(c) − χ(a)‖₂ ≤ ‖c − a‖₂ ∀c, a ∈ H;

(ii) χ is asymptotically regular, i.e. ∀c ∈ H: ‖χ^{n+1}(c) − χ^n(c)‖₂ → 0 for n → ∞;

(iii) the set Fix(χ) of fixed points of χ is non-empty.

Then the sequence (χ^n(c⁰))_{n∈ℕ} converges weakly to a fixed point in Fix(χ), for all c⁰ ∈ H.
It turns out that the ISTA converges even strongly.
Theorem 3.4 ([Daubechies]). Under the assumptions of Theorem 3.2, the sequence of iterates (c^n)_{n∈ℕ} with

c^{n+1} = S_w(c^n + Φ*(y − Φc^n))

converges strongly to a minimizer of L_{w,1}.
In order to obtain the implication from weak to strong convergence we use the following Lemma 3.4.1. To be able to prove the lemma, two other results proven in [Daubechies] are necessary, namely that for c* := w-lim_n c^n and h := c* + Φ*(y − Φc*) we have

• ‖Φ(c^n − c*)‖₂² → 0 for n → ∞;

• ‖S_w(h + (c^n − c*)) − S_w(h) − (c^n − c*)‖₂ → 0 for n → ∞.

Lemma 3.4.1. If a ∈ H_c and (b^n)_{n∈ℕ} ⊂ H_c with weak limit w-lim_n b^n = 0, and lim_n ‖S_w(a + b^n) − S_w(a) − b^n‖₂ = 0, then ‖b^n‖₂ → 0 for n → ∞.
Proof: This proof is taken from [Siedenburg1] with small alterations. The idea is the following: a partition of the index set Γ into Γ_0, Γ_1^n and Γ̄_1^n for every n, as sketched in Figure 3.4, will be made in order to show strong convergence on each of these sets; in particular, Γ̄_1^n will vanish for sufficiently large n.
Figure 3.4: Sketch of the stepwise intended partition of Γ.
For w̲ := inf_γ w_γ, define a finite set Γ_0 ⊂ Γ such that for Γ_1 := Γ \ Γ_0,

Σ_{γ∈Γ_1} |a_γ|² ≤ (w̲/4)².

Since (b^n) converges weakly, we obtain strong convergence on the finite set Γ_0, i.e.

Σ_{γ∈Γ_0} |b_γ^n|² → 0 for n → ∞.

The next sets are defined as Γ_1^n := {γ ∈ Γ_1 : |b_γ^n + a_γ| < w_γ} and Γ̄_1^n := Γ_1 \ Γ_1^n. If γ ∈ Γ_1^n, we have 0 = S_{w_γ}(a_γ + b_γ^n) = S_{w_γ}(a_γ), where the last equality holds since |a_γ| ≤ w̲/4 ≤ w_γ. Thus |S_{w_γ}(a_γ + b_γ^n) − S_{w_γ}(a_γ) − b_γ^n| = |b_γ^n|, implying

Σ_{γ∈Γ_1^n} |b_γ^n|² ≤ Σ_{γ∈Γ} |S_{w_γ}(a_γ + b_γ^n) − S_{w_γ}(a_γ) − b_γ^n|² → 0 for n → ∞.

Strong convergence is thus shown on the first two sets. If we succeed in showing that Γ̄_1^n vanishes for large n, i.e. Σ_{γ∈Γ̄_1^n} |b_γ^n|² → 0, the proof is complete. We consider Γ̄_1^n as the set {γ ∈ Γ_1 : |a_γ + b_γ^n| ≥ w_γ}, noting that w_γ − w̲/4 > |a_γ| there. For such γ we estimate

|b_γ^n − S_{w_γ}(a_γ + b_γ^n) + S_{w_γ}(a_γ)| = |b_γ^n − S_{w_γ}(a_γ + b_γ^n)|
= |b_γ^n − e^{i arg(a_γ + b_γ^n)} (|a_γ + b_γ^n| − w_γ)|
≥ | |e^{i arg(a_γ + b_γ^n)} w_γ| − |e^{i arg(a_γ + b_γ^n)} |a_γ + b_γ^n| − b_γ^n| |
= |w_γ − |a_γ||
> w̲/4,

and therefore

Σ_{γ∈Γ̄_1^n} |S_{w_γ}(a_γ + b_γ^n) − S_{w_γ}(a_γ) − b_γ^n|² ≥ (w̲/4)² ρ,

where ρ := #Γ̄_1^n. By assumption, Σ_{γ∈Γ̄_1^n} |S_{w_γ}(a_γ + b_γ^n) − S_{w_γ}(a_γ) − b_γ^n|² → 0 for n → ∞, so ρ must vanish for large n. Hence the result of Theorem 3.4 follows.
Now we generalize the obtained results to the so-called multi-layer decomposition, i.e. to multi-frame/penalty expansions and simultaneously to mixed-norm penalties. An example is discussed in Chapter 5. First we adjust the notation for this generalized case, similar to [Siedenburg1].
Notation: We have the signal space H_c = H_{c,1} × ··· × H_{c,M} for M ∈ ℕ. The respective coefficients are of the form c = (c[1], ..., c[M])ᵀ ∈ H_c. The synthesis operator is denoted by Φ = ⊕_{i=1}^M Φ_i : H_c → H_s with Φc = Σ_i Φ_i c[i], where Φ_i : H_{c,i} → H_s are the frames. Furthermore, w = (w[1], ..., w[M]) is the sequence of strictly positive weights, and the multi-indices p = (p_1, ..., p_M), q = (q_1, ..., q_M) with p_i, q_i ∈ {1, 2} will be necessary. The multi-layered Lagrangian is

L_{w,p,q}(c) := (1/2) ‖y − Σ_i Φ_i c[i]‖₂² + Σ_i (1/q_i) ‖c[i]‖^{q_i}_{w[i],p_i,q_i}  (3.26)

S_{w,p,q} = (S_{w[1],p_1,q_1}, ..., S_{w[M],p_M,q_M}) is the corresponding soft-thresholding operator, where each S_{w[i],p_i,q_i} acts independently on c[i].
Theorem 3.5 ([Kowalski]). Let M ∈ ℕ and let Φ, c, w, p, q be the concatenated multi-layer operators, coefficients, weights and parameters as defined above. Let each Φ_i be a bounded linear operator such that ‖Φ‖ < 1, each w[i] a positive sequence strictly bounded from below, and p_i, q_i ∈ {1, 2} for i = 1, ..., M. Then for any c⁰ ∈ H_c, the iterative sequence

c^{n+1} = S_{w,p,q}(c^n + Φ*(y − Φc^n))  (3.27)

converges strongly to the minimizer of L_{w,p,q}.
The proof given in [Siedenburg1] proceeds in several steps. The idea: first, an approach analogous to Theorem 3.3 for the extended case yields weak convergence. Next it is shown that, under the assumptions of Theorem 3.5, there is a strictly positive sequence u = (u_γ)_{γ∈Γ}, depending on the weak limit c* of (3.27), such that the ISTAs associated to

L_{w,p,q}(c) = (1/2)‖y − Φc‖₂² + Σ_i (1/q_i) ‖c[i]‖^{q_i}_{w[i],p_i,q_i}  and  L_{u,1}(c) = (1/2)‖y − Φc‖₂² + ‖c‖_{u,1}

reach their minima at the same point c*. Finally, by combining the weak convergence result with this equivalence, the strong convergence can be proven.
The algorithm for applications reads as follows.

Algorithm 1 (Multi-layered ISTA)
    initialize c⁰ = (c⁰[1], ..., c⁰[M])ᵀ ∈ H_c
    repeat
        for i = 1 : M do
            c^{n+1}[i] = S_{w[i],p_i,q_i}(c^n[i] + Φ_i*(y − Φc^n))
        end for
    until convergence
Remark 3.4.1. For the sake of easier implementation, the non-negative threshold functions ξ = ξ_{(g,m),λ} corresponding to the weighted mixed norms, which define the generalized thresholding operator S_{λ,ξ}(z_{g,m}) = z_{g,m}(1 − ξ(z))_+, where (g,m) refers to the group-member structure, can be represented equivalently as

(i) p = q = 1: ξ_L(c_{g,m}) = λ / |c_{g,m}|  (Lasso)

(ii) p = 2, q = 1: ξ_GL(c_{g,m}) = λ / (Σ_m |c_{g,m}|²)^{1/2}  (Group-Lasso)

(iii) p = 1, q = 2: ξ_EL(c_{g,m}) = (λ / (1 + M_g λ)) · ‖c'_g‖₁ / |c_{g,m}|  (Elitist-Lasso)

where c'_g = (c'_{g,1}, ..., c'_{g,M}) and {c'_{g,m'}}_{m'} denotes for each group the sequence of magnitudes |c_{g,m}| in descending order; M_g denotes a natural number depending on the magnitudes of the coefficients in the group (c_{g,1}, ..., c_{g,M}).

The implementation of the 'StrucAudioToolbox' used in Chapter 5 is based on this representation.
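The first two representations translate directly into code. The following Python sketch implements the generalized shrinkage S(z) = z(1 − ξ(z))_+ for the Lasso and the Group-Lasso; the arrangement of groups as rows of a 2-D array is our illustrative choice, not the toolbox layout.

```python
import numpy as np

def shrink(c, xi):
    """Generalized thresholding S(z) = z * (1 - xi)_+ (xi broadcasts over c)."""
    return c * np.clip(1.0 - xi, 0.0, None)

def lasso(c, lam):
    """xi_L = lam / |c_{g,m}|: plain component-wise soft-thresholding."""
    return shrink(c, lam / np.maximum(np.abs(c), 1e-300))

def group_lasso(c, lam):
    """xi_GL = lam / (sum_m |c_{g,m}|^2)^{1/2}: one factor per group (row)."""
    norms = np.linalg.norm(c, axis=1, keepdims=True)
    return shrink(c, lam / np.maximum(norms, 1e-300))
```

For example, lasso([3, −0.5], 1) keeps only the large coefficient, while group_lasso shrinks each whole row by a common factor.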
We have finally established the algorithm for all cases discussed before. But since in applications the speed of an algorithm is just as important as its convergence, a simple modification of the ISTA has been made. The so-called fast ISTA (FISTA), given below, performs best in almost all situations.

The ISTA is constructed by setting c^n = S(b^n) with b^n = c^{n−1} + Φ*(y − Φc^{n−1}). The modification that provides the FISTA with its speed is the choice of b^n, which is replaced by a linear combination of c^n and c^{n−1}. For comparison: while the ISTA converges in objective value sub-linearly like O(1/n), the FISTA converges at the rate O(1/n²).
Algorithm 2 (FISTA)
    S = S_{w,p,q}
    initialize c⁰ = b¹ ∈ H_c, t₁ = 1
    repeat
        c^n = S(b^n + Φ*(y − Φb^n))
        t_{n+1} = (1 + √(1 + 4 t_n²)) / 2
        b^{n+1} = c^n + ((t_n − 1)/t_{n+1}) (c^n − c^{n−1})
    until convergence
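Algorithm 2 can be sketched in Python on a toy problem with a matrix Φ of operator norm below 1. This is an illustrative sketch with our own names, not toolbox code.

```python
import numpy as np

def soft(z, w):
    """Component-wise soft-thresholding S_w(z) = z * (1 - w/|z|)_+ ."""
    return z * np.clip(1.0 - w / np.maximum(np.abs(z), 1e-300), 0.0, None)

def fista(Phi, y, w, n_iter=200):
    """FISTA: an ISTA step taken at the extrapolated point b^n."""
    c = b = np.zeros(Phi.shape[1])
    t = 1.0
    for _ in range(n_iter):
        c_new = soft(b + Phi.T @ (y - Phi @ b), w)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        b = c_new + ((t - 1.0) / t_new) * (c_new - c)  # momentum step
        c, t = c_new, t_new
    return c
```

On the scalar problem Φ = 0.5, y = 2, w = 0.25, whose minimizer is c = 3, the momentum iteration reaches the solution considerably faster than plain ISTA.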
4 Improvements and Threshold Selection
This chapter presents a further improvement called persistence, concerning the threshold function ξ in dependence on a coefficient's neighborhood. After this, a short look at a new method called empirical Wiener estimation will be given.
4.1 Persistence and Neighborhoods
We recall what we have done so far. We defined the general problem (3.4) that made it possible to deal with thresholded sparsity. In the second step, in order to optimize our problem, we introduced mixed norms (3.7) and replaced our penalty term Ψ by these. The minimizers of L_{w,p,q} in (3.9) for p, q ∈ {1, 2} have been obtained by generalized soft-thresholding with a threshold function ξ. This soft-thresholding operator, introduced in Definition 3.2, has the form S(z) := z(1 − ξ(z))_+, together with one of the threshold functions derived in Theorem 3.1. All of this has been done solely considering orthonormal bases. Since the analysis coefficients of frames are not unique, we had to derive an iterative approach in order to minimize the problem. Still using the threshold operator and therefore a threshold function, the IST-algorithm resp. its faster version FISTA, discussed in the last section, realizes this.
We now want to go a step further and adjust the framework in order to cover a wider spectrum of audio signals. Observing some 'real' signal examples, we may recognize that most parts are sparse in time but persistent in frequency, resp. vice versa. A benefit should be extracted from this fact. For this task, the threshold function ξ will be considered again. This approach is known from [KowBruno] as persistent generalized thresholding. The new operators will be evaluated in terms of Gabor analysis.

Notation: Since we give the following terms in the general case of the mixed norms, and for the sake of brevity, we write ξ = ξ_w = ξ_{w,p,q}.
Definition 4.1 (Time-Frequency Neighborhood). For the countable index set Γ, the time-frequency neighborhood weights are defined as the non-negative sequences v_γ = (v_γ(γ'))_{γ'∈Γ}, v_γ(γ') ≥ 0 for all γ, γ' ∈ Γ, which fulfill the following properties:

• ‖v_γ‖₂ = 1;

• Σ_γ v_γ(γ') ≤ C < ∞ for all γ';

• v_γ(γ) > 0 for all γ.
Figure 4.1: Sketch of different neighborhoods. Rectangular or triangular windows can be used, as well as differently chosen centers.
N_γ := supp(v_γ) = {γ' ∈ Γ : v_γ(γ') > 0} is called the time-frequency neighborhood of γ. For given neighborhood weights v_γ, the neighborhood-smoothing functional η : H_c → ℝ₀⁺ is defined component-wise by

η(c_γ) := ( Σ_{γ'∈Γ} v_γ(γ') |c_{γ'}|² )^{1/2}  (4.1)

For c ∈ H_c, we set η(c) := (η(c_γ))_{γ∈Γ}.
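For a translation-invariant rectangular neighborhood on a finite time-frequency grid, η amounts to a sliding weighted sum of the coefficient energies. The following Python sketch illustrates this under our own simplifying assumptions (a single shared 2-D window V, zero padding at the borders):

```python
import numpy as np

def smooth_energy(C, V):
    """eta(c)_gamma = sqrt( sum_{gamma'} v(gamma') |c_{gamma'}|^2 ) with a
    shared 2-D neighborhood window V, zero-padded at the borders."""
    E = np.abs(C) ** 2
    ph, pw = V.shape[0] // 2, V.shape[1] // 2
    P = np.pad(E, ((ph, ph), (pw, pw)))
    out = np.zeros(E.shape)
    for i in range(V.shape[0]):          # accumulate shifted, weighted copies
        for j in range(V.shape[1]):
            out += V[i, j] * P[i:i + E.shape[0], j:j + E.shape[1]]
    return np.sqrt(out)
```

With the single-entry window V = [[1]] this reduces to η(c_γ) = |c_γ|, the degenerate case N_γ = {γ}.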
The advantage of neighborhoods, in contrast to the groups of GL and EL, is that neighborhoods can be modeled flexibly, e.g. using weighting and overlap. Equipped with these definitions, we are now able to extend the soft-thresholding operator via convolution.
Definition 4.2 (Persistent Soft-Thresholding Operator). For neighborhood weights v_γ, the persistent soft-thresholding operator is defined as

S*_{p,q}(c) := S_{p,q,v}(c) := c(1 − ξ*_{p,q}(c))_+  (4.2)

with threshold function ξ*_{p,q} := ξ_{p,q} ∗ η.
The neighborhood N_γ can be of any positive size. In particular, it may happen that the neighborhood comprises only one coefficient index, i.e. N_γ = {γ}. Then η(c_γ) = v_γ(γ)|c_γ| = |c_γ|, since v_γ(γ) = 1. This kind of neighborhood choice can be seen in the middle of Figure 4.1. In fact, the regular operators turn out to be a special case of the persistent soft-thresholding operators, since in this case the neighborhood weighting disappears and the single coefficient, resp. the center, stands alone.
Remark 4.1.1. It might be a good idea to choose a neighborhood with a non-rectangular base. An example is shown in Figure 4.2. But the effects of this kind of neighborhood have not been researched yet. We note that in the 'StrucAudioToolbox' solely rectangle-based neighborhoods are implemented, since they are represented in the form of matrices.
Figure 4.2: Non-rectangular neighborhood. Not implemented in Toolbox.
We list all threshold functions used for our different thresholding operators:

ξ_L = ξ_{1,1} (Lasso, L)
ξ_GL = ξ_{2,1} (Group-Lasso, GL)
ξ_EL = ξ_{1,2} (Elitist-Lasso, EL)
ξ_WGL = ξ*_{1,1} = ξ_L ∗ η_N (windowed Group-Lasso, WGL)
ξ_PGL = ξ*_{2,1} = ξ_GL ∗ η_N (persistent Group-Lasso, PGL)
ξ_PEL = ξ*_{1,2} = ξ_EL ∗ η_N (persistent Elitist-Lasso, PEL)
We recall that there was one more threshold function, the Tykhonov case with p = q = 2, but for the reasons mentioned before it is not useful in this context and therefore not discussed further.
Remark 4.1.2. It might be confusing why we use the term 'windowed Group-Lasso' instead of 'persistent Lasso'. As discussed in [KowBruno], the persistent Lasso (PL) is often considered a modification of the GL, and thus the name WGL was used. In detail: if the neighborhoods N_γ are chosen as a partition of Γ, i.e. Γ = ∪_k N_k, and the neighborhood and penalty weights are constant over each neighborhood N_k, then the PL/WGL coincides with the GL. It might also be appropriate to use the label PL, but for the sake of consistency with other references and for further research we will use the label WGL.
4.2 Empirical Wiener Estimation
Before we finally come to applied examples and experiments, we take a short look at an alternative denoising operator based on neighborhood-smoothed shrinkage, acting similarly to Wiener filters. In this framework it is possible to find resp. compute a suitable selection of the threshold λ. A consequence is that in the iterative process of Gabor analysis, the optimal λ can be determined in each step, but this will not be part of this work.

Some definitions from [Siedenburg2] are presented in this section, without further computations, to give an impression. For more details it is recommended to read this paper as well as [Kowalski].
Since the results are easier to discuss in terms of orthonormal bases, we work here with the MDCT, which is a special case of an orthogonal time-frequency transform, given by the atoms

φ_{l,k}(n) = g_l(n) √(2/L) cos[ (π/L) (k + 1/2) (n + n_l) ],

where n_l = (L+1)/2 − lL and k = 0, ..., L−1, with window

g_l(n) = sin[ (π/(2L)) (n − lL + L/2) ].

The hop size L is half of the window length, since the MDCT is critically sampled: although consecutive windows overlap by 50%, the number of coefficients equals the number of signal samples. A single frame differs from its original after transformation and back-transformation, the aliasing being cancelled only in the overlap of neighbouring frames. Note that the MDCT is orthonormal on the global signal and not on local frames, i.e. Φ*Φ = I, where I denotes the identity matrix. An important consequence is that white noise is again transformed into white noise.
The aim is to minimize the estimation risk

R(f̂, f) = E[‖f̂ − f‖²] = E[ Σ_{n=1}^{p} |f̂(n) − f(n)|² ]  (4.3)

by choosing an appropriate diagonal operator D = diag(d_1, ..., d_p) with non-negative d_γ ≥ 0. Here f̂ = ΦDy* is the diagonal estimate of f, i.e. a reconstruction from the d_γ-weighted analysis coefficients y* = Φ*y. After some computations we obtain the estimator

ξ̂_γ := σ^{−2} |y*_γ|² − 1  (4.4)

of the a-priori SNR (signal-to-noise ratio; explained in Chapter 5) ξ_γ, i.e. E[ξ̂_γ] = ξ_γ, and therefore the shrinkage weights

d_γ = (1 − σ²/|y*_γ|²)_+

with noise level σ, where (x)_+ = max(x, 0) and 1/0 = ∞. Since they are similar to Wiener filters, the name empirical Wiener attenuation (EW) was picked. The empirical Wiener diagonal estimation is thus given component-wise by

S_EW(z_γ) := z_γ d_γ.
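Component-wise, the EW estimator is essentially a one-liner. The following Python sketch assumes the analysis coefficients y* and the noise level σ are given; the function name is ours.

```python
import numpy as np

def empirical_wiener(y_star, sigma):
    """EW shrinkage: multiply each coefficient by d = (1 - sigma^2/|y*|^2)_+ ."""
    e = np.maximum(np.abs(y_star) ** 2, 1e-300)   # guard against 1/0
    d = np.clip(1.0 - sigma ** 2 / e, 0.0, None)
    return y_star * d
```

Coefficients below the noise level σ are set to zero; coefficients well above it are attenuated only slightly.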
The question is now how the empirical Wiener operator is related to the operators discussed in Chapter 3. Generalizing the soft-thresholding operator yields

S^α_λ(y*) := y* (1 − [λ/‖y*‖_?]^α)_+.  (4.5)

Recalling the representation of the threshold functions in Remark 3.4.1, we see that (4.5) corresponds to the Lasso for α = 1 and ‖·‖_? = |·|. Furthermore, it seems natural to choose for λ the noise level σ. Setting α = 2 and λ = σ we obtain S_EW = S²_σ.
Remark 4.2.1. There are more ways of choosing an appropriate threshold λ.

(i) The first and most natural (non-adaptive) choice seems to be the noise level, i.e. λ = σ. The disadvantage is that for a zero signal this would imply retaining around one third of the overall noise, which is not a desirable result.

(ii) Another possibility is the universal threshold λ = σ√(2 ln(p)). Slowly increasing with the signal length p, it often produces estimates which are too sparse.

(iii) The most common but also most complicated choice is the Stein unbiased risk estimate (SURE). This is a tool for adapting the threshold to the actual data in order to minimize the estimation risk. For very sparse signals, however, it is useful to replace the SURE by the universal threshold.

In [Siedenburg2] a way of automatically choosing the threshold is also discussed.
This approach works for the Lasso, WGL, and EL. We learned in the previous section about operators that make use of their neighborhood, so it remains to consider a combination of neighborhood persistence and empirical Wiener shrinkage. The resulting operator is called persistent empirical Wiener (PEW). Analogous to (4.4), the estimate for the persistent SNR is

ξ̂*_γ := σ^{−2} Σ_{γ'∈Γ} w_{γγ'} |y*_{γ'}|² − 1  (4.6)

with the sequence w_γ = (w_{γγ'})_{γ'∈Γ} of non-negative and normalized neighborhood weights for each γ, fulfilling Σ_{γ'∈Γ} w_{γγ'} = 1 and w_{γγ} > 0. In other words, the requirements of Definition 4.1 are fulfilled. Then the PEW is given coordinate-wise by

S(y*_γ) := y*_γ ( 1 − σ² / Σ_{γ'∈Γ} w_{γγ'} |y*_{γ'}|² )_+  (4.7)
Note that the new operator differs from the WGL only in the exponent, α = 2 instead of α = 1 in (4.5). Furthermore, to obtain S_PEW = S²_σ we only have to set

‖y*‖_? = √( Σ_{γ'∈Γ} w_{γγ'} |y*_{γ'}|² ).

Again, analogous to the relations discussed in the previous section, the PEW coincides with the EW for neighborhood weights with single-coefficient support supp(w_γ) = {γ}.
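In one dimension, (4.7) can be sketched by convolving the coefficient energies with the normalized neighborhood window; with the single-entry window w = (1) it collapses to the plain EW, mirroring the remark above. The following Python sketch and its names are illustrative assumptions, not toolbox code.

```python
import numpy as np

def pew_1d(y_star, sigma, win):
    """Persistent empirical Wiener (4.7) along one axis: the shrinkage factor
    uses the neighborhood-averaged energy sum_{g'} w_{gg'} |y*_{g'}|^2 ."""
    e = np.convolve(np.abs(y_star) ** 2, win, mode='same')  # smoothed energies
    d = np.clip(1.0 - sigma ** 2 / np.maximum(e, 1e-300), 0.0, None)
    return y_star * d
```

Here `win` is assumed to be non-negative and to sum to 1, as required of the neighborhood weights.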
Finally we are able to compare the operators EW and PEW by considering the squared error risk (4.3) of ξ̂*_γ and ξ̂_γ. For any estimator ξ̂ of ξ_γ we have

R(ξ̂_γ, ξ_γ) = E[(ξ̂_γ − ξ_γ)²] = Var(ξ̂) + Bias(ξ̂)²

with

Var(ξ̂) = 2 Σ_{γ'∈Γ} w²_{γγ'} + 4 σ^{−2} Σ_{γ'∈Γ} w²_{γγ'} c*_{γ'}²,

Bias(ξ̂)² = σ^{−4} ( Σ_{γ'∈Γ} w_{γγ'} c*_{γ'}² − c*_γ² )²

for the persistent case, and otherwise

R(ξ̂_γ, ξ_γ) = 2 + 4 σ^{−2} c*_γ².

Depending on which of R(ξ̂_γ, ξ_γ) and R(ξ̂*_γ, ξ_γ) is bigger, the operator corresponding to the smaller risk is preferred for the given signal.
5 Applications and Experiments
We will now give some examples of the discussed operations, as well as some observations on performance differences. For this task we use the StrucAudioToolbox implemented by Kai Siedenburg. First of all, we need to know how we can assess the results of the different methods of structured sparsity. The best tool for this seems to be the so-called signal-to-noise ratio (SNR),

SNR(f, f̂) = 20 log₁₀( ‖f‖ / ‖f − f̂‖ ),

where f denotes the clean signal and f̂ the noisy resp. reconstructed signal. The SNR is measured in decibels (dB) and compares the amount of signal to the amount of noise. A component with an SNR of 100 dB means that the level of the signal is 100 dB higher than the level of the noise, so such a component clearly has a better specification than one with an SNR of, say, 90 dB.

Furthermore, the relative error is used to obtain a break condition for the iterative process. It is defined as err = ‖f − f̂‖/‖f‖. We could say the algorithm aborts when the difference between the signal and its noisy version is sufficiently small.
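Both quality measures are easily written down in Python; here f is the clean reference and f_hat the reconstruction (the names are ours):

```python
import numpy as np

def snr_db(f, f_hat):
    """SNR(f, f_hat) = 20 * log10( ||f|| / ||f - f_hat|| ) in dB."""
    return 20.0 * np.log10(np.linalg.norm(f) / np.linalg.norm(f - f_hat))

def rel_err(f, f_hat):
    """Relative error ||f - f_hat|| / ||f||, used as break condition."""
    return np.linalg.norm(f - f_hat) / np.linalg.norm(f)
```

For instance, a residual of 10% of the signal norm corresponds to an SNR of 20 dB and a relative error of 0.1.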
To use a certain operator type, e.g. Lasso, GL, PEW, etc., we first need to set the right parameters to obtain the desired one. The description of the toolbox is included in the download. We give an example for obtaining the Lasso.
Example 5.0.1. A signal has to be imported; here it is a recording of a slowly played E major guitar chord, where neck and bridge pickups are used simultaneously. It is common for research purposes to add artificially created noise to (clean) signals. In MATLAB™ this can be realized by adding the term l·randn(length('signal'), 1), where l is the noise level; for this task we chose 0.01. Note that we work with mono signals, i.e. with vectors. After getting the default values with [settings] = thresholding('settings');, the parameters corresponding to the Lasso should be adjusted. Since we do not use an EW, we set the exponent α = 1, and no neighborhood is taken into account, so we set the neighborhood matrix N = 1, which of course is its own center. This already guarantees the usage of the Lasso. For comparison, the default values of the StrucAudioToolbox would be α = 2 and the row vector N = (1 1 1 1 1 1 1* 1 1), where 1* denotes the center, i.e. a PEW with asymmetric neighborhood in the time label. The last change concerns the shrinkage level, which has been set to λ = 0.001, and only one iteration is used. The rest keeps its default values, e.g. a Gabor transform with a tight window is used, with M = 1024 frequency channels and shift parameter a = 256. We can plot the coefficients of the clean, noisy and denoised signal as in Figure 5.1.
Figure 5.1: Clean, noisy and denoised signal coefficients.
With this presetting the reconstructed signal sounds a little muffled, i.e. the high frequencies have been erased. Better results can be obtained by using other operators, neighborhoods and shrinkage levels resp. numbers of iterations.
Remark 5.0.2. One may have noticed that the default setting in the toolbox recommends a tight Hann window, which is defined by

g_H(x) := (1 + cos(2πx))/2 · 1_{[−1/2, 1/2]}(x)

with the indicator function 1 of the translated unit interval. From [[Holighaus], Lemma 3.2.6] it can be inferred that the Gabor system G(g_H, 1/4, 1) forms a tight frame with bound A = 4‖g_H‖². This window is preferable for this application since, in contrast to the Gaussian, it is zero outside of the interval without a sharp cutoff and provides 'more continuity'. It has not yet been researched what the consequences of choosing different windows are. More on this problem can be found in [Holighaus] resp. [Siedenburg1].
It is plausible that many errors can occur for bad parameter selections. For example, it is necessary to run the iterative process on the once-transformed coefficients. If one tries to perform a forward and backward transform in every step, with a very small hop size disproportionate to the window length w, the shifted windows overlap highly and generate more and more coefficients. The result for the Lasso is shown in Figure 5.2.
Figure 5.2: Transformation in every step with bad shift causes an increasing number of coefficients and therefore noise.
So if we avoid this mistake and transform before the iteration and back afterwards, the concept of overlapping windows remains compatible with the iteration. For fixed window length, e.g. w = 1024, and different shifts a, again performed with the Lasso, we see that the smaller the shift, the bigger the relative error, as shown in Figure 5.3. The iteration steps (Iter), the elapsed time and the highest relative error in the iteration are listed in Table 5.4. This observation becomes plausible when recalling the redundancy of frames: the smaller the hop size, the more often the same areas, including their noise, are taken into account. In contrast, for large shifts the resolution becomes coarse, which implies that fewer coefficients are there to be thresholded.
Figure 5.3: Different shift values for fixed window length w = 1024.
Shift       32           64          128         256         512
Iter        849          504         363         254         211
Time [s]    3315.461599  993.200849  357.043096  138.336217  73.177419
ErrorMax    0.042640     0.035324    0.029582    0.024477    0.016181

Figure 5.4: Lasso with varying shift.
We already discussed in Chapter 4 that the choice of the threshold level λ is very important. Choosing λ too big resp. too small can also cause unpleasant reconstructions. In Example 5.0.1 we made the experience that λ was chosen a little too big, which resulted in a muffled reconstruction; an even bigger choice would increase this effect. Observing an iterative process, there also seems to be the small side effect that a bigger λ causes more noise in the first few steps, but of course it has the advantage of being faster. In Figure 5.5 the number of iterations is plotted against the relative error for four different shrinkage levels, to show the difference. The number of necessary iterations (Iter) is listed in Table 5.6.
Figure 5.5: Relative error in each iteration step for different shrinkage levels.
Shrink    0.001    0.005    0.01    0.05
Iter      437      327      255     138

Figure 5.6: Table of necessary iteration steps until the algorithm aborts.
For λ = 0.005 resp. λ = 0.001 we hear and see that the reconstructed signal is still very noisy, although the relative error is close to zero. We take a look at the corresponding SNR curves in Figure 5.7.
Figure 5.7: SNR in each iteration step for different shrinkage levels.
The reconstruction seems to perform better after some steps. This observation will be left undiscussed, since many more experiments on different signals would be needed. In fact, this was, by the example of the Lasso, just a small glimpse of how threshold selection can be researched.
We will give two more examples of what can be done with suitable settings.
Example 5.0.2 (Party). This example shows a very useful application of iterative thresholding. It is not only possible to filter a signal from consistent noise; it even filters 'real-life noise'. In this example a recording of a woman reciting a famous 'Giotto' commercial is used, which has been recorded at a party with much noise from other people in the background. We use the GL in the frequency label, which means that groups of coefficients in certain frequencies are taken into account. This seems to be a good idea, since the woman's voice has another (higher) frequency range than the background and therefore protrudes. In addition, the shrinkage level has been set to λ = 0.003 and 5 iterations are used, while the neighborhood is set to default as mentioned in Example 5.0.1; in fact, we have a PGL. Figure 5.8 shows the original signal and, below it, its denoised version, which sounds pretty good, but one can still hear some artifacts, which can easily be removed by playing with the settings.
Figure 5.8: Reconstructed voice with loud background.
If we work with the GL, it is important to know which label should be used. Setting the GL in the time label causes certain time groups to be removed; that means the moments when the woman pauses speaking are the 'weakest', and the reconstructed signal will switch between voice with loud background and total silence, which is not the intended result. The coefficients of such a reconstruction are shown in Figure 5.9. We will see that using the time label in this context gives us the perfect tool for the transient reconstructions discussed in the following Example 5.0.3.
Figure 5.9: Reconstructed voice with loud background; group label in time.
Example 5.0.3 (Multilayer Decomposition). Another very useful application of multilayer decomposition is the separation of tonal, transient and noise parts. Multilayer decomposition as discussed in Chapter 3 allows us to combine different operators, which makes this task possible. We make use of the uncertainty principle, i.e. the window for the transformation can be chosen in order to yield more exact time resp. frequency information. Therefore it makes sense to use a wide window for the tonal parts, while a small window is used to capture the transient parts. In this example a tight Hann window with length w = 4096 and hop size a = 1024 are the parameters for the tonal analysis, and for the transients w = 128 and a = 32. In [Siedenburg1] some experiments have been done on which operator performs best in each case. It turned out that for tonal aims the WGL with neighborhood expansion in time is preferable, and for transients a simple GL with time as group index. Here we will use a PGL which additionally respects one coefficient left of its center, to obtain some additional persistence in frequency. Summarizing, we can write, in the notation of (3.26), for the corresponding generated Gabor frames Φ_1 and Φ_2 the Lagrangian

(1/2) ‖y − Φ_1 c[1] − Φ_2 c[2]‖₂² + ‖c[1]‖_{w[1],1} + ‖c[2]‖_{w[2],2,1}.  (5.1)

(5.1) is then minimized by c* = (c*[1], c*[2]), where Φ_1 c*[1] is the tonal and Φ_2 c*[2] the transient signal layer. The constructed algorithm given in the appendix is realized by setting the tonal parameters and performing the thresholding with a suitable shrinkage level for the signal. Next, the separate transient parameters are set and thresholding is applied to the signal minus the tonal reconstruction. This yields the transient reconstruction. By adding the two signal reconstructions, the original signal is obtained without noise. The different coefficient representations are shown in Figure 5.10.
Figure 5.10: Multi-layered decomposition from noisy ‘musical clock’.
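The two-pass control flow of the appendix algorithm (tonal thresholding first, then transient thresholding on the residual) can be sketched abstractly in Python. In the runnable toy below, the analysis/synthesis pairs are passed in as callables, and we use identity transforms with single soft-thresholding passes instead of Gabor frames and FISTA, purely to show the flow; every name is our own.

```python
import numpy as np

def soft(z, lam):
    """Soft-thresholding z * (1 - lam/|z|)_+ ."""
    return z * np.clip(1.0 - lam / np.maximum(np.abs(z), 1e-300), 0.0, None)

def two_layer(y, ana1, syn1, lam1, ana2, syn2, lam2):
    """Tonal layer from y, transient layer from the residual y - tonal."""
    tonal = syn1(soft(ana1(y), lam1))
    transient = syn2(soft(ana2(y - tonal), lam2))
    return tonal, transient

# toy run: identity 'transforms', strong shrinkage for the tonal pass,
# weak shrinkage for the transient pass
ident = lambda x: x
y = np.array([5.0, 0.2, 0.0])
tonal, transient = two_layer(y, ident, ident, 1.0, ident, ident, 0.05)
```

In the real application, `ana1`/`syn1` and `ana2`/`syn2` would be the wide-window and narrow-window Gabor analysis/synthesis pairs, and each `soft` call would be replaced by an iterative (FISTA) thresholding with the respective structured operator.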
6 Conclusion
This thesis was designed to give the reader a fast understanding of structured sparsity, starting with a discussion of the most important tools for signal processing. In the first chapter the necessary mathematical tools for knowing what actually defines a signal, and the main results for processing signals, were given: functional-analytic basics, time and frequency shifts, the Fourier transform and convolution, as well as the short-time Fourier transform.

The second chapter introduced the idea behind the special field of Gabor analysis, the fact that bases are generalized by frames, where the latter provide redundancy. Therefore Gabor frames serve as a good playground for sparsity.
The introduction to the main topic of structured sparsity was given in chapter three.
Central to it was the minimization of the so-called Lagrangian. This linear inverse
optimization problem has two aspects. The first is the minimization of the
discrepancy, which ensures a good synthesis. The second is the reduction of the
number of non-zero coefficients via an additional penalty, whose threshold function
determines how this reduction is taken into account. For the latter the ℓ1 norm was
used, which gave us the Lasso (least absolute shrinkage and selection operator).
Replacing this norm by weighted mixed norms provided the further operators
Elitist-Lasso and Group-Lasso. By introducing ISTA (the iterative soft-thresholding
algorithm) and its accelerated version FISTA, thresholding was made suitable for
frames. The properties of FISTA are barely researched yet, so this would be a good
starting point for further work. In the fourth chapter persistent operators were
discussed; they provide a way of taking coefficients in a time-frequency
neighborhood into account. Furthermore, the empirical Wiener estimate was offered
as an alternative to the previous operators.
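To make the two shrinkage rules recalled above concrete: soft thresholding (behind the Lasso and ISTA) maps a coefficient c to c · max(1 − λ/|c|, 0), while the empirical Wiener estimate uses the squared ratio, c · max(1 − λ²/|c|², 0), so large coefficients are left almost unbiased. A small Python sketch, illustrative only and operating on an arbitrary coefficient array rather than on Gabor coefficients:

```python
import numpy as np

def soft_threshold(c, lam):
    """Lasso / ISTA shrinkage; works for real or complex coefficients."""
    mag = np.maximum(np.abs(c), 1e-12)
    return c * np.maximum(1.0 - lam / mag, 0.0)

def empirical_wiener(c, lam):
    """Empirical Wiener estimate: squared ratio, so big coefficients shrink less."""
    mag2 = np.maximum(np.abs(c) ** 2, 1e-24)
    return c * np.maximum(1.0 - lam ** 2 / mag2, 0.0)

c = np.array([0.05, 0.5, 5.0])
print(soft_threshold(c, 0.1))    # -> [0.  0.4 4.9]: constant bias lam on survivors
print(empirical_wiener(c, 0.1))  # -> [0.  0.48 4.998]: bias vanishes for large |c|
```

Both rules kill the coefficient 0.05 below the threshold; the difference shows on the large coefficient 5.0, which soft thresholding biases by the full λ while the empirical Wiener estimate barely touches it.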
The last chapter made clear how large the area of numerical research for structured
sparsity still is. Three applications were discussed: denoising, declipping and signal
decomposition; many other applications of this theory can be found.
Furthermore, in the Gabor transform different windows can be chosen, with different
lengths, shifts and numbers of frequency channels. It is still unknown in detail how
these choices affect sparse recovery. For the selection of the shrinkage level some
proposals were mentioned in chapter 4. They seem to work fine, but there is still no
rule telling which choice serves which kind of signal or application with good
results; in fact the kind of signal is an important point.
Another point: it is known that a Gabor transform with varying windows, the so-called
non-stationary Gabor transform (NSGT), gives very good results for signal
representations. It may be useful to extend the notions of structured sparsity to
non-stationary Gabor frames. The problem is that no regular time-frequency lattice
is generated, so it will be necessary to develop novel strategies.
Obviously there is plenty of room for research in the whole field, and since
structured sparsity in the sense of harmonic analysis is still a young subject,
researchers have good reason to be optimistic.
Bibliography
[Bannert] Severin Bannert: Banach-Gelfand Triples and Applications in Time-Frequency
Analysis, Master's thesis, University of Vienna, 2010.
[Candles] E.J. Candès: Compressive sampling, in Proc. Int. Congr. Math., vol. 17, no. 4,
Spain, 2006.
[CorFeiLu] Elena Cordero, Hans G. Feichtinger, Franz Luef: Banach Gelfand triples for
Gabor analysis, in "Pseudo-differential Operators", Springer Lecture Notes in
Mathematics, Vol. 1949, p. 1-33, Berlin, 2008.
[Daubechies] Ingrid Daubechies, Michel Defrise, Christine De Mol: An iterative
thresholding algorithm for linear inverse problems with a sparsity constraint,
Communications on Pure and Applied Mathematics, 2004.
[Donoho] David Donoho: For most large underdetermined systems of linear equations
the minimal ℓ1-norm solution is also the sparsest solution, Communications on Pure
and Applied Mathematics, 59(6):797-829, 2006.
[Dopfner] Kirian Dopfner: Quality of Gabor Multipliers for Operator Approximation,
Diplomarbeit, Universität Wien, 2013.
[Dorfler1] Monika Dörfler: Time-frequency analysis for music signals: a mathematical
approach, Journal of New Music Research, Vol. 30, No. 1, p. 3-12, 2001.
[Dorfler2] Monika Dörfler: What Time-Frequency Analysis Can Do to Music Signals,
in "Matematica e Cultura 2003", Springer Italia, 2003.
[Dorfler3] Monika Dörfler: Gabor Analysis for a Class of Signals called Music,
Dissertation, Universität Wien, 2002.
[DorfMatu] Monika Dörfler, Ewa Matusiak: Nonstationary Gabor Frames - Existence and
Construction, preprint, submitted, http://arxiv.org/abs/1112.5262, 2012.
[FeichGroch] Hans Georg Feichtinger, Karlheinz Gröchenig: Gabor frames and
time-frequency analysis of distributions, J. Funct. Anal. 146(2), 464-495, 1997.
[Feichtinger1] Hans Georg Feichtinger: Banach Gelfand triples for applications in
physics and engineering, in AIP Conf. Proc., Vol. 1146, No. 1, p. 189-228, Amer.
Inst. Phys., 2009.
[Feichtinger2] Hans Georg Feichtinger: A Functional Analytic Approach to Applied
Analysis, lecture notes, NuHAG, Autumn 2012.
[FeichLuef] Hans G. Feichtinger, Franz Luef: Gabor analysis and time-frequency
methods, in Encyclopedia of Applied and Computational Mathematics, 2012.
[GareyJohnson] Michael R. Garey, David S. Johnson: Computers and Intractability: A
Guide to the Theory of NP-Completeness, Freeman, New York, 1985.
[Grochenig] Karlheinz Gröchenig: Foundations of Time-Frequency Analysis, Applied and
Numerical Harmonic Analysis, Birkhäuser, Boston, 2001.
[Haltmeier] Markus Haltmeier: Bild- und Signalverarbeitung, Skriptum, Universität Wien,
2011.
[HeilWalnut] C. Heil, D. Walnut: Continuous and discrete wavelet transforms, SIAM
Review, 31(4), 628-666, 1989.
[Heuser] Harro Heuser: Funktionalanalysis: Theorie und Anwendung, Vieweg+Teubner
Verlag, 2006.
[Holighaus] Nicki Holighaus: Zeit-Frequenz-Analyse mit Methoden der Gabor-Analysis,
Master's thesis, Universität Gießen, 2010.
[Kaiblinger] Norbert Kaiblinger: Approximation of the Fourier Transform and the Dual
Gabor Window, Journal of Fourier Analysis and Applications, Vol. 11, No. 1, p. 25-42,
2005.
[Kowalski] Matthieu Kowalski: Sparse regression using mixed norms, Applied and
Computational Harmonic Analysis, vol. 27, no. 3, pp. 303-324, 2009.
[KowBruno] Matthieu Kowalski, Bruno Torrésani: Sparsity and persistence: mixed norms
provide simple signal models with dependent coefficients, Signal, Image and Video
Processing, vol. 3, no. 3, pp. 251-264, 2008.
[KowSidDorf] Matthieu Kowalski, Kai Siedenburg, Monika Dörfler: Social Sparsity!
Neighborhood Systems Enrich Structured Shrinkage Operators, IEEE Trans. Signal
Process., 2013.
[Krieger] J. Krieger: Stoffzusammenfassung zur Bild- und Signalverarbeitung,
Universität Heidelberg, 2006.
[Mallat] Stéphane Mallat: A Wavelet Tour of Signal Processing: The Sparse Way, 2008.
[Missbauer] Andreas Missbauer: Gabor Frames and the Fractional Fourier Transform,
Master's thesis, University of Vienna, 2012.
[Opial] Zdzisław Opial: Weak convergence of the sequence of successive approximations
for nonexpansive mappings, Bulletin of the American Mathematical Society,
73:591-597, 1967.
[Sardy] Sylvain Sardy, Andrew G. Bruce, Paul Tseng: Block coordinate relaxation
methods for nonparametric wavelet denoising, Journal of Computational and
Graphical Statistics, 2000.
[Siedenburg1] Kai Siedenburg: Structured Sparsity in Time-Frequency Analysis,
Diplomarbeit, Humboldt-Universität zu Berlin, 2011.
[Siedenburg2] Kai Siedenburg: Persistent Empirical Wiener Estimation with adaptive
threshold selection for audio denoising, Proceedings of the 9th Sound and Music
Computing Conference, Copenhagen, July 11-14, 2012.
[SiedenburgDorfler] Kai Siedenburg, Monika Dörfler: Structured Sparsity for Audio
Signals, Proceedings of the 14th International Conference on Digital Audio Effects,
Paris, 2011.
[Tibshirani] Robert Tibshirani: Regression shrinkage and selection via the lasso,
Journal of the Royal Statistical Society, Series B (Statistical Methodology), 1996.
[VelHoliDorfGrill] Gino Angelo Velasco, Nicki Holighaus, Monika Dörfler, Thomas Grill:
Constructing an invertible constant-Q transform with non-stationary Gabor frames,
Proceedings of DAFx-11, 2011.
[WeinWakin] Alejandro J. Weinstein, Michael B. Wakin: Recovering a Clipped Signal in
Sparseland, to appear in Sampling Theory in Signal and Image Processing, 2011.
[YuanLin] Ming Yuan, Yi Lin: Model selection and estimation in regression with grouped
variables, Journal of the Royal Statistical Society, Series B (Statistical Methodology),
68:49-67, 2006.
Appendix
MATLAB Files
This code shows how to create a melody in MATLAB™, using the 'Star Wars' theme as an
example. The spectrogram is produced by a routine from the toolbox LTFAT
(http://ltfat.sourceforge.net/).
%% STAR WARS

function starwars

H=linspace(1,8000,8000); %half note
V=linspace(1,4000,4000); %quarter note
T=linspace(1,1333,1333); %triplet

w1=2*sin(2*pi*233.081*H/8000); %notes for the melody
w2=2*sin(2*pi*349.228*H/8000);
w3=2*sin(2*pi*311.1*T/8000);
w4=2*sin(2*pi*293.7*T/8000);
w5=2*sin(2*pi*261.6*T/8000);
w6=2*sin(2*pi*466.2*H/8000);
w7=2*sin(2*pi*349.228*V/8000);
w8=2*sin(2*pi*311.1*T/8000);
w9=2*sin(2*pi*293.7*T/8000);
w10=2*sin(2*pi*261.6*T/8000);
w11=2*sin(2*pi*466.2*H/8000);
w12=2*sin(2*pi*349.228*V/8000);
w13=2*sin(2*pi*311.1*T/8000);
w14=2*sin(2*pi*293.7*T/8000);
w15=2*sin(2*pi*311.1*T/8000);
w16=2*sin(2*pi*261.6*H/8000);

song=[w1,w2,w3,w4,w5,w6,w7,w8,w9,w10,w11,w12,w13,w14,w15,w16]; %composed melody
%wavwrite(song,'starwars'); %creates a wave file if wished
sound(song) %plays melody

figure
fs=8000;
sgram(song,fs,90,'wlen',round(20/200*fs)); %plots spectrogram with LTFAT
axis([0 7.495 0 1000]);
shg

%% further observations
figure
test=[w1,w2]; %discontinuity problem,
plot(test) %can be solved by shifting or smoothing
axis([7900 8100 -2.5 2.5]); xlabel('Time'); ylabel('Amplitude')

figure
subplot(211) %comparison of frequencies of 1st and 6th note
plot(w1)
axis([0 1000 -2.5 2.5]); xlabel('Time'); ylabel('Amplitude')
subplot(212)
plot(w6)
axis([0 1000 -2.5 2.5]); xlabel('Time'); ylabel('Amplitude')
All of the following codes are based on the toolboxes LTFAT and
StrucAudioToolbox (http://homepage.univie.ac.at/monika.doerfler/StrucAudio.html).
Most of the MATLAB™ files perform the respective operation(s) with the given presettings.
%% LASSO

% get the noisy signal:
[sig, fs] = wavread('CleanNB.wav');
sig_noisy = sig + 0.01*randn(length(sig),1);

% settings for Lasso
[settings]=thresholding('settings');
settings.shrink.expo = 1;
settings.shrink.neigh = 1;
settings.shrink.center = [1 1];

% changeable coefficients
settings.shrink.lambda = 0.01;
settings.trans.M = 1024;
settings.trans.shift = 256;

settings.iter.maxit = 1; % set number of iterations
settings.iter.disp = 1; % display relative error

% denoising
G = trafo(sig, settings.trans); % clean analysis coefficients
Gn = trafo(sig_noisy, settings.trans); % noisy analysis coefficients

[sig_rec, Gs] = thresholding(sig_noisy, settings);

subplot(131);
imagesc(20*log10(abs(G))); axis off; title('Clean');
subplot(132);
imagesc(20*log10(abs(Gn))); axis off; title('Noisy');
subplot(133);
imagesc(20*log10(abs(Gs))); axis off; title('Denoised');
shg
%% PARTY NOISE

% get the noisy signal:
[sig, fs] = wavread('giotto.wav');

[settings]=thresholding('settings');
settings.shrink.expo = 1;
settings.shrink.lambda = 0.003;
settings.shrink.type='gl';
settings.shrink.glabel='frequency'; %'time' causes bad results

% disp
settings.iter.maxit = 5; % set the number of iterations here
settings.iter.disp = 1; % display the relative error
%settings.shrink

% thresholding algorithm
G = trafo(sig, settings.trans); % original analysis coefficients

[sig_rec, Gs] = thresholding(sig, settings);

subplot(211);
imagesc(20*log10(abs(G))); axis off; title('Original');
subplot(212);
imagesc(20*log10(abs(Gs))); axis off; title('Denoised');
shg
%% MULTILAYER DECOMPOSITION

% get the noisy signal:
[sig, fs] = wavread('spieluhr.wav');
sig = sig + 0.01*randn(length(sig),1);

% settings for tonal layer:
[set_ton] = thresholding('settings', 'transtype', 'gab', 'M', 4096);
set_ton.shrink.neigh = ones(1,12); % persistence in time
set_ton.shrink.center = [1, 10]; % center point non-symmetric
set_ton.shrink.lambda = 0.01; % adjust tonal threshold

% settings for transient layer:
[set_tra] = thresholding('settings', 'transtype', 'gab', 'M', 128);
set_tra.shrink.type = 'gl'; % group EW
set_tra.shrink.glabel = 'time'; % group labels in time
set_tra.shrink.neigh = ones(1,2); % some additional persistence in frequency
set_tra.shrink.center = [1,2]; % non-symmetry
set_tra.shrink.lambda = 0.0044; %0.0048; % adjust transient threshold

% perform multilayer decomposition:
[sig_ton, Gs_ton] = thresholding(sig, set_ton); % estimate tonal layer
[sig_tra, Gs_tra] = thresholding(sig - sig_ton, set_tra); % estimate transient layer from residual
sig_mult=sig_ton+sig_tra;

% plots:
[set_ana] = thresholding('settings'); % get 'neutral' analysis settings
G = trafo(sig, set_ana.trans); % get Gabor analysis coeffs with trafo.m
G_mult = trafo(sig_mult, set_ana.trans); % get Gabor analysis coeffs from both layers
subplot(2,2,1); imagesc(20*log10(abs(G))); axis off; title('Original');
subplot(2,2,2); imagesc(20*log10(abs(Gs_ton))); axis off; title('Tonal');
subplot(2,2,3); imagesc(20*log10(abs(Gs_tra))); axis off; title('Transient');
subplot(2,2,4); imagesc(20*log10(abs(G_mult))); axis off; title('Multilayer');
shg
Curriculum Vitae
Personal information
Name: Dominik Torsten Fuchs
Nationality: Austria
Education
1991-1995 Primary School Jagdgasse, Vienna
1997-2006 High School; Gymnasium Laaerberg, Vienna
Graduation June 2006
2012-2013 Programmer at Phonicscore GmbH
2007-2013 Study of mathematics at the University of Vienna