
Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple-Output Systems in the Wave Domain

Beiträge zur adaptiven Filterung für akustische “Multiple-Input/Multiple-Output” Systeme im Wellenbereich

Submitted to the Faculty of Engineering of the
Friedrich-Alexander-Universität Erlangen-Nürnberg

in fulfillment of the requirements for the degree

Doktor-Ingenieur

presented by

Martin Schneider

from Fulda


Approved as a dissertation by the Faculty of Engineering of the
Friedrich-Alexander-Universität Erlangen-Nürnberg

Date of the oral examination: 26 April 2016
Chairman of the doctoral committee: Prof. Dr. Peter Greil
Reviewers: Prof. Dr.-Ing. Walter Kellermann
Prof. Dr.-Ing. Sascha Spors


Acknowledgments

I would like to express my most sincere gratitude to all the people who accompanied me on my journey leading to this thesis.

First of all, I would like to thank Prof. Dr.-Ing. Walter Kellermann for giving me the opportunity to join his research group and to work under his supervision. His enduring support and patience allowed me to grow on both a personal and a professional level. Without the fruitful discussions with him and his valuable advice, this work would certainly not have been possible. Moreover, I would like to thank Prof. Dr.-Ing. Sascha Spors for dedicating a big chunk of his busy schedule to the review of this work. I also would like to thank Prof. Dr.-Ing. Wolfgang Gerstacker and Prof. Dr.-Ing. habil. Stefan Becker for showing interest in my work and for participating in the defense of my PhD thesis.

Moreover, I would like to thank Dr.-Ing. Sandra Brix and Prof. Dr.-Ing. Thomas Sporer of Fraunhofer IDMT for funding this work in a joint research project and for being such reliable project partners. In that context, special thanks go to Dr.-Ing. Andreas Frank for providing valuable and detailed feedback on various documents.

There are many wonderful people who made my stay at the “Lehrstuhl für Multimediakommunikation und Signalverarbeitung” (LMS) an unforgettable time of my life. Besides numerous interesting technical and non-technical discussions, I especially enjoyed the unique spirit at this place. Concerning my scientific work, I would like to thank Prof. Dr.-Ing. habil. Rudolf Rabenstein for inspiration and advice on various topics. The same holds for Christian Hofmann and Michael Bürger, who provided valuable feedback during the process of writing. Besides these scientific aspects, I would also like to thank Rüdiger Nagel for his technical support and Ute Hespelein for her indispensable help with the daily business and all administrative issues.

Finally, I would like to thank my family for being a constant source of motivation and support and for encouraging me to go my way, not only during the last years. The same holds for all of my friends, although they may not be aware of this.


Abstract

Recently emerging techniques like wave field synthesis (WFS) or Higher-Order Ambisonics (HOA) allow for high-quality spatial audio reproduction, which makes them candidates for the audio reproduction in future telepresence systems or interactive gaming environments with acoustic human-machine interfaces. In such scenarios, acoustic echo cancellation (AEC) will generally be necessary to remove the loudspeaker echoes in the recorded microphone signals before further processing. Moreover, the reproduction quality of WFS or HOA can be improved by adaptive pre-equalization of the loudspeaker signals, as facilitated by listening room equalization (LRE). However, AEC and LRE require adaptive filters, and the large number of reproduction channels of WFS and HOA implies major computational and algorithmic challenges for the implementation of adaptive filters. A technique called wave-domain adaptive filtering (WDAF) promises to master these challenges. However, the known literature is still far from providing sufficient insight to allow for a successful implementation of real-world systems.

This thesis is concerned with the further development of WDAF-based generic signal processing algorithms and acoustic models aiming at real-time, real-world implementations of AEC and LRE. As a prototypical scenario, an exemplary loudspeaker and microphone setup is considered for which the necessary wave-domain transforms are explicitly derived and analyzed. Thereby, the origins of the desirable wave-domain properties of the loudspeaker-enclosure-microphone system (LEMS) are explained.

For both AEC and LRE, it is necessary to identify an LEMS, while the computational demands of this task render a real-time implementation unrealistic if a large number of reproduction channels is to be considered without approximative models. The originally proposed approximative wave-domain LEMS model is generalized such that the number of degrees of freedom can be chosen to provide the maximum model accuracy under the applicable computational restrictions. Typical reproduction signals will often not allow a unique solution to the system identification problem for multichannel reproduction. A novel, rigorous, and in-depth analysis of this so-called nonuniqueness problem is conducted in this thesis. Furthermore, a wave-domain technique to improve the system identification when nonuniqueness occurs is presented. Unlike other known state-of-the-art solutions, this technique does not influence the reproduced signals.

For an implementation of adaptive filters in the wave domain, modified versions of well-known adaptation algorithms are derived, considering approximative models and improving system identification. The modified algorithms are based on the least mean squares (LMS) algorithm, the affine projection algorithm (APA), the recursive least squares (RLS) algorithm, or the generalized frequency-domain adaptive filtering (GFDAF) algorithm, which is identified as an approximation of the RLS algorithm. Additionally, a novel iterative algorithm for the determination of equalizers is presented. Experimental results support the claim of applicability of the considered approach. Moreover, a real-time wave-domain AEC demonstrator has been developed which facilitates AEC for 48 loudspeaker channels on a conventional personal computer.

On the other hand, the equalizer determination necessary for LRE is an inverse system identification problem. It is shown that the nonuniqueness problem also occurs for this task, while the properties of the LEMS can additionally cause nonuniqueness. In this thesis, a generalized wave-domain approximative equalizer structure is successfully applied, which potentially allows for a real-time implementation of LRE. The achievable LRE performance with an approximative wave-domain system is assessed by simulations.

Decisive problems for the real-world implementation could be solved or mitigated using results presented in this thesis. Nevertheless, other challenging research questions remain unanswered and will fuel future research in this area.


Kurzfassung

Verfahren zur Audiowiedergabe auf dem Stand der Technik, wie z.B. Wellenfeldsynthese (WFS) oder “Higher-Order Ambisonics” (HOA), erlauben eine hochqualitative Wiedergabe räumlicher akustischer Szenen. Das macht sie besonders attraktiv für die Integration in zukünftige Telepräsenzsysteme und in interaktive Spielumgebungen mit akustischer Mensch-Maschine-Schnittstelle. In beiden Fällen ist eine Kompensation akustischer Echos (“acoustic echo cancellation”, AEC) nötig, um das Lautsprecherecho aus den aufgenommenen Mikrofonsignalen zu entfernen, bevor diese weiterverarbeitet werden. Darüber hinaus kann die Wiedergabequalität von WFS und HOA durch eine Wiedergaberaumentzerrung (“listening room equalization”, LRE) verbessert werden, die durch eine adaptive Vorentzerrung der Lautsprechersignale erreicht wird. Wie die AEC erfordert auch die LRE adaptive Filter, wobei die typischerweise hohe Anzahl an Wiedergabekanälen für WFS und HOA eine Herausforderung für den Entwurf und die Implementierung von Adaptionsalgorithmen darstellt. Mit “wave-domain adaptive filtering” (WDAF) wurde ein Ansatz vorgestellt, mit dem diese Herausforderungen potenziell bewältigt werden können. Dennoch behandeln frühere Veröffentlichungen diesen Ansatz noch nicht in ausreichender Tiefe, um WDAF-AEC- und LRE-Systeme in der Praxis zu realisieren.

Diese Dissertation befasst sich mit der Weiterentwicklung von WDAF mit dem Ziel, eine praktische Implementierung von AEC und LRE zu erreichen. Es wird ein exemplarisches Lautsprecher-Raum-Mikrofon-System (“loudspeaker-enclosure-microphone system”, LEMS) betrachtet, für das die nötigen Transformationen in den Wellenbereich explizit abgeleitet und untersucht werden. Dabei werden neue Erkenntnisse über die Herkunft der vorteilhaften Eigenschaften des LEMSs im Wellenbereich gewonnen. Die Systemidentifikation ist eine Voraussetzung für die Implementierung von AEC und LRE, wobei der durch die große Anzahl an Wiedergabekanälen anfallende Rechenaufwand eine Echtzeitimplementierung ohne Näherungsmodelle unrealistisch erscheinen lässt. Das ursprünglich für WDAF vorgeschlagene Näherungsmodell für das LEMS wird in dieser Arbeit verallgemeinert, so dass die Modellgenauigkeit der verfügbaren Rechenleistung angepasst werden kann. Typische Wiedergabesignale erlauben oft keine eindeutige Identifikation des LEMSs. Dieses sogenannte Ambiguitätsproblem wird in einer bisher noch nicht bekannten Tiefe untersucht. Außerdem wird ein Ansatz zur Linderung dieses Problems vorgestellt, der im Gegensatz zu bekannten Lösungen keine Veränderung der Wiedergabesignale erfordert.

Für die Implementierung der adaptiven Filter werden verschiedene Adaptionsalgorithmen unter Berücksichtigung der Näherungsmodelle und der verbesserten Systemidentifikation im Wellenbereich modifiziert. Die modifizierten Algorithmen basieren jeweils auf dem “least mean squares”-Algorithmus (LMS-Algorithmus), dem “affine projection algorithm” (APA), dem “recursive least squares”-Algorithmus (RLS-Algorithmus) oder dem “generalized frequency-domain adaptive filtering”-Algorithmus (GFDAF-Algorithmus). Letzterer wird als Näherung des RLS-Algorithmus identifiziert. Darüber hinaus wird ein neuartiger iterativer Algorithmus zur Bestimmung der Entzerrer beschrieben. Evaluationsergebnisse belegen die Tauglichkeit des verfolgten Ansatzes. Darüber hinaus wurde ein Echtzeitdemonstrator für die AEC erstellt, der 48 Lautsprechersignale auf einem gewöhnlichen Personal-Computer verarbeiten kann.

Die Bestimmung der Entzerrer für die LRE ist ein inverses Identifikationsproblem. Es wird gezeigt, dass das Ambiguitätsproblem auch für die Bestimmung der Entzerrer existiert, wobei Mehrdeutigkeit auch aufgrund des betrachteten LEMS entstehen kann. Außerdem wird gezeigt, dass optimale Entzerrer mit einer vereinfachten Struktur angenähert werden können, die potentiell eine Echtzeitimplementierung erlaubt. Die Wirksamkeit dieses Ansatzes wird durch experimentelle Ergebnisse belegt.

Damit konnten in dieser Arbeit entscheidende Probleme für die Realisierung von Vielkanal-AEC und LRE durch den vorgestellten Ansatz gelöst werden. Dennoch bleiben einige Herausforderungen bestehen, die weitere Forschung auf diesem Gebiet motivieren.


Contents

1 Introduction

2 Wave-Domain Model for Acoustic Multiple-Input/Multiple-Output Systems
  2.1 Acoustic Wave Fields
    2.1.1 Sound propagation in air
    2.1.2 Solutions of the homogeneous wave equation
    2.1.3 Solutions of the inhomogeneous wave equation
    2.1.4 The Kirchhoff-Helmholtz integral
    2.1.5 The image source model
  2.2 The Spatial Fourier Transform and Wave Field Decompositions
    2.2.1 Plane wave decomposition
    2.2.2 Cylindrical harmonics decomposition
    2.2.3 Spherical harmonics decomposition
    2.2.4 Dimensionality and degrees of freedom of wave-field decompositions
  2.3 Wave Fields in Acoustic Multiple-Input/Multiple-Output Systems
    2.3.1 Spatial sampling by loudspeakers and microphones
    2.3.2 Description of loudspeaker-enclosure-microphone systems in the wave domain
    2.3.3 Wave-domain model for loudspeaker-enclosure-microphone systems
    2.3.4 Spatial audio reproduction
    2.3.5 Equalization of reproduced wave fields
  2.4 Derivation of Wave-Domain Transforms
    2.4.1 Transforms based on circular harmonics
    2.4.2 Wave-domain properties of a loudspeaker-enclosure-microphone system
    2.4.3 Influence of array position
  2.5 Discrete-Time Signal Processing for Continuous-Time Quantities
    2.5.1 Representation and equalization of continuous frequency responses by discrete-time filters
    2.5.2 Discrete-time representation of the loudspeaker-enclosure-microphone system
    2.5.3 Discrete-time representation of wave-domain transforms
    2.5.4 Properties of the reproduction signals

3 Wave-Domain System Identification
  3.1 Signal Model and Task Definition
    3.1.1 Acoustic echo cancellation as an example of system identification
    3.1.2 Matrix and vector notation for system identification
  3.2 Approximative Wave-Domain System Model
  3.3 The Nonuniqueness Problem
    3.3.1 Origin and consequences for system identification
    3.3.2 Nonuniqueness for limited models
    3.3.3 Consequences for applications relying on system identification
    3.3.4 Countermeasures
    3.3.5 Cost-guided wave-domain system identification
  3.4 Adaptive Filtering Algorithms
    3.4.1 Derivation of the block least mean squares algorithm
    3.4.2 Derivation of the normalized least mean square and the affine projection algorithms
    3.4.3 Derivation of the multichannel recursive least squares algorithm
    3.4.4 Derivation of the generalized frequency-domain adaptive filtering algorithm
    3.4.5 Choice of algorithmic parameters
    3.4.6 Summary of adaptation algorithms
  3.5 Experimental Results
    3.5.1 Evaluation setup for acoustic echo cancellation
    3.5.2 Acoustic echo cancellation using approximative loudspeaker-enclosure-microphone system models
    3.5.3 Approximative loudspeaker-enclosure-microphone system models under sub-optimal conditions
    3.5.4 Cost-guided acoustic echo cancellation in underdetermined scenarios
  3.6 Real-Time Implementation of Acoustic Echo Cancellation

4 Wave-Domain Equalization of Reproduced Acoustic Scenes
  4.1 Signal Model and Task Definition
  4.2 Scalable Wave-Domain Equalizer Structure
  4.3 Uniqueness of Equalizers
  4.4 Determining Equalizers for Estimated Loudspeaker-Enclosure-Microphone Systems
  4.5 Adaptation Algorithms for System Equalization
    4.5.1 The filtered-x structure
    4.5.2 Application to adaptation algorithms
    4.5.3 The iterative discrete-Fourier-transform-domain inversion algorithm
    4.5.4 Summary of adaptation algorithms
  4.6 Experimental Results
    4.6.1 Evaluation scenario
    4.6.2 Considered measures
    4.6.3 Time-varying scenarios
    4.6.4 Stationary scenarios
    4.6.5 Evaluation summary
  4.7 Implementation of Listening Room Equalization

5 Summary and Conclusions

A Transforms Based on Spherical Harmonics
  A.1 Microphone Signal Transform
  A.2 Loudspeaker Signal Transform

B Correction for a Previous Derivation of the Generalized Frequency-Domain Adaptive Filtering Algorithm

C Influence of Algorithm Parameters on Multiple-Input Multiple-Output System Identification
  C.1 Step Size
  C.2 Regularization
  C.3 "Forgetting Factor" of the Generalized Frequency-Domain Adaptive Filtering Algorithm
  C.4 Length of Considered Microphone Signals

D Implementation of Acoustic Echo Cancellation and Listening Room Equalization
  D.1 Solving Systems of Linear Equations using the Cholesky Decomposition
  D.2 Adaptive Filters for System Identification and Acoustic Echo Cancellation
    D.2.1 Least mean squares algorithm
    D.2.2 Affine projection algorithm
    D.2.3 Generalized frequency-domain adaptive filtering algorithm
  D.3 Wave-Domain System Model
  D.4 Adaptive Equalizers
  D.5 Increasing Efficiency for Real-Valued Loudspeaker and Microphone Signals

Abbreviations and Acronyms

Mathematical Symbols, Operations, and Conventions

List of Figures

Bibliography


1 Introduction

In audio reproduction, multiple loudspeakers are used to provide a spatial impression of an acoustic scene to the listener. The ultimate goal is a perfect wave field reproduction of the desired acoustic scene within the considered listening space, where the quality of the reproduction can be improved by using an increased number of loudspeakers. Starting with the first stereophonic reproduction in 1881 [Sci81], techniques evolved [Blu37] towards the nowadays widespread surround sound systems utilizing six or eight loudspeaker channels even in living room scenarios. Examples of such systems are given by the products of the companies DTS (formerly Digital Theater Systems) or Dolby, while recent multimedia file formats envisage the use of 24 channels [HNO+08]. However, for the reproduction approaches mentioned so far, an optimal perceptual quality will only be achieved within the so-called sweet spot [The00]. To achieve a more immersive experience for the listener, this region should be enlarged. Recently emerging techniques like wave field synthesis (WFS) [Ber88, BDV93] or Higher-Order Ambisonics (HOA) [Dan03] accomplish this by utilizing several tens to hundreds of loudspeakers. By enlarging the effectively usable listening space, not only the direction but also the distance of a virtual source becomes perceptually relevant. With such a highly detailed spatial reproduction and the possibility for the listener to move freely inside a sufficiently large listening space, the user experience can no longer be improved by only considering the mere reproduction of an acoustic scene through loudspeakers. Visions of future audio reproduction systems include not only a precise spatial sound reproduction [SWR+13], but also the use of those systems in duplex communication scenarios as well as the removal of unwanted influences of the listening room [HCB11, BSF+13]. To this end, a microphone array must be placed in the listening room in addition to the already present loudspeaker array to capture the local sound sources and thus also be able to provide reference signals for equalizing the room acoustics.

In communication scenarios, acoustic echo cancellation (AEC) systems aim at removing the unwanted echoes of the loudspeaker signals from the microphone signals. In duplex communication scenarios, these echoes of the far-end party's own utterances would otherwise disturb the far-end party. Moreover, in immersive gaming or simulation environments, an automatic speech recognizer of an interactive acoustic interface would exhibit a poor recognition rate when loudspeaker echoes are not removed [MSM+09].

Since the loudspeaker signals for WFS or HOA are typically optimized for a reproduction under free-field conditions, both techniques are sensitive to the reverberation of the listening room. The listening environment must typically be acoustically treated to reduce its reverberation to an acceptable amount unless other measures are taken. However, such a treatment is in general expensive and impractical for some surfaces like, e.g., windows. A listening room equalization (LRE) system allows relaxing the requirements for the acoustic treatment of a listening room hosting WFS or HOA systems. To this end, the wave field reproduced in the listening area is measured by a microphone array, and pre-equalizers for the loudspeaker signals are determined such that the unwanted influence of the listening room is ideally compensated.

Figure 1.1: Signal model of an AEC system

For the realization of AEC and LRE systems, adaptive filters must be implemented, where the intended large number of reproduction channels with WFS or HOA represents a serious challenge due to computational and algorithmic reasons. For both AEC and LRE, the available loudspeaker and microphone signals are observed to identify the loudspeaker-enclosure-microphone system (LEMS), i.e., to estimate the impulse responses of all loudspeaker-enclosure-microphone paths. These estimates are referred to as the estimated LEMS in the following.

The signal model of an AEC system is shown in Fig. 1.1, where the loudspeaker signals $\mathbf{x}(k)$ are fed to the LEMS $\mathbf{H}$, which should be estimated by an adaptive filter $\hat{\mathbf{H}}(n)$ that, in the ideal case, models $\mathbf{H}$ perfectly. The loudspeaker echo estimate $\hat{\mathbf{d}}(k)$ obtained at the output of $\hat{\mathbf{H}}(n)$ is then subtracted from the actual microphone signals $\mathbf{d}(k)$ to cancel the loudspeaker echo signals, ideally preserving the signals of the local acoustic scene $\mathbf{n}(k)$ perfectly.
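To make the adaptation loop behind Fig. 1.1 concrete, the following sketch implements a simple sample-wise normalized-LMS echo canceller in the time domain. It is an illustrative simplification (all function names, array dimensions, and parameter values are chosen here for illustration only) and not the wave-domain algorithms derived in Chapter 3.

```python
import numpy as np

def nlms_aec(x, d, L=256, mu=0.5, eps=1e-8):
    """Toy multichannel AEC: adapt FIR filters h_hat so that the echo estimate
    d_hat approximates the microphone signals d.
    x: loudspeaker signals, shape (N_L, T); d: microphone signals, shape (N_M, T).
    Returns the echo-cancelled signals e and the filter estimate h_hat."""
    n_ls, T = x.shape
    n_mic = d.shape[0]
    h_hat = np.zeros((n_mic, n_ls, L))   # adaptive LEMS estimate
    x_buf = np.zeros((n_ls, L))          # most recent L loudspeaker samples
    e = np.zeros_like(d)
    for k in range(T):
        x_buf = np.roll(x_buf, 1, axis=1)
        x_buf[:, 0] = x[:, k]
        d_hat = np.einsum('mnl,nl->m', h_hat, x_buf)   # echo estimate per microphone
        e[:, k] = d[:, k] - d_hat                      # residual after cancellation
        norm = np.sum(x_buf ** 2) + eps
        h_hat += mu * e[:, k][:, None, None] * x_buf[None, :, :] / norm
    return e, h_hat
```

With uncorrelated loudspeaker signals the residual decays as the filters converge; with strongly cross-correlated loudspeaker signals the echo may still be cancelled while h_hat does not approach the true LEMS, which is exactly the nonuniqueness problem discussed below.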

In Fig. 1.2, a signal model for LRE is shown, where the pre-equalizers $\mathbf{G}(n)$ are determined such that the cascade of the pre-equalizers and the actual LEMS $\mathbf{H}$ is ideally equal to a desired system response $\mathbf{H}_0$. The desired system response is often chosen to be the free-field impulse response between loudspeakers and microphones, since free-field conditions are typically also assumed for determining the loudspeaker signals for WFS or HOA. Since the LEMS $\mathbf{H}$ is unknown in real-world scenarios, the pre-equalizers are determined for the estimate $\hat{\mathbf{H}}(n)$, which has to be obtained by observing the microphone signals $\mathbf{d}(k)$ and the equalized loudspeaker signals $\mathbf{y}(k)$ instead of the original loudspeaker signals $\mathbf{x}(k)$. For LRE, it is typically assumed that an equalization at the microphone positions also implies a sufficient equalization at the listener position.



Figure 1.2: Signal model of an LRE system
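Stated compactly (a schematic formulation only; the precise signal model follows in Chapter 4), the equalizer determination underlying Fig. 1.2 can be summarized as a minimization of the mismatch between the equalized LEMS and the desired response,
\[
\hat{\mathbf{G}} = \arg\min_{\mathbf{G}} \; \mathrm{E}\!\left\{ \left\| \left( \mathbf{H}\,\mathbf{G} - \mathbf{H}_0 \right)\mathbf{x}(k) \right\|_2^2 \right\},
\]
where $\mathbf{H}$, $\mathbf{G}$, and $\mathbf{H}_0$ denote matrix representations of the LEMS, the pre-equalizers, and the desired system response, respectively, and where in practice $\mathbf{H}$ has to be replaced by its estimate $\hat{\mathbf{H}}(n)$.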

When considering typical reproduced acoustic scenes, the individual loudspeaker signals exhibit a strong mutual cross-correlation. This renders the task of system identification with an increasing number of loudspeaker channels increasingly challenging for various reasons: Typically, $\hat{\mathbf{H}}(n)$ is optimized for echo cancellation, where the correlation properties of the loudspeaker signals allow for an optimal echo cancellation without requiring a unique LEMS estimate. This implies that even a perfect echo cancellation does not necessarily lead to a good system identification; this is called the nonuniqueness problem, which is already known from stereophonic AEC [BMS98]. In such scenarios, the estimated LEMS is a member of an unbounded set determined by the correlation properties of the loudspeaker signals. Hence, a change of the correlation properties will typically invalidate a previously optimum solution, and applications relying on system identification will suffer.
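A minimal two-channel illustration (in the spirit of the stereophonic analysis in [BMS98]): if the second loudspeaker signal is derived from the first by a filter $g$, i.e., $x_2(k) = g(k) * x_1(k)$, a single microphone observes
\[
d(k) = h_1(k) * x_1(k) + h_2(k) * x_2(k) = \bigl( h_1(k) + h_2(k) * g(k) \bigr) * x_1(k),
\]
so that every estimate pair $(\hat{h}_1, \hat{h}_2)$ fulfilling $\hat{h}_1 + \hat{h}_2 * g = h_1 + h_2 * g$ cancels the echo perfectly without identifying $h_1$ and $h_2$ individually; as soon as $g$ changes, the cancellation breaks down.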

Even when the individual loudspeaker signals theoretically allow for a perfect identification, their strong cross-correlation can preclude the successful application of computationally inexpensive adaptation algorithms. Considering a conservative estimate, the computational complexity of efficient suitable algorithms grows at least quadratically with the number of reproduction channels, which renders the computational demands for system identification another obstacle.

Moreover, the determination of pre-equalizers requires a computational complexity that is proportional to at least the fourth power of the number of reproduction channels, although the reproduced loudspeaker signals do not necessarily influence the algorithms used for this purpose. In this context, the nature of the LEMS's system response itself makes the determination of the pre-equalizers difficult: As each loudspeaker is coupled to each microphone and vice versa, all of these similarly weighted couplings must be considered simultaneously in order to obtain optimal pre-equalizers.
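To put these growth rates into perspective, a rough back-of-the-envelope example using the 48 reproduction channels of the real-time demonstrator mentioned in the abstract: relative to a single-channel system, a cost growing quadratically with the channel count scales by roughly $48^2 = 2304$, and a cost growing with the fourth power by roughly $48^4 \approx 5.3\cdot 10^6$.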

In 2004, a technique called wave-domain adaptive filtering (WDAF) was proposed [BSK04, SBR04], which reduced the computational demands for system identification and equalizer determination drastically by using an approximative model for the LEMS. The original idea behind WDAF is that signal transforms approximate an eigenvalue decomposition of the matrix representing the LEMS, resulting in a strictly diagonal matrix describing the LEMS in the wave domain. This is different from conventional LEMS models in the point-to-point domain, where each individual loudspeaker-enclosure-microphone path has to be modeled. Unlike other transform-domain approaches [HBS10b, SBR06], WDAF achieves the desirable transform-domain LEMS properties using an acoustic model based on solutions of the wave equation. Hence, WDAF exploits knowledge about the transducer positions. The diagonal model proposed in [BSK04, SBR04] provides only a limited number of degrees of freedom such that all model coefficients can always be uniquely determined and the nonuniqueness problem does not occur.
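Schematically (using generic transform matrices whose symbols are chosen here for illustration only, not the notation introduced later): with transforms $\mathbf{T}_{\mathrm{m}}$ and $\mathbf{T}_{\mathrm{l}}$ applied to the microphone and loudspeaker signals, respectively, the wave-domain representation of the LEMS,
\[
\tilde{\mathbf{H}} = \mathbf{T}_{\mathrm{m}}\, \mathbf{H}\, \mathbf{T}_{\mathrm{l}}^{-1},
\]
is approximately diagonal, so that only the dominant (in the strictest model, only the diagonal) couplings have to be adapted instead of every individual loudspeaker-enclosure-microphone path.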

However, the wave-domain LEMS model presented in [BSK04, SBR04] suffers from shortcomings in many practically relevant reproduction scenarios. Hence, in this thesis the WDAF approach is generalized and restrictions on the wave-domain LEMS model are imposed in a more flexible manner. Additionally, it is proposed to exploit the wave-domain properties of the LEMS even further without restricting the LEMS model. These contributions seem to constitute important steps towards real-world implementations of WDAF-based AEC and LRE systems that in turn are suitable for many practically relevant scenarios. Moreover, the framework necessary for WDAF-based AEC and LRE is described in a comprehensive and explicit manner for the first time. This includes formulations of the wave-domain transforms and specially tailored adaptation algorithms. While this framework description already provides practicable solutions for various application scenarios, other parts of this thesis are intended to help and inspire the further development of WDAF and multiple-input multiple-output (MIMO) adaptive filtering in general. The predominant interest in the real-world application to AEC and LRE did not allow for treating all other aspects in similar detail. In the following, the contents of the individual chapters of this thesis are briefly summarized, where necessary preceded by an introduction to the treated problems and the state of the art.

In Chapter 2, the fundamentals of acoustics for the wave-domain LEMS models are discussed. First, a short review of the wave equation and the resulting wave-field decompositions is given, before the relation between a wave field excited within an LEMS and the considered signal quantities is established. This has not been documented previously, since WDAF has often been interpreted as an approximation of eigenspace adaptive filtering [SBRH07, SBR06]. Unlike many previous publications [BSK04, SBR04, PCRP11], the transforms presented in [SK12b] rely only on sound pressure microphone signals. These transforms are also revisited in this chapter due to their importance for the further derivations. In doing so, previous publications [BSK04, SBR04, SK11, SK12b] are complemented by an analysis of the explicitly formulated transforms. At the end of the chapter, the discrete-time representations of the continuous-frequency representations from the preceding considerations are explained.

System identification is treated in Chapter 3, where AEC is considered as an obvious application. Previously, single-channel AEC has been extensively analyzed and evaluated for practically relevant scenarios [SB80, Kel88, BGM+01]. In multichannel AEC the problems already inherent to the single-channel case become even more crucial, notably the slow convergence of the adaptation algorithms. While the increase of the computational complexity is an obvious consequence of considering multiple rather than one channel, the algorithmically much more challenging nonuniqueness problem is only given in the multichannel case. In this thesis, both the nonuniqueness problem and the need to keep the computational complexity low are addressed.

Although the application of AEC for stereophonic reproduction is already well-investigated [SMH95, BMS98], the implementations were only considering a relatively low number of loudspeaker channels [SM95, BBK03]. In fact, a large share of the literature on stereophonic two-channel AEC claims the term "multichannel", which obscures the fact that considerations of LEMSs with more than 5 loudspeaker channels are rarely found [HBS10b]. This holds especially for remedies against the nonuniqueness problem, which were mainly evaluated for two channels [SMH95, BMS98, GT98, GE98, MHB01, WWJ12] or five channels [HBK07].

The increased computational effort fueled the research for fast implementations of previously presented adaptation algorithms. In this context, [BM00, Shy92, GT95] represent some examples among others. Still, the large computational demands when considering several tens of loudspeakers render a real-time implementation rather challenging [SSK12]. This thesis is focused on using the generalized frequency-domain adaptive filtering (GFDAF) algorithm, which provides a fast and robust convergence combined with moderate computational demands. Here, a computational improvement of this algorithm itself is not intended. Instead, computational demands are reduced by applying approximative LEMS models [BSK04, SK11]. The approximative LEMS model proposed in [BSK04, SBR04] has been generalized in [SK11] and will be used in this thesis.

After discussing the task of system identification and the signal model for wave-domain system identification, the model presented in [SK11] is revisited to prepare an analysis of the nonuniqueness problem for approximative models. This complements past analyses [SMH95, BMS98, HBC06] with previously undocumented findings. As a remedy for the nonuniqueness problem, it is proposed to modify the cost functions of the adaptation algorithms such that the estimated LEMS reflects some basic wave-domain properties of the true LEMS. This so-called cost-guided system identification is also described in [SK16b] and allows for estimating an LEMS more accurately than other known approaches. Unlike most state-of-the-art solutions [SMH95, BMS98, GT98, GE98, MHB01, WWJ12, HBK07], the loudspeaker signals will not be influenced by this approach, which precludes any degradation of the reproduction quality.

To implement approximative LEMS models and cost-guided system identification, modifications to well-known adaptation algorithms are presented. The algorithms under consideration are the least mean squares (LMS) algorithm, the affine projection algorithm (APA), the recursive least squares (RLS) algorithm, and the GFDAF algorithm. The derivation identifies the GFDAF algorithm as an approximation of the RLS algorithm and establishes a link between the regularization of the RLS algorithm and the LMS algorithm. These findings are also reported in [SK16a], where the APA and the cost-guided system identification have been disregarded. To conclude this chapter, evaluation results for wave-domain AEC and system identification are presented.

The application of WDAF to LRE is discussed in Chapter 4, where it is shown how the concepts of approximative wave-domain filter structures and cost-guided filter adaptation can also be applied. Typically, the task of LRE is reduced to determining the equalizers for a given LEMS, while the task of simultaneously identifying and equalizing an LEMS is rarely considered in the literature [TZA14]. This runs contrary to the fact that the LEMS properties can vary due to a change in the room temperature [OYS+99], but can be attributed to the fact that LRE based on an estimated LEMS is considered to be very challenging [GKMK08].

The inability of single-channel approaches to achieve equalization in an extended area [Mou85] motivated using MIMO equalizers [NHE92, NOBH95, LGF05]. Many contributions propose algorithms for the equalizer determination while disregarding a model of the reproduced wave field [LGF05, KNHOB98, MMK10, Bou03, SK12c]. On the other hand, approaches considering a model of the reproduced wave field are often more closely related to spatial audio reproduction in reverberant environments rather than to determining equalizers [BA05].

When WDAF was first presented, LRE was one of the envisaged applications, since the approximative LEMS model implied an approximative equalizer structure [SBR04]. This equalizer structure was generalized in [SK12a], where the approach could not be described in detail due to space restrictions. Hence, a rigorous and detailed description of this approach is presented in this thesis.

Moreover, it is shown that the nonuniqueness problem is also encountered in the determination of the equalizers, which is documented in this thesis for the first time. Since the problem of LRE is often reduced to determining equalizers for a known or previously estimated LEMS [BA05, LGF05, NOBH95], the nonuniqueness due to the properties of the reproduction signals is not seen as relevant in these publications. The relation between the tasks of equalizing a reproduced scene and of determining equalizers for an estimated LEMS is discussed later in this chapter. This includes a description of how the adaptation algorithms derived in Chapter 3 can be used for equalizer determination and, additionally, a rigorous derivation of the iterative DFT-domain inversion (IDI) algorithm that extends its first description in [SK12c]. All adaptation algorithms described in Chapter 4 are presented in variants that can be applied to approximative wave-domain equalizer structures and to cost-guided equalizer determination. This chapter is concluded by an evaluation of wave-domain LRE using the models and methods presented before. This evaluation also considers an LRE system that optimizes equalizers for a simultaneously estimated LEMS, a scenario which is considered to be very challenging and therefore often avoided in the literature. Finally, this thesis is concluded in Chapter 5.

Note that an errata sheet and a Portable Document Format (PDF) version of this thesis are provided at http://martinschneider.name/phd-thesis/.


2 Wave-Domain Model for Acoustic Multiple-Input/Multiple-Output Systems

In this chapter, the acoustic loudspeaker-enclosure-microphone system (LEMS) model is introduced as the basis for wave-domain adaptive filtering (WDAF). To this end, the description of acoustic wave fields excited within an LEMS is treated in Sec. 2.1 by presenting solutions of the wave equation. In Sec. 2.2, the spatial Fourier transform is used to derive wave field decompositions that are based on solutions of the homogeneous wave equation presented in Sec. 2.1. These wave field decompositions form the basis of the transforms to the wave domain, which are derived later. In Sec. 2.3, the modeling of a multiple-input multiple-output (MIMO) LEMS in the wave domain is discussed. The main aspect of this chapter is discussed in Sec. 2.4, where transforms to the wave domain are derived such that loudspeaker and microphone signals can be related to the wave field excited in an LEMS. While the derivations in this chapter are formulated in the continuous frequency domain for generality and notational convenience, a discrete-time representation of the transforms and system models is presented in Sec. 2.5 to ensure compatibility with the following chapters.

2.1 Acoustic Wave Fields

In this section, the physical fundamentals of acoustic wave fields are briefly presented to lay the foundation for the wave-domain LEMS model presented later. To this end, the acoustic wave equation is derived in Sec. 2.1.1, while solutions of the homogeneous and the inhomogeneous wave equation are presented in Sections 2.1.2 and 2.1.3, respectively. The Kirchhoff-Helmholtz integral, as treated in Sec. 2.1.4, can be used to determine the sound pressure inside a source-free volume from the boundary conditions imposed on the enclosing surface. The wave field excited by sound sources in the presence of a reflecting surface can be described by the image source model, which is described in Sec. 2.1.5.

In this thesis, positions are equivalently described in various coordinate systems, such as Cartesian coordinates, cylindrical coordinates, or spherical coordinates:
\[
\vec{x} =
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
=
\begin{pmatrix} \varrho\cos(\alpha) \\ \varrho\sin(\alpha) \\ z \end{pmatrix}
=
\begin{pmatrix} r\cos(\alpha)\sin(\vartheta) \\ r\sin(\alpha)\sin(\vartheta) \\ r\cos(\vartheta) \end{pmatrix},
\tag{2.1}
\]


where $x$, $y$, and $z$ denote the respective Cartesian coordinates, $\alpha$ and $\varrho$ describe the azimuth and the radius in cylindrical and polar coordinates, respectively, and $\vartheta$ is the inclination angle, while $r$ is the distance to the origin of the coordinate system when using spherical coordinates. An alternative vector representation is given by
\[
\vec{x} = x\,\vec{e}_x + y\,\vec{e}_y + z\,\vec{e}_z
= \alpha\,\vec{e}_\alpha + \varrho\,\vec{e}_\varrho + z\,\vec{e}_z
= \alpha\,\vec{e}_\alpha + \vartheta\,\vec{e}_\vartheta + r\,\vec{e}_r,
\tag{2.2}
\]
where $\vec{e}_x$, $\vec{e}_y$, $\vec{e}_z$, $\vec{e}_\alpha$, $\vec{e}_\varrho$, $\vec{e}_\vartheta$, and $\vec{e}_r$ are unit vectors in the directions of the coordinates $x$, $y$, $z$, $\alpha$, $\varrho$, $\vartheta$, and $r$, respectively.
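As a quick worked example of (2.1): the Cartesian point $(x,y,z) = (1,1,0)$ corresponds to
\[
\varrho = r = \sqrt{2}, \qquad \alpha = \pi/4, \qquad \vartheta = \pi/2,
\]
illustrating that, in the $x$-$y$-plane, the cylindrical radius $\varrho$ and the spherical radius $r$ coincide.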

Similar to (2.1), the wave vector describing the propagation direction of a wave will be denoted by $\tilde{\vec{k}}$, where its components $\tilde{k}_x$, $\tilde{k}_y$, $\tilde{k}_z$, $\tilde{k}_\varrho$, $\tilde{k}_\alpha$, and $\tilde{k}_\vartheta$ correspond to $x$, $y$, $z$, $\varrho$, $\alpha$, and $\vartheta$, respectively, in the appropriate coordinate systems. Since this thesis mainly deals with transducer array geometries located in the $x$-$y$-plane, it is often not necessary to describe a problem in three dimensions. Whenever two dimensions are sufficient, the third coordinate is omitted and $z = 0$, $\vartheta = \pi/2$, $\tilde{k}_z = 0$, or $\tilde{k}_\vartheta = \pi/2$ is assumed. Note that in the $x$-$y$-plane $r$ is equal to $\varrho$ and, then, the representation in polar and spherical coordinates is identical.

2.1.1 Sound propagation in air

In this section, the acoustic wave equation is discussed for an ideal homogeneous isotropic gas, where state changes can be described as adiabatic processes and gravity forces can be neglected. Thus, the speed of sound $c$ and the equilibrium density of the medium $\rho_0$ are independent of time $t$ and position $\vec{x}$. Furthermore, it is assumed that the sound pressure $p(\vec{x},t)$ and the density fluctuations $\rho(\vec{x},t)$ are small compared to the local ambient pressure $p_0$ and the equilibrium density of the medium, $\rho_0$. These assumptions are typically fulfilled when considering the sound propagation in LEMSs [Spo05].

The sound pressure $p(\vec{x},t)$ describes a deviation from the local ambient pressure of the medium, such that the instantaneous pressure of the medium is given by $p_0 + p(\vec{x},t)$. Likewise, the instantaneous density of the medium is given by $\rho_0 + \rho(\vec{x},t)$. Following a derivation discussed in many textbooks [Bla00, Wil99], the sound pressure $p(\vec{x},t)$ and the particle velocity $\vec{v}(\vec{x},t)$ can be related to each other by the linearized continuity equation and the linearized momentum equation [Bla00]:

\[
\rho_0 \left\langle \nabla_{\vec{x}}\,,\,\vec{v}(\vec{x},t)\right\rangle + \frac{1}{c^2}\frac{\partial}{\partial t}\, p(\vec{x},t) = m(\vec{x},t),
\tag{2.3}
\]
\[
\rho_0 \frac{\partial}{\partial t}\,\vec{v}(\vec{x},t) + \nabla_{\vec{x}}\, p(\vec{x},t) = \vec{d}(\vec{x},t).
\tag{2.4}
\]

Here, the Nabla operator $\nabla_{\vec{x}}$ denotes the gradient if applied to a scalar-valued function and the divergence when applied to a vector-valued function in conjunction with a scalar product $\langle\cdot\,,\cdot\rangle$ between two vectors. The quantity $m(\vec{x},t)$ can be interpreted as a change of density over time in the infinitesimal volume portion, while $\vec{d}(\vec{x},t)$ describes a force applied per volume portion, which is (in contrast to $m(\vec{x},t)$) a vector-valued quantity. Both quantities are necessary to describe two fundamental source models: a monopole source (or point source) that radiates the sound pressure equally in all directions, and a dipole source, which exhibits a directional radiation pattern. This directional radiation pattern would also result for two monopole sources located at an infinitesimal distance to each other that are emitting signals with identical magnitude but inverted phase [Pie89]. The non-directional $m(\vec{x},t)$ can be used to describe a monopole source when sampled at a single position. On the other hand, $\vec{d}(\vec{x},t)$ describes a dipole source with both poles located on an axis in direction of the force vector.

Figure 2.1: One-dimensional illustration of the quantities relevant for the derivation of the wave equation

Figure 2.1 illustrates the meaning of the quantities introduced in (2.3) and (2.4) considering a one-dimensional example. The differential form of $(p(x+\mathrm{d}x,t) - p(x,t))/\mathrm{d}x$ can be expressed as $\nabla_{\vec{x}}\, p(\vec{x},t)$ and describes an acceleration force per volume. Likewise, the differential form of $(v(x+\mathrm{d}x,t) - v(x,t))/\mathrm{d}x$ can be expressed as $\langle\nabla_{\vec{x}}\,,\vec{v}(\vec{x},t)\rangle$ and describes a temporal frequency which can be related to density fluctuations of the medium. In this example, the functions $d(x,t)$ and $m(x,t)$ describe $\vec{d}(\vec{x},t)$ and $m(\vec{x},t)$, respectively, and are assumed to be constant within $\mathrm{d}x$.

Using (2.3) and (2.4), the acoustic wave equation can be derived for the sound pressure and the particle velocity:
\[
\nabla_{\vec{x}}^2\, p(\vec{x},t) - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\, p(\vec{x},t)
= \left\langle \nabla_{\vec{x}}\,,\,\vec{d}(\vec{x},t)\right\rangle - \frac{\partial}{\partial t}\, m(\vec{x},t),
\tag{2.5}
\]
\[
\nabla_{\vec{x}} \left\langle \nabla_{\vec{x}}\,,\,\vec{v}(\vec{x},t)\right\rangle - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\,\vec{v}(\vec{x},t)
= \frac{1}{\rho_0}\nabla_{\vec{x}}\, m(\vec{x},t) - \frac{1}{\rho_0 c^2}\frac{\partial}{\partial t}\,\vec{d}(\vec{x},t),
\tag{2.6}
\]

where (2.5) is obtained by applying the divergence operator to both sides of (2.4) and plugging the temporal derivative of (2.3) into the resulting equation. Similarly, (2.6) is obtained using the gradients of both sides of (2.3) and plugging the temporal derivative of (2.4) into the resulting equation. For $m(\vec{x},t) = 0$ and $\vec{d}(\vec{x},t) = \vec{0}$, (2.5) and (2.6) constitute the homogeneous wave equation. To describe the inhomogeneous wave equation, $m(\vec{x},t)$ and $\vec{d}(\vec{x},t)$ are set according to the properties of the considered source distribution. For the following derivations, a consideration of (2.5) is sufficient and (2.6) is only mentioned to emphasize the equivalence of wave-field descriptions in terms of sound pressure and sound velocity. Note that the wave equation also holds for other acoustic quantities, e.g., the density fluctuation in the medium or the acoustic potential [Pie89], which are not considered here because they are harder to access by intuition.
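As a brief check of the step described above, applying the divergence to (2.4) and the temporal derivative to (2.3) yields (arguments $(\vec{x},t)$ omitted for brevity)
\[
\rho_0 \frac{\partial}{\partial t}\left\langle \nabla_{\vec{x}}\,,\,\vec{v}\right\rangle + \nabla_{\vec{x}}^2\, p = \left\langle \nabla_{\vec{x}}\,,\,\vec{d}\right\rangle,
\qquad
\rho_0 \frac{\partial}{\partial t}\left\langle \nabla_{\vec{x}}\,,\,\vec{v}\right\rangle + \frac{1}{c^2}\frac{\partial^2}{\partial t^2}\, p = \frac{\partial}{\partial t}\, m,
\]
and subtracting the second equation from the first eliminates the particle velocity and directly gives (2.5).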

Using the Fourier transform, (2.5) can be written equivalently in the frequency domain as:
\[
\nabla_{\vec{x}}^2 P(\vec{x},\omega) + \tilde{k}^2 P(\vec{x},\omega)
= \left\langle \nabla_{\vec{x}}\,,\,\vec{D}(\vec{x},\omega)\right\rangle - \mathrm{j}\omega M(\vec{x},\omega)
\tag{2.7}
\]
\[
= Q_{\mathrm{D}}(\vec{x},\omega) - Q_{\mathrm{M}}(\vec{x},\omega),
\tag{2.8}
\]

where $P(\vec{x},\omega)$, $M(\vec{x},\omega)$, and $\vec{D}(\vec{x},\omega)$ are the frequency-domain representations of $p(\vec{x},t)$, $m(\vec{x},t)$, and $\vec{d}(\vec{x},t)$, respectively, $\omega$ is the angular frequency, $\mathrm{j}$ is used as the imaginary unit ($\mathrm{j}^2 = -1$), and
\[
\tilde{k} = \frac{\omega}{c}
\tag{2.9}
\]
represents the wave number. (Note that $\tilde{k}$ is a notational exception: it is not a wave-domain quantity but is nevertheless denoted by a tilde, a choice made to distinguish it from the discrete-time index $k$ used later.) Equation (2.7) represents the well-known Helmholtz equation, which will be in the focus of the following considerations. The quantities $Q_{\mathrm{D}}(\vec{x},\omega)$ and $Q_{\mathrm{M}}(\vec{x},\omega)$ allow for a more convenient notation in the following.

In this thesis, the temporal Fourier transform is defined by

\[
X(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-\mathrm{j}\omega t}\,\mathrm{d}t
\quad\Leftrightarrow\quad
x(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} X(\omega)\, e^{\mathrm{j}\omega t}\,\mathrm{d}\omega,
\tag{2.10}
\]

as commonly used in electrical engineering.

2.1.2 Solutions of the homogeneous wave equation

In this section, solutions of (2.7) with $m(\vec{x},t) = 0$ and $\vec{d}(\vec{x},t) = \vec{0}$ will be discussed, considering Cartesian, cylindrical, and spherical coordinates. Because the gradient operator assumes different forms depending on the used coordinate system, the corresponding fundamental solutions of the wave equation differ in the same way. The obtained frequency-domain solutions describe a complex-valued sound pressure in the time domain, such that they do not necessarily represent wave fields that could directly exist in the real-world time domain. The link to real-valued time-domain wave fields is established at the end of this section.



In Cartesian coordinates, the gradient has the form $\nabla_{\vec{x}} = \frac{\partial}{\partial x}\vec{e}_x + \frac{\partial}{\partial y}\vec{e}_y + \frac{\partial}{\partial z}\vec{e}_z$. Using the separation of variables approach, plane waves are obtained as fundamental solutions for the homogeneous wave equation in Cartesian coordinates [Wil99]:
\[
P^{(\mathrm{pw})}(\vec{x},\tilde{k}_x,\tilde{k}_y,\tilde{k}_z)
= e^{-\mathrm{j}\tilde{k}_x x}\, e^{-\mathrm{j}\tilde{k}_y y}\, e^{-\mathrm{j}\tilde{k}_z z}
= e^{-\mathrm{j}\left\langle \tilde{\vec{k}}\,,\,\vec{x}\right\rangle},
\tag{2.11}
\]

where the relation
\[
\tilde{k}^2 = \tilde{k}_x^2 + \tilde{k}_y^2 + \tilde{k}_z^2
\tag{2.12}
\]
must always be fulfilled. The wave vector $\tilde{\vec{k}}$ is given by
\[
\tilde{\vec{k}} = \left(\tilde{k}_x,\ \tilde{k}_y,\ \tilde{k}_z\right)^{\mathrm{T}},
\tag{2.13}
\]

where the direction of $\tilde{\vec{k}}$ also describes the traveling direction of the plane wave. Note that $\tilde{k}_x$, $\tilde{k}_y$, and $\tilde{k}_z$ are assumed to be real-valued, as this thesis is only covering propagating longitudinal waves.

The traveling direction of a single wave-field component at a given position can be determined by
\[
\vec{t}(\vec{x}) = \frac{\mathrm{j}\,\nabla_{\vec{x}} P(\vec{x},\omega)}{\tilde{k}\, P(\vec{x},\omega)},
\tag{2.14}
\]
where this equation also holds for the following solutions of the wave equation if not stated otherwise. The latter statement is also the reason why the term wave-field component was chosen, which captures plane waves in the same way as other fundamental solutions of the wave equation. For a plane wave, $\vec{t}(\vec{x})$ is independent of the position and given by
\[
\vec{t}(\vec{x}) = \frac{\tilde{\vec{k}}}{\tilde{k}}.
\tag{2.15}
\]
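This can be verified directly by inserting (2.11) into (2.14): since $\nabla_{\vec{x}}\, e^{-\mathrm{j}\langle\tilde{\vec{k}},\vec{x}\rangle} = -\mathrm{j}\,\tilde{\vec{k}}\, e^{-\mathrm{j}\langle\tilde{\vec{k}},\vec{x}\rangle}$,
\[
\vec{t}(\vec{x}) = \frac{\mathrm{j}\,\nabla_{\vec{x}} P(\vec{x},\omega)}{\tilde{k}\, P(\vec{x},\omega)}
= \frac{\mathrm{j}\,(-\mathrm{j}\,\tilde{\vec{k}})\, P(\vec{x},\omega)}{\tilde{k}\, P(\vec{x},\omega)}
= \frac{\tilde{\vec{k}}}{\tilde{k}},
\]
independent of $\vec{x}$, as stated in (2.15).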

An important property of plane waves is their shift-invariance with respect to any direction perpendicular to their traveling direction. Consequently, (2.11) can describe two-dimensional and one-dimensional wave fields, although this solution was obtained by solving the wave equation in three dimensions. A two-dimensional wave field in the $x$-$y$-plane can be described for $\tilde{k}_z = 0$, while a one-dimensional wave field in $x$-direction is described for $\tilde{k}_y = \tilde{k}_z = 0$. In the latter case, there are only two fundamental solutions: $\tilde{k}_x = \tilde{k}$ and $\tilde{k}_x = -\tilde{k}$, which can be interpreted as waves traveling in positive and negative $x$-direction (or traveling forward and backward), respectively. This is in accordance with the well-known d'Alembert solution of the one-dimensional wave equation [Kut09, D'A47].


In cylindrical coordinates, the gradient operator is represented by $\nabla_{\vec{x}} = \frac{\partial}{\partial \varrho}\vec{e}_\varrho + \frac{1}{\varrho}\frac{\partial}{\partial \alpha}\vec{e}_\alpha + \frac{\partial}{\partial z}\vec{e}_z$. A separation of the variables leads to the so-called cylindrical harmonics as a solution of (2.7) [Wil99]:
\[
P^{(\mathrm{cy})}_{\mathring{m}}(\vec{x},\tilde{k}_\varrho,\tilde{k}_z)
= H_{\mathring{m}}\!\left(\tilde{k}_\varrho\,\varrho\right) e^{-\mathrm{j}\tilde{k}_z z}\, e^{\mathrm{j}\mathring{m}\alpha},
\tag{2.16}
\]
where
\[
\tilde{k}^2 = \tilde{k}_\varrho^2 + \tilde{k}_z^2
\tag{2.17}
\]
and
\[
H_{\mathring{m}}(x) =
\begin{cases}
H^{(1)}_{\mathring{m}}(x) & \text{for } x \ge 0,\\[2pt]
H^{(2)}_{\mathring{m}}(-x) & \text{for } x < 0,
\end{cases}
\tag{2.18}
\]

with $H^{(1)}_{\mathring{m}}(x)$ and $H^{(2)}_{\mathring{m}}(x)$ representing Hankel functions of order $\mathring{m}$ and of first and second kind, respectively. The circle is used to identify $\mathring{m}$ as the cylindrical or circular mode order. Only real-valued arguments $x$ are considered, since imaginary arguments would describe non-propagating (evanescent) waves.

Definition (2.18) is chosen such that, considering the $z$-axis, (2.16) describes an incoming wave for $\tilde{k}_\varrho > 0$ and an outgoing wave for $\tilde{k}_\varrho < 0$. Hence, the sign of $\tilde{k}_\varrho$ defines the traveling direction of the cylindrical harmonic, which can be verified using (2.14). The definition (2.18) will be used to maintain consistency in the remainder of this thesis, as Hankel functions are not unambiguously defined for arguments on the negative real axis. For large arguments $x$ the following approximations can be used [OLBC10]:

\[
H^{(1)}_{\mathring{m}}(x) \approx \sqrt{\frac{2}{\pi x}}\; e^{\,\mathrm{j}\left(x - \frac{1}{2}\mathring{m}\pi - \frac{1}{4}\pi\right)},
\tag{2.19}
\]
\[
H^{(2)}_{\mathring{m}}(x) \approx \sqrt{\frac{2}{\pi x}}\; e^{-\mathrm{j}\left(x - \frac{1}{2}\mathring{m}\pi - \frac{1}{4}\pi\right)}.
\tag{2.20}
\]
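The quality of the large-argument approximations (2.19) and (2.20) can be checked numerically; the following short script (using SciPy's Hankel functions, with sample values chosen here only for illustration) compares the exact and asymptotic forms.

```python
import numpy as np
from scipy.special import hankel1

def hankel1_asymptotic(m, x):
    """Large-argument approximation (2.19) of the Hankel function of the first kind."""
    return np.sqrt(2.0 / (np.pi * x)) * np.exp(1j * (x - 0.5 * m * np.pi - 0.25 * np.pi))

for m in (0, 3):
    for x in (5.0, 50.0):
        exact = hankel1(m, x)
        approx = hankel1_asymptotic(m, x)
        rel_err = abs(exact - approx) / abs(exact)
        print(f"m={m}, x={x:5.1f}: relative error {rel_err:.2e}")
```

The relative error shrinks as $x$ grows, in line with the asymptotic nature of (2.19); the Hankel function of the second kind behaves analogously with (2.20).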

A very important special case of (2.16) is given by the free-field case, where an incoming wave is always accompanied by an outgoing wave. The resulting superposition of two cylindrical harmonics, which differ only by their sign in $\tilde{k}_\varrho$, cannot exhibit a singularity at the origin of the coordinate system and is given by
\[
\breve{P}^{(\mathrm{cy})}_{\mathring{m}}(\vec{x},\tilde{k}_\varrho,\tilde{k}_z)
= 2\, J_{\mathring{m}}\!\left(\left|\tilde{k}_\varrho\,\varrho\right|\right) e^{-\mathrm{j}\tilde{k}_z z}\, e^{\mathrm{j}\mathring{m}\alpha}.
\tag{2.21}
\]

Here, $J_{\mathring{m}}(|x|)$ denotes an ordinary Bessel function of the first kind, and $H_{\mathring{m}}(x) + H_{\mathring{m}}(-x) = 2 J_{\mathring{m}}(|x|)$ was exploited. The breve on top of $\breve{P}^{(\mathrm{cy})}_{\mathring{m}}(\vec{x},\tilde{k}_\varrho,\tilde{k}_z)$ is used to denote a solution valid in the free field. Note that applying (2.14) to a solution according to (2.21) is not valid, as a standing wave does not have a meaningful traveling direction.

If (2.16) describes a solution independent of the $z$-axis (i.e., $\tilde{k}_z = 0$), a solution in terms of so-called circular harmonics is obtained:
\[
P^{(\mathrm{ci})}_{\mathring{m}}(\vec{x},\tilde{k})
= H_{\mathring{m}}\!\left(\tilde{k}\varrho\right) e^{\mathrm{j}\mathring{m}\alpha},
\tag{2.22}
\]
\[
\breve{P}^{(\mathrm{ci})}_{\mathring{m}}(\vec{x},\tilde{k})
= 2\, J_{\mathring{m}}\!\left(\left|\tilde{k}\varrho\right|\right) e^{\mathrm{j}\mathring{m}\alpha}.
\tag{2.23}
\]
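For illustration, circular harmonics of the form (2.22) and (2.23) can be evaluated directly with SciPy; the sketch below samples a few low-order components on a circle (radius and frequency are hypothetical example values), roughly the kind of spatial sampling a circular microphone array performs.

```python
import numpy as np
from scipy.special import hankel1, jv

c = 343.0                    # speed of sound in m/s
f = 500.0                    # frequency in Hz (example value)
k = 2.0 * np.pi * f / c      # wave number, cf. (2.9)
rho = 1.5                    # radius of the sampling circle in m (example value)
alpha = np.linspace(0.0, 2.0 * np.pi, 32, endpoint=False)  # 32 sampling angles

for m in (0, 1, 2):
    # Component of the form (2.22), using the first-kind Hankel function
    # (the branch of (2.18) for positive arguments)
    p_hankel = hankel1(m, k * rho) * np.exp(1j * m * alpha)
    # Free-field (standing-wave) component of the form (2.23)
    p_freefield = 2.0 * jv(m, abs(k * rho)) * np.exp(1j * m * alpha)
    print(m, np.abs(p_hankel[0]), np.abs(p_freefield[0]))
```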

In spherical coordinates, the gradient operator is represented by $\nabla_{\vec{x}} = \frac{\partial}{\partial r}\vec{e}_r + \frac{1}{r\sin(\vartheta)}\frac{\partial}{\partial \alpha}\vec{e}_\alpha + \frac{1}{r}\frac{\partial}{\partial \vartheta}\vec{e}_\vartheta$. A separation of the variables leads to the so-called spherical harmonics as a solution of (2.7) [Wil99]:
\[
P^{(\mathrm{sp})}_{m,n}(\vec{x},\tilde{k}) = h_n\!\left(\tilde{k} r\right) Y_n^m(\vartheta,\alpha),
\tag{2.24}
\]
where
\[
Y_n^m(\vartheta,\alpha) = \sqrt{\frac{(2n+1)}{4\pi}\,\frac{(n-|m|)!}{(n+|m|)!}}\; P_n^{|m|}(\cos\vartheta)\, e^{\mathrm{j}m\alpha},
\tag{2.25}
\]

with the associated Legendre polynomials $P_n^{|m|}(x)$ of order $|m|$ and degree $n$ (cf. [Teu07]). Note that considering the absolute value of $m$ is important to achieve the same weight for positive and negative $m$ and because the associated Legendre polynomials are only defined for non-negative orders. These aspects are often ignored in the literature. Similar to (2.16), (2.24) uses the definition
\[
h_n(x) =
\begin{cases}
h^{(1)}_n(x) & \text{for } x \ge 0,\\[2pt]
h^{(2)}_n(-x) & \text{for } x < 0,
\end{cases}
\tag{2.26}
\]

where the argument x is assumed to be real-valued and the spherical Hankel functionsh (1)n (x) and h (2)

n (x) are defined by [OLBC10]

h (1)n (x) =

√π

2xH(1)n+ 1

2(x) , (2.27)

h (2)n (x) =

√π

2xH(2)n+ 1

2(x) . (2.28)

The traveling direction of the components can be determined by(2.14), where a positivesign of the argument hn

∼kr)

describes a wave traveling towards the coordinate originand a negative sign results in the opposite propagation direction. Again, in the free-fieldcase, (2.24) may not exhibit a singularity at r = 0, enforcing a solution according to

P(sp)m,n(~x,

∼k) = 2 jn

(∣∣∣∼kr∣∣∣)Y mn (ϑ, α), (2.29)

Page 26: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

14

where jn (|x|) is the spherical Bessel function of the first kind and order n. Like for (2.21),applying (2.14) to a solution according to (2.29) does not lead to a meaningful result.

The solutions of the wave equation above are given in the frequency domain, while theoriginal starting point for their derivation was the sound pressure in the continuous timedomain p(~x, t). However, a straightforward transform of those solutions back to the timedomain would result in a complex-valued sound pressure that could not exist in the realworld. Still, the relation to real-valued sound pressures can be established, as shown inthe following.

In the real-world, wave-field components carry source signals, which could be ignoredabove. To introduce the concept of source signals, two aspect have to be considered:First, solutions of homogeneous differential equations can be scaled arbitrarily. Second,linearity of the wave propagation is required for (2.5) and (2.7), which implies thatprocesses at distinct frequencies do no influence each other. Consequently, a wave-fieldcomponent carrying a source signal can be simply described by a frequency-dependentweighting with the spectrum of the respective source signal, which is denoted by Q(ω) inthe following. As this spectrum corresponds to a real-valued signal in the time domain,the equation

Q(ω) = Q∗(−ω) (2.30)

holds, where ·∗ denotes the conjugate complex. At the same time, it can be seen that(2.30)would not hold for any of the solutions presented above, while real-world sound pressuresare real-valued in principle. To fulfill this requirement, two instances of the same solutionhave to be considered for positive and negative ω, which are complex conjugate to eachother. This can be achieved by an appropriate choice of positive or negative wave-vectorcomponents or mode orders. An example is given for plane waves by

P(pw)R (~x,

∼kx,

∼ky,

∼kz) =

e−j⟨∼

k , ~x⟩

for ω ≥ 0,

e−j⟨−

∼k , ~x⟩

otherwise,(2.31)

where the finally resulting wave field is described by Q(ω)P (pw)R (~x,

∼kx,

∼ky,

∼kz) and can be

transformed back to a real-valued sound pressure the time domain using (2.10). Theprocedure for cylindrical, circular, or spherical harmonics is similar.

2.1.3 Solutions of the inhomogeneous wave equationIn this section, solutions of the inhomogeneous wave equation are discussed. Unlike so-lutions of the homogeneous wave equation, the solutions discussed here allow for the de-scription of certain acoustic sources within the considered volume. To describe monopolesources, (2.7) is considered, where M(~x, ω) (represented by QM(~x, ω)) is allowed to benon-zero. Another case of an inhomogeneous wave equation is given by ~D(~x, ω) 6= ~0 (rep-resented by QD(~x, ω)), which is necessary to describe dipole sources. Sound sources like,

Page 27: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.1 Acoustic Wave Fields 15

e. g., a loudspeaker will emit a source signal, which is described by the temporal frequencyspectrum Q(ω) in the following. This source signal will then be captured in QM(~x, ω) orQD(~x, ω).

Green’s functions

Green’s functions, here denoted by G(~x|~x0,∼k), can be used to determine the wave field of

an arbitrary monopole source distribution according to [Wil99]

P (~x, ω) =y

VQ

QM(~x0, ω)G(~x|~x0,∼k) dV ′Q, (2.32)

where VQ describes the entire volume in which the source distribution QM(~x, ω) expands,and V ′Q, an infinitesimal volume portion at position ~x0, is used for integration. Thisintegral can be interpreted as a spatial convolution integral. The actual Green’s functiondepends on the given boundary conditions and fulfills

∇2~xG(~x|~x0,

∼k) +

∼k2G(~x|~x0,

∼k) = δ3 (~x− ~x0) , (2.33)

when describing a single monopole sources at position ~x0. where the three-dimensionalDirac distribution δ3 (~x) is defined by the sifting property∫ ∞

−∞

∫ ∞−∞

∫ ∞−∞

f (~x) δ3 (~x0 − ~x) dx dy dz = f (~x0) (2.34)

and f (~x) is an arbitrary finite-valued function.In the free-field case the three-dimensional Green’s function is given by

G(~x|~x0,∼k) = 1

4πe−j

∼k‖~x−~x0‖2

‖~x− ~x0‖2. (2.35)

To describe reflections, the image source model can be used [AB79, Bor84] as will bedescribed in Sec. 2.1.5. Obviously, when superimposing multiple individual solutions ofthe homogeneous wave equation to(2.35),(2.33) is still fulfilled. Hence, a general definitionfor G(~x|~x0,

∼k) would also include a term to consider any other solution of the homogeneous

wave equation, which is omitted here, as it is not needed in the following.The time-domain representation of (2.35) is given by

g(~x|~x0, t) = 14π

δ1 (t− ‖~x− ~x0‖2 /c)‖~x− ~x0‖2

, (2.36)

identifying (2.35) as a delay and an attenuation, where the one-dimensional Dirac distri-bution is again defined by the sifting property∫ ∞

−∞f (x) δ1 (x0 − x) dx = f (x0) , (2.37)

Page 28: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

16

with f (x0) being an arbitrary finite-valued function at x = x0.In spherical coordinates, a position in the source distribution is represented by ~x0 =

α0~eα + ϑ0~eϑ + r0~er. Considering this coordinate system, the free-field Green’s function isgiven by [Teu07, Wil99]

G(~x|~x0,∼k) = −j

∼k∞∑n=0

n∑m=−n

jn(∣∣∣∼kr∣∣∣) (Y m

n (ϑ, α))∗ hn

(−

∼kr0

)Y mn (ϑ0, α0) (2.38)

assuming r < r0. Equation (2.38) can be interpreted as describing two steps: First, adescription of the wave field of a point source located at ~x0 in terms of spherical harmonics.Second, applying (2.29) to obtain the sound pressure at ~x. An important property of theGreen’s function is reciprocity (G(~x|~x0,

∼k) = G(~x0|~x,

∼k)), which can be used to formulate

(2.38) for r > r0 as

G(~x|~x0,∼k) = −j

∼k∞∑n=0

n∑m=−n

hn(−

∼kr) (Y mn (ϑ, α)

)∗ jn(∣∣∣∼kr0

∣∣∣)Y mn (ϑ0, α0). (2.39)

In the following, point sources, dipole sources, line sources and planar sources are pre-sented as important special source distributions.

Point sources

Point (or monopole) sources, as discussed in this section, are a very fundamental modelfor acoustic sources. As they describe a spherical source model, they can be used toapproximate the wave field of loudspeakers in closed cabinets [Cro98]. The wave field ofa point source placed at ~x0 can be determined by plugging

QM(~x, ω) = Q(ω)δ3 (~x− ~x0) , QD(~x, ω) = ~0 (2.40)

into (2.8), which leads to [Pie89]

P (pt)(~x, ~x0, ω) = Q(ω) e−j∼k‖~x−~x0‖2

4π ‖~x− ~x0‖2(2.41)

and represents the multiplication of the Green’s function with the source signal spectrum.

Dipole sources

A dipole source is another fundamental source model, which plays a role in the Kirchhoff-Helmholtz integral discussed in Sec. 2.1.4. To describe a dipole source located at ~x0 andoriented in direction ~DS,

QD(~x, ω) = Q(ω)δ3 (~x− ~x0) ~DS, QM(~x, ω) = 0 (2.42)

Page 29: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.1 Acoustic Wave Fields 17

is plugged into (2.8), where the gradient of the three-dimensional Dirac distribution hasto be defined according to∫ ∞

−∞

∫ ∞−∞

∫ ∞−∞

f (~x)∇~xS δ3 (~xS − ~x) dx dy dz = ∇~x f (~xS) . (2.43)

The resulting wave field is given by

P (~x, ω) = Q(ω)⟨~DS ,∇~x~x0G(~x|~x0,

∼k)⟩. (2.44)

Line sources

Line sources describe sources distributed along infinite lines and are important for thederivation of wave field synthesis (WFS) in Sec. 2.3.4. Using

QM(~x, ω) = Q(ω)δ2(~x− ~x0 −

⟨~n(li), (~x− ~x0)

⟩~n(li)

), (2.45)

where the two-dimensional Dirac distribution is defined by∫ ∞−∞

∫ ∞−∞

f (~x) δ2 (~xS − ~x)|z=0 dx dy = f (~xS) , (2.46)

the wave field of a line source is given by [Spo05]

P (li)(~x, ~x0, ω) = j

4Q(ω)H0(−

∼k∥∥∥~x− ~x0 −

⟨~n(li), (~x− ~x0)

⟩~n(li)

∥∥∥2

). (2.47)

Here, ~x0 = (x0, y0, z0)T (with (·)T denoting the transposition) defines an arbitrary pointon the line source, while the unit length vector ~n(li) is parallel to the line describing thesource distribution.

which accounts for the fact that unlike in (2.34), only an integration over two variablesis necessary in (2.46).

Planar sources

Another important type of sources are planar sources, which excite plane waves. Withinthis thesis, planes are assumed to have an infinite extent, such that they are uniquelydefined by a unit vector ~n(pw) normal to the plane, pointing towards the origin of thecoordinate system, and the minimum distance d(pw) of a point on this plane to the origin,as illustrated in Fig. 2.4. Using

QM(~x, ω) = Q(ω)δ1(⟨~n(pw), ~x

⟩+ d(pw)

), ~D(~x, ω) = ~0 (2.48)

Page 30: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

18

in (2.8), the sound pressure resulting from this source is described by [Spo05]

P (pw)(~x, ~x0, ω) = Q(ω)e−j∼k(〈~n(pw), ~x〉+d(pw)). (2.49)

Equation (2.49) is only valid on that side of the plane facing the origin of the coordinatesystem, although a planar monopole source radiates in both directions normal to theplane. However, this leads to two plane waves traveling in opposite directions, one oneach side of the plane, where only one wave is described by (2.49).

When comparing (2.41), (2.47), and (2.49), while considering (2.19), (2.20), (2.27),and (2.28) it can be seen that a wave emitted from a planar source has no amplitudedecay when propagating, while the wave fields of line or point sources exhibit a decay of1/√Dmin and 1/Dmin, respectively, where Dmin is the minimum distance of the observation

point to the source.

2.1.4 The Kirchhoff-Helmholtz integral

Solutions of the homogeneous wave equation are only valid inside a source-free region,although acoustic sources are the natural origin of any wave field. Consequently, wavefields described by the homogeneous wave equation are defined by the imposed boundaryconditions.

The Kirchhoff-Helmholtz integral describes the relation of a wave field on a surface Senclosing a source-free volume V [Wil99, Spo05] and the wave field inside this volume.2For the following discussion, a wave field described by P (~x, ω) is considered that is excitedfrom sources outside V , i.e., it fulfills (2.7) with QD(~x, ω) = QM(~x, ω) = 0 ∀ ~x ∈ V .

While P (~x, ω) is defined for an arbitrary ~x inside and in the circumference of V , only theboundary conditions imposed at the surface S are considered in the Kirchhoff-Helmholtzintegral. Those are described by the sound pressure P (~xS, ω) and its derivative withrespect to the normal direction ~n (~xS) pointing inside V at ~xS, where ~xS is a point on thesurface S.

From these boundary conditions it is possible to determine the sound pressure P ′(~x, ω)according to [Pie89, Wil99]:

P ′(~x, ω) = −∮S

G(~x|~xS,∼k)∂P (~xS, ω)

∂~n (~xS) − P (~xS, ω)∂G(~x|~xS,∼k)

∂~n (~xS)

dS ′, (2.50)

2An elegant derivation of the Kirchhoff-Helmholtz integral can be found in Section 2.6 of [Spo05].

Page 31: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.1 Acoustic Wave Fields 19

S

V~0

~xS

~x

~x− ~xS

~n (~xS)

~t(~x)

P (~x, ω) P ′(~x, ω) 6= 0

Sound source

Figure 2.2: Vectors, volume, and surface, relevant for the Kirchhoff-Helmholtz integral

where ∂P (~xS,ω)∂~n(~xS) = 〈∇~xP (~xS, ω) , ~n (~xS)〉. Evaluating (2.50) for various ~x leads to

P ′(~x, ω) =

P (~x, ω) for ~x ∈ V ,12P (~x, ω) for ~x ∈ S,0 otherwise.

(2.51)

An illustration of the quantities considered in (2.50) is shown in Fig. 2.2, where a pointsource excites an exemplary wave field. Three of the shown vectors, namely ~xS, ~n (~xS)and the relative position of the considered point (~x−~xS), are considered in (2.50). Besidesthose, the traveling direction ~t(~x) (according to (2.14)) is shown.

The term ∂P (~xS,ω)∂~n(~xS) in (2.50) determines amplitude and phase of a monopole contribu-

tion to the wave field at ~x, which is described by G(~x|~xS,∼k). Complementarily, P (~xS, ω)

determines amplitude and phase for a dipole contribution, where ∂G(~x|~xS,∼k)

∂~n(~xS) describes theradiation pattern of a dipole oriented along ~n (~xS). Both, the monopole and the dipole con-tributions radiate to the inside and to the outside of S simultaneously. While G(~x|~xS,

∼k)

describes a radiation to both sides with the same phase, ∂G(~x|~xS,∼k)

∂~n(~xS) radiates waves with aphase shift of π to both sides. Still, (2.51) states that there is no sound pressure outsideof V . This implies that the ratio of ∂P (~xS,ω)

∂~n(~xS) and P (~xS, ω) is such that cancellation occurson the outside.

As it can be seen from(2.14), the traveling direction is determined by the sound pressureand its gradient. Accordingly, there is a linear relation of ∂P (~xS,ω)

∂~n(~xS) and P (~xS, ω), which is

Page 32: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

20

governed by the component of the traveling direction in direction of ~n (~xS) and given by

−j∼k⟨~n (~xS) ,~t(~xS)

⟩· P (~x, ω) = ∂P (~xS, ω)

∂~n (~xS) . (2.52)

Similarly, it is possible to determine the component of ~x− ~xS that points in direction of~n (~xS), by considering the relation of G(~x|~xS,

∼k) and ∂G(~x|~xS,

∼k)

∂~n(~xS) . For the free-field Green’sfunction (2.35) the following equation holds

∂G(~x|~xS,∼k)

∂~n (~xS) = 〈~n (~xS) , ~x− ~xS〉‖~x− ~xS‖2

(jk + 1

‖~x− ~xS‖2

)·G(~x|~xS,

∼k). (2.53)

The sign of the scalar products⟨~n (~xS) ,~t(~xS)

⟩and 〈~n (~xS) , ~x− ~xS〉 determines whether

there is a phase shift by π for the terms G(~x|~xS,∼k)∂P (~xS,ω)

∂~n(~xS) and P (~xS, ω)∂G(~x|~xS,∼k)

∂~n(~xS) . Whenevaluating (2.50), the phase information of both terms determines whether there is con-structive or destructive interference at ~x, i. e., if cancellation occurs.

A change of the direction of ~n (~xS), will only change the sign of the resulting soundpressure, but not change the region of excitation. Hence, ~n (~xS) does not necessarilydescribe the direction of radiation with respect to the surface normal. Instead, it servesas a reference for the traveling direction of the wave ~t(~xS) and the relative observationposition given by ~x−~xS, where the latter two actually determine the direction of radiationwith respect to the surface normal.

The relation of the considered vectors can be explained using an example as shown inFig. 2.3, where P (~x, ω) describes a plane wave, which is excited within a cuboid describedby V only by imposing boundary conditions on S. The cuboid is assumed to have aninfinite extension in y-direction and z-direction and a finite extension in x-direction. Forthe illustrated wave field, only two faces of the cuboid are relevant, a first one where thescalar product

⟨~n (~xS) ,~t(~xS)

⟩is positive and another where it is negative. Both faces

radiate only in direction of ~t(~xS) due to the boundary conditions imposed by the travelingwave. This leads to the following situation at the positions ~x1,~x2, and ~x3:

1. At ~x1, there is no contribution to the wave field by any of the two faces. A monopoleor dipole contribution to waves traveling against ~t(~x) is canceled by other contribu-tions of the same surface. This is determined by

⟨~n (~x) ,~t(~x)

⟩and 〈~n (~x) , ~x1 − ~x〉

having opposite signs for both surfaces (~x = ~xS, ~x′S).

2. At ~x2, there is only a contribution of the first face. This is determined by⟨~n (~x) ,~t(~x)

⟩and 〈~n (~x) , ~x2 − ~x〉 having a positive sign at the first face (~x = ~xS) and the oppositesigns on the second face (~x = ~x ′S).

3. At ~x3, there is a contribution of both faces, where contribution of the first face iscanceled by the contribution of the second face. This is determined by

⟨~n (~x) ,~t(~x)

⟩and 〈~n (~x) , ~x3 − ~x〉 having a positive sign at the first face (~x = ~xS) and a negativeon the second face (~x = ~x ′S).

Page 33: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.1 Acoustic Wave Fields 21

~t(~x)

S S

V

1st face 2nd face

~n (~xS) ~n (~x ′S)

~0

~xS

~x ′S

~x1

~x2

~x3

x

y

z

Figure 2.3: Radiation of the surface S when a plane wave is excited in V

Finally, it follows from (2.51) that there is no sound pressure outside V ,i. e., P (~x1, ω) =P (~x3, ω) = 0.

Still, when ~n (~xS) is pointing to the outside of V and P (~x, ω) describes a wave fieldof sources exclusively located inside V ((2.7) with QD(~x, ω) = QM(~x, ω) = 0 ∀ ~x /∈ V ),(2.50) can be used to describe the wave field excited in the source-free outside of V . Thisimplies that once the sound pressure P (~xS, ω) and its derivative ∂P (~xS,ω)

∂~n(~xS) can be controlledon S, the wave field radiated from the surface into the enclosed volume and the wave fieldradiated toward the outside can both be controlled independently.

2.1.5 The image source modelThe solutions of the wave equation presented in Sec. 2.1.3 were obtained assuming free-field conditions. However, in this thesis the wave propagation in enclosures is the primaryconsidered scenario, where the reflection of waves on a plane (or wall) plays a dominantrole. The image source model is a well-known approach to incorporate such boundaryconditions when solving problems in various areas [MF53], including the calculation ofroom impulse responses [AB79]. In this thesis, the consideration of infinitely extendedplanes is a sufficient model, so a vector ~n(pw) and the distance d(pw) of the plane tothe origin are again sufficient to describe this plane. The original sound source is thenmirrored on this plane, as illustrated in Fig. 2.4, where ~x0 is the source position and ~x ′0is its mirrored counterpart according to

~x ′0 = ~x0 − 2~n(pw)(d(pw) +⟨~x0 , ~n

(pw)⟩). (2.54)

Page 34: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

22

x

y

~n(pw)

Source~x0

~x ′0Mirrored source

d(pw)

Figure 2.4: Illustration of the image source model. The vector ~n(pw) has been shifted suchthat is does not interfere with the distance measured by d(pw)

The resulting Green’s function for this model is described by

G′(~x|~x0, ω) = 14π

e−j∼k‖~x−~x0‖2

‖~x− ~x0‖2+R

14π

e−j∼k‖~x−~x ′0‖2

‖~x− ~x ′0‖2, (2.55)

where −1 ≤ R ≤ 1 defines the reflection factor of the plane. The validity of (2.55) isrestricted to the side of the plane including the origin of the coordinate system, where~x0 must also be on this side. The boundary conditions for rectangular rooms can bedescribed by a superposition of the second term in (2.55) for each wall. Considering onlythe mirror images of the one source located at ~x0, this describes a first order image sourcemodel. When considering at least two planes, a second order model can be described, bymirroring sources at the respective positions ~x ′0 again, only considering the plane surfacesfacing the origin of the coordinate system and the position ~x ′0 simultaneously. Higher-ordermodels can be described by continuing the procedure straightforwardly. For rectangularrooms, it is sufficient to disregard any source mirrored on an already occupied mirrorsource position, for other room shapes, more sophisticated geometrical considerations arenecessary [Bor84].

2.2 The Spatial Fourier Transform and Wave FieldDecompositions

In this section, three wave field decompositions are related to the spatial Fourier trans-form. Since the wave-domain transforms introduced later in Sec. 2.3 are based on thesewave field decompositions, they can also be interpreted as special cases of the spatialFourier transform. Hence, this section provides a link between the wave-domain LEMSmodel considered in this thesis and more general mathematical concepts. A derivation

Page 35: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.2 The Spatial Fourier Transform and Wave Field Decompositions 23

of the wave-field decompositions can be found in [Kun09], which is summarized here andpresented in the notation used in this thesis.

Within a source-free volume V , i. e., QD(~x, ω) = ~0, QM(~x, ω) = 0 ∀ ~x ∈ V , any wavefield can be described by a weighted superposition of functions according to (2.11), (2.16),or (2.24). As the complex harmonics in (2.11), (2.16), and (2.24) already suggest,these descriptions are closely related to the three-dimensional spatial Fourier transform[Kun09, Wil99]. In Sec. 2.2.1, the plane wave decomposition is related to the latter toprovide an accessible example of a wave field decomposition. Later, the cylindrical andspherical harmonics decomposition are described in Sections 2.2.2 and 2.2.3, respectively,which will be used to derive the wave-domain transforms in Sec. 2.4 and Appendix A.

2.2.1 Plane wave decompositionIn Cartesian coordinates, the three-dimensional spatial Fourier transform of a wave fieldand its inverse by are given by

F (pw)(∼kx,

∼ky,

∼kz, ω) =

∫ ∞−∞

∫ ∞−∞

∫ ∞−∞

P (~x, ω)ej(∼kxx+

∼kyy+

∼kzz

)dx dy dz, (2.56)

P (~x, ω) = 1(2π)3

∫ ∞−∞

∫ ∞−∞

∫ ∞−∞

F (pw)(∼kx,

∼ky,

∼kz, ω)

· e−j(∼kxx+

∼kyy+

∼kzz

)d

∼kx d

∼ky d

∼kz, (2.57)

respectively, noting that e−j(∼kxx+

∼kyy+

∼kzz

)describes the same function as (2.11). Applying

(2.56) to a wave field as described in (2.11) leads to

∫ ∞−∞

∫ ∞−∞

∫ ∞−∞

e−j(∼kx′x+

∼ky ′y+

∼kz ′z

)ej

(∼kxx+

∼kyy+

∼kzz

)dx dy dz = δ3

(∼k−

∼k ′)

(2.58)

with∼k ′ =

(∼kx′,

∼ky′,

∼kz′)T

. As all solutions of (2.7) for source-free volumes fulfill (2.12),F (pw)(

∼kx,

∼ky,

∼kz, ω) may only be non-zero at those

∼kx,

∼ky, and

∼kz where

∼k is located on a

spherical surface. At the same time, (2.57) describes an integral over a volume insteadof a surface, such that the combination of both facts would imply a result of zero ifF (pw)(

∼kx,

∼ky,

∼kz, ω) only assumed finite values. To obtain a non-trivial solution for the

sound pressure, F (pw)(∼kx,

∼ky,

∼kz, ω) must represents a distribution rather than a function:

F (pw)(∼kx,

∼ky,

∼kz, ω) ∝ δ1

(∼k2 −

∥∥∥∼k∥∥∥2

2

). (2.59)

Consequently, a wave field description as a function of four continuous variables, includingthe temporal frequency ω, is redundant and can be simplified. As (2.12) must be fulfilledand

∼k is given by (2.9), the set of

∼kx,

∼ky, and

∼kz can be represented by two independent

Page 36: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

24

continuous variables and an additional sign. This is equivalent to restricting the consider-ations to an arbitrary two-dimensional plane within the three-dimensional space. In thisthesis, a projection on the x-y-plane was chosen.

Using∼kz = ±

√∼k2 −

∼kx2 −

∼ky2, (2.56) can be projected onto the x-y-plane by (cf.

[Kun09])

C(pw)(∼kx,

∼ky, ω) = 1

∫ ∞−∞

F (pw)(∼kx,

∼ky,

∼kz, ω)e−j

∼kzz d

∼kz L(

∼kx,

∼ky, z), (2.60)

where

L(∼kx,

∼ky, z) =

2ja(∼kx,

∼ky) sin

(z√

∼k2 −

∼kx2 −

∼ky2

)+ e

−jz√

∼k2−

∼kx2−

∼ky2

(2.61)

describes the information lost by integrating over∼kz and where all a(

∼kx,

∼ky) must be chosen

such that the z-dependency of L(∼kx,

∼ky, z) vanishes. The real-valued function a(

∼kx,

∼ky) is

bounded between 0 and 1 and it describes the amplitude relations of the plane wavestraveling in positive or negative z-direction. Similarly, the inverse is obtained from (2.57):

F (pw)(∼kx,

∼ky,

∼kz, ω) =

∫ ∞−∞

C(pw)(∼kx,

∼ky, ω)

L(∼kx,

∼ky, z)

ej∼kzz dz. (2.62)

However, only z = 0 is considered in the following and all information about a(∼kx,

∼ky)

is lost. If necessary, this information can be recovered when also considering the soundvelocity or the gradient in z-direction of P (~x, ω) in the x-y-plane, which would allow fordetermining the traveling direction of the plane waves (cf. (2.14)). However, this is beyondthe scope of this thesis.

With z = 0 leading to L(∼kx,

∼ky) = 1, the wave field description C(pw)(

∼kx,

∼ky, ω) can be

directly related to the sound pressure P (~x, ω) by

C(pw)(∼kx,

∼ky, ω) =

∫ ∞−∞

∫ ∞−∞

P (~x, ω)|z=0 ej∼kxxej

∼kyy dx dy, (2.63)

P (~x, ω)|z=0 = 1(2π)2

∫ ∞−∞

∫ ∞−∞

C(pw)(∼kx,

∼ky, ω)e−j

∼kxxe−j

∼kyy d

∼kx d

∼ky, (2.64)

representing the transform pair of a two-dimensional Fourier transform, known as planewave decomposition [Wil99].

Page 37: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.2 The Spatial Fourier Transform and Wave Field Decompositions 25

2.2.2 Cylindrical harmonics decompositionWhen using cylindrical coordinates, wave fields are 2π-periodic with respect to α, allowingfor the following Fourier expansion

P m(%, z, ω) = 12π

∫ 2π

0P (~x, ω)e−jmα dα, (2.65)

P (~x, ω) =∞∑

m=−∞P m(%, z, ω)ejmα, (2.66)

which is necessary for the derivation below. In (2.65) and (2.66) P m(%, z, ω) representsthe according Fourier series coefficients. This leads to the representation of (2.56) by[Kun09]

F (cy)(∼kα,

∼k%,

∼kz, ω) =

∞∑m=−∞

∫ ∞−∞

∫ ∞0

P m(%, z, ω)2πjmJm

(∣∣∣∼k%%∣∣∣) % d% ej∼kzz dz ejm

∼kα , (2.67)

where∼kα represents the spatial frequency with respect to α, and where

∼k2 =

∼k%

2 +∼kz

2

is fulfilled. However, in cylindrical coordinates a description using an integer argumentinstead of the continuous

∼kα is desired to account for the 2π-periodicity mentioned above.

The Fourier series expansion

F(cy)m′ (

∼k%,

∼kz, ω) = 1

∫ 2π

0F (cy)(

∼kα,

∼k%,

∼kz, ω)e−jm′

∼kα d

∼kα (2.68)

can be used to achieve that. The variable m′ is only used to maintain consistency andF

(cy)m (

∼k%,

∼kz, ω) will be regarded to be dependent on m, noting that

12π

∫ 2π

0ej(m−m

′)∼kα d

∼kα =

{1 for m′ = m,

0 else, (2.69)

can be used to eliminate the sum that is introduced when plugging (2.67) into (2.68).The variable m will be referred to as mode number in the following. As F (cy)

m (∼k%,

∼kz, ω) is

also dependent on three variables in addition to the temporal frequency, it is redundantin the same way as F (cy)(

∼kα,

∼k%,

∼kz, ω) or F (pw)(

∼kx,

∼ky,

∼kz, ω) (see redundancy discussion

above). This redundancy can be removed by a projection of the wave field on a cylindersurface with radius %0, which can be facilitated by an inverse transform with respect to∼k% and the multiplication by a term canceling out the dependency on Jm

(∣∣∣∣√∼k2 −

∼kz2%

∣∣∣∣)resulting from the evaluation of the integral:

C(cy)m (

∼kz, ω) = 1

Jm(∣∣∣∣√∼

k2 −∼kz2%0

∣∣∣∣)∫ ∞

0F

(cy)m (

∼k%,

∼kz, ω)j

m

2πJm(∣∣∣∼k%%0

∣∣∣) ∼k% d

∼k%, (2.70)

which is only valid if Jm(∣∣∣∼k%%0

∣∣∣) 6= 0, precluding a meaningful evaluation for %0 = 0as Jm (|0|) = 0, ∀ m 6= 0. Further, there is no loss of information with respect to the

Page 38: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

26

traveling direction of waves. This is because (2.67) assumes free-field conditions, whichimplies that every incoming wave with respect to the origin (i. e., (2.16) with positive

∼k%)

results in an outgoing wave (i. e., (2.16) with negative∼k%). This allows only for solutions

according to (2.21). It should be noted that this boundary condition at % = 0 is imposedby the choice of the coordinate system and does not imply any object at the origin.

At this point it is also possible to incorporate the boundary conditions on this surfaceby defining

Bm(∼k%%0

)= Jm

(∣∣∣∼k%%0

∣∣∣) (2.71)

for the free field, and

Bm(∼k%%0

)= Jm

(∣∣∣∼k%%0

∣∣∣)− J ′m(∣∣∣∼k%%0

∣∣∣)Hm

(∼k%%0

)H′m

(∼k%%0

) (2.72)

for an acoustically rigid cylindrical scatterer with a radius equal to %0. Here, J ′m (|x|)and H′m (x) denote the derivatives of Jm (|x|) and Hm (x) with respect to x, respectively[Teu07] and

∼k% =

√∼k2 −

∼kz2 was used for convenience. This results in the so-called

cylindrical harmonics decomposition and its inverse given by

C(cy)m (

∼kz, ω) = 1

2πBm(∼k%%0

) ∫ ∞−∞

∫ 2π

0P (~x, ω)|%=%0e

−jmαej∼kzz dα dz, (2.73)

P (~x, ω) =∞∑

m=−∞

∫ ∞−∞

Bm(∼k%%0

)2π C

(cy)m (

∼kz, ω)e−jmαe−j

∼kzz d

∼kz, (2.74)

respectively. Comparing (2.74) with (2.16) and (2.21) reveals that Bm(∼k%%0

)describes the

superposition of wave-field components traveling in positive and negative %-direction at%0, where those components are in a fixed relation as no energy is injected or dissipated for% ≤ %0. This is different from the plane wave decomposition, where L(

∼kx,

∼ky, z) describes

the ambiguity due to missing information about the wave traveling direction normal tothe considered plane.

If a wave field is independent on the z-coordinate, the following relation holds for thedecomposition of this wave field:

P (~x, ω) ∝ δ1(∼kz). (2.75)

For the description of such wave fields, the circular harmonics decomposition and its in-verse can be used, which are given by

Page 39: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.2 The Spatial Fourier Transform and Wave Field Decompositions 27

C(ci)m (

∼kz, ω) = 1

2πBm(∼k%%0

) ∫ 2π

0P (~x, ω)|%=%0e

−jmα dα, (2.76)

P (~x, ω)|%=%0 =∞∑

m=−∞

Bm(∼k%%0

)2π C

(cy)m (

∼kz, ω)e−jmα. (2.77)

2.2.3 Spherical harmonics decompositionIn spherical coordinates, a wave field can be expressed in terms of spherical harmonicsaccording to

P(sp)m,n(r, ω) =

∫ 2π

0

∫ π

0P (~x, ω)

(Y mn (ϑ, α)

)∗sin(ϑ) dϑ dα, (2.78)

P (~x, ω) =∞∑n=0

n∑m=−n

P(sp)m,n(r, ω)Y m

n (ϑ, α). (2.79)

This allows for writing the three-dimensional spatial Fourier transform (2.56) as

F (sp)(∼kα,

∼kϑ,

∼kr, ω) =

∞∑n=0

n∑m=−n

∫ ∞0

P(sp)m,n(r, ω)4π

jnjn(∣∣∣∼kr∣∣∣) r2 drY m

n (∼kϑ,

∼kα), (2.80)

with∼kx =

∼k cos(

∼kα) sin(

∼kϑ),

∼ky =

∼k sin(

∼kα) sin(

∼kϑ), and

∼kz =

∼k cos(

∼kϑ). Similar to the

derivation for cylindrical coordinates, a Fourier series with respect to∼kα and

∼kϑ can be

used to obtain as intermediate result

F(sp)n′,m′,m(

∼kr, ω) =

∫ 2π

0

∫ π

0F (sp)(

∼kα,

∼kϑ,

∼kr, ω)

(Y m′

n′ (∼kϑ,

∼kα)

)∗sin(ϑ) d

∼kϑ d

∼kα, (2.81)

where the spherical harmonics of order m and degree n, Y mn (ϑ, α), as basis functions,

exhibit the following orthogonality property:∫ 2π

0

∫ π

0Y mn (

∼kϑ,

∼kα)

(Y m′

n′ (∼kϑ,

∼kα)

)∗d

∼kϑ d

∼kα =

{1 for m′ = m ∩ n′ = n,

0 elsewhere. (2.82)

Replacing n′ by n and m′ by m, applying an inverse transform, and multiplying a termto cancel out the r-dependency resulting from the evaluation of the integral, a sphericalharmonics description in terms of (2.29) can be obtained:

C(sp)m,n(ω) = 1

jn(∣∣∣∼kr0

∣∣∣)∫ ∞

0F

(sp)n,m(

∼kr, ω) j

n

2π2 jn(∣∣∣∼krr0

∣∣∣) ∼kr

2 d∼kr, (2.83)

where a spherical surface with radius r0 is considered and jn (|x|) is the spherical Besselfunction of the first kind and order n. To consider different boundary conditions, jn

(∣∣∣∼kr0

∣∣∣)can be replaced by

bn(∼kr)

= jn(∣∣∣∼kr0

∣∣∣) (2.84)

Page 40: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

28

in the free-field case, and by

bn(∼kr)

= jm(∣∣∣∼kr0

∣∣∣)− j ′n(∣∣∣∼kr0

∣∣∣)hn(∼kr0

)h ′n(∼kr0

) , (2.85)

when considering a spherical scatterer of radius r0. In (2.85), j ′n (|x|) and h ′n (x) denotethe derivatives of jn (|x|) and hn (x) with respect to x, respectively [Teu07].

Finally, the spherical harmonics decomposition and its inverse can be defined as

C(sp)m,n(ω) = 1

bn(∼kr) ∫ 2π

0

∫ π

0P (~x, ω)|r=r0

(Y mn (ϑ, α)

)∗sin(ϑ) dϑ dα (2.86)

P (~x, ω) =∞∑n=0

bn(∼kr) n∑m=−n

C(sp)m,n(ω)Y m

n (ϑ, α), (2.87)

respectively, where bn(∼kr)

plays the same role as Bm(∼k%%0

)in (2.74).

2.2.4 Dimensionality and degrees of freedom of wave-fielddecompositions

In this section, some remarks on the dimensionality of the described wave-field decompo-sitions are given, as this term can be interpreted differently.

In general, the sound pressure P (~x, ω) is dependent on four quantities, three coor-dinates and the temporal angular frequency ω. Therefore, it can be referred to as a3 + 1-dimensional signal representation [Kun09], mapping four scalar quantities from itsdomain to a single scalar quantity in its codomain. As such, the sound pressure P (~x, ω)can describe any wave-field without restrictions, which includes solutions of the homoge-neous and the inhomogeneous wave equation. The same holds for the three-dimensionalFourier transform of the sound pressure given by (2.56), as it also maps four scalarquantities to a single scalar quantity.

The three wave-field decompositions derived in Sections 2.2.1 to 2.2.3 describe the soundpressure on a surface, instead of a volume. Hence, these represent 2 + 1-dimensional sig-nal representations [Kun09] mapping three scalar quantities onto a single scalar quantity.When considering a source-free volume that includes the surface used to obtain the wave-field decomposition, it is possible to exactly extrapolate the sound pressure in this volumefrom these wave-field decompositions, up to the ambiguity that is inherent to the planewave decomposition (see Sec. 2.2.1). Note that the property of exact extrapolation wasalready implicitly assumed for the derivation of the cylindrical harmonics decomposi-tion and the spherical harmonics decomposition in Sections 2.2.2 and 2.2.3, respectively.Finally, it can be concluded that solutions of the homogeneous wave equation are effec-tively only 2 + 1-dimensional signal representations, since the wave-field decompositions

Page 41: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.2 The Spatial Fourier Transform and Wave Field Decompositions 29

discussed above can describe any of those solutions. This finding can be easily verified con-sidering (2.11), (2.16), and (2.24): When disregarding the position ~x, P (pw)(~x,

∼kx,

∼ky,

∼kz)

and P(cy)m (~x,

∼k%,

∼kz) are dependent on three scalars, while (2.12) and (2.17), respectively,

imply that only two scalars are actually independent. In the same way, P (sp)m,n(~x,

∼k) is only

dependent on two scalar, when considering that the wave number∼k is determined by the

frequency ω.Still, any of the considered wave-field representations can, in theory, carry an infinite

amount of information, independently of the dimensionality described above. This isbecause there are always infinitely many scalar values in the domain of the wave fieldrepresentations: coordinates (within a finite volume) and wave numbers may be boundedbut they are in general not discretized, while mode orders are discrete integer valuesbut not bounded. In practice, only a limited accuracy for the description of a wavecan be achieved, while a perfect description is typically not necessary anyway. Whenconsidering a spherical harmonic decomposition of a temporally band-limited wave fieldwithin a limited source free volume, only a limited number of modes will be significantlyexcited [KSAJ07]. This limits the number of degrees of freedom needed to describe thewave field with a given accuracy. The situation is similar for a wave-field description byplane waves and cylindrical harmonics. However, for those wave-field decompositions, thesound pressure is evaluated on an infinitely extended plane, which contradicts consideringa finite volume. The latter implies considering only a finite section of the respectiveplane, which can be described as windowing with a rectangular window. Analogouslyto the Fourier transform, this windowing is represented by convolution with the sincfunction in the transform domain. In the case of the plane wave decomposition (asdescribed in Sec. 2.2.1), this convolution is along the

∼kx-axis and the

∼ky-axis, while it is only

along the∼kz-axis for the cylindrical harmonics decomposition. For both decompositions,

the convolution will smear the spectrum of a wave field in the respective wave numberdomain, such that spectral peaks are no longer sharp. Thus, sampling the wave fielddecomposition at discrete wave numbers is sufficient to describe the wave field with agiven accuracy. However, spatial sampling will lead to spatial aliasing, which is a well-understood phenomenon in sound field reproduction [SR06, SA08].

The time-domain sound pressure p(~x, t) is a real-valued quantity (p(~x, t) ∈ R), whileits temporal frequency-domain representation P (~x, ω) is complex-valued (P (~x, ω) ∈ C).As a complex-valued number can be mapped to two real-valued numbers, its real andits imaginary part, there is a redundancy in P (~x, ω). This redundancy is expressed byP (~x, ω) = P ∗(~x,−ω), which constitutes the well-known symmetry of one-dimensionalFourier transforms of real-valued functions. Hence the negative temporal frequency axiscan be determined from the values obtained for the positive frequency axis and vice versa.The same holds for the presented wave-field decompositions, where this scheme extends,moreover, also for negative wave numbers and mode orders, which is in accordance to thestatements made in Sec. 2.1.2. Still, mapping a real-valued quantity to a complex-valuedquantity only causes redundancy by a factor of two. This implies that the negative orpositive half-space with respect to one dimension of a transform domain can be recon-

Page 42: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

30

structed from the respective counterpart. It is, e. g., possible to either reconstruct thenegative

∼kx-axis from the positive

∼kx-axis, or the negative ω-axis from the positive ω-axis,

but not both at the same time.

2.3 Wave Fields in AcousticMultiple-Input/Multiple-Output Systems

In this section, the modeling of an LEMS as a volume capturing a wave field is explained.To this end, the role of the microphones and loudspeakers for spatial sampling is explainedin Sec. 2.3.1 without considering an enclosure. The influence of the enclosure is describedin Sec. 2.3.2, where the use for wave-field descriptions by fundamental solutions of thewave equation is motivated. This leads to a wave-domain LEMS model as described inSec. 2.3.3. In Sec. 2.3.4, a short explanation of WFS is presented, as an example ofa reproduction technique which uses a large number of reproduction channels. Finally,the task of pre-equalizing the loudspeaker signals to remove the room influence at thelistener’s position is described in Sec. 2.3.5.

2.3.1 Spatial sampling by loudspeakers and microphonesIn this section, the some consequences of spatial sampling by the transducers are dis-cussed. This discussion does not cover spatial aliasing, which is treated extensively in theliterature [Spo05, SR06]. First, the array setups under consideration are defined, wheretwo examples are shown in Fig. 2.5. The first example shown in Fig. 2.5(a) comprisesconcentric uniform circular arrays (UCAs) for loudspeakers and microphones. Given anumber NL of loudspeakers for a UCA, the positions of the individual loudspeakers aredefined in Cartesian coordinates by

~p(L)λ

=

x

(L)λ

y(L)λ

z(L)λ

=

RL cos

(2π λ−1

NL

)RL sin

(2π λ−1

NL

)0

, (2.88)

where it is assumed that all considered arrays are located in the x-y-plane and λ =1, 2, . . . , NL is the loudspeaker index. This representation in Cartesian coordinates canbe described in other coordinates considering (2.1), where the components %(L)

λ, α(L)

λ, r(L)

λ,

and ϑ(L)λ

of the loudspeaker positions correspond to %, α, r, and ϑ, respectively, in theappropriate coordinate systems. In the same way, the positions of the NM microphoneson the UCA are defined as

~p(M)µ =

x

(M)µ

y(M)µ

z(M)µ

=

RM cos

(2π µ−1

NM

)RM sin

(2π µ−1

NM

)0

, (2.89)

Page 43: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.3 Wave Fields in Acoustic Multiple-Input/Multiple-Output Systems 31

x

y(a)

RL

RM %

αx

y(b)

RM %

α

Figure 2.5: Exemplary array setups comprising a circular microphone array and a circularor a rectangular loudspeaker array

where µ = 1, 2, . . . , NM is the microphone index. Again, descriptions in other coordinatesystems follow from (2.1), where the components %(M)

µ , α(M)µ , r(M)

µ , and ϑ(M)µ of the micro-

phone positions correspond to %, α, r, and ϑ, respectively. In this thesis, only UCAs ofmicrophones are considered, while the loudspeaker array can be chosen in various ways.However, it is always assumed that the microphone array is located inside the loudspeakerarray, but the centers of the two arrays do not necessarily coinside. An example with aquadratic loudspeaker array is shown in Fig. 2.5(b), where the loudspeaker positions arenot defined according to (2.88), while (2.89) still describes the microphone positions. Notethat these configurations only represent a subset of possible configurations for WDAF.

The wave field inside the LEMS is excited by the loudspeaker signals Xλ(ω) such that

the following source distribution results for ideal point-like omnidirectional loudspeakers:

QL(~x, ω) =

NL∑λ=1

X λ(ω)δ3(~x− ~p (L)

λ

). (2.90)

The resulting wave field is then described by (2.32). As only NL loudspeaker signals canbe defined, it is possible to control the resulting sound pressure independently only at NL

distinct positions. Similarly, when considering a wave field decomposition as describedin Sec. 2.2, it is only possible to control NL wave field components independently. Thus,it is possible to obtain a wave-domain description

∼X

l(ω) consisting of NL components

(indexed by l), carrying all information provided by the loudspeaker signals. A derivationof the necessary transforms is presented later in Sec. 2.4, as it is not necessary for theconsiderations following immediately.

Any wave field excited by the loudspeakers or any wave field originating from a lo-cal acoustic scene within the LEMS is recorded by the microphones at their respective

Page 44: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

32

positions. The microphone signals in the continuous frequency domain are defined by

D µ(ω) =y

VQ

P (~xV, ω)δ3(~xV − ~p (M)

µ

)dV ′Q. (2.91)

Like for the loudspeaker signals, it is possible to obtain a wave-domain description of themicrophone signals, where the number of microphones limits the number of observableindependent spatial components to NM.

2.3.2 Description of loudspeaker-enclosure-microphone systems inthe wave domain

In this section, the relation of the excited and the measured wave field inside an enclosureis discussed. For simplicity, the influence of noise and other acoustic sources in the LEMSis disregarded. This relation constitutes a prerequisite for the wave-domain LEMS modelthat is derived in Sec. 2.3.3. First, a simple one-dimensional LEMS is discussed to providea concise example of the physical model, which is then generalized to three dimensions.The resulting description constitutes a pre-requisite for the actual wave-domain LEMSmodel that is derived in Sec. 2.3.3.

One-dimensional LEMS

A conventional LEMS model in the point-to-point domain describes the dependence of asingle microphone signal D(ω) on a single loudspeaker signal X(ω) in terms of a frequencyresponse

H(ω) =D(ω)X(ω)

. (2.92)

Even when assuming ideal omnidirectional transducers, this allows only for little insightinto the wave field excited in an LEMS and no spatial properties can be localized by H(ω).Insight in both is desirable, as the spatial properties of the wave field excited within anLEMS are important for listening room equalization (LRE) and they will later be exploitedto formulate appropriate approximative models and to improve system identification forMIMO systems. Such an insight can be obtained, when considering the geometry of theLEMS.

In Fig. 2.6, an example of an LEMS is shown, which comprises a single loudspeaker anda single microphone that are coupled to a one-dimensional wave field. The loudspeakeris located at x(L), while the microphone is located at x(M) and the enclosure is defined bythe walls located at x(WL) and x(WM), respectively. The sound pressure P (x, ω) describesthe wave field in the LEMS that is excited by the loudspeaker and measured by themicrophone.

Frequency responses are used to describe the relation of the sound pressure at variouspositions in the LEMS in the following. For this description, the wave field is separated

Page 45: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.3 Wave Fields in Acoustic Multiple-Input/Multiple-Output Systems 33

P (x, ω)

x(WL) x(L)

∼P (LB)(ω)

∼P (LF)(ω)

∼H(L)(ω)

∼H(F)(ω)

∼H(B)(ω)

∼P (F)(x(M), ω)

∼P (B)(x(M), ω)

x(M)

∼H(M)(ω)

x(WM) x

Figure 2.6: One-dimensional exemplary LEMS comprising a single loudspeaker and a sin-gle microphone and two walls

into a forward-traveling wave∼P (F)(x, ω) and a backward-traveling wave field component

∼P (B)(x, ω), which are the only fundamental solutions of the wave equation in one dimen-sion, (see Sec. 2.1.2). Hence, the following considerations can already be interpreted asthe most simple example of a wave-domain description.

The concept of forward and backward traveling waves is well-known in communications,where it is often used to describe the properties of transmission lines. Moreover, thisconcept has already been introduced in signal processing by wave digital filters [Fet86].

Both wave field components,∼P (F)(x, ω) and

∼P (B)(x, ω), are directly related to the sound

pressure and its derivative with respect to x:

P (x, ω) =∼P (B)(x, ω) +

∼P (F)(x, ω), (2.93)

∂xP (x, ω) = j

∼k( ∼P (B)(x, ω)−

∼P (F)(x, ω)

), (2.94)

∼P (F)(x, ω) = 1

2

P (x, ω)− 1j

∼k

∂xP (x, ω)

, (2.95)

∼P (B)(x, ω) = 1

2

P (x, ω) + 1j

∼k

∂xP (x, ω)

. (2.96)

Equations (2.93) to (2.96) are only valid for solutions of the homogeneous wave equation,which excludes an evaluation at the loudspeaker position. The following derivation isformulated with respect to the microphone position x(M) without loss of generality.

The contribution of the loudspeaker to the wave field can also be decomposed into aforward and a backward contribution,

∼P (LF)(ω) and

∼P (LB)(ω), respectively. Superimposing

the contributions of the loudspeaker and the homogeneous component of the wave field,the forward-traveling wave at the microphone position is described by

∼P (F)(x(M), ω) =

∼H(F)(ω)

( ∼P (LF)(ω) +

∼H(L)(ω)

( ∼P (LB)(ω) +

∼H(B)(ω)

∼P (B)(x(M), ω)

)),

(2.97)

Page 46: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

34

where∼H(F)(ω) is the frequency response measured between the forward-traveling wave at

x(L) and the forward-traveling wave∼P (F)(x(M), ω) at the microphone position. Likewise,

the frequency response∼H(B)(ω) describes the relation between the backward traveling

wave at the microphone position∼P (B)(x(M), ω) and the backward traveling wave at the

loudspeaker position. The reflections at the wall near the loudspeaker is described bythe frequency response

∼H(L)(ω), which describes the coupling of the backward-traveling

wave to the forward traveling wave at the loudspeaker position. Likewise the backward-traveling wave is described by

∼P (B)(x(M), ω) =

∼H(M)(ω)

∼P (F)(x(M), ω), (2.98)

where the frequency response∼H(M)(ω) describes the reflection at the wall near the micro-

phone. Note that the considered model assumes free-field conditions between loudspeakerand microphone.

Inserting (2.98) in (2.97) and solving for∼P (F)(x(M), ω) leads to

∼P (F)(x(M), ω) =

∼H(F)(ω)

( ∼P (LF)(ω) +

∼H(L)(ω)

∼P (LB)(ω)

)1−

∼H(F)(ω)

∼H(L)(ω)

∼H(B)(ω)

∼H(M)(ω)

, (2.99)

which describes the wave field finally excited in the enclosure in dependence of bothloudspeaker contributions.

The coupling of the transducer signals to the individual wave field components will bedescribed in the following. In this simple example, the frequency responses F (L)(ω) andB(L)(ω) represent the directional pattern of the loudspeaker by describing the coupling ofthe loudspeaker signal to the forward and backward contribution of the loudspeaker:

∼P (LF)(ω) = F (L)(ω)X(ω), (2.100)∼P (LB)(ω) = B(L)(ω)X(ω). (2.101)

As a single microphone signal does not allow for a presentation of a forward and backwardtraveling wave, the frequency responses F (M)(ω) and B(M)(ω) complement the model forthe microphone signal:

D(ω) = F (M)(ω)∼P (F)(x(M), ω) +B(M)(ω)

∼P (B)(x(M), ω). (2.102)

Using (2.98) - (2.102), it is possible to decompose (2.92) into

H(ω) =D(ω)X(ω)

=(F (M)(ω) +B(M)(ω)

∼H(M)(ω)

)· 1

1−∼H(F)(ω)

∼H(L)(ω)

∼H(B)(ω)

∼H(M)(ω)

·∼H(F)(ω)

(F (L)(ω) +

∼H(L)(ω)B(L)(ω)

). (2.103)

Page 47: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.3 Wave Fields in Acoustic Multiple-Input/Multiple-Output Systems 35

The frequency response H(ω) can be subdivided into the contributions of the individualentities captured by “loudspeaker-enclosure-microphone system”. The term∼H(F)(ω)

(F (L)(ω) +

∼H(L)(ω)B(L)(ω)

)describes the coupling of the loudspeaker signal to

the wave field in the enclosure. As the derivation was formulated with respect to themicrophone position, only the relative loudspeaker position is captured in

∼H(F)(ω). Ad-

ditionally, the loudspeaker position relative to the wall at x(WL) is captured in∼H(L)(ω).

The actual properties of the enclosure are captured in1/(1−

∼H(F)(ω)

∼H(L)(ω)

∼H(B)(ω)

∼H(M)(ω)

), which describes resonance frequencies at the

minima of∣∣∣1− ∼

H(F)(ω)∼H(L)(ω)

∼H(B)(ω)

∼H(M)(ω)

∣∣∣. The product∼H(F)(ω)

∼H(L)(ω)

∼H(B)(ω)

∼H(M)(ω) describes a “round trip” of a wave being reflected at both

walls, after traveling from one wall to the other and back.The coupling of the wave field to the microphone signal is described by(F (M)(ω) +B(M)(ω)

∼H(M)(ω)

). The reflection of the wall described by

∼H(M)(ω) contributes

significantly to the overall frequency response of the microphone pickup.Observing only two signals, only one frequency response can be determined and there

is no way to distinguish between the eight individual frequency responses mentionedabove. The benefits from differentiating between the individual frequency responses willbecome evident later, when multiple dimensions are considered. Still, knowledge aboutthe transducer positions and the wave propagation between them can be used to reducethe number of unknowns. For example,

∼H(F)(ω) and

∼H(B)(ω) describe only a delay, as

there is no obstacle between loudspeaker and microphone. Additionally, reciprocity canbe used to deduce

∼H(F)(ω) =

∼H(B)(ω), while F (L)(ω), B(L)(ω), F (M)(ω), and B(M)(ω) may

be given due to known transducer properties. Another opportunity is to use a loudspeakerwhich is able to independently excite a forward and a backward traveling wave, using twosignals. This would represent a “higher-order” loudspeaker in this minimalistic exampleand could also be facilitated using two loudspeakers at nearby positions. Exciting the samesystem with two independent signals would then reveal additional information about it.The same holds for using a microphone that is able two distinguish a forward-travelingand a backward-traveling wave, providing two signals.

Example of an LEMS frequency response

An example of an LEMS frequency response can be obtained assuming ideal omnidirec-tional transducers with F (L)(ω) = B(L)(ω) = F (M)(ω) = B(M)(ω) = 1. This simplifies(2.103) to

H(ω) = (1 +∼H(M)(ω)) 1

1−∼H(F)(ω)

∼H(L)(ω)

∼H(B)(ω)

∼H(M)(ω)

∼H(F)(ω)

(1 +

∼H(L)(ω)

).

(2.104)

The frequency responses describe the wave propagation by pure delays, e. g.∼H(F)(ω) =

∼H(B)(ω) = e−j

∼k(x(M)−x(L)), (2.105)

Page 48: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

36

0 20 40 60 80 100 120 140 160 180 200−20

−10

0

10

frequency in Hz

20lo

g 10|·|

H(ω) 1/(1−

∼H(F)(ω)

∼H(L)(ω)

∼H(B)(ω)

∼H(M)(ω)

)(1 +

∼H(M)(ω))

∼H(F)(ω)

( ∼H(L)(ω) + 1

)Figure 2.7: Logarithmic magnitude of frequency responses for an exemplary one-

dimensional LEMS

where the positions x(WL) = −5 m, x(WM) = 6 m, x(L) = −3.5 m, x(M) = 3.5 m werechosen. The walls cause an attenuation of the reflected wave, according to the reflectionfactor R = 0.9. With a speed of sound given by c = 344 m/s, (2.104) results in

H(ω) =D(ω)X(ω)

=(

1 +Re−j2(x(WM)−x(M))∼k)

· 1

1−R2e−j2(x(WM)−x(WL))∼ke−j(x(M)−x(L))∼k

(Rej2(x(WL)−x(L))∼k + 1

). (2.106)

The logarithmic magnitude responses of the overall frequency response and its compo-nents is shown in Fig. 2.7. It can be seen that the coupling of the loudspeaker signals tothe excited wave field causes distinct minima at approximately 57 Hz and 172 Hz. Thesame holds for the coupling of the wave field to the microphone signal with the minimaat approximately 34 Hz, 103 Hz, and 172 Hz. Still, both contributions do not cause pro-nounced maxima in H(ω). The maxima occur periodically at the resonance frequenciesof the enclosure given by multiples of

c

2(x(WM) − x(WL))= 15.6 Hz (2.107)

and can be attributed to the term 1/(

1−R2e−j2(x(WM)−x(WL))∼k

).

While the frequency responses∼H(L)(ω),

∼H(F)(ω),

∼H(B)(ω), and

∼H(M)(ω) only describe at-

tenuations and finite delays, 1/(1−

∼H(F)(ω)

∼H(L)(ω)

∼H(B)(ω)

∼H(M)(ω)

)actually describes

Page 49: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.3 Wave Fields in Acoustic Multiple-Input/Multiple-Output Systems 37

0 20 40 60 80 100 120 140 160 180 200−20

−10

0

10

frequency in Hz

20log|·|

1st order 10th order25th order 50th order

Figure 2.8: Logarithmic magnitudes of frequency response for an exemplary one-dimensional LEMS obtained by an image source model of different order. Theshown curves approximate H(ω) in Fig. 2.7.

an infinite impulse response. As real room impulse responses are also of infinite length,this is an important finding for LRE, treated later in this thesis. There, a pre-equalizerfor the LEMS is determined such that cascade of the equalizer and the LEMS is equal toa desired impulse response, which could be

∼H(F)(ω) in the example, here. As shown in

Sec. 2.5.1, the optimal equalizer for the term 1/(1−

∼H(F)(ω)

∼H(L)(ω)

∼H(B)(ω)

∼H(M)(ω)

)is

simply given by its inverse and describes a finite impulse response. On the other hand, theterms

(F (L)(ω) +

∼H(L)(ω)B(L)(ω)

)and (F (M)(ω) + B(M)(ω)

∼H(M)(ω)) cannot be perfectly

equalized by a finite-length filter.In Fig. 2.8, frequency responses of the same LEMS is shown, which have been obtained

using image source models of various orders. It can be clearly seen that the higher-ordermodels achieve very similar results compared to evaluating (2.106). Still, even with anorder 50 model, a slight difference to Fig. 2.7 is noticeable.

Three-dimensional LEMS described by solutions of the wave equation

The derivation shown above can be extended to three dimensions when consideringthe loudspeaker array and the microphone array as spatially sampled surfaces. A two-dimensional section of an exemplary three-dimensional LEMS is shown in Fig. 2.9, wherean array setup as defined by (2.88) and (2.89) is considered.

In the one-dimensional example discussed above, each wave field component had to passthe loudspeaker and the microphone position, which is no longer true in three dimensions.To achieve an equivalent description here, the surface capturing the loudspeaker array is

Page 50: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

38

denoted by SL and chosen such that it encloses the surface capturing the microphone array,which is denoted by SM. For simplicity, two spherical surfaces centered at the coordinateorigin were chosen, which allows for a very convenient description of the considered wavefields in terms of spherical harmonics.

As described by (2.24), spherical harmonics are identified by the indexes m and n,rather than by continuous quantities as, e. g., plane waves (see

∼kx,

∼ky, and

∼kz in (2.11)).

Hence, a wave-domain system response based on spherical harmonics connects an infinitenumber of discretely indexed wave-field components. Fortunately, spherical harmonicsadditionally exhibit a modal band limitation, when only positions with a limited distanceto the origin are considered for a limited temporal frequency range [KSAJ07]. Since thetemporal frequency range of interest is limited, the wave field enclosed by SL can besufficiently well approximated by a finite number of spherical harmonics, which allows forusing a convenient matrix notation.

The following derivation could also be done using cylindrical harmonics if cylindricalsurfaces were considered. Using plane wave decomposition would require consideringplanes and, consequently, a different array setup. Since those wave field representationsconsider infinitely extended surfaces, the latter must be truncated according to the actualdimensions of the considered enclosure. As described in Sec. 2.2.4, this suggests a samplingof the wave field representation at appropriate wave numbers, which would again allowfor a matrix notation.

The wave-domain quantities described by the following considerations are shown inFig. 2.9, where spherical harmonics can be separated into components traveling towardsthe origin and away from the coordinate origin (cf. Sec. 2.1.2). This simplifies a mathe-matical description significantly, compared to a derivation based on the Kirchhoff-Helmholtzintegral.

The forward traveling wave field components at the inner surface SM are described by

∼p(F)(ω) =( ∼P

(F)1 (ω),

∼P

(F)2 (ω), . . . ,

∼P

(F)NC

(ω))T, (2.108)

where the NC individual wave field components∼P

(F)∼m

(ω) are indexed by ∼m. The index ∼

m

is obtained by a unique mapping of m and n in (2.24) according to

n =⌈√

∼m⌉− 1, (2.109)

m = ∼m− n2 − n− 1, (2.110)

∼m = m+ n2 + n+ 1, (2.111)

where d·e denotes the ceil operator. Only those NC modes are considered, which aresignificantly excited at the volume enclosed by the outer surface SL, i. e., ∼

m = 1, 2, . . . , NC

[KSAJ07]. The backward traveling wave field components at the inner surface SM arecaptured by the vector

∼p(B)(ω) =( ∼P

(B)1 (ω),

∼P

(B)2 (ω), . . . ,

∼P

(B)NC

(ω))T. (2.112)

Page 51: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.3 Wave Fields in Acoustic Multiple-Input/Multiple-Output Systems 39

∼p(LB)(ω) ∼p(LF)(ω)

∼H(L)(ω)

∼H(F)(ω)

∼H(B)(ω)

∼p(F)(ω)∼p(B)(ω)∼H(M)(ω)

SL

SM

Figure 2.9: Wave-domain exemplary LEMS

Note that in (2.24), forward and backward-traveling wave field components are identifiedby the “-” and “+” sign of the argument of hn

(∓

∼kr)

, respectively. The loudspeakercontributions at the outer surface SL are captured in

∼p(LF)(ω) =( ∼F

(L)1 (ω),

∼F

(L)2 (ω), . . . ,

∼F

(L)NC

(ω))T, (2.113)

∼p(LB)(ω) =( ∼B

(L)1 (ω),

∼B

(L)2 (ω), . . . ,

∼B

(L)NC

(ω))T, (2.114)

where the individual components∼F

(L)∼l

(ω) and∼B

(L)∼l

(ω) denote the forward and backward-traveling contributions of the loudspeakers and are indexed by

∼l. The assignment of

component indices to spherical harmonics is given by , when replacing ∼m by

∼l.

Like in (2.97), the forward-traveling wave field components at the inner surface can bedetermined by

∼p(F)(ω) =∼H(F)(ω)

(∼p(LF)(ω) +

∼H(L)(ω)

(∼p(LB)(ω) +

∼H(B)(ω)∼p(B)(ω)

)). (2.115)

Page 52: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

40

The matrix∼H(F)(ω) describes the coupling of the wave field on the outer surface to the

wave field on the inner surface and is defined according to

∼H(F)(ω) =

∼H

(F)1,1(ω)

∼H

(F)1,2(ω) · · ·

∼H

(F)1,NC

(ω)∼H

(F)2,1(ω)

∼H

(F)2,2(ω) · · ·

∼H

(F)2,NC

(ω)... ... . . . ...

∼H

(F)NC,1(ω)

∼H

(F)NC,2(ω) · · ·

∼H

(F)NC,NC

(ω)

. (2.116)

The components∼H

(F)∼m,

∼l(ω) describe the coupling of mode

∼l on the outer surface to mode

∼m on the inner surface. The wave propagation in the opposite direction is described by

∼H(B)(ω) =

∼H

(B)1,1(ω)

∼H

(B)1,2(ω) · · ·

∼H

(B)1,NC

(ω)∼H

(B)2,1(ω)

∼H

(B)2,2(ω) · · ·

∼H

(B)2,NC

(ω)... ... . . . ...

∼H

(B)NC,1(ω)

∼H

(B)NC,2(ω) · · ·

∼H

(B)NC,NC

(ω)

, (2.117)

where the components∼H

(B)∼l,

∼m

(ω) describe the coupling of mode ∼m on the outer surface to

mode∼l on the inner surface. Likewise, the matrix

∼H(L)(ω) defined by

∼H(L)(ω) =

∼H

(L)1,1(ω)

∼H

(L)1,2(ω) · · ·

∼H

(L)1,NC

(ω)∼H

(L)2,1(ω)

∼H

(L)2,2(ω) · · ·

∼H

(L)2,NC

(ω)... ... . . . ...

∼H

(L)NC,1(ω)

∼H

(L)NC,2(ω) · · ·

∼H

(L)NC,NC

(ω)

(2.118)

where the individual components∼H

(L)∼l,∼l′(ω) describe the coupling of the backward and

forward traveling wave field components with the mode indices∼l′ and

∼l. This coupling

affects the components traveling back from the inner surface and the contribution of theloudspeakers to the backward traveling wave. The counterpart to

∼H(L)(ω) is given by

∼H(M)(ω) =

∼H

(M),1,1(ω)

∼H

(M),1,2(ω) · · ·

∼H

(M),1,NC

(ω)∼H

(M),2,1(ω)

∼H

(M),2,2(ω) · · ·

∼H

(M),2,NC

(ω)... ... . . . ...

∼H

(M),NC,1(ω)

∼H

(M),NC,2(ω) · · ·

∼H

(M),NC,NC

(ω)

(2.119)

with its components∼H

(M),∼m,

∼m′

(ω) describing the couplings of∼P

(F)∼m′

(ω) to∼P

(B)∼m

(ω). Thismatrix can be used to formulate the representation of (2.98), given by

∼p(B)(ω) =∼H(M)(ω)∼p(F)(ω). (2.120)

Actually, (2.120) can be interpreted as a reflection at the origin, as discussed below.Plugging (2.120) into (2.115) leads to

∼p(F)(ω) =∼H(F)(ω)

(∼p(LF)(ω) +

∼H(L)(ω)

(∼p(LB)(ω) +

∼H(B)(ω)

∼H(M)(ω)∼p(F)(ω)

))(2.121)

Page 53: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.3 Wave Fields in Acoustic Multiple-Input/Multiple-Output Systems 41

and finally to a representation of (2.99) in three dimensions

∼p(F)(ω) =(INC −

∼H(F)(ω)

∼H(L)(ω)

∼H(B)(ω)

∼H(M)(ω)

)−1

·∼H(F)(ω)

(∼p(LF)(ω) +

∼H(L)(ω)∼p(LB)(ω)

). (2.122)

Equation (2.122) allows for certain assumptions on the considered quantities. As thereis no energy exchange between the individual modes in the free field,

∼H(F)(ω) and

∼H(B)(ω)

are diagonal matrices. The same holds for∼H(M)(ω), which would also be diagonal when

a spherically invariant scatterer (e. g., a sphere located at the origin) is present withinthe microphone array. Even in the free-field case, when there is no physical reflectionwithin the volume enclosed by the inner surface,

∼H(M)(ω) is non-zero. This is due to

the spherical coordinate system, which only uses positive radii r such that wave fieldcomponents passing the coordinate origin appear mathematically as being reflected atthe origin, first traveling in negative r-direction, then in positive r-direction. Still, thisdoes not mean that, e. g, a plane that passes the origin would actually be distorted.

Typically,∼H(L)(ω) is not a diagonal matrix, but this does not constitute an obstacle for

the following derivation. The amplitude of a reflected wave field component cannot bestronger than the amplitude of the component it originates from, i. e., the absolute valueof the reflection coefficient is equal to or less than one. Assuming the latter, the spectralnorm of

∼H(L)(ω) is also bounded to be lower than one. Furthermore, the diagonal matrices

∼H(F)(ω),

∼H(B)(ω), and

∼H(M)(ω) do not describe an amplification, which determines the

maximum singular value of this product also to be lower than one. As the identity ma-trix has only eigenvalues equal to one, the sum

(INC −

∼H(F)(ω)

∼H(L)(ω)

∼H(B)(ω)

∼H(M)(ω)

)always represents a non-singular matrix.

To describe the coupling of the loudspeakers to the forward-traveling and backward-traveling wave field components, the frequency responses F (L)

λ,∼l(ω) and B

(L)

λ,∼l(ω) can be

used, respectively. These frequency responses also capture the spatial sampling of theouter surface by the loudspeakers and the directional pattern of the loudspeakers. Hence,the kernels of the matrices defined by

F(L)(ω) =

F

(L)1,1(ω) F

(L)1,2(ω) · · · F

(L)1,NC

(ω)F

(L)2,1(ω) F

(L)2,2(ω) · · · F

(L)2,NC

(ω)... ... . . . ...

F(L)NL,1(ω) F

(L)NL,2(ω) · · · F

(L)NL,NC

(ω)

, (2.123)

B(L)(ω) =

B

(L)1,1 (ω) B

(L)1,2 (ω) · · · B

(L)1,NC

(ω)B

(L)2,1 (ω) B

(L)2,2 (ω) · · · B

(L)2,NC

(ω)... ... . . . ...

B(L)NL,1 (ω) B

(L)NL,2 (ω) · · · B

(L)NL,NC

(ω)

(2.124)

Page 54: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

42

describe the limitations of wave field excitation with the considered loudspeaker array.Moreover, F (L)

λ,∼l(ω) and B

(L)

λ,∼l(ω) do not simply constitute point-to-point frequency re-

sponses, but they describe the coupling of the loudspeaker signals to spherical harmonicsradiated by the loudspeaker array. This can already be interpreted as a transform of theloudspeaker signals to the wave domain, although the actually used wave-domain trans-forms will be defined later. Using F (L)

λ,∼l(ω) and B

(L)

λ,∼l(ω) the loudspeaker contributions to

the wave field can be described by∼F

(L)∼l

(ω) =NL∑λ=1

F(L)

λ,∼l(ω)X λ(ω), (2.125)

∼B

(L)∼l

(ω) =NL∑λ=1

B(L)

λ,∼l(ω)X λ(ω), (2.126)

which can be expressed using a matrix-vector notation given by∼p(LF)(ω) = F(L)(ω)x(ω), (2.127)∼p(LB)(ω) = B(L)(ω)x(ω). (2.128)

In the latter expression, the loudspeaker signals are captured in

x(ω) =(X 1(ω), X 2(ω), . . . , X NL

(ω))T. (2.129)

The coupling of the microphones to the respective wave field components is given by

D µ(ω) =NC∑∼m=1

F(M)µ,

∼m

(ω)∼P

(F)∼m

(ω) +B(M)µ,

∼m

(ω)∼P

(B)∼m

(ω), (2.130)

where the frequency responses F (M)µ,

∼m

(ω) and B(M)µ,

∼m

(ω) describe the coupling of the forward-traveling and the backward-traveling wave of mode index ∼

m to the microphone signal µ.Similar to the case for the loudspeaker signals, this can be interpreted as a transform fromthe wave domain to the microphone signals. Using

d(ω) =(D 1(ω), D 2(ω), . . . , D

NM(ω)

)T(2.131)

to represent the microphone signals, with

F(M)(ω) =

F

(M)1,1 (ω) F

(M)1,2 (ω) · · · F

(M)1,NM

(ω)F

(M)2,1 (ω) F

(M)2,2 (ω) · · · F

(M)2,NM

(ω)... ... . . . ...

F(M)NC,1(ω) F

(M)NC,2(ω) · · · F

(M)NC,NM

(ω)

, (2.132)

B(M)(ω) =

B

(M)1,1 (ω) B

(M)1,2 (ω) · · · B

(M)1,NM

(ω)B

(M)2,1 (ω) B

(M)2,2 (ω) · · · B

(M)2,NM

(ω)... ... . . . ...

B(M)NC,1(ω) B

(M)NC,2(ω) · · · B

(M)NC,NM

(ω)

, (2.133)

Page 55: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.3 Wave Fields in Acoustic Multiple-Input/Multiple-Output Systems 43

(2.130) can be expressed as

d(ω) = F(M)(ω)∼p(F)(ω) + B(M)(ω)∼p(B)(ω). (2.134)

Plugging (2.127) and (2.128), into (2.122) leads to

∼p(F)(ω) =(INC −

∼H(F)(ω)

∼H(L)(ω)

∼H(B)(ω)

∼H(M)(ω)

)−1

·∼H(F)(ω)

(F(L)(ω) +

∼H(L)(ω)B(L)(ω)

)x(ω), (2.135)

which describes the forward-traveling wave at the inner surface SM in dependence of theloudspeaker signals. Equations (2.120) and (2.134) can be used to derive the microphonesignal from (2.135) to obtain

d(ω) =(F(M)(ω) + B(M)(ω)

∼H(M)(ω)

)·(INC −

∼H(F)(ω)

∼H(L)(ω)

∼H(B)(ω)

∼H(M)(ω)

)−1

·∼H(F)(ω)

(F(L)(ω) +

∼H(L)(ω)B(L)(ω)

)x(ω). (2.136)

This expression can be related to point-to-point LEMS model, where the frequency re-sponses H µ,λ(ω) describe the path from loudspeaker λ to microphone µ and represent theGreen’s function (cf. (2.32)) fulfilling the given boundary conditions, e. g., as describedin Sec. 2.1.5. For matrix notation, H µ,λ(ω) is represented by

H(ω) =

H 1,1(ω) H 1,2(ω) · · · H 1,NM

(ω)H 2,1(ω) H 2,2(ω) · · · H 2,NM

(ω)... ... . . . ...

HNL,1(ω) H NL,2(ω) · · · H NL,NM

(ω)

. (2.137)

Because there are only NC modes considered in (2.136), H(ω) is not identical to, butapproximated by

H(ω) ≈(F(M)(ω) + B(M)(ω)

∼H(M)(ω)

)︸ ︷︷ ︸

coupling of the microphones

·(INC −

∼H(F)(ω)

∼H(L)(ω)

∼H(B)(ω)

∼H(M)(ω)

)−1

︸ ︷︷ ︸reverberation in the enclosure

·∼H(F)(ω)

F(L)(ω) +∼H(L)(ω)B(L)(ω)︸ ︷︷ ︸

first reflection on enclosure

︸ ︷︷ ︸

coupling of the loudspeakers

, (2.138)

Page 56: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

44

which corresponds to (2.103) for a three-dimensional MIMO LEMS. When comparing(2.138) to (2.103), it is obvious that both have the same structure, where (2.138) usedmatrix-valued functions, while scalar-valued functions are used in (2.103).

The term∼H(F)(ω)

(F(L)(ω) +

∼H(L)(ω)B(L)(ω)

)describes the coupling of the loudspeak-

ers to the forward-traveling wave at the inner surface. This includes the first reflection ofthe emitted sound at the enclosure, represented by

∼H(L)(ω)B(L)(ω). The reverberation in-

side the enclosure is then described by the term(INC −

∼H(F)(ω)

∼H(L)(ω)

∼H(B)(ω)

∼H(M)(ω)

)−1.

Finally, the coupling of the microphones to the wave field is described by the term(F(M)(ω) + B(M)(ω)

∼H(M)(ω)

).

Like in the one-dimensional example above, the term(INC −

∼H(F)(ω)

∼H(L)(ω)

∼H(B)(ω)

∼H(M)(ω)

)−1can potentially be perfectly equalized by fi-

nite impulse response (FIR) filters [HMK97]. This is, however, not the case for the term∼H(F)(ω)

(F(L)(ω) +

∼H(L)(ω)B(L)(ω)

), which also captures the backward radiation from the

loudspeakers to the enclosure. The latter can be reduced by using higher order loudspeak-ers or another loudspeaker array enclosing the originally considered one [PA11, SK13b],making such configurations attractive candidates for LRE.

Still, F(L)(ω) and B(L)(ω) govern the ability of the loudspeaker array to excite a desiredwave field. In order to achieve a perfect equalization of the term(INC −

∼H(F)(ω)

∼H(L)(ω)

∼H(B)(ω)

∼H(M)(ω)

)−1, those matrices must be non-singular. This

implies that the number of loudspeakers must be at least as large as the number of wavefield components relevant to the acoustic scene in the considered volume.

Generalization towards other array geometries

While circular transducer arrays are experimentally investigated later in this thesis, arraysetups used in practice might differ significantly. An example can be given by an LEMS asshown in Fig. 2.10. The loudspeaker array and microphone array shown there constitutelinear arrays. Unlike in the previous three-dimensional example, there is a wall thatis relevant for the wave propagation between the loudspeaker array and the microphonearray. Although all previous derivations remain valid,

∼H(F)(ω),

∼H(B)(ω), and

∼H(M)(ω) may

no longer assumed to be diagonal. To obtain a suitable wave-domain model in this case,the wave-domain transform basis must not only be chosen according to the loudspeakerand microphone array geometries, but also in accordance with the boundaries imposedby the enclosure. For this example, suitable transforms can be derived considering planewaves, which allow a straightforward geometrical interpretation of a reflection at a planewall. When considering an image source model, the boundary conditions for a singlereflection at the walls can be fulfilled by superimposing another plane wave with anincidence angle according to the law of reflection, as illustrated by the two plane wavecontributions, shown in gray. Hence

∼H(F)(ω),

∼H(B)(ω),

∼H(L)(ω), and

∼H(M)(ω) would not be

diagonal but sparse instead, which leads to wave-domain properties of the LEMS similarto those discussed in Sec. 2.4.2. As mentioned above, the transforms can only consider

Page 57: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.3 Wave Fields in Acoustic Multiple-Input/Multiple-Output Systems 45

∼k

∼k ′

∼k

∼k ′

∼H(L)(ω)

∼H(F)(ω)

∼H(B)(ω)

∼H(M)(ω)

Figure 2.10: Alternative example of LEMS to be modeled in the wave domain

finite apertures, which suggests a sampling of the wave-field decomposition at certainwave numbers.

2.3.3 Wave-domain model for loudspeaker-enclosure-microphonesystems

While a conventional point-to-point model describes an LEMS by the impulse responsesor frequency responses between the loudspeakers and the microphones, a wave-domainmodel considers the couplings of the ideal free-field wave field and the actually measuredwave field. This representation is explained in this section, building on the derivationpresented in Sec. 2.3.3.

A conventional point-to-point model would describe the LEMS in the frequency domainby

d(ω) = H(ω)x(ω), (2.139)

disregarding the spatial arrangement of loudspeakers, enclosure, and microphones. On theother hand, WDAF uses knowledge about the transducer positions to deduce assumptionson the quantities in (2.136). These assumptions are then exploited to overcome thechallenges of adaptive MIMO filtering caused by the large numbers of loudspeakers andmicrophones. The wave-domain equivalent to (2.139) is given by

d(ω) = T−1M (ω)

∼H(ω)TL(ω)x(ω), (2.140)

Page 58: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

46

where

∼H(ω) =

∼H 1,1(ω)

∼H 1,2(ω) · · ·

∼H 1,NM

(ω)∼H 2,1(ω)

∼H 2,2(ω) · · ·

∼H 2,NM

(ω)... ... . . . ...

∼H NL,1(ω)

∼H NL,2(ω) · · ·

∼H NL,NM

(ω)

(2.141)

captures the wave-domain frequency responses from the wave-domain loudspeaker signall to the wave-domain microphone signal m. The wave-domain loudspeaker signals areobtained using the loudspeaker signal transform (LST) TL(ω), comprising the frequencyresponses T L,l,λ(ω) according to

TL(ω) =

T L,1,1(ω) T L,1,2(ω) · · · T L,1,NL

(ω)T L,2,1(ω) T L,2,2(ω) · · · T L,2,NL

(ω)... ... . . . ...

T L,NL,1(ω) T L,NL,2(ω) · · · T L,NL,NL(ω)

. (2.142)

The microphone signal transform (MST) transforms the microphone signals to their wavedomain representation. Hence, the frequency responses T M,µ,m(ω) describe the inverse ofthe MST arranged as the entries of

T−1M (ω) =

T M,1,1(ω) T M,1,2(ω) · · · T M,1,NM

(ω)T M,2,1(ω) T M,2,2(ω) · · · T M,2,NM

(ω)... ... . . . ...

T M,NM,1(ω) T M,NM,2(ω) · · · T M,NM,NM(ω)

. (2.143)

As described in Sec. 2.3.1, the loudspeaker array can only excite NL independent wave-field components, while the microphone array can only resolve NM independent wave-field components. This leads to the dimensions chosen for the transform matrices definedabove and is in accordance to the fact that there are NLNM individual loudspeaker-to-microphone paths to be described. From (2.139) and (2.140) follows that

H(ω) = T−1M (ω)

∼H(ω)TL(ω), (2.144)

which has a form similar to(2.138), where TL(ω) describes the coupling of the loudspeakersignals to the wave field and T−1

M (ω) describes the coupling of the wave field to themicrophone signals. Both transform matrices, TL(ω) and T−1

M (ω), will be given later forthe specific applications, while

∼H(ω) will be represented by an adaptive filter.

The roles of the LST and the MST with respect to the considered signals and systemsare shown in Fig. 2.11. There the wave-domain loudspeaker and microphone signals are

Page 59: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.3 Wave Fields in Acoustic Multiple-Input/Multiple-Output Systems 47

T−1L (ω) H(ω) TM(ω)

TL(ω) ∼H(ω) T−1

M (ω)

∼x(ω) x(ω) d(ω)∼d(ω)

︸ ︷︷ ︸∼H(ω)

x(ω) ∼x(ω)∼d(ω) d(ω)

︸ ︷︷ ︸H(ω)

Figure 2.11: Roles of the wave-domain transforms with respect to the LEMS

captured in

∼x(ω) = TL(ω)x(ω) =( ∼X 1(ω),

∼X 2(ω), . . . ,

∼X NL

(ω))T, (2.145)

∼d(ω) = TM(ω)d(ω) =

( ∼D 1(ω),

∼D 2(ω), . . . ,

∼DNM

(ω))T, (2.146)

respectively. This allows for writing (2.140) as∼d(ω) =

∼H(ω)∼x(ω). (2.147)

As it can be seen in the upper part of Fig. 2.11, the inverse of the LST is used in combi-nation with the MST to represent the LEMS described by H(ω) in the wave domain. Onthe other hand, the LST together with the inverse of the MST achieves a transform of thewave-domain LEMS described by

∼H(ω) to a conventional point-to-point representation.

The dimensions of∼H(ω) where chosen equal to H(ω), such that the latter can be per-

fectly modeled. Still, as it can be seen from the upper part of Fig. 2.11, the inverse LSTis necessary to determine

∼H(ω) from H(ω). Thus, not only equal dimensions of

∼H(ω) and

H(ω) are necessary for a perfect wave-domain model, but also the existence of an inverseLST.

The original idea behind WDAF is that the transforms TL(ω) and T−1M (ω) approximate

an eigenvalue decomposition of H(ω), such that∼H(ω) can be approximated by a diagonal

matrix [BSK04, SBR04, SBRH07]. This approach is generalized in this thesis, whererestrictions on

∼H(ω) will be imposed in a more flexible manner and further properties of

∼H(ω) will be exploited.

Generally, the transform pair TL(ω) and T−1M (ω) is chosen such that the cascade of

both transform describes the free-field impulse response of the LEMS. The actual wave-domain LEMS model is then placed in between those transforms. That gives rise to thequestion which properties or parts of the free-field LEMS are actually described by eachof the both transforms. Furthermore, it is of interest how the differences of a true LEMSto the idealized free-field LEMS are finally described by the wave-domain LEMS model.In the following, the description of the idealized LEMS presented in Sec. 2.3.2 is used toprovide some insights in this regard.

In the free field,∼H(L)(ω) would be equal to zero and T−1

M (ω)TL(ω) = H(ω) is valid,assuming NM = NL to ensure compatible matrix dimensions without loss of generality.

Page 60: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

48

Thus, (2.138) leads to

T−1M (ω)TL(ω) ≈

(F(M)(ω) + B(M)(ω)

∼H(M)(ω)

) ∼H(F)(ω)F(L)(ω). (2.148)

The matrices T−1M (ω) and TL(ω) are generally not equal to

(F(M)(ω) + B(M)(ω)

∼H(M)(ω)

)and

∼H(F)(ω)F(L)(ω), respectively, as (2.148) only requires the approximation of the prod-

ucts on both sides. The transform matrices will typically exhibit dimensions that aredifferent from the dimensions of

(F(M)(ω) + B(M)(ω)

∼H(M)(ω)

)and

∼H(F)(ω)F(L)(ω). More-

over, in practice, T−1M (ω) and TL(ω) will often be chosen such that they do not describe

every aspect of the free-field propagation. The reason for this will become apparent later.The matrices F(M)(ω), B(M)(ω),

∼H(M)(ω), and F(L)(ω) can, e. g., be defined by consid-

ering the loudspeakers as ideal point sources and the microphones as ideal sampling ofthe sound pressure. The free-field propagation from the loudspeakers to the microphonescan then be described by (2.38), when considering spherical coordinates. In that case,the matrix F(L)(ω) would represent the term Y m

n (ϑV, αV), while F(M)(ω) and B(M)(ω)correspond to

(Y mn (ϑ, α)

)∗. The diagonal matrices F(L)(ω) and

∼H(M)(ω) would further-

more describe the wave propagation in radial direction, where F(L)(ω) is represented byhn(−

∼krV

)hn(∼kr), and

∼H(M)(ω) by hn

(−

∼kr)/hn

(∼kr), such that(

F(M)(ω) + B(M)(ω)∼H(M)(ω)

)finally describes hn

(−

∼krV

)jn(∣∣∣∼kr∣∣∣).

To determine what is actually described by the wave-domain LEMS model∼H(ω), the

right-hand sides of (2.138), (2.144), and (2.148) have to be considered. It can be seen that∼H(ω) must be chosen such that(2.144) represents(2.138), which would be straightforward,if there was not the first reflection of the loudspeaker contribution on the enclosure that isentirely missing in (2.148). In fact, TL(ω) describes the sound propagation from the outersurface to the inner surface, before

∼H(ω) describes the first reflection of the loudspeaker

contribution on the enclosure, although the course of events is the other way around: thebackward-traveling loudspeaker contribution is first reflected at the enclosure and thentraveling from the outer surface to the inner surface. Considering this mismatch resultsin

T−1M (ω)

∼H(ω)TL(ω) ≈

(F(M)(ω) + B(M)(ω)

∼H(M)(ω)

)·(INC −

∼H(F)(ω)

∼H(L)(ω)

∼H(B)(ω)

∼H(M)(ω)

)−1

︸ ︷︷ ︸room reverberation

·

INC +∼H(F)(ω)

∼H(L)(ω)B(L)(ω)

( ∼H(F)(ω)F(L)(ω)

)†︸ ︷︷ ︸

first reflection on enclosure

·

∼H(F)(ω)F(L)(ω), (2.149)

Page 61: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.3 Wave Fields in Acoustic Multiple-Input/Multiple-Output Systems 49

where the pseudoinverse of matrices attributed to the LST has to compensate for theerror that is made by the mismatch to physical reality. Thus,

∼H(ω) describes the room

reverberation and the first reflection of the loudspeaker contribution at the enclosure incombination an inverse of TL(ω). Implicitly modeling the inverse of TL(ω) can leadto problems in real-world implementations since digital filters cannot exhibit arbitraryfrequency responses and real systems must be causal. Causality problems can be alleviatedwhen modeling the diagonal matrix

∼H(F)(ω) by

∼H(ω) instead of TL(ω), such that TL(ω)

no longer describes a delay.Moreover, in practical scenarios there are also other problems to be considered: when-

ever the wave number is considered in a transform, the speed of sound must be known,which is dependent on the typically unknown room temperature. Additionally, there willbe positioning errors in the placement of the transducers, which will themselves exhibit adirectional pattern that is not entirely known. When an adaptive filter represents

∼H(ω),

it will not only model the actual LEMS properties, but also compensate for any mismatchof TL(ω) to the physical reality. Thus, a desirable design of TL(ω) can at least approx-imately be inverted by

∼H(ω) such that the errors made in TL(ω) can be compensated

by the adaptive filter. For some array setups, like uniform circular arrays (as expressedby (2.228) and (2.229) in Sec. 2.5.3), it is possible to define delayless unitary implemen-tations of the wave-domain transforms. These transforms can be perfectly inverted bythe adaptive filter, when using a non-approximative wave-domain LEMS model and aretherefore used for most of the investigations in this thesis.

2.3.4 Spatial audio reproductionIn this section, some aspects of audio reproduction are reviewed with a focus on WFS.The aim of multichannel audio reproduction has always been to achieve a natural spatialreproduction of an acoustic scene. For two-channel stereophonic setups, this can onlybe achieved at a single position, the so-called sweet spot. The wide-spread 5.1 and 7.1reproduction systems [Uni12] relax the requirements regarding the listener’s position, butstill cannot provide satisfactory spatial realism in an extended area. Several approachestry to overcome this limitation by reproducing the wave field of an acoustic scene for aspatially extended listening area. Notable examples are: approaches based on acousticchannel inversion [KN93, KFV11], Higher-Order Ambisonics (HOA) [Ger73, Dan03], orWFS [BDV93], where the latter two are typically utilizing several tens to hundreds ofloudspeakers. In this thesis, WFS is considered as the primary example of reproductionsystems, although the obtained results are not limited to a specific sound field repro-duction technique. On the other hand, however, binaural reproduction approaches arefundamentally different from sound field reproduction approaches [Møl92, Bau61]. Thisis expected to limit the applicability of the results obtained for LRE to such systems.

With few exceptions [YFKS83], most reproduction techniques exploit only linear acous-tics. Hence, the general model of a reproduction system chosen here is also linear, wherethe loudspeaker signals are obtained according to

Page 62: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

50

X λ(ω) =NS∑q=1

Qq(ω)GR,λ,q(ω), (2.150)

with the NS source signals Qq(ω) indexed by q and the frequency responses of the repro-

duction system denoted by GR,λ,q(ω). This model can not only be used to describe thereproduction approaches mentioned above, but also the reproduction of an acoustic scenerecorded by NL microphones in the far-end room for a one-to-one reproduction in thenear-end room. In the latter case, Q

q(ω) would represent the signals of physical sources,

while GR,λ,q(ω) describes the frequency responses from the original source positions tothe microphones, where the microphone signals would be directly reproduced by the loud-speakers [Sno53]. For WFS, the signals of the virtual sources are described by Q

q(ω), and

the determination of GR,λ,q(ω) will be briefly outlined in the following. The description ofother reproduction schemes can be found in the literature [KN93, KFV11, Ger73, Dan03].

Wave field synthesis is derived starting from the Kirchhoff-Helmholtz integral (2.50),where the volume V is considered as the listening space. As described in Sec. 2.1.4, theboundary conditions considered in this integral completely describe any wave travelingthrough the surface S, including its traveling direction. To exploit this fact for audioreproduction, a monopole source distribution superimposed by a dipole source distributionwould have to be realized to account for the terms G(~x|~x0,

∼k) and ∂G(~x|~x0,

∼k)

∂~n(~xS) in (2.50),respectively. However, when allowing waves emitted to both sides of the considered surfaceS it is not necessary to realize the dipole portion of the source distribution. In this case,only the wave field on one side can be freely chosen and the Kirchhoff-Helmholtz integralreduces to the Rayleigh integral [Ver97]

P (~x, ω) = −∮∂V

2∂P (~xS, ω)∂~n (~xS)

G(~x|~xS,∼k) dS. (2.151)

The latter allows for a perfect synthesis by choosing P (~xS, ω) appropriately when con-sidering the free-field Green’s function and an infinitely extended planar surface. Whenfollowing this approach, the synthesis is limited to synthesizing waves traveling from thissurface towards the region of interest [Ver97]. For other convex surface shapes like aspherical surface, (2.151) allows for an approximate synthesis of the desired wave field[SRA08]. The case of a convex surface is considered in the following, deriving the drivingfunction for a continuous source distribution from (2.151) that is given by

D q(~xS, ω) =

−2∂P Q,q(~xS,ω)

∂~n(~xS) for(∂P Q,q(~xS,ω)

∂~n(~xS) /P Q,q(~xS, ω))> 0,

0 elsewhere.(2.152)

The condition∂P Q,q(~xS,ω)

∂~n(~xS) /P Q,q(~xS, ω) > 0 is used to avoid virtual reflections, i. e., wavefield components emitted by the loudspeakers traveling the opposite direction as intendedwithin V . Those would occur because the boundary conditions on S considered in (2.50)

Page 63: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.3 Wave Fields in Acoustic Multiple-Input/Multiple-Output Systems 51

imply a cancellation of all wave-field components traveling from V to the outside (seeSec. 2.1.4). At the same time, the missing dipole source in (2.151) implies a radiation toboth sides of S, which results in an additionally excited wave traveling in the directionopposed to the desired one.

The synthesis of a plane wave can be facilitated using the driving function

D(pw)q (~xS,

∼k, ω) =

−j2 Q q(ω)

⟨~n (~x) ,

∼k⟩ej

⟨∼k, ~xS

⟩for

⟨~n (~x) ,

∼k⟩> 0

0 elsewhere,(2.153)

which can be obtained inserting (2.11) in (2.152). The actual loudspeaker signals can beobtained by a spatial sampling of the respective driving function at ~xS = ~p

(L)λ

X λ(ω) = w(L)λ

NS−1∑q=0

D q(~p(L)λ, ω), (2.154)

where w(L)λ

is a factor to correct a possibly non-uniform sampling of the surface S by theloudspeakers as explained in the following: For a three-dimensional synthesis, w(L)

λwould

be proportional to the surface section supported by the respective loudspeaker. However,in practice WFS is often restricted to a reproduction of wave fields with no z-dependency,which are referred to as two-dimensional wave fields in the following (although they fill thethree-dimensional physical space). This is because those are most important for humanperception and can be approximately synthesized by loudspeaker array geometries locatedin the x-y-plane. For these practically favorable setups, w(L)

λis proportional to the distance

between neighboring loudspeakers. Still, even when disregarding artifacts from the spatialsampling in the x-y-plane, an exact synthesis of a two-dimensional wave field would requireline sources parallel to the z-axis, as described by (2.47), although loudspeakers are betterdescribed by point sources according to (2.41) [Cro98]. To account for this mismatch, thefollowing correction of the driving function can be used for the synthesis of plane waves[Ver97]

D(2.5D)q (~xS, ω) =

√√√√ 1j

∼k

√2π‖~xref − ~xS‖D(pw)

q (~xS,∼k, ω), (2.155)

where ~xref is a reference position for which the reproduction is optimized. Note thatthe term

√2π‖~xref − ~xS‖ influences only the level of the driving signals but not their

phase. Hence, the perceived audio quality is not significantly degraded when the listeneris located apart from ~xref. This approach is often referred to as 2.5-dimensional synthesis.Using (2.153) and (2.155) with Q

q(ω) as signal for each virtual source, the synthesis of a

wave field of sources according to (2.49) can be facilitated by

Page 64: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

52

GR,λ,q(ω) =√jmin

(∼k,

∼kal)√

8π‖~xref − ~p (L)λ‖ Q

q(ω)

· ramp(⟨~n(~p

(L)λ

), ~n(pw)

⟩)e−j

∼k

⟨~n(pw), ~p

(L)λ

+d(pw)⟩, (2.156)

where

ramp (x) ={x for x > 0,0 elsewhere (2.157)

and min (x, y) denotes the lower value of x and y. The spatial sampling by the loudspeak-ers causes spatial aliasing for the synthesis at higher frequencies [SRA08]. Then, theamplitude correction according to (2.155) does no longer lead to an improved synthesis at~xref and causes amplitude errors itself. Consequently, a maximum frequency, representedby an “aliasing” wave number

∼kal, is chosen beyond which the magnitude response is no

longer corrected. The synthesis of point sources is beyond the scope of this thesis andtreated in [Ver97].

2.3.5 Equalization of reproduced wave fieldsIn this section, the task of equalizing reproduced wave fields is related to an accordingsignal model shown in Fig. 2.12.

The goal of an LRE system is to determine the NL × NL matrix G(ω) describing theequalizers such that the error between the ideal and the actual signal at the listener’spositions is minimized. Unfortunately, in practice the reproduced acoustic scene cannotgenerally be measured at the listener’s ears, as according sensors would significantly im-pair the users comfort. Hence, another optimization problem is considered by equalizingthe acoustic scene at multiple microphone positions at some distance from the listener,assuming that the solution optimizes the wave field at the listener’s ears as well. Thisgoal can be interpreted as achieving a spatially extended equalization and is well-knownin the literature [BA05, Spo05, NHE92].

In Fig. 2.12(a), the transducer array setup located in the listening room is shown,where the corresponding signal model for LRE is shown in Fig. 2.12(b). There, thesignal perceived by the listener should ideally represent the NL loudspeaker signals x(ω)multiplied by the desired frequency response to the listener’s ears H′0(ω), which is an2×NL matrix when considering a single listener. The frequency response H′0(ω) is oftenchosen to be the free-field impulse response, as many reproduction schemes, e. g., WFS orHOA are optimized for free-field reproduction and rely on an acoustically treated roomthat approximates those conditions sufficiently well. The actual MIMO room impulseresponse between the loudspeakers and the listener is described by the 2×NL matrix H′(ω)and will generally not represent the desired frequency response. The error between theactual signal and the desired signal at the listeners ears is denoted by e′(ω) in Fig. 2.12(b).

Page 65: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.3 Wave Fields in Acoustic Multiple-Input/Multiple-Output Systems 53

(a)

H′(ω) H(ω)H(D)(ω)

(b) H′0(ω)

H′(ω)

G(ω)

H(ω) H(D)(ω)

H 0(ω)

x(ω)

originalloudspeakersignal

y(ω)

equalizedloudspeakersignal

−actual signal at listener’s ears

d(ω) −

actual microphone signal

desired signal atlistener’s ears

desired microphone signal

e(ω)

e′(ω)

≈ e′(ω)

+

+

︸ ︷︷ ︸≈H(D)(ω)H(ω)

︸ ︷︷ ︸≈H(D)(ω)H 0(ω)

Figure 2.12: Array setup and signal model for equalizing a reproduced acoustic scene atthe listener position

Page 66: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

54

As mentioned above, the NM microphone signals represented by d(ω) will be equal-ized instead of the signals at the listeners ears. The frequency responses between theloudspeakers and the microphones are captured in H(ω), which represents a NM × NL

matrix. Like in the relation of H′(ω) to H′0(ω), H(ω) will generally be different from thecorresponding desired frequency responses captured in the NM ×NL matrix H 0(ω). Theerror signal considered in this optimization problem is denoted by e(ω).

To establish the relation between LRE at the listener’s positions and LRE at the mi-crophone positions, the 2 × NM matrix H(D)(ω) is used, which describes the frequencyresponses between the microphone positions and the listener’s ears. Since the micro-phones will typically not pick up all sound field components that influence the signal atthe listener’s ears, only an approximate relation between H′0(ω) and H 0(ω) as well asH′(ω) and H(ω) can be established, which is given by

H′0(ω) ≈ H(D)(ω)H 0(ω), (2.158)H′(ω) ≈ H(D)(ω)H(ω). (2.159)

The accuracy of this approximation determines how well LRE at the microphone positionstranslates to LRE at the listener’s position. In other words, when the acoustic scene at thelistener’s position can be sufficiently described by observing the scene at the microphonepositions, an effective LRE is possible. This relation emphasizes the importance of thechosen microphone array geometry, where a placement strategy can be motivated by theKirchhoff-Helmholtz integral: An implication of (2.50) is that once a reproduced scene isequalized on a surface enclosing a source-free volume, the acoustic wave field within thevolume is also equalized [Spo05]. Thus, the reproduced scene can be spatially sampledby microphones located on a closed surface, while the listener is located in the enclosedvolume. Due to physical and economical constraints, real-world implementations willtypically only approximate such a microphone array geometry.

Recently, the task of LRE has been interpreted as a cancellation problem instead of aninverse filtering problem [TZA13], where this approach is intuitively accessible: Wheneverthe loudspeakers are able to cancel all reflections of the enclosure, the latter will no longerdegrade the reproduction quality. Moreover, this applies also when considering LRE asan inverse filtering problem, which further implies that a large number of loudspeakersmust be used to achieve a spatially extended equalization (see Sec. 2.3.2).

When aiming at a signal-independent equalization following a least-squares approach,the equalizers are given by

HH(ω)H(ω)G(ω) = HH(ω)H 0(ω). (2.160)

The dimensions of the matrices H(ω) and H 0(ω) are NM×NL (see (2.137)), while G(ω) isNL×NL. These dimensions give some insights regarding the influence of the loudspeakerand microphone numbers: If NL = NM, H(ω) is a square matrix and solving (2.160) is

Page 67: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.4 Derivation of Wave-Domain Transforms 55

a fully determined problem. Assuming H(ω) to be full rank, this implies that NL loud-speakers allow for perfect control of the reproduced wave field at exactly NL positions.When NM > NL, solving (2.160) is an over-determined problem, where it has been found,that this improves the spatial extent of the equalization [EN89, SK13b]. For NM < NL,solving (2.160) is an underdetermined problem. In that case, the ambiguity of equalizerscan cause an arbitrary degradation of the equalization apart from the microphone posi-tions. As will be explained later, the equalization of a reproduced acoustic scene withNS < NL is also an underdetermined problem.

2.4 Derivation of Wave-Domain Transforms

In this section, the wave-domain transforms are derived as one incarnation of the trans-forms TM(ω), TL(ω) introduced in Sec. 2.3.3. For the considered array setups comprisinga circular microphone array, circular harmonics and spherical harmonics are suitable ba-sis functions, while linear arrays would suggest to choose plane waves [EKFH12]. In thisthesis, transforms based on circular harmonics, as described in Sec. 2.4.1, are investigatedin some detail. A discussion of the wave-domain properties of the LEMS is presentedin Sec. 2.4.2 and the influence of the array positioning on the wave-domain properties isdiscussed in Sec. 2.4.3.

2.4.1 Transforms based on circular harmonics

In this section, wave-domain transforms based on circular harmonics are explained. Aspecial case of these transforms has already been presented in [SK11], while the transformshave later been generalized [SK12b]. For notational compatibility with this thesis, [SK12b]is summarized in the sequel.

For a start, it is noted that many audio reproduction systems consider planar arraysetups to reproduce acoustic scenes in the azimuthal plane, as this plane is consideredto be most important for human perception. When considering only the x-y-plane, adependency of the reproduced wave field along the z-axis is neglected and the spectrumof the sound pressure at any point ~x|z=0 can be described by a sum of circular harmonics.Transforms based on the latter are derived in the following.

Microphone signal transform (MST)

The MST is used to obtain the wave-domain microphone signals∼d(ω) (see Fig. 2.11). For

its derivation, (2.76) is used to describe the sound field in the vicinity of the microphonearray, where the finite number of available microphones implies a spatial sampling. Theconsidered microphone array samples the wave field on a circle of radius RM, which leads

Page 68: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

56

to approximating the integral in (2.76) by a sum:

C(ci)m

(∼kz, ω) ≈ 1

NMBm(∼kRM

) NM∑µ=1

D µ(ω)e−jmα(M)µ . (2.161)

Here, Bm(∼kRM

)equalizes the linear distortion by a scatterer inside the microphone array.

The finite sum in(2.161) only allows for observing NM non-redundant modes, such that thefollowing considerations are restricted to the mode orders m = −(NM/2− 1), . . . , NM/2,while the spatial aliasing (see Sec. 2.3.1) is modeled in the LEMS model. Consideringmodal aliasing directly in transforms based on circular harmonics has also been considered[SK09], but was not followed due to disappointing results. It was shown in [EHO13]that the aliasing also can be disregarded in a plane-wave-based transform design withoutsignificant degradation of the wave-domain LEMS properties.

The wave-domain microphone signals are obtained by

∼Dm(ω) = 1

NMBm(∼kRM

) NM∑µ=1

Dµ(ω)e−jmα

(M)µ , (2.162)

where the mode order m is assigned to the wave-domain microphone signal index m by

m ={m− 1 for m ≤ NM/2 + 1,m−NM − 1 elsewhere. (2.163)

From (2.162), the inverse of the MST can directly be derived, which is given by

D µ(ω) =NM/2∑

m=−NM/2+1Bm

(∼kRM

) ∼Dm(ω)ejmα

(M)µ . (2.164)

Considering (2.162) and (2.164), the following frequency responses can be determinedas elements of TM(ω) and T−1

M (ω) to realize the MST and its inverse:

T M,m,µ(ω) = e−jmα(M)µ

NMBm(∼kRM

) , (2.165)

T M,µ,m(ω) = Bm(∼kRM

)ejmα

(M)µ . (2.166)

Page 69: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.4 Derivation of Wave-Domain Transforms 57

Loudspeaker signal transform (LST)

In this section, the LST is derived that describes the wave field excited by the loudspeak-ers under free-field conditions in terms of the chosen wave-domain basis functions. ForWDAF, is was originally proposed to first model the wave propagation for all loudspeaker-to-microphone paths, before the obtained signals were fed to the MST [BSK04]. Thisapproach is not followed here because it has some shortcomings: The most significantdrawback is that the number of resulting linearly independent wave-domain loudspeakersignals cannot be greater than the number of microphones. When there are less micro-phones than loudspeakers, the resulting loss of spatial information precludes transform-ing the loudspeaker signals back to their original domain. While an inverse transformis indispensable for LRE (see Chapter 4), the reduced number of possibly independentloudspeaker signal components would also restrict the LEMS model used for acoustic echocancellation (AEC). In the following derivation, the free-field wave field created by theloudspeakers at the microphone array aperture is directly determined in the wave domain.This avoids sampling at the microphone positions, which leads to the problem describedabove.

Since the wave propagation between the loudspeaker and the microphone positions isthree-dimensional in general, considering a two-dimensional wave field by using circularharmonics represents an approximation. When considering wave lengths larger than RM

and assuming a large minimum loudspeaker-microphone distance (RM � minλ

{%

(L)λ

}),

the loudspeaker contributions at the microphone array position can be, furthermore, ap-proximated as plane waves according to (2.11) [Bal97]. Still, an ideal plane wave doesnot exhibit an amplitude decay, while the amplitude of the loudspeaker contributionswill generally decay inversely proportional to the traveled distance (see Sec. 2.1.3). Theaccording attenuation and delay will therefore be modeled by the three-dimensional free-field Green’s function (2.35) between the individual loudspeaker positions and the centerof the microphone array. The superposition of all loudspeaker contributions at the centerof the microphone array coinciding with the origin of the coordinate system (see Fig. 2.5)can then be approximated by

P (~x, ω) ≈NL∑λ=1

X λ(ω)G(~0|~p (L)λ,∼k)ej

∼k% cos

(α−α(L)

λ

). (2.167)

As opposed to a wave field resulting from point sources described by (2.41), (2.167) can bestraightforwardly transformed using (2.76). This requires an intermediate step utilizingthe Jacobi-Anger expansion [AS72] to write

∫ 2π

0ej∼k% cos

(α−α(L)

λ

)e−jlα dα =

∞∑l′=−∞

j l′Jl′

(∣∣∣∼k%∣∣∣) e−jl′α(L)λ

∫ 2π

0ej(l′−l)αdα, (2.168)

replacing the mode order m by the respective mode order l for the wave-domain loud-speaker signals [SK11].

Page 70: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

58

Note that (2.167) and (2.168) assume free-field conditions. Still, a possible scattererinside the microphone array would not influence the wave-domain microphone signals,since its influence is compensated by the term 1/

(NMBm

(∼kRM

))in (2.162). Hence, this

scatterer can be disregarded in the following derivation.

Transforming (2.167) using (2.76) and applying (2.168) leads to the LST

∼X l(ω) = j l

NL∑λ=1

X λ(ω) e−j

∼k%

(L)λ

%(L)λ

e−jlα(L)λ , (2.169)

where the wave-domain loudspeaker signal∼X l(ω) represents C(ci)

m(∼kz, ω) (see (2.76)) at

the microphone positions as it would be excited by all loudspeakers. The wave-domainloudspeaker signal index l is related to the mode order l by

l ={l − 1 for l ≤ NL/2 + 1,l −NL − 1 elsewhere. (2.170)

The inverse LST can be straightforwardly derived from (2.169), resulting in

X λ(ω) = 4π%

(L)λej∼k

(%

(L)λ−max

λ

{%

(L)λ

})NL

NL∑l=1

∼X l(ω)j−lejlα

(L)λ , (2.171)

where an additional delay of maxλ

{%

(L)λ

}/c seconds is incorporated to assure causality of

the inverse transform. The LST and its inverse, respectively, can be represented by thefrequency responses

T L,l,λ(ω) = 1%

(L)λ

e−j∼k%

(L)λ e−jl(α

(L)λ−π/2), (2.172)

T L,λ,l(ω) =%

(L)λ

NLej∼k(%(L)

λ−max

λ

{%

(L)λ

})ejl(α

(L)λ−π/2), (2.173)

where the factors 1/(4π) and 4π, respectively, have been omitted without loss of generality.

As it is shown in Appendix A, the approximation of the loudspeaker contributionsby plane waves can be avoided when considering spherical harmonics. However, a filterdesign procedure is necessary to implement the transforms derived there and an inverse

Page 71: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.4 Derivation of Wave-Domain Transforms 59

transform cannot be straightforwardly derived. This is why these transforms have notbeen investigated in more detail.

Approximation errors

The transforms presented above approximate the loudspeaker contribution as plane wavesat the microphone array position and neglect the aliasing due to spatial sampling of themicrophone array. The resulting approximation error is discussed in this section, wherean array setup as shown in Fig. 2.5(a) is considered, equipped with NL = 48 loudspeakersand a microphone array with radius RM = 0.05 m. The number of microphones and theradius of the loudspeaker array have been varied in order to differentiate between the errorresulting from the plane wave approximation and the error due to spatial aliasing. Thenumber NM of microphones has been chosen to be 10 or 20 and the considered loudspeakerarray radii RL were 1.5 m and 3 m.

As mentioned above, the LST represents the cascade of the free-field frequency responsefrom the loudspeakers to the microphones and the MST, where an approximation has beenused in its derivation. Hence, a measure for the accuracy of this approximation can bedefined by

e app(ω) = 10 log10

NM∑m=1

NL∑λ=1

∣∣∣∣∣∣T L,O(m),λ(ω)−NM∑µ=1

T M,m,µ(ω)H 0,µ,λ(ω)

∣∣∣∣∣∣2

∣∣∣∣∣∣NM∑µ=1

T M,m,µ(ω)H 0,µ,λ(ω)

∣∣∣∣∣∣2

, (2.174)

where H 0,µ,λ(ω) represents the free-field impulse response from loudspeaker λ to micro-phone µ and the assignment function O(m) is defined such that l = m, considering (2.163)and (2.170).

The error e app(ω) is depicted in Fig. 2.13 as a function of the temporal frequency forNM = 10 and NM = 20 combined with RL = 1.5 m and RL = 3 m. When comparinge app(ω) for RL = 1.5 m and for RL = 3 m, the influence of the plane-wave approximationcan be seen. As the loudspeaker contributions at the microphone positions are betterapproximated by plane waves the larger the loudspeaker array radius is, the 6 dB lowererror for RL = 3 m compared to RL = 1.5 m is an expected result. The increase of thiserror for higher frequencies can be explained by the fact that a possible curvature of aloudspeaker contribution is less noticeable for lower frequencies. The error due to theneglected aliasing can be seen when comparing the curves for NM = 10 and NM = 20.For NM = 10, this becomes the dominating error for higher frequencies such that thereis virtually no difference for RL = 1.5 m and RL = 3 m above frequencies of 4 kHz.However, measuring e app(ω) according to (2.174) considers the aliasing of the microphonearray H 0,µ,λ(ω) but it disregards the considered wave-domain LEMS model. However, inpractice the aliasing of the microphone array does not need to be considered by the LST,but in the wave-domain LEMS model instead.

Page 72: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

60

0 1 2 3 4 5 6 7 8−40

−30

−20

−10

0

frequency f = ω2π in kHz

eap

p(ω

)in

dBNM = 10, RL = 1.5 mNM = 20, RL = 1.5 mNM = 10, RL = 3 mNM = 20, RL = 3 m

Figure 2.13: Approximation error for the transforms presented in Sec. 2.4.1. The micro-phone array had a fixed radius RM of 5 cm and the loudspeaker array hasalways been equipped with NL = 48 loudspeakers.

2.4.2 Wave-domain properties of a loudspeaker-enclosure-microphone system

A transform-domain description of a modeled system is usually motivated by desirableproperties of the considered system or signals becoming evident in the respective trans-form domain. For WDAF, a dominance of certain couplings in

∼H

m,l(ω) is exploited, asdescribed in this section.

For the following illustration, a sample for Hµ,λ

(ω) was obtained by measuring thefrequency responses H µ,λ(ω) of an LEMS as depicted in Fig. 2.5(a) with a radius RL =1.5 m for the loudspeaker array and a radius RM = 0.05 m for the microphone array. Thearrays were equipped with NL = 48 loudspeakers and NM = 10 microphones and locatedin a real room of approximately 35 m2 with an office-like ceiling, a carpeted floor, andcurtains in front of the walls. The resulting reverberation time T60 for that room wasabout 0.25 seconds. The wave-domain MIMO system response

∼H m,l(ω) was obtained

by transforming H µ,λ(ω) as described in Sec. 2.4.1 using a transform based on circularharmonics.

The result is shown in Fig. 2.14, where it can be clearly seen that the couplings ofdifferent loudspeakers and microphones are similarly strong for H µ,λ(ω) on the left, whilethere are stronger couplings in

∼H m,l(ω) on the right for the wave field components with

particular wave field component indices. In the case of circular harmonics, the dominantcouplings are those sharing the same mode order, where modes with a small difference inthe mode order exhibit also a significant coupling. This property is observed for variousLEMSs and has already been used to formulate approximative models of the LEMS in thewave domain [BSK04, SBR04, SB08a, SK11, SK12a]. Following the original interpretation

Page 73: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.4 Derivation of Wave-Domain Transforms 61

0 10 20 30 40

02468

λ

µ

1 kHz

−20 −10 0 10 20

−4−2

024

l

m

1 kHz

0 10 20 30 40

02468

λ

µ

2 kHz

−20 −10 0 10 20

−4−2

024

l

m

2 kHz

0 10 20 30 40

02468

λ

µ

4 kHz

−20 −10 0 10 20

−4−2

024

l

m

4 kHz

−45−40−35−30−25−20−15−10−50dB

Figure 2.14: Logarithmic magnitudes of H µ,λ(ω) (left) and∼H m,l(ω) (right) in dB for

the frequencies ω = 2πf, f = 1 kHz, 2 kHz, 4 kHz. All values have beennormalized to the maximum of all sub-figures in the respective row. The wavefield component indices l and m have been replaced by l and m, respectively,to allow for an interpretation in terms of mode orders.

Page 74: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

62

of the wave-domain transforms as an approximation of a singular value decomposition(SVD) of the LEMS, this can be seen as an approximate diagonalization of the LEMS.However, depending on the used wave-domain transform basis, the diagonally dominantstructure may only be visible after permutation of the mode couplings.

2.4.3 Influence of array positionIn this section, the influence of the transducer array positions on the wave-domain prop-erties of an LEMS is discussed (results have already been published in [SK12b]).

The wave-domain properties of an LEMS transformed as described in Sec. 2.4.1 aregenerally influenced by positioning errors and a low minimum distance between the loud-speakers and the microphones. For a discussion of both, the total coupling energy of∼H m,l(ω) is defined by

∼Em,l =

∫ ωmax

0

∣∣∣ ∼H m,l(ω)

∣∣∣2 dω, (2.175)

where ωmax describes the maximum considered frequency, which is assumed to be ω/(2π) =4 kHz in the following. In this discussion, the array setup shown in Fig. 2.15 is consid-ered. There, ∆x denotes the distance of the loudspeaker and the microphone array center,which is considered in the definition of the transforms. On the other hand, ey describesan additional positioning error of the loudspeaker array not considered in the transforms.The LEMS used for obtaining the frequency responses

∼H

m,l(ω) was identical to the onedescribed in Sec. 2.4.2, up to the relative positioning of the arrays.

The measure∼Em,l is shown in Fig. 2.16 for values of ∆x equal to 0 m, 0.25 m, and 0.5 m

with ey = 0 m. It can be seen that couplings of modes of the same order are dominantfor all values of ∆x. As the difference of the Figures 2.16(a), 2.16(b), and 2.16(c) is onlybarely visible, the results are quantified by the ratio of energy captured in the diagonal(m = l) using

Ediag =∑NMm=1

∼Em,O(m)∑NM

m=1∑NLl=1

∼Em,l

. (2.176)

When considering Ediag, a difference becomes noticeable: 89.7%, 80.8%, and 75.6% of thetotal coupling energy are captured by the main diagonal for a distance of the transducerarray centers of ∆x = 0 m, 0.25 m, and 0.5 m, respectively. Unlike ey, the array displace-ment described by ∆x was considered in the definition of the transforms. This confirmsthe usefulness of the transforms for the considered setup.

As it can be seen in Fig. 2.17, a positioning error (represented by ey, while ∆x = 0)has a considerable influence on the wave-domain properties: While a positioning error ofone or two centimeters implies only a minor reduction of Ediag, the diagonal dominancegets lost for ey = 0.04 m. This is reflected in Ediag resulting in 81.2%, 60.9%, and 24.7%for ey = 1, 2, and 0.04 m, respectively.

Page 75: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.4 Derivation of Wave-Domain Transforms 63

x

y

∆x

ey

assumed array position

actual array position

Figure 2.15: Circular loudspeaker array shifted by ∆x with respect to the origin of thecoordinate system and introduced positioning error ey

−20 −15 −10 −5 0 5 10 15 20

−4−2

024

m

(a)

−20 −15 −10 −5 0 5 10 15 20

−4−2

024

m

(b)

−20 −15 −10 −5 0 5 10 15 20

−4−2

024

l

m

(c)

−40

−35

−30

−25

−20

−15

−10

−5

0

Figure 2.16: Total energy∼Em,l of the mode couplings in dB with respect to max{

∼Em,l}.

Shift of the loudspeaker array: (a) ∆x = 0 m, (b) ∆x = 0.25 m, (c) ∆x =0.5 m. The wave field component indices l and m have been replaced by l

and m, respectively, to allow for a physically meaningful interpretation.

Page 76: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

64

In Fig. 2.18, the combined influence of ey and ∆x can be seen, where the obtainedvalues suggest that a shift of the array does not increase the sensitivity for positioningerrors.

Page 77: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.4 Derivation of Wave-Domain Transforms 65

−20 −15 −10 −5 0 5 10 15 20

−4−2

024

m(a)

−20 −15 −10 −5 0 5 10 15 20

−4−2

024

m

(b)

−20 −15 −10 −5 0 5 10 15 20

−4−2

024

l

m

(c)

−40

−35

−30

−25

−20

−15

−10

−5

0

Figure 2.17: Total energy∼Em,l of the mode couplings in dB with respect to the maximum

of∼Em,l. Positioning error of the loudspeaker array: (a) ey = 0.01 m, (b) ey =

0.02 m, (c) ey = 0.04 m. The wave field component indices l and m havebeen replaced by l and m, respectively, to allow for a physically meaningfulinterpretation.

0 m 0.01 m 0.02 m 0.04 m20%

40%

60%

80%

ey

Edi

ag

∆x = 0 m∆x = 0.25 m∆x = 0.5 m

Figure 2.18: Diagonal coupling energy Ediag for different values of the positioning error eyand the distance of the array centers ∆x

Page 78: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

66

2.5 Discrete-Time Signal Processing forContinuous-Time Quantities

In this section the discrete-time representations of the continuous frequency-domain rep-resentations for signals and systems considered so far are explained. To this end, thelimitations of representing and equalizing continuous frequencies responses by discrete-time filters is discussed in Sec. 2.5.1. After this, the discrete-time representation of theLEMS is defined in Sec. 2.5.2, while the discrete-time representation of the wave-domaintransforms will be described in Sec. 2.5.3. The discretization of the reproduction signalsis treated in Sec. 2.5.4, together with the resulting properties of the reproduction signals.

2.5.1 Representation and equalization of continuous frequencyresponses by discrete-time filters

In this section, representing and equalizing an LEMS by discrete-time filters is discussedto illustrate the limitations that result from using FIR filters for the equalization. The lim-itations are firstly reviewed for the well-known case of single-input/single-output (SISO)systems, before the consideration is generalized to MIMO systems. For the MIMO caseit is shown that the same limitations as in the SISO case exist whenever the number ofinputs and outputs of the LEMS is equal.

As a continuous function, a general frequency response carries an infinite amount ofinformation, even if it is band-limited. In the contrary, a linear time-invariant digitalfilter is described by a finite amount of coefficients, where its input samples x(k) and itsoutput samples d(k) at time instant k obey the following difference equation

d(k − κ) +L′H−1∑κ=1

akd(k − κ) =LH−1∑κ=0

bκx(k − κ) (2.177)

assuming that the filter impulse response is causal. Here, L′H and LH denote the numbers ofthe filter coefficients ak and bk, respectively. The z-domain transfer function correspondingto (2.177) has the following form

H(z) =

LH−1∑κ=0

bκz−κ

1 +L′H−1∑κ=1

aκz−κ

, (2.178)

where z−k represents a delay by k samples and H(z) is the z-domain representation ofH(ω). Various filter design methods can be used to obtain H(z) from H(ω), notableexamples are the bilinear transform (or Tustin’s method), matched z-transform method,

Page 79: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.5 Discrete-Time Signal Processing for Continuous-Time Quantities 67

and the impulse invariance method [SDRS09, Ant79]. Nevertheless, the design of digitalfilters is a field of research on its own, which exceeds the scope of this thesis such that thereader might refer to general literature on this topic. In this thesis, it is assumed that allconsidered frequency responses can be sufficiently approximated by digital filters, wherecausality is enforced by introducing delays, if necessary.

There are two classes of digital filters, filters exhibiting an infinite-length impulse re-sponse and filters exhibiting a finite-length impulse response, referred to as infinite impulseresponse (IIR) filters and FIR filters, respectively. On the other hand, there are recursivefilters described by the coefficients bk and ak and non-recursive filters described only bythe coefficients bk. Considering (2.177) for filters with a finite number of coefficients, it be-comes clear that non-recursive filters can only describe FIR filters and that the descriptionof IIR filters requires at least one coefficient ak to be non-zero.

In this thesis, the equalization or, identically, inversion of impulse responses plays animportant role for the definition of inverse transforms and for LRE. In the z-domain, thisproblem can be formulated as: find an equalizer response G(z) such that the cascade ofthe equalizer and the considered system describes a desired transfer function H0(z),

H(z)G(z) = H0(z). (2.179)

Here, G(z) can be straightforwardly determined taking the inverse of H(z) multiplied byH0(z). However, assuming H(z) and H0(z) to describe FIR filters, the numerator of H(z)would become the denominator of G(z), which then describes an IIR filter.

Considering an FIR equalizer described by LG coefficients ck, its z-domain transferfunction is given by

G(z) =LG−1∑k=0

ckz−k. (2.180)

In the same way, the desired transfer function can be described by the coefficients h0,k,leading to

H0(z) =LH−1∑k=0

h0,kz−k. (2.181)

Assuming ak = 0 while inserting (2.178), (2.180), and (2.181) in (2.179) leads toLH−1∑κ=0

bκz−κ

LG−1∑k=0

ckz−k

=LH−1∑k=0

h0,kz−k. (2.182)

Considering each h0,k separately and applying polynomial division allows for solving forck:

ck =h0,k −

∑min(k,LH−1)κ=1 ck−κbκ

b0. (2.183)

Page 80: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

68

0

0.5

1va

lue

bkbk ∗ ckckh0,k

0 2 4 6 8 10 12 14 16 18 20 22 24

−60

−40

−20

0

LH

LG

LH − 1

k

abso

lute

valu

ein

dB

Figure 2.19: Equalization of a random impulse response with an FIR filter for LH = 10and LG = 15

Except for the trivial case where H(z) describes a frequency-independent attenuation,there is no κ such that ck = 0∀ k > κ. Consequently, the optimal equalizer coefficients ckconstitute a series of infinite length, which renders the perfect equalizer an IIR filter.

Moreover, since h0,k is only of length LH, an optimal equalizer must fulfill∑min(k,LH−1)κ=1 ck−κbκ/b0 = 0 for k > LH − 1. However, as there are only LG coefficients

ck, this cannot be fulfilled for k > LG − 1. Hence, there is an undesired impulse responsetail of LH− 1 samples in the cascade impulse response, whenever an FIR filter is used forequalization of the finite-length impulse response. This is illustrated in Fig. 2.19, where arandom impulse response of finite length (LH = 10) is equalized by a finite-length equal-izer (LG = 15) determined according to (2.183). It can be seen that a perfect equalizationis only achieved for k < LG, whereafter a tail of LH− 1 not imperfectly equalized samplesfollows. This problem can typically be also observed when least-squares optimal equaliz-ers are considered instead of perfect equalizers, while it can be reduced when using othernorms for optimization [MMK10].

The division by b0 requires b0 6= 0, which is intuitive, as preceding zero samples in an

Page 81: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.5 Discrete-Time Signal Processing for Continuous-Time Quantities 69

impulse response would imply non-causal equalizers. It is well-known that introducing adelay can be used to avoid an anti-causal inverse. Using a delay of L0 samples, where L0

is greater than the number of leading zero-valued coefficients in bk,

ck =h0,k+L0 −

∑min(k,LH−L0−1)κ=1 ck−κbκ+L0

bL0

(2.184)

can be used to determine the equalizers. The series described by (2.183) and (2.184) willonly converge if the inverse of H(z) describes a stable filter, i. e. no poles are located onor outside the unit circle.

The concept of z-domain transfer functions can be generalized for MIMO systems, usingvectors capturing multiple input and output signals:

x(k) = (x1(k), x2(k), . . . , xNL(k))T , (2.185)d(k) = (d1(k), d2(k), . . . , dNM(k))T , (2.186)

where NL input and NM output signals are considered. Using the coefficient matrices

Bk =

b1,1,k b1,2,k · · · b1,NM,k

b2,1,k b2,2,k · · · b2,NM,k... ... . . . ...

bNL,1,k bNL,2,k · · · bNL,NM,k

, (2.187)

Ak =

a1,1,k a1,2,k · · · a1,NM,k

a2,1,k a2,2,k · · · a2,NM,k... ... . . . ...

aNM,1,k aNM,2,k · · · aNM,NM,k

, (2.188)

(2.177) is represented by

d(k) +L′H−1∑κ=1

Aκd(k − κ) =LH−1∑κ=0

bκx(k − κ)Bκx(k − κ), (2.189)

which leads to a MIMO z-domain transfer function given by

H(z) =

INM +L′H−1∑κ=1

Aκz−κ

−1LH−1∑

κ=0Bκz

−κ

. (2.190)

This representation describes a MIMO system in the z-domain as is well-known from filterbank theory [Vai93]. The equalization problem stated (2.179) is then described by

Page 82: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

70

H(z)G(z) = H0(z), (2.191)

where

G(z) =LG−1∑k=0

Ckz−k, (2.192)

H0(z) =LG−1∑k=0

H0,kz−k, (2.193)

Ck =

c1,1,k c1,2,k · · · c1,NL,k

c2,1,k c2,2,k · · · c2,NL,k... ... . . . ...

cNL,1,k cNL,2,k · · · cNL,NL,k

, (2.194)

H0,k =

h0,1,1,k h0,1,2,k · · · h0,1,NM,k

h0,2,1,k h0,2,2,k · · · h0,2,NM,k... ... . . . ...

h0,NL,1,k h0,NL,2,k · · · h0,NL,NM,k

. (2.195)

As it can be seen from comparing the structures of (2.191) and (2.179), G(z) can be foundin the same way as G(z) and an IIR filter results for a perfect equalization of an FIRfilter. The coefficients of a perfect equalizer can be obtained using the following recursion:

Ck = B†0

H0,k −min(k,LH−1)∑

κ=1BκCk−κ

+ Camb,k, (2.196)

where the Moore-Penrose pseudoinverse B†0 of B0 was used because Bk does not neces-sarily represent a square matrix. The matrices Camb,k account for the ambiguity in thesolutions of (2.196), whenever Bk is singular. Assuming B†0 being full rank, there arethree conditions that can result from the dimensions of B†

k:

1. If NM = NL, perfect equalizers are uniquely defined (Camb,k = 0NL×NL).

2. If NM > NL, perfect equalizers do not exist (if no restrictions are imposed on H0,k).Note that a least-squares-optimal equalizer can always be found.

3. If NM < NL, perfect equalizers are underdetermined such that Camb,k is not neces-sarily zero.

Page 83: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.5 Discrete-Time Signal Processing for Continuous-Time Quantities 71

If excluding the trivial case Bk = 0NM×NL ∀ k > 0 and if perfect equalizers are uniquelydefined,(2.196) yields an infinite series of matrices. Thus, FIR filters can only approximateperfect equalizers. This is identical to the single-channel case (2.183) and leads to theconclusion that more input and output channels do not render this series finite as alongas NM = NL. On the other hand, if NM < NL, matrices Camb,k can exist such that thisseries actually becomes finite. However, in that case, the matrices Camb,k can only bedetermined considering multiple time instants k simultaneously, which is out of the scopehere. The reader referred to [MK88], where the determination of perfect FIR equalizersfor multiple-input single-output (MISO) systems is discussed. If Bk describes a delay, anadditional delay can be introduced in H0,k in order to find equalizers in the same way asfor the single-channel case in (2.184).

A pure delay constitutes an important prototype impulse response in digital signalprocessing, where a delay by κ samples is described systems by

h(k) = δ(k − κ), (2.197)

where h(k) represents a discrete-time impulse response and δ(k) the Kronecker delta:

δ(k) ={

1 for k = 0,0 elsewhere. (2.198)

A MIMO system representing a pure delay couples all of its inputs to the outputs withthe same index, which implies that the number of inputs must be equal to the numberof outputs (NM = NL). According to the notation above, a MIMO system representing apure delay by κ samples is described by H(z) with

Bk ={

INL for k = κ,

0NL×NL otherwise, (2.199)

Ak = 0NM×NM . (2.200)

2.5.2 Discrete-time representation of the loudspeaker-enclosure-microphone system

The LEMS has to be represented in the discrete-time domain in order to be modeledfor digital signal processing. To this end, the loudspeaker signals, so far represented byX λ(ω), are transformed back to the time domain and sampled with a sampling frequencyfs such that their discrete-time representations xλ(k) are obtained. In the same way, thediscrete-time representations dµ(k) of the microphone signals are obtained from D µ(ω).

The scope of this thesis is limited to the representation of all systems by FIR filters.This allows to represent filtering operations by matrix-vector-multiplications, where thesignal samples considered at a given time instant k are captured by vectors. The vectorscapturing the loudspeaker signals are then defined as

x(k) = (x1(k),x2(k), . . . ,xNL(k))T , (2.201)xλ(k) = (xλ(k − LX + 1), xλ(k − LX + 2), . . . , xλ(k))T , (2.202)

Page 84: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

72

where LX describes the length of the individual signal segments xλ(k). Likewise, thevectors describing the individual microphone signals capture LD time samples for eachchannel and are defined as

d(k) = (d1(k),d2(k), . . . ,dNM(k))T , (2.203)dµ(k) = (dµ(k − LD + 1), dµ(k − LD + 2), . . . , dµ(k))T . (2.204)

Linear filtering, as described by (2.139), would then be described by

d(k) = Hx(k) + n(k), (2.205)

where the individual samples of the microphone signals are obtained according to

dµ(k) =NL∑λ=1

LH−1∑κ=0

xλ(k − κ)hµ,λ(κ) + nµ(k). (2.206)

Here, the impulse responses hµ,λ(k) of length LH are the discrete-time representations ofH µ,λ(ω), while nµ(k), which has no representation in the considerations above, describesnoise signals and/or the microphone signal contributions of a local acoustic scene in theLEMS (see Fig. 1.1). The vector n(k) exhibits the same structure as d(k), capturing thesamples nµ(k) instead of dµ(k). Note that the impulse responses of real LEMSs are ingeneral not of finite length. However, this fact will be neglected in the following, assumingthe late parts of the impulse response have sufficiently decayed such that they are notsignificant for the following analyses. In order to describe (2.206) by (2.205), the relationof LX and LD must fulfill LX ≥ LD + LH − 1. Equations (2.201) to (2.206) lead to

H =

H1,1 H1,2 . . . H1,NL

H2,1 H2,2 . . . H2,NL... ... . . . ...

HNM,1 HNM,2 . . . HNM,NL

, (2.207)

where each sub-matrix constitutes a convolution matrix according to

Hµ,λ =

hµ,λ(LH − 1) hµ,λ(LH − 2) . . . hµ,λ(0) 0 . . . 0

0 hµ,λ(LH − 1) . . . hµ,λ(1) hµ,λ(0) . . . 0... ... . . . ... ... . . . ...0 0 . . . 0 hµ,λ(LH − 1) . . . hµ,λ(0)

.(2.208)

An example for the resulting matrix structure is given in Fig. 2.20, where the case NL = 3and NM = 2 is illustrated.

Page 85: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.5 Discrete-Time Signal Processing for Continuous-Time Quantities 73

d(k) = H · x(k) + n(k)

= ·

+

d1(k)d2(k)

H1,1H1,2 H1,3

H1,1 H2,2H3,3

x1(k)

x2(k)

x3(k)

n1(k)

n2(k)

= column vector

= convolution matrix according to (2.208)

Figure 2.20: Structures of the matrices representing MIMO filtering for NL = 3 and NM =2.

The discrete-time representation of (2.147) is given by∼d(k) =

∼H∼x(k) + ∼n(k), (2.209)

where∼x(k) = (∼x1(k), ∼x2(k), . . . , ∼xNL(k))T , (2.210)

∼xl(k) =(

∼xl(k −

∼LX + 1), ∼

xl(k −∼LX + 2), . . . , ∼

xl(k))T, (2.211)

is the wave-domain representation of x(k). Since the wave-domain transforms will beimplemented by FIR filters of length LT (see Sec. 2.5.3), the lengths of these transformshas to be accounted for when choosing the the length of the wave-domain signal segments.Hence, ∼x(k) captures only

∼LX = LX − LT + 1 time samples in each component ∼xl(k). In

the same way,∼d(k) =

(∼d1(k),

∼d2(k), . . . ,

∼dNM(k)

)T, (2.212)

∼dm(k) =

( ∼dm(k −

∼LD + 1),

∼dm(k −

∼LD + 2), . . . ,

∼dm(k)

)T(2.213)

is the wave-domain representation of d(k), capturing∼LD = LD − LT + 1 time samples in

each component∼dm(k).

Like (2.205), (2.209) represents a discrete-time convolution according to

∼dm(k) =

NL∑l=1

∼LH−1∑κ=0

∼xl(k − κ)

∼hm,l(κ) + ∼

nm(k). (2.214)

The vector ∼n(k), capturing the samples ∼nm(k), is the wave-domain representation of n(k).

Page 86: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

74

2.5.3 Discrete-time representation of wave-domain transforms

In this section, discrete-time representations of the transforms derived above are described,by summarizing results from [SK12b] such that compatibility with the notation used inthis thesis is given.

The MST, as described by (2.162), can be realized in two steps, namely, by a discreteFourier transform (DFT) with respect to the microphone indices and by filtering usinga frequency response according to the inverse of Bm (RM%). The first step constitutes apurely spatial transform, which is frequency-independent and therefore invariant to tem-poral transforms. On the other hand, considering Bm (RM%) implies frequency-dependencybut is independent of the individual microphones and allows for a separate considerationof all wave-domain signal components (indexed by m), when designing according FIR orIIR filters.

Still, without a scatterer within the microphone array, Bm (RM%) exhibits zeros inthe z-domain transfer function that are located on the unit circle and become poles for1/Bm (RM%). This renders the design of corresponding filters challenging and will gen-erally require the use of approximations. Omitting the inverse of Bm (RM%) in the MSTis a tractable approach to avoid this problem, which implies that Bm (RM%) should bemodeled by the wave-domain LEMS

∼H when considering (2.209). As Bm (RM%) exhibits

only zeros instead of poles, its modeling does not constitute a major challenge and willbe implicitly facilitated by the adaptive filter.

For AEC, this simplifies implementing transforms and will not cause any disadvantages[SK11]. This approach is also possible for LRE, where the consistency of all signal rep-resentations is retained without further complications. Nevertheless, if the residual errorof the reproduced wave field apart from the microphone positions should be determinedusing the wave-domain error signals, a frequency-dependent weighting of 1/Bm (RM%) hasto be considered.

For the inverse MST, the individual wave-domain signals are filtered before the spatialtransform is applied. Regarding the former step, designing filter with a frequency re-sponse according to Bm (RM%) is not challenging because no poles have to be considered.Obviously, when the inverse of Bm (RM%) is disregarded in the MST, Bm (RM%) is notconsidered for the inverse MST either.

The realization of the LST allows for a two-step procedure similar to the MST. Equa-tions (2.169) and (2.172) can be straightforwardly implemented using fractional delayfilters [LVKL96] that are followed by a DFT-like frequency-independent transform givenby e−jlα

(L)λ . The same holds for (2.171) and (2.173), when exchanging the order of the two

steps.The individual filtering operations for the transforms can be readily associated with

impulse responses of digital filters as described in the following. When disregarding theinverse of Bm (RM%) in the MST, it is represented by an NM × NM MIMO FIR filter

Page 87: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.5 Discrete-Time Signal Processing for Continuous-Time Quantities 75

structure described by

h(TM)m,µ (k) = δ(k)e

−jmα(M)µ

NM, (2.215)

h(TM)m,µ (k) = δ(k)ejmα

(M)µ , (2.216)

that directly implements (2.161) and (2.164). If α(M)µ is uniformly distributed, (2.215)

and (2.216) represent the DFT and its inverse, respectively, as described later.The LST is implemented by discrete-time impulse responses that represent T L,l,λ(ω).

The impulse response describing the coupling of the discrete-time loudspeaker signal λ tothe wave field component l captured in ∼x(k) is given by:

h(TL)l,λ

(k) = 1%

(L)λ

hd(k − %(L)

λfs/c, LW

)e−jl(α

(L)λ−π/2), (2.217)

where LW is the time window length of the necessary fractional delay filter and themultiplication by 1/(4π) has been neglected. The impulse responses of the fractionaldelay FIR filters according to [LVKL96] are described by

hd (k − k′, LW) =sin(π(k − k′))π(k − k′)

w (k − k′, LW) (2.218)

where k′ is the non-integer delay and w (k − k′, LW) describes an appropriate windowfunction enforcing the finite time support of hd (k − k′, LW). For the discrete-time repre-sentation of the inverse LST T−1

L (ω) the impulse response,

h(TL)l,λ (k) =

%(L)λ

NLhd

(k −

(maxλ

{%

(L)λ

}− %(L)

λ

)fs/c, LW

)ejl(α

(L)λ−π/2) (2.219)

can be written in the same way.The discrete-time convolutions to implement the defined transforms are given by

∼xl(k) =

NL∑λ=1

LT−1∑κ=0

xλ(k − κ)h(TL)l,λ

(κ), (2.220)

∼dm(k) =

NM∑µ=1

LT−1∑κ=0

dµ(k − κ)h(TM)m,µ (κ), (2.221)

xλ(k) =NL∑l=1

LT−1∑κ=0

∼xl(k − κ)h

(TL)l,λ (κ), (2.222)

dµ(k) =NM∑m=1

LT−1∑κ=0

∼dm(k − κ)h

(TM)m,µ (κ), (2.223)

where LT ≥ dLW/2 + k′e is the length of the impulse responses used to describe thetransforms. For LT 6= 1, the back-transformed loudspeaker and microphones signals,

Page 88: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

76

denoted by xλ(k) and dµ(k), are not identical to the original loudspeaker and microphonesignals and require therefore a different notation. The difference of those signals is due tothe fact that h(TL)

l,λ(k) and h

(TM)m,µ (k) are FIRs approximations of the transforms described

in Sec. 2.4. Since those transforms have the same number of inputs and outputs, they canonly be approximately inverted by another transform realized by FIR filters, as explainedin Sec. 2.5.1. This is different from FIR filter banks where one input signal is transformedto multiple subband signals [Vai93]. In the same way, perfect inverse filtering of a MIMOFIR system by a MIMO FIR filter generally requires NL 6= NM [MK88].

Equations (2.220) to (2.223) are represented by the matrix multiplications

∼x(k) = TLx(k), (2.224)∼d(k) = TMd(k), (2.225)x(k) = TL

∼x(k), (2.226)d(k) = TM

∼d(k). (2.227)

Here, x(k) and d(k) capture xλ(k) and dµ(k), respectively, where their dimensions differalso from ∼x(k) and

∼d(k) to maintain consistency. The signal segment sizes in x(k) and

d(k) are given by LX−2LT + 2 and LD−2LT + 2, respectively. In (2.226) and (2.227) thebar was used to denote the inverse LST and the inverse MST as TL and TM, respectively.This choice accounts for the fact that TL and TM do not represent the inverse matricesof TL and TM, respectively, if LT > 1. In that case, TL and TM are non-square matricesand TL and TM are chosen such that x(k) and d(k) approximate the last signal samplescaptured in x(k) and d(k), respectively.

An important special case of the transform is obtained for a concentric arrangement oftwo UCAs, one as loudspeaker array and the other one as microphone array. DisregardingBm (RM%) in (2.162) and neglecting the time delay in (2.217), as it is equal for allloudspeakers, LT = 1 can be chosen. In that case, the transform matrices can be definedby

TM = FNM ⊗ ILD , (2.228)TL = FNL ⊗ ILX , (2.229)

where the Kronecker product of two matrices is denoted by ⊗, the unitary DFT matrixFL of length L is defined by

[FL]ζ,η = 1√Le−j (ζ−1) (η−1) 2π

L , (2.230)

and [FL]ζ,η indexes an entry in FL located in row ζ and column η.

Page 89: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

2.5 Discrete-Time Signal Processing for Continuous-Time Quantities 77

GR,λ,q(ω)GR

Qq(ω)

q(k)

X λ(ω)

x(k)

Figure 2.21: Signal model of the reproduction system

2.5.4 Properties of the reproduction signalsIn this section, the statistical properties of the reproduction signals are explained consid-ering the general model of a reproduction system described by (2.150), where all signalsare assumed to be wide-sense stationary random processes.

Equation (2.150) states that all NL loudspeaker signals described by X λ(ω) are ob-tained by filtering NS source signals. Consequently, when considering a matrix SXX(ω),composed of the auto- or cross-power spectral density functions S XX,λ,λ′(ω) of the loud-speaker signals indexed with λ and λ′, its rank is governed by NS. Using the definition

SXX(ω) =

S XX,1,1(ω) S XX,1,2(ω) . . . S XX,1,NL

(ω)S XX,2,1(ω) S XX,2,2(ω) . . . S XX,2,NL

(ω)... ... . . . ...

S XX,NL,1(ω) S XX,NL,2(ω) . . . S XX,NL,NL(ω)

, (2.231)

the rank of SXX(ω) is limited to min(NL, NS). This is because of

SXX(ω) = GR(ω)SQQ(ω)GHR(ω), (2.232)

where SQQ(ω) is the power spectral density matrix of the source signals, (·)H denotes theconjugate transpose and

GR(ω) =

GR,1,1(ω) GR,1,2(ω) . . . GR,1,NS

(ω)GR,2,1(ω) GR,2,2(ω) . . . GR,2,NS

(ω)... ... . . . ...

GR,NL,1(ω) GR,NL,2(ω) . . . GR,NL,NS(ω)

(2.233)

describes the MIMO reproduction system.Consequently, a necessary condition for SXX(ω) to be full-rank is NS ≥ NL, where it is

required that the source signals have a generalized coherence function [GC88] below onefor all frequencies [SB08b], noting that one is the maximum of that function. Additionally,the transfer functions GR,λ,q(ω) from the sources to the loudspeakers must constitute NS

linearly independent vectors in GR(ω). This important result is reformulated in thediscrete-time domain in the following.

To this end, a signal model for the reproduction system is shown in Fig. 2.21, wherethe vector q(k) comprises the NS source signals according to

q(k) = (q1(k), q2(k), . . . , qNS−1(k))T , (2.234)qq(k) = (qq(k − LQ + 1), qq(k − LQ + 2), . . . , qq(k))T , (2.235)

Page 90: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

78

where LQ is the signal segment length of the individual components qq(k), and qq(k)denotes a time-domain signal sample of source q. The matrix GR represents GR(ω) inthe discrete-time domain and is structured such that

x(k) = GRq(k), (2.236)

describes the convolution of the source signals qq(k) with the impulse responses gR,λ,q(k),as illustrated for H in Fig. 2.20. The impulse responses gR,λ,q(k) are used to obtain theloudspeaker signals xλ(k) from the source signals qq(k) according to

xλ(k) =NS∑q=1

LR−1∑κ=0

qq(k − κ)gR,λ,q(κ), (2.237)

where the impulse responses gR,λ,q(k) have a length of LR samples and represent GR,λ,q(ω)in the discrete time domain.

Considering the correlation matrix RXX of the loudspeaker signals, the following relationis obtained

RXX = E{x(k)xH(k)

}= GRRQQGR

H , (2.238)

where RQQ is the correlation matrix of the source signals according to

RQQ = E{q(k)qH(k)

}. (2.239)

From (2.237) follows LQ ≥ LX + LR − 1 such that RQQ has a dimension of NS(LX +LR − 1)×NS(LX + LR − 1), while RXX has a dimension of NLLX ×NLLX. A necessarycondition for RXX to be full-rank is

NLLX ≤ NS(LX + LR − 1), (2.240)

with the requirements for the source signals and for the impulse responses of the reproduc-tion system as mentioned above. Identical to the continuous-time case mentioned above,(2.240) is always fulfilled if NS ≥ NL, but it can also be fulfilled if NS < NL as long as LR

is large enough. This is due to the limited time span of the signals considered in RXX,which cannot describe correlations exceeding this time span. Hence, those contributionsappear as uncorrelated components in RXX.

Page 91: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

79

3 Wave-Domain System Identification

In this chapter, the task of supervised multiple-input multiple-output (MIMO) systemidentification is treated, which is a prerequisite for a wide range of applications, such asacoustic echo cancellation (AEC), listening room equalization (LRE), and active noisecontrol (ANC). While ANC is beyond the scope of this thesis, LRE will be treated later,in Chapter 4. As AEC is implicitly facilitated by the system identification approachesunder consideration, it will be already discussed in this chapter.

In supervised system identification, certain properties of a system are identified observ-ing the statistical properties of input and output signals, where the scope of this thesisis limited to second-order statistics. Given sufficient estimates of the second-order statis-tics of input and output signals, a linear time-invariant (LTI) system can be identified inone step by estimating the minimum mean square error (MMSE) solution by solving asystem of linear equations w. r. t. the least squared errors. However, in real-world imple-mentations, iterative system identification approaches are used more commonly becauseof a higher computational efficiency and the ability to track the potential time varianceof the system under consideration. Those approaches are often termed adaptive filters,especially when a linear estimator (filter) is computed.

The signal model and the task description of system identification are discussed inSec. 3.1, where the application of AEC is also explained. In Sec. 3.2, approximativemodels for wave-domain system identification are presented, which can reduce the com-putational effort for MIMO system identification. When typical loudspeaker signals excitea loudspeaker-enclosure-microphone system (LEMS), it is often impossible to identify thesystem uniquely. This is referred to as nonuniqueness problem and will be treated inSec. 3.3, where a heuristic method to mitigate the impact of this problem is also de-scribed. There are various iterative algorithms that can be used for system identification.Common examples are the least mean squares (LMS) algorithm, the affine projection al-gorithm (APA), and the recursive least squares (RLS) algorithm, which will be describedin Sec. 3.4. The formulations of the algorithms presented there allow for the identifica-tion of MIMO systems using the approximative models described in Sec. 3.2 and for animplementation of the heuristic method described in Sec. 3.3. Furthermore, the general-ized frequency-domain adaptive filtering (GFDAF) algorithm is derived in Sec. 3.4 as adiscrete Fourier transform (DFT)-domain approximation of the RLS algorithm.

While a brief overview of the challenges in MIMO system identification is given in[SK13a], this chapter provides a comprehensive and detailed analysis of the proposedsolutions.

Page 92: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

80

Adaptation algorithm H(n) H

esi(k)

x(k)

n(k)d(k)d(k)−esi(k)

+ +

H(n)

Figure 3.1: Signal model for conventional system identification in the point-to-point do-main. The gray part is only necessary for cases where the error signal will beused for other purposes as well, such as in AEC.

3.1 Signal Model and Task DefinitionIn this section, the task of system identification is defined and the signal model for wave-domain system identification is compared to the point-to-point model for conventionalapproaches.

The task of system identification is to obtain an estimate for H, observing the loud-speaker signals captured in x(k) and the microphone signals captured in d(k), where H,x(k), and d(k) have been introduced in Sec. 2.5.2. Typically, system identification is seenas a supervised adaptive filtering problem [BK08]. The signal model for conventionalsupervised system identification is shown in Fig. 3.1, where an estimate d(k) of the mi-crophone signal is obtained by filtering the loudspeaker signals by the identified LEMSH(n). To describe the time-dependency of H(n), the block time index n is used, as theadaptation algorithms will be formulated for block-wise data processing in Sec. 3.4. Therelation of the block time index n to the discrete time index k is given by

n =⌊k

LF

⌋, (3.1)

where LF denotes the relative time shift between two blocks along the time axis (“frameshift”).

As an estimate of d(k), the vector d(k) shares the structure of d(k) and the systemidentification error signal, which is defined by

esi(k) = d(k)− d(k), (3.2)esi(k) = d(k)− H(n)x(k) (3.3)

= (esi,1(k), esi,2(k), . . . , esi,NM(k))T , (3.4)esi,µ(k) = (esi,µ(k − LD + 1), esi,µ(k − LD + 2), . . . , esi,µ(k))T , (3.5)

where LD describes the length of the individual signal segments in d(k), d(k), and esi(k).Note that d(k), d(k), and esi(k) are computed for each microphone signal and the problem

Page 93: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.1 Signal Model and Task Definition 81

of system identification could be considered separately for each signal component esi,µ(k)without loss of generality. Still, in order to emphasize the generality of the followingconsiderations, all NM microphone channels will be considered simultaneously.

As H cannot be directly observed, the LEMS is identified adapting H(n) such thatthe error esi(k) is minimized with respect to a suitable norm, where the Euclidean normis chosen in most cases. While the algorithms described in Sec. 3.4 follow different op-timization criteria, it is sufficient for now to consider a minimization of the error in themean-squares sense according to

argminH(n)

{E{eHsi (k)esi(k)

}}, (3.6)

where argminx{f (x)} provides that value of x, for which f (x) is minimized. Using such

a quadratic cost function implies to consider second order statistics, where higher-orderstatistics would require a cost function involving the third or higher power of the errorsignal. In (3.6), the matrix H(n) constitutes an MMSE estimate of H, which can bedetermined by solving the well-known Wiener-Hopf equations. Furthermore, as H(n)only describes discrete-time causal finite impulse response (FIR) filters, the Wiener-Hopfequations can be reduced to the normal equation [Hay02]:

RXXHH(n) = RXD, (3.7)

where

RXD = E{x(k)dH(k)

}(3.8)

is the correlation matrix of the loudspeaker and microphone signals and RXX is definedaccording to (2.238). Furthermore, H(n) is restricted to a convolution matrix as describedin Sec. 2.5.2.

In practice, expectations are not available such that only estimates of the Wiener so-lution can be computed. Moreover, solving (3.7) would imply tremendous computationalcost due to the large dimensions of RXX and can be an ill-conditioned [GVL96] problem.To circumvent those problems, an adaptation algorithm aims at iteratively finding anH(n), such that after convergence (3.7) is fulfilled in good approximation, while RXX andRXD are implicitly or explicitly estimated. In practice, noise sources or the signal of thelocal acoustic scene captured in n(k) can hamper the estimation of RXD. Although thesignals of noise sources or the local acoustic scene contained in n(k) may be assumed tobe uncorrelated with the loudspeaker signals x(k), they will typically not be orthogonalwithin a limited time span. As any real adaptation algorithm can only observe the sig-nals for a finite time interval, the signals captured in n(k) can cause a divergence of theadaptive filters. While the microphone noise will not constitute a significant problem ismost cases, active sources in the local acoustic scene may have a strong impact. Hence,

Page 94: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

82

a double talk detection (DTD) system is typically used to detect this activity and willstall the adaptation in that case [BDH+99, HS04]. Many of the well-known single-channelsolutions for DTD, e. g., [YW91, CMB99, BMC00], can be straightforwardly generalizedto the multichannel case [SH15]. However, a discussion of this topic exceeds the scope ofthis thesis.

Even if RXX and RXD were known, a perfect identification as described by H(n) = Hmight not be achieved. As real room impulse responses are assumed to be of infinite length,they can be only be approximated by FIR filters. The effects of this undermodeling canbe described by considering a length LH of the impulse responses captured in H(n) whichis smaller than the length LH of the impulse responses captured in H. Moreover, anapproximative model, as described in Sec. 3.2, would impose further restrictions on H(n).In the following, the case LH < LH and approximative models will be referred to as limitedmodels, as they may describe H only to a limited extent. Another problem can be thelack of a unique solution for (3.7) if RXX is singular, as described in Sec. 3.3. Finally,the huge number of linear equations described by (3.7) can preclude its computation inpractice.

Eventually, there can be a significant difference between the true and the identifiedLEMS, described by H(n) and H, respectively. The normalized system misalignment canthen be used to assess this difference and is defined as

∆h(n) = 20 log10

∥∥∥H(n)−H

∥∥∥F

‖H‖F

dB, (3.9)

where ‖·‖F denotes the Frobenius norm. The latter is used, because it is proportional tothe energy of the impulse responses captured in the respective convolution matrices. Still,it can be seen from (2.208) that convolution matrices exhibit a shifted repetition of theimpulse responses in each row. Hence, the Frobenius norm measures the energy capturedin the impulse responses multiplied by the number of rows of the individual convolutionmatrices, which is given by LD. As all matrices in (3.9) have the same dimensions, thisscaling factor cancels out. Note that (3.9) can in general only be obtained in simulations,as the true LEMS H is unknown for real-world acoustic systems.

A signal model for wave-domain system identification is shown in Fig. 3.2, where allconsidered signal vectors are represented by their respective wave-domain representationsdenoted by a tilde. The wave-domain loudspeaker and microphone signals, denoted by∼x(k) and

∼d(k), have been introduced in Sec. 2.5.2. Still, there are three quantities which

have not been mentioned before: the estimate for the wave-domain microphone signal∼d(k), the identified LEMS in the wave domain

∼H(n), and the wave-domain error signal

Page 95: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.1 Signal Model and Task Definition 83

∼esi(k), which is defined by

∼esi(k) =∼d(k)−

∼d(k) (3.10)

=∼d(k)−

∼H(n)∼x(k) (3.11)

=(

∼eTsi,1(k), ∼eTsi,2(k), . . . , ∼eTsi,NM(k))T, (3.12)

∼esi,m(k) =(

∼esi,m(k −

∼LD + 1), ∼

esi,m(k −∼LD + 2), . . . , ∼

esi,m(k))T, (3.13)

enforcing the same structure for∼d(k). The wave-domain microphone signal segment

length has to be chosen according to∼LD = LD − LT + 1 (3.14)

for a compatible vector length. Similarly, esi(k) is represented by e′si(k) with an accord-ingly reduced signal segment length. The matrix

∼H(n) is constructed of single-channel

convolution matrices (see (2.207) and (2.208)), where the respective signal vector lengthshave to be considered. The normalized misalignment for wave-domain system identifica-tion can be determined using

∆h(n) = 20 log10

∥∥∥∥TM

∼H(n)TL −H

∥∥∥∥F

‖H‖F

dB. (3.15)

Obviously, when omitting the transforms, Fig. 3.2 is equivalent to Fig. 3.1. Conse-quently, the conventional system identification in the point-to-point domain can be inter-preted as a special case of wave-domain system identification with the transform matricesTL and TM being equal to identity matrices with the respective dimensions, which impliesLT = 1. Hence, all following analyses in the wave domain include the conventional systemidentification as a special case.

In Fig. 3.3, an alternative configuration for wave-domain adaptive filtering (WDAF)AEC or system identification is shown, where H′ replaces H to retain consistency of thematrix and vector dimensions. Unlike Fig. 3.2, this configuration uses a back transform ofthe wave-domain loudspeaker signal to their original domain, where the resulting signalis denoted by x(k) to account for the effects of the inverse loudspeaker signal transform(LST) (see Sec. 2.5.3). For wave-domain LRE, the loudspeaker signals are pre-equalizedin the wave domain such that the pre-equalized loudspeaker signals are already availablein the wave domain and have to be transformed back to their original domain anyway.Hence, a configuration as shown in Fig. 3.3 is preferable, as the system identificationcan operate on the pre-equalized wave-domain loudspeaker signals that need not to betransformed again.

Page 96: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

84

TL

Adaptation algorithm ∼H(n) H

TM TM

wave domain

TM

TL

TM

x(k)

∼x(k)

n(k)d(k)∼d(k)

∼d(k)−

∼esi(k)e′si(k)+ +

∼H(n)

Figure 3.2: Signal model for wave-domain system identification in loudspeaker-signal-preserving configuration. The gray parts are only necessary for cases wherethe error signal will be used for other purposes as well, such as in AEC.

TL TL

Adaptation algorithm ∼H(n) H′

TM TM

wave domain

TM

TL TL

TM

x(k) ∼x(k) x(k)

n(k)d(k)∼d(k)

∼d(k)−

∼esi(k)e′si(k)+ +

∼H(n)

Figure 3.3: Signal model for wave-domain system identification in loudspeaker-signal-transforming configuration. The gray parts are only necessary for cases wherethe error signal will be used for other purposes as well, such as in AEC.

Page 97: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.1 Signal Model and Task Definition 85

Still, for implementation of AEC, a configuration according to Fig. 3.2 is preferable be-cause of the reduced computational effort for computing transforms if no LRE is involved.Moreover, in this configuration, the LST cannot degrade the reproduced loudspeaker sig-nals and an explicit definition of the inverse transform is not necessary. The latter allowsfor an efficient design of the LST for approximative models, where some components ofthe wave-domain loudspeaker signals are not coupled in the LEMS model (see Sec. 3.2).Hence, it is not necessary to invest computational effort to obtain those components bythe LST.

Considering Fig. 3.3, it can be seen that∼H(n) identifies the cascade of TL, H′, and

TM, which implies that the inverse LST is also identified by∼H(n). This also holds for

the configuration shown in Fig. 3.2, where the inverse is implicitly identified. Hence, caremust be taken that TL is sufficiently well conditioned in order to avoid large singularvalues of

∼H(n). Still, there is a difference between the two cases: an identification of

the cascade of TL and H′ according to Fig. 3.3, will capture TL according to its explicitdefinition as a causal MIMO filter of length LT. On the other hand, the representation ofTL for the implicit identification in Fig. 3.2 is not necessarily of length LT, nor causal. Thedifference regarding the impulse response length results because an inverse of an FIR filtercan generally not be represented by an FIR filter (see Sec. 2.5.1). The lack of causalityis due to the additional delay in the explicit definition of the LST (see Sec. 2.4.1), whichresults in a negative delay for a perfect inverse. Note that the latter has no representationin Fig. 3.2. However, inserting an appropriate finite delay will render the identified cascadeof systems always causal.

3.1.1 Acoustic echo cancellation as an example of systemidentification

An example of a typical relevant system identification task is given by AEC, which isbriefly described in this section.

The signal models for system identification, shown in Figures 3.1 to 3.3, describe thesignal model for AEC while the gray parts in Figures 3.1 to 3.3 play an important role: Theerror signal esi(k) actually includes the desired clean signal while the source of interestin the acoustic scenario is active. When system identification is applied to AEC, theerror signal esi(k) (or, equivalently, e′si(k)) is in the main focus of interest instead of theidentified system itself. This is because the error signal esi(k) also represents the echo-canceled signal, which is sent to the far-end party in a telecommunication scenario or toan automatic speech recognition (ASR) system when implementing an acoustic human-machine-interface. If the system is perfectly identified, the error signal is identical to thesignal reflecting the local acoustic scene (esi(k) = n(k)) and the echoes of the loudspeakersignals are perfectly eliminated. As the cancellation of the loudspeaker echo in the signaldescribed by esi(k) is the most important objective for AEC, a normalized measure ofthis, the echo return loss enhancement (ERLE), is used to assess the AEC performance.

Page 98: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

86

It is given by

ERLE(k) = 20 log10

(‖d(k)‖2

‖esi(k)‖2

)dB ≈ 20 log10

∥∥∥TM

∼d(k)

∥∥∥2∥∥∥TM

∼esi(k)∥∥∥

2

dB, (3.16)

where n(k) and ∼n(k) are assumed to be zero. If TM is unitary, which implies LT = 1, theapproximation in (3.16) holds for an equation.

There are also other approaches to remove the acoustic echoes from the microphonesignals: Some approaches use spectral subtraction [Bol79], which is then referred to asacoustic echo suppression and often used as a postfiltering method for AEC [GMJV02].It is also possible to use acoustic echo suppression as the only measure to remove theloudspeaker echoes [FC05].

Other approaches aim at minimizing the sound pressure at the microphone positionsby a modification of the reproduced scene [HSB11]. Still, a discussion of these approacheswould exceed the scope of this thesis.

3.1.2 Matrix and vector notation for system identificationIn this section, the matrix and vector notation for the following sections of this chapteris introduced considering only wave-domain quantities. To this end, the wave-domaintransforms will be assumed to be unitary and LT = 1, TLTL = INLLX

1, TL = THL and

TMTM = INMLD ,TM = THM, if not stated otherwise, which results in

∼LX = LX,

∼LH = LH,

and∼LD = LD. This is necessary for a mathematically rigorous derivation and ensures

applicability of all results to system identification approaches without transforms.The notation used so far was chosen for a convenient and intuitive description of the

MIMO filtering of multichannel signals (vectors) by MIMO systems (matrices). In thiscontext, the impulse responses of all input-output paths of a MIMO system and the inputsignals were known, while the output signals had to be determined. For the descriptionof adaptation algorithms, a different paradigm is considered: the coefficients of a MIMOimpulse responses have to be determined by solving an optimization problem. Whileit is easy to construct a convolution matrix from known impulse responses, it is notstraightforwardly possible to restrict a matrix-valued solution for an optimization problemto be a convolution matrix. This problem is also known in blind source separation, wherea so-called Sylvester constraint is used to assure according matrix structures [BAK05].

To circumvent this problem, a different notation will be used, which is described in thefollowing. An equivalent to (3.11) is given by

∼esi(k) =∼d(k)−

∼X(k)

∼h(n), (3.17)

1INLLX denotes the NLLX ×NLLX identity matrix

Page 99: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.1 Signal Model and Task Definition 87

where the NLNMLH-component column vector

∼h(n) =

(hT1,1(n), hT1,2(n), . . . , hT1,NL

(n), hT2,1(n), hT2,2(n), . . . , hT2,NL(n),

. . . , hTNM,1(n), hTNM,2(n), . . . , hTNM,NL(n)

)T, (3.18)

hm,l(n) =(

∼hm,l(0, n),

∼hm,l(1, n), . . . ,

∼hm,l(LH − 1, n)

)T(3.19)

captures the coefficients of the MIMO FIR filter. The loudspeaker signals are then cap-tured in the NMLD ×NLLH matrix

∼X(k) = INM ⊗

( ∼X1(k),

∼X2(k), . . . ,

∼XNL(k)

), (3.20)

∼Xl(k) =

∼xl(k − LD + 1) ∼

xl(k − LD) · · · ∼xl(k − LD − LH + 2)

∼xl(k − LD + 2) ∼

xl(k − LD + 1) · · · ∼xl(k − LD − LH + 3)

... ... . . . ...∼xl(k) ∼

xl(k − 1) · · · ∼xl(k − LH + 1)

. (3.21)

As in the case of ∼x(k), the oldest considered samples in∼X(k) are from time instant

nLF − LD − LH + 2. When comparing (2.201), (2.202), (2.207), and (2.208) to (3.18)to (3.21), it can be seen that (3.17) is identical to (3.11) up to an exchange of the termrepresenting the loudspeaker signals and the filter coefficients in their order, such that

∼X(k)

∼h(n) =

∼H(n)∼x(k) (3.22)

holds, for any values of NL, NM, LX, LD, and LH. The relation (3.22) is illustrated inFig. 3.4. Furthermore, it can be seen that the temporal order of the elements in

∼X(k)

and∼h(n) is reversed compared to

∼H(n) and ∼x(k). This results in

∼X(k) being a circulant

matrix, which is a property that will be exploited in Sec. 3.4.Obviously, taking the Kronecker product with the identity matrix in (3.20) introduces

redundancy in the matrix∼X(k). This is necessary because MIMO filtering operations rep-

resent the same structure as matrix-vector multiplications (see (2.207)), which precludes astraightforward exchange of the operands (see also Sec. 4.5.1, later). The rearrangementin (3.22) can be validated using the equivalence [HS81]

ADB = C⇔(BT ⊗A

)vec (D) = vec (C) , (3.23)

where the vectorization operator vec (·) is used that creates a single column vector bystacking the column vectors of a matrix. Similar to (3.22), the matrix D was moved from

Page 100: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

88

∼d(k) =

∼H(n) · ∼x(k)

= ·

∼d(k) =

∼X(k) ·

∼h(n)

= ·

∼d1(k)∼d2(k)

∼H1,1(n)

∼H1,2(n) ∼

H1,3(n)

∼H2,1(n) ∼

H2,2(n)∼H2,3(n)

∼x1(k)

∼x2(k)

∼x3(k)

∼d1(k)∼d2(k)

∼X1(k)

∼X2(k)

∼X3(k)

∼X1(k)

∼X2(k)

∼X3(k)

∼h1,1(n)∼h1,2(n)∼h1,3(n)∼h2,1(n)∼h2,2(n)∼h2,3(n)

= column vector

= convolution matrix according to (2.208)

= convolution matrix according to (3.21)

Figure 3.4: Differently structured matrices describing the same MIMO filtering operationfor NL = 3, NM = 2

the middle to the right-hand side of the multiplication term. The same is described by(3.22), when considering the case LD = 1, reversing the temporal element order in

∼X(k)

and∼h(n), while choosing A = ∼xT (k), D =

∼HT (n), and B = INM .

For unitary transforms, (3.6) is identical to

argmin∼h(n)

{E{

∼eHsi (k)∼esi(k)}}

, (3.24)

which allows for considering (3.17) instead of (3.3) for system identification, such that(3.7) is represented by

∼RXX

∼h(n) = ∼rXD, (3.25)

Page 101: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.1 Signal Model and Task Definition 89

∼RXX = E

{ ∼XH(k)

∼X(k)

}

= E

∼X1(k)

∼X2(k)∼

X3(k)

∼X1(k)∼

X2(k)

∼X3(k)

∼X1(k)∼X2(k)∼X3(k)

∼X1(k)∼X2(k)∼X3(k)

= single-channel correlation matrix

= convolution matrix according to (3.21)

Figure 3.5: Structure of autocorrelation matrix used for system identification for NL =3, NM = 2.

with∼RXX = E

{ ∼XH(k)

∼X(k)

}, (3.26)

∼rXD = E{ ∼XH(k)

∼d(k)

}(3.27)

being the wave-domain representations of RXX and RXD, respectively. The dimensions of∼RXX are NLNMLH×NLNMLH, while the dimensions of ∼rXD are NLNMLH×1. An examplefor the structure of

∼RXX is shown in Fig. 3.5. Plugging (3.20) into (3.26) results in

∼RXX = E

{(INM ⊗

( ∼X1(k),

∼X2(k), . . . ,

∼XNL(k)

))H (INM ⊗

( ∼X1(k),

∼X2(k), . . . ,

∼XNL(k)

))}(3.28)

= INM ⊗ E{( ∼

X1(k),∼X2(k), . . . ,

∼XNL(k)

)H ( ∼X1(k),

∼X2(k), . . . ,

∼XNL(k)

)}(3.29)

= LD INM ⊗ E{

∼x(k)∼xH(k)}

= LD INM ⊗(TLRXXTH

L

), (3.30)

where

AC⊗BD = (A⊗B) (C⊗D) (3.31)

was used that is valid for all matrices with compatible dimensions. The step from(3.29) to(3.30) is possible because stationarity implies that the reversed temporal order of

∼Xl(k)

compared to ∼xl(k) results in a complex conjugation of the respective autocorrelation

matrix. This is in accordance with the different positions of the conjugate-transposeoperator when comparing (3.29) to (3.30). Moreover,

∼RXX can be described as the

sum of the outer product of all row vectors of∼X(k) with themselves followed by using

Page 102: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

90

the expectation operator. Since the individual rows differ only in a time shift whilestationarity also implies shift invariance of the autocorrelation function, the number ofrows of

∼X(k) results in a scaling by LD.

A representation of (3.15) for the normalized misalignment of the vector∼h(n) with

∼h

representing∼H cannot be straightforwardly given. Still, for unitary transforms,

∆h(n) = 20 log10

∥∥∥∥ ∼h(n)−

∼h∥∥∥∥

2∥∥∥∼h∥∥∥

2

dB, (3.32)

holds without any restrictions.

3.2 Approximative Wave-Domain System ModelIn this section, the approximative wave-domain model for system identification is pre-sented, which is a generalization of the model originally proposed for WDAF [BSK04,SBR04]. This generalization has been presented in [SK11] and is complemented in thisthesis by an analysis of the nonuniqueness problem in Sec. 3.3.2 for such models and arigorous derivation of according adaptation algorithms in Sec. 3.4.

As shown in Fig. 2.14, the couplings of some wave field components described by∼H are

much stronger than others. This suggests the use of an approximative model capturingonly the strongest couplings, given limited computational resources for system identifi-cation. The choice of the modeled couplings depends on the chosen basis functions usedfor the wave-domain transforms, and is described by the NM ×NL masking matrix MH.An entry in row m and column l of this matrix is equal to one, if the coupling betweenthe wave-domain loudspeaker signal component indexed by l and the wave-domain micro-phone signal component indexed by m,

∼Hm,l(n) or

∼hm,l(k, n), is modeled or zero otherwise.

An LEMS identified with an approximative model is required to fulfill

∼h(n) = Diag

(vec

(MT

H

)⊗ 1LH×1

) ∼h(n), (3.33)

which implies that the unmodeled coefficients in∼h(n) are zero. Thus, filter coefficients

of unmodeled mode couplings will not be updated by the adaptation algorithms, whichleads to reduced computational demands for MIMO system identification.

An example for the minimum normalized system misalignment minn{∆h(n)} achievable

by approximative models is shown in Fig. 3.6. There, it is assumed that the impulseresponses for the LEMS are modeled with their full lengths, although only certain cou-plings between loudspeaker and microphone signals or components of the respective wave-domain representations are modeled. For comparison, min

n{∆h(n)} for modeling only the

strongest loudspeaker-microphone-couplings is shown by the blue line, where the relatively

Page 103: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.2 Approximative Wave-Domain System Model 91

50 100 150 200 250 300 350 400 450−40

−30

−20

−10

0

modeled couplings

min n{∆

h(n

)}in

dB

point-to-point model (optimal)wave-domain model (optimal)wave-domain model (heuristic)

Figure 3.6: Minimum normalized system misalignment achievable by approximative mod-els for the LEMS as described in Sec. 2.4.2

slow decay suggests that an approximative model following this strategy does not leadto satisfying results. This is different for the wave-domain model of the LEMS, wherecertain couplings are significantly stronger than others. When modeling the strongestwave-domain couplings, the dashed red curve can be obtained for min

n{∆h(n)}, which

shows a steep descent for a relatively low number of modeled wave-domain couplings.Hence, such models are candidates for an approximate system identification. However,the weight of the couplings is typically not known before the system is identified, al-though modeled couplings have to be chosen before. Thus, a heuristic method has to beused to determine the modeled couplings: For circular harmonics, it was observed thatthe strongest couplings are those with the lowest difference in their mode order |m − l|,leading to the following definition of the entries of MH:

[MH]m,l ={

1 for |l − m| ≤ (NH − 1)/2,0 otherwise, (3.34)

where the relations of l and m to l and m are defined in (2.163) and (2.170), respectively.There are NH couplings to wave field components in ∼x(k) considered for each componentin

∼d(k). The minimum normalized system misalignment min

n{∆h(n)} when following this

strategy is shown by the black dotted curve in Fig. 3.6. For a low number of modeledcouplings this choice is nearly optimal.

The original proposal for WDAF was to model only the couplings of the wave-domainloudspeaker and microphone signals which correspond to the same basis function [BSK04,SBR04, SBRH07]. For circular harmonics, this is described by Model 1 shown in Fig. 3.7,where NH = 1 was chosen. The weight of the wave-domain mode couplings for the exampleLEMS mentioned above are depicted for comparison, where it can be clearly seen that thewave-domain transforms only approximately diagonalize the acoustic MIMO system. As

Page 104: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

92

−20 −15 −10 −5 0 5 10 15 20

−4−2

024

m

coupling weights

−20 −15 −10 −5 0 5 10 15 20

−4−2

024

m

Model 1 (NH = 1)

−20 −15 −10 −5 0 5 10 15 20

−4−2

024

m

Model 2 (NH = 3)

−20 −15 −10 −5 0 5 10 15 20

−4−2

024

l

m

Model 3

Figure 3.7: Different wave-domain models for the LEMS, represented my MH. The wavefield component indices l and m have been replaced by l and m, respectively,and the matrices were permuted accordingly, to allow for a physically mean-ingful interpretation. The minimal ∆h(n) for the Models 1, 2, and 3 are−6.9 dB, −9.9 dB, and −11.8 dB, respectively.

Page 105: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.3 The Nonuniqueness Problem 93

it can be seen from the experimental results presented later, this limits the applicabilityof Model 1 for AEC and LRE. This problem can be solved by using adaptive transforms[HBS10b] or by a generalization of the acoustic LEMS model as represented by Model 2with NH = 3. Generally, the modeled couplings may be freely chosen and Model 3 showsa different pattern for the modeled couplings, which has also been investigated.

3.3 The Nonuniqueness ProblemAs mentioned above, the task of identifying an LEMS may lack a unique solution. Thisproblem is often referred to as nonuniqueness problem and will be discussed in this section.In Sec. 3.3.1 conditions for the occurrence of nonuniqueness will be derived and the imme-diate consequences for system identification are discussed. The influence of LEMS modelrestrictions in relation to nonuniqueness are discussed in Sec. 3.3.2, while the problemsresulting from the nonuniqueness for AEC and LRE are discussed in Sec. 3.3.3. State-of-the-art remedies against the nonuniqueness problem are discussed in Sec. 3.3.4, whilea method to mitigate this problem using wave-domain LEMS properties is presented inSec. 3.3.5.

3.3.1 Origin and consequences for system identificationIn this section, the occurrence of the nonuniqueness problem and the limitations for theachievable system identification are discussed.

Assuming that∼RXX and ∼rXD are known, the normal equation (3.25) can be used to

identify LEMS∼h by

∼h(n), where it is assumed that the impulse responses captured in

∼h(n) and

∼h have the same length (LH = LH), within this subsection. If the square matrix

∼RXX is singular, there are more unknowns in

∼h(n) than linearly independent equations

captured by (3.25). Hence, the nonuniqueness problem occurs if and only if∼RXX is rank-

deficient. Then, infinitely many∼h(n) fulfill (3.25) and the nonuniqueness problem occurs.

Note that (3.25) can always be solved due to∼rXD =

∼RXX

∼h. (3.35)

According to (3.30),∼RXX can be obtained from RXX, while the Kronecker product has

the property

(A⊗B)† = A† ⊗B†. (3.36)

Hence, the condition for∼RXX to be invertible is the same as for RXX and given by (2.240).

Considering LX = LH leads to the condition

NLLH ≤ NS(LH + LR − 1) (3.37)

Page 106: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

94

for∼RXX to be invertible and therefore also for unique solution to exist. This result

generalizes the results in [HBC06] where the case LH = LR, NS = 1 was analyzed. Forreproduction systems like WFS, NL � NS and moderate LR are typical choices, such thatthe nonuniqueness problem is relevant in most practical situations.

When nonuniqueness occurs, all possible solutions are given by

∼h(n) =

∼R†XX

∼rXD +∼hamb, (3.38)

where the ambiguity is described by∼hamb as an arbitrary vector lying in the nullspace of

∼RXX:

∼RXX

∼hamb = 0NLNMLH×1. (3.39)

The solutions for∼h(n) given by (3.38) describe an NLLH −NS(LH +LR − 1)-dimensional

affine hyperplane rather than a vector space, because∼h(n) = 0LHNMNL is no general

solution for (3.38).When solving (3.25), only statistics up to second order are exploited for system iden-

tification, although higher orders could be considered as well. A more general view onthe problem can be obtained by requiring ∼esi(k) = 0LDNM×1 which results in ‖∼esi(k)‖= 0for an arbitrary norm. This defines a set of solutions for

∼h(n) which results in perfect

echo cancellation. Such a solution can only be achieved, when there is no source activityin the local acoustic scene nor additional noise in the microphone signals (n(k) = 0)and if the LEMS can be perfectly modeled (LH = LH). Perfect echo cancellation hasto be distinguished from a perfect system identification, which is defined by

∼h(n) =

∼h.

A perfect system identification implies perfect echo cancellation but not vice versa, asdiscussed in the following. If the echo is perfectly canceled, ∼esi(k) = 0LDNM×1 leads to thetwo equivalent requirements

∼X(k)

∼h(n) =

∼X(k)

∼h, (3.40)(

INM ⊗ (GRq(k))T)

(INMNL ⊗ALH)∼h(n) =

(INM ⊗ (GRq(k))T

)(INMNL ⊗ALH)

∼h,(3.41)

where LX = LH is required for compatible matrix dimensions and ALH is an anti-diagonalmatrix of size LH, i. e., a flipped identity matrix. Assuming q(k) to be an arbitrary vector,fulfilling(3.41) does only require

∼h(n) =

∼h, if the full-rank matrix GR has at least as many

columns as rows, i. e., (3.37) is fulfilled. Consequently, (3.37) must be fulfilled for uniqueidentification independently of the norm used in the optimization criterion. This showsthat the nonuniqueness problem originates from the joint properties of the rendering sys-tem GR and the LEMS

∼h. The ambiguity when identifying an LEMS by considering

Page 107: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.3 The Nonuniqueness Problem 95

the statistics of the observed signals is just the consequence of these properties. How-ever, real-world implementations of system identification typically use adaptive filters thatconverge to one single solution for

∼h(n). If nonuniqueness occurs, this solution can differ

depending on various conditions, including the chosen adaptation algorithm, i. e., also onthe considered order of statistics. Since the considerations above hold for wave-domainmodels as well as for conventional point-to-point LEMS models, it can be concluded thata transform-domain LEMS representation alone does not mitigate the nonuniqueness. Inthe following, the limits of the achievable system identification when the nonuniquenessproblem occurs (i. e. NS(LH +LR− 1) < NLLH) are discussed, independently of the actu-ally chosen system identification approach. Due to the multiplication with GR in (3.41),only NM times the rank of GR components of

∼h(n) can be uniquely determined. Thus,

the rendering signals determine the subspace of the LEMS that can be identified [HB13].The dimension of this subspace is antiproportional to the dimension of the nullspace of∼RXX or, equivalently, GR. Assuming that all uniquely determined components of

∼h(n)

contribute equally to ∆h(n), while the others are set to zero, a coarse approximation ofthe lower bound for the achievable normalized misalignment can be obtained from (3.41)and (3.9):

minn{∆h(n)} ≈ 10 log10

(1−

NS(LH + LR − 1)NLLH

)dB, (3.42)

which seems an appropriate estimate whenever the observed signals provide the only avail-able information about the LEMS. In Fig. 3.8, min

n{∆h(n)} is shown as a function of NS,

NL, LH, and LR, where LH = 100, LR = 35, NS = 2, NL = 10 has been chosen if theywere not varied. It can be clearly seen that the influence of NS and NL is much largerthan the influence of LH or LR. This is because the number of sources NS and the numberof loudspeakers NL have a linear influence on the numerator and denominator in (3.42),respectively. In contrast, the influence of LR on the numerator is only asymptoticallylinear and below linear for small values of LR. The value of LH has an asymptoticallylinear influence on both, numerator and denominator, and therefore a relatively smallinfluence on the value of the resulting fraction.

Note that the considerations above disregard the actual implementation of system iden-tification by an adaptive filter. Thus, the system misalignment of a real-world system canbe much larger than predicted by (3.42). Still, (3.42) allows to measure how difficult thenonuniqueness renders the system identification task.

3.3.2 Nonuniqueness for limited models

In this section, the analysis of the nonuniqueness problem is generalized to limited models.Limited models were defined above as models that cannot perfectly model the considered

Page 108: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

96

10 20 30 40 50 60

−10

−5

0

1 64values of LH, LG, NS, and NL

min{∆

h(n

)}in

dB

LH LG NS NL

Figure 3.8: Influence of the parameters LH, LR, NS, and NL on the lower bound for thenormalized misalignment ∆h(n) of (3.42). If not varied, the parameters arechosen to be LH = 100, LR = 35, NS = 2, and NL = 10. Whenever theminimum value of ∆h(n) is not shown, the system can be uniquely identified.

LEMS due to a limited length of the modeled impulse responses LH < LH or due tothe use of an approximative model, as described in Sec. 3.2. In practice, LH < LH willbe the typical case and approximative models are often necessary to facilitate real-timeimplementation of system identification or AEC.

For notational convenience,∼h(n) has the same dimensions as

∼h, while it describes

impulse responses of length LH. Hence, if LH < LH the unused excess entries in∼h(n)

are set to zero. In the same way, approximative models are mathematically described byrequiring certain values of

∼h(n) to be zero, as expressed by (3.33).

Considering the structure of∼h(n) described in (3.18), it is possible to define a matrix

Vsi that prunes the components set to zero by a left-hand multiplication (Vsi∼h(n)), such

that

VTsiVsi

∼h(n) =

∼h(n), (3.43)

which implies that certain components of∼h(n) are zero. The values of Vsi can be defined

by:

VTsiVsi = V2

si = Diag(

vec(MT

H

)⊗(

1LH×10(LH−LH)×1

)). (3.44)

While the NLNMLH×NLNMLH matrix V2si will be used for a more convenient notation in

the following, the factorization into VTsi and Vsi is a prerequisite for important findings in

the following. An example of the structural relation of Vsi and V2si is shown in Fig. 3.9,

Page 109: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.3 The Nonuniqueness Problem 97

VTsi · Vsi = V2

si

·

=

= LH × LH diagonal matrix

Figure 3.9: Exemplary structure of the matrices Vsi and V2si

where Vsi can be obtained by removing all zero-values rows from V2si. Likewise, VT

si isobtained by removing all zero-valued columns from V2

si. This allows for modifying (3.17)to determine the error signal for limited models according to

∼esi,app(k) = d(k)−∼X(k)V2

si∼h(n). (3.45)

Using (3.45), the mean square error is given by

E{

∼eHsi,app(k)∼esi,app(k)}

= E{

dH(k)d(k)− dH(k)∼X(k)V2

si∼h(n)

+∼hH(n)V2

si∼XH(k)

∼X(k)V2

si∼h(n)−

∼hH(n)V2

si∼XH(k)d(k)

}(3.46)

and its complex (or Wirtinger) gradient [Bra83, Fis02] with respect to∼hH(n) by

∂E{

∼eHsi,app(k)∼esi,app(k)}

∂∼hH(n)

= E{

V2si

∼XH(k)

∼X(k)V2

si∼h(n)−V2

si∼XH(k)d(k)

}. (3.47)

To obtain an∼h(n) minimizing the mean-square error, (3.47) can be set to zero, where

inserting (3.8) and (3.26) and applying the transposition on both sides leads to

V2si

∼RXXV2

si∼h(n) = V2

si∼rXD s. t. V2

si∼h(n) =

∼h(n). (3.48)

Obviously, (3.48) can be obtained from(3.25) by pruning the respective rows and columns.Multiplying (3.48) from the left-hand side by VT

si

(Vsi

∼RXXVT

si

)−1Vsi and adding zero

according to (3.43) as (V2si − INMNLLX)

∼h(n) on the left-hand side leads to

∼h(n) = VT

si

(Vsi

∼RXXVT

si

)−1Vsi

∼rXD. (3.49)

Page 110: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

98

As VTsi

∼RXXVsi represents

∼RXX with reduced dimensions, the conditions for the existence

of(Vsi

∼RXXVT

si

)−1can be determined similarly as it was done for

( ∼RXX

)−1. Following

the same arguments as in Sec. 3.3.1, the condition for a unique solution of (3.48) can begiven by

m(H)m LH ≤ NS(LH + LR − 1), m = 1, 2, . . . , NM, (3.50)

where m(H)m is the number of non-zeros entries in row m of MH and m

(H)m = NL holds for

general (i. e. non-approximate) models.This shows that using limited models can prevent nonuniqueness by reducing the di-

mension of the nullspace of Vsi∼RXXVT

si (compared to∼RXX): While reducing LH will only

have a limited effect, as it occurs on both sides of (3.50), approximative models withzero-valued entries in MH will have a stronger influence.

Still, nonuniqueness can also occur for limited models in which case all valid solutionsfor

∼h(n) are described by

∼h(n) = Vsi

(VT

si∼RXXVsi

)†VT

si∼rXD +

∼h′amb, (3.51)(

V2si

∼RXXV2

si + V2si − INMNLLX

) ∼h′amb = 0NMNLLX×1, (3.52)

where∼h′amb has the same role as

∼hamb (3.38) but has to be distinguished from

∼hamb as

it is restricted to a subspace of∼hamb. It must be considered that solutions obtained

by (3.51) are projections of solutions to (3.38) onto the non-zero components of∼h(n).

Depending on the quality of the chosen model, solutions to (3.51) can exhibit large valuesof ∆h(n). Obviously, a suitable approximative model must allow for a description ofthe most dominant coefficients in

∼h, in order not to cause a divergence of the system

misalignment during adaptation.

3.3.3 Consequences for applications relying on system identificationIn this section, the consequences of the nonuniqueness problem to applications relying onsystem identification is discussed.

For AEC, it is not immediately evident why achieving a perfect echo cancellationwithout achieving a perfect system identification can be problematic, since minimizing∼esi(k) = 0LDNM×1 is the primary goal of AEC. This changes when regarding the repro-duction system GR as being time-variant in practice. As an example, consider a wave fieldsynthesis (WFS) system synthesizing a plane wave with a suddenly changing incidenceangle, requiring two different matrices GR1,GR2, one for the first incidence angle andanother for the second. When the problem of finding

∼h is underdetermined, an adapta-

Page 111: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.3 The Nonuniqueness Problem 99

tion algorithm will first converge to one of the many solutions for GR1. Without furtherobjectives than minimizing ∼esi(k), this solution may be the nearest possible one to thestarting point of the optimization rather than one near to perfect system identification.In general, the solution found for GR1 will not be optimal for GR2 and an instantaneousbreakdown in ERLE at the time instant of change from GR1 to GR2 is the consequence[SM95, SMH95, HBK07].

This breakdown in ERLE can become quite significant in practice. There, noise, inter-ference, double-talk, an unsuitable choice of parameters, or an insufficient LEMS modelwill cause divergence and distract the algorithm from a perfect system identification. Asthe solutions according to (3.38) or (3.51) do not form a bounded set (within an atleast one-dimensional hyperplane) whenever the nonuniqueness problem occurs, a validsolution for one GR may be arbitrarily different from any of the solutions for anotherGR. This renders the breakdown in ERLE in fact uncontrollable and constitutes a majorproblem for the robustness of multichannel AEC.

There is also a relation of the achievable ERLE to the nonuniqueness for limited LEMSmodels discussed in Sec. 3.3.2: Whenever nonuniqueness occurs, multiple loudspeaker sig-nals are linearly dependent, such that minimizing the loudspeaker echo does not necessarylead to minimizing the system misalignment. This implies that even if a chosen LEMSmodel does not consider a certain wave-domain loudspeaker-signal-to-microphone-signalcoupling, it may cancel the echo resulting from this coupling by coupling other loud-speaker signals to the respective microphone signal. Consequently, the steady state echocancellation performance of approximative models may even increase when nonuniquenessoccurs at the cost of a degraded system identification. This can be explained as follows:Solutions to both, (3.38) and (3.51), are potentially equally powerful in minimizing theerror ∼eHsi (k)∼esi(k). At the same time, the contribution of

∼hamb may render some solutions

of (3.51) also solutions of (3.38). In that case, the contribution of∼hamb can be expected

to be most significant, whenever∼h(n) is fully determined by (3.51) but not by (3.38).

However, solutions to (3.38) with a significant contribution of∼hamb are prone to exhibit

a large system misalignment to the true LEMS. Hence, the largest system misalignmentcan be expected in cases where (3.50) holds for an equation.

For LRE, as described in Sec. 4.1, the consequences are of similar nature like for AEC:a partially identified LEMS can be used for equalization, while changing correlation prop-erties of the loudspeakers signals invalidate the obtained solution. However, the LEMS isidentified using the pre-equalized loudspeaker signals which exhibit changing (i. e. non-stationary) correlation properties due to the iterative pre-equalizer determination. Thiscauses an interaction between system identification and equalizer determination, whichhas an influence that is difficult to predict. This suggests a conservative choice of allparameters for the involved adaptation algorithms.

The discussion above assumed∼RXX and ∼rXD to be known and (3.25) to be precisely

computed. In practice,∼RXX and ∼rXD are unknown, while the huge number of linear equa-

tions described by(3.25) can preclude a calculation of an exact solution. Thus, adaptation

Page 112: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

100

algorithms as described later in Sec. 3.4, can only approximate a solution of (3.25) anda system may be inaccurately identified, even when there is a unique solution for systemidentification. A large condition number of

∼RXX will generally intensify this problem such

that this may actually have the same consequences in practice as if nonuniqueness occurs.For two channels, a value of the coherence function of the loudspeaker signals close toone will indicate such an ill-conditioning of the system identification task [KNB06]. For alarger number of channels, the generalized coherence function [GC88] allows for the sameinterpretation.

3.3.4 CountermeasuresIn this section, remedies against the nonuniqueness are discussed.

As described above, a linear dependency between loudspeaker signals is the cause forthe nonuniqueness problem, which is represented by a rank deficiency of the autocorre-lation matrix

∼RXX. An often-used remedy for the nonuniqueness is an alteration of the

loudspeaker signals such that the rank of∼RXX is increased.

A straightforward solution is the addition of mutually uncorrelated noise to the loud-speaker signals, where psychoacoustic effects can be exploited to hide those interferencefrom the listener [SMH95, GT98, GE98]. The effect of these noise signals is to increasethe rank of

∼RXX by an additional contribution to its diagonal. These approaches can be

straightforwardly extended towards an arbitrary number of loudspeaker channels. How-ever, once the level of the noise is high enough to be effective, the noise signals start alsoto be noticeable to the listener.

Another proposal was to apply different nonlinear preprocessing to the individual loud-speaker signals [BMS98, MHB01]. This causes additional signal components in the loud-speaker signals, where their relation to the original signals cannot be established by linearLEMS models, as typically used for MIMO adaptive filters. Thus, those components ap-pear to be independent for the adaptation algorithm and increase the rank of

∼RXX, like it

is the case for the noise-based approaches. The enhancement of this approach towards alarge number of channels has not yet been investigated, while it might be challenging tofind a large number of sufficiently different non-linear functions. As human listeners arevery sensitive to nonlinear distortion of music signals, these approaches have been mainlyproposed for the reproduction of speech signals.

Techniques which can potentially preserve the reproduction quality of music, are theapproaches based on time-varying all-pass filtering [Ali98, HBK07] or resampling [WJ10,WWJ12]. Those approaches cause time-varying spatial properties of the loudspeaker sig-nals, which are then no longer stationary. To explain the influence of these approacheson system identification, the resulting signals can be interpreted as multiple time-domainsegments of different quasistationary signals that exhibit different cross-correlation prop-erties for individual channels. The adaptive filter will then converge towards a solutionfor each of those signal segments. Resulting from the recursive determination of the filtercoefficients, the solution for the next signal segment will typically be close to the previous

Page 113: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.3 The Nonuniqueness Problem 101

one. As the perfect system identification is the only solution valid independently of theloudspeaker signal properties, the adaptive filter will subsequently approach the perfectsolution. The identified system will then approximately represent a Wiener solution forthe superposition of

∼RXX and ∼rXD for the respective stationary signal time sections. Still,

the use of these techniques has not been investigated for more than five loudspeakers sofar. It is expected that the improvement of system identification will degrade the moreloudspeaker channels are used. This is because the loudspeaker signals can only be al-tered within a limited range in order to retain a high perceived audio quality, while allloudspeaker signals must be altered differently.

Although these techniques showed small quality degradation [HBK07, WWJ12], thereis another obstacle for the application of these approaches for the reproduction techniquesinvolving a large number of loudspeaker channels. Often, WFS, Higher-Order Ambison-ics (HOA), holophony, or similar approaches are used, which determine the loudspeakersignals analytically and where the phase information is particularly important. Hence,a shift of the phase, as facilitated by all-pass filtering or resampling can distort the re-produced wave field in an uncontrolled manner. While an alteration of the reproducedwave field appears to be inevitable by the above techniques, it is possible to aim at acontrolled alteration. Such an approach was presented in [SHK13], where an acousticscene was reproduced with a slight time-varying rotation of the wave field facilitated bya wave-domain prefiltering of the loudspeaker signals. This approach is inherently formu-lated for a large number of loudspeakers. Although the improvement shown in [SHK13]may appear as being only moderate, it has to be considered that scenarios with largenumbers of loudspeaker channels are only barely investigated in the literature, such thatperformance evaluations for comparison are entirely missing.

In contrast to the techniques above, the approach presented in Sec. 3.3.5 below doesnot rely on the alteration of loudspeaker signals, which precludes any degradation of thereproduction quality. This is a property that only few state-of-the-art solutions exhibit[SM96, TE13a]. Furthermore, WDAF filtering aims at adaptive filtering for large-scaleMIMO systems which renders the extension towards larger numbers of channels straight-forward.

When considering LRE, as done later in Chapter 4, the loudspeaker signals are pre-filtered by equalizers before they are used for system identification. As the equalizers areadapted, this constitutes also a time-varying filtering, which can improve system identifi-cation. However, the mutual influence of system identification and equalizer optimizationcan also lead to a divergence. An investigation of this relationship can be a topic forfuture research.

3.3.5 Cost-guided wave-domain system identification

In this section, a method to improve system identification when nonuniqueness occurs ispresented. This approach has been published in [SK16b] including the according mod-

Page 114: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

102

ification of the GFDAF algorithm provided in Sec. 3.4.4 and the experimental resultsdescribed in Sec. 3.5.4.

As explained above, altering the loudspeaker signals is the only way to remove thesingularity of

∼RXX, while such an alteration can cause a quality degradation of the repro-

duction. Still, when leaving the loudspeaker signals untouched, it is possible to exploitadditional knowledge to narrow the set of plausible estimates for

∼H, such that an estimate

near the true solution∼H(n) =

∼H can be heuristically determined. To this end, system

identification is not necessarily only based on solving (3.25), but can also incorporateother information, as described in the following.

When modeling the LEMS in the wave domain, certain wave-domain couplings arestronger than others, as shown in Fig. 2.14. As the magnitudes of the weights of

∼H m,l(ω)

(here represented by∼H) are predictable to a certain extent, they allow to assess the

plausibility of a particular estimate for∼H m,l(ω), which is represented by

∼H(n). When

an estimate exhibits wave-domain coupling weights as they would be expected for thetrue solution, it will typically be closer to the true solution than an estimate not showingthis property. Moreover, it is possible to modify the least squares cost function for anadaptation algorithm, such that the identified system is forced to reflect this property.Such a modification penalizes the energy contained in the impulse responses

∼hm,l(k, n)

for those indices where the true impulse response∼hm,l(k) is expected to have low energy.

This will be referred to as cost-guided wave-domain system identification in the following.As the dominant couplings for the system description are known before identification,this approach differs from other sparsity-exploiting approaches, which typically implydominance of arbitrary model coefficients instead of the dominance of specific coefficients[HBS12]. In the case of circular harmonics, the cost function would be monotonicallyincreasing with the difference of the mode orders |m − l|. An illustration of the modecoupling weights and corresponding cost is shown in Fig. 3.10, where Fig. 3.10(a) showsthe expected weights of the mode couplings of the true LEMS

∼H, while the penalty

for the energy of the couplings in∼H(n) is shown by the weights of the matrix

∼C′(n)

in Fig. 3.10(b). The penalty leads to mode coupling weights of the identified LEMS asshown in Fig. 3.10(c). This regularizes the problem of system identification in a physicallymotivated manner, but is in general independent of a possibly used regularization of theunderlying adaptation algorithm. For the derivation of an adaptation algorithm based onsecond order statistics, the original cost function can be enhanced by a term given by

∼hH(n)

∼C(n)

∼h(n), (3.53)

where∼C(n) is proportional to

Diag(

vec(( ∼

C′(n))T)⊗ 1LH×1

)(3.54)

and imposes a penalty on the coefficients captured in∼h(n) as illustrated in Fig. 3.10(b).

Page 115: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.4 Adaptive Filtering Algorithms 103

l

m

(a)

l

m

(b)

l

m

(c)

Figure 3.10: Illustration of mode coupling weights and additionally introduced cost. (a)weights of couplings of the wave field components for the true LEMS

∼H,

(b) additional cost introduced∼C′(n), (c) resulting weights of the identified

LEMS∼H(n)

3.4 Adaptive Filtering AlgorithmsIn this section, adaptation algorithms for system identification are discussed. All al-gorithms are derived for block-wise MIMO adaptive filtering, considering approximativemodels as described in Sec. 3.2 and the cost-guided wave-domain system identification ac-cording to Sec. 3.3.5. Parts of Sections 3.4.1, 3.4.3, and 3.4.4 have already been published[SK16a].

As described above, adaptation algorithms aim typically at minimizing the error signal∼esi(k) with respect to a suitable norm. The scope of this thesis is limited to algorithms forFIR filters considering the Euclidean norm ‖∼esi(k)‖2, which ensures MMSE estimates forGaussian processes. Other approaches use higher-order statistics to increase the robust-ness under adverse conditions [BBGK06]. In this thesis, the robustness of the adaptationalgorithms is increased exploiting the wave-domain properties of an LEMS, as describedin Sec. 3.3.5. Although adaptation algorithms for infinite impulse response (IIR) filtersare known for many years, FIR adaptive filters are predominantly used, because they areinherently stable and can be straightforwardly implemented [Hay02].

Besides the properties mentioned above, adaptation algorithms can be separated intothe class of stochastic approaches and the class of deterministic approaches. While thefirst aims at minimizing the error in the sense of a statistical mean, the second aims atan absolute minimization of the error for the actually observed data. Two importantexamples of stochastic approaches are the LMS algorithm and the APA, which are oftenused in practice and will be described in Sections 3.4.1 and 3.4.2, respectively. A rep-resentative of the deterministic approaches is the RLS algorithm, which is discussed inSec. 3.4.3. Although the latter algorithm is rarely used in practice due to low robust-ness and high computational effort, it is important for theoretical considerations becauseit achieves optimal convergence, even for strongly correlated signals. Furthermore, thisalgorithm can be used to derive the GFDAF algorithm, as described in Sec. 3.4.4. TheGFDAF algorithm shows very desirable properties for the implementation of real-worldsystem identification and AEC, such as a high robustness, fast convergence and a moder-

Page 116: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

104

ate computational effort. In Sec. 3.4.5 a guideline for the choice of the various algorithmicparameters is given.

All discussed algorithms consider the a priori error signal, which can be obtained bycomputing the error signal with the previous filter coefficients but with the currentlyobserved data blocks of the loudspeaker and microphone signals. This a priori errorsignal is given by:

∼e′si(k) =∼d(k)−

∼X(k)

∼h(n− 1). (3.55)

For approximative models, the a priori error is defined by

∼e′si(k) =∼d(k)−

∼X(k)V2

si∼h(n− 1). (3.56)

Interestingly, when applying an adaptation algorithm derived for approximative models,V2

sih(n − 1) =∼h(n − 1) holds and (3.55) and (3.56) are identical. Thus, both error

signals will not be distinguished in the following. However, ∼e′si(k) must be distinguishedfrom the a posteriori error ∼esi(k), which will be used for the definition of the adaptationalgorithm cost functions but not in the derived update rules. The notation using theprime was chosen to reflect the close relation of ∼e′si(k) and ∼esi(k). However, the primedoes not necessarily indicate “a priori” since this way of notation is also used for otherclosely related quantities in this thesis.

3.4.1 Derivation of the block least mean squares algorithm

In this section, the well-known least mean squared algorithm is derived for the MIMOadaptive filters with block-wise processing. This results in the block-LMS algorithm,which contains the common LMS algorithm as a special case [Hay02]. After this, theapproximative models described in Sec. 3.2 and the cost-guided system identificationdescribed in Sec. 3.3.5 will be considered.

The MMSE estimator for system identification can be implemented by solving (3.24),i. e., minimizing

JMMSE = E{

∼eHsi (k)∼esi(k)}

= E{‖∼esi(k)‖2

2

}. (3.57)

A straightforward approach for the minimization of (3.57) would be the gradient descentmethod [Hay02], where the gradient of (3.57) is determined using

∼h(n − 1) instead of

Page 117: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.4 Adaptive Filtering Algorithms 105

∼h(n). This leads to

∼h(n) =

∼h(n− 1) + µsi

(∼rXD −

∼RXX

∼h(n− 1)

)︸ ︷︷ ︸−∂E{JMMSE}∂∼hH (n)

∣∣∣∣∼h(n)=

∼h(n−1)

, (3.58)

where µsi is a parameter to control the step size that could also be set adaptively [MPS00].Solving (3.58) would, however, require estimating ∼rXD and

∼RXX, followed by a compu-

tationally expensive multiplication with the latter. Thus, the LMS algorithm follows aneven simpler approach by using the instantaneous estimates of (3.26) and (3.27) given by

∼RXX ≈

∼XH(k)

∼X(k), (3.59)

∼rXD ≈∼XH(k)

∼d(k). (3.60)

This leads to a representation of (3.58) by

∼h(n) =

∼h(n− 1) + µsi

∼XH(nLF)

(∼d(nLF)−

∼X(nLF)

∼h(n− 1)

)︸ ︷︷ ︸

−∂E{JMMSE}∂∼hH (n)

∣∣∣∣∼h(n)=

∼h(n−1)

, (3.61)

where the a priori error (3.55) can be used to simplify (3.61) to

∼h(n) =

∼h(n− 1) + µsi

∼XH(nLF)∼e′si(nLF). (3.62)

When choosing LD = LF = 1, (3.62) describes the LMS algorithm in its most commonform.

Approximative LEMS models

The error signal for approximative LEMS models has been defined in (3.45), while thegradient of the corresponding mean square error is given by(3.47). This leads to a gradientdescent algorithm according to

∼h(n) =

∼h(n− 1) + µsiV2

si

(∼rXD −

∼RXXV2

si∼h(n− 1)

). (3.63)

Plugging (3.59) and (3.60) into (3.63) results in

∼h(n) =

∼h(n− 1) + µsiV2

si∼XH(nLF)

(∼d(nLF)−

∼X(nLF)V2

si∼h(n− 1)

). (3.64)

Using the a priori error for approximative models given by (3.56) finally allows to formu-late the LMS algorithm for approximative models:

Page 118: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

106

∼h(n) =

∼h(n− 1) + µsiV2

si∼XH(nLF)∼e′si(nLF). (3.65)

Cost-guided system identification

Analogously to the derivation above, LMS algorithm for cost-guided system identificationis derived starting from the MMSE estimator. The cost function of the cost-guided MMSEestimator for system identification can be obtained by adding (3.53) to the original costfunction such that it captures the matrix

∼C(n). Additionally, a desired impulse response

∼hC(n) is considered, which can be used as a guidance for the algorithm. The actual choiceof

∼C(n) and

∼hC(n) will be discussed later in Sec. 3.4.5, while the resulting cost function

is given by

J(cg)MMSE = JMMSE +

(∼h(n)−

∼hC(n)

)H ∼C(n)

(∼h(n)−

∼hC(n)

). (3.66)

When determining the gradient of (3.66) considering∼h(n− 1) instead of

∼h(n) and using

the approximations (3.59) and (3.60), the cost-guided LMS algorithm results:

∼h(n) =

∼h(n− 1) + µsi

∼XH(nLF)∼e′si(nLF)−

∼C(n)

(∼h(n− 1)−

∼hC(n)

). (3.67)

3.4.2 Derivation of the normalized least mean square and the affineprojection algorithms

In this section, the normalized least mean squares (NLMS) algorithm and the APA arederived. Like in the previous section, a block-wise processing of the signals is considered,such that a derivation for the NLMS results in the APA, which is also referred to asblock-NLMS algorithm [MD95].

For the derivation of both algorithms, the following optimization problem is solved:Minimize the Euclidean norm of the filter update

∼h(n) −

∼h(n − 1) with respect to the

constraint

∼d(nLF) =

∼X(nLF)

∼h(n), (3.68)

where LD must be less than NLLH. This leads to the cost function [Hay02]

Page 119: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.4 Adaptive Filtering Algorithms 107

JAPA(n) =(

∼h(n)−

∼h(n− 1)

)H ( ∼h(n)−

∼h(n− 1)

)+ Re

((∼d(nLF)−

∼X(nLF)

∼h(n)

)HλL

), (3.69)

where Re (·) is used to obtain the real part of an expression and λL is a Lagrange vector.To minimize (3.69), the complex (or Wirtinger) gradient [Bra83, Fis02]

∂JAPA(n)

∂∼hH(n)

=∼h(n)−

∼h(n− 1)−

∼XH(nLF)λL (3.70)

is set to zero, leading to∼h(n) =

∼h(n− 1) +

∼XH(nLF)λL. (3.71)

A left-hand side multiplication of both sides of (3.71) with∼X(k) and inserting (3.68)

allows for determining

λL =( ∼X(nLF)

∼XH(nLF)

)−1(

∼d(nLF)−

∼X(nLF)

∼h(n− 1)

). (3.72)

Finally, inserting (3.55) and (3.72) into (3.71) leads to the definition of the APA accordingto

∼h(n) =

∼h(n− 1) + µsiX†(nLF)∼e′si(nLF), (3.73)

X†(k) =∼XH(k)

( ∼X(k)

∼XH(k) + γsiXR(k)

)−1, (3.74)

where X†(k) is the Moore-Penrose pseudoinverse of∼X(k). Like for the LMS algorithm,

the parameter µsi was introduced to control the step size of the algorithm. Additionally,the diagonal matrix XR(k), weighted by γsi, was introduced for numerical regularization,which is discussed later in Sec. 3.4.5. An efficient implementation of this algorithm isdescribed in [GT95]. For LD = 1, (3.73) describes the NLMS algorithm, which can beexplicitly given using ∼x(k):

∼h(n) =

∼h(n− 1) + µsi (INLNM ⊗ALH)

(INM ⊗∼x(k))∗

∼xH(k)∼x(k)∼e′si(nLF), (3.75)

assuming LH = LX and disregarding regularization.

Approximative LEMS models

In order to derive the APA for approximative models, the constraint (3.68) has to bemodified to

∼d(nLF) =

∼X(nLF)V2

si∼h(n), (3.76)

Page 120: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

108

where LD < LHNL is no longer sufficient and LD < LHm(H)m , m = 1, 2, . . . , NM with m(H)

m

being the number of non-zero entries in MH, must be fulfilled. This leads to the costfunction

J(app)APA (n) =

(∼h(n)−

∼h(n− 1)

)H ( ∼h(n)−

∼h(n− 1)

)+ Re

((∼d(nLF)−

∼X(nLF)V2

si∼h(n)

)HλL

). (3.77)

Following the same derivation as above, the representations of (3.71) and (3.72) are givenby

∼h(n) =

∼h(n− 1) + V2

si∼XH(nLF)λL, (3.78)

λL =( ∼X(nLF)V2

si∼XH(nLF)

)−1(

∼d(nLF)−

∼X(nLF)V2

si∼h(n− 1)

), (3.79)

respectively, where (3.76) replaces (3.68). This leads to the algorithm formulation accord-ing to

∼h(n) =

∼h(n− 1) + V2

si∼XH(nLF)

( ∼X(nLF)V2

si∼XH(nLF) + γsiXR(nLF)

)−1 ∼e′si(k).(3.80)

Unlike the LMS algorithm, the filter coefficient updates are not only windowed accordingto the approximative model, but the latter is also considered in the resulting pseudoinverseso that the update is already computed within the subspace resulting from the approxi-mation. Again, (3.80) describes the NLMS algorithm when LD = 1 is chosen. However,due to V2

si it is not possible to represent the term( ∼X(nLF)V2

si∼XH(nLF) + γsiXR(nLF)

)−1

by a scalar. This precludes to obtain a representation of (3.80) that is similar to (3.75).

Cost-guided wave-domain system identification

Like the derivation of the LMS algorithm, the cost function for cost-guided the APAcaptures a matrix

∼C(n) to determine the cost of the deviation from the desired impulse

response∼hC(n). This cost function is given by

J(cg)APA(n) =

(∼h(n)−

∼h(n− 1)

)H ( ∼h(n)−

∼h(n− 1)

)+(

∼h(n)−

∼hC(n)

)H ∼C(n)

(∼h(n)−

∼hC(n)

)+ Re

((∼d(nLF)−

∼X(nLF)

∼h(n)

)HλL

), (3.81)

Page 121: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.4 Adaptive Filtering Algorithms 109

where the constraint to be fulfilled is given by (3.68). The gradient of (3.81) with respectto

∼hH(n) is given by

∂JAPA(n)

∂∼hH(n)

=∼h(n)−

∼h(n− 1) +

∼C(n)

(∼h(n)−

∼hC(n)

)−

∼XH(nLF)λL. (3.82)

Setting this gradient to zero leads to∼h(n) =

∼h(n− 1)−

∼C(n)

(∼h(n)−

∼hC(n)

)+

∼XH(nLF)λL. (3.83)

Again, multiplying (3.83) by∼X(k) from the left-hand side and inserting (3.55) and (3.68)

leads to

λL =( ∼X(nLF)

∼XH(nLF)

)−1(

∼e′si(nLF) +∼X(nLF)

∼C(n)

(∼h(n)−

∼hC(n)

)). (3.84)

Finally, after inserting (3.84) in (3.83) the cost-guided APA can be formulated by∼h(n) =

(INLNMLH +

∼C(n)−X†(nLF)

∼X(nLF)

∼C(n)

)−1

·(

∼h(n− 1) + X†(nLF)∼e′si(nLF) +

(INLNMLH −X†(nLF)

∼X(nLF)

) ∼C(n)

∼hC(n)

)(3.85)

The approximation( ∼C(n)−X†(nLF)

∼X(nLF)

∼C(n)

)h(n) ≈

( ∼C(n)−X†(nLF)

∼X(nLF)

∼C(n)

)h(n− 1)

(3.86)

can be used to derive a more convenient form of the algorithm:

∼h(n) =

∼h(n− 1) + X†(nLF)∼e′si(k)

+(INLNMLH −X†(nLF)

∼X(nLF)

) ∼C(n)

(∼hC(n)−

∼h(n− 1)

)(3.87)

With X†(k) being the pseudoinverse of∼X(k), the second part of (3.87) can be nicely

interpreted as projecting∼C(n)

(∼hC(n)−

∼h(n− 1)

)onto the kernel of

∼X(k). In the case

LD = 1, where (3.87) describes the NLMS algorithm, the first term of the right-hand sidecan be represented by (3.75), while the second term cannot be effectively simplified.

3.4.3 Derivation of the multichannel recursive least squaresalgorithm

In this section, the RLS algorithm is derived for a block-wise processing of the multi-channel input data, where choosing LD = LF = 1 leads to the most commonly founddescription of this algorithm.

Page 122: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

110

For the tasks investigated in this thesis, the RLS algorithm is rarely used in practice.This is due to the inversion of relatively large matrices, which implies a large computa-tional effort and makes this algorithm prone to instabilities. Moreover, when consideringa large number of loudspeaker channels, the involved matrix has huge dimensions. Anexample can be given for the evaluation scenario for AEC in Sec. 3.5.1, where 48 loud-speaker channels are considered and the filter length is 4096 samples. This would resultin a 196608× 196608 matrix to be inverted for every iteration of the algorithm, which ispractically not feasible, not even for offline simulations.2

However, from a theoretical point of view, this algorithm is very interesting as it showsoptimum convergence under certain conditions (time-invariant

∼H, wide-sense stationary

processes ∼x(k)) and its properties are well-investigated. Moreover, this algorithm is alsoused as a basis for the derivation of the GFDAF algorithm in Sec. 3.4.4. As the lat-ter algorithm approximates the RLS algorithm, it can be expected to exhibit attractiveconvergence properties.

Using the definition of the error signal according to (3.17), the following time-domaincost function defines the multichannel RLS cost function with an exponential window:

JRLS(n) =n∑ν=0

λsin−ν∼eHsi (νLF)∼esi(νLF) (3.88)

=n∑ν=0

λsin−ν

(∼dH(νLF)−

∼hH(n)

∼XH(νLF)

)(∼d(νLF)−

∼X(νLF)

∼h(n)

),

(3.89)

where λsi is an exponential weighting factor. Up to a scaling factor, this cost function isidentical to the cost function used in [BBK03] and constitutes the cost function of theoriginal RLS algorithm with exponential windowing, when LF = LD = 1 and LX = LH

are chosen. The desired filter coefficients∼h(n) minimize the exponentially weighted error

for all previous block indices ν. The exponential window RLS algorithm is attractivebecause it is more robust to time-varying statistical properties of the loudspeaker signalscompared to the sliding window RLS algorithm [Hay02].

Setting the complex (or Wirtinger) gradient [Bra83, Fis02] of (3.89) to zero can be usedto determine

∼h(n). This gradient is given by

∂JRLS(n)

∂∼hH(n)

=n∑ν=0

λsin−ν ∼

XH(νLF)∼X(νLF)

∼h(n)−

∼XH(νLF)

∼d(νLF) (3.90)

= RXX(n)∼h(n)− rXD(n) (3.91)

with2This matrix would require 288 gigabyte of memory when stored in double precision.

Page 123: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.4 Adaptive Filtering Algorithms 111

RXX(n) =n∑ν=0

λsin−ν ∼

XH(νLF)∼X(νLF) (3.92)

= λsiRXX(n− 1) +∼XH(nLF)

∼X(nLF), (3.93)

rXD(n) =n∑ν=0

λsin−ν ∼

XH(νLF)∼d(νLF) (3.94)

= λsirXD(n− 1) +∼XH(nLF)

∼d(nLF), (3.95)

where ∂

∂∼hH(n)

JRLS(n) = 0 leads to

RXX(n)∼h(n) = rXD(n). (3.96)

Equation (3.96) would already allow for determining the optimal filter coefficients∼h(n):

Since RXX(n) can be seen as a scaled estimate of the autocorrelation matrix of the loud-speaker signals and rXD(n) represents a scaled estimate of the cross-correlation vectorbetween loudspeaker and microphone signals, (3.96) represents the discrete-time Wienersolution as given by (3.25). Note that JRLS(n) can be weighted by (1 − λsi) to obtainRXX(n) and rXD(n) as unbiased estimates of

∼RXX and ∼rXD. This is omitted here, as con-

sidering (1− λsi) would result in identical adaptation steps, but unnecessarily complicatethe following formulae. Due to the similarity between the definitions (3.26) and (3.92),RXX(n) shares the same structure of

∼RXX, which is illustrated in Fig. 3.5.

In the following, a recursive algorithm for the determination of∼h(n) is derived, assuming

an optimal∼h(n− 1) has been obtained such that

λsiRXX(n− 1)∼h(n− 1) = λsirXD(n− 1). (3.97)

Equations (3.96) and (3.97) can be added to give

RXX(n)∼h(n) = rXD(n) + λsiRXX(n− 1)

∼h(n− 1)− λsirXD(n− 1), (3.98)

RXX(n)∼h(n) = RXX(n)

∼h(n− 1) + rXD(n)− λsirXD(n− 1)

+(λsiRXX(n− 1)− RXX(n)

) ∼h(n− 1), (3.99)

where(3.99) is obtained from(3.98) by adding(RXX(n)− RXX(n)

) ∼h(n−1). To formulate

a recursive algorithm, the a priori error(3.55) can be considered, where multiplying∼XH(k)

leads to∼XH(nLF)∼e′si(nLF) =

∼XH(nLF)

∼d(nLF)−

∼XH(nLF)

∼X(nLF)

∼h(n− 1) (3.100)

= rXD(n)− λsirXD(n− 1)−(RXX(n)− λsiRXX(n− 1)

) ∼h(n− 1)

(3.101)

Page 124: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

112

and where (3.101) has been obtained from (3.100) using (3.93) and (3.95). Inserting(3.101) in (3.99) leads to

RXX(n)∼h(n) = RXX(n)

∼h(n− 1) +

∼XH(nLF)∼e′si(nLF), (3.102)

and finally to the explicit formulation of the adaptation algorithm, given that RXX(n) isinvertible:

∼h(n) =

∼h(n− 1) + R−1

XX(n)∼XH(nLF)∼e′si(nLF). (3.103)

If RXX(n) is not invertible, R−1XX(n) can be replaced by the Moore-Penrose pseudoinverse.

However, a pseudoinverse is extremely expensive to compute such that approaches modi-fying the matrix to be inverted are of higher practical relevance. For such a regularization,the following equation can be considered:

∼h(n) =

∼h(n− 1) +

((1− αsi)RXX(n) + αsi

µsiILHNMNL

)−1∼XH(nLF)∼e′si(nLF). (3.104)

where αsi is a parameter of choice with 0 ≤ αsi ≤ 1. For αsi = 0, (3.104) describes theRLS algorithm (3.103), for αsi = 1, the LMS algorithm (3.62) is described. By choosingαsi between 0 and 1 the adaptation steps can be continuously varied in between bothalgorithms, although the relation is not linear. Since RXX(n) is positive semi-definite theinverse exists for any αsi > 0. Moreover, when computing the inverse, choosing a largerαsi can clearly reduce the condition number of the matrix to be inverted.

Convergence with suboptimal filter coefficients

In this section, the convergence behavior of the RLS algorithm in the presence of subopti-mal filter coefficients

∼h(n− 1) is investigated, i. e., the case, when (3.97) is violated. This

is done because the GFDAF algorithm derived later will rely on suboptimum filter coeffi-cients while it is, nevertheless, expected to show a behavior similar to the RLS algorithmand will be used for most of the experiments presented in the experimental evaluation.In practice, many circumstances can lead to suboptimal filter coefficients. Most of themare related to adaptation steps during interference at the microphone signals, but also thenecessary regularization of the matrix RXX(n) will in general cause a deviation from op-timal filter coefficients. Moreover, if the adaptation is started while the loudspeaker andmicrophone signals are already excited, the typical initialization value of

∼h(n) = 0NLLH×1

does not represent optimal coefficients.Whenever

∼h(n− 1) is not optimal, it can be described by the superposition

∼h(n− 1) =

∼hopt(n− 1) + ∆

∼h(n− 1), (3.105)

Page 125: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.4 Adaptive Filtering Algorithms 113

where the optimal portion∼hopt(n− 1) of the filter coefficients fulfills

RXX(n− 1)∼hopt(n− 1) = rXD(n− 1), (3.106)

while the error component ∆∼h(n) does not. In this case, (3.97) to (3.99), (3.102),

and (3.103) are no longer valid and an algorithm must be derived substituting rXD(n)in (3.101) using (3.96) but not (3.97):

∼XH(nLF)∼e′si(nLF) = RXX(n)

∼h(n)− λsirXD(n− 1)

−(RXX(n)− λsiRXX(n− 1)

) ∼h(n− 1), (3.107)

which can be reformulated to

RXX(n)∼h(n) = RXX(n)

∼h(n− 1) +

∼XH(nLF)∼e′si(nLF)

+ λsi

(rXD(n− 1)− RXX(n− 1)

∼h(n− 1)

)(3.108)

Assuming RXX(n) to be invertible, the adaptation rule to obtain optimal filter coefficients∼h(n) from the previous suboptimal filter coefficients

∼h(n− 1) results in

∼h(n) =

∼h(n− 1) + R−1

XX(n)∼XH(nLF)∼e′si(nLF)

+ λsiR−1XX(n)

(rXD(n− 1)− RXX(n− 1)

∼h(n− 1)

). (3.109)

Inserting (3.105) into (3.109) results in optimum filter coefficients∼h(n), i. e., ∆

∼h(n) =

0LHNMNL . On the other hand, inserting (3.105) into (3.103) will not lead to optimum filtercoefficients, where the remaining error is given by

∆∼h(n) = −λsiR−1

XX(n)RXX(n− 1)∆∼h(n− 1). (3.110)

This gives rise to the question how this error propagates in the following iterations. For-tunately, recursive application of (3.109) leads to

∆∼h(n) = (−λsi)2R−1

XX(n+ 1)RXX(n− 1)∆∼h(n− 1) (3.111)

∆∼h(n) = (−λsi)3R−1

XX(n+ 2)RXX(n− 1)∆∼h(n− 1) (3.112)

which shows that any error introduced in ∆∼h(n) decays exponentially, while the recon-

vergence speed is determined by the parameter λsi.

Approximative LEMS models

In Sec. 3.2 an approximative model for the LEMS has been proposed, for which a variantof the RLS algorithm is derived in the following, like it was done for the LMS and theAPA algorithms in Sections 3.4.1 and 3.4.2, respectively.

Page 126: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

114

For the derivation, the definition of the error signal according to (3.45) is used insteadof (3.17), such that the cost function (3.88) is modified according to:

J(app)RLS (n) =

n∑ν=0

λsin−ν∼eHsi,app(νLF)∼esi,app(νLF). (3.113)

The derivation of the adaptation algorithm uses exactly the same steps as shown above.Consequently, considering the derivation above while exchanging

∼h(n) for Vsi

∼h(n) and

∼X(k) for

∼X(k)VT

si leads to the desired algorithm. The latter exchange implies a furtherexchange of RXX(n) for VsiRXX(n)VT

si. Then, assuming VsiRXX(n)VTsi to be invertible

results in

Vsi∼h(n) = Vsi

∼h(n− 1) +

(VsiRXX(n)VT

si

)−1Vsi

∼XH(nLF)∼e′si(nLF). (3.114)

Multiplying VTsi from the left-hand side and requiring

(INLNMLH −V2

si

) ∼h(n) = 0NLNMLH×1 (3.115)

leads to an explicit formulation of the algorithm given by

∼h(n) = V2

si∼h(n− 1) + VT

si

(VsiRXX(n)VT

si

)−1Vsi

∼XH(nLF)∼e′si(nLF). (3.116)

Actually, (3.116) is identical to (3.103), when only a single microphone signal and theloudspeaker signals coupled to it are considered. As this algorithm is optimal with respectto minimizing (3.113), this implies that considering more than the loudspeaker signalsactually coupled by the model does not provide any advantage.

Cost-guided system identification

Like for the previously considered algorithms, a modified version of the RLS algorithmfor cost-guided system identification is derived in the following. To this end, the costfunction (3.88) is modified according to

J(cg)RLS(n) = JRLS(n) +

(∼h(n)−

∼hC(n)

)H ∼C(n)

(∼h(n)−

∼hC(n)

), (3.117)

where the complex gradient of this cost function is given by

∂J(cg)RLS(n)

∂∼hH(n)

=∂JRLS(n)

∂∼hH(n)

+∼C(n)

∼h(n)−

∼C(n)

∼hC(n). (3.118)

Page 127: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.4 Adaptive Filtering Algorithms 115

Like in the previous derivations in this section, the gradient is set to zero, which leads to(RXX(n) +

∼C(n)

) ∼h(n) = rXD(n) +

∼C(n)

∼hC(n). (3.119)

Assuming (3.119) was fulfilled in the previous iteration, combined with a multiplicationby λsi allows for writing

λsi(RXX(n− 1) +

∼C(n− 1)

) ∼h(n− 1) = λsirXD(n− 1) + λsi

∼C(n)

∼hC(n− 1) (3.120)

just as (3.97) was obtained from (3.96). The addition of (3.119) and (3.120) is equivalentto the representation of (3.98) and given by(RXX(n) +

∼C(n)

) ∼h(n) = rXD(n) + λsiRXX(n− 1)

∼h(n− 1)− λsirXD(n− 1)

+ λsi∼C(n− 1)

(∼h(n− 1)−

∼hC(n− 1)

)+

∼C(n)

∼hC(n). (3.121)

Exploiting (3.98) and (3.102) to replace to first terms leads to(RXX(n) +

∼C(n)

) ∼h(n) = RXX(n)

∼h(n− 1) +

∼XH(nLF)∼e′si(nLF),

+ λsi∼C(n− 1)

(∼h(n− 1)−

∼hC(n− 1)

)+

∼C(n)

∼hC(n) (3.122)

=( ∼C(n− 1) + RXX(n)

) ∼h(n− 1) +

∼XH(nLF)∼e′si(nLF)

+∼C(n− 1)

((λsi − 1)

∼h(n− 1)− λsi

∼hC(n− 1)

)+

∼C(n)

∼hC(n), (3.123)

where requiring(RXX(n) +

∼C(n)

)to be invertible allows to formulate the following re-

cursive algorithm:

∼h(n) =

∼h(n− 1) +

(RXX(n) +

∼C(n)

)−1 ( ∼XH(nLF)∼e′si(nLF)

+ λsi∼C(n− 1)

(∼h(n− 1)−

∼hC(n− 1)

)−

∼C(n)

(∼h(n− 1)−

∼hC(n)

))(3.124)

While this last equation seems to be not very practical, it should be noted that in manypractical scenarios, the flexibility of setting

∼C(n) and

∼hC(n) to arbitrary values is not

needed. For example, if∼hC(n) = 0LHNMNL is chosen, (3.124) reads

∼h(n) =

∼h(n− 1) +

(RXX(n) +

∼C(n)

)−1 ( ∼XH(nLF)∼e′si(nLF)

+(λsi

∼C(n− 1)−

∼C(n)

) ∼h(n− 1)

). (3.125)

Page 128: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

116

On the other hand, if∼C(n) and

∼hC(n) are independent of n, (3.124) reads

∼h(n) =

∼h(n− 1) +

(RXX(n) +

∼C(n)

)−1 ( ∼XH(nLF)∼e′si(nLF)

+ (1− λsi)∼C(n)

(∼hC(n)−

∼h(n− 1)

)). (3.126)

Hence, the actually implemented algorithms may assume simpler forms.

3.4.4 Derivation of the generalized frequency-domain adaptivefiltering algorithm

In this section, the GFDAF algorithm is derived as an approximation of the RLS algorithmpresented in Sec. 3.4.3. The GFDAF algorithm was firstly published in [BBK03], althoughthe idea can be dated back earlier [Ben00]. Its derivation has been inspired by [DMW78,Fer80], incorporating concepts of [MGJ82, BD92, MAAG95]. In the single-channel casefrequency-domain adaptive filtering is well-known [Shy92, SvGKJ87] and can be seen asa further development of frequency-subband adaptive filtering [Kel85, Kel88].

The derivation presented in the following differs from [BBK03] in the following points:

• The derivation is based on replacing the convolution matrices captured in (3.103)by DFT-domain multiplication instead of defining an equivalent to (3.88) in theDFT domain. This allows to show the relation of the GFDAF algorithm to the RLSalgorithm more clearly.

• An erroneous equality used in the original derivation is clearly identified as anapproximation.

• The frame shift LF and the lengths of the adaptive filters LH can be chosen inde-pendently of the microphone signal segment length LD.

• The algorithm is additionally formulated for approximative wave-domain LEMSmodels described in Sec. 3.2 and to implement the cost-guided wave system identi-fication presented in Sec. 3.3.5.

• A different regularization approach is used that exploits the relation between theRLS algorithm and the LMS algorithm described by (3.104).

In [BBK03], an DFT-domain equivalent to (3.88) was used to derive the GFDAF algo-rithm. Since the block RLS algorithm derived in Sec. 3.4.3, which minimizes(3.88) involvesno approximations, (3.103) can be used for further derivations without restrictions. Asa first step, (3.103) will be rewritten in the DFT domain such that this representationcan be approximated to formulate the GFDAF algorithm. The approximations used forthe GFDAF algorithm do not only reduce the computational effort but also increase therobustness of the algorithm.

Page 129: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.4 Adaptive Filtering Algorithms 117

The basis for the following derivation is a DFT-domain representation of the loud-speaker signals captured in

∼X(k). To facilitate this, the individual loudspeaker signals

∼Xl(k) are first considered separately, where

∼Xl(k) =

(0LD×(LX−LD), ILD

)Xl(k)

(ILH

0(LX−LH)×LH

)(3.127)

with

Xl(k) = ∼

X(A)l

(k)∼X(B)l

(k)∼Xl(k)

∼X(C)l

(k)

(3.128)

holds for any matrices∼X(A)l

(k),∼X(B)l

(k), and∼X(C)l

(k) of compatible dimensions. Since∼Xl(k) is a Toeplitz matrix,

∼X(A)l

(k),∼X(B)l

(k), and∼X(C)l

(k) can be chosen such that Xl(k)is a circulant matrix that can be diagonalized by the DFT matrix, which leads to

∼X l(k) = FLXXl(k)FH

LX(3.129)

=√LXDiag (FLX

∼xk(k)) . (3.130)

Now, it is possible to represent the convolution∼Xl(k)hTm,l(n) (3.131)

by its overlap-save DFT-domain representation(0LD×(LX−LD), ILD

)︸ ︷︷ ︸

truncation

FHLX

∼X l(k)FLX

(ILH

0(LX−LH)×LH

)︸ ︷︷ ︸

zero padding

hTm,l(n) (3.132)

Note that LX = LD + LH − 1 is not necessary following, but LX ≥ LD + LH − 1 isassumed. This constitutes a generalization relative to [BBK03], where only the caseLX = 2LD = 2LH was considered.

Finally, it is possible to represent∼X(k) in the DFT domain

∼X(k) = INM ⊗

( ∼X 1(k),

∼X 2(k), . . . ,

∼XNL

(k)). (3.133)

∼X(k) = W01

∼X(k)W10, (3.134)

where the matrices

W01 = INM ⊗((

0LD×(LX−LD), ILD

)FHLX

), (3.135)

W10 = INLNM ⊗(

FLX

(ILH

0(LX−LH)×LH

))(3.136)

Page 130: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

118

∼X(k) = W01

∼X(k)W10

=

·

·

∼X1(k)

∼X2(k)

∼X3(k)

∼X1(k)

∼X2(k)

∼X3(k)

FLX∼x1(k)

FLX∼x2(k)

FLX∼x3(k)

FLX∼x1(k)

FLX∼x2(k)

FLX∼x3(k)

= convolution matrixaccording to (3.21)

= fully occupied LD × LX matrix

= diagonal matrix

= fully occupied LX × LH matrix

Figure 3.11: Structure of the matrices to describe the time-domain convolution illustratedin Fig. 3.4 as a DFT-domain multiplication

are used to transform signal vectors from and to the DFT domain as well as for discrete-time truncation and zero-padding operations. An example for the structure of the matricesinvolved in (3.134) is shown in Fig. 3.11.

Considering the zero-padding and the DFT through W10, the DFT-bin-wise multipli-cation by

∼X(k) combined with the inverse DFT and truncation by W01, (3.134) can be

clearly identified as an overlap-save fast convolution representation of the time-domainconvolution matrix

∼X(k).

Plugging (3.134) into (3.93) allows to define

SXX(n) = λsiSXX(n− 1) +∼XH(nLF)WH

01W01∼X(nLF) (3.137)

such that

RXX(n) = WH10SXX(n)W10. (3.138)

The matrix SXX(n) can be interpreted as an estimate of the DFT-domain power spectraldensity (PSD) of the loudspeaker signals. Considering (3.103), while replacing RXX(n)

Page 131: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.4 Adaptive Filtering Algorithms 119

by (3.138) and∼XH(nLF) by (3.134) results in

∼h(n) =

∼h(n− 1) +

(WH

10SXX(n)W10)−1

WH10

∼XH(nLF)WH

01∼e′si(nLF), (3.139)

which describes the same adaptation steps as (3.103). Equation (3.139) can be found in[BBK03] when considering the DFT-domain representations of

∼h(n) = W10

∼h(n) (3.140)

and ∼e′si(k) = WH01

∼e′si(k). However, in [BBK03] the microphone signals are captured in theindividual columns of a matrix, while the error signals are captured in (3.139) by a singlecolumn vector, here. As a consequence

∼h(n) and

∼X(k) are structured differently in an

according manner. Note that the quantities ∼e′si(k) and ∼esi(k) are not distinguished in thenotation in [BBK03].

In (3.139), the size of the matrix WH10SXX(n)W10 and its inverse preclude a real-world

implementation of this algorithm for large filter lengths or a large number of loudspeakerchannels. To overcome this obstacle, it was proposed in [BBK03] to approximate SXX(n)by a sparse matrix, which leads to a less complex inversion of WH

10SXX(n)W10.As

∼X(k) is sparse, the lack of sparsity in SXX(n) can be attributed to the term WH

01W01

which represents windowing with a rectangular window in the time domain. Consideringthe definition of the DFT matrix given by (2.230), while evaluating WH

01W01 for a singlechannel (NL = NM = 1) leads to

[WH

01W01]ζ,η

=[FLXDiag (wrect) FH

LX

]ζ,η

= 1LX

LX−1∑κ=0

wrect (κ) ejκ(η−ζ) 2π

LX , (3.141)

where wrect (κ) describes an appropriate window function with the vector representation

wrect = (wrect (0) , wrect (1) , . . . , wrect (LX − 1))T . (3.142)

For wrect (κ) = 1 ∀ k, (3.141) would describe an identity matrix, while the definition of

wrect (κ) ={

1 for LX − LD ≤ κ < LX,

0 otherwise (3.143)

describes the time-domain windowing according to (3.135). As described in [BBK03],(3.141) can be identified as finite geometric series, which allows to write (3.143) as

[FLXDiag (wrect) FH

LX

]ζ,η

=

LDLX

for ζ = η

−1LX· 1− ej(LX−LD)(η−ζ) 2π

LX

1− ej(η−ζ)2πLX

otherwise. (3.144)

Page 132: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

120

To complement the findings in [BBK03], partial fraction decomposition can be used toobtain the identity

1sin(x)

=∞∑

ι=−∞

(−1)ι

x− ιπ, (3.145)

which allows to write

1− ej(LX−LD)(η−ζ) 2πLX

1− ej(η−ζ)2πLX

=sin

((LX − LD)(η − ζ) π

LX

)sin

((η − ζ) π

LX

) ej(LX−LD−1)(η−ζ) π

LX (3.146)

=∞∑

ι=−∞

(−1)ι sin((LX − LD)(η − ζ) π

LX− ιπ

)(η − ζ) π

LX− ιπ

ej(LX−LD−1)(η−ζ) π

LX .

(3.147)

This identifies (3.141) as an infinite train of sinc functions, each multiplied by an expo-nential phase term to represent the time-domain shift or asymmetry of the window. Thefraction in (3.146) is also referred to as circular sinc function [Pol96]. The resulting matrixis circulant and capturing one period of this train in each row, such that it expresses aconvolution with this function. The maximum of this function is located on the maindiagonal, which suggests an approximation of WH

01W01 by an identity matrix. This ap-proximation tends to be more accurate, the narrower the main lobe of the sinc functionis or, equivalently, the larger the time-domain window is. Hence, WH

01W01 can be betterapproximated by a scaled identity matrix, the larger LD is, where

WH01W01 ≈ INMLX

LD

LX, (3.148)

which was observed for the special case LX = 2LD in [BBK03]. Consequently, (3.137) canbe approximated by

S(sp)XX(n) = λsiS(sp)

XX(n− 1) + LD

LX

∼XH(nLF)

∼X(nLF) (3.149)

where the structure of S(sp)XX(n) is illustrated in Fig. 3.12. Replacing SXX(n) by S(sp)

XX(n)in (3.139) does not lead to an obvious advantage, yet. Therefore another approximationis used: (

WH10S

(sp)XX(n)W10

)−1≈WH

10

(S(sp)

XX(n))−1

W10, (3.150)

where(S(sp)

XX(n))−1

is now the inverse of a sparse matrix which is inexpensive to com-pute, when exploiting the matrix structure accordingly. Eventually, (3.150) can be usedto approximate R−1

XX(n) in the DFT domain, which distinguishes the GFDAF algorithmfrom the RLS algorithm. Not only that this leads to tremendous computational savings,

Page 133: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.4 Adaptive Filtering Algorithms 121

S(sp)XX(n) = λsi S(sp)

XX(n− 1)+ LD

LX

∼XH(nLF) ·

∼X(nLF)

= λsi S(sp)

XX(n− 1)

+ LDLX

·

= LX × LX diagonal matrix

Figure 3.12: Illustration of the structure of the matrix S(sp)XX(n)

but it also improves the condition of the matrix inversion [BBK03]. This property alsoallows for a straightforward parallelization of the algorithm, which can be beneficial forthe implementation of the algorithm for multi-threaded programming and especially ondistributed computer hardware [SSK12]. The resulting approximation of the RLS algo-rithm in the DFT domain is then given by:

∼h(n) =

∼h(n− 1) + µsi WH

10

(S(sp)

XX(n) + γsiSR(n))−1

W10︸ ︷︷ ︸≈R−1

XX(n)

·WH10

∼XH(nLF)WH

01∼e′si(nLF)︸ ︷︷ ︸

=∼XH(nLF)∼e′si(nLF)

, (3.151)

where the step-size parameter µsi was introduced to account for the inaccuracy of theapproximation. This allows to use a more conservative or an even more aggressive stepsize, depending on the necessities of the considered application scenario.

Furthermore, the matrix SR(n) with the weight parameter γsi is introduced into(3.151),describing a simple Tikhonov regularization. This is necessary because the nonuniquenessproblem must be expected in the scenarios considered here, which will render S(sp)

XX(n) sin-gular. It can be seen from (3.104) that a large γsi would force the GFDAF algorithm

Page 134: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

122

to approach the adaptation steps of the LMS algorithm. Since the LMS algorithm is awell-understood algorithm, this regularization can easily be justified. An alternative tothis Tikhonov regularization is to perform an eigenvalue decomposition of S(sp)

XX(n) suchthat a pseudoinverse can be computed. Moreover, this pseudoinverse can also be iter-atively computed using an approach similar to the well-known matrix inversion lemma[HBS10a]. The a priori error signal ∼e′si(k) can also be obtained according to an overlap-save fast convolution by using W10 to transform h(n − 1) to the DFT domain, followedby a DFT-domain multiplication with

∼X(k) and a transform back to the time domain by

W01:

∼e′si(k) =∼d(k)−W01

∼X(k)W10h(n− 1). (3.152)

In order to further reduce the computational demands of the algorithm, two other ap-proximations can be made in(3.151). First, W10WH

10 can be approximated in the sameway as WH

01W01 by

W10WH10 ≈ INLNMLX

LH

LX, (3.153)

where LH has the same role as LD in (3.148). This leads to

∼h(n) =

∼h(n− 1) + µsi

LH

LXWH

10

(S(sp)

XX(n) + γsiSR(n))−1 ∼

XH(nLF)WH01

∼e′si(nLF),

(3.154)

where the time-domain windowing by W10WH10 is omitted. Furthermore, when considering

(3.140) the matrix WH10 can also be neglected which leads to the so-called unconstrained

variant [MGJ82, BBK03] of this algorithm given by

∼h(n) =

∼h(n− 1) + µsi

LH

LX

(S(sp)

XX(n) + γsiSR(n))−1 ∼

XH(nLF)WH01

∼e′si(nLF), (3.155)

which requires that (3.152) is also simplified to

∼e′si(k) =∼d(k)−W01

∼X(k)

∼h(n− 1). (3.156)

The time-domain windowing operations applied to the error signal cannot be neglectedfor the definition of the algorithm. Otherwise, the adaptive filter would converge to asolution for a cyclic convolution and not to a solution for a linear convolution with thetime-domain filter coefficients.

Page 135: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.4 Adaptive Filtering Algorithms 123

Note that instead of the approximation (3.150), an identity was postulated in [BBK03]that turned out to be wrong. The discussion in Appendix B shows that this false identitycan still be justified as an approximation.

Approximative LEMS models

As the GFDAF approximates the RLS algorithm, the formulation of the RLS algorithmfor spatial approximative models can be straightforwardly translated by comparing(3.151)to (3.116):

∼h(n) = V2

si∼h(n− 1) + µsiWH

10VTsi

(V si

(S(sp)

XX(n) + γsiSR(n))

VTsi

)−1

·V siW10WH10

∼XH(nLF)WH

01∼e′si(nLF), (3.157)

where V si assumes the same role as Vsi in previous equations while considering theaccordingly adjusted signal segment length and impulse response lengths. In doing so,it was exploited that all approximations are only related to the temporal dimension anddo not affect the relation of the individual loudspeaker and microphone signals. Theformulations of (3.154) and (3.155) can be obtained in the same manner and are given by

∼h(n) = V2

si∼h(n− 1) + µsi

LH

LXWH

10VTsi

(V si

(S(sp)

XX(n) + γsiSR(n))

VTsi

)−1

·V si∼XH(nLF)WH

01∼e′si(nLF), (3.158)

∼h(n) = V2

si∼h(n− 1) + µsi

LH

LXVT

si

(V si

(S(sp)

XX(n) + γsiSR(n))

VTsi

)−1

·V si∼XH(nLF)WH

01∼e′si(nLF), (3.159)

respectively.

Cost-guided system identification

The close relation between the RLS algorithm and the GFDAF algorithm can also beexploited for the implementation of the cost-guided system identification. However, asthe inverse in (3.151) was approximated, this approximation has also to be applied to theinverse in (3.124) using

(RXX(n) +

∼C(n)

)−1=(WH

10S(sp)XX(n)W10 +

∼C(n)

)−1(3.160)

≈WH10

(S(sp)

XX(n) +∼C(n)

)−1W10, (3.161)

Page 136: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

124

where∼C(n) is the DFT-domain representation of

∼C(n), which should be chosen such

that the matrix to be inverted retains its sparsity. Note that∼C(n) will be defined as

a diagonal matrix, describing the same weight for all time samples of∼hm,l(k, n) for the

individual combinations of m and l. This can be represented in the DFT domain withoutoff-diagonal entries in the matrix

∼C(n).

Finally, (3.124) can be translated to the representations of (3.151), (3.154), and (3.154),given by

∼h(n) =

∼h(n− 1) + µsiWH

10

(S(sp)

XX(n) +∼C(n) + γsiSR(n)

)−1W10

·WH10

(∼XH(nLF)WH

01∼e′si(nLF) +

(λsi

∼C(n− 1)−

∼C(n)

)W10

∼h(n− 1)

+∼C(n)W10

∼hC(n)− λsi

∼C(n− 1)W10

∼hC(n− 1)

), (3.162)

∼h(n) =

∼h(n− 1) + µsi

LH

LXWH

10

(S(sp)

XX(n) +∼C(n) + γsiSR(n)

)−1

·(

∼XH(nLF)WH

01∼e′si(nLF) +

(λsi

∼C(n− 1)−

∼C(n)

)W10

∼h(n− 1)

+∼C(n)W10

∼hC(n)− λsi

∼C(n− 1)W10

∼hC(n− 1)

), (3.163)

∼h(n) =

∼h(n− 1) + µsi

LH

LX

(S(sp)

XX(n) +∼C(n) + γsiSR(n)

)−1

·(

∼XH(nLF)WH

01∼e′si(nLF) +

(λsi

∼C(n− 1)−

∼C(n)

) ∼h(n− 1)

+∼C(n)W10

∼hC(n)− λsi

∼C(n− 1)W10

∼hC(n− 1)

), (3.164)

respectively.

3.4.5 Choice of algorithmic parametersIn this section, the choice of the parameters for the individual algorithms is discussed.After discussing the cost matrix and the so-called guidance coefficients, the parameterschoices specific to the individual algorithms will be discussed.

Cost matrix and guidance coefficients

As mentioned above, the quantities∼hC(n) and

∼C(n) can be used to control the convergence

behavior of the adaptation algorithms. This is useful whenever RXX(n) is not invertible,which is a typical case for MIMO system identification (see Sec. 3.3). Four strategies forchoosing

∼C(n) and

∼hC(n) are of primary interest:

1. Implement the improved wave-domain system identification as described in Sec. 3.3.5

2. Avoid filter coefficients with large magnitudes

Page 137: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.4 Adaptive Filtering Algorithms 125

3. Avoid large adaptation steps (only relevant for the GFDAF algorithm)

4. Guide the adaptive filter to identify a system close to a previously measured one

To realize the approach described in Sec. 3.3.5, the guidance coefficients are set to zero∼hC(n) = 0LHNLNM×1, while the cost matrix should be chosen inversely proportional tothe expected weight of the filter coefficients for perfect system identification. This thesisis concerned with wave-domain transforms such that the considered LEMSs exhibit adiagonally dominant structure in the wave domain, as shown in Fig. 2.14. In this case, acost matrix

∼C′(n) as illustrated in Fig. 3.10(b) is a suitable choice, which can be defined

according to

[ ∼C′(n)

]ζ,η

=

β1 for ζ − η = 0,β2 for |ζ − η| = 1,1 otherwise,

(3.165)

where the parameter β1 will typically be chosen close to zero, while β2 will be chosen inthe range β1 ≤ β2 ≤ 1. The matrix

∼C′(n) allows to define

∼C(n) = βsiwc(n)Diag

(vec

(( ∼C′(n)

)T)⊗ 1LH×1

), (3.166)

which was used for the definition of the adaptation algorithms and where the weightparameter βsi can be used to control the influence of

∼C(n). The function wc(n) is used to

determine a time-dependent cost weight and can be chosen according to

wc(n) = LH

LXLD

√√√√√√∑n−1ν=0 λsin−ν

(∼e′si(nLF)

)H ∼e′si(nLF)∼xH(nLF)∼x(nLF)∼hH(n− 1)

∼h(n− 1) + ε

. (3.167)

Here, the parameter ε will typically be chosen such that∼hH(n)

∼h(n)� ε for well-identified

systems and is needed for regularizing the denominator. As (3.167) can be seen as atime-averaged estimate of the relative filter update step without using the cost matrix,the use of wc(n) achieves an approximate balance between the contribution of the costmatrix and the filter update resulting from the observed signals.

To penalize filter coefficients of large magnitude, the cost matrix can simply be chosenaccording to

∼C(n) = wc(n)INLNMLH , (3.168)

while the guidance coefficients are again set to zero (∼hC(n) = 0LHNLNM×1) and wc(n)

is also given by (3.167). It can be seen from the cost function (3.117) used for thecost-guided system identification that such a choice penalizes large values of

∼hH(n)

∼h(n).

This can improve the system identification for conventional point-to-point LEMS models,when nonuniqueness leads to large magnitudes of the filter coefficients, because valid

Page 138: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

126

solutions for system identification do not form a bounded set. However, as there is stilla large number of equally weighted paths modeled in a conventional LEMS, this willbe less effective in improving the system identification, compared to using (3.166) for awave-domain model.

Independently of the chosen model, (3.168) can be used to guide the adaptive filterto identify a system similar to a desired one described by

∼hC(n), which has then to be

chosen accordingly. To this end, an LEMS MIMO impulse response can be measuredbefore the operation of an AEC or LRE system. As the MIMO impulse response maychange significantly over time, this can only moderately reduce the system misalignment,while there is also a noticeable risk to increase the resulting system misalignment. Still,this approach can potentially lead to an improvement under adverse conditions.

Regularization

The APA and the GFDAF algorithm require a matrix inversion, for which the necessaryregularization is discussed in the following. A regularization of the RLS algorithm isnot discussed, as this algorithm has no practical relevance for the considered applicationscenarios.

For the APA, the matrix XR(k) is used to regularize (3.74). Since the algorithmoperates on the instantaneous signals, the regularization matrix should also be based oninstantaneous signals:

XR(k) = max(ε,

∼xH(nLF)∼x(nLF)NLLX

)INMLD , (3.169)

where ε is a very small number to avoid an undefined inverse for ∼x(nLF) = 0NLLX×1.For the GFDAF algorithm, the regularization is facilitated by the matrix SR(n), found

in (3.151) and all equations derived from it. This matrix can be chosen according to

SR(n) = max (ε, pX(n)) INLNMLX , (3.170)

where, analogously to the estimation of S(sp)XX(n), a recursively time-averaged estimate of

the loudspeaker signal power is used, which is given by

pX(n) = λsipX(n− 1) + LD

∼xH(nLF)∼x(nLF)NLLX

. (3.171)

The regularization weight can be chosen by γsi for both considered algorithms. As anyγsi 6= 0 degrades the ability to minimize the respective cost function of the algorithm,γsi should be chosen as low as possible. On the other hand, γsi must be larger than zeroto guarantee the existence of the inverses and also large enough such that the matrix tobe inverted is sufficiently well conditioned. Otherwise numerical errors or interferences inthe microphone signals can cause significant divergence. Hence the optimal value of γsi

depends on the considered scenario. For the GFDAF algorithm it has been observed thatregularization must be stronger for increasing number of loudspeakers NL and increasing

Page 139: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.4 Adaptive Filtering Algorithms 127

condition number of∼RXX while this has not been observed for the APA. This difference

can be explained by the fact that the GFDAF algorithm uses the product∼XH(k)

∼X(k),

while the APA uses the product∼X(k)

∼XH(k) in the respective inverses. The dimensions of

∼XH(k)

∼X(k) are proportional to the number of loudspeakers leading to a condition number,

which increases with the number of loudspeakers. On the other hand, the dimensionsof

∼X(k)

∼XH(k) are given by NMLD such that they are independent of the number of

loudspeakers. At the same time, the number of rows in∼X(k) is proportional to the

number of loudspeakers, while an increased number of rows can lead to a lower conditionnumber of the matrix

∼X(k)

∼XH(k).

While the choice of simple identity matrices in (3.169) and (3.170) appears to ignoresome degrees of freedom that could be exploited, there is a justification for this: Whenthe contribution of the regularization matrix would dominate the matrix to be inverted,(3.73) and, independently, (3.151) would approximate (3.67). Thus, this type of regular-ization can be interpreted as making the adaptation step more LMS-like. Actually, theGFDAF algorithm has been evaluated with a frequency-bin wise regularization [SK11],which turned out to degrade the performance of the algorithm, compared to the simpleregularization as described above.

Step size

In theory, the RLS algorithm is able to identify optimal filter coefficients with respectto (3.88) in each iteration, which can be expected to lead to good system identification.This is not true for the other considered algorithms, such that a step-size parameter µsi

was introduced to account for this lack of optimality.The LMS algorithm does only provide optimal filter coefficients in the mean-square

error sense and a particular filter update can also represent divergence. Moreover, theadaptation step size of this algorithm is proportional to the signal power of the loud-speaker signals, which is independent of the expected magnitude of the filter coefficientsrepresenting the true LEMS. This can result in a large variance of the filter coefficientsand requires µsi to be chosen inversely proportional to the signal power for compensatingthe dependency on the input signal power [Hay02].

The situation for the APA is more favorable, as its cost function already penalizes largeadaptation steps. Still, the introduced constraint forces the updated filter coefficients toset the error signal to zero for the currently present data blocks. This can lead to verydifferent filter coefficients from one iteration to another. The closer LD is to NLLH, thelarger the adaptation steps will be, as fewer degrees of freedom are available to fulfill theconstraint. Finally, a solution minimizing the squared error signal for multiple iterations,will be found in the mean sense. To reduce the variance of the obtained filter coefficients,µsi can be set lower than one, while the algorithm is expected to be stable for 0 ≤ µsi ≤ 2[Hay02]. Unlike the LMS algorithm, the adaptation step will not be proportional to theloudspeaker signal power. This is in accordance to the special case LD = 1, where theAPA represents the NLMS algorithm.

Page 140: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

128

As an approximation of RLS, the GFDAF algorithm will typically not achieve perfectfilter coefficients with respect to minimizing (3.88). Furthermore, a necessary regulariza-tion can introduce an additional deviation from optimality. To account for these issues,µsi should be chosen accordingly. A too large step size can result in a low steady-stateperformance, a low convergence speed or even instability, while the choice of a too lowstep size will unnecessarily slow down the convergence. Unfortunately, the approximationused for the GFDAF algorithm impedes a straightforward derivation of bounds for µsi

to guarantee stability of the algorithm. The same holds for deriving an optimal valuefor µsi. It has been observed that the GFDAF algorithm typically provides conservativeupdates compared to the RLS algorithm, which suggest that µsi should not be chosen tooconservative.

“Forgetting factor”

For the GFDAF algorithm, a so-called “forgetting factor” λsi can be chosen, which controlsthe exponential weight of the energy of previous error signal blocks. This parameterdetermines the convergence speed of the algorithm, where λsi will typically be set closeto one. While a lower value of λsi will lead to a faster convergence, it will also lead to ahigher condition number of the matrix, which must be inverted in(3.151) and all equationsderived from it. This is because (3.149) describes a limited rank update on S(sp)

XX(n), whilethe full-rank contribution of S(sp)

XX(n− 1) is scaled by λsi and suggests to choose λsi closerto one if the system identification task is anyway an ill-conditioned problem.

The statistical properties of the condition number of S(sp)XX(n) as a function of λsi could be

of theoretical interest. For λsi close to one, this matrix is similar to a Wishart-type randommatrix, where the statistical properties are described in [MMSN10, MA07]. However, itturned out that the results from [MMSN10] cannot be straightforwardly generalized toS(sp)

XX(n) as defined in (3.149).The parameter λsi is also relevant for the RLS, where the discussion above applies

likewise.

Length of considered microphone signals

All of the presented algorithms allow to chose the block length LD of the consideredmicrophone signals. Choosing a large value of LD will generally improve the convergencespeed of the LMS algorithm, the APA and the GFDAF algorithm. This is because thosealgorithm use approximations, while a longer microphone signal block length providesmore data to improve the estimate

∼h(n). On the other hand, the RLS algorithm does

not benefit from such a choice as it already provides optimal filter coefficients in eachiteration. However, the computational effort for all algorithms increases with LD, whichsuggests to choose LD as low as possible. In most cases, the choice of LD = LF wouldbe appropriate in order to exploit all observed time samples of the microphone signalsand to obtain continuous error signals which can be used as echo-canceled signal, when

Page 141: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.4 Adaptive Filtering Algorithms 129

implementing an AEC. Nevertheless, the APA requires LD < NLLH, where larger LD willlead to larger adaptation steps. This suggests to consider using lower values of LD, whenthe large steps cause an undesired convergence behavior.

Initialization values

As the filter coefficients are generally unknown∼h(0) = 0LHNLNM×1 is a typical choice for

the initialization of all considered adaptation algorithms. For the GFDAF algorithm anadditional iterative estimate S(sp)

XX(n) is used, which must also be initialized. Unfortu-nately, setting S(sp)

XX(0) to zero is an unsuitable choice since this matrix has to be inverted.Thus, it can be chosen according to

S(sp)XX(0) = ssiI2LHNH , (3.172)

where the parameter ssi ideally represents the average value of the expected diagonalentries of S(sp)

XX(n).

Experimental results

Results regarding the influence of the adaptation algorithm parameters on the systemidentification performance are presented in Appendix C, where randomly generated im-pulse responses have been used to cover a large variety of impulse response samples.One major finding in Appendix C is, that the GFDAF algorithm clearly outperformsthe other algorithms for MIMO system identification. This difference is large enough tojustify disregarding the other algorithms in the following evaluations.

3.4.6 Summary of adaptation algorithmsIn Sections 3.4.1 to 3.4.4, multiple variants of the LMS algorithm, the APA, the RLSalgorithm, and the GFDAF algorithm are presented. In this section, a guide to the mostimportant findings is given.

Previously known derivations of all considered adaptation algorithms have been re-viewed in compliance with the notation used in this thesis, where the identification ofMIMO LEMSs by block-wise data processing was considered. In doing so, a link betweenthe regularization of the RLS algorithm and the LMS algorithm could be established,which is given by (3.104). Furthermore, some corrections to the original derivation of theGFDAF algorithm [BBK03] are provided:

• The a prior error ∼e′si(k) and the a posteriori error ∼esi(k) are clearly distinguished.

• The scaling of the DFT matrix is corrected.

• Equation (3.150) is shown to be an approximation (see also Appendix B) ratherthan an identity. This allows for providing the variant of the GFDAF algorithmdescribed by (3.151), which is not described in [BBK03].

Page 142: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

130

There is also a link between the regularization of the GFDAF algorithm and the LMS algo-rithm since the presented derivation identifies the GFDAF algorithm as an approximationof the RLS algorithm. For the application to WDAF, all considered algorithms were alsoderived in two variants that can implement an approximative wave-domain LEMS modeland exploit the wave-domain LEMS properties for cost-guided system identification. Forthe LMS algorithm these variants are given by (3.65) and (3.67), for the APA by (3.80)and (3.87), and for the RLS algorithm by (3.116) and (3.124).

With (3.151), (3.154), and (3.155) three variants of the original GFDAF algorithmwere considered in Sec. 3.4.4. The variants of (3.151), (3.154), and (3.155) for using ofapproximative models are given by (3.157) to (3.159), respectively. Furthermore, thevariants of (3.151), (3.154), and (3.155) to implement cost-guided system identificationare given by (3.162) to (3.164), respectively.

The GFDAF algorithm is also the most promising candidate for real-world implemen-tations of AEC or system identification for MIMO LEMS: while the LMS algorithm andthe APA show a poor performance in such scenarios (see Appendix C), the RLS algorithmis prohibitively expensive when regarding the computational cost. For brevity, the follow-ing evaluation will only consider the variants of the GFDAF algorithm given by (3.154),(3.154), and (3.163), as the variants of the GFDAF algorithm based on(3.151) and (3.155)are expected to provide similar results. Still, all algorithms described in Sec. 3.4 can bea starting point for the further development of WDAF or other approaches for MIMOadaptive filtering.

Page 143: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.5 Experimental Results 131

3.5 Experimental ResultsIn this section evaluation results for system identification and AEC are presented. First,the evaluation setup considered in the following is described in Sec. 3.5.1. After that,applying approximative models (see Sec. 3.2) under optimal conditions is investigated inSec. 3.5.2. Then results from similar experiments under suboptimal conditions are pre-sented in Sec. 3.5.3. Finally, cost-guided wave-domain system identification (see Sec. 3.3.5)is assessed in Sec. 3.5.4.

3.5.1 Evaluation setup for acoustic echo cancellationIn this section, the evaluation setup for AEC is presented, which was used to obtain theevaluations results presented in Sections 3.5.2 to 3.5.4.

All results presented in the following subsections were obtained for an LEMS as de-scribed in Sec. 2.4.2, comprising NL = 48 loudspeakers and NM = 10 microphones ar-ranged as uniform circular arrays (UCAs), sharing a common array center. The loud-speaker array was of radius RL = 1.5 m and the microphone array was of radius RM =0.05 m, where the positioning inaccuracy was approximately less than one centimeter.This accuracy can be achieved with conventional loudspeaker and microphone standsand without special measurement devices. The enclosing room had a reverberation timeT60 of approximately 0.25 s, where the measured impulse responses of the LEMS weretruncated to 3764 samples, considering a sampling rate of 11025 Hz. The same samplingrate was used for all other processing steps. The considered array setup is depicted inFig. 2.5(a), where photographs of the loudspeaker array and the microphone array areshown in Fig. 3.32. To approach real-world scenarios, mutually uncorrelated white Gaus-sian noise signals were added to all microphone channels with a level of −40 dB relativeto the original microphone signal.

The loudspeaker signals for reproduction were determined according to WFS to syn-thesize plane waves with the incidence angles ϕq, as described in Sec. 2.3.4. The synthesis(or rendering) filters had a length LR of 146 samples, where the aliasing wave number

∼kal

was set to the wave number obtained for a temporal frequency of 2 kHz (see (2.9)). Asingle plane wave synthesized by a loudspeaker array is illustrated in Fig. 3.13.

As mentioned above, the GFDAF algorithm was identified as the most powerful algo-rithm for MIMO scenarios, and will be the only algorithm considered in the following.The algorithm was implemented using (3.154), where the choice of the algorithm pa-rameters was based on the results presented in Appendix C, with minor changes. If notstated otherwise, the following parameters were chosen: For block processing, a frame shiftLF = 512 samples was used, while the filter length LH was chosen to be 4096 samples toavoid effects resulting from an unmodeled impulse response tail. The “forgetting factor”λsi was set to 0.95. To avoid initial divergence, ssi in (3.172) was set to the approximatesteady state mean value of the diagonal entries of S(sp)

XX(n) after the first four seconds ofthe experiments. In a real-world implementation, the adaptation would be stalled for

Page 144: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

132

x

y

∼k

∼k

ϕq

Figure 3.13: Illustration of a synthesized plane wave

that time span. For the results presented in Sections 3.5.2 and 3.5.3, the original GFDAFalgorithm after (3.154) was used, i. e., βsi = 0 has been chosen.

3.5.2 Acoustic echo cancellation using approximative loudspeaker-enclosure-microphone system models

In this section, results from an experimental evaluation of an AEC using the wave do-main LEMS model described in Sec. 3.2 are presented. Some of these results have beenpublished in [SK11] with minor changes in the adaptation algorithm and the algorithmparameters.

The behavior of a wave-domain AEC for time-varying acoustic scenes is investigatedfirst, where a regularization weight γsi = 0.01 and a step size µsi = 2 were used as algorithmparameters. The acoustic scenes in the following two experiments consist of four planewaves with the incidence directions ϕq being equal to 0, π/2, π, and 3π/2, for q = 1, 2, 3, 4,respectively. In the first part of the experiments, the plane waves are alternatingly activefor 5 seconds, each, before the previously active waves become subsequently active againsuch that all four wave are simultaneously active at the end of the experiment. In thefirst experiments, the plane waves carry white noise signals, while music signals wereconsidered later. Different wave-domain models were evaluated: four models according to(3.34) with NH = 1, 3, 5, 48 and Model 3 from Fig. 3.7. Note that the models with NH = 1and NH = 3 were referred to as Model 1 and Model 2 in Fig. 3.7, while the models withNH = 5 and NH = 48 have no representation there. However, the model using NH = 48is equivalent to a conventional point-to-point model, as it uses no approximations.

To support the latter statement, the experiments described in this section were alsoconducted with a conventional AEC in the point-to-point domain (TL = INLLX and TM =INMLD). It turned out that the convergence properties of this approach are identical (up

Page 145: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.5 Experimental Results 133

0

20

40

ERLE

indB

NH = 1NH = 3Model 3NH = 5NH = 48

−4

−2

0

2

∆h(n

)in

dB

0 5 10 15 20 25 30 354321

time in seconds

Sour

ce

Figure 3.14: ERLE (upper plot), normalized misalignment (middle plot), and source ac-tivity (lower plot) as a function over time, for an AEC experiment with planewaves carrying white noise signals.

to the influence of numerical noise) to those of a wave-domain AEC with NH = 48, suchthat the results are not reported for brevity. Consequently, it can be concluded that atransform of the signals to the wave domain is only beneficial if further measures likeapproximative models or modified wave-domain algorithms are applied.

Using NH = 1 represents the simplest model, which couples only the modes with l = m

for m = −4, . . . , 5. This model is comparable to the models that were originally proposedfor WDAF [BSK04]. Choosing NH = 3, 5 generalizes this very simple model, where themode couplings for l = m, l = m ± 1 are considered by both models, while the lattermodel also considers the couplings for l = m ± 2. Another scheme of modeled couplingsis represented by Model 3, where all 100 couplings between the modes l, m = −4, . . . , 5have been modeled.

In Fig. 3.14, the results for a white-noise excitation of the plane waves are shown, wherethe upper plot shows the ERLE, the middle plot the normalized misalignment, and thelower plot the source activity.

For all considered models, there is a breakdown in ERLE each time a new plane wavebecomes active. This result is expected for any system identification approach since newproperties of the LEMS are revealed at this moment, which then have to be identifiedin order to facilitate AEC. There are also noticeable breakdowns when previously activewaves become active again. Those can, at least in theory, be avoided as the necessary

Page 146: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

134

information to cancel the echo of all individual plane waves has already been provided atthis point in time. Hence, these breakdowns indicate model limitations and some loss ofinformation in the adaptation algorithm.

It can be seen that all models are able to cancel the echo when only one plane waveis excited, although ERLE drops as the adaptive filters have to identify the newly re-vealed LEMS properties. When multiple waves are excited, the ERLE distinguishes theindividual models clearly. On the other hand, the convergence behavior regarding thenormalized misalignment is clearly different for all models during all time intervals.

The model with NH = 1 can only cancel the echo sufficiently for a single excited wave,which can be attributed to the fact that this is the simplest model. Hence, the identifiedLEMS is fully determined for minimizing the residual error of a single reproduced source(see (3.50) with m

(H)m = NH). Thus, the adaptive filter diverges from a good system

identification in order to further maximize ERLE.This is no longer possible when two or more waves are excited, such that the ERLE

drops for t > 20s. In that case, the system identification is improved because the adap-tive filter can no longer sacrifice system identification performance in order to maximizeERLE, which is explained in the following: For spatially and temporally uncorrelatedloudspeaker signals, maximizing ERLE and identifying the LEMS are the exactly thesame goals. When nonuniqueness occurs, this changes because the LEMS model can usean arbitrary linear combination of linearly dependent loudspeaker signals to obtain thesame microphone signal estimate. At the same time, a convergence to a filtering schemedifferent from the true LEMS increases the misalignment. In order to obtain such a dif-ferent filtering scheme, the LEMS model must be able to compensate for the filtering ofthe reproduction system, which requires degrees of freedom in the solution for the iden-tified LEMS. While using more source signals for the determination of the loudspeakersignals leads to a lower number of linearly dependent loudspeaker signals, approximativeLEMS models restrict the number of loudspeaker signals that can actually be filtered.Finally, identifying the LEMS by maximizing ERLE becomes an overdetermined problemif NHLH < NS(LH + LR − 1). In that case, the loudspeaker signals considered for eachmicrophone signal appear to be linearly independent to the adaptive filter, which willthen approach a lower misalignment.

Using NH = 3 leads to a satisfying ERLE for up to two synthesized waves. When threewaves are active, all degrees of freedom are fully determined and the ERLE approachesthe same value as for NH = 1 with a single synthesized wave, while the normalized systemmisalignment increases. For four synthesized waves, the ERLE drops, accompanied by adecreasing misalignment.

When ordering the models by the number of couplings, using NH = 5 sorts in betweenusing NH = 3 and using Model 3. The drop in ERLE for this model appears for four activewaves, where the system identification also slightly degrades. For the models discussedabove, this happened for NH = NS, where this behavior occurs here already for NS =NH− 1. Still, considering (3.50) this can be explained, as not only NS and NH determine

Page 147: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.5 Experimental Results 135

0

20

40

60ER

LEin

dBNH = 1NH = 3Model 3NH = 5NH = 48

−4

−2

0

2

∆h(n

)in

dB

0 5 10 15 20 25 30 354321

time in seconds

Sour

ce

Figure 3.15: ERLE (upper plot), normalized misalignment (middle plot), and source ac-tivity (lower plot) as a function over time, for an AEC experiment with planewaves carrying music signals.

the conditions for uniqueness, but also the filter length of the reproduction system LR

and the length LH of the identified impulse responses.Model 3 and using NH = 48 imply a larger number of degrees of freedom compared

to the other models such that the identified LEMS is never fully determined. Hence,they achieve a good echo cancellation, even for four active waves. Still, Model 3 providesa worse system identification compared to choosing NH = 5, although Model 3 is morepowerful. The same holds for comparing the case of NH = 48 with Model 3 and NH = 5.This observation points to the facts that more degrees of freedom in the model do notnecessarily lead to a better system identification. Furthermore, it can be seen that agood system identification is not necessary to cancel the loudspeaker echo during spatialstationary of the scene, when system identification is underdetermined.

While white noise signals allow for a well-defined comparison of the modeling capabilityof the systems, they are no typical reproduction signals in practice. Thus, the experi-ments described above were repeated using independent monophonic music recordingsfor each of the individual plane waves, where the results are shown in Fig. 3.15. Whilethe behavior of the adaptive filter with the respective models show the same tendencies,the non-stationarity of the music signals leads to a larger variance in the results. Again,NH = 1 is only suitable for compensating the echo of a single active plane wave, whilethe other models can also compensate the echo of multiple simultaneous waves. Actually,

Page 148: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

136

this more realistic scenario is in favor of the approximative models and the difference interms of ERLE between the non-approximative model (NH = 48) and the approximativemodels is significantly reduced. For example, choosing NH = 3 would also be well-suitedto cancel the echo of four excited waves. However, music signals make the system iden-tification more challenging, while the achievable misalignment becomes less predictable.The approximative models using a diagonal structure determined by NH = 1, 3, 5 lead toa better identification, because this structure matches the diagonally dominant structureof the true LEMS. The approximate Model 3 shows a larger system misalignment, sinceit does not rely on this diagonal structure. The non-approximative model with NH = 48shows a poor system identification, again.

For the sake of completeness, the experiments described above were also conducted withcolored-noise source signals that exhibit a spectrum similar to music or speech signals.Since the GFDAF algorithm is able to whiten those signals sufficiently, the results wherealmost identical to the results shown for white noise. Hence, details are not reported forthe sake of brevity.

Finally, the results shown in Fig. 3.15 suggest that a wave-domain LEMS model usingNH = 3 should be suitable in many practically relevant situations, which motivated theuse of this model for the AEC real-time demonstrator described in Sec. 3.6.

The results shown above indicate that there is a strong combined influence of the degreeof approximation in the LEMS model and the number of active independent sources forthe reproduction signal. To analyze this interrelation systematically, another set of ex-periments was conducted, where NS continuously active plane waves carrying white noisesignals were synthesized, while the loudspeaker and microphone signals were processed bya wave-domain AEC using an approximative model with NH diagonals. To minimize thegradient adaptation noise to the results, more conservative parameters have been chosenwith γsi = 0.005, λsi = 0.99, and µsi = 1. After 45 seconds of convergence, the ERLE andthe normalized misalignment were measured, where the results are shown in the upperand lower plot of Fig. 3.16, respectively. There, it can be clearly seen that underdeter-mined scenarios allow for a better ERLE, while overdetermined scenarios lead to a bettersystem identification. The fully determined scenarios (NS = NH) represent a special case,where the system identification is additionally impaired by the compensation of the modellimitations by the adaptive filter.

3.5.3 Approximative loudspeaker-enclosure-microphone systemmodels under sub-optimal conditions

In practice, various conditions may hamper system identification and AEC, where under-modeling and noisy microphone signals are two common difficulties, which are treated inthis section in the order of mentioning. For conciseness, Model 3 was excluded from theevaluation while the other models have been kept.

To investigate the influence of undermodeling, the impulse response length LH of thetrue LEMS was kept at 3764 samples while the length of the adaptive filters LH was

Page 149: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.5 Experimental Results 137

1020

3040

1020

3040

20

40

NH NS

ERLE

(k)

1020

3040

1020

3040

−10

0

NH NS

∆h(n

)

Figure 3.16: ERLE (upper plot) and normalized misalignment (lower plot) after 45 sec-onds of AEC operation as a function of NS and NH

reduced to 2048, 1024, and 512 samples. Using these parameters, a non-approximativeLEMS model can theoretically achieve minimum normalized misalignments of −23.0dB,−20.5dB, and −18.2 dB, respectively. The remaining parameters of this experiment areidentical to those for the experiment shown in Fig. 3.14, which constitutes the referencefor the results discussed in the following. As the results for LH = 2048 were almostidentical to the results shown in Fig. 3.14, they are not reported.

Comparing the results for LH = 1024 in Fig. 3.17 to the results in Fig. 3.14 revealsthat limiting LH limits the achievable ERLE for all models to a value of approximately30 dB, where the approximative models show no different behavior than the model withNH = 48. Regarding the normalized misalignment, the approximative models even show aslightly improved system identification, while a divergence can be observed for the modelwith NH = 48. An explanation for this can be found, when considering the relatively lowmagnitudes of the later part of the LEMS impulse response. The noise in the estimate ofthis part might lead to a larger error than assuming it to be zero, which is implied by LH <

LH. Still, the resulting restriction on the adaptive filter can cause the adaptation algorithm

Page 150: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

138

0

20

40ER

LEin

dBNH = 1NH = 3NH = 5NH = 48

−4−2

024

∆h(n

)in

dB

0 5 10 15 20 25 30 354321

time in seconds

Sour

ce

Figure 3.17: ERLE (upper plot), normalized misalignment (middle plot), and source ac-tivity (lower plot) as a function over time, for an AEC experiment withplane waves carrying white noise signals, and an adaptive filter lengthLH = 1024 < LH = 4096

to compensate for this limitation by sacrificing system identification for an improvedERLE. Due to the lower number of degrees of freedom, the approximative models are lessprone to such a behavior. From Fig. 3.18, it can be read that the observations describedabove are even more evident, when reducing LH to 512 samples.

To evaluate the influence of significant microphone noise, the lengths LH of the adaptivefilter was again chosen to be 4096 samples. The microphone noise introduced for thefollowing experiments was spatially and temporally white with levels of −30dB, −20dB,−10dB, and 0dB, with respect to the noise-free microphone signal. A noise level of −40dBrepresents the conditions from Fig. 3.14. The results for a noise level of −30dB are notsignificantly different from those for −40dB, while the results for −20dB showed a slightlyreduced ERLE for NH = 48, which becomes a significant reduction for the noise levels of−10dB and 0dB. For brevity, only the latter two results are discussed in detail since theresults for −20dB and −30dB do not support further conclusions.

In Fig. 3.19 the results for a microphone signal-to-interference ratio (SIR) of 10 dB areshown, where the ERLE is limited by approximately 25 dB for all considered models.Hence, the influence of the increased microphone noise level on the ERLE is not largerfor the approximative models than for the non-approximative model.

Page 151: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.5 Experimental Results 139

0

10

20ER

LEin

dBNH = 1NH = 3NH = 5NH = 48

0

5

10

∆h(n

)in

dB

0 5 10 15 20 25 30 354321

time in seconds

Sour

ce

Figure 3.18: ERLE (upper plot), normalized misalignment (middle plot), and source ac-tivity (lower plot) as a function over time, for an AEC experiment withplane waves carrying white noise signals, and an adaptive filter lengthLH = 512 < LH = 4096

Note that the noisy microphone signal was only used for the adaptation of the filters,while the noise-free microphone signal was used to determine the ERLE. This measurewas termed “true ERLE” in [TE13b]. For the approximative models, the normalizedmisalignment is only slightly affected, while divergence can be observed for the non-approximative model.

In Fig. 3.20 results for a strong interference with an SIR of 0 dB are shown. As expected,this degrades ERLE and the misalignment severely, where the more primitive models areless effected by this impairment. This suggests to use simpler models under adverseconditions.

To investigate the influence of the spatial properties of the microphone noise, the experi-ments above were also conducted with a simulated white noise point source as microphonenoise. This was facilitated by convolving the white noise signals with impulse responsesmeasured for the considered microphone array in a completely different setup. The resultswere practically identical to those obtained for spatially white noise, such that they arenot reported for the sake of brevity. The identical results for spatially white noise and aninterfering point source emitting white noise is not surprising, as the problem of systemidentification can be treated separately for each microphone signal, as long as the noisesignal is uncorrelated with the loudspeaker signals.

Page 152: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

140

0

10

20ER

LEin

dBNH = 1NH = 3NH = 5NH = 48

−4

−20

2

4

∆h(n

)in

dB

0 5 10 15 20 25 30 354321

time in seconds

Sour

ce

Figure 3.19: ERLE (upper plot), normalized misalignment (middle plot), and source ac-tivity (lower plot) as a function over time, for an AEC experiment with planewaves carrying white noise signals, with the microphone noise level of −10 dB(SIR=10dB)

To address further scenarios with a higher practical relevance, these experiments havealso been conducted with plane waves carrying music signals, while the interferer wasrepresented by a point-like source emitting a speech signal. When comparing results forthis scenario with an SIR of 20 dB in Fig. 3.21 to the results in the low-noise scenarioin Fig. 3.15, only a moderate degradation of ERLE can be noticed, while the systemidentification is barely affected. When decreasing the SIR to 10 dB, as shown in Fig. 3.22,the adaptive filters diverge such that the system is poorly identified and only little ERLEis obtained. For an SIR of 0 dB no meaningful system identification, nor an effectiveERLE could be achieved. This is an expected result as real-world implementations ofAEC are typically combined with a DTD to avoid a filter update under such conditionsor with step-size control measures that provide a similar functionality.

The larger noise sensitivity of the adaptive filter in the scenarios comprising musicsignals can be explained when considering the signals in the limited time span used foreach adaptation step (see Sec. 3.1). Within this time span, the noise signals may beapproximately orthogonal, while the strongly autocorrelated music and speech signalswill typically show a larger inner product and, therefore, have a larger impact on theadaptation.

Page 153: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.5 Experimental Results 141

−5

0

5

ERLE

indB

NH = 1NH = 3NH = 5NH = 48

0

10

20

∆h(n

)in

dB

0 5 10 15 20 25 30 354321

time in seconds

Sour

ce

Figure 3.20: ERLE (upper plot), normalized misalignment (middle plot), and source ac-tivity (lower plot) as a function over time, for an AEC experiment with planewaves carrying white noise signals, with the microphone noise level of 0 dB(SIR=0dB)

Page 154: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

142

0

20

40

ERLE

indB

NH = 1NH = 3NH = 5NH = 48

−4

−2

0

2

4

∆h(n

)in

dB

0 5 10 15 20 25 30 354321

time in seconds

Sour

ce

Figure 3.21: ERLE (upper plot), normalized misalignment (middle plot), and source ac-tivity (lower plot) as a function over time, for an AEC experiment with planewaves carrying music signals, with the microphone interferer level of −20 dB(SIR = 20 dB)

Page 155: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.5 Experimental Results 143

0

10

20

30

ERLE

indB

NH = 1NH = 3NH = 5NH = 48

0

10

∆h(n

)in

dB

0 5 10 15 20 25 30 354321

time in seconds

Sour

ce

Figure 3.22: ERLE (upper plot), normalized misalignment (middle plot), and source ac-tivity (lower plot) as a function over time, for an AEC experiment with planewaves carrying music signals, with the microphone interferer level of −10 dB(SIR=10dB)

Page 156: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

144

3.5.4 Cost-guided acoustic echo cancellation in underdeterminedscenarios

As described in Sec. 3.3, the nonuniqueness problem will typically hamper the systemidentification in multichannel scenarios and reduce the robustness of AEC. In Sec. 3.3.5an approach to alleviate this problem by introducing an additional guiding cost matrix∼C(n) was proposed, which is evaluated in this section. Note that some of the resultspresented in this section have already been published [SK16b].

To assess the relative improvements by this approach, three other methods are comparedto it: A general wave-domain AEC without any countermeasures to the nonuniquenessproblem, an approximative model with NH = 3 that has been shown to improve the systemidentification in Sec. 3.5.2, and, as a fourth method, the loudspeaker signal decorrelationmethod proposed in [SHK13] that is tailored for WFS scenarios.

To close the gap between the results presented above and below, the scenario consideredabove is used to compare the behavior of an AEC using an LEMS according to Sec. 3.2with an AEC using an cost-guided system identification as described in Sec. 3.3.5. Inorder to make the advantages of the latter visible for AEC, the time scale of the scenarioconsidered for Fig. 3.14 has been enlarged by a factor of two such that the source activitychanges each 10 s instead of each 5 s.

All parameters are chosen identically to those used in the previous section, with theexception of

∼C(n) that was chosen according to (3.166) with β1 = 0.01, β2 = 0.1, ε = 0.01,

and βsi = 5 · 10−4, which is necessary to implement the cost-guided GFDAF algorithm asdescribed in Sec. 3.4.4.

For displaying the results in Fig. 3.23, the approximative wave-domain model is denotedby its parameter NH = 3, while the approaches using the cost-guided wave-domain systemidentification and the original GFDAF algorithm are denoted by βsi = 5 ·10−4 and βsi = 0,respectively, as both of the latter two use models with NH = 48.

For the original GFDAF algorithm with NH = 3 and NH = 48, the results presentedin the previous section are essentially verified. When considering the ERLE in the upperplot, the newly evaluated cost-guided GFDAF algorithm clearly outperforms the AECwith the original GFDAF algorithm and NH = 3 and NH = 48. This is a remarkablebehavior, since nonuniqueness does in general not limit the ERLE. Still, the GFDAFalgorithm only approximates the RLS algorithm such that it can get trapped at a localERLE maximum, which becomes more likely when nonuniqueness occurs and adaptationsteps can be directed towards a large set of directions in the filter coefficient space. Sincethe perfect system identification always implies achieving the maximum ERLE, a lowermisalignment reduces the probability to converge to such a local maximum. It can beread from the middle plot in Fig. 3.23 that the cost-guided GFDAF algorithm can achievea lower normalized misalignment than the other approaches, which then also explains itshigher ERLE in comparison to the original GFDAF algorithms with NH = 48. Still,when considering this scenario with the time scale of Fig. 3.14, the cost-guided GFDAFalgorithm only achieves an ERLE comparable to the original GFDAF algorithm. Note

Page 157: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.5 Experimental Results 145

0

50

ERLE

indB

βsi = 5 · 10−4

βsi = 0NH = 3

−4

−2

0

∆h(n

)in

dB

0 10 20 30 40 50 60 704321

time in seconds

Sour

ce

Figure 3.23: ERLE (upper plot), normalized misalignment (middle plot), and source ac-tivity (lower plot) as a function over time, for an AEC experiment comparingcost-guided system identification with different LEMS models over a longtime span

that the divergence shown by the approach with NH = 3 when three sources are active(time interval 50 to 60 seconds) does not occur for the approach with the cost-guidedGFDAF algorithm which does not use an approximative LEMS model forcing the adaptivefilter to compensate for the model restrictions.

In the following evaluations, the robustness of AEC will be challenged. Since therobustness is strongly influenced by the regularization, the regularization weight γsi wasincreased to γsi = 0.05 to allow for a fair comparison of the original GFDAF algorithmwith NH = 48 to other approaches.

For conciseness of the results, the previously discussed experiment has been simplifiedsuch that only two plane waves with the incidence angles ϕ1 = 0, ϕ2 = π/2 are alternat-ingly or simultaneously synthesized. If not stated otherwise, mutually uncorrelated whitenoise signals were used as source signals for the synthesized plane waves.

Results obtained with the decorrelation method proposed in [SHK13] are used a rep-resentation of the state-of-the-art methods for an alteration of the loudspeaker signals.The approach was implemented as described in [SHK13], where the wave field was rotatedwith an amplitude of π/48 radians, according to a sine function with a period length of301 blocks (index by n). These parameters have been found to be perceptually accept-

Page 158: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

146

0

20

40

60

80ER

LEin

dBβsi = 5 · 10−4

βsi = 0NH = 3Decorrelation

−2

−1

0

∆h(n

)in

dB

0 2 4 6 8 10 12 1421

time in seconds

Sour

ce

Figure 3.24: ERLE (upper plot), normalized misalignment (middle plot), and source ac-tivity (lower plot) as a function over time, for an AEC experiment underoptimal conditions

able [SHK13]. The results for this loudspeaker signal decorrelation approach is denotedby “decorrelation” in the legends.

In Fig. 3.24, the results for the experiments described above under optimal conditionsare presented. As there were no significant interferers in the microphone signal andthe initialization parameters of the adaptation algorithms were optimally chosen, theconditions of this experiment are considered to be less challenging and will be used as areference when challenging the robustness of the AEC approaches.

The results depicted in the upper plot of Fig. 3.24 show that the respective approachesachieve different degrees of echo cancellation, where the approach with NH = 3 and thedecorrelation method achieve a lower ERLE compared to the original and the cost-guidedGFDAF algorithm with NH = 48. When considering the normalized misalignment shownin the middle plot of Fig. 3.24, it can be seen that the decorrelation approach can alsoreduce the misalignment in comparison to the original GFDAF, while the reduction islower than for the two other approaches.

The choice of the initialization value ssi was based on information that is typically notavailable in practice. Hence, a suboptimal choice of ssi will occur often in practice andmight, e. g., be represented by the optimal value of ssi divided by a somehow arbitrarilychosen factor of 10000, as it was chosen for the results presented in Fig. 3.25. Whenconsidering the middle plot in Fig. 3.25 a significant increase in misalignment can beobserved, when using the original GFDAF algorithm with NH = 48, independently of a

Page 159: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.5 Experimental Results 147

0

20

40

60

ERLE

indB

βsi = 5 · 10−4

βsi = 0NH = 3Decorrelation

−2

0

2

4

∆h(n

)in

dB

0 2 4 6 8 10 12 1421

time in seconds

Sour

ce

Figure 3.25: ERLE (upper plot), normalized misalignment (middle plot), and source ac-tivity (lower plot) as a function over time, for an AEC experiment withsuboptimal initialization values

possibly applied loudspeaker signal decorrelation. The latter can only indirectly improvethe convergence of the algorithm by influencing the loudspeaker signal properties. As theloudspeaker signals should only be slowly altered, the improvement cannot be facilitatedin such a short time span. The approaches with the approximative model (NH = 3) andthe cost-guided GFDAF algorithm, in the contrary, can effectively prevent divergence.The divergence expressed by the increased misalignment is strong enough to even affectthe AEC performance, later in the experiment. As it can be seen in the upper plot ofFig. 3.25, the ERLE for the approaches using the original GFDAF algorithm with NH = 48is even worse than for the approximative model with NH = 3. In this experiment, theapproach with the cost-guided GFDAF algorithm outperforms the others, also in termsof ERLE.

In further experiments, short impulses (50 ms) of noise with a power level of 6 dB abovethe unaltered microphone signal (SIR= −6 dB) were added to the microphone signal. Thedisturbance was simulated as a point source interference, as described in the previoussection. This leads to two adaptation steps in the presence of an interfering signal andinfluences the convergence of the adaptation algorithm similar to an undetected double-talk situation. The latter can occur since double-talk detectors are usually not perfectlyreliable [CMB99]. The timeline for this experiment differs from the previous ones, wherethe noise interferences where introduced at t = 5 s and t = 15 s. From the beginning to

Page 160: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

148

0

20

40

60

80ER

LEin

dB

βsi = 5 · 10−4

βsi = 0NH = 3Decorrelation

−2

0

2

4

∆h(n

)in

dB

0 5 10 15 20 25 3021

time in seconds

Sour

ce

Figure 3.26: ERLE (upper plot), normalized misalignment (middle plot), and source ac-tivity (lower plot) as a function over time, for an AEC experiment withinterfering noise sources in the microphone signals

t = 25 s, the first plane wave (ϕ1 = 0) was synthesized, and from t = 25 s to the end, thesecond plane wave (ϕ2 = π/2) was synthesized.

The results of these experiments are shown in Fig. 3.26, where it can be seen that thisimpairs the system identification for all approaches, when considering the misalignment inthe middle plot. Still, the effect on the approaches with the original GFDAF and NH = 48is much stronger than for the approaches using the approximative model and the cost-guided GFDAF algorithm. Moreover, the latter approaches can recover quickly from thisdisturbance, while the decorrelation approach takes longer the recover and the originalGFDAF algorithm with NH = 48 does not recover at all. For the approaches usingthe original GFDAF and NH = 48, even the ERLE is reduced due to the bad systemidentification, as discussed above. This becomes most evident at t = 25 s where thechange of source activity results in a pronounced breakdown in ERLE for the approachesshowing a poor system identification. When considering the normalized misalignment forthe decorrelation approach in Fig. 3.24, it can be seen that this approach is only slowlyimproving the system identification. At the same time, the experiment shown in Fig. 3.26challenges the adaptation algorithm after a short period, which explains the disappointingresults for the decorrelation approach.

Since noise signals are rarely reproduced in practice, the scenario considered in Fig. 3.24was modified such that the plane waves carry music signals instead of noise. The resultsare shown in Fig. 3.27, where the upper plot shows a slight degradation of ERLE for

Page 161: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.5 Experimental Results 149

0

20

40

ERLE

indB

βsi = 5 · 10−4

βsi = 0NH = 3Decorrelation

−1

−0.5

0

∆h(n

)in

dB

0 2 4 6 8 10 12 1421

time in seconds

Sour

ce

Figure 3.27: ERLE (upper plot), normalized misalignment (middle plot), and source ac-tivity (lower plot) as a function over time, for an AEC experiment with planewaves carrying music signals

all approaches that are able to reduce the misalignment shown in the middle plot. Fur-thermore, it can be seen that the loudspeaker signal decorrelation approach shows againonly a small improvement of misalignment, where the impairment regarding ERLE isless visible compared to the previous evaluations. In this scenario, the approaches usingthe approximative model and the cost-guided GFDAF algorithm are able to significantlyimprove the system identification, while an improvement of the ERLE describing AECperformance cannot be seen.

While it was shown that the cost-guided GFDAF algorithm can improve the systemidentification, the influence of the weight factor βsi has not yet been evaluated. This is nowdone by presenting the results in Fig. 3.28, where the same experiment as in Fig. 3.24 wasconducted, using only the cost-guided GFDAF algorithm with different weight factors βsi.It can be seen that increasing βsi leads to an improved system identification, while this isat the cost of ERLE, i. e., the AEC performance. This suggests using a higher value, whensystem identification is in the focus and a lower value when ERLE is to be emphasized.This motivated the rather conservative choice of βsi = 5 · 10−4 in the experiments above.

Experimental results for the influence of noise and limited filter lengths on the cost-guided wave-domain system identification, comparable to the results shown in Figures 3.19to 3.22, are omitted here since they are expected to be very similar to those for theapproximative models under the same influences.

Page 162: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

150

0

50

ERLE

indB

βsi = 1 · 10−4

βsi = 5 · 10−4

βsi = 1 · 10−3

βsi = 2.5 · 10−3

βsi = 1 · 10−2

−2

−1

0

∆h(n

)in

dB

0 2 4 6 8 10 12 1421

time in seconds

Sour

ce

Figure 3.28: ERLE (upper plot) and normalized misalignment (middle plot), achievedwith the cost-guided wave-domain system identification and different weightparameters βsi

Alternative approaches to improve system identification

There are also further alternative approaches to improve the system identification, withoutexploiting wave-domain properties. One way is to use adaptation algorithms based onrobust statistics [BBGK06], which would be less affected by interferers and would betypically more robust in scenarios as considered for Fig. 3.26. Still, as those algorithmcan only exploit the limited information provided by the loudspeaker and microphonesignals, they will generally not be able to recover from a misalignment introduced duringnonuniqueness (see Sec. 3.3), although the misalignment introduced by interferers will ingeneral be smaller. Note that it is possible to modify the cost function of the algorithmpresented in [BBGK06] as described in Sec. 3.3.5, which is expected to lead to an evenmore robust adaptation algorithm.

The cost-guided GFDAF, as introduced in Sec. 3.4.4, can also be applied withoutusing wave-domain transforms. One way to facilitate this is to choose

∼C(n) according

to (3.168), while keeping∼hC(n) = 0LHNLNM×1. This would penalize large coefficients in

∼h(n) and therefore lead to balanced magnitudes of all coefficients. If an LEMS exhibitsapproximately equal loudspeaker-to-microphone coupling weights, as this is the case forthe considered LEMS, this will improve the system identification when nonuniquenessoccurs. Still, as the cost introduced by

∼C(n) must be chosen carefully, and as penalizing all

coefficients with the same weight allows only for a very coarse control over the convergence

Page 163: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.6 Real-Time Implementation of Acoustic Echo Cancellation 151

of the adaptive filter, the effect must be expected to be very limited. This has been verifiedby experiments which are not reported here for the sake of brevity.

In order to achieve a more powerful control over the convergence behavior, the guidancecoefficients

∼hC(n) can be chosen accordingly. To this end, previously measured impulse

responses can be used, assuring that the array setup has not been changed significantlyafter measurement. Note that this approach does not exploit any wave-domain propertiesof the LEMS. To evaluate this method, two sets of impulse responses were measured inthe same room with the same array setup. While the latter was left untouched, the roomhas been modified by moving thick curtains located at the walls. The first set of impulseresponses was used to simulate the LEMS, as it was done for the previous experiments.The second set to impulse responses was used for

∼hC(n), where the misalignment between

both sets of impulse responses is −2.4 dB, when being normalized to the first set.Like in Fig. 3.28, the experiment used for Fig. 3.24 was conducted with different values

of the weight parameter βsi, with results as shown in Fig. 3.29. From the upper plot it canbe seen that βsi ≤ 10−3 barely influence the ERLE, while βsi = 10−2 already reduces itsignificantly. The middle plot shows the normalized misalignment, which approaches thenormalized misalignment between both sets of impulse responses more closely, the largerβsi is. While this is an expected result, it also constitutes a limitation of this approach:once the misalignment achieved by the adaptive filter is lower than the misalignment be-tween the true LEMS impulse response and the measured one, this approach will no longerimprove the system identification, but hamper it instead. As it can be seen for βsi = 10−4,a suitable choice of βsi can lead to a slightly lower misalignment than the misalignmentbetween the two sets of impulse responses. Considering the fact, that only minor changesto the room already lead to a misalignment of −2.4 dB, it can be concluded that thisapproach is only suitable to improve the system identification in specific scenarios. Notethat a wave-domain representation of the LEMS will not alleviate this problem.

3.6 Real-Time Implementation of Acoustic EchoCancellation

In this section, the computational savings of implementing an AEC system using wave-domain approximative models are briefly discussed. First, the number of floating-pointoperations (FLOPs) needed for each iteration of the adaptation algorithms is presented,before an AEC real-time demonstrator is described.

When implementing a WDAF AEC system using the GFDAF algorithms as describedin Appendix D, a large number of multiplications and additions have to be computed.For many central processing units (CPUs), computing a multiplication followed by anaddition takes the same time as computing the multiplication alone. Thus, a combinationof both operations is referred to as one FLOP when measuring the computational effortin the following. Furthermore, a lower number of quotients and square roots has to bedetermined. As the latter two operations are typically far more demanding to a CPU

Page 164: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

152

0

20

40

60

80ER

LEin

dBβsi = 10−5

βsi = 10−4

βsi = 10−3

βsi = 10−2

−3

−2

−1

0

∆h(n

)in

dB

0 2 4 6 8 10 12 1421

time in seconds

Sour

ce

Figure 3.29: ERLE (upper plot), normalized misalignment (middle plot) and source activ-ity (lower plot), achieved by the guidance with a previously measured MIMOimpulse response and different weight parameters βsi.

than multiplication and addition, computing quotients and square roots are consideredto be equivalent to four FLOPs. This relation was found empirically, by measuring thecomputation time for a large number of the respective operations. The computationaleffort for each iteration of an GFDAF adaptation with NM = 10, LD = LF = 512, LX =LH +LD− 1, LH = 4096, and LT = 128 is shown in Fig. 3.30, where a conventional point-to-point LEMS model is compared to four approximative models according to (3.34) withNH = 3, 5, 10, 20. It can be clearly seen, that wave-domain approximative models areincreasingly attractive, when using a higher number of loudspeakers. Still, the decreasedcomputational effort for filter adaptation, is paid by a larger effort for transforms andby sacrificing some redundancy in the matrix inverse, which could be exploited for anefficient implementation (see Appendix D). This implies that using approximative modelsis not indicated for systems with a low number of loudspeaker channels.

The AEC demonstrator in its considered form (Release of Sep. 11, 2013) uses the trans-forms in Sec. 2.4.2, in conjunction with the GFDAF as described by (3.158) in Sec. 3.4.4and the LEMS model with NH = 3 as described by (3.34) in Sec. 3.2. The currently usedtransforms have been determined for a circular loudspeaker and a circular microphone ar-ray, arranged as shown in Fig. 2.5(a). A screenshot of the graphical user interface (GUI)is given in Fig. 3.31, which shows the signal model of the demonstrator (see also Fig. 3.2).The loudspeaker and microphone arrays actually used with the demonstrator are shownin Fig. 3.32 and are equipped with 48 loudspeakers and 10 microphones. The transforms

Page 165: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

3.6 Real-Time Implementation of Acoustic Echo Cancellation 153

5 10 15 20 25 30 35 40 45

107

108

NL

FLO

Ps

point-to-point NH = 3NH = 5 NH = 10NH = 20

Figure 3.30: Computational effort for WDAF GFDAF adaptation per iteration (indexedby n). Values for NH > NL are not shown because such models do not exist.

“T1” and “T2” shown in Fig. 3.31 represent the loudspeaker signal transform (LST) andthe microphone signal transform (MST) as given by (2.215) and (2.217), respectively, inSec. 2.3.3, while “T3” represents the inverse of the MST given by (2.216). The transformsused in the demonstrator perform a sampling rate conversion as described in [MK97],such that the GFDAF algorithm operates at a sampling rate of 22050 Hz while the WFSreproduction uses a sampling rate of 44100 Hz. In this setup, the demonstrator operateswith a filter length of 6000 samples modeling NH = 3 couplings per microphone signalcomponent, i. e., 30 × 6000 = 180 000 coefficients on an Intel Core 2 Quad Processor(Model Q9450) with a clock frequency of 2.66GHz.

Page 166: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

154

Figure 3.31: Screen shot of the WDAF AEC real-time demonstrator

(a) (b)

Figure 3.32: Photographs of the used loudspeaker array (a) and microphone array (b)

Page 167: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

155

4 Wave-Domain Equalization ofReproduced Acoustic Scenes

This chapter is concerned with listening room equalization (LRE) in the wave domain.In Sec. 4.1, the task definition of LRE from Sec. 2.3.5 is related to the signal quantitiesand the solution of a least-squares problem that are considered in this chapter. Then,an approximative wave-domain equalizer structure is presented in Sec. 4.2, which canbe used to reduce the computational effort of an implementation. Similar to systemidentification, LRE might lack a unique solution, which is discussed in Sec. 4.3. Forimplementation, the task of LRE can be interpreted as determining equalizers for pre-viously estimated loudspeaker-enclosure-microphone system (LEMS) impulse responses,which can be obtained as described in Chapter 3. For brevity, the estimated impulseresponses will be referred to as estimated LEMS in the following. Determining equalizersfor an estimated LEMS is treated in Sec. 4.4, before the application of the adaptationalgorithms for equalizer determination is discussed in Sec. 4.5. In the latter section, theiterative DFT-domain inversion (IDI) algorithm is treated in addition to the algorithmsalready derived in Sec. 3.4. In Sec. 4.4, an LRE system is evaluated that uses the modelsand methods described in this thesis, before the actual implementation of an LRE systemis briefly described in Sec. 4.7.

4.1 Signal Model and Task Definition

In this section, the task of listening room equalization is explained and signal models forthe task definition and for an implementation of LRE systems are discussed.

As explained in Sec. 2.3.5, the problem of equalizing a reproduced scene at the listener’sposition, is approximated by equalizing this scene at multiple microphone positions. Thesignal model for this problem is shown in Fig. 4.1, where the multiple-input multiple-output (MIMO) equalizer G(ω) from Fig. 2.12 is represented by the NLLY×NLLX matrixG(n). In the same way, the time-invariant LEMS H(ω) and the desired system responseH 0(ω) are represented by H and H0, respectively. The loudspeaker signals x(ω), theequalized loudspeaker signals y(ω), the microphone signals d(ω) and the error signale′(ω), are now represented as x(k), y(k), d(k), and eeq(k), respectively. The error signalin Fig. 4.1 is given by

Page 168: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

156

G(n) H

H0

x(k) y(k) d(k) − eeq(k)+

Figure 4.1: Discrete-time signal model for LRE task definition

eeq(k) = H0x(k)−HG(n)x(k), (4.1)

where eeq(k) captures NM signal segments of length LD each, as described for d(k) in(2.203). The desired impulse responses described by H0 are often chosen to be the free-field impulse responses between loudspeakers and microphones. This is because reproduc-tion approaches like wave field synthesis (WFS) or Higher-Order Ambisonics (HOA) aretypically optimized for free-field conditions. Hence, the influence of the actual listeningroom has to be compensated, which is also the aim in the following considerations.

The equalized loudspeaker signals are represented by the vector

y(k) = (y1(k),y2(k), . . . ,yNL(k))T , (4.2)yλ′(k) = (yλ′(k − LY + 1), yλ′(k − LY + 2), . . . , yλ′(k))T , (4.3)

where the individual channels are indexed by λ′, although they are later fed to loudspeakerλ. This is necessary to consistently describe a coupling of all original loudspeaker signals(indexed by λ) to each equalized loudspeaker signal. This signal vector is given by

y(k) = G(n)x(k). (4.4)

The equalizers are represented by the NLLY×NLLX matrix G(n), which captures NL×NL

convolution matrices defined by the impulse responses gλ′,λ(k, n) such that

yλ′(k) =NL∑λ=1

LG−1∑κ=0

xλ(k − κ)gλ′,λ(κ, n). (4.5)

The signal segments in x(k) are of length LX, which results in signal segments in y(k) oflength LY = LX − LG + 1.

The equalizers are to be determined such that eeq(k) is minimized with respect to asuitable norm. One approach is to minimize the Euclidean norm ‖eeq(k)‖2 in the meansquare sense, which will be considered in the following. To this end, the square of (4.1) isminimized to obtain equalizers according to

G(n) = argminG(n)

{E{eHeq(k)eeq(k)

}}, (4.6)

eHeq(k)eeq(k) = (H0x(k)−HG(n)x(k))H(H0x(k)−HG(n)x(k)). (4.7)

Page 169: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

4.1 Signal Model and Task Definition 157

G(n− 1)

Equalizerdetermination

Systemidentification H

x(k)

d(k)

y(k)

H(n)H0

G(n)

Figure 4.2: Signal model for the implementation of a conventional LRE system in thepoint-to-point domain. H: LEMS, H(n): estimated LEMS, H0: desiredLEMS system response, G(n): equalizer, x(k): original loudspeaker signal,y(k): equalized loudspeaker signal, d(k): microphone signal

Unfortunately, (4.6) does not allow for a similarly straightforward solution as (3.6), asthere are two obstacles to overcome: First, H cannot be directly observed, and second,(4.7) has to be rearranged such that (4.6) can be solved for G(n).

To overcome the first problem, an estimated LEMS can be used to substitute theunknown H, where the task of system identification has been extensively discussed inChapter 3. All results obtained there can be straightforwardly applied here, when replac-ing the original loudspeaker signal x(k) by the equalized loudspeaker signals y(k). Thelatter have then to be used for system identification, which requires also to accordinglychoose the signal segment size LX used for system identification and the length LH of theidentified impulse responses. Note that the value of LD chosen for system identificationcan differ from the value of LD for equalization considered in the following. The estimatedLEMS H(n) is fed to the algorithm for the equalizer determination, which solves the op-timization problem illustrated in Fig. 4.1 using H(n) instead of H. The signal modelfor such an LRE implementation is shown in Fig. 4.2 where the actual equalization hasto be based on the equalizers determined in the previous adaptation iteration to avoid acausality dilemma with the system identification. Still, the following analysis will considerequalizing given loudspeaker signals for a given (or known) LEMS first, before the taskof determining equalizers for an estimated LEMS is discussed.

Like system identification, the task of listening room equalization can be described inthe wave domain including conventional LRE in the point-to-point domain as a specialcase. Thus, only a wave-domain LRE will be considered in the remainder of this chapter,in order to reduce the number of quantities that need to be defined. In doing so, LT = 1,TLTL = INLLX ,TL = TH

L and TMTM = INMLD ,TM = THM are assumed, if not stated

otherwise. While these assumptions are helpful for a rigorous mathematical analysis, animplementation may use LT > 1 and essentially retain all properties described in thefollowing. The signal model describing the task of wave-domain LRE is shown in Fig. 4.3,where

∼G(n),

∼H0, ∼y(k), and ∼eeq(k) represent G(n), H0, y(k), and eeq(k) of Fig. 4.1,

respectively. Similar to (4.4), the equalized wave-domain loudspeaker signals are obtained

Page 170: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

158

TL H TM

TL∼G(n) ∼

H TM

∼H0

wave domain

x(k) ∼x(k) ∼y(k)∼d(k) −

∼eeq(k) eeq(k)+TL TM

Figure 4.3: Discrete-time signal model for LRE task definition in the wave domain

according to

∼y(k) =∼G(n)∼x(k), (4.8)

where∼G(n) captures the equalizer impulse responses ∼

gl′,l(k, n) used to filter the originalwave-domain loudspeaker signal in ∼x(k) indexed by l to obtain the equalized loudspeakersignal component in ∼y(k) indexed by l′. Thus,

∼G(n) is structured analogously to (4.5)

such that

∼yl′(k) =

NL∑l=1

LG−1∑κ=0

∼xl(k − κ)∼

gl′,l(κ, n), (4.9)

where the equalized wave-domain loudspeaker signals are captured in

∼y(k) = (∼y1(k), ∼y2(k), . . . , ∼yNL(k))T , (4.10)∼yl′(k) = (∼

yl′(k − LY + 1), ∼yl′(k − LY + 2), . . . , ∼

yl′(k))T . (4.11)

Note that LT = 1 was exploited to avoid defining a wave-domain representation of LY.The wave-domain representations of (4.1), (4.6), and (4.7) are given by

∼eeq(k) =∼H0

∼x(k)−∼H

∼G(n)∼x(k), (4.12)

∼eHeq(k)∼eeq(k) =( ∼H0

∼x(k)−∼H

∼G(n)∼x(k)

)H ( ∼H0

∼x(k)−∼H

∼G(n)∼x(k)

), (4.13)

∼G(n) = argmin

∼G(n)

{E{

∼eHeq(k)∼eeq(k)}}

, (4.14)

where the unitary transforms as given by, e.g., (2.228) and (2.229) ensure that minimizing(4.13) is identical to minimizing (4.7). Since the desired impulse responses are chosento be the free-field impulse responses between the loudspeakers and the microphones,

Page 171: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

4.1 Signal Model and Task Definition 159

their wave-domain representation∼H0 exhibits a specific property: As the cascade of the

loudspeaker signal transform (LST) and the inverse microphone signal transform (MST)already models many properties of the free-field impulse responses (see Sec. 2.3.3),

∼H0

couples only modes of the same order:

∼H0 =

∼H0,1,1 0LD×LX . . . 0LD×LX

0LD×LX

∼H0,2,2 . . . 0LD×LX

... ... . . . ...0LD×LX 0LD×LX . . .

∼H0,NM,NL

, (4.15)

where∼H0,m,l are convolution matrices as described in (2.208), capturing the impulse

responses∼h0,m,l(k).

As (4.14) describes a least-squares optimization, setting the complex (or Wirtinger)gradient [Bra83, Fis02] of (4.13) to zero leads to a solution for

∼G(n), where (4.13) has

to be rearranged first such that it can be solved for∼G(n). Such a rearrangement can be

done in the same way as already done for filtering with the identified system in (3.22).Thus, ∼g(n) and

∼X′(k) are defined by

∼X′(k)∼g(n) =

∼G(n)∼x(k), (4.16)

where∼X′(k) and

∼G(n) have LY = LD +LH−1 rows, while the elements of ∼g(n) and

∼G(n)

describe equalizer impulse responses of length LG, and the length of the signal segmentsin ∼x(k) is LD +LH +LG− 1. The length-N2

LLG vector capturing the equalizer coefficientsis defined by

∼g(n) =(

∼gT1,1(k, n), ∼gT1,2(k, n), . . . , ∼gT1,NL(k, n), ∼gT2,1(k, n), ∼gT2,2(k, n), . . . , ∼gT2,NL

(k, n),

. . . ,∼gTNL,1(k, n), ∼gTNL,2(k, n), . . . , ∼gTNL,NL

(k, n))T

(4.17)∼gl′,l(n) = (∼

gl′,l(0, n), ∼gl′,l(1, n), . . . , ∼

gl′,l(LG − 1, n))T . (4.18)

The loudspeaker signals are then captured in the NL(LD + LH − 1)×NLNLLG matrix∼X′(k) = INL ⊗

( ∼X′1(k),

∼X′2(k), . . . ,

∼X′NL

(k)), (4.19)

∼X′l(k) =

∼xl(k − LY + 1) ∼

xl(k − LY) · · · ∼xl(k − LY − LG + 2)

∼xl(k − LY + 2) ∼

xl(k − LY + 1) · · · ∼xl(k − LY − LG + 3)

... ... . . . ...∼xl(k) ∼

xl(k − 1) · · · ∼xl(k − LG + 1)

. (4.20)

The structure of the matrices on the left-hand side and on the right-hand side of (4.16)is shown in Fig. 4.4. The gradient of (4.13) is given by

∂∼eHeq(k)∼eeq(k)∂

∼gH(n)= −

( ∼X′(k)

)H ∼HH

∼H0

∼x(k) +( ∼X′(k)

)H ∼HH

∼H

∼X′(k)∼g(n), (4.21)

Page 172: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

160

∼y(k) =∼G(n) · ∼x(k)

= ·

∼y(k) =∼X′(k) · ∼g(n)

= ·

∼y1(k)∼y2(k)

∼G1,1(n)

∼G1,2(n)

∼G2,1(n)

∼G2,2(n)

∼x1(k)

∼x2(k)

∼y1(k)∼y2(k)

∼X′1(k)

∼X′2(k)

∼X′1(k)

∼X′2(k)

∼g1,1(n)∼g1,2(n)∼g2,1(n)∼g2,2(n)

= column vector

= convolution matrix similar to (2.208)

= convolution matrix according to (4.20)

Figure 4.4: Differently structured matrices describing the same MIMO filtering operationfor NL = 2

such that (4.14) can be represented by

E{( ∼

X′(k))H ∼

HH∼H

∼X′(k)

}∼g(n) = E

{( ∼X′(k)

)H ∼HH

∼H0

∼x(k)}. (4.22)

In order to obtain a formulation similar to (3.25),∼Z(k) =

∼H

∼X′(k), (4.23)

∼z0(k) =∼H0

∼x(k), (4.24)∼RZZ = E

{∼ZH(k)

∼Z(k)

}, (4.25)

∼rZz0 = E{∼ZH(k)∼z0(k)

}(4.26)

can be defined such that

∼RZZ

∼g(n) = ∼rZz0. (4.27)

The latter equation can be solved using the inverse∼R−1

ZZ if∼RZZ is non-singular or the

Page 173: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

4.2 Scalable Wave-Domain Equalizer Structure 161

Moore-Penrose pseudoinverse∼R†ZZ if the problem is underdetermined. While (4.27) de-

scribes a least squares solution for ∼g(n) as (3.25) does for∼h(n), there are also significant

differences:

• The matrix∼RZZ depends not only on the loudspeaker signal properties, but also on

the LEMS properties.

• All components of ∼g(n) are coupled to all components captured in ∼eeq(k), becauseeach of the components of ∼g(n) is associated with one equalized loudspeaker signal,which is then coupled to all microphone signals.

• The quantities∼RZZ and ∼rZz0 do not describe the properties of any signals visible

in Fig. 4.3 nor in Fig. 4.5. Instead, they describe the properties of the originalloudspeaker signals as they would appear after being filtered through the consideredLEMS. Obtaining this signal property description requires considering an estimatedLEMS instead of the unknown true one.

The signal model for the implementation of a wave-domain LRE is depicted in Fig. 4.5.Again, when comparing Figures 4.2 and 4.5, a strong similarity between the conventionalpoint-to-point signal model and the wave-domain signal model can be seen, like it is alsothe case for system identification when comparing Figures 3.1 and 3.3. Like in Fig. 4.2,the loudspeaker signals used to identify the LEMS are pre-equalized using the previouslydetermined equalizers:

∼y(k) =∼G(n− 1)∼x(k). (4.28)

The estimated LEMS is then used to determine the equalizers by another adaptationalgorithm minimizing the error shown in Fig. 4.3. It can be seen from Figures 4.3 and 4.5that the inverse LST is always needed for LRE. Unlike wave-domain system identification,there is no alternative choice for structures to realize a wave-domain LRE that would bemeaningful for the scenarios considered in this thesis.

4.2 Scalable Wave-Domain Equalizer StructureSimilar to system identification, it is possible to use a reduced, or down-scaled, equal-izer structure approximating optimal equalizers in the wave domain. This approach isexplained in this section and has already been presented in [SK12a], where it was foundto reduce the computational effort for LRE drastically.

The possibility to successfully apply an approximative equalizer structure results alsofrom the dominant mode couplings of the wave-domain LEMS (see Sec. 2.4.2 andFig. 4.6(a)). For explanation, a wave-domain LEMS

∼H, where the coupling weights are

decreasing with their distance to the main diagonal is considered first. At the same time,

Page 174: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

162

TL∼G(n− 1) TL

Equalizerdetermination

Systemidentification H

TM

wave domain

TM

TL TL

x(k) ∼x(k) y(k)

d(k)∼d(k)

∼y(k)

∼H(n)∼

H0

∼G(n)

Figure 4.5: Signal model of a wave-domain LRE system. H: LEMS, H(n): estimatedLEMS, H0: desired LEMS system response, G(n): equalizer, x(k): origi-nal loudspeaker signal, y(k): equalized loudspeaker signal, d(k): microphonesignal

the desired equalized wave-domain LEMS response, given by∼H0, is diagonal as illus-

trated in Fig. 4.6(b). Moreover, aiming at a loudspeaker signal-independent solution, theultimate goal for LRE is described by

∼H

∼G(n) =

∼H0. (4.29)

Consequently, the cascade of the equalizers and the LEMS represented as a matrix mul-tiplication should also be diagonal. Since

∼H has a strong diagonal dominance and

∼H0 is

strictly diagonal, an optimal∼G(n) will typically also be diagonally dominant.

The weight of the optimal equalizers and the resulting approximative equalizer structureis illustrated in Fig. 4.6. There, the weights of the wave-domain model couplings of theLEMS are illustrated in Fig. 4.6(a), while Fig. 4.6(b) shows the strictly diagonal structureof

∼H0. The resulting wave-domain equalizer weights are shown in Fig. 4.6(c). This

motivated the approximation for the equalizers illustrated in Fig. 4.6(d).In an approximative equalizer structure, only certain equalizers will be determined,

where this structure is defined by the NL×NL matrix MG. The columns of MG correspondto the index l of the original loudspeaker signal and the rows correspond to the index l′

of the equalized loudspeaker signal. An entry in this matrix is equal to one, wheneverthe respective signals are coupled by the equalizers (or zero otherwise), such that theapproximative equalizers must fulfill

∼g(n) = Diag(vec

(MT

G

)⊗ 1LG×1

)∼g(n). (4.30)

This implies that certain elements in ∼g(n) are zero.

Page 175: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

4.3 Uniqueness of Equalizers 163

When using circular harmonics for the wave-domain representations, the dominantentries in

∼G(n) are those coupling modes with a low difference in their mode order. This

suggests defining

[MG]l′,l ={

1 for |l − l′| ≤ (NG − 1)/2,0 otherwise, (4.31)

where l and l′ are obtained from l and l′, respectively, using (2.170). The parameter NG

allows to choose the number of equalizer couplings per original loudspeaker input signal.When approximative wave-domain LEMS models are considered, it should be ensured thatthe complexity of the equalizers and the LEMS model is somehow balanced. Empiricalevidence showed that increasing NG above the value of NH only leads to a rather limitedimprovement of the equalization performance. The same holds for increasing NH witha fixed value of NG. Hence, NG = NH appears to be a suitable choice with respect tocomputational efficiency. When choosing an approximative equalizer structure as shownin Fig. 4.6(d) in conjunction with an approximative LEMS model (Fig. 4.6(e)), a furtherrelation becomes obvious: A single original loudspeaker signal ∼

xl(k) only influences NG

equalized loudspeaker signals, where each of those can influence only NH wave-domainmicrophone signals of the estimated LEMS. Consequently, a single original loudspeakersignal can only influence up to NGNH wave-domain microphone signals, which limitsthe number of relevant components in the corresponding error signal, as illustrated inFig. 4.6(f). Hence, the number of error signals considered for each original wave-domainloudspeaker signal can be limited to NE = NGNH without any cost, where NE < NGNH

can be used as an additional approximation. The actually considered signals are thendetermined by ME, where the rows and columns of this NM × NL matrix correspond tothe indices m and l of the wave-domain error signals in ∼eeq(k) and the original wave-domain loudspeaker signals in ∼x(k), respectively. A suitable definition for wave-domainadaptive filtering (WDAF) using circular harmonics is then given by

[ME]m,l =

1 for∣∣∣l − m∣∣∣ ≤ (NE − 1)/2,

0 otherwise.(4.32)

Since all loudspeaker signals are considered simultaneously so far, all NM error signals willtypically have to be determined. Still, the reason to consider ME will become plausiblelater, when the loudspeaker signals will be considered separately.

4.3 Uniqueness of EqualizersAs shown in Sec. 3.3, the loudspeaker signal correlation properties can lead to nonuniquesolutions to the system identification problem. This raises the question, how the cor-relation properties of the loudspeaker signals influence the uniqueness of ∼g(n), which isinvestigated in this section. For this investigation it is necessary to consider the loud-speaker signal properties as well as the properties of the LEMS simultaneously, which

Page 176: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

164

l′

m

(a)

l

m

(b)

l

l′

(c)

l

l′

(d)

l′

m

(e)

l

m

(f)

Figure 4.6: Illustration of the LEMS model and the corresponding equalizer structure:(a) weights of the wave field component couplings in the true LEMS

∼H,

(b) weights of the wave field component couplings for the desired system re-sponse

∼H0, (c) resulting weights of the wave-domain equalizers

∼G(n), (d) ap-

proximative wave-domain equalizer structure NG = 3 (see (4.31)), (e) usedapproximative wave-domain LEMS model NH = 3 (see (3.34)), (f) resultinginfluence on the error signals

renders the analysis more challenging than the analysis of the nonuniqueness problem forsystem identification. Thus, the results presented in the following cannot be as detailed asthe results presented in Sec. 3.3. Within this section, the LEMS is assumed to be knownas system identification is not in the focus of the considerations.

Whenever nonuniqueness results from a strong correlation of the loudspeaker signals,a low robustness against a change of the loudspeaker signal correlation properties is ex-pected. If nonuniqueness results from the LEMS impulse responses, the equalization apartfrom the microphone positions will typically degrade, i. e., the spatial robustness of theequalizers will be low.

As can be seen from (4.27), the equalizers ∼g(n) are uniquely determined whenever∼RZZ

is non-singular. To separate the influence of the statistical properties of∼X′(k) and the

(deterministic) properties of∼H on the rank of

∼RZZ, an expression equivalent to (4.25) will

be used, where both quantities are commuted. To this end,∼X′′(k) and

∼H′′ are defined

according to

∼H

∼X′(k) =

∼X′′(k)

∼H′′, (4.33)

Page 177: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

4.3 Uniqueness of Equalizers 165

where the structures of∼H′′ and

∼X′′(k) are given by

∼H′′ =

INL ⊗

∼H′′1,1 INL ⊗

∼H′′1,2 · · · INL ⊗

∼H′′1,NL

INL ⊗∼H′′2,1 INL ⊗

∼H′′2,2 · · · INL ⊗

∼H′′2,NL... ... . . . ...

INL ⊗∼H′′NM,1 INL ⊗

∼H′′NM,2 · · · INL ⊗

∼H′′NM,NL

, (4.34)

∼H′′m,l =

∼hm,l(0) 0 · · · 0 0∼hm,l(1)

∼hm,l(0) · · · 0 0

... ... . . . ... ...∼hm,l(LH − 1)

∼hm,l(LH − 2) · · ·

∼hm,l(0) 0

0∼hm,l(LH − 1) · · ·

∼hm,l(1)

∼hm,l(0)

... ... . . . ... ...0 0 · · · 0

∼hm,l(LH − 1)

, (4.35)

and∼X′′(k) = INM ⊗

( ∼X′′1(k),

∼X′′2(k), . . . ,

∼X′′NL

(k)), (4.36)

∼X′′l (k) =

∼xl(k − LD + 1) ∼

xl(k − LD) · · · ∼xl(k − LD − LGH + 2)

∼xl(k − LD + 2) ∼

xl(k − LD + 1) · · · ∼xl(k − LD − LGH + 3)

... ... . . . ...∼xl(k) ∼

xl(k − 1) · · · ∼xl(k − LGH + 1)

. (4.37)

Here,
\[
L_{GH} = L_G + L_H - 1 \tag{4.38}
\]
represents the temporal length of the convolution product of the equalizers and the LEMS impulse responses. The structure of ∼H′′m,l describes a transposed convolution matrix like, e. g., H^T_{µ,λ}, where the structures of both sides of (4.33) are illustrated in Fig. 4.7. As ∼H is deterministic, it can be placed outside the expectation operator
\[
\tilde{\mathbf{R}}_{ZZ} = \big( \tilde{\mathbf{H}}'' \big)^{H} \, \mathrm{E}\big\{ \big( \tilde{\mathbf{X}}''(k) \big)^{H} \tilde{\mathbf{X}}''(k) \big\} \, \tilde{\mathbf{H}}'', \tag{4.39}
\]

where the structure of the right-hand side of (4.39) is illustrated in Fig. 4.8. This allows defining an autocorrelation matrix of the wave-domain loudspeaker signals according to
\[
\tilde{\mathbf{R}}''_{XX} = \mathrm{E}\big\{ \big( \tilde{\mathbf{X}}''(k) \big)^{H} \tilde{\mathbf{X}}''(k) \big\}, \tag{4.40}
\]


[Figure 4.7: Differently structured matrices describing the same MIMO filtering operation (4.33) for NL = NM = 2, showing the factors ∼H ∼X′(k) and ∼X′′(k) ∼H′′ with their submatrices (convolution matrices according to (2.208), (4.20), (4.37), and (4.35)). The matrix ∼Hm,l denotes the wave-domain counterpart of Hµ,λ.]


[Figure 4.8: Structure of the matrices determining the least-squares solution for LRE: ∼RZZ = (∼H′′)H · ∼R′′XX · ∼H′′, composed of single-channel correlation matrices and convolution matrices according to (4.35).]

where applying (3.31) leads to
\[
\tilde{\mathbf{R}}''_{XX} = \mathbf{I}_{N_M} \otimes \mathrm{E}\big\{ \big( \tilde{\mathbf{X}}''_1(k), \ldots, \tilde{\mathbf{X}}''_{N_L}(k) \big)^{H} \big( \tilde{\mathbf{X}}''_1(k), \ldots, \tilde{\mathbf{X}}''_{N_L}(k) \big) \big\} \tag{4.41}
\]
\[
\phantom{\tilde{\mathbf{R}}''_{XX}} = L_D \, \mathbf{I}_{N_M} \otimes \big( \mathbf{T}_L \mathbf{R}_{XX} \mathbf{T}^{H}_L \big). \tag{4.42}
\]

The matrix RXX is defined according to (2.238), where LX = LGH has to be considered to account for the dimensions of ∼X′′l(k). Resulting from (3.36), ∼R′′XX is only invertible when RXX is full-rank, where the condition for RXX to be full-rank is given in (2.240). Hence, a necessary condition for ∼RZZ to be invertible is given by
\[
N_L L_{GH} \le N_S (L_{GH} + L_R - 1). \tag{4.43}
\]

According to (4.39), ∼H′′ must fulfill further requirements to assure that ∼RZZ is invertible, as discussed in the following: Assuming all impulse responses described by ∼H′′m,l to be linearly independent, the rank of ∼H′′ is given by the minimum of its row dimension NM NL LGH and its column dimension N_L^2 LG. Since the dimensions of ∼RZZ are defined by the number of columns of ∼H′′, the number of rows in ∼H′′ must be equal to or larger than the number of columns, which is fulfilled when
\[
N_L L_G \le N_M L_{GH}. \tag{4.44}
\]

Whenever (4.43) and (4.44) are fulfilled, optimal equalizers are uniquely determined.

It can be seen from (4.44) that the number of microphones NM has a crucial influence on whether equalizers are uniquely defined, where choosing NM ≥ NL assures that (4.44) is fulfilled.

Finding a solution for (4.27) can be seen as two separate tasks which are accomplished simultaneously: determining a set of equalized loudspeaker signals that would reproduce the desired (equalized) scene, and finding equalizers to filter the original loudspeaker signals such that the equalizer output signals are a member of this set. This allows interpreting (4.44) as a condition for the existence of uniquely defined equalized loudspeaker signals, which is determined by the LEMS properties. On the other hand, (4.43) can be interpreted as the condition for the uniqueness of the equalizers which are suitable to produce a specific set member by filtering the original loudspeaker signals. The fulfillment of this condition is determined by the loudspeaker signal correlation properties.
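To make the interplay of the two conditions tangible, the following minimal Python sketch (not part of the thesis; the function name and the example values are hypothetical) simply evaluates (4.43) and (4.44) for a given configuration:

```python
def equalizer_uniqueness_conditions(N_L, N_M, N_S, L_G, L_H, L_R):
    """Evaluate the necessary conditions (4.43) and (4.44) for unique equalizers.

    N_L: loudspeakers, N_M: microphones, N_S: (virtual) sources,
    L_G: equalizer length, L_H: LEMS impulse response length,
    L_R: reproduction (synthesis) filter length.
    """
    L_GH = L_G + L_H - 1                                  # (4.38)
    signal_cond = N_L * L_GH <= N_S * (L_GH + L_R - 1)    # (4.43): loudspeaker signal properties
    lems_cond = N_L * L_G <= N_M * L_GH                   # (4.44): LEMS/microphone properties
    return signal_cond, lems_cond

# Hypothetical example values; with N_M >= N_L, condition (4.44) is always fulfilled.
print(equalizer_uniqueness_conditions(N_L=48, N_M=48, N_S=48, L_G=256, L_H=128, L_R=28))
```

Since LG ≤ LGH, the second condition reflects the statement above that NM ≥ NL is sufficient for (4.44).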

Approximative equalizer structures (see Sec. 4.2) and LEMS models (see Sec. 3.2) will generally influence the conditions for uniqueness of the equalizers. However, this influence cannot be described as straightforwardly as the influence of approximative LEMS models on the uniqueness of system identification in Sec. 3.3.2. In general, approximative equalizer structures will reduce the dimension of the nullspace because fewer unknowns have to be determined.

An approximative LEMS model can be described by setting the respective ∼H′′m,l to zero, which violates the condition of linear independence of all ∼H′′m,l required above. However, while the linear independence is a sufficient requirement for ∼H′′ to be full-rank, it is not a necessary requirement. Hence, it can be relaxed to ∼H′′ having at least NL LG linearly independent rows or columns. In this regard, setting specific ∼H′′m,l to zero can result in linear dependence, or remove the latter. Unfortunately, (4.44) does not allow for a straightforward derivation of the requirements for the approximative LEMS model to lead to unique equalizers. Still, in order to avoid nonuniqueness implied by approximative models, wave-domain LEMS models must couple each loudspeaker signal to at least one microphone signal. Furthermore, if the same set of microphone signals is coupled to multiple loudspeaker signals, the number of microphone signals in this set must be equal to or greater than the number of loudspeaker signals coupled to this set of microphone signals. The model described by (3.34) couples a unique non-empty set of microphone signals to each of the loudspeaker signals such that it fulfills these requirements.


4.4 Determining Equalizers for Estimated Loudspeaker-Enclosure-Microphone Systems

In this section, the determination of the equalizers considering a MIMO impulse response of an LEMS is discussed.

As mentioned above, the true LEMS ∼H is unknown in practice and an estimated LEMS must be used to determine the equalizers. While the quantity ∼H is deterministic, the estimated LEMS ∼H(n) depends on stochastic loudspeaker signals, which renders it an actual random variable. Still, determining and discussing its statistical properties exceeds the scope of this thesis, so ∼H(n) is treated as a deterministic but unknown quantity in the following. In this context, the unavoidable interaction between system identification and equalizer determination renders the task of LRE based on an identified system very challenging. This might explain why results for this configuration have not been published broadly.

As the signals fed to the estimated LEMS for equalizer determination do not necessarily need to be reproduced in the true LEMS, the excitation signals for the equalizer determination can be freely chosen [GKMK08]. To differentiate between the original loudspeaker signals and the excitation signals for the equalizer determination, the latter will be denoted as x(k). While a suitable choice of the excitation signals ensures fulfillment of (4.43), it is not guaranteed that an estimated LEMS will fulfill (4.44). Moreover, if there is no unique solution for system identification when the equalizers have to be determined for the estimated system, there is obviously no unique solution for optimal equalizers.

In order to determine equalizers, (4.27) must be solved, where the original loudspeaker signals ∼x(k) and the true LEMS ∼H are replaced by x(k) and ∼H(n), respectively. Using (4.39) for an according redefinition of ∼RZZ and ∼rZz0 leads to
\[
\mathbf{R}_{ZZ} = \big( \mathbf{H}''(n) \big)^{H} \, \mathrm{E}\big\{ \big( \mathbf{X}''(k) \big)^{H} \mathbf{X}''(k) \big\} \, \mathbf{H}''(n) \tag{4.45}
\]
\[
\phantom{\mathbf{R}_{ZZ}} = \big( \mathbf{H}''(n) \big)^{H} \, \mathbf{R}''_{XX} \, \mathbf{H}''(n), \tag{4.46}
\]
where H′′(n) and X′′(k) replace ∼H(n) and x(k), respectively. This shows that the autocorrelation matrix R′′XX describes the correlation properties of x(k) in the same way as ∼R′′XX does for ∼x(k).

For an equivalent redefinition of ∼rZz0, ∼H0 has to be represented by a vector according to
\[
\mathbf{h}_0 = \big( \mathbf{h}^{T}_{0,1,1}, \mathbf{h}^{T}_{0,1,2}, \ldots, \mathbf{h}^{T}_{0,1,N_L}, \mathbf{h}^{T}_{0,2,1}, \mathbf{h}^{T}_{0,2,2}, \ldots, \mathbf{h}^{T}_{0,2,N_L}, \ldots, \mathbf{h}^{T}_{0,N_M,1}, \mathbf{h}^{T}_{0,N_M,2}, \ldots, \mathbf{h}^{T}_{0,N_M,N_L} \big)^{T}, \tag{4.47}
\]
\[
\mathbf{h}_{0,m,l} = \big( \tilde{h}_{0,m,l}(0), \tilde{h}_{0,m,l}(1), \ldots, \tilde{h}_{0,m,l}(L_{GH}-1) \big)^{T}, \tag{4.48}
\]


where the length of the desired impulse responses can be limited by setting ∼h0,m,l(k) = 0 for all k above a certain threshold. This finally leads to
\[
\mathbf{r}_{Zz_0} = \big( \mathbf{H}''(n) \big)^{H} \, \mathrm{E}\big\{ \big( \mathbf{X}''(k) \big)^{H} \mathbf{X}''(k) \big\} \, \mathbf{h}_0 \tag{4.49}
\]
\[
\phantom{\mathbf{r}_{Zz_0}} = \big( \mathbf{H}''(n) \big)^{H} \, \mathbf{R}''_{XX} \, \mathbf{h}_0, \tag{4.50}
\]

such that (4.27) is represented by
\[
\mathbf{R}_{ZZ} \, \tilde{\mathbf{g}}(n) = \mathbf{r}_{Zz_0}. \tag{4.51}
\]
A simple choice of excitation signals is spatially and temporally white noise, which leads to a diagonal autocorrelation matrix R′′XX [GKMK08]. This reduces (4.51) to
\[
\big( \mathbf{H}''(n) \big)^{H} \mathbf{H}''(n) \, \tilde{\mathbf{g}}(n) = \big( \mathbf{H}''(n) \big)^{H} \mathbf{h}_0, \tag{4.52}
\]

which describes a signal-independent least-squares solution ∼g(n), equalizing the LEMS MIMO impulse response. The least-squares problem corresponding to (4.52) is given by
\[
\tilde{\mathbf{g}}(n) = \operatorname*{argmin}_{\tilde{\mathbf{g}}(n)} \big\{ \tilde{\mathbf{e}}^{H}_{ir}(n) \, \tilde{\mathbf{e}}_{ir}(n) \big\}, \tag{4.53}
\]
\[
\tilde{\mathbf{e}}_{ir}(n) = \mathbf{h}_0 - \mathbf{H}''(n) \, \tilde{\mathbf{g}}(n), \tag{4.54}
\]
where ∼eir(n) is the impulse response error vector. While minimizing the Euclidean norm ‖∼eir(n)‖2 as in (4.53) is a common approach for determining the equalizers, other norms can be considered to increase the spatial robustness of the solution [MMK10].
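To illustrate the structure of this least-squares problem, the following numpy sketch (purely illustrative and not part of the thesis; all names are hypothetical, and realistic dimensions would make a direct solution infeasible, as noted below) assembles the convolution matrix H'' for a toy MIMO system and solves (4.53)/(4.54) directly:

```python
import numpy as np
from scipy.linalg import toeplitz

def conv_matrix(h, L_g):
    """(len(h) + L_g - 1) x L_g convolution matrix of the impulse response h."""
    col = np.concatenate([h, np.zeros(L_g - 1)])
    row = np.zeros(L_g)
    row[0] = h[0]
    return toeplitz(col, row)

def ls_equalizers(H, h0, L_g):
    """Least-squares equalizers for a toy LEMS, following (4.52)-(4.54).

    H  : LEMS impulse responses, shape (N_M, N_L, L_h)
    h0 : desired impulse responses, shape (N_M, N_L, L_h + L_g - 1)
    Returns g with shape (N_L, N_L, L_g); g[lp, l] filters original signal l
    and feeds loudspeaker lp (general, non-approximative equalizer structure).
    """
    N_M, N_L, L_h = H.shape
    # Stacked MIMO convolution matrix H'': microphone/time rows, (lp, tap) columns.
    A = np.block([[conv_matrix(H[m, lp], L_g) for lp in range(N_L)]
                  for m in range(N_M)])
    g = np.zeros((N_L, N_L, L_g))
    for l in range(N_L):             # one decoupled problem per original loudspeaker signal
        b = np.concatenate([h0[m, l] for m in range(N_M)])
        g_l, *_ = np.linalg.lstsq(A, b, rcond=None)
        g[:, l, :] = g_l.reshape(N_L, L_g)
    return g
```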

For MIMO systems, more attention has been paid in the literature to solving (4.52) than to solving (4.27) [LGF05, KNHOB98, SK12c]. With few exceptions [LGF05], most implementations will not compute an exact solution of (4.27) or (4.52) [GKMK08, KNHOB98, SK12c], as this results in a very large computational effort. Instead, computationally efficient adaptation algorithms will be used, as described in the following section.

4.5 Adaptation Algorithms for System Equalization

In this section, the application of the adaptation algorithms derived in Sec. 3.4 for the determination of the equalizers is discussed. Additionally, an alternative algorithm is derived for this purpose, which aims at a direct solution of (4.52) without considering signal quantities.

When comparing (3.25) and (4.27), a strong similarity between the tasks of system identification and equalizer determination can be seen. This motivates using algorithms derived for system identification also for the equalizer determination, where this approach requires a special signal model that is described in Sec. 4.5.1. After this, the actual application of the adaptation algorithms is described in Sec. 4.5.2. When aiming at a direct equalization of the MIMO LEMS impulse response, (4.52) can also be solved using an iterative algorithm, as described in Sec. 4.5.3.

4.5.1 The filtered-x structure

The filtered-x structure, as described in this section, is often used for the determination of equalizers in conjunction with different iterative algorithms [Bou03, GKMK08, SBRH07]. This is because determining optimal equalizers by directly solving a corresponding system of linear equations would result in a computational effort out of reach for the implementation of real-world LRE systems. Even optimized algorithms do not sufficiently decrease the computational effort [LGF05] for real-time operation.

While (3.25) and (4.27) are identical in their structure, they differ by the involved quantities considered in the respective matrices and vectors: While ∼RXX only considers the loudspeaker signals, ∼RZZ additionally considers the LEMS. Similarly, ∼rZz0 does not only consider the cross-correlation between the loudspeaker and the desired microphone signals, but also the considered LEMS. The occurrence of the loudspeaker signals in this chapter is often accompanied by a representation of the LEMS or the desired system response. These signals will be described by

\[
\mathbf{Z}(k) = \mathbf{X}''(k) \, \mathbf{H}''(n), \tag{4.55}
\]
\[
\mathbf{z}_0(k) = \mathbf{X}''(k) \, \mathbf{h}_0 \tag{4.56}
\]
in the following, which are representations of ∼Z(k) and ∼z0(k) capturing the quantities relevant in this section. Equations (4.55) and (4.56) allow writing
\[
\mathbf{e}(n) = \mathbf{z}_0(k) - \mathbf{Z}(k) \, \tilde{\mathbf{g}}(n). \tag{4.57}
\]

The task for equalizer determination is then modified to finding equalizers ∼g(n) that filter the signals described by Z(k) such that the error e(n) is minimized with respect to a chosen norm. This is exactly the same task as already solved for system identification, where the resulting signal model is shown in Fig. 4.9. Hence, the determination of the equalizers can also be seen as a system identification task.
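The following sketch (illustrative only; names are not from the thesis) makes the size of this signal set explicit by generating the NM·N_L^2 filtered-x signals of (4.61) for one block of excitation signals:

```python
import numpy as np

def filtered_x_signals(H_hat, x):
    """Generate the filtered-x signals z_{m,l',l}(k) = h_hat_{m,l'} * x_l, cf. (4.61).

    H_hat: estimated LEMS impulse responses, shape (N_M, N_L, L_h)
    x    : excitation signals, shape (N_L, L_x)
    Returns an array of shape (N_M, N_L, N_L, L_h + L_x - 1),
    i.e., N_M * N_L**2 single-channel signals.
    """
    N_M, N_L, L_h = H_hat.shape
    _, L_x = x.shape
    z = np.zeros((N_M, N_L, N_L, L_h + L_x - 1))
    for m in range(N_M):
        for lp in range(N_L):          # l': loudspeaker input of the estimated LEMS
            for l in range(N_L):       # l : original loudspeaker/excitation signal
                z[m, lp, l] = np.convolve(H_hat[m, lp], x[l])
    return z
```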

[Figure 4.9: Signal model of a filtered-x structure. H(n): estimated LEMS, G(n): equalizer, H0: desired impulse response, x(k): excitation signal, z(k): filtered excitation signal, z0(k): desired microphone signal]

However, revisiting Sec. 3.1.2 reveals that the structure of Z(k) differs significantly from the structure of ∼X(k). The structure of z0(k), on the contrary, is identical to the structure of ∼d(k) with accordingly adjusted vector lengths. The structure of Z(k) is given by

\[
\mathbf{Z}(k) = \begin{pmatrix}
\mathbf{Z}_{1,1,1}(k) & \cdots & \mathbf{Z}_{1,1,N_L}(k) & \cdots & \mathbf{Z}_{1,N_L,1}(k) & \cdots & \mathbf{Z}_{1,N_L,N_L}(k) \\
\mathbf{Z}_{2,1,1}(k) & \cdots & \mathbf{Z}_{2,1,N_L}(k) & \cdots & \mathbf{Z}_{2,N_L,1}(k) & \cdots & \mathbf{Z}_{2,N_L,N_L}(k) \\
\vdots & \ddots & \vdots & \ddots & \vdots & \ddots & \vdots \\
\mathbf{Z}_{N_M,1,1}(k) & \cdots & \mathbf{Z}_{N_M,1,N_L}(k) & \cdots & \mathbf{Z}_{N_M,N_L,1}(k) & \cdots & \mathbf{Z}_{N_M,N_L,N_L}(k)
\end{pmatrix} \tag{4.58}
\]
with
\[
\mathbf{Z}_{m,l',l}(k) = \tilde{\mathbf{H}}_{m,l'}(n) \, \mathbf{X}_l(k), \tag{4.59}
\]

where Xl(k) shares the structure of ∼X′(k) but captures the excitation signals. The matrices Zm,l′,l(k) also have a vector representation given by
\[
\mathbf{z}(k) = \big( \mathbf{z}_{1,1,1}(k), \ldots, \mathbf{z}_{1,1,N_L}(k), \mathbf{z}_{1,2,1}(k), \ldots, \mathbf{z}_{1,2,N_L}(k), \ldots, \mathbf{z}_{1,N_L,1}(k), \ldots, \mathbf{z}_{1,N_L,N_L}(k), \mathbf{z}_{2,1,1}(k), \ldots, \mathbf{z}_{2,1,N_L}(k), \ldots, \mathbf{z}_{N_M,N_L,1}(k), \ldots, \mathbf{z}_{N_M,N_L,N_L}(k) \big), \tag{4.60}
\]
\[
\mathbf{z}_{m,l',l}(k) = \hat{\mathbf{H}}_{m,l'}(n) \, \mathbf{x}_l(k), \tag{4.61}
\]
where ˆHm,l′(n) captures the same impulse responses as ∼Hm,l′(n) but has a different size. The latter is LZ × LX, with LZ being the signal segment length of the filtered-x signals. This implies the definition
\[
\mathbf{x}_l(k) = \big( x_l(k - L_X + 1), x_l(k - L_X + 2), \ldots, x_l(k) \big)^{T}, \tag{4.62}
\]

where xl(k) denotes a discrete-time sample of the excitation signal.

The matrix Z(k) describes the convolution with N_L^2 NM signals, which are by a factor NL more than the NL NM signals described by ∼X(k) for system identification. The large number of signals considered in Z(k) results from swapping the order of H(n) and G(n) in Fig. 4.9 in comparison to Fig. 4.3, where those quantities were represented by ∼G(n) and ∼H. Due to the structural identity, the relation between scalar multiplications and matrix-vector multiplications can be used to explain this change of sizes: When a matrix is multiplied with a vector, the entries in each row are multiplied with the respective components of the vector such that one scalar is obtained for each matrix entry. The vector resulting from this multiplication is then obtained by summing up all scalar results for each row. In the same way, MIMO filtering can be described by single-channel filtering operations for each individual input-to-output path and a superposition of all filtering results for each of the output channels. The order of a cascade of single-channel filters can be changed without changing the impulse response of the cascade. Hence, single-channel filtering is commutative, just as a scalar multiplication is commutative. This does not hold for a cascade of MIMO filters since a matrix multiplication is not a commutative operation.

However, similar to (3.23), it is possible to identically describe the matrix multiplication AB by B̄Ā, where Ā and B̄ capture the entries of A and B rearranged in an according manner. This has been exploited in (3.22), (4.16), and (4.33), where the individual scalar multiplications of the entries in A and B are represented by multiplications with convolution matrices. When omitting the multiplications and additions with zero-valued entries, the same number of operations results for the products AB and B̄Ā. The number of operations is, however, not equal when a vector u is first multiplied by B from the left-hand side and then by A, compared to the case of multiplying u first by Ā and then by B̄. This can be illustrated considering a rearrangement of the equation
\[
[\mathbf{v}]_\zeta = \underbrace{\sum_{\psi=1}^{\Psi} [\mathbf{A}]_{\zeta,\psi} \underbrace{\sum_{\eta=1}^{H} [\mathbf{B}]_{\psi,\eta} [\mathbf{u}]_\eta}_{H\Psi \text{ multiplications}}}_{\Psi Z + H\Psi \text{ multiplications}} \tag{4.63}
\]
\[
\phantom{[\mathbf{v}]_\zeta} = \underbrace{\sum_{\psi=1}^{\Psi} \sum_{\eta=1}^{H} [\mathbf{B}]_{\psi,\eta} [\mathbf{A}]_{\zeta,\psi} [\mathbf{u}]_\eta}_{2\Psi Z H \text{ multiplications}}, \tag{4.64}
\]
where A is of dimensions Z × Ψ, B is Ψ × H, while v and u are vectors with Z and H elements, respectively. As 2ΨZH > ΨZ + HΨ holds for any Z, H > 1, the effort to compute v according to (4.64) is higher than for computing v using (4.63). Moreover, there are HZΨ scalars resulting from evaluating the term [A]ζ,ψ [u]η in (4.64), compared to Ψ results of the term Σ_{η=1}^{H} [B]ψ,η [u]η in (4.63). An example for A, B, B̄, and Ā, with A and B being 2 × 2 matrices, is shown in Fig. 4.10.

[Figure 4.10: Example of two 2×2 matrices commuted in a matrix-matrix-vector product:
A = (a11 a12; a21 a22), B = (b11 b12; b21 b22), u = (u1; u2), v = (v1; v2),
B̄ = (b11 b21 b12 b22 0 0 0 0; 0 0 0 0 b11 b21 b12 b22),
Ā = (a11 0; a12 0; 0 a11; 0 a12; a21 0; a22 0; 0 a21; 0 a22);
v = A(Bu) requires 4 + 4 multiplications, v = (AB)u = (B̄Ā)u requires 8 + 4 multiplications, and v = B̄(Āu) requires 8 + 8 multiplications.]

For MIMO filtering, each scalar multiplication in the example above corresponds to a single-channel filtering operation and each obtained scalar corresponds to a single-channel signal. Hence, whenever there is more than one input or output channel, swapping two MIMO filters results in an increased filtering effort. Additionally, there will be more channels connecting the swapped MIMO filters than there are inputs or outputs of the whole cascade (assuming Z, H > 1). The case Z = H = 1 is in accordance with the statements above, showing that single-channel filters can be swapped without increased effort.
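This bookkeeping can be verified numerically; the following small sketch (not from the thesis) evaluates v once as A(Bu) and once in the commuted, entry-wise form of (4.64), and prints the corresponding multiplication counts:

```python
import numpy as np

Z, Psi, H = 3, 4, 5                          # arbitrary example dimensions
A = np.random.randn(Z, Psi)
B = np.random.randn(Psi, H)
u = np.random.randn(H)

v_direct = A @ (B @ u)                       # (4.63): Psi*Z + H*Psi multiplications
v_commuted = np.array([                      # (4.64): 2*Psi*Z*H multiplications
    sum(B[psi, eta] * A[zeta, psi] * u[eta]
        for psi in range(Psi) for eta in range(H))
    for zeta in range(Z)])

assert np.allclose(v_direct, v_commuted)
print(Psi * Z + H * Psi, "vs.", 2 * Psi * Z * H, "multiplications")
```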

Eventually, when using an adaptation algorithm in the filtered-x structure, the number of input signals to be considered is proportional to the number of single-channel equalizers to be determined. Moreover, for multichannel adaptation algorithms, the cross-correlation properties of the input signals must be considered, where their increased number results in tremendous computational demands. When considering the generalized frequency-domain adaptive filtering (GFDAF) algorithm as an example, the computational complexity is proportional to the second or third power of the number of input channels, depending on the kind of implementation [BBK03]. The number of input signals that have to be considered for determining the equalizers is increased by a factor of NL, compared to identifying the LEMS. Still, since the excitation signals are independent of the reproduction signals, they can be chosen arbitrarily. If the excitation signals are uncorrelated¹, it is possible to separate the problem into NL subproblems, where for each only NL input signals are considered. Assuming that the adaptation algorithm implies computational demands proportional to the squared number of input signals, this reduces the overall computational demands to be proportional to N_L^3 rather than to N_L^4 (see Appendix D.4). This is, however, not without loss of generality, as the correlation properties of the original loudspeaker signals could also be exploited to further minimize e(n). The latter case is analogous to system identification, where nonuniqueness may actually lead to an increased echo return loss enhancement (ERLE).

Approximative equalizer structures, as described in Sec. 4.2, will generally reduce the number of input channels that have to be considered for the adaptation algorithm, due to the reduced number of determined equalizers. A similar result was already obtained for system identification in Sec. 3.4, which can be straightforwardly applied to determining the equalizers. Instead of N_L^2 equalizers that have to be considered when using a general equalizer structure, only NL NG equalizers are determined for an approximative equalizer structure, leading to a computational effort proportional to N_L^2 N_G^2. A separation for uncorrelated input signals as described above then leads to a computational effort proportional to NL N_G^2, which constitutes a drastic reduction.

Since the number of input signals is weighted by N_G^2 (with NG > 1 being typical), it plays a dominant role when determining the computational effort for equalizer determination. The number of considered microphones, on the other hand, is less decisive, as it is not weighted by N_G^2. Furthermore, an approximative LEMS model will reduce the number of microphone signals that have to be considered. Still, since the dependence of the overall computational effort on this number is weak anyway, a discussion of this is omitted for brevity.

4.5.2 Application to adaptation algorithms

In this section, the application of adaptation algorithms to determine the equalizers using the filtered-x structure is explained. To this end, only the equations for describing the iterative algorithms are given, as all derivations from Sec. 3.4 remain valid for the equalizer determination. Hence, the algorithms described below were obtained following the derivations in Sec. 3.4, where the following signal quantities have been substituted: The quantity ∼h(n) is replaced by ∼g(n), ∼X(k) is replaced by Z(k), and ∼d(k) is replaced by z0(k). Consequently, the error signals (3.55) and (3.56) are replaced by
\[
\mathbf{e}'_{eq}(n) = \mathbf{z}_0(k) - \mathbf{Z}(k) \, \tilde{\mathbf{g}}(n-1), \tag{4.65}
\]
\[
\mathbf{e}'_{eq}(n) = \mathbf{z}_0(k) - \mathbf{Z}(k) \, \mathbf{V}^2_{eq} \, \tilde{\mathbf{g}}(n-1), \tag{4.66}
\]

¹ Uncorrelatedness rather than statistical independence is sufficient for adaptation algorithms based on second-order statistics.


where
\[
\mathbf{V}^2_{eq} = \mathbf{V}^{T}_{eq} \mathbf{V}_{eq} = \operatorname{Diag}\big( \operatorname{vec}\big( \mathbf{M}^{T}_{G} \big) \otimes \mathbf{1}_{L_G \times 1} \big) \tag{4.67}
\]
is used to describe approximative models, where Veq is structured in the same way as Vsi (see (3.44) and Fig. 3.9).

Least mean squares algorithm

The least mean squares (LMS) algorithm is discussed in Sec. 3.4.1 for system identification. The original version of this algorithm is given by (3.62), while modified versions for the use of approximative models and a cost-guided wave-domain system identification are given by (3.65) and (3.67), respectively. These algorithms can also be used for equalizer determination, where the representations of (3.62), (3.65), and (3.67) are given by
\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \mu_{eq} \mathbf{Z}^{H}(nL_F) \, \mathbf{e}'_{eq}(nL_F), \tag{4.68}
\]
\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \mu_{eq} \mathbf{V}^2_{eq} \mathbf{Z}^{H}(nL_F) \, \mathbf{e}'_{eq}(nL_F), \tag{4.69}
\]
\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \mu_{eq} \mathbf{Z}^{H}(nL_F) \, \mathbf{e}'_{eq}(nL_F) - \mathbf{C}(n) \big( \tilde{\mathbf{g}}(n-1) - \tilde{\mathbf{g}}_{C}(n) \big), \tag{4.70}
\]

respectively. The step-size parameter µeq has the same meaning as µsi for system identification such that all statements made in Sec. 3.4.5 remain valid and a further discussion can be omitted. The equalizer coefficients captured in ∼gC(n) can be used to guide the adaptation algorithms towards a solution with a low Euclidean distance to the chosen filter coefficients, in the same way as ∼hC(n) is used in system identification. Similarly, the cost matrix C(n) can be used to weight this distance with respect to the individual filter coefficients, like ∼C(n) is applied in system identification, noting that the parameter βeq replaces βsi. A suitable choice of ∼gC(n) could, for example, be a pure delay as described by (2.199), which is close to the delay described by H0. Such a choice, combined with a carefully chosen βeq, will only have a moderate influence on the determined equalizers but can effectively prevent a possible dead-lock situation. Such a situation can occur when the equalizers preclude a further excitation of the LEMS, i. e., when ∼g(n) = 0_{N_L^2 L_G × 1} is obtained at any block time instant.

Affine projection algorithm

The affine projection algorithm is described by (3.73), (3.74), (3.80), and (3.87) in Sec. 3.4.2. For LRE, the basic variant of this algorithm is given by
\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \mu_{eq} \mathbf{Z}^{\dagger}(nL_F) \, \mathbf{e}'_{eq}(nL_F), \tag{4.71}
\]


where the pseudoinverse Z†(k) has to be defined according to
\[
\mathbf{Z}^{\dagger}(k) = \mathbf{Z}^{H}(k) \big( \mathbf{Z}(k) \mathbf{Z}^{H}(k) + \gamma_{eq} \mathbf{Z}_{R}(k) \big)^{-1} \tag{4.72}
\]
and ZR(k) (defined later) will be used for regularization. The weight parameter γsi of the regularization was replaced by γeq since the parameters for system identification and equalizer determination might be chosen differently. A detailed discussion on the affine projection algorithm (APA) for LRE can be found in [Bou03].

The variants for approximative equalizer structures and for cost-guided equalizer determination are given by
\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \mathbf{V}^2_{eq} \mathbf{Z}^{H}(nL_F) \big( \mathbf{Z}(nL_F) \mathbf{V}^2_{eq} \mathbf{Z}^{H}(nL_F) + \gamma_{eq} \mathbf{Z}_{R}(nL_F) \big)^{-1} \mathbf{e}'_{eq}(n), \tag{4.73}
\]
\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \mathbf{Z}^{\dagger}(nL_F) \mathbf{e}'_{eq}(n) + \big( \mathbf{I}_{N_L N_L L_G} - \mathbf{Z}^{\dagger}(nL_F) \mathbf{Z}(nL_F) \big) \mathbf{C}(n) \big( \tilde{\mathbf{g}}_{C}(n) - \tilde{\mathbf{g}}(n-1) \big), \tag{4.74}
\]
respectively. Again, the parameter γeq has the same meaning as γsi for system identification and was already discussed in Sec. 3.4.5. Still, the regularization matrix has to be defined with compatible dimensions and is given by
\[
\mathbf{Z}_{R}(k) = \max\!\left( \epsilon, \ \frac{\mathbf{z}^{H}(nL_F) \, \mathbf{z}(nL_F)}{N_L^2 N_M L_Z} \right) \mathbf{I}_{N_M L_D}. \tag{4.75}
\]

Recursive least squares algorithm

The recursive least squares (RLS) algorithm is described in Sec. 3.4.3 by the equations (3.93), (3.103), (3.116), and (3.124). Its formulation for the equalizer determination is given by
\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \big( \mathbf{R}_{ZZ}(n) \big)^{-1} \mathbf{Z}^{H}(nL_F) \, \mathbf{e}'_{eq}(nL_F), \tag{4.76}
\]
where RXX(n) had to be replaced by
\[
\mathbf{R}_{ZZ}(n) = \lambda_{eq} \mathbf{R}_{ZZ}(n-1) + \mathbf{Z}^{H}(nL_F) \, \mathbf{Z}(nL_F). \tag{4.77}
\]

The exponential weighting factor λeq in the latter equation has the same influence as λsi in system identification. The RLS algorithm to adapt approximative equalizer structures is given by
\[
\tilde{\mathbf{g}}(n) = \mathbf{V}^2_{eq} \tilde{\mathbf{g}}(n-1) + \mathbf{V}^{T}_{eq} \big( \mathbf{V}_{eq} \mathbf{R}_{ZZ}(n) \mathbf{V}^{T}_{eq} \big)^{-1} \mathbf{V}_{eq} \mathbf{Z}^{H}(nL_F) \, \mathbf{e}'_{eq}(nL_F), \tag{4.78}
\]


while the cost-guided wave-domain equalizer determination is facilitated by
\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \big( \mathbf{R}_{ZZ}(n) + \mathbf{C}(n) \big)^{-1} \Big( \mathbf{Z}^{H}(nL_F) \mathbf{e}'_{eq}(nL_F) + \big( \lambda_{eq} \mathbf{C}(n-1) - \mathbf{C}(n) \big) \tilde{\mathbf{g}}(n-1) + \mathbf{C}(n) \tilde{\mathbf{g}}_{C}(n) - \lambda_{eq} \mathbf{C}(n-1) \tilde{\mathbf{g}}_{C}(n-1) \Big). \tag{4.79}
\]
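For completeness, the recursion (4.76)/(4.77) can be sketched as follows (illustrative only; a small diagonal loading term is added here, which is not part of the equations above):

```python
import numpy as np

def rls_equalizer_update(g, R_ZZ, Z, z0, lambda_eq, delta=1e-6):
    """One RLS-type update of the equalizers following (4.76) and (4.77)."""
    R_ZZ = lambda_eq * R_ZZ + Z.conj().T @ Z                  # (4.77)
    e_prior = z0 - Z @ g                                      # a priori error
    rhs = Z.conj().T @ e_prior
    g = g + np.linalg.solve(R_ZZ + delta * np.eye(R_ZZ.shape[0]), rhs)  # (4.76)
    return g, R_ZZ
```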

Generalized frequency-domain adaptive filtering algorithm

The GFDAF algorithm has been identified as an approximation of the RLS algorithm in Sec. 3.4.4. To describe this algorithm for the equalizer determination, the discrete Fourier transform (DFT)-domain representation Z(k) of Z(k) is used. Its structure is identical to the structure described for Z(k) in (4.58) with the submatrices Zm,l′,l(k) replaced by
\[
\mathbf{Z}_{m,l',l}(k) = \operatorname{Diag}\big( \mathbf{F}_{L_Z} \, \mathbf{z}_{m,l',l}(k) \big). \tag{4.80}
\]

Using this matrix, it is possible to write
\[
\mathbf{Z}(k) = \mathbf{W}_{01} \, \mathbf{Z}(k) \, \mathbf{W}_{10}, \tag{4.81}
\]
with the following matrix definitions:
\[
\mathbf{W}_{01} = \mathbf{I}_{N_M} \otimes \Big( \big( \mathbf{0}_{L_D \times (L_Z - L_D)}, \ \mathbf{I}_{L_D} \big) \mathbf{F}^{H}_{L_Z} \Big), \tag{4.82}
\]
\[
\mathbf{W}_{10} = \mathbf{I}_{N_L^2} \otimes \left( \mathbf{F}_{L_Z} \begin{pmatrix} \mathbf{I}_{L_G} \\ \mathbf{0}_{(L_Z - L_G) \times L_G} \end{pmatrix} \right). \tag{4.83}
\]

The matrix S(sp)XX(n) used to describe the GFDAF algorithm in Sec. 3.4.4 is here represented by
\[
\mathbf{S}^{(sp)}_{ZZ}(n) = \lambda_{eq} \mathbf{S}^{(sp)}_{ZZ}(n-1) + \frac{L_D}{L_Z} \mathbf{Z}^{H}(nL_F) \, \mathbf{Z}(nL_F). \tag{4.84}
\]

This leads to a representation of (3.151) given by
\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \mu_{eq} \mathbf{W}^{H}_{10} \big( \mathbf{S}^{(sp)}_{ZZ}(n) + \gamma_{eq} \mathbf{S}_{R}(n) \big)^{-1} \mathbf{W}_{10} \mathbf{W}^{H}_{10} \mathbf{Z}^{H}(nL_F) \mathbf{W}^{H}_{01} \mathbf{e}'_{eq}(nL_F), \tag{4.85}
\]
where SR(n) is used for regularization in the same way as SR(n) in Sec. 3.4.4. The matrix SR(n) can be chosen according to
\[
\mathbf{S}_{R}(n) = \max\big( \epsilon, \ p_Z(n) \big) \, \mathbf{I}_{N_L^2 L_Z}, \tag{4.86}
\]
where a recursively time-averaged estimate of the filtered excitation signal power is used as given by
\[
p_Z(n) = \lambda_{eq} \, p_Z(n-1) + \frac{\mathbf{z}^{H}(nL_F) \, \mathbf{z}(nL_F)}{N_L^2 N_M L_Z}. \tag{4.87}
\]

This algorithm is referred to as the filtered-x generalized frequency-domain adaptive filtering (FxGFDAF) algorithm. For system identification, the computationally more efficient variants of this algorithm are given by (3.154) and (3.155). The equivalents of these for determining the equalizers are given by
\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \mu_{eq} \frac{L_G}{L_Z} \mathbf{W}^{H}_{10} \big( \mathbf{S}^{(sp)}_{ZZ}(n) + \gamma_{eq} \mathbf{S}_{R}(n) \big)^{-1} \mathbf{Z}^{H}(nL_F) \mathbf{W}^{H}_{01} \mathbf{e}'_{eq}(nL_F), \tag{4.88}
\]
\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \mu_{eq} \frac{L_G}{L_Z} \big( \mathbf{S}^{(sp)}_{ZZ}(n) + \gamma_{eq} \mathbf{S}_{R}(n) \big)^{-1} \mathbf{Z}^{H}(nL_F) \mathbf{W}^{H}_{01} \mathbf{e}'_{eq}(nL_F), \tag{4.89}
\]

where ∼g(n) is the DFT-domain representation of ∼g(n) and
\[
\mathbf{e}'_{eq}(n) = \tilde{\mathbf{d}}(k) - \mathbf{W}_{01} \mathbf{Z}(k) \, \tilde{\mathbf{g}}(n-1) \tag{4.90}
\]
is evaluated instead of (4.65). Finally, for approximative equalizer structures, (4.85), (4.88), and (4.89) are modified to

\[
\tilde{\mathbf{g}}(n) = \mathbf{V}^2_{eq} \tilde{\mathbf{g}}(n-1) + \mu_{eq} \mathbf{W}^{H}_{10} \mathbf{V}^{T}_{eq} \Big( \bar{\mathbf{V}}_{eq} \big( \mathbf{S}^{(sp)}_{ZZ}(n) + \gamma_{eq} \mathbf{S}_{R}(n) \big) \bar{\mathbf{V}}^{T}_{eq} \Big)^{-1} \bar{\mathbf{V}}_{eq} \mathbf{W}_{10} \mathbf{W}^{H}_{10} \mathbf{Z}^{H}(nL_F) \mathbf{W}^{H}_{01} \mathbf{e}'_{eq}(nL_F), \tag{4.91}
\]
\[
\tilde{\mathbf{g}}(n) = \mathbf{V}^2_{eq} \tilde{\mathbf{g}}(n-1) + \mu_{eq} \frac{L_G}{L_Z} \mathbf{W}^{H}_{10} \mathbf{V}^{T}_{eq} \Big( \bar{\mathbf{V}}_{eq} \big( \mathbf{S}^{(sp)}_{ZZ}(n) + \gamma_{eq} \mathbf{S}_{R}(n) \big) \bar{\mathbf{V}}^{T}_{eq} \Big)^{-1} \bar{\mathbf{V}}_{eq} \mathbf{Z}^{H}(nL_F) \mathbf{W}^{H}_{01} \mathbf{e}'_{eq}(nL_F), \tag{4.92}
\]
\[
\tilde{\mathbf{g}}(n) = \mathbf{V}^2_{eq} \tilde{\mathbf{g}}(n-1) + \mu_{eq} \frac{L_G}{L_Z} \mathbf{V}^{T}_{eq} \Big( \bar{\mathbf{V}}_{eq} \big( \mathbf{S}^{(sp)}_{ZZ}(n) + \gamma_{eq} \mathbf{S}_{R}(n) \big) \bar{\mathbf{V}}^{T}_{eq} \Big)^{-1} \bar{\mathbf{V}}_{eq} \mathbf{Z}^{H}(nL_F) \mathbf{W}^{H}_{01} \mathbf{e}'_{eq}(nL_F), \tag{4.93}
\]
respectively, where V̄eq represents Veq with an accordingly adjusted signal segment length. The versions of (4.85), (4.88), and (4.89) for cost-guided equalizer determination are given by


\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \mu_{eq} \mathbf{W}^{H}_{10} \big( \mathbf{S}^{(sp)}_{ZZ}(n) + \mathbf{C}(n) + \gamma_{eq} \mathbf{S}_{R}(n) \big)^{-1} \mathbf{W}_{10} \mathbf{W}^{H}_{10} \Big( \mathbf{Z}^{H}(nL_F) \mathbf{e}'_{eq}(nL_F) + \big( \lambda_{eq} \mathbf{C}(n-1) - \mathbf{C}(n) \big) \mathbf{W}_{10} \tilde{\mathbf{g}}(n-1) + \mathbf{C}(n) \mathbf{W}_{10} \tilde{\mathbf{g}}_{C}(n) - \lambda_{eq} \mathbf{C}(n-1) \mathbf{W}_{10} \tilde{\mathbf{g}}_{C}(n-1) \Big), \tag{4.94}
\]
\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \mu_{eq} \frac{L_G}{L_Z} \mathbf{W}^{H}_{10} \big( \mathbf{S}^{(sp)}_{ZZ}(n) + \mathbf{C}(n) + \gamma_{eq} \mathbf{S}_{R}(n) \big)^{-1} \mathbf{W}_{10} \mathbf{W}^{H}_{10} \Big( \mathbf{Z}^{H}(nL_F) \mathbf{e}'_{eq}(nL_F) + \big( \lambda_{eq} \mathbf{C}(n-1) - \mathbf{C}(n) \big) \mathbf{W}_{10} \tilde{\mathbf{g}}(n-1) + \mathbf{C}(n) \mathbf{W}_{10} \tilde{\mathbf{g}}_{C}(n) - \lambda_{eq} \mathbf{C}(n-1) \mathbf{W}_{10} \tilde{\mathbf{g}}_{C}(n-1) \Big), \tag{4.95}
\]
\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \mu_{eq} \frac{L_G}{L_Z} \mathbf{W}^{H}_{10} \big( \mathbf{S}^{(sp)}_{ZZ}(n) + \mathbf{C}(n) + \gamma_{eq} \mathbf{S}_{R}(n) \big)^{-1} \mathbf{W}_{10} \mathbf{W}^{H}_{10} \Big( \mathbf{Z}^{H}(nL_F) \mathbf{e}'_{eq}(nL_F) + \big( \lambda_{eq} \mathbf{C}(n-1) - \mathbf{C}(n) \big) \mathbf{W}_{10} \tilde{\mathbf{g}}(n-1) + \mathbf{C}(n) \mathbf{W}_{10} \tilde{\mathbf{g}}_{C}(n) - \lambda_{eq} \mathbf{C}(n-1) \mathbf{W}_{10} \tilde{\mathbf{g}}_{C}(n-1) \Big), \tag{4.96}
\]
respectively. Here, C(n) is the DFT-domain representation of C(n), which should be chosen such that the matrix to be inverted retains its sparsity.

Please note that, given excitation signals which are independent for each channel, it is possible to determine the equalizers for each loudspeaker/excitation signal separately. This results in a reduced computational effort when using the FxGFDAF algorithm due to a reduced number of linear equations that have to be solved simultaneously. In practice, this implies a slight deviation from the algorithm description above in the implementation.

4.5.3 The iterative discrete-Fourier-transform-domain inversion algorithm

In this section, the IDI algorithm is explained, which was first presented in [SK12c]. This algorithm closes the gap between the FxGFDAF algorithm and the DFT-domain approximate inversion proposed in [KNHOB98]. Unlike the algorithms discussed above, this algorithm does not involve using a filtered-x structure. Instead, this algorithm aims at iteratively solving (4.52) exploiting a DFT-domain approximation. The derivation presented here differs from [SK12c] by starting with an algorithm definition in the time domain, which is later approximated in the DFT domain. This is analogous to the derivation of the GFDAF algorithm, which was derived as a DFT-domain approximation of the time-domain RLS algorithm in Sec. 3.4.4. Furthermore, going beyond [SK12c], versions of the IDI algorithm are presented which consider approximative equalizer structures and allow for controlling the algorithm's convergence behavior like for the algorithms described above.

The cost function for the IDI algorithm can be directly derived from (4.53) and is given by
\[
J_{IDI}(n) = \tilde{\mathbf{e}}^{H}_{ir}(n) \, \tilde{\mathbf{e}}_{ir}(n) \tag{4.97}
\]
\[
\phantom{J_{IDI}(n)} = \mathbf{h}^{H}_0 \mathbf{h}_0 - \mathbf{h}^{H}_0 \mathbf{H}''(n) \tilde{\mathbf{g}}(n) - \tilde{\mathbf{g}}^{H}(n) \big( \mathbf{H}''(n) \big)^{H} \mathbf{h}_0 + \tilde{\mathbf{g}}^{H}(n) \big( \mathbf{H}''(n) \big)^{H} \mathbf{H}''(n) \tilde{\mathbf{g}}(n), \tag{4.98}
\]

where setting the complex (or Wirtinger) gradient [Bra83, Fis02] to zero results in (4.52). For an iterative algorithm, an update term ug(n) according to
\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \mathbf{u}_{g}(n) \tag{4.99}
\]
has to be found such that ∼g(n) ideally solves (4.52). Multiplying (4.99) by (H′′(n))H H′′(n) from the left-hand side and inserting (4.52) for (H′′(n))H H′′(n) ∼g(n) leads to
\[
\big( \mathbf{H}''(n) \big)^{H} \mathbf{H}''(n) \, \mathbf{u}_{g}(n) = \big( \mathbf{H}''(n) \big)^{H} \mathbf{h}_0 - \big( \mathbf{H}''(n) \big)^{H} \mathbf{H}''(n) \, \tilde{\mathbf{g}}(n-1) \tag{4.100}
\]
as an intermediate result. Then, defining the a priori error
\[
\tilde{\mathbf{e}}'_{ir}(n) = \mathbf{h}_0 - \tilde{\mathbf{H}}'' \, \tilde{\mathbf{g}}(n-1) \tag{4.101}
\]
and substituting ug(n) into (4.99) results in
\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \Big( \big( \mathbf{H}''(n) \big)^{H} \mathbf{H}''(n) \Big)^{-1} \big( \mathbf{H}''(n) \big)^{H} \tilde{\mathbf{e}}'_{ir}(n), \tag{4.102}
\]
where (H′′(n))H H′′(n) is assumed to be invertible. Unlike for the derivation of the RLS algorithm, ∼g(n−1) was not assumed to be optimal in any sense (cf. (3.97)). This is possible because H′′(n) of previous time instants, e. g., n−1, is irrelevant for computing (4.102). Actually, ∼g(n−1) may be arbitrary, while an exact computation of (4.102) instantly leads to a least-squares optimal ∼g(n). Still, the dimensions of the involved matrices will practically preclude an exact computation of (4.102), where the approximation presented in the following can be used to overcome this problem.

Using the definitions
\[
\mathbf{F} = \mathbf{I}_{N_M N_L} \otimes \mathbf{F}^{H}_{L_{GH}} \tag{4.103}
\]
and (4.83) allows for writing
\[
\mathbf{H}''(n) = \mathbf{F} \, \mathbf{H}''(n) \, \mathbf{W}_{10}. \tag{4.104}
\]


Unlike W01, F does not describe a time-domain windowing, which is not necessary because the convolution product of the impulse responses described by H′′(n) and ∼g(n) is anyway limited to a length of LGH samples. The matrix H′′(n) is structured according to
\[
\mathbf{H}''(n) = \begin{pmatrix}
\mathbf{I}_{N_L} \otimes \mathbf{H}''_{1,1} & \mathbf{I}_{N_L} \otimes \mathbf{H}''_{1,2} & \cdots & \mathbf{I}_{N_L} \otimes \mathbf{H}''_{1,N_L} \\
\mathbf{I}_{N_L} \otimes \mathbf{H}''_{2,1} & \mathbf{I}_{N_L} \otimes \mathbf{H}''_{2,2} & \cdots & \mathbf{I}_{N_L} \otimes \mathbf{H}''_{2,N_L} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{I}_{N_L} \otimes \mathbf{H}''_{N_M,1} & \mathbf{I}_{N_L} \otimes \mathbf{H}''_{N_M,2} & \cdots & \mathbf{I}_{N_L} \otimes \mathbf{H}''_{N_M,N_L}
\end{pmatrix}, \tag{4.105}
\]
\[
\mathbf{H}''_{m,l} = \operatorname{Diag}\!\left( \mathbf{F}_{L_{GH}} \begin{pmatrix} \mathbf{I}_{L_H} \\ \mathbf{0}_{(L_G-1) \times L_H} \end{pmatrix} \mathbf{h}_{m,l}(n) \right). \tag{4.106}
\]

This leads to a DFT-domain representation of (4.102) given by
\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \Big( \mathbf{W}^{H}_{10} \big( \mathbf{H}''(n) \big)^{H} \mathbf{H}''(n) \mathbf{W}_{10} \Big)^{-1} \mathbf{W}^{H}_{10} \big( \mathbf{H}''(n) \big)^{H} \mathbf{F}^{H} \tilde{\mathbf{e}}'_{ir}(n). \tag{4.107}
\]

Using the approximation
\[
\Big( \mathbf{W}^{H}_{10} \big( \mathbf{H}''(n) \big)^{H} \mathbf{H}''(n) \mathbf{W}_{10} \Big)^{-1} \approx \mathbf{W}^{H}_{10} \Big( \big( \mathbf{H}''(n) \big)^{H} \mathbf{H}''(n) \Big)^{-1} \mathbf{W}_{10} \tag{4.108}
\]
and transforming (4.101) into the DFT domain allows for writing the basic algorithm as
\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \mu_{eq} \mathbf{W}^{H}_{10} \Big( \big( \mathbf{H}''(n) \big)^{H} \mathbf{H}''(n) + \gamma_{eq} \mathbf{H}_{R}(n) \Big)^{-1} \mathbf{W}_{10} \mathbf{W}^{H}_{10} \big( \mathbf{H}''(n) \big)^{H} \tilde{\mathbf{e}}'_{ir}(n), \tag{4.109}
\]

where the parameter µeq has been introduced to account for inaccuracies due to the approximation and the matrix
\[
\mathbf{H}_{R}(n) = \max\!\left( \epsilon, \ \frac{\tilde{\mathbf{h}}^{H}(n) \, \tilde{\mathbf{h}}(n)}{N_L N_M L_H} \right) \mathbf{I}_{N_L^2 L_{GH}}, \tag{4.110}
\]
weighted by the parameter γeq, is used for regularization. The DFT-domain representation of (4.101) is then given by
\[
\tilde{\mathbf{e}}'_{ir}(n) = \mathbf{F}^{H} \mathbf{h}_0 - \mathbf{H}''(n) \mathbf{W}_{10} \, \tilde{\mathbf{g}}(n-1). \tag{4.111}
\]

As can be seen from (4.105) and (4.106), (H′′(n))H H′′(n) is sparse and can be inverted with relatively low computational expenses. Note that the problem can, furthermore, be treated separately for every original loudspeaker signal without loss of generality, as (4.52) does not describe any coupling between those channels. A further approximation can be made by using


\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \mu_{eq} \frac{L_G}{L_Z} \mathbf{W}^{H}_{10} \Big( \big( \mathbf{H}''(n) \big)^{H} \mathbf{H}''(n) + \gamma_{eq} \mathbf{H}_{R}(n) \Big)^{-1} \big( \mathbf{H}''(n) \big)^{H} \tilde{\mathbf{e}}'_{ir}(n), \tag{4.112}
\]
where an equivalent to (3.153) was exploited such that (4.112) is comparable to (3.154). An approximation as formulated by (3.155) for the GFDAF algorithm does not lead to satisfying results for the IDI algorithm.

For approximative equalizer structures, (4.101) has to be restated according to
\[
\tilde{\mathbf{e}}'_{ir}(n) = \mathbf{h}_0 - \tilde{\mathbf{H}}'' \mathbf{V}^2_{eq} \, \tilde{\mathbf{g}}(n-1). \tag{4.113}
\]
Plugging (4.113) into (4.97) and setting its complex gradient to zero leads to a modified version of (4.100) given by
\[
\mathbf{V}^2_{eq} \big( \mathbf{H}''(n) \big)^{H} \mathbf{H}''(n) \mathbf{V}^2_{eq} \, \mathbf{u}_{g}(n) = \mathbf{V}^2_{eq} \big( \mathbf{H}''(n) \big)^{H} \mathbf{h}_0 - \mathbf{V}^2_{eq} \big( \mathbf{H}''(n) \big)^{H} \mathbf{H}''(n) \mathbf{V}^2_{eq} \, \tilde{\mathbf{g}}(n-1). \tag{4.114}
\]

Equation (4.67) can be exploited to truncate the zero-valued equations from (4.114), while Veq (H′′(n))H H′′(n) VTeq is assumed to be invertible such that
\[
\mathbf{V}_{eq} \mathbf{u}_{g}(n) = \Big( \mathbf{V}_{eq} \big( \mathbf{H}''(n) \big)^{H} \mathbf{H}''(n) \mathbf{V}^{T}_{eq} \Big)^{-1} \mathbf{V}_{eq} \big( \mathbf{H}''(n) \big)^{H} \tilde{\mathbf{e}}'_{ir}(n). \tag{4.115}
\]

Requiring that there is no update of the disregarded equalizers is mathematically expressed by
\[
\big( \mathbf{I}_{N_L^2 L_G} - \mathbf{V}^2_{eq} \big) \mathbf{u}_{g}(n) = \mathbf{0}_{N_L^2 L_G \times 1}, \tag{4.116}
\]

which is needed to finally formulate
\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \mathbf{V}^{T}_{eq} \Big( \mathbf{V}_{eq} \big( \mathbf{H}''(n) \big)^{H} \mathbf{H}''(n) \mathbf{V}^{T}_{eq} \Big)^{-1} \mathbf{V}_{eq} \big( \mathbf{H}''(n) \big)^{H} \tilde{\mathbf{e}}'_{ir}(n). \tag{4.117}
\]

The DFT-domain approximations of (4.117) corresponding to (4.109) and (4.112) can be straightforwardly derived and are given by
\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \mu_{eq} \mathbf{W}^{H}_{10} \mathbf{V}^{T}_{eq} \Big( \mathbf{V}_{eq} \big( ( \mathbf{H}''(n) )^{H} \mathbf{H}''(n) + \gamma_{eq} \mathbf{H}_{R}(n) \big) \mathbf{V}^{T}_{eq} \Big)^{-1} \mathbf{V}_{eq} \mathbf{W}_{10} \mathbf{W}^{H}_{10} \big( \mathbf{H}''(n) \big)^{H} \tilde{\mathbf{e}}'_{ir}(n), \tag{4.118}
\]
\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \mu_{eq} \frac{L_G}{L_Z} \mathbf{W}^{H}_{10} \mathbf{V}^{T}_{eq} \Big( \mathbf{V}_{eq} \big( ( \mathbf{H}''(n) )^{H} \mathbf{H}''(n) + \gamma_{eq} \mathbf{H}_{R}(n) \big) \mathbf{V}^{T}_{eq} \Big)^{-1} \mathbf{V}_{eq} \big( \mathbf{H}''(n) \big)^{H} \tilde{\mathbf{e}}'_{ir}(n), \tag{4.119}
\]


respectively. For cost-guided equalizer determination, the cost function given in (4.97) must be modified to
\[
J^{(cg)}_{IDI}(n) = J_{IDI}(n) + \big( \tilde{\mathbf{g}}(n) - \tilde{\mathbf{g}}_{C}(n) \big)^{H} \mathbf{C}(n) \big( \tilde{\mathbf{g}}(n) - \tilde{\mathbf{g}}_{C}(n) \big). \tag{4.120}
\]

Setting its complex gradient to zero leads to the intermediate result
\[
\big( \mathbf{H}''(n) \big)^{H} \mathbf{H}''(n) \, \mathbf{u}_{g}(n) = \big( \mathbf{H}''(n) \big)^{H} \tilde{\mathbf{e}}'_{ir}(n) + \mathbf{C}(n) \big( \tilde{\mathbf{g}}_{C}(n) - \tilde{\mathbf{g}}(n) \big), \tag{4.121}
\]
which finally allows writing
\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \Big( \big( \mathbf{H}''(n) \big)^{H} \mathbf{H}''(n) + \mathbf{C}(n) \Big)^{-1} \Big( \big( \mathbf{H}''(n) \big)^{H} \tilde{\mathbf{e}}'_{ir}(n) - \mathbf{C}(n) \tilde{\mathbf{g}}(n-1) + \mathbf{C}(n) \tilde{\mathbf{g}}_{C}(n) \Big). \tag{4.122}
\]

The DFT-domain approximations of (4.122) are given by
\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \mu_{eq} \mathbf{W}^{H}_{10} \Big( \big( \mathbf{H}''(n) \big)^{H} \mathbf{H}''(n) + \mathbf{C}(n) + \gamma_{eq} \mathbf{H}_{R}(n) \Big)^{-1} \mathbf{W}_{10} \mathbf{W}^{H}_{10} \Big( \big( \mathbf{H}''(n) \big)^{H} \tilde{\mathbf{e}}'_{ir}(n) + \mathbf{C}(n) \mathbf{W}_{10} \big( \tilde{\mathbf{g}}_{C}(n) - \tilde{\mathbf{g}}(n-1) \big) \Big), \tag{4.123}
\]
\[
\tilde{\mathbf{g}}(n) = \tilde{\mathbf{g}}(n-1) + \mu_{eq} \frac{L_G}{L_Z} \mathbf{W}^{H}_{10} \Big( \big( \mathbf{H}''(n) \big)^{H} \mathbf{H}''(n) + \mathbf{C}(n) + \gamma_{eq} \mathbf{H}_{R}(n) \Big)^{-1} \Big( \big( \mathbf{H}''(n) \big)^{H} \tilde{\mathbf{e}}'_{ir}(n) + \mathbf{C}(n) \mathbf{W}_{10} \big( \tilde{\mathbf{g}}_{C}(n) - \tilde{\mathbf{g}}(n-1) \big) \Big), \tag{4.124}
\]
respectively.

4.5.4 Summary of adaptation algorithms

Like for system identification in Sec. 3.4, variants of the LMS algorithm, the APA, the RLS algorithm, and the GFDAF algorithm have been presented that can be integrated in the so-called filtered-x structure to determine the equalizers for the estimated LEMS. Furthermore, the IDI algorithm was derived, which does not require a filtered-x structure. For all considered algorithms, a variant to adapt approximative equalizer structures and a variant to determine cost-guided equalizers have been presented. A review of the equations to implement the individual algorithms is omitted here since the presentation in Sections 4.5.2 and 4.5.3 is already rather compact.

The evaluation results presented in the following will only consider the FxGFDAF algorithm in the variants described by (4.88), (4.92), and (4.95), and the IDI algorithm as described by (4.112), (4.119), and (4.124). The reasons given in Sec. 3.4.6 for disregarding the LMS algorithm, the APA, and the RLS algorithm for acoustic echo cancellation (AEC) are also applicable to LRE. Hence, those algorithms were presented rather for the sake of completeness than as a suggestion for the implementation of a real-world system. Nevertheless, the description of the LMS algorithm, the APA, and the RLS algorithm can still be valuable for the development of further algorithms.

4.6 Experimental Results

In this section, evaluation results for LRE are presented, where the general evaluation scenario and measures used to assess the LRE performance are described in Sections 4.6.1 and 4.6.2, respectively. In Sec. 4.6.3, experiments with time-varying scenarios are considered to evaluate the convergence behavior of an LRE system. After this, stationary scenarios are considered in Sec. 4.6.4, focusing on the steady-state performance of different approximative equalizer structures.

4.6.1 Evaluation scenario

In this section, the general evaluation scenario is explained, where the loudspeaker signals for reproduction were determined according to WFS to synthesize plane waves carrying white noise signals. While this is identical to the signals used to evaluate AEC (see Sec. 3.5.1), the sampling rate was reduced to fs = 2 kHz. This choice was motivated by the considered loudspeaker setup, which exhibits significant aliasing artifacts for WFS above 1 kHz. The resulting synthesis filters were of length LR = 28 samples, which corresponds to a time span of 0.014 s or a sound wave traveling by about 4.5 m. The incidence angles of the plane waves were chosen to be ϕq = 0, π/2, π, and 3π/2 (for q = 1, 2, 3, 4, respectively) whenever up to four plane waves were synthesized. When more plane waves were synthesized, the incidence angles were given by
\[
\varphi_q = (q-1) \, \frac{2\pi}{N_S}, \qquad q = 1, 2, \ldots, N_S. \tag{4.125}
\]
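For reference, the incidence angles according to (4.125) can be generated as in the following short sketch (illustrative only; the value of NS is a hypothetical example):

```python
import numpy as np

N_S = 8                                                          # hypothetical number of plane waves
phi = [(q - 1) * 2 * np.pi / N_S for q in range(1, N_S + 1)]     # (4.125)
```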

The array setup used for LRE is depicted in Fig. 4.11, where the user or listener is assumed to be located inside the microphone array. This is in contrast to the setup for AEC, and the microphone array considered for LRE has to exhibit a larger radius, e. g., RM = 0.5 m, enclosing the whole considered listening area. The microphone array is equipped with NM = 48 microphones in order to allow for a unique determination of the equalizers, given a perfect identification of the LEMS. Additionally, there were two arrays with 48 microphones and radii of RI = 0.4 m and RE = 0.6 m considered to measure the LRE inside and outside the listening area, respectively. The loudspeaker array in this setup is identical to the one used for AEC with NL = 48 loudspeakers located on a circle with a radius RL = 1.5 m. The LEMSs formed by the loudspeaker array and the respective microphone arrays with radii of RM, RI, and RE are represented by H, HI, and HE, respectively. The latter two are only used for the measurement of the achieved LRE, but not for the optimization of the equalizers. The positions of the microphones at ϱ = RI and ϱ = RE will be referred to as inner and outer evaluation positions, while the positions of the microphones at ϱ = RM will be termed optimization positions. Whenever the term “microphone array” is used in the following without further information, it refers to the microphone array of radius RM.

[Figure 4.11: Array setup for the evaluation of LRE (loudspeaker array of radius RL, optimization microphone array of radius RM, inner and outer evaluation arrays of radii RI and RE, listening area)]

All results presented in the following were obtained using an image source model as described in Sec. 2.1.5. The geometry of this LEMS is depicted in Fig. 4.12, where d(w)1 = d(w)2 = 2 m and d(w)3 = d(w)4 = 3 m was chosen and the reflection coefficient of the walls was R = 0.9. Floor and ceiling have not been modeled since array setups located in the plane are not suitable for compensating or equalizing elevated reflections [SRR05]. The simulated room temperature was 21.3 °C such that the speed of sound c was equal to 344 m/s. Using a fourth-order image source model, the resulting maximum impulse response had a length of 135 samples, but was truncated to 128 samples to theoretically allow for a perfect identification of the LEMS when no approximative model is used. A sample of such an impulse response is shown in Fig. 4.13, where the small negative excursions are due to the band-limited fractional delay filters that were necessary to implement non-integer delays.

[Figure 4.12: LEMS considered for the image source model (wall distances d(w)1, d(w)2, d(w)3, and d(w)4)]

The approximative wave-domain LEMS model was chosen as described in Sec. 3.2, where the transforms given by (2.228) and (2.229) were used and NH diagonals were considered in the model. The approximate, or scalable, equalizer structure is described in Sec. 4.2, where
\[
N_G = N_H \tag{4.126}
\]
was chosen. The variable NE was chosen such that all error signals available for the respective models were considered. The desired impulse responses are represented by ∼H0 = TM H0 TL, where H0 constitutes the free-field impulse response of the LEMS with an additional delay ∆eq of 16 samples. The latter was necessary to ensure causality of the optimal equalizers, as described in Sec. 2.5.1.


[Figure 4.13: Exemplary loudspeaker-to-microphone impulse response hµ,λ(k) used for the LRE evaluation (amplitude over samples)]

If not stated otherwise, the following parameters have been chosen for system identification using the GFDAF algorithm. The frame shift for the simulations was chosen to be equal to the filter length of the LEMS model, i. e., LF = LH = 128. Due to the strong interaction between system identification and equalizer determination, more conservative parameters had to be chosen in comparison to the AEC experiments. Thus, the parameters chosen for system identification were λsi = 0.98, γsi = 0.05, µsi = 2, while ssi = 10^5 was kept as it already constitutes a conservative choice. The adaptation of ∼h(n) was held during the first five iterations (n = 0, 1, . . . , 4) in order to assure initial convergence of the power spectral density (PSD) estimate S(sp)XX(n) before adaptation.

For the equalizer determination, the FxGFDAF and the IDI algorithms have been used.

Furthermore, a least-squares approach by solving (4.52) has been applied for comparison whenever the approximative LEMS model reduced the computational effort sufficiently to make this feasible. The latter approach does not use an approximation in the temporal domain, as is the case for the other evaluated algorithms.

The FxGFDAF algorithm was excited by white noise signals, which has been reported to increase the convergence speed [GKMK08]. Since the LEMS is modeled by a finite impulse response (FIR) filter, a perfect equalizer would be of infinite length (see Sec. 2.5.1). As only adaptive FIR filters are considered in this thesis, the length LG of the equalizers was chosen to be twice the length of the LEMS model filters, i. e., 256 samples, to account for this mismatch. The “forgetting factor” was set to λeq = 0.98, while a regularization weight larger than that used for system identification had to be chosen with γeq = 0.1. This is because the variations in ∼h(n) strongly influence S(sp)ZZ(n) and the resulting properties of S(sp)ZZ(n) are very difficult to predict. Like for system identification, µeq = 2 and seq = 10^5 were chosen, where the adaptation of the equalizers also started at n = 6.

For the IDI algorithm, the same parameters were used with the exception of γeq that has been set to 4. This very strong regularization is necessary since, otherwise, the large adaptation steps resulting from this algorithm make the LRE prone to instabilities. Note that the “forgetting factor” λeq has no meaning for the IDI algorithm.

4.6.2 Considered measures

In this section, measures used to assess the LRE performance for the following experiments are discussed.

Like for AEC, the normalized misalignment is used to assess the system identification performance. Since the ERLE has no meaning for an LRE system, it will not be used to assess the achieved LRE. However, a similar measure, such as the normalized a posteriori error of the adaptation algorithms, gives insight into the convergence state of the adaptive filters. The normalized a posteriori errors for system identification and equalizer determination are given by
\[
\tilde{e}_{si}(n) = 20 \log_{10} \frac{\| \tilde{\mathbf{e}}'_{si}(k) \|_2}{\| \tilde{\mathbf{d}}(k) \|_2} \ \mathrm{dB}, \tag{4.127}
\]
\[
\tilde{e}_{fx}(n) = 20 \log_{10} \left( \frac{\| \mathbf{e}(n) \|_2}{\| \mathbf{z}_0(k) \|_2} \right) \mathrm{dB}, \tag{4.128}
\]
respectively. Note that the definition of ∼esi(n) is identical to the ERLE up to a change in the sign.

To measure the LRE achieved at the positions of the three microphone arrays, three LRE errors can be defined:

\[
e_{M}(n) = 20 \log_{10} \frac{\big\| \big( \mathbf{H} \mathbf{T}_{L} \tilde{\mathbf{G}}(n) \mathbf{T}_{L} - \mathbf{H}_0 \big) \mathbf{x}(k) \big\|_2}{\big\| \big( \mathbf{H} \mathbf{T}_{L} \mathbf{G}_0 \mathbf{T}_{L} - \mathbf{H}_0 \big) \mathbf{x}(k) \big\|_2} \ \mathrm{dB}, \tag{4.129}
\]
\[
e_{I}(n) = 20 \log_{10} \frac{\big\| \big( \mathbf{H}_{I} \mathbf{T}_{L} \tilde{\mathbf{G}}(n) \mathbf{T}_{L} - \mathbf{H}_{0I} \big) \mathbf{x}(k) \big\|_2}{\big\| \big( \mathbf{H}_{I} \mathbf{T}_{L} \mathbf{G}_0 \mathbf{T}_{L} - \mathbf{H}_{0I} \big) \mathbf{x}(k) \big\|_2} \ \mathrm{dB}, \tag{4.130}
\]
\[
e_{E}(n) = 20 \log_{10} \frac{\big\| \big( \mathbf{H}_{E} \mathbf{T}_{L} \tilde{\mathbf{G}}(n) \mathbf{T}_{L} - \mathbf{H}_{0E} \big) \mathbf{x}(k) \big\|_2}{\big\| \big( \mathbf{H}_{E} \mathbf{T}_{L} \mathbf{G}_0 \mathbf{T}_{L} - \mathbf{H}_{0E} \big) \mathbf{x}(k) \big\|_2} \ \mathrm{dB}, \tag{4.131}
\]
where the free-field impulse responses for the microphone arrays with radii RI and RE are denoted by H0I and H0E, respectively. The matrix G0 describes a delay of the loudspeaker signals by ∆eq samples without altering them in any other way. Hence, the denominators in (4.129) to (4.131) represent the deviation from the desired microphone signal without equalization.

The LRE error measures defined above are used to assess the LRE performance for the currently reproduced acoustic scene. Like the ERLE for AEC, this is a measure that is highly relevant for practical applications, where the LRE error eI(n) in the listening area approximates the error perceived by the listener.

From (4.129) to (4.131), the measures
\[
E_{M}(n) = 20 \log_{10} \frac{\big\| \mathbf{H} \mathbf{T}_{L} \tilde{\mathbf{G}}(n) \mathbf{T}_{L} - \mathbf{H}_0 \big\|_{F}}{\big\| \mathbf{H} \mathbf{T}_{L} \mathbf{G}_0 \mathbf{T}_{L} - \mathbf{H}_0 \big\|_{F}} \ \mathrm{dB}, \tag{4.132}
\]
\[
E_{I}(n) = 20 \log_{10} \frac{\big\| \mathbf{H}_{I} \mathbf{T}_{L} \tilde{\mathbf{G}}(n) \mathbf{T}_{L} - \mathbf{H}_{0I} \big\|_{F}}{\big\| \mathbf{H}_{I} \mathbf{T}_{L} \mathbf{G}_0 \mathbf{T}_{L} - \mathbf{H}_{0I} \big\|_{F}} \ \mathrm{dB}, \tag{4.133}
\]
\[
E_{E}(n) = 20 \log_{10} \frac{\big\| \mathbf{H}_{E} \mathbf{T}_{L} \tilde{\mathbf{G}}(n) \mathbf{T}_{L} - \mathbf{H}_{0E} \big\|_{F}}{\big\| \mathbf{H}_{E} \mathbf{T}_{L} \mathbf{G}_0 \mathbf{T}_{L} - \mathbf{H}_{0E} \big\|_{F}} \ \mathrm{dB}, \tag{4.134}
\]

can be derived, respectively, which assess the absolute equalization facilitated by the equalizers. The meaning of these measures is comparable to the meaning of the normalized misalignment for AEC: while minimizing them remains the ultimate goal for LRE, they will typically underestimate the equalization facilitated for the actually reproduced acoustic scene.
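The logarithmic measures above all share the same form; the following sketch (not from the thesis; all matrix names in the comments are only placeholders) shows how such a normalized error in dB would be evaluated:

```python
import numpy as np

def error_db(num, den):
    """20*log10 of a norm ratio, as used in (4.127)-(4.134)."""
    return 20.0 * np.log10(np.linalg.norm(num) / np.linalg.norm(den))

# e.g., normalized a posteriori equalizer error, cf. (4.128):
#   e_fx = error_db(z0 - Z @ g, z0)
# or absolute equalization error at the optimization positions, cf. (4.132):
#   E_M = error_db(H @ T_L @ G_n @ T_L - H0, H @ T_L @ G0 @ T_L - H0)
```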

For illustration, the measures ∼esi(n), ∼efx(n), ∆h(n), eM(n), eI(n), eE(n), EM(n), EI(n), and EE(n) were determined for NG = 3 in an LRE experiment, with a timeline identical to the AEC experiment leading to Fig. 3.23. The results of the LRE experiment are shown in Figures 4.14 and 4.15. The upper plot of Fig. 4.14 shows ∼esi(n) and ∼efx(n), the second plot shows EM(n), EI(n), and EE(n), while the third plot shows ∆h(n). The activity of the individual plane waves is shown in the lowest plot, where a darker bar denotes a stronger source activity. As expected when considering its definition, the curve of ∼esi(n) is similar to the inverted curve of ERLE for AEC (Model 2 in Fig. 3.14). The slower convergence and the larger residual error can be attributed to the larger “forgetting factor” λsi and also to the strong interaction between system identification and equalizer determination. When considering ∼efx(n), it can be seen that the adaptive equalizer converges more slowly than the adaptive filter used for system identification, while an increase of ∼esi(n) also leads to an increase of ∼efx(n). The latter is an expected result, as both measures are proportional to the Euclidean length of the update of the respective adaptive filters, while each adaptation step of ∼h(n) is likely to imply a further adaptation of ∼g(n).

Like ∆h(n) in the AEC experiments, the absolute equalization as shown by EM(n), EI(n), and EE(n) in the second plot in Fig. 4.14 cannot be optimized directly. Thus, ∼esi(n) and ∼efx(n) are minimized instead, which can converge towards a lower value, while EM(n), EI(n), and EE(n) increase instead. This behavior can be clearly seen in the first five seconds of this experiment. Since the signals used to identify the LEMS are measured at the same positions as EM(n), it is slightly smaller than the two other errors. The relation of EM(n) to EI(n) and EE(n) will be explained in the following.


[Figure 4.14: Adaptive filter errors ∼esi(n) and ∼efx(n), absolute LRE errors EM(n), EI(n), EE(n), normalized misalignment ∆h(n) (all in dB), and source activity over time in seconds]

The definition of the circular harmonics (2.23) comprises a Bessel function, which exhibits a stronger temporal high-pass characteristic for the higher mode orders. Therefore, the higher-order modes are predominantly excited for greater radii ϱ [KSAJ07], such that the modes strongly excited at ϱ = RI are a proper subset of the modes strongly excited at ϱ = RM. Consequently, minimizing the error EM(n) for all modes excited at ϱ = RM also implies reducing EI(n), such that EI(n) is only slightly larger than EM(n). On the contrary, the higher-order modes excited at ϱ = RE are less strongly excited at the optimization positions. Hence, those modes are equalized to a smaller extent, which results in a larger EE(n) compared to EM(n).

The normalized misalignment ∆h(n) shown in the third plot of Fig. 4.14 exhibits a different behavior than the misalignment shown for AEC in Fig. 3.14. In the former figure, it can be seen that the interaction between system identification and equalizer determination can actually improve the system identification in the first part of the experiment. However, this interaction causes a significant divergence later, which is considered to be a major obstacle for implementing a real-world LRE system.

The measures eM(n), eI(n), and eE(n) obtained for the same experiment are shown in Fig. 4.15. For comparison, eM(n) is shown in both the first and the second plot, while eI(n) and eE(n) are shown in the first and the second plot, respectively. This was done because the considerable variance of those measures would otherwise severely degrade the readability of those plots. For the same reason, eM(n), eI(n), and eE(n) are shown in the third plot after passing a 20-tap median filter. The median filter was chosen because it preserves the slopes of sudden increases in eM(n), eI(n), and eE(n), which a low-pass filter cannot achieve. This type of presentation will be used in the following, as it allows for a concise comparison between different approaches or models. In doing so, only eI(n) and eM(n) are considered, since eE(n) does not provide significantly more information than eI(n), which is furthermore of primary interest. When comparing Fig. 4.14 to Fig. 4.15, it can be seen that the increases of eM(n), eI(n), and eE(n) are strongly correlated with those of ∼esi(n) and ∼efx(n). Furthermore, it can be seen that the signal-dependent LRE errors are much smaller than the absolute LRE errors discussed above, while the difference between the errors measured at the optimization positions and at the evaluation positions is even more pronounced.
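A minimal sketch of such a smoothing step is given below (illustrative only; the handling of the window at the signal start is an assumption). A sliding 20-tap median retains the steep slopes of sudden error increases that a low-pass filter would smear out.

```python
import numpy as np

def rolling_median(x, taps=20):
    """Causal sliding-window median (20 taps) used purely for plot smoothing."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    for n in range(len(x)):
        lo = max(0, n - taps + 1)          # window over the last `taps` samples
        out[n] = np.median(x[lo:n + 1])
    return out

# toy usage: a noisy error curve with a sudden 6 dB jump at n = 100
rng = np.random.default_rng(1)
e = -10 + rng.standard_normal(200)
e[100:] += 6
e_smooth = rolling_median(e)               # jump edge remains steep
```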

4.6.3 Time-varying scenarios

In this section, results for LRE experiments considering time-variant acoustic scenes are presented. The scene is identical to the one used to obtain the results shown in Fig. 3.23, where four plane waves are first alternatingly and then simultaneously synthesized. In the first half of this section, the LEMS MIMO impulse response is assumed to be known and only the algorithms for the equalizer determination are evaluated. In the second half of this section, ∼h(n) is obtained by system identification and the equalizers are determined only by using this information.

Since the adaptation algorithms considered for equalizer determination are independent of the reproduction signals, the results presented first allow for an assessment of the convergence of the algorithms under optimal conditions. Moreover, insight into the influence of the reproduced scene on the relation of the values achieved for eM(n) and for EM(n) can be obtained.

In Fig. 4.16, the results for scalable wave-domain equalizers are shown, where five approximative equalizer structures with NG = 1, 3, 5, 11, 17 were considered and the equalizer coefficients represent a least-squares optimal solution. The choice of NG = 1, 3, 5, 11 was motivated by the fact that the differences in LRE performance for up to four synthesized plane waves are most noticeable for low values of NG, while NG = 17 represents the least restricted equalizer structure for which coefficients could be determined. This restriction was posed by memory limitations of the workstation used for the computations.

The first plot shows eM(n), while eI(n) and EM(n) are shown in the second and third plot, respectively. The lowest plot again shows the source activity. The equalizers were



Figure 4.15: LRE errors shown with and without median filtering: eM(n) and eI(n) (first plot), eM(n) and eE(n) (second plot), median-filtered eM(n), eI(n), and eE(n) (third plot), and source activity (fourth plot) over time in seconds.

determined prior to the evaluation, such that no convergence phase of the adaptive filters can be observed. It can be seen in Fig. 4.16 that eM(n) and eI(n) decrease for increasing values of NG, where the difference between eM(n) and eI(n) increases for lower equalization errors. While the difference between eM(n) and eI(n) is barely noticeable for NG = 1, it grows to approximately 6 dB for NG = 17. From this, it can be concluded that a certain equalization must be achieved before the spatial robustness distinguishes the individual equalization approaches. Moreover, it can be seen that the optimality of the equalizers with respect to eM(n) and eI(n) is independent of the time-varying acoustic scene. This also holds for the transitions between the excitation of different plane waves. Since EM(n) is by definition independent of the acoustic scene, it appears as a constant value in the third plot of Fig. 4.16. It can be seen that EM(n) indicates a larger error than eM(n), although both should be equal for white noise excitation. The difference between both can be attributed to the fact that the magnitude response of the plane waves was optimized to



Figure 4.16: Scene-dependent and absolute LRE performance (eM(n), eI(n), and EM(n) for NG = 1, 3, 5, 11, 17) for a time-varying acoustic scene reproduced in a known LEMS for the evaluation of least-squares optimal equalizers.



be flat at the origin, while the errors are determined at different distances from the origin. Moreover, the fractional delay filters used to implement the synthesis are not perfect all-pass filters, but exhibit a slight low-pass characteristic. Hence, the signals carried by the plane waves are not perfectly white at the microphone positions.
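The low-pass character of fractional delay filters can be illustrated with a generic windowed-sinc design (a sketch only; the actual filter design used for the synthesis in this thesis may differ): compared to an integer delay, a half-sample delay shows a clear attenuation near the Nyquist frequency.

```python
import numpy as np
from scipy.signal import freqz

def frac_delay_fir(delay, num_taps=33):
    """Generic windowed-sinc fractional-delay FIR filter (illustrative design)."""
    n = np.arange(num_taps)
    h = np.sinc(n - (num_taps - 1) / 2 - delay)   # shifted sinc kernel
    h *= np.hamming(num_taps)                     # window to limit ripple
    return h / np.sum(h)

w, H_int = freqz(frac_delay_fir(0.0), worN=1024)   # integer delay
_, H_half = freqz(frac_delay_fir(0.5), worN=1024)  # half-sample delay

idx = int(0.9 * len(w)) - 1                        # frequency close to Nyquist
print("relative attenuation near Nyquist:",
      f"{20 * np.log10(abs(H_half[idx]) / abs(H_int[idx])):.1f} dB")
```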

The experiment discussed above was repeated using the FxGFDAF algorithm, where the computational efficiency of this algorithm allowed for an evaluation of NG = 48 instead of NG = 17. Although the equalizer determination is independent of the reproduction signals, a phase of convergence is visible, primarily at the beginning of the experiment. Figure 4.17 shows only the first 20 seconds of this experiment, such that the convergence phase can be seen more clearly. This choice is justified above, where it is shown that the time-variance of the acoustic scene is irrelevant to the values obtained for eM(n) and eI(n). In accordance with Fig. 4.16, eM(n) and EM(n) show a linear relationship on average in Fig. 4.17, such that the following discussion applies to signal-dependent and absolute LRE error measures alike. Furthermore, it can be seen that larger values of NG increase the time needed to reach the steady state, while the initial convergence speed slightly increases with growing NG. The longer time span needed to reach the steady state must therefore be attributed to a further optimization of the equalizers and not to an adaptation hampered by the large number of degrees of freedom. When comparing Fig. 4.17 to Fig. 4.16, it can be seen that the FxGFDAF algorithm reaches approximately the same LRE performance as the least-squares optimal equalizers for all considered equalizer structures. Interestingly, NG = 48 only leads to a reduced eM(n), while eI(n) is not lower than for the equalizers with NG = 17 in Fig. 4.16. From this it can be concluded that increasing NG above a certain threshold will not necessarily improve the equalization in the listening area. Again, the optimality of the equalizers with respect to the acoustic scene does not vary with the scene itself.

For completeness, the experiments conducted for Fig. 4.17 were repeated for an evaluation of the IDI algorithm. As the results shown in Fig. 4.18 lead to exactly the same insights as those shown in Fig. 4.17, a detailed discussion is omitted. Nevertheless, it should be noted that the IDI algorithm shows a significantly faster convergence than the FxGFDAF algorithm.

Since the MIMO impulse response of an LEMS is generally unknown in real-world LRE scenarios, it has to be estimated by means of system identification algorithms. In the following, experimental results are provided for this situation. Evaluating the combination of a system identification using the GFDAF algorithm with least-squares optimal equalizers would provide insight into the influence of system identification on the LRE performance, separated from the convergence behavior of the adaptive equalizers. However, such experiments were not conducted due to the prohibitively high computational cost.

In the following, results for LRE based on an estimated LEMS are discussed, where results for the conventional versions of both GFDAF algorithms and the IDI algorithm are presented first. Then, results for the cost-guided versions of both GFDAF algorithms are presented. The results presented in Fig. 4.19 were obtained using the conventional



Figure 4.17: Scene-dependent and absolute LRE performance (eM(n), eI(n), and EM(n) for NG = 1, 3, 5, 11, 48) for a time-varying acoustic scene reproduced in a known LEMS for the evaluation of the FxGFDAF algorithm.



Figure 4.18: Scene-dependent and absolute LRE performance (eM(n), eI(n), and EM(n) for NG = 1, 3, 5, 11, 48) for a time-varying acoustic scene reproduced in a known LEMS for the evaluation of the IDI algorithm.

GFDAF algorithm for system identification and the conventional FxGFDAF algorithm for equalizer determination. Unlike Figures 4.16 to 4.18, the third plot of Fig. 4.19 shows the normalized misalignment ∆h(n), which provides insight into the interaction between system identification and equalizer determination. The corresponding plots in the following figures show the same measures.

The results obtained for equalizing an estimated LEMS differ significantly from those discussed above, as both adaptive filters have to converge in order to achieve LRE. This also results in an initial divergence of the LRE during the first part of the experiment, which is more pronounced for the models using lower values of NG. The results show a significant variation over time, such that eM(n) and eI(n) lead to a different ranking of the LRE performance for the individual equalizer structures, depending on the considered time instant. Still, a general assessment of the approaches is possible at the end of each



experiment, where the LRE errors have reached a steady state while multiple active sources challenge the acoustic models. It can be seen that the wave-domain LRE with NG = 1 shows the worst LRE performance, similar to the AEC with NH = 1. Unsurprisingly, the approach using NG = 48 shows the best performance, where it should be noted that the tremendous computational demands of this equalizer structure preclude a real-time implementation on commercially available hardware in the near future (as of 2015). While the LRE performance of the approaches using NG = 3, 5, 11 lies in between the performance achieved with NG = 1 and NG = 48, the LRE errors obtained for NG = 3, 5, 11 do not allow for clearly distinguishing those approaches. As for the results presented in Figures 4.16 to 4.18, eI(n) is larger than eM(n), where this difference is more evident for larger NG. When considering ∆h(n) in the third plot of Fig. 4.19, it can be seen that the approximative models result in a significant divergence of the system identification, which is more pronounced for a larger degree of approximation. This is not observed for the general LRE structure with NG = 48 and constitutes one of the challenges to overcome for those approximative models.

For an evaluation of the IDI algorithm, the experiment used to obtain the results presented in Fig. 4.19 was modified by replacing the FxGFDAF algorithm with the IDI algorithm. The results are shown in Fig. 4.20, where the faster adaptation of the IDI algorithm leads to a more pronounced divergence at the beginning of the experiment. On the other hand, the convergence speed later in the experiment is also significantly increased. At the end of the experiment, the achieved values for eM(n) and eI(n) are slightly lower than those shown in Fig. 4.19, with the exception of eI(n) for NG = 48. Unlike for the approach evaluated above, the values of eI(n) obtained for the LRE structures with NG > 1 exhibit a lower bound of approximately −10 dB. This behavior can be attributed to the specific combination of adaptation algorithms used for this experiment, where the fast convergence of the IDI algorithm leads to a fast change of the loudspeaker signal correlation properties governing the system identification. Still, multifaceted experiments exceeding the scope of this thesis would be necessary to obtain a clear insight into this relationship. As for the previous experiment, the system identification diverges for the structures with NG < 48, where all approximative equalizer structures lead to a similar misalignment.

In Sections 3.4.4 and 4.5.2, modified versions of the GFDAF and the FxGFDAF algorithms have been presented, respectively. The cost-guided GFDAF algorithm has been shown to improve the system identification (see Sec. 3.5.4), which can also be beneficial for LRE based on an estimated LEMS. The rationale behind using the cost-guided FxGFDAF algorithm for equalizer determination is that optimal equalizers for a known LEMS are diagonally dominant in the wave domain. Thus, enforcing such a structure when determining equalizers for an estimated LEMS can also alleviate misconvergence resulting from uncertainties in the LEMS estimation. In the following, the cost-guided versions of both GFDAF algorithms are first evaluated separately, before an experiment combining both modified algorithms is presented. As a first scenario, the experiment described above was also conducted with the cost-guided GFDAF algorithm for system identification, combined with the FxGFDAF algorithm to determine the equalizers.



Figure 4.19: Scene-dependent LRE performance (eM(n), eI(n)) and system misalignment ∆h(n) for a time-varying acoustic scene with equalizers determined with the FxGFDAF algorithm operating on an LEMS estimated with the GFDAF algorithm (NG = 1, 3, 5, 11, 48).



Figure 4.20: Scene-dependent LRE performance (eM(n), eI(n)) and system misalignment ∆h(n) for a time-varying acoustic scene with equalizers determined with the IDI algorithm operating on an estimated LEMS (NG = 1, 3, 5, 11, 48).



Like for the AEC evaluation, ∼C(n) was chosen according to (3.166) with β1 = 0.01 and β2 = 0.1. However, unlike for AEC, an increase of the a priori error ∼e′si(k) is not problematic for LRE, whereas a good system identification is more important. Hence, wc(n) in (3.166) was set to one, as no balance between improving the system identification and increasing the a priori error has to be achieved. The choice of βsi = 0.1 will be motivated later.

The results for this experiment are presented in Fig. 4.21. Surprisingly, ∆h(n) could not be decreased by choosing this algorithm, as can be seen from the third plot. Still, when comparing eM(n) and eI(n) in the first two plots with those of Figures 4.19 and 4.20, the benefit of this algorithm for LRE can be clearly seen, since both LRE errors are noticeably reduced. Furthermore, the achieved values of eM(n) and eI(n) show a more predictable behavior, i.e., the LRE performance steadily increases with growing NG at most time instants during the experiment. Obviously, the normalized misalignment only allows for a limited indication of the system identification quality with respect to LRE. Otherwise, large values of ∆h(n) would also imply large values of eM(n) and eI(n), which is not the case.

While the white noise excitation signals chosen for equalizer determination cannot be the cause of nonunique optimal equalizers (see Sec. 4.3), an ambiguous system identification will generally lead to nonunique optimal equalizers (see Sec. 4.4). As the divergence of the system identification can be attributed to the interaction with the iterative equalizer determination, another approach can be followed to stabilize the LRE system: as described in Sec. 4.2, unique equalizers will exhibit a stronger weight for certain wave-domain couplings. Hence, the cost-guided FxGFDAF algorithm can also improve the equalizer determination. In Fig. 4.22, results of the experiment described above, conducted with the modified FxGFDAF algorithm, are presented, where β1 = 0.01, β2 = 0.1, and βeq = 0.01 have been chosen in (3.165) and (3.166).² The choice of βeq will be motivated in the following section. When comparing Fig. 4.21 to Fig. 4.22, it can be seen that the effect of this approach is very similar to using the cost-guided GFDAF algorithm for system identification. The only noticeable difference is that, when the cost-guided GFDAF algorithm is used for system identification, the normalized misalignment for NG = 11 is reduced. Since this experiment does not allow for more general conclusions, a more detailed discussion is omitted for the sake of brevity.

An obvious option to further improve the LRE performance is to combine the cost-guided GFDAF algorithm for system identification with the cost-guided FxGFDAF algorithm for equalizer determination. This combination was used to obtain the results shown in Fig. 4.23. Unfortunately, it turns out that this only leads to a small improvement in eM(n), compared to using the cost-guided GFDAF algorithm for system identification alone. From the comparison of Figures 4.21 to 4.23 with Fig. 4.19 it can be concluded that using the cost-guided GFDAF algorithm variants for either of the two tasks, system identification or equalizer determination, improves the performance of an LRE system.

² Note that βeq is represented as βsi in (3.166).



Figure 4.21: Scene-dependent LRE performance (eM(n), eI(n)) and system misalignment ∆h(n) for a time-varying acoustic scene with equalizers determined with the FxGFDAF algorithm operating on an LEMS estimated with the cost-guided GFDAF algorithm (NG = 1, 3, 5, 11, 48).



Figure 4.22: Scene-dependent LRE performance (eM(n), eI(n)) and system misalignment ∆h(n) for a time-varying acoustic scene with equalizers determined with the cost-guided FxGFDAF algorithm operating on an estimated LEMS (NG = 1, 3, 5, 11, 48).



A combination of both provided no further improvements, but also no noticeable disadvantages.



Figure 4.23: Scene-dependent LRE performance (eM(n), eI(n)) and system misalignment ∆h(n) for a time-varying acoustic scene with equalizers determined with the cost-guided FxGFDAF algorithm operating on an LEMS estimated with the cost-guided GFDAF algorithm (NG = 1, 3, 5, 11, 48).



Figure 4.24: Scene-dependent LRE performance (eM(n) and eI(n) for NS = 1, 3) of least-squares optimal equalizers determined for a known LEMS as a function of NG; larger values of NG were not evaluated due to computational constraints.

4.6.4 Stationary scenarios

In the previous section, the convergence behavior of different wave-domain equalizer structures in combination with different adaptation algorithms has been investigated. In this section, these results are complemented by an assessment of the LRE performance for a broader range of approximative LRE structures. To this end, another experiment is considered, where NS sources are continuously and simultaneously reproduced and eM(n), eI(n), and ∆h(n) are measured for the assessment. The experiments in this section are split into two parts: in the first part, three experiments (Figures 4.24 to 4.26) are presented, where the LEMS is known such that the properties of the approximative equalizer structure and the algorithms for equalizer determination can be assessed. In the second part (Figures 4.27 to 4.33), the equalizers are determined for an estimated LEMS to investigate the mutual influence of system identification and equalizer determination. In all cases, the averages of eM(n) and eI(n) over the last ten block time instants (indexed by n) of an experiment lasting 45 seconds are considered.

In Fig. 4.24, the LRE performance of least-squares optimal equalizers is shown, which have been obtained by a direct solution of a system of linear equations. Note that for this non-iterative approach, the convergence over time has no meaning. Since this approach implies a much higher computational cost compared to using the FxGFDAF algorithm or the IDI algorithm, only models up to NG = 17 could be evaluated. It can be seen that the LRE error eM(n) decreases proportionally to 1/NG. The LRE error eI(n), which is measured in the listening area, decreases slightly more slowly than 1/NG, where eI(n) is



Figure 4.25: Scene-dependent LRE performance (eM(n) and eI(n) for NS = 1, 3) of equalizers determined for a known LEMS using the FxGFDAF algorithm as a function of NG.

approximately 6 dB higher than eM(n) for NG = 17. The obtained measures are not significantly influenced by the number NS of simultaneously reproduced sources.
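For orientation, a minimal sketch of the direct least-squares solve mentioned above is given below. The matrix A and the target d are placeholders standing in for the stacked convolution matrices of the (estimated) LEMS responses and the desired delayed responses, respectively; the regularization constant is an assumption for numerical robustness, not a parameter from the thesis.

```python
import numpy as np

# Hypothetical sketch of a direct least-squares equalizer solve:
# minimize || A g - d ||^2, where A stacks convolution matrices of the
# (estimated) LEMS responses and d holds the desired delayed responses.
rng = np.random.default_rng(2)
A = rng.standard_normal((400, 120))          # placeholder system matrix
d = rng.standard_normal(400)                 # placeholder target

reg = 1e-6                                   # small regularizer (assumption)
g = np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]), A.T @ d)

residual_db = 10 * np.log10(np.sum((A @ g - d) ** 2) / np.sum(d ** 2))
print(f"relative residual: {residual_db:.1f} dB")
```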

A more efficient way to determine the equalizers is to use the FxGFDAF algorithm instead of the least-squares method, which circumvents the computational constraints of the method considered above. As can be seen from the results shown in Fig. 4.25, the FxGFDAF algorithm almost achieves the same LRE performance as the least-squares optimal equalizers, which could only be determined for NG ≤ 17. It can furthermore be seen that the LRE error eM(n) decreases significantly more slowly than 1/NG for larger NG, before it reaches −27 dB for NG = 48. This can be explained by the fact that the excitation of higher-order modes is weaker for a given frequency and a given ϱ (see (2.23)). Hence, their contribution to the LRE error is lower, such that their equalization is less important. Moreover, eI(n) does not decrease any further for NG > 17, which is in accordance with this interpretation. Previous observations [KSAJ07, WA01] suggest that the wave field for ϱ < 0.5 m and frequencies below 1 kHz is predominantly described by modes below an order of ten, or 21 wave-field components in the wave-domain signals. Thus, the models for NG ≥ 21 already describe the coupling of all modes with significant influence on the wave field reproduced in the listening area. The same results have been obtained when replacing the FxGFDAF algorithm by the IDI algorithm, as can be read from Fig. 4.26, where eI(n) is slightly increased in comparison to Fig. 4.25.
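This count is consistent with the common rule of thumb that a wave field within radius ϱ and up to frequency f is essentially described by circular-harmonic orders |m| ≲ kϱ. A quick check of the numbers cited above (the rule of thumb itself is a standard approximation, not taken from the thesis):

```python
import numpy as np

c = 343.0         # speed of sound in m/s
f = 1000.0        # upper frequency of interest in Hz
rho = 0.5         # radius of the region of interest in m

m_max = int(np.ceil(2 * np.pi * f * rho / c))   # rule of thumb |m| <= k*rho
num_components = 2 * m_max + 1                  # orders -m_max ... +m_max
print(m_max, num_components)                    # -> 10, 21
```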

Like in Sec. 4.6.3, the experiments shown above have been conducted for an LEMS which has been identified by an adaptive filter observing the pre-equalized loudspeaker signals and the resulting microphone signals. The results of the experiments evaluating



Figure 4.26: Scene-dependent LRE performance (eM(n) and eI(n) for NS = 1, 3) of equalizers determined for a known LEMS using the IDI algorithm as a function of NG.

the original GFDAF algorithm for system identification and the FxGFDAF algorithm for equalizer determination are shown in Fig. 4.27, where NH = NG was chosen. There, different acoustic scenes have been considered, where NS = 1, 3, 8, 20, 40 sources have been simultaneously active.

Although the equalizer determination is independent of the reproduced signals, it can be clearly seen that the reproduced scene, through its effect on the estimated LEMS, influences the LRE performance significantly: the system identification performance increases for larger values of NS. While this behavior has already been observed for AEC, it is now accompanied by a better LRE performance for larger values of NG. With growing NS, the LRE performance approaches the values that were achieved for the known LEMS in the experiments discussed earlier in this section. Similar to the AEC performance for low values of NS and NH, the LRE performance for lower numbers of simultaneously synthesized sources is better when relatively simple equalizer structures and LEMS models are used. At the same time, the larger nullspace impairs the LRE for equalizer structures with larger values of NG when only a low number of sources is synthesized. This suggests that approximative wave-domain models should be matched to the reproduced acoustic scene and, furthermore, emphasizes the importance of model scalability. Moreover, a dynamic adaptation of the approximative wave-domain LEMS model can be one research avenue for the further development of WDAF. Such an adaptation could be based on the number of reproduced (independent) acoustic sources, which might have to be estimated from the unequalized loudspeaker signals.

In Fig. 4.28 the experiment shown in Fig. 4.27 was repeated, where the IDI algorithm



Figure 4.27: Scene-dependent LRE performance (eM(n), eI(n)) and system misalignment ∆h(n) as a function of NG of equalizers determined using the FxGFDAF algorithm for an LEMS estimated with the GFDAF algorithm (NS = 1, 3, 8, 20, 40).

replaced the FxGFDAF algorithm and only NS = 1, 3 were considered. It can be seen that using this algorithm leads to a slightly degraded LRE performance compared to the FxGFDAF algorithm, which verifies the findings of the previous section.

To assess the influence of system identification with the cost-guided GFDAF algorithm on LRE, experiments with three active plane waves (NS = 3) have been conducted, where βsi was varied while all other parameters were chosen as in the previous section. It can be seen from the third plot in Fig. 4.29 that increasing values of βsi lead to a better system identification for NG < 21. This in turn leads to an improved LRE performance, as can be read from the first and the second plot of Fig. 4.29. In particular, the increase of eM(n) and eI(n) in the range of 7 ≤ NG ≤ 21 can be reduced when using the cost-guided GFDAF algorithm for system identification. Nevertheless, the second plot shows that eI(n) is increased for 21 ≤ NG when using a larger weight βsi. Finally, choosing larger



Figure 4.28: Scene-dependent LRE performance (eM(n), eI(n)) and system misalignment ∆h(n) as a function of NG of equalizers determined using the IDI algorithm for an LEMS estimated with the GFDAF algorithm (NS = 1, 3).

values of βsi is suitable in time-varying scenarios (see Sec. 4.6.3) and for low values of NG, which will be a typical case in real-world scenarios.

As discussed in the previous section, using the cost-guided FxGFDAF algorithm is another approach suitable to increase the LRE performance in underdetermined scenarios. The LRE performance achieved with this approach is shown in Fig. 4.30, where the same scenario as for Fig. 4.29 has been considered. Similar to the cost-guided GFDAF algorithm (for system identification), the cost-guided FxGFDAF algorithm can be used to decrease eM(n) and eI(n), although its influence is weaker. Furthermore, it can be seen that βeq = 0.01 constitutes the most suitable choice for the considered scenarios. Unlike when using the cost-guided GFDAF algorithm for system identification, βeq has to be chosen more carefully when using the cost-guided FxGFDAF algorithm for equalizer determination. This motivated the choice of βeq = 0.01 in Sec. 4.6.3.

Finally, an LRE using a combination of both modified algorithms has been evaluated,



Figure 4.29: Scene-dependent LRE performance (eM(n), eI(n)) and system misalignment ∆h(n) as a function of NG of equalizers determined using the FxGFDAF algorithm for an LEMS estimated with the cost-guided GFDAF algorithm (βsi = 10−3, 5·10−3, 10−2, 5·10−2, 10−1).

where βsi = 0.1 and βeq = 0.01 have been used. The results shown for NS = 1, 3 in Fig. 4.31 verify the findings of the previous section that the combination of both modified algorithms cannot improve the LRE performance any further.

Since the divergence of the system identification will be a problem relevant to real-world implementations of LRE, the normalized misalignment as a function of time and NG is discussed in the following. To this end, the results shown for NS = 3 in Fig. 4.27 are presented in Fig. 4.32 as a three-dimensional plot. It can be seen that structures with low values of NG not only exhibit an increasing misalignment near the end of the experiment, but also a steeper decrease of ∆h(n) at the beginning of the experiment. In the range of approximately 7 ≤ NG ≤ 17, a change in the behavior can be noticed, where this divergence ceases and the decrease of ∆h(n) at the beginning of the experiment is flatter, such that it almost forms a step.

To further describe the influence of the cost-guided GFDAF algorithm (for system



Figure 4.30: LRE performance (eM(n), eI(n)) and system misalignment ∆h(n) as a function of NG of equalizers determined using the cost-guided FxGFDAF algorithm for an LEMS estimated with the GFDAF algorithm (βeq = 10−3, 5·10−3, 10−2, 5·10−2, 10−1).

identification) on the experimental results shown in Fig. 4.29, the latter are shown as a three-dimensional plot in Fig. 4.33, also considering the case NS = 3. It can be seen that, although the application of the cost-guided GFDAF algorithm can slightly reduce the system misalignment, it does not change the general behavior of the adaptive filter. Still, it is expected that the cost-guided GFDAF algorithm will limit the divergence when considering long-term experiments, because it penalizes large filter coefficients, which would be obtained when using the original GFDAF algorithm for system identification.



Figure 4.31: Scene-dependent LRE performance (eM(n), eI(n)) and system misalignment ∆h(n) as a function of NG of equalizers determined using the cost-guided FxGFDAF algorithm for an LEMS estimated with the cost-guided GFDAF algorithm (NS = 1, 3).



Figure 4.32: System misalignment ∆h(n) as a function of time (in seconds) and NG for equalizers determined using the FxGFDAF algorithm for an LEMS estimated with the GFDAF algorithm.



Figure 4.33: System misalignment ∆h(n) as a function of time (in seconds) and NG for equalizers determined using the FxGFDAF algorithm for an LEMS estimated with the cost-guided GFDAF algorithm.



4.6.5 Evaluation summary

In this section, the results of Sections 4.6.3 and 4.6.4 are summarized.

• From Figures 4.16 to 4.18, it can be seen that when equalizers are determined for a known LEMS, the direction and number of synthesized plane waves do not influence the scene-dependent LRE errors eM(n) and eI(n).

• Furthermore, comparing Fig. 4.17 to Fig. 4.18 shows that the IDI algorithm converges significantly faster than the FxGFDAF algorithm.

• Figures 4.19 to 4.23 show that successful LRE for a time-varying acoustic scene based on a simultaneously estimated LEMS is possible.

• However, Figures 4.19 to 4.23 as well as Figures 4.27 to 4.33 show that the interaction between system identification and equalizer determination can lead to a divergence when approximative wave-domain filter structures are used. This divergence predominantly degrades the accuracy of the system identification, but also the achieved LRE performance. In that context, it can be seen from Fig. 4.20 that the faster convergence of the IDI algorithm can lead to a more pronounced divergence in some situations.

• Figures 4.21, 4.29, and 4.33 show that using the cost-guided GFDAF algorithm for system identification generally improves the LRE performance, while the accuracy of the system identification is only improved in some cases.

• Figures 4.22 and 4.30 show that using the cost-guided FxGFDAF algorithm for equalizer determination also improves the LRE performance, although this effect is slightly less pronounced than for cost-guided system identification. Moreover, the parameter choice for the cost-guided FxGFDAF algorithm (for equalizer determination) is more critical than for the cost-guided GFDAF algorithm (for system identification).

• From Figures 4.23 and 4.30 it can be seen that the combination of both cost-guided GFDAF algorithms provides similar improvements to LRE as using the cost-guided GFDAF algorithm for system identification alone.

• All experiments consistently showed that using an approximative wave-domain LEMS model in combination with an approximative wave-domain equalizer structure allows for effective LRE even with a moderate model complexity.

• The optimal choice of the model complexity for LRE based on an estimated LEMS depends on the reproduced scene.

Overall, the benefit of the models and methods presented in this thesis has been demonstrated. Moreover, the influence of essential model and algorithm parameters has been illustrated.



To approach real-world implementations of LRE systems, further investigations are still necessary, including the consideration of measured impulse responses and microphone noise.

4.7 Implementation of Listening Room Equalization

In this section, the computational advantages of using an approximative wave-domain equalizer structure compared to a conventional equalizer structure in the point-to-point domain are briefly discussed. To this end, the number of operations necessary for a single iteration of the equalizer determination is considered. Note that for real-world implementations of LRE, the LEMS has to be identified first, where the computational effort for this task was already discussed in Sec. 3.6.

First, the computational effort (measured as in Sec. 3.6) for determining equalizers without approximative models is shown in Fig. 4.34, where NM = NL, LD = LF = 512, LX = LH + LD − 1, LZ = LG + LD − 1, and LG = 8192 were chosen. It can be seen that the computational effort for the equalizer determination is significantly larger than the effort for system identification. Furthermore, it can be seen that determining the equalizers separately for each original loudspeaker signal leads to considerable computational savings, even without approximative models. Since the IDI algorithm avoids filtering the excitation signals, it has slightly lower computational demands than the FxGFDAF algorithm with separate equalizer determination.

In Fig. 4.35, the computational demands when using different wave-domain models are shown, where it was assumed that the approximative LEMS model and the equalizer structure are described by (3.34) and (4.31) with NH = NG. Since the IDI algorithm exhibits the lowest computational demands, it was chosen for this comparison. From the results, it can be seen that the computational savings obtained by using approximative models are larger than for AEC. Unlike for system identification, there is no redundancy in the matrix inversion used to determine the equalizers. This avoids an increased computational effort when switching from a conventional point-to-point model to an approximative model in the wave domain. Hence, using the latter approach for LRE is even more attractive than in the case of AEC.



Figure 4.34: Computational effort in floating-point operations (FLOPs) for equalizer determination per iteration (indexed by n) as a function of NL for the IDI algorithm, the FxGFDAF algorithm, and the FxGFDAF algorithm with separate equalizer determination.

Figure 4.35: Computational effort in FLOPs for equalizer determination per iteration (indexed by n) as a function of NL for the point-to-point model and for wave-domain models with NG = NH = 3, 5, 10, 20. Values for NH > NL are not shown because such models do not exist.



5 Summary and Conclusions

This thesis is concerned with the further development and investigation of wave-domain adaptive filtering (WDAF), which is a technique previously proposed to overcome the challenges of adaptive filtering for acoustic multiple-input multiple-output (MIMO) systems. Both aspects relevant to theoretical considerations and aspects relevant to practical implementations have been treated.

In Sections 2.3.2 and 2.3.3, the relation between the acoustic wave fields excited and measured within a loudspeaker-enclosure-microphone system (LEMS) and its signal-related properties as a MIMO system has been investigated on a theoretical basis. The wave-domain LEMS model could thereby be explicitly separated into parts describing scattering at the enclosure, wave propagation, and spatial sampling. From this, deeper insight into the resulting wave-domain properties of the LEMS could be obtained. This also allowed for predicting an efficient wave-domain LEMS model for linear array geometries located in rectangular rooms.

For setups comprising circular microphone arrays, transforms based on circular harmonics have been derived and analyzed in detail (see also [SK12b]). The derivation of these transforms was conducted in the continuous frequency domain in Sec. 2.4. To this end, an approach was chosen that allows for formulating inverse transforms independently of the number of microphones. This was not possible for the transforms that were originally derived for WDAF. Furthermore, the influence of the necessary approximations and of array positioning errors has been discussed to provide the basis for array geometry recommendations. In Sec. 2.5.3, a discrete time-domain implementation of these transforms was described.

Considering acoustic echo cancellation (AEC) as an exemplary application for system identification, it has been shown that the simple acoustic LEMS model originally proposed for WDAF is not suitable for AEC in many practically relevant reproduction scenarios. The generalized model presented in Sec. 3.2 of this thesis along with [SK11], however, allows the number of degrees of freedom to be freely adjusted to the accuracy necessary for the application or matched to the available computing power. The latter is a crucial aspect, as a large number of loudspeaker channels often precludes a real-time implementation, while the presented approximative model allowed for an implementation of AEC for 48 loudspeaker signals on a conventional state-of-the-art desktop computer (see Sec. 3.6).

Disregarding computational demands, the nonuniqueness of system identification with typical loudspeaker signals is another fundamental problem for multichannel AEC. Conditions for the occurrence of this problem have been derived with respect to the reproduction system parameters and adaptive filter parameters in Sec. 3.3. This analysis was



not available in that depth before. Moreover, it was shown that this problem originates from the combination of the rendering system and the LEMS and will generally occur independently of the order of the signal statistics that is considered for system identification. It has also been shown that approximative wave-domain models allow for an effective mitigation of the problem, while reducing the filter length does not. State-of-the-art remedies against the nonuniqueness problem suffer from two shortcomings when aiming at high-quality massive multichannel reproduction: a) most approaches were neither proposed nor investigated for a large number of reproduction channels, and b) their application will generally modify the loudspeaker signals, which potentially degrades the reproduction quality. The remedy presented in Sec. 3.3.5 of this thesis is intrinsically formulated for MIMO scenarios, while any influence on the reproduced loudspeaker signals can be excluded.

For the implementation of WDAF, four well-known adaptation algorithms for MIMO filters have been treated in Sec. 3.4. While the least mean squares (LMS) algorithm, the affine projection algorithm (APA), and the recursive least squares (RLS) algorithm are mainly presented for comparison purposes, the generalized frequency-domain adaptive filtering (GFDAF) algorithm is the algorithm that is primarily under consideration in this thesis. Following a derivation different from the originally presented one, the GFDAF algorithm was identified to be an approximation of the RLS algorithm. A simple but effective regularization for the GFDAF algorithm has been shown to also provide a link to the LMS algorithm. Moreover, novel variants of these algorithms have been presented that can use approximative LEMS models and provide the cost-guided system identification that is necessary to implement the approach presented in Sec. 3.3.5.

The system identification necessary for AEC is also a building block for listening room equalization (LRE), which was considered as a second application example in this thesis. The approximative wave-domain LEMS model originally proposed for WDAF implied a simple equalizer structure, which is not the case for the generalized models considered in this thesis. However, it was shown in [SK12a] that general equalizers can be successfully approximated by a structure similar to the generalized approximative LEMS model. This approach has been discussed in Sec. 4.2 of this thesis, while the simulation results were substantially extended in Sec. 4.6. Following this approach, the applicability of WDAF could be extended from the reproduction of scenes with a single active source to scenes with multiple active sources, as included in typically reproduced scenes.

By restating the task of equalizing an acoustic scene, it could be shown for the first time in Sec. 4.3 that the nonuniqueness problem also applies to finding optimal equalizers. Still, for LRE this problem does not necessarily originate only from the loudspeaker signal properties, but also from the properties of the LEMS. For the implementation, the equalization of a reproduced scene has been related to determining equalizers for a previously identified LEMS. The equalizers can then be determined using supervised adaptive filtering algorithms in the so-called filtered-x structure. As an alternative to this approach, the iterative DFT-domain inversion (IDI) algorithm has been derived (see also [SK12c]), which considers the relevant impulse response directly instead of operating on filtered signals.



An LRE system with equalizers determined for an estimated LEMS was evaluated in Sec. 4.6 as an example of a successful application of the presented models and methods in a very challenging scenario.

In summary, the novel contributions contained in this thesis and the publications that go along with it are:

• A complete and explicit framework description for the implementation of AEC and LRE using WDAF

• A generalization of the originally proposed wave-domain LEMS model and equalizer structures, which allows the model complexity to be adjusted to the needs of the considered application or to the given computational constraints

• A formulation of the LMS algorithm, the APA, the RLS algorithm, and the GFDAF algorithm for approximative wave-domain filter structures and cost-guided system identification

• A rigorous derivation of the IDI algorithm

• A successful application of the generalized approximative wave-domain filter structures to reduce the computational effort of AEC and LRE

• A novel cost-guided system identification, which has been shown to increase the robustness of AEC and LRE

• An in-depth analysis of the nonuniqueness problem for system identification, which allowed a link to the same problem in LRE to be established

• An extended experimental evaluation of an LRE system, where the equalizers were determined from an estimated LEMS

The obtained results suggest that there is a great potential for the application of WDAF in real-world application scenarios, while some crucial aspects of WDAF still call for more research. Some future research avenues are the following:

• An analysis of the interaction between equalizer optimization and system identification

• A dynamic adaptation of approximative wave-domain models and the parameters for cost-guided adaptation

• An application of the novel concepts described in this thesis to active noise control (ANC) using WDAF

• An application of WDAF to further array geometries

• The implementation of real-world real-time WDAF LRE systems



The mutual influence of system identification and equalizer optimization has been identified as a problem in this thesis, although no analysis was provided. However, a rigorous analysis would be a prerequisite for an optimal solution to this problem. It was observed that the nonuniqueness of the LEMS identification propagates to the determination of the equalizers, even if the reproduction signals themselves are not considered for the equalizer determination. So far, a mathematical description of this effect is missing.

This thesis was concerned with static approximative wave-domain LEMS models and equalizer structures. At the same time, the evaluation results showed that an optimal choice for the complexity of these structures depends on the reproduced acoustic scene. Since a reproduced acoustic scene will often be time-varying in real-world scenarios, it is obvious that an adaptive choice of the approximative wave-domain LEMS models and equalizer structures can be beneficial for real-world systems.

Previously, an ANC system using WDAF has been proposed using the original approximative wave-domain LEMS model. It is expected that this ANC system suffers from the same limitations as AEC and LRE when using the original wave-domain LEMS model. Hence, the generalized wave-domain LEMS model proposed in this thesis could also be applied to this application.

As stated in this thesis, the application of WDAF to linear transducer arrays in rectangular rooms implies the use of differently structured wave-domain models. Those models have not yet been investigated to a greater extent.

Finally, the implementation of a real-world real-time LRE system remains one of the most desirable goals for WDAF. While many of the contributions of this thesis can help to achieve this, there is still a considerable number of challenges to master. For example, results of simulation experiments with measured impulse responses are still missing, which would provide decisive information for the choice of the considered models.



A Transforms Based on Spherical Harmonics

In this appendix, an alternative wave-domain transform for a circular microphone array and an arbitrary loudspeaker array is derived. Unlike the loudspeaker signal transform (LST) described in Sec. 2.4.1, this transform allows for a free positioning of the loudspeakers in three-dimensional space and does not approximate the loudspeaker contributions as plane waves. The derivation starts with a wave field description in terms of spherical harmonics, which is then projected onto the x-y-plane.

A.1 Microphone Signal Transform

For the derivation of the microphone signal transform (MST), (2.86) is sampled in the x-y-plane (ϑ = π/2), resulting in
$$C_m^{(\mathrm{se})}(\omega) = \frac{1}{b_n(\tilde{k}r)} \int_0^{2\pi} P(\vec{x},\omega)\Big|_{r=r_0}\, e^{-j\alpha m}\,\mathrm{d}\alpha. \tag{A.1}$$
Considering the sampling at the microphone positions as described in (2.89), (A.1) has to be approximated by a sum according to
$$C_m^{(\mathrm{se})}(\omega) \approx \frac{1}{N_\mathrm{M}} \sum_{\mu=1}^{N_\mathrm{M}} \frac{D_\mu(\omega)}{b_n(\tilde{k}R_\mathrm{M})}\, e^{-j\alpha_\mu^{(\mathrm{M})} m}. \tag{A.2}$$
Like in Sec. 2.4.1, the MST is obtained by assigning the wave field component indices according to (2.163):
$$\tilde{D}_m(\omega) = \frac{1}{N_\mathrm{M}} \sum_{\mu=1}^{N_\mathrm{M}} \frac{D_\mu(\omega)}{b_n(\tilde{k}R_\mathrm{M})}\, e^{-j\alpha_\mu^{(\mathrm{M})} m}, \tag{A.3}$$
leading to the frequency responses
$$T_{\mathrm{M},m,\mu}(\omega) = \frac{e^{-jm\alpha_\mu^{(\mathrm{M})}}}{N_\mathrm{M}\, b_n(\tilde{k}R_\mathrm{M})}, \tag{A.4}$$
$$T_{\mathrm{M},\mu,m}(\omega) = b_n(\tilde{k}R_\mathrm{M})\, e^{jm\alpha_\mu^{(\mathrm{M})}}. \tag{A.5}$$
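A minimal numerical sketch of evaluating the transform weights (A.4) and (A.5) is given below. It is illustrative only: the radial term b_n is taken here as the spherical Bessel function j_n, which is one common choice for an open aperture (the thesis defines b_n via (2.86)), the index assignment n = |m| is assumed, and the array geometry in the example is a placeholder.

```python
# Illustrative evaluation of the MST weights (A.4)/(A.5). Assumptions (not
# from the thesis): b_n is modeled by the spherical Bessel function j_n,
# and the wave-field index assignment ties n = |m|.
import numpy as np
from scipy.special import spherical_jn

def mst_weights(m, mic_azimuths, k, R_M):
    """Return (analysis, synthesis) weights T_{M,m,mu} and T_{M,mu,m}."""
    n = abs(m)                              # index assignment assumed as n = |m|
    N_M = len(mic_azimuths)
    b_n = spherical_jn(n, k * R_M)          # placeholder for b_n in (2.86)
    analysis = np.exp(-1j * m * mic_azimuths) / (N_M * b_n)   # (A.4)
    synthesis = b_n * np.exp(1j * m * mic_azimuths)           # (A.5)
    return analysis, synthesis

# toy usage: 16 microphones equiangularly placed on a circle of radius 5 cm
alphas = 2 * np.pi * np.arange(16) / 16
k = 2 * np.pi * 1000.0 / 343.0              # wavenumber at 1 kHz
T_ana, T_syn = mst_weights(m=2, mic_azimuths=alphas, k=k, R_M=0.05)
```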



A.2 Loudspeaker Signal Transform

The loudspeakers are considered as point sources at ~x0, represented by the azimuth α0, the inclination ϑ0, and the radius r0 in spherical coordinates. The wave field excited by a single loudspeaker is described by
$$C_{m,n}^{(\mathrm{sp})}(\omega) = -j\tilde{k}\, h_n\!\left(-\tilde{k}r_0\right) \left(Y_n^m(\vartheta_0,\alpha_0)\right)^{*} Q(\omega), \tag{A.6}$$
which can be obtained by transforming (2.41) using (2.86). Leaving all expressions expanded yields
$$P^{(\mathrm{pt})}(\vec{x},\vec{x}_0,\omega) = -j\tilde{k} \sum_{m=-\infty}^{\infty}\sum_{n=|m|}^{\infty} b_n(\tilde{k}r)\, h_n\!\left(-\tilde{k}r_0\right) \left(Y_n^m(\vartheta_0,\alpha_0)\right)^{*} Y_n^m(\vartheta,\alpha)\, Q(\omega), \tag{A.7}$$
where the sums have been reordered. Using (2.25), the expression can be expanded further to obtain
$$P^{(\mathrm{pt})}(\vec{x},\vec{x}_0,\omega) = -j\tilde{k} \sum_{m=-\infty}^{\infty}\sum_{n=|m|}^{\infty} b_n(\tilde{k}r)\, h_n\!\left(-\tilde{k}r_0\right) \frac{(2n+1)}{4\pi}\,\frac{(n-m)!}{(n+m)!}\, P_n^m(\cos\vartheta_0)\, P_n^m(\cos\vartheta)\, e^{jm\alpha}\, e^{-jm\alpha_0}\, Q(\omega). \tag{A.8}$$
Considering only a circle with radius RM in the x-y-plane, a Fourier series with respect to m can be identified in (A.8), and the orthogonality of the exponential functions described in (2.69) can be exploited. Replacing the source position ~x0 by the actual loudspeaker positions and superimposing all loudspeaker contributions yields
$$C_l^{(\mathrm{se})}(\omega) = -j\tilde{k} \sum_{\lambda=1}^{N_\mathrm{L}} \sum_{n=|l|}^{\infty} b_n\!\left(\tilde{k}R_\mathrm{M}\right) h_n\!\left(-\tilde{k}r_\lambda^{(\mathrm{L})}\right) \frac{(2n+1)}{4\pi}\,\frac{(n-l)!}{(n+l)!}\, P_n^l\!\left(\cos\vartheta_\lambda^{(\mathrm{L})}\right) P_n^l(0)\, e^{-jl\alpha_\lambda^{(\mathrm{L})}}\, X_\lambda(\omega). \tag{A.9}$$
Again, (2.170) is used for the assignment
$$\tilde{X}_l(\omega) = C_l^{(\mathrm{se})}(\omega). \tag{A.10}$$
As this transform uses no approximations, it can also be applied to loudspeaker arrays near the microphone array. The frequency responses implementing this transform are given by
$$T_{\mathrm{L},l,\lambda}(\omega) = -j\tilde{k} \sum_{n=|l|}^{\infty} b_n\!\left(\tilde{k}R_\mathrm{M}\right) h_n\!\left(-\tilde{k}r_\lambda^{(\mathrm{L})}\right) \frac{(2n+1)}{4\pi}\,\frac{(n-l)!}{(n+l)!}\, P_n^l\!\left(\cos\vartheta_\lambda^{(\mathrm{L})}\right) P_n^l(0)\, e^{-jl\alpha_\lambda^{(\mathrm{L})}}. \tag{A.11}$$
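A truncated numerical evaluation of the series (A.11) could look as follows. This is a sketch under several stated assumptions (none of which are taken from the thesis): b_n is modeled by the spherical Bessel function j_n, the radial term h_n(−k̃r) is evaluated as a spherical Hankel function of the second kind at positive argument (the thesis' time/sign convention may differ), the infinite sum is truncated at n_max, and the loudspeaker position is a placeholder.

```python
# Illustrative, truncated evaluation of the LST weights (A.11).
import numpy as np
from scipy.special import spherical_jn, spherical_yn, lpmv, factorial

def sph_hankel2(n, z):
    """Spherical Hankel function of the second kind (convention assumed here)."""
    return spherical_jn(n, z) - 1j * spherical_yn(n, z)

def lst_weight(l, k, R_M, r_lam, theta_lam, alpha_lam, n_max=30):
    """Truncated series (A.11) for one loudspeaker lambda (sketch only)."""
    total = 0.0 + 0.0j
    for n in range(abs(l), n_max + 1):
        total += (spherical_jn(n, k * R_M) * sph_hankel2(n, k * r_lam)
                  * (2 * n + 1) / (4 * np.pi)
                  * factorial(n - l) / factorial(n + l)
                  * lpmv(l, n, np.cos(theta_lam)) * lpmv(l, n, 0.0))
    return -1j * k * total * np.exp(-1j * l * alpha_lam)

# toy usage: one loudspeaker 2 m away in the horizontal plane (theta = pi/2)
k = 2 * np.pi * 500.0 / 343.0
T = lst_weight(l=1, k=k, R_M=0.05, r_lam=2.0, theta_lam=np.pi / 2, alpha_lam=0.3)
```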



B Correction for a Previous Derivation of the Generalized Frequency-Domain Adaptive Filtering Algorithm

Instead of using (3.150) as an approximation, it was claimed in [BBK03] that there wasan identity equivalent to

W10WH10S−1

XX(n) != W10(WH

10SXX(n)W10)−1

WH10, (B.1)

which is not true as shown in the sequel. For an unambiguous notation, 6= is used in thefollowing, where appropriate. As claimed in [BBK03] multiplying SXX(n)W10 from theright-hand side leads to

W10WH10S−1

XX(n)SXX(n)W10 = W10(WH

10SXX(n)W10)−1

WH10SXX(n)W10, (B.2)

W10 WH10W10︸ ︷︷ ︸

=INLNMLH

= W10, (B.3)

which does, however, not prove (B.1), as SXX(n)W10 has less columns than rows. Thiserror is discussed in the following for the case NL = NM = 1, LX = 2LH, which is chosenfor the sake of brevity and can be straightforwardly extended to scenarios with differentNL, NM, LX, and LH. Multiplying (B.1) by WH

10 from the left-hand side leads to

WH10S−1

XX(n) 6=(WH

10SXX(n)W10)−1

︸ ︷︷ ︸R−1

XX(n)

WH10, (B.4)

which utilizes WH10W10 = ILH . Inserting (3.136) for W10 and multiplying FLX from the

right-hand side leads to(ILH

0LH×LH

)HFHLX

S−1XX(n)FLX 6= R−1

XX(n)(

ILH

0LH×LH

)H. (B.5)

The inverse S−1XX(n) is transformed to the time domain by the discrete Fourier trans-

form (DFT)-matrices, resulting in an inverse of an autocorrelation matrix with twice thedimensions of RXX(n) (

R(2)XX(n)

)−1= FH

LXS−1

XX(n)FLX (B.6)

Page 238: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

226

such that

\left(I_{L_H},\, 0_{L_H \times L_H}\right) \left(R_{XX}^{(2)}(n)\right)^{-1} \neq \left(R_{XX}^{-1}(n),\, 0_{L_H \times L_H}\right).   (B.7)

Obviously, the left-hand side of (B.7) captures the first LH rows of (R^{(2)}_{XX}(n))^{-1}, which may have components in all columns, while the right-hand side contains only zeros in the last LH columns. The block-matrix inversion is given by

\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \begin{pmatrix} (A - B D^{-1} C)^{-1} & -A^{-1} B (D - C A^{-1} B)^{-1} \\ -D^{-1} C (A - B D^{-1} C)^{-1} & (D - C A^{-1} B)^{-1} \end{pmatrix},   (B.8)

where A, B, C, and D are arbitrary matrices of compatible dimensions. This identity gives some insight into the error made when using (B.7) as an equation. As R^{(2)}_{XX}(n) is an autocorrelation matrix of the same signals as captured in RXX(n), parts of R^{(2)}_{XX}(n) are identical to RXX(n). Hence, R^{(2)}_{XX}(n) can be decomposed into

R_{XX}^{(2)}(n) = \begin{pmatrix} R_{XX}(n) & R_{Q}^{(2)}(n) \\ \left(R_{Q}^{(2)}(n)\right)^H & R_{XX}(n) \end{pmatrix},   (B.9)

where the Toeplitz and Hermitian structure of these matrices was exploited and R(2)Q (n)

is used to represent the upper-right quadrant of R(2)XX(n). According to (B.7), only the

upper part of(R(2)

XX(n))−1

is relevant that can be expressed using (B.8) by

(ILH ,0LH×LH)(R(2)

XX(n))−1

=((

RXX(n)−R(2)Q (n)

(RXX(n)

)−1 (R(2)

Q (n))H)−1

,

−(RXX(n)

)−1R(2)

Q (n)(

RXX(n)−(R(2)

Q (n))H (

RXX(n))−1

R(2)Q (n)

)−1), (B.10)

which is obviously not equal to the right-hand side of (B.7). However, typical autocorrelation matrices exhibit a dominant main diagonal, where the other diagonals decay with increasing distance to the main diagonal. Therefore, the entries of RXX(n) can be considered to have a significantly stronger weight than the entries in R^{(2)}_{Q}(n), which justifies interpreting (B.1) as an approximation.
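The failure of (B.1) as an exact identity can also be checked numerically. The following minimal sketch (assuming Python with NumPy; the matrix W only stands in for W10 as a generic matrix with orthonormal columns, and S for an arbitrary Hermitian, positive-definite SXX(n)) shows that the two sides differ, while they coincide after right-multiplication by SXX(n)W10, in accordance with (B.2) and (B.3):

import numpy as np

rng = np.random.default_rng(0)
L_H, L_X = 4, 8    # toy dimensions with L_X = 2 L_H

# W plays the role of W_10: orthonormal columns, W^H W = I_{L_H}
W, _ = np.linalg.qr(rng.standard_normal((L_X, L_H)) + 1j * rng.standard_normal((L_X, L_H)))

# random Hermitian, positive-definite matrix playing the role of S_XX(n)
A = rng.standard_normal((L_X, L_X)) + 1j * rng.standard_normal((L_X, L_X))
S = A @ A.conj().T + L_X * np.eye(L_X)

lhs = W @ W.conj().T @ np.linalg.inv(S)
rhs = W @ np.linalg.inv(W.conj().T @ S @ W) @ W.conj().T

print(np.allclose(lhs, rhs))                  # False: (B.1) is not an identity
print(np.allclose(lhs @ S @ W, rhs @ S @ W))  # True: both sides agree after right-multiplication, cf. (B.2), (B.3)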


C Influence of Algorithm Parameters on Multiple-Input Multiple-Output System Identification

In this appendix, the influence of the algorithm parameters on the convergence in multiple-input multiple-output (MIMO) system identification scenarios is analyzed. To this end, the loudspeaker-enclosure-microphone system (LEMS) is represented by random impulse responses, while no wave-domain transforms have been used such that only the adaptation algorithm properties are investigated.

The impulse responses were generated according to

h_{\mu,\lambda}(k,\xi) = A(\xi)\, e^{-\gamma (k - D(\xi))}\, u(k - D(\xi)),   (C.1)

\gamma = -\frac{\log(10^{-3})}{T_{60} f_s},   (C.2)

where LH = 128 was chosen, D(ξ) is a uniformly distributed integer random variable with 0 ≤ D(ξ) ≤ LH/3, A(ξ) is a Gaussian random variable with mean 1 and variance 0.2, and u(k) is the unit step function according to

u(k) = \begin{cases} 1 & \text{for } k \ge 0, \\ 0 & \text{otherwise}. \end{cases}   (C.3)

The exponential decay constant γ is determined such that the random impulse response exhibits an approximate reverberation time given by T60 = 0.1 s, where a sampling rate fs of 2 kHz is considered. Equation (C.1) is evaluated for every loudspeaker-microphone combination to obtain NLNM distinct room-impulse-response-like responses, where one sample impulse response is shown in Fig. C.1.
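A minimal sketch of this impulse response generation (assuming Python with NumPy; the function name and the chosen random number generator are illustrative and not part of the original experiments) could look as follows:

import numpy as np

def random_impulse_response(L_H=128, T60=0.1, fs=2000, rng=np.random.default_rng()):
    # decay constant such that the envelope drops by 60 dB (factor 10^-3) after T60*fs samples
    gamma = -np.log(1e-3) / (T60 * fs)
    D = rng.integers(0, L_H // 3 + 1)            # random onset delay, 0 <= D <= L_H/3
    A = rng.normal(loc=1.0, scale=np.sqrt(0.2))  # random amplitude, mean 1, variance 0.2
    k = np.arange(L_H)
    u = (k >= D).astype(float)                   # unit step u(k - D)
    return A * np.exp(-gamma * (k - D)) * u      # cf. (C.1)

# one impulse response per (microphone, loudspeaker) pair
N_L, N_M = 5, 3
H = np.stack([[random_impulse_response() for _ in range(N_L)] for _ in range(N_M)])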

The loudspeaker signals were obtained by filtering NL + 1 white noise source signals through rendering filters with impulse responses defined by

g_{R,\lambda,q}(k) = \begin{cases} A(\xi)\, e^{-\gamma(k - D(\xi))}\, u(k - D(\xi)) & \text{for } \lambda = q \;\cup\; (q = N_L + 1 \,\cap\, \lambda = 1), \\ g_{R,1,N_L+1}(k) & \text{for } q = N_L + 1 \,\cap\, \lambda > 1, \\ 0 & \text{otherwise}. \end{cases}   (C.4)

Figure C.1: Random sample of a loudspeaker-to-microphone impulse response

In a first step, the loudspeaker signals are generated using NL independent source signals with a fixed power, while the power pC(k) of the last source signal is varied. As this last source signal is coupled to all loudspeaker signals using the same impulse response, a variation of its power can be used to control the correlation of the loudspeaker signals. Since the optimal step size of the least mean squares (LMS) algorithm is strongly dependent on the loudspeaker signal power, the loudspeaker signals are scaled in a second step such that the overall average power of all signals is one. Note that this measure would not be necessary if the following considerations were limited to the affine projection algorithm (APA) and the generalized frequency-domain adaptive filtering (GFDAF) algorithm.
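A sketch of this two-step signal generation (again assuming Python with NumPy and SciPy, and reusing the hypothetical helper random_impulse_response from the previous sketch; all names are illustrative) could read:

import numpy as np
from scipy.signal import lfilter

def loudspeaker_signals(N_L, N_samples, p_C, rng=np.random.default_rng()):
    # N_L independent unit-power sources plus one common source with power p_C
    sources = rng.standard_normal((N_L + 1, N_samples))
    sources[N_L] *= np.sqrt(p_C)

    g_common = random_impulse_response()   # g_R,1,N_L+1, shared by all channels, cf. (C.4)
    x = np.empty((N_L, N_samples))
    for lam in range(N_L):
        g_own = random_impulse_response()  # g_R,lambda,lambda
        x[lam] = lfilter(g_own, [1.0], sources[lam]) + lfilter(g_common, [1.0], sources[N_L])

    # second step: scale such that the overall average power of all signals is one
    x /= np.sqrt(np.mean(x ** 2))
    return x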

The variation of pC(k), combined with varying the number of loudspeaker channels, allows for an assessment of the influence of the algorithm parameters in multichannel scenarios, where both a greater power pC(k) and a greater number of loudspeaker channels make the system identification more challenging.

If not stated otherwise, LD = LH = 128 and LF = 64 were chosen in the following, where µsi = 10−4 was chosen for the LMS algorithm, while the parameters specific to the APA were µsi = 1.25 and γsi = 10−3, and λsi = 0.97, µsi = 1.5, and γsi = 10−2 were chosen for the GFDAF algorithm. The signal-to-noise ratio (SNR) of the microphone signals was chosen to be 40 dB in order not to influence the minimum misalignment too strongly for the APA and the LMS algorithm. The adaptation was halted during the first 3 seconds of the experiment to obtain a sufficiently well-conditioned SXX(n) for the GFDAF algorithm.

As a first example, a scenario with mutually uncorrelated loudspeaker signals (pC(k) = 0) is considered for different adaptation algorithms, where the single-channel case (NL = NM = 1) is shown in Fig. C.2 and the multichannel case (NL = 50, NM = 1) in Fig. C.3. Unlike the following evaluations, the SNR of the microphone signals was set to a level of 30 dB, the APA used γsi = 10−10 and LD = 10, while the GFDAF algorithm used γsi = 10−3. This example is not intended for a meaningful performance comparison between the algorithms. Instead, it should show the fundamental difference in convergence behavior in a single-channel and a multichannel scenario.

Considering the case NL = 1 in Fig. C.2, the GFDAF algorithm outperforms the other algorithms in terms of echo return loss enhancement (ERLE) (3.16) and in terms of the normalized system misalignment (3.9). While the APA and the LMS algorithm show a comparable performance regarding the ERLE, the APA achieves a significantly lower system misalignment.

Figure C.2: Example of convergence curves for different adaptation algorithms for NL = 1

The results for an increased number of loudspeaker channels (NL = 50) are shown in Fig. C.3. For the APA and the GFDAF algorithm, the parameters have been kept, while it was necessary to reduce µsi to 10−5 in order to retain stability for the LMS algorithm. It can be seen that the results shown in Fig. C.3 are quite different from those shown in Fig. C.2: While the GFDAF algorithm still outperforms the two other algorithms, the LMS algorithm is now more favorable than the APA. An explanation for the disappointing performance of the APA can be found when considering the definition of the algorithm in (3.73) and (3.74). There, it can be seen that the increased number of loudspeakers does not affect the dimensions of the inverse of ∼X(k)∼XH(k) in (3.74). Instead, the increased number of loudspeakers causes ∼X(k)∼XH(k) to approach a scaled identity matrix because

\tilde{X}(k)\tilde{X}^{H}(k) = \sum_{\lambda=1}^{N_L} \tilde{X}_{\lambda}(k)\, \tilde{X}^{H}_{\lambda}(k)   (C.5)

holds, while the individual ∼Xλ(k) represent mutually uncorrelated signals. Since the weight of ∼X(k)∼XH(k) grows with NL, the weight of its inverse decreases with growing NL. Eventually, the updates of the APA approach the updates of the LMS algorithm, possibly with a smaller step size. This example shows that increasing the number of loudspeakers can actually lead to a very different convergence for the individual adaptation algorithms.

Figure C.3: Example of convergence curves for different adaptation algorithms for NL = 50

Since the behavior of the algorithms is also governed by the chosen parameters, the choice of optimal parameters is closely related to the difficulty of the system identification task, as shown in the following. To this end, optimal parameters will be determined for different numbers of loudspeaker channels and different strengths of the loudspeaker signal cross-correlation. The latter can be changed by varying pC(k), as this signal is fed to all loudspeaker channels, while the other independent signals are only fed to one loudspeaker each. The criterion for optimality is the system misalignment achieved at the end of the experiment after 45 seconds, where the median of five experiments with different random impulse responses was taken into account to reduce the influence of outliers. The obtained results combine an assessment of the steady-state system identification accuracy with an assessment of the convergence speed in the following manner: The results for the less challenging scenarios, i.e., low values of NL and pC(k), put a larger emphasis on the steady-state performance, as the steady state will typically be reached relatively early (see Fig. C.2). For the more challenging scenarios, the algorithms will typically not have reached a steady state (see Fig. C.3). Thus, the obtained performance measures will predominantly describe the convergence speed. Since a practically applied adaptation algorithm has to converge within a limited time span, this evaluation can be seen as a balanced assessment.

C.1 Step Size

In a first evaluation, the optimal step size is determined, beginning with the LMS algorithm, where the step sizes µsi = 10−5, 2.5 · 10−5, 5 · 10−5, 7.5 · 10−5, 10−4, 1.25 · 10−4, 2.5 · 10−4, 5 · 10−4, 7.5 · 10−4, 10−3, and 2.5 · 10−3 have been evaluated.


Figure C.4: Optimal step size µsi, normalized misalignment, and ERLE for the LMS algorithm as a function of the number of loudspeaker channels

In Fig. C.4, the optimal step size and the resulting system misalignment and ERLE are shown for two scenarios: In one case, pC(k) = 0 (dashed red line) was chosen, such that the loudspeaker signals are uncorrelated, while in the other case (pC(k) → ∞, solid blue line) a single source signal was used to generate fully coherent loudspeaker signals.

It can be seen from the upper plot that a lower step size should be chosen with an increasing number of loudspeaker channels. The normalized misalignment, shown in the middle plot, is high for pC(k) → ∞, while it exhibits a lower value for pC(k) = 0. In both cases the misalignment increases with the growing number of loudspeaker channels, and the LMS algorithm does not identify the LEMS very well when multiple loudspeaker channels are used. When considering the lower plot, a decreasing ERLE can be seen for a higher number of loudspeaker channels. Unlike system identification, the echo cancellation works better for pC(k) → ∞ than for pC(k) = 0, which appears to be counter-intuitive at first glance. However, since the nonuniqueness does not limit the ERLE, but leads to an infinite set of filter coefficients maximizing it, it is easier for the LMS algorithm to approach an ERLE-maximizing solution under such conditions.

In Fig. C.5, the results for a simultaneous variation of NL and pC(k) are shown, where the upper and the lower plot have a reversed z-axis to increase readability. It can be seen that the optimal step size µsi is not as much affected by the increase of pC(k) as by the increase of NL, while the relation between the lowest and the highest optimal values is two orders of magnitude. In general, both the ERLE and the misalignment achieved for NL = 1 are significantly better than for NL > 1.

While the results show that the LMS algorithm is only suitable for a low number of loudspeaker channels, it has to be considered that the LMS algorithm is a very simple algorithm with very low computational demands. This would suggest using this algorithm with a smaller frame shift LF than used in this evaluation. Still, as can be seen from Fig. C.6, lower values of the frame shift improve the performance of the algorithm only moderately, such that this is not a tractable approach to make this algorithm suitable for multichannel scenarios. This is an expected result, as the LMS algorithm has already been reported to provide disappointing results, even for two channels [BAGG95].

In the following, the results shown for the LMS algorithm in Figures C.4 and C.5 are presented for all relevant parameters of the other considered adaptation algorithms. The order of the plots and the assignment of line styles in Fig. C.4 to values of pC(k) are kept. For comparison, the (fixed) parameters given above have also been evaluated in addition to the individually optimized parameters. The results for pC(k) → ∞ and pC(k) = 0 in these scenarios are shown by the dotted black line and the dash-dotted green line, respectively. Since a multi-dimensional evaluation and optimization of the parameters would result in a prohibitive computational effort, each parameter was evaluated separately, where all other parameters have a constant value, independently of NL and pC(k).

For the APA, the step sizes µsi = 0.75, 1, . . . , 2.5 have been evaluated, where experimental results for pC(k) = 0 and pC(k) → ∞ are shown in Fig. C.7. When comparing to the results of the same experiments for the LMS algorithm in Fig. C.4, significant differences can be seen: Noting that the y-scale for the optimal µsi in the upper plot is linear, a large number of loudspeaker channels also suggests using a smaller µsi, although not by two orders of magnitude as for the LMS algorithm, but only by a factor of 2. Furthermore, a strong correlation of the loudspeaker signals leads to a slight increase of the optimal value for µsi.

When considering the system misalignment in the middle plot, it can be seen for pC(k) = 0 that the APA shows an increasing misalignment when the number of loudspeaker channels is increased. Analogously to the LMS algorithm, the system identification for pC(k) → ∞ is poor, as can be expected when considering (3.42). For the evaluation of the other parameters, a fixed value of µsi = 1.25 had to be chosen, where it can be seen from Fig. C.7 that this choice does not significantly affect the system identification performance.

The ERLE shown in the lower plot does not show such a predictable curve. For pC(k) → ∞ a moderate ERLE can be achieved with both the optimized step size and the reference value µsi = 1.25, where the latter led to an approximately 10 dB higher ERLE. The ERLE achieved for µsi = 1.25 and pC(k) = 0 is significantly lower than for the optimized value of µsi. In the latter case, the achieved ERLE exceeds the scale towards a perfect cancellation of the echo. This is a result of the constraint (3.68) used for the APA, which is fulfilled when the step size µsi = 1 is chosen and leads to a vanishing ∼esi(k). Hence, in the case of the APA, the ERLE obtained according to the definition in (3.16) is only of limited practical relevance.


Figure C.5: Optimal step size µsi, normalized misalignment, and ERLE for the LMS algorithm as a function of the number of loudspeaker channels and the power of the correlated portion of the loudspeaker signals.


Figure C.6: Normalized misalignment and ERLE as a function of the number of loudspeaker channels, achieved with the LMS algorithm with different frame shift values


When considering the evaluation with respect to NL and pC(k) in Fig. C.8, it can be seen that increasing pC(k) influences the optimal µsi much more strongly than increasing NL. As expected, the normalized misalignment degrades when pC(k) is increased. Furthermore, a perfect cancellation of the echo is only possible when ten or more loudspeaker channels are used and when the loudspeaker signals are not too strongly correlated.

Those results show that the APA is a more promising candidate for multichannel system identification than the LMS algorithm. Moreover, a choice of µsi = 1.25 appears to be a good compromise between a low µsi, which is optimal for pC(k) = 0, and higher values, which are optimal for pC(k) → ∞.

Finally, the GFDAF algorithm is evaluated considering µsi = 1.5, 1.7, . . . , 3.3, where (3.154) was chosen for the implementation of the algorithm. In the upper plot of Fig. C.9, the optimal value of µsi for the cases pC(k) = 0 and pC(k) → ∞ is shown as a function of the number of loudspeaker channels NL. The obtained optimal values are relatively high, considering that µsi = 1 would be optimal if no approximation was used for the algorithm. The achieved misalignment shown in the middle plot shows the same tendencies as for the other algorithms, while the values are significantly lower than those for the APA and the LMS algorithm. At the same time, the ERLE shown in the lower plot is satisfying for all scenarios. It can be seen that choosing µsi = 1.5 slightly degrades the performance for both system identification and acoustic echo cancellation (AEC). While this would suggest using a fixed value for µsi higher than 1.5, there is a strong mutual influence of the individual optimal parameters. Hence, choosing a high value of µsi would significantly increase the variance of the results when optimizing the other parameters.


Figure C.7: Optimal step size µsi, normalized misalignment, and ERLE for the APA as a function of the number of loudspeaker channels.

In order to avoid that, a rather conservative value of µsi = 1.5 was chosen, which was determined to be optimal for NL = 1.

When considering Fig. C.10, it can be seen that the GFDAF algorithm shows a lower performance for more challenging scenarios. Nevertheless, this degradation is less pronounced than for the LMS algorithm and the APA. This makes the GFDAF algorithm the most promising candidate for the MIMO system identification scenarios considered in this thesis.

As the performance of the LMS algorithm and the APA in MIMO scenarios could not be improved by choosing any parameter set, their use for MIMO adaptive filtering is not indicated, and the following evaluations will only consider the GFDAF algorithm.

C.2 Regularization

The presented derivations of the APA and the GFDAF algorithm comprise a regularization weight parameter γsi. The influence of this parameter on the GFDAF algorithm is evaluated in the following, where the regularization weights γsi = 10−8, 10−7, . . . , 1 have been considered and the results are shown in Fig. C.11. It can be seen that a stronger regularization is needed when the nonuniqueness problem occurs. This is an expected result, as the nonuniqueness leads to a singular matrix SXX(n) that must be inverted for the GFDAF algorithm. Still, the difference in the optimal value spans six orders of magnitude, which is even more than was observed for the step size µsi of the LMS algorithm.


Figure C.8: Optimal step size µsi, normalized misalignment, and ERLE for the APA as a function of the number of loudspeaker channels and the power of the correlated portion of the loudspeaker signals.


Figure C.9: Optimal step size µsi, normalized misalignment, and ERLE for the GFDAF algorithm as a function of the number of loudspeaker channels.

The normalized misalignment achieved for γsi = 10−2 is up to 20 dB higher than for the optimized value. The choice of γsi = 10−2 represents a compromise between the values optimal for pC(k) → ∞ and pC(k) = 0, where the system identification is more degraded in the latter case due to the poor absolute identification for pC(k) → ∞. The ERLE shown in the lower plot shows a slight degradation for pC(k) = 0 and a slight increase for pC(k) → ∞ when using γsi = 10−2 instead of the optimal value.

In Fig. C.12, the optimal value of γsi is shown as a function of NL and pC(k), where large variations of the optimal value can be seen for NL ≤ 25. Furthermore, it increases for pC(k) < 0 dB before it slightly decreases for pC(k) > 10 dB. A comprehensive explanation cannot be given straightforwardly, as it is not directly possible to map the influence of this type of regularization to a modified cost function for the GFDAF. Obviously, the influence of regularization for large pC(k) is two-fold: For large pC(k), the problem of system identification is severely ill-conditioned, which makes a regularization indispensable. However, at the same time, the adaptive filter is strongly sensitive to the influence of the regularization, which can degrade system identification. This can explain why a lower optimal regularization weight was found for pC(k) > 20 dB than for 0 dB ≤ pC(k) ≤ 20 dB. Still, an exhaustive analysis of this behavior remains a topic for future research.


Figure C.10: Optimal step size µsi, normalized misalignment, and ERLE for the GFDAF algorithm as a function of the number of loudspeaker channels and the power of the correlated portion of the loudspeaker signals.


Figure C.11: Optimal regularization factor γsi, normalized misalignment, and ERLE for the GFDAF algorithm as a function of the number of loudspeaker channels.

Figure C.12: Optimal regularization factor γsi for the GFDAF algorithm as a function of the number of loudspeaker channels and the power of the correlated portion of the loudspeaker signals.


Figure C.13: Optimal "forgetting factor" λsi, normalized misalignment, and ERLE for the GFDAF algorithm as a function of the number of loudspeaker channels.

C.3 "Forgetting Factor" of the Generalized Frequency-Domain Adaptive Filtering Algorithm

The "forgetting factor" λsi is a parameter specific to the recursive least squares (RLS) and GFDAF algorithms, which determines how fast the exponential weight for the previous error signals decays. A lower value leads to a faster convergence, while it also increases the variance of the filter coefficients of the identified system.

For this parameter, the values λsi = 0.81, 0.84, 0.87, 0.90, 0.93, 0.95, 0.96, 0.97, 0.98, and 0.99 have been evaluated for the GFDAF algorithm. When considering Fig. C.13, it can be seen that a lower value of λsi is more suitable when nonuniqueness occurs for a low number of loudspeaker channels. For a higher number of loudspeaker channels and uncorrelated loudspeaker signals, higher values of λsi are more suitable. Choosing λsi = 0.97 represents a suitable compromise for the considered scenarios, where the system identification and AEC are only moderately influenced. Figure C.14 clearly confirms the trend toward higher optimal values for an increasing number of loudspeaker channels and a decreasing pC(k).


Figure C.14: Optimal "forgetting factor" λsi for the GFDAF algorithm as a function of the number of loudspeaker channels and the power of the correlated portion of the loudspeaker signals.

C.4 Length of Considered Microphone Signals

All of the described algorithms involve the parameter LD to choose the number of considered samples (block size) of the microphone signals in each iteration. However, a discussion of this parameter is omitted for brevity. For the GFDAF algorithm, choosing LD = LH leads to describing the original GFDAF algorithm [BBK03]. Preliminary evaluations showed that this is also a suitable value for the APA and the LMS algorithm in the considered scenarios, noting that all algorithms tend to achieve a better system identification for larger LD.


D Implementation of Acoustic Echo Cancellation and Listening Room Equalization

In this appendix, the implementation of wave-domain system identification, acoustic echo cancellation (AEC), and listening room equalization (LRE) for real-time applications and offline simulations is discussed. To this end, the solution of a special kind of system of linear equations is treated in Appendix D.1, which is of significant importance for the implementation of the adaptation algorithms discussed in Appendix D.2. In Appendix D.3, the implementation of the other necessary system components is discussed, before the implementation of adaptive equalizers is discussed in Appendix D.4. In Appendix D.5, it is explained how the real-valuedness of the loudspeaker and microphone signals in their original domain can be exploited for an efficient wave-domain implementation of adaptive filters.

The recommendations for the implementation are only intended to provide a rough guideline, since there will typically be further aspects to be considered for a real-world implementation. These aspects include, but are not limited to, parallelization or compile-time optimization, which can require a deviation from the schemes described in the following. The efficient implementation of adaptation algorithms is a field of research of its own, which exceeds the scope of this thesis. In the same way, the following determination of the computational complexity should allow for a principal assessment of whether an approximative wave-domain model leads to a computational advantage in a given scenario, but it is not intended to estimate the computational demands of an implementation on real-world computer hardware.

The number of necessary operations for the processing steps has been determined by avoiding all unnecessary computations, such as multiplications with zero values and repetitions of identical computations. Multiplications and additions typically occur simultaneously, such that a combination of both will be counted as one single floating-point operation (FLOP), which is in accordance with the actual time consumption on computer hardware. As an implementation may use LT > 1, the wave-domain signals might have a different signal segment length than the corresponding signals in the original domain. Still, the influence of this on the overall computational effort can be neglected, such that LX, LD, and LH will be used in the following instead of their wave-domain representation. This choice was made to increase conciseness.


D.1 Solving Systems of Linear Equations Using the Cholesky Decomposition

For the considered adaptation algorithms, it is often necessary to solve a system of linear equations according to

Au = v, (D.1)

where A and v are given. The matrix A is, furthermore, Hermitian and positive definite, such that this system of linear equations can be solved using the Cholesky decomposition, followed by forward and backward substitution. The algorithm described in [GVL96] decomposes an N × N matrix A into a lower-triangular matrix C, such that CCH = A, and requires the following operations:

N(N − 1)/2 + N(N − 1)(N − 2)/6 multiplications,
N(N − 1)/2 + N(N − 1)(N − 2)/6 additions,
N(N − 1)/2 divisions,
N square roots.

This results in an effort of (N^2 − N)/2 + (N^3 − 3N^2 + 2N)/6 FLOPs, where there are N square roots and (N^2 − N)/2 divisions to be computed. To finally solve the system of linear equations Au = v, two other systems, Cw = v and CHu = w, are solved subsequently. This requires

N(N − 1)/2 multiplications,
N(N − 1)/2 additions,
N divisions

for each of the two systems. Hence, the overall computational effort for solving Au = v results in

O_{GS}(N) = \frac{N^3 + 6N^2 - 7N}{6}   (D.2)

FLOPs, where (N^2 + 3N)/2 quotients and N square roots have to be computed (assuming N > 1).
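A minimal sketch of this solution procedure (assuming Python with NumPy and SciPy; the function name is illustrative) could look as follows:

import numpy as np
from scipy.linalg import cholesky, solve_triangular

def solve_via_cholesky(A, v):
    # A = C C^H with lower-triangular C
    C = cholesky(A, lower=True)
    # forward substitution: C w = v
    w = solve_triangular(C, v, lower=True)
    # backward substitution: C^H u = w
    return solve_triangular(C.conj().T, w, lower=False)

# small self-check with a random Hermitian, positive-definite matrix
rng = np.random.default_rng(1)
N = 6
B = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = B @ B.conj().T + N * np.eye(N)
v = rng.standard_normal(N) + 1j * rng.standard_normal(N)
assert np.allclose(A @ solve_via_cholesky(A, v), v)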

D.2 Adaptive Filters for System Identification and Acoustic Echo Cancellation

In this section, an efficient implementation of the adaptation algorithms is discussed. To this end, the implementation of the multiple-input multiple-output (MIMO) finite impulse response (FIR) filtering described by (3.55) and (3.56) is explained first, before the algorithm-specific parts are treated.

The MIMO filtering described by (3.55) and (3.56) is a crucial building block for the implementation of a wave-domain adaptive filtering (WDAF) system, as it is also used for the realization of the transforms, as described later in Appendix D.3. The loudspeaker-enclosure-microphone system (LEMS) considered in (3.55) has NL input signals, which are filtered by filters of length LH to obtain NM output signals. When straightforwardly implementing time-domain convolution, computing each of the NM output samples per time instant k would result in NLLH multiplications and NLLH − 1 additions, where one additional addition is needed to compute the error. Thus, the effort of time-domain MIMO filtering, as represented by (3.55) with ∼n(k) = 0, is given by

O_{TD conv} = L_H L_D N_L N_M   (D.3)

FLOPs, when a block of LD time samples of the output signal is to be computed. The motivation for computing the output in blocks of LD time samples will become apparent later. While (D.3) looks quite compact, it can result in a tremendous effort when LH is large.

The time-domain convolution can also be facilitated by a discrete Fourier transform (DFT)-domain multiplication, as described by (3.152), which can reduce the computational effort. To avoid describing a circular convolution instead of the desired linear convolution, the so-called overlap-save technique can be used. To this end, at least LD + LH − 1 samples of each input signal and the coefficients describing each of the impulse responses (zero-padded to the same length) are transformed to the DFT domain. There, LD + LH − 1 DFT-bin-wise matrix multiplications between NM × NL matrices capturing the filter coefficients and NL-component vectors capturing the input signals are computed. The resulting vectors are transformed back to the discrete-time domain and truncated to the last LD time samples. For convenience, LX = LD + LH − 1 is used in the following.

Although a MIMO filtering through NLNM paths is described, all signals only have to be transformed once, which reduces the impact of the DFT on the overall complexity. When using the fast Fourier transform (FFT), the computational effort for transforming the input and output signals is approximately LX log2(LX)(NL + NM). Finally, obtaining LD time samples resulting from a "fast MIMO convolution" requires

O_{FD conv} = \underbrace{L_X \log_2(L_X)(N_L + N_M)}_{\text{FFT}} + \underbrace{L_X N_L N_M}_{\text{DFT-domain multiplication}}   (D.4)

FLOPs. Note that there might be an additional effort of LX log2(LX)NLNM FLOPs necessary to transform the time-domain filter coefficients. Comparing (D.3) and (D.4), it can be seen that the fast MIMO convolution becomes more attractive once LD + LH ≪ LDLH, which is the typical case for the scenarios considered in this thesis. Still, implementations may follow slightly different schemes than described above to maximize the efficiency on real-world computer hardware [WB10].
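A sketch of such an overlap-save fast MIMO convolution for one block (assuming Python with NumPy; the data layout and the function name are illustrative) could read:

import numpy as np

def fast_mimo_block(h, x_blocks):
    # h:        impulse responses, shape (N_M, N_L, L_H)
    # x_blocks: input samples, shape (N_L, L_X) with L_X = L_D + L_H - 1,
    #           i.e. L_H - 1 old samples followed by the current L_D samples
    # returns:  output block of shape (N_M, L_D)
    N_M, N_L, L_H = h.shape
    L_X = x_blocks.shape[1]
    L_D = L_X - L_H + 1

    H = np.fft.fft(h, n=L_X, axis=-1)         # zero-padded filters, (N_M, N_L, L_X)
    X = np.fft.fft(x_blocks, n=L_X, axis=-1)  # (N_L, L_X)

    # bin-wise matrix-vector products: (N_M x N_L) matrix times N_L vector per DFT bin
    Y = np.einsum('mlf,lf->mf', H, X)

    y = np.fft.ifft(Y, axis=-1).real
    return y[:, -L_D:]                        # discard the time-aliased head, keep the last L_D samples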

A fundamental difference between both considered convolution approaches is that the time-domain MIMO convolution can be conducted for each time instant of the output signal separately, while the DFT-domain MIMO convolution requires block-wise processing. As the latter is also inherent to the adaptation algorithms described in Sec. 3.4, this is not considered to be a drawback.

The approximative models described in Sec. 3.2 do not describe all NLNM couplings of the LEMS, but only mH of them. In this case, (D.3) and (D.4) reduce to

O_{TD conv} = m_H L_H L_D,   (D.5)
O_{FD conv} = L_X \left(\log_2(L_X)(N'_L + N_M) + m_H\right)   (D.6)

FLOPs, respectively, where N′L describes the number of input wave-field components that are actually coupled to any of the output wave-field components, assuming that all output signal components are coupled to at least one of the input signal components. The additional effort for transforming the filter coefficients for (D.6) would be LX log2(LX)mH FLOPs.

In the following, the algorithm-specific implementation for adaptive filters without and with approximative models is discussed. The implementation of the improved system identification is not discussed, as the practical relevance of an efficient implementation of the approximative models is considered to be higher. The computational effort for determining the a priori error signal was already discussed above, as this signal has to be computed for all considered algorithms.

D.2.1 Least mean squares algorithm

The least mean squares (LMS) algorithm, as described in Sec. 3.4.1, is often used in practice due to its low computational effort and its simplicity. However, the results presented in Appendix C identify this algorithm as unsuitable for MIMO scenarios, such that the following considerations are primarily included for comparison purposes.

When considering (3.62), it can be identified as a MIMO convolution of the error signal with a complex-conjugated and time-reversed block of the loudspeaker signals. Hence, an implementation can be facilitated as described above, where the NMNL impulse responses of length LH substitute the output signals and the NM error signals of length LD substitute the filter coefficients:

\tilde{h}(n) = \tilde{h}(n-1) + \mu_{si}\, W_{10}^H \tilde{X}^H(k)\, W_{01}^H\, \tilde{e}'_{si}(k).   (D.7)

The resulting number of FLOPs is then

O_{TD LMS} = N_M N_L L_H L_D,   (D.8)
O_{FD LMS} = L_X \log_2(L_X)(N_M + N_M N_L) + L_X N_M N_L,   (D.9)

for a time-domain and a DFT-domain implementation of the algorithm, respectively, where it was assumed that the loudspeaker signals were already available in the DFT domain due to determining the error signal as described above.
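For illustration, a sketch of one constrained DFT-domain block LMS update (assuming Python with NumPy; the data layout and the function name are illustrative, and the windowing matrices W10 and W01 of (D.7) are realized implicitly here by zero-padding and truncation) could read:

import numpy as np

def lms_block_update(h, x_blocks, e_block, mu):
    # h:        current filter estimates, shape (N_M, N_L, L_H)
    # x_blocks: loudspeaker samples, shape (N_L, L_X) with L_X = L_D + L_H - 1
    #           (L_H - 1 old samples followed by the current L_D samples)
    # e_block:  a priori error signals, shape (N_M, L_D)
    # mu:       step size mu_si
    N_M, N_L, L_H = h.shape
    L_X = x_blocks.shape[1]
    L_D = L_X - L_H + 1

    X = np.fft.fft(x_blocks, n=L_X, axis=-1)
    # zero-pad the error so that its last L_D samples align with the current block
    e_pad = np.zeros((N_M, L_X))
    e_pad[:, -L_D:] = e_block
    E = np.fft.fft(e_pad, axis=-1)

    # bin-wise cross-correlation of error and loudspeaker signals
    G = np.einsum('mf,lf->mlf', E, X.conj())
    g = np.fft.ifft(G, axis=-1).real
    # gradient constraint: only the first L_H lags correspond to the filter taps
    return h + mu * g[:, :, :L_H]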


As (3.65) describes only the update for mH couplings, the computational effort is reduced to

O^{(app)}_{TD LMS} = m_H L_H L_D,   (D.10)
O^{(app)}_{FD LMS} = L_X \log_2(L_X)(N_M + m_H) + L_X m_H   (D.11)

FLOPs, when using approximative models. This result can be straightforwardly verified by considering the windowing by Vsi before the actual computations.

D.2.2 Affine projection algorithm

When comparing (3.73) to (3.62), it can be seen that the affine projection algorithm (APA) differs from the LMS algorithm only by using X†(k) instead of ∼XH(k). However, X†(k) does not describe a convolution matrix, which precludes exploiting this property for an efficient implementation. Instead, (3.74) may be split up differently, where (∼X(k)∼XH(k) + γsiXR(k))−1 is first multiplied by ∼e′si(k). As (∼X(k)∼XH(k) + γsiXR(k))−1 is square, the result of this multiplication can then be used straightforwardly, in the same way as ∼e′si(k) was used for the LMS algorithm. The computational effort for the APA then results in

O_{TD APA} = O_{TD LMS} + \underbrace{L_D^2 L_H N_L}_{\tilde{X}(k)\tilde{X}^H(k)} + \underbrace{\frac{L_D^3 + 6 L_D^2 - 7 L_D}{6}}_{\left(\tilde{X}(k)\tilde{X}^H(k)\right)^{-1}\tilde{e}'_{si}(k)}   (D.12)

FLOPs and, additionally, (L_D^2 + 3L_D)/2 quotients and LD square roots, where a Cholesky decomposition was used to determine the product (∼X(k)∼XH(k) + γsiXR(k))−1 ∼e′si(k) (see Appendix D.1) and all redundancy has been removed. For small LD, this results in a moderate computational effort, while for larger LD other, more efficient variants can be more favorable [GT95].
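A sketch of this computation order for one APA update (assuming Python with NumPy; names and shapes are illustrative, and a simple diagonal regularization stands in for γsiXR(k)) could look like:

import numpy as np

def apa_update(h, X, e, mu, gamma):
    # h:     stacked filter coefficients, shape (N_L * L_H,)
    # X:     data matrix of the last L_D excitation vectors, shape (L_D, N_L * L_H)
    # e:     a priori error vector, shape (L_D,)
    # mu:    step size mu_si
    # gamma: regularization weight gamma_si
    L_D = X.shape[0]
    # (X X^H + gamma I) is only L_D x L_D, independent of the number of loudspeakers
    R = X @ X.conj().T + gamma * np.eye(L_D)
    # solve the small system first (in practice via the Cholesky-based solver of Appendix D.1)
    z = np.linalg.solve(R, e)
    # then apply X^H to the short vector, analogous to the error in the LMS update
    return h + mu * (X.conj().T @ z)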

Although approximative models aim at reducing the computational effort for adaptive filtering, replacing the inverse (∼X(k)∼XH(k))−1 by (∼X(k) VTsi Vsi ∼XH(k) + γsiXR(k))−1, as in (3.80), actually increases the computational effort. This is because the redundancy in this matrix is removed, while the dimensions of the matrix to be inverted remain the same: The result of ∼X(k)∼XH(k) is an NM-times block-diagonal repetition of the same matrix, which only has to be inverted once. The result of ∼X(k) VTsi Vsi ∼XH(k), on the contrary, constitutes a block-diagonal arrangement of NM potentially different matrices, which all need to be inverted. Hence, this algorithm does not represent a suitable candidate for approximative models, and the resulting computational effort is given by

O^{(app)}_{TD APA} = O^{(app)}_{TD LMS} + L_D^2 L_H N'_L + N_M \frac{L_D^3 + 6 L_D^2 - 7 L_D}{6}   (D.13)

FLOPs, where NM(L_D^2 + 3L_D)/2 quotients and NMLD square roots have to be determined additionally.


D.2.3 Generalized frequency-domain adaptive filtering algorithm

The generalized frequency-domain adaptive filtering (GFDAF) algorithm has been shown to be suitable for AEC with a large number of inputs and outputs [BBK03, SSK12, SK11]. For the GFDAF algorithm, the matrix S(sp)XX(n) has to be computed according to (3.149), which requires

O_{S^{(sp)}_{XX}(n)} = N_L^2 L_X   (D.14)

FLOPs, given the loudspeaker signals are already available in the DFT domain due to computing (3.152).

When implementing the GFDAF algorithm, the product between (S(sp)XX(n))−1 and (∼X(k))H is computed first. This is facilitated by solving LX systems of linear equations, as described in Appendix D.1, where the redundancy in (S(sp)XX(n))−1 resulting from (3.36) is exploited. The resulting effort is given by LX(N_L^3 + 6N_L^2 − 7N_L)/6 FLOPs, LX(N_L^2 + 3N_L)/2 quotients, and LXNL square roots, where NM has no influence on the complexity of this step. When choosing to implement the GFDAF according to (3.151), this product is represented by (S(sp)XX(n) + γsiSR(n))−1 W10 W10H (∼X(k))H, such that an additional windowing has to be applied to (∼X(k))H, which requires 2NMNL FFTs of length LX. The overall effort for the adaptation of the filters then amounts to

O_{GFDAF1} = O_{FD LMS} + N_L^2 L_X + L_X \frac{N_L^3 + 6 N_L^2 - 7 N_L}{6} + 2 N_M N_L L_X \log_2(L_X)   (D.15)

FLOPs, LX(N_L^2 + 3N_L)/2 quotients, and LXNL square roots.

When implementing the GFDAF algorithm using (3.154), 2NMNL FFTs of length LX to describe the product W10W10H can be avoided, resulting in an effort of

O_{GFDAF2} = O_{FD LMS} + N_L^2 L_X + L_X \frac{N_L^3 + 6 N_L^2 - 7 N_L}{6}   (D.16)

FLOPs, LX(N_L^2 + 3N_L)/2 quotients, and LXNL square roots.

When implementing the GFDAF according to (3.155), two further transform steps for the filter coefficients can be avoided: one is omitted in (3.155), the other by computing (3.156) instead of (3.152). This results in an effort of

O_{GFDAF3} = O_{FD LMS} + N_L^2 L_X + L_X \frac{N_L^3 + 6 N_L^2 - 7 N_L}{6} - 2 N_M N_L L_X \log_2(L_X)   (D.17)

FLOPs, LX(N_L^2 + 3N_L)/2 quotients, and LXNL square roots. Note that the subtraction in (D.17) is necessary, as OFD LMS and (D.3) already consider the dispensable transforms. In any case, the solution of the LX systems of linear equations is the most expensive processing step for this algorithm. Fortunately, these systems can be solved independently and in parallel. This allows for a convenient implementation on multi-core architectures [SSK12].


When implementing the GFDAF algorithm for approximative models, there is less redundancy in S(sp)XX(n), which increases the number of submatrices that have to be determined by a factor of NM. However, these submatrices are typically significantly smaller than those for non-approximative models. While the actual dimensions of the submatrices can vary, it will be assumed that each single wave-domain microphone signal is coupled to NH wave-domain loudspeaker signals. This allows for deriving results that can be easily compared to those derived above. Determining S(sp)XX(n) then implies an effort of

O_{S^{(sp)}_{XX}(n)} = N_M N_H^2 L_X   (D.18)

FLOPs for approximative models. Similarly, instead of LX systems of linear equations,NMLX systems have to be solved, where the individual systems are significantly smaller.Thus, obtaining a solution for one system requires only (N3

H + 6N2H − 7NH)/6 FLOPs,

(N2H + 3NH)/2 quotients and NH square roots, for each. Which results in an overall effort

for implementing the GFDAF according to (3.157) of

O^{(app)}_{GFDAF1} = O^{(app)}_{FD LMS} + N_M N_H^2 L_X + N_M L_X \frac{N_H^3 + 6 N_H^2 - 7 N_H}{6} + 2 N_M N_H L_X \log_2(L_X)   (D.19)

FLOPs, NMLX(N_H^2 + 3N_H)/2 quotients, and NMLXNH square roots. When using (3.154) or (3.155) for approximative models, the effort is reduced by 2NMNHLX log2(LX) FLOPs or 4NMNHLX log2(LX) FLOPs, respectively.

D.3 Wave-Domain System Model

As mentioned in Sec. 3.1, choosing a signal model according to Fig. 3.2 is favorable when an AEC should be realized, while a signal model according to Fig. 3.3 will be a suitable choice for LRE.

In any case, multiple transforms have to be implemented, where those transforms are represented by MIMO FIR filters of length LT, which can be implemented as described above. So far, it was assumed that the loudspeaker signal transform (LST) had NL input and NL output signals, where this choice was made because it covers all application scenarios. In order to efficiently implement an AEC, this is not always the optimal choice. When considering an approximative wave-domain AEC with a lower number of microphone channels than loudspeaker channels, not all components of the wave-domain loudspeaker signals will be coupled to the microphone signals. Hence, an efficient transform will not have NL but only N′L outputs, which results in

O_{TD LST} = N_L N'_L L_T L_F,   (D.20)
O_{FD LST} = (L_F + L_T - 1) \log_2(L_F + L_T - 1)(N_L + N'_L) + (L_F + L_T - 1) N_L N'_L   (D.21)


FLOPs, when a block of LF samples should be computed using time-domain convolution or fast convolution, respectively. For low values of LT, using time-domain convolution can lead to a computational advantage.

Similarly, the computational effort for computing the microphone signal transform (MST) and its inverse is, respectively, given by

O_{TD MST} = N_M^2 L_T L_F,   (D.22)
O_{FD MST} = (L_F + L_T - 1) \log_2(L_F + L_T - 1)(2 N_M) + (L_F + L_T - 1) N_M^2   (D.23)

FLOPs.

The choice of computing blocks of LF samples is motivated by the described adaptation algorithms, which operate with data vectors at the time instants k = nLF. Hence, using this block length leads to a synchronized processing in all processing steps. For a real-time implementation, this allows for scheduling the same amount of time for each processing block, which will then cause a respective delay of LF samples each. While this choice allows for a convenient implementation, it will be mainly suited for relatively large LF. When a small LF should be used in combination with a larger LT, more sophisticated solutions might be preferred [WV13, WB10].

When implementing offline simulations, a block-wise processing will only be necessary when considering the output signals of adaptive filters. Thus, computing as many MIMO filtering operations as possible on the signals as a whole can reduce the computational effort at the cost of an increased memory consumption.

D.4 Adaptive Equalizers

In this section, the computational effort for determining the equalizers is analyzed, where only the filtered-x generalized frequency-domain adaptive filtering (FxGFDAF) algorithm and the iterative DFT-domain inversion (IDI) algorithm are considered for brevity.

For the implementation of the FxGFDAF algorithm, the filtered-x signals have first to be computed according to (4.61), which is a MIMO filtering operation with NL inputs and N_L^2 NM outputs. Considering that the output signals are of length LD, this results in

O_{TD Z(k)} = L_H N_L^2 N_M L_Z,   (D.24)

O_{FD Z(k)} = L_X \log_2(L_X)(N_L^2 N_M + N_L N_M + N_L) + L_X N_L^2 N_M   (D.25)

FLOPs, when using a time-domain or a frequency-domain implementation, respectively. For the latter result, it has been assumed that the identified LEMS first has to be transformed to the DFT domain. Note that the length LX used for equalizer determination is in general different from the length LX considered for system identification.

When using approximative models, as described in Sections 3.2 and 4.2, the computational effort for this step is reduced. In the following, it will be assumed that the approximative LEMS model and equalizer structure described by (3.34) and (4.31) are used with NH = NG. For this configuration, only NE = NG(NH − 1)/2 components of the filtered-x signals are non-zero, such that the computational effort for this step is reduced to

OTD Z(k) = LHNLNELZ, (D.26)OFD Z(k) = LX log2(LX)(NLNE +NHNM +NL) + LXNLNE (D.27)

FLOPs, when using a time-domain or a frequency-domain implementation, respectively.Next, the desired microphone signals have to be determined, as they cannot simply be

observed like for system identification. This implies additionally

OTD des = LHNLNMLD, (D.28)OFD des = LX log2(LX)NM + LXNLNM (D.29)

FLOPs for time and DFT-domain implementations, respectively. In the latter case, it wasassumed that the excitation signals and desired impulse responses are already availablein the DFT-domain due to previous computations. This justifies choosing the length ofthe DFT to be LX, which could also be reduced. For approximative models, the strictdiagonality of the desired MIMO impulse response reduces the effort to

OTD des = LGNMLD, (D.30)OFD des = LX log2(LX)NM + LXNM (D.31)

FLOPs.Given that the excitation signals are uncorrelated, it is possible to determine the equal-

izers for each loudspeaker/excitation signals separately, which results in a reduced com-putational effort. In the following, four cases are considered:

1. Simultaneous determination of all equalizers with non-approximative models

2. Determination of the equalizers for each excitation signal channel separately withnon-approximative models

3. Simultaneous determination of all equalizers with approximative models

4. Determination of the equalizers for each excitation signal channel separately withapproximative models

For an implementation of the FxGFDAF algorithm, five subtasks have to be fulfilled:

• Determination of the a priori error signals, as described by (4.65), where the effortis described by OTD e′eq(n)

• Computing S(sp)ZZ (n), as described by (4.77), facilitated by OFD S(sp)

ZZ(n) FLOPs

• Cholesky decomposition of S(sp)ZZ (n) with OChol FLOPs

Page 264: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

252

• Solving the resulting system of equations by forward and backward substitution,implying OSubst FLOPs

• MIMO filtering, which is facilitated by OMIMO conv FLOPsThe overall effort is then given by

OFxGFDAF = OTD e′eq(n) +OFD S(sp)ZZ

(n) +OChol +OSubst +OMIMO conv. (D.32)

Since the following steps belong to the FxGFDAF algorithm, only DFT-domain imple-mentations are considered.

Combined determination with non-approximative models

OFD e′eq(n) = LZ log2(LZ)(N2LNM +N2

L +NM) + LZN2LNM, (D.33)

OFD S(sp)ZZ

(n) = LZNMN4L, (D.34)

OChol = LZN6

L −N2L

6 ), (D.35)

OSubst = LZNM(N4L −N

2L, (D.36)

OMIMO conv = LZ log2(LZ)(N2L +NM) + LZNMN

2L (D.37)

FLOPs. Additionally, there are (N4L − N2

L)/2 + 2LZNMN2L divisions and LZN

2L square

roots to be computed.

Separated determination with non-approximative models

OFD e′eq(n) = LZ log2(LZ)(N2LNM +N2

L +NLNM) + LZN2LNM, (D.38)

OFD S(sp)ZZ

(n) = NMLZN3L, (D.39)

OChol = LZN4

L −N2L

6 , (D.40)

OSubst = LZNM(N3L −N

2L), (D.41)

OMIMO conv = LZ log2(LZ)(N2L +NMNL) + LZNMN

2L (D.42)

FLOPs. Additionally, there are LZ(N3L −N

2L)/2 + 2LZNMN

2L divisions and LZN

2L square

roots to be computed.

Combined determination with approximative models

OFD e′eq(n) = LZ log2(LZ)(NLNE +NLNG +NM) + LZNLNE, (D.43)OFD S(sp)

ZZ(n) = NMLZN

2LN

2G, (D.44)

OChol = LZN3

LN3G −NLNG

6 , (D.45)

OSubst = LZNM(N2LN

2G −NLNG), (D.46)

OMIMO conv = LZ log2(LZ)(NLNG +NM) + LZNMNLNG (D.47)

Page 265: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

D.4 Adaptive Equalizers 253

FLOPs. Additionally, there are N2LN

2G−NLNG

2 + 2LZNMNLNG divisions and NLNG squareroots to be computed.

Separated determination with approximative models

OFD e′eq(n) = LZ log2(LZ)(NLNE +NLNG +NLNM) + LZNLNE, (D.48)OFD S(sp)

ZZ(n) = NHLZNLN

2G, (D.49)

OChol = LZNLN3

G −NG

6 , (D.50)

OSubst = LZNMNL(N2G −NG), (D.51)

OMIMO conv = LZ log2(LZ)(NLNG +NLNM) + LZNMNLNG (D.52)

FLOPs. Additionally, there are LZNL(N2G−NG)/2+2LZNMNLNG divisions and LZNLNG

square roots to be computed.For the implementation of the IDI algorithm, as described in Sec. 4.5.3, the individual

channels of the original loudspeaker signals are considered separately by definition of thealgorithm. The actual implementation of this algorithm is very similar to the procedurefor the FxGFDAF algorithm described above:

• Determination of the difference vector of the equalized LEMS and the desired im-pulse response as described by (4.111), where the effort is denoted by OTD e′eq(n)

• Computing(H′′(n)

)HH′′(n), facilitated by OFD S(sp)

ZZ(n) FLOPs

• Cholesky decomposition of(H′′(n)

)HH′′(n) with OChol FLOPs

• Solving the resulting system of equations by forward and backward substitution,implying OSubst FLOPs

• MIMO filtering, and back-transform to the time domain by OMIMO conv FLOPs

Non-approximative models

OFD e′eq(n) = LGH log2(LGH)(NLNM +N2L) + LGHN

2LNM, (D.53)

OFD S(sp)ZZ

(n) = LGHNMN2L, (D.54)

OChol = LGHN4

L −N2L

6 , (D.55)

OSubst = LGHNM(N3L −N

2L), (D.56)

OMIMO conv = LGH log2(LGH)N2L + LGHN

2LNM (D.57)

FLOPs. Additionally, there are LGH(N3L − N2

L)/2 + 2LGHNMN2L divisions and LGHN

2L

square roots to be computed.

Page 266: Some Contributions to Adaptive Filtering for Acoustic ... · Some Contributions to Adaptive Filtering for Acoustic Multiple-Input/Multiple- ... (LMS) an unforgettable time of my life

254

Approximative models

OFD e′eq(n) = LGH log2(LGH)(NHNM +NLNG) + LGHNLNGNH, (D.58)OFD S(sp)

ZZ(n) = LGHNHNLNG, (D.59)

OChol = LGHNLN2

G −NG

6 , (D.60)

OSubst = LGHNLNH(N2G −NG), (D.61)

OMIMO conv = LGH log2(LGH)NLNG + LGHNLNGNH (D.62)

FLOPs. Additionally, there are LGHNL(N_G^2 − N_G)/2 + 2LGHNMNLNG divisions and LGHNLNG square roots to be computed.

D.5 Increasing Efficiency for Real-Valued Loudspeaker and Microphone Signals

If not stated otherwise, all derivations made so far are valid for complex-valued signals, although real-world loudspeaker and microphone signals are real-valued. This was done to allow for a clear and general explanation, while an implementation can be limited to processing real-valued signals in order to reduce its computational demands. There are two ways to facilitate this: reducing the number of considered complex-valued wave-domain signal components or using real-valued wave-domain signals, which are both explained in the following.

As described in Sec. 2.1.2, when describing real-valued wave fields using complex-valued quantities, there is always a redundancy by a factor of two. When considering a description of a wave field in terms of circular harmonics, e. g., the coefficients for positive mode orders are accompanied by their complex conjugates for the negative mode orders. This means, when ∼d(k) corresponds to NM real-valued microphone signals in d(k), the (NM − 1)/2 signals corresponding to negative mode orders can be reconstructed from those corresponding to positive mode orders. This allows for reducing the number of considered wave-domain microphone signals to (NM − 1)/2, where the computational savings for the adaptive filtering can be determined by replacing NM by (NM − 1)/2 in (D.3) to (D.19). The cost for the MST and its inverse is then also reduced by a factor of 2NM/(NM − 1) ≈ 2. Although there is the same redundancy in the wave-domain loudspeaker signals, this property can only be exploited to reduce the computational effort for the LST, but not for the adaptive filtering. In the latter case, the redundancy could only be exploited by obtaining the complex conjugate, which is a non-linear operation that cannot be facilitated by the adaptation algorithms under consideration.
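This conjugate-symmetry relation can be illustrated with a small sketch (assuming Python with NumPy; the circular-harmonic decomposition is computed here simply as a DFT over a uniform circular microphone array, which is an assumption made for illustration only):

import numpy as np

N_M = 9                                                  # odd number of microphones on a circle
p = np.random.default_rng(2).standard_normal(N_M)       # real-valued sound pressure samples

# mode coefficients of orders m = 0, 1, ..., N_M - 1 (DFT over the array)
d = np.fft.fft(p) / N_M

# coefficients of negative mode orders are the complex conjugates of the positive ones
for m in range(1, (N_M - 1) // 2 + 1):
    assert np.allclose(d[N_M - m], d[m].conj())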

Due to the described redundancy, it is also possible to exchange each of the complex-valued wave-domain basis functions for a pair of real-valued functions such that real-valued wave-domain signals are obtained. Since every non-redundant signal is replaced by two signals, the number of wave-domain signal quantities remains the same. However,


for implementations in the time domain, only real-valued operations would have to be performed, which is significantly faster. For DFT-domain implementations, the computations would still have to be carried out with complex values. However, only the positive frequency axis would have to be considered, as the negative frequency axis is redundant. This would reduce the computational effort roughly by one half. Furthermore, an implementation using real-valued signals might be preferable due to the limited availability of libraries for efficient computation with, and convenient handling of, complex-valued numbers.
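As a short illustration of the DFT-domain saving for real-valued signals, the following sketch uses NumPy's real-input FFT, which keeps only the non-redundant, non-negative frequency bins; the block length is arbitrary and chosen for the example only.

```python
# Real-valued signal blocks only need the non-negative half of the frequency axis.
import numpy as np

block = np.random.randn(1024)                 # a real-valued signal block
X_full = np.fft.fft(block)                    # 1024 complex bins, conjugate-symmetric
X_half = np.fft.rfft(block)                   # 513 bins: the non-redundant half
x_rec = np.fft.irfft(X_half, n=block.size)    # back-transform from the half spectrum
assert np.allclose(x_rec, block)              # no information is lost
```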


Abbreviations and Acronyms

AEC   acoustic echo cancellation
ANC   active noise control
APA   affine projection algorithm
ASR   automatic speech recognition
CPU   central processing unit
DFT   discrete Fourier transform
DTD   double talk detection
ERLE   echo return loss enhancement
FFT   fast Fourier transform
FIR   finite impulse response
FLOP   floating-point operation
FxGFDAF   filtered-x generalized frequency-domain adaptive filtering
GFDAF   generalized frequency-domain adaptive filtering
GUI   graphical user interface
HOA   Higher-Order Ambisonics
IDI   iterative DFT-domain inversion
IIR   infinite impulse response
LEMS   loudspeaker-enclosure-microphone system
LMS   least mean squares
LRE   listening room equalization
LST   loudspeaker signal transform
LTI   linear time-invariant
MIMO   multiple-input multiple-output
MISO   multiple-input single-output
MMSE   minimum mean square error
MST   microphone signal transform
NLMS   normalized least mean squares
PDF   Portable Document Format
PSD   power spectral density
RLS   recursive least squares
SISO   single-input/single-output
SNR   signal-to-noise ratio
SIR   signal-to-interference ratio
SVD   singular value decomposition
UCA   uniform circular array


WDAF   wave-domain adaptive filtering
WFS   wave field synthesis


Conventions, Operations, and Mathematical Symbols

Notational Conventions

• Any vector describing a position (or, as the case may be, a direction) is denoted by an arrow above the symbol, e. g., ~x.

• Apart from that, matrices and vectors are denoted by bold-faced upper-case and lower-case letters, respectively.

• Continuous-frequency-domain and DFT-domain quantities are underlined.

• If not stated otherwise, signals and impulse responses are considered in the discrete-time domain, while quantities related to the sound pressure are considered in the continuous frequency domain.

• Wave-domain quantities are denoted by a tilde, with the exception of the wave number ∼k and related quantities.

• Italic subscripts denote variables.

Operators

(·)*   Complex conjugate of a quantity
(·)^(−1)   Inverse of a matrix
(·)†   Moore-Penrose pseudoinverse of a matrix
(·)^H   Conjugate transpose of a vector or a matrix
(·)^T   Transpose of a vector or a matrix
Re(·)   Real part of a quantity
⌈·⌉   Smallest following integer of a scalar
⌊·⌋   Largest preceding integer of a scalar
‖·‖_2   Euclidean norm of a vector
vec(A)   Single column vector capturing all column vectors of A
Diag(u)   Matrix capturing u on its main diagonal
‖·‖_F   Frobenius norm of a matrix


argmin_x {f(x)}   Argument x minimizing f(x)
min_x {f(x)}   Value of f(x) minimized with respect to x
max_x {f(x)}   Value of f(x) maximized with respect to x
min(x, y)   Minimum value of x and y
max(x, y)   Maximum value of x and y
⟨u, v⟩   Scalar or inner product between the vectors u and v
A ⊗ B   Kronecker product of A and B
[A]_ζ,η   Addresses the entry located in row ζ and column η of A

Special matrices

F_Z   Z × Z DFT matrix
I_Z   Z × Z identity matrix
0_{Z×H}   Z × H matrix or vector capturing only zero values
1_{Z×H}   Z × H matrix or vector capturing only unit values

Mathematical Symbols

The following list contains all variables that have been used on multiple non-adjacent pages. Variables sharing the same letter are ordered by their first occurrence. Note that j is used as the imaginary unit (j² = −1) and the use of i is generally avoided.

α   Azimuth in cylindrical and spherical coordinates (see p. 7)
α_λ^(L)   Azimuth of loudspeaker position (see p. 30)
α_µ^(M)   Azimuth of microphone position (see p. 31)
β_si   Weight parameter for cost-guided system identification (see p. 125)
β_1   Weight parameter for cost-guided system identification and equalizer determination (see p. 125)
β_2   Weight parameter for cost-guided system identification and equalizer determination (see p. 125)
β_eq   Weight parameter for cost-guided equalizer determination (see p. 176)
δ_3(~x)   Three-dimensional Dirac distribution (see p. 15)
δ_1(x)   One-dimensional Dirac distribution (see p. 15)
δ(k)   Kronecker delta (see p. 71)
η   Column index of a matrix (see p. 76)
ε   Regularization parameter (see p. 125)
γ_si   Weight parameter for the regularization of algorithms for system identification (see p. 107)


γ_eq   Weight parameter for the regularization of algorithms determining the equalizers (see p. 177)
κ   Discrete time in samples (alternative notation) (see p. 66)
λ   Loudspeaker (signal) index (see p. 30)
λ′   Index of equalized loudspeaker signal or alternative loudspeaker signal index (see p. 77)
λ_si   Exponential “forgetting factor” for the GFDAF/recursive least squares (RLS) algorithm (see p. 110)
λ_eq   Exponential “forgetting factor” for filtered-x variants of the GFDAF/RLS algorithm (see p. 177)
µ   Microphone (signal) index (see p. 30)
µ_si   Step size of adaptation algorithm for system identification (see p. 105)
µ_eq   Step size of adaptation algorithm for equalizer determination (see p. 176)
∇_~x   Nabla operator (see p. 8)
ν   Block time index (alternative notation) (see p. 110)
ω   Temporal angular frequency (see p. 10)
ϕ_q   Incidence angle of plane wave (see p. 131)
ϱ   Radius in cylindrical coordinates (see p. 7)
ϱ_λ^(L)   Radius of loudspeaker position in cylindrical coordinates (see p. 30)
ϑ   Elevation in spherical coordinates (see p. 7)
ϑ_λ^(L)   Elevation of loudspeaker position (see p. 30)
ζ   General row index of a matrix (see p. 76)
~0   Coordinate origin (see p. 9)
A_k   Matrix-valued coefficient of z-domain transfer function (see p. 69)
A   General matrix (see p. 87)
B_m(∼k_ϱ)   Inner boundary conditions for cylindrical and circular harmonics (see p. 26)
b_n(∼k r)   Inner boundary conditions for spherical harmonics (see p. 27)
B^(L)(ω)   One-dimensional loudspeaker signal backward coupling (see p. 34)
B^(M)(ω)   One-dimensional microphone signal backward coupling (see p. 34)
∼B_∼l^(L)(ω)   Contribution of loudspeaker to backward-traveling wave (see p. 39)
B^(L)(ω)   MIMO frequency response describing coupling of loudspeaker signals to backward-traveling wave (see p. 41)


B^(M)(ω)   MIMO frequency response describing coupling of microphone signals to backward-traveling wave (see p. 42)
B   General matrix (see p. 87)
c   Speed of sound (see p. 8)
C_m^(ci)(∼k_z, ω)   Circular harmonics wave-field coefficient (see p. 27)
C_m,n^(sp)(ω)   Spherical harmonics wave-field coefficient (see p. 27)
C   General matrix (see p. 87)
∼C(n)   Cost-guidance matrix for system identification (for algorithm derivation) (see p. 102)
∼C′(n)   Cost-guidance matrix for system identification (for illustration) (see p. 102)
C(n)   Cost-guidance matrix for equalizer determination (see p. 176)
C(n)   Cost-guidance matrix for equalizer determination in the DFT domain (see p. 180)
~D(~x, ω)   Force applied to infinitesimal volume portion of medium (see p. 10)
d^(pw)   Distance of plane or planar source to the coordinate origin (see p. 17)
D(ω)   One-channel microphone signal in continuous frequency domain (see p. 32)
D_µ(ω)   Microphone signal in continuous frequency domain (see p. 32)
d(ω)   Multichannel microphone signal in continuous frequency domain (see p. 42)
∼D_m(ω)   Wave-domain single-channel microphone signal in continuous frequency domain (see p. 47)
∼d(ω)   Wave-domain microphone signal in continuous frequency domain (see p. 47)
d_µ(k)   Single-channel microphone signal (see p. 71)
d(k)   Multichannel microphone signal (see p. 72)
∼d(k)   Wave-domain multichannel microphone signal (see p. 73)
∼d_m(k)   Wave-domain single-channel microphone signal (see p. 73)
d(k)   Estimated multichannel microphone signal (see p. 80)
∼d(k)   Estimated wave-domain multichannel microphone signal (see p. 82)
D   General matrix (see p. 87)
~e_α   Unit vector in azimuthal direction (see p. 8)
~e_ϑ   Unit vector in elevation direction (see p. 8)
~e_ϱ   Unit vector in radial direction of cylindrical coordinates (see p. 8)


~e_r   Unit vector in radial direction of spherical coordinates (see p. 8)
~e_x   Unit vector in Cartesian x-coordinate direction (see p. 8)
~e_y   Unit vector in Cartesian y-coordinate direction (see p. 8)
~e_z   Unit vector in Cartesian z-coordinate direction (see p. 8)
e′(ω)   Listening area error signal in continuous frequency domain (see p. 52)
∼E_m,l   Energy of wave-domain LEMS couplings (see p. 62)
E_diag   Energy captured in the diagonal of the LEMS model (see p. 62)
e_si(k)   Multichannel a posteriori error signal for system identification (see p. 80)
∼e_si(k)   Wave-domain multichannel a posteriori error signal for system identification (see p. 83)
ERLE(k)   Echo return loss enhancement (ERLE) (see p. 86)
∼e_si,app(k)   Wave-domain multichannel a posteriori error signal for system identification with limited models (see p. 97)
∼e′_si(k)   Wave-domain multichannel a priori error signal for system-identification algorithm derivation (see p. 104)
∼e_eq(k)   Wave-domain multichannel a posteriori error signal for LRE (see p. 157)
∼e_ir(n)   Difference between desired wave-domain MIMO impulse response and equalized LEMS (see p. 170)
e(n)   Multichannel a posteriori error signal in filtered-x structure (see p. 171)
e′_eq(n)   Multichannel a priori error signal for equalizer determination (see p. 175)
∆_eq   Delay introduced in desired impulse responses (see p. 187)
e_M(n)   Signal-dependent LRE error at the optimization positions (see p. 189)
e_I(n)   Signal-dependent LRE error in the listening area (see p. 189)
e_E(n)   LRE error outside the listening area (see p. 189)
f(x)   General function (see p. 15)
F^(L)(ω)   One-dimensional loudspeaker signal forward coupling (see p. 34)
F^(M)(ω)   One-dimensional microphone signal forward coupling (see p. 34)
∼F_∼l^(L)(ω)   Contribution of loudspeaker to sound pressure of forward-traveling wave (see p. 39)
F^(L)(ω)   MIMO frequency response describing coupling of loudspeaker signals to forward-traveling wave (see p. 41)
F^(M)(ω)   MIMO frequency response describing coupling of microphone signals to forward-traveling wave (see p. 42)
f_s   Sampling frequency (see p. 71)


G(~x|~x_0, ∼k)   Free-field Green's function between two positions (see p. 15)
G_R,λ,q(ω)   Single-channel frequency response of the reproduction system (see p. 50)
G(ω)   Wave-domain MIMO frequency response of equalizers in continuous frequency domain (see p. 52)
G(z)   Equalizer transfer function in the z-domain (see p. 67)
G_R   MIMO impulse response of rendering system as convolution matrix (see p. 77)
g_R,λ,q(k)   Single-channel impulse response of the rendering system (see p. 78)
G(n)   MIMO impulse response of equalizers as convolution matrix (see p. 155)
∼G(n)   Wave-domain MIMO impulse response of equalizers as convolution matrix (see p. 157)
∼g(n)   Wave-domain MIMO impulse response of equalizers (see p. 159)
∼g_C(n)   Guiding coefficients for cost-guided adaptation algorithms (see p. 176)
H_m(x)   Hankel function of first kind (see p. 12)
h_n(x)   Spherical Hankel function of first kind (see p. 13)
H(ω)   Single-channel frequency response of LEMS (see p. 32)
∼H^(B)(ω)   MIMO frequency response describing backward wave propagation (see p. 39)
∼H^(F)(ω)   MIMO frequency response describing forward wave propagation (see p. 39)
∼H^(L)(ω)   MIMO frequency response describing scattering by outer boundary conditions (see p. 39)
∼H^(M)(ω)   MIMO frequency response describing scattering by inner boundary conditions (see p. 39)
H_µ,λ(ω)   Single-channel frequency response of true LEMS (see p. 43)
H(ω)   MIMO frequency response of true LEMS (see p. 43)
∼H_m,l(ω)   Single-channel wave-domain frequency response of LEMS (see p. 46)
H_0(ω)   Desired wave-domain MIMO frequency response of LEMS as convolution matrix (see p. 53)
H   MIMO impulse response of LEMS as convolution matrix (see p. 72)
h_µ,λ(k)   Single-channel impulse response of true LEMS (see p. 72)
H_µ,λ   Single-channel impulse response of true LEMS as convolution matrix (see p. 72)
∼H   Wave-domain MIMO impulse response of true LEMS as convolution matrix (see p. 73)


∼h_m,l(k)   Wave-domain single-channel impulse response of true LEMS (see p. 73)
H(n)   MIMO impulse response of estimated LEMS as convolution matrix (see p. 80)
∼H(n)   Wave-domain MIMO impulse response of estimated LEMS as convolution matrix (see p. 82)
∆h(n)   Normalized system misalignment (see p. 82)
∼h(n)   Wave-domain MIMO impulse response of estimated LEMS (see p. 86)
∼h_m,l(k, n)   Wave-domain single-channel impulse response of estimated LEMS (see p. 87)
h_m,l(n)   Single-channel impulse response of estimated LEMS (see p. 87)
∼H_m,l(n)   Estimated wave-domain MIMO impulse response of LEMS as convolution matrix (see p. 88)
∼h   Wave-domain MIMO impulse response of true LEMS (see p. 90)
∼h_amb   Wave-domain MIMO impulse response representing the identification ambiguity of the LEMS (see p. 94)
h(n)   MIMO impulse response of estimated LEMS (see p. 104)
∼h_C(n)   Guiding MIMO impulse response for cost-guided adaptation algorithms (see p. 106)
∼h(n)   DFT-domain representation of the estimated wave-domain MIMO impulse response of the LEMS (see p. 119)
H_0   Desired MIMO impulse response of LEMS (see p. 155)
∼H_0   Desired wave-domain MIMO impulse response of the LEMS as convolution matrix (see p. 157)
∼h_0,m,l(k)   Desired impulse response from loudspeaker wave field component l to microphone wave field component m (see p. 159)
∼H′′   Wave-domain MIMO impulse response of true LEMS as convolution matrix (see p. 164)
h_0   Desired wave-domain MIMO impulse response of LEMS (see p. 169)
H′′(n)   Wave-domain MIMO impulse response of estimated LEMS (used for equalizer determination) (see p. 169)
H_0   Desired wave-domain MIMO LEMS impulse response as convolution matrix (see p. 172)
H′′(n)   Wave-domain matrix representation of true LEMS in the DFT domain (see p. 181)
H_I   MIMO impulse response of true LEMS considering the inner microphone array as convolution matrix (see p. 185)
H_E   MIMO impulse response of LEMS considering the outer microphone array as convolution matrix (see p. 186)


J_m(|x|)   Bessel function of first kind (see p. 12)
j_n(|x|)   Spherical Bessel function of first kind (see p. 13)
J_APA(n)   Cost function of the APA algorithm (see p. 107)
J_RLS(n)   Cost function of RLS algorithm (see p. 110)
J_IDI(n)   Cost function of IDI algorithm (see p. 181)
∼k   Wave vector (see p. 8)
∼k_α   Wave vector component in azimuth direction (see p. 8)
∼k_ϑ   Wave vector component in elevation direction (see p. 8)
∼k_ϱ   Wave vector component in radial direction (see p. 8)
∼k_x   Wave vector component in x-direction (see p. 8)
∼k_y   Wave vector component in y-direction (see p. 8)
∼k_z   Wave vector component in z-direction (see p. 8)
k   Discrete time in samples (see p. 10)
∼k   Wave number (see p. 10)
∼k′   Wave vector (alternative notation) (see p. 23)
∼k_al   Wave number above which spatial aliasing occurs (see p. 52)
L(∼k_x, ∼k_y, z)   Information lost due to plane wave decomposition of a wave field (see p. 24)
l   Index of wave-domain loudspeaker signal (see p. 31)
l   Cylindrical or circular harmonics mode order (see p. 57)
l′   Cylindrical or circular harmonics mode order (alternative notation) (see p. 57)
L_H   Length of true LEMS impulse responses (see p. 66)
L′_H   Length of true LEMS impulse responses (alternative notation) (see p. 66)
L_G   Length of equalizer impulse responses (see p. 67)
L_X   Length of loudspeaker signal segments (see p. 71)
L_D   Length of microphone signal segments (see p. 72)
∼L_D   Length of wave-domain microphone signal segments (see p. 73)
∼L_H   Length of true wave-domain LEMS impulse responses (see p. 73)
L_T   Length of transform impulse response (see p. 73)
∼L_X   Length of wave-domain loudspeaker signal segments (see p. 73)
L_R   Length of rendering system impulse response (see p. 78)
L_F   Frame shift (see p. 80)
L_H   Length of estimated LEMS impulse responses (see p. 82)


L_Y   Length of equalized loudspeaker signal segments (see p. 155)
l′   Index of equalized wave-domain loudspeaker signal (see p. 158)
L_GH   Length of equalized LEMS impulse responses (see p. 165)
L_Z   Length of filtered-x signal segments (see p. 172)
M(~x, ω)   Density variation of medium (see p. 10)
m   Cylindrical or circular harmonics mode order (see p. 12)
m   Spherical harmonics mode order (see p. 13)
∼m   Index of wave-field component (see p. 38)
m   Index of wave-domain microphone signal (see p. 46)
M_H   Definition matrix of approximative wave-domain LEMS model (see p. 90)
m_m^(H)   Number of loudspeaker signals coupled to the respective microphone signal (see p. 98)
n   Spherical harmonics mode order (see p. 13)
~n^(pw)   Normal unit vector defining the orientation of a planar source (see p. 17)
~n(~x)   Normal unit vector (see p. 18)
N_L   Number of loudspeakers (see p. 30)
N_M   Number of microphones (see p. 30)
N_C   Number of considered wave field components used in wave-domain LEMS model derivation (see p. 38)
N_S   Number of physical or virtual acoustic signal sources (see p. 50)
n(k)   Multichannel noise signal (see p. 72)
∼n(k)   Wave-domain multichannel noise signal (see p. 73)
n   Block time index (see p. 80)
N_H   Number of coupled loudspeaker signals per microphone signal of LEMS model (see p. 91)
N_E   Number of considered error signals per input signal of equalizer (see p. 163)
N_G   Number of coupled output signals per input signal of equalizer (see p. 163)
N′_L   Number of considered loudspeaker signal wave-field components (see p. 246)
O(m)   Assignment function (see p. 59)
O_FD LMS   Computational effort for an iteration of the LMS algorithm implemented in the DFT domain (see p. 246)


O_FD LMS^(app)   Computational effort for an iteration of the LMS algorithm implemented in the DFT domain when using approximative models (see p. 247)
p(~x, t)   Sound pressure in continuous time domain (see p. 8)
P(~x, ω)   Sound pressure (see p. 10)
P^(pw)(~x, ∼k_x, ∼k_y, ∼k_z)   Sound pressure of a plane wave (see p. 11)
P_m^(cy)(~x, ∼k_ϱ, ∼k_z)   Sound pressure of a cylindrical harmonic (see p. 12)
P_n^m(x)   Legendre polynomial (see p. 13)
P_m,n^(sp)(~x, ∼k)   Sound pressure of a spherical harmonic (see p. 13)
P^(pt)(~x, ~x_0, ω)   Sound pressure emitted by point source (see p. 16)
~p_λ^(L)   Loudspeaker position (see p. 30)
~p_µ^(M)   Microphone position (see p. 30)
∼P_∼m^(B)(ω)   Sound pressure of backward-traveling wave-field components (see p. 38)
∼p^(B)(ω)   Sound pressure of backward-traveling wave-field components (see p. 38)
∼P_∼m^(F)(ω)   Sound pressure of forward-traveling wave-field component (see p. 38)
∼p^(F)(ω)   Sound pressure of forward-traveling wave-field components (see p. 38)
p_C(k)   Power of the correlated portion of the loudspeaker signals (see p. 227)
Q_D(~x, ω)   Dipole portion of the acoustic source distribution at position ~x (see p. 10)
Q_M(~x, ω)   Monopole portion of a source distribution (see p. 10)
Q(ω)   Source signal in continuous frequency domain (see p. 14)
q   Acoustic source index (see p. 50)
Q_q(ω)   Source signal in continuous frequency domain (see p. 50)
q(k)   Multichannel source signal (see p. 77)
r   Radius in spherical coordinates (see p. 7)
R   Reflection factor of a plane (see p. 22)
r_0   Radius in spherical coordinates (see p. 27)
R_L   Radius of loudspeaker array (see p. 30)
R_M   Radius of microphone array (see p. 30)
r_λ^(L)   Radius of loudspeaker position in spherical coordinates (see p. 30)
R_XX   Correlation matrix of loudspeaker signals (see p. 78)
R_XD   Cross-correlation matrix of loudspeaker and microphone signals (see p. 81)


∼R_XX   Correlation matrix of wave-domain loudspeaker signals for AEC (see p. 88)
∼r_XD   Cross-correlation vector of wave-domain loudspeaker and microphone signals (see p. 88)
R_XX(n)   Estimate of auto-correlation matrix (see p. 110)
r_XD(n)   Scaled estimate of cross-correlation vector (see p. 110)
∼R_ZZ   Correlation matrix of wave-domain filtered-x signals (see p. 160)
∼r_Zz0   Correlation matrix of filtered-x and desired signals (see p. 160)
∼R′′_XX   Correlation matrix of wave-domain loudspeaker signals for LRE (see p. 165)
R_I   Radius of inner microphone array (see p. 185)
R_E   Radius of outer microphone array (see p. 185)
S   Surface (see p. 18)
S_M   Surface capturing the microphone positions (see p. 38)
S_XX(n)   Estimate of auto-power spectral density matrix of loudspeaker signals (see p. 118)
S_XX^(sp)(n)   Sparse approximation of estimated auto-power spectral density matrix of loudspeaker signals (see p. 120)
S_R(n)   Regularization matrix (see p. 121)
s_si   Initialization weight for estimated auto-power spectral density matrix (see p. 129)
S_ZZ^(sp)(n)   Sparse approximation of estimated auto-power spectral density matrix of filtered-x signals (see p. 178)
t   Time in seconds (see p. 8)
~t(~x)   Traveling direction of a wave (see p. 11)
T_L(ω)   MIMO frequency response of LST (see p. 45)
T_M^(−1)(ω)   MIMO frequency response of inverse MST (see p. 45)
T_L,l,λ(ω)   Single-channel frequency response of the LST (see p. 46)
T_M,µ,m(ω)   Single-channel frequency response of the inverse MST (see p. 46)
T_L^(−1)(ω)   MIMO frequency response of the inverse LST (see p. 47)
T_M(ω)   MIMO frequency response of the MST (see p. 47)
T_M,m,µ(ω)   Single-channel frequency response of the MST (see p. 56)
T_L   MIMO impulse response of the LST as convolution matrix (see p. 76)
T_L   MIMO impulse response of the inverse LST as convolution matrix (see p. 76)
T_M   MIMO impulse response of the MST as convolution matrix (see p. 76)


T_M   MIMO impulse response of the inverse MST as convolution matrix (see p. 76)
u   General vector (see p. 173)
u_g(n)   Update of the IDI algorithm (see p. 181)
V_Q   Volume enclosing acoustic source distribution (see p. 15)
V′_Q   Infinitesimal volume portion of V_Q (see p. 15)
V   Volume (see p. 18)
V_si   Matrix for removing principal zero values of estimated LEMS for approximative models (see p. 96)
v   General vector (see p. 173)
V_eq   Matrix for removing principal zero values of approximative equalizer structures (see p. 176)
W_10   Matrix facilitating time-domain windowing or zero-padding combined with DFT (see p. 117)
W_01   Matrix facilitating time-domain windowing or zero-padding combined with DFT (see p. 117)
w_c(n)   Dynamic weight for cost-guided system identification (see p. 125)
W_01   Matrix facilitating time-domain windowing or zero-padding combined with DFT (see p. 178)
~x   General position, described using different coordinate systems (see p. 7)
x   Cartesian x-coordinate, also used as general variable (see p. 7)
~x_0   Position of acoustic source or location within source distribution (see p. 15)
~x_S   Position of S′ (see p. 17)
X_λ(ω)   Multichannel loudspeaker signal in continuous frequency domain (see p. 31)
∼X_l(ω)   Wave-domain multichannel loudspeaker signal in continuous frequency domain (see p. 31)
X(ω)   One-channel loudspeaker signal in continuous frequency domain (see p. 32)
x^(WL)   One-dimensional wall position near loudspeaker (see p. 32)
x^(WM)   One-dimensional wall position near microphone (see p. 32)


x(ω)   Multichannel loudspeaker signal in continuous frequency domain (see p. 42)
x(k)   Input of z-domain filter (see p. 66)
x(k)   Multichannel loudspeaker signal (see p. 71)
x_λ(k)   Single-channel loudspeaker signal (see p. 71)
∼x(k)   Wave-domain multichannel loudspeaker signal (see p. 73)
∼x_l(k)   Wave-domain single-channel loudspeaker signal (see p. 73)
∼x_l(k)   Wave-domain single-channel loudspeaker signal (see p. 73)
x(k)   Back-transformed multichannel loudspeaker signal (see p. 76)
∼X(k)   Wave-domain multichannel loudspeaker signal as convolution matrix (see p. 86)
∼X_l(k)   Wave-domain single-channel loudspeaker signal as convolution matrix (see p. 87)
X†(k)   Pseudoinverse used in APA (see p. 107)
X_R(k)   APA regularization matrix (see p. 107)
∼X(k)   Wave-domain multichannel loudspeaker signal in the DFT domain (see p. 117)
∼X′(k)   Wave-domain multichannel loudspeaker signal as convolution matrix (see p. 159)
∼X′_l(k)   Wave-domain single-channel loudspeaker signal as convolution matrix (see p. 159)
x(k)   Multichannel loudspeaker signal in filtered-x structure (see p. 169)
y   Cartesian y-coordinate, also used as general variable (see p. 7)
Y_n^m(ϑ, α)   Spherical harmonics basis function (see p. 13)
y(ω)   Equalized multichannel loudspeaker signal in continuous frequency domain (see p. 53)
y(k)   Equalized multichannel loudspeaker signals (see p. 155)
∼y(k)   Equalized wave-domain multichannel loudspeaker signal (see p. 157)
∼y_l′(k)   Vector representation of single-channel equalized wave-domain loudspeaker signal (see p. 158)
z   Cartesian z-coordinate, also used as z-transform constant (see p. 7)
∼Z(k)   Multichannel filtered loudspeaker signal as convolution matrix (see p. 160)
∼z_0(k)   Multichannel filtered loudspeaker signal as convolution matrix (see p. 160)
Z(k)   Multichannel filtered-x signal as convolution matrix (see p. 171)
z_0(k)   Multichannel microphone signal in filtered-x structure (see p. 171)


Z_m,l′,l(k)   Single-channel filtered-x signal as convolution matrix (see p. 172)
z_m,l′,l(k)   Single-channel filtered-x signal (see p. 172)
z(k)   Multichannel filtered-x signal (see p. 172)


List of Figures

1.1 Signal model of an AEC system . . . 2
1.2 Signal model of an LRE system . . . 3

2.1 One-dimensional illustration of the quantities relevant for the derivation of the wave equation . . . 9
2.2 Vectors, volume, and surface, relevant for the Kirchhoff-Helmholtz integral . . . 19
2.3 Radiation of a surface when a plane wave is excited . . . 21
2.4 Illustration of the image source model . . . 22
2.5 Exemplary array setups comprising a circular microphone array and a circular or a rectangular loudspeaker array . . . 31
2.6 One-dimensional exemplary LEMS comprising a single loudspeaker and a single microphone and two walls . . . 33
2.7 Logarithmic magnitude of frequency responses for an exemplary one-dimensional LEMS . . . 36
2.8 Logarithmic magnitudes of frequency response for an exemplary one-dimensional LEMS obtained by an image source model of different order . . . 37
2.9 Wave-domain exemplary LEMS . . . 39
2.10 Alternative example of LEMS to be modeled in the wave domain . . . 45
2.11 Roles of the wave-domain transforms with respect to the LEMS . . . 47
2.12 Array setup and signal model for equalizing a reproduced acoustic scene at the listener position . . . 53
2.13 Approximation error for the transforms presented in Sec. 2.4.1 . . . 60
2.14 Logarithmic coupling magnitudes of a conventional and a wave-domain LEMS model . . . 61
2.15 Illustration of a circular loudspeaker array with a center shifted from the origin . . . 63
2.16 Total energy of the mode couplings . . . 63
2.17 Total energy of the mode couplings . . . 65
2.18 Diagonal coupling energy for different values of the positioning error and the distance of the array centers . . . 65
2.19 Equalization of a random impulse response with an FIR filter . . . 68
2.20 Structures of the matrices representing MIMO filtering . . . 73
2.21 Signal model of the reproduction system . . . 77

3.1 Signal model for conventional system identification in the point-to-point domain . . . 80


3.2 Signal model for wave-domain system identification in loudspeaker-signal-preserving configuration . . . 84
3.3 Signal model for wave-domain system identification in loudspeaker-signal-transforming configuration . . . 84
3.4 Differently structured matrices describing the same MIMO filtering operation for N_L = 3, N_M = 2 . . . 88
3.5 Structure of autocorrelation matrix used for system identification for N_L = 3, N_M = 2 . . . 89
3.6 Minimum normalized system misalignment achievable by approximative models for the LEMS as described in Sec. 2.4.2 . . . 91
3.7 Different wave-domain models for the LEMS . . . 92
3.8 Influence of different parameters on the lower bound for the normalized misalignment . . . 96
3.9 Exemplary structure of the matrices V_si and V_si² . . . 97
3.10 Illustration of mode coupling weights and additionally introduced cost . . . 103
3.11 Structure of the matrices to describe a time-domain convolution as a DFT-domain multiplication . . . 118
3.12 Illustration of the structure of the matrix S_XX^(sp)(n) . . . 121
3.13 Illustration of a synthesized plane wave . . . 132
3.14 Results of an AEC experiment with plane waves carrying white noise signals . . . 133
3.15 Results of an AEC experiment with plane waves carrying music signals . . . 135
3.16 Results after 45 seconds of AEC operation for different approximative LEMS models and acoustic scenes . . . 137
3.17 Results of an AEC experiment with plane waves carrying white noise signals and an adaptive filter length of 1024 samples . . . 138
3.18 Results of an AEC experiment with plane waves carrying white noise signals and an adaptive filter length of 512 samples . . . 139
3.19 Results of an AEC experiment with plane waves carrying white noise signals with the microphone noise level of −10 dB . . . 140
3.20 Results of an AEC experiment with plane waves carrying white noise signals with the microphone noise level of 0 dB . . . 141
3.21 Results of an AEC experiment with plane waves carrying music signals, with the microphone interferer level of −20 dB . . . 142
3.22 Results of an AEC experiment with plane waves carrying music signals, with the microphone interferer level of −10 dB . . . 143
3.23 Results of an AEC experiment comparing cost-guided system identification with different LEMS models over a long time span . . . 145
3.24 Results of an AEC experiment under optimal conditions . . . 146
3.25 Results of an AEC experiment with suboptimal initialization values . . . 147
3.26 Results of an AEC experiment with interfering noise sources in the microphone signals . . . 148
3.27 Results of an AEC experiment with plane waves carrying music signals . . . 149


3.28 Results for AEC achieved with the cost-guided wave-domain system identification and different weight parameters . . . 150
3.29 Results for AEC achieved with the guidance by a previously measured MIMO impulse response and different weight parameters . . . 152
3.30 Computational effort for WDAF GFDAF adaptation . . . 153
3.31 Screen shot of the WDAF AEC real-time demonstrator . . . 154
3.32 Photographs of the loudspeaker array and microphone array used in the AEC real-time demonstrator . . . 154

4.1 Discrete-time signal model for LRE task definition . . . 156
4.2 Signal model for the implementation of a conventional LRE system in the point-to-point domain . . . 157
4.3 Discrete-time signal model for LRE task definition in the wave domain . . . 158
4.4 Differently structured matrices describing the same MIMO filtering operation . . . 160
4.5 Signal model of a wave-domain LRE system . . . 162
4.6 Illustration of the LEMS model and the corresponding equalizer structure . . . 164
4.7 Differently structured matrices describing the same MIMO filtering operation . . . 166
4.8 Structure of the matrices determining the least-squares solution for LRE . . . 167
4.9 Signal model of a filtered-x structure . . . 172
4.10 Example of two 2 × 2 matrices commuted in a matrix-matrix-vector product . . . 174
4.11 Array setup for the evaluation of LRE . . . 186
4.12 LEMS considered for the image source model . . . 187
4.13 Exemplary loudspeaker-to-microphone impulse response used for the LRE evaluation . . . 188
4.14 Adaptive filter errors, absolute LRE errors, and normalized misalignment . . . 191
4.15 LRE errors shown with and without median filtering . . . 193
4.16 Scene-dependent and absolute LRE performance for a time-varying acoustic scene reproduced in a known LEMS for the evaluation of least-squares optimal equalizers . . . 194
4.17 Scene-dependent and absolute LRE performance for a time-varying acoustic scene reproduced in a known LEMS for the evaluation of the FxGFDAF algorithm . . . 196
4.18 Scene-dependent and absolute LRE performance for a time-varying acoustic scene reproduced in a known LEMS for the evaluation of the IDI algorithm . . . 197
4.19 Scene-dependent LRE performance and system misalignment for a time-varying acoustic scene with equalizers determined with the FxGFDAF algorithm operating on an LEMS estimated with the GFDAF algorithm . . . 199
4.20 Scene-dependent LRE performance and system misalignment for a time-varying acoustic scene with equalizers determined with the IDI algorithm operating on an estimated LEMS . . . 200


4.21 Scene-dependent LRE performance and system misalignment for a time-varying acoustic scene with equalizers determined with the FxGFDAF algorithm operating on an LEMS estimated with the cost-guided GFDAF algorithm . . . 202
4.22 Scene-dependent LRE performance and system misalignment for a time-varying acoustic scene with equalizers determined with the cost-guided FxGFDAF algorithm operating on an estimated LEMS . . . 203
4.23 Scene-dependent LRE performance and system misalignment for a time-varying acoustic scene with equalizers determined with the cost-guided FxGFDAF algorithm operating on an LEMS estimated with the cost-guided GFDAF algorithm . . . 205
4.24 Scene-dependent LRE performance of least-squares optimal equalizers determined for a known LEMS . . . 206
4.25 Scene-dependent LRE performance of equalizers determined for a known LEMS using the FxGFDAF algorithm . . . 207
4.26 Scene-dependent LRE performance of equalizers determined for a known LEMS using the IDI algorithm . . . 208
4.27 Scene-dependent LRE performance and system misalignment of equalizers determined using the FxGFDAF algorithm for an LEMS estimated with the GFDAF algorithm . . . 209
4.28 Scene-dependent LRE performance and system misalignment of equalizers determined using the IDI algorithm for an LEMS estimated with the GFDAF algorithm . . . 210
4.29 Scene-dependent LRE performance and system misalignment of equalizers determined using the FxGFDAF algorithm for an LEMS estimated with the cost-guided GFDAF algorithm . . . 211
4.30 LRE performance of equalizers determined using the cost-guided FxGFDAF algorithm for an LEMS estimated with the GFDAF algorithm . . . 212
4.31 Scene-dependent LRE performance and system misalignment of equalizers determined using the cost-guided FxGFDAF algorithm for an LEMS estimated with the cost-guided GFDAF algorithm . . . 213
4.32 System misalignment as a function of time and N_G for equalizers determined using the FxGFDAF algorithm for an LEMS estimated with the GFDAF algorithm . . . 214
4.33 System misalignment as a function of time and N_G for equalizers determined using the FxGFDAF algorithm for an LEMS estimated with the cost-guided GFDAF algorithm . . . 215
4.34 Computational effort for equalizer determination per iteration . . . 218
4.35 Computational effort for equalizer determination per iteration . . . 218

C.1 Random sample of a loudspeaker-to-microphone impulse response . . . 228


C.2 Example of convergence curves for different adaptation algorithms in a single-channel scenario . . . 229
C.3 Example of convergence curves for different adaptation algorithms in a multichannel scenario . . . 230
C.4 Optimal step size µ_si, normalized misalignment, and echo return loss enhancement (ERLE) for the LMS algorithm as a function of the number of loudspeaker channels . . . 231
C.5 Optimal step size µ_si, normalized misalignment, and ERLE for the LMS algorithm as a function of the number of loudspeaker channels and the power of the correlated portion of the loudspeaker signals . . . 233
C.6 Normalized misalignment and ERLE as a function of the number of loudspeaker channels, achieved with the LMS algorithm with different frame shift values . . . 234
C.7 Optimal step size µ_si, normalized misalignment, and ERLE for the APA as a function of the number of loudspeaker channels . . . 235
C.8 Optimal step size µ_si, normalized misalignment, and ERLE for the APA as a function of the number of loudspeaker channels and the power of the correlated portion of the loudspeaker signals . . . 236
C.9 Optimal step size µ_si, normalized misalignment, and ERLE for the GFDAF algorithm as a function of the number of loudspeaker channels . . . 237
C.10 Optimal step size µ_si, normalized misalignment, and ERLE for the GFDAF algorithm as a function of the number of loudspeaker channels and the power of the correlated portion of the loudspeaker signals . . . 238
C.11 Optimal regularization factor γ_si, normalized misalignment, and ERLE for the GFDAF algorithm as a function of the number of loudspeaker channels . . . 239
C.12 Optimal regularization factor γ_si, normalized misalignment, and ERLE for the GFDAF algorithm as a function of the number of loudspeaker channels and the power of the correlated portion of the loudspeaker signals . . . 239
C.13 Optimal “forgetting factor” λ_si, normalized misalignment, and ERLE for the GFDAF algorithm as a function of the number of loudspeaker channels . . . 240
C.14 Optimal “forgetting factor” λ_si, normalized misalignment, and ERLE for the GFDAF algorithm as a function of the number of loudspeaker channels and the power of the correlated portion of the loudspeaker signals . . . 241


Bibliography

[AB79] J.B. Allen and D.A. Berkley. Image method for efficiently simulating small-room acoustics. The Journal of the Acoustical Society of America, 65(4):943 – 950, Apr. 1979.

[Ali98] M. Ali. Stereophonic acoustic echo cancellation system using time-varying all-pass filtering for signal decorrelation. In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), volume 6, pages 3689 – 3692, Seattle (WA), USA, May 1998.

[Ant79] A. Antoniou. Digital Filters: Analysis and Design. McGraw-Hill, New York (NY), USA, 1979.

[AS72] M. Abramowitz and I.A. Stegun. Handbook of Mathematical Functions. Dover, New York (NY), USA, 1972.

[BA05] T. Bethlehem and T.D. Abhayapala. Theory and design of sound field reproduction in reverberant rooms. The Journal of the Acoustical Society of America, 117(4):2100 – 2111, Apr. 2005.

[BAGG95] J. Benesty, F. Amand, A. Gilloire, and Y. Grenier. Adaptive filtering algorithms for stereophonic acoustic echo cancellation. In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), volume 5, pages 3099 – 3102, Detroit (MI), USA, May 1995.

[BAK05] H. Buchner, R. Aichner, and W. Kellermann. A generalization of blind source separation algorithms for convolutive mixtures based on second-order statistics. IEEE Trans. Speech and Audio Processing, 13(1):120 – 134, 2005.

[Bal97] C.A. Balanis. Antenna Theory. Wiley, Hoboken (NJ), USA, 1997.

[Bau61] B.B. Bauer. Stereophonic earphones and binaural loudspeakers. Journal of the Audio Engineering Society, 9(2):148 – 151, 1961.

[BBGK06] H. Buchner, J. Benesty, T. Gänsler, and W. Kellermann. Robust Extended Multidelay Filter and Double-Talk Detector for Acoustic Echo Cancellation. IEEE Trans. Audio, Speech, and Language Processing, 14(5):1633 – 1644, 2006.


[BBK03] H. Buchner, J. Benesty, and W. Kellermann. “Multichannel frequency-domain adaptive algorithms with application to acoustic echo cancellation”. In J. Benesty and Y. Huang, editors, Adaptive Signal Processing: Application to Real-World Problems. Springer, Berlin, Germany, 2003.

[BD92] J. Benesty and P. Duhamel. A fast exact least mean square adaptive algorithm. IEEE Trans. Signal Processing, 40(12):2904 – 2920, 1992.

[BDH+99] C. Breining, P. Dreiseitel, E. Hänsler, A. Mader, B. Nitsch, H. Puder, T. Schertler, G. Schmidt, and J. Tilp. Acoustic echo control. An application of very-high-order adaptive filters. IEEE Signal Processing Magazine, 16(4):42 – 69, July 1999.

[BDV93] A.J. Berkhout, D. De Vries, and P. Vogel. Acoustic control by wave field synthesis. The Journal of the Acoustical Society of America, 93(5):2764 – 2778, May 1993.

[Ben00] J. Benesty. General derivation of frequency-domain adaptive filtering. Technical report, Bell Laboratories, 2000.

[Ber88] A.J. Berkhout. A holographic approach to acoustic control. Journal of the Audio Engineering Society, 36(12):977 – 995, 1988.

[BGM+01] J. Benesty, T. Gänsler, D.R. Morgan, M.M. Sondhi, S.L. Gay, et al. Advances in Network and Acoustic Echo Cancellation. Springer, Berlin, Germany, 2001.

[BK08] H. Buchner and W. Kellermann. A fundamental relation between blind and supervised adaptive filtering illustrated for blind source separation and acoustic echo cancellation. In Proc. Joint Workshop on Hands-Free Speech Communication and Microphone Arrays (HSCMA), pages 17 – 20, Trento, Italy, May 2008.

[Bla00] D. Blackstock. Fundamentals of Physical Acoustics. Wiley, Hoboken (NJ), USA, 2000.

[Blu37] A.D. Blumlein. US 2093540 A: Sound transmission, sound recording, and sound reproducing system. US-Patent, Sep. 1937.

[BM00] J. Benesty and D.R. Morgan. Frequency-domain adaptive filtering revisited, generalization to the multi-channel case, and application to acoustic echo cancellation. In Proc. IEEE Intl. Conf. Acoustics, Speech, and Signal Processing (ICASSP), volume 2, pages II789 – II792, Istanbul, Turkey, Jun. 2000.


[BMC00] J. Benesty, D.R. Morgan, and J.H. Cho. A new class of doubletalk detectors based on cross-correlation. IEEE Trans. Speech and Audio Processing, 8(2):168 – 172, Mar. 2000.

[BMS98] J. Benesty, D.R. Morgan, and M.M. Sondhi. A better understanding and an improved solution to the specific problems of stereophonic acoustic echo cancellation. IEEE Trans. Speech and Audio Processing, 6(2):156 – 165, Mar. 1998.

[Bol79] S.F. Boll. Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoustics, Speech and Signal Processing, 27(2):113 – 120, 1979.

[Bor84] J. Borish. Extension of the image model to arbitrary polyhedra. The Journal of the Acoustical Society of America, 75(6):1827 – 1836, June 1984.

[Bou03] M. Bouchard. Multichannel affine and fast affine projection algorithms for active noise control and acoustic equalization systems. IEEE Trans. Speech and Audio Processing, 11(1):54 – 60, Jan. 2003.

[Bra83] D.H. Brandwood. A complex gradient operator and its application in adaptive array theory. Microwaves, Optics and Antennas, IEE Proceedings H, 130(1):11 – 16, Feb. 1983.

[BSF+13] K. Brandenburg, M. Schneider, A. Franck, W. Kellermann, and S. Brix. Intelligent multichannel signal processing for future audio reproduction systems. In Proc. Intl. Audio Engineering Society Conf.: Sound Field Control – Engineering and Perception, pages 1 – 10, Guildford, UK, September 2013.

[BSK04] H. Buchner, S. Spors, and W. Kellermann. Wave-domain adaptive filtering: acoustic echo cancellation for full-duplex systems based on wave-field synthesis. In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), volume 4, pages IV-117 – IV-120, Montreal, Canada, May 2004.

[CMB99] J.H. Cho, D.R. Morgan, and J. Benesty. An objective technique for evaluating doubletalk detectors in acoustic echo cancelers. IEEE Trans. Speech and Audio Processing, 7(6):718 – 724, 1999.

[Cro98] M.J. Crocker. Handbook of Acoustics. Wiley, Hoboken (NJ), USA, 1998.

[D'A47] J. D'Alembert. Recherches sur la courbe que forme une corde tendue mise en vibrations. Histoire de l'Académie Royale des Sciences et Belles Lettres de Berlin, 3:214 – 249, 1747.

[Dan03] J. Daniel. Spatial sound encoding including near field effect: Introducing distance coding filters and a variable, new ambisonic format. In Proc. Intl. Conf. of the Audio Engineering Society, pages 1 – 12, Copenhagen, Denmark, Sep. 2003.


[DMW78] M. Dentino, J. McCool, and B. Widrow. Adaptive filtering in the frequency domain. Proceedings of the IEEE, 66(12):1658 – 1659, Dec. 1978.

[EHO13] S. Emura, Y. Hiwasaki, and H. Ohmuro. Wave-domain echo-path model with aliasing for echo cancellation. In Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 1 – 4, New Paltz (NY), USA, Oct. 2013.

[EKFH12] S. Emura, S. Koyama, K. Furuya, and Y. Haneda. Posterior residual echo canceling and its complexity reduction in the wave domain. In Proc. Intl. Workshop on Acoustic Signal Enhancement (IWAENC), pages 1 – 4, Aachen, Germany, Sep. 2012.

[EN89] S.J. Elliott and P.A. Nelson. Multiple-point equalization in a room using adaptive digital filters. Journal of the Audio Engineering Society, 37(11):899 – 907, 1989.

[FC05] C. Faller and J. Chen. Suppressing acoustic echo in a spectral envelope space. IEEE Trans. Speech and Audio Processing, 13(5):1048 – 1062, 2005.

[Fer80] E. Ferrara. Fast implementations of LMS adaptive filters. IEEE Trans. Acoustics, Speech, and Signal Processing, 28(4):474 – 475, Aug. 1980.

[Fet86] A. Fettweis. Wave digital filters: Theory and practice. Proceedings of the IEEE, 74(2):270 – 327, 1986.

[Fis02] R.F.H. Fischer. Precoding and Signal Shaping for Digital Transmission. Wiley, Hoboken (NJ), USA, 2002.

[GC88] H. Gish and D. Cochran. Generalized coherence [signal detection]. In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), volume 5, pages 2745 – 2748, New York (NY), USA, Apr. 1988.

[GE98] T. Gänsler and P. Eneroth. Influence of audio coding on stereophonic acoustic echo cancellation. In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), volume 6, pages 3649 – 3652, Seattle (WA), USA, May 1998.

[Ger73] M.A. Gerzon. Periphony: With-height sound reproduction. Journal of the Audio Engineering Society, 21(1):2 – 10, 1973.

[GKMK08] S. Goetze, M. Kallinger, A. Mertins, and K.D. Kammeyer. Multi-channel listening-room compensation using a decoupled filtered-X LMS algorithm. In Proc. Asilomar Conf. on Signals, Systems, and Computers, pages 811 – 815, Pacific Grove (CA), USA, Oct. 2008.


[GMJV02] S. Gustafsson, R. Martin, P. Jax, and P. Vary. A psychoacoustic approach to combined acoustic echo cancellation and noise reduction. IEEE Trans. Speech and Audio Processing, 10(5):245 – 256, 2002.

[GT95] S.L. Gay and S. Tavathia. The fast affine projection algorithm. In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), volume 5, pages 3023 – 3026, Detroit (MI), USA, May 1995.

[GT98] A. Gilloire and V. Turbin. Using auditory properties to improve the behaviour of stereophonic acoustic echo cancellers. In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), volume 6, pages 3681 – 3684, Seattle (WA), USA, May 1998.

[GVL96] G.H. Golub and C.F. Van Loan. Matrix Computations. Johns Hopkins, Baltimore (MD), USA, 3rd edition, Oct. 1996.

[Hay02] S. Haykin. Adaptive Filter Theory. Prentice Hall, Englewood Cliffs (NJ), USA, 2002.

[HB13] K. Helwani and H. Buchner. On the eigenspace estimation for supervised multichannel system identification. In Proc. IEEE Intl. Conf. Acoustics, Speech, and Signal Processing (ICASSP), pages 630 – 634, Vancouver, Canada, May 2013.

[HBC06] Y. Huang, J. Benesty, and J. Chen. Acoustic MIMO Signal Processing. Springer, Berlin, Germany, 2006.

[HBK07] J. Herre, H. Buchner, and W. Kellermann. Acoustic echo cancellation for surround sound using perceptually motivated convergence enhancement. In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), volume 1, pages I-17 – I-20, Honolulu (HI), USA, Apr. 2007.

[HBS10a] K. Helwani, H. Buchner, and S. Spors. On the robust and efficient computation of the Kalman gain for multichannel adaptive filtering with application to acoustic echo cancellation. In Proc. Asilomar Conf. on Signals, Systems, and Computers, pages 988 – 992, Pacific Grove (CA), USA, Nov. 2010.

[HBS10b] K. Helwani, H. Buchner, and S. Spors. Source-domain adaptive filtering for MIMO systems with application to acoustic echo cancellation. In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pages 321 – 324, Dallas (TX), USA, Mar. 2010.

[HBS12] K. Helwani, H. Buchner, and S. Spors. Multichannel adaptive filtering with sparseness constraints. In Proc. Intl. Workshop on Acoustic Echo and Noise Control (IWAENC), pages 1 – 4, Aachen, Germany, Sep. 2012.


[HCB11] Y. Huang, J. Chen, and J. Benesty. Immersive audio schemes. IEEE Signal Processing Magazine, 28(1):20 – 32, 2011.

[HMK97] Y. Haneda, S. Makino, and Y. Kaneda. Multiple-point equalization of room transfer functions by using common acoustical poles. IEEE Trans. Speech and Audio Processing, 5(4):325 – 333, 1997.

[HNO+08] K. Hamasaki, T. Nishiguchi, R. Okumura, Y. Nakayama, and A. Ando. A 22.2 multichannel sound system for ultrahigh-definition TV (UHDTV). SMPTE Motion Imaging Journal, 117(3):40 – 49, 2008.

[HS81] H.V. Henderson and S.R. Searle. The vec-permutation matrix, the vec operator and Kronecker products: A review. Linear and Multilinear Algebra, 9(4):271 – 288, 1981.

[HS04] E. Hänsler and G. Schmidt. Acoustic Echo and Noise Control: A Practical Approach. Wiley, Hoboken (NJ), USA, 2004.

[HSB11] K. Helwani, S. Spors, and H. Buchner. Spatio-temporal signal preprocessing for multichannel acoustic echo cancellation. In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pages 93 – 96, Prague, Czech Republic, May 2011.

[Kel85] W. Kellermann. Kompensation akustischer Echos in Frequenzteilbändern. Frequenz, 39(7-8):209 – 215, 1985.

[Kel88] W. Kellermann. Analysis and design of multirate systems for cancellation of acoustical echoes. In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), volume 5, pages 2570 – 2573, New York (NY), USA, Apr. 1988.

[KFV11] M. Kolundzija, C. Faller, and M. Vetterli. Reproducing sound fields using MIMO acoustic channel inversion. Journal of the Audio Engineering Society, 59(10):721 – 734, 2011.

[KN93] O. Kirkeby and P.A. Nelson. Reproduction of plane wave sound fields. The Journal of the Acoustical Society of America, 94(5):2992, 1993.

[KNB06] A.W.H. Khong, P.A. Naylor, and J. Benesty. Effect of interchannel coherence on conditioning and misalignment performance for stereo acoustic echo cancellation. In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), volume 5, pages V – V, Toulouse, France, May 2006.

[KNHOB98] O. Kirkeby, P.A. Nelson, H. Hamada, and F. Orduna-Bustamante. Fast deconvolution of multichannel systems using regularization. IEEE Trans. Speech and Audio Processing, 6(2):189 – 194, Mar. 1998.

[KSAJ07] R.A. Kennedy, P. Sadeghi, T.D. Abhayapala, and H.M. Jones. Intrinsic limits of dimensionality and richness in random multipath fields. IEEE Trans. Signal Processing, 55(6):2542 – 2556, 2007.

[Kun09] A. Kuntz. Wave Field Analysis Using Virtual Circular Microphone Arrays. PhD thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg, 2009.

[Kut09] H. Kuttruff. Room Acoustics. CRC Press, Boca Raton (FL), USA, 2009.

[LGF05] J.J. Lopez, A. Gonzalez, and L. Fuster. Room compensation in wave field synthesis by means of multichannel inversion. In Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 146 – 149, New Paltz (NY), USA, Oct. 2005.

[LVKL96] T. Laakso, V. Välimäki, M. Karjalainen, and U. Laine. Splitting the unit delay. IEEE Signal Processing Magazine, 13(1):30 – 60, Jan. 1996.

[MA07] A. Maaref and S. Aissa. Eigenvalue distributions of Wishart-type random matrices with application to the performance analysis of MIMO MRC systems. IEEE Trans. Wireless Communications, 6(7):2678 – 2689, July 2007.

[MAAG95] E. Moulines, O. Ait Amrane, and Y. Grenier. The generalized multidelay adaptive filter: Structure and convergence analysis. IEEE Trans. Signal Processing, 43(1):14 – 28, 1995.

[MD95] M. Montazeri and P. Duhamel. A set of algorithms linking NLMS and block RLS algorithms. IEEE Trans. Signal Processing, 43(2):444 – 453, 1995.

[MF53] P. Morse and H. Feshbach. Methods of Theoretical Physics. McGraw-Hill, New York (NY), USA, 1953.

[MGJ82] D. Mansour and A. Gray Jr. Unconstrained frequency-domain adaptive filter. IEEE Trans. Acoustics, Speech, and Signal Processing, 30(5):726 – 734, 1982.

[MHB01] D.R. Morgan, J.L. Hall, and J. Benesty. Investigation of several types of nonlinearities for use in stereo acoustic echo cancellation. IEEE Trans. Speech and Audio Processing, 9(6):686 – 696, Sep. 2001.

[MK88] M. Miyoshi and Y. Kaneda. Inverse filtering of room acoustics. IEEE Trans. Acoustics, Speech, and Signal Processing, 36(2):145 – 152, Feb. 1988.

[MK97] S. Muramatsu and H. Kiya. Extended overlap-add and -save methods for multirate signal processing. IEEE Trans. Signal Processing, 45(9):2376 – 2380, 1997.

[MMK10] A. Mertins, T. Mei, and M. Kallinger. Room impulse response shortening/reshaping with infinity- and p-norm optimization. IEEE Trans. Audio, Speech, and Language Processing, 18(2):249 – 259, 2010.

[MMSN10] M. Matthaiou, M.R. McKay, P.J. Smith, and J.A. Nossek. On the condition number distribution of complex Wishart matrices. IEEE Trans. Communications, 58(6):1705 – 1717, June 2010.

[Møl92] H. Møller. Fundamentals of binaural technology. Applied Acoustics, 36(3):171 – 218, 1992.

[Mou85] J. Mourjopoulos. On the variation and invertibility of room impulse response functions. Journal of Sound and Vibration, 102(2):217 – 228, 1985.

[MPS00] A. Mader, H. Puder, and G.U. Schmidt. Step-size control for acoustic echo cancellation filters – an overview. Signal Processing, 80(9):1697 – 1719, 2000.

[MSM+09] L. Marquardt, P. Svaizer, E. Mabande, A. Brutti, C. Zieger, M. Omologo, and W. Kellermann. A natural acoustic front-end for interactive TV in the EU-project DICIT. In Proc. IEEE Pacific Rim Conf. on Communications, Computers and Signal Processing (PacRim), pages 894 – 899, Victoria, Canada, Aug. 2009.

[NHE92] P.A. Nelson, H. Hamada, and S.J. Elliott. Adaptive inverse filters for stereophonic sound reproduction. IEEE Trans. Signal Processing, 40(7):1621 – 1632, 1992.

[NOBH95] P.A. Nelson, F. Orduna-Bustamante, and H. Hamada. Inverse filter design and equalization zones in multichannel sound reproduction. IEEE Trans. Speech and Audio Processing, 3(3):185 – 192, May 1995.

[OLBC10] F.W. Olver, D.W. Lozier, R.F. Boisvert, and C.W. Clark. NIST Handbook of Mathematical Functions. Cambridge University Press, New York (NY), USA, 1st edition, 2010.

[OYS+99] M. Omura, M. Yada, H. Saruwatari, S. Kajita, K. Takeda, and F. Itakura. Compensating of room acoustic transfer functions affected by change of room temperature. In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), volume 2, pages 941 – 944, Glasgow, Scotland, Mar. 1999.

[PA11] M.A. Poletti and T.D. Abhayapala. Spatial sound reproduction systems using higher order loudspeakers. In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pages 57 – 60, Prague, Czech Republic, May 2011.

[PCRP11] P. Peretti, S. Cecchi, L. Romoli, and F. Piazza. Performance evaluation of adaptive algorithms for wave field analysis/synthesis using sound field simulations. In J. Zhu, editor, Computational Simulations and Applications. InTech, Oct. 2011.

[Pie89] A.D. Pierce. Acoustics: An Introduction to its Physical Principles and Applications. Acoustical Society of America (American Institute of Physics), Woodbury (NY), USA, 1989.

[Pol96] M.A. Poletti. The design of encoding functions for stereophonic and polyphonic sound systems. Journal of the Audio Engineering Society, 44(11):948 – 963, 1996.

[SA08] S. Spors and J. Ahrens. A comparison of wave field synthesis and higher-order Ambisonics with respect to physical properties and spatial sampling. In Proc. Audio Engineering Society Convention, San Francisco (CA), USA, Oct. 2008.

[SB80] M.M. Sondhi and D.A. Berkley. Silencing echoes on the telephone network. Proceedings of the IEEE, 68(8):948 – 963, Aug. 1980.

[SB08a] S. Spors and H. Buchner. Efficient massive multichannel active noise control using wave-domain adaptive filtering. In Proc. Intl. Symposium on Communications, Control and Signal Processing (ISCCSP), pages 1480 – 1485, St. Julians, Malta, Mar. 2008.

[SB08b] S. Spors and H. Buchner. Multichannel transform domain adaptive filtering: A two stage approach and illustration for acoustic echo cancellation. In Proc. Intl. Workshop on Acoustic Echo and Noise Control (IWAENC), pages 1 – 4, Seattle (WA), USA, Sep. 2008.

[SBR04] S. Spors, H. Buchner, and R. Rabenstein. A novel approach to active listening room compensation for wave field synthesis using wave-domain adaptive filtering. In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), volume 4, pages IV–29 – IV–32, Montreal, Canada, May 2004.

[SBR06] S. Spors, H. Buchner, and R. Rabenstein. Eigenspace adaptive filtering for efficient pre-equalization of acoustic MIMO systems. In Proc. European Signal Processing Conf. (EUSIPCO), pages 1 – 5, Florence, Italy, Sep. 2006.

[SBRH07] S. Spors, H. Buchner, R. Rabenstein, and W. Herbordt. Active listening room compensation for massive multichannel sound reproduction systems using wave-domain adaptive filtering. The Journal of the Acoustical Society of America, 122(1):354 – 369, July 2007.

[Sci81] The telephone at the Paris Opera. Scientific American, pages 422 – 423, 1881. http://earlyradiohistory.us/1881opr.htm.

[SDRS09] H.W. Schüßler, G. Dehner, R. Rabenstein, and P. Steffen. Digitale Signalverarbeitung 2: Entwurf diskreter Systeme. Volume 2 of Digitale Signalverarbeitung. Springer, Berlin, Germany, 2009.

[SH15] M. Schneider and E.A.P. Habets. Comparison of multichannel doubletalk detectors for acoustic echo cancellation. In Proc. European Signal Processing Conf. (EUSIPCO), Nice, France, Aug. 2015.

[SHK13] M. Schneider, C. Huemmer, and W. Kellermann. Wave-domain loudspeaker signal decorrelation for system identification in multichannel audio reproduction scenarios. In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pages 605 – 609, Vancouver, Canada, May 2013.

[Shy92] J.J. Shynk. Frequency-domain and multirate adaptive filtering. IEEE Signal Processing Magazine, 9(1):14 – 37, 1992.

[SK09] M. Schneider and W. Kellermann. Considering modal aliasing in the implementation of an acoustic echo canceller in the wave domain. The Journal of the Acoustical Society of America, 125(4):2543 – 2543, 2009.

[SK11] M. Schneider and W. Kellermann. A wave-domain model for acoustic MIMO systems with reduced complexity. In Proc. Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA), pages 133 – 138, Edinburgh, UK, May 2011.

[SK12a] M. Schneider and W. Kellermann. Adaptive listening room equalization using a scalable filtering structure in the wave domain. In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pages 13 – 16, Kyoto, Japan, Mar. 2012.

[SK12b] M. Schneider and W. Kellermann. A direct derivation of transforms for wave-domain adaptive filtering based on circular harmonics. In Proc. European Signal Processing Conf. (EUSIPCO), pages 1034 – 1038, Bucharest, Romania, Aug. 2012.

[SK12c] M. Schneider and W. Kellermann. Iterative DFT-domain inverse filter determination for adaptive listening room equalization. In Proc. Intl. Workshop on Acoustic Echo and Noise Control (IWAENC), pages 1 – 4, Aachen, Germany, Sep. 2012.

[SK13a] M. Schneider and W. Kellermann. Large-scale multiple input/multiple output system identification in room acoustics. In Proc. Intl. Congress on Acoustics (ICA), pages 1 – 9, Montreal, Canada, June 2013.

[SK13b] M. Schneider and W. Kellermann. On the influence of loudspeaker and microphone array geometries on the limits for listening room equalization in a spatial continuum. In Proc. Conf. on Acoustics including the Italian Annual Conf. on Acoustics and the German Annual Conf. on Acoustics (AIA-DAGA), pages 2360 – 2363, Merano, Italy, Mar. 2013.

[SK16a] M. Schneider and W. Kellermann. The generalized frequency-domain adaptive filtering algorithm as an approximation of the block recursive least-squares algorithm. EURASIP Journal on Advances in Signal Processing, 2016(1):1 – 15, 2016.

[SK16b] M. Schneider and W. Kellermann. Multichannel acoustic echo cancellation in the wave domain with increased robustness to nonuniqueness problem. IEEE Trans. Audio, Speech, and Language Processing, 24(3):518 – 529, Mar. 2016.

[SM95] S. Shimauchi and S. Makino. Stereo projection echo canceller with true echo path estimation. In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), volume 5, pages 3059 – 3062, Philadelphia (PA), USA, May 1995.

[SM96] S. Shimauchi and S. Makino. Stereo echo cancellation algorithm using imaginary input-output relationships. In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), volume 2, pages 941 – 944, Atlanta (GA), USA, May 1996.

[SMH95] M.M. Sondhi, D.R. Morgan, and J.L. Hall. Stereophonic acoustic echo cancellation – an overview of the fundamental problem. IEEE Signal Processing Letters, 2(8):148 – 151, Aug. 1995.

[Sno53] W.B. Snow. Basic principles of stereophonic sound. Journal of the Society of Motion Picture and Television Engineers, 61(5):567 – 589, 1953.

[Spo05] S. Spors. Active Listening Room Compensation for Spatial Sound Reproduction Systems. PhD thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg, 2005.

[SR06] S. Spors and R. Rabenstein. Spatial aliasing artifacts produced by linear and circular loudspeaker arrays used for wave field synthesis. In Proc. Audio Engineering Society Convention, Paris, France, May 2006.

[SRA08] S. Spors, R. Rabenstein, and J. Ahrens. The theory of wave field synthesis revisited. In Proc. Audio Engineering Society Convention, pages 17 – 20, San Francisco (CA), USA, Oct. 2008.

[SRR05] S. Spors, M. Renk, and R. Rabenstein. Limiting effects of active room compensation using wave field synthesis. In Proc. Audio Engineering Society Convention, pages 1 – 15, Barcelona, Spain, May 2005.

[SSK12] M. Schneider, F. Schuh, and W. Kellermann. The generalized frequency-domain adaptive filtering algorithm implemented on a GPU for large-scale multichannel acoustic echo cancellation. In ITG-Fachbericht Sprachkommunikation, pages 39 – 42, Braunschweig, Germany, Sep. 2012.

[SvGKJ87] P. Sommen, P. van Gerwen, H. Kotmans, and A. Janssen. Convergence analysis of a frequency-domain adaptive filter with exponential power averaging and generalized window function. IEEE Trans. Circuits and Systems, 34(7):788 – 798, Jul. 1987.

[SWR+13] S. Spors, H. Wierstorf, A. Raake, F. Melchior, M. Frank, and F. Zotter. Spatial sound with loudspeakers and its perception: A review of the current state. Proceedings of the IEEE, 101(9):1920 – 1938, 2013.

[TE13a] P. Thüne and G. Enzner. Improved online identification of acoustic MISO systems based on separated input signal components. In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pages 413 – 417, Vancouver, Canada, May 2013.

[TE13b] P. Thüne and G. Enzner. Trends in adaptive MISO system identification for multichannel audio reproduction and speech communication. In Proc. Intl. Symposium on Image and Signal Processing and Analysis (ISPA), pages 767 – 772, Berlin, Germany, Sep. 2013.

[Teu07] H. Teutsch. Modal Array Signal Processing: Principles and Applications of Acoustic Wavefield Decomposition. Springer, Berlin, Germany, 2007.

[The00] G. Theile. Multichannel natural recording based on psychoacoustic principles. In Proc. Audio Engineering Society Convention, pages 1 – 19, Paris, France, Feb. 2000.

[TZA13] D.S. Talagala, W. Zhang, and T.D. Abhayapala. Active acoustic echo cancellation in spatial soundfield reproduction. In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pages 620 – 624, Vancouver, Canada, May 2013.

[TZA14] D.S. Talagala, W. Zhang, and T.D. Abhayapala. Efficient multi-channel adaptive room compensation for spatial soundfield reproduction using a modal decomposition. IEEE Trans. Audio, Speech, and Language Processing, 22(10):1522 – 1532, 2014.

[Uni12] International Telecommunication Union. Multichannel stereophonic sound system with and without accompanying picture. Recommendation ITU-R BS.775-3, pages 1 – 25, Aug. 2012.

[Vai93] P.P. Vaidyanathan. Multirate Systems and Filter Banks. Prentice Hall Signal Processing Series. Pearson, Upper Saddle River (NJ), USA, 1993.

[Ver97] E. Verheijen. Sound Reproduction by Wave Field Synthesis. PhD thesis, Delft University of Technology, Netherlands, 1997.

[WA01] D.B. Ward and T.D. Abhayapala. Reproduction of a plane-wave sound field using an array of loudspeakers. IEEE Trans. Speech and Audio Processing, 9(6):697 – 707, 2001.

[WB10] F. Wefers and J. Berg. High-performance real-time FIR-filtering using fast convolution on graphics hardware. In Proc. Conf. on Digital Audio Effects (DAFx), pages 1 – 8, Graz, Austria, Sep. 2010.

[Wil99] E.G. Williams. Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography. Academic Press, Waltham (MA), USA, 1999.

[WJ10] T.S. Wada and B.H. Juang. Multi-channel acoustic echo cancellation based on residual echo enhancement with effective channel decorrelation via resampling. In Proc. Intl. Workshop on Acoustic Echo and Noise Control (IWAENC), pages 1 – 4, Tel Aviv, Israel, Sep. 2010.

[WV13] F. Wefers and M. Vorländer. Using fast convolution for FIR filtering: Overview and guidelines for real-time audio rendering. In Proc. Conf. on Acoustics including the Italian Annual Conf. on Acoustics and the German Annual Conf. on Acoustics (AIA-DAGA), pages 263 – 263, Merano, Italy, Mar. 2013. Deutsche Gesellschaft für Akustik.

[WWJ12] J. Wung, T.S. Wada, and B.H. Juang. Inter-channel decorrelation by sub-band resampling in frequency domain. In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), pages 29 – 32, Kyoto, Japan, Mar. 2012.

[YFKS83] M. Yoneyama, J. Fujimoto, Y. Kawamo, and S. Sasabe. The audio spotlight: An application of nonlinear interaction of sound waves to a new type of loudspeaker design. The Journal of the Acoustical Society of America, 73(5):1532, 1983.

[YW91] H. Ye and B.-X. Wu. A new double-talk detection algorithm based on the orthogonality theorem. IEEE Trans. Communications, 39(11):1542 – 1545, 1991.

Note that the following abbreviations were used:

• “Conf.” for “Conference”

• “Intl.” for “International”

• “Proc.” for “Proceedings of the”

• “Trans.” for “Transactions on”

• “EURASIP” for “European Association for Signal Processing”

• “IEEE” for “Institute of Electrical and Electronics Engineers”