
UPTEC F10 059

Examensarbete 20 p
November 2010

Polynomial Matrix Decompositions

Evaluation of Algorithms with an Application

to Wideband MIMO Communications

Rasmus Brandt


Faculty of Science and Technology (Teknisk-naturvetenskaplig fakultet), UTH-enheten
Visiting address: Ångströmlaboratoriet, Lägerhyddsvägen 1, Hus 4, Plan 0
Postal address: Box 536, 751 21 Uppsala
Telephone: 018 – 471 30 03
Fax: 018 – 471 30 00
Website: http://www.teknat.uu.se/student

Abstract

Polynomial Matrix Decompositions: Evaluation of Algorithms with an Application to Wideband MIMO Communications
Rasmus Brandt

The interest in wireless communications among consumers has exploded since the introduction of the "3G" cell phone standards. One reason for their success is the increasingly high data rates achievable through the networks. A further increase in data rates is possible through the use of multiple antennas at either or both sides of the wireless links.

Precoding and receive filtering using matrices obtained from a singular value decomposition (SVD) of the channel matrix is a transmission strategy for achieving the channel capacity of a deterministic narrowband multiple-input multiple-output (MIMO) communications channel. When signalling over wideband channels using orthogonal frequency-division multiplexing (OFDM), an SVD must be performed for every sub-carrier. As the number of sub-carriers of this traditional approach grows large, so does the computational load. It is therefore interesting to study alternative means of obtaining the decomposition.

A wideband MIMO channel can be modeled as a matrix filter with a finite impulse response, represented by a polynomial matrix. This thesis is concerned with investigating algorithms which decompose the polynomial channel matrix directly. The resulting decomposition factors can then be used to obtain the sub-carrier based precoding and receive filtering matrices. Existing approximative polynomial matrix QR and singular value decomposition algorithms were modified, and studied in terms of decomposition quality and computational complexity. The decomposition algorithms were shown to give decompositions of good quality, but if the goal is to obtain precoding and receive filtering matrices, the computational load is prohibitive for channels with long impulse responses.

Two algorithms for performing exact rational decompositions (QRD/SVD) of polynomial matrices were proposed and analyzed. Although they produced excellent decompositions for simple cases, issues with the numerical stability of a spectral factorization step render the algorithms, in their current form, unusable.

For a MIMO channel with an exponentially decaying power-delay profile, the sum rates achieved by employing the filters given by the approximative polynomial SVD algorithm were compared to the channel capacity. It was shown that if the symbol streams were decoded independently, as done in the traditional approach, the sum rates were sensitive to errors in the decomposition. A receiver with a spatially joint detector achieved sum rates close to the channel capacity, but with such a receiver the low-complexity detector set-up of the traditional approach is lost.

Summarizing, this thesis has shown that a wideband MIMO channel can be diagonalized in space and frequency using OFDM in conjunction with an approximative polynomial SVD algorithm. For channels with long impulse responses, however, the computational load required to reach sum rates close to the capacity of a simple channel is restraining compared to the traditional approach.

ISSN: 1401-5757, UPTEC F10 059
Examiner (examinator): Tomas Nyberg
Subject reviewer (ämnesgranskare): Mikael Sternad
Supervisor (handledare): Mats Bengtsson


Popular Science Summary (Populärvetenskaplig sammanfattning på svenska)

Wireless communication is a field whose popularity has grown in recent years. One reason for the success of "3G Internet" is the high data rates it makes possible. The data rate of a wireless link depends on the signal bandwidth and the transmit power, and increasing either yields higher data rates. Both bandwidth and transmit power are, however, expensive resources, since their use is often regulated by national and international authorities.

Another way to increase the data rate of a wireless link is to add more antennas on the transmitter and receiver sides, a so-called MIMO system. Such a system can be viewed as a set of single-antenna links with mutual interaction, and can be described by a matrix. The data rate of the multi-antenna link can be maximized by sending several parallel data streams over the MIMO channel. Since the transmitted signals share the radio channel, they will mix; each receive antenna will therefore pick up a combination of the signals transmitted from the different transmit antennas.

To prevent the signals from mixing, they must be coded. It turns out that by coding the transmitted signals with a special matrix, and decoding the received signals with another matrix, the channel is transformed into a set of parallel virtual channels. Independent data streams can then be sent over these virtual channels. The coding matrices are given by a so-called singular value decomposition of the original channel matrix.

For a single-antenna system with high bandwidth, the radio channel will affect the different frequency components of the signal differently. If the system does not account for this effect, its performance will suffer. One way to avoid this frequency selectivity is to signal over the channel using so-called OFDM. The OFDM system splits the original signal into several signals of low bandwidth. By sending these narrowband signals on different parts of the frequency band, they do not affect each other. The frequency-selective channel has thus been divided into a number of non-frequency-selective parallel subchannels.

By sending a wideband signal over an OFDM-based MIMO system, even higher data rates can be achieved. However, the coding matrices must be computed for every parallel subchannel in the frequency band, which requires many computational operations. This thesis has investigated a new set of algorithms for obtaining approximations of the required coding matrices. The quality of the approximate coding matrices was compared to the exact matrices, and the number of required computational operations was measured. It turned out that the new algorithms can produce coding matrices of good quality, but at the cost of more computational operations than the traditional way of obtaining the coding matrices.

The coding matrices from the new algorithms were also simulated in a communications system. With the new matrices, data rates close to the theoretical maximum capacity of a simple radio channel can be reached if an advanced decoder is used at the receiver side. If a set of simple decoders is used instead, as in the traditional system, the system performance suffers.

In summary, this thesis has shown that the coding matrices obtained from the new algorithms can be used in a wideband MIMO system to maximize the data rate. However, they require more computational operations, and a more advanced decoder, than the traditional system. The new algorithms are thus not competitive with the traditional system.


Acknowledgements

This diploma work was performed at the Signal Processing Laboratory at the School of Electrical Engineering at Kungliga Tekniska Högskolan in Stockholm, and will lead to a degree of Master of Science in Engineering Physics from Uppsala University.

First and foremost, I would like to thank my supervisor Mats Bengtsson for proposing the thesis topic and taking me on as an MSc thesis worker. His advice and guidance have helped me considerably during the course of this work. My ämnesgranskare (subject reviewer) Mikael Sternad at the Division for Signals and Systems at Uppsala University also deserves my gratitude; his comments have been very valuable to the final version of this thesis.

My family has always supported my endeavours, and for that I am endlessly grateful. Finally, thank you Melissa for being so lovely and cheerful, and for moving to Sweden to be with me.


Contents

1 Introduction 1
  1.1 Wireless Communications 1
  1.2 Multiple Antennas and Wideband Channels 3
  1.3 Problem Formulation and Contributions 3
  1.4 Thesis Outline 4

2 Preliminaries 5
  2.1 Complex Polynomials 5
    2.1.1 Addition and Subtraction 6
    2.1.2 Multiplication 6
  2.2 Polynomial Matrices 7
    2.2.1 Givens Rotations 7
    2.2.2 Decompositions 9
    2.2.3 Coefficient Truncation 9
  2.3 Computational Complexity 10

3 MIMO Channels and Multipath Propagation 11
  3.1 Propagation and Modeling 11
    3.1.1 Propagation 11
    3.1.2 Channel Modeling 13
    3.1.3 MIMO Channels 14
  3.2 Channel Capacity and Achievable Rate 15
  3.3 Equalization Techniques 16
  3.4 Summary 17

4 Polynomial Decomposition Algorithms: Coefficient Nulling 18
  4.1 Performance Criteria 18
  4.2 PQRD-BC: Polynomial QR Decomposition 20
    4.2.1 Convergence and Complexity 21
    4.2.2 Discussion 22
  4.3 MPQRD-BC: Modified PQRD-BC 22
    4.3.1 Convergence and Complexity 23
    4.3.2 Simulations 25
    4.3.3 Discussion 29
  4.4 PSVD by PQRD-BC: Polynomial Singular Value Decomposition 29
    4.4.1 Convergence and Complexity 31
    4.4.2 Discussion 32
  4.5 MPSVD by MPQRD-BC: Modified PSVD 32
    4.5.1 Convergence and Complexity 33
    4.5.2 Simulations 34
    4.5.3 Discussion 39
  4.6 Sampled PSVD vs. SVD in DFT Domain 39
    4.6.1 Frequency Domain Comparison 39
    4.6.2 Computational Load Comparison, Set-Up Phase 39
    4.6.3 Computational Load, Online Phase 41
    4.6.4 Discussion 42
  4.7 Summary 43

5 Rational Decomposition Algorithms: Polynomial Nulling 44
  5.1 Rational Givens Rotation 44
  5.2 PQRD-R: Rational QR Decomposition 45
    5.2.1 Simulations 46
    5.2.2 Discussion 47
  5.3 PSVD-R by PQRD-R: Rational Singular Value Decomposition 50
    5.3.1 Simulations 50
    5.3.2 Discussion 51
  5.4 Summary 51

6 Polynomial SVD for Wideband Spatial Multiplexing 55
  6.1 Generic System Model 56
    6.1.1 Narrowband Scenario 56
    6.1.2 Wideband Scenario 58
  6.2 SM by MIMO-OFDM: SVD in the DFT Domain 58
    6.2.1 Specific System Model 58
    6.2.2 Capacity 59
  6.3 SM by MIMO-OFDM: SVD in the z-Domain 60
    6.3.1 Specific System Model 60
    6.3.2 Achievable Rate 62
  6.4 Simulations 62
    6.4.1 Method 62
    6.4.2 Results 63
  6.5 Summary 65

7 Summary 69
  7.1 Conclusions 70
  7.2 Future Work 71

A Acronyms and Notation 72

B Some Complexity Derivations 74
  B.1 Matrix-Matrix Multiplication 74

Bibliography 78


Chapter 1

Introduction

In the current day and age, access to mobile broadband through the cellular networks is ubiquitous. The demands for higher data rates are ever increasing, as people get used to having constant access to the Internet. The latest acronym in the flora of terms relating to cellular networks is LTE, standing for Long Term Evolution. This new standard promises even higher data rates than the previous "3G" standards, by employing efficient modulation schemes as well as terminals with multiple antennas [1].

With these increasing data rates, one could ponder how they are achieved. It all boils down to the efficient use of resources, in this case power and bandwidth. As the use of the resources is optimized, higher data rates can be provided to the cellular customers. When the power and bandwidth allocation gets close to the optimal point, however, how would one go about increasing the data rate even further?

An exciting field in wireless communications is that of multi-antenna systems, so-called MIMO (multiple-input multiple-output) communications. Having access to multiple antennas at either or both sides of a wireless link opens up new transmission strategies. The MIMO channel can be used to increase the rate even further, to improve the signal quality, or both at the same time. The reason that the data rate can be increased, for the same amount of available power and bandwidth, is that the MIMO channel under certain conditions provides multiple parallel spatial channels, which can be used independently. This thesis will study how one can get access to the spatial channels, and compare two different approaches for doing this.

1.1 Wireless Communications

Wireless communications is the field of communication strategies that employ radio waves for information transfer. In particular, this thesis will focus on digital wireless communications, meaning that the information being transmitted is in digital form. The system is agnostic of the meaning of the data; it is rather focused on reliably transferring the data across the channel. The typical framework for digital communications can be seen in Figure 1.1.

The first operation in the system is that of source coding. This is the act of taking the information, in whatever form, and transforming it into a form suitable for transmission in a digital communications system. It may include sampling, quantization and lossy or lossless compression of the data [2, p. 2]. The output from the source coder is a sequence of bits, or binary digits. At the receiver side, the last operation performed is undoing the source coding.



Figure 1.1: Block Diagram of a Typical Digital Communications System. (The diagram shows the chain Source Coding → Channel Coding → Modulation → Channel → Demodulation → Channel Decoding → Source Decoding.)

If the system has been designed properly, the output of the source decoder will resemble the input to the source coder.

If part of the source coder's task is to remove redundancy in the data, then the opposite holds for the channel coder. The blocks in Figure 1.1 in between the channel coder and channel decoder will almost certainly introduce some errors in the stream of transmitted symbols. This could be because the system temporarily suffers from a high noise level, or because the radio channel is bad. It is the task of the channel coder/decoder pair to recover any information lost, or at least to recognize that some information was lost. This is done by inserting redundancy into the stream of symbols to be sent. Knowing the structure of the redundancy at the receiver side, errors can be corrected, or at least detected.

The last block on the sender side is the modulator. The modulator takes the output of the channel encoder, and transforms it into a form suitable for launch onto the physical channel. This includes mapping bits to symbols, applying pulse shaping to the symbols so that a continuous waveform is obtained, and finally upconverting the signal to the carrier frequency. The waveform is sent to the RF chain of the transmitter, and then converted to RF energy in the antenna. The modulation and demodulation parts of the system can be seen in close-up in Figure 1.2.

Figure 1.2: Modulation-Channel-Demodulation Sub-system. (The diagram shows the chain Pulse Shaping → Upconversion → Channel → Downconversion → Matched Filtering → Sampling → Demodulation/Detection.)

At the receiver, the effect of the carrier frequency is removed in the downconverting step. The signal is then matched-filtered with the pulse-shaping waveform, and sampled. Based on the samples, demodulation and detection are performed, resulting in estimates of the transmitted symbols.

A digital communications system does not necessarily have clear-cut lines between the subsystems of Figure 1.1. For instance, joint channel coding and modulation may give boosts in performance, at the price of higher transceiver complexity.


1.2 Multiple Antennas and Wideband Channels

There are several ways to increase the maximum reliable data rate of a wireless link. By transmitting more symbols per unit time, that is, by increasing the bandwidth of the signal, the data rate is increased. The downside is that bandwidth in the radio spectrum is an expensive resource due to national and international regulations. Another strategy is to increase the signal-to-noise ratio (SNR) of the link. A higher SNR means a better quality of the received signal, and therefore more data can be transferred per symbol, leading to increased spectral efficiency. In this case, more power is needed at the transmitter side, which may pose a problem if the transmitter is battery powered or if there are regulations regarding the amount of acceptable transmitted power at that particular frequency.

A third way of increasing the data rate is to introduce multiple antennas at either or both sides of the link. If the channels between the various antenna elements are uncorrelated, which under certain circumstances they are, multiple parallel spatial channels arise that can be used for parallel signalling. By accessing these spatial channels, the data rate can be increased without consuming more bandwidth or power resources. Information theory shows that for high SNRs, the theoretically highest data rate of a MIMO channel with uncorrelated spatial sub-channels grows linearly with the minimum of the number of antennas at the receiver and transmitter sides [3].

For radio channels where there are multiple paths from the transmitter to the receiver, several versions of the transmitted signal will be received at different points in time. This multipath propagation leads to problems for signals with a high symbol rate, i.e. wide bandwidth. In effect, different frequencies will be attenuated differently by the radio channel; the channel is said to be frequency-selective. This wideband behaviour of the channel must be mitigated for reliable communication to take place.

1.3 Problem Formulation and Contributions

This thesis will investigate whether a transmission scheme based on a polynomial singular value decomposition of a wideband MIMO channel matrix can achieve the same data rate as a similar system where the singular value decomposition is performed in the frequency domain, at comparable or lower transmission set-up complexity. In order to do this, some algorithms for polynomial matrix decomposition will be investigated in terms of complexity and error performance, and the achievable data rate is simulated for communication system frameworks including these algorithms.

The contributions given by this thesis are:

• Modified versions of the decomposition algorithms of Foster et al. [4] are proposed and analyzed in terms of complexity and decomposition quality.

• Two rational decomposition algorithms for polynomial matrices are proposed, and their shortcomings discussed.

• A transmission scheme utilizing the modified polynomial singular value decomposition algorithm is shown to achieve good performance, in terms of achievable rate, but at a restraining computational load compared to the reference SM by MIMO-OFDM transmission scheme.


1.4 Thesis Outline

The next chapter will lay the mathematical groundwork for the investigations to come. Polynomials and matrices are introduced, as well as the conjunction of the two concepts: polynomial matrices. Finally, an analysis of algorithm run-time performance through benchmarking and theoretical analysis is presented.

The following part of the thesis, Chapter 3, will discuss the MIMO channel more deeply. The idea of frequency-selective channels is presented, and the orthogonal frequency-division multiplexing technique for mitigating the inter-symbol interference resulting from high data rates is introduced.

In Chapter 4, a set of polynomial matrix decomposition algorithms is presented. They are based on the idea of single-coefficient nulling through polynomial Givens rotations. The chapter states the algorithms, and some analytical and numerical results are presented in order to examine the properties of the algorithms.

Chapter 5 similarly analyzes some polynomial matrix decomposition algorithms, but these are instead based on the idea of polynomial nulling through polynomial IIR Givens rotations.

With the algorithms clearly defined, Chapter 6 will employ them in a communications set-up. The channel and system models are defined, and the system capacities derived. Using these expressions, simulation results are presented showing the capacity of the given systems.

Finally, conclusions will be drawn in Chapter 7. The thesis will be summarized, and keypoints will be presented.


Chapter 2

Preliminaries

This chapter is concerned with some fundamental facts and definitions that will be used a great deal in the rest of the thesis. Definitions of complex polynomials and constant matrices will be given, and then the two concepts will be joined into complex polynomial matrices. The definitions regarding polynomials, matrices and polynomial matrices are mostly taken from [5, 6], where more details are given.

Additionally, a brief discussion about algorithmic complexity will be undertaken. After this chapter, the reader will be aware of all the mathematics needed to understand the algorithms to be presented.

2.1 Complex Polynomials

A Laurent polynomial is an expression of the form

$$p(z) = C_{-V_1} z^{V_1} + \dots + C_{-1} z^{1} + C_0 + C_1 z^{-1} + \dots + C_{V_2} z^{-V_2} \qquad (2.1)$$

or more compactly

$$p(z) = \sum_{v=-V_1}^{V_2} C_v z^{-v}, \qquad V_1 > 0, \; V_2 > 0 \qquad (2.2)$$

where $z$ is some indeterminate symbol and the $C_v$ coefficients are taken from some field $\mathbb{F}$. For our purposes, we will assume that the coefficients are complex numbers, that is $\mathbb{F} = \mathbb{C}$. Hence, (2.2) is a complex Laurent polynomial. In the following, we will simply call it a polynomial. For a more in-depth discussion of fields and their properties, see [7, Ch. 7].

For $\mathbb{F} = \mathbb{C}$ it holds that every polynomial uniquely determines a function [7, p. 297]. The function is evaluated at a point $z_0$ simply by replacing the indeterminate symbol $z$ in (2.2) with $z_0$ and performing the summation. It should be noted, though, that a polynomial and the function defined by the polynomial are two distinctly different entities.
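As a concrete illustration of the coefficient convention in (2.2), the following minimal Python sketch (not part of the thesis; the function name and storage layout are my own) evaluates a Laurent polynomial stored as a plain coefficient list, with the offset $V_1$ marking where the exponent index $v$ starts:

```python
# Sketch: p(z) = sum_v C_v z^{-v}, v = -V1..V2, stored as a coefficient list.

def eval_laurent(coeffs, V1, z0):
    """coeffs[k] holds C_v for v = k - V1; returns p(z0) = sum_v C_v * z0**(-v)."""
    return sum(c * z0 ** (-(k - V1)) for k, c in enumerate(coeffs))

# p(z) = 2z + 1 + 3z^{-1}  ->  V1 = 1, coeffs = [2, 1, 3]
print(eval_laurent([2, 1, 3], 1, 2.0))  # 2*2 + 1 + 3/2 = 6.5
```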

In order to conveniently classify polynomials, some notation will now be introduced. The relation $\sim$, used as

$$p(z) \sim (V_1, V_2, \mathbb{C}) \qquad (2.3)$$

signifies that $p(z)$ has $V_1$ positive-exponent terms, $V_2$ negative-exponent terms, and coefficients in $\mathbb{C}$. The space of all Laurent polynomials with complex coefficients will be denoted $\mathcal{C}$. The space of all polynomials $p(z) \sim (V_1, V_2, \mathbb{C})$ will be $\mathbb{C}^{1\times 1\times (V_1+V_2+1)}$. Furthermore, the maximum degree of $p(z)$ is $V_1 + V_2$.


It is clear that for $V_1' > V_1$, $V_2' > V_2$,

$$p(z) \sim (V_1, V_2, \mathbb{C}) \implies p(z) \sim (V_1', V_2', \mathbb{C}). \qquad (2.4)$$

This can be shown to hold by setting the added outer coefficients to zero. The reason for our interest in polynomials is their relation to Linear Time-Invariant (LTI) systems. Chapter 3 will explore this relation further.

2.1.1 Addition and Subtraction

Given two polynomials

$$a(z) \sim (V_1, V_2, \mathbb{C}), \qquad b(z) \sim (U_1, U_2, \mathbb{C}) \qquad (2.5)$$

the variables

$$M_1 = \max(V_1, U_1), \qquad M_2 = \max(V_2, U_2) \qquad (2.6)$$

can be defined. Assuming that $a(z), b(z)$ have coefficient sequences

$$\{C_{a,i}\}_{i=-V_1}^{V_2}, \qquad \{C_{b,i}\}_{i=-U_1}^{U_2} \qquad (2.7)$$

the sum of the polynomials is defined as

$$(a+b)(z) = a(z) + b(z) = \sum_{v=-M_1}^{M_2} (C_{a,v} + C_{b,v})\, z^{-v} \qquad (2.8)$$

with

$$C_{a,v} = 0 \;\; \forall v \notin [-V_1, V_2], \qquad C_{b,v} = 0 \;\; \forall v \notin [-U_1, U_2]. \qquad (2.9)$$

Subtraction is similarly defined, but replacing $C_{b,v} \to (-C_{b,v})$ for all $v$.
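The addition rule (2.8)–(2.9) amounts to aligning the two coefficient sequences and treating missing exponents as zero. A small Python sketch (a hypothetical helper, not from the thesis), storing each polynomial as a dictionary keyed by the exponent index $v$ of $z^{-v}$:

```python
# Sketch: adding two Laurent polynomials per (2.8)-(2.9); coefficients are
# stored as dicts {v: C_v}, so exponents absent from a dict are implicitly zero.

def poly_add(a, b):
    """Coefficient-wise sum over the union of the two exponent ranges."""
    return {v: a.get(v, 0) + b.get(v, 0) for v in set(a) | set(b)}

# a(z) = z + 2 + z^{-1},  b(z) = 3 - z^{-2}   (key v is the exponent of z^{-v})
a = {-1: 1, 0: 2, 1: 1}
b = {0: 3, 2: -1}
print(sorted(poly_add(a, b).items()))  # [(-1, 1), (0, 5), (1, 1), (2, -1)]
```

Subtraction follows by negating `b`'s coefficients before the same merge.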

2.1.2 Multiplication

Multiplication of two polynomials is the convolution of the two coefficient sequences. Given the polynomials in (2.5), the product is written as

$$c(z) = a(z)b(z) = \left(\sum_{v=-V_1}^{V_2} C_{a,v} z^{-v}\right)\left(\sum_{u=-U_1}^{U_2} C_{b,u} z^{-u}\right) = \sum_{v=-V_1}^{V_2} \sum_{u=-U_1}^{U_2} C_{a,v} C_{b,u}\, z^{-(v+u)}. \qquad (2.10)$$

In particular, the coefficient associated with $z^{-r}$ in the product will be given by

$$C_{c,r} = \sum_{\substack{u,v \\ u+v=r}} C_{a,v} C_{b,u} = \sum_{v=-V_1}^{V_2} C_{a,v} C_{b,r-v}. \qquad (2.11)$$

Defining

$$C_{a,v} = 0 \;\; \forall v \notin [-V_1, V_2], \qquad C_{b,r-v} = 0 \;\; \forall (r-v) \notin [-U_1, U_2] \qquad (2.12)$$

the sum in (2.11) can be written as an infinite sum

$$C_{c,r} = \sum_{v=-\infty}^{\infty} C_{a,v} C_{b,r-v} \qquad (2.13)$$

which can be identified as the convolution sum.

Let $d_1, d_2$ be the maximum degrees of the polynomials $a(z), b(z)$. Then by zero-padding the two coefficient vectors to length $d_1 + d_2 + 1$, the length of the product's coefficient sequence, the convolution can be evaluated efficiently using the convolution theorem [8, p. 191]. That is,

$$c(z) = \mathcal{F}_d^{-1}\left(\mathcal{F}_d(a(z))\, \mathcal{F}_d(b(z))\right) \qquad (2.14)$$

where the product is element-wise and the transforms are understood to operate on the (zero-padded) coefficient vectors of the polynomials.
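Both routes to the product coefficients can be checked numerically. The sketch below (illustrative only; NumPy is assumed available) computes the convolution sum (2.13) directly and via the FFT shortcut (2.14), on plain coefficient vectors:

```python
# Sketch: polynomial multiplication as coefficient convolution (2.13), and the
# same product via the convolution theorem (2.14) using the FFT.
import numpy as np

a = np.array([1.0, 2.0, 3.0])   # coefficient vector of a(z)
b = np.array([4.0, 5.0])        # coefficient vector of b(z)

direct = np.convolve(a, b)      # the convolution sum

n = len(a) + len(b) - 1         # number of coefficients in the product
via_fft = np.fft.ifft(np.fft.fft(a, n) * np.fft.fft(b, n)).real

print(direct)                        # [ 4. 13. 22. 15.]
print(np.allclose(direct, via_fft))  # True
```

For long coefficient sequences the FFT route costs $O(n \log n)$ operations instead of the $O(n^2)$ of the direct convolution, which is why it is the preferred implementation.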

2.2 Polynomial Matrices

A polynomial matrix $A(z)$ is a matrix whose elements are polynomials, or equivalently, a polynomial whose coefficients are matrix-valued [6, p. 24]. An arbitrary polynomial matrix

$$A(z) = \sum_{v=-V_1}^{V_2} A_v z^{-v} \qquad (2.15)$$

belongs to the space $\mathcal{C}^{p\times q}$ if $A_v \in \mathbb{C}^{p\times q}$ for all $v$. For the given $V_1, V_2$, we can also write $A_v \in \mathbb{C}^{p\times q\times(V_1+V_2+1)}$.

The transpose $A^T(z)$, conjugate $A^*(z)$ and Hermitian conjugate $A^H(z) = \left(A^T(z)\right)^*$ of a polynomial matrix are obtained by applying the respective operation to each of the coefficient matrices. In addition, $A^H(z^{-*})$ will be termed the para-Hermitian conjugate of $A(z)$. A polynomial matrix which satisfies $A^H(z^{-*})A(z) = I$ is called a paraunitary matrix [4]. This type of matrix will play an important part in the algorithms to be developed in Chapter 4, as its columns are mutually orthogonal over all frequencies. Due to this orthogonality, multiplying an arbitrary matrix by a paraunitary matrix preserves the Frobenius norm of the original matrix.

The Frobenius norm of the polynomial matrix in (2.15) is defined as

$$\|A(z)\|_F = \sqrt{\sum_{i=1}^{p} \sum_{j=1}^{q} \sum_{v=-V_1}^{V_2} \left|[A_v]_{ij}\right|^2} \qquad (2.16)$$

where $[\cdot]_{ij}$ denotes the $(i,j)$ component of the matrix.

In the following, the terms matrix and polynomial matrix will be used interchangeably, with the understanding that an ordinary matrix is just a polynomial matrix of maximum degree 0 with a single coefficient matrix.
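The norm (2.16) is simply the square root of the summed squared magnitudes over all entries of all coefficient matrices. A brief sketch (my own storage choice, not the thesis's: the coefficient matrices $A_v$ are stacked along the third axis of a NumPy array, following the $\mathbb{C}^{p\times q\times(V_1+V_2+1)}$ notation above):

```python
# Sketch: Frobenius norm (2.16) of a polynomial matrix stored as a
# p x q x (V1+V2+1) array whose slices A[:, :, k] are the coefficient matrices.
import numpy as np

def pm_frobenius(A):
    """sqrt of the sum of |.|^2 over all entries and all coefficient matrices."""
    return np.sqrt(np.sum(np.abs(A) ** 2))

# Two-coefficient 2x2 example: A(z) = A0 + A1 z^{-1}
A = np.zeros((2, 2, 2))
A[:, :, 0] = [[1, 0], [0, 1]]   # A0
A[:, :, 1] = [[0, 2], [0, 0]]   # A1
print(pm_frobenius(A))           # sqrt(1 + 1 + 4) = sqrt(6) ≈ 2.449
```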

2.2.1 Givens Rotations

A constant Givens rotation is a unitary transformation which zeroes a specific component of a vector. For the $2\times 2$ case, the constant complex Givens rotation is defined as

$$G = \begin{pmatrix} c\,e^{j\alpha} & s\,e^{j\phi} \\ -s\,e^{-j\phi} & c\,e^{-j\alpha} \end{pmatrix}. \qquad (2.17)$$

Applying G to a vector x ∈ C^{2×1} one obtains

Gx = G (x1, x2)^T = ( c e^{jα} x1 + s e^{jφ} x2,  −s e^{−jφ} x1 + c e^{−jα} x2 )^T

and by selecting

θ = tan^{−1}(|x2|/|x1|)  if |x1| ≠ 0
θ = π/2                  if |x1| = 0

c = cos θ,  s = sin θ,  α = −arg(x1),  φ = −arg(x2)

it can be shown that [Gx]_2 = 0.

Additionally, since G is unitary, the squared magnitude of the other component will be

|[Gx]_1|² = |x1|² + |x2|²

which will be referred to as the energy moving property of the Givens rotation. Intuitively, the application of the Givens rotation moves the energy of component 2 to component 1, so that component 2 becomes zero.

The Givens rotation in (2.17) can be extended so that it zeroes a specific component of a p × 1 vector or p × q matrix. For the matrix case, if element (i, j) is to be zeroed and element (i, i) is to receive the energy, the Givens rotation takes the form of a p × p identity matrix with the elements at the intersections of rows i, j and columns i, j replaced by the elements of (2.17).
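The construction above can be sketched as follows; the helper name and the use of NumPy are our own choices, not from the thesis:

```python
import numpy as np

def givens_2x2(x1, x2):
    """Build the complex Givens rotation (2.17) with parameters chosen as
    in the text, so that the second component of (x1, x2)^T is zeroed."""
    theta = np.arctan2(np.abs(x2), np.abs(x1))   # gives pi/2 when |x1| = 0
    c, s = np.cos(theta), np.sin(theta)
    alpha, phi = -np.angle(x1), -np.angle(x2)
    return np.array([[c * np.exp(1j * alpha), s * np.exp(1j * phi)],
                     [-s * np.exp(-1j * phi), c * np.exp(-1j * alpha)]])

x = np.array([1 + 2j, 3 - 1j])
G = givens_2x2(x[0], x[1])
y = G @ x
# [Gx]_2 is zero and |[Gx]_1|^2 = |x1|^2 + |x2|^2 (energy moving property)
```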

For the polynomial matrix case, a polynomial Givens rotation (PGR) can be defined. What will be referred to as a PGR in this thesis is the elementary polynomial Givens rotation of [4], with an elementary delay matrix prepended. The polynomial analogue of (2.17) is therefore

G(z) = [ 1    0       ] [  c e^{jα}    s e^{jφ}  ] [ 1    0     ]
       [ 0    z^{−t}  ] [ −s e^{−jφ}   c e^{−jα} ] [ 0    z^{t} ]

     = [  c e^{jα}            s e^{jφ} z^{t}
         −s e^{−jφ} z^{−t}    c e^{−jα}        ]    (2.18)

which when applied to x(z) = (x1(z)  x2(z))^T will zero the coefficient associated with z^{−t} in x2(z) and move the energy to the constant coefficient of x1(z), if the parameters are chosen such that

θ = tan^{−1}(|x2(t)|/|x1(0)|)  if |x1(0)| ≠ 0
θ = π/2                        if |x1(0)| = 0

c = cos θ,  s = sin θ,  α = −arg(x1(0)),  φ = −arg(x2(t))    (2.19)

where xi(j) denotes the coefficient associated with z^{−j} for element xi. In the process, all other coefficients of x1(z) and x2(z) will be affected. It can be shown that G^H(z^{−∗})G(z) = I, and (2.18) is therefore a paraunitary operation. The paraunitarity implies that the operation is norm preserving, and in particular that the energy moving property still holds.


The polynomial Givens rotation can easily be extended to the p × q case, in the same manner as for the constant Givens rotation. By construction, the p × p polynomial Givens rotation will also be paraunitary. For every application of a polynomial Givens rotation, the degree of the polynomial matrix it is applied to grows by 2|t|. This inherent property of the PGR will lead to a need for a truncation step in the algorithms to be formed, as further explained in Chapter 4.

The application of a PGR only affects two rows of the matrix it is applied to. The complexity of applying it to a p × q matrix with r lags is therefore

C_PGR = 2q(r + 2|t|).    (2.20)

2.2.2 Decompositions

The (full) QR decomposition of a constant matrix A ∈ C^{p×q} is

A = QR    (2.21)

where Q ∈ C^{p×p} is a unitary constant matrix and R ∈ C^{p×q} is an upper triangular constant matrix [5, p. 112]. The constant QR decomposition can be calculated in a variety of ways: through Givens rotations, Householder reflections or via the Gram-Schmidt orthogonalization procedure [5, pp. 114-117]. An approximate polynomial QR decomposition of a polynomial matrix A(z) ∈ C^{p×q} is

A(z) = Q(z)R(z)    (2.22)

where Q(z) ∈ C^{p×p} is an approximately paraunitary polynomial matrix and R(z) ∈ C^{p×q} is an approximately upper triangular polynomial matrix [4]. Intuitively, this can be seen as a constant QRD taken over all frequencies.

For an arbitrary constant matrix A ∈ C^{p×q}, the singular value decomposition (SVD) is

A = UDV^H    (2.23)

where U ∈ C^{p×p}, V ∈ C^{q×q} are unitary matrices and D ∈ R^{p×q} is a diagonal matrix [5, p. 414]. The diagonal entries of D are called the singular values of the matrix A. The columns of U and V are called the left and right singular vectors of A, respectively. An efficient implementation for calculating the SVD of a constant matrix can be found in [9, p. 448].

Similarly, an approximate singular value decomposition can be obtained for polynomial matrices. Given A(z) ∈ C^{p×q}, a PSVD of A(z) is

A(z) = U(z)D(z)V^H(z^{−∗})    (2.24)

where U(z) ∈ C^{p×p}, V(z) ∈ C^{q×q} are approximately paraunitary matrices and D(z) ∈ C^{p×q} is an approximately diagonal matrix [4] whose diagonal elements are called the singular values of A(z). As in the QRD case, the intuition is that the polynomial matrix A(z) is decomposed into its SVD over all frequencies. Note that this definition assumes no ordering of the singular values, in contrast to the ordinary SVD [5, p. 414].

2.2.3 Coefficient Truncation

During certain steps of the algorithms to be investigated in Chapter 4, the maximum degrees of the polynomials involved grow quickly. As will be seen, often most of the energy of the


filter coefficients will be concentrated around the constant coefficient. In order to keep the maximum degrees of the polynomials involved down, a truncation step is utilized. Thereby, the storage requirements for the polynomial matrices are reduced, as well as the computational load involved when applying the decomposition factors as filters. The truncation step is defined in [4], but is restated in Algorithm 1 for clarity.

Algorithm 1 Polynomial Matrix Truncation

1: Input polynomial matrix A(z) ∼ (V1, V2, C^{p×q}) and truncation parameter μ.
2: Find the maximum value of T1 such that (1/‖A(z)‖²) Σ_{v=V1}^{T1} Σ_{l=1}^{p} Σ_{m=1}^{q} |a_lm(v)|² ≤ μ/2 holds.
3: Find the minimum value of T2 such that (1/‖A(z)‖²) Σ_{v=T2}^{V2} Σ_{l=1}^{p} Σ_{m=1}^{q} |a_lm(v)|² ≤ μ/2 holds.
4: Return A_trunc(z) = Σ_{v=T1}^{T2} A_v z^{−v}.

The parameter μ defines the proportion of the total energy of the filter to be truncated from the matrix. It can be shown that the complexity of the naive implementation of this algorithm is O(pqr²). The complexity can be decreased by a binary search algorithm.
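A sketch of Algorithm 1 in Python, under the assumption that the polynomial matrix is stored as a 3-D array with lags running from V1 to V2 along the first axis (variable names are ours):

```python
import numpy as np

def truncate_poly_matrix(A, V1, mu):
    """Sketch of Algorithm 1: A[v] holds the coefficient matrix for lag
    V1 + v, so lags run from V1 to V2 = V1 + A.shape[0] - 1. Lags are
    trimmed from each end while that tail holds at most a fraction mu/2
    of the total energy."""
    energy = np.sum(np.abs(A) ** 2, axis=(1, 2))   # energy per lag
    total = energy.sum()
    lo = 0                                         # leading lags to drop
    while lo < len(energy) and energy[:lo + 1].sum() <= (mu / 2) * total:
        lo += 1
    hi = len(energy)                               # one past last lag kept
    while hi > lo and energy[hi - 1:].sum() <= (mu / 2) * total:
        hi -= 1
    return A[lo:hi], V1 + lo

A = np.array([1e-4, 0.1, 5.0, 0.1, 1e-4]).reshape(5, 1, 1)  # 1x1 example
At, V1_new = truncate_poly_matrix(A, V1=-2, mu=0.01)
print(At.shape[0], V1_new)   # only the dominant lag around z^0 survives
```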

2.3 Computational Complexity

When studying algorithms from a performance perspective, it is interesting to analyze their running times. This may be done either through benchmarking or theoretical analysis [10, p. 91]. In this thesis, both approaches will be used. Benchmarking is straightforward, as it only involves running the algorithm and measuring some quantity related to the performance. By running the algorithm for different sets of input data, experimental relations are obtained relating the performance to the properties of the input. The downside of benchmarking is that the algorithm must be implemented first, and that one cannot be sure that the results generalize.

Theoretical analysis, on the other hand, typically only gives upper bounds on the performance. This is particularly the case for algorithms containing loops, where the number of iterations of the loops is not obviously defined in terms of the input data size. In order to deal with bounds in a convenient manner, Ordo, or Big-Oh, notation is introduced. Assume that T(n) describes the run time of an algorithm with input data size n. Then we say that T(n) = O(f(n)) if it holds that

T(n) ≤ c f(n)

for some constant c > 0 and all integers n > n0. Ordo notation is a very convenient tool for describing complexity expressions, because of the two following rules as given by [10, p. 98]:

1. Constant factors don't matter: For any constant d, O(df(n)) = O(f(n)), since the constant can be absorbed into the hidden constant c of the Ordo notation.

2. Low-order terms don't matter: For the polynomial case, only the term with the largest degree needs to be kept, since it dominates all other terms for large n.


Chapter 3

MIMO Channels and Multipath Propagation

3.1 Propagation and Modeling

Wireless communication uses electromagnetic waves in the radio spectrum for information transfer. Electromagnetic waves are completely characterized by Maxwell's equations, a set of coupled partial differential equations. The complicated structure of the equations makes them hard to solve as well as hard to analyze analytically. Radio engineers therefore tend to use simpler models, where certain aspects of the radio wave propagation can more easily be analyzed.

3.1.1 Propagation

In general, the received radio signal is modeled as an attenuated and modified version of the transmitted signal, with some noise added. The attenuation can be classified into three time-scale dependent behaviours: path loss, shadowing and fading [3]. Path loss occurs solely because of the distance between the transmitter and the receiver, and can be modeled as a factor 1/r^d, where d usually is in the range [2, 4] [3]. At best, which is the case for an isotropic transmit antenna with fixed transmit power in empty space, the exponent takes the value 2. This is because, in this scenario, the generated radio waves propagate spherically from the antenna, and the amplitude at distance r decreases as 1/r. The power therefore decreases as 1/r². Due to ground reflections and other phenomena, the exponent can however be larger. The path loss is the most slowly varying of the three attenuation factors mentioned.

Shadowing takes place on a shorter time scale, and is typically incurred by objects blocking the radio waves. In an urban environment, this could be due to cars moving around and temporarily changing the propagation paths. Shadowing is hard to model, but one common model is the log-normal distribution [3].

The most quickly evolving attenuation phenomenon is fading. This occurs due to radio waves from different paths adding up constructively or destructively at the receiver. This interference effect changes on the order of half a wavelength, and is therefore sensitive to small changes in the propagation environment. Assuming that there is no Line-of-Sight (LOS) component, the fading is called Rayleigh fading. The name arises from the fact that the


channel coefficient can be modeled, through the central limit theorem, as a complex Gaussian random variable, whose magnitude is described by the Rayleigh distribution. If there is a LOS component, the magnitude of the channel gain is instead modeled by a Rice distribution, and the phenomenon is called Rician fading.

A subject that we have touched upon, but not discussed directly, is multipath propagation. Depending on the environment, a transmitted radio wave may take several different paths to the receiver. An example can be seen in Figure 3.1. Depending on the path lengths, and the associated reflections, several delayed versions of the same signal will reach the receiver. The delay spread of a channel is a measure of the spread between the first and the last multipath component to arrive at the receiver. For delay spreads that are short compared to the transmit symbol period, the channel is said to exhibit narrowband fading, for which the Rayleigh and Rice models work well. On the other hand, for channels with large delay spreads, other models must be used. The coherence bandwidth B_c of a channel is the width over which the channel can be assumed constant in the frequency domain, and is approximately inversely proportional to the delay spread.

Figure 3.1: Example of Multipath Propagation

For channels where the delay spread is larger than the symbol period, an effect called inter-symbol interference (ISI) takes place. Since the symbols are transmitted at a high rate, the channel impulse response will not have enough time to "die off". Instead, subsequent symbols will be overlaid at the receiver, and therefore interfere with each other. Such channels are also called frequency selective channels, since different frequency components of the transmitted signal are attenuated by different factors.


3.1.2 Channel Modeling

In this section, the channel is assumed to be well described by a single-input single-output (SISO) linear system. A linear system is characterized by its impulse response at time t, h(t; τ). For a transmitted signal s(t), the received signal r(t) is modeled as a filtered version of s(t) with some noise n(t) added, such that

r(t) = ∫_{−∞}^{∞} h(t; τ) s(t − τ) dτ + n(t).

In the following, we will assume that the channel is time-invariant over the transmission of a block of data, so that the channel impulse response can be replaced by h(τ) and

r(t) = ∫_{−∞}^{∞} h(τ) s(t − τ) dτ + n(t).    (3.1)

The output of the channel can be thought of as a weighted sum of the input signal s(t) with weighting factor h(τ). A graphical representation of a SISO channel can be seen in Figure 3.2.

Figure 3.2: Single-input single-output Channel

Radio transmissions always take place at some carrier frequency, which is modulated for data transfer. The carrier frequency itself does not carry any information, and it is therefore convenient to remove the effect of the carrier in any analysis of the signal. In the example above, all signals involved are assumed to be real, since they relate to physical processes. Let S(f) be the Fourier transform of s(t), band-limited to [fc − W/2, fc + W/2], where fc is the carrier frequency and W is the signal bandwidth. For consistency, W < 2fc. The complex baseband equivalent version of the signal is then defined through its Fourier transform

S_b(f) = √2 S(f + fc)  if f + fc > 0
S_b(f) = 0             if f + fc ≤ 0.    (3.2)

Since s(t) is real, its Fourier transform is symmetric around the origin. The transformation (3.2) therefore effectively moves the positive spectrum down from the carrier frequency to baseband, and scales it so that the total power remains constant. The baseband equivalent spectrum is no longer symmetric around the origin, and the signal s_b(t) defined by S_b(f) is therefore complex.

Note that the original signal s(t) can easily be recovered from s_b(t) since

S(f) = (1/√2) ( S_b(f − fc) + S_b^∗(−f − fc) )

and by taking inverse Fourier transforms

s(t) = (1/√2) ( s_b(t) e^{j2πfct} + s_b^∗(t) e^{−j2πfct} ) = √2 Real( s_b(t) e^{j2πfct} ).    (3.3)


The complex baseband equivalent signal is more convenient to work with than the passband signal, and since the transformation is invertible, no information is lost in the conversion process.

Finally, in order to work with the signals in a computer, a discrete-time model is needed. Sampling (3.1) faster than the Nyquist rate, it can be shown that a discrete-time model takes the form

r[m] = Σ_{l=−∞}^{∞} h[l] s[m − l] + n[m]    (3.4)

where h[l] is determined from the transmit, channel, and receive filters in place. For the full derivations, see e.g. [11, p. 49].
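The effect of (3.4) is easy to visualize with an assumed short example channel; each received sample then mixes several consecutive symbols, which is exactly the ISI discussed earlier:

```python
import numpy as np

h = np.array([1.0, 0.5, 0.2])       # assumed 3-tap impulse response h[l]
s = np.array([1, -1, 1, 1, -1.0])   # transmitted symbols
r = np.convolve(h, s)               # noiseless received samples, per (3.4)
print(r)   # each sample mixes up to three consecutive symbols: this is ISI
```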

3.1.3 MIMO Channels

A multiple-input multiple-out (MIMO) channel is a channel with several transmit and/orreceive antennas. It is described by a matrix, since every transmit-receive antenna pair hasa channel associated with it. These single-input single-output (SISO) sub-channels may becorrelated. A graphical representation of a MIMO channel can be seen in Figure 3.3. Thenumber of antennas used in MIMO communications systems are typically on the order of 2-4antennas on either or both sides. For example, the IEEE 802.11n WiFi standard allows forup to 4 antennas on both sides [12].


Figure 3.3: Multiple-input multiple-output Channel

Following the same derivations as in the previous section, it can be shown that the discrete-time complex baseband equivalent system for a MIMO channel with Mr receive antennas and Mt transmit antennas takes the form

r[m] = Σ_{l=−∞}^{∞} H[l] s[m − l] + n[m]    (3.5)


where r[m], n[m] ∈ C^{Mr×1}, H[l] ∈ C^{Mr×Mt} and s[m] ∈ C^{Mt×1}. For the narrowband case, (3.5) simplifies to

r[m] = H s[m] + n[m].

The matrix channel H[m] describes how the transmitted signal is mixed in space and time, as sampled at the receiver. Row i of H[m] determines the weighting factors for the received signal at receive antenna i at time lag m. Similarly, column j of H[m] describes the spatial signature of the signal from transmit antenna j at time lag m.

Taking the z-transform of both sides of (3.5), and using the convolution theorem for the z-transform [8, p. 191], the relation is described by the polynomial matrix equation

r(z) = H(z)s(z) + n(z),    (3.6)

a fact we will rely on heavily in the following.

3.2 Channel Capacity and Achievable Rate

In his seminal paper "A Mathematical Theory of Communication" [13], Claude E. Shannon introduced the concept of channel capacity, and derived it for the additive white Gaussian noise (AWGN) SISO channel with an average power constraint. The channel capacity, if known for the given channel model, is the highest possible rate that can be used for communication over the channel with vanishing error probability. That is, for any rate below the channel capacity, there exists a code, with long enough codewords, that achieves that rate with arbitrarily small error probability.

Shannon's results provide a benchmark against which any practical transmission scheme can be compared. The proof of the channel coding theorem is not constructive, however, and it is not until recently, with the introduction of turbo codes and low density parity check (LDPC) codes, that practical systems have come close to the channel capacity [2, p. 252].

In [13], Shannon derived the channel capacity for the deterministic AWGN channel with an average power constraint. With system bandwidth W, noise variance σ_n² and average power less than P, the well-known formula for the capacity is

S_AWGN = W log(1 + P/σ_n²).

A similar result, for the channel capacity of the deterministic narrowband AWGN MIMO channel with an average sum power constraint, was derived in [14]. The capacity of the channel, with spatially and temporally white Gaussian noise with variance σ_n², is given by

S_MIMO-AWGN = max_P log |I + (1/σ_n²) H P H^H|

where P = E(s s^H) is the covariance of the transmit symbol s. The rate R < S_MIMO-AWGN,

R = log |I + (1/σ_n²) H P H^H|,    (3.7)

is called an achievable rate, since for a sub-optimal choice of P it is less than the channel capacity. The noisy channel coding theorem of [13] therefore gives that such a rate R is achievable, since R < S and a code with rate S exists. For our purposes, (3.7) will simply serve as a mapping from SNR to data rate, which will be useful for comparison of transmission schemes in Chapter 6.
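Equation (3.7) is straightforward to evaluate numerically; the sketch below assumes an example 2 × 2 channel and an identity transmit covariance, both illustrative choices rather than anything prescribed by the thesis:

```python
import numpy as np

def achievable_rate(H, P, noise_var):
    """Achievable rate (3.7) in bits per channel use:
    log2 det(I + H P H^H / sigma_n^2)."""
    Mr = H.shape[0]
    M = np.eye(Mr) + (H @ P @ H.conj().T) / noise_var
    return float(np.real(np.linalg.slogdet(M)[1]) / np.log(2))

rng = np.random.default_rng(2)
H = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
P = np.eye(2)   # identity transmit covariance (an illustrative choice)
print(achievable_rate(H, P, noise_var=0.1))
```

Using `slogdet` rather than `det` avoids overflow for large matrices; here the base-2 logarithm gives the rate in bits.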


3.3 Equalization Techniques

For a wideband channel, multipath propagation is inherent. For high symbol rates, this results in ISI. What is meant by a high symbol rate is not clear cut, but the rule of thumb is that ISI must be mitigated when W ≫ B_c.

In order to do symbol-by-symbol detection, the receiver needs to remove the ISI. This operation is called channel equalization, and there are many different strategies for doing it. The equalizer needs information about how the channel behaves, typically the channel impulse response. This is obtained in a training phase, where the receiver can identify the channel from a sequence of known symbols. The optimal equalization, in the sense of minimizing the probability that a symbol in the sequence is detected incorrectly, is called Maximum Likelihood Sequence Estimation (MLSE), and is implemented in practice using the Viterbi algorithm [15, p. 88]. A downside of MLSE is that the algorithm complexity grows exponentially in the number of channel taps [15, p. 90]. At the other end of the complexity scale is the family of low-complexity linear equalizers. A common linear equalizer, the zero-forcing equalizer, removes the effect of ISI completely but suffers from noise amplification. Another equalizer, the MMSE receiver, makes an optimal trade-off between removing ISI and amplifying the noise, within the class of linear receivers.

Orthogonal Frequency-Division Multiplexing

For channels with high spectral dynamics, that is, small coherence bandwidth B_c, there is another viable alternative. Orthogonal frequency-division multiplexing (OFDM) leverages the Fast Fourier Transform (FFT) and can handle quasi-static wideband channels well [15, p. 99]. In fact, OFDM can achieve the channel capacity as the number of sub-carriers grows large. With OFDM, signaling is performed in the frequency domain. In order to launch the signal onto the channel, it is transformed to the time domain using an inverse FFT (IFFT). At the receiver, the received signal is transformed back to the frequency domain, where detection takes place. Effectively, OFDM transforms the wideband channel into a set of parallel independent channels in the frequency domain.

For the SISO case, assume that a sequence of N bits b(n) ∈ {0, 1} is to be transmitted. Through some mapping from b(n), a frequency-domain vector s′ ∈ C^{N×1} is created. The frequency-domain vector is transformed to the time domain through the application of the DFT matrix F ∈ C^{N×N} defined by

F_ij = (1/√N) e^{−j2π(i−1)(j−1)/N}.    (3.8)

The time-domain representation then takes the form

s″ = F^H s′.    (3.9)

In order to turn the cyclic convolution of the DFT into a linear convolution, a cyclic prefix is prepended, so that the signal

s = ( s″[N − (L − 1)] … s″[N − 1]  s″[0]  s″[1] … s″[N − 1] )^T    (3.10)

is transmitted over the channel. Because of the cyclic prefix, the output of the channel will be a cyclic convolution of the input signal, plus noise:

r[m] = Σ_{l=0}^{L−1} h_l s″[(m − L − l) mod N] + w[m].    (3.11)


At the receiver, stripping off the cyclic prefix, the vector

r″ = ( r[0]  r[1] … r[N − 1] )^T    (3.12)

is formed. Transforming into the frequency domain, the received frequency-domain vector

r′ = F r″    (3.13)

is obtained, which is then detected component by component. It can be shown that, thanks to the cyclic prefix, the channel is rendered circulant, and the FFT/IFFT pair diagonalizes circulant matrices. The frequency-domain model for the communications process is therefore

r′ = Ω s′ + w    (3.14)

where Ω is a diagonal matrix with the channel gains of the different frequency bins on the diagonal. The noise keeps its characteristics due to the unitarity of the FFT transform.
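The diagonalization claim behind (3.14) can be checked numerically: building the circulant matrix that the cyclic prefix effectively creates and conjugating it with the unitary DFT matrix yields a diagonal Ω whose entries are the channel frequency response (the channel taps below are assumed for illustration):

```python
import numpy as np

N = 8
h = np.array([1.0, 0.5, 0.2])            # assumed example channel taps
C = np.zeros((N, N))                     # circulant channel matrix created
for l, hl in enumerate(h):               # by the cyclic prefix
    C += hl * np.roll(np.eye(N), l, axis=0)

F = np.fft.fft(np.eye(N)) / np.sqrt(N)   # unitary DFT matrix
Omega = F @ C @ F.conj().T               # (3.14): diagonal in frequency
off_diag = np.linalg.norm(Omega - np.diag(np.diag(Omega)))
print(off_diag)                          # numerically zero
print(np.allclose(np.diag(Omega), np.fft.fft(h, N)))  # gains = freq. response
```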

With the channel diagonalized over the frequency band, the transmitter is free to select different transmit powers for the different frequency bins. The optimal way of doing this, in the sense of achievable rate, is the waterfilling technique, as further described in [11, p. 68]. The waterfilling strategy is also applied in Chapter 6, where a wideband MIMO channel is diagonalized in frequency through OFDM.
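A sketch of the waterfilling allocation over parallel channels, here solved by bisection on the water level (the gains and power budget are illustrative; see [11, p. 68] for the underlying derivation):

```python
import numpy as np

def waterfill(gains, noise_var, P_total):
    """Allocate p_k = max(0, mu - noise_var/g_k) over parallel channels,
    with the water level mu found by bisection so the powers sum to P_total."""
    floor = noise_var / np.asarray(gains, dtype=float)  # per-bin noise floor
    lo, hi = floor.min(), floor.max() + P_total
    for _ in range(100):                                # bisect the water level
        mu = (lo + hi) / 2
        if np.maximum(mu - floor, 0).sum() > P_total:
            hi = mu
        else:
            lo = mu
    return np.maximum((lo + hi) / 2 - floor, 0)

p = waterfill(gains=[1.0, 0.5, 0.1], noise_var=0.1, P_total=1.0)
print(p, p.sum())   # stronger bins get more power; total meets the budget
```

Bins whose noise floor lies above the water level receive zero power, which is why weak sub-carriers may be switched off entirely.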

3.4 Summary

This chapter provided an introduction to channel modeling in wireless communications, as well as the concepts of channel capacity and achievable rate. Furthermore, multipath propagation and the sometimes ensuing inter-symbol interference were introduced. Some strategies for combating ISI were discussed, and the OFDM technique was described in detail.

In Chapter 6, most of the ideas of this chapter will be revisited. In particular, OFDM will be used to mitigate ISI in the wideband MIMO scenario considered there.


Chapter 4

Polynomial Decomposition Algorithms: Coefficient Nulling

This chapter will investigate a number of polynomial decomposition algorithms based on the idea of iteratively nulling single coefficients. Polynomial Givens rotations (PGRs), as defined in Section 2.2.1, are employed for the coefficient nulling. Through the application of consecutive PGRs to a matrix, decompositions such as the PQRD and PSVD can be found. Four algorithms will be studied, of which two were proposed by Foster et al. in [4]. The remaining two are slightly modified versions of the original algorithms.

The decompositions generated by the algorithms will be approximations because, as shown in [16], an exact FIR decomposition of an FIR matrix is impossible to achieve. In this thesis, approximate polynomial decomposition algorithms will be used for the channel diagonalization problem of spatial multiplexing in wireless communications. This topic is further studied in Chapter 6.

In [17], McWhirter proposes a different procedure for obtaining a polynomial singular value decomposition. There, a sequential best rotation algorithm is introduced using generalized Kogbetliantz transformations. The algorithm, which is not studied in this thesis, is shown to perform better than previous sequential best rotation procedures.

The first section of this chapter describes the performance measures employed in the study of the algorithms. With these defined, the following sections investigate one algorithm at a time, with respect to function, convergence and complexity.

4.1 Performance Criteria

In order to measure the performance of the algorithms, a number of performance criteria will be defined. For the run time measurements, or algorithm complexity, the number of iterations of the innermost loop (coefficient steps, see Section 4.2) needed until convergence will be taken as the performance measure. The complexity in terms of floating point operations is then obtained simply by plugging in the complexity of a single coefficient step in terms of floating point operations, as given by equation (2.20).

Before introducing the other performance criteria, the following optimization problems


are posed:

minimize   ‖A(z) − Q(z)R(z)‖_F          minimize   ‖A(z) − U(z)D(z)V^H(z^{−∗})‖_F
subject to Q^H(z^{−∗})Q(z) = I          subject to U^H(z^{−∗})U(z) = I
           ‖R_lower(z)‖_F = 0                      V^H(z^{−∗})V(z) = I
                                                   ‖D_non-diag(z)‖_F = 0.

Casually speaking, the PQRD and PSVD algorithms under study can be thought of as approximately solving these respective optimization problems. The resulting matrices Q(z), R(z) or U(z), D(z), V(z) would, however, neither be feasible nor minimize the cost function. Rather, the algorithms output matrices that are "close" to fulfilling the constraints, while having a "small" associated cost. With this loose argument in mind, error criteria will be defined that determine how "close" a set of matrices is to fulfilling the constraints, and how "small" the associated cost is.

How well the product of the decomposition matrices describes the decomposed matrix is measured by the decomposition error

E_d^QRD = ‖A(z) − Q(z)R(z)‖_F / ‖A(z)‖_F    (4.1)

for a PQRD A(z) = Q(z)R(z). For the same PQRD, the triangularity error

E_t = ‖R_lower(z)‖²_F / ‖A(z)‖²_F    (4.2)

shows the ratio of the amount of energy in the lower triangular part of R(z) to the total amount of energy. Similarly, for a PSVD A(z) = U(z)D(z)V^H(z^{−∗}) the decomposition error is

E_d^SVD = ‖A(z) − U(z)D(z)V^H(z^{−∗})‖_F / ‖A(z)‖_F.    (4.3)

This decomposition error definition makes sense from the algorithm evaluation point of view, as it represents how well the decomposition factors describe the original matrix. From an application point of view, it might be interesting to study the normalized version of ‖D(z) − U^H(z^{−∗})A(z)V(z)‖_F, as that definition would instead describe the error in the calculated singular values of the original matrix.

The relative amount of energy in the non-diagonal part of D(z) is described by the diagonality error

E_diag = ‖D_non-diag(z)‖²_F / ‖A(z)‖²_F.    (4.4)

Finally, the unitarity error of any matrix A(z) is defined as

E_u = ‖I − A^H(z^{−∗})A(z)‖_F / ‖I‖_F    (4.5)

and indicates how close A(z) is to being paraunitary.
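The unitarity error (4.5) can be evaluated numerically on the unit circle, using Parseval's relation to connect the coefficient-domain Frobenius norm to a bin-averaged norm (a numerical shortcut of ours, not how [4] computes it):

```python
import numpy as np

def unitarity_error(A):
    """E_u of (4.5): evaluate A(z) on the unit circle; the para-Hermitian
    conjugate becomes a per-bin Hermitian transpose, and Parseval relates
    the bin-averaged norm to the coefficient-domain Frobenius norm."""
    p = A.shape[2]
    Nfft = 4 * A.shape[0]              # enough bins for the product's lags
    Af = np.fft.fft(A, n=Nfft, axis=0)
    sq = [np.linalg.norm(np.eye(p) - Af[k].conj().T @ Af[k]) ** 2
          for k in range(Nfft)]
    return np.sqrt(np.mean(sq)) / np.sqrt(p)   # normalize by ||I||_F

A = np.zeros((2, 2, 2))   # the delay matrix diag(1, z^{-1}), stored as
A[0, 0, 0] = 1.0          # lags x rows x cols; it is exactly paraunitary,
A[1, 1, 1] = 1.0          # so its unitarity error is numerically zero
print(unitarity_error(A))
```

The other criteria (4.1)-(4.4) reduce to ratios of such Frobenius norms and can be computed in the same coefficient-array representation.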


4.2 PQRD-BC: Polynomial QR Decomposition

The first algorithm to be studied is PQRD By Columns (PQRD-BC), which was first introduced in [4]. The algorithm generates an approximate PQRD

A(z) = Q(z)R(z)

of an arbitrary matrix A(z), where Q(z) is approximately paraunitary and R(z) is approximately upper triangular. This is done through the iterative application of PGRs, until A(z) has been transformed into a sufficiently upper triangular matrix. As will be shown in Section 4.4, PQRD-BC is a necessary component of the PSVD by PQRD-BC algorithm.

As the next algorithm to be studied, Modified PQRD-BC, is a derivative of PQRD-BC, we refrain from presenting simulation results for PQRD-BC. Rather, the results for Modified PQRD-BC given in Section 4.3 also represent the typical behaviour of PQRD-BC.

Though the original definition of PQRD-BC can be found in [4], we restate the algorithm here for completeness. A pseudocode representation can be seen in Algorithm 2.

Algorithm 2 PQRD By Columns (PQRD-BC)

1: Input polynomial matrix A(z) ∼ (V1, V2, C^{p×q}), convergence parameter ε, truncation parameter μ and absolute stopping criteria MaxIter and MaxSweeps.
2: Let Q(z) = I_p, g1 = 1 + ε and n = 0.
3: while n ≤ MaxSweeps and g1 > ε do
4:   Let n = n + 1.
5:   for k = 1 … min(p − 1, q) do
6:     Let iter = 0 and g2 = 1 + ε.
7:     while iter ≤ MaxIter and g2 > ε do
8:       Find j and t such that |a_jk(t)| ≥ |a_mk(t)| holds for m = k + 1 … p and ∀t ∈ Z.
9:       Let g2 = |a_jk(t)|.
10:      if g2 > ε then
11:        Let iter = iter + 1.
12:        Obtain PGR G(z) as a function of (j, k, t, |a_jk(t)|, |a_kk(0)|).
13:        Let A(z) = G(z)A(z).
14:        Let Q(z) = G(z)Q(z).
15:        Truncate A(z), Q(z) given μ.
16:      end if
17:    end while
18:  end for
19:  Find j, k and t such that |a_jk(t)| ≥ |a_mn(t)| holds for n = 1 … q, m = n + 1 … p and ∀t ∈ Z.
20:  Let g1 = |a_jk(t)|.
21: end while
22: Let R(z) = A(z).

For future reference, the block of rows 12-15 will be called a coefficient step, and the operations in rows 6-17 a column step. The algorithm operates over the columns of A(z) from left to right. For every column, a certain number of coefficient steps are performed,


until the coefficients in the given column are sufficiently small. To determine which coefficient to null next, the coefficient with the greatest magnitude below the main diagonal in the given column is found. This coefficient is subsequently nulled through the application of a polynomial Givens rotation with appropriate parameters. This is repeated until all coefficients below the main diagonal in the given column have a magnitude less than the convergence parameter ε. When a column step is finished, the algorithm moves to the next column, until all columns have been traversed. As a safe-guard, the algorithm can restart from column 1 if necessary, making another sweep over the columns.

Every coefficient step includes a matrix truncation, implemented as described by Algorithm 1. Without this truncation step, the maximum degrees of the matrix polynomials, and with them the memory requirements, would grow very quickly. As is generally the case, most of the energy of the coefficients is centered around the zero-lag coefficient, and the decomposition is therefore not ruined. This can be seen specifically in the example in Section 4.3.2.

4.2.1 Convergence and Complexity

The polynomial Givens rotation is a paraunitary transformation, and therefore energy preserving. By selecting parameters according to equation (2.19), the rotation moves energy from the coefficient being nulled to the zero-lag coefficient of the diagonal element of the current column. After a column step is finished, most of the energy of the coefficients below the main diagonal will have been transferred to the diagonal element. Because the columns are visited in order from left to right, any subsequent coefficient step will mainly affect the current column and the columns to its right, since the coefficients in the previous columns will be close to zero. The left-to-right ordering of column steps, together with the fact that the algorithm is allowed to restart for more sweeps, guarantees the convergence of the algorithm. The full convergence proof can be found in [4].
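The energy-moving step can be illustrated in the time domain. Equation (2.19) is not reproduced in this excerpt, so the sketch below uses the standard complex Givens form applied to rows k and j with a relative delay of t lags; this is an assumption consistent with the description above, not the thesis's exact rotation.

```python
import numpy as np

def apply_pgr(A, j, k, t, tau_min=0):
    """Null coefficient a_jk(t) by an elementary polynomial Givens rotation on
    rows k and j, moving its energy into the zero-lag coefficient a_kk(0).
    A is a (p, q, L) array; A[i, m, l] is the coefficient of z**-(tau_min + l).
    Returns the new coefficient array and its new lag origin."""
    p, q, L = A.shape
    s = abs(t)
    B = np.zeros((p, q, L + 2 * s), dtype=complex)   # pad lag axis so shifts fit
    B[:, :, s:s + L] = A
    alpha = A[k, k, -tau_min]                        # a_kk(0)
    beta = A[j, k, t - tau_min]                      # a_jk(t), to be nulled
    r0 = np.hypot(abs(alpha), abs(beta))             # assumed nonzero
    rk, rj = B[k].copy(), B[j].copy()
    # row_k <- (conj(alpha) row_k + conj(beta) z^t row_j) / r0
    B[k] = (np.conj(alpha) * rk + np.conj(beta) * np.roll(rj, -t, axis=-1)) / r0
    # row_j <- (-beta z^-t row_k + alpha row_j) / r0
    B[j] = (-beta * np.roll(rk, t, axis=-1) + alpha * rj) / r0
    return B, tau_min - s
```

The transformation is paraunitary, so the total coefficient energy is unchanged; only its location moves.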

In order to derive the theoretical complexity, assume that PQRD-BC is applied to A(z) ∼ (V1, V2, Cp×q). Additionally, it will be assumed that the time-lag dimension of any matrix involved is bounded by some r ∝ (V1 + V2 + 1), because of the truncation step at row 15. The complexity will be derived in terms of the number of coefficient steps needed for convergence.

By definition, the block of rows 12-15 is one coefficient step. Let the complexity of the coefficient step be O(Cc), where Cc is some function of p, q, r. Since row 11 is O(1), it is insignificant compared to the complexity of the coefficient step. Rows 8 and 19 involve searches for the coefficient with the largest magnitude below the main diagonal in the current column, and below the main diagonal in any column, respectively. Denote the complexities of these searches O(Cjt) and O(Cjkt).

The number of iterations of the while loop starting at row 7 is bounded by the constant MaxIter, and the complexity of the loop is therefore O(MaxIter Cc) + O(MaxIter Cjt) = O(Cc) + O(Cjt). This is a bound with a very large hidden constant in the Ordo expression, and may therefore not be representative of the typical behaviour of the loop. In practice, the number of iterations of the loop would be proportional to the number of coefficients below the main diagonal of that column. The modified PQRD-BC given in Algorithm 3 does not suffer from this theoretical inconvenience, as its convergence criterion is defined differently.

Furthermore, the for loop at row 5 iterates min(p − 1, q) times, and the complexity of the block of rows 6-17 is therefore O(min(p − 1, q)Cc) + O(min(p − 1, q)Cjt). The number of iterations (sweeps) of the outermost loop of the algorithm is bounded by a constant MaxSweeps. The complexity of the iterative part of the algorithm is therefore O(MaxSweeps min(p − 1, q)Cc) + O(MaxSweeps min(p − 1, q)Cjt) + O(MaxSweeps Cjkt) = O(min(p − 1, q)Cc) + O(min(p − 1, q)Cjt) + O(Cjkt).

The set-up costs at rows 1-2 are O(pqr) + O(p²) + O(1) = O(pqr) + O(p²). Finally, the cost of row 22 is O(pqr). Writing it out, the theoretical complexity of the PQRD-BC algorithm is

CPQRD-BC = O(min(p − 1, q)Cc) + O(min(p − 1, q)Cjt) + O(Cjkt) + O(pqr) + O(p²). (4.6)

Recall that the Ordo notation contains an unknown constant, which in this case is large. This expression may therefore not give an accurate description of the complexity of the PQRD-BC algorithm.

A brief simulation showed that for ε = 10⁻³ and a matrix with coefficients drawn from a zero-mean circularly symmetric normalized Gaussian distribution, the hidden constant in the Ordo expression of the first term of (4.6) was on the order of 10².

4.2.2 Discussion

The algorithm presented performs an approximate PQRD A(z) = Q(z)R(z) of an arbitrary matrix A(z). Convergence is defined in terms of an absolute criterion, which is eventually reached, as shown in [4]. Because of its iterative behaviour and the way the maximum number of loop iterations is defined, it is hard to obtain tight bounds for the analytical complexity expressions. The complexity given by equation (4.6) may therefore be of limited significance.

Due to the absolute convergence criterion, the algorithm is unsuitable for direct use in a scenario where the scaling of the filter taps may vary over time, which is the case for a communications system. One way to overcome this problem could be to normalize the matrix first, and then apply the algorithm. This mimics a relative convergence criterion, but it would be better if the algorithm itself could deal with arbitrarily scaled matrices. An algorithm with that ability is introduced in the next section.
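The normalization workaround can be sketched as a thin wrapper; `pqrd` stands for any PQRD routine with an absolute criterion and is a placeholder, not a function from the thesis.

```python
import numpy as np

def normalized_pqrd(A, pqrd, eps):
    """Mimic a relative convergence criterion by normalizing the input:
    after dividing A by its Frobenius norm (taken over all lags), an
    absolute threshold eps acts like a relative one. `pqrd` is a
    hypothetical callable taking (A, eps) and returning (Q, R)."""
    scale = np.sqrt(np.sum(np.abs(A) ** 2))
    Q, R = pqrd(A / scale, eps)
    return Q, R * scale          # undo the scaling on R; Q is paraunitary
```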

4.3 MPQRD-BC: Modified PQRD-BC

In order to be better suited for implementation in a communications system, this section proposes some changes to the PQRD-BC algorithm. This results in the Modified PQRD-BC (MPQRD-BC) algorithm, which uses a relative convergence criterion as opposed to the absolute criterion of PQRD-BC. The definition of convergence is changed so that, given a parameter 0 < εr < 1 and a matrix A(z) ∼ (0, r, Cp×q), convergence is reached when

‖Rlower(z)‖²F / ‖A(z)‖²F < εr. (4.7)

That is, convergence is defined as the state where the triangularity error, the relative amount of squared magnitude of the coefficients below the main diagonal, is less than εr. Equation (4.7) defines convergence for the outer loop of the algorithm, but the column-step convergence criterion also needs to be changed. This is done by finding the coefficient in the column with the largest magnitude g2. The algorithm will move to a new column when

r Nsub g2² / ‖A(z)‖²F < εr (4.8)


where Nsub is the number of matrix elements below the main diagonal and r is the greatest maximum degree of any polynomial in the matrix. That is, the algorithm moves to the next column when the largest magnitude of all coefficients is so small that, if all coefficients below the main diagonal had this magnitude, the algorithm would have converged. The value of r is updated for every coefficient step, to reflect the changes in maximum degrees due to the polynomial Givens rotations and the truncation step.
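Both criteria are cheap to evaluate; a sketch follows, using the same (p, q, L) coefficient-array convention as before (an assumed representation, not the thesis code).

```python
import numpy as np

def triangularity_error(R, A0):
    """Relative energy below the main diagonal of R(z), measured against the
    total energy of the original matrix A0(z), as in criterion (4.7)."""
    p, q, _ = R.shape
    mask = np.tril(np.ones((p, q), dtype=bool), k=-1)   # strictly below diagonal
    return np.sum(np.abs(R[mask]) ** 2) / np.sum(np.abs(A0) ** 2)

def column_step_done(g2, r, n_sub, A0, eps_r):
    """Column-step criterion (4.8): the column is finished when
    r * Nsub * g2**2, relative to the total energy, is below eps_r."""
    return r * n_sub * g2 ** 2 / np.sum(np.abs(A0) ** 2) < eps_r
```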

Additionally, we take the opportunity to redefine the way the maximum number of coefficient steps allowed is determined. In PQRD-BC, a parameter MaxIter is simply passed to the algorithm, and no more than MaxIter coefficient steps will be performed per column and sweep. In MPQRD-BC, this constant is instead defined to be

MaxIter = −⌊ρ r Nsub log10(εr)⌋ (4.9)

where ρ is some positive integer parameter. The rationale behind this definition is that the number of coefficient steps needed is probably related to the number of coefficients below the main diagonal, as well as to the selected convergence criterion. A brief simulation showed that the number of coefficient steps needed was approximately linear in the logarithm of the convergence parameter. Because of the rather arbitrary choice of MaxIter, a fudge factor ρ is added to the expression. Solving equation (4.9) for ρ gives an expression in terms of MaxIter; the behaviour of PQRD-BC, where MaxIter is explicitly defined, can therefore be simulated by selecting ρ accordingly. Finally, notice that the logarithm of εr is negative, hence the minus sign in equation (4.9).
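Equation (4.9) is a one-liner; for example, with ρ = 2, r = 3, Nsub = 3 and εr = 10⁻³ it yields 54 allowed coefficient steps per column and sweep.

```python
import math

def max_iter(rho, r, n_sub, eps_r):
    """Equation (4.9): MaxIter = -floor(rho * r * Nsub * log10(eps_r)).
    log10(eps_r) is negative for 0 < eps_r < 1, hence the leading minus."""
    return -math.floor(rho * r * n_sub * math.log10(eps_r))
```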

Putting the changes introduced into the perspective of the rest of the algorithm, we state Modified PQRD-BC as pseudocode in Algorithm 3.

4.3.1 Convergence and Complexity

The convergence proof for MPQRD-BC follows directly from the convergence proof of PQRD-BC in [4], given large enough ρ and MaxSweeps. Because of the iterative nulling of coefficients below the main diagonal, and the inherent energy-moving property of the polynomial Givens rotation, the sum of the squared magnitudes of the coefficients below the main diagonal will decrease. Convergence with respect to the coefficient with the largest magnitude below the main diagonal therefore implies convergence in the sense of the amount of energy below the main diagonal, for some convergence parameter εr. The changed maximum numbers of iterations for the loops do not change this fact, as they do not alter the main behaviour of the algorithm.

Because of the changes introduced, the theoretical complexity expressions differ from those of PQRD-BC. Some rows have been added to the algorithm, and the numbers of loop iterations have changed. In the following derivation, the same assumptions and notation as in Section 4.2.1 are used.

Starting with the block enclosed by the if statement at row 13, the only new statement is at row 19, which is O(1). The entire block therefore has the same complexity as the equivalent block of PQRD-BC, that is, O(Cc). The block inside the while loop at row 9 is functionally equivalent to the inner while loop of PQRD-BC. Row 12 is new though, and it is O(pqr) + O(1) = O(pqr). Because MaxIter is given by row 4, the complexity of the inner while loop at row 9 is given by

O(−⌊ρ r Nsub log10(εr)⌋Cjt) + O(−⌊ρ r Nsub log10(εr)⌋pqr) + O(−⌊ρ r Nsub log10(εr)⌋Cc).


Algorithm 3 Modified PQRD By Columns (MPQRD-BC)

1: Input polynomial matrix A(z) ∼ (V1, V2, Cp×q), convergence parameter εr, truncation parameter µ and stopping criteria ρ and MaxSweeps.

2: Let Q(z) = Ip, h1 = 1 + εr, n = 0 and A0(z) = A(z).

3: Let Nsub = p min(p − 1, q) − ∑_{k=1}^{min(p−1,q)} k and r = V1 + V2 + 1.

4: Let MaxIter = −⌊ρ r Nsub log10(εr)⌋.
5: while n ≤ MaxSweeps and h1 > εr do
6:   Let n = n + 1.
7:   for k = 1 . . . min(p − 1, q) do
8:     Let iter = 0 and h2 = 1 + εr.
9:     while iter ≤ MaxIter and h2 > εr do
10:      Find j and t such that |ajk(t)| ≥ |amk(t)| holds for m = k + 1 . . . p and ∀t ∈ Z.
11:      Let g2 = |ajk(t)|.
12:      Let h2 = r Nsub g2² / ‖A0(z)‖²F.
13:      if h2 > εr then
14:        Let iter = iter + 1.
15:        Obtain PGR G(z) as a function of (j, k, t, |ajk(t)|, |akk(0)|).
16:        Let A(z) = G(z)A(z).
17:        Let Q(z) = G(z)Q(z).
18:        Truncate A(z), Q(z) given µ.
19:        Let r = U1 + U2 + 1 if A(z) ∼ (U1, U2, Cp×q).
20:      end if
21:    end while
22:  end for
23:  Find j, k and t such that |ajk(t)| ≥ |amn(τ)| holds for n = 1 . . . q and m = n + 1 . . . p and ∀τ ∈ [−V1, V2].
24:  Let h1 = ‖Alower(z)‖²F / ‖A0(z)‖²F.
25: end while
26: Let R(z) = A(z).


Neglecting row 8, as it is only O(1), the for loop at row 7 has complexity

O(−⌊ρ r Nsub log10(εr)⌋ (Cjt + pqr + Cc) min(p − 1, q)).

The number of iterations (sweeps) of the outermost loop is bounded by the constant MaxSweeps. Using the rule of constants for the Ordo operator, the complexity of the iterative part of the algorithm is therefore

O(−⌊ρ r Nsub log10(εr)⌋ (Cjt + pqr + Cc) min(p − 1, q)) + O(Cjkt) + O(pqr).

The set-up costs are the same as for PQRD-BC, that is, O(pqr) + O(p²). Row 26 is O(pqr), and therefore the theoretical complexity of MPQRD-BC is

CMPQRD-BC = O(−⌊ρ r Nsub log10(εr)⌋ (Cjt + pqr + Cc) min(p − 1, q)) + O(Cjkt) + O(pqr) + O(p²). (4.10)

For εr = 10⁻³ and a matrix with coefficients drawn from a zero-mean circularly symmetric normalized Gaussian distribution, a brief simulation showed that the hidden constant in the Ordo expression of the first term of (4.10) was on the order of 10¹.

4.3.2 Simulations

In this section, some numerical results regarding the properties of the MPQRD-BC algorithm are presented. First, the results of the algorithm working on a 3 × 3 matrix are shown. Secondly, the results of a decomposition quality study are presented. Finally, the run time of the algorithm has been measured for various sizes of input data and parameter values.

A First Example

A matrix A(z) ∼ (0, 2, C3×3) was generated, with coefficients taken as observations drawn from a zero-mean normalized circularly symmetric Gaussian distribution. Viewing the matrix as an FIR filter, the impulse response can be seen in Figure 4.1a.

MPQRD-BC was applied to A(z) using parameters εr = 10⁻³, µ = 10⁻⁶, ρ = 2 and MaxSweeps = 10. The resulting Q(z) and R(z) can be seen in Figures 4.2a and 4.1b, respectively. One sweep and 72 coefficient steps were necessary for convergence. The decomposition errors are shown in Table 4.1. The error values indicate a good decomposition, with

Table 4.1: Errors for PQRD of Matrix A(z).

Decomposition   Triangularity   Unitarity
4.6 · 10⁻³      1.5 · 10⁻⁴      4.8 · 10⁻³

an almost paraunitary Q(z). Clearly, the triangularity error is less than εr, as expected.

Interestingly, it can be seen in Figure 4.1b that for all but the last matrix element on the diagonal, the zero-lag coefficient has the largest magnitude. This is due to the way the polynomial Givens rotation works: it moves energy from the coefficient being nulled to the zero-lag coefficient of the diagonal matrix element of the column. Because there are more coefficients below the (1, 1) element than below the (2, 2) element, the zero-lag coefficient of the former has larger magnitude than that of the latter.


[Figure: tap magnitudes versus lags for each matrix element.]
(a) Impulse Response of the Original Matrix A(z) ∼ (0, 2, C3×3)
(b) Impulse Response of the Upper Triangular Matrix R(z)

Figure 4.1: The Original and Upper Triangular Matrices Obtained from MPQRD-BC for εr = 10⁻³, µ = 10⁻⁶, ρ = 2.

[Figure: tap magnitudes versus lags for each matrix element.]
(a) Impulse Response of the Paraunitary Matrix Q(z)
(b) Impulse Response of QH(z−∗)Q(z)

Figure 4.2: The Paraunitary Matrix Obtained from MPQRD-BC for εr = 10⁻³, µ = 10⁻⁶, ρ = 2.


Decomposition Quality

In this section, the connection between the choice of algorithm parameter values and the decomposition quality is presented. The errors defined in equations (4.1), (4.2) and (4.5) were calculated for a set of PQRDs generated by MPQRD-BC. The errors were measured for various choices of the convergence parameter εr and the truncation parameter µ, and then averaged over 100 independently generated matrices. The other parameters were set to ρ = 2 and MaxSweeps = 10. The matrices were A(z) ∼ (0, 2, C3×3) with coefficients drawn from a zero-mean normalized circularly symmetric Gaussian distribution.

The relative decomposition errors can be seen in Figure 4.3. As expected, smaller µ gives a better decomposition, since less energy of the matrices involved is removed, and they therefore stay closer to the ideal matrices. A striking result is that the relative decomposition errors go up for decreasing εr in Figure 4.3. However, note that the relative decomposition error indicates how well the decomposition imitates the original matrix, and not whether the decomposition is close to a true QR decomposition. A zero decomposition error could be achieved for the degenerate case of ρ = 0. That would result in Q(z) = I, R(z) = A(z), which minimizes the decomposition error but is not close to the QR factorization. The fact that the relative decomposition error increases for decreasing εr does therefore not necessarily mean that larger εr is better.

The unitarity error of the decomposition is shown in Figure 4.4. As expected, the unitarity errors go down for decreasing µ, because less energy is truncated. It can also be seen that smaller εr gives larger unitarity errors. This may be because smaller εr results in more coefficient steps, and therefore more truncations.

The last performance measure is the triangularity error, shown in Figure 4.5. By definition, the triangularity error is always less than εr. The choice of value for µ does not affect the triangularity error to any large degree.
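Under the assumption that the three measures are relative Frobenius-norm errors (the exact definitions (4.1), (4.2) and (4.5) are not reproduced in this excerpt), they can be computed as follows for causal coefficient arrays; all helper names are illustrative.

```python
import numpy as np

def polymatmul(A, B):
    """Coefficient-domain product of polynomial matrices A (p, m, La) and
    B (m, q, Lb); the lag axes convolve."""
    p, m, La = A.shape
    _, q, Lb = B.shape
    C = np.zeros((p, q, La + Lb - 1), dtype=complex)
    for i in range(m):
        for la in range(La):
            C[:, :, la:la + Lb] += A[:, i, la][:, None, None] * B[i][None, :, :]
    return C

def paraherm(A):
    """Parahermitian conjugate A^H(z^-*): conjugate transpose, lags negated."""
    return np.conj(np.transpose(A, (1, 0, 2)))[:, :, ::-1]

def pqrd_errors(A, Q, R):
    """Assumed relative error measures for an approximate PQRD A = Q R:
    decomposition, triangularity and unitarity errors."""
    p = Q.shape[0]
    ea = np.sum(np.abs(A) ** 2)
    D = polymatmul(Q, R)
    D[:, :, :A.shape[2]] -= A                      # Q(z)R(z) minus A(z)
    dec = np.sum(np.abs(D) ** 2) / ea
    mask = np.tril(np.ones(R.shape[:2], dtype=bool), k=-1)
    tri = np.sum(np.abs(R[mask]) ** 2) / ea
    P = polymatmul(Q, paraherm(Q))                 # ideally I at lag zero
    P[:, :, Q.shape[2] - 1] -= np.eye(p)
    uni = np.sum(np.abs(P) ** 2) / p
    return dec, tri, uni
```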

Complexity

In the run-time performance simulation, the number of coefficient steps needed for convergence was measured for different input matrices and algorithm parameters.

For the effects of input data size, three simulation series were run, each varying one of the dimensions of the input matrix while keeping the rest constant. For every series, 100 polynomial matrices were generated with coefficients drawn from a zero-mean circularly symmetric Gaussian distribution. For every matrix, MPQRD-BC was applied to all principal sub-matrices obtained by removing a number of trailing columns and rows. By maintaining a size of 3 for the independent dimensions, the matrix input size was varied in this way. Finally, the measurements, as functions of matrix size, were averaged over the 100 realizations. The algorithm parameter values used are shown in Table 4.2.

Table 4.2: MPQRD-BC Parameter Values for Spatial/Temporal Series.

Indp. Dim. Size   Convergence εr   Truncation µ   MaxIter Factor ρ   MaxSweeps
3                 10⁻³             10⁻⁶           2                  10
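The principal-sub-matrix sweep described above can be sketched as a generator; the helper name and the fixed size of 3 mirror the text, but the function itself is illustrative, not the thesis code.

```python
def principal_submatrices_rows(A, fixed=3):
    """Yield the sub-matrices of the row series: the column count is held
    at `fixed` while the number of leading rows grows, i.e. trailing rows
    are removed, as in the benchmark setup."""
    p = A.shape[0]
    for rows in range(1, p + 1):
        yield A[:rows, :fixed, :]
```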

The run times for the spatial series, in terms of number of coefficient steps, can be seen


[Figure: relative decomposition error (RDE) as a function of the convergence parameter εr and of the truncation parameter µ (MPQRD-BC).]

Figure 4.3: Decomposition Error as a Function of εr and µ, Averaged over 100 Matrices.

[Figure: unitarity error as a function of the truncation parameter µ, for εr = 10⁻¹ and εr = 10⁻³ (MPQRD-BC).]

Figure 4.4: Unitarity Error as a Function of εr and µ, Averaged over 100 Matrices.

[Figure: triangularity error as a function of the convergence parameter εr, for µ = 10⁻³ and µ = 10⁻⁶ (MPQRD-BC).]

Figure 4.5: Triangularity Error as a Function of εr and µ, Averaged over 100 Matrices.


in Figure 4.6. The graphs suggest that the run time is linear in the number of rows and taps, but approximately constant in the number of columns. The slope of the curve for the row series abruptly changes at the point for 3 rows. This is because the min(p − 1, q) factor in the complexity expression changes at that point, since the number of columns in the row measurement series was 3.

In order to analyze the effect of the selection of algorithm parameters on the run time, MPQRD-BC was applied to 100 matrices of the form A(z) ∼ (0, 2, C3×3), while varying the convergence parameter εr and the truncation parameter µ. The results of this study can be seen in Figure 4.7. It is clear that decreasing εr results in more iterations until convergence. This is an intuitive result, because a smaller εr means that the algorithm has more coefficients to zero out.

Looking at the graph of iterations versus µ in Figure 4.7, it is clear that the number of iterations levels out for sufficiently small µ. This can be interpreted as µ being small enough that an insignificant amount of energy is truncated from the matrices. Selecting a smaller µ will therefore not affect the number of iterations.

4.3.3 Discussion

The modifications introduced, which led to the MPQRD-BC algorithm, are straightforward but convenient. They lead to a convergence definition directly suitable for a communications system, and to tighter complexity bounds. As will be seen in Chapter 6, the designer of a communications system would probably like to select algorithm parameters based on an intuition of how they affect the capacity of the system. When using the triangularity error as the convergence criterion, this is exactly what is done.

Because the general behaviour of the algorithm is not changed, only simulation results for MPQRD-BC are presented, and not for PQRD-BC. The qualitative analysis of the two algorithms is similar, and we therefore focus on the relative-criterion algorithm. Additionally, the convergence proof of MPQRD-BC follows directly from the convergence proof of PQRD-BC, as given by [4].

4.4 PSVD by PQRD-BC: Polynomial Singular Value Decomposition

With an understanding of the workings of PQRD-BC, we are now ready to study the PSVD by PQRD-BC algorithm, as proposed by Foster et al. in [4]. Given an arbitrary matrix A(z), this algorithm obtains a PSVD of A(z) such that, for some paraunitary matrices U(z), V(z),

A(z) = U(z)D(z)VH(z−∗)

where D(z) is diagonal. The general idea behind the algorithm is to obtain a PQRD of A(z), and then take the parahermitian conjugate of the resulting upper triangular matrix. The PQRD of this lower triangular matrix is then found, yielding an upper triangular matrix that, thanks to the energy-moving property of the polynomial Givens rotation, has a smaller diagonality error than the original matrix A(z). By iterating this procedure, paraunitary matrices and one diagonal matrix are obtained, which together form a PSVD of A(z).

Because of the iterative manner, and some necessary matrix truncation, the algorithm will not output an exact PSVD. Rather, the matrices are only approximately paraunitary


[Figure: number of iterations as a function of the number of rows, columns and taps of the input matrix (MPQRD-BC).]

Figure 4.6: Number of coefficient steps (iterations) needed for convergence, as a function of input matrix size. The size of the independent dimensions was 3.

[Figure: number of iterations as a function of the convergence parameter εr (for µ = 10⁻³ and 10⁻⁶) and of the truncation parameter µ (for εr = 10⁻¹ and 10⁻³).]

Figure 4.7: Number of coefficient steps (iterations) needed for convergence as a function of algorithm parameters, for a matrix A(z) ∼ (0, 2, C3×3).


and approximately diagonal. The errors can be made arbitrarily small, though, provided that enough time and memory are available.

The PSVD of a matrix has an interesting application in spatial multiplexing for wideband wireless channels. By precoding and receive filtering with the obtained paraunitary matrices, a channel matrix can be diagonalized over all frequencies, so that signaling can be performed over a set of frequency-selective spatial modes. This application is further studied in Chapter 6.

Since a modified version of PSVD by PQRD-BC will be presented in the next section, our focus will mainly be on the properties of that algorithm. In order to introduce the modified version, we restate PSVD by PQRD-BC in Algorithm 4.

Algorithm 4 PSVD by PQRD-BC

1: Input polynomial matrix A(z) ∼ (V1, V2, Cp×q), convergence parameter ε, truncation parameter µ, absolute stopping criterion MaxPSVDIter, and PQRD-BC parameters MaxSweeps, MaxIter.
2: Let U(z) = Ip, V(z) = Iq, iter = 0, g = 1 + ε.
3: while iter < MaxPSVDIter and g > ε do
4:   Find j, k, t where j ≠ k such that |ajk(t)| ≥ |amn(τ)| holds for m = 1 . . . p, n = 1 . . . q such that m ≠ n and ∀τ ∈ [−V1, V2].
5:   Let g = |ajk(t)|.
6:   if g > ε then
7:     Let iter = iter + 1.
8:     Call [U1(z), R1(z)] = pqrd_bc(A(z), ε, µ, MaxIter, MaxSweeps).
9:     Let A′(z) = RH1(z−∗) and U(z) = U1(z)U(z).
10:    Call [V1(z), R2(z)] = pqrd_bc(A′(z), ε, µ, MaxIter, MaxSweeps).
11:    Let A(z) = RH2(z−∗) and V(z) = V1(z)V(z).
12:    Truncate A(z), U(z) and V(z) given µ.
13:  end if
14: end while
15: Let D(z) = A(z).

In the following, rows 8-12 of Algorithm 4 will be referred to as a flip step. The name stems from the fact that the algorithm operates by iteratively applying PQRD-BC to a sequence of flipped matrices. The parameters ε and µ have the same meaning as in PQRD-BC, and are also passed along in the calls to PQRD-BC at rows 8 and 10. The only new parameter is MaxPSVDIter, which determines the maximum number of flip steps to allow.
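A flip step can be sketched as follows; `pqrd` is a placeholder for the PQRD-BC routine, and the truncation and the accumulation of U(z) and V(z) (which require polynomial matrix products) are omitted for brevity.

```python
import numpy as np

def paraherm(A):
    """Parahermitian conjugate A^H(z^-*) of a (p, q, L) coefficient array:
    conjugate transpose of the matrix part, lag axis reversed."""
    return np.conj(np.transpose(A, (1, 0, 2)))[:, :, ::-1]

def flip_step(A, pqrd):
    """One flip step (rows 8-12 of Algorithm 4): triangularize, flip,
    triangularize again, flip back. `pqrd` is any callable returning
    (Q, R) for a coefficient array; hypothetical, for illustration."""
    U1, R1 = pqrd(A)               # A(z) upper-triangularized
    V1, R2 = pqrd(paraherm(R1))    # flipped matrix triangularized again
    return paraherm(R2), U1, V1    # closer-to-diagonal A(z), plus factors
```

Two flips cancel: with an identity-like `pqrd` stub the step returns its input unchanged, which is a convenient sanity check.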

4.4.1 Convergence and Complexity

Convergence for PSVD by PQRD-BC is proven by an argument similar to that for PQRD-BC. For every call to PQRD-BC, the energy of the coefficients of the diagonal matrix elements will increase. As PQRD-BC is applied to the sequence of flipped matrices, eventually a sufficient amount of energy has been moved from the non-diagonal part, so that the coefficient with the largest magnitude has a magnitude less than ε. This is the state of convergence. The full proof of convergence is given in [4].

In order to derive the theoretical complexity expression, we note that rows 8-12 are, by definition, a flip step. Denoting the complexity of a flip step O(Cf), where Cf is some function of p, q, r, it is clear that the complexity of the block of rows within the if statement at row 6 is O(Cf) + O(1) = O(Cf). Using the definition from Section 4.2.1 for the complexity of row 4, and noting that row 5 is O(1), gives that the complexity of rows 4-13 is O(Cf) + O(Cjkt).

The maximum number of iterations of the while loop at row 3 is MaxPSVDIter. The complexity of the loop is therefore O(MaxPSVDIter Cf) + O(MaxPSVDIter Cjkt) = O(Cf) + O(Cjkt). Taking into account the start-up costs at rows 1-2, which are O(pqr) + O(p²) + O(q²), and noting that row 15 is O(pqr), it is clear that the complexity of the entire algorithm is

CPSVD by PQRD-BC = O(Cf) + O(Cjkt) + O(pqr) + O(p²) + O(q²). (4.11)

To express the complexity in terms of coefficient steps, as opposed to flip steps, we expand the Cf term in equation (4.11). Rows 8 and 10 are each O(CPQRD-BC). Rows 9 and 11 are together O(pqr) + O(p³) + O(q³). Finally, row 12 is O(pqr²), as given by Section 2.2.3. This gives that

Cf = O(CPQRD-BC) + O(pqr) + O(p³) + O(q³).

Plugging in the expression for CPQRD-BC from equation (4.6), the final expression

CPSVD by PQRD-BC = O(min(p − 1, q)Cc) + O(min(p − 1, q)Cjt) + O(Cjkt) + O(pqr²) + O(p³) + O(q³) (4.12)

is obtained. Note that the hidden constants in the Ordo terms from the PQRD-BC expressions may be very large. Equation (4.12) therefore suffers from the same problem as equation (4.6), namely that the bounds may not be tight.

A brief simulation showed that for ε = 10⁻³ and a matrix with coefficients drawn from a zero-mean circularly symmetric normalized Gaussian distribution, the hidden constant in the Ordo expression of the first term of (4.12) was on the order of 10².

4.4.2 Discussion

PSVD by PQRD-BC is an algorithm for obtaining a PSVD of a matrix A(z) through a series of polynomial QR decompositions. The bulk of the work is performed by the PQRD-BC algorithm, which is called iteratively. Convergence is defined in terms of an absolute convergence criterion, making the algorithm unsuitable for direct implementation in a communications system.

The theoretical complexity derivation showed that the algorithm complexity is linear in the number of coefficient steps. Recalling that the hidden constant of the Ordo operator may be very large, this is not a particularly interesting result. No further deliberation is spent on this fact, though, since a modified algorithm which does not suffer from this problem is introduced in the next section.

4.5 MPSVD by MPQRD-BC: Modified PSVD

This section introduces a modified version of PSVD by PQRD-BC that uses a relative convergence criterion. The structure of the modified algorithm is identical to that of PSVD by PQRD-BC, but the flip step is modified to employ MPQRD-BC for the PQRD. With these modifications in place, the algorithm is suitable for direct implementation in a communications system. Indeed, MPSVD by MPQRD-BC will be the algorithm of choice for the transmission scheme to be presented in Chapter 6.


The relative convergence criterion is defined so that convergence is reached when

‖Dnon-diag(z)‖²F / ‖A(z)‖²F < εr (4.13)

for a given parameter 0 < εr < 1. That is, the state of convergence is obtained when the diagonality error is smaller than εr. By this selection of convergence criterion, a direct relationship between the value of εr and the capacity expressions of Chapter 6 can be established. Any energy outside the diagonal part of a diagonalized channel matrix D(z) results in cross-channel interference. Decreasing εr directly reduces this interference.
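Criterion (4.13) can be evaluated directly on the coefficient array (using the same assumed (p, q, L) representation as before):

```python
import numpy as np

def diagonality_error(D, A0):
    """Relative off-diagonal energy of D(z) with respect to the original
    matrix A0(z), as in the convergence criterion (4.13)."""
    p, q, _ = D.shape
    off = ~np.eye(p, q, dtype=bool)          # True off the main diagonal
    return np.sum(np.abs(D[off]) ** 2) / np.sum(np.abs(A0) ** 2)
```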

A pseudocode representation of MPSVD by MPQRD-BC is shown in Algorithm 5.

Algorithm 5 MPSVD by MPQRD-BC

1: Input polynomial matrix A(z) ∼ (V1, V2, Cp×q), convergence parameter εr, truncation parameter µ, absolute stopping criterion MaxPSVDIter, and MPQRD-BC parameters MaxSweeps, ρ.
2: Let U(z) = Ip, V(z) = Iq, iter = 0, h = 1 + εr and A0(z) = A(z).
3: while iter < MaxPSVDIter and h > εr do
4:   Let h = ‖Anon-diag(z)‖²F / ‖A0(z)‖²F.
5:   if h > εr then
6:     Let iter = iter + 1.
7:     Call [U1(z), R1(z)] = mpqrd_bc(A(z), εr/2, µ, ρ, MaxSweeps).
8:     Let A′(z) = RH1(z−∗) and U(z) = U1(z)U(z).
9:     Call [V1(z), R2(z)] = mpqrd_bc(A′(z), εr/2, µ, ρ, MaxSweeps).
10:    Let A(z) = RH2(z−∗) and V(z) = V1(z)V(z).
11:    Truncate A(z), U(z) and V(z) given µ.
12:  end if
13: end while
14: Let D(z) = A(z).

The structure of MPSVD by MPQRD-BC is obviously identical to the structure of PSVD by PQRD-BC. The behaviour of the modified algorithm is the same as that of PSVD by PQRD-BC, but the parameters have changed slightly. In addition to εr, MPSVD by MPQRD-BC also needs a ρ for the calls to MPQRD-BC. The other parameters have the same function as for PSVD by PQRD-BC.

It is worth noting that MPQRD-BC is called with convergence parameter εr/2. The reason is that half of the non-diagonal energy of the diagonalized matrix is expected to be in each of the triangular parts.

4.5.1 Convergence and Complexity

The convergence proof for MPSVD by MPQRD-BC follows directly from the convergence proofs of PSVD by PQRD-BC (see [4]) and MPQRD-BC. For every flip step, more energy will have been moved to the diagonal matrix element polynomials. As this goes on, eventually a state will be reached where a sufficient ratio of the energy lies on the diagonal elements, thereby satisfying the triangularity error convergence criterion.


Through a similar argument as for PSVD by PQRD-BC, it can be shown that the complexity of MPSVD by MPQRD-BC is

CMPSVD by MPQRD-BC = O(Cf) + O(pqr) + O(p²) + O(q²). (4.14)

The missing O(Cjkt) term is due to the fact that row 4 is replaced in MPSVD by MPQRD-BC. The hidden constant of the first O-term in (4.14) was estimated to be 302.5, as further described in Section 4.6.

4.5.2 Simulations

In this section, some numerical results regarding the properties of MPSVD by MPQRD-BC will be presented. First, the results of the algorithm applied to a single matrix are shown. Secondly, the decomposition quality as a function of input data size and algorithm parameters is studied. Finally, the run time, in terms of coefficient steps, is measured for a set of matrices and algorithm parameter values.

A First Example

MPSVD by MPQRD-BC was applied to the same matrix A(z) ∼ (0, 2, C3×3) as in Section 4.3.2. The parameter values were εr = 10−3, µ = 10−6, ρ = 2, MaxSweeps = 2 and MaxPSVDIter = 10. Recall that the coefficients of the matrix element polynomials were drawn from a zero-mean normalized circularly symmetric Gaussian distribution. Viewing the original matrix as a multi-dimensional FIR filter, its impulse response is shown in Figure 4.8a. The resulting diagonalized matrix can be seen in Figure 4.8b, and the obtained paraunitary matrices U(z), V(z) are plotted in Figures 4.9a and 4.9b.

The algorithm needed 4 flip steps to converge, and altogether 544 coefficient steps. Once the PSVD was obtained, the errors from Section 4.1 were calculated. The computed values can be seen in Table 4.3. The results show that the decomposition is good, and that the matrices U(z), V(z) are close to perfectly paraunitary.

Table 4.3: Errors for PSVD of Matrix A(z) from Section 4.3.2.

Decomposition  Diagonality  Unitarity U  Unitarity V
1.1 · 10−2     4.8 · 10−4   1.4 · 10−2   7.0 · 10−3

Decomposition Quality

In order to measure the impact of the choice of algorithm parameters on the decomposition quality, MPSVD by MPQRD-BC was applied to 100 matrices Av(z) ∼ (0, 2, C3×3), for a set of parameters. The matrices were independently generated with coefficients drawn from a zero-mean normalized circularly symmetric Gaussian distribution.

For every matrix, MPSVD by MPQRD-BC was applied while varying the convergence parameter εr and the truncation parameter µ. The results were then averaged over the 100 realizations.


[Plot data omitted: impulse-response magnitudes per tap and lag. (a) Impulse Response of Original Matrix A(z) ∼ (0, 2, C3×3). (b) Impulse Response of Diagonalized Matrix.]

Figure 4.8: The Original and Diagonalized Matrices Obtained From an MPSVD by MPQRD-BC Run with εr = 10−3, µ = 10−6, ρ = 2.

[Plot data omitted: impulse-response magnitudes per tap and lag. (a) Impulse Response of Paraunitary Matrix U(z). (b) Impulse Response of Paraunitary Matrix V(z).]

Figure 4.9: The Paraunitary Matrices Obtained From MPSVD by MPQRD-BC Applied to the Original Matrix from Figure 4.8a, with εr = 10−3, µ = 10−6, ρ = 2.


The relative decomposition error as a function of εr and µ can be seen in Figure 4.10. The unitarity error of the matrix U(z) is shown in Figure 4.11. The curve for the unitarity error of matrix V(z) is similar to Figure 4.11. Finally, the diagonality error of matrix D(z) can be seen in Figure 4.12.

Clearly, the relative decomposition error decreases with decreasing µ. Interestingly, the relative decomposition error increases for decreasing εr, until it plateaus. This may be because a smaller εr means more coefficient steps, and therefore more matrix truncations, which affect the decomposition negatively.

Figure 4.11 shows that a smaller µ gives a smaller unitarity error, as expected. For a given µ, a larger εr decreases the unitarity error. This probably has the same cause as for the decomposition error: a smaller εr means more iterations and therefore worse unitarity performance.

The last graph, Figure 4.12, shows that the choice of µ has no big impact on the diagonality error, but that εr is directly proportional to the diagonality error. This is natural, since that is how convergence is defined.

Complexity

In order to measure run time performance, in terms of the number of coefficient steps needed for convergence, one hundred matrices were generated. Each matrix had the coefficients of its polynomials drawn from a zero-mean normalized circularly symmetric Gaussian distribution. For every matrix, MPSVD by MPQRD-BC was applied to all principal sub-matrices obtained by removing a number of trailing columns and rows. The run time measurements were averaged over the 100 matrices.

The run time was measured for different input sizes by processing different principal sub-matrices of the original matrices. The results are shown in Figure 4.13. In the figure, it can be seen that the number of iterations behaves linearly for the row and column series, after a particular point. For the rows, this point is at 3, and for the columns it is at 4. The reason is the min(p − 1, q) term in the complexity derivations, which changes abruptly at this point. The last sub-plot shows the number of iterations as a function of the number of polynomial coefficients, or taps. The algorithm parameter values used for the series are shown in Table 4.4.

Table 4.4: MPSVD by MPQRD-BC Parameter Values for Spatial/Temporal Series.

Indp. Dim. Size  Convergence εr  Truncation µ  Factor ρ  MaxSweeps  MaxPSVDIter
3                10−3            10−6          2         10         100

The choice of algorithm parameter values was also investigated for the same 100 matrices. For every matrix, the sub-matrix of the first 3 rows and 3 taps was extracted. MPSVD by MPQRD-BC was then applied using different parameters, and the results were averaged over the 100 matrices. The results can be seen in Figure 4.14. The figure shows that more iterations are needed for decreasing εr. This is intuitive, as more coefficients have to be nulled in order to achieve a lower diagonality error. For a lower µ, the number of iterations goes up until it plateaus. This is because below a certain point the truncation removes no energy, as µ is so small, and further decreasing µ therefore makes no difference.


[Plot data omitted: relative decomposition error (RDE) as a function of algorithm parameters (MPSVD by MPQRD-BC), over truncation parameter µ and convergence parameter εr.]

Figure 4.10: Decomposition Error as a Function of εr and µ, Averaged over 100 Matrices.

[Plot data omitted: unitarity error for U(z) as a function of truncation parameter µ, for εr = 10−1 and εr = 10−3 (MPSVD by MPQRD-BC).]

Figure 4.11: Unitarity Error as a Function of εr and µ, Averaged over 100 Matrices.

[Plot data omitted: diagonality error as a function of convergence parameter εr, for µ = 10−3 and µ = 10−6 (MPSVD by MPQRD-BC).]

Figure 4.12: Diagonality Error as a Function of εr and µ, Averaged over 100 Matrices.


[Plot data omitted: number of iterations as a function of input size (MPSVD by MPQRD-BC), with separate series over rows, columns and taps.]

Figure 4.13: Number of coefficient steps (iterations) needed for convergence, as a function of input matrix size.

[Plot data omitted: number of iterations as a function of algorithm parameters (MPSVD by MPQRD-BC), over convergence parameter εr and truncation parameter µ, for εr ∈ {10−1, 10−3} and µ ∈ {10−3, 10−6}.]

Figure 4.14: Number of coefficient steps (iterations) needed for convergence as a function of algorithm parameters, for a 3 × 3 matrix with 3 lags.


4.5.3 Discussion

The Modified PSVD by Modified PQRD-BC is a straightforward extension of the PSVD by PQRD-BC algorithm. It retains the main behaviour of PSVD by PQRD-BC, but has a slightly altered convergence criterion. Most of the operations performed on the input matrix are in fact performed by MPQRD-BC, and the behaviour of MPSVD by MPQRD-BC therefore relies heavily on the behaviour of MPQRD-BC.

For the given diagonalization strategy, that is, obtaining the PQRD of a sequence of flipped matrices, there is probably not a lot of room for improvement of MPSVD by MPQRD-BC. Rather, any development efforts should probably be spent on MPQRD-BC.

Thanks to its acceptance of relative parameters, MPSVD by MPQRD-BC is a candidate for implementation in a communications system. Chapter 6 presents the results of a study of the performance of MPSVD by MPQRD-BC when applied to the channel diagonalization problem of spatial multiplexing in wireless communications.

4.6 Sampled PSVD vs. SVD in DFT Domain

In this section, the matrices obtained by sampling the polynomial matrices given by MPSVD by MPQRD-BC will be compared with the matrices given by an SVD in the DFT domain. The computational load of the two methods, for a polynomial matrix with fixed spatial dimensions but varying temporal dimension, will also be compared.

4.6.1 Frequency Domain Comparison

The standard 3 × 3 polynomial matrix defined in Section 4.3.2 was decomposed using the MPSVD by MPQRD-BC algorithm with parameters εr = 10−1, µ = 10−6, ρ = 15, MaxSweeps = 10 and MaxPSVDIter = 10. The obtained D(z) matrix was sampled at N = 512 points along the unit circle, and the frequency and phase responses were computed. As a comparison, the original matrix A(z) was also oversampled at N = 512 points along the unit circle. A standard constant SVD was performed for all 512 matrices, and the magnitude and phase of the elements of the sequence of matrices Dk given by the SVDs were plotted in the same diagrams. The frequency response comparison can be seen in Figure 4.15, and the phase response comparison in Figure 4.16. The reason for choosing a relatively large εr was to give a visible difference in the plots.

It is clear from Figure 4.15 that the frequency response of the sampled PSVD matrix follows the magnitude of the sequence of matrices Dk closely. For a smaller εr, the difference would have been smaller. On the other hand, it can be seen in Figure 4.16 that the matrices Dk only have real elements, whereas the matrix D(ejω) is complex. Therefore, the phase response of the sampled PSVD matrix is, at most frequencies, not close to the phase of the sequence of matrices Dk.

4.6.2 Computational Load Comparison, Set-Up Phase

[Plot data omitted: frequency response, Sampled PSVD (dotted) vs. SVD in DFT Domain (solid); power gain (dB) over normalized frequency.]

Figure 4.15: Frequency Response of Sampled PSVD (εr = 10−1) Matrix D(ejω) and Magnitudes of Sequence of SVD Matrices Dk.

As will be shown in Chapter 6, the traditional way of diagonalizing a wideband MIMO channel in both frequency and space is to transform the channel into the frequency domain using the FFT, and then perform an SVD for every sub-carrier. Using an estimate for the computational load (in terms of floating-point operations, flops) of the SVD operation, it is trivial to obtain the computational load for performing an SVD for every sub-carrier. As given by [9, p. 254], an estimate of the number of flops needed to perform an SVD yielding all three decomposition matrices of a p × q matrix is

CSVD,1 = 4p²q + 8pq² + 9q³.

The estimate of the number of flops needed for N SVDs is then simply

CSVD,N = N(4p²q + 8pq² + 9q³). (4.15)

In order to compare the computational load of the N SVDs with that of performing a PSVD in the z-domain, a rough estimate of the computational load (in terms of flops) will be derived. As shown by the simulations in Section 4.5.2, the number of coefficient steps needed for performing a PSVD on a 3 × 3 matrix displays a linear behaviour in the number of taps of the matrix, keeping the spatial dimensions fixed. An affine polynomial was fitted to the complexity measurement data of the temporal series in Section 4.5.2 using least squares. The slope of the obtained curve suggests that

D = 302.5 coefficient steps per lag of the original matrix

which corresponds well to the plot in Figure 4.13. Recall that the temporal measurement series of Section 4.5.2 was performed with the MPSVD by MPQRD-BC parameters of Table 4.4, and therefore corresponds to good decompositions.

The complexity of a coefficient step is estimated by the complexity of two polynomial Givens rotations, which is given by (2.20). This gives

Ccoefstep = 4q(r + 2|t|) ≤ 4q rfinal (4.16)


[Plot data omitted: phase response, Sampled PSVD (dotted) vs. SVD in DFT Domain (solid); phase (rad) over normalized frequency.]

Figure 4.16: Phase Response of Sampled PSVD (εr = 10−1) Matrix D(ejω) and Phases of Sequence of SVD Matrices Dk.

where rfinal is the number of lags of the final matrix. Accounting only for the computational load of the coefficient steps, the estimate of the computational load for performing an MPSVD by MPQRD-BC is then

CMPSVD by MPQRD-BC = Ml D Ccoefstep (4.17)

for a matrix with Ml lags. The estimates of (4.15) and (4.17) were calculated for a sequence of 100 3 × 3 matrices with an increasing number of lags. The comparison was made with four different FFT sizes. The results are shown in Figure 4.17.
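The comparison can be sketched numerically from (4.15)–(4.17). In the snippet below, the fitted constant D = 302.5 and the per-step bound 4q·rfinal from (4.16) are taken from the text, while the particular values of rfinal are illustrative assumptions, since the final lag count depends on the degree growth of the individual run:

```python
def flops_svd_per_subcarrier(p, q, n_sub):
    """Estimated flops for one SVD per sub-carrier, (4.15)."""
    return n_sub * (4 * p**2 * q + 8 * p * q**2 + 9 * q**3)


def flops_mpsvd(q, n_lags, r_final, d=302.5):
    """Estimated flops for MPSVD by MPQRD-BC, (4.16)-(4.17):
    (lags of the input) x (fitted steps per lag) x (cost bound per step)."""
    return n_lags * d * (4 * q * r_final)


p = q = 3
n_svd = flops_svd_per_subcarrier(p, q, 512)          # independent of channel length
psvd_short = flops_mpsvd(q, n_lags=2, r_final=20)    # short channel (assumed r_final)
psvd_long = flops_mpsvd(q, n_lags=100, r_final=400)  # long channel (assumed r_final)
```

For the short channel the PSVD estimate falls below the 512-subcarrier SVD cost, while for the long channel it lies orders of magnitude above it, reproducing the crossover trend of Figure 4.17.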

It can be seen that for increasingly large channel impulse responses, the computational load for performing the PSVD becomes prohibitive compared to performing N SVDs.

4.6.3 Computational Load, Online Phase

The previous section compared the computational load for diagonalizing a wideband MIMO channel. The resulting (para-)unitary matrices are then used for precoding and receive filtering of the symbol stream. In the traditional system, the unitary matrices are given as constant matrices in the frequency domain. It is therefore natural to perform the filtering of the data stream in the frequency domain.

The PSVD approach, on the other hand, results in paraunitary polynomial matrices. The straightforward way of implementing the filtering with these matrices is in the time domain, as the polynomial coefficients directly give the filter impulse response. However, as shown in e.g. [18, p. 10], for long filters it is computationally more advantageous to perform the filtering in the frequency domain. As the PSVD algorithm typically generates paraunitary matrices with significantly higher maximum degrees than the maximum degree of the original channel matrix, this will also be the case here.

[Plot data omitted: computational load comparison; approximate number of flops over the number of taps in the channel impulse response, for MPSVD by MPQRD-BC and constant SVD with 512, 1024, 2048 and 4096 sub-carriers.]

Figure 4.17: Computational Load Comparison Between Performing a PSVD and Performing N SVDs.

Assuming that the two systems use the same number of sub-carriers, the online complexity of the two systems will therefore be the same. The investigation in Section 4.6.2 is thus sufficient for comparing the computational load of the two systems.

4.6.4 Discussion

The fact that the frequency response of the sampled matrix D(ejω) followed the magnitude of the sequence of matrices Dk in Figure 4.15 is reassuring, since it shows that MPSVD by MPQRD-BC gives solutions with the same qualitative behaviour as performing SVDs in the DFT domain. The fact that the phase in Figure 4.16 is not constantly zero shows that the sampled D(ejω) will in general be complex, whereas the constant SVD always produces real singular values.

The computational load comparison performed in Section 4.6.2 is telling. It is obvious that applying MPSVD by MPQRD-BC to a polynomial matrix, and then sampling the approximately diagonal factor D(z) along the unit circle, is not computationally advantageous compared to the traditional approach. In the computation, only the computational load from the application of the polynomial Givens rotations was accounted for. This is the bulk of the work load of the algorithm, but as seen in Section 4.5.1 there are more terms in the complexity expression. The investigation uses estimates of the computational load for performing an MPSVD by MPQRD-BC of a 3 × 3 matrix, and the factor D is calculated from a limited data set, but the results still show the trend.

Furthermore, the N frequency-domain SVDs can easily be parallelized, since they are independent of each other. It does not, however, seem trivial to parallelize the PSVD algorithm.


4.7 Summary

This chapter has presented four algorithms for approximate decompositions of polynomial matrices through coefficient nulling by polynomial Givens rotations. Two of the algorithms presented were modifications of the remaining two. All algorithms perform the approximate decompositions by iteratively applying PGRs, in some way, until a predetermined convergence criterion is satisfied.

Since the decomposition matrices returned by the algorithms are approximations, the decompositions will naturally have some error associated with them. A number of error criteria were defined and described. A decomposition can only be said to be good if all error criteria are low. It may be the case that a decomposition has a low relative decomposition error, but this is of no value if the unitarity or triangularity/diagonality errors of the matrices are high. In order to bring some intuition into what the error criteria mean, two optimization problems were introduced.

A common theme in the algorithms is that the maximum degrees of the polynomials involved may grow fast, because of the properties of the polynomial Givens rotation. A matrix truncation step is therefore introduced at certain places in the algorithms. The truncation removes the outer coefficients of the matrix polynomials, if these are deemed unimportant in the sense that their energy is low. The truncation algorithm used (Algorithm 1) has a complexity of O(pqr²), but this could probably be reduced by implementing some sort of binary search.
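The truncation idea — drop outer coefficient matrices as long as the energy removed stays below a fraction µ of the total — can be sketched as follows. This is an illustrative sketch only, not the thesis's Algorithm 1, and it assumes, for illustration, a polynomial matrix stored as a dict from lag to coefficient matrix:

```python
def lag_energy(C):
    """Energy (squared Frobenius norm) of a single coefficient matrix."""
    return sum(abs(x) ** 2 for row in C for x in row)


def truncate(coeffs, mu):
    """Remove outer lags of a polynomial matrix while the cumulative energy
    removed stays below mu times the total energy. Illustrative sketch only,
    not the thesis's Algorithm 1; coeffs maps lag -> coefficient matrix."""
    total = sum(lag_energy(C) for C in coeffs.values())
    lags = sorted(coeffs)
    removed = 0.0
    while len(lags) > 1:
        # Always try the cheaper of the two outermost lags first.
        lo, hi = lags[0], lags[-1]
        end = lo if lag_energy(coeffs[lo]) <= lag_energy(coeffs[hi]) else hi
        if removed + lag_energy(coeffs[end]) >= mu * total:
            break
        removed += lag_energy(coeffs[end])
        lags.remove(end)
    return {l: coeffs[l] for l in lags}
```

With µ = 0 nothing is ever removed, while a moderate µ strips only the low-energy outer lags; the linear scan from both ends is where a smarter search could cut the cost.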

The potential use of the original algorithms in [4] for spatial multiplexing is hampered by the fact that they have an absolute convergence criterion. The two modified algorithms presented in this chapter deal with this problem by redefining the state of convergence. Using the new definitions, a direct link can be established between the diagonality error measure and the cross-channel interference in the spatial multiplexing system of Chapter 6. A system designer can therefore determine the algorithm parameters based on an intuition of their effect on system capacity.

Finally, it was shown that the computational load becomes prohibitive if the goal of applying the algorithm is to sample the obtained matrices along the unit circle. For the scenario of moderate to large channel impulse responses, it is computationally more feasible to apply an FFT and perform multiple SVDs than to apply the MPSVD by MPQRD-BC algorithm and sample its result along the unit circle.

There are other approaches to polynomial decomposition that do not rely on single-coefficient polynomial Givens rotations. The next chapter will study algorithms for polynomial decomposition obtained from rational Givens rotations that null entire polynomial matrix elements exactly. By this strategy, a PQRD can be obtained with zero triangularity error.


Chapter 5

Rational Decomposition Algorithms: Polynomial Nulling

This chapter will propose two polynomial decomposition algorithms based on the idea of exact polynomial nulling. Rather than nulling polynomial coefficients one at a time, the approach taken in this chapter is to null an entire polynomial matrix element per iteration. In order to preserve the paraunitarity of the polynomial Givens rotation, it has to be extended to allow for rational functions. From a signal processing point of view, this corresponds to filters with infinite impulse responses (IIR). Care has to be taken to ensure the stability of the IIR filters, because their poles are not confined to the origin, as is the case for FIR filters. The notion of exact polynomial nulling used as the foundation of the algorithms to be presented was proposed by Bengtsson in [19].

The QRD algorithm presented in this chapter bears similarities to the rather old triangularization algorithm of [6, p. 33]. There, an arbitrary polynomial matrix is transformed into the Hermite row form, which can be thought of as the analogue of the row echelon form of constant matrices. Elementary row operations are used, and the decomposition obtained is therefore not necessarily the QR decomposition. Henrion in [20] states that the triangularization algorithm of [6] is well known to be impractical due to bad numerical behaviour. A numerically stable algorithm for the triangularization problem is then proposed in [20]. As will be shown, the algorithms of this chapter unfortunately also suffer from numerical instability.

5.1 Rational Givens Rotation

Applying the polynomial Givens rotation (PGR) in (2.18) to a 2 × 1 polynomial vector x(z) = (x1(z) x2(z))T, a specific coefficient of x2(z) will be nulled, and in the process all other coefficients of both element polynomials will be altered. Iterating this behaviour, a state will eventually be reached where the magnitude of the dominant coefficient of x2(z) is less than some constant ε. Convergence in this sense is certain, due to the energy-moving property of the PGR.

If the goal is to null the entire polynomial x2(z), then there are other ways of doing this. In this section, a rational Givens rotation (RGR) will be developed for the purpose of exact polynomial nulling of x2(z). The derivations resemble the steps in [16], where it however is not clear how the denominator factorization step is performed. Let

α(z) = x2(z)
β(z) = x1(z)
γ(z) = γ+(z) (γ+(z−∗))∗ = α(z)α∗(z−∗) + β(z)β∗(z−∗)

where γ(z) = γ+(z) (γ+(z−∗))∗ is the canonical spectral factorization. Assuming that the coefficients of γ(z) are exponentially bounded, and that γ(z) has no unit-circle zeros, the factorization is known to exist and be unique, cf. [21]. A property of the factorization is that γ+(z) is minimum-phase, i.e. its inverse is stable. Now, set up the polynomial matrix

Gf(z) = ( −β∗(z−∗)   −α∗(z−∗)
          α(z)        −β(z)   )    (5.1)

and note that

Gf(z)x(z) = ( −α(z)α∗(z−∗) − β(z)β∗(z−∗),  α(z)β(z) − β(z)α(z) )T = ( −α(z)α∗(z−∗) − β(z)β∗(z−∗),  0 )T.

However, the matrix in (5.1) is not paraunitary, because

GHf(z−∗)Gf(z) = ( α(z)α∗(z−∗) + β(z)β∗(z−∗)   α∗(z−∗)β(z) − α∗(z−∗)β(z)
                  α(z)β∗(z−∗) − α(z)β∗(z−∗)   α(z)α∗(z−∗) + β(z)β∗(z−∗) )
              = ( γ(z)   0
                  0      γ(z) ).

By allowing rational functions, the rational Givens rotation takes the form

Gr(z) = (1/γ+(z)) Gf(z)    (5.2)

which is easily verified to be paraunitary. Recall that 1/γ+(z) is stable since γ+(z) is minimum-phase.

The rational Givens rotation defined by (5.2) is extended to the p × p case by setting up a p × p matrix with γ+(z) on the diagonal elements, and with the elements at the intersections of rows i, j and columns i, j taken from (5.1). Normalizing by γ+(z), the extended RGR is paraunitary. Applying the p × p RGR to a p × q matrix, the element at (i, j) will be exactly nulled.
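In the constant-coefficient (scalar) case, γ(z) reduces to |α|² + |β|², its spectral factor to the positive square root, and (5.2) collapses to an ordinary complex Givens rotation. A small numeric sketch of this special case, checking that the second entry is exactly nulled and that the normalized rotation is unitary:

```python
# Scalar special case of (5.1)-(5.2): alpha = x2, beta = x1,
# gamma = |alpha|^2 + |beta|^2, and gamma^+ = sqrt(gamma).
x1, x2 = 3 + 4j, 1 - 2j
alpha, beta = x2, x1
gamma = abs(alpha) ** 2 + abs(beta) ** 2
g_plus = gamma ** 0.5

# G_f as in (5.1), normalized by gamma^+ to give the unitary G_r.
Gf = [[-beta.conjugate(), -alpha.conjugate()],
      [alpha, -beta]]
Gr = [[e / g_plus for e in row] for row in Gf]

y1 = Gr[0][0] * x1 + Gr[0][1] * x2   # first entry: magnitude sqrt(gamma)
y2 = Gr[1][0] * x1 + Gr[1][1] * x2   # second entry: alpha*x1 - beta*x2 = 0
```

Here gamma = |x1|² + |x2|² = 30, so the rotated vector is (−√30, 0)ᵀ up to rounding. In the polynomial case the same construction requires the canonical spectral factor γ+(z) in place of the scalar square root.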

5.2 PQRD-R: Rational QR Decomposition

With the p × p RGR defined, the Rational PQRD (PQRD-R) algorithm can be stated. The general idea is to apply one RGR for every sub-diagonal matrix element of the matrix A(z) until it is in upper triangular form. Keeping track of the accumulated paraunitary RGRs, and recalling that the inverse of a paraunitary matrix is its parahermitian conjugate, an invertible decomposition is formed. Let Gr,n(z) = (1/cn(z)) Gf,n(z) be the RGR applied at iteration n, and let N be the total number of iterations. Then the decomposition is defined by the rational matrices

QHr(z−∗) = Gr,N(z)Gr,N−1(z) · · · Gr,1(z)
         = (1 / (cN(z)cN−1(z) · · · c1(z))) Gf,N(z)Gf,N−1(z) · · · Gf,1(z)

Rr(z) = Gr,N(z)Gr,N−1(z) · · · Gr,1(z)A(z) = QHr(z−∗)A(z)    (5.3)

because

Qr(z)Rr(z) = Qr(z)QHr(z−∗)A(z) = A(z),  since Qr(z)QHr(z−∗) = I.

Note that Qr(z) is an anti-causal filter with all poles and zeros outside the unit circle. The algorithm operates by separately keeping track of the numerators and the denominator of the rational matrices. As seen from (5.3), this is possible since all involved matrices have the same denominator, or its parahermitian conjugate. A pseudocode representation of PQRD-R can be seen in Algorithm 6.

Algorithm 6 Rational PQRD (PQRD-R)

1: Input polynomial matrix A(z) ∼ (V1, V2, Cp×q).
2: Let QHf(z−∗) = Ip, c(z) = 1.
3: for k = 1 . . . min(p − 1, q) do
4:   for j = (k + 1) . . . p do
5:     Obtain RGR Gr(z) = (1/c(z)) Gf(z) with (α(z), β(z), γ(z)) as a function of (A(z), j, k).
6:     Let A(z) = Gf(z)A(z).
7:     Let QHf(z−∗) = Gf(z)QHf(z−∗).
8:     Let c(z) = c(z)γ+(z).
9:   end for
10:  Let Rf(z) = A(z).
11: end for

The spectral factorization needed for obtaining the RGR at row 5 of Algorithm 6 was implemented using the Kalman filtering approach of [21]. The implementation turned out to be sensitive to the assumption that the function does not have any unit-circle zeros; this problem is discussed further in later sections.
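For intuition about the spectral factorization step, the scalar degree-one case can be solved in closed form. If γ(z) = r1 z + r0 + conj(r1) z⁻¹ with r0 > 2|r1| (so that γ(z) has no unit-circle zeros), then γ+(z) = c0 + c1 z⁻¹ must satisfy |c0|² + |c1|² = r0 and c0 conj(c1) = r1, giving a quadratic for |c1|². The sketch below is an illustrative special case, not the Kalman filtering factorizer of [21]:

```python
import math


def spectral_factor_deg1(r0, r1):
    """Canonical spectral factor of gamma(z) = r1*z + r0 + conj(r1)/z,
    assuming r0 > 2*abs(r1) so gamma(z) has no unit-circle zeros.
    Returns (c0, c1) with gamma^+(z) = c0 + c1/z minimum-phase."""
    disc = math.sqrt(r0 * r0 - 4 * abs(r1) ** 2)
    t = (r0 - disc) / 2.0      # |c1|^2: the smaller root keeps |c1| < |c0|
    c0 = math.sqrt(r0 - t)     # phase convention: c0 real and positive
    c1 = r1.conjugate() / c0   # enforces c0 * conj(c1) = r1
    return c0, c1
```

The returned factor has its zero at −c1/c0 strictly inside the unit circle; as r0 approaches 2|r1| the discriminant goes to zero and the zero approaches the unit circle, which is exactly the regime where the factorization becomes ill-conditioned.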

5.2.1 Simulations

The Rational PQRD was applied to the matrix A(z) ∼ (0, 2, C3×3) defined in Section 4.3.2. The impulse response of the matrix can be seen in Figure 4.1a, and the frequency response is shown in Figure 5.1a. The frequency responses of the approximately upper triangular Rr(z) and the approximately paraunitary QHr(z−∗) are graphed in Figures 5.1b and 5.2a, respectively. It can be seen that Rr(z) indeed is approximately upper triangular; for this particular case the sub-diagonal elements had power gains of around −300 dB, and are therefore not visible in the plot.

Paraunitarity of QHr(z−∗) is not obvious from Figure 5.2a. However, the frequency response of QHr(z−∗)Qr(z), shown in Figure 5.2b, shows that QHr(z−∗) is approximately paraunitary. The phase response was flat, but is not plotted here. The relative decomposition error, defined as the square root of the sum of the squared relative errors in the frequency domain, was 1.9 · 10−27.

The algorithm was also applied to a 4 × 4 matrix. The frequency response of the original matrix can be seen in Figure 5.3a, and the frequency responses of the decomposition factors are shown in Figures 5.3b and 5.4a. The decomposition factor frequency responses are distorted, as is the frequency response of QHr(z−∗)Qr(z) in Figure 5.4b. During the algorithm run, the spectral factorization sub-routine warned about zeros close to the unit circle, so the numerical instability probably stems from the spectral factorization.

5.2.2 Discussion

The PQRD-R algorithm with a Kalman filtering spectral factorization step sometimes converges to a good rational decomposition of a polynomial matrix. The spectral factorization step is sensitive to functions with zeros close to the unit circle, as spectral factors are not guaranteed to exist in that case. This is visible in the plots of the second example, Figures 5.3 and 5.4.

When performing the spectral factorization by solving a Discrete Algebraic Riccati Equation, as described in [21], the same problems arose. The numerical problems therefore seem inherent in the algorithm, rather than in the specific implementation.

For every iteration, an entire matrix element polynomial is nulled using a rational Givens rotation. This is in contrast to the polynomial Givens rotation used in Chapter 4, which only nulled one coefficient at a time. The PQRD-R algorithm finishes in a known number of steps, as opposed to the PQRD algorithms of Chapter 4. This behaviour makes the algorithm easier to analyze; this is however left for future work.


[Figure 5.1: two 3 × 3 grids of magnitude responses; axes: normalized frequency vs. power gain (dB). (a) Frequency Response of Original Matrix A(z) ∼ (0, 2, C^{3×3}). (b) Frequency Response of Upper Triangular Matrix R_r(z).]

Figure 5.1: The Original and Upper Triangular Matrices Obtained From PQRD-R for Matrix A(z) from Section 4.3.2.

[Figure 5.2: two 3 × 3 grids of magnitude responses; axes: normalized frequency vs. power gain (dB). (a) Frequency Response of Paraunitary Matrix Q^H_r(z^−∗). (b) Frequency Response of Q^H_r(z^−∗)Q_r(z).]

Figure 5.2: The Paraunitary Matrix Obtained From PQRD-R for Matrix A(z) from Section 4.3.2.


[Figure 5.3: two 4 × 4 grids of magnitude responses; axes: normalized frequency vs. power gain (dB). (a) Frequency Response of Original Matrix B(z) ∼ (0, 2, C^{4×4}). (b) Frequency Response of Upper Triangular Matrix R_r(z).]

Figure 5.3: The Original and Upper Triangular Matrices Obtained From PQRD-R for a 4 × 4 Matrix.

[Figure 5.4: two 4 × 4 grids of magnitude responses; axes: normalized frequency vs. power gain (dB). (a) Frequency Response of Paraunitary Matrix Q^H_r(z^−∗). (b) Frequency Response of Q^H_r(z^−∗)Q_r(z).]

Figure 5.4: The Paraunitary Matrix Obtained From PQRD-R for a 4 × 4 Matrix.


5.3 PSVD-R by PQRD-R: Rational Singular Value Decomposition

The approach used in Chapter 4, of using a PQRD algorithm to devise a PSVD algorithm, invites the same idea for the rational case. For the coefficient nulling case, convergence is proved thanks to the energy moving property of the polynomial Givens rotation. By iterating long enough, a large enough chunk of energy will have moved to the coefficients belonging to the zero-lag taps of the diagonal matrix elements. In the rational case, one of the off-diagonal parts of the matrix will be completely nulled. During that stage, the other off-diagonal part of the matrix, which in the previous iteration was completely nulled, will be polluted by the nulling of the first off-diagonal part. For every iteration though, the amount of energy in the off-diagonal parts of the matrix decreases.

Algorithm 7 Rational SVD (PSVD-R by PQRD-R)

1: Input polynomial matrix A(z) ∼ (V1, V2, C^{p×q}) and parameter MaxPSVDIter.
2: Let U^H_f(z^−∗) = I_p, V^H_f(z^−∗) = I_p, c(z) = 1, d(z) = 1.
3: Let iter = 0.
4: while iter < MaxPSVDIter do
5:   Let iter = iter + 1.
6:   Call [U^H_{f,1}(z^−∗), R_{f,1}(z), c_1(z)] = pqrd_r(A(z)).
7:   Let U^H_f(z^−∗) = U^H_{f,1}(z^−∗) U^H_f(z^−∗), c(z) = c_1(z)c(z).
8:   Let A′(z) = R^H_{f,1}(z^−∗).
9:   Call [V^H_{f,1}(z^−∗), R_{f,2}(z), d_1(z)] = pqrd_r(A′(z)).
10:  Let V^H_f(z^−∗) = V^H_{f,1}(z^−∗) V^H_f(z^−∗), d(z) = d_1(z)d(z).
11:  Let A(z) = R^H_{f,2}(z^−∗).
12: end while
13: Let D_f(z) = A(z).

5.3.1 Simulations

Two results are presented: the decomposition after two and three iterations of the PSVD-R by PQRD-R algorithm applied to the matrix A(z). The frequency response of the matrix can be seen in Figure 5.5. After two iterations, the decomposition does not result in a perfectly diagonal matrix, but still has a low decomposition error. After three iterations, on the other hand, the decomposition is severely distorted, due to numerical instability.

The frequency responses of the decomposition and the approximately diagonal matrix, after 2 iterations of PSVD-R by PQRD-R, can be seen in Figures 5.6a, 5.6b. For the same decomposition, the approximate paraunitarity of the matrices U^H_r(z^−∗), V^H_r(z^−∗) is shown in Figures 5.7a, 5.7b. The decomposition is good, but the (2,1) element of the approximately diagonal matrix still has a significant magnitude. The (1,2) element of the same matrix is almost zero, and is not visible in the plot.

Letting the algorithm do one more iteration, the decomposition frequency response is shown in Figure 5.8a. The plot is distorted, and the decomposition is clearly incorrect. The same effect can be seen in Figures 5.8b, 5.9a, 5.9b. This result shows that the PSVD-R by PQRD-R algorithm is not stable for this particular example. It seems that it is the spectral factorization step that causes the problems.

[Figure 5.5: a 2 × 2 grid of magnitude responses; axes: normalized frequency vs. power gain (dB).]

Figure 5.5: Frequency Response of Matrix A(z).

5.3.2 Discussion

The PSVD-R by PQRD-R algorithm uses the PQRD-R in a straightforward way. For every iteration, the PQRD-R is applied to a flipped set of matrices. Due to the nature of the PQRD-R, the output matrix will be perfectly upper triangular. The output of the PSVD-R by PQRD-R will not be perfectly diagonal however, because every subsequent application of the PQRD-R pollutes the previously perfectly triangular matrix.

When the algorithm remains stable, the resulting decomposition is good, but the approximately diagonal matrix will have a sub-diagonal section which is dominant compared to the super-diagonal section. The algorithm is shown to easily break down though, after which it is not meaningful to talk about decomposition quality.

5.4 Summary

Exact polynomial nulling is an interesting idea for polynomial matrix decomposition algorithms. Since the goal is to obtain matrices with some part completely zeroed out, it seems more straightforward to zero all matrix elements in that part directly, rather than to iteratively zero all coefficients of those matrix elements. The growth in degrees of the involved polynomials could be analyzed theoretically, since the number of steps is predetermined. The algorithms of Chapter 4 are also deterministic, but harder to analyze due to their dependence on the distribution of the initial polynomial coefficients.

The algorithms in this chapter were shown to work for some examples, and shown to break down for others. They are not viable alternatives to the algorithms of Chapter 4, unless numerical stability can be guaranteed. It is left for future efforts how to improve the


[Figure 5.6: two 2 × 2 grids of magnitude responses; axes: normalized frequency vs. power gain (dB). (a) Frequency Response of Decomposition U_r(z)D_r(z)V^H_r(z^−∗). (b) Frequency Response of Diagonal Matrix D_r(z).]

Figure 5.6: The Decomposition and Diagonal Matrices Obtained From PSVD-R after 2 iterations.

[Figure 5.7: two 2 × 2 grids of magnitude responses; axes: normalized frequency vs. power gain (dB). (a) Frequency Response of U^H_r(z^−∗)U_r(z). (b) Frequency Response of V^H_r(z^−∗)V_r(z).]

Figure 5.7: The Paraunitary Matrices Obtained From PSVD-R after 2 iterations.


[Figure 5.8: two 2 × 2 grids of magnitude responses; axes: normalized frequency vs. power gain (dB). (a) Frequency Response of Decomposition U_r(z)D_r(z)V^H_r(z^−∗). (b) Frequency Response of Diagonal Matrix D_r(z).]

Figure 5.8: The Decomposition and Diagonal Matrices Obtained From PSVD-R after 3 iterations.

[Figure 5.9: two 2 × 2 grids of magnitude responses; axes: normalized frequency vs. power gain (dB). (a) Frequency Response of U^H_r(z^−∗)U_r(z). (b) Frequency Response of V^H_r(z^−∗)V_r(z).]

Figure 5.9: The Paraunitary Matrices Obtained From PSVD-R after 3 iterations.


numerical issues of the algorithms.

The rational Givens rotation is a rational function because of the normalization step. If this step were to be omitted, the associated rotation matrix would not be paraunitary, and the subsequent decompositions would not be of PQRD/PSVD form. Instead of being paraunitary, the involved matrices would take the form of a paraunitary matrix times a scalar polynomial. In, for instance, the communications application, such a decomposition could still be used for channel diagonalization. However, it is not the normalization step per se which is the cause of the numerical instability of the algorithms. It is the spectral factorization step, and this step is necessary when setting up rotation matrices of dimensions larger than 2 × 2.

Another aspect of the decompositions presented in this chapter is that the paraunitary matrices involved in the rational decompositions will have unstable inverses. This is easily seen, as the paraunitary matrices are normalized with the spectral factor having all poles within the unit circle. Taking the paraconjugate transpose of such a matrix will yield a matrix with all poles outside the unit circle, i.e. an antistable filter.


Chapter 6

Polynomial SVD for Wideband Spatial Multiplexing

This chapter will introduce the polynomial decomposition algorithms of Chapter 4 into a communications system framework. The performance will be compared to a sub-carrier SVD based approach, called SM for MIMO-OFDM [11, p. 186] or MIMO-DMMT [22] in the literature. Similar attempts to characterize polynomial decomposition based communications systems have been made in [23, 24, 25]. Those papers focused on bit error rates, rather than the sum rate performance measure employed in this chapter.

For a deterministic narrowband MIMO channel, as shown in Section 6.1.1, channel capacity can be obtained by diagonalizing the channel using the SVD. In effect, the channel is transformed so that multiple spatial data streams can be transmitted without interference between them, hence the name spatial multiplexing. For an analogous wideband scenario, a similar approach named SM for MIMO-OFDM can be taken. The channel is then first diagonalized in frequency using the OFDM technique, and then the SVD is applied to diagonalize the channel in space. As the number of sub-carriers grows large, this procedure has a high computational load. Alternative strategies, such as performing the SVD in the z-domain using the polynomial decomposition algorithms of previous chapters, are therefore interesting.

The achievable rate study in this chapter is performed in the frequency domain. However, the PSVD algorithms result in filter representations in the z-domain. The implementation of the precoding and receive filtering can therefore easily be performed in the time domain, if that is deemed appropriate.

The main assumption for our investigation is that the fading is slow, so that the channel maintains its state during the transmission of each block of data. Additionally, the transmitter is assumed to have perfect knowledge of the channel state. The channel is typically estimated at the receiver, and subsequently fed back to the transmitter. For a slowly varying channel, it is feasible to assume that the feedback can occur before the channel has changed. The perfect channel state knowledge assumption simplifies the derivations, but it would not be reasonable for a practical system.

Furthermore, the loss in spectral efficiency due to the cyclic prefix of OFDM is neglected in the capacity and rate formulations.


6.1 Generic System Model

6.1.1 Narrowband Scenario

A signal vector s ∈ C^{Mt×1} with covariance matrix E(ss^H) = P is to be sent from the transmitter to the receiver. Then for an arbitrary narrowband channel H ∈ C^{Mr×Mt}, our model for the received signal is

r = Hs + n    (6.1)

where n is a white Gaussian noise vector with covariance E(nn^H) = R_n = σ_n^2 I. The mutual information between the stochastic variables s and r is then [14]

I(s; r) = log|I + R_n^{-1} H P H^H| = log|I + (1/σ_n^2) H P H^H|

which, for a given P, is maximized if s is circularly symmetric Gaussian [14].

Now, the channel capacity of (6.1) for a given maximum transmit power Es/Mt is defined as the solution to the following optimization problem [11, p. 65]:

max_P  log|I + (1/σ_n^2) H P H^H|
subject to  tr(P) ≤ Es/Mt.    (6.2)

Spatial Multiplexing

The goal of spatial multiplexing is to obtain sum rates equal to the MIMO channel capacity as given by (6.2). The name spatial multiplexing comes from the fact that this sum rate maximization is achieved by diagonalizing the MIMO channel into a set of M = min(Mr, Mt) parallel spatial modes. In order to maximize the sum rate, the transmit powers are then optimized for the diagonalized system. First, we will modify the definition of the SVD from Section 2.2.2 slightly. For some unitary matrices U ∈ C^{Mr×M}, V ∈ C^{Mt×M} and a non-negative diagonal matrix D ∈ R^{M×M}, the compact SVD of the channel matrix H ∈ C^{Mr×Mt} is

H = U D V^H.    (6.3)

The compact SVD can be obtained from the full SVD by removing the last Mt − M columns of U and V, and removing the corresponding rows and columns of D. Assuming that the channel H, or at least the right singular vectors V, are known at the transmitter side, precoding and receive filtering matrices can be chosen such that

s = V Q^{1/2} x
y = U^H r.

The covariance of the signal vector x is assumed to be the identity matrix, but in order to be able to change the transmit powers the extra factor Q^{1/2} is added. With these operations in place, the communications process (6.1) can be written

y = U^H r = U^H (Hs + n) = U^H H V Q^{1/2} x + U^H n

which, denoting the filtered noise w = U^H n and plugging in the SVD of H = U D V^H, is equivalent to

y = U^H U D V^H V Q^{1/2} x + w = D Q^{1/2} x + w.    (6.4)

Note that thanks to the unitary nature of U and V, the total transmit power is defined solely by Q = Q^{1/2} Q^{1/2,H}, since

tr(E((V Q^{1/2} x)(V Q^{1/2} x)^H)) = tr(V Q^{1/2} I Q^{1/2,H} V^H) = tr(Q)

using the tr(AB) = tr(BA) property of the trace operator. Similarly, the filtered received noise covariance is R_w = U^H σ_n^2 I U = σ_n^2 I. A block diagram of the system can be seen in Figure 6.1.

[Figure 6.1: block diagram: b(n) → demultiplex into x(1), . . . , x(M) → Q^{1/2} → V → s → channel H (with additive noise n) → r → U^H → y → detector → multiplex → b̂(n).]

Figure 6.1: Narrowband Spatial Multiplexing System.
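The diagonalization (6.4) is straightforward to verify numerically for a constant channel. A Python/NumPy sketch (illustrative, not thesis code):

```python
import numpy as np

rng = np.random.default_rng(1)
Mr, Mt = 3, 2
H = rng.standard_normal((Mr, Mt)) + 1j * rng.standard_normal((Mr, Mt))

# Compact SVD (6.3): U is Mr x M, V is Mt x M, D is M x M with M = min(Mr, Mt).
U, d, Vh = np.linalg.svd(H, full_matrices=False)
V = Vh.conj().T
D = np.diag(d)

# Precoding with V and receive filtering with U^H yields the diagonal model (6.4):
effective = U.conj().T @ H @ V     # equals D up to numerical precision
```

The trace identity tr(V Q V^H) = tr(Q), which makes the transmit power depend on Q alone, can be checked on the same V.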

Now plugging in (6.3) and substituting P = V Q V^H into (6.2), the optimization problem is transformed into

max_Q  log|I + (1/σ_n^2) D^2 Q|
subject to  tr(Q) ≤ Es/Mt    (6.5)

where the |I + AB| = |I + BA| rule also was used. Expanding the determinant in terms of its argument's eigenvalues, and using the product rule for logarithms, the cost function of (6.5) can be reformulated as

log(∏_{i=1}^{M} (1 + (1/σ_n^2) λ_i(D^2 Q))) = Σ_{i=1}^{M} log(1 + (1/σ_n^2) λ_i(D^2 Q))    (6.6)

where λ_i(A) denotes the i-th eigenvalue of A. As shown by [14], equation (6.6) is maximized for a diagonal Q, and therefore D^2 Q is diagonal as well. Let

Q = diag(γ_1, . . . , γ_{Mt}) = (diag(√γ_1, . . . , √γ_{Mt}))^2 = Q^{1/2} Q^{1/2,H}.

The eigenvalues of a diagonal matrix are the diagonal entries, and on that account the final form of the optimization problem is:

max_{γ_i}  Σ_{i=1}^{M} log(1 + γ_i D_{ii}^2 / σ_n^2)
subject to  Σ_{i=1}^{M} γ_i ≤ Es/Mt.    (6.7)


Finally, problem (6.7) can be solved for the optimal γ_i's by the waterfilling algorithm, see e.g. [11, p. 68]. In summary, by applying the precoder/receive filter matrices given by the SVD of the channel matrix, a capacity-achieving system can be implemented.
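The waterfilling solution of (6.7) can be sketched as follows, using the standard active-set construction; the helper `waterfill` is illustrative and not from the thesis:

```python
import numpy as np

def waterfill(gains, total_power):
    """Water-filling over parallel channels: maximize sum(log(1 + g_i * p_i))
    subject to sum(p_i) <= total_power, p_i >= 0.
    `gains` holds g_i = D_ii^2 / sigma_n^2 for each spatial mode."""
    g = np.asarray(gains, dtype=float)
    g_sorted = np.sort(g)[::-1]                # strongest modes first
    # Find the largest number k of active modes with a consistent water level mu.
    for k in range(len(g), 0, -1):
        mu = (total_power + np.sum(1.0 / g_sorted[:k])) / k   # candidate water level
        if mu - 1.0 / g_sorted[k - 1] >= 0.0:                 # weakest active mode ok?
            break
    return np.maximum(mu - 1.0 / g, 0.0)       # inactive modes get zero power

p = waterfill([4.0, 1.0], 1.0)   # strong mode gets more power: [0.875, 0.125]
```

For very weak modes the water level falls below 1/g_i and the mode is switched off entirely, which is the behaviour exploited in the rate simulations later in this chapter.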

6.1.2 Wideband Scenario

For the wideband scenario, the Mr × Mt MIMO channel, constant for a transmission block, with L channel taps is described by

H(z) = Σ_{l=0}^{L−1} H_l z^{−l}    (6.8)

where the H_l matrix represents the l-th tap of the filter. For a sequence of signal vectors s(m) ∈ C^{Mt×1}, m = 1, 2, . . ., that are launched onto the channel, the sequence of received symbol vectors is then represented by the system model

r(m) = Σ_{n=m−L+1}^{m} H_{m−n} s(n) + n(m)

or equivalently in the z-domain

r(z) = H(z)s(z) + n(z). (6.9)

The noise process n(m) is assumed to be spatially and temporally white, such that E(n(m) n^H(k)) = R_n δ(m − k) = σ_n^2 I δ(m − k).

Assuming a transmit covariance P(z), according to [26, p. 25], the channel capacity is given by the solution to:

max_{P(z)}  ∫_0^{2π} log|I + R_n^{−1}(e^{jω}) H(e^{jω}) P(e^{jω}) H^H(e^{jω})| dω
subject to  ∫_0^{2π} tr(P(e^{jω})) dω ≤ Es/Mt.    (6.10)

6.2 SM by MIMO-OFDM: SVD in the DFT Domain

SM by MIMO-OFDM is the typical system proposed in the literature for spatial multiplexing over wideband channels [27, 11]. It is conceptually simple; the channel matrix is diagonalized using the FFT in frequency, and using the SVD in space. The sum rate of the system can then be maximized by selecting the transmit powers appropriately, for the set of independent parallel channels provided by the FFT–SVD pair.

6.2.1 Specific System Model

The specific system model for SM by MIMO-OFDM is based on the generic system model for the narrowband system, but treats several parallel narrowband channels obtained from the OFDM processing. Through the application of the FFT/IFFT and the addition and removal of the cyclic prefix, as described for the SISO case in Section 3.3, the wideband system model (6.9) is transformed into a set of N parallel narrowband channels

r_k = H_k s_k + n_k,  k = 0, . . . , N − 1    (6.11)


where the covariance of the noise remains R_{n_k} = σ_n^2 I due to the unitarity of the FFT/IFFT matrices. As for the narrowband case, obtaining the SVD H_k = U_k D_k V_k^H and precoding and receive filtering such that

s_k = V_k Q_k^{1/2} x_k
y_k = U_k^H r_k

transforms (6.11) into

y_k = D_k Q_k^{1/2} x_k + w_k,  k = 0, . . . , N − 1    (6.12)

and as per usual R_{w_k} = R_{n_k} = σ_n^2 I.

In order to derive the global communication process mutual information expression, we stack the frequency vectors according to

x = (x_0^T x_1^T . . . x_{N−1}^T)^T
y = (y_0^T y_1^T . . . y_{N−1}^T)^T
w = (w_0^T w_1^T . . . w_{N−1}^T)^T

so that y and w are MrN × 1 vectors and x is an MtN × 1 vector. By defining the block diagonal matrices

D = diag{D_k}_{k=0}^{N−1}
Q^{1/2} = diag{Q_k^{1/2}}_{k=0}^{N−1}

the global communication process can be written as

y = D Q^{1/2} x + w.    (6.13)

With this fully diagonalized system, the optimal detector takes the form of multiple SISO detectors.
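Computing the per-carrier channels H_k = H(e^{j2πk/N}) = Σ_l H_l e^{−j2πkl/N} amounts to an FFT along the tap axis (zero-padded to N), after which one SVD per sub-carrier yields the U_k, D_k, V_k of (6.12). A Python/NumPy sketch (illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
Mr, Mt, L, N = 2, 3, 3, 8
H_taps = rng.standard_normal((L, Mr, Mt)) + 1j * rng.standard_normal((L, Mr, Mt))

# H_k = sum_l H_l exp(-j 2 pi k l / N): one FFT along the tap axis gives all
# N narrowband channels at once (NumPy's FFT uses this sign convention).
Hk = np.fft.fft(H_taps, n=N, axis=0)                  # shape (N, Mr, Mt)

# One compact SVD per sub-carrier, batched over the leading axis.
Uk, dk, Vhk = np.linalg.svd(Hk, full_matrices=False)  # (N,2,2), (N,2), (N,2,3)
```

This is the computational load referred to in the chapter introduction: it grows linearly with the number of sub-carriers N, since one SVD is needed per carrier.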

6.2.2 Capacity

With the system diagonalized over space and frequency in (6.13), and neglecting the loss in spectral efficiency due to the cyclic prefix, the rate is given by (6.7). Maximizing the rate means solving the optimization problem

max_{γ_ij}  (1/N) Σ_{i=1}^{M} Σ_{j=1}^{N} log(1 + γ_ij D_{ij,ij}^2 / σ_n^2)
subject to  (1/N) Σ_{i=1}^{M} Σ_{j=1}^{N} γ_ij ≤ Es/Mt    (6.14)

where the normalization factor N is due to the N sub-carriers. Problem (6.14) is of the same form as problem (6.7), and can therefore be solved using waterfilling. The solution to (6.14) is the same as the solution to (6.10), and hence SM by MIMO-OFDM with SVD in the DFT domain is capacity-achieving.


6.3 SM by MIMO-OFDM: SVD in the z-Domain

Another approach to obtaining the V_k, U_k matrices could be to sample the matrices given by the PSVD algorithms of Chapter 4 along the unit circle. This is effectively the FFT–SVD pair of SM by MIMO-OFDM, but taken in the opposite order. The FFT is applied in order to sample the U(z), V(z) matrices along the unit circle.

If the PSVD algorithms of Chapter 4 were to produce perfect decompositions, the two approaches would be equivalent. Instead, the decomposition takes the form

H_0(z) = U(z) D(z) V^H(z^{−∗}) + M(z) = H(z) + M(z)    (6.15)

where H_0(z) is the original matrix, H(z) = U(z) D(z) V^H(z^{−∗}) is the approximation given by the PSVD, and M(z) is the associated error. For future reference, define the absolute unitarity errors such that

U^H(z^{−∗}) U(z) = I + U_e(z)    (6.16)

V^H(z^{−∗}) V(z) = I + V_e(z).    (6.17)

Now sampling the involved matrices at N points along the unit circle, they take the form:

U_k = U(e^{j2πk/N}),  V_k = V(e^{j2πk/N})    (6.18)
D_k = D(e^{j2πk/N}),  M_k = M(e^{j2πk/N})    (6.19)
U_{e,k} = U_e(e^{j2πk/N}),  V_{e,k} = V_e(e^{j2πk/N})    (6.20)

for k = 0, . . . , N − 1.

6.3.1 Specific System Model

Given a wideband channel of the form (6.8), precoding and receive filtering with the filters obtained from the PSVD gives the system model

y(z) = U^H(z^{−∗}) H_0(z) V(z) Q^{1/2}(z) x(z) + w(z),  w(z) = U^H(z^{−∗}) n(z)    (6.21)

in the z-domain. Note that due to the decomposition and unitarity errors, the channel will neither be perfectly diagonalized, nor will the filtered noise be temporally or spatially white.

Applying the IFFT/FFT operations at the transmitter/receiver sides, (6.21) is represented by

y_k = U_k^H H_{0,k} V_k Q_k^{1/2} x_k + w_k,  k = 0, . . . , N − 1.

Plugging in (6.15) and using (6.16) – (6.20) transforms the model for sub-carrier k into

y_k = (U_k^H H_k V_k + U_k^H M_k V_k) Q_k^{1/2} x_k + w_k
    = (U_k^H U_k D_k V_k^H V_k + U_k^H M_k V_k) Q_k^{1/2} x_k + w_k
    = ((I + U_{e,k}) D_k (I + V_{e,k}) + U_k^H M_k V_k) Q_k^{1/2} x_k + w_k
    = (D_k + U_{e,k} D_k + D_k V_{e,k} + U_{e,k} D_k V_{e,k} + U_k^H M_k V_k) Q_k^{1/2} x_k + w_k

which by letting E_k = U_{e,k} D_k + D_k V_{e,k} + U_{e,k} D_k V_{e,k} + U_k^H M_k V_k is equivalent to

y_k = (D_k + E_k) Q_k^{1/2} x_k + w_k,  k = 0, . . . , N − 1.    (6.22)

Even though D_k is not perfectly diagonal, and E_k has no general structure, the modified channel is in some sense close to being diagonal. It is therefore similar in form to (6.12). Through the same set of transformations, the model (6.22) can be written on the aggregate form

y = F Q^{1/2} x + w

where F = diag{F_k}_{k=0}^{N−1}, F_k = D_k + E_k, and the other entities are defined as in Section 6.2.1.

In order to leverage the information about the cross-channel interference, a vector detector should be used. This type of system will be denoted PSVD-V, where the V stands for Vector receiver.

From the vector relation (6.22), the input-output relation for sub-channel i on sub-carrier k can be written

y_{i,k} = √γ_{i,k} [F_k]_{ii} x_{i,k} + Σ_{j=1, j≠i}^{M} √γ_{j,k} [F_k]_{ij} x_{j,k} + w_{i,k}    (6.23)

where the second term is the cross-channel interference. For the SM by MIMO-OFDM with DFT domain SVD system, it is optimal to use a set of separate SISO detectors for every space/frequency sub-stream. Doing so for the set of sub-streams in (6.23) will be sub-optimal however, because the system is not diagonalized in space. If we were to use a set of separate SISO detectors, the performance of a given sub-stream would be susceptible to the interference caused by other sub-streams. On the other hand, the decoding complexity would be lower than for the vector receiver. A system with a set of separate SISO detectors will henceforth go under the name PSVD-S.

[Figure 6.2: block diagrams. (a) Vector Detector: y_1, . . . , y_M enter one joint detector producing x̂_1, . . . , x̂_M. (b) SISO Detectors: each y_i enters its own detector producing x̂_i.]

Figure 6.2: Comparison Between Vector and SISO Detector.

A visual comparison between the PSVD-V and the PSVD-S receiver set-ups can be seen in Figure 6.2.


6.3.2 Achievable Rate

Assuming that a vector receiver is employed (PSVD-V), the achievable rate of the system (6.21) is given by the objective function of (6.10). Plugging in the appropriate entities, the rate is then

S_PSVD-V = ∫_0^{2π} log|A| dω

A = I + (1/σ_n^2) (U^H(e^{jω}) U(e^{jω}))^{−1} U^H(e^{jω}) H(e^{jω}) V(e^{jω}) Q(e^{jω}) V^H(e^{jω}) H^H(e^{jω}) U(e^{jω})    (6.24)

which can be maximized over Q(z) under e.g. a sum transmit power constraint.

For the PSVD-S system, the achievable rate is the sum of the achievable rates for all sub-streams (6.23). It is given by

S_PSVD-S = (1/N) Σ_{i=1}^{M} Σ_{k=1}^{N} log(1 + γ_{i,k} |[F_k]_{ii}|^2 / (Σ_{j=1, j≠i}^{M} γ_{j,k} |[F_k]_{ij}|^2 + R_{w_{i,k}})).    (6.25)

6.4 Simulations

In this section, some numerical results regarding the sum rates attained by the different transmission schemes will be presented. The channel capacity (6.14) and the achievable sum rates (6.24), (6.25) were calculated for a variety of channels, received SNRs, and MPSVD by MPQRD-BC parameter values.

6.4.1 Method

The system model defined in (6.9) was used. The channel H(z) was modeled using the structure

H(z) = Σ_{l=0}^{L−1} H_l e^{−ψ(l−1)} z^{−l}    (6.26)

where the elements of the H_l's were drawn from a zero-mean circularly symmetric normalized Gaussian distribution, and the exponential factor, with ψ ∈ R+, was added to give the channel an exponentially decaying power-delay profile. This is a simple channel model, but adequate for our needs, since the purpose of the study is to compare the transmission schemes rather than to evaluate the absolute capacity of the channel.

For every channel realization, the channel capacity given by (6.14) with N = 512 was calculated. The PSVD-V rate (6.24) was calculated in the same fashion, using transmit powers obtained from the waterfilling algorithm assuming no cross-channel interference and white noise. For good decompositions, this is a reasonable assumption.

Finally, the PSVD-S rate (6.25) was computed. Again, the transmit powers were selected assuming no CCI and white noise. This assumption is weak, which means that the achievable rate may not be close to the channel capacity. It is still interesting to study this case, because the transceiver structure is the same as for SM by MIMO-OFDM with a DFT domain SVD.


The signal-to-noise ratio at the receiver was calculated for a white reference signal s′(z) with total power Es/Mt. The received SNR was then

SNR = ∫_0^{2π} [ E((H(e^{jω}) s′(e^{jω}))^H H(e^{jω}) s′(e^{jω})) / E(n^H(e^{jω}) n(e^{jω})) ] dω    (6.27)

where the denominator is

E(n^H(e^{jω}) n(e^{jω})) = E(tr(n(e^{jω}) n^H(e^{jω}))) = tr(E(n(e^{jω}) n^H(e^{jω}))) = tr(R_n(e^{jω})) = M_r σ_n^2

and the numerator

E((

H(ejω)s′(ejω))H

H(ejω)s′(ejω))

= E(s′H

(ejω)HH(ejω)H(ejω)s′(ejω))

=

E(

tr(H(ejω)s′(ejω)s′

H(ejω)HH(ejω)

))= E

(tr(HH(ejω)H(ejω)s′(ejω)s′

H(ejω)

))=

tr(E(HH(ejω)H(ejω)

)E(s′(ejω)s′

H(ejω)

))= tr

(E(HH(ejω)H(ejω)

) EsMt

I

)=

EsMt

tr(E(HH(ejω)H(ejω)

))=EsMt‖H(ejω)‖2F

so that

SNR = (Es / (σ_n^2 M_r M_t)) ∫_0^{2π} ‖H(e^{jω})‖_F^2 dω = (2π Es / (σ_n^2 M_r M_t)) ‖H(z)‖_F^2.    (6.28)

6.4.2 Results

The first simulation compared the sum rate of the PSVD-V system to the channel capacity, for a varying number of rows of the channel matrix. The simulated CDFs for the spatial matrix sizes 2 × 3, 3 × 3, 4 × 3 can be seen in Figure 6.3. The rate CDF of the PSVD-V system is clearly very close to the capacity CDF. The increase in capacity going from a 2 × 3 to a 3 × 3 channel is larger than when going from a 3 × 3 to a 4 × 3 channel, because the former adds another spatial mode, whereas the latter only increases the magnitude of the singular values.

The sum rates for channels with different impulse response lengths were compared through the simulated CDFs. The comparison between an L = 2 and an L = 5 channel can be seen in Figure 6.4. The sum rate of the PSVD-V system is here also close to the channel capacity. Interestingly, the two curves intersect, so that the outage capacity of the L = 5 channel is better than that of the other channel for codes with low rates. For codes with higher rates, the converse holds.

Several other simulations of the PSVD-V system were performed, but the plots are not shown here. For all simulations, the results suggested that the PSVD-V system performs close to the channel capacity, regardless of the spatial/temporal size of the channel or the PSVD algorithm parameter values.

In addition to the CDF results, Figure 6.5 shows the average sum rate of the PSVD-V system obtained for 30 channel realizations. The average sum rate was close to the channel capacity, and therefore only the sum rate is shown in the graph. It can be seen that the system achieves a multiplexing gain proportional to min(Mr, Mt) in the high SNR region.
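The multiplexing gain can be illustrated directly from the narrowband capacity expression: at high SNR, each doubling of the SNR adds roughly one bit per channel use per non-zero singular value, i.e. min(Mr, Mt) bits in total. A minimal sketch, assuming equal power allocation and an arbitrary random channel:

```python
import numpy as np

rng = np.random.default_rng(2)
Mr, Mt = 2, 3
H = rng.standard_normal((Mr, Mt)) + 1j * rng.standard_normal((Mr, Mt))
lam = np.linalg.svd(H, compute_uv=False) ** 2   # squared singular values

def capacity(snr):
    # Narrowband MIMO capacity with equal power over the Mt antennas
    return np.sum(np.log2(1.0 + snr / Mt * lam))

# At high SNR, doubling the SNR adds about min(Mr, Mt) bits per channel use
gain = capacity(2 ** 21) - capacity(2 ** 20)
# gain is approximately min(Mr, Mt) = 2 here
```

The slope of capacity versus log2(SNR) is the multiplexing gain; the weaker singular values only shift the curve, they do not change the slope.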

63


Figure 6.3: Simulated Sum Rate CDFs (Vector Receiver) for Various Spatial Channel Configurations. The channel length was L = 3 and MPSVD by MPQRD-BC parameters were set to εr = 10−3, µ = 10−6, ρ = 5 and received SNR = 15 dB.

Figure 6.4: Simulated Sum Rate CDFs (Vector Receiver) for 3 × 3 Channels with L ∈ {2, 5}. MPSVD by MPQRD-BC parameters were set to εr = 10−3, µ = 10−6, ρ = 5 and received SNR = 15 dB.

64


Figure 6.5: Sum Rate (Vector Receiver) Averaged over 100 Channel Realizations. The channel length was L = 3 and MPSVD by MPQRD-BC parameters were set to εr = 10−3, µ = 10−6, ρ = 5.

The sum rate CDFs for the PSVD-S system are shown in Figures 6.6 and 6.7, varying the convergence parameter εr and the truncation parameter µ of the MPSVD by MPQRD-BC algorithm. It can be seen that the choice of εr and µ has a direct effect on the achievable rate of the system. Selecting a smaller εr reduces the amount of filter energy in the off-diagonal part of the diagonalized matrix, which means less cross-channel interference. Because the PSVD-S system does not exploit the information about the cross-channel interference, performance increases as the cross-channel interference decreases. Reducing µ results in a better decomposition, which means better performance. These effects were less visible in the low SNR region, and therefore the high SNR results are shown.

The PSVD-S system is sensitive to the choice of values for the MPSVD by MPQRD-BC parameters εr and µ. This is clear in Figures 6.8 and 6.9, where decreasing either parameter results in higher average sum rates. Clearly, the impact of a bad channel decomposition is greatest in the high SNR region.

6.5 Summary

This chapter presented the channel capacity expression for a wideband channel, and showed how SM by MIMO-OFDM with a DFT-domain SVD can achieve the capacity. Another approach for obtaining the SVD matrices was introduced, using the PSVD algorithms of the previous chapters. The precoding/receive filtering matrices given by the PSVD algorithms have some error to them, and hence they do not perfectly diagonalize the channel.

Because the PSVD approach does not perfectly diagonalize the channel, a vector detector that takes the cross-channel interference into account is needed in order to maximize the sum rate of the system (PSVD-V). A property of the perfectly diagonalized system is that the optimal detector takes the form of a set of separate SISO detectors, one per stream; a setup with lower decoding complexity than a vector detector. Therefore, the sum rate of a semi-diagonalized system with separate SISO detectors was also derived (PSVD-S).
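The effect of the separate SISO detectors can be sketched by treating the residual CCI as additional noise in a per-stream rate expression. The function below is illustrative only; the matrix G of effective per-stream power gains and all other names are assumptions for the sketch, not quantities from the thesis.

```python
import numpy as np

def siso_rates(G, powers, noise_var):
    """Per-stream rates when each stream is decoded separately and the
    residual cross-channel interference (CCI) is treated as noise.

    G[i, j] = effective power gain from stream j into receive branch i
    after precoding/receive filtering; a perfect decomposition makes
    G diagonal, so the rates reach the parallel-channel values.
    """
    G = np.asarray(G, dtype=float)
    rates = []
    for i in range(G.shape[0]):
        signal = G[i, i] * powers[i]
        cci = sum(G[i, j] * powers[j] for j in range(G.shape[1]) if j != i)
        rates.append(np.log2(1.0 + signal / (noise_var + cci)))
    return np.array(rates)

# Nearly diagonalized channel: small off-diagonal leakage (a larger
# epsilon_r in the PSVD would increase these off-diagonal terms)
G = np.array([[1.0, 0.01],
              [0.02, 0.8]])
p = np.array([0.5, 0.5])
r_cci = siso_rates(G, p, noise_var=0.1)
r_ideal = siso_rates(np.diag(np.diag(G)), p, noise_var=0.1)
# Residual CCI strictly reduces every stream's rate relative to r_ideal
```

As the off-diagonal leakage grows with the SNR relative to the noise floor, the gap between the two rate vectors widens, which matches the interference-limited behaviour observed in the high SNR region.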

65


Figure 6.6: Simulated Sum Rate CDFs (SISO Receivers) for a 3 × 3 Channel with Varying MPSVD by MPQRD-BC Parameter εr. The remaining parameters were µ = 10−6, ρ = 5 and received SNR = 30 dB.

Figure 6.7: Simulated Sum Rate CDFs (SISO Receivers) for a 3 × 3 Channel with Varying MPSVD by MPQRD-BC Parameter µ. The remaining parameters were εr = 10−3, ρ = 5 and received SNR = 30 dB.

66


Figure 6.8: Sum Rate (SISO Receiver Setup) Averaged over 100 Channel Realizations, for Varying εr. The channel length was L = 3 and MPSVD by MPQRD-BC parameters were set to µ = 10−6, ρ = 5.

Figure 6.9: Sum Rate (SISO Receiver Setup) Averaged over 100 Channel Realizations, for Varying µ. The channel length was L = 3 and MPSVD by MPQRD-BC parameters were set to εr = 10−3, ρ = 5.

67


Simulations show that the PSVD-V system gets close to the channel capacity for most choices of PSVD algorithm parameter values. This is because the Givens rotations are unitary operations, and thereby energy preserving, and the truncations typically remove only a small percentage of the total energy of the filter. Since the total channel gains of the semi-diagonalized system stay close to those of the original channel, similar performance is achieved. This depends on the fact that a vector detector is employed.

The simplicity of the diagonalized channel from the SM by MIMO-OFDM with DFT-domain SVD approach means that the optimal detector takes the form of a set of separate SISO detectors. When the same setup is employed for the PSVD approach, the sum rate suddenly depends heavily on how close the channel is to being perfectly diagonalized. The simulation results show that the effect of the choice of PSVD parameter values is large in the high SNR domain. The interpretation is that the PSVD-S system is interference limited in the high SNR region, but power limited at low SNRs.

The results show that as εr and µ become small, the sum rate of the PSVD-S system gets close to the channel capacity. This naturally comes at the expense of a higher computational load, as the results of Chapter 4 show. If a system with near-capacity performance is sought, SM by MIMO-OFDM may therefore be the better choice, based on the run time comparison of Chapter 4.

68


Chapter 7

Summary

This thesis has evaluated algorithms for approximative polynomial matrix decompositions, with an application to spatial multiplexing over wideband MIMO channels. In order to motivate the study, some background material from the wireless communications field was presented. The MIMO channel was introduced, and the effects of multipath propagation on the signal were discussed. The achievable rate performance measure was discussed, together with its upper bound, the channel capacity.

Under the LTI assumption, a wideband MIMO channel can conveniently be represented by a polynomial matrix. In order to set the stage for the introduction of the algorithms, the concepts of polynomials, matrices and polynomial matrices were presented. Furthermore, the polynomial Givens rotation was presented, which forms the building block of the polynomial decomposition algorithms.

With the theoretical underpinnings in place, the approximative polynomial decomposition algorithms were presented. The PQRD algorithms operate on a given polynomial matrix by iteratively applying polynomial Givens rotations, in order to null the dominant coefficient below the main diagonal. The PSVD algorithms iteratively apply the PQRD algorithms to form an approximate polynomial singular value decomposition of the input matrix. Thanks to the energy-moving property of the polynomial Givens rotation, the algorithms were shown to converge. The original decomposition algorithms of [4] were modified to work with relative convergence criteria, and the modified algorithms were then analyzed in terms of decomposition quality and computational complexity. It was shown that the relative decomposition error can be made small by performing sufficiently many iterations. However, for every iteration the maximum degree of the matrix grows, due to an inherent property of the polynomial Givens rotation. Because of this effect, a truncation step is needed in order to reduce the memory requirements of the algorithms. The computational complexity of the algorithms was shown to be linear in the number of iterations needed, but with a large slope coefficient. If the polynomial singular value decomposition is sampled along the unit circle, in order to get the precoding/receive filtering matrices needed for wideband spatial multiplexing, a simple example showed that the computational load of the PSVD becomes prohibitively large for channels with long impulse responses.

With this rather gloomy result in mind, another approach for decomposing polynomial matrices was taken. A rational Givens rotation was introduced, representing a matrix filter with infinite impulse response. Using this new Givens rotation, corresponding PQRD and PSVD algorithms were set up, yielding rational decompositions of polynomial matrices. For

69


simple cases, the rational decomposition algorithms were shown to result in excellent decompositions. However, the algorithms turned out to be numerically unstable, which was seen when they were applied to larger matrices.

A polynomial singular value decomposition algorithm was plugged into a wideband spatial multiplexing framework for performance evaluation. The sum rate achieved by the system, with precoding/receive filtering matrices obtained from sampling the PSVD along the unit circle, was compared to the channel capacity. When a joint detector was assumed at the receiver end, the achievable sum rates got close to the channel capacity for most PSVD algorithm parameter values. However, in order to be comparable to the reference system, SM by MIMO-OFDM, a set of separate SISO detectors should be used at the receiver. For this system, it was shown that the achievable sum rate depends on the diagonality error of the decomposed channel.

7.1 Conclusions

The polynomial decomposition algorithms studied in this thesis operate on a simple, but effective, strategy. For the QR case, every coefficient below the main diagonal is nulled using a polynomial Givens rotation, until the matrix satisfies some convergence criterion. The algorithms provably converge, and for appropriate choices of parameter values they produce decompositions of good quality.

The iterative behaviour of the algorithms is their downfall, though. In order to converge, especially for polynomial matrices with many taps, a large number of iterations is needed. If the goal is to obtain precoding and receive filtering matrices for diagonalizing a wideband MIMO channel, the computational load becomes high compared to the traditional approach of performing a set of SVDs in the frequency domain. There is therefore no advantage in using the polynomial decomposition algorithms for this application. Additionally, the approximative nature of the algorithms is another shortcoming compared to the traditional approach.

Instead of iteratively nulling every coefficient below the main diagonal, every matrix element below the main diagonal can be nulled in its entirety in one step. In order to preserve paraunitarity, the Givens rotation must then be allowed to be a rational matrix function. Algorithms based on this rational Givens rotation were shown to sometimes produce excellent results, but sometimes fail completely. Their numerical instability is probably due to the spectral factorization needed for setting up the rational Givens rotation. Because of their bad numerical properties, the rational decomposition algorithms in their current form are of no practical use.

Assuming that the polynomial decomposition algorithms were to be used for wideband spatial multiplexing, an evaluation of the achievable sum rate was performed. Employing the same receiver setup as in the traditional approach of frequency-domain SVDs, it was shown that the achievable rate at high SNR is directly dependent on the decomposition quality. At low SNR, the effect of decomposition quality was less pronounced. Consequently, for the polynomial decomposition system to compete with the traditional approach in terms of achievable rate, a good decomposition is needed in the high SNR region. The demands for a good decomposition lead to many iterations of the algorithms, and hence a high computational load compared to the traditional approach. The polynomial decomposition algorithms are therefore not viable for the application of wideband spatial multiplexing. The fact that the SVDs in the frequency domain can easily be parallelized, whereas this is not the case for the polynomial decomposition algorithms, strengthens this conclusion.

70

7.2 Future Work

Some directions for future work could be:

• Investigate the problems of the rational decomposition algorithms; perhaps the algorithms can be reformulated to obtain better numerical properties.

• Evaluate the MPSVD by MPQRD-BC algorithm on a more involved channel model, to see whether the conclusion for the exponential power-delay-profile channel model holds.

• Consider whether the algorithms can be used in a communications system with only statistical or noisy channel state information at the transmitter.

• Apply the algorithm of [17] in the communications framework of Chapter 6 for performance evaluation.

71


Appendix A

Acronyms and Notation

Z The set of integers

R The set of real numbers

R+ The set of positive real numbers

C The set of complex numbers

Cp×q The set of complex p× q matrices

p, q, r Number of rows, columns and coefficients of a polynomial matrix

A(z) Arbitrary polynomial matrix

ai,j(t) The (i, j) coefficient of the coefficient matrix associated with z−t.

H(z) Channel matrix

s Symbol vector launched onto channel

x Symbol vector to be precoded

n AWGN noise vector

w Receive filtered noise vector

r Received symbols vector

y Filtered received symbols vector

N FFT size/number of sub-carriers

M Number of spatial modes

Mt Number of transmit antennas

Mr Number of receive antennas

L Number of channel taps

W System bandwidth

72


Bc Channel coherence bandwidth

AWGN Additive White Gaussian Noise

CCI Cross-Channel Interference

DFT Discrete Fourier Transform

FFT Fast Fourier Transform

FIR Finite Impulse Response

flop Floating point operation

IIR Infinite Impulse Response

IDFT Inverse Discrete Fourier Transform

IFFT Inverse Fast Fourier Transform

LOS Line-of-sight

LTI Linear Time-Invariant

MIMO Multiple Input Multiple Output

NLOS Non line-of-sight

OFDM Orthogonal Frequency-Division Multiplexing

PGR Polynomial Givens Rotation

PQRD Polynomial QR Decomposition

PSVD Polynomial Singular Value Decomposition

RDE Relative Decomposition Error

RGR Rational Givens Rotation

SM Spatial Multiplexing

SISO Single Input Single Output

SVD Singular Value Decomposition

QRD QR Decomposition

73


Appendix B

Some Complexity Derivations

B.1 Matrix-Matrix Multiplication

For completeness, this section provides complexity expressions for the matrix-matrix multiplication operation. Both the constant case and the polynomial case are studied.

Constant Matrix-Matrix Multiplication

Let A ∈ Cm×n, B ∈ Cn×p, C ∈ Cm×p and C = AB. Then there will be mp elements in C, and n complex multiplications will be needed to calculate each element. The complexity of such a matrix-matrix multiplication is hence O(mnp) complex floating point operations.
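The count can be verified with a naive triple-loop implementation that tallies the multiplications explicitly (an illustrative sketch):

```python
def naive_matmul(A, B):
    """Naive matrix product with an explicit multiplication counter."""
    m, n = len(A), len(A[0])
    n2, p = len(B), len(B[0])
    assert n == n2, "inner dimensions must agree"
    mults = 0
    C = [[0] * p for _ in range(m)]
    for i in range(m):          # rows of C
        for j in range(p):      # columns of C
            for k in range(n):  # inner dimension
                C[i][j] += A[i][k] * B[k][j]
                mults += 1      # one multiply per accumulated term
    return C, mults

C, mults = naive_matmul([[1, 2, 3], [4, 5, 6]],
                        [[1, 0], [0, 1], [1, 1]])
# mults == m * n * p == 2 * 3 * 2 == 12
```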

Polynomial Matrix-Matrix Multiplication

Let A(z) ∈ Cp1×q1×r1, B(z) ∈ Cq1×q2×r2, C(z) ∈ Cp1×q2×(r1+r2−1) and C(z) = A(z)B(z). As derived in Section 2.1.2, multiplication of two polynomials is equivalent to convolution of the respective coefficient vectors. For the polynomial matrix-matrix multiplication, this results in a convolution of matrix-valued coefficients. The product is hence

$$\mathbf{C}(z) = (\mathbf{A} * \mathbf{B})(z) = \sum_{t=-\infty}^{\infty} \left( \sum_{v=-\infty}^{\infty} \mathbf{A}_v \mathbf{B}_{t-v} \right) z^{-t} \qquad (B.1)$$

where every matrix-matrix multiplication is O(p1q1q2). The number of non-zero multiplications for a given coefficient in the resulting polynomial is bounded by min(r1, r2). This gives that the number of operations needed to obtain a single output coefficient is O(p1q1q2 min(r1, r2)). There will be r1 + r2 − 1 coefficient matrices in C(z), and the total complexity is therefore O((r1 + r2 − 1)p1q1q2 min(r1, r2)).
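A direct implementation of (B.1) as a convolution of coefficient matrices, together with a spot check of the result at a single frequency point, could look as follows. This is an illustrative sketch; the coefficient-array layout (leading axis indexing the power of z−1) is an assumption.

```python
import numpy as np

def poly_matmul(A, B):
    """Multiply polynomial matrices given as coefficient arrays.

    A: shape (r1, p1, q1), B: shape (r2, q1, q2), where A[t] is the
    coefficient matrix of z^{-t}.  Returns C with r1 + r2 - 1
    coefficient matrices, C_t = sum_v A_v B_{t-v} (matrix convolution).
    """
    r1, p1, q1 = A.shape
    r2, _, q2 = B.shape
    C = np.zeros((r1 + r2 - 1, p1, q2), dtype=np.result_type(A, B))
    for t in range(r1 + r2 - 1):
        # Only min(r1, r2) terms are non-zero for each output coefficient
        for v in range(max(0, t - r2 + 1), min(r1, t + 1)):
            C[t] += A[v] @ B[t - v]
    return C

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 2, 2))
B = rng.standard_normal((4, 2, 2))
C = poly_matmul(A, B)
assert C.shape == (3 + 4 - 1, 2, 2)

# Cross-check at one frequency point: C(e^{jw}) == A(e^{jw}) B(e^{jw})
w = 0.7
ev = lambda X: sum(X[t] * np.exp(-1j * w * t) for t in range(X.shape[0]))
assert np.allclose(ev(C), ev(A) @ ev(B))
```

The inner loop bound makes the min(r1, r2) factor in the complexity expression explicit: for each of the r1 + r2 − 1 output coefficients, at most that many matrix products contribute.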

74


List of Figures

1.1 Block Diagram of a Typical Digital Communications System . . . 2
1.2 Modulation-Channel-Demodulation Sub-system . . . 2

3.1 Example of Multipath Propagation . . . 12
3.2 Single-input single-output Channel . . . 13
3.3 Multiple-input multiple-output Channel . . . 14

4.1 The Original and Upper Triangular Matrices Obtained From MPQRD-BC For εr = 10−3, µ = 10−6, ρ = 2 . . . 26
4.2 The Paraunitary Matrix Obtained From MPQRD-BC For εr = 10−3, µ = 10−6, ρ = 2 . . . 26
4.3 Decomposition Error as a Function of ε and µ, Averaged over 100 Matrices . . . 28
4.4 Unitarity Error as a Function of ε and µ, Averaged over 100 Matrices . . . 28
4.5 Triangularity Error as a Function of ε and µ, Averaged over 100 Matrices . . . 28
4.6 Number of coefficient steps (iterations) needed for convergence, as a function of input matrix size. The dimension size of the independent dimensions was 3 . . . 30
4.7 Number of coefficient steps (iterations) needed for convergence as a function of algorithm parameters, for a Matrix A(z) ∼ (0, 2, C3×3) . . . 30
4.8 The Original and Diagonalized Matrices Obtained From a MPSVD by MPQRD-BC Run with εr = 10−3, µ = 10−6, ρ = 2 . . . 35
4.9 The Paraunitary Matrices Obtained From MPSVD by MPQRD-BC Applied to the Original Matrix from Figure 4.8a, with εr = 10−3, µ = 10−6, ρ = 2 . . . 35
4.10 Decomposition Error as a Function of ε and µ, Averaged over 100 Matrices . . . 37
4.11 Unitarity Error as a Function of ε and µ, Averaged over 100 Matrices . . . 37
4.12 Diagonality Error as a Function of ε and µ, Averaged over 100 Matrices . . . 37
4.13 Number of coefficient steps (iterations) needed for convergence, as a function of input matrix size . . . 38
4.14 Number of coefficient steps (iterations) needed for convergence as a function of algorithm parameters, for a 3 × 3 matrix with 3 lags . . . 38
4.15 Frequency Response of Sampled PSVD (εr = 10−1) Matrix D(ejω) and Magnitudes of Sequence of SVD Matrices Dk . . . 40
4.16 Phase Response of Sampled PSVD (εr = 10−1) Matrix D(ejω) and Magnitudes of Sequence of SVD Matrices Dk . . . 41
4.17 Computational Load Comparison Between Performing a PSVD and Performing N SVDs . . . 42

75


5.1 The Original and Upper Triangular Matrices Obtained From PQRD-R for Matrix A(z) from Section 4.3.2 . . . 48
5.2 The Paraunitary Matrix Obtained From PQRD-R for Matrix A(z) from Section 4.3.2 . . . 48
5.3 The Original and Upper Triangular Matrices Obtained From PQRD-R for a 4 × 4 Matrix . . . 49
5.4 The Paraunitary Matrix Obtained From PQRD-R for a 4 × 4 Matrix . . . 49
5.5 Frequency Response of Matrix A(z) . . . 51
5.6 The Decomposition and Diagonal Matrices Obtained From PSVD-R after 2 iterations . . . 52
5.7 The Paraunitary Matrices Obtained From PSVD-R after 2 iterations . . . 52
5.8 The Decomposition and Diagonal Matrices Obtained From PSVD-R after 3 iterations . . . 53
5.9 The Paraunitary Matrices Obtained From PSVD-R after 3 iterations . . . 53

6.1 Narrowband Spatial Multiplexing System . . . 57
6.2 Comparison Between Vector and SISO Detector . . . 61
6.3 Simulated Sum Rate CDFs (Vector Receiver) for Various Spatial Channel Configurations. The channel length was L = 3 and MPSVD by MPQRD-BC parameters were set to εr = 10−3, µ = 10−6, ρ = 5 and received SNR = 15 dB . . . 64
6.4 Simulated Sum Rate CDFs (Vector Receiver) for 3 × 3 Channels with L ∈ {2, 5}. MPSVD by MPQRD-BC parameters were set to εr = 10−3, µ = 10−6, ρ = 5 and received SNR = 15 dB . . . 64
6.5 Sum Rate (Vector Receiver) Averaged over 100 Channel Realizations. The channel length was L = 3 and MPSVD by MPQRD-BC parameters were set to εr = 10−3, µ = 10−6, ρ = 5 . . . 65
6.6 Simulated Sum Rate CDFs (SISO Receivers) for 3 × 3 Channel with Varying MPSVD by MPQRD-BC Parameter εr. Remaining parameters were µ = 10−6, ρ = 5 and received SNR = 30 dB . . . 66
6.7 Simulated Sum Rate CDFs (SISO Receivers) for 3 × 3 Channel with Varying MPSVD by MPQRD-BC Parameter µ. Remaining parameters were εr = 10−3, ρ = 5 and received SNR = 30 dB . . . 66
6.8 Sum Rate (SISO Receiver Setup) Averaged over 100 Channel Realizations, for Varying εr. The channel length was L = 3 and MPSVD by MPQRD-BC parameters were set to µ = 10−6, ρ = 5 . . . 67
6.9 Sum Rate (SISO Receiver Setup) Averaged over 100 Channel Realizations, for Varying µ. The channel length was L = 3 and MPSVD by MPQRD-BC parameters were set to εr = 10−3, ρ = 5 . . . 67

76


List of Tables

4.1 Errors for PQRD of Matrix A(z) . . . 25
4.2 MPQRD-BC Parameter Values for Spatial/Temporal Series . . . 27
4.3 Errors for PSVD of Matrix A(z) from Section 4.3.2 . . . 34
4.4 MPSVD by MPQRD-BC Parameter Values for Spatial/Temporal Series . . . 36

77


Bibliography

[1] D. Astely, E. Dahlman, A. Furuskar, Y. Jading, M. Lindstrom, and S. Parkvall, “LTE: The Evolution of Mobile Broadband,” Communications Magazine, IEEE, vol. 47, pp. 44–51, Apr. 2009.

[2] U. Madhow, Fundamentals of Digital Communication. Cambridge University Press, 2008.

[3] S. Diggavi, N. Al-Dhahir, A. Stamoulis, and A. Calderbank, “Great Expectations: The Value of Spatial Diversity in Wireless Networks,” Proceedings of the IEEE, vol. 92, pp. 219–270, Feb. 2004.

[4] J. Foster, J. McWhirter, M. Davies, and J. Chambers, “An Algorithm for Calculating the QR and Singular Value Decompositions of Polynomial Matrices,” Signal Processing, IEEE Transactions on, vol. 58, pp. 1263–1274, Mar. 2010.

[5] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge University Press, 1985.

[6] F. M. Callier and C. A. Desoer, Multivariable Feedback Systems. Springer, 1982.

[7] L. N. Childs, A Concrete Introduction to Higher Algebra. Springer, 2009.

[8] T. Kailath, A. H. Sayed, and B. Hassibi, Linear Estimation. Information and System Sciences Series, Prentice Hall, 2000.

[9] G. H. Golub and C. F. Van Loan, Matrix Computations. Johns Hopkins University Press,1996.

[10] A. V. Aho and J. D. Ullman, Foundations of Computer Science. Computer Science Press,1995.

[11] A. Paulraj, R. Nabar, and D. Gore, Introduction to Space-Time Wireless Communications. Cambridge University Press, 2003.

[12] E. Perahia, “IEEE 802.11n Development: History, Process, and Technology,” Communications Magazine, IEEE, vol. 46, pp. 48–55, July 2008.

[13] C. E. Shannon, “A Mathematical Theory of Communication,” The Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, 1948.

[14] E. Telatar, “Capacity of Multi-Antenna Gaussian Channels,” European Transactions on Telecommunications, vol. 10, no. 6, pp. 585–595, 1999.

78


[15] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge University Press, 2008.

[16] R. Wirski and K. Wawryn, “QR Decomposition of Rational Matrix Functions,” in Information, Communications and Signal Processing, 2009. ICICS 2009. 7th International Conference on, pp. 1–4, Dec. 2009.

[17] J. G. McWhirter, “An Algorithm for Polynomial Matrix SVD Based on Generalised Kogbetliantz Transformations,” in Proceedings of EUSIPCO 2010, 2010.

[18] M. Bengtsson, “Complementary Reading in Digital Signal Processing,” October 2009.

[19] M. Bengtsson, “Private communication.” June 2010.

[20] D. Henrion and M. Sebek, “Reliable Numerical Methods for Polynomial Matrix Triangularization,” Automatic Control, IEEE Transactions on, vol. 44, pp. 497–508, Mar. 1999.

[21] A. Sayed and T. Kailath, “A Survey of Spectral Factorization Methods,” Numerical Linear Algebra with Applications, vol. 8, no. 6-7, pp. 467–496, 2001.

[22] G. Raleigh and J. Cioffi, “Spatio-Temporal Coding for Wireless Communication,” Communications, IEEE Transactions on, vol. 46, pp. 357–366, Mar. 1998.

[23] M. Davies, S. Lambotharan, J. Chambers, and J. McWhirter, “Broadband MIMO Beamforming for Frequency Selective Channels using the Sequential Best Rotation Algorithm,” in Vehicular Technology Conference, 2008. VTC Spring 2008. IEEE, pp. 1147–1151, May 2008.

[24] M. Davies, S. Lambotharan, J. Foster, J. Chambers, and J. McWhirter, “Polynomial Matrix QR Decomposition and Iterative Decoding of Frequency Selective MIMO Channels,” in Wireless Communications and Networking Conference, 2009. WCNC 2009. IEEE, pp. 1–6, Apr. 2009.

[25] C. Ta and S. Weiss, “Design of Precoding and Equalisation for Broadband MIMO Transmission,” in DSPenabledRadio, 2005. The 2nd IEE/EURASIP Conference on (Ref. No. 2005/11086), Sept. 2005.

[26] E. G. Larsson and P. Stoica, Space-Time Block Coding for Wireless Communications. Cambridge University Press, 2003.

[27] H. Bolcskei, D. Gesbert, and A. Paulraj, “On the Capacity of OFDM-based Spatial Multiplexing Systems,” Communications, IEEE Transactions on, vol. 50, pp. 225–234, Feb. 2002.

79