QRD-RLS Adaptive Filter Based Antenna Beamforming for OFDM Systems

Naveen Rathi* and Sanjay Sharma**

*Assistant Professor, Panipat Institute of Engineering and Tech., Panipat

**Assistant Professor, ECED, Thapar University, Patiala

Abstract

This paper presents a technique for antenna beamforming in high-data-rate OFDM systems. The technique uses the QR-decomposition-based recursive least squares (RLS) algorithm with Givens rotations. Simulation results are obtained for QR factorization and for the QRD-RLS algorithm, for both floating-point and fixed-point models. An overall architecture for four sensor elements is created in MATLAB and simulated, and the results show an improvement in the response of the system. VHDL code is then written to describe the architecture of the overall design, which is synthesized using Xilinx ISE 10.1 for a Virtex-4 target device.

1. Introduction

Orthogonal Frequency Division Multiplexing (OFDM) is a popular method for high-rate data transmission in wireless environments. In OFDM, the channel bandwidth is divided into several narrow subbands, and the frequency response over each subband is flat. Hence, a frequency-selective channel is transformed into several flat-fading subchannels. The time-domain waveforms of the subcarriers are orthogonal, yet the signal spectra corresponding to different subcarriers overlap in frequency, so the available bandwidth is used very efficiently. The data rate of the system is the aggregate of the data rates of the subchannels. These features make OFDM suitable for high-data-rate applications. Another advantage of OFDM systems is that they are less susceptible to various kinds of impulse noise. These characteristics result in reduced receiver complexity.

MIMO (Multiple Input Multiple Output) systems use multiple antennas at both the transmitter and the receiver. Each transmit antenna simultaneously sends a small piece of data to the receiver using the same frequency band. By exploiting the spatial diversity provided by spatially separated antennas, the receiver can separate the data streams and put them back together, so the bandwidth is used very efficiently. During high-data-rate transmission, however, MIMO channels become frequency-selective because of multipath propagation. By combining OFDM and MIMO, these frequency-selective channels can be transformed into a set of frequency-flat MIMO channels, which decreases the receiver complexity. MIMO-OFDM systems are therefore very promising for broadband wireless systems [1-2].

Each receiver in a MIMO-OFDM system must equalize the received signal to remove the effect of the channel. Most equalization/detection algorithms need to invert a matrix that is either the channel state information matrix $H$ or a nonlinear function of it, $f(H)$. Increasing the number of transmit and receive antennas yields a higher data rate, but it also increases the dimensions of $f(H)$, so more computation is needed to invert the matrix in less time. This makes the matrix inversion block a bottleneck in these systems. In this work we develop an architecture for matrix inversion by generalizing the QR-decomposition-based recursive least squares (QRD-RLS) algorithm [3,4].

QR decomposition is one of the most important operations in linear algebra. It can be used to invert a matrix, to solve a set of simultaneous equations, and in numerous other scientific-computing applications. It is one of a relatively small number of matrix operation primitives from which a wide range of algorithms can be realized. QR decomposition factors a matrix into an orthogonal matrix and an upper triangular matrix: a real square matrix $A$ is decomposed as $A = QR$, where $Q$ is orthogonal ($Q^T Q = I$) and $R$ is upper triangular [5]. More generally, a full-rank $m \times n$ matrix ($m \ge n$) can be factored as the product of an $m \times n$ matrix $Q$ with orthonormal columns ($Q^T Q = I$) and an $n \times n$ upper triangular matrix $R$. QR decomposition can be computed by several methods, including Gram-Schmidt orthonormalization, Householder reflections, and Givens rotations.

ICGST-PDCS, Volume 8, Issue 1, December 2008

1.1 Gram-Schmidt Orthonormalization Method

The Gram-Schmidt method orthonormalizes a linearly independent set of vectors. The most important result of this orthonormalization is the QR decomposition: for a matrix $A \in \mathbb{R}^{n\times n}$ there exist an orthogonal matrix $Q$ and an upper triangular matrix $R$ such that $A = QR$. The method is therefore used as an algorithm to implement QR decomposition [6]. The decomposition can be written as

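$$
\begin{bmatrix}
A_{11} & A_{12} & A_{13} & \cdots & A_{1m}\\
A_{21} & A_{22} & A_{23} & \cdots & A_{2m}\\
A_{31} & A_{32} & A_{33} & \cdots & A_{3m}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
A_{n1} & A_{n2} & A_{n3} & \cdots & A_{nm}
\end{bmatrix}
=
\begin{bmatrix}
Q_{11} & Q_{12} & Q_{13} & \cdots & Q_{1m}\\
Q_{21} & Q_{22} & Q_{23} & \cdots & Q_{2m}\\
Q_{31} & Q_{32} & Q_{33} & \cdots & Q_{3m}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
Q_{n1} & Q_{n2} & Q_{n3} & \cdots & Q_{nm}
\end{bmatrix}
\begin{bmatrix}
R_{11} & R_{12} & R_{13} & \cdots & R_{1m}\\
0 & R_{22} & R_{23} & \cdots & R_{2m}\\
0 & 0 & R_{33} & \cdots & R_{3m}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & 0 & \cdots & R_{mm}
\end{bmatrix}
$$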

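Another representation, shown below, is used for simplicity:

$$
[a_1\ a_2\ a_3\ \cdots\ a_m] = [Q_1\ Q_2\ Q_3\ \cdots\ Q_m]
\begin{bmatrix}
r_{11} & r_{12} & r_{13} & \cdots & r_{1m}\\
0 & r_{22} & r_{23} & \cdots & r_{2m}\\
0 & 0 & r_{33} & \cdots & r_{3m}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & 0 & \cdots & r_{mm}
\end{bmatrix}
$$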

To illustrate the decomposition process, suppose a set of column vectors $Q_1, Q_2, Q_3, \dots, Q_n \in \mathbb{R}^n$ constructs the $Q$ matrix as

$$
Q =
\begin{bmatrix}
Q_{11} & Q_{12} & Q_{13} & \cdots & Q_{1n}\\
Q_{21} & Q_{22} & Q_{23} & \cdots & Q_{2n}\\
Q_{31} & Q_{32} & Q_{33} & \cdots & Q_{3n}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
Q_{n1} & Q_{n2} & Q_{n3} & \cdots & Q_{nn}
\end{bmatrix}
= [Q_1\ Q_2\ Q_3\ \cdots\ Q_n]
$$

These column vectors are orthonormal if they are pairwise orthogonal and each has a Euclidean norm of 1 [7]. In other words, $Q \in \mathbb{R}^{n\times n}$ is an orthogonal matrix if and only if its columns form an orthonormal set, which is expressed as $Q^T Q = I$ (for a square $Q$ this also implies $Q Q^T = I$). Consider the product of $Q^T$ and $Q$:

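$$
Q^T Q =
\begin{bmatrix} Q_1^T\\ Q_2^T\\ \vdots\\ Q_n^T \end{bmatrix}
\begin{bmatrix} Q_1 & Q_2 & \cdots & Q_n \end{bmatrix}
=
\begin{bmatrix}
Q_1^T Q_1 & Q_1^T Q_2 & \cdots & Q_1^T Q_n\\
Q_2^T Q_1 & Q_2^T Q_2 & \cdots & Q_2^T Q_n\\
\vdots & \vdots & \ddots & \vdots\\
Q_n^T Q_1 & Q_n^T Q_2 & \cdots & Q_n^T Q_n
\end{bmatrix}
$$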

The entries of $Q^T Q$ are the inner products $(Q_i, Q_j)$. Thus $Q^T Q$ equals the identity matrix $I$ if and only if the columns of $Q$ form an orthonormal set, that is,

$$
(Q_i, Q_j) = \begin{cases} 0, & i \neq j\\ 1, & i = j \end{cases}
$$

which results in $Q^T Q = I$.

The Gram-Schmidt process is an algorithm that produces orthonormal bases. Let $\mathcal{S}$ be a subspace of $\mathbb{R}^n$, and let $a_1, a_2, \dots, a_m$ be a basis of $\mathcal{S}$. The Gram-Schmidt process uses $a_1, \dots, a_m$ to produce $Q_1, Q_2, \dots, Q_m$ that form an orthonormal basis of $\mathcal{S}$. Thus

$$
\mathcal{S} = \operatorname{span}\{a_1, a_2, \dots, a_m\} = \operatorname{span}\{Q_1, Q_2, \dots, Q_m\}.
$$

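The column vectors $Q_1, Q_2, \dots, Q_m$ also satisfy

$$
\operatorname{span}\{Q_1\} = \operatorname{span}\{a_1\},\quad
\operatorname{span}\{Q_1, Q_2\} = \operatorname{span}\{a_1, a_2\},\quad \dots,\quad
\operatorname{span}\{Q_1, \dots, Q_m\} = \operatorname{span}\{a_1, \dots, a_m\}
\tag{1}
$$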

We are given linearly independent vectors $a_1, \dots, a_m \in \mathbb{R}^n$ and seek orthonormal vectors $Q_1, \dots, Q_m$ satisfying Eq. (1). To satisfy $\operatorname{span}\{Q_1\} = \operatorname{span}\{a_1\}$, we must choose $Q_1$ to be a multiple of $a_1$. Since we also require the Euclidean norm $\|Q_1\|_2 = 1$, we define

$$
Q_1 = \frac{1}{r_{11}}\, a_1, \qquad r_{11} = \|a_1\|_2
\tag{2}
$$

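Because $a_1, \dots, a_m$ are linearly independent, $a_1 \neq 0$; hence $r_{11} \neq 0$ and there is no divide-by-zero hazard. The equation $Q_1 = (1/r_{11})\, a_1$ implies that $Q_1 \in \operatorname{span}\{a_1\}$; hence $\operatorname{span}\{Q_1\} \subseteq \operatorname{span}\{a_1\}$. Conversely, the equation $a_1 = r_{11} Q_1$ implies that $a_1 \in \operatorname{span}\{Q_1\}$, and therefore $\operatorname{span}\{a_1\} \subseteq \operatorname{span}\{Q_1\}$. Thus $\operatorname{span}\{Q_1\} = \operatorname{span}\{a_1\}$.

The second step of the algorithm is to find $Q_2$ such that $Q_2$ is orthogonal to $Q_1$, $\|Q_2\|_2 = 1$, and $\operatorname{span}\{Q_1, Q_2\} = \operatorname{span}\{a_1, a_2\}$. We can produce a vector $\tilde{Q}_2$ that lies in this plane and is orthogonal to $Q_1$ by subtracting just the right multiple of $Q_1$ from $a_2$, and then obtain $Q_2$ by scaling $\tilde{Q}_2$. Thus, let

$$
\tilde{Q}_2 = a_2 - r_{12}\, Q_1
\tag{3}
$$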

where the scalar $r_{12}$ is to be determined. We must choose $r_{12}$ so that $(\tilde{Q}_2, Q_1) = 0$. This equation implies

$$
0 = (a_2 - r_{12} Q_1,\ Q_1) = (a_2, Q_1) - r_{12}\,(Q_1, Q_1),
$$

and since $(Q_1, Q_1) = 1$,

$$
r_{12} = (a_2, Q_1)
\tag{4}
$$

Conversely, this choice of $r_{12}$ guarantees that $(\tilde{Q}_2, Q_1) = 0$, and $Q_2$ is then obtained by normalizing $\tilde{Q}_2$ with $r_{22} = \|\tilde{Q}_2\|_2$. The orthogonal $Q$ matrix is found by satisfying all of the conditions in (1) in turn. Repeating the same process, suppose that we have found orthonormal vectors $Q_1, \dots, Q_{k-1}$ such that $\operatorname{span}\{Q_1, \dots, Q_i\} = \operatorname{span}\{a_1, \dots, a_i\}$ for $i = 1, \dots, k-1$. We can now determine $Q_k$, which gives the general formula. We seek $\tilde{Q}_k$ of the form

$$
\tilde{Q}_k = a_k - \sum_{j=1}^{k-1} r_{jk}\, Q_j
\tag{5}
$$

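where $\tilde{Q}_k$ is orthogonal to $Q_1, \dots, Q_{k-1}$. The equations $(\tilde{Q}_k, Q_i) = 0$, $i = 1, \dots, k-1$, imply that

$$
(a_k, Q_i) - \sum_{j=1}^{k-1} r_{jk}\,(Q_j, Q_i) = 0, \qquad i = 1, 2, \dots, k-1.
$$

Since $(Q_i, Q_j) = 0$ when $i \neq j$ and $(Q_i, Q_i) = 1$, these equations reduce to

$$
r_{ik} = (a_k, Q_i), \qquad i = 1, 2, \dots, k-1
\tag{6}
$$

If the $r_{ik}$ are defined by this equation, then $\tilde{Q}_k$ is orthogonal to $Q_1, \dots, Q_{k-1}$. Let

$$
r_{kk} = \|\tilde{Q}_k\|_2 \neq 0
\tag{7}
$$

(it is nonzero because $a_k \notin \operatorname{span}\{a_1, \dots, a_{k-1}\} = \operatorname{span}\{Q_1, \dots, Q_{k-1}\}$), and define

$$
Q_k = \frac{1}{r_{kk}}\, \tilde{Q}_k
\tag{8}
$$

Then clearly $\|Q_k\|_2 = 1$ and $(Q_i, Q_k) = 0$, $i = 1, 2, \dots, k-1$. Rearranging (5) gives

$$
a_k = \tilde{Q}_k + \sum_{j=1}^{k-1} r_{jk}\, Q_j ,
$$

and using $\tilde{Q}_k = r_{kk} Q_k$ from (8),

$$
a_k = Q_k\, r_{kk} + \sum_{j=1}^{k-1} Q_j\, r_{jk} = \sum_{j=1}^{k} Q_j\, r_{jk}.
$$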

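There are $m$ such equations, one for each value of $k$. Writing them out, we have

$$
\begin{aligned}
a_1 &= Q_1 r_{11}\\
a_2 &= Q_1 r_{12} + Q_2 r_{22}\\
a_3 &= Q_1 r_{13} + Q_2 r_{23} + Q_3 r_{33}\\
a_4 &= Q_1 r_{14} + Q_2 r_{24} + Q_3 r_{34} + Q_4 r_{44}\\
&\;\;\vdots\\
a_m &= Q_1 r_{1m} + Q_2 r_{2m} + Q_3 r_{3m} + \cdots + Q_m r_{mm}
\end{aligned}
\tag{9}
$$

These can be written as a single matrix equation:

$$
[a_1\ a_2\ a_3\ \cdots\ a_m] = [Q_1\ Q_2\ Q_3\ \cdots\ Q_m]
\begin{bmatrix}
r_{11} & r_{12} & r_{13} & \cdots & r_{1m}\\
0 & r_{22} & r_{23} & \cdots & r_{2m}\\
0 & 0 & r_{33} & \cdots & r_{3m}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & 0 & \cdots & r_{mm}
\end{bmatrix}
$$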

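Defining

$$
A = [a_1\ a_2\ \cdots\ a_m] \in \mathbb{R}^{n\times m},\qquad
Q = [Q_1\ Q_2\ \cdots\ Q_m] \in \mathbb{R}^{n\times m},\qquad
R =
\begin{bmatrix}
r_{11} & r_{12} & r_{13} & \cdots & r_{1m}\\
0 & r_{22} & r_{23} & \cdots & r_{2m}\\
0 & 0 & r_{33} & \cdots & r_{3m}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & 0 & \cdots & r_{mm}
\end{bmatrix} \in \mathbb{R}^{m\times m},
$$

the matrix equation above is simply $A = QR$.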
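The kth step defined by equations (5) to (8) can be sketched in Python with NumPy as below. This is a minimal illustration, not the authors' implementation; the function name `classical_gram_schmidt` and the test matrix are our own choices, and the input is assumed to have full column rank.

```python
import numpy as np

def classical_gram_schmidt(A):
    """QR factorization of an n x m matrix A (n >= m, full column rank)
    by classical Gram-Schmidt, following equations (5)-(8)."""
    A = np.asarray(A, dtype=float)
    n, m = A.shape
    Q = np.zeros((n, m))
    R = np.zeros((m, m))
    for k in range(m):
        a_k = A[:, k]
        # Eq. (6): r_ik = (a_k, Q_i) for i = 1..k-1
        R[:k, k] = Q[:, :k].T @ a_k
        # Eq. (5): subtract the components of a_k along Q_1..Q_{k-1}
        q_tilde = a_k - Q[:, :k] @ R[:k, k]
        # Eq. (7): r_kk is the Euclidean norm of what remains
        R[k, k] = np.linalg.norm(q_tilde)
        if R[k, k] == 0.0:
            raise ValueError("columns of A are linearly dependent")
        # Eq. (8): normalize to obtain Q_k
        Q[:, k] = q_tilde / R[k, k]
    return Q, R

A = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.0],
              [2.0, 0.0, 5.0],
              [0.0, 1.0, 1.0]])
Q, R = classical_gram_schmidt(A)
print(np.allclose(Q @ R, A))            # True
print(np.allclose(Q.T @ Q, np.eye(3)))  # True
```

Classical Gram-Schmidt can lose orthogonality in finite precision when the columns of $A$ are nearly dependent; the modified Gram-Schmidt variant, which projects against the partially orthogonalized vector, is usually preferred numerically.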

Equations (5), (6), (7) and (8) implement the kth step of the classical Gram-Schmidt algorithm. After performing this step for $k = 1, 2, \dots, m$, we have the desired $Q_1, Q_2, \dots, Q_m$ and the entries $r_{jk}$ of $R$.

1.2 Givens Rotation

If $x$ and $y$ are two nonzero vectors in a plane, the angle $\theta$ between them satisfies

$$
\cos\theta = \frac{(x, y)}{\|x\|_2\,\|y\|_2}
$$

This formula extends to vectors in $\mathbb{R}^n$, where the angle $\theta$ is defined as

$$
\theta = \arccos\frac{(x, y)}{\|x\|_2\,\|y\|_2}
$$

The two vectors are orthogonal if $\theta = \pi/2$ radians, that is, if $(x, y) = 0$. Using the Givens rotation method, we find an operator that rotates each vector through a fixed angle $\theta$; this operator can be represented as a matrix. For the 2 x 2 case, the operator can be described as

$$
Q = \begin{bmatrix} Q_{11} & Q_{12}\\ Q_{21} & Q_{22} \end{bmatrix}
$$

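This $Q$ matrix can be determined using the two column vectors $[1\ 0]^T$ and $[0\ 1]^T$: the products of $Q$ with these vectors are the columns of $Q$. Rotating them through $\theta$ gives

$$
Q\begin{bmatrix}1\\0\end{bmatrix} = \begin{bmatrix}\cos\theta\\ \sin\theta\end{bmatrix},
\qquad
Q\begin{bmatrix}0\\1\end{bmatrix} = \begin{bmatrix}-\sin\theta\\ \cos\theta\end{bmatrix}
$$

Thus, for 2 x 2 matrices, we can write the operator $Q$ as

$$
Q = \begin{bmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{bmatrix}
$$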

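We solve $A = QR$, which can be written as $Q^T A = R$, where $R$ is an upper triangular matrix. Let

$$
A = \begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{bmatrix};
$$

then this can be written as

$$
\begin{bmatrix} \cos\theta & \sin\theta\\ -\sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} A_{11} & A_{12}\\ A_{21} & A_{22} \end{bmatrix}
=
\begin{bmatrix} R_{11} & R_{12}\\ 0 & R_{22} \end{bmatrix}
$$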

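It can easily be seen that

$$
-\sin\theta\, A_{11} + \cos\theta\, A_{21} = 0
\;\Rightarrow\;
\tan\theta = \frac{\sin\theta}{\cos\theta} = \frac{A_{21}}{A_{11}}
\;\Rightarrow\;
\theta = \arctan\!\left(\frac{A_{21}}{A_{11}}\right)
$$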
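A small numerical check of the 2 x 2 case, written as a plain-Python sketch: it computes $\theta$ and applies $Q^T$ to both columns of $A$. It uses `math.atan2(A21, A11)` in place of $\arctan(A_{21}/A_{11})$; for $A_{11} > 0$ the two agree, and `atan2` additionally avoids a division by zero and keeps $R_{11} \ge 0$. The function name `givens_2x2` is hypothetical.

```python
import math

def givens_2x2(A):
    """Triangularize A = [[A11, A12], [A21, A22]] with one Givens rotation.

    Returns (theta, R) where R = Q^T A and Q = [[cos, -sin], [sin, cos]].
    """
    (a11, a12), (a21, a22) = A
    # theta = arctan(A21 / A11); atan2 also covers A11 == 0
    theta = math.atan2(a21, a11)
    c, s = math.cos(theta), math.sin(theta)
    # First column of R: Q^T [A11, A21]^T = [R11, 0]^T
    r11 = c * a11 + s * a21
    r21 = -s * a11 + c * a21          # zero (up to rounding) by the choice of theta
    # Second column of R: Q^T [A12, A22]^T = [R12, R22]^T
    r12 = c * a12 + s * a22
    r22 = -s * a12 + c * a22
    return theta, [[r11, r12], [r21, r22]]

theta, R = givens_2x2([[3.0, 1.0], [4.0, 2.0]])
print(round(theta, 4), round(R[0][0], 6), round(R[0][1], 6), round(R[1][1], 6))
# 0.9273 5.0 2.2 0.4
```

As a sanity check, $\det R = 5.0 \times 0.4 = 2$ equals $\det A = 3 \cdot 2 - 1 \cdot 4$, as expected for an orthogonal $Q$ with determinant 1.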

After determining $\theta$, we can determine the $Q$ matrix. To determine the $R$ matrix, we work column by column:

$$
Q^T \begin{bmatrix} A_{11}\\ A_{21} \end{bmatrix} = \begin{bmatrix} R_{11}\\ 0 \end{bmatrix}
$$

solves for the first column of $R$, and

$$
Q^T \begin{bmatrix} A_{12}\\ A_{22} \end{bmatrix} = \begin{bmatrix} R_{12}\\ R_{22} \end{bmatrix}
$$

solves for the second column of $R$. For n x n matrices, the 2 x 2 representation of $Q$ can be generalized: each rotation acts on two rows and zeroes one subdiagonal element, and the decomposition process stays the same.

The QR algorithm is well suited to VLSI implementation: because of its orthogonal nature it is inherently well conditioned and can be implemented stably using relatively short-wordlength arithmetic. For high-data-rate applications, it may be implemented using a triangular systolic array architecture [8-11]. The QR array requires two types of rotation operation: one to calculate the Givens rotation angle and another to apply this angle of rotation to the input data. Separate processors can be designed using conventional arithmetic functions. Alternatively, the coordinate rotation digital computer (CORDIC) algorithm [8], [9] can perform the angle calculation and the rotation directly [10], using its vectoring and rotation modes of operation, respectively. The similarity of these two operations allows a single processor to be designed for both computations and, thus, allows a variable-sized QR array to be mapped efficiently onto a reduced, fixed number of such processors.

Z. LIU et al. [11] implemented an RLS algorithm based on QR decomposition for adaptive beamforming using a DSP. In [12], R. L. WALKE et al. compared two parallel array architectures for adaptive weight calculation based on QR decomposition by Givens rotations; they presented FPGA implementations of both architectures and compared them with an ASIC-based solution. In [13], an implementation of the QR-decomposition-based recursive least squares (RLS) algorithm on Altera Stratix FPGAs is presented. CORDIC operators are efficiently time-shared to perform the QR decomposition while consuming minimal resources, and back substitution is then performed on the embedded Nios soft processor using custom instructions to yield the final weight vectors. Analytical resource estimates are presented along with implementation results showing the weight calculation delays. In [14], A. S. MADHUKUMAR et al. discussed the hardware implementation of an adaptive equalizer based on the QRD-RLS algorithm and compared its performance with a conventional RAKE receiver; to reduce the computational complexity, the CORDIC algorithm was exploited and implemented on FPGAs. In [8], a parallel processor structure for Givens QR decomposition intended for FPGA implementation is presented; the method supports pipelined processor unit design and provides efficient hardware utilization, and an example implementation in Xilinx Virtex FPGA devices is given. In [15], Y. GUO studied efficient FPGA architectures of a recursive algorithm for the Cholesky and QR factorization of the Vandermonde system, identifying the key bottlenecks of the algorithm with respect to real-time constraints and resource consumption; the architectures were implemented in a Xilinx FPGA and tested on the Aptix real-time hardware platform. In [16], D. CESCATO et al. presented two algorithms for interpolation-based QR decomposition in MIMO-OFDM systems. In [17], M. KARKOOTI et al.
developed an architecture for matrix inversion by generalizing the QR-decomposition-based recursive least squares algorithm for MIMO-OFDM applications. In [18], Y. GUO et al. concentrated on a maximum likelihood detector for symbol detection; the upper triangular matrices resulting from QR decomposition are used to reduce the channel matrices before the search algorithms are performed.

2. Recursive Least Squares by QR Decomposition

Solving a least squares problem is at the heart of many adaptive filtering and beamforming applications. The problem is summarized by

$$
e = X w + y
$$

where a set of weights $w$ is determined to minimize the norm of the error $e$ formed from a reference signal $y$ and the weighted data matrix $Xw$. QR decomposition can be used to solve this problem recursively, so that a solution including new samples of the data and reference signals is obtained efficiently from the previous one. In least squares signal processing, QR decomposition is an extremely useful technique in which a full-rank $m \times n$ data matrix $X_k$ ($m > n$) is decomposed into an orthogonal matrix $Q$ and an upper triangular matrix $R$.

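The least squares solution is given by

$$
w_{LS} = -\left(X_k^T X_k\right)^{-1} X_k^T y_k
$$

Substituting $X_k = QR$ (with $Q^T Q = I$) gives

$$
R\, w_{LS} = -Q^T y_k = -u
$$

so the weights follow from the triangular system by back substitution.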
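The generalization of the 2 x 2 Givens step to a tall data matrix, combined with the least squares problem above, can be sketched as follows (Python with NumPy). The sketch zeroes each subdiagonal element with a 2 x 2 rotation acting on two rows, then solves $R w = -Q^T y$ by back substitution, following the sign convention $e = Xw + y$ used in this section. The function names are hypothetical, the result is only checked against NumPy's general least squares solver, and this is not the paper's hardware algorithm.

```python
import numpy as np

def givens_qr(X):
    """Triangularize an m x n matrix X (m >= n) with Givens rotations.

    Returns Q (m x m, orthogonal) and R (m x n, upper triangular), X = Q @ R.
    """
    R = np.array(X, dtype=float)
    m, n = R.shape
    Q = np.eye(m)
    for j in range(n):
        for i in range(j + 1, m):
            a, b = R[j, j], R[i, j]
            r = np.hypot(a, b)
            if r == 0.0:
                continue                        # nothing to rotate
            c, s = a / r, b / r
            G = np.array([[c, s], [-s, c]])     # rotation acting on rows j and i
            R[[j, i], :] = G @ R[[j, i], :]     # zeroes R[i, j]
            Q[:, [j, i]] = Q[:, [j, i]] @ G.T   # accumulate Q so that X = Q @ R
    return Q, R

def back_substitute(R, z):
    """Solve R w = z for a square upper triangular R."""
    n = R.shape[0]
    w = np.zeros(n)
    for i in range(n - 1, -1, -1):
        w[i] = (z[i] - R[i, i + 1:] @ w[i + 1:]) / R[i, i]
    return w

def ls_weights(X, y):
    """Weights minimizing ||e||_2 for e = X w + y."""
    Q, R = givens_qr(X)
    n = X.shape[1]
    u = Q.T @ y                                 # rotated reference vector
    return back_substitute(R[:n, :n], -u[:n])   # solve R w = -u

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 4))
y = rng.standard_normal(20)
w = ls_weights(X, y)
w_ref = np.linalg.lstsq(X, -y, rcond=None)[0]
print(np.allclose(w, w_ref))  # True
```

Because $Q$ is orthogonal, $\|Xw + y\|_2 = \|Rw + Q^T y\|_2$, and only the top $n$ rows depend on $w$; setting them to zero gives the triangular system solved above.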

Figure 1 shows the QR-array processor that performs part of this operation. It takes samples of the data and reference signal as its input. From these, it generates, within the cells, the elements of an upper triangular matrix $R$ and a vector $u$ by a series of rotations. When required, the weight vector $w$ is obtained from $R$ and $u$ by back substitution. For each new data sample, the array updates $R$ and $u$ using the rotations shown in the insets of Figure 1. The angle of rotation calculated by the boundary cell is passed along its row and applied by the internal cells to their respective input and stored values. The rotations performed by the boundary and internal cells correspond to CORDIC operating in vectoring and rotation modes, respectively.

Figure 1. Systolic array for performing QR decomposition

The rate at which inputs may be applied to the array depends on the time required to update the stored quantities. For a pipelined processor, this translates to latency; therefore, to maximize the data rate, the latency of the CORDIC operator must be minimized. If the throughput of each processor exceeds the data rate, a processor can be reused to perform more than one rotation per input sample, reducing the number of processors required. Under these circumstances, it is highly advantageous to design a single processor that can implement both boundary-cell and internal-cell rotations. CORDIC allows this, because vectoring and rotation can be performed by very similar circuits. Note also that the QR array shown in Figure 1 takes real-valued inputs. Adaptive beamforming requires a solution based on complex data; this can be obtained using a QR cell built from three CORDIC operators. Because of the cost of the diagonal boundary cells, the Givens array is somewhat imbalanced computationally: the square root and divide have a higher cost than the multiplies and adds. After the QR array has operated on the incoming x(k) and d(k) data, the next step is to perform back substitution using the $R$ matrix, which is held inside the QR array.

For infinite-precision arithmetic, direct RLS and QR-RLS give exactly the same results. QR-RLS, however, has better numerical integrity than direct RLS when fixed-point numbers are used. One simple way to see this is to consider an N-bit processor. Direct RLS or direct least squares operates on squared quantities of x(k), so the significant wordlength of x(k) should be less than N/2. QR, in contrast, operates on the data directly using orthonormal transforms, and can therefore use closer to N bits of resolution for x(k). Figures 2, 3, 4 and 5 illustrate the floating-point and fixed-point versions of the QR factorization and the QRD-RLS algorithm.
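The recursive update performed by the array can be sketched in software as one sequence of Givens rotations per new data row. In this sketch (Python with NumPy; function names hypothetical), the boundary-cell step rotates $(\beta R_{ii}, x_i)$ onto the x-axis and the internal-cell steps apply the same $(\cos\theta, \sin\theta)$ to the rest of the row and to $u$. The exponential forgetting factor `lam` weights past squared errors; it is set to 0.99 by analogy with the "lambda squared = 0.99" label in Figures 4 and 5, which is an assumption about how that label maps onto this parameter. Arithmetic is floating point, not the fixed-point CORDIC datapath.

```python
import numpy as np

def qrd_rls_update(R, u, x, y, lam):
    """One QRD-RLS time update with data row x and reference sample y.

    R (n x n, upper triangular) and u (n,) hold the state and are updated
    in place, row by row, as the boundary and internal cells would.
    """
    beta = np.sqrt(lam)                   # lam weights past squared errors
    x = np.array(x, dtype=float)
    n = len(x)
    for i in range(n):
        # Boundary cell: rotate (beta*R[i,i], x[i]) onto the x-axis
        a = beta * R[i, i]
        r = np.hypot(a, x[i])
        c, s = (1.0, 0.0) if r == 0.0 else (a / r, x[i] / r)
        R[i, i] = r
        # Internal cells: apply the same (c, s) to the rest of row i ...
        row = beta * R[i, i + 1:]
        R[i, i + 1:] = c * row + s * x[i + 1:]
        x[i + 1:] = -s * row + c * x[i + 1:]
        # ... and to the stored reference term u[i]
        ui = beta * u[i]
        u[i] = c * ui + s * y
        y = -s * ui + c * y
    return R, u

def back_substitute(R, z):
    """Solve R w = z for a square upper triangular R."""
    n = R.shape[0]
    w = np.zeros(n)
    for i in range(n - 1, -1, -1):
        w[i] = (z[i] - R[i, i + 1:] @ w[i + 1:]) / R[i, i]
    return w

rng = np.random.default_rng(1)
n, K, lam = 4, 200, 0.99                  # 4 weights, 200 samples (illustrative)
X = rng.standard_normal((K, n))
y = rng.standard_normal(K)

R, u = np.zeros((n, n)), np.zeros(n)
for k in range(K):
    qrd_rls_update(R, u, X[k], y[k], lam)
w = back_substitute(R, -u)                # e = X w + y convention

# Reference: exponentially weighted least squares via the normal equations
g = lam ** np.arange(K - 1, -1, -1)
w_ref = np.linalg.solve(X.T @ (g[:, None] * X), -X.T @ (g * y))
print(np.allclose(w, w_ref))  # True
```

The reference solution forms $X^T G X$ explicitly, which is exactly the squared-data quantity that the text identifies as the wordlength problem for direct RLS; the rotation-based update never forms it.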

[Figure 2 panels, horizontal axis: unit precision (1 LSB). QR error (A - Q*R) 2-norm, 4x4 matrix, input quantization [12 11]; Q error (I - Qh*Q) 2-norm, 4x4 matrix, Q quantization [12 10]; R error (Ah*A - Rh*R) 2-norm, 4x4 matrix, R quantization [14 11]]

Figure 2. Floating-point MATLAB simulation results of QR factorization


[Figure 3 panels, horizontal axis: unit precision (1 LSB). QR error (A - Q*R) 2-norm, 4x4 matrix, input quantization [12 11]; Q error (I - Qh*Q) 2-norm, 4x4 matrix, Q quantization [12 10]; R error (Ah*A - Rh*R) 2-norm, 4x4 matrix, R quantization [14 11]]

Figure 3. Fixed-point MATLAB simulation results of QR factorization

[Figure 4 panels: LS model signal (4 weights), input quantization [12,10]; LS error (lambda squared = 0.99), CORDIC-based implementation]

Figure 4. Floating-point simulation results of the QRD-RLS algorithm

3. Non-ideal Effects in an OFDM System

This section examines the effects of non-idealities in an OFDM system, including impairments and receiver offsets. Because the Fourier transform is a fundamental operation in OFDM, the effects of several offsets can be understood intuitively by applying Fourier transform theory.

[Figure 5 panels: LS model signal (4 weights), input quantization [12,10]; LS error (lambda squared = 0.99), CORDIC-based implementation]

Figure 5. Fixed-point MATLAB simulation results of the QRD-RLS algorithm

3.1 Local Oscillator Frequency Offset

At start-up, the local oscillator (LO) frequency at the receiver typically differs from the LO frequency at the transmitter. A carrier tracking loop is used to adjust the receiver's LO frequency to match the transmitter's LO frequency as closely as possible. The effect of an LO frequency offset can be explained by Fourier transform theory. The offset can be expressed mathematically by multiplying the received time-domain signal by a complex exponential whose frequency equals the LO offset. Since multiplication by a complex exponential in time is equivalent to a shift in frequency, the LO offset shifts the received signal spectrum. This shift causes a condition called "loss of orthogonality": the OFDM subcarriers are no longer orthogonal, because the bins of the FFT no longer line up with the peaks of the received signal's sinc pulses. The result is a distortion called inter-bin interference (IBI), which occurs when energy from one bin spills over into adjacent bins and distorts the affected subcarriers. In Fourier transform theory this effect is called DFT leakage.

The left plot of Figure 8 shows the spectrum of a received OFDM signal with no LO offset. For clarity, only one nonzero subcarrier was transmitted. This subcarrier does not interfere with its adjacent subcarriers: its spectrum actually extends over the entire range of the FFT, but, owing to the orthogonal nature of the signal, the zero-crossings of the spectrum line up exactly with the other FFT bins. The right plot of Figure 8 shows the received spectrum of the same signal with an LO offset. The offset has caused a loss of orthogonality, and the zero-crossings of the nonzero subcarrier's spectrum no longer line up with the FFT bins. As a result, energy from the nonzero subcarrier is spread among all of the other subcarriers, with the subcarriers closest to it receiving the most interference.

This simple example is for the case of only one nonzero subcarrier. In a practical system, almost all of the subcarriers are used to transmit data, so a given subcarrier experiences IBI from all of the other active subcarriers. By the central limit theorem, the sum of a large number of independent random contributions tends towards a Gaussian distribution; the IBI therefore manifests itself as additive Gaussian noise, lowering the effective SNR of the system. The effect of an LO frequency offset can be corrected by multiplying the signal by a correction factor: a complex exponential whose frequency ideally equals the LO frequency offset, with the opposite sign. Various carrier tracking algorithms exist that can adaptively determine the frequency needed to correct the offset.

3.2 LO Phase Offset

It is also possible to have an LO phase offset, separate from an LO frequency offset; the two offsets can occur together, or either can be present by itself. As the name suggests, an LO phase offset occurs when there is a difference between the phase of the LO output and the phase of the received signal. This effect can be represented mathematically by multiplying the time-domain signal by a complex exponential with a constant phase. The result is a constant phase rotation of all the subcarriers in the frequency domain: the constellation points of every subcarrier experience the same rotation. If the phase rotation is small, the frequency-domain equalizer can correct it. Each coefficient of a frequency-domain equalizer multiplies its corresponding subcarrier by a complex gain (i.e., amplitude scaling and phase rotation), so the coefficients can correct a small phase rotation as long as the rotation does not move the constellation points beyond the symbol decision regions. Larger phase rotations are corrected by a carrier tracking loop.

3.3 Phase Noise

Noise can also be added to the signal in a frequency-conversion stage. The local oscillator used in the converter inherently has some phase noise (uncertainty in the actual frequency or phase of its output), which is transferred to the desired signal. Figure 14 shows the effect of phase noise on a local oscillator. Phase noise is shaped and is concentrated mainly near the carrier (center frequency) of the signal. An OFDM signal contains multiple subcarriers, each of which occupies a much smaller fraction of the total bandwidth than the signal of a single-carrier system. The phase noise is therefore large relative to the subcarrier spacing in OFDM, whereas it is a smaller fraction of the signal bandwidth in a single-carrier system; for this reason, phase noise degrades the performance of an OFDM system more than that of a single-carrier system. Phase noise effects in an OFDM system can be separated into two categories: phase noise confined within one subcarrier spacing, and phase noise that extends across subcarrier spacings. Phase noise that extends across subcarrier spacings is considered extreme and results in demodulation errors. Phase noise within one subcarrier spacing has an effect similar to, but scaled from, that in a single-carrier system: it causes phase uncertainty in the constellation points, producing an arc-shaped noise pattern in the constellation of each subcarrier.
In order to help the OFDM system handle phase noise, pilot subcarriers are often used. These pilot subcarriers are generated by the IFFT and can be used to provide a stable phase reference for the receiver circuitry. Adding these pilots lowers the available data rate of the system because these subcarriers are no longer available to transmit data.
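The loss of orthogonality described in Section 3.1 can be reproduced with a few lines of NumPy. The sketch below assumes a 64-point FFT, one active subcarrier (index 10) and a normalized offset of 0.2 subcarrier spacings; all of these values are illustrative choices, not parameters from the paper. It measures the fraction of energy leaking out of the active bin, then applies the correction factor (the conjugate complex exponential) described in the text.

```python
import numpy as np

N = 64                                   # FFT size (illustrative)
k0 = 10                                  # index of the single nonzero subcarrier
n = np.arange(N)
tx = np.exp(2j * np.pi * k0 * n / N)     # time-domain OFDM symbol, one active subcarrier

def apply_lo_offset(x, eps):
    """Multiply by a complex exponential: an offset of eps subcarrier spacings."""
    return x * np.exp(2j * np.pi * eps * n / N)

def inter_bin_leakage(x):
    """Fraction of the symbol's energy that falls outside FFT bin k0."""
    p = np.abs(np.fft.fft(x)) ** 2
    return 1.0 - p[k0] / p.sum()

rx = apply_lo_offset(tx, 0.2)            # received with an LO frequency offset
corrected = apply_lo_offset(rx, -0.2)    # correction: the conjugate exponential

print(inter_bin_leakage(tx))             # essentially zero: zero-crossings align with bins
print(inter_bin_leakage(rx))             # about 0.12: energy spills into other bins (IBI)
print(inter_bin_leakage(corrected))      # essentially zero again
```

Here the offset is known exactly, so the correction restores the transmitted symbol; in practice the correction frequency comes from a carrier tracking algorithm and is only an estimate.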


3.4 Beamforming and Matrix Inversion

Figure 6 shows a basic narrowband beamformer with K sensor elements arranged in a uniform linear array (ULA); it also shows a signal source $s_\theta(t)$ impinging on the array at an angle of incidence $\theta$. The K beamformer weights $(w_1, w_2, \dots, w_K)$ linearly combine the array observation samples $(x_1(n), x_2(n), \dots, x_K(n))$ and are set to "steer" the response of the array for optimum reception. The output of the beamformer is the scalar $y(n)$.

Figure 6. Narrowband beamformer
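For the narrowband beamformer of Figure 6, the output for one snapshot is $y(n) = w^H x(n)$, where $x(n) = [x_1(n), \dots, x_K(n)]^T$. The sketch below assumes half-wavelength element spacing and angles measured from broadside (both assumptions, not stated in the paper), and evaluates how strongly uniform (broadside) weights pass plane waves arriving from 0, 10 and 30 degrees. The function names are hypothetical.

```python
import numpy as np

def steering_vector(theta_deg, K=4, d=0.5):
    """ULA response to a plane wave arriving theta_deg from broadside.
    d is the element spacing in wavelengths (half-wavelength assumed)."""
    k = np.arange(K)
    return np.exp(-2j * np.pi * d * k * np.sin(np.deg2rad(theta_deg)))

def beamformer_output(w, x):
    """y(n) = w^H x(n) for one snapshot x(n) = [x1(n), ..., xK(n)]."""
    return np.vdot(w, x)                 # np.vdot conjugates its first argument

K = 4
w_broadside = np.ones(K) / K             # uniform weights: main beam at 0 degrees
for theta in (0.0, 10.0, 30.0):
    gain = abs(beamformer_output(w_broadside, steering_vector(theta, K)))
    print(f"{theta:4.0f} deg: |w^H a(theta)| = {gain:.3f}")
```

With only four elements, uniform weights still pass a source at 10 degrees with a gain of roughly 0.8, so a fixed broadside beam cannot separate the two closely spaced signals; this is the motivation for an adaptive structure such as the GSC.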

A generalized sidelobe canceller (GSC) is a special beamformer structure that allows unconstrained optimization methods to be used in designing the optimum beamformer weights. The structure of the GSC is shown in Figure 7. To find the optimum weights $w_a$ using the LS criterion, the following deterministic normal equation must be solved:

$$
R_x\, w_a = b
$$

Here, $R_x$ is the correlation matrix of the input to the unconstrained section of the GSC, and the vector $b$ is the cross-correlation between the input $X_a$ and the ideal response.

Figure 7 Generalized sidelobe canceller (GSC)

One effective technique for solving this equation is recursive least squares (RLS) with QR decomposition of the input data matrix. This technique finds the solution without explicitly inverting a matrix and without explicitly constructing the correlation matrix, which reduces the dynamic range required of the signals involved in the computations. Figure 8 shows the diagram of an adaptive GSC beamformer that uses a QRD-RLS algorithm to solve the normal equation recursively.

Figure 8. Adaptive GSC beamformer

4. Simulated System

The GSC beamformer MATLAB model was created for performing the simulations and the design includes the following features:

• A ULA of four sensor elements
• A narrowband input signal of interest impinging at an angle of 0°
• A narrowband interfering signal impinging at an angle of 10°, with the same amplitude as the signal of interest
• Uncorrelated white noise modelling receiver noise at a level of -20 dB relative to the signal of interest

The GSC MATLAB model consists of three parts. A top-level script generates the signals and displays the results used to analyze the performance of the beamformer; it invokes the QRD-RLS algorithm function in a streaming fashion to perform interference cancellation. The second part of the model is a synthesizable QRD-RLS algorithm function, qrd_rls_spatial(), which performs optimum cancellation of the interfering signal. The last part is the synthesizable function givens_rotation, which rotates arrays of values to perform orthogonal Givens rotations. The performance of the GSC model is analyzed and visualized in the plots below: the input signals from the four sensors and the array outputs are shown in Figures 9 to 11.
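The scenario listed above can be approximated in NumPy as a batch (non-streaming) GSC. This is a sketch, not the paper's qrd_rls_spatial() or givens_rotation functions: the blocking matrix built from adjacent-element differences, the 5000-snapshot block, half-wavelength spacing, random-phase unit-amplitude signals and the use of `numpy.linalg.qr` are all assumptions made for illustration. The adaptive weights solve the least squares problem through a QR decomposition of the lower-branch data matrix, so the correlation matrix $R_x$ is never formed.

```python
import numpy as np

rng = np.random.default_rng(7)
K, N = 4, 5000                                  # sensors, snapshots (N assumed)

def steering_vector(theta_deg, K=4, d=0.5):
    k = np.arange(K)
    return np.exp(-2j * np.pi * d * k * np.sin(np.deg2rad(theta_deg)))

def unit_modulus(count):
    """Random-phase, unit-amplitude narrowband samples."""
    return np.exp(2j * np.pi * rng.random(count))

a_s, a_i = steering_vector(0.0, K), steering_vector(10.0, K)
s, interferer = unit_modulus(N), unit_modulus(N)      # same amplitude
noise = np.sqrt(0.01 / 2) * (rng.standard_normal((K, N))
                             + 1j * rng.standard_normal((K, N)))  # -20 dB
X = np.outer(a_s, s) + np.outer(a_i, interferer) + noise          # K x N

# Quiescent (broadside) branch and a blocking matrix with B^H a_s = 0
w_q = a_s / K
B = np.zeros((K, K - 1))
for c in range(K - 1):
    B[c, c], B[c + 1, c] = 1.0, -1.0            # adjacent differences cancel 0 degrees
d = w_q.conj() @ X                              # upper-branch output, shape (N,)
Z = (B.conj().T @ X).T                          # N x (K-1) lower-branch data matrix

# Least squares min ||d - Z v|| via a reduced QR of the data matrix
Q, R = np.linalg.qr(Z)                          # Q: N x (K-1), R: (K-1) x (K-1)
v = np.linalg.solve(R, Q.conj().T @ d)          # triangular R; generic solve for brevity
w = w_q - B @ v.conj()                          # overall weights, y(n) = w^H x(n)

print(abs(w.conj() @ a_s))    # approximately 1: distortionless towards 0 degrees
print(abs(w.conj() @ a_i))    # small: the 10-degree interferer is nulled
```

The constraint $w^H a(0^\circ) = 1$ holds by construction because the blocking matrix removes the 0-degree signal from the adaptive branch, while the response towards 10 degrees, about 0.82 for the broadside weights alone, is driven close to zero.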


[Figure 9 plot: Input signal, traces Sensor1 to Sensor4; horizontal axis scaled x 10^-6]

Figure 9. Input signals of the four sensors

[Figure 10 panels: Broadside Array Output (horizontal axis scaled x 10^-6); Broadside Array Output - FFT (horizontal axis scaled x 10^7)]

Figure 10. Broadside array output and its FFT

[Figure 11 panels: Beamformer Output (horizontal axis scaled x 10^-6); Beamformer Output - FFT (horizontal axis scaled x 10^7)]

Figure 11. Beamformer output and its FFT

5. FPGA Implementation

A fully parameterized fixed-point arithmetic model was created in MATLAB and VHDL. First, the effects of fixed-point arithmetic on the overall performance of the design were evaluated. Defining a fully parameterized fixed-point arithmetic model is an iterative process. In the case of the GSC with QRD-RLS, the numerical performance of the implicit matrix inversion is measured by the attenuation shown in the overall beampattern. A suitable hardware implementation is also generated iteratively, to balance resource utilization against speed of operation, as shown in Figure 12. For the QRD-RLS algorithm, there are two points where the area and speed of the implementation can be affected:

• Controlling the degree of resource sharing of the givens_rotation function

• The rotation of row elements in the Givens rotation function can be achieved with different computation styles, including Newton-Raphson (using multipliers) and CORDIC (multiplier-less) microarchitectures

Figure 12 QR Algorithm Hardware Implementation
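The CORDIC microarchitecture mentioned in the second item can be modelled in floating point to show how one iteration structure serves both cell types: vectoring mode (boundary cell) drives y to zero and returns the magnitude and angle, and rotation mode (internal cell) rotates a vector by a given angle. This is a behavioural sketch in Python; the 16-iteration count and the function names are assumptions, the multiplications by $2^{-i}$ stand in for the hardware's arithmetic shifts, and vectoring mode assumes x > 0.

```python
import math

N_ITER = 16                                          # iterations (assumed)
ANGLES = [math.atan(2.0 ** -i) for i in range(N_ITER)]
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(N_ITER))

def cordic_vectoring(x, y):
    """Boundary-cell mode: rotate (x, y) onto the x-axis (assumes x > 0).
    Returns (magnitude, angle) with angle close to atan2(y, x)."""
    z = 0.0
    for i in range(N_ITER):
        d = -1.0 if y > 0 else 1.0                   # rotate towards y = 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]                           # accumulate the angle removed
    return x / GAIN, z

def cordic_rotation(x, y, angle):
    """Internal-cell mode: rotate (x, y) by `angle` radians (|angle| < ~1.74)."""
    z = angle
    for i in range(N_ITER):
        d = 1.0 if z >= 0 else -1.0                  # drive the residual angle to 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return x / GAIN, y / GAIN

r, theta = cordic_vectoring(3.0, 4.0)                # approximately 5.0 and 0.927 rad
xr, yr = cordic_rotation(1.0, 2.0, -theta)           # approximately (2.2, 0.4)
```

Both modes share the same shift-and-add update and differ only in which quantity (y or z) decides the rotation direction, which is why a single CORDIC processor can be time-shared between boundary and internal cells. The final division by `GAIN` models the constant CORDIC scale factor, which hardware usually folds into a later stage.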

As figure 11 shows, the function of each boundary cell (BC) is to rotate an input vector (x, y) onto the x-axis. Hence, the new x value (x’) equals the magnitude of the initial vector and the new y value (y’) is 0. Each boundary cell must also pass the rotation angle θ on to the other cells in its row, and the function of each remaining cell in the row is to rotate its own input vector by the same angle θ. One way to do this is to have each boundary cell compute cos θ and sin θ and pass these values on instead of θ itself. Each remaining cell in the row then needs only 4 multiplications and 2 additions/subtractions to perform the rotation, x’ = x cos θ + y sin θ and y’ = -x sin θ + y cos θ. An added benefit of this approach is that each boundary cell must compute the magnitude of the initial vector anyway to obtain x’. Obtaining cos θ and sin θ then only involves inverting the magnitude and multiplying it by x and y, respectively.

Table 1 summarizes the RTL synthesis results for the Xilinx Virtex-4 xc4vfx12 target device with a target frequency of 130 MHz. The estimated clock frequency is 114 MHz, at an input sampling rate of 808.414 KSPS.
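The boundary-cell and internal-cell operations described above, and the way they build the triangular QRD-RLS array, can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' MATLAB or VHDL: the data are real-valued (complex baseband samples would need complex rotations), the function names are hypothetical, the boundary cell uses a plain division for the inverse magnitude rather than a Newton-Raphson or CORDIC unit, the forgetting factor `lam` is applied by scaling the stored values, and the optional `frac_bits` argument models fixed-point arithmetic by rounding every cell output to a grid of 2^-frac_bits without saturation. The final back substitution solves R w = z, which is the implicit matrix inversion the array performs.

```python
import numpy as np


def quantize(v, frac_bits):
    """Round v to a fixed-point grid with frac_bits fractional bits.

    Word-length limits and saturation are ignored in this sketch.
    """
    scale = 2.0 ** frac_bits
    return np.round(v * scale) / scale


def boundary_cell(r, x):
    """Rotate (r, x) onto the x-axis: return the magnitude and the (cos, sin) pair."""
    mag = np.hypot(r, x)
    if mag == 0.0:
        return 0.0, 1.0, 0.0  # nothing to rotate: identity rotation
    inv = 1.0 / mag  # one inversion, then two multiplications
    return mag, r * inv, x * inv


def internal_cell(r, x, c, s):
    """Rotate (r, x) by the boundary cell's angle: 4 multiplications, 2 additions."""
    return c * r + s * x, -s * r + c * x


def qrd_rls(U, d, lam=1.0, frac_bits=None):
    """Triangular-array QRD-RLS for real-valued data (hypothetical helper).

    U: (snapshots, N) sensor samples; d: (snapshots,) reference signal;
    lam: forgetting factor; frac_bits: None for floating point.
    Returns the upper-triangular R, the rotated reference z and the weights w.
    """
    q = (lambda v: v) if frac_bits is None else (lambda v: quantize(v, frac_bits))
    n = U.shape[1]
    R = np.zeros((n, n))
    z = np.zeros(n)
    beta = np.sqrt(lam)
    for row, dk in zip(U, d):
        u = np.array(row, dtype=float)  # working copy of the new snapshot
        R *= beta  # exponentially forget older snapshots
        z *= beta
        for i in range(n):
            # Boundary cell of row i absorbs u[i] into R[i, i] and emits (c, s).
            R[i, i], c, s = (q(v) for v in boundary_cell(R[i, i], u[i]))
            for j in range(i + 1, n):
                R[i, j], u[j] = (q(v) for v in internal_cell(R[i, j], u[j], c, s))
            z[i], dk = (q(v) for v in internal_cell(z[i], dk, c, s))
    # Back substitution solves R w = z: the implicit matrix inversion.
    w = np.zeros(n)
    for i in reversed(range(n)):
        w[i] = (z[i] - R[i, i + 1:] @ w[i + 1:]) / R[i, i]
    return R, z, w


rng = np.random.default_rng(0)
U = rng.standard_normal((200, 4))  # four sensor inputs, random test data
d = U @ np.array([0.5, -1.0, 0.25, 2.0])  # hypothetical reference signal
_, _, w_float = qrd_rls(U, d)
_, _, w_fixed = qrd_rls(U, d, frac_bits=12)
print(w_float, np.max(np.abs(w_fixed - w_float)))
```

Comparing the weights obtained with `frac_bits=None` and with a finite `frac_bits` is one simple way to explore the word-length trade-off that an iterative fixed-point model is used for.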

Table 1 RTL Synthesis Results for the Xilinx Virtex-4 xc4vfx12

Resource             Used     Available    Utilization
Number of slices     1078     5472         19%
Slice flip-flops     738      10944        6%
4-input LUTs         1981     10944        18%
Bonded IOBs          100      320          31%
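The clock rate and the input sampling rate quoted for Table 1 can be related by a simple ratio. Assuming (an interpretation, not something the text states) that the core accepts one new input sample every N_c clock cycles:

```latex
N_c = \frac{f_{\mathrm{clk}}}{f_s}
    = \frac{114 \times 10^{6}\ \mathrm{Hz}}{808.414 \times 10^{3}\ \mathrm{samples/s}}
    \approx 141.0\ \text{cycles per sample}
```

Under that assumption, 141 cycles per sample at 808.414 KSPS corresponds to a clock of about 113.99 MHz, which is consistent with the rounded 114 MHz estimate.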

Figure 13 Snapshot of the software used

6. Conclusion

An efficient antenna beamforming technique has been proposed and system-level simulations have been performed. The floating-point and fixed-point models of the QR factorization architecture and the QRD-RLS algorithm were compared. The overall system was simulated for four sensors, and the results show that the beamforming improves after applying these techniques. A matrix inversion core was designed using the QRD-RLS and Givens rotation algorithms and implemented on a Xilinx Virtex-4 FPGA. The design runs at an estimated clock rate of 114 MHz and achieves a throughput of 808 KSPS. The design is easily extendable to other matrix sizes. Figure 13 shows a snapshot of the software used.

7. References

[1] H. Yang, “A road to future broadband wireless access: MIMO-OFDM baseband air interface,” IEEE Communications Magazine, vol. 43, pp. 53–60, Jan. 2005.

[2] J. Yue, K. J. Kim, J. Gibson, and R. A. Iltis, “Channel estimation and data detection for MIMO-OFDM systems,” in Proc. IEEE Global Telecommunications Conference, vol. 2, pp. 581–585, 1–5 Dec. 2003.

[3] S. Haykin, Adaptive Filter Theory, 3rd ed. Prentice Hall, 2003.

[4] S. Sharma, S. Attri, and R. C. Chauha, “Joint channel estimation and data detection under fading on reconfigurable fabric,” INTEGRATION, the VLSI Journal (Elsevier), vol. 37, no. 3, pp. 177–189, Aug. 2004.

[5] M. Harteneck and R. W. Stewart, “Adaptive IIR filtering using QR matrix decomposition,” IEEE Transactions on Signal Processing, vol. 46, pp. 2562–2565, Sept. 1998.

[6] G. Lightbody, R. L. Walke, R. Woods, and J. McCanny, “Novel mapping of a linear QR architecture,” in Proc. ICASSP, vol. IV, pp. 1933–1936, 1999.

[7] D. Watkins, Fundamentals of Matrix Computations. John Wiley & Sons, Inc., 1991.

[8] W. M. Gentleman and H. T. Kung, “Matrix triangularization by systolic arrays,” in Proc. SPIE Real-Time Signal Processing IV, vol. 298, pp. 19–26, 1982.

[9] J. G. McWhirter, “Recursive least square minimization using a systolic array,” in Proc. SPIE Real-Time Signal Processing VI, pp. 19–26, Aug. 1981.

[10] T. J. Shepherd and J. G. McWhirter, “Systolic adaptive beamforming,” in Radar Array Processing, S. Haykin, J. Litva, and T. J. Shepherd, Eds. Berlin, Heidelberg, Germany: Springer-Verlag, 1993, pp. 153–247.

[11] C. M. Rader, “VLSI systolic arrays for adaptive nulling,” IEEE Signal Processing Mag., vol. 13, no. 4, pp. 29–49, 1996.

[12] J. E. Volder, “The CORDIC trigonometric computing technique,” IRE Trans. Electron. Comput., vol. EC-8, pp. 330–334, 1959.


[13] J. S. Walther, “A unified algorithm for elementary functions,” in Proc. Spring Joint Comput. Conf., 1971, pp. 379–385.

[14] A. J. Van der Veen and E. F. Deprettere, “Parallel VLSI matrix pencil algorithm for high resolution direction finding,” IEEE Trans. Signal Processing, vol. 39, pp. 383–394, Feb. 1992.

[15] Z. Liu and J. V. McCanny, “Implementation of adaptive beamforming based on QR decomposition for CDMA,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003.

[16] R. L. Walke, R. W. M. Smith, and G. Lightbody, “Architectures for adaptive weight calculation on ASIC and FPGA,” in Proc. Thirty-Third Asilomar Conference on Signals, Systems, and Computers, 1999.

[17] D. Bopanna, K. Dhanoa, and J. Kempa, “FPGA based embedded processing architecture for the QRD-RLS algorithm,” in Proc. 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2004.

[18] A. S. Madhukumar, Z. G. Ping, T. K. Seng, Y. K. Jong, T. Z. Mingqian, T. Y. Hong, and F. Chin, “Space-time equalizer for advanced 3GPP WCDMA mobile terminal experimental results,” in Proc. International Symposium on Circuits and Systems, 2003.

[19] Y. Guo, “Efficient VLSI architectures for recursive Vandermonde QR decomposition in broadband OFDM pre-distortion,” in Proc. IEEE Wireless Communications and Networking Conference, 2005.

[20] D. Cescato, M. Borgmann, H. Bolcskei, J. Hansen, and A. Burg, “Interpolation-based QR decomposition in MIMO-OFDM systems,” White Paper, 2005.

[21] M. Karkooti, J. R. Cavallaro, and C. Dick, “FPGA implementation of matrix inversion using QRD-RLS algorithm,” in Conference Record of the Thirty-Ninth Asilomar Conference on Signals, Systems and Computers, 2005.

[22] Y. Guo and D. McCain, “Reduced QRD-M detector in MIMO-OFDM systems with partial and embedded sorting,” in Proc. Global Telecommunications Conference, 2005.

